ADVANCED TEXTS IN ECONOMETRICS
General Editors
C. W. J. GRANGER    G. E. MIZON
CO-INTEGRATION, ERROR CORRECTION, AND THE ECONOMETRIC ANALYSIS OF NON-STATIONARY DATA
Anindya Banerjee, Juan J. Dolado, John W. Galbraith, and David F. Hendry
OXFORD UNIVERSITY PRESS
This book has been printed digitally and produced in a standard specification in order to ensure its continuing availability
Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in
Oxford New York
Auckland Bangkok Buenos Aires Cape Town Chennai Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi São Paulo Shanghai Taipei Tokyo Toronto
Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries
Published in the United States by Oxford University Press Inc., New York
© A. Banerjee, J. J. Dolado, J. W. Galbraith, and D. F. Hendry 1993
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
Reprinted 2003
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above
You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer
ISBN 0-19-828810-7
Preface

This book is intended as a guide to the literature on co-integration and modelling of integrated processes. Time-series econometrics has developed rapidly during the past decade, but especially so in the analysis of non-stationarity. In particular, the study of integrated processes has grown in importance from the status of an exotic topic, discussed only in technical journals, to being an essential part of the econometrician's collection of techniques. It has thereby developed into an area of interest for econometric theorists and applied econometricians alike. This book is aimed at graduate students in economics, applied econometricians, econometric theorists, and the general audience of economists who use empirical methods to analyse time series.

Despite the growing importance of the literature on integration and co-integration, most accounts of this literature remain confined to journals, edited collections of papers, or survey papers. While some of the surveys are quite detailed, space restrictions usually do not allow a full exposition of many of the theoretical points. This book attempts to bridge the gap between accounts such as surveys, which are mainly descriptive, and accounts that are mainly theoretical. It explains the important concepts informally and also presents them formally. The asymptotic theory of integrated processes is described and the tools provided by this theory are used to derive, in some detail, the distributions of estimators. By taking readers step by step through some of the main derivations, our hope is to make the theory readily accessible to a wide audience.

We have tried to make the book as self-contained as possible. A knowledge of econometrics, statistics, and matrix algebra at the level of a final-year undergraduate or first-year graduate course in econometrics is assumed, but otherwise all of the important statistical concepts and techniques are described.

A book such as this one, which discusses an area that is developing rapidly, is inevitably incomplete and runs the risk of not being quite up-to-date. To limit the time taken in writing and revising, we did not seek to chase a frontier that was expanding in many directions. Rather, the topics covered reflect our views of issues, models, and methods that are likely to remain important for some time to come, many of which will continue to provide the platform for future research.
Acknowledgements

Our book was written in two continents, three years, and four universities, so the list of people, across time, space, and departments, to whom we owe extensive debts of gratitude has grown formidably large. A major part of this debt is owed to the Departments of Economics at the Universities of California at San Diego, Florida in Gainesville, McGill, and Oxford, and the Bank of Spain, where the authors either worked or visited for substantial periods. Their generous support of our work is much appreciated.

The book has also benefited greatly from the patient scrutiny of several of our colleagues, who read the entire typescript and made detailed comments. We have pleasure in thanking Michael Clements, Rob Engle, Neil Ericsson, Tony Hall (and several of his students), Colin Hargreaves, Søren Johansen, Katarina Juselius, Teun Kloek, James MacKinnon, G. S. Maddala, Grayham Mizon, Jean-François Richard, Mark Rush, Neil Shephard, Timo Teräsvirta, and four anonymous referees for their help. They have made a great contribution to this book, and found many infelicities in earlier versions, but of course are not responsible for any that remain.

Early versions of the book were inflicted by us upon our graduate students. Among those who suffered from the confusion caused by obscure notation and prose, but continued unflinchingly, Hughes Dauphin, Carol Dole, Jesus Gonzalo, Catherine Liston, Claudio Lupi, Neil Rickman, and Geeta Singh deserve special thanks.

We are also indebted to Julia Campos, Michael Clements, Steven Cook, Neil Ericsson and Claudio Lupi for proof reading.

The financial support of the Economic and Social Research Council (UK) under grants B01250024 and R231184 and the Fonds pour la Formation des Chercheurs et l'Aide à la Recherche (Québec) is gratefully acknowledged. Finally, we thank Andrew Schuller and the editors of this series, who remained encouraging about the project despite its many difficulties.

Oxford      A. B.
Madrid      J. J. D.
Montreal    J. W. G.
Oxford      D. F. H.
Contents

Notational Conventions, Symbols, and Abbreviations

1. Introduction and Overview
   1.1. Equilibrium relationships and the long run
   1.2. Stationarity and equilibrium relationships
   1.3. Equilibrium and the specification of dynamic models
   1.4. Estimation of long-run relationships and testing for orders of integration and co-integration
   1.5. Preliminary concepts and definitions
   1.6. Data representation and transformations
   1.7. Examples: typical ARMA processes
   1.8. Empirical time series: money, prices, output, and interest rates
   1.9. Outline of later chapters
   Appendix

2. Linear Transformations, Error Correction, and the Long Run in Dynamic Regression
   2.1. Transformations of a simple model
   2.2. The error-correction model
   2.3. An example
   2.4. Bårdsen and Bewley transformations
   2.5. Equivalence of estimates from different transformations
   2.6. Homogeneity and the ECM as a linear transformation of the ADL
   2.7. Variances of estimates of long-run multipliers
   2.8. Expectational variables and the interpretation of long-run solutions

3. Properties of Integrated Processes
   3.1. Spurious regression
   3.2. Trends and random walks
   3.3. Some statistical features of integrated processes
   3.4. Asymptotic theory for integrated processes
   3.5. Using Wiener distribution theory
   3.6. Near-integrated processes

4. Testing for a Unit Root
   4.1. Similar tests and exogenous regressors in the DGP
   4.2. General dynamic models for the process of interest
   4.3. Non-parametric tests for a unit root
   4.4. Tests on more than one parameter
   4.5. Further extensions
   4.6. Asymptotic distributions of test statistics

5. Co-integration
   5.1. An example
   5.2. Polynomial matrices
   5.3. Integration and co-integration: formal definitions and theorems
   5.4. Significance of alternative representations
   5.5. Alternative representations of co-integrated variables: two examples
   5.6. Engle-Granger two-step procedure

6. Regression with Integrated Variables
   6.1. Unbalanced regressions and orthogonality tests
   6.2. Dynamic regressions
   6.3. Functional forms and transformations
   Appendix: Vector Brownian Motion

7. Co-integration in Individual Equations
   7.1. Estimating a single co-integrating vector
   7.2. Tests for co-integration in a single equation
   7.3. Response surfaces for critical values
   7.4. Finite-sample biases in OLS estimates
   7.5. Powers of single-equation co-integration tests
   7.6. An empirical illustration
   7.7. Fully modified estimation
   7.8. A fully modified least-squares estimator
   7.9. Dynamic specification
   7.10. Examples
   Appendix: Covariance Matrices

8. Co-integration in Systems of Equations
   8.1. Co-integration and error correction
   8.2. Estimating co-integrating vectors in systems
   8.3. Inference about the co-integration space
   8.4. An empirical illustration
   8.5. Extensions
   8.6. A second example of the Johansen maximum likelihood approach
   8.7. Asymptotic distributions of estimators of co-integrating vectors in I(1) systems

9. Conclusion
   9.1. Summary
   9.2. The invariance of co-integrating vectors
   9.3. Invariance of co-integration under seasonal adjustment
   9.4. Structured time-series models and co-integration
   9.5. Recent research on integration and co-integration
   9.6. Reinterpreting econometrics time-series problems

References
Acknowledgements for Quoted Extracts
Author Index
Subject Index
Notational Conventions, Symbols, and Abbreviations

The following notational conventions will be used throughout the text:

Y, y                                endogenous variables
X, Z, x, z                          exogenous variables, or vectors containing both y and z
Greek letters                       population values (parameters)
Greek letters with ^ or ~           sample values (estimates)
Bold lower case (Roman or Greek)    vectors
Bold upper case (Roman or Greek)    matrices

Equation numbers

Equations are numbered consecutively in each chapter and referred to within that chapter by this number alone. Equations from other chapters are referred to by the chapter number and equation number within chapter; e.g. the fifth equation in Chapter 2 is (5) within Chapter 2, and (2.5) elsewhere.

Symbols

lag operator
first-difference operator
Kronecker product
for all
|x|       modulus or absolute value of x, where x is a scalar
|A|       determinant of A, where A is a matrix
x | y     x conditional on y
weak convergence
convergence in distribution
convergence in probability

Abbreviations

ADF         augmented Dickey-Fuller
ADL         autoregressive-distributed lag
AR          autoregression
ARIMA       autoregressive integrated moving average
ARMA        autoregressive-moving average
ARMAX       ARMA + additional exogenous processes
ASE         asymptotic standard error
BM          Brownian motion
CI(d, b)    co-integrated of order d, b
CLT         central limit theorem
COMFAC      common factor error representation
CRDW        co-integrating regression DW statistic
diag        diagonal matrix
d.f.        degrees of freedom
DF          Dickey-Fuller
DGP         data-generation process
DW          Durbin-Watson statistic
ECM         error-correction model/mechanism
ESE         (average) estimated standard error
FCLT        functional central limit theorem/s
FIML        full-information maximum likelihood
GLS         generalized least squares
GNP         gross national product
I(d)        integrated of order d
ID          independently distributed
IID         independently and identically distributed
IMA         integrated moving average
IN(μ, σ²)   independently and normally distributed with mean μ and variance σ²
IV          instrumental variables
LIML        limited-information maximum likelihood
MA          moving average
MDS         martingale difference sequence
MLE         maximum likelihood estimator
N(μ, σ²)    normally distributed with mean μ and variance σ²
NI          near-integrated
OLS         ordinary least squares
SC          Schwarz information criterion
SD          standard deviation
SE          standard error
SI          seasonally integrated
SSD         sample standard deviation
T           sample size or last observation in a time-series
TFE         total final expenditure
VAR         vector autoregression
var         variance
vec         vectorizing operator
W(r)        Wiener (Brownian motion) process with increments of variance r
1. Introduction and Overview

This book considers the econometric analysis of both stationary and non-stationary processes which may be linked by equilibrium relationships. It exposits the main tools, techniques, models, concepts, and distributions involved in econometric modelling of possibly non-stationary time-series data. Since the focus is on equilibrium concepts, including co-integration and error correction, the analysis begins with a discussion of the application of these concepts to stationary empirical models. Later we will show that integrated processes can be reduced to this case by suitable transformations that take advantage of co-integrating (equilibrium) relationships. In this chapter we will introduce some important concepts from time-series analysis and the theory of stochastic processes, and in particular the theory of Brownian motion processes. We also offer several empirical examples which use these concepts.

A significant re-evaluation of the statistical basis of econometric modelling took place during the 1980s. Its analytical basis expanded from the assumption of stationarity to include integrated processes. The effect of this shift is far from complete, but is already radical, influencing the choice of model forms, modelling practices, statistical inference, distribution theory, and the interpretation of many traditional concepts such as simultaneity, measurement errors, collinearity, forecasting, and exogeneity. This book attempts to analyse these issues, describe the tools necessary to investigate integrated processes, and relate the new methods to those more familiar to econometricians. Research is continuing at a rapid pace, and since this book cannot cover all of the techniques that have been explored, we will concentrate on those that we believe will remain useful.

Time-series econometrics is concerned with the estimation of relationships among groups of variables, each of which is observed at a number of consecutive points in time. The relationships among these variables may be complicated; in particular, the value of each variable may depend on the values taken by many others in several previous time periods. In consequence, the effect that a change in one variable has on another depends upon the time horizon that we consider. It is easy to
imagine examples in which a change in one quantity has little or no effect on another at first and a substantial effect later. Alternatively, a variable may have a substantial effect on another for a time, but that effect may eventually die out.

It is useful, therefore, to distinguish what are often called 'short-run' relationships (those holding over a relatively short period) from 'long-run' relationships. The former relate to links that do not persist. For example, a sudden storm may temporarily reduce the supply of fresh fish and increase its price, but later fair weather will lead to the re-establishing of the earlier price if demand is unaltered. The long-run relationships determine the generally prevailing price-quantity combinations transacted in the market, and so are closely linked to the concepts of equilibrium relationships in economic theory and of persistent co-movements of economic time series in econometrics. Our first task is to clarify these concepts.
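The distinction between short-run and long-run effects can be made concrete with a small numerical sketch. The model below is an assumption chosen purely for illustration (it is not taken from the text): a hypothetical dynamic relation y_t = a*y_{t-1} + b0*z_t + b1*z_{t-1} with |a| < 1, and arbitrary coefficient values. The function name `step_response` is likewise hypothetical. The sketch traces the response of y to a permanent unit increase in z and compares the immediate effect with the effect after adjustment is complete.

```python
def step_response(a, b0, b1, horizon):
    """Response of y to a permanent unit increase in z from time 0 onwards.

    Hypothetical model (an assumption for illustration, not from the text):
        y_t = a * y_{t-1} + b0 * z_t + b1 * z_{t-1},  with |a| < 1.
    """
    y_prev, z_prev = 0.0, 0.0
    path = []
    for _ in range(horizon):
        z = 1.0  # z is one unit higher in every period from time 0
        y = a * y_prev + b0 * z + b1 * z_prev
        path.append(y)
        y_prev, z_prev = y, z
    return path


# Illustrative coefficients: a small immediate effect, a large eventual one.
a, b0, b1 = 0.8, 0.05, 0.15
path = step_response(a, b0, b1, horizon=200)

impact_effect = path[0]                   # short-run (same-period) effect
long_run_effect = (b0 + b1) / (1.0 - a)   # effect once y has stopped changing

print(f"impact effect:   {impact_effect:.3f}")
print(f"after 5 periods: {path[5]:.3f}")
print(f"long-run effect: {long_run_effect:.3f}")
```

With these values the change in z has little effect at first and a substantial effect later, which is the first of the two patterns described above. The long-run effect is obtained by setting y_t = y_{t-1} and z_t = z_{t-1} in the assumed model and solving for y.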
1.1. Equilibrium Relationships and the Long Run

An equilibrium state is defined as one in which there is no inherent tendency to change. A disequilibrium is any situation that is not an equilibrium and hence characterizes a state that contains the seeds of its own destruction. An equilibrium state may or may not have the property of either local or global stability; thus, it may or may not be true that the system tends to return to the equilibrium state when it is perturbed. However, we generally consider only stable equilibria, since unstable equilibria will not persist given that there are stochastic shocks to the economy. That is, equilibria are states to which the system is attracted, other things being equal. It may also be possible in some circumstances to view the forces tending to push the system back into equilibrium as depending upon the magnitude of the deviation from equilibrium at a given point in time.

Equilibrium may be either general or partial. In the latter case, a given market is viewed as having attained equilibrium in spite of the fact that we have not taken account of the feedback from other markets. In both cases, an equilibrium relationship is expressed through a function $f(x_1, x_2, \ldots, x_n) = 0$, which describes the relationships that hold among the $n$ variables $x_1$ to $x_n$ when the system is in equilibrium. The phrase 'long-run equilibrium' is also used to denote the equilibrium relationship to which a system converges over time. Over finite periods of time, the long-run or equilibrium relationships may fail to hold, but they will eventually hold to any degree of accuracy if the equilibrium is stable, and if the system does not experience further shocks from outside.
Expressed differently, a long-run equilibrium relationship entails a systematic co-movement among economic variables which an economic system exemplifies precisely in the long run; we will write equations representing such co-movements without time subscripts as, e.g., $x_1 = \beta x_2$ to denote a linear long-run relation between $x_1$ and $x_2$.

Our definition of equilibrium is therefore not that in which 'equilibrium' refers to clearing in a particular market and where 'disequilibrium' means that supply is not equal to demand, as in Quandt (1978, 1982): we use the term 'market-clearing' for the former and a 'non-clearing market' for the latter. A non-clearing market involves quantity rationing of some agents and, depending on the institutional structure, may or may not involve a deviation from an equilibrium functional relationship.

There is of course a connection between the meaning of 'equilibrium' used in econometrics by Quandt and others, and that used here, which is more common in time-series analysis. When a market clears, an equilibrium relationship of the type we have defined may also occur because clearing of that market may return the system to a state in which some functional relationship among observable variables holds. Our definition is intended to be general and therefore to incorporate market-clearing equilibria, as well as others which may arise through the behaviour of a variety of different types of systems. For example, we would say that an equilibrium relationship exists between aggregate consumption and income if consumption tends toward a fraction $\gamma$ of income in the absence of shocks which may temporarily perturb the relationship. This need not be an equilibrium in the Quandt (1978) sense, however, because it may not correspond to the clearing of markets. (All consumers may remain credit-rationed, for example.)

Even if shocks to a system are constantly occurring so that the economic system is never in equilibrium, the concept of long-run equilibrium may nonetheless be useful. The present is the long-run outcome of the distant past and, as will be made precise below, a long-run relationship will often hold 'on average' over time. Moreover, a stable equilibrium has the property that a given deviation from the equilibrium becomes more and more unlikely as the magnitude of the deviation is greater, so that one may be reasonably confident that the discrepancy between the actual relationship connecting variables and this long-run relationship is within certain bounds. Precise definitions are provided in Chapter 5.

Methods for investigating such long-run relationships are our concern here. An examination of these methods will lead us to discuss aspects of time-series analysis, of dynamic modelling in general, and of the rapidly growing literature treating co-integration, error correction, and inference from non-stationary data. The first step is to clarify the statistical notion of stationarity and its links to the concept of equilibrium.
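The consumption and income example can be sketched as a small simulation. Everything specific here is an assumption made for illustration and does not come from the text: the partial-adjustment rule, the parameter values, the constant income level, and the function name `simulate_consumption`. The sketch shows the two properties discussed above: while shocks keep arriving the relationship never holds exactly but the deviation stays within bounds, and once shocks cease the relationship holds to any degree of accuracy.

```python
import random


def simulate_consumption(gamma=0.9, alpha=0.3, income=100.0,
                         shock_sd=2.0, shock_periods=100,
                         total_periods=200, seed=1):
    """Deviation of consumption from its long-run target gamma * income.

    Hypothetical adjustment rule (an assumption for illustration):
        c_t = c_{t-1} + alpha * (gamma * income - c_{t-1}) + shock_t
    Shocks occur only during the first `shock_periods` periods.
    """
    rng = random.Random(seed)
    target = gamma * income
    c = 50.0  # start well away from the equilibrium level
    deviations = []
    for t in range(total_periods):
        shock = rng.gauss(0.0, shock_sd) if t < shock_periods else 0.0
        c = c + alpha * (target - c) + shock
        deviations.append(c - target)
    return deviations


dev = simulate_consumption()

# While shocks continue (after the initial gap has closed), the deviation
# fluctuates but does not vanish.
max_dev_with_shocks = max(abs(d) for d in dev[20:100])

# One hundred periods after the last shock, the deviation has died away.
final_dev = abs(dev[-1])

print(f"largest deviation while shocks continue: {max_dev_with_shocks:.3f}")
print(f"deviation long after shocks cease:       {final_dev:.2e}")
```

The adjustment speed `alpha` controls how quickly a given deviation is eliminated; with `alpha` between 0 and 1 the equilibrium in this sketch is stable.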
1.2. Stationarity and Equilibrium Relationships

In economic theory, the concept of equilibrium is well established and well defined. The statistical concept of equilibrium centres on that of a stationary process, which will be defined formally below. A substantial body of methods is developing around the statistical features of equilibrium relationships among time-series processes, and the concepts of stationarity and particular forms of non-stationarity are crucial to these methods.

If a particular relationship such as $x_1 = \beta x_2$ emerges as the economic system is allowed to settle down, this will describe an equilibrium to an econometrician just as to a theorist. In actual time series, however, the relation $x_{1t} = \beta x_{2t}$ may never be observed to hold. Consequently, we look for ways of characterizing the relationships that can be observed to hold between $x_{1t}$ and $x_{2t}$.

Roughly speaking (again, terms will be defined precisely in Chapter 5), we say that an equilibrium relationship $f(x_1, x_2) = 0$ holds between two variables $x_1$ and $x_2$ if the amount $\epsilon_t = f(x_{1t}, x_{2t})$ by which actual observations deviate from this equilibrium is a median-zero stationary process.¹ That is, the 'error' or discrepancy between outcome and postulated equilibrium has a fixed distribution, centred on zero, that does not change over time. This error cannot therefore grow indefinitely; if it did, the relationship could not have been an equilibrium one since the system is free to move ever further away from it. Of course, it may be difficult to distinguish in finite samples between an ever-growing discrepancy in an hypothesized equilibrium relationship and a random fluctuation; formal statistical tests for problems such as this are discussed in later chapters.

Given the characterization above, the short-run discrepancy $\epsilon_t$ in an equilibrium relationship must have no tendency to grow systematically over time. However, since this error represents shocks that are constantly occurring and affecting economic variables, in a real economic system there is no systematic tendency for this error to diminish over time either. It would fall away to zero only if shocks were to cease.

This definition of an equilibrium relationship holds automatically when applied to series that are themselves stationary. For any two stationary series $\{x_{1t}\}$ and $\{x_{2t}\}$, irrespective of any substantive economic relationship between these two alone, a difference of the form $\{x_{1t} - b x_{2t}\}$ must be a stationary series for any $b$. Thus, whether or not there exists a non-zero $\beta$ which describes a true equilibrium relationship, corresponding to a non-zero derivative between $x_1$ and $x_2$, any arbitrarily chosen $b$ will meet the statistical equilibrium condition. This does not imply that we cannot use statistical methods to determine the parameters of a long-run relationship, but simply that one stage of the process, in which we look for a stationary discrepancy, is unnecessary.

However, this concept of statistical equilibrium is necessary and useful in examining equilibrium relationships between variables tending to grow over time. In such cases, if the actual relationship is $x_1 = \beta x_2$, the discrepancy $x_{1t} - b x_{2t}$ will be non-stationary for any $b \neq \beta$, since the discrepancy deviates from the true relationship by the constant proportion $(b - \beta)$ of the growing variable $x_{2t}$; only the true relationship can yield a stationary discrepancy. With more than two variables, however, there may be more than one equilibrium relation, and this leads to another of the statistical problems that is currently being pursued: the empirical determination of the number of equilibrium relationships between three or more non-stationary time series.

¹ Later we will consider more precisely the properties that the deviation must have. The requirement is usually stated as being that the deviation from the equilibrium relationship be integrated of order zero (see below); alternatively, we might impose only the weaker requirement that the unconditional expectation of the deviation from the equilibrium relationship be zero, implying that only the first moment need exist and be constant. For simplicity, we omit intercepts from the present discussion.
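The contrast between stationary and growing series drawn in this section can be sketched numerically. The data-generating processes below are assumptions made for illustration (a random walk with drift for the growing variable, white noise for the stationary case, and arbitrary values of the coefficients); the helper names `spread` and `discrepancy` are hypothetical. Comparing the spread of a discrepancy series is only an informal indicator, not one of the formal tests discussed in later chapters.

```python
import random


def spread(series):
    """Range of a series: a crude indicator of how far it wanders."""
    return max(series) - min(series)


def discrepancy(x1, x2, b):
    """The series x1_t - b * x2_t for a trial coefficient b."""
    return [u - b * v for u, v in zip(x1, x2)]


rng = random.Random(0)
T, beta = 2000, 2.0

# Case 1 (assumed process): x2 grows over time as a random walk with
# positive drift, and x1 = beta * x2 plus a stationary error.
x2_growing, level = [], 0.0
for _ in range(T):
    level += 0.5 + rng.gauss(0.0, 1.0)
    x2_growing.append(level)
x1_growing = [beta * v + rng.gauss(0.0, 1.0) for v in x2_growing]

spread_true_b = spread(discrepancy(x1_growing, x2_growing, b=beta))
spread_wrong_b = spread(discrepancy(x1_growing, x2_growing, b=1.5))

# Case 2 (assumed process): two unrelated stationary white-noise series.
x1_stat = [rng.gauss(0.0, 1.0) for _ in range(T)]
x2_stat = [rng.gauss(0.0, 1.0) for _ in range(T)]
spread_stat_arbitrary_b = spread(discrepancy(x1_stat, x2_stat, b=1.5))

print(f"growing series, b = beta:      {spread_true_b:.1f}")
print(f"growing series, b != beta:     {spread_wrong_b:.1f}")
print(f"stationary series, arbitrary b: {spread_stat_arbitrary_b:.1f}")
```

For the growing series, only the trial value b = beta leaves a discrepancy that stays within narrow bounds; for any other value the discrepancy inherits the growth of x2. For the stationary pair, an arbitrary b already yields a bounded discrepancy, so the check carries no information about a substantive relationship.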
1.3. Equilibrium and the Specification of Dynamic Models

Equilibrium relationships have played an explicit role in econometric modelling since its foundations (see Morgan 1990). If there exists a stable equilibrium $x_1 = \beta x_2$, the discrepancy $\{x_{1t} - \beta x_{2t}\}$ evidently contains useful information since on average the system will move towards that equilibrium if it is not already there. In particular, $(x_{1,t-1} - \beta x_{2,t-1})$ represents the previous disequilibrium. Suppose the equilibrium relationship is between a variable $\{y_t\}$ to be modelled and some series $\{z_t\}$ which is exogenous in an appropriate sense. If we let $x_{1t} = y_t$ and $x_{2t} = z_t$ to distinguish their status, and denote the equilibrium by $y = \beta z$, then the discrepancy, or error, $\{y_t - \beta z_t\}$ should be a useful explanatory variable for the next direction of movement of $y_t$. In particular, when $y_t - \beta z_t$ is positive, $y_t$ is too high relative to $z_t$, and on average we might expect a fall in $y$ in future periods relative to its trend growth. The term $(y_{t-1} - \beta z_{t-1})$, called an error-correction mechanism, is therefore sometimes included in dynamic regressions (see Sargan 1964, Hendry and Anderson 1977, and Davidson, Hendry, Srba, and Yeo 1978).

The true parameter $\beta$ characterizing the relationship is not known in general. This need not prevent the error-correction mechanism from being useful, however, since the unknown parameter can either be
estimated separately in a prior analysis or estimated in the course of modelling the variable of interest. Moreover, the general error-correction mechanism can be shown to be equivalent to various other transformations of a general linear model incorporating past values of both the variable of interest and the explanatory variables (see Chapter 2). A particular advantage of the error-correction mechanism is that the extent of adjustment in a given period to deviations from long-run equilibrium is given by the estimated equation without any further calculation. Other forms of the estimated model are also convenient in that they allow the implied long-run relation itself to be seen directly. Considerations such as these are discussed in the following chapter.

The practice of exploiting information contained in the current deviation from an equilibrium relationship, in explaining the path of a variable, has benefited from the formalization of the concept of co-integration by Granger (1981) and Engle and Granger (1987). The informal definition of statistical equilibrium discussed above is based upon a special case of the definition of co-integration. Further, the practice of modelling co-integrated series is closely related to error-correction mechanisms: error-correcting behaviour on the part of economic agents will induce co-integrating relationships among the corresponding time series and vice versa.

A series that is tending to grow over time cannot be stationary (although it may possibly be stationary around some deterministic trend), but the changes in that series might be.
T o tak e a mechanica l example, i f a n objec t ha s a fixe d averag e positio n aroun d whic h i t moves, alway s returnin g afte r som e interva l t o thi s positio n lik e a randomly perturbe d weigh t a t th e en d o f a spring , the n it s displacemen t may b e a stationar y series . A n objec t tha t ha s n o suc h fixe d positio n may nevertheles s hav e a velocit y (th e chang e i n positio n pe r uni t time) , or acceleratio n (th e chang e i n th e velocit y pe r uni t time) , tha t i s stationary. Fo r example , i f th e objec t i s movin g eve r furthe r fro m it s point o f origin , bu t wit h velocit y fluctuatin g aroun d som e fixe d positiv e mean accordin g t o a fixe d distributio n function , the n th e velocit y o f th e object i s a stationary series. A serie s is said t o be integrate d o f order 1 (1(1)) if , althoug h it is itself non-stationary, th e change s i n thi s serie s for m a stationar y series . I t i s said t o b e integrate d o f orde r 2 (1(2) ) if , althoug h th e change s ar e non stationary, th e changes in th e changes for m a stationar y series . I n othe r words, i f th e serie s mus t b e difference d exactl y k time s t o achiev e stationarity, the n th e serie s i s l(k), s o that a stationary serie s i s 1(0). W e will us e th e ter m 'integrate d process ' t o refe r t o a serie s wit h orde r o f integration strictl y greate r tha n zero : precis e definition s ar e give n i n Chapter 3 . We ca n no w conside r th e concep t o f co-integration , it s relatio n t o th e
definition of long-run equilibrium between series given above, and its use as part of a statistical description of the behaviour of time series that satisfy some equilibrium relationship. A simple example concerns two series, each of which is integrated of order 1. Assume that a long-run equilibrium relationship holds between them, and that it is linear: $x_1 = \beta x_2$. Then $(x_1 - \beta x_2)$ must be equal to zero in equilibrium and the series $\{x_{1t} - \beta x_{2t}\}$ has a constant unconditional mean of zero. This need not imply that $\{x_{1t} - \beta x_{2t}\}$ is stationary: the variance of $\{x_{1t} - \beta x_{2t}\}$ might be non-constant, for example. The definition of co-integration given by Engle and Granger (1987), and discussed in Chapter 5, does however require stationarity of the deviation $\{x_{1t} - \beta x_{2t}\}$. When stationarity does hold, we say that $x_1$ and $x_2$ are co-integrated (1,1), denoted CI(1,1); that is, they are each integrated of order 1, and there exists some linear combination $\{x_{1t} - \beta x_{2t}\}$ which is integrated of an order one lower than the components (i.e. is I(0) here). If $\{x_{1t} - \beta x_{2t}\}$ has a constant unconditional mean but is not stationary, then we may still want to say that an equilibrium relationship holds; the series will not, however, fit the strict Engle-Granger definition of co-integration, which requires that some linear combination be stationary.

A substantive long-run equilibrium relationship is something from which the variables involved can deviate, but not by an ever-growing amount. That is, the discrepancy or error in the relationship cannot be integrated of any order greater than zero. Series integrated of strictly positive orders which are linked by such an equilibrium relationship must, therefore, be co-integrated with each other.
In the example just given, the fact that the integrated series $x_1$ and $x_2$ move together in the long run is reflected in the fact that they are co-integrated; a linear relation yields a stationary deviation.

More generally, we can speak of variables that are co-integrated (a, b) when $a \geq b$ and $b > 0$, where a is the order of integration of the variables and b is the reduction in order of integration produced by the linear combination, which then has order of integration $a - b$. When $b > 0$, a linear relation exists between the variables which is integrated of lower order than either of the variables themselves, but which may none the less not be I(0). In the latter case ($a - b > 0$), the variables may deviate from the linear relationship by an ever-growing amount, and so it is not the kind of relationship that we have been calling a long-run equilibrium. Nevertheless, variables that are CI(a, b) for $b > 0$ do contain some information about the long-run behaviour of the series involved.

Since a relationship between co-integrated variables can be shown to be representable using an error-correction mechanism (see Chapter 5), and since such representations have been found to be valuable in empirical modelling, there is a formal counterpart to the informal
argument above suggesting the usefulness of equilibrium information in specifying dynamic regression models.
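The CI(1,1) case described above can be illustrated with a small simulation. This is a minimal sketch rather than anything taken from the text: the long-run coefficient `beta = 2.0`, the AR(1) coefficient 0.5 for the deviation, the sample size, and the random seed are all assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed seed, for reproducibility only
T = 2000                         # assumed sample size
beta = 2.0                       # assumed long-run coefficient

# x2 is a random walk, hence integrated of order 1.
x2 = np.cumsum(rng.standard_normal(T))

# The deviation from equilibrium is a stationary AR(1) process.
shocks = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + shocks[t]

# x1 inherits the stochastic trend of x2, so it is I(1) as well.
x1 = beta * x2 + u

# The linear combination x1 - beta*x2 removes the common trend.
deviation = x1 - beta * x2

print("variance of x1:       ", x1.var())
print("variance of deviation:", deviation.var())
```

In a typical run the levels wander widely while the deviation keeps a small and stable variance, which is the sense in which the two series move together in the long run.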
1.4. Estimation of Long-Run Relationships and Testing for Orders of Integration and Co-integration

The existence of long-run relationships between variables, the potential orders of integration of particular time series, and the implications of these for the specification of dynamic econometric models can be understood as mathematical properties without implying that we know whether or not such relationships exist, let alone what their forms for a particular empirical problem would be.

When an estimated regression equation implies an equilibrium relationship between two processes, it is a straightforward operation to extract the estimated long-run equilibrium relation regardless of the form in which the equation is estimated. The calculation can be made by expressing the equation in an equilibrium form and taking its expectation. This is analogous to assuming a state in which the values of the variables do not change, so that the dating of variables becomes irrelevant and the equation is treated as deterministic. Computing the derivative between the two series is then straightforward. Approximations to the variances of estimated long-run multipliers can also be computed. Chapter 2 explores various transformations of the linear model that are convenient for these and related calculations.

Testing for the existence of such an equilibrium relationship is not nearly so simple. First, it is difficult empirically to establish the orders of integration of individual time series. Second, the order of integration of a linear relationship among variables is even harder to discover than the order of integration of a single series: drawing inferences is complicated by the fact that the parameters of the relationship are in general unknown.
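The extraction of a long-run relation described above can be sketched for one specific case. The model form used here, $y_t = a_0 + a_1 y_{t-1} + b_0 z_t + b_1 z_{t-1} + e_t$, is an assumption made for illustration (the text does not fix a particular equation at this point), as are the function names and the numerical coefficients.

```python
def long_run_solution(a0, a1, b0, b1):
    """Static solution of y_t = a0 + a1*y_{t-1} + b0*z_t + b1*z_{t-1} + e_t.

    Setting y_t = y_{t-1} = y*, z_t = z_{t-1} = z* and e_t = 0 gives
    y* = a0/(1 - a1) + ((b0 + b1)/(1 - a1)) * z*.
    """
    if abs(a1) >= 1:
        raise ValueError("a stable long-run solution requires |a1| < 1")
    intercept = a0 / (1.0 - a1)
    multiplier = (b0 + b1) / (1.0 - a1)   # derivative of y* with respect to z*
    return intercept, multiplier


def iterate_to_rest(a0, a1, b0, b1, z_star, steps=200):
    """Iterate the deterministic equation with z held fixed at z_star."""
    y = 0.0
    for _ in range(steps):
        y = a0 + a1 * y + (b0 + b1) * z_star
    return y


# Hypothetical coefficient values.
intercept, multiplier = long_run_solution(1.0, 0.5, 0.2, 0.3)
y_rest = iterate_to_rest(1.0, 0.5, 0.2, 0.3, z_star=4.0)
print(intercept, multiplier, y_rest)
```

The second function checks the algebra numerically: holding z fixed and iterating the equation without shocks, y settles at the value implied by the static solution.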
Testing whether an individual series is I(1) as opposed to I(0) is the problem that has been widely discussed as that of testing for a 'unit root' in a time series. Strategies for performing such testing have had to contend with the problem that I(0) alternatives in which the series is 'close' to being I(1) (so that the power of the test is low) are very plausible in many economic circumstances. Further, the form of the data generation process (e.g. the orders of dynamics; the question of which exogenous variables enter; etc.) is not known, and critical values of test statistics are typically sensitive to the structure of the process.

Fuller (1976) and Dickey and Fuller (1979) emphasized that testing for non-stationarity (again, I(1) as opposed to I(0) series) is more difficult than conducting conventional t-tests of the hypothesis that the autoregressive parameter is equal to one in an AR(1) model. In fact, where there are roots greater than or equal to one, conventionally used tests do not have standard asymptotic distributions. The original tests were variants of conventional tests, with critical values retabulated using Monte Carlo experiments to reflect the changes in distribution when, under the null, the series are non-stationary.

These original tests were based on simple forms of autoregressive model: an AR(1) model, with or without drift and time trend terms (i.e. $y_t = \alpha y_{t-1} [+ \beta] [+ \gamma t] + \varepsilon_t$). Such simple forms may often be poor approximations to the data generation process. This will manifest itself in the failure of the estimated model to pass various mis-specification tests. In particular, tests for residual autocorrelation will often reflect autocorrelated processes that have been omitted from the model specification. One way of dealing with the problem of finding an adequate model within which to test for non-stationarity has therefore been to retain a simple autoregressive model form, but with a non-parametric correction to the values of the test statistic to allow for a general form of autocorrelation in the residuals. Another approach attempts to capture the autocorrelation through the addition of extra lagged terms in the dependent variable. These issues are addressed in Chapter 4.

When series may contain more than one 'unit root' (i.e. where they may be I(2) or of higher orders), testing becomes yet more difficult because the sequence in which different hypotheses are tested can affect inference. Such issues are also considered in Chapter 4.
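The simplest test regression described above, an AR(1) with drift, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `df_tstat`, the simulated series, the seed, and the sample size are all hypothetical, and the resulting ratio has to be compared with Dickey-Fuller critical values rather than with Student-t tables.

```python
import numpy as np


def df_tstat(y):
    """t-ratio for H0: alpha = 1 in y_t = alpha*y_{t-1} + beta + e_t.

    The regression is run as (y_t - y_{t-1}) on a constant and y_{t-1},
    so the slope estimates alpha - 1.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(0)    # assumed seed
T = 500                           # assumed sample size
e = rng.standard_normal(T)

random_walk = np.cumsum(e)        # alpha = 1, an I(1) series
stationary = np.zeros(T)          # alpha = 0.5, an I(0) series
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]

tau_rw = df_tstat(random_walk)
tau_ar = df_tstat(stationary)
print("random walk:", tau_rw, " stationary AR(1):", tau_ar)
```

The statistic for the stationary series is strongly negative, while that for the random walk is much closer to zero. Repeating the random-walk calculation over many simulated samples is the kind of Monte Carlo experiment by which the non-standard critical values were tabulated.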
A related method can be applied to the problem of testing for an equilibrium relation between integrated variables. A prior step must be added to the method above, in which a linear relationship between or among the variables in question is estimated. Testing for co-integration then entails testing the order of integration of the error in this relationship. For example, a stationary error in a model relating integrated series entails an equilibrium relationship. Conversely, if there were no equilibrium relationship, there would be nothing to tie these series to any estimated linear relation, and this would imply non-stationarity of the residuals.

It might appear at first sight, for example, that testing for co-integration between I(1) series $\{x_{1t}\}$ and $\{x_{2t}\}$ would be precisely the same as a test of the hypothesis that $\{e_t\} = \{x_{1t} - \beta x_{2t}\}$ is I(1) against the alternative that $\{e_t\}$ is I(0). However, this is true only under very strong assumptions. Necessary conditions include that there is only one co-integrating relation and the values of its parameters are known. In the bivariate case, when $\beta$ is estimated, the series that one tests for stationarity is $\{\hat{e}_t\} = \{x_{1t} - \hat{\beta} x_{2t}\}$. Since linear regression minimizes the variance of $\hat{e}_t$, the estimated series of deviations from equilibrium has a smaller variance than the true deviations $\{x_{1t} - \beta x_{2t}\}$, assuming that $\beta$
exists. That is, the method by which $\beta$ is usually estimated amounts to choosing $\beta$ in such a way that the two variables are given the best chance to appear to move together. Regression makes co-integration appear to be present more often than it should, so that the critical values of test statistics must be adjusted to reflect the fact that $\beta$ is estimated. Co-integration tests are therefore similar, but not identical, to standard stationarity tests.

Chapter 7 explores these tests for co-integration, and Chapter 8 extends the discussion to estimation and testing in systems of equations.
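The point that least squares gives the two series "the best chance to appear to move together" can be checked numerically. The sketch below is an assumption-laden illustration: it uses a regression without intercept, a hypothetical pair of series with true coefficient `beta = 1.0`, and an arbitrary seed. By construction of OLS, the estimated deviations cannot have a larger sum of squares than the true deviations.

```python
import numpy as np

rng = np.random.default_rng(1)    # assumed seed
T = 200                           # assumed sample size
beta = 1.0                        # assumed true coefficient

x2 = np.cumsum(rng.standard_normal(T))     # I(1) regressor
x1 = beta * x2 + rng.standard_normal(T)    # hypothetical CI(1,1) partner

# OLS slope in a regression of x1 on x2 without intercept.
beta_hat = (x2 @ x1) / (x2 @ x2)

true_dev = x1 - beta * x2         # deviations at the true beta
est_dev = x1 - beta_hat * x2      # deviations at the estimated beta

ss_true = true_dev @ true_dev
ss_est = est_dev @ est_dev
print(beta_hat, ss_est, ss_true)
```

Because the estimated deviations are at least as tightly concentrated as the true ones, a stationarity test applied to them with unadjusted critical values would reject non-stationarity too often, which is why separate critical values are needed.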
1.5. Preliminary Concepts and Definitions

We assume that readers are acquainted with the fundamental principles and methods of econometrics and statistical inference. It is nonetheless worth reviewing some important concepts and definitions that will be used in later chapters, establishing terminology as we do so.
1.5.1. Stochastic Processes and Time-series Models

A number of concepts from standard time-series analysis will be necessary. Box and Jenkins (1970) give a thorough treatment of these models.

A stochastic process is an ordered sequence of random variables $\{x(s, t), s \in S, t \in \mathbb{T}\}$, such that, for each $t \in \mathbb{T}$, $x(\cdot, t)$ is a random variable on the sample space $S$ and, for each $s \in S$, $x(s, \cdot)$ is a realization of the stochastic process on the index set $\mathbb{T}$ (that is, an ordered set of values, each corresponding to one value of the index set). A given realization of the process may be represented as $\{x(t), t \in \mathbb{T}\}$, and this notation is also often used for the stochastic process itself. In later chapters we will typically refer to realizations of stochastic processes by the notation $x_t$ for a value at t, and $\{x_t\}_1^T$ (or $\{x_t\}$ or $\{x_t\}_{t=1}^T$) for a full set of values corresponding to an index set $\mathbb{T} = \{1, 2, \ldots, T\}$. We will also restrict our attention to discrete stochastic processes, for which the index set is a discrete set, in which case we generally use the notation $x_t$ rather than $x(t)$, which may apply also to continuous processes.

Next, let $\{x(t), t \in \mathbb{T}\}$ be a stochastic process such that $E(|x(t)|) < \infty$ for all $t \in \mathbb{T}$, and $E(x(t) \mid \mathcal{I}_{t-1}) = x(t-1)$ for all $t \in \mathbb{T}$, where $E(\cdot)$ is the expectations operator and $\mathcal{I}_{t-1}$ represents a particular information set of data realized by time $t - 1$. Then $\{x(t), t \in \mathbb{T}\}$ is called a
martingale with respect to $\{\mathcal{I}_t, t \in \mathbb{T}\}$. A martingale difference sequence can then be defined by $\{y(t) = x(t) - x(t-1), t \in \mathbb{T}\}$. It follows that $E(|y(t)|) < \infty$ $\forall t \in \mathbb{T}$ and that $E(y(t) \mid \mathcal{I}_{t-1}) = 0$ $\forall t \in \mathbb{T}$.

A stochastic process is called strictly stationary if, for any subset $(t_1, t_2, \ldots, t_n)$ of $\mathbb{T}$ and any real number h such that $t_i + h \in \mathbb{T}$, $i = 1, 2, \ldots, n$, we have

$$F(x(t_1), \ldots, x(t_n)) = F(x(t_1 + h), \ldots, x(t_n + h)),$$
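where $F(\cdot)$ is the joint distribution function of the n values. Strict stationarity therefore implies that all existing moments of the process are constant through time. The process is weakly stationary (or second-order stationary or covariance stationary) if

$$E[x(t_i)] = E[x(t_i + h)] = \mu < \infty,$$
$$E[x(t_i)^2] = E[x(t_i + h)^2] = \mu_2 < \infty,$$
$$E[x(t_i)x(t_j)] = E[x(t_i + h)x(t_j + h)] = \mu_{ij} < \infty,$$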
where $\mu$, $\mu_2$, and the $\mu_{ij}$ are constant over t, for all $t \in \mathbb{T}$ and h such that $t_r + h \in \mathbb{T}$ ($r = i, j$). Thus, the contemporaneous second moments do not depend on time, and the lag dependencies are functions only of lag length. That the first two raw moments are constant also implies that the variance of the process is constant. If we consider a vector process $\{\mathbf{x}(t)\} = \{x_1(t), x_2(t), \ldots, x_m(t)\}'$, then we require in addition that covariances of the form $E[x_k(t_i)x_l(t_j)]$ are finite constants and are functions of i, j, k, l only, for any admissible i, j, k, and l.

We will not offer a rigorous definition of an integrated process at this stage but we can highlight a number of the issues involved. An integrated process is one that can be made stationary by differencing. A discrete process integrated of order d must be differenced d times to reach stationarity; that is, $\Delta^d x_t$ is stationary where the differencing operator $\Delta^d$ is defined by $(1 - L)^d$ (using the lag operator L, itself defined by $L^n x_t = x_{t-n}$). For example, the first difference is $\Delta x_t = x_t - x_{t-1}$, and the second difference is $\Delta^2 x_t = \Delta x_t - \Delta x_{t-1} = x_t - 2x_{t-1} + x_{t-2} = (1 - L)^2 x_t$. The process $(1 - L)x_t = \varepsilon_t$, where $\{\varepsilon_t\}$ is a white-noise series (see below), is called a random walk and is a simple example of a process integrated of order 1.

Two issues merit comment. First, if $x_t$ is stationary then so is $\Delta x_t$, or even $\Delta^d x_t$ for $d > 0$. Thus, the stationarity of $\Delta^d x_t$ is not sufficient for $x_t$ to be I(d). (Recall that an I(d) process is one that must be differenced d times to achieve stationarity.) Secondly, consider the stable autoregressive process, $x_t = \alpha_0 + \alpha_1 x_{t-1} + \varepsilon_t$, where $|\alpha_1| < 1$, $x_0 = 0$, and $\varepsilon_t \sim \mathrm{IN}(0, \sigma^2)$, $t = 1, \ldots, T$.
Then $\{x_t\}$ is non-stationary since $E(x_t) = \alpha_0(1 - \alpha_1^t)(1 - \alpha_1)^{-1}$, which is not constant over t, although
$\{x_t\}$ is asymptotically stationary (see e.g. Spanos 1986). Hence we have a non-stationary series that is not an integrated process in the sense we wish to use. Chapter 3 offers precise definitions.

A white-noise process is a stationary process which has a zero mean and is uncorrelated over time; that is, $\{x(t), t \in \mathbb{T}\}$ is white noise if $\forall t \in \mathbb{T}$, $E[x(t)] = 0$, $E[(x(t))^2] = \sigma^2 < \infty$ and $E[x(t)x(t + h)] = 0$ where $h \neq 0$ and $t + h \in \mathbb{T}$. A white-noise process is therefore necessarily second-order stationary, and if $x(t)$ is normally distributed it is strictly stationary as well since in this case higher-order moments are functions of the first two.

An innovation $\{v(t)\}$ against an information set $\mathcal{I}_{t-1}$ is a process whose distribution $D[v(t) \mid \mathcal{I}_{t-1}]$ does not depend on $\mathcal{I}_{t-1}$; also, $v(t)$ is a mean innovation if $E[v(t) \mid \mathcal{I}_{t-1}] = 0$. Thus, an innovation must be white noise if $\mathcal{I}_{t-1}$ contains a history of $\{v(t-1), \ldots, v(0)\}$, but not conversely. Consequently, an innovation must be a martingale difference sequence. (See Spanos (1986) for further discussion.)

For a stationary process, the covariance between two realizations at different points in time (indices) will depend only upon the difference between those indices, and not on the indices themselves. We can therefore define, for a process $\{x_t\}$ that is at least second-order stationary with $E(x_t) = \mu < \infty$, the autocovariance function

$$\gamma(h) = E[(x_t - \mu)(x_{t+h} - \mu)].$$
Stationarity implies that $\gamma(h) = \gamma(-h)$, since the autocovariance between two values depends only on the distance between them. The autocorrelation function is defined similarly, as

$$\rho(h) = \gamma(h)/\gamma(0),$$
$\gamma(0)$ being the variance of the process.

Our understanding of and ability to forecast stochastic processes is often enhanced by fitting models. The autoregressive-moving average (ARMA) class of models is widely used for univariate time-series modelling, and we will make frequent reference to such models. An ARMA(p, q) model (with p autoregressive (AR) and q moving-average (MA) parameters) for a process $\{x_t\}_1^T$ is of the form

$$x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=0}^{q} \theta_j \varepsilon_{t-j},$$
with $\theta_0 = 1$ and $\{\varepsilon_t\}_1^T$ a white-noise process.

Using polynomials in the lag operator, we can express the ARMA model as

$$\alpha(L)x_t = \theta(L)\varepsilon_t, \qquad (1)$$
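with

$$\alpha(L) = 1 - \alpha_1 L - \cdots - \alpha_p L^p, \qquad \theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q.$$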
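The polynomials $\alpha(L)$ and $\theta(L)$ can be expressed in terms of their factors as

$$\alpha(L) = \prod_{m=1}^{p} (1 - \lambda_m L), \qquad \theta(L) = \prod_{k=1}^{q} (1 - \delta_k L).$$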
If any factor $(1 - \lambda_m L)$ from $\alpha(L)$ matches any $(1 - \delta_k L)$ from $\theta(L)$, then these are said to be common factors, and can be cancelled from both sides of (1). This is important because, if $\gamma(L)$ is any arbitrary polynomial of order n, from (1) it is also true that

$$\gamma(L)\alpha(L)x_t = \gamma(L)\theta(L)\varepsilon_t.$$
Such redundant common factors must be cancelled to ensure a unique representation. If the AR polynomial $\alpha(L)$ contains the factor $(1 - L)$ (that is, if there is some $\lambda_i$ equal to one), then the process is said to contain a unit root.²

When the parameters $\{\alpha_i\}$ and $\{\theta_j\}$ are chosen to fit the autocorrelations of the observed process as well as possible, the resulting ARMA process may be a useful predictive device. An autoregressive integrated moving-average (ARIMA) process allows for an integrated component in the underlying time series; thus, an ARIMA(p, d, q) process is an I(d) process for which the dth difference follows an ARMA(p, q).

An ARMA model with given parameters implies particular autocovariance and autocorrelation functions; see Box and Jenkins (1970: 74 ff.) for a description of an algorithm by which these can be calculated for a general ARMA(p, q) process.

If the parameters of the ARMA process are known, checking stationarity is not difficult. Provided that $\alpha(L)$ and $\theta(L)$ contain no common factors, stationarity of an ARMA(p, q) process depends only on the p parameters of the autoregressive part. An AR or ARMA model is stationary if and only if the roots of the AR polynomial $(1 - \alpha_1 L - \cdots - \alpha_p L^p)$ lie outside the unit circle (or, equivalently, if and only if the latent roots of the polynomial, being the roots of $(z^p - \alpha_1 z^{p-1} - \cdots - \alpha_p)$, lie inside the unit circle). An analogous condition must hold in the MA polynomial to guarantee invertibility of the process; see Box and Jenkins (1970) or Fuller (1976).

² Factors such as $(1 + L)$ or $(1 + L^2)$ yield roots with moduli of unity.
Examples of processes having these forms will be given later in this chapter.
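The root condition for stationarity stated above lends itself to a short numerical check. The following is a sketch only: the helper names `ar_roots` and `is_stationary`, the tolerance, and the example coefficients are assumptions introduced for illustration.

```python
import numpy as np


def ar_roots(alphas):
    """Roots (in L) of the AR polynomial 1 - a1*L - ... - ap*L**p."""
    # np.roots expects coefficients ordered from the highest power down.
    coeffs = [-a for a in reversed(alphas)] + [1.0]
    return np.roots(coeffs)


def is_stationary(alphas, tol=1e-8):
    """True if every root of the AR polynomial lies outside the unit circle."""
    return bool(np.all(np.abs(ar_roots(alphas)) > 1.0 + tol))


print(is_stationary([0.5]))          # x_t = 0.5 x_{t-1} + e_t, root at L = 2
print(is_stationary([1.0]))          # random walk, root at L = 1
print(is_stationary([1.5, -0.5]))    # (1 - L)(1 - 0.5L): one unit root
print(np.abs(ar_roots([-1.0])))      # factor (1 + L): root of modulus one
```

The third example factors as $(1 - L)(1 - 0.5L)$, so it contains a unit root even though its second root lies outside the unit circle. The last line illustrates the case of a factor $(1 + L)$, whose root is $-1$ and so has modulus one without being equal to one.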
1.5.2. Orders of Magnitude, Convergence in Probability, and Convergence in Distribution

During the course of the analysis, we will examine the limiting behaviour of many random variables. In particular, we will often be interested in determining whether or not a given sequence of random variables converges or tends to a limiting value (or to a limit random variable), and the rate at which any such convergence occurs. The definitions given below, taken from Fuller (1976) and based on Mann and Wald (1943), make these concepts of convergence rigorous.

It is useful to start with a sequence of variables (say, real numbers) that are non-stochastic. Let $\{a_T\}_{T=1}^{\infty}$ be a sequence of real numbers and $\{g_T\}_{T=1}^{\infty}$ be a sequence of positive real numbers. Then

1. $a_T$ is of smaller order (in magnitude) than $g_T$, denoted $a_T = o(g_T)$, if $\lim_{T \to \infty} a_T/g_T = 0$.

2. $a_T$ is at most of order (in magnitude) $g_T$, denoted $a_T = O(g_T)$, if there exists a real number M such that $g_T^{-1}|a_T| \leq M$ for all T.

For a sequence of random or stochastic variables, 'order in probability' is the relevant concept. Let $\{X_T\}$ be a sequence of random variables with $\{g_T\}$ as above. Then

3. The sequence $\{X_T\}$ converges in probability to the random variable X, denoted either $X_T \xrightarrow{p} X$ or $\operatorname{plim} X_T = X$, if, for every $\varepsilon > 0$, $\lim_{T \to \infty} \Pr\{|X_T - X| > \varepsilon\} = 0$. The probability limit of $X_T$ is X.

4. $X_T$ is of smaller order in probability than $g_T$, denoted $X_T = o_p(g_T)$, if $\operatorname{plim} X_T/g_T = 0$.

5. $X_T$ is at most of order in probability $g_T$, denoted $X_T = O_p(g_T)$, if, for every $\varepsilon > 0$, there exists a positive real number $M_\varepsilon$ such that

$$\Pr\{|X_T| \geq M_\varepsilon g_T\} \leq \varepsilon \quad \text{for all } T.$$

Two important points should be noted.
First, the distinction between the little-o and big-O concepts of convergence may be understood intuitively by thinking of the former as scaling a random variable such that the scaled variable tends to zero in the limit; for the latter, all that is required is that the scaled variable remains bounded by a finite interval of the real line. In a trivial case, say $X_T$ is $o_p(1)$. (Here the sequence $\{g_T\}$ is a degenerate sequence of 1s.) That is, unscaled, $X_T \xrightarrow{p} 0$. Then it is certainly true that $X_T$ is $O_p(1)$. The converse is not true in general.
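The difference between the two orders can be seen in a simple simulated case. The sketch below is an illustration under assumptions not made in the text: the observations are independent standard normal draws, and the seed and sample sizes are arbitrary. The sample mean is then $o_p(1)$, while the mean scaled by $T^{1/2}$ is $O_p(1)$ but not $o_p(1)$.

```python
import numpy as np

rng = np.random.default_rng(42)   # assumed seed

results = {}
for T in (100, 10_000, 1_000_000):
    x = rng.standard_normal(T)    # assumed i.i.d. N(0, 1) observations
    xbar = x.mean()
    # Store the unscaled mean and the mean scaled by sqrt(T).
    results[T] = (xbar, np.sqrt(T) * xbar)

for T, (xbar, scaled) in results.items():
    print(f"T={T:>9}  mean={xbar: .5f}  sqrt(T)*mean={scaled: .3f}")
```

The unscaled mean shrinks towards zero as T grows, whereas the scaled mean keeps fluctuating within a bounded range and does not settle at zero. This is an instance of a sequence that is $O_p(1)$ without being $o_p(1)$.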
The second point concerns the specific use made of these convergence concepts in this book. The sequence $\{X_T\}$ will in general be a sequence of estimators. The sequence of ordinary least squares (OLS) estimators in a regression model is a good example. The estimator $\hat{\beta}_T$ is derived from a sample of size T, where in time-series analysis T denotes time. A sample of size T is therefore composed of observations on a set of variables for T time periods, usually denoted $t = 1, 2, \ldots, T$. Thus, $\hat{\beta}_T \xrightarrow{p} \beta$ if and only if $\lim_{T \to \infty} \Pr\{|\hat{\beta}_T - \beta| > \varepsilon\} = 0$.

The corresponding sequence $\{g_T\}$ is usually a power function of time. Thus, for OLS estimators when the variables are stationary, $g_T = T^{-1/2}$ and $\{\hat{\beta}_T - \beta\} = O_p(T^{-1/2})$. In an alternative terminology, we often say that $\hat{\beta}$ tends to $\beta$ at rate $T^{1/2}$. If the variables are integrated, $g_T = T^{-1}$ or larger, which is the case of super- (or faster than $T^{1/2}$) convergence.

The lemmata given below are often useful in determining orders in magnitude and in probability of functions (sums, differences, products, and quotients) of random variables (see Fuller 1976, and White 1984).

LEMMA 1. Let $\{a_T\}$, $\{b_T\}$ be sequences of real numbers. Let $\{f_T\}$ and $\{g_T\}$ be sequences of positive real numbers.
A second type of convergence is the convergence of a sequence of distribution functions to a limit function. Important examples of such convergence are central limit theorems, where a sequence of distribution functions converges point-wise to the normal distribution function. The appendix to this chapter uses the Liapunov central limit theorem to derive the asymptotic distribution of a scaled function of the sample mean.

6. If $\{X_T\}$ is a sequence of random variables with distribution functions $\{F_{X_T}(x)\}$, then $\{X_T\}$ is said to converge in distribution to the random variable X with distribution function $F_X(x)$, denoted $X_T \xrightarrow{d} X$, if $\lim_{T \to \infty} F_{X_T}(x) = F_X(x)$ at all points of continuity x.

Finally, convergence in probability implies convergence in distribution. Thus,

7. Let $\{X_T\}$ be a sequence of random variables. If there exists a random variable X such that $\operatorname{plim} X_T = X$, then $X_T \xrightarrow{d} X$.

1.5.3. Ergodicity and Mixing Processes

The following definitions are based on Davidson and MacKinnon (1992), Spanos (1986), and White (1984), which readers can consult for further details.

Ergodicity, uniform mixing, and strong mixing are three types of asymptotic independence, implying that two realizations of a time series become ever closer to independence as the distance between them increases. Generically, a stochastic process $\{y_t\}$ is defined as asymptotically independent if
as $h \to \infty$; that is, the joint distribution function of the two sub-sequences of $\{y_t\}$ approaches the product of the distributions of each of
the sub-sequences as the distance between the sub-sequences increases without bound.

A process $\{y_t\}$ is defined as ergodic if it is stationary and if, for any t,

$$\lim_{T \to \infty} \frac{1}{T} \sum_{\tau=1}^{T} \operatorname{cov}(y_t, y_{t+\tau}) = 0.$$
A sufficient but not necessary condition for this to hold is that $\operatorname{cov}(y_t, y_{t+\tau}) \to 0$ as $\tau \to \infty$. Thus ergodicity is a weak form of average asymptotic independence, and usually we will assume that stronger conditions hold which imply ergodicity.

If two events A and B are independent, then the quantities $P(A \mid B) - P(A)$ and $P(A \cap B) - P(A)P(B)$ are both equal to zero, where $P(A \mid B)$ is the conditional probability of A given B and $P(A \cap B)$ is the joint probability of A and B. The concepts of uniform mixing and strong mixing are based on these two quantities respectively, and require that expressions having these forms be equal to zero asymptotically. Uniform mixing and strong mixing are also called $\phi$-mixing and $\alpha$-mixing, after the sequences of numbers $\{\phi_n\}$ and $\{\alpha_n\}$ used in defining them. Uniform mixing implies strong mixing, and for a stationary process either of these implies ergodicity.

Begin by defining the bounded mappings $G_1(y_i, \ldots, y_{i+h})$ and $G_2(y_i, \ldots, y_{i+k})$ onto the real line. Then the sequence $\{y_t\}$ is defined as $\phi$-mixing if there exists a sequence $\{\phi_n\}$, with $\phi_n > 0$ $\forall n$, where $\phi_n \to 0$ as $n \to \infty$, such that for $n > h$
The sequence $\{y_t\}$ is defined as $\alpha$-mixing if there exists a sequence $\{\alpha_n\}$ with $\alpha_n > 0$ $\forall n$ and where $\alpha_n \to 0$ as $n \to \infty$, such that
1.5.4. Exogeneity

While our primary focus is on integrated series and the problems they imply for standard econometric analyses, rather than on the problems created by a failure of exogeneity (in the appropriate sense), it will be important to consider exogeneity at several points.

Econometric analysis often proceeds on the basis of a single-equation model of a process of interest. Implicitly, we assume that knowledge of the processes generating the explanatory variables would carry no information relevant to the parameters of interest. As Engle, Hendry,
and Richard (1983) indicate, concepts of exogeneity relate to the circumstances in which this assumption is valid. Rather than refer to particular variables as exogenous in general, Engle et al. refer to a variable as exogenous with respect to a particular parameter if knowledge of the process generating the exogenous variable contains no information about that parameter.

The three different concepts introduced by Engle et al. are called weak, strong, and super exogeneity and correspond to three different ways in which a parameter estimate may be used: inference, forecasting conditional on forecasts of the exogenous variables, and policy analysis. These different uses require that different conditions must be met for exogeneity to hold. These conditions can be examined with the following definitions.

Let $\mathbf{x}_t = (y_t, z_t)'$ be generated by the process with conditional density function $D(\mathbf{x}_t \mid \mathbf{X}_{t-1}, \lambda)$, where $\mathbf{X}_{t-1}$ denotes the history of the variable $\mathbf{x}$: $\mathbf{X}_{t-1} = (\mathbf{x}_{t-1}, \mathbf{x}_{t-2}, \ldots, \mathbf{x}_0)$. Let the parameters $\lambda \in \Lambda$ be partitioned into $(\lambda_1, \lambda_2)$ to support the factorization

$$D(\mathbf{x}_t \mid \mathbf{X}_{t-1}, \lambda) = D(y_t \mid z_t, \mathbf{X}_{t-1}, \lambda_1)\, D(z_t \mid \mathbf{X}_{t-1}, \lambda_2).$$

Then $[(y_t \mid z_t; \lambda_1), (z_t; \lambda_2)]$ operates a sequential cut on $D(\mathbf{x}_t \mid \mathbf{X}_{t-1}, \lambda)$ if and only if $\lambda_1$ and $\lambda_2$ are variation free; that is, if and only if

$$(\lambda_1, \lambda_2) \in \Lambda_1 \times \Lambda_2,$$

so that the parameter space $\Lambda$ is the direct product of $\Lambda_1$ and $\Lambda_2$. In other words, for any values of $\lambda_1$ and $\lambda_2$, admissible values of the parameters $\lambda$ of the joint distribution can be recovered. The essential element of weak exogeneity is that the marginal distribution contains no information relevant to $\lambda_1$ (for an exposition, see Ericsson 1992).
Weak exogeneity: $z_t$ is weakly exogenous for a set of parameters of interest $\psi$ if and only if there exists a partition $(\lambda_1, \lambda_2)$ of $\lambda$ such that (i) $\psi$ is a function of $\lambda_1$ alone, and (ii) $[(y_t \mid z_t; \lambda_1), (z_t; \lambda_2)]$ operates a sequential cut.

Strong exogeneity: $z_t$ is strongly exogenous for $\psi$ if and only if $z_t$ is weakly exogenous for $\psi$ and

$$D(z_t \mid \mathbf{X}_{t-1}, \lambda_2) = D(z_t \mid \mathbf{Z}_{t-1}, \mathbf{Y}_0, \lambda_2),$$

so that y does not Granger-cause z.

Super exogeneity: $z_t$ is super exogenous for $\psi$ if and only if $z_t$ is weakly exogenous for $\psi$ and $\lambda_1$ is invariant to interventions affecting $\lambda_2$.

Weak exogeneity ensures that there is no loss of information about parameters of interest from analysing only the conditional distribution; a variable $z_t$ is weakly exogenous for a set of parameters $\psi$ if inference concerning $\psi$ can be made conditional on $z_t$ with no loss of information relative to that which could be obtained using the joint density of $y_t$ and
$z_t$. Strong exogeneity is necessary for multi-step forecasting which proceeds by forecasting future zs and then forecasting ys conditional on those zs. Super exogeneity sustains policy analysis on $\lambda_1$ when the marginal distribution of $z_t$ is altered.

Engle et al. contrast these three types of exogeneity with the traditional concepts of strict exogeneity and pre-determinedness. If $u_t$ is the error term in a model, then $z_t$ is said to be strictly exogenous if $E[z_t u_{t+i}] = 0$ $\forall i$, whereas $z_t$ is said to be predetermined if $E[z_t u_{t+i}] = 0$ $\forall i \geq 0$. Engle et al. show that the latter concepts are neither necessary nor sufficient for valid inference since neither relates to parameters of interest.

The following example (from Engle et al. 1983) seeks to clarify these concepts. Consider the DGP:

$$y_t = \beta z_t + \varepsilon_{1t},$$
$$z_t = \delta_1 z_{t-1} + \delta_2 y_{t-1} + \varepsilon_{2t},$$
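with

$$\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} \sim \mathrm{IN}\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right].$$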
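The parameters $(\beta, \delta_1, \delta_2, \sigma_{11}, \sigma_{12}, \sigma_{22})$ are assumed to be variation free beyond the requirements that ensure the error covariance matrix is positive definite. $\beta$ is the parameter of interest. The reduced-form equation for $y_t$ is

$$y_t = \beta\delta_1 z_{t-1} + \beta\delta_2 y_{t-1} + v_t,$$

with $v_t = \varepsilon_{1t} + \beta\varepsilon_{2t}$, and

$$E[y_t \mid z_t, Y_{t-1}, Z_{t-1}] = b z_t + c_1 z_{t-1} + c_2 y_{t-1},$$

where $b = \beta + \sigma_{12}/\sigma_{22}$ and $c_i = -\delta_i \sigma_{12}/\sigma_{22}$ ($i = 1, 2$). The corresponding conditional variance is

$$\operatorname{Var}[y_t \mid z_t, Y_{t-1}, Z_{t-1}] = \sigma^2 = \sigma_{11} - \sigma_{12}^2/\sigma_{22}.$$

Consider now the regression model

$$y_t = b z_t + c_1 z_{t-1} + c_2 y_{t-1} + u_t.$$

If $\sigma_{12} = 0$, by substituting this value into the expressions for $b$, $c_i$, and $\sigma^2$, we see that $E[y_t \mid z_t, Y_{t-1}, Z_{t-1}] = \beta z_t$ and that the conditional variance is $\sigma_{11}$. Also, $E[z_t \mid Z_{t-1}, Y_{t-1}] = \delta_1 z_{t-1} + \delta_2 y_{t-1}$ with variance $\sigma_{22}$. Hence the conditional density of $(y_t, z_t)$ factorizes as in our earlier definition of a sequential cut, so that $(\beta, \sigma_{11})$ and $(\delta_1, \delta_2, \sigma_{22})$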
correspond to $\lambda_1$ and $\lambda_2$ respectively. Since the parameter $\beta$ is (trivially) a function of $\lambda_1$ only, $z_t$ is weakly exogenous for $\beta$. In the context of the regression model, this shows up in the fact that $\beta$ can be derived from the parameters of this regression, without knowledge of the parameters of the process generating $z_t$. Here, in fact, $\sigma_{12} = 0$ implies $b = \beta$. If $\sigma_{12} \neq 0$, then $\beta$ cannot be obtained from the parameters $b$, $c_1$, $c_2$, and $\sigma^2$ of the regression model (or of the conditional distribution).
As long as $\delta_2 \neq 0$, lagged $y$s affect $z$ and so $z$ is neither strongly nor strictly exogenous for $\beta$ irrespective of its weak exogeneity status. When $\sigma_{12} \neq 0$, and $\beta$ is an invariant parameter, changes in the parameters $(\delta_1, \delta_2, \sigma_{22})$ determining the marginal process affect the parameters of the conditional process, $b$ and $c_i$. Thus, the failure of $z_t$ to be weakly exogenous can lead to a failure of constancy in the conditional model when the marginal process changes. Conversely, when $\sigma_{12} = 0$ and $(\beta, \sigma_{11})$ are invariant to changes in $(\delta_1, \delta_2, \sigma_{22})$, $z_t$ is super exogenous for $\beta$. This holds even when $\delta_2 \neq 0$ and $z_t$ is not strongly exogenous for $\beta$.

1.5.5. Functions of Deterministic Trends
Various sums of powers of trends appear regularly in the derivations in this book and so it is convenient to record the most common of these here:
$$\sum_{t=1}^{T} t = \tfrac{1}{2}T(T+1), \qquad \sum_{t=1}^{T} t^2 = \tfrac{1}{6}T(T+1)(2T+1),$$
$$\sum_{t=1}^{T} t^3 = \tfrac{1}{4}T^2(T+1)^2, \qquad \sum_{t=1}^{T} t^4 = \tfrac{1}{30}T(T+1)(2T+1)(3T^2+3T-1).$$
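In each case, we also have that
$$T^{-(j+1)} \sum_{t=1}^{T} t^j \to \frac{1}{j+1}.$$
These formulae are well known and easy, if tedious, to establish, and can be checked by induction. Let
$$\sum_{t=1}^{T} t^j = a_1 T + a_2 T^2 + \dots + a_{j+1} T^{j+1}.$$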
Then for $T = 1, 2, 3, \dots, j+1$, solve the resulting simultaneous system. For example, when $j = 1$, $\sum_{t=1}^{T} t = a_1 T + a_2 T^2$. At $T = 1$, $1 = a_1 + a_2$, while at $T = 2$, $3 = 2a_1 + 4a_2$. Solving for $a_1$ and $a_2$ gives $a_1 = 1/2$ and $a_2 = 1/2$, so that
$$\sum_{t=1}^{T} t = \tfrac{1}{2}T + \tfrac{1}{2}T^2 = \tfrac{1}{2}T(T+1).$$
Also, $T^{-2}[(1/2)T + (1/2)T^2] \to 1/2$. The polynomial that is fitted to $\sum_{t=1}^{T} t^j$ is assumed to be of order $j + 1$. To show that this is the correct order to use, consider what would happen if $\sum_{t=1}^{T} t$ had been set equal to a third-order polynomial $a_1 T + a_2 T^2 + a_3 T^3$. Solving as above for $a_1$, $a_2$, and $a_3$ would yield $a_1 = a_2 = 1/2$ (as before) with $a_3 = 0$.
We can summarize some of the relations above as follows. Let $A_1 = (1/2)T(T+1)$, $A_2 = (1/3)(2T+1)$, and $A_3 = (1/5)[3T(T+1) - 1]$. Then
$$\sum_{t=1}^{T} t = A_1, \qquad \sum_{t=1}^{T} t^2 = A_1 A_2, \qquad \sum_{t=1}^{T} t^3 = A_1^2, \qquad \sum_{t=1}^{T} t^4 = A_1 A_2 A_3.$$
These sums are of orders $O(T^2)$, $O(T^3)$, $O(T^4)$, and $O(T^5)$ respectively, because $A_1 = O(T^2)$, $A_2 = O(T)$, and $A_3 = O(T^2)$.
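The closed forms above can be checked numerically. The following is a minimal sketch using exact rational arithmetic from the Python standard library; the function names `power_sum` and `closed_forms` are illustrative and do not come from the text.

```python
from fractions import Fraction

def power_sum(T, j):
    """Brute-force sum of t**j for t = 1, ..., T."""
    return sum(t ** j for t in range(1, T + 1))

def closed_forms(T):
    """Closed forms for j = 1..4, built from A1, A2, A3 as defined in the text."""
    A1 = Fraction(T * (T + 1), 2)
    A2 = Fraction(2 * T + 1, 3)
    A3 = Fraction(3 * T * (T + 1) - 1, 5)
    return {1: A1, 2: A1 * A2, 3: A1 ** 2, 4: A1 * A2 * A3}

# The closed forms agree exactly with the brute-force sums.
for T in (1, 2, 10, 1000):
    forms = closed_forms(T)
    for j in range(1, 5):
        assert forms[j] == power_sum(T, j)

# The scaled sums T^-(j+1) * sum(t^j) approach 1/(j+1) as T grows.
T = 10_000
ratios = {j: float(closed_forms(T)[j] / T ** (j + 1)) for j in range(1, 5)}
print(ratios)
```

For large $T$ the printed ratios should be close to 1/2, 1/3, 1/4, and 1/5, which is the limiting behaviour recorded above.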
1.5.6. Wiener Processes
Wiener, or Brownian motion, processes are used in exploring the properties of statistics involving integrated data. We begin our discussion by constructing an integrated process, and then map a transformation of it into a Wiener process.
Let $\{x_t\}$ be a normally distributed, zero-mean, unit variance, stationary, and ergodic martingale difference sequence, so that $x_t \sim \mathsf{IN}(0, 1)$, and let
$$S_T = \sum_{t=1}^{T} x_t$$
and
$$S_0 = 0.$$
Thus, $S_T \sim N(0, T)$ and is an I(1) process with independent increments. Specifically, $S_T$ is a random walk: $S_T = S_{T-1} + x_T$. More generally,
when the distribution of $\{x_t\}$ is well behaved, the limiting distribution of a suitably standardized $S_T$ will also be well behaved.
The analysis of regressions with integrated series uses the concept of limit theorems in function spaces known as functional limit theorems. These are also called invariance principles because the same form of limiting distribution results for a wide range of processes $\{x_t\}$, having different degrees of heterogeneity and memory (see Phillips 1987a).
Figure 1.1 illustrates a sample realization of a random walk $S_T$ for $T = 10$. We will describe the various stages of analysis in terms of this example. We first consider convergence of the transformation $S_T/\sqrt{T}$ from (2) to a continuous Wiener process denoted by $W(r)$ for $r \in [0, 1]$.
A Wiener process is like a continuous random walk defined on the interval $[0, 1]$ (regard this as the horizontal axis), but has unbounded variation despite being continuous, and so can be imagined as moving extremely erratically in the vertical direction. In any sub-interval $[a, b]$ of $[0, 1]$, $W(r)$ for $r \in [a, b]$ remains equally erratic. In general, a continuous process $V(t)$, $t \geq 0$, is a Wiener process if (i) for all $t \geq 0$, $E[V(t)] = 0$; (ii) for all fixed $t > 0$, $V(t)$ is normally distributed and non-degenerate; (iii) $V(t)$ has independent increments; and (iv) $\Pr\{V(0) = 0\} = 1$. A Wiener process may be thought of as the limit of a discrete-time random walk as the interval between realizations goes to zero. Its derivative is a continuous-time normally distributed white-noise process, which is an abstraction, not a physically realizable process. Nonetheless, the limiting distributions described by the Wiener process may be useful approximations in many circumstances.
FIG 1.1. Realization of a random walk over 10 points

There are few convenient analytical expressions for functions of the distribution of $W(r)$, $r \in [0, 1]$, although as we have noted
$W(r) \sim N(0, r)$ for fixed $r$, and $W(r)$ has independent increments. Various functions of $W(r)$ have been tabulated, usually by simulation.
The following formulation maps the increasing interval from 0 to $T$ into the fixed interval $[0, 1]$ so that results will be invariant to the actual value of $T$. To do so, we construct from $S_T$ a new random step function $R_T(r)$ as follows. Let $[rT]$ denote the integer part of $rT$, where $r \in [0, 1]$. For example, if $T = 100$ and $r = 0.101$, then $[rT] = [10.1] = 10$. Divide the interval $[0, 1]$ into $T + 1$ parts at $0, 1/T, 2/T, \dots, 1$, and let
$$R_T(r) = S_{[rT]}/\sqrt{T}.$$
For example, $R_{100}(0.101) = S_{10}/10$ whereas $R_{100}(0.11) = S_{11}/10$. Thus, $R_T(r)$ is constant for values of $r$ between jumps, which occur where $rT$ passes successive integers, and is a right-continuous random variable defined over $[0, 1]$. Figure 1.2 shows this second-stage mapping, leading to the step function graph of $R_T(r)$ in Fig. 1.3. As $T \to \infty$, $R_T(r)$ becomes increasingly dense on $[0, 1]$. Figures 1.4 and 1.5 show this happening for $T = 100$ and $T = 1000$. The horizontal axis length is fixed, so the vertical axis variability increases as $T$ grows.
Let $\Rightarrow$ denote weak convergence in the sense that the probability measures converge: this is the analogue, for function spaces, of convergence in distribution for random variables (see Hall and Heyde 1980). Then, under weak assumptions about $\{x_t\}$,
$$R_T(r) \Rightarrow W(r). \qquad (4)$$
Furthermore, if $f(\cdot)$ is a continuous functional on $[0, 1]$, then
$$f(R_T(r)) \Rightarrow f(W(r)). \qquad (5)$$
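The mapping from a random walk to the step function on $[0, 1]$ can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `random_walk` and `R`, the seed, and the small floating-point tolerance are choices made here, not part of the text.

```python
import math
import random

def random_walk(T, seed=0):
    """Return [S_0, S_1, ..., S_T] with S_0 = 0 and IN(0, 1) increments."""
    rng = random.Random(seed)
    S = [0.0]
    for _ in range(T):
        S.append(S[-1] + rng.gauss(0.0, 1.0))
    return S

def R(S, r):
    """Step function R_T(r) = S_[rT] / sqrt(T) for r in [0, 1]."""
    T = len(S) - 1
    # The tiny tolerance guards against floating-point products that land
    # just below an integer (an implementation detail, not in the text).
    k = math.floor(r * T + 1e-9)
    return S[k] / math.sqrt(T)

S = random_walk(100)
print(R(S, 0.101), S[10] / 10)  # both equal S_10 / 10
print(R(S, 0.11), S[11] / 10)   # both equal S_11 / 10
```

With $T = 100$ the two printed pairs coincide, matching the worked values $R_{100}(0.101) = S_{10}/10$ and $R_{100}(0.11) = S_{11}/10$.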
FIG 1.2. Mapping the 10-point graph on to a step function

FIG 1.3. Step representation of a random walk over 10 points
FIG 1.4. Step representation of a random walk over 100 points

FIG 1.5. Step representation of a random walk over 1000 points

For further details, see Billingsley (1968), Dickey and Fuller (1979, 1981), Hall and Heyde (1980), and Phillips (1986, 1987a).
In distributions involving I(1) variables, functionals of Wiener processes arise quite generally, whereas conventional methods of obtaining limiting distributions tend to be specific to the assumptions made about the data or error process.3 Also, many of the statistics regularly used in empirical research involving I(1) time series have different distributions from those that arise with I(0) data. In particular, many statistics in I(1) processes do not converge to constants, as in the I(0) case, but instead converge to random variables. Thus, different critical values may be required for tests, depending on the degree of integration of the time series.

3 By this we mean that only weak restrictions need to be satisfied by the $\{x_t\}$ sequence for convergence results such as (4) and (5) to hold. Phillips (1987a) provides a good account of this issue, and a discussion is also contained in Ch. 3.

Consider the random walk, $y_t = y_{t-1} + e_t$, with $e_t \sim \mathsf{IN}(0, 1)$ and $y_0 = 0$. Then
Alternatively, from (7),
Similarly, $\mathrm{corr}^2(y_t, y_{t-k})$ has a numerator of $(t - k)^2$ and a denominator of $t(t - k)$ for $k > 0$, and so equals $1 - k/t$. When $k < 0$, let $s = t - k$ so that $t = s + k$, and let $r = -k > 0$, in which case
Since $y_0 = 0$, we have that
The las t approximatio n use s To illustrat e th e us e o f Wiene r processe s i n derivin g distribution s involving 1(1 ) variables , w e wil l deriv e th e limitin g distributio n o f the sampl e mean , y = T~ l Xf= iJ V Becaus e {y,} i s a rando m walk , its mean converge s t o a functiona l o f a Wiene r process . Le t RT(r) = y^n/V r = y^/Vr fo r ( i - l)/T = £ r < i/T ( i = 1, . . ., T) , and Rr(l) = yr/VT. Rj(r} i s a ste p functio n wit h step s a t i/T, fo r z' = 1 , . . ., T , an d i s constant betwee n steps . Thus,
The last expression is $\bar{y}_{-1}/\sqrt{T}$, where $\bar{y}_{-1} = T^{-1} \sum_{i=1}^{T} y_{i-1}$ is the lagged mean. This result uses the fact that, for any constant $c$,
$$\int_{(i-1)/T}^{i/T} c\,dr = \frac{c}{T}.$$
From (3) and (4),
$$\int_0^1 R_T(r)\,dr \Rightarrow \int_0^1 W(r)\,dr$$
and hence
$$T^{-1/2}\,\bar{y}_{-1} \Rightarrow \int_0^1 W(r)\,dr. \qquad (10)$$
The unlagged sample mean has the same limiting distribution.
An interesting aspect of (10) is that the Lindeberg-Feller central limit theorem4 (which applies to independent but heterogeneously distributed observations; see White 1984) can be applied to obtain the distribution of $\bar{y}$ and hence show that
$$\int_0^1 W(r)\,dr \sim N(0, 1/3). \qquad (11)$$
Thus, some functionals of Wiener processes are familiar random variables in disguise and we will develop this aspect as we proceed. A proof of (11) is given in the Appendix.

4 Strictly speaking, the version we use here is a special case of this theorem, sometimes called the Liapunov central limit theorem.

1.5.7. Monte Carlo Simulation
The purpose of Monte Carlo simulation is to evaluate by experiment quantities that would be very difficult or impossible to evaluate analytically. Such experiments typically begin by creating a set of data with known statistical properties. This is achieved by specifying every aspect of a data-generating process, or class of such processes, and replacing the random errors of the DGP by pseudo-random numbers. Pseudo-random numbers are numbers generated deterministically to mimic a random process with a particular distribution. An investigator typically generates a large number of such artificial data sets (called replications) to investigate statistical techniques which analyse these data as if the process generating them were not known. The performance of the statistical technique in revealing some characteristic of the data set may
then be evaluated by generating its distribution from independent replications of the experiment and comparing the results with the known characteristics of the process generating the data.
For example, an econometrician may wish to examine the performance of the standard t-test in data generated by a random walk. Artificial data sets following a random walk may easily be constructed using pseudo-random disturbances, and the empirical distribution of the t-statistic in samples of size $T$ can be generated by replicating $N$ sets of $T$ observations. The mean, variance, or various critical values of the t-statistic can be calculated from the empirical distribution and, for sufficiently large $N$, will be close to their population (i.e. analytic) counterparts. The investigator can also vary the parameters of the DGP in order to observe their effects on the outcome. In each experiment, the investigator knows the true parameters of the process, and so can evaluate the estimators and tests used.
Unlike analytical studies, Monte Carlo simulations cannot produce exact results; any result from a Monte Carlo experiment comes from a (pseudo-)random sample, and therefore has some variability attached to it. Moreover, Monte Carlo experiments are inevitably specific to the particular data generation processes examined (although it may be possible to prove analytically that results will be invariant to certain parameters in the process). Nonetheless, Monte Carlo results are useful when analytical results are difficult to obtain. In particular, Monte Carlo experiments are often used to investigate the finite-sample performance of statistical techniques, the analytical properties of which are known only asymptotically.
There are a number of subtleties to the design and interpretation of Monte Carlo experiments which demand careful attention, including the methods used to generate pseudo-random numbers, variance-reduction methods such as common random numbers, antithetic random numbers and control variates intended to improve precision, the calculation of standard errors of the experimental estimates of unknown quantities, the use of response surfaces to summarize and interpolate results, and recursive updating of quantities of interest. Expositions of Monte Carlo methods may be found in, for example, Hammersley and Handscomb (1964), Hendry (1984), Ripley (1987), Hendry, Neale, and Ericsson (1990), and Davidson and MacKinnon (1992).
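As a small illustration of the Monte Carlo method described in this section, the sketch below replicates a random walk many times and examines the empirical distribution of the scaled sample mean $\bar{y}/\sqrt{T}$, whose limiting variance is stated earlier in this chapter to be 1/3. The replication count, seed, and function names are assumptions made here for illustration; the finite-sample variance $(T+1)(2T+1)/(6T^2)$ used for comparison is a direct calculation that is not given in the text.

```python
import math
import random
import statistics

def scaled_mean_of_random_walk(T, rng):
    """One replication: y_t = y_{t-1} + e_t with y_0 = 0; returns ybar / sqrt(T)."""
    y = 0.0
    total = 0.0
    for _ in range(T):
        y += rng.gauss(0.0, 1.0)
        total += y
    return (total / T) / math.sqrt(T)

def monte_carlo(T=100, N=4000, seed=12345):
    """Return the empirical mean, variance, and Monte Carlo standard error of the mean."""
    rng = random.Random(seed)
    draws = [scaled_mean_of_random_walk(T, rng) for _ in range(N)]
    mean = statistics.fmean(draws)
    var = statistics.pvariance(draws)
    mc_se = math.sqrt(var / N)  # sampling uncertainty attached to the estimated mean
    return mean, var, mc_se

mean, var, mc_se = monte_carlo()
exact_var = (100 + 1) * (2 * 100 + 1) / (6 * 100 ** 2)  # finite-sample value for T = 100
print(f"mean={mean:.4f}  var={var:.4f}  exact={exact_var:.4f}  limit={1/3:.4f}")
```

Because the result comes from a pseudo-random sample, the estimated variance will differ from the exact value by an amount that shrinks as the number of replications grows; reporting `mc_se` alongside the estimate is one simple way of recording that imprecision.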
1.6. Data Representation and Transformations
Since data transformations play an important role in econometrics generally, we briefly consider their impact on I(1) data. Consider the hypothesis that a set of integrated data can be described by a linear
model with a constant error variance. In particular, a normally distributed random walk with drift is often postulated so that $\Delta x_t \sim \mathsf{IN}(\mu, \sigma^2)$. Many economic time series (such as consumption, national income and expenditure, or the price level) do grow over time, but the amount by which they grow in each period also tends to rise. However, $\Delta x_t = x_t - x_{t-1}$ will be stationary only if the absolute amount of growth is stationary, in which case for $\mu > 0$, $\mu/x_t$ will tend to zero. Percentage growth, by contrast, often displays no obvious tendency to rise or fall, making it a more likely candidate for stationarity. Since the levels of many economic variables are initially positive, and recalling that
$$\Delta \log x_t = \log(x_t/x_{t-1}) = \log(1 + \Delta x_t/x_{t-1}) \approx \Delta x_t/x_{t-1},$$
we see that stationarity of the rate of growth implies stationarity of $\Delta \log(x_t)$. Changes in the logarithms of economic data series such as those just mentioned, therefore, seem more likely to be stationary than changes in the levels. We will return to this point in Chapter 6 below, where we consider how co-integration is affected by the logarithmic transformation. We illustrate some of these points with actual data series.
The time series that we analyse is real net national product ($Y$, in 1929 £million) for the United Kingdom over 1872-1975. The data are taken from Friedman and Schwartz (1982) and are also investigated in Hendry and Ericsson (1991a). Figures 1.6-1.9 plot this data series and various transformations of it. Figure 1.6 plots the untransformed series $Y_t$; the series is tending to grow by increasing amounts, and so would be better approximated by a convex function than by a straight line. This is visible from the upward curvature and the much closer fit of the quadratic trend line compared with the linear trend. In Fig. 1.7, we plot the logarithm of the series: the curvature is no longer apparent, and the quadratic and linear trends are very similar and fit about equally well. Thus, the logarithm of the series is relatively well approximated by a straight line and, while growing, there is no evident tendency for the growth rate to change over time.

FIG 1.6. UK real net national product ($Y$ in 1929 £million), 1872-1975

FIG 1.7. Logarithm ($\log Y$) of UK real net national product

Figure 1.8 plots the changes, $\Delta Y_t$. There is a tendency for both the mean and the variance to grow over time, and the linear trend shown highlights the former. (It requires more careful inspection to see the latter owing to the very large shock in 1919-20.) Differencing the initial series has therefore not produced a stationary series. In Fig. 1.9, however, where $\Delta \log Y_t$ is plotted, there is no longer any major change in the mean or variability of the series over the sample, with perhaps a slight tendency for the variance to be smaller in the period since 1945. Certainly, any trend in the mean of $\Delta \log Y_t$ is negligible. This series, then, may well be stationary, although neither the logarithmic transformation nor the first-difference transformation produced a stationary series on its own. Since the differences in the logarithms appear stationary, we might expect to find that the logarithms of the original series are I(1), while the untransformed initial series apparently is not and differencing it is not sufficient to produce stationarity.

FIG 1.8. Changes ($\Delta Y$) in UK real net national product

FIG 1.9. Changes in the logarithm ($\Delta \log Y$) of UK real net national product

Alternatively, any linear model of $\Delta Y_t$ will have an error term, which we denote by $u_t$, with a standard deviation $\sigma_u$ that must be in the same
units as $Y_t$. Since these are 1929 £million, the linear model assumes a constant absolute error standard deviation. However, net national product has grown about six-fold over the sample so that $\sigma_u/Y_t$ (the relative error) will be much smaller in 1975 than in 1875. It would be difficult to imagine reasons for such a decline.
The log-linear model, by way of contrast, assumes a constant relative error standard deviation (e.g. 2½ per cent of $Y_t$ at all points in time), which seems much more plausible. Failing to transform the data adequately violates the statistical model of an I(1) or I(0) series, and can induce trending means and variances, making testing less reliable. Certainly, a relatively long time series is needed to make such factors obvious, but they operate even within post-war quarterly data (see e.g. Ermini and Hendry 1991). Moreover, changes in means and variances over time are very apparent in nominal time series, and can confuse attempts to determine co-integration. Granger and Hallman (1991) analyse general transformations in I(1) time series, and Chapter 4 below explores formal statistical tests of hypotheses about the degree of integration of individual time series.
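The contrast between differencing levels and differencing logarithms can be seen in a deterministic toy example. The sketch below uses an artificial series growing at a constant 2 per cent per period; the series, its starting value, and the helper names are hypothetical and are not the UK data discussed above.

```python
import math

def levels(x0, growth, periods):
    """Deterministic path growing at a constant proportional rate."""
    return [x0 * (1.0 + growth) ** t for t in range(periods + 1)]

def diff(series):
    """First differences of a list."""
    return [b - a for a, b in zip(series, series[1:])]

x = levels(x0=100.0, growth=0.02, periods=100)
dx = diff(x)                            # absolute changes keep rising with the level
dlog = diff([math.log(v) for v in x])   # log changes stay constant at log(1.02)

print(dx[0], dx[-1])
print(dlog[0], dlog[-1], math.log(1.02))
```

The absolute changes grow roughly seven-fold over the hundred periods, while the changes in logarithms are constant, which is the sense in which the logarithmic transformation is the more natural candidate before differencing.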
1.7. Examples: Typical ARMA Processes
Figures 1.10-1.20 present graphs of typical examples of series generated by special cases of ARMA(1,1) processes. For ease of comparison, each series is computer-generated using the same set of 200 observations on normally distributed white-noise errors $\varepsilon_t \sim \mathsf{IN}(0, 1)$ with $u_0 = 0$. The data generation processes are:
Fig. 1.10  $u_t = \varepsilon_t$  [white noise]
Fig. 1.11  $u_t = \varepsilon_t + 0.8\varepsilon_{t-1}$  [MA(1), stationary]
Fig. 1.12  $u_t = \varepsilon_t - 0.8\varepsilon_{t-1}$  [MA(1), stationary]
Fig. 1.13  $u_t = 0.5u_{t-1} + \varepsilon_t$  [AR(1), stationary]
Fig. 1.14  $u_t = 0.5u_{t-1} + \varepsilon_t + 0.8\varepsilon_{t-1}$  [ARMA(1,1), stationary]
Fig. 1.15  $u_t = 0.5u_{t-1} + \varepsilon_t - 0.8\varepsilon_{t-1}$  [ARMA(1,1), stationary]
Fig. 1.16  $u_t = 0.9u_{t-1} + \varepsilon_t$  [AR(1), stationary]
Fig. 1.17  $u_t = 0.9u_{t-1} + \varepsilon_t + 0.8\varepsilon_{t-1}$  [ARMA(1,1), stationary]
Fig. 1.18  $u_t = 0.99u_{t-1} + \varepsilon_t$  [AR(1), stationary]
Fig. 1.19  $u_t = 1.00u_{t-1} + \varepsilon_t$  [AR(1), non-stationary]
Fig. 1.20  $u_t = 1.01u_{t-1} + \varepsilon_t$  [AR(1), non-stationary]
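The eleven designs listed above can be generated from a single set of errors along the following lines. This is a minimal sketch, not the authors' program: the seed, the function name `arma11`, and the start-up convention $\varepsilon_0 = 0$ are assumptions made here.

```python
import random

def arma11(ar, ma, eps, u0=0.0):
    """u_t = ar * u_{t-1} + eps_t + ma * eps_{t-1}, started from u_0.

    eps_0 is taken to be zero, which is an assumption of this sketch.
    """
    u_prev, e_prev = u0, 0.0
    out = []
    for e in eps:
        u = ar * u_prev + e + ma * e_prev
        out.append(u)
        u_prev, e_prev = u, e
    return out

rng = random.Random(2024)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]  # one common set of 200 errors

# (AR coefficient, MA coefficient) for each figure, as in the list above.
designs = {
    "1.10": (0.0, 0.0), "1.11": (0.0, 0.8), "1.12": (0.0, -0.8),
    "1.13": (0.5, 0.0), "1.14": (0.5, 0.8), "1.15": (0.5, -0.8),
    "1.16": (0.9, 0.0), "1.17": (0.9, 0.8), "1.18": (0.99, 0.0),
    "1.19": (1.0, 0.0), "1.20": (1.01, 0.0),
}
series = {fig: arma11(ar, ma, eps) for fig, (ar, ma) in designs.items()}

for fig, s in series.items():
    print(fig, round(min(s), 2), round(max(s), 2))
```

Printing the minimum and maximum of each series gives a rough numerical counterpart to the differing vertical scales of the plots.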
FIG 1.10. AR = 0.0; MA = 0.0
FIG 1.11. AR = 0.0; MA = 0.8
FIG 1.12. AR = 0.0; MA = -0.8
FIG 1.13. AR = 0.5; MA = 0.0
FIG 1.14. AR = 0.5; MA = 0.8
FIG 1.15. AR = 0.5; MA = -0.8
FIG 1.16. AR = 0.9; MA = 0.0
FIG 1.17. AR = 0.9; MA = 0.8
FIG 1.18. AR = 0.99; MA = 0.0
FIG 1.19. AR = 1.00; MA = 0.00
FIG 1.20. AR = 1.01; MA = 0.00
(Each figure plots the generated series against the observation number.)
A process such as that in Fig. 1.19, an AR(1) with a unit root, is a random walk and may also be expressed as ARIMA(0,1,0).
The scales on the graphs in Figs. 1.10-1.20 are not identical; for the non-stationary processes, in particular, the graphs show very wide movements relative to those of the stationary series. Non-stationary processes with roots strictly greater than unity grow very quickly even where those roots are quite close to 1, as can be seen from Fig. 1.20, an AR(1) with a root in the autoregressive part of 1.01. The stationary processes in Figs. 1.10-1.18 have unconditional means of zero and finite unconditional variances. They are 'tied' to this zero mean in the sense that deviations from it cannot accumulate indefinitely. By contrast, the process with a single root of exactly unity (Fig. 1.19) has an unconditional variance which increases over time and will tend to wander widely (see equation (7)) with an unbounded expected crossing time of the origin. The process with a root greater than unity (Fig. 1.20) is explosive and will tend to either $+\infty$ or $-\infty$.
Figures 1.11, 1.14, and 1.17 add a positive MA component to the series in Figs. 1.10, 1.13, and 1.16 respectively, to highlight the 'smoothing' effect of a positive MA term. By contrast, the series in Figs. 1.12 and 1.15 add a negative MA term of the same absolute magnitude; these negative MA terms have the opposite effect, making the series appear less smooth than the pure AR series in Figs. 1.10 and 1.13. Figure 1.15 resembles Fig. 1.10, however, reflecting the fact that the
AR and MA lag polynomials are close to cancelling. (If the AR coefficient were 0.8, then the AR and MA polynomials would each be $(1 - 0.8L)$, and these redundant common factors could be cancelled, leaving white noise as in Fig. 1.10.) In each of the sets Figs. 1.10-1.12, 1.13-1.15, and 1.16-1.17, respectively, the data series plotted have the same AR root, and differ only in their MA parts.
Knowing the generating mechanism, the differences among the ARMA processes given in the figures are fairly clear. In practice, however, it is not easy to solve the converse problem of determining the generating mechanisms from observations on the variables; it may even be difficult to determine from a moderately sized sample whether or not a process is stationary. Although the distinctions among the examples of stationary and non-stationary processes above are substantial, those among 'borderline' stationary and non-stationary processes may not be. For example, $u_t = 0.99u_{t-1} + \varepsilon_t$ is a (borderline) stationary process, but will closely resemble the random walk $u_t = u_{t-1} + \varepsilon_t$ for samples of the size reproduced in Figs. 1.18 and 1.19.
It is interesting to compare the latter two processes by rewriting the AR(1) in MA form. For the process $u_t = \alpha u_{t-1} + \varepsilon_t$, it follows that $u_{t-1} = \alpha u_{t-2} + \varepsilon_{t-1}$ also. Substituting this into the first equation, we have $u_t = \alpha(\alpha u_{t-2} + \varepsilon_{t-1}) + \varepsilon_t$. If we continue to eliminate each subsequent lag of $u$, we find
$$u_t = \alpha^n u_{t-n} + \sum_{i=0}^{n-1} \alpha^i \varepsilon_{t-i}.$$
For the stationary process, $|\alpha| < 1$, so the first term and the contributions of more distant errors disappear as $n \to \infty$, and $u_t$ may be approximated by an MA($n$) process with increasing accuracy as $n \to \infty$. If $\alpha = 1$, however, the first term does not disappear, and the approximation fails; this follows from the failure of the stationarity condition stated above. When $\alpha = 1$,
$$u_t = u_{t-n} + \sum_{i=0}^{n-1} \varepsilon_{t-i},$$
so that $u_t$ is the sum of a starting value, $u_{t-n}$, and all the errors accruing between $t - n + 1$ and $t$. This representation of the process $\{u_t\}$ as a sum of past contributions is the source of the relationship of integration in this time-series sense and integration in the integral calculus, where the integral of a function may be thought of as the limit of a sum of discrete areas under a curve. Figure 1.19 is the cumulative sum, or discrete integral, of the errors recorded in Fig. 1.10.
Many economic time series have been modelled using ARMA or ARIMA processes, and models of these types will be used frequently in
the following chapters in describing the methods and tests. Priestley (1989) provides examples of other types of models that may be used to characterize non-stationary processes.
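The backward substitution used in this section to write an AR(1) as a starting value plus a weighted sum of past errors can be verified numerically. The sketch below assumes the recursion $u_t = \alpha u_{t-1} + \varepsilon_t$ with $u_0 = 0$; the function names, seed, and the chosen values of $t$ and $n$ are illustrative only.

```python
import random

def ar1_path(alpha, eps, u0=0.0):
    """Return [u_0, u_1, ..., u_T] for u_t = alpha * u_{t-1} + eps_t."""
    u = [u0]
    for e in eps:
        u.append(alpha * u[-1] + e)
    return u

def by_substitution(alpha, u, eps, t, n):
    """alpha**n * u_{t-n} + sum over i = 0..n-1 of alpha**i * eps_{t-i}."""
    # eps[k - 1] holds eps_k, so eps_{t-i} is eps[t - i - 1].
    return alpha ** n * u[t - n] + sum(alpha ** i * eps[t - i - 1] for i in range(n))

rng = random.Random(7)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]

for alpha in (0.5, 0.99, 1.0):
    u = ar1_path(alpha, eps)
    print(alpha, u[200], by_substitution(alpha, u, eps, t=200, n=50), alpha ** 50)
```

The two printed values of $u_{200}$ agree for every $\alpha$. The final column shows the weight $\alpha^n$ on the starting value: it is negligible for $\alpha = 0.5$, still sizeable for $\alpha = 0.99$ at $n = 50$, and exactly one for the random walk, where the starting value never drops out.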
1.8. Empirical Time Series: Money, Prices, Output, and Interest Rates
Figure 1.21 graphs the logarithms of quarterly, seasonally adjusted, nominal M1 and prices (the implicit deflator of total final expenditure, TFE) in the UK over the period 1963-89. The series (denoted $\log M_t$ and $\log P_t$) have strong trends and are relatively smooth, although their growth rates alter perceptibly around 1974 and again around 1980. Such data are not unlike realizations from highly autoregressive (I(1)) processes.

FIG 1.21. Time series of money (M1) and prices (implicit deflator of total final expenditure) in the UK, seasonally adjusted, in logs

Figure 1.22 shows their first differences $\Delta \log(M_t)$ and $\Delta \log(P_t)$. These are more erratic but are still highly autocorrelated. The growth rate of $M$ appears to have increased over time, whereas that of $P$ has fallen, especially after 1980. These data do not seem to be stationary although the graphs by themselves do not reveal the source of the non-stationarity.

FIG 1.22. Time series of $\Delta \log M_t$ and $\Delta \log P_t$

Next, Fig. 1.23 shows the behaviour of logs of the real money supply ($\log(M_t/P_t)$) and real TFE ($\log(Y_t)$). It might have been anticipated from Fig. 1.21 that $\log(M_t)$ and $\log(P_t)$ moved sufficiently closely over the whole sample for this differential to be stationary, but Fig. 1.23 shows that the real money supply is non-stationary. The formal apparatus of testing for co-integration developed in Chapter 7 is designed to detect such relationships statistically. By way of contrast, $\log(Y_t)$ looks more like a series with a constant linear trend, subject to perturbations in 1973/4 and 1979/80.

FIG 1.23. Time series of real money ($\log M_t/P_t$) and real TFE ($\log Y_t$)

In economic terms, surprising features of Figs. 1.22-1.23 are the low pairwise correlations between $\Delta \log(M_t)$ and $\Delta \log(P_t)$, and between $\log(M_t/P_t)$ and $\log(Y_t)$, respectively. However, such results have no implications for the existence or otherwise of well defined relationships between these variables. Monetary theory suggests that the opportunity cost of holding money is an important determinant of the demand for money, so Fig. 1.24 shows the time series of the interest rate ($R_t$, a three-month local authority bill rate adjusted for financial innovation) and the rate of inflation, plotted in units that maximize their apparent correlation. The series $\{R_t\}$ also seems to be non-stationary, but with a different time profile from the other series. In particular, it is much less smooth than the other level series, but less erratic than their changes.

FIG 1.24. Time series of a three-month interest rate ($R_t$) and the rate of inflation ($\Delta \log P_t$) in the UK

Finally, Fig. 1.25 shows $\Delta \log(Y_t)$ and $\Delta R_t$. These are possibly weakly stationary, although both appear to have higher variances in the middle of the sample than at the ends. However, neither is highly autocorrelated, nor do they drift noticeably in any direction. We will analyse the four series $\log(M_t)$, $\log(P_t)$, $\log(Y_t)$, and $R_t$ as a system in later chapters. (See Hendry and Ericsson (1991b), who provided the data.)

FIG 1.25. Time series of $\Delta \log Y_t$ and $\Delta R_t$
1.9. Outline of Later Chapters
Chapter 2 discusses dynamic models for stationary processes. This allows us to introduce, in a familiar context, a number of considerations which will prove important later. Various equivalent transformations of linear autoregressive-distributed lag models are considered, especially error correction, Bewley, and Bardsen forms. The role of expectations in stationary processes is also investigated and is related to the absence of weak exogeneity for the parameters of the economic agents' decision functions.
Chapter 3 then considers the analysis of I(1) variables, and explores the concepts of unit roots, non-stationarity, orders of integration, and near integration. The behaviour of least-squares estimators applied to spurious relationships is investigated and a number of results established for Wiener processes (see Phillips 1987a). Univariate tests for unit roots are discussed in Chapter 4, and the formal definitions in Chapter 3 are related to the properties of integrated series. Monte Carlo results illustrate the various distributions. Extensions to multiple unit roots and seasonal data are considered, and several examples are described in detail.
Chapter 5 moves on to the topic of co-integration. Following a bivariate example and formal definitions, the Granger Representation Theorem is described, linking co-integration to error correction, and clarifying the status of other representations such as common trends. The original Engle-Granger two-step estimator of the co-integrating relationship is analysed. Chapter 6 first considers inconsistent regressions sometimes used in orthogonality tests; the analysis then turns to distributions of estimators in dynamic regressions with I(1) data, based on the results in Sims, Stock, and Watson (1990), and is illustrated by a number of examples.
Chapter 7 discusses testing for co-integration. A range of tests is considered, based on testing for a unit root in the residuals from the static regression. While widely used, such tests have drawbacks, and Monte Carlo experiments are used to illustrate some of these. Tests based on single-equation dynamic models are also considered.
Finally, in Chapter 8, co-integration in systems of equations is analysed. Linear co-integrated systems are expressed in error-correction form and maximum likelihood estimation and inference for co-integrating vectors is discussed, focusing on the approach proposed by Johansen (1988). A range of extensions is considered, as are various other estimators. The analysis is again illustrated by a number of examples and simulation experiments.
Appendix
Equation (11)
To prove (11), we need to construct a random variable $X_t$, where
If
then, by the Liapunov central limit theorem,
The proof of (11) is in three steps. First, consider (from (6)) the sample mean:
and
Thus $X_t \sim \mathrm{ID}(0, \sigma_t^2)$, as required. Further, noting that
and using normality of $\varepsilon_t$
and all the conditions of the Liapunov theorem are satisfied. Therefore,
Finally, using the results above, and noting that $\bar{y} = T\bar{X}$
Since $\bar{y}/\sqrt{T} \Rightarrow \int_0^1 W(r)\,dr$ from results above, we have that $\bar{y}/\sqrt{T}$ converges to both $\int_0^1 W(r)\,dr$ and to $N(0, 1/3)$. Therefore
$$\int_0^1 W(r)\,dr \sim N(0, 1/3).$$
The derivations of later results follow similar lines.
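The limiting result can be checked by simulation. The sketch below is only an illustration under stated assumptions: it takes $y_t$ to be a driftless random walk with standard normal increments and $y_0 = 0$, and the function name, sample size, and replication count are hypothetical choices rather than anything specified in the text.

```python
import numpy as np

def scaled_mean_of_random_walk(T, n_rep, seed=0):
    """Draw n_rep values of ybar / sqrt(T) for a driftless random walk.

    Assumption (not from the text): increments are standard normal, y_0 = 0.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_rep, T))
    y = np.cumsum(eps, axis=1)            # y_t = eps_1 + ... + eps_t
    return y.mean(axis=1) / np.sqrt(T)    # equals T^{-3/2} * sum_t y_t

draws = scaled_mean_of_random_walk(T=200, n_rep=20000)
print("sample mean    :", draws.mean())
print("sample variance:", draws.var(), "(limiting value is 1/3)")
```

For finite T the exact variance is $(1 + 1/T)(2 + 1/T)/6$, so the simulated variance sits slightly above 1/3 and approaches it as T grows.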
2
Linear Transformations, Error Correction, and the Long Run in Dynamic Regression
We begin by considering the properties of linear autoregressive-distributed lag (ADL) models for stationary data processes. Transformations of the ADL model to error correction and to various other forms are described. We discuss the estimation of long-run multipliers from dynamic models, and the equivalence of the estimates of these multipliers (and their variances) from any of several different forms. Finally, we consider inference about long-run multipliers where expectational variables are present, and the potential problems are shown to be special cases of the general invalidity of inference when the regressors are not weakly exogenous for parameters of interest.
In later chapters, we will concentrate on the importance of integrated processes for econometric modelling, and in particular on the detection of the stochastic trends embodied in integrated processes, on identifying series that share stochastic trends and therefore satisfy long-run equilibrium relations, and on the implications of such properties for the estimation of economic relationships. Before beginning to explore these concepts, however, there are a number of aspects of the use and specification of dynamic econometric models which can be reviewed without a thorough knowledge of integrated processes, and which will be useful in later discussion. The calculation of the parameters of long-run relationships from estimated models, the interpretation of linear transformations, and the forms of particular models such as the error-correction model are among these topics. The variables used in this chapter may all be treated as being stationary, but readers who are familiar with the concepts examined in later chapters will recognize that the same results apply if the variables are co-integrated.
One simple but fundamental problem that we address is the following: given a variable which in general depends upon its own past and on the values of various exogenous variables, how can we determine the long-run equilibrium relationship between the endogenous variable and the exogenous variables? If an endogenous variable $y_t$ is expressed as a
function only of the value of a set of exogenous variables $z_t$ at the same point in time, the effect of $z_t$ on $y_t$ is immediate and complete; however, if a lag distribution applies to every variable in the model, the long-run effect must be derived as a function of all the lag distributions. Moreover, there are other types of information that can be revealed by a dynamic equation; any of a number of equivalent forms will provide the same information about, say, short-run and long-run adjustment, but different forms of the equation will reveal different types of information conveniently.
We will consider a number of ways in which to estimate long-run multipliers from dynamic regression models, and in doing so will examine several different types of model. After describing the general autoregressive-distributed lag (ADL) model from which the other models are derived, we first concentrate upon the error-correction model, in which the terms representing the extent of deviation from equilibrium are explicitly present in the estimated equation, and which therefore immediately displays information about the adjustment that a process makes to a deviation from some long-run equilibrium.
This chapter will emphasize two important points about linear transformations. First, each of the transformations contains precisely the same information: the estimated values of long-run multipliers, hypothesis test statistics, and explanatory powers of the differently transformed models are all identical. The choice of transformation can be made purely on the basis of convenience, and we will consider which ones are convenient for different purposes. The second point is a corollary of the first, but is worth emphasizing: the estimates of short-run adjustment parameters from the error-correction model do not depend upon the parameter $\theta$, used in defining the error-correction term $y_{t-1} - \theta z_{t-1}$, as long as other levels terms are present to allow for adjustment to the chosen parameter. In particular, a value of unity for $\theta$ may be chosen, leading to what is called 'homogeneity' (an error-correction term of $y_{t-1} - z_{t-1}$), as long as the necessary extra terms are present.
Next, we consider several other transformations of the autoregressive-distributed lag model, due to Bewley (1979) (and discussed by Wickens and Breusch 1988) and Bardsen (1989). Each of these transformations can be related to the error-correction transformation, and we indicate some of the implications of this fact for estimation using one or other of the transformations. Finally, we will discuss some potential difficulties in the estimation of long-run equilibrium relations and their interpretation, following McCallum (1984), Kelly (1985), and Hendry and Neale (1988).
While this chapter deals explicitly with stationary (I(0)) processes, many of the models considered can be used with co-integrated processes as well, as explored in Chapters 5 and 6. In particular, the equivalence of these transformations (in the sense that each form can be derived
from any other by operating linearly on the variables) is relevant when dealing with the Granger Representation Theorem, also discussed in Chapter 5. This equivalence has implications for derivations of the distributions of coefficient estimates in co-integrated systems. In a particular transformation, for example, the variables may all be integrated of order zero, so that the asymptotic theory of stationary processes applies to the distributions of the estimates. Such a parameterization might be convenient for inference, because its information content is identical to that of the original parameterization, if for example that form contained both I(1) and I(0) variables. These issues are considered at length in Chapter 6, and the analysis in this chapter provides useful background for that discussion.
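2.1. Transformations of a Simple Model
Before beginning a general treatment, we consider the first-order linear autoregressive-distributed lag model, denoted ADL(1,1), as an example and derive several linear transformations of it. Each transformation is equivalent in the sense that each implies the same relationship between exogenous and endogenous variables. The ADL(1,1) is
$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t, \tag{1a}$$
where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$ and $|\alpha_1| < 1$ (see Hendry, Pagan, and Sargan 1984).
First consider a static equilibrium defined, as above, as an environment in which all change has ceased, recalling that we are treating $(y_t, x_t)$ as jointly stationary. The long-run values are given by the unconditional expectations of the form $E(y_t)$ in (1a). Defining $y^* = E(y_t)$ and $x^* = E(x_t)$ $\forall t$, we have, since $E(\varepsilon_t) = 0$,
$$y^* = \alpha_0 + \alpha_1 y^* + \beta_0 x^* + \beta_1 x^*,$$
and hence
$$y^* = \frac{\alpha_0 + (\beta_0 + \beta_1)x^*}{1 - \alpha_1},$$
or
$$y^* = k_0 + k_1 x^*, \qquad k_0 = \frac{\alpha_0}{1 - \alpha_1}, \quad k_1 = \frac{\beta_0 + \beta_1}{1 - \alpha_1}.$$
Then $k_1$ is the long-run multiplier of y with respect to x.
Now subtract $y_{t-1}$ from both sides of (1a) and then add and subtract $\beta_0 x_{t-1}$ on the right-hand side to get$^{1}$
$$\Delta y_t = \alpha_0 + (\alpha_1 - 1)y_{t-1} + \beta_0 \Delta x_t + (\beta_0 + \beta_1)x_{t-1} + \varepsilon_t. \tag{1b}$$
Adding and subtracting $(\alpha_1 - 1)x_{t-1}$ on the right-hand side of (1b) then gives
$$\Delta y_t = \alpha_0 + (\alpha_1 - 1)(y_{t-1} - x_{t-1}) + \beta_0 \Delta x_t + (\alpha_1 - 1 + \beta_0 + \beta_1)x_{t-1} + \varepsilon_t. \tag{1c}$$
$^{1}$ Equation (1a) is invariant to such linear transformations which preserve the error process $\{\varepsilon_t\}$.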
Alternatively, we could have added and subtracted $(\beta_0 + \beta_1)x_{t-1}$ on the right side, to get
$$\Delta y_t = \alpha_0 + (\alpha_1 - 1)(y_{t-1} - k_1 x_{t-1}) + \beta_0 \Delta x_t + \varepsilon_t. \tag{1d}$$
All of these equations imply the same relationship, because any one can be derived from another without violating the equality. In equations (1c) and (1d), however, terms representing the discrepancy between $y_{t-1}$ and $x_{t-1}$ or between $y_{t-1}$ and $k_1 x_{t-1}$ appear explicitly; the coefficient (the same for each form) on these terms can be taken as a measure of the speed of adjustment of y to a discrepancy between y and x in the previous period. We examine such error-correction models in detail in the next section.
Equation (1b) is similar to (1c) and (1d) in that the same information appears explicitly as a coefficient; that is, $(\alpha_1 - 1)$ represents the short-run adjustment to a 'discrepancy', and this coefficient can be read directly from any of the three. Equation (1b) will be seen to be a special case of (5) below, just as (1c) is a special case of (3) below.
Finally, let us return to (1a) and take a different route. Subtracting $\alpha_1 y_t$ from both sides, we have
$$(1 - \alpha_1)y_t = \alpha_0 - \alpha_1 \Delta y_t + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t.$$
Defining $\lambda_1 = (1 - \alpha_1)^{-1}$ and adding and subtracting $\beta_1 x_t$, we have
$$y_t = \lambda_1\alpha_0 - \lambda_1\alpha_1 \Delta y_t + \lambda_1(\beta_0 + \beta_1)x_t - \lambda_1\beta_1 \Delta x_t + \lambda_1\varepsilon_t. \tag{1e}$$
This is again a special case of one of the general forms of transformed ADL models given below (equation (4)). This form, following Bewley (1979), conveniently reveals the long-run equilibrium multiplier as the coefficient of $x_t$ in (1e) since $\lambda_1(\beta_0 + \beta_1) = k_1$. However, because a contemporaneous value of the dependent variable appears on the right side of the equation, ordinary least-squares estimates are not consistent; consistent estimation can be carried out using instrumental variables.
Next, consider a data-generation process having the form of a general autoregressive-distributed lag model (Hendry et al. 1984). An ADL(m, n) model with a constant and p exogenous variables, which we will also write as ADL(m, n; p), is given by$^{2}$
$$y_t = \alpha_0 + \sum_{i=1}^{m} \alpha_i y_{t-i} + \sum_{j=1}^{p}\sum_{i=0}^{n} \beta_{ji} x_{j,t-i} + \varepsilon_t, \tag{2}$$
$^{2}$ We use the same n for each of the p exogenous variables without loss of generality, because any $\beta_{ji}$ may be set equal to zero, so that n is simply the maximum, rather than uniform, lag length of the $x_j$.
where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$. We might also write this, using the lag operator, as
$$\alpha(L)y_t = \alpha_0 + \sum_{j=1}^{p} \beta_j(L)x_{jt} + \varepsilon_t,$$
where $\alpha(L) = 1 - \sum_{i=1}^{m} \alpha_i L^i$ and $\beta_j(L) = \sum_{i=0}^{n} \beta_{ji} L^i$. As before, there are a number of possible transformations of this equation which, because they do not add or remove any linearly independent columns from the data matrix, are equivalent projections of the dependent variable on to the data. Given joint stationarity, the long-run solution of (2) is
$$y^* = \frac{\alpha_0}{\alpha(1)} + \sum_{j=1}^{p} \frac{\beta_j(1)}{\alpha(1)}\, x_j^*,$$
where $\alpha(1)$ and $\beta_j(1)$ represent the substitution of unity for the lag operator L in the lag polynomials.
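The equivalence of the ADL(1,1) forms in this section can be illustrated numerically. The following is a minimal sketch, assuming a simulated stationary process with hypothetical parameter values; it fits the levels form (1a), the form with lagged levels and differences (1b), and the homogeneous error-correction form (1c) by OLS, and recovers the long-run multiplier $k_1$ from each.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 400
a0, a1, b0, b1 = 1.0, 0.6, 0.5, 0.3   # hypothetical values with |a1| < 1

# Simulate a stationary AR(1) regressor and an ADL(1,1) dependent variable.
x = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = a0 + a1 * y[t - 1] + b0 * x[t] + b1 * x[t - 1] + 0.5 * rng.standard_normal()

yt, yl, xt, xl = y[1:], y[:-1], x[1:], x[:-1]
ones = np.ones(T)

def ols(dep, regressors):
    """OLS coefficients of dep on the listed regressor columns."""
    X = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return coef

# (1a): y_t on 1, y_{t-1}, x_t, x_{t-1}
c_a = ols(yt, [ones, yl, xt, xl])
k1_adl = (c_a[2] + c_a[3]) / (1.0 - c_a[1])

# (1b): dy_t on 1, y_{t-1}, dx_t, x_{t-1}
c_b = ols(yt - yl, [ones, yl, xt - xl, xl])
k1_1b = -c_b[3] / c_b[1]

# (1c): dy_t on 1, (y_{t-1} - x_{t-1}), dx_t, x_{t-1}
c_c = ols(yt - yl, [ones, yl - xl, xt - xl, xl])
k1_1c = 1.0 - c_c[3] / c_c[1]

print("k1 from (1a), (1b), (1c):", k1_adl, k1_1b, k1_1c)
print("adjustment coefficient (alpha_1 - 1):", c_a[1] - 1.0, c_b[1], c_c[1])
```

The three values of $k_1$ agree up to rounding error, as do the three readings of the adjustment coefficient, because each regression spans the same column space.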
2.2. The Error-correction Model
The first of the general forms that we examine is the error-correction model. Error-correction terms were used by Sargan (1964), Hendry and Anderson (1977), and Davidson et al. (1978) as a way of capturing adjustments in a dependent variable which depended not on the level of some explanatory variable, but on the extent to which an explanatory variable deviated from an equilibrium relationship with the dependent variable. When the equilibrium relationship is of the form $y^* = \theta x^*$, then an error-correction term is one such as $(y_t - \theta x_t)$, if the parameter in the equilibrium relationship is presumed known, or $(y_t - \hat{\theta} x_t)$ if it is estimated. However, even $(y_t - x_t)$ could be used, since the possibility of a coefficient other than unity on $x_t$ can be captured through other terms in the regression, as we will see below.
We can derive a generalized error-correction model (ECM), corresponding to the ADL(m, n; p) model with p exogenous variables $x_1, \ldots, x_p$, by steps similar to those used in the specific cases above. The result, which allows us to specify directly a general dynamic regression
model in the form of an ECM, is (for $r \le m$)
with
and
By convention, in the case of any term for which summations begin from $r + 1$, the term does not enter at all if the lower limit of the summation exceeds the upper limit. For each of the 'error-correction' terms $(y_{t-i} - \sum_{j=1}^{p} x_{j,t-i})$, one lagged term in $x_{jt}$ is present to break 'homogeneity': that is, to allow the error-correction term to take the form $(y_{t-i} - \sum_{j=1}^{p} \theta_j x_{j,t-i})$, where $\theta_j$ is not equal to one. The $\theta_j$ are the equilibrium multipliers given above: $\theta_j = \beta_j(1)/\alpha(1)$; and if the $\theta_j$ were known, they could be inserted directly into the ECM terms in (3) and the terms in lagged x could be eliminated.$^{3}$ In terms of the parameters of (3),
Since the ECM is simply a linear transformation of the ADL model, we might ask what its distinguishing feature is. The answer is that in the ECM formulation, parameters describing the extent of short-run adjustment to disequilibrium are immediately provided by the regression. Although the form in (3) is analytically convenient, it is not a useful empirical specification. In practice, a single error-correction term at lag r is preferable, as it induces a more interpretable and more nearly orthogonal parameterization.
The error-correction mechanism will be of particular value where the extent of an adjustment to a deviation from equilibrium is especially interesting. It is clear that the ECM provides this information when the error-correction terms are of the form $(y_{t-i} - \sum_{j=1}^{p} \theta_j x_{j,t-i})$, with $\theta_j$ a known parameter. If $\theta_j$ is not known it can be estimated; moreover, an unknown $\theta_j$ can implicitly be allowed for in the error-correction term
$^{3}$ Note that this requires
through the inclusion of extra lags in the $x_j$, without affecting the magnitude of the estimated coefficients $\eta_i$ in (3). Hence these parameters do not need to be estimated at an earlier stage in order to allow us to use the ECM. In fact, an important point in favour of the generalized ECM (3) is that the estimated coefficients on the error-correction terms are unaffected by the incorporation of any constant $\theta$ into the term; this will be proved after we have established some other results which will simplify the proof. The implication is that we can interpret the coefficients $\eta_i$ in (3) directly as adjustments to disequilibrium even though the true disequilibrium term is given by $(y_{t-i} - \sum_{j=1}^{p} \theta_j x_{j,t-i})$ and not by $(y_{t-i} - \sum_{j=1}^{p} x_{j,t-i})$. Hence the use of a generalized ECM does not imply homogeneity ($\theta = 1$) as long as extra lags in the $x_j$ are incorporated, even though the error-correction terms that enter (3) do not explicitly allow for $\theta \neq 1$.
2.3. An Example
An example of the use of the error-correction mechanism can be found in Davidson et al. (1978), who use a homogeneous ($\theta = 1$) error-correction mechanism in the modelling of consumers' expenditure. The 'error' to which adjustment is made in the model is the difference between the logarithms of consumption and income, each lagged four quarters. The error-correction term is significant in a wide variety of specifications. In particular, using quarterly seasonally unadjusted data from the United Kingdom, expressed at constant prices over the sample period of 1958(I)-1970(IV), the authors favour the model$^{4}$ (standard errors in parentheses):
where the statistics $z_1$ and $z_2$ are asymptotic $\chi^2$ tests for parameter constancy and serially independent residuals, respectively, with degrees of freedom in parentheses; $\hat{C}_t$ is the fitted value of real consumers' expenditure on non-durable goods and services $C_t$; $Y_t$ is real personal disposable income; $P_t$ is the price deflator for consumption; and $D^{o}$ is a dummy variable for changes in taxation. The error-correction term has a
$^{4}$ The symbol $\Delta_1\Delta_4$ represents the first difference of the fourth difference; e.g. $\Delta_4 \log Y_t - \Delta_4 \log Y_{t-1} = \Delta_1\Delta_4 \log Y_t$.
coefficient that is reasonably substantial as well as statistically significant at conventional levels. The model can readily be derived from an ADL model, noting that $\log(C/Y)_{t-4} = \log C_{t-4} - \log Y_{t-4} = c_{t-4} - y_{t-4}$, using lower-case letters to denote logarithms.
On the additional assumption that $\Delta_4 c_t$, $\Delta_4 y_t$, and $\Delta_4 p_t$ are stationary, with $E(\Delta_4 c_t) = g_c$, $E(\Delta_4 y_t) = g_y$, and $E(\Delta_4 p_t) = p_a$ (the annual rate of inflation), then, taking expectations of the equation above for fixed values of the estimated parameters,
Hence $C^* = kY^*$ where $k = \exp(-5.3g_y - 1.3p_a)$, noting that $g_c = g_y$ given the proportional long-run solution. This form of solution is consistent with the life-cycle hypothesis (see Deaton and Muellbauer 1980), in which case the coefficients of $g_y$ and $p_a$ should correspond to the negatives of the annual wealth-income and liquid asset-income ratios. The resulting values seem sensible.
For positive real growth or inflation, $k < 1$, and k falls as $g_y$ or $p_a$ rises. Representative values of $g_y$ and $p_a$ are 0.025 and 0.05 respectively. These imply a value of k of $e^{-0.2} = 0.82$, and therefore a (savings + durable expenditure) to income ratio of (1 - 0.82) or 18%.
This model has an additional interpretation which can often be given to an error-correction term. The coefficient of -0.10 on $\log(C/Y)_{t-4}$ suggests, first of all, that the greater is the excess of income over consumption (in logarithms) for the corresponding quarter one year ago, the higher is consumption now. That is, as income exceeds consumption by more, it becomes optimal to raise consumption in the future. The 'error' which is partially corrected is this discrepancy; consumers may consume unusually much (or little) at some point in time, but will then tend to consume relatively less (or more) at some point in the future. This is implied simply by the negative sign of the effect; in addition, we have an estimate of its magnitude, and it is apparent that the effect is substantial. Moreover, by adjusting expenditure in this way, consumption and income are tied together in the long run, despite the growth over time in the level of each. Consequently the model formulation actually entails the co-integration of consumption and income, given that these series are individually integrated: see Chapter 5. For a recent update, see Hendry, Muellbauer, and Murphy (1990).
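The steady-state arithmetic quoted above is easy to reproduce. The sketch below simply evaluates $k = \exp(-5.3g_y - 1.3p_a)$ at the representative values given in the text; the function name and argument names are hypothetical, and the coefficients 5.3 and 1.3 are taken directly from the stated formula.

```python
import math

def long_run_consumption_ratio(g_y, p_a, growth_coef=5.3, inflation_coef=1.3):
    """Steady-state ratio k = C*/Y* implied by k = exp(-5.3 g_y - 1.3 p_a)."""
    return math.exp(-growth_coef * g_y - inflation_coef * p_a)

k = long_run_consumption_ratio(g_y=0.025, p_a=0.05)
print("k =", round(k, 2))                       # consumption-income ratio
print("1 - k =", round(1.0 - k, 2))             # savings + durables share of income
```

Raising either `g_y` or `p_a` lowers `k`, which is the comparative-static statement made in the text.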
2.4. Bardsen and Bewley Transformations
Two other transformations of the ADL(m, n; p) are those of Bewley (1979) and Bardsen (1989). The Bewley transformation has the form
where $\lambda$ is defined in (6) below. Note that in (4), as in (2) and in (5) below, there are $k = 1 + m + [p(n + 1)]$ coefficients to be estimated. The transformation treated by Bardsen can be seen as a variant of an error-correction mechanism, and may be written as
where the coefficients are related to those in (2) and (4) by
Finding the long-run multiplier implied by any one of these forms (ADL, ECM, Bewley, Bardsen) is quite straightforward. Define $\theta_j$ as the long-run effect of a change in $x_j$ on y. Recall that in an equilibrium state there are no stochastic shocks and values of all variables are therefore constant, so that, writing $y^* = E(y_t)$ and $x_j^* = E(x_{jt})$,
$$\theta_j = \frac{\partial y^*}{\partial x_j^*}.$$
Then, corresponding to the ADL model (2), we have
$$\theta_j = \lambda \sum_{i=0}^{n} \beta_{ji} = \frac{\beta_j(1)}{\alpha(1)}, \tag{6}$$
where $\lambda = (1 - \sum_{i=1}^{m} \alpha_i)^{-1} = [\alpha(1)]^{-1}$. It is important to note that this formula is applicable only where $|\sum_{i=1}^{m} \alpha_i|$ is strictly less than 1. Otherwise, no long-run equilibrium can be said to exist between y and x, as these quantities may diverge increasingly as $t \to \infty$. In particular, the unconditional expectations are not well defined.
Corresponding to the Bewley transform (4), we also have
and this can be read directly from the estimated regression (4) as the coefficient on $x_{jt}$.
Finally, using the Bardsen transformation (5), we have
(7)
This expression can be computed from Bardsen's regression (5) simply by dividing the coefficient on $x_{j,t-n}$ by the negative of that on $y_{t-m}$.
Each of these transformations leads to numerically identical estimates of the long-run multiplier (the same estimated equilibrium relationship) if (2) and (5) are estimated by ordinary least squares (OLS) and (4) by instrumental variables (IV), using the regressors from the ADL model (2) as instruments. The necessity of IV for consistent estimation of (4) stems from the presence of contemporaneous terms in the dependent variable $y_t$ on the right-hand side of (4), rendering the error term correlated with those explanatory variables.
Finally, it is worth pointing out the sense in which the Bardsen transform is an error-correction form. The coefficients $\alpha_i^*$ are sums of the terms $\eta_i$ in the ECM representing adjustment to disequilibrium (as shown following (5) above) and the $\alpha_i^*$ may therefore be thought of as cumulative adjustments: $\alpha_i^*$ represents the sum of the effects of error-correction terms 1, ..., i. For some purposes these cumulative adjustments are of particular interest, in which case the Bardsen form will be especially convenient.
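The calculation of long-run multipliers from ADL coefficients, $\theta_j = \beta_j(1)/\alpha(1)$, can be sketched as a small helper. This is an illustration only: the function name, the array layout, and the example coefficient values are assumptions, and the validity check implements the condition on $\sum_i \alpha_i$ exactly as it is stated in the text.

```python
import numpy as np

def long_run_multipliers(alpha, beta):
    """Long-run multipliers theta_j = beta_j(1) / alpha(1) of an ADL(m, n; p).

    alpha : sequence (alpha_1, ..., alpha_m) of autoregressive coefficients
    beta  : array of shape (p, n + 1); row j holds (beta_j0, ..., beta_jn)
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.atleast_2d(np.asarray(beta, dtype=float))
    if abs(alpha.sum()) >= 1.0:
        # Condition quoted in the text: |sum of alpha_i| strictly below one.
        raise ValueError("no long-run solution: |sum(alpha)| is not below 1")
    lam = 1.0 / (1.0 - alpha.sum())       # lambda = [alpha(1)]^{-1}
    return lam * beta.sum(axis=1)         # one multiplier per exogenous variable

# Hypothetical ADL(2, 2; 2) coefficients, for illustration only.
theta = long_run_multipliers(alpha=[0.5, -0.2],
                             beta=[[0.4, 0.3, 0.1], [1.0, 0.0, -0.3]])
print(theta)
```

Setting a row of `beta` to have trailing zeros corresponds to a shorter lag length for that variable, in line with the remark that n is a maximum rather than a uniform lag length.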
2.5. Equivalence of Estimates from Different Transformations
Wickens and Breusch (1988) show that the Bewley transformation (estimated by IV with ADL regressors as instruments) yields precisely the same estimates of the long-run multipliers $\theta_j$ as does the untransformed ADL (2) estimated by OLS. The same is true of the Bardsen transform (5) estimated by OLS, and of the general error-correction mechanism (again estimated by OLS), as Banerjee, Galbraith, and Dolado (1990b) show. In demonstrating these points we will make use of the general structure that Wickens and Breusch use to compare linear transformations of regression models.
Take as a basic structure the regression model
$$y = X\gamma + \varepsilon, \tag{8}$$
where the X matrix contains lagged (but not contemporaneous) y as well as contemporaneous and lagged x terms, and $\gamma$ is a $k \times 1$ vector. Define this as corresponding to the ADL model (2). The representations (4) and (5) involve transforming the matrices y and X by a transformation matrix A, such that, following Wickens and Breusch,
$$[\tilde{y} : \tilde{X}] = [y : X]A, \tag{9a}$$
so that
$$\tilde{y} = \tilde{X}\delta + u. \tag{9b}$$
For example, take m = n = 2 and p = 1 in (2) so that the matrix of the transformation to the Bardsen form (5) is
$$A_a = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 1
\end{bmatrix} \tag{10a}$$
since $x_t' = [y_t, 1, y_{t-1}, y_{t-2}, x_t, x_{t-1}, x_{t-2}]$ maps onto $\tilde{x}_t' = [\Delta y_t, 1, \Delta y_{t-1}, \Delta x_t, \Delta x_{t-1}, y_{t-2}, x_{t-2}]$ in (5). For the Bewley transformation (4) and the same case (m = n = 2, p = 1), the transformation matrix is
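$$A_b = \begin{bmatrix}
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1
\end{bmatrix} \tag{10b}$$
since $x_t' = [y_t, 1, y_{t-1}, y_{t-2}, x_t, x_{t-1}, x_{t-2}]$ maps onto $\tilde{x}_t' = [y_t, 1, y_t - y_{t-1}, y_t - y_{t-2}, x_t, x_t - x_{t-1}, x_t - x_{t-2}]$ in (4).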
First, let us summarize the relation between the error processes u and $\varepsilon$. Begin by partitioning the general matrix A (which may be $A_a$, $A_b$, or another transformation) to be conformable with $[y : X]$:
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & A_{22} \end{bmatrix}.$$
When there are k regressors, the elements of the partition have the following dimensions: $a_{11}$, $1 \times 1$; $a_{12}$, $1 \times k$; $a_{21}$, $k \times 1$; $A_{22}$, $k \times k$.
When there are no contemporaneous y variables on the right-hand side of the transformed model $a_{12} = 0$, and presuming that A is of full rank, then the two sets of errors are identical: $u = \varepsilon$. If $a_{12} \neq 0$, as in the Bewley transformation, then a contemporaneous $y_t$ (multiplied by a scalar) has been added to the right-hand side of the equation, and
therefore to the left to preserve the equality. For estimation, the equation must then be renormalized so that we have only $y_t$ (unscaled) on the left, and all elements must be multiplied by the normalization factor. This normalization will have to be accounted for later to convert back to the original parameters.
Transformation by A requires renormalizing by dividing the entire equation by the factor $(a_{11} - a_{12}\delta) \neq 0$ to deliver (8) from (9a) and (9b), so the errors of the new process are given by
$$u = a^{-1}\varepsilon, \tag{11}$$
where a is the normalization constant. For example, if we begin with
$$y_t = \beta_1 y_{t-1} + \beta_2[\,\cdot\,] + \varepsilon_t$$
(where $[\,\cdot\,]$ does not depend on $y_{t-i}$ $\forall i$) and transform to
$$y_t = \delta_1 \Delta y_t + \delta_2[\,\cdot\,] + u_t,$$
we do so by subtracting $\beta_1 y_t$ from each side and dividing the entire equation by $(1 - \beta_1)$, so that $\delta_1 = -\beta_1/(1 - \beta_1)$, $\delta_2 = \beta_2/(1 - \beta_1)$, and $u_t = \varepsilon_t/(1 - \beta_1)$. The parameters satisfy the general formula
$$\begin{bmatrix} 1 \\ -\delta \end{bmatrix} = a^{-1}A^{-1}\begin{bmatrix} 1 \\ -\gamma \end{bmatrix}, \tag{12}$$
with $a = (1 - \beta_1)$, $\delta = [-\beta_1/(1 - \beta_1) : \beta_2/(1 - \beta_1)]'$, $\gamma = [\beta_1 : \beta_2]'$, and
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Now consider the relationship between estimates of long-run multipliers in transformed and untransformed models, starting with the Bardsen transformation which can be estimated consistently by OLS. Different calculations must be performed on the two sets of regression estimates to get long-run multipliers; if we estimate (2) we must perform the calculation (6), and if we use (5) we must perform the division (7). We want to show that the actual estimates that we get will be numerically identical whichever method we use. We know from the definition of the OLS estimator of the transformed model ((9b) or, explicitly (5)) that
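$$\tilde{X}'[\tilde{y} : \tilde{X}]\begin{bmatrix} 1 \\ -\hat{\delta} \end{bmatrix} = 0, \tag{13}$$
as is easily verified by substituting the formula for the OLS estimator of $\delta$, $\hat{\delta} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y}$. By definition of the OLS estimator in (8) (corresponding to (2)),
$$X'[y : X]\begin{bmatrix} 1 \\ -\hat{\gamma} \end{bmatrix} = 0, \tag{14}$$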
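again, easily checked using the formula $\hat{\gamma} = (X'X)^{-1}X'y$.
Now in the case of the Bardsen transform, $a_{12} = 0$ and A has the form
$$A = \begin{bmatrix} 1 & 0' \\ a_{21} & A_{22} \end{bmatrix},$$
so that (13) and $(\tilde{y} : \tilde{X}) = (\Delta y : XA_{22})$ together imply
$$A_{22}'X'[y : X]A\begin{bmatrix} 1 \\ -\hat{\delta} \end{bmatrix} = 0,$$
so
$$X'[y : X]A\begin{bmatrix} 1 \\ -\hat{\delta} \end{bmatrix} = 0, \tag{15}$$
since $A_{22}$ is of full rank. From (14) and (15), we can deduce that$^{5}$
$$\begin{bmatrix} 1 \\ -\hat{\delta} \end{bmatrix} = A^{-1}\begin{bmatrix} 1 \\ -\hat{\gamma} \end{bmatrix}, \tag{16}$$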
where the equality follows from the facts that the matrix $X'(y : X) = F$ has dimension $k \times (k + 1)$ and rank k, where again $k = 1 + m + [p(n + 1)]$, and that F can be partitioned as $[F_1 : F_2]$ with $F_1$ having dimension $k \times 1$ and $F_2$ having dimension $k \times k$ and rank k.
So the same relationship holds between estimated parameters $\hat{\gamma}$ and $\hat{\delta}$ as between the true parameters $\gamma$ and $\delta$ (i.e. (16) has the same form as (12)). What this means is that the estimates of, say, multipliers will be the same whichever transformation is used. To make this last step in the argument, let $\Psi_A$ be a quantity calculated from the true parameters of model A, and $\Psi_B$ be the same quantity calculated from the true parameters of the transformed model B. Clearly, $\Psi_A = \Psi_B$ since the calculation is adapted to produce the same underlying quantity in each case. Let the functions describing the calculations be $f\left[\begin{smallmatrix} 1 \\ -\gamma \end{smallmatrix}\right]$ and $g\left[\begin{smallmatrix} 1 \\ -\delta \end{smallmatrix}\right]$ respectively. Then, by (12),
$$f\begin{bmatrix} 1 \\ -\gamma \end{bmatrix} = g\begin{bmatrix} 1 \\ -\delta \end{bmatrix} = g\left(a^{-1}A^{-1}\begin{bmatrix} 1 \\ -\gamma \end{bmatrix}\right).$$
$^{5}$ The normalizing constant a does not appear here because it happens that $u = \varepsilon$ in (8) ($a_{12} = 0$) for the case of the Bardsen transform, so that a = 1. For the Bewley transform, there is a non-zero normalizing constant.
But since this holds for any $\gamma$, it must hold for $\hat{\gamma}$, and so
$$f\begin{bmatrix} 1 \\ -\hat{\gamma} \end{bmatrix} = g\left(A^{-1}\begin{bmatrix} 1 \\ -\hat{\gamma} \end{bmatrix}\right) = g\begin{bmatrix} 1 \\ -\hat{\delta} \end{bmatrix},$$
the second equality following from (16). This implies that $\hat{\Psi}_A = \hat{\Psi}_B$: we get the same estimated quantity from either model, using the appropriate transformation matrix.
One can therefore obtain the coefficients of either the transformed or untransformed model from the other, using the original transformation matrix A. For example, in calculating $\hat{\theta}_j$ from the ADL, we use (6); (6) is one part of the transformation A applied to the original parameters $\gamma$. If the parameters of the Bardsen transform are $\delta$, then the $\theta_j$ are ratios of elements of $\delta$ as in (7). Calculating $\hat{\theta}_j$ from the ADL amounts to using ratios of sums of selected elements of the vector $A^{-1}\left[\begin{smallmatrix} 1 \\ -\hat{\gamma} \end{smallmatrix}\right]$ in the calculation; by (16), this formula yields the corresponding elements of $\left[\begin{smallmatrix} 1 \\ -\hat{\delta} \end{smallmatrix}\right]$, and so these results are precisely the same as those obtained from the Bardsen transformation and (7). The same holds true for any linear transformation where A is of full rank ($A^{-1}$ is non-singular).
In the case of transformations for which $a_{12} \neq 0$, such as Bewley's, OLS estimation is inconsistent because of the correlation between the error term and the contemporaneous dependent variables on the right-hand side. This brings us to IV estimation.
Where the instrumental variables used in estimation are those of the untransformed ADL model, IV estimation of the Bewley transformation also yields estimates of the long-run multipliers identical to those from the ADL. In this case, analogously with (13), if we let $\delta_b$ represent the parameters of the model (4) and $A_b$ the matrix of that transformation, we have
$$X'[\tilde{y} : \tilde{X}]\begin{bmatrix} 1 \\ -\hat{\delta}_{b,IV} \end{bmatrix} = 0, \tag{17}$$
where $\hat{\delta}_{b,IV}$ is the IV estimator of $\delta_b$; again, this formula is immediately verifiable by substituting$^{6}$ $\hat{\delta}_{b,IV} = (X'\tilde{X})^{-1}X'\tilde{y}$. From (17),
$$X'[y : X]A_b\begin{bmatrix} 1 \\ -\hat{\delta}_{b,IV} \end{bmatrix} = 0. \tag{18}$$
We must then normalize by $a_b$ (defined to be the constant that normalizes the dependent variable's coefficient to unity, analogous to a in (11)) before we compare this estimator with another which has been normalized to have the dependent variable enter with a coefficient of one, and we then obtain (Wickens and Breusch 1988), following steps similar to those above,
$$\begin{bmatrix} 1 \\ -\hat{\delta}_{b,IV} \end{bmatrix} = a_b^{-1}A_b^{-1}\begin{bmatrix} 1 \\ -\hat{\gamma} \end{bmatrix}. \tag{19}$$
$^{6}$ The IV estimator takes this form because the original Xs are being used as instruments in the transformed regression model involving $\tilde{y}$ and $\tilde{X}$.
Comparing (19) with (16), it is clear that once again the estimates from the transformed model can be related back to those from OLS on the ADL model, or to those from the other transformation $A_a$, through the known transformation matrices. Moreover, comparing (19) with (12), the same relation holds in estimated parameters as in the true parameters, so that estimates of functions of these parameters (such as the long-run multipliers) will be the same regardless of the model from which they are calculated. Here, using the Bewley transformation, the long-run multipliers $\theta_j$ appear directly in $\hat{\delta}_b$; to calculate them from the ADL parameters, we would use
$$\hat{\theta}_j = \frac{\sum_{i=0}^{n} \hat{\beta}_{ji}}{1 - \sum_{i=1}^{m} \hat{\alpha}_i}.$$
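The numerical equivalence argued in this section can be checked directly for the m = n = 2, p = 1 case. The sketch below is an illustration under stated assumptions: the data are simulated from a hypothetical stationary ADL(2, 2; 1) process, the transformation matrices are written so that each column builds one transformed variable from $[y_t, 1, y_{t-1}, y_{t-2}, x_t, x_{t-1}, x_{t-2}]$, and the Bewley-type regressors are taken to be $y_t - y_{t-i}$ and $x_t - x_{t-i}$ alongside $x_t$. It compares OLS on the ADL, OLS on the Bardsen form, and IV on the Bewley form with the ADL regressors as instruments.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 500

# Hypothetical stationary ADL(2, 2; 1) process, for illustration only.
x = np.zeros(T + 2)
y = np.zeros(T + 2)
for t in range(2, T + 2):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = (0.5 + 0.5 * y[t - 1] - 0.2 * y[t - 2]
            + 0.4 * x[t] + 0.3 * x[t - 1] + 0.1 * x[t - 2]
            + 0.5 * rng.standard_normal())

y0, y1, y2 = y[2:], y[1:-1], y[:-2]
x0, x1, x2 = x[2:], x[1:-1], x[:-2]

# [y : X] for the ADL: columns y_t, 1, y_{t-1}, y_{t-2}, x_t, x_{t-1}, x_{t-2}.
W = np.column_stack([y0, np.ones(T), y1, y2, x0, x1, x2])

# Columns: dy_t, 1, dy_{t-1}, dx_t, dx_{t-1}, y_{t-2}, x_{t-2}.
A_bardsen = np.array([
    [ 1, 0,  0,  0,  0, 0, 0],
    [ 0, 1,  0,  0,  0, 0, 0],
    [-1, 0,  1,  0,  0, 0, 0],
    [ 0, 0, -1,  0,  0, 1, 0],
    [ 0, 0,  0,  1,  0, 0, 0],
    [ 0, 0,  0, -1,  1, 0, 0],
    [ 0, 0,  0,  0, -1, 0, 1],
], dtype=float)

# Columns: y_t, 1, y_t - y_{t-1}, y_t - y_{t-2}, x_t, x_t - x_{t-1}, x_t - x_{t-2}.
A_bewley = np.array([
    [1, 0,  1,  1, 0,  0,  0],
    [0, 1,  0,  0, 0,  0,  0],
    [0, 0, -1,  0, 0,  0,  0],
    [0, 0,  0, -1, 0,  0,  0],
    [0, 0,  0,  0, 1,  1,  1],
    [0, 0,  0,  0, 0, -1,  0],
    [0, 0,  0,  0, 0,  0, -1],
], dtype=float)

def ols(dep, X):
    return np.linalg.solve(X.T @ X, X.T @ dep)

def iv(dep, X, Z):
    # Just-identified IV with instrument matrix Z: (Z'X)^{-1} Z'y.
    return np.linalg.solve(Z.T @ X, Z.T @ dep)

X_adl = W[:, 1:]
gamma = ols(W[:, 0], X_adl)                   # const, a1, a2, b0, b1, b2
theta_adl = gamma[3:].sum() / (1.0 - gamma[1:3].sum())

Wa = W @ A_bardsen
delta_a = ols(Wa[:, 0], Wa[:, 1:])
theta_bardsen = -delta_a[5] / delta_a[4]      # -(coef on x_{t-2}) / (coef on y_{t-2})

Wb = W @ A_bewley
delta_b = iv(Wb[:, 0], Wb[:, 1:], X_adl)
theta_bewley = delta_b[3]                     # coefficient on x_t

print(theta_adl, theta_bardsen, theta_bewley)
```

The three printed values coincide up to rounding error. For the Bardsen form the stacked vectors also satisfy `A_bardsen @ [1, -delta_a] == [1, -gamma]`, which is the estimated-parameter relation with normalizing constant equal to one.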
2.6. Homogeneit y an d th e EC M as a Linear Transformation o f the AD L The result s jus t establishe d allo w a straightforwar d proof o f th e earlie r statement that , b y incorporatin g lag s o f th e level s o f explanator y variables, th e generalize d EC M make s n o implici t homogeneity assump tions. Conside r th e tw o regression s
and
where $r = \min(m, n)$. The difference between (20a) and (20b) lies in the fact that the $\theta_j$ in (20b) are set to unity in (20a). We will prove that the coefficients on the error-correction terms are none the less equal in the two regressions, for all $i$ and arbitrary $\theta_j$. The ADL model is
We will call the full parameter vectors from (20a), (20b), and (20c) $a$, $b$, and $c$ respectively. Then, from our examination of general linear transformations above, redefining the particular transformation matrix $A_b$:
In the $m = n = 2$, $p = 1$ case, for example, $A_a$ and $A_b$ are equal to
I
0
-1
0 0 0 0 0
0 0 0 0 0
1
0
Ah =
-1 0 0 0 0
1
0
1
0 0 0 0 0
0 0
1
0 0 0
-1
0 0
1 -1 0
0
0 0 0
0 0 0 0
0 0 0 0 0
o"
0
0
1
0 0 0
0 0
1
0 0
-e
0
1
_1
1
0 0 -8
0 0 0 0
0 0 0 0 0
1 -1
1
1
0 0 0 0 0 0
1_ 0 0 0 0 0
Since the first $(\min(m, n) + 3)$ rows remain unaffected by the new terms $\theta_j$, the first $(\min(m, n) + 3)$ entries in
are unaffected by the arbitrary constants $\theta_j$. Hence the first $(\min(m, n) + 2)$ elements of the parameter vectors $a$ and $b$, which correspond to the error-correction terms, must be identical.

Thus the generalized ECM, using lagged terms in the exogenous variables to break homogeneity, produces precisely the same estimates of the responses to 'disequilibrium' whether or not the error-correction terms involve postulated values of long-run multipliers explicitly.
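The invariance just stated can be checked numerically. The following is a minimal sketch, not the authors' computation: it assumes a simple ADL(1,1) data-generation process with made-up coefficients, and the helper names (`ols`, `ecm_adjustment`) are hypothetical. Each ECM regression includes the lagged level of x, so the postulated multiplier `theta` in the error-correction term is free to take any value.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ecm_adjustment(y, x, theta):
    """Coefficient on the error-correction term (y - theta*x), lagged once,
    in a generalized ECM that also includes the lagged level of x."""
    dy, dx = np.diff(y), np.diff(x)
    y_lag, x_lag = y[:-1], x[:-1]
    X = np.column_stack([np.ones_like(dy), dx, y_lag - theta * x_lag, x_lag])
    return ols(X, dy)[2]

# Illustrative DGP (an assumption): ADL(1,1) driven by a random-walk regressor.
rng = np.random.default_rng(0)
T = 200
x = np.cumsum(rng.normal(size=T))
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.7 * y[t - 1] + 0.4 * x[t] - 0.1 * x[t - 1] + rng.normal()

# Untransformed ADL: y_t on constant, y_{t-1}, x_t, x_{t-1}.
X_adl = np.column_stack([np.ones(T - 1), y[:-1], x[1:], x[:-1]])
a1_hat = ols(X_adl, y[1:])[1]

# The adjustment coefficient for several postulated long-run multipliers.
adjustments = {theta: ecm_adjustment(y, x, theta) for theta in (1.0, 0.3, -2.5)}
for theta, g in adjustments.items():
    print(f"theta = {theta:5.1f}: adjustment coefficient = {g:.10f}")
print(f"ADL-implied value (a1_hat - 1)      = {a1_hat - 1:.10f}")
```

Because every ECM here is a non-singular linear transformation of the same ADL regressors, the adjustment coefficient should agree across values of `theta` up to rounding error, and should equal the ADL coefficient on the lagged dependent variable minus one.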
2.7. Variances of Estimates of Long-Run Multipliers

We want to be able to compute not only the estimates of long-run multipliers, but also the variances or standard errors of these estimates. Since the long-run multipliers are calculated as ratios of coefficients or sums of coefficients, and since there is no general formula for the exact variance of a quotient of items with known variances, we must use an approximation to the variance of the quotient. In the case of the Bewley transformation, since the long-run multipliers appear as coefficients on the $x_{jt}$, we can read the variances of these estimated coefficients from the usual estimator of the variance-covariance matrix of IV coefficient estimates; this estimate implicitly embodies an approximation to the variance of the quotient, although it might appear to be an exact estimate.7 In fact, the different transformations yield equivalent results, in that the natural approximate estimator of the variances is the same for each.

For the Bewley transformation (4), since the $\theta_j$ are coefficients in the regression, we apply the formula for the covariance matrix of coefficients estimated in an instrumental variables regression. Using $\hat{V}$ to represent the estimated variance of the estimated parameter vector,
(23)
Wickens and Breusch (1988: 198) show that this is equal to the covariance matrix of the same parameter vector $\hat{\delta}_{b,IV}$, calculated (indirectly) by applying the transformation $A_b$ to the original parameters $\gamma$ and using the Jacobian of this transformation to approximate the estimated covariance matrix. That is,
(24)
and this expression can be reduced to the same expression as that given for the IV covariance matrix in (23).

Both the original ADL model and Bardsen's transformation involve a calculation of the $\theta_i$ as nonlinear functions of coefficients in the original regression. Following Bardsen, a standard formula for an approximation to the variance of a nonlinear function of elements with known variances can be used to compute $\mathrm{var}(\hat{\theta}_j)$. Let $f = f(a_1, a_2, \ldots, a_H)$; then $\hat{f} = f(\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_H)$ and
(25a)
In the case of the ADL, we have $f = \hat{\theta}_i = \hat{\lambda} \sum_j \hat{\beta}_{ji}$, where
7 It might seem impossible that the Bewley transformation could involve a ratio, being a case of a linear transformation matrix $A_b$ applied to the original linear regression. Recall, however, that there is also a normalization factor applied, through which division by another linear function of the original coefficients is accomplished.
$\hat{\lambda} = (1 - \sum_i \hat{\alpha}_i)^{-1}$, as implied by (6) above. The estimated variances and covariances of the parameter estimates are based on $\hat{\sigma}^2 (X'X)^{-1}$ from (8).

For the Bardsen transformation, (25a) takes a particularly simple form, since $\theta_j$ is calculated from the ratio of only two parameters. Hence if estimation is via (5), we have, after taking derivatives, that
recalling that $\mathrm{var}(\hat{\beta}^*_j)$, $\mathrm{var}(\hat{\alpha}^*)$, and $\mathrm{cov}(\hat{\beta}^*_j, \hat{\alpha}^*)$ are easily calculated from $\hat{\sigma}^2 (X'X)^{-1}$.

An equivalent way of writing (25a) is to express it in the form of (24). For $f = f(a_1, a_2, \ldots, a_H)$, as above, let $V_a$ be the $H \times H$ covariance matrix of which the $(g, h)$th element is $\mathrm{cov}(a_g, a_h)$. Then, using the Jacobian of the transformation $f(\cdot)$, defined as
we have

Wickens and Breusch show that, substituting $\theta_j$ for $f$ and comparing the results of (23) and (24) with those of (25a) and (25b), the estimated variances of long-run multipliers calculated from the ADL model are the same as those provided in the IV estimator of the Bewley transform. We show now that the same is also true of the error-correction transformation, or any other linear transformation, using the form (25b). That is, the method of proof does not use the features of the ECM or of any other particular transformation, but instead applies to the estimates from any non-singular linear transformation. The important point, as above, is the equivalence of results yielded by different linear transformations of the model.

Consider the long-run multiplier vector as calculated from the ADL,

The ADL approximation to the variance of the multiplier is

where $J_f$ is the Jacobian of the transformation represented by the function $f(\cdot, \cdot)$. The long-run multiplier vector calculated from the ECM is
and the ECM approximation to the variance is

where $J_g$ corresponds to the function $g(\cdot, \cdot)$. We can prove that

for $J_f$ and $J_g$ representing general linear transformations, so that we now have for the variances, as well as the point estimates, the result that the estimates obtained do not depend on which transformation is used. That is, $\mathrm{var}_A(\hat{\theta}) = \mathrm{var}_E(\hat{\theta})$ if and only if $J_f A_{22} = J_g$, because, by (27) and the relation
it follows that $\mathrm{var}_A(\hat{\theta}) = J_f A_{22}\, \mathrm{var}(\hat{h})\, A_{22}' J_f'$, and $\mathrm{var}_E(\hat{\theta}) = J_g\, \mathrm{var}(\hat{h})\, J_g'$.
To prove (30), define
We know that $f(a) = \theta = g(h)$. Now $J_f = \partial f(a)/\partial a'$ and $J_g = \partial g(h)/\partial h'$, while (34) states that $a = A_{22} h$. Hence
Rearranging yields (30) immediately.

So, estimating the ECM or another linear transformation and transforming to get the variance of the long-run multiplier leads to the same result as obtained by transforming the ADL model. Wickens and Breusch (1988) showed the corresponding results for the Bewley estimator; the information revealed by all three transformations is the same. As with the Bardsen transformation, $a_{12} = 0$ for the ECM, and its parameters are therefore consistently estimated by OLS.
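The equality of the approximate variances can also be verified on simulated data. The sketch below is illustrative only: the ADL(1,1) process, its coefficients, and the names `ols_with_cov`, `J_f`, and `J_g` are assumptions introduced here. It applies the Jacobian form of the variance approximation, $J V J'$, once to the ADL coefficients and once to the coefficients of an ECM-type regression of $\Delta y_t$ on $\Delta x_t$, $y_{t-1}$, and $x_{t-1}$, in which the long-run multiplier is a ratio of two coefficients.

```python
import numpy as np

def ols_with_cov(X, y):
    """OLS coefficients and the usual covariance estimate s^2 (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    return coef, s2 * XtX_inv

# Illustrative stationary ADL(1,1) DGP (an assumption, not from the text).
rng = np.random.default_rng(1)
T = 300
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal()
    y[t] = 0.2 + 0.6 * y[t - 1] + 0.5 * x[t] + 0.1 * x[t - 1] + rng.normal()

y_now, y_lag, x_now, x_lag = y[1:], y[:-1], x[1:], x[:-1]
const = np.ones(T - 1)

# ADL: theta = (b0 + b1) / (1 - a1); Jacobian with respect to (c, a1, b0, b1).
(c, a1, b0, b1), V_adl = ols_with_cov(
    np.column_stack([const, y_lag, x_now, x_lag]), y_now)
theta_adl = (b0 + b1) / (1 - a1)
J_f = np.array([0.0, theta_adl / (1 - a1), 1 / (1 - a1), 1 / (1 - a1)])
var_adl = J_f @ V_adl @ J_f

# ECM-type form: theta = -beta / alpha; Jacobian with respect to (c, d0, alpha, beta).
(c2, d0, alpha, beta), V_ecm = ols_with_cov(
    np.column_stack([const, x_now - x_lag, y_lag, x_lag]), y_now - y_lag)
theta_ecm = -beta / alpha
J_g = np.array([0.0, 0.0, beta / alpha**2, -1 / alpha])
var_ecm = J_g @ V_ecm @ J_g

print(f"theta: ADL {theta_adl:.8f}  ECM {theta_ecm:.8f}")
print(f"var:   ADL {var_adl:.8f}  ECM {var_ecm:.8f}")
```

The two regressions share the same residuals, and their coefficient vectors are linked by a fixed non-singular matrix, so both the point estimate and the approximate variance should coincide up to rounding error.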
2.8. Expectational Variables and the Interpretation of Long-Run Solutions

So far, long-run solutions have been derived for models with valid conditioning on regressors or instruments. McCallum (1984) and Kelly (1985) have suggested potential problems in the interpretation of long-run solutions in the presence of expectational variables in the processes generating the data. The problems are, however, readily interpreted as resulting from invalid (weak) exogeneity assumptions, and do not uniquely concern long-run solutions; short-run effects may be badly estimated also. Moreover, if the variables concerned are non-stationary and have particular integration and co-integration properties (see Chapters 3-5), then the long-run solution, but not the short-run multipliers, can be estimated consistently despite the expectational variables. We follow McCallum and Kelly in describing the circumstances in question, and Hendry and Neale (1988) in relating these to weak exogeneity and non-stationarity.

McCallum (1984) offers the following example. Consider the relationship between interest rates and inflation, and in particular the Fisher effect. Let $\pi_t$ denote the inflation rate and $i_t$ the nominal interest rate. Then, in the relationship

the Fisher hypothesis is interpreted as stating that $\beta_1 = 1$; i.e. in long-run equilibrium, the nominal interest rate reflects inflation one for one. Now imagine that the actual generation of these series is according to the processes

and
with $|\mu_1| < 1$; $v_t$ and $e_t$ white-noise processes; $E(e_t v_s) = 0 \; \forall \, t, s$; where $\rho$ is a constant real interest rate and $\hat{\pi}_{t+1|t}$ represents agents' forecasts of $\pi_{t+1}$ made at time $t$. Equation (33) implies that the Fisher hypothesis is valid in each period, rather than as a long-run equilibrium only. Imagine further that information is costless and that agents understand the $\{\pi_t\}$ process so that $\hat{\pi}_{t+1|t} = \mu_0 + \mu_1 \pi_t$. Then

Hence estimation of (32) by a consistent method such as OLS will produce a coefficient estimate $\hat{\beta}_1$ which converges to $\mu_1 < 1$. McCallum emphasizes the hazards of frequency-domain time-series techniques for estimating long-run (zero-frequency) effects, but the conclusion is equally well applicable to time-domain methods. In spite of the validity of the Fisher hypothesis, the investigator examining the long-run solution through a model such as (32) would falsely conclude that it fails to hold.

Kelly (1985) and Hendry and Neale (1988) use the following more general structure in order to examine the issue. The data are generated by
where $\phi(L)$, $\beta(L)$, and $\gamma(L)$ are finite-order polynomials in the lag operator of the form

Furthermore, $\phi_0 = 1$, and for simplicity in this example we take

The lag operator in the polynomial $\beta(L)$ is interpreted as applying to the first subscript of $x_{t|t-1}$ only, so that

The underlying series $\{x_t\}$ is generated according to

The error terms $\varepsilon_t$ and $\eta_t$ are mutually and serially uncorrelated white noise. Combining (37) and (38), we have

and

The parameters $\beta_i$ cannot be determined from (40) without knowledge of the marginal process (38); hence, recalling the definition of weak exogeneity in Chapter 1, $x_t$ is not weakly exogenous for the $\beta_i$ in (40). If, however, we somehow observed $x_{t|t-1}$ directly, then we would be able to estimate the $\beta_i$ from (39).

The problem identified by McCallum and Kelly is therefore simply one aspect of a broader, and well-known, one: we cannot in general count on unbiased estimates from models in which the explanatory variables are not (weakly) exogenous for the parameters of interest. The solution, in this circumstance, is therefore joint estimation of (40) and (38). If (40) alone is estimated, not only is the long-run solution not consistently estimated, but the short-run adjustment coefficients are incorrectly estimated as well; where $x_{t|t-1}$ is omitted from the model, coefficients on $x_{t-i}$ are not $\beta^*(L)$ but $\beta_0 \delta(L) + \beta^*(L)$. If we do not have weak exogeneity, we cannot conduct valid conditional inference.
This point is independent of whether our primary interest is in long-run equilibrium solutions or in short-run effects.

It is also interesting to note that non-stationarity in the series $y_t$ and $x_t$, and co-integration between them, can lead to consistent estimation of the long-run solution in spite of the lack of weak exogeneity. To make this clear, as well as to clarify the position of 'strict' exogeneity, express (39) as

where $\varepsilon_{2t} = (\varepsilon_t - \beta_0 \eta_t)$ and where $x_t = x_{t|t-1} + \eta_t$ from (38); $x_t$ is in effect an error-laden measurement of $x_{t|t-1}$. Now group the latter regressors to get

and define
From (38), $x_t$ is correlated with $\varepsilon_{2t}$ in (41) and (42), since $\varepsilon_{2t}$ depends upon $\eta_t$. However, we can redefine the parameters in (42) (i.e. reparameterize) to eliminate this correlation, using (43). Write

yielding a new error term equal to $\varepsilon_t - \beta_0 \eta_t + \beta_0 c_{11} \sigma_\eta^2 x_t + \beta_0 \sigma_\eta^2 c_{12}' w_t$, from which, again using $x_t = x_{t|t-1} + \eta_t$, and assuming that $x_{t|t-1}$ is uncorrelated with $\varepsilon_t$ and $\eta_t$, we can calculate that
since the first right-hand-side term is equal to zero by assumption, and from the definition of $C$ and the fact that $CC^{-1} = I$, the square-bracketed term is zero as well.

So the re-parameterized equation (44) has an error term that is uncorrelated with the regressors; in particular, its covariance with $x_t$ is zero. Note that, since this re-parameterization has rendered $x_t$ 'strictly exogenous' (Engle et al. 1983) in this example, the inferential problems do not stem from a lack of exogeneity in that sense: strict exogeneity, unlike weak exogeneity, is neither necessary nor sufficient for valid inference.

Equation (44) allows us to see the large-sample results of estimating (42) by regression, because (44) is expressed such that the error is uncorrelated with the regressors, so that the coefficients represent the impacts of conditional expectations. There are two points to note. First, the existence of biases in (44) (e.g. $\beta_0(1 - c_{11}\sigma_\eta^2) \neq \beta_0$) depends upon a non-zero value for $\sigma_\eta^2$, and therefore on the discrepancy between $x_t$ and the expectation $x_{t|t-1}$; that is, the biases are attributable to the lack of weak exogeneity implied by the fact that we use $x_t$ in place of $x_{t|t-1}$. If this problem were not present, both short run and long run would be estimable with no bias; since the problem is present, neither short nor long run is estimated without bias (for I(0) processes).

Second, if $x_t$ is not stationary but integrated of order 1, and if $y_t$ has the same order of integration and is co-integrated with $x_t$ (that is, if there is a long-run equilibrium relationship between them) while $\eta_t$ remains a stationary process, then $c_{11}\sigma_\eta^2 \to 0$ as $t \to \infty$ (see Chapter 3). In the limit as $t$ increases without bound, therefore, the estimated coefficient on $x_t$ will tend to $\beta_0$, and we have the possibility of consistent estimation of the long-run solution. Nevertheless, the short-run coefficients remain mis-estimated.

Clearly, a lack of weak exogeneity creates serious inferential problems, but these are not restricted to the estimation of long-run solutions. Further, there is a marked difference in the long-run outcomes between the I(0) and I(1) situations, leading us to study the properties of integrated and co-integrated processes.
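McCallum's Fisher-effect example from earlier in this section lends itself to a quick simulation. The sketch below is a hedged illustration rather than a reproduction of any published experiment: the parameter values, the noise scales, and the function name `fisher_slope` are assumptions. Inflation follows a first-order autoregression with coefficient `mu1`, agents forecast next-period inflation as `mu0 + mu1 * pi[t]`, and the nominal rate equals a constant real rate plus that forecast plus noise. The unit-root case `mu1 = 1.0` lies outside McCallum's stationary setting and is included only to echo the contrast between the I(0) and I(1) situations.

```python
import numpy as np

def fisher_slope(mu1, T=5000, mu0=0.0, rho=2.0, seed=0):
    """OLS slope from regressing the nominal interest rate on current inflation."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=T)                 # inflation shocks
    v = rng.normal(scale=0.5, size=T)      # interest-rate shocks
    pi = np.zeros(T)
    for t in range(1, T):
        pi[t] = mu0 + mu1 * pi[t - 1] + e[t]
    forecast = mu0 + mu1 * pi              # forecast of pi[t+1] made at time t
    i = rho + forecast + v                 # Fisher relation holds every period
    X = np.column_stack([np.ones(T), pi])
    return np.linalg.lstsq(X, i, rcond=None)[0][1]

slope_stationary = fisher_slope(mu1=0.6)   # stationary inflation
slope_unit_root = fisher_slope(mu1=1.0)    # integrated inflation (assumed extension)
print(f"mu1 = 0.6: estimated slope = {slope_stationary:.3f}")
print(f"mu1 = 1.0: estimated slope = {slope_unit_root:.3f}")
```

In the stationary case the slope should settle near `mu1` rather than unity, so a researcher would wrongly reject the Fisher hypothesis even though it holds period by period; with integrated inflation the slope should be close to one.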
Properties o f Integrated Processes A knowledg e of the fundamenta l properties o f integrated processe s is essentia l fo r a n understandin g o f test s fo r bot h non-stationarit y and th e existenc e o f long-ru n equilibriu m relationships . Her e w e define an d presen t th e importan t propertie s o f integrate d pro cesses. W e dea l wit h th e issu e o f spuriou s regression s an d sho w how a consideratio n o f th e theor y o f integrate d processe s help s u s to understan d th e behaviou r o f standar d estimator s i n model s involving non-stationar y data . Severa l example s illustrat e th e us e of Wiene r distributio n theor y i n derivin g asymptoti c result s fo r such models . Much conventiona l asymptoti c theor y fo r least-square s estimatio n (e.g . the standar d proof s o f consistenc y an d asymptoti c normalit y o f OL S estimators) assume s stationarit y o f th e explanator y variables , possibl y around a deterministi c trend . No t al l economi c tim e serie s ar e station ary, a s w e sa w i n Chapte r 1 , an d fo r man y importan t ones , includin g aggregate consumptio n and nationa l income , stationarit y is not eve n a sensible approximation . Nonetheless, regressio n method s hav e ofte n appeare d t o b e effectiv e when analysin g such series, an d i t was not clea r tha t method s develope d for stationar y serie s woul d no t b e vali d elsewhere . T o som e extent , therefore, man y analyse s o f unadjuste d non-stationar y serie s hav e bee n carried ou t o n th e assumptio n tha t th e non-stationarit y woul d no t matter. A s som e potentia l problem s i n doing s o became clear , however , econometricians naturall y looked fo r method s o f transformin g their dat a in suc h a wa y tha t th e resultin g serie s would b e stationary , an d therefore amenabl e t o analysi s usin g 'traditional ' econometri c o r time series methods . 
One illustration of the difficulties that can arise when performing regression with clearly non-stationary series is the problem of nonsense regression, so named by Yule (1926), or spurious regression, in the terminology of Granger and Newbold (1974): given two completely unrelated but integrated series, regression of one on the other will tend to produce an apparently significant relationship.1 The realization that such things could occur led to the interest in transformations to induce stationarity. Differencing data was one of these; 'removing' a deterministic trend from a series was another.2

Although these transformations enjoyed some popularity, it eventually became clear for a wide class of processes that standard significance tests for the hypothesis that there is no trend are biased in favour of rejecting the hypothesis even though it is true (see Granger and Newbold 1974, inter alia). Moreover, spurious correlations between unrelated integrated processes appear even in regressions containing deterministic trends. The simple method of de-trending before drawing inferences from non-stationary data was therefore found to be flawed.
3.1. Spurious Regression

The standard proof of the consistency of ordinary least squares regression uses an assumption such as that plim $(1/T)(X'X) = Q$, where $X$ is the matrix containing the data on the explanatory variables and $Q$ is a fixed matrix. That is, with increasing sample information, the sample moments of the data settle down to their population values. In order to have fixed population moments to which these sample moments converge, the data must be stationary; otherwise, as for example in the case of integrated series, the data might be tending to increase over time, in which case there are no fixed values in the matrix of expectations of sums of squares and cross-products of these data.3

Some examples of what can emerge when standard regression techniques are used with non-stationary data were re-emphasized by Granger and Newbold, who considered the following data generation process:
1 The change in terminology may be misleading, since Yule also used the term 'spurious relationships', referring to a correlation induced between two variables that are causally unrelated but are both dependent on other common variables.
2 This is accomplished either by including a function of time as a regressor, or by subtracting a function of time from all series used. By the Frisch-Waugh theorem, regressing all series on time and using their residuals in a further regression is numerically equivalent to including time as a regressor when using the unadjusted series.
3 Anderson (1958) extends the standard asymptotic distribution theory to deal with the de-trending of deterministic variables that can be suitably standardized. However, here we are concerned with integrated stochastic processes.
That is, $x_t$ and $y_t$ are uncorrelated random walks. Since $x_t$ neither affects nor is affected by $y_t$, one would hope that the coefficient $\beta_1$ in the regression model

would converge in probability to zero, reflecting the lack of a relation between the series, and that the coefficient of determination ($R^2$) from this regression would also tend to zero. However, this is not the case. Regression methods detect correlations, and in non-stationary series (as Yule 1926 showed) spurious correlations may persist in large samples despite the absence of any connection between the underlying series. If two time series are each growing, for example, they may be correlated even though they are increasing for entirely different reasons and by increments that are uncorrelated. Hence a correlation between integrated series cannot be interpreted in the way that it could be if it arose among stationary series.

In (3), both the null hypothesis $\beta_1 = 0$ (implying $y_t = \beta_0 + \varepsilon_t$) and the alternative $\beta_1 \neq 0$ lead to false models, since the true DGP is not nested within (3). From this perspective it is not surprising that the null hypothesis, implying that $\{y_t\}$ is a white-noise process, is rejected; the autocorrelation in the random walk $\{y_t\}$ tends to project onto $\{x_t\}$, also a random walk and therefore also strongly autocorrelated. Tests based on badly specified models can often be misleading. Nonetheless, the spurious regression problem that appears among integrated processes is distinct from the inferential problems that may appear among stationary processes.
If $\{y_t\}$ and $\{x_t\}$ were made stationary by introducing a coefficient between zero and one on each of the lagged terms in (1) and (2), the OLS-estimated regression coefficient $\hat{\beta}_1$ and the non-centrality of its t-statistic would both converge to zero, even though (3) does not nest the true process (although the t-test would over-reject). That is, in the stationary case, regression on a set of variables independent of the regressand produces coefficients that converge to zero; in the non-stationary case, this need not be so.

To characterize precisely some of the analytical results for integrated processes, we refer to Phillips (1986). A simple case uses (1) and (2) above as the data generation process, with the assumptions concerning the error processes $u_t$ and $v_t$ capable of being weakened substantially. Then, estimation of the model (3) by ordinary least squares can be shown to lead to results that cannot be interpreted within the conventional testing procedure. To begin with, conventionally calculated 't-statistics' on $\beta_0$ and $\beta_1$ do not have t-distributions, do not have any limiting distributions, and in fact diverge in distribution as the sample size $T$ increases; hence, for any fixed critical value, rejection rates will tend to increase with sample size. The null hypothesis that is being rejected here is $H_0\colon \beta_1 = 0$; hence a rejection rate increasing with sample size implies that a null of no relationship between the series will tend to be rejected more and more frequently in larger samples. The t-statistics are of order $T^{1/2}$. Thus, the invalid inference that there is in fact a relationship is traceable directly to the non-stationarity in the data-generation process (1) and (2). When (1) and (2) are replaced with stable autoregressive processes, the non-centrality of the t-statistic to test $H_0\colon \beta_1 = 0$ converges to zero, reflecting the lack of relationship between the series. We will examine these asymptotic results in more detail in Section 3.5.2, after reviewing some of the necessary concepts from Wiener distribution theory.

Further analytical results concerning the distribution of the F-statistic for the hypothesis that $\beta_0 = \beta_1 = 0$, and those of standard autocorrelation test statistics, are also given in Phillips (1986). The F-statistic also diverges, leading to rejections growing with the sample size $T$, despite the lack of relation between $\{y_t\}$ and $\{x_t\}$; residual autocorrelation tests, however, provide an indication to the investigator that the model is mis-specified, by converging in probability to the values implied by a serial correlation coefficient of unity. That is, although the t- and F-statistics for the null hypothesis of interest are grossly misleading, some information which would suggest that the regression (3) is mis-specified is provided by a test for residual autocorrelation. This underlines again the importance of thoroughly testing any regression model for mis-specification, and basing inference only upon those in which no evidence of serious mis-specification is found; see e.g. Spanos (1986).
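The growth of the t-statistic with the sample size, and the warning signal given by residual autocorrelation, can both be seen in a small simulation. This is a rough sketch under assumptions made here (standard normal increments, 300 replications, the three sample sizes shown, and the hypothetical helper `spurious_summary`); it regresses one driftless random walk on another, independent one, with an intercept.

```python
import numpy as np

def spurious_summary(T, reps=300, seed=0):
    """Mean |t| on the slope, mean Durbin-Watson statistic, and the rejection
    rate at |t| > 1.96, for regressions between independent random walks."""
    rng = np.random.default_rng(seed)
    y = np.cumsum(rng.normal(size=(reps, T)), axis=1)
    x = np.cumsum(rng.normal(size=(reps, T)), axis=1)
    # Demeaning both series is equivalent to including an intercept.
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = (xd**2).sum(axis=1)
    b1 = (xd * yd).sum(axis=1) / sxx
    resid = yd - b1[:, None] * xd
    rss = (resid**2).sum(axis=1)
    se = np.sqrt(rss / (T - 2) / sxx)      # conventional standard error
    t = b1 / se
    dw = (np.diff(resid, axis=1)**2).sum(axis=1) / rss
    return np.abs(t).mean(), dw.mean(), (np.abs(t) > 1.96).mean()

results = {T: spurious_summary(T) for T in (50, 200, 800)}
for T, (mean_abs_t, mean_dw, reject) in results.items():
    print(f"T = {T:4d}: mean |t| = {mean_abs_t:6.2f}, "
          f"mean DW = {mean_dw:.3f}, rejection rate = {reject:.2f}")
```

The mean absolute t-statistic should rise with `T`, while the Durbin-Watson statistic should fall towards zero, which is the value implied by a residual serial correlation coefficient of unity.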
Consider now the following bivariate DGP, which extends (1) and (2) by allowing the inclusion of intercepts corresponding to potential drifts in the unit-root processes:
To simplify the analysis, we assume that the two shocks $\varepsilon_t$ and $v_t$ are independent at all points in time, which implies that
Assume also that the initial values $y_0$ and $z_0$ are zero. We will mainly consider the case where $\alpha = \gamma = 0$, so that both variables are simple random walks and $y_t$ and $z_t$ are the sums of all of their respective past shocks. When $\alpha$ and $\gamma$ are not zero, $y_t$ and $z_t$ depend on linear trends which reflect the accumulation of the successive intercepts. This completes the formulation of the statistical generating mechanism in (4) and (5), other than stating a specific form for the error distributions.
Turn now to the specification of an economic hypothesis. An economist may wish to describe the relationship between $\{y_t\}$ and $\{z_t\}$ with the model

where $\beta_1$ is interpreted as the derivative of $y_t$ with respect to $z_t$. Conventionally, equations such as (7) are estimated by ordinary least squares, treating $\{u_t\}$ as an IID process independent of $z_t$. Since $y_t$ and $z_t$ are causally unrelated here by construction, the derivative $\beta_1$ is zero in the sense that no relation exists; it is not true to say that setting $\beta_1$ to zero in (7) gives the true DGP. We want to examine the properties of the conventional estimation and hypothesis testing procedure applied to (7) when the unknown DGP is in fact (4)-(6).

Standard regression theory for models involving stationary regressors would suggest that plim $(\hat{\beta}_1) = \beta_1 = 0$, and that the probability of the absolute value of the t-statistic for $H_0\colon \beta_1 = 0$ exceeding 1.96 is 5 per cent. Because these regressors are integrated, however, this is not so. Reconsider (7). Since $\{y_t\}$ and $\{z_t\}$ are both integrated processes, (7) could be a well-defined regression with a non-zero $\beta_1$, if a relationship between these two variables existed. If however $\beta_1 = 0$, as is true here by (4)-(6), we have $y_t = \beta_0 + u_t$. Now since $\{y_t\}$ is I(1), $\{u_t\}$ must be I(1), which violates the assumption made about $\{u_t\}$ above. There is an internal inconsistency in conducting hypothesis testing in the standard way here, because it is not possible for the error term to be I(0) when $\beta_1$ is zero.

We can use Monte Carlo methods to examine typical results, in finite samples, of regressions such as (7) where $\{y_t\}$ and $\{z_t\}$ are independent non-stationary processes.
In the exercise that follows, we generate $\{y_t\}$ using the DGP (4), with $T = 100$, $\alpha = 0$, $y_0 = 0$, and $\sigma_\varepsilon = 1$. Similarly, $\{z_t\}$ is generated with $\gamma = 0$, $z_0 = 0$, and $\sigma_v = 1$ in (5). The random errors are normally distributed and generated independently, consistent with (6). At each replication, we record (i) the estimated coefficients, (ii) the estimated standard errors, (iii) whether the null hypothesis $\beta_1 = 0$ is rejected when conventional 5 per cent critical values of the t-distribution are used, (iv) the value of the sample correlation between $y$ and $z$, and (v) the value of the Durbin-Watson statistic for residual serial correlation. There are $N = 10{,}000$ replications using PC-NAIVE.

In this experiment, the Monte Carlo estimate of the mean value of $\hat{\beta}_1$ is $-0.012$, with Monte Carlo standard error (that is, the standard error of the Monte Carlo estimate of the mean of $\hat{\beta}_1$) of 0.006. Because we are estimating a mean using independent replications, a central limit theorem applies to the Monte Carlo results, so that the sample mean is asymptotically normally distributed (i.e. as $N \to \infty$). Hence we can reject the hypothesis that $E[\hat{\beta}_1] = 0$ at $T = 100$, despite the fact that the estimated mean value of $\hat{\beta}_1$ is relatively small. The frequency distribution of $\hat{\beta}_1$ is shown in Fig. 3.1, standardized to zero mean and unit variance. The sample standard deviation (SSD) of the values of $\{\hat{\beta}_{1i}, i = 1, 2, \ldots, 10{,}000\}$ is 0.63, where SSD is defined as
The distribution plotted in Fig. 3.1 is that of $f_i = (\hat{\beta}_{1i} - E[\hat{\beta}_1])/\mathrm{SSD}(\hat{\beta}_1)$, so that $\{f_i\}$ has a standard deviation of unity. The Monte Carlo standard error is $\sigma = \mathrm{SSD}(\hat{\beta}_1)/N^{1/2}$.

The probability of rejecting $H_0$ at the conventional significance level of 0.05 is 0.753; that is, even when the null hypothesis is true, we will reject it 75.3 per cent of the time, and therefore make the wrong decision most of the time. Figure 3.2 reveals that it is not the shape of the standardized t-distribution that is at fault. Rather, the actual statistic
FIG. 3.1. Frequency distribution of the spurious regression coefficient, standardized to zero mean, unit variance
FIG. 3.2. Frequency distribution of the 't-test' of $\beta_1 = 0$, standardized to zero mean, unit variance
calculated does not have a zero mean, unit variance distribution. In fact, where $\bar t$ is the mean $t$-statistic, $\bar t = -0.12$ (0.07) and $\mathrm{SSD}(t) = 7.3$. Values of $|t| > 1.96$ are very likely with such a large standard deviation, and the empirical critical values in the experiment that ensure a test with a size of 5 per cent are approximately $\pm 14.5$. However, these critical values are not appropriate at other sample sizes.

This is the spurious regression problem: regression of an integrated series on another unrelated integrated series produces $t$-ratios on the slope parameter which indicate a relationship much more often than they should at the nominal test level. The phenomenon is of course not specific to this sample size, and in particular the problem will not disappear as the sample size is increased. The distribution of the $t$-ratio will, however, depend on the sample size; Fig. 3.3 shows the graph of $E[\hat\beta_1 \mid T]$ for $T = 20, 21, \ldots, 100$, together with $\pm 2\sigma$ at each $T$, where $\sigma$ denotes the Monte Carlo standard error in the graph. The bias is significantly different from zero only at the larger sample sizes, but does not change noticeably with $T$. Moreover, the value of $\sigma$ does not fall greatly with $T$, which differs from what one would expect if conventional asymptotic theory were applicable.

Figure 3.4 records the mean value of the regression coefficient together with the SSD and the mean estimated standard error (ESE) of the coefficient.
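The experiment described above can be imitated in a few lines. What follows is a minimal sketch, not the PC-NAIVE code used for the reported results: it assumes that (4) and (5) are driftless random walks with unit-variance normal shocks and that (7) is a regression of $y_t$ on a constant and $z_t$. The function name `spurious_regression_mc`, the seed, and the smaller number of replications (2,000 rather than 10,000) are choices made here for illustration.

```python
import numpy as np


def spurious_regression_mc(T=100, n_rep=2000, seed=0):
    """Regress one driftless random walk on an independent one, n_rep times.

    Returns arrays of slope estimates, conventional OLS standard errors,
    and the corresponding t-ratios for H0: beta_1 = 0.
    """
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_rep)
    ses = np.empty(n_rep)
    for i in range(n_rep):
        # Independent random walks with y_0 = z_0 = 0 and unit-variance shocks.
        y = np.cumsum(rng.standard_normal(T))
        z = np.cumsum(rng.standard_normal(T))
        X = np.column_stack([np.ones(T), z])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        s2 = resid @ resid / (T - 2)           # residual variance estimate
        cov = s2 * np.linalg.inv(X.T @ X)      # textbook OLS covariance matrix
        slopes[i] = coef[1]
        ses[i] = np.sqrt(cov[1, 1])
    return slopes, ses, slopes / ses


slopes, ses, t_ratios = spurious_regression_mc()
summary = {
    "mean slope": slopes.mean(),
    "SSD": slopes.std(ddof=1),
    "mean ESE": ses.mean(),
    "MCSE of mean": slopes.std(ddof=1) / np.sqrt(len(slopes)),
    "rejection frequency": np.mean(np.abs(t_ratios) > 1.96),
}
for name, value in summary.items():
    print(f"{name:>20s}: {value: .3f}")
```

With settings of this kind one would expect the rejection frequency to lie far above the nominal 5 per cent and the SSD to be several times the mean ESE, in line with the figures quoted in the text; the exact numbers depend on the seed and the number of replications.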
There is a great difference between the two measures of uncertainty: ESE is the estimated standard error of the coefficient $\hat\beta_1$ that the investigator would obtain on average, in a regression of the form of (7) given the DGP in (4)-(6); the SSD is the Monte Carlo estimate of the true standard deviation of this parameter estimate. As Fig. 3.4 shows, the economist would report a severe underestimate of the uncertainty in the estimate of $\beta_1$.

The mean value of the $t$-statistic shown in Fig. 3.5 changes little as $T$ increases from 20 to 100, but the standard deviation of $t$ increases

FIG 3.3. Mean value of the spurious regression coefficient with $\pm 2\sigma$ (the Monte Carlo standard error)
FIG 3.4. Mean value of the spurious regression coefficient with the estimated standard error (ESE) and sampling standard deviation (SSD) across sample sizes

FIG 3.5. Mean value of the '$t$-test' of $H_0\colon \beta_1 = 0$, with $\pm 2\,$SSD (the Monte Carlo based sampling standard deviation)

rapidly. Thus the problem becomes worse as $T$ increases; rejection of the null hypothesis of no relation between the $y_t$ and $z_t$ series becomes more likely, despite one's initial intuition that, if the series really are unrelated, this feature should eventually dominate as $T \to \infty$. Figure 3.6 records the rejection frequencies for every sample size considered in the simulation exercise; $\Pr(|t(\beta_1 = 0)| \geq 2)$ is 0.30 at $T = 20$, already greater than the nominal size of the test, and the problem worsens as $T$ is increased because the rejection frequencies also increase steadily with $T$.

The outcomes of the simulations reveal the dangers of using critical values justified in one context (e.g. IID processes) to conduct inferences with statistics computed from data generated by a very different probability mechanism.

With the DGP in (4) and (5), the problem of discriminating between genuine interdependence and spurious regressions is difficult to solve
FIG 3.6. Rejection frequency of the '$t$-test' of $H_0\colon \beta_1 = 0$ when the hypothesis is true

because, under both the null and the alternative hypotheses, $y_t$ and $z_t$ have a high sample correlation (denoted $R$). In both cases we reject $H_0\colon \beta_1 = 0$ most of the time in large samples.

An early analysis of the spurious regressions problem is due, as we have said, to Yule (1926), who also used Monte Carlo simulations. Yule's observations on the distribution of $R$ remain noteworthy and may be considered in three parts, representing three different situations: (i) where the $\{y_t\}$ and $\{z_t\}$ series are both mean-zero IID processes; (ii) where they are IID processes integrated once; and (iii) where they are IID processes integrated twice.$^4$ In each case, the figures given below represent the frequency distribution of $R$ obtained from estimating equation (7) 10,000 times with a sample size of 100; $\beta_0 = \beta_1 = 0$ in all the simulations (except for an irrelevant location change in case (i), owing to a program restriction, when $\beta_0 = 1$). The following features of the different cases may be observed.

Case (i). When both variables are I(0) and IID, as Fig. 3.7 shows, $R$ is well behaved and has a symmetric, nearly Gaussian, distribution centred on zero although bounded by $\pm 1$.

Case (ii). When both variables are I(1) and the first differences are IID, the density of $R$, $f_R(r)$, is closer to a semi-ellipse with excess frequency at both ends of the distribution (see Fig. 3.8). Consequently, values of $R$ well away from zero are far more likely here than in case (i).

Case (iii). When both variables are I(2), the second differences are IID. In this situation (see Fig. 3.9) $f_R(r)$ becomes U-shaped, and the

$^4$ Order of integration was defined informally in Chapter 1; it is explored formally in Section 3.3 below.
FIG 3.7. Frequency distribution for the correlation $R$ between two IID independent processes

FIG 3.8. Frequency distribution for $R$ between two I(1) processes with independent IID first differences

most likely correlations between two such I(2) unrelated series are $\pm 1$, which is precisely what would occur if the series were truly related.

If a test statistic, based on $R$, assumes the distribution to be the one applying to case (i) when in fact the correct distribution is the one that applies to case (ii), the rejection frequency will greatly exceed the nominal size of the test (given by the expected number of rejections if (i) were true). Case (iii) is even worse: the least likely outcome here would seem to be the discovery of the truth. There is almost no
FIG 3.9. Frequency distribution of $R$ for two I(2) processes with independent IID second differences

probability of finding $R \approx 0$ in this last case, although the population value anticipated under the null is zero. The most likely sample value is $R \approx \pm 1$.

If the degrees of integration of the data series are unknown, mixtures of cases (i)-(iii) are possible. For $T = 100$, Table 3.1 summarizes the outcomes. Denote the order of integration of $y_t$ and $x_t$ by $d_1$ and $d_2$ respectively, and let $d = \max\{d_1, d_2\}$. The mean of $R$ is close to zero in every case, but its standard deviation increases with $d_1 + d_2$. The estimate of the mean of $\hat\beta_1$ is relatively small compared with the SSD, especially when

TABLE 3.1. Features of regressions among series with various orders of integration

Type$^a$      $d_1 + d_2$   $R$       SSD($R$)   $\hat\beta_1$   ESE      SSD      $\Pr(|t(\beta_1 = 0)| > 2)$
I(0), I(0)    0              0.0004   0.101       0.0004         0.101     0.102   0.0493
I(1), I(1)    2             -0.006    0.490      -0.009          0.102     0.631   0.7570
I(2), I(2)    4              0.004    0.818       0.015          0.103     1.974   0.9406
I(0), I(1)    1              0.0004   0.099      -0.0001         0.031     0.033   0.0458
I(1), I(0)    1              0.0008   0.101       0.003          0.384     0.417   0.0486
I(2), I(1)    3             -0.023    0.613      -1.84           3.84     33.52    0.8530
I(1), I(2)    3             -0.013    0.610      -0.0005         0.0054    0.036   0.8444

$^a$ The notation I($j$), I($k$) describes a regression of an I($j$) variable on an I($k$) variable, $j, k = 0, 1, 2$. Thus, I(0), I(0) is a case (i) regression, I(1), I(1) a case (ii) regression, and I(2), I(2) a case (iii) regression. The remaining cases are mixtures of the primitive (i)-, (ii)-, and (iii)-type regressions.
$d_1 = d_2$. The mean ESE reported by OLS is virtually unaffected by $d$ when $d_1 = d_2$, but varies greatly when $d_1 \neq d_2$. The SSD also increases as $d_1 + d_2$ increases unless the regressor is of higher order of integration than the regressand, namely when $d_1 < d_2$. The ESE underestimates the SSD by a factor in the neighbourhood of 10 to 20 for $d = 2$. The probability of falsely rejecting the null that $\beta_1 = 0$ rises to about 94 per cent as $d$ increases.

Thus, the difficulties are not restricted to spurious regressions generated by regressing independent series of the same order on each other. Severe problems are revealed in regressions of an I(2) on an I(1) series (or vice versa). Less serious problems occur in regressions of I(1) on I(0) series (or vice versa). Figure 3.10 reports the distribution of $R$ for an I(1) on an I(2) series and reveals a U-shaped distribution, as with two I(2) series. (This also occurs for an I(2) on I(1) series.) Figure 3.11 shows the distribution of the least-squares coefficient estimate for an I(2) on I(1) series; the distribution here is long-tailed but peaked and is distinctly non-normal. The $t$-rejection frequencies are similar in these two cases and lie between the rejection frequencies given by the case (ii)- and case (iii)-type regressions. The distribution of $R$, when one of the series is I(0), is similar to the distribution of this statistic when both series are I(0).$^5$ Overall, we see a pattern of potential nonsense once both time series become integrated.

FIG 3.10. Frequency distribution of $R$ between an I(1) and an I(2) process with independent IID first and second differences respectively

$^5$ There is good reason, as we shall see in Ch. 6, for this similarity in behaviour. In a regression of one I(0) series on another I(0) series, independent of the first series, the estimate of the regression coefficient $\beta_1$ tends in probability to zero. However, when an I(0) series is regressed on an I(1) series, the only way in which OLS can make the regression consistent and minimize the sum of squares is to drive the coefficient on the I(1) variable to zero. Thus equivalent results arise. These possibilities do not occur when both series are integrated.
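The widening of the distribution of $R$ across cases (i)-(iii) is easy to reproduce. The following is a minimal sketch under assumptions of our own: standard normal IID shocks, $T = 100$, 2,000 replications, and a hypothetical helper `correlation_draws` that integrates each series $d$ times by cumulative summation.

```python
import numpy as np


def correlation_draws(d, T=100, n_rep=2000, seed=0):
    """Sample correlations between two independent I(d) series, d = 0, 1 or 2."""
    rng = np.random.default_rng(seed)
    r = np.empty(n_rep)
    for i in range(n_rep):
        y = rng.standard_normal(T)
        z = rng.standard_normal(T)
        for _ in range(d):                 # integrate each series d times
            y, z = np.cumsum(y), np.cumsum(z)
        r[i] = np.corrcoef(y, z)[0, 1]
    return r


# Standard deviation of R for cases (i), (ii) and (iii).
ssd_by_order = {d: correlation_draws(d).std(ddof=1) for d in (0, 1, 2)}
for d, ssd in ssd_by_order.items():
    print(f"I({d}), I({d}): SSD(R) = {ssd:.3f}")
```

The standard deviations should increase with the order of integration, which is the pattern reported in the SSD($R$) column of Table 3.1 for the same-order cases; a histogram of each set of draws would show the shapes described for Figs. 3.7-3.9.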
FIG 3.11. Histogram and estimated density for the regression coefficient of an I(2) series regressed on an I(1) series

Phillips (1986) also demonstrates that the Durbin-Watson statistic calculated from the residuals of (7) converges to zero as the sample size tends to infinity. When the two series are genuinely related, the DW statistic converges to a non-zero value. The behaviour of the DW statistic therefore provides one way of discriminating between spurious and genuine regressions, but a test based on this statistic may have poor power properties in small samples. Phillips's analytical results are useful in understanding the simulation evidence that Granger and Newbold (1974) advanced, bearing on the regression $R^2$ as well as the DW statistic. These authors suggested treating any regression for which $R^2 > DW$ as one that is likely to be spurious. This could be interpreted as a sign of a lack of any equilibrium relationship among the variables in the regression, which in turn implies a non-stationary error term and so very strong autocorrelation in the regression residuals.

Overall, simulation and analytical results show that the problem of drawing inference from non-stationary data is a serious one; OLS regression interpreted in the standard fashion can be very misleading. Resolution of this problem will lead us into a more detailed consideration of the integration properties of time series, but first we will examine the practice of de-trending time series.
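The behaviour of the DW statistic and the $R^2 > DW$ rule of thumb can be examined by simulation. The sketch below assumes that the regression is of one driftless random walk on a constant and another, independent, random walk; the helper `r2_and_dw`, the sample sizes, and the 300 replications per sample size are illustrative choices rather than settings taken from the studies cited.

```python
import numpy as np


def r2_and_dw(y, z):
    """R-squared and Durbin-Watson statistic from regressing y on (1, z)."""
    X = np.column_stack([np.ones(len(y)), z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(e) ** 2) / (e @ e)
    return r2, dw


rng = np.random.default_rng(1)
mean_dw, share_flagged = {}, {}
for T in (50, 200, 800):
    stats = np.array([
        r2_and_dw(np.cumsum(rng.standard_normal(T)),
                  np.cumsum(rng.standard_normal(T)))
        for _ in range(300)
    ])
    mean_dw[T] = stats[:, 1].mean()
    share_flagged[T] = np.mean(stats[:, 0] > stats[:, 1])   # share with R^2 > DW
    print(T, round(mean_dw[T], 3), round(share_flagged[T], 3))
```

The mean DW statistic should shrink as $T$ grows, and the share of regressions flagged by $R^2 > DW$ should rise accordingly.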
3.2. Trends and Random Walks

One potential solution suggested for dealing with integrated series was to assume that the source of non-stationarity could be captured by, or
approximated by, a deterministic function of time. If this were so, it would be possible to break up an integrated series into a deterministic (and therefore completely predictable) component, and a stationary series of deviations from this 'trend'. Methods for analysing stationary series could be applied to the deviations, and the whole series thereby modelled.

Unfortunately, subsequent evidence from Monte Carlo and analytical studies (e.g. Phillips 1986) showed that inference in models that contained time trends could not be carried out in the straightforward way that practitioners had hoped. First of all, time trends would appear to be statistically significant in models where they should not be, much more often than conventional test sizes would suggest. That is, the standard statistics (especially $t$-statistics) for the hypothesis that the time trend should not appear do not have standard $t$-distributions.

Second, deterministic trends did not solve the spurious regression problem, even leaving aside the difficulty involved in deciding whether or not they should be present in the regression model. The reason is that spurious correlation will tend to emerge even with deterministically 'de-trended' random walks.

We will now look at some more precise questions and their answers. The analytical results that we summarize are found in Durlauf and Phillips (1988); Monte Carlo studies of models with time trends present can be found in Said and Dickey (1984) and Schwert (1989). Section 3.5.1 describes the asymptotic theory applicable.

The two questions that we will address are: (i) What problems of inference appear in using time trends?
and (ii) Can de-trending yield stationary series and therefore a solution to the problem of spurious regression?

Consider a series $\{y_t\}$ which is generated according to the random walk

$$y_t = y_{t-1} + \varepsilon_t. \qquad (8)$$

An investigator faced with such a series (without, of course, knowing this data-generation process precisely) might decide to attempt to deal with the apparent non-stationarity by de-trending: that is, by including a time trend in a regression equation or by removing the fitted values from a regression on time from the series. The investigator might therefore use the regression model

$$y_t = c + \gamma t + u_t. \qquad (9)$$

As Durlauf and Phillips (1988) show, there are once again problems in conducting inference in this environment. When $c = \gamma = 0$, by (8), $\hat\gamma$ has a degenerate limiting distribution at 0 (as in a stationary model with a trend), whereas $\hat c$ has a divergent distribution; that is, the unscaled
parameter estimate $\hat c$ has a variance that grows with the sample size. We will deal more rigorously with these limiting distributions later in the chapter.

Moreover, inference concerning $\gamma$ will be unreliable even though the estimate of that parameter is converging to its true value of zero. While the parameter estimate converges to zero, the $t$- and $F$-statistics for the hypothesis $H_0\colon \gamma = 0$ do not converge to zero, and are in fact asymptotically unbounded with probability 1. (That is, there exists some $\delta > 0$ such that, for $\xi$ representing either of the test statistics, $T^{-\delta}\xi \to \infty$ with probability 1.) As in the spurious regression case above, the investigator must look to mis-specification tests (in particular, tests for autocorrelated errors) for a suggestion that there is something wrong with the regression model.

Since the spurious regression problem between integrated series remains with deterministically de-trended series, inclusion of a time trend is not a solution. Consider again the DGP (1)-(2), and an investigator who chooses this time to attempt to 'take account of' the potential non-stationarity in these series by including a time trend in the regression. The model is therefore a regression of one series on a constant (coefficient $c$), a linear time trend (coefficient $\gamma$), and the other series (coefficient $\beta$); this is equation (10).
The results from (10) are much as one would expect given those implied by (3) and (9) above (see, again, Durlauf and Phillips 1988). As before, the distribution of $\hat c$ diverges and $\hat\gamma$ tends in probability to zero, but $\hat\beta$ has a non-degenerate distribution asymptotically (i.e. does not converge to zero). Tests for $H_0\colon \beta = 0$ diverge in distribution, tending to lead the investigator falsely to reject this null hypothesis. Estimation of the regressions in (9) and (10) will produce substantial residual autocorrelation. It might be thought that modelling the autoregressive error using, say, the Cochrane-Orcutt algorithm should remove the unit root and thereby allow valid tests of $\beta = 0$ in (10). Granger and Newbold (1977) present Monte Carlo evidence suggesting that such a strategy is ineffective in practice when based on conventional critical values.

In summary, the problem of falsely concluding that a relationship exists between two unrelated non-stationary series, a problem that persists even as the sample size grows without bound, is not alleviated by an attempt to remove a trend from the underlying series.

In working with non-stationary data, the investigator must be particularly careful. While one solution is to transform the series to achieve stationarity (at the cost of losing some information about long-run behaviour, as we shall see below), it is essential that the investigator be aware of the non-stationarity in the data if procedures for modelling data of this type are to be applied appropriately. As it happens, testing
for non-stationarity is also potentially misleading, in that non-standard distributions appear where the data are non-stationary, so that inferential procedures must differ from those applicable when the series are stationary.

Our discussion has therefore led us to two major areas which must be understood when working with potentially non-stationary data. The first is composed of techniques for determining whether or not series are stationary (more generally, the order of integration of a series). Chapter 4 will concentrate on these techniques, which we use to decide whether methods of inference for non-stationary data are necessary to overcome the problems that have been illustrated to this point. Methods that can be used with non-stationary data comprise the second area that we should examine, and form the subject matter of Chapter 6. Moreover, it must be noted that, in spite of the inadequacy of deterministic trends as models for series that are in fact random walks, it remains conceivable that economic time series do actually contain such deterministic components; some of the tests that we consider later will allow for this possibility.
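The claim that time trends appear significant far too often when fitted to a random walk can be checked numerically. The following is a minimal sketch, assuming a driftless random walk with standard normal shocks and a regression on a constant and a linear trend; the helper `trend_t_ratio`, the sample sizes, and the 500 replications are illustrative choices.

```python
import numpy as np


def trend_t_ratio(y):
    """Conventional t-ratio on the trend in a regression of y on (1, t)."""
    T = len(y)
    X = np.column_stack([np.ones(T), np.arange(1, T + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    s2 = e @ e / (T - 2)                               # residual variance estimate
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])    # OLS standard error of the trend
    return coef[1] / se


rng = np.random.default_rng(2)
rejection = {}
for T in (50, 200, 800):
    t = np.array([trend_t_ratio(np.cumsum(rng.standard_normal(T)))
                  for _ in range(500)])
    rejection[T] = np.mean(np.abs(t) > 1.96)           # nominal 5 per cent test
    print(T, round(rejection[T], 3))
```

Although the true trend coefficient is zero in every replication, the rejection frequency should be far above 5 per cent and should rise with $T$, consistent with a test statistic that is unbounded asymptotically.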
3.3. Some Statistical Features of Integrated Processes

Before we consider testing for integration in time series, we must first define orders of integration and consider some of the properties that integrated series usually display.

DEFINITION 1.$^6$ A series with no deterministic component and which has a stationary and invertible autoregressive moving average (ARMA) representation after differencing $d$ times, but which is not stationary after differencing only $d - 1$ times, is said to be integrated of order $d$, denoted $x_t \sim I(d)$.

The definition can be extended to allow for polynomials in time of the form $\sum_{j=0}^{p} \mu_j t^j$. When $\Delta^d x_t$ contains a polynomial of order $p$ in time, $x_t$ depends on a polynomial of order $p + d$.

The properties of series integrated of strictly positive orders differ substantially from those of I(0) series. Consider a series containing a single unit root:

$^6$ This definition is similar to that of Engle and Granger (1987), but rules out some anomalies. Consider the stationary, I($-1$), series $z_t = \varepsilon_t - \varepsilon_{t-1}$, where $\varepsilon_t$ is I(0). Integrating $\{z_t\}$ gives a series that is I(0); but if we call $\{z_t\}$ itself an I(0) series, then we would expect its integral $\{\varepsilon_t\}$ to be I(1).
or, after integrating, in the form (11a), where $S_t = \sum_{j=0}^{t-1} \rho^j u_{t-j}$. If $\rho > 1$, $y_t$ is non-stationary, and if $\rho = 1$, it is integrated of order 1 (i.e. I(1)) since $y_t$ is then the sum of all previous errors $\{u_j\}$, $j = 1, \ldots, t$. The sequence $\{u_t\}$ need not be an innovation sequence; $u_t$ may itself follow a stationary ARMA($p$, $q$) process, for example. Below we will assume a fairly general set of properties for the $\{u_t\}$ process. First, however, we consider two special cases of (11a), numbered (12) and (13).

In (12), to ensure stationarity, let us assume that $y_0$ is drawn from the unconditional distribution of $y$; that is, $y_0 \sim \mathrm{IID}[0, \sigma_\varepsilon^2/(1 - \rho^2)]$.

It is interesting to compare several properties of these series, viewed as possible DGPs. Table 3.2 summarizes some of the differences between autoregressive series that are stationary, and those containing one (or more) unit roots (which require differencing to be made stationary). The properties in the right-hand column of the table hold for integrated series generally. Nonetheless, the specification (13) is a special one, and in a general treatment we want a less restrictive

TABLE 3.2. Some properties of stationary and integrated processes

                                             DGP (12) (I(0))                                      DGP (13) (I(1))
Variance                                     Finite ($\sigma_\varepsilon^2 (1 - \rho^2)^{-1}$)     Unbounded (grows as $t\sigma_\varepsilon^2$)
Conditional variance                         $\sigma_\varepsilon^2$                               $\sigma_\varepsilon^2$
Autocorrelation function at lag $i$          $\rho_i = \rho^i$                                    $\rho_i = \sqrt{1 - (i/t)} \to 1\ \forall i$ as $t \to \infty$
Expected time between crossings of $y = 0$   Finite                                               Infinite
Memory$^a$                                   Temporary                                            Permanent$^b$

$^a$ We say that a series has a permanent memory if the effect of a shock does not disappear as $t \to \infty$.
$^b$ In a multivariate context, an integrated process may have some components that do not remain in the series indefinitely. If a series is integrated, there must be at least one component that will have permanent effects, but there may be others with temporary memory. For example, a random walk process plus an unrelated stationary process would yield an integrated process, but memory would be permanent only for the random walk component.
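The variance and autocorrelation entries of Table 3.2 can be checked with a short derivation. This is a sketch under assumptions that are not visible in the text as reproduced here: that (12) is the stationary AR(1) process $y_t = \rho y_{t-1} + \varepsilon_t$ with $|\rho| < 1$, that (13) is the driftless random walk $y_t = y_{t-1} + \varepsilon_t$ with $y_0 = 0$, and that $\varepsilon_t \sim \mathrm{IID}(0, \sigma_\varepsilon^2)$.

```latex
% Assumed DGPs (not displayed above):
%   (12)  y_t = \rho y_{t-1} + \varepsilon_t,  |\rho| < 1
%   (13)  y_t = y_{t-1} + \varepsilon_t,       y_0 = 0
\begin{align*}
\text{(12):}\quad
  & \operatorname{var}(y_t) = \frac{\sigma_\varepsilon^2}{1-\rho^2}, \qquad
    \rho_i = \frac{\operatorname{cov}(y_t, y_{t-i})}{\operatorname{var}(y_t)} = \rho^{i}; \\[4pt]
\text{(13):}\quad
  & y_t = \sum_{j=1}^{t} \varepsilon_j
    \;\Longrightarrow\;
    \operatorname{var}(y_t) = t\,\sigma_\varepsilon^2, \qquad
    \operatorname{cov}(y_t, y_{t-i}) = (t-i)\,\sigma_\varepsilon^2, \\
  & \rho_i = \frac{(t-i)\,\sigma_\varepsilon^2}
                  {\sqrt{t\,\sigma_\varepsilon^2 \cdot (t-i)\,\sigma_\varepsilon^2}}
           = \sqrt{1 - \frac{i}{t}}
    \;\longrightarrow\; 1 \quad \text{as } t \to \infty \text{ for each fixed } i .
\end{align*}
```

In both cases the variance of $y_t$ conditional on $y_{t-1}$ is $\sigma_\varepsilon^2$, so the two processes differ in their unconditional, not their one-step-ahead, uncertainty.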
specification which will cover a greater variety of series. We can find one by adopting (11a), for example, but the properties of the error term remain to be specified since (11a) requires only that it be I(0). We do not, however, wish to adopt the very restrictive specification in (12) and (13), whereby the error is required to be orthogonal to its own past. However, some restrictions must be placed on the errors to guarantee non-degenerate limiting distributions for the statistics described below. A weak set of restrictions which suffices for many purposes is given below and is discussed in detail by Phillips (1987a); the model (11), supplemented with error terms $\{u_t\}$ required to meet only these conditions, is capable of representing a wide variety of univariate data-generation processes, including those with exogenous variables, as long as the exogenous variables are I(0) and so are capable of being subsumed in $\{u_t\}$ in (11). These conditions are given in (16a)-(16d) below.

Series that are I(0) have the important property that certain functions of the sample values converge to constants as the number of sample values increases without bound. For example, laws of large numbers (see e.g. White 1984) guarantee the convergence in probability of the sample mean to the true mean of the process for a class of processes that includes stationary time series. Other functions of the sample can have constant probability limits as well; for example, a variance estimator may converge in probability to the true variance of the series. One of the primary facts about integrated processes, however, is that convergence theorems of this type, where convergence is to constants, generally fail to hold, and such convergence theorems as can be derived will involve convergence of sample moments to random variables. Analytical results concerning limiting distributions must therefore be based on an extended asymptotic theory.

For a vector time series $x_t$ with $n$ components, we define $x_t \sim I(d)$ if $d$ is the highest order of integration of the individual series: $x_{it} \sim I(d_i)$ and $d = \max(d_1, d_2, \ldots, d_n)$.
3.4. Asymptotic Theory for Integrated Processes

We will now review and develop some of the asymptotic theory appropriate to integrated random variables. We use the Wiener processes introduced in Chapter 1, so that the properties of estimators and test statistics for I(1) series will be more readily interpretable. Most of our attention will be devoted to the statistical properties of series containing a single unit root (i.e. I(1) processes), extending to the more general I($d$) class only where necessary.

Begin by considering the following data generation process:
$$\Delta y_t = \mu + u_t, \qquad (14)$$

where $\{u_t\}_1^\infty$ is a weakly stationary, mean-zero innovation sequence. After integrating the process in (14),

$$y_t = y_0 + \mu t + S_t, \qquad S_t = \sum_{i=1}^{t} u_i. \qquad (15)$$

In general, I(1) series such as $y_t$ are linear functions of time, with a slope of zero where $\mu = 0$. The deviations from this function of time are I(1), being the accumulation of past random shocks: the effects of these shocks do not die out.

For example, let $u_t \sim \mathrm{IN}(0, 1)$. Then, for $0 \leq \tau \leq T$, we have that $E(S_T - S_\tau) = 0$, and

$$E(S_T - S_\tau)^2 = E\Bigl(\sum_{t=\tau+1}^{T} u_t^2\Bigr) = T - \tau,$$
because $\sum_{t=\tau+1}^{T} u_t^2$ is distributed as $\chi^2$ with $T - \tau$ degrees of freedom. Hence $S_T \sim N(0, T)$, a random walk with independent normally distributed increments.

In general, the formulation in (14) need not assume that the $\{u_t\}$ are white-noise disturbances, but only that they satisfy conditions given in (16) below. To complete the specification of the DGP, we impose these restrictions on $\{u_t\}_1^\infty$. The conditions are strong enough to sustain the derivation of non-degenerate limiting distributions for the statistics to be discussed below and weak enough to be relevant for many economic time series. This set of conditions is defined in detail in Phillips (1987a), and can be summarized as follows.

Let $\{u_t\}_1^\infty$ be a stochastic process such that, for $S_T = \sum_{t=1}^{T} u_t$,

• $E(u_t) = 0$ for all $t$; (16a)
• $\sup_t E(|u_t|^\beta) < \infty$ for some $\beta > 2$; (16b)
• $\sigma^2 = \lim_{T \to \infty} E(T^{-1} S_T^2)$ exists, and $\sigma^2 > 0$; (16c)
• $u_t$ is strongly mixing, with mixing coefficients $\{\alpha_m\}$ such that $\sum_{m=1}^{\infty} \alpha_m^{1 - 2/\beta} < \infty$; (16d)
• for stationary $\{u_t\}$, $\sigma^2$ can be written as $\sigma^2 = E(u_1^2) + 2\sum_{k=2}^{\infty} E(u_1 u_k)$.

Each of these conditions relates to an important aspect of the behaviour of the $\{u_t\}$ process. The first, in (16a), is the conventional one of having a zero unconditional mean such that all drawings of $\{u_t\}$ have the same mean. Next, (16b) is sufficient to ensure the existence of the variance and a higher non-integer moment of $\{u_t\}$ $\forall t$. However, it is a weak condition in that $E(|u_t|^\beta)$ is not assumed to be constant, so that heterogeneity is allowed in the error process. Often, third or even
fourth moments will be assumed to exist, thereby ensuring that (16b) holds: normality, for example, entails that all moments of finite order exist. The third condition is needed to ensure non-degenerate limiting distributions, and either (16c) or a closely related condition is required in most central limit theorems to guarantee that information continues to accrue. Finally, we discussed mixing conditions in Chapter 1, and these serve as a useful intermediate assumption which ensures ergodicity yet allows a considerable degree of temporal dependence in the $\{u_t\}$ process. The $\beta$ in (16b) is the same as that in (16d): the more heterogeneity that is allowed, the less the possible temporal dependence, and vice versa.

These conditions imply that the process generating the error term in (14) may take any one of a large number of forms. Possible examples include most stationary ARMA models, and ARMAX models where the exogenous variables are I(0). Note that $\sigma^2 = \sigma_u^2$ only if the error term in (14) is IID(0, $\sigma_u^2$). This restrictive case is of interest in that it is the case for which most limiting distributions have been tabulated; nevertheless, it will not hold in many empirical applications.$^7$ For example, if $u_t$ is the MA(1) process $u_t = \varepsilon_t - \theta\varepsilon_{t-1}$, then $\sigma_u^2 = \sigma_\varepsilon^2(1 + \theta^2)$, whereas $\sigma^2 = \sigma_\varepsilon^2(1 - 2\theta + \theta^2) = \sigma_\varepsilon^2(1 - \theta)^2$.

As noted above, ordinary probability limits and central limit theorems do not apply in the case of integrated processes I($d$), $d \geq 1$. In order to derive limiting distributions, it is necessary as in the stationary case to use sequences of random variables, the convergence of which is ensured by appropriate transformations.
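The MA(1) example given above can be verified directly from the definition of $\sigma^2$ as the limit of $E(T^{-1}S_T^2)$. The sketch below assumes only that $\varepsilon_t$ is white noise with variance $\sigma_\varepsilon^2$; the autocovariance symbols $\gamma_k$ are notation introduced here for the derivation.

```latex
% gamma_k denotes the k-th autocovariance of u_t = eps_t - theta * eps_{t-1}.
\begin{align*}
\gamma_0 &= E(u_t^2) = \sigma_\varepsilon^2(1+\theta^2) = \sigma_u^2, \qquad
\gamma_1 = E(u_t u_{t-1}) = -\theta\,\sigma_\varepsilon^2, \qquad
\gamma_k = 0 \quad (k \ge 2), \\[4pt]
\sigma^2 &= \lim_{T\to\infty} E\bigl(T^{-1} S_T^2\bigr)
          = \lim_{T\to\infty} \Bigl[\gamma_0 + 2\,\tfrac{T-1}{T}\,\gamma_1\Bigr]
          = \gamma_0 + 2\gamma_1 \\
         &= \sigma_\varepsilon^2\,(1 - 2\theta + \theta^2)
          = \sigma_\varepsilon^2\,(1-\theta)^2 .
\end{align*}
```

The two variances coincide only when $\theta = 0$, that is, when $u_t$ is itself white noise; for $\theta$ close to 1 the long-run variance $\sigma^2$ is close to zero even though $\sigma_u^2$ is not.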
Th e evolutio n o f a time-serie s proces s dominated b y a growin g secular component ca n b e suitabl y smoothed b y a choic e o f horizonta l an d vertica l axe s whic h control fo r explosivit y an d curvature, respectively . Mor e precisely , i n th e 1(1 ) framework , we nee d to focu s o n th e sequenc e {S t} whic h ca n b e transforme d suc h tha t eac h element o f th e sequenc e lie s in th e spac e o f real-value d function s o n th e interval [0 , 1] whic h are right-continuous , an d hav e finit e lef t limits ; this space i s denoted D(0 , 1). Th e transformatio n is achieved b y substituting a concentrate d serie s fo r th e stochasti c componen t S t o f th e origina l series. I n particular , we will map a transformation of S, onto th e Wiene r process. Th e firs t step , a s w e sa w i n Chapte r 1 , i s t o ma p th e interva l [0, T ] ont o th e fixe d interva l [0 , 1] by dividin g th e latte r into T + 1 parts at 0 , 1/T, 2/T, . . ., 1 ; next , w e construc t a ne w rando m functio n o n [0, 1] (se e Phillip s 1987a) . A suitabl e concentrate d serie s i s then
7 The parameter $\sigma^2$ has a clear interpretation in the frequency domain: it is equal to $2\pi f_u(0)$, where $f_u(0)$ is the spectral density at frequency zero.
with $(t - 1)/T \le r < t/T$ and $t = 1, 2, \ldots, T$, so that $r \in [0, 1]$. Here $[z]$ represents the integer part of any rational number $z$. In this way we are able to concentrate the original horizontal axis of 1 to $T$ to the closed interval [0, 1], indexing the observations by $r$. If, for example, $T = 100$, the original observation $y_{50}$ will be indexed by $r \in [0.50, 0.51)$, and so on. The choice of the power of $T$ in the denominator of (17) is such that the series $R_T$ neither explodes nor converges to zero. Since, for example, when $u_t$ is IID$(0, \sigma_u^2)$, then $\mathrm{var}(S_T) = \sigma_u^2 T$, the standard deviation of $S_T$ will be $O(T^{1/2})$, and this is precisely the power chosen to modify the ordinate axis. We then have that, as $T$ grows without bound,
$$R_T(r) \Rightarrow W(r). \qquad (18)$$
The symbol $\Rightarrow$ is used here to signify weak convergence of the associated probability measure,8 while $W(r)$ is a scalar Wiener process with variance $r$, also known as a Brownian motion process, which lies in the space C[0, 1] of all real-valued continuous functions on the interval [0, 1]. Result (18) is known as Donsker's theorem; interested readers are referred to Billingsley (1968) for details and proof.
An extension of the Slutsky theorem in conventional asymptotic theory (see e.g. White 1984) also applies in this framework, in the sense that, if $g(\cdot)$ is any continuous functional on C[0, 1], then $R_T(r) \Rightarrow W(r)$ implies that
$$g(R_T(r)) \Rightarrow g(W(r)).$$
This result is called the continuous mapping theorem (see Billingsley 1968).
The most striking difference between conventional asymptotic theory and this theory appropriate to integrated processes is that, whereas in the former the sample moments converge to constants, in the latter suitably normalized sample functions converge to random variables.
Similarly, as a result of the absence of stationarity and ergodicity in the series $\{y_t\}$, traditional central limit theorems are replaced by functional central limit theorems (FCLT).
A useful contrast between this asymptotic theory and that applicable to stationary processes is provided by the distribution of the sample mean considered in Chapter 1. Rewrite (14) as
and consider the behaviour of the last term for $\rho < 1$ and $\rho = 1$
8 This concept, used in function spaces, is analogous to convergence in distribution for ordinary random variables. See Hall and Heyde (1980).
respectively. In the former case, this term is I(0) and a straightforward application of a Law of Large Numbers (again see, e.g., White 1984) will show that
since $E(u_{t-i}) = 0$. In the I(1) case, when $\rho = 1$, this last term is given by $S_t = \sum_{i=1}^{t} u_i$, and can be written in terms of the corresponding Wiener process using the standardized sum (see Phillips 1986 and Sect. 1.5.6):9
Similarly:
Since
Thus:
Note the difference between the orders of magnitude of these limiting distributions and the conventional stationary distributions: i.e. $O_p(T^{3/2})$ in (21) instead of $O_p(T)$, $O_p(T^2)$ in (22) instead of $O_p(T)$, $O_p(T)$ in (23) instead of $O_p(T^{1/2})$, and $O_p(T^{5/2})$ in (24) instead of $O_p(T^{3/2})$. These differences are behind a number of unconventional features of the distributions of test statistics for hypotheses involving integrated series.
$E(u_t)$, given the restrictions embodied in (16).
Many of the functionals to which these sample moments converge can be expressed in terms of normal densities. Table 3.3 provides a set of distributional results for a number of these functionals for IID errors with unit variance. Section 1.5.6 and the appendix to Chapter 1 provide examples of the method of proof of these results by showing that the sample moment in example 1 of Table 3.3 converges to both the functional $\int_0^1 W(r)\,dr$ and the density N(0, 1/3), implying that the functional must have this density (also see Phillips 1987a, b, and Chan and Wei 1988).
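The claim that $\int_0^1 W(r)\,dr$ has the density N(0, 1/3) can be illustrated by simulation. The sketch below is not from the text: it assumes IID N(0, 1) errors, $S_0 = 0$, and uses the normalized moment $T^{-3/2}\sum_{t=1}^{T} S_{t-1}$ as the sample counterpart of the functional; the function names are illustrative.

```python
import random


def scaled_sum_of_partial_sums(T, rng):
    """Return T^(-3/2) * sum_{t=1}^{T} S_{t-1} for one simulated random walk."""
    s = 0.0      # S_{t-1}, starting from S_0 = 0
    acc = 0.0
    for _ in range(T):
        acc += s
        s += rng.gauss(0.0, 1.0)   # S_t = S_{t-1} + u_t with u_t ~ N(0, 1)
    return acc / T ** 1.5


def sample_mean_and_variance(T=100, reps=4000, seed=1):
    """Mean and variance of the normalized moment across replications."""
    rng = random.Random(seed)
    draws = [scaled_sum_of_partial_sums(T, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    mean, var = sample_mean_and_variance()
    print(mean, var)   # the variance should be near 1/3
```

For finite $T$ the exact variance of this moment is $(T-1)(2T-1)/(6T^2)$, which approaches 1/3 from below, so a simulated value slightly under 1/3 is to be expected.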
3.5. Using Wiener Distribution Theory
We now present two examples of the application of the asymptotic distribution theory for integrated processes to help understand regression with non-stationary data. Recall that results on sums of powers of trend terms are summarized in Section 1.5.5 above, and that the relationships among sample moments, functionals of Wiener processes, and densities from the normal family are summarized in Table 3.3.

TABLE 3.3. Convergence results for normalized sample moments(a)

    Functional                                                          Density                         Sample moment(b)
1:  $\int_0^1 W(r)\,dr$                                                 $N(0, 1/3)$                     $T^{-3/2}\sum_{t=1}^{T} y_{t-1}$
2:  $\int_0^1 r\,dW(r)$                                                 $N(0, 1/3)$                     $T^{-3/2}\sum_{t=1}^{T} t\,u_t$
3:  $W(1)$                                                              $N(0, 1)$                       $T^{-1/2}\sum_{t=1}^{T} u_t$
4:  $\int_0^1 W(r)\,dW(r)$                                              $\tfrac{1}{2}(\chi^2(1) - 1)$   $T^{-1}\sum_{t=1}^{T} y_{t-1}u_t$
5:  $\left[\int_0^1 W(r)^2\,dr\right]^{-1/2}\int_0^1 W(r)\,dV(r)$       $N(0, 1)$
6:  $\int_0^1 (r - a)W(r)\,dr$                                          $N(0, \gamma)$                  $T^{-5/2}\sum_{t=1}^{T} t\,y_t$  $(a = 0)$

where $\gamma = (1/60)(8 - 25a + 20a^2)$.
(a) In example 5, $V(r)$ is another Wiener process independent of $W(r)$. Note that a special case of example 6, which we will use later, is $a = 0$, which yields a density of N(0, 2/15).
(b) These are examples of sample moments which converge to the corresponding functionals in the first column for $y_0 = \mu = 0$ and $\sigma^2 = 1$.
3.5.1. Example: Spurious De-trending (Durlauf and Phillips 1988)
Let $\{y_t\}$ be generated as in (14) above; then
Consider the model
This is a model which fails to take account of the presence of the stochastic trend in the data series and thereby attempts to de-trend spuriously. The OLS estimator of $c$ in (26) is
Substituting (25) into (27) and rearranging, we obtain
However, by (21);
by (24); also;
The density of this functional can be found from example 6 in Table 3.3, by substituting $a = 2/3$; it reduces to $N(0, 2\sigma^2/15)$. Note in particular that the estimator of $c$ has a divergent limiting distribution. Similarly, the OLS estimate of $\gamma$ in (26) is
Using (25) and rearranging yields
Further,
It then follows, from the limiting results given above, that
where the last equality follows from setting $a = 1/2$ in example 6 of Table 3.3. Using similar techniques, Durlauf and Phillips (1988) show that $T^{-1/2}t_c$, $T^{-1/2}t_\gamma$, $T^{-1}\hat\sigma_u^2$, $R^2$, and $T \cdot DW$ have functionals of Wiener processes as their asymptotic distributions.10 Since the estimated coefficient on the trend converges to $\mu$, as suggested by (29), and as the distribution of its $t$-statistic is divergent, interpreting the results at face value will lead the investigator to suppose that the trend is an important determinant of the series $\{y_t\}$. In fact, the series would be better modelled with a stochastic trend as in (25), which would lead to a stationary residual series.
3.5.2. Example: Spurious Regression (see Phillips 1986)
Let $\{y_t\}$ and $\{x_t\}$ be generated as pure random walks:
The spurious regression model is
In order to derive the asymptotic distributions of the estimators and test statistics for (30), it is convenient to define $W_u(r)$ and $W_\varepsilon(r)$ as the independent Wiener processes on C[0, 1] obtained from cumulating the $\{u_t\}$ and $\{\varepsilon_t\}$ series, respectively. Let $\bar{x}$ and $\bar{y}$ be the sample means of the $\{x_t\}$ and $\{y_t\}$ series. Then
10 $R$ is the multiple correlation coefficient of the estimated model, and $DW$ is the Durbin-Watson statistic computed from the $\hat{u}_t$.
From (21),
From (22),
It may also be shown, using the same method of proof, that
Substituting (32)-(35) into (31), it follows that
Also,
From (21) and (36),
The spurious regression problem becomes clear upon inspection of (36). The true value of the derivative of $y_t$ with respect to $x_t$ is zero because the errors generating the $\{x_t\}$ and $\{y_t\}$ series in the regression (30) are independent. Yet $\hat\beta$ fails to converge in probability to zero and instead has a non-degenerate distribution.
Using similar techniques, Phillips (1986) shows that $T^{-1/2}t_\beta$ has a non-degenerate distribution, or in other words that the $t$-statistic for $\beta$ has a divergent distribution. Hence as $T \to \infty$, the probability of a significant $t$-value arising in a regression such as (30) approaches 1, leading to spurious inferences about the existence of a relationship between $y_t$ and $x_t$ (see Banerjee and Hendry 1992, for an exposition).
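The divergence of the $t$-statistic can be seen in a small Monte Carlo experiment. The sketch below is illustrative only: it assumes Gaussian unit-variance innovations, a regression of $y_t$ on a constant and $x_t$ (the inclusion of the constant is an assumption of this sketch), and a nominal two-sided 5 per cent test using the normal critical value 1.96. The function names are hypothetical.

```python
import math
import random


def random_walk(T, rng):
    """Driftless random walk of length T started at zero."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def slope_t_statistic(x, y):
    """OLS t-statistic on b in the regression y_t = a + b * x_t + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)


def spurious_rejection_rate(T, reps=400, seed=0, critical=1.96):
    """Share of replications in which |t| exceeds the conventional critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x, y = random_walk(T, rng), random_walk(T, rng)   # independent by construction
        if abs(slope_t_statistic(x, y)) > critical:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for T in (25, 200):
        print(T, spurious_rejection_rate(T))
```

Although the two series are independent, the rejection frequency should lie far above the nominal 5 per cent and should rise with $T$, in line with the limiting result for $T^{-1/2}t_\beta$.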
3.6. Near-integrated Processes
In later chapters we will deal with variables that are 'borderline-' or 'near-'integrated. By this we mean that the process generating the variables has a root close to but not on the unit circle. Phillips (1987b) presents asymptotic results for 'unit-root' and 'near-unit-root' processes within a unified framework to explain the special properties of regressions estimated using borderline-stationary variables, and we follow his approach.
Consider the AR(1) model
$$y_t = \rho y_{t-1} + u_t, \qquad (39)$$
where $u_t \sim \mathrm{IN}(0, \sigma^2)$. When $|\rho| < 1$ and $y_0 \sim N[0, \sigma^2(1 - \rho^2)^{-1}]$, $\{y_t\}$ is a stationary process. When $\rho = 1$ and $y_0 = 0$, it is I(1) and non-stationary. Apparently, therefore, there is a discontinuity at $\rho = 1$ where stationarity disappears, and the constant unconditional variance ($\sigma^2(1 - \rho^2)^{-1}$) becomes a trend ($t\sigma^2$). In fact, if $y_0 = 0$ in (39) and $|\rho| < 1$ but is close to unity, say $\rho = 1 + \varepsilon$ with $\varepsilon < 0$ for small $\varepsilon$, then
and
Thus, the variance acts like a trend for finite $t$ when terms of $O(\varepsilon^2)$ or smaller are negligible, and there is really no discontinuity in practical terms: for sufficiently small $\varepsilon$ and finite $t$, the process behaves like an I(1) process even though it is asymptotically stationary. Paraphrasing, in finite samples, for $\varepsilon$ close to zero, a better approximation is to treat the process as I(1) than as I(0), even though asymptotically the expansion for the variance above approaches a finite limit not dependent upon $t$.
A more convenient parameterization of nearly integrated processes is given by writing $\rho = \exp(\varepsilon/T)$, for $\varepsilon < 0$. This parameterization defines a sequence of local alternatives to $\rho = 1$ for the process. When $\varepsilon = 0$,
$\rho = 1$, while $\rho$ is less than but close to unity for small $\varepsilon < 0$, and as $T \to \infty$, $\rho \to 1$. A process with such a value of $\rho$ is called 'near-integrated' because for small negative $\varepsilon$ it behaves rather like an I(1) process.11
There are three advantages to considering near-integrated time series. The first is the link they provide between conventional asymptotic distribution theory and the Wiener theory described above, stressing the continuity of the breakdown in stationarity as a root approaches unity. The sketch of the relevant theory provided below reinforces this consideration. The second advantage is that the resulting theory may be empirically more relevant than that deriving from the assumption of an exact unit root. It is too early to reach a final judgement on that issue, but the algebra below suggests that very similar finite-sample behaviour would be observed in unit-root and near-integrated processes.
The final advantage, and the real reason for our interest, is that near-integration is needed when examining the power functions of unit-root tests against stationary local alternatives. Phillips (1988) emphasizes this role, and Johansen (1991a) and Haldrup and Hylleberg (1991) present applications to deriving power functions. We describe and draw upon some of their results in the next chapter when discussing testing for a unit root.
Reconsider (39) with $\rho = \exp(\varepsilon/T)$, $y_0 = 0$, and with the $\{u_t\}$ sequence satisfying the set of conditions given by (16a)-(16d). In order to derive the limiting distribution of $\hat\rho$, the OLS estimator of $\rho$, under $H_0$, it is convenient to define the functional $K_\varepsilon(r)$:
$K_\varepsilon(r)$ is also known as an Ornstein-Uhlenbeck process and, for fixed $r$, is distributed normally with mean zero and variance $(1/2)\varepsilon^{-1}[\exp(2r\varepsilon) - 1]$.12 $K_\varepsilon(r)$ is a first-order diffusion process and is closely related to $W(r)$. (See e.g. Grimmett and Stirzaker (1982) for details.) It is like an error-correction process, having been generated by the stochastic differential equation
Using arguments analogous to those employed earlier in this chapter to derive distributions for unit root processes, Phillips (1987b) proves the following asymptotic results for (39) when $\rho = \exp(\varepsilon/T)$:13
11 See Chan and Wei (1988) and Phillips (1987b).
12 Note that $\lim_{\varepsilon \to 0}(\varepsilon^{-1}/2)[\exp(2r\varepsilon) - 1] = r$ (using L'Hôpital's rule). This is as expected because, as $\varepsilon \to 0$, $K_\varepsilon(r) \to W(r)$, and for fixed $r$, $W(r) \sim N(0, r)$. Alternatively, use a Taylor series expansion to give $\exp(2r\varepsilon) = 1 + 2r\varepsilon + O(\varepsilon^2)$ and the result follows.
13 The definitions of $S_{[Tr]}$, $S_t$, $\lambda$, and $\sigma^2$ are given in equations (14)-(23).
For example, to demonstrate (40), construct step-processes given by
and then show that
Using the power-series expansion for $\exp(\varepsilon/T)$,
Now, from (39),
Thus, from (43),
Finally, using (41) and (42) in (44),
When the non-centrality parameter $\varepsilon$ is set to zero, $K_\varepsilon(r) = W(r)$ and the Dickey-Fuller distribution is recovered as a special case of (45).
Using the Dickey-Fuller distribution as a benchmark, it can also be seen from (45) that the effects of near-integration are revealed in a shift in location (given by $\varepsilon$) and a change in shape of the limiting distribution of $\hat\rho$: $\hat\rho$ converges to 1 (which is the null value of $\rho$ as $T \to \infty$) at rate $T^{-1}$. This is the usual Dickey-Fuller rate of convergence: see Chapter 4.
Results in Banerjee and Dolado (1987) and Banerjee, Dolado, and
Galbraith (1990a) show that some of the important distributional features for the near-integrated case (for example, the lower-tail critical values) can be recovered from the Dickey-Fuller tables simply by shifting the Dickey-Fuller distribution by fixed numbers. These results suggest that, even in fairly large samples, the non-centrality parameter in (45) is the most important determinant of the shape of the distribution of $\hat\rho$. The more subtle distributional features, which involve changes in shape and are given by the second part of (45), become relevant only asymptotically.
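The finite-sample argument made at the start of this section, that the variance of a near-integrated AR(1) process behaves like a trend, can be made concrete with a short calculation. The sketch below is worked out here rather than quoted from the text. It assumes $y_0 = 0$ and $|\rho| < 1$, so that $\mathrm{var}(y_t) = \sigma^2\sum_{i=0}^{t-1}\rho^{2i} = \sigma^2(1 - \rho^{2t})/(1 - \rho^2)$; writing $\rho = 1 + \varepsilon$ and dropping terms of $O(\varepsilon^2)$ gives $\sigma^2 t[1 + (t - 1)\varepsilon]$. The function names are illustrative.

```python
def exact_variance(rho, t, sigma2=1.0):
    """var(y_t) for y_t = rho * y_{t-1} + u_t with y_0 = 0 and |rho| < 1."""
    return sigma2 * (1.0 - rho ** (2 * t)) / (1.0 - rho ** 2)


def first_order_variance(eps, t, sigma2=1.0):
    """Expansion of var(y_t) in eps = rho - 1, dropping O(eps^2) terms."""
    return sigma2 * t * (1.0 + (t - 1) * eps)


def stationary_variance(rho, sigma2=1.0):
    """Unconditional variance sigma^2 / (1 - rho^2) of the stationary process."""
    return sigma2 / (1.0 - rho ** 2)


if __name__ == "__main__":
    eps, t = -0.001, 50
    rho = 1.0 + eps
    print("exact       ", exact_variance(rho, t))
    print("first order ", first_order_variance(eps, t))
    print("I(1) trend  ", float(t))
    print("stationary  ", stationary_variance(rho))
```

With $\varepsilon = -0.001$ and $t = 50$, the exact variance is within about 5 per cent of the I(1) value $t\sigma^2 = 50$, whereas the stationary limit is roughly ten times larger, which is the sense in which the I(1) approximation is the better one in finite samples.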
4
Testing for a Unit Root
This chapter describes methods of testing for a unit root in an observed series. Both parametric regression tests and non-parametric adjustments to these test statistics are considered, and we give the tables of critical values necessary for the application of commonly used tests. We also use functionals of Wiener processes to describe the asymptotic distributions of important test statistics.
Since an I(1) series becomes stationary upon being differenced once, it must contain one unit root. For example, if we take a random walk as the DGP, then we can immediately derive that its first difference is stationary. If by contrast the underlying data-generating process is
$$y_t = \rho_1 y_{t-1} + u_t,$$
where $|\rho_1| > 1$, then we have
$$\Delta y_t = (\rho_1 - 1)y_{t-1} + u_t. \qquad (1)$$
From (1) it is clear that $\Delta y_t$ is no longer stationary: it depends not only upon the stationary process $u_t$, but also upon the non-stationary process $y_{t-1}$ (since $\rho_1 - 1 > 0$). Hence an AR(1) process with a coefficient of 1 is I(1), but the same process with a coefficient of 1.01 is not, since differencing will not reduce this process to stationarity.
Many economic time series may contain an exact unit root if we consider logarithmic transformations of the form routinely applied to economic time series. Otherwise, roots very close to, but slightly greater than, unity imply non-stationary series that are not I($d$) for any $d$. Roots slightly less than unity generate near-integrated series. Such processes will tend to be difficult to distinguish from those with roots of exactly unity on moderately sized samples; such processes are discussed in Chapter 3. Roots substantially greater than unity, by contrast, will be easily detected, as the explosive character of the series will be clear with even fairly small samples.
Consider the simplest data-generation process within which we can discuss tests for unit roots:
If one were testing the true hypothesis $H_0\colon \rho = \rho_0$ for $\rho_0 < 1$, the test would be easily performed. Running the regression (2), the $t$-statistic $(\hat\rho - \rho_0)/SE(\hat\rho)$ has, asymptotically, a standard normal distribution and can be compared with tables of significance points for N(0, 1). In small samples the statistic is approximately $t$-distributed, although the coefficient estimate $\hat\rho$ is biased downward slightly.
For $\rho_0 = 1$, however, this result no longer holds. The distribution of the test statistic just given is not asymptotically normal, or even symmetric. Critical values have been tabulated by D. A. Dickey and are reported in, e.g., Fuller (1976). It is instructive to examine these in detail, and they are recorded as Tables 4.1 and 4.2.
The critical values in Fuller's tables pertain to each of three different models: it is important to note at the outset that, as in many other instances, the distributions of test statistics obtained depend not only on the data-generation process, but also on the model with which we investigate it. For the time being, we will consider three possible models:
The null hypothesis is that $\rho_i = 1$ for $i = a, b, c$. The applicability of each model depends on what is known about the DGP, since we want to construct similar tests (that is, tests for which the distribution of the test statistic under the null hypothesis is independent of nuisance parameters in the DGP). If a test is not similar, then the appropriate critical values may depend upon unknown nuisance parameters (e.g. a constant), which will invalidate standard inferences. We will return to the similarity of tests below. For the moment, we will follow much of the literature on the topic in assuming that (2) is the DGP, in which case the issue does not arise since (2) contains no nuisance parameters.
Another formulation of the DGP deals with a potential difficulty that arises from (2) concerning the status of the nuisance parameters under the alternative $H_1\colon \rho < 1$. Reconsider (2) when there is an intercept
A simple solution was proposed by Bhargava (1986) as follows. Write the DGP as
which is a common factor model (see Sargan 1980, and Hendry and Mizon 1978). Then
Now, for $H_0\colon \rho = 1$, $y_t$ is a random walk with no drift, whereas when $\rho < 1$ it is stationary around a non-zero mean. Similarly, if $\gamma t$ is added to the process, so that
When $H_0\colon \rho = 1$ holds, $\Delta y_t = \gamma + \varepsilon_t$. Thus, a trend at rate $\gamma(1 - \rho)$ is present under the alternative, and drift at rate $\gamma$ under the null. Bhargava develops several tests based on this formulation. More recently, Schmidt and Phillips (1992) have also investigated the properties and powers of tests of $H_0\colon \rho = 1$ using this approach and find them preferable, although the power functions cross those of corresponding Dickey-Fuller tests. In practice, unfortunately, the powers of available unit-root tests are low for alternatives different from, but close to, the null of unity.
In interpreting Table 4.1, note that, if the sign of an entry in the table is negative for a given size, say $\alpha$ (where $\alpha$ is the probability of a smaller value), then at least a fraction $\alpha$ of estimates of $\rho$ are less than 1; for models (3b) and (3c), negative entries persist up to $\alpha = 0.95$ and $\alpha = 0.99$ respectively in large samples. Although it is not explicit in this table, entries even for model (3a) are negative at $\alpha = 0.50$. For all of these models, then, most estimates of $\rho$ are less than 1; for the latter two, the overwhelming majority are less than 1. This holds in spite of the fact that the true value is 1: errors are far from symmetric around zero.
Generally, $\hat\rho$ is a downwardly biased estimator of $\rho$; this is true for any of the three models chosen. A test conducted by the method that would typically be used for stationary processes (that is, a test based upon the usual $t$- or asymptotic normal distribution applied to the $t$-statistic $(\hat\rho - 1)/SE(\hat\rho)$, at conventional critical values) therefore seems likely to give misleading results. This can be confirmed by examining Table 4.2, again taken from Fuller (1976) and originally constructed by Monte Carlo simulation.
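The kind of Monte Carlo experiment that underlies such tables can be sketched in a few lines. The code below is an illustration under stated assumptions, not a reproduction of Fuller's procedure: the DGP is a driftless Gaussian random walk with $y_0 = 0$, the model is the regression of $y_t$ on $y_{t-1}$ with no constant or trend, and the residual variance uses $T - 1$ degrees of freedom. The function names are hypothetical.

```python
import math
import random


def dickey_fuller_draw(T, rng):
    """One draw of (T*(rho_hat - 1), t-statistic) under a driftless random walk."""
    y_prev = 0.0
    sum_xy = sum_xx = sum_yy = 0.0
    for _ in range(T):
        y = y_prev + rng.gauss(0.0, 1.0)
        sum_xy += y_prev * y
        sum_xx += y_prev * y_prev
        sum_yy += y * y
        y_prev = y
    rho_hat = sum_xy / sum_xx
    rss = sum_yy - rho_hat * sum_xy            # residual sum of squares
    se = math.sqrt(rss / (T - 1) / sum_xx)     # standard error of rho_hat
    return T * (rho_hat - 1.0), (rho_hat - 1.0) / se


def simulate(T=100, reps=4000, seed=0):
    """Empirical 5% points of both statistics and the share of rho_hat below 1."""
    rng = random.Random(seed)
    draws = [dickey_fuller_draw(T, rng) for _ in range(reps)]
    coef = sorted(d[0] for d in draws)
    tstat = sorted(d[1] for d in draws)
    k = int(0.05 * reps)
    share_below_one = sum(1 for c in coef if c < 0.0) / reps
    return coef[k], tstat[k], share_below_one


if __name__ == "__main__":
    q_coef, q_t, share = simulate()
    print("5% point of T(rho_hat - 1):", q_coef)
    print("5% point of t-statistic:   ", q_t)
    print("share of rho_hat < 1:      ", share)
```

With $T = 100$ the simulated 5 per cent points should fall close to the tabulated values for the model without constant or trend (about $-7.9$ and $-1.95$), and well over half of the estimates should lie below 1, illustrating the downward bias.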
TABLE 4.1. Empirical cumulative distribution of $T(\hat\rho - 1)$
DGP: (2) with $\rho = 1$

Sample                 Probability of a smaller value(a)
size (T)    0.01    0.025    0.05    0.10    0.90    0.95    0.975    0.99

(a) Model (3a)/(8a)
 25        -11.9     -9.3    -7.3    -5.3    1.01    1.40     1.79    2.28
 50        -12.9     -9.9    -7.7    -5.5    0.97    1.35     1.70    2.16
100        -13.3    -10.2    -7.9    -5.6    0.95    1.31     1.65    2.09
250        -13.6    -10.3    -8.0    -5.7    0.93    1.28     1.62    2.04
500        -13.7    -10.4    -8.0    -5.7    0.93    1.28     1.61    2.04
 ∞         -13.8    -10.5    -8.1    -5.7    0.93    1.28     1.60    2.03

(b) Model (3b)/(8b)
 25        -17.2    -14.6   -12.5   -10.2   -0.76    0.01     0.65    1.40
 50        -18.9    -15.7   -13.3   -10.7   -0.81   -0.07     0.53    1.22
100        -19.8    -16.3   -13.7   -11.0   -0.83   -0.10     0.47    1.14
250        -20.3    -16.6   -14.0   -11.2   -0.84   -0.12     0.43    1.09
500        -20.5    -16.8   -14.0   -11.2   -0.84   -0.13     0.42    1.06
 ∞         -20.7    -16.9   -14.1   -11.3   -0.85   -0.13     0.41    1.04

(c) Model (3c)/(8c)
 25        -22.5    -19.9   -17.9   -15.6   -3.66   -2.51    -1.53   -0.43
 50        -25.7    -22.4   -19.8   -16.8   -3.71   -2.60    -1.66   -0.65
100        -27.4    -23.6   -20.7   -17.5   -3.74   -2.62    -1.73   -0.75
250        -28.4    -24.4   -21.3   -18.0   -3.75   -2.64    -1.78   -0.82
500        -28.9    -24.8   -21.5   -18.1   -3.76   -2.65    -1.78   -0.84
 ∞         -29.5    -25.1   -21.8   -18.3   -3.77   -2.66    -1.79   -0.87

(a) e.g., for model (3a) with $T = 100$, $\Pr[T(\hat\rho - 1) < 1.65] = 0.975$. All entries in the left half of the table have standard errors less than 0.15; those in the right half, less than 0.03.
Source: Fuller (1976: 371).
Table 4.2 gives the cumulative distribution of the $t$-statistic for $H_0\colon \rho = 1$ in each of the models (3a)-(3c). It is especially interesting to compare the results for each of these models with those we would obtain with a stationary process; because the $t$-statistic would asymptotically be distributed N(0, 1) in that case, the statistics would be distributed as indicated in the last line of the table.
For model (3a), we see that the results approximate this outcome reasonably closely if we add (very roughly) 0.3 to each entry in part (a) of the table; that is, the entire distribution of the $t$-statistic is shifted to more negative values, by approximately this amount.
TABLE 4.2. Empirical cumulative distribution of $(\hat\rho - 1)/SE(\hat\rho)$
DGP: (2) with $\rho = 1$

Sample                 Probability of a smaller value
size (T)    0.01    0.025    0.05    0.10    0.90    0.95    0.975    0.99

(a) Model (3a)/(8a)
 25        -2.66    -2.26   -1.95   -1.60    0.92    1.33     1.70    2.16
 50        -2.62    -2.25   -1.95   -1.61    0.91    1.31     1.66    2.08
100        -2.60    -2.24   -1.95   -1.61    0.90    1.29     1.64    2.03
250        -2.58    -2.23   -1.95   -1.62    0.89    1.29     1.63    2.01
500        -2.58    -2.23   -1.95   -1.62    0.89    1.28     1.62    2.00
 ∞         -2.58    -2.23   -1.95   -1.62    0.89    1.28     1.62    2.00

(b) Model (3b)/(8b)
 25        -3.75    -3.33   -3.00   -2.63   -0.37    0.00     0.34    0.72
 50        -3.58    -3.22   -2.93   -2.60   -0.40   -0.03     0.29    0.66
100        -3.51    -3.17   -2.89   -2.58   -0.42   -0.05     0.26    0.63
250        -3.46    -3.14   -2.88   -2.57   -0.42   -0.06     0.24    0.62
500        -3.44    -3.13   -2.87   -2.57   -0.43   -0.07     0.24    0.61
 ∞         -3.43    -3.12   -2.86   -2.57   -0.44   -0.07     0.23    0.60

(c) Model (3c)/(8c)
 25        -4.38    -3.95   -3.60   -3.24   -1.14   -0.80    -0.50   -0.15
 50        -4.15    -3.80   -3.50   -3.18   -1.19   -0.87    -0.58   -0.24
100        -4.04    -3.73   -3.45   -3.15   -1.22   -0.90    -0.62   -0.28
250        -3.99    -3.69   -3.43   -3.13   -1.23   -0.92    -0.64   -0.31
500        -3.98    -3.68   -3.42   -3.13   -1.24   -0.93    -0.65   -0.32
 ∞         -3.96    -3.66   -3.41   -3.12   -1.25   -0.94    -0.66   -0.33

N(0, 1)
 ∞         -2.33    -1.96   -1.65   -1.28    1.28    1.65     1.96    2.33

Source: Fuller (1976: 373).
In models (3b) and (3c), we see greater deviations from the N(0, 1) pattern above that would hold asymptotically for $|\rho| < 1$. As a constant and then the trend are added to a model, we see more entries that are negative in the tables (parts (b) and (c)); as in Table 4.1, a greater proportion of the estimated statistics become negative.
With the information in Table 4.2, however, we can now consider applying a test for $\rho = 1$ using the $t$-statistic from any of the three models. As long as we are aware that the distribution of the statistic is non-standard, and so avoid making the mistake of applying $t$- or normal tables, these significance points tabulated by Dickey and Fuller can be used in their place to provide a valid test. For example, consider model
(3b). A $t$-statistic of +1.00 would not lead to rejection of the null against an explosive alternative if we were applying N(0, 1) tables; by Table 4.2(b), however, the test rejects at the 5 per cent level (or even the 1 per cent level) because the probability of the statistic exceeding even 0.60 is only 0.01. By contrast, a value of -2.50, which would lead to rejection of $H_0$ using standard normal tables, can no longer be used to infer that $H_0$ ($\rho = 1$) is false against a stationary alternative at the 5 per cent level.
This was the first form of 'unit-root test' to have been developed. Its main potential disadvantage lies in the fact that it is based upon the assumption that the data-generation process (2) holds precisely under the null. Many series will be integrated of order 1 but will not have this form; in particular, the DGP may contain nuisance parameters such as a constant or other exogenous variables, or may contain richer dynamics in the variable of interest. As an example of the latter, consider a general AR($p$) process in $y_t$: $a(L)y_t = u_t$, with $a(L) = (1 - L)a^*(L)$, and where all latent roots of $a^*(L)$ lie within the unit circle. Such a process is I(1), and, depending upon the form of the polynomial $a^*(L)$, may be well approximated by (2) with $\rho = 1$. To the extent that it is not, however, the critical values in Tables 4.1 and 4.2 may be inaccurate. We will consider several methods of dealing with this in Sections 4.2 and 4.3. First, however, we will consider the possibility of additional exogenous regressors in the DGP, and the problem of constructing similar tests under these conditions.
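The two comparisons in the example above can be written out as a small lookup. The sketch below is illustrative: the dictionary holds only the large-sample 1 and 5 per cent points of the $t$-statistic from the last rows of Table 4.2, together with the N(0, 1) row, and the names `DF_T_CRITICAL` and `rejects_unit_root` are hypothetical rather than part of any library.

```python
# Large-sample critical values of the t-statistic, taken from the last rows of
# Table 4.2. "lower" points are used against a stationary alternative, "upper"
# points against an explosive alternative; keys are significance levels in per cent.
DF_T_CRITICAL = {
    "a": {"lower": {1: -2.58, 5: -1.95}, "upper": {1: 2.00, 5: 1.28}},
    "b": {"lower": {1: -3.43, 5: -2.86}, "upper": {1: 0.60, 5: -0.07}},
    "c": {"lower": {1: -3.96, 5: -3.41}, "upper": {1: -0.33, 5: -0.94}},
    "normal": {"lower": {1: -2.33, 5: -1.65}, "upper": {1: 2.33, 5: 1.65}},
}


def rejects_unit_root(t_stat, table, level_pct=5, alternative="stationary"):
    """Return True if H0: rho = 1 is rejected using the chosen set of critical values."""
    if alternative == "stationary":
        return t_stat < DF_T_CRITICAL[table]["lower"][level_pct]
    if alternative == "explosive":
        return t_stat > DF_T_CRITICAL[table]["upper"][level_pct]
    raise ValueError("alternative must be 'stationary' or 'explosive'")


if __name__ == "__main__":
    # t = +1.00 in model (3b): normal tables do not reject, Table 4.2(b) does.
    print(rejects_unit_root(1.00, "normal", 5, "explosive"))
    print(rejects_unit_root(1.00, "b", 1, "explosive"))
    # t = -2.50 in model (3b): normal tables reject, Table 4.2(b) does not.
    print(rejects_unit_root(-2.50, "normal", 5, "stationary"))
    print(rejects_unit_root(-2.50, "b", 5, "stationary"))
```

In small samples the appropriate row of Table 4.2 should be used in place of the asymptotic values stored here.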
4.1. Similar Tests and Exogenous Regressors in the DGP
Kiviet and Phillips (1992) consider exact and similar tests for the coefficient on a lagged dependent variable, in a first-order autoregressive model that may include multiple exogenous variables. In order to compute the exact critical values for such tests, these authors use numerical integration based on the Imhof routine. (See Imhof (1961) or Koerts and Abrahamse (1969) for an introduction.) While this procedure can be used to construct exact and similar tests for a DGP with first-order dynamics and also containing arbitrary strictly exogenous processes, the Dickey-Fuller tests already discussed will be similar tests for some DGPs.
Evans and Savin (1981, 1984), Nankervis and Savin (1985, 1987), and Bhargava (1986), as well as Kiviet and Phillips, all consider the properties of Dickey-Fuller tests for various DGPs. Some of the results may be summarized as follows ($\mu \ne 0$; $\gamma \ne 0$).
DGP Model s yieldin g similar tests 1 (i) y t = Pyt-i + ut, y0 = 0 (3c) , (36) , (3c ) (ii) y t = py,-! + ut, arbitrary y 0 (36) , (3c ) (iii) y t = [i+ py t-i + ut, arbitrary y 0 (3c ) (iv) y t = [a + yt + pyt-i + u f> arbitrar y y 0 Extensio n o f (3c ) necessar y Thus, fo r example , i n cas e (i) , i f th e mode l i s give n b y (3c) , th e appropriate critica l value s ar e give n b y Table s 4.1(c ) an d 4.2(c) . Th e same table s ca n b e use d t o conduc t inferenc e i n (iii) , despit e a non-zer o value o f n i n th e DGP , becaus e (3c ) yield s a simila r test . Similarit y implies tha t th e distribution s o f p an d it s associate d ^-statisti c ar e no t affected b y th e value , unde r th e null , o f th e nuisanc e parameter , an d the critical value s ar e th e sam e a s the one s tha t woul d appl y fo r n = 0, namely, those i n Tables 4.1(c ) an d 4.2(c). There ar e a numbe r o f noteworth y additiona l points . I n cas e (i ) ther e are n o nuisanc e parameters , s o tha t similarit y i s a trivia l property . I n general, a s this summar y suggests , a simila r tes t havin g a Dickey-Fuller distribution require s tha t th e mode l use d contai n more parameter s tha n the DGP . I n order to hav e a similar test fo r (iv) , one woul d the n nee d a model wit h a ter m suc h a s t 2, necessitatin g anothe r bloc k o f critica l values i n eac h o f Table s 4. 1 and 4.2 . I n cas e (ii) , fo r example , w e nee d at leas t mode l (36 ) (wit h a constant ) t o allo w fo r th e unknow n startin g value. I n cas e (iii ) w e hav e a n unknow n constan t an d nee d th e tren d term i n model (3c ) t o allo w for it s effect . Each o f thes e simila r test s i s als o exac t i n finit e samples , provide d appropriate critica l value s ar e available . I n general , however , i t wil l b e necessary t o abando n exac t test s i n orde r t o us e variant s o f th e Dickey-Fuller tes t wher e ther e ar e mor e unknow n parameters . 
Thes e parameters ca n typicall y be estimated , s o that asymptoticall y they can b e accounted fo r an d a tes t provided . Again , Kivie t an d Phillip s offe r general exac t an d simila r test s fo r DGP s wher e th e dynamic s ar e restricted t o first-order , a s wel l a s demonstratin g th e similarit y o f th e tests just mentioned . In th e cas e o f exac t parameterizations , suc h a s cas e (iii ) wit h mode l (3£>), w e d o no t hav e simila r test s wit h th e Dickey-Fulle r distributions . However, a s West (1988 ) showed , the f-statistic s i n th e exactl y paramet erized cas e (wit h exogenou s item s suc h a s a constan t i n th e DGP ) ar e asymptotically normal , jus t a s ar e f-statistic s use d fo r standar d prob lems. I n finit e samples , however , th e Dickey-Fulle r distribution s ma y be a better approximatio n tha n th e norma l distribution . We will explor e this asymptoti c normalit y further i n Chapte r 6 below.
1 Critical values are those corresponding to the model used in Table 4.1 or 4.2.
4.2. General Dynamic Models for the Process of Interest
The first of the methods for allowing richer dynamics in the DGP of the process of interest, $\{y_t\}$, was developed concurrently with the test that we have already described for a unit root in the AR(1) model, and is reported in Fuller (1976). These more general methods yield test statistics that have the same limiting distributions as those already discussed, because they are based on consistent estimates of 'nuisance' parameters. Hence we may use the last rows of Tables 4.1(a)-(c) or 4.2(a)-(c) for inference with these statistics in large samples, but in small samples percentage points of their distributions will not in general be the same as those applicable under the strong assumptions of the simple Dickey-Fuller model. When $y_t$ follows an AR($p$) process,
a test can be constructed with the regression model:
The coefficient $\rho$ is used to test for a unit root, and $T(\hat{\rho} - 1)$ and $(\hat{\rho} - 1)/\mathrm{SE}(\hat{\rho})$ have the limiting distributions tabulated in Tables 4.1(a) and 4.2(a) for $T \to \infty$. Moreover, just as in the case of an AR(1) process, we can extend this regression model to allow for the possibility that the data-generation process contains a constant (drift) term or a deterministic time trend. Again, for suitably modified regression models, the asymptotic distributions of the statistics based on $\hat{\rho}$ are those given in Tables 4.1(b)/(c) and 4.2(b)/(c) for $T \to \infty$. These procedures are called 'augmented' Dickey-Fuller (ADF) tests.

The aim in modifications such as these to the simpler form of the Dickey-Fuller test is to use lagged changes in the dependent variable to capture autocorrelated omitted variables which would otherwise, by default, appear in the (necessarily autocorrelated) error term. With the additional lagged terms it will be possible, if the DGP has the form of (4), to produce a model (5) in which asymptotically the error terms are white noise, because the nuisance parameters are known asymptotically and the terms involving them may be removed from the error term. With white-noise errors, the asymptotic Monte Carlo critical values given in the first two tables may be applied. Moreover, the asymptotic distribution of the coefficient on the $y_{t-1}$ term in (5) is not affected by the inclusion of the additional $\Delta y_{t-i}$ terms. If $y_t$ is $I(1)$, the differenced
terms are all $I(0)$ and appropriate scaling ensures that the variance-covariance matrix is asymptotically block-diagonal. (That is, all cross-product terms of $I(0)$ and $I(1)$ variables in the matrix are asymptotically negligible.) It is this asymptotic orthogonality that drives the result, much as, in a standard regression model, one uses the orthogonality of the information matrix to prove the statistical independence of the estimated coefficient vector from the estimate of the standard error. The asymptotic theory and the issue of 'appropriate' scaling are discussed later in this chapter and in Chapter 6.

By allowing the DGP to take the form (4) rather than the much more restrictive AR(1) form (3), we have expanded the class of models to which we can validly apply unit-root tests of this type. Note that, as it will generally be the case that $p$ is unknown even where $y_t$ is strictly an AR($p$) process, it is generally safer to take $p$ to be a fairly generous number; if too many lags are present in (5), the regression is free to set them to zero at the cost of some loss in efficiency, whereas too few lags implies some remaining autocorrelation in (5) and hence the inapplicability of even the asymptotic distributions in Tables 4.1 and 4.2. One can, of course, perform tests for autocorrelation on the estimated residuals from (5) in order to check the acceptability of the premise that these residuals are white noise. Alternatively, model selection procedures can be used to choose $p$, and test for a unit root, jointly (see Hall 1990).

We have, therefore, a class of tests for the unit root which can validly be applied to series that follow AR($p$) processes containing no more than one unit root.
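As a rough illustration of the augmented Dickey-Fuller idea, the following Python sketch computes the $t$-type statistic $(\hat{\rho} - 1)/\mathrm{SE}(\hat{\rho})$ from a regression of $y_t$ on $y_{t-1}$ and lagged first differences. The specification used (a levels regression with a chosen number of lagged differences and an optional constant) is an assumption made for the example, and the helper names `ols`, `adf_t`, and `simulate_ar1` are hypothetical. Only the standard library is used, so the least-squares step is written out by hand.

```python
import random


def ols(X, y):
    """Least squares via Gauss-Jordan inversion of X'X.

    Returns (coefficients, conventional standard errors)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I]; the right half becomes (X'X)^{-1}.
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[t] - sum(X[t][j] * beta[j] for j in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


def adf_t(y, lags=0, constant=True):
    """t-type statistic (rho_hat - 1) / SE(rho_hat) with `lags` lagged differences."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy[t-1] holds y_t - y_{t-1}
    X, Y = [], []
    for t in range(lags + 1, len(y)):
        row = [y[t - 1]] + [dy[t - 1 - i] for i in range(1, lags + 1)]
        if constant:
            row.append(1.0)
        X.append(row)
        Y.append(y[t])
    beta, se = ols(X, Y)
    return (beta[0] - 1.0) / se[0]


def simulate_ar1(rho, T, seed):
    """Hypothetical helper: AR(1) series started at zero with N(0, 1) errors."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(T):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


random_walk = simulate_ar1(1.0, 300, seed=1)
stationary = simulate_ar1(0.5, 300, seed=1)
print("unit root :", adf_t(random_walk, lags=2))
print("stationary:", adf_t(stationary, lags=2))
```

The statistic for the stationary series should lie far in the lower tail, while that for the random walk should not; in practice the value would be compared with the appropriate row of Table 4.2 rather than with normal critical values.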
The next natural step is to attempt to extend further the class of series to which we can apply such tests, ideally in such a way as to allow exogenous variables to enter the process as well. Said and Dickey (1984) provide a test procedure valid for a general ARMA process in the errors; Phillips (1987a) and Perron and Phillips (1988) offer a still more general procedure.

While the Said-Dickey approach does represent a generalization of the Dickey-Fuller procedure, it again yields test statistics with the same asymptotic critical values as those tabulated by Dickey and Fuller. The particular advantage of this test is that we can apply it not only to models with MA parts in the errors, but also to models for which (as is typically the case) the orders of the AR and MA polynomials in the error process are unknown. The method involves approximating the true process by an autoregression in which the number of lags increases with sample size.

Begin by assuming that the data-generation process follows:
so that the error term in the autoregression follows an ARMA($p$, $q$), presumed to be stationary and invertible. The DGP can be rewritten as
where $k$ is large enough to allow a good approximation to the ARMA($p$, $q$) process $\{u_t\}$, so that $\{v_t\}$ is approximately white noise. The null hypothesis is again that $\rho = 1$. Said and Dickey show that the test is valid in spite of the facts that $p$ and $q$ are unknown and that the ARMA($p$, $q$) is approximated by an AR process, as long as $k$ increases with the sample size $T$ so that there exist numbers $c$ and $r$, $c > 0$ and $r > 0$, such that $ck > T^{1/r}$ and $T^{-1/3}k \to 0$. Hence $T^{1/3}$ is an upper bound on the rate at which the number of lags, $k$, should be made to grow with the sample size. Ordinary least-squares estimation of the model (6) is proven to yield a consistent estimator of $(\rho - 1)$; the test can then be based on the $t$-type statistic, $(\hat{\rho} - 1)/\mathrm{SE}(\hat{\rho})$, using Table 4.2(a). Clearly, the form of the regression implied by the Said-Dickey test is precisely the same as that of the augmented Dickey-Fuller test.

In this case Table 4.2(a), corresponding to a model containing no drift or trend, is used, but the test can also be adapted to allow for a non-zero drift term $\mu$ in the model. The test is modified only in so far as it is then based not on $y_t$ but on $y_t - \bar{y}$, where $\bar{y} = T^{-1}\sum_{t=1}^{T} y_t$. The regression model (6) remains the same except for the first regressor, which becomes $(y_{t-1} - \bar{y})$, and test statistics are calculated in the same way. By analogy to the earlier results for Dickey-Fuller and augmented Dickey-Fuller tests, it is not surprising that we now refer to Table 4.2(b), corresponding to a model containing a drift term, for the significance points of the (asymptotic) distributions of the statistics.

Monte Carlo studies of test power in models with autocorrelated error processes, described by Dickey et al. (1986), suggest that the empirical levels of the $T(\hat{\rho} - 1)$ statistics tend to be farther from the nominal test levels than those of the $t$-type statistics. Dickey et al. therefore suggest the use of the $t$-type statistics in these cases. Deviation of nominal from actual test levels is particularly great in DGPs with MA parts such that the MA lag polynomial contains a factor of $(1 - \theta L)$, with $\theta$ near unity. The near-cancellation of such a factor with the factor $(1 - L)$ in the AR lag polynomial (under the null) affects the actual levels of both $T(\hat{\rho} - 1)$ and $t$-type statistics, but is especially serious for the former.
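The rate conditions on $k$ can be made concrete with a small sketch. The rule below, $k = \lfloor T^{1/4} \rfloor$, is a hypothetical choice introduced only for illustration, not one prescribed by Said and Dickey: it satisfies $ck > T^{1/r}$ with, for instance, $c = 2$ and $r = 4$, while $T^{-1/3}k$ shrinks as $T$ grows.

```python
def said_dickey_lag(T, power=0.25):
    """Hypothetical lag rule k = floor(T**power).

    Any power strictly between 0 and 1/3 lets k grow with T
    while T**(-1/3) * k tends to zero."""
    if not 0.0 < power < 1.0 / 3.0:
        raise ValueError("power must lie strictly between 0 and 1/3")
    return max(1, int(T ** power))


for T in (100, 10_000, 1_000_000):
    k = said_dickey_lag(T)
    # The last column is T**(-1/3) * k, which should decline with T.
    print(T, k, k / T ** (1.0 / 3.0))
```

Any rule of this kind speaks only to the asymptotic argument; in a given finite sample the choice of $k$ still has to be judged against the residual autocorrelation that remains.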
4.3. Non-parametric Tests for a Unit Root
In extending the original tests above to allow for higher-order autocorrelation, we added extra terms to the regression model to account for the
autocorrelation in the residuals that would otherwise be present. By extending the model, it was possible to continue to draw valid inferences from the asymptotic critical values given in Tables 4.1 and 4.2; otherwise it would have been necessary to recompute these critical values for each different DGP, which in turn would require knowledge of the unobservable orders ($p$) of the processes in these underlying DGPs.

In expanding the set of models to which we can apply these tests, our aim is to avoid increasing the number of tables of critical values that we must find and use while nonetheless allowing for quite general DGPs. Phillips (1987a) provides an alternative procedure that largely allows us to do so; our exposition relies on further results reported in Perron (1988) and Phillips and Perron (1988). Rather than taking account of extra elements in the DGP by adding them to the regression model, Phillips suggests accounting for the autocorrelation that will be present (when these terms are omitted) through a non-parametric correction to the standard statistics. That is, while the Dickey-Fuller procedure aims to retain the validity of tests based on white-noise errors in the regression model by ensuring that those errors are indeed white noise, the Phillips procedure acts instead to modify the statistics after estimation in order to take into account the effect that autocorrelated errors will have on the results. Asymptotically, the statistic is corrected by the appropriate amount, and so the same limiting distributions apply. From one perspective, the effect is the same as that of ADF-type tests: we can validly conduct asymptotic inference using Tables 4.1 and 4.2. This procedure does not, however, require the estimation of additional parameters in the regression model.

The data-generation process that is assumed to hold is
or equivalently
It is important to note, however, that the error term is not being assumed to follow a white-noise process. The conditions that $u_t$ must satisfy in (7a) and (7b) are those listed above in Chapter 3 as conditions (3.16a)-(3.16d) given in Phillips (1987a).

As with the Dickey-Fuller tests, tests of the Phillips type are based upon one of three different regression models, differing only in one case from those used earlier, by centring the trend term:
and
It is easy to calculate from these regressions the coefficient estimates and the '$t$-statistics' for each. For tests of the significance of $\rho_i$, the statistics are then adjusted to reflect autocorrelation in the corresponding $u_{it}$ series. (We will omit subscripts $a$, $b$, or $c$ on $u_t$ to simplify notation.) If we define
and
then the limiting distributions of the test statistics do not depend upon the parameters of the process determining the sequence $\{u_t\}$ if $\sigma^2 = \sigma_u^2$. In the case of test statistics of the Dickey-Fuller (DF) type that we examined earlier, the model is presumed to capture the relevant features of the process in such a way that the errors are independently and identically distributed; the latter is sufficient to guarantee that $\sigma^2 = \sigma_u^2$. Note that the statistics used in the DF-type parametric tests do emerge as special cases of the non-parametric statistics where the estimates of the parameters $\sigma^2$ and $\sigma_u^2$ are equal (i.e. where the estimates $S_u^2$ and $S_{T\ell}^2$, given in (11) and (12) below, are equal).

We will see this more clearly when we examine the non-parametric statistics. In order to do so, we first need consistent estimators of $\sigma^2$ and $\sigma_u^2$. There are a number of possible choices. If $\mu = 0$ in the DGP (7), then the standard estimator from any of (8a), (8b), (8c) will be consistent for $\sigma_u^2$; that is,
where $\hat{u}_t$ represents the residuals from one of (8a), (8b), (8c), above. If $\mu \neq 0$, the estimator is not consistent using the residuals $\{\hat{u}_{at}\}$, but residuals from either of the other two models do yield a consistent estimate.

For the estimator of $\sigma^2$, a consistent estimator can be found at the cost of strengthening the assumptions. First, condition (3.16b) is replaced with the condition that $\sup_t E(|u_t|^{2\beta}) < \infty$ for some $\beta > 2$. Next, a condition must be placed on the lag truncation parameter $\ell$ which will be used in defining the estimator of $\sigma^2$. The condition is that $\ell \to \infty$ as $T \to \infty$, such that $\ell$ is $o(T^{1/4})$. That is, the number of lags used in
estimating autocorrelations of the residuals increases with the sample size, but less quickly than its fourth root.
Given these conditions, a consistent estimator of $\sigma^2$ is
The estimator is indexed by the lag truncation parameter $\ell$ to indicate that different choices of $\ell$ will lead to different values. It remains only to specify the residuals to be used in (12), and, as in (11) above, we may choose them from any of (8a), (8b), (8c) if $\mu = 0$. Also as in (11), $\mu \neq 0$ requires that we use the residuals from one of the models that does contain a constant term in order to preserve the consistency of this variance estimate. Evidently the safe strategy is to take residual estimates from (8b) or (8c) in any case where there seems even a small probability that the data-generation process contains a constant (drift) term.

It is important to note that both of the variance estimates $S_u^2$ and $S_{T\ell}^2$ could be defined using the first differences $y_t - y_{t-1}$ rather than the residuals $\hat{u}_t$. Under the null hypothesis that $\rho = 1$ and that the drift and trend terms are zero, the two will of course be equivalent asymptotically. In finite samples, which of the two methods is used can make a substantial difference, however; we will return to this point below.

While $S_{T\ell}^2$ just defined is consistent for $\sigma^2$ given residuals from the appropriate model, it unfortunately does not guarantee a non-negative estimate for finite sample sizes. However, one can guarantee a non-negative estimate with a simple modification of (12) pioneered by Newey and West (1987), which is moreover consistent under precisely the same conditions as is (12). Define
where $\omega_\ell(j) = 1 - j(\ell + 1)^{-1}$. A few examples of tests using these quantities to transform the test statistics can be presented without further discussion. Thereafter we will present statistics for hypotheses on $\mu_b$, $\mu_c$, and $\gamma_c$ in (8b) and (8c), and for hypotheses involving $\rho$ as well as these parameters.

Consider the hypothesis that $\rho_b = 1$ (in (8b)).² An asymptotically valid test consists of the statistic³
2 We treat the initial observation as fixed at zero; not all statistics here are invariant to the initial value. See Phillips (1987a) and Perron (1988).
3 These statistics are valid for either choice of $S_{T\ell}^2$ given above (i.e. the Phillips or Newey-West forms).
or, alternatively,
where $t(\hat{\rho}_b)$ is the $t$-statistic associated with testing the null hypothesis $\rho_b = 1$. The first of these statistics, $Z(\hat{\rho}_b)$, has under the null hypothesis ($H_0$: $\rho_b = 1$) the limiting distribution given in Table 4.1(b) ($T \to \infty$); the second has the limiting distribution given in Table 4.2(b) ($T \to \infty$) under the same null. It is especially useful to note again here the fact that the original Dickey-Fuller statistics are special cases of these. Under Dickey and Fuller's assumptions, the $\{u_t\}_{t=1}^{\infty}$ are independently and identically distributed, implying, as we noted above, that $\sigma_u^2 = \sigma^2$ and therefore that $E(S_{T\ell}^2) = E(S_u^2)$. Hence on average $S_{T\ell}^2 = S_u^2$, and $Z(\hat{\rho}_b)$ reduces to $T(\hat{\rho}_b - 1)$. This is precisely the first of the statistics that Dickey and Fuller examine. Moreover, $Z(t(\hat{\rho}_b))$ reduces to $t(\hat{\rho}_b)$, the ordinary regression $t$-statistic, and has the distribution given in Table 4.2.

The corresponding statistics for models (8a) and (8c) are also given in Perron (1988), and share this property. For (8a), the test statistics are similar to (14) and (15). They are (with $y_0 = 0$)
and
Analogous to the tests on (8a), (16) has the significance points given in Table 4.1(a) and (17) those in Table 4.2(a). Finally, for model (8c), we have
and
having the limiting distributions tabulated in Tables 4.1(c) and 4.2(c) respectively. The quantity $D_x$ is defined as the determinant of the inner product of the data matrix with itself: for (8c),
where, again, summations are over all available elements of the vectors.
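To make the non-parametric correction concrete, the sketch below computes the two adjusted statistics for the constant-only model (8b). The formulae coded here are the forms commonly cited from Phillips (1987a) and Perron (1988), with the Newey-West weights $\omega_\ell(j) = 1 - j/(\ell + 1)$; they are stated as an assumption rather than as a transcription of (11)-(15), and the function name `phillips_perron` and its dictionary keys are hypothetical.

```python
import random


def phillips_perron(y, lag):
    """Adjusted statistics for y_t = mu + rho * y_{t-1} + u_t (assumed forms)."""
    x, z = y[:-1], y[1:]
    T = len(z)
    xbar, zbar = sum(x) / T, sum(z) / T
    sxx = sum((a - xbar) ** 2 for a in x)
    rho = sum((a - xbar) * (b - zbar) for a, b in zip(x, z)) / sxx
    mu = zbar - rho * xbar
    u = [b - mu - rho * a for a, b in zip(x, z)]  # OLS residuals
    rss = sum(e * e for e in u)
    df_rho = T * (rho - 1.0)                               # T(rho_hat - 1)
    df_t = (rho - 1.0) / ((rss / (T - 2)) / sxx) ** 0.5    # ordinary t-statistic

    s2_u = rss / T          # estimate of sigma_u^2
    s2_tl = s2_u            # estimate of sigma^2 with Bartlett-type weights
    for j in range(1, lag + 1):
        w = 1.0 - j / (lag + 1.0)
        s2_tl += 2.0 * w * sum(u[t] * u[t - j] for t in range(j, T)) / T

    half_gap = 0.5 * (s2_tl - s2_u)
    m = sxx / T ** 2        # T^{-2} * sum of squared demeaned y_{t-1}
    z_rho = df_rho - half_gap / m
    z_t = (s2_u / s2_tl) ** 0.5 * df_t - half_gap / (s2_tl * m) ** 0.5
    return {"z_rho": z_rho, "z_t": z_t, "df_rho": df_rho, "df_t": df_t,
            "s2_u": s2_u, "s2_tl": s2_tl}


rng = random.Random(0)
series = [0.0]
for _ in range(200):
    series.append(series[-1] + rng.gauss(0.0, 1.0))

uncorrected = phillips_perron(series, lag=0)
corrected = phillips_perron(series, lag=4)
print(uncorrected["z_rho"], corrected["z_rho"])
print(uncorrected["z_t"], corrected["z_t"])
```

With a truncation lag of zero the two variance estimates coincide, so the adjusted statistics collapse to $T(\hat{\rho} - 1)$ and the ordinary $t$-statistic, which is the special-case property noted in the text.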
In addition to the extension of the Phillips (1987a) results to the case of regression models containing constant and trend, Phillips and Perron (1988) present simulation evidence regarding the power of the Phillips-type procedures vis-à-vis that of the Said-Dickey procedure, each being applicable to processes that have general ARMA($p$, $q$) processes in the errors from a regression model that consists of a constant and lagged dependent variable. The data-generation process is taken to be
To characterize the results roughly, the Phillips or Phillips-Perron test generally has higher power, but suffers substantial size distortions for $\theta < 0$, in samples of sizes typically found in economics. The Said-Dickey test also involves size distortions for $\theta < 0$, but much smaller ones: that is, each test rejects a true null of $\rho = 1$ more than the nominal size (5 per cent in these experiments) states, but the problem is much worse for the $Z(\hat{\rho})$ and $Z(t(\hat{\rho}))$ statistics of Phillips and Perron, where rejections of the true null range as high as 99.7 per cent for $\theta = -0.8$. (Size and power also depend upon the number of lags chosen in the Said-Dickey test and on the lag truncation parameter in the Phillips-Perron tests.) For the Said-Dickey test, the largest size distortions (with two lags, a true null is rejected approximately 67.7 per cent of the time at a nominal size of 5 per cent) diminish as the number of lags used increases, falling to only 12 per cent where 12 lags are used.

This simulation study is of course a limited one, dealing as it does with only one ARMA process for the equation errors. It does however suggest that the Phillips-type tests are more likely to reject the null of a unit root, whether or not it is false; for errors with strong negative MA components, the difference is quite large. One might suspect as well that the power of the Said-Dickey procedure would be higher for processes involving AR errors, because the test regression captures AR terms precisely.

Phillips and Perron conclude by recommending their own $Z(\hat{\rho})$ test for models with positive MA or IID errors, and the Said-Dickey statistic for models with negative MA errors.
4.4. Tests on More than One Parameter
The tests above have all been directed at testing the level autoregressive parameter alone. In models (8b) and (8c), however, there are other parameters present, and one may be interested in a formal test of the hypothesis that one of these is zero, or in a joint test. Tests similar to
those above can be provided, but a further set of tables must be used to find the significance points of the distributions of the resulting test statistics. Tables 4.4 and 4.5 below are based on those given by Dickey and Fuller (1981), who provide likelihood ratio, $t$-type, and $F$-type statistics for tests on the parameters $\mu_b$, $\mu_c$, and $\gamma_c$ in (8b) and (8c). The tables are again derived from a Monte Carlo simulation.

The statistics that Dickey and Fuller offer are derived under the assumption that $u_{bt}$ and $u_{ct}$ are white-noise processes, but they show that, as was the case with tests above, the same distributions can be applied where the errors follow an autoregressive process and a correctly specified model is used to estimate the parameters of this process. As we noted earlier, however, it is desirable to generalize the tests to be applicable to as broad as possible a class of error processes, of unknown form. This can be done, once again, using a non-parametric correction.

Table 4.3 summarizes the $t$-type, $F$-type, and non-parametric test statistics used for several null hypotheses involving the parameters $\mu$ and $\gamma$. In addition to the quantities defined above, we require
The Phillips-Perron corrections to the standard Dickey-Fuller statistics must however be used cautiously. Again, the accumulated evidence of several Monte Carlo simulation studies suggests that the non-parametrically corrected test statistics do not always have the correct sizes even in fairly large samples.

Schwert (1989) makes this point forcefully. His results, amplifying those in the Phillips-Perron simulations reported earlier, show that the critical values of the augmented Dickey-Fuller test statistics, given by the standard Dickey-Fuller tables, are much more robust to the presence of moving average terms in the errors of the random-walk process than are the corresponding non-parametrically adjusted Dickey-Fuller statistics. An example, taken from Schwert, is sufficient to illustrate the point. The data-generation process is given by⁴

$$y_t = y_{t-1} + u_t + \theta u_{t-1},$$

($t = -19, \ldots, T$), where the $\{u_t\}$ process is a normally distributed white-noise process. The first 20 observations are discarded to control for the effect of the initial conditions. Samples of size $T$ = 25, 50, 100, 250, 500, and 1000 are used in the experiments and each experiment is replicated 10,000 times. The MA parameter $\theta$ is set equal to 0.8, 0.5, 0, -0.5, and -0.8. The model estimated is

4 For conformity with the notation of Phillips-Perron used earlier, the sign of the coefficient on $\theta$ is changed here.

TABLE 4.3(a). Test statistics for simple hypotheses in models with drift and trendᵃ
Statistic type    Test statistic
a Critical values for $Z(\tau_1)$, $Z(\tau_2)$, and $Z(\tau_3)$ are the same as those for $\tau_1$, $\tau_2$, and $\tau_3$ respectively and are tabulated in Table 4.4. Note also that $S_u^2$ and $S_{T\ell}^2$ are defined with respect to the residuals of a particular model, and so differ across models (8a), (8b), and (8c).
c ti(j) is the $i$th diagonal element of the inverse second-moment matrix of the regressors in model $j$.
Sources: Dickey and Fuller (1981) and Perron (1988).

TABLE 4.3(b). Test statistics for joint hypothesesᵃ
a Critical values for $Z(\Phi_1)$, $Z(\Phi_2)$, and $Z(\Phi_3)$ are the same as those for $\Phi_1$, $\Phi_2$, and $\Phi_3$ respectively and are tabulated in Table 4.5. Note also that $S_u^2$ and $S_{T\ell}^2$ are defined with respect to the residuals of a particular model, and so differ across models (8a), (8b), and (8c).
Sources: Dickey and Fuller (1981) and Perron (1988).
TABLE 4.4. Empirical cumulative distributions
DGP: (8a) with $\rho = 1$

Sample        Probability of a smaller valueᵃ
size (T)      0.90    0.95    0.975   0.99

(a) Test statistic $\tau_1$; model (8b)
25            2.20    2.61    2.97    3.41
50            2.18    2.56    2.89    3.28
100           2.17    2.54    2.86    3.22
250           2.16    2.53    2.84    3.19
500           2.16    2.52    2.83    3.18
∞             2.16    2.52    2.83    3.18

(b) Test statistic $\tau_2$; model (8c)
25            2.77    3.20    3.59    4.05
50            2.75    3.14    3.47    3.87
100           2.73    3.11    3.42    3.78
250           2.73    3.09    3.39    3.74
500           2.72    3.08    3.38    3.72
∞             2.72    3.08    3.38    3.71

(c) Test statistic $\tau_3$; model (8c)
25            2.39    2.85    3.25    3.74
50            2.38    2.81    3.18    3.60
100           2.38    2.79    3.14    3.53
250           2.38    2.79    3.12    3.49
500           2.38    2.78    3.11    3.48
∞             2.38    2.78    3.11    3.46

a All entries in the table have standard errors of less than 0.01. Distributions are symmetric.
Source: Dickey and Fuller (1981: 1062).

TABLE 4.5. Empirical cumulative distributions

Sample        Probability of a smaller valueᵃ
size (T)      0.01    0.025   0.05    0.10    0.90    0.95    0.975   0.99

(a) Test statistic $\Phi_1$; DGP: (8b) with $\rho_b = 1$, $\mu_b = 0$; model (8b)
25            0.29    0.38    0.49    0.65    4.12    5.18    6.30    7.88
50            0.29    0.39    0.50    0.66    3.94    4.86    5.80    7.06
100           0.29    0.39    0.50    0.67    3.86    4.71    5.57    6.70
250           0.30    0.39    0.51    0.67    3.81    4.63    5.45    6.52
500           0.30    0.39    0.51    0.67    3.79    4.61    5.41    6.47
∞             0.30    0.40    0.51    0.67    3.78    4.59    5.38    6.43

(b) Test statistic $\Phi_2$; DGP: (8c) with $\rho_c = 1$, $\mu_c = 0$, $\gamma_c = 0$; model (8c)
25            0.61    0.75    0.89    1.10    4.67    5.68    6.75    8.21
50            0.62    0.77    0.91    1.12    4.31    5.13    5.94    7.02
100           0.63    0.77    0.92    1.12    4.16    4.88    5.59    6.50
250           0.63    0.77    0.92    1.13    4.07    4.75    5.40    6.22
500           0.63    0.77    0.92    1.13    4.05    4.71    5.35    6.15
∞             0.63    0.77    0.92    1.13    4.03    4.68    5.31    6.09

(c) Test statistic $\Phi_3$; DGP: (8c) with $\rho_c = 1$, $\gamma_c = 0$; model (8c)
25            0.74    0.90    1.08    1.33    5.91    7.24    8.65    10.61
50            0.76    0.93    1.11    1.37    5.61    6.73    7.81    9.31
100           0.76    0.94    1.12    1.38    5.47    6.49    7.44    8.73
250           0.76    0.94    1.13    1.39    5.39    6.34    7.25    8.43
500           0.76    0.94    1.13    1.39    5.36    6.30    7.20    8.34
∞             0.77    0.94    1.13    1.39    5.34    6.25    7.16    8.27

a All entries in the left half of the table have standard errors of less than 0.005; those in the right half, less than 0.06.
Source: Dickey and Fuller (1981: 1063).

Six different test statistics are considered, including the ordinary and augmented Dickey-Fuller statistics and the Phillips-Perron statistics. Both the augmented Dickey-Fuller and the Phillips-Perron statistics are computed for two different lengths of lags. The first lag length is given by $\ell_4 = [4(T/100)^{1/4}]$ and the second by $\ell_{12} = [12(T/100)^{1/4}]$; $[x]$ denotes the largest integer less than or equal to $x$.

The results of this experiment are presented in Tables 1 and 2 of Schwert (1989: 148-9). They indicate that the distributions of the Phillips-Perron tests are not close to the Dickey-Fuller distribution. The distributions are closest when $\theta = 0.5$ or 0.8 but differ markedly for values of $\theta = -0.5$ and $-0.8$. The discrepancies persist even with sample sizes as large as $T = 1000$. The ADF statistics, on the other hand, have distributions that are much closer on average to the Dickey-Fuller distribution.

The poor behaviour of the Phillips-Perron tests where negative MA terms are present persists in regressions that incorporate a time trend.
Schwert also reports the distributions of the normalized unit-root estimators (i.e. $T(\hat{\rho} - 1)$) in their ADF and non-parametrically corrected DF versions. The conclusions remain unaltered. Finally, Schwert's simulations do suggest that the finite-sample performance under the null of the Phillips-Perron procedures, in the cases where MA terms cause size distortions, is better when $S_u^2$ and $S_{T\ell}^2$ are calculated using the first differences of $y_t$ than where the regression residuals are used. However, the tests may then fail to be consistent against some stationary alternative hypotheses (Stock and Watson 1988b). It seems safest, therefore, to avoid these tests if there is any evidence of the kind of MA component to the errors that causes size distortions.

An alternative procedure is proposed by Hall (1989), who suggests that IV be used in place of OLS in augmented Dickey-Fuller tests. The level instrumental variable used in place of $y_{t-1}$ is $y_{t-(k+1)}$, where the residual autocorrelation function has non-zero elements only up to lag $k$ (see Section 4.6.4 below). Hall's Monte Carlo results suggest that the method performs well, particularly for negative MA error processes.
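A scaled-down version of a Schwert-type experiment can be sketched as follows. The DGP and the lag rule $[c(T/100)^{1/4}]$ are those quoted in the text; everything else is an assumption made for illustration: the test regression is taken to include a constant, the number of replications is cut to 200, the critical value $-2.86$ is the asymptotic 5 per cent point commonly tabulated for the constant-only Dickey-Fuller $t$-statistic, and all function names are hypothetical.

```python
import random


def ols(X, y):
    """Least squares via Gauss-Jordan inversion of X'X; returns (beta, se)."""
    n, k = len(X), len(X[0])
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[t] - sum(X[t][j] * beta[j] for j in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    return beta, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]


def adf_t(y, lags):
    """(rho_hat - 1)/SE from y_t on y_{t-1}, lagged differences, and a constant."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    X = [[y[t - 1]] + [dy[t - 1 - i] for i in range(1, lags + 1)] + [1.0]
         for t in range(lags + 1, len(y))]
    beta, se = ols(X, y[lags + 1:])
    return (beta[0] - 1.0) / se[0]


def schwert_lag(T, c):
    """[c * (T/100)^(1/4)], with [x] the integer part of x."""
    return int(c * (T / 100.0) ** 0.25)


def simulate(T, theta, rng):
    """y_t = y_{t-1} + u_t + theta * u_{t-1} for t = -19..T; first 20 values dropped."""
    y, level, u_prev = [], 0.0, rng.gauss(0.0, 1.0)
    for t in range(-19, T + 1):
        u = rng.gauss(0.0, 1.0)
        level += u + theta * u_prev
        u_prev = u
        if t >= 1:
            y.append(level)
    return y


def rejection_rate(T, theta, lags, reps=200, crit=-2.86, seed=0):
    """Share of replications in which a true unit-root null is rejected."""
    rng = random.Random(seed)
    return sum(adf_t(simulate(T, theta, rng), lags) < crit for _ in range(reps)) / reps


size_df = rejection_rate(100, -0.8, lags=0)
size_adf = rejection_rate(100, -0.8, lags=schwert_lag(100, 4))
print("DF  (no lags):", size_df)
print("ADF (l4 lags):", size_adf)
```

With a strongly negative MA parameter the unaugmented statistic should reject the true null far more often than the nominal 5 per cent, and adding lags should reduce, though not necessarily remove, the distortion. The Phillips-Perron statistics are omitted here to keep the sketch short.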
4.5. Further Extensions
Two more extensions of the testing procedure may be considered. The first concerns testing for multiple unit roots in a process. The second is testing for unit roots at seasonal frequencies. Inventories may be regarded as a good example of a variable that is likely to be $I(2)$ (contains two unit roots), as it is constructed by aggregating a function of flow variables (production and sales) which are individually $I(1)$; a test for multiple unit roots would therefore be important when dealing with stock variables of this kind. Tests for seasonal unit roots are applicable when seasonal data are used. Standard unit-root tests may provide misleading results in the presence of integration at seasonal frequencies.

4.5.1. Multiple Unit Roots
Consider the problem of testing for $d > 1$ unit roots in a series. The sequence of testing that starts with a test for a single unit root in the undifferenced series, then proceeds to a test for a second unit root (that is, tests the first-differenced series) if the first null (of a unit root in levels) is not rejected, and so on, does not constitute a statistically valid testing sequence, since all of the unit-root tests considered in this chapter take the complete absence of unit roots as the alternative
hypothesis. Dicke y an d Pantul a (1987 ) sugges t a more natura l sequentia l testing procedur e fo r uni t root s whic h take s th e largest 5 numbe r o f uni t roots unde r consideratio n a s th e firs t maintaine d hypothesi s an d the n decreases th e orde r o f differencin g eac h tim e th e curren t nul l hypothesis is rejected . Thi s continue s unti l th e firs t tim e th e nul l hypothesi s i s no t rejected. The sequentia l procedur e ma y be illustrate d fo r th e cas e d = 2. Le t u s consider th e AR(2 ) model , This mode l ca n be re-parameterize d a s where ft = (pjp 2 - 1 ) and ft = -(1 - pj)( l - p 2). The testin g procedure consist s o f the followin g steps: 1. Tes t th e nul l hypothesi s o f tw o uni t root s agains t th e alternativ e o f a singl e uni t root . Unde r thi s nul l hypothesi s f t = f t = 0 an d a n F-tes t may b e use d t o tes t it . Suc h a test , however , doe s no t tak e accoun t o f the one-side d natur e o f th e alternativ e hypothesis . A mor e powerfu l procedure follow s fro m notin g that , unde r bot h th e nul l an d th e alternative hypotheses , f t = 0. However , f t = 0 unde r th e nul l hypo thesis bu t i s les s tha n zer o unde r th e alternativ e hypothesis . Thus , a more powerfu l tes t i s give n b y estimatin g th e regressio n o f A 2 y, o n Ay f _!, computin g th e f-rati o o f ft , an d performin g a one-side d lower tail test usin g the Dickey-Fulle r critica l values . 2. I f th e nul l hypothesi s abov e i s rejected , procee d t o tes t th e nul l of one uni t roo t versu s th e stationar y alternative . Her e HQ an d HI ar e given b y f t < 0, f t = 0, an d f t < 0, f t < 0 respectively . Thus , a one-sided f-tes t her e involve s estimating the regressio n o f A 2 y, on A y f _ j and y t-\, computin g th e f-rati o o f ft , an d comparin g i t wit h th e Dickey-Fuller values . This testin g procedure ma y be generalize d t o testin g fo r three o r mor e unit roots . 
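The two steps above can be sketched in Python for the case $d = 2$. The regressions follow the description in the text (no constant or trend), while the default critical value of $-1.95$ is an assumption: it is the asymptotic 5 per cent point commonly tabulated for the Dickey-Fuller $t$-statistic without constant. The names `ols`, `dickey_pantula`, and `cumulate` are hypothetical.

```python
import random


def ols(X, y):
    """Least squares via Gauss-Jordan inversion of X'X; returns (beta, se)."""
    n, k = len(X), len(X[0])
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[t] - sum(X[t][j] * beta[j] for j in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    return beta, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]


def dickey_pantula(y, crit=-1.95):
    """Sequential test for up to two unit roots.

    Returns (number of unit roots not rejected, t-ratio of beta_1, t-ratio of beta_2)."""
    rows = [(y[t] - 2.0 * y[t - 1] + y[t - 2],   # second difference of y_t
             y[t - 1] - y[t - 2],                # first difference of y_{t-1}
             y[t - 1])                           # level y_{t-1}
            for t in range(2, len(y))]
    d2y = [r[0] for r in rows]
    # Step 1: regress the second difference on the lagged first difference only.
    b, se = ols([[r[1]] for r in rows], d2y)
    t1 = b[0] / se[0]
    # Step 2: add the lagged level and examine its coefficient.
    b, se = ols([[r[1], r[2]] for r in rows], d2y)
    t2 = b[1] / se[1]
    if t1 >= crit:
        return 2, t1, t2
    if t2 >= crit:
        return 1, t1, t2
    return 0, t1, t2


def cumulate(x):
    """Running sum of a sequence."""
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out


rng = random.Random(42)
shocks = [rng.gauss(0.0, 1.0) for _ in range(400)]
random_walk = cumulate(shocks)        # one unit root
i2_series = cumulate(random_walk)     # two unit roots
stationary = [0.0]
for e in shocks:
    stationary.append(0.5 * stationary[-1] + e)

for name, series in (("I(2)", i2_series), ("I(1)", random_walk), ("I(0)", stationary)):
    print(name, dickey_pantula(series))
```

The second $t$-ratio is reported in every case for inspection, but under the sequential logic it is consulted only when the first null has been rejected.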
Dickey and Pantula (1987) contains the results of a simulation study. Their general conclusion is that the sequential procedure, consisting of testing a null hypothesis of $k$ unit roots against an alternative of $k - 1$ unit roots, based on t-tests, is considerably more powerful than an F-test-based procedure.

4.5.2. Seasonal Integration

We have so far focused attention on testing for a unit root at the zero frequency. However, when seasonal data are used, it may be necessary
⁵ Note that the first sequence took the smallest number (i.e. 1) of unit roots as its first maintained hypothesis.
to allow for seasonal averaging or seasonal differencing to achieve stationarity. For example, the appropriate difference to use to transform to stationarity may not be $x_t - x_{t-1}$, but $x_t - x_{t-4}$ in quarterly data or $x_t - x_{t-12}$ in monthly data. Seasonal integration (and co-integration) and testing for unit roots at seasonal frequencies are discussed by Engle, Granger, and Hallman (1988), Ghysels (1990), Hylleberg, Engle, Granger, and Yoo (1990), Engle, Granger, Hylleberg, and Lee (1993), and Ilmakunnas (1990) among others.

Just as a time series with no seasonal component may be well described by a deterministic process, a stationary stochastic process, or an integrated process, the seasonal component of a time series may be well described by a process from any of these classes, or may combine elements of each. While it is common practice to model a seasonal component as having a deterministic or stationary form, there may be cases where it is appropriate to allow the model of the seasonal component to drift substantially over time. This possibility is implicit in the practice of seasonal differencing (see e.g. Box and Jenkins 1970), whereby a process observed $s$ times per year would be transformed to its $s$-period difference, $x_t - x_{t-s}$, on the assumption that the process contains an integrated seasonal component.

In order to allow for a unit root at a seasonal frequency, it is useful to factor the lag polynomial of the process. If the lag polynomial contains a factor $(1 - L^s) = \Delta_s$, corresponding to a seasonal unit root, then it can be factorized as

$$(1 - L^s) = (1 - L)(1 + L + L^2 + \cdots + L^{s-1}) = \Delta S(L).$$

That is, the seasonal difference operator can be broken down into the product of the first difference operator and the moving-average seasonal filter $S(L)$ containing further roots of modulus unity.

Engle et al. (1988) define a variable $x_t$ to be seasonally integrated of orders $d$ and $D$ (denoted SI($d$, $D$)), if $\Delta^d S(L)^D x_t$ is stationary. Thus, for quarterly data, in the terminology established above, if $\Delta_4 x_t$ is stationary, then $x_t$ is SI(1, 1) with $S(L) = 1 + L + L^2 + L^3$. Further,

$$(1 - L^4) = (1 - L)(1 + L)(1 + L^2) = (1 - L)(1 + L)(1 - iL)(1 + iL).$$

Hence the quarterly seasonal unit root process has four roots of modulus unity: one at the zero frequency, one at the two-quarter (half-yearly) frequency, and a pair of complex conjugate roots at the four-quarter (annual) frequency. To relate these roots to frequencies in an intuitive way, consider the deterministic process $a(L)x_t = 0$. For
$a(L) = (1 + L)$, then $x_{t+1} = -x_t$ and so $x_{t+2} = x_t$; the process returns to its original value on a cycle with a period of 2. For $a(L) = (1 - iL)$, then $x_{t+1} = i x_t$, $x_{t+2} = i^2 x_t = -x_t$, $x_{t+3} = -i x_t$, and $x_{t+4} = -i^2 x_t = x_t$, so that the process repeats with a period of 4.

As with a process with a single unit root at the zero frequency (e.g. the random walk $(1 - L)x_t = \varepsilon_t$), a seasonally integrated process such as $(1 - L^4)x_t = \varepsilon_t$ retains the effect of shocks indefinitely, and has a variance which increases linearly with time. However, because the seasonally integrated process contains multiple roots of modulus unity, it does not behave like an I(1) process in all respects. For example, shocks to the system will also alter the seasonal pattern of the series, so that the sequences of observations corresponding to each quarter may evolve in different ways. The first difference of such a seasonally integrated process will not be stationary.

Testing for a unit root at a seasonal frequency has much in common with testing for unit roots at the zero frequency. Tests have been proposed by Hasza and Fuller (1982), Dickey, Hasza, and Fuller (1984), Osborn, Chui, Smith, and Birchenhall (1988), Hylleberg et al. (1990), and Engle et al. (1993), among others. We will follow Hylleberg et al. in describing a testing strategy.

Consider a process observed quarterly and generated by

$$\gamma(L)x_t = \varepsilon_t, \qquad (20)$$

where $\varepsilon_t$ is IID(0, $\sigma^2$) and $\gamma(L)$ is a fourth-order lag polynomial. We wish to test the null hypothesis that the roots of $\gamma(L)$ lie on the unit circle, against the hypothesis that they lie outside. Defining three positive parameters $\delta_1$, $\delta_2$, and $\delta_3$, $\gamma(L)$ can be represented as⁶

$$\gamma(L) = (1 - \delta_1 L)(1 + \delta_2 L)(1 + \delta_3 L^2).$$
⁶ The last term appears as $(1 + \delta_3 L^2)$ rather than, as might be expected, as $(1 + \delta_3 L)(1 + \delta_4 L)$ because $\gamma(L)$ is a real lag polynomial, and hence at least two of its roots must be complex conjugates of each other.

For $\delta_j$ close to one, this can be further rewritten by using a Taylor series approximation, as

where the last term is a remainder (see Engle et al. (1993) for the approximation theorem). Making the substitutions $\pi_1 = -\lambda_1$, $\pi_2 = -\lambda_2$, $2\lambda_3 = -\pi_3 + i\pi_4$, and $2\lambda_4 = -\pi_3 - i\pi_4$, rewriting the expression for $\gamma(L)$, and grouping terms in $\pi_3$ and $\pi_4$, we have
Substituting this expression int o (20) and rearranging, we have
(21)
Equation (21) can be estimated by OLS, possibly with added lags of the dependent variable to capture autocorrelation in the errors. To test the null that there is a unit root at zero frequency, we test $\lambda_1 = 0$, which corresponds to $\pi_1 = 0$; to test for a root of $-1$ (half-yearly frequency), we test $\lambda_2 = 0$, corresponding to $\pi_2 = 0$; to test for roots of $\pm i$ (annual frequency), we test that $\lambda_3$ or $\lambda_4 = 0$, each of which requires a joint test that $\pi_3$ and $\pi_4$ are equal to zero. Rejection of all of these null hypotheses implies stationarity of the process.

The critical values for these tests are related to the Dickey-Fuller ($\pi_1$ and $\pi_2$) and Dickey-Hasza-Fuller values, and are tabulated by Hylleberg et al.

Various extensions of the basic model are considered by Hasza and Fuller (1982), Dickey et al. (1984), and Osborn et al. (1988), notably to allow for the presence of a deterministic constant and trend terms in (20) and higher orders of integration.
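The factorization of the quarterly seasonal difference operator used in this section, and the link between each unit-modulus root and the period of the cycle it generates, can be checked numerically. The sketch below is illustrative only; the helper names `poly_mul` and `period` are hypothetical, and lag polynomials are represented simply as coefficient lists in ascending powers of $L$.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists in ascending powers of L."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def period(multiplier, max_period=12):
    """Smallest k with multiplier**k == 1, for the recursion x_{t+1} = multiplier * x_t."""
    value = 1
    for k in range(1, max_period + 1):
        value *= multiplier
        if abs(value - 1) < 1e-12:
            return k
    return None

first_difference = [1, -1]   # 1 - L      (zero frequency)
half_yearly = [1, 1]         # 1 + L      (root -1)
annual = [1, 0, 1]           # 1 + L^2    (roots +i and -i)

seasonal_filter = poly_mul(half_yearly, annual)        # S(L) = 1 + L + L^2 + L^3
delta4 = poly_mul(first_difference, seasonal_filter)   # 1 - L^4

if __name__ == "__main__":
    print(seasonal_filter, delta4)
    # (1 + L) x_t = 0 gives x_{t+1} = -x_t; (1 - iL) x_t = 0 gives x_{t+1} = i x_t
    print(period(-1), period(1j), period(-1j))
```

The multiplier $-1$ returns to its starting value after two periods and the multipliers $\pm i$ after four, which matches the half-yearly and annual frequencies described in the text.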
4.6. Asymptotic Distributions of Test Statistics

We will now consider some examples of the use of a functional central limit theorem to derive the asymptotic distributions of test statistics such as those above, for hypotheses involving integrated variables. Again, recall that results on the sums of powers of trend terms are summarized in Section 1.5.5, and that the relationships among particular sample moments, functionals of Wiener processes, and densities from the normal family are given in Table 3.3.

4.6.1. Example: Dickey-Fuller Tests

The simplest version of this test is based on the null hypothesis that the DGP is $y_t = y_{t-1} + u_t$, $u_t \sim \mathrm{IID}(0, \sigma^2)$ and $y_0 = 0$. The model used is
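$y_t = \rho y_{t-1} + u_t$. Therefore, estimating the model by OLS,

$$T(\hat\rho - 1) = \frac{T^{-1}\sum_{t=1}^{T} y_{t-1}u_t}{T^{-2}\sum_{t=1}^{T} y_{t-1}^2}.$$

By equations (3.22) and (3.23),

$$T^{-2}\sum_{t=1}^{T} y_{t-1}^2 \Rightarrow \sigma^2\int_0^1 W(r)^2\,dr$$

and

$$T^{-1}\sum_{t=1}^{T} y_{t-1}u_t \Rightarrow \tfrac{1}{2}\sigma^2\,[W(1)^2 - 1].$$

Hence

$$T(\hat\rho - 1) \Rightarrow \frac{\tfrac{1}{2}[W(1)^2 - 1]}{\int_0^1 W(r)^2\,dr}.$$

The percentiles of this distribution are those given in Table 4.1(a). Further,

$$t(\hat\rho) = \frac{\hat\rho - 1}{\hat\sigma_{\hat\rho}} \Rightarrow \frac{\tfrac{1}{2}[W(1)^2 - 1]}{\left(\int_0^1 W(r)^2\,dr\right)^{1/2}},$$

where

$$\hat\sigma_{\hat\rho}^2 = s^2 \Big/ \sum_{t=1}^{T} y_{t-1}^2, \qquad s^2 = T^{-1}\sum_{t=1}^{T}(y_t - \hat\rho y_{t-1})^2.$$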
The percentiles of the distribution are given in Table 4.2(a).

Now suppose that $y_t$ is generated by the slightly more elaborate process,

$$y_t = \mu_c + y_{t-1} + u_t, \qquad (22)$$

with $u_t \sim \mathrm{IID}(0, \sigma^2)$ and $y_0 = 0$. The model is given by

$$y_t = \mu_c + \rho_c y_{t-1} + \gamma_c t + u_t, \qquad (23)$$

where $\rho_c = 1$, $\gamma_c = 0$ under the null. The null hypothesis therefore entails that the series is a random walk with possible drift, and the alternative is stationarity around a possibly non-zero deterministic trend.

Consider using the model in (23), with $\gamma_c$ and $\mu_c$ unconstrained, to test null hypotheses of the form $H_0^1: \rho_c = 1$, $H_0^2: \gamma_c = 0$, and $H_0: (\rho_c - 1) = \gamma_c = 0$. $H_0$ is the standard Dickey-Fuller null, given as case (iii) in the discussion earlier on similar tests. These tests may all be
put within a common framework by using a set of transformations suggested by Sims et al. (1990). Under the null $H_0$, then $y_t = \mu_c t + S_t$, where $S_t = \sum_{i=1}^{t} u_i$. In general, (23) can be rewritten as

or

where $z_t' = (z_{1,t}, z_{2,t}, z_{3,t})$, $\theta' = (\theta_1, \theta_2, \theta_3)$, with $\theta_1 = (\mu_c + \gamma_c)$, $\theta_2 = \rho_c$, $\theta_3 = (\gamma_c + \rho_c\mu_c)$, and $z_{1,t} = 1$, $z_{2,t} = y_t - \mu_c t = S_t$, $z_{3,t} = t$.

The transformed regressors are linear combinations of the original regressors, with the linear combinations chosen to isolate the regressors with different stochastic properties (that is, a constant, an integrated process with no deterministic trend component, and a linear trend, respectively). Given the rates of convergence implied in (3.21)-(3.24), OLS estimators of the coefficients in $\theta$ converge at different rates. Define the scaling matrix $\Upsilon_T = \mathrm{diag}(T^{1/2}, T, T^{3/2})$ partitioned conformably with $z_t$ and $\theta$. With these definitions, the OLS estimator of $\theta$ is
so that where
From (3.21)-(3.24) we can derive the limiting distributions of the six elements in the $3 \times 3$ symmetric matrix $V_T$ and the three different elements in $\phi_T$. This is done under the additional assumption that $\mu_c = 0$, without any loss of generality, since, having included the trend in (23), the estimates $\hat\theta$ are invariant to the true value of $\mu_c$ given that there is in fact no trend in the DGP.⁷ These elements are:

⁷ Refer to the discussion on similar tests earlier in the chapter.
The analytical densities of $V_{T,1,2}$, $V_{T,2,3}$, $\phi_{T,1}$, $\phi_{T,2}$, and $\phi_{T,3}$ can be found from Table 3.3. In the case of $\phi_{T,2}$ we use the fact that the square of $W(1)$ is distributed as $\chi^2(1)$, recalling that $W(1)$ is standard normal. The closed-form density for the functional to which $V_{T,2,2}$ converges is more difficult to derive, but an asymptotic expansion is given by Abadir (1992).

If, as in this Dickey-Fuller test, we are particularly interested in the estimator of $\rho_c$ and its t-ratio, $t(\hat\rho_c)$, choosing the appropriate elements from above gives

and

where $V_T^{22}$ denotes the second element on the diagonal of $V_T^{-1}$, and $f_i(W)$, $i = 1, 2$, are combinations of the functionals of Wiener processes derived above. For example, from (24)ff., $\rho_c = \theta_2$, $\Upsilon_{T,22} = T$, and $\theta_2 = 1$ under the null. So from (26),
the second element of the $3 \times 1$ matrix $V_T^{-1}\phi_T$. From (27) we note that $(\hat\rho_c - 1)$ converges at rate $O_p(T^{-1})$ instead of the conventional $O_p(T^{-1/2})$. Similarly, from (28), the corresponding t-ratio has a non-degenerate distribution differing from the standardized normal distribution which appears in the conventional asymptotic theory appropriate to stationary processes.

There are analogous expressions for general Wald statistics for the tests of joint hypotheses. Suppose that the Wald statistic tests the $q$ hypotheses $R\theta = r$ in (24). The test statistic is
The asymptotic behaviour of this test statistic after suitable scaling by $\Upsilon_T$ is then a function of the limiting distributions of $V_T$ and $\phi_T$.

4.6.2. Example: Augmented Dickey-Fuller Tests

In this case we assume that the DGP is similar to (22), but that the error term is an AR($p + 1$) process with a unit root. The corresponding model is

with lag polynomial $\beta(L) = \sum_{i=1}^{p}\beta_i L^i$ where the roots of $[1 - \beta(L)L]$ lie outside the unit circle. Under the null hypothesis $H_0$: $\{\rho_c = 1, \gamma_c = 0\}$, the DGP is an AR($p$) generalization of (22) so that we can again use the transformed model

where now $z_t' = (z_{1,t}', z_{2,t}, z_{3,t}, z_{4,t})$ and $\theta' = (\theta_1', \theta_2, \theta_3, \theta_4)$. To define the elements of $z_t$, let $\bar\mu_c = E(\Delta y_t) = (1 - \beta(1))^{-1}\mu_c = b\mu_c$, the unconditional mean of the drift under the null, using $b = (1 - \beta(1))^{-1}$. Next, let

The $\theta_i$ are given by $\theta_1' = (\beta_1, \beta_2, \ldots, \beta_p)$, $\theta_2 = \mu_c + \beta(1)\bar\mu_c + \gamma_c$, $\theta_3 = \rho_c$, and $\theta_4 = \gamma_c + \rho_c\bar\mu_c$. The scaling matrix $\Upsilon_T$ becomes $\mathrm{diag}(T^{1/2}\iota_p, T^{1/2}, T, T^{3/2})$ where $\iota_p$ is the unit vector of dimension $p$. Finally, $\Omega_p = E(z_{1,t}z_{1,t}')$, the covariance matrix of $z_{1,t}$. The elements of the matrices $V_T$ and $\phi_T$ are similar to those for the simple Dickey-Fuller test. Then, using $\xrightarrow{p}$ to denote convergence in probability
Again, Table 3.3 may be applied to find the densities of the Wiener processes appearing above, with the exception of that appearing in the expression for $V_{T,3,3}$; again, an expansion for this density is given by Abadir (1992).

$V$ is therefore block diagonal, and the estimators of the nuisance parameters $\beta$ are asymptotically normal and do not affect the asymptotic distributions of the Dickey-Fuller statistics, so that the same critical values can be used. The $b$s that appear in some of the expressions cancel appropriately to make this possible. This may be seen in the simplest case where the model does not include either the constant or the trend term but does include the $\Delta y_{t-i}$ terms. Noting that in this case the terms $V_{T,1,2}$, $V_{T,1,4}$, $V_{T,2,3}$, $V_{T,2,4}$, $V_{T,3,4}$, $\phi_{T,2}$, and $\phi_{T,4}$ are not relevant, and that $V^{-1} = \mathrm{diag}(\omega^{11}, \ldots, \omega^{pp}, V_{3,3}^{-1})$, where $\omega^{ii}$ is the $i$th diagonal element of $\Omega_p^{-1}$, the distribution of the t-statistic is given by $t = (\sigma^2 V_{T,3,3})^{-1/2}\phi_{T,3}$. This has the standard Dickey-Fuller distribution with the critical values given by Table 4.2(a). The results extend to the cases where the constant and (or) trend are (is) included in the model with the critical values given by Tables 4.2(b) and 4.2(c) respectively.
The inclusion of the I(0) terms $\Delta y_{t-i}$ leaves unchanged the asymptotic distributions of the parameters of interest.
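The non-standard limiting distributions discussed in Section 4.6.1 can be approximated by simulation. The following sketch draws driftless random walks with standard-normal increments, estimates $y_t = \rho y_{t-1} + u_t$ by OLS, and collects $T(\hat\rho - 1)$ and the t-ratio. The sample size, number of replications, seed, degrees-of-freedom correction, and function names are all assumptions made for illustration; the simulated percentiles are rough finite-sample approximations and are not a substitute for the tabulated values.

```python
import random

def df_statistics(T, rng):
    """Return (T*(rho_hat - 1), t-ratio) for one simulated driftless random walk."""
    y = [0.0]
    for _ in range(T):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    lag, cur = y[:-1], y[1:]
    sxx = sum(v * v for v in lag)
    rho = sum(a * b for a, b in zip(lag, cur)) / sxx
    s2 = sum((b - rho * a) ** 2 for a, b in zip(lag, cur)) / (T - 1)
    return T * (rho - 1.0), (rho - 1.0) / (s2 / sxx) ** 0.5

def simulate(T=100, reps=2000, seed=0):
    """Sorted Monte Carlo draws of the coefficient statistic and the t-ratio."""
    rng = random.Random(seed)
    draws = [df_statistics(T, rng) for _ in range(reps)]
    return sorted(d[0] for d in draws), sorted(d[1] for d in draws)

def quantile(sorted_values, q):
    """Simple empirical quantile of an already sorted list."""
    return sorted_values[int(q * (len(sorted_values) - 1))]

if __name__ == "__main__":
    coef, tstat = simulate()
    print("5% point of T(rho_hat - 1):", quantile(coef, 0.05))
    print("5% point of t-ratio:", quantile(tstat, 0.05))
    print("share of rho_hat below 1:", sum(c < 0 for c in coef) / len(coef))
```

Both simulated distributions are skewed to the left, with well over half of the estimates falling below one, which is why the tests are conducted as one-sided lower-tail tests.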
4.6.3. Example: Non-parametric Test Statistics (Phillips 1987a)

Consider the simple random-walk process $y_t = y_{t-1} + u_t$. The main features of non-parametric corrections may be illustrated by assuming that the only restrictions imposed on the stochastic process $\{u_t\}_{t=1}^{\infty}$ are those given by conditions (3.16a)-(3.16d); $\{u_t\}_{t=1}^{\infty}$ may therefore be an ARMA($p$, $q$) process, in which case the t-statistic for $\rho$, in the model $y_t = \rho y_{t-1} + u_t$, does not have the standard Dickey-Fuller distribution.

As discussed earlier in this chapter, a non-parametric correction is one way of accounting for the autocorrelation in the $\{u_t\}_{t=1}^{\infty}$ series. This correction enables us to retain the use of the Dickey-Fuller critical values to conduct inference and therefore expands the range of models to which the Dickey-Fuller tests can be applied.

Using the results in (3.21)-(3.24), the estimator $\hat\rho$ and its t-ratio $t(\hat\rho)$ have the following limiting distributions:

where $\lambda = (\sigma^2 - \sigma_u^2)/2$, and $\sigma^2$ and $\sigma_u^2$ are as defined in (10a) and (10b). If the $u_t$ are IID(0, $\sigma^2$), then $\sigma^2 = \sigma_u^2$ and $\lambda = 0$. If so, the distributions of $\hat\rho$ and its t-ratio in (31) and (32) above are the usual Dickey-Fuller distributions.

It may then be verified that the limiting distribution of the statistic $Z(\hat\rho)$, where
is the same as the distribution obtained by setting $\lambda = 0$ in (31). This follows from an inspection of (31) and by noting that
Similarly, the limiting distribution of the $Z(t(\hat\rho))$, where

is the same as the distribution obtained by setting $\lambda = 0$ in (32).

The limiting distributions of (33) and (34) are unchanged when $\lambda$ is replaced by $\hat\lambda$ in these expressions, where $\hat\lambda$ is a consistent estimator of $\lambda$. Consistent estimators of $\sigma^2$ and $\sigma_u^2$ are required in order to obtain a consistent estimator of $\lambda$ and to implement the non-parametric corrections. A consistent estimator of $\sigma_u^2$ is given by either $T^{-1}\sum_{t=1}^{T}(y_t - y_{t-1})^2$ or $T^{-1}\sum_{t=1}^{T}(y_t - \hat\rho y_{t-1})^2$. The asymptotic equivalence of the two estimators follows from the property that $\hat\rho \to 1$ in probability.⁸ A consistent estimator of $\sigma^2$ can be obtained from (12) or (13) as before.

Using arguments similar to those outlined above, the non-parametric corrections for the more elaborate models, which include a constant or a constant and trend, may be derived. In particular, $Z(\hat\rho_i)$ and $Z(t(\hat\rho_i))$ ($i = b, c$) may be obtained.
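A minimal sketch of the coefficient form of the correction is given below. It assumes the usual Phillips (1987) form $Z(\hat\rho) = T(\hat\rho - 1) - \hat\lambda\,/\,(T^{-2}\sum y_{t-1}^2)$ with $\hat\lambda = (\hat\sigma^2 - \hat\sigma_u^2)/2$, and it estimates the long-run variance with Bartlett (Newey-West) weights. The weighting scheme, the lag length, and the function names are assumptions made here, since estimators (12) and (13) are not reproduced in this section.

```python
import random

def long_run_variance(u, lags):
    """Return (variance of u, long-run variance of u) using Bartlett weights (assumed)."""
    T = len(u)
    gamma0 = sum(e * e for e in u) / T
    lrv = gamma0
    for j in range(1, lags + 1):
        gamma_j = sum(u[t] * u[t - j] for t in range(j, T)) / T
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j
    return gamma0, lrv

def phillips_z_rho(y, lags):
    """Return (T*(rho_hat - 1), Z(rho_hat)) for the model y_t = rho*y_{t-1} + u_t."""
    lag, cur = y[:-1], y[1:]
    T = len(cur)
    sxx = sum(v * v for v in lag)
    rho = sum(a * b for a, b in zip(lag, cur)) / sxx
    resid = [b - rho * a for a, b in zip(lag, cur)]
    sigma_u2, sigma2 = long_run_variance(resid, lags)
    lam = (sigma2 - sigma_u2) / 2.0
    raw = T * (rho - 1.0)
    return raw, raw - lam / (sxx / T ** 2)

if __name__ == "__main__":
    # Random walk whose increments follow a negative MA(1): u_t = e_t - 0.5 e_{t-1}
    rng = random.Random(3)
    eps = [rng.gauss(0.0, 1.0) for _ in range(501)]
    y = [0.0]
    for t in range(1, 501):
        y.append(y[-1] + eps[t] - 0.5 * eps[t - 1])
    print(phillips_z_rho(y, lags=0))  # no correction: both entries coincide
    print(phillips_z_rho(y, lags=4))  # negative lambda_hat shifts the statistic upwards
```

With zero lags the estimated $\lambda$ is exactly zero and the statistic reduces to $T(\hat\rho - 1)$; with negatively autocorrelated errors $\hat\lambda$ is negative, so the corrected statistic lies above the uncorrected one.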
4.6.4. Example: Instrumental Variables Test for Unit Roots (Hall 1989)

The non-parametric statistics described in example 4.6.3 are known not to perform well in finite samples in the presence of negative moving-average errors (see Schwert 1989). Hall (1989) proposed estimation by instrumental variables as an alternative to the use of non-parametric corrections. He showed that, in the regression model $y_t = \rho y_{t-1} + u_t$, where $u_t$ is a moving-average process of some specified order and $\rho$ is equal to 1 under $H_0$, $\hat\rho_{IV}$ has the standard Dickey-Fuller distribution.

The intuition for this result may be easily described: $\hat\rho_{OLS}$ in the above model does not have the standard Dickey-Fuller distribution because of the bias induced by the correlation between $y_{t-1}$ and $u_t$ (when $u_t$ is an ARMA($p$, $q$) process). It is therefore necessary to use a correction factor to remove this bias. This bias does not appear when, say, $y_{t-2}$ is used as an instrument for $y_{t-1}$ and $u_t$ is an MA(1) process. The Dickey-Fuller tables can thus be used directly. We formalize this intuition next by presenting a simple example and by using some of the distributional results derived earlier in the chapter. Throughout, to simplify the algebra, adequate initial observations are assumed to be available, so all sums are taken over $1, \ldots, T$.

⁸ As noted above, the finite-sample behaviour of these two estimators may be quite different (see Schwert 1989).

Let the DGP be given by
Then $\hat\rho_{IV}$, the instrumental variables estimator of $\rho$ which uses $y_{t-2}$ as an instrument for $y_{t-1}$, is given by

$$\hat\rho_{IV} = \frac{\sum y_{t-2}\,y_t}{\sum y_{t-2}\,y_{t-1}}. \qquad (36)$$

Next, we want to prove that

$$T(\hat\rho_{IV} - 1) \Rightarrow \frac{\tfrac{1}{2}[W(1)^2 - 1]}{\int_0^1 W(r)^2\,dr}, \qquad (37)$$

where $W(r)$ is the Wiener process associated with the sequence $\{u_t\}$. The RHS of this expression is the limiting distribution of the simple Dickey-Fuller test for a model like (35) when the $u_t$ are IID (see Section 4.6.1). Thus, we need to show that, for the instrument $y_{t-k}$
Note tha t
Proof of (i). From (35a),
This follows from th e fac t tha t
Recall now from (3.23) that

for the DGP given by (35a)-(35c). Further, for the error process $u_t$, $\sigma_u^2 = (1 + \theta^2)\sigma_\varepsilon^2$ and $\sigma^2 = (1 + \theta)^2\sigma_\varepsilon^2$. It also follows from (35b) that

Using (39), it is now possible to see from (38) that

But $\sigma_u^2 = (1 + \theta^2)\sigma_\varepsilon^2$. Hence

The last equality follows from the expression for $\sigma^2$ given previously. (i) now follows routinely from (40).

Proof of (ii).

All terms of the form $T^{-2}\sum_{t=1}^{T} y_{t-1}u_{t-j}$, $j = 1, 2, \ldots, (k - 1)$, converge in probability to zero. This is because the scaling $T^{-1}$ is appropriate for these sums to have non-degenerate distributions.⁹ The scaling $T^{-2}$ induces degeneracy. The distribution of $T^{-2}\sum_{t=1}^{T} y_{t-1}^2$ is given by $\sigma^2\left(\int_0^1 W(r)^2\,dr\right)$ for the DGP (35a)-(35c); (ii) now follows routinely.

Finally, (37) follows from (36), using $k = 2$ in (i) and (ii), since

⁹ This follows from arguments similar to those used to prove (3.21)-(3.24).
It also follows from (37) that the t-ratio form of the test,

has the Dickey-Fuller t-distribution, where $\hat\sigma$ is a consistent estimator of $\sigma$ (possibly equal to $(1 + \hat\theta)\hat\sigma_\varepsilon$, where $\hat\theta$ and $\hat\sigma_\varepsilon$ are OLS estimators of $\theta$ and $\sigma_\varepsilon$).

Thus, estimation by instrumental variables has the same effect as the non-parametric corrections to $\hat\rho$(OLS) proposed by Phillips and Perron.

In a small Monte Carlo study, Hall (1989) shows that the size problems associated with the Phillips-Perron test are partially alleviated by the use of this instrumental variable procedure. However, substantial size distortions remain in the cases where $\theta < 0$ in the null model. No power calculations are reported in Hall's paper.

4.6.5. Example: Bounds Test for Unit Roots (Phillips and Ouliaris 1988)

A limitation of the testing procedures discussed in this chapter is that the distributions of the test statistics are non-standard. Consequently, a number of different sets of critical values have to be used to implement the tests.

This problem is at the heart of a literature which exploits the idea that differencing an I(0) series induces a unit root in the moving-average representation of the process. Use is made of this fact to devise a unit-root test based on the long-run variance, defined in (3.16c), of the first-differenced time series. The critical values are taken from the standard normal table.

In order to illustrate this approach, assume that $y_t$ follows the IMA(1,1) process,

$$\Delta y_t = (1 - \theta L)\varepsilon_t = u_t, \qquad (41)$$

with $\varepsilon_t \sim \mathrm{IID}(0, \sigma_\varepsilon^2)$. The long-run variance of $\Delta y_t$ is $\sigma^2 = (1 - \theta)^2\sigma_\varepsilon^2$, so $\sigma^2 \neq 0$ if and only if $\theta \neq 1$. In other words, if $y_t$ is I(0), $\Delta y_t$ will have $\sigma^2 = 0$, while if it is I(1), with $|\theta| < 1$, $\sigma^2 \neq 0$. Phillips and Ouliaris (1988) therefore take as their null hypothesis $H_0$: $\sigma^2 \neq 0$ or (equivalently, but standardizing to eliminate units of measurement effects) $H_0$: $\tau^2 = \sigma^2/\sigma_\varepsilon^2 \neq 0$ against the alternative hypothesis $H_1$: $\tau^2 = 0$. Obtaining an estimate of $\sigma^2$ as in (13), they prove that¹⁰

$$\ell^{1/2}(\hat\tau^2 - \tau^2)/\tau^2 \sim N(0, 1). \qquad (42)$$

¹⁰ $\ell$ is the lag-truncation parameter as defined in (12).
They propose a bounds procedure based upon the confidence interval corresponding to (42) and given by

where $z_\alpha$ is the $(1 - \alpha)$th percentage point of the standard normal distribution. According to the bounds test, $H_0$ is rejected if the upper limit of $\tau^2$ in (43) is sufficiently small and close to zero. Conversely, $H_0$ is not rejected if the lower bound is sufficiently large and non-zero; 0.10 is recommended as a thumb-rule value of 'nearness'. Simulation results show that this suggested critical value can lead to very conservative tests in some cases. For example, if the DGP is ARIMA(0, 1, 1) with values of the parameter $\theta$ in the interval $(-0.6, 0.6)$, the average upper bound is 0.45 while the average value of the lower bound is close to 0.10.

An implication of this type of test is that, because of asymptotic normality, it can be applied to deal with very general trend-cycle models (for example, linear functions of time or any type of dummy variable). All that is required is to perform the previous test on the differenced residuals of the regression of $y_t$ on the deterministic terms.

Phillips and Ouliaris (1988) extend this approach to testing for co-integration among a set of $n$ variables in the vector $x_t$. If $x_t$ does not form a co-integrated set of variables, $\alpha' x_t$ is I(1) for all $\alpha$. Hence, generalizing the analysis given above, $\alpha'\Delta x_t$ has a positive definite long-run variance matrix $\alpha'\Omega\alpha$, where $\Omega$ is the long-run variance matrix of $\Delta x_t$. Since $\alpha'\Omega\alpha \neq 0$ implies that $\Omega\alpha \neq 0$, Phillips and Ouliaris (1990) suggest testing for a zero eigenvalue in $\Omega$, using a multivariate estimator of $\Omega$, under the null hypothesis of 'no co-integration'.
The test is based on the bounds procedure discussed previously but is applied to the minimum of the estimated eigenvalues of the consistent estimator of $\Omega$.

Taken together, the methods of testing just presented offer a means of discriminating between stationary and non-stationary processes in reasonably general circumstances, without too great a proliferation of tables of critical values. There remains work to be done, however, in improving the power of the tests and in achieving a greater conformity with nominal sizes in finite samples, for particular kinds of error process. Moreover, research is needed into the effects of parameter non-constancy, or even of the possibility that the degree of integration may not be constant, on such tests.

Tests for unit roots are applied for a wide variety of reasons. The tests may, first of all, be directly relevant to economic theory, which offers a number of examples of hypotheses that imply unit roots in observable data series. Moreover, because of the potential problem of spurious regression, investigators working with highly autocorrelated series will often want to test for non-stationarity in these series. If non-stationarity can be rejected, standard regression methods can be applied safely; otherwise, an investigator may choose to transform the series to stationarity, or may investigate co-integrating relationships between the data series which, if present, could again justify regression involving the levels of the variables.

The next chapter takes up the topic of co-integration among different processes and thereby continues the study of regression models of non-stationary data series. Tests for co-integration, which will be considered in Chapter 7, bear a close relationship to tests for unit roots.
5
Co-integration

We define the concept of co-integration of integrated time-series and give several examples. An important theorem due to Granger on alternative representations of a system of co-integrated variables is stated and its proof is sketched. We then discuss the Engle-Granger two-step procedure for estimating the parameters characterizing the co-integrating relationship.

In Chapter 1 we discussed our use of the word 'equilibrium'. The idea that variables hypothesized to be linked by some theoretical economic relationship should not diverge from each other in the long run is a fundamental one.¹ Such variables may drift apart in the short run or because of seasonal effects, but if they were to diverge without bound, an equilibrium relationship among such variables could not be said to exist. The divergence from a stable equilibrium state must be stochastically bounded and, at some point, diminishing over time. 'Co-integration' may be viewed as the statistical expression of the nature of such equilibrium relationships.

The concept of co-integration is a powerful one because it allows us to describe the existence of an equilibrium, or stationary, relationship among two or more time-series, each of which is individually non-stationary.² That is, while the component time-series may have moments such as means, variances, and covariances varying with time, some linear combination of these series, which defines the equilibrium relationship, has time-invariant linear properties.

The word 'co-integration' clearly demands a formal definition of 'integration', and this was provided in Chapter 3. Informally, a series is said to be integrated if it accumulates some past effects; such a series is non-stationary because its future path depends upon all such past influences, and is not tied to some mean to which it must eventually return. To transform an integrated series to achieve stationarity, we must difference it at least once. However, a linear combination of series may have a lower order of integration than any one of them has individually. In this case, the variables are said to be co-integrated.³ Thus, for example, if $\{x_t\}$ and $\{y_t\}$ are integrated of order 1 and are also co-integrated, then $\{\Delta x_t\}$, $\{\Delta y_t\}$, and $\{x_t + \alpha y_t\}$, for some $\alpha$, are all stationary series.

This chapter provides formal definitions of co-integration and of related concepts. Several theorems are stated, applying in particular to alternative representations of co-integrated processes.

¹ Familiar examples of hypothesized long-run relationships include the quantity theory of money, the Fisher effect, the permanent-income hypothesis of consumption, and purchasing-power parity.

² Typically, in economic applications one looks for the existence of co-integrating relationships among variables individually integrated of order one. The deviation from the equilibrium relationship is thus integrated of order zero (i.e. is stationary) when the variables are co-integrated.
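5.1. An example

In order to illustrate the preceding discussion, consider a simple example. Two series $\{x_t\}$ and $\{y_t\}$ are each integrated of order 1 and evolve according to the following data-generation process:⁴

$$x_t + \beta y_t = u_t, \qquad u_t = u_{t-1} + \varepsilon_{1t}, \qquad (1)$$

$$x_t + \alpha y_t = e_t, \qquad e_t = \rho e_{t-1} + \varepsilon_{2t}, \quad |\rho| < 1. \qquad (2)$$

$(\varepsilon_{1t}, \varepsilon_{2t})'$ is distributed identically and independently as a bivariate normal with

Solving for $x_t$ and $y_t$ from the above system with $\alpha \neq \beta$ gives

$$x_t = \alpha(\alpha - \beta)^{-1}u_t - \beta(\alpha - \beta)^{-1}e_t,$$

$$y_t = -(\alpha - \beta)^{-1}u_t + (\alpha - \beta)^{-1}e_t.$$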
Since $\{u_t\}$ is a random walk and $\{x_t\}$ and $\{y_t\}$ depend linearly on $\{u_t\}$, these may therefore be classified as I(1) variables. Nonetheless, $\{x_t + \alpha y_t\}$ is I(0) because $e_t$ is stationary in (2). In this example the vector $[1 : \alpha]'$ is the co-integrating vector and $x + \alpha y$ is the equilibrium relationship. In the long run, the variables move towards the equilibrium $x + \alpha y = 0$, recognizing that this relationship need not be realized exactly even as $t \to \infty$.
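The behaviour described in this example can be illustrated by simulation. The sketch below assumes the system takes the Engle-Granger (1987) form, with $x_t + \beta y_t$ equal to a random walk $u_t$ and $x_t + \alpha y_t$ equal to a stationary AR(1) process $e_t$, and it draws the two innovations as independent standard normals. The independence of the innovations, the parameter values, and the function names are simplifying assumptions made for illustration only.

```python
import random

def simulate_system(alpha, beta, rho, n, seed):
    """Simulate x_t and y_t from a random walk u_t and a stationary AR(1) e_t."""
    rng = random.Random(seed)
    u = e = 0.0
    xs, ys = [], []
    for _ in range(n):
        u = u + rng.gauss(0.0, 1.0)        # random-walk component
        e = rho * e + rng.gauss(0.0, 1.0)  # stationary component, |rho| < 1
        xs.append((alpha * u - beta * e) / (alpha - beta))
        ys.append((e - u) / (alpha - beta))
    return xs, ys

def variance(values):
    """Sample variance around the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

if __name__ == "__main__":
    alpha, beta = 2.0, 1.0
    xs, ys = simulate_system(alpha, beta, rho=0.5, n=2000, seed=0)
    combo = [x + alpha * y for x, y in zip(xs, ys)]  # recovers e_t
    print("sample variance of x:", variance(xs))
    print("sample variance of y:", variance(ys))
    print("sample variance of x + alpha*y:", variance(combo))
```

The sample variances of the individual series grow with the length of the sample, whereas that of the combination $x_t + \alpha y_t$ settles near the variance of the stationary AR(1) component.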
³ When regarding a co-integrating combination as an 'equilibrium' relationship, it is natural to expect this combination to be integrated of order zero. However, definitionally, any reduction in the order of integration (say, from $d$ to $d - b$, where $b > 0$) is sufficient for the variables to be called 'co-integrated'.

⁴ The example is taken from Engle and Granger (1987).
Although this is a simple example, much of the method and reasoning can be generalized to more complex cases. What is crucial is that, while $\{x_t\}$ and $\{y_t\}$ are integrated processes, not tied to any fixed means, a linear combination of the two variables makes the resulting series a stationary process and the variables $x$ and $y$ may be said to be linked by the corresponding equilibrium relationship.

It is interesting to note that in the bivariate case we have the added bonus that this equilibrium relationship, if such a relationship exists, is unique. The proof is straightforward and follows by contradiction. Suppose not: that is, suppose that there exist two distinct co-integrating parameters $\alpha$ and $\gamma$ such that $\{x_t + \alpha y_t\}$ and $\{x_t + \gamma y_t\}$ are both I(0). This implies that $(\alpha - \gamma)y_t$ is also I(0) because subtracting one I($d$) series from another cannot lead to a series integrated of order $(d + 1)$ (or higher). But since $\{y_t\}$ is I(1), a non-zero constant times $\{y_t\}$ is also I(1). Hence we have a contradiction unless $\alpha = \gamma$.

The analysis is not quite so straightforward in the multivariate case as we must allow for the possibility of several co-integrating vectors. Nevertheless, much of the intuition gained from the analysis of the bivariate case carries through to richer examples.

There are at least three reasons for regarding the concept of co-integration as central to econometric modelling with integrated variables, as well as to the examination of long-run relationships among those variables.

The first is the link that the concept formalizes among variables of higher orders of integration, for which some linear combination is of a lower order of integration.
In the most widely used examples, a reduction is made from variables that require first-differencing for stationarity to a composite time-series that is stationary in levels. In addition, this composite stationary variable, constructed by taking a linear combination of the original series, may be said to characterize the equilibrium relationship linking the series. If an equilibrium exists among several variables so that such a stationary linear combination exists, we may count on eventual return of this linear combination to its mean (typically zero).
Second, and following directly from this identification of co-integration with equilibrium, is the complementary idea of meaningful versus spurious regression. Regressions involving levels of time series of non-stationary variables make sense if and only if these variables are co-integrated. A test for co-integration then yields a useful method of distinguishing meaningful regressions from those that Yule (1926) called 'nonsense' and Granger and Newbold (1974, 1977) called 'spurious'.
Finally, another important property characterizes variables that are co-integrated. A set of co-integrated variables is known to have, among other representations, an error-correction representation; that is, the
relationship may be expressed so that a term representing the deviation of observed values from the long-run equilibrium enters the model. This is an interesting result by itself, but is even more noteworthy as a contribution to resolving, or synthesizing, the debate between time-series analysts and those favouring econometric methods. It allows a reconciliation, at least in part, of time-series methods of analysing data that traditionally considered only the properties of differenced time series (which could more legitimately be assumed stationary) and those econometric methods that laid emphasis on the equilibrium relationships between variables and therefore focused on the levels of variables. Both methods as traditionally used could be said to have been flawed, the former by the implied necessity of ignoring information contained in the levels of variables, the latter by its tendency to ignore the spurious regression problem.
Reliance on the use of differenced data, as a potential cure for the spurious regression problem, raises a set of new issues. An example of a potentially controversial recommendation for modelling economic time series appears in Granger and Newbold (1977, p. 206; emphasis in original): 'In the presence of some autocorrelation of the errors . . . first differencing might be expected to go a long way towards alleviating the problem and is certainly preferable to doing nothing at all.'
As an illustration, Granger and Newbold cite the results of Sheppard (1971), who regressed UK consumption on autonomous expenditure and mid-year money stock for both levels and changes, using annual data over the period 1947-62.
The results were taken to indicate the existence of a significant relationship in levels which disappeared entirely when first differences were employed. The levels regression, characterized by a high value of $R^2$ and a low value of the Durbin-Watson statistic, is spurious. However, the first-differenced regression appears to be testing a different hypothesis.$^5$ The differencing operation, in particular, omits any information about long-run adjustments that the data may contain.
Thus, while the spurious regression problem is a serious one, the practice of differencing integrated series to achieve stationarity, and of treating the resulting series as the proper objects of econometric analysis, is not without costs. Error-correction mechanisms (ECMs) are intended to provide a way of combining the advantages of modelling both levels and differences. In an error-correction model the dynamics of both short-run (changes) and long-run (levels) adjustment processes are modelled simultaneously.
5 In the next chapter we discuss the consequences of differencing (and over-differencing) in cases where differencing (any number of times) does not alleviate the problems of non-stationarity and where transforming the series monotonically, prior to differencing, appears to be the appropriate procedure.
This idea of incorporating the dynamic
adjustment to steady-state targets in the form of error-correction terms, suggested by Sargan (1964) and developed by Hendry and Anderson (1977) and Davidson et al. (1978), among others, therefore offers the possibility of revealing information about both short-run and long-run relationships.
The theory of co-integration provides a unified framework for the analysis of ECMs and of time series in which the variables share one or more stochastic trends. We elaborate upon the alternative representations of co-integrated systems in Section 5.3, where we also provide a more formal description of the theory; we first review the theory of polynomial matrices which is necessary for a thorough understanding of several proofs in the next sections and in following chapters.
5.2. Polynomial Matrices
A polynomial matrix $\mathbf{A}(L)$ is a matrix for which the elements $\{a_{ij}(L)\}$ are scalar polynomials in an argument $L$:
$$a_{ij}(L) = \sum_{m=0}^{k_{ij}} a_{ijm} L^m,$$
where $k_{ij} < \infty$. Useful references to the algebra of polynomial matrices include Gel'fand (1967) and Gantmacher (1959). The degree, $k$, of $\mathbf{A}(L)$ is the highest of the orders $k_{ij}$ of the element polynomials:
$$k = \max_{i,j} k_{ij}.$$
Thus, $\mathbf{A}(L)$ can be expressed as
$$\mathbf{A}(L) = \sum_{i=0}^{k} \mathbf{A}_i L^i. \qquad (10)$$
The determinant $|\mathbf{A}(L)|$ of a polynomial matrix $\mathbf{A}(L)$ is a scalar polynomial. A familiar example of a polynomial matrix is $\mathbf{A}(\lambda) = (\mathbf{A}_0 - \lambda\mathbf{I})$, which occurs in the characteristic equation
$$|\mathbf{A}_0 - \lambda\mathbf{I}| = 0,$$
which may be solved for eigenvalues of the matrix $\mathbf{A}_0$. Every matrix satisfies its own characteristic equation (the Cayley-Hamilton theorem) in that, if we let $f(\lambda) = |\mathbf{A}(\lambda)|$, then $f(\mathbf{A}_0) = \mathbf{0}$ (where this is interpreted as a matrix expression). In general, if $\mathbf{A}(L) = \sum_{i=0}^{k} \mathbf{A}_i L^i$, then we will also use the notation $\mathbf{A}(\mathbf{B}) = \sum_{i=0}^{k} \mathbf{A}_i \mathbf{B}^i$, for a matrix argument $\mathbf{B}$.
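The Cayley-Hamilton statement can be checked numerically. The following is only a minimal sketch: the matrix `A0` holds arbitrary illustrative values, and `poly_at_matrix` is a hypothetical helper written for this example. NumPy's `np.poly` returns the coefficients of $|\lambda\mathbf{I} - \mathbf{A}_0|$, which differs from $|\mathbf{A}_0 - \lambda\mathbf{I}|$ only by the factor $(-1)^n$ and so has the same roots.

```python
import numpy as np

def poly_at_matrix(coeffs, B):
    """Evaluate a scalar polynomial (coefficients listed from the highest
    power down) at a square matrix argument B, using Horner's rule."""
    n = B.shape[0]
    out = np.zeros((n, n))
    for c in coeffs:
        out = out @ B + c * np.eye(n)
    return out

# Illustrative matrix (hypothetical values, not taken from the text).
A0 = np.array([[2.0, 1.0, 0.0],
               [0.0, 3.0, 1.0],
               [1.0, 0.0, 1.0]])

# Coefficients of the characteristic polynomial det(lambda*I - A0).
coeffs = np.poly(A0)

# Cayley-Hamilton: substituting A0 for lambda gives the zero matrix.
residual = poly_at_matrix(coeffs, A0)
print(np.max(np.abs(residual)))
```

The printed value should be zero up to floating-point rounding error.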
The inverse of a finite polynomial matrix $\mathbf{A}(L)$ of degree $k$ which has all roots of the determinantal equation $|\mathbf{A}(z)| = 0$ strictly outside the unit circle$^6$ is given, in general, by an infinite-order matrix $\mathbf{C}(L) = \sum_{i=0}^{\infty} \mathbf{C}_i L^i$. This matrix is well defined if and only if $\sum_{i=0}^{k} \mathbf{C}_i L^i$ is a convergent sequence as $k \to \infty$. For $|z| > 1$ (equivalently, $|L| = |z^{-1}| < 1$), a sufficient condition for this to hold is $|\mathbf{C}_i| \le \rho^i$ where $|\rho| < 1$.$^7$ The $\mathbf{C}_i$ are defined by an infinite set of matrix identities which may be described in a simple scalar case, where $A(L) = 1 - \rho L = a_0 + a_1 L$, as follows:
$$c_0 a_0 = 1, \qquad c_1 a_0 + c_0 a_1 = 0, \qquad c_2 a_0 + c_1 a_1 = 0, \qquad \ldots \qquad (11)$$
such that
$$c_0 = 1, \qquad c_i = \rho\, c_{i-1} = \rho^i, \quad i = 1, 2, \ldots$$
The construction given by (11) is derived by using the property $C(L)A(L) = 1$ and equating powers of $L$. The algebra generalizes to high-order scalar polynomials $A(L)$ and to matrix polynomials $\mathbf{A}(L)$. In the next section of this chapter and in Chapter 8 we shall need to deal with matrix polynomials that have unit roots ($z = 1$). In these cases, while the matrix $\mathbf{A}(L)$ may not have a well defined inverse because of failure of rank conditions, transforming $\mathbf{A}(L)$ and pre- and post-multiplying it by suitable matrices will lead to an invertible matrix provided certain conditions are satisfied.
6 That is, denoting an arbitrary root of the determinant equation by $z$, $|z| > 1 + \varepsilon$, for some $\varepsilon > 0$, for all $z$ satisfying this equation.
7 Note that this exponential decay condition is only sufficient and not necessary to guarantee convergence.
Two polynomial matrices $\mathbf{R}(L)$ and $\mathbf{T}(L)$ are said to be equivalent if and only if there exist two invertible matrices $\mathbf{U}(L)$ and $\mathbf{V}(L)$ such that
$$\mathbf{R}(L) = \mathbf{U}(L)\mathbf{T}(L)\mathbf{V}(L).$$
Every polynomial matrix $\mathbf{A}(L)$ can be divided on the left by a matrix of the form $(\mathbf{B} - L\mathbf{I})$ for any matrix $\mathbf{B}$ so that, where $\mathbf{A}(L)$ is of degree $k$,
$$\mathbf{A}(L) = (\mathbf{B} - L\mathbf{I})\mathbf{H}(L) + \mathbf{D},$$
where $\mathbf{H}(L)$ is of degree $k - 1$ and $\mathbf{D}$ is a constant matrix, the remainder term. To obtain the precise form of $\mathbf{D}$, we will derive this
result, which is simply a linear transformation of the original polynomial matrix. We have
and so on. By induction, we can continue this substitution for any $k$ to get
A similar result holds for division on the right. In dealing with integrated series, the case $\mathbf{B} = \mathbf{I}$ is of particular interest; then
$$\mathbf{A}(L) = \mathbf{A}(1) + (1 - L)\mathbf{H}(L),$$
where $\mathbf{A}(1)$ is equal to $\mathbf{A}(L)$ evaluated at $L = 1$. Note that from (13) and (15), for the case $\mathbf{B} = \mathbf{I}$,
and
Further, $\mathbf{A}(1)$ is called the total effect. When $\mathbf{D} = \mathbf{A}(1) = \mathbf{0}$, then $\mathbf{A}(L)$ is divisible on the left by $(1 - L)\mathbf{I}$ without a remainder, and hence can be rewritten in terms of the operator $(1 - L)$ alone.
The next main result to be proved is the isomorphic relationship between polynomial matrices and companion matrices. This will clarify the derivation of latent roots of polynomial matrices, which are of great interest in analysing dynamics and co-integration. Consider the system of $n$ deterministic linear equations:
$$\mathbf{A}(L)\mathbf{x}_t = \sum_{i=0}^{k} \mathbf{A}_i \mathbf{x}_{t-i} = \mathbf{0}. \qquad (16)$$
We set $\mathbf{A}_0 = \mathbf{I}$ as a normalization. The same information can be
represented in stacked form (called the companion form) by defining the following matrices and vectors:
Direct multiplication of $\boldsymbol{\Phi}$ into $\mathbf{X}_{t-1}$ and comparison of that outcome with $\mathbf{X}_t$ reveals that the second expression in (18) merely augments the original system with a set of identities of the form $\mathbf{x}_{t-1} = \mathbf{x}_{t-1}$, etc. The corresponding advantage of companion forms is that, whatever the value of $k$ in (16), the companion form is always of first order, and hence can be analysed using already established tools. This advantage is pronounced when we wish to find the eigenvalues of $\mathbf{A}(L)$, and do so by solving
It will be convenient to re-express (19) in terms of the negatives of the inverses of the eigenvalues, $\mu = -1/\lambda$, and to solve
Using the definition of $\boldsymbol{\Phi}$ from (17) in (20), we have
From the partitioned inverse formula, where $\mathbf{D} \neq 0$,
The first equality follows from the fact that the determinant of the first
matrix following the equality is one. Repeating these operations in the alternative direction, if $\mathbf{E} \neq 0$, establishes that
Both results will be used below. Here, we apply (22) to the determinant in (21), choosing $\mathbf{E}$ as the large $n(k - 1) \times n(k - 1)$ matrix in the upper-left corner, and $\mathbf{D} = \mathbf{I}$. Then $\mathbf{F}\mathbf{D}^{-1}\mathbf{G}$ is zero except for its top-right block, which is $-\mu^2\mathbf{A}_k$, and $\mathbf{D} = \mathbf{I}$. Thus,
(23)
Comparing (21) with (23), the analysis can be seen to repeat, leading to $|\mathbf{A}(\mu)|$ after $k - 1$ steps. Thus, the latent roots can be found by equating either expression to zero and solving. Since $\mathbf{A}(\cdot)$ is $n \times n$, $\boldsymbol{\Phi}$ is $nk \times nk$ and so has $nk$ eigenvalues, as required.
From (13), when $\mathbf{B} = \mathbf{I}$, if $\mathbf{A}(1)$ has rank $r < n$, then $|\mathbf{A}(1)| = 0$ and hence $\mathbf{A}(L)$ has $n - r$ unit roots. Conversely, if $\mathbf{A}(1)$ has rank $n$, $\mathbf{A}(L)$ has none of its eigenvalues equal to unity.
Next, derivatives of polynomial matrices with respect to their arguments will be needed, and we have
This is reminiscent of the mean-lag formula in a scalar distributed lag. From the result that $\mathbf{H}(1) = -\sum_{i=1}^{k} i\mathbf{A}_i$, we now see that $\mathbf{H}(1) = -\boldsymbol{\Gamma}$. Thus, when $\mathbf{A}(1) = \mathbf{0}$, so that $\mathbf{A}(L) = (1 - L)\mathbf{H}(L)$, then $|\mathbf{H}(L)| = 0$ delivers the remaining eigenvalues. If $\mathbf{H}(1)$ did not have rank $n$ when $\mathbf{A}(1) = \mathbf{0}$, then $|\mathbf{H}(1)| = 0$, so $\mathbf{H}(L)$ also has unit roots. Using (13) and (15) to write $\mathbf{H}(L) = \mathbf{H}(1) + (1 - L)\mathbf{K}(L)$, we note that, in the extreme case that $\boldsymbol{\Gamma} = \mathbf{0}$, $\mathbf{H}(L) = (1 - L)\mathbf{K}(L)$, which implies that $\mathbf{A}(L) = (1 - L)^2\mathbf{K}(L)$. Consequently, equation (16) would become $(1 - L)^2\mathbf{K}(L)\mathbf{x}_t = \mathbf{0}$, yielding a system in second differences. There is a close affinity between the ranks of $\mathbf{A}(1)$, $\mathbf{H}(1)$, etc., and the number of differences that can be extracted from $\mathbf{A}(L)$.
Finally, polynomial matrices are invariant under non-singular linear
transformations in that they have many equivalent representations with the same properties. This is clear from (13) above. More generally,
In terms of (16),
For example, when $k = 1$,
Such linear transformations are used regularly in Chapter 8.
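The companion-form results of this section can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the normalization $\mathbf{A}_0 = \mathbf{I}$, so that the system $\mathbf{A}(L)\mathbf{x}_t = \mathbf{0}$ reads $\mathbf{x}_t = -\mathbf{A}_1\mathbf{x}_{t-1} - \cdots - \mathbf{A}_k\mathbf{x}_{t-k}$, and uses this sign convention for the stacked matrix, which may differ from the convention in (17). The coefficient values and the helper names `companion` and `A_of` are hypothetical. With these choices each eigenvalue $\lambda$ of the companion matrix satisfies $|\mathbf{A}(1/\lambda)| = 0$, there are $nk$ eigenvalues, and a total effect $\mathbf{A}(1)$ of rank $r = 1$ (with $n = 2$) gives $n - r = 1$ unit root in this example.

```python
import numpy as np

def companion(lags):
    """Companion matrix of x_t = -A_1 x_{t-1} - ... - A_k x_{t-k}
    (assumed sign convention, with A_0 = I)."""
    k, n = len(lags), lags[0].shape[0]
    Phi = np.zeros((n * k, n * k))
    Phi[:n, :] = np.hstack([-A for A in lags])   # original system in the top block row
    Phi[n:, :-n] = np.eye(n * (k - 1))           # identities x_{t-1} = x_{t-1}, etc.
    return Phi

def A_of(z, lags):
    """Evaluate A(z) = I + A_1 z + ... + A_k z^k."""
    n = lags[0].shape[0]
    return np.eye(n) + sum(A * z ** (i + 1) for i, A in enumerate(lags))

# Hypothetical coefficients with n = 2, k = 2 and a rank-one total effect A(1).
total_effect = np.outer([0.4, -0.1], [1.0, -2.0])
A2 = np.array([[0.2, 0.0], [0.1, 0.3]])
A1 = total_effect - np.eye(2) - A2               # ensures I + A1 + A2 = A(1)
lags = [A1, A2]

eigvals = np.linalg.eigvals(companion(lags))
# Each eigenvalue lambda of the companion matrix gives a root z = 1/lambda of |A(z)| = 0.
dets = [np.linalg.det(A_of(1.0 / lam, lags)) for lam in eigvals]
n_unit = int(np.sum(np.abs(eigvals - 1.0) < 1e-8))
rank_total = int(np.linalg.matrix_rank(A_of(1.0, lags)))

print(len(eigvals), n_unit, rank_total)
```

Whatever the lag length $k$, the stacked system is of first order, so a standard eigenvalue routine is enough to locate the latent roots.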
5.3. Integration and Co-integration: Formal Definitions and Theorems
DEFINITION 1 (adapted from Engle and Granger 1987). The components of the vector $\mathbf{x}_t$ are said to be co-integrated of order $d, b$, denoted $\mathbf{x}_t \sim CI(d, b)$, if (i) $\mathbf{x}_t$ is $I(d)$ and (ii) there exists a non-zero vector $\boldsymbol{\alpha}$ such that $\boldsymbol{\alpha}'\mathbf{x}_t \sim I(d - b)$, $d \ge b > 0$. The vector $\boldsymbol{\alpha}$ is called the co-integrating vector.
If $\mathbf{x}_t$ has $n > 2$ components, then there may be more than one co-integrating vector $\boldsymbol{\alpha}$; it is possible for several equilibrium relationships to govern the joint evolution of the variables. If there exist exactly $r$ linearly independent co-integrating vectors with $r \le n - 1$, then these can be gathered into an $n \times r$ matrix $\boldsymbol{\alpha}$. The rank of $\boldsymbol{\alpha}$ will be $r$ and is called the co-integrating rank.
DEFINITION 2. A vector time-series $\mathbf{x}_t$ has an error-correction representation if it can be expressed as
where $\boldsymbol{\omega}_t$ is a stationary multivariate disturbance, with $\mathbf{A}(0) = \mathbf{I}_n$, $\mathbf{A}(1)$ having only finite elements, $\mathbf{z}_t = \boldsymbol{\alpha}'\mathbf{x}_t$, and $\boldsymbol{\gamma}$ a non-zero
vector. For the case where $d = b = 1$, and with co-integrating rank $r$, the Granger Representation Theorem holds (see Section 5.3.1).
Granger's theorem will prove that a co-integrated system of variables can be represented in three main forms: the vector autoregressive (VAR), error-correction, and moving-average forms. These representations are all isomorphic to each other, and the theorem establishes the restrictions that hold between the lag-polynomial matrices in each representation of the process.
We may prove the theorem in at least three (equivalent) ways, depending on the representation from which we choose to start. The theorem is stated in Section 5.3.1. Following this statement, we take the autoregressive representation as our starting-point and derive the main results. This proof is due to Johansen (1991a). The sub-section after the proof contains a detailed interpretation of the results. In Chapter 8 we return to the theorem and provide another proof, this time starting from the moving-average representation. Proving the theorem in two ways highlights some interesting symmetries which exist among the equivalent representations of the process.
5.3.1. Granger Representation Theorem (adapted from Engle and Granger 1987 and Johansen 1991a)
Let $\mathbf{x}_t$ be an $I(1)$ vector of $n$ components, each with (possibly) deterministic trend in mean. Suppose that the system can be written as a finite-order vector autoregression:
(25)
where the $\boldsymbol{\varepsilon}_t$ satisfy assumptions (3.16a)-(3.16d) and the first $k$ data points $\mathbf{x}_{1-k}, \mathbf{x}_{2-k}, \ldots, \mathbf{x}_0$ are fixed. The model can then be rewritten in error-correction form as
Both (25) and (26) can be written as
where
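Equation (26) may also be written as
where $\boldsymbol{\Psi}(L) = (1 - L)^{-1}(\boldsymbol{\pi}(L) - \boldsymbol{\pi}(1)L^k) = \mathbf{I}_n - \sum_{i=1}^{k-1} \boldsymbol{\Gamma}_i L^i$. From (13) above, $\boldsymbol{\Psi}(L)$ can always be constructed. Further, the derivative of $\boldsymbol{\pi}(z)$ at $z = 1$ is equal to $-\boldsymbol{\Psi} = -\boldsymbol{\Psi}(1)$.
Define the orthogonal complement $\mathbf{P}_\perp$ of any matrix $\mathbf{P}$ of rank $q$ and dimension $n \times q$ as follows ($0 < q < n$):
(i) $\mathbf{P}_\perp$ is of dimension $n \times (n - q)$;
(ii) $\mathbf{P}_\perp'\mathbf{P} = \mathbf{0}_{(n-q) \times q}$, $\mathbf{P}'\mathbf{P}_\perp = \mathbf{0}_{q \times (n-q)}$;
(iii) $\mathbf{P}_\perp$ has rank $n - q$, and lies in the null space of $\mathbf{P}$.
Certain key assumptions may now be stated.
ASSUMPTION A1. The characteristic polynomial,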
has roots either equal to or strictly greater than one; that is, $|\boldsymbol{\pi}(z)| = 0$ implies that either $|z| > 1$ or $z = 1$.
ASSUMPTION A2. The $n \times n$ matrix $\boldsymbol{\Pi}$ has reduced rank $r < n$ and is therefore expressible as the product of two $n \times r$ matrices $\boldsymbol{\gamma}$ and $\boldsymbol{\alpha}$, where $\boldsymbol{\gamma}$ and $\boldsymbol{\alpha}$ have rank $r$. Thus $\boldsymbol{\Pi} = \boldsymbol{\gamma}\boldsymbol{\alpha}'$.
ASSUMPTION A3. The $(n - r) \times (n - r)$ matrix $\boldsymbol{\gamma}_\perp'\boldsymbol{\Psi}\boldsymbol{\alpha}_\perp$ has full rank $n - r$.
Assumption A1 guarantees that the non-stationarity of $\mathbf{x}_t$ can be removed by differencing. A2 rules out a stationary $\mathbf{x}_t$ process. If $\boldsymbol{\Pi}$ had full rank (that is, if $|\boldsymbol{\pi}(z)|$ had no roots at one), then from (27), $\mathbf{x}_t = \boldsymbol{\pi}^{-1}(L)(\boldsymbol{\mu} + \boldsymbol{\varepsilon}_t)$, which would imply that $\mathbf{x}_t$ was stationary. It is also the statement, in the autoregressive form, that the system has $r$ linearly independent co-integrating vectors. In light of Assumption A2, $\boldsymbol{\gamma}\boldsymbol{\alpha}'$ provides a transformation of the $\boldsymbol{\Pi}$ matrix (and hence a linear combination of the $x_{jt}$ which is stationary). The significance of A3 will become evident in due course, but essentially, it ensures that $\mathbf{x}_t$ is integrated of order no greater than 1. Under the assumptions stated above, the following results may be proved:
(R1) $\Delta\mathbf{x}_t$ is stationary.
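(R2) $\boldsymbol{\alpha}'\mathbf{x}_t$ is stationary.
(R3) $E(\Delta\mathbf{x}_t) = \boldsymbol{\alpha}_\perp(\boldsymbol{\gamma}_\perp'\boldsymbol{\Psi}(1)\boldsymbol{\alpha}_\perp)^{-1}\boldsymbol{\gamma}_\perp'\boldsymbol{\mu}$.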
(R4) E(a'x t) = -(
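(R5) $\Delta\mathbf{x}_t$ has a moving-average representation given by
(R6) $\mathbf{C}(1) = \boldsymbol{\alpha}_\perp(\boldsymbol{\gamma}_\perp'\boldsymbol{\Psi}\boldsymbol{\alpha}_\perp)^{-1}\boldsymbol{\gamma}_\perp'$ has rank $n - r$.
(R7) $\boldsymbol{\alpha}'\mathbf{C}(1) = \mathbf{0}_{r \times n}$, $\mathbf{C}(1)\boldsymbol{\gamma} = \mathbf{0}_{n \times r}$.
where $\mathbf{C}(L) = \mathbf{C}(1) + (1 - L)\mathbf{C}_1(L)$, $\boldsymbol{\tau} = \mathbf{C}(1)\boldsymbol{\mu}$, $\mathbf{x}_0$ is a constant (vector) of integration, and $\mathbf{S}_t = \mathbf{C}_1(L)\boldsymbol{\varepsilon}_t$.
Proof. Multiply (27) by $\boldsymbol{\gamma}'$ and $\boldsymbol{\gamma}_\perp'$ respectively to obtain the equations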
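using the decomposition $\boldsymbol{\Pi} = \boldsymbol{\gamma}\boldsymbol{\alpha}'$ and the result that $\boldsymbol{\gamma}_\perp'\boldsymbol{\gamma} = \mathbf{0}_{(n-r) \times r}$. The matrix $\boldsymbol{\Pi}$ is not invertible, and the system given by (28a)-(28b) therefore cannot be inverted directly to express the $x_{it}$ in terms of the $\varepsilon_{it}$. To obtain an invertible system, we define two new variables, $\boldsymbol{\omega}_t = (\boldsymbol{\alpha}'\boldsymbol{\alpha})^{-1}\boldsymbol{\alpha}'\mathbf{x}_t$ and $\mathbf{v}_t = (\boldsymbol{\alpha}_\perp'\boldsymbol{\alpha}_\perp)^{-1}\boldsymbol{\alpha}_\perp'\Delta\mathbf{x}_t$. Next, define the matrices $\bar{\boldsymbol{\alpha}} = \boldsymbol{\alpha}(\boldsymbol{\alpha}'\boldsymbol{\alpha})^{-1}$ and $\bar{\boldsymbol{\alpha}}_\perp = \boldsymbol{\alpha}_\perp(\boldsymbol{\alpha}_\perp'\boldsymbol{\alpha}_\perp)^{-1}$. Let $\mathbf{R} = (\boldsymbol{\alpha}, \boldsymbol{\alpha}_\perp)$ be an $n \times n$ matrix of rank $n$. Then $\mathbf{R}(\mathbf{R}'\mathbf{R})^{-1}\mathbf{R}' = \mathbf{I}_n$ and hence $(\boldsymbol{\alpha}\bar{\boldsymbol{\alpha}}' + \boldsymbol{\alpha}_\perp\bar{\boldsymbol{\alpha}}_\perp') = \mathbf{I}_n$. Thus,
Substituting in (28a)-(28b) gives
where in (28a) the first term on the left-hand side needs to be written first as $-(\boldsymbol{\gamma}'\boldsymbol{\gamma})(\boldsymbol{\alpha}'\boldsymbol{\alpha})(\boldsymbol{\alpha}'\boldsymbol{\alpha})^{-1}\boldsymbol{\alpha}'\mathbf{x}_t$. The equations for $\boldsymbol{\omega}_t$ and $\mathbf{v}_t$ can now be written in autoregressive form as
with
For $z = 1$, this matrix has determinant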
which is non-zero by Assumptions A2 and A3. Hence $z = 1$ is not a root. For $z \neq 1$, straightforward but tedious algebra enables us to express the matrix $\mathbf{A}(z)$ as
To show this, substitute for $\boldsymbol{\Psi}(z)$ in $\mathbf{A}(z)$ in terms of $\boldsymbol{\pi}(z)$ and $\boldsymbol{\pi}(1) = -\boldsymbol{\Pi}$ from (27), and use the decomposition $\boldsymbol{\Pi} = \boldsymbol{\gamma}\boldsymbol{\alpha}'$ and the orthogonality condition $\boldsymbol{\gamma}_\perp'\boldsymbol{\gamma} = \boldsymbol{\alpha}_\perp'\boldsymbol{\alpha} = \mathbf{0}_{(n-r) \times r}$. For $z \neq 1$, therefore, from (31),
where we have used the result that the determinant of a matrix obtained by multiplying $n - r$ columns (or rows) of an $n \times n$ matrix by a constant is the determinant of the original matrix multiplied by the constant raised to the power $n - r$. Thus, for $z \neq 1$, $|\mathbf{A}(z)| = 0$ if and only if $|\boldsymbol{\pi}(z)| = 0$. By Assumption A1, if we exclude $z = 1$, the only remaining roots of this determinant lie outside the unit circle.
This shows that all the roots of $|\mathbf{A}(z)| = 0$ are outside the unit disk. Hence the system defined by (29a)-(29b) is invertible and
Using the formula for inversion of partitioned matrices,
Thus,
and
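From above, $E(\Delta\mathbf{x}_t) = \boldsymbol{\alpha}_\perp E(\mathbf{v}_t) + \boldsymbol{\alpha}E(\Delta\boldsymbol{\omega}_t)$. Noting that $E(\Delta\boldsymbol{\omega}_t) = \mathbf{0}$, we have that $E(\Delta\mathbf{x}_t) = \boldsymbol{\alpha}_\perp(\boldsymbol{\gamma}_\perp'\boldsymbol{\Psi}(1)\boldsymbol{\alpha}_\perp)^{-1}\boldsymbol{\gamma}_\perp'\boldsymbol{\mu}$. This proves (R3).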
Next,
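This completes the proof of (R4). From (32),
But
where $\mathbf{C}(L) = [\boldsymbol{\alpha}(1 - L), \boldsymbol{\alpha}_\perp](\mathbf{A}(L))^{-1}[(\boldsymbol{\gamma}, \boldsymbol{\gamma}_\perp)']$. This completes the proof of (R5).
To prove (R6), note that
Substituting for $(\mathbf{A}(1))^{-1}$ from above gives
$$\mathbf{C}(1) = \boldsymbol{\alpha}_\perp(\boldsymbol{\gamma}_\perp'\boldsymbol{\Psi}(1)\boldsymbol{\alpha}_\perp)^{-1}\boldsymbol{\gamma}_\perp'$$
as required. The matrices $\boldsymbol{\alpha}_\perp$ and $\boldsymbol{\gamma}_\perp'$ and $(\boldsymbol{\gamma}_\perp'\boldsymbol{\Psi}(1)\boldsymbol{\alpha}_\perp)^{-1}$ have rank $(n - r)$ using Assumptions A2, A3, and the definition of the orthogonal complements $\boldsymbol{\alpha}_\perp$ and $\boldsymbol{\gamma}_\perp$. Thus, $\mathbf{C}(1)$ has rank $(n - r)$. This completes the proof of (R6). Note that $E(\Delta\mathbf{x}_t) = \mathbf{C}(1)\boldsymbol{\mu} = \boldsymbol{\tau}$.
(R7) follows immediately from (R6).
Finally, to prove (R8), first write $\mathbf{C}(L) = \mathbf{C}(1) + (1 - L)\mathbf{C}_1(L)$. Thus, from (R5),
Integrating this expression gives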
as required. This completes the proof of the theorem.
5.3.2. Interpreting the Results of the Granger Representation Theorem
Several features are noteworthy in the theorem proved above. First, it may be seen from (R2) and (R8) respectively that, while $\mathbf{x}_t$ is non-stationary (because it contains the integrated errors $\sum_{i=1}^{t} \boldsymbol{\varepsilon}_i$), $\boldsymbol{\alpha}'\mathbf{x}_t$ is
stationary. In fact, $\boldsymbol{\alpha}'\mathbf{x}_t$ provides the set of co-integrated combinations of the $x_{it}$.
Second, despite the presence of a drift term in the process generating $\mathbf{x}_t$, there is no linear trend in the co-integrated combinations. From (R6) and (R8), the trend in the $\mathbf{x}_t$ process disappears if $\boldsymbol{\gamma}_\perp'\boldsymbol{\mu} = \mathbf{0}_{(n-r) \times 1}$.
Third, (R6) is the condition needed for the process to be integrated of order 1. If this matrix is not of full rank, $|\mathbf{A}(L)| = 0$ will have a root of 1 and a further unit root can be extracted from the system, leading to a system of $I(2)$ variables.
Fourth, $\mathbf{C}(1)$ is an $n$-dimensional square matrix but has rank $n - r$. Hence, starting from the assumption of a reduced-rank matrix $\boldsymbol{\Pi}$ in the autoregressive representation, we derive a reduced-rank matrix $\mathbf{C}(1)$ in the moving-average representation. As we show in Chapter 8, it is possible to go in the other direction and derive the result that a matrix $\mathbf{C}(1)$ with rank $(n - r)$ implies, for a co-integrated system, a matrix of the form $\boldsymbol{\Pi}$ with rank $r$. Indeed, there is an interesting duality between the singularity of the 'impact' matrix $\boldsymbol{\Pi}$ for the autoregressive representation and the singularity of the impact matrix $\mathbf{C}(1)$ for the moving-average representation. The null space for $\mathbf{C}(1)'$ is the range space for $\boldsymbol{\Pi}$ and the range space for $\boldsymbol{\Pi}'$ is the null space for $\mathbf{C}(1)$. This follows from using (R7) and noting that $\boldsymbol{\pi}(1)\mathbf{C}(1) = \mathbf{C}(1)\boldsymbol{\pi}(1) = \mathbf{0}_n$. This duality will be further in evidence in Chapter 8, when we derive the autoregressive representation of the system from its moving-average representation.
Fifth, if $\boldsymbol{\gamma}_\perp'\boldsymbol{\mu} = \mathbf{0}_{(n-r) \times 1}$, then $\boldsymbol{\mu}$ lies in the orthogonal space of $\boldsymbol{\gamma}_\perp$ and hence in the space of $\boldsymbol{\gamma}$. Thus, $\boldsymbol{\mu}$ may be written as $\boldsymbol{\gamma}\boldsymbol{\beta}_0$ where $\boldsymbol{\beta}_0$ is an arbitrary $r \times 1$ vector.
From the expression for $E(\boldsymbol{\alpha}'\mathbf{x}_t)$ in (R4), note that $E(\boldsymbol{\alpha}'\mathbf{x}_t) = -\boldsymbol{\beta}_0$, and the constant enters the system only via the error-correction term. This may be seen more clearly by rewriting (26) as
If this restriction is not satisfied, the intercept enters the system both in the error-correction term and as an autonomous growth component. In Chapter 8, where we present the Johansen maximum-likelihood procedure for estimating the co-integrating relationships, the treatment of the constant is important in determining the estimation procedure and the set of critical values to be used for inference.
Finally, the analysis can be extended to include seasonal components.
The theorem may also be extended (see Hylleberg and Mizon 1989a) to incorporate several additional representations of co-integrated systems. Among these are the Bewley and common-trends representations (the latter due to Stock and Watson 1988b).
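Results (R6) and (R7) can be verified numerically in the simplest case. The sketch below is an illustration under stated assumptions, not the general proof: it takes a first-order system written as $\Delta\mathbf{x}_t = \boldsymbol{\gamma}\boldsymbol{\alpha}'\mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_t$ with no intercept, so that there are no lagged differences and $\boldsymbol{\Psi} = \mathbf{I}$. The values of `alpha` and `gamma` are hypothetical, and `orth_complement` is a helper written for this example. In such a system the long-run impact of a shock is $\lim_{j \to \infty}(\mathbf{I} + \boldsymbol{\Pi})^j$, which should coincide with $\mathbf{C}(1)$ from (R6).

```python
import numpy as np

def orth_complement(P):
    """Basis for the orthogonal complement of the columns of P
    (P is n x q with full column rank); the result is n x (n - q)."""
    n, q = P.shape
    U, _, _ = np.linalg.svd(P, full_matrices=True)
    return U[:, q:]

# Hypothetical bivariate system with one co-integrating vector (n = 2, r = 1).
alpha = np.array([[1.0], [-2.0]])    # co-integrating vector
gamma = np.array([[-0.4], [0.1]])    # adjustment (loading) vector
Pi = gamma @ alpha.T                 # reduced-rank matrix of Assumption A2
Psi = np.eye(2)                      # first-order case: no lagged differences

a_perp = orth_complement(alpha)
g_perp = orth_complement(gamma)

# (R6): C(1) = alpha_perp (gamma_perp' Psi alpha_perp)^{-1} gamma_perp'
C1 = a_perp @ np.linalg.inv(g_perp.T @ Psi @ a_perp) @ g_perp.T

# Long-run impact of a shock in x_t = (I + Pi) x_{t-1} + e_t.
limit = np.linalg.matrix_power(np.eye(2) + Pi, 200)

print(C1)
print(np.linalg.matrix_rank(C1))
```

The expression for $\mathbf{C}(1)$ does not depend on which bases are chosen for the orthogonal complements, which is why an arbitrary basis from the singular value decomposition suffices here.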
5.3.3. Granger Representation Theorem (supplement)
(R9) There exists a Bewley (1979) representation
where $\boldsymbol{\Omega}_1(L)$ and $\boldsymbol{\Omega}_2(L)$ are $(n - r) \times n$ and $r \times n$ matrices consisting of stable lag polynomials of order $k - 1$, and where $\boldsymbol{\Omega}_1(0)$ and $\boldsymbol{\Omega}_2(0)$ are matrices different from the zero matrix, while $\boldsymbol{\gamma}^*$ is an $r \times r$ matrix of rank $r$.
(R10) There exists a common-trends representation
where is an $n \times (n - r)$ matrix of rank $n - r$, $\mathbf{H}_t$ an $(n - r) \times 1$ vector which is a linear transformation of $\sum_{i=1}^{t} \boldsymbol{\varepsilon}_i$, and
The polynomial matrix $\mathbf{C}^*(L)$ is defined as
The proof of the supplement to the Granger Representation Theorem is given by Hylleberg and Mizon (1989c).
Next, we will consider the DGP given by equations (1)-(6) above and derive a few of the above alternative representations. The exercise will then be repeated for another example taken from Engle and Yoo (1987). But first, we need to discuss the importance of each of these alternative representations.$^8$
8 The discussion in Ch. 2, although dealing mainly with $I(0)$ variables, is relevant here. The properties of linear transformations of linear models carry through unchanged, and consequently so do the reasons for estimating particular transformations.
5.4. Significance of Alternative Representations
The moving-average representation is a natural starting-point for analysing variables that are covariance-stationary after first-differencing. However, the error-correction, interim-multiplier, and Bewley representations either offer greater insight into the equilibrium relationships among the co-integrated variables or have operational value in deriving the long-run multipliers or the number of co-integrating vectors.
1. The error-correction representation has the special advantage of separating the long-run and the short-run responses. It is also an important part of what has come to be known as the Engle-Granger two-step procedure, which is discussed later in the chapter. A small modification of the error-correction representation provides the interim-multiplier representation which has been used by Johansen (1988) to develop a maximum-likelihood estimator of the dimension of the co-integration space. Likelihood-ratio tests can be used to determine empirically the value of $r$, the number of co-integrating vectors.
2. The features of the Bewley representation are described in Chapter 2. In particular, if $n = 2$, one can read directly the estimate of the co-integrating representation. However, as the co-integrating vector is not necessarily unique for $n > 2$,$^9$ the Bewley transform properly estimated (by IV) will give consistent estimates of the co-integrating space although non-unique estimates of the long-run parameters.
3. The common-trends representation decomposes the non-stationary series into a stationary component and a stochastic trend component.
The choice among these equivalent alternative representations is determined primarily by the particular question the investigator wishes to answer.
5.5. Alternative Representations of Co-integrated Variables: Two Examples
5.5.1. Example 1
Consider the DGP given by equations (1)-(6). Take $|\rho| < 1$. Then $x_t$ and $y_t$ are co-integrated, and by the Granger Representation Theorem must have vector autoregressive, error-correction, and moving-average representations. We derive each of these in turn.
9 By this we mean that, if $\mathbf{x}_t$ has $n > 2$ components, then, in general, the dimension of the co-integrating space can be $1 \le r \le n - 1$.
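A small simulation can make the example concrete. Equations (1)-(6) are not reproduced in this section, so the sketch below assumes the DGP takes the Engle and Granger (1987) form $x_t + \beta y_t = u_t$ with $u_t$ a random walk, and $x_t + \alpha y_t = v_t$ with $v_t = \rho v_{t-1} + \varepsilon_{2t}$. The parameter values, sample size, and function names are hypothetical choices made for illustration. Under this assumption $x_t = -\alpha y_t + v_t$, so a static regression of $x_t$ on $y_t$ should recover $-\alpha$, and the residuals should behave like an AR(1) process with coefficient $\rho$.

```python
import numpy as np

def simulate_dgp(T, alpha=2.0, beta=1.0, rho=0.5, seed=0):
    """Simulate x_t + beta*y_t = u_t (random walk) and
    x_t + alpha*y_t = v_t (AR(1) with coefficient rho).
    The form of the DGP is an assumption; see the lead-in."""
    rng = np.random.default_rng(seed)
    eps1 = rng.standard_normal(T)
    eps2 = rng.standard_normal(T)
    u = np.cumsum(eps1)                        # u_t = u_{t-1} + eps1_t
    v = np.zeros(T)
    v[0] = eps2[0]
    for t in range(1, T):                      # v_t = rho*v_{t-1} + eps2_t
        v[t] = rho * v[t - 1] + eps2[t]
    y = (v - u) / (alpha - beta)               # solve the two equations for y_t
    x = (alpha * u - beta * v) / (alpha - beta)  # and for x_t
    return x, y

def ols_slope(dep, reg):
    """OLS slope from a regression of dep on reg without an intercept."""
    return float(reg @ dep / (reg @ reg))

x, y = simulate_dgp(T=5000)
slope = ols_slope(x, y)                        # static regression of x_t on y_t
resid = x - slope * y                          # estimated equilibrium error
rho_hat = ols_slope(resid[1:], resid[:-1])     # AR(1) coefficient of the residuals
print(slope, rho_hat)
```

Setting `rho=1.0` in this sketch removes the co-integration, since both $u_t$ and $v_t$ are then random walks and no linear combination of $x_t$ and $y_t$ is stationary.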
VAR representation
Equations (33) and (34) are derived from (1)-(2) by first-differencing and using $e_t = (x_t + \alpha y_t)$. Thus,
Taking the inverse of the matrix multiplying the vector of the first-differenced $x_t$ and $y_t$, we have
which is the VAR representation, where we have relabelled the two linear combinations of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ as $\varepsilon_{1t}$ and $\varepsilon_{2t}$.
ECM representation. This follows directly from the VAR representation:
where we let $\delta = (\alpha - \beta)^{-1}(1 - \rho)$ and $z_t = (x_t + \alpha y_t)$.
From the ECM representation, $\delta$ is non-zero if and only if $\rho$ is not equal to 1. But $\rho = 1$ is precisely the condition that makes both $u_t$ and $v_t$ random walks and leads to a non-cointegrated system. In other words, if $\rho = 1$, there does not exist an $\alpha$ that makes the linear combination of $x$ and $y$ stationary. From (36a), at $\rho = 1$, the levels variables vanish in the VAR, which is then in differences only. Testing for co-integration is considered formally in Chapter 7. Intuitively, however, tests for co-integration in this model may be conducted in two equivalent forms.
1. Static regression of $x_t$ on $y_t$. The test for co-integration is a test of the null hypothesis that $\rho = 1$ in the residuals. This null may be tested by using the Sargan-Bhargava (1983) or Dickey-Fuller statistics and tables.
2. Regression using the error-correction form of the system, followed by a test of the null hypotheses $H_0$: $\theta_1 = 0$, $H_0$: $\theta_2 = 0$ or of the joint null $H_0$: $\theta_1 = \theta_2 = 0$.
There is a problem here: if $\alpha$ is unknown, it must be estimated from the data. But if the null hypothesis that $\rho = 1$ is valid, $\alpha$ is not identified and the error-correction regression, at least in the form specified by the theorem, is invalid. Only if the series are co-integrated can $\alpha$ be simply estimated by a co-integrating regression, but a test must be based upon the distribution of the statistic assuming that the null is true.
There is however a solution. It consists in specifying the error-correction quite generally, on the lines suggested in Chapter 2, and deducing the values of $\alpha$, $\theta_1$, and $\theta_2$. That is, in the absence of prior knowledge, one may simply use $x_t - y_t$ in the error-correction term with suitable lags of $x$ and $y$ added to the regression. Recall that in Chapter 2 we showed that the estimates of the short-run adjustment coefficients, given here by $\theta_1$ and $\theta_2$, are invariant to assumptions made about the long-run coefficient. Thus, $\theta_1$ and $\theta_2$ are estimated consistently regardless of whether or not a homogeneous ECM is estimated. Therefore, an equivalent test for the null of no co-integration could be constructed based on the regression coefficients without requiring knowledge of the value of $\alpha$.
The joint test of $H_0$ above is more efficient given that the cross-equation restriction in (37) and (38) implies that the error-correction term $z_{t-1}$ enters both equations. Furthermore, estimating (37) and (38) as a system is likely to lead to estimates more efficient than those derived from estimating (37) and (38) separately. This is because, in general, neither $x_t$ nor $y_t$ is weakly exogenous for the parameters of the other equation, owing to the cross-equation restriction. The issue of single-equation versus systems estimation which this example illustrates is discussed in Chapter 8.
MA representation. From (7)-(8), we have
The MA representation follows by expressing $u_t$ as $(1 - L)^{-1}\varepsilon_{1t}$ and $v_t$ as $(1 - \rho L)^{-1}\varepsilon_{2t}$. Thus, multiplying both sides of (7') and (8') by $(1 - L)$ gives
5.5.2. Example 2
Assume that, in the MA representation, the DGP is given by
where $\varepsilon_t$ is the vector $(\varepsilon_{1t}, \varepsilon_{2t})'$.
AR representation. By direct inversion of the polynomial matrix,
which implies, upon multiplying both sides of (40) by $C(L)^{-1}$ and cancelling,
This is the autoregressive representation.
ECM representation. For the ECM form, we need to express the DGP as
From the Granger Representation Theorem (R7), $(\alpha_1, \alpha_2)'$ solves
From (40),
so that $\alpha_1 = 1$ and $\alpha_2 = -2$ solve (44). Moreover, $(\gamma_1, \gamma_2)'$ solves
Equation (45) gives $\gamma_1 = 0.4$ and $\gamma_2 = -0.1$, and so the ECM representation is given by
It is easy to see from the ECM representation that the long-run solution is given by
5.6. Engle-Granger Two-step Procedure

Engle and Granger (1987) proposed a two-step estimator for models involving co-integrated variables. In the first step, the parameters of the co-integrating vector are estimated by running the static regression in the levels of the variables. In the second step, these are used in the error-correction form. Both steps require only OLS, and the results may be shown to be consistent for all the parameters. In particular, the estimates of the parameters in the first step converge to their probability limits at rate $T$, while the elements of the vector multiplying the error-correction term, in the second step, converge at the usual asymptotic rate of $T^{1/2}$.

This procedure is convenient because the dynamics do not need to be specified until the error-correction structure has been estimated (although it may nevertheless be sensible to do so, as we shall see below). We can illustrate this using a simple argument when there is no intercept.

An important implication of the theory of series $x_t$ integrated of order one is that the variance of $\Delta x_t$ is asymptotically negligible relative to that of $x_t$. Assume then that some dynamic relationship links the I(1) series $\{x_t\}$ and $\{y_t\}$, and that these two series are co-integrated. Consider the static regression of $y_t$ on $x_t$,

$y_t = \alpha x_t + v_t$. (47)

Now $v_t$ contains all of the omitted dynamics, but these can be re-parameterized in terms of $\Delta x_{t-j}$, $\Delta y_{t-m}$, and $(y_{t-r} - \alpha x_{t-r})$, for $j, m, r > 0$, which are all I(0) if co-integrability holds. Thus, $\alpha$ is consistently estimated by the regression despite the complete omission of all dynamics. In fact,
(48)
Since $\{v_t\}$ is I(0) under co-integrability but $\{x_t\}$ is I(1),
whereas
Thus,
which implies that
Hence $\hat{\alpha}$ converges to $\alpha$ at a rate of $O_p(T)$ and not at the usual rate of $O_p(T^{1/2})$. Convergence is rapid asymptotically and it is this rapid convergence of the estimates of the coefficients that is used by Engle and Granger as the basis of their two-step estimator.

Since $\hat{\alpha}$ differs from $\alpha$ by terms of $O_p(T^{-1})$, the asymptotic results for estimation of dynamic models with I(1) variables will be the same whether $\alpha$ is estimated or known. Moreover, differencing must reduce the order of integration of an integrated variable by unity, so if $\Delta y_t$ is related to $\Delta x_t$, and perhaps lags of both of these, and if $\{x_t\}$ and $\{y_t\}$ are co-integrated, then $y_{t-1} - \alpha x_{t-1}$ is I(0) and can be included in the ECM model as if $\alpha$ were known (that is, the sampling variance of $\hat{\alpha}$ can be ignored). If $\{y_t\}$ and $\{x_t\}$ are not co-integrated, then we have the familiar spurious regression problem; if they are co-integrated, the benefits accruing from a static regression are potentially large.

The so-called 'super-consistency theorem' due to Stock (1987) may be stated formally as follows.

THEOREM (Stock 1987). Suppose that $x_t$ satisfies $(1 - L)x_t = C(L)\varepsilon_t$ with $C(L) = C(1) + (1 - L)C^*(L)$, where $C^*(L)$ has all of its latent roots inside the unit circle. If $C^*(L)$ is absolutely summable,10 the disturbances have finite fourth-order absolute moments, and $x_t$ is CI(1,1) with $r$ co-integrating vectors (incorporated in a matrix $\alpha$) satisfying, uniquely,
then11

10 The infinite sequence $\{c_j\}_{1}^{\infty}$ is said to be absolutely summable if $\sum_{j=1}^{\infty} |c_j| < \infty$. For the matrix $C^*(L)$ to be absolutely summable, the condition is that $\sum_{j=0}^{\infty} \|C_j^*\| < \infty$.
11 The elements of $q$ and $Q$ will typically be all zeroes and ones, defining one coefficient in each column of $\alpha$ to be unity and defining rotations if $r > 1$. $M = \operatorname{plim} E(T^{-2} \sum_{t=1}^{T} x_t x_t')$.

Thus, instead of converging at rate $T^{1/2}$, as in stationary processes, least-squares estimators converge at a rate of $T$. This theorem and the error-correction representation of co-integrated systems may be allied to give the following theorem.

THEOREM (Engle and Granger 1987). The two-step estimator of a single equation of an error-correction system with one co-integrating vector, obtained by taking the estimate $\hat{\alpha}$ of $\alpha$ from the static regression in place of the true value for estimation of the error-correction form at a second stage, will have the same limiting distribution as the maximum-likelihood estimator using the true value of $\alpha$. Least-squares standard errors in the second stage will provide consistent estimates of the true standard errors.
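The two-step procedure can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' own computation: the bivariate data-generating process, the parameter values (alpha = 1, beta = 0.5, gamma = -0.3), and the function names simulate_ecm and engle_granger_two_step are all hypothetical choices made for this example, and the static regression is run without an intercept, as in the simple argument above.

```python
import numpy as np


def simulate_ecm(T, alpha=1.0, beta=0.5, gamma=-0.3, seed=0):
    """Simulate a co-integrated pair (assumed DGP, for illustration only).

    x_t is a pure random walk; y_t follows the error-correction equation
    dy_t = beta * dx_t + gamma * (y_{t-1} - alpha * x_{t-1}) + e_t.
    """
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.standard_normal(T))
    y = np.zeros(T)
    y[0] = alpha * x[0]  # start on the long-run relationship
    for t in range(1, T):
        dx = x[t] - x[t - 1]
        ecm_term = y[t - 1] - alpha * x[t - 1]
        y[t] = y[t - 1] + beta * dx + gamma * ecm_term + rng.standard_normal()
    return y, x


def engle_granger_two_step(y, x):
    """Return (alpha_hat, beta_hat, gamma_hat) from two OLS regressions."""
    # Step 1: static regression in levels (no intercept).
    alpha_hat = (x @ y) / (x @ x)
    # Step 2: error-correction regression using the lagged step-1 residual.
    z_lag = y[:-1] - alpha_hat * x[:-1]
    regressors = np.column_stack([np.diff(x), z_lag])
    coef, *_ = np.linalg.lstsq(regressors, np.diff(y), rcond=None)
    return alpha_hat, coef[0], coef[1]


y, x = simulate_ecm(T=2000)
alpha_hat, beta_hat, gamma_hat = engle_granger_two_step(y, x)
print(alpha_hat, beta_hat, gamma_hat)
```

Repeating the exercise for increasing T would be one way to see the different convergence rates: the error in alpha_hat should shrink roughly in proportion to 1/T, while the errors in beta_hat and gamma_hat shrink at the slower rate of T to the power -1/2.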
5.6.1. Sketch-proof of Engle-Granger Theorem (Bivariate Case)

The following is a proof of this theorem for the bivariate case. Consider the estimation of $\beta$ and $\gamma$ in the two equations given by
$y_t$ and $x_t$ are co-integrated I(1) variables with the co-integrating parameter given by $\alpha$. In the context of the discussion in this chapter, the error-correction mechanism is estimated in (53) using the true value of the co-integrating parameter, while in (54) $\hat{\alpha}$ is substituted for $\alpha$, where $\hat{\alpha}$ is derived from the static regression of $y_t$ on $x_t$. Also, $\varepsilon_t^* = \varepsilon_t + \gamma(\hat{\alpha} - \alpha)x_{t-1}$. Let $z_t = y_t - \alpha x_t$.

We need to show that the asymptotic distributions of the estimators $\hat{\beta}$ and $\hat{\gamma}$, of $\beta$ and $\gamma$ respectively, are the same regardless of whether one uses $\alpha$ or $\hat{\alpha}$ (that is, whether one estimates (53) or (54)).

In standard fashion, we have from (53) (assuming adequate initial values)
The estimators derived from (54) are also given by (55) but with $\hat{z}_{t-1}$ and $\varepsilon_t^*$ replacing $z_{t-1}$ and $\varepsilon_t$. From this, it is easy to deduce that the result will be demonstrated if the following conditions are shown to be true:
(iii) the asymptotic distributions of
are the same;
(iv) the asymptotic distributions of
are the same.

In (53), we assume that $\{\varepsilon_t\}$ is an innovation process such that $E(\Delta x_t \varepsilon_t) = 0$.

Note first that, by the properties of I(0) and I(1) series, as used and discussed in Chapters 3 and 4, the following expressions are $O_p(1)$ (that is, non-explosive and non-degenerate as $T \to \infty$):
Secondly,
Using (59),
Result (i) now follows from (57) and (58). Also,
Result (ii) now follows from (56), (57), and (58). Finally,
By (57) and (58), the last two expressions on the right-hand side of the above equality are $O_p(T^{-1/2})$. Result (iii) follows, and (iv) is proved analogously from:
6

Regression with Integrated Variables

We have seen how the presence of integrated variables poses some special problems which do not appear when working with stationary series. These might lead us to believe that a new range of techniques needs to be considered in order to handle such data. However, as we show in this chapter, we can continue to apply standard regressions if we pay attention to orders of integration and use dynamic specifications which take account of any co-integrating relationships among the variables.

The Engle-Granger theorem in Chapter 5, laying emphasis on simple static regressions, implies a good deal about the way in which an investigator ought to proceed with an econometric study of integrated variables. Some of this is related to the evolution of modelling practice among econometricians.

Econometricians of the 1970s began to be suspicious of regressions using data in levels. Their suspicions were reinforced by worries expressed by time-series analysts relating to spurious regressions. The focus of attention began to shift towards the need to have properly specified models with rich dynamic structures. The move, following Mizon (1977), Sims (1977), Hendry and Mizon (1978), and Hendry and Richard (1982), was towards a method of econometric research that preferred models which began with as general a specification as possible, and continued with simplification to a parsimonious econometric model following from imposing constraints consistent with observed data. (See Spanos (1986) for a detailed treatment.) The literature on co-integration reinstated some confidence in static regressions in levels, and good econometric method appeared to have taken a full circle; as long as the I(1) variables were co-integrated, such regressions made sense.

There are nonetheless several reasons for continuing to treat static regressions as being in general sub-optimal. First of all, the estimate $\hat{\alpha}$ is biased for the co-integrating parameter $\alpha$ and, although that bias is $O_p(T^{-1})$, it can be substantial in finite samples. The bias is likely to be a function of some parameter such as the mean lag of the dynamic adjustment process relating $\{y_t\}$ to $\{x_t\}$. In some circumstances, therefore, a return to dynamic modelling would seem to be the appropriate response to the problems of static-regression biases. Already a body of work exists demonstrating the poor performance of static regressions for many types of problem (Banerjee, Dolado, Hendry, and Smith 1986, and Stock 1987). Second, the distributions of coefficient estimates will typically take non-standard forms even where the series are co-integrated. The 'non-standardness', by which we generally mean asymptotic non-normality, comes from the property that the series are integrated of order greater than or equal to 1. The fundamental point is that the distribution theory that applies to non-stationary series is different from the familiar Gaussian asymptotic theory. The estimators have distributions, in general, which are functionals of the Wiener processes discussed in Chapters 1 and 3. However, some of the standard asymptotic theory may be restored in dynamic models.

We will elaborate on the second of these points, leaving a discussion of the first until Chapter 7. It is important to point out at the outset, in order not to mislead readers, that it is not true that single-equation dynamic models are necessarily superior to their static counterparts. The next two sections present examples where single-equation dynamic models do perform satisfactorily. Yet, as the discussion in Chapter 8 shows, it is possible to construct many cases where single-equation dynamic models by themselves are not sufficient for obtaining efficient and unbiased estimates (see Engle et al. 1983 and Phillips and Loretan 1991).

There are several interrelated difficulties which are important and which collectively imply that the issue is broader than simply a comparison of dynamic with static models.
An informal description of the problems encountered in modelling non-stationary variables in a single-equation framework would identify at least five effects. First, the presence of unit roots induces non-standard distributions of the coefficient estimates. Second, the error process may not be a martingale difference sequence. Third, the explanatory variables may each be generated by processes that display autocorrelation; taken in conjunction with the second effect, this gives rise to 'second-order' biases. Fourth, there may be more than one co-integrating vector. Finally, the explanatory variables in the single equation may not be weakly exogenous for the parameters being estimated. Weak exogeneity can fail if, say, a co-integrating vector enters more than one equation in the system generating the variables.

Static regressions can be affected by all five of the problems listed above, while dynamic models may be able to accommodate the first three effects, as in the examples given in the sections that follow. However, estimates derived from single-equation dynamic models are not optimal if weak exogeneity fails to hold. This final observation extends the discussion from the realm of modelling unit-root processes to the all-encompassing realm of general econometric modelling. This discussion is formalized in Chapter 8 and illustrated with several examples.
6.1. Unbalanced Regressions and Orthogonality Tests

Mankiw and Shapiro (1985, 1986) drew attention to a problem that may arise in applying standard distributions to inference where there are non-stationary (or borderline non-stationary) series present, and in particular to the problem of inference concerning orthogonality between series. While the problem is, as with spurious regression, essentially a problem of integrated data, it will appear with near-integrated data in finite samples.1 With this qualification, the problem may be said to arise in unbalanced regressions: that is, regressions in which the regressand is not of the same order of integration as the regressors, or any linear combination of the regressors.2

The Mankiw-Shapiro discussion centres on a condition such as

$E_{t-1}(y_t) = c$, implying $y_t = c + v_t$, $E_{t-1}(v_t) = 0$, (1)
where $E_{t-1}$ is interpreted as the expectation, conditional on information realized at time $t - 1$, of the value of some variable which may be dated in the future. That such a condition holds is often tested with a regression such as
where $c_2 = 0$ under the null hypothesis that (1) holds. Examples of such hypotheses and tests arise frequently in models that postulate the full use of all realized information. One such example from macroeconomics is Hall's (1978) formulation of the life-cycle/permanent-income model, which, given a stringent set of assumptions, implies that consumption should follow a random walk. Tests of this hypothesis have typically taken the form of regressions of differenced consumption on a constant and one or more lagged income or consumption terms; under the null hypothesis the coefficients on the lagged terms should not be significantly different from zero.

Mankiw and Shapiro suggest examining the case in which the regressor $x_t$ follows the AR(1) process:

1 While the experiments reported here use borderline stationary data, the results will also apply to integrated series.
2 These are sometimes called inconsistent regressions. Inconsistency in this sense is unrelated to the concept of an inconsistent estimator of a parameter: see n. 3.
with $\operatorname{corr}(\varepsilon_t, v_t) = \rho$ and $\operatorname{corr}(\varepsilon_{t+j}, v_t) = 0 \;\; \forall j \neq 0$.
Note that this is not a problem of simultaneity bias: the regressor $x_{t-1}$ is uncorrelated with $v_t$. A structure such as this is appropriate in many models in which these tests have been used. In the Hall (1978) model, for example, $\rho = 1$ where $x_t$ and $y_t$ represent current income and the change in current consumption respectively. Mankiw and Shapiro use Monte Carlo simulations to tabulate estimates of the actual rejection frequencies and critical values in $t$-type tests of $H_0\colon c_2 = 0$, when standard $t$-values are used. Table 6.1 reproduces a selection of their results for model (2) and also for the model with a linear time trend,
TABLE 6.1. Percentage rejection frequencies of standard t-tests at nominal 5 per cent level(a)
DGP: (1) + (3); sample size = T; no. of replications = 1000

             Model (2), rho =             Model (4), rho =
theta      1.0  0.9  0.8  0.5  0.0      1.0  0.9  0.8  0.5  0.0

(a) T = 50
0.999       30   24   20   11    7       60   45   36   16    6
0.99        26   20   15   10    7       54   40   33   15    6
0.98        22   17   15    8    7       50   37   30   14    5
0.95        17   12   10    7    6       38   30   25   12    6
0.90        12    9    8    6    6       28   22   19   10    6
0.00         5    6    6    5    5        6    7    7    5    6

(b) T = 200
0.999       29   23   20   10    5       61   48   38   18    5
0.99        18   15   13    8    4       41   32   27   13    5
0.98        13   10    9    7    5       29   24   20   11    6
0.95         9    7    7    6    5       17   14   12    7    6
0.90         7    6    6    6    6       10    9    8    6    7
0.00         5    4    4    5    5        5    5    4    5    5

(a) This table compares two sample sizes. While the test size distortions are generally smaller for the larger sample and will vanish as $T \to \infty$, this feature is specific to the borderline-stationary processes used ($\theta < 1$). For $\theta = 1$, distortions will persist as $T \to \infty$. Each of the entries $a_{ij}$ (expressed as a fraction) has a standard error which can be approximated by $[a_{ij}(1 - a_{ij})/N]^{1/2}$, where $N$ is the number of replications (equal to 1000 here).
Source: Mankiw and Shapiro (1986).
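An experiment of the kind summarized in Table 6.1 can be sketched as follows. This is an illustrative simulation only, not a reproduction of the Mankiw-Shapiro design: it assumes that the test regression is $y_t$ on a constant and $x_{t-1}$, that the AR(1) regressor is $x_t = \theta x_{t-1} + \varepsilon_t$ started at $x_0 = 0$, that both disturbances have unit variance, and that the nominal 5 per cent critical value is 1.96. The function names rejection_frequency and mc_standard_error are hypothetical.

```python
import numpy as np


def rejection_frequency(theta, rho, T=50, reps=2000, crit=1.96, seed=0):
    """Share of replications in which |t| on x_{t-1} exceeds crit.

    Assumed DGP: y_t = v_t (so c = 0) and x_t = theta * x_{t-1} + eps_t,
    with corr(eps_t, v_t) = rho and no correlation at other leads or lags.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        eps = rng.standard_normal(T + 1)
        # v_t has unit variance and contemporaneous correlation rho with eps_t.
        v = rho * eps + np.sqrt(1.0 - rho**2) * rng.standard_normal(T + 1)
        x = np.zeros(T + 1)
        for t in range(1, T + 1):
            x[t] = theta * x[t - 1] + eps[t]
        y = v[1:]          # y_t for t = 1, ..., T
        x_lag = x[:-1]     # x_{t-1} for t = 1, ..., T
        # OLS of y_t on a constant and x_{t-1}, written in deviation form.
        xd = x_lag - x_lag.mean()
        sxx = xd @ xd
        c2_hat = (xd @ y) / sxx
        resid = (y - y.mean()) - c2_hat * xd
        s2 = (resid @ resid) / (T - 2)
        t_stat = c2_hat / np.sqrt(s2 / sxx)
        rejections += int(abs(t_stat) > crit)
    return rejections / reps


def mc_standard_error(a, n_reps):
    """Approximate standard error of a rejection fraction, as in note (a)."""
    return np.sqrt(a * (1.0 - a) / n_reps)


near_unit_root = rejection_frequency(theta=0.999, rho=1.0)
white_noise = rejection_frequency(theta=0.0, rho=1.0)
print(near_unit_root, white_noise, mc_standard_error(0.30, 1000))
```

Under these assumptions the near-unit-root case should reject far more often than the nominal 5 per cent, while the case $\theta = 0$ should stay close to it, which is the qualitative pattern of the first column of the table.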
As with the Dickey-Fuller statistics seen earlier, the size distortions in Table 6.1 spring from bias and skewness in the $t$-type statistic.

The critical values reported by Mankiw and Shapiro are not reproduced here; Galbraith, Dolado, and Banerjee (1987) show that these are sensitive to unobservable parameters of the underlying DGP, and, in considering a more general DGP, it is possible to relate the problem of size distortions to co-integration among regressors, and so to what has been called the balance or imbalance of the regression model.3 Generalize (1) and (3) to
with $v_t \sim \mathrm{IN}(0, 1)$, $\varepsilon_{it} \sim \mathrm{IN}(0, 1)$, $E(v_t \varepsilon_{1s}) = \delta_{ts}\rho$, $E(\varepsilon_{1t}\varepsilon_{2s}) = 0$, and $\delta_{ts} = 1$ if $t = s$, and 0 otherwise. The fitted model is generalized from (2) to
incorporating the new regressor. A classification of possible cases is given in Table 6.2. The notation NI($k$) (nearly integrated) indicates that the series, although I($k - 1$), are close to integrated processes of the given order. Only in case C are the two regressors 'nearly' co-integrated series, and in this case the co-integrating slope is $(1 - \theta_{11})^{-1}\theta_{12}$.

TABLE 6.2. Classification of cases of interest

Case   theta_11   theta_22   theta_12   Order of {x_1t}   Order of {x_2t}
A      << 1.0     0.999      0.0        I(0)              NI(1)
B      0.999      << 1.0     0.0        NI(1)             I(0)
C      << 1.0     0.999      != 0.0     NI(1)             NI(1)
D      0.999      << 1.0     != 0.0     NI(1)             I(0)
E      0.999      0.999      0.0        NI(1)             NI(1)
F      0.999      0.999      != 0.0     NI(2)             NI(1)

3 A regression is defined to be balanced if and only if the regressand and the regressors (either individually or collectively, as a co-integrated set) are of the same order of integration. The mere fact that a regression is unbalanced may not be a matter for concern; for example, ADF statistics are computed from models that, in this terminology, are unbalanced. They are nevertheless valid tools for inference as long as the correct critical values are used.

The size distortions stressed by Mankiw and Shapiro for high values of $\rho$ also appear in the cases B, D, and E (see Galbraith et al. 1987), where $\{x_{1t}\}$ is (nearly) I(1) and not co-integrated with $\{x_{2t}\}$. Size distortions begin to appear in case A as $\theta_{11}$ rises and case A approaches case E. Where the regressors are co-integrated, however (case C), the regression is balanced (there exists a linear combination of the regressors that has the same order of integration as the regressand) and size distortions do not appear. Case F resembles C except that $\theta_{11}$ is close to 1, indicating that co-integration is broken between the regressors; none the less, size distortions are not detectable as long as $\theta_{12}$ remains non-zero. This last finding demonstrates the difficulty of distinguishing, at modest sample sizes, the results of regressions with co-integrated regressors from those with regressors of differing, but both strictly positive, orders.

We see, in summary, that for integrated series (or, in finite samples, for the borderline-stationary series examined in these papers), with $\rho = 1$, size distortions may emerge when there is no linear combination of regressors that has the same order of integration as the regressand.4

For an intuitive view of these results, let us return to the consumption example and consider the orders of integration of the variables on the two sides of the regression. Consumption and income are both typically variables integrated of order one. Thus, the regression (2) has an I(0) variable (differenced consumption) regressed on an I(1) variable (lagged income in level) and the regression is unbalanced; the investigator is attempting to explain an I(0) variable by a variable integrated of higher order. This strategy will eventually fail, as the two variables must diverge by ever-larger amounts. Therefore, a requirement of estimation with integrated variables must be balance in the orders of integration of the variables on the left-hand and right-hand sides of the regression equation.
However, there may be circumstances in which a test will be designed to involve regressand and regressors having different orders of integration (for example, efficiency tests such as those mentioned above). We must bear in mind, of course, that test statistics from such regressions will have non-standard distributions.

4 Campbell and Dufour (1991) offer, as a way of overcoming the Mankiw-Shapiro problem, an alternative non-parametric test of orthogonality which is independent of some nuisance parameters in the DGP.

The importance of the latter point follows from the observation that, even when the regressand (e.g. $y_t$) and the regressor ($x_t$) are both integrated of order 1 and are co-integrated, the $t$-statistic on the coefficient of $x_t$ still has a non-standard distribution which makes ordinary $t$ and normal tables unusable for purposes of inference. On the other hand, if the order of integration of both sides is zero (which may be ensured by looking for a co-integrated set of regressors and using a sufficiently differenced term as the regressand), the $t$-statistics can be shown to have asymptotically normal distributions. This implies some advantage to the use of dynamic rather than static regressions, since lagging variables and including them as regressors often has the same effect as providing a co-integrated set of regressor variables. The essential point is to find some way of re-parameterizing the regression such that in the re-parameterized form, the regressors, either jointly or individually, are integrated of order zero. Correspondingly, the regressand must also be I(0). However, provided no restrictions are imposed, it is irrelevant whether or not the re-parameterization is actually carried out. As we have seen above in Chapter 2, non-singular linear transformations yield numerically equivalent results after transforming back, and so regressions that are linear transformations of each other have identical statistical properties. What is important, therefore, is the possibility of transforming in such a way that the regressors are integrated of the same order as the regressand, a possibility that is enhanced in a dynamic model as the probability of a co-integrated set being present is increased, although, as the discussion in the previous section shows, care must be taken if weak exogeneity fails to hold.
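The numerical equivalence of regressions that are non-singular linear transformations of each other can be checked directly. The sketch below is an assumed example rather than one taken from the text: it generates data from a first-order autoregressive distributed-lag equation with hypothetical coefficients (0.7, 0.5, -0.2), fits it once in levels and once in an error-correction form with a unit long-run coefficient imposed in the error-correction term but with $x_{t-1}$ also included, so that no restriction is imposed.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = np.cumsum(rng.standard_normal(T))  # I(1) regressor (random walk)
y = np.zeros(T)
for t in range(1, T):
    # Assumed DGP: y_t = 0.7 y_{t-1} + 0.5 x_t - 0.2 x_{t-1} + e_t
    y[t] = 0.7 * y[t - 1] + 0.5 * x[t] - 0.2 * x[t - 1] + rng.standard_normal()


def ols(regressors, target):
    """Return OLS coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return coef, target - regressors @ coef


# Levels form: y_t on y_{t-1}, x_t, x_{t-1}.
levels_X = np.column_stack([y[:-1], x[1:], x[:-1]])
coef_levels, resid_levels = ols(levels_X, y[1:])

# Transformed form: dy_t on dx_t, (y_{t-1} - x_{t-1}), x_{t-1}.
ecm_X = np.column_stack([np.diff(x), y[:-1] - x[:-1], x[:-1]])
coef_ecm, resid_ecm = ols(ecm_X, np.diff(y))

# Mapping between the two parameterizations.
a1, b0, b1 = coef_levels
implied_ecm = np.array([b0, a1 - 1.0, b0 + b1 + a1 - 1.0])

print(np.allclose(resid_levels, resid_ecm), np.allclose(coef_ecm, implied_ecm))
```

The residuals of the two regressions coincide and the coefficients of one form can be recovered exactly from the other, which is why it does not matter whether the re-parameterization is actually carried out.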
6.2. Dynamic Regressions

The remarks above would suggest that a sensible procedure for econometric investigations, even in the presence of integrated variables, is to use a dynamic specification that is as general as the constraints of data and sample allow. The 'general to specific' modelling method is effective here for a straightforward reason: the inclusion of several variables and their lags as regressors increases the chances of obtaining a co-integrated set of regressors. A dependent variable made stationary by differencing can be regressed on this co-integrated set, and standard $t$-, $F$-, and normal tables can be used for inference. The regression would then take the form of a differenced variable as the regressand, and differences and levels of variables as regressors.

A comprehensive account of the asymptotic theory associated with dynamic regressions of this kind appears in Sims et al. (1990). In their general formulation, the variables may have drifts and are allowed to be integrated and co-integrated of arbitrary order. The intuition for their results, moreover, is straightforward. They show that estimators of those parameters which can be rewritten as coefficients on mean-zero, non-integrated regressors, will have asymptotically normal joint distributions, converging at a rate $T^{1/2}$ to their probability limits. This rewriting may be accomplished either by differencing the regressors to achieve stationarity or by linearly combining subsets of these regressors as shown in Chapter 4. Stationarity, or more precisely non-integratedness, is achieved in the latter case if and only if subsets of the regressors are co-integrated.

There are three important properties of these transformations.
First, starting from the original dynamic regression and transforming linearly in such a way that the regression is rewritten in terms of non-integrated regressors, the original parameters of interest can be identified from the parameters of the transformed regression. Second, because the transformed parameter estimates are asymptotically normally distributed, so are the untransformed estimates. Again, this is because linear transformations do not alter any of the statistical properties of the estimators of the regression coefficients. Finally, as shown by the analysis in Chapter 2, because any information obtained from a transformed regression can be obtained from an untransformed regression as well, the essential point is not that the transformations actually be undertaken, but rather that the scope exists for the appropriate transformations to be made, because appropriate regressors are present.

There is at least one other important case where asymptotic normality of coefficient estimates has been shown to hold. The result is due to West (1988) and occurs when a stochastic trend is present in the regression but is dominated, in the sense of orders of probability, by a non-stochastic trend component. West considers OLS and linear IV models of the form
where $y_t$ is a scalar I(1) time series the first difference of which has a non-zero unconditional mean, $x_{1t}$ is a vector of stationary observable variables, and $\varepsilon_t$ is a stationary disturbance term. The dependent variable $w_t$ is stationary if $\gamma = 0$ but non-stationary otherwise. West shows that the parameter estimates $\hat{\alpha}$ and $\hat{\gamma}$ are asymptotically normal, given that $E(\Delta y_t)$ is non-zero. Note that, where we take $y_t = w_{t-1}$ and let $x_{1t}$ be a constant term, we have
this is the process and model examined by Dickey and Fuller (1979), with the exception that they took $E(\Delta w_t) = 0$.
In that case, the asymptotic distribution of the $t$-statistic for $H_0\colon \rho = 1$ is given in Table 4.2. Addition of a non-zero constant to the data-generation process, however, makes the asymptotic distribution of this statistic normal.

Asymptotic normality holds only when the non-zero constant (and trend) in the DGP is (are) matched by a constant (and trend) in the model. Including a trend in the model when the DGP does not contain a trend destroys this result. In the latter case, the Dickey-Fuller critical values given in the third block of Table 4.2 are again the appropriate ones to use.5

5 This result follows from the similarity properties of the Dickey-Fuller statistics. The third block of Table 4.2 is computed by using a pure random walk (without constant or trend) as the DGP and $y_t = \alpha + \gamma t + \rho y_{t-1} + u_t$ as the model. When the DGP is altered to include a non-zero constant, with the model remaining unchanged, the critical values of the distributions of $\hat{\rho}$ and of the associated $t$-statistic are not affected: the distributions are invariant to the value of the constant in the DGP which implies similarity. A detailed discussion of this issue appears in Kiviet and Phillips (1992).
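The effect of a non-zero constant on the distribution of the $t$-statistic for $\rho = 1$ can be illustrated by simulation. The sketch below is a hedged illustration rather than the authors' experiment: it assumes a random walk with drift $\mu$, unit-variance Gaussian errors, and $x_0 = 0$ as the DGP, fits a model with a constant and the lagged level, and uses far fewer replications than a careful study would. The function names df_t_stat and lower_quantile are hypothetical.

```python
import numpy as np


def df_t_stat(mu, T, rng):
    """t-statistic for rho = 1 in the regression of x_t on (1, x_{t-1})."""
    e = rng.standard_normal(T)
    x = np.concatenate([[0.0], np.cumsum(mu + e)])  # x_0 = 0
    target, x_lag = x[1:], x[:-1]
    xd = x_lag - x_lag.mean()
    sxx = xd @ xd
    rho_hat = (xd @ target) / sxx
    resid = (target - target.mean()) - rho_hat * xd
    s2 = (resid @ resid) / (T - 2)
    return (rho_hat - 1.0) / np.sqrt(s2 / sxx)


def lower_quantile(mu, T=100, reps=5000, q=0.05, seed=0):
    """Empirical q-quantile of the t-statistic across replications."""
    rng = np.random.default_rng(seed)
    stats = np.array([df_t_stat(mu, T, rng) for _ in range(reps)])
    return np.quantile(stats, q)


q_no_drift = lower_quantile(mu=0.0)
q_large_drift = lower_quantile(mu=10.0)
print(q_no_drift, q_large_drift)
```

With no drift the lower 5 per cent point should lie well below the normal value of about -1.65, in the region of the Dickey-Fuller critical value, whereas with a drift that is large relative to the error standard deviation it should lie close to the normal value.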
Since some non-zero constant seems likely to be present in the first differences of the processes generating many economic time series, we might suspect that the Dickey-Fuller distributions examined in detail in Chapter 4 may be of limited relevance. However, the relevance of the Dickey-Fuller (DF) values is determined by the relative magnitudes of the drift term and the standard deviation of the process.

Hylleberg and Mizon (1989b) present some simulation evidence for the AR(1) model. The critical values derived from our simulations, for sample sizes and values of the constant chosen by Hylleberg and Mizon, are given in Table 6.3. Since it is the size of $\mu$ relative to $(\mathrm{var}(\varepsilon_t))^{1/2}$ that is relevant, and $\mathrm{var}(\varepsilon_t) = 1$ in the experiments reported in Table 6.3, we treat $\mu$ as this ratio rather than the value of the constant term in the DGP alone. In general, for $\mu/(\mathrm{var}(\varepsilon_t))^{1/2} \ge 1/2$, the critical values of the normal density are closer to (although less than) the actual critical values than are the DF critical values.

TABLE 6.3. Empirical cumulative distribution of $t(\rho = 1)$
DGP: $x_t = \mu + x_{t-1} + \varepsilon_t$, $\varepsilon_t \sim IN(0,1)$, $x_0 = 0$; model: $x_t = \mu + \rho x_{t-1} + \varepsilon_t$

Sample size              Probability of a smaller value^a
(T)          0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

DF^b        -3.43   -3.12   -2.86   -2.57   -1.56   -0.44   -0.07    0.23    0.60

(a) μ = 0
50          -3.57   -3.22   -2.92   -2.60   -1.55   -0.40   -0.03    0.30    0.66
100         -3.50   -3.17   -2.89   -2.59   -1.56   -0.42   -0.06    0.26    0.62
200         -3.47   -3.15   -2.88   -2.58   -1.56   -0.42   -0.06    0.25    0.62
400         -3.45   -3.13   -2.87   -2.57   -1.56   -0.43   -0.07    0.25    0.62

(b) μ = 0.001
50          -3.57   -3.22   -2.93   -2.60   -1.55   -0.40   -0.03    0.29    0.65
100         -3.49   -3.17   -2.89   -2.58   -1.56   -0.42   -0.06    0.26    0.63
200         -3.47   -3.14   -2.88   -2.57   -1.56   -0.42   -0.06    0.26    0.63
400         -3.46   -3.14   -2.87   -2.57   -1.56   -0.43   -0.07    0.24    0.62

(c) μ = 0.010
50          -3.57   -3.22   -2.93   -2.60   -1.55   -0.40   -0.03    0.29    0.67
100         -3.50   -3.16   -2.90   -2.58   -1.56   -0.41   -0.05    0.27    0.64
200         -3.47   -3.14   -2.88   -2.57   -1.56   -0.41   -0.06    0.26    0.64
400         -3.46   -3.13   -2.87   -2.57   -1.56   -0.42   -0.06    0.26    0.64

(d) μ = 0.10
50          -3.53   -3.17   -2.87   -2.54   -1.46   -0.22    0.16    0.49    0.89
100         -3.41   -3.08   -2.80   -2.48   -1.37   -0.08    0.30    0.61    0.98
200         -3.34   -3.00   -2.71   -2.37   -1.20    0.11    0.48    0.81    1.18
400         -3.17   -2.83   -2.54   -2.19   -0.94    0.34    0.70    1.02    1.39
TABLE 6.3 (cont.)

Sample size              Probability of a smaller value^a
(T)          0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

(e) μ = 0.25
50          -3.35   -2.95   -2.64   -2.28   -1.02    0.29    0.67    1.00    1.35
100         -3.10   -2.72   -2.41   -2.05   -0.74    0.53    0.90    1.21    1.58
200         -2.86   -2.48   -2.16   -1.80   -0.52    0.74    1.11    1.42    1.77
400         -2.68   -2.31   -2.00   -1.63   -0.36    0.90    1.26    1.58    1.95

(f) μ = 0.5
50          -2.94   -2.53   -2.20   -1.81   -0.51    0.78    1.15    1.47    1.85
100         -2.70   -2.33   -2.01   -1.64   -0.35    0.93    1.29    1.61    1.98
200         -2.60   -2.21   -1.89   -1.53   -0.25    1.02    1.40    1.71    2.06
400         -2.50   -2.14   -1.82   -1.46   -0.18    1.09    1.45    1.79    2.16

(g) μ = 1
50          -2.65   -2.26   -1.93   -1.55   -0.24    1.06    1.44    1.76    2.16
100         -2.52   -2.15   -1.84   -1.47   -0.17    1.12    1.49    1.81    2.19
200         -2.48   -2.10   -1.77   -1.41   -0.12    1.15    1.53    1.84    2.20
400         -2.42   -2.06   -1.74   -1.37   -0.09    1.18    1.54    1.87    2.25

(h) μ = 10
50          -2.44   -2.03   -1.71   -1.32   -0.02    1.28    1.65    1.99    2.39
100         -2.38   -2.00   -1.68   -1.31   -0.02    1.27    1.65    1.97    2.35
200         -2.35   -1.99   -1.67   -1.30   -0.01    1.27    1.65    1.97    2.33
400         -2.35   -1.98   -1.66   -1.30   -0.01    1.27    1.64    1.96    2.33

N(0, 1)     -2.32   -1.96   -1.65   -1.28    0.00    1.28    1.65    1.96    2.32
a The entries in this table are based on at least 100,000 replications using GAUSS. For any $\mu$, the samples at the smaller sample sizes are sub-samples of the larger samples, tending to reduce variability across T for any $\mu$. In consequence the results are monotonic in T for given $\mu$.
b Source: Second block of Table 4.2, in the limiting case $T \to \infty$.

For a data series such as annual GNP, $E(\Delta \log(\mathrm{GNP}_t)) = 0.025$, which is roughly the same as the standard deviation of the series. Since the ratio of $\mu$ to the standard deviation of the series is close to 1, we refer to the '$\mu = 1$' block, which suggests that for the GNP series the appropriate critical values are quite close to those of the normal distribution.

West's result is in the spirit of the discussion earlier in this section. Asymptotic normality prevails only in the absence of dominating stochastic trends, because in that case conventional6 central-limit theorems may be used to derive convergence results for the parameter estimates. This can be achieved in a regression which can be rewritten in terms of non-stochastically trending components, or where a deterministic trend dominates the stochastic one.

6 By 'conventional' we mean those applying to stationary ergodic processes.

The next section applies the asymptotic theory derived earlier to regressions with integrated variables. The first two examples are derivations, for specific cases, of asymptotic distributions of estimators or test statistics. We then consider the issue of dynamic modelling more generally, by first presenting an example from Stock and West (1988). Five more examples then apply this theory. The final section looks at the issue of co-integration testing when the original data-set has been transformed (for example, by differencing or by taking logarithms).

6.2.1. Asymptotic Normality of Unit-root Tests (West 1988)
Consider a DGP that contains a constant or a constant and a trend. If the same variables appear in the model, the estimator of the coefficient of the lagged dependent variable is asymptotically normally distributed and does not have a Dickey-Fuller type distribution.

Assume that $y_t$ is generated by ($\mu_b \neq 0$)
The model is given by

Define a scaling matrix
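The set-up used in this derivation can be sketched as follows. The DGP is the one implied by the representation $y_t = \mu_b t + S_t$ used in this section; the intercept symbol $\alpha_b$, the initial condition $y_0 = 0$, the name $\Upsilon_T$, and the diagonal form of the scaling matrix are assumptions, chosen so that the rates match the $T^{3/2}$ normalization of $\hat\rho_b$ quoted later in the section. The second display works through the variance implied by keeping only the deterministic part $y_{t-1} \approx \mu_b t$.

```latex
\begin{align*}
\text{DGP (assumed form):}\quad & y_t = \mu_b + y_{t-1} + u_t, \qquad \mu_b \neq 0,\; y_0 = 0,\\
\text{model (assumed notation):}\quad & y_t = \alpha_b + \rho_b\, y_{t-1} + u_t,\\
\text{scaling (assumed form):}\quad & \Upsilon_T = \operatorname{diag}\bigl(T^{1/2},\, T^{3/2}\bigr).
\end{align*}

\begin{align*}
\Upsilon_T^{-1}\sum_{t=1}^{T}
\begin{pmatrix} 1 & y_{t-1}\\ y_{t-1} & y_{t-1}^{2} \end{pmatrix}
\Upsilon_T^{-1}
&\longrightarrow
B = \begin{pmatrix} 1 & \mu_b/2\\ \mu_b/2 & \mu_b^{2}/3 \end{pmatrix},\\
\det B &= \frac{\mu_b^{2}}{3} - \frac{\mu_b^{2}}{4} = \frac{\mu_b^{2}}{12},\\
\sigma^{2}\,[B^{-1}]_{22} &= \frac{\sigma^{2}}{\mu_b^{2}/12} = \frac{12\,\sigma^{2}}{\mu_b^{2}}.
\end{align*}
```

The last line agrees with the limiting variance of $T^{3/2}(\hat\rho_b - 1)$ stated in this section.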
(7)
Now, noting that $y_t = \mu_b t + \sum_{s=1}^{t} u_s = \mu_b t + S_t$, it is possible to show that
An important feature of this derivation is that, because of the particular scaling matrix chosen, only the deterministic part of $y_t$ plays a role in generating the joint distribution of $T^{-1/2}\sum_{t=1}^{T} u_t$ and $T^{-3/2}\sum_{t=1}^{T} y_{t-1}u_t$; we need consider only the distribution of

(see Table 3.3) because

since

The scaling factors are such that any term with the stochastic component of $y_t$, namely $S_t$, has a degenerate asymptotic distribution and may be ignored asymptotically. From (7), we have that

where

Now, $\lim_{T\to\infty} T^{-2}\sum_{t=1}^{T} y_{t-1} = \mu_b/2$ and $\lim_{T\to\infty} T^{-3}\sum_{t=1}^{T} y_{t-1}^2 = \mu_b^2/3$ (where, again, only the deterministic component of $y_t$ is important in determining these limits). Thus $B_T \xrightarrow{p} B$. A simple application of Slutsky's theorem and Cramér's theorem is then needed to prove, using (8), that
From (9) it may then be deduced that $T^{3/2}(\hat\rho_b - 1) \Rightarrow N(0, 12\sigma^2/\mu_b^2)$.
Looking now at the t-ratio,
$s^2$ is a consistent estimator of $\sigma^2$, which may be constructed from the residuals of the estimated model, so plim $s = \sigma_u$. Using the results derived earlier,

The result that $t(\hat\rho_b)$ is asymptotically distributed as a standard normal variable now follows from an inspection of (10) and by noting that

This last result is remarkable. The asymptotic distributions of estimators in models where stochastically trending variables are present are typically of the Dickey-Fuller type. In direct contrast, the asymptotic distribution of the estimators in (6) is bivariate normal when $\mu_b \neq 0$.

Similar results apply if both the model and the data-generation process contain a drift term, $\mu_c$, and a trend. In this case $T^{5/2}(\hat\rho_c - 1) \Rightarrow N(0, 180\sigma^2/\mu_c^2)$, and again $t(\hat\rho_c) \Rightarrow N(0, 1)$.

6.2.2. Co-integrating Regression
Consider the following bivariate system of co-integrated variables $\{y_t\}_1^\infty$ and $\{x_t\}_1^\infty$:
where $\delta_{ts}$ is the Kronecker delta. The least-squares estimator of $\beta$ is given by

(13)
Thus,
(14)
From (3.22),

In order to derive the limiting distribution of $T^{-1}\sum_{t=1}^{T} x_t u_t$, it is convenient to condition $u_t$ on $\varepsilon_t$ in the following fashion:

By construction, $E(\varepsilon_t v_s) = 0\ \forall\, t \neq s$. Define $W_\varepsilon(r)$ and $W_v(r)$ as the independent Wiener processes on $C[0,1]$ obtained from the $\{\varepsilon_t\}_1^\infty$ and $\{v_t\}_1^\infty$ series, respectively. Now, using (12) and (15),

(16)
Using (3.23) (also see Phillips 1987a: 282),

By the property assumed for the $\varepsilon_t$ series,

Finally,

The proof for (19) is similar to those presented in Chapter 3 and is given in Phillips (1986: 327). Equation (20) follows from (i) $\varepsilon_t$ and $v_t$ being identically and independently distributed processes with zero means and variances of $\sigma_\varepsilon^2$ and $\sigma_v^2$ respectively, and (ii) the independence of the $\varepsilon_t$ and $v_t$ processes (obtained by construction). The limiting distribution of (16) can now be deduced by using (17)-(20) and is
It can be shown that (see Park and Phillips 1988, and Table 3.3)

The term $s^2$ is a consistent estimator of $\sigma_u^2$ and may be calculated from the residuals of the estimated regression of $y_t$ on $x_t$. Thus, using (22),

In general, therefore, the t-ratio of $\hat\beta$ will not have a standard normal distribution unless $\gamma = 0$ (that is, unless $x_t$ is strongly exogenous for the estimation of $\beta$). When $\gamma \neq 0$, the first term in (24) gives rise to 'second-order' or 'endogeneity' bias (see Phillips and Hansen 1990), which, although asymptotically negligible in estimating $\beta$ due to super-consistency, can be important in finite samples.

The Durbin-Watson statistic, computed from the residuals of the estimated regression (11), may be shown to converge in probability to 2. This result follows from our assumption that the $u_t$ are independently and identically distributed. If the $u_t$ are first-order autoregressive with autocorrelation parameter $\rho_1$, the Durbin-Watson statistic tends to the usual $2(1 - \rho_1)$, familiar from the asymptotic theory for stationary processes. Note that, if $\rho_1 = 1$, $\{y_t\}$ and $\{x_t\}$ are not co-integrated, and the estimated value of the Durbin-Watson statistic should be close to zero. This property is the basis for the Sargan-Bhargava test for co-integration (see Chapter 7).

The existence of nuisance parameters has important effects upon the distribution of $\hat\beta$. In the light of Section 6.2.1 this is to be expected. Suppose $\{x_t\}_1^\infty$ is generated by (for $\mu_b \neq 0$)
Then
By results in Stock (1987) and West (1988), and intuitively from the orders of magnitude involved,

Following West (1988) (see also Section 6.2.1 above), it may then be shown that $\mu_b T^{-3/2}\sum_{t=1}^{T} t u_t$, and hence $T^{-3/2}\sum_{t=1}^{T} x_t u_t$, is normally distributed with mean zero and variance $\mu_b^2\sigma_u^2/3$. From Section 6.2.1, plim $T^{-3}\sum_{t=1}^{T} x_t^2 = \mu_b^2/3$. Hence, by Slutsky's theorem and Cramér's theorem,
6.2.3. Example (Stock and West 1988)
This example describes how a dynamic regression equation can be transformed to validate the use of asymptotic normal-distribution theory. We next formalize the arguments by presenting a general theoretical framework. All of the examples discussed in this section may be viewed as special cases of this general formulation. This generalization is necessary to illustrate the subtleties inherent in deriving the distribution theory. Four more examples follow the description of the general theory. These elaborate upon and illustrate some special aspects of the theory and yield recommendations for empirical modelling with integrated series.

Stock and West (1988) is one of several papers dealing with tests of the Hall (1978) permanent income hypothesis.7 Hall's regressions take the following form:

where $c_t$ is consumption in period t and $y_t^d$ is disposable income. The processes generating $c_t$ and $y_t^d$ are assumed to have two important properties. First, $c_t$ and $y_t^d$ have unit roots; that is, they are both integrated of order 1. Second, given that the permanent income hypothesis is correct, $y_t^d$ may be shown to be co-integrated with $c_t$. Thus, while $c_t$ and $y_t^d$ are individually non-stationary, $y_t^d - c_t$ is stationary, possibly with a non-zero mean.

7 Other papers include Mankiw and Shapiro (1985, 1986), Banerjee and Dolado (1987), and Galbraith et al. (1987).

The permanent-income hypothesis has two implications: first, $\beta = 1$; and second, $\pi_1 = \pi_2 = \dots = \pi_p = 0$. In most of the discussion in Stock and West (1988), $\beta$ is restricted to its hypothesized value of one. Thus, a test of the permanent income hypothesis takes the form of testing the joint exclusion restrictions on the $\pi_i$. A joint test of the restrictions on $\beta$ and the $\pi_i$ raises several interesting issues, and we will deal with these in the context of a later example. It will become clear that such a joint test will not have the usual F distribution. The F-test on the $\pi_i$, with the restriction on $\beta$ imposed, does however have the standard F distribution asymptotically.

The key feature of the regression given by (25) is that all the coefficients on income can be written as coefficients on mean-zero stationary variables. One possible rearrangement of the variables yields
or
where k is the intercept of the long-run consumption function,8 $m = \mu + k\sum_{i=1}^{p}\pi_i$, and $\theta = (\beta + \sum_{i=1}^{p}\pi_i)$.

8 Possibly equal to zero.

Theorem 1 in Sims et al. (1990) implies that the OLS estimators of $\{\pi_i\}$ are jointly asymptotically normally distributed, converging to the true values at the rate $T^{1/2}$. Theorem 2 of Sims et al. implies that the t- or F-tests on any or all subsets of these estimated $\pi_i$ coefficients have the usual asymptotic distributions. It is worth re-emphasizing that it is only the existence of a transformation, to stationary and mean-zero regressors, that is important. There is no unique way to accomplish this transformation, but, because nothing depends on the precise parameterization chosen, uniqueness is not necessary for the results to hold. Tests and coefficient estimates based on any one of the linearly transformed regression models will be equivalent. In particular, then, this will be true of the untransformed regression.

Having established the intuition for the results derived by Sims et al., inter alia, it is necessary to proceed to a formalization of the model. This sub-section of the chapter, while relying heavily on Sims, Stock, and Watson (1990) (henceforth SSW), does not present the arguments in all their possible generality. Reference should be made to SSW for a complete description. Their notation is retained for convenience.
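The claim that estimates and tests are equivalent across linear transformations of the regressors can be checked numerically. The sketch below is an illustration under assumptions, not Hall's or Stock and West's specification: the simulated data, the two income lags, the choice $k = 0$, and all function names are hypothetical. It fits a regression of consumption on a constant, lagged consumption, and two lags of income, once in levels and once after moving the income terms onto the stationary differences $y_{t-i} - c_{t-1}$.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def simulate_consumption_income(T=500, seed=0):
    """Hypothetical data: consumption is a random walk, income is co-integrated with it."""
    rng = np.random.default_rng(seed)
    c = np.cumsum(rng.standard_normal(T))
    y = c + rng.standard_normal(T)
    return c, y


# Rows of D map (1, c_{t-1}, y_{t-1}, y_{t-2}) into
# (1, c_{t-1}, y_{t-1} - c_{t-1}, y_{t-2} - c_{t-1}).
D = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0, 0.0],
              [0.0, -1.0, 0.0, 1.0]])


def compare_parameterizations(T=500, seed=0):
    c, y = simulate_consumption_income(T, seed)
    dep = c[2:]                                                       # c_t
    X = np.column_stack([np.ones(T - 2), c[1:-1], y[1:-1], y[:-2]])   # levels regressors
    Z = X @ D.T                                                       # transformed regressors
    beta, resid_levels = ols(X, dep)
    delta, resid_transformed = ols(Z, dep)
    return beta, delta, resid_levels, resid_transformed


if __name__ == "__main__":
    beta, delta, r1, r2 = compare_parameterizations()
    print("levels coefficients     :", np.round(beta, 4))
    print("transformed coefficients:", np.round(delta, 4))
    print("max residual difference :", np.max(np.abs(r1 - r2)))
```

The residuals coincide, the income coefficients are unchanged, and the coefficient on lagged consumption in the transformed regression equals the sum of the levels coefficients on lagged consumption and income, so any test built from residual sums of squares is the same in both parameterizations.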
6.2.4. General Formulation
Most of the examples usually discussed in this literature may be expressed as special cases of the following linear time-series model:

where $Y_t$ is a k-dimensional vector and A is a $k \times k$ matrix of coefficients. The $N \times 1$ vector of disturbances $\{\eta_t\}$ is a martingale difference sequence with $E[\eta_t \mid \eta_1, \dots, \eta_{t-1}] = 0$ and $E[\eta_t\eta_t' \mid \eta_1, \dots, \eta_{t-1}] = I_N$ for $t = 1, \dots, T$.9 The $N \times N$ matrix $\Omega^{1/2}$ is the square root of the covariance matrix $\Omega$ of the errors $(\Omega^{1/2}\eta_t)$. The matrix G is a selection matrix for the errors. It is of size $k \times N$, is assumed to be known a priori, and determines which errors enter which equations. It is also assumed that A has $k_1$ eigenvalues with absolute value less than 1, and that the remaining $k - k_1$ eigenvalues are exactly equal to unity.

9 $E[\eta_t \mid \eta_1, \dots, \eta_{t-1}] = 0$ is the property that defines a martingale difference sequence; see Ch. 1 (or Hall and Heyde 1980). This martingale difference sequence assumption is not important for the derivation of the results. All convergence theorems in SSW can be proved when the $\eta_t$ are mixingales (Hall and Heyde 1980) and follow a process such as the one given in, for example, Phillips (1987a).

In general, the components of $Y_t$ are random terms of various orders of integration, constants, and polynomials in time. Linear combinations of elements of $Y_t$, with orders of integration lower than those of its component elements, may also be included. As long as the system possesses such generalized co-integrating vectors,10 SSW show that $T^{-p}\sum_{t=1}^{T} Y_tY_t'$ converges to a singular limit, for a suitably chosen p. Thus, the analysis must be undertaken with a transformed set of variables $Z_t = DY_t$. The variable $Z_t$ has several important properties. First, the non-singular matrix D is chosen in such a way that $Z_t$ is decomposable into its non-stochastic and stochastic components. Second, the moment matrix $\sum_{t=1}^{T} Z_tZ_t'$ must be invertible almost surely.

10 This is SSW's terminology. They refer to such vectors as generalized co-integrating vectors to allow the possibility that not all of the component elements of the linear combination have the same order of integration.

If there are no stochastic trend components in the decomposition of $Z_t$ into its stochastic and non-stochastic components, or at least no dominating stochastic trend components, then asymptotic normality of the regression coefficients holds, both in the transformed and in the untransformed regressions. In this case, we are able to transform the original regressors and express them in terms of variables that do not contain stochastic trends. Normality is then a natural consequence of this transformation for the same reasons as in standard econometrics, where the matrix of sample second moments tends in probability to a non-random positive definite matrix and the usual central-limit theorems apply.

The details of the derivation of the matrix D and its existence are contained in SSW. We will proceed by recording the final form of the transformation. Letting $\xi_{1,t} = \sum_{s=1}^{t}\eta_s$, and defining $\xi_{j,t}$ (the j-fold summation of the $\eta_t$) recursively as $\xi_{j,t} = \sum_{s=1}^{t}\xi_{j-1,s}$, $1 < j \le g$, the transformation D is chosen such that
or, equivalently

where

and L is the lag operator. The variates $\nu_t$ are referred to as the canonical regressors associated with $Y_t$. The lag polynomial $F_{11}(L)$ has dimension $k_1 \times N$, and $\sum_{j=0}^{\infty}F_{11j}F_{11j}'$ is non-singular. $F_{jj}$ is assumed to have full row rank $k_j$ (may be equal to zero) for $j = 2, \dots, 2g+1$, so

Since we may be interested in estimating only some of the k equations in (28), we next need to define a selection matrix C. If we needed to consider only $n \le k$ equations, we could look at the regression of $CY_t$ on $Y_{t-1}$, where C is an $n \times k$ matrix of constants. The n regression equations to be estimated are then
The asymptotic analysis in SSW is derived in stacked single-equation form. In order to use this form, we need the symbol $\otimes$, which denotes a Kronecker product defined as follows: consider the $m \times n$ matrix $A = \{a_{ij}\}$ and the $p \times q$ matrix B; the Kronecker product of A and B (in that order) is the $mp \times nq$ matrix,

Vec(·) denotes the column-wise vectoring operator. Thus, writing the matrix A as $A = (a_1, a_2, \dots, a_n)$, where each of the $a_i$ is an $m \times 1$ vector, vec(A) is given by

If $X = [Y_1, Y_2, \dots, Y_{T-1}]'$, $s = \mathrm{vec}(S)$, $v = \mathrm{vec}(\eta)$, and $\beta = \mathrm{vec}((CA)')$, then (32) can be written in stacked form as

In order to express (33) in terms of the transformed regressors $Z = [Z_1, Z_2, \dots, Z_{T-1}]' = XD'$, note that the coefficient vector corresponding to these is given by $\delta = (I_n \otimes D'^{-1})\beta$.11 Thus, finally,

11 To show this, substitute for $Z = XD'$ and $\delta = (I_n \otimes D'^{-1})\beta$ in (34), giving $s = (I_n \otimes XD')(I_n \otimes D'^{-1})\beta + (\Sigma^{1/2} \otimes I_{T-1})v$. Now $(A_1 \otimes A_2)(A_3 \otimes A_4) = (A_1A_3) \otimes (A_2A_4)$, for arbitrary matrices $A_i$, $i = 1, 2, 3, 4$, provided the matrices are conformable. Using this rule (33) is recovered as required.
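The Kronecker product, the vec operator, and the mixed-product rule used in the stacked form can be checked numerically. The following is a minimal sketch with hypothetical dimensions and randomly generated matrices; the helper `vec` and all variable names are illustrative assumptions, not SSW's notation.

```python
import numpy as np


def vec(M):
    """Column-wise vectorisation: stack the columns of M into one long vector."""
    return M.reshape(-1, order="F")


rng = np.random.default_rng(0)
T, k, n = 6, 3, 2
X = rng.standard_normal((T, k))          # regressor matrix (hypothetical data)
B = rng.standard_normal((k, n))          # one column of coefficients per equation
D = np.array([[1.0, 0.0, 0.0],           # a non-singular transformation (unit determinant)
              [1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0]])

# Stacked single-equation form: vec(X B) = (I_n kron X) vec(B).
beta = vec(B)
stacked_fit = np.kron(np.eye(n), X) @ beta

# Transformed regressors Z = X D' with coefficients delta = (I_n kron D'^{-1}) beta.
Z = X @ D.T
delta = np.kron(np.eye(n), np.linalg.inv(D.T)) @ beta
transformed_fit = np.kron(np.eye(n), Z) @ delta

# Mixed-product rule: (A1 kron A2)(A3 kron A4) = (A1 A3) kron (A2 A4).
A1, A3 = rng.standard_normal((2, 3)), rng.standard_normal((3, 2))
A2, A4 = rng.standard_normal((4, 5)), rng.standard_normal((5, 4))
mixed_lhs = np.kron(A1, A2) @ np.kron(A3, A4)
mixed_rhs = np.kron(A1 @ A3, A2 @ A4)

if __name__ == "__main__":
    print(np.allclose(vec(X @ B), stacked_fit))
    print(np.allclose(stacked_fit, transformed_fit))
    print(np.allclose(mixed_lhs, mixed_rhs))
```

The second comparison is the argument of the footnote in miniature: replacing X by Z and $\beta$ by $\delta$ leaves the systematic part of the stacked regression unchanged.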
The OLS estimator $\hat\delta$ of $\delta$ in the stacked transformed regression model (34) is given by

It is possible to see from (30) that the moments involving the different components of $Z_t$ converge at different rates. For example, $Z_{1,t}$ and $Z_{2,t}$ are $O_p(1)$ while $Z_{3,t}$ is $O_p(t^{1/2})$, $Z_{4,t}$ is $O_p(t)$, and so on. Hence the sample second moments, which are what we would be interested in when looking at the matrix $Z'Z$, converge at a rate of T for the $Z_{1,t}$ and $Z_{2,t}$ components, at a rate $T^2$ for the $Z_{3,t}$ component, and at a rate $T^3$ for the $Z_{4,t}$ component. In order to handle these different orders, SSW use the scaling matrix $\Upsilon_T$, given by
(36)

All the convergence results use the scaled $Z'Z$ matrix $\Upsilon_T^{-1}Z'Z\Upsilon_T^{-1}$; let us call this scaled matrix Q. The first step in the proof is to derive the limiting matrix for Q. SSW show that, under certain regularity conditions, $Q \Rightarrow V$, where the elements of V may be described as follows:
(a) $V_{11}$ and $V_{12}$ are non-random matrices given by $\sum_{j=0}^{\infty}F_{11j}F_{11j}'$ and $\sum_{j=0}^{\infty}F_{11j}F_{21j}'$ respectively. Additionally, $V_{12} = V_{21}'$.
(b) $V_{1p} = V_{p1}' = 0$, $p = 3, \dots, 2g+1$.
(c) $V_{22}$ is also non-random, given by $F_{22}F_{22}' + \sum_{j=0}^{\infty}F_{21j}F_{21j}'$.
(d) $V_{mp}$, where $m, p = 3, 5, 7, \dots, 2g+1$, are random matrices involving functionals of multivariate Wiener processes.
(e) $V_{mp}$, where $m = 2, 4, 6, \dots, 2g$, $p = 3, 5, 7, \dots, 2g+1$, are also random matrices involving functionals of multivariate Wiener processes.
(f) $V_{mp} = [2/(p+m-2)]F_{mm}F_{pp}'$, $p = 4, 6, \dots, 2g$, $m = 2, 4, 6, \dots, 2g$.

This is the first time we have used multivariate Wiener processes. The mathematical details involved in going from univariate to multivariate Wiener processes are complex and will not be dealt with here (for a good account, see Phillips and Durlauf 1986). However, the generalizations from our analysis in Chapter 3 can be understood intuitively fairly easily and the appendix sketches the bivariate case.

Thus, each element of a standardized $n \times 1$ multivariate Wiener process $W(r)$ is a univariate Wiener process and the elements of $W(r)$ are independent. In particular, $W(1)$ has the multivariate standard normal distribution, that is, $N(0, I_n)$. Further, $W(r) \in C[0,1]^n$, where $C[0,1]$ is the space of continuous functions defined on [0,1].

Convergence results analogous to (3.17), for a sequence of mean zero random vectors $\{u_t\}$, can be proved by defining standardized sums such as
with $(t-1)/T \le r < t/T$, and the matrix $\Omega$ is the long-run variance-covariance matrix of $u_t$, defined by $\Omega = \lim_{T\to\infty}E(T^{-1}S_TS_T')$ analogously with (3.16c). The $\{u_t\}$ innovation sequence satisfies conditions equivalent to those given by (3.16a)-(3.16d) for the univariate case. Provided suitable regularity conditions are satisfied, the following multivariate analogue of (3.18) may be proved: $R_T(r) \Rightarrow W(r)$.

Finally, multivariate analogues of all the convergence results given earlier for univariate processes may be derived. Thus, for example, referring to Table 3.3, where $y_t = y_{t-1} + u_t$:

To derive the results above we have assumed, as in Table 3.3, that $\{u_t\}$ is a white-noise innovation sequence with $I_n$ as the variance matrix.

The next step of the argument involves rewriting the estimator $\hat\delta$ in a form such that its distribution can be derived. This is done by first defining a non-singular matrix H which, in essence, transposes the stacked version of the matrix Z. Thus,
(37)
From (35),

by substituting for s from (34). Next, using the result that

Thus,

(38)
As noted above, the matrix V is the limiting matrix of Q. The asymptotic distribution of
is needed to give us the final result. This limiting vector, denoted by $\phi$, takes the following form:

where
(a) $\phi_m$ for all $m \ge 3$ are functionals of multivariate Wiener processes;
(b) $\phi_2 = \phi_{21} + \phi_{22}$, where $\phi_{22} = \mathrm{vec}[F_{22}W(1)'\Sigma^{1/2}]$, $W(1)$ has the multivariate standard normal distribution, and

Finally,

where $(\phi_1, \phi_{21})$ are independent of $(\phi_{22}, \phi_3, \dots, \phi_{2g+1})$. Consolidating these steps, we have the following theorem.
This provides us with several interesting results. First, $\hat\delta$, and hence $\hat\beta$, is a consistent estimator of $\delta$, respectively $\beta$, in the presence of arbitrarily many unit roots and deterministic time trends. This observation relies on the assumption that the model is correctly specified, in the sense that the errors are martingale difference sequences, and the $\Upsilon_T$ may rescale by powers of T greater than 1/2.

We have already noted that the estimated coefficients on the elements of $Z_t$ converge to their probability limits at different rates. Hence, if some of the transformed regressors are dominated, in an order of probability sense, by stochastic components, their limiting distributions will be non-normal. On the other hand, if there are no $Z_t$ regressors dominated by stochastic trends (that is, if $k_3 = k_5 = \dots = k_{2g+1} = 0$), then $\hat\delta$, and hence $\hat\beta$, has an asymptotic normal joint distribution. This happens because the terms involving the random integrals are no longer present, as may be seen from (30), where $k_3, k_5, \dots, k_{2g+1}$ are the ranks of matrices multiplying the stochastic canonical regressors. If these matrices are absent, the transformed regression is considerably simplified as it is expressible solely in terms of stationary variables and deterministic trend terms. In such a case, therefore, $H(I_n \otimes \Upsilon_T)(\hat\delta - \delta) \Rightarrow N(0, H(\Sigma \otimes V^{-1})H')$ where V is now a nonrandom matrix. Additionally, the F-statistic associated with testing an arbitrary set of q linear restrictions $R\beta = r$ is asymptotically distributed as $\chi^2_q/q$ in this case.

If a single stochastic trend is dominated by a non-stochastic trend, then, again, asymptotic normality holds. This is the result of West (1988) and may be seen using (30) and keeping track of the rates of convergence of the sample moments of the separate components of $Z_t$. Consider, for example, the set of canonical regressors given by $(\eta_t, 1, \xi_{1t}, t)'$ and suppose the transformed regression is expressible in terms of these canonical regressors.
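The orders of magnitude at work can be written out explicitly. The following is a sketch under assumptions: the symbol $\Upsilon_T$ for the scaling matrix, its diagonal layout, and the illustrative regressor $\mu t + \xi_{1t}$ with $\mu \neq 0$ are introduced here for illustration, while the rates follow the convention stated earlier that sample second moments of $O_p(1)$, $O_p(t^{1/2})$ and $O_p(t)$ components grow like $T$, $T^2$ and $T^3$.

```latex
\begin{align*}
\Upsilon_T &= \operatorname{diag}\bigl(T^{1/2},\ T^{1/2},\ T,\ T^{3/2}\bigr)
 \quad\text{for}\quad (\eta_t,\ 1,\ \xi_{1t},\ t)',\\
\sum_{t=1}^{T}\xi_{1t}^{2} &= O_p(T^{2}),
 \qquad \sum_{t=1}^{T}t^{2} = \tfrac{1}{3}T^{3} + O(T^{2}),\\
T^{-3}\sum_{t=1}^{T}\bigl(\mu t + \xi_{1t}\bigr)^{2}
 &= \mu^{2}\,T^{-3}\sum_{t=1}^{T}t^{2}
  + 2\mu\,T^{-3}\sum_{t=1}^{T}t\,\xi_{1t}
  + T^{-3}\sum_{t=1}^{T}\xi_{1t}^{2}
 \;\xrightarrow{p}\; \frac{\mu^{2}}{3}.
\end{align*}
```

The cross term is $O_p(T^{-1/2})$ and the purely stochastic term is $O_p(T^{-1})$, so only the deterministic trend survives in the limit.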
Thus, while the sample variability of the stochastic trend term is $O_p(T)$, that of the deterministic trend is $O(T^{3/2})$. As shown by West (1988), and discussed in Section 6.2.1, in deriving the asymptotic distribution for this case, the deterministic trend component dominates the stochastic component and asymptotic normality follows.

The Stock-West (1988) example, discussed earlier, works because we are able to rewrite the regression in terms of canonical regressors which do not have any dominating stochastic component. The issue of domination, in this context, is best addressed by looking at the scaling matrix.

Four more examples will now be given to illustrate these arguments, using the framework developed above. The final example in this set of four contains recommendations for modelling with integrated series.

6.2.5. Example (Sims et al. 1990: 119)
Let the process $\{x_t\}$ be generated according to the following AR(2) process without drift:
Under $H_0$, $\beta_0 = 0$, $\beta_1 + \beta_2 = 1$ and $|\beta_2| < 1$ so that the autoregressive polynomial in (39) has only one unit root. If a constant is included in the regression of $x_t$ on its two lags, $Y_t$ (in the notation developed earlier) is given by

Transforming to the canonical regressor form,12 we have

(40)
where $\delta_1 = -\beta_2$, $\delta_2 = \beta_0$, and $\delta_3 = \beta_1 + \beta_2$, $Z_{1,t} = \Delta x_t$, $Z_{2,t} = 1$, and $Z_{3,t} = x_t$. It may also be shown that
(41)
where $\theta(L) = (1 + \beta_2L)^{-1}$ and $\theta^*(L) = (1 - L)^{-1}[\theta(L) - \theta(1)]$.

12 This transformation is not unique, and one could imagine choosing others; however, (39) can be rewritten as $x_t = (\beta_1 + \beta_2)x_{t-1} - \beta_2(x_{t-1} - x_{t-2}) + \eta_t$, because $\beta_0 = 0$ under the null, and this suggests the decomposition given by (40). It has the advantage of making $\delta_1$ ($= -\beta_2$) the coefficient of a non-integrated random variable, since $x_t$ is an integrated series.

Note from (41) that $F_{21}(L) = 0$. This implies, by referring to the description of the V matrix above, that V is block-diagonal. The estimate $\hat\delta_1$ of the coefficient on the (differenced) stationary term has an asymptotically normal distribution with mean 0 and variance given by $V_{11}^{-1}$. The marginal distribution of $\hat\delta_2$, however, is not normal; because $F_{23}$ is not equal to zero, $Z_{2,t}$ and $Z_{3,t}$ are asymptotically correlated, and since $Z_{3,t}$ has a Wiener distribution, so does the coefficient on $Z_{2,t}$.

If an intercept is not included in the regression, we have a $2 \times 2$ block-diagonal V matrix. The estimated coefficient $\hat\delta_1$ still has an asymptotically normal distribution, with $\hat\delta_1$ converging to its probability limit at rate $T^{1/2}$, while $\hat\delta_3$ has a Wiener distribution with convergence at rate T. Any joint test involving $\delta_1$ and $\delta_3$ will also have a non-standard distribution.

The analogy with the Stock-West example is direct. In (27) we had a series of terms integrated of order zero. The coefficient estimates on all these stationary terms were jointly and individually asymptotically normally distributed. The joint distribution of $\theta$ in (27), with any of the $\pi_i$, was of course non-standard. This observation applies equally well here. There is, however, an important difference between the Stock-West example and the current example. In the former case, because $\beta$ had already been set equal to 1, our parameters of interest could all be written as coefficients on mean-zero and non-integrated variables. Inference could then be conducted using standard tables. In the latter case, although we can use standard tables to test for the significance of $\beta_2$, a test of $\beta_1 + \beta_2 = 1$ still requires us to use non-standard distribution theory (and so tables constructed by simulation). In a sense, our rewriting in terms of stationary variables is not sufficiently successful to enable us to conduct inference solely using standard tables. Example 6.2.6 examines this issue in more detail.
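The mapping between the levels parameterization and the canonical one can be verified on simulated data. The sketch below is illustrative only: the value $\beta_2 = -0.5$, the sample size, the seed, and the function names are assumptions. It fits the AR(2) regression with a constant on $(1, x_{t-1}, x_{t-2})$ and on $(\Delta x_{t-1}, 1, x_{t-1})$ and compares the two sets of OLS estimates.

```python
import numpy as np


def simulate_ar2_unit_root(T, beta2, rng):
    """x_t = beta1 * x_{t-1} + beta2 * x_{t-2} + eta_t with beta1 + beta2 = 1, |beta2| < 1."""
    beta1 = 1.0 - beta2
    x = np.zeros(T + 2)
    eta = rng.standard_normal(T + 2)
    for t in range(2, T + 2):
        x[t] = beta1 * x[t - 1] + beta2 * x[t - 2] + eta[t]
    return x


def fit_both(x):
    """OLS in levels (beta0, beta1, beta2) and in canonical form (delta1, delta2, delta3)."""
    y = x[2:]
    lag1, lag2 = x[1:-1], x[:-2]
    ones = np.ones_like(y)
    X = np.column_stack([ones, lag1, lag2])            # constant, x_{t-1}, x_{t-2}
    Z = np.column_stack([lag1 - lag2, ones, lag1])     # Delta x_{t-1}, constant, x_{t-1}
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    delta = np.linalg.lstsq(Z, y, rcond=None)[0]
    return beta, delta


if __name__ == "__main__":
    x = simulate_ar2_unit_root(400, beta2=-0.5, rng=np.random.default_rng(0))
    beta, delta = fit_both(x)
    print("delta1 vs -beta2        :", delta[0], -beta[2])
    print("delta2 vs beta0         :", delta[1], beta[0])
    print("delta3 vs beta1 + beta2 :", delta[2], beta[1] + beta[2])
```

The estimates satisfy the same identities as the parameters, so a standard t-test on $\hat\delta_1$ is a test on $\beta_2$, whereas the hypothesis $\beta_1 + \beta_2 = 1$ is a restriction on the coefficient of the integrated regressor $x_{t-1}$.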
6.2.6. Example (Sims et al. 1990: 128)
Suppose now that $x_t$ is generated as in Section 6.2.5 but $\beta_0$ is non-zero under the null. The canonical representation13 yields
(42)
(43)
where $\theta(L)$ and $\theta^*(L)$ are defined as in Section 6.2.5 above.

Here, unlike the example in Section 6.2.5, there are no elements of $Z_t$ dominated by a stochastic integrated process. The stochastic-trend term is dominated, in sample variability, by the deterministic-trend term t. A detailed discussion of this case appears in West (1988).
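The sense in which the deterministic trend dominates can be seen numerically. The sketch below is a minimal illustration with assumed names and parameter values: it builds a series equal to a linear trend with slope `mu` plus a random walk and reports the scaled second moment $T^{-3}\sum x_t^2$ together with the share of the sum of squares accounted for by the stochastic trend alone.

```python
import numpy as np


def scaled_second_moment(mu, T, seed=0):
    """Return T^{-3} * sum(x_t^2) and the stochastic trend's share of sum(x_t^2)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1, dtype=float)
    stochastic = np.cumsum(rng.standard_normal(T))   # random-walk (stochastic trend) component
    x = mu * t + stochastic                          # deterministic trend plus stochastic trend
    total = np.sum(x ** 2) / float(T) ** 3
    stochastic_share = np.sum(stochastic ** 2) / np.sum(x ** 2)
    return total, stochastic_share


if __name__ == "__main__":
    for T in (100, 1000, 20000):
        total, share = scaled_second_moment(mu=1.0, T=T)
        print(T, round(total, 4), round(share, 6))
```

As T grows the scaled second moment should settle near $\mu^2/3$ and the stochastic share should shrink towards zero, which is why the limiting moment matrix is non-random in this case.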
13 This decomposition again has the advantage of making $\delta_1$ the coefficient of a non-integrated variable. The motivation for choosing this transformation is therefore similar to that given for the example in Sect. 6.2.5.

6.2.7. Example (Banerjee and Dolado 1988)
This example is a consolidation of most of the principal points discussed in the pages above. It is a variation of the Stock-West example, and all statements concerning the distributions of various parameter estimates may be derived from earlier general principles.
Consider the following regression:

where $y_t$ denotes the logarithm of disposable income and $c_t$ the logarithm of consumption, and both variables are I(1) in levels. Here, although we have non-stationary variables as regressors, if they are co-integrated with each other, as they must be if any of the permanent-income/life-cycle models of consumption are to make sense, then this co-integration property makes both sides of the regression equation I(0) and the t-tests of the coefficients of all the regressors are asymptotically normal. The long-run multiplier between consumption and income can be deduced much as in any dynamic model.

A variant of (44) is the model

Although the individual t-ratios are asymptotically normally distributed, the distribution of the Wald statistic, used for testing the joint null hypothesis $\beta = \delta = 0$, is a functional of a Wiener process and its distribution is non-standard. More interestingly, if (45) were re-parameterized as
where $s_{t-1} = y_{t-1} - c_{t-1}$, $\gamma_1 = \beta + \delta$, $\gamma_2 = \beta$, and $s_{t-1}$ may be shown to be I(0) under the assumptions of the permanent-income hypothesis, then $t(\gamma_1 = 0)$ would be a functional of a Wiener process whereas $t(\gamma_2 = 0)$ would have an asymptotically normal distribution.
In the general model given by (44), the following results may be proved, using theorems 1 and 2 in SSW (1990):
(a) The t-statistic of each coefficient individually is asymptotically normally distributed.
(b) The F-statistics of joint significance of any proper subset of the set of stationary regressors have standard asymptotic distributions. Thus, any test of the joint significance of $\Delta y_{t-j}$ ($j = 1, \ldots, n - 1$) and $\Delta c_{t-j}$ ($j = 1, \ldots, m - 1$) will have the correct size if standard tables are used. Further, given that the non-stationary variables are co-integrated, if the regressors in the non-stationary set were combined, say, to give $p$ stationary regressors and $q$ non-stationary regressors,¹⁴ an F-statistic that uses any of the derived $p$ stationary regressors in combination with any of the original stationary regressors will also have a standard distribution asymptotically.
(c) The F-statistics of joint significance of any subset of the set of non-stationary regressors have non-standard distributions. Moreover, an F-statistic that uses any stationary regressors in combination with any non-stationary regressors will have a non-standard distribution.
14 In (46), for example, $p = q = 1$ and the original number of non-stationary regressors (excluding the trend) is 2.
Point (a) is obtained from the property of the non-stationary regressors forming a co-integrated set; as in Section 6.2.3 above, both $\delta$ and $\beta$ can be written as coefficients on mean-zero stationary variables (with (46) giving one such re-parameterization for $\beta$). The next example reconsiders this point in the context of modelling practice. Point (b) is not surprising because the F-statistics considered use only stationary regressors. The fact that some of these stationary regressors may be re-parameterizations of some or all of the original non-stationary regressors is an interesting feature.
Point (c) is surprising in two respects. Consider (44) and (46); the first surprising feature is the non-standard behaviour of the F-statistic and the second is that, while the t-ratio of the coefficient of $c_{t-1}$ has a standard distribution under parameterization (45), under the linear re-parameterization given by (46) the t-ratio has a Wiener distribution. Both results follow from the asymptotic singularity of a particular variance-covariance matrix.¹⁵
Consider $\hat{\gamma}_1$ in (46), which tends to a non-degenerate distribution at rate $T$; $T^{1/2}\hat{\gamma}_2$ is asymptotically normally distributed. Thus,
and s o
This accounts for the asymptotic singularity of the variance-covariance matrix of $[\hat{\delta}, \hat{\beta}]'$ and the corresponding non-standard behaviour of the F-statistic in (45). However, the distribution of $T\hat{\gamma}_1$ may be shown to be non-degenerate. $\hat{\gamma}_1$ can be written as a functional of Wiener processes, and the scaling factor (of $T$) suggests the resulting non-standard distribution.
15 The asymptotic singularity of the variance-covariance matrix is the problem of multi-collinearity in another guise. On this, also see SSW (1990).
It is instructive to note that the regression given by (44) would not be sensible unless the right-hand variables or regressors were co-integrated. A special example of (44) was discussed in Section 6.1, where we spoke of an unbalanced regression. This is a much more general point than that made in the context of spurious regression. A regression involving a right-hand set of variables integrated of an order different from the order of integration of the left-hand side is just as problematic as a regression between two unrelated non-stationary series. In each case, the distributions of the statistics are non-standard.
6.2.8. Example (Stock and Watson 1988a)
Stock and Watson (1988a) provide an example of the dangers involved in not properly taking account of the orders of integration of the regressors and the regressand. They set up a simple data-generation process based on the permanent-income hypothesis:
where
$y^p_t$ = the permanent component of disposable income, which is assumed to follow a random walk
$c_t$ = consumption
$y^s_t$ = the transitory component of disposable income, which is a stationary innovation process
$p_t$ = the price level in period $t$.
The innovation processes $u_t$ and $v_t$ are uncorrelated.
Stock and Watson relate the tale of two econometricians trying to test versions of Friedman's permanent income hypothesis. The misguided econometrician, unaware of or choosing to ignore the orders of integration of the series, estimates the following regressions:
$c_t = \alpha_1 + \beta_1 p_t$ (to check money illusion)
$c_t = \alpha_2 + \beta_2 t$ (to check whether consumption has a trend)
$\Delta c_t = \alpha_3 + \beta_3 \Delta y_t$ (to calculate the marginal propensity to consume)
$\Delta c_t = \alpha_4 + \beta_4 y_{t-1}$ (to test the permanent income hypothesis).
Each of the inferences from these regressions is invalid.
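A small Monte Carlo sketch makes the first of these failures concrete. It assumes (as a simplification of the data-generation process, not a reproduction of it) that consumption and the price level are independent Gaussian random walks, regresses one on the other with a constant, and records how often the conventional t-ratio on the slope exceeds 1.96 in absolute value. The function names, sample size, and replication count are illustrative choices.

```python
import numpy as np

def ols_t_stat(y, x):
    """OLS of y on a constant and x; returns (slope, conventional t-ratio of the slope)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])      # usual residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)               # conventional OLS covariance matrix
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

def rejection_rate(T=100, reps=2000, seed=0):
    """Share of replications in which |t| > 1.96 although the series are unrelated."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        c = np.cumsum(rng.standard_normal(T))       # consumption: random walk (assumed)
        p = np.cumsum(rng.standard_normal(T))       # price level: independent random walk
        _, t = ols_t_stat(c, p)
        rejections += abs(t) > 1.96
    return rejections / reps

rate = rejection_rate()
print(f"share of |t| > 1.96: {rate:.2f}")
```

If conventional inference were valid here, the reported share would be close to the nominal 0.05; in simulations of this kind it is typically many times larger, which is the sense in which the apparent evidence for money illusion is spurious.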
The first regression is a spurious regression of the classical Granger-Newbold kind; $c_t$ and $p_t$ are unrelated random walks, and the econometrician's finding of a large t-statistic for $\beta_1$, thereby leading him to conclude in favour of money illusion,¹⁶ is a spurious one.
The second regression is also spurious since it attempts to explain a random walk (or, in other words, a stochastically trending variable) by a deterministic trend. Nelson and Kang (1981) pointed out the dangers of running regressions which attempt to de-trend stochastically trending data in the vain hope of achieving stationarity around a trend. In both cases the problems with the inferences arise because the regressions involve variables that are not co-integrated (see Chapter 3).
The third equation appears to be correctly specified but nevertheless leads to downwardly biased estimates of the coefficient for the marginal propensity to consume because disposable income measures the change in permanent income with error, since it includes the change in transitory income as well. The final regression is what we called an 'unbalanced regression' as it tries to explain a variable integrated of order zero by a variable integrated of order 1. The series of papers noted above (Mankiw and Shapiro 1985, 1986; Banerjee and Dolado 1988; Galbraith et al. 1987) consider the extent to which the t-statistics in such cases are biased away from zero, leading to misleading inferences about the significance of coefficients.
Stock and Watson compare the predicament of this econometrician with econometrician B, say, who looks at the results of the following alternative regressions:
The inferences from each of these regressions will be, by and large, correct. The first regression here is the standard co-integrating regression and this time is valid. The estimate of the coefficient $\delta_1$ will have a Wiener distribution but will be super-consistent. The reported standard error will be incorrect owing to untreated autocorrelation.
The second regression can be re-parameterized¹⁷ as
Thus, $\delta_3$ can be written as a coefficient on a stationary variable (as can $\delta_2$ treated in isolation). The theory, as described above, implies that the usual t and F distributions¹⁸ will apply. A similar argument applies to the third regression, with the exception that in this case $y_{t-1} - c_{t-1}$ forms the co-integrating relation. Stock and West (1988) and Banerjee and Dolado (1988) discuss regressions of this form in further detail.
The moral of the econometricians' story is the need to keep track of the orders of integration on both sides of the regression equation, which usually means incorporating dynamics; models that have restrictive dynamic structures are relatively likely to give misleading inferences simply for reasons of inconsistency of orders of integration. Specificity was clearly the problem with several of the models proposed by the first econometrician. A general-to-specific method of econometric modelling would have overcome many of the problems of spurious inferences and non-standard distributions. An initial model, more general than the one postulated by the second econometrician, of the form, say,
would be more appropriate for inference when weak exogeneity conditions are satisfied.¹⁹ Account must be taken of facts (a)-(c) of Section 6.2.7 when conducting such inference; more generally, the example illustrates ways in which the theory of modelling with integrated variables has contributed to improving our understanding of what constitutes good practice in dynamic modelling.
16 Inference of this kind would appear to be faulty, in any case. To consider a rejection of $H_0$: $\beta_1 = 0$ as a reason for accepting any specific alternative is statistically and logically unjustifiable.
17 Or in a form analogous to that given by (44).
18 The F-distribution will apply when looking at tests of joint significance of subsets of regressors, each of which is I(0). In this example, because one of the regressors is I(1) and the other is I(0), the F-statistic will have a non-standard distribution.
19 See Ch. 8 and earlier discussion in this chapter.
6.3. Functional Forms and Transformations
We drew attention in Chapter 1 to the fact that many economic time series will come close to conformity with the integrated models only if a logarithmic transformation is applied. The logarithms of many such series may be integrated, but it seems unlikely that the untransformed levels of macroeconomic time series such as consumption, national income, and the price level could be made stationary by differencing alone. It is worth examining this transformation more closely, along with the effect that it may be expected to have on an equilibrium relationship. If the levels of two series are co-integrated, do we expect the logarithms to be co-integrated also, and vice versa?
Begin by examining a series with a tendency to grow over time subject to stochastic shocks which tend to grow with the underlying series. For example,
where $\varepsilon_t$ has a mean of 1 and is log-normally distributed. A series such as $Y_t$ might describe a number of economic time series, at least in broad outline. Taking the logarithmic transformation of (51) and using lowercase letters to denote the transformed variables with $Y_t > 0$,
where $\log(1 + \gamma) \approx \gamma$ and $e_t = \log(\varepsilon_t)$.
Equation (53) is indeed commonly used as a simple characterization of the logarithms of economic time series. As a description of such a transformed data series, (52) or (53) seems at least admissible; $\Delta y_t$ is the growth rate of the level series $Y_t$, and this growth rate varies around a (typically positive) mean. That this equation could describe the level of the series (so $y_t$ denotes the original data without the logarithmic transformation) seems implausible, however: (53) would then imply that the absolute amount of growth varies around a fixed mean, and therefore that, as the series grows, the average amount of growth falls to zero as a proportion of the series itself. Moreover, $\sigma^2/\mathrm{var}(Y_t)$ would tend to zero, forcing the series to become essentially deterministic in relative terms. This criticism does not apply to (53) since $\sigma$ is a proportion of $Y_t$.
Ermini and Hendry (1991) consider the issue of testing 'logarithms versus levels' by formulating a test based on the encompassing principle. The null model $M_1$ may be said to encompass the rival or alternative model $M_2$ if $M_1$ is able to explain the findings of $M_2$. Alternatively, if the rival model does not adequately characterize the properties of the process generating the series, the null model ought to be able to predict the form of mis-specification one would expect to find if the rival model were estimated.
To pursue the last point, suppose a data series $\{Y_t\}$ is well characterized by a random walk in logarithms with a stable drift and homoskedastic errors. Suppose further that this implies that regressing $\Delta Y_t$ on a constant would yield unstable estimates and heteroskedastic errors.
A simple initial test would then be to estimate the random walk in both logarithms and levels and see whether the models displayed the predicted behaviour.²⁰ If the null model also had predictions to offer about the form of the instability of the parameters, the test could be sharpened by testing for the presence of particular kinds of mis-specification (say, drift or variances of errors increasing exponentially over time). In general, the entire argument should also be run in reverse by taking the rival model as the null; however, linear models do not ensure positive observations, so awkward issues arise.
20 The processes corresponding to 'random walk in logarithms' and 'random walk in levels' are $\Delta y_t = \mu_1 + \varepsilon_t$ and $\Delta Y_t = \mu_2 + v_t$, respectively.
We illustrate this discussion with the time series analysed in Chapter 1, namely real net national product ($Y$, in 1929 £million) for the United Kingdom over 1872-1975 (from Friedman and Schwartz 1982). The approach follows that in Ermini and Hendry (1991).
First, we model the level of net national product over the sample 1875-1975 by OLS. Only one lagged difference was needed to remove any residual serial correlation, yielding
where the standard errors of coefficient estimates are shown in parentheses, $\hat{\sigma}$ is the equation standard error, and SC is the Schwarz criterion. (Smaller values on balance produce preferable models.) Since the mean of $Y$ is 4701.0, $\hat{\sigma}$ as a percentage of $Y$ is 3.1 per cent. However, the coefficients are not constant over the sample period, as shown in Fig. 6.1 for the intercept, and Fig. 6.2 for the one-step residuals and $\hat{\sigma}$. (See Hendry (1989) for details.)²¹ The intercept trends upwards, and $\hat{\sigma}$ increases over time, even ignoring the large shock in 1919-20. On any constancy test, the model is rejected at far beyond the 1 per cent level (e.g. that of Hansen 1992).
Next we model growth in logs. As before, one lagged difference removed residual serial correlation, giving
21 Recursive estimation involves estimating an equation over successively larger sub-samples, starting from a minimum sub-sample and extending to the full sample. Parameter instability may be tracked by looking at the behaviour of the estimated coefficients, as sample size is increased, to see whether they fluctuate significantly or remain stable. Recursive Chow (1960) tests may be computed in at least two ways. The first involves estimating the equation from, say, $t = 1$ to $t = T_1$, where $T_1$ is greater than the minimum sample size, and then from $t = 1$ to $t = T_1 + 1$. The one-step-ahead Chow test is based on a comparison of the residual variance of the two estimated equations and is an F-test under the null of parameter constancy. A second test is given by estimating the equation from, say, $t = 1$ to $t = T_1$ and comparing the residual variance of this regression with that of the equation estimated over the full sample. A sequence of these Chow tests is built up by augmenting the sub-sample size by one at each step, e.g. $T_1 + 1$ to $T_1 + 2$, and comparing the residual variance of each of these equations with the full sample residual variance. Alternatively, the sequence of one-step residuals (or forecast errors) can be examined relative to the residual variance at each sample size.
FIG 6.1. Recursive estimates of intercept in levels model
FIG 6.2. One-step residuals in levels model
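The recursive estimates described in footnote 21 can be sketched in a few lines. The example below does not use the UK data; it assumes a simulated series that is a random walk with drift in logarithms (the drift, error standard deviation, starting value, and minimum sample of 20 observations are all illustrative assumptions), and then tracks the recursively estimated intercept when the first difference is regressed on a constant, once in logarithms and once in levels.

```python
import numpy as np

def recursive_ols(y, X, min_obs):
    """OLS coefficients using the first t observations, for t = min_obs, ..., T."""
    out = []
    for t in range(min_obs, len(y) + 1):
        beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        out.append(beta)
    return np.array(out)

rng = np.random.default_rng(1)
T, mu, sigma = 100, 0.02, 0.03          # illustrative values, not estimates from the text

# Log level: random walk with drift; Y is the corresponding level series
y = np.log(1000.0) + np.cumsum(mu + sigma * rng.standard_normal(T + 1))
Y = np.exp(y)

const = np.ones((T, 1))                  # regressor matrix: intercept only
b_logs = recursive_ols(np.diff(y), const, min_obs=20)[:, 0]
b_levels = recursive_ols(np.diff(Y), const, min_obs=20)[:, 0]

print("log model, intercept at t=20 and t=T:  ", b_logs[0], b_logs[-1])
print("level model, intercept at t=20 and t=T:", b_levels[0], b_levels[-1])
```

Plotting `b_logs` and `b_levels` against the sample end point gives pictures of the kind shown in the recursive-intercept figures. With an intercept as the only regressor, each recursive estimate is simply the running mean of the differenced series, so the sketch isolates the effect of the transformation from any dynamics.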
The percentage $\hat{\sigma}$ is 3.3 per cent but now the intercept is constant as shown in Fig. 6.3, and little residual heteroskedasticity remains (see Fig. 6.4). The model fails constancy tests only prior to the large shock in 1919-20.
FIG 6.3. Recursive estimates of intercept in log model
Ermini and Hendry use results from Ermini and Granger (1991) to describe the particular form of instability and heteroskedasticity one would expect in the model in levels if the data were generated by the logarithmic model. Ermini and Granger show that, if the data are generated by
with time-invariant distribution $\Delta y_t \sim \mathrm{IN}(\mu, \sigma^2)$, and if the rival model is
then $E(\Delta Y_t) = \delta\exp(\lambda t)$; $\mathrm{var}(\Delta Y_t) = \theta\exp[(2\lambda + \sigma^2)t]$, where $\theta = \exp(2y_0)\{1 - 2\exp[-(\lambda + \sigma^2)] + \exp[-(2\lambda + \sigma^2)]\}$; and $Y_0 = \exp(y_0)$ is the starting observation postulated for the model in levels.
Thus, if the logarithmic model were true, the model in levels would have both a drift, $\delta\exp(\lambda t)$, and variance, $\theta\exp[(2\lambda + \sigma^2)t]$, exponentially increasing with time. Further, in the regression
with $\hat{\lambda} = \hat{\mu} + \hat{\sigma}^2/2$, where $\hat{\mu}$ and $\hat{\sigma}^2$ are obtained from estimating (55), $M_1$ (the logarithmic model) encompasses $M_2$ (model in levels) only if $\delta \neq 0$ and $\gamma = 0$.
We now apply their test to the linear model of UK national income over the last century. Because of the lagged dependent variable in (54), the long-run solution provides the estimate $\hat{\mu}$ for $\mu$ in the Ermini-Hendry test, namely
Thus, $\hat{\lambda}t = 0.0191t$; calculate $\exp(\hat{\lambda}t)$ and enter this as an additional regressor in the linear model. The empirical outcome is
The coefficient on $\exp(\hat{\lambda}t)$ is significant and makes the intercept insignificant. This result confirms the earlier graphical evidence on the inappropriateness of the linear model against a log-linear form. Finally, dropping the intercept in (57),
FIG 6.4. One-step residuals in log model
Figure 6.5 shows the recursive estimates of $\delta$ from (58) over the sample, and reveals greatly reduced evidence of parameter non-constancy in using the exponential trend relative to an intercept.
These principles may be extended to deciding whether it is the levels or the logarithms of variables that are co-integrated. Thus, consider two I(1) processes $X_t > 0$ and $Z_t > 0$ between which there is a co-integrating relationship in levels:
$X_t = \beta Z_t + u_t$. (59)
Defining the transformed series $x_t = \log(X_t)$ and $z_t = \log(Z_t)$, we have
$x_t = \log\beta + z_t + \log[1 + u_t/(\beta Z_t)]$.
Using a Taylor series expansion of the logarithmic function, we obtain
$x_t = \log\beta + z_t + \sum_{j=1}^{\infty} (-1)^{j+1} j^{-1} [u_t/(\beta Z_t)]^j$,
from which we can see that the terms in the summation will decline in importance as $Z_t$ grows, since by (59) $u_t$ is of fixed variance, while the variance of $Z_t$ is of $O(t)$. Hence we expect to find an equilibrium relation of some sort among the logarithms of variables that are co-integrated in levels. Asymptotically, this equilibrium relation is of a degenerate kind with the distribution of $x_t - z_t$ collapsing around $\log(\beta)$. This is also a testable prediction of the hypothesis that the random walk model in levels encompasses the logarithmic model,²² although the test is likely to have low power because the variance in the errors is likely to persist even in fairly large samples.
Conversely, if we begin with a co-integrating relationship between two series which have already been transformed to logarithms, then the relationship among the levels of the series is
which implies
22 To see this, simply substitute $X_{t-1}$ for $Z_t$. The instability of the random walk model in levels made a formal test in the levels → logarithms direction unnecessary in the Ermini-Hendry discussion, although in principle such a test could be carried out.
FIG 6.5. Recursive estimates of $\delta$
or
This no longer has the form of a standard co-integrating relationship, since $W_t - kV_t = V_t(V_t^{\theta - 1}\nu_t - k) = \eta_t$; while $\nu_t$ may remain a stationary process, the error term $\eta_t$ in the new relationship depends on the integrated series $V_t$ and is therefore not stationary in general. No co-integrating relationship may therefore appear, and a regression of the form $W_t = kV_t + \eta_t$ is likely to display considerable instability.
At the same time, it should be noted that, in either of the above examples, only one of the logarithm and the level of a variable will be an integrated process (capable of being made stationary by differencing), although stationarity or non-stationarity will be common to both representations. The standard definition of co-integration, which describes equilibrium relations among integrated processes, can be legitimately applied to only one of the two cases at a time.
The fact remains, however, that a co-integrating relationship among the levels of variables suggests the existence of some linear equilibrium relationship among the logarithms of those same variables. The converse need not in general be true.
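The first half of this argument, that co-integration in levels leaves a degenerate relation in logarithms, can be illustrated by simulation. The sketch below assumes that the levels relation takes the form $X_t = \beta Z_t + u_t$ with $u_t$ white noise of unit variance, and that $Z_t$ is a random walk with drift started well above zero so that both series stay positive; the parameter values are illustrative only. It compares the dispersion of $x_t - z_t - \log\beta$ early and late in the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta = 2000, 2.0                       # illustrative sample size and coefficient

# Z_t: positive I(1) series (random walk with drift, started at 100)
Z = 100.0 + np.cumsum(1.0 + rng.standard_normal(T))
u = rng.standard_normal(T)                # equilibrium error with fixed variance
X = beta * Z + u                          # assumed co-integrating relation in levels

# Deviation of x_t - z_t from log(beta); equals log(1 + u_t / (beta * Z_t))
gap = np.log(X) - np.log(Z) - np.log(beta)

sd_early = gap[:200].std()
sd_late = gap[-200:].std()
print(f"sd of gap, first 200 observations: {sd_early:.5f}")
print(f"sd of gap, last 200 observations:  {sd_late:.5f}")
```

The dispersion of the log deviation shrinks as $Z_t$ grows, which is the collapsing distribution referred to in the text. Running the reverse experiment (a stationary error added in logarithms, then exponentiated) is a natural extension for seeing why the converse can fail.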
Appendix: Vector Brownian Motion
Consider the bivariate I(1) data generation process given by:
The DGP in (A1) is a re-parameterization of a general bivariate normal distribution for $(\Delta y_t, \Delta z_t)$ with covariance $\eta\sigma_2^2$ and defines the integrated vector process:
when $x_t = (y_t : z_t)'$ and $v_t = (\varepsilon_{1t} + \eta\varepsilon_{2t}, \varepsilon_{2t})'$. Then $v_t$ has non-unit error variance matrix $\Sigma$:
As in Chapter 1, a suitably scaled function of $x_t$ converges to a vector Brownian motion process, denoted BM($\Sigma$). We first derive the standardized Brownian motion by the transform:
and $s = \sigma_1/\sigma_2$. Then $m_t$ has a unit error variance matrix since:
Alternatively, from (A2) and (A4):
(A6)
Next, using a component by component analysis similar to that in Chapter 3, from (A5):
where $B(r) = (B_1(r), B_2(r))'$ (denoted BM($I$)), and the $B_i(r)$ are the standardized Wiener processes associated with accumulating the $\{\varepsilon_{it}\}$. Further:
These vector formulae are natural generalizations of the scalar Wiener processes in Chapter 3.
Scalar functions of vector I(1) variables can be handled as follows. Consider the distribution of the difference between $y_t$ and $z_t$, namely $u_t = d'x_t$ for $d' = (1, -1)$. Then from (A4):
(A10)
By direct calculation from (A1) however,
and $W(r)$ is the Wiener process associated with $\{w_t/\sigma_w\}$. By definition, $w_t = \varepsilon_{1t} + (\eta - 1)\varepsilon_{2t}$, so that $\sigma_w W(r) = \sigma_1 B_1(r) + (\eta - 1)\sigma_2 B_2(r)$, and hence the expressions in (A10) and (A11) are equal, but provide different insights into the behaviour of the scalar second moment.
Similarly, let $f' = (1, 0)$ so that $f'e_t = \varepsilon_{1t}/\sigma_1$, then we can derive a covariance such as:
Returning to the standardized vector Brownian motion, let $V(r) = (V_1(r), V_2(r))'$ (which is BM($\Sigma$)) be associated with the accumulation of $\{v_t\}$. Now $V_1(r)$ and $V_2(r)$ are not independent since $E(v_{1t}v_{2t}) \neq 0$. The standardized vector Brownian motion is $B(r) = K'V(r)$ where $K'$ is defined in (A4). Multiplying out, we have:
$B_1(r) = \sigma_1^{-1}[V_1(r) - \eta V_2(r)]$ and $B_2(r) = \sigma_2^{-1}V_2(r)$. (A13)
Indeed, if we condition $v_{1t}$ on $v_{2t}$ (which generates $\varepsilon_{1t}$) and let $V_{1\cdot 2}(r)$ be the associated "conditional" unstandardized Wiener process, then $V_{1\cdot 2}(r)$ and $V_2(r)$ are independent. Because $\varepsilon_{1t} = v_{1t} - E(v_{1t} \mid v_{2t}) = v_{1t} - \eta v_{2t}$, we see that $V_{1\cdot 2}(r) = V_1(r) - \eta V_2(r) = \sigma_1 B_1(r)$ from (A13).
Finally, consider an expression of the form:
Then the error covariance matrix is added on if the cross-product under analysis is a contemporaneous rather than a lagged one (see the appendix to Chapter 7 for an extension). Phillips and Durlauf (1986) and Phillips (1988b) provide proofs and generalizations.
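The standardization used in this appendix can be checked numerically. The sketch below assumes that $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent with variances $\sigma_1^2$ and $\sigma_2^2$, builds the implied variance matrix of $v_t = (\varepsilon_{1t} + \eta\varepsilon_{2t}, \varepsilon_{2t})'$, and applies one transform that maps $v_t$ into $(\varepsilon_{1t}/\sigma_1, \varepsilon_{2t}/\sigma_2)'$. The particular matrix `K_T` and the numerical values are assumptions made for illustration; the matrix is consistent with the stated requirement of a unit error variance matrix but is not copied from (A4).

```python
import numpy as np

sigma1, sigma2, eta = 1.5, 0.8, 0.6       # illustrative parameter values

# Variance matrix of v_t = (e1 + eta*e2, e2)' when e1 and e2 are independent
Sigma = np.array([[sigma1**2 + eta**2 * sigma2**2, eta * sigma2**2],
                  [eta * sigma2**2,                sigma2**2]])

# Assumed standardizing transform: m_t = K' v_t = (e1/sigma1, e2/sigma2)'
K_T = np.array([[1.0 / sigma1, -eta / sigma1],
                [0.0,           1.0 / sigma2]])

standardized = K_T @ Sigma @ K_T.T        # variance matrix of m_t
print(standardized)

# Increment variance of u_t = y_t - z_t, i.e. of w_t = e1 + (eta - 1) * e2
d = np.array([1.0, -1.0])
var_w = d @ Sigma @ d
print(var_w, sigma1**2 + (eta - 1.0)**2 * sigma2**2)
```

The first printed matrix should be the identity up to rounding, and the two printed variances should agree, matching the definition $w_t = \varepsilon_{1t} + (\eta - 1)\varepsilon_{2t}$ given above.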
7
Co-integration in Individual Equations
We first examine methods of testing for co-integration via static regressions, and provide simulation estimates of the upper percentage points of the distributions of statistics used in the tests. Next, we look at the properties of the estimators derived from such static regressions. In particular, we focus on the finite-sample biases in the estimates of co-integrating vectors and the powers of tests to detect co-integration. Finally, we consider modified estimators and dynamic models. In Chapter 8, systems methods of estimating co-integrating relations will be considered.
The previous chapter focused on the properties of co-integrated processes and the implications of modelling with co-integrated variables. We have discussed the 'super-consistency' of the coefficient estimates in the static or co-integrating regression, balanced and unbalanced regressions, and the distributions of the statistics commonly used to test for the significance of regression coefficients.
The two issues of being able to test for the existence of an equilibrium relationship among variables and to estimate such a relationship accurately are complementary. Indeed, as demonstrated in discussing spurious regressions in Chapter 3, static regressions among integrated series are meaningful if and only if they involve co-integrated variables. Thus, it is of interest to discover, first, how well the most frequently used tests of co-integration perform, and second, how accurately the corresponding equilibrium relationship is estimated.
The objective of this chapter is to develop tests applicable to single equations which may be used to detect a long-term relationship of the form discussed and exploited in earlier chapters. We also attempt to formulate some recommendations for efficient estimation of co-integrating parameters and testing for co-integration in finite samples. It will become clear from the discussion that the asymptotic properties of static regression estimators are often rather different from their behaviour in empirically relevant sample sizes. Further, lack of weak exogeneity due to co-integrating vectors entering several equations also alters finite sample behaviour. It therefore becomes important, in the face of data limitations, to consider alternative methods which do not rely exclusively on single-equation static regressions. These are the topic of Sections 7-9.
7.1. Estimating a Single Co-integrating Vector
Consider the problem of estimating the single co-integrating vector $\alpha$ using the static model
We conduct the discussion in this and the following sections in three stages. First, we elaborate upon the theorems presented in Chapter 5 and develop an intuitive discussion of static regressions. Next, we proceed to the issue of testing for co-integration using static regressions. The testing and the parameterization of the equilibrium relationship are seen to be complementary exercises. Finally, we discuss simulation studies which cast light on the behaviour, in finite samples, of the static-regression estimators and the powers of the tests for co-integration.
In order to keep the analysis as tractable as possible, we will restrict ourselves to considering CI(1,1) systems. Thus, suppose that all the elements in $x_t$ are I(1). In general, then, any linear combination $\delta'x_t$ of the elements of $x_t$ will produce an I(1) series $u_t$. The only exception, if one exists, is a co-integrating vector $\alpha$ such that $\alpha'x_t$ is I(0).¹ Ordinary least squares minimizes the residual variance of the fitted linear combination of the elements of $x_t$, and therefore a simple OLS regression of the form (1) should provide an excellent approximation to the true co-integrating vector when one exists, as discussed in Chapter 5.
The simplicity of this method and the elegance of the theoretical argument help explain the popularity of such regressions. All that is needed to parameterize a long-run equilibrium relationship among a set of variables is a static OLS regression. This regression is performed as the first step of the Engle-Granger two-step estimator² and serves as a preliminary check on the equilibrium relationships postulated by economic theory to exist among the variables.
1 Initially we focus on the case where (apart from normalization) the co-integrating vector $\alpha$ is unique and is therefore of dimension $n \times 1$. As the analysis in Ch. 5 showed (especially the discussion of the Granger Representation Theorem), this is clearly a restrictive assumption to make. In general, there will exist $r$ co-integrating vectors, $0 \le r \le n - 1$, and when gathered in an array, the matrix $\alpha$ will be of order $n \times r$. The problem of estimating co-integrating vectors in systems is considered in Ch. 8.
2 The two-step estimator and its asymptotic properties are discussed in Ch. 5. The general case is derived by Engle and Granger (1987: 262, Theorem 2).
However, there are reasons for preferring alternatives to the simple static regression in samples of the size typical in economics. This chapter will consider dynamic regression methods and modified estimators. These techniques help to reduce or eliminate sources of finite-sample biases which arise from static estimation, and which can be very substantial in practice.
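The static regression of this section can be sketched as a short simulation. The example assumes a bivariate system in which $z_t$ is a Gaussian random walk and $y_t = \beta z_t + u_t$ with a stationary AR(1) error $u_t$ that is independent of $z_t$; these choices, the parameter values, and the function names are illustrative assumptions. In particular, independence between $u_t$ and $z_t$ switches off some of the finite-sample bias sources mentioned above, so the sketch shows only how quickly the static OLS slope approaches $\beta$ as the sample grows.

```python
import numpy as np

def static_ols_slope(T, beta, rho, rng):
    """Slope from the static regression of y_t on a constant and z_t."""
    z = np.cumsum(rng.standard_normal(T))        # I(1) regressor
    e = rng.standard_normal(T)
    u = np.empty(T)
    u[0] = e[0]
    for t in range(1, T):                        # stationary AR(1) equilibrium error
        u[t] = rho * u[t - 1] + e[t]
    y = beta * z + u                             # co-integrated with vector (1, -beta)
    X = np.column_stack([np.ones(T), z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def mean_abs_error(T, reps=500, beta=1.0, rho=0.5, seed=0):
    """Monte Carlo mean absolute estimation error of the static OLS slope."""
    rng = np.random.default_rng(seed)
    errors = [abs(static_ols_slope(T, beta, rho, rng) - beta) for _ in range(reps)]
    return float(np.mean(errors))

mae = {T: mean_abs_error(T) for T in (50, 200, 800)}
for T, value in mae.items():
    print(f"T = {T:4d}: mean |slope error| = {value:.4f}")
```

Under a conventional root-T rate, multiplying the sample size by 16 would cut the mean absolute error by a factor of about 4; a markedly larger reduction is what super-consistency suggests. Introducing correlation between the innovations of $u_t$ and $z_t$ is a simple way to explore the finite-sample biases discussed in the remainder of the chapter.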
7.2. Tests for Co-integration in a Single Equation
The simplest tests for co-integration, proposed by Engle and Granger, test for the existence of a unit root in the residuals of the static regression. The methods of Chapter 4 can therefore be followed with minor modifications. We first consider the bivariate case, where $x_t = (y_t, z_t)'$.
The modifications are necessary because, while the tests for unit roots discussed in Chapter 4 use the original series, say $\{w_t\}$, the co-integration tests are based on the estimated, or derived, residual series,
$$\hat{u}_t = y_t - \hat{\beta} z_t.$$
Hence, as the co-integrating regression estimates $\beta$ before the test is performed, the co-integration test is not simply a standard test for a unit root in the series $u_t$.
If $\beta$ were known in the example presented in Chapter 5 (given by equations (5.1)-(5.6)), the null hypothesis of no co-integration, corresponding to $\rho$ equal to 1, could be tested by constructing the series $u_t = y_t - \beta z_t$, treating this series as the one that has the unit root under the null, and using the Dickey-Fuller tables. However, if $\beta$ is unknown, it must be estimated (e.g.) from the static regression of $y_t$ on $z_t$. The test is based on the null hypothesis of no co-integration, with the critical values for the test statistics calculated to ensure the appropriate probability of rejection of the null hypothesis.
Some of the most widely used tests of co-integration have been the co-integrating regression Durbin-Watson test (CRDW), the Dickey-Fuller test (DF), and the augmented Dickey-Fuller test (ADF).
The CRDW, suggested by Sargan and Bhargava (1983), is computed in exactly the same fashion as the usual DW statistic and is given by
$$\mathrm{CRDW} = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{T} \hat{u}_t^2},$$
where $\hat{u}_t$ denotes the OLS residual from the co-integrating regression.
The null hypothesis being tested, using the CRDW statistic, is of a single unit root: that is, $u_t$ is a random walk. This is to be contrasted
with the conventional use made of this statistic in standard regression analysis where the null of no first-order autocorrelation is tested.
The use of this statistic is problematic in the present setting. First, the test statistic for co-integration depends upon the number of regressors in the co-integrating equation and, more generally, on the data-generation process and hence on the precise data matrix. Only bounds on the critical values are available.³ Second, the bounds diverge as the number of regressors is increased, and eventually cease to have any practical value for the purposes of inference. Finally, the statistic assumes the null where $u_t$ is a random walk, and the alternative where $u_t$ is a stationary first-order autoregressive process. In such circumstances, Bhargava (1986) demonstrates that it has excellent power properties relative to alternative tests. However, the tabulated bounds are not correct if there is higher-order residual autocorrelation, as will commonly occur. Exact inference is therefore possible if and only if each regression exercise is augmented by the use of algorithms such as that of Imhof (1961) to compute the relevant critical values. In principle, it is possible for simulation methods to be used to compute the critical values. However, in practice this implies a proliferation of tables of different critical values for different data-generation processes and simulation exercises.
As we have argued previously, the only hope for uncomplicated inference lies in generating a robust set of critical values. Robustness is defined by lack of sensitivity of the critical values to a wide range of changes to the data-generation process. Tests that are similar for a wide range of nuisance parameters would ensure this non-sensitivity. In other words, it is important to have a set of tables that could be used regardless of the precise properties of the DGP, as long as the regression model is parameterized to satisfy certain basic properties such as balance. Tests of co-integration based not directly on the residuals but on the regression coefficients themselves might have higher power. As an alternative method, one could consider using non-parametric corrections of the sort described in Chapter 4 to conduct inference using only a small set of tables, for a range of possible data-generation processes. Examples of both these procedures will be presented in due course.
³ While the CRDW statistic does not have a limiting distribution with a non-zero variance, $T(\mathrm{CRDW}) = T^{-1}\sum_{t=2}^{T}(\hat{u}_t - \hat{u}_{t-1})^2 \big/ T^{-2}\sum_{t=1}^{T}\hat{u}_t^2$ does.
Similar qualifications apply to the use of the DF statistic and less so to the ADF, if the number of $\Delta \hat{u}_{t-i}$ terms appearing in the data-generation process coincides with those used in the implementation of the test. Since the number of such terms appearing in the DGP is unknown, it seems safest to over-specify the ADF regression, and use as many
lagged terms as degrees-of-freedom restrictions will allow. Of course, in practice, the choice of the lag structure in ADF tests may be ad hoc and different results can be obtained by changing the length of the autoregression. In particular, the power of the test may be affected adversely.
Table 7.1 provides, for illustration (a more detailed description of applicable critical values will be given below), the 5 per cent critical values of the DW, ADF(1), and ADF(4) tests, for three sample sizes ($T = 50, 100, 200$). The data-generation process is an $n$-variate random walk with $n$ less than or equal to 5, as in Engle and Yoo (1987).
It is important to emphasize that, in common with the tests for unit roots, tests for co-integration may lack power to discriminate between unit roots and borderline-stationary processes. In a small-scale study of the power properties of this test, Engle and Granger (1987) show that, when the data-generation process of the disturbances of the co-integrating equation is an AR(1) process with the autoregressive parameter equal to 0.9, the powers of the CRDW, DF, and ADF tests at the 5 per cent critical values are 20, 15, and 11 per cent respectively. When the DGP is altered to be a more general AR(1) process with a unit root, the power of the ADF test becomes 60 per cent, dominating strongly both the powers of the CRDW and DF tests at the 5 per cent level.
Engle and Granger (1987) emphasize the robustness to changes in the data-generation process of the ADF critical values. The discussion in Chapter 4 helps to explain this result. Phillips and Ouliaris (1990) show that the limiting distribution of the ADF test statistic is the same as that of the non-parametrically adjusted DF statistic. Because the limiting distribution of the latter statistic is invariant to nuisance parameters in the processes generating the data series, the result follows. Each test manages to correct for various features that may be present in the DGP, in one case by capturing the effects in a regression model, in the other by implicitly adjusting the critical values.
Phillips and Ouliaris (1990) derive the distributions of several tests of co-integration. We close this section by presenting a summary of the theoretical results presented there. They consider the linear co-integrating regressions:
and
where $y_t$ and $z_t$ satisfy (multivariate) unit-root processes. The asymptotic distributions of a number of residual-based tests are discussed, from which we will consider five (this analysis is of course related to the
TABLE 7.1. Five per cent critical values for the co-integration tests

n    T      CRDW    ADF(1)    ADF(4)
2    50     0.72    -3.43     -3.29
2    100    0.38    -3.38     -3.17
2    200    0.20    -3.37     -3.25
3    50     0.89    -3.82     -3.75
3    100    0.48    -3.76     -3.62
3    200    0.25    -3.74     -3.78
4    50     1.05    -4.18     -3.98
4    100    0.58    -4.12     -4.02
4    200    0.30    -4.11     -4.13
5    50     1.19    -4.51     -4.15
5    100    0.68    -4.48     -4.36
5    200    0.35    -4.42     -4.43

Source: The CRDW critical values (see Sargan and Bhargava 1983) and the ADF(1) critical values were generated by PC-NAIVE using 10,000 replications. The ADF(4) critical values have been taken from Engle and Yoo (1987). The ADF critical values are computed by replicating the regression $\Delta \hat{u}_t = \rho \hat{u}_{t-1} + \sum_{i=1}^{k} \phi_i \Delta \hat{u}_{t-i} + v_t$ for $k = 1, 4$, following estimation of $\beta$ in (2) augmented by a constant.
analysis of unit-root tests found in Chapter 4):
(i) Dickey-Fuller $\rho$: $\mathrm{DF}(\hat{\rho}) = T\hat{\rho}$, where $\hat{\rho}$ is obtained from the regression $\Delta \hat{u}_t = \rho \hat{u}_{t-1} + \eta_t$;
(ii) Dickey-Fuller $t$ (DF): $\mathrm{DF}(t) = t_{\rho=0}$ in the regression $\Delta \hat{u}_t = \rho \hat{u}_{t-1} + \eta_t$;
(iii) augmented Dickey-Fuller (ADF): $\mathrm{ADF} = t_{\rho=0}$ in the regression $\Delta \hat{u}_t = \rho \hat{u}_{t-1} + \sum_{i=1}^{k} \phi_i \Delta \hat{u}_{t-i} + v_t$;
(iv) Phillips (1987a) $Z_\rho$:
$$Z_\rho = T\hat{\rho} - \tfrac{1}{2}\left(s_{T\ell}^2 - s_\eta^2\right)\left(T^{-2}\sum_{t=2}^{T}\hat{u}_{t-1}^2\right)^{-1},$$
where
$$s_\eta^2 = T^{-1}\sum_{t=1}^{T}\hat{\eta}_t^2, \qquad s_{T\ell}^2 = T^{-1}\sum_{t=1}^{T}\hat{\eta}_t^2 + 2T^{-1}\sum_{j=1}^{\ell}\omega_\ell(j)\sum_{t=j+1}^{T}\hat{\eta}_t\hat{\eta}_{t-j},$$
where $\omega_\ell(j) = 1 - j(\ell + 1)^{-1}$ for some choice of lag window $\ell$, and $\hat{\rho}$ and the $\hat{\eta}_t$ are derived from the DF regression given in (i);
(v) Phillips (1987a) $Z_{t(\rho=0)}$:
$$Z_{t(\rho=0)} = \left(\sum_{t=2}^{T}\hat{u}_{t-1}^2\right)^{1/2}\frac{\hat{\rho}}{s_{T\ell}} - \tfrac{1}{2}\left(s_{T\ell}^2 - s_\eta^2\right)\left[s_{T\ell}\left(T^{-2}\sum_{t=2}^{T}\hat{u}_{t-1}^2\right)^{1/2}\right]^{-1},$$
with $s_{T\ell}^2$ and $s_\eta^2$ as in (iv), and $\hat{\rho}$ and the $\hat{\eta}_t$ again derived from the DF regression given in (i).
Some properties of these tests may now be enumerated. First, under the maintained hypothesis of no co-integration, the distributions of $Z_\rho$ and $Z_{t(\rho=0)}$, for any general specification of the error process $\{\eta_t\}$, are the same as those of $\mathrm{DF}(\hat{\rho})$ and $\mathrm{DF}(t)$ respectively, when the distributions are computed under the restrictive assumption of IID errors. The distributions of $Z_\rho$ and $Z_{t(\rho=0)}$ are independent of nuisance parameters (leading to asymptotically similar tests), although they do depend on the number of regressors in the system; thus, the non-parametric corrections serve the same role in the context of co-integrating regressions as they do in unit-root tests: they eliminate nuisance parameters and enable the use of a standard set of Dickey-Fuller tables. Corrections must still be made for size in the original Dickey-Fuller tables to prevent over-rejection of the null hypothesis. Some of the tables appear in Phillips and Ouliaris (1990).
Second, the ADF test and $Z_{t(\rho=0)}$ have the same asymptotic distribution. This is an interesting result because it re-emphasizes the two alternative but equivalent ways of taking account of nuisance parameters. In order to use a standard set of tables, one either augments the Dickey-Fuller regression or adjusts, non-parametrically, the unaugmented Dickey-Fuller statistic.
Third, if the statistics are based on a regression with a fitted intercept or time trend, the interpretation of the tests is not altered although the asymptotic critical values change. This issue is considered in more detail in the next section.
Fourth, if the non-parametrically adjusted statistics were constructed by imposing $\rho = 0$ and therefore using the $\hat{v}_t$, where $\hat{v}_t = \hat{u}_t - \hat{u}_{t-1}$, in the test statistic instead of the $\hat{\eta}_t$, the statistics would have the same asymptotic distribution under the null; however, as shown by Phillips and Ouliaris (1990), these would have inferior power properties.
Finally, an alternative class of tests of co-integration not based on regression residuals has been proposed in the literature. Prominent among the tests are those due to Johansen (1988) and Stock and Watson (1988b). These tests also apply to multivariate systems of equations and have their most natural uses when investigating multiple co-integrating
vectors. A discussion of the Johansen maximum likelihood procedure appears in the next chapter.
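The residual-based statistics of this section can be computed directly from a residual series. The sketch below is one possible implementation, not the authors' code. It assumes the standard forms of the statistics: the CRDW as the Durbin-Watson ratio of the residuals, and the (A)DF statistic as the $t$-ratio on the lagged level in a regression without a constant. The function names `crdw` and `adf_t` are hypothetical. The resulting numbers must be compared with co-integration critical values such as those in Table 7.1, not with standard tables.

```python
import numpy as np

def crdw(u):
    """Durbin-Watson ratio of a residual series: sum(du^2) / sum(u^2)."""
    du = np.diff(u)
    return float(du @ du / (u @ u))

def adf_t(u, k=0):
    """t-ratio on rho in  du_t = rho*u_{t-1} + sum_{i=1..k} phi_i*du_{t-i} + v_t.

    k = 0 gives the unaugmented DF t-statistic; no constant is included.
    """
    du = np.diff(u)
    dep = du[k:]                                   # du_t
    cols = [u[k:-1]]                               # u_{t-1}
    cols += [du[k - i:len(du) - i] for i in range(1, k + 1)]   # du_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ coef
    s2 = resid @ resid / (len(dep) - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix
    return float(coef[0] / np.sqrt(cov[0, 0]))

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.standard_normal(500))  # behaves like no co-integration
white_noise = rng.standard_normal(500)             # behaves like a stationary error

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    print(name, round(crdw(series), 3), round(adf_t(series), 2), round(adf_t(series, k=4), 2))
```

For the random walk the CRDW is close to zero, while for the stationary series it is close to 2 and the DF and ADF $t$-ratios are strongly negative, which is the contrast the tests exploit.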
7.3. Response Surfaces for Critical Values
When compared with the corresponding critical values for unit-root tests given in Chapter 4, the critical values in Table 7.1 are illustrative of the changes in test levels implied by the presence of estimated parameters in the relationship yielding the series to be tested for stationarity. In themselves, however, they cover only a limited set of cases. Other tables are provided in Engle and Yoo (1987) and Phillips and Ouliaris (1990). MacKinnon (1991) provides results of a more extensive set of simulations, summarized in response surfaces: that is, critical values for particular tests are given as a set of parameters of an equation relating the exact critical value to a constant term and terms involving sample size, from which a critical value for any given sample size can be approximated. We will describe the latter results.
Dickey-Fuller (or augmented Dickey-Fuller) tests for unit roots or co-integration can be considered within a common framework. Consider $n$ time series given by $y_{1t}, y_{2t}, \ldots, y_{nt}$, $n \ge 1$, $t = 1, 2, \ldots, T$. If $n = 1$, we are testing for a unit root in a single series, and to establish a uniform notation, we define the time series under test as $\{u_t\}_{t=1}^{T} = \{y_{1t}\}_{t=1}^{T}$. If $n > 1$, we are first interested in obtaining a set of residuals from the estimated relationship among the $n$ variables, and so begin with the (static) co-integrating regression,⁴
$$y_{1t} = \sum_{j=2}^{n} \beta_j y_{jt} + u_t. \tag{6}$$
⁴ Below, the parameters $\delta_0$ and $\delta_1$ will denote coefficients on a constant and trend, respectively.
Let $y_t = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ be the vector of measurements at time $t$ on the $n$ variables. The series to be tested for stationarity then becomes $\hat{u}_t = [1 : -\hat{\beta}'] y_t$, where $\hat{\beta}$ is the vector of estimated parameters. Subject to the relevance of the normalized variable, the ordering of variables in the co-integrating regression will not affect the asymptotic distribution of the test statistic, although in finite samples the value will depend upon which variable is the regressand. The null hypothesis of no co-integration implies that $u_t$ is I(1).
We test this null using the tests considered in Chapter 4. In particular, the augmented Dickey-Fuller test takes the form of one of the following
models, with $\ell$ chosen to eliminate any autocorrelation in the residuals:
$$\Delta u_t = \rho u_{t-1} + \sum_{i=1}^{\ell} \psi_i \Delta u_{t-i} + \varepsilon_t \tag{7a}$$
$$\Delta u_t = \delta_0 + \rho u_{t-1} + \sum_{i=1}^{\ell} \psi_i \Delta u_{t-i} + \varepsilon_t \tag{7b}$$
$$\Delta u_t = \delta_0 + \delta_1 t + \rho u_{t-1} + \sum_{i=1}^{\ell} \psi_i \Delta u_{t-i} + \varepsilon_t \tag{7c}$$
For $n \ge 2$, so that a co-integrating regression precedes the use of one of these models,⁵ model (6) could also be used with constant and trend, adding $\delta_0$ or $\delta_0 + \delta_1 t$ to the regression. Co-integration tests either include a constant in (6), or include a constant in the regression model (7b). If a constant is added to (6) and model (7a) is used, the strategy is equivalent to omitting the constant term and using model (7b); if constant and trend are added to (6) and model (7a) is used, then this is equivalent to using model (7c), and so on. The model type referred to in Table 7.2 describes this presence or absence of constant and trend in the models. A test with constant but no trend, for example, implies model (7b) with no constant in the co-integrating regression (6), or a constant in (6) used with model (7a).
The critical values, or upper quantiles of the distributions, can be calculated from the parameters of Table 7.2 using the relation
$$C(p) = \phi_\infty + \phi_1 T^{-1} + \phi_2 T^{-2}, \tag{8}$$
where $C(p)$ is the $p$ per cent upper-quantile estimate. The parameters were estimated from regression over a set of individual simulation results covering, for most values of $n$, 40 sets of parameters for each of 15 sample sizes. Model (8) (with an added error term) was found to represent well the various critical values that emerged from the many individual experiments; but other models could in principle have been used to fit a response surface to the results; see MacKinnon (1991) for a description of the experimental technique, including the feasible generalized least squares technique by which estimation of the final response surface model was undertaken, to allow for heteroskedasticity in (8).
As an example, the estimated 1 per cent critical value for 150 observations, $n = 6$ and constant + trend included in the model is given
⁵ If $n \ge 2$ but the values of the parameters in $\beta$ are known, then the residuals $u_t = [1 : -\beta'] y_t$ can be constructed without a co-integrating regression, and the test statistic is interpreted as if $n$ were equal to unity. In this case we have one known series of observations to be tested for stationarity, not a series constructed on the basis of estimated parameters. Under the null of no co-integration, however, (6) is a spurious regression so $\hat{\beta}$ has a non-degenerate limiting distribution, which induces different critical values from DF tests.
TABLE 7.2. Response surfaces for critical values of co-integration tests

n   Model                    Point (%)   $\phi_\infty$   SE         $\phi_1$    $\phi_2$
1   No constant, no trend     1          -2.5658         (0.0023)    -1.960     -10.04
                              5          -1.9393         (0.0008)    -0.398       0.0
                             10          -1.6156         (0.0007)    -0.181       0.0
1   Constant, no trend        1          -3.4336         (0.0024)    -5.999     -29.25
                              5          -2.8621         (0.0011)    -2.738      -8.36
                             10          -2.5671         (0.0009)    -1.438      -4.48
1   Constant + trend          1          -3.9638         (0.0019)    -8.353     -47.44
                              5          -3.4126         (0.0012)    -4.039     -17.83
                             10          -3.1279         (0.0009)    -2.418      -7.58
2   Constant, no trend        1          -3.9001         (0.0022)   -10.534     -30.03
                              5          -3.3377         (0.0012)    -5.967      -8.98
                             10          -3.0462         (0.0009)    -4.069      -5.73
2   Constant + trend          1          -4.3266         (0.0022)   -15.531     -34.03
                              5          -3.7809         (0.0013)    -9.421     -15.06
                             10          -3.4959         (0.0009)    -7.203      -4.01
3   Constant, no trend        1          -4.2981         (0.0023)   -13.790     -46.37
                              5          -3.7429         (0.0012)    -8.352     -13.41
                             10          -3.4518         (0.0010)    -6.241      -2.79
3   Constant + trend          1          -4.6676         (0.0022)   -18.492     -49.35
                              5          -4.1193         (0.0011)   -12.024     -13.13
                             10          -3.8344         (0.0009)    -9.188      -4.85
4   Constant, no trend        1          -4.6493         (0.0023)   -17.188     -59.20
                              5          -4.1000         (0.0012)   -10.745     -21.57
                             10          -3.8110         (0.0009)    -8.317      -5.19
4   Constant + trend          1          -4.9695         (0.0021)   -22.504     -50.22
                              5          -4.4294         (0.0012)   -14.501     -19.54
                             10          -4.1474         (0.0010)   -11.165      -9.88
5   Constant, no trend        1          -4.9587         (0.0026)   -22.140     -37.29
                              5          -4.4185         (0.0013)   -13.641     -21.16
                             10          -4.1327         (0.0009)   -10.638      -5.48
5   Constant + trend          1          -5.2497         (0.0024)   -26.606     -49.56
                              5          -4.7154         (0.0013)   -17.432     -16.50
                             10          -4.4345         (0.0010)   -13.654      -5.77
6   Constant, no trend        1          -5.2400         (0.0029)   -26.278     -41.65
                              5          -4.7048         (0.0018)   -17.120     -11.17
                             10          -4.4242         (0.0010)   -13.347       0.0
6   Constant + trend          1          -5.5127         (0.0033)   -30.735     -52.50
                              5          -4.9767         (0.0017)   -20.883      -9.05
                             10          -4.6999         (0.0011)   -16.445       0.0

Source: MacKinnon (1991). We are grateful to James MacKinnon for permission to reproduce this table.
by $-5.5127 - 30.735/150 - 52.50/150^2 = -5.7199$. Estimated standard errors for finite-sample critical values such as this are generally less than those reported for $\hat{\phi}_\infty$, although MacKinnon argues that these may understate the true standard errors by roughly a factor of 2.
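The worked example can be reproduced in a few lines. The sketch assumes that the response-surface relation has the form $C(p) = \phi_\infty + \phi_1/T + \phi_2/T^2$, which is what the arithmetic of the example implies; the function name `response_surface_cv` is hypothetical.

```python
def response_surface_cv(phi_inf, phi_1, phi_2, T):
    """Approximate critical value C(p) = phi_inf + phi_1 / T + phi_2 / T**2."""
    return phi_inf + phi_1 / T + phi_2 / T**2

# Row of Table 7.2: n = 6, constant + trend, 1 per cent point.
cv = response_surface_cv(-5.5127, -30.735, -52.50, T=150)
print(round(cv, 4))   # -5.7199
```

As $T$ grows the two correction terms vanish and the value tends to $\phi_\infty$, so the first coefficient column of the table can be read as the asymptotic critical value.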
7.4. Finite-sample Biases in OLS Estimates
In the next chapter we will consider systems estimation of co-integrating vectors. Here, we will examine one of the main reasons for using such an estimation strategy: the large finite-sample biases that can arise in static OLS estimates of co-integrating vectors or parameters. While such estimates are super-consistent ($T$-consistent), Monte Carlo experiments nonetheless suggest that a large number of observations may be necessary before the biases become small (see Banerjee et al., 1986).
Some investigators have suggested that we may explain the findings of such Monte Carlo studies by the fact that the particular data-generation processes considered were too specific, or possessed some special properties, which meant that the probability of finding large biases was unusually high. This point is partly valid, in that each DGP can be regarded as specific in some way. Moreover, it is certainly true that, with sufficient patience, the exact expressions for the static biases could be worked out for any data-generation process, as functions of the parameters of the DGP. However, the point is not that some of these data-generation processes are more likely to lead to high biases while others will give lower values, but rather that, in the absence of information on the data-generation process, some method other than static regression may give superior estimates of the co-integrating vector or tests with higher powers. In particular, dynamic regressions may be more robust to a range of data-generation processes. Even where static regressions behave poorly in finite samples, dynamic regressions may provide us with quite good estimates. Since the investigator is in general unaware of the particular properties of the data-generation process (such as whether it will tend to lead to low biases or high biases), it makes sense to allow the regression to be as flexible as possible. Robustness in the sense of adequate performance for a wide range of underlying DGPs is an important property.
Most of the evidence that has been presented in favour of the existence of finite-sample biases has come in the form of Monte Carlo experiments; we present two investigations of the bias properties of OLS estimators. By specifying the data-generation process, Monte Carlo experiments provide complete knowledge and control of the features of interest; in particular, in the present case, we know the co-integrating
parameter. Performing regressions on the artificial data, while notionally ignoring the data-generation process, puts us in the position of the empirical investigator; however, we are then able to compare our results with the true parameters, for a set of chosen example cases.
The first experiment considered uses the data-generation process
The vector $(\varepsilon_{1t}, \varepsilon_{2t})'$ is distributed identically and independently as a bivariate normal with
The structure of the DGP is the same as that of Engle and Granger (1987). Three cases of interest may be distinguished. In case A, $|\rho_1| < 1$, $|\rho_2| < 1$ so that both $z$ and $y$ are I(0) variables. In case B, $\rho_1 = \rho_2 = 1$ so that both variables are I(1) and are not co-integrated. We will concentrate on case C, where $\rho_1 = 1$, $|\rho_2| < 1$, so that the variables are still I(1) but are now co-integrated. In this last case, the co-integrating coefficient is -2.
For case C, the null hypothesis of a unit root in the error dynamics in (10) is false. Interest therefore lies in investigating the usefulness of the estimate of the co-integrating parameter in the static regression of $y_t$ on $z_t$ and also in checking the ability of unit-root tests to reject the false null of non-stationarity. We use 5000 replications on the parameter space $s \times T \times \rho_2$; $s = \sigma_1/\sigma_2 = (16, 8, 4, 2, 1, \tfrac{1}{2})$, $T = (25, 50, 100, 200)$, and $\rho_2 = (0.6, 0.8, 0.9)$, giving rise to 72 experiments. The range of the ratio of standard deviations, the significance of which we will describe below, is very large. Obviously, it would be difficult to distinguish between $\sigma_u^2$ and $\sigma_v^2$ when $\sigma_1$ and $T$ are small and $\sigma_2$ and $\rho_2$ are large; for large values of $s$, OLS essentially picks up equation (9) instead of equation (10).
The problem of finite-sample biases is illustrated in the figures. Figures 7.1(a)-7.4(a) refer to the simplest form of static model which contains no constant, while Figs. 7.1(b)-7.4(b) pertain to static models which do contain constant terms. The figures show the relationship between bias and sample size for four different values of the ratio of standard deviations. The horizontal scale is implicitly $\log_2(T/25)$ so that the four points shown are equidistant.
First of all, it is evident that the bias does not decline at rate $T$. For example, in Fig. 7.4(a) ($\sigma_1/\sigma_2 = 0.5$), with $\rho_2 = 0.6$, the bias at $T = 25$ is 0.45, at $T = 50$ is 0.32, at $T = 100$ is 0.21, and at $T = 200$ is 0.13. Thus, an eightfold increase in sample size reduces the bias by a factor of approximately 3.5. As another example, we see in Fig. 7.2(a) ($\sigma_1/\sigma_2 = 4$), with $\rho_2 = 0.6$, the biases at the same set of sample sizes are 0.017, 0.010, 0.005, 0.0026.⁶ Here an eightfold increase in sample size reduces the bias by a factor of 6.5. Using a standard-deviation ratio of 4 again but a value of $\rho_2 = 0.9$, the biases are 0.04, 0.024, 0.014, and 0.008, a fivefold decrease in bias. The rate of decline of the bias is always faster than $T^{1/2}$ but not as fast as $T$ for sample sizes up to 200.
⁶ These numbers are taken from the experimental output rather than read from the figures. The standard error of the smallest of these numbers is roughly $5 \times 10^{-5}$.
FIG. 7.1(a). No constant in model, estimated bias v. sample size, s = 16
FIG. 7.1(b). Constant in model, estimated bias v. sample size, s = 16
FIG. 7.2(a). No constant in model, estimated bias v. sample size, s = 4
FIG. 7.2(b). Constant in model, estimated bias v. sample size, s = 4
FIG. 7.3(a). No constant in model, estimated bias v. sample size, s = 1
FIG. 7.3(b). Constant in model, estimated bias v. sample size, s = 1
FIG. 7.4(a). No constant in model, estimated bias v. sample size, s = 0.5
FIG. 7.4(b). Constant in model, estimated bias v. sample size, s = 0.5
Second, the biases increase uniformly in $\rho_2$ and decrease uniformly in $\sigma_1/\sigma_2$. To understand this, we can rewrite (9) and (10) to get
Since $\rho_1 = 1$, $\{v_t\}$ is a random walk and therefore asymptotically dominates $\{u_t\}$. Hence the co-integrating parameter of -2. In finite samples the regression will come closer to revealing this long-run relationship if the variance of $u_t$ is small relative to that of $v_t$. Recall
that by equation (10), $u_t$ is the discrepancy from this long-run relationship. Smaller values of $\rho_2$ and smaller values of $\sigma_2$ (larger values of $\sigma_1/\sigma_2$) make the variance of $u_t$ relatively small, and so we obtain smaller biases as $\rho_2$ falls or as $\sigma_1/\sigma_2$ rises.
The fact that these biases do disappear less quickly than $T$, and may remain substantial for sample sizes large relative to many found in economics, suggests that the results from pure static models must be treated with caution. We will later examine ways in which we can improve upon simple static estimation either by including dynamic elements, adjusting the results of the static model, or estimating a system of equations.
Finally, the biases are strongly positively correlated with $(1 - R^2)$, which indicates that co-integrating regressions with values of $R^2$ well below unity should be viewed with caution.⁷ However, in the context of multivariate regressions, a high value of $R^2$ is not sufficient to guarantee that the biases are small. This is because the $R^2$ of an equation cannot fall when an additional variable is added to it. Thus, the inference that high values of the $R^2$ imply low biases, especially where the former may have been achieved by an ad hoc addition of regressors, is not valid. Banerjee et al. (1986) explore the relationship between bias and $(1 - R^2)$ in more detail.
It is useful to consider an informal explanation for the existence of biases in static regressions. The effect of using static regressions to estimate the co-integrating slope $\beta$ is to allow the residual $u_t$ to capture all the dynamic adjustment terms. According to the super-consistency theorem, this is certainly permissible asymptotically. It is important to emphasize that the problem we are discussing here is strictly a finite-sample one; the omission of the dynamics may be justified asymptotically by observing that, as they are of a lower order of magnitude than the non-stationary terms in the regression, they may be ignored in the limit. However, the omitted dynamics, despite being of a lower order of magnitude, can matter considerably in determining biases even in fairly large but finite samples.⁸ Hence it seems appropriate to pay attention to modelling the omitted terms.
⁷ We are grateful to Tom Rothenberg for pointing out that $R^2$ is a random variable in the present context. However, it remains a useful descriptive statistic.
⁸ The problem of finite-sample biases was also demonstrated by Hendry and Neale (1987). Using recursive procedures for OLS estimation, they estimated a bivariate static regression for sample sizes ranging from 40 to 200, considering the bias of the coefficient estimate for each sample size. The results indicated that, even for sample sizes of 200, the long-run coefficient from the static regression was approximately 0.7 while the true long-run coefficient was 1.0. Convergence to the true value was not nearly as fast in practice as $T^{-1}$ which dominates for sufficiently large $T$: see (18) below.
The dynamic terms can all be parameterized in terms of I(0) series of the form $\Delta y_{t-i}$, $\Delta z_{t-j}$, and $(y - \gamma z)_{t-k}$ where the values of $i$, $j$, and $k$
will depend upon the nature of the ARIMA process generating $\{y_t\}$ and $\{z_t\}$.⁹ Consider a simple model in which $\{z_t\}$ is strongly exogenous for the regression parameters and the true dynamic relationship, apart from deterministic components, is given by
where $\{y_t\}$ and $\{z_t\}$ are CI(1,1).¹⁰ The errors are mean zero, mutually and serially uncorrelated normal variates. The variances of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are denoted by $\sigma_1^2$ and $\sigma_2^2$ respectively. Suppose that economic theory suggests that, in the long run, the homogeneity restriction $\sum_{i=1}^{3} \gamma_i = 1$ holds. Equation (14a) can be rewritten as
or as
Now $(y - z)$ and $\Delta z$ must both be I(0) using the co-integration assumption, as is $\varepsilon_{1t}$. Hence, by estimating the static regression
the dynamics, given by $\Delta z_t$ and $(y - z)_{t-1}$, are all contained in the residual $u_t$; when $|\gamma_1| < 1$, $\beta = (\gamma_2 + \gamma_3)/(1 - \gamma_1)$. In general, $u_t$ will be serially correlated. Its long-run variance $\sigma^2$, which appears in the expressions for the Wiener distributional limits of the sample moments, is given by
where
It may then be shown that
Phillips (1986 ) show s that i t i s th e presenc e o f A in (18 ) tha t cause s th e biases. 9
Se e e.g. th e derivatio n o f the EC M representatio n i n Ch. 5 for CI(1 , 1 ) series. A simpl e rewritin g o f equatio n (10 ) above , t o tak e accoun t o f th e structur e o f th e residual autocorrelation , give s u s a versio n o f (14a ) wit h th e y ; suitabl y interpreted . Late r in thi s chapte r w e conside r a generalizatio n o f (14 ) an d investigat e th e consequence s o f using stati c an d dynami c regressions . 10
222 Co-integratio
n i n Individua l Equations
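The algebra behind the rewriting of (14a) can be traced in a few lines. The sketch below assumes that (14a) is the autoregressive distributed-lag equation $y_t = \gamma_1 y_{t-1} + \gamma_2 z_t + \gamma_3 z_{t-1} + \varepsilon_{1t}$. That form is our reading of the surrounding text (three coefficients $\gamma_i$, the homogeneity restriction, and the expression given for $\beta$) and is not a quotation of the displayed equations, so the lines should be taken as an illustration under that assumption.

```latex
\begin{align*}
y_t &= \gamma_1 y_{t-1} + \gamma_2 z_t + \gamma_3 z_{t-1} + \varepsilon_{1t}
      && \text{(assumed form of (14a))} \\
\Delta y_t &= \gamma_2 \Delta z_t + (\gamma_1 - 1)\, y_{t-1}
      + (\gamma_2 + \gamma_3)\, z_{t-1} + \varepsilon_{1t}
      && \text{(subtract } y_{t-1}\text{, add and subtract } \gamma_2 z_{t-1}\text{)} \\
&= \gamma_2 \Delta z_t + (\gamma_1 - 1)\,\bigl(y_{t-1} - \beta z_{t-1}\bigr) + \varepsilon_{1t},
      && \beta = \frac{\gamma_2 + \gamma_3}{1 - \gamma_1} \\
&= \gamma_2 \Delta z_t + (\gamma_1 - 1)\,(y - z)_{t-1} + \varepsilon_{1t}
      && \text{(homogeneity: } \gamma_1 + \gamma_2 + \gamma_3 = 1 \Rightarrow \beta = 1\text{)} \\
(y - z)_t &= \gamma_1 (y - z)_{t-1} + (\gamma_2 - 1)\,\Delta z_t + \varepsilon_{1t}
      && \text{(the static residual } u_t = y_t - z_t \text{ is autocorrelated)}
\end{align*}
```

Under this assumed form, the last line makes explicit why the residual of the static regression carries both $\Delta z_t$ and the lagged disequilibrium: they are exactly the terms that a static regression of $y_t$ on $z_t$ leaves out.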
A simple way to reduce the biases is to reparameterize the equation in such a way that $\lambda$ is set to zero. Both (15a) and (15b) satisfy this property. For comparison, following Banerjee et al. (1986), we ran a second set of experiments in order to investigate the effects of such reparameterizations. Using the DGP given by (14a)-(14b), we estimate equation (15a), with a lagged $z$ included as an extra regressor. The dynamic regression equation estimated is therefore

The extra lagged variable, $z_{t-1}$, is included to avoid imposing homogeneity (see Chapter 2), as it would be unrealistic to assume that the investigator knows the precise form of the data-generation process. The co-integrating coefficient is estimated by computing the expression $1 - d/c$: see Sect. 2.4. The static regression given by (16) is also estimated.

The strong exogeneity property required of $z_t$ is guaranteed, in the design of the experiment, by drawing $\varepsilon_{1t}$ and $\varepsilon_{2t}$ from uncorrelated pseudo-normal distributions. The values of $\gamma_i$ ($i = 1, \ldots, 3$) are varied as in Table 7.3, while ensuring that long-run homogeneity is preserved. The sample sizes and the ratio of the standard deviations of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are also varied, to give a set of 90 experiments. The simulations are all conducted with 5000 replications.

The purpose of the first part of this exercise is to compare the biases in the estimates of the co-integrating parameter obtained from dynamic regression with those obtained from the static regression. (The true value of the co-integrating parameter is 1.) Some of the results for different configurations of the $\gamma_i$ parameters and standard-deviation ratios are given in Table 7.3. We report the estimated biases, for four different sample sizes, in the static model. The corresponding estimated biases from the dynamic regression (where the co-integrating parameter is calculated as $1 - d/c$) are in almost all cases so small as to be within 2 Monte Carlo standard errors of zero and so are not reported.
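A small-scale version of this comparison can be sketched in code. The block below is an illustration only and not the authors' GAUSS program. It assumes that the DGP is $y_t = \gamma_1 y_{t-1} + \gamma_2 z_t + \gamma_3 z_{t-1} + \varepsilon_{1t}$ with $\Delta z_t = \varepsilon_{2t}$ and $\gamma_3 = 1 - \gamma_1 - \gamma_2$, that the static regression is $y_t$ on $z_t$ without intercept, and that the dynamic regression is $\Delta y_t$ on $\Delta z_t$, $(y - z)_{t-1}$, and $z_{t-1}$ with coefficients labelled $b$, $c$, $d$. The function names, the zero starting values, and the use of a median for the dynamic estimate (a ratio such as $1 - d/c$ can take extreme values when $c$ is near zero) are our own choices.

```python
import random


def ols(X, y):
    """OLS coefficients via Gauss-Jordan elimination on the normal equations."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def one_replication(T, g1, g2, sd1, sd2, rng):
    """Return (static, dynamic) estimates of the long-run coefficient."""
    g3 = 1.0 - g1 - g2                      # long-run homogeneity (assumed)
    y, z = [0.0], [0.0]                     # zero initial values (assumed)
    for _ in range(T):
        z.append(z[-1] + rng.gauss(0.0, sd2))
        y.append(g1 * y[-1] + g2 * z[-1] + g3 * z[-2] + rng.gauss(0.0, sd1))
    # Static regression: y_t on z_t.
    beta_static = ols([[z[t]] for t in range(1, T + 1)], y[1:])[0]
    # Dynamic regression: dy_t on dz_t, (y - z)_{t-1}, z_{t-1}.
    X = [[z[t] - z[t - 1], y[t - 1] - z[t - 1], z[t - 1]]
         for t in range(1, T + 1)]
    dy = [y[t] - y[t - 1] for t in range(1, T + 1)]
    _, c, d = ols(X, dy)
    return beta_static, 1.0 - d / c


def bias_experiment(T, g1, g2, sd1, sd2, reps, seed=0):
    """Mean static bias, median static bias, median dynamic bias (true value 1)."""
    rng = random.Random(seed)
    draws = [one_replication(T, g1, g2, sd1, sd2, rng) for _ in range(reps)]
    static = sorted(b for b, _ in draws)
    dynamic = sorted(b for _, b in draws)
    return (sum(static) / reps - 1.0,
            static[reps // 2] - 1.0,
            dynamic[reps // 2] - 1.0)


if __name__ == "__main__":
    print(bias_experiment(T=50, g1=0.9, g2=0.5, sd1=1.0, sd2=1.0, reps=500))
```

With far fewer replications than the 5000 used in the text, the numbers are noisy, so the sketch is meant to show the mechanics of the comparison rather than to reproduce any tabulated figure.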
We will return to the comparison of these estimators (static and dynamic) below; for the time being, the noteworthy point is simply that substantial biases remain in static estimates for parameter combinations at which the biases in dynamic estimates are zero, or very close to zero, since the dynamic model has been specified so as to make $\lambda$ close to zero.

Although the dynamic estimates contain negligible biases in these examples, $z_t$ is strongly exogenous for the parameter of interest. While it is fairly straightforward to extend this specification to include weakly exogenous $z_t$, the usefulness of estimates from dynamic single equations is reduced substantially if the regressors are not weakly exogenous. It also becomes difficult to make unambiguous comparisons between dynamic and static single-equation estimates. We discuss this issue below.

TABLE 7.3. Biases in static models (a)
DGP: (14a) + (14b); 5000 replications

                                                      Sample size (T)
                                                25       50      100      200      400
$\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 3$   -0.39    -0.25    -0.15    -0.07    -0.04
$\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 1$   -0.32    -0.22    -0.14    -0.08    -0.04
$\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 3$   -0.23    -0.13    -0.07    -0.03    -0.02
$\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 1$   -0.21    -0.12    -0.06    -0.03    -0.02

(a) Standard errors of these estimates vary widely, but the estimated biases are in almost all cases significantly different from zero, for sample sizes of 50 or greater. Note that again the biases appear to decline less quickly than $T^{-1}$, but more quickly than $T^{-1/2}$. Calculations were undertaken using GAUSS.

Recalling the discussion in Chapter 5, a test of the null hypothesis $H_0$: $c = 0$, based on the $t$-statistic $t_{c=0}$, is a valid test for co-integration.11 This statistic, under the null of no co-integration, is not asymptotically normally distributed. Therefore a second part of the exercise was used to compute the critical values of the distribution of $t_{c=0}$ and to use these critical values to derive the power of this statistic, for a range of cases, to detect co-integration. This is an example of a test of co-integration based not directly on the residuals, but on a regression coefficient. A power comparison, between a residual-based test and the $t$-test, is given in Table 7.7; but first we use a more general DGP to consider further the issue of finite-sample biases.

11 When $y$ and $z$ are not co-integrated, $(y - z)_{t-1}$ is I(1), in which case (19) can only be balanced if $c = 0$. This observation forms the logical basis for a test of co-integration based on $t_{c=0}$. The strong exogeneity of $z_t$ (for the parameters in (14a)) ensures that a test based on estimates from a single equation such as (19) is fully efficient.

7.4.1. General Data-generation Processes

Consider now the comparison of static and dynamic estimates of the long-run multiplier when the time series are derived from a more general DGP. The experiments described above are special cases of this more general DGP. The 'static' estimate of the co-integrating coefficient $\beta$ is called $\hat{\beta}_s$, while the dynamic estimate is denoted $\hat{\beta}_d$. The exogenous variable is generated as
so that $z_t$ can be made either I(0) or I(1) by choice of $\phi$.

Finally, the dynamic regression model12 is

In comparing the data-generation process with the model, three interesting cases can be identified. These are the cases in which the model is over-parameterized, under-parameterized, and exactly parameterized.

For each of these cases, several sub-cases, which derive from the integration properties of the $\{z_t\}$ and the $\{y_t\}$ series, are of interest. In particular, we might be interested in determining whether the relative performances of the static and dynamic regressions depend upon the proximity of the largest latent root of either of the processes to unity: whether, for example, performance when the $\{z_t\}$ series is I(0) and the $\{y_t\}$ series is very nearly I(1) differs from that which holds when the $\{z_t\}$ series is I(1) and the $\{y_t\}$ series is nearly I(2), or whether in general the results are affected by specifying the $\{z_t\}$ series to be non-stationary ($\phi = 1.0$) rather than clearly stationary (

12 This produces an estimate of the co-integrating parameter equal to that produced by linear transformations such as the error-correction form.
TABLE 7.4. Examples of properties of $\{z_t\}$ and $\{y_t\}$ for various parameter values (a)

Case   $\phi$      $\rho_1 + \rho_2 + \rho_3$ (b)   Property of $\{z_t\}$   Property of $\{y_t\}$ (c)
A      $\ll 1$     $\ll 1$                          I(0)                    I(0)
B      $\ll 1$     $\approx 1$                      I(0)                    nearly I(1)
C      1.0         $\approx 1$                      I(1)                    nearly I(2)
D      1.0         $\ll 1$                          I(1)                    I(1)

(a) Parameters are those appearing in equations (20) and (21).
(b) In Table 7.5 below we treat values of 0.99 as '$\approx 1$' and values of 0.95 or lower (in absolute value) as '$\ll 1$'. Note that we cannot have $\rho_1 + \rho_2 + \rho_3 = 1$ exactly, since the term $(1 - \rho_1 - \rho_2 - \rho_3)^{-1}$ appears in the equation for the long-run equilibrium solution.
(c) By 'nearly', we mean that the series in question is on the borderline between two orders of integration: in finite samples the difference between, for example, an AR(1) with parameter 1.0 and an AR(1) with parameter 0.99 is a difference of degree rather than of kind. We say that $z_t = 0.99z_{t-1} + \varepsilon_t$ is nearly I(1): see Ch. 3.
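The classification in Table 7.4 amounts to a small lookup on two quantities. The sketch below is one way to express it. It uses the thresholds given in the table notes (0.99 or above treated as close to 1, 0.95 or lower in absolute value treated as well below 1). The function name and the decision to return `None` for in-between values are our own, and the assignment of $\rho_1 + \rho_2 + \rho_3$ to cases is inferred from the integration properties listed for each case.

```python
def classify_case(phi, rho_sum):
    """Map (phi, rho_1 + rho_2 + rho_3) to case 'A'..'D', or None if unclassified.

    Thresholds follow the notes to Table 7.4; the mapping itself is an
    inferred reading of the table, not a rule stated by the authors.
    """
    def band(value):
        size = abs(value)
        if size <= 0.95:
            return "well below 1"
        if size >= 0.99:
            return "near 1"
        return None  # between 0.95 and 0.99: the notes give no label

    cases = {
        ("well below 1", "well below 1"): "A",  # z is I(0), y is I(0)
        ("well below 1", "near 1"): "B",        # z is I(0), y is nearly I(1)
        ("near 1", "near 1"): "C",              # z is I(1), y is nearly I(2)
        ("near 1", "well below 1"): "D",        # z is I(1), y is I(1)
    }
    return cases.get((band(phi), band(rho_sum)))


if __name__ == "__main__":
    for phi, rho in [(0.5, 0.5), (0.5, 0.99), (1.0, 0.99), (1.0, 0.5), (0.97, 0.5)]:
        print(phi, rho, classify_case(phi, rho))
```

Values between the two thresholds are deliberately left unclassified, since the notes to the table do not say how they should be treated.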
The Monte Carlo results are organized as follows. Table 7.5 contains three sections, applying to models that are exactly parameterized, over-, and under-parameterized. For each case we report percentage biases in the estimation of a scalar co-integration parameter and the standard errors of the experimental estimates of those biases, for a range of parameter values representative of each of the cases A, B, C, D above. Entries are marked as being examples of either case A, B, C, or D.

Our intent in examining the results is not simply to draw conclusions about the relative merits of the static and dynamic regressions, bearing in mind that in practice the investigator does not know the form of the DGP, and so cannot in general produce a model that contains precisely the correct number of lags of relevant variables. We are also interested in discovering the cases in which one or both of the methods (static and dynamic regression) yield especially large finite-sample biases. All results pertain to a sample size of 120 observations.

The following conclusions emerge from examination of Table 7.5 and the examples of each of the four cases A-D.

First, the dynamic regression tends to produce lower biases in estimates of the co-integration parameter. This result does not depend upon a close correspondence between the dynamic model and the data-generation process: for example, even in the case where the DGP is a simple one, so that the model used here is substantially over-parameterized, estimates from a dynamic model tend to be at least as good as from the static model.
[TABLE 7.5. Percentage biases in static and dynamic estimates of the co-integrating parameter, with standard errors, in parts (a)-(c) covering exactly parameterized, over-parameterized, and under-parameterized models; entries are marked as case A, B, C, or D; T = 120. The table entries are not recoverable from the source.]
Second, an especially troublesome case arises where the roots of the lag polynomial in the series $\{y_t\}$ are such that $\{y_t\}$ is close to being an I(1) series in spite of the stationarity of $\{z_t\}$. This case leads to the largest biases of those examined here. As it is already known that regression of I(0) series on I(1) series produces troublesome results, especially in the form of non-standard distributions of test statistics, this result in the opposite case is unsurprising. It suggests that the investigation of the properties of $t$- and $F$-statistics that would be generated in the co-integrating regressions examined here may be of independent interest.

A feature of these nearly unbalanced regressions is that the biases in the estimates of the long-run multiplier in the static regressions tend to be associated with much lower standard errors than those in the corresponding dynamic regressions. This is attributable to the result that the variance of $\hat{\beta}_s$ is of order $T^{-2}$ while in the dynamic regression, because of the asymptotic normality of the coefficient estimates, the variance of $\hat{\beta}_d$ is of order $T^{-1}$.13 Expressed differently, when $c$ is small, $\hat{\beta}_d$ can take extremely large values, and in finite samples may not have any analytical moments (see Sargan 1980 and Hendry 1991a).

13 The rates of convergence are determined by using the SSW theorems, discussed in Ch. 6.

Finally, the under-parameterized dynamic regressions, for a wide range of parameter values, perform notably worse than their correctly and over-parameterized counterparts. In the absence of a priori information about lag structures this would appear to support the inclusion of a fairly rich dynamic structure in the regression. Note also that in parts (a) and (c) of Table 7.5 there is at least one case in which the static regression appears to be superior to the dynamic. Hence a preference for the dynamic form seems reasonable based on the overall results, but the results should not be interpreted to mean that the dynamic regression is invariably superior even with strong exogeneity.

The classification recorded in Table 7.4 helps us to interpret further the results appearing in Table 7.5. Cases labelled A and D are examples of balanced regressions. However, while case A represents regressions of an I(0) variable on other I(0) variables, case D represents regressions of an I(1) variable on other I(1) variables. Thus in case A experiments, the omitted dynamics are important and the dynamic model should perform noticeably better than the corresponding static model. An examination of parts (a) and (b) of Table 7.5 shows that this is indeed true.

The more interesting case, from the point of view of the study of co-integration, is case D, in which we have two I(1) processes that are co-integrated. It may be seen that, for this case too, in a substantial majority of experiments the dynamic regression estimates of the long-run coefficient are more accurate than the static estimates. This recalls, in the context of this more general data-generation process, the character of the results in Banerjee et al. (1986).

Cases B and C denote nearly unbalanced regressions, and the results here should be interpreted with caution. In a sense, one might argue that the regressions are spurious because variables of different orders of integration cannot be linked by an equilibrium relationship. Hence, following from the work of Phillips (1986), these regressions are likely to be characterized by asymptotically divergent coefficient estimates and $t$-statistics. We would also expect both the static and dynamic regressions to behave rather badly, with the behaviour worsening the further the regression moves away from balance. For example, in case B, the greater the absolute discrepancy between $\phi$ and $\rho_1 + \rho_2 + \rho_3$, the larger the biases in the estimates. As $\phi$ approaches 1 (for $\rho_1 + \rho_2 + \rho_3$ close to the unit circle), case B approaches case C, and the biases are generally lower. Case C represents unbalanced regressions of a rather special kind, namely regressions of a near-I(2) variable on I(1) and near-I(2) variables, and the properties of such regressions appear to be better than might have been expected (see Chapter 3).

Where the exogenous variable is I(1), the regression estimates are super-consistent and the bias for the properly specified model is close to zero. Where each of the series is I(0) larger biases can appear; the largest arise where one series deviates from another by a quantity that is close to being non-stationary.

In sum, substantial biases in static OLS estimators exist, and specifying dynamic regressions can help alleviate the problem. The desirability of using dynamic regressions is reinforced by a consideration of their ability to detect co-integration, so we can compare the power properties of a test based on dynamic models with one based on the residuals from static regressions. Section 7.6 illustrates the two methods empirically. In Section 7.7, we consider methods of correcting static estimators of co-integrating vectors and discuss their properties.
7.5. Powers of Single-equation Co-integration Tests

A range of alternative tests for co-integration has been discussed in earlier sections, and here we comment on a number of features that influence test power, following the analysis in Kremers, Ericsson, and Dolado (1992).

Reconsider the DGP in (9)-(12) above, in case C:

    $\Delta z_t + \Delta y_t = \varepsilon_{1t}$    (9')

    $\Delta y_t + 2\Delta z_t = (\rho - 1)(y_{t-1} + 2z_{t-1}) + \varepsilon_{2t}.$    (10')

The static regression involves estimating an equation of the form

and the DF test is conducted on

where $v_t = y_t - \beta z_t$. The DGP is optimal for the DF test here because (10') has a valid common factor when $E(\varepsilon_{1t}\varepsilon_{2t}) = 0$ (see Hendry and Mizon 1978, and Sargan 1980). Since $\beta = -2$, $v_t = y_t + 2z_t$, so that (10') corresponds to $\Delta v_t = (\rho - 1)v_{t-1} + \varepsilon_{2t}$ and hence $\omega_t$ coincides with $\varepsilon_{2t}$ except for terms involving $(\hat{\beta} - \beta)z_t$, etc. For this reason, the DGP selected by Engle and Granger (1987) is relatively favourable to the DF test.

By contrast, consider the DGP in (14) with the static regression in (16) and the same form of DF test as in (24):

In this case, $u_t = y_t - \beta z_t$ so that in (25), evaluated at $\hat{\beta} = \beta$,

hence

In (26), a common-factor restriction is imposed on the dynamics, but this time it is not necessarily a valid representation of (14a). Indeed, since $\beta = 1$ by homogeneity, (14a) can be written as

Comparison with (26) reveals that the new error $[\varepsilon_{1t} + (\gamma_2 - 1)\Delta z_t]$ is white noise, but has a larger variance than that of the error in (14a). Kremers et al. (1992) show that the $t$-statistic in (24) retains the Dickey-Fuller distribution under the null,

using results demonstrated in Chapter 3. Under $H_1$, however, using results on near-integrated processes in (3.40)-(3.42),

(29)

where $\eta = \left(\int_0^1 K_e(r)\,dW(r)\right)\left(\int_0^1 K_e(r)^2\,dr\right)^{-1/2}$. When $e = 0$, we reproduce the distribution under $H_0$. Otherwise, for $e < 0$, the distribution is shifted to the left by $e\left(\int_0^1 K_e(r)^2\,dr\right)^{1/2}$. When $T = 100$, $e = -1$ implies that $\rho = 0.99$, and $e = -5$ implies that $\rho = 0.95$; as $e \to -\infty$, the power tends to 1.

Kremers et al. (1992) argue that similar considerations show that the non-centrality parameter of the ECM-based test for co-integration is larger than that of the non-parametric statistics discussed in Chapter 4. Their Monte Carlo results support these asymptotic results.

Return now to the Monte Carlo experiment given by equations (14a)-(14b). One appealing test for co-integration that we have mentioned consists in using the model (15a), where, under the null of no co-integration, $\gamma_1 = 1$ so that the second coefficient is equal to zero. A $t$-test for this condition is therefore a test for co-integration. While we would expect the distribution of this test statistic to be non-standard, it is a straightforward test and would therefore be especially useful if its power were high. In particular, for strongly exogenous regressors it is similar (see Kiviet and Phillips 1992).

We examine the test with a small Monte Carlo experiment, comparing its power with that of the ADF test based on a static model to estimate the co-integrating parameter, in the DGP given by (14a)-(14b). The first test is the ADF test with one lag, computed from the residuals of the static regression (16). The second test is based on the $t$-statistic for $c$ in (19). As noted earlier, if the null of no co-integration is true, $c = 0$. Under the null (i.e. $\gamma_1 = 1$, $\gamma_2 = \gamma_3 = 0$, $\sigma_1 = \sigma_2 = 1$ in (14a)-(14b)), $t_{c=0}$ has a Wiener distribution.
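The mechanics of the second test can be sketched as follows: simulate the null model to obtain a critical value for the $t$-ratio on $c$, then count rejections under a co-integrated alternative. The block is a rough illustration, not the PC-NAIVE experiment. It assumes the DGP $y_t = \gamma_1 y_{t-1} + \gamma_2 z_t + \gamma_3 z_{t-1} + \varepsilon_{1t}$, $\Delta z_t = \varepsilon_{2t}$, and a test regression of $\Delta y_t$ on an intercept, $\Delta z_t$, $(y - z)_{t-1}$, and $z_{t-1}$. All function names, the zero starting values, and the small replication counts are our own choices, and the residual-based ADF(1) test is not implemented here.

```python
import random


def simulate(T, g1, g2, g3, sd1, sd2, rng):
    """Assumed DGP: y_t = g1*y_{t-1} + g2*z_t + g3*z_{t-1} + e1_t, z_t = z_{t-1} + e2_t."""
    y, z = [0.0], [0.0]
    for _ in range(T):
        z.append(z[-1] + rng.gauss(0.0, sd2))
        y.append(g1 * y[-1] + g2 * z[-1] + g3 * z[-2] + rng.gauss(0.0, sd1))
    return y, z


def t_stat_c(y, z):
    """t-ratio on c in dy_t = a + b*dz_t + c*(y - z)_{t-1} + d*z_{t-1} + v_t."""
    n, k = len(y) - 1, 4
    X = [[1.0, z[t] - z[t - 1], y[t - 1] - z[t - 1], z[t - 1]]
         for t in range(1, n + 1)]
    dy = [y[t] - y[t - 1] for t in range(1, n + 1)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    inv = [row[k:] for row in M]
    Xty = [sum(r[i] * v for r, v in zip(X, dy)) for i in range(k)]
    coef = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    rss = sum((v - sum(b * x for b, x in zip(coef, r))) ** 2
              for r, v in zip(X, dy))
    s2 = rss / (n - k)                       # residual variance
    return coef[2] / (s2 * inv[2][2]) ** 0.5


def null_critical_value(T, alpha, reps, seed=0):
    """Lower-tail fractile of the t-ratio under gamma1 = 1, gamma2 = gamma3 = 0."""
    rng = random.Random(seed)
    stats = sorted(t_stat_c(*simulate(T, 1.0, 0.0, 0.0, 1.0, 1.0, rng))
                   for _ in range(reps))
    return stats[int(alpha * reps)]


def power(T, g1, g2, sd1, sd2, crit, reps, seed=1):
    """Rejection frequency of the null under a homogeneous alternative."""
    rng = random.Random(seed)
    g3 = 1.0 - g1 - g2
    hits = sum(t_stat_c(*simulate(T, g1, g2, g3, sd1, sd2, rng)) < crit
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    cv = null_critical_value(T=50, alpha=0.05, reps=1000)
    print("5% critical value:", round(cv, 2))
    print("power:", power(50, 0.5, 0.1, 1.0, 1.0, cv, reps=500))
```

The critical value is taken from the lower tail because the alternative of co-integration corresponds to a negative coefficient on the lagged disequilibrium.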
The critical values of this distribution and the ADF were computed by simulating the null model for 5000 replications using PC-NAIVE (Hendry, Neale, and Ericsson 1990). These critical values were then used for computing the test power, and are shown in Table 7.6 for regressions like (19) with an intercept. (The population constant is zero.) The same critical values result for $\gamma_2 + \gamma_3 = 0$ when these parameters are individually non-zero, so the test is similar for the impact of $\Delta z_t$: this finding is based on replicating the null experiment at different parameter values using the same random numbers. The ADF(1) critical values are also the same for all the values of the null model's parameters since the ADF test is known to be similar. (The same Monte Carlo trick was used to check that feature: see Banerjee and Hendry 1992.)

TABLE 7.6. Fractiles of $t$-statistic for $H_0$: $c = 0$ in (19)

         Fractiles of $t_{c=0}$ in (19)       Fractiles of ADF(1)
T        0.10      0.05      0.01             0.10      0.05      0.01
25       -2.99     -3.42     -4.22            -3.15     -3.51     -4.30
50       -2.95     -3.33     -4.06            -3.10     -3.41     -4.08
100      -2.93     -3.28     -3.95            -3.09     -3.39     -4.00

The $t_{c=0}$ fractiles are slightly closer to zero than the corresponding fractiles of the augmented Dickey-Fuller distribution. Under the alternative hypothesis of co-integration, $t_{c=0}$ is asymptotically normally distributed.

Each entry in Table 7.7 shows the proportional frequency of rejection of the false null hypothesis of no co-integration.14 The power of each test for each set of parameter values of the DGP and sample size is shown separately. At small values of $1 - \gamma_1$, the power $P_a$ of the ADF(1) test is very close to that of the $t_{c=0}$ test ($P_c$), but the latter dominates as $1 - \gamma_1$ increases. Increasing the signal-noise ratio $\sigma_2/\sigma_1$ or $(1 - \gamma_2)$ also favours $P_c$. The powers converge to unity as the sample size $T$ increases, but slowly when $(1 - \gamma_1) = 0.1$.

14 As in Table 7.6, all the results are based on 5000 replications.

TABLE 7.7. Test rejection frequencies in ECMs
DGP: (14a) + (14b); 5000 replications
Estimated power at given fractile; each entry is $t_{c=0}$ test / ADF test

                                                          0.10         0.05         0.01
(a) $\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 3$ (a)
    T = 25                                             0.13/0.13    0.06/0.06    0.01/0.01
    T = 50                                             0.21/0.17    0.10/0.10    0.02/0.02
    T = 100                                            0.44/0.31    0.26/0.20    0.07/0.05
(b) $\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 1$
    T = 25                                             0.14/0.11    0.06/0.05    0.01/0.01
    T = 50                                             0.21/0.15    0.10/0.09    0.02/0.02
    T = 100                                            0.49/0.30    0.30/0.19    0.08/0.04
(c) $\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 1/3$
    T = 25                                             0.13/0.10    0.07/0.05    0.02/0.01
    T = 50                                             0.24/0.13    0.12/0.07    0.03/0.01
    T = 100                                            0.59/0.24    0.40/0.14    0.13/0.03
(d) $\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 3$
    T = 25                                             0.66/0.35    0.45/0.20    0.16/0.05
    T = 50                                             0.99/0.84    0.97/0.72    0.78/0.34
    T = 100                                            1.00/1.00    1.00/1.00    1.00/0.97
(e) $\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 1$
    T = 25                                             0.79/0.31    0.66/0.18    0.29/0.04
    T = 50                                             1.00/0.80    1.00/0.67    0.94/0.28
    T = 100                                            1.00/1.00    1.00/1.00    1.00/0.96
(f) $\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 1/3$
    T = 25                                             0.94/0.23    0.87/0.12    0.64/0.03
    T = 50                                             1.00/0.75    1.00/0.60    1.00/0.22
    T = 100                                            1.00/1.00    1.00/1.00    1.00/0.94

(a) $s = \sigma_1/\sigma_2$.

Thus, the power of $t_{c=0}$ relative to the ADF increases with $(1 - \gamma_1)$, $(1 - \gamma_2)\sigma_2/\sigma_1$, and $T$, matching the results in Kremers et al. noted above. The first three experiments have dynamics that are close to satisfying a common factor restriction: the ADF equation has a residual standard error that is only about 4 per cent larger in (a) than the DGP. On these experiments the ADF test does relatively well, although both tests do poorly in absolute terms. When a common factor approximation is poor as in (f), the ADF test suffers about an 85 per cent increase in the residual standard error by imposing the common factor and does relatively badly, in some cases dramatically so (e.g. $T = 50$ at the 1% significance level). Owing to the large value of $(1 - \gamma_1)$, both tests do well absolutely for sample sizes of 100.

The test powers respond in a nonlinear way to changes in the design parameter values, but some understanding of the rejection frequencies in Table 7.7 can be obtained from the following analysis. Neglecting the intercept, the ADF test essentially involves testing $\gamma_1 = 1$ in

where the first step regression of $y_t$ on $z_t$ estimates $\beta$, which here has a population value of unity. Under the alternative, $y_{t-1} - \beta z_{t-1}$ is stationary, and for $\beta = 1$ the non-centrality of the ADF pseudo $t$-test will be given approximately by
(see Mizon and Hendry 1980), where ASE denotes the coefficient asymptotic standard error calibrated to a sample size of $T$. For given design parameter values, the ASE is easily calculated using PC-NAIVE, and some outcomes are shown below.

Similarly, the $t_{c=0}$ test is actually based on testing $\gamma_1 = 1$ in

Since the regressor $y_{t-1} - z_{t-1}$ is stationary under the alternative, if $\gamma_3 + \gamma_2 + \gamma_1 = 1$ is imposed and hence $z_{t-1}$ omitted, the asymptotic non-centrality of the $t$-test of $\gamma_1 = 1$ (again in PC-NAIVE), yields the following illustrative values for $T = 25$:

Case     NC_adf     NC_ecm
(a)      -1.15      -1.19
(b)      -1.15      -1.28
(c)      -1.15      -1.52
(d)      -2.89      -3.25
(e)      -2.89      -3.88
(f)      -2.89      -5.32
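The calculation of a non-centrality as a coefficient divided by its asymptotic standard error can be sketched by hand. The block below is our own back-of-envelope version and not the PC-NAIVE computation. It assumes the DGP $y_t = \gamma_1 y_{t-1} + \gamma_2 z_t + \gamma_3 z_{t-1} + \varepsilon_{1t}$, $\Delta z_t = \varepsilon_{2t}$ with homogeneity imposed, so that $u_t = y_t - z_t$ follows $u_t = \gamma_1 u_{t-1} + (\gamma_2 - 1)\Delta z_t + \varepsilon_{1t}$. It also treats the design parameter $s$ as the ratio of the error variances with the variance of $\varepsilon_{2t}$ set to one; that reading is purely our assumption (the table note gives $s = \sigma_1/\sigma_2$) and is adopted because it brings the figures close to those tabulated above.

```python
import math


def adf_noncentrality(gamma1, T):
    """(gamma1 - 1)/ASE for the DF-type regression of du_t on u_{t-1}.

    With the common factor imposed, the error variance cancels against the
    variance of u, leaving ASE = sqrt((1 - gamma1**2) / T).
    """
    return (gamma1 - 1.0) * math.sqrt(T / (1.0 - gamma1 ** 2))


def ecm_noncentrality(gamma1, gamma2, var_ratio, T):
    """(gamma1 - 1)/ASE for the coefficient on (y - z)_{t-1} in the ECM.

    Assumes Var(e2) = 1 and Var(e1) = var_ratio (our reading of s).
    """
    var_e1, var_e2 = var_ratio, 1.0
    # Stationary variance of u_t = gamma1*u_{t-1} + (gamma2 - 1)*dz_t + e1_t.
    var_u = ((gamma2 - 1.0) ** 2 * var_e2 + var_e1) / (1.0 - gamma1 ** 2)
    ase = math.sqrt(var_e1 / (T * var_u))
    return (gamma1 - 1.0) / ase


# Design points (gamma1, gamma2, s) for cases (a)-(f).
DESIGNS = {
    "a": (0.9, 0.5, 3.0), "b": (0.9, 0.5, 1.0), "c": (0.9, 0.5, 1.0 / 3.0),
    "d": (0.5, 0.1, 3.0), "e": (0.5, 0.1, 1.0), "f": (0.5, 0.1, 1.0 / 3.0),
}

if __name__ == "__main__":
    for case, (g1, g2, s) in DESIGNS.items():
        print(case,
              round(adf_noncentrality(g1, 25), 2),
              round(ecm_noncentrality(g1, g2, s, 25), 2))
```

The ADF figure depends only on $\gamma_1$ and $T$ in this approximation, which is consistent with the tabulated NC_adf being constant within cases (a)-(c) and within cases (d)-(f), whereas the ECM figure grows in absolute value as the error on the $y$ equation becomes small relative to the variation in $\Delta z_t$.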
In practice, these approximate non-centralities were close to the mean values of the corresponding test statistics in the Monte Carlo, except for (a)-(c) for the ADF, which had a mean of about -2.15 (see (4.28)). Their values help explain both the increasing powers of both tests across the experiments and the relatively better performance of $t_{c=0}$. Compared with the critical values in Table 7.6, and given the sampling standard deviations of the tests of about 0.8 for ADF and 1.0 for $t_{c=0}$, the non-centralities also account for the absolute powers of the tests: when the mean outcome is below the critical value (in absolute value), a power of less than 0.5 usually results; when the mean is more than one standard deviation below the critical value, the resulting power is under 0.2; two standard deviations lower induces a very low power; and so on. Similar arguments apply for deviations of the mean above the critical value.

Overall, there would seem to be some advantage in modelling dynamics less restrictively than by common factors when the latter is a poor approximation. Note that the absence of any contemporaneous effect from $\Delta z_t$ always induces a violation of common factors. Finally, since the long-run parameter is not assumed known in these experiments, the $t_{c=0}$ test procedure is an operational one, and has the same number of parameters here as the ADF test.

The main drawback to such an approach is its dependence on strong exogeneity. Boswijk (1991) proposes a Wald test for co-integration in individual equations when the regressors are not even weakly exogenous. This jointly tests the null for the coefficients of all the lagged levels in a Bardsen formulation. The resulting test is asymptotically similar and in effect tests for a common factor of unity (see Hendry and Mizon 1978). Boswijk and Franses (1992) investigate the power of this test.
7.6. An Empirical Illustration

To illustrate several tests for co-integration in single equations, we return to consider the UK seasonally adjusted quarterly data on money demand. The raw data series were shown in Chapter 1, and we concentrate here on the DW, DF, and ADF tests based on a static regression, and on their comparison with a dynamic regression, which is heavily over-parameterized. In all cases, we assume that there is only one co-integrating vector and that it enters the money-demand model. See Kremers et al. (1992) and Ericsson, Campos, and Tran (1990) for related analyses.

The long-run determinants of the demand for transactions money $M$, as measured by M1, are the price level $P$, real income as measured by constant 1985-price total final expenditure $X^{85}$, and the opportunity cost of holding money measured by $R_n$. (See Hendry and Ericsson (1991b) for details of its calculation.) We assumed a log-linear equation, consonant with price and income homogeneity, given by

$$m_t = \alpha_0 + \alpha_1 p_t + \alpha_2 x_t^{85} - \alpha_3 R_{n,t} + u_t \qquad (30)$$

where lower-case letters denote logs, $\alpha_1 = 1$ is anticipated, and $\alpha_i > 0$, $i = 1, 2, 3$. Least-squares estimation of the static regression over the sample 1963(I) to 1989(II) yielded

The residuals were then tested for a unit root using the DF and ADF tests, the latter commencing with four lags and testing down. The following results were obtained:

No lagged values of $\Delta\hat{u}_t$ proved significant, leading to the DF test:

In no case does any test reject the null of no co-integration, as the t-values on the estimated coefficient of $\hat{u}_{t-1}$ are in the neighbourhood of 2 in both the DF and the ADF regressions. That outcome continues to hold if a trend is added to the basic static-regression model (30), or if
price homogeneity is imposed and $\Delta p$ added as a regressor, corresponding to allowing $m$ and $p$ to be I(2), with $(m - p)$ and $\Delta p$ being I(1). In that last case, $R^2$ for real money is equal to only 0.68.

We assume now that $\Delta p$, $x^{85}$, and $R_n$ are weakly exogenous for the parameters in the conditional money demand model. The outcome of estimating a dynamic equation in the levels of the variables with five lags on each of $m - p$, $\Delta p$, $x^{85}$, and $R_n$ (plus a constant) by least squares is shown in Table 7.8.

TABLE 7.8. Empirical results
Variable
Lag 1
0 m— p
-1.000
xss
-0.041 0.115 -0.411 0.117 -0.757 0.210 -0.124 0.169
SE SE
Rn
SE Ap SE CONSTANT SE
0.
3
2
4
5
Sum o f lags
A 0.164 .147 0.549 0.,240 0,,251 0 .152 0,,132 0.,135 0 ,131 0.109 0,,028 0.118 0.087 0,.162 0.293 -0,,067 -0.,240 0,,130 0 .139 0.119 0 .026 0.135 0..139 0..139 -0.361 -0,,122 -0.,046 -0 .084 -0.045 -1.070 0.130 0 .187 0.178 0.,185 0.,176 0 .175 0.069 -1,.102 0.020 0,,307 -0.,412 -0 .329 0 .222 0.255 0,.253 0,,246 0 .246 0.203 - -0.12 4 0 .169
$R^2 = 0.9966$   $\hat{\sigma} = 0.0130$   F(23, 76) = 975.38   DW = 1.976   SC = -7.853
Mean = 10.896131   SD = 0.196173
Normality $\chi^2(2)$ = 4.29   AR 1-5 F[5, 71] = 0.20   ARCH 4 F[4, 68] = 0.22
$X_i^2$ F[37, 38] = 0.66   RESET F[1, 75] = 0.98   COMFAC F[15, 76] = 3.14

Tests on the significance of each variable

Variable       F[num., denom.]   Value      Probability   Unit-root t-test
$m - p$        F[5, 76]          340.201    0.000         -5.168
$x^{85}$       F[6, 76]            7.801    0.000          6.171
$R_n$          F[6, 76]           12.127    0.000         -5.719
$\Delta p$     F[6, 76]            6.846    0.000         -4.963
CONSTANT       F[1, 76]            0.536    0.466         -0.732

Solved static long-run equation (standard errors in parentheses)

$$m - p = \underset{(0.112)}{1.102}\,x^{85} - \underset{(0.528)}{7.278}\,R_n - \underset{(1.482)}{7.493}\,\Delta p - \underset{(1.230)}{0.842}$$
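The solved long-run equation can be checked from the sums of the lag coefficients. The sketch below assumes the usual ADL long-run solution (each regressor's lag sum divided by minus the lag sum of the dependent variable) and uses the lag sums as far as they can be read from the final column of Table 7.8; the variable names are hypothetical, and because the inputs are rounded to three decimals the results agree with the reported equation only up to rounding.

```python
# Lag sums read from the final column of Table 7.8 (rounded to three decimals).
# The sum for the dependent variable already includes the -1 normalisation at lag 0.
sum_dep = -0.147   # m - p
se_dep = 0.028     # standard error of that sum
sums = {"x85": 0.162, "Rn": -1.070, "dp": -1.102, "constant": -0.124}


def long_run_coefficients(sum_dep, sums):
    """Static long-run solution of an ADL: each regressor's lag sum divided by
    minus the lag sum of the dependent variable."""
    return {name: value / -sum_dep for name, value in sums.items()}


lr = long_run_coefficients(sum_dep, sums)
unit_root_t = sum_dep / se_dep  # t-ratio on the sum of the dependent-variable lags

for name, value in lr.items():
    print(f"{name:>8s}: {value:8.3f}")
print(f"unit-root t-ratio for m - p: {unit_root_t:.2f}")
```

The ratio of the lag sum of $m - p$ to its standard error is the unit-root t-test reported for that variable; with rounded inputs it comes out near -5.2 rather than exactly -5.168.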
These dynamic estimates are well behaved: the unit-root t-tests are all in the neighbourhood of 5 or larger in absolute value and every regressor matters as a set (i.e. testing all five lags); the solved long run is well defined and compares favourably with (30) since the three economic variables have highly significant coefficients with sensible signs and magnitudes; the goodness of fit is reasonable; and the diagnostic tests of the dynamic specification are all acceptable. Note that the sum of all the lags of the dependent variable, as shown in the final column of Table 7.8, is similar to that found in the DF regression, but has a much smaller standard error.

Only the first lag is strongly significant, as is shown in Table 7.9. Tests of common factors in the lag polynomials using the procedure in Sargan (1980) yield the results in Table 7.10. Thus, the hypothesis of five common factors can be rejected at any reasonable level of significance. Recalling the discussion in Section 7.5 above, this outcome helps explain why the DF and ADF tests did not reject the null of no co-integration, whereas the dynamic model has done so decisively. Given that the common factor restrictions are rejected, the DF and ADF tests are not well suited to detecting co-integration. The ECM version of this equation, reported in Hendry and Ericsson (1991b), has a t-value greater than 10 in absolute value for the ECM coefficient, in a model which parsimoniously encompasses the unrestricted equation fitted above. Thus, the evidence favours rejecting no co-integration, and the results in the next chapter support that claim.

TABLE 7.9. Tests on the significance of each lag

Lag   F[num., denom.]   Value     Probability
5     F[4, 76]           0.691    0.600
4     F[4, 76]           1.615    0.179
3     F[4, 76]           1.654    0.170
2     F[4, 76]           1.416    0.237
1     F[4, 76]          12.967    0.000
TABLE 7.10. COMFAC Wald test statistic summary table

         $\chi^2$              Incremental $\chi^2$
Order    d.f.    Value         d.f.    Value
1         3       0.086        3        0.086
2         6       0.196        3        0.110
3         9       4.176        3        3.980
4        12       8.101        3        3.925
5        15      47.128        3       39.028
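The residual-based procedure used at the start of this section (a static regression followed by a DF test on its residuals) can be sketched on simulated data. This is an illustrative sketch only: the data-generation process, seed, and function names are assumptions, no lagged differences are included, and the resulting t-ratio must be compared with critical values tabulated for residual-based tests, not with standard t tables.

```python
import random


def ols_simple(y, x):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def df_t_statistic(u):
    """t-ratio on rho in  delta u_t = rho * u_{t-1} + e_t  (no constant, no lags)."""
    lagged = u[:-1]
    diff = [u[t] - u[t - 1] for t in range(1, len(u))]
    sxx = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, diff)) / sxx
    ssr = sum((d - rho * v) ** 2 for v, d in zip(lagged, diff))
    s2 = ssr / (len(diff) - 1)
    return rho / (s2 / sxx) ** 0.5


def random_walk(n, rng):
    """Driftless random walk with standard normal increments."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(0)
n = 200
x = random_walk(n, rng)
y_coint = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # co-integrated with x
y_spur = random_walk(n, rng)                                  # unrelated random walk

t_coint = df_t_statistic(ols_simple(y_coint, x)[2])
t_spur = df_t_statistic(ols_simple(y_spur, x)[2])
print(f"DF t-ratio, co-integrated pair: {t_coint:.2f}")
print(f"DF t-ratio, unrelated pair:     {t_spur:.2f}")
```

In this artificial case the static residuals are white noise, so the common-factor restriction implicit in the DF test holds and the test has no difficulty rejecting; the money-demand example above shows what can happen when that restriction fails.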
7.7. Fully Modified Estimation

This section considers methods for correcting the finite-sample biases in static regressions. Park and Phillips (1988), Phillips and Durlauf (1986), Phillips and Hansen (1990), and Phillips (1988a, 1991) have argued that the performance of estimators of co-integrating vectors based on static regressions is adversely affected by the existence of second-order biases. As shown in the examples below, these biases have no effect on the consistency of the estimators, but result in the asymptotic distributions of scaled estimators, such as $T(\hat{\beta} - \beta)$ in (31) below, having non-zero means.

Such biases play a potentially important role in finite samples. For example, let the variables $y_{1t}$ and $y_{2t}$ be generated by

$$y_{1t} = \beta y_{2t} + u_{1t} \qquad (31)$$

$$\Delta y_{2t} = u_{2t} \qquad (32)$$
When the $\{u_{it}\}$ are autocorrelated and intercorrelated, a static regression of $y_{1t}$ on $y_{2t}$, by not using any information about the process generating $y_{2t}$, provides an estimate of $\beta$ which can be quite severely biased even in fairly large samples. Phillips et al. therefore recommend full-system maximum likelihood estimation of co-integrated systems. As an alternative to estimation of the full system, they propose correcting the single-equation estimates non-parametrically in order to obtain median-unbiased and asymptotically normal estimates. These recommended corrections, for simultaneity bias and residual autocorrelation, use expressions derived from the asymptotic distributions of the estimators although the corrections are made to estimators from finite samples. Phillips and Hansen (1990) show that these corrections work effectively in sample sizes as small as 50.¹⁵ Their example is presented in Section 7.10.4 below.

¹⁵ While it is possible to derive exact expressions for the biases in finite samples to any desired level of accuracy, using Edgeworth-type expansions, this is a complicated procedure.

The estimates obtained from fully modified and full-information methods are asymptotically equivalent. This equivalence is of interest because it links the discussion with a third possible method of reducing finite-sample biases, namely, estimating single-equation dynamic regressions. The aim of the analysis in this section is to compare the non-parametrically corrected estimates (which are also asymptotically efficient and median-unbiased) with estimates obtained from dynamic regressions in either their ADL or ECM forms. The form of the autocorrelation in the error process in (31) and (32) is crucial to this comparison. For some specifications of the error process, a dynamic regression equation implicitly performs the same corrections as those achieved by the non-parametric correction terms. The long-run estimates obtained from this properly specified dynamic equation are then equivalent, asymptotically, to the non-parametrically corrected estimates.¹⁶ In such cases, therefore, two ways of incorporating information about the marginal process (that is, the process generating $y_{2t}$) present themselves: non-parametric correction, or dynamic specification. However, for other specifications of the autocorrelation process a single-equation dynamic regression may fail to achieve efficiency, or eliminate the effects of second-order bias, regardless of the richness of the parameterization, owing to a failure of the conditioning variables to be weakly exogenous for the parameters of the dynamic equation.

Our theoretical discussion is based on Phillips (1988a). Although it is fairly straightforward to describe and categorize the circumstances under which dynamic single-equation estimates will perform well, the detailed theoretical background for this description is lengthy and complex. Readers interested in implementing the non-parametric corrections are referred to the papers by Phillips and his co-authors cited previously. We shall focus on presenting the arguments intuitively and will illustrate the theoretical analysis with two simulation exercises, the first taken from Phillips and Hansen (1990), and the second from Gonzalo (1990).
7.8. A Fully Modified Least-squares Estimator

Consider the data-generation process given by (31) and (32) and disregard, for the moment, the precise autocorrelation structure of $u_t = [u_{1t}, u_{2t}]'$. Assume only that $u_t$ is weakly stationary with its mean vector and long-run covariance matrix given by $[0, 0]'$ and $\Omega$ respectively, where $\Omega = \{\omega_{ij}\}_{i,j=1,2}$.¹⁷ The following decomposition of the $\Omega$ matrix is useful in understanding its structure: $\Omega = V + \Gamma + \Gamma'$, where $V = E[u_0 u_0']$ and $\Gamma = \sum_{k=1}^{\infty} E[u_0 u_k']$. Thus, if the $u$ process is serially uncorrelated and stationary, the $\Omega$ matrix is the usual covariance matrix. In the presence of serial correlation, additional terms in the form of $\Gamma$ need to be incorporated. The appendix explains the derivation of $\Omega$.

¹⁶ In Ch. 3 we compared the performance of the ADF test with the performance of the non-parametrically corrected DF test. The two tests were equivalent asymptotically, in their ability to mop up the effects of residual autocorrelation in the DF regression, but they could behave quite differently in finite samples. The same comparisons apply here. Even when a particular dynamic specification estimator is asymptotically equivalent to a non-parametric correction, it is still of interest to compare the performances of the estimates obtained from each method.

¹⁷ The long-run covariance matrix is given by $2\pi f_{uu}(0)$, where $f_{uu}(0)$ is the spectral density matrix of $u_t$, evaluated at zero. The corrections discussed below are termed 'non-parametric' because consistent estimates of this covariance matrix, and of related matrices, must be obtained non-parametrically.
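As a worked illustration of the decomposition $\Omega = V + \Gamma + \Gamma'$, consider a scalar first-order moving-average error. This special case is an assumption made here purely for illustration; the symbols $\theta$ and $\sigma^2$ below are not part of the DGP in (31) and (32).

```latex
\begin{align*}
u_t &= \varepsilon_t + \theta\,\varepsilon_{t-1},
      \qquad \varepsilon_t \sim \mathrm{IID}(0, \sigma^2), \\
V &= E[u_0^2] = (1 + \theta^2)\,\sigma^2, \\
\Gamma &= \sum_{k=1}^{\infty} E[u_0 u_k] = E[u_0 u_1] = \theta\,\sigma^2, \\
\Omega &= V + \Gamma + \Gamma' = (1 + 2\theta + \theta^2)\,\sigma^2 = (1 + \theta)^2 \sigma^2 .
\end{align*}
```

With $\theta = 0$ the long-run variance collapses to the ordinary variance $\sigma^2$, while for $\theta \neq 0$ the $\Gamma$ terms carry the contribution of serial correlation. The result also agrees with the spectral-density expression at frequency zero for an MA(1) process.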
The fully modified least-squares estimator of $\beta$ takes the form

In (33)-(36), $\hat{\delta}^+$ is a bias correction term, $\hat{\omega}_{21}$ and $\hat{\omega}_{22}$ are consistent estimates of the corresponding elements in the long-run covariance matrix, and $\hat{\Lambda}$ is a consistent estimate of $\Lambda$. Under quite general conditions,

The notation BM($\Omega_{11.2}$) is used to denote a bivariate Brownian motion process with covariance matrix $\Omega_{11.2}$ and is a matrix generalization of scalar Wiener processes, as discussed in Chapter 6. The limiting distribution (37) is a covariance matrix mixture of normals (see Table 3.3).

The 'full modification' in (33) achieves two notable aims. First, by taking account of any serial correlation in the residuals, the bias correction term $\hat{\delta}^+$ mitigates the effects of second-order bias. Second, the corrections for long-run simultaneity in the system made by using $y_{1t}^+$ (in place of $y_{1t}$) permit the use of conventional (asymptotic) procedures for inference. Thus, defining the fully modified standard error by $s^+$ where

where $\hat{\omega}_{11.2}$ is a consistent estimator of $\omega_{11.2}$, we have the following result:
Phillips and Hansen (1990) show that this approach is asymptotically equivalent to systems procedures such as full maximum likelihood estimation discussed in Chapter 8. Both (38), which simplifies the process of inference, and the reduction in the second-order bias in $\hat{\beta}^+$ help estimation and testing of single equations in co-integrated systems. Our use of a simple data-generation process is solely for the purposes of exposition; the literature to which we have referred is capable of treating co-integrated systems at a high level of generality.
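The two corrections can be made concrete with a small numerical sketch for the scalar case. It assumes the form commonly given for the Phillips-Hansen estimator: $y_{1t}^+ = y_{1t} - \hat{\omega}_{12}\hat{\omega}_{22}^{-1}\Delta y_{2t}$, $\hat{\delta}^+ = \hat{\lambda}_{21} - \hat{\lambda}_{22}\hat{\omega}_{22}^{-1}\hat{\omega}_{21}$ with one-sided sums $\lambda_{2j} = \sum_{k \geq 0} E[u_{2,t-k}u_{jt}]$, and $\hat{\beta}^+ = (\sum y_{1t}^+ y_{2t} - T\hat{\delta}^+)/\sum y_{2t}^2$. The function names, the Bartlett bandwidth, and the simulated DGP are all assumptions for illustration, not a reproduction of any published implementation.

```python
import random


def bartlett_weights(bandwidth):
    """Bartlett (triangular) weights w_k = 1 - k/(bandwidth + 1), k = 0..bandwidth."""
    return [1.0 - k / (bandwidth + 1.0) for k in range(bandwidth + 1)]


def cross_cov(a, b, k):
    """Sample analogue of E[a_{t-k} b_t]."""
    n = len(a)
    return sum(a[t - k] * b[t] for t in range(k, n)) / n


def fm_ols(y1, y2, bandwidth=5):
    """Static OLS and fully modified estimates of beta in
    y1_t = beta * y2_t + u1_t,  delta y2_t = u2_t  (no constant)."""
    beta_ols = sum(a * b for a, b in zip(y1, y2)) / sum(b * b for b in y2)
    # Residuals and differences, dropping t = 0 so that both series align.
    u1 = [y1[t] - beta_ols * y2[t] for t in range(1, len(y1))]
    u2 = [y2[t] - y2[t - 1] for t in range(1, len(y2))]
    y1s, y2s = y1[1:], y2[1:]
    n = len(u1)
    w = bartlett_weights(bandwidth)
    lags = range(1, bandwidth + 1)

    # Two-sided long-run covariances omega_12 (= omega_21) and omega_22.
    omega_12 = cross_cov(u1, u2, 0) + sum(
        w[k] * (cross_cov(u1, u2, k) + cross_cov(u2, u1, k)) for k in lags)
    omega_22 = cross_cov(u2, u2, 0) + sum(
        2.0 * w[k] * cross_cov(u2, u2, k) for k in lags)

    # One-sided sums estimating sum over k >= 0 of E[u2_{t-k} u_jt].
    lambda_21 = sum(w[k] * cross_cov(u2, u1, k) for k in range(bandwidth + 1))
    lambda_22 = sum(w[k] * cross_cov(u2, u2, k) for k in range(bandwidth + 1))

    ratio = omega_12 / omega_22
    delta_plus = lambda_21 - lambda_22 * ratio            # bias-correction term
    y1_plus = [y1s[t] - ratio * u2[t] for t in range(n)]  # endogeneity correction
    numerator = sum(a * b for a, b in zip(y1_plus, y2s)) - n * delta_plus
    beta_fm = numerator / sum(b * b for b in y2s)
    return beta_ols, beta_fm


# Illustrative DGP (an assumption): beta = 2, contemporaneously correlated errors.
rng = random.Random(1)
n_obs, beta_true = 500, 2.0
y1, y2, level = [], [], 0.0
for _ in range(n_obs):
    e2 = rng.gauss(0.0, 1.0)
    e1 = 0.8 * e2 + 0.6 * rng.gauss(0.0, 1.0)  # unit variance, correlation 0.8 with e2
    level += e2
    y2.append(level)
    y1.append(beta_true * level + e1)

b_ols, b_fm = fm_ols(y1, y2)
print(f"static OLS: {b_ols:.4f}   fully modified: {b_fm:.4f}")
```

A single draw says little about bias; comparing the two estimators properly requires repeating the experiment many times, as in the simulation study of Section 7.10.4.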
7.9. Dynamic Specification

Is it possible, by suitable dynamic specification alone, to make the same corrections as those made by the technique described above? In order to answer this question, Phillips (1988a) considers a dynamic version of equation (31):

$$y_{1t} = \beta y_{2t} + \gamma' x_t + \eta_t \qquad (39)$$

where $x_t$ is a vector with jointly stationary elements. Thus, $x_t$ contains lagged values of $\Delta y_{1t}$ and current and lagged values of $\Delta y_{2t}$. While far from being a general dynamic model, (39) is a linear-in-parameters ADL model.

The process of constructing a regression equation such as (39) has been extensively discussed in the literature (see, in particular, Engle et al. 1983). Thus, focusing on the DGP given by (31) and (32) and imposing no restrictions upon the autocorrelation structure of the $u_{it}$,

where $\mathcal{F}_{t-1}$ is the information set containing information on past realizations of $y_{1t}$, $y_{2t}$ and hence of $u_{i,t-j}$, $i = 1, 2$; $j \geq 1$. By construction, $\{\eta_t\}$ in (40) is a martingale difference sequence.

If the process generating $u_t$ is now specialized to the case where it is a linear process, so that

where

The variance of $\eta_t$ is given by $\sigma_{11.2} = \sigma_{11} - \sigma_{21}^2\sigma_{22}^{-1}$, and $\eta_t$ is orthogonal to $\varepsilon_{2t}$ as well as to the entire history of $\varepsilon_t$ given by $(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$.

¹⁸ Note that $\Sigma = \{\sigma_{ij}\}_{i,j=1,2}$ and $I_2$ is the $(2 \times 2)$ identity matrix.
Estimating the regression (39) is asymptotically equivalent to maximizing the conditional likelihood function of $(\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1T})$, given $(\varepsilon_{2t}, t = 1, 2, \ldots, T)$. Assuming invertibility of $A(L)$ in (41), we have

Equation (42) implies

and

Thus, to maximize the conditional likelihood requires

which involves

Solving (43) is equivalent to least-squares estimation of the regression model (44), where $d_1(L) = \sum_{j=1}^{\infty} d_{1j}L^j$, $d_2(L) = \sum_{j=0}^{\infty} d_{2j}L^j$, and $v_t \sim \mathrm{IN}(0, \sigma_{11.2})$, which is independent of the regressors. It is then possible to show the limiting result (45), where $\hat{\beta}$ is the estimate of the coefficient of $y_{2t}$ in (44). $B_v(r)$ and $B_2(r)$ comprise a bivariate Brownian motion process with a well-defined variance-covariance matrix.

The question posed at the beginning of this sub-section can now be answered. Comparing (37) and (45), the fully modified estimator $\hat{\beta}^+$ and the dynamic single-equation least-squares estimator are equivalent if and only if $B_v(r) = B_{1.2}(r)$. These two Brownian motion processes are not necessarily equal to each other. This is because $B_v(r)$ can be correlated with $B_2(r)$, despite its construction in (40). The generating mechanism for $u_{2t}$ may therefore be informative, and optimal inference then requires joint estimation with the error-correction model. Phillips (1988) describes this as a failure of weak exogeneity or valid conditioning. If, on the other hand, $B_v(r)$ and $B_2(r)$ are uncorrelated at all frequencies, the conditional process is completely informative for the purposes of estimation of $\beta$ and the marginal process generating $u_{2t}$ may be ignored. In such a case, $B_v(r) = B_{1.2}(r)$.
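A dynamic regression of the kind in (39) can be sketched numerically as follows. The choice of $x_t$ (current and two lagged values of $\Delta y_{2t}$, two lagged values of $\Delta y_{1t}$), the simulated DGP, and all function names are assumptions for illustration; the small linear solver is included only to keep the sketch self-contained.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, columns):
    """Least-squares coefficients of y on the given regressor columns."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


def dynamic_beta(y1, y2, lags=2):
    """Coefficient on y2_t when y1_t is regressed on y2_t, delta y2_{t-j} for
    j = 0..lags, and delta y1_{t-j} for j = 1..lags."""
    # Index 0 of each difference series is a placeholder and is never used below.
    d1 = [0.0] + [y1[t] - y1[t - 1] for t in range(1, len(y1))]
    d2 = [0.0] + [y2[t] - y2[t - 1] for t in range(1, len(y2))]
    rows = range(lags + 1, len(y1))  # first t for which every lagged difference exists
    cols = [[y2[t] for t in rows]]
    cols += [[d2[t - j] for t in rows] for j in range(lags + 1)]
    cols += [[d1[t - j] for t in rows] for j in range(1, lags + 1)]
    return ols([y1[t] for t in rows], cols)[0]


# Illustrative DGP (an assumption): beta = 2, contemporaneously correlated errors.
rng = random.Random(2)
n_obs, beta_true = 500, 2.0
y1, y2, level = [], [], 0.0
for _ in range(n_obs):
    e2 = rng.gauss(0.0, 1.0)
    e1 = 0.8 * e2 + 0.6 * rng.gauss(0.0, 1.0)
    level += e2
    y2.append(level)
    y1.append(beta_true * level + e1)

b_static = sum(a * b for a, b in zip(y1, y2)) / sum(b * b for b in y2)
b_dyn = dynamic_beta(y1, y2)
print(f"static OLS: {b_static:.4f}   dynamic regression: {b_dyn:.4f}")
```

In this simulated case the errors are serially uncorrelated, so including $\Delta y_{2t}$ is enough to absorb the contemporaneous correlation; whether a dynamic regression achieves the same correction in general depends on the conditions discussed in this section.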
The examples following this sub-section will elaborate upon these conditions, but we will close this section with an interpretation. The non-equivalence of the dynamic regression estimator and the fully modified estimator arises from possible correlation between the residuals $\eta_t$ of the conditional process and the residuals $u_{2t}$ of the marginal process. This correlation arises because, although $\eta_t$ is orthogonal to $u_{2t}$ and the past history of $u_{2t}$ ($\eta_t$ is orthogonal to its own past by construction), $u_{2t}$ is not necessarily orthogonal to the past of $u_{1t}$ and hence $(\eta_t, u_{2t})'$ jointly is not a martingale difference sequence (MDS).

Three examples are presented below. They are adapted from Phillips (1988a) and are special cases of the examples appearing in that paper. Three different specifications of the autocorrelation structure of the $u_t$ process are considered while the data-generation process continues to be (31) and (32).

The examples help to integrate and interpret the discussions on weak exogeneity, dynamic modelling, and fully modified estimation. Exogeneity plays an important role in dealing with non-stationary variables. Dynamic regression equations in which the conditioning is on weakly or strongly exogenous variables (for the parameters of interest) provide asymptotically unbiased estimates. Further, inference may be conducted with standard tables. In cases where such conditioning is not possible, improperly conditioned equations lead to inefficient and biased estimates. The full system must therefore be estimated or the non-parametrically modified estimates used. It is seen that fully modified estimation is another way of addressing the issue of the completeness of conditional models for purposes of estimation and inference.

7.10. Examples

7.10.1. Example (Phillips 1988a: 352)

In reduced form, the DGP (31) and (32) is given by

Hence
Thus, using the formula for the conditional expectation of bivariate normal random variables, we have

Defining

and using (48), we obtain

or, alternatively,

where

Finally, substituting for $\varepsilon$

Several features are now evident. By construction, $\eta_t$ is an MDS. Second, again by construction, $\eta_t$ is uncorrelated with $u_{2t}$.¹⁹ From (47), we have that the $u_{2t}$ process is serially uncorrelated both with past $u_{2t}$ and with past $u_{1t}$. It follows that $\eta_t$ and $u_{2t}$ are incoherent (that is, uncorrelated at all lags or frequencies), that the long-run covariance matrix of $[\eta_t, u_{2t}]'$ is diagonal, and that the estimation of a single dynamic equation should provide a fully efficient and unbiased estimate of the vector a.

Looking at the conditional and marginal processes given by (50) and the second equation in (46) respectively, and at the properties identified in the previous paragraph, single-equation least squares on (50) is equivalent to full-information maximum likelihood for estimating $\beta$. The orthogonality of the $\eta_t$ and $u_{2t}$ processes ensures that the joint likelihood function for the system factorizes into the likelihood functions for the marginal and conditional models given by the second equation in (46) and (50) respectively. There are no cross-equation restrictions; the parameter of interest $\beta$ can be estimated and identified from (50) alone; and, recalling the discussion of weak exogeneity in Chapter 1, the marginal process generating $u_{2t}$ need not be modelled when estimating $\beta$.

7.10.2. Example (Phillips 1988a: 355)

where,
Then

The long-run covariance matrix of $(\eta_t, u_{2t})'$ is given by

where $\sigma_{11.2} = \sigma_{11} - \sigma_{12}^2\sigma_{22}^{-1}$. The expression for $\Omega_{\eta 2}$ follows from application of the conditional-expectations formula and from inspection of (53). $\eta_t$ and $u_{2t}$ are again incoherent, and the limit Brownian motions are

where $B_\eta$ and $B_2$ are independent and $B_\eta = B_{1.2}$. Thus, estimating a dynamic single-equation model (the conditional model) provides estimates identical, asymptotically, to those provided by the Phillips-Hansen procedure. Here the conditional model is given by

In error-correction format, we may rewrite (54) as

Equation (54) is the one that must be estimated in order to obtain an asymptotically unbiased estimator of $\beta$. The static regression is augmented in (54) by the terms $\Delta y_{2t}$ and $\Delta y_{2t-1}$. These additional terms are incorporated to reduce or eliminate, in finite samples, the effects of second-order bias, without estimating the full system.

Phillips (1988a) notes that the bias correction term $\delta^+$ for this example is equal to zero since $\Lambda = (\omega_{12}, \omega_{22})'$. However, to obtain fully modified estimates, from (34) $y_{1t}$ needs to be corrected for long-run endogeneity as follows:

The same correction is achieved in the dynamic regression by the two $\Delta y_{2t-i}$ terms in (55). The static regression produces biases by ignoring these corrections.

7.10.3. Example (Phillips 1988a: 356)
We take the process $(\varepsilon_{1t}, \varepsilon_{2t})'$ to be distributed as in Section 7.10.2. Then it may be shown that

The long-run covariance matrix is given by

where $\sigma_{11.2}$ is as defined in Section 7.10.2, and

The Brownian motions $B_\eta$ and $B_2$ are correlated and the single-equation dynamic estimator and the fully modified estimator are no longer equivalent, unless $\theta_{21} = 0$. For the structure of the correlation between $B_\eta$ and $B_2$ (see Phillips 1988a):

where $B_{\eta.2}(r)$ is a univariate Brownian motion process, independent of $B_2(r)$, whose variance equals $\sigma_{11.2}$ less a term that vanishes when $\theta_{21} = 0$. Further,

From (58), setting $\theta_{21}$ equal to zero makes the $B_\eta(r)$ and $B_{\eta.2}(r)$ processes equivalent to each other. Further, $B_{\eta.2}(r)$ then has a variance of $\sigma_{11.2}$ and is in all respects equivalent to the $B_{1.2}(r)$ process given in (37) above. Thus, the $B_\eta(r)$ and $B_{1.2}(r)$ processes are equivalent, and, in accordance with the previous discussion, this equivalence leads to the equivalence of the single-equation dynamic estimator and the fully modified estimator.

It should be noted that $\theta_{21} \neq 0$ also implies that the $\Gamma$-type terms (see Section 7.8) are important in the long-run variance matrix for the $(\eta_t, u_{2t})'$ process. This is just another way of saying that the past of the process is important (and so, in the $(\eta_t, u_{2t})'$ construction, we have not achieved a martingale difference sequence). Thus, the equivalence of dynamic single-equation estimators and fully modified estimators may also be assessed by looking for the presence of $\Gamma$-type terms in the long-run variance matrix. These are the terms (for example, the first term in (59)) that give rise to biases in the single-equation dynamic estimates of the co-integrating vector.

The necessary and sufficient condition for non-equivalence has a natural interpretation in the language of an earlier literature on dynamic
modelling. It is evident that the condition $\theta_{21} \neq 0$ violates weak exogeneity²⁰ as may be verified from (57); and once again, it may be seen that the issues of fully modified estimation and dynamic specification are closely related. This example forms the basis for the simulation exercise discussed in the final sub-section.

7.10.4. Simulation Example (Phillips and Hansen 1990: 116)

The data-generation process for their simulation study is given by

The design of the experiment consisted in allowing $\sigma_{21}$ and $\theta_{21}$ to vary. Thus, four values of $\sigma_{21}$ and three values of $\theta_{21}$ were used. The values of $\sigma_{21}$ considered were -0.8, -0.4, 0.4, and 0.8, and the three values of the moving-average parameter $\theta_{21}$ were 0.8, 0.4, and 0.0.²¹ $\beta$ was set equal to 2 for all twelve combinations of the values of $\sigma_{21}$ and $\theta_{21}$. The aim was to calculate and compare the distributions of estimators and t-statistics for the co-integrating parameter obtained by OLS, single-equation dynamic, and fully modified methods.

For the fully modified method, Phillips and Hansen used a Bartlett triangular window of lag length 5 and the OLS residuals $\hat{u}_{1t}$ to calculate non-parametric estimates of $\Lambda$, $\Omega$ and hence of $\delta^+$. We shall denote these estimates by $\hat{\Lambda}$, $\hat{\Omega}$, and $\hat{\delta}^+$. The OLS t-statistic was estimated by using $\hat{\omega}_{11}$ (the (1,1) element from the non-parametrically estimated long-run variance matrix) as an estimate of the standard error. The dynamic equation regressed $y_{1t}$ on $(y_{2t}, \Delta y_{2t}, \Delta y_{2t-1}, \Delta y_{2t-2}, \Delta y_{1t-1}, \Delta y_{1t-2})$, using 30,000 replications for each simulation (that is, for each pair of values of ($\theta_{21}$,
equation is only an approximation. The results are presented in Tables 7.11 and 7.12.

Table 7.11 presents the Monte Carlo means and standard deviations of $(\hat{\beta} - \beta)$ for the OLS, dynamic (D), and fully modified (FM) estimators. It is clear from this table that, in general, OLS gives the most heavily biased estimator. However, there seems to be little to choose between the fully modified estimator (FM) and the dynamic estimator (D). No consistent pattern of superiority (measured in terms of lower absolute values of biases and standard errors) appears to be present.

Consider first the cases where $\theta_{21} \neq 0$. For a value of $\sigma_{21} = -0.8$, D is more biased than FM when $\theta_{21} = 0.8$ but is less biased when $\theta_{21} = 0.4$. When $\sigma_{21} = 0.8$, the opposite is true. For the two intermediate values of $\sigma_{21}$ considered, the bias in FM is less than or equal to the bias in D. Thus, although FM provides lower biases in a larger number of cases than D, the evidence is mixed. For the cases where $\theta_{21} = 0$, and the single-equation and fully modified estimators have distributions that are asymptotically equivalent, D out-performs FM on three out of four occasions. In the only case where FM provides a lower bias ($\sigma_{21} = 0.4$), the D and FM biases are 0.009 and 0.004 (in absolute value) respectively. This comparison is therefore made in the context of both D and FM performing well in providing low biases.

TABLE 7.11. Mean (standard deviation) of $(\hat{\beta} - \beta)$

                      $\theta_{21} = 0.8$   $\theta_{21} = 0.4$   $\theta_{21} = 0.0$
$\sigma_{21} = -0.8$
  OLS                 -0.137 (0.125)        -0.090 (0.089)        -0.055 (0.061)
  D                   -0.062 (0.106)        -0.021 (0.066)        -0.003 (0.041)
  FM                  -0.025 (0.127)        -0.028 (0.079)        -0.025 (0.052)
$\sigma_{21} = -0.4$
  OLS                 -0.067 (0.081)        -0.057 (0.079)        -0.040 (0.061)
  D                   -0.051 (0.086)        -0.030 (0.077)        -0.007 (0.060)
  FM                  -0.042 (0.094)        -0.027 (0.081)        -0.015 (0.063)
$\sigma_{21} = 0.4$
  OLS                 -0.024 (0.040)        -0.020 (0.046)        -0.011 (0.050)
  D                   -0.023 (0.046)        -0.019 (0.053)        -0.009 (0.060)
  FM                  -0.023 (0.048)        -0.012 (0.052)         0.004 (0.060)
$\sigma_{21} = 0.8$
  OLS                 -0.015 (0.025)        -0.010 (0.028)        -0.004 (0.036)
  D                   -0.009 (0.024)        -0.008 (0.030)        -0.005 (0.039)
  FM                  -0.016 (0.028)        -0.005 (0.030)         0.015 (0.043)

Reproduced from Phillips and Hansen (1990).
Phillips and Hansen also estimate the probability density functions for the D and FM estimators. When $\theta_{21} = 0.8$, the density function of FM is better centred (at zero) than D, while the opposite is true when $\theta_{21} = 0$.²²

²² Phillips and Hansen rationalize this behaviour by stating that 'when this condition [$\theta_{21} = 0$] does hold, the parametric nature of the [dynamic] method gives it a natural advantage over our semi-parametric approach' (1990: 119).

Based then on a consideration of biases alone, there do not appear to be strong grounds for preferring the FM over the D estimator. This observation must be qualified by two cautionary remarks. First, the experiment considered here is very limited, and a more extensive simulation exercise might reveal the superiority of FM over D. Second, both FM (in finite samples) and D could be out-performed by full-information maximum likelihood estimation of the two-equation system. Such a comparison is undertaken in the next chapter.

The argument in favour of using fully modified methods is stronger when one considers the evidence presented in Table 7.12, wherein the means and standard deviations of the distributions of the t-statistics are tabulated. Here the conclusions are more nearly unambiguous. When $\theta_{21} \neq 0$, the t-statistics from D are more heavily biased than those obtained from FM (in all but one case) and have higher standard errors. The FM t-statistic comes much closer to achieving a distribution that is roughly normal than does the t-statistic from D. As noted by Phillips and Hansen, the relatively inferior behaviour of the dynamic t-statistic may have been caused by the inclusion of an insufficient number of lag terms in the D regression. When $\theta_{21} = 0$, the dynamic t-statistic is substantially less biased (in all but one case) than the FM t-statistic, but its variance is much higher.

TABLE 7.12. Mean (standard deviation) of the t-statistics

                      $\theta_{21} = 0.8$   $\theta_{21} = 0.4$   $\theta_{21} = 0.0$
$\sigma_{21} = -0.8$
  OLS                 -1.616 (1.268)        -1.240 (1.105)        -0.930 (1.00)
  D                   -1.259 (2.040)        -0.563 (1.701)        -0.003 (1.40)
  FM                  -0.388 (1.432)        -0.449 (1.092)        -0.025 (0.896)
$\sigma_{21} = -0.4$
  OLS                 -1.156 (1.32)         -0.986 (1.25)         -0.754 (1.149)
  D                   -1.058 (1.69)         -0.636 (1.57)         -0.163 (1.388)
  FM                  -0.729 (1.49)         -0.516 (1.35)         -0.335 (1.193)
$\sigma_{21} = 0.4$
  OLS                 -0.711 (1.19)         -0.520 (1.21)         -0.267 (1.24)
  D                   -0.664 (1.29)         -0.478 (1.34)         -0.213 (1.37)
  FM                  -0.606 (1.26)         -0.267 (1.30)          0.096 (1.36)
$\sigma_{21} = 0.8$
  OLS                 -0.575 (0.955)        -0.302 (0.979)        -0.098 (1.04)
  D                   -0.445 (1.15)         -0.339 (1.25)         -0.184 (1.36)
  FM                  -0.519 (0.922)        -0.102 (0.962)         0.418 (1.12)

Reproduced from Phillips and Hansen (1990).

Since the use of the normal distribution is a considerable simplification and the bias comparisons are at best ambiguous for the dynamic estimates (when $\theta_{21} \neq 0$), there may be reasons to prefer the FM estimator over the D estimator when only long-run parameters are of interest. This recommendation must be qualified by noting that a more richly parameterized dynamic model may have provided lower biases and a distribution of the t-statistic closer to the normal distribution. Performance with a negative MA parameter is also important; some early studies have suggested that the FM estimator performs less well in such cases. Both these qualifications point to the need for more extensive simulation studies.

What is clear from all the studies considered so far is the poor performance of unmodified estimates derived from static regressions. Some form of incorporation of the dynamic structure of the data-generation process, either by means of a non-parametric correction of the static regression estimates or by running dynamic regressions, is necessary for inference. While super-consistency theorems show that I(0) terms may be ignored asymptotically in regressions with I(1) variables, these asymptotic results have little bearing on sample sizes common in econometrics, where I(0) terms are important and need to be accommodated.

The other important issue raised by these examples is the weak exogeneity of the conditioning variables for the parameters of interest. Reconsider the DGP in (31) and (32) where $u_t$ is a first-order autoregressive process, so that a finite lag length dynamic model is valid:
where
Then or
in terms of I(0) variables. Let $E[\varepsilon_{1t} \mid \varepsilon_{2t}] = \sigma_{12}\sigma_{22}^{-1}\varepsilon_{2t} = \gamma\varepsilon_{2t}$, so
Further, assume that $\theta = (\beta^* : \alpha : \beta : \delta)'$ denotes the parameters of interest, and indeed that $\theta$ is both constant and invariant to regime shifts affecting $\Delta y_{2t}$. Nevertheless, although (61) appears to define a valid conditional model for all values of $\theta$, if $c_{21} \neq 0$ then $\Delta y_{2t}$ is not weakly exogenous for $\theta$. Because of the resulting non-diagonality of the long-run covariance matrix, this loss of weak exogeneity can have a detrimental impact on the bias and efficiency of the least-squares estimator of $\theta$ in finite samples.

In fact, $c_{21} \neq 0$ jointly violates the weak and strong exogeneity of $y_{2t}$ for $\theta$. To sort out which aspect is dominant, three cases merit comment: the following implications are based on Monte Carlo studies of (61). First, even if $\gamma = 0$, so that there is no simultaneity and $\beta^* = \beta$, the previous conclusion holds. Second, if $\gamma \neq 0$ whereas $c_{21} = 0$, $y_{2t}$ is strongly exogenous for $\theta$ and no problems result. Finally, if strong exogeneity alone is violated, but weak exogeneity holds, as would happen if $\Delta y_{1t-1}$ directly affected $\Delta y_{2t}$ when $c_{21} = 0$, there are again no serious bias effects. Thus, the presence of the co-integrating vector in another equation appears to be the primary determinant of the finite-sample bias. Consequently, co-integration forces a renewed emphasis on systems methods if potentially misleading inferences are to be avoided. That is the focus of Chapter 8.
Appendix: Covariance Matrices

Consider the DGP in (A1) where $y_t$ is the stationary first-order vector autoregressive process:

$$y_t = Ay_{t-1} + \varepsilon_t \quad \text{where } \varepsilon_t \sim \mathrm{IN}(0, \Sigma), \qquad (A1)$$

and all the latent roots of $A$ lie inside the unit circle. There are three distinct covariance matrices relevant to the analysis, as follows, noting that $E(y_t) = 0$.

(a) The conditional (or contemporaneous) covariance matrix

(b) The unconditional covariance matrix

obtained as shown by substituting (A1) for $y_t$, multiplying out, and using stationarity. The elements of $G$ can be obtained by vectoring (A3) and solving.

(c) The long-run covariance matrix

Consider the finite sample expression, analogous to $E[T^{-1}S_T^2]$ in the scalar case:
Rewriting $\Omega$ as $(I - A)(I - A)^{-1}G + \Lambda + \Lambda' + G(I - A')^{-1}(I - A') - G$, on simplifying we have that:

However, a more convenient form of $\Omega$, directly related to the spectral density at the origin, results from (A3):

(A5)

so that, on pre-multiplying $\Sigma$ by $(I - A)^{-1}$ and post-multiplying by $(I - A')^{-1}$ and using (A4):
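The three matrices can be checked numerically for a small VAR(1). The sketch below assumes the standard results for (A1): the unconditional covariance solves $G = AGA' + \Sigma$, and the long-run covariance is $\Omega = G + \Lambda + \Lambda'$ with $\Lambda = \sum_{j \ge 1} E(y_t y_{t-j}') = A(I - A)^{-1}G$. The function name `var1_covariances` and the numerical values of $A$ and $\Sigma$ are illustrative choices, not taken from the text.

```python
import numpy as np

def var1_covariances(A, Sigma):
    """Return (G, Omega) for y_t = A y_{t-1} + e_t with e_t ~ IN(0, Sigma).

    Assumes all latent roots of A lie inside the unit circle.
    """
    n = A.shape[0]
    # Unconditional covariance: G = A G A' + Sigma, solved by vectoring,
    # vec(G) = (I - A kron A)^{-1} vec(Sigma), with column-major vec.
    vec_sigma = Sigma.reshape(-1, order="F")
    vec_g = np.linalg.solve(np.eye(n * n) - np.kron(A, A), vec_sigma)
    G = vec_g.reshape(n, n, order="F")
    # Long-run covariance: Omega = (I - A)^{-1} Sigma (I - A')^{-1}
    inv = np.linalg.inv(np.eye(n) - A)
    Omega = inv @ Sigma @ inv.T
    return G, Omega

# Illustrative (hypothetical) parameter values
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

G, Omega = var1_covariances(A, Sigma)

# Alternative route: Omega = G + Lambda + Lambda', where
# Lambda = sum_{j>=1} A^j G = A (I - A)^{-1} G
Lam = A @ np.linalg.inv(np.eye(2) - A) @ G
Omega_alt = G + Lam + Lam.T
```

Both routes to $\Omega$ agree in this example, which is the equality that pre- and post-multiplying $\Sigma$ by $(I - A)^{-1}$ and $(I - A')^{-1}$ is meant to deliver.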
Similar principles apply to deriving these three matrices in more general weakly stationary processes. As a second example, if (A1) is altered to the first-order moving average:

then, using $\mathcal{I}_{t-1}$ to denote available information:

and:

(A10)

Following Phillips and Durlauf (1986), consider a general I(1) vector process:

and $v_t$ is a weakly stationary stochastic process with unconditional covariance $E(v_t v_t') = G$ and long-run covariance $\Omega = G + \Lambda + \Lambda'$. From (A4), $\Lambda$ can be written as:

Extending the analysis in Chapter 3 to allow for vector processes, and in the appendix to Chapter 6 to allow for non-IID errors, $x_T/\sqrt{T}$ converges to the vector Brownian motion $\mathrm{BM}(\Omega)$:

Then:

These vector formulae could be standardized using $V(r) = K'B(r)$ where $\Omega^{-1} = KK'$.
8

Co-integration in Systems of Equations

We have so far considered only single-equation estimation and testing. While the estimation of single equations is convenient and often efficient, for some purposes only estimation of a system provides sufficient information. This is true, for example, when we consider the estimation of multiple co-integrating vectors, and inference about the number of such vectors. Traditionally, systems have been estimated when there is a failure of weak exogeneity in a single equation, and these considerations also apply here. This chapter examines methods of finding the co-integrating rank, considers circumstances when dynamic single-equation methods will be asymptotically equivalent to systems methods, and provides examples to illustrate these issues. Asymptotic distributions are also derived.

In earlier chapters, we investigated data series containing unit roots in their scalar autoregressive representations (i.e. their marginal distributions), and denoted such series as I(1). In this chapter we will consider a vector time series of dimension $n$, $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$ (generalizing the analysis to any number of variables), where $x_t$ is I(1) so that $\Delta x_t$ is I(0). Generally, any arbitrary linear combination of the elements of $x_t$, say $w_t = \alpha' x_t$, will also be I(1), and such linear combinations imply or give rise to spurious regressions. However, there may exist vectors $\alpha_i$ such that $\alpha_i' x_t \sim$ I(0), in which case the relevant components of $x_t$ are co-integrated.

In the simplest bivariate case, as we have seen, we may take $x_t = (y_t, z_t)'$, where $y_t$ and $z_t$ are individually I(1). The arbitrary linear combination $(y_t - \kappa z_t)$ will also be I(1), but if there exists a value $\kappa_1$ of $\kappa$ such that $(y_t - \kappa_1 z_t) \sim$ I(0), then $y_t$ and $z_t$ are co-integrated. Letting $\alpha_1' = (1, -\kappa_1)$ be the co-integrating vector in this case, $\alpha_1$ must be unique, since for any other value $\kappa^*$, $y_t - \kappa^* z_t = y_t - \kappa_1 z_t + (\kappa_1 - \kappa^*)z_t = w_t + (\kappa_1 - \kappa^*)z_t$, which is the sum of an I(0) process and an I(1) process, and therefore I(1) unless $\kappa_1 = \kappa^*$.
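The uniqueness argument for the bivariate case can be illustrated by simulation. The following is a minimal sketch under assumed values (sample size, seed, $\kappa_1 = 2$, and Gaussian errors are all illustrative, not from the text): $z_t$ is a random walk, $y_t = \kappa_1 z_t + w_t$ with $w_t$ white noise, and the sample variance of $y_t - \kappa z_t$ is compared at $\kappa = \kappa_1$ and at another value.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, illustrative only
T = 5000
kappa1 = 2.0                      # hypothetical co-integrating coefficient

z = np.cumsum(rng.standard_normal(T))   # z_t is I(1): a pure random walk
w = rng.standard_normal(T)              # w_t is I(0): white noise
y = kappa1 * z + w                      # so y_t - kappa1 * z_t = w_t

def combo_variance(kappa):
    """Sample variance of the linear combination y_t - kappa * z_t."""
    return np.var(y - kappa * z)

v_coint = combo_variance(kappa1)        # variance of the I(0) combination
v_other = combo_variance(kappa1 + 0.5)  # w_t + (kappa1 - kappa*) z_t, which is I(1)
```

Any $\kappa^* \neq \kappa_1$ leaves a multiple of the random walk $z_t$ in the combination, so its sample variance is dominated by the I(1) component and grows with $T$, whereas the variance at $\kappa_1$ stays close to that of $w_t$.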
For $n$ elements in $x_t \sim$ I(1), there can be, at most, $n - 1$ co-integrating combinations.¹ Hence $0 \le r \le n - 1$ and the $r$ vectors may be gathered in an $n \times r$ matrix $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_r]$. Outside the bivariate model, $n > 2$ and the co-integrating matrix is no longer unique in the absence of prior information. We noted in Chapter 2 the related issue for stationary equilibria, only some of which need correspond to substantive economic hypotheses.

A simple case of non-uniqueness occurs when subsets of the $x_{jt}$ are co-integrated. In fact, for any non-singular $r \times r$ matrix $F$, $w_t^* = F\alpha' x_t = \alpha^{*\prime} x_t$ is also I(0). This last result shows that linear combinations of the co-integrating vectors themselves form co-integrating combinations. Since $\alpha_i' x_t$ and $\alpha_j' x_t$ are I(0), so is any linear combination thereof. In the terminology of linear algebra, the dimension of the co-integrating space (given by the rank of the matrix $\alpha$) is $r$ and the columns of $\alpha$ form the basis vectors of this space. Pre-multiplying $\alpha'$ by an $r \times r$ non-singular matrix $F$ does not alter either the co-integrating space or its dimensions. Therefore, strictly speaking, estimating the co-integrating matrix $\alpha$ essentially involves deriving the basis vectors. The matrix $\alpha$ is non-unique in the absence of prior information.

A brief justification may be offered for focusing on the open interval $(0, n)$ of $\mathbb{N}$, as the domain of values for $r$. When $r = n$, $x_t$ must be I(0), as shown in Section 8.1 below. We therefore exclude this case when we know that $x_t$ is I(1) and only consider stochastic processes where variables are marginally I(1). Thus, $n - r > 0$, and we can re-express the process $\{x_t\}$ in terms of I(0) processes, using the $r$ co-integrating relationships and $n - r$ first differences of the process. The case of $r = 0$ is a trivial one as it implies the absence of even a single co-integrating vector and suggests respecification of the system in differences.

As we saw in Chapter 5, Engle and Granger (1987) established an isomorphism between co-integration and error-correction models. In order to examine co-integration in systems of equations, we will derive that result, formulating the system in ECM form, in some detail below, starting this time from the moving-average representation of the process. From that system, a maximum likelihood estimator (MLE) of $r$, the number of co-integrating relationships, will be obtained based on a method proposed by Johansen (1988). This will in turn enable us to test hypotheses concerning the dimension of the co-integration space, and establish a 'central value' of $\alpha$.

1 A proof of this result is given in Sect. 8.1.
8.1. Co-integration and Error Correction

We now return to the representation of a co-integrated system in autoregressive or (equivalently) in error-correction form. When $\{\Delta x_t\}$ is a stationary process (possibly) with drift, we can express it as a multivariate moving average using the Wold (1954) decomposition theorem:

where $\varepsilon_t \sim \mathrm{IID}(0, \Omega)$; $L$ is again the lag operator, and $C(L)$ is a polynomial matrix in $L$ given by

The cumulative or total effect from $C(L)$ is given by

where the $C_i$ again obey an exponential decay condition of the form discussed in Chapter 5. Using $C(1)$, we can rewrite $C(L)$ as

$$C(L) = C(1) + (1 - L)C^*(L)$$

where $C^*(L) = \sum_{i=0}^{\infty} C_i^* L^i$ and $C_i^* = -\sum_{j=i+1}^{\infty} C_j$ so that $C_0^* = I_n - C(1)$. Note that the existence of these matrices is again guaranteed by the exponential decay condition. Thus, from (1),

or

where $\mu = C(1)m$.

The key assumptions needed to derive the autoregressive representation of the process are given below. As in Chapter 5, the proof follows Johansen (1991a).

ASSUMPTION B1. The characteristic polynomial,

has roots either equal to or strictly greater than 1; that is, $|C(z)| = 0$ implies that either $|z| > 1$ or $z = 1$.

ASSUMPTION B2. The matrix $C(1)$ has reduced rank $n - r$ and is therefore expressible as the product of two $n \times (n - r)$ matrices $\phi$ and $\eta$, where $\phi$ and $\eta$ have rank $n - r$. Thus, $C(1) = \phi\eta'$.
ASSUMPTION B3. The $r \times r$ matrix $\phi_\perp' C^*(1) \eta_\perp$ has full rank $r$.²

Assumptions B1-B3 are analogous to Assumptions A1-A3 in Chapter 5. Given our results on $C(1)$ in Chapter 5, it is natural to require that $C(1)$ be of reduced rank and have rank $n - r$. Also, $r = n$ implies that $C(1)$ is identically the null matrix. Thus, from (3), $(\Delta x_t - \mu) = C^*(L)\Delta\varepsilon_t$, which implies, after integration, that $x_t$ is integrated (at most) of order 0. Assumption B3 then rules out the possibility that $C^*(L)$ has a root on the unit circle, so $x_t$ cannot be integrated of order $-1$. In either case, we have a contradiction of the assumption that the components of $x_t$ are I(1).

To derive the autoregressive representation, multiply (3) by $\bar{\phi}'$ and $\phi_\perp'$ respectively to obtain the equations

using the decomposition $C(1) = \phi\eta'$ and the result that

The matrix $C(1)$ is not invertible and the system given by (4a) and (4b) therefore cannot be inverted directly to express the $x_{it}$ in terms of the $\varepsilon_{it}$. An invertible system is obtained by defining, as in Chapter 5, two new variables, $w_t = (\eta'\eta)^{-1}\eta'\varepsilon_t$ and $y_t = (\eta_\perp'\eta_\perp)^{-1}\eta_\perp'\Delta\varepsilon_t$. Repeating the steps used in Chapter 5, the matrices $\bar{\eta}$ and $\bar{\eta}_\perp$ are defined as $\eta(\eta'\eta)^{-1}$ and $\eta_\perp(\eta_\perp'\eta_\perp)^{-1}$ respectively. Next, again as in Chapter 5,

Substituting into (4a) and (4b) gives

We therefore have

with

For $z = 1$, this matrix has determinant

2 The orthogonal complement of a matrix is defined in Sect. 5.3.1. Using this definition, $\phi_\perp$ and $\eta_\perp$ are $n \times r$ dimensional matrices with rank $r$.
which is non-zero, using Assumptions B2 and B3. Thus, $B(z)$ does not have a root at 1. For $z \neq 1$,

where (7) may be shown by substituting for $C^*(z)$ in $B(z)$ in terms of $C(z)$ and $C(1) = \phi\eta'$, and using the orthogonality condition $\phi_\perp'\phi = \eta_\perp'\eta = 0_{r \times (n-r)}$. For $z \neq 1$, from (7),

Thus for $z \neq 1$, $|B(z)| = 0$ if and only if $|C(z)| = 0$. Excluding $z = 1$, by Assumption B1 the only remaining roots of this determinant lie outside the unit circle.

All the roots of $|B(z)| = 0$ are therefore outside the unit disk, and the system defined by (5a) and (5b) is invertible. Thus, from (6),

Also from (6), note that

and, using the formula for inversion of partitioned matrices,

From the definition of $\Delta\varepsilon_t$,

where $F(L) = [\bar{\eta}(1 - L), \ldots$

Integrating (9) gives

where $x_0$ is the constant of integration. To derive the value of $F(1)$, note that

Substituting for $(B(1))^{-1}$ from (8) gives $F(1) = \eta_\perp(\phi_\perp' C^*(1)\eta_\perp)^{-1}\phi_\perp'$. Thus, recalling that $\mu = C(1)m = (\phi\eta')m$, $F(1)\mu = 0_{n \times 1}$. The autoregressive representation, in its final form, is therefore given by
Several features of the derivations above are noteworthy, particularly with respect to the $F(1)$ matrix. First, $F(1)C(1) = C(1)F(1) = 0_n$. This result follows from substituting $\eta_\perp(\phi_\perp' C^*(1)\eta_\perp)^{-1}\phi_\perp'$ for $F(1)$ and $\phi\eta'$ for $C(1)$ and using the orthogonality conditions. This re-emphasizes the duality, first mentioned in Chapter 5, between the impact matrix in the MA representation, given by $C(1)$, and the impact matrix in the AR representation, given here by $F(1)$. The null space of the former is the range space of the latter and vice versa.

Second, the isomorphism of $F(1)$ with the $\gamma\alpha'$ matrix in Chapter 5 can be demonstrated easily. Note that

Both $\eta_\perp(\phi_\perp' C^*(1)\eta_\perp)^{-1}$ and $\phi_\perp$ are matrices of rank $r$ and dimension $n \times r$. Thus, redefining $\phi_\perp'$ as $\alpha'$ and $\eta_\perp(\phi_\perp' C^*(1)\eta_\perp)^{-1}$ as $\gamma$, we have $F(1) = \gamma\alpha'$, which is an $n \times n$ matrix with rank $r$ and is isomorphic to $\pi$. It is natural to define $\phi_\perp' x_t$ ($\alpha' x_t$ in Chapter 5) as the co-integrated combinations of the $x_{it}$. Integrating (4b) shows that $\phi_\perp' x_t$ does not contain an integrated component of the form $\sum_{i=1}^{t} \varepsilon_i$. Further, by the orthogonality of $\mu$ with $\phi_\perp$, the co-integrating combinations do not contain a trend. Both these results match exactly the corresponding results on $\alpha' x_t$ in Chapter 5.

Third, if $B(L)$ were not of full rank, it would be possible to extract another unit root in the representation given by (6), and the system would be I(0) instead of I(1), as assumed originally. The importance of Assumption B3 is now clear.

Finally, using the result that the rank of $F(1)$ is $r$, it is possible to rewrite the model in error-correction form as

where $F(1)$, like $\pi$ in Chapter 5, is a matrix of rank $r$ and can therefore be decomposed into two $n \times r$ matrices, each of rank $r$. The steps involved in going from the final autoregressive form of the system to the ECM form are given in (5.25)-(5.27), with $\pi$ playing the role of $F(1)$.

Sections 5.3 and 8.1 have demonstrated the isomorphism of the moving-average, error-correction, and autoregressive representations of co-integrated processes. The next section returns to the autoregressive representation and relates this to the method used by Johansen (1988) which tests the rank of $\pi = \gamma\alpha'$, since, if there are $r$ co-integrating vectors and $\gamma\alpha' = \pi$, then $\mathrm{rank}(\pi) = r$. The non-uniqueness of these vectors (in the absence of a priori information) is easily seen:

for all $r \times r$ non-singular matrices $P$. However, since $\mathrm{rank}(\alpha) = r$, we can normalize $\alpha^*$ (perhaps after suitable rearrangement of the variables) such that $\alpha^{*\prime} = (I_r : \beta')$, and so $\alpha^{*\prime} x_t = x_{at} + \beta' x_{bt}$ where $x_t' = (x_{at}' : x_{bt}')$.
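The normalization $\alpha^{*\prime} = (I_r : \beta')$ amounts to rotating $\alpha$ by the inverse of its leading $r \times r$ block and applying the compensating rotation to $\gamma$, so that the product $\gamma\alpha'$ is unchanged. A minimal sketch follows; the helper name `normalize_alpha` and the numerical matrices are hypothetical, and the leading block of $\alpha$ is assumed non-singular (otherwise the variables would first need to be reordered).

```python
import numpy as np

def normalize_alpha(alpha, gamma):
    """Rotate (alpha, gamma) so that alpha_star' = (I_r : beta').

    alpha and gamma are n x r. The top r x r block of alpha is assumed
    non-singular. The product gamma @ alpha.T is left unchanged.
    """
    r = alpha.shape[1]
    top = alpha[:r, :]                        # leading r x r block of alpha
    alpha_star = alpha @ np.linalg.inv(top)   # first r rows become I_r
    gamma_star = gamma @ top.T                # compensating rotation of gamma
    return alpha_star, gamma_star

# Illustrative values with n = 3 and r = 2
alpha = np.array([[2.0, 0.0],
                  [1.0, 1.0],
                  [-1.0, 3.0]])
gamma = np.array([[-0.5, 0.1],
                  [0.2, -0.3],
                  [0.0, 0.4]])

alpha_star, gamma_star = normalize_alpha(alpha, gamma)
beta = alpha_star[2:, :].T    # the free coefficients beta' in (I_r : beta')
pi_before = gamma @ alpha.T
pi_after = gamma_star @ alpha_star.T
```

Here the rotation matrix $P$ of the text corresponds to the inverse of the transposed leading block of $\alpha$; only the space spanned by the columns of $\alpha$ is identified, and the normalization simply picks one convenient basis for it.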
An important point for inference, given (10), is that the ECM terms $\alpha' x_{t-k}$ will generally enter more than one equation. This will violate weak exogeneity when $\alpha$ is a parameter of interest, since the ECMs will be present in some of the other marginal distributions, and will therefore necessitate joint estimation for efficiency as discussed in Chapter 7 (see e.g. Phillips 1991, and Phillips and Hansen 1990). Hence a necessary condition for the use of single-equation methods to be appropriate in the analysis of co-integrated systems is that the relevant ECM terms enter only the equation under study; this is clearly not a sufficient condition, since it is possible that there can be links between other parameters.

As an illustration of (10), consider the case where $n = 2$ and $r = 1$. Let $\alpha' = (1, -\kappa)$ and $x_0 = 0$, so that the respective systems become

(11')

The form in (11) is the 'canonical' representation in I(0) space, and Phillips (1991) focuses on estimation of this system. When $E(u_{1t}u_{2t}) \neq 0$, a 'simultaneity problem' is present, but this can be dealt with by the inclusion of $\Delta x_{2t}$ as a regressor in the first equation of (11). The functional central-limit theorems for Wiener processes noted in earlier chapters apply despite the serial dependence in $u_t = [u_{1t}, u_{2t}]'$, and direct estimation of $\kappa$ in the first equation of (11) can be seen as the method originally proposed by Engle and Granger (1987). Inference must, however, allow for the serial dependence in $u_t$.

The latter system, (11'), highlights the 'structural' form. At least one of $\gamma_1$ or $\gamma_2$ must be non-zero, since otherwise the system can be expressed in terms of differenced variables alone. Weak exogeneity is violated by (among other possibilities) $\gamma_1\gamma_2 \neq 0$. Since we are unlikely to know a priori which other equations are influenced by any given ECM, we turn now to a method of estimating the co-integrating rank $r$ of a system, which will also allow tests of this aspect of weak exogeneity.
8.2. Estimating Co-integrating Vectors in Systems

Consider the linear system in (10) rewritten as

$$\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \varepsilon_t \qquad (12)$$

where, for simplicity, we have excluded deterministic terms such as trends or constants. We shall return to a consideration of these in Section 8.5. In general, the number of co-integrating vectors will be unknown in empirical modelling, and must first be determined from the data. This step is important, because both under- and over-estimation of $r$ have potentially serious consequences for estimation and inference. Under-estimation implies the omission of empirically relevant error-correction terms, with these omitted terms being relegated to $\varepsilon_t$. Over-estimation implies that the distributions of statistics will be non-standard. This may be demonstrated by inspection of (12). If $\pi$ is correctly specified, all the variables in (12) are I(0) and standard distributional results apply. However, $\pi x_{t-k}$ will not be I(0) if the matrix $\alpha$ contains vectors $\alpha_i$, say, such that $\alpha_i' x_{t-k}$ is not a co-integrating combination and is therefore I(1). The vector $\pi x_{t-k}$ will have a mixture of I(0) and I(1) terms corresponding to the correct and incorrect (or over-estimated) co-integrating vectors respectively. Incorrect inferences will result from the use of conventional critical values in tests. We will see later that this may also have an adverse effect on forecasting accuracy.

Once $r$ is known, we can proceed to estimate $\alpha$ and $\gamma$, noting that non-singular linear combinations of these matrices provide equivalent representations. Indeed, $(\alpha : \gamma)$ is an over-parameterization of $\pi$, so only the dimension of the co-integrating space can be established directly.

A test for the null hypothesis that there are $r$ co-integrating vectors can be based on the maximum likelihood approach proposed by Johansen (1988).
The test is equivalent to testing whether $\pi = \gamma\alpha'$, where $\alpha$ and $\gamma$ are $n \times r$; hence it is a test of the hypothesis that $\pi$ has less than full rank.

We emphasize that, of the three distinct cases, (i) $r = n$, (ii) $r = 0$, and (iii) $0 < r < n$, only case (iii) will be considered formally. We have already shown that case (i) implies that all the variables in $x_t$ are I(0) and would only be of interest if our initial assumption, that $x_t$ is I(1), were incorrect. In case (ii), $\pi = 0$ and the system ought to be respecified in differences to achieve stationarity. We can potentially cover this case as an extreme of case (iii).

For $0 < r < n$, under the assumptions that (12) is the DGP, that all coefficient matrices are constant, that $x_{1-k}, \ldots, x_0$ are given and that³

$$\varepsilon_t \sim \mathrm{IN}(0, \Omega),$$

the log-likelihood function is derived from the multivariate normal distribution:⁴

3 Phillips and Durlauf (1986) derive the limiting distribution of the least-squares estimator of (the equivalent of) $\pi$, allowing for more general error processes.
4 Note that we use the upper-case $\Pi$ for the ratio of the circumference of a circle to its diameter, as opposed to the lower-case $\pi$ defined earlier as the matrix product $\gamma\alpha'$.

The first step is to concentrate $L(\cdot)$ with respect to $\Omega$, which involves no new considerations, and yields the conventional result that $\hat{\Omega} = T^{-1}\sum_{t=1}^{T} \hat{\varepsilon}_t \hat{\varepsilon}_t'$. Next, we remove the known I(0) variables from (12) to focus on the matrix of interest $\pi$, which requires concentrating $L(\cdot)$ with respect to $(D_1, \ldots, D_{k-1})$. To do so, since the $\{D_i\}$ are unrestricted, we can partial out the effects of $(\Delta x_{t-1}, \ldots, \Delta x_{t-k+1})$ from both $\Delta x_t$ and $x_{t-k}$ by regression, to obtain residuals $R_{0t}$ and $R_{kt}$ respectively. Let $q_t = (\Delta x_{t-1}', \ldots, \Delta x_{t-k+1}')'$; then

The concentrated likelihood function $L^*(\pi)$ now depends only on $\{R_{0t}, R_{kt}\}$ and takes the form

Next, we compute the second-moment matrices of all of these residuals and their cross-products, $S_{00}$, $S_{0k}$, $S_{k0}$, $S_{kk}$, where

$$S_{ij} = T^{-1}\sum_{t=1}^{T} R_{it}R_{jt}', \qquad i, j = 0, k. \qquad (19)$$

Consequently, from (18),

If $\pi$ were unrestricted, a conventional regression estimator would result. However, we are interested in the class of solutions that result from the imposition of the restriction that $\pi = \gamma\alpha'$. Hence, from (20),

Next, concentrate $L^*(\gamma, \alpha)$ with respect to $\gamma$, which will deliver an expression for the MLE of $\gamma$ as a function of $\alpha$, and yields a further concentrated likelihood function which depends only on $\alpha$. Once the MLE of $\alpha$ is obtained, we can solve backwards for estimates of all the other unknown parameters as functions of the MLE of $\alpha$. Thus, from (21),

Substituting $\hat{\gamma}$ into (21) yields $L^{**}(\alpha)$:

At first sight, differentiating $L^{**}(\alpha)$ with respect to $\alpha$ looks formidable, but in fact the algebra involved is close to that underlying the well-known LIML estimator for a single equation from a simultaneous system; both depend on reduced-rank restrictions being imposed. In order to solve the problem, we apply partitioned inversion results to (23) and obtain

Then maximizing $L^{**}(\alpha)$ with respect to $\alpha$ corresponds to minimizing the generalized variance ratio, noting that $|S_{00}|$ is a constant. To locate that minimum, we proceed as with LIML and impose the normalization that $\alpha' S_{kk}\alpha = I$. The MLE now requires that we minimize, with respect to $\alpha$,
This involves finding the saddle-point of the Lagrangian, where

symmetric, positive-definite matrix for finite $T$, its inverse can be factorized as

where $G$ is non-singular. Substituting this expression into (27) produces a conventional eigenvalue problem:

In deriving (30), we have made use of the fact that $G'S_{kk}G = I$. Thus, only conventional estimation tools are needed. Further, from (29),

where $\Lambda$ is the diagonal matrix of eigenvalues. Hence, as $V'S_{kk}V = I$,

so that $S_{k0}S_{00}^{-1}S_{0k}$ is diagonalized to $\Lambda$ by the $V'$, $V$ transformation. Moreover, $\Lambda$ is ordered such that the first $r$ elements (denoted $\Lambda_r$) are the largest eigenvalues, and the remaining $(n - r)$ (denoted $\Lambda_{n-r}$) are the smallest. These eigenvalues will play a primary role in inference about the dimension $r$ of the co-integrating space. We focus on this issue in the next section where the asymptotic distribution of the estimators of the eigenvalues is also discussed. Finally, from (32),

where $\rho$ is the $(n - r) \times n$ matrix, analogous to $\gamma$, and corresponds to the omitted eigenvectors.
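The steps of this section (partialling out the lagged differences, forming the second-moment matrices $S_{ij}$, factorizing $S_{kk}^{-1} = GG'$, and solving a symmetric eigenvalue problem) can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `johansen_eigen`, the use of a Cholesky factor to obtain $G$, and the simulated bivariate data are all choices made here, and the model excludes deterministic terms as in (12).

```python
import numpy as np

def johansen_eigen(x, k):
    """Reduced-rank regression for
    dx_t = sum_{i=1}^{k-1} D_i dx_{t-i} + pi x_{t-k} + e_t  (no deterministic terms).

    Returns eigenvalues (descending), eigenvectors V with V' S_kk V = I,
    the matrix S_kk, and the effective sample size T.
    """
    x = np.asarray(x, dtype=float)
    N = x.shape[0]
    dx = np.diff(x, axis=0)           # dx[j] holds x[j+1] - x[j]
    Z0 = dx[k - 1:]                   # dx_t for t = k, ..., N-1
    Zk = x[:N - k]                    # x_{t-k} for the same t
    T = Z0.shape[0]
    if k > 1:
        # q_t stacks dx_{t-1}, ..., dx_{t-k+1}; partial it out by regression
        Q = np.hstack([dx[k - 1 - i:N - 1 - i] for i in range(1, k)])
        R0 = Z0 - Q @ np.linalg.lstsq(Q, Z0, rcond=None)[0]
        Rk = Zk - Q @ np.linalg.lstsq(Q, Zk, rcond=None)[0]
    else:
        R0, Rk = Z0, Zk
    # Second-moment matrices of the residuals
    S00 = R0.T @ R0 / T
    S0k = R0.T @ Rk / T
    Skk = Rk.T @ Rk / T
    # Factorize S_kk^{-1} = G G' using the Cholesky factor S_kk = L L'
    L = np.linalg.cholesky(Skk)
    G = np.linalg.inv(L).T            # then G' S_kk G = I
    # Conventional symmetric eigenproblem for G' S_k0 S_00^{-1} S_0k G
    M = G.T @ S0k.T @ np.linalg.solve(S00, S0k) @ G
    lam, U = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1]     # largest eigenvalue first
    return lam[order], (G @ U)[:, order], Skk, T

# Simulated bivariate system with one co-integrating relation, y_t - z_t ~ I(0)
rng = np.random.default_rng(1)
N = 500
z = np.cumsum(rng.standard_normal(N))
y = z + rng.standard_normal(N)

lam, V, Skk, T = johansen_eigen(np.column_stack([y, z]), k=2)
# Trace-type statistics -T * sum_{i>r} log(1 - lambda_i), for r = 0, 1
eta = [-T * np.sum(np.log(1.0 - lam[r:])) for r in range(len(lam))]
alpha_hat = V[:, 0] / V[0, 0]         # first eigenvector, normalized on y
```

In this simulated example the eigenvector attached to the largest eigenvalue, once normalized on its first element, should lie close to $(1, -1)'$, the relation built into the data.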
8.3. Inference about the Co-integration Space

From (24) and (32), the maximized value of the likelihood function (23) is given by

since $\Lambda_r$ is the sub-matrix of $\Lambda$ corresponding to the $r$ largest eigenvalues.

Denote by $H_r$ the hypothesis that there are $r$ co-integrating vectors in the system (i.e. there are $n - r$ unit roots). When $\pi$ is unrestricted, all $n$ eigenvalues are retained and the unrestricted maximum of the likelihood function is given by

Since the $r$ largest eigenvalues deliver the co-integration vectors, and since $\lambda_{r+1}, \lambda_{r+2}, \ldots, \lambda_n$ should be zero for the non-co-integrating combinations, tests of the hypothesis that there are at most $r$ co-integrating vectors, $0 \le r < n$, and thus $n - r$ unit roots, can be based on twice the difference between the log-likelihood in (33) and that in (34); that is,
The distribution of the $\eta_r$ or trace statistic is derived under the hypothesis that there are $r$ co-integrating vectors and tests $H_r$ within $H_n$. The test strategy is, therefore, the multivariate analogue of the DF test: the potentially stationary variant is estimated, the coefficient (matrix) of the levels is tested for significance, and unit roots are imposed where the null cannot be rejected. The testing therefore proceeds in sequence from $\eta_0, \eta_1, \ldots, \eta_{n-1}$. The number of co-integrating vectors selected is $r + 1$ where the last significant statistic is $\eta_r$, which thereby rejects the hypothesis of $n - r$ unit roots. $H_0$ is not rejected if $\eta_0$ is insignificant; $H_0$ is rejected in favour of $H_1$ if $\eta_1$ is significant; etc. Since $\eta_r = -T\log|I - \Lambda_{n-r}|$, from (32) $\eta_r$ measures the 'importance' of the adjustment coefficients on the eigenvectors to be potentially omitted. The distribution of $\eta_r$ will be discussed shortly; however, it will not be the conventional $\chi^2$ distribution because $x_t$ is a (multivariate) I(1) process. Thus, while $\eta_r$ still measures the cost in likelihood terms of omitting $n - r$ linear combinations of the levels of $x_{t-k}$, the metric for judging a significant loss of likelihood is different from that in the I(0) case.

Alternatively, tests of significance of the largest $\lambda_r$ could be based on

From (36), $\zeta_r$ tests $H_r$ within $H_{r+1}$. The $\zeta_r$ statistic is often called the maximal-eigenvalue or $\lambda$-max statistic.

Both $\eta_r$ and $\zeta_r$ have non-standard distributions which are functionals of multivariate Wiener processes. For $\eta_r$, this process is of dimension $n - r$. These distributions are generalizations of the scalar (Dickey-Fuller) Wiener processes considered in earlier chapters. The crucial feature that makes these methods operational is that the distributions only depend on the dimension $n$ of the process under analysis. Thus, although there are no analytical forms for the distributions, critical values under their respective nulls can be obtained by Monte Carlo simulation. For example, critical values for the above tests have been tabulated by Johansen (1988) and Osterwald-Lenum (1992), inter alia, for a range of values of $n$. The upper percentiles of the Osterwald-Lenum tables are given in Table 8.1.⁸ Even though the distributions are non-standard, Johansen (1988) suggests a $\chi^2$-based approximation to the distribution of $\eta_r$ of the form

where $h = 0.85 - 0.58/(2m^2)$ for $m = n - r$.

Once the degree of co-integration has been established, the co-integrating combinations are given by

and these linear combinations of the data are the estimated ECMs. As before, linear transformations are also valid co-integrating vectors, and a choice among these could be made either on the basis of prior information or by following tests for hypothesized vectors as considered in Section 8.5.2.

Moreover, once the ECMs have been defined, $\gamma$ reveals the importance of each co-integrating combination in each equation, and is related to the speeds of adjustment of each dependent variable to the associated disequilibria. If a given ECM enters more than one equation, the co-integration parameters are inherently cross-linked between such equations, and hence their dependent variables cannot be weakly exogenous in the related equations. This implies that joint estimation is required to compute fully efficient estimators. By way of contrast, if a given column of $\gamma$ is zero except for a single entry, and there is only one co-integrating vector, single-equation estimation of that relation will not lead to any loss of information on co-integration.
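The sequential testing strategy can be sketched in a few lines. The code below assumes the usual forms of the two statistics, $\eta_r = -T\sum_{i=r+1}^{n}\log(1 - \lambda_i)$ for the trace test and $\zeta_r = -T\log(1 - \lambda_{r+1})$ for the $\lambda$-max test, and uses the 95% quantiles of Table 8.1 for $n - r = 1, \ldots, 5$. The function names and the example eigenvalues are hypothetical; the critical values apply only to the model without deterministic terms.

```python
import math

# 95% quantiles from Table 8.1, keyed by n - r
TRACE_95 = {1: 3.84, 2: 12.53, 3: 24.31, 4: 39.89, 5: 59.46}
LMAX_95 = {1: 3.84, 2: 11.44, 3: 17.89, 4: 23.80, 5: 30.04}

def trace_stat(eigs, r, T):
    """eta_r: based on the n - r smallest eigenvalues (eigs sorted descending)."""
    return -T * sum(math.log(1.0 - lam) for lam in eigs[r:])

def lmax_stat(eigs, r, T):
    """zeta_r: based on the (r + 1)-th largest eigenvalue only."""
    return -T * math.log(1.0 - eigs[r])

def select_rank(eigs, T, crit=TRACE_95):
    """Test eta_0, eta_1, ... in sequence; return the first r not rejected."""
    n = len(eigs)
    for r in range(n):
        if trace_stat(eigs, r, T) <= crit[n - r]:
            return r          # cannot reject n - r unit roots
    return n                  # every null rejected

# Hypothetical eigenvalues for a bivariate system with T = 100
eigs = [0.30, 0.01]
rank = select_rank(eigs, T=100)
```

With these illustrative eigenvalues, $\eta_0$ exceeds its 95% quantile of 12.53 while $\eta_1$ falls below 3.84, so the procedure stops at one co-integrating vector.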
TABLE 8.1. Quantiles of the asymptotic distribution of the co-integration rank test statistics $\eta_r$ and $\zeta_r$
DGP and model: $\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \pi\mathbf{x}_{t-k} + \varepsilon_t$; $\varepsilon_t \sim \mathsf{IN}(0, \Omega)$

           $\zeta_r$ ($\lambda$-max)                 $\eta_r$ (trace)
n - r     90%     95%   97.5%     99%       90%      95%    97.5%      99%
  1      2.86    3.84    4.93    6.51      2.86     3.84     4.93     6.51
  2      9.52   11.44   13.27   15.69     10.47    12.53    14.43    16.31
  3     15.59   17.89   20.02   22.99     21.63    24.31    26.64    29.75
  4     21.58   23.80   26.14   28.82     36.58    39.89    42.30    45.58
  5     27.62   30.04   32.51   35.17     55.44    59.46    62.91    66.52
  6     33.62   36.36   38.59   41.00     78.36    82.49    86.09    90.45
  7     38.98   41.51   44.28   47.15    104.77   109.99   114.22   119.80
  8     44.99   47.99   50.78   53.90    135.24   141.20   146.78   152.32
  9     50.65   53.69   56.55   59.78    169.45   175.77   181.44   187.31
 10     56.09   59.06   61.57   65.21    206.05   212.67   219.88   226.40
 11     61.96   65.30   68.35   72.36    248.45   255.27   261.71   269.81

Source: Osterwald-Lenum (1992: Table 0).

8 The tables in Osterwald-Lenum (1992) give critical values for values of $n$ running from 1 to 11 and are therefore more extensive than those in Johansen (1988). We are grateful to Michael Osterwald-Lenum for permission to reproduce this table.

8.4. An Empirical Illustration
To illustrate the calculation involved in the MLE, we consider the relationship between the (logs of) the prices of new and second-hand houses in the U.K., denoted $p_{n,t}$ and $p_{h,t}$, respectively, over the quarterly (seasonally unadjusted) sample 1957(III)-1981(II). A lag length of two periods is selected to capture the main short-run dynamics in a parsimonious way, and the system to be estimated takes the form (see Ericsson and Hendry 1985)
The constant and the three seasonal dummy variables (denoted $q_{it}$) included unrestrictedly in both equations were first concentrated out of the likelihood by regressing the remaining variables on them and taking the residuals as the 'new' data set. Next, the lagged differenced variables were removed in a similar way (see equations (14)-(17)) to leave the $R_{0t}$ and $R_{2t}$ terms used in calculating the second moments $S_{ij}$ in (19). Given these moments, (27) can be solved for the eigenvalues $\lambda_i$, which yielded

The test statistics $\eta_r$ and $\zeta_r$ based on these, together with their 5 per cent critical values from Table 1 of Osterwald-Lenum (1992) (denoted by $\eta_r(0.05)$ etc.) are given in Table 8.2. The hypothesis that there are two unit roots can be rejected in favour of one unit root (and hence one co-integrating vector) at the 5 per cent level using both statistics, but the hypothesis that there is one unit root cannot be rejected against the maintained hypothesis of no unit roots. We therefore select $r = 1$ in this case.

The corresponding estimated eigenvectors (normalized by their diagonal elements) are given in Table 8.3. The rows are the rows of $\alpha'$, and they are approximately $(1, -1)$ and $(-1, 1)$, which corresponds to the relative price $(p_n - p_h)$ being the co-integrating relation, as might be expected for an ECM.

The estimates of $\gamma$ are given in Table 8.4. The first column corresponds to the first column of $\gamma$ and reveals one reasonably large feedback coefficient of $-0.06$ from $(p_{n,t-2} - p_{h,t-2})$ on to $\Delta p_{n,t}$; most of the remaining coefficients are relatively close to being negligible, given the meaning and units of the ECM here. Thus, it would not be possible to reject the hypothesis that $p_{h,t}$ was weakly exogenous in the $p_{n,t}$ equation on the basis of this evidence alone. The small values of the coefficients in the second column are consistent with the very small values of $\eta_1$ and $\zeta_1$; so little loss of likelihood would result from respecifying the system in terms of the I(0) variables $\Delta p_{n,t}$, $\Delta p_{h,t}$, and $(p_{n,t-1} - p_{h,t-1})$.
TABLE 8.2. Tests and critical values

                         $\zeta_r$   $\zeta_r(0.05)$   $\eta_r$   $\eta_r(0.05)$
$n - r = 2$ ($r = 0$)      16.1          14.1         16.5         15.4
$n - r = 1$ ($r = 1$)       0.41          3.76         0.41         3.76

Source: Osterwald-Lenum (1992), Table 1.
TABLE 8.3. Normalized eigenvectors $\alpha'$

Variable      $p_n$      $p_h$
$p_n$         1.000     -1.077
$p_h$        -1.063      1.000
TABLE 8.4. Adjustment coefficients $\gamma$

Variable      $p_n$      $p_h$
$p_n$        -0.063     -0.007
$p_h$         0.022     -0.019
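The sequential choice of $r$ described in this illustration can be sketched as follows. The statistics and 5 per cent critical values are those reported in Table 8.2; the function name and the stopping rule as coded (stop at the first null that is not rejected) are an illustrative rendering of the procedure, not code from the original study.

```python
# Test statistics and 5 per cent critical values as reported in Table 8.2.
# Position i refers to the null hypothesis of at most r = i co-integrating vectors.
LAMBDA_MAX = [16.1, 0.41]
LAMBDA_MAX_CRIT = [14.1, 3.76]
TRACE = [16.5, 0.41]
TRACE_CRIT = [15.4, 3.76]

def select_rank(stats, crits):
    """Return the smallest r whose null hypothesis (rank <= r) is not rejected."""
    for r, (stat, crit) in enumerate(zip(stats, crits)):
        if stat <= crit:
            return r
    return len(stats)  # every null rejected: the system has full rank

rank_by_lambda_max = select_rank(LAMBDA_MAX, LAMBDA_MAX_CRIT)
rank_by_trace = select_rank(TRACE, TRACE_CRIT)
print("selected r (lambda-max, trace):", rank_by_lambda_max, rank_by_trace)
```

Both statistics reject $r = 0$ and fail to reject $r \le 1$, in line with the choice $r = 1$ made in the text. The trace statistic for $r = 0$ is, up to rounding, the sum of the two $\lambda$-max statistics.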
8.5. Extensions
The preceding results hold for a simple model. Several possible extensions and other considerations arise in this model and we shall briefly consider eight of these:
1. dummy variables (such as constants and trends);
2. linear restrictions on co-integrating vectors;
3. powers of tests;
4. forecasting in co-integrated processes;
5. finite-sample properties;
6. selecting lag length;
7. I(2) variables;
8. weak exogeneity and conditional models.

8.5.1. Dummy Variables
The first issue of practical importance is the potential presence of intercepts in the equations. The inclusion of intercepts in the estimated system alters the critical values of the tests from those that obtain when no intercepts are present (as a comparison of Table 8.1 (no constant) with Table 8.5 below shows). Under the null of no co-integrating vectors, non-zero intercepts would generate trends. However, even in equations with ECMs, two possibilities arise: that the intercept enters only in the ECM, or that it also enters as an autonomous growth factor in the equation. Both cases are considered by Osterwald-Lenum (1992) and Johansen and Juselius (1990). In terms of (12), the model becomes

$$\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \pi\mathbf{x}_{t-k} + \mu + \varepsilon_t, \qquad (37)$$
where $\mu$ is an $n \times 1$ vector of intercepts. When $\mu$ is unrestricted, it can be concentrated out of the likelihood function, and merely makes all variables deviations about their sample means. After estimation of $\gamma$ and $\alpha$, the MLE of $\mu$ can be derived in the same way as the other parameters, concentrated out of the likelihood function, were estimated in Section 8.2.

If any given equation contains an ECM, then the estimated (unrestricted) intercept could be included in that term, perhaps at the cost of having ECMs with non-zero means. However, this could lead to the system having different means for the same ECM in different equations. An interesting alternative possibility is that $\mu$ is restricted to entering only the ECMs, namely,

$$\mu = \gamma\alpha_0,$$

where $\alpha_0$ is $r \times 1$. In that case, (37) becomes

$$\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \gamma\left(\alpha'\mathbf{x}_{t-k} + \alpha_0\right) + \varepsilon_t.$$
Equations without ECMs clearly are random walks without drift (but may have lagged differences), while equations with ECMs have a common mean given by $\gamma\alpha_0$, and hence also have no drift. Models of the term structure of interest rates might be expected to have such a property. Hall, Anderson, and Granger (1992), Johansen and Juselius (1990), and Osterwald-Lenum (1992) discuss testing for this possibility.

More specifically, consider a system written in first-order autoregressive form (either set $k = 1$ or regard the system as being stacked as in Chapter 5):

$$\mathbf{x}_t = \pi^{*}\mathbf{x}_{t-1} + \mu + \varepsilon_t, \qquad (38)$$

where $\pi = \gamma\alpha'$ and $\pi^{*} = I + \gamma\alpha'$. Reformulate (38) in I(0) space by partitioning $\mathbf{x}_t$ into $(\mathbf{x}_{at}' : \mathbf{x}_{bt}')'$ where $\alpha'\mathbf{x}_t$ and $\Delta\mathbf{x}_{bt}$ are I(0) by construction. From (38),

where $\mathbf{w}_t = (\mathbf{x}_t'\alpha : \Delta\mathbf{x}_{bt}')' = (\mathbf{w}_{at}' : \mathbf{w}_{bt}')'$ and $\gamma' = (\gamma_a' : \gamma_b')$, which is $(r \times r : r \times (n - r))$, so that normalizing by $\alpha' = (I_r : \alpha^{*\prime})$ then

where the error term is $\mathsf{IN}(0, \Sigma)$. Letting $J' = (0 : I)$, it is seen that
This I(0) form allows us to determine the unconditional means and variances of the variables and hence to establish the impact of $\mu$ on the growth of the variables. When $\alpha'\gamma$ is non-singular, the long-run solution for the system is defined by

$$\mathsf{E}[\alpha'\mathbf{x}_t] = -(\alpha'\gamma)^{-1}\alpha'\mu,$$

so that

$$\mathsf{E}[\Delta\mathbf{x}_t] = \left[I - \gamma(\alpha'\gamma)^{-1}\alpha'\right]\mu = K\mu,$$

which determines the growth in the system. Since $\pi^{*}\gamma = (I + \gamma\alpha')\gamma = \gamma(I + \alpha'\gamma) = \gamma\psi$ where, matching the structure of $C$, $\psi = (I + \alpha'\gamma)$, it follows that $\pi^{*s}\gamma = \gamma\psi^{s}$. But since $C$ defines the I(0) representation, $\psi^{s} \to 0$ as $s \to \infty$, so that $\pi^{*}$ has some roots equal to unity and a convergent component $\psi$. In a bivariate case, $\psi$ would be the stationary root of $\pi^{*}$.

The matrix $K$ is non-symmetric and idempotent with $\alpha'K = 0'$ and $K\gamma = 0$ so that $\pi^{*}K = K$. Also, when $\gamma = 0$ then $K = I$. Since the condition that $\mu$ falls in the co-integrating space is $\mu = \gamma\alpha_0$ where $\alpha_0$ is $r \times 1$, then

$$K\mu = K\gamma\alpha_0 = 0,$$

confirming the absence of any linear trend in $\mathbf{x}_t$ when $\mu = \gamma\alpha_0$.

Further, the unconditional variance matrix of $\mathbf{w}_t$, $\mathrm{var}[\mathbf{w}_t] = G$, is $G = CGC' + \Sigma$, or
This long-run variance matrix can be solved by vectorizing, and reveals the dependence of $G$ on $\pi$ only through $\gamma_b$ and $\psi$. The diagonality or otherwise of $G$ is important for determining the quality of single-equation least-squares estimation of co-integrating relations (see Chapter 7).

Tables 8.5-8.7 provide critical values, again taken from Osterwald-Lenum, for the trace and $\lambda$-max statistics for both treatments of intercepts. The two possibilities may be dealt with more explicitly by rewriting (37) as

$$\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \pi\mathbf{x}_{t-k} + \gamma\alpha_0 + \gamma_{\perp}\beta_0 + \varepsilon_t,$$

where $\gamma_{\perp}$ is an $n \times (n - r)$ matrix orthogonal to $\gamma$ and $\mu = \gamma\alpha_0 + \gamma_{\perp}\beta_0$
without loss of generality. Thus, $\beta_0 = 0$ corresponds to the case where the intercept enters only via the ECM terms. Equivalently, the constant $\mu$ lies in the space spanned by $\gamma$ and hence $\gamma_{\perp}'\mu = \gamma_{\perp}'\gamma\alpha_0 + \gamma_{\perp}'\gamma_{\perp}\beta_0 = \gamma_{\perp}'\gamma_{\perp}\beta_0 = 0$ when $\beta_0 = 0$. The case $\beta_0 \neq 0$ allows the intercepts to enter autonomously as growth factors.

The critical values in the tables apply to three interesting DGP-model combinations. Table 8.5 provides critical values when $\alpha_0 \neq 0$, $\beta_0 \neq 0$ in both the DGP and the model (i.e. the intercept enters separately). Critical values for $\alpha_0 \neq 0$, $\beta_0 = 0$ in the DGP and $\alpha_0 \neq 0$, $\beta_0 \neq 0$ in the model are given in Table 8.6 (the intercept enters only the ECM but the model is over-parameterized). Table 8.7 considers the DGP-model combination given by $\alpha_0 \neq 0$, $\beta_0 = 0$ in both the DGP and the model (the intercept enters only the ECM and the model is correctly parameterized). Note that the critical values for the DGP-model combination given by $\alpha_0 = 0$, $\beta_0 = 0$ in both the DGP and the model appear in Table 8.1.

Other possible dummy variables include a trend, which would allow the possibility that some variables were trend stationary, and seasonal dummy variables in quarterly data (or equivalent dummies in data of other frequencies). Critical values for some of these additional cases are given by Osterwald-Lenum, although the necessary critical values to implement tests for all $r$ and for all possible DGP-model combinations are not available.

TABLE 8.5. Quantiles of the asymptotic distribution of the co-integration rank test statistics $\eta_r$ and $\zeta_r$
DGP and model: $\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \pi\mathbf{x}_{t-k} + \gamma\alpha_0 + \gamma_{\perp}\beta_0 + \varepsilon_t$, $\alpha_0 \neq 0$, $\beta_0 \neq 0$; $\varepsilon_t \sim \mathsf{IN}(0, \Omega)$

           $\zeta_r$ ($\lambda$-max)                 $\eta_r$ (trace)
n - r     90%     95%   97.5%     99%       90%      95%    97.5%      99%
  1      2.69    3.76    4.95    6.65      2.69     3.76     4.95     6.65
  2     12.07   14.07   16.05   18.63     13.33    15.41    17.52    20.04
  3     18.60   20.97   23.09   25.52     26.79    29.68    32.56    35.65
  4     24.73   27.07   28.98   32.24     43.95    47.21    50.35    54.46
  5     30.90   33.46   35.71   38.77     64.84    68.52    71.80    76.07
  6     36.76   39.37   41.86   45.10     89.48    94.15    98.33   103.18
  7     42.32   45.28   47.96   51.57    118.50   124.24   128.45   133.57
  8     48.33   51.42   54.29   57.69    150.53   156.00   161.32   168.36
  9     53.98   57.12   59.33   62.80    186.39   192.89   198.82   204.95
 10     59.62   62.81   65.44   69.09    225.85   233.13   239.46   247.18
 11     65.38   68.83   72.11   75.95    269.96   277.71   284.87   293.44

Source: Osterwald-Lenum (1992: Table 1).

TABLE 8.6. Quantiles of the asymptotic distribution of the co-integration rank test statistics $\eta_r$ and $\zeta_r$
DGP: $\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \pi\mathbf{x}_{t-k} + \gamma\alpha_0 + \varepsilon_t$, $\alpha_0 \neq 0$; $\varepsilon_t \sim \mathsf{IN}(0, \Omega)$
Model (a): $\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \pi\mathbf{x}_{t-k} + \mu + \varepsilon_t$

           $\zeta_r$ ($\lambda$-max)                 $\eta_r$ (trace)
n - r     90%     95%   97.5%     99%       90%      95%    97.5%      99%
  1      6.50    8.18    9.72   11.65      6.50     8.18     9.72    11.65
  2     12.91   14.90   17.07   19.19     15.66    17.95    20.08    23.52
  3     18.90   21.07   22.89   25.75     28.71    31.52    34.48    37.22
  4     24.78   27.14   29.16   32.14     45.23    48.28    51.54    55.43
  5     30.84   33.32   35.80   38.78     66.49    70.60    74.04    78.87
  6     36.35   39.43   41.86   44.59     90.39    95.18    99.32   104.20
  7     42.06   44.91   47.59   51.30    118.99   124.25   129.75   136.06
  8     48.43   51.07   53.85   57.07    151.38   157.11   162.75   168.92
  9     54.01   57.00   59.80   63.37    186.54   192.84   198.06   204.79
 10     59.19   62.42   64.98   68.61    226.34   232.49   238.26   246.27
 11     65.07   68.27   70.69   74.36    269.53   277.39   283.84   292.65

(a) In the model, $\mu = \gamma\alpha_0 + \gamma_{\perp}\beta_0$ enters unrestrictedly; that is, $\alpha_0 \neq 0$, $\beta_0 \neq 0$.
Source: Osterwald-Lenum (1992: Table 1.1*).

TABLE 8.7. Quantiles of the asymptotic distribution of the co-integration rank test statistics $\eta_r$ and $\zeta_r$
DGP and model: $\Delta\mathbf{x}_t = \sum_{i=1}^{k-1} D_i\,\Delta\mathbf{x}_{t-i} + \pi\mathbf{x}_{t-k} + \gamma\alpha_0 + \varepsilon_t$, $\alpha_0 \neq 0$; $\varepsilon_t \sim \mathsf{IN}(0, \Omega)$

           $\zeta_r$ ($\lambda$-max)                 $\eta_r$ (trace)
n - r     90%     95%   97.5%     99%       90%      95%    97.5%      99%
  1      7.52    9.24   10.80   12.97      7.52     9.24    10.80    12.97
  2     13.75   15.67   17.63   20.20     17.85    19.96    22.05    24.60
  3     19.77   22.00   24.07   26.81     32.00    34.91    37.61    41.07
  4     25.56   28.14   30.32   33.24     49.65    53.12    56.06    60.16
  5     31.66   34.40   36.90   39.79     71.86    76.07    80.06    84.45
  6     37.45   40.30   43.22   46.82     97.18   102.14   106.74   111.01
  7     43.25   46.45   48.99   51.91    126.58   131.70   136.49   143.09
  8     48.91   52.00   54.71   57.95    159.48   165.58   171.28   177.20
  9     54.35   57.42   60.50   63.71    196.37   202.92   208.81   215.74
 10     60.25   63.57   66.24   69.94    236.54   244.15   251.30   257.68
 11     66.02   69.74   72.64   76.63    282.45   291.40   298.31   307.64

Source: Osterwald-Lenum (1992: Table 1*).

8.5.2. Linear Restrictions on Co-integrating Vectors
A different set of generalizations concerns testing linear restrictions on $\alpha$ and $\gamma$. These would correspond to investigating a priori theories about the co-integrating vectors, and about their roles in different equations. Conditional on $r$ being the number of co-integrating relationships, and the model being transformed to I(0) space, the relevant hypotheses
generally involve standard $\chi^2$ distributions. (Again, see Johansen 1988, and Johansen and Juselius 1990.)

As an example, consider testing linear restrictions on $\alpha$ of the form

$$\alpha = J\Psi,$$

where $J$ is a known $n \times s$ matrix and $\Psi$ is an $s \times r$ matrix of unknown parameters and $r \leq s \leq n$. Maximization of the likelihood function is unaltered until equation (26), which becomes

(39)

In place of (27), we must solve for the eigenvalues $\lambda_1^{*} \geq \lambda_2^{*} \geq \dots \geq \lambda_s^{*}$ from the equation

using the principles applied above. A likelihood-ratio test against the unrestricted value of $\alpha$ can be calculated and amounts to testing $H_J$ within $H_r$, and is therefore based on

This test results in an asymptotic $\chi^2[r(n - s)]$ distribution. It is important to note that the analysis is now in I(0) space, conditional on having selected $r$ earlier. Similar results obtain for testing the hypothesis that a subset of $\alpha$ equals a known matrix.

8.5.3. Test Power
Johansen (1989) has investigated the power function of the $\eta_r$ test using the theory of 'near-integrated' processes as developed in Phillips (1991) and discussed in Chapter 3. In place of $\pi = \gamma\alpha'$, Johansen considers

where $\psi$ and $\tau$ are $n \times 1$ fixed vectors. For a given standardized importance of the co-integrating vector effect, the power falls as $n - r$ rises (since a larger space has to be searched to find the co-integrating vector), and depends both on the magnitude of the ECM impact and on the position of the 'local' co-integrating vectors in the space. In the simple case where $r = 1$, two scalar measures of the impact of the 'local' co-integrating vector are given by

When either is zero, power rises with the other, but their effects also interact. Otherwise, not much is known as yet about the power properties of this systems approach.
An implication of this lack of knowledge is that more than usual care should be taken in deciding upon the relevant value of $r$. To reject the null of $r + 1$ co-integrating vectors, a critical value from an $(n - r - 1)$-dimensional Brownian motion is consulted. This is a much larger value than that associated with the usual $\chi^2$-distribution, so a larger absolute value of the likelihood ratio seems acceptable if only $r$ co-integrating vectors are retained. However, if the end result of a modelling exercise is an overall test of the validity of all the over-identifying restrictions imposed, an investigator who regarded the $(r + 1)$th co-integrating vector as I(0) would obtain a large value of the test statistic for omitting this component. Since tests of over-identification tend to have high numbers of degrees of freedom, that additional likelihood loss could be highly significant. Thus, it may not be wise simply to omit co-integrating vectors which are close to some conventional significance value. Alternatively, all over-identification tests should be conducted in I(0) space, and the reduction from the original levels system for $\mathbf{x}_t$ tested first as I(1) → I(0) and then for further restrictions conditional on the first test (see Hendry and Mizon 1992).

8.5.4. Forecasting with Co-integrated Systems
Engle and Yoo (1987) investigated the possible gains from utilizing co-integration information when making $h$-step-ahead forecasts from dynamic systems for large $h$. They considered a dynamic bivariate system and contrasted an ECM formulation based on the Engle and Granger (1987) two-step approach with an unrestricted VAR. From the common trends formulation of the system (Stock and Watson 1988b) discussed in Chapter 5,
where the first term on the right-hand side is a stochastic trend of rank $n - r$. If the $C^{*}(L)$ weights decline rapidly as functions of powers of $L$, then for large $h$, the $h$-step-ahead forecast conditional on information available at time $t$ is approximately

Forecast errors are given by

Such forecast errors have variances of $O(h)$ for individual series, but
remain $O(1)$ for combinations of the form $\alpha'\mathbf{f}_{t+h|t}$ since $\alpha'C(1) = 0$. Thus, the $n$ time series share only $n - r$ trends, so forecasts of the series move together in linear combinations even though forecasts of individual series diverge from outcomes. Hence

to the order of approximation in (42). An ECM imposes this condition whereas a VAR does not; hence the former may be expected to forecast better for long horizons. Engle and Yoo present a Monte Carlo example with this property. However, they find that the VAR does slightly better on short horizons; we comment on this below.

When the process has a non-zero mean $\mu$, a term of the form $\mu(t + h)$ should be included in the above analysis, which otherwise is unchanged: see Section 8.5.1 for the case where $\mu$ lies in the co-integration space.

The fact that variances of forecast errors for co-integrated combinations remain bounded does not resolve the problem of long-run forecasting with integrated variables. A simple scalar example illustrates the difficulty. Consider the process

$$x_t = \pi_0 + \pi x_{t-1} + \varepsilon_t,$$

where $|\pi| < 1$. Then, by repeated substitution, the $h$-step-ahead forecast at time $t$, denoted $x_{t+h|t}$, is given by

$$x_{t+h|t} = \pi_0\sum_{i=0}^{h-1}\pi^{i} + \pi^{h}x_t.$$

As $h \to \infty$, $x_{t+h|t} \to \pi_0(1 - \pi)^{-1}$, which is the unconditional mean of the process.⁹ This argument, when applied to stationary variables such as $\alpha'\mathbf{x}_t$ or $\Delta x_{it}$ (where $\mathbf{x}_t = (x_{1t}, x_{2t}, \dots, x_{nt})'$), implies that the system of equations, if rewritten entirely in terms of I(0) variables, loses the ability to forecast future values based on its past. As the forecast horizon increases, the best predictor turns out to be the unconditional mean. Working in the levels of I(1) variables is equally problematic: now the past is apparently informative, but forecast errors have variances increasing with $h$.
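The scalar argument can be checked numerically with a short sketch. It assumes the first-order autoregression with intercept $\pi_0$ and slope $\pi$ implied by the limit $\pi_0(1 - \pi)^{-1}$; the parameter values, the starting value, and the function names are hypothetical and chosen only for illustration.

```python
def ar1_forecast(x_t, pi0, pi, h):
    """h-step-ahead forecast of x_t = pi0 + pi * x_{t-1} + e_t by repeated substitution."""
    forecast = x_t
    for _ in range(h):
        forecast = pi0 + pi * forecast
    return forecast

def ar1_forecast_variance(sigma2, pi, h):
    """Variance of the h-step forecast error: sigma2 * sum over i < h of pi^(2i)."""
    return sigma2 * sum(pi ** (2 * i) for i in range(h))

# Hypothetical parameter values, chosen only for illustration.
PI0, PI, SIGMA2 = 0.05, 0.5, 1.0
X_T = 2.0                                   # last observed value
unconditional_mean = PI0 / (1.0 - PI)       # limit of the forecast as h grows

for h in (1, 2, 5, 10, 20):
    gap = ar1_forecast(X_T, PI0, PI, h) - unconditional_mean
    # The gap equals pi^h * (x_t - mean), so the information in x_t decays geometrically.
    print(h, round(gap, 6), round(ar1_forecast_variance(SIGMA2, PI, h), 6))

# With pi = 1 (a random walk) the same variance formula grows linearly in h.
random_walk_variances = [ar1_forecast_variance(SIGMA2, 1.0, h) for h in (1, 10, 100)]
print(random_walk_variances)
```

In the stationary case the forecast collapses to the unconditional mean while the error variance stays bounded; in the unit-root case the variance rises in proportion to the horizon, which is the dilemma described above.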
9 The algebra generalizes to the case where $\mathbf{x}_t$ is an $n$-dimensional vector and $\pi$ is a matrix. The necessary and sufficient conditions for stationarity of a vector process are given in Ch. 1.

An example from Hendry (1991b) demonstrates the important features of the problem. Consider a system of three variables, 'consumption', 'income', and 'saving', denoted by $C$, $Y$, and $S$ respectively. The data-set is artificial but matches important properties of actual UK series, such as the growth rate of income, when the variables are viewed as logarithms of the original data (so $S_t$ is the log of the savings ratio). Using PC-NAIVE, data are generated by
where $\varepsilon_{it} \sim \mathsf{IN}(0, \sigma_{ii})$ with $\mathsf{E}(\varepsilon_{1t}\varepsilon_{2s}) = 0\ \forall t, s$, $\sigma_{11} = 0.02$, and $\sigma_{22} = 0.05$. The system can be written in levels as

Note now that consumption and income are both I(1) variables, consumption and income are co-integrated, and saving is a stationary variable. The equations

define the system in I(0) space. The discussion above provided two implications, both of which may now be confirmed.

A: The system in I(0) space loses predictive power but variances of forecast errors remain bounded.
The confirmation of this prediction is twofold. First, defining the vector $\mathbf{w}_t = (S_t, \Delta Y_t)'$ and the matrix $A$ as

we have $\mathbf{w}_t = \mathbf{k} + A\mathbf{w}_{t-1} + \mathbf{v}_t$, where $\mathbf{k} = (-0.025, 0.050)'$ and $\mathbf{v}_t = (0.5\varepsilon_{2t} - \varepsilon_{1t}, \varepsilon_{2t})'$. The various powers of $A$ are as follows:
Thus, noting that $\mathbf{w}_{t+h|t} = (I_2 - A)^{-1}(I_2 - A^{h})\mathbf{k} + A^{h}\mathbf{w}_t$, the ability to predict $\Delta Y_t$ vanishes rapidly and little remains five periods ahead. This is also true for $S_t$, although the rate of decay is slower.

Forecasting from the system using the artificially generated data provides additional confirmation of implication A. Figure 8.1 shows the forecast behaviour for the change in consumption. The forecast variances rapidly converge to a constant size, spanning about one unit, which matches the range of the observed changes in consumption in the sample used to estimate the system. The forecast reveals a return of the growth rate to its unconditional mean of 0.1 after about five periods, where it then settles.

FIG 8.1. Eight-year-ahead forecast of $\Delta C$

Figure 8.2 shows the corresponding forecast behaviour for saving. The outcome is similar to that depicted in Fig. 8.1. The forecast variances stabilize rapidly, there is some information up to about eight periods ahead, but thereafter conditional forecasts are no better than the unconditional mean of $-0.125$.

FIG 8.2. Eight-year-ahead forecast of $S$

B: The system in I(1) space has variances of forecast errors increasing linearly with h.
Figure 8.3 reports the dynamic forecasts for the level of consumption together with the forecast error bars. The huge increase in the forecast standard errors as the horizon increases is obvious. They trend upwards, and at 32 periods ahead, corresponding to eight years of quarterly data, span a range almost as large as that of the previous 60 data observations. That range is about 7.5 units, whereas saving never varies outside ±1. The mean forecast quickly becomes a trend since the series is I(1) and the forecasts are uninformative after 10 periods because of the large variances. Either a large recession or a major boom would be compatible with the confidence intervals calculated. Figure 8.4 reports a recession scenario for consumption that induces a fall of over 10 per cent in final-period consumption relative to the central forecast, but nevertheless lies entirely within the 95 per cent confidence bands of the latter.

FIG 8.3. Eight-year-ahead forecast for $C$

The discussion so far has abstracted from the problems arising from parameter uncertainty. The analysis has been conducted in what might be regarded as a Utopian world for an economic forecaster. The model coincides with the mechanism that generated the data, an assumption that seriously underestimates the uncertainty likely to be present in any realistic setting. Allowing for, say, parameter uncertainty makes forecasts even more uncertain.

Sampson (1991) describes the effects of parameter uncertainty on the variances of conditional forecast errors. The conditional forecast variance grows with the square of the forecast horizon, both for unit-root (difference-stationary) and trend-stationary models. Chong and Hendry (1986) discuss the same issue for a stationary example. Brandner and Kunst (1990) show that a marked deterioration in forecast accuracy occurs if I(1) combinations are retained, so some of the supposed ECMs are spurious.
Clements and Hendry (1991) also find that poor estimates of $\alpha$ induce a similar effect, which helps account for the Engle-Yoo Monte Carlo results. However, they also show that mean-square forecast errors (MSFEs) constitute an inadequate basis for selecting forecasting models or methods because of a lack of invariance of MSFEs to non-singular, scale-preserving linear transforms. As a result, for multi-step forecasts in systems of equations, minimum MSFE for one linear function of predicted variables does not imply minimum MSFE on another. One method can dominate all others for comparisons in the levels of variables, yet lose to one of the others for differences, to a second for co-integrating vectors, and to a third for combinations of variables. Thus, the outcome of a forecast comparison can depend on which representation is selected.

FIG 8.4. Alternative future trajectories for $C$

By re-examining the Monte Carlo study of Engle and Yoo (1987), Clements and Hendry (1991) find that different rankings of VAR and Engle-Granger (EG) estimators do indeed result from the I(0) and I(1) representations of the process. For MSFE calculations using co-integrating combinations rather than levels, the VAR dominates EG for all forecast horizons even though the differences of the variables are predicted with approximately the same accuracy. They propose an alternative invariant criterion which ensures a unique ranking across models or methods and shows that there is little to choose between the VAR and EG estimators in a bivariate process. However, both are dominated, for most of the parameter values considered, by the Johansen maximum likelihood estimator (MLE).

The asymptotic formulae for the $h$-step-ahead forecast variances in co-integrated autoregressive systems are derived by Clements and Hendry (1992). The $h$-step-ahead realizations for known parameters in terms of (38) for $\mathbf{x}_t$ over the forecast period $T + 1$ to $T + h$ are
where

The conditional expectation $\mathsf{E}[\mathbf{x}_{T+h} \mid \mathbf{x}_T]$ at $T$ is

with forecast error

Thus, the forecast error variance matrix is

For $C$ and $\Sigma$ in the model defined in Section 8.5.1, using $\mathbf{w}_t$,

Hence, the MSFE for $\mathbf{x}_t$ in (45) is $O(h)$, while the MSFE for $\mathbf{w}_t$ in (46) is $O(1)$ in $h$ since $C^{s} \to 0$ as $s \to \infty$. These results reflect the fact that $\mathbf{x}_t$ is I(1) but $\mathbf{w}_t \sim$ I(0). The covariance between forecast errors at $h$ and $l$, denoted by $\mathrm{cov}[\cdot]$, is

when $m = \min(l, h)$.

When the system is expressed in differences to forecast $\Delta\mathbf{x}_{T+h}$, outcomes are given by

Letting $\Delta\hat{\mathbf{x}}_{T+h}$ denote the conditional expectation

Then, subtracting (49) from (48),

and so for known parameters, the variance formula is $\Omega$ for $h = 1$ and

Thus, the MSFE in (50) is again $O(1)$. In all cases, when parameters need to be estimated, more complicated formulae with additional terms result.
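The contrast between $O(h)$ and $O(1)$ forecast-error variances can be illustrated with a small numerical sketch. It assumes the standard known-parameter result for a first-order system $\mathbf{x}_t = \pi^{*}\mathbf{x}_{t-1} + \mu + \varepsilon_t$, namely $V_h = \sum_{s=0}^{h-1}\pi^{*s}\,\Omega\,\pi^{*s\prime}$, and a hypothetical bivariate example with $\alpha = (1, -1)'$, $\gamma = (-0.2, 0.1)'$, and $\Omega = I$. None of these values is taken from Clements and Hendry (1992).

```python
def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def forecast_error_variance(pi_star, omega, h):
    """V_h = sum over s < h of pi_star^s @ omega @ (pi_star^s)' for known parameters."""
    power = [[1.0, 0.0], [0.0, 1.0]]        # pi_star^0
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(h):
        term = matmul(matmul(power, omega), transpose(power))
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, pi_star)
    return total

def quadratic_form(v, m):
    """v' m v for a 2-vector v."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

# Hypothetical bivariate system: pi_star = I + gamma * alpha'.
ALPHA = [1.0, -1.0]
GAMMA = [-0.2, 0.1]
PI_STAR = [[(1.0 if i == j else 0.0) + GAMMA[i] * ALPHA[j] for j in range(2)]
           for i in range(2)]
OMEGA = [[1.0, 0.0], [0.0, 1.0]]

v100 = forecast_error_variance(PI_STAR, OMEGA, 100)
v200 = forecast_error_variance(PI_STAR, OMEGA, 200)

level_ratio = v200[0][0] / v100[0][0]       # levels: variance roughly doubles with h
combo_100 = quadratic_form(ALPHA, v100)     # alpha'x: variance settles at a constant
combo_200 = quadratic_form(ALPHA, v200)
print(round(level_ratio, 3), round(combo_100, 4), round(combo_200, 4))
```

In this example $\alpha'\pi^{*s} = \psi^{s}\alpha'$ with $\psi = 1 + \alpha'\gamma = 0.7$, so the variance of the co-integrating combination converges to $\alpha'\Omega\alpha/(1 - \psi^{2})$ while the variance of each level keeps growing in proportion to $h$.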
These asymptotic forecast error variance formulae reveal a great deal about the behaviour of forecast errors as horizons increase. Clements and Hendry (1992) report a Monte Carlo study for a bivariate system which shows that the formulae above reflect the main finite sample effects when $T = 100$. Their evidence also suggests that there is little benefit from imposing reduced rank co-integration restrictions in a bivariate VAR unless the forecast horizon is short or the sample size is small. However, there are losses from omitting relevant co-integrating vectors. Their conclusions are based on experiments where the number of co-integrating combinations is known. When the number of co-integrating vectors has to be determined from the data, the performance of the MLE will reflect both under- and over-specification of the degree of co-integration. Also, the MLE might be expected to dominate the unrestricted vector autoregression in larger systems when co-integrating relations impose many more restrictions.
8.5.5. Finite Sample Properties
Gonzalo (1990) has undertaken a Monte Carlo study of the small-sample behaviour of the Johansen procedure in a bivariate model, and has compared its performance with the Engle and Granger (1987) two-step approach, as well as several other procedures based on canonical correlations and principal components. Even though the parameter estimates in I(1) processes converge at a rate of $T$, rather than $T^{1/2}$, quite large differences in estimates emerge from the various methods considered. The findings are reasonably encouraging for the maximum-likelihood method. Specifically, Gonzalo finds that the MLE frequently has the smallest mean-squared error across a range of parameter values of interest to empirical research. He also delineates several features of the DGP which influence the relative performances of the various estimators significantly. For example, when there is one co-integrating vector and a common factor error representation (COMFAC) is valid (see Hendry and Mizon 1978, and Sargan 1980), then the Engle-Granger two-step method is asymptotically equivalent to MLE. Generally, MLE does better at larger sample sizes and when COMFAC does not hold. The effects of non-normal errors seem minimal. However, given the similarities of the MLE to LIML, particularly the normalizations in $\alpha$, the MLE may have no finite sample moments (see Anderson 1976).

Gonzalo's paper also provides useful derivations of the asymptotic distributions of all the estimators he considers in the Monte Carlo, and relates the simulation findings to these limiting distributions. We return to this below.
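The convergence rate of $T$ mentioned above can be seen in a small simulation. The sketch below is not Gonzalo's design: it assumes a deliberately simple hypothetical DGP in which $x_t$ is a Gaussian random walk and $y_t = \beta x_t + u_t$ with white-noise $u_t$, and it examines only the static least-squares regression used in the first step of the Engle-Granger approach. The sample sizes, replication count, and seed are arbitrary.

```python
import random
import statistics

def static_ols_error(sample_size, beta, rng):
    """Estimation error of beta from regressing y on x (no intercept)."""
    x = 0.0
    sxy = 0.0
    sxx = 0.0
    for _ in range(sample_size):
        x += rng.gauss(0.0, 1.0)              # x_t is a random walk, hence I(1)
        y = beta * x + rng.gauss(0.0, 1.0)    # y_t - beta * x_t is I(0)
        sxy += x * y
        sxx += x * x
    return sxy / sxx - beta

def median_abs_error(sample_size, reps, rng, beta=1.0):
    """Median absolute estimation error across Monte Carlo replications."""
    return statistics.median(
        abs(static_ols_error(sample_size, beta, rng)) for _ in range(reps)
    )

rng = random.Random(2024)
err_100 = median_abs_error(100, 1000, rng)
err_400 = median_abs_error(400, 1000, rng)
ratio = err_100 / err_400
print("median |error| at T = 100 and T = 400:", round(err_100, 4), round(err_400, 4))
print("ratio:", round(ratio, 2))
```

Quadrupling the sample size should shrink the typical error by a factor of roughly four under convergence at rate $T$, against a factor of two under the usual $T^{1/2}$ rate. The simulated ratio is subject to Monte Carlo noise, so only its approximate size is informative.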
Reimers (1991) compares the powers of various tests for co-integration for bivariate and trivariate processes. He finds that the Johansen procedure over-rejects when the null is true, in small samples, and suggests correcting this using $(T - p)\log(1 - \lambda_i)$ instead of $T\log(1 - \lambda_i)$ for the test statistics, where $p = nk$ takes account of the number of estimated parameters. While $nk/T$ is asymptotically negligible, it can be large in small samples. The power of the tests is dependent on the specification of the DGP, but Reimers does not relate his simulation findings to the type of analysis in Section 8.5.3.

8.5.6. Selecting Lag Length

Both Gonzalo's (1990) and Reimers's (1991) studies consider the effects on the MLE of using incorrect lag lengths for the short-run dynamics. Gonzalo finds that the loss of efficiency from choosing too long a lag is small, and that the MLE performs best even if a lag of four periods is used for the short-run dynamics instead of the correct value of 0. However, if too short a lag length is used (for example, zero lags instead of one) then the MLE is no longer the best method. More practical experience is required before a final judgement can be reached on the relative costs of under-specifying versus over-specifying the lag length, but Gonzalo's simulation evidence seems intuitively reasonable since under-specification will induce residual autocorrelation. Reimers finds that the Schwarz criterion does well in a data-based lag-length selection exercise. However, since the role of the $\Delta x_{t-i}$ is to whiten the error, it is not clear that the use of the Schwarz criterion, which penalizes the addition of lags strongly, will prove optimal in this context.

8.5.7. The Analysis of I(2) Variables

Reconsider the basic autoregressive system with lag length $k$, written as
where $A_0 = I$, so that
Writing this system in the usual form,
we see that
The mean-lag matrix is given by
To preclude $x_t$ being integrated of order 2, $\gamma_\perp'\Psi\alpha_\perp$ must have full rank $(n - r)$. Suppose instead that $\gamma_\perp'\Psi\alpha_\perp = \phi\eta'$, where $\phi$ and $\eta$ are $(n - r) \times p$ matrices of rank $p \le (n - r)$. When $p < (n - r)$, an additional condition is needed to prevent I(3) variables, similar in form to the earlier mean-lag condition. We assume that $x_t$ is I(2), so $\Delta x_t$ is I(1) and $\Delta^2 x_t$ is I(0). However, the original series $\alpha' x_t$ will usually be I(1), and combinations of the form $\alpha^{*\prime}\Delta x_t$ and $\alpha' x_t + d'\Delta x_t$ will be I(0). This result helps explain why investigators often need variables such as inflation in long-run money demand equations. When nominal money and prices are I(2) but co-integrate to I(1) as real money, and real income is I(1), velocity may still be I(1) and require inflation to co-integrate to I(0). Further, the concepts of multi-co-integration (see Granger and Lee 1990) or polynomial co-integration (see Engle and Yoo 1991) can be linked by such results to the analysis of I(2) processes. Thus, earlier models of, for example, consumers' expenditure involving
the wealth-income ratio as an integral correction mechanism can be appropriately re-interpreted (see Hendry and Ungern-Sternberg 1981).

Johansen (1991b) provides a statistical procedure based on an extension of the I(1) MLE, which essentially consists in repeating the I(1) method twice. The first stage proceeds as usual for the reduced-rank analysis of the levels of the variables, correcting for the lagged first differences and any dummy variables, to determine $r$, $\gamma$, and $\alpha$. Next, one transforms the variables to I(1) combinations as just described by creating $\alpha_\perp'\Delta x_{t-1}$ and $\alpha' x_{t-1}$, and regresses $\gamma_\perp'\Delta^2 x_t$ on those two plus lagged $\Delta^2 x_{t-i}$ up to lag length $k - 2$ to establish $\phi$, $\eta$, and $p$. Johansen shows that, asymptotically, this procedure determines the correct parameters. He also obtains the relevant limiting distributions of the estimators.

8.5.8. Weak Exogeneity and Conditional Models

Most large-scale econometric systems and many other empirical models are open in the sense that they treat a subset of the variables as 'exogenous'. In this sub-section, we will focus on the potential weak exogeneity of contemporaneous conditioning variables for the parameters of interest in I(1) co-integrated systems (see Engle et al. 1983). As discussed in Chapter 1, weak exogeneity requires that there is no loss of information about the parameters of interest in reducing the analysis from the joint distribution to a conditional model. The concept was developed initially in the context of stationary processes, but as the results in Chapter 7 suggested, it plays an important role in I(1) systems as well.
In particular, when the vector of observables $x_t$ is I(1) there can be cross-equation links between parameters, which are induced by the occurrence in several equations of common co-integrating combinations $\alpha' x_t$. If $\alpha' x_t$ enters both the $i$th and $j$th equations, then $x_{jt}$ cannot be weakly exogenous for the parameters of the $i$th equation, since the parameters of the two equations share common components of $\alpha' x_t$ and so cannot be variation free. Failure to account for such parameter dependencies can adversely affect the validity of inference in finite samples (see Chapter 7, Phillips 1991, Phillips and Loretan 1991, and Hendry and Mizon 1992).

To develop notation for an I(1) open system, two partitions of $x_t$ are needed. To exposit the basic idea, it is convenient to return to the first-order system in (38) above, written as

where $\varepsilon_t \sim \mathrm{IN}(0, \Sigma)$ and $\alpha'$ is $r \times n$ of rank $r$. First, we have the usual
transformed partition of $x_t$ into $w_t = (x_t'\alpha : \Delta x_t'\alpha_\perp)'$, capturing the locations of the unit roots and the co-integrating vectors, where there are $r$ elements in $x_t'\alpha$ and $(n - r)$ in $\Delta x_t'\alpha_\perp$. The history of the process up to time $t - 1$ is denoted in I(0) space by $W_{t-1} = (w_1, \ldots, w_{t-1})$. Second, we partition $\Delta x_t$ into $(\Delta x_{1t}' : \Delta x_{2t}')'$, where $\Delta x_{2t}$ is $m \times 1$ and is to be treated as weakly exogenous for the vector parameter of interest $\phi \in \Phi$, which includes those elements of $\alpha$ and $\gamma$ relevant to $\Delta x_{1t}$. For later use, we explicitly write out $\gamma\alpha' x_{t-1}$ in terms of $(x_{1t-1}' : x_{2t-1}')'$, when there are $r_1 + r_2 = r$ co-integrating relations in the two blocks, namely
The dimensions of $\gamma_{11}$, $\gamma_{12}$, $\gamma_{21}$, and $\gamma_{22}$ are $(n - m) \times r_1$, $(n - m) \times r_2$, $m \times r_1$, and $m \times r_2$ respectively; and, correspondingly, $\alpha_{11}'$, $\alpha_{12}'$, $\alpha_{21}'$, and $\alpha_{22}'$ are $r_1 \times (n - m)$, $r_1 \times m$, $r_2 \times (n - m)$, and $r_2 \times m$. If $r_2 = 0$, then the relevant elements are set to zero. Since the analysis in terms of $w_t$ is in I(0) space, the approach in Engle et al. applies.

The complete set of parameters of the joint distribution is $\theta \in \Theta$, and these are mapped one-for-one to $f(\theta) = \lambda \in \Lambda$, and partitioned into $\lambda = (\lambda_1 : \lambda_2)'$ where $\lambda_1 \in \Lambda_1$ and $\lambda_2 \in \Lambda_2$. Factorize the joint sequential density $D_x(\Delta x_t \mid W_{t-1}, \theta)$ of $\Delta x_t$ into its conditional and marginal components:
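$$D_x(\Delta x_t \mid W_{t-1}, \theta) = D_{x_1|x_2}(\Delta x_{1t} \mid \Delta x_{2t}, W_{t-1}, \lambda_1)\, D_{x_2}(\Delta x_{2t} \mid W_{t-1}, \lambda_2). \qquad (56)$$

Since $w_{t-1} = (x_{t-1}'\alpha : \Delta x_{t-1}'\alpha_\perp)'$, all the information on the co-integrating vectors is retained in $W_{t-1}$. Consequently, $\Delta x_{2t}$ is weakly exogenous for $\phi$ if $\phi$ depends on $\lambda_1$ alone, and $\lambda_1$ and $\lambda_2$ are variation free, so that $\Lambda = \Lambda_1 \times \Lambda_2$. Weak exogeneity of $\Delta x_{2t}$ for $\phi$ cannot occur when $\lambda_1$ and $\lambda_2$ both depend on common components of $\alpha$.

As a consequence of the normality assumption, and using the expression in (55) for $\gamma\alpha' x_{t-1}$, conditioning $\Delta x_{1t}$ on $\Delta x_{2t}$ leads to the mean of the conditional density: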
where $W = \Sigma_{12}\Sigma_{22}^{-1}$. Thus, a necessary condition for the weak exogeneity of $\Delta x_{2t}$ for $(\gamma_{11} : \alpha_{11} : \alpha_{12})$ is that either $\{\gamma_{12} - W\gamma_{22}\} = 0$ or $\gamma_{22} = 0$; i.e. $(\alpha_{21}' x_{1t-1} + \alpha_{22}' x_{2t-1})$ appears in only one of $D_{x_1|x_2}(\cdot)$ or $D_{x_2}(\cdot)$, but not both. Further, unless $\gamma_{21} = 0$, then $(\alpha_{11}' x_{1t-1} + \alpha_{12}' x_{2t-1})$ will appear in the marginal distribution of $\Delta x_{2t}$, so $\gamma_{21} = 0$ is also necessary. There are sufficient conditions for these necessary conditions to hold, including $\gamma_{21} = 0$, $\gamma_{22} = 0$ and $\gamma_{12} = 0$, where the latter two arise because $r_2 = 0$. Such conditions can be tested using the approach in Johansen (1992b) and Johansen and Juselius (1990).

Short-run parameters may depend on some of the elements in $\alpha$ without jeopardizing efficient inferences about long-run parameters of interest. However, if all the elements of $\phi$ are of interest, then again variation-free parameters are required, and any cross-restrictions violate weak exogeneity.

To illustrate this analysis, reconsider the example in equations (31) and (32) and (60) of Chapter 7. There is one co-integrating vector with parameter $\beta$, $r_1 = r = 1$, $r_2 = 0$, $m = 1$, and $n = 2$:
This representation is in terms of $w_t$ (see (38) above) but is written as a triangular system error correction as in Phillips (1991), imposing a specific first-order autoregressive parametric form for the error process $u_t$ (compared with the general processes allowed by Phillips):

The unconditional covariance matrix of $u_t$ is $\operatorname{plim} T^{-1}\sum u_t u_t' = G$, derived in Section 8.5.1. Let $c_{12} = c_{22} = 0$ since these parameters only determine the presence of the lagged difference of $x_{2t}$, and do not affect co-integration vectors. Then the long-run covariance matrix is (see Ch. 7 appendix):
where $\omega_{11} = \sigma_{11}/(1 - c_{11})^2$ and $\omega_{12} = \sigma_{12}/(1 - c_{11})$. The non-diagonality of $\Omega$ implies that there is information about the parameters of each equation in the other. However, by conditioning $\Delta x_{1t}$ on $\Delta x_{2t}$ in the first equation, the $\sigma_{12}$ effect is removed. Then even if the first equation is dynamic, so $c_{11} \ne 0$, the diagonality of $\Omega$ only depends on $c_{21} = 0$. When $c_{21} \ne 0$, the long-run covariance matrix is non-diagonal and there
is a loss of weak exogeneity, which can have a detrimental impact on the bias and efficiency of the least-squares estimator of $\beta$ in finite samples. Note that $c_{12} \ne 0$ can be corrected within the first equation treated in isolation by adding lagged $\Delta x_{2t}$, but that $c_{21} \ne 0$ requires modelling the system (although corrections based on adding leads of $\Delta x_{2t}$ have been proposed to exploit the obverse Granger causality of $x_1$ on $x_2$: see Stock and Watson 1991).

We now derive the conditional and marginal factorizations. In terms of observables, the original system from Chapter 7 can be written as $w_t = Cw_{t-1} + e_t$, or
Rewritten as a VAR in I(0) variables as in (37), we have
where $d_{12} = c_{12} + \beta c_{22}$, $\gamma_{11} = (c_{11} - 1 + \beta c_{21})$, $d_{22} = c_{22}$, and $\gamma_{21} = c_{21}$. The restricted first column of $D$ is an incidental effect from assuming a first-order autoregressive error initially. Finally, solving for the conditional and marginal representations, we have
where $W = \sigma_{12}\sigma_{22}^{-1}$, $\lambda_{11} = (\beta + W)$, $\lambda_{12} = (c_{11} - 1 - Wc_{21})$, $\lambda_{13} = (c_{12} - Wc_{22})$, $\lambda_{21} = c_{21}$, $\lambda_{22} = c_{22}$, and $E[v_t\varepsilon_{2t}] = 0$. Assume that $\phi = (\lambda_{11} : \lambda_{12} : \lambda_{13} : \beta)'$ is the vector parameter of interest. When $\lambda_{21} = 0$, least-squares estimation of $\phi$ from the first equation involves no loss of information. In fact, $x_{2t}$ is strongly exogenous for $\phi$ in such a system. However, when $\lambda_{21} \ne 0$, $\Delta x_{2t}$ is not weakly exogenous for $\phi$ and the analysis is not fully efficient. Monte Carlo studies (e.g. Phillips and Loretan 1991) confirm the impact of this loss of efficiency in finite samples (see Chapter 7).

Irrespective of the value of $\lambda_{21}$, the first equation in (62) is the conditional expectation from (58), namely

Thus, once data are I(1) but co-integrated, the fact that an equation coincides with the conditional expectation is not sufficient to justify single-equation least-squares modelling. Rather surprisingly, weak exogeneity is at least as important in I(1) processes as in I(0) processes.
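The long-run covariance expressions used in this example can be checked numerically. The sketch below is an illustration of ours rather than part of the original derivation: it assumes the error process $u_t = Cu_{t-1} + e_t$ with $E[e_t e_t'] = \Sigma$, for which the long-run covariance is $\Omega = (I - C)^{-1}\Sigma(I - C')^{-1}$, and it uses hypothetical parameter values. The helper names (`inv2`, `matmul2`, `transpose2`, `long_run_cov`) are also ours.

```python
def inv2(m):
    """Invert a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][h] * q[h][j] for h in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def long_run_cov(C, Sigma):
    """Omega = (I - C)^{-1} Sigma (I - C')^{-1} for u_t = C u_{t-1} + e_t."""
    A = inv2([[1 - C[0][0], -C[0][1]], [-C[1][0], 1 - C[1][1]]])
    return matmul2(matmul2(A, Sigma), transpose2(A))


# Hypothetical values, chosen only for illustration.
c11, s11, s12, s22 = 0.5, 1.0, 0.3, 2.0

# Case 1: c12 = c22 = c21 = 0, correlated errors.
omega = long_run_cov([[c11, 0.0], [0.0, 0.0]], [[s11, s12], [s12, s22]])

# Case 2: orthogonal errors (sigma_12 = 0) but c21 = 0.2.
omega_c21 = long_run_cov([[c11, 0.0], [0.2, 0.0]], [[s11, 0.0], [0.0, s22]])

print(omega[0][0], omega[0][1])   # sigma_11/(1-c11)^2 and sigma_12/(1-c11)
print(omega_c21[0][1])            # sigma_11*c21/(1-c11)^2, non-zero since c21 != 0
```

In the first case the computed entries agree with $\omega_{11} = \sigma_{11}/(1 - c_{11})^2$ and $\omega_{12} = \sigma_{12}/(1 - c_{11})$. In the second case the errors are already orthogonal, yet the off-diagonal element is non-zero, which is the sense in which diagonality depends only on $c_{21} = 0$.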
8.6. A Second Example of the Johansen Maximum Likelihood Approach

We reconsider the UK seasonally adjusted quarterly data from Sect. 7.6 on money, prices, output, and interest rates, this time treated as a system, represented by a VAR with two lags on each of $m - p$, $\Delta p$, $x_{85}$, and $R_n$, plus a constant and a trend. The lag length was selected by commencing at five lags on every variable, and sequentially testing from the highest order. The sample was 1964(3)-1989(2). The residual standard deviations of the four equations were 0.0161, 0.0069, 0.0126, and 0.0127 respectively, and on recursive F-tests all four equations had acceptably constant coefficients using one-off I(0) critical values. The residuals also yielded insignificant outcomes on $\chi^2$ tests for autocorrelation but not for normality.

In almost every instance, two co-integrating combinations were significant (i.e. two unit roots were rejected); the second of these was virtually the same in all lag specifications, but the first was often a linear combination of the first two rows reported in Table 8.9. Such a finding matches that in Hendry and Mizon (1992) and Ericsson et al. (1991). Beginning with the largest statistics, two of the tests in each column are significant (see Osterwald-Lenum 1992: Table 2).

TABLE 8.8. Eigenvalues, test statistics, and 5 per cent critical values

r    n - r    Eigenvalue μ_i    -T log(1 - μ_i)    5% critical value    -T Σ log(1 - μ_i)    5% critical value
0    4        0.517240          72.82              30.33                109.17               54.64
1    3        0.249694          28.73              23.78                36.34                34.55
2    2        0.060350          6.22               16.87                7.62                 18.17
3    1        0.013817          1.39               3.74                 1.39                 3.74

The corresponding eigenvectors are shown in Table 8.9, in rows, augmented by the two non-co-integrating combinations in the last two rows.

TABLE 8.9. Normalized eigenvectors $\alpha'$

Variable    m - p      Δp        x85        Rn
α1           1.0000    6.3966    -0.8938     7.6838
α2           0.0311    1.0000    -0.3334    -0.1377
v1          -0.2633    0.9435     1.0000    -1.2117
v2           0.9838    4.5659    -0.7701     1.0000

The first row suggests the following long-run solution for the money equation:

$$m - p = 0.89\,x_{85} - 6.40\,\Delta p - 7.68\,R_n.$$

This is close to that found from the single-equation dynamic analysis in Chapter 7. No trend is required. The $\gamma$ matrix is given in Table 8.10. Only the first entry in the first column is at all large, so that the first co-integrating vector only affects the first equation, consistent with the weak exogeneity of $x_{85}$, $R_n$, and $\Delta p$ for the parameters of the money-demand equation. This again matches the finding over a shorter sample in Hendry and Mizon (1992).

The second row of Table 8.9 delivers the approximate long-run solution

$$\Delta p = 0.33\,x_{85} + 0.14\,R_n - 0.03\,(m - p).$$

This corresponds to the impact of excess demand, as measured by the deviation from its linear trend, on inflation with a small and possibly insignificant effect from interest rates. No additional trend is then required. The second column of $\gamma$ shows a large effect of this ECM on all four equations, violating any possibility of treating any of the four variables as weakly exogenous in a model of inflation or excess demand when the parameters of interest include the long-run multipliers.

When the ordering of variables is $(m - p, \Delta p, x_{85}, R_n)$ the long-run $\pi$ matrix is

    -0.082    -0.245    -0.081    -0.761
    -0.009    -0.474     0.164     0.112
     0.007     0.146    -0.108    -0.147
    -0.021    -0.119     0.149    -0.059
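The entries of Tables 8.8 to 8.10 can be cross-checked against one another. The sketch below is ours and rests on several stated assumptions: that $T = 100$ (the number of quarters in 1964(3)-1989(2)), that each row of the $\gamma$ matrix in Table 8.10 belongs to one equation with the last two columns loading on the two non-co-integrating combinations, and that the long-run matrix is the product $\gamma\alpha'$ of the full matrices. It also applies a Reimers-type scaling with $p = nk = 8$, which is our own illustrative choice of $p$ for this system, not a result reported in the text.

```python
import math

T = 100                                   # assumed: quarters in 1964(3)-1989(2)
eigenvalues = [0.517240, 0.249694, 0.060350, 0.013817]   # Table 8.8


def johansen_stats(eigs, scale):
    """Return (max-eigenvalue, trace) statistics using the given scale factor."""
    max_stats = [-scale * math.log(1.0 - mu) for mu in eigs]
    trace_stats = [sum(max_stats[i:]) for i in range(len(eigs))]
    return max_stats, trace_stats


max_stats, trace_stats = johansen_stats(eigenvalues, T)

# Reimers-type small-sample scaling (T - p) with p = n * k (assumed n = 4, k = 2).
n_vars, n_lags = 4, 2
max_adj, trace_adj = johansen_stats(eigenvalues, T - n_vars * n_lags)

alpha_t = [                               # rows of Table 8.9
    [1.0000, 6.3966, -0.8938, 7.6838],
    [0.0311, 1.0000, -0.3334, -0.1377],
    [-0.2633, 0.9435, 1.0000, -1.2117],
    [0.9838, 4.5659, -0.7701, 1.0000],
]
gamma = [                                 # rows of Table 8.10 (one per equation)
    [-0.0952, 0.4268, -0.0300, -0.0076],
    [0.0048, -0.5147, -0.0013, 0.0024],
    [-0.0210, 0.2578, -0.0318, 0.0116],
    [-0.0001, -0.2253, 0.0796, 0.0069],
]
pi_reported = [                           # long-run matrix as printed
    [-0.082, -0.245, -0.081, -0.761],
    [-0.009, -0.474, 0.164, 0.112],
    [0.007, 0.146, -0.108, -0.147],
    [-0.021, -0.119, 0.149, -0.059],
]

# pi = gamma * alpha'
pi = [[sum(gamma[i][h] * alpha_t[h][j] for h in range(4)) for j in range(4)]
      for i in range(4)]
max_gap = max(abs(pi[i][j] - pi_reported[i][j])
              for i in range(4) for j in range(4))

print([round(s, 2) for s in max_stats])
print([round(s, 2) for s in trace_stats])
print([round(s, 2) for s in trace_adj])
print(max_gap)
```

Under these assumptions the unscaled statistics agree with Table 8.8 to within rounding, and the product $\gamma\alpha'$ agrees with the printed long-run matrix to about three decimal places. With the illustrative choice $p = 8$, the second trace statistic is scaled down by the factor 0.92, which shows how sensitive a marginal rejection can be to the small-sample correction.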
TABLE 8.10. Adjustment coefficients $\gamma$

Variable    γ1         γ2
m - p       -0.0952     0.4268    -0.0300    -0.0076
Δp           0.0048    -0.5147    -0.0013     0.0024
x85         -0.0210     0.2578    -0.0318     0.0116
Rn          -0.0001    -0.2253     0.0796     0.0069

8.7. Asymptotic Distributions of Estimators of Co-integrating Vectors in I(1) Systems

Gonzalo (1990) reviews and compares the various alternatives to OLS for the estimation of co-integrating vectors, including those proposed by Stock (1987), Stock and Watson (1988b), Johansen (1988), Phillips (1988a), and Phillips and Hansen (1990). While all of the suggested methods share the super-consistency property, we have seen that there can be substantial differences in their performance on moderately sized samples.

Gonzalo makes the comparison on a simple data generation process in which co-integration holds between the I(1) series $z_t$ and $y_t$:

and
This system is a special case of (58) and can therefore be represented in the error-correction form
where $u_{1t} = \beta\varepsilon_{2t} + \varepsilon_{1t}$, $u_{2t} = \varepsilon_{2t}$, and $E(uu') = \Lambda$, with
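The logarithm of the likelihood function for the ECM is therefore

$$L(\alpha, \gamma, \Lambda) = K - (T/2)\ln|\Lambda| - \tfrac{1}{2}\sum_{t=1}^{T}(\Delta x_t - \gamma\alpha' x_{t-1})'\Lambda^{-1}(\Delta x_t - \gamma\alpha' x_{t-1}),$$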
where $x_t = (y_t, z_t)'$, $\gamma = (\rho - 1, 0)'$, $\alpha' = (1, -\beta)$, and $\gamma\alpha'$ is the $2 \times 2$ matrix of rank 1 given in (64).

The systems (63) and (64) have the property that $z_t$ is weakly exogenous for $\beta$. Since the $u_{it}$ are normally distributed (from (63)), take conditional expectations in (64)
Taking the covariances of the $u_t$ from (65), we have
The parameter $\beta$ is recoverable from (67). Moreover, $\beta$ does not enter the marginal distribution.

Weak exogeneity of $z_t$ for $\beta$ implies that inference concerning $\beta$ can be carried out with no loss of information by using the density of $y_t$ conditional on $z_t$ and ignoring the marginal density of $z_t$ (that is, the DGP of $z_t$). It is then not surprising that, when the log-likelihood is formally split into a conditional and a marginal likelihood, the marginal density contains no information about $\beta$. That is, (66) can be rewritten as
with $\Lambda_0 = \Lambda_{11} - \Lambda_{12}\Lambda_{22}^{-1}\Lambda_{21}$, $\xi_t = \Delta y_t - (\rho - 1)(y_{t-1} - \beta z_{t-1}) - \psi\Delta z_t$, and, finally, $\psi = \Lambda_{12}\Lambda_{22}^{-1} = (\beta + \theta\sigma_1/\sigma_2)$; $\psi$ can be interpreted as a short-run multiplier, being the coefficient on $\Delta z_t$ in (67), while the long-run multiplier is $\beta$, from (63). The term in parentheses in (68) is the marginal likelihood of $z_t$ (or $\Delta z_t$) and does not involve $\beta$; estimation of $\beta$ can be carried out by maximizing the conditional likelihood alone. The estimate is that which would be obtained from OLS in the regression corresponding to (67).

In order to discuss the asymptotic properties of different estimation methods, we use the multivariate functional central-limit theorem and transformation to the unit interval described in Chapter 6. For the vector $e_t = (v_t, \varepsilon_{2t})'$, let $p_t = p_{t-1} + e_t$. Then
with $B(r) = (B_1(r), B_2(r))'$. The long-run covariance matrix of this bivariate Brownian motion process can be calculated as in the appendix to Chapter 7:
Further,
where
Hence
Results on the asymptotic distributions of the different estimators of co-integrating parameters will be stated without proof, but can be found in Gonzalo (1990).

(i) Static regression estimated by OLS. For $x_t$ generated by (63), the OLS estimator of $\beta$ in a static regression has the asymptotic distribution
using the decomposition $B_1(s) = \omega_{12}\omega_{22}^{-1}B_2(s) + (\ldots$

$\ldots$ $\psi = \beta$ implies $\theta = 0$, and so $A_2 = A_3 = 0$. While the limiting distribution above is specific to the DGP (63), $\psi = \beta$ will typically only arise because of an absence of lagged values of $z_t$ and $y_t$ from the DGP; if for example $y_t = \psi z_t + \gamma_1 y_{t-1} + \gamma_2 z_{t-1} + \text{error}$, then the long-run multiplier is $\beta = (\psi + \gamma_2)/(1 - \gamma_1)$, in which case $\gamma_1 = \gamma_2 = 0$ is sufficient for $\beta = \psi$. A common factor ($\gamma_2 = -\psi\gamma_1$) is necessary and sufficient. The terms $A_2$ and $A_3$ above can be eliminated when $\psi \ne \beta$ by the use of other estimation methods, as will be seen below.
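The relation between the short-run and long-run multipliers in the dynamic example above can be made concrete with a few lines of arithmetic. The sketch below is ours and uses hypothetical coefficient values; the function names `long_run_multiplier` and `steady_state` are illustrative only. It evaluates $\beta = (\psi + \gamma_2)/(1 - \gamma_1)$ and confirms that iterating the equation with $z_t$ held fixed and the error set to zero converges to the same value.

```python
def long_run_multiplier(psi, g1, g2):
    """beta = (psi + g2) / (1 - g1) for
    y_t = psi*z_t + g1*y_{t-1} + g2*z_{t-1} + error."""
    if abs(g1) >= 1:
        raise ValueError("a stable long-run solution needs |g1| < 1")
    return (psi + g2) / (1.0 - g1)


def steady_state(psi, g1, g2, z=1.0, steps=500):
    """Iterate the equation with z held fixed and zero errors."""
    y = 0.0
    for _ in range(steps):
        y = psi * z + g1 * y + g2 * z
    return y


psi, g1 = 0.4, 0.5                      # hypothetical values

# General case: the long-run multiplier differs from psi.
beta_general = long_run_multiplier(psi, g1, g2=0.1)
y_limit = steady_state(psi, g1, g2=0.1)

# Common-factor case, g2 = -psi*g1: the long-run multiplier equals psi.
beta_comfac = long_run_multiplier(psi, g1, g2=-psi * g1)

print(beta_general, y_limit, beta_comfac)
```

With these values the general case gives a long-run multiplier different from $\psi$, while imposing the common-factor restriction returns $\psi$ itself, in line with the necessary and sufficient condition stated above.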
(ii) Non-linear least squares (Stock 1987). This method, which eliminates the bias contained in (70c), consists in minimizing the sum of squared residuals defined as
which is non-linear in that the coefficient on $z_{t-1}$ in the corresponding regression model is $\gamma_1\beta$. The coefficient $\beta$ can however be recovered from the ordinary linear regression
The asymptotic distribution of this NLS estimator is similar to that in (69), but with the term (70c) omitted and (70b) modified to
Comparing (70b) and (70b'), we see that (70b') contains a factor of $\psi$ rather than $(\psi - \beta)$. As (70b) is one of the terms responsible for second-order bias, it seems likely that OLS will perform relatively well when $\psi - \beta = 0$, reducing the bias in (70b), and that NLS will perform relatively well when $\psi = 0$, reducing the bias in (70b'). In the Monte Carlo study of Stock (1987), the DGP chosen implies that $\psi = 0$, leading to the superiority of the NLS technique; where $\psi = \beta$, however, OLS may do better. Recall from the definition of $\psi$ that $\psi = \beta$ if $\theta$, a scaling factor for the correlation between the underlying white-noise disturbances in $y_t$ and $z_t$, is equal to zero.

(iii) Full-information maximum likelihood (FIML). The FIML procedure of Johansen (1988) for estimating the matrix $\alpha$ of co-integrating vectors in a system is described above. Gonzalo shows that, for the DGP (63), the FIML estimator of $\beta$ has the asymptotic distribution
where $A_1$ is as given in (70a). Therefore (71) is equivalent to (69) with terms $A_2$ and $A_3$ eliminated. FIML estimation eliminates two sources of bias: the non-symmetry caused by $\psi \ne \beta$, which leads to a bias in median (term (70b)), and the simultaneous-equations bias, which is a bias in mean (term (70c)) and results when the long-run covariance between $z_t$ and $v_t$ in (63) is not accounted for. The FIML estimator is asymptotically symmetrically distributed.
Moreover, the asymptotic distribution given in (71) is a mixture of normals. (Recall that in (70a) $B_2(s)$ and $W(s)$ are independent Brownian motion processes.) As a result, standard asymptotic chi-squared hypothesis tests are valid.

(iv) Other estimators. Stock and Watson (1988b) and Bossaerts (1988) propose additional methods of estimation based on principal components and canonical correlations respectively.

The principal-component method finds the linear combination of $y_t$ and $z_t$ with minimum variance, which amounts to finding the co-integrating vector. Given the covariance matrix of $(y_t, z_t)$, the principal-component estimate of the co-integrating vector is the eigenvector corresponding to the smallest eigenvalue of this covariance matrix. For the DGP (63), its asymptotic distribution is like that of OLS as given in (69), with the addition of a fourth term grouped with $A_1$, $A_2$ and $A_3$. Calling this term $A_4$,

The additional term affects the bias in mean, which may be larger or smaller than that of OLS as this term may be positive or negative. Like FIML, the principal-component method lends itself naturally to the estimation of more than one co-integrating vector.

The method of canonical correlation is based on a search for the linear combination of $(y_t, z_t)$ and $(y_{t-1}, z_{t-1})$ which has the maximal correlation subject to normalization and identification constraints.

Gonzalo compares the methods in a Monte Carlo simulation that uses a DGP similar to (63), but with (63a) modified to
where $a_1 = 0$ or 1 and with $a_2 = 1$. The results are consistent with the analysis of biases given above, and in particular support the contention that the Johansen-type FIML estimator will tend to be superior. Which of OLS and NLS is superior depends, as anticipated, on the parameters $\psi$ and $\psi - \beta$. Moreover, as we have seen above, it appears that the efficiency cost of over-parameterization of the FIML or NLS estimators is modest, while the consequences of under-parameterization may be more serious.
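The principal-component estimator described under (iv) is simple enough to sketch for the bivariate case. The code below is an illustration of ours, not Gonzalo's experiment: the DGP is a deliberately simplified stand-in for (63) in which $z_t$ is a Gaussian random walk and $y_t = \beta z_t$ plus white noise, the seed and parameter values are arbitrary, and the function names are hypothetical. It computes the sample covariance matrix of $(y_t, z_t)$ and takes the eigenvector belonging to its smallest eigenvalue, normalized on $y_t$.

```python
import math
import random


def smallest_eigvec_2x2(a, c, d):
    """Eigenvector of [[a, c], [c, d]] for its smallest eigenvalue,
    normalized so that the first element equals 1."""
    disc = math.sqrt((a - d) ** 2 + 4.0 * c * c)
    lam = 0.5 * (a + d - disc)            # smallest eigenvalue
    # (lam - d, c) satisfies the second row of (M - lam*I) v = 0
    return 1.0, c / (lam - d)


def pc_cointegrating_vector(y, z):
    """Principal-component estimate from the sample covariance of (y, z)."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    a = sum((v - my) ** 2 for v in y) / n
    d = sum((v - mz) ** 2 for v in z) / n
    c = sum((p - my) * (q - mz) for p, q in zip(y, z)) / n
    return smallest_eigvec_2x2(a, c, d)


rng = random.Random(12345)                # arbitrary seed for reproducibility
beta = 2.0                                # hypothetical co-integrating parameter
z, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)          # z_t is a random walk, hence I(1)
    z.append(level)
y = [beta * zt + rng.gauss(0.0, 0.1) for zt in z]   # stationary deviation

v1, v2 = pc_cointegrating_vector(y, z)
beta_hat = -v2                            # estimated vector is (1, -beta_hat)
print(beta_hat)
```

Because the variance of the I(1) component dominates that of the stationary deviation, the minimum-variance direction lies close to $(1, -\beta)$, which is the intuition behind the method. The sketch says nothing about the relative finite-sample biases discussed above, which depend on the dynamics and error correlations of the DGP.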
9
Conclusion

We briefly summarize the main themes of the book, and then consider the invariance of the matrix of co-integrating vectors in a linear system under both linear transformations and seasonal adjustment. Next, co-integration is related to structured time-series models, which offer an alternative approach to modelling integrated data. Recent research on integration and co-integration is described, and the book concludes by re-interpreting some old econometric problems in the light of co-integration theory.
9.1. Summary

Many economic time series appear to be non-stationary and to drift over time. Efficient inference in time-series econometrics requires taking account of this phenomenon. This book described the modelling of economic variables as integrated processes, allowing for the possibility that variables may be linked in the long run, implying that linear combinations of them are co-integrated.

We first presented the background to the theory of integrated series, building on concepts from time-series analysis and the theory of stochastic processes. The resulting distributions of estimators and tests applied to integrated data were functionals of Wiener processes, which when combined with a functional central-limit theorem led to a powerful and general method for deriving their limiting distributions. These were different from the limiting distributions conventionally applied to stationary processes, both because the normalization factor was the sample size rather than its square root, and because the form of the asymptotic distribution was non-normal. An important implication was that the critical values of test statistics differed between I(0) and I(1) data. Although the asymptotic distribution theory involved new types of derivations, it was feasible to master the logic of Wiener processes without excessive effort; the pay-off was that the approach simplified other derivations (such as constancy tests, as in Hansen 1992), and, in addition, was very general.

The Wiener process tools then allowed us to analyse such diverse problems as spurious (or nonsense) regressions, spurious detrending,
parametric and non-parametrically adjusted univariate tests for unit roots, regressions on I(1) data, and tests for co-integration. We showed that even with I(1) data many tests had conventional distributions, but some did not, so care was required in conducting inference. For example, tests such as the Johansen statistic $T\log(1 - \lambda)$ for co-integration had distributions which were functionals of Wiener processes, although tests on co-integrating vectors were asymptotically normal. In particular, over-identification tests needed to be formulated after mapping to the space of I(0) variables to ensure that their distributions were not a mixture of these two types of distributions (see Hendry and Mizon 1992). Conditioning tests on the I(1) decision for the number of co-integrating relations allowed the tests to be treated as having conventional distributions.

Co-integration provided a conceptual framework for mapping to I(0) space and therefore we examined it as a data-reduction tool and investigated some of its wide-ranging implications. Tests for co-integration based on residuals from static regressions and on systems were derived. The Granger Representation Theorem linked co-integration to a variety of other representations, including error-correction mechanisms (ECMs) which have been widely used since the late 1970s.

This link in turn entails a new view of dynamics: lagged feedbacks and ECMs do not necessarily violate rationality in an I(1) world. Further, as in Davidson et al. (1978), the role of differencing is as a transform, which preserves co-integration, and not as a filter, which eliminates levels variables and hence loses co-integration. Conversely, omitting an ECM generally induces a negative moving-average error, a point elaborated upon below.
9.2. The Invariance of Co-integrating Vectors

Linear systems, perhaps formulated after suitable data transformations (such as logarithms) intended to make linearity a reasonable approximation, play a leading role in co-integration analysis. A linear system is invariant under non-singular linear transforms, but usually its parameters are altered by such transforms. Chapter 2 discussed the properties of linear autoregressive distributed lag (ADL) models for stationary data, relating transformations of ADLs to ECMs to demonstrate the equivalence of estimators of long-run multipliers from any of the transforms even though the parameters of the equation were altered. In I(1) processes, the corresponding result is that co-integration defines an invariant of a linear system, as we now show.

Consider an identified $n \times r$ co-integration matrix $\alpha$ in the I(1) system:
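$$\Delta x_t = \mu + \Gamma\Delta x_{t-1} + \gamma\alpha' x_{t-1} + \varepsilon_t, \qquad (1)$$

where $\varepsilon_t \sim \mathrm{IN}(0, \Sigma)$. The system in (1) has parameters $(\Gamma, \gamma, \alpha, \mu, \Sigma)$. Then, $x_t$ is I(1) if and only if $\operatorname{rank}(\gamma_\perp'\Psi\alpha_\perp) = n - r$, where $\Psi$ is the mean-lag matrix defined in Chapter 8. Here $(\gamma : \gamma_\perp)$ has rank $n$, with $\gamma_\perp$ being $n \times (n - r)$ such that $\gamma_\perp'\gamma = 0$, and $(\alpha : \alpha_\perp)$ has rank $n$ with $\alpha_\perp'\alpha = 0$ for $\alpha_\perp$ of size $n \times (n - r)$. Pre-multiplying (1) by a known $n \times n$ non-singular matrix $B$ (so $|B| \ne 0$) gives

$$B\Delta x_t = \mu^* + \Gamma^*\Delta x_{t-1} + \gamma^*\alpha' x_{t-1} + \varepsilon_t^*, \qquad (2)$$

with $\mu^* = B\mu$, $\Gamma^* = B\Gamma$, $\gamma^* = B\gamma$, and $\varepsilon_t^* = B\varepsilon_t$.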
The system in (2) has the same likelihood as (1), but with parameters $(\Gamma^*, \gamma^*, \alpha, \mu^*, \Sigma^*)$ where $\Sigma^* = B\Sigma B'$; an example of an admissible transform is any just-identified reformulation of (1). Only $\alpha$ is unaffected by the linear transform, and $\alpha' x_{t-1}$ remains the co-integrating combination, so $\alpha$ is an invariant parameter of the system.

The I(1) property of the system is also preserved, as follows. The mean-lag matrix becomes $\Psi^* = B\Psi$ and, letting $(\gamma^* : \gamma_\perp^*) = (B\gamma : B'^{-1}\gamma_\perp)$ so that $\gamma^{*\prime}\gamma_\perp^* = 0$, then

$$\gamma_\perp^{*\prime}\Psi^*\alpha_\perp = \gamma_\perp' B^{-1}B\Psi\alpha_\perp = \gamma_\perp'\Psi\alpha_\perp,$$

and hence the two matrices have the same rank. The invariance of $\alpha$ is a natural property of reduced-rank systems and extends to I(2) processes and to conditional systems. Thus, for a given vector $x_t$, reduced forms, marginal models, conditional models, and structural forms can all be modelled with the same set of co-integration vectors.
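The invariance argument can be illustrated with a small numerical case. The sketch below is ours: the values of $\gamma$, $\alpha$, and $B$ are hypothetical, with $n = 2$ and $r = 1$, and `matmul` is a plain helper for nested lists. It checks that pre-multiplying the levels matrix $\gamma\alpha'$ by $B$ changes the adjustment coefficients to $B\gamma$ while leaving $\alpha'$, and therefore the directions annihilated by it, unchanged.

```python
def matmul(p, q):
    """Product of two matrices stored as nested lists."""
    return [[sum(p[i][h] * q[h][j] for h in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


gamma = [[-0.5], [0.2]]          # n x r adjustment coefficients (hypothetical)
alpha_t = [[1.0, -2.0]]          # alpha', r x n
alpha_perp = [[2.0], [1.0]]      # satisfies alpha' alpha_perp = 0
B = [[1.0, 0.5], [0.0, 2.0]]     # non-singular: |B| = 2

pi = matmul(gamma, alpha_t)                  # gamma alpha'
pi_star = matmul(B, pi)                      # levels matrix after the transform
gamma_star = matmul(B, gamma)                # transformed adjustment coefficients
refactored = matmul(gamma_star, alpha_t)     # (B gamma) alpha'
annihilated = matmul(pi_star, alpha_perp)    # should be a zero vector

print(pi_star, refactored, annihilated)
```

The transformed matrix factors as $(B\gamma)\alpha'$, so the same $\alpha$ serves before and after the transform, and $\alpha_\perp$ still spans its null space. Only the adjustment coefficients change.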
9.3. Invariance of Co-integration Under Seasonal Adjustment

The co-integrating vector $\alpha$ is invariant to seasonal adjustment by a diagonal seasonal filter $\mathbf{S}(L)$ which satisfies the scale-preserving property $\mathbf{S}(1) = \mathbf{I}$, as does a procedure like X-11. The results in this section are drawn from Ericsson, Hendry, and Tran (1992). It is assumed that $\mathbf{S}(L)$ annihilates any deterministic seasonal dummies. The invariance result holds because $\mathbf{S}(L)$ can be written as (see Chapter 5):

$$\mathbf{S}(L) = \mathbf{S}(1) + \mathbf{S}^*(L)\Delta = \mathbf{I} + \mathbf{S}^*(L)\Delta.$$

We first show the co-integration relation between adjusted and unadjusted data and then establish the invariance of the co-integration matrix $\alpha$ of $\mathbf{x}_t$. Let $\mathbf{x}^a_t = \mathbf{S}(L)\mathbf{x}_t$ denote the seasonally adjusted vector variable. Then

$$\mathbf{x}^a_t = \mathbf{S}(L)\mathbf{x}_t = \mathbf{x}_t + \mathbf{S}^*(L)\Delta\mathbf{x}_t$$
so that $\mathbf{x}^a_t - \mathbf{x}_t = \mathbf{S}^*(L)\Delta\mathbf{x}_t$. Hence $\mathbf{x}^a_t$ and $\mathbf{x}_t$ co-integrate with a unit coefficient to I(0) when $\mathbf{x}_t$ is I(1). Most seasonal adjustment filters are two-sided and symmetric for most of the available sample, so that in fact $\mathbf{S}^*(1) = 0$ and $\mathbf{S}(L) = \mathbf{I} + \mathbf{S}^{**}(L)\Delta^2$. Then $\mathbf{x}^a_t - \mathbf{x}_t = \mathbf{S}^{**}(L)\Delta^2\mathbf{x}_t$, so that co-integration to I(0) occurs between adjusted and unadjusted data even when $\mathbf{x}_t$ is I(2). Alternatively, if $\Delta\mathbf{x}_t$ is I(0) with a non-zero mean (as in GNP), then $\mathbf{x}^a_t - \mathbf{x}_t$ has a zero mean, as seems sensible for the seasonal residual. Generally, if $\mathbf{S}(L) = \mathbf{I} + \mathbf{S}^{\dagger}(L)\Delta^d$, then $\mathbf{x}^a_t$ and $\mathbf{x}_t$ co-integrate with a unit coefficient to I(0) when $\mathbf{x}_t$ is I($d$), and also have a zero mean difference when $\mathbf{x}_t$ is I($d - 1$). When $\mathbf{x}^a_t - \mathbf{x}_t$ is at most I(0), any co-integrating vector $\alpha'$ of either $\mathbf{x}^a_t$ or $\mathbf{x}_t$ is a co-integrating vector of the other, so co-integration parameters are unaffected by $\mathbf{S}(L)$. Since $\mathbf{x}^a_t = \mathbf{x}_t + \mathbf{S}^{**}(L)\Delta^2\mathbf{x}_t$, we have that

$$\alpha'\mathbf{x}^a_t = \alpha'\mathbf{x}_t + \alpha'\mathbf{S}^{**}(L)\Delta^2\mathbf{x}_t$$

and hence the difference is at least two orders of integration lower than that of $\mathbf{x}_t$.

However, the adjustment parameter $\gamma$ is altered, as follows. Multiply (1) by $\mathbf{S}(L)$ to give

$$\Delta\mathbf{x}^a_t = \mathbf{S}(L)\mu + \mathbf{S}(L)\Gamma\Delta\mathbf{x}_{t-1} + \mathbf{S}(L)\gamma\alpha'\mathbf{x}_{t-1} + \mathbf{S}(L)\varepsilon_t.$$
By suitable addition and subtraction of lags and differences of $\mathbf{x}^a_t$ on the right-hand side,

$$\Delta\mathbf{x}^a_t = \mu + \Gamma\Delta\mathbf{x}^a_{t-1} + \gamma\alpha'\mathbf{x}^a_{t-1} + \mathbf{v}^a_t. \tag{6}$$

When $\mathbf{S}^{\dagger}(L)$ is a scalar times the unit matrix (the same filter for all $x_{it}$), $\mathbf{v}^a_t = \varepsilon^a_t$. In (6), it looks as if $\gamma$ is also an invariant, but as $\mathbf{v}^a_t$ involves lagged, current, and future differences of $\mathbf{x}_t$ of $d$th or higher order, as well as $\varepsilon^a_t$, one of $\mathbf{v}^a_t$ or $\varepsilon_t$ is likely to be autocorrelated. Since $\alpha'\mathbf{x}^a_{t-1}$ is an I(0) variable, conventional serial correlation biases apply to it, and hence $\gamma$ will usually be affected by whether or not the data are seasonally adjusted. The short-run dynamics will be changed when $\varepsilon_t$ is an innovation, because $\mathbf{v}^a_t$ is correlated with $\Delta\mathbf{x}^a_{t-1}$, and additional lags are needed to remove its autocorrelation.
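The role of the factorization $\mathbf{S}(L) = \mathbf{I} + \mathbf{S}^{**}(L)\Delta^2$ can be illustrated with a small deterministic sketch. It is not taken from the text: the centred 2x12 moving average is used only as a simple stand-in for a symmetric two-sided filter with unit sum (it is not X-11), and the series, names, and sample length are assumptions. A series whose second differences vanish should pass through such a filter unchanged, and a deterministic seasonal pattern should be annihilated.

```python
import numpy as np

# Centred 2x12 moving average: symmetric weights that sum to one, so S(1) = 1.
weights = np.r_[1 / 24, np.full(11, 1 / 12), 1 / 24]
half = len(weights) // 2

def adjust(x):
    """Apply the two-sided filter; the output covers t = half, ..., T - half - 1."""
    return np.convolve(x, weights, mode="valid")

t = np.arange(120, dtype=float)
trend = 2.0 + 0.5 * t                          # linear trend: second differences are zero
seasonal = np.tile(np.arange(12) - 5.5, 10)    # monthly pattern summing to zero over a year

gap_trend = adjust(trend) - trend[half:-half]
gap_seasonal = adjust(trend + seasonal) - trend[half:-half]
```

Both gaps are zero up to rounding error in this sketch, which is the scalar analogue of the adjusted-minus-unadjusted difference depending only on second and higher differences of the unadjusted series once the seasonal dummies are removed.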
9.4. Structured Time-series Models and Co-integration

An alternative approach to modelling integrated processes is offered by structured time-series models (see Harvey 1989).¹ In this section, we briefly explain their form and relate their data description properties to a co-integrated system. A simple univariate example is given by

$$y_t = \mu_t + \varepsilon_t \tag{7}$$

$$\mu_t = \mu_{t-1} + v_t \tag{8}$$

where $\varepsilon_t$ and $v_t$ are serially uncorrelated with variances $\sigma^2_\varepsilon$ and $\sigma^2_v$, and $\mathsf{E}[\varepsilon_t v_s] = 0 \ \forall t, s$. Their form generally leads to the presence of negative moving-average errors, since (7) and (8) imply that

$$\Delta y_t = \varepsilon_t - \varepsilon_{t-1} + v_t. \tag{9}$$

The process $\{\varepsilon_t - \varepsilon_{t-1} + v_t\}$ can be re-expressed as a first-order moving average $\{e_t - \theta e_{t-1}\}$, where the moments of the derived process are identical to those of the original process and determine $\theta$. The variance of the former is $2\sigma^2_\varepsilon + \sigma^2_v$, and that of the latter, $\{e_t - \theta e_{t-1}\}$, is $(1 + \theta^2)\sigma^2_e$, and these must be equal to each other; their first-order auto-covariances are $-\sigma^2_\varepsilon$ and $-\theta\sigma^2_e$, and again these must be equal. All longer lag covariances vanish. Equating the first-order serial correlation coefficients of the two representations yields

$$\frac{-\theta}{1 + \theta^2} = \frac{-1}{2 + q} \tag{10}$$

where $q = \sigma^2_v/\sigma^2_\varepsilon$. Equation (10) is a quadratic in $\theta$ that, given $q$, can be solved for a value of $\theta$ between 0 and 1. Finally, equating first-order covariances, $\sigma^2_e = \sigma^2_\varepsilon/\theta$. Thus, $\Delta y_t$ is I(0) and has a negative moving-average error with parameter $\theta$: $\Delta y_t = e_t - \theta e_{t-1}$.

There are close links between negative moving-average errors and error-correction mechanisms, as remarked earlier (see e.g. Gregoir and Laroque 1991). Consider a simple co-integrated system,

$$\Delta y_t = \lambda(\beta z_{t-1} - y_{t-1}) + u_t \tag{11}$$

$$\Delta z_t = v_t. \tag{12}$$
To marginalize with respect to $z$ at all lags in (11), first rewrite it as

$$y_t = (1 - \lambda)y_{t-1} + \lambda\beta z_{t-1} + u_t \tag{13}$$

so that, in terms of differences,

$$\Delta y_t = (1 - \lambda)\Delta y_{t-1} + w_t. \tag{14}$$

In (14), $w_t = \lambda\beta v_{t-1} + \Delta u_t$ and, as with (9), when $\{v_t\}$ and $\{u_s\}$ are mutually independent, we can rewrite $w_t$ as $\xi_t - \tau\xi_{t-1}$, where equating moments yields $-\tau/(1 + \tau^2) = -1/(2 + s)$ for $s = \lambda^2\beta^2\sigma^2_v/\sigma^2_u$. Thus, a negative moving-average error also results from the marginalization, providing $\lambda \neq 0$ (the unit root in (14) cancels when $\lambda = 0$, since then $s = 0$ and so $\tau = 1$). If (7) and (8) allowed for a short-run dynamic element, the observed outcome would be similar to that entailed by (14).

¹ Harvey calls such models 'structural', but as that word is heavily over-used in econometrics, we have substituted 'structured'.
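The moment-matching step used for $\theta$, and again for $\tau$, amounts to solving $\theta/(1 + \theta^2) = 1/(2 + q)$ for the root in the unit interval. A minimal sketch follows; the function name `ma_parameter` is hypothetical and the value $q = 1$ is chosen only for illustration.

```python
import math

def ma_parameter(q: float) -> float:
    """Root in (0, 1] of theta**2 - (2 + q) * theta + 1 = 0.

    The quadratic is the condition theta / (1 + theta**2) = 1 / (2 + q)
    rearranged; the other root is 1 / theta and lies outside the unit interval.
    """
    if q < 0:
        raise ValueError("q is a variance ratio and must be non-negative")
    return ((2.0 + q) - math.sqrt(q * q + 4.0 * q)) / 2.0

theta = ma_parameter(1.0)                       # equal variances, q = 1
implied_correlation = -theta / (1.0 + theta**2)
```

With $q = 0$ the function returns exactly one, which corresponds to the cancelling unit root noted above; larger $q$ moves the moving-average parameter towards zero.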
A structured time-series model that generalizes (8) by including a time-varying slope generates an I(2) series,

$$\mu_t = \mu_{t-1} + \zeta_{t-1} + v_t$$

$$\zeta_t = \zeta_{t-1} + \eta_t, \qquad \eta_t \sim \mathsf{D}_\eta(0, \sigma^2_\eta).$$

Thus, as long as $\sigma^2_\eta \neq 0$,

$$\Delta^2\mu_t = \eta_{t-1} + \Delta v_t.$$

Hence from (7),

$$\Delta^2 y_t = \eta_{t-1} + \Delta v_t + \Delta^2\varepsilon_t. \tag{18}$$

When $\sigma^2_\eta = 0$, we have $\zeta_t = \zeta_{t-1} = \zeta_0$, say, so that

$$\Delta y_t = \zeta_0 + v_t + \Delta\varepsilon_t$$

and $\zeta_0$ is the mean growth rate $\mathsf{E}[\Delta y_t] = g_y = \zeta_0$. When $\sigma^2_\eta \neq 0$, (18) entails changes in $\mathsf{E}[\Delta y_t] = g_y(t)$ over time and generates $y_t$ as I(2).

The alternative possibility to evolving growth rates is that of changes in means over time, so that $g_y(t)$ takes different values in different epochs. Such behaviour could be approximated by a model in which the distribution $\mathsf{D}_\eta(\eta_t)$ was non-normal, with a large mass at zero and small probabilities of large values. Then $\zeta_t$ would usually be constant, but would occasionally jump to a new level. Thus, it is unsurprising that discrimination between integrated and regime-change models is difficult (see Perron 1989). Conversely, there are close affinities between structured time-series and econometric models for integrated data. Indeed, several researchers have suggested switching from a unit-root null to one of I(0) or co-integration. For example, one might seek to test $\sigma^2_v = 0$ when $\sigma^2_\eta = 0$ (so $\zeta_t = \zeta_0$) as a test for a unit root (see e.g. Kwiatkowski, Phillips, and Schmidt (1991) and Leybourne and McCabe (1992)).
9.5. Recent Research on Integration and Co-integration

During the last decade there has been an explosion of research on integrated and co-integrated processes. Dozens of papers appeared while we were writing the book, and many will appear between completion of
writing and its appearance in print. With such a rapidly moving target, we focused on central research topics to explain what seem likely to remain the major concepts, tools, techniques, models, methods, and tests.

Consequently, some research areas received scant treatment, including other estimation methods for co-integration vectors, as well as studies of their properties: see inter alia Ahn and Reinsel (1988), Bewley, Orden, and Fisher (1991), Boswijk (1991), Box and Tiao (1977), Engle and Yoo (1991), Phillips (1991), and Saikkonen (1991). Some comparative Monte Carlo studies of finite sample behaviour and related econometric theory have been noted, but others appear apace and we can expect many more over the next few years clarifying the choice of method, and the likely problems confronting each proposal. Researchers will also study the problems of joint selection of, e.g., lag length and the number of co-integration vectors. Another research topic is the order in which hypothesis tests should be conducted. Intuition suggests that it should be constancy, lag length, co-integration, congruence of the system, weak exogeneity, structural restrictions, encompassing, intercepts (and whether they lie in the co-integration space), etc. However, the distributions of tests of the first hypothesis are affected by the presence of co-integration, and it may well be difficult to implement a good order, although if the data are indeed I(1), tests for lag length based on lagged first differences will be in I(0) space. One recommendation concerning choices of methods and estimators that emerged as we proceeded was for a systems approach in preference to single-equation modelling until weak exogeneity has been ascertained.

Further developments have occurred in testing for unit roots in univariate processes, such as instrumental variables tests and Durbin-Hausman tests (see e.g. Hall 1991, Choi 1992, Schmidt and Phillips 1992, Kremers et al. 1992; and Banerjee and Hendry 1992 for a summary). However, the previous recommendation of modelling the system rather than using univariate representations brings into question the point of conducting unit-root tests in marginal processes. One purpose might be to reject the null of integration against trend stationarity. Here, the available tests are known to have relatively low power. In particular, investigators often use $t(\rho = 1)$ rather than $T(\hat\rho - 1)$ (see Sect. 4.6) although Monte Carlo evidence shows the latter to have higher power. In any case, failure to reject the null does not entail accepting it as 'true'. For example, univariate unit-root tests can reflect other non-modelled forms of non-stationarity such as regime shifts, and inherent non-stationarity in mean and variance functions. Further, variables inherit unit roots from marginalizing with respect to other unit root processes on which they depend. Thus, failure to reject a null of a unit root tells us little about the persistence of shocks to the variable
being considered in isolation or in a small, highly marginalized system, as discussed in Campbell and Perron (1991).

A second purpose might be to check that variables in a system are not I(2) (see e.g. Pantula 1991), so the null would be a unit root in the differences of the original variables. However, if the intention is to model the system, then it seems better to proceed from the general to the specific here as well and test the necessary rank conditions on the mean lag matrix of the system (see following (1) above). Nevertheless, sequential tests in this context raise some new problems. For example, the outcome of a pretest for a unit root (i.e. reject or not reject) affects the critical values used to test economic hypotheses, so the possibility of Type-I errors at the first stage may lead to size or power distortions at the second stage when conventional initial values are used.

Finally, a unit root may be of interest in order to validate a specific estimator (e.g. Engle-Granger) by appealing to super-consistency. Here a unit root test may be of descriptive value as it depends on the ratio of the covariance of the first difference with the level to the variance of the level, and so should be close to zero when there is a unit root, although we showed in Section 3.6 that similar distributions will result for integrated and near-integrated processes. The ratio of the variance of the first difference to that of the level is another index of the rapidity of accrual of information (either from trends or from drift).

Other likely research interests concern tests of structural, long-run, exogeneity, causality, and encompassing hypotheses (see e.g. Boswijk 1991, Hendry and Mizon 1992, and Banerjee and Hendry 1992). Modelling I(2) systems is in its infancy (see Johansen 1991b), but has close links to multi-co-integration and the analysis of stock-flow relations (see Granger and Lee 1990). This last development provides an additional explanation for such phenomena as the role of inflation in real money demand equations: if nominal money and the price level are I(2), and real money and inflation are I(1), then the last may be needed to create an I(0) co-integration vector. Extensive developments also seem likely to occur in estimation and dynamic modelling, since for many objectives in econometrics, including forecasting and policy, the focus of interest must be all parameters of the system and not just the long-run parameters.

In co-integrated processes, weak exogeneity of the conditioning variables for the parameters of interest remains as vital as it did in stationary processes, even for the long-run parameters. Thus, it is important to test for the presence of co-integrating vectors in other equations, as discussed in Chapter 8. Doing so, however, implies system modelling even for an LM test (see Boswijk 1991). Further, Urbain (1992) shows that tests for orthogonality between regressors and errors lack power to detect such a weak exogeneity failure.
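The descriptive ratio mentioned earlier in this section, the covariance of the first difference with the lagged level divided by the variance of the lagged level, can be sketched as follows. This is an illustrative computation only: the function name `level_difference_ratio`, the demeaning, the sample size, and the simulated series are assumptions rather than anything prescribed by the text.

```python
import numpy as np

def level_difference_ratio(y):
    """Sample cov(dy_t, y_{t-1}) / var(y_{t-1}).

    This equals the OLS slope from regressing dy_t on a constant and y_{t-1},
    i.e. the estimated autoregressive coefficient minus one.
    """
    y = np.asarray(y, dtype=float)
    lag = y[:-1] - y[:-1].mean()
    diff = np.diff(y)
    diff = diff - diff.mean()
    return float(lag @ diff / (lag @ lag))

rng = np.random.default_rng(1)
shocks = rng.normal(size=5000)
ratio_random_walk = level_difference_ratio(np.cumsum(shocks))  # unit-root series
ratio_white_noise = level_difference_ratio(shocks)             # stationary series
```

In this sketch the ratio is near zero for the random walk and near minus one for white noise, but, as the text stresses, a near-integrated series would give a value almost as close to zero, so the statistic is descriptive rather than decisive.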
9.6. Reinterpreting Econometrics Time-series Problems

Integration and co-integration also lead to the re-interpretation of many extant econometrics time-series problems. We consider a few of these, commencing with multi-collinearity.

9.6.1. Multi-collinearity

When $\mathbf{x}_t \sim$ I(1) and $\alpha'\mathbf{x}_t \sim$ I(0), then including all the elements of $\mathbf{x}_t$ or $\mathbf{x}_{t-1}$ as regressors in a single equation will induce an apparently serious collinearity problem. The second moment matrix $(\mathbf{X}'\mathbf{X})$ will be $O(T^2)$, whereas the linear combination $(\alpha'\mathbf{X}'\mathbf{X}\alpha)$ will be $O(T)$. Consequently, $(T^{-2}\mathbf{X}'\mathbf{X})$ will converge on a singular matrix. Generally, it is inadvisable to 'solve' this problem by deleting variables; for I(1) data, doing so jeopardizes the possibility of co-integration. If the dependent variable is I(0), then the solution is to find the co-integrating combination $\alpha'\mathbf{x}_t$ or $\alpha'\mathbf{x}_{t-1}$ and use that as an explanatory variable. This strategy corresponds to the usual recommendation of transforming to near-orthogonal and interpretable variables. In other cases, where the dependent variable is I(1) but is co-integrated with a subset of $\mathbf{x}_t$, say, elimination may be sensible, but Wiener-based critical values should be used for variables that cannot be written implicitly as an I(0) function (see Chapter 7). These ideas are related to the earlier technique of confluence analysis in Hendry and Morgan (1989).
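The orders of magnitude behind this apparent collinearity can be seen in a small simulation. The following is a sketch under assumed settings (a bivariate system, unit-variance shocks, $T = 5000$, and the co-integrating vector $(1, -1)'$), none of which come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000
x1 = np.cumsum(rng.normal(size=T))    # I(1) random walk
x2 = x1 + rng.normal(size=T)          # co-integrated with x1
X = np.column_stack([x1, x2])
alpha = np.array([1.0, -1.0])         # alpha'x_t is minus the I(0) noise

scaled_moments = X.T @ X / T**2                    # T^{-2} X'X
eigenvalues = np.linalg.eigvalsh(scaled_moments)   # ascending order
eigen_ratio = eigenvalues[0] / eigenvalues[1]      # small when nearly singular
combo_per_obs = alpha @ X.T @ X @ alpha / T        # alpha'X'X alpha / T stays O(1)
```

Here the scaled moment matrix is close to singular even though the co-integrating combination has a perfectly well-behaved second moment, which is why transforming to $\alpha'\mathbf{x}_t$ is preferred to deleting a regressor.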
9.6.2. Measurement Errors

Measurement errors are a second problem where treatment recommendations can differ in the light of data being integrated. When $\mathbf{x}_t \sim$ I(1), then $\Delta\mathbf{x}_t \sim$ I(0), and if the data are in logarithms, then the changes are growth rates. If observed growth rates are to be at all sensible, then the error with which they are measured must not be I(1) or higher. Letting $\mathbf{x}^o_t$ denote the observed series, one possible model is $\Delta\mathbf{x}^o_t = \Delta\mathbf{x}_t + \mathbf{u}_t$, where $\mathbf{u}_t$ is I(0), so that

$$\mathbf{x}^o_t = \mathbf{x}_t + \sum_{i=1}^{t}\mathbf{u}_i. \tag{20}$$

If the measurement error in levels is denoted $\mathbf{w}_t = \mathbf{x}^o_t - \mathbf{x}_t$, then $\mathbf{w}_t$ is apparently I(1). This consideration therefore only rather weakly bounds the scale of measurement error. Indeed, if the DGP is of the form that $\Delta\mathbf{x}_t = \varepsilon_t$, then $\mathbf{u}_t$ and $\varepsilon_t$ are essentially indistinguishable in models of $\mathbf{x}^o_t$.
However, when $\alpha'\mathbf{x}_t$ is an I(0) co-integrating combination, then, on pre-multiplying (20) by $\alpha'$,

$$\alpha'\mathbf{x}^o_t = \alpha'\mathbf{x}_t + \alpha'\sum_{i=1}^{t}\mathbf{u}_i, \qquad\text{so that}\qquad \Delta\alpha'\mathbf{x}^o_t = \Delta\alpha'\mathbf{x}_t + \alpha'\mathbf{u}_t.$$

Since $\Delta\alpha'\mathbf{x}_t$ is I(−1), $\alpha'\mathbf{x}^o_t$ will be I(0) only if $\alpha'\mathbf{u}_t$ is I(−1). Thus, I(0) measurement errors on growth rates must co-integrate to I(−1) with co-integration matrix $\alpha$ if the observed series are to co-integrate in the same way as the latent variables. Nowak (1990) calls a failure to observe $\alpha'\mathbf{x}^o_t$ being I(0) when $\alpha'\mathbf{x}_t$ is I(0) a problem of 'hidden co-integration'. However, many co-integration relationships, such as consumption and income, are likely to have connected measurement errors. Governmental statistical bureaux may even correct the data on such series in a related way to avoid divergence, which suggests an I(0) measurement error for, say, the ratio between them.

An alternative model of measurement error for logarithms is one with a constant-percentage standard deviation, so that the size of the absolute error grows with the variable. This leads to $\mathbf{x}^o_t = \mathbf{x}_t + \mathbf{v}_t$ where $\mathsf{var}[\mathbf{v}_t]$ is constant. Such a measurement error would not impede co-integration analyses, in that inconsistency would not result as in an I(0) setting, but would have the usual impact in I(0) representations since $\alpha'\mathbf{v}_t$ could be I(0). An important instance is when $\mathbf{v}_t$ is an expectations error, in which case the distributions of the long-run parameter estimates are unaffected but short-run parameter estimates may be biased (see Engle and Granger 1987, and Hendry and Neale 1988).

9.6.3. Incorrectly Omitted and Included Variables

When a relevant I(1) variable is omitted from a relationship, I(0) co-integration is impossible and serious biases can result.
In particular, for an I(0) dependent variable, all the remaining I(1) regressors may cease to be significant given the appropriate critical values, leading the model to collapse to one in differences. Including an irrelevant I(1) variable or vector will probably lower the efficiency of estimates of the co-integrating vectors but should be detectable in large enough samples, with the usual possibility of Type-I errors.

If one incorrectly includes an I(0) variable in a co-integration vector in a static regression, its coefficient will be biased when that variable is correlated with omitted I(0) variables. The consequences in the maximum likelihood procedure seem less serious, as it is possible to test for a unit vector (i.e. one of the form (0 ... 0 1 0 ... 0)) lying in the co-integration space (see Sect. 8.5.2). However, conditioning on the estimated coefficients of I(0) variables is inappropriate, and spuriously small confidence intervals for the remaining I(0) effects will usually result. Finally, excluding an I(0) variable from a model will not affect the long-run parameter estimates in large samples, but will usually bias the short-run parameters as in conventional econometric derivations.

9.6.4. Parameter Change in Integrated Processes

The most serious problem arising from possible parameter change in econometrics is the predictive failure of models that fail to incorporate the necessary effects. Unfortunately, it is difficult even to diagnose the problem since it is easy to confuse an I(1) process with an I(0) subject to shifts (see e.g. Perron 1989, Rappoport and Reichlin 1989, and Hendry and Neale 1991). Indeed, as noted in Section 9.4 above, structured time-series models implement the latter and produce the former. Whether it is more useful to view economic data as integrated (in the sense of having a unit root in the autoregressive representation subject to regular small shocks) or as subject to large and persistent regime shifts (the abolition of fixed exchange rates following Bretton Woods, or their reinstatement in the ERM; the formation of OPEC; the denationalization of large sectors of an economy; new forms of monetary control or their removal; financial and technological innovation; etc.) remains to be seen. However, both types are bound to play important roles, and although we have focused on the former in this book, understanding economic behaviour will necessitate modelling both integrated data and breaks appropriately. Ex ante, structural breaks can lead to bad predictions, which I(1) data alone do not seem to cause. Ex post, testing for parameter change in I(1) data must allow for a wide range of possible choices for break points. Useful developments are occurring in deriving appropriate tests based on Wiener distributions, and decision taking in this area should improve rapidly (see Nyblom 1989, Chu and White 1991, 1992, Andrews and Ploberger 1991, Hansen 1991, and Lin and Terasvirta 1991).

9.6.5. Conditional Models of Co-integrated Processes

Chapter 8 emphasized the maximum-likelihood approach to testing for and estimating co-integrating vectors in the context of a VAR. This imposed the minimum conditioning assumptions and allowed a clear focus on the properties of co-integration estimation. However, many papers have begun to develop approaches in the context of systems that
treat a subset of variables as weakly exogenous for all the parameters of interest: see Johansen (1992a, 1992b), Johansen and Juselius (1990), and Boswijk (1991), inter alia. Related work includes that on testing for Granger causality in co-integrated systems (see Toda and Phillips 1991, Mosconi and Giannini 1992, and Hunter 1992).

For a long time, econometricians have 'talked' co-integration without realizing it: for example, Klein (1953) discusses various great ratios of economics, namely consumption-income, capital-output, wage share in total income, and so on, implicitly assuming a stationary, or I(0), world. From our perspective, given that the components of these relations are I(1), Klein's ratios are early examples of co-integration hypotheses. In a log-linear multivariate analysis, these postulate particular forms for the rows of the co-integration matrix, highlighting the potential confirmatory role of the methods discussed in Chapter 8. Econometricians need no longer simply assume long-run equilibrium relations, since it is feasible to test for their existence. Once that is established, the analysis is reduced from I(1) to I(0) space, allowing the application of well-established tools.

Thus, the recent focus on conditional or open models takes us back to the 1970s in an important sense, with the links between economic theory or long-run equilibrium reasoning and data modelling having been placed on a sounder footing.

As we have shown in this book, there still remain many difficult theoretical and empirical problems to be overcome. However, the literature on co-integration, error correction, and the econometric analysis of non-stationary data has enabled us to gain many important insights into modelling relationships among integrated variables. This has enhanced rather than replaced existing methods of dynamic econometric modelling of economic time series.
References

ABADIR, K. M. (1992), 'The Limiting Distribution of the Autocorrelation Coefficient Under a Unit Root', Annals of Statistics, forthcoming.
AHN, S. K., and REINSEL, G. C. (1988), 'Nested Reduced-Rank Autoregressive Models for Multiple Time Series', Journal of the American Statistical Association, 83: 849-56.
ANDERSON, T. W. (1958), An Introduction to Multivariate Statistical Analysis, John Wiley, New York.
—— (1976), 'Estimation of Linear Functional Relationships: Approximate Distributions and Connections with Simultaneous Equations in Econometrics (with discussion)', Journal of the Royal Statistical Society B, 38: 1-36.
ANDREWS, D. W. K., and PLOBERGER, W. (1991), 'Optimal Tests of Parameter Constancy', mimeo., Yale University Press.
BANERJEE, A., and DOLADO, J. (1987), 'Do We Reject Rational Expectations Models Too Often? Interpreting Evidence using Nagar Expansions', Economics Letters, 24: 27-32.
—— —— (1988), 'Tests of the Life Cycle-Permanent Income Hypothesis in the Presence of Random Walks: Asymptotic Theory and Small Sample Interpretations', Oxford Economic Papers, 40: 610-33.
—— —— and GALBRAITH, J. W. (1990a), 'Orthogonality Tests with De-trended Data: Interpreting Monte Carlo Results using Nagar Expansions', Economics Letters, 32: 19-24.
—— —— HENDRY, D. F., and SMITH, G. W. (1986), 'Exploring Equilibrium Relationships in Econometrics through Static Models: Some Monte Carlo Evidence', Oxford Bulletin of Economics and Statistics, 48: 253-77.
—— GALBRAITH, J. W., and DOLADO, J. (1990b), 'Dynamic Specification with the General Error-Correction Form', Oxford Bulletin of Economics and Statistics, 52: 95-104.
—— and HENDRY, D. F. (eds.) (1992), Testing Integration and Cointegration, special issue of the Oxford Bulletin of Economics and Statistics, 54, 225-55.
BARDSEN, G. (1989), 'The Estimation of Long-Run Coefficients from Error Correction Models', Oxford Bulletin of Economics and Statistics, 51: 345-50.
BEWLEY, R. A. (1979), 'The Direct Estimation of the Equilibrium Response in a Linear Model', Economics Letters, 3: 357-61.
BEWLEY, R. A., ORDEN, D., and FISHER, L. (1991), 'Box-Tiao and Johansen Canonical Estimators of Cointegrating Vectors', University of New South Wales, Economics Discussion Paper, 91/5.
BHARGAVA, A. (1986), 'On the Theory of Testing for Unit Roots in Observed Time Series', Review of Economic Studies, 53: 369-84.
BILLINGSLEY, P. (1968), Convergence of Probability Measures, John Wiley, New York.
BOSSAERTS, P. (1988), 'Common Non-Stationary Components of Asset Prices', Journal of Economic Dynamics and Control, 12: 347-64.
BOSWIJK, H. P. (1991), 'Testing for Cointegration in Structural Models', University of Amsterdam, Econometrics Discussion Paper AE7/91.
—— (1992), 'Efficient Inference on Cointegration Parameters in Structural Error Correction Models', University of Amsterdam, Econometrics Discussion Paper.
—— and FRANSES, P. H. (1992), 'Dynamic Specification and Cointegration', Oxford Bulletin of Economics and Statistics, 54: 369-81.
BOX, G. E. P., and JENKINS, G. M. (1970), Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco.
—— and TIAO, G. C. (1977), 'A Canonical Analysis of Multiple Time Series', Biometrika, 64: 355-65.
BRANDNER, P., and KUNST, R. (1990), 'Forecasting Vector Autoregressions: The Influence of Cointegration', Memorandum 265, IAS, Vienna.
CAMPBELL, B., and DUFOUR, J.-M. (1991), 'Over-Rejections in Rational Expectations Models: A Non-Parametric Approach to the Mankiw-Shapiro Problem', Economics Letters, 35: 285-90.
CAMPBELL, J. Y., and PERRON, P. (1991), 'Pitfalls and Opportunities: What Macroeconomists Should Know About Unit Roots', in Blanchard, O. J. and Fischer, S. (eds), NBER Macroeconomics Annual 1991, MIT Press.
—— and SHILLER, R. J. (1987), 'Cointegration and Tests of Present Value Models', Journal of Political Economy, 95: 1062-88.
CHAMBERS, M. J. (1991), 'A Note on Forecasting in Co-Integrated Systems', Department of Economics, University of Essex.
CHAN, N. H., and WEI, C. Z. (1988), 'Limiting Distributions of Least-Squares Estimates of Unstable Autoregressive Processes', Annals of Statistics, 16: 367-401.
CHOI, I. (1992), 'Durbin-Hausman Tests for Unit Roots', Oxford Bulletin of Economics and Statistics, 54: 289-304.
CHONG, Y. Y., and HENDRY, D. F. (1986), 'Econometric Evaluation of Linear Macroeconomic Models', Review of Economic Studies, 53: 671-90.
CHOW, G. C. (1960), 'Tests of Equality Between Sets of Coefficients in Two Linear Regressions', Econometrica, 28: 591-605.
CHU, C.-S. J., and WHITE, H. (1991), 'Testing for Structural Change in some Simple Time Series Models', Discussion Paper 91-6, University of California, San Diego, Dept. of Economics.
—— —— (1992), 'A Direct Test for Changing Trend', Journal of Business and Economic Statistics, 10: 289-99.
CLEMENTS, M. P., and HENDRY, D. F. (1991), 'On the Limitations of Mean Square Error Forecast Comparisons', Discussion paper 138, Oxford Institute of Economics and Statistics. Forthcoming, Journal of Forecasting.
—— —— (1992), 'Forecasting in Cointegrated Systems', Discussion paper 139, Oxford Institute of Economics and Statistics.
DAVIDSON, J. E. H., HENDRY, D. F., SRBA, F., and YEO, S. (1978), 'Econometric Modelling of the Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the United Kingdom', Economic Journal, 88: 661-92.
DAVIDSON, R., and MACKINNON, J. G. (1992), Estimation and Inference in Econometrics, Oxford University Press.
DEATON, A. S., and MUELLBAUER, J. N. J. (1980), Economics and Consumer
Behavior, Cambridge University Press.
DICKEY, D. A. (1976), 'Estimation and Hypothesis Testing for Nonstationary Time Series', Ph.D. dissertation, Iowa State University.
—— and FULLER, W. A. (1979), 'Distribution of the Estimators for Autoregressive Time Series with a Unit Root', Journal of the American Statistical Association, 74: 427-31.
—— —— (1981), 'Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root', Econometrica, 49: 1057-72.
—— and PANTULA, S. G. (1987), 'Determining the Order of Differencing in Autoregressive Processes', Journal of Business and Economic Statistics, 5: 455-61.
—— and SAID, S. E. (1981), 'Testing ARIMA(p, 1, q) against ARMA(p + 1, q)', Proceedings of the Business and Economic Statistics Section, American Statistical Association, 28: 318-22.
—— BELL, W. R., and MILLER, R. B. (1986), 'Unit Roots in Time Series Models: Tests and Implications', American Statistician, 40: 12-26.
—— HASZA, D. P., and FULLER, W. A. (1984), 'Testing for a Unit Root in Seasonal Time Series', Journal of the American Statistical Association, 79: 355-67.
DURLAUF, S. N., and PHILLIPS, P. C. B. (1988), 'Trends versus Random Walks in Time Series Analysis', Econometrica, 56: 1333-54.
ENGLE, R. F., and GRANGER, C. W. J. (1987), 'Co-integration and Error Correction: Representation, Estimation and Testing', Econometrica, 55: 251-76.
—— and YOO, B. S. (1987), 'Forecasting and Testing in Co-integrated Systems', Journal of Econometrics, 35: 143-59.
—— —— (1991), 'Cointegrated Economic Time Series: An Overview with New Results', in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Oxford University Press, 237-66.
—— GRANGER, C. W. J., and HALLMAN, J. (1988), 'Merging Short- and Long-run Forecasts: An Application of Seasonal Co-integration to Monthly Electricity Sales Forecasting', Journal of Econometrics, 40: 45-62.
—— —— HYLLEBERG, S., and LEE, H. S. (1993), 'Seasonal Co-Integration: The Japanese Consumption Function', Journal of Econometrics, 55: 275-98.
—— HENDRY, D. F., and RICHARD, J.-F. (1983), 'Exogeneity', Econometrica, 51: 277-304.
ERICSSON, N. R. (1992), Cointegration, Exogeneity and Policy Analysis, Special Issue, Journal of Policy Modeling, 14, 3 and 4.
—— CAMPOS, J., and TRAN, H.-A. (1990), 'PC-GIVE and David Hendry's Econometric Methodology', Revista de Econometrica, X, 7-117.
—— and HENDRY, D. F. (1985), 'Conditional Econometric Modelling: An Application to New House Prices in the United Kingdom', in Atkinson, A. C. and Fienberg, S. E. (eds), A Celebration of Statistics, Springer-Verlag, 251-85.
—— HENDRY, D. F., and TRAN, H.-A. (1992), 'Cointegration, Seasonality, Encompassing and the Demand for Money in the United Kingdom', Discussion Paper, Board of Governors of the Federal Reserve System, Washington, DC.
ERMINI, L., and GRANGER, C. W. J. (1991), 'Some Generalizations on the
Algebra of I(1) Processes', Working Paper, Department of Economics, University of Hawaii at Manoa.
ERMINI, L., and HENDRY, D. F. (1991), 'Log Income vs. Linear Income: An Application of the Encompassing Principle', Working Paper no. 91-11, Department of Economics, University of Hawaii at Manoa.
EVANS, G. B. A., and SAVIN, N. E. (1981), 'Testing for Unit Roots: 1', Econometrica, 49: 753-79.
— — (1984), 'Testing for Unit Roots: 2', Econometrica, 52: 1241-69.
FRIEDMAN, M., and SCHWARTZ, A. J. (1982), Monetary Trends in the United States and the United Kingdom: Their Relation to Income, Prices, and Interest Rates, 1867-1975, University of Chicago Press.
FULLER, W. A. (1976), Introduction to Statistical Time Series, John Wiley, New York.
GALBRAITH, J. W., DOLADO, J., and BANERJEE, A. (1987), 'Rejections of Orthogonality in Rational Expectations Models: Further Monte Carlo Results for an Extended Set of Regressors', Economics Letters, 25: 243-7.
GANTMACHER, F. R. (1959), Applications of the Theory of Matrices, Interscience, New York.
GEL'FAND, J. M. (1967), Lectures on Linear Algebra, Interscience, New York.
GEWEKE, J. (1986), 'The Super-Neutrality of Money in the United States: An Interpretation of the Evidence', Econometrica, 54: 1-21.
GHYSELS, E. (1990), 'On the Economics and Econometrics of Seasonality', paper presented to the Sixth World Congress of the Econometric Society.
GONZALO, J. (1990), 'Comparison of Five Alternative Methods of Estimating Long-Run Equilibrium Relationships', Discussion Paper, University of California at San Diego.
GRANGER, C. W. J. (1981), 'Some Properties of Time Series Data and their Use in Econometric Model Specification', Journal of Econometrics, 16: 121-30.
— (1983), 'Forecasting White Noise', in A. Zellner (ed.), Applied Time Series Analysis of Economic Data, Bureau of the Census, Washington, DC, 308-14.
— (1986), 'Developments in the Study of Co-integrated Economic Variables', Oxford Bulletin of Economics and Statistics, 48: 213-28.
— and HALLMAN, J. (1991), 'The Algebra of I(1) Processes', Journal of Time Series Analysis, 12: 207-24.
— and LEE, T.-H. (1990), 'Multicointegration', in G. F. Rhodes Jr. and T. B. Fomby (eds.), Advances in Econometrics, JAI Press, Greenwich, Conn., 71-84.
— and NEWBOLD, P. (1974), 'Spurious Regressions in Econometrics', Journal of Econometrics, 2: 111-20.
— — (1977), 'The Time Series Approach to Econometric Model Building', in C. A. Sims (ed.), New Methods in Business Cycle Research, Federal Reserve Bank of Minneapolis.
— — (1978), Forecasting Economic Time Series, Academic Press, New York.
— and WEISS, A. A. (1983), 'Time-Series Analysis of Error-Correction Models', in S. Karlin, T. Amemiya, and L. A. Goodman (eds.), Studies in Econometrics, Time Series and Multivariate Statistics, Academic Press, New York.
GREGOIR, S., and LAROQUE, G. (1991), 'Multivariate Integrated Time Series: A General Error Correction Representation with Associated Estimation and Test Procedures', Discussion paper 53/G305, INSEE, Paris.
GRIMMETT, G. R., and STIRZAKER, D. R. (1982), Probability and Random Processes, Oxford University Press.
HALDRUP, N., and HYLLEBERG, S. (1991), 'Integration, Near-Integration and Deterministic Trends', Discussion Paper no. 1991-15, Aarhus University, Denmark.
HALL, A. (1989), 'Testing for a Unit Root in the Presence of Moving Average Errors', Biometrika, 79: 49-56.
— (1990), 'Testing for a Unit Root in Time Series using Instrumental Variables Estimators with Pre-test Data-Based Model Selection', Discussion Paper, North Carolina State University.
— (1991), 'Model Selection and Unit Root Tests based on Instrumental Variables Estimators', Discussion paper, North Carolina State University.
HALL, A. D., ANDERSON, H. M., and GRANGER, C. W. J. (1992), 'A Cointegration Analysis of Treasury Bill Yields', Review of Economics and Statistics, 74: 116-25.
HALL, P., and HEYDE, C. C. (1980), Martingale Limit Theory and Applications, Academic Press, New York.
HALL, R. E. (1978), 'Stochastic Implications of the Life-Cycle Permanent Income Hypothesis', Journal of Political Economy, 86: 971-87.
HAMMERSLEY, J. M., and HANDSCOMB, D. C. (1964), Monte Carlo Methods, Methuen, London.
HANSEN, B. E. (1991), 'Tests for Parameter Instability in Regressions with I(1) Processes', Discussion paper, University of Rochester.
— (1992), 'Testing for Parameter Instability in Linear Models', Journal of Policy Modeling, 14: 517-33.
HARVEY, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press.
HASZA, D. P., and FULLER, W. A. (1982), 'Testing for Nonstationary Parameter Specifications in Seasonal Time-Series Models', Annals of Statistics, 10: 1209-16.
HENDRY, D. F. (1984), 'Monte Carlo Experimentation in Econometrics', ch. 16 in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, ii, North-Holland, Amsterdam, 937-76.
— (1989), PC-GIVE: An Interactive Econometric Modelling System, Institute of Economics and Statistics, Oxford University, Oxford.
— (1991a), 'Using PC-NAIVE in Teaching Econometrics', Oxford Bulletin of Economics and Statistics, 53: 199-223.
— (1991b), 'Economic Forecasting', Report to the Treasury and Civil Service Committee, UK.
— and ANDERSON, G. J. (1977), 'Testing Dynamic Specification in Small Simultaneous Models: An Application to a Model of Building Society Behavior in the United Kingdom', ch. 8c in M. D. Intriligator (ed.), Frontiers of Quantitative Economics, iii(a), North-Holland, Amsterdam, 361-83.
— and CLEMENTS, M. P. (1992), 'Towards a Theory of Economic Forecasting', unpublished paper, Institute of Economics and Statistics, Oxford University.
HENDRY, D. F., and ERICSSON, N. R. (1991a), 'An Econometric Appraisal of U.K. Money Demand in Monetary Trends in the United States and the United Kingdom by Milton Friedman and Anna J. Schwartz', American Economic Review, 81: 8-38.
— and ERICSSON, N. R. (1991b), 'Modelling the Demand for Narrow Money in the United Kingdom and the United States', European Economic Review, 35: 833-81.
— and MIZON, G. E. (1978), 'Serial Correlation as a Convenient Simplification, not a Nuisance: A Comment on a Study of the Demand for Money by the Bank of England', Economic Journal, 88: 549-63.
— — (1992), 'Evaluating Dynamic Models by Encompassing the VAR', in P. C. B. Phillips (ed.), Models, Methods, and Applications of Econometrics, Basil Blackwell, Oxford.
— and MORGAN, M. S. (1989), 'A Re-analysis of Confluence Analysis', Oxford Economic Papers, 41: 35-52; reprinted in N. de Marchi and C. L. Gilbert (eds.), History and Methodology of Econometrics, Clarendon Press, Oxford, 1990.
— MUELLBAUER, J. N. J., and MURPHY, A. (1990), 'The Econometrics of DHSY', in J. D. Hey and D. Winch (eds.), A Century of Economics, Basil Blackwell, Oxford, 298-334.
— and NEALE, A. J. (1987), 'Monte Carlo Experimentation using PC-NAIVE', in T. Fomby and G. Rhodes (eds.), Advances in Econometrics, vi, JAI Press, Greenwich, Conn., 91-125.
— — (1988), 'Interpreting Long-Run Equilibrium Solutions in Conventional Macro Models: A Comment', Economic Journal, 98: 808-17.
— — (1991), 'A Monte Carlo Study of the Effects of Structural Breaks on Unit Root Tests', in P. Hackl and A. H. Westlund (eds.), Economic Structural Change: Analysis and Forecasting, Springer-Verlag, Vienna, 95-119.
— — and ERICSSON, N. R. (1990), PC-NAIVE: An Interactive Program for Monte Carlo Experimentation in Econometrics, Institute of Economics and Statistics, Oxford University, Oxford.
— PAGAN, A. R., and SARGAN, J. D. (1984), 'Dynamic Specification', ch. 18 in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, ii, North-Holland, Amsterdam, 1023-100.
— and RICHARD, J.-F. (1982), 'On the Formulation of Empirical Models in Dynamic Econometrics', Journal of Econometrics, 20: 3-33.
— and UNGERN-STERNBERG, T. VON (1981), 'Liquidity and Inflation Effects on Consumers' Behaviour', ch. 9 in A. S. Deaton (ed.), Essays in the Theory and Measurement of Consumers' Behaviour, Cambridge University Press, 237-60.
HUNTER, J. (1992), 'Tests of Cointegrating Exogeneity for PPP and Uncovered Interest Rate Parity in the UK', Journal of Policy Modeling, 14: 453-64.
HYLLEBERG, S. (1991), Modelling Seasonality, Oxford University Press.
— and MIZON, G. E. (1989a), 'Cointegration and Error Correction Mechanisms', Economic Journal (Supplement), 99: 113-25.
— — (1989b), 'A Note on the Distribution of the Least Squares Estimator of a Random Walk with Drift', Economics Letters, 29: 225-30.
— ENGLE, R. F., GRANGER, C. W. J., and YOO, B. S. (1990), 'Seasonal Integration and Co-Integration', Journal of Econometrics, 44: 215-28.
ILMAKUNNAS, P. (1990), 'Testing the Order of Differencing in Quarterly Data: An Illustration of the Testing Sequence', Oxford Bulletin of Economics and Statistics, 52: 79-88.
IMHOF, P. (1961), 'Computing the Distribution of Quadratic Forms in Normal Variates', Biometrika, 48: 419-26.
JARQUE, C. M., and BERA, A. K. (1980), 'Efficient Tests for Normality, Homoskedasticity and Serial Independence of Regression Residuals', Economics Letters, 6: 255-9.
JAZWINSKI, A. H. (1970), Stochastic Processes and Filtering Theory, Academic Press, New York.
JOHANSEN, S. (1988), 'Statistical Analysis of Cointegration Vectors', Journal of Economic Dynamics and Control, 12: 231-54.
— (1989), 'The Power of the Likelihood Ratio Test for Cointegration', mimeo, Institute of Mathematical Statistics, University of Copenhagen.
— (1991a), 'Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models', Econometrica, 59: 1551-80.
— (1991b), 'A Statistical Analysis of Cointegration for I(2) variables', Institute of Mathematical Statistics, University of Copenhagen.
— (1992a), 'Cointegration in Partial Systems and the Efficiency of Single Equation Analysis', Journal of Econometrics, 52: 389-402.
— (1992b), 'Testing Weak Exogeneity and the Order of Cointegration in UK Money Demand', Journal of Policy Modeling, 14: 313-34.
— and JUSELIUS, K. (1990), 'Maximum Likelihood Estimation and Inference on Cointegration—with Applications to the Demand for Money', Oxford Bulletin of Economics and Statistics, 52: 169-210.
KELLY, C. M. (1985), 'A Cautionary Note on the Interpretation of Long-Run Equilibrium Solutions in Conventional Macro Models', Economic Journal, 95: 1078-86.
KIVIET, J., and PHILLIPS, G. D. A. (1992), 'Exact Similar Tests for Unit Roots and Cointegration', Oxford Bulletin of Economics and Statistics, 54: 349-67.
KLEIN, L. R. (1953), A Textbook of Econometrics, Row, Peterson and Company, Evanston, Ill.
KOERTS, J., and ABRAHAMSE, A. P. J. (1969), On the Theory and Application of the General Linear Model, Rotterdam University Press.
KREMERS, J. J. M., ERICSSON, N. R., and DOLADO, J. (1992), 'The Power of Co-integration Tests', Oxford Bulletin of Economics and Statistics, 54: 325-48.
KWIATKOWSKI, D., PHILLIPS, P. C. B., and SCHMIDT, P. (1991), 'Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root: How Sure Are We that Economic Time Series Have a Unit Root', Cowles Foundation Discussion Paper No. 979.
LEYBOURNE, S. J., and MCCABE, B. P. M. (1992), 'A Simple Test for Cointegration', typescript, Nottingham University.
LIN, C.-F., and TERASVIRTA, T. (1991), 'Testing the Constancy of Regression Parameters against Continuous Structural Change', Discussion paper, University of California at San Diego.
MCCALLUM, B. T. (1984), 'On Low-Frequency Estimates of Long-Run Relationships in Macroeconomics', Journal of Monetary Economics, 14: 3-14.
MACKINNON, J. G. (1991), 'Critical Values for Co-Integration Tests', in R. F.
Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Oxford University Press, 267-76.
MANKIW, N. G., and SHAPIRO, M. D. (1985), 'Trends, Random Walks and Tests of the Permanent Income Hypothesis', Journal of Monetary Economics, 16: 165-74.
— — (1986), 'Do We Reject Too Often? Small Sample Properties of Tests of Rational Expectations Models', Economics Letters, 20: 139-45.
MANN, H. B., and WALD, A. (1943), 'On Stochastic Limit and Order Relationships', Annals of Mathematical Statistics, 14: 217-77.
MIZON, G. E. (1977), 'Model Selection Procedures', in M. J. Artis and A. R. Nobay (eds.), Studies in Modern Economic Analysis, Basil Blackwell, Oxford.
— and HENDRY, D. F. (1980), 'An Empirical Application and Monte Carlo Analysis of Tests of Dynamic Specification', Review of Economic Studies, 47: 21-45.
MORGAN, M. S. (1990), The History of Econometric Ideas, Cambridge University Press.
MOSCONI, R., and GIANNINI, C. (1992), 'Non-Causality in Cointegrated Systems: Representation, Estimation and Testing', Oxford Bulletin of Economics and Statistics, 54: 399-417.
NANKERVIS, J. C., and SAVIN, N. E. (1985), 'Testing the Autoregressive Parameter with the t-statistic', Journal of Econometrics, 27: 143-61.
— — (1987), 'Finite Sample Distributions of t and F Statistics in an AR(1) model with an Exogenous Variable', Econometric Theory, 3: 387-408.
NELSON, C. R., and KANG, H. (1981), 'Spurious Periodicity in Inappropriately Detrended Time Series', Journal of Monetary Economics, 10: 139-62.
NEWEY, W. K., and WEST, K. D. (1987), 'A Simple Positive Semi-Definite Heteroskedasticity and Autocorrelation-Consistent Covariance Matrix', Econometrica, 55: 703-8.
NOWAK, E. (1990), 'Hidden Cointegration', Discussion paper, University of California at San Diego.
NYBLOM, J. (1989), 'Testing for the Constancy of Parameters over Time', Journal of the American Statistical Association, 84: 223-30.
OSBORN, D. R., CHIU, A. P. L., SMITH, J. P., and BIRCHENHALL, C. R. (1988), 'Seasonality and the Order of Integration for Consumption', Oxford Bulletin of Economics and Statistics, 50: 361-78.
OSTERWALD-LENUM, M. (1992), 'A Note with Fractiles of the Asymptotic Distribution of the Maximum Likelihood Cointegration Rank Test Statistics: Four Cases', Oxford Bulletin of Economics and Statistics, 54: 461-72.
PANTULA, S. G. (1991), 'Testing for Unit Roots in Time Series Data', Econometric Theory, 5: 265-71.
PARK, J. Y., and PHILLIPS, P. C. B. (1988), 'Statistical Inference in Regressions with Integrated Processes: Part 1', Econometric Theory, 4: 468-97.
PERRON, P. (1988), 'Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach', Journal of Economic Dynamics and Control, 12: 297-332.
— (1989), 'The Great Crash, the Oil Shock and the Unit Root Hypothesis', Econometrica, 57: 1361-402.
PHILLIPS, P. C. B. (1986), 'Understanding Spurious Regressions in Econometrics', Journal of Econometrics, 33: 311-40.
— (1987a), 'Time Series Regression with a Unit Root', Econometrica, 55: 277-301.
— (1987b), 'Towards a Unified Asymptotic Theory of Autoregression', Biometrika, 74: 535-48.
— (1988a), 'Reflections on Econometric Methodology', Economic Record, 64: 344-59.
— (1988b), 'Multiple Regression with Integrated Time Series', Contemporary Mathematics, 80: 79-105.
— (1991), 'Optimal Inference in Co-integrated Systems', Econometrica, 59: 282-306.
— and DURLAUF, S. N. (1986), 'Multiple Time Series Regression with Integrated Processes', Review of Economic Studies, 53: 473-95.
— and HANSEN, B. E. (1990), 'Statistical Inference in Instrumental Variables Regression with I(1) Processes', Review of Economic Studies, 57: 99-125.
— and LORETAN, M. (1991), 'Estimating Long-Run Economic Equilibria', Review of Economic Studies, 58: 407-36.
— and OULIARIS, S. (1988), 'Testing for Co-integration using Principal Components Methods', Journal of Economic Dynamics and Control, 12: 205-30.
— — (1990), 'Asymptotic Properties of Residual Based Tests for Cointegration', Econometrica, 58: 165-93.
— and PARK, J. Y. (1988), 'Asymptotic Equivalence of Ordinary Least Squares and Generalized Least Squares in Regressions with Integrated Variables', Journal of the American Statistical Association, 83: 111-15.
— and PERRON, P. (1988), 'Testing for a Unit Root in Time Series Regression', Biometrika, 75: 335-46.
PRIESTLEY, M. B. (1989), Nonlinear and Nonstationary Time Series Analysis, Academic Press, New York.
QUANDT, R. E. (1978), 'Tests of Equilibrium vs. Disequilibrium Hypotheses', International Economic Review, 19: 435-52.
— (1982), 'Econometric Disequilibrium Models', Econometric Reviews, 1: 1-63.
RAPPOPORT, P., and REICHLIN, L. (1989), 'Segmented Trends and Non-Stationary Time Series', Economic Journal, 99: 168-77.
REIMERS, H. E. (1991), 'Comparisons of Tests for Multivariate Co-integration', Discussion Paper no. 58, Christian-Albrechts University, Kiel.
RIPLEY, B. D. (1987), Stochastic Simulation, John Wiley, New York.
SAID, S. E., and DICKEY, D. A. (1984), 'Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order', Biometrika, 71: 599-607.
SAIKKONEN, P. (1991), 'Asymptotically Efficient Estimation of Cointegrating Regressions', Econometric Theory, 1: 1-21.
SAMPSON, M. (1991), 'The Effect of Parameter Uncertainty on Forecast Variances and Confidence Intervals for Unit Root and Trend Stationary Time Series Models', Journal of Applied Econometrics, 6: 67-76.
SARGAN, J. D. (1964), 'Wages and Prices in the United Kingdom: A Study in Econometric Methodology', in P. E. Hart, G. Mills, and J. K. Whitaker (eds.), Econometric Analysis for National Economic Planning, Butterworth,
London; reprinted in D. F. Hendry and K. F. Wallis (eds.), Econometrics and Quantitative Economics, Basil Blackwell, Oxford, 1984.
SARGAN, J. D. (1980), 'Some Tests of Dynamic Specification for a Single Equation', Econometrica, 48: 879-97.
— and BHARGAVA, A. (1983), 'Testing Residuals from Least Squares Regression for Being Generated by the Gaussian Random Walk', Econometrica, 51: 153-74.
SCHMIDT, P., and PHILLIPS, P. C. B. (1992), 'LM test for a Unit Root in the Presence of Deterministic Trends', Oxford Bulletin of Economics and Statistics, 54: 257-87.
SCHWERT, G. W. (1989), 'Tests for Unit Roots: A Monte Carlo Investigation', Journal of Business and Economic Statistics, 1: 147-59.
SHEPPARD, D. K. (1971), The Growth and Role of UK Financial Institutions 1890-1962, Methuen, London.
SIMS, C. A. (ed.) (1977), New Methods in Business Cycle Research, Federal Reserve Bank of Minneapolis.
— STOCK, J. H., and WATSON, M. W. (1990), 'Inference in Linear Time Series with Some Unit Roots', Econometrica, 58: 113-44.
SPANOS, A. (1986), Statistical Foundations of Econometric Modelling, Cambridge University Press.
STOCK, J. H. (1987), 'Asymptotic Properties of Least-Squares Estimators of Co-integrating Vectors', Econometrica, 55: 1035-56.
— and WATSON, M. W. (1988a), 'Variable Trends in Economic Time Series', Journal of Economic Perspectives, 2: 147-74.
— — (1988b), 'Testing for Common Trends', Journal of the American Statistical Association, 83: 1097-107.
— — (1991), 'A Simple MLE of Cointegrating Vectors in General Integrated Systems', Typescript, Northwestern University.
— and WEST, K. D. (1988), 'Integrated Regressors and Tests of the Permanent Income Hypothesis', Journal of Monetary Economics, 21: 85-96.
TODA, H., and PHILLIPS, P. C. B. (1991), 'Vector Autoregressions and Causality', Cowles Foundation Discussion Paper, 997.
URBAIN, J.-P. (1992), 'On Weak Exogeneity in Error Correction Models', Oxford Bulletin of Economics and Statistics, 54: 187-207.
WEST, K. D. (1988), 'Asymptotic Normality, when Regressors have a Unit Root', Econometrica, 56: 1397-418.
WHITE, H. (1980), 'A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity', Econometrica, 48: 817-38.
— (1984), Asymptotic Theory for Econometricians, Academic Press, New York.
WICKENS, M. R., and BREUSCH, T. S. (1988), 'Dynamic Specification, the Long Run and the Estimation of Transformed Regression Models', Economic Journal, 98 (Conference 1988): 189-205.
WOLD, H. (1954), A Study in the Analysis of Stationary Time Series, Almqvist and Wiksell, Stockholm.
YULE, G. U. (1926), 'Why Do We Sometimes Get Nonsense Correlations Between Time Series? A Study in Sampling and the Nature of Time Series', Journal of the Royal Statistical Society, 89: 1-64.
Acknowledgements for Quoted Extracts
The authors are grateful to the following for permission to reproduce extracts:
Elsevier Science Publishers, for material from N. G. Mankiw and M. D. Shapiro (1986), 'Do we reject too often: Small-sample properties of rational expectations models', Economics Letters, 20: 142-3.
The Review of Economic Studies, for material from P. C. B. Phillips and B. E. Hansen (1990), 'Statistical Inference in Instrumental Variables Regression with I(1) Processes', Review of Economic Studies, 57: 116-17.
The Econometric Society, for material from D. A. Dickey and W. A. Fuller (1981), 'Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root', Econometrica, 49: 1062-3.
David A. Dickey, Professor of Statistics, North Carolina State University.
John Wiley & Sons, Inc., for material from Wayne A. Fuller (1976), Introduction to Statistical Time Series, 371-3.
Author Index
Abadir, K. M. 126, 128
Abrahamse, A. P. J. 104
Ahn, S. K. 305
Anderson, G. J. 5, 50, 140
Anderson, H. 272
Anderson, T. W. 70n., 265n., 285
Andrews, D. W. K. 310
Banerjee, A. 55, 95, 97, 163, 166, 177n., 187, 191, 192, 214, 215, 220, 222, 230, 233, 306, 307
Bardsen, G. 47, 53, 56, 62, 235
Bewley, R. 47, 49, 53, 152, 305
Bhargava, A. 101, 104, 155, 176, 207, 209
Billingsley, P. 24, 89
Birchenhall, C. R. 122
Bossaerts, P. 298
Boswijk, H. P. 235, 305, 307, 310
Box, G. E. P. 10, 13, 121, 305
Brandner, P. 282
Breusch, T. S. 47, 55, 56, 59, 62, 63, 64
Campbell, B. 167n.
Campbell, J. Y. 306
Campos, J. 236
Chan, N. H. 91, 96n.
Chiu, A. P. L. 122
Choi, I. 306
Chong, Y. Y. 282
Chow, G. C. 194n.
Chu, C.-S. J. 310
Clements, M. P. 282, 283, 285
Davidson, J. E. H. 5, 50, 52, 140, 300
Davidson, R. 16, 28
Deaton, A. S. 53
Dickey, D. A. 8, 24, 82, 100, 103, 107, 108, 112-23, 169
Dolado, J. J. 55, 97, 163, 166, 177n., 187, 191, 192, 230
Dufour, J.-M. 167n.
Durlauf, S. N. 82, 92, 93, 182, 203, 238, 254, 262n.
Engle, R. F. 6, 7, 17, 18, 19, 43, 67, 84n., 121, 122, 137n., 145, 146, 152, 157-9, 163, 205n., 208, 209, 211, 215, 231, 242, 256, 261, 278, 279, 282, 283, 287, 288, 305, 309
Ericsson, N. R. 18, 28, 29, 41, 230, 232, 236, 238, 269, 292, 301
Ermini, L. 32, 193-7
Evans, G. B. A. 104
Fisher, I. 65
Fisher, L. 305
Frances, P.-H. 235
Friedman, M. 29, 190, 194
Fuller, W. A. 8, 13, 14, 15, 24, 26, 100-3, 106, 107, 112-23, 169
Galbraith, J. W. 55, 98, 166, 177n., 191
Gantmacher, F. R. 140
Gel'fand, J. M. 140
Ghysels, E. 121
Giannini, C. 310
Gonzalo, J. 240, 285, 286, 293, 294, 296-8
Granger, C. W. J. 6, 7, 32, 43, 69, 70, 81, 83, 84n., 121, 137n., 138, 139, 145, 146, 157-9, 196, 205n., 208, 209, 215, 231, 256, 257, 260, 261, 272, 278, 285, 287, 307, 309, 310
Gregoir, S. 304
Grimmett, G. R. 96
Haldrup, N. 96
Hall, A. 107, 119, 130, 133, 306
Hall, A. D. 272
Hall, P. 23, 24, 89n., 179n.
Hall, R. E. 164, 165, 177
Hallman, J. 32, 121
Hammersley, J. M. 28
Handscomb, D. C. 28
Hansen, B. E. 176, 194, 238-41, 246, 248-51, 261, 294, 299, 310
Harvey, A. C. 303
Hasza, D. P. 122, 123
Hendry, D. F. 5, 17, 28, 29, 32, 41, 47, 48, 49, 50, 53, 65, 95, 101, 140, 162, 163, 193-5, 197, 221, 229, 231-3, 235, 236, 238, 269, 278, 279, 282, 283, 285, 288, 292, 300, 301, 306-9
Heyde, C. C. 23, 24, 89n., 179n.
Hunter, J. 310
Hylleberg, S. 96, 121-3, 152, 170
Ilmakunnas, P. 121
Imhof, P. 104, 207
Jenkins, G. M. 10, 13, 121
Johansen, S. 43, 96, 146, 151, 153, 211, 256, 257, 260, 262, 265, 268, 271, 272, 277, 287, 288, 290, 292, 294, 297, 298, 307, 310
Juselius, K. 271, 272, 277, 290, 310
Kang, H. 191
Kelly, C. M. 47, 64, 65, 66
Kiviet, J. 104, 105, 169n., 232
Klein, L. R. 310
Koerts, J. 104
Kremers, J. J. M. 230-3, 306
Kunst, R. 282
Kwiatkowski, D. 304
Laroque, G. 304
Lee, H. S. 121
Lee, T.-H. 287, 307
Leybourne, S. J. 304
Lin, C.-F. 310
Loretan, M. 163, 288, 291
McCabe, B. P. M. 304
McCallum, B. T. 47, 64-6
MacKinnon, J. G. 16, 28, 211, 213, 214
Mankiw, N. G. 164, 165, 166, 177n., 191
Mann, H. B. 14
Mizon, G. E. 101, 152, 162, 170, 231, 235, 278, 285, 288, 292, 300, 307
Morgan, M. S. 5, 308
Mosconi, R. 310
Muellbauer, J. N. J. 53
Murphy, A. 53
Nankervis, J. C. 104
Neale, A. J. 47, 65, 221, 309
Nelson, C. R. 191
Newbold, P. 69, 70, 81, 83, 138, 139, 191
Newey, W. K. 111
Nowak, E. 308
Nyblom, J. 310
Orden, D. 305
Osborn, D. R. 122, 123
Osterwald-Lenum, M. 268-76, 292
Ouliaris, S. 133, 134, 208, 210, 211
Pagan, A. R. 48
Pantula, S. G. 120, 121, 306
Park, J. Y. 176, 238
Perron, P. 107, 109, 111-19, 133, 248n., 304, 306
Phillips, G. D. A. 104, 105, 169n., 232
Phillips, P. C. B. 22, 24, 43, 71, 72, 81-3, 86-8, 90-3, 95, 96, 101, 107, 109, 111, 113, 114, 119, 129, 133, 134, 163, 175, 176, 179n., 182, 203, 208, 210, 211, 222, 230, 238-41, 242-51, 254, 261, 262n., 277, 288, 290, 291, 294, 304-6, 310
Ploberger, W. 310
Priestley, M. B. 40
Quandt, R. E. 3
Rappoport, P. 309
Reichlin, L. 309
Reimers, H. E. 286
Reinsel, G. C. 305
Richard, J.-F. 18, 162
Ripley, B. D. 28
Rothenberg, T. 220n.
Said, S. E. 82, 107, 108, 113
Saikkonen, P. 305
Sampson, M. 282
Sargan, J. D. 5, 48, 50, 101, 140, 155, 176, 207, 209, 229, 231, 238, 285
Savin, N. E. 104
Schmidt, P. 101, 304, 306
Schwartz, A. J. 29, 194
Schwert, G. W. 82, 114, 119, 130, 248n.
Shapiro, M. D. 164-6, 177n., 191
Sheppard, D. K. 139
Sims, C. A. 43, 125, 162, 168, 178, 186-9
Smith, G. W. 163
Spanos, A. 12, 16, 72, 162
Stirzaker, D. R. 96
Stock, J. H. 43, 119, 152, 158, 163, 172, 177, 178, 185-90, 192, 211, 278, 291, 294, 296-8
Terasvirta, T. 310
Tiao, G. C. 305
Toda, H. 310
Tran, H.-A. 236, 301
Ungern-Sternberg, T. von 288
Urbain, J.-P. 307
Wald, A. 14, 43
Watson, M. W. 119, 152, 178, 187-90, 211, 278, 291, 294, 298
Wei, C. Z. 91, 96n.
West, K. D. 105, 111, 169, 171, 172, 177, 178, 185-7, 189n., 192
White, H. 15, 16, 27, 86, 89, 90, 310
Wickens, M. R. 47, 55, 56, 59, 62, 63, 64
Wold, H. 257
Yeo, S. 5
Yoo, B. S. 121, 152, 208, 209, 278, 279, 282, 283, 287, 305
Yule, G. U. 69, 70n., 71, 77, 138
Subject Index
absolute summability 158
adjustment:
  coefficient 155
  disequilibrium 51, 52, 55, 61
  speed of 268
approximation theorem 123
asymptotic:
  convergence 158
  independence 16, 17
  normality 105, 126, 134, 163, 177, 178, 180, 185; and drift term 169-74
asymptotic standard error (ASE) 235
Augmented Dickey-Fuller test (ADF) 106, 108, 109, 207-12, 232-4, 238, 239n.
  asymptotic distribution 127, 128
  comparison with non-parametrically adjusted DF 114-19
  use of IV in 119
autocorrelation 13, 71-2, 83, 129, 163, 191, 206, 207, 212, 221n., 238-42, 244, 286, 292
  function 12, 13
autocovariance function 12, 13
autoregressive:
  -distributed lag (ADL) model 47-55, 60-4, 224, 239, 242
  error 83, 114, 191, 291
  process 12, 72, 251, 257-60; see also autoregressive moving-average (ARMA) process
  representation (VAR), see co-integrating: representations of co-integrated systems
autoregressive integrated moving-average (ARIMA) process 13, 38, 39, 221
autoregressive moving-average (ARMA) process 12, 13, 39, 84, 85, 88, 107, 108
  examples of 32-8
Bardsen transformation, see transformation: Bardsen
Bartlett window 248
Bewley:
  representation 152, 153
  transformation, see transformation: Bewley
bias 67, 68, 191, 244, 246-8, 249, 250, 290, 309
  in AR(1) parameter 100, 101
  correction term 241, 246
  in estimates of co-integrating vector 162-3, 214-30, 238, 239, 246, 250, 252
  second-order 163, 176, 238, 240, 246, 296, 297
  simultaneity 238, 241, 297, 298
borderline-stationary 39, 95, 166, 208, 225
  see also near-integrated process
bounds test 133, 134
Brownian motion 21, 89, 152, 153, 241, 243, 246, 247, 255, 278, 296, 297
  see also Wiener process
  vector 200-3
Cayley-Hamilton theorem 140
central limit theorem 16, 73, 88, 89, 171, 295
  functional (FCLT), see functional central limit theorem
  Liapunov 16, 27, 44
  Lindeberg-Feller 27
co-integrating:
  combination 279, 283, 288
  parameters 215, 220, 222, 224, 248
  rank 145, 146, 262
  regression 191, 220, 229, 230; asymptotic theory of 174-7
  representations of co-integrated systems (EC, MA, VAR) 146, 153-7, 257-61
  vector 137, 138, 145, 158, 159, 163, 205, 214, 236, 248, 252-6, 262, 267, 268, 276, 277, 285, 289, 290, 293; asymptotic distribution of estimators of 293-8; biases in estimation of, see bias; generalized 179; invariance of 300-3
co-integration 6-8, 67, 136-61, 167, 189, 255, 268, 300, 308
  definition 145
  in logarithms or levels 198, 199
  multi- 287, 307
  seasonal 121, 151
  space 256, 266-99, 273, 279
  system 257, 260, 261
  testing for 9, 134, 176, 205-52, 286; table of critical values 213; test power 230-5
common factor 13, 101, 231, 233, 235, 238, 239, 285, 296
common trend 152, 153, 278
companion form 143, 181-3, 272
concentrated series 88, 89, 263, 264, 272
conditioning, imprope r 244 , 245 constant, inclusio n of 212-1 9 continuous mapping theorem 89 , 90 convergence: in distributio n 1 6 of functional s o f Wiener processe s 91 , 183 in probabilit y 14 , 15 , 16 , 86, 157 , 176 , 185 to rando m variabl e 86 , 89 rate o f 14 , 125 , 158-9 , 168 weak 23 , 8 9 Cramer's theore m 173 , 17 7 cross-equation restriction s 155 , 24 5 decomposition 179 , 240 , 260, 296 deterministic trend , se e trend: non stochastic de-trending 70 , 82 , 83 , 191 spurious 92- 3 diagonalization 265 , 266 , 273, 290 Dickey-Fuller: distribution/critical value s 97 , 98, 100-3, 105, 106 , 121 , 129-32 , 167 , 169 , 170 , 210-11, 268; table s 102- 3 test (DF ) 101 , 104-10 , 112 , 114-19 , 207-12, 231 , 233 , 235 , 236 , 238 , 239 n., 267 ; asymptoti c distribution of 124-7 ; tests o n more tha n on e parameter 113 , 114 , 11 6 differencing 11 , 30, 99 , 111 , 119 , 134 , 139 , 147, 153 , 158 , 168 , 192 , 199 , 30 0 seasonal 121 , 12 2 diffusion proces s 9 6 discontinuity 95 , 96 Donsker's theore m 8 9 drift ter m 9 , 72 , 101 , 106 , 108 , 111 , 15 1 see also trend : non-stochasti c dummy variable 134 , 270-6 , 288 Durbin-Hausman test s 30 6 Durbin-Watson tes t 73 , 81, 93 in co-integrating regression (CRD W test) 176 , 207-8, 235-6 dynamic: estimator 223 , 224-30 , 237 , 243 , 244, 247-51 modelling/regression 5 , 8 , 46, 47, 50 , 51, 106, 163 , 167-71 , 177 , 178 , 192 , 214 , 221 n., 222-4, 225-6, 229 , 239, 243 , 246, 24 7 omitted dynamic s 157 , 220 , 22 9 specification 168 , 240 , 242-4 system 27 8 Edge-worth expansio n 23 9 n. eigen-: value 134 , 140 , 143 , 144 , 179 , 265 , 266, 267, 268 , 270 , 277 , 292, 298
  vector 265, 270, 292, 298
empirical data/results 29-32, 40-2, 52-3, 159, 194-7, 235-8, 269-71, 292, 293
encompassing 193, 198, 238
endogeneity 176, 246
Engle-Granger:
  theorem 159-62
  two-step procedure 153, 157-61, 205 n., 278, 285, 283
equilibrium:
  dis- 2
  multiplier, see long-run: multiplier
  relationship 2-9, 46, 47, 50, 54, 55, 136-9, 192, 205
  state 2, 4
  static 48
ergodicity 16, 17, 88, 89
error-correction 5, 6, 47, 51, 55, 63, 64, 96, 224 n., 246
  mechanism 5-7, 51-4, 139, 140, 151, 232, 234, 238, 268, 270-5, 278, 279, 294, 300, 304
  model 47, 49-52, 55, 63, 158, 159, 239, 243, 256, 257, 260, 261, 268, 274, 277-9, 290; generalized 50, 52, 60, 61
  representation 138, 139, 153; definition of 145; derivation of 154-7
  term 50-3, 60, 61, 140, 151, 155, 157, 262
exact test 105
exogeneity 17-18, 288
  strict 19, 67
  strong 18, 20, 222-3, 244, 252, 291
  super 18-20
  in unit root tests 107
  weak 18, 20, 65-8, 163, 168, 192, 204, 223, 240, 243-5, 248, 251-2, 261, 268, 288-91, 295; importance in co-integrated processes 252, 307
finite sample biases, see bias
Fisher effect 65
forecasting 278-85
  multi-step 18, 19
frequency:
  domain 88 n.
  zero v. seasonal 122
Frisch-Waugh theorem 70 n.
full-information maximum-likelihood (FIML) 238, 239, 241, 245, 250, 297, 298
fully modified estimation 238-41, 243, 244, 246-50
  estimator 243, 244, 247, 248, 249, 250
  method 239, 240
functional central limit theorem (FCLT) 22, 89, 124-7, 261, 295, 299
generalized co-integrating vector 179
general-to-specific modelling 168, 192
Granger causality 18, 291
Granger Representation Theorem 48, 146-53, 300
homogeneity 47, 51, 52, 60, 61, 221, 222, 231, 236
impact matrix 151, 260
inconsistent regression 164-8, 190, 191, 229, 230
innovation sequence 12, 85-7, 183
instrumental variables (IV) 55, 59, 62, 63, 119, 130-3
integrated process 1, 6, 7, 11, 12, 21, 39, 69-71, 73, 136-8, 162-99
  asymptotic theory of 86-91
  near-, see near-integrated process
  properties of 84-6
  see also non-stationary process
integration:
  order of, see order of integration
  seasonal, see seasonal integration
intercept 72, 151, 210, 232, 234, 271, 272, 273, 274
interim multiplier representation 153
invariance 20, 282, 283
  principle 22; see also functional central limit theorem
invertibility 13, 84, 108, 242
invertible system 148, 149, 258, 259, 266
Jacobian 62, 63
Johansen maximum-likelihood procedure 211, 262-9, 285, 286, 300
  power of 277, 278
Kronecker product 181
lag 9, 11, 47, 50, 52, 66, 106-8, 123, 225, 248, 250, 251, 286, 303
  length 248, 286
  mean 287
  polynomial 229
  structure 208, 222, 229
  truncation parameter 110, 111, 113
latent root 13, 104, 142, 144, 158, 224
law of large numbers 86, 90
life-cycle hypothesis 164, 188
likelihood ratio tests 153, 277, 278, 294, 295
limited-information maximum-likelihood (LIML) 264, 285
linear system 300
logarithms v. levels 29-32, 193-7
long-run:
  covariance matrix 240, 241, 245-7, 252, 290
  multiplier 8, 47-9, 51, 54, 57, 59-64, 188, 230, 235, 293, 295, 296; variance of estimates of 61-4
  relationship 2, 7, 8, 140, 220; see also co-integrating: vector
  response 153
  solution 50, 64-8
marginal:
  distribution 18, 19, 290, 295
  process 240, 243-5, 248 n.
marginalization 304
market clearing 3
martingale difference sequence (MDS) 11, 12, 21, 163, 179 n., 185, 242, 244, 245, 247
maximal-eigenvalue statistic 267, 273
maximum-likelihood 159, 241-5, 256, 262, 264, 265, 266, 267, 269, 277, 283, 285, 286, 288
  full-information, see full-information maximum-likelihood
  limited-information, see limited-information maximum-likelihood
mean lag 144, 287, 301
memory 85
mixing:
  coefficient 87
  strong 16, 17, 87
  uniform 16, 17
mixingale 179 n.
Monte Carlo:
  method 9, 27, 28
  response surfaces 28, 211, 213, 214
  results 73-83, 101, 106, 108, 114, 117-19, 133, 165, 214, 215, 222-3, 225-9, 232-5, 248-51, 279, 282, 283, 285, 291, 298
  standard error 75
moving-average 12, 88; see also autoregressive moving-average (ARMA) process
  component of errors 107
  negative components 113, 119, 250, 304
  parameter 248 n.
  representation 133, 153, 155, 156
  seasonal filter 121
multiple roots 119-22
multiplier, long-run, see long-run: multiplier
near-integrated process 95-7, 99, 164, 166, 225, 231, 277
nearly-inconsistent regression 229, 230
non-centrality parameter 97, 98
non-parametric:
  correction/test 9, 108-10, 114-9, 130, 208, 210, 211, 238-40, 251
  asymptotic theory of 129-30
  estimation 244, 248, 249
nonsense regression 69, 80, 138
  see also spurious regression
non-stationarity 4, 8, 9, 65, 67, 72, 81-4, 134, 150, 215
  transformation to stationarity 69, 70, 82, 83, 99, 134, 147
non-stationary process 5, 6, 9, 38, 39, 70, 71, 81, 163, 244
  v. integrated process 12
normality 180, 289
  asymptotic, see asymptotic: normality
normalization 57-9, 265, 285
nuisance parameters 100, 104-6, 172, 176, 207, 210
order:
  of magnitude 14, 15, 21, 90
  in probability 14, 15
order of integration 6-9, 48, 79-80, 84, 85, 147, 151, 190-2, 258
  defined 84
  first 137, 177
  higher 138, 157, 163
  zero 137
Ornstein-Uhlenbeck process 96
orthogonal complement 147
orthogonality 86, 149, 151, 242, 244, 245, 258 n., 259, 260, 273
  asymptotic 107
  testing 164-8
over-identification test 278, 300
over-rejection 206, 210, 286
parameterization 48, 207, 208, 250, 274, 275
  of dynamics 221
  exact 105, 224
  of nearly-integrated processes 95
  over-/under- 224-9, 262
permanent income hypothesis 164, 177, 178, 188, 190
Perron-Phillips/Phillips test, see non-parametric: correction/test
polynomial matrices 140-5, 152, 257
  isomorphism with companion matrices 142-4
power series expansion 97
power of tests 8, 15, 96, 101, 108, 113, 198, 208, 214, 223-4, 230-5, 277, 278, 286
pre-determinedness 19
random walk 11, 21, 22, 24-9, 38, 71, 72, 82, 87, 93, 100, 101, 114, 191, 220, 272
  in logarithms or levels 193 n.
  see also unit root
rank:
  co-integrating, see co-integrating: rank
  full 56, 58, 59, 144, 147, 151, 181, 258, 260, 287
  reduced 144, 147, 151, 256, 257, 264, 285, 287, 288, 301
recursive estimation 194 n., 221 n.
re-parameterization 67, 157, 168, 189, 191, 222
  see also transformation
representation theorem, see Granger Representation Theorem
Said-Dickey test 107, 108
  compared with Perron-Phillips test 113
Sargan-Bhargava test, see Durbin-Watson test (CRDW test)
Schwarz Criterion 194, 286
seasonal adjustment filter 301, 303
seasonal integration 121-3
sequential cut 18, 19
similar tests 100, 104, 105, 169 n.
size distortions 113, 133, 166, 167
Slutsky's theorem 89, 173
spurious:
  correlation 70, 71; in de-trended random walks 82, 83
  regression 69-81, 83, 92-5, 134, 138-9, 158, 159, 162, 191, 230, 255
stacked form, see companion form
static regression 162, 163, 167, 205, 214, 220-3, 231, 238, 246, 251, 296
  comparison with dynamic 167, 168, 224-30
  example of 236
  see also Engle-Granger: two-step procedure
stationarity 1, 4, 12, 13, 17, 69, 212, 262
stationary process 4, 5, 6, 7, 9, 11, 29, 38, 39, 47, 85, 86, 134, 138, 256, 257, 267, 279
  strictly 11, 12
  weakly/second-order/covariance 11, 12
stochastic:
  differential equation 96
  trend, see trend, stochastic
structural representation 261, 303
super-consistency 158, 176, 191, 214, 220, 230, 251, 294, 296
total effect 142, 257
trace 267, 273
transformation 6, 28-32, 88, 111, 125, 178-80, 185
  ADL 51, 59
  ADL to ECM 60, 61 300, 301
  Bardsen 51, 54-9, 62, 63
  Bewley 51, 53-6, 58 n., 59, 60, 62, 63
  equivalence of 54-60, 62, 64
  linear 47, 51, 60, 61, 63, 64, 145, 152, 178, 224; in dynamic regression 167-8, 177, 178; of polynomial matrices 144, 145
  logarithmic 99, 192-9
trend (inclusion of) 5, 9, 82, 100, 101, 106, 125, 185, 211, 212, 213, 214, 236
  non-stochastic (deterministic) 6, 20, 21, 69-72, 82, 84, 125, 146, 151, 172, 173, 185, 187, 275
  stochastic 153, 169, 172, 174, 179, 180, 185, 187, 191; see also common trend; unit root
  sums of powers of 20
unit circle 13, 104, 123, 141, 149, 158
unit root 8, 9, 13, 38, 72, 83-6, 95, 96, 133, 144, 147, 163, 177, 185, 215, 236, 255, 258-60, 267, 270, 287, 289
  multiple 122
  near- 95, 99; see also near-integrated process
  in polynomial matrix 141
  testing for 8, 96, 99-135, 206, 211, 215, 306; descriptive value 306; in marginal processes 306; at seasonal frequency 120-3
variance-covariance matrix 62, 107, 183, 189, 243, 252-4, 273
  long-run 248, 249
vector autoregression (VAR) 278, 279, 283, 291, 292
vectoring operator 181, 273
Wald statistic 127, 188, 239
Wiener process 21-3, 26, 86-91, 93, 96, 131, 188, 189, 241, 261, 268
  distribution 191, 221
  functional of 24, 90, 93, 125-8, 163, 188, 300
  multivariate 182-4, 200-3, 268
white noise 11, 12, 22, 87, 106, 231
Wold Decomposition Theorem 257, 258