LIST OF CONTRIBUTORS

Badi H. Baltagi
Texas A&M University, Department of Economics, College Station, TX 77843-4228, USA.
M. Douglas Berg
Department of Economics and International Business, Sam Houston State University, Huntsville, TX 77341, USA
Richard Blundell
Institute for Fiscal Studies and University College London, UK.
Stephen Bond
Institute for Fiscal Studies and Nuffield College, Oxford, UK.
Jörg Breitung
Humboldt University Berlin, Institute of Statistics and Econometrics, Spandauer Strasse 1, D-10178 Berlin, Germany. Fax: + 49.30.2093.5712
Min-Hsien Chiang
National Cheng-Kung University, Institute of International Business, Tainan, Taiwan. Fax: 886-6-2766459
Alain Hecq
University Maastricht, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Fax: + 31-43-388 48 74
Nazrul Islam
Department of Economics, Emory University, Atlanta, GA 30322-2240, USA. Fax: 404-727-4639
Chihwa Kao
Syracuse University, Center for Policy Research, Syracuse, NY 13244-1020, USA. Fax: 315-443-1081
Heikki Kauppi
University of Helsinki, Department of Economics, P.O. Box 54 (Unioninkatu 37), FIN-00014 University of Helsinki, Finland. Fax: + 358-9-1917980
Qi Li
Department of Economics, Texas A&M University, College Station, TX 77843 and Department of Economics, University of Guelph, Guelph, Ontario, N1G 2W1 Canada.
Chris Murray
Department of Economics, University of Houston, Houston, TX 77204-5882, USA. Fax: (713) 743-3798
Franz C. Palm
University Maastricht, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Fax: + 31-43-388 48 74
David H. Papell
Department of Economics, University of Houston, Houston, TX 77204-5882, USA. Fax: (713) 743-3798
Peter Pedroni
Indiana University, Department of Economics, Bloomington, IN 47405, USA.
Aman Ullah
Department of Economics, University of California, Riverside, CA 92521, USA
Jean-Pierre Urbain
University Maastricht, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Fax: + 31-43-388 48 74
Frank Windmeijer
Institute for Fiscal Studies, 7 Ridgmount Street, London WC1E 7AE, UK. Fax: + 44.(0)20.7323.4780
Showen Wu
Department of Finance and Managerial Economics, State University of New York at Buffalo, Buffalo, NY 14260, USA
Yong Yin
Department of Economics, State University of New York at Buffalo, Buffalo, NY 14260, USA. Fax: 716-645-2127
INTRODUCTION

Badi H. Baltagi, Thomas B. Fomby and R. Carter Hill

Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 1–5. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2

Twenty-two years ago, the first special issue on panel data econometrics was published by the Annales de l’INSEE. It consisted of two volumes, edited by Mazodier (1978), with contributions from a ‘who’s who’ of the economics and econometrics of panel data. Since then, several books on panel data have been written, including the Econometric Society monograph by Hsiao (1986), a two-volume collection of classic papers on the subject by Maddala (1993), a Handbook edited by Matyas & Sevestre (1996), whose second edition contains 33 chapters, and a textbook by Baltagi (1995a). Several special issues of journals with a panel data theme have also appeared since 1978, including Raj & Baltagi (1992), Matyas (1992), Carraro et al. (1993), Baltagi (1995b), Sevestre (1999) and Banerjee (1999). There have been nine international conferences on panel data since the first conference at INSEE; the most recent was held at the University of Geneva in June 2000. Panel data econometrics continues to have an important impact on empirical economics. A Journal of Economic Literature search using the words ‘panel data’ returned 2780 citations between 1980 and 2000.

This volume is dedicated to two areas of intensive recent research in the econometrics of panel data, nonstationary panels and dynamic panels; see the survey chapter by Baltagi & Kao in this volume. The volume includes eleven refereed chapters on this subject written by twenty authors. The editors are grateful to the authors and referees for their cooperation.

The chapter by Baltagi & Kao surveys the literature on nonstationary panels, cointegration in panels and dynamic panels. Panel unit root tests are considered first, and several important contributions are reviewed, including a summary of the finite sample properties of these unit root tests obtained from
extensive simulations. Next, spurious regressions in panel data are considered, followed by panel cointegration tests, with a summary of the finite sample properties of these cointegration tests from Monte Carlo experiments. Estimation and inference in panel cointegration models are then considered, and the chapter concludes with a review of developments in dynamic panel data models over the last five years.

The chapter by Blundell, Bond & Windmeijer reviews recent developments in the estimation of dynamic panel data models using the generalized method of moments (GMM). In particular, this chapter focuses on the system GMM estimator derived by Blundell & Bond (1998), which relies on relatively mild restrictions on the initial condition process. This system GMM estimator encompasses the GMM estimator based on the non-linear moment conditions available in the dynamic error components model. Monte Carlo experiments and asymptotic variance calculations show that this extended GMM estimator can offer considerable efficiency gains in situations where the first-differenced GMM estimator performs poorly.

The chapter by Pedroni develops methods for estimating and testing hypotheses about cointegrating vectors in dynamic panels. In particular, this chapter proposes methods based on fully modified OLS principles that accommodate considerable heterogeneity across individual members of the panel. The asymptotic properties of various estimators are compared based on pooling along the within and between dimensions of the panel. Monte Carlo simulations show that the group mean estimator is well behaved even in relatively small samples under a variety of scenarios.

The chapter by Hecq, Palm & Urbain extends the concept of serial correlation common features analysis to nonstationary panel data models. This analysis is motivated both by the need to study and test for common structures and comovements in panel data with autocorrelation present and by the increase in efficiency due to pooling. The authors propose sequential testing procedures and evaluate their performance in a small-scale Monte Carlo study. Concentrating on the fixed effects model, they define homogeneous panel common feature models and give a series of steps to implement these tests. The tests are used to investigate the liquidity constraints model for 22 OECD and G7 countries. The presence of a panel common feature vector is rejected at the 5% nominal level.

The chapter by Breitung studies the local power of panel unit root test statistics against a sequence of local alternatives. In particular, this chapter finds that the Levin & Lin (1993) (LL) and Im, Pesaran & Shin (1997) (IPS) tests suffer from a severe loss of power if individual specific trends are
included. Breitung suggests a test statistic that does not employ a bias adjustment and whose power, in Monte Carlo experiments, is substantially higher than that of the LL or IPS tests. This chapter also finds that the power of the LL and IPS tests is sensitive to the specification of the deterministic terms.

The chapter by Kao & Chiang studies the limiting distributions of the ordinary least squares (OLS), fully modified OLS (FMOLS) and dynamic OLS (DOLS) estimators in a panel cointegrated regression model. This chapter shows that the OLS, FMOLS and DOLS estimators are all asymptotically normally distributed. However, the asymptotic distribution of the OLS estimator has a non-zero mean. Extensive Monte Carlo experiments show that the OLS estimator has a non-negligible bias in finite samples, that the FMOLS estimator does not in general improve on the OLS estimator, and that the DOLS estimator outperforms both OLS and FMOLS.

The chapter by Murray & Papell proposes a panel unit root test in the presence of structural change. In particular, this chapter proposes a unit root test for non-trending data in the presence of a one-time change in the mean for a heterogeneous panel. The date of the break is determined endogenously. The resulting test allows for both serial and contemporaneous correlation, both of which are often found to be important in the panel unit root context. Murray & Papell conduct two power experiments for panels of non-trending, stationary series with a one-time change in means and find that conventional panel unit root tests generally have very low power. They then conduct the same experiment using methods that test for unit roots in the presence of structural change and find that the power of the test is much improved.

The chapter by Kauppi develops a new limit theory for panel data that may be cross-sectionally heterogeneous in a fairly general way.
This limit theory builds on the concepts of joint convergence in probability and in distribution for double indexed processes developed by Phillips & Moon (1999a). It is applied to a panel regression model whose regressors are generated by an autoregressive process with a root local to unity. The main results are the following: (i) the usual pooled panel OLS estimator is invalid for inference; (ii) a bias-corrected pooled OLS estimator is $\sqrt{N}T$-consistent, with an asymptotic normal distribution centered on the true parameter value, irrespective of whether the regressors have near or exact unit roots. This positive result holds only in the special case where the model does not exhibit any deterministic effects, such as individual intercepts. (iii) The fully modified panel estimator of Phillips & Moon (1999a) is also subject to severe bias effects if the regressors are nearly rather than exactly cointegrated. These theoretical results are confirmed by Monte Carlo simulations.
The chapter by Yin & Wu proposes stationarity tests for a heterogeneous panel data model. The authors consider the case of serially correlated errors in the level and trend stationary models. The proposed panel tests build on the Kwiatkowski, Phillips, Schmidt & Shin (1992) test and the Leybourne & McCabe (1994) test from the time series literature. Two different ways of pooling information from the independent tests are used, namely the group mean and the Fisher type tests. Monte Carlo experiments reveal good small sample performance in terms of size and power.

The chapter by Berg, Li & Ullah considers the problem of estimating a semiparametric partially linear dynamic panel data model with disturbances that follow a one-way error component structure. Two new semiparametric instrumental variable (IV) estimators are proposed for the coefficient of the parametric component. These are shown to be more efficient than those suggested by Li & Stengos (1996) and Li & Ullah (1998) because they make full use of the error component structure, a result confirmed by Monte Carlo experiments.

The chapter by Islam conducts a Monte Carlo study of the small sample properties of dynamic panel data estimators. Although there are extensive Monte Carlo studies on this subject, this study tailors the design to the estimation of the growth convergence equation using the Summers-Heston data. Islam concludes that OLS estimation of the growth convergence equation is likely to give misleading results. At the same time, indiscriminate use of panel estimators is risky, and the choice among panel estimators should be made judiciously.
REFERENCES

Only references that are not cited later in the volume are given here.

Baltagi, B. H. (1995b). Editor’s Introduction: Panel Data. Journal of Econometrics, 68, 1–4.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of Economics and Statistics, 61, 607–629.
Carraro, C., Peracchi, F., & Weber, G. (Eds.) (1993). The Econometrics of Panels and Pseudo Panels. Journal of Econometrics, 59, 1–211.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Maddala, G. S. (Ed.) (1993). The Econometrics of Panel Data. Vols. 1 and 2. Cheltenham: Edward Elgar.
Matyas, L. (Ed.) (1992). Modelling Panel Data. Structural Change and Economic Dynamics, 3, 291–384.
Matyas, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: Handbook of Theory and Applications. Dordrecht: Kluwer Academic Publishers.
Mazodier, P. (Ed.) (1978). The Econometrics of Panel Data. Annales de l’INSEE, 30/31.
Raj, B., & Baltagi, B. (1992). Editors’ Introduction and Overview: Panel Data Analysis. Empirical Economics, 17, 1–8.
Sevestre, P. (1999). 1977–1997: Changes and Continuities in Panel Data. Annales d’Economie et de Statistique, 55–56, 15–25.
NONSTATIONARY PANELS, COINTEGRATION IN PANELS AND DYNAMIC PANELS: A SURVEY

Badi H. Baltagi and Chihwa Kao

ABSTRACT

This chapter provides an overview of topics in nonstationary panels: panel unit root tests, panel cointegration tests, and estimation of panel cointegration models. In addition, it surveys recent developments in dynamic panel data models.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 7–51. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2

I. INTRODUCTION

Two important areas in the econometrics of panel data that have received much attention recently are dynamic panel data¹ and nonstationary panel time series models.² This special issue focuses on these two topics. With the growing use of cross-country data over time to study purchasing power parity, growth convergence and international R&D spillovers, the focus of panel data econometrics has shifted towards the asymptotics of macro panels with large N (number of countries) and large T (length of the time series), rather than the usual asymptotics of micro panels with large N and small T. In fact, the limiting distributions of double indexed integrated processes had to be developed; see Phillips & Moon (1999a). Allowing T to increase to infinity in macro panel data generated two strands of ideas. The first rejected the homogeneity of the regression parameters implicit in the use of a pooled
regression model in favor of heterogeneous regressions, i.e. one for each country; see Pesaran & Smith (1995), Im, Pesaran & Shin (1997), Lee, Pesaran & Smith (1997), Pesaran, Shin & Smith (1999) and Pesaran & Zhao (1999), to mention a few. This literature relies critically on T being large enough to estimate each country’s regression separately. Another strand of literature applied time series procedures to panels, addressing nonstationarity, spurious regressions and cointegration. Adding the cross-section dimension to the time-series dimension offers an advantage in testing for nonstationarity and cointegration. The hope of the econometrics of nonstationary panel data is to combine the best of both worlds: the methods for dealing with nonstationary data from the time series literature and the increased data and power from the cross-section. Under certain assumptions, the added cross-section dimension acts as repeated draws from the same distribution. Thus, as the time and cross-section dimensions increase, panel test statistics and estimators can be derived that converge in distribution to normally distributed random variables. However, the use of such panel data methods is not without its critics; see Maddala, Wu & Liu (2000), who argue that panel data unit root tests do not rescue purchasing power parity (PPP). In fact, the results on PPP with panels are mixed, depending on the group of countries studied, the period of study and the type of unit root test used. More damaging is the argument by Maddala et al. that, for PPP, panel data tests are the wrong answer to the low power of unit root tests in single time series. After all, the null hypothesis of a single unit root is different from the null hypothesis of a panel unit root for the PPP hypothesis. Using the same line of criticism, Maddala (1999) argued that panel unit root tests did not help settle the question of growth convergence among countries. However, they were useful in spurring much needed research into dynamic panel data models. See also Quah (1996), who argued that the basic issue of whether poor countries catch up with the rich can never be answered by the use of traditional panels. Instead, Quah suggested formulating and estimating models of income dynamics.

One can find numerous applications of time series methods to panels in recent years. Examples from the purchasing power parity literature include Bernard & Jones (1996), Jorion & Sweeney (1996), MacDonald (1996), Oh (1996), Wu (1996), Coakley & Fuertes (1997), Culver & Papell (1997), Papell (1997), O’Connell (1998), Choi (1999a), Andersson & Lyhagen (1999), and Canzoneri, Cumby & Diba (1999), to mention a few. On health care expenditures, see McCoskey & Selden (1998) and Gerdtham & Löthgren (1998). On growth and convergence, see Islam (1995), Evans & Karas (1996), Sala-i-Martin (1996), Lee, Pesaran & Smith (1997), and McCoskey & Kao
(1999a). On international R&D spillovers, see Funk (1998) and Kao, Chiang & Chen (1999). On exchange rate models, see Groen & Kleibergen (1999) and Groen (1999). On savings and investment models, see Coakley, Kulasi & Smith (1996) and Moon & Phillips (1998).

The first part of this chapter surveys some of the developments in nonstationary panel models that have occurred since the mid-1990s. Two other recent surveys on this subject are Phillips & Moon (1999b) on multi-indexed processes and Banerjee (1999) on panel unit root and cointegration tests. We focus on three topics: (1) panel unit root tests, (2) panel cointegration tests, and (3) estimation and inference in panel cointegration models. The discussion of each topic is illustrated by examples taken from the references listed above. Section 2 reviews panel unit root tests, while Section 3 discusses panel spurious regression models. Section 4 considers panel cointegration tests, while Section 5 discusses panel cointegration models. Section 6 reviews some recent developments in dynamic panels and Section 7 concludes.

A word on notation. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ when there is no ambiguity over limits. We define $\Omega^{1/2}$ to be any matrix such that $\Omega = (\Omega^{1/2})(\Omega^{1/2})'$, use $\Rightarrow$ to denote weak convergence, $\xrightarrow{p}$ to denote convergence in probability, I(0) and I(1) to signify a time series that is integrated of order zero and one, respectively, and $W_Z(r) = W(r) - \left[\int W Z'\right]\left[\int Z Z'\right]^{-1} Z(r)$ to denote the $L_2$ projection residual of $W(r)$ on $Z(r)$.
II. PANEL UNIT ROOT TESTS

Testing for unit roots in time series studies is now common practice among applied researchers and has become an integral part of econometrics courses. However, testing for unit roots in panels is more recent; see Levin & Lin (1992), Im, Pesaran & Shin (1997), Harris & Tzavalis (1999), Maddala & Wu (1999), Choi (1999a), and Hadri (1999). Exceptions are Bhargava et al. (1982), Boumahdi & Thomas (1991), Breitung & Meyer (1994), and Quah (1994). Bhargava et al. proposed a test for random walk residuals in a dynamic model with fixed effects. They suggested a modified Durbin-Watson (DW) statistic based on fixed effects residuals and two other test statistics based on differenced OLS residuals. For typical micro panels with N → ∞, they recommended their modified DW statistic. Boumahdi & Thomas (1991) proposed a generalization of the Dickey-Fuller (DF) test for unit roots in panel data to assess the efficiency of the French capital market using 140 French stock prices over the period January 1973 to February 1986. Breitung & Meyer (1994) applied various modified DF test statistics to test for unit roots in a panel of contracted wages negotiated at the firm and industry level for Western Germany over the period 1972–1987. Quah (1994) suggested a test for a unit root in a panel data model without fixed effects where both N and T go to infinity at the same rate, such that N/T is constant. Levin & Lin (1992) generalized this model to allow for fixed effects, individual deterministic trends and heterogeneous serially correlated errors. They assumed that both N and T tend to infinity, but that T increases faster than N, with N/T → 0.

Even though this literature grew out of time series and panel data, the way in which N, the number of cross-section units, and T, the length of the time series, tend to infinity is crucial for determining the asymptotic properties of estimators and tests proposed for nonstationary panels; see Phillips & Moon (1999a). Several approaches are possible, including the following.

(i) Sequential limits, where one index, say N, is fixed and T is allowed to increase to infinity, giving an intermediate limit. Letting N tend to infinity subsequently yields a sequential limit theory. Phillips & Moon (1999b) argued that these sequential limits are easy to derive and are helpful in extracting quick asymptotics. However, Phillips and Moon provided a simple example that illustrates how sequential limits can sometimes give misleading asymptotic results.

(ii) A second approach, used by Quah (1994) and Levin & Lin (1992), is to allow the two indexes, N and T, to pass to infinity along a specific diagonal path in the two-dimensional array. This path can be determined by a monotonically increasing functional relation of the type T = T(N), which applies as the index N → ∞. Phillips & Moon (1999b) showed that the limit theory obtained by this approach depends on the specific functional relation T = T(N), and that the assumed expansion path may not provide an appropriate approximation for a given (T, N) situation.

(iii) A third approach is a joint limit theory that allows both N and T to pass to infinity simultaneously without placing specific diagonal path restrictions on the divergence. Some control over the relative rate of expansion may have to be exercised in order to get definitive results. Phillips & Moon argued that, in general, joint limit theory is more robust than either sequential limit or diagonal path limit theory. However, it is usually more difficult to derive and requires stronger conditions, such as the existence of higher moments, that allow for uniformity in the convergence arguments.

The multi-index asymptotic theory in Phillips & Moon (1999a, b) is applied to joint limits in which T, N → ∞ and (T/N) → ∞, i.e. to situations where the time series sample is large relative to the cross-section sample. However, the general approach given there is also applicable to situations in which (T/N) → 0, although different limit results will generally obtain in that case.
A. Levin & Lin (1992) Tests

Consider the model

$$y_{it} = \rho_i y_{i,t-1} + z_{it}'\gamma_i + u_{it}, \qquad i = 1, \ldots, N; \; t = 1, \ldots, T, \tag{1}$$

where $z_{it}$ is the deterministic component and $u_{it}$ is a stationary process. $z_{it}$ could be zero, one, the fixed effects $\mu_i$, or the fixed effects together with a time trend $t$. The Levin & Lin (LL) tests assume that the $u_{it}$ are iid$(0, \sigma_u^2)$ and $\rho_i = \rho$ for all $i$. LL are interested in testing the null hypothesis

$$H_0: \rho = 1 \tag{2}$$

against the alternative hypothesis $H_a: \rho < 1$. Let $\hat\rho$ be the OLS estimator of $\rho$ in (1) and define $z_t = (z_{1t}, \ldots, z_{Nt})'$,

$$h(t, s) = z_t' \Big(\sum_{t=1}^{T} z_t z_t'\Big)^{-1} z_s, \qquad \tilde u_{it} = u_{it} - \sum_{s=1}^{T} h(t, s)\, u_{is}, \qquad \tilde y_{it} = y_{it} - \sum_{s=1}^{T} h(t, s)\, y_{is}. \tag{3}$$
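A minimal sketch of the transformation in (3), assuming the deterministic regressors are the same k-vector $z_t$ for every unit (for example $z_t = (1, t)'$ for an individual intercept and trend). In that case $h(t, s)$ is the $(t, s)$ element of the OLS hat matrix, so $\tilde y_{it}$ is the residual from regressing $y_{it}$ on $z_t$ unit by unit. The helper names (`invert`, `projection_weights`, `remove_deterministics`) are illustrative and not taken from Levin & Lin.

```python
import random

def invert(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def projection_weights(z):
    """h(t, s) = z_t' (sum_t z_t z_t')^{-1} z_s for deterministic vectors z_1, ..., z_T."""
    T, k = len(z), len(z[0])
    zz = [[sum(zt[a] * zt[b] for zt in z) for b in range(k)] for a in range(k)]
    inv = invert(zz)
    h = []
    for t in range(T):
        w = [sum(z[t][a] * inv[a][b] for a in range(k)) for b in range(k)]  # z_t' (Z'Z)^{-1}
        h.append([sum(w[b] * z[s][b] for b in range(k)) for s in range(T)])
    return h

def remove_deterministics(x, h):
    """x~_t = x_t - sum_s h(t, s) x_s, the transformation in (3)."""
    return [x[t] - sum(h[t][s] * x[s] for s in range(len(x))) for t in range(len(x))]

if __name__ == "__main__":
    random.seed(1)
    T = 50
    z = [(1.0, float(t)) for t in range(1, T + 1)]   # individual intercept and trend
    y, level = [], 0.0
    for _ in range(T):
        level += random.gauss(0.0, 1.0)               # a random walk, as under H0
        y.append(level)
    y_tilde = remove_deterministics(y, projection_weights(z))
    # Both sums should be numerically zero: the residuals are orthogonal to 1 and t.
    print(sum(y_tilde), sum(t * v for t, v in zip(range(1, T + 1), y_tilde)))
```

With $z_t = 1$ every weight equals $1/T$ and the transformation reduces to subtracting the time mean of each series.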
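Then we have

$$\sqrt{N}\,T(\hat\rho - 1) = \frac{\dfrac{1}{\sqrt{N}} \displaystyle\sum_{i=1}^{N} \dfrac{1}{T} \sum_{t=1}^{T} \tilde y_{i,t-1} \tilde u_{it}}{\dfrac{1}{N} \displaystyle\sum_{i=1}^{N} \dfrac{1}{T^2} \sum_{t=1}^{T} \tilde y_{i,t-1}^2}$$

and the corresponding t-statistic under the null hypothesis is given by

$$t_\rho = \frac{(\hat\rho - 1)\sqrt{\sum_{i=1}^{N} \sum_{t=1}^{T} \tilde y_{i,t-1}^2}}{s_e},$$

where

$$s_e^2 = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde u_{it}^2.$$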
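For the fixed effects case ($z_{it} = \mu_i$), $\hat\rho$ and $t_\rho$ can be sketched as below, implementing the projection as the within transformation: the lagged and current values of each series are demeaned separately and then pooled. This is an illustration under stated assumptions, not Levin & Lin’s full procedure: errors are taken to be iid (no serial correlation adjustment), $s_e$ is computed from the regression residuals because $\tilde u_{it}$ is unobserved, and the data-generating process in the hypothetical `simulate_panel` helper is our own choice.

```python
import math
import random

def within_demean(x):
    """Subtract the time mean (the z_it = mu_i case of the projection in (3))."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def levin_lin_fixed_effects(panel):
    """Pooled rho_hat and t_rho for y_it = rho * y_{i,t-1} + mu_i + u_it.

    panel: list of N series, each of length T + 1 (y_i0, ..., y_iT).
    Sketch only: iid errors assumed, residuals stand in for u~_it in s_e.
    """
    lags = [within_demean(y[:-1]) for y in panel]   # y~_{i,t-1}
    leads = [within_demean(y[1:]) for y in panel]   # y~_{it}
    pairs = [(a, b) for lag, lead in zip(lags, leads) for a, b in zip(lag, lead)]
    sxx = sum(a * a for a, _ in pairs)
    rho_hat = sum(a * b for a, b in pairs) / sxx
    N, T = len(panel), len(panel[0]) - 1
    ssr = sum((b - rho_hat * a) ** 2 for a, b in pairs)
    s_e = math.sqrt(ssr / (N * T))                  # s_e^2 = (1/NT) * sum of squared residuals
    t_rho = (rho_hat - 1.0) * math.sqrt(sxx) / s_e
    return rho_hat, t_rho

def simulate_panel(N, T, rho, seed=0):
    """Hypothetical DGP: y_it = mu_i (1 - rho) + rho y_{i,t-1} + N(0, 1) noise, y_i0 = 0."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        mu, y = rng.gauss(0.0, 1.0), [0.0]
        for _ in range(T):
            y.append(mu * (1.0 - rho) + rho * y[-1] + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel
```

Writing the intercept as $\mu_i(1 - \rho)$ is a convenience that removes any drift when $\rho = 1$, so simulated data under $H_0$ are pure random walks.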
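Assume that there exists a scaling matrix $D_T$ and a piecewise continuous function $Z(r)$ such that $D_T^{-1} z_{[Tr]} \to Z(r)$ uniformly for $r \in [0, 1]$. For a fixed N, we have

$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} \tilde y_{i,t-1} \tilde u_{it} \Rightarrow \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int W_{iZ}\, dW_{iZ}$$

and

$$\frac{1}{N} \sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} \tilde y_{i,t-1}^2 \Rightarrow \frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2,$$

as T → ∞. Next we assume that $\int W_{iZ}\, dW_{iZ}$ and $\int W_{iZ}^2$ are independent across i and have finite second moments. Then it follows that

$$\frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2 \xrightarrow{p} E\Big[\int W_{iZ}^2\Big]$$

and

$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \Big(\int W_{iZ}\, dW_{iZ} - E\Big[\int W_{iZ}\, dW_{iZ}\Big]\Big) \Rightarrow N\Big(0, \operatorname{Var}\Big[\int W_{iZ}\, dW_{iZ}\Big]\Big)$$

as N → ∞, by a law of large numbers and the Lindeberg-Levy central limit theorem. The following moments are taken from Levin & Lin (1992):

$$
\begin{array}{ccccc}
z_{it} & E[\int W_{iZ}\, dW_{iZ}] & \operatorname{Var}[\int W_{iZ}\, dW_{iZ}] & E[\int W_{iZ}^2] & \operatorname{Var}[\int W_{iZ}^2] \\
\hline
0 & 0 & 1/2 & 1/2 & 1/3 \\
1 & 0 & 1/3 & 1/2 & \text{?} \\
\mu_i & -1/2 & 1/12 & 1/6 & 1/45 \\
(\mu_i, t) & -1/2 & 1/60 & 1/15 & 11/6300
\end{array}
\tag{4}
$$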
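Some entries of (4) can be checked roughly by simulation. The sketch below approximates $W(r)$ by a scaled Gaussian random walk and averages discretized versions of the two functionals; with `demean=True` it targets the $\mu_i$ row (means $-1/2$ and $1/6$), otherwise the $z_{it} = 0$ row (means $0$ and $1/2$). The discretization, step count and number of replications are our own choices, so agreement is only approximate.

```python
import random

def functional_moments(T=200, reps=2000, demean=False, seed=0):
    """Monte Carlo means of int W dW and int W^2, or of their demeaned versions.

    Approximation: W(t/T) = T^{-1/2} * partial sum of iid N(0, 1) draws,
    int W dW ~ sum_t W_{t-1} dW_t and int W^2 ~ (1/T) sum_t W_{t-1}^2.
    Demeaning subtracts a constant, so dW_mu = dW and only W_{t-1} changes.
    """
    rng = random.Random(seed)
    total_wdw = total_w2 = 0.0
    for _ in range(reps):
        w, lagged, increments = 0.0, [], []
        for _ in range(T):
            lagged.append(w)
            dw = rng.gauss(0.0, 1.0) / T ** 0.5
            increments.append(dw)
            w += dw
        if demean:
            m = sum(lagged) / T
            lagged = [v - m for v in lagged]
        total_wdw += sum(a * b for a, b in zip(lagged, increments))
        total_w2 += sum(a * a for a in lagged) / T
    return total_wdw / reps, total_w2 / reps
```

For example, `functional_moments(demean=True)` should return values close to $(-0.5, 0.167)$, the $\mu_i$ entries of the first and third columns.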
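Using (4), Levin & Lin (1992) obtain the following limiting distributions of $\sqrt{N}\,T(\hat\rho - 1)$ and $t_\rho$:

$$
\begin{array}{lll}
z_{it} & \hat\rho & t_\rho \\
\hline
0 & \sqrt{N}\,T(\hat\rho - 1) \Rightarrow N(0, 2) & t_\rho \Rightarrow N(0, 1) \\
1 & \sqrt{N}\,T(\hat\rho - 1) \Rightarrow N(0, 2) & t_\rho \Rightarrow N(0, 1) \\
\mu_i & \sqrt{N}\,T(\hat\rho - 1) + 3\sqrt{N} \Rightarrow N\big(0, \tfrac{51}{5}\big) & \sqrt{1.25}\, t_\rho + \sqrt{1.875N} \Rightarrow N(0, 1) \\
(\mu_i, t) & \sqrt{N}\,\big(T(\hat\rho - 1) + 7.5\big) \Rightarrow N\big(0, \tfrac{2895}{112}\big) & \sqrt{\tfrac{448}{277}}\,\big(t_\rho + \sqrt{3.75N}\big) \Rightarrow N(0, 1)
\end{array}
\tag{5}
$$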
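Under the iid-error assumptions behind (5), each limit can be turned into a standardized statistic compared with N(0, 1) critical values, with large negative values rejecting $H_0$. A minimal sketch; the function name is hypothetical, and grouping $z_{it} = 0$ and $z_{it} = 1$ into one case follows (5).

```python
import math

def levin_lin_adjusted(rho_hat, t_rho, N, T, case):
    """Standardized LL statistics from (5); both are approximately N(0, 1) under H0.

    case: "none" (z_it = 0 or 1), "mu" (fixed effects) or "trend" (fixed effects
    and trend). Valid only under the iid-error assumptions behind (5).
    """
    delta = T * (rho_hat - 1.0)
    if case == "none":
        z_rho = math.sqrt(N) * delta / math.sqrt(2.0)
        z_t = t_rho
    elif case == "mu":
        z_rho = (math.sqrt(N) * delta + 3.0 * math.sqrt(N)) / math.sqrt(51.0 / 5.0)
        z_t = math.sqrt(1.25) * t_rho + math.sqrt(1.875 * N)
    elif case == "trend":
        z_rho = math.sqrt(N) * (delta + 7.5) / math.sqrt(2895.0 / 112.0)
        z_t = math.sqrt(448.0 / 277.0) * (t_rho + math.sqrt(3.75 * N))
    else:
        raise ValueError("case must be 'none', 'mu' or 'trend'")
    return z_rho, z_t
```

The inputs would typically be the pooled $\hat\rho$ and $t_\rho$ computed with the matching deterministic terms.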
Sequential limit theory, i.e. T → ∞ followed by N → ∞, is used to derive the limiting distributions in (5). If $u_{it}$ is a general stationary process with serial correlation, rather than iid, the asymptotic distributions of $\hat\rho$ and $t_\rho$ need to be modified. Harris & Tzavalis (1999) also derived unit root tests for (1) with $z_{it} = \{0\}$, $\{\mu_i\}$, or $\{(\mu_i, t)\}$ when the time dimension of the panel, T, is fixed. This is the typical case for micro panel studies. The main results are:
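$$
\begin{array}{ll}
z_{it} & \hat\rho \\
\hline
0 & \sqrt{N}(\hat\rho - 1) \Rightarrow N\Big(0, \dfrac{2}{T(T-1)}\Big) \\
\mu_i & \sqrt{N}\Big(\hat\rho - 1 + \dfrac{3}{T+1}\Big) \Rightarrow N\Big(0, \dfrac{3(17T^2 - 20T + 17)}{5(T-1)(T+1)^3}\Big) \\
(\mu_i, t) & \sqrt{N}\Big(\hat\rho - 1 + \dfrac{15}{2(T+2)}\Big) \Rightarrow N\Big(0, \dfrac{15(193T^2 - 728T + 1147)}{112(T+2)^3(T-2)}\Big)
\end{array}
$$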
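The fixed-T results of Harris & Tzavalis can be standardized in the same way, by adding the bias term and dividing by the asymptotic standard deviation. A sketch, assuming $\hat\rho$ has already been computed with the matching deterministic terms; the function name is illustrative.

```python
import math

def harris_tzavalis(rho_hat, N, T, case):
    """Standardized Harris-Tzavalis statistic for fixed T (approximately N(0, 1) as N grows).

    Uses the bias terms and variances listed above; case is "none", "mu" or "trend".
    """
    if case == "none":
        if T < 2:
            raise ValueError("need T >= 2")
        bias, var = 0.0, 2.0 / (T * (T - 1))
    elif case == "mu":
        if T < 2:
            raise ValueError("need T >= 2")
        bias = 3.0 / (T + 1)
        var = 3.0 * (17 * T**2 - 20 * T + 17) / (5.0 * (T - 1) * (T + 1) ** 3)
    elif case == "trend":
        if T < 3:
            raise ValueError("need T >= 3")
        bias = 15.0 / (2.0 * (T + 2))
        var = 15.0 * (193 * T**2 - 728 * T + 1147) / (112.0 * (T + 2) ** 3 * (T - 2))
    else:
        raise ValueError("case must be 'none', 'mu' or 'trend'")
    return math.sqrt(N) * (rho_hat - 1.0 + bias) / math.sqrt(var)
```

Because T enters only through these closed-form corrections, the statistic remains usable for short micro panels, which is the setting these results are designed for.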
Harris & Tzavalis (1999) also showed that assuming T tends to infinity faster than N, as in LL, rather than treating T as fixed, as is the case in micro panels, yields tests that are substantially undersized and have low power, especially when T is small. Recently, Frankel & Rose (1996), Oh (1996), and Lothian (1996) tested the PPP hypothesis using panel data. All of these articles use LL tests, and some of them report evidence supporting the PPP hypothesis. O’Connell (1998),
however, showed that the LL tests suffer from significant size distortion in the presence of correlation among contemporaneous cross-sectional error terms. O’Connell highlighted the importance of controlling for cross-sectional dependence when testing for a unit root in panels of real exchange rates. He showed that, once cross-sectional dependence is controlled for, no evidence against the null of a random walk can be found in panels of up to 64 real exchange rates. Virtually all of the existing nonstationary panel literature assumes cross-sectional independence. The assumption of independence across i is admittedly strong, but it is needed to satisfy the requirements of the Lindeberg-Levy central limit theorem. Moreover, as pointed out by Quah (1994), modeling cross-sectional dependence is complicated because individual observations in a cross-section have no natural ordering. Driscoll & Kraay (1998) presented a simple extension of common nonparametric covariance matrix estimation techniques that yields standard errors robust to very general forms of spatial and temporal dependence as the time dimension becomes large. In a recent paper, Conley (1999) presented a spatial model of dependence among agents using a metric of economic distance that gives cross-sectional data a structure similar to time-series data. Conley proposed a generalized method of moments (GMM) estimator for such dependent data and a class of nonparametric covariance matrix estimators that allow for a general form of dependence characterized by economic distance.

B. Im, Pesaran & Shin (1997) Tests

The LL test is restrictive in the sense that it requires $\rho$ to be homogeneous across i. As Maddala (1999) pointed out, the null may be fine for testing convergence in growth among countries, but the alternative restricts every country to converge at the same rate.
Im, Pesaran & Shin (1997) (IPS) allow for a heterogeneous coefficient of $y_{i,t-1}$ and propose an alternative testing procedure based on averaging individual unit root test statistics. IPS suggested an average of the augmented DF (ADF) tests when $u_{it}$ is serially correlated with different serial correlation properties across cross-sectional units, i.e. $u_{it} = \sum_{j=1}^{p_i} \varphi_{ij} u_{i,t-j} + \varepsilon_{it}$. Substituting this $u_{it}$ in (1), we get

$$y_{it} = \rho_i y_{i,t-1} + \sum_{j=1}^{p_i} \varphi_{ij}\, \Delta y_{i,t-j} + z_{it}'\gamma_i + \varepsilon_{it}. \tag{6}$$

The null hypothesis is $H_0: \rho_i = 1$ for all i, and the alternative hypothesis is $H_a: \rho_i < 1$ for at least one i. The IPS t-bar statistic is defined as the average of the individual ADF statistics,

$$\bar t = \frac{1}{N} \sum_{i=1}^{N} t_i, \tag{7}$$

where $t_i$ is the individual t-statistic for testing $H_0: \rho_i = 1$ in (6). It is known that, for a fixed N,

$$t_i \Rightarrow \frac{\int_0^1 W_{iZ}\, dW_{iZ}}{\big(\int_0^1 W_{iZ}^2\big)^{1/2}} = t_{iT} \tag{8}$$

as T → ∞. IPS assume that the $t_{iT}$ are iid and have finite mean and variance. Then

$$\frac{\sqrt{N}\Big(\frac{1}{N}\sum_{i=1}^{N} t_{iT} - E[t_{iT} \mid \rho_i = 1]\Big)}{\sqrt{\operatorname{Var}[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1) \tag{9}$$

as N → ∞ by the Lindeberg-Levy central limit theorem. Hence

$$t_{IPS} = \frac{\sqrt{N}\big(\bar t - E[t_{iT} \mid \rho_i = 1]\big)}{\sqrt{\operatorname{Var}[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1) \tag{10}$$
as T → ∞ followed by N → ∞ sequentially. The values of $E[t_{iT} \mid \rho_i = 1]$ and $\operatorname{Var}[t_{iT} \mid \rho_i = 1]$ have been computed by IPS via simulations for different values of T and $p_i$. In this volume, Breitung (2000) studies the local power of the LL and IPS test statistics against a sequence of local alternatives. Breitung finds that the LL and IPS tests suffer from a dramatic loss of power if individual specific trends are included. This is due to the bias correction, which also removes the mean under the sequence of local alternatives. The simulation results indicate that the power of the LL and IPS tests is very sensitive to the specification of the deterministic terms. McCoskey & Selden (1998) applied the IPS test to test for unit roots in per capita national health care expenditures (HE) and gross domestic product (GDP) for a panel of OECD countries. McCoskey & Selden rejected the null hypothesis that these two series contain unit roots. Gerdtham & Löthgren (1998) claimed that the stationarity found by McCoskey & Selden is driven by the omission of time trends in their ADF regression in (6). Using the IPS test with a time trend, Gerdtham & Löthgren found that both HE and GDP are nonstationary. They concluded that HE and GDP are cointegrated around linear trends, following the results of McCoskey & Kao (1999b).

C. Combining P-Value Tests

Let $G_{iT_i}$ be a unit root test statistic for the i-th group in (1) and assume that, as $T_i \to \infty$, $G_{iT_i} \Rightarrow G_i$. Let $p_i$ be the p-value of a unit root test for cross-section i, i.e. $p_i = F(G_{iT_i})$, where $F(\cdot)$ is the distribution function of the random variable $G_i$. Maddala & Wu (1999) and Choi (1999a) proposed a Fisher type test

$$P = -2 \sum_{i=1}^{N} \ln p_i, \tag{11}$$
which combines the p-values from unit root tests for each cross-section i to test for unit root in panel data. P is distributed as 2 with 2N degrees of freedom as Ti → for all N. Maddala et al. (1999) argued that the IPS and Fisher tests relax the restrictive assumption of the LL test that i is the same under the alternative. Both the IPS and Fisher tests combine information based on individual unit root tests. However, the Fisher test has the advantage over the IPS test in that it does not require a balanced panel. Also, the Fisher test can use different lag lengths in the individual ADF regressions and can be applied to any other unit root tests. The disadvantage is that the p-values have to be derived by Monte Carlo simulations. Choi (1999a) echoes similar advantages of the Fisher test: (1) the cross-sectional dimension, N, can be either finite or infinite, (2) each group can have different types of nonstochastic and stochastic components, (3) the time series dimension, T, can be different for each i, and (4) the alternative hypothesis would allow some groups to have unit roots while others may not. When N is large, Choi (1999a) proposed a Z test,
$$Z = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{-2\ln p_i - 2}{2} \qquad (12)$$
since $E[-2\ln p_i] = 2$ and $\mathrm{Var}[-2\ln p_i] = 4$. Assume that the $p_i$'s are iid and use the Lindeberg-Lévy central limit theorem to get $Z \Rightarrow N(0, 1)$
Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey
as $T_i \to \infty$ followed by $N \to \infty$.³ Choi (1999a) applied the $Z$ test in (12) and the IPS test in (7) to panel data of real exchange rates and provided evidence in favor of the PPP hypothesis. Choi claimed that this is due to the improved finite sample power of the Fisher test. Maddala & Wu (1999) and Maddala et al. (1999) find that the Fisher test is superior to the IPS test, but they argue that these panel unit root tests still do not rescue the PPP hypothesis. When allowance is made for the deficiency in the panel data unit root tests and panel estimation methods, support for PPP turns out to be weak.

D. Residual Based LM Test

Hadri (1999) proposed a residual based Lagrange Multiplier (LM) test for the null that the time series for each $i$ are stationary around a deterministic trend against the alternative of a unit root in panel data. Consider the following model

$$y_{it} = z_{it}'\gamma + r_{it} + \varepsilon_{it} \qquad (13)$$
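where $z_{it}$ is the deterministic component, $r_{it}$ is a random walk $r_{it} = r_{i,t-1} + u_{it}$ with $u_{it} \sim \text{iid}(0, \sigma_u^2)$, and $\varepsilon_{it}$ is a stationary process. (13) can be written as

$$y_{it} = z_{it}'\gamma + e_{it} \qquad (14)$$

where

$$e_{it} = \sum_{j=1}^{t} u_{ij} + \varepsilon_{it}.$$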
Let $\hat e_{it}$ be the residuals from the regression in (14) and $\hat\sigma_e^2$ be the estimate of the error variance. Also, let $S_{it}$ be the partial sum process of the residuals,

$$S_{it} = \sum_{j=1}^{t} \hat e_{ij}.$$

Then the LM statistic is

$$LM = \frac{1}{N}\sum_{i=1}^{N}\frac{\frac{1}{T^2}\sum_{t=1}^{T} S_{it}^2}{\hat\sigma_e^2}.$$
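A minimal sketch of the LM statistic above for the intercept-only case $z_{it} = \{\mu_i\}$, in plain Python. It assumes a single pooled estimate $\hat\sigma_e^2$ (homoskedastic errors); the simulated panels and the helper names `hadri_lm` and `simulate_panel` are hypothetical. In this case the scaled partial sums converge to a Brownian bridge $V$, and $E[\int V^2] = 1/6$, so under the stationary null the average should sit near 1/6, while under a unit root it grows roughly in proportion to $T$.

```python
import random


def hadri_lm(panel):
    """LM statistic of the form above for z_it = {mu_i} (individual intercepts).

    panel: list of N lists holding y_i1..y_iT.
    Assumption: sigma_e^2 is the pooled residual variance (homoskedastic case).
    """
    residuals = []
    for y in panel:
        mean_y = sum(y) / len(y)
        residuals.append([v - mean_y for v in y])  # e_hat_it from regressing y_it on mu_i

    n_obs = sum(len(e) for e in residuals)
    sigma2_e = sum(v * v for e in residuals for v in e) / n_obs

    total = 0.0
    for e in residuals:
        T = len(e)
        s, sum_s2 = 0.0, 0.0
        for v in e:
            s += v               # partial sum S_it
            sum_s2 += s * s
        total += sum_s2 / (T * T) / sigma2_e
    return total / len(residuals)


def simulate_panel(N, T, unit_root, seed=0):
    """Hypothetical DGP: iid N(0,1) noise around an intercept, or a random walk."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        mu = rng.gauss(0.0, 1.0)
        level, y = 0.0, []
        for _ in range(T):
            shock = rng.gauss(0.0, 1.0)
            if unit_root:
                level += shock
                y.append(mu + level)
            else:
                y.append(mu + shock)
        panel.append(y)
    return panel


lm_stationary = hadri_lm(simulate_panel(200, 100, unit_root=False, seed=1))
lm_unit_root = hadri_lm(simulate_panel(200, 100, unit_root=True, seed=2))
```

In practice the statistic is then centered and scaled with the moments of $\int W_{iZ}^2$ (mean 1/6 and variance 1/45 in this intercept-only case) to obtain the $N(0,1)$ form.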
It can be shown that

$$LM \xrightarrow{p} E\left[\int W_{iZ}^2\right]$$

as $T \to \infty$ followed by $N \to \infty$, provided $E\left[\int W_{iZ}^2\right] < \infty$. Also,

$$\frac{\sqrt{N}\left(LM - E\left[\int W_{iZ}^2\right]\right)}{\sqrt{\mathrm{Var}\left[\int W_{iZ}^2\right]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$.

E. Finite Sample Properties of Unit Root Tests

Extensive simulations have been conducted to explore the finite sample performance of panel unit root tests, e.g. Karlsson & Löthgren (1999), Im et al. (1997), Maddala & Wu (1999), and Choi (1999a). Choi (1999a) studied the small sample properties of the IPS t-bar test in (7) and Fisher's test in (11). Choi's major findings are the following: (1) The empirical sizes of the IPS and Fisher tests are reasonably close to their nominal size of 0.05 when $N$ is small, but the Fisher test shows mild size distortions at $N = 100$, which is expected from the asymptotic theory. Overall, the IPS t-bar test has the most stable size. (2) In terms of size-adjusted power, the Fisher test seems to be superior to the IPS t-bar test. (3) When a linear time trend is included in the model, the power of all tests decreases considerably.
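Both combination tests need only the individual p-values. Under the null each $p_i$ is uniform on (0, 1), so $-2\ln p_i$ is $\chi^2_2$ with mean 2 and variance 4, which is where (11) and (12) come from. The sketch below evaluates $P$ against $\chi^2_{2N}$ using the closed-form upper tail for even degrees of freedom, and rejects for large values of $Z$. The p-values and function names are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def fisher_p(pvalues):
    """Fisher-type statistic P = -2 * sum(ln p_i), as in (11)."""
    return -2.0 * sum(math.log(p) for p in pvalues)


def chi2_sf_even_df(x, df):
    """Upper tail of a chi-square with an even number of degrees of freedom.

    For df = 2N: P(chi2 > x) = exp(-x/2) * sum_{k=0}^{N-1} (x/2)^k / k!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def choi_z(pvalues):
    """Z in (12): standardize each -2 ln p_i (mean 2, variance 4) and sum."""
    n = len(pvalues)
    return sum((-2.0 * math.log(p) - 2.0) / 2.0 for p in pvalues) / math.sqrt(n)


pvals = [0.01, 0.20, 0.03, 0.50, 0.08]       # hypothetical unit-level p-values
P = fisher_p(pvals)
p_panel = chi2_sf_even_df(P, 2 * len(pvals))  # reject the panel unit root if small
Z = choi_z(pvals)
p_z = 1.0 - NormalDist().cdf(Z)               # right-tailed: small p_i push Z up
```

With these inputs $P \approx 25.88$ on 10 degrees of freedom, so both versions reject at the 1% level even though two of the five individual p-values exceed 0.10.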
III. SPURIOUS REGRESSION IN PANEL DATA

Entorf (1997) studied spurious fixed effects regressions when the true model involves independent random walks with and without drifts. Entorf found that for $T \to \infty$ and $N$ finite, the nonsense regression phenomenon holds for spurious fixed effects models and inference based on t-values can be highly misleading. Kao (1999) and Phillips & Moon (1999a) derived the asymptotic distributions of the least squares dummy variable estimator and various conventional statistics from the spurious regression in panel data. Consider a spurious regression model for all $i$ using panel data:

$$y_{it} = x_{it}'\beta + z_{it}'\gamma + e_{it}, \qquad (15)$$

where
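$x_{it} = x_{i,t-1} + \varepsilon_{it}$, and $e_{it}$ is I(1). The OLS estimator of $\beta$ is

$$\hat\beta = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde y_{it}, \qquad (16)$$

where $\tilde y_{it}$ is defined in (3) and

$$\tilde x_{it} = x_{it} - \sum_{s=1}^{T} h(t, s)\,x_{is}.$$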
It is known that if a time-series regression for a given $i$ is performed in model (15), the OLS estimator of $\beta$ is spurious. It is easy to see that
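$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}' \xrightarrow{p} E\left[\int W_{iZ}W_{iZ}'\right]\Omega_\varepsilon$$

and

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_{it}\tilde y_{it} \xrightarrow{p} E\left[\int W_{iZ}W_{iZ}'\right]\Omega_{\varepsilon u}$$

by a sequential limit theory, where

$$E\left[\int W_{iZ}W_{iZ}'\right] = \begin{cases}\frac{1}{2} I_k & \text{if } z_{it} = \{0\},\\[2pt] \frac{1}{6} I_k & \text{if } z_{it} = \{\mu_i\},\\[2pt] \frac{1}{15} I_k & \text{if } z_{it} = \{\mu_i, t\}.\end{cases}$$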
Then we have

$$\hat\beta \xrightarrow{p} \Omega_\varepsilon^{-1}\Omega_{\varepsilon u}. \qquad (17)$$

(17) shows that the OLS estimator of $\beta$, $\hat\beta$, is consistent for its true value, $\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}$. In a single time series regression, the noise, $e_{it}$, is as strong as the signal, $x_{it}$, since both $e_{it}$ and $x_{it}$ are I(1), so the estimator converges to a random limit rather than to a constant. In the panel regression (15) with a large number of cross-sections, the strong noise of $e_{it}$ is attenuated by pooling the data and a consistent estimate of $\beta$ can be extracted. The asymptotics of the OLS estimator are very different from those of the spurious regression in pure time series. This has an important consequence for residual-based cointegration tests in panel data, because the null distribution of residual-based cointegration tests depends on the asymptotics of the OLS estimator. This point is explained further in the next section.
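To see (17) at work, the sketch below simulates $N$ independent pairs of I(1) series that are not cointegrated but whose innovations are correlated, so that $\Omega_\varepsilon = 1$ and $\Omega_{\varepsilon u} = \rho$ and the long-run average coefficient $\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}$ equals $\rho$. The DGP, the choice $\rho = 0.5$, and the helper names are assumptions made for illustration. The pooled within (LSDV) slope should settle near $\rho$, while the same estimator applied to a single cross-section ($N = 1$) scatters widely.

```python
import random


def pooled_within_slope(pairs):
    """Pooled LSDV (within) slope from a list of (y_i, x_i) series; z_it = {mu_i}."""
    num = den = 0.0
    for y, x in pairs:
        my, mx = sum(y) / len(y), sum(x) / len(x)
        for yt, xt in zip(y, x):
            num += (xt - mx) * (yt - my)
            den += (xt - mx) ** 2
    return num / den


def spurious_pair(T, rho, rng):
    """Hypothetical DGP: two I(1) series, no cointegration, correlated innovations.

    u_t = rho * eps_t + sqrt(1 - rho^2) * eta_t, so Omega_eps = 1, Omega_eps_u = rho.
    """
    x = y = 0.0
    xs, ys = [], []
    for _ in range(T):
        eps = rng.gauss(0.0, 1.0)
        u = rho * eps + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        x += eps
        y += u
        xs.append(x)
        ys.append(y)
    return ys, xs


rng = random.Random(42)
rho = 0.5
panel = [spurious_pair(200, rho, rng) for _ in range(300)]
beta_panel = pooled_within_slope(panel)                        # near rho = 0.5
beta_single = [pooled_within_slope([p]) for p in panel[:5]]    # N = 1: widely scattered
```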
IV. PANEL COINTEGRATION TESTS

A. Kao Tests

Kao (1999) presented two types of cointegration tests in panel data, the DF and ADF type tests. The DF type tests from Kao can be calculated from the estimated residuals in (15) as:

$$\hat e_{it} = \rho\,\hat e_{i,t-1} + v_{it}, \qquad (18)$$

where $\hat e_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta$. In order to test the null hypothesis of no cointegration, the null can be written as $H_0: \rho = 1$. The OLS estimate of $\rho$ and the t-statistic are given as:
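$$\hat\rho = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{it}\hat e_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}$$

and

$$t_\rho = \frac{(\hat\rho - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}}{s_e},$$

where $s_e^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}(\hat e_{it} - \hat\rho\,\hat e_{i,t-1})^2$. Kao proposed the following four DF type tests by assuming $z_{it} = \{\mu_i\}$:

$$DF_\rho = \frac{\sqrt{N}T(\hat\rho - 1) + 3\sqrt{N}}{\sqrt{10.2}},$$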
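$$DF_t = \sqrt{1.25}\,t_\rho + \sqrt{1.875N},$$

$$DF_\rho^* = \frac{\sqrt{N}T(\hat\rho - 1) + \dfrac{3\sqrt{N}\hat\sigma_\nu^2}{\hat\sigma_{0\nu}^2}}{\sqrt{3 + \dfrac{36\hat\sigma_\nu^4}{5\hat\sigma_{0\nu}^4}}},$$

and

$$DF_t^* = \frac{t_\rho + \dfrac{\sqrt{6N}\hat\sigma_\nu}{2\hat\sigma_{0\nu}}}{\sqrt{\dfrac{\hat\sigma_{0\nu}^2}{2\hat\sigma_\nu^2} + \dfrac{3\hat\sigma_\nu^2}{10\hat\sigma_{0\nu}^2}}},$$

where $\hat\sigma_\nu^2 = \hat\Sigma_u - \hat\Sigma_{u\varepsilon}\hat\Sigma_\varepsilon^{-1}\hat\Sigma_{\varepsilon u}$ and $\hat\sigma_{0\nu}^2 = \hat\Omega_u - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u}$. While $DF_\rho$ and $DF_t$ are based on the strong exogeneity of the regressors and errors, $DF_\rho^*$ and $DF_t^*$ are for cointegration with an endogenous relationship between regressors and errors. For the ADF test, we can run the following regression: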
$$\hat e_{it} = \rho\,\hat e_{i,t-1} + \sum_{j=1}^{p}\vartheta_j\,\Delta\hat e_{i,t-j} + v_{itp}. \qquad (19)$$
With the null hypothesis of no cointegration, the ADF test statistic can be constructed as:

$$ADF = \frac{t_{ADF} + \dfrac{\sqrt{6N}\hat\sigma_\nu}{2\hat\sigma_{0\nu}}}{\sqrt{\dfrac{\hat\sigma_{0\nu}^2}{2\hat\sigma_\nu^2} + \dfrac{3\hat\sigma_\nu^2}{10\hat\sigma_{0\nu}^2}}},$$

where $t_{ADF}$ is the t-statistic of $\rho$ in (19). The asymptotic distributions of $DF_\rho$, $DF_t$, $DF_\rho^*$, $DF_t^*$, and $ADF$ converge to a standard normal distribution $N(0, 1)$ by the sequential limit theory.

B. Residual Based LM Test

McCoskey & Kao (1998) derived a residual-based test for the null of cointegration rather than the null of no cointegration in panels. This test is an extension of the LM test and the locally best invariant (LBI) test for an MA unit root in the time series literature, see Harris & Inder (1994) and Shin (1994). Under the null, the asymptotics no longer depend on the asymptotic properties
of the estimated spurious regression; rather, the asymptotics of the estimation of a cointegrated relationship are needed. For models which allow the cointegrating vector to change across the cross-sectional observations, the asymptotics depend merely on the time series results, as each cross-section is estimated independently. For models with common slopes, the estimation is done jointly and therefore the asymptotic theory is based on the joint estimation of a cointegrated relationship in panel data. For the residual based test of the null of cointegration, it is necessary to use an efficient estimation technique for cointegrated variables. In the time series literature a variety of methods have been shown to be asymptotically efficient. These include the fully modified (FM) estimator of Phillips & Hansen (1990) and the dynamic least squares (DOLS) estimator as proposed by Saikkonen (1991) and Stock & Watson (1993). For panel data, Kao & Chiang (2000) showed that both the FM and DOLS methods can produce estimators which are asymptotically normally distributed with zero means. The model presented allows for varying slopes and intercepts:

$$y_{it} = \alpha_i + x_{it}'\beta_i + e_{it}, \qquad (20)$$

$$x_{it} = x_{i,t-1} + \varepsilon_{it}, \qquad (21)$$

$$e_{it} = \gamma_{it} + u_{it}, \qquad (22)$$

and $\gamma_{it} = \gamma_{i,t-1} + \theta u_{it}$, where $u_{it}$ are iid$(0, \sigma_u^2)$. The null hypothesis of cointegration is equivalent to $\theta = 0$. The test statistic proposed by McCoskey & Kao (1998) is defined as follows:
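$$LM = \frac{1}{N}\sum_{i=1}^{N}\frac{\frac{1}{T^2}\sum_{t=1}^{T}S_{it}^2}{\hat\sigma_e^2}, \qquad (23)$$

where $S_{it}$ is the partial sum process of the residuals,

$$S_{it} = \sum_{j=1}^{t}\hat e_{ij},$$

and $\hat\sigma_e^2$ is defined in McCoskey and Kao. The asymptotic result for the test is:

$$\sqrt{N}(LM - \mu_v) \Rightarrow N(0, \sigma_v^2). \qquad (24)$$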
The moments, $\mu_v$ and $\sigma_v^2$, can be found through Monte Carlo simulation. The limiting distribution of LM is then free of nuisance parameters and robust to heteroskedasticity. Urban economists have long sought to explain the relationship between urbanization levels and output. McCoskey & Kao (1999a) revisited this question and tested the long run stability of a production function including urbanization using nonstationary panel data techniques. McCoskey and Kao applied the IPS test and the LM test in (23) and showed that a long run relationship between urbanization, output per worker and capital per worker cannot be rejected for the sample of thirty developing countries or the sample of twenty-two developed countries over the period 1965–1989. They do find, however, that the sign and magnitude of the impact of urbanization vary considerably across the countries. These results offer new insights and potential for dynamic urban models rather than the simple cross-section approach.

C. Pedroni Tests

Pedroni (1997a) also proposed several tests for the null hypothesis of no cointegration in a panel data model that allows for considerable heterogeneity. His tests can be classified into two categories. The first set is similar to the tests discussed above, and involves averaging test statistics for cointegration in the time series across cross-sections. The second set groups the statistics such that, instead of averaging across statistics, the averaging is done in pieces so that the limiting distributions are based on limits of piecewise numerator and denominator terms. The first set of statistics as discussed includes a form of the average of the Phillips & Ouliaris (1990) statistic:
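$$\tilde Z_\rho = \sum_{i=1}^{N}\frac{\sum_{t=1}^{T}\left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)}{\sum_{t=1}^{T}\hat e_{i,t-1}^2}, \qquad (25)$$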
where $\hat e_{it}$ is estimated from (15), and $\hat\lambda_i = \frac{1}{2}(\hat\sigma_i^2 - \hat s_i^2)$, for which $\hat\sigma_i^2$ and $\hat s_i^2$ are the individual long-run and contemporaneous variances, respectively, of the residual $\hat e_{it}$. For his second set of statistics, Pedroni defines four panel test statistics. Let $\hat\Omega_i$ be a consistent estimate of $\Omega_i$, the long-run variance-covariance matrix. Define $\hat L_i$ to be the lower triangular Cholesky decomposition of $\hat\Omega_i$ such that in the scalar case $\hat L_{22i} = \hat\sigma_\varepsilon$ and $\hat L_{11i}^2 = \hat\sigma_u^2 - \hat\sigma_{u\varepsilon}^2/\hat\sigma_\varepsilon^2$ is the long-run conditional variance. In this survey we consider only one of these statistics:
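$$Z_{\hat t_{NT}} = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat L_{11i}^{-2}\left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)}{\sqrt{\tilde\sigma_{NT}^2\sum_{i=1}^{N}\sum_{t=2}^{T}\hat L_{11i}^{-2}\,\hat e_{i,t-1}^2}}, \qquad (26)$$

where

$$\tilde\sigma_{NT}^2 = \frac{1}{N}\sum_{i=1}^{N}\frac{\hat\sigma_i^2}{\hat L_{11i}^2}.$$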
It should be noted that Pedroni bases his test on the averages of the numerator and denominator terms, respectively, rather than on the average of the statistics as a whole. Using results on convergence of functionals of Brownian motion, Pedroni finds the following result:

$$Z_{\hat t_{NT}} + 1.73\sqrt{N} \Rightarrow N(0, 0.93).$$
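For the first, group-mean type statistic $\tilde Z_\rho$ in (25), a minimal sketch that takes the residual series $\hat e_{it}$ as given. Two assumptions go beyond the text: $\hat\sigma_i^2$ and $\hat s_i^2$ are computed from the residuals of a first-order autoregression of $\hat e_{it}$, in the style of Phillips & Ouliaris (1990), and the long-run variance uses a Bartlett (Newey-West) kernel with an arbitrary truncation of 4 lags. Pedroni's centering and scaling constants are not reproduced, so only the comparison between the two hypothetical panels is meaningful: stationary residuals (cointegration) push the statistic far below its value for random-walk residuals (no cointegration).

```python
import random


def bartlett_long_run_variance(v, lags):
    """Newey-West (Bartlett kernel) long-run variance of a mean-zero series."""
    T = len(v)
    lrv = sum(a * a for a in v) / T
    for j in range(1, lags + 1):
        gamma_j = sum(v[t] * v[t - j] for t in range(j, T)) / T
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j
    return lrv


def pedroni_group_rho(residual_panel, lags=4):
    """Unnormalized group statistic: sum over i of the bias-adjusted rho term in (25)."""
    stat = 0.0
    for e in residual_panel:
        lagged, current = e[:-1], e[1:]
        sxx = sum(a * a for a in lagged)
        rho_i = sum(a * b for a, b in zip(lagged, current)) / sxx
        mu = [b - rho_i * a for a, b in zip(lagged, current)]   # AR(1) residuals
        s2 = sum(m * m for m in mu) / len(mu)                   # contemporaneous variance
        sigma2 = bartlett_long_run_variance(mu, lags)           # long-run variance
        lam = 0.5 * (sigma2 - s2)
        numerator = sum(a * (b - a) for a, b in zip(lagged, current)) - len(mu) * lam
        stat += numerator / sxx
    return stat


def make_residuals(N, T, unit_root, seed):
    """Hypothetical stand-ins for residuals from (15): iid noise or random walks, demeaned."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        level, e = 0.0, []
        for _ in range(T):
            shock = rng.gauss(0.0, 1.0)
            level = level + shock if unit_root else shock
            e.append(level)
        m = sum(e) / T
        panel.append([v - m for v in e])
    return panel


z_coint = pedroni_group_rho(make_residuals(10, 100, unit_root=False, seed=1))
z_null = pedroni_group_rho(make_residuals(10, 100, unit_root=True, seed=2))
```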
Note that this limiting distribution of $Z_{\hat t_{NT}}$ applies to the model including an intercept and not including a time trend. Asymptotic results for other model specifications can be found in Pedroni (1997a). The convergence in distribution is based on individual convergence of the numerator and denominator terms. What is the intuition behind rejecting the null hypothesis? Using the average of the overall test statistic makes interpretation easier: rejection of the null hypothesis means that enough of the individual cross-sections have statistics 'far away' from the means predicted by theory were they generated under the null. Pedroni (1999) derived asymptotic distributions and critical values for several residual based tests of the null of no cointegration in panels where there are multiple regressors. The model includes regressions with individual specific fixed effects and time trends. Considerable heterogeneity is allowed across individual members of the panel with regard to the associated cointegrating vectors and the dynamics of the underlying error process. Pedroni (1997b) showed that for tests of the null of no cointegration, the appropriate weighting matrix of a GLS based estimator must be constructed using the long run conditional covariance matrix between individual members of the panel in order to eliminate nuisance parameters associated with member specific dynamics. Pedroni (1997b) found that the violation of cross-sectional independence does not appear to play a significant role for the conclusions in
favor of weak long run PPP, provided that one also includes common time dummies in the regression. Pedroni (2000) also demonstrated how it is possible to construct a test of whether or not members of a panel with heterogeneous short run dynamics converge to a single common steady state.

D. Likelihood-Based Cointegration Test

Larsson, Lyhagen & Löthgren (1998) presented a likelihood-based (LR) panel test of cointegrating rank in heterogeneous panel models based on the average of the individual rank trace statistics developed by Johansen (1995). The proposed LR-bar statistic is very similar to the IPS t-bar statistic in (7) through (10). In Monte Carlo simulations, Larsson et al. investigated the small sample properties of the standardized LR-bar statistic. They found that the proposed test requires a large time series dimension: when $T$ is small, the size of the test is severely distorted even if the panel has a large cross-sectional dimension. Groen & Kleibergen (1999) proposed a likelihood-based framework for cointegration analysis in panels of a fixed number of vector error correction models. Maximum likelihood estimators of the cointegrating vectors are constructed using iterated generalized method of moments (GMM) estimators. Using these estimators Groen and Kleibergen construct likelihood ratio statistics, $LR(B \mid A)$, to test for a common cointegration rank across the individual vector error correction models, both with heterogeneous and homogeneous cointegrating vectors. Interestingly, the limiting distribution of $LR(B \mid A)$ is invariant to the covariance matrix of the error terms, which implies that $LR(B \mid A)$ is robust with respect to the choice of covariance matrix. Let us define $LR_s(r \mid k)$ as the sum of the $N$ individual trace statistics
$$LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k) \qquad (27)$$
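where $LR_i(r \mid k)$ is the $i$-th Johansen likelihood ratio statistic, so that

$$LR_i(r \mid k) \Rightarrow \mathrm{tr}\left[\int dB_{k-r,i}B_{k-r,i}'\left(\int B_{k-r,i}B_{k-r,i}'\right)^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right]$$

as $T \to \infty$. Now for a fixed $N$, it is clear that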
$$LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k) \Rightarrow \sum_{i=1}^{N}\mathrm{tr}\left[\int dB_{k-r,i}B_{k-r,i}'\left(\int B_{k-r,i}B_{k-r,i}'\right)^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right] \qquad (28)$$

as $T \to \infty$ by a continuous mapping theorem. It follows that $LR_s(r \mid k)$ is asymptotically equivalent to $LR(B \mid A)$ when $N$ is fixed and $T$ is large. This means that nothing is lost by assuming that the covariance matrix has zero off-diagonal covariances as far as the asymptotics are concerned for the proposed test statistics in this chapter. More importantly, tests based on cross-independence like (27) will perform just as well (asymptotically) as tests based on cross-dependence such as $LR(B \mid A)$. Groen and Kleibergen verified that the likelihood-based cointegration tests proposed by Larsson et al. in (27) are robust with respect to cross-dependence in panel data. The (asymptotic) equivalence of $LR_s(r \mid k)$ and $LR(B \mid A)$ found in Groen and Kleibergen has profound implications for econometricians and applied economists, e.g. there exist tests/estimators based on cross-independence which are equivalent to tests/estimators based on cross-dependence in nonstationary panel time series. Define $\overline{LR}(r \mid k)$ to be the average of $LR_i(r \mid k)$:
$$\overline{LR}(r \mid k) = \frac{1}{N}LR_s(r \mid k) = \frac{1}{N}\sum_{i=1}^{N} LR_i(r \mid k).$$
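It can be shown that

$$\frac{\overline{LR}(r \mid k) - E[\overline{LR}(r \mid k)]}{\sqrt{\mathrm{Var}[\overline{LR}(r \mid k)]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$ by a continuous mapping theorem and a central limit theorem, provided $E[\overline{LR}(r \mid k)]$ and $\mathrm{Var}[\overline{LR}(r \mid k)]$ are bounded. Define

$$\overline{LR}(B \mid A) = \frac{1}{N}LR(B \mid A). \qquad (29)$$

For a fixed $N$, it is easy to show that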
$$\overline{LR}(B \mid A) = \frac{1}{N}LR(B \mid A) \Rightarrow \frac{1}{N}\sum_{i=1}^{N}\mathrm{tr}\left[\int dB_{k-r,i}B_{k-r,i}'\left(\int B_{k-r,i}B_{k-r,i}'\right)^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right] = \frac{1}{N}\sum_{i=1}^{N} Z_{ki}$$

as $T \to \infty$, where

$$Z_{ki} = \mathrm{tr}\left[\int dB_{k-r,i}B_{k-r,i}'\left(\int B_{k-r,i}B_{k-r,i}'\right)^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right].$$

Then

$$\frac{\frac{1}{N}\sum_{i=1}^{N} Z_{ki} - E\left[\frac{1}{N}\sum_{i=1}^{N} Z_{ki}\right]}{\sqrt{\mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} Z_{ki}\right]}} \Rightarrow N(0, 1)$$

as $N \to \infty$, since $B_{k-r,i}$ and $B_{k-r,j}$ are independent for $i \neq j$. It implies that

$$\frac{\overline{LR}(B \mid A) - E[\overline{LR}(B \mid A)]}{\sqrt{\mathrm{Var}[\overline{LR}(B \mid A)]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$. The above discussion indicates that $\overline{LR}(r \mid k)$ and $\overline{LR}(B \mid A)$ are also equivalent when $T$ and $N$ are large. Groen & Kleibergen (1999) applied $LR(B \mid A)$ to a data set of exchange rates and appropriate monetary fundamentals. They found strong evidence for the validity of the monetary exchange rate model within a panel of vector error correction models for three major European countries, whereas the results based on individual vector error correction models for each of these countries separately are less supportive.

E. Finite Sample Properties

McCoskey & Kao (1999b) conducted Monte Carlo experiments to compare the size and power of different residual based tests for cointegration in
heterogeneous panel data: varying slopes and varying intercepts. Two of the tests are constructed under the null hypothesis of no cointegration. These tests are based on the average ADF test and Pedroni's pooled tests in (25) and (26). The third test is constructed under the null hypothesis of cointegration: the McCoskey & Kao LM test in (23). Wu & Yin (1999) performed a similar comparison for panel tests in which they consider only tests for which the null hypothesis is that of no cointegration. Wu & Yin compared ADF statistics with maximum eigenvalue statistics in pooling information on means and p-values, respectively. They found that the average ADF performs better with respect to power and their maximum eigenvalue based p-value performs better with regard to size. The test of the null hypothesis of cointegration was originally proposed in response to the low power of the tests of the null of no cointegration, especially in the time series case. Further, in cases where economic theory predicted a long run steady state relationship, it seemed that a test of the null of cointegration rather than the null of no cointegration would be appropriate. The results from the Monte Carlo study showed that the McCoskey & Kao LM test outperforms the other two tests. Of the two reasons for the introduction of the test of the null hypothesis of cointegration, low power and attractiveness of the null, the introduction of the cross-section dimension of the panel solves one: all of the tests show decent power when used with panel data. For those applications where the null of cointegration is more logical than the null of no cointegration, McCoskey & Kao (1999b), at a minimum, conclude that using the McCoskey & Kao LM test does not compromise the ability of the researcher in determining the underlying nature of the data. Recently, Hall et al.
(1999) proposed a new approach based on principal components analysis to test for the number of common stochastic trends driving the nonstationary series in a panel data set. The test is consistent even if there is a mixture of I(0) and I(1) series in the sample. This makes it unnecessary to pretest the panel for unit roots. It also has the advantage of solving the problem of dimensionality encountered in large panel data sets.
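Kao's DF-type statistics from Section IV.A can be computed directly from the pooled within residuals. The sketch below implements $\hat\rho$, $t_\rho$, $DF_\rho$ and $DF_t$ for $z_{it} = \{\mu_i\}$ under strong exogeneity; the starred versions, which need the long-run variance estimates $\hat\sigma_\nu^2$ and $\hat\sigma_{0\nu}^2$, are left out. The simulated DGP ($y_{it} = 1 + 2x_{it} + e_{it}$ with a random-walk regressor) and all helper names are hypothetical. Large negative values reject the null of no cointegration.

```python
import math
import random


def within_residuals(pairs):
    """Pooled LSDV fit of y_it on x_it with unit intercepts; returns e_hat_it per unit."""
    demeaned = []
    num = den = 0.0
    for y, x in pairs:
        my, mx = sum(y) / len(y), sum(x) / len(x)
        yt = [v - my for v in y]
        xt = [v - mx for v in x]
        demeaned.append((yt, xt))
        num += sum(a * b for a, b in zip(xt, yt))
        den += sum(b * b for b in xt)
    beta = num / den
    return [[a - beta * b for a, b in zip(yt, xt)] for yt, xt in demeaned]


def kao_df(residuals):
    """Kao's DF_rho and DF_t for z_it = {mu_i} with strictly exogenous regressors."""
    N, T = len(residuals), len(residuals[0])
    sxy = sum(e[t] * e[t - 1] for e in residuals for t in range(1, T))
    sxx = sum(e[t - 1] ** 2 for e in residuals for t in range(1, T))
    rho = sxy / sxx
    s2 = sum((e[t] - rho * e[t - 1]) ** 2 for e in residuals for t in range(1, T)) / (N * T)
    t_rho = (rho - 1.0) * math.sqrt(sxx) / math.sqrt(s2)
    df_rho = (math.sqrt(N) * T * (rho - 1.0) + 3.0 * math.sqrt(N)) / math.sqrt(10.2)
    df_t = math.sqrt(1.25) * t_rho + math.sqrt(1.875 * N)
    return df_rho, df_t


def simulate(N, T, cointegrated, seed):
    """Hypothetical DGP: y = 1 + 2x + e, x a random walk; e iid (cointegrated) or I(1)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(N):
        x = e_rw = 0.0
        xs, ys = [], []
        for _ in range(T):
            x += rng.gauss(0.0, 1.0)
            shock = rng.gauss(0.0, 1.0)
            e_rw += shock
            e = shock if cointegrated else e_rw
            xs.append(x)
            ys.append(1.0 + 2.0 * x + e)
        pairs.append((ys, xs))
    return pairs


coint = kao_df(within_residuals(simulate(20, 50, True, seed=3)))
no_coint = kao_df(within_residuals(simulate(20, 50, False, seed=4)))
```

Each statistic is compared with the left tail of $N(0, 1)$.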
V. ESTIMATION AND INFERENCE IN PANEL COINTEGRATION MODELS

This section discusses the issues that arise in estimation and inference of cointegrated panel regression models. The asymptotic properties of the estimators of the regression coefficients and the associated statistical tests are different from those of the time series cointegration regression models. Some
of these differences have become apparent in recent works by Kao & Chiang (2000), Phillips & Moon (1999a) and Pedroni (1996). The panel cointegration models are directed at studying questions that surround long run economic relationships typically encountered in macroeconomic and financial data. Such a long run relationship is often predicted by economic theory and it is then of central interest to estimate the regression coefficients and test whether they satisfy theoretical restrictions. Kao & Chen (1995) showed that the OLS estimator in panel cointegrated models is asymptotically normal but still asymptotically biased. Chen, McCoskey & Kao (1999) investigated the finite sample properties of the OLS estimator, the t-statistic, the bias-corrected OLS estimator, and the bias-corrected t-statistic. They found that the bias-corrected OLS estimator does not improve over the OLS estimator in general. The results of Chen et al. suggested that alternatives, such as the fully modified (FM) estimator or dynamic OLS (DOLS) estimator, may be more promising in cointegrated panel regressions. Phillips & Moon (1999a) and Pedroni (1996) proposed an FM estimator, which can be seen as a generalization of Phillips & Hansen (1990). In this volume, Kao & Chiang (2000) propose an alternative approach based on a panel dynamic least squares (DOLS) estimator, which builds upon the work of Saikkonen (1991) and Stock & Watson (1993). Next, we provide a brief discussion of the OLS estimation methods in a panel cointegrated model. Consider the following panel regression:

$$y_{it} = x_{it}'\beta + z_{it}'\gamma_i + u_{it}, \qquad (30)$$
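where $\{y_{it}\}$ are $1 \times 1$, $\beta$ is a $k \times 1$ vector of the slope parameters, $z_{it}$ is the deterministic component, and $\{u_{it}\}$ are the stationary disturbance terms. We assume that $\{x_{it}\}$ are $k \times 1$ integrated processes of order one for all $i$, where $x_{it} = x_{i,t-1} + \varepsilon_{it}$. Under these specifications, (30) describes a system of cointegrated regressions, i.e. $y_{it}$ is cointegrated with $x_{it}$. The OLS estimator of $\beta$ is

$$\hat\beta_{OLS} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde y_{it}. \qquad (31)$$

It is easy to show that

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}' \xrightarrow{p} \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}E[\zeta_{2i}], \qquad (32)$$

and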
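$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T}\tilde x_{it}\tilde u_{it} \Rightarrow \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}E[\zeta_{1i}] \qquad (33)$$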
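using sequential limit theory, where

$$\begin{array}{lcc} z_{it} & E[\zeta_{1i}] & E[\zeta_{2i}] \\ \hline \{0\} & \Delta_{\varepsilon u i} & \frac{1}{2}\Omega_{\varepsilon i} \\ \{\mu_i\} & -\frac{1}{2}\Omega_{\varepsilon u i} + \Delta_{\varepsilon u i} & \frac{1}{6}\Omega_{\varepsilon i} \\ \{\mu_i, t\} & -\frac{1}{2}\Omega_{\varepsilon u i} + \Delta_{\varepsilon u i} & \frac{1}{15}\Omega_{\varepsilon i} \end{array}$$

and

$$\Omega_i = \begin{bmatrix}\Omega_{ui} & \Omega_{u\varepsilon i}\\ \Omega_{\varepsilon u i} & \Omega_{\varepsilon i}\end{bmatrix}$$

is the long-run covariance matrix of $(u_{it}, \varepsilon_{it}')'$; also

$$\Delta_i = \begin{bmatrix}\Delta_{ui} & \Delta_{u\varepsilon i}\\ \Delta_{\varepsilon u i} & \Delta_{\varepsilon i}\end{bmatrix}$$

is the one-sided long-run covariance. For example, when $z_{it} = \{\mu_i\}$, we get

$$\sqrt{N}T(\hat\beta_{OLS} - \beta) - \sqrt{N}\delta_{NT} \Rightarrow N\left(0,\; 6\,\Omega_\varepsilon^{-1}\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{u.\varepsilon i}\Omega_{\varepsilon i}\right)\Omega_\varepsilon^{-1}\right),$$

where $\Omega_\varepsilon = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{\varepsilon i}$ and

$$\delta_{NT} = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}(x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\frac{1}{N}\sum_{i=1}^{N}\left[\Omega_{\varepsilon i}^{1/2}\left(\int\tilde W_i\,dW_i'\right)\Omega_{\varepsilon i}^{-1/2}\Omega_{\varepsilon u i} + \Delta_{\varepsilon u i}\right].$$

Kao & Chiang (2000) in this volume studied the limiting distributions for the FM and DOLS estimators in a cointegrated regression and showed they are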
asymptotically normal. Phillips & Moon (1999a) and Pedroni (1996) also obtained similar results for the FM estimator. The reader is referred to the cited papers for further details. Kao and Chiang also investigated the finite sample properties of the OLS, FM, and DOLS estimators. They found that: (i) the OLS estimator has a non-negligible bias in finite samples, (ii) the FM estimator does not improve over the OLS estimator in general, and (iii) the DOLS estimator may be more promising than OLS or FM estimators in estimating the cointegrated panel regressions. Choi (1999b) extended Kao & Chiang (2000) to study asymptotic properties of OLS, Within and GLS estimators for an error component model. The error component model involves both stationary and nonstationary regressors. Choi's simulation results indicated that the feasible GLS estimator is more efficient than the Within estimator. Choi (1999c) studied instrumental variable estimation for an error component model with stationary and nearly nonstationary regressors. Phillips & Moon (1999a) studied various regressions between two panel vectors that may or may not have cointegrating relations, and presented a fundamental framework for studying sequential and joint limit theories in nonstationary panel data. In particular, Phillips and Moon studied regression limit theory of nonstationary panels when both $N$ and $T$ go to infinity. Their limit theory allows for both sequential limits, where $T \to \infty$ followed by $N \to \infty$, and joint limits, where $T, N \to \infty$ simultaneously. Phillips and Moon require that $N/T \to 0$, so that these results apply to moderate-$N$, large-$T$ macro panel data and not to large-$N$, small-$T$ micro panel data.
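The finite-sample OLS bias reported by Kao and Chiang, and the way a DOLS-type correction removes it, can be illustrated with a small simulation. The sketch compares pooled within OLS with an estimator that adds $q$ leads and lags of $\Delta x_{it}$, in the spirit of Saikkonen (1991) and Stock & Watson (1993); the intercepts and lead/lag coefficients are unit-specific and are partialled out unit by unit (Frisch-Waugh) before the common slope is pooled. The DGP, in which $u_{it}$ is correlated only with the contemporaneous $\Delta x_{it}$, the choice $q = 1$, and all helper names are assumptions; this is not Kao and Chiang's implementation.

```python
import random


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def residualize(v, W):
    """Residuals from the OLS regression of v on the columns of W (rows = observations)."""
    k = len(W[0])
    xtx = [[sum(w[p] * w[q] for w in W) for q in range(k)] for p in range(k)]
    xtv = [sum(w[p] * vi for w, vi in zip(W, v)) for p in range(k)]
    coef = solve(xtx, xtv)
    return [vi - sum(c * wp for c, wp in zip(coef, w)) for w, vi in zip(W, v)]


def pooled_slope(units, q=None):
    """Pooled within OLS (q=None) or DOLS-type slope with q leads/lags of dx per unit."""
    num = den = 0.0
    for y, x in units:
        T = len(x)
        dx = [0.0] + [x[t] - x[t - 1] for t in range(1, T)]   # dx[0] is never used
        if q is None:
            rows = list(range(T))
            W = [[1.0] for _ in rows]                          # intercept only
        else:
            rows = list(range(q + 1, T - q))
            W = [[1.0] + [dx[t + j] for j in range(-q, q + 1)] for t in rows]
        ry = residualize([y[t] for t in rows], W)
        rx = residualize([x[t] for t in rows], W)
        num += sum(a * b for a, b in zip(rx, ry))
        den += sum(b * b for b in rx)
    return num / den


def simulate(N, T, beta, seed):
    """Hypothetical DGP: y = alpha_i + beta*x + u, with u correlated with dx (endogeneity)."""
    rng = random.Random(seed)
    units = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 1.0)
        x, xs, ys = 0.0, [], []
        for _ in range(T):
            eps = rng.gauss(0.0, 1.0)
            x += eps
            u = 0.8 * eps + 0.6 * rng.gauss(0.0, 1.0)
            xs.append(x)
            ys.append(alpha + beta * x + u)
        units.append((ys, xs))
    return units


units = simulate(40, 50, beta=2.0, seed=7)
b_ols = pooled_slope(units)         # biased upward by roughly 0.05 in this design
b_dols = pooled_slope(units, q=1)   # leads/lags of dx soak up the endogeneity
```

In this design the OLS bias is of order $1/T$, so it shrinks as $T$ grows, whereas adding leads and lags removes it at the cost of a few observations at each end of the sample.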
The panel models considered allow for four cases: (i) panel spurious regression, where there is no time series cointegration, (ii) heterogeneous panel cointegration, where each individual has its own specific cointegration relation, (iii) homogeneous panel cointegration, where individuals have the same cointegration relation, and (iv) near-homogeneous panel cointegration, where individuals have slightly different cointegration relations determined by the value of a localizing parameter. Phillips & Moon (1999a) investigated these four models and developed panel asymptotics for regression coefficients and tests using both sequential and joint limit arguments. In all cases considered, the pooled estimator is consistent and has a normal limiting distribution. In fact, for the spurious panel regression, Phillips & Moon (1999a) showed that under quite weak regularity conditions, the pooled least squares estimator of the slope coefficient is $\sqrt{N}$-consistent for the long-run average relation parameter $\beta$ and has a limiting normal distribution. Also, Moon & Phillips (1998a) showed that a limiting cross-section regression with time averaged data is also $\sqrt{N}$-consistent for $\beta$ and has a limiting normal distribution. This is different from
the pure time series spurious regression, where the limit of the OLS estimator of $\beta$ is a nondegenerate random variate that is a functional of Brownian motions and is therefore not consistent for $\beta$. The idea in Phillips & Moon (1999a) is that independent cross-section data in the panel add information, and this leads to a stronger overall signal than in the pure time series case. Pesaran & Smith (1995) studied limiting cross-section regressions with time averaged data and established consistency under restrictive assumptions on the heterogeneous panel model. This differs from Phillips & Moon (1999a) in that the former use an average of the cointegrating coefficients, which is different from the long run average regression coefficient. This requires the existence of cointegrating time series relations, whereas the long run average regression coefficient is defined irrespective of the existence of individual cointegrating relations and relies only on the long run average variance matrix of the panel. Phillips & Moon (1999a) also showed that for the homogeneous and near homogeneous cointegration cases, a consistent estimator of the long run regression coefficient can be constructed, which they call a pooled FM estimator. They showed that this estimator has a faster convergence rate than the simple cross-section and time series estimators. See also Phillips & Moon (1999b) for a concise review. In fact, the latter paper also shows how to extend the above ideas to models with individual effects in the data generating process. For the panel spurious regression with individual specific deterministic trends, estimates of the trend coefficients are obtained in the first step, and the detrended data are pooled and used in least squares regression to estimate $\beta$ in the second step. Two different detrending procedures are used, based on OLS and GLS regressions.
OLS detrending leads to an asymptotically more efficient estimator of the long run average coefficient in pooled regression than GLS detrending. Phillips & Moon (1999b) explain that "the residuals after time series GLS detrending have more cross section variation than they do after OLS detrending and this produces great variation in the limit distribution of the pooled regression estimator of the long run average coefficient." Moon & Phillips (1999) investigate the asymptotic properties of the Gaussian MLE of the localizing parameter in local to unity dynamic panel regression models with deterministic and stochastic trends. Moon and Phillips find that for the homogeneous trend model, the Gaussian MLE of the common localizing parameter is $\sqrt{N}$-consistent, while for the heterogeneous trends model, it is inconsistent. The latter inconsistency is due to the presence of an infinite number of incidental parameters (as $N \to \infty$) for the individual trends. Unlike the fixed effects dynamic panel data model, where this inconsistency due to the incidental parameter problem disappears as $T \to \infty$, the inconsistency of
the localizing parameter in the Moon and Phillips model persists even when both $N$ and $T$ go to infinity. Pesaran, Shin & Smith (1999) derived the asymptotics of a pooled mean group (PMG) estimator. The PMG estimation constrains the long run coefficients to be identical, but allows the short run and adjustment coefficients as well as the error variances to differ across the cross-sectional dimension. Recently, Binder, Hsiao & Pesaran (2000) considered estimation and inference in panel vector autoregressions (PVARs) with fixed effects when $T$ is finite and $N$ is large. A maximum likelihood estimator as well as unit root and cointegration tests are proposed based on a transformed likelihood function. This MLE is shown to be consistent and asymptotically normally distributed irrespective of the unit root and cointegrating properties of the PVAR model. The tests proposed are based on standard chi-square and normally distributed statistics. Binder et al. also show that the conventional GMM estimators based on standard orthogonality conditions break down if the underlying time series contain unit roots. Monte Carlo evidence is provided which favors MLE over GMM in small samples. In this volume, Kauppi (2000) develops a new joint limit theory where the panel data may be cross-sectionally heterogeneous in a general way. This limit theory builds upon the concepts of joint convergence in probability and in distribution for double indexed processes by Phillips & Moon (1999a) and develops new versions of the law of large numbers and the central limit theorem that apply in panels where the data may be cross-sectionally heterogeneous in a fairly general way. Kauppi demonstrates how this joint limit theory can be applied to derive asymptotics for a panel regression where the regressors are generated by a local to unit root process with heterogeneous localizing coefficients across cross-sections.
Kauppi discusses issues that arise in the estimation and inference of panel cointegrated regressions with near integrated regressors. Kauppi shows that a bias corrected pooled OLS estimator for a common cointegrating parameter has an asymptotic normal distribution centered on the true value irrespective of whether the regressor has a near or exact unit root. However, if the regression model contains individual effects and/or deterministic trends, then Kauppi's bias corrected pooled OLS estimator still produces asymptotic bias. Kauppi also shows that the panel FM estimator is subject to asymptotic bias, regardless of how individual effects and/or deterministic trends are included, if the regressors are nearly rather than exactly integrated. This indicates that much care should be taken in interpreting empirical results achieved by the recent panel cointegration methods that assume exact unit roots when near unit roots are equally plausible.
Kao et al. (1999) apply the asymptotic theory of panel cointegration developed by Kao & Chiang (2000) to the Coe & Helpman (1995) international R&D spillover regression. Using a sample of 21 OECD countries and Israel, they re-examine the effects of domestic and foreign R&D capital stocks on total factor productivity of these countries. They find that OLS with bias-correction, the fully modified (FM) and the dynamic OLS (DOLS) estimators produce different predictions about the impact of foreign R&D on total factor productivity (TFP). However, all the estimators support the result that domestic R&D is related to TFP. Kao et al.’s empirical results indicate that the estimated coefficients in the Coe and Helpman’s regressions are subject to estimation bias. Given the superiority of the DOLS over FM as suggested by Kao & Chiang (2000), Kao et al. leaned towards rejecting the Coe and Helpman hypothesis that international R&D spillovers are trade related. Funk (1998) examined the relationship between trade patterns and international R&D spillovers among the OECD countries using the panel cointegration methods developed by Kao (1999), Kao & Chiang (2000), and Pesaran, Shin & Smith (1999). Using randomly simulated bilateral trade patterns, Funk found that the choice of weights used in constructing foreign R&D stocks is informative of the avenue of spillover transmission when panel cointegration methods are employed. A re-examination of the relationship between import patterns and R&D spillovers found no evidence to link the patterns of R&D spillovers to the patterns of imports. Funk found strong evidence indicating that exporters receive substantial R&D spillovers from their customers.
VI. DYNAMIC PANEL DATA MODELS

This section surveys recent developments in dynamic panel data models. The dynamic panel data regression is characterized by two sources of persistence over time. The first is autocorrelation due to the presence of a lagged dependent variable among the regressors. The second is individual effects characterizing the heterogeneity among the individuals:

$$y_{it} = \delta y_{i,t-1} + x_{it}'\beta + \mu_i + u_{it} \tag{34}$$
for $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$. Here $\delta$ is a scalar, $x_{it}$ is a $k \times 1$ vector of regressors with coefficient vector $\beta$, $\mu_i$ denotes the $i$-th individual effect and $u_{it}$ is the remainder disturbance. Basic introductions to this topic are found in Hsiao (1986), Baltagi (1995) and Matyas & Sevestre (1996). Applications using this model are too many to enumerate. They include employment equations, see Arellano & Bond (1991), liquor demand, see Baltagi & Griffin (1995), growth convergence, see Islam (1995) and
Nerlove (1999), life cycle labor supply models, see Ziliak (1997), and demand for gasoline, see Baltagi & Griffin (1997), to mention a few. It is well known that the fixed effects (FE) estimator is biased and inconsistent for typical micro-panels, where a large number of firms or individuals ($N$) is observed over a short period of time ($T$). This is because $T$ is fixed while $N \to \infty$; see Nickell (1981) and, more recently, Kiviet (1995, 1999). Monte Carlo results have shown that first order asymptotic properties do not necessarily yield correct inference in finite samples. Kiviet (1995) therefore examined higher order asymptotics, which may approximate the actual finite sample properties more closely and lead to better inference. In fact, Kiviet (1995) considered the simple dynamic linear panel data model with serially uncorrelated disturbances and strongly exogenous regressors, and derived an approximation for the bias of the FE estimator. Subtracting a consistent estimator of this bias from the original FE estimator yields a corrected FE estimator. In simulations, this corrected estimator performed well when compared with eight other consistent instrumental variable or GMM estimators.4

In macro-panels, for example those studying long run growth, the data cover a large number of countries $N$ over a moderate $T$. In this case $T$ is not very small relative to $N$, so some researchers may still favor the FE estimator, arguing that its bias may not be large. Judson & Owen (1999) performed Monte Carlo experiments for $N$ = 20 or 100 and $T$ = 5, 10, 20 and 30. They found that the bias of FE can be sizeable, even when $T$ = 30. The bias of the FE estimator increases with $\delta$ and decreases with $T$, but even for $T$ = 30 it can be as much as 20% of the true value of the coefficient of interest.
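The small-$T$ bias of FE is easy to reproduce in a toy Monte Carlo. The sketch below is in the spirit of Judson & Owen (1999) but does not follow their design. It also contrasts FE with the first-difference IV idea of Anderson & Hsiao (1982), which the next paragraph discusses. Several assumptions are ours: there are no exogenous regressors ($\beta = 0$), shocks are normal, the initial condition is stationary, and the function names are hypothetical.

```python
import random


def simulate_panel(n, t, delta, sigma_mu=1.0, sigma_u=1.0, seed=0):
    """Simulate y_it = delta * y_{i,t-1} + mu_i + u_it (no x_it).

    Returns n series [y_i0, ..., y_it]. y_i0 is drawn from the stationary
    distribution and all shocks are normal (illustrative assumptions).
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, sigma_mu)
        y = mu / (1 - delta) + rng.gauss(0.0, sigma_u / (1 - delta**2) ** 0.5)
        series = [y]
        for _ in range(t):
            y = delta * y + mu + rng.gauss(0.0, sigma_u)
            series.append(y)
        panel.append(series)
    return panel


def within_estimator(panel):
    """FE (within) OLS of y_it on y_{i,t-1} after demeaning each unit."""
    num = den = 0.0
    for series in panel:
        y, ylag = series[1:], series[:-1]
        ybar, lbar = sum(y) / len(y), sum(ylag) / len(ylag)
        for a, b in zip(y, ylag):
            num += (a - ybar) * (b - lbar)
            den += (b - lbar) ** 2
    return num / den


def anderson_hsiao(panel):
    """First-difference IV: instrument dy_{i,t-1} with the level y_{i,t-2}."""
    num = den = 0.0
    for s in panel:
        for k in range(2, len(s)):
            dy, dylag, z = s[k] - s[k - 1], s[k - 1] - s[k - 2], s[k - 2]
            num += z * dy
            den += z * dylag
    return num / den


if __name__ == "__main__":
    panel = simulate_panel(n=20000, t=5, delta=0.5)
    print(f"FE (within):       {within_estimator(panel):.3f}")
    print(f"Anderson-Hsiao IV: {anderson_hsiao(panel):.3f}  (true delta = 0.5)")
```

With $N$ = 20000 and $T$ = 5, the FE estimate should lie well below 0.5, as the Nickell (1981) bias predicts. The IV estimate should be close to 0.5. The exact figures depend on the seed.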
Judson & Owen (1999) recommend the corrected FE estimator proposed by Kiviet (1995) as the best choice, with GMM second best. For long panels, they recommend the computationally simpler Anderson & Hsiao (1982) estimator. This last estimator first differences the data to get rid of the individual effects and then uses lagged predetermined variables in levels as instruments.5 Arellano & Bond (1991) proposed GMM procedures that are more efficient than the Anderson & Hsiao (1982) estimator. Ahn & Schmidt (1995) derive additional nonlinear moment restrictions not exploited by the Arellano & Bond (1991) GMM estimator.6 Ahn & Schmidt (1995, 1997) also give a complete count of the orthogonality conditions corresponding to a variety of assumptions imposed on the disturbances and the initial conditions of the dynamic panel data model. Many of these moment conditions are nonlinear in the parameters. Ahn & Schmidt (1997) nevertheless propose a linearized GMM estimator that is asymptotically as efficient as the nonlinear GMM estimator. They also provide simple moment tests of the validity of these nonlinear restrictions. In addition, they investigate the circumstances under which the optimal GMM estimator is equivalent to a
linear instrumental variable estimator. They find that these circumstances are quite restrictive and go beyond uncorrelatedness and homoskedasticity of the errors. Ahn & Schmidt (1995) provide some evidence on the efficiency gains from the nonlinear moment conditions, which supports their use in practice. By employing all these conditions, the resulting GMM estimator is asymptotically efficient and has the same asymptotic variance as the MLE under normality.

Hahn (1997) considers the asymptotically efficient estimation of the dynamic panel data model with sequential moment restrictions in an environment with i.i.d. observations. He shows that GMM with a set of instruments that grows as $N \to \infty$ attains the semiparametric efficiency bound of the model. He also explains how Fourier series or polynomials may be used as the set of instruments for efficient estimation. In a limited Monte Carlo comparison, Hahn finds that this estimator has finite sample properties similar to those of the Keane & Runkle (1992) and/or Schmidt et al. (1992) estimators when the latter estimators are efficient. In cases where the latter estimators are not efficient, Hahn's efficient estimator outperforms both in finite samples.

Recently, Wansbeek & Bekker (1996) considered a simple dynamic panel data model with no exogenous regressors, in which the disturbances $u_{it}$ and random effects $\mu_i$ are independent and normally distributed. They derived an expression for the optimal instrumental variable estimator, i.e. the one with minimal asymptotic variance. A striking result is the difference in efficiency between the IV and ML estimators: for regions of the autoregressive parameter which are likely in practice, ML is superior.
The gap between IV (or GMM) and ML can be narrowed by adding moment restrictions of the type considered by Ahn & Schmidt (1995). Hence, Wansbeek & Bekker (1996) find support for adding these nonlinear moment restrictions, and they warn that ignoring them costs efficiency relative to MLE. Blundell & Bond (1998) revisit the importance of exploiting the initial condition in generating efficient estimators of the dynamic panel data model when $T$ is small. They consider a simple autoregressive panel data model with no exogenous regressors:

$$y_{it} = \delta y_{i,t-1} + \mu_i + u_{it} \tag{35}$$
with $E(\mu_i) = 0$, $E(u_{it}) = 0$ and $E(\mu_i u_{it}) = 0$ for $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$. Blundell & Bond (1998) focus on the case where $T = 3$, and therefore there is
only one orthogonality condition, $E(y_{i1}\,\Delta u_{i3}) = 0$, so that $\delta$ is just identified. In this case, the first stage IV regression is obtained by running $\Delta y_{i2}$ on $y_{i1}$. This regression can be obtained from (35) evaluated at $t = 2$ by subtracting $y_{i1}$ from both sides of the equation, i.e.

$$\Delta y_{i2} = (\delta - 1)\, y_{i1} + \mu_i + u_{i2} \tag{36}$$
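Since we expect $E(y_{i1}\mu_i) > 0$, the OLS estimate of $(\delta - 1)$ from (36) will be biased upwards, with

$$\operatorname{plim}\, \widehat{(\delta - 1)} = (\delta - 1)\, \frac{c}{c + \sigma^2_\mu/\sigma^2_u} \tag{37}$$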
where $c = (1 - \delta)/(1 + \delta)$. The bias term effectively scales the estimated coefficient on the instrumental variable $y_{i1}$ towards zero. Blundell and Bond also find that the F-statistic of the first stage IV regression converges to a $\chi^2_1$ with noncentrality parameter

$$\tau = \frac{(\sigma^2_u c)^2}{\sigma^2_\mu + \sigma^2_u c} \to 0 \quad \text{as } \delta \to 1 \tag{38}$$
As $\tau \to 0$, the instrumental variable estimator performs poorly. Hence, Blundell and Bond attribute the bias and poor precision of the first difference GMM estimator to the problem of weak instruments described in Nelson & Startz (1990) and Staiger & Stock (1997), and characterize this weak IV by its concentration parameter $\tau$. Next, Blundell & Bond (1998) show that an additional mild stationarity restriction on the initial conditions process allows the use of an extended system GMM estimator. This estimator uses lagged differences of $y_{it}$ as instruments for equations in levels, in addition to lagged levels of $y_{it}$ as instruments for equations in first differences; see Arellano & Bover (1995). The system GMM estimator is shown to have dramatic efficiency gains over the basic first difference GMM as $\delta \to 1$ and as $\sigma^2_\mu/\sigma^2_u$ increases. For example, for $T = 4$ and $\sigma^2_\mu/\sigma^2_u = 1$, the asymptotic variance ratio of the first difference GMM estimator to this system GMM estimator is 1.75 for $\delta = 0$. It increases to 3.26 for $\delta = 0.5$ and to 55.4 for $\delta = 0.9$. This clearly demonstrates that the levels restrictions suggested by Arellano & Bover (1995) remain informative in cases where first differenced instruments become weak. Things improve for first difference GMM as $T$ increases. However, with short $T$ and persistent series, the Blundell and Bond findings support the use of the extra moment conditions. These results are reviewed and corroborated in Blundell, Bond & Windmeijer (2000) in this volume, using Monte Carlo experiments as well as an empirical example. In fact, simulations that include weakly exogenous covariates find large finite sample bias and very low precision for the standard first differenced
estimator. However, the system GMM estimator not only improves precision but also reduces the finite sample bias. The empirical application revisits the estimates of the capital and labor coefficients in a Cobb-Douglas production function considered by Griliches & Mairesse (1998). It uses data on 509 R&D-performing US manufacturing companies observed over 8 years (1982–1989). The standard GMM estimator, which uses moment conditions on the first differenced model, finds a low estimate of the capital coefficient and low precision for all estimated coefficients. By contrast, the system GMM estimator gives reasonable and more precise estimates of the capital coefficient, and constant returns to scale is not rejected. Blundell et al. conclude that "a careful examination of the original series and consideration of the system GMM estimator can usefully overcome many of the disappointing features of the standard GMM estimator for dynamic panel models."

Hahn (1999) also examines the role of the initial condition imposed by the Blundell & Bond (1998) estimator. He numerically compares the semiparametric information bounds for the case that incorporates the stationarity of the initial condition and the case that does not, and finds that the efficiency gain can be substantial.

Ziliak (1997) asks whether the bias/efficiency trade-off for the GMM estimator considered by Tauchen (1986) for the time series case is still binding in panel data, where the sample size is normally larger than 500. For time series data, Tauchen (1986) shows that even for $T$ = 50 or 75 there is a bias/efficiency trade-off as the number of moment conditions increases. Therefore, Tauchen recommends the use of sub-optimal instruments in small samples. This result was also corroborated by Andersen & Sørensen (1996), who argue that GMM using too few moment conditions is just as bad as GMM using too many.
This problem becomes more pronounced with panel data, since the number of moment conditions increases dramatically with the number of strictly exogenous variables and the number of time series observations. From an asymptotic efficiency point of view it is desirable to include as many moment conditions as possible, but doing so may be infeasible or impractical in many cases. For example, $T = 10$ and five strictly exogenous regressors generate 500 moment conditions for GMM. Ziliak (1997) performs an extensive set of Monte Carlo experiments for a dynamic panel data model. He finds that the same trade-off between bias and efficiency exists for GMM as the number of moment conditions increases, and that one is better off with sub-optimal instruments. In fact, Ziliak finds that GMM performs well with sub-optimal instruments, but he does not recommend it for panel data applications when all the moments are exploited for estimation.7 Ziliak estimates a life cycle labor supply model under uncertainty based on 532
men observed over 10 years (1978–1987) in the Panel Study of Income Dynamics. The sample was restricted to continuously married, continuously working prime age men aged 22–51 in 1978. These men were paid an hourly wage or salaried, and could not be piece-rate workers or self-employed. Ziliak finds that the downward bias of GMM is quite severe as the number of moment conditions expands, outweighing the gains in efficiency. Ziliak reports estimates of the intertemporal substitution elasticity, which is the focal point of interest in the labor supply literature. This elasticity measures the intertemporal change in hours of work due to an anticipated change in the real wage. For GMM, this estimate changes from 0.519 to 0.093 when the number of moment conditions used is increased from 9 to 212, while its standard error drops from 0.36 to 0.07. Ziliak attributes this bias to the correlation between the sample moments used in estimation and the estimated weight matrix. Interestingly, Ziliak finds that the forward filter 2SLS estimator proposed by Keane & Runkle (1992) performs best in terms of the bias/efficiency trade-off, and he recommends it. Forward filtering eliminates all forms of serial correlation while still maintaining orthogonality with the initial instrument set. Schmidt, Ahn & Wyhowski (1992) argued that filtering is irrelevant if one exploits all sample moments during estimation. In practice, however, the number of moment conditions increases with the number of time periods $T$ and the number of regressors $K$, and can become computationally intractable. For example, for $T = 15$ and $K = 10$, the number of moment conditions for Schmidt et al. (1992) is $T(T-1)K/2$, i.e. 1050 restrictions. This highlights the computational burden of the approach.
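The moment-condition counts quoted in this section can be reproduced with simple formulas. The formulas are our reconstruction, chosen because they match the figures in the text:

- 500 for $T = 10$ and five strictly exogenous regressors;
- 360 and 300 for the RE and FE examples of Im et al. (1999) discussed below;
- $T(T-1)K/2$ for Schmidt et al. (1992).

The function names are hypothetical.

```python
def levels_strictly_exogenous(t, k):
    """T level equations, each instrumented by all T*K regressor values."""
    return t * t * k


def differences_strictly_exogenous(t, k):
    """T - 1 differenced equations, each instrumented by all T*K values."""
    return (t - 1) * t * k


def sequential_differences(t, k):
    """T(T-1)K/2: the differenced equation for period s (s = 2..T) uses
    x_i1, ..., x_{i,s-1}, i.e. (s - 1)K instruments, summed over s."""
    return t * (t - 1) * k // 2


if __name__ == "__main__":
    print(levels_strictly_exogenous(10, 5))  # 500
    print(sequential_differences(15, 10))    # 1050
    print(levels_strictly_exogenous(6, 10), differences_strictly_exogenous(6, 10))  # 360 300
```

The counts grow quadratically in $T$ and linearly in $K$. This is why the full set of moment conditions quickly becomes impractical.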
In addition, Ziliak argues that the overidentifying restrictions are less likely to be satisfied, possibly because of the weak correlation between the instruments and the endogenous regressors.8 In this case, the forward filter 2SLS estimator is desirable, yielding less bias than GMM and sizeable gains in efficiency. In the life cycle labor supply example, the forward filter 2SLS estimate of the intertemporal substitution elasticity was 0.135 for 9 moment conditions and 0.296 for 212 moment conditions. The corresponding standard errors dropped from 0.32 to 0.09. Two problems prompted Ahn & Schmidt (1999a) to pose the questions below: the practical problem of not being able to use more moment conditions, and the statistical problem of the trade-off between small sample bias and efficiency. The questions were: "Under what conditions can we use a smaller set of moment conditions without incurring any loss of asymptotic efficiency? In other words, under what conditions are some moment conditions redundant in the sense that utilizing them does not improve efficiency?" These questions were first dealt with by Im, Ahn, Schmidt & Wooldridge (1999), who considered panel data models with strictly exogenous explanatory variables. They argued that, for example, with
ten strictly exogenous time-varying variables and six time periods, 360 moment conditions are available for the random effects (RE) model, and this reduces to 300 moment conditions for the FE model. GMM utilizing all these moment conditions leads to an efficient estimator. However, these moment conditions exceed those used by the simple RE and FE estimators. Im et al. (1999) provide the assumptions under which this efficient GMM estimator reduces to the simpler FE or RE estimator. In other words, Im et al. (1999) show the redundancy of the moment conditions that these simple estimators do not use. Ahn & Schmidt (1999a) provide a more systematic method by which redundant instruments can be found, and they generalize this result to models with time-varying individual effects. However, both papers deal only with strictly exogenous regressors.

In a related paper, Ahn & Schmidt (1999b) consider the cases of strictly and weakly exogenous regressors. They show that the GMM estimator takes the form of an instrumental variables estimator if the assumption of no conditional heteroskedasticity (NCH) holds. Under this assumption, the efficiency of standard estimators can often be established, showing that the moment conditions not utilized by these estimators are redundant. However, Ahn & Schmidt (1999b) conclude that the NCH assumption necessarily fails if the full set of moment conditions for the dynamic panel data model is used. In this case, there is clearly a need for modified versions of GMM with a reduced set of moment conditions that lead to estimates with reasonable finite sample properties.

Crepon, Kramarz & Trognon (1997) study the dynamic panel data model under a given set of orthogonality conditions. They argue that the parameters can then be divided into parameters of interest (like $\delta$) and nuisance parameters (like the second order terms in the autoregressive error component model).
They show that eliminating such nuisance parameters using their empirical counterparts does not entail an efficiency loss when only the parameters of interest are estimated. In fact, Sevestre and Trognon in chapter 6 of Matyas & Sevestre (1996) argue that if one is only interested in $\delta$, then one can reduce the number of orthogonality restrictions without loss in efficiency as far as $\delta$ is concerned. However, the estimates of the other nuisance parameters are not generally as efficient as those obtained from the full set of orthogonality conditions.

The Alonso-Borrego & Arellano (1999) paper is also motivated by the finite sample bias in panel data instrumental variable estimators when the instruments are weak. The dynamic panel model generates many overidentifying restrictions even for moderate values of $T$. Also, the number of instruments increases with $T$, but the quality of these instruments is often poor because they tend to be only weakly correlated with the first differenced
endogenous variables that appear in the equation. Limited information maximum likelihood (LIML) is strongly preferred to 2SLS if the number of instruments gets large as the sample size tends to infinity. Hillier (1990) showed that the alternative normalization rules adopted by LIML and 2SLS are at the root of their different sampling behavior. He also showed that a symmetrically normalized 2SLS estimator has properties similar to those of LIML. Following Hillier (1990), Alonso-Borrego & Arellano (1999) derive a symmetrically normalized GMM (SNM) estimator and compare it with ordinary GMM and LIML analogues by means of simulations. Monte Carlo and empirical results show that GMM can exhibit large biases when the instruments are poor, while LIML and SNM remain approximately unbiased. However, LIML and SNM always had a larger interquartile range than GMM. For $T = 4$, $N = 100$, $\sigma^2_\mu = 0.2$ and $\sigma^2_u = 1$, the bias for $\delta = 0.5$ was 6.9% for GMM, 1.7% for SNM and 1.7% for LIML. For $\delta = 0.8$, this bias increases to 17.8% for GMM, 3.7% for SNM and 4.1% for LIML.

Alvarez & Arellano (1997) studied the asymptotic properties of FE, one-step GMM and non-robust LIML for a first-order autoregressive model when both $N$ and $T$ tend to infinity with $N/T \to c$ for $0 \le c < 2$. For $T < N$, the GMM bias is always smaller than the FE bias, and the LIML bias is smaller than both. In a fixed $T$ framework, GMM and LIML are asymptotically equivalent, but as $T$ increases, LIML has a smaller asymptotic bias than GMM. These results provide some theoretical support for LIML over GMM.9

Wansbeek & Knaap (1999) consider a simple dynamic panel data model with a time trend and heterogeneous coefficients on the lagged dependent variable and the time trend, i.e.

$$y_{it} = \delta_i y_{i,t-1} + \gamma_i t + \mu_i + u_{it} \tag{39}$$
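Differencing this model twice makes the Wansbeek & Knaap (1999) argument explicit. The derivation below is a sketch that uses $\delta_i$ for the heterogeneous autoregressive coefficient, $\gamma_i$ for the heterogeneous trend coefficient (our label) and $\mu_i$ for the individual effect:

```latex
\begin{aligned}
y_{it} &= \delta_i y_{i,t-1} + \gamma_i t + \mu_i + u_{it} \\
\Delta y_{it} &= \delta_i \Delta y_{i,t-1} + \gamma_i + \Delta u_{it}
  && \text{(first difference removes } \mu_i\text{)} \\
\Delta^2 y_{it} &= \delta_i \Delta^2 y_{i,t-1} + \Delta^2 u_{it},
  \quad \Delta^2 u_{it} = u_{it} - 2u_{i,t-1} + u_{i,t-2}
  && \text{(second difference removes } \gamma_i\text{)}
\end{aligned}
```

If $u_{it}$ is serially uncorrelated, $\Delta^2 u_{it}$ involves only $u_{it}$, $u_{i,t-1}$ and $u_{i,t-2}$. Levels dated $t-3$ and earlier therefore remain valid instruments for $\Delta^2 y_{i,t-1}$.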
This model results from Islam's (1995) version of Solow's model of growth convergence among countries. Wansbeek & Knaap (1999) show that double differencing removes the individual country effects ($\mu_i$) in the first round of differencing and the heterogeneous coefficient on the time trend ($\gamma_i$) in the second round. Modified OLS, IV and GMM methods are adapted to this model, and LIML is suggested as a viable alternative to GMM to guard against the small sample bias of GMM. Macroeconomic data are subject to measurement error, and Wansbeek & Knaap (1999) show how these estimators can be modified to account for measurement error that is white noise. For example, GMM is modified so that it discards the orthogonality conditions that rely on the absence of measurement error. Jimenez-Martin (1998) performs Monte Carlo experiments to study the performance of the Holtz-Eakin (1988) test for the presence of individual
heterogeneity effects in dynamic small $T$ unbalanced panel data models. The design of the experiment includes both endogenous and time-invariant regressors in addition to the lagged dependent variable. The test behaves correctly for a moderate autoregressive coefficient. However, when this coefficient approaches unity, the presence of an additional regressor sharply affects the power and the size of the test. The power of this test is higher in four situations: when the variance of the specific effects increases (making them easier to detect), when the sample size increases, when the data set is balanced (for a given number of cross-section units) and when the regressors are strictly exogenous.

A. Heterogeneous Dynamic Panel Data Models

The fundamental assumption underlying pooled homogeneous parameter models has been called into question. Robertson & Symons (1992) warned about the bias from pooled estimators when the estimated model is dynamic and homogeneous but the true model is static and heterogeneous. Pesaran & Smith (1995) argued in favor of dynamic heterogeneous models when $N$ and $T$ are large. In this case, pooled homogeneous estimators are inconsistent, whereas an average estimator of heterogeneous parameters can lead to consistent estimates as $N$ and $T$ tend to infinity. Maddala, Srivastava & Li (1994) argued that shrinkage estimators are superior to either heterogeneous or homogeneous parameter estimates, especially for prediction purposes. In fact, Maddala, Trost, Li & Joutz (1997) considered the problem of estimating short run and long run elasticities of residential demand for electricity and natural gas for each of 49 states over the period 1970–1990.10 They conclude that the individual heterogeneous state estimates were hard to interpret and had the wrong signs. Pooled data regressions were not valid because the hypothesis of homogeneity of the coefficients was rejected.
They recommend shrinkage estimators if one is interested in obtaining elasticity estimates for each state, since these give more reliable results. Baltagi & Griffin (1997) compare short run and long run estimates, as well as forecasts, for pooled homogeneous, individual heterogeneous and shrinkage estimators of a dynamic demand model for gasoline across 18 OECD countries over the period 1960–1990. Based on one-, five- and ten-year forecasts and on the plausibility of the short run and long run elasticity estimates, the results favor pooling. Similar results were obtained for a dynamic model of cigarette demand across 46 states over the period 1963–1992; see Baltagi, Griffin & Xiong (2000). In chapter 8 of Matyas & Sevestre (1996), Pesaran, Smith & Im investigated the small sample properties of various estimators of the long run coefficients
for a dynamic heterogeneous panel data model using Monte Carlo experiments. Their findings indicate that the mean group estimator performs reasonably well for large T. However, when T is small, the mean group estimator could be seriously biased, particularly when N is large relative to T. Pesaran & Zhao (1999) examine the effectiveness of alternative bias-correction procedures in reducing the small sample bias of these estimators using Monte Carlo experiments. An interesting finding is that when the coefficient of the lagged dependent variable is greater than or equal to 0.8, none of the bias correction procedures seem to work. Hsiao, Pesaran & Tahmiscioglu (1999) suggest a Bayesian approach for estimating the mean parameters of a dynamic heterogeneous panel data model. The coefficients are assumed to be normally distributed across cross-sectional units and the Bayes estimator is implemented using Markov Chain Monte Carlo methods. Hsiao et al. argue that Bayesian methods can be a viable alternative in the estimation of mean coefficients in dynamic panel data models even when the initial observations are treated as fixed constants. They establish the asymptotic equivalence of this Bayes estimator and the mean group estimator proposed by Pesaran & Smith (1995). The asymptotics are carried out for both N and T → ∞ with N/T → 0. Monte Carlo experiments show that this Bayes estimator has better sampling properties than other estimators for both small and moderate size T. Hsiao et al. also caution against the use of the mean group estimator unless T is sufficiently large relative to N. The bias in the mean coefficient of the lagged dependent variable appears to be serious when T is small and the true value of this coefficient is larger than 0.6. Hsiao et al. apply their methods to estimate the q-investment model using a panel of 273 US firms over the period 1972–1993.
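The small-$T$ bias of the mean group estimator noted above can be seen in a toy simulation. This sketch computes a Pesaran & Smith (1995) style average of unit-by-unit OLS estimates. It is not their full long-run coefficient setup. The model is our assumption: a heterogeneous AR(1) with unit-specific intercepts and no regressors, with $\delta_i$ uniform around 0.5, normal shocks and stationary starting values. The function names are hypothetical.

```python
import random


def unit_ols_ar1(series):
    """OLS of y_t on (1, y_{t-1}) for a single unit; returns the slope."""
    y, x = series[1:], series[:-1]
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def mean_group(n, t, rng, mean_delta=0.5, spread=0.2):
    """Mean group estimate: the average of unit-by-unit OLS slopes.

    delta_i ~ Uniform(mean_delta - spread, mean_delta + spread),
    mu_i ~ N(0, 1), u_it ~ N(0, 1); each unit starts from its stationary
    distribution and contributes t regression observations.
    """
    estimates = []
    for _ in range(n):
        d = rng.uniform(mean_delta - spread, mean_delta + spread)
        mu = rng.gauss(0.0, 1.0)
        y = mu / (1 - d) + rng.gauss(0.0, (1 / (1 - d * d)) ** 0.5)
        series = [y]
        for _ in range(t):
            y = d * y + mu + rng.gauss(0.0, 1.0)
            series.append(y)
        estimates.append(unit_ols_ar1(series))
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(7)
    for t in (10, 50, 200):
        print(f"T={t}: mean group estimate {mean_group(500, t, rng):.3f} (true mean 0.5)")
```

For small $T$, each unit-level slope inherits the downward small-sample bias of OLS in an autoregression with an intercept. Averaging over many units does not remove that bias; only a larger $T$ does.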
VII. CONCLUSION

This survey gives a brief overview of some of the main results in the econometrics of nonstationary panels as well as recent developments in dynamic panels. There has been an immense amount of research in this area recently, with the demand for empirical studies exceeding the supply of econometric theory developed for these models. As this survey indicates, several issues have been resolved but a lot remains to be done.
ACKNOWLEDGMENTS

The authors would like to thank R. Carter Hill, M. H. Pesaran and an anonymous referee for their helpful comments and suggestions. Baltagi was
funded by the Advanced Research Program, Texas Higher Education Board. Kao was supported by a grant from the Chiang Ching-kuo Foundation for International Scholarly Exchange.
NOTES

1. A collection of dynamic panel data routines can be found at http://www.cemfi.es/~arellano/#dpd.
2. Chiang & Kao (2000) have recently put together a fairly comprehensive set of subroutines, NPT 1.0, for studying nonstationary panel data. NPT 1.0 can be downloaded from http://web.syr.edu/~cdkao.
3. Testing for cointegration in panel data by combining p-values is a straightforward extension of the testing procedures in this section. For cointegration tests, the relevant model is equation (15). We let $G_{iT_i}$ be a test for the null of no cointegration and apply the same tests and asymptotic theory as in this section.
4. Kiviet (1999) extends this derivation to the case of weakly exogenous variables and examines to what degree this order of approximation is determined by the initial conditions of the dynamic panel model.
5. Arellano (1989) found that using lagged differences of predetermined variables as instruments is not recommended, since the resulting estimator has a singularity point and very large variances over a significant range of parameter values.
6. See also Arellano & Bover (1995), chapter 8 of Baltagi (1995) and chapters 6 and 7 of Matyas & Sevestre (1996) for more details.
7. For a Hausman & Taylor (1981) type model, Metcalf (1996) shows that using fewer instruments may lead to a more powerful Hausman specification test. Asymptotically, more instruments lead to more efficient estimators. However, the asymptotic bias of the less efficient estimator will also grow as the null hypothesis of no correlation is violated. Metcalf argues that the power of the Hausman test will increase if, as the null is violated, the bias of the less efficient estimator increases at the same rate as its variance. This is because the test statistic is linear in the variance but quadratic in the bias.
8. See the growing literature on weak instruments by Nelson & Startz (1990), Bekker (1994), Angrist & Krueger (1995), Bound, Jaeger & Baker (1995) and Staiger & Stock (1997), to mention a few.
9. An alternative one-step method that achieves the same asymptotic efficiency as robust GMM or LIML estimators is maximum empirical likelihood estimation; see Imbens (1997). This method maximizes a multinomial pseudo-likelihood function subject to the orthogonality restrictions. The resulting estimators are invariant to normalization because they are maximum likelihood estimators.
10. Maddala et al. (1997) also provide a unified treatment of classical, Bayes and empirical Bayes procedures for estimating this model.
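Note 3 describes the p-value combination approach to testing. It can be sketched with two statistics of the kind studied in Choi (1999a): Fisher's $P = -2\sum_i \ln p_i$, which is $\chi^2_{2N}$ under the null, and the inverse normal $Z = N^{-1/2}\sum_i \Phi^{-1}(p_i)$, which is $N(0,1)$ under the null. The sketch assumes independent unit-level tests (no cross-sectional dependence) and p-values in $(0, 1]$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combination(p_values):
    """Fisher's P = -2 * sum(ln p_i). Returns (P, p-value) using the
    chi-square survival function with 2N degrees of freedom, which for
    even df has the closed form exp(-x/2) * sum_{k<N} (x/2)^k / k!."""
    n = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return stat, math.exp(-half) * total


def inverse_normal_combination(p_values):
    """Z = sum(Phi^{-1}(p_i)) / sqrt(N). Large negative Z rejects the null.
    Returns (Z, left-tail p-value)."""
    std = NormalDist()
    z = sum(std.inv_cdf(p) for p in p_values) / math.sqrt(len(p_values))
    return z, std.cdf(z)


if __name__ == "__main__":
    unit_p = [0.04, 0.20, 0.01, 0.35, 0.08]  # illustrative unit-level p-values
    print(fisher_combination(unit_p))
    print(inverse_normal_combination(unit_p))
```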
REFERENCES

Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–27.
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309–321.
Ahn, S. C., & Schmidt, P. (1999a). Modified Generalized Instrumental Variables Estimation of Panel Data Models with Strictly Exogenous Instrumental Variables. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 171–198). Cambridge: Cambridge University Press.
Ahn, S. C., & Schmidt, P. (1999b). Estimation of Linear Panel Data Models Using GMM. In: Generalized Method of Moments Estimation (pp. 211–247). Cambridge: Cambridge University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental Variable Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Alvarez, J., & Arellano, M. (1997). The Time Series and Cross-section Asymptotics of Dynamic Panel Data Estimators. Working paper, CEMFI, Madrid.
Andersen, T. G., & Sørensen, B. E. (1996). GMM Estimation of a Stochastic Volatility Model: A Monte Carlo Study. Journal of Business and Economic Statistics, 14, 328–352.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics, 18, 47–82.
Andersson, J., & Lyhagen, J. (1999). A Long Memory Panel Unit Root Test: PPP Revisited. Working paper, Economics and Finance, No. 303, Stockholm School of Economics, Sweden.
Angrist, J. D., & Krueger, A. B. (1995). Split Sample Instrumental Variable Estimates of Return to Schooling. Journal of Business and Economic Statistics, 13, 225–235.
Arellano, M. (1989). A Note on the Anderson-Hsiao Estimator for Panel Data. Economics Letters, 31, 337–341.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and An Application to Employment Equations. Review of Economic Studies, 58, 277–297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variables Estimation of Error-Component Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Baltagi, B. H., & Griffin, J. M. (1995). A Dynamic Demand Model for Liquor: The Case for Pooling. Review of Economics and Statistics, 77, 545–553.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators vs. Their Heterogeneous Counterparts in the Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303–327.
Baltagi, B. H., Griffin, J. M., & Xiong, W. (2000). To Pool or Not to Pool: Homogeneous Versus Heterogeneous Estimators Applied to Cigarette Demand. Review of Economics and Statistics, 82, 117–126.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of Economics and Statistics, 61, 607–629.
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variables Estimators. Econometrica, 62, 657–682.
Bernard, A., & Jones, C. (1996). Productivity Across Industries and Countries: Time Series Theory and Evidence. Review of Economics and Statistics, 78, 135–146.
Bhargava, A., Franzini, L., & Narendranathan, W. (1982). Serial Correlation and Fixed Effects Models. Review of Economic Studies, 49, 533–549.
Binder, M., Hsiao, C., & Pesaran, M. H. (2000). Estimation and Inference in Short Panel Vector Autoregressions With Unit Roots and Cointegration. Working paper, Department of Economics, University of Maryland.
BADI H. BALTAGI & CHIHWA KAO
Blundell, R. W., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115–143. Blundell, R. W., Bond, S., & Windmeijer, F. (2000). Estimation in Dynamic Panel Data Models: Improving on the Performance of the Standard GMM Estimator. Advances in Econometrics, 15, forthcoming. Boumahdi, R., & Thomas, A. (1991). Testing for Unit Roots Using Panel Data. Economics Letters, 37, 77–79. Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variables is Weak. Journal of the American Statistical Association, 90, 443–450. Breitung, J. (2000). The Local Power of Some Unit Root Tests for Panel Data. Advances in Econometrics, 15, forthcoming. Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different Bargaining Levels Cointegrated? Applied Economics, 26, 353–361. Canzoneri, M. B., Cumby, E. E., & Diba, B. (1999). Relative Labor Productivity and the Real Exchange Rate in the Long Run: Evidence for a Panel of OECD Countries. Journal of International Economics, 47, 245–266. Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management Sciences, 19, 75–114. Chiang, M. H., & Kao, C. (2000). Nonstationary Panel Time Series Using NPT 1.0 – A User Guide. Manuscript, Center for Policy Research, Syracuse University. Choi, I. (1999a). Unit Root Tests for Panel Data. Working paper, Department of Economics, Kookmin University, Korea. Choi, I. (1999b). Asymptotic Analysis of a Nonstationary Error Component Model. Working paper, Department of Economics, Kookmin University, Korea. Choi, I. (1999c). Instrumental Variables Estimation of a Nearly Nonstationary Error Component Model. Working paper, Department of Economics, Kookmin University, Korea.
Coakley, J., & Fuertes, A. M. (1997). New Panel Unit Root Tests of PPP. Economics Letters, 57, 17–22. Coakley, J., Kulasi, F., & Smith, R. (1996). Current Account Solvency and the Feldstein-Horioka Puzzle. Economic Journal, 106, 620–627. Coe, D., & Helpman, E. (1995). International R&D Spillovers. European Economic Review, 39, 859–887. Conley, T. G. (1999). GMM Estimation with Cross Sectional Dependence. Journal of Econometrics, 92, 1–45. Crepon, B., Kramarz, F., & Trognon, A. (1997). Parameters of Interest, Nuisance Parameters and Orthogonality Conditions: An Application to Autoregressive Error Components Models. Journal of Econometrics, 82, 135–156. Culver, S. E., & Papell, D. H. (1997). Is There a Unit Root in the Inflation Rate? Evidence from Sequential Break and Panel Data Model. Journal of Applied Econometrics, 35, 155–160. Driscoll, J. C., & Kraay, A. C. (1998). Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data. Review of Economics and Statistics, 80, 549–560. Entorf, H. (1997). Random Walks with Drifts: Nonsense Regression and Spurious Fixed-Effect Estimation. Journal of Econometrics, 80, 287–296. Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37, 249–265.
Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey
Frankel, J. A., & Rose, A. K. (1996). A Panel Project on Purchasing Power Parity: Mean Reversion Within and Between Countries. Journal of International Economics, 40, 209–224. Funk, M. (1998). Trade and International R&D Spillovers Among OECD Countries. Working paper, Department of Economics, St. Louis University, St. Louis. Gerdtham, U. G., & Löthgren, M. (1998). On Stationarity and Cointegration of International Health Expenditure and GDP. Working paper, Economics and Finance, No. 232, Stockholm School of Economics, Sweden. Griliches, Z., & Mairesse, J. (1998). Production Functions: The Search for Identification. In: S. Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series, Cambridge: Cambridge University Press. Groen, J. J. J. (1999). The Monetary Exchange Rate Model as A Long-run Phenomenon. Journal of International Economics, forthcoming. Groen, J. J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of Vector Error Correction Models. Discussion paper 99–055/4, Tinbergen Institute, The Netherlands. Hadri, K. (1999). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root in Panel Data with Serially Correlated Errors. Manuscript, Department of Economics and Accounting, University of Liverpool, United Kingdom. Hahn, J. (1997). Efficient Estimation of Panel Data Models With Sequential Moment Restrictions. Journal of Econometrics, 79, 1–21. Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed Effects? Journal of Econometrics, 93, 309–326. Hall, S., Lazarova, S., & Urga, G. (1999). A Principal Components Analysis of Common Stochastic Trends in Heterogeneous Panel Data: Some Monte Carlo Evidence. Oxford Bulletin of Economics and Statistics, 61, 749–767. Harris, D., & Inder, B. (1994). A Test of the Null Hypothesis of Cointegration. In: C. P. Hargreaves (Ed.), Nonstationary Time Series Analysis and Cointegration. 
New York: Oxford University Press. Harris, R. D. F., & Tzavalis, E. (1999). Inference for Unit Roots in Dynamic Panels Where the Time Dimension is Fixed. Journal of Econometrics, 91, 201–226. Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects. Econometrica, 49, 1377–1398. Hillier, G. H. (1990). On the Normalization of Structural Equations: Properties of Direction Estimators. Econometrica, 58, 1181–1194. Holtz-Eakin, D. (1988). Testing for Individual Effects in Autoregressive Models. Journal of Econometrics, 39, 297–307. Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press. Hsiao, C., Pesaran, M. H., & Tahmiscioglu, K. (1999). Bayes Estimation of Short-run Coefficients in Dynamic Panel Data Models. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 268–296). Cambridge: Cambridge University Press. Im, K. S., Ahn, S. C., Schmidt, P., & Wooldridge, J. M. (1999). Efficient Estimation of Panel Data Models with Strictly Exogenous Explanatory Variables. Journal of Econometrics, 93, 177–201. Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Manuscript, Department of Applied Economics, University of Cambridge, United Kingdom.
Imbens, G. (1997). One-Step Estimators for Over-identified Generalized Method of Moments Models. Review of Economic Studies, 64, 359–383. Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, 110, 1127–1170. Jimenez-Martin, S. (1998). On the Testing of Heterogeneity Effects in Dynamic Unbalanced Panel Data Models. Economics Letters, 58, 157–163. Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press. Jorion, P., & Sweeney, R. (1996). Mean Reversion in Real Exchange Rates: Evidence and Implications for Forecasting. Journal of International Money and Finance, 15, 535–550. Judson, R. A., & Owen, A. L. (1999). Estimating Dynamic Panel Data Models: A Guide for Macroeconomists. Economics Letters, 65, 9–15. Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44. Kao, C., & Chiang, M. H. (2000). On the Estimation and Inference of a Cointegrated Regression in Panel Data. Advances in Econometrics, 15, forthcoming. Kao, C., & Chen, B. (1995). On the Estimation and Inference for Cointegration in Panel Data when the Cross-Section and Time-Series Dimensions are Comparable. Manuscript, Center for Policy Research, Syracuse University. Kao, C., Chiang, M. H., & Chen, B. (1999). International R&D Spillovers: An Application of Estimation and Inference in Panel Cointegration. Oxford Bulletin of Economics and Statistics, 61, 691–709. Karlsson, S., & Löthgren, M. (1999). On the Power and Interpretation of Panel Unit Root Tests. Working paper, Economics and Finance, No. 299, Stockholm School of Economics, Sweden. Kauppi, H. (2000). Panel Data Limit Theory and Asymptotic Analysis of a Panel Regression with Near Integrated Regressors. Advances in Econometrics, 15, forthcoming. Keane, M. P., & Runkle, D. E. (1992).
On the Estimation of Panel-data Models with Serial Correlation When Instruments are Not Strictly Exogenous. Journal of Business and Economic Statistics, 10, 1–9. Kiviet, J. F. (1995). On Bias, Inconsistency and Efficiency of Some Estimators in Dynamic Panel Data Models. Journal of Econometrics, 68, 53–78. Kiviet, J. F. (1999). Expectations of Expansions for Estimators in a Dynamic Panel Data Model: Some Results for Weakly Exogenous Regressors. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 199–225). Cambridge: Cambridge University Press. Larsson, R., Lyhagen, J., & Löthgren, M. (1998). Likelihood-Based Cointegration Tests In Heterogeneous Panels. Working paper, Economics and Finance, No. 250, Stockholm School of Economics, Sweden. Lee, K., Pesaran, M. H., & Smith, R. (1997). Growth and Convergence in a Multi-Country Empirical Stochastic Solow Model. Journal of Applied Econometrics, 12, 357–392. Levin, A., & Lin, C. F. (1992). Unit Root Test in Panel Data: Asymptotic and Finite Sample Properties. Discussion paper No. 92–93, University of California at San Diego. Lothian, J. R. (1996). Multi-Country Evidence on the Behavior of Purchasing Power Parity Under the Current Float. Journal of International Money and Finance, 16, 19–35. MacDonald, R. (1996). Panel Unit Root Tests and Real Exchange Rates. Economics Letters, 50, 7–11.
Maddala, G. S. (1999). On the Use of Panel Data Methods with Cross Country Data. Annales d’Economie et de Statistique, 55–56, 429–448. Maddala, G. S., Srivastava, V. K., & Li, H. (1994). Shrinkage Estimators for the Estimation of Short-run and Long-run Parameters From Panel Data Models. Working paper, Ohio State University, Ohio. Maddala, G. S., Trost, R. P., Li, H., & Joutz, F. (1997). Estimation of Short-run and Long-run Elasticities of Energy Demand from Panel Data Using Shrinkage Estimators. Journal of Business and Economic Statistics, 15, 90–100. Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and A New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631–652. Maddala, G. S., Wu, S., & Liu, P. (2000). Do Panel Data Rescue Purchasing Power Parity (PPP) Theory? In: J. Krishnakumar & E. Ronchetti (Eds.), Panel Data Econometrics: Future Directions (pp. 35–51). Amsterdam: North-Holland. Mátyás, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: A Handbook of Theory and Applications. Dordrecht: Kluwer Academic Publishers. McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57–84. McCoskey, S., & Kao, C. (1999a). Testing the Stability of a Production Function with Urbanization as a Shift Factor: An Application of Non-Stationary Panel Data Techniques. Oxford Bulletin of Economics and Statistics, 61, 671–690. McCoskey, S., & Kao, C. (1999b). Comparing Panel Data Cointegration Tests with an Application of the Twin Deficits Problems. Working paper, Center for Policy Research, Syracuse University, New York. McCoskey, S., & Selden, T. (1998). Health Care Expenditures and GDP: Panel Data Unit Root Test Results. Journal of Health Economics, 17, 369–376. Metcalf, G. E. (1996). Specification Testing in Panel Data with Instrumental Variables. Journal of Econometrics, 71, 291–307. Moon, H. R., & Phillips, P. C. B. (1998). 
A Reinterpretation of the Feldstein-Horioka Regressions from a Nonstationary Panel Viewpoint. Working paper, Department of Economics, Yale University. Moon, H. R., & Phillips, P. C. B. (1999). Maximum Likelihood Estimation in Panels with Incidental Trends. Oxford Bulletin of Economics and Statistics, 61, 711–747. Nelson, C., & Startz, R. (1990). The Distribution of the Instrumental Variables Estimator and Its t-ratio When the Instrument Is A Poor One. Journal of Business, 63, S125–S140. Nerlove, M. (1999). Properties of Alternative Estimators of Dynamic Panel Models: An Empirical Analysis of Cross-country Data for the Study of Economic Growth. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 136–170). Cambridge: Cambridge University Press. Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426. O’Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of International Economics, 44, 1–19. Oh, K. Y. (1996). Purchasing Power Parity and Unit Root Tests Using Panel Data. Journal of International Money and Finance, 15, 405–418. Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float. Journal of International Economics, 43, 313–332. Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests with an Application to the PPP Hypothesis. Working paper, Department of Economics, Indiana University. Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power Parity in Panels. Working paper, Department of Economics, Indiana University. Pedroni, P. (1999). Critical Values for Cointegration Tests in Heterogeneous Panels with Multiple Regressors. Oxford Bulletin of Economics and Statistics, 61, 653–678. Pedroni, P. (2000). Testing for Convergence to Common Steady States in Nonstationary Heterogeneous Panels. Working paper, Department of Economics, Indiana University. Pesaran, M. H., & Smith, R. (1995). Estimating Long-run Relationships From Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–113. Pesaran, M. H., Shin, Y., & Smith, R. (1999). Pooled Mean Group Estimation of Dynamic Heterogeneous Panels. Journal of the American Statistical Association, 94, 621–634. Pesaran, M. H., & Zhao, Z. (1999). Bias Reduction in Estimating Long-run Relationships From Dynamic Heterogeneous Panels. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 297–322). Cambridge: Cambridge University Press. Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99–125. Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057–1111. Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some Recent Developments. Econometric Reviews, forthcoming. Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for Cointegration. Econometrica, 58, 165–193. Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data.
Economics Letters, 44, 9–19. Quah, D. (1996). Empirics for Economic Growth and Convergence. European Economic Review, 40, 1353–1375. Robertson, D., & Symons, J. (1992). Some Strange Properties of Panel Data Estimators. Journal of Applied Econometrics, 7, 175–189. Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions. Econometric Theory, 58, 1–21. Sala-i-Martin, X. (1996). The Classical Approach to Convergence Analysis. Economic Journal, 106, 1019–1036. Schmidt, P., Ahn, S. C. & Wyhowski, D. (1992). Comment. Journal of Business and Economic Statistics, 10, 10–14. Shin, Y. (1994). A Residual Based Test of the Null of Cointegration Against the Alternative of No Cointegration. Econometric Theory, 10, 91–115. Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression With Weak Instruments. Econometrica, 65, 557–586. Stock, J. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems. Econometrica, 61, 783–820. Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems. Econometrica, 61, 783–820. Tauchen, G. (1986). Statistical Properties of Generalized Method of Moments Estimators of Structural Parameters Obtained From Financial Market Data. Journal of Business and Economic Statistics, 4, 397–416.
Wansbeek, T. J., & Bekker, P. (1996). On IV, GMM and ML in a Dynamic Panel Data Model. Economics Letters, 51, 145–152. Wansbeek, T. J., & Knaap, T. (1999). Estimating a Dynamic Panel Data Model with Heterogenous Trends. Annales d’Economie et de Statistique, 55–56, 331–349. Wooldridge, J. M. (1997). Multiplicative Panel Data Models Without the Strict Exogeneity Assumption. Econometric Theory, 13, 667–678. Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo Study. Working paper, Department of Economics, State University of New York at Buffalo, New York. Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel Data Set. Journal of Money, Credit and Banking, 28, 54–63. Ziliak, J. P. (1997). Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-condition Estimators. Journal of Business and Economic Statistics, 15, 419–431.
ESTIMATION IN DYNAMIC PANEL DATA MODELS: IMPROVING ON THE PERFORMANCE OF THE STANDARD GMM ESTIMATOR
Richard Blundell, Stephen Bond and Frank Windmeijer
ABSTRACT
This chapter reviews developments to improve on the poor performance of the standard GMM estimator for highly autoregressive panel series. It considers the use of the ‘system’ GMM estimator, which relies on relatively mild restrictions on the initial conditions process. This system GMM estimator encompasses the GMM estimator based on the non-linear moment conditions available in the dynamic error components model and has substantial asymptotic efficiency gains. Simulations that include weakly exogenous covariates find large finite sample biases and very low precision for the standard first-differenced estimator. The system GMM estimator not only greatly improves the precision but also greatly reduces the finite sample bias. An application to panel production function data for the U.S. confirms these theoretical and experimental findings.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 53–91. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2
1. INTRODUCTION
Much of the recent literature on dynamic panel data estimation has focused on providing optimal linear Generalised Method of Moments (GMM) estimators
under relatively weak auxiliary assumptions about the exogeneity of the covariate processes and the properties of the heterogeneity and error term processes. A standard approach is to first-difference the equation to remove permanent unobserved heterogeneity, and to use lagged levels of the series as instruments for the predetermined and endogenous variables in first-differences (see Anderson & Hsiao (1981), Holtz-Eakin, Newey & Rosen (1988) and Arellano & Bond (1991)). However, in dynamic panel data models where the series are highly autoregressive and the number of time series observations is moderately small, this standard GMM estimator has been found to have large finite sample bias and poor precision in simulation studies (see the experimental evidence and theoretical discussions in Ahn & Schmidt (1995) and Alonso-Borrego & Arellano (1999), for example). The poor performance of the standard GMM panel data estimator is also reflected in empirical experience with estimation on relatively short panels with highly persistent data. To quote from the extensive review of production function estimation by Griliches & Mairesse (1998) – one of the original applications for panel data estimation – “In empirical practice, the application of panel methods to micro-data produced rather unsatisfactory results: low and often insignificant capital coefficients and unreasonably low estimates of returns to scale.” One simple explanation of these findings in the production function context is that lagged levels of the series provide weak instruments for first-differenced variables in this case (see Blundell & Bond (2000)). One response to these findings has been to consider the use of further moment conditions that have improved properties for the estimates of the parameters of interest. 
For example, Ahn & Schmidt (1995) consider the nonlinear moment conditions implied by the standard error components formulation and show that asymptotic variance ratios can be considerably improved. Blundell & Bond (1998) consider alternative estimators that require further restrictions on the initial conditions process, designed to improve the properties of the standard first-differenced instrumental variables estimator. This also provides the motivation for the discussion in this chapter. The idea is to consider the performance of a ‘system’ GMM estimator that relies on relatively mild restrictions on the initial conditions process to improve the performance of the GMM estimator in the dynamic panel data context. The material presented draws extensively from the existing literature. For example, Arellano & Bover (1995) and Blundell & Bond (1998) show that mean stationarity in an AR(1) panel data model is sufficient to justify the use of lagged differences of the dependent variable as instruments for equations in levels, in addition to lagged levels as instruments for equations in first-differences. This result naturally extends to models with weakly exogenous
covariates. The Monte Carlo simulations and asymptotic variance calculations reported in this chapter show that this extended GMM estimator can offer considerable efficiency gains in situations where the standard first-differenced GMM estimator performs poorly. Given this restriction on the initial conditions, the system GMM estimator is also shown to encompass the GMM estimator based on the non-linear moment conditions available in the dynamic error components model (see Ahn & Schmidt (1995)). The system GMM estimator has substantial asymptotic efficiency gains relative to this nonlinear GMM estimator, and these are reflected in their finite sample properties. The chapter is organised in the following way. The next section reviews the standard error components structure for a linear dynamic panel data model and lays out the underlying assumptions. Recalling that Within Groups, GLS and OLS on the levels and first-differenced models all suffer from bias even when the cross-section dimension is large, this section also briefly considers the biases that occur for standard panel data estimators in dynamic models. Section 3 then presents the linear GMM estimator for this model that uses lagged information to instrument current differences in a first-differenced specification. The following section outlines the problem of weak instruments in this case. Following the discussion in Ahn & Schmidt (1995), Section 5 considers the use of further non-linear moment conditions that are implied by the model outlined in Section 2. Section 6 derives a linear moment restriction for the levels model using initial condition restrictions, and this is then incorporated into the full system GMM estimator. Asymptotic variance comparisons among these various GMM estimators are given in Section 8. The detailed discussion in these earlier sections uses an AR(1) model, and the extension to a multivariate setting is presented in Section 9.
Finally, before moving to the Monte Carlo results and the empirical application, over-identification tests are reviewed. The Monte Carlo results presented in Section 11 are the first in the literature to consider the properties of these GMM estimators in dynamic models with weakly exogenous regressors. As this is perhaps the most common case in empirical applications, these results have an important bearing on applied work. The analysis finds both a large bias and very low precision for the standard first-differenced estimator when the individual series are highly autoregressive. The use of the system GMM estimator not only greatly improves the precision but also greatly reduces the finite sample bias. Exploiting the non-linear moment conditions also provides significant gains compared to the standard first-differenced GMM estimator, but these gains are much less dramatic than
those provided by the system GMM estimator when the initial conditions restriction is valid. The empirical application returns to the Griliches and Mairesse discussion. The application uses production function data for the U.S. and confirms the Griliches and Mairesse findings for the capital and labor coefficients in a Cobb-Douglas model. Using the standard first-differenced GMM estimator, the estimated coefficient on capital is very low and all coefficient estimates have poor precision. Constant returns to scale is easily rejected. Moreover, an examination of the individual series suggests that they are highly autoregressive, thus hinting at a weak instruments problem for standard GMM on this data. These production function results are improved by using the system estimator: the capital coefficient is now more precise and takes a reasonable value, and constant returns to scale is not rejected. These Monte Carlo and empirical results indicate that a careful examination of the original series and use of the system GMM estimator can overcome many of the disappointing features of the standard GMM estimator in the context of highly persistent series.
2. DYNAMIC MODELS AND THE BIASES FROM STANDARD PANEL DATA ESTIMATORS
To analyse the properties of estimators of the parameters in linear dynamic panel data models we consider an autoregressive panel data model of the form
$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + u_{it} \tag{2.1}$$
$$u_{it} = \eta_i + v_{it} \tag{2.2}$$
for $i = 1, \ldots, N$ and $t = 2, \ldots, T$, where $\eta_i + v_{it}$ is the usual ‘error components’ decomposition of the error term; $N$ is large, $T$ is fixed and $|\alpha| < 1$.¹ This model specification is sufficient to cover most of the standard cases encountered in linear dynamic panel applications. Allowing the inclusion of $x_{i,t-1}$ provides the autoregressive panel data model
$$y_{it} = \alpha y_{i,t-1} + \beta_1 x_{it} + \beta_2 x_{i,t-1} + \eta_i + v_{it},$$
which has the corresponding ‘common factor’ restricted ($\beta_2 = -\alpha\beta_1$) form
$$y_{it} = \beta_1 x_{it} + f_i + \xi_{it}, \quad \text{with } \xi_{it} = \alpha\xi_{i,t-1} + v_{it} \text{ and } \eta_i = (1-\alpha)f_i.$$
In our Monte Carlo study and application to panel data production function equations presented in Sections 11 and 12 we allow for the inclusion of $x_{it}$ regressors, but for the evaluation of the various estimators we use an AR(1) model with unobserved individual-specific effects
$$y_{it} = \alpha y_{i,t-1} + u_{it} \tag{2.3}$$
$$u_{it} = \eta_i + v_{it}$$
for $i = 1, \ldots, N$ and $t = 2, \ldots, T$.² At the outset we will assume that $\eta_i$ and $v_{it}$ have the familiar error components structure in which
$$E(\eta_i) = 0, \quad E(v_{it}) = 0, \quad E(v_{it}\eta_i) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T \tag{2.4}$$
and
$$E(v_{it}v_{is}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t \neq s. \tag{2.5}$$
In addition there is the standard assumption concerning the initial conditions $y_{i1}$ (see Ahn & Schmidt (1995), for example):
$$E(y_{i1}v_{it}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T. \tag{2.6}$$
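The equivalence between the unrestricted dynamic model with $x_{i,t-1}$ and its ‘common factor’ form can be checked numerically. The following Python sketch is only an illustration under assumed settings (standard normal $x_{it}$, $f_i$ and $v_{it}$, arbitrary parameter values, and a hypothetical helper name `common_factor_max_residual`): it simulates the restricted form and measures how far the dynamic equation with $\beta_2 = -\alpha\beta_1$ and $\eta_i = (1-\alpha)f_i$ is from holding exactly.

```python
import random

def common_factor_max_residual(alpha=0.6, beta1=0.8, N=5, T=8, seed=1):
    """Simulate y_it = beta1*x_it + f_i + xi_it with AR(1) errors xi_it and
    return the largest |residual| of the dynamic form with
    beta2 = -alpha*beta1 and eta_i = (1 - alpha)*f_i."""
    rng = random.Random(seed)
    beta2 = -alpha * beta1                 # common factor restriction
    worst = 0.0
    for _ in range(N):
        f = rng.gauss(0.0, 1.0)            # effect in the static form
        eta = (1.0 - alpha) * f            # implied effect in the dynamic form
        x = [rng.gauss(0.0, 1.0) for _ in range(T)]
        v = [rng.gauss(0.0, 1.0) for _ in range(T)]
        xi = [v[0]]                        # arbitrary starting value
        for t in range(1, T):
            xi.append(alpha * xi[t - 1] + v[t])
        y = [beta1 * x[t] + f + xi[t] for t in range(T)]
        for t in range(1, T):
            resid = (y[t] - alpha * y[t - 1] - beta1 * x[t]
                     - beta2 * x[t - 1] - eta - v[t])
            worst = max(worst, abs(resid))
    return worst

print(common_factor_max_residual())  # tiny: floating-point rounding error only
```

Subtracting $\alpha y_{i,t-1}$ from the common-factor form reproduces the dynamic equation term by term, so any nonzero output reflects rounding alone.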
These ‘standard assumptions’ (2.4), (2.5) and (2.6) imply moment restrictions that are sufficient to (identify and) estimate $\alpha$ for $T \geq 3$.³ Further restrictions on the initial conditions define a mean stationary process as
$$y_{i1} = \frac{\eta_i}{1-\alpha} + \varepsilon_{i1} \quad \text{for } i = 1, \ldots, N \tag{2.7}$$
and
$$E(\varepsilon_{i1}) = E(\eta_i\varepsilon_{i1}) = 0 \quad \text{for } i = 1, \ldots, N, \tag{2.8}$$
and a covariance stationary process by further specifying
$$E(v_{it}^2) = \sigma_v^2 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T,$$
$$E(\varepsilon_{i1}^2) = \frac{\sigma_v^2}{1-\alpha^2} \quad \text{for } i = 1, \ldots, N.$$
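A minimal simulation sketch of the mean- and covariance-stationary process defined by (2.3), (2.7) and (2.8) may help make these restrictions concrete. It assumes normal draws for $\eta_i$, $\varepsilon_{i1}$ and $v_{it}$ (the text only restricts moments), and `simulate_stationary_panel` is a hypothetical helper name. Under these restrictions $\operatorname{var}(y_{it}) = \sigma_\eta^2/(1-\alpha)^2 + \sigma_v^2/(1-\alpha^2)$ in every period, so the first-period and last-period sample variances should both be close to that value.

```python
import random
import statistics

def simulate_stationary_panel(N, T, alpha, sigma_eta=1.0, sigma_v=1.0, seed=0):
    """Return N lists [y_i1, ..., y_iT] from model (2.3) with the mean- and
    covariance-stationary initial condition (2.7)-(2.8). Normal draws are
    an assumption of this sketch; the text only restricts moments."""
    rng = random.Random(seed)
    sd_eps1 = sigma_v / (1.0 - alpha ** 2) ** 0.5   # sd of epsilon_i1
    panel = []
    for _ in range(N):
        eta = rng.gauss(0.0, sigma_eta)
        y = [eta / (1.0 - alpha) + rng.gauss(0.0, sd_eps1)]          # (2.7)
        for _ in range(1, T):
            y.append(alpha * y[-1] + eta + rng.gauss(0.0, sigma_v))  # (2.3)
        panel.append(y)
    return panel

alpha = 0.5
panel = simulate_stationary_panel(N=20000, T=5, alpha=alpha, seed=3)
theory = 1.0 / (1 - alpha) ** 2 + 1.0 / (1 - alpha ** 2)
print(theory,
      statistics.pvariance([y[0] for y in panel]),    # period 1
      statistics.pvariance([y[-1] for y in panel]))   # period T
```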
For completeness, and to conclude this brief outline of the dynamic error components model, we consider the biases of the standard panel data estimators in this model. We consider here the biases found under covariance stationarity (for more details see Baltagi (1995) and Hsiao (1986)). The asymptotic bias of the simple OLS estimator for $\alpha$ in model (2.3) is given by
$$\operatorname{plim}(\hat\alpha_{OLS}) = \alpha + (1-\alpha)\frac{\sigma_\eta^2/\sigma_v^2}{\sigma_\eta^2/\sigma_v^2 + k}, \quad \text{with } k = \frac{1-\alpha}{1+\alpha},$$
where $\sigma_\eta^2 = E(\eta_i^2)$. Whenever $\sigma_\eta^2 > 0$ the OLS estimator is therefore biased upwards, with $\alpha < \operatorname{plim}(\hat\alpha_{OLS}) < 1$.
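The OLS probability limit above is easy to tabulate. This is a small sketch, not part of the original analysis; `plim_ols_levels` is a hypothetical name and `var_ratio` stands for $\sigma_\eta^2/\sigma_v^2$.

```python
def plim_ols_levels(alpha, var_ratio):
    """Probability limit of OLS on the levels AR(1) model (2.3) under
    covariance stationarity; var_ratio = sigma_eta^2 / sigma_v^2."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return alpha + (1.0 - alpha) * var_ratio / (var_ratio + k)

for alpha in (0.0, 0.5, 0.9):
    # with var_ratio = 1 the plim lies strictly between alpha and 1
    print(alpha, plim_ols_levels(alpha, 1.0))
```

Setting `var_ratio = 0` (no individual effects) returns `alpha` itself, which is a quick sanity check that the bias comes entirely from the correlation between $y_{i,t-1}$ and $\eta_i$.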
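The asymptotic bias of the Within Groups estimator for $\alpha$ has been documented by Nickell (1981) and is given by
$$\operatorname{plim}(\hat\alpha_{WG}) = \alpha - \frac{\dfrac{1+\alpha}{T-1}\left(1 - \dfrac{1-\alpha^T}{T(1-\alpha)}\right)}{1 - \dfrac{2\alpha}{(1-\alpha)(T-1)}\left(1 - \dfrac{1-\alpha^T}{T(1-\alpha)}\right)},$$
and so, when $\alpha > 0$, $\operatorname{plim}(\hat\alpha_{WG}) < \alpha$. When the model is transformed into first-differences to eliminate the unobserved individual heterogeneity component $\eta_i$,
$$\Delta y_{it} = \alpha\Delta y_{i,t-1} + \Delta u_{it},$$
the asymptotic bias of the OLS estimator is given by
$$\operatorname{plim}(\hat\alpha_{OLSd}) = \alpha - \frac{1+\alpha}{2},$$
and so $\operatorname{plim}(\hat\alpha_{OLSd}) = (\alpha - 1)/2 < 0$.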
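The two probability limits above can be coded directly, which makes it easy to see how slowly the Within Groups bias vanishes as $T$ grows and how severe the first-differenced OLS bias is. This is a sketch that transcribes the formulas as written; the function names are hypothetical and $\alpha = 1$ is excluded.

```python
def plim_within_groups(alpha, T):
    """Nickell (1981) probability limit of the Within Groups estimator,
    transcribed from the formula in the text (requires alpha != 1)."""
    g = 1.0 - (1.0 - alpha ** T) / (T * (1.0 - alpha))
    num = (1.0 + alpha) / (T - 1) * g
    den = 1.0 - 2.0 * alpha / ((1.0 - alpha) * (T - 1)) * g
    return alpha - num / den

def plim_ols_first_differences(alpha):
    """Probability limit of OLS on the first-differenced model."""
    return alpha - (1.0 + alpha) / 2.0

for T in (5, 10, 50):
    print(T, plim_within_groups(0.5, T))
print(plim_ols_first_differences(0.5))   # equals (alpha - 1) / 2
```

For $\alpha = 0$ the Within Groups expression collapses to $-1/T$, the familiar textbook special case.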
3. A FIRST-DIFFERENCED GMM ESTIMATOR
3.1. The Standard Moment Conditions
In the absence of any further restrictions on the process generating the initial conditions, the autoregressive error components model (2.3)–(2.6) implies the following $m_d = 0.5(T-1)(T-2)$ orthogonality conditions, which are linear in the parameter $\alpha$:
$$E(y_{i,t-s}\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 2 \leq s \leq t-1, \tag{3.1}$$
where $\Delta u_{it} = u_{it} - u_{i,t-1}$. These depend only on the assumed absence of serial correlation in the time-varying disturbances $v_{it}$, together with the restriction (2.6). The moment restrictions in (3.1) can be expressed more compactly as $E(Z_{di}'\Delta u_i) = 0$, where $Z_{di}$ is the $(T-2) \times m_d$ matrix given by
$$Z_{di} = \begin{pmatrix} y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{i,T-2} \end{pmatrix}$$
and $\Delta u_i$ is the $(T-2)$ vector $(\Delta u_{i3}, \Delta u_{i4}, \ldots, \Delta u_{iT})'$.
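The block-diagonal layout of $Z_{di}$ can be built mechanically from one individual's series. The sketch below is illustrative only: `first_diff_instruments` is a hypothetical name, and Python index 0 holds period 1, so $y_{i,t-s}$ sits at index $t-s-1$.

```python
def first_diff_instruments(y):
    """Build the (T-2) x m_d instrument matrix Z_di of (3.1) for one
    individual. y = [y_i1, ..., y_iT] (Python index 0 is period 1)."""
    T = len(y)
    m_d = (T - 1) * (T - 2) // 2
    Z = [[0.0] * m_d for _ in range(T - 2)]
    col = 0
    for t in range(3, T + 1):            # equation for Delta u_it
        row = t - 3
        for s in range(t - 1, 1, -1):    # lags y_i1, ..., y_{i,t-2}
            Z[row][col] = y[t - s - 1]
            col += 1
    return Z

for row in first_diff_instruments([1.0, 2.0, 3.0, 4.0]):
    print(row)   # T = 4: two rows and m_d = 3 columns
```

Each new period adds one more valid lagged level, which is why the number of instruments grows quadratically in $T$.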
The Generalised Method of Moments (GMM) estimator based on these moment conditions minimises the quadratic distance $\Delta u' Z_d W_N Z_d' \Delta u$ for some metric $W_N$, where $Z_d'$ is the $m_d \times N(T-2)$ matrix $(Z_{d1}', Z_{d2}', \ldots, Z_{dN}')$ and $\Delta u$ is the $N(T-2)$ vector $(\Delta u_1', \Delta u_2', \ldots, \Delta u_N')'$. This gives the GMM estimator for $\alpha$ as
$$\hat\alpha_d = (\Delta y_{-1}' Z_d W_N Z_d' \Delta y_{-1})^{-1} \Delta y_{-1}' Z_d W_N Z_d' \Delta y,$$
where $\Delta y_i$ is the $(T-2)$ vector $(\Delta y_{i3}, \Delta y_{i4}, \ldots, \Delta y_{iT})'$, $\Delta y_{i,-1}$ is the $(T-2)$ vector $(\Delta y_{i2}, \Delta y_{i3}, \ldots, \Delta y_{i,T-1})'$, and $\Delta y$ and $\Delta y_{-1}$ are stacked across individuals in the same way as $\Delta u$. Alternative choices for the weights $W_N$ give rise to a set of GMM estimators based on the moment conditions in (3.1), all of which are consistent for large $N$ and finite $T$, but which differ in their asymptotic efficiency.⁴ In general the optimal weights are given by
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{di}'\Delta\hat u_i\Delta\hat u_i' Z_{di}\right)^{-1}, \tag{3.2}$$
where $\Delta\hat u_i$ are residuals from an initial consistent estimator. We refer to this as the two-step GMM estimator.⁵ In the absence of any additional knowledge about the process for the initial conditions, this estimator is asymptotically efficient in the class of estimators based on the linear moment conditions (3.1) (see Hansen (1982) and Chamberlain (1987)).
3.2. Homoskedasticity
Ahn & Schmidt (1995) show that additional linear moment conditions are available if the $v_{it}$ disturbances are homoskedastic through time, i.e. if
$$E(v_{it}^2) = \sigma_i^2 \quad \text{for } t = 2, \ldots, T. \tag{3.3}$$
This implies $T-3$ orthogonality restrictions of the form
$$E(y_{i,t-2}\Delta u_{i,t-1} - y_{i,t-1}\Delta u_{it}) = 0 \quad \text{for } t = 4, \ldots, T, \tag{3.4}$$
and allows a further $T-3$ columns to be added to the instrument matrix $Z_{di}$. The additional columns $Z_{hi}$ are
$$Z_{hi} = \begin{pmatrix} y_{i2} & 0 & \cdots & 0 \\ -y_{i3} & y_{i3} & \cdots & 0 \\ 0 & -y_{i4} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & y_{i,T-2} \\ 0 & 0 & \cdots & -y_{i,T-1} \end{pmatrix}$$
RICHARD BLUNDELL, STEPHEN BOND & FRANK WINDMEIJER
Calculation of the one-step and two-step GMM estimators then proceeds exactly as described above.
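As a rough illustration of Section 3, the first-differenced estimator can be sketched in NumPy. This is a minimal sketch, not the authors' code: the simulated design, the helper names (`simulate_ar1_panel`, `z_diff`, `dif_gmm`) and the one-step weight matrix built from the usual Arellano–Bond matrix $H$ (2 on the diagonal, −1 on the first off-diagonals) are our assumptions, while the two-step weights follow (3.2). Because $\alpha$ is a scalar, the GMM formula for $\hat\alpha_d$ reduces to a ratio of two quadratic forms.

```python
import numpy as np


def simulate_ar1_panel(n, T, alpha, sigma_eta=1.0, sigma_v=1.0, seed=0):
    """y_it = alpha*y_i,t-1 + eta_i + v_it, with y_i1 drawn from the
    covariance-stationary distribution (requires |alpha| < 1)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, n)
    y = np.empty((n, T))
    y[:, 0] = eta / (1 - alpha) + rng.normal(0.0, sigma_v / np.sqrt(1 - alpha**2), n)
    for t in range(1, T):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(0.0, sigma_v, n)
    return y


def z_diff(y_i):
    """Z_di of (3.1): the row for period t = 3..T holds y_i1, ..., y_i,t-2."""
    T = len(y_i)
    Z = np.zeros((T - 2, (T - 1) * (T - 2) // 2))
    col = 0
    for row, t in enumerate(range(3, T + 1)):
        Z[row, col:col + t - 2] = y_i[:t - 2]
        col += t - 2
    return Z


def dif_gmm(y):
    """One-step and two-step first-differenced GMM estimates of alpha."""
    n, T = y.shape
    dy = np.diff(y, axis=1)                  # columns: Δy_i2, ..., Δy_iT
    dep, lag = dy[:, 1:], dy[:, :-1]         # Δy_it and Δy_i,t-1 for t = 3..T
    Zs = [z_diff(y[i]) for i in range(n)]
    zx = sum(Z.T @ lag[i] for i, Z in enumerate(Zs))   # Z_d' Δy_{-1}
    zy = sum(Z.T @ dep[i] for i, Z in enumerate(Zs))   # Z_d' Δy

    def estimate(W):
        return (zx @ W @ zy) / (zx @ W @ zx)

    # One-step weights (assumed Arellano-Bond H matrix).
    H = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
    a1 = estimate(np.linalg.inv(sum(Z.T @ H @ Z for Z in Zs) / n))
    # Two-step weights (3.2), built from the one-step residuals.
    g = [Z.T @ (dep[i] - a1 * lag[i]) for i, Z in enumerate(Zs)]
    W2 = np.linalg.inv(sum(np.outer(gi, gi) for gi in g) / n)
    return a1, estimate(W2)


y = simulate_ar1_panel(n=2000, T=5, alpha=0.5)
a1, a2 = dif_gmm(y)
print(f"one-step {a1:.3f}, two-step {a2:.3f}")
```

With $\alpha = 0.5$ and $\sigma_\eta^2 = \sigma_v^2$ the instruments are informative, so both estimates should land near 0.5; rerunning with `alpha=0.95` is a quick way to see the weak-instrument problem discussed in Section 4.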
4. WEAK INSTRUMENTS

The instruments used in the standard first-differenced GMM estimator become less informative in two important cases: first, as the value of the autoregressive parameter $\alpha$ increases towards unity; and second, as the variance of the individual effects $\eta_i$ increases relative to the variance of $v_{it}$. To examine this further, consider the case with $T = 3$. In this case the moment conditions corresponding to the standard GMM estimator reduce to a single orthogonality condition, and the corresponding method of moments estimator reduces to a simple two-stage least squares (2SLS) estimator, with first-stage (instrumental variable) regression

$$\Delta y_{i2} = \pi_d\, y_{i1} + r_i \quad \text{for } i = 1, \dots, N.$$

For sufficiently high values of the autoregressive parameter $\alpha$, or of the relative variance of the individual effects, the least squares estimate of the reduced-form coefficient $\pi_d$ can be made arbitrarily close to zero. In this case the instrument $y_{i1}$ is only weakly correlated with $\Delta y_{i2}$. To see this, notice that the model (2.3) implies that

$$\Delta y_{i2} = (\alpha - 1)\, y_{i1} + \eta_i + v_{i2} \quad \text{for } i = 1, \dots, N. \qquad (4.1)$$
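The least squares estimator of $(\alpha - 1)$ in (4.1) is generally biased upwards, towards zero, since we expect $E(y_{i1}\eta_i) > 0$. Assuming covariance stationarity, and letting $\sigma_\eta^2 = \operatorname{var}(\eta_i)$ and $\sigma_v^2 = \operatorname{var}(v_{it})$, the plim of $\hat\pi_d$ is given by

$$\operatorname{plim}\hat\pi_d = (\alpha - 1)\,\frac{k}{\sigma_\eta^2/\sigma_v^2 + k}, \qquad k = \frac{1-\alpha}{1+\alpha}. \qquad (4.2)$$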
The bias term effectively scales the estimated coefficient on the instrumental variable $y_{i1}$ towards zero. We find that $\operatorname{plim}\hat\pi_d \to 0$ as $\alpha \to 1$ or as $\sigma_\eta^2/\sigma_v^2 \to \infty$, which are the cases in which the first-stage F-statistic is $O_p(1)$. A graph showing both $\operatorname{plim}\hat\pi_d$ and $\alpha - 1$ against $\alpha$ is given in Fig. 1, for $\sigma_\eta^2 = \sigma_v^2$, $T = 3$. We are interested in inference using this first-differenced instrumental variable (IV) estimator when $\pi_d$ is local to zero, that is, where the instrument $y_{i1}$ is only weakly correlated with $\Delta y_{i2}$. Following Nelson & Startz (1990a, b) and Staiger & Stock (1997), we characterise this problem of weak instruments using the concentration parameter. First note that the F-statistic for the first-stage instrumental variable regression converges to a noncentral chi-squared with one degree of freedom. The concentration parameter is then the corresponding noncentrality parameter, which we label $\tau$ in this case. The IV estimator performs poorly when $\tau$ approaches zero. Assuming covariance stationarity, $\tau$ has the following simple characterisation in terms of the parameters of the AR model:

$$\tau = \frac{(\sigma_v^2 k)^2}{\sigma_\eta^2 + \sigma_v^2 k}, \qquad k = \frac{1-\alpha}{1+\alpha}.$$

The performance of the standard first-differenced GMM estimator in this AR(1) specification can therefore be seen to deteriorate as $\alpha \to 1$, as well as for decreasing values of $\sigma_v^2$ and for increasing values of $\sigma_\eta^2$. To illustrate this further, Fig. 2 provides a plot of $\tau$ against $\alpha$ for the case $\sigma_\eta^2 = \sigma_v^2 = 1$, $T = 3$. Blundell & Bond (2000) note that the finite-sample bias of the first-differenced GMM estimator for the AR(1) model with weak instruments is likely to be in the direction of the Within Groups estimator. This is because the (one-step) first-differenced GMM estimator coincides with a 2SLS estimator based on the 'orthogonal deviations' transformation of Arellano & Bover (1995), and 2SLS estimators are biased in the direction of OLS in the presence of weak instruments (see, for example, Bound, Jaeger & Baker (1995)).⁶ We explore the finite-sample behaviour of the first-differenced GMM estimator further in Section 11 below.

Fig. 1. $\operatorname{plim}\hat\pi_d$ and $\alpha - 1$ plotted against $\alpha$, $\sigma_\eta^2 = \sigma_v^2$, $T = 3$. Source: Blundell & Bond (1998).
Fig. 2. Concentration parameter $\tau$, $\sigma_\eta^2 = \sigma_v^2 = 1$, $T = 3$. Source: Blundell & Bond (1998).
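The quantities in (4.2) and the concentration parameter are easy to tabulate. The sketch below, which assumes $\sigma_\eta^2 = \sigma_v^2 = 1$, a covariance-stationary start and helper names of our own choosing, compares the formula for $\operatorname{plim}\hat\pi_d$ with the OLS first-stage slope from a large simulated cross-section with $T = 3$, and prints $\tau$ alongside.

```python
import numpy as np


def plim_pi_d(alpha, ratio):
    """Equation (4.2); ratio = sigma_eta^2 / sigma_v^2."""
    k = (1 - alpha) / (1 + alpha)
    return (alpha - 1) * k / (ratio + k)


def concentration(alpha, sigma_eta2=1.0, sigma_v2=1.0):
    """Concentration parameter tau for T = 3 under covariance stationarity."""
    k = (1 - alpha) / (1 + alpha)
    return (sigma_v2 * k) ** 2 / (sigma_eta2 + sigma_v2 * k)


def simulated_pi_d(alpha, n=200_000, sigma_eta=1.0, sigma_v=1.0, seed=1):
    """OLS slope of Δy_i2 on y_i1 (no intercept, since E(y_i1) = 0 here)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, n)
    y1 = eta / (1 - alpha) + rng.normal(0.0, sigma_v / np.sqrt(1 - alpha**2), n)
    y2 = alpha * y1 + eta + rng.normal(0.0, sigma_v, n)
    dy2 = y2 - y1
    return (y1 @ dy2) / (y1 @ y1)


for a in (0.0, 0.5, 0.8, 0.9, 0.95):
    print(f"alpha={a:4.2f}  plim={plim_pi_d(a, 1.0):+.4f}  "
          f"simulated={simulated_pi_d(a):+.4f}  tau={concentration(a):.4f}")
```

Both the first-stage slope and $\tau$ shrink rapidly as $\alpha$ approaches one, which is the pattern plotted in Figs. 1 and 2.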
5. NON-LINEAR MOMENT CONDITIONS

5.1. Standard Assumptions

The standard assumptions (2.4), (2.5) and (2.6) also imply non-linear moment conditions which are not exploited by the standard linear first-differenced GMM estimator described in Section 3.1. Ahn & Schmidt (1995) show that there are a further $T-3$ non-linear moment conditions, which can be written as

$$E(u_{it}\,\Delta u_{i,t-1}) = 0 \quad \text{for } t = 4, 5, \dots, T \qquad (5.1)$$
and which could be expected to improve efficiency. These conditions relate directly to the absence of serial correlation in $v_{it}$ and do not require homoskedasticity. Thus, under the standard assumptions, the complete set of second-order moment conditions available is (3.1) and (5.1). Asymptotic efficiency comparisons reported in Ahn & Schmidt (1995) confirm that these non-linear moments are particularly informative in cases where $\alpha$ is close to unity and/or where $\sigma_\eta^2/\sigma_v^2$ is high.
5.2. Homoskedasticity

Under the homoskedasticity-through-time restriction (3.3), there is one further non-linear moment condition available, in addition to (3.1), (3.4) and (5.1) (see Ahn & Schmidt (1995)). This can be written as

$$E(\bar u_i\,\Delta u_{i3}) = 0, \qquad \text{where } \bar u_i = \frac{1}{T-1}\sum_{t=2}^{T} u_{it}. \qquad (5.2)$$

Thus, under the homoskedasticity assumption in addition to the standard assumptions, the complete set of moment conditions available comprises the linear conditions (3.1) and (3.4), and the non-linear conditions (5.1) and (5.2).
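To make the non-linear conditions concrete, the sketch below evaluates sample analogues of (5.1) and (5.2) on simulated data, recovering $u_{it}$ as $y_{it} - a\,y_{i,t-1}$ for a trial value $a$. The homoskedastic, mean-stationary simulation design and the function names are assumptions for illustration only. At the true $\alpha$ both sample moments should be close to zero, while at a wrong value the (5.1) moment is visibly non-zero (it is quadratic in $a$, which is why these conditions are non-linear).

```python
import numpy as np


def simulate(n, T, alpha, seed=2):
    """Homoskedastic AR(1) panel with a covariance-stationary start."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    y = np.empty((n, T))
    y[:, 0] = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=n)
    for t in range(1, T):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=n)
    return y


def residuals(y, a):
    """u_it = y_it - a*y_i,t-1 for t = 2..T (column j is period j + 2)."""
    return y[:, 1:] - a * y[:, :-1]


def moment_51(y, a):
    """Sample mean of u_it * Δu_i,t-1 over t = 4..T, the analogue of (5.1)."""
    u = residuals(y, a)
    du = np.diff(u, axis=1)                 # Δu_it for t = 3..T
    return np.mean(u[:, 2:] * du[:, :-1])   # pairs (u_it, Δu_i,t-1), t = 4..T


def moment_52(y, a):
    """Sample mean of ubar_i * Δu_i3, the analogue of (5.2)."""
    u = residuals(y, a)
    return np.mean(u.mean(axis=1) * (u[:, 1] - u[:, 0]))


y = simulate(n=100_000, T=6, alpha=0.5)
for a in (0.5, 0.8):
    print(f"a={a}: (5.1) {moment_51(y, a):+.4f}  (5.2) {moment_52(y, a):+.4f}")
```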
6. INITIAL CONDITIONS AND A LEVELS GMM ESTIMATOR

In addition to the standard assumptions set out in Section 2, we now consider the additional assumption

$$E(\eta_i\,\Delta y_{i2}) = 0 \quad \text{for } i = 1, \dots, N. \qquad (6.1)$$
Notice that, given (2.3)–(2.6), which specifies $y_{i2}$ given $y_{i1}$, assumption (6.1) is a restriction on the initial conditions process generating $y_{i1}$.⁷ If this initial conditions restriction holds in addition to the standard assumptions (2.4), (2.5) and (2.6), the following $T-2$ linear moment conditions are valid:

$$E(u_{it}\,\Delta y_{i,t-1}) = 0 \quad \text{for } t = 3, 4, \dots, T. \qquad (6.2)$$
Moreover, given the standard assumptions, these linear moment conditions imply the $T-3$ non-linear moment conditions given in (5.1), and render these non-linear conditions redundant for estimation. Thus the complete set of second-order moment restrictions implied by (2.3)–(2.6) and (6.1) can be implemented as a linear GMM estimator. To consider when the first differences $\Delta y_{it}$ are uncorrelated with the individual effects, notice that for the AR(1) model (2.3)

$$\Delta y_{it} = \alpha^{t-2}\,\Delta y_{i2} + \sum_{s=0}^{t-3}\alpha^{s}\,\Delta u_{i,t-s},$$

so that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ if and only if $\Delta y_{i2}$ is uncorrelated with $\eta_i$. This is precisely the assumption (6.1). To guarantee this, we require the initial conditions restriction
$$E\left[\left(y_{i1} - \frac{\eta_i}{1-\alpha}\right)\eta_i\right] = 0,$$

which is satisfied under mean stationarity of the $y_{it}$ process, as defined by (2.3)–(2.8). To show that the moment conditions (6.2) remain informative when $\alpha$ approaches unity or $\sigma_\eta^2/\sigma_v^2$ becomes large, we again consider the case of $T = 3$. Here we can use one equation in levels,

$$y_{i3} = \alpha\, y_{i2} + \eta_i + v_{i3},$$

for which the instrument available is $\Delta y_{i2}$, and the first-stage regression is $y_{i2} = \pi_l\,\Delta y_{i2} + r_i$. In this case, assuming covariance stationarity, $\operatorname{plim}\hat\pi_l$ is given by⁸

$$\operatorname{plim}\hat\pi_l = \frac{1}{2}, \qquad (6.3)$$
and therefore this moment condition stays informative for high values of $\alpha$, in contrast to the moment condition available for the first-differenced model. The $0.5(T+1)(T-2)$ linear moment conditions (3.1) and (6.2) comprise the full set of second-order moment conditions under mean stationarity in conjunction with the standard assumptions listed in Section 2, and form the basis for a system GMM estimator, which will be discussed in the next section. However, as this system GMM estimator combines the moment conditions for the model in first differences with those for the model in levels, we also consider a simpler levels GMM estimator that is based on the $m_l = 0.5(T-1)(T-2)$ moment conditions

$$E(u_{it}\,\Delta y_{i,t-s}) = 0 \quad \text{for } t = 3, \dots, T \text{ and } 1 \le s \le t-2, \qquad (6.4)$$
that relate only to the equations in levels. These can be expressed as $E(Z_{li}'u_i) = 0$, where $Z_{li}$ is the $(T-2) \times m_l$ matrix given by

$$Z_{li} = \begin{pmatrix} \Delta y_{i2} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \Delta y_{i2} & \Delta y_{i3} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \Delta y_{i2} & \cdots & \Delta y_{i,T-1} \end{pmatrix}$$

and $u_i$ is the $(T-2)$ vector $(u_{i3}, u_{i4}, \dots, u_{iT})'$. Calculation of the one-step and two-step GMM estimators then proceeds in a similar way to that described above. In this case, though, unless $\sigma_\eta^2 = 0$, there is no one-step GMM estimator that is asymptotically equivalent to the two-step estimator, even in the special case of i.i.d. disturbances.⁹
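A quick simulation illustrates the contrast between (4.2) and (6.3) for $T = 3$: as $\alpha$ approaches one, the first-stage coefficient for the differenced equation collapses towards zero, while the levels first-stage coefficient stays near one half. The design ($\sigma_\eta^2 = \sigma_v^2 = 1$, covariance-stationary start) and the helper name are illustrative assumptions.

```python
import numpy as np


def first_stage_coefficients(alpha, n=400_000, seed=3):
    """T = 3, sigma_eta^2 = sigma_v^2 = 1, covariance-stationary start.
    Returns (pi_d, pi_l): OLS slopes of Δy_i2 on y_i1 and of y_i2 on Δy_i2."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    y1 = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=n)
    y2 = alpha * y1 + eta + rng.normal(size=n)
    dy2 = y2 - y1
    pi_d = (y1 @ dy2) / (y1 @ y1)     # differenced first stage, cf. (4.2)
    pi_l = (dy2 @ y2) / (dy2 @ dy2)   # levels first stage, cf. (6.3)
    return pi_d, pi_l


for a in (0.5, 0.8, 0.9):
    p_d, p_l = first_stage_coefficients(a)
    print(f"alpha={a}: pi_d={p_d:+.4f}  pi_l={p_l:.3f}")
```

The levels slope is estimated less precisely at high $\alpha$ (because $y_{i2}$ then has a large variance), but its probability limit does not move, which is the sense in which the levels moment condition stays informative.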
7. A SYSTEM GMM ESTIMATOR

7.1. The Optimal Combination of Differenced and Levels Estimators

Calculation of the GMM estimator using the full set of linear moment conditions (3.1) and (6.2) can be based on a stacked system comprising all $T-2$ equations in first differences and the $T-2$ equations in levels corresponding to periods $3, \dots, T$, for which instruments are observed. The $m_s = 0.5(T+1)(T-2)$ moment conditions are¹⁰

$$E(y_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \dots, T \text{ and } 2 \le s \le t-1, \qquad (7.1)$$

$$E(u_{it}\,\Delta y_{i,t-1}) = 0 \quad \text{for } t = 3, \dots, T. \qquad (7.2)$$
These can be expressed as $E(Z_{si}'p_i) = 0$, where

$$Z_{si} = \begin{pmatrix} Z_{di} & 0 \\ 0 & Z_{li}^{p} \end{pmatrix} = \begin{pmatrix} Z_{di} & 0 & 0 & \cdots & 0 \\ 0 & \Delta y_{i2} & 0 & \cdots & 0 \\ 0 & 0 & \Delta y_{i3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \Delta y_{i,T-1} \end{pmatrix}, \qquad p_i = \begin{pmatrix} \Delta u_i \\ u_i \end{pmatrix},$$

with $Z_{di}$ as defined in Section 3, and $Z_{li}^{p}$ the non-redundant subset of $Z_{li}$. The calculation of the two-step GMM estimator is then analogous to that described above. Again, in this case, unless $\sigma_\eta^2 = 0$, there is no one-step GMM estimator that is asymptotically equivalent to the two-step estimator, even in the special case of i.i.d. disturbances.¹¹ The system GMM estimator is clearly a combination of the first-differenced GMM estimator and a levels GMM estimator that uses only (7.2). This combination is linear for the system 2SLS estimator, which is given by
$$\hat\alpha_s = \left(q_{-1}' Z_s (Z_s' Z_s)^{-1} Z_s' q_{-1}\right)^{-1} q_{-1}' Z_s (Z_s' Z_s)^{-1} Z_s' q, \qquad \text{where } q_i = \begin{pmatrix} \Delta y_i \\ y_i \end{pmatrix}.$$

Because

$$q_{-1}' Z_s (Z_s'Z_s)^{-1} Z_s' q_{-1} = \Delta y_{-1}' Z_d (Z_d'Z_d)^{-1} Z_d' \Delta y_{-1} + y_{-1}' Z_l^{p} (Z_l^{p\prime} Z_l^{p})^{-1} Z_l^{p\prime} y_{-1},$$

the system 2SLS estimator is equivalent to the linear combination

$$\hat\alpha_s = \delta\,\hat\alpha_d + (1-\delta)\,\hat\alpha_l^{p},$$

where $\hat\alpha_d$ and $\hat\alpha_l^{p}$ are the 2SLS first-differenced and levels estimators respectively, with the levels estimator utilising only the $T-2$ moment conditions (7.2), and

$$\delta = \frac{\Delta y_{-1}' Z_d (Z_d'Z_d)^{-1} Z_d' \Delta y_{-1}}{\Delta y_{-1}' Z_d (Z_d'Z_d)^{-1} Z_d' \Delta y_{-1} + y_{-1}' Z_l^{p} (Z_l^{p\prime}Z_l^{p})^{-1} Z_l^{p\prime} y_{-1}} = \frac{\hat\pi_d' Z_d' Z_d \hat\pi_d}{\hat\pi_d' Z_d' Z_d \hat\pi_d + \hat\pi_l' Z_l^{p\prime} Z_l^{p} \hat\pi_l},$$

where $\hat\pi_d$ and $\hat\pi_l$ are the OLS estimates of the first-stage regression coefficients underlying these 2SLS estimators. From (4.2) and (6.3) it follows that $\delta \to 0$ if $\alpha \to 1$ and/or $\sigma_\eta^2/\sigma_v^2 \to \infty$, so that in these cases all the weight for the system estimator will be given to the informative levels moment conditions (7.2).

7.2. Homoskedasticity

In the case where the initial conditions satisfy restriction (6.1) and the $v_{it}$ satisfy restriction (3.3), Ahn & Schmidt (1995, equation (12b)) show that the $T-2$ homoskedasticity restrictions (3.4) and (5.2) can be replaced by a set of $T-2$ moment conditions

$$E(y_{it}u_{it} - y_{i,t-1}u_{i,t-1}) = 0 \quad \text{for } t = 3, \dots, T,$$

which are all linear in the parameter $\alpha$. The non-linear conditions (5.2) are again redundant for estimation given (6.1), and the complete set of second-order moment restrictions implied by (2.3)–(2.6), (3.3) and (6.1) can be implemented as a linear GMM estimator.
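The decomposition $\hat\alpha_s = \delta\hat\alpha_d + (1-\delta)\hat\alpha_l^{p}$ is an exact algebraic identity for the system 2SLS estimator, so it can be checked numerically. In this sketch the simulated data and the function names are assumptions; the differenced and levels equations are stacked in two blocks rather than individual by individual, a reordering of rows that leaves 2SLS unchanged because $Z_s$ is block diagonal.

```python
import numpy as np


def simulate(n, T, alpha, seed=4):
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    y = np.empty((n, T))
    y[:, 0] = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=n)
    for t in range(1, T):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=n)
    return y


def z_diff(y_i):
    """Z_di of (3.1)."""
    T = len(y_i)
    Z = np.zeros((T - 2, (T - 1) * (T - 2) // 2))
    col = 0
    for row, t in enumerate(range(3, T + 1)):
        Z[row, col:col + t - 2] = y_i[:t - 2]
        col += t - 2
    return Z


def two_sls(x, y, Z):
    """Scalar 2SLS: returns (x'Py / x'Px, x'Px) with P = Z (Z'Z)^{-1} Z'."""
    Px = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)
    return (Px @ y) / (Px @ x), Px @ x


def system_decomposition(y):
    n, T = y.shape
    dy = np.diff(y, axis=1)                                  # Δy_i2..Δy_iT
    Zd = np.vstack([z_diff(y[i]) for i in range(n)])         # N(T-2) x m_d
    Zl = np.vstack([np.diag(dy[i, :-1]) for i in range(n)])  # Z_l^p: Δy_i,t-1
    dx, dyy = dy[:, :-1].ravel(), dy[:, 1:].ravel()          # Δy_{-1}, Δy
    lx, ly = y[:, 1:-1].ravel(), y[:, 2:].ravel()            # y_{-1}, y (t = 3..T)
    a_d, A_d = two_sls(dx, dyy, Zd)
    a_l, A_l = two_sls(lx, ly, Zl)
    # Differences stacked above levels; Z_s is block diagonal.
    Zs = np.block([[Zd, np.zeros((Zd.shape[0], Zl.shape[1]))],
                   [np.zeros((Zl.shape[0], Zd.shape[1])), Zl]])
    a_s, _ = two_sls(np.concatenate([dx, lx]), np.concatenate([dyy, ly]), Zs)
    delta = A_d / (A_d + A_l)
    return a_s, a_d, a_l, delta


a_s, a_d, a_l, delta = system_decomposition(simulate(2000, 4, 0.5))
print(f"SYS {a_s:.4f} = {delta:.3f}*{a_d:.4f} + {1 - delta:.3f}*{a_l:.4f}")
delta_95 = system_decomposition(simulate(2000, 4, 0.95))[3]
print(f"delta at alpha=0.5: {delta:.3f}, at alpha=0.95: {delta_95:.4f}")
```

Comparing the two values of `delta` shows the weight on the differenced estimator collapsing as $\alpha$ approaches one.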
8. ASYMPTOTIC VARIANCE COMPARISONS

To quantify the gains in asymptotic efficiency that result from exploiting the linear moment conditions (6.2), Table 1 reports the ratio of the asymptotic variance of the standard first-differenced GMM estimator described in Section 3.1 to the asymptotic variance of the system GMM estimator described in Section 7.1. These asymptotic variance ratios are calculated assuming both covariance stationarity and homoskedasticity. They are presented for $T = 3$ and $T = 4$, for two fixed values of $\sigma_\eta^2/\sigma_v^2$, and for a range of values of the autoregressive parameter $\alpha$. For comparison, we also reproduce from Ahn & Schmidt (1995) the corresponding asymptotic variance ratios comparing first-differenced GMM to the non-linear GMM estimator, which uses the quadratic moment conditions (5.1) but not the extra linear moment conditions (6.2). In the $T = 3$ case there are no quadratic moment restrictions available.

Table 1. Asymptotic Variance Ratios

| $T$ | $\alpha$ | SYS, $\sigma_\eta^2/\sigma_v^2 = 1.00$ | NON-LINEAR, $\sigma_\eta^2/\sigma_v^2 = 1.00$ | SYS, $\sigma_\eta^2/\sigma_v^2 = 0.25$ | NON-LINEAR, $\sigma_\eta^2/\sigma_v^2 = 0.25$ |
|---|---|---|---|---|---|
| 3 | 0.0 | 1.33 | n/a | 1.33 | n/a |
| 3 | 0.3 | 2.15 | n/a | 1.89 | n/a |
| 3 | 0.5 | 4.00 | n/a | 2.91 | n/a |
| 3 | 0.8 | 28.00 | n/a | 13.10 | n/a |
| 3 | 0.9 | 121.33 | n/a | 47.91 | n/a |
| 4 | 0.0 | 1.75 | 1.67 | 1.40 | 1.29 |
| 4 | 0.3 | 2.31 | 1.91 | 1.77 | 1.33 |
| 4 | 0.5 | 3.26 | 2.10 | 2.42 | 1.35 |
| 4 | 0.8 | 13.97 | 2.42 | 8.88 | 1.41 |
| 4 | 0.9 | 55.40 | 2.54 | 30.90 | 1.45 |

Source: Blundell & Bond (1998)

These calculations suggest that exploiting conditions (6.2) can result in dramatic efficiency gains when $T = 3$, particularly at high values of $\alpha$ and high values of $\sigma_\eta^2/\sigma_v^2$. These are indeed the cases where we find the instruments used to obtain the first-differenced estimator to be weak. In the $T = 4$ case we still find dramatic efficiency gains at high values of $\alpha$. Comparison with the results for the non-linear GMM estimator also shows that the gains from exploiting conditions (6.2) can be much larger than the gains from simply exploiting the non-linear restrictions (5.1). In the Monte Carlo simulations presented in Section 11 we investigate whether similar improvements are found in finite samples.
9. MULTIVARIATE DYNAMIC PANEL DATA MODELS

In this section the dynamic panel data model with additional regressors is considered.¹² In particular, we focus on the model
$$y_{it} = \alpha\, y_{i,t-1} + \beta x_{it} + u_{it}, \qquad u_{it} = \eta_i + v_{it}, \qquad (9.1)$$
where $x_{it}$ is a scalar. The error components $\eta_i$ and $v_{it}$ again satisfy the conditions (2.4)–(2.6). The $x_{it}$ process is correlated with the individual effects $\eta_i$, and we consider three possible correlation structures between the $x_{it}$ process and the $v_{it}$ error process that determine the instruments that can be used to estimate $\alpha$ and $\beta$. First, the $x_{it}$ process is strictly exogenous:

$$E(x_{is}v_{it}) = 0 \quad \text{for } s = 1, \dots, T;\; t = 2, \dots, T. \qquad (9.2)$$

Secondly, the $x_{it}$ process is weakly exogenous, or predetermined:

$$\begin{aligned} E(x_{is}v_{it}) &= 0 \quad \text{for } s = 1, \dots, t;\; t = 2, \dots, T, \\ E(x_{is}v_{it}) &\ne 0 \quad \text{for } s = t+1, \dots, T;\; t = 2, \dots, T, \end{aligned} \qquad (9.3)$$

and thirdly, the $x_{it}$ process is endogenously determined:

$$\begin{aligned} E(x_{is}v_{it}) &= 0 \quad \text{for } s = 1, \dots, t-1;\; t = 2, \dots, T, \\ E(x_{is}v_{it}) &\ne 0 \quad \text{for } s = t, \dots, T;\; t = 2, \dots, T. \end{aligned} \qquad (9.4)$$

We are especially interested in the case when the $x_{it}$ process is endogenously determined, which includes simultaneous processes but also measurement error. For the first-differenced GMM estimator, the $0.5(T-1)(T-2)$ moment conditions (3.1), $E(y_{i,t-s}\,\Delta u_{it}) = 0$ for $t = 3, \dots, T$ and $2 \le s \le t-1$, remain valid. When the $x_{it}$ process is strictly exogenous, the following additional $T(T-2)$ moment conditions are valid:

$$E(x_{is}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \dots, T \text{ and } 1 \le s \le T. \qquad (9.5)$$
When $x_{it}$ is predetermined there are only the $0.5(T+1)(T-2)$ additional moment conditions

$$E(x_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \dots, T \text{ and } 1 \le s \le t-1, \qquad (9.6)$$

whereas when $x_{it}$ is endogenously determined only the following $0.5(T-1)(T-2)$ additional moment conditions are valid:

$$E(x_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \dots, T \text{ and } 2 \le s \le t-1. \qquad (9.7)$$
For the non-linear GMM estimator, moment conditions (5.1) remain valid, and no further moment conditions result from the presence of the $x_{it}$ variables.
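The instrument sets implied by (3.1) and (9.5)–(9.7) can be enumerated directly, which also confirms the counts quoted above. The helper below is hypothetical and purely combinatorial: for the first-differenced equation in period $t$, it lists the lagged levels, as (variable, period) pairs, that serve as instruments under each assumption on $x_{it}$.

```python
def dif_instruments(t, T, x_type):
    """Instruments for the first-differenced equation in period t (3 <= t <= T)."""
    ys = [("y", t - s) for s in range(2, t)]           # (3.1): 2 <= s <= t-1
    if x_type == "strict":
        xs = [("x", s) for s in range(1, T + 1)]       # (9.5): all periods 1..T
    elif x_type == "predetermined":
        xs = [("x", t - s) for s in range(1, t)]       # (9.6): 1 <= s <= t-1
    elif x_type == "endogenous":
        xs = [("x", t - s) for s in range(2, t)]       # (9.7): 2 <= s <= t-1
    else:
        raise ValueError(f"unknown x_type: {x_type!r}")
    return ys + xs


def count_x_moments(T, x_type):
    """Number of additional moment conditions contributed by x over t = 3..T."""
    return sum(sum(1 for var, _ in dif_instruments(t, T, x_type) if var == "x")
               for t in range(3, T + 1))


print(dif_instruments(4, 4, "endogenous"))
# → [('y', 2), ('y', 1), ('x', 2), ('x', 1)]
for kind in ("strict", "predetermined", "endogenous"):
    print(kind, [count_x_moments(T, kind) for T in (3, 4, 8)])
```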
For the system GMM estimator, we first consider under what conditions both $\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$. In order to illustrate this, we specify the following process for the regressor:

$$x_{it} = \rho\, x_{i,t-1} + \tau\eta_i + e_{it}.$$

Thus $\tau \ne 0$ allows the level of $x_{it}$ to be correlated with $\eta_i$, and the covariance properties between $v_{it}$ and $e_{is}$ determine whether $x_{it}$ is strictly exogenous, predetermined or endogenously determined. First notice that

$$\Delta x_{it} = \rho^{t-2}\,\Delta x_{i2} + \sum_{s=0}^{t-3}\rho^{s}\,\Delta e_{i,t-s},$$

so that $\Delta x_{it}$ will be correlated with $\eta_i$ if and only if $\Delta x_{i2}$ is correlated with $\eta_i$. To guarantee $E[\Delta x_{i2}\eta_i] = 0$ we require the initial conditions restriction
$$E\left[\left(x_{i1} - \frac{\tau\eta_i}{1-\rho}\right)\eta_i\right] = 0, \qquad (9.8)$$
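which is satisfied under mean stationarity of the $x_{it}$ process. Given this restriction, writing $\Delta y_{it}$ as

$$\Delta y_{it} = \alpha^{t-2}\,\Delta y_{i2} + \sum_{s=0}^{t-3}\alpha^{s}\left(\beta\,\Delta x_{i,t-s} + \Delta u_{i,t-s}\right) \qquad (9.9)$$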
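shows that $\Delta y_{it}$ will be correlated with $\eta_i$ if and only if $\Delta y_{i2}$ is correlated with $\eta_i$. To guarantee $E[\Delta y_{i2}\eta_i] = 0$ we then require the similar initial conditions restriction

$$E\left[\left(y_{i1} - \frac{\beta\tau\eta_i/(1-\rho) + \eta_i}{1-\alpha}\right)\eta_i\right] = 0, \qquad (9.10)$$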
which would again be satisfied under stationarity. Thus there are additional moment restrictions available for the equations in levels when the $y_{it}$ and $x_{it}$ processes are both mean stationary. Whilst jointly stationary means are sufficient to ensure that both $\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$, this condition is stronger than is necessary. For example, if the conditional model (9.1) has generated the $y_{it}$ series for a sufficiently long time prior to our sample period for any influence of the true initial conditions to be negligible, then an expression analogous to (9.9) shows that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ provided that $\Delta x_{it}$ is uncorrelated with $\eta_i$, even if the mean of $x_{it}$ (and hence $y_{it}$) is time-varying. Moreover, it is perfectly possible for $\Delta x_{it}$ to be uncorrelated with $\eta_i$ in cases where $\Delta y_{it}$ is correlated with $\eta_i$ (for example, when (9.8) holds or $\tau = 0$, but (9.10) is not satisfied). However, given (9.9), it seems very unlikely that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ in contexts where $\Delta x_{it}$ is correlated with $\eta_i$. When both $\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$, the extra moment conditions for the system GMM estimator are, as before, (7.2), $E(u_{it}\,\Delta y_{i,t-1}) = 0$ for $t = 3, \dots, T$, and

$$E(u_{it}\,\Delta x_{it}) = 0 \quad \text{for } t = 2, \dots, T \qquad (9.11)$$

in the case where $x_{it}$ is strictly exogenous or predetermined; or

$$E(u_{it}\,\Delta x_{i,t-1}) = 0 \quad \text{for } t = 3, \dots, T, \qquad (9.12)$$

when $x_{it}$ is endogenously determined. Therefore, when for example $x_{it}$ is endogenous, the system GMM estimator is based on the moment conditions (7.1), (9.7), (7.2) and (9.12).
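A small simulation illustrates the role of the initial conditions (9.8) and (9.10). Under the assumptions of this sketch (strictly exogenous $x_{it}$, unit-variance shocks, parameter values chosen only for illustration, hypothetical function names), starting both series at their stationary means given $\eta_i$ makes $\Delta x_{it}$ and $\Delta y_{it}$ uncorrelated with $\eta_i$, whereas a hypothetical start at zero does not, which would invalidate the levels conditions (7.2) and (9.11).

```python
import numpy as np


def simulate_xy(n, T, alpha, beta, rho, tau, stationary=True, seed=5):
    """x_it = rho*x_i,t-1 + tau*eta_i + e_it;  y_it = alpha*y_i,t-1 + beta*x_it + eta_i + v_it.
    With stationary=True the first observations satisfy (9.8) and (9.10);
    otherwise they are set to zero (a hypothetical non-stationary start)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    x = np.zeros((n, T))
    y = np.zeros((n, T))
    if stationary:
        x[:, 0] = tau * eta / (1 - rho) + rng.normal(size=n)
        y[:, 0] = (beta * tau / (1 - rho) + 1) * eta / (1 - alpha) + rng.normal(size=n)
    for t in range(1, T):
        x[:, t] = rho * x[:, t - 1] + tau * eta + rng.normal(size=n)
        y[:, t] = alpha * y[:, t - 1] + beta * x[:, t] + eta + rng.normal(size=n)
    return x, y, eta


def cov_with_eta(z, eta):
    """Sample E[Δz_it * eta_i] for each t = 2..T (eta has mean zero)."""
    return (np.diff(z, axis=1) * eta[:, None]).mean(axis=0)


params = dict(alpha=0.5, beta=1.0, rho=0.8, tau=0.25)
for stationary in (True, False):
    x, y, eta = simulate_xy(200_000, 5, stationary=stationary, **params)
    print(stationary, cov_with_eta(x, eta).round(3), cov_with_eta(y, eta).round(3))
```

With the zero start, $E(\Delta y_{i2}\eta_i) = \beta\tau + 1$ in this design, so the first entry for $y$ should be close to 1.25.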
10. TESTS OF OVERIDENTIFYING RESTRICTIONS

The standard test of the validity of the moment conditions used in the GMM estimation procedure is the Sargan test of overidentifying restrictions (see Sargan (1958) and the development for GMM in Hansen (1982)). For the GMM estimator in the first-differenced model this test statistic is given by

$$\mathrm{Sar}_d = \frac{1}{N}\,\Delta\hat u' Z_d W_N Z_d' \Delta\hat u,$$

where $W_N$ is the optimal weight matrix as in (3.2) and $\Delta\hat u$ are the two-step residuals in the differenced model. In general, under the null that the moment conditions are valid, $\mathrm{Sar}_d$ is asymptotically chi-squared distributed with $m_d - k$ degrees of freedom, where $m_d$ is the number of moment conditions and $k$ is the number of estimated parameters. For the system estimator, the same test is readily defined; call this test $\mathrm{Sar}_s$. A test for the validity of the levels moment conditions that are utilised by the system estimator is then obtained as the difference between $\mathrm{Sar}_s$ and $\mathrm{Sar}_d$:

$$\text{Dif-Sar} = \mathrm{Sar}_s - \mathrm{Sar}_d, \qquad (10.1)$$

and Dif-Sar is asymptotically chi-squared distributed with $m_s - m_d$ degrees of freedom under the null that the levels moment conditions are valid.
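Given per-individual instrument matrices and residuals, the Sargan statistic and the difference statistic (10.1) take only a few lines. This sketch assumes the caller supplies both the two-step residuals and the residuals used to form $W_N$ in (3.2); the function names are hypothetical, and p-values (which would come from the chi-squared distribution with the stated degrees of freedom) are left out to keep the example dependency-free.

```python
import numpy as np


def sargan(Z_list, u_list, u_weight_list=None):
    """(1/N) (sum_i Z_i'u_i)' W_N (sum_i Z_i'u_i), with
    W_N = [(1/N) sum_i Z_i'e_i e_i'Z_i]^{-1} as in (3.2).

    Z_list: per-individual (T_i x m) instrument matrices.
    u_list: residuals entering the statistic (two-step residuals).
    u_weight_list: residuals e_i used to build W_N (defaults to u_list)."""
    if u_weight_list is None:
        u_weight_list = u_list
    n = len(Z_list)
    g = sum(Z.T @ u for Z, u in zip(Z_list, u_list))
    S = sum(np.outer(Z.T @ e, Z.T @ e) for Z, e in zip(Z_list, u_weight_list)) / n
    return float(g @ np.linalg.solve(S, g)) / n


def dif_sargan(sar_s, m_s, sar_d, m_d):
    """Dif-Sar of (10.1) together with its m_s - m_d degrees of freedom."""
    return sar_s - sar_d, m_s - m_d


# Toy check with one instrument per individual: the statistic is (sum g_i)^2 / sum g_i^2.
Z = [np.ones((1, 1))] * 4
print(sargan(Z, [np.ones(1)] * 4))                                  # 16 / 4 = 4.0
print(sargan(Z, [np.array([s]) for s in (1.0, -1.0, 1.0, -1.0)]))   # 0.0
```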
11. MONTE CARLO RESULTS

This section illustrates the performance of the various estimators discussed above for a dynamic multivariate panel data model. In particular, the effect of weak instruments and the potential gains from exploiting initial conditions restrictions are investigated. The model specification is

$$y_{it} = \alpha\, y_{i,t-1} + \beta x_{it} + \eta_i + v_{it} \qquad (11.1)$$

$$x_{it} = \rho\, x_{i,t-1} + \tau\eta_i + \theta v_{it} + e_{it} \qquad (11.2)$$
with $\eta_i \sim N(0, \sigma_\eta^2)$, $v_{it} \sim N(0, \sigma_v^2)$, $e_{it} \sim N(0, \sigma_e^2)$, and the initial observations are drawn from the covariance-stationary distribution. Although these errors are homoskedastic, we do not consider any of the additional moment conditions that require homoskedasticity in the simulated estimators. We choose the error process parameters in such a way that the $x_{it}$ process is highly persistent for high values of $\rho$. Further, $x_{it}$ is positively correlated with $\eta_i$, and the value of $\theta$ is negative to mimic the effects of measurement error. The values of the parameters that are kept fixed in the various Monte Carlo simulations presented below are $\beta = 1$, $\tau = 0.25$, $\theta = -0.1$, $\sigma_\eta^2 = 1$, $\sigma_v^2 = 1$, $\sigma_e^2 = 0.16$. The parameters that are varied in the simulations are the autoregressive coefficients $\alpha$ and $\rho$. We consider four designs, with $\alpha$ and $\rho$ both taking the values 0.5 and 0.95. The case $\alpha = 0.5$ and $\rho = 0.95$ resembles the production function data that will be analysed in the next section. The sample size is $N = 500$, and the simulation results for the various estimators are presented in Tables 2 and 3 for $T = 4$ and in Tables 4 and 5 for $T = 8$. Means, standard deviations and root mean squared errors (RMSE) from 10,000 simulations are tabulated for the OLS levels estimator (OLS), the Within Groups estimator (WG), the first-differenced GMM estimator (DIF), the non-linear GMM estimator (AS),¹³ the levels GMM estimator (LEV), and the system GMM estimator (SYS).

Table 2. Monte Carlo results, $T = 4$, $\rho = 0.5$, $\beta = 1$, $N = 500$. Each cell reports Mean / St D / rmse.

| Parameter | OLS | WG | DIF | AS | LEV | SYS |
|---|---|---|---|---|---|---|
| $\rho$ | 0.762 / 0.017 / 0.263 | −0.036 / 0.030 / 0.538 | 0.496 / 0.090 / 0.091 | 0.501 / 0.075 / 0.075 | 0.502 / 0.059 / 0.059 | 0.500 / 0.055 / 0.055 |
| $\alpha$ (design $\alpha = 0.5$) | 0.820 / 0.011 / 0.320 | 0.010 / 0.031 / 0.491 | 0.469 / 0.131 / 0.135 | 0.516 / 0.095 / 0.096 | 0.512 / 0.070 / 0.070 | 0.512 / 0.060 / 0.061 |
| $\beta$ (design $\alpha = 0.5$) | 0.775 / 0.053 / 0.231 | 0.318 / 0.080 / 0.687 | 0.915 / 0.420 / 0.428 | 1.006 / 0.351 / 0.351 | 1.029 / 0.336 / 0.337 | 1.015 / 0.257 / 0.257 |
| $\alpha$ (design $\alpha = 0.95$) | 0.990 / 0.001 / 0.040 | 0.300 / 0.032 / 0.651 | 0.350 / 0.487 / 0.773 | 0.840 / 0.242 / 0.266 | 0.980 / 0.029 / 0.042 | 0.979 / 0.033 / 0.044 |
| $\beta$ (design $\alpha = 0.95$) | 0.583 / 0.053 / 0.420 | 0.194 / 0.075 / 0.809 | −0.195 / 0.994 / 1.554 | 0.790 / 0.524 / 0.565 | 1.004 / 0.289 / 0.289 | 1.000 / 0.232 / 0.232 |

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 3. Monte Carlo results, $T = 4$, $\rho = 0.95$, $\beta = 1$, $N = 500$. Each cell reports Mean / St D / rmse.

| Parameter | OLS | WG | DIF | AS | LEV | SYS |
|---|---|---|---|---|---|---|
| $\rho$ | 0.997 / 0.002 / 0.047 | 0.221 / 0.032 / 0.729 | 0.472 / 0.825 / 0.954 | 0.868 / 0.221 / 0.235 | 0.961 / 0.144 / 0.145 | 0.953 / 0.096 / 0.096 |
| $\alpha$ (design $\alpha = 0.5$) | 0.650 / 0.014 / 0.151 | 0.089 / 0.031 / 0.412 | 0.466 / 0.103 / 0.109 | 0.500 / 0.065 / 0.065 | 0.518 / 0.053 / 0.056 | 0.514 / 0.044 / 0.046 |
| $\beta$ (design $\alpha = 0.5$) | 0.830 / 0.034 / 0.174 | 0.551 / 0.090 / 0.458 | 0.517 / 1.438 / 1.522 | 1.021 / 0.461 / 0.461 | 1.078 / 0.160 / 0.178 | 1.075 / 0.153 / 0.170 |
| $\alpha$ (design $\alpha = 0.95$) | 0.962 / 0.001 / 0.012 | 0.661 / 0.026 / 0.290 | 0.907 / 0.104 / 0.112 | 0.936 / 0.072 / 0.074 | 0.957 / 0.008 / 0.010 | 0.956 / 0.010 / 0.011 |
| $\beta$ (design $\alpha = 0.95$) | 0.904 / 0.026 / 0.100 | 0.465 / 0.089 / 0.543 | 0.233 / 1.769 / 1.928 | 0.863 / 0.853 / 0.864 | 1.020 / 0.091 / 0.093 | 1.020 / 0.090 / 0.092 |

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 4. Monte Carlo results, $T = 8$, $\rho = 0.5$, $\beta = 1$, $N = 500$. Each cell reports Mean / St D / rmse.

| Parameter | OLS | WG | DIF | AS | LEV | SYS |
|---|---|---|---|---|---|---|
| $\rho$ | 0.762 / 0.012 / 0.262 | 0.265 / 0.018 / 0.236 | 0.494 / 0.034 / 0.035 | 0.495 / 0.025 / 0.026 | 0.503 / 0.029 / 0.029 | 0.501 / 0.024 / 0.024 |
| $\alpha$ (design $\alpha = 0.5$) | 0.820 / 0.007 / 0.320 | 0.311 / 0.017 / 0.190 | 0.480 / 0.040 / 0.045 | 0.497 / 0.029 / 0.029 | 0.523 / 0.034 / 0.041 | 0.511 / 0.027 / 0.029 |
| $\beta$ (design $\alpha = 0.5$) | 0.775 / 0.034 / 0.228 | 0.490 / 0.045 / 0.512 | 0.930 / 0.136 / 0.153 | 0.944 / 0.134 / 0.145 | 1.041 / 0.157 / 0.162 | 0.997 / 0.124 / 0.124 |
| $\alpha$ (design $\alpha = 0.95$) | 0.990 / 0.001 / 0.040 | 0.662 / 0.016 / 0.289 | 0.548 / 0.177 / 0.440 | 0.969 / 0.030 / 0.036 | 0.982 / 0.007 / 0.032 | 0.979 / 0.011 / 0.031 |
| $\beta$ (design $\alpha = 0.95$) | 0.581 / 0.035 / 0.421 | 0.388 / 0.044 / 0.613 | 0.226 / 0.356 / 0.852 | 0.972 / 0.134 / 0.137 | 0.979 / 0.108 / 0.110 | 0.983 / 0.101 / 0.103 |

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 5. Monte Carlo results, $T = 8$, $\rho = 0.95$, $\beta = 1$, $N = 500$. Each cell reports Mean / St D / rmse.

| Parameter | OLS | WG | DIF | AS | LEV | SYS |
|---|---|---|---|---|---|---|
| $\rho$ | 0.997 / 0.001 / 0.047 | 0.591 / 0.017 / 0.359 | 0.676 / 0.222 / 0.350 | 0.903 / 0.061 / 0.077 | 0.973 / 0.022 / 0.032 | 0.958 / 0.031 / 0.032 |
| $\alpha$ (design $\alpha = 0.5$) | 0.650 / 0.009 / 0.150 | 0.396 / 0.015 / 0.106 | 0.480 / 0.033 / 0.039 | 0.508 / 0.024 / 0.025 | 0.523 / 0.022 / 0.032 | 0.518 / 0.021 / 0.028 |
| $\beta$ (design $\alpha = 0.5$) | 0.830 / 0.022 / 0.171 | 0.796 / 0.040 / 0.208 | 0.800 / 0.290 / 0.352 | 1.099 / 0.125 / 0.159 | 1.084 / 0.058 / 0.101 | 1.075 / 0.059 / 0.095 |
| $\alpha$ (design $\alpha = 0.95$) | 0.962 / 0.001 / 0.012 | 0.882 / 0.009 / 0.068 | 0.927 / 0.025 / 0.034 | 0.956 / 0.007 / 0.009 | 0.957 / 0.002 / 0.007 | 0.957 / 0.003 / 0.007 |
| $\beta$ (design $\alpha = 0.95$) | 0.902 / 0.017 / 0.100 | 0.745 / 0.040 / 0.258 | 0.615 / 0.400 / 0.555 | 1.016 / 0.118 / 0.119 | 1.017 / 0.028 / 0.033 | 1.019 / 0.031 / 0.036 |

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Thus, for the case of estimating the AR(1) model for $x_{it}$, DIF uses the moment conditions (3.1); AS uses the moment conditions (3.1) and (5.1); LEV uses the moment conditions (6.4); and SYS uses the moment conditions (3.1) and (6.2). The reported results are for the two-step GMM estimators. Tables 2 and 4 present results for $\rho = 0.5$. The row labelled '$\rho$' presents the results for the estimates of $\rho$ in model (11.2), where the various GMM estimators only utilise lagged information on $x$ as instruments, and potential information from the lagged values of $y$ is not used. Our results for the DIF and SYS estimators can therefore be compared to those reported in, for example, Blundell & Bond (1998) and Alonso–Borrego & Arellano (1999). As expected, the OLS estimates are biased upwards and the WG estimates are biased downwards. In this experiment, where $x_{it}$ is not highly persistent and the instruments available for the equations in first differences are not weak, all four GMM estimators are virtually unbiased. The AS, LEV and SYS estimators all provide an improvement in precision compared to the standard DIF estimator. As we would expect from the asymptotic variance ratios in Table 1, there is a greater gain in precision from using SYS rather than AS at $T = 4$, although in Table 4 we can observe that this difference becomes very small at $T = 8$. The next two rows in Tables 2 and 4 present the estimation results for $\alpha$ and $\beta$ in model (11.1) when $\alpha = 0.5$ and $\rho = 0.5$. The OLS estimates for $\alpha$ are biased upwards, whereas those for $\beta$ are biased downwards. The WG estimates for $\alpha$ and $\beta$ are both biased downwards. Again, as expected, since both the $y$ and $x$ series have a low degree of persistence, the four GMM estimators perform quite well in this experiment. The SYS estimator has the smallest RMSE for both parameters, but the gains are not dramatic at $T = 8$. The final two rows in Tables 2 and 4 are for the model with $\alpha = 0.95$ and $\rho = 0.5$.
As this makes the $y$ process highly persistent, the DIF estimator suffers from a serious weak-instrument bias, as well as being very imprecise. We can notice that the DIF estimates of $\alpha$ and $\beta$ are both biased downwards, in the direction of the Within Groups estimates. The AS estimator is better behaved, as a result of exploiting the non-linear moment conditions (5.1). However, the LEV and SYS estimators, which exploit the initial conditions restrictions, provide more dramatic gains in precision, particularly for the estimation of $\alpha$ and particularly in the case with $T = 4$. With $T = 8$, the LEV and SYS estimates of $\alpha$ are biased upwards, in the direction of the OLS estimate, but still dominate on the RMSE criterion.

Tables 3 and 5 present the results for the cases where the $x_{it}$ process is highly persistent, with $\rho = 0.95$. The estimates for $\rho$ show the familiar pattern: OLS is biased upwards, WG is biased downwards, and DIF is biased downwards towards WG as a result of weak instruments. The AS estimator provides a substantial improvement in both bias and precision. However, the LEV and SYS estimators provide more dramatic gains, particularly when $T = 4$. When $\alpha = 0.5$, the DIF estimator estimates $\alpha$ quite well, but the DIF estimate of $\beta$ is very imprecise, biased downwards and on average very similar to the WG estimate of $\beta$. The AS, LEV and SYS estimates of $\alpha$ are all close to the true value. The AS estimates of $\beta$ are much less biased than DIF but still imprecise, particularly at $T = 4$. The LEV and SYS estimates of $\beta$ show a little finite-sample bias, but again dominate in terms of RMSE. This experiment is intended to capture salient features of the production function data we consider in Section 12, notably a highly persistent explanatory variable that is measured with error, and a significant autoregressive parameter that is not close to one. The simulation results confirm that the system GMM estimator has reasonable properties in this context. When both $\alpha$ and $\rho$ are equal to 0.95 the estimators display a similar pattern. One surprise is that the LEV and SYS estimators actually estimate both parameters better than in the experiments with $\alpha = 0.5$, and the gain from using either of these estimators compared to AS is rather more striking in this case. Also, the DIF estimator now estimates $\alpha$ quite well (though not $\beta$); this may be because, by increasing $\alpha$ whilst keeping the variances of $\eta_i$ and $v_{it}$ fixed, we have greatly increased the variance of the $y_{it}$ series.

To investigate the size properties of the Sargan test of overidentifying restrictions, we present in Figures 3–14 p-value plots (see Davidson & MacKinnon, 1996) for the Sargan test statistics for the DIF and SYS GMM estimators. We also present the p-value plots for the Dif-Sar statistic as defined in (10.1), testing the validity of the additional levels moment conditions exploited by the SYS estimator.
The x-axis of the p-value plots represents the nominal size using the asymptotic critical values of the corresponding chi-squared distributions; the y-axis represents the actual size of the test statistics in the experiments. Figures 3–6 are the p-value plots for the Sargan tests for the GMM estimators in the univariate model for $x_{it}$, (11.2). When $\rho = 0.5$, the distributions of the test statistics are all very close to the asymptotic distribution, with a slight over-rejection when $T = 8$. When the series are persistent, $\rho = 0.95$, the tests over-reject, especially for larger $T$, with the Dif-Sar test having the largest size distortion when $T = 4$. Figures 7–14 present the p-value plots for the Sargan test statistics for the multivariate dynamic panel data model (11.1). These appear to be well behaved in the case with $\alpha = 0.5$ and $\rho = 0.5$. In general, the Dif-Sar test is oversized when either $y$ or $x$ or both are persistent. An interesting case is when $\alpha = 0.5$, $\rho = 0.95$ and $T = 8$: the $\mathrm{Sar}_s$ and Dif-Sar tests are considerably oversized in this case, whereas the $\mathrm{Sar}_d$ test has the correct size.
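The p-value plots themselves are simple to compute from the simulated test statistics. Assuming each replication's asymptotic p-value has already been obtained (for example with `scipy.stats.chi2.sf(stat, df)`), the sketch below returns the empirical rejection frequency at each nominal size; a curve above the 45-degree line indicates over-rejection. The function name is our own.

```python
import numpy as np


def p_value_plot_points(p_values, nominal=None):
    """Davidson-MacKinnon p-value plot coordinates: for each nominal size x,
    the fraction of replications whose asymptotic p-value is <= x."""
    p = np.sort(np.asarray(p_values, dtype=float))
    if nominal is None:
        nominal = np.linspace(0.0, 1.0, 101)
    actual = np.searchsorted(p, nominal, side="right") / p.size
    return nominal, actual


# Under a correctly sized test the p-values are uniform, so actual ≈ nominal.
rng = np.random.default_rng(6)
x, fx = p_value_plot_points(rng.uniform(size=10_000))
print(np.max(np.abs(fx - x)))
```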
Fig. 3. p-value plot, $\rho = 0.5$, $T = 4$.

Fig. 4. p-value plot, $\rho = 0.95$, $T = 4$.

Fig. 5. p-value plot, $\rho = 0.5$, $T = 8$.

Fig. 6. p-value plot, $\rho = 0.95$, $T = 8$.

Fig. 7. $\alpha = 0.5$, $\rho = 0.5$, $T = 4$.

Fig. 8. $\alpha = 0.5$, $\rho = 0.95$, $T = 4$.

Fig. 9. $\alpha = 0.5$, $\rho = 0.5$, $T = 8$.

Fig. 10. $\alpha = 0.5$, $\rho = 0.95$, $T = 8$.

Fig. 11. $\alpha = 0.95$, $\rho = 0.5$, $T = 4$.

Fig. 12. $\alpha = 0.95$, $\rho = 0.95$, $T = 4$.

Fig. 13. $\alpha = 0.95$, $\rho = 0.5$, $T = 8$.

Fig. 14. $\alpha = 0.95$, $\rho = 0.95$, $T = 8$.
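For readers wishing to replicate the design, one replication of (11.1)–(11.2) can be generated as follows. This is a sketch under stated assumptions: the function name is hypothetical, and rather than drawing the initial observations directly from the covariance-stationary distribution, it starts both series at their stationary means given $\eta_i$ and discards a burn-in of 200 periods, which approximates that distribution closely even for $\alpha = \rho = 0.95$. The estimators themselves are not included.

```python
import numpy as np


def simulate_design(n, T, alpha, rho, beta=1.0, tau=0.25, theta=-0.1,
                    sigma_eta=1.0, sigma_v=1.0, sigma_e=0.4, burn_in=200, seed=7):
    """Simulate (11.1)-(11.2):
        y_it = alpha*y_i,t-1 + beta*x_it + eta_i + v_it
        x_it = rho*x_i,t-1 + tau*eta_i + theta*v_it + e_it
    Defaults follow the fixed values in the text (sigma_e^2 = 0.16, so sigma_e = 0.4).
    Assumption: the stationary start is approximated by starting at the
    stationary means and discarding `burn_in` periods."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, n)
    mean_x = tau * eta / (1 - rho)
    mean_y = (beta * mean_x + eta) / (1 - alpha)
    x_prev, y_prev = mean_x.copy(), mean_y.copy()
    x = np.empty((n, T))
    y = np.empty((n, T))
    for s in range(burn_in + T):
        v = rng.normal(0.0, sigma_v, n)
        x_new = rho * x_prev + tau * eta + theta * v + rng.normal(0.0, sigma_e, n)
        y_new = alpha * y_prev + beta * x_new + eta + v
        if s >= burn_in:
            x[:, s - burn_in] = x_new
            y[:, s - burn_in] = y_new
        x_prev, y_prev = x_new, y_new
    return y, x, eta


# One replication of the design closest to the production function data.
y, x, eta = simulate_design(500, 4, alpha=0.5, rho=0.95)
print(y.shape, x.shape)
```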
12. AN APPLICATION: THE COBB–DOUGLAS PRODUCTION FUNCTION

As Griliches and Mairesse (1998) have argued, the estimation of production functions has highlighted the poor performance of standard GMM estimators for short panels. Here we use the problem of estimating production function parameters to evaluate the practical significance of the alternative estimators reviewed in this chapter. In particular, attention is focused on the estimation of the Cobb–Douglas production function

$$\begin{aligned} y_{it} &= \beta_n n_{it} + \beta_k k_{it} + \gamma_t + (\eta_i + v_{it} + m_{it}) \\ v_{it} &= \rho\, v_{i,t-1} + e_{it}, \qquad |\rho| < 1 \\ e_{it},\, m_{it} &\sim MA(0), \end{aligned} \qquad (12.1)$$
where $y_{it}$ is log sales of firm $i$ in year $t$, $n_{it}$ is log employment, $k_{it}$ is log capital stock and $\gamma_t$ is a year-specific intercept reflecting, for example, a common technology shock. Of the error components, $\eta_i$ is an unobserved time-invariant firm-specific effect, $v_{it}$ is a possibly autoregressive (productivity) shock and $m_{it}$ reflects serially uncorrelated (measurement) errors. Constant returns to scale would imply $\beta_n + \beta_k = 1$, but this is not necessarily imposed. Interest is in the consistent estimation of the parameters $(\beta_n, \beta_k, \rho)$ when the number of firms ($N$) is large and the number of years ($T$) is fixed. We maintain that both employment ($n_{it}$) and capital ($k_{it}$) are potentially correlated with the firm-specific effects ($\eta_i$), and with both productivity shocks ($e_{it}$) and measurement errors ($m_{it}$). The model has a dynamic (common factor) representation

$$y_{it} = \beta_n n_{it} - \rho\beta_n n_{i,t-1} + \beta_k k_{it} - \rho\beta_k k_{i,t-1} + \rho\, y_{i,t-1} + (\gamma_t - \rho\gamma_{t-1}) + \left(\eta_i(1-\rho) + e_{it} + m_{it} - \rho\, m_{i,t-1}\right) \qquad (12.2)$$

or

$$y_{it} = \pi_1 n_{it} + \pi_2 n_{i,t-1} + \pi_3 k_{it} + \pi_4 k_{i,t-1} + \pi_5 y_{i,t-1} + \gamma_t^{*} + (\eta_i^{*} + w_{it}) \qquad (12.3)$$
subject to two non-linear (common factor) restrictions 2 = 1 5 and
4 = 3 5. Given consistent estimates of the unrestricted parameter vector
= ( 1, 2, 3, 4, 5) and var( ), these restrictions can be (tested and) imposed using minimum distance to obtain the restricted parameter vector (n, k, ). Notice that wit = eit ~ MA(0) if there are no measurement errors (var(mit) = 0), and wit ~ MA(1) otherwise.
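The step from (12.1) to (12.2) is a quasi-differencing (common factor) transformation. A short worked derivation, using only the definitions in (12.1): subtract $\rho$ times the equation for period $t-1$ from the equation for period $t$, and use $v_{it} - \rho v_{i,t-1} = e_{it}$:

$$
\begin{aligned}
y_{it} - \rho y_{i,t-1} &= \beta_n (n_{it} - \rho n_{i,t-1}) + \beta_k (k_{it} - \rho k_{i,t-1}) + (\gamma_t - \rho\gamma_{t-1}) \\
&\quad + \eta_i(1-\rho) + (v_{it} - \rho v_{i,t-1}) + (m_{it} - \rho m_{i,t-1}) \\
&= \beta_n n_{it} - \rho\beta_n n_{i,t-1} + \beta_k k_{it} - \rho\beta_k k_{i,t-1} + (\gamma_t - \rho\gamma_{t-1}) + \big(\eta_i(1-\rho) + e_{it} + m_{it} - \rho m_{i,t-1}\big).
\end{aligned}
$$

Moving $\rho y_{i,t-1}$ to the right-hand side gives (12.2). Matching coefficients with (12.3) gives $\pi_1 = \beta_n$, $\pi_2 = -\rho\beta_n$, $\pi_3 = \beta_k$, $\pi_4 = -\rho\beta_k$, $\pi_5 = \rho$, $\gamma_t^{*} = \gamma_t - \rho\gamma_{t-1}$ and $\eta_i^{*} = \eta_i(1-\rho)$, so that $\pi_2 = -\pi_1\pi_5$ and $\pi_4 = -\pi_3\pi_5$. The composite error is $w_{it} = e_{it} + m_{it} - \rho m_{i,t-1}$, which is MA(1) whenever $\mathrm{var}(m_{it}) > 0$ because both $m_{it}$ and $m_{i,t-1}$ enter it.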
12.1. Data and Results

The data used are a balanced panel of 509 R&D-performing U.S. manufacturing companies observed for 8 years, 1982–89. These data were kindly made available to us by Bronwyn Hall, and are similar to those used in Mairesse & Hall (1996), although the sample of 509 firms used here is larger than the final sample of 442 firms used in Mairesse & Hall (1996). Capital stock and employment are measured at the end of the firm's accounting year, and sales is used as a proxy for output. Further details of the data construction can be found in Mairesse & Hall (1996).

Table 6 reports results for the basic production function, not imposing constant returns to scale, for a range of estimators. We report results for both the unrestricted model (12.3) and the restricted model (12.1), where the common factor restrictions are tested and imposed using minimum distance.14 We report results here for the one-step GMM estimators, for which inference based on the asymptotic variance matrix has been found to be more reliable than for the (asymptotically) more efficient two-step estimator. Simulations suggest that the loss in precision that results from not using the optimal weight matrix is unlikely to be large (cf. Blundell & Bond, 1998). As expected in the presence of firm-specific effects, OLS levels appears to give an upwards-biased estimate of the coefficient on the lagged dependent variable, whilst Within Groups appears to give a downwards-biased estimate of this coefficient. Note that even using OLS, we reject the hypothesis that $\rho = 1$, and even using Within Groups we reject the hypothesis that $\rho = 0$. Although the pattern of signs on current and lagged regressors in the unrestricted models is consistent with the AR(1) error-component specification, the common factor restrictions are rejected for both these estimators.
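The minimum distance step that imposes the common factor restrictions can be sketched as follows. This is an illustrative implementation, not the DPD98 code used for Table 6: the function names `comfac_map` and `comfac_min_distance` are hypothetical, and we assume `V` is the estimated variance of $\hat{\pi}$ itself (not of $\sqrt{N}\hat{\pi}$), so that the minimised distance is directly the Comfac statistic, asymptotically $\chi^2$ with 5 − 3 = 2 degrees of freedom.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2


def comfac_map(theta):
    """Map restricted parameters (beta_n, beta_k, rho) to the unrestricted
    coefficients (pi1, ..., pi5) implied by the common factor form (12.2)."""
    beta_n, beta_k, rho = theta
    return np.array([beta_n, -rho * beta_n, beta_k, -rho * beta_k, rho])


def comfac_min_distance(pi_hat, V):
    """Impose pi2 = -pi1*pi5 and pi4 = -pi3*pi5 by minimum distance.

    pi_hat : unrestricted estimates (pi1, ..., pi5) of model (12.3)
    V      : 5x5 estimated covariance matrix of pi_hat (assumed, see lead-in)
    Returns (restricted estimates, Comfac statistic, chi-squared(2) p-value).
    """
    pi_hat = np.asarray(pi_hat, dtype=float)
    V_inv = np.linalg.inv(V)

    def distance(theta):
        d = pi_hat - comfac_map(theta)
        return d @ V_inv @ d

    # Natural starting values: the coefficients on n_t, k_t and y_{t-1}.
    theta0 = pi_hat[[0, 2, 4]]
    res = minimize(distance, theta0, method="BFGS")
    stat = float(res.fun)
    return res.x, stat, float(chi2.sf(stat, df=2))


# Illustration only: the unrestricted SYS t-3 estimates from Table 6 with a
# diagonal covariance built from the reported standard errors. The table does
# not report covariances, so this will NOT reproduce its restricted estimates.
pi_sys = np.array([0.472, -0.278, 0.398, -0.209, 0.602])
se_sys = np.array([0.112, 0.120, 0.152, 0.119, 0.098])
theta, stat, p = comfac_min_distance(pi_sys, np.diag(se_sys ** 2))
print(theta, stat, p)
```

Because the restricted model has three free parameters and the unrestricted one five, the statistic has two degrees of freedom, one per common factor restriction.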
They also reject constant returns to scale.15

The validity of lagged levels dated t–2 as instruments in the first-differenced equations is clearly rejected by the Sargan test of overidentifying restrictions. This is consistent with the presence of measurement errors. Instruments dated t–3 (and earlier) are accepted, and the test of common factor restrictions is easily passed in these first-differenced GMM results. However, the estimated coefficient on the lagged dependent variable is barely higher than the Within Groups estimate. Indeed, the differenced GMM parameter estimates are all very close to the Within Groups results. The estimate of $\beta_k$ is low and statistically weak, and the constant returns to scale restriction is rejected. The validity of lagged levels dated t–3 (and earlier) as instruments in the first-differenced equations, combined with lagged first-differences dated t–2 as instruments in the levels equations, appears to be marginal in the system GMM estimator. However, we have seen that these tests do have some tendency to overreject in samples of this size. Moreover, the Dif-Sar statistic that specifically tests the additional moment conditions used in the levels equations accepts their validity at the 10% level.

Table 6. Production Function Estimates

           OLS Levels     Within Groups  DIF t–2        DIF t–3        SYS t–2        SYS t–3
Unrestricted model (12.3)
n_t        0.479 (0.029)  0.488 (0.030)  0.513 (0.089)  0.499 (0.101)  0.629 (0.106)  0.472 (0.112)
n_t–1     –0.423 (0.031) –0.023 (0.034)  0.073 (0.093) –0.147 (0.113) –0.092 (0.108) –0.278 (0.120)
k_t        0.235 (0.035)  0.177 (0.034)  0.132 (0.118)  0.194 (0.154)  0.361 (0.129)  0.398 (0.152)
k_t–1     –0.212 (0.035) –0.131 (0.025) –0.207 (0.095) –0.105 (0.110) –0.326 (0.104) –0.209 (0.119)
y_t–1      0.922 (0.011)  0.404 (0.029)  0.326 (0.052)  0.426 (0.079)  0.462 (0.051)  0.602 (0.098)
m1        –2.60          –8.89          –6.21          –4.84          –8.14          –6.53
m2        –2.06          –1.09          –1.36          –0.69          –0.59          –0.35
Sar        n.a.           n.a.           0.001          0.073          0.000          0.032
Dif-Sar    n.a.           n.a.           n.a.           n.a.           0.001          0.102
Restricted model (12.1)
β_n        0.538 (0.025)  0.488 (0.030)  0.583 (0.085)  0.515 (0.099)  0.773 (0.093)  0.479 (0.098)
β_k        0.266 (0.032)  0.199 (0.033)  0.062 (0.079)  0.225 (0.126)  0.231 (0.075)  0.492 (0.074)
ρ          0.964 (0.006)  0.512 (0.022)  0.377 (0.049)  0.448 (0.073)  0.509 (0.048)  0.565 (0.078)
Comfac     0.000          0.000          0.014          0.711          0.012          0.772
CRS        0.000          0.000          0.000          0.006          0.922          0.641

Notes: Asymptotic standard errors in parentheses; n.a.: not applicable. Year dummies included in all models. m1 and m2 are tests for first- and second-order serial correlation, asymptotically N(0, 1). We test the levels residuals for OLS levels, and the first-differenced residuals in all other columns. Comfac is a minimum distance test of the non-linear common factor restrictions imposed in the restricted models. P-values are reported (also for Sar and Dif-Sar). CRS is a Wald test of the constant returns to scale hypothesis $\beta_n + \beta_k = 1$ in the restricted models. P-values are reported. Source: Blundell & Bond (2000). For the one-step GMM estimators, 't–s' indicates that levels of the three series (y, n, k) dated t–s and all observed longer lags are used as instruments for the first-differenced equations. SYS estimators use lagged differences of the three series dated t–s+1 as instruments for the levels equations.

The system GMM parameter estimates appear to be reasonable. The estimated coefficient on the lagged dependent variable is higher than the Within Groups estimate, but well below the OLS levels estimate. The common factor restrictions are easily accepted, and the estimate of $\beta_k$ is both higher and better determined than the differenced GMM estimate. The constant returns to scale restriction is easily accepted in the system GMM results.16 Blundell & Bond (2000) explore these data in more detail and conclude that the system GMM estimates in the final column of Table 6 are their preferred results. In particular, they find that the individual series used here are highly persistent, and that the instruments available for the first-differenced equations are only weakly correlated with the explanatory variables in first-differences. This is consistent with the similarity between the first-differenced GMM and Within Groups results. Blundell & Bond (2000) also find that when constant returns to scale is imposed on the production function (it is not rejected in the preferred system GMM results), the results obtained using the first-differenced GMM estimator become more similar to the system GMM estimates.
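The Dif-Sar entries in Table 6 test only the extra moment conditions that the system estimator adds for the levels equations. A minimal sketch of the usual difference-Sargan construction follows, assuming (this is not spelled out in this section) that both Sargan statistics come from estimators sharing the same instruments for the first-differenced equations. The function name `difference_sargan` and the inputs in the check below are hypothetical; Table 6 reports only p-values, not the underlying statistics.

```python
from scipy.stats import chi2


def difference_sargan(sar_sys, df_sys, sar_dif, df_dif):
    """Difference-Sargan test of the additional (levels-equation) moment conditions.

    sar_sys, df_sys : Sargan statistic and its degrees of freedom, system GMM
    sar_dif, df_dif : the same for first-differenced GMM
    Under the null that the extra moment conditions are valid, the difference
    is asymptotically chi-squared with (df_sys - df_dif) degrees of freedom.
    """
    stat = sar_sys - sar_dif
    df = df_sys - df_dif
    if df <= 0:
        raise ValueError("the system estimator must add overidentifying restrictions")
    # In finite samples the difference can be negative; report p = 1 in that case.
    p_value = 1.0 if stat <= 0 else float(chi2.sf(stat, df))
    return stat, df, p_value
```

For example, hypothetical values `difference_sargan(30.0, 20, 18.0, 14)` give a statistic of 12.0 on 6 degrees of freedom, with a p-value of about 0.062.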
13. SUMMARY AND CONCLUSIONS

The aim of this chapter has been to review developments in the recent literature which have tried to improve on the poor performance of the standard first-differenced GMM estimator for highly autoregressive panel series by using additional moment conditions. In particular, we discuss the use of the 'system' GMM estimator that relies on relatively mild restrictions on the initial conditions process. This system GMM estimator encompasses the GMM estimator based on the non-linear moment conditions available in the dynamic error components model and has substantial asymptotic efficiency gains relative to this non-linear GMM estimator. The chapter systematically sets out the assumptions required and moment conditions used by each estimator and provides a Monte Carlo simulation comparison as well as an application to production function estimation. The simulation results are the first in the literature to consider the properties of these GMM estimators in dynamic models with endogenous regressors. Our analysis suggests that similar issues arise in this case to those that have been found in previous Monte Carlo studies for the AR(1) model. In particular, we find both a large bias and very low precision for the standard first-differenced estimator when the individual series are highly persistent. By exploiting instruments available for the equations in levels, the system GMM estimator can both greatly improve the precision and greatly reduce the finite sample bias when these additional moment conditions are valid. Intermediate results are found for the non-linear GMM estimator considered, which suggests that this estimator could also be useful in applications with persistent series where the validity of the initial conditions restrictions required for the system GMM estimator is rejected.

The empirical application uses company accounts data for the US to estimate a simple Cobb–Douglas production function. For the standard GMM estimator that uses moment conditions only for the first-differenced equations, we confirm the problems noted by Griliches and Mairesse: the estimated coefficient on capital is very low, all coefficient estimates are imprecise, and constant returns to scale is easily rejected. We notice that the first-differenced GMM results are similar to the Within Groups results, which suggests there may be a problem of weak instruments. This suggestion is consistent with the persistence of the underlying sales, employment and capital stock series. The additional moment conditions used by the system GMM estimator are not rejected in this context, and lead to a marked improvement in the empirical results. Taken together, these Monte Carlo and empirical results suggest that careful consideration of the underlying series and comparisons between different panel data estimators can be useful in detecting situations where the standard first-differenced GMM estimator is likely to be subject to serious weak instruments biases. Where appropriate, the use of the system GMM estimator offers a simple and powerful alternative that can overcome many of the disappointing features of the standard first-differenced GMM estimator in the context of highly persistent series.
ACKNOWLEDGMENTS

This research is part of the programme of research at the ESRC Centre for the Micro-Economic Analysis of Fiscal Policy at IFS. Financial support from the ESRC is gratefully acknowledged.
NOTES

1. All of the estimators discussed and their properties extend in an obvious fashion to higher order autoregressive models.

2. Extensions to dynamic models with additional regressors are considered in Section 9.

3. With T = 3, the absence of serial correlation in $v_{it}$ (2.5) and predetermined initial conditions (2.6) are required to identify the autoregressive parameter (in the absence of any strictly exogenous instruments). With T > 3, the autoregressive parameter can be identified in the presence of suitably low order moving average autocorrelation in $v_{it}$.

4. These estimators are all based on the normalisation (2.3). Alonso-Borrego & Arellano (1999) consider a symmetrically normalised instrumental variable estimator based on the normalisation invariance of the standard LIML estimator.

5. As a choice of $W_N$ to yield the initial consistent estimator, Arellano & Bond (1991) suggest
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{di}' H_d Z_{di}\right)^{-1},$$
where $H_d$ is the $(T-2)\times(T-2)$ matrix given by
$$H_d = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 2 \end{pmatrix},$$
which can be calculated in one step. The use of this $H_d$ matrix accounts for the first-order moving average structure in $\Delta u_{it}$ induced by the first-differencing transformation. Note that when the $v_{it}$ are i.i.d., the one-step and two-step estimators are asymptotically equivalent in this model. We follow this suggestion in the Monte Carlo simulations in Section 11.

6. As shown by Arellano & Bover (1995), OLS on the model transformed to orthogonal deviations coincides with the Within Groups estimator.

7. In this section we focus only on moment conditions that are valid under heteroskedasticity. The case with homoskedasticity and assumption (6.1) is considered in Section 7.2.

8. This corrects the expression for plim ˆ l as given in Blundell and Bond (1998, p. 125).

9. As a choice of $W_N$ to yield the initial consistent estimator, we use
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{li}' Z_{li}\right)^{-1}$$
in the Monte Carlo simulations reported below.

10. The use of moment conditions $E(u_{it}\Delta y_{i,t-s}) = 0$ for $s > 1$ can be shown to be redundant, given (7.1) and (7.2). For balanced panels, the $T-2$ equations in levels may be replaced by a single levels equation for period $T$, with (7.2) replaced by the equivalent moment conditions $E(u_{iT}\Delta y_{i,T-s}) = 0$ for $s = 1, \ldots, T-2$. However, this approach does not extend easily to the case of unbalanced panels.

11. For an analysis of the potential loss in efficiency due to specific choices of the initial weight matrix for these system estimators, see Windmeijer (2000). As a choice of $W_N$ to yield the initial consistent estimator, we use
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{si}' H_s Z_{si}\right)^{-1}$$
in our Monte Carlo simulations, where $H_s$ is the matrix
$$H_s = \begin{pmatrix} H_d & 0 \\ 0 & I_{T-2} \end{pmatrix},$$
$I_{T-2}$ is the $(T-2)$ identity matrix and $H_d$ is defined in Section 3.

12. Here we only consider moment conditions that do not require any homoskedasticity assumptions.

13. Define $s_i = [u_{i3}-u_{i2}, \ldots, u_{iT}-u_{i,T-1},\; u_{i4}(u_{i3}-u_{i2}), \ldots, u_{iT}(u_{i,T-1}-u_{i,T-2})]'$ and
$$Z_{nli} = \begin{pmatrix} Z_{di} & 0 \\ 0 & I_{T-3} \end{pmatrix};$$
then the non-linear moment conditions can be written as $E[Z_{nli}' s_i] = 0$. As an initial weight matrix we use $W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{nli}' Z_{nli}\right)^{-1}$; see Meghir & Windmeijer (1999).

14. The unrestricted results are computed using DPD98 for GAUSS (see Arellano & Bond, 1998).

15. The table reports p-values from minimum distance tests of the common factor restrictions and Wald tests of the constant returns to scale restrictions.

16. One puzzle is that we find little evidence of second-order serial correlation in the first-differenced residuals (i.e. an MA(1) component in the error term in levels), although the use of instruments dated t–2 is strongly rejected. It may be that the $e_{it}$ productivity shocks are also MA(1), in a way that happens to offset the appearance of serial correlation that would otherwise result from measurement errors.
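The one-step weight matrices in notes 5, 9, 11 and 13 share one pattern, $W_N = \big(\frac{1}{N}\sum_i Z_i' H Z_i\big)^{-1}$, with $H = H_d$, the identity, or $H_s$. The sketch below builds $H_d$ and $H_s$ and forms $W_N$ from a list of per-unit instrument matrices. The function names are hypothetical and the instrument matrices in the check are random placeholders rather than the chapter's $Z_{di}$; the check also confirms that $H_d = DD'$, where $D$ is the first-differencing matrix, which is why $H_d$ matches the covariance of $\Delta v_{it}$ for i.i.d. unit-variance $v_{it}$.

```python
import numpy as np


def h_diff(T):
    """(T-2)x(T-2) matrix H_d: 2 on the diagonal, -1 on the first off-diagonals."""
    m = T - 2
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)


def h_sys(T):
    """Block-diagonal H_s = diag(H_d, I_{T-2}) used for the system estimator."""
    m = T - 2
    Hs = np.zeros((2 * m, 2 * m))
    Hs[:m, :m] = h_diff(T)
    Hs[m:, m:] = np.eye(m)
    return Hs


def one_step_weight(Z_list, H):
    """W_N = ((1/N) * sum_i Z_i' H Z_i)^{-1} for per-unit instrument matrices Z_i."""
    N = len(Z_list)
    A = sum(Z.T @ H @ Z for Z in Z_list) / N
    return np.linalg.inv(A)
```

Passing `np.eye(T - 2)` as `H` gives the levels weight matrix of note 9, and `h_sys(T)` with stacked system instruments gives the matrix of note 11.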
REFERENCES

Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–28.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalised Instrumental-Variable Estimation using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association, 76, 598–606.
Arellano, M., & Bond, S. R. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies, 58, 277–297.
Arellano, M., & Bond, S. R. (1998). Dynamic Panel Data Estimation using DPD98 for GAUSS. http://www.ifs.org.uk/staff/steve_b.shtml.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29–52.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Bhargava, A., & Sargan, J. D. (1983). Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods. Econometrica, 51, 1635–1659.
Blundell, R. W., & Bond, S. R. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115–143.
Blundell, R. W., & Bond, S. (2000). GMM Estimation with Persistent Panel Data: An Application to Production Functions. Econometric Reviews, 19(3), 321–340.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation when the Correlation between the Instruments and the Endogenous Explanatory Variable is Weak. Journal of the American Statistical Association, 90, 443–450.
Chamberlain, G. (1987). Asymptotic Efficiency in Estimation with Conditional Moment Restrictions. Journal of Econometrics, 34, 305–334.
Davidson, R., & MacKinnon, J. G. (1996). Graphical Methods for Investigating the Size and Power of Hypothesis Tests. Manchester School, 66, 1–26.
Griliches, Z., & Mairesse, J. (1998). Production Functions: the Search for Identification. In: S. Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series. Cambridge: Cambridge University Press.
Hansen, L. P. (1982). Large Sample Properties of Generalised Method of Moment Estimators. Econometrica, 50, 1029–1054.
Holtz-Eakin, D., Newey, W., & Rosen, H. S. (1988). Estimating Vector Autoregressions with Panel Data. Econometrica, 56, 1371–1396.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Mairesse, J., & Hall, B. H. (1996). Estimating the Productivity of Research and Development in French and US Manufacturing Firms: An Exploration of Simultaneity Issues with GMM Methods. In: K. Wagner & B. Van Ark (Eds), International Productivity Differences and Their Explanations (pp. 285–315). Elsevier Science.
Meghir, C., & Windmeijer, F. (1999). Moment Conditions for Dynamic Panel Data Models with Multiplicative Individual Effects in the Conditional Variance. Annales d'Économie et de Statistique, 55/56, 317–330.
Nelson, C. R., & Startz, R. (1990a). Some Further Results on the Exact Small Sample Properties of the Instrumental Variable Estimator. Econometrica, 58, 967–976.
Nelson, C. R., & Startz, R. (1990b). The Distribution of the Instrumental Variable Estimator and Its t-ratio When the Instrument is a Poor One. Journal of Business, 63, S125–S140.
Nickell, S. J. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
Sargan, J. D. (1958). The Estimation of Economic Relationships Using Instrumental Variables. Econometrica, 26, 329–338.
Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments. Econometrica, 65, 557–586.
Windmeijer, F. (2000). Efficiency Comparisons for a System GMM Estimator in Dynamic Panel Data Models. In: R. D. H. Heijmans, D. S. G. Pollock & A. Satorra (Eds), Innovations in Multivariate Statistical Analysis. A Festschrift for Heinz Neudecker (pp. 175–184). Kluwer Academic Publishers.
FULLY MODIFIED OLS FOR HETEROGENEOUS COINTEGRATED PANELS

Peter Pedroni

ABSTRACT

This chapter uses fully modified OLS principles to develop new methods for estimating and testing hypotheses for cointegrating vectors in dynamic panels in a manner that is consistent with the degree of cross sectional heterogeneity that has been permitted in recent panel unit root and panel cointegration studies. The asymptotic properties of various estimators are compared based on pooling along the 'within' and 'between' dimensions of the panel. By using Monte Carlo simulations to study the small sample properties, the group mean estimator is shown to behave well even in relatively small samples under a variety of scenarios.
I. INTRODUCTION

In this chapter we develop methods for estimating and testing hypotheses for cointegrating vectors in dynamic time series panels. In particular, we propose methods based on fully modified OLS principles which are able to accommodate considerable heterogeneity across individual members of the panel. Indeed, one important advantage to working with a cointegrated panel approach of this type is that it allows researchers to selectively pool the long run information contained in the panel while permitting the short run dynamics and fixed effects to be heterogeneous among different members of the panel. An important convenience of the fully modified approach that we propose here is that in addition to producing asymptotically unbiased estimators, it also produces nuisance parameter free standard normal distributions. In this way, inferences can be made regarding common long run relationships which are asymptotically invariant to the considerable degree of short run heterogeneity that is prevalent in the dynamics typically associated with panels that are composed of aggregate national data.

Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 93–130. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2

A. Nonstationary Panels and Heterogeneity

Methods for nonstationary time series panels, including unit root and cointegration tests, have been gaining increased acceptance in a number of areas of empirical research. Early examples include Canzoneri, Cumby & Diba (1996), Chinn & Johnson (1996), Chinn (1997), Evans & Karras (1996), Neusser & Kugler (1998), Obstfeld & Taylor (1996), Oh (1996), Papell (1997), Pedroni (1996b), Taylor (1996) and Wu (1996), with many more since. These studies have for the most part been limited to applications which simply ask whether or not particular series appear to contain unit roots or are cointegrated. In many applications, however, it is also of interest to ask whether or not common cointegrating vectors take on particular values. In this case, it would be helpful to have a technique that allows one to test such hypotheses about the cointegrating vectors in a manner that is consistent with the very general degree of cross sectional heterogeneity that is permitted in such panel unit root and panel cointegration tests. In general, the extension of conventional nonstationary methods such as unit root and cointegration tests to panels with both cross section and time series dimensions holds considerable promise for empirical research considering the abundance of data which is available in this form.
In particular, such methods provide an opportunity for researchers to exploit some of the attractive theoretical properties of nonstationary regressions while addressing in a natural and direct manner the small sample problems that have in the past often hindered the practical success of these methods. For example, it is well known that superconsistent rates of convergence associated with many of these methods can provide empirical researchers with an opportunity to circumvent more traditional exogeneity requirements in time series regressions. Yet the low power of many of the associated statistics has often impeded the ability to take full advantage of these properties in small samples. By allowing data to be pooled in the cross sectional dimension, nonstationary panel methods have the potential to improve upon these small sample limitations. Conversely, the use of nonstationary time series asymptotics provides an opportunity to make panel methods more amenable to pooling aggregate level data by allowing researchers to selectively pool the long run information contained in the panel, while allowing the short run dynamics to be heterogeneous among different members of the panel.

Initial methodological work on nonstationary panels focused on testing for unit roots in univariate panels. Quah (1994) derived standard normal asymptotic distributions for testing unit roots in homogeneous panels as both the time series and cross sectional dimensions grow large. Levin & Lin (1993) derived distributions under more general conditions that allow for heterogeneous fixed effects and time trends. More recently, Im, Pesaran & Shin (1995) study the small sample properties of unit root tests in panels with heterogeneous dynamics and propose alternative tests based on group mean statistics. In practice, however, empirical work often involves relationships within multivariate systems. Toward this end, Pedroni (1993, 1995) studies the properties of spurious regressions and residual based tests for the null of no cointegration in dynamic heterogeneous panels. This chapter continues this line of research by proposing a convenient method for estimating and testing hypotheses about common cointegrating vectors in a manner that is consistent with the degree of heterogeneity permitted in these panel unit root and panel cointegration studies. In particular, we address here two key sources of cross member heterogeneity that are particularly important in dealing with dynamic cointegrated panels. One such source of heterogeneity manifests itself in the familiar fixed effects form. These reflect differences in mean levels among the variables of different individual members of the panel and we model these by including individual specific intercepts.
The second key source of heterogeneity in such panels comes from differences in the way that individuals respond to short run deviations from equilibrium cointegrating vectors that develop in response to stochastic disturbances. In keeping with earlier panel unit root and panel cointegration papers, we model this form of heterogeneity by allowing the associated serial correlation properties of the error processes to vary across individual members of the panel.

B. Related Literature

Since the original version of this paper, Pedroni (1996a),1 many more papers have contributed to our understanding of hypothesis testing in cointegrating panels. For example, Kao & Chiang (1997) extended their original paper on the least squares dummy variable model in cointegrated panels, Kao & Chen (1995), to include a comparison of the small sample properties of a dynamic OLS estimator with other estimators including an FMOLS estimator similar to Pedroni (1996a). Specifically, Kao & Chiang (1997) demonstrated that a panel dynamic OLS estimator has the same asymptotic distribution as the type of panel FMOLS estimator derived in Pedroni (1996a) and showed that the small sample size distortions for such an estimator were often smaller than certain forms of the panel FMOLS estimator. The asymptotic theory in these earlier papers was generally based on sequential limit arguments (allowing the sample sizes T and N to grow large sequentially), whereas Phillips & Moon (1999) subsequently provided a rigorous and more general study of the limit theory in nonstationary panel regressions under joint convergence (allowing T and N to grow large concurrently). Phillips & Moon (1999) also provided a set of regularity conditions under which convergence in sequential limits implies convergence in joint limits, and considered these properties in the context of an FMOLS estimator, although they do not specifically address the small sample properties of feasible versions of the estimators. More recently, Mark & Sul (1999) also study a similar form of the panel dynamic OLS estimator first proposed by Kao & Chiang (1997). They compare the small sample properties of a weighted versus unweighted version of the estimator and find that the unweighted version generally exhibits smaller size distortion than the weighted version. In this chapter we report new small sample results for the group mean panel FMOLS estimator that was originally proposed in Pedroni (1996a). An advantage of the group mean estimator over the other pooled panel FMOLS estimators proposed in Pedroni (1996a) is that the t-statistic for this estimator allows for a more flexible alternative hypothesis.
This is because the group mean estimator is based on the so-called 'between dimension' of the panel, while the pooled estimators are based on the 'within dimension' of the panel. Accordingly, the group mean panel FMOLS provides a consistent test of a common value for the cointegrating vector under the null hypothesis against values of the cointegrating vector that need not be common under the alternative hypothesis, while the pooled within dimension estimators do not. Furthermore, as Pesaran & Smith (1995) argue in the context of OLS regressions, when the true slope coefficients are heterogeneous, group mean estimators provide consistent point estimates of the sample mean of the heterogeneous cointegrating vectors, while pooled within dimension estimators do not. Rather, as Phillips & Moon (1999) demonstrate, when the true cointegrating vectors are heterogeneous, pooled within dimension estimators provide consistent point estimates of the average regression coefficient, not the sample mean of the cointegrating vectors. Both of these features of the group mean estimator are often important in practical applications.

Finally, the implementation of the feasible form of the between dimension group mean estimator also has advantages over the other estimators in the presence of heterogeneity of the residual dynamics around the cointegrating vector. As was demonstrated in Pedroni (1996a), in the presence of such heterogeneity, the pooled panel FMOLS estimator requires a correction term that depends on the true cointegrating vector. For a specific null value for a cointegrating vector, the t-statistic is well defined, but of course this is of little use per se when one would like to estimate the cointegrating vector. One solution is to obtain a preliminary estimate of the cointegrating vector using OLS. However, although the OLS estimator is superconsistent, it still contains a second order bias in the presence of endogeneity, which is not eliminated asymptotically. Accordingly, this bias leads to size distortion, which is not necessarily eliminated even when the sample size grows large in the panel dimension. Consequently, this type of approach based on a first stage OLS estimate was not recommended in Pedroni (1996a), and it is not surprising that Monte Carlo simulations have shown large size distortions for such estimators. Even when the null hypothesis was imposed without using an OLS estimator, the size distortions for this type of estimator were large as reported in Pedroni (1996a). Similarly, Kao & Chiang (1997) also found large size distortions for such estimators when OLS estimates were used in the first stage for the correction term. By contrast, the feasible version of the between dimension group mean based estimator does not suffer from these difficulties, even in the presence of heterogeneous dynamics. As we will see, the size distortions for this estimator are minimal, even in panels of relatively modest dimensions.
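The within versus between distinction can be seen in a toy calculation. The sketch below deliberately uses plain OLS on noise-free data, not FMOLS, so that only the pooling dimension differs: with heterogeneous slopes $\beta_i$, the group mean (between) estimator returns the simple average of the $\beta_i$, while the pooled within estimator returns an average weighted by each member's sum of squared demeaned regressor values. The function names, slopes, intercepts and regressor scales are illustrative assumptions.

```python
import numpy as np


def member_slope(y_i, x_i):
    """OLS slope for one member after removing its own mean (its fixed effect)."""
    xd, yd = x_i - x_i.mean(), y_i - y_i.mean()
    return (xd @ yd) / (xd @ xd)


def group_mean_slope(y, x):
    """Between-dimension estimator: simple average of the N member-by-member slopes."""
    return float(np.mean([member_slope(y[i], x[i]) for i in range(y.shape[0])]))


def pooled_within_slope(y, x):
    """Within-dimension estimator: pool the demeaned data of all members, fit one slope."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return float(np.sum(xd * yd) / np.sum(xd * xd))


# Illustrative noise-free panel: three members with different slopes; the third
# member's random-walk regressor is scaled up, so it typically gets a larger pooled weight.
rng = np.random.default_rng(0)
N, T = 3, 100
betas = np.array([1.0, 2.0, 3.0])
scales = np.array([1.0, 1.0, 5.0])
alpha = np.array([0.5, -1.0, 2.0])
x = np.cumsum(rng.normal(size=(N, T)), axis=1) * scales[:, None]
y = alpha[:, None] + betas[:, None] * x

gm = group_mean_slope(y, x)      # close to betas.mean() = 2.0
pw = pooled_within_slope(y, x)   # weighted average of betas
print(gm, pw)
```

Because the data contain no noise, the gap between `gm` and `pw` here comes entirely from how the two estimators weight the heterogeneous slopes.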
The remainder of the chapter is structured as follows. In Section 2, we introduce the econometric models of interest for heterogeneous cointegrated panels. We then present a number of theoretical results for estimators designed to be asymptotically unbiased and to provide nuisance parameter free asymptotic distributions which are standard normal when applied to heterogeneous cointegrated panels and can be used to test hypotheses regarding common cointegrating vectors in such panels. In Section 3 we study the small sample properties of these estimators and propose feasible FMOLS statistics that perform relatively well in realistic panels with heterogeneous dynamics. In Section 4 we enumerate the algorithm used to construct these statistics and briefly describe a few examples of their uses. Finally, in Section 5 we offer conclusions and discuss a number of related issues in the ongoing research on estimation and inference in cointegrated panels.
II. ASYMPTOTIC RESULTS FOR FULLY MODIFIED OLS IN HETEROGENEOUS COINTEGRATED PANELS In this section we study asymptotic properties of cointegrating regressions in dynamic panels with common cointegrating vectors and suggest how a fully modified OLS estimator can be constructed to deal with complications introduced by the presence of parameter heterogeneity in the dynamics and fixed effects across individual members. We begin, however, by discussing the basic form of a cointegrating regression in such panels and the problems associated with unmodified OLS estimators. A. Cointegrating Regressions in Heterogeneous Panels Consider the following cointegrated system for a panel of i = 1, . . . , N members, yit = i + xit + it xit = xit–1 + it
(1)
where the vector error process ξit = (μit, εit)′ is stationary with asymptotic covariance matrix Ωi. Thus, the variables xi, yi are said to cointegrate for each member of the panel, with cointegrating vector β if yit is integrated of order one. The term αi allows the cointegrating relationship to include member specific fixed effects. In keeping with the cointegration literature, we do not require exogeneity of the regressors. As usual, xi can in general be an m dimensional vector of regressors, which are not cointegrated with each other. In this case, we partition ξit = (μit, ε′it)′ so that the first element is a scalar series and the second element is an m dimensional vector of the differences in the regressors εit = xit − xit−1 = Δxit, so that when we construct

$$\Omega_i = \begin{bmatrix} \Omega_{11i} & \Omega_{21i}' \\ \Omega_{21i} & \Omega_{22i} \end{bmatrix} \qquad (2)$$
then Ω11i is the scalar long run variance of the residual μit, Ω22i is the m × m long run covariance among the εit, and Ω21i is an m × 1 vector that gives the long run covariance between the residual μit and each of the εit. However, for simplicity and convenience of notation, we will refer to xi as univariate in the remainder of this chapter. Each of the results of this study generalizes in an obvious and straightforward manner to the vector case, unless otherwise indicated.²
Fully Modified OLS for Heterogeneous Cointegrated Panels
In order to explore the asymptotic properties of estimators as both the cross sectional dimension, N, and the time series dimension, T, grow large, we will make assumptions similar in spirit to Pedroni (1995) regarding the degree of dependency across both these dimensions. In particular, for the time series dimension, we will assume that the conditions of the multivariate functional central limit theorems used in Phillips & Durlauf (1986) and Park & Phillips (1988) hold for each member of the panel as the time series dimension grows large. Thus, we have

Assumption 1.1 (invariance principle): The process ξit satisfies a multivariate functional central limit theorem such that the convergence as T → ∞ for the partial sum

$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} \xi_{it} \rightarrow B_i(r, \Omega_i)$$

holds for any given member, i, of the panel, where Bi(r, Ωi) is Brownian motion defined over the real interval r ∈ [0, 1], with asymptotic covariance Ωi.

This assumption indicates that the multivariate functional central limit theorem, or invariance principle, holds over time for any given member of the panel. This places very little restriction on the temporal dependency and heterogeneity of the error process, and encompasses, for example, a broad class of stationary ARMA processes. It also allows the serial correlation structure to be different for individual members of the panel. Specifically, the asymptotic covariance matrix, Ωi, varies across individual members, and is given by

$$\Omega_i \equiv \lim_{T \to \infty} E\left[ T^{-1} \left( \sum_{t=1}^{T} \xi_{it} \right) \left( \sum_{t=1}^{T} \xi_{it} \right)' \right],$$

which can also be decomposed as $\Omega_i = \Omega^o_i + \Gamma_i + \Gamma_i'$, where $\Omega^o_i$ is the contemporaneous covariance and Γi is a weighted sum of autocovariances. The off-diagonal terms, Ω21i, of these individual Ωi matrices capture the endogenous feedback effect between yit and xit, which is also permitted to vary across individual members of the panel. For several of the estimators that we propose, it will be convenient to work with a triangularization of this asymptotic covariance matrix. Specifically, we will refer to this lower triangular matrix of Ωi as Li, whose elements are related as follows:

$$L_{11i} = \left( \Omega_{11i} - \frac{\Omega_{21i}^2}{\Omega_{22i}} \right)^{1/2}, \quad L_{12i} = 0, \quad L_{21i} = \frac{\Omega_{21i}}{\Omega_{22i}^{1/2}}, \quad L_{22i} = \Omega_{22i}^{1/2} \qquad (3)$$

Estimation of the asymptotic covariance matrix can be based on any one of a number of consistent kernel estimators, such as the Newey & West (1987) estimator. Next, for the cross sectional dimension, we will employ the standard panel data assumption of independence. Hence we have:

Assumption 1.2 (cross sectional independence): The individual processes are assumed to be independent cross sectionally, so that E[ξit ξ′jt] = 0 for all i ≠ j.
More generally, the asymptotic covariance matrix for a panel of dimension N × T is block diagonal, with the ith diagonal block given by the asymptotic covariance for member i. This type of assumption is typical of our panel data approach, and we will be using this condition in the formal derivation of the asymptotic distribution of our panel cointegration statistics. For panels that exhibit common disturbances that are shared across individual members, it will be convenient to capture this form of cross sectional dependency by the use of a common time dummy, which is a fairly standard panel data technique. For panels with even richer cross sectional dependencies, one might think of estimating a full non-diagonal N × N matrix of Ωij elements, and then premultiplying the errors by this matrix in order to achieve cross sectional independence. This would require the time series dimension to grow much more quickly than the cross sectional dimension, and in most cases one hopes that a common time dummy will suffice. While the derivation of most of the asymptotic results of this chapter is relegated to the mathematical appendix, it is worth discussing briefly here how we intend to make use of assumptions 1.1 and 1.2 in providing asymptotic distributions for the panel statistics that we consider in the next two subsections. In particular, we will employ here simple and somewhat informal sequential limit arguments, first evaluating the limits as the T dimension grows large for each member of the panel in accordance with assumption 1.1, and then evaluating the sums of these statistics as the N dimension grows large under the independence assumption of 1.2.³ In this manner, as N grows large we obtain standard distributions as we average the random functionals for each member that are obtained in the initial step as a consequence of letting T grow large.
Consequently, we view the restriction that first T → ∞ and then N → ∞ as a relatively strong restriction that ensures these conditions, and it is possible that in many circumstances a weaker set of restrictions that allow N and T to grow large concurrently, but with restrictions on the relative rates of growth, might deliver similar results. In general, for heterogeneous error processes, such restrictions on the rate of growth of N relative to T can be expected to depend in part on the rate of convergence of the particular kernel estimators used to eliminate the nuisance parameters, and we can expect that our iterative T → ∞ and then N → ∞ requirements proxy for the fact that in practice our asymptotic approximations will be more accurate in panels with relatively large T dimensions as compared to the N dimension. Alternatively, under a more pragmatic interpretation, one can simply think of letting T → ∞ for fixed N as reflecting the fact that typically for the panels in which we are interested, it is the
time series dimension which can be expected to grow in actuality rather than the cross sectional dimension, which is in practice fixed. Thus, T → ∞ is in a sense the true asymptotic feature in which we are interested, and this leads to statistics which are characterized as sums of i.i.d. Brownian motion functionals. For practical purposes, however, we would like to be able to characterize these statistics for the general case in which N is large, and in this case we take N → ∞ as a convenient benchmark for which to characterize the distribution, provided that we understand T → ∞ to be the dominant asymptotic feature of the data.

B. Asymptotic Properties of Panel OLS

Next, we consider the properties of a number of statistics that might be used for a cointegrated panel as described by (1) under assumptions 1.1 and 1.2 regarding the time series and cross sectional dependencies in the data. The first statistic that we examine is a standard panel OLS estimator of the cointegrating relationship. It is well known that the conventional single equation OLS estimator for the cointegrating vector is asymptotically biased and that its standardized distribution is dependent on nuisance parameters associated with the serial correlation structure of the data, and there is no reason to believe that this would be otherwise for the panel OLS estimator. The following proposition confirms this suspicion.⁴

Proposition 1.1 (Asymptotic Bias of the Panel OLS Estimator). Consider a standard panel OLS estimator for the coefficient β of panel (1), under assumptions 1.1 and 1.2, given as

$$\hat\beta_{NT} = \left( \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)^2 \right)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)(y_{it} - \bar y_i)$$
where x̄i and ȳi refer to the individual specific means. Then, (a) the estimator is asymptotically biased and its asymptotic distribution will be dependent on nuisance parameters associated with the dynamics of the underlying processes. (b) Only for the special case in which the regressors are strictly exogenous and the dynamics are homogeneous across members of the panel can valid inferences be made from the standardized distribution of β̂NT or its associated t-statistic.

As the proof of proposition 1.1 given in the appendix makes clear, the source of the problem stems from the endogeneity of the regressors under the usual
assumptions regarding cointegrated systems. While an exogeneity assumption is common in many treatments of cross sectional panels, for dynamic cointegrated panels such strict exogeneity is by most standards not acceptable. It is stronger than the standard exogeneity assumption for static panels, as it implies the absence of any dynamic feedback from the regressors at all frequencies. Clearly, the problem of asymptotic bias and data dependency arising from the endogenous feedback effect cannot be expected to diminish in the context of such panels, and Kao & Chen (1995) document this bias for a panel of cointegrated time series for the special case in which the dynamics are homogeneous. For the conventional time series case, a number of methods have been devised to deal with the consequences of such endogenous feedback effects, and in what follows we develop an approach for cointegrated panels based on fully modified OLS principles similar in spirit to those used by Phillips & Hansen (1990).

C. Pooled Fully Modified OLS Estimators for Heterogeneous Panels

Phillips & Hansen (1990) proposed a semi-parametric correction to the OLS estimator which eliminates the second order bias induced by the endogeneity of the regressors. The same principle can also be applied to the panel OLS estimator that we have explored in the previous subsection. The key difference in constructing our estimator for the panel data case will be to account for the heterogeneity that is present in the fixed effects as well as in the short run dynamics. These features lead us to modify the form of the standard single equation fully modified OLS estimator. We will also find that the presence of fixed effects has the potential to alter the asymptotic distributions in a nontrivial manner.
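A small simulation shows the bias of the unmodified panel OLS estimator of proposition 1.1, which motivates these modifications. The sketch is illustrative only. It assumes a hypothetical data generating process for (1) with i.i.d. normal errors, in which the correlation rho between μit and εit is the only source of endogeneity. It then computes the panel OLS estimator with individual specific means removed. Function names and parameter values are our own choices.

```python
import numpy as np


def panel_ols(y, x):
    """Panel OLS of proposition 1.1 with individual specific means removed (N x T arrays)."""
    xd = x - x.mean(axis=1, keepdims=True)      # x_it - xbar_i
    yd = y - y.mean(axis=1, keepdims=True)      # y_it - ybar_i
    return np.sum(xd * yd) / np.sum(xd ** 2)


def simulate_panel(N, T, beta, rho, rng):
    """Hypothetical DGP for (1): i.i.d. normal (mu, eps) with correlation rho, alpha_i ~ U(2, 4)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=(N, T))
    mu, eps = shocks[..., 0], shocks[..., 1]
    x = np.cumsum(eps, axis=1)                  # x_it = x_it-1 + eps_it, with x_i0 = 0
    alpha = rng.uniform(2.0, 4.0, size=(N, 1))  # member specific fixed effects
    return alpha + beta * x + mu, x


def mean_bias(rho, reps=100, N=20, T=50, beta=2.0, seed=0):
    """Average of (beta_hat - beta) over Monte Carlo replications."""
    rng = np.random.default_rng(seed)
    draws = [panel_ols(*simulate_panel(N, T, beta, rho, rng)) for _ in range(reps)]
    return np.mean(draws) - beta


print(mean_bias(rho=0.0), mean_bias(rho=0.8))
```

In runs of this kind, the endogenous case (rho = 0.8) typically shows a clearly positive average bias, while the exogenous case stays close to zero. The exact figures depend on the seed and the chosen dimensions.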
The following proposition establishes an important preliminary result which facilitates intuition for the role of heterogeneity and the consequences of dealing with both temporal and cross sectional dimensions for fully modified OLS estimators.

Proposition 1.2 (Asymptotic Distribution of the Pooled Panel FMOLS Estimator). Consider a panel FMOLS estimator for the coefficient β of panel (1) given by

$$\hat\beta^*_{NT} - \beta = \left( \sum_{i=1}^{N} \hat L_{22i}^{-2} \sum_{t=1}^{T} (x_{it} - \bar x_i)^2 \right)^{-1} \sum_{i=1}^{N} \hat L_{11i}^{-1} \hat L_{22i}^{-1} \left( \sum_{t=1}^{T} (x_{it} - \bar x_i)\, \mu^*_{it} - T \hat\gamma_i \right)$$

where

$$\mu^*_{it} = \mu_{it} - \frac{\hat L_{21i}}{\hat L_{22i}} \Delta x_{it}, \qquad \hat\gamma_i = \hat\Gamma_{21i} + \hat\Omega^o_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}} \left( \hat\Gamma_{22i} + \hat\Omega^o_{22i} \right)$$

and L̂i is a lower triangular decomposition of Ω̂i as defined in (2) above. Then, under assumptions 1.1 and 1.2, the estimator β̂*NT converges to the true value at rate T√N, and is distributed as

$$T\sqrt{N}\,(\hat\beta^*_{NT} - \beta) \rightarrow N(0, v), \qquad v = \begin{cases} 2 & \text{if } \bar x_i = \bar y_i = 0 \\ 6 & \text{else} \end{cases}$$
as T → ∞ and N → ∞. As the proposition indicates, when proper modifications are made to the estimator, the corresponding asymptotic distribution will be free of the nuisance parameters associated with any member specific serial correlation patterns in the data. Notice also that this fully modified panel OLS estimator is asymptotically unbiased both for the standard case without intercepts and for the fixed effects model with heterogeneous intercepts. The only difference is in the size of the variance, which is equal to 2 in the standard case and 6 in the case with heterogeneous intercepts, both for xit univariate. More generally, when xit is an m-dimensional vector, the specific values for v will also be a function of the dimension m. The associated t-statistics, however, will not depend on the specific values for v, as we shall see. The fact that this estimator is distributed normally, rather than in terms of unit root asymptotics as in Phillips & Hansen (1990), derives from the fact that these unit root distributions are being averaged over the cross sectional dimension. Specifically, this averaging process produces normal distributions whose variance depends only on the moments of the underlying Brownian motion functionals that describe the properties of the integrated variables. This is achieved by constructing the estimator in a way that isolates the idiosyncratic components of the underlying Wiener processes to produce sums of standard and independently distributed Brownian motions whose moments can be computed algebraically, as the proof of the proposition makes clear. The estimators L̂11i and L̂22i, which correspond to the long run standard errors of the conditional process μit and the marginal process Δxit respectively, act to purge the contribution of these idiosyncratic elements to the endogenous feedback and serial correlation adjusted statistic $\sum_{t=1}^{T} (x_{it} - \bar x_i)\, y^*_{it} - T\hat\gamma_i$.
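Proposition 1.2 and corollary 1.2 are stated in terms of the true residuals, so they can only be evaluated directly in a simulation where μit is known. The following sketch does this. It assumes Bartlett-kernel estimates of Ω̂oi, Γ̂i and Ω̂i computed from the true (μit, Δxit) with a user-chosen lag. It also assumes that Γ̂[a, b] accumulates Bartlett-weighted sample E[ξt,a ξt+j,b] for j ≥ 1; this orientation is our own assumption. The function names and the i.i.d. endogenous error design are hypothetical. The returned t-statistic is the one defined in corollary 1.2 below.

```python
import numpy as np


def long_run_pieces(u, lags):
    """Bartlett-kernel (omega0, gamma, omega) for a T x 2 array ordered (mu, dx)."""
    u = u - u.mean(axis=0)
    T = u.shape[0]
    omega0 = u.T @ u / T                                   # contemporaneous covariance
    gamma = np.zeros((2, 2))
    for j in range(1, lags + 1):                           # weighted autocovariances
        gamma += (1.0 - j / (lags + 1.0)) * (u[:-j].T @ u[j:]) / T
    return omega0, gamma, omega0 + gamma + gamma.T


def pooled_fmols_true_residuals(x, dx, mu, lags):
    """Proposition 1.2 and corollary 1.2 using the TRUE residuals mu; x, dx, mu are N x T.

    Returns (beta*_NT - beta, t-statistic). Infeasible outside simulations.
    """
    N, T = x.shape
    num, den = 0.0, 0.0
    for i in range(N):
        omega0, gamma, omega = long_run_pieces(np.column_stack([mu[i], dx[i]]), lags)
        L22 = np.sqrt(omega[1, 1])
        L21 = omega[1, 0] / L22
        L11 = np.sqrt(omega[0, 0] - L21 ** 2)              # equals (O11 - O21^2/O22)^(1/2)
        ratio = L21 / L22
        mu_star = mu[i] - ratio * dx[i]                    # mu*_it
        gamma_i = gamma[1, 0] + omega0[1, 0] - ratio * (gamma[1, 1] + omega0[1, 1])
        xd = x[i] - x[i].mean()                            # x_it - xbar_i
        num += (xd @ mu_star - T * gamma_i) / (L11 * L22)  # L11^-1 L22^-1 (...)
        den += (xd @ xd) / L22 ** 2                        # L22^-2 sum (x - xbar)^2
    diff = num / den
    return diff, diff * np.sqrt(den)


rng = np.random.default_rng(2)
N, T = 20, 200
cov = np.array([[1.0, 0.6], [0.6, 1.0]])                   # endogenous regressor
shocks = rng.multivariate_normal(np.zeros(2), cov, size=(N, T))
mu, dx = shocks[..., 0], shocks[..., 1]
x = np.cumsum(dx, axis=1)
diff, t_stat = pooled_fmols_true_residuals(x, dx, mu, lags=4)
print(T * np.sqrt(N) * diff, t_stat)
```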
The fact that the variance is larger for the fixed effects model in which heterogeneous intercepts are included stems from the fact that in the presence
of unit roots, the variation from the cross terms of the sample averages x̄i and ȳi grows large over time at the same rate T, so that their effect is not eliminated asymptotically from the distribution of T√N(β̂*NT − β).⁵ However, since the contribution to the variance is computable analytically as in the proof of proposition 1.2, this in itself poses no difficulties for inference. Nevertheless, upon consideration of these expressions, it also becomes apparent that there should exist a metric which can directly adjust for this effect in the distribution and consequently render the distribution standard normal. In fact, as the following corollary indicates, it is possible to construct a t-statistic from this fully modified panel OLS estimator whose distribution will be invariant to this effect.

Corollary 1.2 (Asymptotic Distribution of the Pooled Panel FMOLS t-statistic). Consider the following t-statistic for the FMOLS panel estimator of β as defined in proposition 1.2 above. Then under the same assumptions as in proposition 1.2, the statistic is standard normal,

$$t_{\hat\beta^*_{NT}} = (\hat\beta^*_{NT} - \beta) \left( \sum_{i=1}^{N} \hat L_{22i}^{-2} \sum_{t=1}^{T} (x_{it} - \bar x_i)^2 \right)^{1/2} \rightarrow N(0, 1)$$

as T → ∞ and N → ∞, both for the standard model without intercepts and for the fixed effects model with heterogeneous estimated intercepts. Again, as the derivation in the appendix makes apparent, the numerator of the fully modified estimator β̂*NT is a sum of mixture normals with zero mean. Their variance depends only on the properties of the Brownian motion functionals associated with the quadratic $\sum_{t=1}^{T} (x_{it} - \bar x_i)^2$. Consequently, the t-statistic constructed using this expression will be asymptotically standard normal. This is so regardless of the value of v associated with the distribution of T√N(β̂*NT − β), and so the t-statistic will also not depend on the dimensionality of xit in the general vector case. Note, however, that in contrast to the conventional single equation case studied by Phillips & Hansen (1990), in order to ensure that the distribution of this t-statistic is free of nuisance parameters when applied to heterogeneous panels, the usual asymptotic variance estimator of the denominator is replaced with the estimator L̂22i⁻². By construction, this corresponds to an estimator of the asymptotic variance of the differences for the regressors and can be estimated accordingly. This is in contrast to the t-statistic for the conventional single equation fully modified OLS, which uses an estimator for the conditional
asymptotic variance from the residuals of the cointegrating regression. This distinction may appear puzzling at first. It arises because, in heterogeneous panels, the contribution from the conditional variance of the residuals is idiosyncratic to the cross sectional member. This contribution must therefore be adjusted for directly in the construction of the numerator of the β̂*NT estimator itself, before averaging over cross sections. Thus, the conditional variance has already been implicitly accounted for in the construction of β̂*NT, and all that is required is that the variance from the marginal process Δxit be purged from the quadratic $\sum_{t=1}^{T} (x_{it} - \bar x_i)^2$. Finally, note that proposition 1.2 and its corollary 1.2 have been specified in terms of a transformation, μ*it, of the true residuals. In Section 3 we will consider various strategies for specifying these statistics in terms of observables and consider the small sample properties of the resulting feasible statistics.

D. A Group Mean Fully Modified OLS t-Statistic

Before proceeding to the small sample properties, we first consider one additional asymptotic result that will be of use. Recently Im, Pesaran & Shin (1995) have proposed using a group mean statistic to test for unit roots in panel data. They note that under certain circumstances, panel unit root tests may suffer from the fact that the pooled variance estimators need not necessarily be asymptotically independent of the pooled numerator and denominator terms of the fixed effects estimator. Notice, however, that the fully modified panel OLS statistics in proposition 1.2 and corollary 1.2 here have been constructed without the use of a pooled variance estimator. Rather, the statistics of the numerator and denominator have been purged of any influence from the nuisance parameters prior to summing over N. Furthermore, since asymptotically the distribution for the numerator is centered around zero, the covariance between the summed terms of the numerator and denominator also does not play a role in the asymptotic distribution of T√N(β̂*NT − β) or $t_{\hat\beta^*_{NT}}$ as it would otherwise. Nevertheless, it is also interesting to consider the possibility of a fully modified OLS group mean statistic in the present context. In particular, the group mean t-statistic is useful because it allows one to entertain a somewhat broader class of hypotheses under the alternative. Specifically, we can think of the distinction as follows. The t-statistic for the true panel estimator as described in corollary 1.2 can be used to test the null hypothesis Ho: βi = βo for all i versus the alternative hypothesis Ha: βi = βa ≠ βo for all i, where βo is the
hypothesized common value for β under the null, and βa is some alternative value for β which is also common to all members of the panel. By contrast, the group mean fully modified t-statistic can be used to test the null hypothesis Ho: βi = βo for all i versus the alternative hypothesis Ha: βi ≠ βo for all i, so that the values for βi are not necessarily constrained to be homogeneous across different members under the alternative hypothesis. The following proposition gives the precise form of the group mean panel fully modified OLS t-statistic that we propose and gives its asymptotic distribution.

Proposition 1.3 (Asymptotic Distribution of the Panel FMOLS Group Mean t-Statistic). Consider the following group mean FMOLS t-statistic for β of the cointegrated panel (1). Then under assumptions 1.1 and 1.2, the statistic is standard normal, and

$$\bar t^*_{\hat\beta_{NT}} = N^{-1/2} \sum_{i=1}^{N} \hat L_{11i}^{-1} \left( \sum_{t=1}^{T} (x_{it} - \bar x_i)^2 \right)^{-1/2} \left( \sum_{t=1}^{T} (x_{it} - \bar x_i)\, y^*_{it} - T \hat\gamma_i \right) \rightarrow N(0, 1)$$

where

$$y^*_{it} = (y_{it} - \bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}} \Delta x_{it}, \qquad \hat\gamma_i = \hat\Gamma_{21i} + \hat\Omega^o_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}} \left( \hat\Gamma_{22i} + \hat\Omega^o_{22i} \right)$$

and L̂i is a lower triangular decomposition of Ω̂i as defined in (2) above. The convergence holds as T → ∞ and N → ∞, both for the standard model without intercepts and for the fixed effects model with heterogeneous intercepts.

Note that the asymptotic distribution of this group mean statistic is invariant to whether the standard model without intercepts or the fixed effects model with heterogeneous intercepts has been estimated. Just as with the previous t-statistic of corollary 1.2, the asymptotic distribution of this panel group mean t-statistic will also be independent of the dimensionality of xit in the more general vector case. Thus, we have presented two different types of t-statistics. The first is a pooled panel OLS based fully modified t-statistic based on the 'within' dimension of the panel. The second is a group mean fully modified OLS t-statistic based on the 'between' dimension of the panel. Both are asymptotically unbiased, free of nuisance parameters, and invariant to whether or not idiosyncratic fixed effects have been estimated. Furthermore, we have characterized the asymptotic distribution of the fully modified panel OLS estimator itself, which is also asymptotically unbiased and free of nuisance parameters. In this case, however, one should be aware that while the distribution will be a centered normal, the variance will depend on whether heterogeneous intercepts have been estimated and on the dimensionality of the
vector of regressors. In the remainder of this chapter we investigate the small sample properties of feasible statistics associated with these asymptotic results and discuss examples of their application.
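A feasible version of the group mean statistic of proposition 1.3 can be sketched as follows. This is an illustration under assumptions that the text above does not spell out:

- Ω̂i is estimated with a Bartlett kernel from member-by-member OLS residuals and Δxit.
- The first-period difference of each series is set to zero.
- Each member's term is centred at a hypothesized value beta0 by subtracting beta0 times the quadratic $\sum_{t}(x_{it}-\bar x_i)^2$. This reduces to the displayed expression when beta0 = 0.
- The between dimension point estimate is taken to be the simple average of the member estimates.

All names and the simulated design are hypothetical.

```python
import numpy as np


def long_run_pieces(u, lags):
    """Bartlett-kernel (omega0, gamma, omega) for a T x 2 array ordered (mu, dx)."""
    u = u - u.mean(axis=0)
    T = u.shape[0]
    omega0 = u.T @ u / T
    gamma = np.zeros((2, 2))
    for j in range(1, lags + 1):
        gamma += (1.0 - j / (lags + 1.0)) * (u[:-j].T @ u[j:]) / T
    return omega0, gamma, omega0 + gamma + gamma.T


def group_mean_fmols(y, x, beta0, lags):
    """Group mean FMOLS in the spirit of proposition 1.3; y and x are N x T arrays.

    Returns (average of member estimates, group mean t-statistic for H0: beta_i = beta0).
    """
    N, T = x.shape
    betas, tstats = [], []
    for i in range(N):
        xd = x[i] - x[i].mean()
        yd = y[i] - y[i].mean()
        dx = np.diff(x[i], prepend=x[i, 0])              # first-period difference set to 0
        mu_hat = yd - (xd @ yd) / (xd @ xd) * xd         # first-stage member OLS residuals
        omega0, gamma, omega = long_run_pieces(np.column_stack([mu_hat, dx]), lags)
        L22 = np.sqrt(omega[1, 1])
        L21 = omega[1, 0] / L22
        L11 = np.sqrt(omega[0, 0] - L21 ** 2)
        ratio = L21 / L22
        gamma_i = gamma[1, 0] + omega0[1, 0] - ratio * (gamma[1, 1] + omega0[1, 1])
        y_star = yd - ratio * dx                         # y*_it
        sxx = xd @ xd
        b_i = (xd @ y_star - T * gamma_i) / sxx          # member FMOLS estimate
        betas.append(b_i)
        tstats.append((b_i - beta0) * np.sqrt(sxx) / L11)
    return np.mean(betas), np.sum(tstats) / np.sqrt(N)


rng = np.random.default_rng(5)
N, T, beta = 20, 100, 2.0
cov = np.array([[1.0, 0.6], [0.6, 1.0]])                 # endogenous regressor
shocks = rng.multivariate_normal(np.zeros(2), cov, size=(N, T))
x = np.cumsum(shocks[..., 1], axis=1)
y = rng.uniform(2.0, 4.0, size=(N, 1)) + beta * x + shocks[..., 0]
b_gm, t_null = group_mean_fmols(y, x, beta0=2.0, lags=4)
_, t_alt = group_mean_fmols(y, x, beta0=1.5, lags=4)
print(b_gm, t_null, t_alt)
```

Because each member is standardized before averaging, the member estimates βi need not be equal under the alternative. This mirrors the broader alternative hypothesis discussed for the group mean statistic.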
III. SMALL SAMPLE PROPERTIES OF FEASIBLE PANEL FULLY MODIFIED OLS STATISTICS

In this section we investigate the small sample properties of the pooled and group mean panel FMOLS estimators that were developed in the previous section. We discuss two alternative feasible estimators associated with the panel FMOLS estimators of proposition 1.2 and its t-statistic, which were defined only in terms of the true residuals. While these estimators perform reasonably well in idealized situations, more generally their size distortions have the potential to be fairly large in small samples, as was reported in Pedroni (1996a). By contrast, we find that the group mean test statistics do very well. They exhibit relatively little size distortion even in relatively small panels, and even in the presence of substantial cross sectional heterogeneity of the error process associated with the dynamics around the cointegrating vector. Consequently, after discussing some of the basic properties of the feasible versions of the pooled estimators and the associated difficulties for small samples, we focus here on reporting the small sample properties of the group mean test statistics. These are found to do extremely well provided that the time series dimension is not smaller than the cross sectional dimension.

A. General Properties of the Feasible Estimators

First, before reporting the results for the between dimension group mean test statistic, we discuss the general properties of various feasible forms of the within dimension pooled panel fully modified OLS statistics and consider the consequences of these properties in small samples. One obvious candidate for a feasible estimator based on proposition 1.2 would be to simply construct the statistic in terms of estimated residuals, which can be obtained from the initial N single equation OLS regressions associated with the cointegrating regression for (1).
Since the single equation OLS estimator is superconsistent, one might hope that this produces a reasonably well behaved statistic for the panel FMOLS estimator. The potential problem with this reasoning stems from the fact that although the OLS regression is superconsistent it is also asymptotically biased in general. While this is a second order effect for the conventional
single series estimator, for panels, as N grows large, the effect has the potential to become first order. Another possibility might appear to be to construct the feasible panel FMOLS estimator for proposition 1.2 in terms of the original data series, as $y^*_{it} = (y_{it} - \bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}} \Delta x_{it}$, along the lines of how it is often done for the conventional single series case. However, this turns out to be correct only in very specialized cases. More generally, for heterogeneous panels, this will introduce an asymptotic bias which depends on the true value of the cointegrating relationship and the relative volatility of the series involved in the regression. The following proposition makes this relationship precise.

Proposition 2.1 (Regarding Feasible Pooled Panel FMOLS). Under the conditions of proposition 1.2 and corollary 1.2, consider the panel FMOLS estimator for the coefficient β of panel (1) given by

$$\hat\beta^*_{NT} = \left( \sum_{i=1}^{N} \hat L_{22i}^{-2} \sum_{t=1}^{T} (x_{it} - \bar x_i)^2 \right)^{-1} \sum_{i=1}^{N} \hat L_{11i}^{-1} \hat L_{22i}^{-1} \left( \sum_{t=1}^{T} (x_{it} - \bar x_i)\, y^*_{it} - T \hat\gamma_i \right)$$

where

$$y^*_{it} = (y_{it} - \bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}} \Delta x_{it} + \frac{\hat L_{11i} - \hat L_{22i}}{\hat L_{22i}}\, \beta\, (x_{it} - \bar x_i)$$

and L̂i and γ̂i are defined as before. Then the statistics T√N(β̂*NT − β) and $t_{\hat\beta^*_{NT}}$ constructed from this estimator are numerically equivalent to the ones defined in proposition 1.2 and corollary 1.2.

This proposition shows why it is difficult to construct a reliable point estimator based on the naive FMOLS estimator simply by using a transformation of y*it analogous to the single equation case. Indeed, as the proposition makes explicit, such an estimator would in general depend on the true value of the parameter that it is intended to estimate, except in very specialized cases, which we discuss below. On the other hand, this does not necessarily prohibit the usefulness of an estimator based on proposition 2.1 for the purposes of testing a particular hypothesis about a cointegrating relationship in heterogeneous panels. By using the hypothesized null value for β in the expression for y*it, proposition 2.1 can at least in principle be employed to construct a feasible FMOLS statistic to test the null hypothesis that βi = β for all i. However, as was reported in Pedroni (1996a), even in this case the small sample performance of the statistic is often subject to relatively large size distortion. Proposition 2.1 also provides us with an opportunity to examine the consequences of ignoring heterogeneity associated with the serial correlation
dynamics for the error process for this type of estimator. In particular, we notice that the modification involved in this estimator relative to the conventional time series fully modified OLS estimator differs in two respects. First, it includes the estimators L̂11i and L̂22i that premultiply the numerator and denominator terms to control for the idiosyncratic serial correlation properties of individual cross sectional members prior to summing over N. Secondly, and more importantly, it includes in the transformation of the dependent variable y*it an additional term, $\frac{\hat L_{11i} - \hat L_{22i}}{\hat L_{22i}}\, \beta\, (x_{it} - \bar x_i)$. This term is eliminated only in two special cases: (1) The elements L11i and L22i are identical for all members of the panel, and do not need to be indexed by i. This corresponds to the case in which the serial correlation structure of the data is homogeneous for all members of the panel. (2) The elements L11i and L22i are perhaps heterogeneous across members of the panel, but for each member L11i = L22i. This corresponds to the case in which the asymptotic variances of the dependent and independent variables are the same. Conversely, the effect of this term increases (1) as the dynamics become more heterogeneous for the panel, and (2) as the relative volatility of the variables xit and yit becomes more different for any individual members of the panel. For most panels of interest, these are likely to be important practical considerations. On the other hand, if the data are known to be relatively homogeneous or simple in their serial correlation structure, the imprecise estimation of these elements will decrease the attractiveness of this type of estimator relative to one that implicitly imposes these known restrictions.

B. Monte Carlo Simulation Results

We now study small sample properties in a series of Monte Carlo simulations.
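A simple first simulation check concerns proposition 2.1. Computing the pooled estimator from the data, with the extra term included in y*it, should reproduce β plus the quantity in proposition 1.2 exactly. Dropping that term, as in the naive single equation transformation, should not. The sketch below makes several assumptions:

- The data are simulated, so β and μit are known.
- L̂i and γ̂i are Bartlett-kernel estimates from the true (μit, Δxit), shared across the three computations.
- The member-specific error scales are chosen so that L̂11i ≠ L̂22i.

All names are hypothetical.

```python
import numpy as np


def long_run_pieces(u, lags):
    """Bartlett-kernel (omega0, gamma, omega) for a T x 2 array ordered (mu, dx)."""
    u = u - u.mean(axis=0)
    T = u.shape[0]
    omega0 = u.T @ u / T
    gamma = np.zeros((2, 2))
    for j in range(1, lags + 1):
        gamma += (1.0 - j / (lags + 1.0)) * (u[:-j].T @ u[j:]) / T
    return omega0, gamma, omega0 + gamma + gamma.T


def prop_2_1_check(y, x, dx, mu, beta, lags):
    """Returns (beta*_NT - beta from prop. 1.2, beta*_NT from prop. 2.1, naive estimate)."""
    N, T = x.shape
    num12 = num21 = num_naive = den = 0.0
    for i in range(N):
        omega0, gamma, omega = long_run_pieces(np.column_stack([mu[i], dx[i]]), lags)
        L22 = np.sqrt(omega[1, 1])
        L21 = omega[1, 0] / L22
        L11 = np.sqrt(omega[0, 0] - L21 ** 2)
        ratio = L21 / L22
        gamma_i = gamma[1, 0] + omega0[1, 0] - ratio * (gamma[1, 1] + omega0[1, 1])
        xd = x[i] - x[i].mean()
        yd = y[i] - y[i].mean()
        mu_star = mu[i] - ratio * dx[i]                   # prop. 1.2, true residuals
        y_naive = yd - ratio * dx[i]                      # single-equation style transformation
        y_star = y_naive + (L11 - L22) / L22 * beta * xd  # prop. 2.1, extra term added
        w = 1.0 / (L11 * L22)
        num12 += w * (xd @ mu_star - T * gamma_i)
        num21 += w * (xd @ y_star - T * gamma_i)
        num_naive += w * (xd @ y_naive - T * gamma_i)
        den += (xd @ xd) / L22 ** 2
    return num12 / den, num21 / den, num_naive / den


rng = np.random.default_rng(4)
N, T, beta = 10, 100, 2.0
scale = rng.uniform(1.5, 3.0, size=(N, 1))               # member specific volatility of mu
shocks = rng.multivariate_normal(np.zeros(2), [[1.0, 0.5], [0.5, 1.0]], size=(N, T))
mu, dx = scale * shocks[..., 0], shocks[..., 1]
x = np.cumsum(dx, axis=1)
y = rng.uniform(2.0, 4.0, size=(N, 1)) + beta * x + mu
d12, b21, b_naive = prop_2_1_check(y, x, dx, mu, beta, lags=4)
print(np.isclose(b21, beta + d12), b21, b_naive)
```

The gap between the naive estimate and the proposition 2.1 estimate grows with the difference between L̂11i and L̂22i. This is the volatility effect described above.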
Given the difficulties associated with the feasible versions of the within dimension pooled panel fully modified OLS estimators discussed in the previous subsection based on proposition 2.1, it is not surprising that these tend to exhibit relatively large size distortions in certain scenarios, as reported in Pedroni (1996a). Kao & Chiang (1997) subsequently also confirmed the poor small sample properties of the within dimension pooled panel fully modified estimator, based on a version in which a first stage OLS estimate was used for the adjustment term. Indeed, such results should not be surprising given that the first stage OLS estimator introduces a second order bias in the presence of endogeneity, which is not eliminated asymptotically. Consequently, this bias leads to size distortion for the panel which is not necessarily eliminated even when the sample size grows large. By contrast, the feasible version of the
between dimension group mean estimator does not require such an adjustment term even in the presence of heterogeneous serial correlation dynamics, and does not suffer from the same size distortion.⁶ Consequently, we focus here on reporting the small sample Monte Carlo results for the between dimension group mean estimator and refer readers to Pedroni (1996a) for simulation results for the feasible versions of the within dimension pooled estimators. To facilitate comparison with the conventional time series literature, we use as a starting point a few Monte Carlo simulations analogous to the ones studied in Phillips & Loretan (1991) and Phillips & Hansen (1990), based on their original work on FMOLS estimators for conventional time series. Following these studies, we model the errors for the data generating process in terms of a vector MA(1) process and consider the consequences of varying certain key parameters. In particular, for the purposes of the Monte Carlo simulations, we model our data generating process for the cointegrated panel (1) under assumptions 1.1 and 1.2 as

$$y_{it} = \alpha_i + \beta x_{it} + \mu_{it}, \qquad x_{it} = x_{it-1} + \varepsilon_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T,$$

for which we model the vector error process ξit = (μit, εit)′ in terms of a vector moving average process given by

$$\xi_{it} = \eta_{it} - \Theta_i \eta_{it-1}, \qquad \eta_{it} \sim \text{i.i.d. } N(0, \Psi_i) \qquad (3)$$

where Θi is a 2 × 2 coefficient matrix and Ψi is a 2 × 2 contemporaneous covariance matrix. In order to accommodate the potentially heterogeneous nature of these dynamics among different members of the panel, we have indexed these parameters by the subscript i. We will then allow these parameters to be drawn from uniform distributions according to the particular experiment. Likewise, for each of the experiments we draw the fixed effects αi from a uniform distribution, such that αi ~ U(2.0, 4.0). We consider first, as a benchmark case, an experiment which captures much of the richness of the error process studied in Phillips & Loretan (1991) and yet also permits considerable heterogeneity among individual members of the panel. In their study, Phillips & Loretan (1991), following Phillips & Hansen (1990), fix the following parameters: θ11i = 0.3, θ12i = 0.4, θ22i = 0.6, ψ11i = ψ22i = 1.0, β = 2.0, and then permit θ21i and ψ21i to vary. The coefficient θ21i is particularly interesting since a non-zero value for this parameter reflects an absence of even weak exogeneity for the regressors in the cointegrating regression associated with (1), and is captured by the term L21i in the panel FMOLS statistics. For our heterogeneous panel, we therefore set ψ11i = ψ22i = 1.0, β = 2.0 and draw the remaining parameters from uniform
distributions which are centered around the parameter values set by Phillips & Loretan (1991), but deviate by up to 0.4 in either direction for the elements of i and by up to 0.85 in either direction for 21i. Thus, in our first experiment, the parameters are drawn as follows: 11i ~ U(–0.1, 0.7), 12i ~ (0.0, 0.8), 21i ~ U(0.0, 0.8), 22i ~ U(0.2, 1.0) and 21i ~ U(–0.85, 0.85). This specification achieves considerable heterogeneity across individual members and also allows the key parameters 21i and 21i to span the set of values considered in Phillips and Loretan’s study. In this first experiment we restrict the values of 21i to span only the positive set of values considered in Phillips and Loretan for this parameter. In several cases Phillips and Loretan found negative values for 21i to be particularly problematic in terms of size distortion for many of the conventional test statistics applied to pure time series, and in our subsequent experiments we also consider the consequences of drawing negative values for this coefficient. In each case, the asymptotic covariances were estimated individually for each member i of the cross section using the Newey-West (1987) estimator. In setting the lag length for the band width, we employ the data dependent scheme recommended in Newey & West (1994), which is to set
the lag truncation to the nearest integer given by $K = 4\,(T/100)^{2/9}$, where T is the
number of sample observations over time. Since we consider small sample results for panels whose time dimension ranges from T = 10 to T = 100 in increments of 10, the lag truncation ranges from 2 to 4. For the cross sectional dimension, we consider small sample results for N = 10, N = 20 and N = 30 for each of these values of T. Results for the first experiment, with $\theta_{21i} \sim U(0.0, 0.8)$, are reported in Table I of Appendix B. The first column of results reports the bias of the point estimator and the second column reports the associated standard error of the sampling distribution. Clearly, the biases are small: even in the extreme case in which both dimensions are as small as N = 10, T = 10, the bias is only –0.058, and it becomes minuscule as the T dimension grows. At N = 10, T = 30 the bias is already down to –0.009, and at T = 100 it falls to –0.001. This is to be anticipated, since the estimators are superconsistent and converge at rate $T\sqrt{N}$, so that even for relatively small dimensions the estimators are extremely precise. Furthermore, the Monte Carlo simulations confirm that the bias falls more quickly with growth in the T dimension than with growth in the N dimension. For example, the biases are much smaller for T = 30, N = 10 than for T = 10, N = 30 in all of the experiments. The standard errors in column two confirm that the sampling variance around these
biases is also very small. Similar results continue to hold in the subsequent experiments with negative moving average coefficients, regardless of the data generating process for the serial correlation. Consequently, the first thing to note is that these estimators are extremely accurate even in panels with very heterogeneous serial correlation dynamics, fixed effects and endogenous regressors. Of course, these findings on bias should not come as a surprise given the superconsistency results presented in the previous section. A more central concern for inference is the small sample behavior of the associated t-statistic and the possibility of size distortion. For this, we examine the small sample size of the test under the null hypothesis for various nominal sizes based on the asymptotic distribution. Specifically, the last two columns report the Monte Carlo small sample sizes of a two sided test of the null hypothesis $\beta = 2.0$ at nominal 5% and 10% levels, respectively. As a general rule, we find that the size distortions in these small samples are remarkably small provided that the time series dimension, T, is not smaller than the cross sectional dimension, N. This condition stems primarily from the estimation of the fixed effects. The number of fixed effects, $\alpha_i$, grows with the N dimension of the panel, while each of these N fixed effects is estimated consistently only as T grows large, so that $\hat\alpha_i - \alpha_i$ goes to zero only as T grows large. Accordingly, we require T to grow faster than N in order to eliminate this effect asymptotically for the panel. As a practical consequence, small sample size distortion tends to be high when N is large relative to T and decreases as T becomes large relative to N, as can be anticipated in any fixed effects model.
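The bandwidth rule $K = 4\,(T/100)^{2/9}$ and the Newey-West (1987) estimator used for the asymptotic covariances in these experiments can be sketched in a few lines. This is a minimal illustration assuming NumPy and a Bartlett kernel; the function names `newey_west_lag` and `long_run_covariance` are our own, and the sketch is not the program used to produce the tables. It reproduces the statement that T = 10, ..., 100 implies lag truncations between 2 and 4.

```python
import numpy as np


def newey_west_lag(T: int) -> int:
    """Lag truncation K = 4 (T/100)^(2/9), rounded to the nearest integer."""
    return int(np.floor(4.0 * (T / 100.0) ** (2.0 / 9.0) + 0.5))


def long_run_covariance(xi: np.ndarray, K: int) -> np.ndarray:
    """Bartlett-kernel (Newey-West) long-run covariance of a (T, k) array."""
    T = xi.shape[0]
    omega = xi.T @ xi / T                      # contemporaneous (lag 0) term
    for j in range(1, K + 1):
        w = 1.0 - j / (K + 1.0)                # Bartlett weight for lag j
        gamma_j = xi[:-j].T @ xi[j:] / T       # sum_t xi_{t-j} xi_t' / T
        omega += w * (gamma_j + gamma_j.T)     # add lag j in both directions
    return omega


# Lag truncations for the time dimensions used in the experiments.
lags = {T: newey_west_lag(T) for T in range(10, 101, 10)}

# Sanity check on i.i.d. data, whose long-run covariance is the identity.
rng = np.random.default_rng(0)
omega_iid = long_run_covariance(rng.standard_normal((5000, 2)), newey_west_lag(5000))
```

Applied member by member to the $(T, m+1)$ array of residuals and regressor differences, `long_run_covariance` returns that member's long-run covariance; the Bartlett weights keep the estimate positive semi-definite.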
As we can see from the results in Table I, in cases when N exceeds T the size distortions are large, with actual sizes exceeding 30% and 40% when T = 10 and N grows from 10 to 20 and 30. This is an unattractive scenario, since the tests are then likely to reject the null hypothesis when the rejection is not warranted. However, these represent extreme cases, as the techniques are designed for the opposite case, in which the T dimension is reasonably large relative to the N dimension. In such cases, even when the T dimension is only slightly larger than the N dimension, or merely comparable to it, we find the size distortion to be remarkably small. For example, in the results reported in Table I we find that with N = 20, T = 40 the sizes of the nominal 5% and 10% tests become 4.5% and 9.3% respectively. Similarly, for N = 10, T = 30 the Monte Carlo sizes are 6.1% and 11% respectively, and for N = 30, T = 60 they are 4.7% and 9.6%. As the T dimension grows even larger for a fixed N dimension, the tests tend to become slightly undersized, with the actual size
becoming slightly smaller than the nominal size. In this case the small sample tests actually become slightly more conservative than one would anticipate based on the asymptotic critical values. Next, we consider the case in which $\theta_{21i}$ takes negative values; for the experiment reported in Table II of Appendix B we draw this coefficient from $\theta_{21i} \sim U(-0.8, 0.0)$. Large negative moving average coefficients are well known to create size distortion for such estimators, and we anticipate higher small sample distortion in this case. Interestingly, the biases of the point estimate now become slightly positive, although, as before, they remain very small. The small sample size distortions follow the same pattern, in that they tend to be largest when T is small relative to N and decrease as T grows larger. As anticipated, they tend to be higher than in the case in which $\theta_{21i}$ spans only positive values. However, they still fall within a fairly reasonable range, considering that all values of $\theta_{21i}$ are negative. For example, with N = 10, T = 100 we obtain sizes of 6.3% and 12% for the 5% and 10% nominal sizes respectively, and for N = 20, T = 100 they become 9% and 15.6%. These are still remarkably small compared to the size distortions reported in Phillips & Loretan (1991) for the conventional time series case. Finally, we ran a third experiment in which $\theta_{21i}$ spans both positive and negative values, drawn from $\theta_{21i} \sim U(-0.4, 0.4)$. We consider this a fairly realistic case; it corresponds closely to the range of moving average coefficients estimated in the purchasing power parity study contained in Pedroni (1996a). We find the group mean estimator and test statistic to perform very well in this situation.
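The heterogeneous Monte Carlo design can be sketched as follows. The sketch assumes that the DGP in (1) takes the form $y_{it} = \alpha_i + \beta x_{it} + \mu_{it}$, $x_{it} = x_{it-1} + \varepsilon_{it}$, with $(\mu_{it}, \varepsilon_{it})' = \eta_{it} + \Theta_i \eta_{it-1}$ and $\eta_{it} \sim N(0, \Omega_i)$; that form, including the sign of the moving average term, Gaussian innovations and the initial condition $x_{i0} = 0$, is our reading rather than a quotation of (1). The parameter ranges are those stated above for the three experiments; the names `simulate_panel` and `THETA21_RANGES` are hypothetical.

```python
import numpy as np

# Ranges for theta_21i in the three experiments (Tables I, II and III).
THETA21_RANGES = {1: (0.0, 0.8), 2: (-0.8, 0.0), 3: (-0.4, 0.4)}


def simulate_panel(N, T, experiment, beta=2.0, rng=None):
    """Simulate one heterogeneous panel; returns y and x as (N, T) arrays.

    Assumed DGP: y_it = alpha_i + beta x_it + mu_it, x_it = x_it-1 + eps_it,
    (mu_it, eps_it)' = eta_it + Theta_i eta_it-1, eta_it ~ N(0, Omega_i).
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = THETA21_RANGES[experiment]
    y, x = np.empty((N, T)), np.empty((N, T))
    for i in range(N):
        theta = np.array([[rng.uniform(-0.1, 0.7), rng.uniform(0.0, 0.8)],
                          [rng.uniform(lo, hi), rng.uniform(0.2, 1.0)]])
        omega21 = rng.uniform(-0.85, 0.85)
        omega = np.array([[1.0, omega21], [omega21, 1.0]])  # Omega_11i = Omega_22i = 1
        alpha = rng.uniform(2.0, 4.0)                         # fixed effect alpha_i
        # T + 1 innovations so the first xi_t has a lagged eta available.
        eta = rng.multivariate_normal(np.zeros(2), omega, size=T + 1)
        xi = eta[1:] + eta[:-1] @ theta.T                     # eta_t + Theta_i eta_{t-1}
        x[i] = np.cumsum(xi[:, 1])                            # random walk from x_i0 = 0
        y[i] = alpha + beta * x[i] + xi[:, 0]
    return y, x


rng = np.random.default_rng(123)
y_sim, x_sim = simulate_panel(N=10, T=50, experiment=3, rng=rng)
```

Each call draws a fresh set of member-specific parameters, so repeating it for every replication reproduces the kind of cross-member heterogeneity described above.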
The Monte Carlo simulation results for this case are reported in Table III of Appendix B. Whereas the biases were negative for the case with large positive values of $\theta_{21i}$ in Table I, and positive for the case with large negative values in Table II, here we find the biases to be positive and often even smaller in absolute value than in either of the first two cases. Most importantly, the size distortions of the t-statistic are much smaller here than in the case with exclusively negative values of $\theta_{21i}$. For example, with N = 30 and T as small as 60, the nominal 5% and 10% sizes become 5.4% and 10.5%. Again, the small sample sizes of the test are generally quite close to the asymptotic nominal sizes provided that the T dimension is not smaller than the N dimension. Consequently, it appears that even when some members of the panel exhibit negative moving average coefficients, as long as other members exhibit positive values, the distortions tend to average out, so that the small sample sizes of the group mean statistic stay
very close to the asymptotic sizes. Thus, we conclude that, in general, when the T dimension is not smaller than the N dimension, the asymptotic normality result appears to provide a very good benchmark for the sampling distribution under the null hypothesis, even in relatively small samples with heterogeneous serial correlation dynamics. Finally, although power is generally not a concern for such panel tests, since it is typically quite high, it is worth mentioning the small sample power properties of the group mean estimator. Specifically, we checked the small sample power of the test by generating the 10,000 draws for the DGP associated with case 3 above with $\beta = 1.9$. For the test of the null hypothesis $\beta = 2.0$ against the alternative $\beta = 1.9$, the power of the 10% level test reached 100% for N = 10 when T was 40 or more (98.2% when T = 30), reached 100% for N = 20 when T was 30 or more, and for N = 30 reached 100% already when T was 20 or more. Consequently, given the high power and the relatively small size distortion, we find the small sample properties of the estimator and associated t-statistic to be extremely well behaved in the cases for which they were designed.
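The size and power figures reported above are rejection frequencies across replications. A minimal sketch of that bookkeeping, assuming NumPy and the standard library's `statistics.NormalDist` for two-sided normal critical values, is given below. The t-statistics here are synthetic standard normal draws (shifted by 3 to mimic an alternative) that stand in for simulated group mean t-statistics; they are not output from the panel experiments, and `rejection_rate` is a hypothetical helper.

```python
from statistics import NormalDist

import numpy as np


def rejection_rate(t_stats, level):
    """Share of replications in which a two-sided test at `level` rejects."""
    crit = NormalDist().inv_cdf(1.0 - level / 2.0)   # about 1.96 for level = 0.05
    return float(np.mean(np.abs(np.asarray(t_stats)) > crit))


rng = np.random.default_rng(7)
t_null = rng.standard_normal(10_000)          # stand-in for t-statistics under the null
t_alt = rng.standard_normal(10_000) + 3.0     # stand-in for t-statistics under an alternative

size_5 = rejection_rate(t_null, 0.05)         # empirical size, nominal 5% test
size_10 = rejection_rate(t_null, 0.10)        # empirical size, nominal 10% test
power_10 = rejection_rate(t_alt, 0.10)        # empirical power, nominal 10% test
```

In an actual experiment, `t_null` would hold the group mean t-statistics from replications generated with $\beta = 2.0$ and `t_alt` those generated with $\beta = 1.9$, both tested against the null $\beta = 2.0$.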
IV. ESTIMATION ALGORITHM AND SOME EXAMPLES OF APPLICATIONS⁷
In this section we describe the algorithm for computing the panel FMOLS estimators and their associated test statistics, and then discuss a few examples of their use. In summary, any of the desired statistics can be computed by performing the following steps:
1. Estimate the panel regression and collect the residuals. Specifically, estimate the desired panel cointegration regression, making sure to include any desired intercepts or common time dummies, and then collect the residuals $\hat\mu_{i,t}$ for each member of the panel. If the slopes are homogeneous, the common time dummy effects can be eliminated more simply by first demeaning the data across the cross section, period by period, before estimating the regression. Thus, construct $y_{it} - \bar y_t$ and $x_{it} - \bar x_t$ for each variable, where $\bar y_t = N^{-1}\sum_{i=1}^{N} y_{it}$ and $\bar x_t = N^{-1}\sum_{i=1}^{N} x_{it}$, before estimating the regression and carrying out the following steps.
2. Estimate the long run covariances and autocovariances of the errors. Use the estimated residuals from step 1, together with the differences of each of the regressors, to construct a vector error series $\hat\xi_{it} = (\hat\mu_{it}, \Delta x_{it}')'$. Note that the second element is a vector of dimension m, where m is the number of regressors. Now use any long run covariance matrix estimator,
such as the Newey-West (1987) estimator, to estimate the elements of the long run covariance $\Omega_i$ and the autocovariances $\Gamma_i$. This can be done by applying the estimator to the entire $(m+1)$-vector $\hat\xi_{it} = (\hat\mu_{it}, \Delta x_{it}')'$ to produce an $(m+1)\times(m+1)$ long run covariance matrix and autocovariance matrix. The elements of $\Omega_i$ and $\Gamma_i$ then correspond to partitions of these two matrices respectively. Specifically, the upper left scalar element of the $(m+1)\times(m+1)$ long run covariance matrix corresponds to $\Omega_{11i}$. The lower right $m\times m$ partition corresponds to $\Omega_{22i}$, the long run covariance among the regressors, and the remaining m elements in the first column, below $\Omega_{11i}$, correspond to $\Omega_{21i}$. Since the long run covariance matrix is symmetric, $\Omega_{12i} = \Omega_{21i}'$. The same mapping applies to the partitions of the $(m+1)\times(m+1)$ autocovariance matrix and the elements of $\Gamma_i$, except that, unlike $\Omega_i$, the autocovariance matrix $\Gamma_i$ is not symmetric, so $\Gamma_{12i} \neq \Gamma_{21i}'$, and these elements must be extracted from the corresponding column and row partitions separately. Once $\hat\Omega_i$ has been constructed, apply a Cholesky style triangularization to obtain the elements of the matrix $L_i$. Finally, we also need an estimate of the standard contemporaneous covariance matrix, $\Omega^{o}_{i}$, for the elements of $\hat\xi_{it}$, similarly partitioned.
3. Construct the estimator. We now have all of the pieces required to construct the estimators. Each estimator uses a serial correlation correction term, $\hat\gamma_i$, which can be constructed from the pieces obtained in step 2 as
$$\hat\gamma_i = \hat\Gamma_{21i} + \hat\Omega^{o}_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\hat\Gamma_{22i} + \hat\Omega^{o}_{22i}\right).$$
Next, using the elements of $L_i$, the expression
$$y^*_{it} = (y_{it} - \bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\,\Delta x_{it}$$
can be constructed from the original data. The final step is to construct the cross product terms between $y^*_{it}$ and $(x_{it} - \bar x_i)$. This is sufficient to compute either the point estimators or the associated t-statistics for any of the statistics. Two points are worth noting here. The difference between the panel 'within' dimension estimators and the group mean 'between' dimension estimators lies in the way the cross product terms are combined. For the 'within' dimension statistics, the cross product terms are summed over both the T and N dimensions, separately for the numerator and the denominator. For the group mean 'between' dimension statistics, the cross product terms are summed over the T dimension, separately for the numerator and denominator, and the resulting ratios are then summed over the N dimension. Consequently, the first point to note is that the algorithm as applied to the
group mean estimator describes the same steps that one would take to estimate N different conventional FMOLS estimators and then average them. The same is true for the group mean t-statistic. Thus, if one already has a routine for the conventional time series FMOLS estimator, the group mean panel FMOLS estimator is extremely simple and convenient to compute. The second point to note is that for the panel FMOLS 'within' dimension estimator we have used the member-specific estimates of $\Omega_i$, $\Gamma_i$, $\Omega^{o}_{i}$ and the terms derived from them to compute the weighted panel variances. But it is equally feasible to compute the unweighted panel variances by first averaging the values of $\Omega_i$, $\Gamma_i$ and $\Omega^{o}_{i}$ across members before applying the transformations. Whether the two treatments have much consequence for the estimate is likely to depend on how heterogeneous these values are across individual members. Next, we briefly describe a few examples of the use of these panel FMOLS estimators. One obvious application is to the exchange rate literature, and in particular the purchasing power parity literature. Long run absolute, or strong, purchasing power parity predicts that nominal exchange rates and aggregate price ratios among countries should be cointegrated with a unit cointegrating vector, so that the real exchange rate is stationary. However, panel unit root tests based on Levin & Lin (1993) have generally found mixed results; see for example Oh (1996), Papell (1997) and Wu (1996), among others. On the other hand, panel cointegration tests based on Pedroni (1995, 1997a) have generally rejected the null of no cointegration; see for example Canzoneri, Cumby & Diba (1996), Chinn (1997) and Taylor (1996), among others. By contrast, long run relative, or weak, purchasing power parity simply predicts that the nominal exchange rate and aggregate price ratios will be cointegrated, though not necessarily with a unit cointegrating vector.
The panel FMOLS estimators presented in this paper offer an obvious way to distinguish between these two hypotheses, and Pedroni (1996a, 1999) uses them to show that only the relative, weak form of purchasing power parity holds for a panel of post Bretton Woods floating exchange rates. The latter paper contrasts results for a parametric group mean DOLS estimator and a nonparametric group mean FMOLS estimator for the weak purchasing power parity test. In a similar spirit, Alexius & Nilson (2000), Canzoneri, Cumby & Diba (1996) and Chinn (1997) apply these panel FMOLS tests from Pedroni (1996a) to test the Samuelson-Balassa hypothesis that long run movements of real exchange rates are driven by differences in long run relative productivities among countries. The panel FMOLS tests have also been applied in the growth literature. Neusser & Kugler (1998) use the tests from Pedroni (1996a) to investigate the connection between financial development and growth. Kao,
Chiang & Chen (1999) use a panel FMOLS estimator and compare it to a panel DOLS estimator to investigate the connection between research and development expenditure and growth. Keller & Pedroni (1999) use the group mean panel estimator presented in this chapter to study the mechanism by which imported R&D affects growth at the industry level, and demonstrate the attractiveness of the more flexible form of the group mean estimator. Canning & Pedroni (1999) use the same group mean panel FMOLS test as a first step estimator to construct a test for the direction of long run causality between public infrastructure and long run growth. Finally, Pedroni & Wen (2000) use the group mean panel FMOLS estimator as a first step estimator in an overlapping generations model to identify the position of the U.S., Japanese and European economies relative to the golden rule, and the extent to which social security transfer programs can move economies closer to this position. This is just a brief summary of the application of these estimators to two literatures, the exchange rate and growth literatures; many other potential applications exist beyond these two.
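The three-step algorithm at the start of this section can be sketched for the group mean estimator as follows. This is a minimal illustration under stated assumptions, not the author's accompanying program: it assumes NumPy, a single regressor (m = 1), member-by-member OLS with an intercept in step 1, and a Bartlett-kernel Newey-West estimator with the lag rule of Section III in step 2. It takes $\Gamma_i = \sum_{j \ge 1} w_j\, T^{-1}\sum_t \xi_{it-j}\xi_{it}'$, so that $\Gamma_{21i}$ pairs lagged $\Delta x$ with current $\hat\mu$ as in (A1), and it standardizes each member t-ratio by the conditional long-run variance $\hat L_{11i}^2 = \hat\Omega_{11i} - \hat\Omega_{21i}^2/\hat\Omega_{22i}$, which is our choice. With a scalar regressor, $\hat L_{21i}/\hat L_{22i} = \hat\Omega_{21i}/\hat\Omega_{22i}$. All function names are hypothetical.

```python
import numpy as np


def newey_west_lag(T):
    """Lag truncation K = 4 (T/100)^(2/9), rounded to the nearest integer."""
    return int(np.floor(4.0 * (T / 100.0) ** (2.0 / 9.0) + 0.5))


def covariance_pieces(xi, K):
    """Return (Omega^o, Gamma, Omega) for a (T, k) array using Bartlett weights."""
    T = xi.shape[0]
    sigma0 = xi.T @ xi / T                             # contemporaneous Omega^o_i
    gamma = np.zeros_like(sigma0)
    for j in range(1, K + 1):                          # Gamma[1, 0]: lagged dx, current mu
        gamma += (1.0 - j / (K + 1.0)) * (xi[:-j].T @ xi[j:] / T)
    return sigma0, gamma, sigma0 + gamma + gamma.T     # last entry: long-run Omega_i


def fmols_member(y, x):
    """Conventional FMOLS for one member: scalar regressor plus intercept."""
    K = newey_west_lag(len(y))
    dx = np.diff(x)                                    # Delta x_t for t = 2..T
    y, x = y[1:], x[1:]                                # align levels with Delta x_t
    T = len(y)
    xd, yd = x - x.mean(), y - y.mean()
    mu = yd - (xd @ yd) / (xd @ xd) * xd               # step 1: OLS residuals
    sigma0, gamma, omega = covariance_pieces(np.column_stack([mu, dx]), K)  # step 2
    ratio = omega[1, 0] / omega[1, 1]                  # equals L21 / L22 here
    y_star = yd - ratio * dx                           # step 3: y*_it
    gamma_i = gamma[1, 0] + sigma0[1, 0] - ratio * (gamma[1, 1] + sigma0[1, 1])
    sxx = xd @ xd
    beta = (xd @ y_star - T * gamma_i) / sxx
    l11_sq = omega[0, 0] - omega[1, 0] ** 2 / omega[1, 1]  # conditional long-run variance
    return beta, sxx, l11_sq


def group_mean_fmols(Y, X, beta0, time_demean=False):
    """Group mean FMOLS estimate and t-statistic for (N, T) data arrays."""
    Y, X = np.asarray(Y, float), np.asarray(X, float)
    if time_demean:                                    # common time effects, homogeneous slopes only
        Y, X = Y - Y.mean(axis=0), X - X.mean(axis=0)
    res = [fmols_member(y, x) for y, x in zip(Y, X)]
    betas = np.array([r[0] for r in res])
    t_i = (betas - beta0) * np.sqrt([r[1] / r[2] for r in res])
    return betas.mean(), t_i.sum() / np.sqrt(len(res))


# Usage on a simulated panel with an endogenous regressor (corr(mu, eps) = 0.5).
rng = np.random.default_rng(2024)
N, T = 20, 200
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
Y, X = [], []
for _ in range(N):
    e = rng.multivariate_normal(np.zeros(2), cov, size=T)
    x = np.cumsum(e[:, 1])
    X.append(x)
    Y.append(rng.uniform(2.0, 4.0) + 2.0 * x + e[:, 0])
beta_gm, t_gm = group_mean_fmols(Y, X, beta0=2.0)
```

As the text notes, the group mean estimator is just the average of N conventional FMOLS estimates, which is why `group_mean_fmols` only loops over `fmols_member`; a 'within' version would instead pool the weighted cross products before forming a single ratio.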
V. DISCUSSION OF FURTHER RESEARCH AND CONCLUDING REMARKS
We have explored in this chapter methods for testing and making inferences about cointegrating vectors in heterogeneous panels based on fully modified OLS principles. When properly constructed to take account of potential heterogeneity in the idiosyncratic dynamics and fixed effects associated with such panels, these estimators have asymptotic distributions that are centered around the true value and free of nuisance parameters. Furthermore, our Monte Carlo simulations show that the t-statistic constructed from the between dimension group mean estimator in particular performs very well, in that it exhibits relatively little small sample size distortion. To date, the techniques developed in this study have been employed successfully in a number of applications, and it will be interesting to see whether the panel FMOLS methods developed here fare equally well in other settings. The area of research and application of nonstationary panel methods is expanding rapidly, and we take this opportunity to remark on a few issues of current and future research as they relate to the subject of this chapter. As we have already discussed, the between dimension group mean estimator has an advantage over the within dimension pooled estimators presented in this chapter in that it permits a more flexible alternative hypothesis that allows for heterogeneity of the cointegrating vector. In many cases it is not known a priori
whether heterogeneity of the cointegrating vector can be ruled out, and it would be particularly useful to test the null hypothesis that the cointegrating vectors are heterogeneous in such panels with heterogeneous dynamics. In this context, Pedroni (1998) provides a technique that allows one to test such a null hypothesis against the alternative that they are homogeneous, and demonstrates how the technique can be used to test whether convergence in the Solow growth model occurs to distinct or to common steady states for the Summers and Heston data set. Another important issue that is often raised for these types of panels pertains to the assumption of cross sectional independence, as per assumption 1.2 in this chapter. The standard approach is to use common time dummies, which in many cases are sufficient to deal with cross sectional dependence. However, in some cases common time dummies may not be sufficient, particularly when the cross sectional dependence is not limited to contemporaneous effects and is dynamic in nature. Pedroni (1997b) proposes an asymptotic covariance weighted GLS approach to deal with such dynamic cross sectional dependence for the case in which the time series dimension is considerably larger than the cross sectional dimension, and applies the panel fully modified form of the test to the purchasing power parity hypothesis using monthly OECD exchange rate data. Interestingly, for this particular application, taking account of such cross sectional dependencies does not appear to affect the conclusions, and it is possible that in many cases cross sectional dependence does not play as large a role as one might anticipate once common time dummies have been included, although this remains an open question. Another important issue is parametric versus nonparametric estimation of nuisance parameters.
Clearly, any of the estimators presented here can be implemented by taking care of the nuisance parameter effects either nonparametrically, using kernel estimators, or parametrically, for example using dynamic OLS corrections. Generally speaking, nonparametric estimation tends to be more robust, since one does not need to assume a specific parametric form. On the other hand, since nonparametric estimation relies on fewer assumptions, it generally requires more data than parametric estimation. Consequently, for conventional time series tests, when data are limited it is often worth making specific parametric assumptions. For panels, on the other hand, the greater abundance of data suggests an opportunity to take advantage of the greater robustness of nonparametric methods, though ultimately the choice may simply be a matter of taste. The Monte Carlo simulation results provided here demonstrate that, even in the presence of considerable heterogeneity, nonparametric correction methods perform very well for the group mean estimator and the corresponding t-statistic.
NOTES
1. The results in section 2 and appendix A first appeared in Pedroni (1996a). The Indiana University working paper series is available at http://www.indiana.edu/ iuecon/ workpaps/
2. In fact, the computer program which accompanies this paper also allows one to implement these tests for any arbitrary number of regressors. It is available upon request from the author at [email protected]
3. See Phillips & Moon (1999) for a recent formal study of the regularity conditions required for the use of sequential limit theory in panel data, and for a set of conditions under which sequential limits imply joint limits, including the case in which the long run variances differ among members of the panel.
4. These results are for the OLS estimator when the variables are cointegrated. A related stream of the literature studies the properties of the panel OLS estimator when the variables are not cointegrated and the regression is spurious. See for example Entorf (1997), Kao (1999), Phillips & Moon (1999) and Pedroni (1993, 1997a) on spurious regression in nonstationary panels.
5. A separate issue pertains to differences between the sample averages and the true population means. Since we are treating the asymptotics sequentially, this difference goes to zero as T grows large prior to averaging over N, and thus does not affect the limiting distribution. More generally, we would require the ratio N/T to go to zero as N and T grow large in order to ensure that these differences do not affect the limiting distribution. We return to this point in the discussion of the small sample properties in section 3.2.
6. Of course, this is not to say that all within dimension estimators will necessarily suffer from this particular form of size distortion, and it is likely that some forms of the pooled FMOLS estimator will be better behaved than others. Nevertheless, given the other attractive features of the between dimension group mean estimator, we focus here on reporting the very attractive small sample properties of this estimator.
7. I am grateful to an anonymous referee for suggesting this section.
ACKNOWLEDGMENTS
I thank especially Bob Cumby, Bruce Hansen, Roger Moon, Peter Phillips, Norman Swanson and Pravin Trivedi and two anonymous referees for helpful comments and suggestions on various earlier versions, and Maria Arbatskaya for research assistance. The paper has also benefitted from presentations at the June 1996 North American Econometric Society Summer Meetings, the April 1996 Midwest International Economics Meetings, and workshop seminars at Rice University-University of Houston, Southern Methodist University, The Federal Reserve Bank of Kansas City, U. C. Santa Cruz and Washington University. The current version of the paper was completed while I was a visitor at the Department of Economics at Cornell University, and I thank the members of the Department for their generous hospitality. A computer program
which implements these tests is available upon request from the author at
[email protected]
REFERENCES
Alexius, A., & Nilson, J. (2000). Real Exchange Rates and Fundamentals: Evidence from 15 OECD Countries. Open Economies Review, forthcoming.
Canning, D., & Pedroni, P. (1999). Infrastructure and Long Run Economic Growth. CAE Working paper, No. 99–09, Cornell University.
Canzoneri, M., Cumby, R., & Diba, B. (1996). Relative Labor Productivity and the Real Exchange Rate in the Long Run: Evidence for a Panel of OECD Countries. NBER Working paper No. 5676.
Chinn, M. (1997). Sectoral Productivity, Government Spending and Real Exchange Rates: Empirical Evidence for OECD Countries. NBER Working paper No. 6017.
Chinn, M., & Johnson, L. (1996). Real Exchange Rate Levels, Productivity and Demand Shocks: Evidence from a Panel of 14 Countries. NBER Working paper No. 5709.
Entorf, H. (1997). Random Walks and Drifts: Nonsense Regression and Spurious Fixed-Effect Estimation. Journal of Econometrics, 80, 287–96.
Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37, 249–265.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels. Working paper, Department of Economics, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Kao, C., & Chen, B. (1995). On the Estimation and Inference of a Cointegrated Regression in Panel Data When the Cross-section and Time-series Dimensions Are Comparable in Magnitude. Working paper, Department of Economics, Syracuse University.
Kao, C., & Chiang, M. (1997). On the Estimation and Inference of a Cointegrated Regression in Panel Data. Working paper, Department of Economics, Syracuse University.
Kao, C., Chiang, M., & Chen, B. (1999). International R&D Spillovers: An Application of Estimation and Inference in Panel Cointegration. Oxford Bulletin of Economics and Statistics, 61(4), 691–709.
Keller, W., & Pedroni, P. (1999). Does Trade Affect Growth? Estimating R&D Driven Models of Trade and Growth at the Industry Level. Working paper, Department of Economics, Indiana University and University of Texas.
Levin, A., & Lin, F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite-sample Properties. Working paper, Department of Economics, U. C. San Diego.
Mark, N., & Sul, D. (1999). A Computationally Simple Cointegration Vector Estimator for Panel Data. Working paper, Department of Economics, Ohio State University.
Neusser, K., & Kugler, M. (1998). Manufacturing Growth and Financial Development: Evidence from OECD Countries. Review of Economics and Statistics, 80, 638–646.
Newey, W., & West, K. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703–708.
Newey, W., & West, K. (1994). Automatic Lag Selection in Covariance Matrix Estimation. Review of Economic Studies, 61, 631–653.
Obstfeld, M., & Taylor, A. (1996). International Capital-Market Integration over the Long Run: The Great Depression as a Watershed. Working paper, Department of Economics, U. C. Berkeley.
Oh, K. (1996). Purchasing Power Parity and Unit Root Tests Using Panel Data. Journal of International Money and Finance, 15, 405–418.
Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float. Journal of International Economics, 43, 313–32.
Pedroni, P. (1993). Panel Cointegration. Chapter 2 in Panel Cointegration, Endogenous Growth And Business Cycles in Open Economies, Columbia University Dissertation, Ann Arbor, MI: UMI Publishers.
Pedroni, P. (1995). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests, With an Application to the PPP Hypothesis. Working paper, Department of Economics, No. 95–013, Indiana University.
Pedroni, P. (1996a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper No. 96–020, Department of Economics, Indiana University.
Pedroni, P. (1996b). Human Capital, Endogenous Growth, & Cointegration for Multi-Country Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests, With an Application to the PPP Hypothesis; New Results. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997b). On the Role of Cross Sectional Dependency in Dynamic Panel Unit Root and Panel Cointegration Exchange Rate Studies. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1998). Testing for Convergence to Common Steady States in Nonstationary Heterogeneous Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1999). Purchasing Power Parity Tests in Cointegrated Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P., & Wen, Y. (2000). Government and Dynamic Efficiency. Working paper, Department of Economics, Cornell University and Indiana University.
Pesaran, H., & Smith, R. (1995). Estimating Long Run Relationships from Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–114.
Phillips, P., & Durlauf, S. (1986). Multiple Time Series Regressions with Integrated Processes. Review of Economic Studies, 53, 473–495.
Phillips, P., & Hansen, B. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P., & Loretan, M. (1991). Estimating Long-run Economic Equilibria. Review of Economic Studies, 58, 407–436.
Phillips, P., & Moon, H. (1999). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057–1112.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9–19.
Taylor, A. (1996). International Capital Mobility in History: Purchasing Power Parity in the Long Run. NBER Working paper No. 5742.
Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel-Data Test. Journal of Money, Credit and Banking, 28, 54–63.
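MATHEMATICAL APPENDIX A
Proposition 1.1: We establish here notation which will be used throughout the remainder of the appendix. Let $Z_{it} = Z_{it-1} + \xi_{it}$, where $\xi_{it} = (\mu_{it}, \varepsilon_{it}')'$. Then, by virtue of assumption 1.1 and the functional central limit theorem,
$$T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}' \rightarrow \int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)' + \Gamma_i + \Omega^{o}_{i} \tag{A1}$$
$$T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\,\tilde Z_{it}' \rightarrow \int_{r=0}^{1}\tilde B(r,\Omega_i)\,\tilde B(r,\Omega_i)'\,dr \tag{A2}$$
for all i, where $\tilde Z_{it} = Z_{it} - \bar Z_i$ refers to the demeaned discrete time process and $\tilde B(r,\Omega_i)$ is demeaned vector Brownian motion with asymptotic covariance $\Omega_i$. This vector can be decomposed as $\tilde B(r,\Omega_i) = L_i'\,\tilde W_i(r)$, where $L_i = \Omega_i^{1/2}$ is the lower triangular decomposition of $\Omega_i$ and
$$\tilde W(r) = \begin{pmatrix} W_1(r) - \int_0^1 W_1(r)\,dr \\ W_2(r) - \int_0^1 W_2(r)\,dr \end{pmatrix}$$
is a vector of demeaned standard Brownian motion, with $W_{1i}$ independent of $W_{2i}$. Under the null hypothesis, the statistic can be written in these terms as
$$T\sqrt{N}\,(\hat\beta_{NT} - \beta) = \frac{\dfrac{1}{\sqrt{N}}\displaystyle\sum_{i=1}^{N}\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)_{21}}{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\,\tilde Z_{it}'\right)_{22}} \tag{A3}$$
Based on (A1), as $T \rightarrow \infty$, the bracketed term of the numerator converges to
$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} + \Gamma_{21i} + \Omega^{o}_{21i} \tag{A4}$$
the first term of which can be decomposed as
$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} = L_{11i}L_{22i}\left(\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\right) + L_{21i}L_{22i}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right) \tag{A5}$$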
Fully Modified OLS for Heterogeneous Cointegrated Panels
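For the distribution of the estimator to be unbiased, the expected value of the expression in (A4) must be zero. The expected value of the first bracketed term in (A5) is zero. The expected value of the second bracketed term, however, is

$$E\left[L_{21i}L_{22i}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right)\right] = -\frac{1}{2}\,L_{21i}L_{22i} \qquad (A6)$$

This follows because the Itô integral $\int W_{2i}\,dW_{2i}$ has mean zero, while $E\left[W_{2i}(1)\int W_{2i}\right] = \int_0^1 r\,dr = \tfrac{1}{2}$.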
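The −1/2 in (A6) is the source of the second-order bias of pooled OLS. A quick numerical check is to approximate the Brownian functionals by scaled Gaussian random walks. The Python sketch below does this with NumPy only. Its grid size, number of paths and seed are arbitrary illustrative choices, not values from the text. It estimates the expectations of both bracketed terms in (A5): the first should be close to zero and the second close to −1/2, up to Monte Carlo and discretization error.

```python
import numpy as np


def bracket_means(n_paths=10_000, n_steps=200, seed=0):
    """Monte Carlo means of the two bracketed Brownian functionals in (A5).

    Standard Brownian motions W1, W2 on [0, 1] are approximated by scaled
    Gaussian random walks; stochastic integrals use left endpoints (Ito sums)
    and time integrals use left-endpoint Riemann sums.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    dW1 = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    dW2 = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    W1_end = dW1.sum(axis=1)                 # W1(1)
    W2 = np.cumsum(dW2, axis=1)              # W2 at right endpoints
    W2_left = W2 - dW2                       # W2 at left endpoints
    int_W2 = W2_left.mean(axis=1)            # approx. int W2 dr
    first = (W2_left * dW1).sum(axis=1) - W1_end * int_W2
    second = (W2_left * dW2).sum(axis=1) - W2[:, -1] * int_W2
    return first.mean(), second.mean()


first_mean, second_mean = bracket_means()
print(f"E[first bracket]  ~ {first_mean:+.3f}  (theory: 0)")
print(f"E[second bracket] ~ {second_mean:+.3f}  (theory: -1/2)")
```

Multiplying the second mean by $L_{21i}L_{22i}$ gives the bias term in (A6). That term vanishes only when $L_{21i} = 0$.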
Thus, since the asymptotic covariance matrix $\Omega_i$ must have positive diagonal elements, the expected value of expression (A4) will be zero only if $L_{21i} = \Gamma_{21i} = \Omega^0_{21i} = 0$. This corresponds to strict exogeneity of the regressors for all members of the panel. Even if such strict exogeneity does hold, the variance of the numerator will still be influenced by the parameters $L_{11i}$ and $L_{22i}$, which reflect the idiosyncratic serial correlation patterns of the individual cross-sectional members. Unless these are homogeneous across members of the panel, they will lead to non-trivial data dependencies in the asymptotic distribution. Proposition 1.2: Continuing with the same notation as above, the fully modified statistic can be written under the null hypothesis as
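$$T\sqrt{N}\,(\hat\beta^*_{NT}-\beta) = \frac{\dfrac{1}{\sqrt N}\displaystyle\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1,\,-\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right]}{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\hat L_{22i}^{-2}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\,\tilde Z_{it}'\right)_{22}} \qquad (A7)$$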
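Thus, based on (A1), as $T\to\infty$, the bracketed term of the numerator converges to

$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{22} + \Gamma_{21i} + \Omega^{0}_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\Gamma_{22i} + \Omega^{0}_{22i}\right) \qquad (A8)$$

This limit can be decomposed into the elements of $\tilde W_i$ as follows: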
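$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} = L_{11i}L_{22i}\left(\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\right) + L_{21i}L_{22i}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right) \qquad (A9)$$

$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{22} = L_{22i}^{2}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right) \qquad (A10)$$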
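The index r has been omitted for notational simplicity. Suppose a consistent estimator of $\Omega_i$ is employed, so that $\hat\Omega_i \to \Omega_i$. Consequently $\hat L_i \to L_i$ and $\hat\gamma_i \to \Gamma_{21i} + \Omega^0_{21i} - \frac{L_{21i}}{L_{22i}}\left(\Gamma_{22i} + \Omega^0_{22i}\right)$. Then the $L_{21i}L_{22i}$ terms in (A9) and (A10) cancel, and

$$\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1,\,-\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right] \;\to\; \int_0^1 W_{2i}(r)\,dW_{1i}(r) - W_{1i}(1)\int_0^1 W_{2i}(r)\,dr \qquad (A11)$$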
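The mean and variance of this expression are given by

$$E\left[\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\,dr\right] = 0 \qquad (A12)$$

$$E\left[\left(\int W_{2i}\,dW_{1i}\right)^{2} - 2\,W_{1i}(1)\int W_{2i}\,dr\int W_{2i}\,dW_{1i} + W_{1i}(1)^{2}\left(\int W_{2i}\,dr\right)^{2}\right] = \frac{1}{2} - 2\cdot\frac{1}{3} + \frac{1}{3} = \frac{1}{6} \qquad (A13)$$

respectively. This expression has now been rendered void of any idiosyncratic components associated with the original $\tilde B(r,\Omega_i)$. Therefore, by virtue of assumption 1.2 and a standard central limit theorem argument,

$$\frac{1}{\sqrt N}\sum_{i=1}^{N}\left(\int_0^1 W_{2i}(r)\,dW_{1i}(r) - W_{1i}(1)\int_0^1 W_{2i}(r)\,dr\right) \;\to\; N(0,\,1/6) \qquad (A14)$$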
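as $N\to\infty$. Next, consider the bracketed term of the denominator of (A3). Based on (A2), as $T\to\infty$ it converges to

$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,\tilde B(r,\Omega_i)'\,dr\right)_{22} = L_{22i}^{2}\left(\int_0^1 W_{2i}(r)^{2}\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^{2}\right) \qquad (A15)$$

Thus,

$$\hat L_{22i}^{-2}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\,\tilde Z_{it}'\right)_{22} \;\to\; \int_0^1 W_{2i}(r)^{2}\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^{2} \qquad (A16)$$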
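This limit has finite variance and a mean given by

$$E\left[\int_0^1 W_{2i}(r)^{2}\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^{2}\right] = \frac{1}{2} - \frac{1}{3} = \frac{1}{6} \qquad (A17)$$

Again, this expression has been rendered void of any idiosyncratic components associated with the original $\tilde B(r,\Omega_i)$. By virtue of assumption 1.2 and a standard law of large numbers argument, it follows that

$$\frac{1}{N}\sum_{i=1}^{N}\left(\int_0^1 W_{2i}(r)^{2}\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^{2}\right) \;\to\; \frac{1}{6} \qquad (A18)$$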
as $N\to\infty$. Thus, by iterated weak convergence and an application of the continuous mapping theorem, $T\sqrt N(\hat\beta^*_{NT}-\beta) \to N(0, 6)$ for this case, where heterogeneous intercepts have been estimated. The limiting variance is 6 because a numerator with variance 1/6 divided by a denominator converging to 1/6 has variance $(1/6)/(1/6)^2 = 6$.

Next, recognize that as $T\to\infty$, $T^{-1/2}\bar y_i \to \int_0^1 W_{1i}(r)\,dr \equiv \bar W_{1i}$ and $T^{-1/2}\bar x_i \to \int_0^1 W_{2i}(r)\,dr \equiv \bar W_{2i}$. Setting $\bar W_{1i} = \bar W_{2i} = 0$ for the case where $\bar y_i = \bar x_i = 0$ gives, as a special case of (A13) and (A17), the distribution in the case with no estimated intercepts. In this case the mean given by (A12) remains zero, but the variance in (A13) becomes 1/2 and the mean in (A17) also becomes 1/2. Thus, $T\sqrt N(\hat\beta^*_{NT}-\beta) \to N(0, 2)$ for this case, since $(1/2)/(1/2)^2 = 2$. Corollary 1.2: In terms of earlier notation, the statistic can be rewritten as:
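$$t^*_{\hat\beta_{NT}} = \frac{\dfrac{1}{\sqrt N}\displaystyle\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1,\,-\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right]}{\sqrt{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\hat L_{22i}^{-2}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\,\tilde Z_{it}'\right)_{22}}} \qquad (A19)$$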
The numerator converges to the same expression as in Proposition 1.2, and the term under the root in the denominator converges to the same value as in Proposition 1.2. Since the distribution of the numerator is centered around zero, the asymptotic distribution of $t^*_{\hat\beta_{NT}}$ is simply the distribution of the numerator divided by the square root of this value from the denominator. By (A13) and (A17),

$$E\left[\left(\int W_{2i}\,dW_{1i}\right)^{2} - 2\,W_{1i}(1)\int W_{2i}\int W_{2i}\,dW_{1i} + W_{1i}(1)^{2}\left(\int W_{2i}\right)^{2}\right] = E\left[\int W_{2i}^{2} - \left(\int W_{2i}\right)^{2}\right] \qquad (A20)$$

regardless of whether or not $\bar W_{1i}$ and $\bar W_{2i}$ are set to zero. Hence $t^*_{\hat\beta_{NT}} \to N(0,1)$ irrespective of whether $\bar x_i$ and $\bar y_i$ are estimated or not. Proposition 1.3: Write the statistic as:
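$$\bar t^*_{\hat\beta_{NT}} = \frac{1}{\sqrt N}\sum_{i=1}^{N}\hat L_{11i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1,\,-\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right]\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\,\tilde Z_{it}'\right)_{22}^{-1/2} \qquad (A21)$$

Then the first bracketed term converges to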
$$L_{11i}L_{22i}\left(\int_0^1 W_{2i}(r)\,dW_{1i}(r) - W_{1i}(1)\int_0^1 W_{2i}(r)\,dr\right) \sim N\!\left(0,\; L_{11i}^{2}L_{22i}^{2}\left(\int_0^1 W_{2i}(r)^{2}\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^{2}\right)\right) \qquad (A22)$$

This holds conditionally on $W_{2i}$, by virtue of the independence of $W_{2i}(r)$ and $dW_{1i}(r)$. The second bracketed term converges to
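$$\frac{1}{L_{22i}}\left(\int_0^1 W_{2i}(r)^{2}\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^{2}\right)^{-1/2} \qquad (A23)$$

Taken together, for $\hat L_i \to L_i$, (A21) becomes a standardized sum of i.i.d. standard normals, regardless of whether or not $\bar W_{1i}$ and $\bar W_{2i}$ are set to zero. Thus $\bar t^*_{\hat\beta_{NT}} \to N(0,1)$ by a standard central limit theorem argument, irrespective of whether $\bar x_i$ and $\bar y_i$ are estimated or not. Proposition 2.1: Insert the expression for $y^*_{it}$ into the numerator and use $y_{it} - \bar y_i = \beta(x_{it} - \bar x_i) + \mu_{it}$ to give

$$\hat\beta^*_{NT} = \frac{\displaystyle\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)\left(\mu_{it} - \frac{\hat L_{21i}}{\hat L_{22i}}\Delta x_{it}\right) - T\hat\gamma_i\right)}{\displaystyle\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^{2}} + \frac{\displaystyle\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(1 + \frac{\hat L_{11i}-\hat L_{22i}}{\hat L_{22i}}\right)\beta\sum_{t=1}^{T}(x_{it}-\bar x_i)^{2}}{\displaystyle\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^{2}} \qquad (A24)$$

The bracket $1 + (\hat L_{11i}-\hat L_{22i})/\hat L_{22i}$ equals $\hat L_{11i}/\hat L_{22i}$, so $\hat L_{22i}^{-2} = \hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(1 + \frac{\hat L_{11i}-\hat L_{22i}}{\hat L_{22i}}\right)$. Hence the last term in (A24) reduces to β, which gives the desired result.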
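The constants used above can be checked by simulation. These are 1/6 and 1/2 for the variance in (A13) and the mean in (A17), and hence the limiting variances 6 and 2 of $T\sqrt N(\hat\beta^*_{NT}-\beta)$. The Python sketch below again uses NumPy only, with arbitrary simulation settings. It draws independent W1 and W2 paths and forms the numerator functional of (A11) and the denominator functional of (A16), each with and without demeaning. The undemeaned version corresponds to setting $\bar W_{1i} = \bar W_{2i} = 0$. It then reports two ratios:

- var(numerator)/E[denominator]², which should be near 6 and 2 respectively;
- var(numerator)/E[denominator], which should be near 1, as for the t-statistic in Corollary 1.2.

```python
import numpy as np


def fm_limit_constants(n_paths=10_000, n_steps=200, seed=1):
    """Simulate the Brownian functionals behind (A11)-(A18).

    Returns a dict keyed by case ('demeaned' or 'no_intercept') holding the
    Monte Carlo variance of the numerator functional, the mean of the
    denominator functional and the implied limiting variances.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    dW1 = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    dW2 = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    W1_end = dW1.sum(axis=1)                     # W1(1)
    W2_left = np.cumsum(dW2, axis=1) - dW2       # W2 at left endpoints
    int_W2 = W2_left.mean(axis=1)                # approx. int W2 dr
    ito = (W2_left * dW1).sum(axis=1)            # approx. int W2 dW1
    int_W2_sq = (W2_left ** 2).mean(axis=1)      # approx. int W2^2 dr

    cases = {
        # intercepts estimated: demeaned functionals (theory: 1/6 and 1/6)
        "demeaned": (ito - W1_end * int_W2, int_W2_sq - int_W2 ** 2),
        # no intercepts: bar W1 = bar W2 = 0 (theory: 1/2 and 1/2)
        "no_intercept": (ito, int_W2_sq),
    }
    out = {}
    for name, (num, den) in cases.items():
        v, m = num.var(), den.mean()
        out[name] = {
            "var_num": v,
            "mean_den": m,
            "beta_variance": v / m ** 2,   # theory: 6 (demeaned) or 2
            "t_variance": v / m,           # theory: 1 in both cases
        }
    return out


for case, stats in fm_limit_constants().items():
    print(case, {k: round(float(x), 3) for k, x in stats.items()})
```

Conditional on $W_{2i}$, the squared numerator has expectation equal to the denominator functional, which is the discrete analogue of (A20). This is why the t-statistic variance is close to 1 in both cases.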
APPENDIX B

Table I. Small Sample Performance of Group Mean Panel FMOLS with Heterogeneous Dynamics
Case 1: 21i ~ U(0.0, 0.8)

 N     T     bias    std error   5% size   10% size
10    10   -0.058     0.115      0.282      0.362
10    20   -0.018     0.047      0.084      0.145
10    30   -0.009     0.029      0.061      0.110
10    40   -0.006     0.020      0.035      0.076
10    50   -0.004     0.016      0.027      0.062
10    60   -0.003     0.012      0.020      0.049
10    70   -0.002     0.010      0.016      0.044
10    80   -0.002     0.009      0.014      0.040
10    90   -0.002     0.008      0.014      0.038
10   100   -0.001     0.007      0.014      0.037
20    10   -0.034     0.079      0.291      0.378
20    20   -0.012     0.033      0.100      0.166
20    30   -0.006     0.020      0.076      0.132
20    40   -0.004     0.014      0.045      0.093
20    50   -0.003     0.011      0.039      0.081
20    60   -0.003     0.009      0.028      0.066
20    70   -0.002     0.007      0.026      0.059
20    80   -0.002     0.006      0.021      0.055
20    90   -0.002     0.006      0.020      0.050
20   100   -0.001     0.005      0.018      0.052
30    10   -0.049     0.061      0.386      0.470
30    20   -0.017     0.025      0.156      0.234
30    30   -0.009     0.015      0.107      0.177
30    40   -0.006     0.011      0.072      0.133
30    50   -0.004     0.008      0.059      0.118
30    60   -0.003     0.007      0.047      0.096
30    70   -0.003     0.006      0.039      0.086
30    80   -0.002     0.005      0.035      0.073
30    90   -0.002     0.004      0.032      0.077
30   100   -0.002     0.004      0.030      0.076
Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with = 2.0, 1i ~ U(2.0, 4.0), 11i = 22i = 1.0, 21i ~ U(–0.85, 0.85) and 11i ~ U(–0.1, 0.7), 12i ~ U(0.0, 0.8), 21i ~ U(0.0, 0.8), 22i ~ U(0.2, 1.0).
Table II. Small Sample Performance of Group Mean Panel FMOLS with Heterogeneous Dynamics
Case 2: 21i ~ U(–0.8, 0.0)

 N     T     bias    std error   5% size   10% size
10    10    0.082     0.132      0.422      0.498
10    20    0.041     0.058      0.234      0.324
10    30    0.025     0.037      0.187      0.268
10    40    0.016     0.027      0.137      0.213
10    50    0.012     0.021      0.115      0.185
10    60    0.009     0.017      0.091      0.155
10    70    0.007     0.014      0.087      0.151
10    80    0.006     0.012      0.078      0.140
10    90    0.005     0.011      0.072      0.135
10   100    0.005     0.010      0.063      0.120
20    10    0.093     0.092      0.581      0.648
20    20    0.043     0.042      0.352      0.447
20    30    0.026     0.027      0.265      0.361
20    40    0.017     0.020      0.205      0.294
20    50    0.012     0.015      0.158      0.242
20    60    0.009     0.012      0.130      0.211
20    70    0.007     0.010      0.117      0.194
20    80    0.006     0.009      0.109      0.181
20    90    0.005     0.008      0.103      0.170
20   100    0.004     0.007      0.090      0.156
30    10    0.070     0.071      0.563      0.630
30    20    0.033     0.032      0.339      0.433
30    30    0.020     0.020      0.259      0.352
30    40    0.013     0.015      0.196      0.289
30    50    0.009     0.011      0.152      0.236
30    60    0.007     0.009      0.131      0.211
30    70    0.006     0.008      0.113      0.190
30    80    0.005     0.007      0.103      0.175
30    90    0.004     0.006      0.096      0.164
30   100    0.003     0.005      0.087      0.156
Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with = 2.0, 1i ~ U(2.0, 4.0), 11i = 22i = 1.0, 21i ~ U(–0.85, 0.85) and 11i ~ U(–0.1, 0.7), 12i ~ U(–0.8, 0.0), 21i ~ U(–0.8, 0.0), 22i ~ U(0.2, 1.0).
Table III. Small Sample Performance of Group Mean Panel FMOLS with Heterogeneous Dynamics
Case 3: 21i ~ U(–0.4, 0.4)

 N     T     bias    std error   5% size   10% size
10    10    0.009     0.129      0.284      0.367
10    20    0.011     0.052      0.113      0.179
10    30    0.008     0.033      0.086      0.150
10    40    0.005     0.023      0.058      0.113
10    50    0.004     0.018      0.048      0.093
10    60    0.003     0.014      0.039      0.083
10    70    0.002     0.012      0.037      0.077
10    80    0.002     0.011      0.031      0.072
10    90    0.002     0.009      0.029      0.068
10   100    0.001     0.008      0.028      0.062
20    10    0.028     0.090      0.346      0.430
20    20    0.014     0.037      0.145      0.222
20    30    0.009     0.024      0.106      0.179
20    40    0.006     0.017      0.077      0.138
20    50    0.004     0.013      0.060      0.114
20    60    0.003     0.010      0.048      0.093
20    70    0.002     0.009      0.040      0.085
20    80    0.002     0.008      0.037      0.083
20    90    0.001     0.007      0.035      0.079
20   100    0.001     0.006      0.035      0.078
30    10    0.008     0.069      0.317      0.402
30    20    0.006     0.028      0.122      0.194
30    30    0.004     0.018      0.095      0.155
30    40    0.003     0.013      0.068      0.122
30    50    0.002     0.010      0.054      0.105
30    60    0.001     0.008      0.044      0.088
30    70    0.001     0.007      0.038      0.082
30    80    0.001     0.006      0.036      0.076
30    90    0.001     0.005      0.033      0.073
30   100    0.001     0.005      0.036      0.074
Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with = 2.0, 1i ~ U(2.0, 4.0), 11i = 22i = 1.0, 21i ~ U(–0.85, 0.85) and 11i ~ U(–0.1, 0.7), 12i ~ U(–0.4, 0.4), 21i ~ U(–0.4, 0.4), 22i ~ U(0.2, 1.0).
TESTING FOR COMMON CYCLICAL FEATURES IN NONSTATIONARY PANEL DATA MODELS

Alain Hecq, Franz C. Palm and Jean-Pierre Urbain

ABSTRACT

In this chapter we extend the concept of serial correlation common features to panel data models. Two motivations drive this analysis. The first is the need for a methodology to systematically study and test for common structures and comovements in panel data with autocorrelation present. The second is the increase in efficiency that pooling procedures can bring. We propose sequential testing procedures and study their properties in a small-scale Monte Carlo analysis. Finally, we apply the framework to the well-known permanent income hypothesis for 22 OECD countries, 1950–1992.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 131–160. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2

I. INTRODUCTION

In economics it is often of interest to test whether a set of time series moves together, that is, whether the series are driven by some common factors. The vast literature on cointegration has focused on long-run comovements of nonstationary time series. More recently, some authors have analyzed short-run comovements between stationary time series, or between first-differenced cointegrated I(1) series (see Tiao & Tsay, 1989; Engle & Kozicki, 1993; Gouriéroux & Peaucelle, 1993; Vahid & Engle, 1993; Vahid & Engle, 1997; Ahn, 1997).

Among these approaches, the concept of serial correlation common features (SCCF hereafter) introduced by Engle & Kozicki (1993) has proved useful. It means that stationary time series move together in the sense that there exist linear combinations of these variables that yield white noise processes. These common feature vectors are tools for analyzing short-run relationships between economic variables that are suggested by economic theory, such as:

- relative purchasing power parity (Gouriéroux & Peaucelle, 1993);
- the permanent income hypothesis (Campbell & Mankiw, 1990; Jobert, 1995);
- cross-country real interest rate differentials (Kugler & Neusser, 1993);
- real business cycle models (Issler & Vahid, 1996);
- convergence of economies (Beine & Hecq, 1997, 1998);
- Okun's Law (Candelon & Hecq, 2000).

Serial correlation common features imply the existence of a reduced number of common dynamic factors explaining short-run comovements in economic variables. A companion form of common feature models is the common factor representation, which has been used in macroeconomics for some decades (see e.g. Engle & Watson, 1981; Geweke, 1977; Lumsdaine & Prasad, 1997; Singleton, 1980). Beyond economic considerations, the existence of common features is likely to reduce, through the reduced-rank restrictions, the number of parameters to be estimated. In general, imposing common cyclical feature restrictions when they are appropriate will increase estimation efficiency (Lütkepohl, 1991) and forecast accuracy (Vahid & Issler, 1999). However, as for unit root and cointegration tests, the power of common cyclical feature procedures may be low in small samples (Beine & Hecq, 1999). The power of tests might be increased by relying on panel data instead of using only time series data. Consequently, in this paper we propose to extend these models by testing for serial correlation common features in a panel data framework.
To avoid confusion, note that standard panel data models with common parameter structures already imply a common feature structure, namely the one that allows the behavior of N individuals to be pooled. However, the poolability assumption often made in panels may be far too strong. An investigator may want to test which poolability restrictions are supported by the data and which have to be rejected. We propose to generalize the SCCF approach and apply it to search for common cyclical features in panel data. In particular, we investigate whether there exist linear combinations of the variables for individual or entity i that are white noise for all i. In other words, we ask whether the weights in these linear combinations are identical across all entities. Developing a methodology to analyze and test common cyclical features in panel data is of theoretical and practical importance. Common cyclical feature restrictions are less restrictive than the assumption of identical parameters across individuals usually made in panel data modeling.

Some purists might not speak of a panel for this type of analysis. Indeed, in the situations we are interested in, N will be relatively small compared to its value in usual panel data, and T is assumed large (with T → ∞ asymptotics). Many macroeconomic studies deal with 15 to 50 annual observations for 20 to 100 countries, regions, industries or large firms. In those cases, the border between pure panel analysis (N → ∞) and pure time series analysis (T → ∞) is fuzzy. Far from impoverishing panel data analysis, taking medium- or large-size time series into account raises new and interesting issues, such as testing for unit roots or cointegration in panel data. See inter alia Levin & Lin (1993), Pesaran & Smith (1995), Evans & Karas (1996a), Kao (1999), Pedroni (1997a) and Phillips & Moon (1999b); see Phillips & Moon (1999a) for the asymptotic theory, and the recent issue of the Oxford Bulletin of Economics and Statistics (1999).

The chapter is organized as follows. Section II provides an example of common features between consumption and income that is implied by economic theory and likely to be common to data for different countries. In Section III we review the concept of serial correlation common features. Section IV extends it to panel data. As we study differences and similarities in macroeconomic series for different countries, we concentrate our analysis on the fixed effects model (see Hsiao, 1986). Section V describes estimation procedures. In Section VI simulation results are reported. In Section VII we present an empirical analysis of the liquidity-constrained consumption model for 22 OECD countries and the G7. Section VIII concludes.
II. AN EXAMPLE OF COMMON FEATURES

To further motivate this chapter, consider the permanent income hypothesis (PIH hereafter) and the heterogeneous consumer model proposed by Campbell & Mankiw (1990, 1991). These authors consider two groups of agents who receive disposable incomes $y_{1t}$ and $y_{2t}$, respectively, in fixed proportions of total income, such that $y_{1t} = \lambda y_t$, $y_{2t} = (1-\lambda)y_t$ and $y_t = y_{1t} + y_{2t}$. Agents in the first group are subject to liquidity constraints and therefore consume their current income, while agents in the second group consume their permanent income. We get the following system:
$$c_{1t} = y_{1t} = \lambda y_t, \qquad c_{2t} = y^P_{2t} = (1-\lambda)\,y^P_t,$$
$$y_{1t} = y^P_{1t} + y^T_{1t}, \qquad y_{2t} = y^P_{2t} + y^T_{2t}, \qquad (1)$$

Here $c_{it}$ is the consumption of agent i, and $y^P_{it}$ and $y^T_{it}$ are the permanent and transitory components of the income of agent i, assumed to be I(1) and I(0), respectively. Aggregating over agents we get $c_t = y^P_{1t} + y^T_{1t} + y^P_{2t} = y^P_t + y^T_{1t}$, and thus:

$$c_t = y^P_t + \lambda y^T_t, \qquad y_t = y^P_t + y^T_t, \qquad (2)$$

This shows that aggregate consumption and income share the common trend $y^P_t$. A fraction λ of income accrues to individuals who consume their current income rather than their permanent income. For this reason Campbell & Mankiw (1990, 1991) label this the 'λ model'. If λ = 0 we recover the permanent income model.

To stress the common cycle component, take the first difference of aggregate consumption $c_t = c_{1t} + c_{2t}$. Substituting the shares of income in total income gives $c_t = \lambda y_t + (1-\lambda)y^P_t$, which in first differences can be written as:

$$\Delta c_t = \lambda\,\Delta y_t + (1-\lambda)\,\Delta y^P_t. \qquad (3)$$
Consequently, assuming that permanent income is a martingale, the consumption function can be tested by the regression $\Delta c_t = \lambda\Delta y_t + (1-\lambda)\varepsilon_t$. However, $\varepsilon_t$ is a martingale difference which is, by construction, not orthogonal to $\Delta y_t$. This equation therefore cannot be consistently estimated by OLS, but instrumental variables (IV) estimators are appropriate. With a few exceptions, such as Vahid & Engle (1993) and Jobert (1995), most empirical studies do not use the cointegrating vector as a valid instrument when testing equation (3) by IV. They may therefore suffer from an omitted variable problem. Vahid & Engle (1993) made the connection with the common feature hypothesis that $\varepsilon_t$ is white noise¹, with $[1, -\lambda]$ the associated normalized common feature vector. Empirical studies have shown that λ is usually significantly different from zero, with estimates in the range 0.3 to 0.5 for most countries. A pooled common feature test on panel data therefore seems appropriate, both to test for one short-run relationship common to a set of countries and to improve the power of common feature tests. Using the cross-section dimension in estimation could also yield substantial efficiency gains.
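The following Python/NumPy sketch illustrates why OLS fails here while IV with lagged instruments, including the cointegrating residual, works. It simulates the λ model under illustrative assumptions that are not taken from the text:

- permanent income is a Gaussian random walk;
- transitory income is i.i.d. Gaussian;
- λ = 0.4;
- the cointegrating vector (1, −1) is treated as known.

The sketch compares OLS and two-stage least squares (2SLS) estimates of λ in (3) and computes a Sargan-type n·R² statistic for the overidentifying restrictions.

```python
import numpy as np


def simulate_lambda_model(T=20_000, lam=0.4, seed=1):
    """Levels (c, y) from the lambda model of eq. (2); illustrative DGP only."""
    rng = np.random.default_rng(seed)
    y_perm = np.cumsum(rng.standard_normal(T + 1))   # y^P_t: random walk, I(1)
    y_tran = rng.standard_normal(T + 1)              # y^T_t: i.i.d., I(0)
    return y_perm + lam * y_tran, y_perm + y_tran     # c_t, y_t


def tsls(Y, X, Z):
    """Two-stage least squares of Y on X with instrument matrix Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    coef = np.linalg.lstsq(X_hat, Y, rcond=None)[0]
    return coef, Y - X @ coef                         # residuals use X, not X_hat


def sargan(resid, Z):
    """n * R^2 from regressing the 2SLS residuals on the instruments."""
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return len(resid) * r2


c, y = simulate_lambda_model()
dc, dy = np.diff(c), np.diff(y)   # dc[k] is Delta c at time k + 1
z = c - y                         # long-run deviation; vector (1, -1) assumed known
Y = dc[1:]                                                          # Delta c_t
X = np.column_stack([np.ones_like(Y), dy[1:]])                      # 1, Delta y_t
Z = np.column_stack([np.ones_like(Y), dy[:-1], dc[:-1], z[1:-1]])   # dated t-1

b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
b_iv, resid = tsls(Y, X, Z)
J = sargan(resid, Z)
df = Z.shape[1] - X.shape[1]      # number of overidentifying restrictions (2 here)
p_value = np.exp(-J / 2)          # chi-square survival function, valid for df = 2 only
print(f"OLS lambda: {b_ols[1]:.3f}, 2SLS lambda: {b_iv[1]:.3f}")
print(f"Sargan J = {J:.2f} on {df} df, p = {p_value:.3f}")
```

In this design OLS is pulled upward, because the permanent-income shock enters both $\Delta c_t$ and $\Delta y_t$. The lagged cointegrating residual, by contrast, is a strong instrument for the predictable transitory part of $\Delta y_t$.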
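III. COMMON FEATURES IN TIME SERIES

In the context of time series analysis, serial correlation common features means that there exist linear combinations of (stationary) economic time series which are white noise processes. Consider a cointegrated VAR of order p = 2 for consumption and income, with a reduced-rank autoregressive coefficient matrix, written in VECM form for t = 1, ..., T:

$$\begin{pmatrix}\Delta c_t\\ \Delta y_t\end{pmatrix} = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} + \begin{pmatrix}\delta_1\\ \delta_2\end{pmatrix}\begin{bmatrix}\gamma_{21} & \gamma_{22}\end{bmatrix}\begin{pmatrix}\Delta c_{t-1}\\ \Delta y_{t-1}\end{pmatrix} + \begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}\begin{bmatrix}\beta_1 & \beta_2\end{bmatrix}\begin{pmatrix}c_{t-1}\\ y_{t-1}\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix} \qquad (4)$$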
Here $\mu_1$ and $\mu_2$ are constant drift terms, and $[\varepsilon_{1t}, \varepsilon_{2t}]'$ is a bivariate white noise process with non-singular covariance matrix Σ. If consumption is chosen as the normalized variable, $(-\beta_2/\beta_1)$ is the long-run income elasticity.

At this stage a distinction can be made between a strong and a weak form reduced rank structure, as put forward by Hecq, Palm & Urbain (1997, 2000). The Strong Form Reduced Rank Structure (SF) is the original formulation proposed by Engle & Kozicki (1993), in which the long-run and short-run matrices share the same left null space. It corresponds to δ = α in system (4). In this case there exists a common feature vector $\tilde\beta = [1, -\lambda]'$ such that premultiplying expression (4) by $\tilde\beta'$ yields white noise. In the less restrictive Weak Form Reduced Rank Structure (WF), δ ≠ α. Instead, a linear combination of first differences in deviation from the long-run equilibrium is white noise:

$$\tilde\beta'\left[\begin{pmatrix}\Delta c_t\\ \Delta y_t\end{pmatrix} - \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} - \begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}\begin{bmatrix}\beta_1 & \beta_2\end{bmatrix}\begin{pmatrix}c_{t-1}\\ y_{t-1}\end{pmatrix}\right] = \tilde\beta'\begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}. \qquad (5)$$
Formal definitions of the strong and the weak form are given in Hecq, Palm & Urbain (1997, 2000). Those papers also analyze the consequences in terms of common cycles, as well as inference issues. Hecq et al. (1997) also consider a mixed form combining the strong and the weak form.

Common feature relationships give information on short-run comovements. These relationships may come from economic theory (relative purchasing power parity, PIH) or from stylized facts (convergence, Real Business Cycle (RBC) models). They give the dynamic common factor within the system, for instance $\gamma_{21}\Delta c_{t-1} + \gamma_{22}\Delta y_{t-1}$ in the WF case. The orthogonal complement of $\tilde\beta$, labelled $\tilde\beta_\perp$ (so that $\tilde\beta'\tilde\beta_\perp = 0$), gives the factor loading of the common dynamics in the equations, that is, $\tilde\beta_\perp = [\lambda, 1]'$ in system (4). These common dynamic factors should not be confused with common cycles.
Common cycles are defined within a specific trend-cycle decomposition as the stationary part of the time series left after removing the permanent components. Vahid & Engle (1993) show that the existence of s common feature vectors (of the SCCF or SF type) leads to n − s common cycles in the multivariate Beveridge–Nelson decomposition. Vahid & Engle (1997) extend this definition to nonsynchronous cycles. Hecq, Palm & Urbain (2000) propose a Beveridge–Nelson decomposition for the WF that allows for a reduced number of common cycles.

The weak form reduced rank structure will not be explicitly considered in the sequel, because we want to focus on extending the standard serial correlation common feature analysis to panel data. We use the terms 'common dynamic features', 'common cyclical features' and 'common dynamic factors' as synonyms for reduced rank structures in the short-run dynamics of the first-differenced VAR or the VECM. In the simple bivariate model (4), the serial correlation common feature hypothesis may also be written as moment conditions such as:

$$E\left[(\Delta c_t - \lambda\,\Delta y_t)\,W_t\right] = 0, \qquad (6)$$
Here E[·] is the expectation operator and $W_t = \{1, \Delta c_{t-1}, \dots, \Delta c_{t-k}, \Delta y_{t-1}, \dots, \Delta y_{t-l}, z_{t-1}\}$ is a set of instruments. It consists of a constant term, the lags of both variables and the deviation from the long-run relationship, $z_{t-1} \equiv c_{t-1} - (-\beta_2/\beta_1)\,y_{t-1}$.

Adopting a two-step approach,² there are two obvious ways to test for SCCF. The first is a canonical correlation analysis between consumption and income on the one hand and the set of instruments on the other. Non-significant squared canonical correlations reveal linear combinations that yield white noise processes. The second is to use generalized method of moments (GMM) type estimators that exploit the moment condition (6). A test of the overidentifying restrictions implied by (6) is then a test of serial correlation common features.

Canonical correlation estimation has the advantage that its results do not depend on how the moment conditions are normalized. It is also more convenient when testing for the number of common feature vectors. In this paper we nevertheless treat the problem in a GMM framework, for three reasons:

1. We have at most one common feature vector in a bivariate system.
2. This framework is more easily extended to panel data models.
3. The normalization imposed on IV, by giving one variable a coefficient equal to one, increases the power of the test compared with tests based on canonical correlations.³
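For the canonical correlation route, the squared canonical correlations between $(\Delta c_t, \Delta y_t)$ and the instruments are the eigenvalues of $S_{YY}^{-1}S_{YW}S_{WW}^{-1}S_{WY}$. The Python sketch below simulates the λ model under illustrative assumptions, none of which come from the text: random-walk permanent income, i.i.d. transitory income, λ = 0.4 and a known cointegrating vector (1, −1). It uses $W_t = \{1, \Delta c_{t-1}, \Delta y_{t-1}, z_{t-1}\}$ and computes the unadjusted statistic $-T\sum_{j=1}^{s}\log(1-\hat\rho_j^2)$ over the s smallest squared canonical correlations. Under SCCF the smallest one should be close to zero. The degrees-of-freedom expression in the code is our reading of the Vahid & Engle (1993) setup and should be checked against that source.

```python
import numpy as np


def squared_canonical_correlations(Y, W):
    """Sorted squared canonical correlations between the columns of Y and W.

    The constant instrument is handled by demeaning both blocks.
    """
    Y = Y - Y.mean(axis=0)
    W = W - W.mean(axis=0)
    S_yy, S_ww, S_yw = Y.T @ Y, W.T @ W, Y.T @ W
    M = np.linalg.solve(S_yy, S_yw @ np.linalg.solve(S_ww, S_yw.T))
    return np.sort(np.linalg.eigvals(M).real)


def sccf_statistic(rho2, T, s):
    """Unadjusted statistic -T * sum(log(1 - rho2)) over the s smallest values."""
    return -T * np.sum(np.log(1.0 - rho2[:s]))


rng = np.random.default_rng(3)
T, lam = 20_000, 0.4
y_perm = np.cumsum(rng.standard_normal(T + 1))   # illustrative permanent income
y_tran = rng.standard_normal(T + 1)              # illustrative transitory income
c, y = y_perm + lam * y_tran, y_perm + y_tran
dc, dy, z = np.diff(c), np.diff(y), c - y

Y = np.column_stack([dc[1:], dy[1:]])              # (Delta c_t, Delta y_t)
W = np.column_stack([dc[:-1], dy[:-1], z[1:-1]])   # instruments dated t-1
rho2 = squared_canonical_correlations(Y, W)
n, m, s = Y.shape[1], W.shape[1], 1
stat = sccf_statistic(rho2, len(Y), s)
df = s * (s + m - n)   # assumption: s^2 + s*m - s*n, i.e. 2 here
print("squared canonical correlations:", np.round(rho2, 4))
print(f"statistic for s = {s}: {stat:.2f} on {df} df")
```

The eigenvector associated with the smallest squared canonical correlation is proportional to $[1, -\lambda]'$. This illustrates why the result does not depend on a normalization choice.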
IV. EXTENSION TO PANEL DATA MODELS

Analyses comparing, for instance, the PIH with the 'λ model' frequently concentrate on one country, very often the USA. To support the generality of the theory, some authors extend their empirical investigation to several countries (Campbell & Mankiw, 1991; Evans & Karas, 1996b). However, it is difficult to claim that results for different countries are uncorrelated. A pure time series model such as a VAR with 2N endogenous variables in the bivariate case cannot be estimated with relatively few observations for a large number of individuals, so alternatives must be found.

One solution is to analyze the system under separation in common features (Hecq, Palm & Urbain, 1999), an extension of separation in cointegration (Granger & Haldrup, 1997; Konishi & Granger, 1993). Under separation in common features, the common feature matrix is block-diagonal, with each block corresponding to one individual i only. Treating the complete system under separation in common features avoids the efficiency loss of analyzing only the marginal model for individual i. This is because separation does not require the disturbance covariance matrix to be block-diagonal. This solution is, however, difficult to implement for more than two or three individuals.

We illustrate this point with a small Monte Carlo experiment, specified precisely in Section VI. Consider a DGP made of bivariate systems similar to (4), with δ = α (SCCF hypothesis), for two and five individuals respectively. The only cross-sectional relations are due to a non-diagonal disturbance covariance matrix. Complete separation in cointegration and in common features, as well as absence of bidirectional short-run Granger causality, are thus maintained.
We consider two approaches:

- Marginal model: using a standard canonical correlation framework (see inter alia Hecq, Palm & Urbain, 1997), we perform a serial correlation common feature analysis in the marginal model for the first individual, ignoring the disturbance cross-correlations.
- Complete system under separation in common features: we test the number of common feature vectors (s = 2 or s = 5) for each individual. We then constrain the common feature space to be block-diagonal (see Hecq, Palm & Urbain, 1999) and estimate the vector for the first individual.

Table 1 reports results for 5,000 replications: the median and the spread (interquartile range) of the bias, results for the χ² test statistics for the overidentifying restrictions implied by common features, and results for a small sample adjusted version (Hecq, 1999). Separation in common features holds at the level of the DGP. Even so, the marginal model shows some efficiency loss, as measured by the spread, relative to the full system from T = 25 onward for N = 2 and from T = 50 onward for N = 5. For smaller sample sizes, however, the full-system dispersion is too high. Its test statistics also reject too often the presence of two and five common feature vectors, respectively.

Table 1. Monte Carlo Results (Separated vs. Marginal Systems)

                             T = 10    T = 25    T = 50    T = 100
N = 2
Marginal   bias_0.5          -0.056    -0.026    -0.011    -0.005
           bias_0.75-0.25     0.310     0.155     0.104     0.068
           χ²(2)              14.64      7.56      6.30      4.86
           χ²_ss(2)            6.22      5.20      5.04      4.42
Separated  bias_0.5          -0.040    -0.027    -0.013    -0.007
           bias_0.75-0.25     0.441     0.138     0.090     0.059
           χ²(8)              70.98     18.36     10.16      6.66
           χ²_ss(8)           12.8       7.14      6.16      5.14
N = 5
Marginal   bias_0.5          -0.061    -0.025    -0.012    -0.006
           bias_0.75-0.25     0.299     0.152     0.100     0.069
           χ²(2)              14.14      7.82      6.30      5.58
           χ²_ss(2)            5.86      5.44      5.18      5.04
Separated  bias_0.5            n.a.    -0.019    -0.011    -0.007
           bias_0.75-0.25      n.a.     0.241     0.087     0.052
           χ²(14)              n.a.    99.76     62.88     25.18
           χ²_ss(14)           n.a.    35.04     15.26      9.38

Notes: bias_0.5 is the median bias; bias_0.75-0.25 is the interquartile range (spread) of the bias; χ²_ss is the small sample adjusted statistic (Hecq, 1999); n.a. means not reported.

These illustrative Monte Carlo results call for an extension to a (possibly nonstationary) panel common feature analysis. Let i = 1, ..., N index the different groups/entities/units, t = 1, ..., T the sample period, and j = 1, ..., n the variables of each group/entity. We assume that $X_{i,t}$, the n-dimensional vector of observed I(1) variables for entity i, is generated by a $p_i$-th order cointegrated VAR. This VAR can be expressed in error-correction form as follows:
$$\Delta X_{i,t} = \mu_i + \tau_t + \alpha_i\beta_i'X_{i,t-1} + \sum_{j=1}^{p_i-1}\Gamma_{i,j}\,\Delta X_{i,t-j} + \varepsilon_{i,t}, \qquad i = 1,\dots,N, \quad t = 1,\dots,T, \qquad (7)$$
In (7), the terms are as follows:

- $\mu_i$ denotes fixed individual effects;
- $\tau_t$ denotes a vector of deterministic time effects;
- $\alpha_i$ and $\beta_i$ are $n \times r_i$ matrices of full column rank, with $r_i < n$ the cointegrating rank;
- $\varepsilon_{i,t}$ is a disturbance.

The stacked vector $\varepsilon_t = (\varepsilon_{1,t}', \dots, \varepsilon_{N,t}')'$ is an $nN \times 1$ homoskedastic Gaussian mean innovation process relative to $\{X_{i,j},\ i = 1, \dots, N;\ j < t\}$. Its contemporaneous covariance matrix Σ is non-singular, and the (i, j)-th block of Σ is
$E(\varepsilon_{i,t}\varepsilon_{j,t}') = \Sigma_{i,j}$. One could also allow for random individual effects in expression (7). This would lead to an error-component structure of $\varepsilon_{i,t}$ similar to that used in the panel data literature. For system (7), we define a homogeneous SCCF panel model as follows:

Definition 1. A panel model is called a homogeneous panel common feature model if, for i = 1, ..., N, there exists an $(n \times s_i)$ matrix $\tilde\beta_i$ with the following properties. First, $\tilde\beta_i = \tilde\beta_j$ for all i, j = 1, ..., N. Second, its columns span the individual co-feature space. Third, $\tilde\beta_i'\Delta X_{i,t}$ (equal to $\tilde\beta_i'\varepsilon_{i,t}$ up to deterministic terms)
is an $s_i$-dimensional white noise process for each individual. This definition applies to the case where the individual co-feature matrices $\tilde\beta_i$, and hence their column ranks $s_i$, are the same across all individuals. A typical dynamic panel data model with fixed effects $\mu_i$ and deterministic time effects $\tau_t$ arises as a special case of (7). This happens when the individual parameters in (7), such as $\alpha_i$, $\beta_i$ and $\Gamma_{i,j}$, are the same across entities i (see e.g. Hoogstrate, 1998). To clarify the hypotheses underlying the panel common feature restrictions, the next subsection follows Groen & Kleibergen (1999) for panel cointegration. We consider a model obtained by sequentially testing and imposing restrictions on a high-dimensional unrestricted VECM.

A. A Panel VECM Representation

We are interested in testing for cointegration and common serial correlation features in the n I(1) time series of vector $X_{i,t}$, within a dynamic model for N individuals i. Without loss of generality, we consider a large VECM with one lag in first differences, i.e. a VAR with two lags in levels. Higher-order dynamics follow immediately by replacing $\Gamma_{ij}$ with $\Gamma_{ij}(L)$ in (8), at the cost of heavier notation. For simplicity, we consider the model without time dummies. For t = 1, ..., T we may write the nN-dimensional system as:

$$\Delta X_t = \begin{pmatrix}\Pi_{11} & \cdots & \Pi_{1N}\\ \vdots & \ddots & \vdots\\ \Pi_{N1} & \cdots & \Pi_{NN}\end{pmatrix}X_{t-1} + \begin{pmatrix}\Gamma_{11} & \cdots & \Gamma_{1N}\\ \vdots & \ddots & \vdots\\ \Gamma_{N1} & \cdots & \Gamma_{NN}\end{pmatrix}\Delta X_{t-1} + u_t, \qquad (8)$$
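Here $\Delta X_t = (\Delta X_{1,t}', \dots, \Delta X_{N,t}')'$, $u_t = (u_{1,t}', \dots, u_{N,t}')'$ and $X_{t-1} = (X_{1,t-1}', \dots, X_{N,t-1}')'$ are vectors of dimension $nN \times 1$. More concisely,

$$\Delta X_t = \Pi_{ur}X_{t-1} + \Gamma_{ur}\Delta X_{t-1} + u_t, \qquad (9)$$

where $\Pi_{ur}$ and $\Gamma_{ur}$ are $nN \times nN$ matrices and $u_t = \mu + \varepsilon_t$. Here $\mu = (\mu_1', \dots, \mu_N')'$ and $\varepsilon_t = (\varepsilon_{1,t}', \dots, \varepsilon_{N,t}')'$ are $nN \times 1$ vectors, with $\varepsilon_t \sim N(0, \Sigma)$ and

$$\underset{nN\times nN}{\Sigma} = \begin{pmatrix}\Sigma_{11} & \cdots & \Sigma_{1N}\\ \vdots & \ddots & \vdots\\ \Sigma_{N1} & \cdots & \Sigma_{NN}\end{pmatrix}. \qquad (10)$$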
When $\Pi_{ur} = 0$, the system (9) is non-cointegrated. The approach presented can also be applied to non-cointegrated systems, in which the WF and SF reduced rank structures are identical. Without any zero block restrictions, the large unrestricted model (8) is not estimable in practice, so restrictions have to be imposed. We first describe cointegrating restrictions and then introduce serial correlation common feature restrictions.

1. Cointegrating Restrictions in a Panel VAR

We first consider restrictions on the long-run matrix $\Pi_{ur}$ in the unrestricted VECM. Two types (A and B) of sequences of hypotheses naturally arise in panel data. The hypotheses in a sequence can be tested either sequentially or jointly.
• A1: Absence of long-run Granger causality (see Granger & Lin, 1995) between the individual subgroups, i.e. $\Pi_{ur}$ is block-diagonal with $\Pi_{ij} = 0$ for i ≠ j.
• A2: Cointegration in the absence of long-run Granger causality, i.e. $\Pi_{ii} = \alpha_i\beta_i'$, with $\alpha_i$ and $\beta_i$ being $n \times r_i$ matrices of rank $r_i$, i = 1, ..., N.
• A3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$, i = 1, ..., N; $r = Nr_1$.
• B1: Cointegration, i.e. $\Pi_{ur} = \alpha\beta'$, with α and β being $nN \times r$ matrices of rank r.
• B2: Complete separation in cointegration (see Granger & Haldrup, 1997), i.e. α and β are block-diagonal with typical blocks $\alpha_i$ and $\beta_i$ respectively, of rank $r_i$. A typical block of $\Pi_{ur}$ is then $\alpha_i\beta_i'$ as defined in A2, and $r = \sum_{i=1}^{N} r_i$.
• B3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$, i = 1, ..., N; $r = Nr_1$.

When the first two sets of restrictions in either sequence hold, the following restricted structure arises:

$$\Delta X_t = \begin{pmatrix}\alpha_1\beta_1' & & 0\\ & \ddots & \\ 0 & & \alpha_N\beta_N'\end{pmatrix}X_{t-1} + \begin{pmatrix}\Gamma_{11} & \cdots & \Gamma_{1N}\\ \vdots & \ddots & \vdots\\ \Gamma_{N1} & \cdots & \Gamma_{NN}\end{pmatrix}\Delta X_{t-1} + u_t. \qquad (11)$$

When it is appropriate to add a restricted trend in the cointegration space, we replace $X_{t-1}$ by $X^*_{t-1} = (X_{t-1}', t)'$.
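Under A1 and A2 (equivalently B1 and B2), the long-run matrix in (11) is block-diagonal with blocks $\alpha_i\beta_i'$, so its rank is $r = \sum_i r_i$. The Python sketch below builds such a matrix from hypothetical per-individual loadings and cointegrating vectors; the numbers are arbitrary and chosen only for illustration. It imposes the homogeneity restriction A3/B3, $\beta_i = \beta_1$, and checks these properties. SciPy's `scipy.linalg.block_diag` would do the same job as the small helper defined here.

```python
import numpy as np


def block_diag(blocks):
    """Stack square blocks along the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out


def separated_long_run_matrix(alphas, betas):
    """Pi_ur under complete separation in cointegration: diag(alpha_i beta_i')."""
    return block_diag([a @ b.T for a, b in zip(alphas, betas)])


# Hypothetical example: N = 3 bivariate units (n = 2), r_i = 1,
# homogeneous cointegrating vector beta_1 = (1, -1)' as in A3/B3.
n, N = 2, 3
beta_1 = np.array([[1.0], [-1.0]])
alphas = [np.array([[-0.2], [0.1]]),
          np.array([[-0.5], [0.0]]),
          np.array([[-0.1], [0.3]])]
Pi = separated_long_run_matrix(alphas, [beta_1] * N)

print("shape:", Pi.shape, "rank:", np.linalg.matrix_rank(Pi))
print("off-diagonal block (1, 2) is zero:", np.allclose(Pi[0:n, n:2 * n], 0.0))
```

Under homogeneity, every block annihilates the vector (1, 1)', because $\beta_1'(1, 1)' = 0$. The rank equals $N r_1$.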
For N fixed, a likelihood ratio statistic for testing (11) against (8) can be obtained as the sum of two conditional likelihood ratio statistics testing the sets of restrictions {A1, A2} or
Testing for Common Cyclical Features
{B1, B2}. Homogeneity of panel cointegration can then be tested using a likelihood ratio test. A decomposition similar to {A1, A2} is proposed by Groen & Kleibergen (1999). The main problem with this approach is that under A1, i.e. absence of long-run Granger-causality, the usual tests have an unknown asymptotic distribution, because the possible presence of cointegration interferes with the block-diagonality of $\Pi^{ur}$. On the other hand, once the cointegrating rank in the unrestricted VECM has been fixed, a test statistic with separation as the null hypothesis has an asymptotic $\chi^2$ distribution. It is worth mentioning that although model (11) looks rather specific, it is less restrictive than the models used in the dynamic panel literature, where, in addition to separation in cointegration, the same parameter structure is quite frequently assumed to hold across individuals (see inter alia the overview in Phillips & Moon, 1999b). Occasionally, complete separation is relaxed to requiring only $\beta$ to be block-diagonal, leaving $\alpha$ unrestricted (Larsson & Lyhagen, 1999).

2. Common Feature Restrictions

Imposing serial correlation common feature and short-run Granger-noncausality restrictions, system (11) becomes

$$\Delta X_t = \begin{pmatrix} \alpha_1\beta_1' & & 0 \\ & \ddots & \\ 0 & & \alpha_N\beta_N' \end{pmatrix} X_{t-1} + \begin{pmatrix} \Gamma_{11} & & 0 \\ & \ddots & \\ 0 & & \Gamma_{NN} \end{pmatrix} \Delta X_{t-1} + u_t, \quad \text{with } \tilde\beta_i'\alpha_i = 0 \text{ and } \tilde\beta_i'\Gamma_{ii} = 0,\ i = 1, \ldots, N. \tag{12}$$

As for the cointegrating restrictions, this model may be obtained by imposing two of the following three hypotheses on (11).

• C1: Serial correlation common features: there exists an $(nN \times s)$ matrix $\tilde\beta$ such that $\tilde\beta'\Delta X_t$ is an $s$-dimensional white noise, with $s = \sum_{i=1}^{N} s_i$.
• C2: Absence of short-run Granger-causality between the individual subgroups: $\Gamma^{ur}$ is block-diagonal, i.e. $\Gamma_{ij} = 0$ for $i \neq j$.
• C3: Separation in common features: the matrix $\tilde\beta'$ is block-diagonal, with the $(s_i \times n)$ matrix $\tilde\beta_i'$ being a typical block on the main diagonal, $s = \sum_{i=1}^{N} s_i$.
• C4: Homogeneity of common features: $\tilde\beta_i = \tilde\beta_1$, $i = 1, \ldots, N$; $s = N s_1$.
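Under C1 to C4, the co-feature matrix annihilates both the adjustment coefficients and the short-run dynamics of each unit, so that $\tilde\beta'\Delta X_t = \tilde\beta'u_t$. The sketch below checks this algebraically for $\tilde\beta = I_N \otimes \tilde\beta_1$. It borrows the bivariate parameter values of the Monte Carlo design in Section VI and, purely for illustration, gives every unit the same $\alpha_i$ and $\Gamma_{ii}$; the choice $N = 4$ is arbitrary.

```python
import numpy as np

N = 4                                            # number of units (illustrative)
beta_tilde_1 = np.array([[1.0], [-0.5]])         # normalized co-feature vector (n x s_1)
alpha_i = np.array([[0.25], [0.5]])              # adjustment coefficients
beta_i = np.array([[1.0], [-1.0]])               # cointegrating vector
Gamma_ii = np.outer([0.5, 1.0], [0.6, -0.3])     # rank-one short-run matrix

# Block-diagonal long run (A1-A2) and short run (C2): unit blocks on the diagonal.
Pi = np.kron(np.eye(N), alpha_i @ beta_i.T)      # (nN x nN)
Gamma = np.kron(np.eye(N), Gamma_ii)             # (nN x nN)

# C3 + C4: homogeneous, block-diagonal co-feature matrix of size (nN x s), s = N * s_1.
beta_tilde = np.kron(np.eye(N), beta_tilde_1)

# C1 holds when beta_tilde' removes both the error-correction and lagged-difference terms.
ec_residual = np.abs(beta_tilde.T @ Pi).max()
sr_residual = np.abs(beta_tilde.T @ Gamma).max()
print(ec_residual, sr_residual)
```

Both printed maxima are zero up to floating-point error; replacing `beta_tilde_1` by a vector that does not annihilate $\alpha_i$ and $\Gamma_{ii}$ makes them nonzero.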
Actually, hypothesis C2 is implicit when one stacks individual VECMs. Restriction C3 is developed in Hecq, Palm & Urbain (1999) for the SCCF as well as for the weak form structure. Here again, a likelihood ratio statistic for testing model (12) against (11) can be obtained as the sum of two conditional likelihood ratio statistics testing either {C1, C2} or {C2, C3}. This means that we can first test for common cyclical features under the maintained hypothesis of short-run Granger-noncausality C2. Alternatively, we can first test for the absence of short-run causality and then test for SCCF, since both sequences of restrictions imply separation in common features. This result follows from Proposition 3.3 in Hecq, Palm & Urbain (1999), which states that under separation in cointegration and block-diagonality of the long-run matrix, the presence of common features implies that the co-feature matrix is block-diagonal.
V. GMM ESTIMATION

To test for common features in a time series context, we can choose between GMM estimators applied in a regression framework and a canonical correlation procedure based on maximum likelihood (ML) estimation. Both methods have advantages and drawbacks. ML estimation is fully efficient, and likelihood ratio tests are asymptotically most powerful. GMM estimators are easier to implement, but in general they are not fully efficient. In this section we present a GMM estimator that will be used in our empirical analysis of a bivariate system for consumption and income, for the case where at most one serial correlation common feature vector exists. For each individual, we partition $X_{i,t} = (y_{i,t}, z_{i,t})'$ and let the bivariate DGP be

$$\Delta y_{i,t} = \mu_i + \tilde\beta_i^{*}\,\Delta z_{i,t} + \varepsilon_{i,t}, \tag{13}$$

$$\Delta z_{i,t} = \alpha_i\,(y_i - \beta_i^{*} z_i)_{t-1} + \sum_{k=1}^{p_i-1} \gamma^{(i)}_{1,k}\,\Delta y_{i,t-k} + \sum_{k=1}^{p_i-1} \gamma^{(i)}_{2,k}\,\Delta z_{i,t-k} + \eta_{i,t}, \tag{14}$$

where the second equation, for $\Delta z_{i,t}$, is just one row of the VECM (11), with normalized cointegrating vector $\beta_i = [1, -\beta_i^{*}]'$. Both $\Delta y_{i,t}$ and $\Delta z_{i,t}$ are autocorrelated, as the disturbance $\varepsilon_{i,t}$ in (13) depends in general on lagged values of $\Delta y_{i,t}$ and $\Delta z_{i,t}$ and on the error correction mechanism. Under the null of serial correlation common features for individual $i$, $\varepsilon_{i,t}$ is a white noise process and the normalized SCCF vector is given by $\tilde\beta_i = [1, -\tilde\beta_i^{*}]'$.
In practice (Vahid & Engle, 1993, 1997), after the cointegration analysis in the first step, the GMM procedure is as follows. Regress the explanatory variables $\Delta z_t$ on the whole set of instruments (i.e. lags of $\Delta X_t$ and the lagged cointegrating relations) to obtain the best linear prediction $\Delta\hat z_t$. Then regress $\Delta y_t$ on a constant and $\Delta\hat z_t$; the estimated coefficient gives the candidate serial correlation common feature vector $\tilde\beta_i$. Finally, test the validity of the overidentifying restrictions using Hansen's (1982) $\chi^2$ test.
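The two-step procedure just described can be sketched as follows for a single bivariate unit. This is a minimal illustration assuming conditionally homoskedastic errors, so the overidentification statistic takes the Sargan form $T\,\hat u'P_H\hat u/\hat u'\hat u$ rather than a heteroskedasticity-robust Hansen J. The function name, sample size and seed are our own choices, and the data are simulated from the bivariate design of Section VI rather than taken from the empirical application.

```python
import numpy as np


def cofeature_2sls(dy, dz, W):
    """Two-step estimate of dy_t = c + b * dz_t + e_t using instruments W (lagged information).

    Returns (coefficients [c, b], Sargan-type J statistic, degrees of freedom).
    """
    T = dy.shape[0]
    H = np.column_stack([np.ones(T), W])          # instruments, including a constant
    X = np.column_stack([np.ones(T), dz])         # regressors: constant and dz_t
    # Step 1: best linear prediction of the regressors from the instruments.
    X_hat = H @ np.linalg.lstsq(H, X, rcond=None)[0]
    # Step 2: regress dy_t on the constant and the predicted dz_t.
    coef = np.linalg.lstsq(X_hat, dy, rcond=None)[0]
    # Step 3: overidentification test; residuals use the actual (not predicted) dz_t.
    resid = dy - X @ coef
    fitted = H @ np.linalg.lstsq(H, resid, rcond=None)[0]
    J = (fitted @ fitted) / (resid @ resid / T)   # T times R^2 of resid on H
    return coef, J, H.shape[1] - X.shape[1]


# Illustrative data: alpha = (0.25, 0.5)', beta = (1, -1)', co-feature vector (1, -0.5)'.
rng = np.random.default_rng(0)
T = 500
alpha, beta = np.array([0.25, 0.5]), np.array([1.0, -1.0])
Gamma = np.outer([0.5, 1.0], [0.6, -0.3])
mu = np.array([0.15, -0.05])
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=T + 2)
X = np.zeros((T + 2, 2))
dX = np.zeros((T + 2, 2))
for t in range(2, T + 2):
    dX[t] = mu + alpha * (beta @ X[t - 1]) + Gamma @ dX[t - 1] + eps[t]
    X[t] = X[t - 1] + dX[t]

# Instruments: dy_{t-1}, dz_{t-1} and the lagged cointegrating relation (p_i = 2, r_i = 1).
W = np.column_stack([dX[1:-1], X[1:-1] @ beta])
coef, J, df = cofeature_2sls(dX[2:, 0], dX[2:, 1], W)
print(coef[1], J, df)
```

Under the null, the slope estimates $\tilde\beta_i^{*}$ (0.5 in this design) and the statistic is compared with a $\chi^2$ distribution with $n(p_i-1) + r_i - (n-1) = 2$ degrees of freedom.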
A. Heterogeneous Independent Case

When the observations on individuals are assumed to be cross-sectionally independent, a joint test for the existence of one individual-specific (heterogeneous) common feature vector can be obtained by computing the $\chi^2$ statistics for the SCCF restrictions for each individual, $\xi_i \sim \chi^2(\nu_i)$, with the same number of variables for each $i$, but possibly with different dynamics and with or without cointegrating vectors. Since $s_i$ equals one, the number of degrees of freedom is $\nu_i = n(p_i - 1) + r_i - (n - 1)$. Using the standard central limit theorem for large $N$, we then have

$$\frac{\sum_{i=1}^{N} \xi_i - \nu}{(2\nu)^{1/2}} \overset{a}{\sim} N(0, 1), \qquad \text{where } \nu = \sum_{k=1}^{N} \nu_k. \tag{15}$$
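Statistic (15) only requires the individual $\chi^2$ values and their degrees of freedom. A minimal sketch, assuming cross-sectionally independent units; the function names are ours and the chi-square values in the example are made up for illustration.

```python
import math


def cofeature_df(n, p_i, r_i):
    """Degrees of freedom nu_i = n(p_i - 1) + r_i - (n - 1) of one unit's SCCF test (s_i = 1)."""
    return n * (p_i - 1) + r_i - (n - 1)


def panel_normal_stat(stats, dfs):
    """Statistic (15): (sum of chi2 statistics - nu) / sqrt(2 nu), with nu = sum of dfs.

    Approximately N(0, 1) for large N if units are independent; large positive
    values are evidence against the common feature restrictions.
    """
    nu = sum(dfs)
    z = (sum(stats) - nu) / math.sqrt(2.0 * nu)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail standard normal probability
    return z, p_value


# Four hypothetical units with n = 2, p_i = 2, r_i = 1 (so nu_i = 2 each).
dfs = [cofeature_df(2, 2, 1)] * 4
z, p = panel_normal_stat([3.1, 1.2, 4.5, 2.0], dfs)
print(round(z, 3), round(p, 3))
```

Here the sum of the statistics, 10.8, exceeds its null mean of 8 by 0.7 null standard deviations, so the common feature restrictions would not be rejected.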
This procedure is, however, not appropriate in the presence of cross-correlation, a problem pointed out inter alia by O’Connell (1998) in the case of panel unit root tests. The size distortions increase with N and with the cross-correlation. While these distortions are DGP-dependent, we observe empirical sizes of about 20% (nominal size 5%) for T = 25 and N = 10 as well as for T = 50 and N = 25, using a Monte Carlo experiment similar to the one presented in Section VI.⁴

B. Homogeneous and Heterogeneous Dependent Case

In most cases, disturbances across individuals will be at least contemporaneously correlated, i.e. $\Omega_{ij} \neq 0$ for some $i \neq j$, and/or $\Omega_{ii}$ is non-diagonal for some $i$. For instance, when testing for PPP in panel data, contemporaneous disturbance correlation arises because one country must serve as a benchmark. Similarly, for a given country, consumption and income cannot be assumed to be independent. One way to deal with this cross-country correlation is to include common time dummies in the panel. This solution was pursued by Pedroni (1997b) in the context of panel cointegration tests, but time dummies do not appear to capture all of the correlation, see O’Connell (1998). Another solution, which we use here, is to account for cross-correlation through GLS- or SUR-type corrections. These corrections require T > N, and the asymptotics we consider are mainly based on T → ∞ while N is fixed or at least grows at a lower rate than T. Assuming that all the variables in levels are I(1), we first test, for each individual i, for the existence of a cointegrating relationship using standard time
series-based procedures. If the null hypothesis of no cointegration is rejected, the cointegrating vector(s) are treated as known in the subsequent analysis. An alternative to time series-based cointegration analysis is a test procedure designed for cointegrated panel models, which could be more powerful. However, the asymptotic arguments used in panel cointegration analysis are mainly based on large-N asymptotics and independence across units, whereas we deal here with fixed N and allow for dependence across units. Furthermore, existing Monte Carlo simulations (see inter alia McCoskey & Kao, 1998b; Pedroni, 1997b) reveal problems when cross-correlation exists. The properties of common feature test statistics will also be affected by the outcome of the cointegration analysis. Indeed, if one erroneously imposes an identical homogeneous cointegrating matrix $\beta_i^{*}$ for all $i$, while for some $j$ cointegration does not hold or holds with a cointegrating matrix different from $\beta_i^{*}$, the probability of rejecting the SCCF restrictions will tend to increase. Before presenting the GMM estimator, we present the model under common features in general terms. Under separation C3, model (11) can be written as

$$\begin{pmatrix} \tilde\beta_1' & 0 & \cdots & 0 \\ 0 & \tilde\beta_2' & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \tilde\beta_N' \end{pmatrix}_{s \times nN} \begin{pmatrix} \Delta X_{1t} \\ \Delta X_{2t} \\ \vdots \\ \Delta X_{Nt} \end{pmatrix}_{nN \times 1} = \begin{pmatrix} \tilde\beta_1' & 0 & \cdots & 0 \\ 0 & \tilde\beta_2' & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \tilde\beta_N' \end{pmatrix}_{s \times nN} \begin{pmatrix} u_{1t} \\ u_{2t} \\ \vdots \\ u_{Nt} \end{pmatrix}_{nN \times 1} \tag{16}$$

with $s = \sum_{i=1}^{N} s_i$ and $u_t = (u_{1t}', u_{2t}', \ldots, u_{Nt}')'$ being IIN(0, Ω).
Under the homogeneity assumption C4, model (16) specializes to

$$(I_N \otimes \tilde\beta_1')\,\Delta X_t = (I_N \otimes \tilde\beta_1')\,u_t. \tag{17}$$
As in (13) and (14), we partition $\Delta X_{it}$ as $[\Delta y_{it}', \Delta z_{it}']'$, where $\Delta y_{it}$ and $\Delta z_{it}$ are $s_i \times 1$ and $(n - s_i) \times 1$ subvectors. The matrix $\tilde\beta_i$ is normalized (without loss of generality) as $\tilde\beta_i' = [I_{s_i}, -\tilde\beta_i^{*}]$. Under this normalization, system (16) can be expressed as
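$$\begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \\ \vdots \\ \Delta y_{Nt} \end{pmatrix}_{s \times 1} = \begin{pmatrix} \tilde\beta_1^{*} & 0 & \cdots & 0 \\ 0 & \tilde\beta_2^{*} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \tilde\beta_N^{*} \end{pmatrix}_{s \times (nN - s)} \begin{pmatrix} \Delta z_{1t} \\ \Delta z_{2t} \\ \vdots \\ \Delta z_{Nt} \end{pmatrix}_{(nN - s) \times 1} + \tilde\beta' u_t \tag{18}$$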
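or, more compactly,

$$\Delta y_t = B\,\Delta z_t + v_t, \tag{19}$$

with $\Delta y_t = [\Delta y_{1t}', \ldots, \Delta y_{Nt}']'$, $B = \operatorname{diag}(\tilde\beta_i^{*})$, $\Delta z_t = [\Delta z_{1t}', \ldots, \Delta z_{Nt}']'$, $v_t = \tilde\beta' u_t$ and $\tilde\beta = \operatorname{diag}(\tilde\beta_i)$. Transposing (19) and writing the model for a sample of $T$ observations, with $t$-th rows $\Delta y_t'$, $\Delta z_t'$ and $v_t'$ in $Y$, $Z$ and $V$, we get

$$\underset{T \times s}{Y} = \underset{T \times (nN - s)}{Z}\;\underset{(nN - s) \times s}{B'} + \underset{T \times s}{V}, \tag{20}$$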
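or, in vectorized form,

$$\underset{Ts \times 1}{y^{*}} = \underset{Ts \times (ns - \sum_i s_i^2)}{Z^{*}}\;\underset{(ns - \sum_i s_i^2) \times 1}{\theta} + \underset{Ts \times 1}{v^{*}}, \tag{21}$$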
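with $y^{*} = \operatorname{vec}(Y)$, $v^{*} = \operatorname{vec}(V)$, $Z^{*} = \operatorname{diag}(I_{s_i} \otimes Z_i)$, where $Z_i = [\Delta z_{itl}]$ is of dimension $T \times (n - s_i)$, $t = 1, \ldots, T$, $l = 1, \ldots, n - s_i$; and $\theta$ is a vector whose typical $i$-th subvector equals $\operatorname{vec}(\tilde\beta_i^{*\prime})$. Under the homogeneity assumption C4, $\tilde\beta_i^{*} = \tilde\beta_1^{*}$, $i = 1, \ldots, N$, and $s = N s_1$, so system (21) specializes to

$$y^{*} = Z^{*}_r\,\theta_r + v^{*}, \tag{22}$$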
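with the $[TNs_1 \times s_1(n - s_1)]$ matrix

$$Z^{*}_r = \begin{pmatrix} I_{s_1} \otimes Z_1 \\ I_{s_1} \otimes Z_2 \\ \vdots \\ I_{s_1} \otimes Z_N \end{pmatrix}$$

and the $[s_1(n - s_1) \times 1]$ vector $\theta_r = \operatorname{vec}(\tilde\beta_1^{*\prime})$. The parameter vectors $\theta$ and $\theta_r$ can be estimated by GMM provided we have a $(Ts \times k)$ matrix of instrumental variables $W$ such that $E(W'v^{*}) = 0$ and $k$ is equal to or larger than the number of unknown parameters in $\theta$ (or $\theta_r$). The GMM estimator solving $W'v^{*} = 0$ with weighting matrix $S$ is

$$\hat\theta_{GMM} = [Z^{*\prime}WS^{-1}W'Z^{*}]^{-1}Z^{*\prime}WS^{-1}W'y^{*}. \tag{23}$$

The optimal weighting matrix is $S = W'\Sigma W$, where $\Sigma = E(v^{*}v^{*\prime}) = I_T \otimes \Sigma_v$ and $\Sigma_v = \tilde\beta'\Omega\tilde\beta$. When $\Sigma$ is unknown, it has to be replaced by a consistent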
estimate. The asymptotic covariance matrix of $\hat\theta_{GMM}$ with optimal weighting matrix $S$ is

$$\operatorname{Var}(\hat\theta_{GMM}) = [Z^{*\prime}W(W'\Sigma W)^{-1}W'Z^{*}]^{-1}. \tag{24}$$

Under homogeneity C4, $\theta_r$ can be estimated by expression (23), replacing $Z^{*}$ by $Z^{*}_r$. When the number of instruments $k$ is strictly larger than the number of parameters in $\theta$ (or $\theta_r$), these overidentifying restrictions can be tested using the well-known minimum distance criterion

$$\min_{\theta}\,(v^{*\prime}W)(W'\Sigma W)^{-1}(W'v^{*}), \tag{25}$$
which has an asymptotic $\chi^2$ distribution with degrees of freedom equal to $k$ minus the number of estimated parameters. Some remarks on the choice of instruments are in order. The order $p_i$ of the VAR for each country $i$ can be determined using, for instance, information criteria. The lagged first differences $\Delta X_{i,t-j}$, $j = 1, \ldots, p_i - 1$, and the lagged long-run relations yield $n(p_i - 1) + r_i$ instruments $W_i$ for $Z_i$ in (19), where $r_i$ is the cointegrating rank of individual $i$; we then take $W = \operatorname{diag}(I_{s_i} \otimes W_i)$. As is well known, the OLS estimator from regressing $y^{*}$ on $\hat Z^{*}$, where $\hat Z^{*}$ is the projection of $Z^{*}$ on $W$, can be obtained as a GMM estimator by selecting $S = I_{Ts}$ in (23) and taking $W(W'W)^{-1}W'$ as instrument. Similarly, the GLS estimator from regressing $y^{*}$ on $\hat Z^{*} = W(W'\Sigma_*^{-1}W)^{-1}W'\Sigma_*^{-1}Z^{*}$, with $\Sigma_*$ being the disturbance covariance matrix of the (multivariate) regression of $Z^{*}$ on $W$, can be obtained from (23) by taking $S = \Sigma$ and using $W(W'\Sigma_*^{-1}W)^{-1}W'\Sigma_*^{-1}$ as instruments instead of $W$. In the empirical analysis in Section VII, we consider a fixed effects model because in the macroeconomic application we study the population and not a sample. Adding fixed effects to model (21) for the case we analyze, i.e. $s_i = 1$, $i = 1, \ldots, N$, and $n = 2$, yields

$$y^{*} = Z_\mu\mu + [Z_\lambda\lambda] + Z^{*}_r\theta_r + v^{*}, \tag{26}$$
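where $Z_\mu = \iota_T \otimes I_N$ and $Z_\lambda = I_T \otimes \iota_N$, with $\iota_T$ and $\iota_N$ being vectors of ones of dimension $T$ and $N$ respectively. Let $J_N$ denote an $N \times N$ matrix of ones, so that $Z_\lambda Z_\lambda' = I_T \otimes J_N$ and the projection matrix on $Z_\lambda$ is $I_T \otimes \bar J_N$, with $\bar J_N = J_N/N$. This matrix averages over individuals. Similarly, time means are defined by $Z_\mu Z_\mu' = J_T \otimes I_N$, and the projection matrix on $Z_\mu$ is $\bar J_T \otimes I_N$. It is shown in Baltagi (1995, p. 28) that

$$\hat\theta_{r,GMM} = (\hat Z_r^{*\prime}\,Q\Sigma^{-1}Q\,\hat Z^{*}_r)^{-1}\hat Z_r^{*\prime}\,Q\Sigma^{-1}Q\,y^{*}, \tag{27}$$

where $Q = I_{NT} - \bar J_T \otimes I_N$ for the model with individual effects only, and $Q = I_T \otimes I_N - \bar J_T \otimes I_N - I_T \otimes \bar J_N + \bar J_T \otimes \bar J_N$ when time dummies are present. The estimator (27) with $\hat Z^{*}_r = W(W'\Sigma_*^{-1}W)^{-1}W'\Sigma_*^{-1}Z^{*}_r$ will be referred to as the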
GLS-LSDV estimator. When the matrix $\Sigma$ is replaced by the identity matrix, a less efficient estimator arises, which will be denoted the LSDV estimator. The asymptotic covariance matrix of $\hat\theta_{r,GMM}$ with optimal weighting matrix $S$ is then given by

$$\operatorname{Var}(\hat\theta_{r,GMM}) = [\hat Z_r^{*\prime}\,QW(W'\Sigma W)^{-1}W'Q\,\hat Z^{*}_r]^{-1}. \tag{28}$$
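The following sketch constructs the two $Q$ matrices used in (27) for data stacked with time as the outer index and individuals as the inner index (matching $Z_\mu = \iota_T \otimes I_N$ and $Z_\lambda = I_T \otimes \iota_N$), verifies that they annihilate the individual and time dummies, and applies (27) with $\Sigma$ replaced by the identity (the LSDV case). The regression is hypothetical: a single regressor stands in for the already-projected $\hat Z^{*}_r$, and the true slope 0.5 is our choice.

```python
import numpy as np


def within_projectors(T, N):
    """Q matrices of (27), time outer / individuals inner.

    Returns (Q_individual_effects, Q_individual_and_time_effects).
    """
    I_T, I_N = np.eye(T), np.eye(N)
    Jbar_T, Jbar_N = np.full((T, T), 1.0 / T), np.full((N, N), 1.0 / N)
    Q1 = np.eye(T * N) - np.kron(Jbar_T, I_N)
    Q2 = (np.kron(I_T, I_N) - np.kron(Jbar_T, I_N)
          - np.kron(I_T, Jbar_N) + np.kron(Jbar_T, Jbar_N))
    return Q1, Q2


rng = np.random.default_rng(1)
T, N = 40, 5
Q1, Q2 = within_projectors(T, N)
Z_mu = np.kron(np.ones((T, 1)), np.eye(N))      # individual dummies, iota_T kron I_N
Z_lam = np.kron(np.eye(T), np.ones((N, 1)))     # time dummies, I_T kron iota_N

# Hypothetical pooled regression with individual and time effects and true slope 0.5.
x = rng.normal(size=T * N)
y = (Z_mu @ rng.normal(size=N) + Z_lam @ rng.normal(size=T)
     + 0.5 * x + 0.1 * rng.normal(size=T * N))

# (27) with Sigma = I: because Q is a symmetric idempotent matrix, Q Q = Q.
theta_lsdv = (x @ Q2 @ y) / (x @ Q2 @ x)
print(theta_lsdv)
```

Because $Q$ removes the fixed effects exactly, the LSDV slope is unaffected by the dummies; with $\Sigma \neq I$ one would insert $\Sigma^{-1}$ between the two $Q$ factors as in (27).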
A test of the validity of the overidentifying restrictions is obtained using (25) and is readily seen to be a test of the null hypothesis C4, i.e. of homogeneity of common features: $\tilde\beta_i = \tilde\beta_1$, $i = 1, \ldots, N$, with $s = N s_1$ and $s_i = s_1 = 1$, $i = 1, \ldots, N$. In this specific case, the number of degrees of freedom of the overidentifying restrictions test (25) is

$$\sum_{i=1}^{N} [n(p_i - 1) + r_i - (n - 1)] + (n - 1)(N - 1),$$

where $n$, $p_i$ and $r_i$ are respectively the number of variables, the number of lags and the number of cointegrating relations for each $i$. Note that the term $(n - 1)(N - 1)$ arises from the pooled estimation of the common feature vector: imposing a common co-feature vector reduces the number of parameters to be estimated by $(n - 1)(N - 1)$. More generally, the analysis extends naturally to the case $n > 2$, with $s_1 = 1, \ldots, n - 1$. Sequentially testing the validity of the corresponding overidentifying restrictions with (25) for $s_1 = 1, \ldots, n - 1$ provides a direct way to determine the number of common co-features in a GMM set-up, provided the co-feature matrix is first properly normalized as above. A somewhat similar use of GMM for determining the dimension of the common feature space, albeit in a pure time series context, is discussed in Vahid & Engle (1997). In the next section, we evaluate the merits of this analysis (for $s_i = s_1 = 1$, $i = 1, \ldots, N$) in a small Monte Carlo experiment.
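The degrees-of-freedom formula can be checked against the df columns reported in Tables 2, 4 and 5, where $p^{*} = p_i - 1$. A minimal sketch; the function name is ours.

```python
def overid_df(n, p, r):
    """Degrees of freedom of test (25) under C4 with s_i = s_1 = 1.

    n: number of variables per unit; p: list of VAR orders in levels (p_i);
    r: list of cointegrating ranks (r_i). The last term counts the parameters
    saved by pooling the co-feature vector across the N units.
    """
    N = len(p)
    per_unit = sum(n * (p_i - 1) + r_i - (n - 1) for p_i, r_i in zip(p, r))
    return per_unit + (n - 1) * (N - 1)


# Monte Carlo design of Table 2: n = 2, p_i = 2, r_i = 1.
print([overid_df(2, [2] * N, [1] * N) for N in (1, 2, 5, 10, 25)])  # [2, 5, 14, 29, 74]

# Table 4, homogeneous cases: 22 countries, p* = 1, 2, 3, cointegration imposed for all.
print([overid_df(2, [q + 1] * 22, [1] * 22) for q in (1, 2, 3)])    # [65, 109, 153]
```

The heterogeneous rows follow by setting $r_i = 1$ only for the countries with significant Engle-Granger statistics (nine of the 22 OECD countries, three of the G7).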
VI. MONTE CARLO SIMULATIONS

In this section we present some illustrative Monte Carlo evidence on the usefulness of the common feature test statistic (25) for panel data. The data are generated from a large VECM with both common feature and cointegrating restrictions. Under the null of reduced rank structures, the bivariate DGP for each of the N individuals has one cointegrating vector and a single common feature vector. It has the form:
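$$\begin{pmatrix} \Delta y_{i,t} \\ \Delta z_{i,t} \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} 0.25 \\ 0.5 \end{pmatrix} (1 \;\; -1) \begin{pmatrix} y_{i,t-1} \\ z_{i,t-1} \end{pmatrix} + \begin{pmatrix} 0.5 \\ 1 \end{pmatrix} (0.6 \;\; -0.3) \begin{pmatrix} \Delta y_{i,t-1} \\ \Delta z_{i,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1,t} \\ \varepsilon_{i2,t} \end{pmatrix},$$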
where the $\mu$'s are drawn from the uniform distributions $\mu_1 \sim U(0, 0.3)$ and $\mu_2 \sim U(-0.25, 0.15)$, so that $E(\mu_1) = 0.15$ and $E(\mu_2) = -0.05$. The normalized common feature vector is $\tilde\beta = (1, -0.5)'$ and the normalized cointegrating vector is simply $\beta = (1, -1)'$. For each individual $i$, $(\varepsilon_{i1,t}, \varepsilon_{i2,t})'$ is bivariate Gaussian with covariance matrix $\Omega_{ii}$. The contemporaneous cross-correlation matrices between individuals $i$ and $j$ are all equal to $\Omega_{ij}$, so that the panel VECM covariance matrix is given by (10) with

$$\Omega_{ii} = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}, \qquad \Omega_{ij} = \begin{pmatrix} 0.7 & 0.6 \\ 0.6 & 0.75 \end{pmatrix}.$$
We have added a heterogeneous structure increasing with N.⁵ Figures 1 and 2 illustrate a realization of the DGP for 10 individuals and two variables, and compare processes without (Fig. 1) and with (Fig. 2) this additional heteroscedasticity.

Fig. 1. A Realization of the DGP for 10 Individuals.

Fig. 2. A Realization of the DGP with Additional Heteroscedasticity.

From this DGP we see that, under the assumption of reduced rank, the short-run dynamic matrix (for each i) is simply

$$\begin{pmatrix} 0.30 & -0.15 \\ 0.60 & -0.30 \end{pmatrix},$$

while under the alternative we arbitrarily set one element to zero:

$$\begin{pmatrix} 0.30 & 0.00 \\ 0.60 & -0.30 \end{pmatrix}.$$
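A minimal sketch of this Monte Carlo DGP, under our reading of the design: the parameter values, $\Omega$ and the heteroskedasticity scheme of note 5 are those stated in the text, while the zero starting values, the burn-in length and the function names are our own assumptions (the authors' GAUSS routines are not reproduced here).

```python
import numpy as np


def panel_omega(N, hetero=False):
    """Covariance (10) of the design; optional heteroskedasticity Omega* = G ⊙ Omega (note 5)."""
    omega_ii = np.array([[1.0, 0.8], [0.8, 1.0]])
    omega_ij = np.array([[0.7, 0.6], [0.6, 0.75]])
    # Off-diagonal blocks equal omega_ij, diagonal blocks equal omega_ii.
    Omega = np.kron(np.ones((N, N)), omega_ij) + np.kron(np.eye(N), omega_ii - omega_ij)
    if hetero:
        g = 1.0 + 4.0 * np.arange(N)                   # g = (1, 5, 9, ...)'
        G = np.kron(np.outer(g, g), np.ones((2, 2)))   # G = gg' kron R
        Omega = G * Omega                              # elementwise (Hadamard) product
    return Omega


def simulate_panel(N, T, null=True, hetero=False, burn=50, seed=None):
    """Simulate the bivariate panel VECM; returns levels and differences, each of shape (T, N, 2)."""
    rng = np.random.default_rng(seed)
    alpha = np.array([0.25, 0.5])
    beta = np.array([1.0, -1.0])
    if null:
        Gamma = np.array([[0.30, -0.15], [0.60, -0.30]])  # reduced rank, annihilated by (1, -0.5)
    else:
        Gamma = np.array([[0.30, 0.00], [0.60, -0.30]])   # alternative: one element set to zero
    mu = np.column_stack([rng.uniform(0.0, 0.3, N), rng.uniform(-0.25, 0.15, N)])
    chol = np.linalg.cholesky(panel_omega(N, hetero))
    X = np.zeros((T + burn + 1, N, 2))
    dX = np.zeros_like(X)
    for t in range(1, T + burn + 1):
        eps = (chol @ rng.standard_normal(2 * N)).reshape(N, 2)  # row i = individual i
        ec = X[t - 1] @ beta                                     # error-correction terms, shape (N,)
        dX[t] = mu + np.outer(ec, alpha) + dX[t - 1] @ Gamma.T + eps
        X[t] = X[t - 1] + dX[t]
    return X[burn + 1:], dX[burn + 1:]


X, dX = simulate_panel(N=10, T=50, hetero=True, seed=0)
```

Under the null, $\Delta y_{i,t} - 0.5\,\Delta z_{i,t}$ equals a constant plus $\varepsilon_{i1,t} - 0.5\,\varepsilon_{i2,t}$ and is therefore white noise; under the alternative it picks up $0.15\,\Delta z_{i,t-1}$.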
We consider three sample sizes, T = 10, 25 and 50, and five values for the number of individuals, N = 1, 2, 5, 10 and 25. We report the median and the spread (interquartile range) of the bias of the GMM panel estimator, as well as the median of the standard deviation of $\hat\theta_{r,GMM}$. We also report the empirical size (nominal size 5%) and the empirical size-adjusted power of the overidentifying restrictions test statistic; df denotes the number of degrees of freedom. Because of the large computational time required, 5,000 replications were used for N = 1, 2, 5; 2,000 for N = 10 and 1,000 for N = 25. The results are presented in Table 2. The bias is small and decreases as T and/or N increase. The accuracy of the estimates, measured both by the spread and by the standard deviation of the
Table 2. Monte Carlo Results (GMM estimation and test statistic)

| N | T | Bias (median) | Bias (Q75–Q25) | $\sigma(\hat\theta_{r,GMM})$ (median) | $\chi^2$(df) | Size (%) | Size-adj. power (%) |
|---|---|---|---|---|---|---|---|
| 1 | 10 | –0.0123 | 0.2228 | 0.156 | (2) | 7.88 | 9.90 |
| 1 | 25 | –0.0101 | 0.1387 | 0.098 | (2) | 5.58 | 19.78 |
| 1 | 50 | –0.0067 | 0.0944 | 0.070 | (2) | 5.54 | 34.68 |
| 2 | 10 | –0.0136 | 0.1817 | 0.106 | (5) | 4.98 | 8.56 |
| 2 | 25 | –0.0069 | 0.1057 | 0.079 | (5) | 6.18 | 16.58 |
| 2 | 50 | –0.0034 | 0.0726 | 0.057 | (5) | 5.72 | 31.52 |
| 5 | 10 | –0.0045 | 0.1409 | 0.067 | (14) | 3.96 | 7.26 |
| 5 | 25 | –0.0044 | 0.0751 | 0.060 | (14) | 5.68 | 12.52 |
| 5 | 50 | –0.0021 | 0.0460 | 0.047 | (14) | 5.74 | 24.82 |
| 10 | 25 | –0.0022 | 0.0658 | 0.046 | (29) | 4.70 | 11.00 |
| 10 | 50 | –0.0020 | 0.0377 | 0.038 | (29) | 4.80 | 21.55 |
| 25 | 50 | 0.0002 | 0.0398 | 0.029 | (74) | 5.80 | 13.80 |
estimate, also increases with T and/or N. We interpret these illustrative findings as evidence in favor of the pooled estimator. No substantial size distortions are observed. Note, however, that the values of N retained in these simulations are clearly too small to assess the validity of a central limit theorem based on large-N asymptotics.
VII. EMPIRICAL ANALYSIS

The data we use are taken from the Penn World Tables Mark 5.6 (see Summers & Heston, 1991).⁶ Because of the homogeneity of their definitions, these data are extremely useful and have been used extensively in the empirical literature. However, the data are certainly not free of measurement error, because the price to pay for long series of homogeneous data for more than 150 countries is reliance on a set of hypotheses, approximations and interpolations. Because of both the quality of the data and the underlying theoretical motivation, we limit our analysis to 22 OECD countries for the sample period 1950–1992 (up to 1991 for Greece and 1990 for Portugal).⁷ The data extracted are Y = ‘RGDPL: Real GDP per capita (Laspeyres index) in 1985 international prices’ and C = ‘C: Real Consumption share of GDP in 1985 international prices’ × Y/100. This last operation is necessary to obtain consumption in levels rather than as a percentage of income.⁸ Figure 3 plots the 44 series, namely the consumption and income variables for the OECD countries.

Fig. 3. Consumption and Output Series for the 22 OECD Countries.

The figure also makes a case for having tools to model this information. Lower-case c and y denote the natural logarithms of C and Y respectively. Table 3 reports time series statistics for each country. The first column of Table 3 lists the countries in alphabetical order, together with the date of joining the OECD.⁹ Column 2 gives the quality ranking of the data as presented in Summers & Heston (1991); for the most part, the quality of the data is reasonable. Columns 3 and 4 give the value of the Augmented Dickey-Fuller unit root test for consumption and income respectively. All tests include both a constant and a trend. The number of lags necessary to whiten the residuals is given in parentheses. Columns 5 and 6 give, respectively, the value of the Engle-Granger Augmented Dickey-Fuller cointegration test for each country separately and the long-run elasticity, with consumption as the dependent variable. Column 7 gives the order $p_i$ of the VAR($p_i$) in levels, determined using the multivariate Hannan-Quinn criterion (HQC). These lags, as well as the presence of an error correction term, determine the instruments to be used in the common features test statistics. In Table 3, a ‘*’ indicates that the individual unit root or cointegration test statistic rejects the null at the 5% nominal level. It emerges that, except for
Table 3. Time Series Statistics (Individual Countries)

| Country (joined OECD) | Qual. | ADF $c_t$ | ADF $y_t$ | EG | $\hat\beta_i^{*}$ | HQC |
|---|---|---|---|---|---|---|
| Australia (1971) | A | –1.21(4) | –0.93(2) | –1.46(1) | 0.95 | 3 |
| Austria (1961) | A | –0.82(0) | –1.25(2) | –3.59(0)* | 1.00 | 1 |
| Belgium (1961) | A | –1.43(1) | –0.74(1) | –2.36(0) | 0.94 | 1 |
| Canada (1961) | A | –1.50(1) | –1.80(1) | –3.89(1)* | 1.00 | 1 |
| Denmark (1961) | A | –0.94(0) | –0.94(0) | –3.69(0)* | 0.82 | 1 |
| Finland (1969) | A | –2.48(1) | –0.20(2) | –1.69(3) | 0.98 | 4 |
| France (1961) | A | –0.11(2) | –0.04(1) | –1.96(0) | 0.98 | 2 |
| Germany (1961) | A | –2.18(2) | –3.10(2) | –1.69(2) | 1.07 | 2 |
| Greece (1961) | A | –0.58(0) | 0.01(0) | –0.79(0) | 0.97 | 1 |
| Iceland (1961) | B+ | –2.64(1) | –2.23(1) | –4.52(0)* | 1.04 | 1 |
| Ireland (1961) | A | –2.54(1) | –2.82(1) | –3.76(2)* | 0.81 | 1 |
| Italy (1961) | A | –0.61(1) | –0.77(1) | –1.86(1) | 1.09 | 4 |
| Japan (1964) | A | –0.91(0) | –0.48(1) | –4.75(1)* | 0.92 | 4 |
| Luxembourg (1961) | A | –1.45(1) | –3.32(4) | –2.16(4) | 1.34 | 4 |
| Netherlands (1961) | A | –0.71(2) | –0.20(2) | –3.07(1) | 1.08 | 4 |
| New Zealand (1973) | A | –2.26(0) | –1.52(0) | –5.93(0)* | 1.02 | 1 |
| Norway (1961) | A | –1.29(1) | –1.76(1) | –1.83(1) | 0.80 | 1 |
| Portugal (1961) | A | –3.54(3)* | –2.95(3) | –3.07(3) | 0.88 | 3 |
| Spain (1961) | A | –1.25(0) | –1.34(0) | –2.99(0) | 0.94 | 1 |
| Sweden (1961) | A | –0.70(1) | –0.30(1) | –3.58(1)* | 0.81 | 2 |
| Switzerland (1961) | B+ | 0.03(4) | –1.69(2) | –3.28(0) | 0.92 | 2 |
| Turkey (1961) | C | –3.26(2) | –3.48(0)* | –1.73(0) | 0.85 | 1 |
| UK (1961) | A | –3.61(1)* | –3.62(1)* | –2.13(0) | 1.04 | 3 |
| USA (1961) | A | –1.75(0) | –2.05(0) | –4.08(0)* | 1.15 | 2 |
Portugal, the UK and Turkey, we cannot reject the unit root hypothesis for consumption and income. Using the Engle-Granger cointegration test, the null hypothesis of non-cointegration is rejected for nine countries, with long-run elasticity $\beta_i^{*}$ close to 1.¹⁰ Consequently, we use the cointegrating vectors as instruments in six different versions: four homogeneous cases and two heterogeneous ones. We proceed in two steps. In the first step the cointegrating vectors are estimated; in the second step they are used as instruments to estimate the common feature vectors. The results are reported in Table 4. The homogeneous cases refer to a panel estimation of a common cointegrating vector, that is, the parameters are assumed to be the same across countries, and the contemporaneous disturbance correlation across countries and across variables for a given country is ignored. Absence of short-run Granger-causality between countries is assumed throughout steps 1 and 2.
Because most panel cointegration test statistics assume independence across individuals, we cannot, strictly speaking, rely on them. However, because the estimator of the cointegrating vectors is still consistent, we use it to obtain estimates for four different cases.
• As Table 3 shows, even when the absence of cointegration is not rejected, the elasticity is close to one. We first analyze a version in which we assume a homogeneous cointegrating relationship for all countries, with a coefficient $\beta^{*}$ equal to one (see the upper panel of Table 4). Similar results are obtained using Johansen's MLE-based procedure.
• A second version uses the group mean (GM) estimator of Pesaran et al. (1997), i.e. we average the cointegrating vectors over the 22 individuals.

Table 4.
Common Features within 22 OECD Countries

| Cointegration case | p* | $\hat\theta_{r,GM}$ | $N_{GM}$ | $\hat\theta_{r,GMM}$ | $\sigma(\hat\theta_{r,GMM})$ | Test | df | p-val |
|---|---|---|---|---|---|---|---|---|
| $\beta_i^{*} = 1$ (∀i) | 1 | 0.770 | 3.71 | 0.745 | 0.050 | 148.98 | 65 | < 0.001 |
| | 2 | 0.769 | 6.14 | 0.660 | 0.036 | 173.65 | 109 | < 0.001 |
| | 3 | 0.770 | 4.43 | 0.704 | 0.031 | 211.27 | 153 | 0.001 |
| | $p^{*}_i$ | — | — | 0.718 | 0.036 | 156.04 | 93 | < 0.001 |
| $\beta_i^{*} = \hat\beta_{GM} = 0.979$ (∀i) | 1 | 0.829 | 5.36 | 0.768 | 0.051 | 146.67 | 65 | < 0.001 |
| | 2 | 0.804 | 6.54 | 0.670 | 0.036 | 176.61 | 109 | < 0.001 |
| | 3 | 0.793 | 4.95 | 0.710 | 0.031 | 214.06 | 153 | < 0.001 |
| | $p^{*}_i$ | — | — | 0.728 | 0.036 | 156.92 | 93 | < 0.001 |
| $\beta_i^{*} = \hat\beta_{OLS} = 0.939$ (∀i) | 1 | 0.870 | 5.74 | 0.814 | 0.050 | 131.96 | 65 | < 0.001 |
| | 2 | 0.837 | 5.12 | 0.687 | 0.036 | 170.16 | 109 | < 0.001 |
| | 3 | 0.822 | 3.93 | 0.727 | 0.031 | 206.93 | 153 | 0.002 |
| | $p^{*}_i$ | — | — | 0.738 | 0.036 | 145.01 | 93 | < 0.001 |
| $\beta_i^{*} = \hat\beta_{LSDV} = 0.968$ (∀i) | 1 | 0.855 | 6.03 | 0.782 | 0.051 | 142.93 | 65 | < 0.001 |
| | 2 | 0.821 | 6.25 | 0.677 | 0.036 | 175.97 | 109 | < 0.001 |
| | 3 | 0.804 | 4.94 | 0.715 | 0.031 | 213.50 | 153 | 0.001 |
| | $p^{*}_i$ | — | — | 0.733 | 0.036 | 155.12 | 93 | < 0.001 |
| $\beta_i^{*} = \hat\beta_i^{*}$ (i with cointegration) | 1 | 0.814 | 6.89 | 0.782 | 0.053 | 138.45 | 52 | < 0.001 |
| | 2 | 0.726 | 6.16 | 0.647 | 0.036 | 158.74 | 96 | < 0.001 |
| | 3 | 0.755 | 4.46 | 0.696 | 0.031 | 210.03 | 140 | < 0.001 |
| | $p^{*}_i$ | — | — | 0.707 | 0.037 | 146.50 | 80 | < 0.001 |
| $\beta_i^{*} = 1$ (i with cointegration) | 1 | 0.865 | 1.59 | 0.810 | 0.056 | 115.25 | 52 | < 0.001 |
| | 2 | 0.784 | 3.89 | 0.682 | 0.037 | 144.00 | 96 | 0.001 |
| | 3 | 0.775 | 2.72 | 0.734 | 0.033 | 192.33 | 140 | 0.002 |
| | $p^{*}_i$ | — | — | 0.750 | 0.040 | 131.56 | 80 | < 0.001 |
• A third version uses the usual OLS estimator.
• The last one allows for intercept heterogeneity and is the usual LSDV estimator.

Note that the pooled FM-OLS estimator proposed by Pedroni (1997a), which assumes independence across units, gives a point estimate of 0.971 for the 22 OECD countries and 1.021 for the G7 countries, the latter not being significantly different from one. Both results are very close to those obtained with the LSDV and OLS estimators, so the results of the common cyclical feature analysis obtained with Pedroni's FM-OLS estimator are not reported. For the two heterogeneous cases, we impose cointegration for the nine countries for which the Engle-Granger ADF test is significant; in step 2, we use as instruments the cointegrating vectors of the countries for which we reject the null. Notice that Phillips-Hansen Fully Modified OLS estimation was also used to test formally the assumption of unit long-run elasticity. The null of unit long-run elasticity was rejected in all cases of cointegration except three (Austria, Canada and New Zealand). Two different cases are considered:
• For the nine countries, we take the estimated value of $\beta_i^{*}$ given by the long-run regression.
• We fix these values to 1.

The maximum lag length for a country is four, so that $p^{*} = p - 1 = 0$, 1, 2 or 3, depending on the country. The following cases are considered:
• p* is fixed uniformly to 1, 2 and 3 respectively;
• p* is fixed to the value determined using the HQ criterion.

Note that overestimating the lag length will certainly reduce the power of the test statistics (Beine & Hecq, 1999). The results of the two panel common feature statistics are presented. For the heterogeneous cases, the first two columns present the group mean estimates (denoted $\hat\theta_{r,GM}$) and the value of the normal test statistic ($N_{GM}$) in (15), which tests for the significance of one common feature vector.
The next columns present the common feature elasticity for the homogeneous dependent case (denoted $\hat\theta_{r,GMM}$), its standard error $\sigma(\hat\theta_{r,GMM})$, and the value of the test of the overidentifying restrictions implied by the common feature vector (column labelled Test), which is asymptotically $\chi^2$(df) under the null, with the column df giving its degrees of freedom. The final column, labelled p-val, reports the associated p-values. Note that in the second step we always allow for nonzero contemporaneous disturbance correlation. The estimated coefficients $\hat\theta_{r,GMM}$ and $\hat\theta_{r,GM}$ appear too high compared with a priori expectations. Moreover, we reject the null of a panel
common feature model for both test statistics. Table 5 presents the results for the G7, which are similar to those for the panel of 22 countries. However, in several situations we cannot reject the null of one homogeneous common feature vector. In these cases, we imposed the unlikely hypothesis of a homogeneous cointegrating vector with a lag order uniformly fixed to p* = 3. Finally, we note the implications for empirical modeling of a restriction linking the number of variables n to the numbers of cointegrating and common feature vectors. From Vahid & Engle (1993), Theorem 1, the common feature space and the cointegration space are linearly independent. Hence the number of common feature vectors (s) plus the number of cointegrating vectors (r) cannot exceed the number of variables (n): r + s ≤ n. In a panel context with no long- or short-run Granger causality, this has obvious but different implications depending on whether the common feature vectors and cointegrating vectors are homogeneous or heterogeneous.
Table 5. Common Features within G7 Countries

| Cointegration case | p* | $\hat\theta_{r,GM}$ | $N_{GM}$ | $\hat\theta_{r,GMM}$ | $\sigma(\hat\theta_{r,GMM})$ | Test | df | p-val |
|---|---|---|---|---|---|---|---|---|
| $\beta_i^{*} = \hat\beta_{LSDV} = 1$ (∀i) | 1 | 0.866 | 2.47 | 1.042 | 0.087 | 32.83 | 20 | 0.035 |
| | 2 | 0.763 | 2.37 | 0.856 | 0.060 | 53.70 | 34 | 0.017 |
| | 3 | 0.755 | 1.81 | 0.872 | 0.052 | 67.05 | 48 | 0.036 |
| | $p^{*}_i$ | — | — | 0.884 | 0.058 | 50.84 | 30 | 0.010 |
| $\beta_i^{*} = \hat\beta_{GM} = 1.035$ (∀i) | 1 | 0.893 | 1.64 | 1.021 | 0.082 | 31.51 | 20 | 0.048 |
| | 2 | 0.777 | 1.815 | 0.857 | 0.060 | 50.25 | 34 | 0.036 |
| | 3 | 0.766 | 1.49 | 0.878 | 0.052 | 62.75 | 48 | 0.075* |
| | $p^{*}_i$ | — | — | 0.892 | 0.057 | 46.22 | 30 | 0.029 |
| $\beta_i^{*} = \hat\beta_{OLS} = 1.023$ (∀i) | 1 | 0.882 | 1.75 | 1.036 | 0.084 | 32.06 | 20 | 0.043 |
| | 2 | 0.771 | 1.89 | 0.856 | 0.060 | 51.11 | 34 | 0.030 |
| | 3 | 0.762 | 1.51 | 0.876 | 0.052 | 63.87 | 48 | 0.062* |
| | $p^{*}_i$ | — | — | 0.890 | 0.057 | 47.84 | 30 | 0.021 |
| $\beta_i^{*} = \hat\beta_i^{*}$ (i with cointegration) | 1 | 0.818 | 6.02 | 0.894 | 0.074 | 49.07 | 16 | < 0.001 |
| | 2 | 0.710 | 3.58 | 0.723 | 0.053 | 52.66 | 30 | 0.006 |
| | 3 | 0.737 | 2.13 | 0.787 | 0.047 | 64.46 | 44 | 0.024 |
| | $p^{*}_i$ | — | — | 0.800 | 0.051 | 46.61 | 26 | 0.008 |
| $\beta_i^{*} = 1$ (i with cointegration) | 1 | 0.875 | 2.68 | 1.029 | 0.089 | 27.69 | 16 | 0.034 |
| | 2 | 0.753 | 2.60 | 0.859 | 0.062 | 47.49 | 30 | 0.022 |
| | 3 | 0.764 | 1.66 | 0.894 | 0.053 | 60.14 | 44 | 0.053* |
| | $p^{*}_i$ | — | — | 0.917 | 0.061 | 43.97 | 26 | 0.015 |
A misspecification of the number of homogeneous cointegrating vectors may, for instance, constrain the dimension of the homogeneous common feature space too heavily and lead to flawed inference regarding the existence of common features. A last remark seems in order. Although we can formally reject the existence of a common homogeneous co-feature relation in this OECD data set, our results do not per se imply the absence of SCCF for some of the countries taken individually.
VIII. CONCLUSION

In this chapter we extended serial correlation common feature analysis to nonstationary panel data models. Concentrating on the fixed effects model, we defined homogeneous panel common feature models and gave a series of steps for implementing the corresponding tests. We then applied this framework to investigate the liquidity constraints model for 22 OECD countries and for the G7 countries. At the 5% nominal level, we reject the presence of a panel common feature vector. From the empirical analysis we can draw several tentative conclusions. First, in a country-by-country analysis, there is evidence of cointegration between consumption and income for nine of the 22 countries in the sample (slightly less than 50%). The cointegrating vector appears to be homogeneous across these countries, with a long-run consumption elasticity close to one. Second, for the sample of 22 countries, the existence of one homogeneous SF (SCCF) common feature vector is rejected in most instances when using the test proposed in (15). For the sample of G7 countries, in several instances, the occurrence of a homogeneous SF common feature vector is not rejected. Notice that this restriction is obviously less restrictive when it only applies to seven countries. However, the p-values are quite low, and the non-rejection of the null hypothesis occurs when the model might be misspecified, in particular because we have maintained a homogeneous lag length of 3. Third, the overidentifying restrictions implied by the assumption of a homogeneous common feature vector are rejected in all instances in the sample of 22 countries. For the G7 countries, again, there is occasionally evidence in favor of the overidentifying restrictions. It is not surprising that the assumption of homogeneous common features is rejected more frequently than the assumption of homogeneous cointegration.
In the long-run consumption and income are closely linked to each other, short-run deviations are generally possible and can be realized through saving or borrowing.
Testing for Common Cyclical Features
Our model representation is not stricto sensu a dynamic panel, because only part of the dynamics is common to all individuals. However, it goes part of the way. Indeed, while no size distortions were found in our Monte Carlo results, the power of the test statistics could be increased by going a step further towards dynamic panel data models when the null hypothesis of a panel common cyclical feature model is not rejected. Conversely, when the null is rejected, it is not worth imposing further common restrictions. This rather points towards less restrictive models, such as heterogeneous or group-homogeneous models. A bootstrap procedure could be used to approximate the distribution of the test statistics. This is also perhaps the place to choose more flexible models, such as the nonsynchronous common cycle model (Vahid & Engle, 1997) or the weak form common feature analysis (Hecq, Palm & Urbain, 1997).
ACKNOWLEDGMENT Support from METEOR through the research project ‘Dynamic and Nonstationary Panels: Theoretical and Empirical Issues’ is gratefully acknowledged. The authors thank two anonymous referees and the coeditor for useful comments on a previous version of this paper. The usual disclaimer applies. The GAUSS routines and the data used in this paper are available from http://www.employ.unimaas.nl/j.urbain
NOTES 1. Note that Vahid & Engle (1997) have extended their framework to the case where a linear combination is an MA(q) process rather than a white noise. They labelled this model the non-synchronous common cycle. 2. The first step checks for the presence of cointegrating relationships. Then, given the estimated cointegration relations, the common feature analysis is carried out in a second step. An alternative is a joint estimation procedure that exploits both the cointegration and the common feature restrictions through a switching algorithm (Hansen & Johansen, 1998; Hecq, 1999). 3. See Anderson and Vahid (1996) for the connection between GMM and canonical correlation estimators. 4. Complete results are available upon request. 5. The operation is the following. Consider an N-dimensional vector with increment four, $g = (1, 5, 9, \ldots)'$. We form an $nN \times nN$ matrix $G = gg' \otimes R$, with $R$ an $n \times n$ matrix with all elements equal to 1. The heteroskedastic disturbance covariance matrix $\Omega^*$ is then given by $\Omega^* = G \odot \Omega$, with $\Omega$ given in (10) and $\odot$ the elementwise (Hadamard) product. 6. The data may be downloaded from several internet sites, such as http://www.nber.org/pwt56.html or http://datacentre.epas.utoronto.ca:5620/pwt. 7. For computational convenience, we balanced the panel in this study and did not consider Greece or Portugal.
ALAIN HECQ, FRANZ C. PALM & JEAN-PIERRE URBAIN
8. We did not consider here a slightly different model in which real government expenditures are subtracted from output. Indeed, as raised by Evans & Karras (1996b), the ‘λ model’ should be extended to account for the potential substitutability or complementarity between private and public goods. Without a fine distinction of the components of government expenditures, it might be desirable to extend the model with a third variable. It is also possible to consider a simple alternative model in which all public goods are substitutes for private ones, obtained by subtracting G from Y. 9. Other countries have since joined the OECD: Mexico in 1994, the Czech Republic in 1995, and Korea and Poland in 1996. We drop them because our data set ends in 1992. Also note that the OECD has its origin in the Organization for European Economic Cooperation, which grouped European countries. This organization was charged with administering United States aid under the Marshall Plan to reconstruct Europe after World War II. Consequently, for countries that did not participate in this project from the beginning, homogeneity of cointegration and/or common features might be rejected for that reason. 10. As noted in Section 4, the main part of the approach presented in this paper also applies to non-cointegrated systems.
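Note 5 describes how the heteroskedastic disturbance covariance matrix of the Monte Carlo design is built. The following is a minimal NumPy sketch of that construction. It makes three assumptions: the baseline covariance matrix from (10) is supplied as an $nN \times nN$ array, it is ordered country by country, and the function name `heteroskedastic_cov` is our own.

```python
import numpy as np


def heteroskedastic_cov(omega: np.ndarray, N: int, n: int) -> np.ndarray:
    """Scale an nN x nN covariance matrix as in note 5: G ⊙ omega with G = gg' ⊗ R."""
    g = 1.0 + 4.0 * np.arange(N)        # g = (1, 5, 9, ...)', increment four
    R = np.ones((n, n))                 # n x n matrix with all elements equal to 1
    G = np.kron(np.outer(g, g), R)      # block (i, j) of G equals g_i * g_j * R
    if omega.shape != G.shape:
        raise ValueError("omega must be an nN x nN matrix")
    return G * omega                    # elementwise (Hadamard) product


# Toy example: N = 3 countries, n = 2 variables per country, identity baseline
omega_star = heteroskedastic_cov(np.eye(6), N=3, n=2)
print(np.diag(omega_star))              # variances scaled by g_i^2 within each country block
```

Because $G$ is built blockwise, every country keeps its own within-block correlation structure. Its scale grows with $g_i g_j$.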
REFERENCES
Ahn, S. K. (1997). Inference of Vector Autoregressive Models with Cointegration and Scalar Components. Journal of the American Statistical Association, 92, 350–356.
Anderson, H., & Vahid, F. (1996). Testing Multiple Equation Systems for Common Nonlinear Components. Working paper, Department of Economics, Texas A&M University.
Banerjee, A. (Ed.) (1999). Testing for Unit Roots and Cointegration Using Panel Data: Theory and Applications. Oxford Bulletin of Economics and Statistics, 61, 607–629.
Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley.
Beine, M., & Hecq, A. (1997). Asymmetric Shocks Inside Future EMU. Journal of Economic Integration, 12, 131–140.
Beine, M., & Hecq, A. (1998). Codependence and Convergence, an Application to the EC Economies. Journal of Policy Modeling, 20, 403–426.
Beine, M., & Hecq, A. (1999). Inference in Codependence: Some Monte Carlo Results and Applications. Annales d’Économie et de Statistique, 54, 69–90.
Campbell, J. Y., & Mankiw, N. G. (1990). Permanent Income, Current Income, and Consumption. Journal of Business and Economic Statistics, 8, 265–279.
Campbell, J. Y., & Mankiw, N. G. (1991). The Response of Consumption to Income: A Cross-Country Investigation. European Economic Review, 35, 723–767.
Candelon, B., & Hecq, A. (2000). Stability of the Unemployment-Activity Relationship in a Codependent System. Applied Economics Letters, forthcoming.
Engle, R. F., & Kozicki, S. (1993). Testing for Common Features (with comments). Journal of Business and Economic Statistics, 11, 369–395.
Engle, R. F., & Watson, M. W. (1981). A One-Factor Multivariate Time Series Model of Metropolitan Wages. Journal of the American Statistical Association, 76, 545–565.
Evans, P., & Karras, G. (1996a). Convergence Revisited. Journal of Monetary Economics, 37, 249–265.
Evans, P., & Karras, G. (1996b). Private and Government Consumption With Liquidity Constraints. Journal of International Money and Finance, 2, 255–266.
Geweke, J. (1977). The Dynamic Factor Analysis of Economic Time Series. In: D. J. Aigner & A. S. Goldberger (Eds), Latent Variables in Socio-Economic Models. Amsterdam: North-Holland.
Gouriéroux, C., & Peaucelle, I. (1993). Séries codépendantes: application à l’hypothèse de parité du pouvoir d’achat. In: Macroéconomie, Développements Récents. Paris: Economica.
Granger, C. W. J., & Lin, J. L. (1995). Causality in the Long Run. Econometric Theory, 11, 530–536.
Granger, C. W. J., & Haldrup, N. (1997). Separation in Cointegrated Systems and P-T Decompositions. Oxford Bulletin of Economics and Statistics, 59, 449–463.
Greene, W. H. (1993). Econometric Analysis. New York: MacMillan.
Groen, J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of Vector Error Correction Models. Discussion Paper TI 99–055/4, Tinbergen Institute, Erasmus University Rotterdam.
Hamilton, J. D. (1994). Time Series Analysis. Princeton: Princeton University Press.
Hansen, L. P. (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50, 1029–1054.
Hansen, P. R., & Johansen, S. (1998). Workbook on Cointegration. Oxford: Oxford University Press.
Hecq, A. (1999). On the Usefulness of Considering Common Serial Features and Cointegrating Restrictions. Working paper, University of Maastricht RM/99/017.
Hecq, A., Palm, F. C., & Urbain, J. P. (1997). Testing for Common Cycles in VAR Models with Cointegration. Working paper, University of Maastricht RM/97/031 (revised 1998).
Hecq, A., Palm, F. C., & Urbain, J. P. (1999). Separation and Weak Exogeneity in Cointegrated VAR Models with Common Features. Mimeo, University of Maastricht.
Hecq, A., Palm, F. C., & Urbain, J. P. (2000). Permanent-Transitory Decomposition in VAR Models with Cointegration and Common Cycles. Oxford Bulletin of Economics and Statistics, forthcoming.
Hoogstrate, A. J. (1998). Dynamic Panel Data Models: Theory and Macroeconomic Applications. Ph.D. Thesis, University of Maastricht.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Mimeo, University of Cambridge.
Issler, J. V., & Vahid, F. (1996). Common Cycles in Macroeconomic Aggregates. Mimeo.
Jobert, T. (1995). Tendances et cycles communs à la consommation et au revenu: Implications pour le modèle de revenu permanent. Economie et Prévision, 121, 19–38.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Kugler, P., & Neusser, K. (1993). International Real Interest Rate Equalization: A Multivariate Time-Series Approach. Journal of Applied Econometrics, 8, 163–174.
Kunst, R., & Neusser, K. (1990). Cointegration in a Macroeconomic System. Journal of Applied Econometrics, 5, 351–365.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Konishi, T., & Granger, C. W. J. (1993). Separation in Cointegrated Systems. Working paper, Department of Economics, University of California-San Diego.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite Sample Properties. Working paper, Department of Economics, University of California-San Diego.
Larsson, R., & Lyhagen, J. (1999). Likelihood-Based Inference in Multivariate Panel Cointegration Models. Working paper 331, Stockholm School of Economics, SSE.
Lumsdaine, R. L., & Prasad, E. (1997). Identifying the Common Components in International Economic Fluctuations. NBER Working paper 5984.
Lütkepohl, H. (1991). Introduction to Multiple Time Series Analysis. Berlin: Springer Verlag.
McCoskey, S., & Kao, C. (1998a). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57–84.
McCoskey, S., & Kao, C. (1998b). A Monte Carlo Comparison of Tests for Cointegration in Panel Data. Mimeo.
O’Connell, P. (1998). The Overvaluation of Purchasing Power Parity. Journal of International Economics, 44, 1–19.
Pedroni, P. (1997a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pesaran, M. H., Shin, Y., & Smith, R. P. (1997). Pooled Estimation of Long-Run Relationships in Dynamic Heterogeneous Panels. Working paper, Department of Economics, University of Cambridge.
Pesaran, M. H., & Smith, R. P. (1995). Estimating Long-Run Relationships From Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some Recent Developments. Econometric Reviews, forthcoming.
Singleton, K. (1980). A Latent Time Series Model of the Cyclical Behavior of Interest Rates. International Economic Review, 21, 559–575.
Summers, R., & Heston, A. (1991). The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
Tiao, G. C., & Tsay, R. S. (1989). Model Specification in Multivariate Time Series. Journal of the Royal Statistical Society, Series B, 51, 157–213.
Vahid, F., & Engle, R. F. (1993). Common Trends and Common Cycles. Journal of Applied Econometrics, 8, 341–360.
Vahid, F., & Engle, R. F. (1997). Codependent Cycles. Journal of Econometrics, 80, 199–221.
Vahid, F., & Issler, J. V. (1999). The Importance of Common-Cyclical Features in VAR Analysis: A Monte-Carlo Study. Presented at ESEM99 in Madrid, Spain.
THE LOCAL POWER OF SOME UNIT ROOT TESTS FOR PANEL DATA Jörg Breitung ABSTRACT To test the hypothesis of a difference stationary time series against a trend stationary alternative, Levin & Lin (1993) and Im, Pesaran & Shin (1997) suggest bias adjusted t-statistics. Such corrections are necessary to account for the nonzero mean of the t-statistic when an OLS detrending method is used. In this chapter the local power of panel unit root statistics against a sequence of local alternatives is studied. It is shown that the local power of the test statistics is affected by two different terms. The first term represents the asymptotic effect on the bias due to the detrending method. The second term is the usual location parameter of the limiting distribution under the sequence of local alternatives. It is argued that both terms can offset each other, so that the test has no power against the sequence of local alternatives. These results suggest constructing test statistics based on alternative detrending methods. We consider a class of t-statistics that do not require a bias correction. The results of a Monte Carlo experiment suggest that avoiding the bias can improve the power of the test substantially.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 161–177. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2
I. INTRODUCTION In a panel data set, a variable $y_{it}$ is observed for cross section units $i = 1, \ldots, N$ in $t = 1, \ldots, T$ time periods. A well known problem with such data is unobserved heterogeneity (e.g. Hsiao (1986) and Baltagi (1995)). In a univariate time series context, heterogeneity may result in an individual specific mean and individual specific short run dynamics. For illustration, consider an autoregressive process of the form
$$y_{it} = \mu_i + \alpha_i y_{i,t-1} + \varepsilon_{it}, \qquad (1)$$
where the error term $\varepsilon_{it}$ is assumed to be uncorrelated across $i$ and $t$. In this model individual heterogeneity is represented by the individual specific parameters $\mu_i$, $\alpha_i$ and $\sigma_i^2 = E(\varepsilon_{it}^2)$. If there are no further assumptions on the parameters, the data for each cross section unit can be analyzed separately by running $N$ different regressions. In this case, we take no advantage of pooling the data and, thus, inference may be very inefficient. The other extreme is to ignore possible heterogeneity altogether and estimate a pooled regression with $\mu_1 = \cdots = \mu_N$, $\alpha_1 = \cdots = \alpha_N$ and $\sigma_1^2 = \cdots = \sigma_N^2$. Of course, ignoring heterogeneity in the data may result in biased estimates (e.g. Baltagi (1995) p. 3f). Traditional panel data analysis adopts a compromise between these two extremes and assumes that individual heterogeneity can be represented by an individual specific intercept $\mu_i$ alone. Furthermore, one often encounters additional assumptions on the individual effect $\mu_i$, for example that it is random and uncorrelated with the regressors. The latter model is known as the ‘random-effects model’. It is therefore not surprising that early work on tests for unit roots in panel data started from the Dickey-Fuller type regression with an individual specific intercept (e.g. Breitung (1992)). Levin & Lin (henceforth: LL) (1993) and Im, Pesaran & Shin (henceforth: IPS) (1997) consider more general models by allowing for individual specific short run dynamics and time trends. It is well known that the usual dummy variable estimator (or ‘within-group’ estimator) of dynamic models suffers from the so-called ‘Nickell bias’ (Nickell 1981). The same is true if individual specific time trends are estimated by using the dummy-variable approach. LL (1993) construct a bias adjusted t-statistic to test the null hypothesis of a unit root process. Unfortunately, bias adjusted test statistics for the model with a constant or a time trend suffer from a severe loss of power.
For example, the power of the LL (1993) test without an intercept (and thus without the need to correct for the Nickell bias) against a stationary alternative with an autoregressive coefficient of 0.9 is virtually unity for N = 25 and T = 25. For the bias adjusted test statistic in the model with an individual specific intercept (trend), the power against the same alternative drops to 0.45 (0.25). Furthermore, IPS (1997) observe a serious size bias if the bias adjusted LL statistic is augmented with lagged differences.
The Local Power of Some Unit Root Tests
If there is only a constant in the model, the problem is easily resolved by subtracting the first observation instead of the mean. As argued in Schmidt & Phillips (1992), the first observation is the best estimator of the constant under the hypothesis of a random walk. Furthermore, subtracting the first observation instead of the mean avoids the Nickell bias and, therefore, the test does not require a bias correction (cf. Breitung & Meyer (1994)). To study the asymptotic properties, we compare the local power of the bias adjusted test statistics. Our analysis demonstrates that the local power of the test depends on two different terms. The first term represents the asymptotic effect on the bias due to the detrending method. The second term is the usual location parameter of the limiting distribution under the sequence of local alternatives. It is shown that if the long-run variances are estimated consistently, both terms cancel each other out, so that the test statistic is centered around zero under the local alternative. Levin & Lin (1993) suggest estimating the long-run variances with a non-parametric estimator computed from the first differences of the series. An attractive property of this approach is that under the alternative the non-parametric estimator tends to zero, so that the resulting test statistic has power against the sequence of local alternatives. We suggest a class of t-statistics that do not require a bias correction. These tests are based on the t-statistic from a simple least-squares regression of transformed variables, and it is shown that their limiting distribution is standard normal. The results of our Monte Carlo experiments suggest that avoiding the detrending bias may improve the power of the test substantially. The rest of this chapter is organized as follows. In Section II the details of the test statistics are given. The local power of the tests is analyzed in Section III.
In Section IV a class of t-statistics is suggested in order to avoid the detrending bias. Since the tests are based on asymptotic properties, it is interesting to consider their relative performance in small samples. This problem is studied in Section V using Monte Carlo simulations, which are also used to investigate the actual power against a sequence of local alternatives. Section VI offers some conclusions and makes suggestions for further research. Finally, a word on the notational conventions applied in this chapter. A standard Brownian motion is written as $W_i(r)$. Although there are different Brownian motions for different cross section units $i$, we sometimes drop the index $i$ for convenience. This has no consequences for the final results, since they depend on the expectation of the stochastic functionals. Furthermore, if there is no risk of misunderstanding, we drop the limits and the argument $r$ (or $dr$). For example, the term $\int_0^1 r W_i(r)\,dr$ is written compactly as $\int rW$. A detrended Brownian motion is represented as $V(r) \equiv V = W - \int W - 12\,(r - \tfrac{1}{2}) \int (r - \tfrac{1}{2}) W$. As usual in this kind of literature, we use $[a]$ to indicate the integer part of $a$. The proofs of the lemmas and theorems can be found in the working paper version (Breitung 1999).
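The following sketch makes the compact notation concrete. It discretizes a Brownian motion on a grid of $T$ midpoints and computes $V(r)$ from the closed form above, with each integral approximated by a grid mean. It then checks that the result matches the OLS residual from regressing $W$ on a constant and $r$, which is what "detrended" means here. The match is exact up to a tiny grid effect. The grid, sample size and seed are arbitrary choices.

```python
import numpy as np


def detrended_bm(T: int = 1000, seed: int = 1):
    """Return the grid r and V computed (a) from the closed form and (b) by OLS detrending."""
    rng = np.random.default_rng(seed)
    r = (np.arange(1, T + 1) - 0.5) / T                   # midpoint grid on [0, 1]
    W = np.cumsum(rng.standard_normal(T)) / np.sqrt(T)    # discretized standard Brownian motion
    rc = r - 0.5
    # closed form V = W - ∫W - 12 (r - 1/2) ∫(r - 1/2) W, integrals replaced by grid means
    v_formula = W - W.mean() - 12.0 * rc * np.mean(rc * W)
    # the same object as the OLS residual of W on (1, r)
    X = np.column_stack([np.ones(T), r])
    coef, *_ = np.linalg.lstsq(X, W, rcond=None)
    v_ols = W - X @ coef
    return r, v_formula, v_ols


r, v_formula, v_ols = detrended_bm()
print(np.max(np.abs(v_formula - v_ols)))   # tiny: only the discrete variance of r differs from 1/12
```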
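II. THE TEST STATISTICS Assume that the variable $y_{it}$ can be represented as
$$y_{it} = \mu_i + \beta_i t + x_{it}, \qquad t = 1, 2, \ldots, T, \qquad (2)$$
where $x_{it}$ is generated by the autoregressive process
$$x_{it} = \sum_{k=1}^{p+1} \alpha_{ik}\, x_{i,t-k} + \varepsilon_{it} \qquad (3)$$
and $x_{is} = 0$ for $s \le 0$. It is assumed that $\varepsilon_{it}$ is white noise with $E(\varepsilon_{it}^2) = \sigma_i^2$ and $E|\varepsilon_{it}|^{2+\delta} < \infty$ for all $i, t$ and some $\delta > 0$. Furthermore, $\varepsilon_{it}$ is assumed to be independent of $\varepsilon_{js}$ for $i \ne j$ and all $t$ and $s$. The null hypothesis is that the process is difference stationary, i.e.
$$H_0: \rho_i \equiv \sum_{k=1}^{p+1} \alpha_{ik} - 1 = 0 \quad \text{for all } i = 1, \ldots, N. \qquad (4)$$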
Under the alternative we assume that $y_{it}$ is (trend) stationary, that is, $\rho_i < 0$ for all $i$. The assumptions concerning $\varepsilon_{it}$ ensure that there exists a functional central limit theorem such that
$$T^{-1/2} \sum_{t=1}^{[rT]} \Delta x_{it} \Rightarrow \bar\sigma_i W_i(r),$$
where $W_i(r)$ is a Brownian motion, $\bar\sigma_i^2 = \lim_{T\to\infty} E(T\,\overline{\Delta x}_i^{\,2})$ and $\overline{\Delta x}_i = T^{-1}\sum_{t=1}^{T} \Delta x_{it}$ (e.g. Phillips & Solo (1992)). The parameter $\bar\sigma_i^2$ is sometimes called the ‘long-run variance’, since it is computed as $2\pi$ times the spectral density at frequency zero. LL (1993) suggest a test procedure against the alternative $\rho_1 = \cdots = \rho_N < 0$. Let $e_{it}$ ($v_{i,t-1}$) denote the residuals from a regression of $\Delta y_{it}$ ($y_{i,t-1}$) on $1, t, \Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}$. Furthermore, let $\tilde e_{it} = e_{it}/\sigma_i$ and $\tilde v_{it} = v_{it}/\sigma_i$, where in practice $\sigma_i^2$ is estimated using the residuals $e_{it}$. The LL test is based on the bias adjusted t-statistic for $\rho = 0$ in the regression $\tilde e_{it} = \rho\, \tilde v_{i,t-1} + \varepsilon_{it}$. LL (1993) show that under the null hypothesis, the ordinary t-statistic tends to minus infinity if a constant or a time trend is included in the model. Therefore, they suggest a bias adjusted test statistic given by
$$\lambda_{LL} = \frac{\sum_{i=1}^N \sum_{t=1}^T \left[\tilde e_{it}\tilde v_{i,t-1} - (\bar\sigma_i/\sigma_i)\,a_T\right]}{b_T \sqrt{\sum_{i=1}^N \sum_{t=1}^T \tilde v_{i,t-1}^2}}, \qquad (5)$$
where $a_T$ and $b_T$ are the small sample analogs of
$$a = E\left(\int V\,dV\right) \qquad (6)$$
$$b^2 = \frac{\operatorname{var}\left[\int V\,dV\right]}{E\left(\int V^2\right)} \qquad (7)$$
and $V \equiv V(r)$ is a detrended Brownian motion. LL (1993) suggest using a nonparametric estimator for $\bar\sigma_i^2$ based on the first differences of the data.1 IPS (1997) relax the assumption of a common parameter $\rho$ under the alternative. Accordingly, model (2) is estimated for each cross section unit separately, yielding an individual specific Dickey-Fuller t-statistic $\tau_i$. The IPS statistic is given by:
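$$\lambda_{IPS} = N^{-1/2} \sum_{i=1}^N \frac{\tau_i - m_T}{\sigma_T},$$
where $\tau_i$ is the usual augmented Dickey-Fuller t-statistic for cross section unit $i$, and $m_T$, $\sigma_T^2$ are small sample analogs of
$$m = E\left[\frac{\int V\,dV}{\left(\int V^2\right)^{1/2}}\right] \qquad (8)$$
$$\sigma^2 = \operatorname{var}\left[\frac{\int V\,dV}{\left(\int V^2\right)^{1/2}}\right]. \qquad (9)$$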
IPS (1997) provide tables for various values of T and the lag order p. As for the LL test, these tables assume that the panels are balanced, that is, all cross section units have the same number of time periods T.
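The following is a minimal sketch of the IPS standardization for the simplest case $p = 0$ (no lagged differences). Each unit's Dickey-Fuller t-statistic $\tau_i$ comes from an OLS regression of $\Delta y_{it}$ on a constant, a trend and $y_{i,t-1}$. The small sample moments $m_T$ and $\sigma_T^2$ are not reproduced in this chapter; they have to be taken from the IPS (1997) tables and are passed in as arguments. The function names are our own.

```python
import numpy as np


def df_t_trend(y: np.ndarray) -> float:
    """DF t-statistic for rho = 0 in Δy_t = a + b t + rho y_{t-1} + e_t (p = 0).

    y holds y_0, y_1, ..., y_T.
    """
    dy = np.diff(y)
    T = dy.size
    X = np.column_stack([np.ones(T), np.arange(1, T + 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (T - X.shape[1])          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix of the coefficients
    return float(coef[2] / np.sqrt(cov[2, 2]))


def ips_stat(taus, m_T: float, var_T: float) -> float:
    """IPS statistic N^{-1/2} * sum_i (tau_i - m_T) / sigma_T."""
    taus = np.asarray(taus, dtype=float)
    return float((taus - m_T).sum() / np.sqrt(taus.size * var_T))
```

Under the null hypothesis, the mean of $\tau_i$ in the trend case lies well below zero, at roughly $-2.2$. This is the nonzero mean that $m_T$ removes, and it is the source of the bias term discussed in Section III.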
III. LOCAL POWER In this section we study the local power of alternative test procedures. The sequence of local alternatives is given by
$$y_{it} = \mu_i + \beta_i t + x_{it}, \qquad (10)$$
where
$$x_{it} = \left(1 - \frac{c}{T\sqrt{N}}\right) x_{i,t-1} + \varepsilon_{it}, \qquad c > 0. \qquad (11)$$
To analyze the asymptotic behavior of the tests, it is important to specify the relationship between $N$ and $T$ (see Phillips & Moon (1999)). For our analysis it is convenient to apply sequential limits, denoted by $(T, N \to \infty)_{seq}$, where $T \to \infty$ is followed by $N \to \infty$. Such an asymptotic framework is more restrictive than a joint limit and requires moment conditions that are difficult to verify (see IPS (1997)). Nevertheless, we follow Kao (1999), Moon & Phillips (1999) and others and apply a sequential limit. Whether our results continue to hold under a joint limit theory is an interesting problem for future research. We further assume that the initial value $y_{i0}$ is fixed or stochastic with a finite variance. When the initial conditions are allowed to go into the remote past, the initial condition plays a role in the limiting distribution of the process (e.g. Phillips & Lee (1996)). In what follows, however, we neglect such complications in order to keep the analysis reasonably simple. The following lemma states an important fact: under the local alternative, the limiting process of $x_{it}$ is the same as under the null hypothesis. Lemma 1: Under the local alternative given in (10)–(11) and a sequential limit $(T, N \to \infty)_{seq}$ we have
$$T^{-1/2} x_{i,[rT]} \Rightarrow \bar\sigma_i W_i(r), \qquad 0 \le c < \infty.$$
This is an important difference from the asymptotic theory in the usual time series context, where under the local alternative the limiting process is an Ornstein-Uhlenbeck process (cf. Phillips (1987)). The probability limits of the tests depend on the parameters $\sigma_i$ and $\bar\sigma_i$. First, we consider the theoretical value of $\bar\sigma_i^2$ under the local alternative.
Lemma 2: Under the local alternative (10)–(11) we have $\bar\sigma_i^2 = \lim_{T\to\infty} E(T^{-1} x_{iT}^2) = \sigma_i^2$.
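For the AR(1) case in (11), a short calculation shows why the lemma holds. It assumes $x_{i0} = 0$ and the sequential limit used above. This is our own sketch, not the proof given in Breitung (1999). Write $\rho_N = 1 - c/(T\sqrt{N})$. Then

$$E(x_{iT}^2) = \sigma_i^2 \sum_{j=0}^{T-1} \rho_N^{2j} = \sigma_i^2\,\frac{1 - \rho_N^{2T}}{1 - \rho_N^2},$$

$$T(1 - \rho_N^2) \to \frac{2c}{\sqrt{N}}, \qquad \rho_N^{2T} \to e^{-2c/\sqrt{N}} \qquad (T \to \infty),$$

$$\lim_{T\to\infty} T^{-1} E(x_{iT}^2) = \sigma_i^2\,\frac{1 - e^{-2c/\sqrt{N}}}{2c/\sqrt{N}} \;\longrightarrow\; \sigma_i^2 \qquad (N \to \infty).$$

For fixed $N$ the factor multiplying $\sigma_i^2$ is below one; for $c/\sqrt{N} = 1$ it equals $(1 - e^{-2})/2 \approx 0.43$. Only after $N \to \infty$ does the local alternative leave the long-run variance, and hence the limiting process, unchanged.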
In what follows we derive the main result by assuming that $\bar\sigma_i^2$ is estimated consistently for all values of $c \ge 0$. First, we present the local power in a model without any deterministics. In this case no bias adjustment is required and the test can be based on the usual t-statistic of the pooled sample (Quah 1994). Theorem 1: Under the sequence of local alternatives given in (10)–(11) with $\mu_i = 0$ and $\beta_i = 0$, the t-statistic for $\rho = 0$ in the pooled regression $\Delta y_{it} = \rho\, y_{i,t-1} + \varepsilon_{it}$ is asymptotically distributed as $\mathcal{N}(-c/\sqrt{2}, 1)$. Breitung (1999) shows that the same local power is obtained if the individual mean $\mu_i$ is removed by subtracting the first observation, or if in addition a common time trend $\beta_1 = \beta_2 = \cdots = \beta_N$ is assumed. Next we consider the bias corrected test statistics. Under the local alternative the bias adjusted (BA) statistic due to LL (1993) converges to the limit
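$$\lambda^*_{BA}(c) = \lim_{N,T\to\infty} \frac{\sqrt{N}\left[E\left(N^{-1}\sum_{i=1}^N T^{-1}\sum_{t=1}^T \tilde e_{it}\tilde v_{i,t-1}\right) - N^{-1}\sum_{i=1}^N (\bar\sigma_i/\sigma_i)\,a\right]}{b\,\sqrt{E\left(N^{-1}T^{-2}\sum_{i=1}^N \sum_{t=1}^T \tilde v_{i,t-1}^2\right)}}.$$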
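Note that numerator and denominator are normalized so that both converge to a fixed limit. Since $\tilde e_{it}\tilde v_{i,t-1} = \left[\sigma_i^{-1}\varepsilon_{it} - c/(T\sqrt{N})\,\tilde v_{i,t-1}\right]\tilde v_{i,t-1}$, the limit can be written as
$$\lambda^*_{BA}(c) = \lim_{N,T\to\infty} \sqrt{N}\,N^{-1}\sum_{i=1}^N \frac{E(\xi_{Ti}) - a}{b\,\sqrt{E\int V^2}} \;-\; \frac{c\,\sqrt{E\int V^2}}{b}, \qquad (12)$$
where we use $\bar\sigma_i/\sigma_i = 1$ under the local alternative and
$$\xi_{Ti} = T^{-1}\sum_{t=1}^T \sigma_i^{-1}\varepsilon_{it}\,\tilde v_{i,t-1}.$$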
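The two ingredients of (12) can be checked numerically under the null hypothesis ($c = 0$). The sketch below rests on three assumptions: the AR(1) case ($p = 0$), $\sigma_i = 1$, and an arbitrary seed and replication count. Under these assumptions $\tilde v_{i,t-1}$ is simply the residual of $y_{i,t-1}$ from a regression on a constant and a trend. The code estimates $E(\xi_{Ti})$, which should be close to $-0.5$. A short calculation of ours gives exactly $-(T-2)/(2T) = -0.495$ for $T = 200$. It also estimates $E(T^{-2}\sum_t \tilde v_{i,t-1}^2)$, which approximates $E\int V^2 = 1/15$. Since $1/15$ is also the slope in Lemma 3, the two $c$-terms in (12) offset each other. This appears to be the cancellation behind Theorem 2.

```python
import numpy as np


def null_moments(T: int = 200, reps: int = 4000, seed: int = 0):
    """Monte Carlo means of xi_Ti and T^{-2} sum v_{i,t-1}^2 under H0 (AR(1), sigma_i = 1)."""
    rng = np.random.default_rng(seed)
    trend = np.arange(1, T + 1)
    X = np.column_stack([np.ones(T), trend])
    M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)     # OLS residual maker for (1, t)
    eps = rng.standard_normal((reps, T))                   # eps_it, t = 1..T
    y = np.cumsum(eps, axis=1)                             # random walk with y_i0 = 0
    y_lag = np.hstack([np.zeros((reps, 1)), y[:, :-1]])    # y_{i,t-1}, t = 1..T
    v = y_lag @ M                                          # detrended lags (M is symmetric)
    xi = (eps * v).sum(axis=1) / T                         # xi_Ti = T^{-1} sum eps_it v_{i,t-1}
    q = (v ** 2).sum(axis=1) / T ** 2                      # T^{-2} sum v_{i,t-1}^2
    return float(xi.mean()), float(q.mean())


xi_mean, q_mean = null_moments()
print(xi_mean, q_mean)   # close to -0.495 and 1/15 ≈ 0.0667
```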
It turns out that the limit of the bias adjusted statistic depends on two different terms on the right hand side of (12). The first term is due to the detrending method, represented by the statistic $\xi_{Ti}$. The second term is proportional to $(E\int V^2)^{1/2}$ and is similar to the usual location parameter in the asymptotic distribution under the local alternative. For example, in the simple regression model $y_t = \beta x_t + u_t$ with stationary variables, the location parameter is proportional to $[E(x_t^2)]^{1/2}$. It is important to notice that the expectation of $\xi_{Ti}$ enters the test statistic with the factor $\sqrt{N}$. Therefore, for the asymptotic analysis, the expectation must be determined with an accuracy up to $O(N^{-1/2})$. The following lemma provides an approximation of this expectation that is sufficient for our purpose. Lemma 3: Under the local alternative given in (10)–(11) the asymptotic expectation of $\xi_{Ti}$ is given by
$$\lim_{T\to\infty} E(\xi_{Ti}) = -0.5 + (1/15)\,c/\sqrt{N} + o(N^{-1/2}).$$
Since the result of Lemma 3 is crucial for the local power of the bias adjusted test, the accuracy of the approximation is investigated in a Monte Carlo experiment. We generate 10,000 realizations of $\xi_{Ti}$ with $T = 200$ and $c = 5$, and repeat the experiment for various values of $N$.2 If Lemma 3 holds, a regression of the sample means of $\xi_{Ti}$ on $c/\sqrt{N}$ and a constant should yield an intercept estimate close to $-0.5$ and a slope of roughly $1/15 = 0.067$. Using $N \in \{30, 35, 40, \ldots, 500\}$, the following regression function was obtained for the 71 realizations:
$$E(\xi_{Ti}) \approx \underset{(0.00060)}{-0.495} + \underset{(0.0016)}{0.0629}\; c/\sqrt{N},$$
where standard errors are given in parentheses. The estimated slope coefficient is only slightly smaller than 0.067, so the approximation in Lemma 3 seems to perform fairly well in finite samples. We now present the limiting distribution of the bias adjusted test statistic. Theorem 2: Consider a sequence of local alternatives given in (10)–(11). If the estimator for $\bar\sigma_i$ converges weakly to $\sigma_i$, the bias adjusted test statistic is asymptotically distributed as $\mathcal{N}(0, 1)$. Thus the bias adjusted test can fail to have power against the sequence of local alternatives. This finding suggests that the power may be improved by a modification that avoids the bias correction altogether. Such a modified test procedure is suggested in Section IV. It is important to notice that the test suggested by LL (1993) employs a nonparametric estimator that converges to zero for a stationary alternative. In the univariate time series context, unit root tests are inconsistent if the long-run variance is estimated from the differences of the time series (cf. Phillips & Ouliaris (1990), Theorem 5.3). Therefore, Phillips & Perron (1988) estimate $\bar\sigma_i^2$ from the residuals of the autoregression. In a panel data framework, however, this approach yields a test that has no power against the sequence of local alternatives. Finally, we investigate the local power of the IPS test. As for the bias adjusted statistic considered above, the probability limit of the test statistic depends on two terms. The first term is due to the detrending method and depends on
$$\xi^*_{Ti} = \frac{\sum_{t=1}^T \sigma_i^{-1}\varepsilon_{it}\,\tilde v_{i,t-1}}{\left(\sum_{t=1}^T \tilde v_{i,t-1}^2\right)^{1/2}}.$$
Since this statistic is a ratio of correlated random variables, an analytic evaluation of this bias is very complicated. To obtain a suitable approximation, we apply the simulation technique that was used to check the reliability of Lemma 3. Using the same setup as before, the following approximation is found for the expectation of $\xi^*_{Ti}$:
$$E(\xi^*_{Ti}) \approx \underset{(0.0030)}{-2.151} + \underset{(0.0077)}{0.212}\; c/\sqrt{N}. \qquad (14)$$
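The simulation behind (14) can be sketched along the same lines. The code below is an assumed minimal version: it covers the AR(1) case with $\sigma_i = 1$, uses an arbitrary seed and replication count, and is not the author's code. It simulates $x_{it}$ under the local alternative (11) for a given value of $c/\sqrt{N}$, detrends $x_{i,t-1}$ on a constant and a trend, and averages $\xi^*_{Ti}$. The detrending removes $\mu_i + \beta_i t$ exactly, so the deterministic part can be omitted. Evaluating two values of $c/\sqrt{N}$ with common random numbers gives a crude secant slope. Because of nonlinearity, this slope need not match the local derivative 0.212 closely.

```python
import numpy as np


def mean_xi_star(T: int = 200, c_over_sqrt_n: float = 0.0, reps: int = 4000, seed: int = 0) -> float:
    """Monte Carlo mean of xi*_Ti under the local alternative rho = 1 - c/(T sqrt(N))."""
    rng = np.random.default_rng(seed)
    rho = 1.0 - c_over_sqrt_n / T
    eps = rng.standard_normal((reps, T + 1))
    x = np.zeros((reps, T + 1))                            # column 0 holds x_i0 = 0
    for t in range(1, T + 1):
        x[:, t] = rho * x[:, t - 1] + eps[:, t]
    X = np.column_stack([np.ones(T), np.arange(1, T + 1)])
    M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)      # residual maker for (1, t)
    v = x[:, :-1] @ M                                      # detrended x_{i,t-1}, t = 1..T
    e = eps[:, 1:]                                         # eps_it, t = 1..T
    xi_star = (e * v).sum(axis=1) / np.sqrt((v ** 2).sum(axis=1))
    return float(xi_star.mean())


m0 = mean_xi_star()                      # c = 0: compare the intercept -2.151 in (14)
m2 = mean_xi_star(c_over_sqrt_n=2.0)     # same seed, so the difference is less noisy
print(m0, (m2 - m0) / 2.0)               # crude secant slope over [0, 2]
```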
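This approximation can be used to compute the limiting distribution of the IPS test given in Theorem 3: For a sequence of local alternatives given in (10)–(11), the IPS test is asymptotically distributed as $\mathcal{N}(\mu_c^{IPS}, 1)$, where
$$\mu_c^{IPS} = \frac{c}{\sigma}\left[\lim_{T\to\infty} \frac{\partial E(\xi^*_{Ti})}{\partial (c/\sqrt{N})}\bigg|_{c=0} - E\left(\int V^2\right)^{1/2}\right].$$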
Again we find that the local power depends on two terms. Our Monte Carlo experiment suggests that the derivative of $E(\xi^*_{Ti})$ is positive, so the detrending bias implies a substantial loss of power. Using 10,000 Monte Carlo replications, the expression $E(\int V^2)^{1/2}$ is estimated as 0.243. Using the value $\sigma_{100}^2 = 0.597$, taken from the values reported in IPS (1997), we obtain
$$\mu_c^{IPS} = c\,(0.212 - 0.243)/\sqrt{0.597} = -0.0401c.$$
It turns out that the asymptotic mean function has a relatively small slope of roughly $-0.04$, compared to the slope of $-1/\sqrt{2} \approx -0.707$ for the case without deterministic trend (see Theorem 1).
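A back-of-the-envelope calculation shows what these slopes mean in practice. Assume a left-tailed test at the 5% level and a statistic distributed as $\mathcal{N}(\mu_c, 1)$. The local power is then $\Phi(z_{0.05} - \mu_c)$. The sketch evaluates this for $\mu_c = -c/\sqrt{2}$ (Theorem 1) and for $\mu_c = -0.0401c$ (the IPS value derived above). It illustrates the asymptotic formulas only and is not a simulation result.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def local_power(mu_c: float, alpha: float = 0.05) -> float:
    """Power of a left-tailed level-alpha test when the statistic is N(mu_c, 1)."""
    z_alpha = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.cdf(z_alpha - mu_c)


slope_ips = (0.212 - 0.243) / math.sqrt(0.597)   # about -0.0401, as in the text
for c in (1, 3, 5):
    print(c, round(local_power(-c / math.sqrt(2)), 3), round(local_power(slope_ips * c), 3))
```

At $c = 5$, for example, the pooled test without deterministics has power close to one. For the IPS test the power remains only slightly above the nominal size.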
IV. TEST STATISTICS WITHOUT BIAS ADJUSTMENT The local power analysis showed that the bias corrections used for the LL and IPS tests may imply a severe loss of power. It is therefore desirable to avoid the bias term when constructing the t-statistics. If the model includes only a constant, such an unbiased statistic is easily obtained by subtracting the first observation instead of the individual mean. This is the approach used in Breitung & Meyer (1994). In this section we consider a class of test statistics that do not involve a bias term.3 To facilitate the exposition, we assume that the data are generated by an AR(1) process, so that no augmentation with lagged differences is needed. For higher order processes, $\Delta y_{it}$ and $y_{i,t-1}$ are replaced by the residuals from the regressions of $\Delta y_{it}$ and $y_{i,t-1}$ on $\Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}$. Furthermore, to correct for individual specific variances, the series are adjusted as in the case of the LL statistic. The idea is to transform the variables $\Delta y_{it}$ and $y_{i,t-1}$ such that the usual regression t-statistic can be used to test the unit root hypothesis. For this purpose we define the $T \times 1$ vectors $\Delta y_i = [\Delta y_{i1}, \ldots, \Delta y_{iT}]'$ and $x_i = [y_{i0}, \ldots, y_{i,T-1}]'$. To construct the test statistic, we use the transformed vectors $y_i^* = A\Delta y_i = [y_{i1}^*, \ldots, y_{iT}^*]'$ and $x_i^* = Bx_i = [x_{i1}^*, \ldots, x_{iT}^*]'$ such that
$$E(y_{it}^* x_{it}^*) = 0 \qquad (15)$$
for all $i$ and $t$. Imposing further assumptions to rule out degenerate cases, it is possible to show that a t-statistic based on the transformed variables has a standard normal limiting distribution.

Theorem 4: Let $\Delta y_{it}$ be white noise with $E(\Delta y_{it}) = \mu_i$, $E(\Delta y_{it} - \mu_i)^2 = \sigma_i^2 > 0$ and $E(\Delta y_{it} - \mu_i)^4 < \infty$. Under assumption (15) and
$$\lim_{T\to\infty} E(T^{-1}\Delta y^{*\prime}_i \Delta y^*_i) > 0, \qquad \lim_{T\to\infty} E(T^{-1} x^{*\prime}_i A A' x^*_i) > 0,$$
the statistic
The Local Power of Some Unit Root Tests
171
$$UB = \frac{\sum_{i=1}^{N} \sigma_i^{-2}\, \Delta y^{*\prime}_i x^*_i}{\sqrt{\sum_{i=1}^{N} \sigma_i^{-2}\, x^{*\prime}_i A A' x^*_i}}$$
has a standard normal limiting distribution as $(N, T \to \infty)_{seq}$.

A simple way to satisfy assumption (15) is to use an upper triangular matrix $A$ whose rows each sum to zero. In other words, only the present and future observations are used to transform the differences $\Delta y_{it}$. A well-known example of such a transformation is the Helmert transformation given by
$$\Delta y^*_{it} = s_t\left[\Delta y_{it} - \frac{1}{T-t}\left(\Delta y_{i,t+1} + \cdots + \Delta y_{iT}\right)\right], \qquad t = 1, 2, \ldots, T-1, \qquad (16)$$
where $s_t^2 = (T-t)/(T-t+1)$. This transformation is also used in Arellano & Bover (1995), for example. An important property of this transformation is that whenever $\Delta y_{it}$ is a white noise process with constant variance, the same is true for $\Delta y^*_{it}$. Moreover, if $y_{it}$ is a random walk with an (individual specific) time trend, then $\Delta y^*_{it}$ has a zero mean and is uncorrelated with $y_{i,t-1}$. The matrix $B$ is chosen such that $E(x^*_{it}) = 0$ and $E(\Delta y^*_{it}\, x^*_{it}) = 0$. A possible transformation with the desired properties is
$$x^*_{it} = y_{i,t-1} - y_{i1} - \frac{t-1}{T}\, y_{iT}. \qquad (17)$$
Note that $T^{-1} y_{iT} = T^{-1}\sum_{t=1}^{T} \Delta y_{it}$ is an estimator of $\mu_i$ and, thus, the transformed variable $x^*_{it}$ is adjusted for a time trend. It is easy to verify that in this case $\Delta y^*_{it}$ and $x^*_{it}$ are uncorrelated. Furthermore, since the matrix $A$ corresponding to the Helmert transformation (16) satisfies $AA' = I$, we conclude from Theorem 4 that the t-statistic for $H_0\colon \rho^* = 0$ in the pooled regression
$$\Delta y^*_{it} = \rho^* x^*_{it} + e^*_{it}, \qquad t = 2, 3, \ldots, T-1, \qquad (18)$$
has a standard normal limiting distribution. To compute the local power function of this test statistic we need an approximation for
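$$E(\psi^*_{Ti}) = E\left(T^{-1}\sum_{t=1}^{T}\Delta y^*_{it}\, x^*_{it}\right)$$
that is accurate up to $O(N^{-1/2})$. As for the LL and IPS statistics, such an approximation is obtained by fitting a regression function to the simulated values of $\psi^*_{Ti}$:
$$E(\psi^*_{Ti}) \approx 0.0104 - 0.0407\, c_N, \qquad (19)$$
where $c_N = c/\sqrt{N}$, and the standard errors of the two coefficients are 0.0021 and 0.0104, respectively.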
Since the test statistic is constructed to have an expectation of zero under the null hypothesis, we expect to find a constant close to zero. The estimated constant is indeed quite small but nevertheless significant. The slope coefficient is significantly negative, so that the test seems to have a local power larger than the size. The following theorem presents further details on the local power of the UB statistic.

Theorem 5: For a sequence of local alternatives given in (10)–(11), the UB statistic is asymptotically distributed as $N(\mu_{UB}, 1)$, where
$$\mu_{UB} = c\sqrt{6}\,\lim_{T\to\infty}\left.\frac{\partial E(\psi^*_{Ti})}{\partial (c/\sqrt{N})}\right|_{c=0}.$$
It is interesting to compare the local power of the IPS and UB tests. Since $\sqrt{6}\cdot 0.0407 \approx 0.0997 > 2 \cdot 0.0401$, the UB statistic has a location parameter that is more than twice as large in absolute value as that of the IPS statistic. Again, however, we emphasize that this comparison is inappropriate, because the IPS test is more general than the UB test: it allows for a heterogeneous autoregressive parameter under the alternative.
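The argument above relies on two properties of the Helmert transformation (16): each row of $A$ sums to zero, and $AA' = I$. The following NumPy sketch builds $A$ for a given $T$ and checks both properties. It takes $A$ to be the $(T-1) \times T$ matrix implied by $t = 1, \ldots, T-1$, and the function name `helmert_matrix` is illustrative.

```python
import numpy as np

def helmert_matrix(T: int) -> np.ndarray:
    """(T-1) x T matrix A of the Helmert transformation (16).

    Row t (t = 1, ..., T-1) maps differences d_1, ..., d_T to
    s_t * (d_t - (d_{t+1} + ... + d_T) / (T - t)), with s_t^2 = (T-t)/(T-t+1).
    """
    A = np.zeros((T - 1, T))
    for t in range(1, T):                      # t is the 1-based index used in the text
        s_t = np.sqrt((T - t) / (T - t + 1))
        A[t - 1, t - 1] = s_t                  # weight on the current difference
        A[t - 1, t:] = -s_t / (T - t)          # minus the mean of the T - t future differences
    return A

T = 6
A = helmert_matrix(T)
print(np.allclose(A.sum(axis=1), 0.0))         # True: each row sums to zero
print(np.allclose(A @ A.T, np.eye(T - 1)))     # True: A A' = I
print(np.allclose(A @ np.full(T, 0.5), 0.0))   # True: a constant drift is removed
```

Because the rows of $A$ are orthonormal, the denominator $x^{*\prime}_i A A' x^*_i$ of the UB statistic reduces to $x^{*\prime}_i x^*_i$ for this choice of $A$.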
IV. SMALL SAMPLE PROPERTIES

The asymptotic properties of the tests do not depend on the number of lagged differences used to account for higher order autoregressive models. However, as noted by IPS (1997), for a small number of time periods $T$ the null distribution may be substantially affected by the augmentation lag. They therefore present tables for the mean and the variance of the individual t-statistics that depend on the type of deterministics (constant/trend), the number of time periods $T$ and the augmentation lag $p$. From the usual Dickey-Fuller test for univariate time series it is known that the power of the test deteriorates substantially with an increasing augmentation lag. It is therefore expected that the power of panel unit root tests is also affected by the choice of the augmentation lag. To study the robustness of the size and power of the tests considered in the previous sections, we generate time series according to the process
$$x_{it} = \rho\, x_{i,t-1} + \varepsilon_{it} \qquad (20)$$
and $y_{it} = \alpha_i + \beta_i t + x_{it}$. The initial values of the process are set equal to zero. The errors are i.i.d. with $\varepsilon_{it} \sim N(0, 1)$. Since all tests are invariant to the parameters $\alpha_i$ and $\beta_i$, these parameters are set equal to zero. For the bias and variance corrections of the LL and IPS tests, the tabulated values in LL (1993) and IPS (1997) are used. To represent a typical regional panel data set, we let $T = 30$ (years) and $N = 20$ (countries). All rejection frequencies are computed from 1000 realizations with a nominal significance level of 0.05.

Table 1 presents the rejection frequencies for the different tests. For $p > 0$ the LL test turns out to be quite conservative. This was also observed by IPS (1997); therefore, the values for the mean and variance of this test should also be tabulated for different augmentation lags. With respect to power, it turns out that for $p = 0$ the powers of the LL and IPS tests are roughly similar. For $p > 0$ the IPS test is more powerful than the LL test, at least if the critical values of the LL test are not adjusted for different augmentation lags. The UB statistic suggested in Section III appears to be substantially more powerful than the LL and IPS tests. Furthermore, the size of the UB test is fairly robust with respect to the augmentation lag. Notice that for the UB test no tables are required for different values of $p$ and $T$.

Table 1. Empirical size and power for T = 30 and N = 20

                 p = 0                    p = 1
  ρ        LL     IPS    UB         LL     IPS    UB
  1.00    0.025  0.046  0.073      0.005  0.053  0.069
  0.95    0.048  0.076  0.127      0.009  0.077  0.213
  0.90    0.189  0.198  0.396      0.041  0.152  0.417
  0.80    0.801  0.723  0.897      0.277  0.544  0.807

                 p = 2                    p = 3
  ρ        LL     IPS    UB         LL     IPS    UB
  1.00    0.001  0.045  0.038      0.000  0.040  0.053
  0.95    0.001  0.072  0.147      0.000  0.056  0.195
  0.90    0.001  0.118  0.260      0.000  0.107  0.266
  0.80    0.002  0.365  0.508      0.000  0.257  0.418

Note: Rejection frequencies are computed from 1000 Monte Carlo replications of model (20). Rows with ρ = 1.00 give the empirical size and rows with ρ < 1 give the power. p denotes the number of lagged differences. The nominal size is 0.05.

In the next Monte Carlo experiment we consider the validity of the theoretical results for the actual power of the test. For this purpose we set
$\rho = 1 - 20/(T\sqrt{N})$. If a test does not have power against such alternatives, we expect its power to tend to the size as $N \to \infty$ and $T \to \infty$. In our Monte Carlo comparison we also include a variant of the LL test that estimates the long-run variances using the regression residuals instead of the first differences of the process. As shown in Section II, such a test has a local power equal to the size. The critical values for this test are computed by Monte Carlo simulations. The respective test is denoted LL*. Table 2 presents the outcome of this Monte Carlo experiment. As predicted by Theorem 2, the power of the LL* test is close to the size for all N and T. All other tests appear to converge to a limit larger than the size, and the limiting power of the UB test is nearly twice as large as that of the IPS test. The original LL test turns out to have power against the local alternative, but its power is substantially smaller than that of the IPS and UB statistics.

Table 2. Power against local alternatives

     N      T      LL     LL*    IPS    UB
  N and T → ∞
     25     25    0.378  0.064  0.384  0.668
     50     50    0.269  0.056  0.300  0.660
     70     70    0.210  0.033  0.296  0.608
    100    100    0.170  0.050  0.261  0.579
  T fixed, N → ∞
     50     25    0.235  0.038  0.342  0.575
     70     25    0.156  0.038  0.313  0.535
    100     25    0.090  0.028  0.273  0.450
  N fixed, T → ∞
     25     50    0.415  0.061  0.419  0.724
     25     70    0.378  0.020  0.421  0.742
     25    100    0.298  0.028  0.402  0.783

Note: This table reports the rejection rates computed from 1000 replications of model (20) with $\rho = 1 - 20/(T\sqrt{N})$. The significance level is 0.05. The statistic LL* is constructed like the LL test but uses the residuals from the autoregressions to estimate $\bar\sigma_i^2$. For this test, the values for the expectation and variance are computed by additional Monte Carlo simulations.

The findings of the Monte Carlo experiment can be compared to the results of our theoretical analysis. From Theorem 3 it is expected that the IPS test has
a limiting power of $\Phi(-1.645 + 20 \cdot 0.0401) = 0.199$, where $\Phi(\cdot)$ denotes the c.d.f. of the standard normal distribution. The empirical power for $N = 100$ and $T = 100$ is 0.261, which is higher than the predicted power based on Theorem 3. This may be due to the simulation error when using (14). An analogous calculation using the results for the UB statistic yields a limiting power of $\Phi(-1.645 + 20 \cdot 0.0997) = 0.636$. Since the empirical power for $N = 100$ and $T = 100$ is 0.579, the value derived from Theorem 5 using (19) tends to be too high. Finally, it is interesting to note that the power of the tests appears to deteriorate with fixed $T$ and increasing $N$. For the LL test, the local power seems to tend slowly to the size as $T$ is fixed and $N \to \infty$.
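The two limiting-power figures above can be reproduced in Python. The sketch makes three assumptions, following Theorems 3 and 5: each statistic is asymptotically $N(\mu, 1)$ with $\mu = \text{slope} \cdot c$, the test rejects for values below −1.645 (5% level), and $c = 20$. The slopes −0.0401 (IPS) and $-\sqrt{6}\cdot 0.0407$ (UB) are the values derived earlier. The helper names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal c.d.f. Phi(z), written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def limiting_power(slope: float, c: float = 20.0, critical: float = 1.645) -> float:
    """P(Z + mu < -critical) for Z ~ N(0, 1) and location mu = slope * c."""
    mu = slope * c
    return normal_cdf(-critical - mu)

power_ips = limiting_power(-0.0401)               # mu_IPS = -0.0401 c
power_ub = limiting_power(-sqrt(6.0) * 0.0407)    # mu_UB = -sqrt(6) * 0.0407 c
print(f"IPS: {power_ips:.4f}, UB: {power_ub:.4f}")
```

This gives about 0.1996 for IPS, which the text reports as 0.199, and about 0.636 for UB.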
V. CONCLUSION

In this chapter we have considered the local power of some well known tests and of a new test for unit roots in panel data. We found that the LL and IPS tests suffer from a severe loss of power if individual specific trends are included. Therefore, a class of test statistics is suggested that does not employ a bias adjustment, and the power of this test is found to be substantially higher than that of the LL and IPS tests. Furthermore, it turns out that the LL test is very sensitive to the augmentation lag. It is therefore recommended to use tables for the mean and variance that take the lag augmentation of the test into account. The results further indicate that the power of the tests is very sensitive to the specification of the deterministic terms. If there is only a constant or a joint linear trend, then subtracting the first observation yields a very powerful test. Including individual specific trends when they are unnecessary leads to a dramatic loss of power. Hence, in practice it is desirable to have a test for a common deterministic trend against the alternative of individual specific time trends. As pointed out by a referee, there are other detrending methods that may be used to construct an improved test procedure. A natural candidate is the ‘quasi difference’ detrending suggested by Elliott, Rothenberg & Stock (1996) (see also Phillips & Xiao (1998)). Unfortunately, it can be shown that a t-statistic computed from quasi differenced data also suffers from a (Nickell type) bias, so that again a bias correction is required to obtain a reasonable test procedure. Nevertheless, a test procedure based on quasi differences may perform better than test procedures with OLS detrending. In this chapter, our strategy is to avoid the bias term altogether. The comparison of our approach to a test procedure based on quasi differences is left for future research.
ACKNOWLEDGMENTS

The research for this paper was carried out within the SFB 373 at the Humboldt University Berlin and the METEOR research project ‘Dynamic and Nonstationary Panels: Theoretical and Empirical Issues’. I thank Carsten Trenkler and two referees for their helpful comments and suggestions.
NOTES

1. In LL (1993) the test statistic is divided by $\hat\sigma_{NT}$, which is computed as the overall standard deviation of $\tilde e_{it}$. However, since $\tilde e_{it}$ is already adjusted for its standard deviation, we can drop $\hat\sigma_{NT}$ when computing the test statistic.
2. I repeated the experiment for different values of $c$ and $T$. The results turn out to be fairly robust.
3. Another possibility is to use alternative estimation methods like the Generalized Method of Moments (GMM). Breitung (1997) applies second differences and obtains a unit root test without bias adjustment by using an appropriate GMM estimator.
REFERENCES

Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley and Sons.
Breitung, J. (1992). Dynamische Modelle für die Paneldatenanalyse (Dynamic Models for the Analysis of Panel Data). PhD dissertation, Haag + Herchen, Frankfurt.
Breitung, J. (1997). Testing for Unit Roots in Panel Data Using a GMM Approach. Statistical Papers, 38, 253–269.
Breitung, J. (1999). The Local Power of Some Unit Root Tests for Panel Data. SFB 373 Discussion Paper, No. 69–1999, Humboldt University Berlin.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Cheung, K. S. (1995). Lag Order and Critical Values of the Augmented Dickey-Fuller Test. Journal of Business and Economic Statistics, 13, 277–280.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series With a Unit Root. Journal of the American Statistical Association, 74, 427–431.
Elliott, G., Rothenberg, T. J., & Stock, J. H. (1996). Efficient Tests for an Autoregressive Unit Root. Econometrica, 64, 813–836.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. DAE Working Paper, No. 9526, University of Cambridge, revised version.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties. Working Paper, Department of Economics, University of California San Diego.
Moon, H. R., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using Panel Data. Mimeo, Yale University.
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
Phillips, P. C. B. (1987). Towards a Unified Asymptotic Theory of Autoregression. Biometrika, 74, 535–548.
Phillips, P. C. B., & Lee, C. C. (1996). Efficiency Gains from Quasi-Differencing Under Nonstationarity. In: P. M. Robinson & M. Rosenblatt (Eds), Essays in Memory of E. J. Hannan (pp. 300–314).
Phillips, P. C. B., & Moon, H. R. (1999). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for Cointegration. Econometrica, 58, 165–193.
Phillips, P. C. B., & Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 75, 335–346.
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20, 971–1001.
Phillips, P. C. B., & Xiao, Z. (1998). A Primer on Unit Root Testing. Journal of Economic Surveys, 12, 423–467.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9–19.
Schmidt, P., & Phillips, P. C. B. (1992). LM Test for a Unit Root in the Presence of Deterministic Trends. Oxford Bulletin of Economics and Statistics, 54, 257–287.
ON THE ESTIMATION AND INFERENCE OF A COINTEGRATED REGRESSION IN PANEL DATA

Chihwa Kao and Min-Hsien Chiang

ABSTRACT

In this chapter, we study the asymptotic distributions for ordinary least squares (OLS), fully modified OLS (FMOLS), and dynamic OLS (DOLS) estimators in cointegrated regression models in panel data. We show that the OLS, FMOLS, and DOLS estimators are all asymptotically normally distributed. However, the asymptotic distribution of the OLS estimator is shown to have a non-zero mean. Monte Carlo results illustrate the sampling behavior of the proposed estimators and show that (1) the OLS estimator has a non-negligible bias in finite samples, (2) the FMOLS estimator does not improve over the OLS estimator in general, and (3) the DOLS estimator outperforms both the OLS and FMOLS estimators.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 179–222. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2

I. INTRODUCTION

Evaluating the statistical properties of data along the time dimension has proven to be very different from analysis of the cross-section dimension. As economists have gained access to better data with more observations across time, understanding these properties has grown increasingly important. An area of particular concern in time-series econometrics has been the use of nonstationary data. With the desire to study the behavior of cross-sectional data
180
CHIHWA KAO & MIN-HSIEN CHIANG
over time and the increasing use of panel data (e.g. the Summers and Heston (1991) data), one new research area is examining the properties of non-stationary time-series data in panel form. It is an intriguing question to ask: how exactly does this hybrid style of data combine the statistical elements of traditional cross-sectional analysis and time-series analysis? In particular, what is the correct way to analyze non-stationarity, the spurious regression problem, and cointegration in panel data?

Given the immense interest in testing for unit roots and cointegration in time-series data, not much attention has been paid to testing for unit roots in panel data. The only theoretical studies we know of in this area are Breitung & Meyer (1994); Quah (1994); Levin & Lin (1993); Im, Pesaran & Shin (1995); and Maddala & Wu (1999). Breitung & Meyer (1994) derived the asymptotic normality of the Dickey-Fuller test statistic for panel data with a large cross-section dimension and a small time-series dimension. Quah (1994) studied a unit root test for panel data that simultaneously have extensive cross-section and time-series variation. He showed that the asymptotic distribution for the proposed test is a mixture of the standard normal and Dickey-Fuller-Phillips asymptotics. Levin & Lin (1993) derived the asymptotic distributions of unit root tests on panel data and showed that the power of these tests increases dramatically as the cross-section dimension increases. Im et al. (1995) critiqued the Levin and Lin panel unit root statistics and proposed alternatives. Maddala & Wu (1999) provided a comparison of the tests of Im et al. (1995) and Levin & Lin (1993), and suggested a new test based on the Fisher test. Recently, some attention has been given to cointegration tests and to estimation of regression models in panel data, e.g. Kao (1999), McCoskey & Kao (1998), Pedroni (1996, 1997) and Phillips & Moon (1999).
Kao (1999) studied a spurious regression in panel data, along with asymptotic properties of the ordinary least squares (OLS) estimator and other conventional statistics. Kao showed that the OLS estimator is consistent for its true value, but the t-statistic diverges, so that inferences about the regression coefficient, $\beta$, are wrong with a probability that goes to one. Furthermore, Kao examined the Dickey-Fuller (DF) and the augmented Dickey-Fuller (ADF) tests of the null hypothesis of no cointegration in panel data. McCoskey & Kao (1998) proposed further tests for the null hypothesis of cointegration in panel data. Pedroni (1997) derived asymptotic distributions for residual-based tests of cointegration for both homogeneous and heterogeneous panels. Pedroni (1996) proposed a fully modified estimator for heterogeneous panels. Phillips & Moon (1999) developed both sequential limit and joint limit theories for nonstationary panel data. Pesaran & Smith (1995) are not directly concerned with cointegration but do touch on a number of related issues, including the potential
Panel Cointegration
181
problems of homogeneity misspecification for cointegrated panels. See the survey paper by Baltagi & Kao (2000) in this volume.

This chapter makes two main contributions. First, it adds to the literature by suggesting a computationally simpler dynamic OLS (DOLS) estimator for panel cointegrated regression models. Second, it provides a serious study of the finite sample properties of the OLS, fully modified OLS (FMOLS), and DOLS estimators. Section 2 introduces the model and assumptions. Section 3 develops the asymptotic theory for the OLS, FMOLS and DOLS estimators. Section 4 gives the limiting distributions of the FMOLS and DOLS estimators for heterogeneous panels. Section 5 presents Monte Carlo results to illustrate the finite sample properties of the OLS, FMOLS, and DOLS estimators. Section 6 summarizes the findings. The proofs of Theorems 1, 2, and 4 are not presented, since they can be found in Phillips & Moon (1999) and Pedroni (1997). The appendix contains the proofs of Theorems 3 and 5.

A word on notation. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ when there is no ambiguity over limits. We define $\Omega^{1/2}$ to be any matrix such that $\Omega = (\Omega^{1/2})(\Omega^{1/2})'$. We use $\|A\|$ to denote $\{\mathrm{tr}(A'A)\}^{1/2}$, $|A|$ to denote the determinant of $A$, $\Rightarrow$ to denote weak convergence, $\xrightarrow{p}$ to denote convergence in probability, $[x]$ to denote the largest integer $\le x$, I(0) and I(1) to signify a time series that is integrated of order zero and one, respectively, and $BM(\Omega)$ to denote Brownian motion with covariance matrix $\Omega$.
II. THE MODEL AND ASSUMPTIONS

Consider the following fixed effect panel regression:
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \qquad (1)$$
where $\{y_{it}\}$ are $1 \times 1$, $\beta$ is a $k \times 1$ vector of the slope parameters, $\{\alpha_i\}$ are the intercepts, and $\{u_{it}\}$ are the stationary disturbance terms. We assume that $\{x_{it}\}$ are $k \times 1$ integrated processes of order one for all $i$, where $x_{it} = x_{i,t-1} + \varepsilon_{it}$. Under these specifications, (1) describes a system of cointegrated regressions, i.e. $y_{it}$ is cointegrated with $x_{it}$. The initialization of this system is $y_{i0} = x_{i0} = O_p(1)$ as $T \to \infty$, for all $i$. The individual constant term $\alpha_i$ can be extended into general deterministic time trends such as $\alpha_{0i} + \alpha_{1i}t + \cdots + \alpha_{pi}t^p$.

Assumption 1. The asymptotic theory employed in this paper is a sequential limit theory established by Phillips & Moon (1999), in which $T \to \infty$ is followed by $N \to \infty$.
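Next, we characterize the innovation vector $w_{it} = (u_{it}, \varepsilon_{it}')'$. We assume that $w_{it}$ is a linear process that satisfies the following assumption.

Assumption 2. For each $i$, we assume:
(a) $w_{it} = \Pi(L)\eta_{it} = \sum_{j=0}^{\infty}\Pi_j\eta_{i,t-j}$, $\sum_{j=0}^{\infty} j^a\|\Pi_j\| < \infty$, $|\Pi(1)| \neq 0$, for some $a > 1$.
(b) $\eta_{it}$ is i.i.d. with zero mean, variance matrix $\Sigma_\eta$, and finite fourth order cumulants.

Assumption 2 implies (e.g. Phillips & Solo, 1992) that the partial sum process $\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} w_{it}$ satisfies the following multivariate invariance principle:
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} w_{it} \Rightarrow B_i(r) = BM_i(\Omega) \quad \text{as } T \to \infty \text{ for all } i, \qquad (2)$$
where
$$B_i = \begin{bmatrix} B_{ui} \\ B_{\varepsilon i} \end{bmatrix}.$$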
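The long-run covariance matrix of $\{w_{it}\}$ is given by
$$\Omega = \sum_{j=-\infty}^{\infty} E(w_{ij}w_{i0}') = \Pi(1)\Sigma_\eta\Pi(1)' = \Sigma + \Gamma + \Gamma' = \begin{bmatrix}\Omega_u & \Omega_{u\varepsilon} \\ \Omega_{\varepsilon u} & \Omega_\varepsilon\end{bmatrix},$$
where
$$\Gamma = \sum_{j=1}^{\infty} E(w_{ij}w_{i0}') = \begin{bmatrix}\Gamma_u & \Gamma_{u\varepsilon} \\ \Gamma_{\varepsilon u} & \Gamma_\varepsilon\end{bmatrix} \qquad (3)$$
and
$$\Sigma = E(w_{i0}w_{i0}') = \begin{bmatrix}\Sigma_u & \Sigma_{u\varepsilon} \\ \Sigma_{\varepsilon u} & \Sigma_\varepsilon\end{bmatrix} \qquad (4)$$
are partitioned conformably with $w_{it}$.

Assumption 3. $\Omega_\varepsilon$ is non-singular, i.e. $\{x_{it}\}$ are not cointegrated.

Define
$$\Omega_{u.\varepsilon} = \Omega_u - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}. \qquad (5)$$
Then $B_i$ can be rewritten as
$$B_i = \begin{bmatrix}B_{ui} \\ B_{\varepsilon i}\end{bmatrix} = \begin{bmatrix}\Omega_{u.\varepsilon}^{1/2} & \Omega_{u\varepsilon}\Omega_\varepsilon^{-1/2} \\ 0 & \Omega_\varepsilon^{1/2}\end{bmatrix}\begin{bmatrix}V_i \\ W_i\end{bmatrix}, \qquad (6)$$
where $\begin{bmatrix}V_i \\ W_i\end{bmatrix} = BM(I)$ is a standardized Brownian motion. Define the one-sided long-run covariance
$$\Delta = \Sigma + \Gamma = \sum_{j=0}^{\infty} E(w_{ij}w_{i0}')$$
with
$$\Delta = \begin{bmatrix}\Delta_u & \Delta_{u\varepsilon} \\ \Delta_{\varepsilon u} & \Delta_\varepsilon\end{bmatrix}.$$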
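To make definitions (3)–(5) concrete, the NumPy sketch below uses a hypothetical bivariate MA(1) innovation process $w_t = \eta_t + \Theta\eta_{t-1}$ with $k = 1$. For this process $\Pi(1) = I + \Theta$, and only the first autocovariance $E(w_{i1}w_{i0}') = \Theta\Sigma_\eta$ is non-zero. The parameter values are arbitrary choices for illustration. The code checks $\Omega = \Sigma + \Gamma + \Gamma'$ and then computes $\Delta$ and $\Omega_{u.\varepsilon}$.

```python
import numpy as np

# Hypothetical MA(1) innovations w_t = eta_t + Theta eta_{t-1}, with w_t = (u_t, eps_t)' and k = 1
Sigma_eta = np.array([[1.0, 0.3],
                      [0.3, 1.0]])
Theta = np.array([[0.5, 0.2],
                  [0.0, 0.4]])

Sigma = Sigma_eta + Theta @ Sigma_eta @ Theta.T   # Sigma = E(w_0 w_0'), eq. (4)
Gamma = Theta @ Sigma_eta                         # Gamma = E(w_1 w_0'); higher lags are zero, eq. (3)

Pi_1 = np.eye(2) + Theta                          # Pi(1) for the MA(1) lag polynomial
Omega = Pi_1 @ Sigma_eta @ Pi_1.T                 # Omega = Pi(1) Sigma_eta Pi(1)'
print(np.allclose(Omega, Sigma + Gamma + Gamma.T))  # True: Omega = Sigma + Gamma + Gamma'

Delta = Sigma + Gamma                             # one-sided long-run covariance

# Conditional long-run variance (5): Omega_u - Omega_ue Omega_e^{-1} Omega_eu
Omega_u, Omega_ue = Omega[0, 0], Omega[0, 1:]
Omega_eu, Omega_e = Omega[1:, 0], Omega[1:, 1:]
Omega_u_dot_e = Omega_u - Omega_ue @ np.linalg.solve(Omega_e, Omega_eu)
print(Omega_u_dot_e)
```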
Here we assume that the panels are homogeneous, i.e. the variances are constant across the cross-section units. We will relax this assumption in Section 4 to allow for different variances for different $i$.

Remark 1. The benefits of using panel data models have been discussed extensively by Hsiao (1986) and Baltagi (1995), though Hsiao and Baltagi assume that the time dimension is small while the cross-section dimension is large. However, in international trade, open macroeconomics, urban and regional economics, public finance, and finance, panel data usually have long time-series and cross-section dimensions. The data of Summers & Heston (1991) are a notable example.
Remark 2. The advantage of using the sequential limit theory is that it offers a quick and easy way to derive the asymptotics, as demonstrated by Phillips & Moon (1999). Phillips & Moon also provide detailed treatments of the connections between the sequential limit theory and the joint limit theory.

Remark 3. If one wants to obtain a consistent estimate of $\beta$ in (1), or wants to test some restrictions on $\beta$, then an individual time-series regression or a multiple time-series regression is probably enough. So what are the advantages of using the $(N, T)$ asymptotics (e.g. the sequential asymptotics in Assumption 1) instead of $T$ asymptotics? One advantage is that we can obtain a normal approximation of the limit distributions of the estimators and test statistics, with convergence rate $\sqrt{N}T$. More importantly, the biases of the estimators and test statistics can be reduced when $N$ and $T$ are large. For example, later in this paper we will show that the biases of the OLS, FMOLS, and DOLS estimators in Table 2 were reduced by half when the sample size was changed from $(N = 1, T = 20)$ to $(N = 20, T = 20)$. However, in order to obtain asymptotic normality using the $(N, T)$ asymptotics we need to make some strong assumptions; for example, in this paper we assume that the error terms are independent across $i$.

Remark 4. The results in this chapter require that the regressors are not cointegrated. Assuming that the I(1) regressors are not cointegrated with each other is indeed restrictive. The authors are currently investigating this issue.
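III. OLS, FMOLS, AND DOLS ESTIMATORS

Let us first study the limiting distribution of the OLS estimator for equation (1). The OLS estimator of $\beta$ is
$$\hat\beta_{OLS} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar x_i)(y_{it} - \bar y_i)\right]. \qquad (7)$$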
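Estimator (7) is the within (fixed-effects) OLS estimator applied to the pooled panel. The NumPy sketch below assumes a balanced panel stored as arrays of shape (N, T) and (N, T, k). The function name `panel_ols` and the simulated design are illustrative assumptions. In the design, $u_{it}$ is drawn independently of the regressors, so the limiting bias in Theorem 1(a) is zero.

```python
import numpy as np

def panel_ols(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Within (fixed-effects) OLS estimator (7) for a balanced panel.

    y: shape (N, T); x: shape (N, T, k). Returns the k-vector beta_hat_OLS.
    """
    x_dm = x - x.mean(axis=1, keepdims=True)      # x_it - xbar_i
    y_dm = y - y.mean(axis=1, keepdims=True)      # y_it - ybar_i
    X = x_dm.reshape(-1, x.shape[2])              # stack all (i, t) observations
    Y = y_dm.reshape(-1)
    return np.linalg.solve(X.T @ X, X.T @ Y)      # [sum x x']^{-1} [sum x y]

# Illustrative cointegrated panel with k = 1 and u_it independent of the regressor
rng = np.random.default_rng(42)
N, T = 20, 100
beta = np.array([2.0])
x = np.cumsum(rng.standard_normal((N, T, 1)), axis=1)    # I(1) regressors
alpha = rng.standard_normal((N, 1))                        # individual intercepts
y = alpha + x @ beta + rng.standard_normal((N, T))
beta_ols = panel_ols(y, x)
print(beta_ols)
```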
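All the limits in Theorems 1–6 are taken as $T \to \infty$ followed by $N \to \infty$ sequentially, as in Assumption 1. First, we present the following theorem.

Theorem 1. If Assumptions 1–3 hold, then
(a) $T(\hat\beta_{OLS} - \beta) \xrightarrow{p} -3\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} + 6\Omega_\varepsilon^{-1}\Delta_{\varepsilon u}$,
(b) $\sqrt{N}T(\hat\beta_{OLS} - \beta) - \sqrt{N}\delta_{NT} \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$,
where
$$\delta_{NT} = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}(x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^{N}\Omega_\varepsilon^{1/2}\left(\int \tilde W_i\, dW_i'\right)\Omega_\varepsilon^{-1/2}\Omega_{\varepsilon u} + \Delta_{\varepsilon u}\right]$$
and $\tilde W_i = W_i - \int W_i$.

The normality of the OLS estimator in Theorem 1 comes naturally: when summing across $i$, the non-standard asymptotic distribution due to the unit root in the time dimension is smoothed out. Theorem 1 also suggests an interesting interpretation of the asymptotic covariance matrix $6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}$: $\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}$ can be seen as the long-run noise-to-signal ratio. We also note that the term in $\Omega_{\varepsilon u}$ is due to the endogeneity of the regressor $x_{it}$, and the term in $\Delta_{\varepsilon u}$ is due to the serial correlation. It can be shown easily that
$$\delta_{NT} \xrightarrow{p} -3\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} + 6\Omega_\varepsilon^{-1}\Delta_{\varepsilon u}.$$
If $w_{it} = (u_{it}, \varepsilon_{it}')'$ are i.i.d., then $\delta_{NT} \xrightarrow{p} 3\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon u}$, which was examined by Kao & Chen (1995). Let $\hat\Omega_\varepsilon$, $\hat\Omega_{\varepsilon u}$, $\hat\Delta_\varepsilon$, and $\hat\Delta_{\varepsilon u}$ be consistent estimates of $\Omega_\varepsilon$, $\Omega_{\varepsilon u}$, $\Delta_\varepsilon$, and $\Delta_{\varepsilon u}$, respectively. Then from (b) in Theorem 1, we can define a bias-corrected OLS estimator, $\hat\beta^+_{OLS}$,
$$\hat\beta^+_{OLS} = \hat\beta_{OLS} - \frac{\hat\delta_{NT}}{T},$$
such that
$$\sqrt{N}T(\hat\beta^+_{OLS} - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}),$$
where $\hat\delta_{NT} = -3\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u} + 6\hat\Omega_\varepsilon^{-1}\hat\Delta_{\varepsilon u}$. Chen, McCoskey & Kao (1999) investigated the finite sample properties of the OLS estimator in (7), the t-statistic, the bias-corrected OLS estimator, and the bias-corrected t-statistic. They found that the bias-corrected OLS estimator does not improve over the OLS estimator in general. The results of Chen et al. suggest that alternatives, such as the FMOLS estimator or the DOLS estimator (e.g. Saikkonen, 1991; Stock & Watson, 1993), may be more promising in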
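cointegrated panel regressions. Thus, we begin our study by examining the limiting distribution of the FMOLS estimator, $\hat\beta_{FM}$. The FMOLS estimator is constructed by making corrections for endogeneity and serial correlation to the OLS estimator $\hat\beta_{OLS}$ in (7). Define
$$u^+_{it} = u_{it} - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\varepsilon_{it}, \qquad (8)$$
$$\hat u^+_{it} = u_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\varepsilon_{it}, \qquad (9)$$
$$y^+_{it} = y_{it} - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\Delta x_{it}, \qquad (10)$$
and
$$\hat y^+_{it} = y_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it}. \qquad (11)$$
Note that
$$\begin{bmatrix}u^+_{it} \\ \varepsilon_{it}\end{bmatrix} = \begin{bmatrix}1 & -\Omega_{u\varepsilon}\Omega_\varepsilon^{-1} \\ 0 & I_k\end{bmatrix}\begin{bmatrix}u_{it} \\ \varepsilon_{it}\end{bmatrix},$$
which has the long-run covariance matrix
$$\begin{bmatrix}\Omega_{u.\varepsilon} & 0 \\ 0 & \Omega_\varepsilon\end{bmatrix},$$
where $I_k$ is a $k \times k$ identity matrix. The endogeneity correction is achieved by modifying the variable $y_{it}$ in (1) with the transformation
$$\hat y^+_{it} = y_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it} = \alpha_i + x_{it}'\beta + u_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it}.$$
The serial correlation correction term has the form
$$\hat\Delta^+_{\varepsilon u} = (\hat\Delta_{\varepsilon u} \;\; \hat\Delta_\varepsilon)\begin{bmatrix}1 \\ -\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u}\end{bmatrix} = \hat\Delta_{\varepsilon u} - \hat\Delta_\varepsilon\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u},$$
where $\hat\Delta_{\varepsilon u}$ and $\hat\Delta_\varepsilon$ are kernel estimates of $\Delta_{\varepsilon u}$ and $\Delta_\varepsilon$. Therefore, the FMOLS estimator is
$$\hat\beta_{FM} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\left[\sum_{i=1}^{N}\left(\sum_{t=1}^{T}(x_{it} - \bar x_i)\hat y^+_{it} - T\hat\Delta^+_{\varepsilon u}\right)\right]. \qquad (12)$$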
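Now, we state the limiting distribution of $\hat\beta_{FM}$.

Theorem 2. If Assumptions 1–3 hold, then $\sqrt{N}T(\hat\beta_{FM} - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$.

It can be shown easily that, if the individual-specific intercept $\alpha_i$ is excluded, the limiting distribution of $\hat\beta_{FM}$ becomes
$$\sqrt{N}T(\hat\beta_{FM} - \beta) \Rightarrow N(0, 2\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}). \qquad (13)$$

Remark 5. Once the estimates $\hat w_{it}$ of $w_{it}$ were obtained, we used
$$\hat\Sigma = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat w_{it}\hat w_{it}' \qquad (14)$$
to estimate $\Sigma$. $\Omega$ was estimated by
$$\hat\Omega = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{T}\sum_{t=1}^{T}\hat w_{it}\hat w_{it}' + \frac{1}{T}\sum_{\tau=1}^{l}\varpi_{\tau l}\sum_{t=\tau+1}^{T}\left(\hat w_{it}\hat w_{i,t-\tau}' + \hat w_{i,t-\tau}\hat w_{it}'\right)\right], \qquad (15)$$
where $\varpi_{\tau l}$ is a weight function or a kernel. Using Phillips & Durlauf (1986) and sequential limit theory, $\hat\Sigma$ and $\hat\Omega$ can be shown to be consistent for $\Sigma$ and $\Omega$.

Remark 6. The distribution results for $\hat\beta_{FM}$ require that $\sqrt{N}(\hat\Delta - \Delta)$ does not diverge as $N$ grows large. However, $\hat\Delta - \Delta$ may not be small when $T$ is fixed. It follows that $\sqrt{N}(\hat\Delta - \Delta)$ may be non-negligible in panel data with finite samples.

Next, we propose a DOLS estimator, $\hat\beta_D$, which uses the past and future values of $\Delta x_{it}$ as additional regressors. We then show that the limiting distribution of $\hat\beta_D$ is the same as that of the FMOLS estimator, $\hat\beta_{FM}$. But first, we need the following additional assumption.

Assumption 4. The spectral density matrix $f_{ww}(\lambda)$ is bounded away from zero and full rank for all $i$, i.e. $f_{ww}(\lambda) \ge \delta I$, $\lambda \in [0, \pi]$, $\delta > 0$.

When Assumptions 2 and 4 hold, the process $\{u_{it}\}$ can be written as (see Saikkonen, 1991):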
$$u_{it} = \sum_{j=-\infty}^{\infty} c_{ij}'\varepsilon_{i,t+j} + v_{it} \qquad (16)$$
for all $i$, where $\sum_{j=-\infty}^{\infty}\|c_{ij}\| < \infty$, $\{v_{it}\}$ is stationary with zero mean, and $\{v_{it}\}$ and $\{\varepsilon_{it}\}$ are uncorrelated not only contemporaneously but also at all lags and leads. In practice, the leads and lags may be truncated while retaining (16) approximately, so that
$$u_{it} = \sum_{j=-q}^{q} c_{ij}'\varepsilon_{i,t+j} + \dot v_{it}$$
for all $i$. This is possible because $\{c_{ij}\}$ are assumed to be absolutely summable, i.e. $\sum_{j=-\infty}^{\infty}\|c_{ij}\| < \infty$. We also need to require that $q$ tends to infinity with $T$ at a suitable rate:

Assumption 5. $q \to \infty$ as $T \to \infty$ such that $q^3/T \to 0$, and
$$T^{1/2}\sum_{|j|>q}\|c_{ij}\| \to 0 \qquad (17)$$
for all $i$.

We then substitute (16) into (1) to get
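$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}'\varepsilon_{i,t+j} + \dot v_{it}, \qquad (18)$$
where $\dot v_{it} = v_{it} + \sum_{|j|>q} c_{ij}'\varepsilon_{i,t+j}$. Therefore, we obtain the DOLS estimator of $\beta$, $\hat\beta_D$, by running the following regression:
$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}'\Delta x_{i,t+j} + \dot v_{it}. \qquad (19)$$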
Next, we show that $\hat\beta_D$ has the same limiting distribution as $\hat\beta_{FM}$ in Theorem 2.

Theorem 3. If Assumptions 1–5 hold, then $\sqrt{N}T(\hat\beta_D - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$.
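Regression (19) can be estimated without forming the full dummy-variable design. The intercepts $\alpha_i$ and the lead/lag coefficients $c_{ij}$ are unit specific while $\beta$ is common, so by the Frisch–Waugh–Lovell theorem each unit's $y_{it}$ and $x_{it}$ can first be projected off a constant and $\Delta x_{i,t-q}, \ldots, \Delta x_{i,t+q}$, and the residuals then pooled. The NumPy sketch below does this for a single regressor ($k = 1$) and a common truncation $q$. The function name `panel_dols`, the restriction to observations with complete leads and lags, and the simulated design are assumptions made for illustration.

```python
import numpy as np

def panel_dols(y: np.ndarray, x: np.ndarray, q: int) -> float:
    """Sketch of the panel DOLS estimate of a scalar beta in regression (19).

    y, x: arrays of shape (N, T). For each unit i, y_i and x_i are projected off
    a constant and Delta x_{i,t+j}, j = -q..q (unit-specific coefficients), and
    the residuals are pooled (Frisch-Waugh-Lovell) to obtain the common beta.
    """
    N, T = y.shape
    num, den = 0.0, 0.0
    rows = list(range(q + 1, T - q))          # periods with all leads and lags available
    for i in range(N):
        dx = np.diff(x[i])                    # dx[s] = x[i, s+1] - x[i, s] = Delta x at period s+1
        Z = np.array([[1.0] + [dx[t + j - 1] for j in range(-q, q + 1)] for t in rows])
        P = Z @ np.linalg.pinv(Z)             # projection onto [1, Delta x_{t-q}, ..., Delta x_{t+q}]
        x_res = x[i, rows] - P @ x[i, rows]
        y_res = y[i, rows] - P @ y[i, rows]
        num += x_res @ y_res
        den += x_res @ x_res
    return num / den

# Illustrative design: u_it depends on current and lagged Delta x_it (endogeneity)
rng = np.random.default_rng(1)
N, T, beta = 20, 100, 2.0
eps = rng.standard_normal((N, T))
x = np.cumsum(eps, axis=1)                            # x_it = x_{i,t-1} + eps_it
eps_lag = np.hstack([np.zeros((N, 1)), eps[:, :-1]])
u = 0.5 * eps + 0.3 * eps_lag + rng.standard_normal((N, T))
y = rng.standard_normal((N, 1)) + beta * x + u
beta_dols = panel_dols(y, x, q=2)
print(round(beta_dols, 3))
```

In this design the leads and lags absorb the dependence between $u_{it}$ and $\Delta x_{it}$ exactly, so the remaining error is independent of the regressor.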
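IV. HETEROGENEOUS PANELS

This chapter so far assumes that the panel data are homogeneous. The substantial heterogeneity exhibited by actual data in the cross-sectional dimension may restrict the practical applicability of the FMOLS and DOLS estimators. Also, the estimators in Sections 2 and 3 are not easily extended to cases of broader cross-sectional heterogeneity, since the variances and biases are specified in terms of asymptotic covariance parameters that are assumed to be shared cross-sectionally. In this section, we propose an alternative representation of the panel FMOLS estimator for heterogeneous panels. Before we discuss the FMOLS estimator, we need the following assumption.

Assumption 6. We assume the panels are heterogeneous, i.e. $\Omega_i$, $\Delta_i$ and $\Sigma_i$ vary across $i$. We also assume that the invariance principle (2), the representation (16), and condition (17) in Assumption 5 still hold.

Let
$$x^*_{it} = \hat\Omega_{i\varepsilon}^{-1/2}x_{it}, \qquad u^*_{it} = \hat\Omega_{iu.\varepsilon}^{-1/2}\hat u^+_{it}, \qquad (20)$$
$$\hat u^+_{it} = u_{it} - \hat\Omega_{iu\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\varepsilon_{it}, \qquad (21)$$
$$\hat y^+_{it} = y_{it} - \hat\Omega_{iu\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\Delta x_{it} - \hat\Omega_{iu.\varepsilon}^{1/2}\left(\hat\Omega_{iu.\varepsilon}^{-1/2}x_{it} - \hat\Omega_{i\varepsilon}^{-1/2}x_{it}\right)'\beta, \qquad (22)$$
$$y^*_{it} = \hat\Omega_{iu.\varepsilon}^{-1/2}\hat y^+_{it}, \qquad (23)$$
where $\Omega_{iu.\varepsilon} = \Omega_{iu} - \Omega_{iu\varepsilon}\Omega_{i\varepsilon}^{-1}\Omega_{i\varepsilon u}$, and $\hat\Omega_{i\varepsilon}$ and $\hat\Omega_{iu.\varepsilon}$ are consistent estimators of $\Omega_{i\varepsilon}$ and $\Omega_{iu.\varepsilon}$, respectively. As in Pedroni (1996), the correction term $\hat\Omega_{iu.\varepsilon}^{1/2}(\hat\Omega_{iu.\varepsilon}^{-1/2}x_{it} - \hat\Omega_{i\varepsilon}^{-1/2}x_{it})'\beta$ is needed in (22) for the heterogeneous panel. We note that (22) will be the same as (11) only if $\hat\Omega_{iu.\varepsilon}^{-1/2}x_{it} - \hat\Omega_{i\varepsilon}^{-1/2}x_{it} = 0$ in the heterogeneous panel. Also, (22) requires knowing something about the true $\beta$. In practice, $\beta$ in (22) can be replaced by a preliminary OLS estimate, $\hat\beta_{OLS}$. Therefore, let
$$\hat y^{++}_{it} = y_{it} - \hat\Omega_{iu\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\Delta x_{it} - \hat\Omega_{iu.\varepsilon}^{1/2}\left(\hat\Omega_{iu.\varepsilon}^{-1/2}x_{it} - \hat\Omega_{i\varepsilon}^{-1/2}x_{it}\right)'\hat\beta_{OLS}$$
and
$$y^*_{it} = \hat\Omega_{iu.\varepsilon}^{-1/2}\hat y^{++}_{it}.$$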
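The standardization in (20) only requires a matrix inverse square root of each unit's estimated long-run covariance. The chapter allows any square root with $\Omega = \Omega^{1/2}(\Omega^{1/2})'$. The sketch below assumes the symmetric root obtained from an eigendecomposition, which is one valid choice, and uses a hypothetical 2 × 2 matrix in place of $\hat\Omega_{i\varepsilon}$.

```python
import numpy as np

def inv_sqrt(omega: np.ndarray) -> np.ndarray:
    """Symmetric inverse square root of a positive definite matrix (eigendecomposition)."""
    vals, vecs = np.linalg.eigh(omega)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

# Hypothetical estimate of Omega_{i,eps} for one unit, with k = 2 regressors
Omega_i_eps = np.array([[2.0, 0.5],
                        [0.5, 1.0]])
S = inv_sqrt(Omega_i_eps)
print(np.allclose(S @ Omega_i_eps @ S.T, np.eye(2)))   # True: standardized long-run covariance

rng = np.random.default_rng(3)
x_i = np.cumsum(rng.standard_normal((50, 2)), axis=0)  # T x k path of x_it for unit i
x_star_i = x_i @ S.T                                   # row t holds x*_it = Omega_i_eps^{-1/2} x_it
```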
CHIHWA KAO & MIN-HSIEN CHIANG
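Assumption 7. $\Omega_{i\varepsilon}$ is nonsingular for all $i$.

Then we define the FMOLS estimator for heterogeneous panels as
$$\hat\beta^*_{FM} = \Big[\sum_{i=1}^{N}\sum_{t=1}^{T}(x^*_{it}-\bar x^*_i)(x^*_{it}-\bar x^*_i)'\Big]^{-1}\sum_{i=1}^{N}\Big[\sum_{t=1}^{T}(x^*_{it}-\bar x^*_i)\,y^*_{it} - T\,\hat\Delta^*_{i\varepsilon u}\Big], \qquad (24)$$
where
$$\hat\Delta^*_{i\varepsilon u} = \hat\Omega_{i\varepsilon}^{-1/2}\,\hat\Delta^+_{i\varepsilon u}\,\hat\Omega_{iu.\varepsilon}^{-1/2},$$
$$\hat\Delta^+_{i\varepsilon u} = \big(\hat\Delta_{i\varepsilon u}\ \ \hat\Delta_{i\varepsilon}\big)\begin{pmatrix}1\\ -\hat\Omega_{i\varepsilon}^{-1}\hat\Omega_{i\varepsilon u}\end{pmatrix} = \hat\Delta_{i\varepsilon u} - \hat\Delta_{i\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\hat\Omega_{i\varepsilon u},$$
and
$$\hat\Omega_{iu.\varepsilon} = \hat\Omega_{iu} - \hat\Omega_{iu\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\hat\Omega_{i\varepsilon u}.$$

Theorem 4. If Assumptions 1–2 and 6–7 hold, then $\sqrt{N}\,T(\hat\beta^*_{FM} - \beta) \Rightarrow N(0,\ 6I_k)$.

The DOLS estimator for heterogeneous panels, $\hat\beta^*_D$, can be obtained by running the following regression: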
$$y^*_{it} = \alpha_i + x^{*\prime}_{it}\beta + \sum_{j=-q_i}^{q_i} c_{ij}\,\Delta x^*_{i,t+j} + \dot v^*_{it}, \qquad (25)$$
where $\dot v^*_{it}$ is defined analogously to (18). Note that in (25) different lag truncations, $q_i$, may have to be used because the error terms are heterogeneous across $i$. Therefore, we need to assume that $q_i$ tends to infinity with $T$ at a suitable rate for all $i$:

Assumption 8. $q_i \to \infty$ as $T \to \infty$ such that
$$\frac{q_i^3}{T} \to 0 \quad\text{and}\quad T^{1/2}\sum_{|j|>q_i}\|c_{ij}\| \to 0 \qquad (26)$$
for all $i$.

In the following theorem we show that $\hat\beta^*_D$ also has the same limiting distribution as $\hat\beta^*_{FM}$.
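The rate conditions in (26) can be checked for a simple lag rule. The example below is illustrative only; the lag rule is not proposed in the chapter, and the decay assumption on the coefficients is made purely for this example. Suppose that:

- the lag truncation is $q_i = \lfloor c\,T^{1/4} \rfloor$ for some constant $c > 0$, so that $q_i \to \infty$;
- the lead/lag coefficients decay geometrically, $\|c_{ij}\| \le C\rho^{|j|}$ with $0 < \rho < 1$.

Then both conditions hold:

$$\frac{q_i^3}{T} \le \frac{c^3\,T^{3/4}}{T} = c^3\,T^{-1/4} \to 0,$$
$$T^{1/2}\sum_{|j|>q_i}\|c_{ij}\| \le 2C\,T^{1/2}\sum_{j=q_i+1}^{\infty}\rho^{j} = \frac{2C\,\rho^{q_i+1}}{1-\rho}\,T^{1/2} \to 0.$$

The second limit holds because $q_i + 1 > c\,T^{1/4}$ gives $\rho^{q_i+1} < \exp(c\,T^{1/4}\ln\rho)$, and this bound decays faster than $T^{1/2}$ grows.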
Theorem 5. If Assumptions 1–2 and 6–8 hold, then $\sqrt{N}\,T(\hat\beta^*_D - \beta) \Rightarrow N(0,\ 6I_k)$.

Remark 7. Theorems 4 and 5 show that the limiting distributions of $\hat\beta^*_{FM}$ and $\hat\beta^*_D$ are free of nuisance parameters.

Remark 8. We now consider a linear hypothesis that involves the elements of the coefficient vector $\beta$. We show that hypothesis tests constructed using the FMOLS and DOLS estimators have asymptotic chi-squared distributions. The null hypothesis has the form
$$H_0:\ R\beta = r, \qquad (27)$$
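where $r$ is a known $m \times 1$ vector and $R$ is a known $m \times k$ matrix describing the restrictions. A natural Wald test statistic using $\hat\beta_{FM}$ or $\hat\beta_D$ for homogeneous panels is
$$W = NT^2\,(R\hat\beta_D - r)'\big[6\,R\,\hat\Omega_\varepsilon^{-1}\hat\Omega_{u.\varepsilon}R'\big]^{-1}(R\hat\beta_D - r). \qquad (28)$$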
Remark 9. For heterogeneous panels, a natural statistic using $\hat\beta^*_{FM}$ or $\hat\beta^*_D$ to test the null hypothesis is
$$W^* = NT^2\,(R\hat\beta^*_D - r)'\big[6\,RR'\big]^{-1}(R\hat\beta^*_D - r). \qquad (29)$$
It is clear that $W$ and $W^*$ converge in distribution to a chi-squared random variable with $m$ degrees of freedom, $\chi^2_m$, as $T \to \infty$ followed by $N \to \infty$ sequentially under the null hypothesis. Hence, we establish the following results: $W \Rightarrow \chi^2_m$ and $W^* \Rightarrow \chi^2_m$. Because the FMOLS and DOLS estimators have the same asymptotic distributions, it is easy to verify that the Wald statistics based on the FMOLS estimator share the same limiting distributions as those based on the DOLS estimator.
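Once an estimate and the long-run variance estimates are in hand, computing (28) and (29) takes a few lines of linear algebra. The sketch below assumes that `omega_eps` (standing in for $\hat\Omega_\varepsilon$, k × k) and the scalar `omega_u_eps` (for $\hat\Omega_{u.\varepsilon}$) have already been estimated. Leaving them at the identity and 1 gives $W^*$ in (29). The function name is hypothetical. Under the sequential limit, the statistic is compared with a $\chi^2_m$ critical value, e.g. 3.841 at the 5% level for m = 1.

```python
import numpy as np


def panel_wald(beta_hat, R, r, N, T, omega_eps=None, omega_u_eps=1.0):
    """W = N T^2 (R b - r)' [6 R Omega_eps^{-1} Omega_{u.eps} R']^{-1} (R b - r).

    With omega_eps = I_k and omega_u_eps = 1 (the defaults) this is W* in (29).
    """
    b = np.atleast_1d(np.asarray(beta_hat, dtype=float))
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.atleast_1d(np.asarray(r, dtype=float))
    k = b.size
    om = np.eye(k) if omega_eps is None else np.atleast_2d(np.asarray(omega_eps, dtype=float))
    d = R @ b - r                                          # R b - r, length m
    V = 6.0 * omega_u_eps * R @ np.linalg.solve(om, R.T)   # 6 R Omega_eps^{-1} Omega_{u.eps} R'
    return float(N * T ** 2 * d @ np.linalg.solve(V, d))


# Example: H0: beta = 2 with a hypothetical estimate 2.01 from N = 10, T = 20
W_star = panel_wald(2.01, [[1.0]], [2.0], N=10, T=20)
reject = W_star > 3.841                                    # 5% chi-square(1) critical value
```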
V. MONTE CARLO SIMULATIONS
The ultimate goal of this Monte Carlo study is to compare the finite-sample properties of OLS, FMOLS, and DOLS for two models: a homogeneous panel and a heterogeneous panel. The simulations were performed on a Sun SparcServer 1000 and an Ultra Enterprise 3000, using GAUSS 3.2.31 and COINT 2.0. Random numbers for the error terms, $(u^*_{it}, \varepsilon^*_{it})$, for Sections 5A, 5B and 5D were generated by the GAUSS procedure RNDNS. At each replication, we generated a sequence of $N(T + 1000)$ random numbers and then split it into $N$ series so that each series had the same mean and variance. The first 1,000 observations were discarded for each series. $\{u^*_{it}\}$ and $\{\varepsilon^*_{it}\}$ were constructed with $u_{i0} = 0$ and $\varepsilon_{i0} = 0$. In order to compare the performance of the OLS, FMOLS, and DOLS estimators, the following data generating process (DGP) was used:
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \qquad x_{it} = x_{it-1} + \varepsilon_{it}, \qquad (30)$$
where $(u_{it}, \varepsilon_{it})$ follows an ARMA(1, 1) process:
$$\begin{pmatrix}u_{it}\\ \varepsilon_{it}\end{pmatrix} = \begin{pmatrix}0.5 & 0\\ 0 & 0.5\end{pmatrix}\begin{pmatrix}u_{it-1}\\ \varepsilon_{it-1}\end{pmatrix} + \begin{pmatrix}u^*_{it}\\ \varepsilon^*_{it}\end{pmatrix} + \begin{pmatrix}0.3 & 0.4\\ \theta_{21} & 0.6\end{pmatrix}\begin{pmatrix}u^*_{it-1}\\ \varepsilon^*_{it-1}\end{pmatrix},$$
with
$$\begin{pmatrix}u^*_{it}\\ \varepsilon^*_{it}\end{pmatrix} \overset{iid}{\sim} N\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}1 & \sigma_{21}\\ \sigma_{21} & 1\end{pmatrix}\right).$$
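The DGP in (30) is straightforward to simulate. The NumPy sketch below illustrates it; it is not the GAUSS/COINT code behind the reported results. It proceeds in five steps:

1. Draw $(u^*_{it}, \varepsilon^*_{it})$ from a bivariate normal with correlation $\sigma_{21}$.
2. Run the ARMA(1, 1) recursion from zero initial values.
3. Discard a burn-in of 1,000 periods, as described above.
4. Cumulate $\varepsilon_{it}$ into $x_{it}$, starting from zero at the beginning of the retained sample. This starting value is an assumption.
5. Draw $\alpha_i$ from U[0, 10] and set $\beta = 2$, as in Section 5A.

Setting `phi=0` gives the MA(1) design (31). The function name and its arguments are hypothetical.

```python
import numpy as np


def simulate_panel(N, T, theta21, sigma21, phi=0.5, beta=2.0, burn=1000, seed=None):
    """Simulate DGP (30): y_it = alpha_i + beta x_it + u_it, x_it = x_{it-1} + eps_it, with
    (u, eps)_t = Phi (u, eps)_{t-1} + w*_t + Theta w*_{t-1}. phi = 0 gives design (31)."""
    rng = np.random.default_rng(seed)
    Phi = np.diag([phi, phi])
    Theta = np.array([[0.3, 0.4], [theta21, 0.6]])
    L = np.linalg.cholesky(np.array([[1.0, sigma21], [sigma21, 1.0]]))
    n = T + burn
    z = rng.standard_normal((N, n, 2)) @ L.T      # w*_it = (u*_it, eps*_it) ~ iid N(0, Sigma)
    e = np.empty((N, n, 2))                       # (u_it, eps_it)
    e_prev = np.zeros((N, 2))                     # u_i0 = eps_i0 = 0
    z_prev = np.zeros((N, 2))
    for t in range(n):
        e[:, t] = e_prev @ Phi.T + z[:, t] + z_prev @ Theta.T
        e_prev, z_prev = e[:, t], z[:, t]
    e = e[:, burn:]                               # discard the burn-in periods
    u, eps = e[..., 0], e[..., 1]
    alpha = rng.uniform(0.0, 10.0, size=N)        # alpha_i ~ U[0, 10]
    x = np.cumsum(eps, axis=1)                    # x starts at 0 before the kept sample
    y = alpha[:, None] + beta * x + u
    return y, x, u, eps, alpha


y, x, u, eps, alpha = simulate_panel(N=20, T=20, theta21=-0.8, sigma21=-0.8, seed=0)
```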
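The design in (30) nests several important special cases. First, when $\begin{pmatrix}0.5 & 0\\ 0 & 0.5\end{pmatrix}$ is replaced by $\begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}$ and $\theta_{21}$ and $\sigma_{21}$ are constant across $i$, the DGP becomes the homogeneous panel in Section 5A. Second, when $\begin{pmatrix}0.5 & 0\\ 0 & 0.5\end{pmatrix}$ is replaced by $\begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}$, and $\theta_{21}$ and $\sigma_{21}$ are random variables that differ across $i$, the DGP is the heterogeneous panel in Section 5D.

A. Homogeneous Panel
To compare the performance of the OLS, FMOLS, and DOLS estimators for the homogeneous panel, we conducted Monte Carlo experiments based on a design similar to that of Phillips & Hansen (1990) and Phillips & Loretan (1991):
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \qquad x_{it} = x_{it-1} + \varepsilon_{it},$$
for $i = 1, \ldots, N$, $t = 1, \ldots, T$, where
$$\begin{pmatrix}u_{it}\\ \varepsilon_{it}\end{pmatrix} = \begin{pmatrix}u^*_{it}\\ \varepsilon^*_{it}\end{pmatrix} + \begin{pmatrix}0.3 & 0.4\\ \theta_{21} & 0.6\end{pmatrix}\begin{pmatrix}u^*_{it-1}\\ \varepsilon^*_{it-1}\end{pmatrix}, \qquad (31)$$
with
$$\begin{pmatrix}u^*_{it}\\ \varepsilon^*_{it}\end{pmatrix} \overset{iid}{\sim} N\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}1 & \sigma_{21}\\ \sigma_{21} & 1\end{pmatrix}\right).$$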
We generated $\alpha_i$ from a uniform distribution, U[0, 10], and set $\beta = 2$. From Theorems 1–3 we know that the asymptotic results depend upon the variances and covariances of the errors $u_{it}$ and $\varepsilon_{it}$. The design in (31) is convenient because the endogeneity of the system is controlled by only two parameters, $\theta_{21}$ and $\sigma_{21}$. We allowed $\theta_{21}$ and $\sigma_{21}$ to vary and considered the values $\{-0.8, 0.0, 0.4, 0.8\}$ for $\theta_{21}$ and $\{-0.8, -0.4, 0.8\}$ for $\sigma_{21}$. The estimate of the long-run covariance matrix in (15) was obtained using the procedure KERNEL in COINT 2.0 with a Bartlett window. The lag truncation number was set arbitrarily at five. Results with other kernels, such as the Parzen and quadratic spectral kernels, are not reported because no essential differences were found in most cases. Next, we recorded the results of Monte Carlo experiments examining the finite-sample properties of the OLS estimator, $\hat\beta_{OLS}$; the FMOLS estimator, $\hat\beta_{FM}$; and the DOLS estimator, $\hat\beta_D$. The results we report are based on 10,000 replications and are summarized in Tables 1–4 and Figures 1–8. The FMOLS estimator was obtained using a Bartlett window of lag length five, as in (15). Four lags and two leads were used for the DOLS estimator. Table 1 reports the Monte Carlo means and standard deviations (in parentheses) of $(\hat\beta_{OLS} - \beta)$, $(\hat\beta_{FM} - \beta)$, and $(\hat\beta_D - \beta)$ for sample sizes T = N = 20, 40, 60. The biases of the OLS estimator, $\hat\beta_{OLS}$, decrease at a rate of T. For example, with $\theta_{21} = \sigma_{21} = -0.8$, the bias at T = 20 is –0.201 and at T = 40 is –0.104. Also, the biases increase in $\theta_{21}$ (with $\sigma_{21} > 0$) and decrease in $\sigma_{21}$.
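The kernel step and the FMOLS correction described above can be sketched as follows. This is a minimal NumPy illustration, not the COINT 2.0 KERNEL procedure, and it rests on these assumptions:

- the regressor is scalar;
- Bartlett weights are $1 - k/(l+1)$ with lag truncation $l$;
- unit-level kernel estimates are pooled by simple averaging, reflecting the homogeneous-panel assumption;
- the residuals come from a preliminary within-OLS fit;
- the correction takes the form $\hat y^+_{it} = y_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it}$ and $\hat\Delta^+_{\varepsilon u} = \hat\Delta_{\varepsilon u} - \hat\Delta_\varepsilon\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u}$, the homogeneous counterpart of the expressions in Section IV.

The exact forms of (11)–(15) are not reproduced here. $\hat\Delta$ is taken as the one-sided weighted sum over $k \ge 0$; its $(\varepsilon, u)$ element estimates $\sum_{k\ge 0}E(\varepsilon_{t-k}u_t)$. Function names are hypothetical.

```python
import numpy as np


def bartlett_lrv(w, lag):
    """Bartlett-kernel long-run covariances of a (T, m) residual array w (no demeaning).

    Gamma_k = (1/T) sum_t w_{t-k} w_t' for k >= 0, weighted by 1 - k/(lag + 1).
    Returns (omega, delta): Gamma_0 + sum_k wt_k (Gamma_k + Gamma_k') and Gamma_0 + sum_k wt_k Gamma_k.
    """
    w = np.asarray(w, dtype=float)
    if w.ndim == 1:
        w = w[:, None]
    T = w.shape[0]
    omega = w.T @ w / T
    delta = omega.copy()
    for k in range(1, lag + 1):
        g = w[:-k].T @ w[k:] / T                  # sum_t w_{t-k} w_t' / T
        wt = 1.0 - k / (lag + 1.0)
        omega += wt * (g + g.T)
        delta += wt * g
    return omega, delta


def panel_fmols(y, x, lag=5):
    """Homogeneous-panel FMOLS sketch for a scalar regressor; y, x have shape (N, T)."""
    N, T = y.shape
    dx = np.diff(x, axis=1)                       # Delta x_it, t = 1..T-1
    xs, ys = x[:, 1:], y[:, 1:]                   # drop t = 0, where Delta x is missing
    xd = xs - xs.mean(axis=1, keepdims=True)      # within (fixed-effects) demeaning
    yd = ys - ys.mean(axis=1, keepdims=True)
    b_ols = (xd * yd).sum() / (xd * xd).sum()     # preliminary within-OLS estimate
    u_hat = yd - b_ols * xd
    omega = np.zeros((2, 2))                      # ordering: (u, eps)
    delta = np.zeros((2, 2))
    for i in range(N):                            # average the unit-level kernel estimates
        o, d = bartlett_lrv(np.column_stack([u_hat[i], dx[i]]), lag)
        omega += o / N
        delta += d / N
    o_ue, o_e, o_eu = omega[0, 1], omega[1, 1], omega[1, 0]
    d_eu, d_e = delta[1, 0], delta[1, 1]
    y_plus = ys - o_ue / o_e * dx                 # endogeneity-corrected y
    d_plus = d_eu - d_e / o_e * o_eu              # serial-correlation correction
    num = (xd * y_plus).sum() - N * (T - 1) * d_plus
    return num / (xd * xd).sum()


rng = np.random.default_rng(0)
N, T = 20, 100
eps = rng.standard_normal((N, T))
x = np.cumsum(eps, axis=1)
u = 0.6 * eps + 0.8 * rng.standard_normal((N, T))   # regressor is endogenous
y = rng.uniform(0.0, 10.0, size=(N, 1)) + 2.0 * x + u
beta_fm = panel_fmols(y, x, lag=5)
```

Calling `panel_fmols(y, x, lag=2)` mimics the shorter bandwidth that Tables 2 and 4 label FMOLS(2).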
–0.176 (0.044) –0.099 (0.017) –0.069 (0.009) –0.064 (0.025) –0.038 (0.009) –0.027 (0.005) –0.002 (0.015) –0.005 (0.005) –0.004 (0.003) 0.038 (0.012) 0.018 (0.004) 0.011 (0.002)
–0.201 (0.049) –0.104 (0.019) –0.070 (0.010)
–0.132 (0.038) –0.066 (0.014) –0.044 (0.007)
–0.079 (0.027) –0.039 (0.009) –0.026 (0.005)
–0.029 (0.016) –0.015 (0.006) –0.009 (0.003)
$\sigma_{21} = -0.8$: $\hat\beta_{FM} - \beta$
0.007 (0.008) 0.003 (0.002) 0.002 (0.001)
0.001 (0.017) 0.001 (0.005) 0.000 (0.003)
–0.001 (0.027) –0.001 (0.027) –0.000 (0.005)
–0.001 (0.040) –0.000 (0.013) –0.000 (0.007)
$\hat\beta_D - \beta$
–0.019 (0.017) –0.009 (0.006) –0.007 (0.003)
–0.059 (0.026) –0.029 (0.009) –0.019 (0.005)
–0.082 (0.030) –0.041 (0.011) –0.027 (0.006)
–0.097 (0.032) –0.049 (0.012) –0.033 (0.007)
$\hat\beta_{OLS} - \beta$
0.036 (0.015) 0.018 (0.005) 0.012 (0.002)
–0.019 (0.022) –0.012 (0.008) –0.009 (0.004)
–0.068 (0.029) –0.038 (0.011) –0.026 (0.006)
–0.113 (0.035) –0.062 (0.013) –0.042 (0.007)
$\sigma_{21} = -0.4$: $\hat\beta_{FM} - \beta$
0.007 (0.014) 0.003 (0.004) 0.002 (0.002)
0.002 (0.026) 0.001 (0.008) –0.001 (0.008)
–0.002 (0.031) –0.001 (0.009) –0.001 (0.005)
–0.002 (0.033) –0.001 (0.011) –0.000 (0.006)
$\hat\beta_D - \beta$
0.114 (0.034) 0.057 (0.012) 0.038 (0.007)
0.005 (0.016) 0.002 (0.006) 0.001 (0.003)
–0.014 (0.013) –0.007 (0.005) –0.005 (0.002)
–0.022 (0.011) –0.011 (0.004) –0.007 (0.002)
$\hat\beta_{OLS} - \beta$
0.012 (0.028) 0.011 (0.009) 0.010 (0.005)
–0.069 (0.021) –0.035 (0.007) –0.023 (0.004)
–0.073 (0.018) –0.037 (0.006) –0.025 (0.003)
–0.069 (0.016) –0.036 (0.006) –0.024 (0.003)
$\sigma_{21} = 0.8$: $\hat\beta_{FM} - \beta$
Mean Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
0.000 (0.031) –0.000 (0.009) 0.000 (0.005)
0.006 (0.017) 0.003 (0.005) 0.002 (0.003)
–0.003 (0.013) –0.001 (0.004) –0.001 (0.002)
–0.009 (0.009) –0.004 (0.003) –0.003 (0.002)
$\hat\beta_D - \beta$
Note: (a) N = T. (b) A Bartlett window with lag length 5 is used for the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator.
Rows: $\theta_{21} = -0.8$, $0.0$, $0.4$, $0.8$, each with T = 20, 40, 60.
$\hat\beta_{OLS} - \beta$
Table 1.
Table 2. Mean Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators for Different N and T
Entries are mean biases $\hat\beta - \beta$, with standard deviations in parentheses.

(N, T)       OLS              FMOLS(5)         FMOLS(2)         DOLS(4,2)        DOLS(2,1)
(1, 20)      –0.135 (0.184)   –0.104 (0.196)   –0.122 (0.189)   –0.007 (0.297)   0.031 (0.211)
(1, 40)      –0.070 (0.093)   –0.059 (0.012)   –0.065 (0.092)   –0.001 (0.106)   0.015 (0.090)
(1, 60)      –0.047 (0.063)   –0.041 (0.064)   –0.043 (0.061)   –0.001 (0.064)   0.009 (0.057)
(1, 120)     –0.024 (0.032)   –0.023 (0.031)   –0.022 (0.031)   –0.001 (0.029)   0.004 (0.027)
(20, 20)     –0.082 (0.030)   –0.068 (0.029)   –0.075 (0.029)   –0.002 (0.031)   0.017 (0.028)
(20, 40)     –0.042 (0.016)   –0.039 (0.015)   –0.039 (0.015)   –0.001 (0.015)   0.008 (0.014)
(20, 60)     –0.028 (0.010)   –0.027 (0.010)   –0.026 (0.009)   –0.000 (0.009)   0.006 (0.009)
(20, 120)    –0.014 (0.005)   –0.014 (0.005)   –0.013 (0.005)   –0.000 (0.005)   0.003 (0.004)
(40, 20)     –0.081 (0.022)   –0.066 (0.021)   –0.073 (0.021)   –0.001 (0.022)   0.017 (0.019)
(40, 40)     –0.041 (0.011)   –0.038 (0.011)   –0.038 (0.011)   –0.001 (0.009)   0.008 (0.009)
(40, 60)     –0.028 (0.007)   –0.026 (0.007)   –0.025 (0.007)   –0.001 (0.007)   0.005 (0.006)
(40, 120)    –0.014 (0.004)   –0.014 (0.004)   –0.013 (0.003)   –0.000 (0.003)   0.003 (0.004)
(60, 20)     –0.080 (0.017)   –0.067 (0.017)   –0.073 (0.017)   –0.002 (0.018)   0.016 (0.016)
(60, 40)     –0.041 (0.009)   –0.038 (0.009)   –0.038 (0.009)   –0.001 (0.008)   0.008 (0.008)
(60, 60)     –0.027 (0.006)   –0.026 (0.006)   –0.025 (0.006)   –0.001 (0.005)   0.005 (0.005)
(60, 120)    –0.014 (0.003)   –0.014 (0.003)   –0.012 (0.003)   –0.000 (0.003)   0.003 (0.003)
(120, 20)    –0.079 (0.012)   –0.066 (0.012)   –0.072 (0.012)   –0.002 (0.012)   0.016 (0.011)
(120, 40)    –0.041 (0.006)   –0.037 (0.006)   –0.037 (0.006)   –0.001 (0.006)   0.008 (0.005)
(120, 60)    –0.027 (0.004)   –0.026 (0.004)   –0.025 (0.004)   –0.001 (0.004)   0.005 (0.004)
(120, 120)   –0.014 (0.002)   –0.014 (0.002)   –0.013 (0.002)   –0.000 (0.002)   0.003 (0.002)

Note: (a) Bartlett windows with lag lengths 5 and 2 are used for the FMOLS(5) and FMOLS(2) estimators. (b) Four lags and two leads, and two lags and one lead, are used for the DOLS(4,2) and DOLS(2,1) estimators, respectively. (c) $\sigma_{21} = -0.4$ and $\theta_{21} = 0.4$.
–5.594 (1.330) –8.435 (1.382) –10.749 (1.439) –2.377 (1.042) –4.558 (1.071) –6.012 (1.109) –0.145 (0.919) –0.796 (0.888) –1.294 (0.899) 3.694 (1.201) 5.509 (1.243) 7.130 (1.281)
–7.247 (1.526) –10.047 (1.484) –12.250 (1.468) –5.425 (1.340) –7.507 (1.302) –9.161 (1.287) –3.927 (1.200) –5.453 (1.173) –6.674 (1.161)
–2.067 (1.066) –2.898 (1.050) –3.574 (1.040)
$\sigma_{21} = -0.8$: FMOLS
0.635 (0.732) 0.948 (0.712) 1.236 (0.737)
0.054 (0.993) 0.001 (0.926) 0.147 (0.927)
–0.046 (1.132) –0.017 (1.023) –0.009 (1.009)
–0.047 (1.281) –0.004 (1.119) –0.004 (1.093)
DOLS
–1.229 (1.084) –1.758 (1.067) –2.188 (1.061)
–2.944 (1.241) –4.134 (1.229) –5.070 (1.229)
–3.905 (1.334) –5.462 (1.325) –6.676 (1.329)
–4.650 (1.393) –6.503 (1.389) –7.937 (1.397)
OLS
2.893 (1.214) 4.041 (1.161) 4.983 (1.143)
–1.006 (1.180) –1.684 (1.086) –2.198 (1.065)
–3.017 (1.282) –4.401 (1.205) –5.489 (1.197)
–4.823 (1.414) –6.833 (1.366) –8.429 (1.377)
$\sigma_{21} = -0.4$: FMOLS
0.530 (1.107) 0.741 (0.984) 0.913 (0.964)
0.096 (1.342) 0.168 (1.134) 0.199 (1.088)
–0.124 (1.402) –0.104 (1.168) –0.126 (1.118)
–0.086 (1.423) –0.069 (1.187) –0.084 (1.135)
DOLS
4.495 (1.123) 6.255 (1.088) 7.630 (1.092)
0.277 (0.897) 0.334 (0.885) 0.405 (0.891)
–0.925 (0.867) –1.336 (0.856) –1.626 (0.859)
–1.758 (0.859) –2.491 (0.847) –3.030 (0.847)
OLS
Mean Biases and Standard Deviations of t-Statistics
0.542 (1.209) 1.349 (1.103) 1.975 (1.087)
–5.198 (1.503) –7.086 (1.441) –8.556 (1.395)
–6.864 (1.642) –9.744 (1.665) –11.966 (1.644)
–7.927 (1.719) –11.584 (1.826) –14.402 (1.840)
$\sigma_{21} = 0.8$: FMOLS
0.013 (1.350) –0.002 (1.160) 0.003 (1.109)
0.439 (1.277) 0.547 (1.104) 0.663 (1.047)
–0.277 (1.203) –0.362 (1.054) –0.408 (0.999)
–1.049 (1.122) –1.386 (1.006) –1.633 (0.959)
DOLS
Note: (a) N = T. (b) A Bartlett window with lag length 5 is used for the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator.
Rows: $\theta_{21} = -0.8$, $0.0$, $0.4$, $0.8$, each with T = 20, 40, 60.
OLS
Table 3.
Table 4. Mean Biases and Standard Deviations of t-Statistics for Different N and T
Entries are means of the t-statistics, with standard deviations in parentheses.

(N, T)       OLS              FMOLS(5)         FMOLS(2)         DOLS(4,2)        DOLS(2,1)
(1, 20)      –1.169 (1.497)   –1.264 (2.326)   –1.334 (2.031)   –0.304 (3.224)   0.232 (2.109)
(1, 40)      –1.116 (1.380)   –1.169 (1.805)   –1.232 (1.738)   –0.113 (2.086)   0.258 (1.689)
(1, 60)      –1.090 (1.357)   –1.162 (1.692)   –1.195 (1.676)   –0.071 (1.778)   0.254 (1.554)
(1, 120)     –1.092 (1.333)   –1.239 (1.165)   –1.217 (1.652)   –0.056 (1.531)   0.234 (1.448)
(20, 20)     –3.905 (1.334)   –3.017 (1.281)   –3.156 (1.230)   –0.124 (1.402)   0.695 (1.184)
(20, 40)     –3.934 (1.307)   –3.202 (1.206)   –3.169 (1.200)   –0.114 (1.186)   0.634 (1.099)
(20, 60)     –3.861 (1.306)   –3.202 (1.150)   –3.111 (1.191)   –0.053 (1.122)   0.677 (1.079)
(20, 120)    –3.893 (1.312)   –3.247 (1.149)   –3.141 (1.209)   –0.073 (1.078)   0.642 (1.061)
(40, 20)     –5.439 (1.347)   –4.163 (1.269)   –4.342 (1.226)   –0.088 (1.358)   1.008 (1.169)
(40, 40)     –5.462 (1.325)   –4.401 (1.205)   –4.344 (1.197)   –0.104 (1.168)   0.928 (1.092)
(40, 60)     –5.457 (1.328)   –4.506 (1.199)   –4.339 (1.192)   –0.098 (1.121)   0.913 (1.081)
(40, 120)    –5.469 (1.296)   –4.647 (1.190)   –4.356 (1.176)   –0.106 (1.050)   0.879 (1.033)
(60, 20)     –6.677 (1.329)   –5.097 (1.258)   –5.314 (1.208)   –0.169 (1.361)   1.179 (1.162)
(60, 40)     –6.699 (1.323)   –5.384 (1.204)   –5.309 (1.192)   –0.162 (1.169)   1.097 (1.094)
(60, 60)     –6.676 (1.329)   –5.489 (1.197)   –5.289 (1.191)   –0.126 (1.118)   1.106 (1.074)
(60, 120)    –6.677 (1.311)   –5.656 (1.196)   –5.299 (1.182)   –0.115 (1.056)   1.083 (1.041)
(120, 20)    –9.407 (1.350)   –7.153 (1.262)   –7.446 (1.215)   –0.220 (1.348)   1.662 (1.163)
(120, 40)    –9.418 (1.313)   –7.753 (1.171)   –7.753 (1.171)   –0.193 (1.157)   1.565 (1.085)
(120, 60)    –9.411 (1.310)   –7.717 (1.182)   –7.429 (1.174)   –0.177 (1.093)   1.549 (1.053)
(120, 120)   –9.408 (1.315)   –7.932 (1.195)   –7.432 (1.181)   –0.152 (1.057)   1.530 (1.040)

Note: (a) Bartlett windows with lag lengths 5 and 2 are used for the FMOLS(5) and FMOLS(2) estimators. (b) Four lags and two leads, and two lags and one lead, are used for the DOLS(4,2) and DOLS(2,1) estimators, respectively. (c) $\sigma_{21} = -0.4$ and $\theta_{21} = 0.4$.
Fig. 1. Distribution of biases of estimators with N = 40, T = 20.
Fig. 2. Distribution of t-statistics with N = 40, T = 20.
Fig. 3. Distribution of biases of estimators with N = 40, T = 40.
Fig. 4. Distribution of t-statistics with N = 40, T = 40.
Fig. 5. Distribution of biases of estimators with N = 40, T = 60.
Fig. 6. Distribution of t-statistics with N = 40, T = 60.
Fig. 7. Distribution of biases of estimators with N = 40, T = 120.
Fig. 8. Distribution of t-statistics with N = 40, T = 120.
While we expected the OLS estimator to be biased, we expected the FMOLS estimator to produce much better estimates. However, the FMOLS estimator noticeably has a downward bias when $\theta_{21} \ge 0$ and an upward bias when $\theta_{21} < 0$. In general, the FMOLS estimator, $\hat\beta_{FM}$, presents the same degree of difficulty with bias as does the OLS estimator, $\hat\beta_{OLS}$. For example, while $\hat\beta_{FM}$ reduces the bias substantially and outperforms $\hat\beta_{OLS}$ when $\sigma_{21} > 0$ and $\theta_{21} < 0$, the opposite is true when $\sigma_{21} > 0$ and $\theta_{21} > 0$. Likewise, when $\theta_{21} = -0.8$, $\hat\beta_{FM}$ is less biased than $\hat\beta_{OLS}$ for $\sigma_{21} = -0.8$, yet for $\sigma_{21} = -0.4$ the bias in $\hat\beta_{OLS}$ is less than the bias in $\hat\beta_{FM}$. There seems to be little to choose between $\hat\beta_{OLS}$ and $\hat\beta_{FM}$ when $\theta_{21} < 0$. This is probably due to the failure of the non-parametric correction procedure in the presence of negative serial correlation in the errors, i.e. a negative MA coefficient, $\theta_{21} < 0$. Finally, for the cases where $\theta_{21} = 0.0$, $\hat\beta_{FM}$ outperforms $\hat\beta_{OLS}$ when $\sigma_{21} < 0$, but is more biased than $\hat\beta_{OLS}$ when $\sigma_{21} > 0$. In contrast, the results in Table 1 show that the DOLS estimator, $\hat\beta_D$, is distinctly superior to both the OLS and FMOLS estimators in terms of mean bias in all cases, while the FMOLS estimator leads to significant bias. The FMOLS estimator is also complicated by the dependence of the corrections in (11) and (12) upon the preliminary estimator (here we use OLS), which may be biased in finite samples. The DOLS differs from the FMOLS estimator in that it requires no initial estimation and no non-parametric correction.

It is important to know how variations in the panel dimensions affect the results, since actual panel data come in a wide variety of cross-section and time-series dimensions. Table 2 considers 20 different combinations of N and T, with N ranging from 1 to 120 and T from 20 to 120, for $\sigma_{21} = -0.4$ and $\theta_{21} = 0.4$.
First, we notice that the cross-section dimension has a significant effect on the biases of $\hat\beta_{OLS}$, $\hat\beta_{FM}$, and $\hat\beta_D$ when N is increased from 1 to 20. However, when N is increased from 20 to 40 and beyond, there is little effect on the biases of these estimators. From this it seems that in practice the T dimension must exceed the N dimension, especially for the OLS and FMOLS estimators, in order to get a good approximation of the limiting distributions of the estimators. For example, for each of the estimators in Table 2, the reported bias is substantially less for (T = 120, N = 40) than it is for either (T = 40, N = 40) or (T = 40, N = 120). The results in Table 2 again confirm the superiority of the DOLS. The largest absolute bias of the DOLS with four lags and two leads, DOLS(4, 2), is less than or equal to 0.002 for all cases except N = 1 and T = 20. This can be compared with a simulation standard error (in parentheses) that is less than 0.007 when N ≥ 20 and T ≥ 60, confirming the accuracy of DOLS(4, 2). The DOLS with two lags and one lead, DOLS(2, 1), starts off slightly biased at N = 1 and T = 20 and converges to an almost unbiased coefficient estimate at N = 20 and T = 40. The biases of DOLS(2, 1) move in the opposite direction to those of DOLS(4, 2).

Figures 1, 3, 5 and 7 display estimated pdfs of the estimators for $\sigma_{21} = -0.4$ and $\theta_{21} = 0.4$ with N = 40 (T = 20 in Figure 1, T = 40 in Figure 3, T = 60 in Figure 5 and T = 120 in Figure 7). In Figure 1 (N = 40, T = 20), the DOLS is much better centered than the OLS and FMOLS. In Figures 3, 5 and 7, the biases of the OLS and FMOLS are reduced as T increases, but the DOLS still dominates the OLS and FMOLS.

Monte Carlo means and standard deviations of the t-statistic, $t_{\beta=\beta_0}$, are given in Table 3. Here, the OLS t-statistic is the conventional t-statistic as printed by standard statistical packages, and it is compared with the FMOLS and DOLS t-statistics. For all values of $\theta_{21}$ and $\sigma_{21}$, the DOLS(4, 2) t-statistic is well approximated by a standard N(0, 1), as suggested by the asymptotic results. The DOLS(4, 2) t-statistic is much closer to the standard normal density than the OLS and FMOLS t-statistics. When $\sigma_{21} > 0$ and $\theta_{21} < 0$, the OLS t-statistic is more heavily biased than the FMOLS t-statistic; when $\sigma_{21} > 0$ and $\theta_{21} > 0$, the opposite is again true. Even when $\theta_{21} = 0$, the FMOLS t-statistic is not well approximated by a standard N(0, 1). The OLS t-statistic performs better than the FMOLS t-statistic when $\sigma_{21} = 0.8$ and $\theta_{21} > 0$, and when $\sigma_{21} \le -0.4$ and $\theta_{21} = -0.8$, but not in other cases. In general, the FMOLS t-statistic does not perform better than the OLS t-statistic. Table 4 shows that both the OLS and the FMOLS t-statistics become more negatively biased as the cross-section dimension N increases. The heavily negative biases of the FMOLS t-statistic in Tables 3–4 again indicate the poor performance of the FMOLS estimator. For DOLS(4, 2), the biases decrease rapidly and the standard deviations converge to 1.0 as T increases.
Similar to Table 2, we observe from Table 4 that for the DOLS t-statistic the T dimension is more important than the N dimension in reducing bias. However, the improvement of the DOLS t-statistic is rather marginal as T increases. Figures 2, 4, 6 and 8 display estimated pdfs of the t-statistics for $\sigma_{21} = -0.4$ and $\theta_{21} = 0.4$ with N = 40 (T = 20 in Figure 2, T = 40 in Figure 4, T = 60 in Figure 6 and T = 120 in Figure 8). The figures show clearly that the DOLS t-statistic is well approximated by a standard N(0, 1), especially as T increases. From the results in Tables 2 and 4 and Figures 1–8 we note that the sequential limit theory approximates the limiting distributions of the DOLS estimator and its t-statistic very well.

It is known that when the time series is short the estimate $\hat\Omega$ in (15) may be sensitive to the choice of bandwidth. In Tables 2 and 4, we first investigate the sensitivity of the FMOLS estimator to the bandwidth by changing the lag length of the Bartlett window from 5 to 2. Overall, the results show that this change does not lead to substantial changes in the biases of the FMOLS estimator and its t-statistic. However, the biases of the DOLS estimator and its t-statistic are reduced substantially when the lags and leads are changed from (2, 1) to (4, 2), as predicted from Theorem 3. The results in Tables 2 and 4 thus show that the DOLS method gives different estimates of $\beta$ and of the t-statistic depending on the number of lags and leads chosen. This seems to be a drawback of the DOLS estimator, and further research is needed on how to choose the lags and leads for the DOLS estimator in the panel setting.

B. ARMA(1, 1) Error Terms
In this section, we look at simulations in which the errors are generated by an ARMA(1, 1) process, as in (30), instead of an MA(1) process, as in (31). One may object that the MA(1) specification in (31) is unfair to the FMOLS estimator. One of the reasons why the DOLS performs much better than the FMOLS lies in the simulation design in (31), which assumes that the error terms are MA(1) processes: if $(u_{it}, \varepsilon_{it})$ is an MA(1) process, then $u_{it}$ can be written exactly in terms of three terms, $\varepsilon_{it-1}$, $\varepsilon_{it}$, and $\varepsilon_{it+1}$, and no lag truncation approximation is required for the DOLS. Tables 5 and 6 report the performance of OLS, FMOLS, and DOLS and their t-statistics when the errors are generated by an ARMA(1, 1) process. They show that the FMOLS estimator and its t-statistic are less biased than the OLS estimator in most cases but are outperformed by the DOLS. Again, when $\theta_{21} \ge 0.0$ and $\sigma_{21} = 0.8$, the FMOLS estimator and its t-statistic suffer from severe biases.
On the other hand, we observe that the DOLS shows less improvement over OLS and FMOLS than in Tables 1 and 3. Moreover, the good performance of DOLS may disappear for higher-order ARMA(p, q) error processes.

C. Non-normal Errors
In this section, we conduct an experiment where the error terms are non-normal. The DGP is similar to that of Gonzalo (1994):
–0.101 (0.038) –0.052 (0.014) –0.035 (0.008) –0.039 (0.024) –0.020 (0.008) –0.013 (0.004) –0.006 (0.015) –0.003 (0.005) –0.002 (0.003) 0.017 (0.009) 0.008 (0.003) 0.005 (0.001)
–0.110 (0.042) –0.052 (0.015) –0.034 (0.008)
–0.073 (0.032) –0.034 (0.011) –0.022 (0.006)
–0.046 (0.025) –0.021 (0.009) –0.014 (0.005)
–0.020 (0.016) –0.008 (0.005) –0.006 (0.003)
$\sigma_{21} = -0.8$: $\hat\beta_{FM} - \beta$
0.002 (0.007) 0.002 (0.002) 0.001 (0.001)
0.001 (0.015) 0.000 (0.005) 0.001 (0.003)
0.001 (0.024) 0.000 (0.008) 0.000 (0.004)
0.003 (0.037) 0.001 (0.012) 0.000 (0.007)
$\hat\beta_D - \beta$
–0.016 (0.017) –0.007 (0.006) –0.005 (0.003)
–0.035 (0.025) –0.016 (0.008) –0.011 (0.005)
–0.045 (0.028) –0.021 (0.010) –0.013 (0.005)
–0.049 (0.029) –0.024 (0.010) –0.015 (0.006)
$\hat\beta_{OLS} - \beta$
0.017 (0.013) 0.008 (0.004) 0.005 (0.002)
–0.013 (0.022) –0.006 (0.007) –0.004 (0.004)
–0.038 (0.027) –0.019 (0.009) –0.012 (0.005)
–0.062 (0.020) –0.031 (0.011) –0.021 (0.006)
$\sigma_{21} = -0.4$: $\hat\beta_{FM} - \beta$
0.003 (0.012) 0.001 (0.004) 0.001 (0.002)
0.001 (0.023) 0.001 (0.008) 0.001 (0.004)
–0.000 (0.028) –0.000 (0.009) –0.000 (0.005)
0.000 (0.030) 0.000 (0.010) –0.000 (0.005)
$\hat\beta_D - \beta$
0.035 (0.024) 0.016 (0.009) 0.011 (0.005)
–0.001 (0.016) –0.001 (0.006) –0.000 (0.003)
–0.006 (0.013) –0.002 (0.004) –0.002 (0.002)
–0.009 (0.011) –0.004 (0.004) –0.003 (0.002)
$\hat\beta_{OLS} - \beta$
0.012 (0.024) 0.007 (0.009) 0.005 (0.005)
–0.034 (0.016) –0.016 (0.005) –0.010 (0.003)
–0.037 (0.014) –0.017 (0.004) –0.012 (0.002)
–0.036 (0.012) –0.017 (0.004) –0.012 (0.002)
$\sigma_{21} = 0.8$: $\hat\beta_{FM} - \beta$
Mean Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
0.000 (0.031) –0.000 (0.009) 0.000 (0.005)
0.003 (0.015) 0.001 (0.005) 0.002 (0.003)
–0.001 (0.012) –0.001 (0.004) –0.000 (0.002)
–0.003 (0.009) –0.001 (0.003) –0.001 (0.002)
$\hat\beta_D - \beta$
Note: (a) N = T. (b) A Bartlett window with lag length 5 is used for the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are generated by the ARMA(1, 1) process in equation (30).
Rows: $\theta_{21} = -0.8$, $0.0$, $0.4$, $0.8$, each with T = 20, 40, 60.
$\hat\beta_{OLS} - \beta$
Table 5.
–3.569 (1.323) –4.601 (1.219) –5.22 (1.195) –1.857 (1.106) –2.576 (1.044) –3.179 (1.036) –0.353 (0.952) –0.624 (0.897) –0.827 (0.904) 1.733 (0.933) 2.511 (0.871) 3.270 (0.897)
–5.316 (1.929) –7.013 (1.903) –8.437 (1.899)
–4.152 (1.762) –5.424 (1.733) –6.521 (1.721)
–3.184 (1.644) –4.120 (1.616) –4.952 (1.599)
–1.956 (1.529) –2.471 (1.507) –2.966 (1.484)
0.214 (0.663) 0.317 (0.664) 0.428 (0.694)
0.034 (0.956) 0.047 (0.909) 0.058 (0.913)
0.056 (1.132) 0.045 (1.027) 0.034 (1.004)
0.119 (1.290) 0.090 (1.119) 0.068 (1.077)
–1.496 (1.589) –1.888 (1.578) –2.267 (1.571)
–2.538 (1.769) –3.327 (1.771) –4.131 (1.746)
–3.064 (1.867) –4.069 (1.880) –4.899 (1.898)
–3.411 (1.924) –4.583 (1.949) –5.523 (1.969)
OLS
1.429 (1.015) 1.917 (1.010) 2.237 (0.999)
–0.732 (1.226) –0.967 (1.085) –1.141 (1.021)
–1.877 (1.314) –2.346 (1.149) –2.779 (1.114)
–2.912 (1.390) –3.580 (1.216) –4.206 (1.178)
0.221 (1.052) 0.294 (0.956) 0.363 (0.941)
0.038 (1.313) 0.075 (1.116) 0.206 (1.118)
–0.025 (1.388) –0.011 (1.152) –0.027 (1.096)
0.006 (1.417) 0.009 (1.166) –0.006 (1.111)
DOLS
2.315 (1.577) 3.089 (1.644) 3.736 (1.676)
–0.047 (1.498) –0.194 (1.528) –0.064 (1.498)
–0.705 (1.454) –1.099 (1.479) –1.343 (1.473)
–1.158 (1.426) –1.723 (1.445) –2.097 (1.435)
OLS
$\sigma_{21} = -0.4$: FMOLS
DOLS
$\sigma_{21} = -0.8$: FMOLS
0.564 (1.195) 0.876 (1.088) 1.132 (1.062)
–2.825 (1.327) –3.557 (1.194) –4.005 (1.096)
–3.858 (1.373) –5.034 (1.268) –6.016 (1.211)
–4.589 (1.420) –6.144 (1.343) –7.428 (1.294)
$\sigma_{21} = 0.8$: FMOLS
0.002 (1.551) –0.005 (1.239) 0.003 (1.155)
0.230 (1.276) 0.212 (1.095) 0.693 (1.094)
–0.068 (1.208) –0.134 (1.053) –0.144 (1.014)
–0.347 (1.139) –0.505 (1.011) –0.603 (0.978)
DOLS
Note: (a) N = T. (b) A Bartlett window with lag length 5 is used for the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are generated by the ARMA(1, 1) process in equation (30).
Rows: $\theta_{21} = -0.8$, $0.0$, $0.4$, $0.8$, each with T = 20, 40, 60.
OLS
Mean Biases and Standard Deviations of t-Statistics
Table 6.
$$\begin{pmatrix}u_{it}\\ \varepsilon_{it}\end{pmatrix} = \begin{pmatrix}u^*_{it}\\ \varepsilon^*_{it}\end{pmatrix} + \begin{pmatrix}0.3 & 0.4\\ \theta_{21} & 0.6\end{pmatrix}\begin{pmatrix}u^*_{it-1}\\ \varepsilon^*_{it-1}\end{pmatrix}, \qquad (32)$$
$$u^*_{it} = 0.5\,\varepsilon^*_{it} + (1 - 0.5^2)^{1/2}\,u^{**}_{it}, \qquad \varepsilon^*_{it} = \varepsilon^{**}_{it},$$
where $u^{**}_{it}$ and $\varepsilon^{**}_{it}$ are independent exponential random variables with parameter 1. The results in Tables 7–8 show that while the DOLS estimator performs better in terms of bias, the distribution of the DOLS t-statistic is far from the asymptotic N(0, 1): the standard deviations of the DOLS t-statistic are badly underestimated. To summarize the results so far, the DOLS estimator appears to be the best estimator overall, although the standard deviation of the DOLS t-statistic shows significant downward bias when the error terms are generated from non-normal distributions.

D. Heterogeneous Panel
In Sections A–C we compared the small-sample properties of the OLS, FMOLS, and DOLS estimators and concluded that the DOLS estimator and its t-statistic generally exhibit the least bias. One of the reasons for the poor performance of the FMOLS estimator in the homogeneous panel is that the FMOLS estimator needs a kernel estimator of the asymptotic covariance matrix, while the DOLS does not. By contrast, in the heterogeneous panel both the DOLS in (20) and the OLS in (33) use kernel estimators. Consequently, one may expect the much better performance of the DOLS estimator in Sections 5A–C to be limited to very specialized cases, e.g. the homogeneous panel. To test this, we now compare the performance of the OLS, FMOLS, and DOLS estimators for a heterogeneous panel using Monte Carlo experiments similar to those in Section 5A. The DGP is $y_{it} = \alpha_i + x_{it}'\beta + u_{it}$ and $x_{it} = x_{it-1} + \varepsilon_{it}$ for $i = 1, \ldots, N$, $t = 1, \ldots, T$, where
–0.011 (0.009) –0.003 (0.002) –0.001 (0.001) –0.008 (0.009) –0.005 (0.004) –0.002 (0.002) –0.010 (0.057) –0.002 (0.014) –0.001 (0.007) 0.022 (0.012) 0.006 (0.003) 0.003 (0.001)
–0.005 (0.009) –0.001 (0.002) –0.001 (0.001)
–0.002 (0.009) –0.002 (0.004) –0.001 (0.002)
0.012 (0.058) 0.003 (0.014) 0.001 (0.007)
0.011 (0.013) 0.003 (0.003) 0.001 (0.001)
= 0.25: $\hat\beta_{FM} - \beta$
–0.000 (0.002) 0.000 (0.001) 0.000 (0.000)
0.001 (0.054) 0.000 (0.013) 0.000 (0.006)
–0.001 (0.005) –0.000 (0.001) –0.000 (0.001)
–0.000 (0.002) –0.000 (0.000) –0.000 (0.000)
$\hat\beta_D - \beta$
0.034 (0.020) 0.009 (0.005) 0.004 (0.002)
0.005 (0.017) 0.001 (0.004) 0.001 (0.002)
–0.002 (0.009) –0.000 (0.002) –0.000 (0.001)
–0.002 (0.006) –0.001 (0.001) –0.000 (0.001)
$\hat\beta_{OLS} - \beta$
0.049 (0.019) 0.014 (0.005) 0.007 (0.002)
–0.007 (0.016) –0.002 (0.004) –0.001 (0.002)
–0.008 (0.009) –0.002 (0.002) –0.001 (0.001)
–0.007 (0.006) 0.002 (0.001) –0.001 (0.001)
= 0.5: $\hat\beta_{FM} - \beta$
0.001 (0.013) 0.000 (0.003) 0.000 (0.001)
0.001 (0.014) 0.000 (0.003) –0.000 (0.002)
–0.000 (0.005) –0.000 (0.001) –0.000 (0.001)
–0.000 (0.003) –0.028 (0.001) –0.000 (0.000)
$\hat\beta_D - \beta$
0.039 (0.016) 0.012 (0.004) 0.005 (0.002)
0.001 (0.005) 0.000 (0.001) 0.000 (0.001)
–0.001 (0.004) –0.000 (0.001) –0.000 (0.000)
–0.001 (0.003) –0.000 (0.001) –0.000 (0.000)
$\hat\beta_{OLS} - \beta$
0.008 (0.014) 0.003 (0.004) 0.002 (0.002)
–0.005 (0.005) –0.001 (0.001) –0.001 (0.001)
–0.005 (0.004) –0.001 (0.001) –0.001 (0.000)
–0.004 (0.003) –0.001 (0.001) –0.001 (0.000)
= 1: $\hat\beta_{FM} - \beta$
Mean Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
0.000 (0.013) –0.000 (0.003) –0.000 (0.001)
0.000 (0.003) 0.000 (0.001) 0.000 (0.000)
–0.000 (0.002) –0.000 (0.001) –0.000 (0.000)
–0.000 (0.002) –0.000 (0.000) –0.000 (0.000)
$\hat\beta_D - \beta$
Note: (a) N = T. (b) A Bartlett window with lag length 5 is used for the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are non-normal.
Rows: $\theta_{21} = -0.8$, $0.0$, $0.4$, $0.8$, each with T = 20, 40, 60.
$\hat\beta_{OLS} - \beta$
Table 7.
–1.248 (0.940) –0.892 (0.599) –0.738 (0.488) –0.884 (0.932) –0.787 (0.599) –0.651 (0.488) –0.164 (0.941) –0.106 (0.616) –0.093 (0.505) 1.714 (0.951) 1.249 (0.605) 1.036 (0.492)
–0.699 (1.311) –0.717 (1.253) –0.741 (1.267)
–0.259 (1.243) –0.587 (1.250) –0.611 (1.264)
0.275 (1.271) 0.282 (1.231) 0.264 (1.248)
1.104 (1.326) 1.134 (1.262) 1.163 (1.274)
–0.000 (0.189) 0.001 (0.126) 0.001 (0.102)
0.014 (0.896) 0.013 (0.579) 0.002 (0.477)
–0.071 (0.561) –0.007 (0.230) –0.008 (0.188)
–0.006 (0.209) –0.002 (0.139) –0.002 (0.113)
2.286 (1.278) 2.368 (1.208) 2.416 (1.214)
0.340 (1.236) 0.347 (1.186) 0.332 (1.193)
–0.259 (1.243) –0.268 (1.189) –0.289 (1.197)
–0.472 (1.245) –0.484 (1.191) –0.506 (1.199)
OLS
2.528 (0.976) 1.947 (0.633) 1.637 (0.513)
–0.398 (0.941) –0.268 (0.611) –0.226 (0.497)
–0.884 (0.932) –0.626 (0.599) –0.519 (0.485)
–1.055 (0.931) –0.752 (0.597) –0.623 (0.483)
0.035 (0.650) 0.035 (0.446) 0.033 (0.363)
0.031 (0.784) 0.025 (0.509) 0.013 (0.421)
–0.071 (0.561) –0.054 (0.363) –0.052 (0.299)
–0.039 (0.421) –0.003 (0.276) –0.028 (0.227)
DOLS
2.749 (1.067) 2.946 (0.992) 3.011 (0.981)
0.145 (1.041) 0.141 (0.982) 0.125 (0.978)
–0.199 (1.040) –0.213 (0.981) –0.232 (0.978)
–0.406 (1.040) –0.424 (0.981) –0.445 (0.979)
OLS
= 0.5 FMOLS
DOLS
= 0.25 FMOLS
0.539 (0.984) 0.598 (0.672) 0.538 (0.554)
–0.961 (0.931) –0.685 (0.594) –0.570 (0.478)
–1.152 (0.927) –0.831 (0.589) –0.692 (0.474)
–1.265 (0.925) –0.918 (0.588) –0.764 (0.472)
=1 FMOLS
0.026 (0.899) 0.008 (0.624) –0.002 (0.525)
0.066 (0.619) 0.053 (0.407) 0.039 (0.337)
–0.019 (0.567) –0.016 (0.368) –0.020 (0.304)
–0.118 (0.520) –0.096 (0.336) –0.088 (0.276)
DOLS
Note: (a) N = T. (b) A Bartlett window with lag length 5 is used for the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are non-normal.
T = 60
T = 40
21 = –0.8 T = 20
T = 60
T = 40
21 = 0.0 T = 20
T = 60
T = 40
21 = 0.4 T = 20
T = 60
T = 40
T = 20
OLS
Mean Biases and Standard Deviations of t-statistics
Table 8.
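$$
\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix}
= \begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix}
+ \begin{pmatrix} 0.3 & -0.4 \\ \theta_{21} & 0.6 \end{pmatrix}
\begin{pmatrix} u^*_{it-1} \\ \varepsilon^*_{it-1} \end{pmatrix}
$$

with

$$
\begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix}
\overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix} \right).
$$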
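The error process above can be simulated directly. The sketch below is a minimal NumPy version, not the authors' Gauss program. It assumes a single regressor (k = 1), starts $x_{i0}$ at zero, draws one pre-sample shock to start the MA(1) recursion, and takes $\alpha_i \sim U[0, 10]$ and β = 2 from the simulation design described in the next paragraph. In the heterogeneous case the per-country θ₂₁ and σ₂₁ are drawn once and then held fixed across replications; for the homogeneous case one would pass constant arrays instead. The function names are hypothetical.

```python
import numpy as np


def draw_heterogeneity(N, rng):
    """Draw theta21_i and sigma21_i from U[-0.8, 0.8] once per design (then keep them fixed)."""
    return rng.uniform(-0.8, 0.8, size=N), rng.uniform(-0.8, 0.8, size=N)


def simulate_panel(theta21, sigma21, T, beta=2.0, rng=None):
    """Simulate y_it = alpha_i + beta * x_it + u_it with x_it = x_i,t-1 + eps_it.

    (u_it, eps_it)' = (u*_it, eps*_it)' + A_i (u*_i,t-1, eps*_i,t-1)',
    A_i = [[0.3, -0.4], [theta21_i, 0.6]], (u*, eps*) iid N(0, [[1, s21], [s21, 1]]).
    Sketch only: x_i0 = 0 and one pre-sample shock starts the MA(1) (assumptions).
    """
    rng = np.random.default_rng(rng)
    theta21 = np.asarray(theta21, dtype=float)
    sigma21 = np.asarray(sigma21, dtype=float)
    N = theta21.size
    alpha = rng.uniform(0.0, 10.0, size=N)              # individual effects alpha_i ~ U[0, 10]
    y = np.empty((N, T))
    x = np.empty((N, T))
    for i in range(N):
        cov = np.array([[1.0, sigma21[i]], [sigma21[i], 1.0]])
        shocks = rng.multivariate_normal(np.zeros(2), cov, size=T + 1)  # row 0 is pre-sample
        A = np.array([[0.3, -0.4], [theta21[i], 0.6]])
        errs = shocks[1:] + shocks[:-1] @ A.T            # row t = shock_t + A @ shock_{t-1}
        x[i] = np.cumsum(errs[:, 1])                     # x_it = x_i,t-1 + eps_it
        y[i] = alpha[i] + beta * x[i] + errs[:, 0]       # y_it = alpha_i + beta x_it + u_it
    return y, x


rng = np.random.default_rng(0)
th, sg = draw_heterogeneity(20, rng)                     # heterogeneous design, N = 20
y, x = simulate_panel(th, sg, T=40, rng=rng)
```

Passing an integer seed as `rng` makes a replication reproducible, while passing a `Generator` continues its stream.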
As in Section A, we generated $\alpha_i$ from a uniform distribution, U[0, 10], and set β = 2. In this section, we allow θ₂₁ and σ₂₁ to be random in order to generate a heterogeneous panel; that is, both θ₂₁ and σ₂₁ are drawn from a uniform distribution, U[–0.8, 0.8]. We hold these values fixed in the simulations. An estimate of $\Omega_i = \Sigma_i + \Gamma_i + \Gamma_i'$, denoted $\hat{\Omega}_i$, was obtained by COINT 2.0 with a Bartlett window, with the lag truncation number set at 5. The three estimators considered are the FMOLS, the DOLS, and the OLS, where the OLS is defined as

$$
\hat{\beta}^*_{OLS} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x^{**}_{it} - \bar{x}^{**}_i)(x^{**}_{it} - \bar{x}^{**}_i)' \right]^{-1}
\left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x^{**}_{it} - \bar{x}^{**}_i)\, y^{**}_{it} \right] \qquad (33)
$$

with $x^{**}_{it} = w_i x_{it}$, $y^{**}_{it} = w_i y_{it}$, $\bar{x}^{**}_i = \frac{1}{T}\sum_{t=1}^{T} x^{**}_{it}$, and $w_i = [\hat{\Omega}_i^{-1}]_{11}$. Two FMOLS estimators are considered: one using a lag length of 5 (FMOLS(5)) and one using a lag length of 2 (FMOLS(2)). Two DOLS estimators are also considered: DOLS with four lags and two leads, DOLS(4, 2), and DOLS with two lags and one lead, DOLS(2, 1).

The relatively good performance of the DOLS estimator observed in the homogeneous panel carries over to the heterogeneous panel, as Table 9 shows. The biases of the OLS and FMOLS estimators are substantial; again, the DOLS outperforms the OLS and FMOLS. Note from Table 9 that, except when N = 1, the FMOLS is generally more biased than the OLS. The poor performance of the FMOLS in heterogeneous panels indicates that the FMOLS estimator of Section 4 is not recommended in practice. A possible reason for this poor performance is that the FMOLS has to go through two non-parametric corrections, as in (22) and (23); a failure of the non-parametric correction can therefore be very severe for the FMOLS estimator in heterogeneous panels. Pedroni (1996) proposed several alternative versions of the FMOLS estimator, such as an FMOLS estimator based on a transformation of the estimated residuals and a group-mean based FMOLS estimator. It would be interesting to study further the issues of estimation and inference in heterogeneous panels; however, this goes beyond the scope of this chapter.

From Table 10, we note that the DOLS t-statistics tend to have heavier tails than predicted by the asymptotic distribution theory, though the bias of the DOLS t-statistic is much lower than those of the OLS and FMOLS t-statistics. It appears that the DOLS is still the best estimator overall in a heterogeneous panel.

Table 9.
Mean Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators for Different N and T in a Heterogeneous Panel

(N,T)      β̂*_OLS – β       β̂*_FM(5) – β     β̂*_FM(2) – β     β̂*_D(4,2) – β    β̂*_D(2,1) – β
(1,20)     –0.102 (0.163)    0.076 (0.319)   –0.008 (0.212)   –0.011 (0.405)    0.004 (0.264)
(1,40)     –0.052 (0.079)    0.006 (0.116)   –0.018 (0.084)    0.001 (0.121)    0.006 (0.099)
(1,60)     –0.035 (0.052)   –0.004 (0.066)   –0.014 (0.050)    0.001 (0.071)    0.005 (0.061)
(1,120)    –0.018 (0.026)   –0.008 (0.027)   –0.009 (0.023)    0.000 (0.030)    0.002 (0.029)
(20,20)    –0.025 (0.032)   –0.069 (0.054)   –0.073 (0.034)   –0.000 (0.054)    0.006 (0.040)
(20,40)    –0.016 (0.014)   –0.041 (0.019)   –0.035 (0.014)   –0.001 (0.020)    0.004 (0.017)
(20,60)    –0.012 (0.009)   –0.028 (0.011)   –0.023 (0.009)   –0.000 (0.012)    0.003 (0.011)
(20,120)   –0.006 (0.004)   –0.014 (0.005)   –0.011 (0.004)   –0.000 (0.005)    0.002 (0.005)
(40,20)    –0.023 (0.024)   –0.089 (0.038)   –0.083 (0.024)    0.000 (0.038)    0.007 (0.028)
(40,40)    –0.015 (0.009)   –0.048 (0.013)   –0.039 (0.009)   –0.001 (0.014)    0.004 (0.012)
(40,60)    –0.013 (0.006)   –0.032 (0.008)   –0.026 (0.006)    0.000 (0.009)    0.003 (0.008)
(40,120)   –0.014 (0.004)   –0.014 (0.004)   –0.012 (0.003)   –0.000 (0.003)    0.002 (0.004)
(60,20)    –0.023 (0.019)   –0.073 (0.031)   –0.074 (0.019)    0.001 (0.031)    0.006 (0.023)
(60,40)    –0.015 (0.008)   –0.042 (0.011)   –0.036 (0.008)   –0.001 (0.011)    0.004 (0.009)
(60,60)    –0.011 (0.005)   –0.029 (0.006)   –0.023 (0.005)   –0.000 (0.007)    0.003 (0.006)
(60,120)   –0.006 (0.002)   –0.014 (0.003)   –0.011 (0.002)   –0.000 (0.003)    0.002 (0.003)
(120,20)   –0.022 (0.014)   –0.075 (0.003)   –0.072 (0.022)    0.001 (0.022)    0.016 (0.011)
(120,40)   –0.015 (0.006)   –0.042 (0.008)   –0.036 (0.006)   –0.001 (0.008)    0.004 (0.007)
(120,60)   –0.011 (0.004)   –0.029 (0.004)   –0.024 (0.004)   –0.000 (0.005)    0.003 (0.004)
(120,120)  –0.006 (0.002)   –0.014 (0.002)   –0.011 (0.002)   –0.000 (0.002)    0.002 (0.002)

Note: (a) Bartlett windows with lag lengths 5 and 2 are used for the FMOLS(5) and FMOLS(2) estimators, respectively. (b) Four lags and two leads, and two lags and one lead, are used for the DOLS(4,2) and DOLS(2,1) estimators, respectively. (c) θ₂₁ ~ U[–0.8, 0.8] and σ₂₁ ~ U[–0.8, 0.8].
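To make equation (33) concrete, here is a minimal NumPy sketch for a single regressor. It is not the COINT 2.0 routine used in the chapter: `bartlett_lrv` is a hypothetical helper computing a Bartlett-kernel estimate of Ω = Σ + Γ + Γ′ with lag truncation 5, and building $\hat{\Omega}_i$ from the pair (first-stage residual, $\Delta x_{it}$) in `weights_from_residuals` is our assumption. Because $\sum_t (x^{**}_{it} - \bar{x}^{**}_i) = 0$, the individual intercepts drop out without demeaning $y^{**}_{it}$.

```python
import numpy as np


def bartlett_lrv(e, lags=5):
    """Bartlett-kernel long-run covariance Sigma + Gamma + Gamma' of a (T, m) array."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean(axis=0)
    T = e.shape[0]
    omega = e.T @ e / T                                  # Sigma: contemporaneous covariance
    for j in range(1, lags + 1):
        gamma_j = e[j:].T @ e[:-j] / T                   # sum_t e_t e_{t-j}' / T
        omega += (1.0 - j / (lags + 1.0)) * (gamma_j + gamma_j.T)
    return omega


def weights_from_residuals(uhat, x, lags=5):
    """Hypothetical helper: w_i = [Omega_hat_i^{-1}]_{11}, Omega_hat_i built from (uhat_it, dx_it).

    Which residuals enter Omega_hat_i is an assumption of this sketch.
    """
    dx = np.diff(x, axis=1)
    w = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        omega = bartlett_lrv(np.column_stack([uhat[i, 1:], dx[i]]), lags)
        w[i] = np.linalg.inv(omega)[0, 0]
    return w


def weighted_pooled_ols(y, x, w):
    """beta*_OLS of eq. (33) for k = 1; y and x are (N, T) arrays, w has shape (N,)."""
    xs = w[:, None] * x                                  # x**_it = w_i x_it
    ys = w[:, None] * y                                  # y**_it = w_i y_it
    xd = xs - xs.mean(axis=1, keepdims=True)             # x**_it - xbar**_i
    return np.sum(xd * ys) / np.sum(xd * xd)
```

With noise-free data $y_{it} = \alpha_i + 2x_{it}$, the estimator returns exactly 2 for any positive weights, which is a convenient sanity check.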
V. CONCLUSION

This chapter discusses limiting distributions for the OLS, FMOLS, and DOLS estimators in a cointegrated regression. We also investigate the finite sample properties of the OLS, FMOLS, and DOLS estimators. The results from the Monte Carlo simulations can be summarized as follows. First, for the homogeneous panel, when the serial correlation parameter, θ₂₁, and the endogeneity parameter, σ₂₁, are both negative, the OLS is the most biased estimator; for the heterogeneous panel, the OLS is biased in almost all cases. Second, for the homogeneous panel, the FMOLS is more biased than the OLS when θ₂₁ ≥ 0 and σ₂₁ > 0, and for the heterogeneous panel the FMOLS is severely biased in almost all trials. This indicates that the failure of the non-parametric correction is very serious, especially in the heterogeneous panel. Third, the DOLS performs very well in all cases for both the homogeneous and heterogeneous panels, and increasing the number of leads and lags reduces the bias of the DOLS substantially, as predicted by the asymptotic theory in Theorem 3. Fourth, the sequential limit theory approximates the limit distributions of the DOLS estimator and its t-statistic very well.

All in all, our findings are as follows: (i) the OLS estimator has a non-negligible bias in finite samples; (ii) the FMOLS estimator does not improve over the OLS estimator in general; and (iii) the FMOLS estimator is complicated by the dependence of the correction terms upon the preliminary estimator (here OLS), which may be very biased in finite samples with panel data. More seriously, the failure of the non-parametric correction for the FMOLS in panel data could be severe. This indicates that the DOLS estimator may be more promising than the OLS or FMOLS estimators for estimating cointegrated panel regressions.
Table 10.
Mean Biases and Standard Deviations of t-statistics for Different N and T in a Heterogeneous Panel

(N,T)      OLS              FMOLS(5)         FMOLS(2)         DOLS(4,2)        DOLS(2,1)
(1,20)     –0.893 (1.390)    0.588 (2.473)   –0.058 (1.643)   –0.093 (3.303)    0.029 (2.156)
(1,40)     –0.861 (1.265)    0.101 (1.849)    0.280 (1.331)    0.009 (1.980)    0.106 (1.618)
(1,60)     –0.844 (1.233)   –0.095 (1.579)   –0.347 (1.207)    0.016 (1.729)    0.119 (1.489)
(1,120)    –0.845 (1.212)   –0.372 (1.336)   –0.459 (1.139)    0.016 (1.510)    0.101 (1.405)
(20,20)    –1.221 (1.578)   –2.411 (1.902)   –2.530 (1.192)    0.010 (1.983)    0.219 (1.468)
(20,40)    –1.629 (1.344)   –2.899 (1.345)   –2.518 (0.999)   –0.059 (1.485)    0.271 (1.259)
(20,60)    –1.774 (1.282)   –3.031 (1.195)   –2.508 (0.952)    0.004 (1.329)    0.347 (1.184)
(20,120)   –1.957 (1.239)   –3.095 (1.047)   –2.466 (0.907)    0.046 (1.197)    0.393 (1.121)
(40,20)    –1.612 (1.640)   –4.381 (1.882)   –4.079 (1.191)    0.039 (1.987)    0.365 (1.466)
(40,40)    –2.194 (1.392)   –4.807 (1.341)   –3.969 (1.004)   –0.068 (1.472)    0.432 (1.233)
(40,60)    –2.417 (1.306)   –4.905 (1.199)   –3.932 (0.960)    0.007 (1.319)    0.515 (1.169)
(40,120)   –2.832 (1.234)   –4.886 (1.059)   –3.839 (0.911)    0.099 (1.181)    0.608 (1.099)
(60,20)    –1.946 (1.697)   –4.408 (1.884)   –4.474 (1.182)    0.041 (1.932)    0.408 (1.449)
(60,40)    –2.715 (1.389)   –5.171 (1.320)   –4.407 (0.976)   –0.110 (1.452)    0.472 (1.221)
(60,60)    –3.045 (1.328)   –5.361 (1.170)   –4.380 (0.933)   –0.027 (1.307)    0.572 (1.165)
(60,120)   –3.346 (1.250)   –5.420 (1.033)   –4.281 (0.889)    0.105 (1.181)    0.697 (1.099)
(120,20)   –2.675 (1.720)   –6.382 (1.878)   –6.383 (1.169)    0.073 (1.939)    0.580 (1.439)
(120,40)   –3.802 (1.408)   –7.399 (1.314)   –6.272 (0.967)   –0.145 (1.444)    0.683 (1.215)
(120,60)   –4.269 (1.336)   –7.633 (1.162)   –6.209 (0.931)   –0.047 (1.307)    0.803 (1.165)
(120,120)  –4.715 (1.250)   –7.723 (1.045)   –6.084 (0.897)    0.136 (1.178)    0.977 (1.098)

Note: (a) Bartlett windows with lag lengths 5 and 2 are used for the FMOLS(5) and FMOLS(2) estimators, respectively. (b) Four lags and two leads, and two lags and one lead, are used for the DOLS(4,2) and DOLS(2,1) estimators, respectively. (c) θ₂₁ ~ U[–0.8, 0.8] and σ₂₁ ~ U[–0.8, 0.8].
ACKNOWLEDGMENTS

We thank Suzanne McCoskey, Peter Pedroni, Andrew Levin, and participants of the 1998 North American Winter Meetings of the Econometric Society for helpful comments, and Bangtian Chen for his research assistance on an earlier draft of this chapter. Thanks also go to Denise Paul for correcting my English and carefully checking the manuscript to enhance its readability. A Gauss program for this paper can be retrieved from http://web.syr.edu/~cdkao. Address correspondence to: Chihwa Kao, Center for Policy Research, 426 Eggers Hall, Syracuse University, Syracuse, NY 13244–1020.
REFERENCES

Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey. Advances in Econometrics, 15, 7–51.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management Sciences, 19, 75–114.
Gonzalo, J. (1994). Five Alternative Methods of Estimating Long-Run Equilibrium Relationships. Journal of Econometrics, 60, 203–233.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels. Manuscript, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Kao, C., & Chen, B. (1995). On the Estimation and Inference for Cointegration in Panel Data When the Cross-Section and Time-Series Dimensions are Comparable. Manuscript, Center for Policy Research, Syracuse University.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: New Results. Discussion paper, Department of Economics, UC-San Diego.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test: Evidence From Simulations and the Bootstrap. Oxford Bulletin of Economics and Statistics, 61, 631–652.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57–84.
Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper, Department of Economics, No. 96–20, Indiana University.
Pedroni, P. (1997). Panel Cointegration: Asymptotics and Finite Sample Properties of Pooled Time Series Tests with an Application to the PPP Hypothesis. Working paper, Department of Economics, No. 95–013, Indiana University.
Pesaran, H., & Smith, R. (1995). Estimating Long-Run Relationships from Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Phillips, P. C. B., & Durlauf, S. N. (1986). Multiple Time Series Regression with Integrated Processes. Review of Economic Studies, 53, 473–495.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Loretan, M. (1991). Estimating Long-Run Economic Equilibria. Review of Economic Studies, 58, 407–436.
Phillips, P. C. B., & Moon, H. (1999). Linear Regression Limit Theory for Non-stationary Panel Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20, 971–1001.
Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9–19.
Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions. Econometric Theory, 7, 1–21.
Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems. Econometrica, 61, 783–820.
Summers, R., & Heston, A. (1991). The Penn World Table: An Expanded Set of International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
APPENDIX

Proof of Theorem 3

First we write (19) in vector form:

$$ y_i = e\alpha_i + x_i\beta + Z_{iq}C + \dot{v}_i = x_i\beta + Z_i D + \dot{v}_i \quad \text{(say)}, $$

where $y_i$ is a T × 1 vector of $y_{it}$; e is a T × 1 unit vector; $Z_{iq}$ is the T × 2q matrix of observations on the 2q regressors $\Delta x_{i,t-q}, \ldots, \Delta x_{i,t+q}$; $x_i$ is a T × k matrix of $x_{it}$; C is a 2q × 1 vector of $c_{ij}$; $\dot{v}_i$ is a T × 1 vector of $\dot{v}_{it}$; $Z_i = (e, Z_{iq})$ is a T × (2q + 1) matrix; and D is a (2q + 1) × 1 vector of parameters. Let $Q_i = I - Z_i(Z_i'Z_i)^{-1}Z_i'$. It follows that
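$$ \hat{\beta}_D - \beta = \left[ \sum_{i=1}^{N} x_i' Q_i x_i \right]^{-1} \left[ \sum_{i=1}^{N} x_i' Q_i \dot{v}_i \right]. $$

We rescale $(\hat{\beta}_D - \beta)$ by $\sqrt{N}T$ to get

$$
\begin{aligned}
\sqrt{N}T(\hat{\beta}_D - \beta)
&= \left[ \frac{1}{N}\sum_{i=1}^{N} \frac{1}{T^2} x_i' Q_i x_i \right]^{-1} \left[ \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \frac{1}{T} x_i' Q_i \dot{v}_i \right] \\
&= \left[ \frac{1}{N}\sum_{i=1}^{N} \zeta_{iT} \right]^{-1} \left[ \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \xi_{iT} \right] \\
&= [\zeta_{NT}]^{-1} \left[ \sqrt{N}\, \xi_{NT} \right],
\end{aligned}
$$

where $\xi_{NT} = \frac{1}{N}\sum_{i=1}^{N} \xi_{iT}$, $\xi_{iT} = \frac{1}{T} x_i' Q_i \dot{v}_i$, $\zeta_{NT} = \frac{1}{N}\sum_{i=1}^{N} \zeta_{iT}$, and $\zeta_{iT} = \frac{1}{T^2} x_i' Q_i x_i$.

Observe that, from Saikkonen (1991),

$$ \zeta_{iT} = \frac{1}{T^2} x_i' Q_i x_i = \frac{1}{T^2} x_i' W_T x_i + o_p(1) = \frac{1}{T^2} \sum_{t=q+1}^{T-q} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' + o_p(1) \Rightarrow \int \tilde{B}_i \tilde{B}_i' $$

and

$$ \xi_{iT} = \frac{1}{T} x_i' Q_i \dot{v}_i = \frac{1}{T} x_i' W_T \dot{v}_i + o_p(1) = \frac{1}{T} \sum_{t=q+1}^{T-q} (x_{it} - \bar{x}_i)\dot{v}_{it} + o_p(1) \Rightarrow \int \tilde{B}_i \, dB^{+}_{ui} $$

as $T \to \infty$ for all i, where $\tilde{B}_i = B_i - \int B_i$ and $W_T = I_T - \frac{1}{T} ee'$. Then, applying the multivariate Lindeberg-Levy central limit theorem to $\frac{1}{\sqrt{N}}\sum_{i=1}^{N} \int \tilde{B}_i \, dB^{+}_{ui}$ and combining this with the limit of $\frac{1}{N}\sum_{i=1}^{N} \int \tilde{B}_i \tilde{B}_i'$ as in Theorem 2, we have

$$ \left[ \frac{1}{N}\sum_{i=1}^{N} \int \tilde{B}_i \tilde{B}_i' \right]^{-1} \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \int \tilde{B}_i \, dB^{+}_{ui} \Rightarrow N\left(0,\; 6\,\Omega_{\varepsilon}^{-1}\Omega_{u.\varepsilon}\right) $$

as $N \to \infty$. It follows from the sequential limit theory that $\sqrt{N}T(\hat{\beta}_D - \beta) \Rightarrow N(0, 6\,\Omega_{\varepsilon}^{-1}\Omega_{u.\varepsilon})$, as required.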
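The estimator analyzed in this proof, $\hat{\beta}_D = [\sum_i x_i'Q_i x_i]^{-1}[\sum_i x_i'Q_i y_i]$, is simple to compute once $Z_i$ is built. The following is a minimal NumPy sketch for a single regressor, not the authors' code: the function name and the option of separate lag and lead counts (as in DOLS(4, 2)) are our choices, and $Q_i$ is applied through least squares rather than by forming the T × T matrix.

```python
import numpy as np


def dols_panel(y, x, lags=2, leads=2):
    """Pooled DOLS estimate of beta with individual intercepts (hypothetical helper, k = 1).

    For each i, Z_i = [1, dx_{t-lags}, ..., dx_{t+leads}] and
    beta_D = [sum_i x_i' Q_i x_i]^{-1} [sum_i x_i' Q_i y_i], Q_i = I - Z_i (Z_i'Z_i)^{-1} Z_i'.
    """
    N, T = y.shape
    num = den = 0.0
    for i in range(N):
        dx = np.diff(x[i])                                    # dx[t - 1] = x_t - x_{t-1}
        ts = np.arange(lags + 1, T - leads)                   # periods with every lead and lag observed
        Z = np.column_stack([np.ones(ts.size)] +
                            [dx[ts - 1 + j] for j in range(-lags, leads + 1)])
        xi, yi = x[i, ts], y[i, ts]
        qx = xi - Z @ np.linalg.lstsq(Z, xi, rcond=None)[0]   # Q_i x_i
        qy = yi - Z @ np.linalg.lstsq(Z, yi, rcond=None)[0]   # Q_i y_i
        num += qx @ qy                                        # x_i' Q_i y_i (Q_i symmetric, idempotent)
        den += qx @ qx                                        # x_i' Q_i x_i
    return num / den
```

If $y_{it} - \beta x_{it}$ lies in the column space of $Z_i$ (for example an intercept plus a multiple of $\Delta x_{it}$), $Q_i$ removes it entirely and the function returns β up to rounding.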
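Proof of Theorem 5

The proof is the same as that of Theorem 3. First, as in Theorem 3, we write (25) in vector form:

$$ y_i^* = e\alpha_i + x_i^*\beta + Z_{iq}^*C + \dot{v}_i^* = x_i^*\beta + Z_i^* D + \dot{v}_i^* \quad \text{(say)}, $$

and define $y_i^*$, e, $Z_{iq}^*$, $x_i^*$, C, $\dot{v}_i^*$, $Z_i^*$, D, and $Q_i^*$ as in the proof of Theorem 3. Then we have

$$
\begin{aligned}
\sqrt{N}T(\hat{\beta}^*_D - \beta)
&= \left[ \frac{1}{N}\sum_{i=1}^{N} \frac{1}{T^2} x_i^{*\prime} Q_i^* x_i^* \right]^{-1} \left[ \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \frac{1}{T} x_i^{*\prime} Q_i^* \dot{v}_i^* \right] \\
&= \left[ \frac{1}{N}\sum_{i=1}^{N} \zeta^*_{iT} \right]^{-1} \left[ \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \xi^*_{iT} \right] \\
&= [\zeta^*_{NT}]^{-1} \left[ \sqrt{N}\, \xi^*_{NT} \right],
\end{aligned}
$$

where $\zeta^*_{iT} = \frac{1}{T^2} x_i^{*\prime} Q_i^* x_i^*$, $\zeta^*_{NT} = \frac{1}{N}\sum_{i=1}^{N} \zeta^*_{iT}$, $\xi^*_{iT} = \frac{1}{T} x_i^{*\prime} Q_i^* \dot{v}_i^*$, and $\xi^*_{NT} = \frac{1}{N}\sum_{i=1}^{N} \xi^*_{iT}$.

Observe that, from Assumption 8, we have

$$ \zeta^*_{iT} = \frac{1}{T^2} x_i^{*\prime} Q_i^* x_i^* = \frac{1}{T^2} x_i^{*\prime} W_T^* x_i^* + o_p(1) = \frac{1}{T^2} \sum_{t=q_i+1}^{T-q_i} (x^*_{it} - \bar{x}^*_i)(x^*_{it} - \bar{x}^*_i)' + o_p(1) \Rightarrow \int \tilde{W}_i \tilde{W}_i' $$

and

$$ \xi^*_{iT} = \frac{1}{T} x_i^{*\prime} Q_i^* \dot{v}_i^* = \frac{1}{T} x_i^{*\prime} W_T^* \dot{v}_i^* + o_p(1) = \frac{1}{T} \sum_{t=q_i+1}^{T-q_i} (x^*_{it} - \bar{x}^*_i)\dot{v}^*_{it} + o_p(1) \Rightarrow \int \tilde{W}_i \, dV_i $$

as $T \to \infty$ for all i. The remainder of the proof follows that of Theorem 3.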
TESTING FOR UNIT ROOTS IN PANELS IN THE PRESENCE OF STRUCTURAL CHANGE WITH AN APPLICATION TO OECD UNEMPLOYMENT

Christian J. Murray and David H. Papell

ABSTRACT

There has been extensive research on testing for unit roots in the presence of structural change and on testing for unit roots in panels. This chapter takes a small step towards combining the two research agendas. We propose a unit root test for non-trending data in the presence of a one-time change in the mean for a heterogeneous panel. The date of the break is determined endogenously. We perform simulations to investigate the power of the test, and apply the test to a data set of annual unemployment rates for 17 OECD countries from 1955 to 1990.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 223–238.
Copyright © 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2

I. INTRODUCTION

The work of Perron (1989) has inspired extensive research on testing for unit roots in the presence of structural change. Banerjee, Lumsdaine & Stock (1992), Zivot & Andrews (1992), and Perron (1997), among many others, develop tests that allow the break to be determined endogenously, and Lumsdaine & Papell (1997) extend the tests to allow for two breaks. Starting with Levin & Lin (1992), much work has also been done on testing for unit roots in panels, including papers by Im, Pesaran & Shin (1997), Maddala & Wu (1999), and Bowman (1999).

This chapter takes a small step towards combining the two research agendas. We propose a unit root test for non-trending data in the presence of a one-time change in the mean for a heterogeneous panel. The date of the break, which is common across the 'countries' of the panel, is determined endogenously and, in the additive outlier framework, is assumed to occur instantaneously. The speed of mean reversion is also common across countries. The intercepts, the coefficients on the break dummy variable, and the serial correlation structure, however, are country specific.

In the context of testing for a unit root in the presence of structural change, our test is most closely related to the work of Perron & Vogelsang (1992). They develop a test for a unit root in non-trending data in the presence of a one-time change in the mean of a single series, with the date of the change determined endogenously. In the panel unit root context, the most closely related work is Papell (1997), who uses a feasible generalized least squares (SUR) method that allows for both contemporaneous correlation and heterogeneous serial correlation. Levin & Lin (1992) and Bowman (1999) show that, in the absence of structural change, panel unit root tests have good power in moderately sized samples of 10 or more countries, even with fairly long persistence.

We conduct two power experiments, both involving panels of non-trending, stationary series with a one-time change in the mean. First, using conventional panel unit root tests, we find very low power to reject the unit root null. Second, using tests that incorporate structural change, we find much improved power. We apply the test to a data set of annual unemployment rates for 17 OECD countries from 1955 to 1990.
Using the panel tests in the presence of structural change, we find much stronger rejections of unit roots than can be found with univariate tests that do not incorporate structural change, panel tests that do not incorporate structural change, or univariate tests that do incorporate structural change.
II. PANEL UNIT ROOT TESTS IN THE PRESENCE OF STRUCTURAL CHANGE

In this section, we develop panel unit root tests in the presence of structural change. We first discuss conventional Augmented Dickey-Fuller (ADF) unit root tests, panel unit root tests that do not incorporate structural change, and single-equation unit root tests with structural change, and then describe how to combine elements from the latter two tests to construct a panel unit root test with structural change. While our tests are for non-trending data, an extension to trending data would be straightforward.

The most common tests for unit roots are Augmented Dickey-Fuller tests. ADF tests for non-trending data involve running the following regression:

$$ \Delta u_t = \mu + \alpha u_{t-1} + \sum_{i=1}^{k} c_i \Delta u_{t-i} + \varepsilon_t, \qquad (1) $$
where $u_t$ is the variable of interest. The null hypothesis of a unit root is rejected if the absolute value of the t-statistic for α is greater than the appropriate critical value. While the critical values are non-standard, they are readily available.¹

There is substantial evidence that the lag truncation parameter k is best selected by data-dependent methods rather than fixed a priori. We follow the method suggested by Campbell & Perron (1991), Hall (1994), and Ng & Perron (1995). Start with an upper bound $k_{max}$ on k. If the t-statistic on the coefficient of the last lag is significant (using 1.645, the 10% critical value of the asymptotic normal distribution), then $k = k_{max}$. If it is not significant, then k is lowered by one. This procedure is repeated until the last lag becomes significant. If no lag is significant, then k is set to zero.

Panel unit root tests in the ADF framework for non-trending data with heterogeneous intercepts, which are equivalent to including country-specific dummy variables, involve estimating the following regressions:

$$ \Delta u_{jt} = \mu_j + \alpha u_{jt-1} + \sum_{i=1}^{k_j} c_{ji} \Delta u_{jt-i} + \varepsilon_{jt}. \qquad (2) $$
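The recursive t-statistic rule for choosing k, together with regressions (1) and (2), can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper names are hypothetical, each candidate k uses its own effective sample, and `panel_adf_pooled` estimates the common-α regression (2) by pooled OLS with country-specific intercepts and lag coefficients, whereas the chapter uses feasible GLS (SUR), which also accounts for contemporaneous cross-country correlation. The statistics must be compared with finite sample critical values such as those in Table 1, not with standard normal ones.

```python
import numpy as np


def ols_t(X, y):
    """OLS coefficients and conventional t-statistics."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (X.shape[0] - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, b / se


def adf_design(u, k):
    """Return (Delta u_t, u_{t-1}, [1, Delta u_{t-1}, ..., Delta u_{t-k}]) for eq. (1)."""
    d = np.diff(u)                                       # d[s] = u[s+1] - u[s]
    rows = np.arange(k, d.size)                          # need k lagged differences
    Z = np.column_stack([np.ones(rows.size)] + [d[rows - i] for i in range(1, k + 1)])
    return d[rows], u[rows], Z


def adf_t(u, k):
    """t-statistics from eq. (1); element 0 is the t-statistic on alpha, the last is lag k."""
    dy, lvl, Z = adf_design(np.asarray(u, dtype=float), k)
    return ols_t(np.column_stack([lvl, Z]), dy)[1]


def select_lag(u, kmax, crit=1.645):
    """General-to-specific rule: lower k until the last lag has |t| >= crit; k = 0 if none does."""
    for k in range(kmax, 0, -1):
        if abs(adf_t(u, k)[-1]) >= crit:
            return k
    return 0


def panel_adf_pooled(U, kmax=4, crit=1.645):
    """t-statistic on a common alpha in eq. (2); pooled OLS stand-in for SUR (assumption)."""
    blocks = []
    for u in U:
        u = np.asarray(u, dtype=float)
        blocks.append(adf_design(u, select_lag(u, kmax, crit)))   # k_j from the univariate rule
    n = sum(b[0].size for b in blocks)
    m = 1 + sum(b[2].shape[1] for b in blocks)           # common alpha + own intercept and lags
    X, Y = np.zeros((n, m)), np.empty(n)
    r, c = 0, 1
    for dy, lvl, Z in blocks:
        X[r:r + dy.size, 0] = lvl                        # common coefficient alpha
        X[r:r + dy.size, c:c + Z.shape[1]] = Z           # country-specific mu_j and c_ji
        Y[r:r + dy.size] = dy
        r, c = r + dy.size, c + Z.shape[1]
    return ols_t(X, Y)[1][0]
```

Under the stated simplifications, `select_lag(u, 4)` mirrors the rule with $k_{max}$ = 4, and `adf_t(u, k)[0]` is the unit root statistic for that k.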
The subscript j = 1, …, N indexes the elements of the panel which, for convenience of exposition, we will call 'countries'. While Levin & Lin (1992) show that imposing homogeneous intercepts results in substantial increases in power, there is rarely any support for such a restriction in practice. We estimate equation (2) by feasible generalized least squares (SUR), with the coefficient α equated across countries and the lag length $k_j$ set equal to the value chosen for the single-equation model (1).² This method accounts for contemporaneous and serial correlation, both of which are often important in practice.³ In Papell (1997), this method is used to investigate purchasing power parity.

The critical values for panel unit root tests computed by Levin & Lin (1992) do not incorporate serial correlation in the disturbances. Although the panel ADF statistic converges to the asymptotic distribution of the panel Dickey-Fuller statistic with no serial correlation when the number of observations is large enough, ignoring serial correlation is a serious problem in samples of the size normally used, especially when the recursive t-statistic method is used to select the lag length. Using Monte Carlo methods, we therefore compute finite sample critical values for our test statistics that account for both serial correlation and cross correlation in the residuals. First, we generate unit root series for panels of 5, 10, 15, and 20 countries with 50, 100, and 200 observations. We then fit autoregressive (AR) models to the first differences of each series, using the Schwarz criterion to choose the optimal model, and treat the optimal estimated AR models as the true data generating process for the errors of each of the series. For each panel, we construct pseudo samples using the optimal AR models with iid $N(0, \sigma^2)$ errors, where $\sigma^2$ is the estimated innovation variance of the optimal AR model.⁴ We then integrate the AR models to get the data in levels. Our test statistic is the t-statistic on α in equation (2), with the lag length $k_j$ for each series chosen by the univariate method described above. The critical values for the finite sample distributions, obtained from 10,000 replications, are reported in Table 1.

We now discuss univariate tests for a unit root in the presence of structural change for non-trending data, using the methods of Perron & Vogelsang (1992). Additive Outlier (AO) models, in which the structural change occurs instantaneously, are estimated by the following two equations:⁵

$$ u_t = \mu + \delta DU_t + \tilde{u}_t, \qquad (3) $$

and
$$ \Delta\tilde{u}_t = \sum_{i=0}^{k} \omega_i DTB_{t-i} + \alpha\tilde{u}_{t-1} + \sum_{i=1}^{k} c_i \Delta\tilde{u}_{t-i} + \varepsilon_t, \qquad (4) $$

where $\tilde{u}_t$ is the estimated residual from equation (3).⁶ TB is the break date, $DTB_t = 1$ if t = TB + 1 and 0 otherwise, and $DU_t = 1$ if t > TB and 0 otherwise.⁷ Equations (3) and (4) are estimated sequentially for each break year TB = k + 2, …, T – 1, where T is the number of observations. The break date is chosen to minimize the t-statistic for α, and data-dependent methods are used to select the lag length k. The null hypothesis of a unit root is rejected if the t-statistic on α is sufficiently large in absolute value. The finite sample critical values of Perron & Vogelsang (1992) can be used to assess the significance of the unit root statistic.

We proceed to construct a test for unit roots in panel data in the presence of structural change. With heterogeneous intercepts, the panel AO model is estimated by the following two equations:

$$ u_{jt} = \mu_j + \delta_j DU_{jt} + \tilde{u}_{jt}, \qquad (5) $$

and

$$ \Delta\tilde{u}_{jt} = \sum_{i=0}^{k_j} \omega_{ji} DTB_{jt-i} + \alpha\tilde{u}_{jt-1} + \sum_{i=1}^{k_j} c_{ji} \Delta\tilde{u}_{jt-i} + \varepsilon_{jt}, \qquad (6) $$

where $\tilde{u}_{jt}$ are the residuals from (5), $DTB_{jt} = 1$ if t = TB + 1 and 0 otherwise, $DU_{jt} = 1$ if t > TB and 0 otherwise, and j = 1, …, N indexes the countries. Using the Monte Carlo methods described above, with 2500 replications, we compute finite sample critical values for our test statistic, the t-statistic on α in equation (6).⁸

Table 1.
Finite Sample Critical Values for Panel Unit Root Tests without Structural Change

1%
N     T = 50    T = 100   T = 200
5     –5.525    –5.272    –5.121
10    –6.964    –6.604    –6.251
15    –8.327    –7.675    –7.234
20    –9.775    –8.683    –8.119

5%
N     T = 50    T = 100   T = 200
5     –4.789    –4.641    –4.512
10    –6.244    –5.923    –5.640
15    –7.603    –6.964    –6.629
20    –8.940    –7.955    –7.512

10%
N     T = 50    T = 100   T = 200
5     –4.452    –4.314    –4.177
10    –5.857    –5.594    –5.317
15    –7.221    –6.621    –6.308
20    –8.528    –7.587    –7.145
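The univariate additive-outlier procedure of equations (3) and (4), with the break date chosen to minimize the t-statistic on α, can be sketched as follows. To keep the sketch short, the lag length k is held fixed rather than chosen by the data-dependent rule, ordinary least squares is used throughout, and the function name is hypothetical. The panel version (5) and (6) would instead stack the country regressions with a common α and a common TB.

```python
import numpy as np


def ols_t(X, y):
    """OLS coefficients and conventional t-statistics."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (X.shape[0] - X.shape[1])
    return b, b / np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))


def ao_unit_root(u, k=2):
    """Minimum over TB = k+2, ..., T-1 of the t-statistic on alpha in eq. (4).

    k is fixed here (assumption). Returns (t_min, TB) with TB counted from 1, as in the text.
    """
    u = np.asarray(u, dtype=float)
    T = u.size
    s = np.arange(T)                                     # 0-based index s = t - 1
    best_t, best_tb = np.inf, None
    for tb in range(k + 2, T):
        DU = (s >= tb).astype(float)                     # DU_t = 1 if t > TB
        DTB = (s == tb).astype(float)                    # DTB_t = 1 if t = TB + 1
        X1 = np.column_stack([np.ones(T), DU])
        ut = u - X1 @ np.linalg.lstsq(X1, u, rcond=None)[0]    # residuals of eq. (3)
        d = np.diff(ut)                                  # d[r] = ut[r+1] - ut[r]
        rows = np.arange(k, T - 1)
        cols = [ut[rows]]                                # lagged residual level: coefficient alpha
        cols += [d[rows - i] for i in range(1, k + 1)]   # lagged differences
        dums = [DTB[rows + 1 - i] for i in range(k + 1)] # DTB_{t-i}, i = 0, ..., k
        cols += [c for c in dums if c.any()]             # skip dummies with no 1 in the sample
        t_alpha = ols_t(np.column_stack(cols), d[rows])[1][0]
        if t_alpha < best_t:
            best_t, best_tb = t_alpha, tb
    return best_t, best_tb
```

Because the statistic is a minimum over break dates, it must be compared with the Perron & Vogelsang (1992) critical values rather than the ordinary Dickey-Fuller ones.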
III. POWER OF PANEL UNIT ROOT TESTS

Finite sample critical values for panel unit root tests, which incorporate lag selection, are presented in Table 1. Critical values for panel unit root tests with structural change are presented in Table 2. As mentioned earlier, we allow for panels of 5, 10, 15, and 20 countries (N), with 50, 100, and 200 observations (T). In selecting the lag length, $k_{max}$ is set to 4, 8, and 12 for T = 50, 100, and 200, respectively. Tables 1 and 2 reveal three properties of panel unit root statistics. First, an increase in T decreases the absolute value of the critical value of the unit root statistic. Second, an increase in N increases its absolute value. Third, allowing for structural change increases the absolute value of the critical values.

We now focus on the power of the t-statistic on α in equations (3) and (4) and equations (5) and (6). The values of ρ (the sum of the AR coefficients) we consider are 0.95, 0.90, and 0.80. We consider mean shifts, δ, of 0.5 and 1.0; in the empirical application that follows, these values correspond to a one-half and a full percentage point increase in the unemployment rate. We set the break date in the middle of the sample, i.e. TB = T/2.⁹ Tables 3 and 4 present the finite sample power of panel unit root tests without and with structural change, respectively. The AR length is again chosen by the Schwarz criterion. The number of replications is 2500 for Table 3 and 1000 for Table 4. The upper bound on the standard error of the rejection frequencies in Table 4 is 0.016.

Table 3 documents the generally poor power of panel unit root tests that fail to allow for a shift in mean that is indeed present. For the alternative closest to the null, ρ = 0.95 and δ = 0.5, power is essentially zero. Holding δ constant, power increases monotonically as ρ is lowered to 0.90 and 0.80, but only in the latter case do we begin to see decent power for a reasonable amount of data. Holding ρ constant, increasing δ monotonically reduces power. This is consistent with Perron's (1989) finding that, for a stationary time series, a larger mean shift increases the probability of spuriously finding a unit root. This is problematic in the context of our empirical example, where a value of δ = 1 corresponds to a small (one percentage point), permanent change in the mean unemployment rate. Our results suggest that if ρ is close to but less than one, panel unit root tests will probably conclude, incorrectly, that unemployment is integrated rather than stationary around a one-time shift in mean.

Table 4 demonstrates that allowing for a mean shift greatly increases power relative to Table 3. For all values of ρ and δ considered, the power is at least 50%, and often 100%, for a panel of at least 10 countries with at least 100 observations. Indeed, for T = 100, there are only two instances in which the power is less than 50%, and both occur for the smallest panel considered, N = 5, and the most persistent value of ρ, 0.95.

Table 2.
Finite Sample Critical Values for Panel Unit Root Tests with Structural Change

1%
N     T = 50     T = 100    T = 200
5     –7.329     –6.941     –6.915
10    –9.056     –8.658     –8.415
15    –10.940    –9.995     –9.571
20    –12.667    –11.103    –10.672

5%
N     T = 50     T = 100    T = 200
5     –6.613     –6.432     –6.334
10    –8.484     –8.046     –7.852
15    –10.279    –9.461     –9.105
20    –12.011    –10.618    –10.225

10%
N     T = 50     T = 100    T = 200
5     –6.344     –6.113     –6.051
10    –8.203     –7.785     –7.553
15    –10.025    –9.184     –8.815
20    –11.705    –10.361    –9.958
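The bound of 0.016 on the standard error of the Table 4 rejection frequencies follows from the binomial standard error of an estimated rejection frequency, $\sqrt{p(1-p)/R}$ for R replications, which is largest at p = 0.5. A short check (the function name is ours):

```python
import math


def mc_se(p, reps):
    """Standard error of an estimated rejection frequency p from reps independent replications."""
    return math.sqrt(p * (1.0 - p) / reps)


print(round(mc_se(0.5, 1000), 4))   # Table 4, 1000 replications: prints 0.0158 (about 0.016)
print(round(mc_se(0.5, 2500), 4))   # Table 3, 2500 replications: prints 0.01
```

Rejection frequencies near 0 or 1, which are common in Tables 3 and 4, have considerably smaller standard errors than this worst case.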
Table 3.
Power of Panel Unit Root Tests without Structural Change

ρ = 0.95, δ = 0.5
N     T = 50    T = 100   T = 200
5     0.0004    0.0008    0.0008
10    0.0008    0.0004    0.0000
15    0.0000    0.0000    0.0000
20    0.0000    0.0000    0.0000

ρ = 0.95, δ = 1.0
N     T = 50    T = 100   T = 200
5     0.0000    0.0000    0.0000
10    0.0000    0.0000    0.0000
15    0.0000    0.0000    0.0000
20    0.0000    0.0000    0.0000

ρ = 0.90, δ = 0.5
N     T = 50    T = 100   T = 200
5     0.0180    0.0560    0.3780
10    0.0116    0.1204    0.8312
15    0.0120    0.2300    0.9608
20    0.0084    0.3084    0.9924

ρ = 0.90, δ = 1.0
N     T = 50    T = 100   T = 200
5     0.0000    0.0000    0.0000
10    0.0000    0.0000    0.0000
15    0.0000    0.0000    0.0000
20    0.0000    0.0000    0.0008

ρ = 0.80, δ = 0.5
N     T = 50    T = 100   T = 200
5     0.3652    0.8400    0.9872
10    0.6848    0.9908    1.0000
15    0.8216    0.9992    1.0000
20    0.8732    1.0000    1.0000

ρ = 0.80, δ = 1.0
N     T = 50    T = 100   T = 200
5     0.0036    0.0336    0.2052
10    0.0052    0.1784    0.6876
15    0.0052    0.4208    0.9124
20    0.0044    0.6432    0.9872
Table 4. Power of Panel Unit Root Tests with Structural Change

= 0.95, = 0.5
T \ N    5        10       15       20
50       0.0710   0.0840   0.0810   0.0520
100      0.2320   0.5160   0.7250   0.8730
200      0.8460   0.9960   1.0000   1.0000

= 0.95, = 1.0
T \ N    5        10       15       20
50       0.0220   0.0160   0.0060   0.0020
100      0.4130   0.7570   0.8770   0.9570
200      0.9980   1.0000   1.0000   1.0000

= 0.90, = 0.5
T \ N    5        10       15       20
50       0.2750   0.4730   0.5730   0.6600
100      0.7790   0.9930   1.0000   1.0000
200      1.0000   1.0000   1.0000   1.0000

= 0.90, = 1.0
T \ N    5        10       15       20
50       0.2920   0.5150   0.5600   0.5590
100      0.9430   1.0000   1.0000   1.0000
200      1.0000   1.0000   1.0000   1.0000

= 0.80, = 0.5
T \ N    5        10       15       20
50       0.8000   0.9910   0.9990   0.9990
100      1.0000   1.0000   1.0000   1.0000
200      1.0000   1.0000   1.0000   1.0000

= 0.80, = 1.0
T \ N    5        10       15       20
50       0.8000   0.8520   0.9960   0.9990
100      1.0000   1.0000   1.0000   1.0000
200      1.0000   1.0000   1.0000   1.0000

IV. EMPIRICAL EXAMPLE: UNIT ROOTS IN UNEMPLOYMENT

We use annual series of unemployment for 17 OECD countries from 1955 to 1990. The source of the data is Layard, Nickell & Jackman (1991). We do not update the data past 1990. Unemployment rates rose sharply, especially in Europe, during the early 1990s, and in Papell, Murray & Ghiblawi (2000) the single equation methods of Bai & Perron (1998) detect considerable evidence of multiple structural changes with unemployment data extended through 1997. Testing for unit roots in panels with multiple structural changes, however, is well beyond the scope of this chapter. Our empirical results, therefore, should be interpreted as an illustration of the techniques rather than as an economic analysis of postwar unemployment.

The first step in our investigation is to test for unit roots using methods that do not account for structural change, in order to provide a benchmark for our later results. We run Augmented Dickey-Fuller (ADF) tests, as in equation (1), for each of the 17 countries in the sample, setting kmax to 4. The results of the ADF tests are reported in Table 5. Using critical values from MacKinnon (1991), we find that the null of a unit root cannot be rejected for any of the series at the 10% level.

Table 5. Augmented Dickey-Fuller Tests

Country       Constant       AR coefficient   k
Australia     0.437 (1.60)   0.936 (–1.15)    0
Austria       0.188 (1.26)   0.915 (–1.28)    1
Belgium       0.337 (1.48)   0.953 (–1.40)    1
Canada        0.819 (1.61)   0.893 (–1.46)    0
Denmark       0.222 (0.82)   0.993 (–0.14)    4
Finland       0.359 (1.42)   0.912 (–1.26)    2
France        0.176 (1.38)   0.987 (–0.54)    1
Germany       0.239 (1.19)   0.929 (–1.32)    1
Ireland       0.470 (1.36)   0.952 (–1.28)    1
Italy         0.597 (2.04)   0.885 (–2.08)    3
Japan         0.210 (1.91)   0.883 (–2.04)    3
Netherlands   0.248 (1.21)   0.966 (–0.96)    2
Norway        0.435 (1.01)   0.835 (–0.84)    2
Spain         0.369 (1.85)   0.945 (–2.25)    3
Sweden        0.413 (1.82)   0.760 (–1.37)    2
U.K.          0.391 (1.38)   0.947 (–1.14)    2
U.S.A.        1.389 (2.14)   0.766 (–2.16)    0

Note: The critical values for the ADF test, calculated from MacKinnon (1991) with 36 observations, are –3.62 (1%), –2.94 (5%), and –2.61 (10%). Numbers in parentheses are t-statistics.
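Equation (1) is not reproduced in this section. As a minimal sketch, assuming it is the usual ADF regression $y_t = \mu + \alpha y_{t-1} + \sum_{i=1}^{k} c_i \Delta y_{t-i} + e_t$, the t-statistics for $\alpha = 1$ of the kind reported in Table 5 can be computed by ordinary least squares as below. The helper names `ols` and `adf_tstat` are hypothetical, k is supplied by the caller (the lag-selection rule with kmax = 4 is not implemented), and the simulated series is purely illustrative, not the unemployment data.

```python
import math
import random


def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors).

    Uses Gauss-Jordan inversion of X'X with partial pivoting: adequate for the
    small, well-conditioned regressions in this sketch, not a general solver.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Augment X'X with the identity and reduce to [I | (X'X)^-1].
    a = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [v / d for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    inv = [row[p:] for row in a]
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[r] - sum(X[r][i] * b[i] for i in range(p)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)
    return b, [math.sqrt(s2 * inv[i][i]) for i in range(p)]


def adf_tstat(y, k):
    """Regress y_t on [1, y_{t-1}, dy_{t-1}, ..., dy_{t-k}] (assumed form of eq. (1)).

    Returns (alpha_hat, t-statistic for H0: alpha = 1).
    """
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy[t-1] = y_t - y_{t-1}
    X, z = [], []
    for t in range(k + 1, len(y)):
        X.append([1.0, y[t - 1]] + [dy[t - 1 - i] for i in range(1, k + 1)])
        z.append(y[t])
    b, se = ols(X, z)
    return b[1], (b[1] - 1.0) / se[1]


# Usage on simulated data: a clearly stationary AR(1) with coefficient 0.2.
random.seed(1)
y = [0.0]
for _ in range(300):
    y.append(0.2 * y[-1] + random.gauss(0.0, 1.0))
alpha_hat, t_stat = adf_tstat(y, k=1)
print(f"alpha_hat = {alpha_hat:.3f}, t = {t_stat:.2f}")
```

The statistic is then compared with the left-tail critical values in the note to Table 5; those values are for 36 observations, so they are only indicative for a longer simulated series like this one.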
One possible reason for the failure of the ADF tests to reject the unit root hypothesis is the relatively short (36 years) time span of the data.10 We investigate this possibility by conducting panel unit root tests, described by equation (2), to exploit cross-section variability among the 17 unemployment rates. The results of the panel unit root tests are reported in Table 6.11 The null hypothesis of a unit root cannot be rejected, even at the 10% level, either for the OECD countries as a whole or for smaller panels consisting of European (13), European Community (EC) (9), European Free Trade Association (EFTA) (4), Non-European (4), or Non-EC (EFTA plus Non-Europe) (8) countries.12

Table 6. Panel Unit Root Tests

Group        N    AR coefficient   t
OECD         17   0.924            –6.40
EUROPE       13   0.936            –4.73
EC           9    0.941            –3.96
NON-EC       8    0.846            –4.82
EFTA         4    0.868            –3.04
NON-EUROPE   4    0.863            –3.52

Critical Values
Group        1%       5%       10%
OECD         –10.16   –9.00    –8.48
EUROPE       –8.52    –7.58    –7.16
EC           –7.09    –6.28    –5.86
NON-EC       –6.83    –5.99    –5.58
EFTA         –5.45    –4.67    –4.27
NON-EUROPE   –5.45    –4.67    –4.27

The results for the univariate AO model of equations (3) and (4) are reported in Table 7. The null hypothesis of a unit root is rejected for Finland, Ireland and Spain at the 1% level, Belgium, France, Italy and Norway at the 5% level, and Austria, Canada, Denmark, and the United Kingdom at the 10% level. The structural breaks are all positive, reflecting the general rise in unemployment among the OECD countries. The structural break occurs between 1974 and 1976 for nine of the eleven countries for which the unit root null can be rejected.

Table 7. The Additive Outlier Model

Country       Break Year   Constant        Shift           AR coefficient   k
Australia     1973         2.053 (6.99)    4.536 (10.61)   0.609 (–3.99)    0
Austria       1979         1.704 (13.55)   1.460 (6.42)    0.623 (–4.33)c   1
Belgium       1975         2.771 (8.70)    6.908 (13.99)   0.404 (–4.96)b   4
Canada        1976         5.145 (17.95)   3.754 (8.17)    0.277 (–4.33)c   3
Denmark       1975         2.557 (8.29)    5.696 (11.93)   0.513 (–4.34)c   3
Finland       1974         1.915 (8.61)    2.885 (8.65)    0.227 (–6.64)a   1
France        1975         2.052 (6.35)    5.914 (11.81)   0.660 (–4.95)b   4
Germany       1972         1.417 (3.63)    3.317 (6.01)    0.732 (–3.63)    1
Ireland       1976         5.627 (10.14)   7.287 (8.19)    0.657 (–7.58)a   3
Italy         1976         4.650 (16.43)   1.907 (4.20)    0.702 (–4.75)b   3
Japan         1969         1.653 (12.19)   0.423 (2.38)    0.783 (–3.53)    3
Netherlands   1976         1.945 (4.94)    6.662 (10.55)   0.606 (–4.06)    2
Norway        1986         2.094 (16.96)   1.781 (4.91)    0.303 (–4.78)b   1
Spain         1974         2.400 (2.57)    11.463 (8.20)   0.685 (–7.61)a   4
Sweden        1964         1.470 (10.40)   0.334 (2.01)    0.536 (–3.87)    1
U.K.          1974         2.715 (6.41)    5.604 (8.82)    0.493 (–4.60)c   4
U.S.A.        1974         4.840 (19.21)   2.141 (5.67)    0.251 (–4.10)    3

Note: The critical values for the AO model, reported in Perron and Vogelsang (1992), are –5.20 (1%), –4.67 (5%), and –4.33 (10%). Numbers in parentheses are t-statistics. Superscripts a, b, and c denote rejection of the unit root null at the 1%, 5%, and 10% significance levels respectively.

The results of the panel unit root tests from equations (5) and (6) that account for structural change, along with the associated critical values, are reported in Table 8.13 The unit root hypothesis is strongly (at the 1% level) rejected in favor of stationarity with a one-time break in 1975 for the OECD, European, and EC countries and a break in 1973 for the non-EC and EFTA countries. For the non-Europe countries, the unit root null cannot be rejected at the 10% level. This panel, however, consists of only four countries.

Table 8. Panel Unit Root Tests with Structural Change

Group        N    Break Year   AR coefficient   t
OECD         17   1975         0.638            –21.91a
EUROPE       13   1975         0.651            –18.92a
EC           9    1975         0.670            –16.15a
NON-EC       8    1973         0.550            –10.36a
EFTA         4    1973         0.557            –8.45a
NON-EUROPE   4    1975         0.629            –5.61

Critical Values
Group        1%       5%       10%
OECD         –12.38   –11.56   –11.16
EUROPE       –10.89   –10.00   –9.63
EC           –9.13    –8.35    –7.97
NON-EC       –8.60    –8.01    –7.66
EFTA         –7.18    –6.46    –6.11
NON-EUROPE   –7.18    –6.46    –6.11

Note: Superscripts a, b, and c denote rejection of the unit root null at the 1%, 5%, and 10% significance levels respectively.
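Equations (3) and (4) are not reproduced in this section. The sketch below assumes the standard two-step additive outlier procedure of Perron & Vogelsang (1992), consistent with Note 6: remove a level shift with a step dummy $DU_t$, then run a Dickey-Fuller regression on the residuals that includes the one-time dummies $DTB_{t-i}$, and choose the break date that minimizes the t-statistic for $\alpha = 1$ (one of the selection rules in that literature). The dummy conventions, the 15% trimming, the helper name `ao_unit_root`, and the simulated data are all assumptions for illustration.

```python
import math
import random


def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    a = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):  # Gauss-Jordan inversion with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [v / d for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    inv = [row[p:] for row in a]
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[r] - sum(X[r][i] * b[i] for i in range(p)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)
    return b, [math.sqrt(s2 * inv[i][i]) for i in range(p)]


def ao_unit_root(y, k=0, trim=0.15):
    """Hypothetical two-step AO test with the break date chosen by minimum t.

    Step 1: y_t = mu + theta*DU_t + e_t, with DU_t = 1 for t > tb (0-based t).
    Step 2: e_t = sum_{i=0..k} w_i*DTB_{t-i} + alpha*e_{t-1}
                  + sum_{i=1..k} c_i*de_{t-i} + v_t, with DTB_t = 1 only at t = tb + 1.
    Returns (tb, t-statistic for alpha = 1, alpha_hat) at the minimising tb.
    """
    T = len(y)
    lo = max(int(trim * T), k)               # keeps every DTB dummy inside the sample
    hi = min(int((1 - trim) * T), T - 2 - k)
    best = None
    for tb in range(lo, hi + 1):
        du = [1.0 if t > tb else 0.0 for t in range(T)]
        (mu, theta), _ = ols([[1.0, d] for d in du], y)
        e = [y[t] - mu - theta * du[t] for t in range(T)]
        de = [e[t] - e[t - 1] for t in range(1, T)]  # de[t-1] = e_t - e_{t-1}
        X, z = [], []
        for t in range(k + 1, T):
            row = [1.0 if t - i == tb + 1 else 0.0 for i in range(k + 1)]
            row += [e[t - 1]] + [de[t - 1 - i] for i in range(1, k + 1)]
            X.append(row)
            z.append(e[t])
        b, se = ols(X, z)
        t_alpha = (b[k + 1] - 1.0) / se[k + 1]
        if best is None or t_alpha < best[1]:
            best = (tb, t_alpha, b[k + 1])
    return best


# Usage: white noise around a mean that shifts from 0 to 5 at observation 20,
# so DU_t = 1 for t > 19 is the true step dummy.
random.seed(0)
y = [(5.0 if t >= 20 else 0.0) + random.gauss(0.0, 0.5) for t in range(40)]
tb, t_min, alpha_hat = ao_unit_root(y, k=0)
print(tb, round(t_min, 2), round(alpha_hat, 3))
```

In small samples the selected date need not coincide exactly with the true break, so it is best read as lying at or next to it; the minimum t-statistic is then compared with AO critical values such as those in the note to Table 7.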
V. CONCLUSIONS

The purpose of this chapter was to develop and implement panel unit root tests in the presence of structural change. To that end, we combine methods from two previously disjoint literatures: testing for a unit root in panels and testing for a unit root in the presence of structural change. The resulting test allows for both serial and contemporaneous correlation, both of which are often found to be important in the panel unit root context.

The motivation for the test comes from the hypothesis that 'conventional' panel unit root tests, those that do not incorporate structural change, will have low power if the data are stationary with structural change. While this is well established in the univariate literature, it is only a conjecture in the panel context. We investigate this conjecture by conducting power experiments for panels of non-trending, stationary series with a one-time change in the mean, and find that conventional panel unit root tests generally have very low power. We then conduct the same experiments using methods that test for a unit root in the presence of structural change, and find that the power of the tests is much improved.

We apply our test to a data set of annual unemployment rates for 17 OECD countries from 1955 to 1990. For these countries, unit root tests that do not incorporate structural change, whether univariate or panel, provide no evidence against the unit root null. While univariate tests that incorporate structural change do provide some evidence against unit roots, the short span of the data suggests that power may be a problem. Using our panel test with a one-time structural change, we find very strong evidence of regime-wise stationarity, both for the full panel and for a number of smaller subpanels.

Our work could be extended in a number of directions. While the test incorporates a one-time break in non-trending data, extensions to multiple breaks and/or trending data would be straightforward. Once variation in the number of breaks, type of breaks, number of countries, and number of observations is allowed for, however, the number of possibilities increases rapidly. Given the availability of programs for calculating critical values, we suspect that it will be more fruitful to develop tests on a case-by-case basis than to attempt full generality.14
NOTES
1. MacKinnon (1991) shows how to calculate critical values for ADF tests for any sample size.
2. If the coefficient is not equated across countries, as in Breuer, McNown & Wallace (2000), the gains in power over univariate methods are much smaller. Im, Pesaran & Shin (1997) report higher power without equating the coefficient across countries, but their alternative hypothesis is that at least one member of the panel, rather than all members, is stationary.
3. If there is no serial correlation (k = 0), or if the k's and c's are constrained to be equal across countries, as in O'Connell (1998), the FGLS estimator can be iterated to achieve maximum likelihood. These restrictions, however, rarely (if ever) hold in practice.
4. For all of the critical value calculations, we generate 50 more observations than are reported, and then discard the first 50 observations.
5. Innovational outlier models, where the structural change occurs gradually, can also be estimated.
6. As explained by Perron & Vogelsang (1992), the dummy variables $DTB_{t-i}$ are included to ensure that the t-statistic on the autoregressive coefficient in equation (4) has the same asymptotic distribution as in the IO model and is invariant to the value of k.
7. The dummy variable $DTB_t$ is included to allow for a change in the mean under the null.
8. Abuaf & Jorion (1990) conduct panel unit root tests that allow for structural change, but the time of the break is assumed to be known a priori.
9. The results in Tables 3 and 4 are qualitatively unchanged for TB = T/4 or 3T/4.
10. Froot & Rogoff (1995) show that, if a variable follows a stationary AR(1) process with a half-life of three years, it would take 72 years of annual data to reject the unit root null using the 5% Dickey-Fuller critical value.
11. The critical values, also reported in Table 6, are calculated for the exact number of countries and observations in each of the panels, using the Monte Carlo methods described above.
12. The members of the EC (included in our data) are Belgium, Denmark, France, Germany, Ireland, Italy, Netherlands, Spain, and the United Kingdom. The EFTA countries are Austria, Finland, Norway, and Sweden.
13. The critical values are calculated for the exact number of countries and observations in each of the panels, using the Monte Carlo methods described above.
14. An example is Papell (2000), who develops a panel unit root test in the presence of three breaks in the slope, but none in the intercept, of the trend function, with further restrictions imposed for consistency with purchasing power parity.
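Notes 4, 11 and 13 state that the critical values are obtained by Monte Carlo simulation for the exact number of countries and observations, discarding 50 start-up observations. The sketch below illustrates only that mechanic, for a deliberately simplified statistic: a pooled OLS Dickey-Fuller t-statistic with country-specific intercepts, a common autoregressive coefficient, no lagged differences, and independent N(0, 1) innovations. The chapter's test uses FGLS and allows serial and contemporaneous correlation, so these quantiles are not expected to reproduce Table 6 or Table 8; the function names are hypothetical.

```python
import random


def pooled_df_tstat(panel):
    """t-statistic for alpha = 1 in y_it = mu_i + alpha*y_i,t-1 + e_it,
    estimated by pooled OLS with country-specific intercepts (within transform)."""
    sxy = sxx = 0.0
    demeaned = []
    for y in panel:
        x, z = y[:-1], y[1:]
        mx, mz = sum(x) / len(x), sum(z) / len(z)
        xd = [v - mx for v in x]
        zd = [v - mz for v in z]
        demeaned.append((xd, zd))
        sxy += sum(a * b for a, b in zip(xd, zd))
        sxx += sum(a * a for a in xd)
    alpha = sxy / sxx
    ssr = sum((b - alpha * a) ** 2 for xd, zd in demeaned for a, b in zip(xd, zd))
    nobs = sum(len(xd) for xd, _ in demeaned)
    s2 = ssr / (nobs - len(panel) - 1)  # N intercepts plus one common slope
    return (alpha - 1.0) / (s2 / sxx) ** 0.5


def simulate_critical_values(n_countries, n_obs, reps=1000, burn=50, seed=0):
    """Monte Carlo 1%, 5% and 10% quantiles of pooled_df_tstat under the
    null of independent Gaussian random walks."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        panel = []
        for _ in range(n_countries):
            level, y = 0.0, []
            for t in range(n_obs + burn):
                level += rng.gauss(0.0, 1.0)
                if t >= burn:  # discard the first `burn` observations
                    y.append(level)
            panel.append(y)
        stats.append(pooled_df_tstat(panel))
    stats.sort()
    return {p: stats[int(p * reps)] for p in (0.01, 0.05, 0.10)}


# Usage: a four-country panel with 36 annual observations, as for EFTA.
cv = simulate_critical_values(n_countries=4, n_obs=36, reps=400, seed=1)
print(cv)
```

More replications sharpen the quantile estimates; the design choice of computing them for the exact N and T of each panel mirrors Notes 11 and 13.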
REFERENCES
Abuaf, N., & Jorion, P. (1990). Purchasing Power Parity in the Long Run. Journal of Finance, 45, 157–174.
Bai, J., & Perron, P. (1998). Estimating and Testing Linear Models with Multiple Structural Changes. Econometrica, 66, 47–78.
Banerjee, A., Lumsdaine, R. L., & Stock, J. H. (1992). Recursive and Sequential Tests of the Unit Root and Trend-Break Hypotheses: Theory and International Evidence. Journal of Business and Economic Statistics, 10, 271–288.
Bowman, D. (1999). Efficient Tests for Autoregressive Unit Roots in Panel Data. IFDP #646, Board of Governors of the Federal Reserve System.
Breuer, J., McNown, R., & Wallace, M. (2000). The Quest for Purchasing Power Parity With A Series-Specific Test using Panel Data. Working paper, Department of Economics, University of South Carolina.
Campbell, J. Y., & Perron, P. (1991). Pitfalls and Opportunities: What Macroeconomists Should Know About Unit Roots. In: O. J. Blanchard & S. Fischer (Eds), NBER Macroeconomics Annual (pp. 141–201). Cambridge: MIT Press.
Froot, K. A., & Rogoff, K. (1995). Perspectives on PPP and Long-Run Real Exchange Rates. In: G. Grossman & K. Rogoff (Eds), Handbook of International Economics, Vol. 3 (pp. 1647–1688). Amsterdam: North-Holland.
Hall, A. R. (1994). Testing for a Unit Root in Time Series with Pretest Data-Based Model Selection. Journal of Business and Economic Statistics, 12, 461–470.
Im, S., Pesaran, H., & Shin, Y. (1997). Testing for Unit Roots in Heterogenous Panels. Working paper, Department of Economics, University of Cambridge.
Layard, R., Nickell, S., & Jackman, R. (1991). Unemployment: Macroeconomic Performance and The Labour Market. Oxford: Oxford University Press.
Levin, A., & Lin, C. F. (1992). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties. Discussion paper 92–23, Department of Economics, University of California, San Diego.
Lumsdaine, R. L., & Papell, D. H. (1997). Multiple Trend Breaks and the Unit Root Hypothesis. Review of Economics and Statistics, 79, 212–218.
MacKinnon, J. G. (1991). Critical Values for Cointegration Tests. In: R. F. Engle & C. W. J. Granger (Eds), Long-Run Economic Relationships: Readings in Cointegration (pp. 267–276). Oxford: Oxford University Press.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631–652.
Ng, S., & Perron, P. (1995). Unit Root Tests in ARMA Models with Data Dependent Methods for the Selection of the Truncation Lag. Journal of the American Statistical Association, 90, 268–281.
O'Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of International Economics, 44, 1–20.
Papell, D. H. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float. Journal of International Economics, 43, 313–332.
Papell, D. H. (2000). The Great Appreciation, the Great Depreciation, and the Purchasing Power Parity Hypothesis. Working paper, Department of Economics, University of Houston.
Papell, D. H., Murray, C. J., & Ghiblawi, H. (2000). The Structure of Unemployment. Review of Economics and Statistics, 82, 309–315.
Perron, P. (1989). The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis. Econometrica, 57, 1361–1401.
Perron, P. (1997). Further Evidence on Breaking Trend Functions in Macroeconomic Variables. Journal of Econometrics, 80, 355–385.
Perron, P., & Vogelsang, T. J. (1992). Non-stationarity and Level Shifts With An Application to Purchasing Power Parity. Journal of Business and Economic Statistics, 10, 301–320.
Zivot, E., & Andrews, D. W. K. (1992). Further Evidence on the Great Crash, the Oil-Price Shock, and The Unit Root Hypothesis. Journal of Business and Economic Statistics, 10, 251–270.
PANEL DATA LIMIT THEORY AND ASYMPTOTIC ANALYSIS OF A PANEL REGRESSION WITH NEAR INTEGRATED REGRESSORS

Heikki Kauppi

ABSTRACT

This chapter develops a new limit theory for panel data with large numbers of cross-section (n) and time series (T) observations. The results apply when n and T tend to infinity simultaneously and provide useful tools for establishing convergence in probability and in distribution when the panel data may be cross-sectionally heterogeneous in a fairly general way. We demonstrate how the new theory can be applied to derive asymptotics for a panel regression in which the regressors are generated by a local-to-unit-root process with localizing coefficients that vary across the cross section.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 239–274. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2

I. INTRODUCTION

In the last few years much new research has emerged that develops econometric methods for panel data in which both the numbers of cross-section and time series observations are large. This research is motivated by the increasing availability of important panel data sets that cover large numbers of different countries, sectors, and individuals over long periods of time. Many of these data sets consist of macroeconomic variables that display characteristics resembling those generated by integrated processes. Accordingly, standard panel methods cannot be applied to these data, and an appropriate method has to take into account the possibly strong persistence of the data. Particular techniques have therefore been developed for testing for unit roots and cointegration in panel data and for statistical analysis of panel regressions with integrated regressors. Typical empirical applications of these methods involve estimating and testing for long-run relationships between international financial series such as relative prices and spot and future exchange rates.

The purpose of this chapter is to develop a new panel data limit theory that can be applied to derive asymptotics for a variety of interesting estimators and test statistics in the context of models for panel data with large cross-sectional dimension, n, and time series dimension, T. Our new theory assumes that n and T tend to infinity simultaneously and builds upon the concepts of joint convergence in probability and in distribution for double indexed processes developed by Phillips & Moon (1999a). The contribution of the chapter is to develop new versions of the law of large numbers and the central limit theorem that apply in panels where the data may be cross-sectionally heterogeneous in a fairly general way.

We demonstrate the usefulness of the new theory in an application where we study asymptotic inference in a panel regression in which the regressors are generated by an autoregressive process with a root local to unity. In this framework, both the regression errors and the errors that drive the autoregressive regressors are specified by a general linear process. The model then deviates from the previously analyzed panel cointegration regressions only in that the autoregressive parameters in the regressors are not necessarily exactly equal to one but may lie within a range of near alternatives to unity. This generalization of earlier models is motivated by the fact that, in most empirical questions in macroeconomics and finance where the new panel cointegration methods are applied, the assumption of exact unit roots can be considerably uncertain. Given that near unit roots are known to cause severe inferential problems for the usual time series cointegration methods, it is important to examine related problems in the context of panel data analysis.

Our application of the panel asymptotics reveals the following. First, because of biases caused by serial correlation in the errors, the usual pooled panel OLS estimator is invalid for inference. Second, a corrected version of this estimator is shown to be $\sqrt{n}\,T$-consistent, with an asymptotic normal distribution centered on the true regression parameter irrespective of whether the regressors have near or exact unit roots. Unfortunately, this positive result holds only in the special case where the model does not exhibit any deterministic effects, such as individual intercepts. Third, we derive asymptotics for the pooled panel fully modified estimator of Phillips & Moon (1999a), who assumed exact unit roots. The asymptotic results show that this estimator is subject to severe bias effects if the regressors are nearly rather than exactly integrated. Our theoretical findings are illustrated by small sample simulations. Overall, the analysis indicates that near unit roots are in general likely to result in insuperable inferential problems even in the context of panel data analysis.

The organization of the chapter is as follows. The new limit theorems are given in Section II. Section III presents the applications of the panel asymptotics, while concluding remarks are given in Section IV. Proofs of the theorems are in the appendix.
II. THEORY

In panel data limit theory we consider a double indexed process $X_{n,T}$, in which both n and T tend to infinity. In general, the limit of $X_{n,T}$ depends on the treatment of the indices n and T, and on the properties that link the two dimensions of the process. Phillips & Moon (1999a) discuss different approaches. One possibility is to allow n and T to pass to infinity along a diagonal path determined by a monotonically increasing functional relation of the type T = T(n) as the index $n \to \infty$. This approach simplifies the asymptotic theory by replacing $X_{n,T}$ with a single indexed process $X_{n,T(n)}$. However, a drawback of this diagonal path limit theory is that the assumed expansion path $(n, T(n)) \to \infty$ may not provide an appropriate approximation for a given (n, T) situation. Furthermore, the limit theory is likely to depend on the specific functional relation T = T(n) that is used in the asymptotic development. Following Phillips & Moon (1999a), we therefore focus on an alternative approach in which n and T are allowed to tend to infinity simultaneously without imposing a specific diagonal path for the divergence of the indices.

Merely as an auxiliary tool, we also consider a special form of multi-index asymptotics, called sequential limit theory, which was also introduced by Phillips & Moon (1999a). The general idea of this approach is to derive limit results in two steps. The first step is to fix one index, say n, and allow the other, say T, to pass to infinity, giving an intermediate limit. The final limit result is then obtained by subsequently letting n tend to infinity. While sequential limit theory can offer an easy route to a limit result, it may give asymptotic results that are misleading in cases where both indices tend to infinity simultaneously (see Phillips & Moon (1999b)). Nevertheless, this theory can often serve as a helpful tool for obtaining conjectures about limit results that hold under the more general joint limit theory.
In this section, we consider a general double indexed process of the form

$$X_{n,T} = \frac{1}{k_n}\sum_{i=1}^{n} Y_{i,T},$$

where the $Y_{i,T}$ are independent random vectors across i and $k_n$ is either n or $\sqrt{n}$. A typical $Y_{i,T}$ component is a standardized sum of the time series component of the panel data; examples are given in the following section. For instance, suppose we are interested in the probability limit of $X_{n,T} = n^{-1}\sum_{i=1}^{n} Y_{i,T}$. Assume $Y_{i,T} \to_p Y_i$ as $T \to \infty$ for all i. Then, by the independence of $Y_{i,T}$ across i for all T, it follows that $X_{n,T} \to_p X_n$ as $T \to \infty$ for all n, where $X_n = n^{-1}\sum_{i=1}^{n} Y_i$. Here it should be noted that one has to assume that the $Y_i$ are defined on the same probability space for all i, so that the sum of the limit random variables $n^{-1}\sum_{i=1}^{n} Y_i$ is well defined on that probability space. This can be justified as shown by Phillips & Moon (1999a, Appendix B). By allowing $n \to \infty$ and applying an appropriate law of large numbers to $X_n = n^{-1}\sum_{i=1}^{n} Y_i$, we may then find the sequential limit of $X_{n,T}$. Let $X = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} E(Y_i)$ exist and be finite. Then $X_n \to_p X$, so that as $T \to \infty$ followed by $n \to \infty$, $X_{n,T} \to_p X$. This is a sequential probability limit result in the sense defined by Phillips & Moon (1999a).

In general, the sequential probability limit X of $X_{n,T}$ is not the same as the probability limit of $X_{n,T}$ under joint convergence of the indices n and T; the latter may not even exist, or may require a different normalization. Examples are given in Phillips & Moon (1999b). An interesting question therefore arises: when does the sequential limit coincide with the joint limit? The following theorem is adopted from Phillips & Moon (1999a, Theorem 1) and gives sufficient conditions under which the joint probability limit and the sequential probability limit are identical. Hereafter, we denote by $(n, T \to \infty)$ the joint limit as $T \to \infty$ and $n \to \infty$ simultaneously. Also, '⇒' denotes weak convergence of the associated probability measure, $\|A\| = (\mathrm{tr}(A'A))^{1/2}$ is the usual Euclidean norm of a matrix A, $1\{\cdot\}$ denotes an indicator function, and $\limsup_{n,T} x_{n,T}$ signifies the limit superior of a sequence $\{x_{n,T}\}$ when joint convergence is considered.

Theorem 1. Suppose the random $(k \times 1)$ vectors $Y_{i,T}$ are independent across i for all T and integrable. Assume that $Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all i. Let the following conditions hold:

(i) $\limsup_{n,T} \frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\| < \infty$,
(ii) $\limsup_{n,T} \frac{1}{n}\sum_{i=1}^{n} \|E(Y_{i,T}) - E(Y_i)\| = 0$,
(iii) $\limsup_{n,T} \frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\|\,1\{\|Y_{i,T}\| > n\varepsilon\} = 0$ for all $\varepsilon > 0$,
(iv) $\limsup_{n} \frac{1}{n}\sum_{i=1}^{n} E\|Y_i\|\,1\{\|Y_i\| > n\varepsilon\} = 0$ for all $\varepsilon > 0$.

If $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = X$ exists and $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i \to_p X$ as $n \to \infty$, then

$$X_{n,T} = \frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \to_p X \quad \text{as } (n, T \to \infty).$$
Theorem 1 gives fairly general conditions under which a joint probability limit can be established. However, in many cases it may be rather tedious to verify all of the required conditions (i) through (iv) of the theorem. As shown by Corollary 1 of Phillips & Moon (1999a), somewhat simpler conditions can be obtained in the special case where the $Y_{i,T}$ are scaled variates of an iid process. However, there are certainly various interesting situations where the heterogeneity of the different panel members arises from other sources, so that Corollary 1 of Phillips & Moon (1999a) cannot be applied. For dealing with heterogeneous panels of other types, we have therefore designed the following theorem. The basic idea of Theorem 2 comes from Markov's law of large numbers, which applies to independent variates $Z_i$ satisfying 'Markov's condition', $E\|Z_i\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ and for all i.
Theorem 2. Suppose that the random $(k \times 1)$ vectors $Y_{i,T}$ are independent across i for all T and integrable. Assume that $Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all i. Let the following conditions hold:

(a) $\sup_i \|E(Y_{i,T}) - E(Y_i)\| \to 0$ as $T \to \infty$,
(b) $\sup_T E\|Y_{i,T}\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ and for all i.

If $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = X$ exists, then

$$\frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \to_p X \quad \text{as } (n, T \to \infty).$$
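A small simulation can make the content of Theorem 2 concrete. The design below is an illustrative assumption, not from the chapter: $Y_{i,T} = \sigma_i^2 \, T^{-1}\sum_{t=1}^{T} e_{i,t}^2$ with iid N(0, 1) errors and $\sigma_i^2$ cycling through 0.5, 1 and 2, so the panel is cross-sectionally heterogeneous without being a single scaled iid process. The average is evaluated at several (n, T) pairs without tying T to n, in the spirit of the joint limit; the function name `x_nT` is hypothetical.

```python
import random


def x_nT(n, T, rng):
    """X_{n,T} = n^-1 * sum_i Y_{i,T} with Y_{i,T} = s2_i * T^-1 * sum_t e_{i,t}^2.

    Illustrative design (an assumption): e_{i,t} iid N(0, 1) and s2_i cycling
    through 0.5, 1, 2. E(Y_{i,T}) = s2_i for every T, so condition (a) holds
    trivially, and E(Y_{i,T}^2) = s2_i^2 * (1 + 2/T) <= 12, so (b) holds with delta = 1.
    """
    levels = (0.5, 1.0, 2.0)
    total = 0.0
    for i in range(n):
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(T))
        total += levels[i % 3] * chi2 / T
    return total / n


rng = random.Random(42)
X_limit = (0.5 + 1.0 + 2.0) / 3  # lim_n n^-1 sum_i E(Y_i) = 7/6
estimates = {(n, T): x_nT(n, T, rng) for n, T in [(50, 400), (400, 50), (300, 300)]}
for (n, T), value in estimates.items():
    print(n, T, round(value, 4), "vs", round(X_limit, 4))
```

All three averages should be close to 7/6, whether n is large relative to T or the other way round.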
We now turn to conditions under which convergence in distribution can be obtained as $(n, T \to \infty)$. As in the case of the probability limit, we can often easily derive a sequential weak convergence result for, say, $X_{n,T} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_{i,T}$. (Examples are given in Phillips & Moon (1999a, b).) For obtaining convergence in joint limits as $(n, T \to \infty)$, Phillips & Moon (1999a) again give some general results. Their Theorem 2 provides a joint central limit theorem for $(n, T \to \infty)$ that employs a Lindeberg condition for double indexed processes. In addition, their Theorem 3 gives a version that applies to iid variates scaled differently across the cross section. To deal with other types of cross-sectional heterogeneity, we have developed the following version of the joint central limit theorem.

Theorem 3. Suppose that the $Y_{i,T}$ are independent scalar variables across i for all T with $E(Y_{i,T}) = 0$ and $\mathrm{Var}(Y_{i,T}) = V_{i,T}$. Assume the following conditions hold:

(i) $\lim_{n,T\to\infty} \frac{1}{n}\sum_{i=1}^{n} V_{i,T} = V$ is finite and positive,
(ii) $\sup_T E|Y_{i,T}|^{2+\delta} \le M < \infty$ for some $\delta > 0$ and for all i.

Then

$$X_{n,T} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_{i,T} \Rightarrow N(0, V) \quad \text{as } (n, T \to \infty).$$

The basic idea of Theorem 3 is to employ a Lyapunov condition to guarantee that the Lindeberg condition holds. The corresponding vector case can be handled by using Theorem 3 and the Cramér-Wold device.
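The conditions of Theorem 3 can be checked numerically on a simple heterogeneous design. In the sketch below (an illustrative assumption, not the chapter's), $Y_{i,T} = \sigma_i T^{-1/2} S_{i,T}$, where $S_{i,T}$ sums T independent random signs and $\sigma_i^2$ cycles through 0.5, 1 and 2. Then $V = 7/6$, and the fourth moment is bounded, so condition (ii) holds with $\delta = 2$. Repeated draws of $X_{n,T}$ should have mean near 0, variance near V, and roughly 95% of draws inside $\pm 1.96\sqrt{V}$. The function name `draw_x_nT` is hypothetical.

```python
import math
import random


def draw_x_nT(n, T, rng):
    """One draw of X_{n,T} = n^-1/2 * sum_i Y_{i,T}.

    Illustrative design (an assumption): Y_{i,T} = sigma_i * T^-1/2 * S_{i,T},
    where S_{i,T} sums T independent +/-1 signs, and sigma_i^2 cycles through
    0.5, 1, 2. Then E(Y_{i,T}) = 0, Var(Y_{i,T}) = sigma_i^2, and
    E(Y_{i,T}^4) = sigma_i^4 * (3 - 2/T) is bounded, so (ii) holds with delta = 2.
    """
    sigmas = (math.sqrt(0.5), 1.0, math.sqrt(2.0))
    total = 0.0
    for i in range(n):
        s = sum(1.0 if rng.random() < 0.5 else -1.0 for _ in range(T))
        total += sigmas[i % 3] * s / math.sqrt(T)
    return total / math.sqrt(n)


rng = random.Random(7)
n, T, reps = 60, 30, 500
draws = [draw_x_nT(n, T, rng) for _ in range(reps)]
V = (0.5 + 1.0 + 2.0) / 3  # condition (i): lim n^-1 sum_i V_{i,T} = 7/6
mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
coverage = sum(abs(d) < 1.96 * math.sqrt(V) for d in draws) / reps
print(f"mean {mean:.3f}, variance {var:.3f} (V = {V:.3f}), coverage {coverage:.3f}")
```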
III. AN APPLICATION

Most of the recent applications of the new large n, T panel data limit theory have involved studying and developing estimators and tests for panel cointegrating regressions in which the regressors are integrated of order one. In this section we analyze problems that arise in these models when the regressors are nearly, rather than exactly, integrated of order one. We start by introducing the model and assumptions.

A. The Model

We focus on the simple two-variable panel regression

$$y_{i,t} = \beta x_{i,t} + u_{i,t}, \qquad (1)$$

$$x_{i,t} = \rho_i x_{i,t-1} + \varepsilon_{i,t}, \qquad \rho_i = \exp(c_i/T) \approx 1 + \frac{c_i}{T}, \qquad (2)$$

(t = 1, ..., T; i = 1, ..., n), where the initial values $z_{i,0} = (y_{i,0}, x_{i,0})'$ are iid with $E\|z_{i,0}\|^4 < \infty$, and the errors are specified below. Notice that if $\rho_i = 1$ (i.e. $c_i = 0$) in (2) for each i, then the $x_{i,t}$ are pure or exact unit root processes and the system given by equations (1) and (2) coincides with the homogeneous panel cointegration regression studied by Phillips & Moon (1999a) and many others (for a survey, see Phillips & Moon (1999b)). In these studies the regression coefficient β in (1) is called a cointegrating parameter, and it represents a stationary relationship that holds between $y_{i,t}$ and $x_{i,t}$ for every i. Such a common long-run relationship is often predicted by economic theory, and it is then of central interest to estimate it and test whether it satisfies theoretically sound restrictions. A typical example involves testing the purchasing power parity hypothesis in a panel of suitably similar countries.

In contrast to the recent panel cointegration literature, we do not restrict attention to models in which the regressors are generated by exact unit root processes. Indeed, although most macroeconomic variables analyzed in the recent panel cointegration studies display strong autocorrelation, there are seldom strong prior reasons why the autoregressive parameter should be exactly unity. The problem is aggravated by the fact that unit root tests cannot reliably detect small deviations from unity. Given this uncertainty about the unit roots, it is of interest to study problems that arise in statistical inference about the regression parameter β in (1) when the autoregressive parameters in (2) are close to, rather than exactly equal to, one. From earlier literature we know that such problematic near alternatives are best modeled by the local-to-unit-root parametrization $\rho_i = \exp(c_i/T) \approx 1 + c_i/T$ in (2) (see e.g. Elliott (1998) and Stock (1997)). By this device it is possible to obtain asymptotic results that provide reasonable approximations in cases where the regressors $x_{i,t}$ are stationary but revert to their means so slowly that the standard fixed-$\rho_i$ asymptotics fail to attain satisfactory accuracy. We close this section by imposing the following assumption.

Assumption 1. The errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$ are linear processes satisfying the following conditions:

(a) $\xi_{i,t} = C(L)\eta_{i,t} = \sum_{j=0}^{\infty} C_j \eta_{i,t-j}$, where $\sum_{j=0}^{\infty} j^3 \|C_j\| < \infty$,
(b) $\eta_{i,t} = (\nu_{i,t}, w_{i,t})'$, where $\nu_{i,t}$ and $w_{i,t}$ are mutually independent and iid across i and over t with $E(\nu_{i,t}) = E(w_{i,t}) = 0$, $E(\nu_{i,t}^2) = E(w_{i,t}^2) = 1$, and $E(\nu_{i,t}^4) = E(w_{i,t}^4) = \mu_4 < \infty$ for all i and t.

Under Assumption 1 the error process in the system (1) and (2) satisfies the same conditions as the error process of the homogeneous panel cointegration regression of Phillips & Moon (1999a, Assumptions 8 and 9).

B. Preliminary Analysis

For preliminary insights, we derive sequential limits for the pooled panel OLS estimator

$$\hat\beta = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}\,y_{i,t}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^2}. \qquad (3)$$
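The following sketch simulates the system (1) and (2) with heterogeneous localizing coefficients and computes the pooled OLS estimator (3). The assumptions are for illustration only: $x_{i,0} = 0$; $u_{i,t}$ and $\varepsilon_{i,t}$ are iid N(0, 1) and mutually independent, so there is no serial or cross correlation and the bias term discussed below is zero; and the $c_i$ are drawn uniformly from (−10, 0]. Function names are hypothetical.

```python
import math
import random


def simulate_panel(n, T, beta, c_values, rng):
    """Simulate (1)-(2). Assumptions for illustration: x_{i,0} = 0, and
    u_{i,t}, eps_{i,t} iid N(0, 1) and mutually independent."""
    panel = []
    for i in range(n):
        rho = math.exp(c_values[i] / T)  # local-to-unity root, roughly 1 + c_i/T
        x, xs, ys = 0.0, [], []
        for _ in range(T):
            x = rho * x + rng.gauss(0.0, 1.0)
            xs.append(x)
            ys.append(beta * x + rng.gauss(0.0, 1.0))
        panel.append((xs, ys))
    return panel


def pooled_ols(panel):
    """Pooled panel OLS estimator (3): sum_i sum_t x*y / sum_i sum_t x^2."""
    num = sum(x * y for xs, ys in panel for x, y in zip(xs, ys))
    den = sum(x * x for xs, _ in panel for x in xs)
    return num / den


rng = random.Random(3)
n, T, beta = 20, 100, 1.5
c_values = [-10.0 * rng.random() for _ in range(n)]  # heterogeneous c_i in (-10, 0]
beta_hat = pooled_ols(simulate_panel(n, T, beta, c_values, rng))
print(f"beta_hat = {beta_hat:.4f}")
```

For c = −5 and T = 100, for example, the root is exp(−0.05) ≈ 0.9512, close to the linear approximation 0.95.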
Let [Tr] denote the integer part of Tr. From Phillips & Solo (1992), we know that under Assumption 1 the partial sum process $T^{-1/2}\sum_{t=1}^{[Tr]} \xi_{i,t}$ converges weakly to a two-dimensional Brownian motion $B_i(r) = (B_{ui}(r), B_{\varepsilon i}(r))'$, $0 \le r \le 1$, with long-run covariance matrix $\Omega = \sum_{j=-\infty}^{\infty} E(\xi_{i,j}\xi_{i,0}')$, which we partition as $\Omega = [\Omega_{kl}]$, $(k, l = u, \varepsilon)$. Furthermore, by the well-known limit theory for near integrated processes (e.g. Phillips (1987, 1988)), as $T \to \infty$,

$$\frac{1}{T^2}\sum_{t=1}^{T} x_{i,t}^2 \Rightarrow \int_0^1 K_{c_i}(r)^2\,dr, \qquad (4)$$

$$\frac{1}{T}\sum_{t=1}^{T} x_{i,t}u_{i,t} \Rightarrow \int_0^1 K_{c_i}(r)\,dB_{ui}(r) + \Delta_{u\varepsilon}, \qquad (5)$$

where $\Delta_{u\varepsilon}$ is an off-diagonal element of the one-sided long-run covariance matrix $\Delta = \sum_{j=0}^{\infty} E(\xi_{i,j}\xi_{i,0}') = [\Delta_{kl}]$, $(k, l = u, \varepsilon)$, and $K_{c_i}(r) = \int_0^r e^{(r-s)c_i}\,dB_{\varepsilon i}(s)$, $0 \le r \le 1$, is an Ornstein-Uhlenbeck process. Given (4) and (5), we may deduce that for fixed n, as $T \to \infty$,

$$T(\hat\beta - \beta) \Rightarrow \left(\frac{1}{n}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)^2\,dr\right)^{-1} \frac{1}{n}\sum_{i=1}^{n}\left(\int_0^1 K_{c_i}(r)\,dB_{ui}(r) + \Delta_{u\varepsilon}\right). \qquad (6)$$
This result provides the first step for obtaining sequential asymptotics for (3). The second step is to derive the limit of the right-hand side of (6) as $n \to \infty$. For simplicity, assume $c_i = c$ for all $i$. Then notice that the $\int_0^1 K_{c_i}(r)\,dB_{u_i}(r)$ are iid with mean zero and variance

$$E\left(\int_0^1 K_{c_i}(r)\,dB_{u_i}(r)\right)^2 = \Omega_{uu}\Omega_{\nu\nu}\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr < \infty, \tag{7}$$

where the equality follows from well-known results for stochastic integrals (the Itô isometry). Consequently, we may apply the strong law of large numbers to obtain

$$\frac{1}{n}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)\,dB_{u_i}(r) \xrightarrow{as} 0, \quad \text{as } n \to \infty,$$

where $\xrightarrow{as}$ denotes almost sure convergence. Furthermore, the $\int_0^1 K_{c_i}(r)^2\,dr$ are also iid with

$$E\int_0^1 K_{c_i}(r)^2\,dr = \Omega_{\nu\nu}\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr > 0, \tag{8}$$
248
HEIKKI KAUPPI
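and $E\left(\int_0^1 K_{c_i}(r)^2\,dr\right)^2 < \infty$. Thus, we may deduce that the denominator on the right-hand side of (6) converges almost surely to $\Omega_{\nu\nu}\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr$ as $n \to \infty$. In view of these results, we may now conclude that as $T \to \infty$ followed by $n \to \infty$,

$$T(\hat\beta - \beta) \xrightarrow{p} \left(\Omega_{\nu\nu}\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr\right)^{-1}\Delta_{u\nu}. \tag{9}$$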
This result indicates that although $\hat\beta$ is consistent, it is subject to a second-order bias effect arising from temporal correlation between the system errors $u_{i,t}$ and $\nu_{i,t}$. Note that if $\rho_i = 1$ in (2), the bias term in (9) still exists and in fact becomes equal to $2\Delta_{u\nu}/\Omega_{\nu\nu}$, because $\int_0^1\int_0^r ds\,dr = 1/2$ when $c = 0$. In contrast, if $\Delta_{u\nu} = 0$, there is no asymptotic bias in the estimation error of $\hat\beta$, irrespective of the values of the localizing parameters $c_i$ in (2). In fact, if $\Delta_{u\nu} = 0$, we obtain the sequential weak convergence result

$$\sqrt{n}\,T(\hat\beta - \beta) \Rightarrow N(0, V_{\hat\beta}), \tag{10}$$

where

$$V_{\hat\beta} = \Omega_{uu}\left(\Omega_{\nu\nu}\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr\right)^{-1}.$$

The latter limiting result essentially follows from the fact that

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)\,dB_{u_i}(r)$$

is asymptotically normally distributed with zero mean and variance given in (7).

C. Serial Correlation Corrected Estimation

In view of the above analysis, we may conjecture that the asymptotics in (10) can be attained even when $\Delta_{u\nu} \neq 0$, provided that we have a suitable estimator for $\Delta_{u\nu}$. One alternative is to use the kernel estimation strategy employed in the pooled fully modified (PFM) estimator of Phillips & Moon (1999a). The PFM estimator will be introduced in a subsequent section; it employs the averaged kernel estimators $\hat\Omega = [\hat\Omega_{kl}]$ and $\hat\Delta = [\hat\Delta_{kl}]$, $(k,l = u,\nu)$, of $\Omega$ and $\Delta$, respectively, defined by

$$\hat\Omega = \frac{1}{n}\sum_{i=1}^{n}\hat\Omega_i, \quad \hat\Omega_i = \sum_{j=-T+1}^{T-1}\varpi(j/K)\,\hat\Gamma_i(j), \qquad \hat\Delta = \frac{1}{n}\sum_{i=1}^{n}\hat\Delta_i, \quad \hat\Delta_i = \sum_{j=0}^{T-1}\varpi(j/K)\,\hat\Gamma_i(j). \tag{11}$$

Here $\hat\Gamma_i(j) = T^{-1}\sum_t \varepsilon_{i,t+j}\varepsilon_{i,t}'$, where the summation is over $1 \le t, t+j \le T$, while $\varpi(j/K)$ is a lag kernel for which $\varpi(0) = 1$, $\varpi(x) = \varpi(-x)$, $\int\varpi(x)^2\,dx < \infty$, and with Parzen exponent $q \in (0,\infty)$ such that $k_q = \lim_{x\to0}(1-\varpi(x))/|x|^q < \infty$. As to applicable lag kernel functions and the choice of the bandwidth parameter $K$, we follow Phillips & Moon (1999a) and impose the following assumption.

Assumption 2. The lag kernel $\varpi(j/K)$ in (11) has Parzen exponent $q > 1/2$, and the bandwidth parameter $K$ tends to infinity with $K/T \to 0$ and $K^{2q}/T \to \gamma > 0$ as $T \to \infty$.

Remark 1. Under Assumption 2 the normalized estimation errors $\sqrt{n}(\hat\Omega - \Omega)$ and $\sqrt{n}(\hat\Delta - \Delta)$ converge in probability to zero. This result was stated in Phillips & Moon (1999a, Proof of Theorem 9) and holds as $(T,n\to\infty)$ with $n/T \to 0$. It is employed in the proofs of the theorems given below.
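The estimators in (11) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the author's code: the Parzen kernel is used as one admissible choice of $\varpi$ (its Parzen exponent is $q = 2$), the errors are passed in as known $(u_{i,t}, \nu_{i,t})$ pairs (the infeasible version discussed in Remark 2), and the names `parzen`, `gamma_hat`, and `kernel_lrv` are hypothetical.

```python
def parzen(x):
    """Parzen lag kernel (Parzen exponent q = 2), one admissible choice of the kernel in (11)."""
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def gamma_hat(e, j):
    """Sample autocovariance T^{-1} sum_t e_{t+j} e_t' for lag j >= 0; e is a list of (u, nu) pairs."""
    T = len(e)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(T - j):
        for k in range(2):
            for l in range(2):
                g[k][l] += e[t + j][k] * e[t][l]
    return [[g[k][l] / T for l in range(2)] for k in range(2)]


def kernel_lrv(errors, K):
    """Averaged kernel estimators (Omega_hat, Delta_hat) of (11).

    errors: list over panel members i of lists of (u_it, nu_it) pairs.
    Returns two 2x2 nested lists ordered (u, nu); delta[0][1] estimates Delta_{u nu}.
    """
    n = len(errors)
    omega = [[0.0, 0.0], [0.0, 0.0]]
    delta = [[0.0, 0.0], [0.0, 0.0]]
    for e in errors:
        for j in range(len(e)):
            w = parzen(j / K)
            if w == 0.0:
                continue  # lags beyond the bandwidth get zero weight
            g = gamma_hat(e, j)
            for k in range(2):
                for l in range(2):
                    # One-sided sum over j >= 0 for Delta_hat.
                    delta[k][l] += w * g[k][l] / n
                    # Two-sided sum for Omega_hat, using Gamma_hat(-j) = Gamma_hat(j)'.
                    extra = g[l][k] if j > 0 else 0.0
                    omega[k][l] += w * (g[k][l] + extra) / n
    return omega, delta


# Usage: with K = 1 every lag j >= 1 gets weight zero, so only Gamma_hat(0) enters.
omega_hat, delta_hat = kernel_lrv([[(1.0, 1.0), (-1.0, -1.0)]], K=1)
```

With $K = 1$ the Parzen weights vanish at every nonzero lag, so the estimates reduce to contemporaneous sample covariances; this matches the remark in Note 2 that the bandwidth plays little role when the errors are iid.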
Remark 2. Notice that the kernel estimators defined in (11) are not feasible, since they employ the unknown errors $\varepsilon_{i,t} = (u_{i,t}, \nu_{i,t})'$. A natural approach to estimating $u_{i,t}$ and $\nu_{i,t}$ is to use the residuals $\hat u_{i,t} = y_{i,t} - \hat\beta x_{i,t}$ from a preliminary pooled panel OLS regression, and the differences $\Delta x_{i,t}$, respectively. It is easy to show that the associated estimation errors for $u_{i,t}$ and $\nu_{i,t}$ are of orders of magnitude $T^{-1}$ and $T^{-1/2}$, respectively. In view of this and Remark 1, we may then expect that under the assumptions of this chapter, and irrespective of whether the $x_{i,t}$ in (2) have exact or near unit roots, the use of $\hat u_{i,t}$ and $\Delta x_{i,t}$ in place of $u_{i,t}$ and $\nu_{i,t}$, respectively, has no effect on the rate of consistency of the kernel estimators in (11). However, following Phillips & Moon (1999a), we proceed by working with the true errors $\varepsilon_{i,t}$, since we want to avoid further technical complications that might arise in an asymptotic analysis where the kernel estimators in (11) use the estimates $\hat u_{i,t}$ and $\Delta x_{i,t}$ in place of $u_{i,t}$ and $\nu_{i,t}$, respectively. Now we are ready to define a robust estimator for $\beta$,

$$\hat\beta^* = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}y_{i,t} - nT\hat\Delta_{u\nu}}{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2}, \tag{12}$$
where $\hat\Delta_{u\nu}$ is given in (11). The estimator in (12) is called a serial correlation corrected pooled panel estimator. We now establish the joint asymptotics of the new estimator in (12). Let $J_{c_i}(r) = \int_0^r e^{(r-s)c_i}\,dW_i(s)$, where $W_i(r)$ is a standard Brownian motion. Hereafter, we assume that the values of $c_i$ are uniformly bounded and such that the arithmetic mean of the expected values of $\int_0^1 J_{c_i}(r)^2\,dr$ converges to a positive finite number, i.e.

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}E\int_0^1 J_{c_i}(r)^2\,dr = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\int_0^1\int_0^r e^{2(r-s)c_i}\,ds\,dr = \theta_{xx}$$

exists and is finite by assumption. The latter condition is not restrictive; it basically means that the appropriately normalized sample second moment of the pooled regressors $x_{i,t}$, i.e. $(nT^2)^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2$, converges in probability.

Theorem 4. Suppose Assumptions 1 and 2 hold and that the data are generated by (1) and (2) with $c_i$ such that $\sup_i|c_i| \le \bar c < \infty$. Then under joint limits as $(T,n\to\infty)$ with $n/T \to 0$,

$$\sqrt{n}\,T(\hat\beta^* - \beta) \Rightarrow N(0, V_{\hat\beta^*}), \quad \text{where } V_{\hat\beta^*} = \Omega_{uu}\Omega_{\nu\nu}^{-1}\theta_{xx}^{-1}.$$
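A minimal sketch of (12), assuming a balanced panel and that an estimate of $\Delta_{u\nu}$ is supplied from outside (for example the $(u,\nu)$ element of a kernel estimator such as (11)); the function name `beta_star` is hypothetical.

```python
def beta_star(x, y, delta_uv_hat):
    """Serial correlation corrected pooled panel estimator (12).

    x, y: lists of n lists of length T (balanced panel assumed).
    delta_uv_hat: an estimate of the one-sided long-run covariance Delta_{u nu}.
    """
    n, T = len(x), len(x[0])
    sxy = sum(a * b for xi, yi in zip(x, y) for a, b in zip(xi, yi))
    sxx = sum(a * a for xi in x for a in xi)
    # Subtracting n*T*Delta_hat removes the second-order bias term that
    # temporal correlation between u and nu adds to sum x*y.
    return (sxy - n * T * delta_uv_hat) / sxx
```

The only difference from (3) is the additive correction $nT\hat\Delta_{u\nu}$ in the numerator, which is why the estimator keeps the simplicity of pooled OLS.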
As is apparent from Theorem 4, the serial correlation corrected pooled panel OLS estimator has very desirable properties. It is $\sqrt{n}T$-consistent, asymptotically normal, and free of asymptotic bias irrespective of whether the regressors $x_{i,t}$ in (2) have exact or near unit roots in their generating mechanisms. This is a remarkable improvement that can be gained if panel data are used, since none of the existing time series estimators for cointegrating parameters can achieve these features. Rather, as shown e.g. by Elliott (1998), time series cointegrating regression estimators tend to suffer from second-order biases unless the regressors are generated by exact unit root processes, and these biases lead to severe size distortions in hypothesis testing. In contrast, we will show below that by using the serial correlation corrected pooled panel OLS estimator we can achieve robust inference in fairly general situations where individual regressors may have roots that vary heterogeneously within a range of values near one.

Unfortunately, the situation is less hopeful if the panel regression in (1) includes individual intercepts or if the data exhibit linear or higher-order time trends. While there is a natural way to modify the new serial correlation corrected pooled OLS estimator to take these effects into account, in these cases near unit roots give rise to nuisance parameters that bias the asymptotic distribution of the estimator. To see why this happens, suppose the regression in (1) includes an intercept that may vary across individuals. This suggests using demeaned data in the formula of the estimator. Accordingly, modify (12) to the form

$$\tilde\beta^* = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\tilde x_{i,t}\tilde y_{i,t} - nT\hat\Delta_{u\nu}}{\sum_{i=1}^{n}\sum_{t=1}^{T}\tilde x_{i,t}^2}, \tag{13}$$

where $\tilde y_{i,t} = y_{i,t} - \bar y_i$ and $\tilde x_{i,t} = x_{i,t} - \bar x_i$, with $\bar y_i = T^{-1}\sum_{t=1}^{T}y_{i,t}$ and $\bar x_i = T^{-1}\sum_{t=1}^{T}x_{i,t}$, respectively. The asymptotic properties of the estimator in (13) are easily found by employing sequential limit theory. To reveal the most essential part of this exercise, note that we have

$$\frac{1}{T}\sum_{t=1}^{T}\tilde x_{i,t}\tilde u_{i,t} \Rightarrow \int_0^1\tilde K_{c_i}(r)\,dB_{u_i}(r) + \Delta_{u\nu}, \tag{14}$$
where $\tilde K_{c_i}(r)$ is a demeaned Ornstein-Uhlenbeck process defined by $\tilde K_{c_i}(r) = K_{c_i}(r) - \int_0^1 K_{c_i}(s)\,ds$. Now, while the temporal correlation correction in (13) can still remove the bias effects that arise from the presence of $\Delta_{u\nu}$ on the right-hand side of (14), the remaining term, i.e. $\int_0^1\tilde K_{c_i}(r)\,dB_{u_i}(r)$, no longer has zero mean, in contrast to the case in (5), where we had $K_{c_i}(r)$ in place of $\tilde K_{c_i}(r)$. In fact,

$$E\int_0^1\tilde K_{c_i}(r)\,dB_{u_i}(r) = -\Omega_{u\nu}\int_0^1\int_0^r e^{(r-s)c_i}\,ds\,dr,$$

and we thus obtain

$$\frac{1}{n}\sum_{i=1}^{n}\int_0^1\tilde K_{c_i}(r)\,dB_{u_i}(r) \xrightarrow{p} -\Omega_{u\nu}\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\int_0^1\int_0^r e^{(r-s)c_i}\,ds\,dr, \quad \text{as } n \to \infty,$$

provided the limit on the right exists. In view of this result, it is easy to see that the estimator in (13) is subject to an asymptotic bias which depends on the nuisance parameters $c_i$. Unfortunately, no technique is currently available that would provide consistent estimates of the individual localizing coefficients $c_i$. Only in the special case where the localizing coefficients are the same across $i$ may we use the cross-sectional dimension of the panel to obtain a consistent estimate of the common localizing coefficient (see Moon & Phillips (1999)). This fact opens a possibility for correcting the bias effects. However, such a correction may be rather complicated and has to be restricted to cases where the common $c$ is well below zero (cf. Moon & Phillips (1999)). While a more detailed treatment of this matter is beyond the scope of this study, in empirical applications the special case of a common $c$ is hardly realistic anyway.

D. Fully Modified Estimation

We now consider the PFM estimator of Phillips & Moon (1999a). The idea of the PFM estimator is to modify the pooled OLS estimator in (3) by employing non-parametric corrections in the same way as in the fully modified OLS (FM-OLS) estimator of Phillips & Hansen (1990). The estimator is defined by
$$\hat\beta^+ = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}y_{i,t}^+ - nT\hat\Delta_{u\nu}^+}{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2}, \tag{15}$$

where

$$y_{i,t}^+ = y_{i,t} - \hat\Omega_{u\nu}\hat\Omega_{\nu\nu}^{-1}\Delta x_{i,t}, \tag{16}$$

$$\hat\Delta_{u\nu}^+ = \hat\Delta_{u\nu} - \hat\Omega_{u\nu}\hat\Omega_{\nu\nu}^{-1}\hat\Delta_{\nu\nu}, \tag{17}$$

are based on the kernel estimators in (11). Equation (16) gives an endogeneity correction and is similar to that in the FM-OLS estimator of Phillips & Hansen (1990). Equation (17) gives the contemporaneous and serial correlation corrections that are needed to remove all the second-order bias effects arising from temporal correlation between $u_{i,t}$ and $\nu_{i,t}$. Under the assumption that the regressors $x_{i,t}$ in (2) have exact unit roots, the joint asymptotics of the PFM estimator are determined by Theorem 9 of Phillips & Moon (1999a). The following theorem shows how this result changes when the regressors $x_{i,t}$ are generated by the more general class of near unit root processes. Here we make an additional (technical) assumption that the values of $c_i$ are such that the $c_i$-weighted average of the expected values of $\int_0^1 J_{c_i}(r)^2\,dr$ converges to a finite number, i.e.

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}c_iE\int_0^1 J_{c_i}(r)^2\,dr = \theta_{cxx}$$

exists and is finite by assumption.

Theorem 5. Suppose the assumptions of Theorem 4 hold. Then under joint limits as $(T,n\to\infty)$ with $n/T \to 0$,

(a) $\sqrt{n}\,T(\hat\beta^+ - \beta) - \sqrt{n}\,B_{n,T} \Rightarrow N(0, V_{\hat\beta^+})$,

(b) $T(\hat\beta^+ - \beta) \xrightarrow{p} B$,

where

$$V_{\hat\beta^+} = \Omega_{u\cdot\nu}\Omega_{\nu\nu}^{-1}\theta_{xx}^{-1}, \tag{18}$$

with $\Omega_{u\cdot\nu} = \Omega_{uu} - \Omega_{u\nu}^2\Omega_{\nu\nu}^{-1}$, and

$$B_{n,T} = -\Omega_{u\nu}\Omega_{\nu\nu}^{-1}\,\frac{\sum_{i=1}^{n}T\left(e^{c_i/T} - 1\right)\sum_{t=1}^{T}x_{i,t}x_{i,t-1}}{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2}, \tag{19}$$

$$B = -\Omega_{u\nu}\Omega_{\nu\nu}^{-1}\,\frac{\theta_{cxx}}{\theta_{xx}}. \tag{20}$$
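The corrections (15) to (17) and the bias term (19) can be sketched as follows. This is an illustrative sketch, not the author's code: it assumes a balanced panel, zero initial values $x_{i,0} = 0$ (as in the simulations below) so that $\Delta x_{i,1} = x_{i,1}$, and kernel estimates $\hat\Omega$, $\hat\Delta$ passed in as 2x2 nested lists ordered $(u,\nu)$. The helper `bias_BnT` uses the true $c_i$ and $\Omega$ and is therefore infeasible in practice; it is included only to make the structure of (19) explicit. Both function names are hypothetical.

```python
import math


def pfm_beta(x, y, omega_hat, delta_hat):
    """Pooled fully modified estimator (15) with corrections (16) and (17).

    x, y: lists of n lists of length T; the first difference at t = 1 uses
    the assumed initial value x_{i,0} = 0.
    omega_hat, delta_hat: 2x2 nested lists ordered (u, nu).
    """
    n, T = len(x), len(x[0])
    a = omega_hat[0][1] / omega_hat[1][1]               # Omega_{u nu} / Omega_{nu nu}
    delta_plus = delta_hat[0][1] - a * delta_hat[1][1]  # correction (17)
    num = den = 0.0
    for xi, yi in zip(x, y):
        prev = 0.0                                      # assumed x_{i,0} = 0
        for xt, yt in zip(xi, yi):
            y_plus = yt - a * (xt - prev)               # endogeneity correction (16)
            num += xt * y_plus
            den += xt * xt
            prev = xt
    return (num - n * T * delta_plus) / den             # estimator (15)


def bias_BnT(x, c, omega):
    """Finite-sample bias term B_{n,T} in (19), given known c_i (infeasible in practice).

    The t = 1 term x_{i,1} x_{i,0} vanishes under the assumed x_{i,0} = 0.
    """
    T = len(x[0])
    num = sum(T * (math.exp(ci / T) - 1.0) *
              sum(xi[t] * xi[t - 1] for t in range(1, T))
              for xi, ci in zip(x, c))
    den = sum(v * v for xi in x for v in xi)
    return -omega[0][1] / omega[1][1] * num / den
```

Setting every $c_i = 0$ makes `bias_BnT` return zero, in line with Corollary 6 below; with $c_i \neq 0$ the term $T(e^{c_i/T} - 1) \approx c_i$ survives and is weighted by $\Omega_{u\nu}$.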
The following corollary holds when the assumption of Phillips & Moon (1999a) about exact unit roots in the regressors $x_{i,t}$ is valid.

Corollary 6. Suppose Assumptions 1 and 2 hold and the data are generated by (1) and (2) with $c_i = 0$ for all $i$. Then under joint limits as $(T,n\to\infty)$ with $n/T \to 0$,

$$\sqrt{n}\,T(\hat\beta^+ - \beta) \Rightarrow N(0, 2\Omega_{u\cdot\nu}\Omega_{\nu\nu}^{-1}).$$

It is indeed easy to see that the result of Corollary 6 follows from Theorem 5, because if $c_i = 0$, then $B_{n,T} = B = 0$ and $E\int_0^1 J_{c_i}(r)^2\,dr = E\int_0^1 W_i(r)^2\,dr = 1/2$, giving $V_{\hat\beta^+} = 2\Omega_{u\cdot\nu}\Omega_{\nu\nu}^{-1}$. The result of Corollary 6 coincides precisely with that of Theorem 9 of Phillips & Moon (1999a), and it is illustrative to compare it to Theorems 4 and 5 above. First, note from Corollary 6 the obvious fact that when the exact unit root assumption holds, $\hat\beta^+$ is $\sqrt{n}T$-consistent, asymptotically normal, and unbiased. In addition, note that in this case $\hat\beta^+$ is generally more efficient than $\hat\beta^*$, because $\Omega_{u\cdot\nu} = \Omega_{uu} - \Omega_{u\nu}^2\Omega_{\nu\nu}^{-1} \le \Omega_{uu}$. This is the price that we have to pay if the autoregressive parameters in (2) happen to be exactly equal to one and we use the estimator $\hat\beta^*$ instead of $\hat\beta^+$.

However, as Theorem 5 indicates, the behavior of the estimator $\hat\beta^+$ is radically different if the regressors $x_{i,t}$ are generated by processes with roots that are only local to one. First, the estimator $\hat\beta^+$ is no longer $\sqrt{n}T$-consistent. Rather, in order to obtain $\sqrt{n}T$-rate asymptotics, the bias term $B_{n,T}$ given in (19) has to be subtracted from the estimation error. In fact, in view of result (b) of Theorem 5, if the $x_{i,t}$ are near, rather than exact, unit root processes, the estimator $\hat\beta^+$ is only $T$-consistent and has an asymptotic bias given by $B$ in (20). If there is no simultaneity in the model, i.e. if $\Omega_{u\nu} = 0$, then the biases disappear and the PFM estimator is $\sqrt{n}T$-consistent and has an asymptotic normal distribution with the same variance as that of the serial correlation corrected pooled OLS estimator. To see why the biases arise, notice first that when an autoregressive parameter $\rho_i$ in (2) is just nearly one with $c_i$ non-zero, then $\Delta x_{i,t} = \nu_{i,t} + (e^{c_i/T} - 1)x_{i,t-1}$,
where $(e^{c_i/T} - 1) \approx c_i/T$. It is then easy to see that the use of $\Delta x_{i,t}$ in the endogeneity correction term (16) gives rise to $B_{n,T}$ in (19), which has the limit given in (20). It is worth noticing that if the nuisance parameters $c_i$ were known, we could employ the quasi-difference $x_{i,t} - e^{c_i/T}x_{i,t-1}$ in place of the pure difference $\Delta x_{i,t}$ in (16), so that the bias term $B_{n,T} = 0$. However, as already noted above, such a solution is generally infeasible because the localizing coefficients $c_i$ are unknown and cannot be consistently estimated from the individual time series $x_{i,t}$. We close this section by pointing out that the above bias problem also occurs in cases where the PFM estimator is modified to account for deterministic effects like individual intercepts in (1). This fact can easily be verified through sequential asymptotics (for details see Kauppi (1999, p. 124–125)).

E. Hypothesis Testing

In this section we consider testing a simple hypothesis $H_0: \beta = \beta_0$ against $H_1: \beta \neq \beta_0$. First, in view of Theorem 4, we could use the serial correlation corrected pooled OLS estimator to obtain the t-test statistic

$$t_{\hat\beta^*} = \sqrt{n}\,T(\hat\beta^* - \beta_0)\left(\frac{\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2}{\hat\Omega_{uu}}\right)^{1/2}.$$

In view of Theorem 4 and the result (36) given in its proof in the appendix, it is easy to deduce the following corollary.

Corollary 7. Suppose the assumptions of Theorem 4 hold. Then, under joint limits as $(T,n\to\infty)$ with $n/T \to 0$, $t_{\hat\beta^*} \Rightarrow N(0,1)$.

For comparison, we will also consider assuming exact unit roots in $x_{i,t}$ and accordingly employing the PFM estimator based t-test

$$t_{\hat\beta^+} = \frac{\sqrt{n}\,T(\hat\beta^+ - \beta_0)}{\left(2\hat\Omega_{u\cdot\nu}\hat\Omega_{\nu\nu}^{-1}\right)^{1/2}},$$

where $\hat\Omega_{\nu\nu}$ and $\hat\Omega_{u\cdot\nu} = \hat\Omega_{uu} - \hat\Omega_{u\nu}^2\hat\Omega_{\nu\nu}^{-1}$ are obtained from the kernel estimators given in (11) (cf. Phillips & Moon (1999a, Remark (c), p. 1086)).
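Given the estimates, the two statistics above can be computed as in the following sketch. It assumes a balanced panel and long-run variance estimates ordered $(u,\nu)$ as in (11); the function names `t_star` and `t_plus` are hypothetical.

```python
import math


def t_star(beta_star_hat, beta0, x, omega_uu_hat):
    """t-statistic for H0: beta = beta0 based on the corrected estimator (12)."""
    n, T = len(x), len(x[0])
    m = sum(v * v for xi in x for v in xi) / (n * T * T)  # (nT^2)^{-1} sum x^2
    return math.sqrt(n) * T * (beta_star_hat - beta0) * math.sqrt(m / omega_uu_hat)


def t_plus(beta_plus_hat, beta0, n, T, omega_hat):
    """PFM-based t-statistic; omega_hat is a 2x2 nested list ordered (u, nu)."""
    om_uu, om_uv, om_vv = omega_hat[0][0], omega_hat[0][1], omega_hat[1][1]
    om_u_dot_v = om_uu - om_uv ** 2 / om_vv  # Omega_{u.nu}
    return math.sqrt(n) * T * (beta_plus_hat - beta0) / math.sqrt(2.0 * om_u_dot_v / om_vv)
```

Note that $t_{\hat\beta^+}$ hard-wires the exact unit root value $1/2$ for the limit of $(nT^2)^{-1}\sum\sum x_{i,t}^2/\Omega_{\nu\nu}$, whereas $t_{\hat\beta^*}$ uses the sample moment directly; this is the source of the difference in their behavior under near unit roots.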
Corollary 8. Suppose the assumptions of Theorem 5 hold. Then, under joint limits as $(T,n\to\infty)$ with $n/T \to 0$,

(a) $t_{\hat\beta^+}$ diverges if $\Omega_{u\nu} \neq 0$ and $B \neq 0$, where $B$ is given in (20);

(b) $t_{\hat\beta^+} \Rightarrow N(0, V_{t_{\hat\beta^+}})$ if $\Omega_{u\nu} = 0$, where $V_{t_{\hat\beta^+}} = \frac{1}{2}\theta_{xx}^{-1}$.
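Under a common localizing coefficient $c$, part (b) can be evaluated numerically. The sketch below uses the closed form $\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr = (e^{2c} - 2c - 1)/(4c^2)$ (which tends to $1/2$ as $c \to 0$), the variance $V_{t_{\hat\beta^+}} = \frac{1}{2}\theta_{xx}^{-1}$, and the normal approximation for the rejection rate of a two-sided test at the critical value 1.96. The function names are illustrative; for $c = -5$ and $c = -10$ the variance reproduces the values 5.55 and 10.53 discussed below.

```python
import math


def theta_common(c):
    """int_0^1 int_0^r exp(2(r - s)c) ds dr = (e^{2c} - 2c - 1)/(4c^2) for a common c."""
    if abs(c) < 1e-6:
        return 0.5 + c / 3.0  # Taylor expansion around c = 0
    return (math.expm1(2.0 * c) - 2.0 * c) / (4.0 * c * c)


def v_t_plus(c):
    """Asymptotic variance of the PFM t-statistic when Omega_{u nu} = 0 (Corollary 8(b))."""
    return 1.0 / (2.0 * theta_common(c))  # equals 2c^2/(e^{2c} - 2c - 1) for c != 0


def rejection_rate(v, crit=1.96):
    """P(|sqrt(v) Z| > crit) for Z ~ N(0, 1), i.e. 2 * (1 - Phi(crit / sqrt(v)))."""
    z = crit / math.sqrt(v)
    return 1.0 - math.erf(z / math.sqrt(2.0))


for c in (0.0, -5.0, -10.0):
    v = v_t_plus(c)
    print(c, round(v, 2), round(100 * rejection_rate(v), 1))
```

Using `math.expm1` avoids the cancellation in $e^{2c} - 1$ for small $|c|$; the series branch covers the exact unit root case $c = 0$, where the variance is one.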
Part (a) of Corollary 8 states the obvious consequence of Theorem 5 that the t-test statistic $t_{\hat\beta^+}$ diverges if the regressors are generated by local to unit root processes and $\Omega_{u\nu}$ is non-zero. This means that hypothesis tests based on the PFM estimator are generally severely distorted. The result of part (b) of Corollary 8 shows that even when there is no simultaneity, i.e. $\Omega_{u\nu} = 0$, the test does not have the desired standard normal distribution. To illustrate this latter effect, suppose that $c_i = c$ for all $i$. Then, if $\Omega_{u\nu} = 0$, we have

$$V_{t_{\hat\beta^+}} = \frac{2c^2}{e^{2c} - 2c - 1}, \tag{21}$$

because $E\int_0^1 J_{c_i}(r)^2\,dr = \int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr = (e^{2c} - 2c - 1)/(4c^2)$ for all $i$. It is easy to see from (21) that for negative values of $c$, $V_{t_{\hat\beta^+}}$ becomes larger than unity. For example, for $c = -5$ and $c = -10$, $V_{t_{\hat\beta^+}}$ is approximately equal to 5.55 and 10.53, respectively. Notice that if the usual 5% critical value 1.96 is applied in the $t_{\hat\beta^+}$-test, then the true asymptotic rejection rates that correspond to $c = -5$ and $c = -10$ are approximately equal to 40.3% and 54.6%, respectively.

F. Simulations

In this section, we illustrate the theoretical findings obtained in the previous section by conducting some simple Monte Carlo experiments. We focus on investigating the size behavior of the PFM t-test statistic, $t_{\hat\beta^+}$, and that of the bias corrected t-test, $t_{\hat\beta^*}$. For the experiments we generate artificial data by employing equations (1) and (2), where we impose $\beta = 1$ in (1). The errors $\varepsilon_{i,t} = (u_{i,t}, \nu_{i,t})'$ are generated simply by the equation $\varepsilon_{i,t} = \mathrm{chol}(C)\,\eta_{i,t}$, where $\eta_{i,t} \sim \mathrm{nid}(0, I_2)$ across $i = 1, \ldots, n$ and over $t = 1, \ldots, T$, and $\mathrm{chol}(C)$ is the Cholesky decomposition of the matrix $C = [C_{ij}]$ with $C_{11} = C_{22} = 1$ and $C_{12} = C_{21} = \Omega_{u\nu}$. Thus, we have $E(u_{i,t}) = E(\nu_{i,t}) = 0$, $E(u_{i,t}^2) = E(\nu_{i,t}^2) = 1 = \Omega_{uu} = \Omega_{\nu\nu}$, and $E(u_{i,t}\nu_{i,t}) = \Omega_{u\nu}$. The initial values $y_{i,0}$ and $x_{i,0}$ are set to zero. Table 1 reports percentage rejection rates of the t-tests $t_{\hat\beta^+}$ and $t_{\hat\beta^*}$, respectively, when a 5% critical value 1.96 is applied, $n = 50$, $T = 250$, and the local to unit root coefficients are set equal to a common value $c$, i.e. we use
$\rho_i = \rho = 1 + c/T$ for all $i$. In computing the long-run covariance estimates in $t_{\hat\beta^+}$ and $t_{\hat\beta^*}$, respectively, we employed the Parzen kernel function and the bandwidth parameter value $K = 1$.[2] The columns under $c = 0$ report results when the exact unit root assumption holds. In accordance with the analytical results of the previous section, the size behavior of the two tests is good in this case.

Table 1. Monte Carlo results with n = 50 and T = 250

| $\Omega_{u\nu}$ | $c=0$: $t_{\hat\beta^+}$ | $c=0$: $t_{\hat\beta^*}$ | $c=-5$: $t_{\hat\beta^+}$ | $c=-5$: $t_{\hat\beta^*}$ | $c=-10$: $t_{\hat\beta^+}$ | $c=-10$: $t_{\hat\beta^*}$ |
|---|---|---|---|---|---|---|
| 0 | 5.20 | 4.70 | 42.10 | 5.00 | 52.30 | 4.20 |
| 0.2 | 5.30 | 4.40 | 89.80 | 4.30 | 99.60 | 5.40 |
| 0.4 | 6.60 | 6.80 | 100.0 | 4.90 | 100.0 | 4.90 |
| 0.6 | 4.30 | 4.50 | 100.0 | 4.00 | 100.0 | 5.60 |
| 0.8 | 4.30 | 4.50 | 100.0 | 5.80 | 100.0 | 4.50 |

Notes: The columns under $t_{\hat\beta^+}$ and $t_{\hat\beta^*}$ report Monte Carlo rejection rates (in %) of the respective t-tests computed by employing long-run covariance estimates obtained with a Parzen kernel function and a bandwidth parameter value $K = 1$. A nominal 5% asymptotic level was applied. In each replication, the data were obtained by using equations (1) and (2) with $\beta = 1$ and $\rho_i = \rho = 1 + c/T$ in (1) and (2), respectively, zero initial values, and with the errors $\varepsilon_{i,t} = (u_{i,t}, \nu_{i,t})'$ generated by the equation $\varepsilon_{i,t} = \mathrm{chol}(C)\,\eta_{i,t}$, where $\eta_{i,t} \sim \mathrm{nid}(0, I_2)$ across $i = 1, \ldots, n$ and over $t = 1, \ldots, T$, and $\mathrm{chol}(C)$ is the Cholesky decomposition of the matrix $C = [C_{ij}]$ with $C_{11} = C_{22} = 1$ and $C_{12} = C_{21} = \Omega_{u\nu}$. Results are based on 1000 replications.

The columns under $c = -5$ and $c = -10$ give rejection rates when the roots of the regressors are only nearly one. As predicted by Corollary 8, the $t_{\hat\beta^+}$-test is now very sensitive to deviations from exact unit roots and suffers from severe size distortions for all values of $\Omega_{u\nu}$. Notice that even when $\Omega_{u\nu} = 0$ the $t_{\hat\beta^+}$-test rejects far in excess of the desired 5% nominal level, as predicted by the considerations of the previous section. In contrast, as predicted by Corollary 7, the bias corrected t-test $t_{\hat\beta^*}$ maintains the desired size level well across the different values of $\Omega_{u\nu}$.

Table 2 reports test results computed in the same way as those of Table 1, except that $n$ and $T$ are now set to 25 and 100, respectively. As is apparent, the results do not change much from those of Table 1. This indicates that our asymptotic results can provide fairly accurate approximations for sample sizes that are typical in empirical applications.

Table 2. Monte Carlo results with n = 25 and T = 100

| $\Omega_{u\nu}$ | $c=0$: $t_{\hat\beta^+}$ | $c=0$: $t_{\hat\beta^*}$ | $c=-5$: $t_{\hat\beta^+}$ | $c=-5$: $t_{\hat\beta^*}$ | $c=-10$: $t_{\hat\beta^+}$ | $c=-10$: $t_{\hat\beta^*}$ |
|---|---|---|---|---|---|---|
| 0 | 6.80 | 6.20 | 37.40 | 5.50 | 52.30 | 5.50 |
| 0.2 | 6.60 | 6.10 | 74.60 | 4.00 | 96.50 | 6.60 |
| 0.4 | 6.00 | 5.10 | 99.20 | 5.10 | 100.0 | 6.20 |
| 0.6 | 5.40 | 4.90 | 100.0 | 6.20 | 100.0 | 5.80 |
| 0.8 | 5.30 | 5.80 | 100.0 | 5.60 | 100.0 | 5.00 |

Notes: See the notes of Table 1.

Table 3 examines the performance of the bias corrected t-test when the individual localizing coefficients in the generating mechanisms of the regressors vary across panel members. The heterogeneity across panel members was obtained by generating data otherwise as in Tables 1 and 2, except that the individual specific localizing coefficients $c_i$ were drawn from a uniform distribution on the interval $[c, 0]$. For example, the column denoted by "(n = 25, T = 100)" and "$c = -10$" reports simulation results based on an experiment where the autoregressive coefficients $\rho_i$ $(= 1 + c_i/T)$ vary uniformly across panel members within the range $[0.9, 1]$. A comparison of the results of Table 3 to those of Tables 1 and 2 clearly indicates that the bias corrected t-test behaves equally well whether the $x_{i,t}$ have homogeneous or heterogeneous localizing coefficients.

Table 3. Monte Carlo results on the bias corrected test when localizing coefficients are heterogeneous

| $\Omega_{u\nu}$ | (n = 50, T = 250): $c=-5$ | (n = 50, T = 250): $c=-10$ | (n = 25, T = 100): $c=-5$ | (n = 25, T = 100): $c=-10$ |
|---|---|---|---|---|
| 0 | 4.82 | 5.10 | 5.00 | 5.18 |
| 0.2 | 5.80 | 5.06 | 6.20 | 5.00 |
| 0.4 | 4.96 | 4.62 | 5.12 | 5.34 |
| 0.6 | 5.42 | 5.06 | 5.98 | 5.46 |
| 0.8 | 5.18 | 5.18 | 5.44 | 5.92 |

Notes: The table reports Monte Carlo rejection rates (in %) of the $t_{\hat\beta^*}$-test computed in the same way as in Tables 1 and 2. The data were generated as in Tables 1 and 2, except that in each replication the individual specific localizing coefficients $c_i$ $(i = 1, \ldots, n)$ were drawn from a uniform distribution on the interval $[c, 0]$. The applied values of $c$ are given at the top of each column. Results are based on 5000 replications.

In view of the above simulation experiments, we may conclude that near unit roots indeed result in severe size distortions in hypothesis tests based on the PFM estimator. On the other hand, the results are fairly promising with regard to the new bias corrected test, which was able to maintain good size behavior in all the experiments performed. However, it should be pointed out that our simulation setup here is rather simple, and some problems are likely to arise in more complicated models. For example, if the data generating mechanism obeys more general short-run dynamics than those considered here, then the non-parametric corrections can be expected to be subject to somewhat larger (finite sample) estimation errors, which may weaken the performance of the bias corrected test. Furthermore, an additional source of estimation error arises when the non-parametric estimators use estimated values in place of the true values of the errors.
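The data-generating process of the experiments can be sketched as follows. This is an illustrative reconstruction from the design described above ($\beta = 1$, $\rho_i = 1 + c_i/T$, zero initial values, unit error variances, contemporaneous correlation $\Omega_{u\nu}$), not the author's simulation code; the function name and argument layout are hypothetical. Because the errors are iid in this design, $\Delta_{u\nu} = \Omega_{u\nu}$, which is consistent with Note 2's observation that the bandwidth choice matters little here.

```python
import math
import random


def simulate_panel(n, T, c, omega_uv, beta=1.0, rng=None):
    """Draw one artificial panel from (1)-(2) following the simulation design above.

    Illustrative sketch: eps_{i,t} = chol(C) eta_{i,t} with eta_{i,t} ~ nid(0, I_2)
    and C = [[1, omega_uv], [omega_uv, 1]]; rho_i = 1 + c_i/T; zero initial values.
    `c` is either a common scalar or a list of n individual c_i.
    Returns x, y as lists of n lists of length T.
    """
    if rng is None:
        rng = random.Random()
    cs = list(c) if isinstance(c, (list, tuple)) else [c] * n
    if len(cs) != n:
        raise ValueError("need one localizing coefficient per panel member")
    # Lower-triangular Cholesky factor L of C, so that L L' = C.
    l21, l22 = omega_uv, math.sqrt(1.0 - omega_uv ** 2)
    x, y = [], []
    for ci in cs:
        rho = 1.0 + ci / T
        xi, yi, x_prev = [], [], 0.0
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u = e1                     # first row of L is (1, 0)
            v = l21 * e1 + l22 * e2    # second row of L is (omega_uv, sqrt(1 - omega_uv^2))
            x_t = rho * x_prev + v     # equation (2)
            xi.append(x_t)
            yi.append(beta * x_t + u)  # equation (1)
            x_prev = x_t
        x.append(xi)
        y.append(yi)
    return x, y


# Heterogeneous design of Table 3: c_i ~ U[c, 0] with c = -10, n = 25, T = 100.
rng = random.Random(0)
c_i = [rng.uniform(-10.0, 0.0) for _ in range(25)]
x, y = simulate_panel(25, 100, c_i, omega_uv=0.4, rng=rng)
```

Feeding such panels to estimators like (12) and (15) over many replications, and counting rejections at 1.96, is the general shape of the experiments summarized in Tables 1 to 3.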
IV. CONCLUDING REMARKS

This chapter developed new panel data limit theory that can be used to obtain convergence in probability and in distribution when there is heterogeneity across panel members and the cross-sectional and time series dimensions of the data tend to infinity simultaneously. The new theory was applied to study the asymptotics of a panel regression in which the regressors were generated by a local to unit root process with cross-sectionally heterogeneous localizing coefficients. The application demonstrated that a serial correlation corrected pooled panel OLS estimator yields $\sqrt{n}T$-consistent and asymptotically normal estimates that are centered on the true parameter value irrespective of whether the regressors are nearly or exactly integrated. While this desirable result holds only in the special case without deterministic effects, our asymptotic analysis also indicated that the panel fully modified estimator is subject to asymptotic biases even in this simple case if the regressors are nearly rather than exactly integrated. Therefore, much care should be taken in interpreting results obtained by recent panel cointegration methods that assume exact unit roots when near unit roots are equally plausible.
NOTES

1. This is proved by Phillips & Moon (1999a, Theorem 8) when $c_i = 0$ for all $i$. Furthermore, a similar result can be proved in the case where the $c_i$ are non-zero by following the lines given in the proof of Theorem 5 of this chapter.
2. In empirical applications a bandwidth parameter value $K = 1$ is hardly realistic. However, in the present simulation setup the actual value of $K$ does not play an important role, because we use iid errors in the simulations. For example, in all of the reported cases, essentially similar results were obtained by using the bandwidth parameter value $K = 4$.
ACKNOWLEDGMENTS

I would like to thank the two referees for their useful comments and suggestions. This paper was completed while the author worked at the Research Department of the Bank of Finland, whose hospitality is gratefully acknowledged. This paper is a part of the research program of the Research Unit on Economic Structures and Growth (RUESG) at the Department of Economics at the University of Helsinki. Financial support from the Yrjö Jahnsson Foundation is appreciated. The usual disclaimer applies.
REFERENCES

Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley.
Davidson, J. (1994). Stochastic Limit Theory. Oxford: Oxford University Press.
Elliott, G. (1998). On the Robustness of Cointegration Methods When Regressors Almost Have Unit Roots. Econometrica, 66(1), 149–158.
Kauppi, H. (1999). Essays on Econometrics of Cointegration. Research Reports Nro 84, Dissertationes Oeconomicae, Department of Economics, University of Helsinki.
Moon, H., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using Panel Data. Cowles Foundation Discussion Paper No. 1224, Yale University (http://cowles.econ.yale.edu/).
Phillips, P. C. B. (1987). Towards a Unified Asymptotic Theory for Autoregression. Biometrika, 74(3), 535–547.
Phillips, P. C. B. (1988). Regression Theory for Near-Integrated Time Series. Econometrica, 56(5), 1021–1043.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Non-stationary Panel Data. Econometrica, 67(5), 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Non-stationary Panel Data Analysis: An Overview of Some Recent Developments. Cowles Foundation Discussion Paper No. 1221, Yale University (http://cowles.econ.yale.edu/).
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. The Annals of Statistics, 20(2), 971–1001.
Stock, J. H. (1997). Cointegration, Long-run Comovements, and Long Horizon Forecasting. In: D. Kreps & K. F. Wallis (Eds), Advances in Econometrics: Proceedings of the Seventh World Congress of the Econometric Society. Cambridge: Cambridge University Press.
Stout, W. F. (1974). Almost Sure Convergence. New York: Academic Press.
White, H. (1984). Asymptotic Theory for Econometricians. San Diego, California: Academic Press.
APPENDIX

APPENDIX A: PROOF OF THEOREM 2

From the conditions of the theorem we know that $X_{n,T} = \frac{1}{n}\sum_{i=1}^{n}Y_{i,T} \Rightarrow X_n = \frac{1}{n}\sum_{i=1}^{n}Y_i$ as $T \to \infty$ for all fixed $n$. Since $\sup_T E\|Y_{i,T}\|^{1+\delta} \le M < \infty$ for all $i$, and because $Y_{i,T} \Rightarrow Y_i$ implies $\|Y_{i,T}\|^{1+\delta} \Rightarrow \|Y_i\|^{1+\delta}$ by the continuous mapping theorem, we also have $E\|Y_i\|^{1+\delta} \le M < \infty$ by Theorem 5.3 of Billingsley (1968) (see also the discussion on p. 33 of Billingsley (1968)). By arguments given in the proof of Theorem 1 of Phillips & Moon (1999a), we can justify that the $Y_i$ are independent across $i$, since the $Y_{i,T}$ are independent across $i$ for all $T$. Given this and the fact that $E\|Y_i\|^{1+\delta} \le M < \infty$, we may apply Markov's law of large numbers to deduce $X_n \xrightarrow{p} X$ as $n \to \infty$ (e.g. White (1984, p. 33)). Furthermore, if we establish conditions (i) through (iv) of Theorem 1, then $X_{n,T} \xrightarrow{p} X$ as $(T,n\to\infty)$. First, condition (i) holds, since

$$\frac{1}{n}\sum_{i=1}^{n}E\|Y_{i,T}\| \le \frac{1}{n}\sum_{i=1}^{n}\sup_T E\|Y_{i,T}\| \le \bar M < \infty,$$

where the last two inequalities follow from condition (b) of the theorem. Also, condition (ii) holds, since

$$\frac{1}{n}\sum_{i=1}^{n}\|E(Y_{i,T}) - E(Y_i)\| \le \sup_i\|E(Y_{i,T}) - E(Y_i)\| \to 0, \quad \text{as } T \to \infty,$$

by condition (a). For condition (iii) we use the fact that

$$E\|Y_{i,T}\|1\{\|Y_{i,T}\| > n\varepsilon\} \le \frac{1}{(n\varepsilon)^{\delta}}\sup_T E\|Y_{i,T}\|^{1+\delta} \le \frac{M}{(n\varepsilon)^{\delta}}$$

for all $i$, where the first inequality follows from arguments given by Billingsley (1968, p. 32) and the second inequality holds by condition (b). Now, for any $\varepsilon > 0$,

$$\frac{1}{n}\sum_{i=1}^{n}E\|Y_{i,T}\|1\{\|Y_{i,T}\| > n\varepsilon\} \le \frac{M}{(n\varepsilon)^{\delta}},$$

and therefore condition (iii) follows. Condition (iv) holds by the same argument, as we notice that now $E\|Y_i\|1\{\|Y_i\| > n\varepsilon\} \le (n\varepsilon)^{-\delta}E\|Y_i\|^{1+\delta} \le M/(n\varepsilon)^{\delta}$.
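APPENDIX B: PROOF OF THEOREM 3

Let $s_{n,T}^2 = \sum_{i=1}^{n}V_{i,T}$ and define $\zeta_{i,n,T} = Y_{i,T}/s_{n,T}$. Then

$$\sum_{i=1}^{n}\zeta_{i,n,T} \Rightarrow N(0,1), \quad \text{as } (T,n\to\infty), \tag{22}$$

by Theorem 2 of Phillips & Moon (1999a), if the Lindeberg condition

$$\lim_{n,T\to\infty}\sum_{i=1}^{n}E\left[\zeta_{i,n,T}^2\,1\{|\zeta_{i,n,T}| > \varepsilon\}\right] = 0, \quad \forall\varepsilon > 0,$$

holds. Given condition (i), (22) implies $n^{-1/2}\sum_{i=1}^{n}Y_{i,T} \Rightarrow N(0,V)$ as $(T,n\to\infty)$. It remains to verify the above Lindeberg condition. We have, for given $\varepsilon > 0$,

$$\begin{aligned}
\sum_{i=1}^{n}E\left[\zeta_{i,n,T}^2\,1\{\zeta_{i,n,T}^2 > \varepsilon\}\right] &= \sum_{i=1}^{n}E\left[\frac{Y_{i,T}^2}{s_{n,T}^2}\,1\left\{\frac{Y_{i,T}^2}{s_{n,T}^2} > \varepsilon\right\}\right]\\
&= \frac{n}{s_{n,T}^2}\,\frac{1}{n}\sum_{i=1}^{n}E\left[Y_{i,T}^2\,1\left\{Y_{i,T}^2 > \varepsilon\frac{s_{n,T}^2}{n}n\right\}\right]\\
&\le \frac{n}{s_{n,T}^2}\,\frac{1}{n}\sum_{i=1}^{n}\sup_T E\left[Y_{i,T}^2\,1\left\{Y_{i,T}^2 > \varepsilon\frac{s_{n,T}^2}{n}n\right\}\right].
\end{aligned} \tag{23}$$

By condition (ii) we can always find $\delta > 0$ such that $\sup_T E|Y_{i,T}^2|^{1+\delta} \le N < \infty$ for all $i$. Given this, we obtain

$$\sup_T E\left[Y_{i,T}^2\,1\left\{Y_{i,T}^2 > \varepsilon\frac{s_{n,T}^2}{n}n\right\}\right] \le \frac{N}{\left(\varepsilon\frac{s_{n,T}^2}{n}n\right)^{\delta}}, \tag{24}$$

for all $i$ (cf. Billingsley (1968, p. 32)). In view of (23) and (24), and given that condition (i) implies $\lim_{n,T\to\infty}n/s_{n,T}^2 = 1/V < \infty$ $(V > 0)$ and $\lim_{n,T\to\infty}s_{n,T}^2/n = V < \infty$, we may now conclude that

$$\lim_{n,T\to\infty}\sum_{i=1}^{n}E\left[\zeta_{i,n,T}^2\,1\{\zeta_{i,n,T}^2 > \varepsilon\}\right] = 0,$$

so that the Lindeberg condition follows.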
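APPENDIX C: PROOF OF THEOREM 4

We start by giving some intermediate results that we will use repeatedly in the main part of the proof given below. First, just as in Phillips & Moon (1999a, Lemma 2), based on Phillips and Solo (1992), we decompose $\varepsilon_{i,t}$ as

$$\varepsilon_{i,t} = C\eta_{i,t} + \tilde\varepsilon_{i,t-1} - \tilde\varepsilon_{i,t}, \tag{25}$$

where $C = C(1) = \sum_{k=0}^{\infty}C_k$ and $\tilde\varepsilon_{i,t} = \sum_{j=0}^{\infty}\tilde C_j\eta_{i,t-j}$ with $\tilde C_j = \sum_{k=j+1}^{\infty}C_k$. Under Assumption 1(a), $C$ is finite and $\sum_{j=0}^{\infty}j^2\|\tilde C_j\|^2 = \sum_{j=0}^{\infty}j^2\left\|\sum_{s=j+1}^{\infty}C_s\right\|^2 < \infty$ (see Phillips & Moon (1999a, p. 1083)). It follows that

$$E\|\tilde\varepsilon_{i,t}\|^2 \le M < \infty. \tag{26}$$

We partition $C = [C_{ab}]$, $(a,b = \xi, w)$, so that the long-run covariance matrix is

$$\Omega = CC' = \begin{bmatrix} C_{\xi\xi}^2 + C_{\xi w}^2 & C_{\xi\xi}C_{w\xi} + C_{\xi w}C_{ww}\\ C_{\xi\xi}C_{w\xi} + C_{\xi w}C_{ww} & C_{w\xi}^2 + C_{ww}^2 \end{bmatrix} = \begin{bmatrix}\Omega_{uu} & \Omega_{u\nu}\\ \Omega_{u\nu} & \Omega_{\nu\nu}\end{bmatrix}. \tag{27}$$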
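The decomposition (25) is purely algebraic, so it can be checked numerically. The sketch below does so for a scalar moving average of finite order, a simplification of the matrix case assumed only for illustration, with pre-sample innovations set to zero; `bn_check` is a hypothetical helper.

```python
import random


def bn_check(coeffs, T, seed=0):
    """Check the Beveridge-Nelson identity (25) for a scalar finite-order MA process.

    eps_t = sum_k C_k eta_{t-k};  C = C(1) = sum_k C_k;
    tilde C_j = sum_{k > j} C_k;  tilde eps_t = sum_j tilde C_j eta_{t-j}.
    Pre-sample innovations are set to zero. Returns the largest absolute residual
    of eps_t - (C eta_t + tilde eps_{t-1} - tilde eps_t) over t = 0, ..., T-1.
    """
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, 1.0) for _ in range(T)]
    q = len(coeffs)
    c1 = sum(coeffs)
    c_tilde = [sum(coeffs[j + 1:]) for j in range(q)]

    def lagged(t):
        return eta[t] if t >= 0 else 0.0

    def eps(t):
        return sum(coeffs[k] * lagged(t - k) for k in range(q))

    def eps_tilde(t):
        return sum(c_tilde[j] * lagged(t - j) for j in range(q))

    return max(abs(eps(t) - (c1 * lagged(t) + eps_tilde(t - 1) - eps_tilde(t)))
               for t in range(T))


print(bn_check([1.0, 0.5, 0.25], 200))  # zero up to floating-point rounding
```

The residual is zero up to rounding because $C(L) = C(1) - (1 - L)\tilde C(L)$ holds coefficient by coefficient; the moment bound (26) is what keeps $\tilde\varepsilon_{i,t}$ well behaved when the order is infinite.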
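For subsequent reference, note that the components of $\varepsilon_{i,t} = (u_{i,t}, \nu_{i,t})'$ in (25) may be written as

$$u_{i,t} = C_{\xi\xi}\xi_{i,t} + C_{\xi w}w_{i,t} + \tilde u_{i,t-1} - \tilde u_{i,t}, \tag{28}$$

$$\nu_{i,t} = C_{w\xi}\xi_{i,t} + C_{ww}w_{i,t} + \tilde\nu_{i,t-1} - \tilde\nu_{i,t}, \tag{29}$$

where $\tilde u_{i,t}$ and $\tilde\nu_{i,t}$ are the two components of $\tilde\varepsilon_{i,t}$. Next, by equation (2),

$$x_{i,t} = \sum_{s=1}^{t}e^{((t-s)/T)c_i}\nu_{i,s} + e^{(t/T)c_i}x_{i,0},$$

and using (29) we can write this as

$$x_{i,t} = C_{w\xi}f(\xi)_{i,t} + C_{ww}f(w)_{i,t} + R(x)_{i,t}, \tag{30}$$

where we have used the notation

$$f(a)_{i,t} = \sum_{s=1}^{t}e^{((t-s)/T)c_i}a_{i,s}, \quad a = \xi, w, \tag{31}$$

and

$$R(x)_{i,t} = e^{((t-1)/T)c_i}\tilde\nu_{i,0} + \left(1 - e^{c_i/T}\right)\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\tilde\nu_{i,s} - \tilde\nu_{i,t} + e^{(t/T)c_i}x_{i,0}. \tag{32}$$

For later analysis it is useful to have the following two moment bounds. First,

$$\sup_i\sup_{1\le t\le T}E\left(\frac{f(a)_{i,t}}{\sqrt{T}}\right)^2 \le \sup_{1\le t\le T}\frac{1}{T}\sum_{s=1}^{t}e^{((t-s)/T)2\sup_i|c_i|} \le M < \infty, \tag{33}$$

since $e^{((t-s)/T)2\sup_i|c_i|} \le M < \infty$ (recall that $\sup_i|c_i| \le \bar c < \infty$). Second, using the inequality $E\left|\sum_{i=1}^{m}X_i\right|^2 \le m\sum_{i=1}^{m}E|X_i|^2$ (e.g. Davidson (1994, p. 140)) and the fact that the $\tilde\nu_{i,t}$ are iid across $i$, we obtain

$$\begin{aligned}
\sup_i\sup_{1\le t\le T}E\left(R(x)_{i,t}^2\right) \le{}& 4\sup_{1\le t\le T}e^{((t-1)/T)2\sup_i|c_i|}E(\tilde\nu_{i,0}^2) + 4\sup_{1\le t\le T}E(\tilde\nu_{i,t}^2) + 4\Big(\sup_{1\le t\le T}e^{(t/T)2\sup_i|c_i|}\Big)E(x_{i,0}^2)\\
&+ 4T^2\left(1 - e^{\sup_i|c_i|/T}\right)^2\sup_{1\le t\le T}\frac{1}{T^2}\sum_{k=1}^{t-1}\sum_{s=1}^{t-1}e^{((2t-2-k-s)/T)2\sup_i|c_i|}E|\tilde\nu_{i,s}\tilde\nu_{i,k}|\\
\le{}& M < \infty.
\end{aligned} \tag{34}$$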
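To see that (34) holds, note that $\sup_{1\le t\le T}e^{(t/T)2\sup_i|c_i|} \le e^{2\sup_i|c_i|}$, $E(\tilde\nu_{i,t}^2) \le M$ by (26), $E(x_{i,0}^2) \le M$ (by the initial value condition), $T^2(1 - e^{\sup_i|c_i|/T})^2 = O(1)$, and by the Cauchy-Schwarz inequality $E|\tilde\nu_{i,k}\tilde\nu_{i,s}| \le \sqrt{E(\tilde\nu_{i,k}^2)E(\tilde\nu_{i,s}^2)} \le M$, where the latter inequality follows again from (26).

We now give the completing steps of the proof of Theorem 4. Write

$$\sqrt{n}\,T(\hat\beta^* - \beta) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\left(x_{i,t}u_{i,t} - \Delta_{u\nu}\right) - \sqrt{n}(\hat\Delta_{u\nu} - \Delta_{u\nu})}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}x_{i,t}^2},$$

where $\sqrt{n}(\hat\Delta_{u\nu} - \Delta_{u\nu}) = o_p(1)$ as $(n,T\to\infty)$ with $n/T \to 0$ (recall Remark 1). It suffices to show that

$$\frac{1}{\sqrt{n}\,T}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{i,t}u_{i,t} - \Delta_{u\nu}\right) \Rightarrow N(0, \Omega_{uu}\Omega_{\nu\nu}\theta_{xx}), \quad \text{as } (T,n\to\infty) \text{ with } n/T \to 0, \tag{35}$$

and

$$\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2 \xrightarrow{p} \Omega_{\nu\nu}\theta_{xx}, \quad \text{as } (T,n\to\infty). \tag{36}$$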
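To prove (36), use (30) to write
$$\begin{aligned}
\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2 ={}& C_{w\varepsilon}^2\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}f(\varepsilon)_{i,t}^2 + C_{ww}^2\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}f(w)_{i,t}^2\\
&+ 2C_{w\varepsilon}C_{ww}\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}f(\varepsilon)_{i,t}f(w)_{i,t} + 2C_{w\varepsilon}\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}f(\varepsilon)_{i,t}R(x)_{i,t}\\
&+ 2C_{ww}\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}f(w)_{i,t}R(x)_{i,t} + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}R^2(x)_{i,t}\\
={}& C_{w\varepsilon}^2 I_{b1} + C_{ww}^2 I_{b2} + 2C_{w\varepsilon}C_{ww}II_{b1} + 2C_{w\varepsilon}II_{b2} + 2C_{ww}II_{b3} + II_{b4}, \text{ say.}
\end{aligned}$$
We now show that $C_{w\varepsilon}^2 I_{b1} + C_{ww}^2 I_{b2} \xrightarrow{p} \theta_{xx}$ and $II_{b1}, II_{b2}, II_{b3}, II_{b4} \xrightarrow{p} 0$ as $(T, n\to\infty)$, so that (36) follows. Write
$$I_{b1} = \frac{1}{n}\sum_{i=1}^{n}Y_{i,T}, \tag{37}$$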
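where $Y_{i,T} = \frac{1}{T^2}\sum_{t=1}^{T}f(\varepsilon)_{i,t}^2$. For an application of Theorem 2, observe that the $Y_{i,T}$ are independent across $i$ for all $T$ and that, as $T\to\infty$, $Y_{i,T} \Rightarrow Y_i = \int_0^1 J_{c_i}(r)^2\,dr$. We know that $E(Y_i) = \int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr$, and by assumption the limit
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr = \theta_{xx}$$
exists. Therefore, if conditions (i) and (ii) of Theorem 2 hold,
$$\frac{1}{n}\sum_{i=1}^{n}Y_{i,T} \xrightarrow{p} \theta_{xx} \quad \text{as } (T, n\to\infty).$$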
For verifying condition (i), let $p = 1+\delta$ and use the definition of $Y_{i,T}$ in (37) to obtain
$$\left(E|Y_{i,T}|^p\right)^{1/p} = \frac{1}{T^2}\left(E\Big(\sum_{t=1}^{T}f(\varepsilon)_{i,t}^2\Big)^p\right)^{1/p} \le \frac{1}{T^2}\sum_{t=1}^{T}\left(E\Big|\sum_{s=1}^{t}e^{((t-s)/T)c_i}\varepsilon_{i,s}\Big|^{2p}\right)^{1/p}, \tag{38}$$
where the inequality follows from Minkowski's inequality and the definition of $f(\varepsilon)_{i,t}$ in (31). Now, the $e^{((t-s)/T)c_i}\varepsilon_{i,s}$, $1\le s\le t\le T$, are independent random variables with zero means. Moreover, $E|e^{((t-s)/T)c_i}\varepsilon_{i,s}|^{2p} \le (e^{\sup_i|c_i|})^{2+2\delta}E|\varepsilon_{i,s}|^{2+2\delta} \le M$ for some $M<\infty$ and some $\delta>0$. Therefore, we may apply Theorem 3.7.8 of Stout (1974, p. 213) to obtain
$$E\Big|\sum_{s=1}^{t}e^{((t-s)/T)c_i}\varepsilon_{i,s}\Big|^{2p} \le Mt^{p}, \tag{39}$$
where $M$ is finite and independent of $i$. Inserting (39) into (38) and raising to the power $p = 1+\delta$, it is easy to see that $E|Y_{i,T}|^{1+\delta} \le M$, so that condition (i) of Theorem 2 follows. For condition (ii) of Theorem 2 it suffices to note that the supremum of the absolute difference between
$$E(Y_{i,T}) = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{q=1}^{t}e^{((t-q)/T)2c_i} \quad\text{and}\quad E(Y_i) = \int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr$$
tends to zero uniformly in $i$ as $T\to\infty$. This follows since $\sup_i|c_i| \le \bar c < \infty$; for details see Kauppi (1999, pp. 135-136).
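The limit in (36) is an average of the double integral $\int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr$. This integral has the elementary closed form $(e^{2c}-1-2c)/(4c^2)$, with limit $1/2$ as $c\to 0$.

The sketch below, with hypothetical helper names, evaluates three things:
- that closed form;
- the finite-$T$ mean $E(Y_{i,T})$ appearing in condition (ii), assuming unit-variance innovations;
- an average over a user-supplied list of $c_i$ values, in the spirit of $\theta_{xx}$.

It is only meant to make the uniform convergence in condition (ii) tangible.

```python
import math


def double_integral(c: float) -> float:
    """Closed form of int_0^1 int_0^r exp(2c(r - s)) ds dr."""
    if abs(c) < 1e-8:
        return 0.5  # limiting value as c -> 0
    # expm1 avoids cancellation in exp(2c) - 1 for small |c|
    return (math.expm1(2.0 * c) - 2.0 * c) / (4.0 * c * c)


def finite_T_mean(c: float, T: int) -> float:
    """E(Y_{i,T}) = T^{-2} sum_{t=1}^T sum_{q=1}^t exp(((t - q)/T) 2c), unit-variance innovations."""
    ratio = math.exp(2.0 * c / T)
    total = 0.0
    for t in range(1, T + 1):
        # inner sum over lags k = t - q = 0, ..., t-1 is a geometric series
        if abs(ratio - 1.0) < 1e-15:
            total += t
        else:
            total += (ratio**t - 1.0) / (ratio - 1.0)
    return total / T**2


def theta_xx(cs) -> float:
    """Cross-section average of the double integral over a list of c_i values."""
    return sum(double_integral(c) for c in cs) / len(cs)


if __name__ == "__main__":
    for T in (50, 500, 5000):
        print(T, finite_T_mean(-1.0, T), double_integral(-1.0))
```

The printed finite-$T$ means should approach the closed-form value as $T$ grows, with an error of order $1/T$.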
Obviously the above analysis remains the same if we replace $\varepsilon_{i,t}$ in the definition of $Y_{i,T}$ in (37) with $w_{i,t}$, implying that $I_{b2}$ has the same limit as $I_{b1}$. Combining this with the expression for $C_{w\varepsilon}^2 + C_{ww}^2$ in (27), we therefore see that $C_{w\varepsilon}^2 I_{b1} + C_{ww}^2 I_{b2}$ converges in probability to $\theta_{xx}$, as desired.

We turn to prove that $II_{b1}, II_{b2}, II_{b3}, II_{b4} \xrightarrow{p} 0$ as $(T,n\to\infty)$. We do so by showing that $E(II_{b1})^2$, $E|II_{b2}|$, $E|II_{b3}|$ and $E|II_{b4}|$ all tend to zero as $(T,n\to\infty)$.

First, by the inequality $E|\sum_{i=1}^m X_i|^2 \le m\sum_{i=1}^m E|X_i|^2$ (e.g. Davidson, 1994, p. 140) and condition (b) of Assumption 1, we have
$$E(II_{b1})^2 \le \frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}E\Big(\frac{f(\varepsilon)_{i,t}}{\sqrt T}\Big)^2E\Big(\frac{f(w)_{i,t}}{\sqrt T}\Big)^2 = O\Big(\frac{1}{n}\Big),$$
where the equality follows from (33).

Second, the triangle and Cauchy-Schwarz inequalities show that
$$E\Big|\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}f(a)_{i,t}R(x)_{i,t}\Big| \le \frac{1}{\sqrt T}\,\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\sqrt{E\Big(\frac{f(a)_{i,t}}{\sqrt T}\Big)^2E|R(x)_{i,t}|^2} = O\Big(\frac{1}{\sqrt T}\Big),$$
where the equality follows from (33) and (34). Hence, $E|II_{b2}|, E|II_{b3}| \to 0$ as $(T,n\to\infty)$. Similar calculations for $II_{b4}$ show that $E|II_{b4}| \to 0$ as $(T,n\to\infty)$. This completes the proof of (36).

We turn to prove the result in (35). First, use (28) through (30) to write
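$$\begin{aligned}
\frac{1}{\sqrt n\,T}\sum_{i=1}^{n}\sum_{t=1}^{T}(x_{i,t}u_{i,t} - \Delta_{\eta u}) ={}& \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big[\big(C_{w\varepsilon}f(\varepsilon)_{i,t} + C_{ww}f(w)_{i,t}\big)\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big) - \Omega_{\eta u}\Big]\\
&+ \frac{1}{\sqrt n}\sum_{i=1}^{n}\Big\{\frac{1}{T}\sum_{t=1}^{T}\Big[x_{i,t}(\tilde u_{i,t-1} - \tilde u_{i,t}) + R(x)_{i,t}\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big)\Big] + \Omega_{\eta u} - \Delta_{\eta u}\Big\}\\
={}& I_a + II_a, \text{ say.}
\end{aligned}\tag{40}$$
Note that $f(a)_{i,1} = a_{i,1}$ and $f(a)_{i,t} = e^{c_i/T}f(a)_{i,t-1} + a_{i,t}$ for $t\ge 2$ (with $a_{i,t} = \varepsilon_{i,t}, w_{i,t}$), so that we may write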
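$$\begin{aligned}
I_a ={}& \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=2}^{T}\big(C_{w\varepsilon}f(\varepsilon)_{i,t-1} + C_{ww}f(w)_{i,t-1}\big)\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big)\\
&+ \frac{1}{\sqrt n}\sum_{i=1}^{n}(e^{c_i/T}-1)\frac{1}{T}\sum_{t=2}^{T}\big(C_{w\varepsilon}f(\varepsilon)_{i,t-1} + C_{ww}f(w)_{i,t-1}\big)\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big)\\
&+ \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big[\big(C_{w\varepsilon}\varepsilon_{i,t} + C_{ww}w_{i,t}\big)\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big) - \Omega_{\eta u}\Big]\\
={}& I_{a1} + I_{a2} + I_{a3}, \text{ say.}
\end{aligned}$$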
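To consider the asymptotic properties of $I_{a1}$, write $I_{a1} = \frac{1}{\sqrt n}\sum_{i=1}^{n}Y_{i,T}$, where
$$Y_{i,T} = \frac{1}{T}\sum_{t=2}^{T}\big[C_{w\varepsilon}C_{\varepsilon\varepsilon}f(\varepsilon)_{i,t-1}\varepsilon_{i,t} + C_{ww}C_{\varepsilon w}f(w)_{i,t-1}w_{i,t} + C_{w\varepsilon}C_{\varepsilon w}f(\varepsilon)_{i,t-1}w_{i,t} + C_{ww}C_{\varepsilon\varepsilon}f(w)_{i,t-1}\varepsilon_{i,t}\big]. \tag{41}$$
The summands in $Y_{i,T}$ are uncorrelated over $t$, and the four terms in the square brackets in (41) are mutually uncorrelated for all $t$. It follows that
$$\begin{aligned}
E(Y_{i,T}^2) ={}& \frac{1}{T^2}\sum_{t=2}^{T}\big[C_{w\varepsilon}^2C_{\varepsilon\varepsilon}^2E(f(\varepsilon)_{i,t-1}\varepsilon_{i,t})^2 + C_{ww}^2C_{\varepsilon w}^2E(f(w)_{i,t-1}w_{i,t})^2\\
&+ C_{w\varepsilon}^2C_{\varepsilon w}^2E(f(\varepsilon)_{i,t-1}w_{i,t})^2 + C_{ww}^2C_{\varepsilon\varepsilon}^2E(f(w)_{i,t-1}\varepsilon_{i,t})^2\big]\\
={}& \Omega_{uu}\frac{1}{T^2}\sum_{t=2}^{T}\sum_{s=1}^{t-1}e^{((t-1-s)/T)2c_i},
\end{aligned}\tag{42}$$
where the last equality uses (27) and the fact that $E(f(a)_{i,t-1}b_{i,t})^2 = \sum_{s=1}^{t-1}e^{((t-1-s)/T)2c_i}$ for $a, b = \varepsilon, w$.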
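A small Monte Carlo sketch can illustrate (42). It assumes the single-innovation special case in which only the $f(\varepsilon)_{i,t-1}\varepsilon_{i,t}$ term is present and its coefficient product equals one, so that the prefactor in (42) is one. The values of $c$, $T$, the number of replications and the seed are arbitrary, and the function names are hypothetical. Because $f(\varepsilon)_{i,t-1}\varepsilon_{i,t}$ is a martingale difference, the simulated $Y_{i,T}$ should have mean zero and a sample variance matching the double sum in (42) up to simulation noise.

```python
import numpy as np


def variance_formula(c: float, T: int) -> float:
    """(1/T^2) sum_{t=2}^T sum_{s=1}^{t-1} exp(((t-1-s)/T) 2c), i.e. (42) with unit prefactor."""
    total = 0.0
    for t in range(2, T + 1):
        k = np.arange(t - 1)  # k = t-1-s runs over 0, ..., t-2
        total += np.exp(2.0 * c * k / T).sum()
    return total / T**2


def simulate_Y(c: float, T: int, reps: int, rng) -> np.ndarray:
    """Draw reps copies of Y_T = (1/T) sum_{t=2}^T f(eps)_{t-1} eps_t with iid N(0,1) eps."""
    eps = rng.standard_normal((reps, T))
    rho = np.exp(c / T)
    f_prev = np.zeros(reps)  # holds f(eps)_{t-1}
    acc = np.zeros(reps)
    for t in range(T):
        if t >= 1:
            acc += f_prev * eps[:, t]          # martingale-difference term f_{t-1} eps_t
        f_prev = rho * f_prev + eps[:, t]      # recursion f_t = e^{c/T} f_{t-1} + eps_t
    return acc / T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = simulate_Y(-2.0, 100, 20000, rng)
    print(Y.mean(), Y.var(), variance_formula(-2.0, 100))
```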
Now we apply Theorem 3. First, note that the $Y_{i,T}$ in (41) are independent across $i$ for all $T$, with mean zero and variance $V_{i,T} = E(Y_{i,T}^2)$ given in (42). Let $V_i = \Omega_{uu}\int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr$ and write
$$\frac{1}{n}\sum_{i=1}^{n}V_{i,T} = \frac{1}{n}\sum_{i=1}^{n}V_i + \frac{1}{n}\sum_{i=1}^{n}(V_{i,T} - V_i). \tag{43}$$
Using the fact that $\sup_i|c_i| \le \bar c < \infty$, it is straightforward to show that the second term on the right-hand side of (43) tends to zero as $n, T\to\infty$ (see Kauppi (1999, pp. 135-136)). On the other hand, the first term in (43) has the positive and finite limit $\Omega_{uu}\theta_{xx}$. Thus, condition (i) of Theorem 3 holds with $V = \Omega_{uu}\theta_{xx}$.

For establishing condition (ii) of Theorem 3, recall the definition of $Y_{i,T}$ from (41) and let $p = 2+\delta$. Applying the inequality $E|\sum_{i=1}^m X_i|^p \le m^{p-1}\sum_{i=1}^m E|X_i|^p$ (e.g. Davidson (1994, p. 140)), we obtain
$$\begin{aligned}
E|Y_{i,T}|^p \le{}& M_{\varepsilon\varepsilon}E\Big|\frac{1}{T}\sum_{t=2}^{T}f(\varepsilon)_{i,t-1}\varepsilon_{i,t}\Big|^p + M_{ww}E\Big|\frac{1}{T}\sum_{t=2}^{T}f(w)_{i,t-1}w_{i,t}\Big|^p\\
&+ M_{\varepsilon w}E\Big|\frac{1}{T}\sum_{t=2}^{T}f(\varepsilon)_{i,t-1}w_{i,t}\Big|^p + M_{w\varepsilon}E\Big|\frac{1}{T}\sum_{t=2}^{T}f(w)_{i,t-1}\varepsilon_{i,t}\Big|^p,
\end{aligned}\tag{44}$$
where $M_{ab} = 4^{p-1}|C_{wa}C_{\varepsilon b}|^p \le M < \infty$ for $a, b = \varepsilon, w$. Furthermore, since the $\varepsilon_{i,t}$ are iid, we have
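$$E\Big|\frac{1}{\sqrt T}f(\varepsilon)_{i,t-1}\varepsilon_{i,t}\Big|^p = E\Big|\frac{1}{\sqrt T}\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\varepsilon_{i,s}\varepsilon_{i,t}\Big|^p = E|\varepsilon_{i,t}|^p\,E\Big|\frac{1}{\sqrt T}\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\varepsilon_{i,s}\Big|^p \le M\Big(\frac{t-1}{T}\Big)^{p/2} \le M < \infty, \tag{45}$$
because of the following bounds, valid for some $M<\infty$ and some $\delta>0$:
- $|e^{((t-1-s)/T)c_i}| \le e^{\sup_i|c_i|} \le M < \infty$;
- $E|\varepsilon_{i,t}|^{2+\delta} \le M < \infty$;
- $E|\sum_{s=1}^{t-1}\varepsilon_{i,s}|^{2+\delta} \le M(t-1)^{(2+\delta)/2}$.

The last bound follows from Theorem 3.7.8 of Stout (1974, p. 213); note that an iid sequence is also a martingale difference sequence.

Now, given (45) and the fact that the $f(\varepsilon)_{i,t-1}\varepsilon_{i,t}$ $(2\le t\le T)$ are martingale difference sequences for all $i$, we may apply Theorem 3.7.8 of Stout (1974, p. 213) once more, giving
$$E\Big|\frac{1}{T}\sum_{t=2}^{T}f(\varepsilon)_{i,t-1}\varepsilon_{i,t}\Big|^p = E\Big|\frac{1}{\sqrt T}\sum_{t=2}^{T}\frac{f(\varepsilon)_{i,t-1}\varepsilon_{i,t}}{\sqrt T}\Big|^p \le K\Big(\frac{T-1}{T}\Big)^{p/2} \le M < \infty \quad\text{for all } i.$$
The same arguments show that the other three expectations in (44) are similarly bounded. Therefore $\sup_T E|Y_{i,T}|^p = \sup_T E|Y_{i,T}|^{2+\delta} \le M < \infty$ for some $\delta>0$ and all $i$. Hence, the conditions of Theorem 3 hold, and we have shown that $I_{a1}$ converges weakly to the distribution given in (35) as $(T,n\to\infty)$.

Furthermore, since $\sup_i|e^{c_i/T}-1| = O(T^{-1})$, it follows immediately that $I_{a2}\xrightarrow{p} 0$ as $(T,n\to\infty)$. For $I_{a3}$, recall from (27) that $\Omega_{\eta u} = C_{\varepsilon\varepsilon}C_{w\varepsilon} + C_{\varepsilon w}C_{ww}$, so that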
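$$\begin{aligned}
I_{a3} ={}& \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big[\big(C_{w\varepsilon}\varepsilon_{i,t} + C_{ww}w_{i,t}\big)\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big) - \big(C_{\varepsilon\varepsilon}C_{w\varepsilon} + C_{\varepsilon w}C_{ww}\big)\Big]\\
={}& \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big[C_{\varepsilon\varepsilon}C_{w\varepsilon}(\varepsilon_{i,t}^2 - 1) + C_{ww}C_{\varepsilon w}(w_{i,t}^2 - 1) + \big(C_{w\varepsilon}C_{\varepsilon w} + C_{ww}C_{\varepsilon\varepsilon}\big)\varepsilon_{i,t}w_{i,t}\Big] \xrightarrow{p} 0
\end{aligned}$$
as $(T,n\to\infty)$. The probability limit follows because the summands in the square brackets are iid across both $i$ and $t$, with zero mean and finite second-order moment.

The remaining step in the proof of Theorem 4 is to show that $II_a$ in (40) is asymptotically negligible. First, as in the proof of Lemma 16 of Phillips & Moon (1999, p. 1105), we may decompose the one-sided long-run covariance matrix as
$$\Delta = \Sigma + \Lambda = \Omega + \sum_{k=0}^{\infty}\tilde C_kC_{k+1}' - C\tilde C_0'.$$
Using this in conjunction with the partition $\tilde C_j = [\tilde C_{ab,j}]$ $(a, b = \varepsilon, w)$, we may write
$$\begin{aligned}
II_a ={}& \frac{1}{\sqrt n}\sum_{i=1}^{n}\Big\{\frac{1}{T}\sum_{t=1}^{T}x_{i,t}(\tilde u_{i,t-1} - \tilde u_{i,t}) - \sum_{j=0}^{\infty}\big(C_{w\varepsilon,j+1}\tilde C_{\varepsilon\varepsilon,j} + C_{ww,j+1}\tilde C_{\varepsilon w,j}\big)\Big\}\\
&+ \frac{1}{\sqrt n}\sum_{i=1}^{n}\Big\{\frac{1}{T}\sum_{t=1}^{T}R(x)_{i,t}\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big) + \big(\tilde C_{w\varepsilon,0}C_{\varepsilon\varepsilon} + \tilde C_{ww,0}C_{\varepsilon w}\big)\Big\}\\
={}& II_{a1} + II_{a2}, \text{ say.}
\end{aligned}$$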
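The decomposition of $\Delta$ used above can be checked numerically for a finite moving average. The sketch assumes the following convention:
- $\Delta = \sum_{j\ge 0}E(e_{t+j}e_t')$ for $e_t = \sum_s C_s\nu_{t-s}$, with identity innovation covariance;
- $C = \sum_s C_s$ and $\Omega = CC'$;
- $\tilde C_k = \sum_{s>k}C_s$.

Under that convention it compares the direct double sum with $\Omega + \sum_{k}\tilde C_kC_{k+1}' - C\tilde C_0'$. The matrix dimension, MA order, random coefficients and function names are illustrative assumptions.

```python
import numpy as np


def one_sided_lrv_direct(C_list):
    """Delta = sum_{j>=0} sum_{r>=0} C_{r+j} C_r' for an MA(q) with identity innovation covariance."""
    q = len(C_list) - 1
    m = C_list[0].shape[0]
    Delta = np.zeros((m, m))
    for j in range(q + 1):
        for r in range(q + 1 - j):
            Delta += C_list[r + j] @ C_list[r].T  # E(e_{t+j} e_t') contribution
    return Delta


def one_sided_lrv_bn(C_list):
    """Omega + sum_k Ctilde_k C_{k+1}' - C Ctilde_0', with Ctilde_k = sum_{s>k} C_s."""
    q = len(C_list) - 1
    C1 = sum(C_list)  # C = C(1)
    Ctilde = [sum(C_list[k + 1:], np.zeros_like(C1)) for k in range(q + 1)]
    Delta = C1 @ C1.T - C1 @ Ctilde[0].T
    for k in range(q):  # Ctilde_k = 0 for k >= q, so the series stops here
        Delta = Delta + Ctilde[k] @ C_list[k + 1].T
    return Delta


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    C_list = [rng.standard_normal((2, 2)) for _ in range(4)]  # MA(3), 2 x 2
    print(one_sided_lrv_direct(C_list))
    print(one_sided_lrv_bn(C_list))
```

For a scalar MA(1) with $C_0 = 1$ and $C_1 = \theta$, both functions return $1 + \theta + \theta^2$, the variance plus the first autocovariance.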
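For $II_{a1}$, note that we can write
$$\begin{aligned}
\frac{1}{T}\sum_{t=1}^{T}x_{i,t}\tilde u_{i,t-1} &= \frac{1}{T}x_{i,1}\tilde u_{i,0} + \frac{1}{T}\sum_{t=2}^{T}x_{i,t}\tilde u_{i,t-1} = \frac{1}{T}x_{i,1}\tilde u_{i,0} + \frac{1}{T}\sum_{t=1}^{T-1}x_{i,t+1}\tilde u_{i,t}\\
&= \frac{1}{T}x_{i,1}\tilde u_{i,0} + e^{c_i/T}\frac{1}{T}\sum_{t=1}^{T-1}x_{i,t}\tilde u_{i,t} + \frac{1}{T}\sum_{t=1}^{T-1}\eta_{i,t+1}\tilde u_{i,t}
\end{aligned}$$
and, thus,
$$\frac{1}{T}\sum_{t=1}^{T}x_{i,t}(\tilde u_{i,t-1} - \tilde u_{i,t}) = \frac{1}{T}\sum_{t=1}^{T-1}\eta_{i,t+1}\tilde u_{i,t} + \frac{1}{T}x_{i,1}\tilde u_{i,0} - \frac{1}{T}x_{i,T}\tilde u_{i,T} + (e^{c_i/T}-1)\frac{1}{T}\sum_{t=1}^{T-1}x_{i,t}\tilde u_{i,t}.$$
In view of this expression we get
$$\begin{aligned}
II_{a1} ={}& \frac{1}{\sqrt n}\sum_{i=1}^{n}\Big\{\frac{1}{T}\sum_{t=1}^{T-1}\eta_{i,t+1}\tilde u_{i,t} - \sum_{j=0}^{\infty}\big(C_{w\varepsilon,j+1}\tilde C_{\varepsilon\varepsilon,j} + C_{ww,j+1}\tilde C_{\varepsilon w,j}\big)\Big\} + \frac{\sqrt n}{T}\,\frac{1}{n}\sum_{i=1}^{n}x_{i,1}\tilde u_{i,0}\\
&- \frac{\sqrt n}{T}\,\frac{1}{n}\sum_{i=1}^{n}x_{i,T}\tilde u_{i,T} + \frac{1}{\sqrt n}\sum_{i=1}^{n}(e^{c_i/T}-1)\frac{1}{T}\sum_{t=1}^{T-1}x_{i,t}\tilde u_{i,t}\\
={}& II_{a1a} + II_{a1b} + II_{a1c} + II_{a1d} + O\Big(\frac{\sqrt n}{T}\Big), \text{ say.}
\end{aligned}$$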
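The summation-by-parts step behind $II_{a1}$ is purely algebraic, so it can be confirmed on arbitrary numbers. The sketch below, with hypothetical function names, proceeds in three steps:
- it draws arbitrary values $\tilde u_{i,0},\dots,\tilde u_{i,T}$ and innovations $\eta_{i,t}$;
- it builds $x_{i,t}$ from $x_{i,t+1} = e^{c_i/T}x_{i,t} + \eta_{i,t+1}$;
- it compares both sides of the identity stated above for $\frac{1}{T}\sum_t x_{i,t}(\tilde u_{i,t-1}-\tilde u_{i,t})$.

```python
import numpy as np


def lhs(x, u_tilde):
    """(1/T) sum_{t=1}^T x_t (u~_{t-1} - u~_t); x[k] holds x_{k+1}, u_tilde[k] holds u~_k."""
    T = len(x)
    return np.sum(x * (u_tilde[:-1] - u_tilde[1:])) / T


def rhs(x, u_tilde, eta, c):
    """Right-hand side of the summation-by-parts identity; eta[k] holds eta_{k+1}."""
    T = len(x)
    rho = np.exp(c / T)
    term_eta = np.sum(eta[1:] * u_tilde[1:T]) / T             # (1/T) sum_{t=1}^{T-1} eta_{t+1} u~_t
    term_first = x[0] * u_tilde[0] / T                         # (1/T) x_1 u~_0
    term_last = x[-1] * u_tilde[T] / T                         # (1/T) x_T u~_T
    term_drift = (rho - 1.0) * np.sum(x[:-1] * u_tilde[1:T]) / T
    return term_eta + term_first - term_last + term_drift


def check_identity(T=40, c=-1.5, seed=0):
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(T)
    u_tilde = rng.standard_normal(T + 1)  # arbitrary u~_0, ..., u~_T
    rho = np.exp(c / T)
    x = np.empty(T)
    prev = rng.standard_normal()          # arbitrary x_0
    for k in range(T):
        prev = rho * prev + eta[k]        # x_{t+1} = e^{c/T} x_t + eta_{t+1}
        x[k] = prev
    return lhs(x, u_tilde), rhs(x, u_tilde, eta, c)
```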
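As a counterpart to the result '$E\|\frac{1}{\sqrt n}\sum_{i=1}^{n}R_{1,i,T}\|^2 = O(\frac{1}{T})$' derived in the proof of Lemma 16 of Phillips & Moon (1999, p. 1107), we have
$$E\Big\|\frac{1}{\sqrt n}\sum_{i=1}^{n}\Big\{\frac{1}{T}\sum_{t=1}^{T-1}\begin{pmatrix}\tilde u_{i,t}\\ \tilde\eta_{i,t}\end{pmatrix}\begin{pmatrix}u_{i,t+1} & \eta_{i,t+1}\end{pmatrix} - \sum_{k=0}^{\infty}\tilde C_kC_{k+1}'\Big\}\Big\|^2 = O\Big(\frac{1}{T}\Big). \tag{46}$$
Since $II_{a1a}$ is the (1, 2) element of the matrix inside the norm on the left-hand side of (46), we have $II_{a1a}\xrightarrow{p} 0$ as $(T,n\to\infty)$. Next, by the triangle and Cauchy-Schwarz inequalities,
$$E\Big|\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}x_{i,T}\tilde u_{i,T}\Big| \le \sqrt{\frac{n}{T}}\,\frac{1}{n}\sum_{i=1}^{n}\sqrt{E\Big(\frac{x_{i,T}}{\sqrt T}\Big)^2E|\tilde u_{i,T}|^2} \le \sqrt{\frac{n}{T}}\sup_{1\le i\le n}\sqrt{E\Big(\frac{x_{i,T}}{\sqrt T}\Big)^2E|\tilde u_{i,T}|^2} = O\Big(\sqrt{\frac{n}{T}}\Big),$$
where the equality is easily verified by using (26), (30), (33) and (34). Therefore, $II_{a1c}\xrightarrow{p} 0$ as $(T,n\to\infty)$ with $n/T\to 0$. Obviously, $II_{a1b}\xrightarrow{p} 0$ as well, as $(T,n\to\infty)$ with $n/T\to 0$. Finally, for $II_{a1d}$, let $r_T = T|e^{\sup_i|c_i|/T}-1|$ and note that
$$E|II_{a1d}| \le r_T\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T-1}E|x_{i,t}\tilde u_{i,t}| \le r_T\sqrt{\frac{n}{T}}\,\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T-1}\sqrt{E\Big(\frac{x_{i,t}}{\sqrt T}\Big)^2E|\tilde u_{i,t}|^2} = O\Big(\sqrt{\frac{n}{T}}\Big).$$
This follows by arguments similar to those used for $II_{a1c}$ and the fact that $r_T = O(1)$.

We turn to show that $II_{a2}\xrightarrow{p} 0$ as $(T,n\to\infty)$ with $n/T\to 0$. Using (32), write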
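$$\begin{aligned}
II_{a2} ={}& \frac{1}{\sqrt n}\sum_{i=1}^{n}\Big\{-\frac{1}{T}\sum_{t=1}^{T}\tilde\eta_{i,t}\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big) + \big(\tilde C_{w\varepsilon,0}C_{\varepsilon\varepsilon} + \tilde C_{ww,0}C_{\varepsilon w}\big)\Big\}\\
&+ \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big(e^{((t-1)/T)c_i}\tilde\eta_{i,0} + (1-e^{c_i/T})\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\tilde\eta_{i,s} + e^{(t/T)c_i}x_{i,0}\Big)\big(C_{\varepsilon\varepsilon}\varepsilon_{i,t} + C_{\varepsilon w}w_{i,t}\big)\\
={}& II_{a2a} + II_{a2b}, \text{ say.}
\end{aligned}$$
Here $II_{a2a}\xrightarrow{p} 0$ as $(T,n\to\infty)$ with $n/T\to 0$, because $II_{a2a}$ is identical to the term '$\frac{1}{\sqrt n}\sum_{i=1}^{n}R_{3,i,T}$' in the proof of Lemma 16 of Phillips & Moon (1999a, p. 1105). Finally, $II_{a2b}\xrightarrow{p} 0$ as $(T,n\to\infty)$ with $n/T\to 0$ by arguments similar to those used for $II_{a1}$. The details are straightforward and are omitted. This completes the proof of the theorem.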
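APPENDIX D: PROOF OF THEOREM 5

The proof follows from the same arguments as the proof of Theorem 4. To see the main lines, write
$$\sqrt n\,T(\hat\beta^{+} - \beta) - \sqrt n B_{n,T} = \frac{\dfrac{1}{\sqrt n}\displaystyle\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big[x_{i,t}\big(u_{i,t} - \hat\Omega_{u\eta}\hat\Omega_{\eta\eta}^{-1}\Delta x_{i,t}\big) - \hat\Delta^{+}_{\eta u} + T(e^{c_i/T}-1)\Omega_{u\eta}\Omega_{\eta\eta}^{-1}\dfrac{x_{i,t}x_{i,t-1}}{T}\Big]}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}x_{i,t}^2},$$
where the denominator has the limit given in (36). Next, let $\Delta^{+}_{\eta u} = \Delta_{\eta u} - \Delta_{\eta\eta}\Omega_{\eta\eta}^{-1}\Omega_{\eta u}$. Note that the numerator in the above estimation error can be written as
$$\begin{aligned}
&\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big[x_{i,t}\big(u_{i,t} - \Omega_{u\eta}\Omega_{\eta\eta}^{-1}\eta_{i,t}\big) - \Delta^{+}_{\eta u}\Big] - \sqrt n\big(\hat\Omega_{u\eta}\hat\Omega_{\eta\eta}^{-1} - \Omega_{u\eta}\Omega_{\eta\eta}^{-1}\big)\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\big(x_{i,t}\Delta x_{i,t} - \Delta_{\eta\eta}\big)\\
&- \sqrt n\big(\hat\Delta_{\eta u} - \Delta_{\eta u}\big) + \hat\Omega_{u\eta}\hat\Omega_{\eta\eta}^{-1}\sqrt n\big(\hat\Delta_{\eta\eta} - \Delta_{\eta\eta}\big),
\end{aligned}$$
where the $\sqrt n$-normalized estimation errors of the kernel estimators are $o_p(1)$ as $(n,T\to\infty)$ with $n/T\to 0$ (recall Remark 1). Furthermore, using the fact that $\Delta x_{i,t} = (e^{c_i/T}-1)x_{i,t-1} + \eta_{i,t}$, we can write
$$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\big(x_{i,t}\Delta x_{i,t} - \Delta_{\eta\eta}\big) = \frac{1}{n}\sum_{i=1}^{n}T(e^{c_i/T}-1)\frac{1}{T^2}\sum_{t=1}^{T}x_{i,t}x_{i,t-1} + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\big(x_{i,t}\eta_{i,t} - \Delta_{\eta\eta}\big) = O_p(1).$$
The last equality holds as $(n,T\to\infty)$ and can be proved by applying the arguments given in the proof of Theorem 4. Thus, for the result in part (a) of Theorem 5, it suffices that
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\Big[x_{i,t}\big(u_{i,t} - \Omega_{u\eta}\Omega_{\eta\eta}^{-1}\eta_{i,t}\big) - \Delta^{+}_{\eta u}\Big] \Rightarrow N(0, \Omega_{u\cdot\eta}\theta_{xx}),$$
as $(T,n\to\infty)$ with $n/T\to 0$. The details of the proof of this latter result are similar to those of the proof of (35) and are thus omitted. Finally, the limiting result in part (b) of the theorem follows along the lines of the proof of (36). It also uses the fact that the arithmetic average of the quantities $c_iE(\int_0^1 J_{c_i}(r)^2dr)$ converges to a finite number $\theta_{cxx}$.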
STATIONARITY TESTS IN HETEROGENEOUS PANELS

Yong Yin and Shaowen Wu

ABSTRACT

Several stationarity tests for heterogeneous panel data models are proposed in this chapter. Allowing the maximum degree of heterogeneity in the panel, we use two different ways of pooling information from independent tests, the group mean and the Fisher tests, to develop panel stationarity tests. We consider the case of serially correlated errors in the level and trend stationary models. The small sample performances of the tests are investigated via Monte Carlo simulations, which reveal good small sample performance. In the presence of serial correlation, we recommend for empirical work either the group mean or the Fisher tests based on individual KPSS tests with $l_2$ or on LMC tests with p = 1, because of their good small sample performance.
I. INTRODUCTION

Dynamic panel data analysis has attracted increasing attention, partly due to the recent availability of large panel data sets. These data sets usually cover different countries, industries, or regions over relatively long time spans. They offer new opportunities as well as challenges for the analysis of dynamic panel data models. This is especially true of heterogeneous panel data models, since researchers usually anticipate large differences among the cross-section units in the data.

Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 275–296. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2
Along with the development of univariate nonstationary time series analysis, researchers have also shown more interest in analyzing nonstationary panel data. Various methods have been proposed to test for unit roots and cointegration, along with methods of estimating cointegrating systems, in the context of panel data; see Baltagi & Kao (2000) in this volume for an up-to-date survey. The biggest advantage of the panel data approach is the increased effective sample size. This can increase the power of statistical tests and the efficiency of estimation methods compared with their univariate counterparts.

However, extending univariate methods for nonstationary data to panel data also raises the question of heterogeneity. Early work on dynamic panel data analysis mainly dealt with homogeneous models, but the availability of panel data sets such as the Penn World Table raises the issue of the plausibility of the homogeneity assumption. The parameters, as well as the dynamic structures, of different cross-section units might differ. Hence, it is necessary to develop methods for investigating nonstationarity in heterogeneous panel data models. A heterogeneous panel data model refers to the situation in which both the error structures and the slopes can differ across units. This is quite different from the usual fixed-effects (random-effects) models.

Several papers deal with tests for unit roots and cointegration in heterogeneous panels; see, for example, Im, Pesaran & Shin (1997) and Maddala & Wu (1999) for panel unit root tests, and Pedroni (1995, 1997), Kao (1999), McCoskey & Kao (1997, 1998), and Wu & Yin (1999) for panel cointegration tests. Baltagi & Kao (2000) also give a complete survey of this subject. As in the univariate case, it is of interest to test for unit roots using stationarity as the null.
Not only does this complement conventional unit root tests, which use nonstationarity as the null, but it also incorporates the moving average structure that seems to be a common empirical feature, especially of macroeconomic data.1 Thus, it is natural to develop stationarity tests for heterogeneous panels. However, panel stationarity tests have not yet received serious attention in the literature. McCoskey & Kao (1998) develop stationarity tests for residuals, to be used as residual-based tests for the null of cointegration in panel data models. Hadri (1998) addresses panel stationarity testing directly; however, he only considers models with i.i.d. errors and only homogeneous deterministic trends under the null hypothesis.

In this chapter, we develop stationarity tests for heterogeneous panel data models. The models we consider allow both heterogeneous deterministic trends under the null and different error structures, and the tests should be able to handle serially correlated errors. In the univariate case there are two extensions of the Lagrange Multiplier (LM) test for i.i.d. errors that handle serial correlation. Kwiatkowski, Phillips, Schmidt & Shin (1992) (KPSS hereafter) use nonparametric estimation, while Leybourne & McCabe (1994) (LMC hereafter) use augmented autoregressive components. We propose panel stationarity tests based on both.

One type of test we propose is based on the group mean of the individual test statistics; after suitable standardization, this group mean can be shown to be asymptotically normal. The second test is in line with Maddala & Wu (1999). Its idea can be traced back to Fisher (1932) and consists of pooling the p-values from the individual tests. We also design Monte Carlo experiments to investigate the small sample performances of the proposed tests.

The rest of the chapter is organized as follows. In Section II we set up the models for heterogeneous panels and discuss panel stationarity tests. Section III presents the Monte Carlo simulation designs and the results on the small sample performances of the proposed tests, and Section IV concludes.
II. TESTS FOR STATIONARITY IN THE HETEROGENEOUS PANELS

The basic model for testing for trend stationarity in a univariate time series is as follows:
$$y_t = r_t + \xi t + \varepsilon_t \tag{1}$$
where $r_t$ is a random walk, $r_t = r_{t-1} + \eta_t$. It is assumed that $\eta_t \sim iid(0, \sigma_\eta^2)$, that $\varepsilon_t \sim iid(0, \sigma_\varepsilon^2)$, and that $\eta_t$ and $\varepsilon_t$ are independent. The initial value $r_0$ is treated as fixed and plays the role of an intercept. The null of stationarity is simply $\sigma_\eta^2 = 0$. Under the null, $y_t$ is trend stationary because $\varepsilon_t$ is assumed to be stationary. Define $q = \sigma_\eta^2/\sigma_\varepsilon^2$, the so-called signal-to-noise ratio in structural time series models. The null can equivalently be specified as $H_0: q = 0$. If $\xi = 0$, the model reduces to
$$y_t = r_t + \varepsilon_t \tag{2}$$
and under the null $y_t$ is level stationary instead of trend stationary.
The statistic considered in the literature is both the one-sided LM test statistic and the locally best invariant (LBI) test statistic under the stronger assumption that the $\varepsilon_t$'s are normal.2 Let $\hat e_t$ be the residuals from the regression of $y_t$ on an intercept and a linear time trend. Define $\hat\sigma_\varepsilon^2 = \sum_{t=1}^{T}\hat e_t^2/T$ and the partial sum process of the residuals $S_t = \sum_{i=1}^{t}\hat e_i$. Then the LM test statistic is
$$LM = \sum_{t=1}^{T}S_t^2/\hat\sigma_\varepsilon^2.$$
To construct the LM statistic for the null hypothesis of level stationarity instead of trend stationarity, we define $\hat e_t$ as the residuals from the regression of $y_t$ on an intercept only.

It has been shown that, for the trend stationary model under the null hypothesis,
$$T^{-2}LM \xrightarrow{d} \int_0^1 V_2(r)^2\,dr,$$
where $V_2(r)$ is the second-level Brownian bridge given by $V_2(r) = W(r) + (2r - 3r^2)W(1) + (-6r + 6r^2)\int_0^1 W(s)\,ds$, with $W(r)$ a Wiener process. For the level stationary model, under the null,
$$T^{-2}LM \xrightarrow{d} \int_0^1 V(r)^2\,dr,$$
where $V(r)$ is a standard Brownian bridge, $V(r) = W(r) - rW(1)$.

There are two ways to incorporate serial correlation into the basic univariate models: one is due to KPSS and the other to LMC. In KPSS, the models are still (1) and (2), with the modification that $\varepsilon_t$ can be serially correlated in any form. The usual specification is that $\varepsilon_t$ satisfies the strong mixing regularity conditions of Phillips & Perron (1988). Under such conditions, the normalized numerators of the LM test statistics converge to $\sigma^2$ times the corresponding Brownian bridge functionals, where $\sigma^2$ is the long-run variance of $\varepsilon_t$. The effort is therefore concentrated on obtaining a consistent estimator of $\sigma^2$. KPSS consider the Newey & West (1987) consistent estimator $s^2(l)$, which is based on nonparametric estimation:
$$s^2(l) = T^{-1}\sum_{t=1}^{T}\hat e_t^2 + 2T^{-1}\sum_{s=1}^{l}w(s,l)\sum_{t=s+1}^{T}\hat e_t\hat e_{t-s}.$$
This estimator depends on the choice of a spectral window $w(s,l)$ and the truncation parameter $l$. KPSS use the Bartlett window and recommend choosing $l = o(T^{1/2})$. The resulting test statistics are labeled $\hat\eta_\mu$ for level stationary models and $\hat\eta_\tau$ for trend stationary models, with
$$\hat\eta_\mu\ (\hat\eta_\tau) = T^{-2}\sum_{t=1}^{T}S_t^2/s^2(l).$$
Both $S_t$ and $s^2(l)$ depend on $\hat e_t$. For the level stationary models, $\hat e_t$ is the residual from the regression of $y_t$ on an intercept only; for the trend stationary models, it is the residual from the regression on an intercept and a linear trend. It has also been proved that $\hat\eta_\mu \xrightarrow{d} \int_0^1 V(r)^2dr$ and $\hat\eta_\tau \xrightarrow{d} \int_0^1 V_2(r)^2dr$, and that both tests are consistent. See KPSS for details of the derivations and proofs, along with some simulation results.

The KPSS tests handle serial correlation in a way similar to the Phillips-Perron tests for unit roots. LMC, on the other hand, use an augmented autoregression to handle serial correlation, similar to the Augmented Dickey-Fuller tests for unit roots. Since any stationary structure can be approximated by autoregressive structures, LMC work with transformed versions of (1) and (2): $\Phi(L)y_t = r_t + \xi t + \varepsilon_t$ for trend stationary models and $\Phi(L)y_t = r_t + \varepsilon_t$ for level stationary models, where $\Phi(L)$ is a polynomial in the lag operator $L$. To construct the test statistics, one first estimates an ARIMA(p, 1, 1) model to remove the serial correlation. One then computes the LM test statistic from the 'whitened' series as if there were no serial correlation. LMC label the test statistics $\hat s_\mu$ for the level stationary models and $\hat s_\tau$ for the trend stationary models; see their paper for detailed descriptions and discussions of the tests. They also show that under the null $\hat s_\mu \xrightarrow{d} \int_0^1 V(r)^2dr$ and $\hat s_\tau \xrightarrow{d} \int_0^1 V_2(r)^2dr$.
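To make the KPSS construction concrete, here is a minimal NumPy sketch of $\hat\eta_\mu$ and $\hat\eta_\tau$. It follows the steps above:
- residuals from a regression on an intercept, or on an intercept and a trend;
- partial sums $S_t$;
- the Newey-West estimator $s^2(l)$ with Bartlett weights $w(s,l) = 1 - s/(l+1)$.

The function names are hypothetical, and this is an illustration rather than the authors' code. It does not compute p-values, which would require tabulated or simulated critical values. It assumes $l < T$.

```python
import numpy as np


def bartlett_weight(s: int, l: int) -> float:
    """Bartlett window w(s, l) = 1 - s / (l + 1)."""
    return 1.0 - s / (l + 1.0)


def kpss_stat(y, l: int, trend: bool = False, weight=bartlett_weight) -> float:
    """eta_mu (trend=False) or eta_tau (trend=True): T^{-2} sum_t S_t^2 / s^2(l)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    t = np.arange(1, T + 1)
    X = np.column_stack([np.ones(T), t]) if trend else np.ones((T, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                      # residuals e_hat_t
    S = np.cumsum(e)                      # partial sums S_t
    s2 = e @ e / T                        # T^{-1} sum_t e_t^2
    for s in range(1, l + 1):
        # 2 T^{-1} w(s, l) sum_{t=s+1}^T e_t e_{t-s}
        s2 += 2.0 * weight(s, l) * (e[s:] @ e[:-s]) / T
    return (S @ S) / (T**2 * s2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal(100)
    print(kpss_stat(y, l=4), kpss_stat(y, l=4, trend=True))
```

With `l=0` the loop is skipped and the function returns $T^{-2}LM$, the i.i.d.-error LM statistic. A different window can be passed through the `weight` argument.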
LMC argue that their tests are superior to the KPSS tests because the augmented autoregression is used to control for serial correlation. Theoretically, the LMC tests are more powerful than the KPSS tests because the LMC test statistics are $O_p(T)$ under the alternative while the KPSS test statistics are $O_p(T/l)$. This superiority is also shown through Monte Carlo simulation.3

The univariate model for testing for stationarity can be readily extended to panel data models. Let $y_{it}$, $i = 1, \ldots, N$, $t = 1, \ldots, T$, be the observations on $N$ cross-section units over a time span of $T$, for which we want to test for stationarity. Consider the following models.

Level stationarity:
$$y_{it} = r_{it} + \varepsilon_{it} \tag{3}$$
Trend stationarity:
$$y_{it} = r_{it} + \xi_i t + \varepsilon_{it} \tag{4}$$
where $r_{it} = r_{i,t-1} + \eta_{it}$, with the $r_{i0}$'s being fixed constants such that $r_{i0}$ is not necessarily equal to $r_{j0}$ if $i \ne j$.4
Assumption
(i) $E(\eta_{it}) = 0$, and $E(\eta_{it}\eta_{js}) = \sigma_{\eta i}^2$ if $i = j$ and $t = s$, and $0$ otherwise.
(ii) For each cross-section unit $i$, $\varepsilon_{it}$ either satisfies the strong mixing conditions for the functional central limit theorem to hold, with long-run variance $\sigma_i^2$, or it can be expressed as a $p$-th order AR model.
(iii) $E(\eta_{it}\varepsilon_{js}) = 0$ for all $i, j, t, s$.

Assumption (i) adds heterogeneity to the error structure of $\eta$ by allowing heteroskedasticity. Assumption (ii) also allows heteroskedasticity in $\varepsilon$. Assumption (iii) rules out contemporaneous correlation and states that $\eta$ and $\varepsilon$ are uncorrelated within units as well. Define $q_i = \sigma_{\eta i}^2/\sigma_i^2$; that is, the $q_i$'s are the signal-to-noise ratios of the cross-section units. The null hypothesis can be expressed as $H_0: q_i = 0$ for all $i$.

For level stationary models, under $H_0$, each cross-section unit is stationary around a level $r_{i0}$, which is not necessarily the same across units. For trend stationary models, under $H_0$, each cross-section unit is stationary around a linear trend $r_{i0} + \xi_i t$, which is also not necessarily the same across units. The different levels and linear trends reflect the possibility of heterogeneity across sections. The alternative hypothesis is $H_1: q_i > 0$ for all $i$. Here we introduce heterogeneity by allowing different signal-to-noise ratios across sections: under the alternative, the signal-to-noise ratios need only be greater than 0 and need not be equal.

Let $\hat\eta_{\mu i}$ and $\hat\eta_{\tau i}$ be the individual KPSS test statistics for the $i$-th unit. Define $\zeta_1 = \int_0^1 V(r)^2dr$ and $\zeta_2 = \int_0^1 V_2(r)^2dr$. We can construct the standardized group mean tests as
$$\bar\eta_\mu = \frac{\sqrt N\left(\frac{1}{N}\sum_{i=1}^{N}\hat\eta_{\mu i} - E(\zeta_1)\right)}{\sqrt{Var(\zeta_1)}} \quad \text{for level stationary models}$$
and
$$\bar\eta_\tau = \frac{\sqrt N\left(\frac{1}{N}\sum_{i=1}^{N}\hat\eta_{\tau i} - E(\zeta_2)\right)}{\sqrt{Var(\zeta_2)}} \quad \text{for trend stationary models}.$$
Similarly, let $\hat s_{\mu i}$ and $\hat s_{\tau i}$ be the individual LMC test statistics for the $i$-th unit, and define the standardized group mean tests as
$$\bar s_\mu = \frac{\sqrt N\left(\frac{1}{N}\sum_{i=1}^{N}\hat s_{\mu i} - E(\zeta_1)\right)}{\sqrt{Var(\zeta_1)}} \quad \text{for level stationary models}$$
and
$$\bar s_\tau = \frac{\sqrt N\left(\frac{1}{N}\sum_{i=1}^{N}\hat s_{\tau i} - E(\zeta_2)\right)}{\sqrt{Var(\zeta_2)}} \quad \text{for trend stationary models}.$$
Using the sequential limit theorem, it can be shown that, under the null and the assumptions spelled out earlier, all four test statistics are asymptotically standard normal. The sequential limit theory requires that T goes to infinity followed by N going to infinity. The asymptotic normality can then be established by an application of the Lindeberg-Levy central limit theorem.5 The consistency of the tests follows from the consistency of the univariate tests established in the literature. The tests remain consistent under a mixed alternative in which only part of the panel is nonstationary while the rest is stationary. This holds as long as $\delta = \lim_{N\to\infty} N_1/N > 0$, where $N_1$ is the number of nonstationary series under the alternative.

Hadri (1998) used the characteristic function given by Anderson & Darling (1952) to compute the means and the variances of $\zeta_i$. For the level stationary model the mean is 1/6 and the variance is 1/45, while for the trend stationary model the mean is 1/15 and the variance is 11/6300. However, as suggested in Im, Pesaran & Shin (1997), one can instead use the mean and variance of the small sample distributions (for finite T), obtained via simulation. This enhances the finite sample performance of the group mean tests.6

The group mean test pools independent individual test statistics to find evidence on the composite null. The literature offers another way to pool information from individual tests to test the composite null, due to Fisher (1932). The idea has been applied to develop panel unit root tests in Maddala & Wu (1999) and panel cointegration tests in Wu & Yin (1999). Both the KPSS and the LMC tests can be used to formulate Fisher tests for stationarity as well. Let $P_i$ be the p-value of the individual stationarity test for the $i$-th unit (using either the KPSS or the LMC test). Define the Fisher test statistic as $\lambda = -2\sum_{i=1}^{N}\log P_i$.7 Then $\lambda$ has a $\chi^2$ distribution with $2N$ degrees of freedom under the null hypothesis that $q_i = 0$ for all $i$.

The validity of the $\chi^2$ distribution depends on the accuracy of the distributions from which the $P_i$'s are derived. Thus, unlike the group mean test, it does not rely on asymptotics in N. On the other hand, the small sample distribution is usually unknown, so the small sample distributions must be obtained via simulation to enhance the small sample performance of the Fisher tests.8
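The two pooling schemes can be sketched in a few lines of Python. The level-stationary mean follows from $E\int_0^1V(r)^2dr = \int_0^1 r(1-r)\,dr = 1/6$, since $Var(W(r) - rW(1)) = r(1-r)$. The remaining asymptotic moments are the values quoted above. The sketch assumes upper-tail rejection, i.e. large statistics are evidence against stationarity, and uses SciPy's chi-square and normal distributions. The helper names are hypothetical. The individual statistics and p-values are taken as given; in the chapter they come from simulated small sample distributions, which can be supplied through the `moments` argument.

```python
import math

from scipy.stats import chi2, norm

# Asymptotic (mean, variance) of zeta_1 = int V^2 (level) and zeta_2 = int V_2^2 (trend)
MOMENTS = {"level": (1.0 / 6.0, 1.0 / 45.0), "trend": (1.0 / 15.0, 11.0 / 6300.0)}


def group_mean_stat(stats, model="level", moments=None):
    """Standardized group mean: sqrt(N) * (average - E zeta) / sqrt(Var zeta)."""
    mean, var = moments if moments is not None else MOMENTS[model]
    N = len(stats)
    avg = sum(stats) / N
    return math.sqrt(N) * (avg - mean) / math.sqrt(var)


def group_mean_pvalue(z):
    """Upper-tail p-value from the standard normal limit."""
    return norm.sf(z)


def fisher_stat(pvalues):
    """lambda = -2 sum_i log P_i and its upper-tail chi-square(2N) p-value."""
    lam = -2.0 * sum(math.log(p) for p in pvalues)
    return lam, chi2.sf(lam, df=2 * len(pvalues))


if __name__ == "__main__":
    z = group_mean_stat([0.21, 0.12, 0.35, 0.09, 0.18])
    print(z, group_mean_pvalue(z))
    print(fisher_stat([0.30, 0.04, 0.61, 0.12, 0.25]))
```

For a single unit the Fisher test reproduces the individual p-value, because a $\chi^2_2$ variable has survival function $e^{-x/2}$. This is a convenient check of the implementation.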
III. MONTE CARLO SIMULATION RESULTS

In this section we design Monte Carlo experiments to investigate the small sample properties of the panel stationarity tests proposed in the previous section. The object of the simulations is to shed light on the relative small sample performances of the various tests. As we have seen, either the KPSS or the LMC tests can be used to handle serial correlation. For each univariate stationarity test, either the group mean test or the Fisher test can be used to formulate the panel version. As illustrated in Maddala & Wu (1999) and Wu & Yin (1999), in many of the cases they considered the performances of the group mean and Fisher tests are very similar; however, this still needs to be investigated for stationarity tests. As for the univariate KPSS and LMC tests, LMC established the small sample superiority of their tests. Whether this superiority carries over to the panel tests based on the individual LMC tests remains an open question, which simulation experiments can answer.

The basic models for the simulations are models (3) and (4) with $r_{it} = r_{i,t-1} + \eta_{it}$, where $\eta_{it} \sim iid\,N(0, q_i\sigma_i^2)$. The model for $\varepsilon_{it}$ is $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, where $u_{it} \sim iid\,N(0, (1-\rho_i^2)\sigma_i^2)$. Hence, when $\rho_i = 0$ the $\varepsilon_{it}$'s are i.i.d. within each unit, while they are serially correlated within each unit when $\rho_i \ne 0$. These two models are extensions of the standard univariate models for stationarity to panel data. Different $\xi_i$, $\sigma_i^2$, $r_{i0}$ and $\rho_i$ are introduced to allow the largest degree of heterogeneity. For this purpose, we set the parameters as follows, where $U$ denotes the uniform distribution:
- $\xi_i \sim U[0,1]$;
- $\sigma_i^2 \sim U[0.5, 1.5]$;
- $r_{i0} \sim U[0,5]$;
- $\rho_i = 0$ for the i.i.d. case and $\rho_i \sim U[0.1, 0.3]$ for the case of serial correlation.

The null hypothesis is specified as $q_i = 0$ for all $i$. For the alternative hypothesis, following the tradition in the literature, we only consider the case where all $q_i$'s are positive. All our tests are consistent even when only part of the series are nonstationary under the alternative, as long as the proportion of nonstationary units does not vanish asymptotically. Furthermore, for simplicity we only consider the alternative $H_1: q_i = q = 0.001$.9

We consider time dimensions of 25, 50, and 100 and cross-sectional dimensions of 15, 25, 50, and 100. The normal variates are generated by the RNDN function in the matrix programming language GAUSS. We apply the group mean and Fisher tests based on the LM, KPSS, and LMC tests to each panel. For each case, the number of iterations is 5,000. For the group mean test, the mean and the variance of the small sample distributions are derived from 100,000 simulations for the corresponding time span and test procedure. For the Fisher test, the small sample distributions are also simulated using 100,000 replications.

To carry out the experiments, we still need to select two parameters: the truncation parameter $l$ in the individual KPSS tests and the order of autoregression $p$ in the individual LMC tests. Following earlier simulation results for the univariate KPSS tests in the literature, we experiment with
$$l_1 = \mathrm{int}\left[4\left(\frac{T}{100}\right)^{1/4}\right], \quad l_2 = \mathrm{int}\left[8\left(\frac{T}{100}\right)^{1/4}\right], \quad l_3 = \mathrm{int}\left[12\left(\frac{T}{100}\right)^{1/4}\right],$$
where $\mathrm{int}[\,\cdot\,]$ returns the integer part of its argument.
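The truncation lags are simple to compute. The sketch below reproduces them for the three time dimensions used in the experiments. It also adds a Parzen weight function that can replace the Bartlett weights in $s^2(l)$. The Parzen kernel formula is the standard one, but evaluating it at $x = s/(l+1)$ is an assumption about the scaling, not something stated in the chapter. The function names are hypothetical.

```python
def truncation_lags(T: int) -> tuple:
    """(l_1, l_2, l_3) with l_k = int[k (T/100)^{1/4}] for k = 4, 8, 12."""
    base = (T / 100.0) ** 0.25
    return tuple(int(k * base) for k in (4, 8, 12))


def parzen_weight(s: int, l: int) -> float:
    """Parzen kernel evaluated at x = s / (l + 1); the scaling is an assumption."""
    x = s / (l + 1.0)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0


if __name__ == "__main__":
    for T in (25, 50, 100):
        print(T, truncation_lags(T))  # (2, 5, 8), (3, 6, 10), (4, 8, 12)
```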
Also following earlier simulation results in the literature, we use the Parzen window instead of the Bartlett window used by KPSS, as the former performs better than the latter. For the LMC test, we experiment with p = 1, 2, and 3, following the Monte Carlo experiments of LMC.

Let us first look at the white noise case. In this case $\rho_i = 0$, and the tests based on the individual LM tests are the appropriate ones. Table 1 presents the sizes of the group mean and the Fisher tests based on the LM, KPSS, and LMC tests for the level stationary model. Choosing l = 0 in the KPSS test or p = 0 in the LMC test yields exactly the LM test statistic. That is why the results for the tests based on the LM test are listed in the column headed p(l) = 0. We also list the results for N = 1 as a benchmark; these simply replicate the univariate case. As the table shows, the size performances of the panel stationarity tests are quite satisfactory in this case, and they improve as T gets larger. In most cases, the Fisher tests have better size performances than the group mean tests, especially for larger T and smaller N. This is not surprising, as the Fisher test is an exact test while the group mean
Table 1.
T
Sizes of Panel Stationarity Tests: Level Stationary Model, White Noise p(l) = 0
l1
KPSS l2
l3
p=1
LMC p=2
p=3
0.047
0.049
0.053
0.055
0.047
0.049
0.051
15 25 50 100
0.061 0.053 0.054 0.046
Group Mean Test 0.057 0.061 0.059 0.057 0.053 0.053 0.055 0.055 0.055 0.047 0.050 0.053
0.063 0.057 0.054 0.046
0.059 0.058 0.054 0.050
0.063 0.063 0.059 0.051
15 25 50 100
0.050 0.045 0.047 0.043
0.047 0.048 0.050 0.043
0.056 0.051 0.053 0.052
0.046 0.048 0.046 0.041
0.045 0.044 0.047 0.046
0.052 0.053 0.052 0.047
1
0.047
0.046
0.051
0.047
0.050
0.050
15 25 50 100
0.066 0.066 0.056 0.057
0.059 0.062 0.053 0.054
Group Mean Test 0.058 0.064 0.060 0.058 0.050 0.054 0.057 0.061
0.051 0.067 0.059 0.056
0.065 0.070 0.065 0.061
0.058 0.067 0.057 0.055
15 25 50 100
0.052 0.054 0.049 0.051
0.050 0.052 0.045 0.050
0.054 0.053 0.048 0.057
0.049 0.054 0.049 0.048
0.050 0.055 0.055 0.054
0.046 0.053 0.049 0.051
1
0.051
0.050
0.049
0.048
0.045
0.049
15 25 50 100
0.056 0.057 0.056 0.056
0.058 0.057 0.057 0.058
Group Mean Test 0.057 0.059 0.058 0.061 0.058 0.054 0.056 0.055
0.060 0.061 0.062 0.059
0.063 0.062 0.054 0.056
0.069 0.063 0.064 0.059
15 25 50 100
0.047 0.047 0.049 0.053
0.046 0.049 0.051 0.053
0.045 0.050 0.051 0.053
0.049 0.049 0.047 0.052
0.048 0.049 0.053 0.051
N 1
25
50
100
Fisher Test 0.051 0.048 0.052 0.047 0.047
Fisher Test 0.050 0.053 0.043 0.054 0.049
Fisher Test 0.047 0.051 0.051 0.053
0.046 0.051 0.049 0.051
Note: 1. The data generating process is yit = ri0 + it, and it ~ i.i.d.N(0, 2i). 2. Please see text for choices of parameters 3. li is the truncation parameter used in individual KPSS test and p is the order of autoregression in ARIMA(p,1,1) used in individual LMC test. p(l) = 0 indicates individual LM test is used.
Panel Stationarity Tests
285
test is an asymptotic test (in N). As for the tests based on the KPSS tests with different lag truncation parameters and the LMC tests with different autoregression orders, the sizes are also quite close to the nominal size of 5%. In general, we also observe that the size performances are better for larger T and the Fisher tests have better size performances in this case. Table 2 presents the powers of the panel stationarity tests for the level stationary models. To make things comparable, all the powers are adjusted according to their true sizes. The powers of the LM based tests clearly state the superiority of the panel stationary tests over their univariate counterparts. When T = 25, the power of the univariate LM test is only 0.117, while the power jumps to 0.392 when 15 cross-section units are used, and it is close to 1 (0.954 for the group mean test and 0.952 for the Fisher test) when N = 100. As a matter of fact, all the powers for T = 100 are 1 and they are close to 1 when T = 50. The powers of the group mean and the Fisher tests in most cases are almost the same. It is documented in the literature that increasing the lag truncation parameter l in the KPSS tests and the autoregression order p in the LMC tests can reduce the powers. This is replicated in Table 2 as those entries for N = 1. However, due to the powerfulness of the panel stationarity tests, the reduction in the powers by overestimating is not an issue in some cases, especially for larger T and N, as in those cases the powers are 1 or close to 1. This is a unique feature of panel stationarity tests. The reduction in power is smaller for the LMC tests as p increases than for the KPSS tests as l increases. The size and power performances of panel stationarity tests in the case of white noise for the trend stationary models are reported in Tables 3 and 4. We have similar observations in these two tables. 
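The size adjustment behind Tables 2, 4, 6 and 8 can be sketched as follows. The critical value is taken from the simulated null distribution of the panel statistic, so that the desired share of null draws exceeds it. Power is then the rejection frequency of the alternative draws at that critical value. This minimal illustration assumes right-tailed tests that reject for large values and no ties among the null draws. The function names are hypothetical, and the toy normal draws only stand in for the 5,000 Monte Carlo replications used in the chapter.

```python
import random


def empirical_critical_value(null_stats, alpha=0.05):
    """Value exceeded by a fraction alpha of the simulated null statistics (no ties assumed)."""
    ordered = sorted(null_stats)
    m = int(round(alpha * len(ordered)))   # number of null draws allowed above the critical value
    return ordered[len(ordered) - m - 1]


def rejection_rate(stats, crit):
    """Share of statistics strictly above crit (right-tailed test)."""
    return sum(s > crit for s in stats) / len(stats)


def size_adjusted_power(null_stats, alt_stats, alpha=0.05):
    """Power at the critical value that gives size alpha under the simulated null."""
    return rejection_rate(alt_stats, empirical_critical_value(null_stats, alpha))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical draws: a statistic that is N(0.3, 1) under the null (so it is
    # oversized at the nominal N(0, 1) critical value) and N(1.8, 1) under the alternative.
    null = [rng.gauss(0.3, 1.0) for _ in range(5000)]
    alt = [rng.gauss(1.8, 1.0) for _ in range(5000)]
    print("raw size:", rejection_rate(null, 1.645))
    print("raw power:", rejection_rate(alt, 1.645))
    print("size-adjusted power:", size_adjusted_power(null, alt))
```

The toy example shows why the adjustment matters. An oversized test also looks more powerful at the nominal critical value, so raw powers of tests with different true sizes are not comparable.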
One thing we need to point out is that, for the trend stationary models, the powers are smaller than those for the level stationary models, especially for T = 25, where they are much smaller. The powers are only 0.280 for the group mean test and 0.279 for the Fisher test even when N = 100, though these represent a nearly fourfold increase over the univariate case.

Next, let us look at the results for the case of serial correlation. Table 5 gives the sizes of the panel stationarity tests in this case. Size distortions are expected for the tests based on the LM tests, as the table shows for N = 1. But the size distortions become much worse as N increases; in fact, the actual sizes are close to 1 when N = 100. This is because the size distortions are amplified by pooling the cross-sectional units, as Wu & Yin (1999) also point out for panel cointegration tests. The size distortions are still quite severe when l1 is used in the KPSS tests, and they become moderate when l2 and l3 are used for T = 50 and 100. For the LMC test, the size distortion is still considerably large when the true order of autoregression (p = 1) is used with T = 25. The size distortions become smaller and moderate when T increases to 50 and 100. Interestingly, overestimating p in this case increases the size distortions. We can also observe that the Fisher tests in general have better size performance than the group mean tests.

Table 6 reports the power performance of the panel stationarity tests in the presence of serial correlation. The first thing to notice is that the powers are lower than those in the white noise case for some combinations of N and T. The powers are only around 60% even when N = 100 and T = 25 for the KPSS tests with l2 and the LMC tests with p = 1, which have relatively moderate size distortions. For these two tests (in both their group mean and Fisher versions), the powers are close to 1 when N is 50 or more and T = 50. When T = 100, all the powers are 1 or very close to 1. In such a case, smaller size distortion would be the primary criterion for deciding which test to use in practice. The powers of the KPSS tests with l2 and the LMC tests with p = 1 are almost the same in most cases. The results for N = 1, however, indicate that the latter has an advantage in the univariate case, which agrees with the findings in LMC. There are almost no differences between the power performances of the group mean and Fisher tests.

The sizes of the panel stationarity tests for the trend stationary models with serial correlation are presented in Table 7, and the size-adjusted powers in Table 8. For the size distortions, we have the same observations as for the level stationary models. Quite interestingly, the KPSS tests with l2 have a slight edge over the LMC tests with p = 1 when T = 50, while the situation is reversed when T = 100. But we observe severe negative size distortions (actual sizes far below 5%) for the KPSS tests with l2 when T = 25.

Table 2. Size-Adjusted Powers of Panel Stationarity Tests: Level Stationary Model, White Noise

                              KPSS                 LMC
T    Test        N    p(l)=0  l1     l2     l3     p=1    p=2    p=3
25               1    0.117   0.110  0.089  0.074  0.105  0.094  0.086
     Group Mean  15   0.392   0.365  0.272  0.183  0.305  0.263  0.233
                 25   0.546   0.491  0.383  0.262  0.414  0.341  0.306
                 50   0.775   0.712  0.576  0.377  0.630  0.527  0.473
                 100  0.954   0.936  0.834  0.612  0.874  0.779  0.727
     Fisher      15   0.384   0.362  0.271  0.156  0.302  0.264  0.235
                 25   0.542   0.492  0.381  0.236  0.408  0.346  0.308
                 50   0.771   0.719  0.576  0.359  0.635  0.526  0.477
                 100  0.952   0.936  0.835  0.584  0.873  0.780  0.729

50               1    0.302   0.284  0.268  0.224  0.277  0.251  0.218
     Group Mean  15   0.961   0.939  0.903  0.815  0.931  0.884  0.835
                 25   0.995   0.990  0.977  0.944  0.986  0.969  0.941
                 50   1.000   1.000  1.000  0.998  1.000  0.999  0.999
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000
     Fisher      15   0.960   0.938  0.908  0.828  0.932  0.891  0.836
                 25   0.995   0.991  0.978  0.946  0.986  0.972  0.944
                 50   1.000   1.000  1.000  0.998  1.000  0.999  0.999
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000

100              1    0.583   0.536  0.495  0.455  0.566  0.547  0.512
     Group Mean  15   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 25   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 50   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000
     Fisher      15   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 25   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 50   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000

Note: 1. The data generating process is $y_{it} = r_{it} + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_i^2)$, and $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$. 2. See Note 2 to Table 1. 3. See Note 3 to Table 1.

Table 3. Sizes of Panel Stationarity Tests: Trend Stationary Model, White Noise

                              KPSS                 LMC
T    Test        N    p(l)=0  l1     l2     l3     p=1    p=2    p=3
25               1    0.052   0.052  0.060  0.054  0.051  0.050  0.054
     Group Mean  15   0.065   0.064  0.058  0.057  0.071  0.061  0.065
                 25   0.064   0.058  0.059  0.062  0.073  0.067  0.067
                 50   0.066   0.063  0.060  0.062  0.064  0.065  0.063
                 100  0.062   0.060  0.058  0.060  0.060  0.060  0.063
     Fisher      15   0.057   0.055  0.052  0.050  0.055  0.051  0.058
                 25   0.054   0.051  0.052  0.051  0.057  0.053  0.059
                 50   0.059   0.057  0.055  0.054  0.055  0.056  0.057
                 100  0.054   0.053  0.056  0.056  0.053  0.053  0.057

50               1    0.046   0.047  0.045  0.049  0.046  0.047  0.050
     Group Mean  15   0.050   0.055  0.056  0.065  0.064  0.073  0.068
                 25   0.060   0.053  0.053  0.060  0.062  0.073  0.069
                 50   0.057   0.055  0.053  0.056  0.068  0.075  0.068
                 100  0.058   0.054  0.050  0.059  0.064  0.072  0.074
     Fisher      15   0.049   0.047  0.050  0.056  0.049  0.053  0.051
                 25   0.051   0.048  0.049  0.055  0.048  0.056  0.053
                 50   0.052   0.049  0.049  0.052  0.055  0.061  0.054
                 100  0.054   0.052  0.049  0.056  0.056  0.066  0.064

100              1    0.046   0.042  0.041  0.042  0.043  0.050  0.048
     Group Mean  15   0.061   0.062  0.060  0.058  0.064  0.070  0.074
                 25   0.057   0.057  0.057  0.055  0.063  0.065  0.068
                 50   0.059   0.059  0.060  0.056  0.062  0.068  0.066
                 100  0.054   0.053  0.056  0.055  0.062  0.060  0.059
     Fisher      15   0.052   0.051  0.051  0.052  0.052  0.053  0.060
                 25   0.049   0.050  0.050  0.049  0.053  0.055  0.057
                 50   0.052   0.051  0.051  0.051  0.053  0.060  0.057
                 100  0.048   0.049  0.050  0.048  0.054  0.054  0.053

Note: 1. The data generating process is $y_{it} = r_{i0} + \beta_i t + \varepsilon_{it}$, $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$. 2. See Note 2 to Table 1. 3. See Note 3 to Table 1.

Table 4. Size-Adjusted Powers of Panel Stationarity Tests: Trend Stationary Model, White Noise

                              KPSS                 LMC
T    Test        N    p(l)=0  l1     l2     l3     p=1    p=2    p=3
25               1    0.068   0.060  0.047  0.045  0.061  0.061  0.058
     Group Mean  15   0.108   0.106  0.069  0.040  0.091  0.090  0.080
                 25   0.144   0.128  0.074  0.034  0.090  0.092  0.083
                 50   0.172   0.159  0.085  0.031  0.118  0.109  0.102
                 100  0.280   0.245  0.116  0.027  0.185  0.164  0.132
     Fisher      15   0.109   0.103  0.067  0.040  0.089  0.086  0.079
                 25   0.144   0.138  0.070  0.030  0.090  0.092  0.083
                 50   0.163   0.157  0.084  0.027  0.127  0.109  0.098
                 100  0.279   0.257  0.108  0.024  0.187  0.169  0.140

50               1    0.133   0.124  0.106  0.079  0.120  0.106  0.095
     Group Mean  15   0.485   0.426  0.327  0.159  0.374  0.287  0.252
                 25   0.629   0.576  0.450  0.212  0.509  0.374  0.317
                 50   0.867   0.806  0.677  0.320  0.723  0.564  0.488
                 100  0.986   0.971  0.901  0.508  0.937  0.817  0.718
     Fisher      15   0.490   0.427  0.325  0.153  0.385  0.293  0.252
                 25   0.631   0.574  0.445  0.205  0.518  0.400  0.336
                 50   0.864   0.805  0.673  0.311  0.740  0.581  0.509
                 100  0.985   0.968  0.898  0.488  0.939  0.833  0.730

100              1    0.341   0.317  0.275  0.231  0.321  0.272  0.239
     Group Mean  15   0.991   0.975  0.938  0.874  0.978  0.946  0.873
                 25   1.000   1.000  0.993  0.972  0.999  0.994  0.974
                 50   1.000   1.000  1.000  0.999  1.000  1.000  1.000
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000
     Fisher      15   0.990   0.972  0.937  0.868  0.979  0.950  0.888
                 25   1.000   1.000  0.992  0.972  0.999  0.995  0.982
                 50   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000

Note: 1. The data generating process is $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_i^2)$, and $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$. 2. See Note 2 to Table 1. 3. See Note 3 to Table 1.

Table 5. Sizes of Panel Stationarity Tests: Level Stationary Model, Serial Correlation

                              KPSS                 LMC
T    Test        N    p(l)=0  l1     l2     l3     p=1    p=2    p=3
25               1    0.079   0.059  0.050  0.047  0.051  0.054  0.058
     Group Mean  15   0.532   0.232  0.074  0.032  0.130  0.150  0.152
                 25   0.694   0.302  0.076  0.025  0.144  0.182  0.172
                 50   0.904   0.433  0.079  0.017  0.181  0.230  0.221
                 100  0.993   0.657  0.087  0.012  0.212  0.314  0.328
     Fisher      15   0.490   0.205  0.066  0.028  0.104  0.129  0.129
                 25   0.669   0.270  0.067  0.024  0.123  0.160  0.150
                 50   0.897   0.401  0.072  0.016  0.156  0.210  0.206
                 100  0.993   0.641  0.089  0.012  0.212  0.314  0.328

50               1    0.080   0.057  0.050  0.046  0.055  0.060  0.059
     Group Mean  15   0.551   0.162  0.081  0.058  0.099  0.126  0.138
                 25   0.747   0.208  0.090  0.052  0.102  0.144  0.161
                 50   0.945   0.300  0.103  0.049  0.117  0.182  0.209
                 100  0.999   0.472  0.132  0.048  0.145  0.250  0.286
     Fisher      15   0.517   0.140  0.070  0.050  0.077  0.096  0.109
                 25   0.729   0.190  0.077  0.050  0.082  0.113  0.137
                 50   0.944   0.279  0.095  0.047  0.091  0.155  0.178
                 100  0.999   0.456  0.128  0.047  0.116  0.213  0.264

100              1    0.094   0.062  0.053  0.052  0.052  0.058  0.057
     Group Mean  15   0.563   0.130  0.080  0.065  0.077  0.086  0.099
                 25   0.783   0.169  0.083  0.063  0.082  0.094  0.114
                 50   0.944   0.210  0.091  0.065  0.081  0.096  0.124
                 100  0.998   0.307  0.104  0.066  0.087  0.106  0.145
     Fisher      15   0.532   0.109  0.062  0.052  0.057  0.066  0.074
                 25   0.773   0.148  0.071  0.056  0.064  0.075  0.083
                 50   0.943   0.193  0.083  0.059  0.066  0.072  0.092
                 100  0.998   0.293  0.098  0.062  0.070  0.083  0.112

Note: 1. The data generating process is $y_{it} = r_{i0} + \varepsilon_{it}$, $\varepsilon_{it} = \rho_i \varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_i^2)$. 2. See Note 2 to Table 1. 3. See Note 3 to Table 1.
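A back-of-the-envelope calculation illustrates the amplification of size distortions through pooling noted for Table 5. Suppose misspecified serial correlation shifts the null mean of each individual statistic by δ, measured in units of its null standard deviation, while leaving its variance unchanged. Then the standardized group mean statistic is roughly N(√N·δ, 1), and its rejection rate at the nominal 5% level rises with N. The function below is a hypothetical illustration under these simplifying assumptions. It is not a result from the chapter, and the value δ = 0.2 is arbitrary.

```python
import math
from statistics import NormalDist


def approx_group_mean_size(delta: float, N: int, alpha: float = 0.05) -> float:
    """Approximate rejection rate of a right-tailed group mean test when each unit's
    statistic has its null mean shifted by delta (in null standard deviations) and
    an unchanged variance, so that Z ~ N(sqrt(N) * delta, 1)."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1.0 - alpha)
    return 1.0 - std_normal.cdf(crit - math.sqrt(N) * delta)


if __name__ == "__main__":
    for N in (1, 15, 25, 50, 100):
        print(N, round(approx_group_mean_size(0.2, N), 3))
```

With δ = 0.2, a distortion that is barely visible for a single unit (a rejection rate of roughly 0.07) becomes a rejection rate of roughly 0.64 at N = 100. This mirrors the qualitative pattern of the p(l) = 0 column in Table 5.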
Except for the KPSS tests with l2 when T = 25, the size distortions of the KPSS tests with l2 and the LMC tests with p = 1 are smaller than the corresponding ones for the level stationary models. The Fisher tests have relatively better size performance than the group mean tests, especially when the individual LMC tests are used. As for the size-adjusted powers, the only point worth reporting is that they are lower than for the level stationary models; otherwise the patterns are much the same. For the KPSS tests with l2 and the LMC tests with p = 1, the powers are only about 70% even when N = 100 for T = 50, compared with powers of 1 in the same situation for the level stationary models. The powers are close to 1 when T = 100 and there are more than 25 cross-section units in the panel.

In summary, our Monte Carlo simulations show that the proposed tests have quite satisfactory small sample performance in most of the cases we considered. In the absence of serial correlation, the tests based on the LM tests have sizes close to the nominal size and powers much higher than the univariate LM tests. Using the KPSS and LMC tests in this case would not result in much size distortion. It would, however, result in power losses for some combinations of N and T, while the powers are already 1 or close to 1 for other combinations. In the presence of serial correlation, the tests based on the KPSS tests with l2 and the LMC tests with p = 1 have relatively good size performance. There are still moderate to severe size distortions when the time span is short (T = 25), especially for the trend stationary models. The powers of all tests are also lower than their counterparts in the white noise case. Overall, the Fisher tests have better size performance than the group mean tests, while their power performances are almost the same.

Table 6. Size-Adjusted Powers of Panel Stationarity Tests: Level Stationary Model, Serial Correlation

                              KPSS                 LMC
T    Test        N    p(l)=0  l1     l2     l3     p=1    p=2    p=3
25               1    0.153   0.109  0.095  0.079  0.100  0.095  0.089
     Group Mean  15   0.249   0.234  0.211  0.161  0.207  0.174  0.157
                 25   0.338   0.319  0.264  0.210  0.250  0.207  0.204
                 50   0.489   0.478  0.400  0.306  0.394  0.329  0.302
                 100  0.754   0.724  0.619  0.474  0.588  0.532  0.479
     Fisher      15   0.247   0.228  0.205  0.163  0.212  0.171  0.161
                 25   0.337   0.316  0.269  0.209  0.248  0.207  0.200
                 50   0.484   0.488  0.412  0.301  0.394  0.331  0.304
                 100  0.750   0.729  0.620  0.466  0.584  0.534  0.490

50               1    0.316   0.242  0.219  0.197  0.235  0.198  0.183
     Group Mean  15   0.886   0.831  0.761  0.656  0.775  0.712  0.643
                 25   0.862   0.939  0.904  0.841  0.912  0.854  0.813
                 50   0.998   0.996  0.992  0.980  0.993  0.981  0.967
                 100  1.000   1.000  1.000  0.999  1.000  1.000  0.999
     Fisher      15   0.885   0.833  0.772  0.673  0.774  0.723  0.651
                 25   0.962   0.941  0.910  0.844  0.917  0.858  0.812
                 50   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000

100              1    0.530   0.500  0.468  0.429  0.524  0.490  0.471
     Group Mean  15   1.000   0.999  0.999  0.996  1.000  0.999  0.998
                 25   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 50   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000
     Fisher      15   1.000   0.999  0.999  0.996  1.000  0.999  0.999
                 25   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 50   1.000   1.000  1.000  1.000  1.000  1.000  1.000
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000

Note: 1. The data generating process is $y_{it} = r_{it} + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_i^2)$, $\varepsilon_{it} = \rho_i \varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_i^2)$. 2. See Note 2 to Table 1. 3. See Note 3 to Table 1.

Table 7. Sizes of Panel Stationarity Tests: Trend Stationary Model, Serial Correlation

                              KPSS                 LMC
T    Test        N    p(l)=0  l1     l2     l3     p=1    p=2    p=3
25               1    0.091   0.067  0.044  0.003  0.056  0.057  0.066
     Group Mean  15   0.657   0.252  0.016  0.007  0.144  0.156  0.142
                 25   0.808   0.314  0.001  0.004  0.151  0.181  0.174
                 50   0.975   0.495  0.005  0.000  0.183  0.267  0.245
                 100  0.999   0.723  0.001  0.000  0.223  0.377  0.338
     Fisher      15   0.610   0.226  0.014  0.003  0.108  0.134  0.127
                 25   0.775   0.292  0.012  0.002  0.121  0.158  0.157
                 50   0.966   0.459  0.005  0.000  0.149  0.239  0.231
                 100  0.999   0.700  0.001  0.000  0.185  0.362  0.333

50               1    0.094   0.060  0.051  0.040  0.057  0.062  0.061
     Group Mean  15   0.758   0.177  0.058  0.019  0.079  0.134  0.160
                 25   0.931   0.252  0.053  0.011  0.091  0.160  0.194
                 50   0.991   0.332  0.048  0.006  0.092  0.189  0.237
                 100  1.000   0.524  0.048  0.001  0.091  0.251  0.341
     Fisher      15   0.717   0.155  0.049  0.017  0.060  0.096  0.120
                 25   0.913   0.224  0.050  0.010  0.066  0.118  0.159
                 50   0.988   0.305  0.050  0.007  0.072  0.138  0.198
                 100  1.000   0.500  0.056  0.002  0.067  0.189  0.297

100              1    0.092   0.053  0.044  0.041  0.049  0.048  0.054
     Group Mean  15   0.789   0.138  0.059  0.042  0.062  0.076  0.098
                 25   0.928   0.171  0.061  0.039  0.056  0.076  0.101
                 50   0.998   0.259  0.069  0.032  0.053  0.075  0.115
                 100  1.000   0.377  0.066  0.026  0.051  0.074  0.133
     Fisher      15   0.752   0.114  0.052  0.039  0.046  0.052  0.064
                 25   0.911   0.148  0.054  0.036  0.043  0.057  0.077
                 50   0.997   0.236  0.062  0.031  0.046  0.055  0.081
                 100  1.000   0.354  0.063  0.027  0.046  0.051  0.091

Note: 1. The data generating process is $y_{it} = r_{i0} + \beta_i t + \varepsilon_{it}$, $\varepsilon_{it} = \rho_i \varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_i^2)$. 2. See Note 2 to Table 1. 3. See Note 3 to Table 1.

Table 8. Size-Adjusted Powers of Panel Stationarity Tests: Trend Stationary Model, Serial Correlation

                              KPSS                 LMC
T    Test        N    p(l)=0  l1     l2     l3     p=1    p=2    p=3
25               1    0.065   0.060  0.051  0.044  0.064  0.065  0.054
     Group Mean  15   0.055   0.068  0.076  0.054  0.059  0.066  0.064
                 25   0.088   0.089  0.076  0.044  0.072  0.078  0.071
                 50   0.130   0.122  0.088  0.037  0.086  0.100  0.090
                 100  0.203   0.185  0.106  0.036  0.120  0.122  0.090
     Fisher      15   0.052   0.067  0.072  0.053  0.056  0.061  0.066
                 25   0.088   0.085  0.078  0.042  0.072  0.077  0.076
                 50   0.132   0.119  0.087  0.037  0.091  0.098  0.090
                 100  0.203   0.178  0.106  0.032  0.119  0.121  0.089

50               1    0.123   0.109  0.097  0.087  0.100  0.090  0.086
     Group Mean  15   0.389   0.311  0.240  0.139  0.240  0.207  0.186
                 25   0.381   0.381  0.324  0.213  0.302  0.238  0.190
                 50   0.693   0.603  0.489  0.274  0.437  0.367  0.324
                 100  0.905   0.827  0.699  0.391  0.674  0.548  0.478
     Fisher      15   0.389   0.312  0.252  0.140  0.234  0.203  0.189
                 25   0.377   0.384  0.330  0.216  0.312  0.247  0.199
                 50   0.696   0.608  0.481  0.270  0.444  0.374  0.333
                 100  0.903   0.829  0.694  0.403  0.680  0.554  0.502

100              1    0.302   0.264  0.235  0.208  0.273  0.236  0.200
     Group Mean  15   0.935   0.908  0.853  0.767  0.881  0.816  0.707
                 25   0.993   0.984  0.958  0.909  0.976  0.930  0.855
                 50   1.000   1.000  0.998  0.993  1.000  0.995  0.981
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000
     Fisher      15   0.934   0.902  0.849  0.754  0.886  0.823  0.735
                 25   0.993   0.984  0.955  0.882  0.974  0.939  0.869
                 50   1.000   1.000  0.998  0.990  1.000  0.997  0.985
                 100  1.000   1.000  1.000  1.000  1.000  1.000  1.000

Note: 1. The data generating process is $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_i^2)$, $\varepsilon_{it} = \rho_i \varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_i^2)$. 2. See Note 2 to Table 1. 3. See Note 3 to Table 1.
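The data generating processes in the notes to Tables 1–8 nest into a single simulator, sketched below under stated assumptions. The chapter draws normals with GAUSS's RNDN function; this sketch uses Python's random.gauss instead. The values of r_i0, β_i and σ_i are not given in this section. The standard normal draws for r_i0 and β_i, the default σ_i = 1, and starting each AR(1) error from its stationary distribution are therefore hypothetical choices. Setting q = 0 gives the null (sizes) and q = 0.001 gives the alternative used for the powers. Setting rho = 0 gives the white noise case, and trend=True gives the trend stationary model.

```python
import math
import random


def simulate_panel(N, T, q=0.0, rho=0.0, trend=False, sigmas=None, seed=None):
    """Simulate y_it = r_it (+ beta_i * t) + e_it for i = 1..N, t = 1..T.

    r_it = r_{i,t-1} + eta_it with eta_it ~ N(0, q * sigma_i^2) (q = 0 keeps r_it = r_i0);
    e_it = rho * e_{i,t-1} + u_it with u_it ~ N(0, (1 - rho^2) * sigma_i^2).
    """
    rng = random.Random(seed)
    sigmas = sigmas if sigmas is not None else [1.0] * N
    panel = []
    for i in range(N):
        s = sigmas[i]
        r = rng.gauss(0.0, 1.0)                          # hypothetical r_i0
        beta = rng.gauss(0.0, 1.0) if trend else 0.0     # hypothetical beta_i
        e = rng.gauss(0.0, s)                            # e_i0 from the stationary distribution
        y = []
        for t in range(1, T + 1):
            if q > 0.0:
                r += rng.gauss(0.0, math.sqrt(q) * s)    # random-walk component under H1
            e = rho * e + rng.gauss(0.0, math.sqrt(1.0 - rho ** 2) * s)
            y.append(r + beta * t + e)
        panel.append(y)
    return panel


if __name__ == "__main__":
    # Alternative, trend stationary model with serial correlation (rho = 0.5 is hypothetical)
    panel = simulate_panel(N=15, T=25, q=0.001, rho=0.5, trend=True, seed=1)
    print(len(panel), len(panel[0]))
```

Scaling the AR innovation variance by (1 − ρ_i²) keeps the variance of ε_it at σ_i², so the serially correlated and white noise designs share the same noise level.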
IV. CONCLUSION
In this chapter, we developed several tests for stationarity in heterogeneous panels. We analyzed both level stationary and trend stationary models. Allowing for the maximum degree of heterogeneity in the panel, we considered two different ways to pool the information on the null hypothesis from each cross-section unit: the group mean test and the Fisher test. The group mean test pools the univariate test statistics, while the Fisher test combines the p-values of the individual tests. For the univariate stationarity tests, we consider the KPSS and LMC tests in the case of serial correlation. The group mean tests based on the KPSS and LMC tests are asymptotically normal, while the Fisher test statistics follow χ² distributions. The small sample performance of the tests was investigated via Monte Carlo simulation experiments. The simulation results showed that the proposed tests have quite satisfactory size and power performance. In general, the Fisher-type tests have better size performance than the group-mean-type tests, while their power performances are similar. When there is serial correlation, the tests based on the KPSS tests with l2 and the LMC tests with p = 1 perform very similarly in terms of size and power in most cases, except for the short time span (T = 25). The size performance of these two tests is quite good in the presence of serial correlation when T = 50 and 100. However, there are still moderate to severe size distortions when T = 25 in the presence of serial correlation. In such a case, a bootstrap method might be an effective way to obtain better size performance; this would be an interesting topic for future research. Based on our simulation results, we recommend using either the group mean tests or the Fisher tests, based on the KPSS tests with l2 and the LMC tests with p = 1, to test for stationarity in heterogeneous panel data models in empirical work.
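The two pooling rules summarized above can be sketched in a few lines. The group mean statistic standardizes the average of the N individual statistics with the null mean and variance of a single statistic; in the chapter these moments are simulated for each T and test. The Fisher statistic P = −2 Σ ln p_i is compared with a χ² distribution with 2N degrees of freedom, whose upper tail has a closed form for even degrees of freedom. Here the individual p-values come from a simulated null distribution, in the spirit of Note 6. The (count + 1)/(draws + 1) convention, which keeps ln p_i finite, is our own choice, and all function names are hypothetical.

```python
import math


def group_mean_z(stats, mu, var):
    """sqrt(N) * (mean of individual statistics - mu) / sqrt(var); approx. N(0, 1) for large N."""
    N = len(stats)
    return math.sqrt(N) * (sum(stats) / N - mu) / math.sqrt(var)


def empirical_pvalue(stat, null_draws):
    """Right-tailed Monte Carlo p-value, (count + 1) / (draws + 1), so it is never zero."""
    exceed = sum(d >= stat for d in null_draws)
    return (exceed + 1) / (len(null_draws) + 1)


def fisher_stat(pvalues):
    """P = -2 * sum(ln p_i); chi-square with 2N degrees of freedom under the null."""
    return -2.0 * sum(math.log(p) for p in pvalues)


def chi2_sf_even(x, dof):
    """Upper-tail probability of a chi-square with even d.f. 2k:
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!."""
    k = dof // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    pvals = [0.20, 0.04, 0.35, 0.01, 0.50]        # hypothetical individual p-values
    P = fisher_stat(pvals)
    print(round(P, 3), round(chi2_sf_even(P, 2 * len(pvals)), 4))
```

The Fisher test is exact for any N, provided the individual p-values are exact and the units are independent. The group mean statistic relies on a central limit theorem in N, which matches the contrast drawn in the simulation discussion.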
ACKNOWLEDGMENTS
We would like to thank Badi Baltagi and three anonymous referees for their helpful comments. Of course, all remaining errors are ours.
NOTES
1. See, for example, Schwert (1987).
2. See KPSS for all relevant references and derivations of the tests.
3. Please see LMC for the details of this argument. Of course, this superiority depends on the correct specification of the LMC model, as pointed out by an anonymous referee.
4. This means that the intercepts can differ across cross-section units, which is one aspect of the heterogeneous panel.
5. The moment restriction in applying the Lindeberg–Lévy CLT should not be a problem here, because all the tests are variants of the LM test, which are bounded.
6. The small sample distributions of these tests can be derived by simulating series of a given length T under the null and applying the given test to the simulated series over a prespecified number of iterations.
7. In a recent paper, Choi (2000) proposes to standardize the Fisher test statistics as well. But this is unnecessary unless N is large enough.
8. Please see Maddala & Wu (1999) for a detailed comparison between the group mean and the Fisher tests.
9. By construction of the tests, the q_i's can be different across units.
REFERENCES
Anderson, T. W., & Darling, D. A. (1952). Asymptotic Theory of Certain 'Goodness of Fit' Criteria Based on Stochastic Processes. Annals of Mathematical Statistics, 23, 193–212.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey. Advances in Econometrics, 15, 7–51.
Choi, I. (1999). Unit Root Tests for Panel Data. Manuscript, Kookmin University.
Fisher, R. A. (1932). Statistical Methods for Research Workers (4th ed.). Edinburgh: Oliver and Boyd.
Hadri, K. (1998). Testing for Stationarity in Heterogeneous Panel Data. Working paper, School of Business and Economics, Exeter University.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Discussion paper, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics, 54, 91–115.
Leybourne, S. J., & McCabe, B. P. M. (1994). A Consistent Test for a Unit Root. Journal of Business and Economic Statistics, 12, 157–166.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test. Oxford Bulletin of Economics and Statistics, forthcoming.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57–84.
McCoskey, S., & Kao, C. (1997). A Monte Carlo Comparison of Tests for Cointegration in Panel Data. Working paper, Center for Policy Research and Department of Economics, Syracuse University.
Newey, W. K., & West, K. D. (1987). A Simple Positive Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703–708.
Pedroni, P. (1995). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests With an Application to the PPP Hypothesis. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests With an Application to the PPP Hypothesis, New Results. Working paper, Department of Economics, Indiana University.
Phillips, P. C. B., & Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 75, 335–346.
Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo Comparison. Working paper, Department of Economics, State University of New York at Buffalo.
INSTRUMENTAL VARIABLE ESTIMATION OF SEMIPARAMETRIC DYNAMIC PANEL DATA MODELS: MONTE CARLO RESULTS ON SEVERAL NEW AND EXISTING ESTIMATORS

M. Douglas Berg, Qi Li and Aman Ullah

ABSTRACT
We consider the problem of instrumental variable estimation of semiparametric dynamic panel data models. We propose several new semiparametric instrumental variable estimators for estimating a dynamic panel data model. Monte Carlo experiments show that the new estimators perform much better than the estimators suggested by Li & Stengos (1996) and Li & Ullah (1998).
I. INTRODUCTION
Economic research has been enriched by the availability of panel data that measure individual cross-sectional behavior over time. For reviews of the literature on estimation and inference in parametric panel data models, see Baltagi (1995), Chamberlain (1984), Hsiao (1986) and Matyas & Sevestre
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 297–315. 2000 by Elsevier Science Inc. ISBN: 0-7623-0688-2
M. DOUGLAS BERG, QI LI & AMAN ULLAH
(1996). Recently, semiparametric modeling and estimation have attracted much attention among statisticians and econometricians. One popular semiparametric model is the partially linear model. In this chapter we consider the problem of estimating a semiparametric dynamic panel data model that includes the following model as a special case:

$$y_{it} = \gamma y_{i,t-1} + \theta(z_{it}) + u_{it}, \tag{1.1}$$

where the functional form of $\theta(\cdot)$ is unspecified. Therefore (1.1) is a semiparametric dynamic panel data model. When $\theta(\cdot)$ has a known form, say $\theta(z_{it}) = z_{it}'\beta$, we obtain a parametric dynamic panel data model:

$$y_{it} = \gamma y_{i,t-1} + z_{it}'\beta + u_{it}. \tag{1.2}$$
When the error $u_{it}$ has a one-way error component structure, i.e. $u_{it} = \mu_i + \nu_{it}$, then $y_{i,t-1}$ and $u_{it}$ are correlated, because both depend on $\mu_i$. Instrumental variable methods are therefore needed to obtain a consistent estimate of $\gamma$. There is a rich literature on how to obtain consistent and efficient estimates for parametric dynamic panel models; see Ahn & Schmidt (1995), Anderson & Hsiao (1981), Arellano & Bover (1995), Baltagi & Griffin (1998), Pesaran & Smith (1995) and Kiviet (1995), among others. The consistency and efficiency of these estimators for the parametric dynamic panel data model (1.2) depend crucially on the correct specification of the model. If $\theta(z_{it}) \neq z_{it}'\beta$, parametric estimation methods based on the misspecified model (1.2) will in general lead to inconsistent estimates of $\gamma$. Semiparametric partially linear models have the advantage of leaving the functional form of $\theta(\cdot)$ unspecified. Hence a consistent semiparametric estimator of $\gamma$ based on (1.1) is robust to the functional form of $\theta(\cdot)$. There is a rich literature on estimating a partially linear model with independent data using various nonparametric techniques, e.g. Engle et al. (1986), Robinson (1988), Stock (1989), Donald & Newey (1994), Li (1996). Also, see Ullah & Roy (1998), Ullah & Mundra (1998), and Khanna et al. (1999) for the estimation and applications of static partially linear panel data models. However, little attention has been paid to dynamic partially linear panel data models. Li & Stengos (1996) and Li & Ullah (1998) discussed how to estimate model (1.1) by semiparametric instrumental variable methods. However, no simulations are reported in those works, and hence the finite sample performance of the estimators they proposed is unknown.1 Li & Stengos (1996) proposed a semiparametric OLS-type IV (OLS–IV) estimator of $\gamma$. When the error follows a one-way error components structure, the OLS-type estimator is not efficient because it
Semiparametric Dynamic Panel Data Model
ignores this error structure. Li & Ullah (1998) therefore proposed a semiparametric GLS-type IV (GLS–IV) estimator. However, the GLS–IV estimator in Li & Ullah (1998) does not make full use of the one-way error component structure. In fact, when the model is just identified, their semiparametric IV–GLS estimator reduces to a semiparametric IV–OLS estimator. Hence it is inefficient, in the sense that the one-way error component structure is not utilized in constructing the estimator. In this chapter we propose a new semiparametric IV–GLS estimator and a new semiparametric IV–Within estimator that are more efficient than those considered in Li & Stengos (1996) and Li & Ullah (1998). We then use Monte Carlo experiments to examine the finite sample performance of the new semiparametric estimators and of some existing estimators (e.g. Li & Ullah (1998) and Li & Stengos (1996)). Our simulation results show that the new estimators perform substantially better than the existing ones. The chapter is organized as follows. Section 2 first reviews the semiparametric estimators of Li & Stengos (1996) and Li & Ullah (1998), and then proposes some new estimators. Section 3 reports Monte Carlo simulations comparing the relative performance of the various estimators. Finally, Section 4 concludes the paper.
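A small simulation of the parametric special case (1.2) makes visible the correlation between $y_{i,t-1}$ and $u_{it}$ that motivates the IV approach. In the sketch, $\theta(z_{it})$ is set to zero. It is illustrative only: $\gamma = 0.5$, the variance components, and starting each unit at its long-run mean $\mu_i/(1-\gamma)$ are hypothetical choices. Because $y_{i,t-1}$ accumulates $\mu_i$, its correlation with $u_{it} = \mu_i + \nu_{it}$ is positive whenever $\sigma_\mu^2 > 0$, and pooled OLS overstates $\gamma$. With $\sigma_\mu = 0$ the correlation disappears.

```python
import random


def simulate_dynamic_panel(N, T, gamma, sigma_mu, sigma_nu, seed=0):
    """y_it = gamma * y_{i,t-1} + u_it with u_it = mu_i + nu_it (theta(z) set to zero)."""
    rng = random.Random(seed)
    lagged, errors, current = [], [], []
    for _ in range(N):
        mu = rng.gauss(0.0, sigma_mu)
        y_prev = mu / (1.0 - gamma)              # start at the unit's long-run mean
        for _ in range(T):
            u = mu + rng.gauss(0.0, sigma_nu)
            y = gamma * y_prev + u
            lagged.append(y_prev)
            errors.append(u)
            current.append(y)
            y_prev = y
    return lagged, errors, current


def corr(a, b):
    """Sample correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def pooled_ols_slope(x, y):
    """Slope from a pooled OLS regression of y on x and an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


if __name__ == "__main__":
    lag, u, y = simulate_dynamic_panel(N=500, T=10, gamma=0.5, sigma_mu=1.0, sigma_nu=1.0)
    print(round(corr(lag, u), 3), round(pooled_ols_slope(lag, y), 3))
```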
II. THE MODEL
We consider a slightly more general semiparametric dynamic panel data model than (1.1), which was considered in the introduction section:

$$y_{it} = x_{it}'\gamma + \theta(z_{it}) + u_{it}, \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T), \tag{2.1}$$
where $x_{it}$ is of dimension $p \times 1$, $\gamma$ is a $p \times 1$ unknown parameter vector, $z_{it}$ is of dimension $d$, and $\theta(\cdot)$ is an unknown smooth function. We assume that the first element of $x_{it}$ is $y_{i,t-1}$, so that model (2.1) is a semiparametric dynamic panel data model. We are mainly interested in obtaining an accurate estimate of $\gamma$. We consider the case where the error $u_{it}$ follows a one-way error components specification,

$$u_{it} = \mu_i + \nu_{it}, \tag{2.2}$$

where $\mu_i$ is i.i.d. $(0, \sigma_\mu^2)$, $\nu_{it}$ is i.i.d. $(0, \sigma_\nu^2)$, and $\mu_i$ and $\nu_{jt}$ are uncorrelated for all $i$, $j$ and $t$. In this chapter we propose a new semiparametric IV–GLS estimator that fully uses the one-way error component structure. We also propose a semiparametric IV-within-transformation estimator, which has the advantage of computational simplicity because it does not require one to estimate the
variance components. We then employ Monte Carlo simulations to investigate the finite sample performance of our proposed semiparametric IV estimators and compare them with some existing estimators. GLS type estimators require knowledge of error variance structure. In vector notation, the one-way error component model of (2.2) has the following form, u = (IN eT) + ,
(2.3)
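where $e_T$ is a $T \times 1$ vector of ones, $\mu = (\mu_1, \mu_2, \ldots, \mu_N)'$ is of dimension $N \times 1$, and $u$ and $\nu$ are both of dimension $NT \times 1$, with $u = (u_{11}, \ldots, u_{1T}, \ldots, u_{N1}, \ldots, u_{NT})'$ and $\nu$ similarly defined. The covariance matrix of $u$ is
$$\Omega = E(uu') = \sigma_\mu^2 (I_N \otimes J_T) + \sigma_\nu^2 I_{NT}, \tag{2.4}$$
$$\Omega = I_N \otimes [\sigma_1^2 \bar J_T + \sigma_\nu^2 E_T] \equiv I_N \otimes \Sigma, \tag{2.5}$$
where $J_T = e_T e_T'$ is the $T \times T$ matrix with all elements equal to one, $\bar J_T = J_T/T$, $E_T = I_T - \bar J_T$ and $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$. Since $\bar J_T E_T = 0$, $\bar J_T + E_T = I_T$, and both $\bar J_T$ and $E_T$ are idempotent matrices, it is easy to check that the inverse of $\Omega$ and its symmetric square root are given by (see Note 2)
$$\Omega^{-1} = I_N \otimes [(1/\sigma_1^2)\bar J_T + (1/\sigma_\nu^2)E_T] \equiv I_N \otimes \Sigma^{-1}, \tag{2.6}$$
$$\Omega^{-1/2} = I_N \otimes [(1/\sigma_1)\bar J_T + (1/\sigma_\nu)E_T] \equiv I_N \otimes \Sigma^{-1/2}. \tag{2.7}$$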
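As a quick numerical check of (2.4)–(2.7), the sketch below builds one $T \times T$ block in plain Python for illustrative values $T = 6$, $\sigma_\mu^2 = 8$ and $\sigma_\nu^2 = 2$ (the $\rho = 0.8$ design of Section III). It confirms that $\Sigma\Sigma^{-1} = I_T$, that $\Sigma^{-1/2}\Sigma^{-1/2} = \Sigma^{-1}$, and that (2.5) agrees with (2.4). Because $\Omega = I_N \otimes \Sigma$, checking one block suffices. The helper names (`jbar`, `e_mat`, `blocks`, and so on) are hypothetical, and this is an illustration rather than the authors' code.

```python
import math


def jbar(T):
    """bar J_T = J_T / T: the T x T matrix with every entry equal to 1/T."""
    return [[1.0 / T] * T for _ in range(T)]


def e_mat(T):
    """E_T = I_T - bar J_T, the within (demeaning) projection."""
    jb = jbar(T)
    return [[(1.0 if i == j else 0.0) - jb[i][j] for j in range(T)] for i in range(T)]


def combo(a, b, T):
    """Return a * bar J_T + b * E_T."""
    jb, e = jbar(T), e_mat(T)
    return [[a * jb[i][j] + b * e[i][j] for j in range(T)] for i in range(T)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def blocks(T, sig_mu2, sig_nu2):
    """Return the T x T blocks Sigma, Sigma^{-1} and Sigma^{-1/2} of (2.5)-(2.7)."""
    sig_12 = T * sig_mu2 + sig_nu2      # sigma_1^2 = T sigma_mu^2 + sigma_nu^2
    sigma = combo(sig_12, sig_nu2, T)
    sigma_inv = combo(1.0 / sig_12, 1.0 / sig_nu2, T)
    sigma_inv_half = combo(1.0 / math.sqrt(sig_12), 1.0 / math.sqrt(sig_nu2), T)
    return sigma, sigma_inv, sigma_inv_half


T = 6
S, S_inv, S_half = blocks(T, 8.0, 2.0)
identity = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
# Block of (2.4): sigma_mu^2 J_T + sigma_nu^2 I_T
direct = [[8.0 + (2.0 if i == j else 0.0) for j in range(T)] for i in range(T)]
print(max_abs_diff(matmul(S, S_inv), identity))
print(max_abs_diff(matmul(S_half, S_half), S_inv))
print(max_abs_diff(S, direct))
```

All three printed deviations are rounding-level. The design choice worth noting is that only the two scalars $\sigma_1^2$ and $\sigma_\nu^2$ change with the design; the projections $\bar J_T$ and $E_T$ never need a general matrix inversion.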
The expressions (2.6) and (2.7) for $\Omega^{-1}$ and $\Omega^{-1/2}$ will be used in the GLS estimation procedures discussed below.
A. Some Infeasible Estimators
Equation (2.1) contains the unknown function $\theta(\cdot)$; following Robinson (1988), we first eliminate $\theta(\cdot)$. Taking the expectation of (2.1) conditional on $z_{it}$ (with $E(u_{it}|z_{it}) = 0$) and subtracting it from (2.1) leads to
$$y_{it} - E(y_{it}|z_{it}) = (x_{it} - E(x_{it}|z_{it}))'\beta + u_{it} \equiv v_{it}'\beta + u_{it}, \tag{2.8}$$
where $v_{it} \equiv x_{it} - E(x_{it}|z_{it})$. In vector-matrix notation we have
$$y - E(y|z) = v\beta + u, \tag{2.9}$$
where $y$, $E(y|z)$ and $u$ are all $NT \times 1$ vectors with typical elements $y_{it}$, $E(y_{it}|z_{it})$ and $u_{it}$, respectively, and $v$ is of dimension $NT \times p$ with typical row $v_{it}' = (x_{it} - E(x_{it}|z_{it}))'$.
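The following minimal sketch illustrates the partialling-out step (2.8)–(2.9) in an assumed static, scalar design where the conditional means are known exactly. The choices $x = z^2 + e$ and $\theta(z) = \sin z$ are illustrative and do not come from the chapter. Because $x$ is exogenous in this toy design, least squares on $v$ recovers $\beta$; in the dynamic model $v_{it}$ is correlated with $u_{it}$, which is why the IV estimators discussed next are needed.

```python
import math
import random


def partialled_out_ols(n, beta, seed=0):
    """Infeasible Robinson-type step: regress y - E(y|z) on x - E(x|z).

    Hypothetical static design: x = z^2 + e and y = beta * x + sin(z) + u,
    so E(x|z) = z^2 and E(y|z) = beta * z^2 + sin(z) are known exactly.
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        z = rng.uniform(-math.sqrt(3), math.sqrt(3))
        e, u = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z * z + e
        y = beta * x + math.sin(z) + u
        v = x - z * z                                  # v = x - E(x|z)
        y_tilde = y - (beta * z * z + math.sin(z))     # y - E(y|z) = v * beta + u
        num += v * y_tilde
        den += v * v
    return num / den                                   # least squares slope through the origin


print(partialled_out_ols(20000, 0.5))
```

The unknown $\theta(\cdot)$ never enters the final regression, which is the point of (2.8).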
Semiparametric Dynamic Panel Data Model
Equation (2.9) no longer contains the unknown function $\theta(\cdot)$. Note that $v_{it}$ and $u_{it}$ are correlated because $v_{it}$ contains $y_{i,t-1}$ and $u_{it}$ contains the random individual effect $\mu_i$. Suppose there exists a $q \times 1$ ($q \ge p$) instrumental variable $\xi_{it}$ that is correlated with $x_{it}$ and uncorrelated with $u_{it}$; then we can use $w_{it} \equiv \xi_{it} - E(\xi_{it}|z_{it})$ as an IV for $v_{it}$. For example, in the simple case where both $x_{it}$ and $z_{it}$ are scalars with $x_{it} = y_{i,t-1}$ and $z_{it}$ strictly exogenous, one can choose $\xi_{it} = z_{i,t-1}$ as an instrument for $y_{i,t-1}$. In vector-matrix notation, an (infeasible) IV–OLS estimator of $\beta$ based on (2.9) is (see White (1984, 1986) for a discussion of IV estimation)
$$\tilde\beta_{IVO} = (v'ww'v)^{-1}v'ww'(y - E(y|z)) = \beta + (v'ww'v)^{-1}v'ww'u. \tag{2.10}$$
When the model is just identified, i.e. $p = q$, and $w'v$ is invertible, $\tilde\beta_{IVO}$ becomes
$$\tilde\beta_{IVO} = (w'v)^{-1}(v'w)^{-1}v'ww'(y - E(y|z)) = (w'v)^{-1}w'(y - E(y|z)). \tag{2.11}$$
The IV–OLS estimator is not efficient because it ignores the error component variance structure. Li & Ullah (1998) suggested estimating $\beta$ by
$$\tilde\beta = (v'w(w'\Omega^{-1}w)^{-1}w'v)^{-1}v'w(w'\Omega^{-1}w)^{-1}w'(y - E(y|z)). \tag{2.12}$$
However, when $q = p$ and the square matrices $v'w$ and $w'\Omega^{-1}w$ are both invertible, (2.12) gives
$$\tilde\beta = (w'v)^{-1}(w'\Omega^{-1}w)(v'w)^{-1}v'w(w'\Omega^{-1}w)^{-1}w'(y - E(y|z)) = (w'v)^{-1}w'(y - E(y|z)) = \tilde\beta_{IVO};$$
that is, $\tilde\beta$ reduces to the IV–OLS estimator (2.11) when the model is just identified. Therefore, the IV estimator $\tilde\beta$ also ignores the variance component structure in the just identified case. A new IV–GLS estimator that fully uses the one-way error component structure is given by
$$\tilde\beta_{IVG} = (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z))$$
$$\qquad = \beta + (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}u. \tag{2.13}$$
The estimator $\tilde\beta_{IVG}$ of (2.13) is an optimal IV estimator in the sense discussed in White (1984, 1986). When the model is just identified, i.e. $p = q$, and both $w'\Omega^{-1}v$ and $w'\Omega^{-1}w$ are invertible, $\tilde\beta_{IVG}$ of (2.13) becomes
$$\tilde\beta_{IVG} = (w'\Omega^{-1}v)^{-1}(w'\Omega^{-1}w)(v'\Omega^{-1}w)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z)) = (w'\Omega^{-1}v)^{-1}w'\Omega^{-1}(y - E(y|z)), \tag{2.14}$$
which differs from $\tilde\beta_{IVO}$ of (2.11). Note that one can transform the model by premultiplying $y$, $v$ and $w$ by $\Omega^{-1/2}$. Denote $y^* = \Omega^{-1/2}y$, $v^* = \Omega^{-1/2}v$ and $w^* = \Omega^{-1/2}w$; then the IV–GLS estimator (2.13) is simply
$$\tilde\beta_{IVG} = (v^{*\prime}w^*(w^{*\prime}w^*)^{-1}w^{*\prime}v^*)^{-1}v^{*\prime}w^*(w^{*\prime}w^*)^{-1}w^{*\prime}(y^* - \Omega^{-1/2}E(y|z)), \tag{2.15}$$
which is easier to compute since it does not require one to invert an $NT \times NT$ matrix. Let $n = NT$. Under the conditions (i) $w'u/n \xrightarrow{p} 0$ ($w$ is a legitimate IV), (ii) $v'\Omega^{-1}w/n \xrightarrow{p} A$, and (iii) $w'\Omega^{-1}w/n \xrightarrow{p} B$, a positive definite matrix, one can show that
$$\sqrt{n}(\tilde\beta_{IVG} - \beta) \xrightarrow{d} N\big(0, (AB^{-1}A')^{-1}\big). \tag{2.16}$$
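The just-identified algebra in (2.11), (2.12), (2.14) and (2.15) can be checked numerically in the scalar case $p = q = 1$. The sketch below treats $y - E(y|z)$, $v$ and $w$ as known (infeasible) vectors stacked by individual and uses arbitrary made-up data; it is not the authors' code. It confirms that the Li & Ullah form (2.12) collapses to IV–OLS (2.11), and that the direct IV–GLS formula (2.14) equals plain IV on $\Omega^{-1/2}$-transformed data (2.15). $\Omega^{-1}$ and $\Omega^{-1/2}$ are applied block by block through (2.6)–(2.7), so no $NT \times NT$ matrix is ever formed. All function names are illustrative.

```python
import math
import random


def apply_block(a, b, x, N, T):
    """Return (I_N kron [a * bar J_T + b * E_T]) x for x stacked by (i, t).

    For each individual this is a * (time mean) + b * (deviation from the mean).
    """
    out = []
    for i in range(N):
        seg = x[i * T:(i + 1) * T]
        m = sum(seg) / T
        out.extend(a * m + b * (s - m) for s in seg)
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def iv_ols(v, w, yt):
    """(2.11): (w'v)^{-1} w'(y - E(y|z))."""
    return dot(w, yt) / dot(w, v)


def li_ullah(v, w, yt, om_inv):
    """(2.12) with scalars v'w and w'Omega^{-1}w."""
    vw, wow = dot(v, w), dot(w, om_inv(w))
    return (vw / wow * dot(w, yt)) / (vw / wow * dot(w, v))


def iv_gls(v, w, yt, om_inv):
    """(2.14): (w'Omega^{-1}v)^{-1} w'Omega^{-1}(y - E(y|z))."""
    ow = om_inv(w)
    return dot(ow, yt) / dot(ow, v)


def iv_gls_transformed(v, w, yt, om_inv_half):
    """(2.15): plain IV on Omega^{-1/2}-transformed data, scalar case."""
    vs, ws, ys = om_inv_half(v), om_inv_half(w), om_inv_half(yt)
    vsws, wsws = dot(vs, ws), dot(ws, ws)
    return (vsws / wsws * dot(ws, ys)) / (vsws / wsws * dot(ws, vs))


N, T = 4, 6
sig_mu2, sig_nu2 = 8.0, 2.0
sig_12 = T * sig_mu2 + sig_nu2


def om_inv(x):          # Omega^{-1} x via (2.6)
    return apply_block(1.0 / sig_12, 1.0 / sig_nu2, x, N, T)


def om_inv_half(x):     # Omega^{-1/2} x via (2.7)
    return apply_block(1.0 / math.sqrt(sig_12), 1.0 / math.sqrt(sig_nu2), x, N, T)


rng = random.Random(1)
w = [rng.gauss(0, 1) for _ in range(N * T)]        # made-up instrument
v = [0.8 * wi + rng.gauss(0, 1) for wi in w]       # regressor correlated with w
yt = [0.5 * vi + rng.gauss(0, 1) for vi in v]      # stands in for y - E(y|z)
print(iv_ols(v, w, yt), li_ullah(v, w, yt, om_inv))
print(iv_gls(v, w, yt, om_inv), iv_gls_transformed(v, w, yt, om_inv_half))
```

The first line prints two identical numbers (the Li & Ullah estimator is IV–OLS when $p = q$); the second prints two identical IV–GLS values that in general differ from the first pair.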
The proof of (2.16) is similar to the proof of Lemma 3 of Li & Ullah (1998) and is therefore omitted here. Next we propose a simple IV estimator based on the within transformation. Within-type estimators have the advantage of computational simplicity: they only require a regression on the within-transformed variables. Define $\phi_{it} = E(y_{it}|z_{it})$ and the within-transformed variables $\tilde y_{it} = y_{it} - \bar y_{i\cdot}$, $\tilde\phi_{it} = \phi_{it} - \bar\phi_{i\cdot}$, $\tilde v_{it} = v_{it} - \bar v_{i\cdot}$ and $\tilde w_{it} = w_{it} - \bar w_{i\cdot}$, where $\bar y_{i\cdot} = T^{-1}\sum_{s=1}^T y_{is}$, and $\bar\phi_{i\cdot}$, $\bar v_{i\cdot}$ and $\bar w_{i\cdot}$ are similarly defined. The IV–Within estimator is given by
$$\tilde\beta_{IVW} = (\tilde v'\tilde w\tilde w'\tilde v)^{-1}\tilde v'\tilde w\tilde w'(\tilde y - \tilde\phi). \tag{2.17}$$
When the model is just identified, we have
$$\tilde\beta_{IVW} = (\tilde w'\tilde v)^{-1}(\tilde v'\tilde w)^{-1}\tilde v'\tilde w\tilde w'(\tilde y - \tilde\phi) = (\tilde w'\tilde v)^{-1}\tilde w'(\tilde y - \tilde\phi). \tag{2.18}$$
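A minimal sketch of the just-identified IV–Within estimator (2.18) for scalar $v$ and $w$, again on made-up data: each individual's $T$ observations are demeaned (premultiplication by $I_N \otimes E_T$) and a simple IV ratio is computed. Here `yt` stands for $y - E(y|z)$, so its demeaned version plays the role of $\tilde y - \tilde\phi$. The example also illustrates the property invoked in Section IV: adding an arbitrary individual-specific constant to $y$, whether a fixed or a random effect, leaves the estimate unchanged because demeaning removes it. Names and data are hypothetical.

```python
import random


def within(x, N, T):
    """(I_N kron E_T) x: subtract each individual's time mean (x stacked by (i, t))."""
    out = []
    for i in range(N):
        seg = x[i * T:(i + 1) * T]
        m = sum(seg) / T
        out.extend(s - m for s in seg)
    return out


def iv_within(v, w, yt, N, T):
    """(2.18) in the scalar case: (w~'v~)^{-1} w~'(y~ - phi~)."""
    vt, wt, ytt = within(v, N, T), within(w, N, T), within(yt, N, T)
    return sum(a * b for a, b in zip(wt, ytt)) / sum(a * b for a, b in zip(wt, vt))


N, T = 5, 6
rng = random.Random(2)
w = [rng.gauss(0, 1) for _ in range(N * T)]
v = [0.8 * wi + rng.gauss(0, 1) for wi in w]
yt = [0.5 * vi + rng.gauss(0, 1) for vi in v]          # stands in for y - E(y|z)
effects = [rng.gauss(0, 3) for _ in range(N)]          # hypothetical individual effects
yt_shifted = [y + effects[k // T] for k, y in enumerate(yt)]
print(iv_within(v, w, yt, N, T), iv_within(v, w, yt_shifted, N, T))  # equal up to rounding
```

No variance component enters this computation, which is the computational advantage claimed for the within estimator.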
The within-type estimator has the advantage of being computationally simple because it does not require one to estimate the error variance $\Omega$.
B. Feasible Estimators
The estimators $\tilde\beta_{IVO}$, $\tilde\beta_{IVG}$ and $\tilde\beta_{IVW}$ discussed above are infeasible because the conditional mean functions $E(y|z)$, $E(x|z)$ and $E(w|z)$, as well as $\Omega$, are unknown. Feasible estimators can be obtained by replacing the unknown conditional mean functions with nonparametric estimators, such as kernel estimators, and replacing $\sigma_1^2$ and $\sigma_\nu^2$ with consistent estimators. Following Robinson (1988), we use a kernel method to estimate the unknown conditional expectations. Specifically, we denote the kernel estimators of $f(z_{it})$, $E(y_{it}|z_{it})$, $E(x_{it}|z_{it})$ and $E(w_{it}|z_{it})$ by $\hat f_{it}$, $\hat y_{it}$, $\hat x_{it}$ and $\hat w_{it}$, respectively, where
$$\hat f_{it} = \frac{1}{NTh^d}\sum_j\sum_s K_{it,js}, \tag{2.19}$$
$$\hat y_{it} = \frac{1}{NTh^d}\sum_j\sum_s y_{js}K_{it,js}\Big/\hat f_{it}, \tag{2.20}$$
$$\hat x_{it} = \frac{1}{NTh^d}\sum_j\sum_s x_{js}K_{it,js}\Big/\hat f_{it}, \tag{2.21}$$
and
$$\hat w_{it} = \frac{1}{NTh^d}\sum_j\sum_s w_{js}K_{it,js}\Big/\hat f_{it}, \tag{2.22}$$
where $K_{it,js} = K((z_{it} - z_{js})/h)$, $K(\cdot)$ is the kernel function and $h$ is the smoothing parameter. Note that when $x_{it} = y_{i,t-1}$, we have
$$\hat x_{it} = \hat E(y_{i,t-1}|z_{it}) = \frac{1}{NTh^d}\sum_j\sum_s y_{j,s-1}K_{it,js}\Big/\hat f_{it}, \tag{2.23}$$
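which is different from $\hat y_{i,t-1} = \hat E(y_{i,t-1}|z_{i,t-1}) = (NTh^d)^{-1}\sum_j\sum_s y_{j,s-1}K_{i,t-1;j,s-1}/\hat f_{i,t-1}$. We estimate $v_{it} \equiv x_{it} - E(x_{it}|z_{it})$ by $x_{it} - \hat x_{it}$, and we estimate $w_{it} \equiv \xi_{it} - E(\xi_{it}|z_{it})$ by $\xi_{it} - \hat\xi_{it}$, where
$$\hat\xi_{it} = \frac{1}{NTh^d}\sum_j\sum_s \xi_{js}K_{it,js}\Big/\hat f_{it} \tag{2.24}$$
is the kernel estimator of $E(\xi_{it}|z_{it})$. In vector-matrix notation, the feasible IV–OLS estimator of $\beta$ is obtained from (2.10) by replacing $E(y_{it}|z_{it})$, $v_{it} = x_{it} - E(x_{it}|z_{it})$ and $w_{it} = \xi_{it} - E(\xi_{it}|z_{it})$ with their kernel estimators $\hat y_{it}$, $x_{it} - \hat x_{it}$ and $\xi_{it} - \hat\xi_{it}$, respectively:
$$\hat\beta_{IVO} = [(x - \hat x)'(\xi - \hat\xi)(\xi - \hat\xi)'(x - \hat x)]^{-1}(x - \hat x)'(\xi - \hat\xi)(\xi - \hat\xi)'(y - \hat y). \tag{2.25}$$
Similarly, we have
$$\hat\beta_{IVG} = \{(x - \hat x)'\hat\Omega^{-1}(\xi - \hat\xi)[(\xi - \hat\xi)'\hat\Omega^{-1}(\xi - \hat\xi)]^{-1}(\xi - \hat\xi)'\hat\Omega^{-1}(x - \hat x)\}^{-1}$$
$$\qquad \times (x - \hat x)'\hat\Omega^{-1}(\xi - \hat\xi)[(\xi - \hat\xi)'\hat\Omega^{-1}(\xi - \hat\xi)]^{-1}(\xi - \hat\xi)'\hat\Omega^{-1}(y - \hat y), \tag{2.26}$$
where $\hat\Omega^{-1}$ is a consistent estimator of $\Omega^{-1}$ given by
$$\hat\Omega^{-1} = I_N \otimes \hat\Sigma^{-1}, \tag{2.27}$$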
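with
$$\hat\Sigma^{-1} = (1/\hat\sigma_\nu^2)E_T + (1/\hat\sigma_1^2)\bar J_T, \tag{2.28}$$
$$\hat\sigma_\nu^2 = \hat u'(I_N \otimes E_T)\hat u/[N(T-1)], \tag{2.29}$$
$$\hat\sigma_1^2 = T\hat\sigma_\mu^2 + \hat\sigma_\nu^2, \tag{2.30}$$
$$\hat\sigma_\mu^2 = [\hat u'(I_N \otimes \bar J_T)\hat u/N - \hat\sigma_\nu^2]/T, \tag{2.31}$$
and $\hat u$ is of dimension $n \times 1$ with typical element
$$\hat u_{it} = y_{it} - \hat y_{it} - (x_{it} - \hat x_{it})'\hat\beta_{IVO}. \tag{2.32}$$
Since $\hat u'(I_N \otimes \bar J_T)\hat u/N$ is a consistent estimator of $T\sigma_\mu^2 + \sigma_\nu^2 = \sigma_1^2$, (2.30) and (2.31) together give $\hat\sigma_1^2 = \hat u'(I_N \otimes \bar J_T)\hat u/N$.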
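Estimating the variance components (2.29)–(2.31) from residuals only requires individual means and within deviations. The sketch below (plain Python, residuals stacked by $(i, t)$) returns $\hat\sigma_\nu^2$, $\hat\sigma_1^2$ and $\hat\sigma_\mu^2$, plus the two coefficients $1/\hat\sigma_1$ and $1/\hat\sigma_\nu$ that define $\hat\Sigma^{-1/2}$, i.e. the block transformation applied before the feasible IV–GLS step. The function names are hypothetical. Note that in small samples $\hat\sigma_\mu^2$ can come out negative; no truncation at zero is attempted here.

```python
import math
import random


def variance_components(u_hat, N, T):
    """(2.29)-(2.31) from residuals u_hat ordered as (i, t)."""
    within_ss = 0.0    # u'(I_N kron E_T)u: squared deviations from individual means
    between_ss = 0.0   # u'(I_N kron bar J_T)u = sum_i T * (individual mean)^2
    for i in range(N):
        seg = u_hat[i * T:(i + 1) * T]
        m = sum(seg) / T
        within_ss += sum((s - m) ** 2 for s in seg)
        between_ss += T * m * m
    sig_nu2 = within_ss / (N * (T - 1))   # (2.29)
    sig_12 = between_ss / N               # T * sig_mu2 + sig_nu2, cf. (2.30)
    sig_mu2 = (sig_12 - sig_nu2) / T      # (2.31)
    return sig_nu2, sig_12, sig_mu2


def quasi_demean_weights(sig_nu2, sig_12):
    """Coefficients of bar J_T and E_T in Sigma^{-1/2}, cf. (2.7) and (2.28)."""
    return 1.0 / math.sqrt(sig_12), 1.0 / math.sqrt(sig_nu2)


# Check on simulated errors u_it = mu_i + nu_it with sigma_mu^2 = 8, sigma_nu^2 = 2.
rng = random.Random(3)
N, T = 4000, 6
u = []
for _ in range(N):
    mu = rng.gauss(0, math.sqrt(8.0))
    u.extend(mu + rng.gauss(0, math.sqrt(2.0)) for _ in range(T))
print(variance_components(u, N, T))   # population values: (2, 50, 8)
```

With true errors in place of $\hat u$ the estimates are close to the population values; in the chapter $\hat u$ comes from the feasible IV–OLS fit (2.32).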
For the feasible semiparametric IV–Within estimator, we use the same tilde notation to denote the feasible quantities, to avoid introducing too much new notation. For example, we use $\tilde v_{it}$ to denote the kernel estimator of $v_{it} - \bar v_{i\cdot}$. Recall that $v_{it} = x_{it} - E(x_{it}|z_{it})$. Hence
$$\tilde v_{it} = (x_{it} - \hat x_{it}) - \frac{1}{T}\sum_{s=1}^T (x_{is} - \hat x_{is}). \tag{2.33}$$
Similarly, recalling that $w_{it} = \xi_{it} - E(\xi_{it}|z_{it})$ and $\phi_{it} = E(y_{it}|z_{it})$, we have
$$\tilde w_{it} = (\xi_{it} - \hat\xi_{it}) - \frac{1}{T}\sum_{s=1}^T (\xi_{is} - \hat\xi_{is}) \tag{2.34}$$
and
$$\tilde\phi_{it} = \hat\phi_{it} - \frac{1}{T}\sum_{s=1}^T \hat\phi_{is}, \tag{2.35}$$
where $\hat\phi_{it} = \hat y_{it}$ is given by (2.20). The variable $\tilde y_{it}$ remains $\tilde y_{it} = y_{it} - \bar y_{i\cdot}$. With the notation of (2.33)–(2.35), we obtain the feasible semiparametric IV–Within estimator
$$\hat\beta_{IVW} = (\tilde v'\tilde w\tilde w'\tilde v)^{-1}\tilde v'\tilde w\tilde w'(\tilde y - \tilde\phi). \tag{2.36}$$
In the next section we compare, via Monte Carlo simulations, the finite sample performance of the new estimators proposed in this chapter with that of the estimators suggested by Li & Stengos (1996) and Li & Ullah (1998).
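All of the feasible estimators above rely on the kernel smoothing step (2.19)–(2.24), which can be sketched for $d = 1$ as follows. Following the Appendix program, the sketch assumes a Gaussian kernel and the rule-of-thumb bandwidth $h = \mathrm{sd}(z)\,(NT)^{-1/5}$; both are choices made in that program, not requirements of the method. Because $\hat f_{it}$ cancels, each conditional-mean estimate is a Nadaraya–Watson weighted average over all $(j, s)$. For $x_{it} = y_{i,t-1}$ the regressand is the lag while the conditioning variable stays $z_{it}$, as in (2.23). The panel is stored as one flat list ordered by $(i, t)$, and the data and names below are hypothetical.

```python
import math
import statistics


def bandwidth(z):
    """Rule of thumb used in the Appendix program: sd(z) * n^(-1/5)."""
    return statistics.stdev(z) * len(z) ** (-0.2)


def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_fit(z, target, h):
    """Return (f_hat, m_hat) at every observation, cf. (2.19)-(2.22) and (2.24).

    f_hat_k = (n h)^{-1} sum_l K((z_k - z_l)/h)
    m_hat_k = sum_l target_l K((z_k - z_l)/h) / sum_l K((z_k - z_l)/h)
    """
    n = len(z)
    f_hat, m_hat = [], []
    for zk in z:
        weights = [gauss_kernel((zk - zl) / h) for zl in z]
        s = sum(weights)
        f_hat.append(s / (n * h))
        m_hat.append(sum(wl * tl for wl, tl in zip(weights, target)) / s)
    return f_hat, m_hat


# Made-up data: residualize y, its lag and the instrument on z_it.
z = [-1.2, -0.4, 0.3, 1.1, 0.8, -0.9]
y = [0.5, 0.1, 0.9, 2.0, 1.4, 0.2]
y_lag = [0.0, 0.5, 0.1, 0.9, 2.0, 1.4]      # hypothetical lagged values
xi = [0.7, -1.2, -0.4, 0.3, 1.1, 0.8]       # e.g. xi_it = z_{i,t-1}, hypothetical
h = bandwidth(z)
_, y_hat = kernel_fit(z, y, h)              # (2.20)
_, x_hat = kernel_fit(z, y_lag, h)          # E(y_{i,t-1} | z_it), as in (2.23)
_, xi_hat = kernel_fit(z, xi, h)            # (2.24)
v_hat = [a - b for a, b in zip(y_lag, x_hat)]   # x - x_hat
w_hat = [a - b for a, b in zip(xi, xi_hat)]     # xi - xi_hat
```

Each call costs $O(n^2)$ kernel evaluations, which matches the double loop in the Appendix program; for large $NT$ a binned or tree-based smoother would be the usual substitute.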
III. MONTE CARLO RESULTS
We use the following data generating process (DGP):
$$y_{it} = \beta y_{i,t-1} + z_{it} + \delta z_{it}^2 + \mu_i + \nu_{it} = \beta y_{i,t-1} + \theta(z_{it}) + \mu_i + \nu_{it}, \tag{2.37}$$
where $z_{it}$ is independently and uniformly distributed on the interval $[-\sqrt{3}, \sqrt{3}]$, $\mu_i$ is i.i.d. $N(0, \sigma_\mu^2)$ and $\nu_{it}$ is i.i.d. $N(0, \sigma_\nu^2)$. We choose $\beta = 0.5$ and $\delta = 0, 0.5, 1$. We fix the total variance at $\sigma_\mu^2 + \sigma_\nu^2 = 10$ and vary $\rho = \sigma_\mu^2/(\sigma_\mu^2 + \sigma_\nu^2)$ over 0.2, 0.5 and 0.8. We choose $\xi_{it} = z_{i,t-1}$ as the IV for $y_{i,t-1}$. For comparison, we also compute the following non-IV semiparametric estimators:
(I) A semiparametric OLS estimator given by
$$\hat\beta_{OLS} = [(x - \hat x)'(x - \hat x)]^{-1}(x - \hat x)'(y - \hat y). \tag{2.38}$$
(II) A semiparametric GLS estimator defined by
$$\hat\beta_{GLS} = [(x - \hat x)'\hat\Omega^{-1}(x - \hat x)]^{-1}(x - \hat x)'\hat\Omega^{-1}(y - \hat y). \tag{2.39}$$
(III) A semiparametric within estimator
$$\hat\beta_W = [\tilde v'\tilde v]^{-1}\tilde v'(\tilde y - \tilde\phi), \tag{2.40}$$
where $\tilde v_{it} = x_{it} - \hat x_{it} - (1/T)\sum_{s=1}^T (x_{is} - \hat x_{is})$ is as defined in (2.33), $\tilde y_{it} = y_{it} - (1/T)\sum_{s=1}^T y_{is}$, and $\tilde\phi_{it}$ is as defined in (2.35).
Estimators (I)–(III) do not use instrumental variables, and hence they are expected to have large bias because they ignore the fact that $y_{i,t-1}$ and $u_{it}$ are correlated. However, they are also expected to have smaller variances than the IV estimators. Therefore, in small and moderate samples, their mean square errors (MSE) are not necessarily larger than those of the semiparametric IV estimators. Of course, when the sample size is sufficiently large, we expect the semiparametric IV estimators to have smaller MSE because they are consistent, while the non-IV estimators are inconsistent: the bias of the non-IV estimators does not die out as the sample size increases. We report the estimated bias, standard deviation (Std) and root mean square error (Rmse) of all the estimators. These are computed via
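$$\mathrm{Bias}(\hat\beta) = M^{-1}\sum_{j=1}^M (\hat\beta_j - \beta), \qquad \mathrm{Std}(\hat\beta) = \Big\{M^{-1}\sum_{j=1}^M \big(\hat\beta_j - \mathrm{Mean}(\hat\beta)\big)^2\Big\}^{1/2},$$
and
$$\mathrm{Rmse}(\hat\beta) = \Big\{M^{-1}\sum_{j=1}^M (\hat\beta_j - \beta)^2\Big\}^{1/2},$$
where $M$ is the number of replications and $\hat\beta_j$ is the estimated value of $\beta$ in the $j$th replication. We use $M = 2000$ in all the simulations. We choose $T = 6$ and $N = 50, 100, 200, 500$.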
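The simulation design and the three summary statistics can be sketched as follows. The generator mirrors the Appendix program: each series starts at $y = 0$ and a burn-in of 30 periods is discarded before $T$ observations are kept, so the retained data are approximately draws from the stationary process. The burn-in length comes from that program rather than from the text. Std uses the $M^{-1}$ divisor of the formulas above, so $\mathrm{Rmse}^2 = \mathrm{Bias}^2 + \mathrm{Std}^2$ holds exactly. Names are hypothetical, and this sketch is not meant to reproduce the reported tables.

```python
import math
import random


def simulate_panel(N, T, beta=0.5, delta=0.0, rho=0.8, total_var=10.0,
                   burn_in=30, seed=0):
    """Draw (y, y_lag, z, z_lag) from DGP (2.37); each is a list of N rows of length T."""
    rng = random.Random(seed)
    sig_mu = math.sqrt(rho * total_var)
    sig_nu = math.sqrt((1.0 - rho) * total_var)
    a = math.sqrt(3.0)
    n_per = burn_in + T
    y, y_lag, z, z_lag = [], [], [], []
    for _ in range(N):
        mu = sig_mu * rng.gauss(0.0, 1.0)                  # mu_i ~ N(0, sigma_mu^2)
        zs = [rng.uniform(-a, a) for _ in range(n_per)]    # z_it ~ U[-sqrt(3), sqrt(3)]
        ys = [0.0]                                         # start the recursion at y = 0
        for t in range(1, n_per):
            ys.append(beta * ys[-1] + zs[t] + delta * zs[t] ** 2
                      + mu + sig_nu * rng.gauss(0.0, 1.0))  # nu_it ~ N(0, sigma_nu^2)
        y.append(ys[burn_in:])
        y_lag.append(ys[burn_in - 1:n_per - 1])
        z.append(zs[burn_in:])
        z_lag.append(zs[burn_in - 1:n_per - 1])
    return y, y_lag, z, z_lag


def summary(estimates, beta):
    """Bias, Std and Rmse over M replications, with M^{-1} divisors as in the text."""
    M = len(estimates)
    mean = sum(estimates) / M
    bias = mean - beta
    std = math.sqrt(sum((b - mean) ** 2 for b in estimates) / M)
    rmse = math.sqrt(sum((b - beta) ** 2 for b in estimates) / M)
    return bias, std, rmse


y, y_lag, z, z_lag = simulate_panel(N=50, T=6)
print(summary([0.4, 0.6, 0.5], beta=0.5))
```

A full replication would loop `simulate_panel` $M$ times with different seeds, apply the kernel and IV steps to each draw, and pass the collected $\hat\beta_j$ to `summary`.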
The simulation results are given in Tables 1 and 2. The smallest Rmse in each case (for a given $N$ and $\rho$) is shown in boldface. The simulation results are qualitatively similar for $\delta = 0$, $\delta = 0.5$ and $\delta = 1$; therefore, to save space, we only report the cases $\delta = 0$ and $\delta = 1$. Table 1 reports the results for $\delta = 0$. From Table 1 we see that the non-IV estimators $\hat\beta_{OLS}$, $\hat\beta_{GLS}$ and $\hat\beta_W$ have large bias because they ignore the fact that $y_{i,t-1}$ is correlated with $u_{it}$. However, these non-IV estimators all have smaller standard deviations (or variances) than the semiparametric IV estimators. When $N$ is small ($N \le 100$) and $\rho$ is small to moderate ($\rho \le 0.5$), $\hat\beta_{GLS}$ has the smallest Rmse among all the estimators. For $N \le 100$ with $\rho = 0.8$, $\hat\beta_{GLS}$ is no longer the best: the strong individual effects make its bias very large, so its Rmse is much larger than that of the IV estimators, and $\hat\beta_{IVG}$ and $\hat\beta_{IVW}$ have the smallest Rmse. As $N$ increases, the bias of $\hat\beta_{OLS}$, $\hat\beta_{GLS}$ and $\hat\beta_W$ remains of the same order, as expected, while the variances of the IV estimators decrease; as a result, the IV estimators dominate the non-IV estimators when $N \ge 200$. For $N = 200$ and $N = 500$ with $\rho = 0.2$, the IV–OLS estimator $\hat\beta_{IVO}$ has the smallest Rmse. For larger values of $\rho$ ($\rho = 0.5, 0.8$), the IV–GLS and IV–Within estimators have much smaller Rmse than the IV–OLS estimator and are the best in terms of the Rmse criterion. The IV–OLS estimator ignores the one-way error component structure; hence, when the individual effects are large, its performance is expected to be worse than that of the IV–GLS estimator. We also observe, as expected, that the bias of the non-IV estimators increases as $\rho$ increases.
We also observe that, for $N \ge 200$, the Rmse of the IV–OLS estimator remains roughly the same for different values of $\rho$, while the Rmse of the IV–GLS and IV–Within estimators decreases as $\rho$ increases. Next, we observe that the results in Table 2 are very similar to those in Table 1; that is, the results are not sensitive to the functional form of $\theta(z_{it})$. This is as expected, because all the estimators are semiparametric and hence robust to the functional form of $\theta(\cdot)$. The DGP (2.37) is a just identified model. We have also conducted some simulations for an over-identified model. In particular, we consider the model (2.41) below.
Table 1. The case of δ = 0.
                  ρ = 0.2                  ρ = 0.5                  ρ = 0.8
           Bias    Std    Rmse      Bias    Std    Rmse      Bias    Std    Rmse
N = 50
β̂_OLS     0.193  0.045  0.198     0.352  0.030  0.353     0.442  0.016  0.442
β̂_GLS    -0.103  0.056  0.117     0.099  0.059  0.115     0.310  0.040  0.313
β̂_W      -0.241  0.058  0.248    -0.213  0.057  0.220    -0.136  0.061  0.149
β̂_IVO    -0.019  0.290  0.291    -0.042  0.329  0.331    -0.128  2.39   2.39
β̂_IVG    -0.006  0.215  0.215    -0.008  0.171  0.171    -0.012  0.111  0.112
β̂_IVW    -0.005  0.225  0.225    -0.009  0.174  0.174    -0.013  0.111  0.112
N = 100
β̂_OLS     0.196  0.031  0.199     0.354  0.021  0.355     0.443  0.011  0.443
β̂_GLS    -0.104  0.039  0.111     0.100  0.040  0.108     0.312  0.027  0.313
β̂_W      -0.243  0.041  0.246    -0.220  0.040  0.223    -0.154  0.042  0.160
β̂_IVO    -0.008  0.139  0.139    -0.023  0.158  0.159    -0.049  0.528  0.530
β̂_IVG    -0.006  0.146  0.146    -0.007  0.117  0.117    -0.009  0.077  0.077
β̂_IVW    -0.006  0.150  0.151    -0.008  0.118  0.118    -0.010  0.076  0.077
N = 200
β̂_OLS     0.198  0.021  0.200     0.356  0.015  0.356     0.444  0.008  0.444
β̂_GLS    -0.105  0.027  0.108     0.100  0.029  0.104     0.312  0.020  0.312
β̂_W      -0.244  0.029  0.246    -0.224  0.029  0.226    -0.166  0.029  0.168
β̂_IVO    -0.004  0.097  0.097    -0.010  0.101  0.101    -0.016  0.106  0.107
β̂_IVG    -0.004  0.103  0.103    -0.005  0.083  0.083    -0.007  0.054  0.055
β̂_IVW    -0.005  0.105  0.105    -0.006  0.084  0.084    -0.007  0.054  0.055
N = 500
β̂_OLS     0.199  0.014  0.200     0.357  0.009  0.357     0.444  0.005  0.444
β̂_GLS    -0.105  0.017  0.106     0.100  0.018  0.101     0.311  0.013  0.311
β̂_W      -0.245  0.019  0.245    -0.227  0.018  0.228    -0.176  0.018  0.177
β̂_IVO    -0.001  0.058  0.058    -0.003  0.057  0.057    -0.004  0.057  0.058
β̂_IVG    -0.006  0.065  0.065    -0.006  0.052  0.053    -0.006  0.034  0.034
β̂_IVW    -0.006  0.066  0.067    -0.006  0.053  0.053    -0.006  0.034  0.034
Table 2. The case of δ = 1.
                  ρ = 0.2                  ρ = 0.5                  ρ = 0.8
           Bias    Std    Rmse      Bias    Std    Rmse      Bias    Std    Rmse
N = 50
β̂_OLS     0.190  0.045  0.196     0.348  0.031  0.350     0.438  0.016  0.439
β̂_GLS    -0.104  0.055  0.117     0.092  0.058  0.109     0.298  0.041  0.301
β̂_W      -0.237  0.058  0.244    -0.208  0.057  0.216    -0.132  0.059  0.144
β̂_IVO    -0.021  0.301  0.302    -0.045  0.341  0.344    -0.168  3.53   3.53
β̂_IVG    -0.006  0.215  0.215    -0.008  0.171  0.172    -0.012  0.112  0.112
β̂_IVW    -0.005  0.225  0.225    -0.009  0.174  0.174    -0.013  0.111  0.112
N = 100
β̂_OLS     0.194  0.031  0.196     0.351  0.021  0.352     0.440  0.012  0.440
β̂_GLS    -0.104  0.039  0.111     0.094  0.040  0.102     0.299  0.028  0.301
β̂_W      -0.238  0.041  0.242    -0.214  0.040  0.218    -0.148  0.041  0.153
β̂_IVO    -0.008  0.139  0.139    -0.023  0.156  0.158    -0.042  0.243  0.246
β̂_IVG    -0.006  0.146  0.146    -0.007  0.117  0.118    -0.009  0.077  0.077
β̂_IVW    -0.006  0.150  0.150    -0.008  0.118  0.119    -0.010  0.077  0.077
N = 200
β̂_OLS     0.196  0.021  0.197     0.353  0.015  0.353     0.441  0.008  0.441
β̂_GLS    -0.104  0.027  0.108     0.093  0.029  0.097     0.298  0.021  0.299
β̂_W      -0.240  0.029  0.241    -0.218  0.028  0.220    -0.158  0.028  0.161
β̂_IVO    -0.004  0.097  0.097    -0.010  0.101  0.101    -0.016  0.106  0.107
β̂_IVG    -0.004  0.103  0.103    -0.005  0.083  0.083    -0.007  0.054  0.055
β̂_IVW    -0.005  0.105  0.105    -0.006  0.084  0.084    -0.007  0.054  0.055
N = 500
β̂_OLS     0.197  0.013  0.197     0.353  0.009  0.353     0.441  0.005  0.441
β̂_GLS    -0.105  0.017  0.106     0.092  0.018  0.094     0.297  0.013  0.298
β̂_W      -0.240  0.019  0.241    -0.221  0.018  0.222    -0.167  0.018  0.168
β̂_IVO    -0.001  0.058  0.058    -0.003  0.057  0.057    -0.004  0.057  0.058
β̂_IVG    -0.006  0.065  0.065    -0.006  0.052  0.053    -0.006  0.034  0.035
β̂_IVW    -0.006  0.066  0.067    -0.006  0.053  0.053    -0.006  0.034  0.035
$$y_{it} = \beta y_{i,t-1} + z_{1,it} + \delta_1 z_{1,it}^2 + z_{2,it} + \delta_2 z_{2,it}^2 + \mu_i + \nu_{it} = \beta y_{i,t-1} + \theta(z_{1,it}, z_{2,it}) + \mu_i + \nu_{it}. \tag{2.41}$$
The simulation results for this over-identified model lead to the same conclusions as those for the just identified model. Therefore, to save space, we do not report the results for the over-identified case; they are available from the authors upon request.
IV. CONCLUDING REMARKS
In this chapter we consider the problem of estimating a semiparametric partially linear panel data model whose errors have a one-way error components structure. We propose two new semiparametric IV estimators of the coefficient of the parametric component, and we argue that the new semiparametric estimators are more efficient than those suggested by Li & Stengos (1996) and Li & Ullah (1998) because they make full use of the one-way error components structure. The Monte Carlo simulation results confirm our theoretical analysis. Throughout the chapter we assume the existence of random individual effects. In practice one may want to test for the presence of random individual effects; for this purpose one can use the test statistic suggested by Li & Hsiao (1998) for testing the null of no random individual effects in a partially linear dynamic panel data model. Also, in this chapter we only consider the case where $\mu_i$ is a random effect. We now briefly discuss fixed effects semiparametric partially linear models. The model is the same as given in (2.1) and (2.2), except that now we assume the individual effect $\mu_i$ is a fixed effect rather than a random effect. The semiparametric IV–OLS and IV–GLS estimators, which either ignore the fixed effects or treat them as random effects, will not lead to consistent estimation, for the same reason as in the parametric regression model. However, the semiparametric within estimator, which wipes out the individual effects whether they are fixed or random, remains consistent in a fixed effects model. Our Monte Carlo results in Section 3 show that the semiparametric IV–Within estimator $\hat\beta_{IVW}$ performs quite well relative to the other estimators. Therefore, we recommend its use in practice.
ACKNOWLEDGMENTS
We would like to thank a referee and Badi Baltagi for very useful comments that greatly improved the paper. Q. Li's research is supported by the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Ontario Premier's Research Excellence Awards, and the Bush program in economics on public policy. A. Ullah thanks the Academic Senate of UCR for research support.
NOTES
1. Li & Ullah (1998) reported some Monte Carlo results for a static semiparametric panel data model. They also proposed two semiparametric instrumental variable estimators for a semiparametric dynamic panel data model, but they did not conduct any Monte Carlo simulations for the dynamic model.
2. The simple spectral decomposition method used to derive the inverse of $\Omega$ was proposed by Wansbeek & Kapteyn (1982, 1983).
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–27.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models With Error Components. Journal of the American Statistical Association, 76, 598–606.
Arellano, M., & Bover, O. (1995). Another Look at The Instrumental Variable Estimation of Error Components Models. Journal of Econometrics, 68, 28–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: Wiley.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators vs. Their Heterogeneous Counterparts in The Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303–327.
Chamberlain, G. (1984). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of Econometrics (pp. 1247–1318), Vol. II. Amsterdam: North Holland.
Donald, S. G., & Newey, W. K. (1994). Series Estimation of Semilinear Regression. Journal of Multivariate Analysis, 50, 30–40.
Engle, R. F., Granger, C. W. J., Rice, J., & Weiss, A. (1986). Semiparametric Estimates of The Relationship Between Weather and Electricity Sales. Journal of the American Statistical Association, 81, 310–320.
Hsiao, C. (1986). Analysis of Panel Data. Econometric Society Monograph No. 11. Cambridge: Cambridge University Press.
Khanna, M., Mundra, K., & Ullah, A. (1999). Parametric and Semiparametric Estimation of The Effect of Firm Attributes on Efficiency: The Electricity Generating Sector in India. Journal of International Trade and Economic Development, forthcoming.
Kiviet, J. F. (1995). On Bias, Inconsistency and Efficiency of Some Estimators in Dynamic Panel Data Models. Journal of Econometrics, 68, 53–78.
Li, Q. (1996). On The Root-n-consistent Semiparametric Estimation of Partially Linear Models. Economics Letters, 51, 277–285.
Li, Q., & Hsiao, C. (1998). Testing Serial Correlation in Semiparametric Panel Data Models. Journal of Econometrics, 87, 207–237.
Li, Q., & Stengos, T. (1996). Semiparametric Estimation of Partially Linear Panel Data Models. Journal of Econometrics, 71, 389–397.
Li, Q., & Ullah, A. (1998). Estimating Partially Linear Models with One-way Error Components. Econometric Reviews, 17, 145–166.
Matyas, L., & Sevestre, P. (1992). The Econometrics of Panel Data. Dordrecht: Kluwer, 2nd edition.
Pesaran, M. H., & Smith, R. (1995). Estimation of Long-run Relationship From Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–114.
Robinson, P. M. (1988). Root-N-consistent Semiparametric Regression. Econometrica, 56, 931–954.
Stock, J. H. (1989). Nonparametric Policy Analysis. Journal of the American Statistical Association, 84, 567–575.
Ullah, A., & Roy, N. (1998). Nonparametric and Semiparametric Econometrics of Panel Data. In: A. Ullah & D. E. A. Giles (Eds), Handbook on Applied Economic Statistics (pp. 579–604), Ch. 17. Marcel Dekker.
Ullah, A., & Mundra, K. (1999). Semiparametric Panel Data Estimation: An Application to Immigrates Homelink Effect on U.S. Producer Trade Flows. Working paper 15, Department of Economics, University of California at Riverside.
Wansbeek, T. J., & Kapteyn, A. (1982). A Simple Way to Obtain the Spectral Decomposition of Variance Components Models for Balanced Data. Communications in Statistics, A11, 2105–2112.
Wansbeek, T. J., & Kapteyn, A. (1983). A Note on Spectral Decomposition and Maximum Likelihood Estimation of ANOVA Models With Balanced Data. Statistics and Probability Letters, 1, 213–215.
White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.
White, H. (1986). Instrumental Variables Analogs of Generalized Least Squares Estimator. In: R. S. Mariano (Ed.), Advances in Statistical Analysis and Statistical Computing (pp. 173–277), Vol. 1. New York: JAI Press.
APPENDIX
/** This is a GAUSS program using Monte Carlo simulation to examine the finite sample
    performances of some semiparametric instrumental variable estimators in a
    semiparametric dynamic panel data model, written by M. Douglas Berg **/
output file = c:\gauss\doug\work1.out reset;
format /rd 8,3;
n = 100;
T = 6;
T00 = 30;                               @ burn-in periods discarded before the sample @
T0 = T + T00 + 1;
NT = N*T;
nr = 500;                               @ number of replications @
lamt = 0.5; b1 = 1; b2 = 0;
sig2 = 10; rho = 0.8;
sigmu2 = rho*sig2; signu2 = (1-rho)*sig2;
sigmu = sqrt(sigmu2); signu = sqrt(signu2);
s1_5 = sqrt(t*sigmu2 + signu2); sv_5 = signu;    @ true parameter values @
ycz = zeros(nt,1); y1cz = ycz; z1cz = ycz; fz = ycz;
kel = zeros(nt,1);
lam1 = zeros(nr,1); lam3 = lam1; lam1n = lam1; lam3n = lam1; lam4n = lam1; lam6n = lam1;
y0 = zeros(n,t0);
rndseed 7893450;
i1 = 1;
do while i1 <= nr;                      @ Monte Carlo simulation loop @
    z0 = 2*sqrt(3)*rndu(n,t0) - sqrt(3);    @ z ~ U[-sqrt(3), sqrt(3)] @
    u0 = rndn(n,t0);
    mu = rndn(n,1);
    i2 = 2;
    do while i2 <= t0;                  @ Generate y @
        y0[.,i2] = lamt*y0[.,i2-1] + b1*z0[.,i2] + b2*z0[.,i2]^2
                   + signu*u0[.,i2] + sigmu*mu;
        i2 = i2 + 1;
    endo;
    y = y0[.,T00+1:T00+T];
    y1 = y0[.,T00:T00+T-1];
    z = z0[.,T00+1:T00+T];
    z1 = z0[.,T00:T00+T-1];
    yv = reshape(y,nt,1);
    y1v = reshape(y1,nt,1);
    zv = reshape(z,nt,1);
    z1v = reshape(z1,nt,1);
    hz = stdc(zv)*(nt^(-1/5));
    hz1 = stdc(z1v)*(nt^(-1/5));
    zvh = zv/hz;
    z1vh = z1v/hz1;
    i3 = 1;
    do while i3 <= nt;                  @ Nonparametric Estimation Loop @
        zd = zvh[i3,.] - zvh;
        z1d = z1vh[i3,.] - z1vh;
        kelz = prodc( (exp(-0.5*zd^2))' )/sqrt(2*pi);
        kelz1 = prodc( (exp(-0.5*z1d^2))' )/sqrt(2*pi);
        ycz[i3,.] = yv'*kelz/(nt*hz);
        y1cz[i3,.] = y1v'*kelz/(nt*hz);
        z1cz[i3,.] = z1v'*kelz/(nt*hz);
        fz[i3,.] = sumc(kelz)/(nt*hz);
        i3 = i3 + 1;
    endo;
    w1v = z1v - z1cz./fz;               @ xi - xi_hat @
    xxv = y1v - y1cz./fz;               @ x - x_hat @
    yyv = yv - ycz./fz;                 @ y - y_hat @
    @ Li-Ullah, Li-Stengos IV @
    lam1[i1,.] = inv(w1v'*xxv)*w1v'*yyv;           @ IV-OLS estimator @
    lam3[i1,.] = inv(xxv'*xxv)*xxv'*yyv;           @ Semi-OLS estimator @
    u01 = yyv - xxv*lam1[i1,.];
    u03 = yyv - xxv*lam3[i1,.];
    Jbt = ones(t,t)/t;
    Et = eye(t) - Jbt;
    u11 = Et*( (reshape(u01,n,t))' );
    u11 = reshape(u11',nt,1);
    sv2 = u11'*u11/(n*(t-1));
    u22 = Jbt*( (reshape(u01,n,t))' );
    u22 = reshape(u22',nt,1);
    smu2 = (u22'*u22/n - sv2)/t;        @ u22'u22/n estimates T*sigmu2 + signu2 @
    s12 = sv2 + t*smu2;
    sv_1 = sqrt(sv2);
    s1_1 = sqrt(s12);
    u11 = Et*( (reshape(u03,n,t))' );
    u11 = reshape(u11',nt,1);
    sv2 = u11'*u11/(n*(t-1));
    u22 = Jbt*( (reshape(u03,n,t))' );
    u22 = reshape(u22',nt,1);
    smu2 = (u22'*u22/n - sv2)/t;
    s12 = sv2 + t*smu2;
    sv_3 = sqrt(sv2);
    s1_3 = sqrt(s12);
    At_1 = Jbt/s1_1 + Et/sv_1;
    At_3 = Jbt/s1_3 + Et/sv_3;
    At_5 = Jbt/s1_5 + Et/sv_5;
    At_w = Et;
    yyn_1 = At_1*( (reshape(yyv,n,t))' );
    yyn_3 = At_3*( (reshape(yyv,n,t))' );
    yyn_6 = At_w*( (reshape(yyv,n,t))' );
    xxn_1 = At_1*( (reshape(xxv,n,t))' );
    xxn_3 = At_3*( (reshape(xxv,n,t))' );
    xxn_6 = At_w*( (reshape(xxv,n,t))' );
    w1n_w = At_w*( (reshape(w1v,n,t))' );
    w1n = At_1*( (reshape(w1v,n,t))' );
    yyv_1 = reshape(yyn_1',nt,1);
    yyv_3 = reshape(yyn_3',nt,1);
    yyv_6 = reshape(yyn_6',nt,1);
    xxv_1 = reshape(xxn_1',nt,1);
    xxv_3 = reshape(xxn_3',nt,1);
    w1v_w = reshape(w1n_w',nt,1);
    xxv_6 = reshape(xxn_6',nt,1);
    w1v = reshape(w1n',nt,1);
    lam1n[i1,.] = inv(w1v'*xxv_1)*w1v'*yyv_1;       @ IV-GLS estimator @
    lam3n[i1,.] = inv(xxv_3'*xxv_3)*xxv_3'*yyv_3;   @ Semi-GLS estimator @
    lam4n[i1,.] = inv(w1v_w'*xxv_6)*w1v_w'*yyv_6;   @ IV-Within estimator @
    lam6n[i1,.] = inv(xxv_6'*xxv_6)*xxv_6'*yyv_6;   @ Semi-Within est. @
    i1 = i1 + 1;
endo;
Bias1 = meanc( lam1 - lamt );                    @ Bias @
Bias3 = meanc( lam3 - lamt );
rmse1 = sqrt( meanc( (lam1-lamt)^2 ) );          @ Root-MSE @
rmse3 = sqrt( meanc( (lam3-lamt)^2 ) );
std1 = stdc(lam1);                               @ Standard Dev. @
std3 = stdc(lam3);
Bias1n = meanc( lam1n - lamt );
Bias3n = meanc( lam3n - lamt );
Bias4n = meanc( lam4n - lamt );
Bias6n = meanc( lam6n - lamt );
rmse1n = sqrt( meanc( (lam1n-lamt)^2 ) );
rmse3n = sqrt( meanc( (lam3n-lamt)^2 ) );
rmse4n = sqrt( meanc( (lam4n-lamt)^2 ) );
rmse6n = sqrt( meanc( (lam6n-lamt)^2 ) );
std1n = stdc(lam1n);
std3n = stdc(lam3n);
std4n = stdc(lam4n);
std6n = stdc(lam6n);
print "********************************************************";
print "IVO1, bias1, std1, rmse1 = " bias1 std1 rmse1;
print "OLS, bias3, std3, rmse3 = " bias3 std3 rmse3;
print "********************************************************";
print "IVG1, bias1n, std1n, rmse1n = " bias1n std1n rmse1n;
print "GLS, bias3n, std3n, rmse3n = " bias3n std3n rmse3n;
print "********************************************************";
print "With1, bias4n, std4n, rmse4n = " bias4n std4n rmse4n;
print "With, bias6n, std6n, rmse6n = " bias6n std6n rmse6n;
print "********************************************************";
end;
SMALL SAMPLE PERFORMANCE OF DYNAMIC PANEL DATA ESTIMATORS IN ESTIMATING THE GROWTH-CONVERGENCE EQUATION: A MONTE CARLO STUDY
Nazrul Islam
ABSTRACT
This chapter conducts a Monte Carlo investigation into the small sample properties of some of the dynamic panel data estimators that have been applied to estimate the growth-convergence equation using the Summers-Heston data set. The results show that OLS estimation of this equation is likely to yield seriously upward-biased estimates. However, indiscriminate use of panel estimators is also risky, because some of them display large bias and mean square error. Yet there are panel estimators with much smaller bias and mean square error. Through a judicious choice of panel estimators it is therefore possible to obtain better estimates of the parameters of the growth-convergence equation. Growth researchers may make use of this potential.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 317–339. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2
NAZRUL ISLAM
I. INTRODUCTION
One of the issues around which the recent growth literature has evolved is that of convergence. This refers to the idea that, because of diminishing returns to capital, poorer economies should grow faster and catch up with the richer ones. Statistically, convergence is therefore interpreted as a negative correlation between the initial level of income and the subsequent growth rate. Accordingly, a popular method for testing the convergence hypothesis has been to run growth-initial level regressions, or growth-convergence regressions, in which subsequent growth rates are regressed on initial levels of income. For a long time, growth-convergence regressions were estimated using cross-section data. Recently, however, researchers have drawn attention to the fact that the growth-convergence equation actually represents a dynamic panel data model, and that by ignoring the individual effects, cross-section estimation courts omitted variable bias (OVB). Thus, Islam (1993, 1995) argues for using panel procedures to overcome this bias and, in particular, implements Chamberlain's (1982, 1983) Minimum Distance (MD) procedure to estimate the equation. Knight et al. (1993) make similar arguments and also use the Minimum Distance procedure, obtaining similar results. Islam, in addition, presents results from the Least Squares with Dummy Variables (LSDV) procedure. Since these initial works, panel estimation of the growth-convergence equation has spread considerably. For example, Lee, Pesaran & Smith (1997, 1998) consider maximum likelihood estimation of the growth-convergence equation using panel data. Caselli et al. (1996) emphasize the problem of endogeneity in this equation and use the Arellano-Bond GMM panel procedure to overcome it. Barro (1997) and Barro & Sala-i-Martin (1995) use pooled estimation on panel data sets. Lee et al. (1998) also present evidence on panel estimation of the growth-convergence equation. The panel estimates presented in these papers generally differ from the corresponding cross-section estimates. However, they also differ among themselves.
Nerlove (1999) highlights this by using a variety of panel estimators to estimate the growth-convergence equation and compiling the results. Similar findings were presented earlier in Islam (1993). This creates a problem of choosing among various panel estimators. Unfortunately, theoretical properties of dynamic panel data estimators are generally asymptotic and often equivalent. This creates the necessity of Monte Carlo studies to ascertain the small sample properties of these estimators. However, Monte Carlo studies are more useful when they are customized to the specification and the data set that are used in actual estimation. Although many researchers have recently presented Monte Carlo evidence on small sample properties of dynamic panel estimators, studies focusing on the growth-convergence equation and using the Summers-Heston (1988, 1991) data set are rare. This chapter tries to help overcome this lacking. The study focuses on those estimators that have been used so far to estimate the growth-convergence
Monte Carlo Study of Panel Estimators for Growth-Convergence Equation
equation. Accordingly, the estimators included are: least squares with dummy variables (LSDV); the two instrumental variable estimators of Anderson & Hsiao (1981, 1982), namely AH(l), based on 'level' instruments, and AH(d), based on 'difference' instruments; the minimum distance (MD) estimator suggested by Chamberlain (1982, 1983); and the one-step (ABGMM1) and two-step (ABGMM2) generalized method of moments estimators proposed by Arellano & Bond (1991). In addition, the exercise includes simultaneous equations (SE) estimators, namely the two-stage least squares (2SLS), three-stage least squares (3SLS), and generalized three-stage least squares (G3SLS) estimators. To complete the picture, the study also includes the (pooled) ordinary least squares (OLS) estimator, which ignores the individual effects. The two main parameters of the model are $\gamma$, the dynamic adjustment parameter (attached to the lagged dependent variable), and $\beta$, the parameter of the exogenous variable. The Monte Carlo results show that the OLS estimates of $\gamma$ are, as expected, positively biased, and that this bias averages about seventeen percent of the true parameter value. For the panel estimators the bias is negative, the only exceptions being some AH(d) cases. The bias is small for the AH(d), LSDV, and MD estimators, ranging between five and six percent. The bias of the 2SLS, 3SLS, and G3SLS estimates of $\gamma$ ranges from about eight to ten percent. The largest bias, averaging twenty-two percent, is observed for the ABGMM estimators. The AH(l) estimator performs so poorly that we refrain from reporting its results. The results on root mean square error (RMSE) show a similar pattern. The average RMSE as a percentage of the true value of $\gamma$ is seventeen percent for the OLS estimator. For the LSDV and MD estimators, this percentage ranges between six and seven.
For the AH(d), 2SLS, 3SLS, and G3SLS estimators, it ranges between ten and twenty. The percentage is highest for the ABGMM estimators, ranging from about forty to forty-seven percent. With regard to $\beta$, the bias of the OLS estimates is again positive, but it now averages a much higher forty-eight percent of the parameter value. The direction of bias of the panel estimates of $\beta$ is quite mixed. However, the panel estimates of $\beta$ are on average quite close to the true parameter value. The magnitude of the algebraic average of the bias for the 2SLS, 3SLS, LSDV, and MD estimators remains under one percent. For AH(d) and G3SLS it ranges between one and two percent. For the ABGMM estimators this percentage is higher, but still within five to seven percent.
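The summary figures quoted above are bias and RMSE expressed as percentages of the true parameter value, together with their signed ("algebraic") average across the nine sample and error-process cells. A minimal sketch of these two computations, assuming NumPy; the function names are illustrative and not taken from the chapter:

```python
import numpy as np


def relative_bias_rmse(estimates, true_value):
    """Return (bias, RMSE) of Monte Carlo estimates as percentages of the true value.

    Dividing by abs(true_value) keeps the sign of the bias meaningful; for the
    positive true values used in this chapter it equals plain division.
    """
    err = np.asarray(estimates, dtype=float) - true_value
    scale = 100.0 / abs(true_value)
    bias_pct = scale * err.mean()
    rmse_pct = scale * np.sqrt(np.mean(err ** 2))
    return bias_pct, rmse_pct


def row_average(cells):
    """Signed ('algebraic') mean over the sample/DGM cells of one table row."""
    return float(np.mean(cells))


# OLS bias for gamma in the nine NONOIL/INTER/OECD x UC/MA(1)/AR(1) cells of Table 3.
ols_gamma_bias = [14.8, 14.6, 14.8, 15.2, 15.2, 15.4, 21.5, 20.9, 21.2]
print(round(row_average(ols_gamma_bias), 1))  # 17.1
```

Because the row average is signed, positive and negative cell biases can offset each other. This is why a small average bias for $\beta$ can coexist with a large RMSE.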
NAZRUL ISLAM
The RMSE results for $\beta$ display a similar ranking of performance. However, the small bias in estimating $\beta$ is largely offset by the large variance of the estimates. As a result, the RMSE values for $\beta$ are in general much higher than those for $\gamma$. For a good number of panel estimators, including AH(d), 2SLS, and 3SLS, the RMSE remains at or below about thirty-five percent of the true value of $\beta$. For the LSDV and MD estimators, this percentage is under twenty-five. For G3SLS, however, it is fifty-six, and for the ABGMM estimators it is around two hundred percent. For OLS the ratio is about fifty-seven percent. The results indicate that OLS estimation of the growth-convergence equation is very likely to give considerably biased results. However, indiscriminate use of panel estimators is risky too. Yet there are panel estimators that have much smaller bias and RMSE than OLS. Hence, a judicious choice of panel estimator has the potential to yield much better estimates of the parameters of the growth-convergence equation, and growth researchers may make use of this potential. In addition to the above, several general points emerge from this study. First, the performances of the two AH estimators contrast sharply. The source of this contrast lies in the different degrees of correlation of the instruments with the instrumented variables, which highlights the importance of research into estimation with 'weak' instruments. Second, a comparison of the ABGMM1 results with those of ABGMM2, and of the 2SLS results with those of either 3SLS or G3SLS, shows that simpler estimators not requiring estimated weighting matrices may perform better than more sophisticated estimators that do require such matrices. Using estimated weighting matrices creates an avenue for additional noise to enter the estimation. Third, increasing the number of instruments does not necessarily improve estimation results, as revealed by the poor performance of the ABGMM estimators compared with that of AH(d).
Fourth, theoretically inconsistent estimators can display good small-sample performance. The performance of the LSDV estimator, which is inconsistent as N grows with T held fixed, illustrates this. Finally, the results of this chapter are in general agreement with other recent Monte Carlo studies, which have also reported large bias of the ABGMM estimators and better performance of the LSDV estimator. The discussion in this chapter is organized as follows. Section 2 reviews previous Monte Carlo studies of dynamic panel estimators and specifies the objectives of the current study. Section 3 presents the model and discusses the data generation processes. Section 4 presents the results. Section 5 contains some concluding remarks.
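The opposite signs of the OLS bias (upward) and the LSDV bias (downward, the latter estimator being inconsistent for fixed T) can be reproduced in a toy Monte Carlo. The sketch below assumes NumPy and a hypothetical design that is not the chapter's Summers-Heston-based DGP. It uses arbitrary illustrative parameter values, omits time effects, and makes the individual effect correlated with x through a country-level component `c`:

```python
import numpy as np


def simulate_panel(rng, n=200, t=5, gamma=0.75, beta=0.15, sigma_v=0.1):
    """Toy dynamic panel y_it = gamma*y_{i,t-1} + beta*x_{i,t-1} + mu_i + v_it (no time effects)."""
    c = rng.normal(0.0, 1.0, n)                        # hypothetical country-level component of x
    x = c[:, None] + rng.normal(0.0, 0.5, (n, t))      # x[:, s-1] enters period s
    mu = 0.2 * c + rng.normal(0.0, 0.2, n)             # individual effect, correlated with x via c
    y = np.empty((n, t + 1))
    y[:, 0] = (mu + beta * c) / (1.0 - gamma) + rng.normal(0.0, 0.2, n)  # start near steady state
    for s in range(1, t + 1):
        y[:, s] = gamma * y[:, s - 1] + beta * x[:, s - 1] + mu + rng.normal(0.0, sigma_v, n)
    return y, x


def pooled_ols_gamma(y, x):
    """Pooled OLS of y_it on a constant, y_{i,t-1}, x_{i,t-1}; ignores mu_i."""
    ylag, ycur = y[:, :-1].ravel(), y[:, 1:].ravel()
    design = np.column_stack([np.ones_like(ylag), ylag, x.ravel()])
    coef, *_ = np.linalg.lstsq(design, ycur, rcond=None)
    return coef[1]


def lsdv_gamma(y, x):
    """Within (LSDV) estimator: demean each country's series over time, then OLS."""
    def demean(a):
        return a - a.mean(axis=1, keepdims=True)
    design = np.column_stack([demean(y[:, :-1]).ravel(), demean(x).ravel()])
    coef, *_ = np.linalg.lstsq(design, demean(y[:, 1:]).ravel(), rcond=None)
    return coef[0]


def mc_bias(reps=200, gamma=0.75, seed=0):
    """Average bias of pooled OLS and LSDV estimates of gamma over `reps` replications."""
    rng = np.random.default_rng(seed)
    ols, fe = [], []
    for _ in range(reps):
        y, x = simulate_panel(rng, gamma=gamma)
        ols.append(pooled_ols_gamma(y, x))
        fe.append(lsdv_gamma(y, x))
    return np.mean(ols) - gamma, np.mean(fe) - gamma


ols_bias, lsdv_bias = mc_bias()
print(f"OLS bias: {ols_bias:+.3f}, LSDV bias: {lsdv_bias:+.3f}")
```

In this design the OLS bias comes out positive, because the omitted $\mu_i$ is positively correlated with $y_{i,t-1}$. The LSDV bias comes out negative, because the within transformation correlates the demeaned lag with the demeaned error. The magnitudes depend entirely on the arbitrary parameter values and say nothing about the chapter's own numbers.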
II. PREVIOUS MONTE CARLO STUDIES

Much of the recent empirical research on growth has revolved around estimation of the growth-convergence equation. A close inspection of this equation shows that it is actually a dynamic panel data model.1 A cross-section estimation of the equation therefore suffers from omitted variable bias, and this has led to panel estimation. Different panel estimators have, however, produced different results. Since the theoretical properties of many of these estimators are asymptotic and equivalent, Monte Carlo evidence is needed to gauge which of these estimates are more acceptable. The issue of the small-sample properties of dynamic panel estimators is not new. The gas demand study of Balestra & Nerlove (1966) had already raised it, which led Nerlove to conduct several Monte Carlo studies. Nerlove (1967) considers a simple autoregressive model with no exogenous variable and compares the performance of the OLS, LSDV, MLE, and several variants of the GLS estimator in estimating it. In Nerlove (1971), the dynamic panel model is extended to include an exogenous variable. This allows consideration of an instrumental variable (IV) estimator with lagged values of the exogenous variable as instruments, as well as another variant of the two-stage GLS estimator. Overall, Nerlove's Monte Carlo results favor the GLS estimators over the other estimators. Since Nerlove's work, there have been significant developments in the field of dynamic panel data estimation.2 Among these is the introduction of the Anderson & Hsiao (1981, 1982) instrumental variable estimators, which use further lagged values of the dependent variable as instruments. Arellano & Bond (1991) carry this idea further and propose using all valid lagged values as instruments within a GMM framework.
Ahn & Schmidt (1995, 1997, 1999), Arellano & Bover (1995), Blundell & Bond (1998), Hahn (1999), Wansbeek & Knaap (1998), and Ziliak (1997) suggest various extensions and modifications of the Arellano-Bond GMM estimator (ABGMM). Kiviet (1995) and Wansbeek & Knaap (1998), on the other hand, propose modifications of the LSDV and LIML estimators, respectively. Many of the recent works offer Monte Carlo evidence too. Thus Arellano & Bond (1991) perform a Monte Carlo study primarily to compare the small-sample properties of their GMM estimators with those of the Anderson-Hsiao estimators. According to their results, the GMM estimators perform better than the Anderson-Hsiao IV estimators, not so much in terms of bias as in terms of dispersion. However, simulation studies by Alonso-Borrego & Arellano (1999), Kiviet (1995), Harris & Matyas (1996), Judson & Owen (1997), Wansbeek & Knaap (1998), and Ziliak (1997) report significant
bias of the ABGMM estimators. Kiviet (1995) reports good performance of his bias-corrected LSDV estimator. Wansbeek & Knaap (1998), on the other hand, report better performance of a covariance-corrected instrumental variable estimator and of their LIML estimator. Baltagi & Kao (2000) in this volume give an extensive survey of recent developments in dynamic panel data models. These studies have illuminated the small-sample properties of various dynamic panel estimators. However, most of them do not focus on any particular model or data set. Ziliak's (1997) study is probably an exception: it focuses on a labor supply model and uses the PSID data. Yet Monte Carlo results are more useful when the exercise is customized to the model whose estimation is in question and when the simulations are based on the data set actually used to estimate the model. From this point of view, there is a void regarding the growth-convergence equation: Monte Carlo evidence on the small-sample performance of panel data estimators for this equation is rare. This chapter tries to fill this gap to some extent. It focuses exclusively on the growth-convergence equation and bases the simulations on the Summers-Heston data set, which has been widely used in estimating this equation. This focus also guides the choice of estimators included in the study. The main feature of the growth-convergence equation is that its exogenous variable is correlated with the individual (country) effects. This implies that panel estimators relying on the assumption of uncorrelated random effects are not suitable for estimating this equation. On the other hand, estimators that explicitly model this correlation, such as Chamberlain's Minimum Distance estimator, may play an important role in estimating it.
The study also considers several different generating mechanisms for the random error term, and it considers estimation of the equation in several different samples that have figured widely in the recent growth literature. Because of its customized nature, the results of this study should be directly useful for empirical growth researchers.
III. MODEL, PARAMETER VALUES, AND DATA GENERATION

A. The Model

The dynamic panel data model that arises in the convergence literature is as follows:

$$y_{it} = \gamma\, y_{i,t-1} + \beta\, x_{i,t-1} + \mu_i + \eta_t + v_{it}. \qquad (1)$$
Here $y_{it}$ represents the log of per capita GDP of country i at time t, $y_{i,t-1}$ is the same lagged by one period, and $x_{i,t-1}$ is the difference between the logs of the investment and population growth rate variables of country i at time t−1. Finally, $\mu_i$ and $\eta_t$ are the individual and time effect terms, and $v_{it}$ is the transitory error, which varies across both individuals and time. In this setup, (t−1) and t denote the 'initial' and 'subsequent' periods of time, respectively. The derivation of this equation proceeds from the Cobb-Douglas aggregate production function $Y_t = K_t^{\alpha}(A_t L_t)^{1-\alpha}$, where Y, K, and L are output, capital, and labor, respectively, and A is labor-augmenting technology, which grows exponentially at the exogenous rate g. The derivation yields the following correspondence between the coefficients of equation (1) and the structural parameters of the production function:

$$\gamma = e^{-\lambda\tau} \qquad (2)$$

$$\beta = \left(1 - e^{-\lambda\tau}\right)\frac{\alpha}{1-\alpha} \qquad (3)$$

$$\mu_i = \left(1 - e^{-\lambda\tau}\right)\ln A_{0i} \qquad (4)$$

$$\eta_t = g\left(t_2 - e^{-\lambda\tau}\, t_1\right). \qquad (5)$$
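Equations (2) to (5) can be read as a mapping from the structural parameters ($\alpha$, n, g, $\delta$, $\tau$, $\ln A_{0i}$) to the coefficients of equation (1), with the convergence rate $\lambda = (1-\alpha)(n+g+\delta)$. A minimal sketch of that mapping, and of the inversion of equation (2) that recovers $\lambda$ from an estimate of $\gamma$, follows. The structural values in the usage line are hypothetical and chosen only for illustration:

```python
import math


def reduced_form(alpha, n, g, delta, tau, ln_a0, t1, t2):
    """Map structural Solow parameters to the coefficients of equation (1), eqs. (2)-(5)."""
    lam = (1.0 - alpha) * (n + g + delta)         # rate of convergence lambda
    decay = math.exp(-lam * tau)                  # e^{-lambda * tau}
    return {
        "lambda": lam,
        "gamma": decay,                                   # eq. (2)
        "beta": (1.0 - decay) * alpha / (1.0 - alpha),    # eq. (3)
        "mu": (1.0 - decay) * ln_a0,                      # eq. (4)
        "eta": g * (t2 - decay * t1),                     # eq. (5)
    }


def implied_lambda(gamma, tau):
    """Invert eq. (2): lambda = -ln(gamma) / tau."""
    return -math.log(gamma) / tau


# Hypothetical structural values, for illustration only.
rf = reduced_form(alpha=1 / 3, n=0.01, g=0.02, delta=0.05, tau=5, ln_a0=0.0, t1=0, t2=5)
print(rf["gamma"], rf["beta"], implied_lambda(rf["gamma"], tau=5))
```

Applied to the NONOIL value of $\gamma$ in Table 1 (0.7886) with $\tau = 5$, `implied_lambda` gives about 0.0475 per year.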
Here $\tau$ is the length of time between $t_2$ and $t_1$, where $t_2$ and $t_1$ correspond to t and (t−1) of equation (1), respectively. The parameter $\lambda$ is known as the rate of convergence and is given by $\lambda = (1-\alpha)(n + g + \delta)$, where n is the exponential growth rate of L and $\delta$ is the rate of depreciation of capital. An important issue regarding this model is the specification of the individual effect term $\mu_i$. Equation (4) shows that $\mu_i$ essentially stands for $\ln A_{0i}$. Mankiw, Romer & Weil (1992, p. 6) define $A_{0i}$ as follows: 'The $A_{0i}$ term reflects not just technology but resource endowments, climate, institutions, and so on; it may therefore differ across countries'. From this definition, it is evident that $A_{0i}$ is correlated with $x_{i,t-1}$, which represents savings and fertility behavior in an economy. Thus equation (1) represents a dynamic panel data model with correlated effects, which is why random-effects estimators are not appropriate for the growth-convergence equation. There are, however, different ways to specify the correlation between $\mu_i$ and $x_{i,t-1}$. Mundlak (1971) proposes a simple specification whereby $\mu_i$ is a function of $\bar{x}_i$, the time mean of $x_{i,t-1}$. This is, however, restrictive, and it renders the random-effects model equivalent to the fixed-effects model, provided the transitory error term is serially uncorrelated. Hence, a more general specification is preferable. Following Chamberlain, we adopt the following specification of $\mu_i$:

$$\mu_i = \pi_0 + \pi_1 x_{i0} + \pi_2 x_{i1} + \cdots + \pi_T x_{i,T-1} + \varepsilon_i, \qquad (6)$$
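Equation (6) is a linear projection of the individual effect on the whole history of $x_{it}$. The sketch below draws $\mu_i$ from equation (6) and then recovers the $\pi$'s by least squares, assuming NumPy. It does not reproduce the chapter's second estimation step, which works with estimated residuals and year dummies. The x values and $\sigma_\varepsilon = 0.1$ are hypothetical; the $\pi$ vector uses the NONOIL values from Table 1:

```python
import numpy as np


def chamberlain_effects(rng, x, pi, sigma_eps):
    """Draw mu_i = pi_0 + pi_1 x_i0 + ... + pi_T x_{i,T-1} + eps_i, as in eq. (6)."""
    n, t = x.shape
    if len(pi) != t + 1:
        raise ValueError("pi must have T + 1 elements")
    return pi[0] + x @ pi[1:] + rng.normal(0.0, sigma_eps, n)


def project_effects(mu, x):
    """Least-squares linear projection of mu_i on a constant and x_i0, ..., x_{i,T-1}."""
    design = np.column_stack([np.ones(len(mu)), x])
    coef, *_ = np.linalg.lstsq(design, mu, rcond=None)
    return coef


rng = np.random.default_rng(42)
x = rng.normal(size=(5000, 5))                                      # hypothetical x_it, T = 5
pi = np.array([1.3334, -0.0028, 0.1200, -0.1243, 0.0267, 0.2277])   # NONOIL pi_0..pi_5
mu = chamberlain_effects(rng, x, pi, sigma_eps=0.1)
print(np.round(project_effects(mu, x), 3))
```

Nothing in the projection forces $\mu_i$ to depend only on the time mean $\bar{x}_i$. That restriction is exactly what separates Chamberlain's specification from Mundlak's.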
where $\varepsilon_i$ is distributed as $N(0, \sigma_\varepsilon^2)$. Viewed as a linear predictor, this specification does not involve any restriction. Viewed as a conditional expectation function, the only restriction is linearity. Almost all researchers have used the Summers-Heston data set to estimate the growth-convergence equation. This data set provides annual data. However, annual data are generally believed to be unsuitable for studying growth, because business-cycle fluctuations are likely to play a larger role in such data. Most panel studies have therefore used five-year averages or five-year panels to estimate the model. Accordingly, the value of $\tau$ in this study is set to five.3

B. Parameter Values

Considered in full, the model presented in equations (1) and (6) has three sets of parameters. The first consists of the autoregressive parameter $\gamma$ and the slope parameter $\beta$; these are the main parameters of interest. The second set consists of $\pi_0, \pi_1, \ldots, \pi_T$, which arise from the specification of the individual effect term $\mu_i$. This set also includes the time effect terms, the $\eta_t$'s. The third set consists of the parameters that govern the error terms $v_{it}$ and $\varepsilon_i$. An important issue in data generation is the specification of the transitory error term $v_{it}$. Setting $\tau$ to five implies that successive $v_{it}$'s are five years apart. However, some possibility of serial correlation in $v_{it}$ still remains. Accordingly, we allow for the following three possibilities:

1. UC (serially uncorrelated) process: $v_{it} \sim N(0, \sigma_v^2)$.
2. MA(1) process: $v_{it} = \omega_{it} + \theta\,\omega_{i,t-1}$, with $\omega_{it} \sim N(0, \sigma_\omega^2)$.
3. AR(1) process: $v_{it} = \rho\, v_{i,t-1} + \xi_{it}$, with $\xi_{it} \sim N(0, \sigma_\xi^2)$.

There are two reasons for limiting the order of the MA and AR processes to one. First, given that the $v_{it}$'s are five calendar years apart, orders greater than one are not very plausible theoretically. Second, even if such higher orders cannot be ruled out theoretically, the limited value of T makes them impractical to estimate. The data used in this chapter range from 1960 to 1985.
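The three error processes can be sketched as a single draw function, assuming NumPy. Two details are assumptions not stated in the text. The AR(1) process is started from its stationary distribution, and the MA(1) process uses one pre-sample innovation:

```python
import numpy as np


def transitory_errors(rng, n, t, kind="UC", sigma=0.1, theta=0.0, rho=0.0):
    """Draw v_it (n x t) under the UC, MA(1) or AR(1) scheme.

    sigma is the sd of v_it for UC and the innovation sd for MA(1)/AR(1).
    """
    if kind == "UC":
        return rng.normal(0.0, sigma, (n, t))
    if kind == "MA":
        w = rng.normal(0.0, sigma, (n, t + 1))            # includes one pre-sample innovation
        return w[:, 1:] + theta * w[:, :-1]
    if kind == "AR":
        v = np.empty((n, t))
        # Assumption: the first draw comes from the stationary distribution of the AR(1).
        v[:, 0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho ** 2), n)
        for s in range(1, t):
            v[:, s] = rho * v[:, s - 1] + rng.normal(0.0, sigma, n)
        return v
    raise ValueError(f"unknown kind: {kind!r}")


def sd_of_v(kind, sigma, theta=0.0, rho=0.0):
    """Implied sd of v_it given the innovation sd."""
    if kind == "MA":
        return sigma * np.sqrt(1.0 + theta ** 2)
    if kind == "AR":
        return sigma / np.sqrt(1.0 - rho ** 2)
    return sigma


rng = np.random.default_rng(1)
v_ma = transitory_errors(rng, 200_000, 5, "MA", sigma=0.1, theta=0.2)
lag1 = np.corrcoef(v_ma[:, 1:].ravel(), v_ma[:, :-1].ravel())[0, 1]
print(round(lag1, 3))   # should be near theta / (1 + theta**2) for the MA(1) case
```

For example, `sd_of_v("AR", 0.1171, rho=0.2994)` is about 0.1227. This is in line with the NONOIL AR(1) entries reported in Table 2.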
With $\tau$ equal to five, this implies five cross-sections in the panel, i.e., T equals five. In choosing the parameter values at which to conduct the simulations, we again follow the principle of customization and let the data determine them. The following three-step procedure is employed for this purpose. In the first step, we obtain consistent estimates of $\gamma$ and $\beta$ by an instrumental variable (IV) regression based on the first-differenced model, using lagged $x_{it}$'s as instruments. These consistent estimates of $\gamma$ and $\beta$ are used to compute
composite residuals $(\eta_t + \mu_i + v_{it})$. In the second step, these residuals are regressed on the $x_{it}$'s and year dummies to obtain estimates of the $\pi$'s and $\eta_t$'s. The residuals from this second-step regression give estimates of $(\varepsilon_i + v_{it})$, which we denote $u_{it}$. The third step consists of estimating the parameters of the MA(1) and AR(1) models from the estimated $u_{it}$'s. We use Chamberlain's Minimum Distance estimation procedure to do this and obtain estimates of $\theta$ and $\rho$ together with the corresponding variance parameters.4 In growth-convergence studies, three samples have been used frequently. Following Mankiw et al. (1992), they are usually referred to as NONOIL, INTER, and OECD. Of these, OECD is the smallest and consists of 22 OECD countries. NONOIL is the largest: it consists of 96 countries, covering most of the sizable countries of the world for which oil extraction is not the dominant economic activity. Finally, INTER is an intermediate sample of 74 countries, comprising all NONOIL countries except those whose data quality is unsatisfactory. Table 1 gives the values of the parameters that belong to the first and second sets. These parameters remain the same under the different generating mechanisms of $v_{it}$. Certain aspects of these parameter values are worth noting. First, there seems to be some agreement across samples regarding the direction in which the $x_{it}$'s of different years relate to the individual effect term $\mu_i$. This is reflected in similar signs of the $\pi$'s across samples. However, this agreement is not complete. Second, the way different time periods affect the growth process differs across samples.

Table 1. Common Parameter Values

Parameter    NONOIL     INTER      OECD
γ            0.7886    0.7925    0.6294
β            0.1641    0.1732    0.0954
π0           1.3334    1.3588    2.8986
π1          –0.0028    0.1927    0.5863
π2           0.1200   –0.1098   –0.6354
π3          –0.1243   –0.1644   –0.0702
π4           0.0267    0.1286    0.6355
π5           0.2277    0.1715   –0.3484
η70          0.0171    0.0093    0.0680
η75         –0.0156   –0.0015    0.0827
η80         –0.0067    0.0218    0.1295
η85         –0.0669   –0.0523    0.1238
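Step three can be illustrated for the MA(1) case. The chapter uses Chamberlain's Minimum Distance procedure. The sketch below instead swaps in a simpler, equally weighted method-of-moments calculation on the residual autocovariances, assuming NumPy and $u_{it} = \varepsilon_i + v_{it}$ with $v_{it}$ following the MA(1) process given earlier. Under that model, covariances at lags of two or more reflect only $\sigma_\varepsilon^2$; the lag-0 and lag-1 covariances then pin down $\theta$ and $\sigma_\omega$. The simulated "true" values in the usage part are hypothetical:

```python
import numpy as np


def ma1_moments(u):
    """Equally weighted moment estimates of (sigma_eps, theta, sigma_omega).

    Assumes u_it = eps_i + v_it with v_it = omega_it + theta * omega_{i,t-1}.
    Needs T >= 3 so that at least one lag-2 covariance exists.
    """
    n, t = u.shape
    uc = u - u.mean(axis=0)                        # sweep out period means (time effects)
    cov = uc.T @ uc / n                            # T x T cross-country covariance matrix
    lag0 = np.mean(np.diag(cov))
    lag1 = np.mean(np.diag(cov, k=1))
    lag2plus = np.mean([np.mean(np.diag(cov, k=k)) for k in range(2, t)])
    var_eps = lag2plus                             # only eps_i is shared two or more periods apart
    c0 = lag0 - var_eps                            # sigma_omega^2 * (1 + theta^2)
    c1 = lag1 - var_eps                            # theta * sigma_omega^2
    r = c1 / c0                                    # requires 0 < |r| < 0.5
    theta = (1.0 - np.sqrt(1.0 - 4.0 * r ** 2)) / (2.0 * r)   # invertible root
    sigma_omega = np.sqrt(c0 / (1.0 + theta ** 2))
    return np.sqrt(var_eps), theta, sigma_omega


rng = np.random.default_rng(7)
n, t = 200_000, 5
eps = rng.normal(0.0, 0.12, n)
w = rng.normal(0.0, 0.115, (n, t + 1))
u = eps[:, None] + w[:, 1:] + 0.2 * w[:, :-1]
print(ma1_moments(u))   # roughly (0.12, 0.2, 0.115)
```

A Minimum Distance estimator would weight these moment conditions optimally rather than equally. The identification logic is the same.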
This is revealed by the signs of the $\eta_t$'s in the different samples. There are some differences in this regard between the NONOIL and INTER samples. However, the difference between these two samples, on the one hand, and OECD, on the other, is more pronounced. Next we turn to the parameter values that differ across the three generating mechanisms of $v_{it}$. The estimated values of these parameters are compiled in Table 2. Several things may be noted from this table. First, the largest estimated values of $\theta$ and $\rho$ are about 0.2 and 0.3, respectively. This indicates that any serial dependence of $v_{it}$ in the actual data is fairly weak.5 This in turn suggests that the relative performance of the estimators may not vary widely across the different ways of modeling $v_{it}$. Second, the variance of the individual country effect term remains quite stable under the alternative generating schemes of $v_{it}$ in all samples. Third, the estimate of the variance of $v_{it}$ also remains very similar across the generating schemes. Fourth, the relative values of $\sigma_\varepsilon$ and $\sigma_v$ suggest that variation in the individual effect term $\mu_i$ accounts for a significant part of the overall variation in the data.

C. Data Generation

Once the parameter values are available, data generation can begin. It proceeds through the following steps. First of all, values of the $x_{it}$'s are constructed from the
Table 2. Parameter Values for Different Generating Mechanisms of v_it

Mechanism      Parameter    NONOIL     INTER      OECD
Uncorrelated   σv           0.1054    0.0872    0.0300
               σε           0.1281    0.0139    0.0762
MA(1)          θ            0.2037    0.1250    0.1125
               σv           0.1179    0.0990    0.0302
               σε           0.1225    0.1010    0.0742
               σω           0.1153    0.0980    0.0300
AR(1)          ρ            0.2994    0.1787    0.1394
               σv           0.1227    0.0943    0.0319
               σε           0.1183    0.0995    0.0742
               σξ           0.1171    0.0927    0.0316

Note: σω and σξ are the standard deviations of the MA(1) and AR(1) innovations, respectively.
Summers-Heston data set in the way described above.6 This data set also provides the initial values, $y_{0i}$. We assume that all disturbance terms are normally distributed.7 The second step differs across the models of $v_{it}$. For the uncorrelated model, random values of $v_{it}$ and $\varepsilon_i$ are generated from $N(0, \sigma_v^2)$ and $N(0, \sigma_\varepsilon^2)$, respectively. These values of $v_{it}$ and $\varepsilon_i$ are then combined with the given values of $y_{i,t-1}$ and $x_{i,t-1}$ and the parameter values in Table 1 to produce $y_{it}$. For the first period, the $y_{0i}$'s serve as the $y_{i,t-1}$'s. For subsequent periods, the generated $y_{it}$ serves as the lagged value of y for generating $y_{i,t+1}$. The process continues until the last (T-th) period is reached. For the MA(1) model, $\varepsilon_i$ is again generated from $N(0, \sigma_\varepsilon^2)$. However, generating $v_{it}$ now requires drawing $\omega_{it}$ from $N(0, \sigma_\omega^2)$; these draws are then combined with the value of $\theta$ to produce the $v_{it}$'s. Generation of the $v_{it}$'s for the AR(1) model proceeds in an analogous manner. Once the data are generated, estimation can proceed. We now turn to the estimation results.
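The recursion described above can be sketched as follows, assuming NumPy. Here `v` is any (n × T) array of transitory errors drawn under the UC, MA(1), or AR(1) scheme. The inputs in the usage part are hypothetical toy values chosen so that the first period can be checked by hand, not the Summers-Heston data:

```python
import numpy as np


def generate_panel(rng, y0, x, gamma, beta, pi, eta, sigma_eps, v):
    """Generate y (n x (T+1)) recursively from eq. (1), with mu_i drawn via eq. (6).

    y0: (n,) initial log incomes; x: (n, T), x[:, s-1] entering period s;
    pi: (T+1,) projection coefficients; eta: (T,) time effects; v: (n, T) errors.
    """
    n, t = x.shape
    mu = pi[0] + x @ pi[1:] + rng.normal(0.0, sigma_eps, n)     # eq. (6)
    y = np.empty((n, t + 1))
    y[:, 0] = y0
    for s in range(1, t + 1):                                     # y_{i,s} built from y_{i,s-1}
        y[:, s] = gamma * y[:, s - 1] + beta * x[:, s - 1] + mu + eta[s - 1] + v[:, s - 1]
    return y


rng = np.random.default_rng(0)
n, t = 3, 5
y = generate_panel(rng, y0=np.array([1.0, 2.0, 3.0]), x=np.ones((n, t)), gamma=0.5, beta=0.2,
                   pi=np.array([0.5, 0, 0, 0, 0, 0]), eta=np.zeros(t), sigma_eps=0.0,
                   v=np.zeros((n, t)))
print(y[:, 1])   # [1.2 1.7 2.2]
```

With the noise switched off, each series moves geometrically, at rate $\gamma$, toward its steady state (here 0.7 / 0.5 = 1.4). This is the same mechanism that makes $\gamma$ the convergence parameter in equation (2).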
IV. SIMULATION RESULTS

Given the number of available cross-sections (i.e., given T), different panel data estimators use different numbers of these cross-sections at the final stage of estimation. In simulation, it is therefore possible to adopt two different approaches. One is to keep the number of cross-sections actually used by the estimators the same by generating different numbers of cross-sections for different estimators. The other is to keep the number of cross-sections generated the same and let the number actually used at the final stage of estimation vary across estimators. It is the second situation that a researcher faces in actual practice. To conform to this situation, we adopt the second approach. In our particular case, five cross-sections are available, namely for 1965, 1970, 1975, 1980, and 1985, and T is five. We let the number of cross-sections actually used by the individual estimators vary.8 As is known, not all panel estimators are geared to estimating all the parameters of the model. Because of this, and in order not to clutter the presentation with too many numerical results, we focus here only on the results for $\gamma$ and $\beta$. The simulation results presented in this chapter are based on one thousand replications. In most cases, the Monte Carlo distributions stabilized after only one hundred replications, so increasing the number of replications beyond one thousand was not necessary. The two criteria usually used in judging the performance of an estimator are bias and mean square error (MSE). In order to make assessment easy, we
present tables showing bias and root mean square error (RMSE) in relative form, i.e., as a percentage of the true parameter value.9 Tables 3 and 4 give the relative bias, and Tables 5 and 6 the relative root mean square error, for the estimates of $\gamma$ and $\beta$, respectively. These tables indicate that the relative performance of the estimators varies across samples and across the data generation mechanisms (DGMs) for $v_{it}$. To convey an overall picture, we therefore compute the (algebraic) average of the bias and RMSE for each estimator. These row averages are presented in the last column of the tables. We first describe the results in terms of these averages and then consider the inter-sample and inter-DGM variations. Beginning with $\gamma$, we may first consider the results on bias. Table 3 shows that the OLS estimates of $\gamma$ are, as expected, positively biased, with the bias averaging seventeen percent. The panel estimates of $\gamma$, on the other hand and also as expected, are negatively biased. The only exception is the AH(d) estimator, which displays a small positive bias when $v_{it}$ is generated under the uncorrelated (UC) scheme. However, the average bias is negative for this estimator too. We refrain from reporting results for the AH(l) estimator because of its very poor performance. (We will come to this issue shortly.) Among the panel estimators, the bias is smallest for the AH(d), LSDV, and MD estimators, ranging between five and six percent. These are followed by the SE estimators, for which the bias ranges from about eight to ten percent. The largest bias, about twenty-two percent, is associated with the ABGMM estimators. Table 5 shows that the RMSE in estimating $\gamma$ follows a similar pattern. The average RMSE for the OLS estimator stands at seventeen percent. For the LSDV and MD estimators, this ratio lies between six and seven percent. For the AH(d) estimator the ratio averages eleven percent.
For the SE estimators, this ratio lies between thirteen and twenty percent. For the ABGMM estimators, it is roughly forty percent or more. Looking at the bias results for $\beta$ (Table 4), we see that the OLS estimates are again severely biased upward, with the bias now averaging forty-eight percent. The direction of bias of the panel estimators is mixed, but the panel procedures yield estimates that are on average quite close to the true parameter values. The absolute value of the average bias for the panel estimators ranges from under one to seven percent. Within this range, however, the LSDV, MD, 2SLS, and 3SLS estimators perform better, with an average bias of less than one percent. Next come the AH(d) and G3SLS estimators, with a bias between one and two percent. The largest biases, between five and seven percent, are recorded for the ABGMM estimators. The smallness of the average biases of the panel estimates of $\beta$ is, however, swamped by the large variances of their Monte Carlo distributions. This finds
Table 3. Bias as Percentage of True Parameter Value for $\gamma$ in the Model $y_{it} = \gamma y_{i,t-1} + \beta x_{i,t-1} + \mu_i + \eta_t + v_{it}$

                NONOIL                  INTER                  OECD
Estimator     UC  MA(1)  AR(1)       UC  MA(1)  AR(1)       UC  MA(1)  AR(1)   Row Avg.
OLS         14.8   14.6   14.8     15.2   15.2   15.4     21.5   20.9   21.2       17.1
LSDV        –8.0   –8.2   –7.9     –8.4   –9.3   –8.0     –1.6   –1.7   –1.4       –6.1
AH(l)       n.r.   n.r.   n.r.     n.r.   n.r.   n.r.     n.r.   n.r.   n.r.       n.r.
AH(d)        0.4  –14.5  –15.9      0.2   –9.5  –10.0      0.6   –1.2   –1.6       –5.7
ABGMM1     –10.7  –10.4  –10.6    –44.4  –49.5  –43.4     –9.5   –8.6   –8.3      –21.7
ABGMM2      –9.7  –10.1  –10.2    –47.3  –49.5  –44.4     –8.6   –8.2   –8.5      –21.8
2SLS        –9.3   –9.3   –8.6     –3.1   –3.1   –2.8    –18.8  –15.8  –17.1       –9.8
3SLS        –4.5   –3.3   –6.5     –5.8   –7.9   –5.3    –12.7  –12.2  –12.2       –7.8
G3SLS       –6.0   –5.4   –5.2     –8.3  –10.1   –8.8    –19.9  –16.5  –13.4      –10.4
MD          –6.7   –6.9   –6.4     –6.9   –7.9   –6.7     –1.3   –1.1   –1.2       –5.0

Notes: 1. The true values of γ differ across samples and are given in Table 1. 2. 'Row Avg.' is the algebraic average of the numbers in the row. 3. NONOIL, INTER, and OECD are different samples; UC, MA(1), and AR(1) refer to the uncorrelated, moving-average, and autoregressive generating mechanisms of the transitory error v_it. 4. 'n.r.' stands for 'not reported', because these numbers generally prove to be too large.
Table 4. Bias as Percentage of True Parameter Value for $\beta$ in the Model $y_{it} = \gamma y_{i,t-1} + \beta x_{i,t-1} + \mu_i + \eta_t + v_{it}$

                NONOIL                  INTER                  OECD
Estimator     UC  MA(1)  AR(1)       UC  MA(1)  AR(1)       UC  MA(1)  AR(1)   Row Avg.
OLS         31.4   32.1   31.6     11.8   11.7   11.1    100.0   99.9  100.5       47.8
LSDV         1.0   –0.7   –0.3      0.5    1.4    1.1     –1.5   –0.7   –2.1       –0.1
AH(l)       n.r.   n.r.   n.r.     n.r.   n.r.   n.r.     n.r.   n.r.   n.r.       n.r.
AH(d)        0.4   –2.2   –4.0     –0.6   –1.1   –1.0     –1.7   –2.5   –0.4       –1.5
ABGMM1      13.7   14.4   14.5     –7.5   16.3   26.4     –7.3   –5.4   –2.6        6.9
ABGMM2       3.9    5.2   14.7      3.1   22.1   34.5    –17.3  –19.9    1.5        5.3
2SLS        –2.3   –2.7   –1.9      2.7    2.0    2.5     –0.8   –3.9   –3.9       –0.9
3SLS        –0.2   –0.2   –2.0     –2.0   –2.3    9.3     –5.1    4.5   –8.9       –0.8
G3SLS       –1.3   –2.4   –1.7      2.0    2.2    8.7    –14.2   –8.1    2.0       –1.4
MD           0.2   –0.7   –0.8      1.0    0.5    0.0     –0.6   –0.6    1.3       0.03

Notes: 1. The true values of β differ across samples and are given in Table 1. 2. 'Row Avg.' is the algebraic average of the numbers in the row. 3. NONOIL, INTER, and OECD are different samples; UC, MA(1), and AR(1) refer to the uncorrelated, moving-average, and autoregressive generating mechanisms of the transitory error v_it. 4. 'n.r.' stands for 'not reported', because these numbers generally prove to be too large.
Table 5. Root MSE as Percentage of True Parameter Value for $\gamma$ in the Model $y_{it} = \gamma y_{i,t-1} + \beta x_{i,t-1} + \mu_i + \eta_t + v_{it}$

                NONOIL                  INTER                  OECD
Estimator     UC  MA(1)  AR(1)       UC  MA(1)  AR(1)       UC  MA(1)  AR(1)   Row Avg.
OLS         15.0   14.8   14.9     15.3   15.3   15.3     22.3   21.7   22.0       17.4
LSDV         8.5    8.7    8.5      8.9    9.9    8.7      3.5    3.6    3.6        7.1
AH(l)       n.r.   n.r.   n.r.     n.r.   n.r.   n.r.     n.r.   n.r.   n.r.       n.r.
AH(d)        8.3   16.6   17.6      5.4   13.9   13.3      7.3    7.2    7.5       10.8
ABGMM1      27.7   27.5   26.7     64.8   70.4   65.9     24.3   21.9   23.7       39.2
ABGMM2      29.6   29.3   28.9     79.7   84.9   77.1     32.9   29.1   31.0       46.9
2SLS        12.1   12.6   12.0      5.1    5.4    5.0     24.3   21.3   23.0       13.4
3SLS         8.5   14.7   10.4      9.6   11.1    8.9     28.4   28.4   23.6       16.0
G3SLS       10.0   18.1    8.7     11.9   13.8   12.6     40.9   37.6   29.3       20.3
MD           7.4    7.8    7.4      7.6    8.7    7.6      3.0    3.1    3.2        6.2

Notes: 1. The true values of γ differ across samples and are given in Table 1. 2. 'Row Avg.' is the algebraic average of the numbers in the row. 3. NONOIL, INTER, and OECD are different samples; UC, MA(1), and AR(1) refer to the uncorrelated, moving-average, and autoregressive generating mechanisms of the transitory error v_it. 4. 'n.r.' stands for 'not reported', because these numbers generally prove to be too large.
Table 6. Root MSE as Percentage of True Parameter Value for $\beta$ in the Model $y_{it} = \gamma y_{i,t-1} + \beta x_{i,t-1} + \mu_i + \eta_t + v_{it}$

                NONOIL                  INTER                  OECD
Estimator     UC  MA(1)  AR(1)       UC  MA(1)  AR(1)       UC  MA(1)  AR(1)   Row Avg.
OLS         34.6   35.2   34.7     18.8   18.1   17.7    117.4  116.6  116.0       56.6
LSDV        12.8   15.3   15.4     12.4   14.5   14.3     40.1   44.9   43.8       23.7
AH(l)       n.r.   n.r.   n.r.     n.r.   n.r.   n.r.     n.r.   n.r.   n.r.       n.r.
AH(d)       19.9   20.1   19.1     20.3   21.2   18.1     64.9   60.2   62.1       34.0
ABGMM1     147.0  151.9  145.3    148.0  153.8  143.6    243.5  237.9  226.5      177.5
ABGMM2     169.6  169.7  165.1    187.7  205.2  181.9    306.8  284.6  284.1      217.2
2SLS        17.1   18.4   17.5     19.5   20.8   19.3     58.3   54.7   57.9       31.5
3SLS        13.7   21.9   16.1     16.5   15.6   17.8     67.2   64.9   82.5       35.1
G3SLS       17.5   28.2   15.4     17.8   18.5   23.5    119.4  111.1  149.6       55.7
MD          13.3   15.4   15.8     12.6   15.1   14.4     40.5   44.6   45.6       24.1

Notes: 1. The true values of β differ across samples and are given in Table 1. 2. 'Row Avg.' is the algebraic average of the numbers in the row. 3. NONOIL, INTER, and OECD are different samples; UC, MA(1), and AR(1) refer to the uncorrelated, moving-average, and autoregressive generating mechanisms of the transitory error v_it. 4. 'n.r.' stands for 'not reported', because these numbers generally prove to be too large.
reflection in the large relative RMSE values reported in Table 6. The ratio of RMSE to the true value of $\beta$ for the OLS estimator stands at fifty-seven percent. For most of the panel estimators this ratio is much lower. For the LSDV and MD estimators, it is close to twenty-four percent. For the AH(d), 2SLS, and 3SLS estimators, it lies between about thirty-one and thirty-five percent. The G3SLS estimator displays a higher ratio, fifty-six percent, close to that of the OLS estimator. For the ABGMM estimators, however, the ratio ranges from 178 to 217 percent, much higher than for OLS. These results show that OLS estimation of the growth-convergence equation is very likely to produce significantly biased estimates. The performance of the panel estimators, on the other hand, varies. The LSDV and MD estimators perform well, and the SE estimators come next. The two AH estimators perform very differently: the AH(l) estimator performs so poorly that we refrain from presenting its results, while the AH(d) estimator sometimes performs better than the SE estimators. The ABGMM estimators display large bias and RMSE. These results agree with recent Monte Carlo evidence produced by other researchers in other contexts. For example, several studies have reported large bias of the ABGMM estimators, and others have reported good small-sample performance of the LSDV estimator. These results imply that OLS estimation of the growth-convergence equation should be avoided. Indiscriminate use of panel estimators is also fraught with danger. However, a judicious choice of panel estimator can yield better estimates of the parameters of the growth-convergence equation, and empirical growth researchers can make use of this possibility. Beyond these results of immediate concern, the study brings out several general points. The first concerns the contrasting performance of the AH estimators.
Both AH estimators rely on the assumption that lagged $y_{it}$ is orthogonal to $v_{it}$. This assumption holds only when $v_{it}$ is serially uncorrelated. One would therefore expect both estimators to perform well when $v_{it}$ is serially uncorrelated, and both to perform poorly when $v_{it}$ follows either the AR(1) or the MA(1) pattern. However, as the numbers in the tables show, the AH(d) performs relatively well under all generation mechanisms of $v_{it}$ and for all samples, while the performance of the AH(l) is unsatisfactory under all generation mechanisms and for all samples, particularly for the NONOIL and the INTER samples. The explanation, as it turns out, lies in the difference in the degree of correlation of the instruments with the instrumented variable. The instrument used by the AH(d), $(y_{i,t-2} - y_{i,t-3})$, is strongly correlated with the explanatory variable $(y_{i,t-1} - y_{i,t-2})$, while the instrument used by the AH(l), $y_{i,t-2}$, is very poorly correlated with $(y_{i,t-1} - y_{i,t-2})$. This poor correlation is reflected in the extremely large standard errors of the AH(l) estimates. These results reconfirm that instruments must be sufficiently correlated with the instrumented variable (in addition to being uncorrelated with the error), and they highlight the importance of research on estimation with ‘weak’ instruments.10
A second point concerns the performance of the ABGMM estimators as well as the AH(d) estimator. The performance of these estimators does not vary much over the three generation mechanisms of $v_{it}$. This is particularly true with regard to estimation of . This is somewhat surprising, because these estimators depend heavily for their validity on the orthogonality of lagged values of $y_{it}$ to $v_{it}$, and this orthogonality is violated when $v_{it}$ follows either an AR or an MA scheme. It is true that the order of serial correlation is low. However, one would expect some effect of the serial correlation, given that it invalidates so many instruments. Actually, the AH(d) estimator does show some sensitivity with respect to the generation scheme of $v_{it}$. Why the ABGMM estimators do not display similar sensitivity is an intriguing question.
The third point relates to the variation in the performance of the estimators across samples. The overall picture portrayed above is based on averages over samples and DGMs. Looking at inter-sample variation, however, it is difficult to establish a pattern. For example, going by the results on bias of estimated , the performance of the OLS estimator deteriorates for the OECD sample compared with either the NONOIL or the INTER sample. In the case of the LSDV and the MD estimators, however, the opposite is true. The ABGMM and the SE estimators show yet another kind of contrast.
The performance of the ABGMM estimators deteriorates for the INTER sample in comparison with either the NONOIL or the OECD sample. In the case of the SE estimators, the opposite is true. The contrasting performance of the ABGMM and the SE estimators may not be entirely surprising, given that the former depend on lagged $y_{it}$'s as instruments while the latter rely entirely on the $x_{it}$'s.
The fourth point concerns the relative performance of simple and sophisticated versions of generically similar estimators. The averaged RMSE values presented in Tables 5 and 6 show that the simpler 2SLS estimator outperforms the 3SLS and the G3SLS. Similarly, in terms of these averaged values, the ABGMM1 outperforms the ABGMM2.11 This highlights the fact that sophisticated estimators requiring estimated weighting matrices do not necessarily perform better than their simpler counterparts that do not require such matrices. Estimation of these weighting matrices creates additional scope for noise to enter the estimation process, and that may nullify the potential gain.
The final point concerns the performance of the LSDV estimator. As is well known, for a dynamic panel data model the LSDV estimator is inconsistent as $N \to \infty$ with $T$ fixed. It is true that the LSDV estimator is consistent as $T \to \infty$. However, $T$ in this study is too small to make one hopeful, a priori, of benefiting from $T$-asymptotics. The results of this chapter regarding the LSDV estimates show that even theoretically inconsistent estimators can have good small-sample properties. This reinforces the importance of Monte Carlo studies.
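The final point, that LSDV is inconsistent for fixed T but consistent as T grows, can be illustrated with a minimal Monte Carlo sketch in Python. It assumes a hypothetical pure autoregressive panel, y_it = γ y_{i,t-1} + μ_i + v_it with μ_i and v_it drawn from N(0, 1), γ = 0.5, N = 100 units and 100 replications. The x_it regressors of the growth-convergence equation are omitted, so the magnitudes are not comparable to those in Tables 5 and 6; the sketch is only meant to show the direction of the LSDV bias at small T and how it shrinks as T grows, reported both as mean bias and as the ratio of RMSE to the true value.

```python
import math
import random
import statistics


def simulate_panel(n, t, gamma, rng):
    """Hypothetical DGP: y_it = gamma * y_{i,t-1} + mu_i + v_it, mu_i and v_it ~ N(0, 1).

    Each unit starts from its stationary distribution, so no burn-in is needed.
    Returns n paths, each holding t + 1 values (an initial value plus t periods).
    """
    sd0 = math.sqrt(1.0 / (1.0 - gamma**2))
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)
        y = mu / (1.0 - gamma) + rng.gauss(0.0, sd0)
        path = [y]
        for _ in range(t):
            y = gamma * y + mu + rng.gauss(0.0, 1.0)
            path.append(y)
        panel.append(path)
    return panel


def lsdv_gamma(panel):
    """Within (LSDV) estimate of gamma: demean lag and current value unit by unit."""
    num = den = 0.0
    for path in panel:
        lag, cur = path[:-1], path[1:]
        mean_lag, mean_cur = statistics.fmean(lag), statistics.fmean(cur)
        num += sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lag, cur))
        den += sum((a - mean_lag) ** 2 for a in lag)
    return num / den


def lsdv_performance(t, gamma=0.5, n=100, reps=100, seed=7):
    """Monte Carlo mean bias and RMSE/true-value ratio of the LSDV estimate of gamma."""
    rng = random.Random(seed)
    draws = [lsdv_gamma(simulate_panel(n, t, gamma, rng)) for _ in range(reps)]
    bias = statistics.fmean(draws) - gamma
    rmse = math.sqrt(statistics.fmean([(d - gamma) ** 2 for d in draws]))
    return bias, rmse / abs(gamma)


for t in (5, 10, 20, 40):
    bias, rel_rmse = lsdv_performance(t)
    print(f"T = {t:2d}: mean bias = {bias:+.3f}, RMSE/true value = {rel_rmse:.2f}")
```

The bias should be negative at T = 5 and shrink steadily toward zero as T increases, which is the T-asymptotics the text refers to.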
V. CONCLUDING REMARKS
The issue of small sample properties of dynamic panel estimators is important, since both substantive and methodological conclusions often depend on the attention given to it. For example, Caselli et al. (1996) reject the Solow model based on their results from estimating the growth-convergence equation with a variant of the ABGMM estimator. The small sample bias of this estimator reported in this and other studies may raise the question of whether that rejection was premature. Their estimation results also prompt the authors to abandon the strictly model-based specification in favor of an extended version that includes a variety of variables chosen on heuristic grounds. From a methodological point of view, this is a throwback to the earlier stage of cross-country growth research, when specifications were informal and the regression coefficients did not correspond exactly to the structural parameters of the production function. One of the great merits of Mankiw, Romer & Weil (1992) and Barro & Sala-i-Martin (1992) was to put an end to that stage. Methodologically, therefore, a return to informal specifications may not be ideal. A more satisfactory solution may be a two-stage analysis, in which the first stage adheres to the formal, model-based specification and yields unbiased estimates of the parameters and of productivity, and the second stage focuses on the role of the ‘heuristic’ variables in explaining productivity differences. This approach, however, requires attention to the small sample performance of the estimator used in the first stage.
NOTES
1. For a derivation of the growth-convergence equation, see Barro & Sala-i-Martin (1992, 1995), Mankiw, Romer & Weil (1992), and Mankiw (1995). For the conversion of the growth-convergence equation into a dynamic panel data model, see Islam (1993, 1995).
2. For discussions of many of these new estimators, see Baltagi (1995) and Hsiao (1986).
3. This is the value of that has been used in Islam (1993, 1995), Knight et al. (1993), Caselli et al. (1996) and several other papers.
4. For example, for the MA(1) model, this starts by noticing that $E(u_i u_i')$ has the following structure:
$$
E(u_i u_i') =
\begin{pmatrix}
\sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 \\
\sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 & \sigma_\mu^2 \\
\sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 \\
\sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 \\
\sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2
\end{pmatrix}
$$
where $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$ and $T = 5$. As expected, $E(u_i u_i')$ has three parameters, namely $\sigma_\mu^2$, $\sigma_\varepsilon^2$, and $\theta$. The sample analog of this covariance matrix is obtained from $\frac{1}{N}\sum_i \hat{u}_i \hat{u}_i'$, where $\hat{u}_i = (\hat{u}_{i1}, \ldots, \hat{u}_{iT})'$ and the $\hat{u}_{it}$'s are obtained from the second step. There are $T(T+1)/2 = 15$ distinct elements in this sample covariance matrix, which are (nonlinear) functions of the three underlying parameters. Estimates of $\sigma_\mu^2$, $\sigma_\varepsilon^2$, and $\theta$ can be obtained from these 15 elements using the MD estimation framework; see Chamberlain (1982, 1983) for details. An analogous procedure is followed for the AR(1) model to obtain estimates of its three parameters. Estimation of $\sigma_v$ and $\sigma_\mu$ for the UC case is easier.
5. Perhaps also of interest is that the values of both and are largest in the NONOIL sample and smallest in the OECD sample, with the values for the INTER sample in between.
6. For further details on the construction of the $x_{it}$'s, see Islam (1995).
7. In this study we have limited ourselves to parametric distributions of the disturbance term. In principle it is possible to do away with parametric assumptions; we leave this as a future task.
8. To save space, we do not provide a detailed description of the estimators. Many of these are well known; for the rest, the interested reader can consult the cited references. An appendix describing the estimators is also available from the author upon request.
9. In this chapter we report only the summary results.
The detailed results are in a set of Appendix Tables, which are available upon request.
10. See, for example, Nelson & Startz (1990), Staiger & Stock (1997), and Wang & Zivot (1998).
11. To be sure, this ranking does not hold for every sample and every DGM. For example, in the NONOIL sample, regardless of the DGM, results from the 3SLS and the G3SLS estimators seem to be better than those from the 2SLS. For the INTER sample, however, the 2SLS seems to perform better than either the 3SLS or the G3SLS. In the OECD sample, the situation is less clear-cut. In terms of the mean of the Monte Carlo distribution, the 3SLS and the G3SLS fare better than the 2SLS, though not in terms of dispersion. On the other hand, in the OECD sample the Monte Carlo distributions for the 2SLS estimator have very large standard deviations. One reason for the deterioration in performance of the 3SLS and the G3SLS estimators in the INTER and the OECD samples, compared with the NONOIL sample, may lie in sample size: the INTER and OECD samples are smaller than the NONOIL sample. Since the superiority of the 3SLS and the G3SLS over the 2SLS estimator is an asymptotic result, a larger sample size may help this result to surface.
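The minimum-distance step described in note 4 can be sketched in Python as follows. The sketch assumes the MA(1) error-components form u_it = μ_i + ε_it + θε_{i,t-1} with T = 5; the parameter names sigma_mu2, sigma_e2 and theta, the helper names, and all numeric values are illustrative. For simplicity it uses identity weights, whereas Chamberlain's MD framework would use an estimated optimal weighting matrix. With identity weights the fit to the 15 distinct elements has a closed form, because it reduces to matching the averages of the diagonal, the first off-diagonal, and the remaining elements.

```python
import math
import random


def ma1_covariance(sigma_mu2, sigma_e2, theta, t=5):
    """E(u_i u_i') when u_it = mu_i + e_it + theta * e_{i,t-1} (assumed MA(1) form)."""
    def element(j, k):
        lag = abs(j - k)
        if lag == 0:
            return sigma_mu2 + (1.0 + theta**2) * sigma_e2
        if lag == 1:
            return sigma_mu2 + theta * sigma_e2
        return sigma_mu2

    return [[element(j, k) for k in range(t)] for j in range(t)]


def sample_covariance(residuals):
    """(1/N) * sum_i u_i u_i' over residual vectors u_i, without centering."""
    n, t = len(residuals), len(residuals[0])
    return [[sum(u[j] * u[k] for u in residuals) / n for k in range(t)]
            for j in range(t)]


def md_ma1(s):
    """Equally weighted minimum-distance fit of the MA(1) structure to S.

    Matches the averages of the diagonal (a), the first off-diagonal (b) and
    all remaining distinct elements (c). Returns (sigma_mu2, sigma_e2, theta).
    """
    t = len(s)
    groups = ([], [], [])
    for j in range(t):
        for k in range(j, t):
            groups[min(k - j, 2)].append(s[j][k])
    a, b, c = (sum(g) / len(g) for g in groups)
    p, q = b - c, a - c  # p = theta * sigma_e2, q = (1 + theta^2) * sigma_e2
    disc = q * q - 4.0 * p * p
    if c < 0.0 or q <= 0.0 or disc < 0.0:
        raise ValueError("sample moments lie outside the MA(1) parameter space")
    # sigma_e2 solves x^2 - q*x + p^2 = 0; the larger root gives |theta| <= 1
    sigma_e2 = (q + math.sqrt(disc)) / 2.0
    return c, sigma_e2, p / sigma_e2


def simulate_residuals(n, sigma_mu2, sigma_e2, theta, t=5, seed=3):
    """Draw n residual vectors from the assumed MA(1) error-components model."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        mu = rng.gauss(0.0, math.sqrt(sigma_mu2))
        e = [rng.gauss(0.0, math.sqrt(sigma_e2)) for _ in range(t + 1)]
        draws.append([mu + e[s + 1] + theta * e[s] for s in range(t)])
    return draws


print(md_ma1(ma1_covariance(0.5, 1.0, 0.4)))  # fit to the exact population matrix
print(md_ma1(sample_covariance(simulate_residuals(20000, 0.5, 1.0, 0.4))))
```

Applied to the exact population matrix, the fit recovers the assumed parameters up to rounding; applied to a large simulated sample, the estimates should lie close to them.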
ACKNOWLEDGMENTS
I would like to thank Professor Chamberlain, Professor Jorgenson, and Professor Guido Imbens for their guidance on my work on this paper. Initial versions of this chapter were presented in seminars at Harvard University and Emory University. Comments of the participants of these seminars are greatly appreciated. I would like to extend my sincere thanks to the three referees and the editor, Professor Badi Baltagi, for their comments and suggestions that led to significant improvement of this chapter. All remaining errors are mine.
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–27.
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Models: Alternative Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309–321.
Ahn, S. C., & Schmidt, P. (1999). Estimation of Linear Panel Data Models Using GMM. In: L. Mátyás (Ed.), Generalized Method of Moments Estimation. Cambridge: Cambridge University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association, 76, 598–606.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics, 18, 47–82.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. The Review of Economic Studies, 58, 277–297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variable Estimation of Error Components Models. Journal of Econometrics, 68, 29–52.
Balestra, P., & Nerlove, M. (1966). Pooling Cross-section and Time Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas. Econometrica, 34, 585–612.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B. H., & Kao, C. (2000). Non-stationary Panels, Cointegration in Panels, and Dynamic Panels: A Survey. Advances in Econometrics, 15 (this volume).
Barro, R. (1997). Determinants of Economic Growth: A Cross-country Empirical Study. Cambridge: MIT Press.
Barro, R., & Sala-i-Martin, X. (1992). Convergence. Journal of Political Economy, 100(2), 223–251.
Barro, R., & Sala-i-Martin, X. (1995). Economic Growth. Boston: McGraw Hill.
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variable Estimators. Econometrica, 62, 657–681.
Blundell, R., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115–143.
Caselli, F., Esquivel, G., & Lefort, F. (1996). Reopening the Convergence Debate: A New Look at Cross-country Growth Empirics. Journal of Economic Growth, 1(3), 363–390.
Chamberlain, G. (1982). Multivariate Regression Models for Panel Data. Journal of Econometrics, 18, 5–46.
Chamberlain, G. (1983). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of Econometrics (Vol. II, pp. 1247–1318). North-Holland.
Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed Effects? Journal of Econometrics, 93, 309–326.
Harris, M. N., & Mátyás, L. (1996). A Comparative Analysis of Different Estimators for Dynamic Panel Data Models. Working Paper 04/96, Department of Econometrics and Business Statistics, Monash University.
Harris, M., Longmire, R., & Mátyás, L. (1996). Robustness of Estimators for Dynamic Panel Data Models to Misspecification. Working Paper 14/96, Department of Econometrics and Business Statistics, Monash University.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Islam, N. (1993). Estimation of Dynamic Models from Panel Data. Unpublished Ph.D. Dissertation, Department of Economics, Harvard University.
Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, CX, 1127–1170.
Judson, R. A., & Owen, A. L. (1997). Estimating Dynamic Panel Data Models: A Practical Guide for Macroeconomists. Board of Governors of the Federal Reserve System, Finance and Economics Discussion Paper Series 1997/03.
Kiviet, J. (1995). On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models. Journal of Econometrics, 68, 53–78.
Knight, M., Loayza, N., & Villanueva, D. (1993). Testing for Neoclassical Theory of Growth. IMF Staff Papers, 40(3), 512–541.
Lee, K., Pesaran, H., & Smith, R. (1997). Growth and Convergence in a Multi-Country Empirical Stochastic Growth Model. Journal of Applied Econometrics, 12, 357–392.
Lee, K., Pesaran, H., & Smith, R. (1998). Growth Empirics: A Panel Data Approach – A Comment. Quarterly Journal of Economics, CXIII, 319–323.
Lee, M., Longmire, R., Mátyás, L., & Harris, M. (1998). Growth Convergence: Some Panel Evidence. Applied Economics, 30, 907–912.
Mankiw, N. G. (1995). The Growth of Nations. Brookings Papers on Economic Activity, 1, 275–310.
Mankiw, N. G., Romer, D., & Weil, D. (1992). A Contribution to the Empirics of Economic Growth. Quarterly Journal of Economics, CVII, 407–437.
Mátyás, L. (Ed.) (1999). Generalized Method of Moments Estimation. Cambridge: Cambridge University Press.
Mundlak, Y. (1971). On the Pooling of Time Series and Cross-section Data. Econometrica, XXXVI, 69–85.
Nelson, C. R., & Startz, R. (1990). Some Further Results on the Exact Small Sample Properties of the Instrumental Variables Estimator. Econometrica, 58, 967–976.
Nerlove, M. (1967). Experimental Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-sections. Economic Studies Quarterly, 18, 42–74.
Nerlove, M. (1971). Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-sections. Econometrica, 39, 383–396.
Nerlove, M. (1999). Properties of Alternative Estimators of Dynamic Panel Models: An Empirical Analysis of Cross-country Data for the Study of Economic Growth. In: C. Hsiao, K. Lahiri, L. Lee & M. Pesaran (Eds), Analysis of Panels and Limited Dependent Variable Models. Cambridge: Cambridge University Press.
Nickell, S. (1979). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1399–1416.
Staiger, D., & Stock, J. H. (1997). Instrumental Variable Regressions with Weak Instruments. Econometrica, 65, 557–586.
Summers, R., & Heston, A. (1988). A New Set of International Comparisons of Real Product and Price Levels Estimates for 130 Countries, 1950–85. Review of Income and Wealth, XXXIV, 1–26.
Summers, R., & Heston, A. (1991). The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
Wang, J., & Zivot, E. (1998). Inference on Structural Parameters in Instrumental Variables Regression with Weak Instruments. Econometrica, 66(6), 1389–1404.
Wansbeek, T. J., & Knaap, T. (1998). Estimating a Dynamic Panel Data Model with Heterogenous Trends. Working paper, Department of Economics, University of Groningen.
Ziliak, J. P. (1997). Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators. Journal of Business and Economic Statistics, 15, 419–431.