Editorial policy: The Journal of Econometrics is designed to serve as an outlet for important new research in both theoretical and applied econometrics. Papers dealing with estimation and other methodological aspects of the application of statistical inference to economic data as well as papers dealing with the application of econometric techniques to substantive areas of economics fall within the scope of the Journal. Econometric research in the traditional divisions of the discipline or in the newly developing areas of social experimentation is decidedly within the range of the Journal's interests. The Annals of Econometrics form an integral part of the Journal of Econometrics. Each issue of the Annals includes a collection of refereed papers on an important topic in econometrics.

Editors: T. AMEMIYA, Department of Economics, Encina Hall, Stanford University, Stanford, CA 94305-6072, USA. A.R. GALLANT, Duke University, Fuqua School of Business, Durham, NC 27708-0120, USA. J.F. GEWEKE, Department of Economics, University of Iowa, Iowa City, IA 52240-1000, USA. C. HSIAO, Department of Economics, University of Southern California, Los Angeles, CA 90089, USA. P. ROBINSON, Department of Economics, London School of Economics, London WC2A 2AE, UK. A. ZELLNER, Graduate School of Business, University of Chicago, Chicago, IL 60637, USA.

Executive Council: D.J. AIGNER, Paul Merage School of Business, University of California, Irvine CA 92697; T. AMEMIYA, Stanford University; R. BLUNDELL, University College, London; P. DHRYMES, Columbia University; D. JORGENSON, Harvard University; A. ZELLNER, University of Chicago.

Associate Editors: Y. AÏT-SAHALIA, Princeton University, Princeton, USA; B.H. BALTAGI, Syracuse University, Syracuse, USA; R. BANSAL, Duke University, Durham, NC, USA; M.J.
CHAMBERS, University of Essex, Colchester, UK; SONGNIAN CHEN, Hong Kong University of Science and Technology, Kowloon, Hong Kong; XIAOHONG CHEN, Department of Economics, Yale University, 30 Hillhouse Avenue, P.O. Box 208281, New Haven, CT 06520-8281, USA; MIKHAIL CHERNOV (LSE), London Business School, Sussex Place, Regents Park, London, NW1 4SA, UK; V. CHERNOZHUKOV, MIT, Massachusetts, USA; M. DEISTLER, Technical University of Vienna, Vienna, Austria; M.A. DELGADO, Universidad Carlos III de Madrid, Madrid, Spain; YANQIN FAN, Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA; S. FRUHWIRTH-SCHNATTER, Johannes Kepler University, Linz, Austria; E. GHYSELS, University of North Carolina at Chapel Hill, NC, USA; J.C. HAM, University of Southern California, Los Angeles, CA, USA; J. HIDALGO, London School of Economics, London, UK; H. HONG, Stanford University, Stanford, USA; MICHAEL KEANE, University of Technology Sydney, P.O. Box 123 Broadway, NSW 2007, Australia; Y. KITAMURA, Yale University, New Haven, USA; G.M. KOOP, University of Strathclyde, Glasgow, UK; N. KUNITOMO, University of Tokyo, Tokyo, Japan; K. LAHIRI, State University of New York, Albany, NY, USA; Q. LI, Texas A&M University, College Station, USA; T. LI, Vanderbilt University, Nashville, TN, USA; R.L. MATZKIN, Northwestern University, Evanston, IL, USA; FRANCESCA MOLINARI (CORNELL), Department of Economics, 492 Uris Hall, Ithaca, New York 14853-7601, USA; F.C. PALM, Rijksuniversiteit Limburg, Maastricht, The Netherlands; D.J. POIRIER, University of California, Irvine, USA; B.M. PÖTSCHER, University of Vienna, Vienna, Austria; I. PRUCHA, University of Maryland, College Park, USA; E. RENAULT, University of North Carolina, Chapel Hill, NC, USA; R. SICKLES, Rice University, Houston, USA; F.
SOWELL, Carnegie Mellon University, Pittsburgh, PA, USA; MARK STEEL (WARWICK), Department of Statistics, University of Warwick, Coventry CV4 7AL, UK; DAG BJARNE TJOESTHEIM, Department of Mathematics, University of Bergen, Bergen, Norway; HERMAN VAN DIJK, Erasmus University, Rotterdam, The Netherlands; Q.H. VUONG, Pennsylvania State University, University Park, PA, USA; E. VYTLACIL, Columbia University, New York, USA; T. WANSBEEK, Rijksuniversiteit Groningen, Groningen, The Netherlands; T. ZHA, Federal Reserve Bank of Atlanta, Atlanta, USA and Emory University, Atlanta, USA.
Journal of Econometrics 163 (2011) 1–3
Contents lists available at ScienceDirect
Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom
Editorial
Factor structures for panel and multivariate time series data
This Special Issue of the Journal of Econometrics entitled Factor Structures for Panel and Multivariate Time Series Data contains a selection of papers presented at the international conference on the same subject organized in Maastricht, The Netherlands on September 18–20, 2008. The conference was held on the occasion of the 25th anniversary of the Faculty of Economics and Business Administration of Maastricht University. The main objective of this conference was to provide a forum where researchers using factor structures in multivariate time series and in panel data could meet and discuss the results of their most recent research. The aim of the conference organizers and the programme committee was also to promote interaction between the junior participants and the senior invited researchers. Younger researchers were encouraged to participate by being given the opportunity to present their papers either at the plenary sessions or at the poster sessions. These poster sessions were placed between the main programme sessions to provide maximum opportunity for younger researchers to discuss their work and interact with senior participants. Overall, 4 invited and 14 contributed papers were presented at the plenary sessions and 16 papers in the poster sessions. The meeting was attended by approximately 80 researchers. The Programme Committee members for this meeting were Jörg Breitung, Franc C. Palm, Jean-Pierre Urbain and Joakim Westerlund. Palm and Urbain also acted as local organizers. The conference brought together econometricians and statisticians approaching this general theme from various backgrounds. The range of topics covered by the papers presented in the plenary and poster sessions reflected this breadth. Factor models have become very popular in econometrics over the last decade; for recent surveys we refer to Breitung and Eickmeier (2006) and Bai and Ng (2008).
Some contributions to the conference adopted a statistical framework where the factor structure is viewed as the main focus of the analysis following, for example, the approximate static factor framework of Bai and Ng (2002) and Bai (2003), the dynamic factor models for large panels in the spirit of Forni et al. (2000), or adopting an unobserved component model or a factor-GARCH set-up for volatility modeling. The papers presented in this issue by Forni and Lippi, by Hallin and Liska, by Hallin, Mathias, Pirotte and Veredas, by Eichler, Motta and von Sachs, by Breitung and Eickmeier, and by Boswijk and van der Weide fall into this category. Other contributions view the factor structure essentially as a convenient way to model potentially strong cross-sectional dependence in a large panel data set, using more traditional dynamic panel models, or in VAR models. In the latter case, the factor structure is not of main interest. The purpose is rather to propose inferential techniques that are robust to the presence of factors, for instance in the line of Pesaran (2006). This is the framework
adopted by Chudik and Pesaran, while Palm, Smeekes and Urbain rely on bootstrap techniques to achieve this robustness. Finally, some contributions, including that of Franchi and Paruolo, focus on the implicit common factor structure contained in reduced rank multivariate time series models displaying co-movement as implied by the existence of common stochastic trends and/or common cyclical features. These different approaches led to an exchange of views and debate between leading researchers on some fundamental issues arising in this field of econometrics. The debate was focused on ways to deal with cross-sectional dependence and other nuisance parameter structures in large dynamic panels and multivariate time series and, more generally, on directions to address problems of nuisance parameters and the curse of dimensionality in these models. This Special Issue is by no means a proceedings volume. Nineteen papers presented during the conference were submitted, far more than were accepted for publication after the review process. The papers that are collected here have all undergone a rigorous reviewing and revision process. The ordering of the papers in this Special Issue reflects existing connections and intersections whenever possible. The contributions can be summarized as follows. Chudik and Pesaran consider the issue of the curse of dimensionality in the case of infinite dimensional VAR models with common factors by shrinking part of the parameter space in the limit as the number of variables in the VAR tends to infinity. Using the concepts of weak and strong cross-sectional dependence (see Chudik et al., 2010) they introduce the concept of ''neighbors'' and assume that each unit in the VAR is related to a small number of neighbors that is not affected by the dimension of the VAR, and to a large number of non-neighbors whose effect vanishes asymptotically when the number of units increases.
The authors study estimation and inference in this class of models in the presence of common factors and propose a cross-section augmented least squares estimator. They study its asymptotic and finite sample behavior under both types of dependence structure. An application to real house prices is used to illustrate the importance of dynamic and contemporaneous spillover effects. Forni and Lippi consider the general (infinite dimensional) dynamic factor model as proposed by Forni et al. (2000). While these authors obtained a consistent estimator of the common components based on two-sided filters, which are unsuitable for prediction, the present paper assumes a rational spectral density for the common components. The main contribution is a consistent estimator obtained from a factorization of the spectral density matrix into one-sided filters without the finite-dimension assumption.
Building on earlier work by, inter alia, Forni et al. (2000) on generalized dynamic factor models and on their previous work (Hallin and Liska, 2007), Hallin and Liska propose a framework to identify and estimate block-specific common factor structures in a large panel of stationary time series. Their analysis reveals that it is possible to disentangle in a more detailed way common factors that might be specific to particular subpanels or blocks of units. An empirical illustration using industrial indices for three countries shows an interesting decomposition of the factor structure. In their paper, Hallin, Mathias, Pirotte and Veredas show the usefulness of the analysis proposed by Hallin and Liska (2007) in an empirical study. Using data on two observed liquidity measures, namely daily close relative spreads and daily traded volumes, for a sample of 426 S&P 500 stocks observed over the period 2004–2006, they propose a measure of liquidity based on dynamic factors. In particular, they define unobserved liquidity as the one-dimensional common shock driving both measures. Motivated by the features of economic data over long time periods which may exhibit smooth transitions over time in their covariance structure, Eichler, Motta and von Sachs allow the dynamic structure of their factor model to be non-stationary as a result of deterministically varying factor loadings over time. The common components of the model are estimated by the eigenvectors of a consistent estimator of the now time-varying spectral density matrix of the data generating process. This method can be seen as a time-varying principal components approach in the frequency domain. The consistency of this estimator is proved in a 'double-asymptotic' framework where both the time and cross-section dimensions tend to infinity. The performance of the estimator is illustrated in a simulation study and in an application to macroeconomic data.
Breitung and Eickmeier analyze the consequences of the presence of structural breaks in dynamic factor models and show that such breaks can induce an overestimation of the number of factors based on standard criteria. They propose test statistics to test for structural change in factor models with known or unknown break location. The relevance of these tests is illustrated in two empirical analyses where the authors revisit a large US macroeconomic dataset studied, for example, in Stock and Watson (2005, 2008) and find support for substantial changes in the US economy around the Great Moderation. Using a large Euro-area data set, their second empirical analysis concentrates on the effect of the signing of the Maastricht Treaty and of the handover of monetary policy to the ECB. Factor structures have also been extensively used to model cross-sectional dependence when testing for panel unit roots, see for example Bai and Ng (2004) and Pesaran (2007). Palm, Smeekes and Urbain consider the issue of panel unit root testing for large panels of time series where the cross-sectional dependence can take a general form including, but not restricted to, factor structures such as those adopted by Bai and Ng (2004) for example. Relying on a block bootstrap procedure, they propose bootstrap unit root tests and show their asymptotic validity. An extensive finite sample simulation study illustrates the good behavior of the tests compared to their asymptotic versions. In contrast to most of the existing tests in the literature, their bootstrap tests are robust to a very large class of cross-sectional dependence structures, including both long-run and short-run relationships between the cross-sectional units. Factor structures also arise in multiple time series with reduced rank structure.
Franchi and Paruolo present necessary and sufficient conditions for the existence of common cyclical features in VAR models integrated of order 0, 1 or 2, where the common cyclical features correspond to common serial (CS) correlation,
commonality in the final equations (CE) and codependence (CD). The results are based on polynomial rank factorizations of the reversed AR polynomial around the poles of its inverse. All processes with CS structures are found to also display CE structures, and vice versa. The presence of CD structures, by contrast, implies the presence of both CS and CE structures, but not vice versa. Characterizations of the CS, CE, CD linear combinations are given in terms of linear subspaces defined in the polynomial rank factorizations. The attractiveness of factor structures for modeling large covariance matrices in a parsimonious way has been exploited in multivariate volatility modeling, see Diebold and Nerlove (1989) for an early example. Boswijk and van der Weide propose a method-of-moments estimator of the factor loading matrix in a generalized orthogonal GARCH (GO-GARCH) model. Their approach, based on eigenvectors of sample autocorrelation matrices of squares and cross-products of returns, is computationally and numerically simple and only requires mild assumptions on the volatility models of the factors. The authors derive conditions for consistency of the estimator and analyze its efficiency in a Monte Carlo experiment. The attractiveness of the approach is illustrated in an empirical study of European sector returns. Acknowledgements The editors of this volume are grateful to Cheng Hsiao and to the late Arnold Zellner for their support in setting up this volume. They wish to thank Jörg Breitung and Joakim Westerlund who acted as editors handling the papers involving the guest editors as co-author. Financial support for the conference was provided by the Maastricht Research School of Economics of Technology and Organizations (METEOR), the Department of Quantitative Economics and by a Journal of Applied Econometrics conference sponsorship grant. Special thanks also to Karin van den Boorn for her outstanding contribution to the organization of this event.
Finally, the guest editors wish to express their gratitude to all contributors and participants at the meeting, and to the referees of the nineteen papers submitted to this Special Issue, whose names will be included in the list of referees of the entire volume.

References

Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.
Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Bai, J., Ng, S., 2004. A PANIC attack on unit roots and cointegration. Econometrica 72, 1127–1177.
Bai, J., Ng, S., 2008. Large dimensional factor analysis. Foundations and Trends in Econometrics 3, 89–163.
Breitung, J., Eickmeier, S., 2006. Dynamic factor models. Advances in Statistical Analysis 90, 27–42.
Chudik, A., Pesaran, M.H., Tosetti, E., 2010. Weak and strong cross section dependence and estimation of large panels. ECB Working Paper No. 1100, October 2009, revised April 2010.
Diebold, F.X., Nerlove, M., 1989. The dynamics of exchange rate volatility: a multivariate latent factor ARCH model. Journal of Applied Econometrics 4, 1–21.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalized dynamic factor model: identification and estimation. The Review of Economics and Statistics 82, 540–554.
Hallin, M., Liska, R., 2007. Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association 102, 603–617.
Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967–1012.
Pesaran, M.H., 2007. A simple panel unit root test in the presence of cross section dependence. Journal of Applied Econometrics 22, 265–312.
Stock, J.H., Watson, M.W., 2005. Implications of dynamic factor models for VAR analysis. NBER Working Paper No. 11467.
Stock, J.H., Watson, M.W., 2008. Forecasting in dynamic factor models subject to structural instability. In: Castle, J., Shephard, N. (Eds.), The Methodology and Practice of Econometrics: A Festschrift in Honour of Professor David F. Hendry. Oxford University Press, Oxford.
Franz C. Palm∗
Jean-Pierre Urbain
Department of Quantitative Economics, Maastricht University School of Business and Economics, The Netherlands
E-mail addresses: [email protected] (F.C. Palm), [email protected] (J.-P. Urbain)
8 November 2010
Available online 11 November 2010
∗ Corresponding address: Department of Quantitative Economics, Maastricht University School of Business and Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Tel.: +31 43 3 88 38 33; fax: +31 43 3 88 20 00.
Journal of Econometrics 163 (2011) 4–22
Infinite-dimensional VARs and factor models✩

Alexander Chudik a,b, M. Hashem Pesaran b,c,d,∗

a European Central Bank, Kaiserstrasse 29, 60311 Frankfurt am Main, Germany
b Centre for International Macroeconomics and Finance, University of Cambridge, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, UK
c Faculty of Economics, Cambridge University, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, UK
d University of Southern California, College of Letters, Arts and Sciences, University Park Campus, Kaprielian Hall 300, KAP M/C 0253, Los Angeles, CA, USA
Article info

Article history: Available online 11 November 2010
JEL classification: C10; C33; C51
Keywords: Large N and T panels; Weak and strong cross-section dependence; VARs; Spatial models; Factor models
Abstract

This paper proposes a novel approach for dealing with the 'curse of dimensionality' in the case of infinite-dimensional vector autoregressive (IVAR) models. It is assumed that each unit or variable in the IVAR is related to a small number of neighbors and a large number of non-neighbors. The neighborhood effects are fixed and do not change with the number of units (N), but the coefficients of non-neighboring units are restricted to vanish in the limit as N tends to infinity. Problems of estimation and inference in a stationary IVAR model with an unknown number of unobserved common factors are investigated. A cross-section augmented least-squares (CALS) estimator is proposed and its asymptotic distribution is derived. Satisfactory small-sample properties are documented by Monte Carlo experiments. An empirical illustration shows the statistical significance of dynamic spillover effects in the modeling of US real house prices across the neighboring states. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Vector autoregressive (VAR) models provide a flexible framework for the analysis of complex dynamics and interactions that exist across economic variables or units. Traditional VARs assume that the number of such variables, N, is fixed and the time dimension, T, tends to infinity. But since the number of parameters to be estimated grows at a quadratic rate with N, in practice empirical applications of VARs often involve only a handful of variables. The objective of this paper is to consider VARs where both N and T are large. In this case, parameters of the VAR model can no longer
✩ We are grateful to Elisa Tosetti, Jean-Pierre Urbain, and three anonymous referees for helpful comments and constructive suggestions. This version has also benefited from comments by the seminar participants at the University of California, San Diego, University of Southern California, Columbia University, University of Leicester, European University Institute, McGill University, Princeton University, University of Pennsylvania, Stanford University; as well as by the conference participants in the Nordic Econometric Meeting held at Lund University, and in the conference on the Factor Structures for Panels and Multivariate Time Series Data held at Maastricht University. We would also like to acknowledge Takashi Yamagata for carrying out the computation of the results reported in Section 6.
∗ Corresponding author at: Faculty of Economics, Cambridge University, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, UK. E-mail addresses: [email protected] (A. Chudik), [email protected] (M.H. Pesaran). URL: http://www.econ.cam.ac.uk/faculty/pesaran/ (M.H. Pesaran).
be consistently estimated unless suitable restrictions are imposed to overcome the dimensionality problem. Two different approaches have been suggested in the literature to deal with this ‘curse of dimensionality’: (i) shrinkage of the parameter space, and (ii) shrinkage of the data. Spatial and/or spatiotemporal literature shrinks the parameter space by using a priori given spatial weight matrices that restrict the nature of the links across the units. Alternatively, prior probability distributions are imposed on the parameters of the VAR such as the ‘Minnesota’ priors proposed by Doan et al. (1984). This class of models is known as Bayesian VARs (BVARs).1 The second approach is to shrink the data, along the lines of index models. Geweke (1977) and Sargent and Sims (1977) introduced dynamic factor models, which have more recently been generalized to allow for weak cross-section dependence by Forni and Lippi (2001) and Forni et al. (2000, 2004). Empirical evidence suggests that few dynamic factors are needed to explain
1 Other types of prior have also been considered in the literature. See, for example, Del Negro and Schorfheide (2004) for a recent reference. In most applications, BVARs have been applied to relatively small systems (e.g. Leeper et al. (1996) considered 13- and 18-variable BVARs; a few exceptions include Giacomini and White (2006) and De Mol et al. (2008)), with the focus being mainly on forecasting. Bayesian VARs are known to produce better forecasts than unrestricted VARs or structural models. See Litterman (1986) and Canova (1995) for further references.
the co-movements of macroeconomic variables.2 This has led to the development of factor-augmented VAR (FAVAR) models by Bernanke et al. (2005) and Stock and Watson (2005), among others. Applied researchers are often forced to impose arbitrary restrictions on the coefficients that link the variables of a given cross-section unit to the current and lagged values of the remaining units, mostly because they realize that without such restrictions the model is not estimable. This paper proposes a novel way to deal with the curse of dimensionality by shrinking part of the parameter space in the limit as the number of variables (N) tends to infinity. An important example would be a VAR model in which each unit is related to a small number of neighbors and a large number of non-neighbors. The neighbors could be individual units or, more generally, linear combinations of units (spatial averages). The neighborhood effects are fixed and do not change with N, but the coefficients corresponding to the remaining non-neighbor units are small, of order O(N^{-1}). Such neighborhood and non-neighborhood effects could be motivated by theoretical economic considerations, or could arise due to the mis-specification of spatial weights. Although under this set-up each of the non-neighboring coefficients is small, the sum of their absolute values in general does not tend to zero, and the aggregate spatiotemporal non-neighborhood effects could be large. This paper shows that, under weak cross-section dependence, the spillover effects from non-neighboring units are neither particularly important nor estimable.3 But the coefficients associated with the neighboring units can be consistently estimated by simply ignoring the non-neighborhood effects, which are of second-order importance in N. On the other hand, if the units are cross-sectionally strongly dependent, then the spillover effects from non-neighbors are in general important, and ignoring such effects can lead to inconsistent estimates.
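The order-of-magnitude argument above can be illustrated numerically. The following is a minimal sketch (our own illustrative construction, not the paper's design; the function name and parameter values are hypothetical): each unit loads on itself and one neighbor with fixed coefficients, while every non-neighbor coefficient is c/N. Each non-neighbor entry shrinks with N, yet the sum of the absolute non-neighbor coefficients in a row stays bounded away from zero.

```python
import numpy as np

def ivar_coefficients(N, phi_own=0.4, phi_nbr=0.2, c=0.3):
    """Build an N x N coefficient matrix Phi for a stylized IVAR in which
    each unit has one fixed neighbor effect and O(1/N) non-neighbor effects."""
    Phi = np.full((N, N), c / N)      # non-neighbor coefficients: O(1/N)
    np.fill_diagonal(Phi, phi_own)    # own-lag coefficient: fixed in N
    for i in range(1, N):
        Phi[i, i - 1] = phi_nbr       # single neighbor: fixed in N
    return Phi

for N in (10, 100, 1000):
    Phi = ivar_coefficients(N)
    nonnbr = Phi.copy()
    np.fill_diagonal(nonnbr, 0.0)     # drop own-lag entries
    for i in range(1, N):
        nonnbr[i, i - 1] = 0.0        # drop neighbor entries
    # Largest single non-neighbor coefficient vanishes as N grows,
    # but the row sum of absolute non-neighbor coefficients tends to c = 0.3.
    print(N, nonnbr.max(), np.abs(nonnbr[1]).sum())
```

Running this shows the maximum non-neighbor coefficient falling like 1/N while the aggregate non-neighborhood effect in each row approaches 0.3, mirroring the point that individually negligible spillovers need not be negligible in the aggregate.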
Another model of interest arises when, in addition to the neighborhood effects, there is also a fixed number of dominant units that have non-negligible effects on all other units. In this case the limiting outcome is shown to be a dynamic factor model.4 Accordingly, this paper provides a link between data and parameter shrinkage approaches to mitigating the curse of dimensionality. By imposing limiting restrictions on some of the parameters of the VAR we effectively end up with a data shrinkage. To distinguish high-dimensional VAR models from the standard specifications, we refer to the former as the infinite-dimensional VARs or IVARs for short. The paper also establishes the conditions under which the global VAR (GVAR) approach proposed by Pesaran et al. (2004) is applicable.5 In particular, the IVAR featuring all macroeconomic variables could be arbitrarily well approximated by a set of finite-dimensional small-scale models that can be consistently estimated separately in the spirit of the GVAR. A second contribution of this paper is the development of appropriate econometric techniques for estimation and inference
in stationary IVAR models with an unknown number of unobserved common factors. This extends the analysis of Pesaran (2006) to dynamic models where all variables are determined endogenously. A simple cross-section augmented least-squares estimator (or CALS for short) is proposed and its asymptotic distribution derived. Small-sample properties of the proposed estimator are investigated through Monte Carlo experiments. As an illustration of the proposed approach we consider an extension of the empirical analysis of real house prices across the 49 US states conducted recently by Holly et al. (2010), and show statistically significant dynamic spillover effects of real house prices across the neighboring states. The remainder of the paper is organized as follows. Section 2 introduces the IVAR model. Section 3 investigates cross-section dependence in IVAR models. Section 4 focusses on the estimation of a stationary IVAR model. Section 5 discusses the results of the Monte Carlo (MC) experiments, and Section 6 presents the empirical results. The final section offers some concluding remarks. Proofs are provided in the Appendix. We give a brief word on notation. |λ1 (A)| ≥ |λ2 (A)| ≥ · · · ≥ ×n is the |λn (A)| are the eigenvalues of A ∈ Mn×n , where Mn∑ n space of real-valued n × n matrices. ‖A‖1 ≡ max1≤j≤n i=1 |aij | denotes the maximum ∑ absolute column sum matrix norm of A, n and ‖A‖∞ ≡ max1≤i≤n j=1 |aij | is the absolute row sum matrix norm of A. ‖A‖ = ϱ(A′ A) is the spectral norm of A, and ϱ(A) ≡ max1≤i≤n {|λi (A)|} is the spectral radius of A.6 All vectors are column vectors, and the ith row of A is denoted by a′i . an = O(bn ) denotes that the deterministic sequence {an } is at most of order bn . xn = Op (yn ) states that the random variable xn is at most of order yn in probability. N is the set of natural numbers, and Z is the set of integers. We use K and ϵ to denote positive fixed constants that do not vary with N or T . Convergence in
distribution, convergence in probability, and convergence in quadratic mean are denoted by →_d, →_p, and →_{q.m.}, respectively. (N, T) →_j ∞ denotes joint asymptotics in N and T, with N and T → ∞, in no particular order.

2. Infinite-dimensional vector autoregressive models

Suppose we have T time series observations on N cross-section units indexed by i ∈ S(N) ≡ {1, . . . , N} ⊆ ℕ. Individual units could be households, firms, regions, or countries. Both dimensions, N and T, are assumed to be large. For each point in time, t, and for each N ∈ ℕ, the N cross-section observations are collected in the N × 1 vector x_{(N),t} = (x_{(N),1t}, . . . , x_{(N),Nt})′, and it is assumed that x_{(N),t} follows the VAR(1) model:

x_{(N),t} = Φ_{(N)} x_{(N),t−1} + u_{(N),t},    (1)
u_{(N),t} = R_{(N)} ε_{(N),t}.    (2)
Φ_{(N)} and R_{(N)} are N × N coefficient matrices that capture the dynamic and contemporaneous dependences across the N units, and ε_{(N),t} = (ε_{1t}, ε_{2t}, . . . , ε_{Nt})′ is an N × 1 vector of white noise errors with mean 0 and covariance matrix I_N. VAR models have been extensively studied when N is small and fixed, and T is large and unbounded. This framework, however, is not appropriate for many empirical applications of interest. This paper aims to fill this gap by analyzing VAR models where both N and T are large. The sequence of models (1) and (2) with dim(x_{(N),t}) = N → ∞ will be referred to as the infinite-dimensional VAR model, or IVAR for short.

² Stock and Watson (1999, 2002) and Giannone et al. (2005) conclude that only a few, perhaps two, factors explain much of the predictable variations, while Bai and Ng (2007) estimate four factors and Stock and Watson (2005) estimate as many as seven factors.
³ Concepts of strong and weak cross-section dependence, introduced in Chudik et al. (2010), will be applied to VAR models.
⁴ The case of IVAR models with a dominant unit is studied in Pesaran and Chudik (2010).
⁵ The GVAR model has been used to analyse credit risk in Pesaran et al. (2006, 2007). An extended and updated version of the GVAR by Dées et al. (2007), which treats the Euro area as a single economic area, was used by Pesaran et al. (2007) to evaluate UK entry into the Euro. Global dominance of the US economy in a GVAR model is considered in Chudik (2008). Further developments of a global modeling approach are provided in Pesaran and Smith (2006). Garratt et al. (2006) provide a textbook treatment of the GVAR.
⁶ Note that, if x is a vector, then ‖x‖ = √ϱ(x′x) = √(x′x) corresponds to the Euclidean length of the vector x.

The extension of the
A. Chudik, M.H. Pesaran / Journal of Econometrics 163 (2011) 4–22
IVAR(1) to IVAR(p), where p is fixed, is relatively straightforward, and will not be attempted in this paper. The analysis of dependence over time is simplified by the facts that the ordering of observations along the time dimension (t = 1, 2, . . . , T) is immutable and that the arrival of new observations cannot change past realizations; namely, bygones are bygones. As a consequence, for any given N, i, and j, the cross-time covariance function, cov(x_{(N),it}, x_{(N),j,t−ℓ}), does not change with T, and will depend only on ℓ if the time series processes are covariance stationary. However, since it cannot be assumed that an immutable ordering necessarily exists with respect to the cross-section dimension, the addition of new cross-section units to an existing set can potentially alter the pair-wise cross-section covariances of all the units. For instance, in models of oligopoly, where firms strategically interact with each other, new entries can change the relationships between the existing firms. Similarly, the introduction of a new asset in the market can change the correlations of returns on the existing assets. In what follows, to simplify the notation, the explicit dependence of x_t and u_t and the related parameter matrices on N will be suppressed, with (1)–(2) written as

x_t = Φ x_{t−1} + u_t,    (3)

and

u_t = R ε_t.    (4)
Clearly, it is not possible to estimate all the N² elements of the matrix Φ when both N and T are large. Only a small (fixed) number of unknown coefficients can be estimated per equation, and some restrictions on Φ must be imposed. In order to deal with the dimensionality problem, we assume that, for a given i ∈ ℕ, it is possible to classify cross-section units a priori into 'neighbors' and 'non-neighbors'. No restrictions are imposed on neighbors, but the non-neighbors are assumed to have only negligible effects on x_{it} that vanish at a suitable rate with N. The number of neighbors of unit i, collected in the index set N_i, is assumed to be small (fixed). Neighbors of unit i can have non-negligible effects that do not vanish even if N → ∞. A similar classification is followed in the spatial econometrics literature, where the non-neighborhood effects are set to zero for all N and the non-zero neighborhood effects are often assumed to be homogeneous across i. In this sense, our analysis can also be seen as an extension of spatial econometric models. Subject to the above classification, the equation for unit i can be written as

x_{it} = ∑_{j∈N_i} φ_{ij} x_{j,t−1} + ∑_{j∈N_i^c} φ_{ij} x_{j,t−1} + u_{it},    (5)

where the first sum runs over the neighbors of unit i and the second over its non-neighbors. The coefficients of the neighboring units, {φ_{ij}}_{j∈N_i}, are the parameters of interest, and they do not vary with N. The remaining coefficients, {φ_{ij}}_{j∈N_i^c}, tend to zero for each i as N → ∞, where N_i^c ≡ {1, . . . , N} \ N_i is the index set of non-neighbors. Note that the non-neighbors are unordered. More specifically,

|φ_{ij}| ≤ K/N for any N ∈ ℕ and any j ∈ N_i^c.    (6)
Individually, the coefficients of the non-neighbors are asymptotically negligible, but, as we argue below, it is not clear if the same applies to their aggregate effect on the ith unit, namely ∑_{j∈N_i^c} φ_{ij} x_{j,t−1}. The bounds in (6) ensure that lim_{N→∞} ∑_{j=1}^N |φ_{ij}| < K. We refer to this as the 'cross-section absolute summability condition', which is distinct from the absolute summability condition used in the time series literature, where the same idea is applied to the coefficients of current and past innovations. A similar constraint is used in the Lasso and Ridge regression shrinkage methods. The Lasso estimation procedure applied to (3) involves minimizing ∑_{t=1}^T u_{it}² for each i subject to ∑_{j=1}^N |φ_{ij}| ≤ K. Under Ridge regression, the minimization is carried out subject to the weaker constraint ∑_{j=1}^N φ_{ij}² ≤ K.⁷ In applications of shrinkage methods, it is necessary that K is specified a priori, but no knowledge of the ordering of the units along the cross-section dimension is needed. In our approach, we do not need to specify the value of K.

The sum of the coefficients of the non-neighboring units, ∑_{j∈N_i^c} φ_{ij}, does not necessarily tend to zero as N → ∞, which implies that the non-neighbors can have a large aggregate spatiotemporal impact on unit i as N → ∞. The question that we address is whether it is possible to estimate the neighborhood coefficients {φ_{ij}}_{j∈N_i} without imposing further restrictions. As it turns out, the answer depends on the stochastic behavior of ∑_{j∈N_i^c} φ_{ij} x_{j,t−1}, which in turn depends on the strength of cross-section dependence in {x_{it}}. If {x_{it}} is weakly cross-sectionally dependent, then ∑_{j∈N_i^c} φ_{ij} x_{j,t−1} →_{q.m.} 0, and the spillover effects from non-neighboring units are neither particularly important nor estimable; but the coefficients associated with the neighboring units can be consistently estimated by simply ignoring the non-neighborhood effects, which are of second-order importance in N. If, on the other hand, {x_{it}} is strongly cross-sectionally dependent, then lim_{N→∞} Var(∑_{j∈N_i^c} φ_{ij} x_{j,t−1}) is not necessarily zero, the spillover effects from non-neighbors are in general O_p(1) and important, and ignoring the non-neighborhood effects can lead to inconsistent estimates. The concepts of weak and strong cross-section dependence were introduced in Chudik et al. (2010), and are applied to the IVAR model in the next section.

Our approach to dealing with the curse of dimensionality can be motivated with a couple of examples. One important example is provided by the Arbitrage Pricing Theory (APT), originally developed by Ross (1976). Under approximate pricing, the conditional mean returns of N risky assets, µ_t, are modeled in terms of a fixed number (k) of factor risk premia, λ_t, and an N × 1 vector of pricing errors, v_t, namely µ_t = Bλ_t + v_t, where B is an N × k matrix of factor loadings. In the absence of arbitrage opportunities, we must have v_t = 0 when N is fixed, or v′_t v_t = O_p(1) as N → ∞ (see Huberman (1982) and Ingersoll (1984)). It is clear that any pair-wise dependence of the pricing errors must vanish as N → ∞, since otherwise there would be unbounded profitable opportunities. Another example relates to a multi-country DSGE model discussed in Chudik (2008). The country interactions need not be symmetric. Nevertheless, as long as the foreign trade weights are granular, the equilibrium solution of such a multi-country DSGE model has a similar structure to the basic IVAR model set out in this paper. Neighbors in this set-up could, for example, be identified in terms of trade shares. For instance, the US would be Canada's neighbor, considering that 80% of Canada's trade is with the US, although by the same metric Canada might not qualify as a neighbor of the US.

In some cases, the strict division of individual units into neighbors and non-neighbors might be considered as too restrictive.
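The weak/strong distinction can be illustrated with a small simulation: under purely idiosyncratic errors the variance of a cross-section average vanishes at rate N⁻¹, while a single common factor keeps it bounded away from zero. A minimal sketch (the one-factor design with unit loadings is illustrative only, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def var_of_average(N, strong, T=20000):
    """Sample variance of the cross-section average N^{-1} * sum_i x_it."""
    eps = rng.standard_normal((T, N))    # idiosyncratic part
    if strong:
        f = rng.standard_normal((T, 1))  # one common factor, unit loadings:
        x = f + eps                      # strong cross-section dependence
    else:
        x = eps                          # no common component: weak dependence
    return x.mean(axis=1).var()

for N in (25, 100, 400):
    print(N,
          var_of_average(N, strong=False),  # ~ 1/N, vanishes with N
          var_of_average(N, strong=True))   # ~ 1, bounded away from zero
```

Under weak dependence the printed variance shrinks roughly proportionally to 1/N; under strong dependence it stabilizes near the factor variance.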
In the assumption below, we consider a slightly more general set-up in which the neighborhood effects are characterized in terms of 'local' averages defined by S′_i x_t, where S_i is a known spatial or neighborhood weight matrix.

Assumption 1. Let K ⊆ ℕ be a non-empty index set. For any i ∈ K, the ith row of the coefficient matrix Φ, denoted by φ′_i, can be divided as

φ′_i = φ′_{ai} + φ′_{bi},    (7)

where

‖φ_{bi}‖∞ = max_{j∈{1,...,N}} |φ_{bij}| < K/N,    (8)
φ_{ai} = S_i δ_i,    (9)

‖δ_i‖ < K, δ_i is an h_i × 1-dimensional vector containing the unknown coefficients to be estimated for unit i, which do not change with N, h_i < K, h_i is fixed and generally small, and S_i is a known N × h_i 'spatial' weight matrix such that ‖S_i‖₁ < K.

⁷ See Section 3.4.3 of Hastie et al. (2001) for a detailed description of the Lasso and Ridge regression shrinkage methods.

Assuming K ≡ ℕ and stacking (7)–(9) for i = 1, 2, . . . , N, we have

Φ = Φ_a + Φ_b = DS + Φ_b,    (10)

where Φ_a = (φ_{a1}, φ_{a2}, . . . , φ_{aN})′, Φ_b = (φ_{b1}, φ_{b2}, . . . , φ_{bN})′,

D_{N×h} =
[δ′_1  0     ···  0   ]
[0     δ′_2  ···  0   ]
[⋮            ⋱   ⋮   ]
[0     0     ···  δ′_N],    (11)

h = ∑_{i=1}^N h_i, and S is a known h × N matrix defined by S = (S_1, S_2, . . . , S_N)′. Note also that by assumption the individual elements of Φ_b are uniformly O(N⁻¹).

Example 1. An example of Φ_a is given by

Φ_a =
[φ_{11}  φ_{12}  0       0       ···  0            0            0        ]
[φ_{21}  φ_{22}  φ_{23}  0       ···  0            0            0        ]
[0       φ_{32}  φ_{33}  φ_{34}  ···  0            0            0        ]
[⋮                        ⋱                                      ⋮        ]
[0       0       0       0       ···  φ_{N−1,N−2}  φ_{N−1,N−1}  φ_{N−1,N}]
[0       0       0       0       ···  0            φ_{N,N−1}    φ_{NN}   ],    (12)

where the non-zero elements are fixed coefficients that do not change with N. This represents a bilateral spatial representation in which each unit, except for the first and the last, has one left and one right neighbor. In contrast, the individual elements of Φ_b are of order O(N⁻¹); in particular, |φ_{bij}| < K/N for any N ∈ ℕ and any i, j ∈ {1, . . . , N}. The equation for unit i ∈ {2, . . . , N − 1} can be written as

x_{it} = φ_{i,i−1} x_{i−1,t−1} + φ_{ii} x_{i,t−1} + φ_{i,i+1} x_{i+1,t−1} + φ′_{bi} x_{t−1} + u_{it}.    (13)

Section 3 shows that, under weak cross-section dependence of the errors {u_{it}}, φ′_{bi} x_{t−1} →_{q.m.} 0, while Section 4 considers the problem of estimation of the individual-specific parameters {φ_{i,i−1}, φ_{ii}, φ_{i,i+1}}. We refer to this model as a two-neighbor IVAR model, which we use later for illustrative purposes as well as in the Monte Carlo experiments.

Example 2. As a simple example, consider the model

x_t = ρ_x S_x x_{t−1} + u_t,    (14)
u_t = ρ_u S_u u_t + ε_t,    (15)

where ρ_x and ρ_u are scalar unknown coefficients, and S_x and S_u are N × N known spatial weight matrices. This model can be obtained from (1)–(2) by setting S = S_x, R = (I − ρ_u S_u)⁻¹, δ_i = ρ_x for i ∈ {1, . . . , N}, and Φ_b = 0.

3. Cross-sectional dependence in stationary IVAR models

This section investigates the correlation pattern of {x_{it}}, over time, t, and along the cross-section units, i. We follow Chudik et al. (2010) and define the covariance stationary process {x_{it}} to be cross-sectionally weakly dependent (CWD) if, for all weight vectors, w = (w_1, . . . , w_N)′, satisfying the 'granularity' conditions

‖w‖ = O(N^{−1/2}),    (16)
|w_j| / ‖w‖ = O(N^{−1/2}) for any j,    (17)

we have

lim_{N→∞} Var(w′x_t) = 0, for all t.    (18)

{x_{it}} is said to be cross-sectionally strongly dependent (CSD) if there exists a sequence of weight vectors, w, satisfying (16)–(17) and a constant K such that

lim_{N→∞} Var(w′x_t) ≥ K > 0.

The necessary condition for covariance stationarity for fixed N is that all eigenvalues of Φ lie inside the unit circle. For a fixed N, and assuming that max_i |λ_i(Φ)| < 1, the Euclidean norm of Φ^ℓ, defined by [Tr(Φ^ℓ Φ^{ℓ′})]^{1/2}, → 0 exponentially in ℓ, and the process x_t = ∑_{ℓ=0}^∞ Φ^ℓ u_{t−ℓ} will be absolutely summable. However, note that, as N → ∞, Var(x_{it}) need not be bounded in N even if max_i |λ_i(Φ)| < 1. For example, consider the IVAR(1) model with

Φ =
[ϕ  0  0  ···  0  0]
[ψ  ϕ  0  ···  0  0]
[0  ψ  ϕ  ···  0  0]
[⋮           ⋱    ⋮]
[0  0  0  ···  ψ  ϕ],

and assume that Var(u_{it}) is uniformly bounded away from zero as N → ∞. It is clear that all the eigenvalues of Φ are inside the unit circle if and only if |ϕ| < 1, regardless of the value of the neighborhood coefficient, ψ. Yet it is easily seen that the variance of x_{Nt} increases in N if ψ² + ϕ² ≥ 1. Therefore, a stronger condition than stationarity for each N is required to prevent the variance of x_{it} from exploding as N → ∞. A set of sufficient conditions that ensures the existence of the variance of x_{it}, even as N → ∞, is set out in the following assumptions.

Assumption 2. The elements of the double index process {ε_{it}, i ∈ ℕ, t ∈ ℤ} are independently distributed random variables with zero means and unit variances on the probability space (Ω, F, P).

Assumption 3 (CWD Errors). The matrix R has bounded row and column matrix norms.

Assumption 4 (Stationarity and Bounded Variances). There exists a real ϵ, in the range 0 < ϵ < 1, such that⁸

‖Φ‖ ≤ 1 − ϵ.    (19)
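To see why (19) is stronger than stationarity for each N, note that in the ϕ/ψ example above all eigenvalues of Φ equal ϕ, so ϱ(Φ) < 1 for every N, while the spectral norm ‖Φ‖ can still exceed unity. A quick numerical check (the values of ϕ and ψ below are chosen for illustration only):

```python
import numpy as np

def bidiagonal_phi(N, phi, psi):
    """Phi with phi on the main diagonal and psi on the subdiagonal."""
    return np.diag(np.full(N, phi)) + np.diag(np.full(N - 1, psi), k=-1)

phi, psi = 0.6, 0.9               # psi**2 + phi**2 >= 1
Phi = bidiagonal_phi(50, phi, psi)

rho = max(abs(np.linalg.eigvals(Phi)))  # spectral radius: equals |phi| < 1
norm = np.linalg.norm(Phi, 2)           # spectral norm: exceeds 1 here
print(rho, norm)                        # stationary for each fixed N, yet (19) fails
```

The spectral radius stays at |ϕ| for every N, but the spectral norm bound of Assumption 4 is violated, which is exactly the case in which Var(x_{Nt}) can grow with N.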
⁸ Our assumptions concerning the coefficient matrix Φ can be relaxed so long as they hold for all N ≥ N₀ (where N₀ is a fixed constant that does not depend on N). But in order to keep the notation and exposition simple, we simply state that Assumptions 1 and 4 hold for any value of N.

Remark 1. Assumptions 2 and 3 imply that {u_{it}} is CWD, since, for any weight vector w satisfying (16), we have Var(w′u_t) ≤ ‖w‖² ‖R‖₁ ‖R‖∞ → 0 as N → ∞. For future reference, define the covariance matrix Σ = Var(u_t) = RR′ and denote the ith diagonal element of Σ by σ²_{ii} = Var(u_{it}). Note also that ‖Σ‖ ≤ ‖R‖₁ ‖R‖∞ < K, a condition that, as shown in Pesaran and Tosetti (2010), includes all commonly used processes in the spatial literature, such as the spatial autoregressive and spatial error component models pioneered by Whittle (1954) and further developed by Cliff and Ord (1973), Anselin (1988), and Kelejian and Robinson (1995).

Remark 2. It is not necessary that proximity is measured in terms of physical space. Other measures, such as economic distance (Conley, 1999; Pesaran et al., 2004) or social distance (Conley and Topa, 2002), could also be employed. All these are examples of dependence across nodes in a physical (real) or logical (virtual) network. In the case of the IVAR model, defined by (3) and (4), such contemporaneous dependence can be modeled through the N × N network topology matrix R.⁹ ¹⁰

⁹ A network topography is usually represented by a graph whose nodes are identified with the cross-section units, with the pair-wise relations captured by the arcs of the graph.
¹⁰ It is also possible to allow for time variation in the network matrix, R, to capture changes in the network structure over time. However, this will not be pursued here.

Remark 3. The IVAR model, when combined with u_t = Rε_t, yields an infinite-dimensional spatiotemporal model. The model can also be viewed more generally as a 'dynamic network', with R and Φ capturing the static and dynamic forms of the interconnections that might exist in the network.

Remark 4 (Eigenvalues of Φ). Assumption 4 implies that the polynomial Φ(L) = I − ΦL is invertible (for any N ∈ ℕ) and

ϱ(Φ) ≤ 1 − ϵ,    (20)

which is a sufficient condition for covariance stationarity. Assumption 4 also ensures that Var(x_{it}) < K < ∞.

Proposition 1. Consider model (1), and suppose that Assumptions 2–4 hold. Then, for any arbitrary sequence of fixed weights w satisfying condition (16), and for any t ∈ ℤ,

lim_{N→∞} Var(x_{wt}) = 0.    (21)

Assumptions 2–4 are thus sufficient conditions for weak dependence. Proposition 1 has several interesting implications. Suppose that we can impose the limiting restrictions given by Assumption 1.

Corollary 1. Consider model (1), and suppose that Assumptions 1–4 hold. Then, for any i ∈ K,

lim_{N→∞} Var(x_{it} − φ′_{ai} x_{t−1} − u_{it}) = 0.    (22)

Remark 5. It is also possible to establish (22) under the following conditions:

‖φ_{bi}‖ = O(N^{−1/2}),    (23)
‖Σ‖ = O(N^{1−ϵ}),    (24)

which are less restrictive than condition (8) and Assumption 3 on the boundedness of the column and row norms of the matrix R. These stronger conditions are needed for establishing the asymptotic properties of the CALS estimator to be proposed below in Section 4.

3.1. IVAR models with strong cross-sectional dependence

The IVAR model can generate observations with strong cross-section dependence if the boundedness assumptions on the column and row norms of R and Φ are relaxed. The analysis of this case is beyond the scope of the present paper, and is considered in Pesaran and Chudik (2010). But even if the boundedness assumptions on R and Φ are maintained, it is still possible for x_{it} to show strong cross-section dependence if the IVAR model is augmented with common factors. The basic IVAR model, (3), can be augmented with exogenously specified common factors in a number of different ways. Here we consider two important possibilities. First, a finite number of common factors can be added to the vector of error terms, defined by (4). This is equivalent to assuming that a finite number of the columns (or linear combinations of the columns) of R have unbounded norms. This compounding of the spatial (weak) cross-section dependence with the strong factor dependence complicates the analysis unduly, and will not be pursued here. A more attractive alternative is to assume that

Φ(L)(x_t − α − Γ f_t) = u_t, for t = 1, 2, . . . , T,    (25)

where Φ(L) = I − ΦL, α = (α_1, . . . , α_N)′ is an N × 1 vector of fixed effects, f_t is an m × 1 vector of unobserved common factors (m is fixed but otherwise unknown), Γ = (γ_1, γ_2, . . . , γ_N)′ is the N × m matrix of factor loadings, and, as before, u_t = Rε_t. Under this specification, the strong cross-section dependence of x_{it} due to the factors is explicitly separated from the other sources of cross-dependence embodied in Φ and R.

4. Estimation of a factor-augmented stationary IVAR model

We now consider the problem of estimation and inference in the case of the factor-augmented IVAR model as set out in (25), as both N and T tend to infinity. We focus on the parameters of the ith equation and assume that φ′_i (the ith row of the matrix Φ) can be decomposed as in Assumption 1; see (7)–(9). As an important example, we consider the two-neighbor IVAR model defined in Example 1, where the parameters of interest are given by the elements of the ith row of the matrix Φ_a given by (12). In what follows, we set ξ_{it} = S′_i x_t, where S_i is defined by (9), and note that it reduces to (x_{i−1,t}, x_{it}, x_{i+1,t})′ in the case of the two-neighbor IVAR model. We suppose that the following assumptions hold.

Assumption 5 (Available Observations). Available are the observations x_0, x_1, . . . , x_T, with the starting values x_0 = ∑_{ℓ=0}^∞ Φ^ℓ R ε_{−ℓ} + α + Γ f_0.

Assumption 6 (Common Factors). The unobserved common factors, f_{1t}, f_{2t}, . . . , f_{mt}, are covariance stationary and follow the general linear processes

f_{st} = ψ_s(L) ε_{fst}, for s = 1, 2, . . . , m,    (26)

where ψ_s(L) = ∑_{ℓ=0}^∞ ψ_{sℓ} L^ℓ with absolutely summable coefficients that do not vary with N, and the factor innovations, ε_{fst}, are independently distributed over time with zero means and a constant variance, σ²_{εfs}, that does not vary with N. The ε_{fst} are also distributed independently of the idiosyncratic errors, ε_{it′}, for any i ∈ ℕ, any t, t′ ∈ T, and any s ∈ {1, . . . , m}. E(f_t f′_t) exists and is a positive definite matrix.

Assumption 7 (Existence of Fourth-Order Moments). There exists a positive real constant K such that E(ε⁴_{fst}) < K and E(ε⁴_{it}) < K for any s ∈ {1, . . . , m}, any t ∈ T, and any i ∈ ℕ.

Assumption 8 (Bounded Factor Loadings and Fixed Effects). For any i ∈ ℕ, γ_i and α_i do not change with N, ‖γ_i‖ < K, and |α_i| < K.

We follow Pesaran (2006) and introduce the following vector of cross-section averages, x_{Wt} = W′x_t, where W = (w_1, w_2, . . . , w_N)′ and {w_j}_{j=1}^N are m_w × 1-dimensional vectors. Subscripts denoting the number of groups are again omitted where not necessary, in order to keep the notation simple. The matrix W does not correspond to any spatial weight matrix. It is any arbitrary matrix
of pre-determined weights satisfying the following granularity conditions:

‖W‖ = O(N^{−1/2}),    (27)
‖w_j‖ / ‖W‖ = O(N^{−1/2}) for any j.    (28)

Multiplying (25) by the inverse of the polynomial Φ(L) and then by W′ yields

x_{Wt} = α_W + Γ_W f_t + υ_{Wt},    (29)

where α_W = W′α, Γ_W = W′Γ, υ_{Wt} = W′υ_t, and

υ_t = ∑_{ℓ=0}^∞ Φ^ℓ u_{t−ℓ}.    (30)

Under Assumptions 2–3, {u_t} is weakly cross-sectionally dependent, and

‖Var(υ_{Wt})‖ = ‖∑_{ℓ=0}^∞ W′ Φ^ℓ Σ Φ′^ℓ W‖ ≤ ‖W‖² ‖Σ‖ ∑_{ℓ=0}^∞ ‖Φ^ℓ‖² = O(N⁻¹),    (31)

where ‖W‖² = O(N⁻¹) by condition (27), ‖Σ‖ = O(1) by Assumption 3 (see Remark 1), and ∑_{ℓ=0}^∞ ‖Φ^ℓ‖ ≤ ∑_{ℓ=0}^∞ ‖Φ‖^ℓ = O(1) under Assumption 4. This implies that υ_{Wt} = O_p(N^{−1/2}), and the unobserved common factors can be approximated as

(Γ′_W Γ_W)⁻¹ Γ′_W (x_{Wt} − α_W) = f_t + O_p(N^{−1/2}),    (32)

provided that the matrix Γ′_W Γ_W is non-singular. It can be inferred that the full column rank of Γ_W is important for the estimation of unit-specific coefficients. Pesaran (2006) shows that the full column rank condition is not, however, necessary if the object of interest is the cross-section mean of the parameters, E(δ_i), as opposed to the unit-specific parameters, δ_i, which are the focus of the current paper.

Using (25), the equation for unit i ∈ K can be written as

x_{it} − α_i − γ′_i f_t = δ′_i S′_i (x_{t−1} − α − Γ f_{t−1}) + ζ_{i,t−1} + u_{it},    (33)

where

ζ_{it} = φ′_{bi} υ_t = O_p(N^{−1/2}),    (34)

since, by Assumption 1, φ_{bi} satisfies condition (27). It follows from (29) that

γ′_i f_t − φ′_{ai} Γ f_{t−1} = b′_{i1} x_{Wt} + b′_{i2} x_{W,t−1} − (b_{i1} + b_{i2})′ α_W − b′_{i1} υ_{Wt} − b′_{i2} υ_{W,t−1},    (35)

where b′_{i1} = γ′_i (Γ′_W Γ_W)⁻¹ Γ′_W and b′_{i2} = −δ′_i S′_i Γ (Γ′_W Γ_W)⁻¹ Γ′_W. Substituting (35) into (33) yields

x_{it} = δ′_i S′_i x_{t−1} + b′_{i1} x_{Wt} + b′_{i2} x_{W,t−1} + c_i + u_{it} + q_{it},    (36)

where c_i = α_i − φ′_{ai} α − (b_{i1} + b_{i2})′ α_W, and

q_{it} = ζ_{i,t−1} − b′_{i1} υ_{Wt} − b′_{i2} υ_{W,t−1} = O_p(N^{−1/2}).    (37)

Consider now the following auxiliary regression based on (36):

x_{it} = g′_{it} π_i + ϵ_{it},    (38)

where ϵ_{it} = u_{it} + q_{it}, π_i = (δ′_i, b′_{i1}, b′_{i2}, c_i)′ is the k_i × 1 vector of coefficients associated with the regressors g_{it} = (ξ′_{i,t−1}, x′_{Wt}, x′_{W,t−1}, 1)′, and k_i = h_i + 2m_w + 1. The parameters of interest, δ_i, can now be estimated using the cross-section augmented regression defined by (38). We refer to such an estimator of δ_i as the cross-section augmented least-squares estimator (or CALS for short), and denote it by δ̂_{i,CALS}. We have

π̂_i = (δ̂′_{i,CALS}, b̂′_{i1}, b̂′_{i2}, ĉ_i)′ = (∑_{t=1}^T g_{it} g′_{it})⁻¹ ∑_{t=1}^T g_{it} x_{it}.    (39)

Also, using the partitioned regression formula,

δ̂_{i,CALS} = (Z′_i M_H Z_i)⁻¹ Z′_i M_H x_{i◦},    (40)

where

M_H = I_T − H(H′H)⁺H′, H = [X_W, X_W(−1), τ],    (41)
Z_i = [ξ_{i1}(−1), ξ_{i2}(−1), . . . , ξ_{ih_i}(−1)],    (42)
ξ_{ir}(−1) = (ξ_{ir0}, . . . , ξ_{i,r,T−1})′, for r ∈ {1, . . . , h_i},

τ is a T × 1 vector of ones, X_W = (x_{W1◦}, . . . , x_{Wm_w◦}), X_W(−1) = [x_{W1}(−1), . . . , x_{Wm_w}(−1)], x_{Ws◦} = (x_{Ws1}, . . . , x_{WsT})′, x_{Ws}(−1) = (x_{Ws0}, . . . , x_{Ws,T−1})′, for s ∈ {1, . . . , m_w}, and x_{i◦} = (x_{i1}, . . . , x_{iT})′. For future reference, also let

v_{it} = S′_i υ_t = ξ_{it} − S′_i Γ f_t − S′_i α,    (43)

Q = [F, F(−1), τ], and

A_{(2m+1)×(2m_w+1)} =
[1        α′_W        α′_W      ]
[0_{m×1}  Γ′_W        0_{m×m_w} ]
[0_{m×1}  0_{m×m_w}   Γ′_W      ],    (44)

where F = (f_{1◦}, . . . , f_{m◦}), F(−1) = [f_1(−1), . . . , f_m(−1)], f_{r◦} = (f_{r1}, . . . , f_{rT})′, and f_r(−1) = (f_{r0}, . . . , f_{r,T−1})′ for r ∈ {1, . . . , m}.

First, we consider the asymptotic properties of π̂_i (and δ̂_{i,CALS}) as (N, T) →_j ∞, in the case where the number of unobserved common factors is equal to the dimension of x_{Wt} (m = m_w), and make the following additional assumption.

Assumption 9 (Identification of π_i). There exist T₀ and N₀ such that, for all T ≥ T₀, N ≥ N₀, and for any i ∈ K, (T⁻¹ ∑_{t=1}^T g_{it} g′_{it})⁻¹ exists, C_{(N),i} = E(g_{it} g′_{it}) is positive definite, and ‖C⁻¹_{(N),i}‖ < K.

Remark 6. Assumption 9 implies that Γ_W is a square, full rank matrix and, therefore, the number of unobserved common factors is equal to the number of columns of the weight matrix, W (m = m_w). In cases where m < m_w, full augmentation of the individual models by (cross-section) averages is not necessary.

Theorem 1. Let x_t be generated by model (25), let Assumptions 1–9 hold, and let W be any arbitrary (pre-determined) matrix of weights satisfying conditions (27)–(28). Then, for any i ∈ K and as (N, T) →_j ∞, π̂_i defined in Eq. (39) has the following properties.

(a) π̂_i − π_i →_p 0.
(b) If, in addition, T/N → ϰ, with 0 ≤ ϰ < ∞,

(√T / σ_{(N),ii}) C^{1/2}_{(N),i} (π̂_i − π_i) →_d N(0, I_{k_i}),    (45)

where σ²_{(N),ii} = Var(u_{it}) = E(e′_i RR′ e_i), and C^{1/2}_{(N),i} is the square root of the positive definite matrix C_{(N),i} = E(g_{it} g′_{it}). Also
(c) Ĉ_{(N),i} − C_{(N),i} →_p 0 and σ̂_{(N),ii} − σ_{(N),ii} →_p 0, where

Ĉ_{(N),i} = (1/T) ∑_{t=1}^T g_{it} g′_{it},   σ̂²_{(N),ii} = (1/T) ∑_{t=1}^T û²_{it},    (46)

and û_{it} = x_{it} − g′_{it} π̂_i.

Remark 7. Suppose that, in addition to the assumptions of Theorem 1, the limits of C⁻¹_{(N),i} and σ²_{(N),ii}, as N → ∞, exist and are given by C⁻¹_{(∞),i} and σ²_{(∞),ii}, respectively.¹¹ Then (45) yields

√T (π̂_i − π_i) →_d N(0, σ²_{(∞),ii} C⁻¹_{(∞),i}).    (47)

Consider now the case where the number of unobserved common factors is unknown, but it is known that m_w ≥ m. Since the auxiliary regression (38) is augmented with possibly a larger number of cross-section averages than the number of unobserved common factors, we have a potential problem of multicollinearity (as N → ∞). But this does not affect the estimation of δ_i so long as the space spanned by the unobserved common factors including a constant and the space spanned by the vector (1, x′_{Wt})′ are the same as N → ∞. This is the case when Γ_W has full column rank. For this more general case we replace Assumption 9 with the following, and suppress the subscript N to simplify the notation.

Assumption 10 (Identification of δ_i). There exist T₀ and N₀ such that, for all T ≥ T₀, N ≥ N₀, and for any i ∈ K, (T⁻¹ Z′_i M_H Z_i)⁻¹ exists, Γ_W is a full column rank matrix, Ω_{vi} = E(v_{it} v′_{it}) = ∑_{ℓ=0}^∞ S′_i Φ^ℓ RR′ Φ′^ℓ S_i is positive definite, and ‖Ω⁻¹_{vi}‖ = O(1).

Theorem 2. Let x_t be generated by model (25), let Assumptions 1–8 and 10 hold, and let W be any arbitrary (pre-determined) matrix of weights satisfying conditions (27)–(28) and Assumption 10. Then, for any i ∈ K, and if in addition (N, T) →_j ∞ such that T/N → ϰ, with 0 ≤ ϰ < ∞, the asymptotic distribution of δ̂_{i,CALS} defined by (40) is given by

(√T / σ_{ii}) Ω^{1/2}_{vi} (δ̂_{i,CALS} − δ_i) →_d N(0, I_{h_i}),    (48)

where σ²_{ii} = Var(u_{it}), Ω_{vi} = E(v_{it} v′_{it}), and v_{it} = S′_i υ_t = ∑_{ℓ=0}^∞ S′_i Φ^ℓ u_{t−ℓ}.

Remark 8. As before, we also have

√T (δ̂_{i,CALS} − δ_i) →_d N(0, σ²_{(∞),ii} Ω⁻¹_{v(∞),i}),

where Ω_{v(∞),i} = lim_{N→∞} Ω_{vi} and σ²_{(∞),ii} = lim_{N→∞} σ²_{ii}, assuming the limits exist.
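As a schematic illustration of the estimator, the auxiliary regression (38) can be run by ordinary least squares: regress x_{it} on the lagged 'local' variables ξ_{i,t−1}, the current and lagged cross-section averages, and an intercept; the leading coefficients are then δ̂_{i,CALS}, as in (39)–(40). The sketch below uses simple averages as the weights W and hypothetical variable names; it is a minimal illustration, not the authors' code:

```python
import numpy as np

def cals(x, i, neighbors):
    """Cross-section augmented LS for unit i of a (T+1) x N panel x.

    Regressors (cf. (38)): lagged own/neighbor values (playing the role of
    xi_{i,t-1}), current and lagged simple cross-section averages, and an
    intercept. Returns the coefficients on the lagged own/neighbor values.
    """
    xbar = x.mean(axis=1)                    # simple cross-section averages
    y = x[1:, i]                             # x_it for t = 1, ..., T
    Z = x[:-1][:, neighbors]                 # lagged own/neighbor regressors
    G = np.column_stack([Z, xbar[1:], xbar[:-1], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)   # pi_hat, as in (39)
    return coef[:len(neighbors)]                   # delta_hat_{i,CALS}
```

For the two-neighbor model of Example 1, `neighbors=[i-1, i, i+1]` and the returned vector estimates (φ_{i,i−1}, φ_{ii}, φ_{i,i+1}).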
5. Monte Carlo (MC) experiments
5.1. Monte Carlo (MC) design

In this section, we report some evidence on the small-sample properties of the CALS estimator in the presence of unobserved common factors and weak error cross-section dependence, and compare the results with those from standard least-squares estimators. The objectives of the experiments are twofold. First, we would like to investigate how well the CALS estimator performs in the presence of unobserved common factors. Second, we would like to find out the extent to which cross-section augmentation affects the small-sample properties of the estimator when the cross-section dependence is weak, and therefore cross-section augmentation is asymptotically unnecessary. The focus of our analysis is the estimation of the individual-specific parameters in an IVAR model that also allows for other interdependences that are of order O(N⁻¹). The data-generating process (DGP) used is given by

x_t − γ f_t = Φ(x_{t−1} − γ f_{t−1}) + u_t,    (49)

where f_t is the only unobserved common factor considered (m = 1), and γ = (γ_1, . . . , γ_N)′ is the N × 1 vector of factor loadings. We consider two sets of factor loadings to distinguish the cases of weak and strong cross-section dependence. Under the former we set γ = 0, and under the latter we generate the factor loadings γ_i, for i = 1, 2, . . . , N, from a stationary spatial process in order to show that our estimators are invariant to possible cross-section dependence in the factor loadings. Accordingly, the factor loadings are generated by the following bilateral spatial autoregressive (SAR) process:

γ_i − µ_γ = (a_γ/2)(γ_{i−1} + γ_{i+1}) − a_γ µ_γ + η_{γi}, 0 < a_γ < 1,    (50)

where η_{γi} ∼ IIDN(0, σ²_{ηγ}). As established by Whittle (1954), the unilateral SAR(2) scheme

γ_i = ψ_{γ1} γ_{i−1} + ψ_{γ2} γ_{i−2} + η_{γi},    (51)

with ψ_{γ1} = α_γ + β_γ, ψ_{γ2} = −α_γ β_γ, α_γ = (1 − √(1 − a²_γ))/a_γ, and β_γ⁻¹ = (1 + √(1 − a²_γ))/a_γ, generates the same autocorrelations as the bilateral SAR(1) scheme (50). The factor loadings are generated using the unilateral scheme (51) with 50 burn-in data points (i = −49, . . . , 0) and the initialization γ_{−51} = γ_{−50} = 0. We set a_γ = 0.4, µ_γ = 1, and choose σ²_{ηγ} = (1 + ψ_{γ2})[(1 − ψ_{γ2})² − ψ²_{γ1}]/(1 − ψ_{γ2}), such that Var(γ_i) = 1. The common factors are generated according to the AR(1) process f_t = ρ_f f_{t−1} + η_{ft}, η_{ft} ∼ IIDN(0, 1 − ρ²_f), with ρ_f = 0.9.

In line with the theoretical analysis, the autoregressive parameters are decomposed as Φ = Φ_a + Φ_b, where Φ_a captures own and neighborhood effects as in

Φ_a =
[ϕ_1  ψ_1  0    0    ···  0        0        0   ]
[ψ_2  ϕ_2  ψ_2  0    ···  0        0        0   ]
[0    ψ_3  ϕ_3  ψ_3  ···  0        0        0   ]
[⋮                   ⋱                      ⋮   ]
[0    0    0    0    ···  ψ_{N−1}  ϕ_{N−1}  ψ_{N−1}]
[0    0    0    0    ···  0        ψ_N      ϕ_N ],

and the remaining elements of Φ, defined by Φ_b, are generated as

φ_{bij} = λ_i ω_{ij} for j ∉ {i − 1, i, i + 1}, and φ_{bij} = 0 for j ∈ {i − 1, i, i + 1},    (52)

where λ_i ∼ IIDU(−0.1, 0.2) and ω_{ij} = ς_{ij} / ∑_{j=1}^N ς_{ij}, with ς_{ij} ∼ IIDU(0, 1). This ensures that φ_{bij} = O_p(N⁻¹) and lim_{N→∞} E(φ_{bij}) = 0, for all i and j. With Φ_a as specified above, each unit i, except the first and the last, has two neighbors: the 'left' neighbor i − 1 and the 'right' neighbor i + 1. The DGP for the ith unit can now be written as

x_{1t} = ϕ_1 x_{1,t−1} + ψ_1 x_{2,t−1} + φ′_{b1} x_{t−1} + γ_1 f_t − (φ′_1 γ) f_{t−1} + u_{1t},
x_{it} = ϕ_i x_{i,t−1} + ψ_i (x_{i−1,t−1} + x_{i+1,t−1}) + φ′_{bi} x_{t−1} + γ_i f_t − (φ′_i γ) f_{t−1} + u_{it}, i ∈ {2, . . . , N − 1},
x_{Nt} = ϕ_N x_{N,t−1} + ψ_N x_{N−1,t−1} + φ′_{bN} x_{t−1} + γ_N f_t − (φ′_N γ) f_{t−1} + u_{Nt}.

To ensure that the DGP is stationary, we generate ϕ_i ∼ IIDU(0.4, 0.6) and ψ_i ∼ IIDU(−0.1, 0.1) for i ≠ 2. We choose to focus on the equation for unit i = 2 in all experiments, and we set ϕ_2 = 0.5 and ψ_2 = 0.1. This yields ‖Φ‖∞ ≤ 0.9, which together with |ρ_f| < 1 ensures that the DGP is stationary and that the variance of x_{it} is bounded in N. The cross-section averages, x̄_{wt}, are constructed as simple averages, x̄_t = N⁻¹ ∑_{j=1}^N x_{jt}.

The N-dimensional vector of error terms, u_t, is generated using the following SAR model:

u_t = a_u S u_t + ε_t, for t = 1, 2, . . . , T.

We set a_u = 0.2, which ensures that the errors are cross-sectionally weakly dependent, and generate ε_{it}, the ith element of ε_t, as IIDN(0, σ²_ε). We set σ²_ε = N / tr(R_u R′_u) so that on average Var(u_{it}) = 1, where R_u = (I − a_u S)⁻¹, and the spatial weight matrix S is

S =
[0    1    0    ···  0    0  ]
[1/2  0    1/2  ···  0    0  ]
[0    1/2  0    ···  0    0  ]
[⋮              ⋱         ⋮  ]
[0    0    ···  1/2  0    1/2]
[0    0    ···  0    1    0  ].    (53)

In order to minimize the effects of the initial values, the first 50 observations are dropped. We consider N ∈ {25, 50, 75, 100, 200} and T ∈ {25, 50, 75, 100, 200}. For each N, all parameters were set at the beginning of the experiments, and 2000 replications were carried out by generating new innovations ε_{it}, η_{ft}, and η_{γi}. The focus of the experiments is to evaluate the small-sample properties of the CALS estimator of the own coefficient ϕ_2 = 0.5 and the neighboring coefficient ψ_2 = 0.1, in the case of the second cross-section unit.¹² The cross-section augmented regression for estimating (ϕ_2, ψ_2) is given by

x_{2t} = c_2 + ψ_2 (x_{1,t−1} + x_{3,t−1}) + ϕ_2 x_{2,t−1} + δ_{2,0} x̄_t + δ_{2,1} x̄_{t−1} + ϵ_{2t}.

¹¹ A sufficient condition for lim_{N→∞} C_{(N),i} to exist is the existence of the following limits (together with Assumptions 1–8): lim_{N→∞} S′_i α, lim_{N→∞} S′_i Γ, lim_{N→∞} W′Γ, lim_{N→∞} W′α, and lim_{N→∞} ∑_{ℓ=0}^∞ S′_i Φ^ℓ RR′ Φ′^ℓ S_i.
Tables 1 and 2 give the bias (×100) and root mean square error (RMSE: ×100) of the CALS and LS estimators as well as size and power of tests based on them at the 5% nominal level. The results for the estimated own coefficient, ϕ2,CALS , and ϕ2,LS , are reported in Table 1. The top panel of this table presents the results for the experiments with an unobserved common factor (γ ̸= 0). In this case, {xit } is CSD, and the standard LS estimator without augmentation by cross-section averages is not consistent. The bias of ϕ2,LS is indeed quite substantial for all values of N and T = 200, and the tests based on ϕ2,LS are grossly oversized. CALS, on the other hand, performs well for T ≥ 100 and all values of N. For smaller values of T , there is a negative bias, and the test based on ϕ2,CALS is slightly oversized. This is the familiar time series bias, where even in the absence of any cross-section dependence the LS estimator of the autoregressive coefficient is biased downward (when ϕ2 > 0) in small-T samples. Moving on to the experiments without a common factor (given at the bottom half of the table), we observe that the LS estimator only slightly outperforms the CALS estimator. In the absence of common factors, {xit } is weakly cross-sectionally dependent, and therefore the augmentation with cross-section averages is (asymptotically) innocuous. The distortions coming from crosssection augmentation are in this case very small. Note that the LS estimator is not efficient because the residuals are crosssectionally dependent. Augmentation by cross-section averages helps to reduce part of this dependence. Nevertheless, the reported RMSE of ϕ2,CALS does not outperform the RMSE of ϕ2,LS . The estimation results for the neighboring coefficient, ψ2 , are presented in Table 2. These are qualitatively similar to the ones reported in Table 1. Cross-section augmentation is clearly needed and is very helpful when common factors are present. 
But in the absence of such common effects, the presence of weak crosssection dependence, whether through the dynamics or error processes, does not pose any difficulty for the LS and CALS estimators so long as N is sufficiently large. Finally, not surprisingly, the estimates are subject to the small-T bias irrespective of the size of N or the degree of cross-section dependence. Fig. 1 plots the power of the CALS estimator of the own coefficient, ϕ2,CALS , (left chart) and the neighboring coefficient, 2,CALS , (right chart) for N = 200 and two different values of ψ T ∈ {100, 200}. These charts provide a graphical representation of the results reported in Tables 1–2, and also suggest significant improvement in power as T increases for a number of different alternatives.
(54)
We also report results of the least-squares (LS) estimator computed using the above regression but without augmentation with cross-section averages. The corresponding CALS estimator and non-augmented LS estimator are denoted by ϕ2,CALS and ϕ2,LS 2,CALS and ψ 2,LS (neighboring coefficient), (own coefficient), or ψ respectively. To summarize, we carry out two different sets of experiments, one set without the unobserved common factor (γ = 0), and the 12 Similar results are also obtained for other cross-section units.
other with the unobserved common factor (γ ̸= 0). There are many sources of interdependence between individual units: spatial dependence of innovations {uit }, spatiotemporal interactions due to coefficient matrices Φa and Φb , and finally, in the case where γ ̸= 0, the cross-section dependence also arises via the unobserved common factor, ft , and the cross-sectionally dependent factor loadings, γi . 5.2. Monte Carlo results
xNt = ϕN xN ,t −1 + ψN xN −1,t −1 + φ′b,N xt −1 + γN ft − (φ′N γ)ft −1 + uNt .
u1t = au u2t + ε1t , au uit = (ui−1,t + ui+1,t ) + εit , 2 uNt = au uN −1,t + εNt ,
11
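The loading- and error-generating steps of this design can be sketched in pure Python. This is an illustrative sketch, not the authors' code: all function names are ours, and we assume the unilateral scheme (51) is an AR(2) with intercept µγ(1 − ψγ1 − ψγ2) so that E(γi) = µγ. Because I − au S in (53) is tridiagonal, the columns of Ru can be obtained with the Thomas algorithm, and σ²ε = N/tr(Ru R′u) then normalizes the average error variance to one exactly:

```python
import math
import random

def loading_ar2_params(a_gamma: float):
    """AR(2) coefficients implied by the bilateral SAR(1) parameter a_gamma,
    and the innovation variance that normalizes Var(gamma_i) = 1."""
    root = math.sqrt(1.0 - a_gamma ** 2)
    alpha = (1.0 - root) / a_gamma
    beta = a_gamma / (1.0 + root)          # beta^{-1} = (1 + root)/a_gamma
    psi1, psi2 = alpha + beta, -alpha * beta
    sig2 = (1.0 + psi2) * ((1.0 - psi2) ** 2 - psi1 ** 2) / (1.0 - psi2)
    return psi1, psi2, sig2

def gen_loadings(n, a_gamma=0.4, mu=1.0, burn=50, rng=None):
    """Unilateral scheme with `burn` burn-in points (i = -49, ..., 0) and
    zero initialization; the intercept mu*(1 - psi1 - psi2) is our
    assumption about the exact form of scheme (51)."""
    rng = rng or random.Random(0)
    psi1, psi2, sig2 = loading_ar2_params(a_gamma)
    g2 = g1 = 0.0                          # gamma_{-51} = gamma_{-50} = 0
    out = []
    for i in range(-burn + 1, n + 1):
        g = (mu * (1.0 - psi1 - psi2) + psi1 * g1 + psi2 * g2
             + rng.gauss(0.0, math.sqrt(sig2)))
        g2, g1 = g1, g
        if i >= 1:
            out.append(g)
    return out

def thomas_solve(sub, diag, sup, rhs):
    """Solve a tridiagonal system by the Thomas algorithm."""
    n = len(rhs)
    c, d = sup[:], rhs[:]
    c[0] /= diag[0]
    d[0] /= diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m
        d[i] = (d[i] - sub[i] * d[i - 1]) / m
    for i in range(n - 2, -1, -1):
        d[i] -= c[i] * d[i + 1]
    return d

def sar_error_scale(n, a_u=0.2):
    """sigma_eps^2 = N / tr(R_u R_u') with R_u = (I - a_u S)^{-1}, so that
    the cross-section average of Var(u_it) is exactly one."""
    sub = [0.0] + [-a_u / 2.0] * (n - 2) + [-a_u]   # coefficient on u_{i-1}
    diag = [1.0] * n
    sup = [-a_u] + [-a_u / 2.0] * (n - 2) + [0.0]   # coefficient on u_{i+1}
    trace = 0.0
    for j in range(n):                     # jth column of R_u
        col = thomas_solve(sub, diag, sup,
                           [1.0 if i == j else 0.0 for i in range(n)])
        trace += sum(x * x for x in col)
    return n / trace, (sub, diag, sup)
```

Drawing ut then amounts to solving (I − au S) u = ε for each t, with εit ∼ N(0, σ²ε); the variance normalization holds by construction rather than by simulation.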
6. An empirical illustration: a spatiotemporal model of house prices in the US

In a recent study, Holly et al. (2010), hereafter HPY, consider the relation between real house prices, pit, and real per capita personal disposable income, yit (both in logs), in a panel of 49 US states over 29 years (1975–2003), where i = 1, 2, …, 49 and t = 1, 2, …, T. Controlling for heterogeneity and cross-section dependence, they show that pit and yit are cointegrated with coefficients (1, −1), and provide estimates of the following panel error correction model:

Δpit = ci + ωi (pi,t−1 − yi,t−1) + δ1i Δpi,t−1 + δ2i Δyit + υit.    (55)
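The MG-type estimates of (55) discussed below are averages of unit-by-unit regressions. As a minimal sketch of the mean group idea (our own simplified single-regressor version, not HPY's estimation code; the CCE variants additionally augment each unit's regression with cross-section averages):

```python
def ols_slope(x, y):
    """Intercept-and-slope OLS for a single regressor; returns the slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def mean_group(panel):
    """Mean group estimate: the simple average of the unit-by-unit OLS
    slopes. `panel` is a list of (x, y) series, one pair per unit."""
    slopes = [ols_slope(x, y) for x, y in panel]
    return sum(slopes) / len(slopes)
```

For example, three units with exact slopes 1, 2, and 3 yield a mean group estimate of 2.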
Table 1
MC results for the own coefficient ϕ2: bias (×100), root mean square error (×100), size (5% level, H0: ϕ2 = 0.50), and power (5% level, H1: ϕ2 = 0.60), for all combinations of N, T ∈ {25, 50, 75, 100, 200}, in the experiments with non-zero factor loadings and with zero factor loadings, for the LS estimator not augmented with cross-section averages (ϕ̂2,LS) and for the CALS estimator (ϕ̂2,CALS).

Notes: ϕ2 = 0.5, ψ2 = 0.1, aγ = 0.4, au = 0.2, and Var(γi) = 1. The DGP is given by the two-neighbor IVAR model (49), where the equation for unit i ∈ {2, …, N − 1} is xit = ϕi xi,t−1 + ψi (xi−1,t−1 + xi+1,t−1) + φ′bi xt−1 + γi ft − (φ′i γ) ft−1 + uit. The CALS estimator of the own coefficient ϕ2 and the neighboring coefficient ψ2 is computed using the following auxiliary regression: x2t = c2 + ψ2 (x1,t−1 + x3,t−1) + ϕ2 x2,t−1 + δ2,0 x̄t + δ2,1 x̄t−1 + ϵ2t. The estimators ϕ̂2,LS and ψ̂2,LS are computed from the auxiliary regressions not augmented with cross-section averages. The unobserved common factor ft is generated as a stationary AR(1) process, and the factor loadings and innovations {uit} are generated according to stationary spatial autoregressive processes. See Section 5 for a detailed description of the Monte Carlo design.
Table 2
MC results for the neighboring coefficient ψ2: bias (×100), root mean square error (×100), size (5% level, H0: ψ2 = 0.10), and power (5% level, H1: ψ2 = 0.20), for all combinations of N, T ∈ {25, 50, 75, 100, 200}, in the experiments with non-zero and with zero factor loadings, for the LS estimator not augmented with cross-section averages (ψ̂2,LS) and for the CALS estimator (ψ̂2,CALS).

See the notes to Table 1.
Fig. 1. Power curves for the CALS t-tests of the own coefficient, ϕ2 (left chart), and the neighboring coefficient, ψ2 (right chart), in the case of experiments with γ ≠ 0 and N = 200.

Table 3
Alternative average estimates of the error correction models for house prices across 49 US states over the period 1975–2003.

| Dependent variable: Δpit | Holly et al. (2010) regressions without dynamic spatial effects |        |        | Regressions augmented with dynamic spatial effects |        |        |
|                          | MG             | CCEMG          | CCEP           | MG             | CCEMG          | CCEP           |
| pi,t−1 − yi,t−1          | −0.105 (0.008) | −0.183 (0.030) | −0.171 (0.040) | −0.095 (0.016) | −0.154 (0.038) | −0.152 (0.015) |
| Δpi,t−1                  |  0.524 (0.065) |  0.449 (0.059) |  0.518 (0.063) |  0.296 (0.009) |  0.188 (0.060) |  0.272 (0.040) |
| Δyit                     |  0.500 (0.066) |  0.277 (0.018) |  0.227 (0.049) |  0.497 (0.059) |  0.284 (0.085) |  0.201 (0.018) |
| Δp^s_{i,t−1}             |  –             |  –             |  –             |  0.331 (0.082) |  0.350 (0.088) |  0.431 (0.105) |
| R̄²                       |  0.54          |  0.70          |  0.66          |  0.60          |  0.79          |  0.72          |
| Average cross-correlation coefficient (ρ̂) | 0.284 | −0.005 | −0.016 | 0.267 | −0.012 | −0.016 |

Notes: MG, CCEMG, and CCEP stand for the mean group, the common correlated effects mean group, and the common correlated effects pooled estimators defined in Pesaran (2006). Augmentation by simple cross-section averages, Δp̄t = ∑_{i=1}^{49} Δpit/49, Δȳt = ∑_{i=1}^{49} Δyit/49, and p̄t−1 − ȳt−1 = ∑_{i=1}^{49} (pi,t−1 − yi,t−1)/49, is used to deal with the possible effects of strong cross-section dependence. Standard errors are in parentheses. ρ̂ denotes the average pair-wise correlation of the residuals from the cross-section augmented regressions across the 49 US states.
To take account of unobserved common factors, HPY augmented (55) with simple cross-section averages, Δp̄t = ∑_{i=1}^{49} Δpit/49, Δȳt = ∑_{i=1}^{49} Δyit/49, and p̄t−1 − ȳt−1 = ∑_{i=1}^{49} (pi,t−1 − yi,t−1)/49, and obtained common correlated effects mean group and pooled estimates (denoted CCEMG and CCEP) of {ωi, δ1i, δ2i}, which we reproduce in the left panel of Table 3. HPY then showed that the residuals from these regressions, υ̂it, display a significant degree of spatial dependence. Here we exploit the theoretical results of the present paper and consider the possibility that dynamic neighborhood effects are partly responsible for the residual spatial dependence reported by HPY. To this end, we consider an extended version of (55) in which the lagged spatial variable Δp^s_{i,t−1} = ∑_{j=1}^N sij Δpj,t−1 is also included amongst the regressors, with sij being the (i, j)th element of a spatial weight matrix, S, namely

Δpit = ci + ωi (pi,t−1 − yi,t−1) + δ1i Δpi,t−1 + ψi Δp^s_{i,t−1} + δ2i Δyit + υit.    (56)

Here we consider a simple contiguity matrix with sij = 1 when the states i and j share a border and zero otherwise, and with sii = 0. Possible strong cross-section dependence is again controlled for by augmenting the extended regression equation with Δp̄t, Δȳt, and p̄t−1 − ȳt−1. Estimation results are reported in the right panel of Table 3. The dynamic spatial effects are found to be highly significant, irrespective of the estimation method, increasing the R̄² of the price equation by 6–9 percentage points. The dynamics of past price changes are now distributed between own and neighborhood effects, giving rise to much richer dynamics and spillover effects. It is also interesting that the inclusion of the spatiotemporal variable Δp^s_{i,t−1} in the model has had little impact on the estimates of the coefficient of the real income variable, δ2i.
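The construction of the spatial lag just described can be sketched as follows. This is illustrative only: the adjacency list is a hypothetical four-state chain, not the actual US contiguity map, and row standardization (which turns the spatial lag into a neighbor average) is an assumption we make here, as a common convention for 0/1 contiguity matrices; the raw 0/1 weighting in the text is available via `standardize=False`:

```python
def contiguity_weights(adjacency, standardize=True):
    """Spatial weights s_ij from a border-sharing structure: s_ij is non-zero
    only when i and j share a border (and s_ii = 0). With standardize=True,
    each row is divided by the number of neighbors (our assumption); with
    standardize=False the raw 0/1 contiguity matrix is used."""
    weights = {}
    for state, neighbours in adjacency.items():
        w = 1.0 / len(neighbours) if (standardize and neighbours) else 1.0
        weights[state] = {nb: w for nb in neighbours}
    return weights

def spatial_lag(weights, dp_lagged):
    """dp^s_{i,t-1} = sum_j s_ij * dp_{j,t-1}."""
    return {state: sum(w * dp_lagged[nb] for nb, w in nbrs.items())
            for state, nbrs in weights.items()}

# Hypothetical 4-state chain A - B - C - D (not the paper's contiguity map):
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
lag = spatial_lag(contiguity_weights(adj),
                  {"A": 0.02, "B": 0.04, "C": -0.01, "D": 0.03})
```

Here `lag["B"]` is the average of the lagged price changes of its two neighbors A and C.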
7. Concluding remarks

This paper has proposed restrictions on the coefficients of infinite-dimensional VARs (IVARs) that are binding only in the limit, as the number of cross-section units (or variables in the VAR) tends to infinity, in order to circumvent the curse of dimensionality. The proposed framework relates to various approaches considered in the literature. For example, when modeling individual households or firms, aggregate variables, such as market returns or regional/national income, are treated as exogenous. This is intuitive, as the impact of a single firm or household on the aggregate economy is small, of order O(N⁻¹). This paper formalizes this idea in a spatiotemporal context. The paper establishes that, in the absence of common factors and when the degree of cross-section dependence is weak, the equations for the individual units decouple as N → ∞ and can be consistently estimated by running separate regressions. In the presence of observed and/or unobserved common factors, individual-specific VAR models can still be estimated separately, provided they are conditioned on the common factors. Unobserved common factors can be approximated by cross-section averages, following the idea originally introduced by Pesaran (2006). This paper shows that the global VAR approach of Pesaran et al. (2004) can be motivated as an approximation to an IVAR model featuring all the macroeconomic variables. The asymptotic distribution of the cross-section augmented least-squares (CALS) estimator of the parameters of the unit-specific equations in the IVAR model is established both when the number of unobserved common factors is known and when it is unknown but fixed. The small-sample properties of the proposed CALS estimator were investigated through Monte Carlo simulations, and an empirical
illustration shows the statistical significance of dynamic spillover effects in the modeling of US real house prices across neighboring states. Topics for future research include estimation and inference in IVAR models with dominant individual units, the analysis of large dynamic networks with and without dominant nodes, and an examination of the relationship between IVAR and dynamic factor models.
Appendix. Lemmas and proofs

Proof of Proposition 1. For any N ∈ ℕ, the variance of xt is Ω = Var(xt) = ∑_{ℓ=0}^∞ Φ^ℓ Σ Φ′^ℓ and, under Assumptions 2–4, ‖Ω‖ ≤ ‖Σ‖ ∑_{ℓ=0}^∞ ‖Φ‖^{2ℓ} < K. Hence it follows that, for any arbitrary non-random vector of weights w satisfying the granularity condition (16),

Var(w′xt) = w′Ω w ≤ ϱ(Ω)(w′w),

where ϱ(Ω) = ‖Ω‖ < K and w′w = O(N⁻¹) by condition (16). Therefore limN→∞ Var(w′xt) = 0. □

Proof of Corollary 1. Assumption 1 implies that, for any i ∈ K, the vector φbi satisfies condition (16). It follows from Proposition 1 that

limN→∞ Var(φ′bi xt) = 0 for i ∈ K.    (57)

Also, (1) implies that

xit − φ′ai xt−1 − uit = φ′bi xt−1, for any i ∈ K and any N ≥ i.    (58)

Taking the variance of (58) and using (57) now yields (22). □

Lemma 1. Suppose that Assumptions 2–4 hold. Then, for any p, q ∈ {0, 1} and for any sequences of non-random vectors θ and ϕ such that ‖θ‖ = O(1) and ‖ϕ‖₁ = O(1), we have, as (N, T) →j ∞,

(1/T) ∑_{t=1}^T θ′υt−p →p 0,    (59)

and

(1/T) ∑_{t=1}^T [θ′υt−p ϕ′υt−q − E(θ′υt−p ϕ′υt−q)] →p 0,    (60)

where the process υt is defined by (30). Furthermore, if ‖θ‖ = O(N^{−1/2}), then

(√N/T) ∑_{t=1}^T θ′υt−p →p 0,    (61)

and

(√N/T) ∑_{t=1}^T [θ′υt−p ϕ′υt−q − E(θ′υt−p ϕ′υt−q)] →p 0.    (62)

Proof. Let TN = T(N) be any non-decreasing integer-valued function of N such that limN→∞ TN = ∞. Consider the two-dimensional array {{κNt, FNt}_{t=−∞}^∞}_{N=1}^∞ defined by

κNt = (1/TN) θ′υt−p,

where the subscript N is used to emphasize the number of cross-section units,¹³ {FNt} denotes the array of σ-fields that is increasing in t for each N, and κNt is measurable with respect to FNt. Let {{cNt}_{t=−∞}^∞}_{N=1}^∞ be a two-dimensional array of constants, and set cNt = 1/TN for all t ∈ ℤ and N ∈ ℕ. Note that

E{[E(κNt/cNt | FN,t−n)]²} = ∑_{ℓ=mnp}^∞ θ′Φ^{ℓ−p} Σ Φ′^{ℓ−p} θ ≤ ςn,    (63)

where mnp = max{n, p} and

ςn = sup_{N∈ℕ} ‖θ‖² ‖Σ‖ ‖Φ‖^{2(mnp−p)} ∑_{ℓ=0}^∞ ‖Φ‖^{2ℓ}.¹⁴

Under Assumptions 2–4, ςn has the following properties:

ς0 < K, and ςn → 0 as n → ∞.    (64)

By Liapunov's inequality, E|E(κNt | FN,t−n)| ≤ √(E{[E(κNt | FN,t−n)]²}) (Theorem 9.23 of Davidson (1994)). It follows that the two-dimensional array {{κNt, FNt}_{t=−∞}^∞}_{N=1}^∞ is an L1-mixingale with respect to the constant array {cNt}. Eqs. (63) and (64) establish that the array {κNt/cNt} is uniformly bounded in the L2-norm, which implies uniform integrability.¹⁵ Note that

limN→∞ ∑_{t=1}^{TN} cNt = limN→∞ ∑_{t=1}^{TN} (1/TN) = 1 < ∞,    (65)

and

limN→∞ ∑_{t=1}^{TN} c²Nt = limN→∞ ∑_{t=1}^{TN} (1/T²N) = 0.    (66)

Therefore the array {{κNt, FNt}_{t=−∞}^∞}_{N=1}^∞ satisfies the conditions of a mixingale weak law,¹⁶ which implies that ∑_{t=1}^{TN} κNt →L1 0, i.e.,

(1/T) ∑_{t=1}^T θ′υt−p →L1 0,

as (N, T) →j ∞ at any rate. Convergence in the L1-norm implies convergence in probability. This completes the proof of result (59). Under the condition ‖θ‖ = O(N^{−1/2}), result (61) follows from result (59) by noting that ‖√N θ‖ = O(1).

Result (60) is established in a similar fashion. Consider the two-dimensional array {{κNt, FN,t}_{t=−∞}^∞}_{N=1}^∞ defined by¹⁷

κNt = (1/TN) θ′υt−p ϕ′υt−q − (1/TN) E(θ′υt−p ϕ′υt−q),

where, as before, TN = T(N) is any non-decreasing integer-valued function of N such that limN→∞ TN = ∞. Set cNt = 1/TN for all t ∈ ℤ and N ∈ ℕ. Note that

E(κNt/cNt | FN,t−n) = E(∑_{s=p}^∞ θ′Φ^{s−p} ut−s ∑_{ℓ=q}^∞ ϕ′Φ^{ℓ−q} ut−ℓ | FN,t−n) − E(θ′υt−p ϕ′υt−q)
 = ∑_{s=mnp}^∞ ∑_{ℓ=mnq}^∞ [θ′Φ^{s−p} ut−s ϕ′Φ^{ℓ−q} ut−ℓ − E(θ′Φ^{s−p} ut−s ϕ′Φ^{ℓ−q} ut−ℓ)],

where mnq = max{n, q}.

¹³ Note that the vectors υt and θ change with N as well, but the subscript N is omitted here to keep the notation simple.
¹⁴ We use the submultiplicative property of matrix norms (‖AB‖ ≤ ‖A‖ ‖B‖ for any matrices A, B such that AB is well defined) and the fact that the spectral matrix norm is self-adjoint (i.e., ‖A′‖ = ‖A‖). Note also that Assumption 4 implies that ∑_{ℓ=0}^∞ ‖Φ^ℓ‖² = O(1).
¹⁵ A sufficient condition for uniform integrability is L_{1+ϵ} uniform boundedness for any ϵ > 0.
¹⁶ See Theorem 19.11 of Davidson (1994).
¹⁷ As before, {FNt} denotes the array of σ-fields that is increasing in t for each N, and κNt is measurable with respect to FNt.
Let θ′s = θ′Φ^s and ϕ′ℓ = ϕ′Φ^ℓ; then

E{[E(κNt/cNt | FN,t−n)]²} = ∑_{s=mpn}^∞ ∑_{ℓ=mqn}^∞ ∑_{j=mpn}^∞ ∑_{d=mqn}^∞ E(θ′s−p ut−s ϕ′ℓ−q ut−ℓ θ′j−p ut−j ϕ′d−q ut−d)
 − [∑_{s=mpn}^∞ ∑_{ℓ=mqn}^∞ E(θ′s−p ut−s ϕ′ℓ−q ut−ℓ)]².    (67)

Using the independence of ut and ut′ for any t ≠ t′ (Assumption 2), we have

∑_{s=mpn}^∞ ∑_{ℓ=mqn}^∞ E(θ′s−p ut−s ϕ′ℓ−q ut−ℓ) = ∑_{ℓ=max{p,q,n}}^∞ θ′Φ^{ℓ−p} Σ Φ′^{ℓ−q} ϕ ≤ ςa,n,

where

ςa,n = sup_{N∈ℕ} ‖θ‖ ‖ϕ‖ ‖Σ‖ ‖Φ‖^{χ1(p,n,q)} ∑_{ℓ=0}^∞ ‖Φ‖^{2ℓ},

and χ1(p, n, q) = max{0, q − p, n − p} + max{0, p − q, n − q}. Here ‖Σ‖ = O(1) by Assumptions 2 and 3, ∑_{ℓ=0}^∞ ‖Φ‖^{2ℓ} = O(1) by Assumption 4, ‖θ‖ = O(1), and ‖ϕ‖ ≤ ‖ϕ‖₁ = O(1). ςa,n has the following properties:

ςa,0 < Ka, and ςa,n → 0 as n → ∞.    (68)

Similarly, since by Assumption 2 ut and ut′ are independently distributed for any t ≠ t′, the first term on the right-hand side of Eq. (67) is bounded by ςb,n,¹⁸ where

ςb,n = sup_{N∈ℕ} ‖B‖ ‖θ‖² ‖ϕ‖² ∑_{ℓ=max{p,q,n}}^∞ ‖Φ‖^{2(ℓ−p)+2(ℓ−q)} + 2ς²a,n + ‖θ‖² ‖Σ‖² ‖ϕ‖² ‖Φ‖^{2χ2(p,n,q)} (∑_{ℓ=0}^∞ ‖Φ‖^{2ℓ})²,

χ2(p, n, q) = max{0, n − p} + max{0, n − q}, B is an N × N matrix with its (i, j)th element given by ‖Ψij‖, and Ψij is an N × N matrix of fourth moments with its (n, s)th element given by E(uit ujt unt ust). It follows from Assumptions 2–4 that ςb,n has the following properties:¹⁹

ςb,0 < Kb, and ςb,n → 0 as n → ∞.    (69)

E{[E(κNt/cNt | FN,t−n)]²} is therefore bounded by ςn = ςa,n + ςb,n. Eqs. (68) and (69) establish that

ς0 < K, and ςn → 0 as n → ∞.    (70)

By Liapunov's inequality, E|E(κNt | FN,t−n)| ≤ √(E{[E(κNt | FN,t−n)]²}) (Theorem 9.23 of Davidson (1994)). It follows that the two-dimensional array {{κNt, FN,t}_{t=−∞}^∞}_{N=1}^∞ is an L1-mixingale with respect to the constant array {cNt}. Furthermore, (70) establishes that the array {κNt/cNt} is uniformly bounded in the L2-norm, which implies uniform integrability. Since Eqs. (65) and (66) also hold, the array {{κNt, FN,t}_{t=−∞}^∞}_{N=1}^∞ satisfies the conditions of a mixingale weak law,²⁰ which implies that ∑_{t=1}^{TN} κNt →L1 0, i.e.,

(1/T) ∑_{t=1}^T [θ′υt−p ϕ′υt−q − E(θ′υt−p ϕ′υt−q)] →L1 0,

as (N, T) →j ∞. Convergence in the L1-norm implies convergence in probability. This completes the proof of result (60). Under ‖θ‖ = O(N^{−1/2}), result (62) follows from result (60) by noting that ‖√N θ‖ = O(1). □

Lemma 2. Suppose that xt is generated by model (25) and that Assumptions 2–8 hold. Then, as (N, T) →j ∞, for any p, q ∈ {0, 1} and for any sequences of non-random vectors θ and ϕ with growing dimension N × 1 such that ‖θ‖₁ = O(1) and ‖ϕ‖₁ = O(1), we have

(1/T) ∑_{t=1}^T [θ′xt−p − E(θ′xt−p)] →p 0,    (71)

and

(1/T) ∑_{t=1}^T [θ′xt−p ϕ′xt−q − E(θ′xt−p ϕ′xt−q)] →p 0.    (72)

Furthermore, for ‖θ‖ = O(1) and ‖ϕ‖₁ = O(1), we have

(1/T) ∑_{t=1}^T θ′υt−p ϕ′Γ ft−q →p 0,    (73)

where υt is defined in Eq. (30).

Proof. Let TN = T(N) be any non-decreasing integer-valued function of N such that limN→∞ TN = ∞. Consider the two-dimensional array {{κNt, FNt}_{t=−∞}^∞}_{N=1}^∞ defined by

κNt = (1/TN) θ′υt−p ϕ′Γ ft−q,

where {FNt} denotes the array of σ-fields that is increasing in t for each N, and κNt is measurable with respect to FNt. Let {{cNt}_{t=−∞}^∞}_{N=1}^∞ be a two-dimensional array of constants, and set cNt = 1/TN for all t ∈ ℤ and N ∈ ℕ. Using the submultiplicative property of the matrix norm and the independence of ft and υt′ for any t, t′ ∈ ℤ, we have

E{[E(κNt/cNt | FN,t−n)]²} ≤ ςn,

where

ςn = sup_{N∈ℕ} ‖θ‖² ‖Σ‖ ‖Φ‖^{2 max{0,n−p}} ∑_{ℓ=0}^∞ ‖Φ‖^{2ℓ} × E{[E(ϕ′Γ ft−q | FN,t−n)]²}.

Here ‖θ‖ = O(1), ‖Φ‖ ≤ 1 − ϵ by Assumption 4, and ‖Σ‖ ≤ √(‖Σ‖₁ ‖Σ‖∞) = O(1) by Assumption 3. Furthermore, since ft−q is covariance stationary and ‖ϕ′ΓΓ′ϕ‖ = O(1) (by the condition ‖ϕ‖₁ = O(1) and Assumption 8), we have E{[E(ϕ′Γ ft−q | FN,t−n)]²} = O(1). It follows that ςn has the following properties:

ς0 < K, and ςn → 0 as n → ∞.

¹⁸ E(θ′s−p ut−s ϕ′ℓ−q ut−ℓ θ′j−p ut−j ϕ′d−q ut−d) is non-zero only if one of the following four cases holds: (i) s = ℓ = j = d; (ii) s = ℓ, ℓ ≠ j, and j = d; (iii) s = j, j ≠ ℓ, and ℓ = d; or (iv) s = d, d ≠ ℓ, and ℓ = j.
¹⁹ The matrix B is symmetric by construction. Therefore ‖B‖ ≤ √(‖B‖₁ ‖B‖∞) = ‖B‖∞, where ‖B‖∞ = max_{n∈{1,…,N}} ∑_{s=1}^N ‖Ψns‖ ≤ ‖RR′‖²∞ ≤ ‖R‖²∞ ‖R‖²₁ < K.
²⁰ See Theorem 19.11 of Davidson (1994).
The array {κNt/cNt} is thus uniformly bounded in the L2-norm. This proves the uniform integrability of the array {κNt/cNt}. Furthermore, using Liapunov's inequality, the two-dimensional array {{κNt, FNt}_{t=−∞}^∞}_{N=1}^∞ is an L1-mixingale with respect to the constant array {cNt}. Noting that Eqs. (65) and (66) hold, it follows that the array {κNt, FNt} satisfies the conditions of a mixingale weak law,²¹ which implies that ∑_{t=1}^{TN} κNt →L1 0. Convergence in the L1-norm implies convergence in probability. This completes the proof of result (73).

Assumption 8 implies that the sequence θ′α (as well as ϕ′α) is deterministic and bounded. The vector of endogenous variables xt can be written as

xt = α + Γ ft + υt.

The process ft is independent of υt. Suppose that (N, T) →j ∞. The processes {θ′υt−p} and {θ′υt−p ϕ′υt−q} are ergodic in mean by Lemma 1, since ‖θ‖ ≤ ‖θ‖₁ = O(1). Furthermore,

(1/T) ∑_{t=1}^T [θ′Γ ft − θ′Γ E(ft)] →p 0,

and

(1/T) ∑_{t=1}^T [θ′Γ ft ϕ′Γ ft−q − θ′Γ E(ft f′t−q) Γ′ϕ] →p 0,

since ft is a covariance stationary m × 1-dimensional process with absolutely summable autocovariances (ft is ergodic in mean as well as in variance), and

‖θ′ΓΓ′ϕ‖ = O(1),  ‖(θ′ΓΓ′ϕ)²‖ = O(1),

by Assumption 8 and the conditions ‖θ‖₁ = O(1) and ‖ϕ‖₁ = O(1). The sum of a bounded deterministic process and independent processes that are ergodic in mean is itself ergodic in mean. This completes the proof. □

Lemma 3. Let xt be generated by model (25), let Assumptions 1–8 hold, and let (N, T) →j ∞. Then, for any p, q ∈ {0, 1}, for any sequence of non-random weight matrices, W, of growing dimension N × mw satisfying conditions (27)–(28), and for any i ∈ K,

(√N/T) ∑_{t=1}^T W′υt−p →p 0,    (74)

(√N/T) ∑_{t=1}^T W′υt−p x′W,t−q →p 0,    (75)

(√N/T) ∑_{t=1}^T W′υt−p xi,t−q →p 0,    (76)

(1/T) ∑_{t=1}^T git g′it − Ci →p 0,    (77)

where the process υt is defined in (30), the matrix Ci = E(git g′it), and the vector git = (1, ξ′i,t−1, x′Wt, x′W,t−1)′.

Proof. Let ẘr, for r ∈ {1, …, mw}, denote the rth column vector of the matrix W. Noting that ‖√N ẘr‖ = O(1) by the granularity condition (27), the result

(√N/T) ∑_{t=1}^T ẘ′r υt−p →p 0    (78)

follows directly from Lemma 1, result (61). This completes the proof of result (74). Let ϕ be any sequence of non-random N × 1-dimensional vectors of growing dimension such that ‖ϕ‖₁ = O(1). We have

(√N/T) ∑_{t=1}^T ẘ′r υt−p ϕ′xt−q = (√N/T) ∑_{t=1}^T ẘ′r υt−p ϕ′(α + Γ ft−q + υt−q).    (79)

Since ‖√N ẘr‖ = O(1) for any r ∈ {1, …, mw} by condition (27), we can use Lemma 1, result (62), which implies that

(√N/T) ∑_{t=1}^T [ẘ′r υt−p ϕ′υt−q − E(ẘ′r υt−p ϕ′υt−q)] →p 0.    (80)

The sequence {ϕ′α} is deterministic and bounded in N, and it therefore follows from Lemma 1, result (61), that

(√N/T) ∑_{t=1}^T ẘ′r υt−p ϕ′α →p 0.    (81)

Similarly, Lemma 2, Eq. (73), implies that

(√N/T) ∑_{t=1}^T ẘ′r υt−p ϕ′Γ ft−q →p 0.    (82)

Results (80)–(82) establish that

(√N/T) ∑_{t=1}^T ẘ′r υt−p ϕ′xt−q →p 0.    (83)

Result (75) follows from Eq. (83) by setting ϕ = ẘl for any l ∈ {1, …, mw}. Result (76) follows from Eq. (83) by setting ϕ = ei, where ei is an N × 1-dimensional selection vector for the ith element. Finally, result (77) follows directly from results (74)–(76). This completes the proof. □

Lemma 4. Let xt be generated by model (25), let Assumptions 1–8 hold, and let (N, T) →j ∞. Then, for any sequence of non-random matrices, W, of growing dimension N × mw satisfying conditions (27)–(28), and for any i ∈ K,

(1/T) ∑_{t=1}^T git qit →p 0,    (84)

where the vector git = (1, ξ′i,t−1, x′Wt, x′W,t−1)′ and qit is defined in Eq. (37).

Proof. Result (84) follows directly from Lemmas 1–3. □

Lemma 5. Let xt be generated by model (25), let Assumptions 2–8 hold, and let (N, T) →j ∞. Then, for any sequence of non-random weight matrices, W, of growing dimension N × mw satisfying conditions (27)–(28), and for any fixed p ≥ 0,

(1/T) ∑_{t=1}^T W′υt−p uit →p 0,    (85)

where the process υt is defined in (30). If, in addition, T/N → ϰ, with 0 ≤ ϰ < ∞, then

(1/√T) ∑_{t=1}^T W′υt−p uit →p 0.    (86)

²¹ See Theorem 19.11 of Davidson (1994).
Consider result (86). Let T_N = T(N) be any non-decreasing integer-valued function of N such that lim_{N→∞} T_N = ∞ and lim_{N→∞} T_N/N = ϰ < ∞, where ϰ ≥ 0 is not necessarily non-zero. Define

  κ_Nit = (1/√T_N) {W′υ_{t−p} u_it − E(W′υ_{t−p} u_it)},   (87)

where the subscript N is used to emphasize the number of cross-section units.²² Let {F_Nt} denote the array of σ-fields that is increasing in t for each N, and let κ_Nit be measurable with respect to F_Nt. First it is established that, for any fixed i ∈ ℕ, the vector array {{κ_Nit/c_Nt, F_Nt}_{t=−∞}^{∞}}_{N=i}^{∞} is uniformly integrable, where c_Nt = 1/√(N T_N). For p > 0, we can write

  E(κ_Nit κ′_Nit)/c²_Nt = N · E[(W′ ∑_{ℓ=0}^{∞} Φ^ℓ u_{t−ℓ−p} u_it)(W′ ∑_{ℓ=0}^{∞} Φ^ℓ u_{t−ℓ−p} u_it)′]
    = N σ²_ii W′ (∑_{ℓ=0}^{∞} Φ^ℓ Σ Φ′^ℓ) W
    ≤ N σ²_ii ‖W‖² ‖Σ‖ ∑_{ℓ=0}^{∞} ‖Φ^ℓ‖² = O(1),

where ‖W‖² = O(N^{-1}) by condition (27), ‖Σ‖ = O(1) by Assumptions 2 and 3, and ∑_{ℓ=0}^{∞} ‖Φ^ℓ‖² = O(1) by Assumption 4. For p = 0, we have

  E(κ_Nit κ′_Nit)/c²_Nt = N · Var(W′u_t u_it + W′ ∑_{ℓ=1}^{∞} Φ^ℓ u_{t−ℓ} u_it)
    ≤ N ‖W‖² ‖Ψ_ii‖ + N σ²_ii ‖W‖² ‖Σ‖ ∑_{ℓ=1}^{∞} ‖Φ^ℓ‖² + O(N^{-1}) = O(1),

where, as before, Ψ_ii is an N × N symmetric matrix with element (n, s) equal to E(u²_it u_nt u_st). Therefore, for p ≥ 0, the two-dimensional vector array {κ_Nit/c_Nt} is uniformly bounded in the L₂-norm. This proves the uniform integrability of {κ_Nit/c_Nt}. Also, since

  E|E(κ_Nit | F_{N,t−n})| = 0 for any n > 0 and any fixed p ≥ 0, and
  E|E(κ_Nit | F_{N,t−n})| ≤ τ_{m_w} c_Nt O(1) for n = 0 and any fixed p ≥ 0,   (88)

the array {{κ_Nit, F_Nt}_{t=−∞}^{∞}}_{N=i}^{∞} is an L₁-mixingale with respect to the constant array {c_Nt}.²³ Note that

  lim_{N→∞} ∑_{t=1}^{T_N} c_Nt = lim_{N→∞} T_N/√(N T_N) = lim_{N→∞} √(T_N/N) = √ϰ = O(1),

and

  lim_{N→∞} ∑_{t=1}^{T_N} c²_Nt = lim_{N→∞} T_N/(N T_N) = lim_{N→∞} 1/N = 0.

Therefore, for each fixed i ∈ ℕ, each of the m_w two-dimensional arrays given by the elements of the vector array {{κ_Nit, F_Nt}_{t=−∞}^{∞}}_{N=i}^{∞} satisfies the conditions of a mixingale weak law,²⁴ which implies that

  ∑_{t=1}^{T_N} κ_Nit = (1/√T_N) ∑_{t=1}^{T_N} [W′υ_{t−p} u_it − E(W′υ_{t−p} u_it)] →_{L₁} 0.

But

  ‖√T_N E(W′υ_{t−p} u_it)‖₁ = √T_N ‖E(W′u_t u_it)‖₁ = √T_N O(N^{-1}) → 0,

since lim_{N→∞} T_N/N = ϰ < ∞. Convergence in the L₁-norm implies convergence in probability. This completes the proof of result (86).

Result (85) is established in a very similar fashion. Define the new vector array q_Nit = T_N^{-1/2} κ_Nit, where κ_Nit is the array defined in (87) and i ∈ ℕ is fixed. Let T_N = T(N) be any non-decreasing integer-valued function of N such that lim_{N→∞} T_N = ∞. Notice that, for any fixed i ∈ ℕ, the vector array {{√T_N q_Nit/c_Nt, F_Nt}_{t=−∞}^{∞}}_{N=i}^{∞} is uniformly integrable because {{κ_Nit/c_Nt, F_Nt}_{t=−∞}^{∞}}_{N=i}^{∞} is uniformly integrable. Furthermore, {{q_Nit, F_Nt}_{t=−∞}^{∞}}_{N=i}^{∞} is an L₁-mixingale with respect to the constant array {T_N^{-1/2} c_Nt}, since {{κ_Nit, F_Nt}_{t=−∞}^{∞}}_{N=i}^{∞} is an L₁-mixingale with respect to the constant array {c_Nt}. Note that

  lim_{N→∞} ∑_{t=1}^{T_N} T_N^{-1/2} c_Nt = lim_{N→∞} T_N/(√T_N √(N T_N)) = lim_{N→∞} 1/√N = 0.

Therefore, for any fixed i ∈ ℕ, a mixingale weak law²⁵ implies that

  ∑_{t=1}^{T_N} q_Nit →_{L₁} 0 as N → ∞.   (89)

Also, since E(W′υ_{t−p} u_it) = O(N^{-1}), it follows that

  T^{-1} ∑_{t=1}^{T} W′υ_{t−p} u_it →_{L₁} 0,

as N, T →ʲ ∞ at any rate. Convergence in the L₁-norm implies convergence in probability. This completes the proof of result (85).

22 Note that W and υ_{t−p} change with N, but, as before, we omit the subscript N here to keep the notation simple.
23 The last equality in Eq. (88) takes advantage of Liapunov's inequality. τ_{m_w} is an m_w × 1-dimensional vector of ones.
24 See Theorem 19.11 of Davidson (1994).
25 See Theorem 19.11 of Davidson (1994).
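The weighted cross-section averaging at the heart of Lemma 5 can be illustrated numerically. The following sketch is hypothetical and not part of the paper: it assumes i.i.d. standard-normal shocks (a simplification of the error assumptions) and equal weights w_j = 1/N, which satisfy the granularity condition (27), and checks that √N T^{-1} ∑_t (w′u_t)u_it becomes small as N and T grow jointly with T/N fixed.

```python
# Hypothetical numerical sketch, NOT part of the paper: with granular weights
# w_j = 1/N (so that ||w|| = N^{-1/2}), the quantity
# sqrt(N) * T^{-1} * sum_t (w'u_t) u_it vanishes as (N, T) grow with T/N bounded.
import numpy as np

rng = np.random.default_rng(0)

def weighted_cross_term(N, T, i=0):
    u = rng.standard_normal((T, N))   # idiosyncratic shocks u_t (i.i.d. here)
    w = np.full(N, 1.0 / N)           # granular (equal) weights
    return np.sqrt(N) * np.mean((u @ w) * u[:, i])

vals = [abs(weighted_cross_term(N, N)) for N in (50, 200, 800)]  # T = N
print(vals)  # magnitudes typically shrink as N grows
```

This is only a special case (i.i.d. shocks, equal weights); the lemma covers general weight matrices satisfying (27)–(28).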
Lemma 6. Let x_t be generated by model (25), let Assumptions 1–8 hold, and let (N, T) →ʲ ∞ such that T/N → ϰ, with 0 ≤ ϰ < ∞. Then, for any sequence of non-random matrices of weights W of growing dimension N × m_w satisfying conditions (27)–(28), and for any i ∈ K, we have:

(a) under Assumption 9,

  (1/(σ_ii √T)) C̄_i^{-1/2} ∑_{t=1}^{T} ḡ_it u_it →_D N(0, I_{k_i}),   (90)

where C̄_i = E(ḡ_it ḡ′_it) and ḡ_it = (ξ′_{i,t−1}, f′_t Γ_W, f′_{t−1} Γ_W, 1)′; and

(b) under Assumption 10,

  (1/(σ_ii √T)) Ω_{vi}^{-1/2} ∑_{t=1}^{T} v_{i,t−1} u_it →_D N(0, I_{h_i}),   (91)

where the matrix Ω_{vi} = E(v_it v′_it) and the vector v_it = S′_i ∑_{ℓ=0}^{∞} Φ^ℓ u_{t−ℓ}.
Proof. Let a be any k_i × 1-dimensional vector such that ‖a‖ = 1, and define

  κ_Nt = (1/(√T_N σ_ii)) a′ C̄_i^{-1/2} ḡ_it u_it,

where T_N = T(N) is any non-decreasing integer-valued function of N such that lim_{N→∞} T_N = ∞ and lim_{N→∞} T_N/N = ϰ, where 0 ≤ ϰ < ∞. The array {κ_Nt, F_Nt} is a stationary martingale difference array.²⁶ Lemmas 1 and 2 imply that a′C̄_i^{-1/2} ḡ_it is ergodic in variance; in particular,

  (1/T_N) ∑_{t=1}^{T_N} a′ C̄_i^{-1/2} ḡ_it ḡ′_it C̄_i^{-1/2} a →_p 1.

ḡ_it and u_it are independent, and the fourth moments of u_it are finite. Therefore, a′C̄_i^{-1/2} ḡ_it u_it is ergodic in variance, and

  ∑_{t=1}^{T_N} κ²_Nt →_p 1.   (92)

Furthermore, E(σ_ii^{-1} a′C̄_i^{-1/2} ḡ_it u_it)⁴ = O(1), and therefore

  lim_{N→∞} ∑_{t=1}^{T_N} E(κ⁴_Nt) = 0.

Using Liapunov's theorem (Theorem 23.11 of Davidson (1994)), the Lindeberg condition²⁷ holds, which in turn implies that

  max_{1≤t≤T_N} |κ_Nt| →_p 0 as N → ∞.   (93)

Results (92), (93) and the martingale difference array central limit theorem (Theorem 24.3 of Davidson (1994)) establish that

  (1/(√T_N σ_ii)) a′ C̄_i^{-1/2} ∑_{t=1}^{T_N} ḡ_it u_it →_D N(0, 1).   (94)

Since Eq. (94) holds for any k_i × 1-dimensional vector a such that ‖a‖ = 1, result (90) directly follows from Eq. (94) and Theorem 25.6 of Davidson (1994). Result (91) can be established in the same way as result (90), but this time we set κ_Nt = T_N^{-1/2} σ_ii^{-1} a′ Ω_{vi}^{-1/2} v_{i,t−1} u_it, where a is any h_i × 1-dimensional vector such that ‖a‖ = 1.

26 As before, {F_Nt} denotes the array of σ-fields that is increasing in t for each N, and κ_Nt is measurable with respect to F_Nt.
27 See Condition 23.17 of Davidson (1994).
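A scalar analogue of the martingale-difference-array central limit theorem used for results (92)–(94) can be checked by simulation. The sketch below is hypothetical and not from the paper: g_t is a stationary AR(1) regressor standing in for ḡ_it, u_t an independent error with standard deviation σ, and the standardized sum should have mean near 0 and variance near 1 across replications.

```python
# Hypothetical sketch, NOT from the paper: a scalar version of the CLT behind
# (92)-(94). The statistic sum_t g_t u_t / sqrt(sigma^2 * mean(g_t^2) * T)
# should be approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(1)
T, R, sigma = 300, 1000, 1.5
stats = np.empty(R)
for r in range(R):
    e = rng.standard_normal(T)
    g = np.zeros(T)
    for t in range(1, T):              # AR(1): g_t = 0.5 g_{t-1} + e_t
        g[t] = 0.5 * g[t - 1] + e[t]
    u = sigma * rng.standard_normal(T)
    c = sigma**2 * np.mean(g**2)       # sample analogue of sigma^2 * C
    stats[r] = np.sum(g * u) / np.sqrt(c * T)
print(np.mean(stats), np.var(stats))   # near 0 and 1, respectively
```

The AR(1) design and the parameter values are illustrative choices, not those of model (25).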
Lemma 7. Let x_t be generated by model (25), and suppose that Assumptions 1–8 hold, and that (N, T) →ʲ ∞. Then, for any arbitrary matrix of weights, W, satisfying conditions (27)–(28), for any p, q ∈ {0, 1}, and for any i ∈ K,

  T^{-1} ∑_{t=1}^{T} υ_{W,t−p} = o_p(1/√N),   (95)

  T^{-1} ∑_{t=1}^{T} υ_{W,t−p} f′_{t−q} = o_p(1/√N),   (96)

  T^{-1} ∑_{t=1}^{T} υ_{i,t−p} υ′_{W,t−q} = o_p(1/√N),   (97)

  T^{-1} ∑_{t=1}^{T} υ_{W,t−p} υ′_{W,t−q} = o_p(1/√N),   (98)

  Υ′_i Q/T = o_p(1),   (99)

  H′Q/T = A′(Q′Q/T) + o_p(1/√N),   (100)

  Z′_i H/T = (Z′_i Q/T) A + o_p(1/√N),   (101)

  H′H/T = A′(Q′Q/T) A + o_p(1/√N),   (102)

  H′u_{i◦}/√T = A′(Q′u_{i◦}/√T) + o_p(√(T/N)),   (103)

where

  Υ_i (T × h_i) = (v_i0, v_i1, . . . , v_{i,T−1})′,   (104)

v_it = S′_i ∑_{ℓ=0}^{∞} Φ^ℓ u_{t−ℓ}, H and Z_i are defined by (41) and (42), respectively, and Q, F and A are defined in (43)–(44).

Proof. Result (95) follows directly from Eq. (61) of Lemma 1, since the spectral norm of any column vector of the matrix W is O(N^{-1/2}). Result (96) follows from result (95) by noting that f_t is distributed independently of υ_{W,t}, and all elements of the variance matrix of f_t are finite. Furthermore, since (by Lemma 1) T^{-1} ∑_{t=1}^{T} v_it →_p 0, Eq. (99) follows. Results (97) and (98) follow directly from Eq. (62) of Lemma 1 by noting that

  √N E(υ_{i,t−p} υ′_{W,t−q}) = O(1/√N),   (105)

as well as²⁸

  √N E(υ_{W,t−p} υ′_{W,t−q}) = O(1/√N).   (106)

In order to prove Eqs. (100)–(103), first note that row t of the matrix H − QA is (0, υ′_{Wt}, υ′_{W,t−1}). Using results (95)–(98), we have

  (H − QA)′Q/T = T^{-1} ∑_{t=1}^{T} (0, υ′_{Wt}, υ′_{W,t−1})′ (1, f′_t, f′_{t−1}) = o_p(1/√N),   (107)

  Z′_i (H − QA)/T = T^{-1} ∑_{t=1}^{T} ξ_{i,t−1} (0, υ′_{Wt}, υ′_{W,t−1}) = o_p(1/√N).   (108)

28 Results (105) and (106) are straightforward to establish by taking the row norm and by noting that the granularity conditions (27)–(28) imply that ‖W‖_∞ = O(N^{-1}).
Similarly,

  H′(H − QA)/T = T^{-1} ∑_{t=1}^{T} (1, x′_{Wt}, x′_{W,t−1})′ (0, υ′_{Wt}, υ′_{W,t−1}) = o_p(1/√N).   (109)

Eqs. (107)–(108) establish results (100) and (101). Note that

  H′H/T = H′(H − QA)/T + H′QA/T
        = H′(H − QA)/T + (H − QA)′QA/T + A′(Q′Q/T)A
        = A′(Q′Q/T)A + o_p(1/√N),

where the last equality uses Eqs. (107) and (109). This completes the proof of result (102). Eq. (89) (see the proof of Lemma 5) implies that

  T^{-1} ∑_{t=1}^{T} [υ_{W,t−p} u_it − E(υ_{W,t−p} u_it)] →_p 0,

as N, T →ʲ ∞ at any rate. Result (103) follows by noting that √N E(υ_{W,t−p} u_it) = O(N^{-1/2}). This completes the proof.

Lemma 8. Let x_t be generated by model (25), and suppose that Assumptions 1–8 and 10 hold, and that (N, T) →ʲ ∞. Then, for any i ∈ K, and for any arbitrary matrix of weights, W, satisfying conditions (27)–(28) and Assumption 10, we have

  Q′Q/T →_p Ω_Q,  where Ω_Q is non-singular,   (110)

  Υ′_i Υ_i/T − Ω_{vi} →_p 0,   (111)

where

  Ω_Q = [ 1     0        0
          0   Γ_f(0)   Γ_f(1)
          0   Γ_f(1)   Γ_f(0) ],

Γ_f(ℓ) = E(f_t f′_{t−ℓ}), Ω_{vi} = E(v_it v′_it), the matrix Q is defined in Eq. (43), and the matrix Υ_i = (v_i0, v_i1, . . . , v_{i,T−1})′.

Proof. Assumption 6 implies that the matrix Ω_Q is non-singular. Result (110) directly follows from the ergodicity properties of the covariance stationary time-series process f_t. Consider now asymptotics N, T →ʲ ∞ at any rate. Lemma 1 implies that the h_i × 1-dimensional vector v_it = S′_i υ_t is ergodic in variance; in particular, T^{-1} ∑_{t=1}^{T} [S′_i υ_t υ′_t S_i − E(S′_i υ_t υ′_t S_i)] →_p 0.²⁹ This completes the proof.

29 ‖S_i‖₁ = O(1) by Assumption 1.

Lemma 9. Let x_t be generated by model (25), and suppose that Assumptions 1–8 and 10 hold, and that (N, T) →ʲ ∞. Then, for any i ∈ K, and for any arbitrary matrix of weights W satisfying conditions (27)–(28) and Assumption 10, we have

  Z′_i M_H Z_i/T = Z′_i M_Q Z_i/T + o_p(1/√N),   (112)

  Z′_i M_Q Z_i/T − Ω_{vi} →_p 0,   (113)

  Z′_i M_H Q/√T = Z′_i M_Q Q/√T + o_p(√(T/N)),   (114)

  Z′_i M_H u_{i◦}/√T = Υ′_i M_Q u_{i◦}/√T + o_p(√(T/N)),   (115)

where Ω_{vi} is defined in Assumption 10, M_H and Z_i are defined in (41) and (42), respectively, Q and F are defined by (43), and Υ_i = (v_i0, v_i1, . . . , v_{i,T−1})′.

Proof. Note that

  Z′_i M_H Z_i/T = Z′_i Z_i/T − (Z′_i H/T)(H′H/T)⁺(H′Z_i/T).   (116)

Results (101)–(102) of Lemma 7 imply that

  (Z′_i H/T)(H′H/T)⁺(H′Z_i/T) = (Z′_i Q/T) A [A′(Q′Q/T)A]⁺ A′ (Q′Z_i/T) + o_p(1/√N).   (117)

Using the definition of the Moore–Penrose inverse, it follows that

  A′(Q′Q/T)A [A′(Q′Q/T)A]⁺ A′(Q′Q/T)A = A′(Q′Q/T)A.   (118)

Multiplying Eq. (118) by (AA′)^{-1}A from the left and by A′(AA′)^{-1} from the right, we obtain³⁰

  A [A′(Q′Q/T)A]⁺ A′ = (Q′Q/T)^{-1}.   (119)

Eqs. (119) and (117) imply that

  (Z′_i H/T)(H′H/T)⁺(H′Z_i/T) = (Z′_i Q/T)(Q′Q/T)^{-1}(Q′Z_i/T) + o_p(1/√N).   (120)

Result (112) follows from Eqs. (120) and (116). Using (25), we have

  Z_i = τα′_i S_i + F(−1)Γ′_i S_i + Υ_i.   (121)

Since Q = [τ, F, F(−1)], it follows that

  Z′_i M_Q Z_i/T = Υ′_i M_Q Υ_i/T = Υ′_i Υ_i/T − (Υ′_i Q/T)(Q′Q/T)^{-1}(Q′Υ_i/T).   (122)

Using Eqs. (99), (110) and (111), result (113) follows directly from (122). Results (100)–(102) of Lemma 7 imply that

  (Z′_i H/T)(H′H/T)⁺(H′Q/T) = (Z′_i Q/T) A [A′(Q′Q/T)A]⁺ A′ (Q′Q/T) + o_p(1/√N).   (123)

Substituting Eq. (119), it follows that

  (Z′_i H/T)(H′H/T)⁺(H′Q/T) = (Z′_i Q/T)(Q′Q/T)^{-1}(Q′Q/T) + o_p(1/√N).   (124)

30 Note that plim_{T→∞} T^{-1}Q′Q is non-singular by Lemma 8, result (110). AA′ is non-singular, since the matrix A has full row-rank by Assumption 10.
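The Moore–Penrose step that produces Eq. (119) is easy to verify numerically. The sketch below is hypothetical and not from the paper; it draws a random full-row-rank A and a positive definite S standing in for Q′Q/T, mirroring the conditions recorded in footnote 30.

```python
# Hypothetical numerical check, NOT from the paper, of the step behind
# Eq. (119): if A has full row rank and S is symmetric positive definite,
# then A (A' S A)^+ A' = S^{-1}.
import numpy as np

rng = np.random.default_rng(2)
r, m = 3, 5
A = rng.standard_normal((r, m))        # full row rank with probability one
B = rng.standard_normal((r, r))
S = B @ B.T + r * np.eye(r)            # symmetric positive definite
lhs = A @ np.linalg.pinv(A.T @ S @ A) @ A.T
rhs = np.linalg.inv(S)
print(np.allclose(lhs, rhs))           # True
```

Writing M = S^{1/2}A′ shows why: A(A′SA)⁺A′ = S^{-1/2} M′(M′)⁺ M⁺ M S^{-1/2} collapses to S^{-1} because M′ has full column rank.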
Eq. (124) implies that

  Z′_i M_H Q/√T = Z′_i M_Q Q/√T + o_p(√(T/N)).

This completes the proof of result (114). Results (101)–(103) of Lemma 7 imply that

  (Z′_i H/T)(H′H/T)⁺(H′u_{i◦}/√T) = (Z′_i Q/T) A [A′(Q′Q/T)A]⁺ A′ (Q′u_{i◦}/√T) + o_p(√(T/N)).

Substituting Eq. (119), it follows that

  (Z′_i H/T)(H′H/T)⁺(H′u_{i◦}/√T) = (Z′_i Q/T)(Q′Q/T)^{-1}(Q′u_{i◦}/√T) + o_p(√(T/N)).   (125)

Noting that M_Q(τα′_i S_i + F(−1)Γ′_i S_i) = 0 since Q = [τ, F, F(−1)], Eqs. (125) and (121) imply that

  Z′_i M_H u_{i◦}/√T = Z′_i M_Q u_{i◦}/√T + o_p(√(T/N)) = Υ′_i M_Q u_{i◦}/√T + o_p(√(T/N)).

This completes the proof.

Lemma 10. Let x_t be generated by model (25), and suppose that Assumptions 1–8 and 10 hold, and that (N, T) →ʲ ∞. Then, for any i ∈ K, and for any arbitrary matrix of weights, W, satisfying conditions (27)–(28) and Assumption 10, we have

  Z′_i M_H ζ_i(−1)/T = o_p(1/√N),   (126)

  Z′_i M_H u_{i◦}/√T = Υ′_i u_{i◦}/√T + o_p(√(T/N)) + o_p(1),   (127)

where the matrices M_H and Z_i are defined in (41) and (42), respectively, Υ_i = (v_i0, v_i1, . . . , v_{i,T−1})′, and the vector ζ_i(−1) = (ζ_{i,0}, . . . , ζ_{i,T−1})′.

Proof. Note that

  Z′_i ζ_i(−1)/T = T^{-1} ∑_{t=1}^{T} (x_{i,t−1}, x′_{Wt}, x′_{W,t−1})′ (φ′_ib ∑_{ℓ=0}^{∞} Φ^ℓ u_{t−ℓ−1}),

and ‖φ_ib‖_∞ = O(N^{-1}) by Assumption 1; therefore, result (126) directly follows from Eqs. (108) and (109). Furthermore,

  Υ′_i M_Q u_{i◦}/√T = Υ′_i u_{i◦}/√T − (Υ′_i Q/T)(Q′Q/T)^{-1}(Q′u_{i◦}/√T) = Υ′_i u_{i◦}/√T + o_p(1),   (128)

where T^{-1/2} Q′u_{i◦} = O_p(1), plim_{T→∞} Q′Q/T is non-singular by Lemma 8, and Υ′_i Q/T = o_p(1) by Lemma 7, Eq. (99). Substituting (128) into Eq. (115) implies result (127). This completes the proof.

Proof of Theorem 1. (a) Substituting for x_it in Eq. (39) yields

  π̂_i − π_i = (T^{-1} ∑_{t=1}^{T} g_it g′_it)^{-1} (T^{-1} ∑_{t=1}^{T} g_it q_it + T^{-1} ∑_{t=1}^{T} g_it u_it).   (129)

With N, T →ʲ ∞ in any order, Lemma 5 yields³¹

  T^{-1} ∑_{t=1}^{T} g_it u_it →_p 0.   (130)

Also, using Lemmas 3 and 4, we have

  T^{-1} ∑_{t=1}^{T} g_it q_it →_p 0,   (131)

and

  T^{-1} ∑_{t=1}^{T} g_it g′_it − C_{(N),i} →_p 0,   (132)

respectively. Assumption 9 postulates that the matrix C_{(N),i} is invertible and that ‖C_{(N),i}^{-1}‖ is bounded in N. It follows from Eq. (132) that

  (T^{-1} ∑_{t=1}^{T} g_it g′_it)^{-1} − C_{(N),i}^{-1} →_p 0.   (133)

The result π̂_i − π_i →_p 0 directly follows from Eqs. (130), (131) and (133).

(b) Multiplying Eq. (129) by √T yields

  √T (π̂_i − π_i) = (T^{-1} ∑_{t=1}^{T} g_it g′_it)^{-1} (T^{-1/2} ∑_{t=1}^{T} g_it q_it + T^{-1/2} ∑_{t=1}^{T} g_it u_it).   (134)

With (N, T) →ʲ ∞ such that T/N → ϰ < ∞, Lemma 3 can be used to show that

  T^{-1/2} ∑_{t=1}^{T} g_it q_it →_p 0.   (135)

Since ‖C_{(N),i}^{-1}‖ = O(1), Eqs. (133) and (135) now yield

  (T^{-1} ∑_{t=1}^{T} g_it g′_it)^{-1} T^{-1/2} ∑_{t=1}^{T} g_it q_it →_p 0.   (136)

Lemma 5 establishes that

  √N T^{-1} ∑_{t=1}^{T} υ_{W,t−p} u_it →_p 0 for p ∈ {0, 1}.   (137)

It follows from Eq. (137) that

  T^{-1/2} ∑_{t=1}^{T} (g_it − ḡ_it) u_it →_p 0.   (138)

31 T^{-1} ∑_{t=1}^{T} x_{j,t−1} u_it →_p 0, since x_jt is ergodic in mean by Lemma 2 and u_it is independent of x_{j,t−1} for any N ∈ ℕ and any j ∈ {1, . . . , N}. Furthermore, using similar arguments, T^{-1} ∑_{t=1}^{T} f_t u_it →_p 0.
where ḡ_it = (ξ′_{i,t−1}, f′_t Γ_W, f′_{t−1} Γ_W, 1)′. Lemma 6 establishes that

  (1/(σ_{(N),ii} √T)) C̄_{(N),i}^{-1/2} ∑_{t=1}^{T} ḡ_it u_it →_D N(0, I_{k_i}).   (139)

Eqs. (133), (136), (138) and (139) imply result (45).

(c) Lemma 4 establishes that T^{-1} ∑_{t=1}^{T} g_it g′_it − C_{(N),i} →_p 0. The estimated residuals from the auxiliary regression (38) are equal to û_it = u_it − g′_it(π̂_i − π_i), which implies that

  T^{-1} ∑_{t=1}^{T} û²_it = T^{-1} ∑_{t=1}^{T} u²_it − 2(π̂_i − π_i)′ T^{-1} ∑_{t=1}^{T} g_it u_it + (π̂_i − π_i)′ (T^{-1} ∑_{t=1}^{T} g_it g′_it)(π̂_i − π_i),   (140)

where T^{-1} ∑_{t=1}^{T} u²_it − σ²_{(N),ii} →_p 0, π̂_i − π_i →_p 0 is established in part (a) of this proof, T^{-1} ∑_{t=1}^{T} g_it g′_it − C_{(N),i} →_p 0 is established in Lemma 4, and T^{-1} ∑_{t=1}^{T} g_it u_it →_p 0 is established in Eq. (130). This completes the proof.

Proof of Theorem 2. The vector x_{i◦} can be written, using system (25), as

  x_{i◦} = τ(α_i − δ′_i S′_i α) + Z_i δ_i + Fγ_i − F(−1)Γ′S_i δ_i + ζ_i(−1) + u_{i◦},   (141)

where ζ_i(−1) = (ζ_{i0}, . . . , ζ_{i,T−1})′. Substituting Eq. (141) into the partitioned least-squares formula (40) and noting that, by Lemma 9,

  Z′_i M_H Q/√T = o_p(√(T/N)),   (142)

it follows that

  √T (δ̂_i − δ_i) = (Z′_i M_H Z_i/T)^{-1} [Z′_i M_H (u_{i◦} + ζ_i(−1))/√T] + o_p(√(T/N)).   (143)

Lemma 9 also establishes that

  Z′_i M_H Z_i/T − Ω_{vi} →_p 0, as N, T →ʲ ∞ at any rate,   (144)

where Ω_{vi} = E(v_it v′_it) is non-singular by Assumption 10. Consider now asymptotics N, T →ʲ ∞ such that T/N → ϰ < ∞. Lemma 10 establishes that

  Z′_i M_H ζ_i(−1)/√T →_p 0,   (145)

and

  Z′_i M_H u_{i◦}/√T = Υ′_i u_{i◦}/√T + o_p(√(T/N)) + o_p(1),   (146)

where Υ_i = (v_i0, . . . , v_{i,T−1})′. Also, from Lemma 6,

  (1/(σ_ii √T)) Ω_{vi}^{-1/2} ∑_{t=1}^{T} v_{i,t−1} u_it →_D N(0, I_{h_i}).   (147)

The desired result (48) now follows from (143)–(147).
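The residual-variance argument in part (c) of the proof of Theorem 1 (Eq. (140)) reduces, in the simplest case, to the familiar fact that the mean squared OLS residual consistently estimates the error variance once the slope estimates are consistent. A hypothetical sketch with simulated data (not the model of the paper):

```python
# Hypothetical sketch, NOT the model of the paper: with consistent slope
# estimates, T^{-1} sum_t uhat_t^2 converges to the error variance sigma^2,
# as in the decomposition of Eq. (140).
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.7
pi = np.array([0.5, -1.0, 0.2])        # illustrative coefficients
for T in (100, 10_000):
    g = rng.standard_normal((T, 3))    # regressors g_t
    u = sigma * rng.standard_normal(T)
    y = g @ pi + u
    pihat, *_ = np.linalg.lstsq(g, y, rcond=None)
    uhat = y - g @ pihat
    print(T, np.mean(uhat**2))         # approaches sigma**2 = 0.49
```

The regressor design and coefficients are arbitrary; only the consistency mechanism of Eq. (140) is being illustrated.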
References

Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Bai, J., Ng, S., 2007. Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25, 52–60.
Bernanke, B.S., Boivin, J., Eliasz, P., 2005. Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. Quarterly Journal of Economics 120, 387–422.
Canova, F., 1995. Vector autoregressive models: specification, estimation, inference, and forecasting. In: Pesaran, M., Wickens, M. (Eds.), Handbook of Applied Econometrics: Macroeconomics. Basil Blackwell, Oxford (Chapter 2).
Chudik, A., 2008. Global Macroeconomic Modelling. Ph.D. Thesis. Trinity College, University of Cambridge.
Chudik, A., Pesaran, M.H., Tosetti, E., 2010. Weak and strong cross section dependence and estimation of large panels. ECB Working Paper No. 1100, October 2009, revised April 2010.
Cliff, A., Ord, J.K., 1973. Spatial Autocorrelation. Pion, London.
Conley, T.G., 1999. GMM estimation with cross sectional dependence. Journal of Econometrics 92, 1–45.
Conley, T.G., Topa, G., 2002. Socio-economic distance and spatial patterns in unemployment. Journal of Applied Econometrics 17, 303–327.
Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press.
De Mol, C., Giannone, D., Reichlin, L., 2008. Forecasting using a large number of predictors: is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics 146, 318–328.
Del Negro, M., Schorfheide, F., 2004. Priors from general equilibrium models for VARs. International Economic Review 45, 643–673.
Dées, S., di Mauro, F., Pesaran, M.H., Smith, L.V., 2007. Exploring the international linkages of the Euro area: a global VAR analysis. Journal of Applied Econometrics 22, 1–38.
Doan, T., Litterman, R., Sims, C., 1984. Forecasting and conditional projections using realistic prior distributions. Econometric Reviews 3, 1–100.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalized dynamic factor model: identification and estimation. Review of Economics and Statistics 82, 540–554.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2004. The generalized dynamic factor model: consistency and rates. Journal of Econometrics 119, 231–235.
Forni, M., Lippi, M., 2001. The generalized factor model: representation theory. Econometric Theory 17, 1113–1141.
Garratt, A., Lee, K., Pesaran, M.H., Shin, Y., 2006. Global and National Macroeconometric Modelling: A Long Run Structural Approach. Oxford University Press.
Geweke, J., 1977. The dynamic factor analysis of economic time series. In: Aigner, D., Goldberger, A. (Eds.), Latent Variables in Socio-Economic Models. North-Holland, Amsterdam.
Giacomini, R., White, H., 2006. Tests of conditional predictive ability. Econometrica 74, 1545–1578.
Giannone, D., Reichlin, L., Sala, L., 2005. Monetary policy in real time. In: Gertler, M., Rogoff, K. (Eds.), NBER Macroeconomics Annual 2004, Vol. 19. MIT Press, pp. 161–200.
Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning. In: Springer Series in Statistics.
Holly, S., Pesaran, M.H., Yamagata, T., 2010. A spatio-temporal model of house prices in the US. Journal of Econometrics 158, 160–173.
Huberman, G., 1982. A simple approach to the arbitrage pricing theory. Journal of Economic Theory 28, 183–191.
Ingersoll, J., 1984. Some results in the theory of arbitrage pricing. Journal of Finance 39, 1021–1039.
Kelejian, H.H., Robinson, D.P., 1995. Spatial correlation: a suggested alternative to the autoregressive model. In: Anselin, L., Florax, R. (Eds.), New Directions in Spatial Econometrics. Springer-Verlag, Berlin, pp. 75–95.
Leeper, E.M., Sims, C.A., Zha, T., 1996. What does monetary policy do? Brookings Papers on Economic Activity 2, 1–63.
Litterman, R., 1986. Forecasting with Bayesian vector autoregressions: five years of experience. Journal of Business and Economic Statistics 4, 25–38.
Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with multifactor error structure. Econometrica 74, 967–1012.
Pesaran, M.H., Chudik, A., 2010. Econometric analysis of high dimensional VARs featuring a dominant unit. ECB Working Paper No. 1194, May 2010.
Pesaran, M.H., Schuermann, T., Treutler, B.J., 2007. Global business cycles and credit risk. In: Carey, M., Stultz, R. (Eds.), The Risks of Financial Institutions. University of Chicago Press (Chapter 9).
Pesaran, M.H., Schuermann, T., Treutler, B.J., Weiner, S.M., 2006. Macroeconomic dynamics and credit risk: a global perspective. Journal of Money, Credit and Banking 38, 1211–1262.
Pesaran, M.H., Schuermann, T., Weiner, S.M., 2004. Modelling regional interdependencies using a global error-correcting macroeconometric model. Journal of Business and Economic Statistics 22, 129–162.
Pesaran, M.H., Smith, R., 2006. Macroeconometric modelling with a global perspective. The Manchester School (Supplement), 24–49.
Pesaran, M.H., Smith, L.V., Smith, R.P., 2007. What if the UK or Sweden had joined the Euro in 1999? An empirical evaluation using a global VAR. International Journal of Finance and Economics 12, 55–87.
Pesaran, M.H., Tosetti, E., 2010. Large panels with common factors and spatial correlation. CESifo Working Paper No. 2103, September 2007, revised May 2010.
Ross, S., 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–360.
Sargent, T.J., Sims, C.A., 1977. Business cycle modeling without pretending to have too much a-priori economic theory. In: Sims, C. (Ed.), New Methods in Business Cycle Research. Federal Reserve Bank of Minneapolis, Minneapolis.
Stock, J.H., Watson, M.W., 1999. Forecasting inflation. Journal of Monetary Economics 44, 293–335.
Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–162.
Stock, J.H., Watson, M.W., 2005. Implications of dynamic factor models for VAR analysis. NBER Working Paper No. 11467.
Whittle, P., 1954. On stationary processes in the plane. Biometrika 41, 434–449.
Journal of Econometrics 163 (2011) 23–28

The general dynamic factor model: One-sided representation results

Mario Forni (a), Marco Lippi (b,∗)

(a) Dipartimento di Economia Politica, Università di Modena e Reggio Emilia, CEPR and RECent, Italy
(b) Dipartimento di Economia, Università di Roma La Sapienza, and EIEF, Italy

Article history: Available online 12 November 2010.
JEL classification: E0; C1.
Keywords: Dynamic factor models; Frequency domain approach; Estimation by one-sided filters.

Abstract. Recent dynamic factor models have been almost exclusively developed under the assumption that the common components span a finite-dimensional vector space. However, this finite-dimension assumption rules out very simple factor-loading patterns and is therefore severely restrictive. The general case has been studied, using a frequency domain approach, in Forni et al. (2000). That paper produces an estimator of the common components that is consistent but is based on filters that are two-sided and therefore unsuitable for prediction. The present paper, assuming a rational spectral density for the common components, obtains a one-sided estimator without the finite-dimension assumption. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

The dynamic factor model

  x_it = χ_it + ξ_it = b_i1(L)u_1t + b_i2(L)u_2t + ··· + b_iq(L)u_qt + ξ_it,   (1.1)
where i ∈ N, t ∈ Z, has been studied in a vast literature starting with Stock and Watson (2002a,b), Forni et al. (2000) and Forni and Lippi (2001). The components ξit , called idiosyncratic, are assumed to be orthogonal to the common components χit and cross-sectionally weakly correlated (see Section 2), so the comovement of the x’s is mainly accounted for by the q common shocks ujt . Usually, the assumptions also include that the Hilbert space spanned by the common components χit , for a given t and i ∈ N, is finite dimensional. Under this assumption, the components χit and ξit can be consistently estimated, as n and T (the number of series and the number of observations for each series, respectively) tend to infinity, using principal components (standard or generalized) of the observable series xit (see Stock and Watson, 2002a,b; Bai and Ng, 2002; Forni et al., 2005, 2009). Moreover, these estimators only involve present and past values of the variables xit . Dynamic principal components, based on the spectral density of the x’s, have been used in Forni et al. (2000), where the above mentioned finite-dimension assumption is not required. However,
∗ Corresponding address: Dipartimento di Economia, Circonvallazione Tiburtina 4, 00185 Roma, Italy. Tel.: +39 0649917062; fax: +39 0649917060. E-mail address: [email protected] (M. Lippi).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.11.003
dynamic principal components result in two-sided filters, involving present and past but also future values of the variables x_it, with the consequence that the estimates are unreliable at the end of the sample and therefore useless for prediction.

The present paper starts with the observation that the finite-dimension assumption is very strict, as it does not include a model as simple as

  x_it = [1/(1 − α_i L)] u_t + ξ_it,   (1.2)
with the coefficients αi independently drawn, for example, from the uniform distribution between −0.9 and 0.9. This seems sufficient motivation to go back to model (1.1) without the finite-dimension assumption. Combining the approach taken in Forni et al. (2000) with recent results obtained by Anderson, Deistler and coauthors (see Section 3), we show that under the assumption that the filters bij (L) are rational, plus reasonable technical assumptions, model (1.1) can be rewritten as Hn (L)xnt = Rn ut + Hn (L)ξ nt ,
(1.3)
where xnt and ξ nt stack the first n series xit and ξit respectively, ut = (u1t u2t · · · uqt )′ , Hn (L) is a finite matrix polynomial. Moreover: (i) Hn (L), which is n × n, and Rn , which is n × q, can be obtained from the spectral density of χnt . (ii) Hn (L)ξ nt is idiosyncratic (this is not obvious; see Section 4). Though the paper is limited to representation results, Eq. (1.3), combined with the estimate of the spectral density of χnt proposed in Forni et al. (2000), can be seen as a basis for estimating the common components χit , without the finite-dimension assumption and using only contemporaneous and past values of the series xit .
M. Forni, M. Lippi / Journal of Econometrics 163 (2011) 23–28
Section 2 reviews previous results on model (1.1). Section 3 introduces and discusses the main assumptions. Section 4 derives representation (1.3). Section 5 discusses estimation based on (1.3). Section 6 concludes.

2. Previous results

2.1. The general model

Let us rewrite model (1.1) in vector form:

  x_nt = χ_nt + ξ_nt,  χ_nt = B_n(L)u_t,   (2.4)

with b_ij(L) being the (i, j) entry of B_n(L) for all n ≥ i (the matrices B_n(L) are nested). We assume that:

A1. (Common components) u_t is an orthonormal q-dimensional white noise. The filters b_ij(L) are square summable.

A2. (Idiosyncratic components) ξ_nt is weakly stationary.

A3. (Orthogonality of common and idiosyncratic components) ξ_nt ⊥ u_s, for all n, t, s.

A4. (Eigenvalues of the idiosyncratic components) Let Σ^ξ_n(θ) be the spectral density matrix of ξ_nt and λ^ξ_n1(θ) its first eigenvalue (in descending order). We assume that there exists a positive real number λ such that λ^ξ_n1(θ) ≤ λ for all n.

A5. (Eigenvalues of the common components) Let Σ^χ_n(θ) be the spectral density matrix of χ_nt and λ^χ_nq(θ) its qth eigenvalue. We assume that λ^χ_nq(θ) → ∞, for all θ, as n → ∞.

Forni and Lippi (2001) prove that (2.4) and Assumptions A1 through A5 impose little structure on the x's. They show that the following two assumptions, (1) x_nt is stationary for all n, and (2) there exists an integer q such that, for n → ∞, the qth eigenvalue of the spectral density matrix of x_nt diverges for all frequencies while the (q + 1)th is uniformly bounded, imply that the x's can be represented as in (2.4) with A1 through A5 holding.

Under Assumptions A1 through A5, the decomposition of the x's into common and idiosyncratic components is unique. To be precise, if

  x_it = χ′_it + ξ′_it = b′_i1(L)u′_1t + b′_i2(L)u′_2t + ··· + b′_iq(L)u′_qt + ξ′_it   (1.1′)

for all i ∈ ℕ and t ∈ ℤ, and Assumptions A1 through A5 are fulfilled for (1.1′), then

  q′ = q,  χ′_it = χ_it,  ξ′_it = ξ_it

for all i ∈ ℕ and t ∈ ℤ (see Forni and Lippi, 2001). Note that the asymptotic condition in Assumption A4 does not require mutual orthogonality of the idiosyncratic components, a standard identification condition in finite-n factor models. For example, a non-zero correlation of ξ_it with ξ_{i+1,t} does not conflict with A4. As a consequence, the decomposition of the x's into common and idiosyncratic components is identified only under x_it = χ_it + ξ_it = χ′_it + ξ′_it, for all i ∈ ℕ and t ∈ ℤ.

Note also that uniqueness does not extend to B_n(L) or the common shocks u_t. For, if B(L) is a q × q filter such that B(z)B′(z^{-1}) = I_q for |z| = 1, then defining

  B̃_n(L) = B_n(L)B(L),  ũ_t = B′(L^{-1})u_t,   (2.5)

we have χ_nt = B̃_n(L)ũ_t, which can replace the second equation in (2.4).

Now consider Σ^x_n(θ), its first q eigenvalues and corresponding eigenvectors:

  λ^x_n1(θ) λ^x_n2(θ) ··· λ^x_nq(θ),  p^x_n1(θ) p^x_n2(θ) ··· p^x_nq(θ),

where |p^x_n1(θ)|² + |p^x_n2(θ)|² + ··· + |p^x_nq(θ)|² = 1 for all θ ∈ [−π, π]. Define P_nj(L) as the inverse Fourier transform of

  (1/√(λ^x_nj(θ))) p^x_nj(θ).

The vector

  u_{t,n} = (P_n1(L)′ P_n2(L)′ ··· P_nq(L)′)′ x_nt

is a q-dimensional orthonormal white noise. Moreover, define

  χ_{it,n} = Proj(x_it | span(u_{s,n}, s ∈ ℤ)).

Then as n → ∞ we have χ_{it,n} → χ_it in quadratic mean. The above matrices and vectors have sample counterparts Σ^x_{nT}(θ), P_{nT,j}(L), u_{t,nT}, χ_{it,nT}, and the result is that χ_{it,nT} → χ_it in probability as n, T → ∞ (see Forni et al., 2000).

The following elementary example shows how the dynamic principal components work and their main drawback:

  χ_it = u_{t−1} if i is odd,  χ_it = u_t if i is even.   (2.6)

Moreover, assume that Σ^ξ_n(θ) = (1/2π)I_n (the idiosyncratic components are orthogonal to one another and have unit variance). Then

  Σ^x_n(θ) = (1/2π) s_n(θ) s_n(θ)* + (1/2π) I_n,  s_n(θ) = (e^{iθ}, 1, e^{iθ}, 1, . . .)′.

The first eigenvalue is 1 + n, with eigenvector (1/√n)(e^{iθ}, 1, · · · , e^{iθ}, 1)′, so that

  P_n1(L) = (1/√(n(1 + n))) (L^{-1}, 1, · · · , L^{-1}, 1),

where L^{-1} is the forward shift operator: L^{-1}x_it = x_{i,t+1}. As a consequence this estimator can be used only for t ≤ T − 1.
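The eigenvalue structure in example (2.6) is easy to verify numerically. The sketch below is not part of the paper; it works with 2πΣ^x_n(θ) = s s* + I_n, whose largest eigenvalue is 1 + n while all the others equal 1.

```python
# Hypothetical check, NOT part of the paper, of example (2.6): at each
# frequency theta, 2*pi*Sigma_n^x(theta) = s s* + I_n with
# s = (e^{i theta}, 1, e^{i theta}, 1, ...)'.
import numpy as np

n, theta = 8, 0.7
s = np.array([np.exp(1j * theta) if i % 2 == 0 else 1.0 for i in range(n)])
M = np.outer(s, s.conj()) + np.eye(n)   # 2*pi times the spectral density
eig = np.sort(np.linalg.eigvalsh(M))    # real, ascending (M is Hermitian)
print(eig[-1], eig[:-1])                # 1 + n = 9.0, then ones
```

The eigenvector of the top eigenvalue is s/√n, which carries the factor e^{iθ} and hence, after inverse Fourier transformation, the forward shift L^{-1} discussed in the text.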
2.2. The restricted model

An important simplification is obtained with the following assumption, which is used in Stock and Watson (2002a,b), Bai and Ng (2002), Forni et al. (2005) and Forni et al. (2009). For a given t, we will denote by S^χ_t the Hilbert space span(χ_it, i ∈ ℕ), i.e. the closure of the set of all linear combinations of the variables χ_it. Note that stationarity of the vectors χ_nt implies that the dimension of S^χ_t is independent of t.

AF. The space S^χ_t is finite dimensional.

Under A1 through A5 plus AF, denote by r the dimension of S^χ_t. There exist: (I) an r-dimensional stationary process F_t, which has the representation

  F_t = N(L)u_t,   (2.7)

N(L) being a square-summable r × q filter; (II) nested n × r matrices C_n, such that

  χ_nt = B_n(L)u_t = C_n F_t.   (2.8)
(For a proof of this fairly trivial statement, see Forni et al. (2009).) The processes F_jt are called the static factors. Note that the static factors evolve according to a dynamic equation; see (2.7). ''Static'' only refers to the loading of F_t by the χ's; see (2.8).

Summing up, in general the stochastic variables {χ_it, i ∈ ℕ, t ∈ ℤ} span an infinite-dimensional Hilbert space X, which is contained in the Hilbert space spanned by {u_jt, j = 1, . . . , q, t ∈ ℤ}. Under AF the Hilbert space spanned by {χ_it, i ∈ ℕ}, for any given t, is finite dimensional with stationary basis F_t. Of course in that case X is also contained in the Hilbert space spanned by {F_jt, j = 1, . . . , r, t ∈ ℤ}.

Let Γ^x_n be the covariance matrix of x_nt. Under AF, estimation of the common components can be achieved using the first r eigenvalues and corresponding eigenvectors of Γ^x_n to obtain F̂_{t,n}, then projecting x_it on F̂_{t,n}. In this case only contemporaneous values of the x's are involved, so no two-sidedness problem arises.

3. Back to the general model

As we have observed in the Introduction, taking the simple case (1.2), rewritten here:

  x_it = [1/(1 − α_i L)] u_t + ξ_it,

where α_i is drawn from the uniform distribution on the interval [−0.9, 0.9], we see that S^χ_t is not finite dimensional; thus, so to speak, we have an infinite number of static factors. Criteria for determining r, the number of static factors, when applied to models like (1.2), will produce wrong results, with the estimated r growing to infinity with n. Moreover, all criteria for determining q that are based on first estimating F_t and then estimating a VAR for F_t are misspecified. To our knowledge, the only criterion for determining q which does not depend on the assumption of a finite r, and has therefore general applicability, is that of Hallin and Liška (2007).
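The absence of a finite static-factor structure in model (1.2) can be illustrated with the covariance matrix of the common components. The sketch below is hypothetical and not from the paper; it uses the closed form Cov(χ_it, χ_jt) = 1/(1 − α_iα_j), implied by a single white-noise shock loaded through AR(1) filters, and shows that no small set of eigenvalues exhausts the spectrum.

```python
# Hypothetical illustration, NOT from the paper: in model (1.2) one dynamic
# shock u_t generates a common-component covariance matrix
# Cov(chi_it, chi_jt) = 1 / (1 - alpha_i * alpha_j) whose eigenvalues decay
# only gradually, so no fixed r static factors reproduce the common space.
import numpy as np

rng = np.random.default_rng(4)
n = 60
alpha = rng.uniform(-0.9, 0.9, n)
Gamma = 1.0 / (1.0 - np.outer(alpha, alpha))   # common-component covariance
eig = np.sort(np.linalg.eigvalsh(Gamma))[::-1]
print(eig[:6] / eig[0])   # several eigenvalues remain non-negligible
```

The sample size n = 60 and the seed are arbitrary; the point is qualitative: q = 1 dynamic shock, yet no exact low-rank static representation.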
We believe that model (1.2) provides a strong motivation for not assuming AF. Instead, we assume here that:

A6. The spectral density of χ_nt is rational.

3.1. Fundamental and zeroless representations

Assumptions A6 and A5 imply that there exists n̄ ≥ q such that, for n ≥ n̄, rank(Σ_n^χ(θ)) = q for θ a.e. in [−π, π]. As a consequence, for n ≥ n̄ the vector χ_nt has a fundamental rational representation of rank q, i.e.

χ_nt = C_n(L) v_t^(n),   (3.9)

where: (1) the entries of C_n(L), denoted by c_ij(L), are rational functions c_ij(L) = d_ij(L)/e_ij(L), where d_ij and e_ij have no common roots and e_ij(0) = 1; (2) v_t^(n) is a q-dimensional orthonormal white noise; (3) C_n(z) has no zeros for |z| < 1, a zero of C_n(z) being defined as a complex number ζ such that the rank of C_n(ζ) is lower than the maximum rank of C_n(z), and no poles for |z| ≤ 1, the poles of C_n(z) being defined as the poles of the polynomials e_ij(z). This implies that v_t^(n) belongs to the space spanned by χ_{n,t−k}, for k ≥ 0. As (3.9) implies that χ_nt belongs to the space spanned by v_{t−k}^(n), for k ≥ 0, the two spaces coincide.

Fundamental representations are unique up to an orthogonal matrix. To be precise, χ_nt = C̃_n(L) ṽ_t^(n) is fundamental if and only if there exists an orthogonal matrix K_n such that

C̃_n(L) = C_n(L) K_n,   ṽ_t^(n) = K_n′ v_t^(n).

To understand the relationship between representations (3.9) and (2.4), consider again example (2.6):

χ_it = u_{t−1} for i odd,   χ_it = u_t for i even.

In this case a fundamental white noise for χ_nt is u_{t−1} for n = 1, u_t for n > 1. Note also that the (1, 1) entry of C_n(L) is 1 for n = 1, L for n > 1. The example shows, firstly, that the reference to n in v_t^(n) is necessary and, secondly, that the matrices C_n(L), unlike the matrices B_n(L), are not necessarily nested.

In the following example, though C_n(L) ≠ B_n(L) for all n, the matrices C_n(L) are nested. Let q = 1 and let representation (1.1) be

χ_it = b(L) u_t,   b(L) = (1 − α^{−1} L)/(1 − α L),

with |α| < 1. As the polynomial 1 − α^{−1} L is not invertible, the white noise u_t does not belong to the space spanned by present and past values of the χ's. However, elementary calculations show that

[(1 − α^{−1} L)/(1 − α L)] u_t = −α^{−1} [(1 − α L^{−1})/(1 − α L)] (L u_t) = −α^{−1} w_t,

and that the spectral density of w_t is equal to unity at all frequencies. Thus w_t is a unit-variance white noise. Representation (3.9) is immediately obtained:

χ_it = c w_t,   c = −α^{−1}.

Thus the matrices C_n(L) are nested and v_t^(n) = w_t is independent of n. More generally, under Assumption A7′, to be introduced below, we can choose the fundamental representations (3.9) in such a way that v_t^(n) is independent of n and the matrices C_n(L) are nested.

Now consider the set of all n × q matrices D(L) with rational entries

d_ij(L) = f_ij(L)/g_ij(L),

where g_ij(0) = 1, degree(f_ij) ≤ p_1 and degree(g_ij) ≤ p_2. The parameter space for D(L) has dimension nq(p_1 + p_2 + 1). If the matrix D(L) is tall, i.e. if n > q, then, for generic values of the parameters, D(L) is zeroless, i.e. the rank of D(z) is q for all complex numbers z. To see why this result holds, consider firstly the following example, in which q = 1:

χ_it = (α_i + β_i L) u_t,   (3.10)

for i = 1, …, n, with n > 1. Obviously, in this case D(z) is zeroless unless α_i/β_i = γ for all i. In general, the existence of a zero of D(z) means that the determinants of all the q × q submatrices of D(z) vanish for the same complex number. This implies algebraic restrictions on the coefficients of D(L), as argued in Forni et al. (2009) and Zinner (2008). For a formal proof see Anderson and Deistler (2008a) and Deistler et al. (2010). This motivates the following assumption, which will be enhanced in the next section:

A7. For n ≥ q + 1, the matrix C_n(z) corresponding to the fundamental representation χ_nt = C_n(L) v_t^(n) is zeroless.
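The genericity argument behind A7 can be illustrated for q = 1 with example (3.10): D(z) = (α_i + β_i z), i = 1, …, n, has a zero only if every entry vanishes at the same point. A minimal numerical sketch, not from the paper; the coefficients and the helper function are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5  # tall: n > q = 1

def zeros_of_tall_D(alpha, beta):
    """For q = 1, D(z) = (alpha_i + beta_i z)_{i=1..n}. A zero of D(z) is a
    point where the rank drops below 1, i.e. every entry vanishes, so the
    candidate roots z = -alpha_i/beta_i must coincide for all i."""
    roots = -alpha / beta
    return roots if np.allclose(roots, roots[0]) else np.array([])

# Generic coefficients: the candidate roots differ, so D(z) is zeroless.
alpha, beta = rng.normal(size=n), rng.normal(size=n)
print(len(zeros_of_tall_D(alpha, beta)))    # 0: no common root

# Degenerate coefficients alpha_i = gamma * beta_i: D(z) has the zero -gamma.
gamma = 0.7
print(zeros_of_tall_D(gamma * beta, beta))  # all candidate roots equal -0.7
```

With coefficients drawn from a continuous distribution, the degenerate configuration has probability zero, which is exactly the "generic values of the parameters" statement above.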
3.2. Autoregressive representations for n > q

Tall, zeroless moving average rational matrices possess a finite inverse:

(F) Let n > q. Consider the rational representation y_t = D(L) z_t, where y_t is n-dimensional and z_t is an orthonormal q-dimensional white noise. If D(L) is zeroless, then y_t has a finite autoregressive representation

A(L) y_t = D(0) z_t.

For a formal proof see Anderson and Deistler (2008b) and Deistler et al. (2010). Example (3.10) for n = 2 provides an intuition:

χ_1t = α_1 u_t + β_1 u_{t−1}
χ_2t = α_2 u_t + β_2 u_{t−1}.   (3.11)

We see that

u_t = [1/(α_1 β_2 − α_2 β_1)] (β_2 χ_1t − β_1 χ_2t),

and so

[1 − δβ_1β_2 L   δβ_1^2 L; −δβ_2^2 L   1 + δβ_1β_2 L] (χ_1t χ_2t)′ = (α_1 α_2)′ u_t,

where δ = 1/(α_1 β_2 − α_2 β_1). Note that the autoregressive representation exists if and only if α_1 β_2 − α_2 β_1 ≠ 0, that is, when D(z) is zeroless. Moreover, χ_{1,t−1} and χ_{2,t−1} are linearly independent. Therefore the autoregressive representation of order 1 is unique. But as soon as n = 3,

χ_1t = α_1 u_t + β_1 u_{t−1}
χ_2t = α_2 u_t + β_2 u_{t−1}
χ_3t = α_3 u_t + β_3 u_{t−1},   (3.12)

we see that infinitely many autoregressive representations of order 1 are possible. For, setting α = (α_1 α_2 α_3)′, we have

u_t = [1/(cα)] (c_1 χ_1t + c_2 χ_2t + c_3 χ_3t),   (3.13)

where c = (c_1 c_2 c_3) is any vector orthogonal to (β_1 β_2 β_3) and such that cα ≠ 0. Using (3.13) to replace u_{t−1} in (3.12), we obtain an autoregressive representation of order one depending on c.

Consider now q + 1 integers i_1, i_2, …, i_{q+1}, with 1 ≤ i_k < i_{k+1} ≤ n, and let

χ_{i_1,…,i_{q+1},t} = (χ_{i_1 t} χ_{i_2 t} ··· χ_{i_{q+1} t})′ = C_{n; i_1,…,i_{q+1}}(L) v_t^(n)   (3.14)

be obtained from (3.9) by selecting the rows i_1, i_2, …, i_{q+1}. The vector (3.14) is tall (it has dimension q + 1 and rank q), so for generic values of the parameters the matrix C_{n; i_1,…,i_{q+1}}(L) is zeroless. As a consequence, by Proposition (F), for generic values of the parameters the vector χ_{i_1,…,i_{q+1},t} has a finite autoregressive representation. This motivates almost all of Assumption A7′ below, which enhances Assumption A7. The uniqueness in part (ii) is motivated by the discussion of examples (3.11) and (3.12).

A7′. For all n and all choices of i_1, i_2, …, i_{q+1}, we assume that (i) C_{n; i_1,…,i_{q+1}}(z) is zeroless, and that (ii) χ_{i_1,…,i_{q+1},t} has a unique minimum-lag autoregressive representation.

As the vector (3.14) is tall, being of dimension q + 1 but of rank q, part (i) of A7′ can be motivated by the genericity argument. Part (ii) has a motivation in the discussion of examples (3.11) and (3.12). A consequence of A7′(i) is that the space spanned by present and past values of {χ_it, i ∈ N} is equal to that spanned by present and past values of any q + 1 among the variables χ_it. For, present and past values of χ_{i_1,…,i_{q+1},t} span the same space as is spanned by present and past values of v_t^(n), and therefore by present and past values of χ_nt, for any n. Assumption A7′ rules out examples like (2.6), which fulfills A7. Note, however, that (2.6) is a special case of (3.10), in which Assumption A7′ is fulfilled for generic values of α_i and β_i.

Lastly, consider a fundamental representation for χ_{q+1,t}:

χ_{q+1,t} = F(L) v_t.

By A7′, for i > q + 1, χ_it belongs to the space spanned by present and past values of χ_{q+1,t}, and therefore of v_t, so

χ_it = f_i1(L) v_1t + f_i2(L) v_2t + ··· + f_iq(L) v_qt,

for all i ∈ N. Thus, under A7′, representation (3.9) can be written with a white noise v_t which is independent of n, and nested matrices C_n(L):

χ_nt = C_n(L) v_t.   (3.15)

3.3. Non-stationary variables; cointegration

Application of our dynamic factor model requires stationarity. If the data set contains non-stationary variables, as is the case with macroeconomic data sets, the data must be transformed, either by removing a deterministic trend or by differencing (this is current practice in the dynamic factor literature). Denote by y_it the variables in the data set and by x_it the corresponding transformed stationary variables. The question we want to discuss briefly here is whether some of our assumptions may fail to hold for the transformed variables x_it. We find that strong cointegration relationships among the common components of the y's imply that Assumption A7′ does not hold for some choice of i_1, i_2, …, i_{q+1}.

Assume for simplicity that all the variables y_it in the data set are I(1) and that y_it = φ_it + ψ_it, where: (i) φ_it is I(1) for all i ∈ N; (ii) the variables x_it, χ_it and ξ_it, defined as the first differences of y_it, φ_it and ψ_it respectively, evolve according to model (2.4) and fulfill Assumptions A1 through A6.

Consider now a q-dimensional vector obtained by selecting q variables among the φ's:

φ_{i_1,…,i_q,t} = (φ_{i_1 t} φ_{i_2 t} ··· φ_{i_q t})′.

The vector φ_{i_1,…,i_q,t} has the representation

χ_{i_1,…,i_q,t} = B_{i_1,…,i_q}(L) u_t,

which is obtained by selecting the rows i_1, i_2, …, i_q in (2.4). The vector φ_{i_1,…,i_q,t} is cointegrated if and only if B_{i_1,…,i_q}(1) is singular. Now consider a (q + 1)-dimensional vector

φ_{i_1,…,i_{q+1},t} = (φ_{i_1 t} φ_{i_2 t} ··· φ_{i_{q+1} t})′,

whose representation is

χ_{i_1,…,i_{q+1},t} = B_{i_1,…,i_{q+1}}(L) u_t,

where the matrix B_{i_1,…,i_{q+1}}(L) is (q + 1) × q. If we assume that all q-dimensional subvectors of φ_{i_1,…,i_{q+1},t} are cointegrated, the matrix B_{i_1,…,i_{q+1}}(z) has a zero at z = 1. Thus A7′ does not hold for χ_{i_1,…,i_{q+1},t}. In particular, χ_{i_1,…,i_{q+1},t} has no finite autoregressive representation. The problem has no obvious solution, as the variables φ_it and χ_it are not observable. Direct estimation of non-stationary factors
and common components has been obtained in Bai and Ng (2004), but only for the restricted model. Methods allowing estimation of the components φ_it and testing for their cointegration in the general model are not available. On the other hand, we do not really need as much as Assumption A7′. In the next section we show that what is needed to obtain a finite autoregressive representation for χ_nt is the existence of a partition of χ_nt into (q + 1)-dimensional subvectors, each fulfilling A7′. In empirical situations, careful grouping of the variables, based for example on their economic relationships, should help to avoid ''dangerous'' (q + 1)-dimensional vectors.

4. Transforming the dynamic model into a static model with q factors

We assume for convenience that n = (q + 1)m and partition χ_nt as

χ_nt = (χ′_[1]t χ′_[2]t ··· χ′_[m]t)′,

where χ_[s]t = (χ_{(s−1)(q+1)+1,t} χ_{(s−1)(q+1)+2,t} ··· χ_{s(q+1),t})′. We start with (3.15) and denote by

A_[s](L) χ_[s]t = R_[s] v_t   (4.16)

the minimum-lag autoregressive representation of the (q + 1)-dimensional vector χ_[s]t (see Assumption A7′). Combining Eqs. (4.16), χ_nt has the following autoregressive representation:

diag(A_[1](L), A_[2](L), …, A_[m](L)) χ_nt = R_n v_t,   (4.17)

where R_n = (R′_[1] R′_[2] ··· R′_[m])′. Of course, other representations like (4.17) can be obtained by reordering the components of χ_nt. However, the component of R_n v_t which corresponds to a given component of χ_nt is independent of which ordering has been chosen.

A8. We assume that the qth eigenvalue of R_n R′_n, call it ν_n, tends to infinity as n → ∞.

Assumption A8 is not a consequence of A5. In example (3.10), A5 requires that Σ_i |α_i + β_i e^{−iθ}|^2 diverges for all θ, while A8 requires that Σ_i α_i^2 diverges. Note that A8 is not affected if R_n is multiplied on the right by an orthogonal matrix. We denote by G* the conjugate transpose of the matrix G.

A9. Let A_{i_1,…,i_{q+1}}(L) be the minimum-lag autoregressive matrix of χ_{i_1,…,i_{q+1},t}. Denote by μ_{i_1,…,i_{q+1}}(θ) the maximum eigenvalue of A_{i_1,…,i_{q+1}}(e^{−iθ}) A_{i_1,…,i_{q+1}}(e^{−iθ})*. We assume that μ_{i_1,…,i_{q+1}}(θ) ≤ μ for a positive real μ, for all choices of i_k, k = 1, …, q + 1, and for all θ.

Assumption A9 is reasonable but not trivial. Take

A(L) = [1   αL; βL   1].

The trace of A(e^{−iθ}) A(e^{−iθ})* is 2 + α^2 + β^2, which is not bounded under the stability condition |αβ| < 1.

Defining H_n(L) as the autoregressive matrix in (4.17), we have

H_n(L) x_nt = R_n v_t + H_n(L) ξ_nt,   (4.18)

or, setting x̃_nt = H_n(L) x_nt, χ̃_nt = R_n v_t and ξ̃_nt = H_n(L) ξ_nt,

x̃_nt = R_n v_t + ξ̃_nt = χ̃_nt + ξ̃_nt.   (4.19)

Let us prove that this is a static factor model with q factors, i.e. that, as n → ∞, the first q eigenvalues of the covariance matrix of χ̃_nt diverge and the first eigenvalue of the covariance matrix of ξ̃_nt is bounded. The first statement is a consequence of A8. Moreover, using A4 and A9, we have

a Σ_n^{ξ̃}(θ) a* = a H_n(e^{−iθ}) Σ_n^{ξ}(θ) H_n(e^{−iθ})* a* ≤ λ_{n1}^{ξ}(θ) a H_n(e^{−iθ}) H_n(e^{−iθ})* a* ≤ λ μ |a|^2.

Thus the first eigenvalue of the spectral density Σ_n^{ξ̃}(θ), call it λ_{n1}^{ξ̃}(θ), is bounded by λμ. On the other hand, the first eigenvalue of the covariance matrix of ξ̃_nt is bounded by ∫_{−π}^{π} λ_{n1}^{ξ̃}(θ) dθ. The result follows.

Other choices of the autoregressive representation of χ_nt may turn into representations x̌_nt = χ̌_nt + ξ̌_nt with a non-idiosyncratic ξ̌_nt. As an example, consider again model (3.10):

χ_nt = α_n u_t + β_n u_{t−1}.

If c = (c_1 c_2 ··· c_n) is orthogonal to β_n, then an autoregressive representation is

[I − (δ β_n c L)] χ_nt = α_n u_t,

where δ = (c α_n)^{−1}, and therefore

[I − (δ β_n c L)] x_nt = α_n u_t + [I − (δ β_n c L)] ξ_nt = χ̌_nt + ξ̌_nt.

We have

ξ̌_it = ξ_it − δ β_i [c ξ_{n,t−1}].

Thus the vector ξ̌_nt is not idiosyncratic.

5. Estimation; a sketch

In the previous section we have shown that Assumption A7′ implies the existence of representation (4.18). We now provide a procedure for constructing H_n(L), R_n and v_t, starting with the spectral density of the common components, Σ_n^χ(θ). As we assume that Σ_n^χ(θ) is known, this is to be considered only as a sketch of an estimation procedure. In practical situations Σ_n^χ(θ) is not known; we start with an estimate Σ̂_n^χ(θ) and compute the corresponding Ĥ_n(L), R̂_n and v̂_t. A proof of consistency of such sample-dependent estimates, for n and T tending to infinity, is beyond the scope of the present paper and is left for future research. Let us only observe here that our assumptions, A1 through A9, must be enhanced with conditions ensuring consistency of a smoothed periodogram of x_nt (see e.g. Brockwell and Davis, 1991, pp. 445–7).

Firstly we determine H_n(L) and R_n. We keep assuming that n = (q + 1)m. Using the m diagonal (q + 1) × (q + 1) blocks of Σ_n^χ(θ) we can obtain the matrices

G_[j](L), Γ_[j],   j = 1, 2, …, m,

corresponding to the Wold representation

χ_[j]t = G_[j](L) w_[j]t.   (5.20)

Note that neither the χ_[j]t nor the w_[j]t are observable. The matrix G_[j](L) is (q + 1) × (q + 1) and has rational entries. Moreover, G_[j](0) = I_{q+1}. The matrix Γ_[j] is the covariance matrix of the (q + 1) × 1 one-step-ahead prediction error vector w_[j]t. The matrix
Γ[j] (like w[j]t ) is of rank q. By Assumption A7′ (ii), (5.20) can be rewritten as
1
A[j] (L)χ[j]t = w[j]t ,
n ∑
where A[j] (L) is the unique minimum-lag left inverse of G[j] (L). The matrix Γ[j] can be factored as
Γ[j] = P[j] Λ[j]
1 2
1 Λ[j] 2 P[′j] ,
the matrix P[j] being (q + 1) × q with the normalized first q eigenvectors of Γ[j] on the columns, while Λ[j] is q × q with the (non-zero) corresponding eigenvectors on the diagonal. The columns of P[j] are mutually orthogonal. We define −1
−1
v[j]t = Λ[j] 2 P[′j] wj[]t = Λ[j] 2 P[′j] A[j] (L)χ[j]t .
(5.21)
It is easily seen that v[j]t is an orthonormal q-dimensional white 1
noise. Moreover, projecting w[j]t on v[j]t we find w[j]t = P[j] Λ[j] 2 1
v[j]t . Defining S[j] = P[j] Λ[2j] , we obtain A[j] (L)χ[j]t = S[j] v[j]t . The white noise vectors v[j]t are different in general but, by Assumption A7′ , span the same space. Therefore, for j = 2, . . . , m, v[j]t = K[j] v[1]t , where Kj is orthogonal. Using (5.21), K[j] = E v[j]t v[′1]t
=
1
∫
2π
Thus the sum of its squares is
π
[ 1 ] − −1 χ Λj 2 P[′j] A[j] (e−iθ )Σ[j1] (θ )A[1] (e−iθ )∗ P[1] Λ[1]2 dθ ,
−π
χ
where Σ[j1] (θ ) is the (q + 1) × (q + 1) cross-spectrum of χ[j]t and χ
χ[1]t (a submatrix of Σn (θ )). In conclusion, setting vt = v[1]t , we
,
R2ks
k=1
i.e. the reciprocal of the sth eigenvalue of Rn R′n . By Assumption A8, this reciprocal tends to zero as n → ∞. Because Hn (L)ξ nt is idiosyncratic, the term Mn R′n Hn (L)ξ nt tends to zero in mean square as n → ∞ (see e.g. Forni and Lippi, 2001). Thus Mn R′n Hn (L)xnt → vt in mean square as n → ∞. Lastly, χnt results from inversion of Hn (L). 6. Conclusions χ
Forni et al. (2000) estimate Σn (θ ), the spectral density of the common components of model (1.1), by means of q dynamic χ principal components, and provide a factorization of Σn (θ ). However, the estimator of the common components based on such factorization, though consistent, applies two-sided filters to the observable variables xit . In the present paper, under the assumption of rationality for χ Σn (θ ) and other mild requirements, we obtain a factorization of χ Σn (θ ) which only employs one-sided filters. An important feature of our method is that the problem of χ factoring Σn (θ ), which is of dimension n and rank q, is solved by separately factoring many spectral matrices of dimension q + 1. Acknowledgements We would like to thank, for suggestions and criticism, Manfred Deistler, Marc Hallin, Hashem Pesaran and Paolo Zaffaroni.
have A[1] (L) 0
0
A[2] (L)
0
0
S
[1]
··· ··· .. . ···
χ[1]t χ[2]t . . . A[m] (L) χ[m]t 0 0
References
′
S[2] K[2] vt , = .. .
(5.22)
S[m] K[′m]
and therefore Hn (L)xnt = Rn vt + Hn (L)ξ nt , where Hn (L) and Rn are defined in (5.22). The next step determines vt . Note that the matrix Rn has mutually orthogonal columns. As a consequence, R′n Rn has the eigenvalues of Rn R′n on the main diagonal (this is easily seen) and zero elsewhere. Setting Mn = (R′n Rn )−1 , Mn R′n Hn (L)xnt = Mn R′n Rn vt + Mn R′n Hn (L)ξ nt
= vt + Mn R′n Hn (L)ξ nt . Denoting by Rij the entries of Rn , the sth row of Mn R′n is 1 n ∑ k=1
R2ks
(R1s R2s · · · Rns ) .
Anderson, B.D.O., Deistler, M., 2008a. Properties of zero-free transfer function matrices. SICE Journal of Control, Measurement and System Integration 1, 1–9. Anderson, B.D.O., Deistler, M., 2008b. Generalized linear dynamic factor models—a structure theory. In: 2008 IEEE Conference on Decision and Control. Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191–221. Bai, J., Ng, S., 2004. A PANIC attack on unit roots and cointegration. Econometrica 72, 1127–1177. Brockwell, P.J., Davis, R.A., 1991. Time Series: Theory and Methods. Springer-Verlag, New York. Deistler, M., Filler, A., Zinner, C., Chen, W., 2010. Generalized linear dynamic factor models: an approach via singular autoregressions. European Journal of Control 16 (3), 211–224. Forni, M., Giannone, D., Lippi, M., Reichlin, L., 2009. Opening the black box: structural factor models with large cross-sections. Econometric Theory 25, 1319–1347. Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalized dynamic factor model: identification and estimation. The Review of Economics and Statistics 82, 540–554. Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2005. The generalized factor model: onesided estimation and forecasting. Journal of the American Statistical Association 100, 830–840. Forni, M., Lippi, M., 2001. The generalized dynamic factor model: representation theory. Econometric Theory 17, 1113–1141. Hallin, M., Liška, R., 2007. The generalized dynamic factor model: determining the number of factors. Journal of the American Statistical Association 102, 103–117. Stock, J.H., Watson, M.W., 2002a. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–162. Stock, J.H., Watson, M.W., 2002b. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179. Zinner, C., 2008. 
Modeling of high-dimensional time series by generalized dynamic factor models. Ph.D. Dissertation. Technische Universität Wien.
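As a closing numerical check, not part of the paper itself, the order-one autoregressive inversion worked out for example (3.11) in Section 3.2, which is the mechanism behind Proposition (F), can be verified by simulation: applying A(L) to the bivariate MA system recovers D(0)u_t exactly. The coefficient values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
a1, b1, a2, b2 = 1.0, 0.5, 0.8, -0.3      # alpha_i, beta_i in example (3.11)
delta = 1.0 / (a1 * b2 - a2 * b1)         # requires a1*b2 - a2*b1 != 0 (zeroless D)

u = rng.normal(size=T)
u_lag = np.r_[0.0, u[:-1]]                # u_{t-1}, with u_{-1} set to 0
chi = np.vstack([a1 * u + b1 * u_lag,     # chi_{1t} = a1 u_t + b1 u_{t-1}
                 a2 * u + b2 * u_lag])    # chi_{2t} = a2 u_t + b2 u_{t-1}

# Order-1 AR representation A(L) chi_t = D(0) u_t, with A(L) = I - B L and
# B = delta * [[b1*b2, -b1**2], [b2**2, -b1*b2]] (the matrix in Section 3.2).
B = delta * np.array([[b1 * b2, -b1 ** 2], [b2 ** 2, -b1 * b2]])
resid = chi[:, 1:] - B @ chi[:, :-1]      # A(L) applied to chi, for t >= 1

# The residual equals (a1, a2)' u_t, i.e. D(0) u_t, up to float precision.
err = np.max(np.abs(resid - np.outer([a1, a2], u[1:])))
print(err < 1e-10)
```

The check fails (delta blows up) exactly when α_1β_2 − α_2β_1 = 0, i.e. when D(z) is not zeroless, matching the existence condition stated in the text.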
Journal of Econometrics 163 (2011) 29–41
Dynamic factors in the presence of blocks

Marc Hallin a,b,c,d,e,∗, Roman Liška a,1

a ECARES (European Centre for Advanced Research in Economics and Statistics), Université libre de Bruxelles, CP 114, B-1050 Bruxelles, Belgium
b ORFE, Princeton University, United States
c CentER, Tilburg University, Netherlands
d ECORE, Bruxelles and Louvain-la-Neuve, Belgium
e Académie Royale de Belgique
Article info
Article history: Available online 11 November 2010.
JEL classification: C13; C33; C43.
Keywords: Panel data; High dimensional time series data; Dynamic factor model; Dynamic principal components.

Abstract
Macroeconometric data often come under the form of large panels of time series, themselves decomposing into smaller but still quite large subpanels or blocks. We show how the dynamic factor analysis method proposed in Forni et al. (2000), combined with the identification method of Hallin and Liška (2007), allows for identifying and estimating joint and block-specific common factors. This leads to a more sophisticated analysis of the structures of dynamic interrelations within and between the blocks in such datasets, along with an informative decomposition of explained variances. The method is illustrated with an analysis of a dataset of Industrial Production Indices for France, Germany, and Italy. © 2011 Published by Elsevier B.V.
1. Introduction

1.1. Panel data and dynamic factor models

In many fields (macroeconometrics, finance, environmental sciences, chemometrics, ...) information comes in the form of a large number of observed time series, i.e. panel data. Panel data consist of series of observations (length T) made on n individuals or ‘‘cross-sectional items’’ that have been put together on purpose, mainly because they carry, or are expected to carry, some information about a common feature or unobservable process of interest. This ‘‘commonness’’ is a distinctive feature of panel data: mutually independent cross-sectional items, in that respect, do not constitute a panel (or constitute only a degenerate one). Cross-sectional heterogeneity is another distinctive feature of panels: n (possibly non-independent) replications of the same time series would be another form of degeneracy. Moreover, the impact of item-specific or idiosyncratic effects, which have the role of a nuisance, very
∗ Corresponding author at: ECARES (European Centre for Advanced Research in Economics and Statistics), Université libre de Bruxelles, CP 114, B-1050 Bruxelles, Belgium. E-mail address: [email protected] (M. Hallin).
1 Present address: Institute for Health and Consumer Protection (IHCP), European Commission Joint Research Centre, I-21027 Ispra (VA), Italy.
© 2011 Published by Elsevier B.V. doi:10.1016/j.jeconom.2010.11.004
often dominates, quantitatively, that of the common features one is interested in. Finally, all individuals in a panel are exposed to the influence of unobservable or unrecorded covariates, which create complex interdependencies, both in the cross-sectional and in the time dimension, that cannot be modelled, as this would require questionable modelling assumptions and a prohibitive number of nuisance parameters. These interdependencies may affect all (or almost all) items in the panel, in which case they are ‘‘common’’; they also may be specific to a small number of items, hence ‘‘idiosyncratic’’. The idea of separating ‘‘common’’ and ‘‘idiosyncratic’’ effects is thus at the core of panel data analysis.

The same idea is the cornerstone of factor analysis. There is little surprise, thus, to see a time series version of factor analysis emerging as a powerful tool in the context of panel data. This dynamic version of factor models, however, requires adequate definitions of ‘‘common’’ and ‘‘idiosyncratic’’. The definition should not simply allow for identifying the decomposition of the observation into a ‘‘common’’ component and an ‘‘idiosyncratic’’ one; it should also provide an adequate translation of the intuitive meanings of ‘‘common’’ and ‘‘idiosyncratic’’. Denote by X_it the observation of item i (i = 1, …, n) at time t (t = 1, …, T); the factor model decomposition of this observation takes the form

X_it = χ_it + ξ_it,   i = 1, …, n, t = 1, …, T,
M. Hallin, R. Liška / Journal of Econometrics 163 (2011) 29–41
where the common component χ_it and the idiosyncratic one ξ_it are mutually orthogonal (at all leads and lags) but unobservable. Some authors identify this decomposition by requiring the idiosyncratic components to be ‘‘small’’ or ‘‘negligible’’, as in dimension reduction techniques. Others require that the n idiosyncratic processes be mutually orthogonal white noises. Such characterizations do not reflect the fundamental nature of factor models: idiosyncratic components can indeed be ‘‘large’’ and strongly autocorrelated, while white noise can be common. For instance, in a model of the form X_it = χ_t + ξ_it, where χ_t is white noise and orthogonal to ξ_it = ε_it + a_i ε_{i,t−1}, with i.i.d. ε_it's, the white noise component χ_t, which is present in all cross-sectional items, very much qualifies as being ‘‘common’’, while the cross-sectionally independent autocorrelated ξ_it's, being item-specific, exhibit all the attributes one would like to see in an ‘‘idiosyncratic’’ component.

A possible characterization of commonness/idiosyncrasy is obtained by requiring the common component to account for all cross-sectional correlations, leading to possibly autocorrelated but cross-sectionally orthogonal idiosyncratic components. This yields the so-called ‘‘exact factor models’’ considered, for instance, by Sargent and Sims (1977) and Geweke (1977). These exact models, however, are too restrictive in most real-life applications, where it often happens that two (or a small number of) cross-sectional items, being neighbours in some broad sense, exhibit cross-sectional correlation also in variables that are orthogonal, at all leads and lags, to all other observations throughout the panel.
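The X_it = χ_t + ξ_it example above, a common white noise plus item-specific AR(1) idiosyncratic terms, can be illustrated by simulation: the largest eigenvalue of the panel's covariance matrix diverges with n because of the common χ_t, while the autocorrelated ξ_it contribute only a bounded amount. This sketch is illustrative; the AR coefficients and sample size are assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 4000

def largest_cov_eigenvalue(n):
    """Panel X_it = chi_t + xi_it: a common white noise chi_t plus
    item-specific AR(1) idiosyncratic terms xi_it with coefficients a_i."""
    chi = rng.normal(size=T)
    a = rng.uniform(0.2, 0.8, size=n)
    eps = rng.normal(size=(T, n))
    xi = np.zeros((T, n))
    for t in range(1, T):
        xi[t] = a * xi[t - 1] + eps[t]
    X = chi[:, None] + xi
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))[-1]

# The largest eigenvalue grows roughly linearly in n: the white noise
# chi_t is "common", the autocorrelated xi_it are "idiosyncratic".
print(largest_cov_eigenvalue(20) < largest_cov_eigenvalue(200))
```

The eigenvalue behaviour, rather than "smallness" or whiteness, is what separates the two components, in line with the characterization adopted in the paper.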
A ‘‘weak’’ or ‘‘approximate factor model’’, allowing for mildly cross-sectionally correlated idiosyncratic components, has therefore also been proposed (Chamberlain, 1983; Chamberlain and Rothschild, 1983), in which, however, the common and idiosyncratic components are only asymptotically (as n → ∞) identified. Under its most general form, the characterization of idiosyncrasy in this weak factor model can be based on the behavior, as n → ∞, of the eigenvalues of the spectral density matrices of the unobservable idiosyncratic components, but also (Forni and Lippi, 2001) on the asymptotic behavior of the eigenvalues of the spectral density matrices of the observations themselves: see Section 2 for details. This general characterization is the one we adopt here.

Finally, once the common and idiosyncratic components are identified, two types of factor models can be found in the literature, depending on the way factors drive the common components. In static factor models, it is assumed that common components are of the form
χ_it = Σ_{l=1}^{q} b_il f_lt,   i = 1, …, n, t = 1, …, T,   (1)
that is, the χit ’s are driven by q factors f1t , . . . , fqt which are loaded instantaneously. This static approach is the one adopted by Chamberlain (1983), Chamberlain and Rothschild (1983), Stock and Watson (1989, 2002a,b, 2005), Bai and Ng (2002, 2007), and a large number of applied studies. The so-called general dynamic model decomposes common components into
χ_it = Σ_{l=1}^{q} b_il(L) u_lt,   i = 1, …, n, t = 1, …, T,   (2)
where u_1t, …, u_qt, the unobservable common shocks, are loaded via one-sided linear filters b_il(L). That ‘‘fully dynamic’’ approach (the terminology is not unified and the adjective ‘‘dynamic’’ is often used in an ambiguous way) goes back, under exact factor form, to Chamberlain (1983) and Chamberlain and Rothschild (1983), but was developed, mainly, by Forni et al. (2000, 2003, 2004, 2005, 2009), Forni and Lippi (2001), and Hallin and Liška (2007). The static model (1) clearly is a particular case of the general dynamic one (2). Its main advantage is simplicity. On the other hand, both models share the same assumption on the asymptotic
behavior of spectral eigenvalues—the plausibility of which is confirmed by empirical evidence. But the static model (1) places an additional and rather severe restriction on the data-generating process, while the dynamic one (2), as shown by Forni and Lippi (2001), does not—we refer to Section 2 for details. Moreover, the synchronization of clocks and calendars across the panel is often quite approximative, so that the concept of ‘‘instantaneous loading’’ itself may be questionable. Both the static and the general dynamic models are receiving increasing attention in finance and macroeconometric applications where information usually is scattered through a (very) large number n of interrelated time series (n values of the order of several hundreds, or even one thousand, are not uncommon). Classical multivariate time series techniques are totally helpless in the presence of such values of n, and factor model methods, to the best of our knowledge, are the only ones that can handle such datasets. In macroeconomics, factor models are used in business cycle analysis (Forni and Reichlin, 1998; Giannone et al., 2006), in the identification of economy-wide and global shocks, in the construction of indices and forecasts exploiting the information scattered in a huge number of interrelated series (Altissimo et al., 2001), in the monitoring of economic policy (Giannone et al., 2005), and in monetary policy applications (Bernanke and Boivin, 2003; Favero et al., 2005). In finance, factor models are at the heart of the extensions proposed by Chamberlain and Rothschild (1983) and Ingersol (1984) of the classical arbitrage pricing theory; they also have been considered in performance evaluation and risk measurement (Chapters 5 and 6 of Campbell et al., 1997), and in the statistical analysis of the structure of stock returns (Yao, 2008). Factor models in the recent years also generated a huge amount of applied work: see d’Agostino and Giannone (2005), Artis et al. 
(2005), Bruneau et al. (2007), Den Reijer (2005), Dreger and Schumacher (2004), Schumacher (2007), Nieuwenhuyzen (2004), Schneider and Spitzner (2004), Giannone and Matheson (2007), and Stock and Watson (2002b) for applications to data from UK, France, the Netherlands, Germany, Belgium, Austria, New Zealand, and the US, respectively; Altissimo et al. (2001), Angelini et al. (2001), Forni et al. (2003), and Marcellino et al. (2003) for the Euro area and Aiolfi et al. (2006) for South American data—to quote only a few. Dynamic factor models also have entered the practice of a number of economic and financial institutions, including several central banks and national statistical offices, who are using them in their current analysis and prediction of economic activity. A real time coincident indicator of the EURO area business cycle (EuroCOIN), based on Forni et al. (2000), is published monthly by the London-based Center for Economic Policy Research and the Banca d’Italia: see (http://www.cepr.org/data/EuroCOIN/). A similar index, based on the same methods, is established for the US economy by the Federal Reserve Bank of Chicago. 1.2. Dynamic factor models in the presence of blocks: outline of the paper Although heterogeneous, panel data very often are obtained by pooling together several ‘‘blocks’’ which themselves constitute ‘‘large’’ subpanels. In macroeconometrics, for instance, data typically are organized either by country or sectoral origin: the database which is used in the construction of EuroCOIN, the monthly indicator of the euro area business cycle published by CEPR, includes almost 1000 time series that cover six European countries and are organized into eleven subpanels including industrial production, producer prices, monetary aggregates, etc. Depending on the objectives of the analysis, such a panel could be divided into six blocks (one for each country), or into eleven blocks (one for each subpanel). 
When these blocks are large enough, several dynamic factor models can be considered and analyzed,
M. Hallin, R. Liška / Journal of Econometrics 163 (2011) 29–41
allowing for a refined analysis of interblock relations. In the simple two-block case, ‘‘marginal common factors’’ can be defined for each block, and need not coincide with the ‘‘joint common factors’’ resulting from pooling the two blocks. The objective of this paper is to provide a theoretical basis for that type of analysis. For simplicity, we start with the simple case of two blocks. We show (Section 2) how a factorization of the Hilbert space spanned by the n observed series leads to a decomposition of each of them into four mutually orthogonal (at all leads and lags) components: a strongly idiosyncratic one, a strongly common one, a weakly common one, and a weakly idiosyncratic one. In Sections 3 and 4, we show how projections onto appropriate subspaces provide consistent data-driven reconstructions of those various components. Section 5 is devoted to the general case of K ≥ 2 blocks, allowing for various decompositions of each observation into mutually orthogonal (at all leads and lags) components. The tools we use throughout are Brillinger’s theory of dynamic principal components, a key result (Proposition 2) by Forni et al. (2000), and the identification method developed by Hallin and Liška (2007). Proofs are concentrated in the Appendix. The potential of the method is briefly illustrated, in Section 6, with a panel of Industrial Production Index data for France and Germany (K = 2 blocks, q = 3 factors), then France, Germany and Italy (K = 3 blocks, q = 4 factors). Simple as it is, the analysis of that dataset reveals some striking facts. For instance, both Germany and Italy exhibit a ‘‘national common factor’’ which is idiosyncratic to the other two countries, while France’s common factors are included in the space spanned by Germany’s. The (estimated) percentages of explained variation associated with the various components also are quite illuminating: Germany, with 25% of common variation, is the ‘‘most common’’ of the three countries.
But it also is, with only 6.4% of its total variation, the ‘‘least strongly common’’ one. France has the highest proportion (82.4%) of marginal idiosyncratic variation, but also the highest proportions of strongly and weakly idiosyncratic variation (72.6% and 9.8%, respectively). We do not attempt here to provide an economic interpretation for such facts. Nor do we apply the method to a more sophisticated dataset.² But we feel that the simple application we are proposing provides sufficient evidence of the potential power of the method, both from a structural and from a quantitative point of view.³

2. The dynamic factor model in the presence of blocks

We throughout assume that all stochastic variables in this paper belong to the Hilbert space L²(Ω, F, P), where (Ω, F, P) is some given probability space. We will study two double-indexed sequences Y := {Y_it; i ∈ N, t ∈ Z} and Z := {Z_jt; j ∈ N, t ∈ Z} of observed random variables, where t stands for time and i, j are cross-sectional indices. Let Y_{n_y} := {Y_{n_y,t}; t ∈ Z} and Z_{n_z} := {Z_{n_z,t}; t ∈ Z} be the n_y- and n_z-dimensional subprocesses of Y and Z, respectively, where Y_{n_y,t} := (Y_{1t}, …, Y_{n_y t})′ and Z_{n_z,t} := (Z_{1t}, …, Z_{n_z t})′.
The following assumption is made throughout the paper.

Assumption A1. For all n, the vector process {X_{n,t}; t ∈ Z} is a zero-mean second-order stationary process.

Denoting by Σ_{y;n_y}(θ) and Σ_{z;n_z}(θ) the (n_y × n_y) and (n_z × n_z) spectral density matrices of Y_{n_y,t} and Z_{n_z,t}, and by Σ_{yz;n}(θ) = Σ′_{zy;n}(θ) their (n_y × n_z) cross-spectrum matrix, write

Σ_n(θ) =: ( Σ_{y;n_y}(θ)  Σ_{yz;n}(θ)
            Σ_{zy;n}(θ)  Σ_{z;n_z}(θ) ),  θ ∈ [−π, π],

for the (n × n) spectral density matrix of X_{n,t}, with elements σ_{i_1 i_2}(θ), σ_{j_1 j_2}(θ) or σ_{kk}(θ), k = 1, …, n, i_1, i_2 = 1, …, n_y, j_1, j_2 = 1, …, n_z. On these matrices, we make the following assumption.

Assumption A2. For any k ∈ N, there exists a real c_k > 0 such that σ_{kk}(θ) ≤ c_k for any θ ∈ [−π, π].

For any θ ∈ [−π, π], let λ_{y;n_y,i}(θ) be Σ_{y;n_y}(θ)’s i-th eigenvalue (in decreasing order of magnitude). The function θ ↦ λ_{y;n_y,i}(θ) is called Σ_{y;n_y}(θ)’s i-th dynamic eigenvalue. The notation θ ↦ λ_{z;n_z,j}(θ) and θ ↦ λ_{n,k}(θ) is used in an obvious way for the dynamic eigenvalues of Σ_{z;n_z}(θ) and Σ_n(θ), respectively. The corresponding dynamic eigenvectors, of dimensions (n_y × 1), (n_z × 1), and (n × 1), are denoted by p_{y;n_y,i}(θ), p_{z;n_z,j}(θ), and p_{n,k}(θ), respectively. Throughout, we repeatedly use the classical correspondence

M(L) := Σ_{s=−∞}^{∞} [ (1/2π) ∫_{−π}^{π} M(θ) e^{isθ} dθ ] L^s

between a matrix-valued function M(θ) and the filter M(L): for instance,

λ^{−1}_{1;k}(L) := Σ_{s=−∞}^{∞} [ (1/2π) ∫_{−π}^{π} λ^{−1}_{1;k}(θ) e^{isθ} dθ ] L^s,

Σ^{−1/2}(L) := Σ_{s=−∞}^{∞} [ (1/2π) ∫_{−π}^{π} Σ^{−1/2}(θ) e^{isθ} dθ ] L^s,

etc. Dynamic eigenvectors, in particular, can be expanded in Fourier series, e.g.

p_{n,k}(θ) = Σ_{s=−∞}^{∞} [ (1/2π) ∫_{−π}^{π} p_{n,k}(ω) e^{isω} dω ] e^{−isθ},

where the series on the right-hand side converge in quadratic mean, which in turn defines square-summable filters of the form

p_{n,k}(L) = Σ_{s=−∞}^{∞} [ (1/2π) ∫_{−π}^{π} p_{n,k}(ω) e^{isω} dω ] L^s.
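Numerically, this correspondence amounts to an inverse Fourier transform of the frequency response sampled on a grid. The scalar sketch below is our own illustration (the grid size and the example filter are arbitrary choices, not from the paper): it recovers the coefficients of m(L) from samples of m(θ).

```python
import numpy as np

# Sketch: recover the coefficients of a scalar filter m(L) = sum_s m_s L^s
# from its frequency response m(theta), using the correspondence
# m_s = (1/2pi) * integral_{-pi}^{pi} m(theta) e^{is theta} d theta,
# approximated by a Riemann sum on an equispaced frequency grid.

def filter_coeffs(m_of_theta, max_lag, n_grid=1024):
    """Approximate m_s for |s| <= max_lag from the function m_of_theta."""
    thetas = 2 * np.pi * np.arange(n_grid) / n_grid - np.pi
    values = m_of_theta(thetas)
    lags = np.arange(-max_lag, max_lag + 1)
    # (1/2pi) * sum over the grid of m(theta) e^{is theta} * (2pi / n_grid)
    coeffs = np.array([np.mean(values * np.exp(1j * s * thetas)) for s in lags])
    return lags, coeffs.real  # real for a symmetric response m(theta)

# Example: m(theta) = 1 + cos(theta) corresponds to m(L) = 1 + (L + L^{-1})/2.
lags, c = filter_coeffs(lambda th: 1 + np.cos(th), max_lag=2)
```

On the equispaced full-period grid the Riemann sum is exact for such trigonometric polynomials, so the recovered coefficients at lags −1, 0, 1 are 1/2, 1, 1/2, and zero elsewhere.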
Write X_{n,t} := (Y_{1t}, …, Y_{n_y t}, Z_{1t}, …, Z_{n_z t})′ =: (Y′_{n_y,t}, Z′_{n_z,t})′, with n := (n_y, n_z) and n := n_y + n_z. The Hilbert subspaces spanned by the processes Y, Z and X are denoted by H_y, H_z and H, respectively. Similarly, define the filters p_{y;n_y,i}(L) and p_{z;n_z,j}(L) from p_{y;n_y,i}(θ) and p_{z;n_z,j}(θ), respectively. On these dynamic eigenvalues, we make the following assumptions.
Assumption A3. For some q_y, q_z ∈ N,
(i) the q_y-th dynamic eigenvalue of Σ_{y;n_y}(θ), λ_{y;n_y,q_y}(θ), diverges as n_y → ∞, a.e. in [−π, π], while the (q_y + 1)-th one, λ_{y;n_y,q_y+1}(θ), is θ-a.e. bounded;
(ii) the q_z-th dynamic eigenvalue of Σ_{z;n_z}(θ), λ_{z;n_z,q_z}(θ), diverges as n_z → ∞, a.e. in [−π, π], while the (q_z + 1)-th one, λ_{z;n_z,q_z+1}(θ), is θ-a.e. bounded.

² Another application, of an entirely different nature and scope, is considered in Hallin et al. (2011).
³ A similar problem is considered in two recent working papers (Ng et al., 2008; Ng and Moench, in press), where the authors adopt a hierarchical model with ‘‘national shocks’’ that are pervasive throughout all blocks, while ‘‘regional shocks’’ are region-specific; their approach, however, resorts to the static factor model literature.
The following lemma shows that this behavior of the dynamic eigenvalues of the subpanel spectral matrices Σ_{y;n_y}(θ) and Σ_{z;n_z}(θ) entails a similar behavior for the dynamic eigenvalues λ_{n,k}(θ) of Σ_n(θ).

Lemma 1. Let Assumptions A1–A3 hold. Then, there exists q ∈ N, with max(q_y, q_z) ≤ q ≤ q_y + q_z, such that Σ_n(θ)’s q-th dynamic eigenvalue λ_{n,q}(θ) diverges as min(n_y, n_z) → ∞, a.e. in [−π, π], while the (q + 1)-th one, λ_{n,q+1}(θ), is θ-a.e. bounded.

Proof. See the Appendix.
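Lemma 1’s eigenvalue behavior is easy to visualize numerically. In the sketch below (our own population-level illustration at one fixed frequency, with invented loadings), the y-block loads one shock and the z-block two, one of them shared; so q_y = 1, q_z = 2, and the joint panel has q = 2 diverging eigenvalues, within the bounds max(q_y, q_z) ≤ q ≤ q_y + q_z of the lemma.

```python
import numpy as np

# Population-level illustration of Lemma 1 at a fixed frequency: the y-block
# loads on shock u1 only (q_y = 1) and the z-block on u1 and u2 (q_z = 2),
# so jointly q = 2. All loadings are invented for the demo.
rng = np.random.default_rng(0)

def joint_spectral_eigenvalues(n_y, n_z):
    b_y = np.hstack([rng.normal(size=(n_y, 1)), np.zeros((n_y, 1))])  # y: u1 only
    b_z = rng.normal(size=(n_z, 2))                                   # z: u1 and u2
    B = np.vstack([b_y, b_z])
    sigma_n = B @ B.T + np.eye(n_y + n_z)   # common part + unit idiosyncratic part
    return np.sort(np.linalg.eigvalsh(sigma_n))[::-1]

small = joint_spectral_eigenvalues(50, 50)
large = joint_spectral_eigenvalues(400, 400)
# the first two eigenvalues grow with the cross-section; the third stays bounded
```

Here the third eigenvalue is pinned at the idiosyncratic level (exactly 1, since the common part has rank 2), while the first two grow roughly linearly in the cross-sectional dimension.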
Theorem 2 in Forni and Lippi (2001) establishes that the behavior of dynamic eigenvalues described in Assumption A3 and Lemma 1 characterizes the existence of a dynamic factor representation. We say that a process X := {X_kt, k ∈ N, t ∈ Z} admits a dynamic factor representation with q factors if X_kt decomposes into a sum X_kt = χ_kt + ξ_kt, with

χ_kt := Σ_{l=1}^{q} b_kl(L) u_lt  and  b_kl(L) := Σ_{m=1}^{∞} b_klm L^m,  k ∈ N, t ∈ Z,

where
(i) the q-dimensional vector process {u_t := (u_1t u_2t … u_qt)′; t ∈ Z} is orthonormal white noise;
(ii) the (unobservable) n-dimensional processes {ξ_{n,t} := (ξ_1t ξ_2t ⋯ ξ_nt)′; t ∈ Z} are zero-mean stationary for any n, with (idiosyncrasy) θ-a.e. bounded (as n → ∞) dynamic eigenvalues;
(iii) ξ_{k,t_1} and u_{l,t_2} are mutually orthogonal for any k, l, t_1 and t_2;
(iv) the filters b_kl(L) are square summable: Σ_{m=1}^{∞} b²_klm < ∞ for all k ∈ N and l = 1, …, q, and
(v) q is minimal with respect to (i)–(iv).
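As a toy numerical instance of this representation (our own simulated specification with q = 1 and ad-hoc filter coefficients, not an example from the paper), one can generate a panel whose common components load a single white-noise shock through short moving-average filters and verify that, by the orthogonality in (iii), common and idiosyncratic variances add up.

```python
import numpy as np

# Simulated dynamic factor representation X_kt = chi_kt + xi_kt with q = 1:
# chi loads the single shock u_t through moving-average filters b_k(L)
# (three invented coefficients per series); xi is independent idiosyncratic
# noise, hence orthogonal to u at all leads and lags.
rng = np.random.default_rng(5)
n, T = 8, 100_000

u = rng.normal(size=T)                        # orthonormal white-noise shock
b = rng.normal(size=(n, 3))                   # filter coefficients b_k0, b_k1, b_k2
chi = sum(np.outer(b[:, m], np.roll(u, m)) for m in range(3))
xi = rng.normal(size=(n, T))                  # idiosyncratic components
x = chi + xi

# orthogonality of chi and xi implies the variances add (up to sampling error)
var_x = x.var(axis=1)
var_parts = chi.var(axis=1) + xi.var(axis=1)
```

The circular shift `np.roll` is used purely for convenience; with T this large the wrap-around has a negligible effect on the sample variances.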
The processes {u_lt; t ∈ Z}, l = 1, …, q, are called the common shocks or factors, the random variables ξ_kt and χ_kt the idiosyncratic and common components of X_kt, respectively. Actually, Forni and Lippi define idiosyncrasy via the behavior of dynamic aggregates, then show (their Theorem 1) that this definition is equivalent to the condition on dynamic eigenvalues given here. That result of Forni and Lippi (2001), along with Lemma 1, leads to the following proposition.

Proposition 1. Let Assumptions A1 and A2 hold. Then,
(a) Assumption A3(i) is satisfied iff the process Y has a dynamic factor representation (q_y factors; call them the (common) y-factors, spanning the y-common space H^χ_y, with orthogonal complement H^ξ_y)

Y_it = χ_{y;it} + ξ_{y;it} = Σ_{l=1}^{q_y} b_{y;il}(L) u_{y;lt} + ξ_{y;it},  i ∈ N, t ∈ Z;  (3)

(b) Assumption A3(ii) is satisfied iff the process Z has a dynamic factor representation (q_z factors; call them the (common) z-factors, spanning the z-common space H^χ_z, with orthogonal complement H^ξ_z)

Z_jt = χ_{z;jt} + ξ_{z;jt} = Σ_{l=1}^{q_z} b_{z;jl}(L) u_{z;lt} + ξ_{z;jt},  j ∈ N, t ∈ Z;  (4)

(c) Assumption A3 is satisfied iff the process X has a dynamic factor representation (q factors, with q as in Lemma 1; call them the joint common factors, spanning the joint common space H^χ_x, with orthogonal complement H^ξ_x)

X_kt = Y_it = χ_{xy;it} + ξ_{xy;it} = Σ_{l=1}^{q} b_{xy;il}(L) u_lt + ξ_{xy;it},  in case X_kt = Y_it,
X_kt = Z_jt = χ_{xz;jt} + ξ_{xz;jt} = Σ_{l=1}^{q} b_{xz;jl}(L) u_lt + ξ_{xz;jt},  in case X_kt = Z_jt.  (5)

All filters involved have square-summable coefficients.

Proof. The proof follows directly from the characterization theorem of Forni and Lippi (2001).

It follows that, under Assumption A3, the processes Y and Z admit two distinct decompositions each: the marginal factor models (a) and (b), with marginal common shocks u_{y;lt} (l = 1, …, q_y) and u_{z;lt} (l = 1, …, q_z), respectively, and the joint factor model (c), with joint common shocks u_lt (l = 1, …, q). This double representation allows for refining the factor decomposition. Call x-, y-, or z-idiosyncratic a process which is orthogonal (at all leads and lags) to the x-, y-, or z-factors (to H^χ_x, H^χ_y, or H^χ_z), respectively. Similarly, call x-, y-, or z-common any process belonging to H^χ_x, H^χ_y, or H^χ_z. The x-common components χ_{xy;it} and χ_{xz;jt} further decompose into

χ_{xy;it} = χ_{y;it} + ν_{y;it}  and  χ_{xz;jt} = χ_{z;jt} + ν_{z;jt},
where ν_{y;it} := ξ_{y;it} − ξ_{xy;it} is y-idiosyncratic, hence orthogonal to χ_{y;it}, and ν_{z;jt} := ξ_{z;jt} − ξ_{xz;jt} is z-idiosyncratic, hence orthogonal to χ_{z;jt}; since we also have ν_{y;it} = χ_{xy;it} − χ_{y;it}, where both χ_{xy;it} and χ_{y;it} are orthogonal to ξ_{xy;it}, it follows that ν_{y;it} and ξ_{xy;it} also are mutually orthogonal.

Define H^χ_{y∩z} := H^χ_y ∩ H^χ_z and q_{y∩z} := q_y + q_z − q: H^χ_{y∩z} is spanned by a q_{y∩z}-tuple of white noises which are both y- and z-common (in case q_y + q_z = q, H^χ_{y∩z} = {0}). Denoting by φ_{y;it} and φ_{z;jt} the projections of χ_{y;it} and χ_{z;jt} onto H^χ_{y∩z}, and by ψ_{y;it} and ψ_{z;jt} the corresponding residuals, we obtain the decompositions

Y_it = φ_{y;it} + ψ_{y;it} + ν_{y;it} + ξ_{xy;it}  (with χ_{y;it} = φ_{y;it} + ψ_{y;it}, χ_{xy;it} = χ_{y;it} + ν_{y;it}, and ξ_{y;it} = ν_{y;it} + ξ_{xy;it})

and

Z_jt = φ_{z;jt} + ψ_{z;jt} + ν_{z;jt} + ξ_{xz;jt}  (with, similarly, χ_{z;jt} = φ_{z;jt} + ψ_{z;jt}, χ_{xz;jt} = χ_{z;jt} + ν_{z;jt}, and ξ_{z;jt} = ν_{z;jt} + ξ_{xz;jt}),  i, j ∈ N, t ∈ Z,  (6)
of the original observations into four mutually orthogonal components. The φ_{y;it} and φ_{z;jt} components will be called strongly common, ξ_{xy;it} and ξ_{xz;jt} strongly idiosyncratic, ψ_{y;it} and ψ_{z;jt} weakly common, ν_{y;it} and ν_{z;jt} weakly idiosyncratic. These decompositions induce additive decompositions of the variances of the observations into a sum of four terms indicating the relative contributions of each component. In the following sections, we propose a procedure that provides consistent estimates of φ_{y;it}, ψ_{y;it}, ν_{y;it}, ξ_{xy;it} and φ_{z;jt}, ψ_{z;jt}, ν_{z;jt}, ξ_{xz;jt}, hence also of χ_{y;it}, ξ_{y;it}, χ_{z;jt}, and ξ_{z;jt}.

3. Identifying the factor structure; population results
Based on the n-dimensional vector process X_{n,t} = (Y′_{n_y,t}, Z′_{n_z,t})′, we first asymptotically identify φ_{y;it}, ψ_{y;it}, ν_{y;it}, φ_{z;jt}, ψ_{z;jt} and ν_{z;jt} as min(n_y, n_z) → ∞. More precisely, we show that, under the specified spectral structure, all those quantities can be consistently recovered from the finite-n subpanels {X_{n,t}} as min(n_y, n_z) → ∞.
3.1. Recovering the joint common and strongly idiosyncratic components

Under the joint factor model, Proposition 2 in Forni et al. (2000) provides X_{n,t}-measurable reconstructions – denoted by χ^n_{xy;it} and χ^n_{xz;jt}, respectively – of the joint common components χ_{xy;it} and χ_{xz;jt}, which converge in quadratic mean for any i, j and t, as min(n_y, n_z) → ∞; we are using the terminology ‘‘reconstruction’’ rather than ‘‘estimation’’ to emphasize that spectral densities here, unlike in Section 4, are assumed to be known.

Write M* for the adjoint (transposed, complex conjugate) of a matrix M. The scalar process {V_{n,kt} := p*_{n,k}(L) X_{n,t}; t ∈ Z}, the spectral density of which is λ_{n,k}(θ), will be called X_{n,t}’s (equivalently, Σ_n(θ)’s) k-th dynamic principal component, k = 1, …, n. The basic properties of dynamic principal components imply that {V_{n,k_1 t}} and {V_{n,k_2 t}}, for k_1 ≠ k_2, are mutually orthogonal at all leads and lags. Forni et al. (2000) show that the projections of Y_it and Z_jt onto the closed space spanned by the present, past and future values of V_{n,kt}, k = 1, …, q, yield the desired reconstructions of χ_{xy;it} and χ_{xz;jt}. They also provide (up to minor changes due to the fact that they consider row- rather than column-eigenvectors, as we do here) the explicit forms
χ^n_{xy;it} = K*_{y;n,i}(L) X_{n,t}  and  χ^n_{xz;jt} = K*_{z;n,j}(L) X_{n,t},  i = 1, …, n_y, j = 1, …, n_z,  (7)

with

K_{y;n,i}(L) := Σ_{k=1}^{q} p*_{n,k,i}(L) p_{n,k}(L)  and  K_{z;n,j}(L) := Σ_{k=1}^{q} p*_{n,k,j}(L) p_{n,k}(L),

where p_{n,k,i}(L) denotes the i-th component of p_{n,k}(L), i such that X_it belongs to the y-subpanel, and p_{n,k,j}(L) the j-th component of p_{n,k}(L), j such that X_jt belongs to the z-subpanel. We then can state a first consistency result.

Proposition 2. Let Assumptions A1–A3 hold. Then

lim_{min(n_y,n_z)→∞} χ^n_{xy;it} = χ_{xy;it}  and  lim_{min(n_y,n_z)→∞} χ^n_{xz;jt} = χ_{xz;jt}

in quadratic mean, for any i, j, and t.

Proof. The proof consists in applying Proposition 2 in Forni et al. (2000) to the joint panel.

It follows from (7) that χ^n_{xy;it} has variance

Var(χ^n_{xy;it}) = Σ_{k=1}^{q} ∫_{−π}^{π} |p_{n,k,i}(θ)|² λ_{n,k}(θ) dθ.

Averaging this variance over the subpanel produces a measure

(1/n_y) Σ_{i=1}^{n_y} Var(χ^n_{xy;it}) = (1/n_y) Σ_{i=1}^{n_y} Σ_{k=1}^{q} ∫_{−π}^{π} |p_{n,k,i}(θ)|² λ_{n,k}(θ) dθ

of the contribution of joint common factors in the variability of the y-subpanel. Dividing it by the averaged variance

(1/n_y) Σ_{i=1}^{n_y} Var(Y_it) = (1/n_y) Σ_{i=1}^{n_y} ∫_{−π}^{π} λ_{y;n_y,i}(θ) dθ

of the y-subpanel yields an evaluation

Σ_{i=1}^{n_y} Σ_{k=1}^{q} ∫_{−π}^{π} |p_{n,k,i}(θ)|² λ_{n,k}(θ) dθ / Σ_{i=1}^{n_y} ∫_{−π}^{π} λ_{y;n_y,i}(θ) dθ  (8)

of its ‘‘degree of commonness’’ within the joint panel. For the z-subpanel, this measure takes the form

Σ_{j=1}^{n_z} Σ_{k=1}^{q} ∫_{−π}^{π} |p_{n,k,j}(θ)|² λ_{n,k}(θ) dθ / Σ_{j=1}^{n_z} ∫_{−π}^{π} λ_{z;n_z,j}(θ) dθ.

As for the strongly idiosyncratic components ξ_{xy;it} and ξ_{xz;jt}, they are consistently recovered, as min(n_y, n_z) → ∞, by ξ^n_{xy;it} := Y_it − χ^n_{xy;it} and ξ^n_{xz;jt} := Z_jt − χ^n_{xz;jt}, respectively. In view of the mutual orthogonality of common and idiosyncratic components, the variance of ξ^n_{xy;it} writes

Var(ξ^n_{xy;it}) = Var(Y_it) − Σ_{k=1}^{q} ∫_{−π}^{π} |p_{n,k,i}(θ)|² λ_{n,k}(θ) dθ;

the complement to one of (8) therefore constitutes a measure of the ‘‘degree of idiosyncrasy’’ of the y-subpanel within the joint panel. Similar formulas hold for the strongly idiosyncratic component ξ^n_{xz;jt}.

3.2. Recovering the marginal common, marginal idiosyncratic, and weakly idiosyncratic components

If q_y = q, the marginal common and idiosyncratic components χ_{y;it} and ξ_{y;it} coincide with their joint counterparts χ_{xy;it} and ξ_{xy;it}, which were taken care of in the previous section. Assume therefore that q > q_y; the marginal and joint y-common spaces then do not coincide anymore. Applying to the y-subpanels the same type of technique as in Section 3.1, consistent reconstructions of the χ_{y;it}’s could be obtained from the spectral submatrices Σ_{y;n_y}(θ). Now, χ_{y;it} is also the common component of χ_{xy;it}, so that the same result can be obtained from factorizing the joint common spectral density matrices. As a reconstruction of χ_{y;it} we therefore consider the projection χ^n_{y;it} of χ^n_{xy;it} onto the space spanned by the first q_y dynamic principal components

V^n_{y;t} := (V^n_{y;1t}, …, V^n_{y;q_y t})′,  with V^n_{y;kt} := p*_{χ_{xy};n,k}(L) χ^n_{xy;t},  (9)

of the spectral density matrix Σ_{χ_{xy};n}(θ) of χ^n_{xy;t} := (χ^n_{xy;1t}, …, χ^n_{xy;n_y t})′; p_{χ_{xy};n,k}(θ) here denotes the dynamic eigenvector associated with Σ_{χ_{xy};n}(θ)’s k-th dynamic eigenvalue λ_{χ_{xy};n,k}(θ). This projection takes the form

χ^n_{y;it} = K*_{χ_{xy};n,i}(L) χ^n_{xy;t} = K*_{χ_{xy};n,i}(L) K*_{y;n,i}(L) X_{n,t},  (10)

with K_{χ_{xy};n,i}(L) := Σ_{k=1}^{q_y} p_{χ_{xy};n,k,i}(L) p_{χ_{xy};n,k}(L), where p_{χ_{xy};n,k,i}(L) stands for p_{χ_{xy};n,k}(L)’s i-th component. Similarly, the reconstruction χ^n_{z;jt} of χ_{z;jt} is, with obvious notation,

χ^n_{z;jt} = K*_{χ_{xz};n,j}(L) χ^n_{xz;t} = K*_{χ_{xz};n,j}(L) K*_{z;n,j}(L) X_{n,t}.

We then have a second consistency result.

Proposition 3. Let Assumptions A1–A3 hold. Then

lim_{min(n_y,n_z)→∞} χ^n_{y;it} = χ_{y;it}  and  lim_{min(n_y,n_z)→∞} χ^n_{z;jt} = χ_{z;jt}

in quadratic mean, for any i, j, and t.

Proof. The proof again is a direct application of Proposition 2 in Forni et al. (2000) to the y- and z-subpanels, respectively.

The variance of the reconstructed marginal y-common component χ^n_{y;it} is

Var(χ^n_{y;it}) = Σ_{k=1}^{q_y} ∫_{−π}^{π} |p_{χ_{xy};n,k,i}(θ)|² λ_{χ_{xy};n,k}(θ) dθ.
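The reconstructions of this section all rest on projecting onto leading dynamic principal components. A compact numerical sketch in the spirit of Forni et al. (2000) — not their exact procedure; the panel sizes, loadings, and the crude periodogram smoothing are all our own choices — transforms the panel to the frequency domain, projects each frequency component on the leading dynamic eigenvector of an estimated spectral density matrix, and transforms back.

```python
import numpy as np

# Sketch of a dynamic-PCA reconstruction of a common component (one factor,
# loaded with one lag). Assumed toy specification, not the paper's dataset.
rng = np.random.default_rng(1)
n, T, q = 20, 512, 1

u = rng.normal(size=T)                                # common shock
b0, b1 = rng.normal(size=n), rng.normal(size=n)
chi = np.outer(b0, u) + np.outer(b1, np.roll(u, 1))   # dynamic loading
x = chi + rng.normal(size=(n, T))                     # add idiosyncratic noise

dft = np.fft.fft(x, axis=1)                           # one column per frequency
# crude spectral estimate: smooth the periodogram over neighbouring frequencies
per = np.einsum('if,jf->ijf', dft, dft.conj()) / T
kernel = np.ones(9) / 9
spec = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='same'), 2, per)

chi_hat_dft = np.empty_like(dft)
for f in range(T):
    w, v = np.linalg.eigh(spec[:, :, f])              # ascending eigenvalues
    p = v[:, -q:]                                     # leading dynamic eigenvector(s)
    chi_hat_dft[:, f] = p @ (p.conj().T @ dft[:, f])  # project onto their span
chi_hat = np.fft.ifft(chi_hat_dft, axis=1).real

err_rec = np.mean((chi_hat - chi) ** 2)               # reconstruction error
err_raw = np.mean((x - chi) ** 2)                     # error of the raw panel
```

Because the projection discards the n − q non-common directions frequency by frequency, the reconstruction tracks the true common component far more closely than the raw observations do.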
The averaged variance explained by the y-common factors in the y-subpanel is thus

(1/n_y) Σ_{i=1}^{n_y} Var(χ^n_{y;it}) = (1/n_y) Σ_{i=1}^{n_y} Σ_{k=1}^{q_y} ∫_{−π}^{π} |p_{χ_{xy};n,k,i}(θ)|² λ_{χ_{xy};n,k}(θ) dθ = (1/n_y) Σ_{k=1}^{q_y} ∫_{−π}^{π} λ_{χ_{xy};n,k}(θ) dθ.  (11)

Similarly, the averaged variance explained by the z-common factors in the z-subpanel is

(1/n_z) Σ_{k=1}^{q_z} ∫_{−π}^{π} λ_{χ_{xz};n,k}(θ) dθ.  (12)

Consistent reconstructions of the marginal idiosyncratic components ξ_{y;it} and ξ_{z;jt} are straightforwardly obtained as ξ^n_{y;it} := Y_it − χ^n_{y;it} and ξ^n_{z;jt} := Z_jt − χ^n_{z;jt}, whereas the weakly idiosyncratic components ν_{y;it} and ν_{z;jt} can be recovered as ν^n_{y;it} := χ^n_{xy;it} − χ^n_{y;it} = ξ^n_{y;it} − ξ^n_{xy;it} and ν^n_{z;jt} := χ^n_{xz;jt} − χ^n_{z;jt} = ξ^n_{z;jt} − ξ^n_{xz;jt}, respectively. The averaged variance of weakly idiosyncratic components (or its ratio to Σ_{i=1}^{n_y} Var(Y_it)) can be interpreted as measuring the extent to which the z-common factors contribute to y-idiosyncratic variation. Clearly, since Var(Y_it) = Var(χ_{xy;it}) + Var(ξ_{xy;it}) and Var(χ_{xy;it}) = Var(χ_{y;it}) + Var(ν_{y;it}), we have

Var(ν_{y;it}) = Var(χ_{xy;it}) − Var(χ_{y;it}) = Var(ξ_{y;it}) − Var(ξ_{xy;it}).  (13)

In the finite-n panel, Var(ν_{y;it}) is estimated by

Var(ν^n_{y;it}) = Var(ξ^n_{y;it}) − Var(ξ^n_{xy;it}),

so that the evaluation of the z-common factors contribution to y-idiosyncratic variation is

(1/n_y) Σ_{i=1}^{n_y} Var(ν^n_{y;it}) = (1/n_y) Σ_{i=1}^{n_y} Σ_{k=1}^{q} ∫_{−π}^{π} |p_{n,k,i}(θ)|² λ_{n,k}(θ) dθ − (1/n_y) Σ_{k=1}^{q_y} ∫_{−π}^{π} λ_{χ_{xy};n,k}(θ) dθ.

Similar formulas hold for the z-subpanel.

3.3. Disentangling the strongly and weakly common components

By definition, φ_{y;it} is obtained as the projection of χ_{y;it} onto H^χ_{y∩z}, and ψ_{y;it} follows as the residual χ_{y;it} − φ_{y;it}. Unlike H^χ_y and H^χ_z, however, H^χ_{y∩z} is not characterized via an explicit sequence of orthonormal bases. The methods developed in the previous sections, thus, do not apply unless such a sequence can be computed first. This however requires some preparation: Proposition 4 is adapted from Theorem 8.3.1 in Brillinger (1981); Proposition 5, to the best of our knowledge, is new.

Proposition 4. Assume that the (r + s)-dimensional second-order mean-zero stationary process {(ζ′_t, η′_t)′; t ∈ Z} is such that the spectral density matrix Σ_{ηη}(θ) of η_t is nonsingular. Then the projection of ζ_t onto the closed space H_η spanned by {η_t; t ∈ Z} — that is, the r-tuple A*(L)η_t of square-summable linear combinations of the present, past and future of η_t minimizing E[(ζ_t − A*(L)η_t)(ζ_t − A*(L)η_t)′] — is Σ_{ζη}(L) Σ^{−1}_{ηη}(L) η_t, where Σ_{ζη}(θ) denotes the cross-spectrum of ζ_t and η_t.

Actually, Brillinger also requires (ζ′_t, η′_t)′ to have absolutely summable autocovariances, so that the filter Σ_{ζη}(L) Σ^{−1}_{ηη}(L) also is absolutely summable. We, however, do not need this here.

Next, let H_1, H_2 and H_12 be the Hilbert spaces spanned by {V_{1;t}; t ∈ Z}, {V_{2;t}; t ∈ Z}, and {(V′_{1;t}, V′_{2;t})′; t ∈ Z}, respectively, where V_{1;t} := (V_{1;1,t}, …, V_{1;q_1,t})′ is a q_1-tuple (resp. V_{2;t} := (V_{2;1,t}, …, V_{2;q_2,t})′ a q_2-tuple) of mutually orthogonal (at all leads and lags) nondegenerate stochastic processes: the dynamic dimensions of H_1 and H_2 are thus q_1 and q_2, respectively. Denoting by

Σ(θ) =: ( Σ_11(θ)  Σ_12(θ)
          Σ_21(θ)  Σ_22(θ) ),  θ ∈ [−π, π],

the spectral density matrix of (V′_{1;t}, V′_{2;t})′, with Σ_11(θ) = diag(λ_{1;1}(θ), …, λ_{1;q_1}(θ)) and Σ_22(θ) = diag(λ_{2;1}(θ), …, λ_{2;q_2}(θ)), assume that Σ(θ) has rank q_12 θ-a.e., so that H_12 has dynamic dimension q_12, and the intersection H_{1∩2} := H_1 ∩ H_2 dynamic dimension q_{1∩2} = q_1 + q_2 − q_12. We then have the following result.

Proposition 5. (i) The spectral density

Σ^{−1/2}_22(θ) Σ_21(θ) diag(λ^{−1}_{1;1}(θ), …, λ^{−1}_{1;q_1}(θ)) Σ_12(θ) Σ^{−1/2}_22(θ),  θ ∈ [−π, π],  (14)

of the q_2-dimensional process Σ^{−1/2}_22(L) Σ_21(L) Σ^{−1}_11(L) V_{1;t} has a maximal eigenvalue equal to one, with multiplicity q_{1∩2}.
(ii) Denoting by p_{1∩2;1}(θ), …, p_{1∩2;q_{1∩2}}(θ) an arbitrary orthonormal basis of the corresponding q_{1∩2}-dimensional eigenspace, the process {ϒ_t := (Υ_{1,t}, …, Υ_{q_{1∩2},t})′; t ∈ Z}, with

Υ_{k,t} := p*_{1∩2;k}(L) Σ^{−1/2}_22(L) V_{2;t},  k = 1, …, q_{1∩2},  (15)

provides an orthonormal basis for H_{1∩2}.

Proof. A random variable Υ ∈ H_2, that is, a variable of the form Υ = a*_Υ(L) V_{2;t} with a*_Υ(L) := (a_{Υ,1}(L), …, a_{Υ,q_2}(L)), belongs to H_{1∩2} iff it coincides with its projection onto the space H_1 spanned by the V_{1;t}’s. In view of Proposition 4, that projection is

a*_Υ(L) Σ_21(L) diag(λ^{−1}_{1;1}(L), …, λ^{−1}_{1;q_1}(L)) V_{1;t}.  (16)

The variance of a projection being less than or equal to the variance of the projected variable, the variance of (16) is less than or equal to the variance of Υ itself:

∫_{−π}^{π} a*_Υ(θ) Σ_21(θ) diag(λ^{−1}_{1;1}(θ), …, λ^{−1}_{1;q_1}(θ)) Σ_12(θ) a_Υ(θ) dθ
  = ∫_{−π}^{π} a*_Υ(θ) Σ^{1/2}_22(θ) Σ^{−1/2}_22(θ) Σ_21(θ) diag(λ^{−1}_{1;1}(θ), …, λ^{−1}_{1;q_1}(θ)) Σ_12(θ) Σ^{−1/2}_22(θ) Σ^{1/2}_22(θ) a_Υ(θ) dθ
  ≤ ∫_{−π}^{π} a*_Υ(θ) Σ_22(θ) a_Υ(θ) dθ,

irrespective of Σ^{1/2}_22(θ) a_Υ(θ). It follows that the spectral matrix (14) has eigenvalues less than or equal to one (θ-a.e.), and that Υ is in H_{1∩2} iff Σ^{1/2}_22(θ) a_Υ(θ) belongs to the eigenspace with eigenvalue one. The q_{1∩2} random variables Υ_{k,t} defined in (15) clearly satisfy that condition, and it is easy to check that the spectral density of {ϒ_t; t ∈ Z} moreover is the q_{1∩2} × q_{1∩2} identity matrix. The result follows.

The strongly common component φ_{y;it} is defined as the projection of χ_{y;it} onto H^χ_{y∩z}: recovering it as the projection φ^n_{y;it} of χ^n_{y;it} onto the intersection of H^χ_{y;n} (spanned by V^n_{y;t}) and H^χ_{z;n} (spanned by V^n_{z;t}) seems a natural idea. Proposition 5, with H_1 = H^χ_{y;n}
and H_2 = H^χ_{z;n}, hence q_1 = q_y, q_2 = q_z and q_{1∩2} = q_{y∩z}, provides an orthonormal basis for that intersection. More precisely, denote by

Σ^n_V(θ) := ( Σ^n_{V_y V_y}(θ)  Σ^n_{V_y V_z}(θ)
              Σ^n_{V_z V_y}(θ)  Σ^n_{V_z V_z}(θ) )

the spectrum of (V^{n′}_{y;t}, V^{n′}_{z;t})′. The matrix (14) here takes the form

[Σ^n_{V_z V_z}(θ)]^{−1/2} Σ^n_{V_z V_y}(θ) [Σ^n_{V_y V_y}(θ)]^{−1} Σ^n_{V_y V_z}(θ) [Σ^n_{V_z V_z}(θ)]^{−1/2};

denote by p_{y∩z;n,1}(θ), …, p_{y∩z;n,q_{y∩z}}(θ) its q_{y∩z} first eigenvectors. An orthonormal basis of H^χ_{y;n} ∩ H^χ_{z;n} is V^n_{y∩z;t} := (V^n_{y∩z;1,t}, …, V^n_{y∩z;q_{y∩z},t})′ where, in view of (15),

V^n_{y∩z;k,t} := p*_{y∩z;n,k}(L) [Σ^n_{V_z V_z}(L)]^{−1/2} V^n_{z;t},  k = 1, …, q_{y∩z}.

Since

χ^n_{y;it} = Σ_{k=1}^{q_y} p_{χ_{xy};n,k,i}(L) V^n_{y;k,t} = (p_{χ_{xy};n,1,i}(L), …, p_{χ_{xy};n,q_y,i}(L)) V^n_{y;t},

we first compute the projection onto H^χ_{y;n} ∩ H^χ_{z;n} of V^n_{y;t}. That projection is obtained by applying Proposition 4 to the (q_y + q_{y∩z})-dimensional random vector (V^{n′}_{y;t}, V^{n′}_{y∩z;t})′, with spectral density

( Σ^n_{V_y V_y}(θ)      Σ^n_{V_y V_{y∩z}}(θ)
  Σ^n_{V_{y∩z} V_y}(θ)  Σ^n_{V_{y∩z} V_{y∩z}}(θ) ) = ( diag(λ_{χ_{xy};n,1}(θ), …, λ_{χ_{xy};n,q_y}(θ))  Σ^n_{V_y V_{y∩z}}(θ)
                                                       Σ^n_{V_{y∩z} V_y}(θ)                          I_{q_{y∩z}×q_{y∩z}} ).

Letting

P_{χ_{xy};n}(θ) := (p_{χ_{xy};n,1}(θ), …, p_{χ_{xy};n,q_y}(θ)),
P_{χ_{xz};n}(θ) := (p_{χ_{xz};n,1}(θ), …, p_{χ_{xz};n,q_z}(θ)),
P_{n(y)}(θ) := (p_{n,1(y)}(θ), …, p_{n,q(y)}(θ)),
P_{n(z)}(θ) := (p_{n,1(z)}(θ), …, p_{n,q(z)}(θ)),
P_{y∩z;n}(θ) := (p_{y∩z;n,1}(θ), …, p_{y∩z;n,q_{y∩z}}(θ)),  and  P_n(L) := (p_{n,1}(L), …, p_{n,q}(L)),

with p_{n,k(y)}(θ) collecting the components p_{n,k,i}(θ) of p_{n,k}(θ) such that X_it belongs to the y-subpanel (resp. p_{n,k(z)}(θ) collecting the components p_{n,k,j}(θ) such that X_jt belongs to the z-subpanel), we have

Σ^n_{V_y V_{y∩z}}(θ) = P*_{χ_{xy};n}(θ) P_{n(y)}(θ) diag(λ_{n,1}(θ), …, λ_{n,q}(θ)) P*_{n(z)}(θ) P_{χ_{xz};n}(θ) P_{y∩z;n}(θ).

The desired projection of V^n_{y;t}, in view of Proposition 4, is Σ^n_{V_y V_{y∩z}}(L) V^n_{y∩z;t}; hence, the reconstructions we are proposing are, for the strongly common component φ_{y;it},

φ^n_{y;it} := (p_{χ_{xy};n,1,i}(L), …, p_{χ_{xy};n,q_y,i}(L)) Σ^n_{V_y V_{y∩z}}(L) V^n_{y∩z;t}
  = (p_{χ_{xy};n,1,i}(L), …, p_{χ_{xy};n,q_y,i}(L)) P*_{χ_{xy};n}(L) P_{n(y)}(L) diag(λ_{n,1}(L), …, λ_{n,q}(L)) P*_{n(z)}(L) P_{χ_{xz};n}(L) P_{y∩z;n}(L) P*_{y∩z;n}(L) diag(λ^{−1/2}_{χ_{xz};n,1}(L), …, λ^{−1/2}_{χ_{xz};n,q_z}(L)) V^n_{z;t} =: H*_{y;n,i}(L) V^n_{z;t},  (17)

and, for the weakly common one ψ_{y;it}, ψ^n_{y;it} := χ^n_{y;it} − φ^n_{y;it}. With obvious changes, we similarly define φ^n_{z;jt} and ψ^n_{z;jt}. Parallel with Propositions 2 and 3, we then have the following consistency result for φ^n_{y;it} and φ^n_{z;jt} (hence ψ^n_{y;it} and ψ^n_{z;jt}).

Proposition 6. Let Assumptions A1–A3 hold. Then

lim_{min(n_y,n_z)→∞} φ^n_{y;it} = φ_{y;it}  and  lim_{min(n_y,n_z)→∞} φ^n_{z;jt} = φ_{z;jt}

in quadratic mean, for any i, j, and t.

Proof. The proof still follows from Proposition 2 of Forni et al. (2000), and the fact that all spectral densities involved, for given n, are locally continuous functions of Σ_n(θ).

It follows from (17) that the reconstructed strongly common component φ^n_{y;it} has variance

Var(φ^n_{y;it}) = ∫_{−π}^{π} (p_{χ_{xy};n,1,i}(θ), …, p_{χ_{xy};n,q_y,i}(θ)) Σ^n_{V_y V_{y∩z}}(θ) Σ^n_{V_{y∩z} V_y}(θ) (p_{χ_{xy};n,1,i}(θ), …, p_{χ_{xy};n,q_y,i}(θ))′ dθ.

The average (1/n_y) Σ_{i=1}^{n_y} Var(φ^n_{y;it}) measures the contribution of the strongly common factors in the total variation of the y-subpanel. Similar quantities are easily computed for the z-subpanel.

4. Recovering the factor structure; estimation results

The previous section shows how all components of Y_it and Z_jt can be recovered asymptotically as min(n_y, n_z) → ∞, provided that the spectral density Σ_n and the numbers q, q_y, and q_z of factors are known. The estimates φ^n_{y;it}, ψ^n_{y;it} and ν^n_{y;it} all take the form of a filtered series of the observed process X_{n,t}. We have indeed

φ^n_{y;it} = H*_{y;n,i}(L) V^n_{z;t} = H*_{y;n,i}(L) P*_{χ_{xz};n}(L) P_{n(z)}(L) P*_n(L) X_{n,t} =: K*_{φ_y;n,i}(L) X_{n,t},

ψ^n_{y;it} = χ^n_{y;it} − φ^n_{y;it} = [K*_{χ_{xy};n,i}(L) K*_{y;n,i}(L) − K*_{φ_y;n,i}(L)] X_{n,t} =: K*_{ψ_y;n,i}(L) X_{n,t},  and

ν^n_{y;it} = χ^n_{xy;it} − χ^n_{y;it} = [K*_{y;n,i}(L) − K*_{χ_{xy};n,i}(L) K*_{y;n,i}(L)] X_{n,t} =: K*_{ν_y;n,i}(L) X_{n,t}.

These three filters all are functions of the spectral density matrix Σ_n(θ), which of course in practice is unknown, as we only observe a finite realization X^T_n := (X_{n,1}, X_{n,2}, …, X_{n,T}) of X_n. We therefore need an estimator Σ^T_n(θ) of Σ_n(θ), the consistency of which requires strengthening slightly Assumption A1 into the following Assumption A1′:

Assumption A1′. For all n, the vector process {X_{n,t}; t ∈ Z} admits a linear representation X_{n,t} = Σ_{k=−∞}^{∞} C_k ζ_{t−k}, where ζ_t is full-rank n-dimensional white noise with finite fourth-order moments, and the n × n matrices C_k = (C_{ij,k}) are such that Σ_{k=−∞}^{∞} |C_{ij,k}| |k|^{1/2} < ∞ for all i, j.

Under Assumption A1′, if Σ^T_n(θ), with elements σ^T_{n,ij}(θ), denotes any periodogram-smoothing or lag-window estimator of Σ_n(θ),

lim_{T→∞} P[ sup_{θ∈[−π,π]} |σ^T_{n,ij}(θ) − σ_{ij}(θ)| > ε ] = 0  for all n, i, j, and ε > 0

(see Forni et al., 2000).⁴ In Section 6, we consider the lag-window estimators

Σ^T_n(θ) := (1/2π) Σ_{k=−M_T}^{M_T} Γ^T_{nk} ω_k e^{−ikθ}  (18)

⁴ Actually, Forni et al. (2000) wrongly borrow the result from Brockwell and Davis (1987); a more appropriate reference is Robinson (1991), Theorem 2.1.
where 0Tnk is the sample covariance matrix of Xn,t and Xn,t −k and ωk := 1−|k|/(MT +1) are the weights corresponding to the Bartlett lag window of size MT . Consistency then is achieved provided that the following assumption holds:
The success of this identification method however also requires strengthening somewhat the assumptions; from now on, we reinforce Assumption A1′ into Assumption A1′′ and Assumptions A2 and A3 into Assumptions A2′ and A3′ :
Assumption B. MT → ∞, and MT T −1 → 0, as T → ∞.
Assumption A1′′ . Same as Assumption A1′ , but ∑∞(i) the convergence condition on the Cij,k ’s is uniform, supi,j∈N k=−∞ |Cij,k ||k|1/2 < ∞, and (ii) writing ci1 ,...,iℓ (k1 , . . . , kℓ−1 ) for the cumulant of order ℓ of Xi1 (t + k1 ), . . . , Xiℓ−1 (t + kℓ−1 ), Xiℓ (t ), for all 1 ≤ ℓ ≤ 4 ∑∞ ∑∞ and 1 ≤ j < ℓ, supi1 ,...,iℓ [ k1 =−∞ . . . kℓ−1 =−∞ |ci1 ,...,iℓ (k1 , . . . , kℓ−1 )|] < ∞.
For simplicity, we consider a single MT for the y- and z-subpanels. One also could base the estimation of 6y;ny , 6z ;nz and 6yz ;n on zy
y
three distinct bandwidth parameter values MT , MTz , and MT . A consistent estimator 6Tn (θ ) of 6n (θ ) however is not sufficient here. Deriving, from this estimator 6Tn (θ ), estimated versions KTφy ;n,i (L), KTψy ;n,i (L) and KTνy ;n,i (L), of the filters Kφy ;n,i (L), Kψy ;n,i (L) and Kνy ;n,i (L) indeed also requires an estimation of the numbers of factors q, qy and qz involved. The only method allowing for such estimation is the identification method developed in Hallin and Liška (2007), which we now briefly describe, with a few adjustments taking into account the notation of this paper. For a detailed description of the procedure, we refer to the section entitled ‘‘A practical guide to the selection of q’’ in Hallin and Liška (2007). The lag window method described in (18) provides estimations 6Tn (θl ) of the spectral density at frequencies θl := π l/(MT + 1/2) for l = −MT , . . . , MT . Based on these estimations, consider the information criterion
ICnT;c
(k) := log
n 1 −
MT −
1
n i=k+1 2MT + 1 l=−M T
+ kcp(n, T ),
λ (θl ) T ni
0 ≤ k ≤ qmax , c ∈ R+ 0 ,
where the penalty function p(n, T ) is o(1) while p o min(n,
MT2
−1/2 1/2
, MT
T
(19) −1
(n, T ) is
) as both n and T tend to infinity, and
qmax is some predetermined upper bound; the eigenvalues λTni (θl ) are those of 6Tn (θl ). Depending on c > 0, the estimated number of factors, for given n and T , is qTn;c := argmin ICnT;c (k). 0≤k≤qmax
Hallin and Liška (2007) prove that this q^T_{n;c} is consistent for any c > 0. "Optimal" tuning c* of c is then performed as follows. Consider a J-tuple of the form q^{T_j}_{n_j;c}, j = 1, …, J, where n_j = (n_{y;j}, n_{z;j}) with 0 < n_{y;1} < ⋯ < n_{y;J} = n_y, 0 < n_{z;1} < ⋯ < n_{z;J} = n_z, and 0 < T_1 ≤ ⋯ ≤ T_J = T. This J-tuple can be interpreted as a "history" of the identification procedure, and characterizes, for each c > 0, a sequence q^{T_j}_{n_j;c}, j = 1, …, J of estimated factor numbers. In order to keep a balanced representation of the two blocks, we only consider J-tuples along which n_{y;j}/n_{z;j} is as close as possible to n_y/n_z. The selection of c* is based on the inspection of two mappings, c ↦ q^T_{n;c} and c ↦ S_c, where

S_c² := J^{−1} Σ_{j=1}^{J} ( q^{T_j}_{n_j;c} − J^{−1} Σ_{j=1}^{J} q^{T_j}_{n_j;c} )²

measures the variability of q^{T_j}_{n_j;c} over the "history". For n and T large enough, S_c exhibits "stability intervals", that is, intervals of c values over which S_c = 0. The definition of S_c implies that c ↦ q^T_{n;c} is constant over such intervals. Starting in the neighbourhood of c = 0, a first stability interval (0, c₁⁺) corresponds to q^T_{n;c} = q_max; choose c* as any point in the next one, (c₂⁻, c₂⁺). The selected number of factors is then q^T_n = q^T_{n;c*}. The same method, applied to the y- and z-subpanels, yields estimators q^T_{n_y} and q^T_{n_z} of q_y and q_z; q^T_{n;yz} := q^T_{n_y} + q^T_{n_z} − q^T_n provides a consistent estimator of q_{yz}.
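The scan over c and the stability-interval rule can be sketched as follows. Here `q_of` is a user-supplied routine returning the IC-minimizing k for a given c and subsample; the grid handling and interval bookkeeping are our own illustrative choices, not the paper's.

```python
import numpy as np

def tune_c(q_of, c_grid, subsamples):
    """Sketch of the tuning of c described above: q_of(c, n_j, T_j) returns
    the IC-minimising number of factors for the j-th nested subsample, S_c^2
    is the variance of those estimates over the 'history', and c* is taken
    in the second stability interval (the first one corresponds to q_max).
    The paper's asymptotics guarantee, for n and T large enough, that at
    least two stability intervals exist."""
    qs = np.array([[q_of(c, nj, Tj) for nj, Tj in subsamples] for c in c_grid])
    S2 = qs.var(axis=1)                      # S_c^2 over the history
    stable = S2 == 0
    intervals, start = [], None
    for i, s in enumerate(stable):           # group consecutive stable c's
        if s and start is None:
            start = i
        if start is not None and (not s or i == len(stable) - 1):
            intervals.append((start, i if s else i - 1))
            start = None
    i_star = intervals[1][0]                 # any point of the 2nd interval
    return c_grid[i_star], int(qs[i_star, 0])
```

In practice `q_of` would wrap the spectral estimation and criterion (19) for each nested subpanel of the "history".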
Assumption A2′. The entries σ_ij(θ) of Σ_n(θ) (i) are bounded, uniformly in n and θ – that is, there exists a real c > 0 such that |σ_ij(θ)| ≤ c for any i, j ∈ N and θ ∈ [−π, π] – and (ii) have derivatives up to order two that are bounded, uniformly in n and θ – namely, there exists Q < ∞ such that sup_{i,j∈N} sup_θ |d^k σ_ij(θ)/dθ^k| ≤ Q, k = 0, 1, 2.
Assumption A3′. Same as Assumption A3, but moreover (i) λ_{y;n_y,q_y}(θ) and λ_{z;n_z,q_z}(θ) diverge at least linearly in n_y and n_z, respectively, that is, lim inf_{n_y→∞} inf_θ n_y^{−1} λ_{y;n_y,q_y}(θ) > 0 and lim inf_{n_z→∞} inf_θ n_z^{−1} λ_{z;n_z,q_z}(θ) > 0, and (ii) both n_y/n_z and n_z/n_y are O(1) as min(n_y, n_z) → ∞.
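The "at least linear" divergence in Assumption A3′ is the usual signature of pervasive factors. A toy static illustration (ours, not the paper's setup): for a covariance ΛΛ′ + I with an (n × q) loading matrix Λ of i.i.d. standard normal entries, the q leading eigenvalues grow linearly in n while the (q+1)-th stays at the noise level.

```python
import numpy as np

def leading_eigenvalues(n, q=2, seed=42):
    """Eigenvalues (decreasing) of the toy covariance Lambda Lambda' + I,
    illustrating the linear divergence postulated in Assumption A3'."""
    rng = np.random.default_rng(seed)
    Lam = rng.standard_normal((n, q))
    return np.sort(np.linalg.eigvalsh(Lam @ Lam.T + np.eye(n)))[::-1]

for n in (100, 200, 400):
    ev = leading_eigenvalues(n)
    # ev[0] and ev[1] scale with n; ev[2] remains at the noise level 1
    print(n, ev[1] / n, ev[2])
```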
This ‘‘at least linear’’ divergence assumption is also made in Hallin and Liška (2007), and can be considered as a form of crosssectional stability of the two panels under study. Once estimated values of the numbers q, qy and qz of factors are available, the estimated counterparts of of Kφy ;n,i (L), Kψy ;n,i (L) and Kνy ;n,i (L) are obtained by substituting 6Tn (θ ), qTn , qTny and qTnz for 6n (θ ), q, qy and qz in all definitions of Section 3, then truncating infinite sums as explained in Section B of Forni et al. (2000) (a truncation which depends on t, which explains the Tt Tt notation), yielding KTt φy ;n,i (L), Kψy ;n,i (L) and Kνy ;n,i (L). Parallel with Proposition 3 in Forni et al. (2000), we then have the following result. Proposition 7. Let Assumptions A1′′ , A2′ , A3′ and B hold. Then, for all ϵk > 0 and ηk > 0, k = 1, 2, 3, there exists N0 (ϵ1 , ϵ2 ,
∗ ≤ η1 , ϵ3 , η1 , η2 , η3 ) such that P KTt φy ;n,i (L)Xn,t − φy;it > ϵ1 Tt ∗ Tt ∗ P Kψy ;n,i (L)Xn,t − ψy;it > ϵ2 ≤ η2 , and P Kνy ;n,i (L)Xn,t − νy;it > ϵ3 ≤ η3 , for all t = tˇ(T ) satisfying a ≤ lim infT →∞ (tˇ(T )/T ) ≤ lim supT →∞ (tˇ(T )/T ) ≤ b, for some a, b such that 0 < a < b < 1, all n ≥ N0 and all T larger than some T0 (n, ϵ1 , ϵ2 , ϵ3 , η1 , η2 , η3 ). Proof. The proof consists in reproducing, for each projection involved in the reconstruction of φy;it , ψy;it and νy;it , the proof of Proposition 3 in Forni et al. (2000). Lengthy but obvious details are omitted. Consistent estimations of the various contributions to the total variance of each subpanel can be obtained either by substituting estimated spectral eigenvalues and eigenvectors for the exact ones in the formulas of Section 3, and replacing integrals with the corresponding finite sums over Fourier frequencies, or by computing the empirical variances of the estimated strongly and weakly common, strongly and weakly idiosyncratic components. 5. Dynamic factors in the presence of K blocks (K > 2) The ideas developed in the previous sections extend to the more general case of K > 2 blocks, with, however, rapidly increasing
M. Hallin, R. Liška / Journal of Econometrics 163 (2011) 29–41
complexity. Each subset {k1 , . . . , kℓ }, ℓ = 0, 1, . . . , K of {1, . . . , K } indeed characterizes a decomposition of H into mutually χ orthogonal common and idiosyncratic subspaces, H{k1 ,...,kℓ } and ξ
H{k1 ,...,kℓ } , say, leading to 2K distinct implementations of the Hallin–Liška and Forni et al. procedures. Instead of Yit and Zjt , denote all observations as Xk;it (i = 1, . . . , n; k = 1, . . . , K ), with the additional label k specifying χ that Xk;it belongs to block k. Projecting onto H{k1 ,...,kℓ } and ξ H{k1 ,...,kℓ }
yields the orthogonal decomposition (for simplicity, we are dropping the subscripts n and T ) Xk;it = χk;{k1 ,...,kℓ };it + ξk;{k1 ,...,kℓ };it of Xk;it into a {k1 , . . . , kℓ }-common and a {k1 , . . . , kℓ }idiosyncratic component; in particular, ξk;{1,...,K };it , as in Section 2, can be called strongly idiosyncratic. For {l1 , . . . , lm } ⊂ {k1 , . . . , kℓ } and k ∈ {k1 , . . . , kℓ } \ {l1 , . . . , lm }, call weakly idiosyncratic the difference
νk;{l1 ,...,lm },{k1 ,...,kℓ }\{l1 ,...,lm };it := ξk;{k1 ,...,kℓ }\{l1 ,...,lm };it − ξk;{k1 ,...,kℓ };it = χk;{k1 ,...,kℓ };it − χk;{k1 ,...,kℓ }\{l1 ,...,lm };it . χ
χ
ξ
Since, for A ⊂ B, HA is a subspace of HB and HB a subspace ξ
of HA , this weakly idiosyncratic component is orthogonal to both χk;{l1 ,...,lm };it and ξk;{k1 ,...,kℓ };it . Further projections can be performed, yielding components with various degrees of commonness/idiosyncrasy. We restrict to projections onto common spaces of the form χ
χ
Hl1 ∩···∩lm := Hl1
...
χ
Hlm
decomposing χk;{k1 ,...,kℓ };it , k ∈ {l1 , . . . , lm } ⊇ {k1 , . . . , kℓ }, into
φk;{l1 ∩···∩lm };it and ψk;{k1 ,...,kℓ },{l1 ∩···∩lm };it := χk;{k1 ,...,kℓ };it − φk;{l1 ∩···∩lm };it which, in analogy with the two-block case, we respectively call strongly and weakly common components. Projections onto χ (nondegenerate) subspaces of the form Hl1 ∩···∩lm can be obtained via the method described in Section 3.3. Sequences of projections onto decreasing sequences of H χ ’s yield decompositions of the original observations into sums of mutually orthogonal components, along with decompositions of their variances; these decompositions, however, depend on the sequence of projections adopted. In view of the rapidly increasing notational burden, we will not pursue any further with formal developments; an application for K = 3 is considered in Section 6.2. 6. Real data applications We applied our method to a dataset of monthly Industrial Production Indexes for France, Germany, and Italy, observed from January 1995 through December 2006. All data were preadjusted by taking a log-difference transformation (T = 143 throughout— one observation is lost due to differencing), then centered and normalized using their sample means and standard errors. In practice, some care has to be taken, however, due to the fact that, for finite n and T , the joint and marginal common spaces reconstructed from estimated spectral densities need not being nested. When defined from population spectral densities, the yχ common space Hy associated with the Yit ’s and the common space associated with the χxy;it ’s coincide, and are a subspace of the joint χ common space Hxy . When based on finite n and T estimations, those two spaces in general are distinct, and only the second one is χ a subspace of Hxy . In order to avoid ambiguities and inconsistencies in sums of squares, it is important, under finite n and T , to proceed with projections and spectral estimation in a sequence. For instance, one first should estimate the global spectrum, and
project the Y_it's onto the reconstruction of H^χ_{xy} based on that global spectral estimation. From those projections, χ^{n;T}_{xy;it}, say, one should then estimate the spectrum of the χ_{xy;it}'s, the dynamic principal components of which in turn yield the reconstruction of H^χ_y, etc. This sequence of projections and estimations is carefully described in the applications treated below.
6.1. A two-block analysis

First consider the data for France and Germany. Using X_{F;it} for the French observations Y_it and X_{G;jt} for the German ones Z_jt, we have n_y = n_F = 96, n_z = n_G = 114, hence n = n_{FG} = 210. Spectral densities were estimated from the pooled panel using a lag-window estimator of the form (18), with truncation parameter M_T = ⌊0.5√T⌋ = 5. Based on this estimation, we ran the Hallin and Liška (2007) identification method on the French and German subpanels, with sequences n_{F,j} = 96 − 2j, j = 1, …, 5 and n_{G,j} = 114 − 2j, j = 1, …, 5, respectively, then on the pooled panel, with sequence n_{FG,j} = 210 − 2j, j = 1, …, 8 and an "almost constant" proportion (96/210, 114/210) of French and German observations (namely, ⌈96 n_{FG,j}/210⌉ French observations and ⌊114 n_{FG,j}/210⌋ German ones). In all cases, we put T_j = T = 143 for all j. The range of c values, after some preliminary exploration, was taken as {0, 0.0002, 0.0004, …, 0.5}, and q_max was set to 10. In all cases, the panels were randomly ordered prior to the analysis. The penalty function was p(n, T) = (min(n, M_T², M_T^{−1/2} T^{1/2}))^{−1/2}.

The results are shown in Fig. 1, and very clearly conclude for q^T_{(n_F,n_G)} = 3 (for c ∈ [0.1798, 0.1894]), q^T_{n_F,F} = 2 (for c ∈ [0.2222, 0.2344]), and q^T_{n_G,G} = 3 (for c ∈ [0.2032, 0.2138]). Since q^T_{(n_F,n_G)} = max(q^T_{n_F,F}, q^T_{n_G,G}), this identification yields a block structure with 3 joint common factors, 3 German-common and 2 French-common factors, the French-common space being included in the German-common one. The French-common components thus are strongly common (no weakly common French components), whereas one German-common factor is French-idiosyncratic (no weakly idiosyncratic German components). Taking these facts into account, the factor analyses described in Sections 2 and 3 yield

(a) (global analysis) an analysis based on spectral estimation in the global panel (three factors), which decomposes the French observation X_{F;it} (resp. the German observation X_{G;jt}) into a strongly idiosyncratic ξ_{xF;it} (resp. ξ_{xG;jt} = ξ_{G;jt}) and a joint common χ_{xF;it} (resp. χ_{xG;jt} = χ_{G;jt}) component;

(b) (French block analysis) an analysis based on spectral estimation in the subpanel consisting of the χ_{xF;it}'s obtained under (a) (two factors), which decomposes the French joint common χ_{xF;it} into a French-common component χ_{F;it} (coinciding, since H^χ_F ⊂ H^χ_G, with the strongly common one φ_{F;it}) and a French-weakly-idiosyncratic one ν_{F;it}; the same projection decomposes the German joint common component χ_{xG;jt} = χ_{G;jt} into a strongly common φ_{G;jt} and a weakly common ψ_{G;jt} (we already know that ν_{G;jt} = 0).

The results, along with the corresponding percentages of explained variance, are provided in Fig. 2. For each of the four mutually orthogonal subspaces appearing in the decomposition, we provide the percentage of total variation explained in each country. The two strongly common (Franco-German) factors jointly account for only 3.5% of the German total variability, but for 14.6% of the French total variability. Germany's "all-German" (French-idiosyncratic) common factor explains 22.5% of Germany's total variance. Although French-idiosyncratic, that German factor nevertheless still accounts for 8.5% of the French total variability.
These estimated percentages of explained variation were obtained via estimated eigenvectors and eigenvalues.
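The variance accounting behind these percentages rests on the mutual orthogonality of the components: squared norms of orthogonal projections add up. A toy static check (plain projections stand in for the dynamic projections of Section 3; all names, dimensions, and the seed are illustrative):

```python
import numpy as np

# Nested subspaces mimic the block structure: a 3-dimensional 'joint common'
# space containing a 2-dimensional 'French-common' one. The strongly common,
# weakly common and idiosyncratic parts are mutually orthogonal, so their
# variance shares sum to one.
rng = np.random.default_rng(1)
T = 4000
x = rng.standard_normal(T)                         # one observed series
B3 = np.linalg.qr(rng.standard_normal((T, 3)))[0]  # 'joint common' space
B2 = B3[:, :2]                                     # nested 'French-common' space

chi = B3 @ (B3.T @ x)   # joint common component
phi = B2 @ (B2.T @ x)   # strongly common component
psi = chi - phi         # weakly common component
xi = x - chi            # strongly idiosyncratic component

ss = (x ** 2).sum()
shares = [(v ** 2).sum() / ss for v in (phi, psi, xi)]
print([round(s, 6) for s in shares], round(sum(shares), 6))
```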
Fig. 1. Identification of the numbers of factors for the France–Germany Industrial Production dataset. The three panels show the simultaneous plots of c ↦ S_c and c ↦ q^T_{c,n} needed for this identification: (a) and (b) in the marginal French and German subpanels, and (c) in the complete panel, respectively.
Fig. 2. Decomposition of the France–Germany panel data into four mutually orthogonal components, with the corresponding percentages of explained variation.

6.2. A three-block analysis

Next, consider the three-block case resulting from adding to the French and German data the corresponding Italian Industrial Production indices, with n_I = 91, yielding a panel with K = 3 blocks. The series length is still T = 143. Adapting the notation of Section 5, let X_{F;it}, X_{G;it} and X_{I;it} correspond to the French, the German, and the Italian subpanels, respectively. From the global panel (n = n_{FGI} = 301 series), we can extract six subpanels: the three panels we already analyzed in Section 6.1 (the two-block French–German panel, and the French and German one-block subpanels), one new one-block subpanel (the marginal Italian one, with n_I = 91), and two new two-block subpanels (the French–Italian one, with n_{FI} = 187, and the German–Italian one, with n_{GI} = 205). Analyzing these new subpanels along the same lines as in the previous section (with, in obvious notation, n_{I,j} = 91 − 2j, j = 1, …, 5, n_{GI,j} = 205 − 2j, j = 1, …, 8, n_{FI,j} = 187 − 2j, j = 1, …, 8, and n_{FGI,j} = 301 − 2j, j = 1, …, 15), still with M_T = ⌊0.5√T⌋ = 5, the same penalty function and the same q_max = 10 as before, we obtain the identification results shown in the four graphs of Fig. 3. These graphs again very clearly identify a total number of q^T_{n,FGI} = 4 joint common factors (for c ∈ [0.1710, 0.1718]), q^T_{(n_F,n_I),FI} = 3 (for c ∈ [0.1838, 0.1886]) French–Italian, and
q^T_{(n_G,n_I),GI} = 4 (for c ∈ [0.1786, 0.1800]) German–Italian marginal "binational" factors, and q^T_{n_I,I} = 2 (for c ∈ [0.2118, 0.2218]) marginal Italian factors.

Fig. 3. Identification of the numbers of factors for the French–German–Italian Industrial Production dataset. The four figures show the simultaneous plots of c ↦ S_c and c ↦ q^T_{c,n} needed for this identification: (a) for the marginal Italian subpanel, (b) and (c) for the France–Italy and Germany–Italy subpanels, and (d) for the complete three-country panel, respectively.

Along with the figures obtained in Section 6.1 for France and Germany, this implies that H^χ_F ⊂ H^χ_G, hence H^χ_F ∩ H^ξ_G = {0} = H^χ_{GI} ∩ H^ξ_F. The relations between those various (dynamic) dimensions are easily obtained; for instance, q_{(n_F,n_G),FG} = q_{n_F,F} + q_{n_G,G} − q_{(n_F,n_G)}, a relation we already used in Section 6.1, or, for the number of factors common to all three blocks,

q_{(n_F,n_G,n_I),F∩G∩I} = q_{n_F,F} + q_{n_G,G} + q_{n_I,I} − q_{(n_F,n_G),FG} − q_{(n_F,n_I),FI} − q_{(n_G,n_I),GI} + q_{(n_F,n_G,n_I),FGI}.

These relations imply that the three countries share one strongly common factor. As already noted, France (two factors) has no specific common factor, but one (the strongly common one) shared with Germany and Italy, and one shared with Germany alone. Both Italy (two factors) and Germany (three factors) have a "national" factor. Italy's "non-national" factor is the strongly common one; Germany's "non-national" factors are those shared with France, and include the strongly common one. The Italian and German "national" factors need not be mutually orthogonal. Proceeding with the various projections described in Section 5, we successively obtain

(a) (global analysis) a four-factor analysis based on spectral estimation in the global panel: projecting onto the resulting reconstructions of H^χ_{FGI} and H^ξ_{FGI} decomposes X_{F;it}, X_{G;it}, and X_{I;it} into their strongly idiosyncratic components ξ_{F;{FGI};it}, ξ_{G;{FGI};it}, and ξ_{I;{FGI};it}, with respective orthogonal complements χ_{F;{FGI};it}, χ_{G;{FGI};it}, and χ_{I;{FGI};it};

(b1) (French–German block analysis) a three-factor analysis based on spectral estimation in the subpanel consisting of the χ_{F;{FGI};it}'s and χ_{G;{FGI};it}'s obtained under (a): projection onto the resulting reconstruction of H^χ_{FG} = H^χ_G decomposes χ_{F;{FGI};it} into χ_{F;{FG};it} and ν_{F;{I},{FG};it}, and χ_{G;{FGI};it} into χ_{G;{FG};it} and ν_{G;{I},{FG};it}, respectively;

(b2) (French–Italian block analysis) a three-factor analysis, similar to that in (b1), based on spectral estimation in the subpanel consisting of the χ_{F;{FGI};it}'s and χ_{I;{FGI};it}'s obtained under (a): projection onto the resulting reconstruction of H^χ_{FI} decomposes χ_{I;{FGI};it} into χ_{I;{FI};it} and ν_{I;{G},{FI};it};

(c1) (French block analysis) a two-factor analysis, similar to step (b) in Section 6.1: projecting χ_{F;{FG};it} obtained in (b1) onto H^χ_F yields χ_{F;{F};it} = χ_{F;{F∩G};it} and ν_{F;{G},{FG};it}; for χ_{G;{FG};it}, the same projection actually coincides with a projection onto H^χ_{F∩G}, yielding φ_{G;{F∩G};it} and ψ_{G;{FG},{F∩G};it} = ψ_{G;{FG},{F};it};

(c2) (Italian block analysis) a two-factor analysis based on spectral estimation in the subpanel consisting of the χ_{I;{FI};it}'s obtained in step (b2): projecting them onto the resulting reconstruction of H^χ_I yields χ_{I;{I};it} and ν_{I;{FG},{I};it};

(d) (French-and-Italian block analysis) a final projection of χ_{F;{F};it} and φ_{G;{F∩G};it} obtained in (c1), and χ_{I;{I};it} obtained in (c2), onto H^χ_{F∩G∩I} = H^χ_{F∩I} yields the strongly common components φ_{F;{F∩G∩I};it}, φ_{G;{F∩G∩I};it} and φ_{I;{F∩G∩I};it}, along with the weakly common ones ψ_{F;{F},{F∩G∩I};it}, ψ_{G;{F},{F∩G∩I};it}, and ψ_{I;{I},{F∩G∩I};it}.

Conclusions are summarized in the diagram of Fig. 4, along with the various percentages of explained variance. Inspection of that diagram reveals that the three countries all exhibit a high percentage (about 71%) of strongly idiosyncratic variation. France and Italy, with about 9% of strongly common variation, are the "most strongly common" in the group of three.
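The factor-number bookkeeping used above can be checked with elementary inclusion and exclusion on the identified counts (a sketch: set-style cardinalities stand in for the dynamic dimensions, and the dictionary keys are our shorthand, not the paper's notation):

```python
# Identified numbers of factors: F, G, I are marginal counts; FG, FI, GI are
# pairwise joint panel counts; FGI is the global joint count.
q = {"F": 2, "G": 3, "I": 2, "FG": 3, "FI": 3, "GI": 4, "FGI": 4}

# factors shared by France and Germany (the relation used in Section 6.1):
shared_FG = q["F"] + q["G"] - q["FG"]

# factors shared by all three countries (the strongly common ones):
shared_FGI = (q["F"] + q["G"] + q["I"]
              - q["FG"] - q["FI"] - q["GI"] + q["FGI"])

print(shared_FG, shared_FGI)  # -> 2 1
```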
Acknowledgements Marc Hallin gratefully acknowledges the support of the Sonderforschungsbereich ‘‘Statistical modelling of nonlinear dynamic
processes’’ (SFB 823) of the Deutsche Forschungsgemeinschaft and a Discovery Grant of the Australian Research Council. Part of this work was completed while visiting the Economics Department at the European University Institute in Florence under a Fernand Braudel Fellowship. The authors are grateful to Christine De Mol, Richard Spady, Marco Lippi and Mario Forni for many stimulating discussions and helpful suggestions. They also thank Alexei Onatski for pointing out a flaw in an earlier stage of the manuscript, as well as Franz Palm, Jean-Pierre Urbain, and two anonymous referees for their comments on the original version of the paper. Appendix. Proof of Lemma 1
Denote by Θ̄_y the set (with Lebesgue measure zero) of θ values for which the divergence in Assumption A2(i) does not hold. Similarly define Θ̄_z, and let Θ̄ := Θ̄_y ∪ Θ̄_z; Θ̄ also has Lebesgue measure zero, and we write Θ for its complement [−π, π] \ Θ̄. Since Σ_{y;n_y}(θ) is a principal submatrix of Σ_n(θ), a classical result (see Corollary 1, page 293, in Lancaster and Tismenetsky, 1985) implies that, for any n = (n_y, n_z) and θ ∈ [−π, π], λ_{y;n_y,i}(θ) ≤ λ_{n,i}(θ), i = 1, …, n_y. Since λ_{y;n_y,q_y}(θ) diverges for all θ ∈ Θ as n_y → ∞, so does λ_{n,q_y}(θ). A similar result holds for the λ_{z;n_z,j}'s, so that, for all θ ∈ Θ, λ_{n,max(q_y,q_z)}(θ) diverges as min(n_y, n_z) → ∞. Note that the same result by Lancaster and Tismenetsky (1985) also implies that, for all θ and k, λ_{n,k}(θ) is a monotone nondecreasing function of both n_y and n_z and, therefore, either is bounded or goes to infinity as either n_y or n_z → ∞.

Next, let us show that λ_{n,q_y+q_z+1}(θ) is bounded as min(n_y, n_z) → ∞, for all θ ∈ Θ. For all θ ∈ Θ, consider the sequences of n-dimensional vectors ξ_n(θ) := (ξ′_{y;n_y}(θ), ξ′_{z;n_z}(θ))′ which are orthogonal to the q_y + q_z vectors (p′_{y;n_y,1}(θ), 0, …, 0)′, …, (p′_{y;n_y,q_y}(θ), 0, …, 0)′ and (0, …, 0, p′_{z;n_z,1}(θ))′, …, (0, …, 0, p′_{z;n_z,q_z}(θ))′. The collection of all such ξ_n's is a linear subspace Ξ_n(θ) of dimension at least n − q_y − q_z. For any such ξ_n(θ), in view of the orthogonality of ξ_{y;n_y}(θ) and p_{y;n_y,1}(θ), …, p_{y;n_y,q_y}(θ) (resp., of ξ_{z;n_z}(θ) and p_{z;n_z,1}(θ), …, p_{z;n_z,q_z}(θ)),
Fig. 4. Decomposition of the France–Germany–Italy panel data into eight components, with the corresponding percentages of explained variation.
‖ξ_n(θ)‖^{−2} ξ*_n(θ) Σ_n(θ) ξ_n(θ) = ‖ξ_n(θ)‖^{−2} ξ*_{y;n_y}(θ) Σ_{y;n_y}(θ) ξ_{y;n_y}(θ) + ‖ξ_n(θ)‖^{−2} ξ*_{z;n_z}(θ) Σ_{z;n_z}(θ) ξ_{z;n_z}(θ) + ‖ξ_n(θ)‖^{−2} ξ*_{y;n_y}(θ) Σ_{yz;n}(θ) ξ_{z;n_z}(θ) + ‖ξ_n(θ)‖^{−2} ξ*_{z;n_z}(θ) Σ_{zy;n}(θ) ξ_{y;n_y}(θ) ≤ 2(‖ξ_{y;n_y}(θ)‖^{−2} ξ*_{y;n_y}(θ) Σ_{y;n_y} ξ_{y;n_y}(θ) + ‖ξ_{z;n_z}(θ)‖^{−2} ξ*_{z;n_z}(θ) Σ_{z;n_z} ξ_{z;n_z}(θ)) ≤ 2(λ_{y;n_y,q_y+1}(θ) + λ_{z;n_z,q_z+1}(θ))

for all θ ∈ Θ and n = (n_y, n_z). Since λ_{y;n_y,q_y+1}(θ) and λ_{z;n_z,q_z+1}(θ) are bounded, for any θ ∈ Θ, as min(n_y, n_z) → ∞, so is ‖ξ_n(θ)‖^{−2} ξ*_n(θ) Σ_n(θ) ξ_n(θ). Hence, for all θ ∈ Θ and n = (n_y, n_z), Ξ_n(θ) (with dimension at least n − q_y − q_z) is orthogonal to any eigenvector associated with a diverging sequence of eigenvalues of Σ_n(θ). It follows that the number of such eigenvalues cannot exceed q_y + q_z. Summing up, for all θ ∈ Θ, the number of diverging eigenvalues of Σ_n(θ) is finite – denote it by q – and lies between max(q_y, q_z) and q_y + q_z, as was to be shown.

References

d'Agostino, A., Giannone, D., 2005. Comparing alternative predictors based on large-panel dynamic factor models. ECB Working Paper 680.
Aiolfi, M., Catão, L., Timmermann, A., 2006. Common factors in Latin America's business cycle. Working Paper 06/49. International Monetary Fund.
Altissimo, F., Bassanetti, A., Cristadoro, R., Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2001. A real time coincident indicator of the euro area business cycle. CEPR Discussion Paper 3108.
Angelini, E., Henry, J., Mestre, R., 2001. Diffusion index based inflation forecasts for the euro area. ECB Working Paper 61.
Artis, M., Banerjee, A., Marcellino, M., 2005. Factor forecasts for the UK. Journal of Forecasting 24, 279–298.
Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Bai, J., Ng, S., 2007. Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25, 52–60.
Bernanke, B.S., Boivin, J., 2003. Monetary policy in a data rich environment. Journal of Monetary Economics 50, 525–546.
Brillinger, D.R., 1981. Time Series: Data Analysis and Theory. Holden-Day, San Francisco.
Bruneau, C., De Bandt, A., Flageollet, A., Michaux, E., 2007. Forecasting inflation using economic indicators: the case of France. Journal of Forecasting 26, 1–22.
Campbell, J.Y., Lo, A.W., MacKinlay, A.C., 1997. The Econometrics of Financial Markets. Princeton University Press, Princeton.
Chamberlain, G., 1983. Funds, factors, and diversification in arbitrage pricing models. Econometrica 51, 1281–1304.
Chamberlain, G., Rothschild, M., 1983. Arbitrage, factor structure and mean-variance analysis in large asset markets. Econometrica 51, 1305–1324.
Den Reijer, A.H.J., 2005. Forecasting Dutch GDP using large scale factor models. DNB Working Paper 028. De Nederlandsche Bank, Research Department.
Dreger, C., Schumacher, C., 2004. Estimating large scale factor models for economic activity in Germany. In: Wagner, A. (Ed.), Jahrbücher für Nationalökonomie und Statistik. Lucius & Lucius Verlagsgesellschaft, Stuttgart, pp. 731–750.
Favero, C., Marcellino, M., Neglia, F., 2005. Principal components at work: empirical analysis of monetary policy with large datasets. Journal of Applied Econometrics 20, 603–620.
Forni, M., Lippi, M., 2001. The generalized factor model: representation theory. Econometric Theory 17, 1113–1141.
Forni, M., Reichlin, L., 1998. Let's get real: a factor analytical approach to disaggregated business cycle dynamics. Review of Economic Studies 65, 453–473.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalized dynamic factor model: identification and estimation. Review of Economics and Statistics 82, 540–554.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2003. Do financial variables help forecasting inflation and real activity in the euro area? Journal of Monetary Economics 50, 1243–1255.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2004. The generalized dynamic factor model: consistency and rates. Journal of Econometrics 119, 231–255.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2005. The generalized dynamic factor model: one-sided estimation and forecasting. Journal of the American Statistical Association 100, 830–840.
Forni, M., Giannone, D., Lippi, M., Reichlin, L., 2009. Opening the black box: structural factor models with large cross sections. Econometric Theory 25, 1319–1347.
Geweke, J., 1977. The dynamic factor analysis of economic time series. In: Aigner, D.J., Goldberger, A.S. (Eds.), Latent Variables in Socio-Economic Models. North-Holland, Amsterdam.
Giannone, D., Matheson, T., 2007. A new core inflation index for New Zealand. International Journal of Central Banking 3, 145–180.
Giannone, D., Reichlin, L., Sala, L., 2005. Monetary policy in real time. In: Gertler, M., Rogoff, K. (Eds.), NBER Macroeconomic Annual 2004. MIT Press, Cambridge, Mass, pp. 161–200.
Giannone, D., Reichlin, L., Sala, L., 2006. VARs, factor models and the empirical validation of equilibrium business cycle models. Journal of Econometrics 132, 257–279.
Hallin, M., Liška, R., 2007. Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association 102, 603–617.
Hallin, M., Mathias, Ch., Pirotte, H., Veredas, D., 2011. Market liquidity as dynamic factors. Journal of Econometrics 163, 42–50.
Ingersoll, J., 1984. Some results in the theory of arbitrage pricing. The Journal of Finance 39, 1021–1039.
Lancaster, P., Tismenetsky, M., 1985. The Theory of Matrices, 2nd ed. Academic Press, Orlando.
Marcellino, M., Stock, J., Watson, M.W., 2003. Macroeconomic forecasting in the euro area: country specific versus area wide information. European Economic Review 47, 1–18.
Ng, S., Moench, E., 2009. A hierarchical factor analysis of US housing market. Econometrics Journal (in press).
Ng, S., Moench, E., Potter, S., 2008. Dynamic hierarchical factor models. Mimeo. Columbia University.
Nieuwenhuyzen, C., 2004. A generalized dynamic factor model for the Belgian economy. Journal of Business Cycle Measurement and Analysis 2, 213–248.
Robinson, P., 1991. Automatic frequency domain inference on semiparametric and nonparametric problems. Econometrica 59, 1329–1363.
Sargent, T.J., Sims, C.A., 1977. Business cycle modelling without pretending to have too much a priori economic theory. In: Sims, C.A. (Ed.), New Methods in Business Research. Federal Reserve Bank of Minneapolis, Minneapolis.
Schneider, M., Spitzner, M., 2004. Forecasting Austrian GDP using the generalized dynamic factor model. Working Paper 89. Österreichische Nationalbank, Vienna.
Schumacher, C., 2007. Forecasting German GDP using alternative factor models based on large data sets. Journal of Forecasting 26, 271–302.
Stock, J.H., Watson, M.W., 1989. New indices of coincident and leading indicators. In: Blanchard, O.J., Fischer, S. (Eds.), NBER Macroeconomics Annual 1989. MIT Press, Cambridge, Mass.
Stock, J.H., Watson, M.W., 2002a. Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics 20, 147–162.
Stock, J.H., Watson, M.W., 2002b. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179.
Stock, J.H., Watson, M.W., 2005. Implications of dynamic factor models for VAR analysis. NBER Working Paper 11467.
Yao, T., 2008. Dynamic factors and the source of momentum profits. Journal of Business & Economic Statistics 26, 211–226.
Journal of Econometrics 163 (2011) 42–50
Market liquidity as dynamic factors

Marc Hallin a,b,c,d,h, Charles Mathias e,∗, Hugues Pirotte f,g, David Veredas e

a ECARES, Université libre de Bruxelles, Belgium
b ORFE, Princeton University, United States
c Académie Royale de Belgique
d CentER, University of Tilburg, The Netherlands
e ECARES, Solvay Brussels School of Economics and Management, Université libre de Bruxelles, Belgium
f Centre E. Bernheim, Solvay Brussels School of Economics and Management, Université libre de Bruxelles, Belgium
g Luxembourg School of Finance, University of Luxembourg, Luxembourg
h ECORE, Bruxelles and Louvain-la-Neuve, Belgium
Article info

Article history: Available online 12 November 2010

JEL classification: C33; C51; G10

Keywords: Commonality; Liquidity; Equities; Factor models; Block structure

Abstract

We use recent results on the Generalized Dynamic Factor Model (GDFM) with block structure to provide a data-driven definition of unobservable market liquidity and to assess the complementarity of two observed liquidity measures: daily close relative spreads and daily traded volumes for a sample of 426 S&P500 constituents recorded over the years 2004–2006. The advantage of defining market liquidity as a dynamic factor is that, contrary to other definitions, it tackles time dependence and commonness at the same time, without making any restrictive assumptions. Both relative spread and volume in the dataset under study appear to be driven by the same one-dimensional common shocks, which therefore naturally qualify as the unobservable market liquidity shocks. © 2010 Elsevier B.V. All rights reserved.
1. Introduction Liquidity is ubiquitous in financial practice and theory and has a pristine definition: an asset is liquid if it is easily convertible into cash, the reference asset with perfect liquidity. This definition is often rephrased in terms of time, volume, and cost. Indeed, when people think about liquidity, they may think about trading quickly, about trading large size, or about trading at low cost (Harris, 2003, p. 394). Since Kyle (1985), these three dimensions are well defined. The time dimension refers to resiliency—the speed with which pricing errors caused by uninformative order-flow shocks are corrected or neutralized in the market. Cost refers to tightness—the accepted price for immediacy in resolving the trade. Last, volume refers to depth—the volume that can be traded without price variations. See Minguet (2003), O’Hara (1998) or Schwartz (1993), among others, for further details.
Though the qualitative concept of liquidity is clear, its quantitative evaluation poses a major problem. Liquidity indeed is an unobserved variable, which implies that it has to be evaluated from the measurement of liquidity-related quantities or proxies, known as liquidity measures. But this is a delicate task because of the difficulty of (i) capturing the three dimensions of liquidity in a single measure and (ii) reaching a consensus on the liquidity measures to be taken into account. This double difficulty seriously challenges the objectivity of any final assessment. The simplest liquidity measures currently considered in the empirical literature cover only one of the three dimensions. Trade durations, for instance, defined as time intervals between two trades, clearly carry liquidity-related information, but only cover the time dimension, ignoring tightness and depth. Moreover, they require tick-by-tick data: at lower frequencies (such as daily frequency), trade durations cannot be computed, as observations are regularly spaced.1 Daily close or open bid–ask spreads, defined as the difference between the
∗ Corresponding address: Université Libre de Bruxelles, 50 Av F.D. Roosevelt CP139, B1050 Brussels, Belgium. Tel.: +32 26504133; fax: +32 26504475.
E-mail addresses: [email protected] (M. Hallin), [email protected] (C. Mathias), [email protected] (H. Pirotte), [email protected] (D. Veredas).
1 Other measures of resiliency are possible, though. Dong et al. (2007) estimate resiliency as the observed mean-reversion parameter in the stock’s pricing-error process (due to a shock in the order flow) via Kalman-filtering methods. The resulting estimate therefore is a model-based measure of resiliency that is not observed directly but a function of prices and volume (order flow).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.11.005
M. Hallin et al. / Journal of Econometrics 163 (2011) 42–50
lowest ask and highest bid prices for an asset at some given point in time, measure liquidity effects as well, but mainly cover tightness. Daily realized volumes also measure the liquidity effects of an asset, but only cover its transacted depth. A number of papers have proposed measures that combine tightness and depth, such as the Hasbrouck and Seppi (2001) quote slope, the Domowitz and Wang (2002) order book integral, the Amihud (2002) ratio of average volume effect on absolute returns, or the Pastor and Stambaugh (2003) measure of average volume-related return reversal. The former two consider the shape of the available order book through time, while the latter two rely on transacted prices and volumes. A common drawback of all those measures is that we do not know to what extent they capture liquidity dynamics only. To get a deeper understanding of liquidity, we should study the interrelations between the various liquidity measures and investigate their relation to liquidity. The literature on this subject is scarce, however, and most analyses are based on one single liquidity measure. The same comments apply to the analysis of market liquidity, where the aim is to understand commonness in liquidity across securities. Chordia et al. (2000) and Hasbrouck and Seppi (2001) find that a common or ‘‘market’’ component is significantly present in various liquidity measures taken over a large cross-section of stocks. Amihud (2002), Eckbo and Norli (2002), Pastor and Stambaugh (2003) or Acharya and Pedersen (2005) find that market liquidity explains price differences across assets. The influence of these articles should not be underestimated, as they all substantially increase our knowledge and understanding of the role of liquidity in financial markets. However, they still rely on the choice of a single liquidity measure, without much analysis of its net informative contribution in explaining the market liquidity phenomenon.
A first attempt to assess market liquidity based on several liquidity measures via principal component methods was made by Korajczyk and Sadka (2008); although based on an entirely different technical approach, our contribution is largely in the same spirit. None of those previous contributions, indeed, fully exploits the time series nature of the data. In particular, they all overlook the leading/lagging phenomena that may exist among the various liquidity measures, and that are particularly relevant here, since liquidity-related data are highly autocorrelated. Taking such time series features into account naturally brings the Generalized Dynamic Factor Model (henceforth GDFM) methods into the picture. These methods were developed in a series of papers by Forni et al. (2000, 2004, 2005), Forni and Lippi (2001), and Hallin and Liška (2007, 2011). They allow for disentangling commonness (market components) and idiosyncrasy (stock-specific components), not only across panels consisting of some given liquidity measure observed over a large number of stocks, but also across panels juxtaposing several such measures. Contrary to other dynamic factor methods (such as Stock and Watson, 2002a,b, or Bai and Ng, 2002), GDFM methods do not impose any restriction (beyond the usual assumptions of second-order stationarity, etc.) on the actual data-generating process. We examine here the complementarity of two simple and widely used liquidity measures, daily close relative bid–ask spread and daily realized dollar volume, for 426 S&P500 listed stocks from January 2004 to December 2006. This is a period characterized by a ‘‘steady’’ state of market liquidity, which is appropriate for the scope of this study.2 A period with extreme events and/or liquidity crunches, though important, would introduce distortions in the measurement of commonness; the analysis of such periods is left for future research. It should also be emphasized that our analysis is based exclusively
2 While allowing for serial auto- and cross-correlation, our methods indeed require second-order stationarity.
on observed market variables, and does not involve functions of those observed quantities, such as the Amihud (2002) or Pastor and Stambaugh (2003) measures. Contrary to volume and spread, which are primitive measures of liquidity, those measures, as well as those proposed by Dong et al. (2007), already are aggregates of prices and volumes, pursuing an objective similar to ours. Therefore, there is no point in incorporating them into our analysis, as this would only distort the overall picture. Our methodological tool throughout this paper is the method proposed by Hallin and Liška (2011) for the analysis of large panels with block structure, where the blocks represent the two subpanels of volume and relative spread, respectively. The Hallin and Liška method permits us to identify, estimate and compare the factors driving each subpanel and the factors driving the joint panel. In particular, it allows us to assess to what extent commonality in volume coincides with commonality in relative spread. As volume and relative spread cover different aspects of liquidity (depth and tightness, respectively), they are a priori unlikely to carry exactly the same information: it could be that some features of liquidity are explained by realized volume but not by relative spread, and conversely. In GDFM terms, this means that some common spread shocks might a priori be idiosyncratic to volume, and vice versa. Moreover, the GDFM analysis takes into account the dynamic interactions between the two measures: some liquidity features indeed may be leading in volume while lagging in spread, and vice versa. Our analysis admittedly does not include a third subpanel with a measure of resiliency. Trade durations would be natural candidates; being of a tick-by-tick nature, however, they do not match the daily sampling features of our panels. Alternatives are possible, such as daily trade intensity (daily number of trades) series, but these are highly correlated with volume.
The measure proposed by Dong et al. (2007) is another candidate, but it is model-based, hence not a primitive measure of liquidity. This issue clearly deserves further investigation. Our findings mainly go in three directions. First, it appears that the common relative spread and common volume spaces coincide, and have dynamic dimension one. This means that, although relative spread and volume cover different aspects of liquidity, their market or common components have the same origin, and thus carry the same information. Moreover, that common space being one-dimensional, it is driven by a unique shock, which therefore strongly qualifies as the unobservable market liquidity shock. This suggests some market homogeneity with respect to liquidity, with no distinct liquidity effects originating, for instance, from different sectors or different types of investors. Second, on average, market-related shocks account for 12% of the total variation of a stock’s relative spread and for 18% of the total variation of its volume. This may seem a rather low proportion, but it is not surprising when compared to the variance decompositions obtained in Hasbrouck and Seppi (2001) and Chordia et al. (2000), even though such comparisons call for caution, given that the databases and the nature of the measures differ. Third, we observe a significant difference between idiosyncratic spread and volume correlograms. On average, idiosyncratic relative spread components are only weakly autocorrelated, but they are persistent. By contrast, idiosyncratic volume components exhibit higher autocorrelations, with much faster decay. This difference can be explained by the fact that S&P500 constituents are highly traded stocks with thick limit order books. Relative spreads for such stocks do not move quickly, so that the impact of a market shock persists for a long time. Volumes, on the contrary, are more flexible, and their adaptation to changing market circumstances is much faster.
The outline of the article is as follows. Section 2 provides a short survey of the literature, and explains why the GDFM approach is more appropriate than existing ones. In Section 3, we present the building blocks of the GDFM. Section 4 describes the dataset and comments on the liquidity measures considered. The main results are presented in Section 5, and Section 6 concludes.
2. Commonness in liquidity

The analysis of liquidity is deeply rooted in market microstructure theory; its origins can be traced back to Kyle (1985) and Amihud and Mendelson (1986), among others. Most of the models developed from the market microstructure perspective focus on the liquidity of individual securities, and give little attention to the common determinants of liquidity across the market. Modern finance, on the other hand, emerged through the study of portfolio theory and the benefits of risk diversification, exploiting return volatility risk and its market component, but paying little attention to liquidity risk. The present article builds further on recent contributions to the identification of market liquidity. The main idea is that, under standard market illiquidity (i.e. illiquidity in ‘‘normal’’ market conditions), the liquidity of every asset can be seen as the sum of a common (hence non-diversifiable) and an idiosyncratic (hence diversifiable) component, in the same sense as for return volatility. The contributions and methodology of three leading articles in the identification of common versus idiosyncratic liquidity are discussed in the next paragraphs. Chordia et al. (2000) look at all NYSE transactions in 1992 and analyze commonality in daily percentage changes of five order-book-related liquidity measures (quoted and effective spread, both in absolute and in proportional terms, and quoted depth)3 by considering regression equations of the form

∆LIQ_t^i = β_i ∆LIQ_t^M + ξ_t^i,    (1)

where LIQ_t^i denotes the value taken by one of the five liquidity measures at time t for stock i, LIQ_t^M is the average liquidity over all stocks except i, and ∆ := 1 − L, where L denotes the lag operator. The first term on the right-hand side accounts for market-related variations. The second term, ξ_t^i = α_i + ε_t^i, includes the intercept of the regression, α_i, and a white noise term ε_t^i. The authors find that the β_i’s are significantly different from zero, which indicates the presence of common underlying determinants of liquidity (the authors perform two distinct analyses, based on unweighted and value-weighted averages, respectively). However, they obtain low R² values for all measures (about 4%). Hasbrouck and Seppi (2001) perform a similar analysis on four order-flow measures computed over 15-min intervals (share volume, dollar volume and square root of dollar volume, both signed and in absolute values) for 30 Dow Jones stocks traded during 1994. Instead of the differences ∆LIQ_t^i, they consider the levels LIQ_t^i of each liquidity measure. Moreover, they use principal component analysis instead of linear regressions, defining LIQ_t^M as the first principal component of the order flows of the 30 stocks:

LIQ_t^i = β_i LIQ_t^M + ξ_t^i.    (2)
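For concreteness, the two classical approaches in (1) and (2) can be sketched on simulated data as follows; this is an illustrative sketch under our own toy design, not the original authors' code, and all variable names and numbers are ours.

```python
# Illustrative sketch of Eqs. (1) and (2) on simulated data.
import numpy as np

rng = np.random.default_rng(0)
T, n = 500, 30
market = rng.standard_normal(T)                             # latent common shock
liq = 0.5 * market[:, None] + rng.standard_normal((T, n))   # panel LIQ_t^i

# Eq. (1), Chordia et al. (2000) style: regress Delta LIQ^i on the
# leave-one-out cross-sectional average Delta LIQ^M.
dliq = np.diff(liq, axis=0)
betas = []
for i in range(n):
    x = dliq[:, np.arange(n) != i].mean(axis=1)   # market average, excluding i
    X = np.column_stack([np.ones_like(x), x])     # intercept alpha_i + slope beta_i
    b, *_ = np.linalg.lstsq(X, dliq[:, i], rcond=None)
    betas.append(b[1])

# Eq. (2), Hasbrouck and Seppi (2001) style: LIQ^M_t as the first
# (static) principal component of the levels.
z = liq - liq.mean(axis=0)
_, s, _ = np.linalg.svd(z, full_matrices=False)
share_first_pc = s[0] ** 2 / (s ** 2).sum()       # variance share of PC1
```

With a genuine common shock in the simulated panel, the estimated slopes are positive and the first principal component carries far more than the 1/n share it would carry in a purely idiosyncratic panel.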
They find that, for signed dollar volume, 8% of the total variance is explained by the first common factor, and their empirical evidence suggests that the second and third common components could be negligible. Korajczyk and Sadka (2008) use the asymptotic principal component method of Connor and Korajczyk (1986) and the EM algorithm to identify market liquidity. In a very large sample (more than 4000 stocks followed over 18 years), they apply both methods, independently, to eight different liquidity measures: the Amihud (2002) monthly average effect of volume on absolute returns,
3 The quoted spread is the difference between the best ask and bid quotes. The effective spread is twice the observed deviation between the price at which the transaction took place and the midquote, which is the average of the highest bid and the lowest ask. Proportional (or relative) spreads are quoted spreads divided by the midquote. Quoted depth is the depth at the best quotes.
turnover (i.e. the ratio of monthly volume and shares outstanding), quoted percentage spread, effective percentage half-spread and four parametric estimates of price impact components. As in Hasbrouck and Seppi (2001), the authors consider up to the third principal component. They obtain very different R² values for the different measures: 7% for turnover versus 24% for effective spread, when looking at the first principal component. The major contribution of Korajczyk and Sadka (2008) is that, for their pricing experiment, they also compute common liquidity by including their eight different liquidity measures into a single joint panel, as they argue that this would eliminate some liquidity measurement bias. All these articles quite significantly contributed to the study and understanding of commonalities in liquidity. None of them, however, takes into account the time series nature of the various liquidity measures. Assuming the ξ_t^i's in (1) and (2) to be serially uncorrelated clearly is unrealistic; it requires, for instance, LIQ_t^M in (2) to account for all dynamic aspects of all LIQ_t^i's. Lagged influence of unobserved common liquidity factors is also precluded: LIQ_t^M is a purely static principal component which only depends on contemporaneous LIQ_t^i's, with an implicit and questionable assumption that the liquidity characteristics of all stocks are perfectly synchronized. The Generalized Dynamic Factor Model (GDFM) estimates commonality in a spirit which is somewhat similar to Hasbrouck and Seppi (2001) and Korajczyk and Sadka (2008), in the sense that it also seeks a variance-maximizing linear combination of observations. The main difference, however, is that, by allowing for lagged loadings (instead of contemporaneous ones) and autocorrelated idiosyncratics (rather than white noise residuals), it does not force any model on the data, while fully exploiting their time series nature.
In the present context, this means that the GDFM tackles persistence and co-movement at the same time, as it estimates the effect of common shocks. This also implies that all common components are orthogonal, at all leads and lags, to all the idiosyncratic ones, while allowing for mild cross-sectional correlation among the idiosyncratic components of distinct individual stocks (typically, the idiosyncratic component of a given stock may be cross-correlated with those of a finite number of other, closely related stocks). The latter is an attractive property in the study of liquidity, as it provides us with a clear and formal distinction between commonness and idiosyncrasy. Another advantage of the GDFM theory is that it identifies the dimension of the common space, as opposed to Hasbrouck and Seppi (2001) and Korajczyk and Sadka (2008), who look at the first three principal components without a rigorous criterion on whether they all significantly contribute to commonality. Finally, the GDFM method allows for a global analysis of (arbitrarily many) different liquidity measures. The next section describes in detail the essential features of the GDFM method.

3. Dynamic factors and commonness in liquidity

First consider a panel of n stocks, for which some liquidity measure has been recorded over a time period of length T. Denote by LIQ_t^i, i = 1, ..., n, t = 1, ..., T, the observation made at time t for stock i. These observations are treated as finite realizations of a double-indexed zero-mean second-order stationary stochastic process {LIQ_t^i : i ∈ N, t ∈ Z}. Both n and T are assumed to be large, and asymptotic statements are made as both n and T tend to infinity. Denote by Σ_n(θ) the n × n spectral density matrix of the n-dimensional vector process {LIQ_{n,t} := (LIQ_t^1, ..., LIQ_t^n)′ ; t ∈ Z}, and assume that, for all n ∈ N, k ∈ {1, ..., n} and some c_k > 0, sup_θ (Σ_n(θ))_kk ≤ c_k. For any θ ∈ [−π, π], let λ_{n,k}(θ) be Σ_n(θ)'s kth eigenvalue (in decreasing order of magnitude).
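As an illustration of the objects just defined, a minimal lag-window estimate of Σ_n(θ) and its eigenvalues can be sketched as follows; the simulated one-factor panel, the Bartlett window, and all tuning values are our illustrative choices, not the authors'.

```python
# Minimal lag-window estimate of the spectral density matrix Sigma_n(theta)
# and its eigenvalues, on a simulated one-factor panel.
import numpy as np

rng = np.random.default_rng(1)
T, n, M = 500, 20, 20                       # sample size, panel size, lag truncation
u = rng.standard_normal(T)                  # one common shock
x = np.outer(u, rng.uniform(0.5, 1.5, n)) + rng.standard_normal((T, n))
x = x - x.mean(axis=0)

def spectral_eigenvalues(x, M, thetas):
    """Eigenvalues (decreasing) of a Bartlett lag-window spectral estimate."""
    T, n = x.shape
    # sample autocovariances Gamma_k = (1/T) sum_t x_{t+k} x_t'
    gammas = {k: (x[k:].T @ x[:T - k]) / T for k in range(M + 1)}
    w = 1 - np.arange(M + 1) / (M + 1)      # Bartlett weights
    evals = []
    for th in thetas:
        S = gammas[0].astype(complex) / (2 * np.pi)
        for k in range(1, M + 1):
            # Gamma_{-k} = Gamma_k', so each pair of terms is Hermitian
            S += w[k] * (gammas[k] * np.exp(-1j * k * th)
                         + gammas[k].T * np.exp(1j * k * th)) / (2 * np.pi)
        evals.append(np.sort(np.linalg.eigvalsh(S))[::-1])
    return np.array(evals)                  # shape (len(thetas), n)

thetas = np.linspace(-np.pi, np.pi, 21)
lam = spectral_eigenvalues(x, M, thetas)
```

In this one-factor design, the first eigenvalue dominates at every frequency while the others stay bounded, which is the behavior the divergence criterion below exploits.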
Denote by q the number of such diverging eigenvalues, that is, define q as

q := min{k ∈ N : sup_n λ_{n,k}(θ) < ∞ θ-a.e.} − 1,

and assume that q < ∞. Theorem 2 in Forni and Lippi (2001) then establishes the existence of a unique decomposition of LIQ_t^i into

LIQ_t^i = χ_t^i + ξ_t^i = B_i′(L) u_t + ξ_t^i   for all i ∈ N, t ∈ Z,    (3)

where χ_t^i and ξ_t^i are mutually orthogonal at all leads and lags, u_t := (u_1t, ..., u_qt)′ is a q-dimensional orthonormal white noise, and B_i(L) := (B_i1(L), ..., B_iq(L))′ is a vector of square-summable filters (the decomposition into χ_t^i and ξ_t^i is unique; the filters B_i(L) and the u_t's are not). Eq. (3), with unspecified q, thus is not a statistical model, but a canonical representation of the panel under study, contrary to (1) or (2). That representation is called the dynamic factor representation of LIQ_t^i; the χ_t^i's are the common, and the ξ_t^i's the idiosyncratic components of LIQ_t^i. The process {χ_{i_0,t}} is cross-correlated with infinitely many liquidity measure processes {LIQ_t^i}, i ≠ i_0, as n → ∞, and therefore can be identified as the component of {LIQ_{i_0,t}} which is driven by the market, while ξ_{i_0,t} is specific to stock i_0 (market uncorrelated), and presents cross-correlations with a finite number of related cross-sectional processes only. The Hilbert space spanned by the χ_t^i's is called the common space. It has dynamic dimension q and its elements are market liquidity variables. The corresponding innovation process {v_t : t ∈ Z} (namely, any orthonormal white noise such that the Hilbert space it generates up to time s coincides with the Hilbert space generated up to time s by all the χ_t^i's, i ∈ N) is naturally interpreted as the process of market liquidity shocks. Forni et al. (2000) show how the common and idiosyncratic components χ_t^i and ξ_t^i can be consistently reconstructed from the observed LIQ_t^i's, along with estimators of their respective variances. The variance decomposition

Var[LIQ_t^i] = Var[χ_t^i] + Var[ξ_t^i]    (4)
for given i (because of stationarity, these variances do not depend on t) of course indicates how common or idiosyncratic the liquidity of a particular stock i is. The Hallin and Liška (2011) method for the analysis of panel data with block structure very much relies on the Hallin and Liška (2007) procedure for the identification of the number of dynamic factors. That identification procedure consists in tuning the penalty term of an information-theoretic criterion by a positive multiplicative constant c. A grid (n_ℓ, T_ℓ), ℓ = 1, ..., L, of increasing n and/or T values (with n_L = n and T_L = T) is considered, and, for each value of c and ℓ, a number q_ℓ(c) of factors is selected as the value of q ∈ N minimizing the information criterion with tuning constant c, computed from a panel consisting of the series 1, ..., n_ℓ observed over t = 1, ..., T_ℓ. A particular value c* is then chosen as the second smallest value of c for which the L selected q_ℓ(c)'s are stable across the (n_ℓ, T_ℓ) grid. In practice, this is achieved by examining a double plot. In the first one, the empirical variance V(c) of the L-tuple (q_1(c), ..., q_L(c)) is plotted against c; the second plot provides the corresponding final selection q_L(c) as a function of c. The number of factors ultimately selected then is q̂ := q_L(c*), where c* belongs to the second interval of c values for which the empirical variance V(c) takes value zero. Hallin and Liška provide several versions of their criterion, showing that they all yield essentially the same results. We therefore adopted their IC2 log criterion, together with their penalty p1(n, T); see Section 3.2 of Hallin and Liška (2007) for details. A major advantage of the GDFM in the analysis of market liquidity is that several liquidity measures can be handled, either jointly or separately, via the Hallin and Liška (2011) block structure methodology. The blocks here are the (sub)panels associated with a given liquidity measure, here, relative spread and volume.

Fig. 1. Schematic representation of the Hilbert space decomposition for two blocks.

The method, as we shall see, provides interesting insights into the interrelations between those two measures, and answers such questions as ‘‘do relative spread and volume convey the same information about market liquidity?’’, ‘‘should we choose one of them, or rather combine them?’’, ‘‘is there an optimal way to do so?’’, etc. For K = 2 blocks (in order to fix the ideas, call them relative spread and volume, respectively, and denote by SPR_n and VOL_n the corresponding subpanels), the method actually decomposes the Hilbert space spanned by all variables in the joint panel LIQ_n (consisting of all SPR_t^i's and VOL_t^i's, for i = 1, ..., n and t = 1, ..., T) into several Hilbert subspaces, spanned (for the sake of simplicity, we write ‘‘spread-common’’ instead of ‘‘relative-spread-common’’, etc.) by the spread- and volume-common, spread-common, and volume-common factors, and their respective orthogonal complements. Under special conditions (which are satisfied here in view of the fact that the total number of factors is only one), this decomposition yields four mutually orthogonal subspaces. Projecting SPR_t^i and VOL_t^i onto those four subspaces yields the following refinement of (3):

SPR_t^i = φ_SPR;t^i + ψ_SPR;t^i + ν_SPR;t^i + ξ_SPR;t^i,
VOL_t^i = φ_VOL;t^i + ψ_VOL;t^i + ν_VOL;t^i + ξ_VOL;t^i.    (5)
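The dimension bookkeeping behind (5) can be sketched as follows, under the simplifying assumption, which is ours and not a statement quoted from Hallin and Liška (2011), that the two block-common spaces are in general position, so that an inclusion-exclusion rule links their dynamic dimensions to that of the joint common space; the function name is also ours.

```python
# Hedged bookkeeping sketch for the two-block decomposition (5), assuming
# (our simplification) block-common spaces in general position.
def strongly_common_dim(q_spr, q_vol, q_joint):
    """Dynamic dimension of the intersection of the two block-common spaces."""
    if q_joint > q_spr + q_vol:
        raise ValueError("joint dimension cannot exceed the sum of block dimensions")
    return q_spr + q_vol - q_joint

# One factor per block plus one joint factor leaves a one-dimensional
# strongly common space, hence empty weakly common spaces.
assert strongly_common_dim(1, 1, 1) == 1
assert strongly_common_dim(1, 1, 2) == 0   # disjoint block-common spaces
```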
The φ_·;t^i's and ξ_·;t^i's are called the strongly common and strongly idiosyncratic, and the ψ_·;t^i's and ν_·;t^i's the weakly common and weakly idiosyncratic components, respectively; under strong block structure, they all are mutually orthogonal; see Hallin and Liška (2011) for details. This decomposition is shown in Fig. 1. Now, if (as will appear in Section 5) all spread-common but volume-idiosyncratic and all volume-common but spread-idiosyncratic components are zero, that is, if (5) boils down to

SPR_t^i = φ_SPR;t^i + ξ_SPR;t^i   and   VOL_t^i = φ_VOL;t^i + ξ_VOL;t^i,    (6)
then spread and volume are driven by the same common shocks, which unambiguously can be interpreted as the market liquidity shocks.

4. Data

We consider n = 426 S&P500 constituents that were listed from Monday January 5th, 2004, to Friday December 29th, 2006, and that were still listed in November 2008. This is a period characterized by ‘‘standard’’ market illiquidity, i.e. without extreme illiquidity conditions, which is appropriate for the scope of this article. From Reuters 3000 Xtra, we extracted, for each of these stocks, the daily close best ask, the daily close best bid, and the daily realized dollar volume, from which we constructed two liquidity measures. The first one is the relative spread, defined as SPR_t^i := (ask_t^i − bid_t^i)/mq_t^i, where ask_t^i (respectively bid_t^i) is the daily close
Table 1
Descriptive statistics.

                   Relative spread                    Volume
                   m_x̄    IQR_x̄   m_s²   IQR_s²      m_x̄    IQR_x̄   m_s²    IQR_s²
All stocks         0.96   0.35    0.58   0.29        2.59   2.07    6.70    2.23
[0, Q_0.25)        0.62   0.08    0.26   0.15        0.56   0.31    0.16    0.15
[Q_0.25, Q_0.5)    0.78   0.06    0.33   0.12        1.18   0.34    0.69    0.57
[Q_0.5, Q_0.75)    0.95   0.08    0.42   0.13        2.16   0.57    1.82    1.10
[Q_0.75, +∞)       1.46   0.45    1.26   0.74        6.14   3.60    18.51   15.54

Descriptive statistics for the relative spread and the volume, for all stocks and for subsets thereof classified by quantile ranges (subsequent rows); Q_α stands for the quantile of order α, and [Q_α1, Q_α2) for the set of stocks whose liquidity measure (either relative spread or volume) lies between the α1- and α2-quantiles. The column m_x̄ (respectively m_s²) shows the sample mean of the individual stock means (respectively variances), and the column IQR_x̄ (respectively IQR_s²) the interquartile range of the individual stock means (respectively variances).
best ask (respectively bid), and mq_t^i := (ask_t^i + bid_t^i)/2 is the midquote of stock i at day t. We denote the spread subpanel by SPR_n := {(SPR_t^1, ..., SPR_t^n)′; t = 1, ..., T}. The second measure is the realized dollar volume, denoted by VOL_t^i. The corresponding subpanel (or volume subpanel) is denoted by VOL_n, and the total panel by LIQ_n := (SPR_n′, VOL_n′)′. There are several reasons for choosing these two measures. First, both are simple and widely used in practice. Second, each of them covers a different dimension of liquidity: VOL_t^i is a proxy for depth and SPR_t^i a proxy for tightness. Third, they cover different aspects of the trading process: SPR_t^i is a pre-trading measure, conveying information about the state of the limit order book and the immediacy cost (measuring liquidity ex ante), whereas VOL_t^i is a post-trading measure, conveying information about the actual trade (measuring liquidity ex post). Prior to estimation, we applied several cleaning rules to the data. First, days on which trading for more than 80% of the stocks was suspended were eliminated from the analysis. Second, days on which at least one stock showed a negative spread were also eliminated. Third, missing spread or volume values were interpolated. In total, this leaves us with T = 747 observation dates. Finally, since relative spreads for S&P500 constituents are very small and traded dollar volumes very large, which may entail numerical problems, SPR_t^i is multiplied by 10³ and VOL_t^i by 10⁻⁶. We also checked for the presence of weekly seasonality. Results, available on request, indicate that less than 5% of the volume and spread series show significant weekly seasonal patterns. Table 1 shows descriptive statistics of the means (x̄) and variances (s²) of the liquidity measures per stock over the considered period.
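The construction and scaling of the two subpanels described above can be sketched as follows; the quote and volume paths are toy simulations, and all names are ours.

```python
# Sketch of the construction of the two subpanels: relative spread from
# close quotes and scaled dollar volume. Toy data for illustration.
import numpy as np

rng = np.random.default_rng(2)
T, n = 747, 5
bid = 40 + rng.standard_normal((T, n)).cumsum(axis=0) * 0.05
ask = bid + rng.uniform(0.01, 0.05, (T, n))   # positive spread by construction
dollar_volume = rng.uniform(1e6, 5e7, (T, n))

mq = (ask + bid) / 2                          # midquote mq_t^i
SPR = (ask - bid) / mq * 1e3                  # relative spread, scaled by 10^3
VOL = dollar_volume * 1e-6                    # dollar volume, scaled by 10^-6

# sanity check mirroring the cleaning rule in the text: no negative spreads
assert (ask - bid > 0).all()
LIQ = np.hstack([SPR, VOL])                   # joint panel LIQ_n, of size T x 2n
```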
Their cross-sectional means (denoted m_x̄ and m_s², respectively) and interquartile ranges (IQR_x̄ and IQR_s²) are computed for the complete collection of stocks (first row) and for subsets of them. The subsets are defined according to the 25%, 50% and 75% quantiles of relative spreads and volumes. So, for instance, row [Q_0.25, Q_0.5) shows the mean and interquartile range of the means and variances of the stocks whose average relative spread and volume (over the period under study) lie between the 25% and 50% quantiles. Two conclusions can be drawn from this table. First, stocks with large relative spread and volume (columns m_x̄) also have large variances (columns m_s²). Highly liquid stocks have tight spreads that do not move much, as the limit order book is very thick. However, since these are the stocks that are most closely scrutinized, they are the first to react if an event hits the market (such as bad macroeconomic news), and hence experience large changes in trading volume. Second, the interquartile ranges indicate that the stocks in the tails of the distributions of relative spread and volume show the largest cross-stock discrepancies in both the average and the variance of the liquidity measures. As far as this article is concerned, the question is whether, regardless of these differences, the stocks share a common component, and to what extent it drives the
relative spread and volume across all stocks. We provide an answer to this question in Section 5. The left plots in Fig. 2 show the evolution of the averaged relative spread (top) and volume (bottom) over the 426 stocks. Visual inspection does not suggest any violation of the assumption of second-order stationarity, although it clearly reveals some heteroskedasticity. Note that these plots show averaged values; the assumption underlying the GDFM is that all the relative spread and volume series in SPR_n and VOL_n are second-order stationary. We applied Phillips–Perron unit root tests to the spread and volume series. At the 1% significance level, the null hypothesis of a unit root was rejected for all of them (detailed results are available on request). The right plots of Fig. 2 show the autocorrelation functions of the averaged relative spread (top) and volume (bottom). They confirm the well-known stylized fact that liquidity time series are strongly autocorrelated.4 The averaged relative spread appears more persistent than the averaged volume, as its autocorrelations decay more slowly. In the spirit of Chordia et al. (2000), these averaged relative spread and volume series can be seen as estimates of market liquidity. Their strong and persistent autocorrelations support our claim that liquidity requires models that can handle time dependencies. The left plot in Fig. 3 presents a scatter plot of the contemporaneous average spread and volume series. It shows a cloud with no clear principal direction, indicating that relative spread and volume are not contemporaneously correlated. Correlations are found, however, when looking at lagged relationships. The right plot shows the cross-correlations (y-axis) at different orders (x-axis). Negative orders stand for the relation between lagged volume and lead relative spread.
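The diagnostics just described, autocorrelations of the averaged series and lagged spread-volume cross-correlations, can be sketched on simulated stand-ins; the AR designs, in which spread leads volume by construction, and all parameter values are ours.

```python
# Toy stand-ins for the averaged series: a persistent AR(1) "spread" and a
# "volume" that loads on LAGGED spread innovations (spread leads volume).
import numpy as np

rng = np.random.default_rng(3)
T = 747
e = rng.standard_normal(T)
spr = np.zeros(T)
vol = np.zeros(T)
for t in range(1, T):
    spr[t] = 0.9 * spr[t - 1] + e[t]
    vol[t] = 0.5 * vol[t - 1] + e[t - 1] + 0.3 * rng.standard_normal()

def xcorr(x, y, lag):
    """corr(x_t, y_{t+lag}); a positive lag means x leads y."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    if lag >= 0:
        return float(np.mean(x[: len(x) - lag] * y[lag:]))
    return xcorr(y, x, -lag)

acf_spr_1 = xcorr(spr, spr, 1)   # first-order autocorrelation of spread
lead = xcorr(spr, vol, 1)        # spread leading volume (positive order)
lag_ = xcorr(spr, vol, -1)       # volume leading spread (negative order)
```

In this design the positive-order cross-correlation dominates the negative-order one, reproducing the "spread leads volume" pattern described above.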
So, for instance, the cross-correlation of order −5 captures the relationship between volume five days ago and today's relative spread. We observe that, although there is some cross-correlation at the negative orders, the bulk of it is at the positive orders: relative spread leads volume. The conclusions drawn from these cross-correlations point in the same direction as the autocorrelograms of Fig. 2: liquidity panels exhibit important time series features that can be neither explained nor exploited via static factor models. In the next section, we show the results of a GDFM analysis.

5. Empirical results

As explained in Section 3, we use the Hallin and Liška (2007) information criterion to identify the number of factors, that is, the dynamic dimensions of the common spaces of the three (sub)panels SPR_n, VOL_n and LIQ_n. Identification is based on a visual inspection of the three double plots of Fig. 4. For each (sub)panel, the figure shows a measure (the variance V(c), dashed line) of the
4 The presence of long memory cannot be ruled out; long memory, however, is not incompatible with second-order stationarity.
Fig. 2. Averaged data and their autocorrelograms. Averaged (over all stocks) observed liquidity measures (left), along with their autocorrelograms (right).
instability of the selection associated with various values of the tuning constant c, along with the final selection associated with the same value of c (solid line). The procedure then consists in spotting the second interval (starting from the left) of c values over which the dashed line touches the horizontal axis (hence, V(c) = 0); the number of factors to be selected is then the corresponding value read on the solid line. Each of the three plots leads to the selection of one single factor. Together with Lemma 1 in Hallin and Liška (2011), this implies the existence of a unique strongly common factor driving the common components of both subpanels, thus yielding the particular case, described in (6), of empty weakly common and weakly idiosyncratic spaces; see Fig. 5. In a way, this result is in line with the empirical intuition provided by Hasbrouck and Seppi (2001), who argue that only the first principal component, in their principal component analysis approach, should be taken into account. Two important conclusions can be drawn from this result. First, SPR_n and VOL_n share the same common space, meaning that the shocks driving commonness in relative spread and commonness in volume are the same. This result supports the conjecture that, in a liquid market such as the S&P500, a single liquidity measure (either relative spread or volume) suffices to understand market liquidity dynamics. Second, this common space has dynamic dimension one, suggesting some homogeneity in the way markets respond to liquidity. It may be an indication that no market liquidity effects are originating from, for instance, different sectors or different types of investors, but only from the market itself. Such effects may exist; but then they only have an impact on idiosyncratic components, hence on a limited number of stocks.
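The reading of the double plot described above can be automated as in the following hedged sketch; the toy selection table and the function name are ours, not part of the Hallin and Liška procedure as published.

```python
# Sketch of the double-plot reading: given L selections q_1(c),...,q_L(c)
# per tuning constant c, locate the second interval where their variance
# V(c) is zero and return the selection there.
import numpy as np

def select_q(c_grid, q_table):
    """q_table: (len(c_grid), L) array of selected numbers of factors."""
    V = q_table.var(axis=1)                      # instability measure V(c)
    zero = V == 0
    runs, start = [], None                       # maximal runs of zero variance
    for i, z in enumerate(zero):
        if z and start is None:
            start = i
        if (not z or i == len(zero) - 1) and start is not None:
            runs.append((start, i if z else i - 1))
            start = None
    if len(runs) < 2:
        raise ValueError("no second stability interval found")
    i0, i1 = runs[1]                             # second stability interval
    mid = (i0 + i1) // 2                         # any c inside the interval works
    return int(q_table[mid, -1]), c_grid[mid]    # q_L(c*) on the full panel

# toy pattern: huge q for tiny c (negligible penalty), a stable plateau, then 0
c_grid = np.linspace(0.1, 2.0, 20)
q_table = np.array([[8, 8, 8]] * 3 + [[3, 2, 4]] * 2 +
                   [[1, 1, 1]] * 8 + [[1, 0, 1]] * 2 + [[0, 0, 0]] * 5)
q_hat, c_star = select_q(c_grid, q_table)
```

The first zero-variance run (where the penalty is negligible and every subpanel selects the maximal q) is skipped, which is exactly why the procedure targets the second stability interval.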
The innovation of the one-dimensional factor driving both the common spread and the common volume components therefore strongly qualifies as the unobservable market liquidity shock (Fig. 5).
We find that, on average, market liquidity accounts for 12% of the total variation of relative spread and for 18% of the total variation of volume, as shown in Fig. 5. These proportions are much larger than those of Chordia et al. (2000) (about 2%–4%), larger also than those of Hasbrouck and Seppi (2001) (about 8%–14%), and comparable to those of Korajczyk and Sadka (2008). These differences may stem from an array of reasons: different databases, different time frequencies, different liquidity measures, and different methods of extracting commonality in liquidity. Yet, our results support previous studies and offer an alternative based on a representation result, rather than on model assumptions, and on a rigorous methodology that leaves little room for subjectivity. Idiosyncratic components, on the other hand, account for 88% and 82% of the total variation of relative spread and volume, respectively. As mentioned earlier, this large proportion does not mean that relative spread and volume are very noisy measures. Since the GDFM allows for mild correlation among the idiosyncratic components of individual stocks, some groups of stocks may share liquidity drivers that are uncorrelated with market liquidity. Further, idiosyncratic terms in the GDFM may be autocorrelated, i.e. an idiosyncratic component at time t may contain information about its future values. All these percentages and variance decompositions are proportions averaged over the panel. Individual decompositions, however, may also reveal interesting features. The plots in Fig. 6 show the proportions explained by the common components for each individual stock (left plot for relative spread, right plot for volume). Stocks are ordered from the smallest to the largest relative spread and volume, respectively. Vertical lines divide the stocks according to the same quantile ranges as in Table 1. In that table we found significant differences for the stocks with the largest relative spread and volume.
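The variance decomposition quoted above (the share of each stock's total variation explained by its common component, averaged over the panel) can be illustrated on a simulated panel. This is a hedged sketch on artificial one-factor data, not the paper's S&P500 computation; all names and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 500, 20

# hypothetical panel: one common factor loaded by all series, plus noise
factor = rng.standard_normal(T)
loadings = rng.uniform(0.2, 0.6, size=N)
common = np.outer(factor, loadings)       # T x N common components
idio = rng.standard_normal((T, N))        # idiosyncratic components
panel = common + idio

# per-stock share of total variance explained by the common component,
# then the panel average -- the kind of proportion quoted in the text
shares = common.var(axis=0) / panel.var(axis=0)
print(round(float(shares.mean()), 2))
```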
Similar features are observed in Fig. 6. For stocks with average relative spread and volume smaller than the 75% quantile, the proportions of total variance explained by
M. Hallin et al. / Journal of Econometrics 163 (2011) 42–50
Fig. 3. Scatter plot and cross-correlogram. Left plot shows the scatter plot of relative spread against volume. Right plot shows the cross-correlogram of averaged relative spreads against averaged volumes. Negative lags correspond to cross-correlations between leading volumes and lagging relative spreads.
Fig. 4. Implementation of the Hallin–Liška identification method. Application of the Hallin and Liška (2007) information criterion in the identification of the dynamic dimensions of the common spaces of the various (sub)panels (top left: relative spreads; top right: volumes; bottom: joint panel), as described in Section 5. Dashed lines are a measure of the instability of the selection associated with various values of the tuning constant c. Solid lines are the final selected number of factors as a function of the tuning constant c.
market dynamics do not exhibit any clear pattern, even though they show significant differences among stocks. Beyond the 75% quantile, however, the market impact on total variances seems to increase with the value of the liquidity measure (this is very clear for volume, somewhat less so for relative spread, since there is also a large increase in variability). This fact may be explained by arguments similar to those in the discussion of Table 1. Relative spreads lying beyond the 75% quantile are more sensitive to market conditions, as their limit order books are thinner. A shock in market liquidity should imply a reaction in relative spread that is more important than for more liquid stocks. On the other hand, stocks with the largest volumes, the blue chips, are the driving forces of the market, which means that a shock in market liquidity entails a reaction in volume that is larger than for the less liquid stocks.
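Cross-sectional averaging, as used for the aggregated series in Figs. 2 and 3, can distort serial dependence: two series that merely lead and lag one another are individually white noise, yet their cross-sectional average is strongly autocorrelated. A toy demonstration (all data and helper names made up):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10_000
u = rng.standard_normal(T + 1)

# two white-noise series, the second lagging the first by one period
x1, x2 = u[1:], u[:-1]

def acf1(x):
    """Lag-one sample autocorrelation."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x @ x))

avg_of_acfs = 0.5 * (acf1(x1) + acf1(x2))  # ~ 0  : each series is white noise
acf_of_avg = acf1(0.5 * (x1 + x2))         # ~ 0.5: averaging creates persistence
print(avg_of_acfs, acf_of_avg)
```

The averaged series inherits dependence that no individual series has, which is exactly why averaged liquidity measures give a biased picture of market liquidity dynamics.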
Fig. 7 shows a plot of the cross-sectional average of relative spread (top left) and volume (top right) autocorrelations, along with the corresponding plots for their common (middle plots) and idiosyncratic (bottom plots) components. These plots reveal quite interesting differences between those components. First, note that the autocorrelations of averaged relative spreads, as shown in Fig. 3, strikingly differ from the averaged autocorrelations of the observed series and common components. Averaging all spreads and volumes indeed cancels out all leading/lagging dynamics that may exist between stocks, and therefore provides a biased picture of reality. These important differences, especially for relative spread, demonstrate the danger of using averaged relative spreads or volumes as a proxy for market liquidity, and, more particularly, the danger of defining market liquidity
Fig. 6. Variances of common components of individual stocks. Proportions of variance explained by the common components for all individual stocks and for relative spread (left plot) and volume (right plot). Stocks are ordered from the smallest relative spread (respectively volume) to the largest.
Fig. 5. Spaces, factors and variance decompositions.
shocks as the innovation of cross-sectionally averaged spread or volume series. Second, by the fundamental property of the GDFM that common and idiosyncratic components are mutually orthogonal at all leads and lags, the average autocorrelations of the observed series are a linear combination of the common and idiosyncratic average autocorrelations, with coefficients given by the variance ratios. As a consequence of the idiosyncratic predominance in the variance decompositions, the autocorrelation profiles of the observed series look closer to those of the corresponding idiosyncratic components than to those of the common ones. Third, the autocorrelations of the common components look very different for relative spread and volume. This indicates that the ways market liquidity shocks are transferred to relative spread and to volume also are quite different. While the impact of a market liquidity shock on volume is instantaneously very significant, it also vanishes relatively fast. By contrast, the same market liquidity shock has a rather weak impact on relative spread, but that impact decays very slowly with time. Finally, observe that idiosyncratic volume components are much more autocorrelated than the idiosyncratic relative spread ones. This indicates that the observed persistence in relative spread almost entirely originates in market dynamics, whereas serial autocorrelations for observed volume clearly have both market-wide and idiosyncratic origins. A possible explanation is that, while the relative spread essentially is a bounded variable, less dependent on stock specificities, trading volumes are clearly connected to the size of the firms, so that huge cross-sectional magnitude discrepancies may exist.
Summing up, although relative spread and volume provide equivalent characterizations of market liquidity as dynamic factors, one should not conclude that they constitute equivalent liquidity measures, since the ways they react to market liquidity shocks are drastically different.
6. Conclusions
The GDFM presents a number of advantages in the identification, analysis and forecasting of market liquidity dynamics. First, unlike its competitors, it is based on a general representation result which is free of restrictive model assumptions (Forni and Lippi, 2001), yielding a data-driven characterization of market liquidity. Second, it tackles co-movement and time dependence, two stylized facts of liquidity time series. Third, it provides a clear distinction between commonness and idiosyncrasy. Fourth, it allows us to estimate the dimension of the common space. Finally, it allows identifying commonality across different liquidity measures within a global analysis. An application of the GDFM to panels of relative spreads and volumes suggests that these two liquidity-related quantities actually convey the same information about market liquidity. The one-dimensional common shocks driving these two panels therefore strongly qualify as the unobservable market liquidity shocks. Such results, of course, call for more extensive and detailed empirical investigations, involving larger databases and further liquidity measures. Extensions of the present paper go hand in hand with further developments in the theory of the GDFM. For instance, a most attractive research direction would consist in a comparative study of stocks and liquidity measures based on their dynamic factor loading filters. These filters indeed characterize the way stocks and liquidity measures react to market liquidity shocks. A clean identification of the impact of liquidity shocks on various liquidity measures would also lead to a better assessment of the links between liquidity and asset pricing, and a better analysis of the macroeconomic drivers of liquidity. Uncertainty in liquidity could be dealt with by applying the Dynamic Factor GARCH model introduced by Alessi et al. (2009),5 while the Eichler et al.
(2011) extension of the GDFM to non-stationary time series opens the door, for instance, to a better understanding of financial crises. Last, an empirical analysis of the commonness in liquidity between highly and lowly capitalized stocks is worth undertaking.
Acknowledgements
We are grateful to Marco Lippi, Roman Liška, Roberto Pascual, Gonzalo Rubio, two anonymous referees, the editors Franz Palm and Jean-Pierre Urbain, and the participants of several conferences (XVII Foro de Finanzas, FERC Conference on Individual Decision Making, High Frequency Econometrics and Limit Order
5 Note, however, that GARCH effects are, by definition, a conditional second-order moment phenomenon. Our analysis is based on unconditional second-order moments.
Fig. 7. Averaged autocorrelograms: Observed, Common and Idiosyncratic. Averaged (over all stocks) autocorrelograms of observed liquidity measures (left), their common (middle) and idiosyncratic (right) components; to be contrasted with Fig. 3, where autocorrelograms of averaged (over all stocks) liquidity measures, common and idiosyncratic components, are shown.
Book Dynamics, 3rd Brussels–Waseda workshop on Time Series and Financial Statistics, 6th Conference on Computational Management Science), and seminars (Columbia University, Universidad de Alicante, HEC-Liège) for useful suggestions and comments on a previous version of this article. Marc Hallin acknowledges the financial support of the Sonderforschungsbereich ''Statistical modelling of nonlinear dynamic processes'' (SFB 823) of the Deutsche Forschungsgemeinschaft, and a Discovery Grant of the Australian Research Council. David Veredas acknowledges the financial support of the IAP P6/07 contract, from the IAP program (Belgian Federal Scientific Policy) ''Economic policy and finance in the global economy''. Charles Mathias, Marc Hallin and David Veredas are also members of ECORE, the association between CORE and ECARES. Any remaining errors and inaccuracies are ours.
References
Acharya, V.V., Pedersen, L.H., 2005. Asset pricing with liquidity risk. Journal of Financial Economics 77, 375–410.
Alessi, L., Barigozzi, M., Capasso, M., 2009. Estimation and forecasting in large datasets with conditionally heteroskedastic dynamic common factors. ECB Working Paper 1115.
Amihud, Y., 2002. Illiquidity and stock returns: cross-section and time-series effects. Journal of Financial Markets 5, 31–56.
Amihud, Y., Mendelson, H., 1986. Asset pricing and the bid-ask spread. Journal of Financial Economics 17, 223–249.
Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Chordia, T., Roll, R., Subrahmanyam, A., 2000. Commonality in liquidity. Journal of Financial Economics 56, 3–28.
Connor, G., Korajczyk, R., 1986. Performance measurement with the arbitrage pricing theory: a new framework for analysis. Journal of Financial Economics 15, 373–394.
Domowitz, I., Wang, X., 2002. Liquidity, liquidity commonality and its impact on portfolio theory. Penn State University Working Paper.
Dong, J., Kempf, A., Yadav, P.K., 2007. Resiliency, the neglected dimension of market liquidity: empirical evidence from the New York Stock Exchange. Available at SSRN.
Eckbo, B.E., Norli, O., 2002. Pervasive liquidity risk. Working Paper available at SSRN.
Eichler, M., Motta, G., von Sachs, R., 2011. Fitting dynamic factor models to non-stationary time series. Journal of Econometrics 163, 51–70.
Forni, M., Lippi, M., 2001. The generalized dynamic factor model: representation theory. Econometric Theory 17, 1113–1141.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalized dynamic factor model: identification and estimation. The Review of Economics and Statistics 82, 540–554.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2004. The generalized dynamic factor model: consistency and rates. Journal of Econometrics 119, 231–255.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2005. The generalized dynamic factor model: one-sided estimation and forecasting. Journal of the American Statistical Association 100, 830–840.
Hallin, M., Liška, R., 2007. The generalized dynamic factor model: determining the number of factors. Journal of the American Statistical Association 102, 103–117.
Hallin, M., Liška, R., 2011. Dynamic factors in the presence of blocks. Journal of Econometrics 163, 29–41.
Harris, L., 2003. Trading and Exchanges: Market Microstructure for Practitioners. Oxford University Press, Oxford.
Hasbrouck, J., Seppi, D., 2001. Common factors in prices, order flows, and liquidity. Journal of Financial Economics 59, 383–411.
Korajczyk, R.A., Sadka, R., 2008. Pricing the commonality across alternative measures of liquidity. Journal of Financial Economics 87, 45–72.
Kyle, A., 1985. Continuous auctions and insider trading. Econometrica 53, 1315–1335.
Minguet, A., 2003. La microstructure des marchés d'actions. Economica, Paris.
O'Hara, M., 1998. Market Microstructure Theory. Wiley, New York.
Pastor, L., Stambaugh, R.F., 2003. Liquidity risk and expected stock returns. Journal of Political Economy 111, 642–685.
Schwartz, R., 1993. Reshaping Equity Markets: A Guide for the 1990s. Business One Irwin, Homewood, Illinois.
Stock, J., Watson, M., 2002a. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–162.
Stock, J., Watson, M., 2002b. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179.
Journal of Econometrics 163 (2011) 51–70
Fitting dynamic factor models to non-stationary time series
Michael Eichler a,∗, Giovanni Motta a, Rainer von Sachs b
a Department of Quantitative Economics, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands
b Institut de statistique, Université catholique de Louvain, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium
Article history: Available online 16 November 2010.
JEL classification: C14; C32.
Keywords: Approximate factor models; Local stationarity; Principal components.
Abstract. Factor modelling of a large time series panel has widely proven useful to reduce its cross-sectional dimensionality. This is done by explaining common co-movements in the panel through the existence of a small number of common components, up to some idiosyncratic behaviour of each individual series. To capture serial correlation in the common components, a dynamic structure is used as in traditional (uni- or multivariate) time series analysis of second-order structure, i.e. allowing for infinite-length filtering of the factors via dynamic loadings. In this paper, motivated by economic data observed over long time periods which show smooth transitions over time in their covariance structure, we allow the dynamic structure of the factor model to be non-stationary over time by proposing a deterministic time variation of its loadings. In this respect we generalize both the existing recent work on static factor models with time-varying loadings and the classical, i.e. stationary, dynamic approximate factor model. Motivated by the stationary case, we estimate the common components of our dynamic factor model by the eigenvectors of a consistent estimator of the now time-varying spectral density matrix of the underlying data-generating process. This can be seen as a time-varying principal components approach in the frequency domain. We derive consistency of this estimator in a ''double-asymptotic'' framework in which both the cross-section and the time dimension tend to infinity. The performance of the estimators is illustrated by a simulation study and an application to a macroeconomic data set. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
Factor modelling plays an important role in the analysis of high-dimensional multivariate time series. This is based on a clear empirical observation made in many applications, foremost in economics and finance, but also in various other fields such as psychometrics or biomedical signal processing. It can be observed that often only a small number of (latent, i.e. unobservable) factors is sufficient to explain a certain common behaviour of the second-order structure of a large time series panel, up to an idiosyncratic behaviour of each individual series which is cross-sectionally uncorrelated with the rest. Moreover, from a statistical, or even a data-processing, point of view, given the truly high-dimensional data sets available nowadays in the aforementioned applications, there is an obvious need to reduce the cross-section dimension to a much smaller factor space dimension in order to overcome the curse of dimensionality in estimation. Furthermore, in many of the given applications where data are observed over long time periods (or on a sufficiently fine time resolution), it appears that these data exhibit some time variation in their serial variance–covariance structure. This is plausible if one
∗ Corresponding author. E-mail address: [email protected] (M. Eichler).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.11.007
wants to model, for example, macroeconomic data in the presence of transitions between economic recessions and booms, or external events such as the impact of political crises or changes in the monetary policy of central banks. Consequently, it certainly seems restrictive to assume that the common structure explained by fitting a latent factor model to the data remains constant over time. These observations have motivated us to develop a fairly general dynamic factor model which takes the time evolution of the underlying serial second-order structure of the data into account by allowing for time-varying factor loadings: the common components are dynamic and non-stationary at the same time, through a model that lets the filters become (smooth) functions of time. With this approach, we provide a true generalization of two classes of existing factor models in the literature: static factors with time-varying loadings, recently developed by Motta et al. (2011), where no serial correlation is allowed in the common components; and stationary, i.e. time-constant, dynamic factor models with a dynamic structure using (possibly infinite-length) filters of the factors via dynamic loadings, as in Forni et al. (2000). The existing, rather recent literature on non-stationary factor modelling offers a variety of approaches: random, deterministic, or a combination of both. Molenaar et al. (1992) treat non-stationarity only in the mean, through a linear
time-dependent trend function. Del Negro and Otrok (2008) developed a dynamic factor model with time-varying factor loadings following a random walk, and with stochastic volatility in both the latent factors and idiosyncratic components. The situation where breaks occur at random time points has been covered by Fancourt and Principe (1998) with a model based on piecewise constant loadings. A recent approach of Stock and Watson (2008) investigates the effect of structural instabilities on the forecasting ability of a dynamic factor model with a possibly abruptly changing parameter in an autoregressive factor specification. Pan and Yao (2008), finally, choose yet another approach, estimating the factor loading space by a stepwise optimization algorithm that expands the ''white noise space'': this allows for a generalization of the traditional factor approach without specifying any (parametric or non-parametric) structure on the departure from stationarity in the autocorrelation structure between the observations and the unobserved factors. In contrast, in this paper we consider smooth evolutions of the dynamics of the process with a non-stationarity which is purely deterministic, and which allows us to include the stationary model as a true submodel of our more general model: the factor loadings then simply become functions which are constant over time. Our main motivation comes from the empirical evidence in economic data observed over long time periods, which show smooth transitions over time in the covariance structure of an observed multivariate time series. To formalize what is behind a factor model, let us state the main idea: a very large number N of series can be explained by a small number r of factors.
The behaviour of the observed $N$-dimensional stochastic process $Y$ is driven by two components: the common component $X$ describes the co-movements of all the series, while the idiosyncratic component $Z$ is specific to each particular series:
$$Y_N(t) = \Lambda_N F(t) + Z_N(t) = X_N(t) + Z_N(t), \qquad (1.1)$$
where $X$ and $Z$ are both unobserved. The $N$-dimensional common component $X_N(t)$ is a linear combination of the latent $r$-dimensional vector of factors $F(t)$, which drives the joint behaviour of all the series and is 'loaded' by the $N \times r$ matrix of loadings $\Lambda_N$. Model (1.1) is static in the sense that $Y$ at time $t$ depends only on the value at time $t$ of $F$. The main task in factor analysis is separating the common components from the specific ones. Chamberlain and Rothschild (1983) defined the class of approximate factor models as a sequence of factor models in which the covariance matrix $\Sigma_{Z_N}$ of the idiosyncratic components is a sequence of non-diagonal matrices with uniformly bounded eigenvalues. Principal components regression (PCR) is a consistent estimation technique for approximate factor models under the assumption that only the largest $r$ of the $N$ eigenvalues of the covariance matrix $\Sigma_N$ of the observations are unbounded as $N \to \infty$. A sufficient condition for such a behaviour of the eigenvalues of $\Sigma_N$ is that the eigenvalues of $\Sigma_{Z_N}$ are uniformly bounded in $N$. Under this assumption on the behaviour of the eigenvalues, and very mild conditions on the loadings, the factors, and the idiosyncratic components of a static factor model, Bai (2003) derived the convergence of the estimated common components $\hat{X}_N(t)$ to the common components $X_N(t)$ of model (1.1), where the estimator $\hat{X}_N(t)$ is based on the $N \times r$ matrix containing the eigenvectors corresponding to the largest $r$ eigenvalues of the sample covariance matrix of $Y_N(t)$. If the true data generating process is dynamic, that is, in the presence of serial correlation in the common driving force, a shortcoming of static factor analysis is that one can end up with a very high number of static principal components. This
motivated Forni et al. (2000) to generalize model (1.1) to the dynamic case
$$Y_N(t) = \Psi_N(B) F(t) + Z_N(t) = X_N(t) + Z_N(t). \qquad (1.2)$$
Here, the one-sided filters $\psi_{ij}(B)$ – with $B$ being the back-shift operator – have square-summable coefficients, and the factors $F(t)$ are orthonormal white noise. Again, no mutual orthogonality of the idiosyncratic components is imposed, and the separation from the common components is achieved by assuming that only $r$ of the eigenvalues of the spectral density $\Sigma_N(\omega)$ of $Y_N(t)$ diverge as $N \to \infty$. Forni et al. (2000) use PCR in the frequency domain to construct weakly consistent estimators of the common components. This approach requires consistent estimation of the spectral density matrix of the data-generating process. For the statistical properties of dynamic principal components, we also refer to Brillinger (1981). In this paper, we generalize the factor model in (1.1) further by allowing the second-order structure of the dynamic factor model to change smoothly over time. More precisely, we consider processes of the form
$$Y_N(t) = \Psi_N(t, B) F(t) + Z_N(t) = X_N(t) + Z_N(t), \qquad (1.3)$$
where the loadings $\Psi_N(t, B)$ – dynamic filters as in the model of Forni et al. (2000) – now depend on time. Additionally, the idiosyncratic components $Z_N(t)$ are also allowed to be non-stationary with time-varying dynamics. Consequently, the spectral density matrix $\Sigma_N(t, \omega)$ of the process $Y_N$ becomes dependent on time, and in order to estimate it, we use a time-localized empirical spectrum (a periodogram localized over time). Our contribution is the practical and theoretical treatment of this localized PCR in the frequency domain. We show that, under simultaneous asymptotics (both $N \to \infty$ and $T \to \infty$), the PCR based on a consistent estimator of $\Sigma_N$ is consistent for the (dynamic and) non-stationary common components. As such, the development of a rigorous asymptotic theory for consistently estimating the time-varying loadings has been made possible by embedding this model into the framework of locally stationary processes derived in a series of papers by Dahlhaus (1996, 1997, 2000). We note that our approach is also related to the time-varying static approach of Motta et al. (2011), which generalizes the static factor model of Bai (2003). Our paper is organized as follows. In Section 2 we introduce a general non-stationary dynamic factor model and explain the proposed methods to estimate the non-stationary common components. In Section 3 we show that, for an increasing panel size, the principal components of the time-varying spectral matrix converge to the true common components (see Section 3.3). Furthermore, we show the consistency of the estimated common components, that is, the convergence of the principal components of the estimated time-varying spectral matrix to the principal components of the true time-varying spectral matrix (see Section 3.5). The finite sample behaviour of our method is investigated with a simulation study in Section 4, while in Section 5 we compare the application of different factor models to macroeconomic data.
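The principal components regression underlying these factor models can be sketched in its simplest, static form (in the spirit of Bai (2003)): project the observations onto the top-r eigenvectors of their sample covariance matrix. This is a toy illustration on simulated data, with invented names and parameters, not the time-varying estimator studied in this paper:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, r = 400, 30, 2

# hypothetical static factor panel Y = F Lambda' + Z
F = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
Y = F @ Lam.T + 0.5 * rng.standard_normal((T, N))

# principal components regression: project onto the top-r eigenvectors
# of the sample covariance matrix of the observations
S = np.cov(Y, rowvar=False)
eigval, eigvec = np.linalg.eigh(S)  # eigenvalues in ascending order
P = eigvec[:, -r:]                  # N x r matrix of top-r eigenvectors
X_hat = Y @ P @ P.T                 # estimated common components

true_X = F @ Lam.T
corr = np.corrcoef(X_hat.ravel(), true_X.ravel())[0, 1]
print(round(corr, 2))  # close to 1 when the factor structure is strong
```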
Section 6 concludes and mentions the main open problems that are left for future research. Throughout the paper we use bold upright letters for matrices, bold italic letters for vectors, and normal (non-bold) letters for scalars. For a real number $x$, $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to $x$. For a matrix $A$, the trace and the conjugate transpose are denoted by $\mathrm{tr}(A)$ and $A^*$, respectively, $\|A\| = \sqrt{\mathrm{tr}(A A^*)}$ is the Euclidean norm, and $I_n$ is the identity matrix of dimension $n$. One important tool will be the convolution of time-varying linear filters. For a given point in time $t$, let $\Phi(t, B)$ and $\Psi(t, B)$ be two time-varying one-sided filters. Then the convolution $(\Phi \star \Psi)(t, B)$ of the two filters is defined by
$$(\Phi \star \Psi)(t, B) = \sum_{k=0}^{\infty} \Big( \sum_{j=0}^{k} \Phi_j(t)\, \Psi_{k-j}(t) \Big) B^k.$$
The obtained filter $(\Phi \star \Psi)(t, B)$ has the transfer function
$$(\Phi \star \Psi)(t, \omega) = \Phi(t, \omega)\, \Psi(t, \omega),$$
where, for ease of notation, we use $\Phi(t, B)$ and $\Phi(t, \omega)$ to denote, respectively, the filter in the time domain and the transfer function in the frequency domain, while $\Phi_k(t)$ denotes the filter coefficients.
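The convolution of (time-varying) one-sided filters at a fixed time point t, and the fact that its transfer function is the product of the two transfer functions, can be checked numerically. A minimal sketch with short, made-up filters:

```python
import numpy as np

def convolve_tv(phi, psi):
    """Coefficients of the convolution (Phi * Psi)(t, B) at a fixed t:
    (Phi*Psi)_k = sum_j Phi_j Psi_{k-j}.  phi, psi: 1-D coefficient arrays."""
    return np.convolve(phi, psi)

def transfer(coeffs, omega):
    """Transfer function sum_k coeffs[k] * exp(-i k omega)."""
    k = np.arange(len(coeffs))
    return np.sum(coeffs * np.exp(-1j * k * omega))

phi = np.array([1.0, 0.5])         # Phi(t, B) = 1 + 0.5 B   (at a fixed t)
psi = np.array([1.0, -0.3, 0.1])   # Psi(t, B) = 1 - 0.3 B + 0.1 B^2
conv = convolve_tv(phi, psi)

# the convolution's transfer function equals the product of the two
omega = 0.7
lhs = transfer(conv, omega)
rhs = transfer(phi, omega) * transfer(psi, omega)
print(np.allclose(lhs, rhs))  # -> True
```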
2. A non-stationary dynamic factor model
Let $Y_i(t)$, $i \in \mathbb{N}$, be a panel of non-stationary time series at time points $1 \le t \le T$ that are driven by a few common factors. Following Forni et al. (2000), we assume that the time series can be described by an approximate dynamic factor model, that is, the idiosyncratic components are allowed to be correlated to some extent. Thus we consider the model
$$Y(t) = X(t) + Z(t) = \Psi(t, B) F(t) + Z(t) = \sum_{k=0}^{\infty} \Psi_k(t)\, F(t-k) + Z(t) \qquad (2.1)$$
for $1 \le t \le T$, where $X(t)$ is the common component originating from the common factors $F(t)$ while $Z(t)$ is the idiosyncratic component. We assume that the model satisfies the following conditions: (i) $F(t)$ is an $r$-dimensional stationary white noise process with $E(F(t)) = 0$ and $E(F(t)F(t)') = I_r$; (ii) $\Psi(t, B) = (\psi_{ij}(t, B))_{i \in \mathbb{N},\, j = 1, \ldots, r}$ is a time-varying one-sided linear filter of dimension $N \times r$; (iii) $Z(t) = (Z_i(t))_{i \in \mathbb{N}}$ is uncorrelated with the factor process $F(t)$.
Non-stationarity enters the model in two ways. First, the loadings $\Psi(t, B)$, which determine the influence of the factor process $F(t)$ on the given process, are time-varying, that is, the influence of the factors may change over time. Second, the idiosyncratic component $Z(t)$ may also be non-stationary with time-varying dynamics. Here, we call a process $Z(t)$ idiosyncratic if
$$\lim_{N \to \infty} E\|A_N(B)\, Z(t)\|^2 = 0$$
for $1 \le t \le T$ and all sequences of filters $A_N(B)$, $N \in \mathbb{N}$, such that
$$\lim_{N \to \infty} \int_{-\pi}^{\pi} \|A_N(\omega)\|^2 \, d\omega = 0.$$
Notice that, in contrast to the stationary case in Forni and Lippi (2001), $Z(t)$ is non-stationary and only defined over the period $1 \le t \le T$. For the above definition of idiosyncratic processes, we therefore set $Z(t) = 0$ for $t < 1$ or $t > T$.
The factor process $F(t)$ is assumed to be a stationary orthonormal white noise process as in Forni et al. (2000). This is, however, not a serious constraint. For example, suppose that $F(t)$ is a non-stationary factor process with time-varying moving average representation $F(t) = \Xi(t, B)\,\eta(t)$, where $\eta(t)$ is a stationary orthonormal white noise process and $\Xi(t, B)$ is a square-summable time-varying one-sided linear filter. Then the common component $X(t)$ can be rewritten as $X(t) = \Psi(t, B)\,\Xi(t, B)\,\eta(t) = \tilde{\Psi}(t, B)\,\eta(t)$. Thus the process can be represented as in (2.1) with factor process $F(t) = \eta(t)$ satisfying the above assumptions.
For the idiosyncratic component $Z(t)$, we assume further that for any $N \in \mathbb{N}$ the subprocess $Z_N(t) = (Z_1(t), \ldots, Z_N(t))'$ has a time-varying moving average representation of the form $Z_N(t) = \Upsilon_N(t, B)\,\varepsilon_N(t)$, where $\varepsilon_N(t)$ is a stationary white noise process with mean $E(\varepsilon_N(t)) = 0$ and covariance matrix $E(\varepsilon_N(t)\varepsilon_N(t)') = I_N$. Then $Y_N(t)$ is obtained from the $(r+N)$-dimensional stationary process $(F(t), \varepsilon_N(t))$ by application of the time-varying linear filter $(\Psi_N(t, B), \Upsilon_N(t, B))$. Under the additional assumption that the time-varying linear filters $\Psi_N(t, B)$ and $\Upsilon_N(t, B)$ are square summable, this leads to the time-varying spectral representation
$$Y_N(t) = \int_{-\pi}^{\pi} e^{i\omega t}\, \Psi_N(t, \omega)\, d\xi^F(\omega) + \int_{-\pi}^{\pi} e^{i\omega t}\, \Upsilon_N(t, \omega)\, d\xi^{\varepsilon}_N(\omega),$$
where $\xi^F(\omega)$ and $\xi^{\varepsilon}_N(\omega)$ are the orthogonal increment processes associated with the stationary processes $F(t)$ and $\varepsilon_N(t)$, respectively. The corresponding time-varying spectral density is given by
$$\Sigma_N(t, \omega) = \Psi_N(t, \omega)\, \Psi_N(t, \omega)^* + \Upsilon_N(t, \omega)\, \Upsilon_N(t, \omega)^*.$$
Assuming that the factor loadings, and thus the dynamics of the process, change slowly over time, we can treat the process as if it were stationary over small time intervals and estimate the spectral matrix locally. Then we can proceed for each time point $t$ as in the stationary case. More precisely, let $\hat{\Sigma}_N(t, \omega)$ be a consistent estimator of the time-varying spectral matrix $\Sigma_N(t, \omega)$; for details on the estimation of time-varying spectral matrices we refer to Appendix A. Then $\hat{\Sigma}_N(t, \omega)$ has a spectral decomposition
$$\hat{\Sigma}_N(t, \omega) = \hat{P}_N(t, \omega)^*\, \hat{\Lambda}_N(t, \omega)\, \hat{P}_N(t, \omega),$$
where $\hat{\Lambda}_N(t, \omega) = \mathrm{diag}(\hat{\lambda}_{N1}(t, \omega), \ldots, \hat{\lambda}_{NN}(t, \omega))$ is the diagonal matrix containing the eigenvalues of $\hat{\Sigma}_N(t, \omega)$ in descending order of magnitude, and $\hat{P}_N(t, \omega)$ is the unitary $N \times N$ matrix whose $i$-th row is the row eigenvector of $\hat{\Sigma}_N(t, \omega)$ corresponding to the $i$-th eigenvalue $\hat{\lambda}_{Ni}(t, \omega)$. As in the stationary case of Forni et al. (2000), we estimate the factor space or, equivalently, the space spanned by the common components, by the space spanned by the eigenvectors corresponding to the $r$ largest eigenvalues of $\hat{\Sigma}_N(t, \omega)$. In the frequency domain this is accomplished by the projection
$$\hat{\Phi}_N(t, \omega) = \hat{P}_N(t, \omega)^*\, Q^r_N\, \hat{P}_N(t, \omega), \qquad \text{where } Q^r_N = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.$$
By inverse Fourier transformation, we obtain the corresponding projection filter in the time domain,
$$\hat{\Phi}^{(\infty)}_N(t, B) = \sum_{k=-\infty}^{\infty} \hat{\Phi}_{N,k}(t)\, B^k,$$
where the filter coefficients $\hat{\Phi}_{N,k}(t)$ are given by
$$\hat{\Phi}_{N,k}(t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{\Phi}_N(t, \omega)\, e^{i\omega k} \, d\omega.$$
Since this is a two-sided filter of infinite length, it needs to be truncated before it can be applied to the data. Let
$$\hat{\Phi}_{NT}(t, B) = \hat{\Phi}^{(M_T)}_N(t, B) = \sum_{k=-M_T}^{M_T} \hat{\Phi}_{N,k}(t)\, B^k$$
be the truncated filter. Then the common components $X_N(t)$ are estimated by
$$\hat{X}_N(t) = \hat{\Phi}_{NT}(t, B)\, Y_N(t).$$
The estimation method is a localized version of the method proposed by Forni et al. (2000). However, the non-stationarity
54
M. Eichler et al. / Journal of Econometrics 163 (2011) 51–70
gives rise to some questions. First, while working in the frequency domain, we could restrict ourselves to quantities at some point in time. However, the estimated spectral matrix is obtained from data that are only close to this time point and thus have different dynamics. Second, the filter is applied in the time domain and thus again to data within some time range about t. Thus, although the estimation method for the common component XN (t ) is a straightforward extension of the estimation method in the stationary case, its properties are unclear and it is not obvious that the theoretical results from Forni et al. (2000, 2004, 2005) carry over easily. In the next section we introduce an appropriate asymptotic framework and show that the estimated common components are indeed consistent. Furthermore, in Section 4, we investigate the finite sample behaviour of the common component estimator by simulation. The above method for recovering the common component works only for the ‘‘central part’’ of the observation period since the two-sided filters use information from the past as well as the future of the process. In contrast, at or close to the boundaries of the observation period, the projection filters must be adapted appropriately to include only observed and hence past (respectively future) values of the series. This would lead to an approach similar to Forni et al. (2005), which will be investigated in a future research project. Finally, we note that our approach does not work well for time series that exhibit certain types of non-stationary behaviour such as deterministic or stochastic trends even though these may be covered by the theoretical framework we have described. An integrated process, for example, indeed permits a time-varying moving average representation satisfying the above assumptions. 
The changes in the corresponding linear filters and thus in the dynamics of the process, however, are too fast and make a localized estimation of the time-varying spectrum infeasible. For a successful application of the method, the time series should show locally an approximately stationary behaviour. This is also the key to the consistency result in the next section.
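The estimation pipeline of this section (local spectral estimation, frequency-by-frequency eigendecomposition, rank-r projection, inverse Fourier transform, truncation, filtering) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the block-averaged local periodogram below is a crude stand-in for the smoothed segmented periodogram of Appendix A, and the window length L, the number of frequencies, and the truncation point M are arbitrary illustrative choices.

```python
import numpy as np

def local_spectral_matrix(Y, t, L=128, n_freq=64):
    """Crude local estimate of Sigma_N(t, omega): average the periodogram
    matrices of short overlapping blocks inside a window of length L
    centred at t (a stand-in for the smoothed segmented periodogram)."""
    N, T = Y.shape
    lo, hi = max(0, t - L // 2), min(T, t + L // 2)
    seg = Y[:, lo:hi]
    Sigma = np.zeros((n_freq, N, N), dtype=complex)
    count = 0
    for s in range(0, seg.shape[1] - n_freq + 1, n_freq // 2):
        d = np.fft.fft(seg[:, s:s + n_freq], axis=1)  # block DFT
        for k in range(n_freq):
            Sigma[k] += np.outer(d[:, k], d[:, k].conj())
        count += 1
    return Sigma / max(count, 1)

def projection_filter(Sigma, r, M):
    """Eigendecompose Sigma at each frequency, keep the r leading
    eigenvectors (projection Phi(omega) = P* Q_r P), and return the
    truncated time-domain coefficients Phi_k, k = -M..M, via inverse FFT."""
    n_freq, N, _ = Sigma.shape
    Phi = np.zeros_like(Sigma)
    for k in range(n_freq):
        _, V = np.linalg.eigh(Sigma[k])   # eigenvalues in ascending order
        Pr = V[:, -r:]                    # r leading eigenvectors
        Phi[k] = Pr @ Pr.conj().T         # rank-r projection at omega_k
    coef = np.fft.ifft(Phi, axis=0)       # lag-domain coefficients
    # lag k < 0 sits at index k mod n_freq of the inverse FFT output
    return {k: coef[k % n_freq].real for k in range(-M, M + 1)}

def estimate_common(Y, t, r, L=128, M=8):
    """X_hat(t) = sum_k Phi_k(t) Y(t - k): localized common-component
    estimate, with the projection filter recomputed at each time point."""
    Phi_k = projection_filter(local_spectral_matrix(Y, t, L), r, M)
    x = np.zeros(Y.shape[0])
    for k, C in Phi_k.items():
        if 0 <= t - k < Y.shape[1]:
            x += C @ Y[:, t - k]
    return x
```

For a panel with one strong factor, `estimate_common(Y, t, r=1)` recovers the common component at an interior time point t; in an application, the spectral estimator, its bandwidths, and the choice of r all require the care discussed in the following sections.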
3. Asymptotic behaviour of the estimated common components

In this section, we prove consistency of the estimated common components. Since we are dealing with non-stationary processes, we have to specify what is meant by asymptotic behaviour. To this end, we first define an evolutionary dynamic factor model, which provides a theoretical framework for the asymptotics.

3.1. An asymptotic model

In contrast to the non-stationary factor model in the previous section, we now study a family of non-stationary panel time series Y_NT(t) = (Y_1T(t), …, Y_NT(t))′, 1 ≤ t ≤ T, which is indexed by the number of observations T and the number of cross-sectional variables N. Instead of defining a dynamic factor model for each T simply by (2.1), which would correspond to ordinary asymptotics where the same process is observed over longer time periods, we assume that the processes Y_NT(t), T ∈ ℕ, are all derived from the same filter functions for the factor loadings and the idiosyncratic components. Thus, we assume that for T, N ∈ ℕ the process Y_NT(t) is given by

    Y_NT(t) = Ψ_N(t/T, B) F(t) + Z_NT(t),   1 ≤ t ≤ T,   (3.1a)

with idiosyncratic component

    Z_NT(t) = Υ_N(t/T, B) ε_N(t),   1 ≤ t ≤ T.   (3.1b)

For fixed T, the model in (2.1) can be embedded into the above evolutionary model by setting Ψ_N(t, B) = Ψ_N(t/T, B) and Υ_N(t, B) = Υ_N(t/T, B). As the number of observations T increases, we obtain locally more and more observations with approximately the same dynamics, provided that, e.g., the linear filters Ψ_N(t/T, ω) and Υ_N(t/T, ω) are slowly varying. We note that in practice we observe Y_NT(t) only for one given T. Thus the above evolutionary model represents a purely theoretical device for evaluating large-sample properties of estimators for non-stationary processes. Furthermore, we note that the evolutionary form of the model does not include processes with non-stationary factor processes. Such processes could be covered by considering dynamic factor models based on locally stationary processes (Dahlhaus, 1996). For ease of notation, we consider in this paper only evolutionary processes, but all results can be derived similarly for general locally stationary processes. We now give the main assumptions on our asymptotic model.

Assumption 1 (Evolutionary Dynamic Factor Model). Y_NT(t), 1 ≤ t ≤ T, with T, N ∈ ℕ is a family of stochastic processes given by (3.1) and satisfying the following conditions:
(i) the factor process F(t) is an r-dimensional stationary white noise process with E(F(t)) = 0 and E(F(t) F(t)′) = I_r;
(ii) the coefficients Ψ_{N,k}(u) and Υ_{N,k}(u) of the time-varying linear filters Ψ_N(u, B) and Υ_N(u, B), respectively, are square summable uniformly in u ∈ [0, 1],

    sup_{u∈[0,1]} ∑_{k=−∞}^{∞} ‖Ψ_{N,k}(u)‖² < ∞  and  sup_{u∈[0,1]} ∑_{k=−∞}^{∞} ‖Υ_{N,k}(u)‖² < ∞;

(iii) the factor process F(t) and the idiosyncratic errors ε_N(t) are orthogonal at all leads and lags, that is, E[ε_N(t) F(t − k)′] = 0 for all N ∈ ℕ and t, k ∈ ℤ.

The common components X_NT(t) and the idiosyncratic components Z_NT(t) are evolutionary processes in the sense that they admit moving average representations with time-varying coefficients Ψ_{N,k}(t/T) and Υ_{N,k}(t/T), respectively, that are functions of rescaled time t/T ∈ [0, 1]. As in the previous section, this leads to the spectral representations

    X_NT(t) = ∫_{−π}^{π} Ψ_N(t/T, ω) e^{iωt} dξ^F(ω)

and

    Z_NT(t) = ∫_{−π}^{π} Υ_N(t/T, ω) e^{iωt} dξ^ε_N(ω),

where ξ^F(ω) and ξ^ε_N(ω) are the orthogonal increment processes associated with the stationary processes F(t) and ε_N(t), respectively. Then X_NT(t) has the time-varying spectral density matrix

    Σ^X_N(t/T, ω) = Ψ_N(t/T, ω) Ψ_N(t/T, ω)*,

where Ψ_N(u, ω) is the time-varying transfer function

    Ψ_N(u, ω) = (1/2π) ∑_{k=−∞}^{∞} Ψ_{N,k}(u) e^{−iωk}.

Similarly, the idiosyncratic component Z_N(t) has the time-varying spectral density matrix

    Σ^Z_N(t/T, ω) = Υ_N(t/T, ω) Υ_N(t/T, ω)*,

where Υ_N(u, ω) is the time-varying transfer function associated with the filter Υ_N(u, B). It follows that the process Y_NT(t) has the time-varying spectral matrix

    Σ_N(u, ω) = Σ^X_N(u, ω) + Σ^Z_N(u, ω) = Ψ_N(u, ω) Ψ_N(u, ω)* + Υ_N(u, ω) Υ_N(u, ω)*.   (3.2)

We note that, unlike for the non-asymptotic model (2.1), the spectral matrix Σ_N(u, ω) for fixed N ∈ ℕ is uniquely determined by the family Y_NT(t), T ∈ ℕ. Finally, we remark that although evolutionary processes are special cases of locally stationary processes (Dahlhaus, 2000, Remark 2.2), the process Y_NT(t) as defined above is not locally stationary in the strict sense of Dahlhaus (1996, 2000), as it does not have a time-varying moving average representation in terms of an N-dimensional white noise process ε_N(t). Nevertheless, most results on locally stationary processes extend readily to the above class of processes (see Appendix A.4).

Assumption 1(ii) on the filter coefficients implies that the diagonal entries of the spectral matrix Σ_N(u, ω) = (σ_ij(u, ω))_{i,j=1,…,N} are uniformly bounded for u ∈ [0, 1] and ω ∈ [−π, π], that is, for all N ∈ ℕ there exists a constant σ̄_i > 0 such that

    sup_{u∈[0,1]} sup_{ω∈[−π,π]} σ_ii(u, ω) ≤ σ̄_i.   (3.3)

In order to obtain sensible estimates for the spectral density matrix Σ_N(u, ω), we require that the dynamics of the process change only slowly over time and that accordingly the process can be viewed as approximately stationary over short time intervals. This can be achieved by a suitable degree of smoothness of the time-varying spectral matrix, which is formalized in the next assumption.

Assumption 2 (Smoothness of Spectral Matrix and Transfer Functions).
(i) The time-varying spectral density Σ_N(u, ω) is twice continuously differentiable for all u ∈ [0, 1] and ω ∈ [−π, π].
(ii) The time-varying transfer functions Ψ_N(u, ω) and Υ_N(u, ω) are Lipschitz continuous in u ∈ (0, 1).

Next, we note that Assumption 1(iii) implies that the factors F(t) and the idiosyncratic components Z_NT(t) are orthogonal. The orthogonality is important to ensure the identifiability of the common and the idiosyncratic components. Note that, since X_NT(t) and Z_NT(t) are latent processes, a representation with orthogonal components can always be achieved by considering appropriate projections.

For the separation of the common component X_NT(t) and the idiosyncratic component Z_NT(t), we also require the following assumption on the behaviour of the eigenvalues of the spectral matrices Σ^X_N(u, ω) and Σ^Z_N(u, ω).

Assumption 3 (Common and Idiosyncratic Dynamic Eigenvalues). Let λ^X_Nj(u, ω) and λ^Z_Nj(u, ω), 1 ≤ j ≤ N, be the time-varying eigenvalues of Σ^X_N(u, ω) and Σ^Z_N(u, ω), respectively, ordered in descending order of magnitude.
(i) The first r common time-varying dynamic eigenvalues λ^X_Nj(u, ω), j = 1, …, r, diverge uniformly in u ∈ [0, 1] as N increases: for j = 1, …, r,

    lim_{N→∞} inf_{u∈[0,1]} λ^X_Nj(u, ω) = ∞  a.e. in [−π, π].

(ii) The first idiosyncratic time-varying dynamic eigenvalue λ^Z_N1(u, ω) is uniformly bounded, that is, there exists a positive constant λ̄^Z such that λ^Z_N1(u, ω) ≤ λ̄^Z for all u ∈ [0, 1], ω ∈ [−π, π], and N ∈ ℕ.
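A toy simulation makes the rescaled-time construction concrete: the filter coefficients are functions of u = t/T, so increasing T refines the sampling of [0, 1] rather than extending the observation window. The single-factor MA(1) filter shapes below are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_evolutionary_panel(N, T, seed=0):
    """Simulate Y_NT(t) = Psi_N(t/T, B) F(t) + Upsilon_N(t/T, B) eps_N(t)
    for a single factor (r = 1) with time-varying MA(1) filters whose
    coefficients are smooth functions of rescaled time u = t/T.
    All functional forms of the coefficients are illustrative only."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal(T + 1)            # stationary white-noise factor
    eps = rng.standard_normal((N, T + 1))     # idiosyncratic innovations
    i = np.arange(1, N + 1)[:, None]          # cross-section index
    u = np.arange(1, T + 1)[None, :] / T      # rescaled time in (0, 1]
    psi0 = np.cos(np.pi * u * i / N)          # lag-0 loading Psi_{N,0}(u)
    psi1 = 0.5 * np.sin(np.pi * u) * np.ones_like(psi0)  # lag-1 loading
    X = psi0 * F[1:] + psi1 * F[:-1]          # common component
    Z = eps[:, 1:] + 0.3 * u * eps[:, :-1]    # idiosyncratic MA(1)
    return X + Z, X

# Key point of the evolutionary asymptotics: the filters depend on t/T,
# so doubling T does not extend the observation window -- it refines it,
# and observations near a fixed rescaled time u share (approximately)
# the same dynamics for every T.
```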
The following proposition shows that, under these assumptions on the process, the time-varying spectrum, and the eigenvalues, only r eigenvalues of the spectral density matrix of the observations diverge as N increases, while the remaining N − r stay bounded. The proposition thus generalizes Proposition 1 of Forni et al. (2000) to the non-stationary case.

Proposition 1. Under Assumptions 1–3, the first r time-varying dynamic eigenvalues of Σ_N(u, ω) diverge, as N → ∞, uniformly over u ∈ [0, 1], that is, for j = 1, …, r,

    lim_{N→∞} inf_{u∈[0,1]} λ_Nj(u, ω) = ∞  a.e. in [−π, π].

The remaining eigenvalues are uniformly bounded by λ̄^Z, that is, for j > r,

    limsup_{N→∞} sup_{ω∈[−π,π]} sup_{u∈[0,1]} λ_Nj(u, ω) ≤ λ̄^Z.

Proof. See Appendix B.1.

The assumption is in line with empirical evidence that, for many panel time series, only a few eigenvalues diverge as the cross-sectional dimension increases while the others seem to be bounded. This fact could, for example, be exploited for selecting the dimension r of the factor process (Hallin and Liška, 2007). An important consequence of the proposition is that the processes Z_NT(t) are indeed idiosyncratic, that is, for all sequences of filters A_N(B) such that the integrals ∫_{−π}^{π} ‖A_N(ω)‖² dω tend to zero as N tends to infinity, the filtered processes A_N(B) Z_NT(t) converge to zero in mean square.

Corollary 2. Under the assumptions of the proposition, the processes Z_NT(t), T ∈ ℕ, are idiosyncratic.

Proof. See Appendix B.2.
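Assumption 3 and Proposition 1 can be checked numerically in a toy one-factor specification: a rank-one common spectrum pushes the first dynamic eigenvalue of Σ_N(u, ω) up linearly in N, while a diagonal (hence bounded) idiosyncratic spectrum keeps the remaining eigenvalues bounded. All functional forms below are illustrative, not taken from the paper.

```python
import numpy as np

def toy_spectral_matrix(N, u, omega):
    """Sigma_N(u, omega) = psi psi* + Sigma_Z(u, omega) for a toy
    one-factor evolutionary model (illustrative loading and
    idiosyncratic spectra)."""
    i = np.arange(1, N + 1)
    # time-varying transfer function of the factor loadings (rank one)
    psi = (1 + 0.5 * np.cos(omega)) * np.exp(1j * (u + i / N))
    Sigma_X = np.outer(psi, psi.conj())
    # idiosyncratic spectrum: diagonal, hence uniformly bounded eigenvalues
    Sigma_Z = 0.5 * (1 + 0.2 * np.sin(2 * np.pi * u)) * np.eye(N)
    return Sigma_X + Sigma_Z

def dynamic_eigvals(N, u=0.3, omega=1.0):
    """Time-varying dynamic eigenvalues of Sigma_N(u, omega), descending."""
    return np.linalg.eigvalsh(toy_spectral_matrix(N, u, omega))[::-1]
```

Here the first eigenvalue is ‖ψ‖² + O(1), which grows proportionally to N, while the second stays below the idiosyncratic bound — the pattern that rank-selection procedures such as Hallin and Liška (2007) exploit.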
3.2. Decomposition of the overall estimation error

Let Σ̂_N(u, ω) be a consistent estimator of the spectral matrix Σ_N(u, ω), and let

    Σ̂_N(u, ω) = P̂_N(u, ω)* Λ̂_N(u, ω) P̂_N(u, ω)

be its spectral decomposition. Then, as described in the previous section, the common component X_NT(t) can be estimated by

    X̂_NT,N(t) = Φ̂_NT(t/T, B) Y_NT(t),   (3.4)

where, for u ∈ [0, 1],

    Φ̂_NT(u, B) = ∑_{k=−M_T}^{M_T} Φ̂_{N,k}(u) B^k

is the truncated time-domain filter obtained from the transfer function

    Φ̂_N(u, ω) = P̂_N(u, ω)* Q^r_N P̂_N(u, ω).

The main objective of this section is to prove the consistency of the common component estimator X̂_NT,N(t). As in Forni et al. (2000), the proof of consistency is based on a decomposition of the overall estimation error ‖X̂_NT,N(t) − X_NT(t)‖. Whereas in the stationary case covered by Forni et al. (2000) it is sufficient to decompose the error into an approximation and an estimation error, the non-stationary case discussed in this paper poses additional problems, which lead to a decomposition into four separate errors: approximation, truncation, filtering, and estimation errors.

In order to motivate the decomposition, we note that the estimation of the common components is based on a principal component analysis in the frequency domain. Since, in contrast to the stationary case, the spectral decomposition and thus the dynamic principal components vary over time, the resulting projection onto the space spanned by the first r principal components must be applied to the process at time t, which can be accomplished in the frequency domain but not in the time domain. More precisely, let

    Σ_N(u, ω) = P_N(u, ω)* Λ_N(u, ω) P_N(u, ω)   (3.5)

be the spectral decomposition of the true spectral matrix Σ_N(u, ω). Here Λ_N(u, ω) = diag{λ_N1(u, ω), …, λ_NN(u, ω)} is the diagonal matrix containing the eigenvalues of Σ_N(u, ω) and P_N(u, ω) is the N × N matrix whose j-th row

    P_Nj(u, ω) = [p_Nj,1(u, ω), p_Nj,2(u, ω), …, p_Nj,N(u, ω)]

equals the row eigenvector of Σ_N(u, ω) corresponding to the j-th largest eigenvalue λ_Nj(u, ω). The vectors P_Nj(u, ω) are called the time-varying dynamic eigenvectors of Σ_N(u, ω). Then the transfer function

    Φ_N(u, ω) = P_N(u, ω)* Q^r_N P_N(u, ω)   (3.6)

can be viewed as a projection operator projecting the frequency components onto the space spanned by the first r eigenvectors of Σ_N(u, ω). Applying this operator in the frequency domain, we obtain the following approximation X_NT,N(t) of the true common component X_NT(t),

    X_NT,N(t) = ∫_{−π}^{π} e^{iωt} Φ_N(t/T, ω) Ψ_N(t/T, ω) dξ^F(ω) + ∫_{−π}^{π} e^{iωt} Φ_N(t/T, ω) Υ_N(t/T, ω) dξ^ε_N(ω).

To find an expression in terms of linear filters, let (Φ_N ⋆ Ψ_N)(u, B) be the convolution of the two filters Φ_N(u, B) and Ψ_N(u, B),

    (Φ_N ⋆ Ψ_N)(u, B) = ∑_{k=−∞}^{∞} ( ∑_{j=−∞}^{∞} Φ_{N,j}(u) Ψ_{N,k−j}(u) ) B^k;

(Φ_N ⋆ Υ_N)(u, B) is similarly defined. Then X_NT,N(t) can also be written as

    X_NT,N(t) = (Φ_N ⋆ Ψ_N)(t/T, B) F(t) + (Φ_N ⋆ Υ_N)(t/T, B) ε_N(t).   (3.7)

The approximation X_NT,N(t) of the common component X_NT(t) is of purely theoretical interest as it cannot be computed from Σ_N(t/T, ω), for two reasons. First, the filter Φ_N(t/T, ω) cannot be applied pointwise as suggested above, since this would require knowledge of the filters Ψ_N(u, B) and Υ_N(u, B) and of the latent stationary processes F(t) and ε_N(t). Instead, the filter needs to be applied to the observed data Y_NT(t) in the time domain. Unlike for stationary processes, application of filters in the time domain and in the frequency domain differs for non-stationary processes, as will be shown in Section 3.4. Second, the filters involved are of infinite length and thus must be truncated before they can be applied to the data. This suggests decomposing the overall estimation error ‖X̂_NT,N(t) − X_NT(t)‖ as follows:

(i) approximation error: ‖X_NT,N(t) − X_NT(t)‖, where X_NT,N(t) is given by (3.7);
(ii) truncation error: ‖X̄_NT,N(t) − X_NT,N(t)‖, where

    X̄_NT,N(t) = (Φ_NT ⋆ Ψ_N)(t/T, B) F(t) + (Φ_NT ⋆ Υ_N)(t/T, B) ε_N(t)

is obtained from the truncated filter

    Φ_NT(u, B) = ∑_{k=−M_T}^{M_T} Φ_{N,k}(u) B^k;

(iii) filtering error: ‖X̃_NT,N(t) − X̄_NT,N(t)‖, where

    X̃_NT,N(t) = Φ_NT(t/T, B) Ψ_N(t/T, B) F(t) + Φ_NT(t/T, B) Υ_N(t/T, B) ε_N(t) = Φ_NT(t/T, B) Y_NT(t)

is obtained by application of the filter Φ_NT(·, B) in the time domain;
(iv) estimation error: ‖X̂_NT,N(t) − X̃_NT,N(t)‖, where X̂_NT,N(t) is the common component estimator in (3.4).

3.3. Approximation error

We start our discussion of the approximation error ‖X_NT,N(t) − X_NT(t)‖ defined in the previous section by giving a more detailed account of how X_NT,N(t) approximates the true common component X_NT(t). This will also show how the non-stationary case differs from the stationary case. As noted before, a key role in the recovery of the common components is played by the dynamic principal components. In analogy to Forni et al. (2000), one might consider defining time-varying principal components by

    F_NT(t) = P_N(t/T, B) Y_NT(t),   t = 1, …, T.

There are, however, two problems with this approach. First, since the eigenvectors are time-varying, it is not guaranteed that the time-varying dynamic principal components thus defined are orthogonal at all leads and lags. Second, application of the filter P_N(t/T, B) will mix the time-varying dynamics of Y_NT(t) at different points in time. These problems can be avoided by applying the filter P_N(t/T, B) in the frequency domain. Thus, we define the time-varying dynamic principal components process at rescaled time u ∈ [0, 1] by

    F_N(u, s) = (P_N ⋆ Ψ_N)(u, B) F(s) + (P_N ⋆ Υ_N)(u, B) ε_N(s),   s ∈ ℤ.   (3.8)

Notice that the whole process depends on some fixed u ∈ [0, 1]. From the process F_N(u, s), we obtain the closed linear subspace of L²(Ω, F, P) spanned by the first r components of the process F_N(u, s),

    F^r_N(u) = sp{F_Nj(u, s), j = 1, …, r, s ∈ ℤ},   (3.9)

where F_Nj(u, s) is the j-th element of the N-dimensional vector F_N(u, s). Since the dynamic principal components process F_N(u, s) is generated from the spectral matrix Σ_N(u, ω) at a single point u in (rescaled) time, the dynamic principal components are orthogonal at all leads and lags. Then the true common component X_NT(t) can be approximated by the orthogonal projection of Y_NT(t) onto F^r_N(t/T), that is, we have

    X_NT,N(t) = proj( Y_NT(t) | F^r_N(t/T) ).
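The contrast between composing time-varying filters at a frozen rescaled time u (the frequency-domain convolution used in (3.8)) and applying them one after the other in the time domain can be made concrete with scalar filters: for constant coefficients the two coincide, while sequential time-domain application evaluates the inner filter at t − k rather than at t, which is the source of the filtering error analysed in Section 3.4. A schematic sketch (dicts map lag to coefficient; all filters are hypothetical):

```python
import numpy as np

def freeze_and_convolve(A, B):
    """(A * B)_k(u) = sum_j A_j(u) B_{k-j}(u): both filters frozen at the
    same rescaled time u before convolving, as in the frequency-domain
    composition of time-varying filters (scalar coefficients here)."""
    out = {}
    for j, aj in A.items():
        for m, bm in B.items():
            out[j + m] = out.get(j + m, 0.0) + aj * bm
    return out

def apply_sequentially(A_at, B_at, y):
    """Apply B(t/T, B) first and then A(t/T, B), both in the time domain.
    The inner result z(t-k) was built with B's coefficients at time t-k,
    so the composition mixes dynamics of different time points."""
    T = len(y)
    z = np.zeros(T)
    for t in range(T):
        for k, b in B_at(t / T).items():
            if 0 <= t - k < T:
                z[t] += b * y[t - k]
    out = np.zeros(T)
    for t in range(T):
        for k, a in A_at(t / T).items():
            if 0 <= t - k < T:
                out[t] += a * z[t - k]
    return out
```

With constant coefficients, applying the frozen convolution or the two filters in sequence gives identical output; once the coefficients depend on u, the sequential version picks up the lagged coefficients B_m((t − k)/T) and the two differ.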
To see that X_NT,N(t) equals the approximated common component defined in the previous section, we note that the process Y_NT(t) can be decomposed as

    Y_NT(t) = X_NT,N(t) + Z_NT,N(t),   (3.10)

where

    X_NT,N(t) = (P′_N ⋆ (Q^r_N P_N) ⋆ Ψ_N)(t/T, B) F(t) + (P′_N ⋆ (Q^r_N P_N) ⋆ Υ_N)(t/T, B) ε_N(t)

and

    Z_NT,N(t) = (P′_N ⋆ [(I_N − Q^r_N) P_N] ⋆ Ψ_N)(t/T, B) F(t) + (P′_N ⋆ [(I_N − Q^r_N) P_N] ⋆ Υ_N)(t/T, B) ε_N(t).

From the definitions of X_NT,N(t) and Z_NT,N(t), it immediately follows that

    X_NT,N(t) = [ P′_N(u, B) Q^r_N F_N(u, t) ]_{u=t/T}

is in the subspace F^r_N(t/T), whereas Z_NT,N(t) is orthogonal to F^r_N(t/T). Therefore, X_NT,N(t) = proj(Y_NT(t) | F^r_N(t/T)) as required.

The decomposition in (3.10) separates the dominant part X_NT,N(t) from the residual part Z_NT,N(t). The approximate common component X_iT,N(t) reproduces the r-factor structure of the process, and it is given by a (filtered) projection of the data on the eigenvectors corresponding to the largest r eigenvalues of Σ_N(u, ω),

    X_iT,N(t) = (φ_Ni ⋆ Ψ_N)(t/T, B) F(t) + (φ_Ni ⋆ Υ_N)(t/T, B) ε_N(t),   (3.11)

where

    φ_Ni(u, B) = (P*_Ni ⋆ (Q^r_N P_N))(u, B)

and where the notation emphasizes that the population common component X_iT,N(t) depends on N. The following proposition shows that such a projection is able to recover the common components of the process as N → ∞.

Proposition 3. Suppose that Assumptions 1–3 hold. For all i ∈ ℕ, the population approximate common component X_iT,N(t) converges in mean square to the true common component X_iT(t) uniformly over 1 ≤ t ≤ T as N tends to infinity:

    lim_{N→∞} sup_{T∈ℕ} sup_{1≤t≤T} E[X_iT,N(t) − X_iT(t)]² = 0.

Proof. The proof is given in Appendix B.3.

3.4. Truncation and filtering error

Next, we show the mean square convergence to zero of the filtering error and the truncation error defined in Section 3.2. We start with the truncation error. Let φ_NTi(u, ω) be the transfer function of the truncated filter

    φ_NTi(u, B) = ∑_{k=−M_T}^{M_T} φ_{Ni,k}(u) B^k.

Noting that F(t) and ε_N(t) are orthonormal processes, we obtain for the truncation error, by Parseval's identity,

    E|X̄_iT,N(t) − X_iT,N(t)|² ≤ ∫_{−π}^{π} ‖φ_NTi(t/T, ω) − φ_Ni(t/T, ω)‖² ( ‖Ψ_N(t/T, ω)‖² + ‖Υ_N(t/T, ω)‖² ) dω.

By Assumption 1(ii), the factors ‖Ψ_N(t/T, ω)‖² and ‖Υ_N(t/T, ω)‖² are uniformly bounded in ω and u. Furthermore, since ‖φ_NTi(u, ω)‖² ≤ 1 as noted before, we have

    sup_{u∈[0,1]} ∫_{−π}^{π} ‖φ_NTi(u, ω) − φ_Ni(u, ω)‖² dω → 0

as M_T → ∞, that is, as T → ∞, which shows that the truncation error converges to zero in mean square.

For the discussion of the filtering error, let

    φ^Ψ_Ni(t/T, B) = (φ_NTi ⋆ Ψ_N)(t/T, B) = (1/2π) ∑_{k=−∞}^{∞} ( ∑_{j=−M_T}^{M_T} φ_{Ni,j}(t/T) Ψ_{N,k−j}(t/T) ) B^k

and

    φ^Ψ_NTi(t/T, B) = φ_NTi(t/T, B) Ψ_N(t/T, B) = (1/2π) ∑_{k=−∞}^{∞} ( ∑_{j=−M_T}^{M_T} φ_{Ni,j}(t/T) Ψ_{N,k−j}((t−j)/T) ) B^k,

and define the filters φ^Υ_Ni(t, B) and φ^Υ_NTi(t, B) analogously, with Ψ_N replaced by Υ_N. Then the filtering error can be written as

    X̃_iT,N(t) − X̄_iT,N(t) = [φ^Ψ_NTi(t, B) − φ^Ψ_Ni(t, B)] F(t) + [φ^Υ_NTi(t, B) − φ^Υ_Ni(t, B)] ε_N(t).   (3.12)

Notice that the filtering error vanishes if the two filters Ψ_N(u, B) and Υ_N(u, B) do not depend on u, in which case the common and the idiosyncratic components are both stationary. Otherwise, the expression for the filter coefficients of the second filter shows that the sequential application of time-varying linear filters in the time domain mixes the dynamics of different time points, whereas application of filters in the frequency domain retains the pure dynamics at any specific point in time. Since the filter φ_NTi(u, B) has finite width M_T, it shrinks asymptotically to the point u as T → ∞ provided that M_T is of order o(T). Therefore, the filtering error vanishes for large T, given that the filters Ψ_N(u, B) and Υ_N(u, B) are smooth enough. This is formalized in the following proposition.

Proposition 4. Suppose that Assumptions 1–3 hold. For fixed N and for all i = 1, …, N, the filtering error tends to zero in mean square uniformly over M_T ≤ t ≤ T − M_T as T tends to infinity:

    lim_{T→∞} sup_{M_T ≤ t ≤ T−M_T} E|X̃_iT,N(t) − X̄_iT,N(t)|² = 0,   (3.13)

where

    M_T → ∞  and  M_T^{3/2}/T → 0  as T → ∞.

Proof. Since the processes F(t) and ε_N(t) are uncorrelated, we have
    E|X̃_iT,N(t) − X̄_iT,N(t)|² = ∫_{−π}^{π} ‖φ^Ψ_NTi(t/T, ω) − φ^Ψ_Ni(t/T, ω)‖² dω + ∫_{−π}^{π} ‖φ^Υ_NTi(t/T, ω) − φ^Υ_Ni(t/T, ω)‖² dω.   (3.14)

We show that the first term converges to zero uniformly in M_T ≤ t ≤ T − M_T as T → ∞; for the second term, convergence to zero follows by the same arguments. We have, for M_T ≤ t ≤ T − M_T,

    ‖φ^Ψ_NTi(t/T, ω) − φ^Ψ_Ni(t/T, ω)‖
      = (1/2π) ‖ ∑_{j=−M_T}^{M_T} e^{−iωj} φ_{Ni,j}(t/T) [ Ψ_N((t−j)/T, ω) − Ψ_N(t/T, ω) ] ‖
      ≤ (1/2π) ∑_{j=−M_T}^{M_T} ‖φ_{Ni,j}(t/T)‖ ‖Ψ_N((t−j)/T, ω) − Ψ_N(t/T, ω)‖.

Here, the second factor can be bounded by

    ‖Ψ_N((t−j)/T, ω) − Ψ_N(t/T, ω)‖ ≤ C |j|/T

due to the Lipschitz continuity of Ψ_N in its first component on [0, 1]. With this we obtain the upper bound

    ‖φ^Ψ_NTi(t/T, ω) − φ^Ψ_Ni(t/T, ω)‖ ≤ (C M_T)/(2π T) ∑_{j=−M_T}^{M_T} ‖φ_{Ni,j}(t/T)‖.

By the Cauchy–Schwarz inequality, Parseval's identity, and ‖φ_NTi(u, ω)‖² ≤ 1, we have

    ∑_{j=−M_T}^{M_T} ‖φ_{Ni,j}(t/T)‖ ≤ M_T^{1/2} ( ∫_{−π}^{π} ‖φ_Ni(t/T, ω)‖² dω )^{1/2} ≤ C M_T^{1/2},

which implies

    ‖φ^Ψ_NTi(t/T, ω) − φ^Ψ_Ni(t/T, ω)‖ ≤ C M_T^{3/2}/T.   (3.15)

Thus, we have shown that the first term in (3.14) converges to zero as T tends to infinity. Since the second term can be treated similarly, this proves (3.13).

The upper bound in (3.15) also suffices to show that the expectation

    E[ (X̃_iT,N(t) − X̄_iT,N(t)) (X̄_iT,N(t) − X_iT,N(t)) ]

converges to zero as T → ∞. Thus, under the conditions of the proposition, we also obtain the mean square convergence of the combined error due to filtering and truncation, that is,

    lim_{T→∞} sup_{M_T ≤ t ≤ T−M_T} E|X̃_iT,N(t) − X_iT,N(t)|² = 0,   (3.16)

where M_T satisfies the condition in the above proposition.

3.5. Estimation error

In this section we establish the convergence to zero (in probability) of the estimation error (see Proposition 5). The first step in the estimation of the common components is the estimation of the time-varying spectral density Σ_N(u, ω). Since our method for estimating the common components does not depend on the particular form of the spectral matrix estimator, we do not discuss the estimation in detail but simply impose the following condition on the spectral estimator Σ̂_N(u, ω). For further details on the estimation of Σ_N(u, ω), we refer to Appendix A.

Assumption 4 (Spectral Matrix Estimator). There exists a sequence u_T with u_T → 0 and T u_T → ∞ as T → ∞ such that the estimator Σ̂_N(u, ω) of the spectral matrix Σ_N(u, ω) is uniformly consistent in u ∈ [u_T, 1 − u_T] and ω ∈ [−π, π], that is, for all (fixed) N ∈ ℕ,

    sup_{u∈[u_T, 1−u_T]} sup_{ω∈[−π,π]} ‖Σ̂_N(u, ω) − Σ_N(u, ω)‖ = O_p(ρ_T^{−1})

for some sequence ρ_T → ∞ as T → ∞.

We note that Assumption 4 is fulfilled, for example, by the two spectral estimators Σ̂_N(u, ω) discussed in Appendix A; the required conditions are listed in Assumption 5 in Appendix A. The above assumption also implies that the matrix P̂_N(u, ω) of the estimated eigenvectors, and thus the estimated transfer functions Φ̂_N(u, ω) of the projection filters, are uniformly consistent. Based on this uniform convergence, we show in the next proposition that the estimation error converges to zero in probability. Note that for this result to hold, an arbitrarily slow rate ρ_T is sufficient in Assumption 4, as we do not aim at giving a rate of convergence for the estimation error. We also emphasize that for this result we keep N fixed while T tends to infinity.

Proposition 5. Suppose that Assumptions 1–4 hold. In addition, assume for the truncation parameter M_T of the estimated filters Φ̂_NT that M_T → ∞ and M_T/ρ_T → 0 as T → ∞. Then, for all δ > 0, 1 ≤ i ≤ N, and all N ∈ ℕ, the estimation error satisfies

    lim_{T→∞} sup_{M*_T ≤ t ≤ T−M*_T} P[ |X̂_iT,N(t) − X̃_iT,N(t)| > δ ] = 0,

where M*_T = max{u_T T, M_T}.

Proof. See Appendix B.4.

Remark 1. The proof of Proposition 5 actually includes the derivation, from Assumption 4, of a rate of uniform convergence for the estimated eigenvectors and the estimated transfer functions of the projection filters Φ̂_NT. For more details, we refer to Appendix B.4.

3.6. Consistency of the estimated common components

With the results of the previous Sections 3.3–3.5, we can now show the main result of this paper. The following theorem establishes the convergence in probability of the common component estimator X̂_NT,N(t) to the true common components X_NT(t).

Theorem 6. Suppose that Assumptions 1–4 hold. Then, for all ϵ > 0 and η > 0, there exist N* ∈ ℕ and T*_N ∈ ℕ, N ≥ N*, such that for all N ≥ N* and T ≥ T*_N

    sup_{M*_T ≤ t ≤ T−M*_T} P[ |Φ̂_NTi(t/T, B) Y_NT(t) − X_iT(t)| > ϵ ] ≤ η,

where M*_T = max{u_T T, M_T}, with u_T and M_T obeying the same conditions as in Assumption 4 and Proposition 5, respectively.

Proof. In order to prove consistency of the estimated common components, we decompose the overall estimation error as in
Section 3.2, and for all t ∈ (M*_T, T − M*_T) we consider the following time-varying probability:

    P[ |X̂_iT,N(t) − X_iT(t)| > ϵ ]
      ≤ P[ |X̂_iT,N(t) − X̃_iT,N(t)| > ϵ/3 ] + P[ |X̃_iT,N(t) − X_iT,N(t)| > ϵ/3 ] + P[ |X_iT,N(t) − X_iT(t)| > ϵ/3 ].

In Proposition 3, we have shown that the approximation error |X_iT,N(t) − X_iT(t)| vanishes as N → ∞ uniformly over T ∈ ℕ and 1 ≤ t ≤ T. This implies that for all ϵ > 0 and η > 0 there exists an N* such that

    P[ |X_iT,N(t) − X_iT(t)| > ϵ/3 ] ≤ η/3

for all N ≥ N*, uniformly over 1 ≤ t ≤ T and T ∈ ℕ. Next, we have shown in Section 3.4 that the combined error due to filtering and truncation, |X̃_iT,N(t) − X_iT,N(t)|, vanishes as T → ∞ uniformly over 1 ≤ t ≤ T for all N ∈ ℕ. Then the uniform convergence in (3.16) implies that for all N ∈ ℕ, ϵ > 0, and η > 0 there exists a T_{1,N} such that

    P[ |X̃_iT,N(t) − X_iT,N(t)| > ϵ/3 ] ≤ η/3

for all T > T_{1,N}. Finally, we have shown in Section 3.5 that for fixed N the estimation error tends to zero in probability as T tends to infinity. More precisely, it follows from Proposition 5 that for all ϵ > 0, η > 0, and N > N* there exists T_{2,N} ∈ ℕ such that for all T > T_{2,N}

    P[ |X̂_iT,N(t) − X̃_iT,N(t)| > ϵ/3 ] ≤ η/3.

Altogether, we find that for all N > N* and all T > max{T_{1,N}, T_{2,N}}

    P[ |X̂_iT,N(t) − X_iT(t)| > ϵ ] < η,

which concludes the proof.

4. Simulation results

To investigate the finite-sample behaviour of the proposed approach, we consider a 2-factor model given by

    Y_NT(t) = X^{(1)}_NT(t) + X^{(2)}_NT(t) + Z_T(t),   t = 1, …, T,   (4.1)

where the two N-dimensional processes X^{(1)}_NT(t) and X^{(2)}_NT(t) consist of univariate locally stationary AR(p) processes as in (4.1) of Dahlhaus (1997), that is,

    X^{(j)}_iT(t) = ∑_{k=1}^{p} a_{ij,k}(t/T) X^{(j)}_iT(t − k) + F_j(t)   (4.2)

for i = 1, …, N, j = 1, 2, and t = 1, …, T. We assume that the factors F_j(t) as well as the idiosyncratic components Z_T(t) are stationary and normally distributed, F(t) ~ iid N(0, I_r) and Z_T(t) ~ iid N(0, I_N). For the simulations, we have set p = 2 with autoregressive coefficients given by

    a_{ij,1}(u) = ∓(2/α) cos(θ_ij(u))  and  a_{ij,2}(u) = ±1/α²,   u ∈ [0, 1],

where |α| > 1 and θ_ij(u) = φ_ij − cos(ν_ij u) for i = 1, …, N and j = 1, 2. For fixed u, the roots of the characteristic polynomials are ρ_ij(u) = α e^{±iθ_ij(u)}, which implies |ρ_ij(u)| = |α| > 1 for all u ∈ [0, 1], i = 1, …, N, and j = 1, 2. Thus, the auto-spectra of the processes X^{(j)}_iT(t) have single peaks with time-varying locations θ_ij(u).

The spectral estimator implemented in the simulations is the smoothed segmented periodogram (see Appendix A.2). This estimator depends on two ‘‘smoothing’’ parameters. The first is the bandwidth b_T in the frequency direction. Dahlhaus (1996, Theorem 2.3) showed that the optimal value b_T^{opt} for this parameter is of order T^{−1/6}, where b_T^{opt} is optimal in the sense that it minimizes the relative mean squared error of the estimator of the time-varying spectrum; the bandwidth used in our simulations is b_T = (1/3) T^{−1/6}. Second, the estimator depends on the length L of each segment, which is akin to a smoothing parameter in time. We also note that, for convenience, we choose the truncation parameter M of the filters Φ̂_NT,i(t/T, B) to be equal to L/2. Finally, as in Dahlhaus (1996), we allow our estimator to be based on shifted segments with a shift S from segment to segment, resulting in P segments (such that T = S(P − 1) + L) with midpoints t_j = S(j − 1) + L/2, j = 1, …, P.

We consider two numerical scenarios, both based on cross-section dimensions N = 10, 20, 30, 40, 50. In the first, T = L² with L = 16, 32, 48, 64, 80, P = L + 1, and S = L − 1. In the second, T = 8L with L = 32, 64, 96, 128, 160, P = L + 1, and S = 7.

The consistency result in Theorem 6 ensures the consistency of the estimated common components X̂_iT(t) elementwise, that is, for all i and all t. This is confirmed by the simulated example in Fig. 4.1, where we report the estimation of the common components for N = 10 and T = 2304 (the parametrization corresponding to our first scenario with L = 48 and N = 10). The quality of the fit is remarkable, even for a relatively small cross-section size. In this figure we consider only the first four (among 10) common components X_iT(t) and (a realization of) their estimators X̂_iT(t), t = 1, …, T. Our estimator is able to capture the time-varying variance (non-stationarity) as well as the time-varying dynamics (changes in the autocorrelation).

To evaluate the global performance of the estimator, we consider the rescaled norm of the difference between the matrix of estimated common components and that containing the true ones. Let X_{N,T−2L} = (X_NT(L), …, X_NT(T − L)) be the N × (T − 2L + 1) matrix containing the vectors of common components X_NT(t), t = L, …, T − L, and let X̂_{N,T−2L} be the corresponding estimator. We now report simulations of X̂_{N,T−2L} based on Q replications. Let X^{(q)}_iT(t) be the (i, t) entry of X_{N,T−2L} (i = 1, …, N, t = L, …, T − L) obtained from the q-th realization of the common component X_iT(t) simulated from the locally stationary AR(p) model in (4.1)–(4.2), with r = p = 2, and define X̂^{(q)}_iT(t) as the estimator of X^{(q)}_iT(t) based on (3.4). For q = 1, …, Q, define the loss function

    Δ^{(q)}_NT(β) = [(T − 2L + 1)]^{−β} ‖X̂^{(q)}_{N,T−2L} − X^{(q)}_{N,T−2L}‖
                  = [(T − 2L + 1)]^{−β} ( ∑_{i=1}^{N} ∑_{t=L}^{T−L} (X̂^{(q)}_iT(t) − X^{(q)}_iT(t))² )^{1/2},   (4.3)

depending on N, T, and β. In Tables 4.1 and 4.2 we consider, for both scenarios and for different values of β, the average

    Δ̄_NTQ(β) = (1/Q) ∑_{q=1}^{Q} Δ^{(q)}_NT(β)

and the standard deviation

    S^Δ_NTQ(β) = ( (1/(Q − 1)) ∑_{q=1}^{Q} (Δ^{(q)}_NT(β) − Δ̄_NTQ(β))² )^{1/2}

of the loss in (4.3) over Q experiments. In the case of mean-square convergence (of the estimated common components to the true ones), the loss function in (4.3) with β = 1/2 can be interpreted as the sample counterpart of the (square root of the) mean squared error of the estimator X̂ with respect to X. However, Theorem 6 guarantees only consistency
60
M. Eichler et al. / Journal of Econometrics 163 (2011) 51–70
Fig. 4.1. True common components XiT (t ) (black) and estimated ones XiT (t ) (grey), i = 1, . . . , 4. Table 4.1 First scenario. Average and standard deviation (in brackets) of the loss defined in (4.3) over Q = 100 experiments for different values of β .
Table 4.2 Second scenario. Average and standard deviation (in brackets) of the loss defined in (4.3) over Q = 100 experiments for different values of β .
β = 0.5
T = 256
T = 1024
T = 2304
T = 4096
T = 6400
β = 0.5
T = 256
T = 512
T = 768
T = 1024
T = 1280
N = 10
3.3208 (0.3810) 3.6029 (0.5169) 3.6423 (0.2589) 3.6112 (0.2706) 3.5562 (0.2586)
3.3127 (0.2899) 3.7409 (0.3238) 3.8204 (0.2272) 3.8746 (0.2338) 3.8887 (0.1827)
3.3696 (0.2841) 3.6789 (0.2352) 3.8148 (0.2180) 3.8280 (0.2159) 3.8896 (0.1814)
3.2455 (0.2117) 3.7103 (0.2177) 3.7878 (0.1896) 3.9015 (0.1892) 3.9669 (0.1847)
3.3220 (0.1845) 3.6548 (0.2414) 3.7324 (0.1803) 3.8897 (0.1875) 3.9551 (0.1655)
N = 10
2.9580 (0.2251) 3.3768 (0.2428) 3.5409 (0.2428) 3.5707 (0.1669) 3.7174 (0.2334)
3.3043 (0.3142) 3.5524 (0.1892) 3.7406 (0.2390) 3.8851 (0.1989) 3.9705 (0.2692)
3.2508 (0.2313) 3.6772 (0.2897) 3.8534 (0.2426) 3.9483 (0.2926) 3.9183 (0.2127)
3.4686 (0.2303) 3.7479 (0.2485) 3.7542 (0.1886) 3.9142 (0.2432) 3.8137 (0.2236)
3.5640 (0.2790) 3.7753 (0.2561) 3.8762 (0.2924) 3.8172 (0.2462) 3.8464 (0.2599)
β = 0.6
T = 256
T = 1024
T = 2304
T = 4096
T = 6400
β = 0.6
T = 256
T = 512
T = 768
T = 1024
T = 1280
N = 10
1.5347 (0.1761) 1.5536 (0.2229) 1.5081 (0.1072) 1.4529 (0.1089) 1.3992 (0.1017)
1.3241 (0.1159) 1.3951 (0.1207) 1.3681 (0.0814) 1.3482 (0.0813) 1.3232 (0.0622)
1.2393 (0.1045) 1.2624 (0.0807) 1.2570 (0.0718) 1.2256 (0.0691) 1.2178 (0.0568)
1.1257 (0.0734) 1.2007 (0.0705) 1.1771 (0.0589) 1.1780 (0.0571) 1.1714 (0.0545)
1.1012 (0.0612) 1.1304 (0.0747) 1.1086 (0.0536) 1.1225 (0.0541) 1.1162 (0.0467)
N = 10
1.3882 (0.1056) 1.4786 (0.1063) 1.4888 (0.1021) 1.4588 (0.0682) 1.4852 (0.0933)
1.4472 (0.1376) 1.4517 (0.0773) 1.4678 (0.0938) 1.4813 (0.0758) 1.4805 (0.1004)
1.3673 (0.0973) 1.4431 (0.1137) 1.4521 (0.0914) 1.4457 (0.1071) 1.4031 (0.0762)
1.4176 (0.0941) 1.4292 (0.0947) 1.3747 (0.0691) 1.3927 (0.0865) 1.3270 (0.0778)
1.4245 (0.1115) 1.4079 (0.0955) 1.3881 (0.1047) 1.3282 (0.0857) 1.3088 (0.0884)
β = 0.7
T = 256
T = 1024
T = 2304
T = 4096
T = 6400
β = 0.7
T = 256
T = 512
T = 768
T = 1024
T = 1280
N = 10
0.7093 (0.0814) 0.6699 (0.0961) 0.6245 (0.0444) 0.5845 (0.0438) 0.5505 (0.0400)
0.5292 (0.0463) 0.5203 (0.0450) 0.4899 (0.0291) 0.4691 (0.0283) 0.4503 (0.0212)
0.4558 (0.0384) 0.4332 (0.0277) 0.4142 (0.0237) 0.3924 (0.0221) 0.3813 (0.0178)
0.3904 (0.0255) 0.3886 (0.0228) 0.3658 (0.0183) 0.3557 (0.0173) 0.3459 (0.0161)
0.3650 (0.0203) 0.3496 (0.0231) 0.3292 (0.0159) 0.3239 (0.0156) 0.3150 (0.0132)
N = 10
0.6515 (0.0496) 0.6474 (0.0465) 0.6260 (0.0429) 0.5960 (0.0279) 0.5934 (0.0373)
0.6339 (0.0603) 0.5932 (0.0316) 0.5760 (0.0368) 0.5648 (0.0289) 0.5520 (0.0374)
0.5751 (0.0409) 0.5663 (0.0446) 0.5472 (0.0345) 0.5294 (0.0392) 0.5024 (0.0273)
0.5794 (0.0385) 0.5450 (0.0361) 0.5034 (0.0253) 0.4955 (0.0308) 0.4617 (0.0271)
0.5694 (0.0446) 0.5250 (0.0356) 0.4971 (0.0375) 0.4622 (0.0298) 0.4454 (0.0301)
N = 20 N = 30 N = 40 N = 50
N = 20 N = 30 N = 40 N = 50
N = 20 N = 30 N = 40 N = 50
in probability (of the estimated common components to the true ones). This explains why we cannot expect ∆NTQ (0.5) in generality to be decreasing with increasing N and/or T . We expect however, on the other hand, that ∆NTQ (0.5) should not increase with these parameters, or to phrase it differently that once β becomes larger
N = 20 N = 30 N = 40 N = 50
N = 20 N = 30 N = 40 N = 50
N = 20 N = 30 N = 40 N = 50
than 0.5 the loss ∆NTQ (β) should begin to decrease. Hence we add a small investigation to confirm this conjecture. Whereas for β = 0.6, the values of ∆NTQ (0.6) begin to decrease w.r.t. T (starting from N = 20), the values ∆NTQ (0.7) can be observed to monotonically decrease in both N and T .
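The tvAR(2) design above can be sketched in a few lines. The values of α, φ_{ij} and ν_{ij} and the i.i.d. N(0, 1) innovations are illustrative stand-ins, not the (unreported) parameter choices of the paper; the recursion only illustrates a process with roots α e^{±iθ(u)} and a time-varying spectral peak θ(u) = φ − cos(νu).

```python
import numpy as np

rng = np.random.default_rng(0)

def tvar2_path(T, alpha=1.1, phi=1.0, nu=2 * np.pi):
    """Simulate one time-varying AR(2) component whose characteristic roots
    are alpha * exp(+/- i * theta(u)) with theta(u) = phi - cos(nu * u)."""
    x = np.zeros(T + 2)
    eps = rng.standard_normal(T + 2)
    for t in range(2, T + 2):
        u = (t - 2) / T                    # rescaled time in [0, 1)
        theta = phi - np.cos(nu * u)       # time-varying spectral peak location
        a1 = 2.0 * np.cos(theta) / alpha   # roots alpha * e^{+/- i theta} give
        a2 = -1.0 / alpha**2               # x_t = a1 x_{t-1} + a2 x_{t-2} + eps_t
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + eps[t]
    return x[2:]

# first scenario with L = 16: T = L^2 = 256
x = tvar2_path(T=256)
print(x.shape, bool(np.isfinite(x).all()))
```

Since |α| > 1, the recursion is stable at every rescaled time u, while the location of the spectral peak drifts with u, reproducing the time-varying autocorrelation visible in Fig. 4.1.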
5. Illustrative example

In this section we apply the methodology described in the previous sections to the macroeconomic data¹ analysed in Stock and Watson (2005, Section 4), Jungbacker and Koopman (2008, Section 5), and Hallin and Liška (2007). From this dataset we constructed a balanced panel of 132 monthly US macroeconomic time series from January 1960 to December 2003 (44 years, T = 528). The original data have been transformed according to Stock and Watson (2005, Table A.1 in Appendix A). For the application of our non-stationary dynamic factor modelling approach, the panel size needs to be reduced further. This is due to the fact that the estimated time-varying common components are based on estimation of the whole spectral density matrix (cf. Assumption 4) and, more precisely, of its eigenvalues and eigenvectors, whose variance increases with the panel size N. Consequently, our approximation results require that for fixed panel size N the sample length must be larger than some T = T_N. Conversely, we find that for fixed T the panel size must not exceed N = N_T. We note that the same kind of restriction also applies to the stationary case discussed by Forni et al. (2000), although less severely, since in our case the spectral density matrix is estimated locally and hence is based on a much smaller effective sample size. For instance, in the case of the segmented periodogram estimator (cf. Appendix A.2), the estimation is based on segments of length L_T with L_T/T → 0 as T → ∞. From the original panel of 132 series, we therefore selected a subpanel of N = 22 time series, which consists of two groups: (1) Industrial Production (IP) variables (i = 6, . . . , 18); (2) Interest Rate (IR) variables (i = 86, . . . , 94). Though homogeneous within each group, these time series are characterized by different dynamics across the two groups, as shown by Fig. 5.1.

Before studying and comparing these dynamics, we show that the data are non-stationary by applying two tests to each group (Sections 5.1 and 5.2). For more details about the formal definitions and the numerical outcomes of the tests, see Motta (2009, Chapter 4).

5.1. Testing for shifts in the variance

Hallin and Liška (2007, Section 5.2) found three factors (r̂₁ = 3) for 1960–1982 (T₁ = 276) and one factor (r̂₂ = 1) for 1983–2003 (T₂ = 252). We believe that the different behaviour of the series in the two subsamples is due to non-stationarity rather than to a different number of underlying factors. To get an idea of the behaviour of the variance in the two subsamples, we plotted the selected series together with their standard deviations; see Fig. 5.1. We found that, except for one variable (i = 17), the standard deviation in the first subsample is larger than that in the second subsample; this empirical evidence motivated us to test the null hypothesis of equal variance in the two subperiods. The test proposed by Hoffman and Pagan (1989) is performed by splitting the sample into two parts, with a test statistic based on the difference between the two sample variances. We rejected the null hypothesis of constant unconditional variance in the two parts for almost all of the investigated series.

5.2. Testing for time-varying dynamics

The second test that we considered has been proposed by Priestley and Subba Rao (1969) and takes into account the full dynamic structure of the process. The test is based on a factorization of the time-varying spectrum into two components:
1 The data can be downloaded at http://www.princeton.edu/~mwatson.
a time-dependent factor describing the non-stationarity and a frequency-dependent factor characterizing the dynamics of the process. Given the estimated time-varying spectral density Σ̂_N(u, ω), we can reject the null hypotheses of a constant spectrum over time as well as over frequency. The results of the tests provide strong evidence against the null hypothesis of stationarity.

In Section 5.3 we apply some statistical tools that illustrate how to detect and interpret the non-stationarity. More generally, the outcomes of the tests indicate that the data are characterized by a time-dependence as well as by a frequency-dependence. In order to understand how these two components interact, we compare the estimation of different factor models (see Section 5.4).

5.3. Statistical tools to detect non-stationarity

In this subsection we apply some non-parametric statistical tools to detect non-stationarity: time-varying covariances, time-varying cross-autocorrelations, time-varying cross-spectra and time-varying cross-coherences. The time-varying covariance matrix is defined as the expectation of the cross-products at rescaled time u ∈ (0, 1),

Σ_N(u) = E[Y_NT(⌊uT⌋) Y_NT(⌊uT⌋)′],   (5.1)

and it is estimated non-parametrically according to Motta et al. (2011) (the computation of confidence intervals is based on the asymptotic normality of the estimator). Fig. 5.2 shows that the autocovariances change slowly over time. The estimated time-varying spectra Σ̂(u, ω), for u ∈ (0, 1) and ω ∈ (0, π), are presented in Fig. 5.3 (in absolute value). The estimator of the time-varying spectral density matrix is the smoothed segmented periodogram (see Appendix A.2) with L_T = 160 and b_T = 0.8. The corresponding estimated coherences Ĉ_N(u, ω), with entries

ĉ_ij(u, ω) = σ̂_ij(u, ω) / (σ̂_ii(u, ω) σ̂_jj(u, ω))^{1/2},   (5.2)

are given in Fig. 5.3 (in absolute value). The coherences are standardized measures of the dynamics in the frequency domain. Analogously to the matrix Ĉ_N(u, ω), which is defined in the frequency domain, we can define in the time domain the time-varying autocorrelation matrix R_N(u, k) at lag k and rescaled time u as the N × N matrix with entries

r_ij(u, k) = E[Y_i(⌊uT⌋ + k) Y_j(⌊uT⌋)] / (E[Y_i²(⌊uT⌋ + k)] E[Y_j²(⌊uT⌋)])^{1/2},   i, j = 1, . . . , N.   (5.3)
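The local autocorrelation (5.3) can be estimated by replacing the expectations with plain averages over a window centred at ⌊uT⌋; a minimal sketch, where the window length L and the uniform weights are our illustrative choices rather than the authors' kernel estimator:

```python
import numpy as np

def local_crosscorr(y_i, y_j, u, k, L=64):
    """Estimate r_ij(u, k) in (5.3): moments replaced by sample averages
    over a window of length ~L centred at floor(u * T)."""
    T = len(y_i)
    c = int(u * T)
    lo = max(c - L // 2, 0)
    hi = min(c + L // 2, T - k)
    num = np.mean(y_i[lo + k:hi + k] * y_j[lo:hi])
    den = np.sqrt(np.mean(y_i[lo + k:hi + k] ** 2) * np.mean(y_j[lo:hi] ** 2))
    return num / den

rng = np.random.default_rng(1)
y = rng.standard_normal(1000)
r0 = local_crosscorr(y, y, u=0.5, k=0)
print(round(r0, 6))  # lag-0 autocorrelation equals 1 by construction
```

Evaluating this over a grid of u and k = 0, . . . , 10 yields surfaces of the kind displayed in Fig. 5.4.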
In Fig. 5.4, we plot the estimated functions r̂_ij(u, k) for lags 0 ≤ k ≤ 10. For both IP and IR, the spectra and the coherences show (within each subgroup) a common structure over time. Although the spectra of the IR variables are less homogeneous over time than the spectra of the IP variables, investigation of the coherences suggests that the dynamic interrelationships among the IR variables are smoother than those of the IP variables (both over time and over frequency). The non-stationarity of the IR variables is mainly due to the central part of the series. This is reflected by the behaviour of the (cross-)autocorrelation functions in Fig. 5.4, which vary (though smoothly) more for the IP variables than for the IR variables.

5.4. Comparison of different models

In order to compare various methods for recovering the common components of the time series, we consider the one-step-ahead (in-sample) forecast Ŷ_NT(T′ + h) = X̂_NT(T′ + h − 1), that is,
Fig. 5.1. Typical examples of IP (top, i = 8, 9) and IR (bottom, i = 86, 87) series YiT (t ) and their standard deviation.
Fig. 5.2. Estimated time-varying covariances. Abscissas: rescaled time u ∈ (0, 1); ordinates: σ̂_ij(u). Top: Industrial Production, i = 8, 9, j = 17. Bottom: Interest Rate, i = 86, 87, j = 92. Bold lines: estimated covariance functions; dashed lines: confidence intervals.
we predict Y_NT at time T′ + h by the common component X̂_NT at time T′ + h − 1. The forecasting performance of each method can be measured by the averaged forecast error

FE(T′, H) = (1/H) Σ_{h=1}^{H} ‖Ŷ_NT(T′ + h) − Y_NT(T′ + h)‖.   (5.4)

We note that Ŷ_NT(T′ + h) is not a true forecast, since computation of the estimate X̂_NT(T′ + h − 1) also involves the value Y_NT(T′ + h) due to the two-sided projection filters that we use. In our application, we used r = 3 factors and computed FE(T′, H) with T′ = 500 and H = 12. The choice T′ = 500 is motivated by our interest in measuring the forecasting performance of the four methods for the 12 months (H = 12) after September 2001. We remark that Stock and Watson (2005) are mainly interested in forecasting the growth of industrial production, which is one of the series, whereas we look at forecasting a whole group of variables.

Table 5.1. Forecasting error (5.4) estimated on the basis of different methods and different choices of L_T and b_T.

Method                   L_T   b_T        IP       IR
Static stationary (a)    –     –          0.4573   5.0946
Static evolutionary (b)  16    –          0.4590   5.0526
                         32    –          0.4501   5.0423
                         48    –          0.4505   5.0944
Dynamic stationary (c)   –     –          0.4497   5.1615
Dynamic evolutionary     16    T^{−1/2}   0.4718   5.9310
                         16    T^{−1/4}   0.4390   5.5544
                         16    T^{−1/6}   0.4295   5.4929

(a) Bai (2003). (b) Motta et al. (2011). (c) Forni et al. (2000).
The results in Table 5.1 show that the dynamic non-stationary approach outperforms the others for the industrial production variables if the bandwidths are chosen properly (for b_T = T^{−1/4}, T^{−1/6}), whereas for the interest rate variables the non-stationary
Fig. 5.3. Estimated time-varying spectra (top) and coherences (bottom). Left abscissas: rescaled time u ∈ (0, 1); right abscissas: frequency ω ∈ (0, π). Top ordinates: |σ̂_ij(u, ω)|, where σ̂_ij(u, ω) are defined in (A.2); bottom ordinates: |ĉ_ij(u, ω)|, where ĉ_ij(u, ω) are defined in (5.2). Top variables: Industrial Production, i = 8, 9, j = 17. Bottom variables: Interest Rate, i = 86, 87, j = 92.
static and the stationary dynamic approaches seem to outperform the others. This could be due to the fact that the IR variables exhibit much less variation over time in their interrelationships and are thus well approximated by the more parsimonious stationary dynamic model. In contrast, the time-varying coherences and correlations for the IP variables are clearly time-dependent and require a non-stationary model.
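The averaged forecast error (5.4) translates directly into code; a minimal sketch, where the 0-based column indexing and the random placeholder data are our assumptions (any estimate of the common components can be passed as `X_hat`):

```python
import numpy as np

def forecast_error(Y, X_hat, T_prime, H):
    """Averaged in-sample forecast error FE(T', H) of (5.4): Y at time T'+h is
    'predicted' by the estimated common component at time T'+h-1.
    Columns index time, 0-based: column t-1 holds the observation at time t."""
    total = 0.0
    for h in range(1, H + 1):
        pred = X_hat[:, T_prime + h - 2]          # X_hat(T' + h - 1)
        total += np.linalg.norm(Y[:, T_prime + h - 1] - pred)
    return total / H

rng = np.random.default_rng(2)
Y = rng.standard_normal((22, 528))                # N = 22 series, T = 528 months
fe = forecast_error(Y, Y, T_prime=500, H=12)      # T' = 500, H = 12 as in the text
print(fe > 0)
```

With `X_hat` taken from the four competing methods, this reproduces the comparison summarized in Table 5.1.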
6. Conclusions

With this paper we have delivered an important generalization of factor modelling as it is currently available in the literature. We allow for a second-order non-stationarity of our data-generating process in order to cope with the empirical observation that – being observed over long time periods (or on a sufficiently fine
Fig. 5.4. Estimated time-varying autocorrelations defined in (5.3). Left abscissas: rescaled time u ∈ (0, 1); right abscissas: lags k = 1, 2, . . . , 10; ordinates: r̂_ij(u, k). Top: Industrial Production, i = 8, 9, j = 17. Bottom: Interest Rate, i = 86, 87, j = 92.
time resolution) – macroeconomic or financial data can exhibit some time variation in their serial variance–covariance structure. Furthermore, we do not content ourselves with a static factor model, in order to avoid taking a potentially large number of factors into account. In this respect we generalize both the work by Motta et al. (2011) on time-varying static factors and that by Forni et al. (2000) on stationary dynamic factor modelling. With our approach we contribute to the still recent literature on dimension reduction of multivariate time series with a possibly time-varying correlation structure, where the time variation allows one to work with a considerably smaller number of factors (common components) to explain the co-movements in a large panel of observed time series: allowing the dynamics of a few common components to change slowly over time permits very sparse modelling.

From a statistical point of view, we enlarge the domain of applicability of Principal Components Regression (PCR) in the frequency domain, one of the main tools in estimating dynamic factor models. The PCR technique is based on the spectral decomposition of the matrix describing the dynamics of the underlying process. We allow for non-stationarity of these dynamics by working with time-varying filters and, using the standard approach of letting both sample size and cross-sectional dimension tend to infinity, we derive weakly consistent estimators of the common components via the spectral decomposition of a localized smoothed periodogram, as, for example, defined in (A.2). The main theoretical contribution of the paper is the treatment of non-stationarity, both in the time-varying dynamic factor loadings and in the idiosyncratic components.
To develop our asymptotic theory of consistency, the approximation error is based on the new definition of time-varying dynamic principal components given in (3.8), while the estimation error is based on the new definition in (3.11) of the filter applied to the data. In order to localize (at rescaled time u) the corresponding definitions in
the stationary case properly, these definitions are based on the convolution of the eigenvectors. As a result, we have an additional source of error, the filtering error: this is due to the problem of time-varying filtering of a non-stationary process, which is new compared to the dynamic factor model of Forni et al. (2000).

A few important problems have not been addressed in this paper. First, our approach for recovering the time-varying common components is largely affected by the quality of estimation of the spectral density matrix. Two commonly used estimators are the segmented periodogram and the smoothed pre-periodogram (see Appendix A). Since the performance of these non-parametric estimators depends crucially on the localization in the time and frequency directions, it would be desirable to have an optimal procedure for selecting the corresponding smoothing bandwidths. However, in the case of time-varying spectra, this is still an open problem. The optimal values of the corresponding smoothing bandwidths are known to be proportional to T^{−1/6} (see Dahlhaus, 1996, Theorem 2.3) but depend on the derivatives of the unknown true spectrum Σ(u, ω). In our case, the matter is further complicated by the fact that we are interested not directly in the entries of the spectral matrix but in certain joint features such as eigenvalues and eigenvectors, which needs to be taken into account when selecting the bandwidths. Second, we have not discussed the important problem of using dynamic factor models for the purpose of prediction. Since our approach is based on PCR in the frequency domain, the resulting projection filters are two-sided and thus ill-suited for forecasting. In contrast to the stationary dynamic factor model, for which Forni et al. (2005) have developed a forecasting method based on one-sided projections, we face an additional problem due to the non-stationarity and the lack of information about how the dynamics of the process evolve outside the observed time period. Finally, we have considered the number of factors as fixed. In applications, the exact number of factors is usually unknown and needs to be determined from the data. We believe that recent work on data-based selection
of the number of factors in static and dynamic PCR-based factor models (e.g. Bai and Ng, 2002; Hallin and Liška, 2007) can be generalized, but this is left for future research as well.
Acknowledgements

The authors thank Christian Hafner, Marc Hallin and Marco Lippi for many stimulating discussions and valuable remarks, and two anonymous referees and the editor, Franz Palm, for their helpful comments on an earlier version of this article. Giovanni Motta and Rainer von Sachs acknowledge financial support from the contract ''Projet d'Actions de Recherche Concertées'' nr. 07/12/002 of the ''Communauté française de Belgique'', granted by the ''Académie universitaire Louvain''.

Appendix A. Estimation of time-varying spectral density matrices

The methods presented in the main part of this paper are based on estimators of the time-varying spectral density matrix. As the approximate factor model leaves the dependence structure of the idiosyncratic components unspecified, the spectral matrix is estimated non-parametrically. In this section, we review two common approaches for estimating time-varying spectral densities and cite relevant results to achieve uniform consistency of the spectral matrix estimator Σ̂_N(u, ω).

A.1. Smoothed pre-periodogram

Neumann and von Sachs (1997) proposed estimation of time-varying spectral densities based on the pre-periodogram

S_N(t/T, ω) = (1/2π) Σ_{k : 1 ≤ ⌊t + 1/2 ± k/2⌋ ≤ T} Y_NT(⌊t + (1 + k)/2⌋) Y_NT(⌊t + (1 − k)/2⌋)′ e^{−iωk},

where ⌊x⌋ denotes the largest integer smaller than x. The product Y_NT(⌊t + (1 + k)/2⌋) Y_NT(⌊t + (1 − k)/2⌋)′ can be regarded as a raw estimator of the local autocovariance Γ_N(t/T, k) given by

Γ_N(u, k) = ∫_{−π}^{π} Σ_N(u, ω) e^{iωk} dω.

Similarly, S_N(t/T, ω) is a raw estimate of the time-varying spectral matrix Σ_N(t/T, ω). Like the periodogram in the stationary case, the pre-periodogram S_N(t/T, ω) is not consistent and thus needs to be smoothed in time and frequency direction. This leads to kernel estimators of the form

Σ̂_NT(u, ω) = (2π/T²) Σ_{s,j=1}^{T} (1/(h_T b_T)) K^τ((u − s/T)/h_T) K^φ((ω − ω_j)/b_T) S_N(s/T, ω_j),   (A.1)

where ω_j = 2πj/T are the Fourier frequencies, K^τ and K^φ are two kernels, and h_T and b_T are the corresponding bandwidths in time and frequency direction, respectively. We assume that the kernels K^τ and K^φ have compact support on [−1/2, 1/2] and are of bounded variation with ∫_{−1/2}^{1/2} x K(x) dx = 0 and ∫_{−1/2}^{1/2} K(x) dx = 1.

A.2. Smoothed segmented periodogram

An alternative common approach for estimating time-varying spectral densities is based on the idea that a process with slowly varying spectral matrix Σ_N(u, ω) can be treated as stationary over short time intervals. This suggests using ordinary kernel estimators applied locally to the time series to obtain a time-varying estimate for the spectral density. More precisely, let (for L even)

D^L_NT(u, ω) = Σ_{s=0}^{L−1} h(s/L) Y_NT(⌊uT⌋ − L/2 + s + 1) e^{−iωs}

be the localized Fourier transform of the process over the segment from ⌊uT⌋ − L/2 + 1 to ⌊uT⌋ + L/2, where h : R → R is a data taper with h(x) = 0 for x ∉ [0, 1] and Fourier transform

H^L_k(ω) = Σ_{s=0}^{L−1} h(s/L)^k e^{−iωs}.

The localized or segmented periodogram is defined as

S^L_NT(u, ω) = (1/(2π H^L_2(0))) D^L_NT(u, ω) D^L_NT(u, ω)*.

Thus the periodogram estimator of the time-varying spectrum Σ_N(u, ω) is obtained from a segment of length L with midpoint ⌊uT⌋. Since the periodogram is not a consistent estimator, the segmented periodogram needs to be smoothed (see Dahlhaus, 1996), which leads to the kernel estimator

Σ̂^L_NT(u, ω) = (1/b_T) ∫_{−π}^{π} K^f((ω − µ)/b_T) S^L_NT(u, µ) dµ,   (A.2)

where K^f is a kernel with compact support [−1/2, 1/2] satisfying K^f(x) = K^f(−x) and ∫ K^f(x) dx = 1, and b_T is the bandwidth in frequency direction. Usual assumptions on the parameters L = L_T and b_T to achieve consistency are L_T → ∞, L_T/T → 0, b_T → 0 and b_T T → ∞ as T → ∞.

A.3. Uniform consistency

In Assumption 4, we require that the estimator of the spectral matrix is uniformly consistent. For the smoothed pre-periodogram and the smoothed segmented periodogram, uniform consistency has been shown in an unpublished manuscript by Dahlhaus. For convenience, we repeat here the assumptions and the statement. For simplicity, we treat only the case of evolutionary processes, which is sufficient for the purpose of this paper. The proof by Dahlhaus requires a number of technical assumptions. Let

V(g) = sup_{m∈N} sup_{0≤x_0<···<x_m≤1} Σ_{k=1}^{m} |g(x_k) − g(x_{k−1})|

be the total variation of a function g : [0, 1] → R. Furthermore, we set l_q(k) = max{|k| log^{1+q} |k|, 1} for q > 0.

Assumption 5. X_NT(t), 1 ≤ t ≤ T, is an N-dimensional stochastic process satisfying the following conditions:

(i) X_NT(t) has a representation

X_NT(t) = Σ_{k=0}^{∞} Φ_{N,k}(t/T) ε_N(t − k),   (A.3)

where the errors ε_N(t), t ∈ Z, are independent and identically distributed with mean E(ε_N(t)) = 0 and covariance matrix E(ε_N(t) ε_N(t)′) = I_N;
(ii) there exists a positive constant C_{ε,N} such that E|ε_i(t)|^k ≤ C^k_{ε,N} for all i = 1, . . . , N, t ∈ Z, and k ∈ N;
(iii) the coefficients Φ_{N,k}(u) = (φ_{ij,k}(u))_{i,j=1,...,N} satisfy

sup_{u∈[0,1]} |φ_{ij,k}(u)| ≤ C_φ/l_q(k)  and  V(φ_{ij,k}) ≤ C_φ/l_q(k)

for all i = 1, . . . , N, j = 1, . . . , r, and k ∈ Z, where C_φ is a positive finite constant not depending on T;
(iv) the time-varying spectral density Σ_N(u, ω) is twice differentiable with respect to u ∈ [0, 1] and ω ∈ [−π, π].
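The segmented periodogram of A.2 can be sketched compactly; the cosine-bell taper and the parameter values below are illustrative choices (the text leaves the taper h and the smoothing kernel K^f generic):

```python
import numpy as np

def segmented_periodogram(Y, u, L, omegas):
    """Localized (segmented) periodogram S^L_NT(u, w): tapered Fourier
    transform over the segment floor(uT)-L/2+1 .. floor(uT)+L/2,
    normalized by 2*pi*H_2^L(0).  A cosine-bell stands in for the taper h."""
    N, T = Y.shape
    c = int(u * T)
    seg = Y[:, c - L // 2 + 1:c + L // 2 + 1]   # N x L segment around floor(uT)
    s = np.arange(L)
    h = np.sin(np.pi * (s + 0.5) / L) ** 2      # data taper h(s/L), zero outside [0, 1]
    H2 = np.sum(h ** 2)                          # H_2^L(0)
    tapered = seg * h
    S = np.empty((len(omegas), N, N), dtype=complex)
    for m, w in enumerate(omegas):
        D = tapered @ np.exp(-1j * w * s)        # localized Fourier transform D^L_NT(u, w)
        S[m] = np.outer(D, D.conj()) / (2 * np.pi * H2)
    return S

rng = np.random.default_rng(3)
Y = rng.standard_normal((5, 1024))
S = segmented_periodogram(Y, u=0.5, L=128, omegas=np.linspace(0, np.pi, 32))
print(S.shape)
```

Each S[m] is a Hermitian rank-one matrix; the smoothed estimator (A.2) is then obtained by averaging these matrices over neighbouring frequencies with kernel weights K^f((ω − µ)/b_T)/b_T.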
Under these conditions, Dahlhaus showed that the above kernel estimators of the spectral density matrix are uniformly consistent in u ∈ [0, 1] and ω ∈ [−π, π]. More precisely, we have the following result.

Theorem 7 (Dahlhaus, unpublished manuscript, Thm. 3.1). Suppose that X_NT(t), 1 ≤ t ≤ T, is an N-dimensional evolutionary process satisfying Assumption 5. Furthermore, let Σ̂_NT(u, ω) be the smoothed pre-periodogram estimator in (A.1) with bandwidths h_T and b_T such that h_T, b_T → 0 and T h_T b_T → ∞ as T → ∞. Then for T h_T b_T ≫ log² T,

sup_{u ∈ [h_T/2, 1 − h_T/2]} sup_{ω ∈ (−π, π]} ‖Σ̂_NT(u, ω) − Σ_N(u, ω)‖ = O_p((h_T b_T T)^{−1/2} + h_T² + b_T²).   (A.4)
A similar result holds for the kernel estimator based on the segmented periodogram (Dahlhaus, unpublished manuscript, Remark 3.2) where then hT = LT /T . A.4. Application to evolutionary dynamic factor processes
As remarked in Section 3, the evolutionary dynamic factor model in (3.1) is not locally stationary according to the definition of Dahlhaus, nor does it have an evolutionary representation as in (A.3). Thus Theorem 7 is not directly applicable. However, Y_NT(t) can be written as a linear combination of two evolutionary and thus locally stationary processes; applying Theorem 7 to these processes, we can also establish uniform consistency for the spectral estimator Σ̂_N(u, ω). More precisely, let V_NT(t) be the 2N-dimensional process consisting of the common and the idiosyncratic components of Y_N(t), that is,

V_NT(t) = (X_NT(t)′, Z_NT(t)′)′.

Then V_NT(t) is an evolutionary process in the sense of (A.3) and has a time-varying spectral matrix given by

Σ^V_N(u, ω) = [ Σ^X_N(u, ω)   0
                0             Σ^Z_N(u, ω) ],

where Σ^X_N(u, ω) and Σ^Z_N(u, ω) are the time-varying spectral density matrices of X_NT(t) and Z_NT(t), respectively, and where the off-diagonal zero matrices are due to the orthogonality between X_NT(t) and Z_NT(t). Now suppose that Assumption 5 holds for X_NT(t) and Z_NT(t). Then, by Theorem 7, an estimator Σ̂^V_N(u, ω) based on X_NT(t) and Z_NT(t) would be uniformly consistent; note that such an estimator is hypothetical since X_NT(t) and Z_NT(t) are unobserved. Nevertheless, since the observed process Y_NT(t) is given by

Y_NT(t) = (I_N  I_N) V_NT(t),

its spectral matrix Σ_N(u, ω) can be written as

Σ_N(u, ω) = (I_N  I_N) Σ^V_N(u, ω) (I_N  I_N)′,

and Σ̂_N(u, ω) is related to Σ̂^V_N(u, ω) analogously. Consequently, the uniform consistency of Σ̂_N(u, ω) follows immediately from that of Σ̂^V_N(u, ω).

We note that Assumption 5 applied to the common component X_NT(t) and the idiosyncratic component Z_NT(t) provides a sufficient set of conditions for Assumption 4 to hold. We think that these technical conditions are not necessary to establish uniform consistency but are due to the method of proof by Dahlhaus.

Appendix B. Proofs

B.1. Proof of Proposition 1

The matrices Σ^X_N(u, ω) and Σ^Z_N(u, ω) are spectral matrices and thus Hermitian. By Weyl's Theorem (Lütkepohl, 1996, p. 75),

λ^X_Nj(u, ω) + λ^Z_NN(u, ω) ≤ λ_Nj(u, ω) ≤ λ^X_Nj(u, ω) + λ^Z_N1(u, ω),   j = 1, . . . , N.

On one hand, Assumption 3(i) implies for j = 1, . . . , r that λ_Nj(u, ω) ≥ λ^X_Nj(u, ω) tends to infinity a.e. in [−π, π] and uniformly in u ∈ [0, 1] as N → ∞. On the other hand, if j > r, we have λ^X_Nj(u, ω) = 0 and thus by Assumption 3(ii) λ_Nj(u, ω) ≤ λ^Z_N1(u, ω) ≤ λ^Z uniformly in ω ∈ [−π, π] and u ∈ [0, 1] for all N > r.

B.2. Proof of Corollary 2

Since Z_NT(t) is set to zero for t < 1 and t > T, we have for linear filters A_N(B)

A_N(B) Z_NT(t) = A_{N,t−1} Z_NT(1) + · · · + A_{N,t−T} Z_NT(T).

Since T is fixed, it therefore suffices to show that the cross-sectional averages A_N Z_NT(t) converge to zero in mean square for all sequences of vectors A_N with ‖A_N‖ → 0 as N → ∞. The assumptions on the spectral matrix Σ^Z_N(u, ω) imply that for fixed u ∈ [0, 1] the stationary process Z_NT(u, t), t ∈ Z, given by

Z_NT(u, t) = Υ_N(u, B) ε_N(t)

is idiosyncratic according to the original definition of Forni and Lippi (2001). In particular, it follows that

A_N Z_NT(t) = A_N Υ_N(t/T, B) ε_N(t)

converges to zero in mean square for all A_N with ‖A_N‖ → 0 as N → ∞.

B.3. Proof of Proposition 3

From (3.1) and (3.10) we have the decompositions

Y_iT(t) = X_iT(t) + Z_iT(t) = X_{iT,N}(t) + Z_{iT,N}(t),

which gives for the approximation error

|X_iT(t) − X_{iT,N}(t)|² = |Z_iT(t) − Z_{iT,N}(t)|².   (B.1)

Recalling the definition of the approximate common component

X_{iT,N}(t) = (φ_Ni ⋆ Ψ_N)(t/T, B) F(t) + (φ_Ni ⋆ Υ_N)(t/T, B) ε_N(t),

as well as the definition of the filter φ̂ in (3.11),

φ̂_Ni(t/T, B) = [P̂_Ni ⋆ (Q^r_N P̂′_N)](t/T, B),

we find

E|(φ_Ni ⋆ Υ_N)(t/T, B) ε_N(t)|² = E|r_iT(t)|² + E|Z_iT(t) − Z_{iT,N}(t)|² + E[r_iT(t)(Z_iT(t) − Z_{iT,N}(t))],   (B.2)

where we have set r_iT(t) = X_iT(t) − (φ_Ni ⋆ Ψ_N)(t/T, B) F(t). In view of (B.1), the stated mean square convergence follows if we show that the term on the left-hand side and the third term on the right-hand side both converge to zero as N tends to infinity. We start with the term on the left-hand side.
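The eigenvalue bounds used in the proof of Proposition 1 (B.1) are Weyl's inequalities for sums of Hermitian matrices; they are easy to verify numerically, with arbitrary random Hermitian matrices standing in for the spectral matrices at a fixed (u, ω):

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_hermitian(n):
    """Random Hermitian matrix (placeholder for a spectral matrix at fixed (u, w))."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (A + A.conj().T) / 2

N = 8
SX, SZ = rand_hermitian(N), rand_hermitian(N)
# np.linalg.eigvalsh returns ascending eigenvalues; flip to lambda_1 >= ... >= lambda_N
lx = np.linalg.eigvalsh(SX)[::-1]
lz = np.linalg.eigvalsh(SZ)[::-1]
l = np.linalg.eigvalsh(SX + SZ)[::-1]

# Weyl: lambda^X_j + lambda^Z_N <= lambda_j <= lambda^X_j + lambda^Z_1 for every j
assert np.all(lx + lz[-1] <= l + 1e-8) and np.all(l <= lx + lz[0] + 1e-8)
print("Weyl bounds hold")
```

Spectral matrices are additionally positive semi-definite, so the same bounds apply a fortiori in the proof.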
Lemma 8. For N ∈ ℕ, let ϕ_N(u, B) be an N-dimensional two-sided square-summable filter, and assume that

lim_{N→∞} sup_{u∈[0,1]} ∫_{−π}^{π} ‖ϕ_N(u, ω)‖² dω = 0.

Then

lim_{N→∞} sup_{T∈ℕ} sup_{1≤t≤T} E|(ϕ_N ⋆ Υ_N)(t/T, B) ε_N(t)|² = 0.

Proof. Since Z_N(t) has a zero mean, it follows that

sup_{T∈ℕ} sup_{1≤t≤T} E|(ϕ_N ⋆ Υ_N)(t/T, B) ε_N(t)|²
  ≤ sup_{u∈[0,1]} ∫_{−π}^{π} ϕ_N(u, ω) Σ_N^Z(u, ω) ϕ_N(u, ω)* dω
  ≤ sup_{u∈[0,1]} ∫_{−π}^{π} λ_N1^Z(u, ω) ‖ϕ_N(u, ω)‖² dω.

Assumption 3(ii) gives the result.

Thus it suffices to show that the conditions of the lemma hold for ϕ_N(u, ω) = φ_Ni(u, ω), that is,

lim_{N→∞} sup_{u∈[0,1]} ∫_{−π}^{π} ‖φ_Ni(u, ω)‖² dω = 0.   (B.3)

We observe that

‖φ_Ni(u, ω)‖² = φ_Ni(u, ω) φ_Ni(u, ω)* = Σ_{j=1}^{r} |p_Ni,j(u, ω)|².   (B.4)

By (3.5), (3.6), and Assumption 2, the terms in the sum on the right-hand side satisfy

|p_Ni,j(u, ω)|² ≤ σ̄_i / inf_{v∈[0,1]} λ_Nj(v, ω),   j = 1, …, r,   (B.5)

uniformly in u ∈ [0, 1] for almost all ω ∈ [−π, π]. Since by Proposition 1 the denominator on the right-hand side diverges as N → ∞, we have

lim_{N→∞} sup_{u∈[0,1]} |p_Ni,j(u, ω)| = 0

for almost all ω ∈ [−π, π]. Consequently, (B.4) converges to zero uniformly in u ∈ [0, 1]. As it is also bounded by 1, condition (B.3) now follows from the application of Lebesgue's dominated convergence theorem.

Next, we have to show that the mixed term on the right-hand side of (B.2) converges to zero. The proof is based on the next Lemma 9, for which we may assume without loss of generality that

λ_Nj(u, ω) ≥ 1 for all j = 1, …, N, N ∈ ℕ, u ∈ [0, 1], ω ∈ [−π, π].   (B.6)

Indeed, let U_N(t) be an N-dimensional orthonormal white noise process that is orthogonal to the factors F(t) and the idiosyncratic component Z_NT(t) at all leads and lags, and define

Z̃_NT(t) = Z_NT(t) + U_N(t) = Υ̃_N(t/T, B) ε̃_N(t)

with Υ̃_N(u, B) = (Υ_N(u, B), I_N) and ε̃_N(t) = (ε_N(t)′, U_N(t)′)′. Then, setting

Ỹ_NT(t) = X_NT(t) + Z̃_NT(t),   (B.7)

we find that the process Ỹ_NT(t) in (B.7) fulfills Assumptions 1–3 with

Σ̃_N^Z(u, ω) = Σ_N^Z(u, ω) + I_N   and   Σ̃_N^Y(u, ω) = Σ_N(u, ω) + I_N,

and thus λ̃_Nj^Z(u, ω) = λ_Nj^Z(u, ω) + 1 and λ̃_Nj^Y(u, ω) = λ_Nj(u, ω) + 1. Furthermore, P̃_Nj = P_Nj and thus also φ̃_Nj = φ_Nj for all 1 ≤ j ≤ N and N ∈ ℕ. Consequently, if Proposition 3 holds for the process Ỹ_NT(t), that is,

E|(φ_Ni ⋆ Ψ_N)(t/T, B) F(t) + (φ_Ni ⋆ Υ̃_N)(t/T, B) ε̃_N(t)|² → 0

as N → ∞, then the same result also holds for the original process Y_NT(t), since the process U_N(t) is orthogonal to the factors and idiosyncratic components and satisfies, by Lemma 8,

E|φ_Ni(t/T, B) U_N(t)|² → 0

as N tends to infinity.

Under condition (B.6), the function μ_Nj(u, ω) = [λ_Nj(u, ω)]^{−1/2} is well-defined for all ω ∈ [−π, π] and is bounded. Therefore it has a mean square convergent Fourier representation. Denote by μ_Nj(u, B) the corresponding square-summable filter, and for a fixed value of u ∈ (0, 1) consider the vector process {W_N(u, s), s ∈ ℤ} of the first r normalized principal components,

W_N(u, s) = (W_N1(u, s), …, W_Nr(u, s))′
  = (M_N^r ⋆ P_N^r ⋆ Ψ_N)(u, B) F(s) + (M_N^r ⋆ P_N^r ⋆ Υ_N)(u, B) ε_N(s)
  = M_N^r(u, B) F_N^r(u, s),

where F_N^r(u, s) is the r-dimensional vector containing the first r elements of the vector F_N(u, s) defined in (3.8),

M_N^r(u, ω) = diag{μ_N1(u, ω), …, μ_Nr(u, ω)} = [Λ_N^r(u, ω)]^{−1/2}   and   Λ_N^r(u, ω) = diag{λ_N1(u, ω), …, λ_Nr(u, ω)}   (B.8)

are r-dimensional diagonal matrices, and P_N^r(u, ω) = [P_N1(u, ω)′, …, P_Nr(u, ω)′]′ is the r × N matrix containing the first r rows of the N × N matrix P_N(u, ω). Using (3.2), (3.5), and (B.8), we find

M_N^r(u, ω)[P_N^r(u, ω) Σ_N^X(u, ω) P_N^r(u, ω)*] M_N^r(u, ω)*
  + M_N^r(u, ω)[P_N^r(u, ω) Σ_N^Z(u, ω) P_N^r(u, ω)*] M_N^r(u, ω)*
  = M_N^r(u, ω)[P_N^r(u, ω) Σ_N(u, ω) P_N^r(u, ω)*] M_N^r(u, ω)*
  = M_N^r(u, ω) Λ_N^r(u, ω) M_N^r(u, ω)* = I_r.

Thus the process W_N(u, s) is an orthonormal r-dimensional white noise, since its spectrum is the identity matrix for all u and for all ω.

Remark 2. Analogously to the (rescaled) time-varying principal components process F_N(u, s) defined in (3.8), the process W_N(u, s) fixes the dynamics at rescaled time point u and generates the whole process for all s ∈ ℤ. The following lemma shows that the space spanned by the normalized principal components W_N(u, s), which is identical to the space spanned by the principal components F_N(u, s) themselves, i.e. F_N^r(u) defined in (3.9), converges to the space spanned by the true factors, F.
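As a finite-dimensional, time-invariant analogue of the normalization in (B.8), the following sketch (dimensions N = 6, r = 2 and the generating covariance are hypothetical choices, not taken from the paper) rescales the first r principal components of a covariance matrix by λ_j^{−1/2} and checks that the result has identity covariance, mirroring the computation M_N^r Λ_N^r M_N^r* = I_r above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Static analogue of (B.8): hypothetical 6-dim covariance with a 2-dim
# "common" part of large variance plus idiosyncratic noise.
N, r = 6, 2
L = rng.normal(size=(N, r))
Sigma = 10 * L @ L.T + np.eye(N)

# Eigen-decomposition; eigh returns ascending order, so flip to descending.
lam, V = np.linalg.eigh(Sigma)
lam, V = lam[::-1], V[:, ::-1]

# P_r: first r (row) eigenvectors; M_r = diag(lam_j^{-1/2}) as in (B.8).
P_r = V[:, :r].T
M_r = np.diag(lam[:r] ** -0.5)

# Covariance of the normalized principal components W = M_r P_r y:
cov_W = M_r @ P_r @ Sigma @ P_r.T @ M_r
assert np.allclose(cov_W, np.eye(r))  # orthonormal, as in the text
```

The same cancellation is what makes W_N(u, s) an orthonormal white noise in the time-varying, frequency-domain setting.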
Lemma 9. Consider the orthogonal projection of W_N(u, s) on the factor space F spanned by the factors F_j(s), j = 1, …, r, s ∈ ℤ,

W_N(u, s) = G_N(u, B) F(s) + R_N(u, s),   (B.9)

and the orthogonal projection of F(s) on the space spanned by the normalized principal components,

F(s) = G_N(u, B^{−1})* W_N(u, s) + S_N(u, s),   (B.10)

where G_N(u, B) = (M_N ⋆ P_N^r ⋆ Ψ_N)(u, B) is an r × r two-sided square-summable filter, and where R_N(u, s) = (M_N ⋆ P_N^r ⋆ Υ_N)(u, B) ε_N(s) is orthogonal to F. Then, as N → ∞,
(i) sup_{u∈[0,1]} E‖R_N(u, s)‖² = o(1);
(ii) sup_{u∈[0,1]} E‖S_N(u, s)‖² = o(1).

Proof. Noting that

W_Nj(u, s) = (μ_Nj ⋆ P_Nj ⋆ Ψ_N)(u, B) F(s) + (μ_Nj ⋆ P_Nj ⋆ Υ_N)(u, B) ε_N(s),   (B.11)

we find for the time-varying spectral density of the j-th element of R_N(u, s), j = 1, …, r,

σ_Nj^R(u, ω) = μ_Nj²(u, ω) P_Nj(u, ω) Σ_N^Z(u, ω) P_Nj(u, ω)* ≤ λ̄^Z μ_Nj²(u, ω) |P_Nj(u, ω)|² ≤ λ̄^Z λ_Nj^{−1}(u, ω).

It follows from Proposition 1 that sup_{u∈[0,1]} σ_Nj^R(u, ω) → 0 for all ω as N → ∞. Since by (B.6) the supremum is also bounded by λ̄^Z, we get by Lebesgue's dominated convergence theorem

sup_{u∈[0,1]} ∫_{−π}^{π} σ_Nj^R(u, ω) dω → 0,

and the proof of (i) is completed. For the second part, we note that the spectral density of W_N(u, s) as well as that of F(s) is the identity matrix for all u and for all s, and thus, by (B.9) and (B.10), we have

I_r = G_N(u, ω) G_N(u, ω)* + Σ_N^R(u, ω) = G_N(u, ω)* G_N(u, ω) + Σ_N^S(u, ω)

for all u ∈ [0, 1] and ω ∈ [−π, π], where G_N(u, ω) is the transfer function of the filter G_N(u, B), and Σ_N^R(u, ω) and Σ_N^S(u, ω) are the spectral densities of R_N(u, s) and S_N(u, s), respectively. This implies tr(Σ_N^S(u, ω)) = tr(Σ_N^R(u, ω)), and (ii) is proved.

In order to show that E[r_iT(t)(Z_iT(t) − Z_iT,N(t))] tends to zero, it suffices to show that the cross-spectrum between r_iT(t) and Z_iT,N(t), f_Ni^12(u, ω) say, tends to zero uniformly in u ∈ [0, 1] and almost everywhere in [−π, π], since Z_iT(t) is orthogonal to F(t) and thus to r_iT(t) at all leads and lags. Since |f_Ni^12(u, ω)| is bounded by an integrable function, Lebesgue's dominated convergence theorem then implies that ∫_{−π}^{π} f_Ni^12(u, ω) dω converges to zero uniformly in u ∈ [0, 1], which proves the desired mean square convergence. Recall that

Z_iT,N(t) = (P_Ni ⋆ [(I_N − Q_N^r) P_N]* ⋆ Ψ_N)(t/T, B) F(t) + (P_Ni ⋆ [(I_N − Q_N^r) P_N]* ⋆ Υ_N)(t/T, B) ε_N(t).

First, we study the cross-spectrum between Z_iT,N(t) and X_iT(t), and then we study the cross-spectrum between Z_iT,N(t) and (φ_Ni ⋆ Ψ_N)(t/T, B) F(t). The cross-spectrum between Z_iT,N(t) and X_iT(t) is equal to the cross-spectrum between Z_iT,N(t) and ψ_i(t/T, B) S_N(t/T, t) because, by (B.10), X_iT(t) can be written as

X_iT(t) = ψ_i(t/T, B) F(t) = ψ_i(t/T, B) G_N(t/T, B^{−1})* W_N(t/T, t) + ψ_i(t/T, B) S_N(t/T, t).

As explained in Section 3.3, Z_iT,N(t) is orthogonal to the terms

F_Nj(t/T, t) = (P_Nj ⋆ Ψ_N)(t/T, B) F(t) + (P_Nj ⋆ Υ_N)(t/T, B) ε_N(t)

for j = 1, …, r at any lead and lag; then by (B.11) it is also orthogonal at any lead and lag to the terms W_Nj(t/T, t). Let g_Ni^12(t/T, ω) be the time-varying cross-spectrum between Z_iT,N(t) and ψ_i(t/T, B) S_N(t/T, t), and let σ_i,N^Z(t/T, ω) be the time-varying spectrum of Z_iT,N(t). We have

|g_Ni^12(u, ω)|² ≤ σ_i,N^Z(u, ω) ψ_i(u, ω) Σ_N^S(u, ω) ψ_i(u, ω)*.

By (3.3), σ_i,N^Z(u, ω) ≤ σ_i(u, ω) ≤ C_Σ,N uniformly in u and ω. Furthermore,

sup_{u∈[0,1]} ‖ψ_i(u, ω) Σ_N^S(u, ω) ψ_i(u, ω)*‖ ≤ sup_{u∈[0,1]} ‖ψ_i(u, ω)‖² sup_{u∈[0,1]} ‖Σ_N^S(u, ω)‖,

which converges to 0 a.e. in [−π, π] by the assumption on the coefficients ψ_i in Assumption 1(ii) and by Lemma 9. By the same arguments, the time-varying cross-spectrum between Z_iT,N(t) and (φ_Ni ⋆ Ψ_N)(t/T, B) F(t) equals that between Z_iT,N(t) and (φ_Ni ⋆ Ψ_N)(t/T, B) S_N(t/T, t). Denoting it by h_Ni^12(u, ω), we have

|h_Ni^12(u, ω)|² = σ_i,N^Z φ_Ni(u, ω) Ψ_N(u, ω) Σ_N^S(u, ω) Ψ_N(u, ω)* φ_Ni(u, ω)* ≤ σ_i,N^Z λ_N1^S(u, ω) φ_Ni(u, ω) Ψ_N(u, ω) Ψ_N(u, ω)* φ_Ni(u, ω)*.

Since Σ_N^X(u, ω) = Ψ_N(u, ω) Ψ_N(u, ω)* and Σ_N(u, ω) = Σ_N^X(u, ω) + Σ_N^Z(u, ω), we obtain

|h_Ni^12(u, ω)|² ≤ σ_i,N^Z λ_N1^S(u, ω) φ_Ni(u, ω) Σ_N(u, ω) φ_Ni(u, ω)* = σ_i,N^Z λ_N1^S(u, ω) Σ_{j=1}^{r} |p_Ni,j(u, ω)|² λ_Nj(u, ω).

By Lemma 9, λ_N1^S(u, ω) tends to zero for almost all ω ∈ [−π, π] uniformly in u ∈ [0, 1]. Furthermore, by (B.5), the sum is bounded by the constant σ̄_i. Thus sup_{u∈[0,1]} |h_Ni^12(u, ω)| tends to zero almost everywhere, and the proof of Proposition 3 is complete.

B.4. Proof of Proposition 5

We need to show that for all N ∈ ℕ the estimation error

X̂_iT,N(t) − X_iT,N(t) = [φ̂_NT,i(t/T, B) − φ_N,i(t/T, B)] Y_NT(t)

satisfies

lim_{T→∞} sup_{Tu_T ≤ t ≤ T(1−u_T)} P[|X̂_iT,N(t) − X_iT,N(t)| > ε] = 0   (B.12)

for all ε > 0. Recall that

φ_Ni(t/T, B) = Σ_{h=−M_T}^{M_T} φ_Ni,h(t/T) B^h   and   φ̂_NT,i(t/T, B) = Σ_{h=−M_T}^{M_T} φ̂_Ni,h(t/T) B^h.
By Chebyshev's inequality we obtain for the probability in (B.12)

P[|X̂_iT,N(t) − X_iT,N(t)| > ε] = P[|Σ_{h=−M_T}^{M_T} Δφ̂_Ni,h(t/T) B^h Y_NT(t)| > ε]
  ≤ ε^{−2} E|Σ_{h=−M_T}^{M_T} Δφ̂_Ni,h(t/T) B^h Y_NT(t)|²,

where we have set Δφ̂_Ni,h(t/T) = φ̂_Ni,h(t/T) − φ_Ni,h(t/T); analogously, let in the sequel Δφ̂_Ni(u, ω) = φ̂_Ni(u, ω) − φ_Ni(u, ω). Further note that

E|Σ_{h=−M_T}^{M_T} Δφ̂_Ni,h(t/T) B^h Y_NT(t)|² = E|Σ_{m=1}^{N} Σ_{k=−M_T}^{M_T} Δφ̂_Nim,k(t/T) Y_mT(t − k)|².

In order to bound the expectation on the right-hand side, we apply Cauchy's inequality on Σ_m Σ_k Δφ̂_Nim,k(t/T) Y_mT(t − k) to get as upper bound

E[ Σ_{m=1}^{N} ( Σ_{k=−M_T}^{M_T} Δφ̂_Nim,k(t/T)² )^{1/2} ( Σ_{k=−M_T}^{M_T} Y_mT(t − k)² )^{1/2} ]²,

which then can simply be bounded by

N² sup_{1≤m≤N} E[ ( Σ_{k=−M_T}^{M_T} Δφ̂_Nim,k(t/T)² ) ( Σ_{k=−M_T}^{M_T} Y_mT(t − k)² ) ].   (B.13)

To bound, for fixed N, the expectation we apply again Cauchy's inequality and will show that

E[ Σ_{k=−M_T}^{M_T} Δφ̂_Nim,k(t/T)² ]² = O((ρ_T)^{−2})   (B.14)

and that

E[ Σ_{k=−M_T}^{M_T} Y_mT(t − k)² ]² = O(M_T²),   (B.15)

which is sufficient to finish the proof because, by assumption, M_T/ρ_T → 0 as T → ∞.

We will later show that Assumption 4 implies the following uniform rate of convergence of the estimated filters φ̂_Ni(u, ω) to the true filters φ_Ni(u, ω): for fixed N ∈ ℕ,

sup_{u∈[u_T,1−u_T]} sup_{ω∈[−π,π]} ‖Δφ̂_Ni(u, ω)‖ = O_p((ρ_T)^{−1}),   (B.16)

and, in particular, for each 1 ≤ m ≤ N,

sup_{u∈[u_T,1−u_T]} sup_{ω∈[−π,π]} E[Δφ̂_Ni,m(u, ω)] = O((ρ_T)^{−1})   (B.17)

and

sup_{u∈[u_T,1−u_T]} sup_{ω∈[−π,π]} Var(Δφ̂_Ni,m(u, ω)) = O((ρ_T)^{−2}).   (B.18)

Further note that

Σ_{k=−M_T}^{M_T} Δφ̂_Nim,k(t/T)² ≤ ∫_{−π}^{π} |Δφ̂_Ni,m(t/T, ω)|² dω,

and that this random variable is stochastically bounded from above because Δφ̂_Ni,m(t/T, ω) is a difference of (a function of) eigenvectors of the empirical and the true spectral matrix of fixed dimension N. Hence we obtain, using (B.17) and (B.18), that

E[ Σ_{k=−M_T}^{M_T} Δφ̂_Nim,k(t/T)² ]² = O((ρ_T)^{−2}),

and hence (B.14). Next, the expectation in (B.15) is simply bounded by C M_T² E(Y_mT(t)⁴), which is of order O(M_T²) since Y_NT(t) has uniformly bounded fourth moments. Consequently, the upper bound in (B.13) converges to zero as T → ∞, which completes the proof.

It remains to show that Eqs. (B.16)–(B.18) are a consequence of Assumption 4. The proof parallels parts of Lemma 4.2 of Forni et al. (2004), which is essentially built on the following Taylor expansions to be found in the proof of Brillinger (1981), Theorem 9.2.4. The idea is to derive the rates of convergence of the eigenvalues λ̂_j(u, ω) and eigenvectors P̂_Nj(u, ω) of Σ̂_NT(u, ω) to the limiting ones λ_j(u, ω) and P_Nj(u, ω) of Σ_N(u, ω) by relating their difference in first order to the difference Σ̂_NT(u, ω) − Σ_N(u, ω). (Note that this is done for fixed N, so most of the double asymptotic considerations of Lemma 4.2 of Forni et al. (2004) can be ignored.) We have

λ̂_j(u, ω) − λ_j(u, ω) = P_Nj(u, ω)(Σ̂_NT(u, ω) − Σ_N(u, ω)) P_Nj(u, ω)* + ⋯

and

P̂_Nj(u, ω) − P_Nj(u, ω) = Σ_{i≠j} [P_Ni(u, ω)(Σ̂_NT(u, ω) − Σ_N(u, ω)) P_Nj(u, ω)*] / (λ_j(u, ω) − λ_i(u, ω)) · P_Ni(u, ω) + ⋯.

(In both expansions the ⋯ denote terms of higher order in the difference between Σ̂_NT(u, ω) and Σ_N(u, ω).) For details on the derivation of these Taylor expansions see Wilkinson (1965), Ch. 2, but note that essentially they can be derived from a first-order Taylor expansion of the ‘‘symbolic’’ eigenvalue–eigenvector equation Zf_j = Ψ_j f_j about Z = Σ_N, which gives

(Σ̂_NT − Σ_N) P_Nj* + Σ_N (P̂_Nj − P_Nj)* = (λ̂_j − λ_j) P_Nj* + λ_j (P̂_Nj − P_Nj)*,

and, after multiplication of this equation from the left by any other P_Ni with i ≠ j, a system of equations of the type

P_Ni (Σ̂_NT − Σ_N) P_Nj* = (λ_j − λ_i) P_Ni (P̂_Nj − P_Nj)*.
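In a static setting the two first-order expansions can be verified numerically. The sketch below (matrix dimension, eigenvalue gaps, and perturbation scale are arbitrary illustrative choices; the perturbation stands in for Σ̂_NT − Σ_N) perturbs a symmetric matrix with simple eigenvalues and checks both formulas up to higher-order terms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric matrix with well-separated (simple) eigenvalues, plus a small
# symmetric perturbation standing in for Sigma_hat - Sigma.
N = 5
A = rng.normal(size=(N, N))
Sigma = A @ A.T + np.diag(np.arange(N) * 10.0)
eps = 1e-5 * rng.normal(size=(N, N))
eps = (eps + eps.T) / 2
Sigma_hat = Sigma + eps

lam, V = np.linalg.eigh(Sigma)          # columns of V are eigenvectors
lam_hat, V_hat = np.linalg.eigh(Sigma_hat)

j = 2  # any simple eigenvalue

# First-order expansion of the eigenvalue: lam_hat_j - lam_j ~ p_j' eps p_j.
first_order = V[:, j] @ eps @ V[:, j]
assert abs((lam_hat[j] - lam[j]) - first_order) < 1e-8

# First-order expansion of the eigenvector (simple-eigenvalue case):
# p_hat_j - p_j ~ sum_{i != j} [p_i' eps p_j / (lam_j - lam_i)] p_i.
corr = sum((V[:, i] @ eps @ V[:, j]) / (lam[j] - lam[i]) * V[:, i]
           for i in range(N) if i != j)
# Fix the sign indeterminacy of the computed eigenvector before comparing.
v_hat = V_hat[:, j] * np.sign(V_hat[:, j] @ V[:, j])
assert np.linalg.norm((v_hat - V[:, j]) - corr) < 1e-7
```

The residuals shrink quadratically in the size of the perturbation, which is exactly the "higher-order terms" claim in the expansions above.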
Note that in the cited sources, no multiplicity of the eigenvalues has been considered. However, there exist some techniques to group those multiple eigenvalue–eigenvectors pairs into blocks and repeat the same technique over blocks. To avoid lengthy technicalities we just refer to, e.g., Anderson (1963), and, more generally, to Wilkinson (1965, Chap. 2), which essentially suggests that a result of the same type as in (B.16) can be derived, with a rate of consistency which is possibly only a fractional power of the rate (ρT )−1 derived for the case of simple eigenvalues. However, note that any arbitrarily slow rate will be sufficient for the purpose of our proof. To finally give some insights into the principles of this proof, details of which we omit here, we note that as in Forni et al.
(2004), Lemma 4.2, we proceed via three steps, which in our situation, again, need to be studied only for a fixed N. Without loss of generality, we sketch these constructive steps for the case of one of our proposed spectral estimators, the smoothed pre-periodogram defined in (A.1), which fulfills Assumption 4 with a rate ρ_T = (h_T b_T T)^{1/2} delivered by Theorem 7.

(i) Controlling the ‘‘smoothing’’ bias: Let Σ_NT^cc(u, ω) = K_h^τ K_b^Φ ∗ Σ_N(u, ω) denote the convolution of Σ_N with the product kernel in time, K_h^τ(u), and frequency, K_b^Φ(ω) (see, e.g., also (A.1)), and let P_Nj^cc(u, ω) denote its j-th eigenvector. Then we have (paralleling Forni et al. (2004), Lemma 4.2(ii)) that uniformly in u and ω, for all j,

‖P_Nj^cc(u, ω) − P_Nj(u, ω)‖ = O(h_T²) + O(b_T²),

a result that is based on the classical Taylor expansion of order 2 of the kernel convolution (in time and in frequency).

(ii) Controlling the variance: uniformly in u and ω, for all j,

Var(P̂_Nj(u, ω)) = O((h_T b_T T)^{−1}),

essentially paralleling Forni et al. (2004), Lemma 4.2(iii) and Eq. (9.4.17) in Brillinger (1981). Obviously, here we use that the asymptotic variance of Σ̂_NT − Σ_NT^cc is of the same given order; see Appendix A.4.

(iii) Controlling uniformly the bias:

‖E P̂_Nj(u, ω) − P_Nj^cc(u, ω)‖ = O((h_T b_T T)^{−1}),

because it is essentially bounded by the bias of E Σ̂_NT − Σ_NT^cc, which is also of order O((h_T b_T T)^{−1}) (paralleling Brillinger (1981), Theorem 7.4.1 and the proof of Forni et al. (2004), Lemma 4.2(iv)).

Note that, in order to have these results on controlling separately the bias and the variance of P̂_Nj − P_Nj uniformly over time u and frequency ω, we observe that the following conditions on the second (partial) derivatives of Σ_N(u, ω) with respect to u and ω are indeed fulfilled in our situation:

sup_u sup_ω ‖P_Ni(u, ω)(d_u² Σ_N(u, ω)) P_Nj(u, ω)* P_Ni(u, ω)‖ ≤ C_u(N)

and

sup_u sup_ω ‖P_Ni(u, ω)(d_ω² Σ_N(u, ω)) P_Nj(u, ω)* P_Ni(u, ω)‖ ≤ C_ω(N),

with constants C_u(N) and C_ω(N) which remain bounded over T, for each fixed N. For further details on this proof we refer to the given references to avoid lengthy technicalities.

References

Anderson, T.W., 1963. Asymptotic theory for principal component analysis. Annals of Mathematical Statistics 34, 122–148.
Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.
Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70 (1), 191–221.
Brillinger, D.R., 1981. Time Series: Data Analysis and Theory. Holt, Rinehart and Winston, New York.
Chamberlain, G., Rothschild, M., 1983. Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51, 1281–1304.
Dahlhaus, R., 1996. Asymptotic statistical inference for nonstationary processes with evolutionary spectra. In: Robinson, P.M., Rosenblatt, M. (Eds.), Athens Conference on Applied Probability and Time Series Analysis, vol. II. Springer-Verlag, New York.
Dahlhaus, R., 1997. Fitting time series models to nonstationary processes. The Annals of Statistics 25, 1–37.
Dahlhaus, R., 2000. A likelihood approximation for locally stationary processes. The Annals of Statistics 28 (6), 1762–1794.
Dahlhaus, R., 2008. Uniform convergence of a spectral density estimate for locally stationary processes (unpublished manuscript).
Del Negro, M., Otrok, C., 2008. Dynamic factor models with time-varying parameters: measuring changes in international business cycles. Staff Report 326. Federal Reserve Bank of New York.
Fancourt, C.L., Principe, J.C., 1998. Competitive principal components analysis for locally stationary time series. IEEE Transactions on Signal Processing 46, 3068–3081.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalized dynamic factor model: identification and estimation. The Review of Economics and Statistics 82, 540–554.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2004. The generalized dynamic factor model: consistency and rates. Journal of Econometrics 119, 231–255.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2005. The generalized dynamic factor model: one-sided estimation and forecasting. Journal of the American Statistical Association 100 (471), 830–840.
Forni, M., Lippi, M., 2001. The generalized dynamic factor model: representation theory. Econometric Theory 17, 1113–1141.
Hallin, M., Liška, R., 2007. Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association 102 (478), 603–617.
Hoffman, D., Pagan, A.R., 1989. Post-sample prediction tests for generalized method of moment estimators. Oxford Bulletin of Economics and Statistics 51, 333–343.
Jungbacker, B., Koopman, S.J., 2008. Likelihood-based analysis for dynamic factor models. Technical Report. Department of Econometrics, VU University Amsterdam.
Lütkepohl, H., 1996. Handbook of Matrices. John Wiley & Sons.
Molenaar, P.C.M., de Gooijer, J.G., Schmitz, B., 1992. Dynamic factor analysis of nonstationary multivariate time series. Psychometrika 57, 333–349.
Motta, G., 2009. Evolutionary factor analysis. Ph.D. Thesis. Institut de statistique, Université catholique de Louvain, Belgium.
Motta, G., Hafner, C., von Sachs, R., 2011. Locally stationary factor models: identification and nonparametric estimation. Econometric Theory 27 (6).
Neumann, M., von Sachs, R., 1997. Wavelet thresholding in anisotropic function classes and application to adaptive estimation of evolutionary spectra. The Annals of Statistics 25 (1), 38–76.
Pan, J., Yao, Q., 2008. Modelling multiple time series via common factors. Biometrika 95, 365–379.
Priestley, M.B., Subba Rao, T., 1969. A test for non-stationarity of time series. Journal of the Royal Statistical Society B 31, 140–149.
Stock, J.H., Watson, M.W., 2005. Implications of dynamic factor models for VAR analysis. Working Paper Series 11467. National Bureau of Economic Research.
Stock, J.H., Watson, M.W., 2008. Forecasting in dynamic factor models subject to structural instability. In: Castle, J., Shephard, N. (Eds.), The Methodology and Practice of Econometrics: A Festschrift in Honour of Professor David F. Hendry. Oxford University Press, Oxford.
Wilkinson, J.H., 1965. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford.
Journal of Econometrics 163 (2011) 71–84
Testing for structural breaks in dynamic factor models

Jörg Breitung (University of Bonn, Institute of Econometrics, 53113 Bonn, Germany)
Sandra Eickmeier (Deutsche Bundesbank, Frankfurt, Germany)
Article history: Available online 15 December 2010
JEL classification: C12; C33
Keywords: Structural break; Factor model; LM test
Abstract

In this paper we investigate the consequences of structural breaks in the factor loadings for the specification and estimation of factor models based on principal components and suggest procedures for testing for structural breaks. It is shown that structural breaks severely inflate the number of factors identified by the usual information criteria. The hypothesis of a structural break is tested by using LR, LM and Wald statistics. The LM test (which performs best in our Monte Carlo simulations) is generalized to test for structural breaks in factor models where the break date is unknown and the common factors and idiosyncratic components are serially correlated. The proposed test procedures are applied to datasets from the US and the euro area. © 2010 Elsevier B.V. All rights reserved.
1. Introduction In recent years dynamic factor models have become popular for analyzing and forecasting large macroeconomic datasets. These datasets include hundreds of variables and span large time periods. Thus, there is a substantial risk that the data generating process for a subset of variables or all variables has undergone structural breaks during the sampling period. Stock and Watson (2002) argue that factor models are either able to cope with breaks in the factor loadings in a fraction of the series, or can account for moderate parameter drift in all of the series. However, in empirical applications parameters may change dramatically due to important economic events, such as the collapse of the Bretton Woods system, or changes in the monetary policy regime, such as the conduct of monetary policy in the 1980s in the US or the formation of the European Monetary Union (EMU). There may also be more gradual but nevertheless fundamental changes in economic structures that may have led to significant changes in the comovements of variables, such as those related to globalization and technological progress. The common factors may become more (less) important for some of the variables and, therefore, the loading coefficients attached to the common factors are expected to become larger (smaller). If one is interested in estimating the common components or assessing the transmission of common shocks to specific variables, ignoring structural breaks may give misleading results. Variations in dynamic factor loadings have been considered before. The study most closely related to ours is that of Stock and
∗ Corresponding author. Tel.: +49 228 739201; fax: +49 228 739189. E-mail address: [email protected] (J. Breitung).
doi:10.1016/j.jeconom.2010.11.008
Watson (2008), who examine the implications of structural breaks in the factor loadings. Consequently, we will compare our testing approach with theirs. Del Negro and Otrok (2008) and Eickmeier et al. (2009) have suggested a model where the factor loadings are modelled as random walks. Finally, Banerjee and Marcellino (2008) have investigated the consequences of time variation in the factor loadings for forecasting based on Monte Carlo simulations and find that it worsens the forecasts, in particular for small samples. In our theoretical analysis, we first consider the effects of structural breaks in Section 2. It turns out that structural breaks in the factor loadings increase the dimension of the factor space. The reason is that in the case of a single structural break, two sets of common factors are needed to represent the common components in the two subsamples before and after the break. Thus, structural breaks in the factor loadings lead not only to inconsistent estimates of the loadings but also to a larger dimension of the factor space. If we are only interested in decomposing variables into common and idiosyncratic components, it is sufficient to increase the number of factors such that the factor space is large enough to represent the different subspaces of the two regimes. However, if we are interested in a more parsimonious factor representation that allows us to recover the original factors, the estimation has to account for the structural breaks in the factor loadings. In Section 3, we consider alternative versions of a Chow-type test for a structural break in a strict factor model, where the components are assumed to be white noise. The idea is to treat the estimated factors as if they were known. We show that under certain conditions on the relative rates of N and T the estimation error of the common factors does not affect the asymptotic distribution of the test statistic.
A variant of the test procedure for an unknown break date is considered in Section 4.
J. Breitung, S. Eickmeier / Journal of Econometrics 163 (2011) 71–84
In Section 5, the LM test procedure is generalized to allow for serially correlated factors and idiosyncratic components. By adapting the GLS estimation procedure suggested by Breitung and Tenhofen (2008) we obtain a test procedure that is robust to individual-specific dynamics of the components. The LM version of the test is shown to have reliable size properties, whereas the OLS-based test statistic with robust standard errors used in Stock and Watson (2008) may exhibit severe size distortions in finite samples. Two empirical applications of the test procedures are presented in Section 6. On the basis of a large US macroeconomic dataset provided by Stock and Watson (2005), we examine whether January 1984 (which is usually associated with the beginning of the so-called Great Moderation) coincides with a structural break in the factor loadings. On the basis of the LM test, we find evidence of a break around that date. By testing for shifts in the loadings of specific variables we are able to shed some light on the possible sources of the structural break. We also apply the LM test to a large euro-area dataset used in Altissimo et al. (2007). We find evidence for breaks at the dates of the handover of monetary policy from European national central banks to the European Central Bank (ECB) (stage 3 of EMU) and – to a lesser extent – the signing of the Maastricht treaty. Breaks seem to have occurred relatively frequently in the loadings of Spanish and Italian variables around the two events. The changeover to a single monetary policy in the euro area was associated with relatively frequent structural breaks in the loadings of nominal variables, whereas evidence of structural breaks is mainly found for industrial production series around the signing of the Maastricht treaty.

2. The effect of structural breaks on the number of factors

Consider a factor model with r factors¹ f_t = [f_1t, …, f_rt]′ that is subject to a common break at time T*:

y_it = f_t′ λ_i^(1) + ε_it   for t = 1, …, T*,   (1)
y_it = f_t′ λ_i^(2) + ε_it   for t = T* + 1, …, T,   (2)

where t = 1, …, T denotes the time period and i = 1, …, N indicates the cross-section unit.² The assumption of a common structural break at T* is made for convenience only. A generalization to situations with variable-specific break dates is straightforward and is considered in the subsequent sections. The vector of idiosyncratic errors ε_·t = [ε_1t, …, ε_Nt]′ is assumed to be i.i.d. with covariance matrix E(ε_·t ε_·t′) = Σ, where Σ is a diagonal matrix. Furthermore, f_t is assumed to be white noise with positive definite covariance matrix E(f_t f_t′) = Φ. Let Λ^(k) = [λ_1^(k), …, λ_N^(k)]′, k = 1, 2; also τ = T*/T ∈ (0, 1) denotes the relative break date. The unconditional covariance matrix of the vector y_·t = [y_1t, …, y_Nt]′ results as

E[ T^{−1} Σ_{t=1}^{T} y_·t y_·t′ ] = τ Λ^(1) Φ Λ^(1)′ + (1 − τ) Λ^(2) Φ Λ^(2)′ + Σ ≡ Ψ + Σ.

Since the matrix Ψ = τ Λ^(1) Φ Λ^(1)′ + (1 − τ) Λ^(2) Φ Λ^(2)′ is a sum of two matrices of rank r, the rank of the covariance matrix of the common component, Ψ, is 2r in general. This is due to the fact
¹ Note that the notation does not refer to a particular normalization of the (true) common factors. In our asymptotic considerations we follow Bai (2003) and adopt
that a break in the factor loadings implies two linearly independent factors for the first and second subsamples. It follows that if the structural break in the factor loadings is ignored, the number of common factors is inflated by a factor of 2. More generally, if there are k structural breaks in the factor loadings of r common factors, the number of factors for the whole sample is (k + 1)r, in general. The practical implication of this result is that if one is only interested in a decomposition of the time series y_it into a common component and an idiosyncratic component, then it is sufficient to increase the number of common factors accordingly. However, if one is interested in a consistent estimator of the factors and the factor loadings, then it is important to account for the break in the factor loadings, e.g. by splitting the sample at T* and re-estimating the factor model for the two subsamples. For illustration consider the previous example with r = 1, T* = T/2 and λ_i^(2) = λ_i^(1) + b. Define an additional factor as

f_t* = f_t for t = 1, …, T*;   f_t* = −f_t for t = T* + 1, …, T.

It is not difficult to see that the factor model with a structural break can be represented as

y_it = λ_1i* f_t + λ_2i* f_t* + ε_it,   (3)

where λ_1i* = λ_i^(1) + (b/2) and λ_2i* = −b/2. Note that the factors in this representation are ‘‘orthogonal’’ in the sense that E(T^{−1} Σ_{t=1}^{T} f_t f_t*) = 0. This example demonstrates that a factor model with a structural break admits a factor representation with a higher dimensional factor space.

To investigate the effects of a structural break on the information criteria suggested by Bai and Ng (2002) for selecting the number of common factors, a Monte Carlo experiment is performed. The data are generated by a factor model y_it = λ_it f_t + ε_it, where the single factor f_t and the idiosyncratic components are i.i.d. with variances E(f_t²) = 1 and E(ε_it²) = σ_i², and σ_i is uniformly distributed with σ_i ∼ U(0.5, 1.5). The structural break in the loadings is specified as

λ_it = λ_i for t = 1, …, T/2;   λ_it = λ_i + b for t = T/2 + 1, …, T,   (4)

and λ_i is a normally distributed random variable with λ_i ∼ N(1, 1). Therefore, the parameter b measures the importance of the structural break. Table 1 presents the average of the number of factors selected by the IC_p1 criterion suggested by Bai and Ng (2002). The results show that if the break is large, the selection procedure overestimates the number of common factors. As predicted by our theoretical considerations, the information criterion indicates two factors instead of one if b gets large. Thus, ignoring a break in the factor loadings tends to identify too many factors in the sample. This may be misleading, as the larger number of factors is merely an artifact of the structural breaks.

It is interesting to note that the situation is comparable to the problem of estimating a dynamic factor model within a static framework. As argued by Stock and Watson (2002), lags of the original factors can be accounted for by including additional factors. If one is merely interested in a decomposition into common and idiosyncratic components (e.g. in forecasting), then it is sufficient to estimate the static representation with a larger number of factors. However, if one is interested in the original (‘‘primitive’’ or ‘‘dynamic’’³) factors, then the static factors are inappropriate as they involve linear combinations of current and lagged values of the original factors.
a particular normalization such that T^{−1} Σ_{t=1}^{T} f_t f_t′ →_p I_r.
² As usual in the literature on factor models, we neglect possible deterministic terms like a constant or a linear time trend. In empirical practice the variables in the dataset are routinely de-meaned or detrended.
3 See Bai and Ng (2007) and Amengual and Watson (2007).
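The rank argument of Section 2 can be illustrated numerically. The following sketch (N = 20, τ = 0.5, and the seed are hypothetical choices) builds the population covariance Ψ of the common component for r = 1 and checks that its rank doubles as soon as b ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Population covariance of the common component under a loading break,
# as in Section 2: Psi = tau*L1 Phi L1' + (1-tau)*L2 Phi L2', with r = 1.
N, tau = 20, 0.5
lam = rng.normal(1.0, 1.0, size=(N, 1))   # loadings lambda_i ~ N(1, 1)
Phi = np.eye(1)                            # E(f_t^2) = 1

def common_cov(b):
    L1, L2 = lam, lam + b                  # break of size b, as in (4)
    return tau * L1 @ Phi @ L1.T + (1 - tau) * L2 @ Phi @ L2.T

# Without a break the common component has rank r = 1; with a break, 2r = 2.
assert np.linalg.matrix_rank(common_cov(0.0)) == 1
assert np.linalg.matrix_rank(common_cov(1.0)) == 2
```

Generically the two loading vectors λ and λ + b·1 are linearly independent, which is exactly why the factor space doubles.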
Table 1
Averages of the estimated number of common factors.

N = 50
b     T = 50   T = 100  T = 200  T = 300
0.0   1.000    1.000    1.000    1.000
0.3   1.003    1.001    1.000    1.000
0.5   1.100    1.197    1.325    1.398
0.7   1.436    1.729    1.894    1.945
1.0   1.804    1.965    1.999    1.999

N = 100
b     T = 50   T = 100  T = 200  T = 300
0.0   1.000    1.000    1.000    1.000
0.3   1.000    1.000    1.002    1.001
0.5   1.126    1.369    1.739    1.866
0.7   1.525    1.888    1.994    2.000
1.0   1.881    1.995    2.000    2.000

N = 200
b     T = 50   T = 100  T = 200  T = 300
0.0   1.000    1.000    1.000    1.000
0.3   1.001    1.002    1.032    1.074
0.5   1.166    1.531    1.968    1.998
0.7   1.596    1.969    2.000    2.000
1.0   1.926    2.000    2.000    2.000

N = 300
b     T = 50   T = 100  T = 200  T = 300
0.0   1.000    1.000    1.000    1.000
0.3   1.002    1.008    1.063    1.274
0.5   1.165    1.657    1.992    2.000
0.7   1.620    1.980    2.000    2.000
1.0   1.942    2.000    2.000    2.000
OLS estimates from a regression of yit on ft for two subsamples according to t = 1, . . . , Ti∗ and t = Ti∗ + 1, . . . , T . The second statistic is the Wald (W) test of the hypothesis ψi = 0 in the regression yit = λ′i ft + ψi′ ft∗ + vit ,
0 ∗ ft = ft
(9)
(10)
where εit = yit − λi ft denotes the estimated idiosyncratic component. The score statistic is denoted by si = T · R2i , where R2i denotes the uncentered R2 of the ith regression. To study the limiting null distributions of the three test statistics we first invoke the usual assumptions of the approximate factor model.
′
Assumption 1. Let yit be generated by the factor model yit = λ′i ft +εit , where it is assumed that λi , ft , and εit satisfy Assumptions A–G of Bai (2003).
Consider a model with an individual-specific structural break at period Ti∗ given by (1) yit = ft′ λi + εit
for t = 1, . . . , Ti∗
(5)
(2) yit = ft′ λi + εit
for t = Ti∗ + 1, . . . , T ,
(6)
where ft is an r-dimensional vector of common factors. Under the null hypothesis we assume (7)
To test this null hypothesis, the usual Chow test statistics are formed by replacing the unknown vector of common factors, ft , by its principal components (PC) estimator, ft .4 Applying the likelihood ratio (LR) principle for testing the ith variable gives rise to the statistic
for t = 1, . . . , Ti∗ for t = Ti∗ + 1, . . . , T .
εit = θi′ ft + φi′ ft∗ + εit ,
This set of assumptions allows for some weak serial and crosssection dependence and heteroskedasticity among the idiosyncratic components εit . Furthermore, the factors and idiosyncratic components are allowed to be weakly correlated provided that
= λ(i 2) .
(8)
The resulting test statistic is denoted by wi . The Lagrange multiplier (LM) statistic, indicated by si , is obtained from running a regression of the form
3. The static factor model
(1)
t = 1, . . . , T ,
where
Note: This table presents the averages of the estimated number of common factors selected by the ICp1 criterion suggested by Bai and Ng (2002). The results are based on 1000 replications of the model with a structural break of size b.
H0 : λi
73
lri = T log(S0i ) − log(S1i + S2i ) ,
2
T N 1 − 1 − ft εit < ∞ E √ N i=1 T t =1
√
for all T and N. Under Assumption 1 and T /N → 0 the estimation error in the regressor ft does not affect the asymptotic distribution of the test statistic. To establish the usual asymptotic χ 2 distribution of the Chow test, a more restrictive set of assumptions is required: Assumption 2. (i) For all t = 1, . . . , T , E (εit2 ) = σi2 and E (εit εis ) = 0 for t ̸= s. (ii) ft is independent of εis for all i, t , s. The null distributions of the test statistics are presented in the following theorem. Theorem 1. Under Assumptions 1 and 2, T → ∞, N → ∞, and √ T /N → 0, the statistics si , wi and lri have a χ 2 limiting distribution with r degrees of freedom.
where S0i =
Remark A. It is tempting to combine the individual statistics to obtain a pooled test of the joint null hypothesis that there is no structural break in the N loading vectors λ1 , . . . , λN . For example, a pooled LM test may be constructed as
T − (yit − ft′ λi )2 , t =1 ∗
S1i =
Ti −
(1)
yit − ft′ λi
2
,
N ∑ si
t =1
S2i =
T −
(2)
yit − ft′ λi
2
,
t =Ti∗ +1
and λi denotes the PC estimator of the vector of factor loadings, whereas λ(i 1) and λ(i 2) denote the two estimates obtained as the
4 Some details of the estimator are considered in the Appendix.
LM∗ =
− rN
i=1
√ 2rN
,
which is equivalent to the standardized mean-group version of the LM statistic (eg. Breitung and Pesaran, 2008). However, the application of this panel statistic would require the additional assumption that εit and εjt are independent for all i ̸= j. Such an assumption is highly unrealistic in most empirical applications (e.g. Chamberlain and Rothschild, 1983; Stock and Watson, 2002; Bai and Ng, 2002).
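To make the construction concrete, the following sketch (our own illustrative code, not from the paper) computes the three statistics for a single series, taking the estimated factors as given. Under Assumption 2 and a correct model, each statistic is compared with the χ²(r) critical value; the exact inequality w ≥ lr ≥ lm familiar from linear regression holds here as well, since the subsample regressions reproduce the unrestricted break regression.

```python
import numpy as np

def _ssr(Z, y):
    """OLS residuals and sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ beta
    return e, float(e @ e)

def chow_factor_tests(y, F, Tstar):
    """LR, Wald and LM statistics for a break in the loadings of one series
    at period Tstar, with the unknown factors replaced by estimates F (T x r),
    cf. eqs. (8)-(10)."""
    T, r = F.shape
    Fstar = np.where(np.arange(T)[:, None] >= Tstar, F, 0.0)  # f*_t of eq. (10)
    e0, S0 = _ssr(F, y)                        # restricted: constant loadings
    _, S1 = _ssr(F[:Tstar], y[:Tstar])         # pre-break subsample
    _, S2 = _ssr(F[Tstar:], y[Tstar:])         # post-break subsample
    lr = T * (np.log(S0) - np.log(S1 + S2))    # eq. (8)
    w = T * (S0 - (S1 + S2)) / (S1 + S2)       # Wald test of psi = 0 in eq. (9)
    _, Sr = _ssr(np.column_stack([F, Fstar]), e0)
    lm = T * (1.0 - Sr / S0)                   # T times the uncentered R^2
    return lr, w, lm

rng = np.random.default_rng(1)
T, r = 200, 1
F = rng.standard_normal((T, r))
y = F @ np.ones(r) + rng.standard_normal(T)    # no break: H0 is true
print(chow_factor_tests(y, F, T // 2))         # compare each with chi2(r)
```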
Remark B. It is important to select the appropriate number of common factors, as otherwise the test may lack power. If the number of common factors is determined from the entire sample, the identification criteria tend to select a larger number of common factors: as argued in Section 2, a factor model with a structural break admits a (parameter constant) factor representation with a larger number of factors. Therefore, the number of factors should be selected by applying the information criteria of Bai and Ng (2002) to the subsamples before and after the break, at least if it is assumed that the break date is (roughly) the same for all variables.5

Remark C. Although the three test statistics are asymptotically equivalent, i.e., they possess the same asymptotic distribution under the null hypothesis as well as under a properly specified sequence of local alternatives, the power of the LM test may suffer in small samples from ignoring the break when computing the estimator of the residual variances (cf. Vogelsang, 1999).

To investigate the finite sample properties of the test statistics, a Monte Carlo experiment is performed. We simulate data according to the single-factor model yit = λi(k)ft + εit, where the factor and idiosyncratic components are generated as in Section 2. The empirical sizes of the three test statistics LR, LM and W are presented for various sample sizes in Table 2. It turns out that for all N and T the rejection frequencies of the tests are close to the nominal size of 0.05. Among the three asymptotically equivalent tests the LM test has the best size properties. The LR test performs only slightly worse and the W test tends to be (slightly) oversized. Table 3 reports results on the empirical power of the tests. The structural break is again modelled as a shift of size b in the mean of the factor loadings (see Section 2). The results suggest that the tests have similar power.
The LR and W tests seem to have slightly higher power, but this is no surprise as these tests are oversized for small T. Overall, our simulation experiments (based also on models with more factors and other data generating mechanisms6) suggest that the performances of all three tests are similar if T and N are sufficiently large. In what follows we focus on the LM test statistic as it is computationally convenient and has superior size properties.

4. Unknown break dates

So far we have assumed that the break date Ti* is known. In many empirical applications (such as the ones considered in Section 6) the precise date of the structural break is unknown. In this section we adapt the Andrews (1993) tests for structural breaks with an unknown break date. Let [·] denote the integer part operator and Wi(a) be an r-dimensional vector of standard Brownian motions defined on a ∈ [0, 1]. Following Andrews (1993) and others, our test is based on the following assumption7:

Assumption 3. As T → ∞,

T^{−1} Σ_{t=1}^{[τT]} ft ft′ →p τ Σf,
T^{−1/2} Σ_{t=1}^{[τT]} ft εit ⇒ σi Σf^{1/2} Wi(τ)

for all i and τ ∈ [0, 1], where Σf is a positive definite matrix.
5 We are grateful to Peter Boswijk who pointed out this problem during the conference.
6 For example, if the number of common factors increases, the positive size biases of the LR and W tests increase, whereas the LM test becomes slightly conservative. The results of the additional Monte Carlo simulations are included in the working paper version of this paper.
7 See Perron (2006) for a thorough discussion of this assumption.
Table 2
Empirical sizes (average rejection frequencies).

T = 50
N      LR      LM      W
20     0.055   0.049   0.057
50     0.054   0.048   0.056
100    0.056   0.049   0.058
150    0.055   0.048   0.058
200    0.056   0.048   0.058

T = 100
20     0.055   0.052   0.057
50     0.051   0.048   0.052
100    0.052   0.049   0.054
150    0.053   0.050   0.054
200    0.052   0.049   0.053

T = 150
20     0.051   0.048   0.051
50     0.052   0.049   0.052
100    0.051   0.049   0.052
150    0.052   0.050   0.053
200    0.052   0.050   0.053

T = 200
20     0.051   0.050   0.052
50     0.050   0.048   0.050
100    0.051   0.050   0.052
150    0.051   0.049   0.052
200    0.051   0.050   0.052

Note: The entries report the average frequencies of rejection of N variable-specific tests for structural breaks within each dataset generated by a factor model without a structural break (b = 0). The nominal size is 0.05 and 1000 replications are used to compute the averages.

Table 3
Power against a break at Ti* = T/2.

T = 50
b      LR      LM      W
0.1    0.062   0.054   0.064
0.2    0.083   0.074   0.086
0.3    0.116   0.105   0.118
0.5    0.200   0.186   0.203

T = 100
0.1    0.064   0.061   0.065
0.2    0.110   0.105   0.112
0.3    0.168   0.162   0.170
0.5    0.305   0.298   0.307

T = 150
0.1    0.071   0.069   0.072
0.2    0.132   0.128   0.133
0.3    0.219   0.214   0.220
0.5    0.381   0.376   0.383

T = 200
0.1    0.080   0.078   0.081
0.2    0.153   0.150   0.154
0.3    0.260   0.257   0.261
0.5    0.445   0.441   0.446

Note: The entries report the average frequencies of rejection of N = 50 variable-specific tests for structural breaks. The data are generated on the basis of (4) with a structural break of size b. See Table 2 for further information.
Andrews (1993) considered three asymptotically equivalent test statistics based on the supremum of the LM, LR and Wald statistics. Since the test statistics perform very similarly in our Monte Carlo experiment of Section 3 and since it is particularly simple to compute, we focus on the sup-LM statistic given by

Si,T(τ0) = sup_{τ ∈ [τ0, 1−τ0]} s_i^τ,    (11)

where s_i^τ denotes the LM statistic for a structural break at relative break date τ in the prespecified interval [τ0, 1−τ0] for cross-section unit i. Andrews and Ploberger (1994) proposed optimal tests that maximize the weighted average power. However, simulation studies (e.g. Andrews et al., 1996) suggest that in most situations the power loss of the simple sup-LM statistic relative to the optimal tests is small and, therefore, we will only consider the sup-LM statistic (11). The following theorem states that under assumptions similar to those of Theorem 1 the limiting distribution of the sup-LM test is the same as in Andrews (1993) and, therefore, the critical values tabulated there can be used.
Theorem 2. Under Assumptions 1–3, T → ∞, N → ∞, and √T/N → 0, the sup-LM statistic (11) is asymptotically distributed as

S(τ0) = sup_{τ ∈ [τ0, 1−τ0]} [τW(1) − W(τ)]′[τW(1) − W(τ)] / (τ(1 − τ)),

where W(·) denotes an r-dimensional vector of standard Brownian motions.

Remark D. It is not difficult to show that the same limiting distribution results if the LM statistic is replaced by the Wald or the
Table 4
Empirical sizes of the sup-LM test for unknown break dates.

N      T = 50   T = 100   T = 150   T = 200
50     0.024    0.030     0.034     0.038
100    0.022    0.028     0.031     0.034
150    0.020    0.025     0.030     0.033
200    0.021    0.027     0.033     0.034

Note: The entries report the average frequencies of rejection of variable-specific tests for structural breaks assuming unknown break dates. The data are generated by the factor model with r = 1 and no structural break (the null hypothesis). The critical value presented in Andrews (1993) for τ0 = 0.1 is applied. The nominal size is 0.05.
LR statistics. However, as noted by Andrews (1993), the sequence of LM statistics is particularly easy to compute.

Remark E. So far we have assumed that under the alternative there is a single structural break in the factor loadings. To allow for multiple breaks, the efficient search procedure proposed by Bai and Perron (2003) may be employed for finding the candidate break dates. Alternatively, a sequential estimation and testing procedure for the break dates may be entertained (cf. Bai and Perron, 1998).

To investigate whether the asymptotic distribution presented in Theorem 2 yields a reliable approximation for small samples, we repeated the Monte Carlo experiment of Section 3, now assuming that the break point is unknown and therefore employing the sup-LM statistic. The test procedure searches for a structural break within the interval τ ∈ [0.1, 0.9] (i.e. τ0 = 0.1) and the (asymptotic) critical values provided by Andrews (1993) are applied. Table 4 presents the actual size of the sup-LM test for various sample sizes; the nominal size is 0.05. It turns out that for small samples the test tends to be conservative, yet the actual sizes tend slowly to 0.05 as T increases.

5. Dynamic factor models

In the previous sections we have considered the framework of a static factor model, where the common and idiosyncratic components are white noise. In many practical situations, however, the variables are generated by dynamic processes. In this section we therefore generalize the factor model and assume that the idiosyncratic components in the model yit = λi′ft + uit are generated by individual-specific AR(pi) processes:

uit = ϱi,1 ui,t−1 + · · · + ϱi,pi ui,t−pi + εit    (12)

or, in lag-operator notation,

ϱi(L)uit = εit,    (13)

where ϱi(L) = 1 − ϱi,1 L − · · · − ϱi,pi L^{pi}. To analyze the asymptotic properties of the tests in a dynamic factor model we make the following assumption.
Assumption 4. (i) The idiosyncratic components are generated by (13), where all roots of the autoregressive polynomial ϱi(z) lie outside the unit circle. (ii) For all t, E(εit²) = σi² and E(εit εis) = 0 for t ≠ s. (iii) ft is independent of εit for all i and t.

The dynamic process of the vector of common factors is left unspecified. We only assume that the second moments are finite, i.e., the probability limit T^{−1} Σ_{t=1}^{T} ft ft′ →p Σf is a finite positive definite matrix (see Assumption A in Bai (2003)).

To test for structural breaks, Stock and Watson (2008) suggest applying conventional Chow tests to each variable yit, where the unobserved factors are replaced by their principal components estimates. Possible serial correlation of the errors is accounted for by using heteroskedasticity and autocorrelation consistent (HAC) estimators for the standard errors of the coefficients (cf. Newey and West, 1987). This approach has,
however, two important drawbacks. First, since the OLS estimator is inefficient in the presence of autocorrelated errors, the resulting test suffers from a loss of power relative to a test based on a GLS estimator. Second, it is well known that the HAC estimator may perform poorly in small samples. To sidestep these difficulties, we follow Breitung and Tenhofen (2008) and compute the test statistic based on a GLS estimation of the model. The GLS-transformed model results as

ϱi(L)yit = λi′[ϱi(L)f̂t] + ψi′[ϱi(L)f̂t*] + εit*,    (14)

where f̂t denotes the PC estimator of the common factors, f̂t* = f̂t for t = Ti* + 1, . . . , T and f̂t* = 0 otherwise. The lag polynomials ϱi(L), i = 1, . . . , N, can be estimated by running the least squares regressions

ûit = ϱi,1 ûi,t−1 + · · · + ϱi,pi ûi,t−pi + ϵit,    (15)

where ûit is the PC estimator of the idiosyncratic component. The lag length pi can be determined by employing the usual information criteria. To test the hypothesis of no structural break at Ti*, the LM statistic for ψi = 0 is computed; the resulting test statistic is denoted by si. We focus on the LM statistic as it possesses the best size properties among all tests considered in Section 3. The following theorem states that the asymptotic null distribution of the resulting LM test statistic is the same as in Theorem 1.

Theorem 3. Let si denote the LM statistic for ψi = 0 in the regression

ϱ̂i(L)yit = λi′[ϱ̂i(L)f̂t] + ψi′[ϱ̂i(L)f̂t*] + εit*,   t = pi + 1, . . . , T.    (16)
Under Assumptions 1 and 4, T → ∞, N → ∞, and √T/N → 0, si is asymptotically χ² distributed with r degrees of freedom.

Remark F. Assumption 3 rules out temporal heteroskedasticity of the idiosyncratic components. It is well known that the Chow test is not robust against a break in the variances. To obtain a robust statistic in the case of time-varying variances, the approach of White (1980) can be adopted. Alternatively, a GLS variant of the test statistic that is robust against a break in the variance at Ti* can be constructed as

(1/σi(1)) ϱi(L)yit = λi′[(1/σi(1)) ϱi(L)f̂t] + ψi′[(1/σi(1)) ϱi(L)f̂t*] + εit*   for t = pi + 1, . . . , Ti*,
(1/σi(2)) ϱi(L)yit = λi′[(1/σi(2)) ϱi(L)f̂t] + ψi′[(1/σi(2)) ϱi(L)f̂t*] + εit*   for t = Ti* + 1, . . . , T.
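As an illustration, the GLS-based LM statistic of Theorem 3 can be sketched as follows. This is our own simplified implementation, not the authors' code: the AR order is taken as given, the factors are treated as estimated beforehand, and the homoskedastic version (without the variance-break correction of Remark F) is shown.

```python
import numpy as np

def ar_fit(u, p):
    """Least squares estimate of the AR(p) coefficients of u, cf. eq. (15)."""
    T = len(u)
    Z = np.column_stack([u[p - j:T - j] for j in range(1, p + 1)])
    rho, *_ = np.linalg.lstsq(Z, u[p:], rcond=None)
    return rho

def ar_filter(x, rho):
    """Apply rho(L): x_t - rho_1 x_{t-1} - ... - rho_p x_{t-p} for t > p."""
    x = np.asarray(x, dtype=float)
    p = len(rho)
    out = x[p:].copy()
    for j, rj in enumerate(rho, start=1):
        out = out - rj * x[p - j:len(x) - j]
    return out

def lm_dynamic(y, F, Tstar, p):
    """GLS-based LM statistic for a break in the loadings at Tstar when the
    idiosyncratic component follows an AR(p), cf. eqs. (14)-(16)."""
    T, r = F.shape
    u = y - F @ np.linalg.lstsq(F, y, rcond=None)[0]    # idiosyncratic estimate
    rho = ar_fit(u, p)                                  # prewhitening filter
    Fstar = np.where(np.arange(T)[:, None] >= Tstar, F, 0.0)
    yf, Ff, Fsf = (ar_filter(v, rho) for v in (y, F, Fstar))
    e0 = yf - Ff @ np.linalg.lstsq(Ff, yf, rcond=None)[0]
    X = np.column_stack([Ff, Fsf])
    resid = e0 - X @ np.linalg.lstsq(X, e0, rcond=None)[0]
    return len(yf) * (1.0 - float(resid @ resid) / float(e0 @ e0))

rng = np.random.default_rng(3)
T, p = 300, 1
F = rng.standard_normal((T, 1))
u = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]     # AR(1) idiosyncratic component
y = F[:, 0] + u                        # no break: H0 is true
print(lm_dynamic(y, F, T // 2, p))     # compare with chi2(r) critical value
```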
Remark G. It is possible to construct the sup-LM statistic of Section 4 based on the GLS-transformed series. Using arguments similar to those in the proof of Theorem 2 it can be shown that the resulting test statistic possesses the same limiting distribution as S(τ0) defined in Theorem 2.

To investigate the small-sample properties of the test, we generate the factor as ft = 0.5ft−1 + νt.8 The idiosyncratic errors are generated using the model

uit = ϱui,t−1 + εit
8 Since the data generating process for ft is irrelevant for the asymptotic properties of the test, we do not present the results for other values of the autoregressive coefficient.
Table 5
Empirical sizes in the dynamic model.

N = 100, T = 100
ϱ       LM(stat)  LM(dyn)  HAC(4)  HAC(12)  HAC0(4)  HAC0(12)
 0.2    0.089     0.049    0.105   0.147    0.052    0.030
 0.5    0.187     0.052    0.142   0.175    0.072    0.037
 0.9    0.405     0.049    0.258   0.245    0.156    0.055
−0.2    0.023     0.050    0.078   0.129    0.037    0.026
−0.5    0.004     0.049    0.058   0.114    0.028    0.023

N = 100, T = 500
 0.2    0.095     0.050    0.070   0.081    0.055    0.050
 0.5    0.193     0.049    0.083   0.088    0.065    0.052
 0.9    0.425     0.049    0.141   0.110    0.115    0.065
−0.2    0.022     0.051    0.057   0.076    0.045    0.046
−0.5    0.003     0.050    0.047   0.068    0.038    0.043

N = 100, T = 1000
 0.2    0.094     0.049    0.062   0.069    0.053    0.050
 0.5    0.197     0.049    0.070   0.072    0.060    0.052
 0.9    0.426     0.050    0.109   0.088    0.094    0.062
−0.2    0.021     0.049    0.053   0.064    0.046    0.047
−0.5    0.003     0.049    0.046   0.061    0.040    0.045

Note: Entries report the average frequencies of rejection of N variable-specific tests for a structural break at Ti* = T/2 computed from 1000 replications of the dynamic model without a structural break. The nominal size is 0.05. The column LM(stat) presents the rejection rates for an LM test that ignores the serial correlation in the idiosyncratic component. LM(dyn) indicates the test based on the GLS regression considered in Theorem 3. HAC(k) denotes an OLS-based test using robust (HAC) standard errors with the truncation lag computed from (17). HAC0(k) is the LM variant of the test statistic based on the residuals of the restricted regression.

for all i = 1, . . . , N. For the variances we set E(νt²) = 1 and E(εit²) = σi², where σi ∼ U(0.5, 1.5). The factor loadings are obtained from independent draws of a N(1, 1) distribution. Table 5 reports the average rejection rates for the individual tests si; the tests assume that the break occurs at period Ti* = T/2. To assess the size bias that results from ignoring the serial correlation of the idiosyncratic component, we first present the ordinary LM statistic that assumes white noise errors. As can be seen from the first column of Table 5, the rejection rates of this test are far from the nominal size of 0.05 even if the autoregressive coefficient is fairly small. In contrast, the actual size of the LM statistic computed from the GLS regression is close to the nominal size for all values of ϱ. The columns labelled HAC(k) report the actual sizes of the OLS-based t-statistics employing robust standard errors, where the truncation lag is specified by applying the rule

ℓT(k) = k(T/100)^{2/5} with k ∈ {4, 12}.    (17)

Since we found that the sizes are more reliable if the test is computed using the LM principle, we also compute the HAC standard errors from the residuals of the restricted regression (i.e. with the null hypothesis imposed). The resulting test statistics are indicated by HAC0(k). The results presented in Table 5 show that the test statistics based on HAC standard errors perform poorly in small samples. The test based on the restricted residuals (HAC0(k)) performs much better but still exhibits some size distortions. To demonstrate that the size bias of the HAC tests is indeed a small-sample phenomenon, we repeat the simulations for T = 500 and T = 1000. The results show that as T increases the empirical sizes of the original HAC(k) tests slowly tend to the nominal size.

6. Empirical applications

Our test procedure is applied to two settings. In Section 6.1, we investigate whether the mid-1980s in the US can be associated with structural breaks in the loadings. In Section 6.2, we consider possible breaks in the euro-area economies due to two major events in the 1990s: the signing of the Maastricht treaty and the handover of monetary policy from national central banks to the ECB. We will address important issues that typically arise in such applications.

6.1. The US economy in the mid-1980s
In this subsection we apply our test procedure to the dataset constructed by Stock and Watson (2005) and provided on Mark Watson’s web page to investigate whether the mid-1980s in the US can be associated with structural breaks in the factor loadings. The dataset contains 132 monthly US series including measures of real economic activity, prices, interest rates, money and credit aggregates, stock prices, and exchange rates. It spans 1960–2003.9

We start by considering a single break in 1984:01. That date has been associated with the beginning of the so-called Great Moderation, i.e. the decline in the volatility of output growth and inflation (Kim and Nelson, 1999; McConnell and Perez-Quiros, 2000; Stock and Watson, 2008). One motivation for our empirical application is that we will be able to compare our results to those of Stock and Watson (2008), who also test for structural breaks in the factor loadings in 1984:01 and use a very similar dataset.10 Another motivation is that the sources of the Great Moderation are still controversial. Previous papers have applied structural break tests to univariate linear and univariate Markov-switching models or, more recently, structural VAR models with time-varying parameters to tackle this question. They have come up with various explanations, and it is still unclear to what extent either ‘‘good luck’’ or structural changes including ‘‘good policy’’ have contributed to the volatility decline (cf. Galí and Gambetti, 2008 as well as Stock and Watson, 2003 and references therein). ‘‘Good luck’’ is based on the observation that smaller shocks hit the economy after the break date considered (cf. Benati and Mumtaz, 2007). ‘‘Good policy’’, on the other hand, emphasizes the fact that monetary policy has put more weight on inflation relative to output stabilization since the 1980s (Clarida et al., 2000).
Other structural changes that may have played a role include improved inventory management, mainly in the durable goods sector (McConnell and Perez-Quiros, 2000; Davis and Kahn, 2008), as well as financial innovation and better risk sharing spurred on by financial deregulation (IMF, 2008). Given this ongoing controversy, we find it useful to analyze the mid-1980s in the US with a new methodology. Our data-rich framework enables us not only to test for breaks in the factor loadings associated with many variables and thus to identify ‘‘dramatic’’ changes in the economy, but also to test where breaks have occurred, i.e. which variables’ or groups of variables’ loadings have changed. This may help to shed some light on the sources of possible structural changes.

Factor analysis requires some pre-treatment of the data. We proceed exactly as in Stock and Watson (2005). Non-stationary raw data (which were already available to us in seasonally adjusted form) are differenced until they are stationary. In our baseline, we remove outliers.11 To assess whether removing outliers from the data affects our results we also consider below the case where
9 The original dataset is provided for the period 1959–2003. Some observations are, however, missing in 1959. We therefore decided to use a balanced dataset starting in 1960.
10 The main difference is that their dataset is quarterly and also covers more recent years (up to 2006).
11 Outliers are defined as observations of each (stationary) variable with absolute median deviations larger than six times the interquartile range. They are replaced by the median value of the preceding five observations.
Table 6
Tests for structural breaks (US data).

                             r = 6   r = 7   r = 8   r = 9
With outlier adjustment
rej % LM (1984:01)           0.48    0.52    0.54    0.55
rej % HAC (1984:01)          0.50    0.58    0.64    0.62
rej % sup-LM                 0.66    0.67    0.75    0.70

Without outlier adjustment
rej % LM (1984:01)           0.61    0.64    0.65    0.67
rej % HAC (1984:01)          0.61    0.66    0.67    0.65
rej % sup-LM                 0.73    0.73    0.80    0.76

Note: ‘‘rej % LM’’ is the relative rejection rate for the N individual LM statistics and ‘‘rej % HAC’’ is the respective rejection rate for the OLS-based test procedure with HAC standard errors, where the truncation lag results from (17) with k = 4. ‘‘rej % sup-LM’’ indicates the rejection rates for the test with unknown break date.
data were not outlier adjusted.12 Finally, we normalize the series to have means of zero and unit variances. The reader is referred to Stock and Watson (2005) for details on the composition and the treatment of the dataset.

Following Stock and Watson (2005), our benchmark estimation is based on r = 9 factors. The Bai and Ng (2002) ICp1 criterion only indicates r = 7, but, as already pointed out in Stock and Watson (2005), we find the criterion to be flat for r = 6–10. We therefore also consider r = 6–8 factors below.13 As argued in the previous sections, we focus on the LM test in our application. We test the null hypothesis of no break in the factor loadings in 1984:01. We generally allow for a break in the variance of the idiosyncratic components as suggested in Remark F. This allows us to concentrate on structural changes in the common component as a source of the Great Moderation, as opposed to ‘‘good luck’’, which would at least partly be reflected in the error variances.

Table 6 shows the rejection rates, i.e. the shares of the 132 variables for which a structural break is found, indicated by the LM test and, in comparison, by the OLS-based test statistic with HAC (robust) standard errors. For the former test, we allow for six autoregressive lags of the idiosyncratic components, and for the latter test, the number of autoregressive lags for the Newey–West correction is set to 7 according to formula (17) with k = 4. A clear structural break is identified for the majority of the variables at 1984:01. On the basis of r = 9 and outlier-adjusted data, the LM test yields a rejection rate of 0.55. The rejection rate suggested by the HAC test procedure considered in Section 5 is even larger (0.62), consistent with our simulation results, which have illustrated that the HAC test procedure tends to reject the null hypothesis of no structural break too often.
That share also exceeds the share estimated by Stock and Watson (2008), who find that 35% of the variables exhibited structural breaks in the loadings. The reason is that Stock and Watson (2008) rely on fewer (three or four) factors in that paper. When we re-do the tests on the basis of fewer factors, we obtain rejection rates comparable to those presented by the authors. Interestingly, the shares increase to 0.67 (for the LM test) and 0.65 (for the HAC test procedure) when outliers are not removed prior to the estimation, suggesting that
12 We are grateful to Jean-Pierre Urbain for suggesting this exercise to us.
13 As noted in Remark B, the number of factors should be determined by using the subsamples before and after the break. Indeed we found that the information criteria tend to suggest a smaller number of factors for the subsamples than for the whole sample. However, since the test for structural breaks is applied to a range of possible break dates, this would mean that the numbers of factors have to be re-estimated for all time periods under consideration. Furthermore, the information criteria tend to choose different numbers of factors for the two subsamples. We therefore decided to employ the same number of factors as was used in the earlier literature. Note that if the number of factors is overspecified, the tests tend to have low power. Since in our applications the great majority of the tests reject the null hypothesis, we conclude that a possible loss of power is not a problem in our case.
Fig. 1. Relative rejection frequencies (US data).
our outlier adjustment already takes care of breaks in a subset of variables.

As shown in Section 2, the number of common factors may be overestimated in the case of a structural break. We therefore split the sample into two subsamples, 1960:01–1983:12 and 1984:01–2003:12, and re-estimated r for each subsample (and for our baseline with outlier-adjusted data). The Bai and Ng (2002) ICp1 criterion suggests r = 4 factors for the first subsample and r = 6 factors for the second subsample, supporting our theoretical considerations and our finding of a structural break based on r = 9. Unlike in the simulations, the estimated numbers of factors in the two subsamples are not equal, nor are they equal to half the number of factors estimated on the basis of the total sample. One reason may be that the loadings of some of the variables, or those associated with some of the factors, do not exhibit a structural break. Other explanations may be that the size of the break is moderate (see our Monte Carlo simulations of Section 2) or that variables’ loadings shift at different points in time. If we were interested in estimating the factors, we would need to split the sample and estimate the factors on the basis of the smaller r. However, our objective is to test for a structural break, and in order to consider all factors we keep on working with nine factors.

We next investigate whether the break occurred exactly in 1984:01 and whether it is the only structural break during the sample period. We apply the LM test for each possible break point in the interval τ ∈ [0.1, 0.9], i.e. τ0 = 0.1. The solid line in Fig. 1 shows the relative rejection frequencies of the individual LM tests for each point in time. Between the beginning of the 1980s and the beginning of the 1990s the test rejects the null hypothesis of no structural break for more than half of the variables, and particularly high rejection rates (around 60%) are found around 1985. Fig. 1 also shows that it may matter whether one allows for a break in the variance of the idiosyncratic components. The test that assumes a constant variance is represented by the dotted lines; this version of the test tends to yield smaller rejection rates compared to the robust version. We have also applied the sup-LM test for unknown break dates, as suggested in Section 4, to our dataset and reject the null hypothesis of no structural break at any point in time for 70% of the variables when r = 9 (Table 6). The average of the estimated break dates obtained from the maxima of the LM statistics is 1981:01. Finally, Fig. 1 also reveals a discrepancy between rejection rates obtained from outlier-adjusted data and from the unadjusted data (already apparent in Table 6). The latter rates are represented by the dashed line and exceed the rejection rates from our baseline over most of the sample period. Interestingly, the dashed line reaches its peak exactly in 1984:01.
Table 7
Tests for specific variables (US data).

Variable                     p-value   Commonality
Industrial production (IP)   0.06      1.00
IP durable cons. goods       0.04      1.00
IP non-dur. cons. goods      0.72      0.99
IP durable mat. goods        0.00      1.00
IP non-dur. mat. goods       0.04      0.98
Inventory                    0.00      0.50
Consumption                  0.15      1.00
CPI                          0.00      0.99
FFR                          0.01      0.74
Cons. expectations           0.00      0.67
10 y gvt bond yields         0.01      0.71
S&P 500                      0.03      0.95
Effective exch. rate         0.00      0.76
Commodity prices             0.12      0.49

Note: The p-values are the marginal significance levels of the LM test for a structural break at 1984:01. The commonality is equivalent to the R2 of the regression of the variable on the common factors. Variables were transformed as in Stock and Watson (2005). Outlier-adjusted data are used.

Giordani (2007) has pointed out that, although some series may be I(1) over the total period, they may be stationary in subperiods, so that differencing them would result in overdifferencing. To avoid overdifferencing, we consider an alternative dataset in which inflation, interest rates, money growth, capacity utilization and the unemployment rate enter in levels rather than in growth rates as before (and as in Stock and Watson, 2005, 2008). The results, which are available upon request, do not change much.

To investigate where structural breaks have occurred, it is instructive to look at test results for individual variables. We focus on several key macroeconomic variables of general interest, but also on variables which are particularly interesting against the background of the Great Moderation and its possible sources, such as a monetary policy instrument, inventories, the production of durable and non-durable goods, as well as consumption and several financial variables. Breaks, or the lack of breaks, in the loadings of these variables may support some of the conjectures on the sources of the Great Moderation discussed above. We provide results for the heteroskedasticity-robust version of the test. Table 7 suggests that there is evidence of a break in the loadings of some, but not all, key macroeconomic variables in 1984:01. There seems to be a break in the loadings of the CPI and consumer expectations (transformed accordingly), but not in the loadings of commodity prices. For total industrial production the test rejects the null hypothesis only at the 10% significance level. Among the variables which may provide some information on the sources of the changes, breaks are found in the loadings of inventories, the production of material goods and durable consumer goods, but not of the production of non-durable consumer goods. The LM test also rejects the null hypothesis of no structural break in the loadings of the Federal funds rate and of most financial variables (long-term interest rates, stock prices, and effective exchange rates). Table 7, however, reveals no evidence of changes in the loadings of consumption, although a popular hypothesis is that financial integration leads to consumption smoothing and therefore reduces the responsiveness of consumption to shocks. Notice also that the commonality is high for all variables shown in Table 7: the factors explain at least half of the variation in each variable and almost all of the variance in (the stationary versions of) the industrial production variables, consumption, and the CPI. To summarize, we find some support for substantial changes in the US economy around the date that is generally associated with the Great Moderation in the US, 1984:01. Our analysis further suggests that various structural changes can probably explain this result, including changes in the conduct of monetary policy, ongoing financial integration, and better inventory management (possibly in the durable goods sector).

6.2. Have the Maastricht treaty and the handover of monetary policy to the ECB led to structural breaks in the euro area?
Our second application is concerned with possible changes in comovements that may have occurred in the euro area in the 1990s due to two major events. The first event is the Maastricht treaty, which was signed in 1992:02. With the treaty, a timetable for EMU was prepared and the conditions for countries to become EMU members were fixed: low inflation rates, converged interest rates, stable exchange rates, and solid fiscal budgets. The second event was stage 3 of EMU, i.e. the changeover to a single monetary policy and the fixing of exchange rates, in 1999:01. This setting is particularly interesting, since these events may have altered the comovement between variables, and this would be reflected precisely in breaks in the loadings. It is still not entirely clear how these two events have affected the comovements of business cycles and other variables in euro-area countries. Some arguments point to greater comovements, some to smaller comovements. It is also unclear whether changes occurred exactly at the dates of the two events, or before or after them. On the one hand, the Maastricht treaty and accession prospects forced countries to improve their fiscal situation and to carry out structural reforms in order to qualify for EMU membership. Greater structural and political similarity could lead to long-run convergence and a greater synchronization of business cycles, possibly already before the handover of monetary policy from national central banks to the ECB. On the other hand, these requirements have limited the scope for national fiscal policy to stabilize the economy. Similarly, the handover of monetary policy from the national central banks to the ECB implied the loss, for individual EMU member countries, of an important stabilization tool, which they could previously apply in response to asymmetric shocks. Both effects may have lowered business cycle synchronization before and after the events, respectively.
There is, however, an argument stressing the ''endogeneity of optimum currency-area criteria'' (including the synchronization of business cycles) (Frankel and Rose, 1998): as a consequence of the events, transaction costs have declined, and this should spur greater trade and financial integration and hence greater business cycle comovements (cf. Imbs, 2004; Kose et al., 2003; Baxter and Kouparitsas, 2005). Given the ambiguity of these arguments, it remains an empirical question whether and to what extent the two events have led to structural breaks, and what the exact timing of any such breaks has been. Our empirical application is most closely related to Canova et al. (2009), who also investigate to what extent these two events have affected business cycles and their comovements in the euro area. On the basis of a panel VAR index model, the authors find some changes over time, but no evidence of clear structural breaks that coincide with the events.14 We apply the LM test procedure presented in Sections 4 and 5 to a monthly dataset used in Altissimo et al. (2007).15 The dataset spans 1987:01–2007:06 and includes 209 macroeconomic variables from EMU member countries, the euro area as a whole, and a few external variables.16 Series which were not already in seasonally adjusted form were seasonally adjusted by using the

14 Canova et al. (2009) assess changes around slightly different points in time, namely 1993:04, i.e. when the Maastricht treaty became effective, 1998:03, i.e. the date of the ECB creation, and 2002:01, i.e. the date of the euro changeover.
15 We are grateful to Giovanni Veronese for providing us with an updated version of that dataset.
16 The New Eurocoin indicator suggested in Altissimo et al. (2007) is constructed on the basis of 145 variables. The underlying dataset is larger. We use a subset of this larger dataset to obtain a balanced panel.
Table 8
Tests for specific variables.

Country              Maastricht   EMU    # variables
DEU                  0.14         0.31   42
BEL                  0.13         0.19   16
ESP                  0.25         0.67   24
FRA                  0.03         0.36   33
ITA                  0.26         0.48   27
NLD                  0.24         0.38   21

Variables            Maastricht   EMU    # variables
Ind. prod.           0.24         0.31   62
Inflation            0.21         0.44   43
Mon. and fin. var.   0.15         0.53   59
Labor markets        0.17         0.39   23
Surveys              0.05         0.23   22

Note: This table presents the rejection frequencies for individual countries and for various groups of variables. The last column presents the number of variables in the group.

Fig. 2. Relative rejection frequencies (EMU data).
Census X12 procedure. Non-stationary series were transformed to stationary series as in Altissimo et al. (2007). Variables such as inflation and interest rates enter in levels; there is therefore no need to consider an additional transformation of the data as in the previous application. Outliers were removed as before. Unlike in the previous application, the results barely depend on whether the data were outlier adjusted or not, and we therefore only present results based on the adjusted dataset. Finally, as before, the series were de-meaned and divided by their standard deviations. For details on the data and the transformations, see Altissimo et al. (2007).

On the basis of the entire dataset and the ICp1 criterion of Bai and Ng (2002), r is estimated to be 9. We also split the dataset into three subsamples: pre-Maastricht; post-Maastricht and pre-EMU; and post-EMU. The ICp1 criterion selects r = 3 for the first, r = 4 for the second, and r = 5 for the third subsample, which is perhaps a first indication of a structural break. The autoregressive order of the idiosyncratic components is, again, set to 6, and the lag length for the Newey–West correction to 5. The rejection rates of the test for structural breaks are 0.18 and 0.63 for the Maastricht treaty and 0.40 and 0.60 for the changeover to a single monetary policy in the euro area when the tests are based on the LM and HAC test procedures, respectively.

Have linkages become tighter or looser? We compare the commonality across the pre-Maastricht, post-Maastricht and pre-EMU, and post-EMU periods and find no major change between the first and the second period, where nine factors explain 53.7% and 53.8% of the total variance, respectively. By contrast, the commonality increases to 55.7% in the third period, which also supports our finding of a break being more likely in 1999:01 than in 1992:02.

We can, again, assess whether the breaks occurred only at the dates of the two specific events or before or after these dates. As shown in Fig. 2, the heteroskedasticity-robust version of the test indicates that the rejection rate is indeed highest (at 0.40) in 1999:01. The LM test with unknown break date rejects the null hypothesis of no structural break at any point in time for 45% of the variables. On average, over all variables, the break point is most likely to have occurred in 1995:02; the dispersion is, however, large. One possible interpretation is that reforms and other public measures in the run-up to EMU may have altered comovements. Also, EMU was anticipated, and private agents may have adjusted their behaviour prior to the event. A third explanation is that the mid-1990s are also associated with a general worldwide acceleration of globalization, which may have tightened linkages between countries. Finally, as in the previous application, we find evidence of considerable heteroskedasticity in the idiosyncratic components, as indicated by the marked difference between the solid and the dotted line in Fig. 2.
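The ICp1 selections reported here and in the US application can be reproduced in outline. The sketch below is illustrative code with simulated data, not the Altissimo et al. panel; it implements the Bai and Ng (2002) ICp1 criterion and replicates the pattern discussed in Section 2, namely that a break in the loadings inflates the number of factors selected on the full sample relative to the subsamples. All names are ours.

```python
import numpy as np

def icp1_r(X, rmax=8):
    # Bai-Ng (2002) ICp1: choose r minimizing
    #   log V(r) + r * ((N+T)/(NT)) * log(NT/(N+T)),
    # where V(r) is the average squared residual after extracting r PC
    # factors, i.e. the sum of the remaining eigenvalues of XX'/(NT).
    T, N = X.shape
    eig = np.linalg.eigvalsh(X @ X.T / (N * T))[::-1]    # descending order
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    ic = [np.log(eig[k:].sum()) + k * penalty for k in range(1, rmax + 1)]
    return 1 + int(np.argmin(ic))

rng = np.random.default_rng(1)
T, N, r0 = 200, 100, 2
f = rng.standard_normal((T, r0))
lam = rng.standard_normal((N, r0))
X = f @ lam.T + rng.standard_normal((T, N))

# A loading shift half-way through the sample adds new factor directions
Xb = X.copy()
Xb[T // 2:] += f[T // 2:] @ rng.standard_normal((N, r0)).T

r_full = icp1_r(X)          # should recover the true r0 = 2 here
r_break = icp1_r(Xb)        # inflated by the break in the loadings
r_pre, r_post = icp1_r(Xb[:T // 2]), icp1_r(Xb[T // 2:])
```

In this simulated design the criterion selects the true number of factors on the stable panel and on each subsample of the broken panel, but a larger number on the full broken sample, mirroring the subsample estimates of three to five factors against r = 9 reported above.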
Next, we investigate whether the events have affected certain countries more than others. We have also formed groups of variables with similar economic content17 and examine whether certain groups of variables have experienced structural breaks in their loadings while the loadings of other groups have remained stable. Table 8 shows the rejection rates for individual countries. We only consider countries for which more than 10 variables were included in the dataset. Rejection rates are relatively high for both events for Spain and Italy, which are the countries with the lowest initial (1992) incomes18 and the highest inflation and long-term interest rates19 of the countries considered and, hence, the greatest need to converge. Italy's public debt was, in addition, quite elevated compared to the public debt of the other countries.20 Table 8 also reports rejection rates for groups of variables. As for the overall tests, rejection rates for all groups are higher for stage 3 of EMU than for Maastricht. At the date of the changeover to a single monetary policy in the euro area, rejection rates are relatively high for inflation as well as for monetary and financial variables. After all, the changeover to a single monetary policy conducted by the ECB is a monetary event, and this result may therefore not be surprising. Maastricht mainly caused breaks in the loadings of industrial production series. Unlike Canova et al. (2009), we detect clear structural breaks in many variables at the two dates. The fact that their dataset does not include nominal variables may explain this difference between our finding and theirs.

7. Conclusions

Analyzing datasets with a large number of variables and time periods involves a severe risk that some of the model parameters
17 ''Industrial production'' includes, besides industrial production, also retail sales, orders, exports, imports, inventories, and car registrations. The ''Inflation'' group summarizes PPI as well as export and import price inflation. ''Monetary and financial variables'' contain interest rates, monetary aggregates, exchange rates, and stock prices. ''Labor market'' summarizes employment variables and wages as well as unit labor costs. Finally, survey expectations form the group ''Surveys''.
18 GDP per capita amounted to 25,536 and 21,103 US$ for Italy and Spain in 1992 and to 27,725, 26,608, 27,116, 28,168 US$ for Germany, France, Belgium, and the Netherlands, respectively, according to The Conference Board and Groningen Growth and Development Centre, Total Economy Database, January 2008.
19 In 1992, year-on-year CPI inflation was at 5.3% and 5.9% in Italy and Spain and at 5.1%, 2.4%, 2.4%, 3.2% in Germany, France, Belgium, and the Netherlands, respectively. In 1992, the long-term interest rates were at 13.3% and 11.7% for Italy and Spain and at 7.9%, 8.6%, 8.7% and 8.1% for Germany, France, Belgium, and the Netherlands, respectively. Source: Economic Outlook, OECD.
20 In 1992 the gross public debt as a percentage of GDP according to the Maastricht criterion was at 105.3% for Italy and at 45.9%, 42.1%, 38.8%, 128.5%, 77.4% for Spain, Germany, France, Belgium and the Netherlands, respectively. Source: Economic Outlook, OECD.
are subject to structural breaks. We show that structural breaks in the factor loadings may inflate the number of factors identified by the usual information criteria. Furthermore, we propose Chow-type tests for structural breaks in factor models. It is shown that, under the assumptions of an approximate factor model and if the number of variables is sufficiently large, the estimation error of the common factors does not affect the asymptotic distribution of the Chow statistics. In other words, the PC estimator of the common factors is ''super-consistent'' with respect to the estimation of the factor loadings, and therefore the usual Chow test can be applied to the factor model in a regression where the unknown factors are replaced by principal components. We also show that the Andrews (1993) tests for a structural break with an unknown break date can be used in a factor model. Furthermore, these tests can be generalized to dynamic factor models by adopting a GLS version of the test. This approach assumes a finite-order autoregressive process for the idiosyncratic components, whereas no specific dynamic process needs to be specified for the common factors. Our Monte Carlo simulations suggest that the LM version outperforms the other variants of the test.

The LM test procedures are applied to two different settings. Our first empirical application uses a large US macroeconomic dataset provided by Stock and Watson (2005). We have tested whether the so-called Great Moderation in the US (assuming the first quarter of 1984 as the starting date) coincides with structural breaks in the factor loadings. Much attention of researchers and policy makers has recently been directed to the Great Moderation, and there is still some controversy about its sources (''good luck'' versus structural changes, including ''good policy''). We find evidence of ''dramatic changes'' in the economy, reflected in significant breaks in the factor loadings, in the early 1980s. By testing for breaks in the loadings of individual variables such as the Federal funds rate, inventories, industrial production in the durable and non-durable sectors, personal consumption expenditure, and financial variables, we can assess the underlying sources of the structural change. We find support for the hypothesis that not a single factor but various factors have played an important role. These factors are, according to our analysis, changes in the conduct of monetary policy and in inventory management as well as financial integration.

In the second application we take a large euro-area dataset used in Altissimo et al. (2007) to test whether structural breaks have occurred in the euro area around two major events: the signing of the Maastricht treaty in the second quarter of 1992 and the handover of monetary policy to the ECB (stage 3 of EMU) in the first quarter of 1999. This setting is particularly interesting, since these events may have altered comovements between variables, as noted, and this would be reflected precisely in structural breaks in the factor loadings. We find evidence of structural breaks around both dates, with higher rejection rates for stage 3 of EMU than for the signing of the Maastricht treaty. It is not fully clear whether breaks occurred exactly in 1999 or a few years before, possibly due to prior adjustments. Finally, breaks seem to have occurred relatively frequently in the loadings of Spanish and Italian variables around the two events. The changeover to a single monetary policy in the euro area was associated with relatively frequent structural breaks in the loadings of nominal variables, whereas the signing of the Maastricht treaty seems to coincide with breaks in the factor loadings of industrial production series.

Acknowledgements

The views expressed in this paper do not necessarily reflect the views of the Deutsche Bundesbank. This paper was presented
at the Workshop on Panel Methods and Open Economies, Frankfurt/Main, May 21, 2008, and at the International Conference on Factor Structures for Panel and Multivariate Time Series Data, Maastricht, September 19–20, 2008. The authors would like to thank Jörn Tenhofen, Jean-Pierre Urbain, Joakim Westerlund, and two anonymous referees for many helpful comments and suggestions.

Appendix

Preliminaries

Following Bai (2003), the true values of the factors and factor loadings are indicated by a zero superscript, so the model is written as
$$ y_{it} = \lambda_i^{0\prime} f_t^0 + \varepsilon_{it}. $$
Let $Y = (y_{it})$ be the $T \times N$ matrix of observations. The columns of the PC estimator $\hat F = [\hat f_1, \ldots, \hat f_T]'$ are obtained as $\sqrt{T}$ times the eigenvectors corresponding to the $r$ largest eigenvalues of the matrix $YY'$, obeying the normalization $T^{-1}\hat F'\hat F = I_r$. The estimated matrix of factor loadings results as $\hat\Lambda = [\hat\lambda_1, \ldots, \hat\lambda_N]' = Y'\hat F/T$. Since the factors are identified only up to an arbitrary rotation, we will apply the normalization
$$ y_{it} = \lambda_i^{0\prime} H^{-1\prime} H' f_t^0 + \varepsilon_{it} = \lambda_i' f_t + \varepsilon_{it}, $$
where $f_t = H' f_t^0$ and $\lambda_i = H^{-1}\lambda_i^0$. Furthermore, let $F^0 = [f_1^0, \ldots, f_T^0]'$, $F = [f_1, \ldots, f_T]' = F^0 H$, $\Lambda^0 = [\lambda_1^0, \ldots, \lambda_N^0]'$, and $\Lambda = [\lambda_1, \ldots, \lambda_N]' = \Lambda^0 H^{-1\prime}$. Following Bai (2003), we employ the rotation matrix
$$ H = T\,\Lambda^{0\prime}\Lambda^0\, F^{0\prime}\hat F\,(\hat F' Y Y'\hat F)^{-1}. $$
Using this normalization, Bai and Ng (2002) showed that the PC estimator $\hat F$ is a consistent estimator for $F$.

Proof of Theorem 1

First, consider the LM statistic. Let $\varepsilon_i = [\varepsilon_{i1}, \ldots, \varepsilon_{iT}]'$. The residuals are obtained as $M_{\hat F}\varepsilon_i$, where $\hat F = [\hat f_1, \ldots, \hat f_T]'$ and $M_{\hat F} = I_T - \hat F(\hat F'\hat F)^{-1}\hat F' = I_T - T^{-1}\hat F\hat F'$. The individual LM statistic results as
$$ \hat s_i = \frac{\varepsilon_i' M_{\hat F}\hat F_2(\hat F_2' M_{\hat F}\hat F_2)^{-1}\hat F_2' M_{\hat F}\varepsilon_i}{\varepsilon_i' M_{\hat F}\varepsilon_i/T}, \qquad (18) $$
where $\hat F_2 = [0, \ldots, 0, \hat f_{T_i^*+1}, \ldots, \hat f_T]'$. Using Lemma B.3 of Bai (2003) and Lemma A.1(ii) of Breitung and Tenhofen (2008) it follows that
$$ T^{-1}\hat F'\hat F = I_r = T^{-1} F' F + O_p(\delta_{NT}^{-2}), $$
where $\delta_{NT} = \min(\sqrt{N}, \sqrt{T})$. The following lemma shows that a similar result holds for $T^{-1}\hat F_2'\hat F_2$:

Lemma A.1. Let $F_2 = [0, \ldots, 0, f_{T_i^*+1}, \ldots, f_T]'$. Under Assumptions A–F of Bai (2003) we have
(i) $\frac{1}{T}\hat F_2'\hat F_2 - \frac{1}{T}F_2' F_2 = \frac{1}{T}\hat F_2'\hat F - \frac{1}{T}F_2' F = O_p(\delta_{NT}^{-2})$,
(ii) $\frac{1}{T}(\hat F_2 - F_2)'\varepsilon_i = O_p(\delta_{NT}^{-2})$.

Proof. (i) Since the upper block of $\hat F_2$ is a matrix of zeros, we have $\hat F_2'\hat F = \hat F_2'\hat F_2$ and $F_2' F = F_2' F_2$. Consider
$$ \frac{1}{T}\hat F_2'\hat F_2 - \frac{1}{T}F_2' F_2 = \frac{1}{T}(\hat F - F)' F_2 + \frac{1}{T}F'(\hat F_2 - F_2) + \frac{1}{T}(\hat F - F)'(\hat F_2 - F_2) = I + II + III. $$
Following Bai (2003) we start from the representation
$$ \hat f_t - f_t = \frac{1}{NT} V_{NT}^{-1}\left[\hat F' F\,\Lambda'\varepsilon_{\cdot t} + \hat F'\varepsilon\Lambda f_t + \hat F'\varepsilon\,\varepsilon_{\cdot t}\right], $$
where $\varepsilon_{\cdot t} = [\varepsilon_{1t}, \ldots, \varepsilon_{Nt}]'$, $\varepsilon = [\varepsilon_{\cdot 1}, \ldots, \varepsilon_{\cdot T}]$, and $V_{NT}$ is an $r \times r$ diagonal matrix of the $r$ largest eigenvalues of $(NT)^{-1} YY'$. We first analyze
$$ \frac{1}{T}(\hat F - F)' F_2 = \frac{1}{NT^2} V_{NT}^{-1}\left[\hat F' F\,\Lambda'\sum_{t=T_i^*+1}^T \varepsilon_{\cdot t} f_t' + \hat F'\varepsilon\Lambda\sum_{t=T_i^*+1}^T f_t f_t' + \hat F'\varepsilon\sum_{t=T_i^*+1}^T \varepsilon_{\cdot t} f_t'\right] = a + b + c. $$
From Assumption F(2) of Bai (2003) it follows that
$$ \Lambda'\sum_{t=T_i^*+1}^T \varepsilon_{\cdot t} f_t' = O_p(\sqrt{NT}), $$
and $T^{-1}\hat F' F - T^{-1} F' F = T^{-1}(\hat F - F)' F = O_p(\delta_{NT}^{-2})$ (cf. Bai, 2003, Lemma A.2). Thus, we obtain
$$ a = V_{NT}^{-1}\,(T^{-1}\hat F' F)\,\frac{1}{NT}\,\Lambda'\sum_{t=T_i^*+1}^T \varepsilon_{\cdot t} f_t' = O_p\!\left(\frac{1}{\sqrt{NT}}\right). $$
Next consider
$$ \Lambda'\varepsilon'\hat F = \Lambda'\sum_{t=1}^T \varepsilon_{\cdot t} f_t' + \Lambda'\sum_{t=1}^T \varepsilon_{\cdot t}(\hat f_t - f_t)'. $$
Since $\Lambda'\sum_{t=1}^T \varepsilon_{\cdot t} f_t' = O_p(\sqrt{NT})$ and $(NT)^{-1}\Lambda'\sum_{t=1}^T \varepsilon_{\cdot t}(\hat f_t - f_t)' = O_p(\delta_{NT}^{-1} N^{-1/2})$ (cf. Bai, 2003, Lemma B.1), we obtain
$$ b = V_{NT}^{-1}\left[\frac{1}{NT}\hat F'\varepsilon\Lambda\right]\left[\frac{1}{T}\sum_{t=T_i^*+1}^T f_t f_t'\right] = O_p\!\left(\frac{1}{\delta_{NT}\sqrt{N}}\right) O_p(1) = O_p\!\left(\frac{1}{\delta_{NT}\sqrt{N}}\right). $$
As in Bai (2003, p. 164f), we obtain for the remaining term
$$ c = V_{NT}^{-1}\left[\frac{1}{T^2}\sum_{s=1}^T\sum_{t=T_i^*+1}^T \hat f_s f_t'\,\gamma_N(s,t) + \frac{1}{T^2}\sum_{s=1}^T\sum_{t=T_i^*+1}^T \hat f_s f_t'\,\zeta_N(s,t)\right] = O_p\!\left(\frac{1}{\delta_{NT}\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}\sqrt{N}}\right), $$
where
$$ \zeta_N(s,t) = \varepsilon_{\cdot s}'\varepsilon_{\cdot t}/N - \gamma_N(s,t), \qquad \gamma_N(s,t) = E(\varepsilon_{\cdot s}'\varepsilon_{\cdot t}/N) $$
(cf. Bai, 2003, p. 163). Collecting these results we obtain
$$ I = a + b + c = O_p\!\left(\frac{1}{\sqrt{NT}}\right) + O_p\!\left(\frac{1}{\delta_{NT}\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}\sqrt{N}}\right) = O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right). $$
Using the same arguments it follows that $II = O_p(\delta_{NT}^{-2})$. Finally, following closely the proof of Theorem 1 in Bai and Ng (2002), we obtain $III = O_p(\delta_{NT}^{-2})$.

(ii) The proof is similar to (i). We therefore present the main steps only. Consider
$$ \frac{1}{T}\sum_{t=T_i^*+1}^T (\hat f_t - f_t)\varepsilon_{it} = \frac{1}{NT^2} V_{NT}^{-1}\left[\hat F' F\,\Lambda'\sum_{t=T_i^*+1}^T \varepsilon_{\cdot t}\varepsilon_{it} + \hat F'\varepsilon\Lambda\sum_{t=T_i^*+1}^T f_t\varepsilon_{it} + \hat F'\varepsilon\sum_{t=T_i^*+1}^T \varepsilon_{\cdot t}\varepsilon_{it}\right] = a_i + b_i + c_i. $$
As shown by Bai (2003, p. 160),
$$ a_i = V_{NT}^{-1}\,(T^{-1}\hat F' F)\,\frac{1}{NT}\,\Lambda'\sum_{t=T_i^*+1}^T \varepsilon_{\cdot t}\varepsilon_{it} = O_p\!\left(\frac{1}{\sqrt{NT}}\right) + O_p\!\left(\frac{1}{N}\right). $$
For the second term we obtain
$$ b_i = V_{NT}^{-1}\left[\frac{1}{NT}\hat F'\varepsilon\Lambda\right]\left[\frac{1}{T}\sum_{t=T_i^*+1}^T f_t\varepsilon_{it}\right] = O_p\!\left(\frac{1}{\delta_{NT}\sqrt{N}}\right) O_p\!\left(\frac{1}{\sqrt{T}}\right) $$
(cf. Bai, 2003, B.1), and for the last term
$$ c_i = V_{NT}^{-1}\,\frac{1}{T^2}\sum_{s=1}^T\sum_{t=T_i^*+1}^T \hat f_s\,\varepsilon_{it}\,(N^{-1}\varepsilon_{\cdot s}'\varepsilon_{\cdot t}) = O_p\!\left(\frac{1}{\delta_{NT}\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}\sqrt{N}}\right) $$
(cf. Bai, 2003, p. 163). Collecting these results we have
$$ a_i + b_i + c_i = O_p\!\left(\frac{1}{N}\right) + O_p\!\left(\frac{1}{\delta_{NT}\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}\sqrt{N}}\right) = O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right). \qquad\square $$
Using these results, we obtain
$$ T^{-1}\hat F_2' M_{\hat F}\hat F_2 = T^{-1} F_2' M_F F_2 + O_p(\delta_{NT}^{-2}), $$
where $M_F = I_T - F(F'F)^{-1} F'$. Using Lemma A.1(i) and (ii) and Lemma B.1 of Bai (2003) we obtain in a similar manner
$$ T^{-1/2}\varepsilon_i' M_{\hat F}\hat F_2 = T^{-1/2}\varepsilon_i' F_2 + T^{-1/2}\varepsilon_i'(\hat F_2 - F_2) - T^{-1/2}\varepsilon_i'\hat F\,\frac{1}{T}\hat F'\hat F_2 = T^{-1/2}\varepsilon_i' F_2 - T^{-1/2}\varepsilon_i' F\,\frac{1}{T} F' F_2 + O_p(\sqrt{T}/\delta_{NT}^{2}) = T^{-1/2}\varepsilon_i' M_F F_2 + O_p(\sqrt{T}/\delta_{NT}^{2}). $$
Finally, Eq. (10) of Bai and Ng (2002) implies
$$ T^{-1}\varepsilon_i' M_{\hat F}\varepsilon_i = T^{-1}\varepsilon_i' M_F\varepsilon_i + O_p(\delta_{NT}^{-2}). $$
Using these results, we obtain
$$ \hat s_i = \frac{\varepsilon_i' M_F F_2(F_2' M_F F_2)^{-1} F_2' M_F\varepsilon_i}{\varepsilon_i' M_F\varepsilon_i/T} + O_p(\sqrt{T}/\delta_{NT}^{2}) = s_i^0 + O_p(\sqrt{T}/\delta_{NT}^{2}). $$
Note that $s_i^0$ is the LM statistic obtained from the (infeasible) regression that uses $F$ instead of $\hat F$. Under Assumption 2, $s_i^0$ has a $\chi^2$ limiting distribution as $T \to \infty$.

To derive the limiting distribution of the Wald statistic $\hat w_i$, we first note that the only difference from the LM statistic is that the variance estimator in the denominator of (18) is computed by using the sum of squared residuals from a regression of $M_{\hat F}\varepsilon_i$ on $M_{\hat F}\hat F_2$. Denote the resulting residual vector as $\hat\varepsilon_i^*$. From standard regression theory it is well known that
$$ \varepsilon_i' M_{\hat F}\varepsilon_i = \hat\varepsilon_i^{*\prime}\hat\varepsilon_i^* + \varepsilon_i' M_{\hat F}\hat F_2(\hat F_2' M_{\hat F}\hat F_2)^{-1}\hat F_2' M_{\hat F}\varepsilon_i. $$
Using the same results as were obtained for the LM statistic, we have
$$ T^{-1}(\varepsilon_i' M_{\hat F}\varepsilon_i - \hat\varepsilon_i^{*\prime}\hat\varepsilon_i^*) = T^{-1}\varepsilon_i' M_F F_2(F_2' M_F F_2)^{-1} F_2' M_F\varepsilon_i + O_p(\delta_{NT}^{-2}). $$
The first term on the r.h.s. is nonnegative and $O_p(T^{-1})$, and therefore the difference between the variance estimators based on the restricted and unrestricted models is positive and $O_p(T^{-1})$. Therefore, $\hat w_i \ge \hat s_i$ and $\hat w_i = s_i^0 + O_p(T^{-1}) + O_p(\sqrt{T}/\delta_{NT}^{2})$. Using a first-order Taylor expansion, we obtain for the LR statistic
$$ \hat{lr}_i = T[\log(S_{0i}) - \log(S_{1i} + S_{2i})] = \frac{S_{0i} - S_{1i} - S_{2i}}{(S_{1i} + S_{2i})/T} + O_p(T^{-1}) = \frac{\varepsilon_i' M_{\hat F}\hat F_2(\hat F_2' M_{\hat F}\hat F_2)^{-1}\hat F_2' M_{\hat F}\varepsilon_i}{\varepsilon_i' M_{\hat F}\varepsilon_i/T + O_p(T^{-1})} + O_p(T^{-1}) = \hat s_i + O_p(T^{-1}). \qquad\square $$

Proof of Theorem 2

Let $\hat F_2^\tau = [0, \ldots, 0, \hat f_{[\tau T]+1}, \ldots, \hat f_T]'$ and write the LM statistic for a structural break at $T_i^* = [\tau T]$ as
$$ \hat s_i^\tau = \frac{\varepsilon_i' M_{\hat F}\hat F_2^\tau(\hat F_2^{\tau\prime} M_{\hat F}\hat F_2^\tau)^{-1}\hat F_2^{\tau\prime} M_{\hat F}\varepsilon_i}{\varepsilon_i' M_{\hat F}\varepsilon_i/T}. \qquad (19) $$
Using Lemma A.1 and Assumption 3 we obtain
$$ T^{-1/2} F_2^{\tau\prime}\varepsilon_i = T^{-1/2}\sum_{t=[\tau T]+1}^T f_t\varepsilon_{it} \;\xrightarrow{d}\; \sigma_i[W_i(1) - W_i(\tau)], $$
$$ T^{-1}\hat F_2^{\tau\prime}\hat F_2^\tau = T^{-1}\sum_{t=[\tau T]+1}^T f_t f_t' + O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right) \;\xrightarrow{p}\; (1-\tau) I_r, $$
$$ T^{-1}\hat F'\hat F = T^{-1}\sum_{t=1}^T f_t f_t' + O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right) \;\xrightarrow{p}\; I_r, $$
where we have used the fact that $\Omega_F = I_r$ in Assumption 3 due to the normalization of the factors. It follows that
$$ T^{-1/2}\hat F_2^{\tau\prime} M_{\hat F}\varepsilon_i = T^{-1/2}\hat F_2^{\tau\prime}\varepsilon_i - \left(T^{-1}\hat F_2^{\tau\prime}\hat F\right)\left(T^{-1}\hat F'\hat F\right)^{-1}\left(T^{-1/2}\hat F'\varepsilon_i\right) \;\xrightarrow{d}\; \sigma_i[\tau W_i(1) - W_i(\tau)] $$
and
$$ T^{-1}\hat F_2^{\tau\prime} M_{\hat F}\hat F_2^\tau \;\xrightarrow{p}\; \tau(1-\tau) I_r. $$
Using these results, the limiting distribution of $\hat s_i^\tau$ is obtained as
$$ \hat s_i^\tau = Z_{T,i}^{\tau\prime} Z_{T,i}^\tau + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{\sqrt{T}}{\delta_{NT}^{2}}\right), $$
where
$$ Z_{T,i}^\tau = \frac{1}{\sqrt{\tau(1-\tau)}}\left[\frac{\tau}{\sigma_i\sqrt{T}}\sum_{t=1}^T \varepsilon_{it} f_t - \frac{1}{\sigma_i\sqrt{T}}\sum_{t=1}^{[\tau T]}\varepsilon_{it} f_t\right]. $$
Note that the latter sequence can be represented as a vector of (weighted) partial sums of the form $Z_{T,i}^\tau = T^{-1/2}\sum_{t=1}^T w(f_t, \tau)\varepsilon_{it}$, where the weights are functions of $f_t$ and $\tau$. By letting $Z_{T,i}(\tau) = Z_{T,i}^\tau$ with $\tau = [\tau T]/T$ we embed the observed sequence in the space $D[\tau_0, 1-\tau_0]$. The finite dimensional distributions for all choices of $\tau_1, \ldots, \tau_n$ with $\tau_\kappa \in [\tau_0, 1-\tau_0]$ converge to the corresponding finite dimensional distributions of $[\tau W_i(1) - W_i(\tau)]/\sqrt{\tau(1-\tau)}$. In order to obtain weak convergence in $D[\tau_0, 1-\tau_0]$, the tightness of the sequence $Z_{T,i}(\tau)$ has to be shown. To this end we closely follow Billingsley (1968, Section 6). Since $\{f_t\}$ is independent of $\{\varepsilon_{it}\}$ we may condition on $\{f_t\}$ and obtain a sum of independent random variables with uniformly bounded second moments, for which the inequality (16.5) in Billingsley (1968) remains valid modulo a constant. Having established weak convergence of $Z_{T,i}(\tau)$ to $[\tau W_i(1) - W_i(\tau)]/\sqrt{\tau(1-\tau)}$, we invoke the continuous mapping theorem together with standard results on weak convergence, yielding
$$ \hat s_i^\tau \Rightarrow \frac{[\tau W_i(1) - W_i(\tau)]'[\tau W_i(1) - W_i(\tau)]}{\tau(1-\tau)}. $$
Another application of the continuous mapping theorem yields
$$ \sup_{\tau \in [\tau_0, 1-\tau_0]} \hat s_i^\tau \;\Rightarrow\; \sup_{\tau \in [\tau_0, 1-\tau_0]} \frac{[\tau W_i(1) - W_i(\tau)]'[\tau W_i(1) - W_i(\tau)]}{\tau(1-\tau)}. $$
Since the asymptotic distribution is the same for all $i$, we drop the index $i$ in Theorem 2. $\square$

Proof of Theorem 3

To derive the limiting distribution of the feasible GLS version of the LM test, we make use of the following two lemmas:
Lemma A.2. It holds for any fixed $m$ and $k \le m$ that
(i) $T^{-1}\sum_{t=m+1}^T (\hat f_t - f_t) f_{t-k}' = O_p(\delta_{NT}^{-2})$ and $T^{-1}\sum_{t=m+1}^T (\hat f_t - f_t)\hat f_{t-k}' = O_p(\delta_{NT}^{-2})$,
(ii) $T^{-1}\sum_{t=m+1}^T \hat f_t\hat f_{t-k}' = T^{-1}\sum_{t=m+1}^T f_t f_{t-k}' + O_p(\delta_{NT}^{-2})$,
(iii) $T^{-1}\sum_{t=m+1}^T (\hat f_t - f_t) u_{i,t-k} = O_p(\delta_{NT}^{-2})$.

Proof. For $m = p_i$ these results are shown in Breitung and Tenhofen (2008, Lemma A.1). For $m = T_i^*$ the proof can be modified straightforwardly according to Lemma A.1. $\square$

Lemma A.3. Let $\varrho_{(i)} = [\varrho_{i,1}, \ldots, \varrho_{i,p_i}]'$ and let $\hat\varrho_{(i)} = [\hat\varrho_{i,1}, \ldots, \hat\varrho_{i,p_i}]'$ denote the least squares estimates from (15). Under Assumption 1 we have, as $(N,T) \to \infty$,
$$ \hat\varrho_{(i)} = \varrho_{(i)} + O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2}). $$
Proof. The proof is given in Breitung and Tenhofen (2008, Lemma 1). $\square$

To simplify the notation, we focus on the AR(1) model $u_{it} = \varrho_i u_{i,t-1} + \varepsilon_{it}$. The extension to AR($p$) models is straightforward but implies a considerable additional notational burden. The LM statistic can be written as
$$ \tilde s_i = \hat\psi_{i,21}'\,\hat\Psi_{i,22}^{-1}\,\hat\psi_{i,21}/\hat\psi_{i,11}, $$
where
$$ \hat\psi_{i,21} = \hat G_{i,2}' M_{\hat G_i}\hat\varepsilon_i, \qquad \hat\Psi_{i,22} = \hat G_{i,2}' M_{\hat G_i}\hat G_{i,2}, \qquad \hat\psi_{i,11} = \hat\varepsilon_i' M_{\hat G_i}\hat\varepsilon_i/T, $$
and
$$ \hat G_i = [\hat f_2 - \hat\varrho_i\hat f_1, \ldots, \hat f_T - \hat\varrho_i\hat f_{T-1}]', $$
$$ \hat G_{i,2} = [0, \ldots, 0, \hat f_{T_i^*+1} - \hat\varrho_i\hat f_{T_i^*}, \ldots, \hat f_T - \hat\varrho_i\hat f_{T-1}]', $$
$$ \hat\varepsilon_i = [\hat u_{i,2} - \hat\varrho_i\hat u_{i,1}, \ldots, \hat u_{i,T} - \hat\varrho_i\hat u_{i,T-1}]', $$
$$ M_{\hat G_i} = I_{T-1} - \hat G_i(\hat G_i'\hat G_i)^{-1}\hat G_i'. $$
Using Lemma A.2(ii) and Lemma A.3, we obtain
$$ \frac{1}{T}\hat G_{i,2}'\hat G_i = \frac{1}{T}\sum_{t=T_i^*+1}^T (\hat f_t - \hat\varrho_i\hat f_{t-1})(\hat f_t - \hat\varrho_i\hat f_{t-1})' = \frac{1}{T}\sum_{t=T_i^*+1}^T (f_t - \varrho_i f_{t-1})(f_t - \varrho_i f_{t-1})' + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right) $$
and, similarly,
$$ \frac{1}{T}\hat G_i'\hat G_i = \frac{1}{T}\sum_{t=2}^T (f_t - \varrho_i f_{t-1})(f_t - \varrho_i f_{t-1})' + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right). $$
Next, consider
$$ \frac{1}{\sqrt{T}}\hat G_i'\hat\varepsilon_i = \frac{1}{\sqrt{T}}\sum_{t=2}^T (\hat f_t - \hat\varrho_i\hat f_{t-1})(u_{it} - \hat\varrho_i u_{i,t-1}) = \frac{1}{\sqrt{T}}\sum_{t=2}^T [\hat f_t - \varrho_i\hat f_{t-1} + (\varrho_i - \hat\varrho_i)\hat f_{t-1}][\varepsilon_{it} + (\varrho_i - \hat\varrho_i) u_{i,t-1}] = \frac{1}{\sqrt{T}}\sum_{t=2}^T (\hat f_t - \varrho_i\hat f_{t-1})\varepsilon_{it} + A + B_1 + B_2 + C, $$
where
$$ A = \frac{1}{\sqrt{T}}(\varrho_i - \hat\varrho_i)\sum_{t=2}^T \hat f_{t-1}\varepsilon_{it}, \qquad B_1 = \frac{1}{\sqrt{T}}(\varrho_i - \hat\varrho_i)\sum_{t=2}^T \hat f_t u_{i,t-1}, $$
$$ B_2 = -\frac{1}{\sqrt{T}}\varrho_i(\varrho_i - \hat\varrho_i)\sum_{t=2}^T \hat f_{t-1} u_{i,t-1}, \qquad C = \frac{1}{\sqrt{T}}(\varrho_i - \hat\varrho_i)^2\sum_{t=2}^T \hat f_{t-1} u_{i,t-1}. $$
Using Lemma A.2 and Lemma A.3, we obtain
$$ A = \sqrt{T}(\varrho_i - \hat\varrho_i)\,\frac{1}{T}\sum_{t=2}^T\left[f_{t-1}\varepsilon_{it} + (\hat f_{t-1} - f_{t-1})\varepsilon_{it}\right] = O_p(1)\left[O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2})\right] $$
and, similarly,
$$ B_1 = \sqrt{T}(\varrho_i - \hat\varrho_i)\,\frac{1}{T}\sum_{t=2}^T \hat f_t u_{i,t-1} = O_p(1)\left[O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2})\right], $$
with the same order for $B_2$. For the last term we have
$$ C = \left[\sqrt{T}(\varrho_i - \hat\varrho_i)\right]^2\frac{1}{T^{3/2}}\sum_{t=2}^T \hat f_{t-1} u_{i,t-1} = O_p(T^{-1}) + O_p(T^{-1/2}\delta_{NT}^{-2}) + O_p(\delta_{NT}^{-4}). $$
Using Lemma A.2(iii), it follows that
$$ \frac{1}{\sqrt{T}}\sum_{t=2}^T (\hat f_t - \varrho_i\hat f_{t-1})\varepsilon_{it} = \frac{1}{\sqrt{T}}\sum_{t=2}^T (f_t - \varrho_i f_{t-1})\varepsilon_{it} + O_p\!\left(\frac{\sqrt{T}}{\delta_{NT}^{2}}\right). $$
Collecting these results gives
$$ \frac{1}{\sqrt{T}}\hat G_i'\hat\varepsilon_i = \frac{1}{\sqrt{T}}\sum_{t=2}^T (f_t - \varrho_i f_{t-1})\varepsilon_{it} + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{\sqrt{T}}{\delta_{NT}^{2}}\right), $$
and, by the same arguments,
$$ \frac{1}{\sqrt{T}}\hat\psi_{i,21} = \frac{1}{\sqrt{T}} G_{i,2}' M_{G_i}\varepsilon_i + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{\sqrt{T}}{\delta_{NT}^{2}}\right), $$
where
$$ G_i = [f_2 - \varrho_i f_1, \ldots, f_T - \varrho_i f_{T-1}]', \qquad G_{i,2} = [0, \ldots, 0, f_{T_i^*+1} - \varrho_i f_{T_i^*}, \ldots, f_T - \varrho_i f_{T-1}]', $$
$$ M_{G_i} = I_{T-1} - G_i(G_i' G_i)^{-1} G_i'. $$
Furthermore,
$$ \frac{1}{T}\hat\Psi_{i,22} = \frac{1}{T} G_{i,2}' M_{G_i} G_{i,2} + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right) $$
and
$$ \hat\psi_{i,11} = \frac{1}{T}\varepsilon_i' M_{G_i}\varepsilon_i + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{1}{\delta_{NT}^{2}}\right). $$
It follows that
$$ \tilde s_i = s_i^0 + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{\sqrt{T}}{\delta_{NT}^{2}}\right), $$
where
$$ s_i^0 = \frac{\varepsilon_i' M_{G_i} G_{i,2}(G_{i,2}' M_{G_i} G_{i,2})^{-1} G_{i,2}' M_{G_i}\varepsilon_i}{\varepsilon_i' M_{G_i}\varepsilon_i/T}. $$
Under Assumption 2 we therefore have si → χ(2r ) as N , T → ∞ √ and T /N → 0. References Andrews, D.W.K., 1993. Tests for parameter instability and structural change with unknown change point. Econometrica 61, 821–856. Andrews, D.W.K., Lee, I., Ploberger, W., 1996. Optimal change point tests for normal linear regression. Journal of Econometrics 70, 9–38. Andrews, D.W.K., Ploberger, W., 1994. Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62, 1383–1414. Altissimo, F., Forni, M., Lippi, M., Veronese, G., Cristadoro, R., 2007. New Eurocoin: tracking economic growth in real time. Bank of Italy Working Paper 631. Amengual, D., Watson, M.W., 2007. Consistent estimation of the number of dynamic factors in a large N and T panel. Journal of Business & Economic Statistics 25, 91–96. Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71, 135–172. Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191–221. Bai, J., Ng, S., 2007. Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25, 52–60. Bai, J., Perron, P., 1998. Estimating and testing linear models with multiple structural changes. Econometrica 66, 47–78. Bai, J., Perron, P., 2003. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18, 1–22. Banerjee, A., Marcellino, M., 2008. Forecasting macroeconomic variables using diffusion indexes in short samples with structural change. CEPR Working Paper 6706. Baxter, M., Kouparitsas, M.A., 2005. Determinants of business cycle comovement: a robust analysis. Journal of Monetary Economics 52 (1), 113–157. Benati, L., Mumtaz, H., 2007. US evolving macroeconomic dynamics—a structural investigation. ECB Working Paper 746. Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York. Breitung, J., Pesaran, M., 2008. 
Journal of Econometrics 163 (2011) 85–104
Cross-sectional dependence robust block bootstrap panel unit root tests Franz C. Palm, Stephan Smeekes ∗ , Jean-Pierre Urbain Department of Quantitative Economics, Maastricht University, The Netherlands
article info

Article history: Available online 10 December 2010
JEL classification: C15; C23
Keywords: Block bootstrap; Panel unit root test; Cross-sectional dependence
abstract

In this paper we consider the issue of unit root testing in cross-sectionally dependent panels. We consider panels that may be characterized by various forms of cross-sectional dependence, including (but not limited to) the popular common factor framework. We consider block bootstrap versions of the group-mean (Im et al., 2003) and the pooled (Levin et al., 2002) unit root coefficient DF tests for panel data, originally proposed for a setting of no cross-sectional dependence beyond a common time effect. The tests, suited for testing for unit roots in the observed data, can be easily implemented as no specification or estimation of the dependence structure is required. Asymptotic properties of the tests are derived for T going to infinity and N finite. Asymptotic validity of the bootstrap tests is established in very general settings, including the presence of common factors and cointegration across units. Properties under the alternative hypothesis are also considered. In a Monte Carlo simulation, the bootstrap tests are found to have rejection frequencies that are much closer to nominal size than those of the corresponding asymptotic tests. The power properties of the bootstrap tests appear to be similar to those of the asymptotic tests. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

The use of panel data to test for unit roots and cointegration has become very popular recently. A major problem with tests for unit roots (and cointegration) in univariate time series is that they lack power in small samples. Therefore, one of the reasons researchers have turned to panel data is to utilize the cross-sectional dimension to increase power. Another reason to use panel data is that one might be interested in testing a joint unit root hypothesis for N entities. The so-called first-generation panel unit root tests, such as those proposed by Levin et al. (2002) and Im et al. (2003), are examples where the cross-sectional dimension is used to construct tests that have higher power than individual unit root tests. However, all the first-generation tests rely on independence along the cross-sectional dimension. It was soon realized that cross-sectional independence is a highly unrealistic assumption for most settings encountered in practice, and it has been shown that the first-generation tests exhibit large size distortions in the presence of cross-sectional dependence (e.g. O'Connell, 1998). Therefore, the so-called second-generation panel unit root tests have been constructed to take
∗ Corresponding address: Department of Quantitative Economics, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Tel.: +31 43 3883856; fax: +31 43 3884874.
E-mail addresses: [email protected] (F.C. Palm), [email protected] (S. Smeekes), [email protected] (J.-P. Urbain).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.11.010
the cross-sectional dependence into account in some way. These second-generation tests assume specific forms of the cross-sectional dependence, as their application depends on modelling the structure of the dependence. Most tests model the cross-sectional dependence in the form of common factors, although the way the common factors are dealt with differs for each test. Examples of second-generation panel unit root tests are Bai and Ng (2004), Moon and Perron (2004), and Pesaran (2007). An extensive Monte Carlo comparison of these tests can be found in Gengenbach et al. (2010). Breitung and Das (2008) provide an analytical comparison of several first- and second-generation tests in the presence of factor structures. While the second-generation panel unit root tests can deal with common factor structures and contemporaneous dependence, they cannot deal with other forms of cross-sectional dependence, with the exception of Pedroni et al. (2008). Of particular interest for practical applications are dynamic interrelationships (an example of which is Granger causality). Our goal in this paper is to present panel unit root tests that can deal not only with common factors, but also with a wide range of other plausible dynamic dependencies. The tool we use to achieve this is the block bootstrap. Two very useful features of the block bootstrap are that one does not have to model the dependence (both temporal and cross-sectional) in order to apply it, and that it is valid under a wide range of possible data generating processes (DGPs). This makes it an appropriate tool in this setting with N fixed, possibly large, and large-T asymptotics.
Of course, the idea to use the bootstrap in cross-sectionally dependent panels is not new and has already been proposed by Maddala and Wu (1999),1 but so far no one has considered the theoretical properties of the block bootstrap in this setup. There are theoretical results available for other bootstrap and related resampling methods. Chang (2004) considers sieve bootstrap unit root tests, but the sieve bootstrap can only be applied in panels under restrictive assumptions on the cross-sectional dependence. Kapetanios (2008) proposes a bootstrap resampling scheme which resamples in the cross-sectional dimension instead of the usual time dimension, but this is based on cross-sectional independence. Choi and Chue (2007) consider subsampling, which does allow for more general dependence, but as the authors themselves state (p. 235) ''Notwithstanding these nice features of the subsampling approach, depending on the nature of the problem at hand, other methods like bootstrapping may work better in finite samples''. Hence, the properties of the block bootstrap are still largely unknown in this setting, while in fact the block bootstrap is quite popular among practitioners. We try to fill this gap by providing theoretical results, mainly concerning asymptotic validity, for block bootstrap panel unit root tests. The block bootstrap method we consider here is the moving-blocks bootstrap (Künsch, 1989); it is an extension of the univariate bootstrap unit root test proposed by Paparoditis and Politis (2003). We will consider a very general DGP that can capture many different interesting and relevant forms of cross-sectional and time dependence. Our results provide the theoretical justification, supported by Monte Carlo evidence, for the use of the proposed panel unit root tests in applications where one is interested in testing for a unit root in the observed data, and where cross-sectional dependence of possibly unknown form might be present in the data.
The tests can be easily implemented, as they do not require the specification and estimation of the cross-sectional dependence structure. For example, it is not necessary to know the number of common factors, nor to estimate these factors. It is not even necessary to know whether common factors are present in the data at all.

The structure of this paper is as follows. Section 2 explains the model and assumptions. The test statistics and the construction of the bootstrap versions are discussed in Section 3. We establish the asymptotic validity of the bootstrap tests (for T → ∞ and N fixed) for various settings in Section 4. Finite sample performance, including block length selection, is investigated in Section 5. Section 6 concludes. All proofs and preliminary results are contained in the Appendix.

Finally, a word on notation. We use | · | to denote the Euclidean norm for vectors and matrices, i.e. |v| = (v′v)^{1/2} for a vector v and |M| = (tr M′M)^{1/2} for a matrix M. ⌊x⌋ is the largest integer smaller than or equal to x. Convergence in distribution (probability) is denoted by →d (→p). Bootstrap quantities (conditional on the original sample) are indicated by appending a superscript ∗ to the standard notation.

1 Also see Fachin (2007) and Di Iorio and Fachin (2008) for some successful applications of the block bootstrap in testing for cointegration in panels.

2. Cross-sectionally dependent panels

Let us first describe the model that we use for panels with possible unit roots and that allows for various types of cross-sectional and temporal dependence. Let yt = (y_{1,t}, ..., y_{N,t})′ (t = 1, ..., T) be generated as

yt = ΛFt + wt,    (1)

where Λ = (λ1, ..., λN)′, Ft = (F_{1,t}, ..., F_{d,t})′ and wt = (w_{1,t}, ..., w_{N,t})′. Hence, Ft are common factors (d in total), Λ are the (non-random) factor loadings, and wt are the idiosyncratic components. Let y0 = 0. We let the factors and the idiosyncratic components be generated by

Ft = ΦF_{t−1} + ft,    (2)
wt = Θw_{t−1} + vt,

where Φ = diag(φ1, ..., φd) and Θ = diag(θ1, ..., θN). Furthermore, we let ft and vt be constructed as

(vt′, ft′)′ = Ψ(L)εt = [Ψ11(L) Ψ12(L); Ψ21(L) Ψ22(L)] (ε_{v,t}′, ε_{f,t}′)′,    (3)

where Ψ(z) = Σ_{j=0}^∞ Ψj z^j (Ψ0 = I). We also partition Ψ(z) as Ψ(z) = (Ψ1(z)′, Ψ2(z)′)′ where Ψi(z) = (Ψ_{i1}(z), Ψ_{i2}(z)), i = 1, 2. We only need some mild conditions on Ψ(z) and εt.

Assumption 1. (i) det(Ψ(z)) ≠ 0 for all {z ∈ C : |z| = 1} and Σ_{j=0}^∞ j|Ψj| < ∞.
(ii) εt is i.i.d. with Eεt = 0, Eεtεt′ = Σ and E|εt|^{2+ϵ} < ∞ for some ϵ > 0.

Our null hypothesis is H0: y_{i,t} has a unit root for all i = 1, ..., N. As in Bai and Ng (2004) and Breitung and Das (2008), we discern three different settings under which this can occur.

(A) θi = φj = 1 for all i = 1, ..., N and j = 1, ..., d: both the common factors and the idiosyncratic components have a unit root. This is our first main setting.
(B) |θi| < 1 for all i = 1, ..., N, φj = 1 for all j = 1, ..., d: the common factors have a unit root while the idiosyncratic components are stationary. In this setting the units are cross-sectionally cointegrated. In accordance with most of the literature we shall call this cross-unit cointegration. We also discuss this case in detail.2
(C) θi = 1 for all i = 1, ..., N, |φj| < 1 for all j = 1, ..., d: the common factors are stationary while the idiosyncratic components have a unit root. We shall not discuss this case in detail in Section 4 but its properties can easily be derived from the previous two cases.

Note that in this paper we are not interested in which of the three settings occurs; instead we simply want to test if y_{i,t} has a unit root for all i. We can discern different alternative hypotheses.3 The following two are of interest to us.
• H1a : yi,t is stationary for all i = 1, . . . , N. This implies that |θi | < 1 for all i = 1, . . . , N and |φj | < 1 for all j = 1, . . . , d. • H1b : yi,t is stationary for a portion of the units. This implies that |φj | < 1 for all j = 1, . . . , d; while |θi | < 1 for all i ∈ I1 and θi = 1 for all i ∈ I2 , with I1 ∪ I2 = {1, . . . , N } and n1 /N = κ > 0, where n1 is the number of elements of I1 .4
2 We could also easily think of a setting in between settings (A) and (B), i.e. one where |θi | < 1 for all i ∈ I1 and θi = 1 for all i ∈ I2 (with I1 ∪ I2 = {1, . . . , N }). In other words, where part of the units are cointegrated and others are not. We will not analyze this setting in detail as it is basically contained in the analysis of settings (A) and (B). 3 Di Iorio and Fachin (2008) discuss several alternative hypotheses that are relevant when testing for the null of no panel cointegration. They also argue that the choice of the test statistic should depend on the alternative hypothesis. Their arguments are valid for the unit root setting as well. 4 In principle we could also let some of the factors be I (1) provided they have zero loadings on the units in I1 .
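To make the roles of Λ, Φ, Θ and Ψ(L) concrete, the following sketch simulates a small panel from a special case of the DGP in (1)–(3), taking Ψ(L) = I so that cross-sectional dependence enters only through a non-diagonal Σ. All dimensions and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_panel(N=5, d=1, T=200, phi=1.0, theta=1.0, seed=0):
    """Simulate y_t = Lambda F_t + w_t (eq. (1)) with AR(1) factors and
    idiosyncratic components (eq. (2)). Setting (A): phi = theta = 1;
    setting (B): phi = 1, |theta| < 1; setting (C): |phi| < 1, theta = 1."""
    rng = np.random.default_rng(seed)
    Lam = rng.normal(size=(N, d))               # non-random factor loadings
    A = rng.normal(size=(N + d, N + d))
    Sigma = A @ A.T / (N + d) + np.eye(N + d)   # positive definite innovation covariance
    eps = rng.multivariate_normal(np.zeros(N + d), Sigma, size=T)
    v, f = eps[:, :N], eps[:, N:]               # (v_t', f_t')' = eps_t since Psi(L) = I here
    F = np.zeros((T, d))
    w = np.zeros((T, N))
    for t in range(1, T):                       # F_0 = w_0 = 0, hence y_0 = 0
        F[t] = phi * F[t - 1] + f[t]
        w[t] = theta * w[t - 1] + v[t]
    return F @ Lam.T + w                        # T x N panel of observations

y = simulate_panel()        # setting (A): every unit has a unit root
```

Setting theta to a value below one in absolute value produces the cross-unit cointegration of setting (B), with the d common stochastic trends shared across all units.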
In setting (B) we need an additional assumption on the factor loadings. In particular, we need to assume that for each unit at least one of the loadings on the factors is not equal to zero. If all loadings for one unit are equal to zero, this unit would be stationary and therefore the null hypothesis would be violated. This is formalized in Assumption 2. Assumption 2. If setting (B) holds, then λ′i λi > 0 for all i = 1, . . . , N. While this seems an additional and unverifiable assumption, this is not really the case. In fact, Assumption 2 is not really an assumption on the DGP as it is implied by our null hypothesis and in particular setting (B). One does therefore not have to verify whether this assumption holds when applying the test: under the null hypothesis it must hold by construction in setting (B), while in settings (A) and (C) it has no impact on the test. Remark 1. Note that while the setting we adopt is fairly comparable to factor models such as those considered in Bai and Ng (2004) and Breitung and Das (2008), it is more general in several ways. First, it is common to assume Ψ12 (z ) = 0, Ψ21 (z ) = 0 and Σ12 = 0, Σ21 = 0 such that the factors are independent of the idiosyncratic components. There is however no need to do so in order to obtain our theoretical results, and therefore we will not make this assumption in general. Whenever this assumption is made, this will be explicitly mentioned. Moreover, and more importantly, in most common factor models only weak dependence between the idiosyncratic components is allowed. We do not make this assumption; instead we allow for a wide array of possible dependencies between the idiosyncratic components, both through Σ and Ψ (z ). Especially the lag polynomial allows for a wide range of dependencies, including all sorts of dynamic dependencies. Therefore setting (A) is our main setting of interest, as simply setting λi = 0 for all i = 1, . . . 
, N results in a model without common factors, where the cross-sectional dependence is completely generated by Σ and Ψ(z). This setting is therefore the most general. We also analyze setting (B) as it has generated a lot of attention in the literature (mainly due to Bai and Ng, 2004), but it is in fact a very specialized setting that lacks the generality of setting (A). The DGP employed by Chang (2004) is also a special case of our DGP: in setting (A), one can obtain this DGP by setting λi = 0 for all i = 1, ..., N and making Ψ(L) diagonal and subject to an invertibility condition. Remark 2. One might wonder whether we can actually call wt idiosyncratic components given the degree of interdependence that we allow for, as Σ11 and Ψ11(z) might be non-diagonal without restrictions beyond full rank and Ψ12(z) might be nonzero. In fact, one could even raise the question whether the distinction between factors and idiosyncratic components makes sense in our model with finite N. After all, the distinction between strong and weak dependence (which gives meaning to factors and idiosyncratic components) depends on the weak dependence disappearing as N grows (see for example Chudik et al., 2009). As a consequence, the long-run covariance matrix will always have full rank in setting (A) irrespective of whether factors are present. Therefore we could use a VMA model for Δyt instead. We could even use this for setting (B), for which we currently use the orders of integration of the factor structure and the idiosyncratic components to distinguish it from the other settings. To be precise, we could assume under the null hypothesis that
Δyt = Ψ(L)εt.

Setting (B) would result from this by assuming that the rank of Ψ(1) is equal to d. By the Beveridge–Nelson decomposition we would then have

Δyt = Ψ(1)εt + Ψ̃(L)Δεt,

which, by writing Ψ(1) = Λβ′ and letting ΔFt = β′εt, could be rewritten as

yt = ΛFt + Ψ̃(L)εt.

The correspondence between this model and ours is obvious from this representation (this model could even be seen as a special case of ours with the differenced factors being white noise). The crucial point is that both this and our model allow us to deal with setting (B), where most existing tests fail.5 However, we stick to the factor structure model and terminology for two reasons. First, we want to stay close to the usual nonstationary panel models, such that our results remain comparable to the main strand of the literature. Also, from our model one can easily reach a ''true'' factor setup by imposing restrictions on the lag polynomial and the covariance matrix of the errors. The second reason is clarity. As mentioned above, we could do without the factor structure and model everything in terms of a VARMA for yt. However, we would still need restrictions (such as on the rank) and the cases would therefore still need to be analyzed separately, as illustrated above. Therefore, little would be gained from this approach, while we believe that the intuitive appeal of our setup and its interpretability would be lost.

3. Bootstrap unit root tests in panels

3.1. Test statistics

We will consider bootstrapping simplified versions of the Levin et al. (2002, LLC) and Im et al. (2003, IPS) test statistics. The first simplification is that we take the test statistics before corrections for mean and variance. The reason is that adding to or multiplying both the original test statistic and the bootstrap test statistic by the same number will obviously not affect the performance of the tests. This is therefore a completely harmless simplification. The second simplification is that we consider DF instead of ADF tests. Usually, the main reason to use ADF-type tests is to obtain asymptotically pivotal statistics.
However, in the presence of complicated cross-sectional dependence it is often not possible to obtain asymptotically pivotal statistics anyway. There is therefore little reason (at least asymptotically) to use ADF instead of DF tests. The third simplification is that we look at the DF coefficient test rather than the t-test. The main reason for this is that block bootstrapping naively studentized statistics may lead to serious problems in terms of accuracy of the tests as discussed for example in Section 3.1.2 of Härdle et al. (2003). As this is a second order problem, it does not lead to invalidity of the bootstrap, but it may cause the bootstrap to converge at a slower rate than the standard asymptotic approximation, although the evidence of this effect in finite samples is not always present.6 Given all these modifications, we prefer to call our test statistics ‘‘pooled’’ and ‘‘group-mean’’ instead of LLC and IPS, respectively. Note though that the essence of the LLC and IPS tests remains in our tests and that our methods can be trivially extended to the original LLC and IPS statistics if one so desires.7
5 Notable exceptions are Choi and Chue (2007) and Chang and Song (2009). 6 As pointed out by one of the referees, Gonçalves and Vogelsang (forthcoming) provide an asymptotic framework that can be used as an alternative to the framework based on Edgeworth expansions to explain their observation that naively studentized statistics actually perform better than corrected studentized statistics in finite samples. 7 Note that these tests could also be implemented when we have an unbalanced panel with different numbers of observations Ti over time, provided of course the number of observations increases. The implementation of the block bootstrap in such a setting, while possible, becomes considerably more complicated.
We define the pooled statistic as the Dickey–Fuller coefficient statistic from the pooled regression of Δy_{i,t} on y_{i,t−1}. Then we can write the pooled statistic as

τp = T (Σ_{i=1}^N Σ_{t=2}^T y_{i,t−1} Δy_{i,t}) / (Σ_{i=1}^N Σ_{t=2}^T y²_{i,t−1}).    (4)

We define our group-mean statistic as the average of the DF coefficient statistics from the individual regressions of Δy_{i,t} on y_{i,t−1} for each i = 1, ..., N. We can then write the group-mean statistic as

τgm = N^{−1} Σ_{i=1}^N T (Σ_{t=2}^T y_{i,t−1} Δy_{i,t}) / (Σ_{t=2}^T y²_{i,t−1}).    (5)
3.2. Bootstrap algorithm

We employ the following block bootstrap algorithm, which is a multivariate extension of the algorithm proposed by Paparoditis and Politis (2003) to test for unit roots in univariate time series.

Bootstrap Algorithm.
1. Let

û_{i,t} = y_{i,t} − ρ̂i y_{i,t−1} − (T − 1)^{−1} Σ_{t=2}^T (y_{i,t} − ρ̂i y_{i,t−1}),    (6)

where

ρ̂i = (Σ_{t=2}^T y_{i,t−1} y_{i,t}) / (Σ_{t=2}^T y²_{i,t−1}),    (7)

for all i = 1, ..., N. Let ût = (û_{1,t}, ..., û_{N,t})′.
2. Choose a block length b (smaller than T).8 Draw i0, ..., i_{k−1} i.i.d. from the uniform distribution on {1, 2, ..., T − b}, where k = ⌊(T − 2)/b⌋ + 1 is the number of blocks.
3. Construct the bootstrap errors u∗_2, ..., u∗_T as

u∗_t = û_{i_m + s},    (8)

where m = ⌊(t − 2)/b⌋ and s = t − mb − 1.
4. Let y∗_1 = y1 and construct y∗_t for t ≥ 2 recursively as

y∗_t = y∗_{t−1} + u∗_t.    (9)

5. Calculate the bootstrap versions of the group-mean and pooled statistics, that is, calculate

τ∗_p = T (Σ_{i=1}^N Σ_{t=2}^T y∗_{i,t−1} Δy∗_{i,t}) / (Σ_{i=1}^N Σ_{t=2}^T (y∗_{i,t−1})²),    (10)

and

τ∗_gm = N^{−1} Σ_{i=1}^N T (Σ_{t=2}^T y∗_{i,t−1} Δy∗_{i,t}) / (Σ_{t=2}^T (y∗_{i,t−1})²).    (11)

6. Repeat Steps 2–5 B times, obtaining bootstrap test statistics τ∗_{κ,b}, b = 1, ..., B, κ = p, gm, and select the bootstrap critical value c∗_α as c∗_α = max{c : B^{−1} Σ_{b=1}^B I(τ∗_{κ,b} < c) ≤ α}, or equivalently as the α-quantile of the ordered τ∗_{κ,b} statistics. Reject the null of a unit root if τ_κ, calculated from Eq. (4) if κ = p or Eq. (5) if κ = gm, is smaller than c∗_α, where α is the nominal level of the test.

Note that a crucial role in the analysis of our block bootstrap method will be played by the series

u_{i,t} = y_{i,t} − ρi y_{i,t−1}.    (12)

As in Paparoditis and Politis (2003), ρi = 1 should correspond to a unit root in y_{i,t}, while ρi < 1 should correspond to y_{i,t} being stationary. Given our estimation of ρ̂i in Step 1, ρi is implicitly defined as

ρi = lim_{t→∞} E(y_{i,t−1} y_{i,t}) / E(y²_{i,t−1}),    (13)

which fulfills these correspondences (Paparoditis and Politis, 2003, Example 2.1).9 Note that under H0 we simply have that u_{i,t} = y_{i,t} − y_{i,t−1} for all i = 1, ..., N, or in vector notation ut = Δyt. We need that the estimator in Step 1 satisfies the properties ρ̂i − ρi = Op(T^{−1}) if ρi = 1 and ρ̂i − ρi = op(1) if ρi < 1. Our OLS estimator satisfies these properties.10 We also need the following assumption on the block length.

Assumption 3. Let b → ∞ and b = o(T^{1/2}) as T → ∞.

8 Block lengths will be discussed in detail below.
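The six steps of the algorithm translate directly into code. The following is a minimal sketch, not the authors' implementation: the statistic passed in (here a bare pooled DF coefficient statistic as in (4)) and the block length `b` are inputs, and Assumption 3 (b → ∞ with b = o(T^{1/2})) should guide the choice of b in practice.

```python
import numpy as np

def tau_p(y):
    """Pooled DF coefficient statistic, eq. (4), for a T x N array y."""
    T = y.shape[0]
    ylag, dy = y[:-1], np.diff(y, axis=0)
    return T * np.sum(ylag * dy) / np.sum(ylag ** 2)

def block_bootstrap_test(y, stat, b, B=499, alpha=0.05, seed=0):
    """Moving-blocks bootstrap panel unit root test, Steps 1-6.
    Returns (reject, bootstrap critical value) for the left-tailed test."""
    rng = np.random.default_rng(seed)
    T, N = y.shape
    # Step 1: unit-by-unit DF regressions (eq. (7)) and recentred residuals (eq. (6))
    ylag, ylev = y[:-1], y[1:]
    rho_hat = np.sum(ylag * ylev, axis=0) / np.sum(ylag ** 2, axis=0)
    u_hat = ylev - rho_hat * ylag
    u_hat -= u_hat.mean(axis=0)            # subtract (T-1)^{-1} sum of the residuals
    k = (T - 2) // b + 1                   # number of blocks
    taus = np.empty(B)
    for rep in range(B):
        # Steps 2-3: draw block start points and lay blocks end to end (eq. (8));
        # 0-based starts {0, ..., T-b-1} mirror the paper's 1-based {1, ..., T-b}
        starts = rng.integers(0, T - b, size=k)
        u_star = np.concatenate([u_hat[s:s + b] for s in starts])[:T - 1]
        # Step 4: y*_1 = y_1 and y*_t = y*_{t-1} + u*_t (eq. (9))
        y_star = np.vstack([y[0], y[0] + np.cumsum(u_star, axis=0)])
        # Step 5: bootstrap statistic (eq. (10) or (11))
        taus[rep] = stat(y_star)
    # Step 6: alpha-quantile of the ordered bootstrap statistics as critical value
    c_star = np.quantile(taus, alpha)
    return stat(y) < c_star, c_star

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=(200, 4)), axis=0)   # N = 4 independent random walks
reject, c_star = block_bootstrap_test(y, tau_p, b=10)
```

Because the blocks preserve the joint cross-sectional distribution within each time window, no model for the dependence across units is needed anywhere in the loop, which is the point of the method.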
Remark 3. While we do not consider deterministic components, our tests can be modified to account for them in the same way as discussed by Levin et al. (2002) and Im et al. (2003). The crucial issue regarding the bootstrap tests is to implement exactly the same deterministic specification in the calculation of both the test statistic on the bootstrap sample and of the test statistic on the original sample. The only further modification of the bootstrap algorithm would be to include the appropriate deterministic components in Step 1 as well. We will not discuss deterministic components in detail in this paper as it would detract from our main objective to deal with cross-sectional dependence. There is a vast literature on deterministic components and their impact. Part of the literature, for example on the local power of panel unit root tests in the case of incidental trends (Moon et al., 2007), depends on N → ∞ and will therefore not apply here, although in finite samples these results will most likely have an impact on our tests as well. We would like to stress that the bootstrap will not solve any issues that arise in the presence of deterministic components. Smeekes (2009) shows for univariate unit root testing that common methods of detrending can be applied to the bootstrap sample such that the asymptotic properties of the tests are correctly replicated as long as the method of detrending in the original sample and the bootstrap sample is the same. Moreover, by simulation it is found there that the properties of the bootstrap tests in finite samples closely follow the asymptotic tests they are based on. Therefore, deterministic components can be analyzed separately from the bootstrap.
9 Given our definition of ρi it is clear that under stationarity we will always have |ρi| < 1. Paparoditis and Politis (2003, Example 2.2) show that if one estimates and hence implicitly defines ρi differently, for example through an ADF regression, it is not always the case that ρi > −1.
10 This is similar to the conditions needed by Paparoditis and Politis (2003, Remark 2.3). Although they always include an intercept in the regression, the rates of convergence are the same because we assume zero drift (see for example Davidson, 2000, Chapter 14).
Remark 4. Unlike the methods considered by Moon and Perron (2004) and Pesaran (2007), which are essentially tests of the presence of a unit root in the idiosyncratic components as pointed out by Bai and Ng (2010), our methods are tests of the presence of a unit root in the observed data. Therefore in our setup there is no need to consider the properties of the common factors separately.
and
4. Asymptotic properties
Remark 5. To see how the Brownian motion B(r ) depends on the idiosyncratic components and on the factors, consider the following. Let Bv (r ) = Ψ1 (1)Σ 1/2 W (r ) be the Brownian motion generated by the idiosyncratic components and Bf (r ) = Ψ2 (1)Σ 1/2 W (r ) the Brownian motion generated by the common factors. With this definition B(r ) = Bv (r ) + ΛBf (r ). Note that if Ψ12 (L) = Ψ21 (L) = 0
In this section we investigate the asymptotic properties of our (bootstrap) test statistics by letting T go to infinity while keeping N fixed. We study only T asymptotics for two reasons. First, it is standard practice in studies on resampling methods; see for example Chang (2004) and Choi and Chue (2007). Second, it is very difficult to obtain meaningful theoretical results for infinite N in our general model without making additional assumptions. However, as neither our bootstrap method nor our proofs of asymptotic validity depend on the finiteness of N, there is no reason to expect asymptotic validity to break down under joint T and N asymptotics.

4.1. Asymptotic properties under the main null hypothesis

In this section we investigate the validity of the bootstrap procedure proposed above in setting (A), i.e. where $\phi_j = 1$ for all $j = 1, \ldots, d$ and $\theta_i = 1$ for all $i = 1, \ldots, N$, or equivalently $\Phi = I_d$ and $\Theta = I_N$. Note that under this null hypothesis we can write

$$u_t = \Delta y_t = \Gamma' x_t, \qquad (14)$$

where $\Gamma = (I_N, \Lambda)'$, and

$$x_t = (v_t', f_t')' = \Psi(L)\varepsilon_t. \qquad (15)$$

4.1.1. Asymptotic properties of the test statistics

We start by presenting the asymptotic distributions for the original series. After all, the bootstrap test statistics should mimic these distributions. The first step is the invariance principle, or functional central limit theorem.

Lemma 1. Let $y_t$ be generated under $H_0$ setting (A) and let Assumption 1 hold. Then, as $T \to \infty$,

$$S_T(r) = T^{-1/2} \sum_{t=1}^{\lfloor Tr \rfloor} u_t \xrightarrow{d} B(r),$$

where $B(r) = \Gamma'\Psi(1)\Sigma^{1/2}W(r)$ and $W(r)$ denotes an $(N+d)$-dimensional standard Brownian motion.

As $\Psi(z)$ is block-diagonal and $\Sigma_{12} = \Sigma_{21} = 0$, we can write $B_v(r) = \Psi_{11}(1)\Sigma_{11}^{1/2}W_1(r)$ and $B_f(r) = \Psi_{22}(1)\Sigma_{22}^{1/2}W_2(r)$, where $W_1(r)$ is of dimension $N$ and $W_2(r)$ is of dimension $d$. For the $i$th element of $B(r)$, $B_i(r)$, we can then write $B_i(r) = B_{v,i}(r) + \lambda_i' B_f(r)$. Note however that, given the points raised in Remark 2, it is not possible to identify $B_v$ from $B_f$, and in that sense the distinction between the two is conceptual only.

Next define

$$\Omega = \lim_{T\to\infty} T^{-1} E\Biggl[\Biggl(\sum_{t=1}^T u_t\Biggr)\Biggl(\sum_{t=1}^T u_t\Biggr)'\Biggr]$$

and

$$\Omega_0 = \lim_{T\to\infty} T^{-1} \sum_{t=1}^T E(u_t u_t').$$

The limiting distributions now follow straightforwardly.

Theorem 1. Let $y_t$ be generated under $H_0$ setting (A) and let Assumption 1 hold. Then, as $T \to \infty$,

$$\tau_p \xrightarrow{d} \frac{\sum_{i=1}^N \bigl[\int_0^1 B_i(r)\,dB_i(r) + \tfrac{1}{2}(\omega_i - \omega_{0,i})\bigr]}{\sum_{i=1}^N \int_0^1 B_i(r)^2\,dr}$$

and

$$\tau_{gm} \xrightarrow{d} \frac{1}{N}\sum_{i=1}^N \frac{\int_0^1 B_i(r)\,dB_i(r) + \tfrac{1}{2}(\omega_i - \omega_{0,i})}{\int_0^1 B_i(r)^2\,dr},$$

where $B_i(r)$ is the $i$th element of $B(r) = \Gamma'\Psi(1)\Sigma^{1/2}W(r)$ and $\omega_i$ ($\omega_{0,i}$) is the $(i,i)$th element of $\Omega$ ($\Omega_0$).

4.1.2. Asymptotic properties of the bootstrap test statistics

Next we turn to the bootstrap test statistics. The first step is the bootstrap invariance principle.

Lemma 2. Let $y_t$ be generated under $H_0$ setting (A). Let Assumptions 1 and 3 hold. Then, as $T \to \infty$,

$$S_T^*(r) = T^{-1/2} \sum_{t=1}^{\lfloor Tr \rfloor} u_t^* \xrightarrow{d^*} B(r) \quad \text{in probability.}$$

Lemma 2 shows that the bootstrap partial sum process correctly mimics the original partial sum process. The limiting distributions of the bootstrap test statistics now follow as given below.

Theorem 2. Let $y_t$ be generated under $H_0$ setting (A). Let Assumptions 1 and 3 hold. Then, as $T \to \infty$,

$$\tau_p^* \xrightarrow{d^*} \frac{\sum_{i=1}^N \bigl[\int_0^1 B_i(r)\,dB_i(r) + \tfrac{1}{2}(\omega_i - \omega_{0,i})\bigr]}{\sum_{i=1}^N \int_0^1 B_i(r)^2\,dr} \quad \text{in probability}$$

and

$$\tau_{gm}^* \xrightarrow{d^*} \frac{1}{N}\sum_{i=1}^N \frac{\int_0^1 B_i(r)\,dB_i(r) + \tfrac{1}{2}(\omega_i - \omega_{0,i})}{\int_0^1 B_i(r)^2\,dr} \quad \text{in probability.}$$

Theorem 2 establishes the asymptotic validity of the proposed tests.

4.2. Asymptotic properties of the tests under cross-unit cointegration

In this section we look at setting (B), i.e. where $\Phi = I_d$ and $\theta_i < 1$ for all $i = 1, \ldots, N$ in (2). Note that in this case we may write

$$u_t = \Delta y_t = \Lambda f_t + \Delta w_t = \Lambda f_t + (1 - L)(I_N - \Theta L)^{-1} v_t. \qquad (16)$$

Now let

$$\bar\Psi(z) = \begin{bmatrix} (I_N - \Theta z)^{-1}\Psi_1(z) \\ \Psi_2(z) \end{bmatrix}, \qquad (17)$$

such that

$$\begin{bmatrix} w_t \\ f_t \end{bmatrix} = \bar\Psi(L)\varepsilon_t. \qquad (18)$$

Note that $\bar\Psi(z)$ satisfies Assumption 1 just as $\Psi(z)$ does.
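The limit random variables appearing in Theorems 1 and 2 can be approximated by simulation. The sketch below is not part of the paper: it discretizes each Brownian motion as a scaled random walk and draws from the pooled-statistic limit in the simplest special case, assuming independent standard Brownian motions $B_i$ and no serial correlation, so that $\omega_i = \omega_{0,i}$; all names are illustrative.

```python
import numpy as np

def pooled_limit_draw(N, steps, rng):
    """One draw from the Theorem 1 limit of the pooled statistic,
    approximating each B_i by a scaled random walk on a grid of
    `steps` points. Assumes omega_i = omega_{0,i} (no serial
    correlation) and independent standard Brownian motions."""
    num = 0.0
    den = 0.0
    for _ in range(N):
        e = rng.standard_normal(steps) / np.sqrt(steps)  # increments dB
        B = np.cumsum(e)                                  # B at grid points
        # int_0^1 B dB  ~ sum of B_{t-1} * dB_t;  int_0^1 B^2 dr ~ mean(B^2)
        num += np.sum(B[:-1] * e[1:])
        den += np.mean(B ** 2)
    return num / den

rng = np.random.default_rng(0)
draws = np.array([pooled_limit_draw(5, 400, rng) for _ in range(500)])
# Dickey-Fuller-type limit: the distribution is located left of zero
print(draws.mean())
```

Quantiles of such draws (with many more replications and finer grids) approximate the asymptotic critical values that the bootstrap replicates automatically.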
F.C. Palm et al. / Journal of Econometrics 163 (2011) 85–104
4.2.1. Asymptotic properties of the test statistics

We start again by presenting the invariance principle for the original series.

Lemma 3. Let $y_t$ be generated under $H_0$ setting (B). Let Assumptions 1 and 2 hold. Then, as $T \to \infty$,

$$S_T(r) = T^{-1/2} \sum_{t=1}^{\lfloor Tr \rfloor} u_t \xrightarrow{d} \bar B(r),$$

where $\bar B(r) = \Lambda B_f(r)$ and $B_f(r) = \Psi_2(1)\Sigma^{1/2}W(r)$.

Note that the resulting Brownian motion $\bar B(r)$ has a reduced rank covariance matrix, as it is only generated by the factors and not the idiosyncratic components. Define

$$\bar\Omega = \lim_{T\to\infty} T^{-1} E\Biggl[\Biggl(\sum_{t=1}^T u_t\Biggr)\Biggl(\sum_{t=1}^T u_t\Biggr)'\Biggr]$$

and

$$\bar\Omega_0 = \lim_{T\to\infty} T^{-1} \sum_{t=1}^T E(u_t u_t').$$

Now we can derive the asymptotic distributions.

Theorem 3. Let $y_t$ be generated under $H_0$ setting (B). Let Assumptions 1 and 2 hold. Then, as $T \to \infty$,

$$\tau_p \xrightarrow{d} \frac{\sum_{i=1}^N \bigl[\int_0^1 \bar B_i(r)\,d\bar B_i(r) + \tfrac{1}{2}(\bar\omega_i - \bar\omega_{0,i})\bigr]}{\sum_{i=1}^N \int_0^1 \bar B_i(r)^2\,dr}$$

and

$$\tau_{gm} \xrightarrow{d} \frac{1}{N}\sum_{i=1}^N \frac{\int_0^1 \bar B_i(r)\,d\bar B_i(r) + \tfrac{1}{2}(\bar\omega_i - \bar\omega_{0,i})}{\int_0^1 \bar B_i(r)^2\,dr},$$

where $\bar B_i(r)$ is the $i$th element of $\bar B(r)$ and $\bar\omega_i$ ($\bar\omega_{0,i}$) is the $(i,i)$th element of $\bar\Omega$ ($\bar\Omega_0$).

4.2.2. Asymptotic properties of the bootstrap test statistics

Next we turn to the bootstrap series. Before presenting the bootstrap invariance principle, some discussion is in order. As can be seen in Lemma 3, the Brownian motion generated by the partial sum process has reduced rank, as it is only driven by the factors. In order to properly replicate the structure of the original series, the same should be true for the bootstrap partial sum process. In the proof of Lemma 2 it is shown that the bootstrap series $u_t^*$ behaves approximately like $u_{i_m+s}$, ignoring centering for the moment. Summing over the variables within one block, we obtain

$$\sum_{s=1}^b u_{i_m+s} = \sum_{s=1}^b (\Lambda f_{i_m+s} + \Delta w_{i_m+s}) = \sum_{s=1}^b \Lambda f_{i_m+s} + w_{i_m+b} - w_{i_m},$$

as all intermediate terms cancel against each other. This also happens in the partial sum of the original series and explains why only the factors contribute to the Brownian motion. However, summing both over the blocks and within the blocks, we obtain

$$\sum_{m=0}^{\lfloor (k-1)r \rfloor} \sum_{s=1}^b u_{i_m+s} = \sum_{m=0}^{\lfloor (k-1)r \rfloor} \Biggl(\sum_{s=1}^b \Lambda f_{i_m+s} + w_{i_m+b} - w_{i_m}\Biggr) = \sum_{m=0}^{\lfloor (k-1)r \rfloor} \sum_{s=1}^b \Lambda f_{i_m+s} + \sum_{m=0}^{\lfloor (k-1)r \rfloor} (w_{i_m+b} - w_{i_m}),$$

where now the endpoints of the blocks do not cancel against each other, as the blocks are randomly selected. The first term in this sum is the partial sum process of the factors, which generates the Brownian motion in Lemma 3 if we divide by $T^{1/2}$. The second part is the partial sum process of the idiosyncratic components, which generates an (unwanted) Brownian motion when divided by $k^{1/2}$. As this rate is slower than $T^{1/2}$ by Assumption 3, the second part will vanish at rate $T^{1/2}/k^{1/2}$, so at rate $b^{1/2}$. Therefore, an increasing block length is crucial to make the second part vanish. In finite samples, however, one will always have a nonzero partial sum of the idiosyncratic components, although its magnitude will depend on both the sample size and the actual block length. Due to this, the covariance matrix of the resulting Brownian motion will almost always be of full rank in finite samples instead of reduced rank as in Lemma 3. It might therefore be expected that the block bootstrap does not work optimally in finite samples in this setting, although it is also clear that large block lengths should improve the performance of the tests in this case.

Remark 6. The result under the null hypothesis of cross-unit cointegration is closely related to the result obtained by Paparoditis and Politis (2003, Lemma 8.5) for the difference-based block bootstrap (DBB) under the alternative. In both cases one bootstraps an over-differenced series. However, as the result for the DBB is under the alternative hypothesis, the different bootstrap stochastic order leads to serious (power) problems, whereas in our setting, under the null hypothesis, it is what preserves the validity of the bootstrap tests. The result described above is formalized in Lemma A.9 in the Appendix. Given the discussion above, it is clear that the bootstrap validity is preserved in this setting, giving rise to the following bootstrap invariance principle.

Lemma 4. Let $y_t$ be generated under $H_0$ setting (B). Let Assumptions 1–3 hold. Then, as $T \to \infty$,

$$S_T^*(r) = T^{-1/2} \sum_{t=1}^{\lfloor Tr \rfloor} u_t^* \xrightarrow{d^*} \bar B(r) \quad \text{in probability.}$$

Finally we derive the limiting distributions of the test statistics, again establishing the asymptotic validity of the bootstrap tests.

Theorem 4. Let $y_t$ be generated under $H_0$ setting (B). Let Assumptions 1–3 hold. Then, as $T \to \infty$,

$$\tau_p^* \xrightarrow{d^*} \frac{\sum_{i=1}^N \bigl[\int_0^1 \bar B_i(r)\,d\bar B_i(r) + \tfrac{1}{2}(\bar\omega_i - \bar\omega_{0,i})\bigr]}{\sum_{i=1}^N \int_0^1 \bar B_i(r)^2\,dr} \quad \text{in probability}$$

and

$$\tau_{gm}^* \xrightarrow{d^*} \frac{1}{N}\sum_{i=1}^N \frac{\int_0^1 \bar B_i(r)\,d\bar B_i(r) + \tfrac{1}{2}(\bar\omega_i - \bar\omega_{0,i})}{\int_0^1 \bar B_i(r)^2\,dr} \quad \text{in probability.}$$
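The cancellation argument above is easy to check numerically. The sketch below is illustrative only (a univariate stationary AR(1) plays the role of the idiosyncratic component; all names and parameter values are ours): the sum of the overdifferenced series within one block telescopes to an endpoint difference, while the sum over randomly selected blocks equals a sum of k non-cancelling endpoint differences.

```python
import numpy as np

rng = np.random.default_rng(1)
T, b = 1000, 10
# stationary AR(1) idiosyncratic component w_t (theta = 0.5, illustrative)
w = np.empty(T)
w[0] = rng.standard_normal()
for t in range(1, T):
    w[t] = 0.5 * w[t - 1] + rng.standard_normal()
u = np.diff(w)                       # overdifferenced series u_t = Delta w_t

# within one block the sum telescopes to the endpoint difference
i = 200
block_sum = u[i:i + b].sum()         # u_{i+1} + ... + u_{i+b}
assert np.isclose(block_sum, w[i + b] - w[i])

# summing k randomly chosen blocks: the endpoints no longer cancel
k = T // b
starts = rng.integers(0, T - b - 1, size=k)
total = sum(u[s:s + b].sum() for s in starts)
endpoints = sum(w[s + b] - w[s] for s in starts)
assert np.isclose(total, endpoints)  # a sum of k endpoint differences
# this sum is O_p(k^(1/2)), not O_p(T^(1/2)): divided by T^(1/2) it
# vanishes at rate b^(1/2), which is why increasing blocks are needed
print(total / np.sqrt(T))
```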
4.3. Asymptotic properties under the alternative hypothesis

Let us start by considering the alternative $H_{1a}$ (stationarity for all $y_{i,t}$). Let us define

$$y_t = \Lambda(I_d - \Phi L)^{-1} f_t + (I_N - \Theta L)^{-1} v_t = \Gamma'\Psi^+(L)\varepsilon_t, \qquad (19)$$

where

$$\Psi^+(L) = \begin{bmatrix} (I_N - \Theta L)^{-1}\Psi_1(L) \\ (I_d - \Phi L)^{-1}\Psi_2(L) \end{bmatrix}. \qquad (20)$$

Note that the lag polynomial $\Psi^+(z)$ meets the conditions in Assumption 1.
We start by describing the asymptotic properties of our test statistics.

Lemma 5. Let $y_t$ be generated under $H_{1a}$. Let Assumption 1 hold. Then, as $T \to \infty$,

$$T^{-1}\tau_p \xrightarrow{p} \frac{\sum_{i=1}^N (\gamma_i(1) - \gamma_i(0))}{\sum_{i=1}^N \gamma_i(0)}$$

and

$$T^{-1}\tau_{gm} \xrightarrow{p} N^{-1}\sum_{i=1}^N \frac{\gamma_i(1) - \gamma_i(0)}{\gamma_i(0)},$$

where $\gamma_i(j) = E(y_{i,t-j} y_{i,t})$.

Lemma 5 shows that both test statistics diverge to $-\infty$ under $H_{1a}$, as $\gamma_i(1) < \gamma_i(0)$ for all $i = 1, \ldots, N$. This is a necessary, but not a sufficient, step in showing consistency of the bootstrap tests. The second step that is needed is to show that the bootstrap tests, and correspondingly the bootstrap critical values, do not diverge under $H_{1a}$. To this end, let $P = \mathrm{diag}(\rho_1, \ldots, \rho_N)$ and consequently $u_t = (I_N - PL)y_t$. Then

$$u_t = (I_N - PL)\Gamma'\Psi^+(L)\varepsilon_t = \Psi^{++}(L)\varepsilon_t, \qquad (21)$$

where $\Psi^{++}(L) = (I_N - PL)\Gamma'\Psi^+(L)$. Note that the summability condition from Assumption 1 still holds for this lag polynomial. Therefore we can give the following theorem.

Theorem 5. Let $y_t$ be generated under $H_{1a}$. Let Assumptions 1 and 3 hold. Then, as $T \to \infty$,

$$\tau_p^* \xrightarrow{d^*} \frac{\sum_{i=1}^N \bigl[\int_0^1 B_i^+(r)\,dB_i^+(r) + \tfrac{1}{2}(\omega_i^+ - \omega_{0,i}^+)\bigr]}{\sum_{i=1}^N \int_0^1 B_i^+(r)^2\,dr} \quad \text{in probability,}$$

and

$$\tau_{gm}^* \xrightarrow{d^*} \frac{1}{N}\sum_{i=1}^N \frac{\int_0^1 B_i^+(r)\,dB_i^+(r) + \tfrac{1}{2}(\omega_i^+ - \omega_{0,i}^+)}{\int_0^1 B_i^+(r)^2\,dr} \quad \text{in probability,}$$

where $B_i^+(r)$ is the $i$th element of $B^+(r) = \Psi^{++}(1)\Sigma^{1/2}W(r)$ and $\omega_i^+$ and $\omega_{0,i}^+$ are the $(i,i)$th elements of $\Omega^+ = \Psi^{++}(1)\Sigma\Psi^{++}(1)'$ and $\Omega_0^+ = \sum_{j=0}^\infty \Psi_j^{++}\Sigma\Psi_j^{++\prime}$, respectively.

Note that Lemma 5 and Theorem 5 jointly establish the consistency of our tests.

Let us now consider $H_{1b}$. Again we first look at the properties of the test statistics. Let us first, without loss of generality, assume that the first $n_1$ units are $I(0)$, while the rest are $I(1)$. Hence, $\rho_i < 1$ for $i = 1, \ldots, n_1$ and $\rho_i = 1$ for $i = n_1 + 1, \ldots, N$. We can then define $u_t = y_t - Py_{t-1}$, where part of the $\rho_i$ are equal to one and the rest are smaller than one. We may then write

$$u_t = \Psi^{\#}(L)\varepsilon_t, \qquad (22)$$

where the values of $\Psi^{\#}(L)$ for the $I(1)$ components are determined as in the analysis under the null, and for the $I(0)$ components as in the analysis above. The summability condition will clearly still hold for $\Psi^{\#}(L)$. The limit behavior of the test statistics is then given by the following lemma.

Lemma 6. Let $y_t$ be generated under $H_{1b}$. Let Assumption 1 hold. Then, as $T \to \infty$,

$$\tau_p \xrightarrow{d} \frac{\sum_{i=1}^{n_1} (\gamma_i(1) - \gamma_i(0)) + \sum_{i=n_1+1}^N \bigl[\int_0^1 B_i^{\#}(r)\,dB_i^{\#}(r) + \tfrac{1}{2}(\omega_i^{\#} - \omega_{0,i}^{\#})\bigr]}{\sum_{i=n_1+1}^N \int_0^1 B_i^{\#}(r)^2\,dr}$$

and

$$T^{-1}\tau_{gm} \xrightarrow{p} N^{-1}\sum_{i=1}^{n_1} \frac{\gamma_i(1) - \gamma_i(0)}{\gamma_i(0)},$$

where $\gamma_i(j) = E(y_{i,t-j} y_{i,t})$, $B_i^{\#}(r)$ is the $i$th element of $B^{\#}(r) = \Psi^{\#}(1)\Sigma^{1/2}W(r)$ and $\omega_i^{\#}$ and $\omega_{0,i}^{\#}$ are the $(i,i)$th elements of $\Omega^{\#} = \Psi^{\#}(1)\Sigma\Psi^{\#}(1)'$ and $\Omega_0^{\#} = \sum_{j=0}^\infty \Psi_j^{\#}\Sigma\Psi_j^{\#\prime}$, respectively.

We see that the group-mean statistic diverges to $-\infty$, as it should. The pooled statistic does not diverge, however, which means that it is not consistent against this alternative. This is in fact not surprising, given that the pooled test is designed as a large-$N$ test for homogeneous alternatives (also see Remark 8). The reason for the inconsistency is that in the denominator the stationary units vanish (as they should) as $T \to \infty$, but the nonstationary units remain.

Let us turn to the bootstrap series. Given our expression for $u_t$ above, we can simply combine the proofs for the unit root and stationary series and directly state the limiting distributions as a corollary.

Corollary 1. Let $y_t$ be generated under $H_{1b}$. Let Assumptions 1 and 3 hold. Then, as $T \to \infty$,

$$\tau_p^* \xrightarrow{d^*} \frac{\sum_{i=1}^N \bigl[\int_0^1 B_i^{\#}(r)\,dB_i^{\#}(r) + \tfrac{1}{2}(\omega_i^{\#} - \omega_{0,i}^{\#})\bigr]}{\sum_{i=1}^N \int_0^1 B_i^{\#}(r)^2\,dr} \quad \text{in probability,}$$

and

$$\tau_{gm}^* \xrightarrow{d^*} \frac{1}{N}\sum_{i=1}^N \frac{\int_0^1 B_i^{\#}(r)\,dB_i^{\#}(r) + \tfrac{1}{2}(\omega_i^{\#} - \omega_{0,i}^{\#})}{\int_0^1 B_i^{\#}(r)^2\,dr} \quad \text{in probability.}$$

Note that Lemma 6 and Corollary 1 jointly establish the consistency of the bootstrap group-mean test. Also note that the inconsistency of the pooled test does not depend on the bootstrap distribution, but purely on the original test statistic.

Remark 7. It might seem that our bootstrap method does not correctly reproduce the asymptotic null distribution if the alternative is true, as the nuisance parameters differ from those appearing in Theorem 2 for example, but this is not so straightforward. It all depends on how the alternative is formulated in relation to the null. Had we formulated our alternative as $y_t = Py_{t-1} + u_t$ where $u_t = \Gamma'\Psi(L)\varepsilon_t$, the nuisance parameters would have been the same. The key to understanding this is that the process under the null corresponding to the process in (1) and (2) with $\Phi$ and $\Theta$ implying stationarity is not necessarily the same process with $\Phi = I_d$ and $\Theta = I_N$.

Remark 8. A few qualifications are in order regarding the inconsistency of the pooled test. First, the actual location of the pooled test statistic can be seen to depend on both the proportion of stationary units (through $n_1$ in the sums) and the distance from the null (through the quantity $\gamma_i(1) - \gamma_i(0)$). If either becomes larger, the statistic will become more negative. Second, if $T$ increases, the denominator will become smaller as the sum over the stationary units disappears (the biT part in the proof). Hence the test statistic will grow larger in absolute value with increasing $T$, but the denominator will not go to zero as the nonstationary part does not vanish. Both factors imply that the actual power of the test can still be non-trivial and even reach 1.
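The probability limit in Lemma 5 is easy to verify numerically for a single stationary AR(1) unit, for which $\gamma(1)/\gamma(0) = \rho$, so that $(\gamma(1) - \gamma(0))/\gamma(0) = \rho - 1 < 0$. The sketch below (our own, not from the paper) estimates the autocovariances from a long simulated sample:

```python
import numpy as np

def autocov(y, j):
    """Sample analogue of gamma(j) = E(y_{t-j} y_t) for a mean-zero series."""
    return np.mean(y[j:] * y[:len(y) - j]) if j else np.mean(y * y)

rng = np.random.default_rng(2)
rho, T = 0.7, 100_000
y = np.empty(T)
y[0] = rng.standard_normal() / np.sqrt(1 - rho ** 2)  # stationary start
for t in range(1, T):
    y[t] = rho * y[t - 1] + rng.standard_normal()

# For a stationary AR(1), the Lemma 5 per-unit limit
# (gamma(1) - gamma(0)) / gamma(0) equals rho - 1 < 0.
limit = (autocov(y, 1) - autocov(y, 0)) / autocov(y, 0)
print(limit)  # close to rho - 1 = -0.3
```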
5. Small sample performance

In this section we investigate the small sample properties of our tests using Monte Carlo simulations. First we perform a simulation study of the properties of our tests with the block length fixed as a function of T only. Next we perform a separate, smaller simulation study of the selection of the block length.

5.1. Monte Carlo design

We consider the following DGP for the simulation study:

$$y_t = \Lambda F_t + w_t, \qquad (23)$$

where $F_t$ is a single (scalar) factor and

$$F_t = \phi F_{t-1} + f_t, \qquad w_{i,t} = \theta_i w_{i,t-1} + v_{i,t}. \qquad (24)$$

Furthermore,

$$v_t = A_1 v_{t-1} + \varepsilon_{1,t} + B_1 \varepsilon_{1,t-1}, \qquad f_t = \alpha_2 f_{t-1} + \varepsilon_{2,t} + \beta_2 \varepsilon_{2,t-1}, \qquad (25)$$

where $\varepsilon_{2,t} \sim N(0, 1)$ and
ε1,t ∼ N (0, Σ ) , where Σ is generated as in Chang (2004): 1. Generate an N × N matrix U ∼ U [0, 1]. Construct H = U (U ′ U )−1/2 . 2. Generate N eigenvalues ζ1 , . . . , ζN with ζ1 = r , ζN = 1 and ζi ∼ U [r , 1] for i = 2, . . . , N − 1. 3. Let Z = diag(ζ1 , . . . , ζN ). Then let Σ = HZH ′ . We consider both r = 1 (no cross-sectional dependence) and r = 0.1. We consider five settings regarding the parameters in equations (23) and (25) in accordance with Gengenbach et al. (2010). I No common factor, unit root for all idiosyncratic components: λi = 0, θi = 1 for all i = 1, . . . , N. II Unit root in common factor and idiosyncratic components: φ = 1, θi = 1 for all i = 1, . . . , N and λi ∼ U [−1, 3]. III Unit root in common factor, stationary idiosyncratic components: φ = 1, θi ∼ U [0.8, 1] and λi ∼ U [−1, 3]. This is the setting of cross-unit cointegration. IV No common factor, stationary idiosyncratic component: θi ∼ U [0.8, 1] and λi = 0 for all i = 1, . . . , N. This is under the alternative hypothesis.11 V Stationary common factor and idiosyncratic component: φ = 0.95, θi ∼ U [0.8, 1] and λi ∼ U [−1, 3]. This is also under the alternative hypothesis. We consider two different options for the parameters A1 and B1 : 1. No dynamic dependence: A1 = B1 = 0. 2. Dynamic autoregressive moving-average cross-sectional dependence: A1 and B1 are non-diagonal. We let A1 = Ξ , where
11 The reported power estimates are not size adjusted. We give raw power as we believe this is empirically more relevant than the usual size-corrected power; also see Horowitz and Savin (2000) for a discussion on the empirical relevance of the usual size-corrected powers in Monte Carlo simulations.
$$\Xi = \begin{bmatrix}
\xi_1 & \xi_1\eta_1 & \xi_1\eta_1^2 & \cdots & \xi_1\eta_1^{N-1} \\
\xi_2\eta_2 & \xi_2 & \xi_2\eta_2 & \cdots & \xi_2\eta_2^{N-2} \\
\vdots & & & & \vdots \\
\xi_N\eta_N^{N-1} & \xi_N\eta_N^{N-2} & \cdots & \xi_N\eta_N & \xi_N
\end{bmatrix}, \qquad (26)$$

i.e. the $(i,j)$th element of $\Xi$ is $\xi_i \eta_i^{|i-j|}$,
where $\xi_i, \eta_i \sim U[-0.5, 0.5]$. To ensure stationarity and invertibility we impose that $\det(I_N - A_1 z) \neq 0$ for $\{z \in \mathbb{C} : |z| \leq 1.2\}$. Furthermore, we let $B_1 = \Omega$. We construct $\Omega$ in much the same way as $\Sigma$. Let $M = HLH'$ where $H = U(U'U)^{-1/2}$, with $U$ an $N \times 1$-vector of $U[0,1]$-variables, and define $L$ as a diagonal matrix with diagonal elements $\ell_1, \ldots, \ell_N$, where $\ell_1 = 0.1$, $\ell_N = 1$ and $\ell_2, \ldots, \ell_{N-1} \sim U[0.1, 1]$. We then let $\Omega = 2M - I_N$. Generating $\Omega$ in this way ensures that $I_N + \Omega$ is of full rank. Note that invertibility is deliberately not guaranteed. The parameters of the common factor in (25), $\alpha_2$ and $\beta_2$, are chosen in accordance with the setting for the idiosyncratic components: if the dependence of the idiosyncratic components is of the ARMA type, the same holds for the common factor. Note that for both $\Sigma$ and the $\Psi(1)$ matrix derived from $A_1$ and $B_1$ the eigenvalues are bounded as $N \to \infty$; as such these parameters can be regarded as weak dependence parameters. For all combinations of the parameters described above we consider all combinations of T = 25, 50, 100 and N = 5, 25, 50. As several parameters in our DGP are chosen randomly, we repeat the simulations for each setting ten times, and store the mean, median, minimum and maximum. We report only results for the mean here; the mean is representative, as in general there is little dispersion between the simulation results. The other results are available upon request. The results are based on 2000 simulations, and the Warp-Speed bootstrap (Giacomini et al., 2007) is used to obtain estimates of the rejection frequencies of the bootstrap tests.12 The nominal level is 0.05. The approximate confidence interval around 0.05 with 2000 simulations is (0.042, 0.058).13 In our simulation study we consider the LLC and IPS tests (with lag lengths selected by BIC), denoted by τllc and τips respectively, and the bootstrap pooled and group-mean tests, denoted by τp and τgm.
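For concreteness, the three steps used above to generate Σ can be implemented as follows (an illustrative sketch, not the authors' code; the function name and seed are ours):

```python
import numpy as np

def chang_sigma(N, r, rng):
    """Generate an innovation covariance matrix following the three
    steps of Chang (2004) described in the text: random orthonormal H,
    eigenvalues zeta_1 = r, zeta_N = 1, the rest U[r, 1]."""
    U = rng.uniform(size=(N, N))
    # Step 1: H = U (U'U)^{-1/2}, computed via the symmetric inverse
    # square root of U'U (eigendecomposition of a PD matrix)
    evals, evecs = np.linalg.eigh(U.T @ U)
    H = U @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)
    # Step 2: eigenvalues between r and 1
    zeta = np.concatenate(([r], rng.uniform(r, 1, size=N - 2), [1.0]))
    # Step 3: Sigma = H Z H'
    return H @ np.diag(zeta) @ H.T

rng = np.random.default_rng(3)
Sigma = chang_sigma(10, 0.1, rng)
# symmetric, with smallest eigenvalue r and largest eigenvalue 1:
# r = 1 gives no cross-sectional dependence, r = 0.1 a dependent case
print(np.linalg.eigvalsh(Sigma).min(), np.linalg.eigvalsh(Sigma).max())
```

Since H is orthonormal, the eigenvalues of Σ are exactly the ζ's, so the strength of cross-sectional dependence is controlled by the single parameter r.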
We also consider a bootstrap test based on the median of the individual test statistics, denoted by τmed. This test might be more robust to outlying units than the tests based on means (also see the discussion in Di Iorio and Fachin, 2008). While we do not consider this test explicitly in our theoretical analysis, as the median presents difficulties for asymptotic analysis, it is clear that a median-based test will be valid as well, since we can show that the joint bootstrap distribution of the individual DF statistics is asymptotically valid. Block lengths of the bootstrap tests were taken as b = 1.75T^{1/3}, which amounts to blocks of length 6, 7 and 9 for sample sizes 25, 50 and 100 respectively; these lengths are within the range usually considered in the literature. We return to the issue of block length selection in Section 5.3.

5.2. Monte Carlo results

Table 1 presents results for the setting without common factors. In general the asymptotic tests have poor size for T = 25, which is mainly caused by the performance of the BIC, as this tends to select overly large lag lengths for T = 25.14 From
12 The Warp-Speed bootstrap greatly reduces the computational cost of performing the simulations by only drawing one bootstrap replication for each simulation. Giacomini et al. (2007) show that under quite general conditions the Warp-Speed bootstrap is capable of calibrating the finite sample coverage of bootstrap confidence intervals if the bootstrap is asymptotically valid (which we show in this paper). Because of the close relation between confidence intervals and hypothesis testing the method should work properly in our setting as well. 13 This interval only takes into account the randomness of the simulations, not of the Warp-Speed bootstrap, nor takes into account that we take averages of 10 runs. 14 A similar result was obtained by Hlouskova and Wagner (2006).
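The Warp-Speed idea of footnote 12 can be illustrated with a toy test on the mean (the setting below is deliberately simplified and not the paper's panel unit root problem; all names are ours): each Monte Carlo replication contributes a single bootstrap draw, and the draws are pooled across replications to form the bootstrap critical value.

```python
import numpy as np

def warp_speed_rejection_rate(n_sims, T, level, rng):
    """Warp-Speed bootstrap (Giacomini et al., 2007) for a toy
    left-tailed test on the mean: ONE bootstrap draw per Monte Carlo
    replication, pooled across replications for the critical value."""
    stats = np.empty(n_sims)
    boot = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.standard_normal(T)               # data under H0: mean 0
        stats[s] = np.sqrt(T) * x.mean()
        xb = rng.choice(x - x.mean(), size=T)    # one centered iid bootstrap draw
        boot[s] = np.sqrt(T) * xb.mean()
    crit = np.quantile(boot, level)              # pooled bootstrap critical value
    return float(np.mean(stats < crit))          # empirical rejection frequency

rng = np.random.default_rng(4)
rate = warp_speed_rejection_rate(2000, 50, 0.05, rng)
print(rate)  # close to the nominal 0.05
```

The computational saving is large: n_sims bootstrap draws in total rather than n_sims times the usual number of bootstrap replications.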
Table 1
Size properties without common factors (setting I).

Panel A: No short-run dependence (A1 = 0, B1 = 0; r = 1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.140   0.024   0.141   0.020   0.025
25    25    0.211   0.001   0.183   0.005   0.009
25    50    0.260   0.000   0.207   0.001   0.002
50    5     0.076   0.031   0.051   0.024   0.033
50    25    0.063   0.004   0.060   0.011   0.014
50    50    0.055   0.000   0.056   0.003   0.004
100   5     0.077   0.032   0.049   0.032   0.035
100   25    0.062   0.009   0.051   0.014   0.020
100   50    0.056   0.001   0.051   0.005   0.010

Panel B: Contemporaneous, but no dynamic dependence (A1 = 0, B1 = 0; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.151   0.026   0.159   0.022   0.031
25    25    0.200   0.003   0.197   0.006   0.010
25    50    0.236   0.000   0.215   0.001   0.002
50    5     0.111   0.033   0.070   0.028   0.037
50    25    0.077   0.006   0.072   0.009   0.017
50    50    0.067   0.001   0.069   0.004   0.005
100   5     0.113   0.040   0.066   0.031   0.039
100   25    0.084   0.013   0.064   0.015   0.021
100   50    0.073   0.003   0.067   0.007   0.012

Panel C: No contemporaneous, but dynamic dependence (A1 = Ξ, B1 = Ω; r = 1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.215   0.054   0.207   0.082   0.055
25    25    0.235   0.004   0.198   0.032   0.016
25    50    0.280   0.000   0.237   0.010   0.004
50    5     0.154   0.052   0.123   0.097   0.054
50    25    0.113   0.008   0.097   0.032   0.020
50    50    0.109   0.001   0.106   0.023   0.010
100   5     0.152   0.067   0.110   0.099   0.064
100   25    0.130   0.013   0.108   0.028   0.023
100   50    0.117   0.004   0.096   0.026   0.015

Panel D: Contemporaneous and dynamic dependence (A1 = Ξ, B1 = Ω; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.222   0.056   0.212   0.063   0.051
25    25    0.252   0.003   0.238   0.020   0.012
25    50    0.265   0.000   0.214   0.007   0.003
50    5     0.197   0.049   0.146   0.059   0.046
50    25    0.144   0.011   0.131   0.052   0.030
50    50    0.127   0.001   0.119   0.018   0.008
100   5     0.187   0.055   0.129   0.094   0.054
100   25    0.158   0.012   0.129   0.029   0.023
100   50    0.143   0.004   0.119   0.027   0.015

A1, B1 and Σ (with smallest eigenvalue r) are defined in Section 5.1. τllc = LLC test; τp = pooled bootstrap test; τips = IPS test; τgm = group-mean bootstrap test; τmed = median bootstrap test.
T = 50 onwards this no longer happens. Panel A presents results for the setting without any dependence (either temporal or cross-sectional). It can be seen that the asymptotic tests have good size properties for T = 50 and T = 100, while the bootstrap tests are undersized, increasingly so as N grows. Panel B lists results for the setting with only contemporaneous correlation. The asymptotic tests have slight positive size distortions here, while the bootstrap tests are somewhat undersized. Panels C and D give results for the model with autoregressive moving-average errors. Here the asymptotic tests are clearly quite oversized, while the bootstrap tests perform well, although some undersizing remains that increases with N. There is little difference between the three bootstrap tests.

Table 2 presents the results for the model with a nonstationary common factor and nonstationary idiosyncratic components. For all three settings considered, the table shows that the bootstrap tests have good size properties, while the asymptotic tests have large size distortions that increase with N. The bootstrap tests again perform very similarly.

Table 3 gives the results for the model with cross-unit cointegration, i.e. with a nonstationary common factor and stationary idiosyncratic components. The asymptotic tests have very large size distortions, and while the size distortions of the bootstrap tests are significantly smaller, they are still large. As expected, the bootstrap tests indeed do not seem to perform very well in this setting. The problem partly arises, especially for the group-mean test, because for some units the loadings will be very close to zero, making those units effectively stationary and hence inflating the test statistic. In such a situation we may expect the median-based test to be more robust, and indeed it seems to perform somewhat better than the group-mean test, although it still suffers from considerable size distortions.

Table 4 presents results for the model under the alternative without a common factor. The power of the bootstrap tests is satisfactory and, as expected, increases with both T and N. The only setting in which we can directly compare the power of the asymptotic and the bootstrap tests is the setting of no dependence (Panel A), and here the power results are very similar. Given that the bootstrap tests are somewhat undersized, this shows that the power of the bootstrap tests is good. In the other settings the power of the bootstrap tests is somewhat lower than that of the asymptotic tests, which can be explained by the size distortions of the asymptotic tests. Note that the bootstrap tests perform similarly.

Table 5 gives results for power with a common factor. It can be seen that the power of the bootstrap tests still increases with T and N, although it is lower than in Table 4, and in particular the increase with N is smaller. This is not surprising, as the common factor present in every unit ensures that
Table 2
Size properties with common factors (setting II).

Panel A: No short-run dependence (A1 = 0, B1 = 0; r = 1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.212   0.029   0.188   0.024   0.030
25    25    0.288   0.014   0.337   0.015   0.023
25    50    0.352   0.009   0.425   0.013   0.014
50    5     0.165   0.036   0.104   0.030   0.037
50    25    0.218   0.021   0.280   0.022   0.030
50    50    0.263   0.018   0.362   0.020   0.023
100   5     0.160   0.039   0.095   0.030   0.038
100   25    0.213   0.030   0.258   0.025   0.035
100   50    0.253   0.020   0.342   0.022   0.024

Panel B: Contemporaneous, but no dynamic dependence (A1 = 0, B1 = 0; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.229   0.030   0.204   0.029   0.034
25    25    0.314   0.025   0.389   0.021   0.027
25    50    0.359   0.023   0.467   0.021   0.024
50    5     0.193   0.036   0.122   0.031   0.036
50    25    0.283   0.032   0.332   0.026   0.031
50    50    0.296   0.027   0.393   0.025   0.028
100   5     0.182   0.040   0.114   0.030   0.036
100   25    0.277   0.034   0.316   0.030   0.038
100   50    0.315   0.031   0.390   0.031   0.035

Panel C: Contemporaneous and dynamic dependence (A1 = Ξ, B1 = Ω; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.269   0.022   0.243   0.023   0.022
25    25    0.351   0.012   0.381   0.013   0.016
25    50    0.406   0.007   0.448   0.010   0.011
50    5     0.253   0.025   0.177   0.050   0.024
50    25    0.348   0.015   0.358   0.018   0.020
50    50    0.378   0.013   0.411   0.016   0.021
100   5     0.252   0.032   0.172   0.032   0.031
100   25    0.373   0.023   0.362   0.025   0.028
100   50    0.420   0.023   0.425   0.022   0.028

See Table 1.

Table 3
Size properties with cross-unit cointegration (setting III).

Panel A: No short-run dependence (A1 = 0, B1 = 0; r = 1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.410   0.129   0.332   0.103   0.102
25    25    0.629   0.198   0.585   0.173   0.198
25    50    0.698   0.224   0.643   0.173   0.216
50    5     0.612   0.200   0.463   0.169   0.170
50    25    0.782   0.267   0.620   0.259   0.251
50    50    0.816   0.281   0.671   0.282   0.283
100   5     0.674   0.240   0.550   0.333   0.257
100   25    0.798   0.303   0.642   0.282   0.275
100   50    0.845   0.336   0.688   0.399   0.336

Panel B: Contemporaneous, but no dynamic dependence (A1 = 0, B1 = 0; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.461   0.096   0.356   0.086   0.084
25    25    0.612   0.128   0.561   0.111   0.126
25    50    0.667   0.166   0.598   0.148   0.177
50    5     0.554   0.141   0.362   0.136   0.137
50    25    0.742   0.179   0.552   0.183   0.184
50    50    0.764   0.184   0.591   0.175   0.183
100   5     0.651   0.171   0.431   0.187   0.149
100   25    0.806   0.177   0.622   0.198   0.176
100   50    0.819   0.211   0.633   0.253   0.210

Panel C: Contemporaneous and dynamic dependence (A1 = Ξ, B1 = Ω; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.427   0.047   0.357   0.049   0.041
25    25    0.549   0.062   0.492   0.071   0.064
25    50    0.597   0.067   0.522   0.065   0.068
50    5     0.466   0.085   0.309   0.125   0.089
50    25    0.670   0.098   0.516   0.122   0.092
50    50    0.698   0.089   0.539   0.107   0.087
100   5     0.464   0.100   0.305   0.136   0.115
100   25    0.701   0.117   0.527   0.165   0.104
100   50    0.738   0.109   0.575   0.148   0.101

See Table 1.
Table 4
Power properties without common factors (setting IV).

Panel A: No short-run dependence (A1 = 0, B1 = 0; r = 1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.607   0.507   0.651   0.354   0.337
25    25    0.829   0.866   0.980   0.894   0.892
25    50    0.875   0.958   0.999   0.996   0.996
50    5     0.754   0.757   0.829   0.810   0.773
50    25    0.995   0.999   1.000   1.000   1.000
50    50    1.000   1.000   1.000   1.000   1.000
100   5     0.905   0.929   0.989   0.974   0.946
100   25    1.000   1.000   1.000   1.000   1.000
100   50    1.000   1.000   1.000   1.000   1.000

Panel B: Contemporaneous, but no dynamic dependence (A1 = 0, B1 = 0; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.553   0.508   0.608   0.357   0.361
25    25    0.832   0.827   0.985   0.850   0.854
25    50    0.878   0.919   1.000   0.981   0.980
50    5     0.856   0.630   0.887   0.648   0.633
50    25    0.998   0.989   1.000   1.000   0.999
50    50    1.000   1.000   1.000   1.000   1.000
100   5     0.928   0.943   0.996   0.985   0.950
100   25    1.000   1.000   1.000   1.000   1.000
100   50    1.000   1.000   1.000   1.000   1.000

Panel C: No contemporaneous, but dynamic dependence (A1 = Ξ, B1 = Ω; r = 1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.573   0.491   0.625   0.498   0.391
25    25    0.798   0.618   0.972   0.862   0.768
25    50    0.866   0.711   0.998   0.984   0.934
50    5     0.816   0.700   0.865   0.779   0.675
50    25    0.991   0.955   1.000   1.000   0.997
50    50    1.000   0.983   1.000   1.000   1.000
100   5     0.896   0.891   0.975   0.950   0.955
100   25    1.000   0.996   1.000   1.000   1.000
100   50    1.000   0.999   1.000   1.000   1.000

Panel D: Contemporaneous and dynamic dependence (A1 = Ξ, B1 = Ω; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.575   0.354   0.609   0.413   0.361
25    25    0.767   0.754   0.949   0.860   0.820
25    50    0.841   0.729   0.997   0.981   0.941
50    5     0.755   0.693   0.814   0.729   0.607
50    25    0.989   0.934   1.000   0.999   0.996
50    50    0.999   0.989   1.000   1.000   1.000
100   5     0.985   0.762   0.994   0.884   0.830
100   25    0.999   0.995   1.000   1.000   1.000
100   50    1.000   1.000   1.000   1.000   1.000

See Table 1.
the information on the order of integration is not increased much by adding units to the panel. The higher power of the asymptotic tests relative to the bootstrap tests can be explained by the large size distortions of the asymptotic tests in this case. The bootstrap tests all have similar power properties, although the median-based test seems to be somewhat less powerful than the group-mean test. To conclude, the bootstrap tests have reasonable finite sample properties, with the exception of the oversizing in the cross-unit cointegration setting and, to a lesser extent, the undersizing in the setting without common factors as N increases.15
15 In order to compare the finite sample performance of our tests to that of factor-based tests, we consider the simulation study of Gengenbach et al. (2010), which uses a DGP that is fairly similar to ours (with the exception of the short-run dynamics). We see that many of the tests considered there also suffer from size distortions, even in the setting with both factors and idiosyncratic components I(1). In the cross-unit cointegration setting most tests, with the exception of the tests by Bai and Ng (2004), suffer from severe size distortions. It seems that no test considered in that simulation study generally outperforms our bootstrap tests, both in terms of size and power. We can also compare the bootstrap tests to the subsampling tests of Choi and Chue (2007), which in a way are the only tests that are directly comparable to ours in terms of underlying assumptions. Their simulation study shows that the subsampling tests have good size properties, and they clearly outperform our bootstrap tests in the case of cross-unit cointegration. However, the DGPs employed by them are not completely comparable to ours in terms of short-run dynamics and factor loadings.

5.3. Block length selection

The Monte Carlo experiment in the previous section was done with fixed block lengths. It is well known from the literature on the block bootstrap that the selected block length can have a large effect on the performance of any application of the block bootstrap, and that is of course true here as well. In addition to the usual issues relating to the structure of the temporal dependence, block length selection is also important in our setting in the case of cross-unit cointegration, where one can expect that large blocks are needed, based on the discussion in Section 4.2.2. Our discussion here mirrors that in Paparoditis and Politis (2003, Section 6.1), who discuss the selection of block lengths for univariate unit root tests. Quite some research has been done on optimal block length selection in the framework of stationary time series. As noted in Paparoditis and Politis (2003), in order to talk about optimality one needs to set a criterion that is to be optimized. This criterion will depend on the type of application of the bootstrap (variance estimation, confidence intervals, hypothesis tests, etc.). Using
Table 5
Power properties with common factors (setting V).

Panel A: No short-run dependence (A1 = 0, B1 = 0; r = 1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.597   0.213   0.512   0.173   0.177
25    25    0.784   0.345   0.795   0.306   0.344
25    50    0.840   0.377   0.858   0.326   0.365
50    5     0.883   0.505   0.783   0.504   0.474
50    25    0.989   0.677   0.957   0.672   0.658
50    50    0.997   0.723   0.974   0.750   0.722
100   5     0.978   0.822   0.970   0.830   0.802
100   25    1.000   0.944   1.000   0.964   0.933
100   50    1.000   0.956   0.999   0.981   0.961

Panel B: Contemporaneous, but no dynamic dependence (A1 = 0, B1 = 0; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.590   0.189   0.506   0.140   0.148
25    25    0.767   0.265   0.772   0.236   0.263
25    50    0.822   0.271   0.812   0.234   0.266
50    5     0.828   0.415   0.682   0.423   0.399
50    25    0.981   0.503   0.935   0.487   0.497
50    50    0.994   0.497   0.958   0.509   0.508
100   5     0.963   0.695   0.928   0.792   0.731
100   25    1.000   0.840   0.996   0.883   0.834
100   50    1.000   0.853   0.998   0.912   0.850

Panel C: Contemporaneous and dynamic dependence (A1 = Ξ, B1 = Ω; r = 0.1)
T     N     τllc    τp      τips    τgm     τmed
25    5     0.534   0.078   0.480   0.087   0.074
25    25    0.707   0.096   0.674   0.085   0.100
25    50    0.753   0.124   0.720   0.126   0.124
50    5     0.801   0.243   0.669   0.244   0.190
50    25    0.953   0.270   0.876   0.329   0.253
50    50    0.972   0.272   0.903   0.328   0.276
100   5     0.967   0.482   0.904   0.619   0.454
100   25    0.998   0.527   0.987   0.712   0.494
100   50    0.999   0.596   0.989   0.756   0.564

See Table 1.
higher-order asymptotics, it has been found for stationary series that an optimal block length b_opt is of the form

b_opt = C T^{1/κ},   (27)

where κ is a known integer depending on the type of application and C is usually unknown and depends on the data. Härdle et al. (2003) and Lahiri (2003) give overviews of optimal block lengths for stationary time series. Several methods have been proposed for settings where b_opt can be described as in (27). Some are based on estimating C by exploiting its dependence on certain quantities that can themselves be estimated; Bühlmann and Künsch (1999) and Politis and White (2004) are examples of such methods, applicable to variance estimation. Lahiri et al. (2007) propose a plug-in method, based on the jackknife-after-bootstrap, that is also applicable to confidence intervals and hypothesis tests. A different approach is the subsampling method of Hall et al. (1995). Its attractive feature is that it avoids the estimation of C; this, together with the ease of its implementation, has made it a popular choice among practitioners. It does, however, require knowledge of κ.

The problem with nonstationary time series is that κ is unknown, as the required asymptotic expansions have not yet been developed. This makes it very difficult to implement any of the methods discussed above using a well-founded choice of κ. Paparoditis and Politis (2003) discuss this issue and suggest some heuristic ideas for determining κ. Alternatives to the methods above are the minimum volatility method and the calibration method proposed by Politis et al. (1999), neither of which requires knowledge of κ. The minimum volatility method computes critical values over a range of block lengths and selects the block length in the region where the critical values are least volatile.
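As a concrete illustration of rule (27), the sketch below computes the rule-of-thumb length for given C and κ. The function name and default values are our own placeholder assumptions (κ = 3 is the textbook choice for variance estimation with stationary data), not quantities derived in this paper; for the nonstationary panels considered here κ is precisely what is unknown.

```python
def optimal_block_length(T, C=1.0, kappa=3):
    """Rule-of-thumb block length b_opt = C * T**(1/kappa), cf. Eq. (27).

    kappa depends on the type of bootstrap application (kappa = 3 is a
    common value for variance estimation with stationary data); C is
    data dependent and must be estimated or calibrated. Both defaults
    are placeholder assumptions for illustration only.
    """
    return max(1, round(C * T ** (1.0 / kappa)))
```

In practice C would be estimated (e.g. by a plug-in rule) or the whole choice replaced by calibration, as discussed next.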
We will focus here on the calibration method.16 In particular, we consider the Warp-Speed calibration method of Giacomini et al. (2007), a modification of the original calibration method that was designed for the construction of confidence intervals. For completeness, we present the procedure for hypothesis tests below (see also Remark 9).

Block length selection by Warp-Speed calibration

1. Choose a starting value b0 for the block length. Using this value, generate K bootstrap samples {y_t^1}, …, {y_t^K}. Calculate the statistic of interest for each bootstrap sample, say θ̂^k(b0) for k = 1, …, K. Using the empirical distribution of these statistics, calculate the bootstrap critical value c(b0).
2. Let (b1, …, bM) be the candidate block lengths. For each i = 1, …, M and k = 1, …, K, construct one bootstrap resample from the bootstrap sample {y_t^k} using block length bi; call this {y_t^k(i)}. From each resample calculate the statistic of interest, say θ̂*^k(bi).
3. Using the distribution of θ̂*^k(bi) over k = 1, …, K, calculate the bootstrap resample critical value c*(bi) for all i = 1, …, M.
4. Select the optimal block length b_opt such that

b_opt = arg min_{bi, i=1,…,M} |c*(bi) − c(b0)|.   (28)

To reduce the dependence on b0, the algorithm can be applied iteratively, using b_opt as the starting block length in the next iteration and continuing until convergence.
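The four steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a lower-tail test statistic, a plain moving-block resampling scheme, and a generic `statistic` callable; all function names are ours.

```python
import numpy as np

def block_resample(y, b, rng):
    """Draw one moving-block bootstrap resample with the length of y."""
    T = len(y)
    n_blocks = -(-T // b)  # ceil(T / b)
    starts = rng.integers(0, T - b + 1, size=n_blocks)
    return np.concatenate([y[s:s + b] for s in starts])[:T]

def warp_speed_block_length(y, statistic, b0, candidates, K=199,
                            alpha=0.05, seed=0):
    """Warp-Speed calibration for block length selection (sketch).

    Mirrors the four steps in the text: K bootstrap samples at block
    length b0 yield the critical value c(b0); one resample per bootstrap
    sample and candidate length b_i yields c*(b_i); the selected length
    minimizes |c*(b_i) - c(b0)|. A lower-tail test is assumed.
    """
    rng = np.random.default_rng(seed)
    # Step 1: bootstrap samples at b0 and their statistics.
    samples = [block_resample(y, b0, rng) for _ in range(K)]
    theta = np.array([statistic(s) for s in samples])
    c_b0 = np.quantile(theta, alpha)

    best_b, best_gap = None, np.inf
    for b in candidates:
        # Steps 2-3: one resample from each bootstrap sample at length b.
        theta_star = np.array([statistic(block_resample(s, b, rng))
                               for s in samples])
        c_star = np.quantile(theta_star, alpha)
        # Step 4: keep the candidate whose resample critical value is
        # closest to c(b0).
        if abs(c_star - c_b0) < best_gap:
            best_b, best_gap = b, abs(c_star - c_b0)
    return best_b
```

Because each bootstrap sample produces only one resample per candidate length, the cost grows with K + K·M statistic evaluations rather than K·M full bootstrap loops, which is the point of the Warp-Speed modification.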
16 We also considered the minimum volatility method, the subsampling method of Hall et al. (1995) and the plug-in method of Lahiri et al. (2007), the latter two with the value of κ based on the results for stationary time series, but all of these methods were inferior to the calibration method; see Remark 10.
Table 6
Size properties with block length selection.

Panel A: No common factors (setting I)

A1 = 0, B1 = 0; r = 1
         τp                           τgm
T   N    RF     AvB     OpB  OpRF     RF     AvB     OpB  OpRF
25  5    0.022  3.992   1    0.038    0.018  4.172   1    0.038
25  25   0.002  2.778   1    0.012    0.012  2.762   2    0.038
50  5    0.028  6.472   5    0.050    0.020  6.468   2    0.050
50  25   0.012  3.610   1    0.020    0.014  3.760   1    0.038

A1 = 0, B1 = 0; r = 0.1
25  5    0.044  4.414   4    0.050    0.014  4.378   2    0.050
25  25   0.010  2.944   1    0.014    0.012  2.912   1    0.062
50  5    0.034  6.990   1    0.052    0.034  6.482   5    0.054
50  25   0.016  3.658   1    0.024    0.014  3.672   3    0.038

A1 = Ξ, B1 = Ω; r = 1
25  5    0.032  4.056   2    0.038    0.012  4.504   2    0.044
25  25   0.002  2.304   1    0.008    0.008  2.332   2    0.032
50  5    0.042  7.968   6    0.052    0.024  9.906   13   0.050
50  25   0.006  2.918   3    0.010    0.046  4.144   5    0.052

A1 = Ξ, B1 = Ω; r = 0.1
25  5    0.024  4.502   4    0.048    0.042  5.066   10   0.042
25  25   0.008  2.578   1    0.024    0.028  2.968   3    0.032
50  5    0.042  7.212   6    0.054    0.036  7.092   4    0.050
50  25   0.008  3.314   5    0.012    0.010  3.434   2    0.020

Panel B: Common factors (setting II)

A1 = 0, B1 = 0; r = 1
25  5    0.022  4.878   2    0.046    0.020  4.950   2    0.052
25  25   0.018  3.510   2    0.042    0.008  3.286   1    0.060
50  5    0.018  8.850   12   0.050    0.034  7.086   5    0.050
50  25   0.018  5.566   2    0.046    0.024  5.064   1    0.050

A1 = 0, B1 = 0; r = 0.1
25  5    0.028  5.494   6    0.046    0.020  4.938   1    0.048
25  25   0.016  3.978   1    0.038    0.004  4.154   2    0.048
50  5    0.020  10.292  5    0.048    0.032  8.892   2    0.056
50  25   0.044  7.154   4    0.050    0.038  6.098   4    0.048

A1 = Ξ, B1 = Ω; r = 0.1
25  5    0.006  4.946   3    0.020    0.004  5.128   5    0.014
25  25   0.000  4.572   3    0.038    0.006  4.558   6    0.026
50  5    0.010  8.606   2    0.036    0.012  8.802   7    0.028
50  25   0.014  6.270   3    0.018    0.016  5.532   6    0.024

Panel C: Cross-unit cointegration (setting III)

A1 = 0, B1 = 0; r = 1
25  5    0.098  5.442   16   0.054    0.128  5.038   16   0.076
25  25   0.192  4.208   17   0.078    0.144  3.976   19   0.070
50  5    0.126  9.266   30   0.050    0.108  8.056   27   0.052
50  25   0.230  5.984   37   0.090    0.266  5.140   33   0.114

A1 = 0, B1 = 0; r = 0.1
25  5    0.024  6.626   5    0.046    0.018  6.394   1    0.052
25  25   0.152  4.296   15   0.056    0.102  4.122   15   0.052
50  5    0.044  11.222  10   0.048    0.038  9.448   26   0.052
50  25   0.180  7.512   35   0.062    0.132  6.572   24   0.076

A1 = Ξ, B1 = Ω; r = 0.1
25  5    0.046  5.140   5    0.050    0.044  4.756   8    0.052
25  25   0.082  4.044   10   0.046    0.114  3.734   14   0.050
50  5    0.032  9.536   6    0.048    0.018  8.686   3    0.040
50  25   0.070  7.192   1    0.052    0.086  6.266   20   0.052

RF = rejection frequency with block length selection; AvB = average block length selected; OpB = optimal block length (such that the corresponding rejection frequency is as close as possible to 0.05); OpRF = rejection frequency corresponding to the optimal block length.
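The OpB benchmark defined in the legend is itself a simple selection rule over the grid of fixed block lengths. As a hedged illustration (the function name is ours, not the paper's), it can be sketched as:

```python
import numpy as np

def optimal_fixed_block(block_lengths, rejection_freqs, nominal=0.05):
    """Pick the fixed block length whose empirical rejection frequency
    is closest to the nominal level -- the definition of the OpB column."""
    gaps = np.abs(np.asarray(rejection_freqs, dtype=float) - nominal)
    return block_lengths[int(np.argmin(gaps))]
```

Note that this benchmark is only feasible in a simulation study, where the true rejection frequency of every fixed block length can be computed.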
To analyze the performance of the method, we performed a small Monte Carlo experiment using the same DGP as in Section 5.1, applying the tests τp and τgm. Based on 500 simulations, we let the block length be selected by the Warp-Speed calibration method, and, using the same seed, we ran the tests for a wide range of fixed block lengths (up to 0.75 times the sample size) to determine the optimal block length. As starting block lengths we take the fixed block lengths from the previous section, and we set K = 199. Due to computational costs we do not iterate the algorithm.

Results for size are given in Table 6. Optimal block lengths are determined as those block lengths that give an empirical rejection frequency closest to the nominal level (5%). While the optimal rejection frequencies are not attained by the block length selection method, the rejection frequencies for settings I and II are reasonably close. For setting III, however, the selected block lengths increase but not sufficiently so compared with the optimal block lengths, and size distortions persist.

Results for power are presented in Table 7. Optimal block lengths here are selected as the block lengths that give the highest possible power. This should be regarded with caution, as optimal block lengths under the alternative hypothesis are difficult to define: higher power could come at the expense of good size properties under the null, so it is not clear that high power is the criterion that should be optimized.17 What is clear, though, is that choosing an unnecessarily large block length will decrease power.18 The results show that the calibration method performs reasonably well. To conclude, using the calibration method improves on using a fixed block length, but it is not optimal. Much work remains to be done on this topic, especially from a theoretical perspective.

Remark 9. The equivalent for hypothesis testing of the method proposed by Giacomini et al. (2007) for confidence intervals would
17 Note that even when using size-adjusted power this problem would still be present.
18 As pointed out by one of the referees, this may be because, for a large block length, the possible bootstrap samples, and correspondingly the possible bootstrap statistics, become more similar to each other and to the original sample (statistic). This leads to a poorer approximation of the bootstrap distribution and eventually to a loss of power.
Table 7
Power properties with block length selection.

Panel A: No common factors (setting IV)

A1 = 0, B1 = 0; r = 1
         τp                           τgm
T   N    RF     AvB     OpB  OpRF     RF     AvB     OpB  OpRF
25  5    0.218  4.088   2    0.436    0.186  4.060   2    0.244
25  25   0.916  2.928   1    0.964    0.986  2.596   1    0.990
50  5    0.740  5.974   5    0.856    0.884  5.438   4    0.948
50  25   1.000  3.732   1    1.000    1.000  3.168   1    1.000

A1 = 0, B1 = 0; r = 0.1
25  5    0.686  4.530   4    0.816    0.340  4.194   1    0.636
25  25   0.984  2.868   1    0.992    0.966  2.606   2    0.976
50  5    0.652  6.592   5    0.792    0.748  5.730   3    0.850
50  25   1.000  3.766   1    1.000    1.000  3.118   1    1.000

A1 = Ξ, B1 = Ω; r = 1
25  5    0.102  3.788   6    0.132    0.192  3.908   4    0.270
25  25   0.476  2.458   4    0.428    0.756  2.390   3    0.860
50  5    0.582  7.586   4    0.682    0.570  7.412   1    0.798
50  25   0.988  3.192   2    0.992    1.000  2.850   1    1.000

A1 = Ξ, B1 = Ω; r = 0.1
25  5    0.234  4.076   4    0.330    0.250  3.966   3    0.400
25  25   0.642  2.414   2    0.704    0.784  2.390   3    0.880
50  5    0.264  6.976   3    0.410    0.198  6.702   4    0.354
50  25   0.998  3.362   2    1.000    1.000  2.994   1    1.000

Panel B: Common factors (setting V)

A1 = 0, B1 = 0; r = 1
25  5    0.118  4.712   1    0.300    0.138  4.386   1    0.254
25  25   0.328  3.802   1    0.550    0.242  3.608   2    0.402
50  5    0.340  8.188   3    0.486    0.372  6.868   3    0.438
50  25   0.538  6.338   1    0.682    0.484  5.638   2    0.684

A1 = 0, B1 = 0; r = 0.1
25  5    0.226  4.652   2    0.406    0.160  4.484   1    0.268
25  25   0.182  4.342   1    0.250    0.100  4.276   2    0.240
50  5    0.182  9.964   3    0.292    0.214  8.812   1    0.364
50  25   0.414  7.096   1    0.508    0.580  5.858   1    0.752

A1 = Ξ, B1 = Ω; r = 0.1
25  5    0.064  5.120   8    0.120    0.058  4.574   2    0.136
25  25   0.058  4.638   3    0.092    0.042  4.108   5    0.096
50  5    0.110  8.774   2    0.184    0.098  8.166   3    0.158
50  25   0.200  6.424   7    0.286    0.258  5.508   8    0.288

See Table 6.
involve a criterion that minimizes size distortions rather than the distance between critical values, as we do. To be more specific: after Step 3 of the algorithm, calculate the rejection frequency for block length bi using the bootstrap critical value c*(bi) and the test statistics θ̂^k(b0); the optimal block length is then the one that brings this rejection frequency closest to the nominal level. While this criterion may seem to differ from ours, it is actually the same. We minimize the distance to c(b0), which is precisely the value that, if used as a critical value, gives a rejection frequency exactly equal to the nominal level; both criteria therefore minimize the same quantity.19 Moreover, our criterion may perform better when only a small number of replications is used. With few bootstrap replications the rejection frequencies can take only a limited number of values, so the rejection frequency will coincide for several candidate block lengths and a number of block lengths may turn out to be optimal. Our criterion does not suffer from this problem, as the critical values are not restricted in this way.

Remark 10. As mentioned before, we compared the calibration method to the subsampling approach of Hall et al. (1995), the plug-in method of Lahiri et al. (2007) and the minimum volatility method. The subsampling method tends to select block lengths in a somewhat unpredictable way, although the resulting rejection frequencies are reasonably close (if somewhat inferior) to those obtained with the calibration method. The plug-in method generally selects block lengths that are too small, regardless of
19 Loosely speaking, we can describe this situation as a p-value approach versus a critical value approach, where both obviously lead to the same conclusion.
the underlying DGP. The minimum volatility method selects block lengths almost uniformly over the range of allowed lengths, thereby generally selecting block lengths that are too large. The results are available on request.

6. Conclusion

We have established the asymptotic validity of two block bootstrap panel unit root tests for a model that allows for various kinds of cross-sectional and temporal dependence, including a common factor structure and possibly cross-unit cointegration. The tests are simple pooled and group-mean tests based on the popular LLC and IPS tests. The finite sample properties of our test statistics have also been investigated and shown to be satisfactory in general, with little difference between the bootstrap tests considered. While for most specific settings (in particular cross-unit cointegration) some tests can be found that perform better in that particular setting, it is much harder to find a test that is valid across all the settings for which our bootstrap tests are valid. Moreover, very few existing tests are valid in the empirically relevant case of dynamic cross-sectional dependence, whereas our tests are valid even in that setting. Our tests are very easy to implement, as no specification and estimation of the dependence structure is necessary, and will therefore be very useful in practice when the true form of the cross-sectional (and temporal) dependence is unknown and robustness to it matters. In fact, many practitioners already use the block bootstrap to account for cross-sectional dependence for the reasons listed above; this work provides the necessary theoretical justification.

On the basis of the theoretical and simulation results in this paper, we conclude that it is legitimate to use the proposed tests
in practice when testing for unit roots in the observed data of a panel of a fixed number N of entities, in the presence of various forms of cross-sectional dependence. The block bootstrap algorithm described in Section 3 can be implemented straightforwardly, with block lengths selected using the Warp-Speed calibration method.

This study still leaves several ends open. First, while we briefly considered the subject of block length selection, much remains to be done, as at the moment no fully satisfactory method for selecting block lengths exists. Second, while our derivations do not depend on N being small in any way, it would be interesting to see what happens as N → ∞. As explained, such a theoretical analysis is difficult in our setting, but it is certainly worth further research. Third, the specification of deterministic components remains an open issue. While a ''naive'' implementation of deterministic components is quite straightforward, and can even be seen to be valid without too much difficulty, experience has shown that including ''naive'' deterministic terms in panels is hardly ever a good solution, so further investigation of this issue is also merited. Finally, the block bootstrap may also be used for the analysis of panel cointegration, as done by Fachin (2007) and Di Iorio and Fachin (2008); developing the appropriate theoretical foundations in that setting would be a logical generalization of this work.

Acknowledgements
Appendix Proof of Lemma 1. Note that by Assumption 1 WT (r ) = T −1/2
∑⌊Tr ⌋ t =1
d
εt − → Σ 1/2 W (r ). Then it follows from standard asymptotic
theory for linear processes (see for example Phillips and Solo, 1992) that, uniformly in r, T −1/2
⌊Tr ⌋ −
Proof of Lemma A.1. For part (i), note that
Ω = lim T −1
xt = Ψ (1)WT (r ) + op (1),
T − T −
T →∞
T − T − ∞ − ∞ −
T →∞
Γ ′ Ψi E εs−i εt′ −j Ψj′ Γ
s=1 t =1 i=0 j=0
∞ − ∞ −
=
E us u′t
s=1 t =1
= lim T −1
Γ ′ Ψi ΣΨj′ Γ = Γ ′ Ψ (1)ΣΨ (1)′ Γ .
i =0 j =0
For part (ii) we have
Ω0 = lim T
−1
T −
T →∞
∞ −
E
∞ −
Γ Ψj εt − j ′
∞ −
j =0
′ Γ Ψj εt − j ′
j =0
T − ∞ − ∞ −
T →∞
=
t =1
= lim T −1
Γ ′ Ψi E εt −i εt′ −j Ψj′ Γ
t =1 i =0 j =0
Γ ′ Ψi ΣΨi′ Γ .
i =0
This completes the proof.
Lemma A.2. Let yt be generated under H0 setting (A). Let Assumption 1 hold. Then, as T → ∞, we have for i = 1, . . . , N, (i) T −1
Previous versions of this paper were presented at the conference on Factor Structures for Panel and Multivariate Time Series Data in Maastricht, the Econometric Society European Meeting 2009 in Barcelona and at seminars at the Econometric Institute, Lund University, CORE, the University of Cambridge and Louisiana State University. We thank the conference and seminar participants, Stefano Fachin, Anders Swensen, the associate editors Jörg Breitung and Joakim Westerlund and two anonymous referees for helpful comments and suggestions. We thank NWO and the Royal Netherlands Academy of Arts and Sciences for financial support. The usual disclaimer applies.
99
(ii) T
∑T
∑ −2
d
t =1 T t =1
yi,t −1 1yi,t − → d
y2i,t −1 − →
1 0
1 0
Bi (r )dBi (r ) + 21 (ωi − ω0,i ),
Bi (r )2 dr,
where convergence also holds jointly. Proof of Lemma A.2. The proof follows directly from Lemmas 1, A.1 and the continuous mapping theorem. Joint convergence can be established using the Cramér–Wold device (cf. Davidson, 2002, Theorem 25.5). Proof of Theorem 1. The Lemma A.2.
proof
follows
immediately
from
In order to derive the bootstrap invariance principle we need three preliminary lemmas that build on each other. We exploit the linearity of the processes in our derivation. As for the original series, we first derive the properties for the bootstrap equivalent of εt which we then extend to u∗t . Lemma A.3 establishes some moments for this series, while Lemma A.4 establishes the corresponding invariance principle. Lemma A.5 then extends this to u∗t . ∗ Lemma A.3. Define Hm = b−1/2 tions 1 and 3 hold, we have
∑b
s=1
(εim +s − E∗ εim +s ). If Assump-
∗ (i) E∗ Hm = 0, ∗ ∗′ (ii) E∗ Hm Hm = Σ + op (1).
t =1 d
and consequently T −1/2 t =1 xt − → Ψ (1)Σ 1/2 W (r ). The result then follows by the continuous mapping theorem.
Proof of Lemma A.2. Statement (i) is trivial. To prove statement (ii), write
To prove Theorem 1 we need some moments that appear in the asymptotic distributions.
∗ ∗′ E∗ H m Hm = b−1
∑⌊Tr ⌋
b − b −
E∗ εim +s1 εi′m +s2 − E∗ εim +s1 E∗ εi′m +s2
s1 =1 s2 =1
Lemma A.1. Let yt be generated under H0 setting (A). Let Assumption 1 hold. Then (i) Ω = limT →∞ T −1 E
∑ T
t = 1 ut
∑
T t =1
ut
=
1
b − b − T −b −
b(T − b)
s1 =1 s2 =1 t =1
′
= Γ ′ Ψ (1)ΣΨ (1)′ Γ , ∑T ∑∞ ′ ′ (ii) Ω0 = limT →∞ T −1 t =1 E ut u′t = j=0 Γ Ψj ΣΨj Γ .
−b
1
−1
b − T −b −
T − b s =1 t =1
= AT + BT .
εt +s1 εt′ +s2 εt + s
1
b − T −b −
T − b s =1 t =1
′ εt + s
100
F.C. Palm et al. / Journal of Econometrics 163 (2011) 85–104
Let us first look at BT . Note that b − T −b −
1
T − b s=1 t =1
−
T b−
εt + s =
T t =1
b − s−1 −
1
T −
b2
T (T − b) t =1 b −
1
εt −
T − b s=1 t =1
εt +
Using the Beveridge–Nelson decomposition we have
T −
T − b s=1 t =T −b+s+1
T −1/2
εt
= T −1/2
εt
b(T − b) s=1 t =1
+
b(T − b)
= T −1
T −
T −b −
b −
εt +s1 εt′ +s2
s1 =1 s2 =1,s1 ̸=s2 t =1
εt εt′ + Op (bT −1/2 ) = Σ + op (1).
t =1
This concludes the proof of part (ii).
Lemma A.4. Let Assumptions 1 and 3 hold. Then, as T → ∞, WT∗ (r ) = T −1/2
− →Σ
1/2
⌊(k− −1)r ⌋
((Ψ˜ (L)εim +b − E∗ Ψ˜ (L)εim +b )
for s = 0, b by the Markov inequality. Then, letting ξt∗ = εt − E∗ εt ,
⌊(k− −1)r ⌋ − b
(εim +s − E∗ εim +s )
k=0 d∗
− T −1/2
Ψ (1)(εim +s − E∗ εim +s )
s=1
m =0
εt +s εt′ +s
b −
1
⌊(k− −1)r ⌋ − b
× (Ψ˜ (L)εim − E∗ Ψ˜ (L)εim )), ∑∞ ∑∞ ˜ j ˜ where Ψ˜ (z ) = j = 0 Ψj z , Ψj = i = j + 1 Ψj . ∑⌊(k−1)r ⌋ −1/2 We will show that T (Ψ˜ (L)εim +b − E∗ Ψ˜ (L)εim +b ) m=0 ∗ = op (1). First note that ⌊(k− −1)r ⌋ ∗ ∗ −1/2 P T Ψ˜ (L)εim +s − E Ψ˜ (L)εim +s > ϵ m=0 2 ⌊(k−1)r ⌋ 1 ∗ −1/2 − ∗ ˜ ˜ ≤ 2 E T Ψ (L)εim +s − E Ψ (L)εim +s ϵ m=0 2 1 = 2 E ∗ G∗T ,s ϵ
Next we look at the first term. We have b − T −b −
s =1
m=0
from which we can conclude that BT = Op (bT −1 ).
1
(xim +s − E∗ xim +s )
m=0
= Op (bT −1/2 ) + Op (b2 T −3/2 ) + Op (b3/2 T −1 ) + Op (b3/2 T −1 ),
AT =
⌊(k− −1)r ⌋ − b
⌊(k− −1)r ⌋ ⌊(k− −1)r ⌋ ∞ − ∗ 2 − 1 ∗ E GT ,s = T E Ψ˜ j ξi∗m
s=1
m1 =0
W (r ) in probability.
×
∞ −
Proof of Lemma A.4. First note that T
⌊(k− −1)r ⌋ − b
−1/2
(εim +s − E εim +s ) = k
m =0
∗
⌊(k− −1)r ⌋
s=1
=T Hm . ∗
2
m=0
m=0
It is easily seen that the conditions of Corollary 2.2 of Phillips ∗ and Durlauf (1986) hold for the Hm terms, by which the result follows. Lemma A.5. Let yt be generated under H0 setting (A). Let Assumptions 1 and 3 hold. Then, as T → ∞, T −1/2
(uim +s − E∗ uim +s ) − → Γ ′ Ψ (1)Σ 1/2 W (r ).
m =0
≤T
−1
⌊(k− −1)r ⌋ m=0
1
(uim +s − E∗ uim +s )
= T −1/2
⌊(k− −1)r ⌋ − b
j =0
∞ − Ψ˜ j
2 max j
1
T −b − εt +s−j 2 .
T − b t =1
m=0
m=0
we focus on T −1/2
s=1
m=0
(29)
T −1/2
⌊(k− −1)r ⌋ − b
(xim +s − E∗ xim +s ) = Ψ (1)WT∗ (r ) + o∗p (1)
m=0
∑⌊(k−1)r ⌋ ∑b
s=1
(xim +s − E∗ xim +s ).
2
Op (b−1 ) for s = 0, b from which it follows that, uniformly in r,
(xim +s − E∗ xim +s ) ,
|εt +s−j | = Op (1)
s=1
⌊(k− −1)r ⌋ − b
T −b −
∑∞
by the moment conditions in Assumption 1. Therefore E∗ G∗T ,s =
(Γ ′ xim +s − E∗ Γ ′ xim +s )
= Γ ′ T −1/2
−1
T − b t =1
s=1
2 ∞ − 1/2 Ψ˜ j E∗ εi +s−j − E∗ εi +s−j 2 m m
∑∞
⌊(k− −1)r ⌋ − b m =0
2 ∞ − ∗ E Ψ˜ ε − E εim +s−j , j=0 j im +s−j ∗
˜ < ∞ is that < A sufficient condition for j=0 Ψj j=0 j Ψj ∞; see Phillips and Solo (1992, Lemma 2.1). This holds by Assumption 1. We also have that
Proof of Lemma A.5. As T −1/2
+s−j
2
j =0
s=1
+s−j
E∗ G∗T ,s
≤ 4kT
d∗
1
using the independence of the blocks. Now, by Minkowski’s inequality, we have uniformly in r,
⌊(k− −1)r ⌋ − b
j =0
⌊(k− −1)r ⌋
−1
m2 =0
Ψ˜ j ξi∗m
j =0
−1/2
′
∗
s =1
and therefore
(30)
F.C. Palm et al. / Journal of Econometrics 163 (2011) 85–104
T −1/2
⌊(k− −1)r ⌋ − b
Consequently
(xim +s − E∗ xim +s )
m=0
s=1
Ω =T ∗
d∗
− → Ψ (1)Σ 1/2 W (r ) in probability
Γ Ψ (1) E
−1
′
×
ST (r ) = T
y1 + T
Mr − 1
b −−
uˆ im +s + T
−1/2
Nr −
m=0 s=1
uˆ iMr +s ,
s=1
where Mr = ⌊(⌊Tr ⌋ − 2)/b⌋ and Nr = ⌊Tr ⌋ − Mr b − 1. As T −1/2 y1 = Op (T −1/2 ), we write ST (r ) = T ∗
Mr − b −
−1/2
uˆ im +s − T
b −
−1/2
m=0 s=1
uˆ iMr +s + Op (T
Ω0 = T ∗
∑ T
∑
T t =1
′ ]
ut
t =1
T
T −
∗
ut = T
−1/2
t =1
Therefore E∗ T −1/2
u∗t
∑
′
Ω ∗ = T −1 E∗
t =1
T t =1
u∗t
′
∑ T ∗
×
−E
t =1
u∗t
×
′ b −− ∗ (uim +s − E uim +s ) + op (1). k−1
k−1 − b − Γ Ψ (1) (εim +s − E∗ εim +s ) + o∗p (1). m=0 s=1
Γ Ψj (εim +s−j − E εim +s−j ) ∗
+ op (1)
Γ ′ Ψi
1 T −b
− εt +s−i εt′ +s−j Ψj′ Γ k−1 − b − ∞ − ∞ −
Γ ′ Ψi
T −b −
T −b −
1
εt +s−i
1 T −b
εt′ +s−j Ψj′ Γ + op (1)
2 T −b 1 − |BT | ≤ b max εt +s |Γ |2 1≤s≤b T − b t =1 ×
(33)
u∗t = op (1) and
m=0 s=1
=T
′ ′
Note that
k−1 − b − ∗ (uim +s − E uim +s )
′
Γ ′ Ψj (εim +s−j − E∗ εim +s−j )
j =0
T − b t =1 t =1 = AT + BT + op (1).
k−1 − b − (uim +s − E∗ uim +s )
−1/2
∞ −
t =1
×
′
Using the Beveridge–Nelson decomposition, we can show, as in the proof of Lemma A.5, that T
′
Γ ′ Ψj εim +s−j , we can write
m=0 s=1 i=0 j=0
m=0 s=1
−1/2
E
j =0
m=0 s=1 i=0 j=0
m=0 s=1
uim +s − E∗ uim +s
T −b
− T −1
k−1 − b − (uim +s − E∗ uim +s ) + o∗p (1).
∑T
∗
∑∞
k−1 − b − ∞ − ∞ −
= T −1
∞ − ∞ −
|Ψi ||Ψj | = bOp (T −1 )O(1) = Op (k−1 ),
i=0 j=0
m=0 s=1
∞ −
Proof of Lemma A.6. We start with part (i). Using the arguments in the proof of Lemma 2 (take r = 1) we can show that −1/2
j =0
= Γ Ψ (1)ΣΨ (1) Γ + op (1), ∑ T ∗ ∗ ∗ ∗′ (ii) Ω0∗ = T −1 t =1 E∗ (u∗t u∗′ t ) − E ut E ut ∑∞ ′ = i=0 Γ Ψi ΣΨi′ Γ + op (1). E
∗
k−1 − b −
−1
×
s=1
Lemma A.6. Let yt be generated under H0 setting (A). Let Assumptions 1 and 3 hold. Then, as T → ∞,
∗
E∗ uim +s − E∗ uim +s
+ op (1).
(32)
The next step is to determine the moments of the bootstrap series corresponding to the moments in Lemma A.1.
E∗
k−1 − b −
Ω0 = T −1
m=0 s=1
The proof is then concluded by applying Lemma A.5.
(i) Ω ∗ = T −1
Ψ (1)′ Γ + op (1)
which follows from the independence of the blocks and Lemma A.3. This concludes the proof of part (i). The proof of part (ii) is similar to part (i). As in the proof of Lemma 2 we have that
Then, using that uim +s =
(uim +s − E∗ uim +s ) + o∗p (1).
[
(εim +s − E εim +s )
m=0 s=1
).
s=Nr +1
⌊(k− −1)r ⌋ − b m=0
′ ∗
m=0 s=1
−1/2
It now follows from applying the proof of Theorem 3.1 of Paparoditis and Politis (2003) to each element of ST∗ (r ) that, uniformly in r, ST∗ (r ) = T −1/2
k−1 − b −
= Γ ′ Ψ (1)ΣΨ (1)′ Γ + op (1),
Proof of Lemma 2. Note that −1/2
k−1 − b − ∗ (εim +s − E εim +s )
m=0 s=1
by Lemma A.4. The proof is concluded by referring to (29) and applying the continuous mapping theorem.
−1/2
∗
(31)
∗
101
while 1
T −b − b − ∞ −
Γ ′ Ψi (εt +s−i εt′ +s−i )Ψi′ Γ + op (1), (T − b)b t =1 s=1 i=0 ∑T −b ∑b ′ and as (T −1b)b = Σ + op (1), we can t =1 s=1 εt +s−i εt +s−i ∑∞ ′ ∗ ′ Γ + op (1). conclude that Ω0 = Γ Ψ ΣΨ i i i =0 AT =
Lemma A.7. Let yt be generated under H0 setting (A). Let Assumptions 1 and 3 hold. Then, as T → ∞, we have for i = 1, . . . , N, (i) T −1 (ii) T
∑T
∑ −2
t =1 T t =1
d
y∗i,t −1 1y∗i,t − → d
y∗i,2t −1 − →
1 0
1 0
Bi (r )dBi (r ) + 21 (ωi − ω0,i ),
Bi (r )2 dr,
where convergence also holds jointly. Proof of Lemma A.7. The result follows from Lemmas 2 and A.6 and the continuous mapping theorem. Joint convergence can again be established using the Cramér–Wold device.
102
F.C. Palm et al. / Journal of Econometrics 163 (2011) 85–104
Proof of Theorem 2. The Lemma A.7.
result
follows
directly
from
Proof of Lemma 3. As in Lemma 1, we have by Assumption 1 that WT (r ) = T −1/2
d
εt − → Σ 1/2 W (r ). Then it follows that T t =1 ft = Ψ2 (1)WT (r ) + op (1) uniformly in r, and ∑⌊Tr ⌋ d consequently T −1/2 t =1 ft − → Ψ2 (1)Σ 1/2 W (r ). ∑⌊Tr ⌋ t =1
∑⌊Tr ⌋ −1/2
Now
T −1/2
⌊Tr ⌋ −
ut = T −1/2
⌊Tr ⌋ −
t =1
⌊Tr ⌋ −
Λft + T −1/2
t =1
= T −1/2
⌊Tr ⌋ −
=T
⌊Tr ⌋ −
T −1/2
⌊(k− −1)r ⌋ − b
t =1
s =1
Proof of Lemma A.9. Note that
Λft + T −1/2 (w⌊Tr ⌋ − w0 )
T −1/2
⌊(k− −1)r ⌋ − b
(uim +s − E∗ uim +s )
m=0
−1/2
Λft + Op (T
∑⌊Tr ⌋ t =1
d∗
(uim +s − E∗ uim +s ) − → ΛΨ2 (1)Σ 1/2 W (r ).
m=0
s =1
⌊(k− −1)r ⌋ − b
)
=Λ T
t =1
uniformly in r and T −1/2
Lemma A.9. Let yt be generated under H0 setting (B). Let Assumptions 1–3 hold. Then, as T → ∞,
1wt
t =1
−1/2
Proof of Theorem 3. Using Lemmas 3, A.8 and the continuous mapping theorem we can construct the counterpart of Lemma A.2. The result then follows.
−1/2
(fim +s − E fim +s ) + T −1/2
m=0
d
Λft − → ΛΨ2 (1)Σ 1/2 W (r ).
×
The next lemma is the counterpart of Lemma A.1.
⌊(k− −1)r ⌋
∗
s =1
(wim +b − E∗ wim +b ) − T −1/2
m=0
Lemma A.8. Let yt be generated under H0 setting (B). Let Assumptions 1 and 2 hold. Then
¯ = limT →∞ T −1 E (i) Ω
∑
T t =1
ut
∑
T t =1
ut
′
′ = ΛΨ2 (1)ΣΨ2 (1)′ Λ∑ , ∑∞ T ′ −1 ′ ′ ¯ (ii) Ω0 = limT →∞ T t = 1 E ( ut ut ) = j=0 (ΛΨ2,j ΣΨ2,j Λ + ′ ′ ′ ′ (Ψ¯ 1,j − Ψ¯ 1,j+1 )ΣΨ2,j Λ + ΛΨ2,j Σ (Ψ¯ 1,j − Ψ¯ 1,j+1 ) + 2Ψ¯ 1,j Σ Ψ¯ 1,j − Ψ¯ 1,j Σ Ψ¯ 1′,j+1 − Ψ¯ 1,j+1 Σ Ψ¯ 1′,j ).
¯ = lim T −1 E Ω T →∞
T −
Λft + wT − w0
T −
t =1
= lim T −1
Λft + wT − w0
Λ E(fs ft′ )Λ′ = ΛΨ2 (1)ΣΨ2 (1)′ Λ′ ,
B∗T ,s = T −1/2
⌊(k− −1)r ⌋
(Ψ¯ 1 (L)εim +s − E∗ Ψ¯ 1 (L)εim +s ) + o∗p (1).
m=0
∑⌊(k−1)r ⌋ −1/2
(Ψ¯ 1 (L)εim +s − E∗ Ψ¯ 1 (L)εim +s ) = O∗p (1), it follows = Op (b ) uniformly in r for s = 0, b. m=0
∗
∗
T
¯ 0 = lim T −1 Ω T →∞
−
E (Λft + 1wt )(Λft + 1wt )
′
that BT ,s As in the proof of Lemma A.5 we have that T −1/2
⌊(k− −1)r ⌋ − b
(fim +s − E∗ fim +s ) = Ψ2 (1)WT∗ (r ) + o∗p (1)
m=0
s =1
= lim
T →∞
T −1
T −
Λ E(ft ft′ )Λ′ + T −1
t =1
− t =1
(uim +s − E∗ uim +s )
s =1
d∗
E(1wt ft′ )Λ′
−
⌊(k− −1)r ⌋ − b
− → ΛΨ2 (1)W (r ) in probability.
T
Λ E(ft 1wt′ ) + T −1
T −1/2
m=0
t =1
T
+ T −1
T −
uniformly in r and consequently that
t =1
E(1wt 1wt′ )
Proof of Lemma 4. As the order of the ft determines the order of ut , the proof proceeds as the proof of Lemma 2 such that we have, uniformly in r,
t =1
= AT + BT + B′T + CT .
ST∗ (r ) = T −1/2
⌊(k− −1)r ⌋ − b m=0
Now ∞ −
∞ − BT = (Ψ¯ 1,j − Ψ¯ 1,j+1 )ΣΨ2′,j Λ′ j =0
Lemma A.10. Let yt be generated under H0 setting (B). Let Assumptions 1–3 hold. Then, as T → ∞,
¯ ∗ = T −1 E (i) Ω
and 2Ψ¯ 1,j Σ Ψ¯ 1′,j
(35)
s=1
We consider the bootstrap moments in the following lemma.
j=0
∞ −
(ui,im +s − E∗ ui,im +s ) + o∗p (1).
The proof is then concluded by applying Lemma A.9.
ΛΨ2,j ΣΨ2′,j Λ′ ,
analogous to the proof of Lemma A.1 part (ii). Similarly
CT =
−1/2
s =1 t =1
where the last step follows as in the proof of Lemma A.1 part (i). For part (ii) we have
AT =
(34)
We want to show that B∗T ,s = O∗p (b−1/2 ) uniformly in r for s = 0, b. First note that by Eq. (30)
t =1
T − T −
T →∞
′
(wim − E∗ wim )
m=0
= A∗T − B∗T ,0 + B∗T ,b .
As k
Proof of Lemma A.8. For part (i), note that
⌊(k− −1)r ⌋
∑
T t =1
u∗t
∑
T t =1
u∗t
′
= ΛΨ2 (1)ΣΨ2 (1)′ Λ′ +
op (1),
−
Ψ¯ 1,j Σ Ψ¯ 1′,j+1
j =0
This completes the proof.
−
Ψ¯ 1,j+1 Σ Ψ¯ 1′,j
.
(ΛΨ2,j ΣΨ2′,j Λ′ + (Ψ¯ 1,j − Ψ¯ 1,j+1 )ΣΨ2′,j Λ′ + ΛΨ2,j Σ (Ψ¯ 1,j − Ψ¯ 1,j+1 )′ + 2Ψ¯ 1,j Σ Ψ¯ 1′,j − Ψ¯ 1,j Σ Ψ¯ 1′,j+1 − Ψ¯ 1,j+1 Σ Ψ¯ 1′,j ) + op (1).
¯ 0∗ = T −1 (ii) Ω
∑T
t =1
E(u∗t u∗′ t ) =
∑∞
j =0
F.C. Palm et al. / Journal of Econometrics 163 (2011) 85–104
Proof of Lemma A.10. We start with part (i). As T
−1/2
T −
and
k−1 − b − u∗t = T −1/2 (uim +s − E∗ uim +s ) + o∗p (1),
t =1
(36)
CT∗ =
∞ −
2Ψ¯ 1,j Σ Ψ¯ 1′,j − Ψ¯ 1,j Σ Ψ¯ 1′,j+1 − Ψ¯ 1,j+1 Σ Ψ¯ 1′,j + op (1).
j =0
m=0 s=1
This completes the proof.
we have that
¯ ∗ = T −1 E∗ Ω
×
Proof of Lemma 5. We can write
′ b −− ∗ (uim +s − E uim +s ) + op (1). k−1
N ∑
m=0 s=1
Combining the proof of Lemmas A.5 and A.9 we can show that k−1 − b − (uim +s − E∗ uim +s )
T
−1
τp =
i=1
aiT = T −1
m=0 s=1
T ∑
T −1
t =2
= y2i,t −1
T −
yi,t −1 yi,t − T −1
t =2
¯ = ΛΨ2 (1)ΣΨ2 (1) Λ + op (1), Ω ∗
′
′
biT = T −1
= T^{−1/2} ΛΨ_2(1) ∑_{m=0}^{k−1} ∑_{s=1}^{b} (ε_{i_m+s} − E*ε_{i_m+s}) + o*_p(1),

which follows as in the proof of Lemma A.6. This concludes the proof of part (i). Next we consider part (ii). As in the proof of Lemma A.6 we can show that

Ω̄_0* = T^{−1} ∑_{m=0}^{k−1} ∑_{s=1}^{b} E*[(u_{i_m+s} − E*u_{i_m+s})(u_{i_m+s} − E*u_{i_m+s})′] + o_p(1).

Consequently

Ω̄_0* = T^{−1} ∑_{m=0}^{k−1} ∑_{s=1}^{b} Λ(f_{i_m+s−j} − E*f_{i_m+s−j})(f_{i_m+s−j} − E*f_{i_m+s−j})′Λ′
  + T^{−1} ∑_{m=0}^{k−1} ∑_{s=1}^{b} Λ E*[(f_{i_m+s−j} − E*f_{i_m+s−j})(Δw_{i_m+s−j} − E*Δw_{i_m+s−j})′]
  + T^{−1} ∑_{m=0}^{k−1} ∑_{s=1}^{b} E*[(Δw_{i_m+s−j} − E*Δw_{i_m+s−j})(f_{i_m+s−j} − E*f_{i_m+s−j})′]Λ′
  + T^{−1} ∑_{m=0}^{k−1} ∑_{s=1}^{b} (Δw_{i_m+s−j} − E*Δw_{i_m+s−j})(Δw_{i_m+s−j} − E*Δw_{i_m+s−j})′
  = A*_T + B*_T + B*′_T + C*_T.

Then we can show, in the same way as in the proof of Lemma A.6 part (ii), that

A*_T = ∑_{j=0}^{∞} ΛΨ_{2,j} Σ Ψ′_{2,j} Λ′ + o_p(1),

as well as

B*_T = ∑_{j=0}^{∞} (Ψ̄_{1,j} − Ψ̄_{1,j+1}) Σ Ψ′_{2,j} Λ′ + o_p(1).

Then we can write

τ_p = (∑_{i=1}^{N} T^{−1} ∑_{t=2}^{T} y_{i,t−1} Δy_{i,t}) / (∑_{i=1}^{N} T^{−1} ∑_{t=2}^{T} y²_{i,t−1}) =: (∑_{i=1}^{N} a_iT) / (∑_{i=1}^{N} b_iT).

Now as y_{i,t} is a stationary process for all i = 1, …, N, we have that

T^{−1} ∑_{t=2}^{T} y_{i,t−1} Δy_{i,t} →_p γ_i(1) − γ_i(0),   T^{−1} ∑_{t=2}^{T} y²_{i,t−1} →_p γ_i(0).

Similarly,

T^{−1} τ_gm = N^{−1} ∑_{i=1}^{N} a_iT / b_iT,

from which the result follows.

Lemma A.11. Let y_t be generated under H_1a. Let Assumptions 1 and 3 hold. Then, as T → ∞,

T^{−1/2} ∑_{m=0}^{⌊(k−1)r⌋} ∑_{s=1}^{b} (u_{i_m+s} − E*u_{i_m+s}) →_{d*} Ψ^{++}(1) Σ^{1/2} W(r).

Proof of Theorem 4. As for Theorem 3 we can construct the counterpart of Lemma A.7 using Lemmas 4, A.10 and the continuous mapping theorem. The result then follows.

Proof of Lemma A.11. Using the Beveridge–Nelson decomposition we can write

T^{−1/2} ∑_{m=0}^{⌊(k−1)r⌋} ∑_{s=1}^{b} (u_{i_m+s} − E*u_{i_m+s})
  = T^{−1/2} ∑_{m=0}^{⌊(k−1)r⌋} ∑_{s=1}^{b} Ψ^{++}(1)(ε_{i_m+s} − E*ε_{i_m+s})
  − T^{−1/2} ∑_{m=0}^{⌊(k−1)r⌋} ((Ψ̃^{++}(L)ε_{i_m+b} − E*Ψ̃^{++}(L)ε_{i_m+b}) − (Ψ̃^{++}(L)ε_{i_m} − E*Ψ̃^{++}(L)ε_{i_m})),

where Ψ̃^{++}(z) = ∑_{j=0}^{∞} Ψ̃_j^{++} z^j and Ψ̃_j^{++} = ∑_{i=j+1}^{∞} Ψ_i^{++}. We need to show that T^{−1/2} ∑_{m=0}^{⌊(k−1)r⌋} (Ψ̃^{++}(L)ε_{i_m+b} − E*Ψ̃^{++}(L)ε_{i_m+b}) = o_p(1), uniformly in r. Completely analogously to the proof of Lemma A.5, this means showing that ∑_{j=0}^{∞} ‖Ψ̃_j^{++}‖ < ∞ or, equivalently, ∑_{j=0}^{∞} j‖Ψ_j^{++}‖ < ∞. This holds as we remarked that the summability condition continues to hold.

Proof of Theorem 5. We can apply the proof of Theorem 3.1 of Paparoditis and Politis (2003), now for the stationary case, to show that, uniformly in r,

S*_T(r) = T^{−1/2} ∑_{m=0}^{⌊(k−1)r⌋} ∑_{s=1}^{b} (u_{i,i_m+s} − E*u_{i,i_m+s}) + o*_p(1).  (37)
F.C. Palm et al. / Journal of Econometrics 163 (2011) 85–104
The result now follows by applying Lemma A.11 and the continuous mapping theorem.

Proof of Lemma 6. We write

τ_p = (∑_{i=1}^{N} T^{−1} ∑_{t=2}^{T} y_{i,t−1} Δy_{i,t}) / (∑_{i=1}^{N} T^{−2} ∑_{t=2}^{T} y²_{i,t−1})
  = (∑_{i=1}^{n_1} T^{−1} ∑_{t=2}^{T} y_{i,t−1} Δy_{i,t} + ∑_{i=n_1+1}^{N} T^{−1} ∑_{t=2}^{T} y_{i,t−1} Δy_{i,t}) / (∑_{i=1}^{n_1} T^{−2} ∑_{t=2}^{T} y²_{i,t−1} + ∑_{i=n_1+1}^{N} T^{−2} ∑_{t=2}^{T} y²_{i,t−1})
  = (∑_{i=1}^{n_1} a_iT + ∑_{i=n_1+1}^{N} c_iT) / (∑_{i=1}^{n_1} T^{−1} b_iT + ∑_{i=n_1+1}^{N} d_iT).

The convergence of a_iT and b_iT follows from the proof of Lemma 5. Furthermore, as in Lemma A.2, we have that

c_iT →_d ∫_0^1 B_i^#(r) dB_i^#(r),   d_iT →_d ∫_0^1 B_i^#(r)² dr,

from which the result for τ_p follows. For τ_gm we can write

T^{−1} τ_gm = N^{−1} ∑_{i=1}^{n_1} a_iT/b_iT + N^{−1} ∑_{i=n_1+1}^{N} T^{−1} c_iT/d_iT = N^{−1} ∑_{i=1}^{n_1} a_iT/b_iT + O_p(T^{−1}).
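The two convergence rates in the decomposition above can be illustrated numerically. The following is a minimal simulation sketch under a hypothetical DGP (independent AR(1) series with coefficient 0.5 for the first n_1 units and driftless random walks for the rest, not the factor structure of the paper): for stationary units the per-unit ratio a_iT/b_iT converges to (γ_i(1) − γ_i(0))/γ_i(0) = ρ − 1, while unit-root units contribute only O_p(T^{−1}) terms to the group-mean statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n1, T = 10, 5, 5000          # n1 stationary units, N - n1 unit-root units

ratios = []
for i in range(N):
    rho = 0.5 if i < n1 else 1.0        # AR(1) coefficient: stationary vs. unit root
    eps = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + eps[t]
    a = np.sum(y[:-1] * np.diff(y)) / T   # T^{-1} sum_t y_{i,t-1} dy_{i,t}
    b = np.sum(y[:-1] ** 2) / T           # T^{-1} sum_t y_{i,t-1}^2
    ratios.append(a / b)

# stationary units: a/b ->_p (gamma(1) - gamma(0))/gamma(0) = rho - 1 = -0.5
# unit-root units:  a/b = T^{-1} (c_iT / d_iT) = O_p(1/T)
print(np.mean(ratios[:n1]), np.mean(ratios[n1:]))
```

The contrast between the two group means illustrates why the unit-root units drop out of the limit of T^{−1}τ_gm, which is driven by the stationary units only.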
Proof of Corollary 1. The proof is immediate by combining the proofs of Theorems 2 and 5.

References

Bai, J., Ng, S., 2004. A PANIC attack on unit roots and cointegration. Econometrica 72, 1127–1177.
Bai, J., Ng, S., 2010. Panel unit root tests with cross-section dependence: a further investigation. Econometric Theory 26, 1088–1114.
Breitung, J., Das, S., 2008. Testing for unit roots in panels with a factor structure. Econometric Theory 24, 88–108.
Bühlmann, P., Künsch, H.R., 1999. Block length selection in the bootstrap for time series. Computational Statistics & Data Analysis 31, 295–310.
Chang, Y., 2004. Bootstrap unit root tests in panels with cross-sectional dependency. Journal of Econometrics 120, 263–293.
Chang, Y., Song, W., 2009. Testing for unit roots in small panels with short-run and long-run cross-sectional dependencies. Review of Economic Studies 76, 903–935.
Choi, I., Chue, T.K., 2007. Subsampling hypothesis tests for nonstationary panels with applications to exchange rates and stock prices. Journal of Applied Econometrics 22, 401–428.
Chudik, A., Pesaran, M.H., Tosetti, E., 2009. Weak and strong cross section dependence and estimation of large panels. Working Paper Series No. 1100, European Central Bank.
Davidson, J., 2000. Econometric Theory, 2nd ed. Blackwell Publishers, Oxford.
Davidson, J., 2002. Stochastic Limit Theory, 2nd ed. Oxford University Press, Oxford.
Di Iorio, F., Fachin, S., 2008. Testing for cointegration in dependent panels via residual-based bootstrap methods. Working Paper.
Fachin, S., 2007. Long-run trends in internal migrations in Italy: a study in panel cointegration with dependent units. Journal of Applied Econometrics 22, 401–428.
Gengenbach, C., Palm, F.C., Urbain, J.-P., 2010. Panel unit root tests in the presence of cross-sectional dependencies: comparison and implications for modelling. Econometric Reviews 29, 111–145.
Giacomini, R., Politis, D.N., White, H., 2007. A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Working Paper.
Gonçalves, S., Vogelsang, T.J., 2010. Block bootstrap puzzles in HAC robust testing: the sophistication of the naive bootstrap. Econometric Theory (forthcoming).
Hall, P., Horowitz, J.L., Jing, B.-Y., 1995. On blocking rules for the bootstrap with dependent data. Biometrika 82, 561–574.
Härdle, W., Horowitz, J.L., Kreiss, J.-P., 2003. Bootstrap methods for time series. International Statistical Review 71, 435–459.
Hlouskova, J., Wagner, M., 2006. The performance of panel unit root and stationarity tests: results from a large scale simulation study. Econometric Reviews 25, 85–116.
Horowitz, J.L., Savin, N.E., 2000. Empirically relevant critical values for hypothesis tests: a bootstrap approach. Journal of Econometrics 95, 375–389.
Im, K.S., Pesaran, M.H., Shin, Y., 2003. Testing for unit roots in heterogeneous panels. Journal of Econometrics 115, 53–74.
Kapetanios, G., 2008. A bootstrap procedure for panel data sets with many cross-sectional units. Econometrics Journal 11, 377–395.
Künsch, H.R., 1989. The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17, 1217–1241.
Lahiri, S.N., 2003. Resampling Methods for Dependent Data. Springer-Verlag, New York.
Lahiri, S.N., Furukawa, K., Lee, Y.-D., 2007. A nonparametric plug-in rule for selecting optimal block lengths for block bootstrap methods. Statistical Methodology 4, 292–321.
Levin, A., Lin, C.-F., Chu, C.-S.J., 2002. Unit root tests in panel data: asymptotic and finite-sample properties. Journal of Econometrics 108, 1–24.
Maddala, G.S., Wu, S., 1999. A comparative study of unit root tests with panel data and a new simple test. Oxford Bulletin of Economics and Statistics 61, 631–652.
Moon, H.R., Perron, B., 2004. Testing for a unit root in panels with dynamic factors. Journal of Econometrics 122, 81–126.
Moon, H.R., Perron, B., Phillips, P.C.B., 2007. Incidental trends and the power of panel unit root tests. Journal of Econometrics 141, 416–459.
O'Connell, P.G.J., 1998. The overvaluation of purchasing power parity. Journal of International Economics 44, 1–19.
Paparoditis, E., Politis, D.N., 2003. Residual-based block bootstrap for unit root testing. Econometrica 71, 813–855.
Pedroni, P., Vogelsang, T.J., Wagner, M., Westerlund, J., 2008. Robust unit root and cointegration rank tests for time series panels. Working Paper.
Pesaran, M.H., 2007. A simple panel unit root test in the presence of cross-sectional dependence. Journal of Applied Econometrics 22, 265–312.
Phillips, P.C.B., Durlauf, S.N., 1986. Multiple time series regression with integrated processes. Review of Economic Studies 53, 473–495.
Phillips, P.C.B., Solo, V., 1992. Asymptotics for linear processes. Annals of Statistics 20, 971–1001.
Politis, D.N., Romano, J.P., Wolf, M., 1999. Subsampling. Springer-Verlag, New York.
Politis, D.N., White, H., 2004. Automatic block-length selection for the dependent bootstrap. Econometric Reviews 23, 53–70.
Smeekes, S., 2009. Detrending bootstrap unit root tests. METEOR Research Memorandum 09/056, Maastricht University.
Journal of Econometrics 163 (2011) 105–117
A characterization of vector autoregressive processes with common cyclical features

Massimo Franchi a, Paolo Paruolo b,∗

a Dipartimento di Scienze Statistiche, Università di Roma ‘‘La Sapienza’’, Italy
b European Commission, Joint Research Centre, Institute for the Protection and Security of the Citizen & Università dell’Insubria, Varese, Italy
Article history: Available online 21 December 2010.
JEL classification: C32; C51; C52.
Keywords: Multiple time series; Common cycles; Cointegration; I(1); I(2).
Abstract

This paper presents necessary and sufficient conditions for the existence of common cyclical features in Vector Auto Regressive (VAR) processes integrated of order 0, 1, 2, where the common cyclical features correspond to common serial correlation (CS), commonality in the final equations (CE) and co-dependence (CD). The results are based on local rank factorizations of the reversed AR polynomial around the poles of its inverse. All processes with CS structures are found to present also CE structures and vice versa. The presence of CD structures, instead, implies the presence of both CS and CE structures, but not vice versa. Characterizations of the CS, CE, CD linear combinations are given in terms of linear subspaces defined in the local rank factorizations.
1. Introduction

Several macroeconomic theories predict the presence of common dynamic components in economic time series. For example, the life-cycle hypothesis and permanent income hypothesis relate current consumption to (the present value of) life-income or real wealth, hence implying common trends and cycles between these variables, see Hall (1978) and Campbell and Mankiw (1990). Similarly, co-movements among national consumption aggregates are predicted by international risk-sharing, see Cavaliere et al. (2008) and references therein. Other economic theories with similar implications include: international equalization of interest rates, see Kugler and Neusser (1993), present value models, see Campbell and Shiller (1987), and balanced growth models, see King et al. (1991). In several of these models, commonality in dynamic behavior is implied by the first order conditions of optimizing agents. Let X_t denote a vector of observable time series, and let X_{t−k}^{t−1} := (X_{t−k}, …, X_{t−1}); optimization usually implies that some function y_t := g(X_t, X_{t−k}^{t−1}) has zero expectation conditional on information available at time t − 1, which includes X_{t−k}^{t−1}. This implies that y_t is unpredictable on the basis of X_{t−k}^{t−1}, and hence y_t does not
∗ Corresponding address: Department of Economics, University of Insubria, Via Monte Generoso 71, 21100 Varese, Italy. Tel.: +39 0332395541; fax: +39 0332395509. E-mail address: [email protected] (P. Paruolo).
doi:10.1016/j.jeconom.2010.11.009
contain cyclical components. A leading special case is when g is a linear function, which corresponds to the notion of common features introduced by Vahid and Engle (1993) and Engle and Kozicki (1993). Special cases of common features are common trends and common cycles. Common trends are associated with the notion of cointegration (CI) introduced in Engle and Granger (1987). The relation between CI and the existence of common trends is the subject of Granger’s representation theorem, which was proved by Johansen for VAR processes integrated of orders 1 and 2, I(1) and I(2), see Johansen (1996) and references therein. Cointegration has generated a vast literature, see Johansen (2009a) for a recent summary. Common cycles have also received considerable attention, usually within systems which also display common trends, see e.g. Kugler and Neusser (1993), Lippi and Reichlin (1994), Vahid and Issler (2002), Hecq et al. (2002, 2006), Paruolo (2003, 2006), Schleicher (2007) and Cubadda et al. (2009). Several notions of common cycles have been proposed in the literature. Engle and Kozicki (1993) and Cubadda and Hecq (2001) proposed the notion of (polynomial-) serial correlation common features, here indicated as CS(d); these correspond to common factors in the AR representation, and d indicates the degree of the AR polynomial of the CS linear combinations. Alternatively, Gourieroux and Peaucelle (1988) and Vahid and Engle (1997) formalized the notion of co-dependence, which requires commonality in the moving average (MA) representation, i.e. collinearity in the impulse responses after some horizon d; we indicate it with CD(d),
where d is the degree of the MA polynomial of the CD linear combinations. Yet another form of common dynamics is associated with common factors in the set of final equations (FE), see Eq. (2.7) in Zellner and Palm (1974) and Cubadda et al. (2009); we refer to this notion as CE(d), where d refers to the degree of the MA polynomial in the FE of the CE linear combinations. This notion is of interest, for instance, when investigating the univariate ARMA representations of each component in the VAR as in Cubadda et al. (2009). Some aspects of the relationships among the notions of CS, CE, CD, CI for VAR processes have been investigated in the literature. Engle and Kozicki (1993) and Vahid and Engle (1993) considered implications of CI on the existence of CS(0). They noted that I(1) VARs with CI are compatible with CS(0) in the growth rate of the process, and a necessary condition for this is that the CS(0) linear combinations must belong to the orthogonal complement of the space spanned by the adjustment coefficients in the error correction term. Cubadda and Hecq (2001) and Hecq et al. (2006) defined and discussed the case of CS(d) in CI I(1) VARs, where the CS linear combinations always load the contemporaneous growth rate of the process. Paruolo (2003, 2006) gave extensions to cases of I(1) and I(2) systems with CS(d) linear combinations that possibly involve both or either the growth rates and deviations from equilibria. Some implications of CD(d) on VAR processes were discussed in Kugler and Neusser (1993), who noted that CD(d) implies the orthogonality between the CD linear combinations and some (implicitly defined) function of the AR coefficients that one encounters in the recursive calculation of the MA coefficients. Finally Cubadda et al. (2009) considered the implications of CS and CI on the FE representation of VAR processes, en route to obtaining the orders of the univariate ARMA(p, q) representations of single components. 
In particular they derived the implications of CS(d) on p, q for d ≥ 0, both for stationary and CI VAR processes with I(1) variables. This paper provides a comprehensive and unified discussion of the relationships among CS, CE, CD for I(0), I(1) and I(2) VARs. Our results extend and complement the results available in the literature cited above. We first derive results for stationary VARs, and then we apply them to stationary representations of I(1) and I(2) VAR processes. We present two types of propositions: one concerns the existence and the other the characterization of the common features linear combinations. Both types of conditions are based on algebraic relations between a matrix AR polynomial and its determinant and adjoint. The existence results consist of necessary and sufficient conditions stated in terms of an index m_0, which is a function of the degrees of the AR polynomial, of its determinant and of its adjoint, and which is shown to equal the order of the pole at 0 of the inverse of the reversed AR matrix polynomial. These existence results replicate the finding of Cubadda et al. (2009) that CS implies CE; the present results also give the reverse implication, i.e. that CE implies CS. In addition, these existence results show that CD implies CS and CE, but that the converse does not hold. The characterization results consist of necessary and sufficient conditions that CS, CE, CD linear combinations need to satisfy. It is shown that they need to belong to certain linear subspaces, associated with the expansion of the inverse of the reversed AR polynomial around its poles, see also Franchi and Paruolo (2010). The conditions are stated in terms of the subspaces directly related to the VAR polynomial, and they typically consist of some orthogonality conditions (which can be formulated as reduced rank conditions), plus a full rank condition.
Restricted VARs with common features of type CS, CE, CD, respectively, deliver factor structures in the AR, FE and MA
representations of the process. The present results can also be applied to more general dynamic factor models with State-Space and VARMA representations, when the AR or MA representations involve the inverse of a matrix polynomial. Moreover, our characterization results can be used to devise parameterizations of restricted VARs with CS, CE, CD features; this can be used to construct likelihood-based inference methods for the presence of CS, CE, CD features in VARs. Extensions to VARMA and inference methods, however, fall outside the scope of the present paper, which focuses on the characterization of CS, CE, CD constraints in VARs. Reduced rank restrictions on VAR coefficient matrices have been proposed in the time series literature as a way to obtain parameter parsimony. Special cases are the index models (IM) of Reinsel (1983) and the nested reduced rank specification (NRR) of Ahn and Reinsel (1988). The rank conditions derived here are different from the ones associated with IM and NRR; they are motivated not by parameter parsimony but as the restrictions corresponding one-to-one to commonality in cyclical features of the system. However, in characterizing CS structures below, we show a connection between IM-NRR and CS, observing that both IM and NRR imply CS but are not implied by it. The rest of the paper is organized as follows: Section 2 introduces notation and the definitions of the structures of interest. Sections 3 and 6 present a worked-out example. The main results are presented in Sections 4 and 5, which collect the existence and the characterization results for stationary VARs. Section 7 extends the results to I(1) and I(2) systems, while Section 8 reports conclusions; the Appendix contains proofs. In the following we employ these notational conventions. With a := b and b =: a we indicate that a is defined by b.
For any full column rank matrix γ ∈ ℂ^{p×r}, γ* indicates the p × r matrix of complex conjugates and γ′ the conjugate transpose of γ; in case γ is real, γ′ reduces to the transpose. We indicate by col γ the linear span of the columns of γ with coefficients in the field ℂ or ℝ if γ is complex or real, respectively. γ⊥ indicates a basis of col⊥ γ, the orthogonal complement of col γ; γ̄ := γ(γ′γ)^{−1}, so that P_γ := γ̄γ′ = γγ̄′ denotes the orthogonal projector matrix onto col γ and M_γ := I − P_γ the orthogonal projector matrix onto col⊥ γ. For a matrix A we often employ a rank factorization of the type A = −αβ′ where α and β are bases of col A and col A′, and the negative sign is chosen for convenience in the calculations. Any sum ∑_{n=a}^{b} · where b < a is defined equal to 0. For any polynomial γ(z) := ∑_{n=0}^{d_γ} γ_n z^n, z ∈ ℂ, we indicate its degree by d_γ and when γ_n ∈ ℝ^{p×r} for all n we say that γ(z) has real coefficients. Finally, 1_{j,k} is the indicator function, i.e. it is equal to 1 if j = k and 0 otherwise.

2. Setup and definitions
In this section we introduce notation and state the AR, FE and MA representations of a VAR system. We consider a VAR of dimension p and finite order d_Π:

X_t + Π_1 X_{t−1} + ⋯ + Π_{d_Π} X_{t−d_Π} = ϵ_t,  (1)

where Π_1, …, Π_{d_Π} ∈ ℝ^{p×p} and ϵ_t is a p-dimensional martingale difference sequence (with respect to the natural filtration generated by X_t) with positive definite conditional covariance matrix Ω. A leading example of this is when the ϵ_t are Gaussian i.i.d. random vectors. Deterministic components D_t are omitted from (1) for ease of exposition; they could be included by replacing X_t with X_t − D_t or by replacing ϵ_t with ϵ_t + D_t. Indicate the AR polynomial in (1) by Π(z) := ∑_{n=0}^{d_Π} Π_n z^n, Π_0 := I, z ∈ ℂ, and let det Π(z), adj Π(z) respectively be its characteristic and adjoint polynomials, so that inv Π(z) = adj Π(z)/det Π(z). Remark that, because Π(z) has real coefficients, so do det Π(z) and
adj Π(z). It is useful to factorize the characteristic polynomial in terms of its roots z_u ∈ ℂ; because Π(0) = I, z_u ≠ 0 and one can write det Π(z) = ∏_{u=1}^{q} (1 − w_u z)^{a_u}, where w_u := z_u^{−1} and a_u > 0. We define ρ := min_u |z_u| and observe that ρ > 0 because Π(0) = I is of full rank. Some of the factors in det Π(z) can be common to adj Π(z); hence consider the following factorization

adj Π(z) = G(z) ∏_{u=1}^{q} (1 − w_u z)^{b_u},  0 ≤ b_u < a_u,  G(z_u) ≠ 0,  (2)

where¹ G(z) is the 'reduced adjoint' of Π(z), see Franchi (2007) for the application of the same idea to the unit root case. Note that if there are no common factors, one has b_u = 0 for all u, and G(z) = adj Π(z); the exponents b_u are maximal because otherwise the condition G(z_u) ≠ 0 would not be satisfied. The following lemma gives the order of the pole of inv Π(z) at z = z_u, labeled m_u, where we say that a matrix rational function B(z) has a pole of order m at z = w if 0 < lim_{z→w} ‖(z − w)^m B(z)‖ < ∞ and lim_{z→w} ‖(z − w)^{m−1} B(z)‖ = ∞ for any matrix norm ‖·‖.

¹ See Gantmacher (1959, Section IV.6) for the definition of reduced adjoint and minimal polynomial.

Lemma 2.1 (Co-prime polynomials G(z), g(z) and orders m_u). Let det Π(z) = ∏_{u=1}^{q} (1 − w_u z)^{a_u}, where w_u := z_u^{−1} and a_u > 0; then

inv Π(z) = G(z)/g(z),  z ∈ ℂ \ {z_1, …, z_q},  (3)

where g(z) := ∏_{u=1}^{q} (1 − w_u z)^{m_u}, m_u := a_u − b_u > 0, is the 'minimal polynomial' of Π(z), so that inv Π(z) has poles of order m_u at z = z_u, u = 1, …, q.

We note that G(z) and g(z) in Lemma 2.1 have real coefficients, because if a complex root is common to adj Π(z) and det Π(z), so is its complex conjugate. Further note that if there are no common factors, one has m_u = a_u for all u, g(z) = det Π(z) and G(z) = adj Π(z). Next consider the FE form det Π(L)X_t = adj Π(L)ϵ_t, see e.g. Zellner and Palm (1974) and Cubadda et al. (2009); eliminating the (possible) common factors from det Π(z) and adj Π(z) one obtains

g(L)X_t = ϵ_t + G_1 ϵ_{t−1} + ⋯ + G_{d_G} ϵ_{t−d_G},  (4)

where G(z) =: ∑_{n=0}^{d_G} G_n z^n, G_0 = I. We refer to (4) as the FE form of the VAR. The Taylor representation of inv Π(z) around 0 has real coefficients and it is written as

C(z) := inv Π(z) = ∑_{n=0}^{d_C} C_n z^n,  C_0 = I, |z| < ρ,

where d_C < ∞ if and only if g(z) = 1, i.e. if and only if Π(z) is unimodular. It is well known (see e.g. Brockwell and Davis, 1987, page 408) that if Π(z) has stable roots, i.e. ρ > 1, then C(z) is holomorphic on a disk larger than the unit disk, and the moving average (MA) form

X_t = ∑_{n=0}^{∞} C_n ϵ_{t−n},  (5)

corresponds to a linear process with second moments. In order to define the structures of interest, consider a real matrix function A(z) = ∑_{n=0}^{d_A} A_n z^n of degree d_A ≤ ∞, where A = Π, G, C. Consider any real vector ϕ ≠ 0 such that

ϕ′A_i = 0,  i > c_ϕ,  (6)

and such that (6) does not hold when replacing > with ≥. Define d_γ := min_{ϕ∈ℝ^p\{0}} c_ϕ and let A be the set of all real vectors ϕ that satisfy (6) with c_ϕ = d_γ; we observe that A is a linear space. We say that A(z) admits a 'maximal degree reduction' from d_A to d_γ iff d_γ < d_A. We call r := dim A > 0 the 'rank of the maximal degree reduction'; we let γ_0 indicate a basis of A and refer to it as a 'basis of the maximal degree reduction'. We observe that the requirement that (6) does not hold when replacing > with ≥ implies that γ_0′A_{d_γ} has full row rank. Next we apply the maximal degree reduction to the AR, FE and MA representations in (1), (4) and (5), hence defining the notions of serial correlation common features (CS), commonality in the FE representation (CE) and co-dependence (CD).

Definition 2.2 (CS, CE, and CD). We say that X_t ∈ CS(d_γ), 0 ≤ d_γ < d_Π, if and only if Π(z) admits a maximal degree reduction from d_Π to d_γ with basis γ_0 ∈ ℝ^{p×r}; this implies

γ_0′X_t + γ_0′Π_1 X_{t−1} + ⋯ + γ_0′Π_{d_γ} X_{t−d_γ} = γ_0′ϵ_t,

where γ_0′Π_{d_γ} has full row rank. We say that X_t ∈ CE(d_γ), 0 ≤ d_γ < d_G, if and only if G(z) admits a maximal degree reduction from d_G to d_γ with basis γ_0 ∈ ℝ^{p×r}; this implies

g(L)γ_0′X_t = γ_0′ϵ_t + γ_0′G_1 ϵ_{t−1} + ⋯ + γ_0′G_{d_γ} ϵ_{t−d_γ},

where γ_0′G_{d_γ} has full row rank. We say that X_t ∈ CD(d_γ), 0 ≤ d_γ < d_C, if and only if C(z) admits a maximal degree reduction from d_C to d_γ with basis γ_0 ∈ ℝ^{p×r}; this implies

γ_0′X_t = γ_0′ϵ_t + γ_0′C_1 ϵ_{t−1} + ⋯ + γ_0′C_{d_γ} ϵ_{t−d_γ},

where γ_0′C_{d_γ} has full row rank. Note that col γ_0 and d_γ are uniquely identified.

Definition 2.2 encompasses several well-known cases: serial correlation common features as introduced in Engle and Kozicki (1993) correspond to the case CS(0) = CD(0). CD(d_γ) was introduced in Gourieroux and Peaucelle (1988), who considered finite-order moving averages with d_C < ∞. CD(1) structures were studied in Vahid and Engle (1997); see also the scalar component models in Tiao and Tsay (1989). In Section 7 we map I(1) and I(2) systems into a stationary VAR form; thanks to these results, one finds that special cases of CS (after an appropriate transformation of the system) are given by the following notions: polynomial serial correlation common features, defined in Cubadda and Hecq (2001); weak and strong form reduced rank structures, see Hecq et al. (2006); unpredictable combinations, defined in Paruolo (2003, 2006). Finally a special case of CE is the factor structure in the adjoint, see Cubadda et al. (2009).

3. Motivating example

In this section we present examples which show that the three notions given in Definition 2.2 differ. Consider two VAR(2) processes, indicated with superscripts µ and ν. For ℓ = µ, ν, let 1_{ℓ,µ} := 1 if ℓ = µ and 1_{ℓ,µ} := 0 if ℓ = ν; let also Π^ℓ(L)X_t^ℓ = ϵ_t, where
Π^ℓ(z) := [1 0; 0 1] + (1/3)[3 4; −6 −5+1_{ℓ,µ}] z − (1/2)[1 1; 1 1] z²,

and note that the column spaces of the VAR coefficients of the two processes are the same. Because γ_CS := (1 : −1)′ is such that γ_CS′Π_2^ℓ = 0 and γ_CS′Π_1^ℓ has full row rank for ℓ = µ, ν, one has X_t^ℓ ∈ CS(1), ℓ = µ, ν. Computing the determinant and the adjoint of Π^ℓ(z), one finds
g^µ(z) = −(1/6)(z³ − 2z² + 2z − 6),  g^ν(z) = −(1/3)(2z − 3)

and

G^ℓ(z) = [1 0; 0 1] + (1/3)[−5+1_{ℓ,µ} −4; 6 3] z − (1/2)[1 −1; −1 1] z².

Because γ_CE := (1 : 1)′ is such that γ_CE′G_2^ℓ = 0 and γ_CE′G_1^ℓ has full row rank for ℓ = µ, ν, one has X_t^ℓ ∈ CE(1), ℓ = µ, ν. This also shows that the common feature vectors γ_CS, γ_CE are the same for the two processes. However γ_CS ≠ γ_CE, which shows that the CS and CE notions differ.

Next we compute the MA representations X_t^ℓ = ∑_{n=0}^{∞} C_n^ℓ ϵ_{t−n}; one finds C_0^µ = C_0^ν = I,

C_1^ℓ = (1/3)[−3 −4; 6 5−1_{ℓ,µ}],

and C_n^µ is non-singular for n = 2, 3, …, while

C_n^ν = c_n [−7; 11](3 : 1),  where c_n := (1/18)(2/3)^{n−2}.

Because C_n^µ is non-singular, there is no vector γ_0 ≠ 0 such that γ_0′C_n^µ = 0 and hence X_t^µ ∉ CD(d_γ) for any d_γ. However, γ_CD := (11 : 7)′ is such that γ_CD′C_n^ν = 0 for n = 2, 3, … and γ_CD′C_1^ν has full row rank, i.e. X_t^ν ∈ CD(1). This shows that the notion of CD differs from the ones of CS and CE. Given that the column spaces of the VAR coefficients of the two processes µ and ν are the same, this example also shows that these column spaces do not fully determine the type of common cyclical features of a process. In the following we show that the index m_0 in Eq. (7) is the key to understanding the difference between the cyclical features of the two processes.

4. Existence of CS, CE and CD linear combinations

In this section we present existence results for CS, CE and CD linear combinations. They are contained in Theorem 4.2, which links the degrees of the polynomials Π(z), G(z) and g(z) with the existence of CS, CE and CD linear combinations. A central role is played by the index

m_0 := d_Π + d_G − d_g,  (7)

where d_γ indicates the degree of the polynomial γ(z). In Theorem 4.1 it is shown that m_0 is equal to the order of the pole at z = 0 of the reversed² AR polynomial Π̌(z), defined, for any matrix polynomial A(z), as

Ǎ(z) := ∑_{n=0}^{d_A} A_{d_A−n} z^n = z^{d_A} A(z^{−1}).  (8)

² The polynomial Π̌(z) is often used to describe the stability of VAR systems (see e.g. Fuller, 1996, page 77).

Eq. (8) involves the transformation z → 1/z, which is useful to study the behavior of functions at ∞, see Greene and Krantz (1997, Section 4.7). In particular one can show (see the proof of Theorem 4.1) that the roots of det Π̌(z) include w_u := 1/z_u, the reciprocals of the characteristic roots z_u, which satisfy z_u ≠ 0; hence all w_u := 1/z_u are finite. Moreover, when Π̌(0) = Π_{d_Π} is singular, w_0 := 0 is an additional root of det Π̌(z). In fact, the behavior of Π̌(z) near w_0 := 0 reflects the behavior of Π(z) for |z| → ∞, as Π_{d_Π} is the leading coefficient in Π(z). Points of rank deficiency of Π̌(z) correspond to poles of inv Π̌(z); in particular one finds that m_0 is the order of the pole of inv Π̌(z) at w_0 := 0.

Theorem 4.1. Let the ˇ operator be as in (8), w_0 := 0 and w_u := 1/z_u, u = 1, …, q; then

inv Π̌(z) = z^{−m_0} Ǧ(z)/ǧ(z),  z ∈ ℂ \ {w_0, …, w_q},  (9)

where m_0 is as in (7), so that inv Π̌(z) has poles of order m_u at z = w_u, u = 0, …, q.

Note the presence of the additional pole at w_0 in (9) which is absent from (3). This shows the advantage of using the reversed polynomial, which reveals all the points of rank-deficiency of Π(z), finite or at ∞. Next we formulate the existence results for CS, CE and CD structures in terms of the index m_0.

Theorem 4.2 (Existence of CS, CE, CD structures). Let m_0 be as in (7); then
(i) the following statements are equivalent: (i.1) Π_{d_Π} is singular; (i.2) m_0 > 0; (i.3) X_t ∈ CS(d_γ) for some d_γ such that max(0, d_Π − m_0) ≤ d_γ ≤ d_Π − 1; (i.4) X_t ∈ CE(d_γ) for some d_γ such that max(0, d_G − m_0) ≤ d_γ ≤ d_G − 1.
(ii) if X_t ∈ CD(d_γ) then m_0 ≥ d_Π, 0 ≤ d_γ ≤ m_0 − d_Π and (i.j) holds for any j = 1, …, 4;
(iii) the statement (i.j) for some j = 1, …, 4 does not imply X_t ∈ CD(d_γ).

Several remarks are in order.

Remark 1. Theorem 4.2 states that a CS (or a CE) structure of some degree exists whenever m_0 > 0, i.e. when the last coefficient matrix of Π(z) is singular. Moreover, CS and CE structures always coexist, and this gives a converse to Cubadda et al. (2009, Proposition 8), who show that CS implies CE.

Remark 2. Cubadda et al. (2009) observed that in empirical studies the degree of the AR part d_g in the FE is usually found to be less than the maximum degree p d_Π implied by the underlying VAR. The index m_0 is related to the degree reduction p d_Π − d_g, as detailed in the following. Suppose that G(z) = adj Π(z) and g(z) = det Π(z), i.e. that the reduced adjoint and the minimal polynomial coincide with the adjoint and the determinant of Π(z) (no common factors case); then one finds d_g = p d_Π − m_0. In fact the points of rank-deficiency of Π̌(z) are all finite and hence p d_Π = deg det Π̌(z) = deg z^{m_0} ǧ(z) = m_0 + d_g; moreover d_G = m_0 + d_g − d_Π = (p − 1)d_Π. When there are common factors, one finds d_g = p d_Π − m_0 − s and d_G = (p − 1)d_Π − s for some s > 0 associated with the presence of common factors, so that m_0 accounts for the part of the gap which is not due to common factors.

Remark 3. Theorem 4.2(ii) shows that a CD structure of some degree exists only if m_0 ≥ d_Π; because d_Π > 0, this implies that if a CD structure exists, then CS (and CE) structures also exist by Theorem 4.2(i). The converse does not hold.

Remark 4.
The difference dΠ − m0 = dg − dG provides a lower bound for dγ in CS(dγ ), and it reveals the highest reduction that can be achieved by a CS relation. When this difference is negative and if Xt ∈ CD(dγ ), then m0 − dΠ = dG − dg provides an upper bound for dγ , i.e. 0 ≤ dγ ≤ m0 − dΠ , and it reveals the highest order of a CD relation. In the next section we translate statements about m0 into statements about the coefficients of the VAR.
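The existence conditions of Theorem 4.2 can also be checked symbolically on the example of Section 3. Below is a minimal sympy sketch (the helper names `m0_index` and `Pi_ell` are ours, and the matrices are those of the worked example, Π^ℓ(z) = I + (1/3)[3 4; −6 −5+1_{ℓ,µ}]z − (1/2)[1 1; 1 1]z²): it forms det Π(z) and adj Π(z), cancels their common factors to obtain g(z) and G(z), and returns m_0 = d_Π + d_G − d_g as in (7).

```python
import sympy as sp

z = sp.symbols('z')

def deg(p):
    return sp.Poly(sp.expand(p), z).degree()

def m0_index(Pi):
    """m0 := d_Pi + d_G - d_g (Eq. (7)): degrees of the AR polynomial,
    its reduced adjoint G(z) and its minimal polynomial g(z)."""
    det = sp.expand(Pi.det())
    adj = Pi.adjugate().applyfunc(sp.expand)
    # cancel the factors common to det Pi(z) and every entry of adj Pi(z)
    common = sp.gcd_list(list(adj) + [det])
    g = sp.cancel(det / common)
    G = adj.applyfunc(lambda e: sp.cancel(e / common))
    d_Pi = max(deg(e) for e in Pi)
    d_G = max(deg(e) for e in G)
    return d_Pi + d_G - deg(g)

def Pi_ell(ind):
    # ind = 1 gives process mu, ind = 0 gives process nu
    return (sp.eye(2)
            + sp.Rational(1, 3) * sp.Matrix([[3, 4], [-6, -5 + ind]]) * z
            - sp.Rational(1, 2) * sp.Matrix([[1, 1], [1, 1]]) * z**2)

print(m0_index(Pi_ell(1)), m0_index(Pi_ell(0)))  # mu, nu
```

This yields m_0 = 1 for process µ (positive, so CS and CE structures exist, but smaller than d_Π = 2, so a CD structure is ruled out by Theorem 4.2(ii)) and m_0 = 3 for process ν (so a CD(d_γ) structure with d_γ ≤ m_0 − d_Π = 1 is admissible, consistent with X_t^ν ∈ CD(1)).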
5. Characterization of CS, CE and CD linear combinations

In this section we characterize CS, CE and CD linear combinations; the main results are contained in Theorem 5.3 for CS structures, Theorem 5.4 for CE structures, Theorem 5.5 for CD structures. Necessary and sufficient conditions are stated for each form of common cyclical features in terms of linear subspaces, associated with the orders m0, m1, …, mq of the poles of inv Π̌(z), see (9). The relevant subspaces are found through a 'local rank factorization' of the reversed AR polynomial at wu, which is described in Definition 5.1 and is based on the recursive algorithm developed in Franchi (2010) and further analyzed in Franchi and Paruolo (2010). This procedure consists in a sequence of m + 1 rank factorizations of the matrices in (10) and (11). Recall that for any full column rank matrix γ, we let γ̄ := γ(γ′γ)^{−1} so that Pγ := γ̄γ′ denotes the orthogonal projector matrix onto col γ and Mγ := I − Pγ the orthogonal projector matrix onto col⊥ γ. In what follows the matrices αj, βj are p × rj, rj ≤ p, of full column rank; when rj = 0, for simplicity of notation we define αj := ᾱj := βj := β̄j := 0, the zero vector.

Definition 5.1 (Local Rank Factorization of A(z) at z = w). Fix w ∈ C and let A(z) = ∑_{n=0}^{dA} An (z − w)^n be a square matrix polynomial such that inv A(z) has a pole of order m at z = w. Define α0 and β0 from the matrix rank factorization

A0 = −α0 β0′. (10)

For j = 1, …, m, let aj := (α0 : ··· : αj−1), bj := (β0 : ··· : βj−1), where if ri = 0 for some i, 0 ≤ i ≤ j − 1, then aj (respectively bj) does not contain the corresponding αi := 0 (respectively βi := 0); define αj and βj from the matrix rank factorization

M_{aj} A_{j,1} M_{bj} = −αj βj′, (11)

where A_{j,k} is defined for j, k ≥ 1 from the recursions

A_{j,k} := A_{j−1,k+1} + A_{j−1,1} ∑_{n=0}^{j−2} β̄n ᾱn′ A_{n+1,k} (12)

with initial values A_{0,k} := A_{k−1}, with A_{dA+h} := 0 for h > 0.

The local rank factorization in Definition 5.1 gives a characterization of the set of reduced rank restrictions that are satisfied by the coefficients of a matrix polynomial whose inverse function has a pole of given order at a specific point, see Franchi and Paruolo (2010) for the proof. That is, if A(z) and its derivatives at z = w satisfy those conditions, then inv A(z) has a pole of order m at the same point; the converse is also true, i.e. if inv A(z) has a pole of order m at z = w then A(z) and its derivatives at z = w satisfy the rank restrictions of the local rank factorization at that point. The following additional remarks are in order:

Remark 5. Eqs. (10) and (11) define αj, βj up to a conformable change of bases of the row and column spaces; this indeterminacy does not affect the results, in the sense that the latter do not depend on the particular choice of the pair αj, βj.

Remark 6. The square matrices (α0 : ··· : αm) and (β0 : ··· : βm) are non-singular with orthogonal blocks, i.e. αj′αk = βj′βk = 0 for j ≠ k. To simplify notation, we let aj := (α0 : ··· : αj−1), aj⊥ := (αj : ··· : αm). Similarly we define bj, bj⊥ in terms of βj blocks.

Remark 7. The conditions (11) are reduced-rank conditions for j = 1, …, m − 1, while the terminal condition for j = m is a full-rank condition. The matrices M_{aj} = I − P_{aj}, M_{bj} = I − P_{bj} successively eliminate subspaces until the terminal condition of full rank is met. Note that for ℓ, n ≥ j, ᾱℓ′M_{aj} = ᾱℓ′ − ᾱℓ′P_{aj} = ᾱℓ′ and M_{bj}β̄n = β̄n − P_{bj}β̄n = β̄n so that using (11) one has ᾱℓ′A_{j,1}β̄n = ᾱℓ′M_{aj}A_{j,1}M_{bj}β̄n = −ᾱℓ′αjβj′β̄n, or

ā_{j⊥}′ A_{j,1} b̄_{j⊥} = [ −I_{rj}  0 ; 0  0 ], (13)

where rj := rank M_{aj}A_{j,1}M_{bj}.

Remark 8. The local rank factorizations of Π(z) at z = 1 for m = 1, 2 were derived by Johansen (1992), see also (22) and (23); they are called respectively the 'I(1) conditions' and the 'I(2) conditions'.

Remark 9. There is a duality between the local rank factorization of A(z) and that of its reduced adjoint, here indicated as B(z). That is, let αj, βj and ξj, ηj be respectively defined by the local rank factorizations of A(z) and B(z) at z = w; then for j = 0, …, m, one can choose

ξj = κ β̄_{m−j} and ηj = ᾱ_{m−j}, (14)

where κ := −h(w) and det A(z) =: (z − w)^a h(z), a > 0, h(w) ≠ 0, see Franchi and Paruolo (2010) for the proof. This can be seen as a generalization of Proposition 8 in Cubadda et al. (2009) about the presence of a factor structure in the adjoint under CS.

Finally we introduce a last piece of notation.

Definition 5.2 (Matrices Ã_{i,λ}, A_{i,j,k}). Let m, αj, βj, and A_{j,k} be as in Definition 5.1 and for 0 ≤ λ ≤ m − 1 and 1 ≤ i ≤ m − λ, define Ã_{i,λ} := ∑_{j=0}^{λ} β̄j ᾱj′ A_{i,j+1,1}, where A_{1,j,k} := A_{j,k} and A_{i,j,k} is defined from the recursions

A_{i+1,j,k} := A_{i,j,k+1} + A_{i,j,1} ∑_{h=0}^{λ+i} β̄h ᾱh′ A_{h+1,k}. (15)

In the following we compute the local rank factorization of Π̌(z) at wu and we write a_{u,j} := (α_{u,0} : ··· : α_{u,j−1}), b_{u,j} := (β_{u,0} : ··· : β_{u,j−1}), Π^{(u)}_{j,k}, Π^{(u)}_{i,j,k}, and Π̃^{(u)}_{i,λ}, so that the point at which the local rank factorization is conducted is referenced explicitly. We are now in the position to give a characterization of the structures of interest, starting from CS structures.
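The first two steps (j = 0, 1) of the local rank factorization in Definition 5.1 can be sketched numerically. The helper functions and the toy reversed polynomial below are our own illustrative choices (the SVD is one convenient way to obtain a rank factorization; it is not prescribed by the paper):

```python
import numpy as np

def rank_factorization(M, tol=1e-10):
    """Rank-factorize M = -alpha @ beta.T via the SVD (step (10))."""
    U, s, Vt = np.linalg.svd(M)
    r = int(np.sum(s > tol))
    alpha = -U[:, :r] * np.sqrt(s[:r])       # p x r, full column rank
    beta = Vt[:r, :].T * np.sqrt(s[:r])      # p x r, full column rank
    return alpha, beta, r

def proj_complement(X):
    """M_X = I - X (X'X)^{-1} X', the orthogonal projector onto col_perp(X)."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

# Toy reversed polynomial A(z) = A0 + A1 z at w = 0 (illustrative only),
# chosen so that inv A(z) has a simple pole at 0: A0 has rank 1.
A0 = -np.array([[0.5, 0.5], [0.25, 0.25]])
A1 = np.eye(2)

alpha0, beta0, r0 = rank_factorization(A0)   # j = 0: A0 = -alpha0 beta0'
A11 = A1                                     # recursion (12): A_{1,1} = A_1
core = proj_complement(alpha0) @ A11 @ proj_complement(beta0)  # condition (11)
r1 = np.linalg.matrix_rank(core, tol=1e-10)

# r0 + r1 = p: the terminal full-rank condition holds at j = m = 1,
# so inv A(z) has a pole of order 1 at z = 0.
print(r0, r1)
```

The terminal check at j = m is exactly Remark 7's full-rank condition on the projected matrix.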
Theorem 5.3 (Characterization of CS Structures). Consider the AR polynomial Π(z) = ∑_{n=0}^{dΠ} Πn z^n and let a_{0,j}, b_{0,j} be defined by the local rank factorization of Π̌(z) at w0 := 0 and let s := dΠ − dγ, where max(0, dΠ − m0) ≤ dγ ≤ dΠ − 1; for s > 1 define the orthogonality conditions

γ0′(Πdγ+1 b_{0,s−1} : ··· : ΠdΠ−1 b_{0,1}) = 0. (16)
Then the following statements are equivalent: (i) Xt ∈ CS(dγ), which implies γ0′Πdγ has full rank and γ0′(Πdγ+1 : ··· : ΠdΠ) = 0; (ii) γ0 = a_{0,s⊥}ϕ0, γ0′Πdγ has full rank and, when s > 1, (16) holds.

Theorem 5.3 gives the equivalent characterization (ii) of the CS structure defined in (i). In (i) one sees that γ0 is a basis of the orthogonal complement of col(Πdγ+1 : ··· : ΠdΠ) and satisfies a terminal full rank condition, namely γ0′Πdγ of full rank. The orthogonality conditions can also be expressed as col γ0 ⊂ ∩_{i=dγ+1}^{dΠ} col⊥ Πi, which is implied by the nested reduced rank specification of Ahn and Reinsel (1988) and by the index models of Reinsel (1983). We observe here that (ii) does not imply col Πj ⊃ col Πj+1, which corresponds to nested reduced rank specifications, or Π(z) = I + α(z)β′(z) in an obvious notation,
which characterizes index models. Hence IM and RR imply CS and not vice versa. Theorem 5.3(ii) gives a characterization of γ0 in terms of the coefficients of the local rank factorization of Π̌(z) at 0; first it shows that γ0 belongs to the space spanned by the columns of a_{0,s⊥} := (α_{0,s} : ··· : α_{0,m0}) where s := dΠ − dγ. This space ranges between the following polar cases: when dγ = dΠ − 1, col γ0 ⊂ col(α_{0,1} : ··· : α_{0,m0}), while when dγ = dΠ − m0 > 0 one has γ0 = α_{0,m0}ϕ0. Thus the smaller dγ in CS(dγ), the smaller is the linear space in which γ0 can be chosen. The condition that γ0 belongs to the given linear space is, however, only necessary and not sufficient in order to obtain CS; the additional condition a_{0,s⊥}′(Πdγ+1 b_{0,s−1} : ··· : ΠdΠ−1 b_{0,1}) =: A (say) of reduced rank is needed. This determines ϕ0 as a basis of col⊥ A and completes the characterization of γ0 if the full rank condition on γ0′Πdγ is met. As a final remark, we observe that Theorem 5.3(ii) involves exclusively coefficients defined by the local rank factorization of Π̌(z) at 0. The case of CE structures is considered in the next theorem.

Theorem 5.4 (Characterization of CE Structures). Consider the AR polynomial Π(z) = ∑_{n=0}^{dΠ} Πn z^n and its reduced adjoint G(z) = ∑_{n=0}^{dG} Gn z^n defined in (2). Let a_{0,j}, b_{0,j} be defined by the local rank factorization of Π̌(z) at w0 := 0, let s := m0 − dG + dγ + 1, where max(0, dG − m0) ≤ dγ ≤ dG − 1, and let H_{0,n} := (β_{0,n−1} : Π̃^{(0)}_{1,n−1} β_{0,n} : ··· : Π̃^{(0)}_{m0−n+1,n−1} β_{0,m0}), where Π̃^{(0)}_{i,j} is obtained applying Definition 5.2 to Π̌(z). Further, for s ≤ m0 − 1 define the orthogonality conditions

γ0′(Gdγ+1 a_{0,s+1⊥} : ··· : GdG−1 a_{0,m0⊥}) = 0, (17)

γ0′(Π̃^{(0)}_{1,s−1} b_{0,s+1⊥} : ··· : Π̃^{(0)}_{m0−s,s−1} b_{0,m0⊥}) = 0. (18)
Then the following statements are equivalent: (i) Xt ∈ CE(dγ), which implies γ0′Gdγ has full rank and γ0′(Gdγ+1 : ··· : GdG) = 0; (ii) γ0 = b_{0,s}ϕ0 together with either of the following equivalent conditions: (ii.1) γ0′Gdγ has full rank and, when s < m0, (17) holds; (ii.2) γ0′H_{0,s} has full rank and, when s < m0, (18) holds.

Theorem 5.4 is the dual of Theorem 5.3 for CE structures and its interpretation is exactly as above, with Π·, a_{0,·⊥} and b_{0,·} in Theorem 5.3(ii), respectively replaced by G·, b_{0,·} and a_{0,·⊥} in Theorem 5.4(ii.1); this is a consequence of the duality result in Remark 9 above, Eq. (14). Theorem 5.4(ii) states that γ0 belongs to the space spanned by the columns of b_{0,s} := (β_{0,0} : ··· : β_{0,s−1}) where s := m0 − dG + dγ + 1. Similarly to the CS case, shorter CE structures correspond to smaller linear subspaces; the polar cases are given by γ0 = (β_{0,0} : ··· : β_{0,m0−1})ϕ0 when dγ = dG − 1 and by γ0 = β_{0,0}ϕ0 when dγ = dG − m0 > 0. As in the CS case, Theorem 5.4(ii.1) also states the additional condition b_{0,s}′(Gdγ+1 a_{0,s+1⊥} : ··· : GdG−1 a_{0,m0⊥}) =: A
(say) of reduced rank. This determines ϕ0 as a basis of col⊥ A and completes the characterization of γ0 if the full rank condition on γ0′Gdγ is met. The conditions in Theorem 5.4(ii.1) are stated in terms of the coefficients of the reduced adjoint G(z), while in Theorem 5.4(ii.2) we give an equivalent characterization in terms of the coefficients defined by the local rank factorization of Π̌(z) at w0 = 0. The latter characterization involves the Π̃^{(0)}_{s,ℓ} matrices introduced in Definition 5.2. Also these conditions involve some reduced rank conditions and a full rank condition; in particular b_{0,s}′(Π̃^{(0)}_{1,s−1} b_{0,s+1⊥} : ··· : Π̃^{(0)}_{m0−s,s−1} b_{0,m0⊥}) =: A (say) must be of reduced rank and ϕ0 is a basis of col⊥ A. This completes the characterization of γ0 if the full rank condition on γ0′H_{0,s} is met. As in the CS case, Theorem 5.4(ii.2) involves exclusively coefficients defined by the local rank factorization of Π̌(z) at 0.

Finally we turn to CD structures; we let s0 := m0 − dG + k + 1, where dg ≤ k ≤ dG, so that dΠ + 1 ≤ s0 ≤ m0 + 1.

Theorem 5.5 (Characterization of CD Structures). Consider the AR polynomial Π(z) = ∑_{n=0}^{dΠ} Πn z^n and its reduced adjoint G(z) defined in (2) with representations G(z) = ∑_{n=0}^{dG} G_{u,n} (z − zu)^n. For u = 0, 1, …, q, let a_{u,j}, b_{u,j} be defined by the local rank factorization of Π̌(z) at wu. Let s0 := m0 − dG + k + 1, where dg ≤ k ≤ dG; let also su := 1 for u = 1, …, q, and let H_{0,n} be defined as in Theorem 5.4. Further, for su ≤ mu − 1 define the orthogonality conditions

γ0′(G_{u,dG−mu+su} a_{u,su+1⊥} : ··· : G_{u,dG−1} a_{u,mu⊥}) = 0, (19)

γ0′(Π̃^{(u)}_{1,su−1} b_{u,su+1⊥} : ··· : Π̃^{(u)}_{mu−su,su−1} b_{u,mu⊥}) = 0. (20)
Then, the following statements are equivalent: (i) Xt ∈ CD(k − dg); (ii) γ0′G_{0,k} has full rank and γ0′(G_{u,dG−mu+su} : ··· : G_{u,dG}) = 0 for u = 0, 1, …, q; (iii) γ0 = b_{u,su}ϕu for u = 0, 1, …, q, together with either of the following equivalent conditions: (iii.1) if s0 < m0 + 1, γ0′G_{0,k} has full rank and, for all u = 0, 1, …, q such that su < mu, (19) holds; (iii.2) if s0 < m0 + 1, γ0′H_{0,s0} has full rank and, for all u = 0, 1, …, q such that su < mu, (20) holds.

Theorem 5.5 has the same structure as Theorems 5.3 and 5.4; two equivalent formulations of the CD structures are given. However, there is an important difference with the previous cases: the fact that γ0′Xt = γ′(L)ϵt, where γ′(z) := γ0′ inv Π(z) is a matrix polynomial, implies that γ0 must cancel the principal part of inv Π(z), i.e. the part of inv Π(z) that contains the poles. Recall, see Lemma 2.1, that inv Π(z) has poles at the characteristic roots and let ζu ≠ 0 be a real vector such that ζu′ inv Π(z) has no pole at z = zu but still poles at z = zw, w ≠ u; it then follows that there exists a γ0 that cancels all the poles only if ∩_{u=1}^{q} col ζu ≠ {0} and γ0 belongs to this intersection subspace. This highlights the fact that CD structures involve much stronger requirements on Π(z) than CS, CE. Because su = 1 for u ≠ 0, one has that γ0 must have the representation γ0 = β_{u,0}ϕu, u ≠ 0, which does not vary with dγ := k − dg. This implies that the length of a CD structure restricts γ0 only at w0 = 0 through γ0 := b_{0,s0}ϕ0 where s0 := m0 − dG + k + 1 and dg ≤ k ≤ dG. As in the CS, CE cases, shorter CD structures are found in a smaller subspace; it is interesting to note that when k = dG, i.e. s0 = m0 + 1 and dγ = dG − dg = m0 − dΠ, one has γ0 = (β_{0,0} : ··· : β_{0,m0})ϕ0; because col(β_{0,0} : ··· : β_{0,m0}) = R^p, this means that the only restrictions are the ones at the characteristic roots wu, u ≠ 0.
In the other cases one has γ0 = (β_{0,0} : ··· : β_{0,s0−1})ϕ0 up to the polar case k = dg, i.e. dγ = 0, in which γ0 = (β_{0,0} : ··· : β_{0,dΠ})ϕ0. As above, the full rank condition on γ0′G_{0,k} completes the characterization of γ0. In Theorem 5.5(iii.2) we translate the conditions on left null spaces of the G coefficients into their counterpart in terms of functions of the AR coefficients. As in Theorem 5.4(ii.2), the conditions are given by a reduced rank and a full rank condition which involve blocks of Π̃^{(u)}_{i,j}, u = 0, …, q, introduced in Definition 5.2. The reduced rank condition on b_{u,su}′(Π̃^{(u)}_{1,su−1} b_{u,su+1⊥} : ··· : Π̃^{(u)}_{mu−su,su−1} b_{u,mu⊥}) =: Au (say) implies that ϕu must be a basis of ∩_{u=0}^{q} col⊥ Au and this completes the characterization of γ0 if the full rank condition on γ0′H_{0,s0} is met. As in the CS, CE cases we observe that only the local rank factorization of Π̌(z) at w0 is responsible for the length of the CD structure, while the presence of such a structure is related to the local rank factorization of Π̌(z) at wu, u ≠ 0.
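The pole-cancellation requirement behind CD structures can be seen in a small example of our own (not one of the paper's processes): a diagonal Π(z) whose inverse has simple poles at z = 1 and z = 2. A left vector ζ removes the pole at z = 1, and a CD-type γ0 would have to lie in the intersection of all such cancelling spaces:

```python
import sympy as sp

z = sp.symbols('z')

# Toy polynomial (illustrative only): inv Pi(z) has simple poles at
# z = 1 (first diagonal entry) and z = 2 (second diagonal entry).
Pi = sp.Matrix([[1 - z, 0],
                [0, 1 - z / 2]])
invPi = Pi.inv()

zeta = sp.Matrix([[0, 1]])   # zeta' inv Pi(z) has no pole at z = 1
row = zeta * invPi           # = (0 : 1/(1 - z/2))

# both entries are finite at z = 1, so the pole there is cancelled
vals = [sp.limit(row[i], z, 1) for i in range(2)]
print(vals)                  # [0, 2]
```

Here only (1, 0)′ cancels the pole at z = 2 and only (0, 1)′ the one at z = 1, so the intersection of the cancelling spaces is {0} and no single γ0 removes the whole principal part, illustrating why CD is so much more demanding than CS or CE.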
6. Continuation of the example

In this section we consider the processes introduced in Section 3 and show how one can use the conditions on the VAR polynomial stated in Theorems 4.2, 5.3–5.5 to establish properties of the processes without computing the FE and MA representations. In particular we show that for process µ, which is not in CD, one has

m0 = 1, γCS = α_{0,1}, γCE = β_{0,0},

while for process ν, which is in CD, one has

m0 = 3, γCS = α_{0,3}, γCE = β_{0,0}, γCD = β_{1,0},

where the index u = 1 in β_{1,0} refers to the characteristic root z1 = 3/2 in det Π(z). Here and in the following we omit the superscripts µ and ν for readability, whenever possible.

We first consider Theorem 4.2; because dΠℓ = dGℓ = 2, dgµ = 3, dgν = 1, using (7) one finds m0µ = 1 and m0ν = 3. Because m0 > 0 in Theorem 4.2(i.2) for ℓ = µ, ν, one hence has Xtℓ ∈ CS(dγ), CE(dγ), for some dγ such that max(0, djℓ − m0ℓ) ≤ dγ ≤ djℓ − 1, j = Π, G. For ℓ = µ, dj − m0 = 1 = dj − 1, j = Π, G, and hence Xtµ ∈ CS(1), CE(1). Moreover, as discussed in Remark 4 below Theorem 4.2, m0 ≥ dΠ is a necessary (but not sufficient) condition for CD; in the ℓ = µ case, 1 = m0 < dΠ = 2 and hence Xtµ ∉ CD. For ℓ = ν, one has dj − m0 = −1, j = Π, G, so that Xtν ∈ CS(dγ), CE(dγ), where 0 ≤ dγ ≤ 1. The necessary condition for CD m0 ≥ dΠ is satisfied and hence Xtν ∈ CD(dγ) is not ruled out. Because m0 − dΠ = 1, from Theorem 4.2(ii) we have that 0 ≤ dγ ≤ 1.

In order to use the characterization results in Theorems 5.3–5.5, we compute the local rank factorizations of Π̌ℓ(z) at 0, ℓ = µ, ν, see Definition 5.1. These give α_{0,0}µ = (1 : 1)′, β_{0,0}µ = (1/2)(1 : 1)′, α_{0,1}µ = (1 : −1)′, β_{0,1}µ = (1/2)(−1 : 1)′, and α_{0,0}ν = (1 : 1)′, β_{0,0}ν = (1/2)(1 : 1)′, α_{0,1}ν = α_{0,2}ν = β_{0,1}ν = β_{0,2}ν = 0, α_{0,3}ν = (1 : −1)′, β_{0,3}ν = (1/3)(−1 : 1)′.

We now illustrate Theorem 5.3 on CS structures. For ℓ = µ one has dγ = 1, see the results given above concerning Theorem 4.2, and Theorem 5.3(ii) gives s = 1, a_{0,1⊥} = α_{0,1} = (1 : −1)′, a_{0,1⊥}′Π1 = (3 : 8/3) ≠ 0 and condition (16) is not involved because s = 1. Hence condition 5.3(ii) is satisfied for γCS = α_{0,1} = (1 : −1)′; this agrees with the direct calculations given in Section 3. For ℓ = ν one has dγ = 0 or 1, see the results given above concerning Theorem 4.2. For dγ = 1 one has that the conditions in Theorem 5.3(ii) are s = 1, a_{0,1⊥} = α_{0,3} = (1 : −1)′, a_{0,1⊥}′Π1 = (3 : 3) ≠ 0 and condition (16) is not involved because s = 1. Hence condition 5.3(ii) is satisfied for γCS = α_{0,3} = (1 : −1)′, in accordance with the direct calculations given in Section 3. Thus Xtν ∈ CS(1). Setting instead dγ = 0 in Theorem 5.3(ii), one has s = 2 and the candidate linear combination is γ0 := a_{0,2⊥} = α_{0,3} = (1 : −1)′. Condition (16) would require γ0′Π1b_{0,1} = 0; however one finds γ0′Π1b_{0,1} = α_{0,3}′Π1β_{0,0} = 3 ≠ 0. Therefore it is not satisfied and we conclude that Xtν ∉ CS(0).

Next we consider Theorem 5.4 on CE structures for ℓ = µ. In (ii.2) one finds s = 1 = m0, γ0 := β_{0,0} = (1/2)(1 : 1)′ and γ0′H_{0,1} = (1/2 : ᾱ_{0,0}′Π1β_{0,1}) ≠ 0; finally condition (17) is not involved because s = m0. Hence Xtµ ∈ CE(1) with γCE = β_{0,0} = (1/2)(1 : 1)′, as obtained by direct calculation in Section 3. For ℓ = ν and dγ = 1 one finds s = 3 = m0, γ0 := β_{0,0} = (1/2)(1 : 1)′, and γ0′H_{0,3} = (0 : ᾱ_{0,0}′Π1β_{0,3}) = (0 : 1/9) ≠ 0; finally condition (17) is not involved because s = m0. Hence Xtν ∈ CE(1) with γCE = β_{0,0} = (1/2)(1 : 1)′, in line with what was obtained in Section 3. For dγ = 0 one finds s = 2 ≠ m0 = 3, and the candidate linear combination is γ0 := β_{0,0} = (1/2)(1 : 1)′. Condition (17) would require γ0′Π̃^{(0)}_{1,1}b_{0,m0⊥} = 0; however one finds γ0′Π1β_{0,3} = 1/2 ≠ 0 and it is not satisfied. This shows that Xtν ∉ CE(0).

Finally we illustrate the results in Theorem 5.5 on CD structures. From the results given above concerning Theorem 4.2, Xtµ ∉ CD so we only consider ℓ = ν and dγ = 0 or 1. Because det Π(z) = −(1/3)(2z − 3), the pole of order m1 = 1 at z1 = 3/2 is the only one to be canceled and the local rank factorization of Π̌(z) at z = 2/3 gives α_{1,0} = (−1/3 : 1)′, β_{1,0} = (1/6)(11 : 7)′, α_{1,1} = (3 : 1)′, β_{1,1} = (4/1275)(7 : −11)′. For dγ = 1 = k − dg, i.e. k = 2 = dG, Theorem 5.5(iii) gives γ0 := β_{1,0} = (1/6)(11 : 7)′ and condition (iii.1) only involves the full rank condition γ0′G_{0,1} = −(1/18)(13 : 23) ≠ 0 because s1 = 1 = m1.
Hence Xtν ∈ CD(1) with γCD = β_{1,0} = (1/6)(11 : 7)′, in line with what was obtained in Section 3. For dγ = 0, i.e. k = 1, condition (iii) would require the candidate γ0 to lie in the intersection of the spaces spanned by β_{1,0} = (1/6)(11 : 7)′ and b_{0,3} = β_{0,0} = (1/2)(1 : 1)′; however col β_{1,0} ∩ col β_{0,0} = {0}. This shows that Xtν ∉ CD(0). Note that all the results have been directly derived from the AR polynomial.

7. I(1) and I(2) systems

In this section we show how the results given for stationary VARs in Sections 4 and 5 can be directly extended to VAR systems with I(1) and I(2) variables. The main idea is that Johansen's representation theorems for I(1) and I(2) VAR systems (see Johansen, 1996, Chapter 4) imply that one can transform the original system variables into a stationary VAR process of the same dimension. The results of the previous sections then apply to the transformed system. Stationary transformations for I(1) VAR systems can also be found inter alia in Mellander et al. (1992). Here we state the existence of such transformations in Propositions 7.1 and 7.3 for ease of later reference. Corollaries 7.2 and 7.4 show how these transformations encompass CS(0) linear combinations in ∆Yt or ∆²Yt respectively, using the characterization of these relationships provided in Paruolo (2003, 2006). Consider first an I(1) VAR(dA) process A(L)Yt = ut with error correction representation
Γ(L)∆Yt = α0β0′Yt−1 + ut (21)

where ∆ := 1 − L, Γ(L) = I − ∑_{i=1}^{dΓ} Γi L^i, dA = dΓ + 1. The VAR polynomial A(z) = (1 − z)Γ(z) − α0β0′z has a local rank factorization (see Definition 5.1) around the point z1 = 1 which satisfies³
A0 = −α0β0′, M_{a1}A_{1,1}M_{b1} = −α1β1′ (22)
where (α0 : α1 ) and (β0 : β1 ) are square non-singular matrices with orthogonal blocks of dimension r0 , r1 = p − r0 . The conditions (22) have been shown to hold4 if and only if A(z ) is I(1) with r1 characteristic roots at z1 = 1, see Johansen (1996). Remark that the condition Ma1 A1,1 Mb1 = −α1 β1′ is a full rank condition, and corresponds to the requirement α¯ 1′ Γ β¯ 1 = Ip−r0 , i.e. rank(α1′ Γ β1 ) = p − r0 , see Johansen (1996, Eq. (4.5) p. 49), where Γ := Γ (1), A1,1 = −Γ − α0 β0′ . In the following proposition we present a transformation that eliminates the r1 roots at z1 = 1; the transformation is stated in terms of an arbitrary matrix c1 of the same dimensions as β1 such that c1′ β1 is square and of full rank.
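The two conditions in (22) can be verified mechanically for a small system; the sketch below uses toy matrices of our own choosing, with Γ(L) = I so that Γ := Γ(1) = I (none of these numbers come from the paper):

```python
import numpy as np

# Toy VECM Delta Y_t = alpha0 beta0' Y_{t-1} + u_t, p = 2, r0 = 1
# (illustrative numbers only).
alpha0 = np.array([[-0.5], [0.0]])
beta0 = np.array([[1.0], [-1.0]])

# orthogonal complements of alpha0 and beta0
alpha1 = np.array([[0.0], [1.0]])    # alpha0_perp
beta1 = np.array([[1.0], [1.0]])     # beta0_perp
Gamma = np.eye(2)                    # Gamma := Gamma(1)

# reduced rank at z = 1: A(1) = -alpha0 beta0' has rank r0 = 1 < p
A_at_1 = -alpha0 @ beta0.T
# full rank condition: rank(alpha1' Gamma beta1) = p - r0 = 1
cond = alpha1.T @ Gamma @ beta1

print(np.linalg.matrix_rank(A_at_1), cond)
```

When both checks pass, the toy system is I(1) with r1 = p − r0 unit roots, which is exactly the situation Proposition 7.1 removes by the transformation to Xt.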
³ In this section we simplify the notation α_{1,j} for z1 = 1 into αj and similarly for r_{1,j}, β_{1,j}.
⁴ Provided all characteristic roots are either at z1 = 1 or in the stationary region.
Proposition 7.1 (I(1) VAR). Let A(L)Yt = ut be a VAR(dA) of dimension p, which satisfies the conditions (22) and let c1 be defined as above; then the transformed system Xt := (Yt′β0 : ∆Yt′c1)′, still of dimension p × 1, follows a VAR process Π(L)Xt = εt with Π(0) = I, εt := (β0 : c1)′ut, and the characteristic roots of Π(z) are the same as the ones of A(z), except for the r1 characteristic roots of A(z) at z1 = 1 that are not characteristic roots of Π(z). Moreover dΠ = dA and ΠdΠ = (J0 : 0_{p×r1}), where we have partitioned the Πj matrices in blocks of columns conformable with the two components of Xt.

A special case of the transformation from Yt to Xt in Proposition 7.1 is the one with c1 = β1. In general the choice of c1 ≠ β1 allows more flexibility; an instance of the latter is given in the following corollary. We recall that Theorem 5 in Paruolo (2003) characterizes the CS(0) linear combinations of the type ψ′∆Yt when Yt is an I(1) VAR as ψ = α1ζ⊥ where α1′(I + A_{1,1})β1 = ζρ′, with ζ and ρ full column rank r1 × (r1 − s) matrices.

Corollary 7.2. The matrix c1 can be chosen as c1 = (ψ : ϑ) with ψ = α1ζ⊥ and ϑ such that c1 has full column rank, so that the CS(0) linear combinations ψ′∆Yt appear as (possibly a subset of) the CS(0) relations of Xt, where Xt := (Yt′β0 : ∆Yt′c1)′ is defined as in Proposition 7.1.

Note that when dΠ = 1 one has ψ = α1, i.e. all of α1 identifies CS relations in ∆Yt and one can choose c1 = α1. More generally, the given transformation from Yt to Xt allows one to embed any CS(0) linear combination ψ′∆Yt as a component in the vector Xt. This implies that all the results given in Sections 4 and 5 for stationary VARs apply to Xt as defined in Proposition 7.1 and Corollary 7.2.
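The transformation of Proposition 7.1 can be illustrated by simulation; all numbers below, including the choice c1 = β1, are our own toy assumptions. The levels Yt wander while the first component β0′Yt of Xt mean-reverts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the toy I(1) VECM Delta Y_t = alpha0 beta0' Y_{t-1} + u_t and
# form the transformed system X_t := (beta0' Y_t : c1' Delta Y_t)' of
# Proposition 7.1 with c1 = beta1 (illustrative numbers only).
alpha0 = np.array([[-0.5], [0.0]])
beta0 = np.array([[1.0], [-1.0]])
c1 = np.array([[1.0], [1.0]])        # c1 = beta1 = beta0_perp

T = 2000
Y = np.zeros((T, 2))
for t in range(1, T):
    dy = (alpha0 @ beta0.T @ Y[t - 1]).ravel() + rng.standard_normal(2)
    Y[t] = Y[t - 1] + dy

dY = np.diff(Y, axis=0)
X = np.column_stack([Y[1:] @ beta0, dY @ c1])   # X_t, same dimension p = 2

# beta0' Y_t follows a stationary AR(1) while Y_t drifts: compare variances
print(X.shape, np.var(X[:, 0]) < np.var(Y[:, 0]))
```

In this design β0′Yt satisfies β0′Yt = 0.5 β0′Yt−1 + β0′ut, so its sample variance stays bounded while that of the levels grows with T.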
Hence the existence and characterization results of this paper encompass all the cases defined previously in the literature; in particular this includes the CS(0) linear combinations ψ′∆Yt (as detailed above) as well as the weak form of serial correlation common feature relations ψ0′∆Yt − ψ1′α0β0′Yt−1 defined in Hecq et al. (2002), where the latter are represented as ϕ0′Xt + ϕ1′Xt−1 with ϕ0′ := ψ0′B^{−1} and ϕ1′ := −(ψ0′c0(β0′c0)^{−1} + ψ1′α0 : 0) and B^{−1} = (c0(β0′c0)^{−1} : β1(c1′β1)^{−1}) where c0 := c1⊥, see the proof of Proposition 7.1.

We next turn to I(2) systems A(L)Yt = ut, whose VAR polynomial A(z) is known to satisfy the conditions⁵

A0 = −α0β0′,
M_{a1}A_{1,1}M_{b1} = −α1β1′, M_{a2}A_{2,1}M_{b2} = −α2β2′, (23)
see Johansen (1996, Eqs. (4.25)–(4.27)), where (α0 : α1 : α2) and (β0 : β1 : β2) are square non-singular matrices with orthogonal blocks of dimension r0, r1, r2, p = ∑_{i=0}^{2} ri. The conditions (23) hold if and only if A(z) is I(2) with 2r2 + r1 characteristic roots at z1 = 1, see Johansen (1996). Similarly to the I(1) case, we consider a matrix c2 of the same dimensions as β2, such that the product c2′β2 is square and nonsingular. We also define µ′ := −ᾱ0′A_{1,1}β2(c2′β2)^{−1} and let τ := (β0 : τ1) be a basis of col(β0 : β1) = col(τ), where τ1 is not necessarily orthogonal to β0.
Proposition 7.3 (I(2) VAR). Let A(L)Yt = ut be a VAR(dA) of dimension p which satisfies the conditions (23) and let c2, µ, τ1 be defined as above; then the transformed system Xt := (Yt′β0 − ∆Yt′c2µ : ∆Yt′τ1 : ∆²Yt′c2)′, still of dimension p × 1, follows a VAR process Π(L)Xt = εt with Π(0) = I, εt := (β0 − c2µ : τ1 : c2)′ut, where the characteristic roots of Π(z) are the same as the ones of A(z), except for the 2r2 + r1 characteristic roots of A(z) at z1 = 1 that are not characteristic roots of Π(z). Moreover dΠ = dA, ΠdΠ = (J0 : 0_{p×(r1+r2)}) and J0µ = −F2, with ΠdΠ−1 =: (F0 : F1 : F2), where the Πj matrices are partitioned conformably with the three components of Xt.

Also in I(2) systems, one can consider CS(0) linear combinations of the type ψ′∆²Yt when Yt is an I(2) VAR and ψ is p × s. We recall that Theorem 6 in Paruolo (2006) characterizes them as ψ = α2ζ⊥ where α2′(I + A_{2,1})β2 = ζρ′, with ζ and ρ full column rank r2 × (r2 − s) matrices.

Corollary 7.4. The matrix c2 can be chosen as c2 = (ψ : ϑ) with ψ = α2ζ⊥ and ϑ such that c2 has full column rank, so that the CS(0) linear combinations ψ′∆²Yt appear as (possibly a subset of) the CS(0) relations of Xt, where Xt := (Yt′β0 − ∆Yt′c2µ : ∆Yt′τ1 : ∆²Yt′c2)′ is defined as in Proposition 7.3.

We note that ΠdΠ in Propositions 7.1 and 7.3 is singular, and hence Theorem 4.2(i.1) implies that all I(1) and I(2) VAR processes present at least CS and CE structures; obviously, all remaining conditions should be further checked, possibly revealing the presence of CD structures and/or more stringent CS and CE structures.

8. Conclusion

The present paper characterizes the restrictions on the VAR coefficients that correspond one-to-one to the presence of common cyclical features of the CS, CE and CD type. These characterizations are associated with the local rank factorizations of the reversed AR polynomial around its characteristic roots. The given characterizations also apply to stationary representations of VARs with I(1) and I(2) variables. These conditions extend and complement the ones already available in the literature. The approach adopted here is an algebraic one, based on the properties of a matrix polynomial and its adjoint and determinant.

⁵ Here and in the following we maintain the assumption dA ≥ 2; see Paruolo (2004, 2005) for the I(2) VAR(1) case.
These results may be of interest in other contexts involving the inversion of matrix polynomials.

Acknowledgements

We would like to thank Franz Palm, two anonymous referees and the following people for helpful comments and discussions on previous versions of the paper: Gianluca Cubadda, Stéphane Gregoir, Niels Haldrup, Alain Hecq, Marco Lippi, Roman Liška, Bent Nielsen, Hashem Pesaran, Søren Johansen, Anders Rahbek, Martin Wagner and participants of the conference 'Factor structures for panel and multivariate time series data' held in Maastricht, 18–20 September 2008. This paper was written while the first author was Carlo Giannini Fellow 2007–2009. Partial financial support is acknowledged from the Italian Ministry of University grant Cofin2006-13-1140 and University of Insubria FAR 2007–2008.

Appendix A. Proofs

Proof of Lemma 2.1. Because det Π(zu) = 0 one has 0 ≤ rank Π(zu) ≤ p − 1; when rank Π(zu) < p − 1 one has adj Π(zu) = 0 and thus each entry of adj Π(z) contains the factor (1 − wuz)^{bu}, wu := zu^{−1}, for some 0 < bu < au, because bu ≥ au would imply that inv Π(zu) exists and this would yield a contradiction. If rank Π(zu) = p − 1 then adj Π(zu) ≠ 0 and thus bu = 0. Let g(z) := ∏_{u=1}^{q}(1 − wuz)^{mu} =: (1 − wuz)^{mu} gu(z); because inv Π(z) = adj Π(z)/det Π(z) and G(zu), gu(zu) ≠ 0 one has that inv Π(z) has a pole of order mu at z = zu.
Proof of Theorem 4.1. Consider a characteristic root zu, finite and different from 0, and let wu := zu^{−1}; because Π̌(zu^{−1}) = zu^{−dΠ}Π(zu) and Π(zu) is singular, then rank Π̌(wu) < p, i.e. wu is a root of det Π̌(z). The same holds for all characteristic roots zu; one then finds the factor ∏_{u=1}^{q}(z − wu)^{au}, au > 0, in det Π̌(z). Moreover, we observe that lim_{|z|→∞} z^{−dΠ}Π(z) = ΠdΠ = Π̌(0); hence ΠdΠ is singular iff 0 is an additional point of rank-deficiency for Π̌(z) and

adj Π̌(z) = z^{b0}Ǧ(z), Ǧ(0) ≠ 0, 0 ≤ b0 < a0,
det Π̌(z) = z^{a0}ǧ(z), ǧ(0) ≠ 0,

which imply inv Π̌(z) = z^{−c0}Ǧ(z)/ǧ(z), z ∈ C \ {w0, …, wq}, where c0 := a0 − b0 > 0. Next we wish to show that c0 = m0. By the definition of the ̌ operator one has inv Π(z^{−1}) = z^{dΠ} inv Π̌(z); hence, substituting for inv Π̌(z) one finds

inv Π(z^{−1}) = z^{dΠ−c0}Ǧ(z)/ǧ(z), z ∈ C \ {w0, …, wq}, (24)

and because inv Π(z) = G(z)/g(z), one also has

inv Π(z^{−1}) = G(z^{−1})/g(z^{−1}) = z^{dg−dG}Ǧ(z)/ǧ(z), z ∈ C \ {w0, …, wq}. (25)

Equating (24) and (25) one finds dΠ − c0 = dg − dG, i.e. c0 = dΠ + dG − dg =: m0.

Proof of Theorem 4.2. The identity Π(z)G(z) = G(z)Π(z) = g(z)I implies Π̌(z)Ǧ(z) = Ǧ(z)Π̌(z) = z^{m0}ǧ(z)I so that

det ΠdΠ = 0 ⇔ det GdG = 0 ⇔ m0 > 0 (26)

where ΠdΠ, GdG ≠ 0.
• (i.1) ⇔ (i.2). See (26).
• (i.2) ⇒ (i.3). If m0 > 0 then, see (26), there exists γ0 ≠ 0 such that γ0′Π(z) = γ′(z) where 0 ≤ dγ ≤ dΠ − 1. Post-multiplying by G(z) one finds g(z)γ0′ = γ′(z)G(z); comparing degrees of the l.h.s. and the r.h.s. one has dg ≤ dγ + dG, i.e. dγ ≥ dg − dG = dΠ − m0.
• (i.3) ⇒ (i.4). If Xt ∈ CS(dγ), then γ0′ΠdΠ = 0 so that det ΠdΠ = 0; hence, see (26), also det GdG = 0 and there exists γ0 ≠ 0 such that γ0′G(z) = γ′(z) where 0 ≤ dγ ≤ dG − 1. Post-multiplying by Π(z) one finds g(z)γ0′ = γ′(z)Π(z); comparing degrees of the l.h.s. and the r.h.s. one has dg ≤ dγ + dΠ, i.e. dγ ≥ dg − dΠ = dG − m0.
• (i.4) ⇒ (i.1). If Xt ∈ CE(dγ), then γ0′GdG = 0 so that det GdG = 0; hence, see (26), also det ΠdΠ = 0.
• (ii) By definition, Xt ∈ CD(dγ) if and only if γ0′C(z) = γ′(z), i.e. γ0′G(z) = g(z)γ′(z). Comparing degrees of the l.h.s. and the r.h.s. one has dG ≥ dg + dγ, which implies dG − dg = m0 − dΠ ≥ dγ ≥ 0.
• (iii) m0 > 0 does not imply m0 ≥ dΠ.

In Lemma A.1 we present a result that will be used in the proof of Theorem 5.3.
Lemma A.1 (Cancelations). Fix w ∈ C, let A(z) =: ∑_{n=0}^{dA} An(z − w)^n be such that inv A(z) has a pole of order m at z = w and define aj, bj, A_{j,k} from the local rank factorization of A(z) at w, see Definition 5.1; then for 0 ≤ n ≤ m − 1, the following statements are equivalent:

(i) γ′A(z) = (z − w)^{n+1}ζ′(z) and ζ(w) = ζ0 has full column rank;
(ii) γ′(A0 : ··· : An) = 0 and γ′A_{n+1} has full row rank;
(iii) γ′(A_{0,1} : ··· : A_{n,1}) = 0 and γ′A_{n+1,1} has full row rank;
(iv) γ = a_{n+1⊥}ϕ, γ′(A1b1 : ··· : Anbn) = 0 and γ′A_{n+1} has full row rank;
(v) γ = a_{n+1⊥}ϕ, γ′(A_{1,1}b1 : ··· : A_{n,1}bn) = 0 and γ′A_{n+1,1} has full row rank.

Proof. We show that (i) ⇔ (ii) ⇔ (iii) ⇔ (v) ⇔ (iv).
• (i) ⇒ (ii). Let ζ(z) =: ∑_{s=0}^{dA−n−1} ζs(z − w)^s; equating coefficients of equal powers of z − w in γ′A(z) = (z − w)^{n+1}ζ′(z), one finds γ′Aj = 0 for 0 ≤ j ≤ n and γ′A_{n+1+s} = ζs′ for s = 0, …, dA − n − 1; in particular ζ(w) = ζ0 = γ′A_{n+1} has full column rank.
• (ii) ⇒ (i). Pre-multiply A(z) by γ′ and use γ′Aj = 0 for 0 ≤ j ≤ n to show that

γ′A(z) = ∑_{j=0}^{dA} γ′Aj(z − w)^j = (z − w)^{n+1} ∑_{s=0}^{dA−n−1} γ′A_{n+1+s}(z − w)^s =: (z − w)^{n+1}ζ′(z)

where ζ(z) := ∑_{s=0}^{dA−n−1} ζs(z − w)^s and ζs′ := γ′A_{n+1+s}, with ζ0′ := γ′A_{n+1} of full rank.
• (ii) ⇒ (iii). Pre-multiplying (12) by γ′ one has

γ′A_{j+1,k} = γ′A_{j,k+1} + γ′A_{j,1} bj c_{j+1,k} (27)

where c_{j+1,k} := (bj′bj)^{−1}(A_{1,k}′ᾱ0 : ··· : A_{j,k}′ᾱ_{j−1})′ with initial conditions γ′A_{1,k} = γ′Ak. Hence γ′A_{1,1}b1 = 0. Next we proceed by induction assuming γ′A_{j,1}bj = 0 for 0 ≤ j ≤ t and prove it for j = t + 1 where t ≤ n − 1. By (27) for j ≤ t and using the induction assumption, one finds γ′A_{t+1,1} = γ′A_{t,2} = ··· = γ′A_{1,t+1} = γ′A_{t+1}, which equals 0 by assumption, so that γ′A_{t+1,1}b_{t+1} = 0. This shows that γ′A_{j,1} = 0 for j = 0, …, n. Repeating the same induction step for t = n, one finds γ′A_{n+1,1} = γ′A_{n+1}.
• (iii) ⇒ (ii). Consider (27) for k = 1. Because γ′A_{j,1} = 0, 0 ≤ j ≤ n, one has γ′A_{j,1}bj = 0 and for 0 ≤ j ≤ n + 1, one finds

γ′A_{j,1} = γ′A_{j−1,2} + γ′A_{j−1,1}b_{j−1}c_{j,1} = γ′A_{j−2,3} + γ′A_{j−2,1}b_{j−2}c_{j−1,2} = ··· = γ′A_{1,j} = γ′Aj. (28)
Hence γ ′ Aj = 0, 0 ≤ j ≤ n and γ ′ An+1 has full row rank. • (iii) ⇒ (v). From γ ′ Aj,1 = 0 one has γ ′ Aj,1 bj = 0 and γ ′ Aj,1 Mbj = 0 for 0 ≤ j ≤ n. Observe that 0 = γ ′ A0 = −γ ′ α0 β0′ , which implies γ = aℓ−1 ⊥ ϕℓ−1 for ℓ = 2, i.e. γ = Maℓ γ . Substituting γ into γ ′ Aℓ,1 Mbℓ = 0, one finds 0 = γ ′ Aℓ,1 Mbℓ = γ ′ Maℓ Aℓ,1 Mbℓ = −γ ′ αℓ βℓ′ which implies γ = aℓ⊥ ϕℓ . Repeating the recursion for ℓ = 3, . . . , n + 1, one finds γ = an+1⊥ ϕ , where ϕ := ϕn+1 . • (v) ⇒ (iii). Because γ = an+1⊥ ϕ , then γ ′ Aj,1 Mbj = γ ′ Maj Aj,1 Mbj = −ϕ ′ a′n+1⊥ αj βj′ = 0 and by assumption γ ′ Aj,1 Pbj = 0 for 0 ≤ j ≤ n. Summing terms one finds γ ′ Aj,1 = γ ′ Aj,1 Mbj + γ ′ Aj,1 Pbj = 0 for 0 ≤ j ≤ n. • (v) ⇒ (iv) γ ′ Aj,1 bj = 0 for 0 ≤ j ≤ n implies that (28) holds for 0 ≤ ℓ ≤ n + 1, and hence γ ′ Aj bj = 0 for 0 ≤ j ≤ n and γ ′ An+1,1 = γ ′ An+1 is of full row rank. • (iv) ⇒ (v) Proceed as in (ii) ⇒ (iii). Proof of Theorem 5.3. One has Xt ∈ CS(dγ ) if and only if γ0′ Π (z ) = γ ′ (z ), where 0 ≤ dγ ≤ dΠ − 1; mapping z into z −1 , we write the last equation as γ0′ ΠĚ (z ) = z dΠ −dγ γĚ (z ); this fits the assumptions of Lemma A.1 replacing γ , A, w, n respectively with γ0 , ΠĚ , w0 = 0, dΠ − dγ − 1. For the proofs of Theorems 5.4 and 5.5 we apply Corollaries A.2 and A.6; the former is a consequence of Lemma A.1 and the duality result in (14), the latter of Lemmas A.3–A.5.
M. Franchi, P. Paruolo / Journal of Econometrics 163 (2011) 105–117
Corollary A.2 (Dual Cancellations). Let B(z) =: Σ_{n=0}^{d_B} B_n (z − w)^n be the reduced adjoint of A(z) and let a_j, b_j be as in Lemma A.1; then for 0 ≤ n ≤ m − 1, the following statements are equivalent:

(i) γ′B(z) = (z − w)^{n+1} ζ′(z) and ζ(w) = ζ_0 has full column rank;
(ii) γ′(B_0 : · · · : B_n) = 0 and γ′B_{n+1} has full row rank;
(iii) γ′(B_{0,1} : · · · : B_{n,1}) = 0 and γ′B_{n+1,1} has full row rank;
(iv) γ = b_{m−n}ϕ, γ′(B_1 a_{m⊥} : · · · : B_n a_{m−n+1⊥}) = 0 and γ′B_{n+1} has full row rank;
(v) γ = b_{m−n}ϕ, γ′(B_{1,1} a_{m⊥} : · · · : B_{n,1} a_{m−n+1⊥}) = 0 and γ′B_{n+1,1} has full row rank.

Proof. Apply Lemma A.1 with A replaced by B and use the duality result in (14).

Lemma A.3. Let g(z) =: (z − w)^m h(z), h_0 := h(w) ≠ 0, be the minimal polynomial of A(z) and let α_j, β_j, A_{j,k} and ξ_j, η_j, B_{j,k} be respectively defined by the local rank factorizations of A(z) and its reduced adjoint B(z) at z = w, and fix n ≤ m − 1; then for 1 ≤ j ≤ n, one has

β_j′B_{n−j} = ᾱ_j′ Σ_{k=1}^{n−j} A_{j+1,k} B_{n−j−k} − 1_{n,m} h_0 ᾱ_j′,   (29)

where 1_{n,m} is the indicator function. Moreover, for 0 ≤ j ≤ s − 1 ≤ n − 1 and q := n − s + 1, one has

β_j′B_{q,s−j} α_{m−q} = ᾱ_j′ Σ_{k=1}^{s−j−1} A_{j+1,k} B_{q,s−j−k} α_{m−q} + δ_{q,j,s},   (30)

where δ_{m−j,j,j+1} := −h_0 I_{r_j} and δ_{q,j,s} := 0 for all other combinations of the 3 indices.

Proof. See Franchi and Paruolo (2010).

In Lemmas A.4 and A.5 and Corollary A.6 we translate the conditions on the coefficients of the reduced adjoint B(z) in Corollary A.2 into conditions on coefficients of the original polynomial A(z).

Lemma A.4. Fix n ≤ m − 1 and let λ := m − n − 1 and γ = b_{m−n}ϕ =: Σ_{j=0}^{λ} β_j ϕ_j; then one has

γ′B_{τ,1} α_{m−τ+1} = 0,  1 ≤ τ ≤ n  ⇔  γ′A_{1,λ} β_{m−τ+1} = 0,  1 ≤ τ ≤ n,   (31)

where A_{i,λ} is as in Definition 5.2.

Proof. First we observe that γ′A_{1,λ}β_{m−τ+1} = 0, 1 ≤ τ ≤ n, in (31) can be written as γ′A_{1,λ}b_{m−n+1⊥} = 0. We proceed by induction, first showing that (31) holds for τ = 1; then we assume (31) for 1 ≤ τ ≤ ℓ − 1 and show that this implies that it holds for τ = ℓ ≤ n. Let n ≤ m − 1, n − j = 1 in (29) to get

β_j′B_1 = ᾱ_j′A_{j+1,1}B_0;   (32)

next substitute B_0 = −h_0 β̄_m ᾱ_m′, see (14), pre- and post-multiply (32) by ϕ_j′ = γ′β̄_j and α_m respectively to find ϕ_j′β_j′B_1α_m = −h_0 γ′β̄_j ᾱ_j′A_{j+1,1}β̄_m; summing over j one finds that (31) holds for τ = 1 because B_1 = B_{1,1}.

Next we assume that (31) holds for τ = 1, . . . , ℓ − 1 and prove it for τ = ℓ. For s − j = 1, (30) gives β_j′B_{τ,1}α_{m−τ} = δ_{τ,j,j+1}, i.e. β_{m−τ}′B_{τ,1}α_{m−τ} = −h_0 I_{r_{m−τ}} and β_j′B_{τ,1}α_{m−τ} = 0 for 0 ≤ j ≤ m − τ − 1. Hence

B_{τ,1}α_{m−τ} = Σ_{j=0}^{m} β̄_j β_j′B_{τ,1}α_{m−τ} = −h_0 β̄_{m−τ} + b_{m−τ+1⊥} f_τ,   (33)

where f_τ := b̄_{m−τ+1⊥}′B_{τ,1}α_{m−τ}. Eq. (30) for 2 ≤ j + 2 = s ≤ n ≤ m − 1 gives β_j′B_{τ,2}α_{m−τ} = ᾱ_j′A_{j+1,1}B_{τ,1}α_{m−τ}, which implies, using (33),

β_j′B_{τ,2}α_{m−τ} = −h_0 ᾱ_j′A_{j+1,1}β̄_{m−τ} + ᾱ_j′A_{j+1,1}b_{m−τ+1⊥} f_τ.

Evaluate the above expression for τ = ℓ − 1, pre-multiply by ϕ_j′ = γ′β̄_j and sum over 0 ≤ j ≤ λ to find γ′B_{ℓ−1,2}α_{m−ℓ+1} = −h_0 γ′A_{1,λ}β̄_{m−ℓ+1}, where we have used the induction assumption γ′A_{1,λ}b_{m−ℓ+2⊥} = 0. From the counterpart of (28) for B_{·,·} and the duality result in (14) one has γ′B_{ℓ−1,2} = γ′B_{ℓ,1}. This shows that (31) holds for τ = ℓ.

Lemma A.5. Fix n ≤ m − 1 and let λ := m − n − 1 and γ = b_{m−n}ϕ =: Σ_{j=0}^{λ} β_j ϕ_j; then one has

γ′B_{τ,1} α_{m−τ+χ} = 0,  1 ≤ χ ≤ τ ≤ n  ⇔  γ′A_{χ,λ} β_{m−τ+χ} = 0,  1 ≤ χ ≤ τ ≤ n,   (34)

where A_{i,λ} is as in Definition 5.2.

Proof. First we observe that γ′A_{χ,λ}β_{m−τ+χ} = 0, χ ≤ τ ≤ n, in (34) can be written as γ′A_{χ,λ}b_{m−n+χ⊥} = 0. We proceed by induction, first showing that (34) holds for χ = 1; then we assume (34) for 1 ≤ χ ≤ ℓ − 1 and show that this implies that it holds for χ = ℓ ≤ τ. Letting χ = 1 in (34) one has (31), so that (34) holds for χ = 1 by Lemma A.4. Assume (34) for 1 ≤ χ ≤ ℓ − 1; we wish to show that under the induction assumption, one has

γ′B_{ℓ+q,1}α_{m−q} = Σ_{j=0}^{λ} ϕ_j′ᾱ_j′ Σ_{k=1}^{ℓ−ω+1} A_{ω,j+1,k} B_{q,ℓ+2−ω−k} α_{m−q},  1 ≤ ω ≤ ℓ,   (35)

where A_{ω,j,k} is as in Definition 5.2. We first show that (35) holds for ω = 1; let s − j − 1 = ℓ in (30), pre-multiply by ϕ_j′, sum over 0 ≤ j ≤ λ and use γ′B_{q,ℓ+1} = γ′B_{q+ℓ,1}, which follows from the counterpart of (28) under the induction assumption, to find γ′B_{q+ℓ,1}α_{m−q} = Σ_{j=0}^{λ} ϕ_j′ᾱ_j′ Σ_{k=1}^{ℓ} A_{j+1,k} B_{q,ℓ+1−k} α_{m−q}; hence (35) holds for ω = 1 by (30).

Next we show that if (35) holds for 1 ≤ ω ≤ ℓ − 1 then (35) holds for ω + 1 ≤ ℓ; write Σ_{k=1}^{ℓ−ω+1} A_{ω,j+1,k} B_{q,ℓ+2−ω−k} = A_{ω,j+1,1} B_{q,ℓ+1−ω} + Σ_{k=1}^{ℓ−ω} A_{ω,j+1,k+1} B_{q,ℓ+1−ω−k}, so that (35) becomes γ′B_{ℓ+q,1}α_{m−q} =: F + R (say), where

F := Σ_{j=0}^{λ} ϕ_j′ᾱ_j′ A_{ω,j+1,1} B_{q,ℓ+1−ω} α_{m−q} = γ′A_{ω,λ} B_{q,ℓ+1−ω} α_{m−q}

and

R := Σ_{j=0}^{λ} ϕ_j′ᾱ_j′ Σ_{k=1}^{ℓ−ω} A_{ω,j+1,k+1} B_{q,ℓ+1−ω−k} α_{m−q}.

By the induction assumption one has γ′A_{ω,λ}b_{m−n+ω⊥} = 0 for 1 ≤ ω ≤ ℓ − 1; hence F = γ′A_{ω,λ} Σ_{h=0}^{λ+ω} β̄_h β_h′ B_{q,ℓ+1−ω} α_{m−q}. Letting s − j = ℓ + 1 − ω in (30) one finds β_j′B_{q,ℓ+1−ω}α_{m−q} = ᾱ_j′ Σ_{k=1}^{ℓ−ω} A_{j+1,k} B_{q,ℓ+1−ω−k} α_{m−q}, so that

F = Σ_{j=0}^{λ} ϕ_j′ᾱ_j′ A_{ω,j+1,1} Σ_{h=0}^{λ+ω} β̄_h ᾱ_h′ Σ_{k=1}^{ℓ−ω} A_{h+1,k} B_{q,ℓ+1−ω−k} α_{m−q}.

Summing F + R and re-arranging terms one finds

γ′B_{ℓ+q,1}α_{m−q} = Σ_{j=0}^{λ} ϕ_j′ᾱ_j′ Σ_{k=1}^{ℓ−ω} ( A_{ω,j+1,k+1} + A_{ω,j+1,1} Σ_{h=0}^{λ+ω} β̄_h ᾱ_h′ A_{h+1,k} ) B_{q,ℓ+1−ω−k} α_{m−q},

that is (35) for ω + 1, using the definition (15) of A_{ω+1,j+1,k}. This completes the proof of (35).
Setting τ = ℓ + q and ω = ℓ in (35) one has

γ′B_{τ,1}α_{m−τ+ℓ} = Σ_{j=0}^{λ} ϕ_j′ᾱ_j′ A_{ℓ,j+1,1} B_{τ−ℓ,1} α_{m−τ+ℓ} = γ′A_{ℓ,λ} B_{τ−ℓ,1} α_{m−τ+ℓ}

because ϕ_j′ = γ′β̄_j, and substituting B_{τ−ℓ,1}α_{m−τ+ℓ} = −h_0 β̄_{m−τ+ℓ} + b_{m−τ+ℓ+1⊥} f_{τ−ℓ} from (33), one finds

γ′B_{τ,1}α_{m−τ+ℓ} = −h_0 γ′A_{ℓ,λ}β̄_{m−τ+ℓ} + γ′A_{ℓ,λ}b_{m−τ+ℓ+1⊥} f_{τ−ℓ}.   (36)

Finally we show that (34) holds for χ = ℓ, reported below for convenience:

γ′B_{τ,1}α_{m−τ+ℓ} = 0,  ℓ ≤ τ ≤ n  ⇔  γ′A_{ℓ,λ}β_{m−τ+ℓ} = 0,  ℓ ≤ τ ≤ n.   (37)

Evaluate (36) for τ = ℓ to get γ′B_{ℓ,1}α_m = −h_0 γ′A_{ℓ,λ}β̄_m; this shows that (37) holds for τ = ℓ. Next we assume that (37) holds for ℓ ≤ τ ≤ θ − 1 and show that it holds for τ = θ ≤ n. Evaluating (36) for τ = θ, one has

γ′B_{θ,1}α_{m−θ+ℓ} = −h_0 γ′A_{ℓ,λ}β̄_{m−θ+ℓ} + γ′A_{ℓ,λ}b_{m−θ+ℓ+1⊥} f_{θ−ℓ} = −h_0 γ′A_{ℓ,λ}β̄_{m−θ+ℓ},

because by the induction assumption on τ one has γ′A_{ℓ,λ}β_{m−τ+ℓ} = 0 for ℓ ≤ τ ≤ θ − 1, i.e. γ′A_{ℓ,λ}b_{m−θ+ℓ+1⊥} = 0. This shows that (37) holds for τ = θ and completes the proof.

Corollary A.6. Fix n ≤ m − 1 and let λ := m − n − 1 and γ = b_{m−n}ϕ =: Σ_{j=0}^{λ} β_j ϕ_j; then one has

γ′(B_{1,1} a_{m⊥} : · · · : B_{n,1} a_{m−n+1⊥}) = 0  ⇔  γ′A_{χ,λ}b_{m−n+χ⊥} = 0,  1 ≤ χ ≤ n.   (38)

Moreover, under (38),

γ′B_{n+1,1} = γ′H_{m−n} D_{n+1} ā_{λ⊥}′,   (39)

where H_j := (β_{j−1} : A_{1,j−1}β_j : · · · : A_{m−j+1,j−1}β_m), D_j := −diag(q_{m−j}, h_0 q_{m−j+1}, . . . , h_0 q_m) and q_n := (β_n′β_n)^{−1}, has full rank if and only if γ′H_{m−n} has full row rank.

Proof. Eqs. (38) and (39) hold by Lemma A.5. In order to prove (38), note that γ′(B_{1,1} a_{m⊥} : · · · : B_{n,1} a_{m−n+1⊥}) = 0 can be written as γ′B_{τ,1}α_{m−τ+χ} = 0 for 1 ≤ χ ≤ τ ≤ n. In order to prove (39), write γ′B_{n+1} = γ′B_{n+1}P_{a_{m−n}} + γ′B_{n+1}M_{a_{m−n}} =: F + R (say). The duality result in (14) together with the counterpart of (13) for the reduced adjoint B(z) imply F = −γ′β̄_λᾱ_λ′ and, using the equality γ′B_{τ,1}α_{m−τ+χ} = −h_0 γ′A_{χ,λ}β̄_{m−τ+χ} in Lemma A.5, one finds R = −h_0 γ′(A_{1,λ}β̄_{m−n} : · · · : A_{n+1,λ}β̄_m)ā_{m−n⊥}′. Hence F + R = γ′H_{m−n}D_{n+1}ā_{λ⊥}′ and the last statement follows from D_{n+1}ā_{λ⊥}′ being of full row rank.

Proof of Theorem 5.4. One has X_t ∈ CE(d_γ) if and only if γ_0′G(z) = γ′(z), where max(0, d_G − m_0) ≤ d_γ ≤ d_Π − 1 and γ(z) has full column rank. We map z into z^{−1} and write the last equation as γ_0′GĚ(z) = z^{d_G−d_γ}ζ′(z), where ζ(z) := γĚ(z). This fits the assumptions of Corollary A.2 replacing γ, B, w, n respectively with γ_0, GĚ, w_0 = 0, d_G − d_γ − 1. This proves (i)–(iii.1); in order to prove (iii.2) apply Corollary A.6 replacing n with d_G − d_γ − 1.

In Lemma A.8 we present a Lagrange–Hermite approximation formula that will be used in the proof of Theorem 5.5. Before that, Lemma A.7 states that the left null space of a pair of complex conjugate matrices contains real vectors.

Lemma A.7 (Real Left Null Space). Let A ∈ C^{p×p}; then v′A = v′A* = 0 for some v ∈ C^p if and only if (Re v)′(Re A : Im A) = (Im v)′(Re A : Im A) = 0. One can choose to represent the intersection of the left null spaces of col A and col A*, i.e. V := {v ∈ C^p : v′A = v′A* = 0}, through the real left null space of col(Re A : Im A), namely U := {u ∈ R^p : u′(Re A : Im A) = 0}, in the sense that U = V ∩ R^p and V = U ⊕ iU.

Proof. Because v′A = v′A* = 0 one has 0 = v′(1/2)(A + A*) = v′ Re A and 0 = v′(i/2)(−A + A*) = v′ Im A; this implies (Re v)′(Re A : Im A) = (Im v)′(Re A : Im A) = 0. Reading these implications in reverse order one obtains the reverse implications. The relation V = U ⊕ iU readily follows.

Next we report a Lagrange–Hermite approximation formula,^6 see Johansen (1931). We use here the shorthand F^{[s]}(z) = ∂^s F(z)/∂z^s.

^6 See Johansen (2009b) for the application of a particular case of the same formula for the derivation of the error correction representation of VAR processes.

Lemma A.8 (Lagrange–Hermite Approximation). Let F(z), z ∈ C, be holomorphic for z ∈ U ⊆ C, and consider n distinct points y_1, . . . , y_n in U; then one has F(z) = D(z) + V(z), where D(z) is a polynomial such that D^{[s]}(y_u) = F^{[s]}(y_u) for 0 ≤ s ≤ ℓ_u − 1 and 1 ≤ u ≤ n, and V(z) is a remainder. Specifically, V(z) := a(z)S(z), a(z) := Π_{u=1}^{n}(z − y_u)^{ℓ_u}, D(z) := a(z) Σ_u D_u(z)(z − y_u)^{−ℓ_u}, D_u(z) := Σ_{s=0}^{ℓ_u−1} D_{u,s}(z − y_u)^s, D_{u,s} := (1/s!) K_u^{[s]}(y_u), K_u(z) := F(z)/a_u(z), a_u(z) := a(z)/(z − y_u)^{ℓ_u}. The degrees of a(z) and D(z) are respectively d_a = Σ_{u=1}^{n} ℓ_u and d_D = d_a − 1, and 1/a(z) and S(z) are co-prime.

Proof of Theorem 5.5. • (i) ⇔ (ii). One has X_t ∈ CD(d_γ) if and only if γ_0′ G(z)/g(z) = γ′(z), z ∈ C, where 0 ≤ d_γ ≤ m_0 − d_Π and γ_{d_γ} has full column rank. We map z into z^{−1}, rearrange terms and write the last equation as

γ_0′GĚ(z) = z^{m_0−d_Π−d_γ} gĚ(z) γĚ′(z).   (40)

Next we apply Lemma A.8 letting F(z) := GĚ(z), a(z) := z^{m_0−d_Π−d_γ} gĚ(z), so that ℓ_0 := m_0 − d_Π − d_γ and ℓ_u := m_u, u = 1, . . . , q; this gives

GĚ(z) = a(z) [ Σ_{u=0}^{q} D_u(z)/(z − w_u)^{ℓ_u} + S(z) ];   (41)

pre-multiplying both sides of (41) by γ_0′, substituting (40) in the l.h.s. and simplifying a(z) := z^{m_0−d_Π−d_γ} gĚ(z), one finds γĚ′(z) = γ_0′ Σ_{u=0}^{q} D_u(z)(z − w_u)^{−ℓ_u} + γ_0′S(z). Because γĚ(z) is a polynomial, this implies γ_0′ Σ_{u=0}^{q} D_u(z)(z − w_u)^{−ℓ_u} = 0, i.e. γĚ′(z) = γ_0′S(z). Next we wish to show that

γ_0′ Σ_{u=0}^{q} D_u(z)/(z − w_u)^{ℓ_u} = 0  ⇔  γ_0′(G_0^{(u)} : · · · : G_{ℓ_u−1}^{(u)}) = 0,   (42)

where GĚ(z) =: Σ_{n=0}^{d_G} G_n^{(u)}(z − w_u)^n, D_u(z) = Σ_{s=0}^{ℓ_u−1} D_{u,s}(z − w_u)^s, D_{u,s} = Σ_{j=0}^{s} b_{u,j} G_{s−j}^{(u)}, b_{u,j} := b_u^{[j]}(w_u)/j! is a scalar and b_u(z) := 1/a_u(z); note that b_{u,0} ≠ 0 because a_u(w_u) ≠ 0. Hence one has

γ_0′D_{u,s} = b_{u,0} γ_0′G_s^{(u)} + Σ_{j=1}^{s} b_{u,j} γ_0′G_{s−j}^{(u)};   (43)

for s = 0, this implies γ_0′D_{u,0} = 0 if and only if γ_0′G_0^{(u)} = 0, because b_{u,0} ≠ 0. Next we proceed by induction, assuming γ_0′D_{u,s} = 0 = γ_0′G_s^{(u)} for 0 ≤ s ≤ t − 1 and showing that the same holds for s = t, where 1 ≤ t ≤ ℓ_u − 1. For s = t, (43) and the induction
assumption imply γ_0′D_{u,t} = b_{u,0} γ_0′G_t^{(u)}; this proves that γ_0′(D_{u,0} : · · · : D_{u,ℓ_u−1}) = 0 if and only if γ_0′(G_0^{(u)} : · · · : G_{ℓ_u−1}^{(u)}) = 0, i.e. (42), because γ_0′(D_{u,0} : · · · : D_{u,ℓ_u−1}) = 0 if and only if γ_0′D_u(z) = 0. Finally note that G_j^{(u)} = G_{u,d_G−j}, where G(z) = Σ_{n=0}^{d_G} G_{u,n}(z − z_u)^n, and because D^{[ℓ_u]}(w_u) = 0, S(w_u) = G_{ℓ_u}^{(u)}.

• (ii) ⇔ (iii.1). The result is found by applying Corollary A.2 replacing γ, B, w, n respectively with v_u, GĚ, w_u, ℓ_u − 1, for u = 0, . . . , q, where ℓ_0 := m_0 − d_Π − d_γ and ℓ_u := m_u, u ≠ 0; this gives v_u′(G_0^{(u)} : · · · : G_{ℓ_u−1}^{(u)}) = 0 if and only if

v_u = b_{u,m_u−ℓ_u+1} ϕ_u, where ϕ_u′ b_{u,m_u−ℓ_u+1}′ (G_1^{(u)} a_{u,m_u⊥} : · · · : G_{ℓ_u−1}^{(u)} a_{u,m_u−ℓ_u+2⊥}) = 0.   (44)

Because γ_0 must satisfy (44) for u = 0, . . . , q, it must be a basis of ∩_{u=0}^{q} col v_u. Conversely, if γ_0 is a basis of ∩_{u=0}^{q} col v_u, then it satisfies (44) for u = 0, . . . , q. Hence γ_0 = b_{u,s_u} ϕ_u; because γ_0 belongs to the left null space of pairs of complex conjugate matrices, Lemma A.7 applies and one can choose γ_0 to be real.

• (iii.1) ⇔ (iii.2). The result is found by applying Corollary A.6 at w_u, u = 0, . . . , q, replacing n with ℓ_u − 1, where ℓ_0 := m_0 − d_Π − d_γ and ℓ_u := m_u, u ≠ 0.

Proof of Proposition 7.1. Consider B := (β_0 : c_1)′, D(z) := diag(I_r, (1 − z)I_{p−r}), X_t := D(L)BY_t and observe that B^{−1} = (c_0(β_0′c_0)^{−1} : β_1(c_1′β_1)^{−1}). Pre-multiply (21) by B to obtain A(L)BY_t = Bu_t, where A(z) := BA(z)B^{−1} with
A(z) = (BΓ(z)c_0(β_0′c_0)^{−1}(1 − z) − Bα_0z : BΓ(z)β_1(c_1′β_1)^{−1})D(z) =: Π(z)D(z).   (45)
This shows that Π(L)X_t = ε_t with Π(z) defined in (45). One sees that the last p − r columns of Π(z) have degree d_Γ = d_Π − 1, which implies Π_{d_Π} = (Π_{d_Π,0} : 0_{p×(p−r)}). By properties of the determinant one sees that det A(z) = det Π(z) det D(z), so that the roots of det A(z) = det B det A(z) det B^{−1} = det A(z) include the roots of det Π(z) plus the roots of det D(z), which are p − r roots at z = 1.

Proof of Corollary 7.2. Let ψ = α_1ζ_⊥ be p × s. Pre-multiply α_1′(I + A_{1,1})β_1 = ζρ′ by ζ_⊥′ to find ψ′β_1 = −ζ_⊥′α_1′A_{1,1}β_1. By the assumption rank(α_1′A_{1,1}β_1) = p − r_0, one deduces that ψ′β_1 has full row rank s. Hence one can choose c_1 = (ψ : ϑ) in Proposition 7.1, where ϑ is p × (r_1 − s), linearly independent of ψ and such that c_1′β_1 is square and nonsingular.

Proof of Proposition 7.3. We first introduce notation. Let c := (c_0 : c_1) := c_{2⊥}, B := (τ : c_2)′ and observe that R := (R_0 : R_1 : R_2) := B^{−1} = (c(τ′c)^{−1} : β_2(c_2′β_2)^{−1}). Rewrite A(z) as A(z) = Υ(z)(1 − z)² + Γ(1 − z)z − α_0β_0′z, with Γ := Γ(1), and note that I = A(0) = Υ(0). Using the error correction representation in Paruolo and Rahbek (1999) one has Γ = (−ξ_1 : −ξ_2 : α_0µ′c_2′β̄_2)(β_0 : β_1 : β_2)′, and hence ΓB^{−1} = (Γc(τ′c)^{−1} : α_0µ′) =: (φ_0 : φ_1 : α_0µ′). Also let Ψ_i(z) := BΥ(z)R_i, i = 0, 1, 2. Pre-multiply (21) by B to obtain A(L)BY_t = Bu_t where A(z) := (A_0(z) : A_1(z) : A_2(z)) := BA(z)B^{−1} with A_0(z) = Ψ_0(z)(1 − z)² + Bφ_0(1 − z)z − Bα_0z, A_1(z) = Ψ_1(z)(1 − z)² + Bφ_1(1 − z)z, A_2(z) = Ψ_2(z)(1 − z)² + Bα_0µ′(1 − z)z. Collecting (1 − z) from the last block of r_1 + r_2 columns, one finds A(z) = A◦(z)D_1(z), with D_1(z) := diag(I_{r_0}, (1 − z)I_{r_1+r_2}) and A◦(z) =: (A◦_0(z) : A◦_1(z) : A◦_2(z)), where A◦_0(z) := A_0(z), A◦_1(z) := Ψ_1(z)(1 − z) + Bφ_1z, A◦_2(z) := Ψ_2(z)(1 − z) + Bα_0µ′z. Also let
N :=
[ I_{r_0}          −µ′    ]
[        I_{r_1}          ]
[                 I_{r_2} ]
with
N^{−1} =
[ I_{r_0}          µ′     ]
[        I_{r_1}          ]
[                 I_{r_2} ]
,
where we have omitted zero entries, and observe that one can collect (1 − z) from the last set of r_2 columns of A◦(z)N^{−1}, i.e. one can write A◦(z) = A◦(z)N^{−1}N = A◦◦(z)D_2(z)N with D_2(z) := diag(I_{r_0+r_1}, (1 − z)I_{r_2}) and A◦◦(z) =: (A◦◦_0(z) : A◦◦_1(z) : A◦◦_2(z)), where A◦◦_0(z) = A◦_0(z), A◦◦_1(z) = A◦_1(z) and A◦◦_2(z) = Ψ_0(z)µ′(1 − z) + Bφ_0µ′z + Ψ_2(z). Finally set Π(L) := NA◦◦(L) and X_t = D_2(L)ND_1(L)BY_t, and observe that the above implies that they satisfy Π(L)X_t = Qu_t, where Q := NB = (β_0 − b_2µ : τ_1 : b_2)′. Because Υ(0) = I, one finds Π(0) = (NBR_0 : BR_1 : BR_0µ′ + BR_2) = NBRN^{−1} = I. By the above, one also has A(z) = B^{−1}N^{−1}Π(z)D_2(z)ND_1(z)B; because B and N are nonsingular matrices, one finds det A(z) = det Π(z) det D_1(z) det D_2(z) = det Π(z)(1 − z)^{2r_2+r_1} and hence the roots of det A(z) include the roots of det Π(z) plus 2r_2 + r_1 = 2p − 2r_0 − r_1 roots at z = 1. The restrictions on Π_{d_Π} and Π_{d_Π−1} are found by inspection of Π(z) := NA◦◦(z).

Proof of Corollary 7.4. Same proof as Corollary 7.2 replacing c_1, α_1, r_1, A_{1,1} with c_2, α_2, r_2, A_{2,1}.

References

Ahn, S., Reinsel, C., 1988. Nested reduced rank autoregressive models for multiple time series. Journal of the American Statistical Association 83, 849–856.
Brockwell, P., Davis, R., 1987. Time Series: Theory and Methods. Springer-Verlag, New York.
Campbell, J.Y., Mankiw, N.G., 1990. Permanent income, current income, and consumption. Journal of Business & Economic Statistics 8 (3), 265–279.
Campbell, J.Y., Shiller, R.J., 1987. Cointegration and tests of present value models. The Journal of Political Economy 95, 1062–1088.
Cavaliere, G., Fanelli, L., Gardini, A., 2008. International dynamic risk sharing. Journal of Applied Econometrics 23, 1–16.
Cubadda, G., Hecq, A., 2001. On non-contemporaneous short-run comovements. Economics Letters 73, 389–397.
Cubadda, G., Hecq, A., Palm, F., 2009.
Studying co-movements in large multivariate models without multivariate modelling. Journal of Econometrics 148, 25–35.
Engle, R., Granger, C., 1987. Co-integration and error correction: representation, estimation, and testing. Econometrica 55, 251–276.
Engle, R., Kozicki, S., 1993. Testing for common features. Journal of Business and Economic Statistics 11, 369–395.
Franchi, M., 2007. The integration order of vector autoregressive processes. Econometric Theory 23 (3), 546–553.
Franchi, M., 2010. A representation theory for polynomial cofractionality in vector autoregressive models. Econometric Theory 26, 1201–1217.
Franchi, M., Paruolo, P., 2010. Inversion of regular analytic matrix functions: local Smith form and subspace duality. WP Facoltà di Economia 2010/08, Università dell'Insubria, Varese, Italy; available at http://www.uninsubria.it/pls/uninsubria/consultazione.mostra_pagina?id_pagina=10331.
Fuller, W.A., 1996. Introduction to Statistical Time Series. John Wiley & Sons, New York.
Gantmacher, F., 1959. The Theory of Matrices, vol. I. AMS Chelsea Publishing, New York.
Gourieroux, C., Peaucelle, I., 1988. Detecting a long run relationship (with application to the P.P.P. hypothesis). CEPREMAP Working Paper 8902.
Greene, R., Krantz, S., 1997. Function Theory of One Complex Variable. John Wiley & Sons, New York.
Hall, R.E., 1978. Stochastic implications of the life cycle-permanent income hypothesis: theory and evidence. The Journal of Political Economy 86 (6), 971–987.
Hecq, A., Palm, F., Urbain, J., 2002. Separation, weak exogeneity and P–T decompositions in cointegrated VAR systems with common features. Econometric Reviews 21 (3), 273–307.
Hecq, A., Palm, F., Urbain, J., 2006. Common cyclical features analysis in VAR models with cointegration. Journal of Econometrics 132, 117–141.
Johansen, P., 1931. Über osculierende Interpolation. Skand. Act. 14, 231–237.
Johansen, S., 1992. A representation of vector autoregressive processes integrated of order 2. Econometric Theory 8, 188–202.
Johansen, S., 1996. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press.
Johansen, S., 2009a. Cointegration. Overview and development. In: Andersen, T.G., Davis, R., Kreiss, J.-P., Mikosch, T. (Eds.), Handbook of Financial Time Series. Springer, New York, pp. 671–693.
Johansen, S., 2009b. Representation of cointegrated autoregressive processes with application to fractional processes. Econometric Reviews 28, 121–145.
King, R.G., Plosser, C.I., Stock, J.H., Watson, M.W., 1991. Stochastic trends and economic fluctuations. American Economic Review 81 (4), 819–840.
Kugler, P., Neusser, K., 1993. International real interest rate equalization: a multivariate time-series approach. Journal of Applied Econometrics 8 (2), 163–174.
Lippi, M., Reichlin, L., 1994. Common and uncommon trends and cycles. European Economic Review 38, 624–635.
Mellander, E., Vredin, A., Warne, A., 1992. Stochastic trends and economic fluctuations in a small open economy. Journal of Applied Econometrics 7 (4), 369–394.
Paruolo, P., 2003. Common dynamics in I(1) systems. Discussion Paper. WP Facoltà di Economia 2003/33, Università dell'Insubria, Varese, Italy. Invited Paper at the Conference 'Common features in Maastricht', Maastricht 14–16 December.
Paruolo, P., 2004. An I(2) model for VAR(1) processes—problem 04.3.1. Econometric Theory 20, 639–640.
Paruolo, P., 2005. An I(2) model for VAR(1) processes—solution to problem 04.3.1. Econometric Theory 21, 665–666.
Paruolo, P., 2006. Common trends and cycles in I(2) VAR systems. Journal of Econometrics 132, 143–168.
Paruolo, P., Rahbek, A., 1999. Weak exogeneity in I(2) VAR systems. Journal of Econometrics 93 (2), 281–308.
Reinsel, C., 1983. Some results on multivariate autoregressive index models. Biometrika 70, 145–156.
Schleicher, C., 2007. Codependence in cointegrated autoregressive models. Journal of Applied Econometrics 22, 137–159.
Tiao, G., Tsay, R., 1989. Model specification in multivariate time series (with discussion). Journal of the Royal Statistical Society, Series B 51, 157–213.
Vahid, F., Engle, R., 1993. Common trends and common cycles. Journal of Applied Econometrics 8, 341–360.
Vahid, F., Engle, R., 1997. Codependent cycles. Journal of Econometrics 80, 199–221.
Vahid, F., Issler, J., 2002. The importance of common cyclical features in VAR analysis: a Monte Carlo study. Journal of Econometrics 109, 341–363.
Zellner, A., Palm, F., 1974. Time series analysis and simultaneous equations econometric models. Journal of Econometrics 2, 17–54.
Journal of Econometrics 163 (2011) 118–126
Method of moments estimation of GO-GARCH models

H. Peter Boswijk^{a,∗}, Roy van der Weide^{a,b}

^a University of Amsterdam, Netherlands
^b World Bank, Washington DC, USA
article info

Article history: Available online 13 November 2010

JEL classification: C32

Keywords: Multivariate GARCH; Factor models; Method of moments; Common principal components
abstract

We propose a new estimation method for the factor loading matrix in generalized orthogonal GARCH (GO-GARCH) models. The method is based on eigenvectors of suitably defined sample autocorrelation matrices of squares and cross-products of returns. The method is numerically more attractive than likelihood-based estimation. Furthermore, the new method does not require strict assumptions on the volatility models of the factors, and therefore is less sensitive to model misspecification. We provide conditions for consistency of the estimator, and study its efficiency relative to maximum likelihood estimation using Monte Carlo simulations. The method is applied to European sector returns. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

The GO-GARCH model was proposed by van der Weide (2002), as a generalization of the orthogonal GARCH model of Ding (1994) and Alexander (2001). The starting point of the model is that an observed vector of returns can be expressed as a non-singular linear transformation of latent factors (either independent or conditionally uncorrelated) that have a GARCH-type conditional variance specification. A restricted version of the model where only a subset of the latent factors has a time-varying conditional variance has been analyzed recently by Lanne and Saikkonen (2007). This shows that a parsimonious version of the factor GARCH model by e.g. Diebold and Nerlove (1989) and Engle et al. (1990) is nested as a special case (where the variance matrix of the idiosyncratic error term will not be of full rank). The closely related model proposed by Vrontos et al. (2003) is also nested as a special case by imposing structure on the linear transformation. Recently, Fan et al. (2008) studied a general version of the model by relaxing the assumption of independent factors to conditionally uncorrelated factors. Note that one has considerable flexibility in specifying models for the factors. One could in principle also consider stochastic volatility as an alternative to the GARCH-type models (see e.g. Doz and Renault (2006)). For surveys on multivariate volatility models we refer to e.g. Bauwens et al. (2006) and Silvennoinen and
∗ Corresponding address: Department of Quantitative Economics, Universiteit van Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, Netherlands. Tel.: +31 20 5254316. E-mail address: [email protected] (H. Peter Boswijk).
doi:10.1016/j.jeconom.2010.11.011
Teräsvirta (2009). For an overview on common features, see Urga (2007), and for a glossary to volatility models, see Bollerslev (2008). The model is designed to balance generality against ease of estimation. Time-varying variances, time-varying correlations and (asymmetric) volatility spillovers are accommodated, which denote the key stylized facts of multivariate financial data. Moreover, the model is closed under linear transformations, i.e., a vector of portfolio returns satisfies the same model, albeit with other parameter values, as the original vector of asset returns. The model is also closed under temporal aggregation; see Hafner (2008). van der Weide (2002) proposed a two-step estimation method that requires joint maximum likelihood (ML) estimation of parameters that feature both in the linear transformation (between factors and observed data) and in the univariate GARCH specifications for the individual factors. Not all parameters of the linear transformation need to be estimated by ML, more than half can be identified from the spectral decomposition of the unconditional variance matrix. While the method works well, and numerical optimization of the likelihood function often converges without difficulties for dimensions up to fifteen, maximum likelihood can become problematic when the dimension is particularly large and/or when the model used to specify the likelihood function is considerably misspecified. This paper puts forward a three-step estimation method that is easy to implement and is numerically attractive. The first two steps define a method of moments (MM) estimator for the linear transformation that does not require any Newtontype optimization of an objective function, but instead only involves iterated matrix rotations, so that the method is free of numerical convergence problems regardless of the dimension. We identify the linear transformation by using the fact that the latent
factors are heteroskedastic. All that is assumed is that the factors exhibit persistence in variance and have finite fourth moments. Note that the idea of identification through heteroskedasticity in simultaneous equation models is not new; see e.g. Sentana and Fiorentini (2001) and Rigobon (2003). The third and final step involves estimation of the univariate GARCH-type models for each of the factors. An obvious application of multivariate GARCH models involves forecasting the conditional variance matrix for the purpose of optimal portfolio selection, hedging and risk management, and option pricing. Naturally, the models may also be used to examine patterns in conditional correlations and volatilities over time. Does volatility in one market spill over to other markets? Are correlations increasing or decreasing, i.e., are markets moving closer together over time, and do correlations jump up in periods of extreme volatility (such as in a financial crisis)? In a modest empirical example in this paper we examine the correlation over time between returns on European sector indices, and find that the degree of comovement between volatility and correlation depends on the general state of the economy. The outline of the paper is as follows. In Section 2, we formulate the GO-GARCH model, and discuss the currently available estimation methods. Section 3 introduces our method of moments estimator, and discusses how information from autocorrelation matrices at different lags may be efficiently combined. In Section 4 we use Monte Carlo simulations to study the efficiency of our estimator relative to (quasi-) maximum likelihood in low-dimensional systems. Section 5 contains an empirical application to a vector of ten European industry returns. Section 6 concludes.

2. The GO-GARCH model

2.1. Model and assumptions

Consider an m-vector time series {x_t}_{t≥1}, representing a vector of (daily) returns on m different assets.
Letting {F_t}_{t≥0} denote the filtration generated by {x_t}_{t≥1}, we assume that any possibly non-zero conditional mean has been subtracted from x_t, so that E(x_t | F_{t−1}) = 0. The GO-GARCH model imposes a structure on the conditional variance matrix Σ_t = var(x_t | F_{t−1}) = E(x_t x_t′ | F_{t−1}), implied by:

Assumption 1. The process {x_t}_{t≥1} satisfies the representation

x_t = Z y_t = Z H_t^{1/2} ε_t,   (1)

H_t = diag(h_{1t}, . . . , h_{mt}),   (2)
where Z is an m × m non-singular matrix, where {{hit }t ≥1 , i = 1, . . . , m} are positive, {Ft −1 }-adapted processes with E (hit ) = 1, and where {εt }t ≥1 is a vector martingale difference sequence, with E (εt |Ft −1 ) = 0 and var(εt |Ft −1 ) = Im . The model implies that the observed vector of returns xt can be written as a non-singular transformation of a latent vector process yt (of the same dimension m), the components yit of which satisfy E (yit |Ft −1 ) = 0,
var(yit |Ft −1 ) = hit ,
cov(yit , yjt |Ft −1 ) = 0,
i ̸= j = 1, . . . , m,
i.e., the components of yt are conditionally uncorrelated. The original formulation of the GO-GARCH model involved the stronger assumption of independence of the components of yt , but for the methods presented in the present paper, the conditional uncorrelatedness assumption (proposed by Fan et al. (2008)) suffices. The assumptions also imply that yt is a covariance stationary process with mean 0 and unconditional variance E (Ht ) = Im . This in turn
implies that xt is covariance stationary with (conditional) mean zero, conditional variance
Σ_t = var(x_t | F_{t−1}) = Z H_t Z′,   (3)

and unconditional variance

Σ = var(x_t) = Z Z′.   (4)
The conditional variances h_{it} are assumed to follow a GARCH-type structure. One possibility, as considered by van der Weide (2002), is to assume separate univariate GARCH(1, 1) specifications

h_{it} = (1 − α_i − β_i) + α_i y²_{i,t−1} + β_i h_{i,t−1},   α_i, β_i ≥ 0,  α_i + β_i < 1,   (5)
which, under a suitable starting value assumption on hi0 , implies independence of the components yit . Fan et al. (2008) propose a more flexible structure, where hit may depend on yj,t −k , j ̸= i, k ≥ 1. A simple extension of (5) is their extended GARCH(1, 1) specification:
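A minimal simulation sketch of the model under Assumption 1 with GARCH(1, 1) factors as in (5); the link matrix Z and the GARCH parameters below are hypothetical values chosen for illustration, not taken from the paper. The sample variance of the simulated returns should approximate the unconditional variance Σ = ZZ′ in (4).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 20000
alpha = np.array([0.05, 0.10, 0.08])    # hypothetical GARCH(1,1) parameters
beta = np.array([0.90, 0.85, 0.88])
Z = np.array([[1.0, 0.3, 0.1],          # hypothetical non-singular link matrix
              [0.2, 1.0, 0.4],
              [0.1, 0.2, 1.0]])

h = np.ones(m)                  # E(h_it) = 1 under specification (5)
y = np.zeros(m)
x = np.empty((n, m))
for t in range(n):
    # univariate GARCH(1,1) recursions for the conditionally uncorrelated factors
    h = (1.0 - alpha - beta) + alpha * y**2 + beta * h
    y = np.sqrt(h) * rng.standard_normal(m)
    x[t] = Z @ y                # observed returns x_t = Z y_t

Sigma_hat = x.T @ x / n         # sample variance, close to Z Z' for large n
```

The chosen parameters satisfy the fourth-moment condition for a Gaussian GARCH(1, 1), so the sample variance converges to ZZ′.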
h_{it} = (1 − Σ_{j=1}^{m} α_{ij} − β_i) + Σ_{j=1}^{m} α_{ij} y²_{j,t−1} + β_i h_{i,t−1},   α_{ij}, β_i ≥ 0,  Σ_{j=1}^{m} α_{ij} + β_i < 1.   (6)
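The cross terms in (6) are what allow volatility spillovers between factors. A sketch of a single step of the recursion, with hypothetical parameter values and the intercept chosen so that E(h_{it}) = 1:

```python
import numpy as np

def extended_garch_step(h_prev, y_prev, A, beta):
    """One step of the extended GARCH(1,1) recursion (6).

    h_prev, y_prev, beta: (m,) arrays; A: (m, m) matrix of alpha_ij.
    Off-diagonal entries of A let factor i react to lagged squares
    of the other factors (volatility spillovers)."""
    omega = 1.0 - A.sum(axis=1) - beta   # intercept implying E(h_it) = 1
    return omega + A @ (y_prev ** 2) + beta * h_prev

A = np.array([[0.05, 0.02],
              [0.01, 0.08]])            # hypothetical spillover matrix
beta = np.array([0.90, 0.85])
h = extended_garch_step(np.ones(2), np.array([1.5, -0.5]), A, beta)
# h ≈ [1.0475, 0.9525]
```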
Intermediate versions, where some of the α_{ij}, j ≠ i, are restricted to zero, can also be considered. It should be emphasized that Assumption 1, as well as the estimation methods proposed in this paper, also allow for various other specifications of the conditional variance process, including leverage effects and models formulated in terms of log-volatilities. The assumption also allows for stochastic volatility, as long as the projection h_{it} of the latent stochastic volatility process of y_{it} on the observed price history F_{t−1} is correlated with lagged squares y²_{i,t−k}.

Consider the polar decomposition of Z:

Z = SU,   (7)
where S is a positive definite symmetric matrix, and U is an orthogonal matrix. Using (4), it follows that Σ = S 2 , so that S is the symmetric square root of Σ , i.e., S = PL1/2 P ′ , where PLP ′ is the spectral decomposition of Σ . This implies that part of the matrix Z may be identified from the unconditional information Σ = var(xt ), and the problem of estimating Z may be reduced to the problem of identifying the orthogonal matrix U from the conditional information. In other words, defining the (unconditionally) standardized returns st = Σ −1/2 xt = S −1 xt , we find that st follows a GO-GARCH specification st = Uyt with an orthogonal link matrix U. Note that van der Weide (2002) and Boswijk and van der Weide (2006) consider, instead of (7), the singular value decomposition Z = PL1/2 U ∗ , where U ∗ = P ′ U is another orthogonal matrix. This leads to analyzing the standardized principal components s∗t = L−1/2 P ′ xt , satisfying s∗t = U ∗ yt . Here we follow Lanne and Saikkonen (2007) in using the polar decomposition, which circumvents identification problems that arise when Σ has eigenvalues with a multiplicity. As an extreme example, if Σ = Im , then S = Im and L = Im , but P may be an arbitrary orthogonal matrix; in such cases the principal components s∗t would form an arbitrary orthogonal transformation of the observation vector xt , whereas st = xt . Note also that the O-GARCH model of Alexander (2001) assumes that the standardized principal components s∗t are independent GARCH processes, which corresponds to a special case of our model with U ∗ = Im (hence s∗t = yt ), which in the parametrization considered here requires U = P.
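This identification step can be sketched numerically: starting from a hypothetical non-singular Z, form Σ = ZZ′, take its spectral decomposition Σ = PLP′, build the symmetric square root S = PL^{1/2}P′, and recover the orthogonal factor of the polar decomposition (7) as U = S^{−1}Z.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((4, 4))          # hypothetical non-singular link matrix

Sigma = Z @ Z.T                          # unconditional variance, eq. (4)
L, P = np.linalg.eigh(Sigma)             # spectral decomposition Sigma = P L P'
S = P @ np.diag(np.sqrt(L)) @ P.T        # symmetric square root of Sigma
U = np.linalg.inv(S) @ Z                 # orthogonal part of Z = S U

assert np.allclose(S, S.T)               # S is symmetric
assert np.allclose(U @ U.T, np.eye(4))   # U is orthogonal
assert np.allclose(S @ U, Z)             # polar decomposition recovered
```

In practice Σ is unknown and is replaced by the sample variance of x_t, so that only the orthogonal matrix U remains to be identified from conditional information.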
H. Peter Boswijk, R. van der Weide / Journal of Econometrics 163 (2011) 118–126
2.2. Reduced-factor models

Lanne and Saikkonen (2007) analyze a special case of the GO-GARCH model with independent components, in which only a subset of the components of y_t has a time-varying conditional variance. The motivation for this is that if the number of assets m is large, then it may be reasonable to expect that the conditional variance matrix Σ_t can be described by a number r < m of heteroskedastic factors. Indeed, the model then reduces to a parsimoniously parametrized version of the factor ARCH model of Engle et al. (1990) and Diebold and Nerlove (1989). The variance matrix H_t can in this case be expressed as

$$H_t = \begin{bmatrix} H_{1t} & 0 \\ 0 & I_{m-r} \end{bmatrix}, \qquad H_{1t} = \operatorname{diag}(h_{1t}, \ldots, h_{rt}).$$

Partitioning Z = [Z_1 : Z_2] and U = [U_1 : U_2] conformably with H_t, the model implies that

$$\Sigma_t = Z_1 H_{1t} Z_1' + Z_2 Z_2' = \Sigma + Z_1 (H_{1t} - I_r) Z_1',$$
$$\operatorname{var}(s_t \mid \mathcal{F}_{t-1}) = U_1 H_{1t} U_1' + U_2 U_2' = I_m + U_1 (H_{1t} - I_r) U_1'.$$

These representations imply that in the reduced-factor model, the matrix U_2 is identified only by the properties U_1'U_2 = 0 and U_2'U_2 = I_{m−r}. In other words, U_2, and hence Z_2 = SU_2, is identified only up to orthogonal transformations of its columns. In a companion paper (Boswijk and van der Weide, 2008), we propose a testing procedure for the hypothesis of a reduced-factor model, based on the same sample autocorrelation matrix that will be used in the next section to estimate U. Unless indicated otherwise, in the present paper we assume a full-factor GO-GARCH model.

2.3. Currently available estimation methods

In this subsection we briefly review the currently available methods for estimating the model implied by Assumption 1, or specific versions thereof. Although the GO-GARCH model can be considerably more parsimonious than alternative multivariate GARCH models, for larger m it becomes harder to maximize its likelihood function over the entire parameter space, which has motivated the development of two-step approximations of maximum likelihood, as well as alternative methods that are easier to apply in larger dimensions.

Gaussian maximum likelihood estimation of the model with independent GARCH factors was analyzed by van der Weide (2002). He considered the standardized returns s_t as observable time series, which leads to a log-likelihood function of the form

$$\ell(\theta) = -\frac{1}{2} \sum_{t=1}^{n} \left\{ m \log(2\pi) + \log |H_t(\theta)| + s_t' U(\theta_1) H_t(\theta)^{-1} U(\theta_1)' s_t \right\}, \tag{8}$$

where θ = (θ_1', θ_2')', with θ_1 a vector of dimension ½m(m − 1) characterizing the m × m orthogonal matrix U = U(θ_1), and θ_2 a 2m-dimensional vector of GARCH parameters. A convenient parametrization of U(θ_1) as the product of ½m(m − 1) rotation matrices, each characterized by one parameter, is discussed in van der Weide (2002). Note that H_t depends on U via y_{t−1} = U's_{t−1}, so that H_t = H_t(θ) is characterized by the full parameter vector θ. By applying the general asymptotic results of Comte and Lieberman (2003) for BEKK models (in which the GO-GARCH model is nested), conditions for consistency and asymptotic normality of the maximum likelihood estimator are obtained.

In practice s_t is not observed, and has to be estimated by ŝ_t = Σ̂^{−1/2}x_t, with Σ̂ the sample variance matrix n^{−1} ∑_{t=1}^n x_t x_t'. (If m is large relative to n, one could also consider shrinkage-type estimators of Σ; see e.g. Ledoit and Wolf (2004).) Therefore, in practice the procedure of van der Weide (2002) is a two-step approximation of maximum likelihood. If we let θ_0 = vech(Σ), then full maximum likelihood would entail maximizing (8), with s_t replaced by Σ(θ_0)^{−1/2}x_t, over (θ_0', θ_1', θ_2')'. Lanne and Saikkonen (2007) derived asymptotic properties of such a full maximum likelihood procedure for the reduced-factor model considered in Section 2.2.

Boswijk and van der Weide (2006) proposed a nonlinear least-squares estimator of U, based on the autocorrelation properties of the matrix-valued process S_t = s_t s_t' − I_m. Let B̂ be the minimizer, over all symmetric matrices, of the least-squares criterion

$$Q(B) = n^{-1} \sum_{t=1}^{n} \operatorname{tr}(S_t - B S_{t-1} B)^2.$$

Using the fact that S_t = U Y_t U', with Y_t = y_t y_t' − I_m, it follows that

$$Q(B) = n^{-1} \sum_{t=1}^{n} \operatorname{tr}(Y_t - A Y_{t-1} A)^2, \qquad A = U'BU.$$

Boswijk and van der Weide (2006) derived conditions under which the probability limit of Â = U'B̂U is a diagonal matrix, which in turn implies that the eigenvector matrix Û of B̂ is a consistent estimator of U. This estimator can be embedded in a three-step procedure: first estimate Σ to construct ŝ_t, then estimate U based on ŝ_t, and finally estimate the GARCH parameters based on ŷ_t = Û'ŝ_t.

An alternative estimator of U was proposed by Fan et al. (2008). The starting point of their analysis is that the conditional uncorrelatedness restriction E(y_it y_jt | F_{t−1}) = 0 is equivalent to E[y_it y_jt I(B)] = 0 for all B ∈ F_{t−1}, where I(·) is the indicator function. Let u_i denote the ith column of U, so that y_it = u_i's_t, let 𝓑 be a collection of subsets of R^m, and let p be an arbitrary integer. Then the columns of U should satisfy the (population) criterion

$$\Psi(U) = \sum_{1 \le i < j \le m} \sum_{B \in \mathcal{B}} \sum_{k=1}^{p} \left| u_i' E[s_t s_t' I(s_{t-k} \in B)] u_j \right| = 0. \tag{9}$$

Fan et al. (2008) propose to estimate U by minimizing a sample analog of Ψ(U), and provide a bootstrap inference procedure for this estimator, and for a test of the conditional uncorrelatedness hypothesis. Again, this estimator of U should be preceded by the estimation of Σ and s_t, and followed by the estimation of the (extended) GARCH models for ŷ_t = Û'ŝ_t.

All methods considered in this subsection require numerical maximization of a criterion function over a high-dimensional parameter space. Therefore, as m increases, each of these methods is likely to run into numerical problems, such as failure of a Newton-type optimization procedure to converge, or the possibility of ending up in a local maximum. The estimator proposed in the next section, on the other hand, only requires the calculation of common eigenvectors of a sequence of sample moment matrices, and can therefore be applied in arbitrary dimensions m.

3. Method of moments estimation

3.1. The estimator based on a single lag

The starting point of our method of moments estimator is the same as in Boswijk and van der Weide (2006), i.e., the autocorrelation properties of the (mean zero) matrix-valued processes S_t = s_t s_t' − I_m and Y_t = y_t y_t' − I_m. For the autocorrelation matrices of these processes to be well defined (and consistently estimated by their sample analogs), and to be able to identify U from them, we make the following assumption.
Assumption 2. The process {y_t}_{t≥1} is strictly stationary and ergodic, and has finite fourth moments κ_i = E(y⁴_it) < ∞, i = 1, ..., m. Furthermore, the autocorrelations ρ_ik = corr(y²_it, y²_{i,t−k}) and cross-covariances τ_ijk = cov(y²_it, y_{i,t−k}y_{j,t−k}) satisfy, for some integer p,

$$\min_{1 \le i \le m}\ \max_{1 \le k \le p} |\rho_{ik}| > 0, \qquad \max_{1 \le k \le p,\ 1 \le i < j \le m} |\tau_{ijk}| = 0.$$
The stationarity assumption, as well as the assumptions on the moments, would be implied by independent GARCH processes for y_it, under suitable parameter restrictions guaranteeing finite kurtosis; see He and Teräsvirta (1999). Because estimated GARCH parameters in practice do not always satisfy the finite-kurtosis restrictions, this assumption is not without loss of generality. In the next section, we investigate the sensitivity of our method to deviations from this assumption through Monte Carlo simulations. The non-zero autocorrelation assumption allows us to identify U from the first p autocorrelation coefficients of y²_it. It would be hard to think of processes that do display volatility clustering but violate this assumption (i.e., with corr(y²_it, y²_{i,t−k}) = 0 for all k = 1, ..., p). Finally, the zero cross-covariances τ_ijk exclude dependence of h_it on whether y_{i,t−k} and y_{j,t−k} have the same sign. Although this may exclude particular asymmetries in volatility, note that the assumption does allow for the extended GARCH model (6), possibly augmented with y_{i,t−1} and y_{j,t−1} (but not their product) to allow for leverage effects. Define the autocovariance matrices
$$\Gamma_k(y) = E(Y_t Y_{t-k}), \qquad k = 1, 2, \ldots. \tag{10}$$
Note that Γ_k(y) does not contain all separate kth order (cross-)autocovariances of squares and cross-products of y_t (which would require vectorizing Y_t), but is an m × m matrix with elements

$$\Gamma_k(y)_{ij} = \sum_{\ell=1}^{m} \operatorname{cov}(y_{it} y_{\ell t},\, y_{\ell,t-k} y_{j,t-k}).$$
Therefore, Assumptions 1 and 2 imply, using var(y²_it) = E(y⁴_it) − E(y²_it)² = κ_i − 1,

$$\Gamma_k(y)_{ij} = \operatorname{cov}(y_{it}^2,\, y_{i,t-k}y_{j,t-k}) = \begin{cases} (\kappa_i - 1)\rho_{ik}, & j = i, \\ \tau_{ijk} = 0, & j \neq i, \end{cases}$$

or in other words

$$\Gamma_k(y) = \operatorname{diag}((\kappa_1 - 1)\rho_{1k}, \ldots, (\kappa_m - 1)\rho_{mk}).$$

For the corresponding autocorrelation matrix, we thus find

$$\Phi_k(y) = \Gamma_0(y)^{-1/2}\Gamma_k(y)\Gamma_0(y)^{-1/2} = \operatorname{diag}(\rho_{1k}, \ldots, \rho_{mk}).$$

For the process s_t = Uy_t, the corresponding autocovariance and autocorrelation matrices satisfy

$$\Gamma_k(s) = E(S_t S_{t-k}) = E(U Y_t U' U Y_{t-k} U') = U\Gamma_k(y)U',$$

and hence

$$\Phi_k(s) = \Gamma_0(s)^{-1/2}\Gamma_k(s)\Gamma_0(s)^{-1/2} = U\Phi_k(y)U'.$$

Because Γ_k(y) and Φ_k(y) are diagonal matrices and U is an orthogonal matrix, we find that under Assumptions 1 and 2, U may be identified from the eigenvectors of either Γ_k(s) or Φ_k(s). Consider the sample analogs of Γ_k(s) and Φ_k(s):

$$\hat\Gamma_k(s) = \frac{1}{n}\sum_{t=k+1}^{n} S_t S_{t-k} = \frac{1}{n}\sum_{t=k+1}^{n} (s_t s_t' - I_m)(s_{t-k}s_{t-k}' - I_m), \tag{11}$$

$$\hat\Phi_k(s) = \hat\Gamma_0(s)^{-1/2}\hat\Gamma_k(s)\hat\Gamma_0(s)^{-1/2}, \tag{12}$$

where Γ̂_0(s)^{−1/2} is the symmetric square root of Γ̂_0(s)^{−1}. We define our estimator Û_k as the matrix of eigenvectors of the symmetrized version Φ̃_k(s) = ½(Φ̂_k(s) + Φ̂_k(s)') of Φ̂_k(s). Although in principle one could also take the eigenvectors of the corresponding symmetrized version of Γ̂_k(s) as an estimator of U, preliminary Monte Carlo experiments have indicated that the standardization used to construct Φ̂_k(s) leads to a more efficient estimator.

3.2. Combining information from different lags

Although one could in principle use the estimator Û_k proposed in the previous subsection for one particular choice of the lag length k, we may obtain a more efficient estimator by combining information from different lags. This is relevant in particular for daily financial data, where the autocorrelation function of the squares typically is small but slowly decaying. This implies that the eigenvalues {ρ_ik}_{i=1}^m of Φ_k(s) will be close to zero (and hence close to each other), yielding weakly identified eigenvectors for fixed k. Provided that the autocorrelation functions {ρ_ik}_{k=1}^∞ are sufficiently different across i, pooling the information from different Φ_k(s) matrices will then increase the efficiency of the estimator.

Let p denote the maximal lag length, and let Φ̃_k = Φ̃_k(s). The property that each of the population matrices Φ_k = Φ_k(s) has the same matrix of eigenvectors is shared with the so-called common principal components (CPC) model, see Flury (1984), where {Φ_k}_{k=1}^p represent the covariance matrices of p different random vectors. Under the assumption that these are Gaussian, and that we have p independent i.i.d. samples, Flury (1984) derived the maximum likelihood estimator of the common eigenvector matrix U. Because these assumptions are clearly violated here, we consider a closely related least-squares estimator, discussed by Beaghen (1997). This estimator minimizes the criterion function

$$S(U) = \sum_{k=1}^{p} \operatorname{tr}\left[ \big(U'\tilde\Phi_k U - \operatorname{diag}(U'\tilde\Phi_k U)\big)' \big(U'\tilde\Phi_k U - \operatorname{diag}(U'\tilde\Phi_k U)\big) \right] \tag{13}$$

over all orthogonal matrices U. Although this minimization problem does not have a closed-form solution, Beaghen (1997) shows that the so-called F–G algorithm of Flury and Gautschi (1986) can easily be adapted to this least-squares estimator. This algorithm involves an iteration of rotations until the first-order conditions are satisfied, which can be rewritten as

$$u_i' \sum_{k=1}^{p} (\lambda_{ki} - \lambda_{kj})\tilde\Phi_k u_j = 0, \qquad i \neq j = 1, \ldots, m, \tag{14}$$

$$\lambda_{ki} = u_i'\tilde\Phi_k u_i, \qquad i = 1, \ldots, m;\ k = 1, \ldots, p, \tag{15}$$

with u_i the columns of U. This shows that the solution diagonalizes a weighted average of the matrices Φ̃_k, with weights proportional to the differences in eigenvalues. Matrices Φ̃_k that are less informative about the matrix U, because of the near multiplicity of their eigenvalues, are therefore given less weight.

The estimator we propose can be summarized as follows.

Summary 1. Starting from an m-vector of daily returns {x_t}_{t=1}^n, possibly corrected (by least squares) for a constant mean and serial correlation, the model is estimated in the following steps:
1. Estimate the unconditional variance matrix Σ̂ = n^{−1} ∑_{t=1}^n x_t x_t', its spectral decomposition Σ̂ = PLP', and hence its symmetric square root S = PL^{1/2}P' and the standardized returns s_t = S^{−1}x_t = PL^{−1/2}P'x_t.
2. Calculate the matrix-valued series S_t = s_t s_t' − I_m, its sample autocovariance matrices Γ̂_k(s), k = 0, ..., p, and its sample autocorrelation matrices Φ̂_k(s), k = 1, ..., p, from (11) and (12).
3. Using the symmetrized autocorrelation matrices Φ̃_k = ½(Φ̂_k(s) + Φ̂_k(s)'), estimate U by minimizing the criterion function S(U) given in (13), based on the adapted F–G algorithm.
4. Estimate the conditionally uncorrelated components y_t by ŷ_t = Û's_t, and estimate separate GARCH-type models for the components y_it by quasi-maximum likelihood.

3.3. Consistency

In this subsection we prove the consistency of the estimator Û defined in the previous section. We use the square root d(·, ·) of a symmetric version of the distance measure D(·, ·) for orthogonal matrices introduced by Fan et al. (2008):

$$d(U, \hat U) = \left\{ \tfrac{1}{2}\left[ D(U, \hat U) + D(\hat U, U) \right] \right\}^{1/2}, \qquad D(\hat U, U) = 1 - \frac{1}{m}\sum_{i=1}^{m} \max_{1 \le j \le m} |u_i'\hat u_j|.$$
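A minimal sketch of this distance measure (function and variable names are ours), illustrating that it vanishes on matrices that differ only by a permutation and sign change of columns:

```python
import numpy as np

def D(u, u_hat):
    """D = 1 - (1/m) * sum_i max_j |u_i' u_hat_j|  (Fan et al., 2008)."""
    m = u.shape[1]
    return 1.0 - np.abs(u.T @ u_hat).max(axis=1).sum() / m

def d(u, u_hat):
    """Square root of the symmetrized version, as used in the text.
    Tiny negative values from rounding are clipped before the root."""
    return np.sqrt(np.maximum(0.0, 0.5 * (D(u, u_hat) + D(u_hat, u))))

# d is zero on the equivalence class of U (column permutations and sign flips)
rng = np.random.default_rng(1)
u = np.linalg.qr(rng.standard_normal((3, 3)))[0]       # a random orthogonal U
u_equiv = u[:, [2, 0, 1]] * np.array([1.0, -1.0, 1.0])  # permute and flip signs
```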
The motivation for D(·, ·) is that in the model s_t = Uy_t, the columns of the matrix U may be reordered and multiplied by −1, provided the rows of y_t are changed in the same way. In other words, the model is invariant under permutation and sign change of the columns of U. The modification d(·, ·) is a distance function that satisfies the properties of a metric (symmetry, triangle inequality¹), provided that an orthogonal matrix is identified with its equivalence class. An identification assumption needed for consistency of Û is the following:

Assumption 3. In the model defined by Assumptions 1 and 2,

$$\max_{1 \le k \le p}\ \min_{1 \le i < j \le m} |\rho_{ik} - \rho_{jk}| > 0.$$
The assumption excludes the possibility that two squared components y²_it and y²_jt have the same autocorrelation function for k = 1, ..., p. The reason for this assumption is that the autocorrelations are the eigenvalues of the matrix Φ_k(s), and if this matrix has eigenvalues with a multiplicity, then the corresponding submatrix of eigenvectors is only identified up to orthogonal transformations. Because such transformations will typically destroy the property of the true matrix U that U's_t = y_t is a vector of conditionally uncorrelated components, this would result in an inconsistent estimator Û.
Theorem 1. Consider the MM-CPC estimator Û, minimizing (13). Then, under Assumptions 1–3, and as n → ∞, d(U, Û) →p 0.

Proof. From the law of large numbers for stationary ergodic Markov chains, see Jensen and Rahbek (2007), it follows that under Assumptions 1 and 2, and as n → ∞, Γ̂_k(s) →p Γ_k(s) and Φ̂_k(s) →p Φ_k(s). This implies that Û converges in probability to a matrix satisfying (14)–(15), with Φ̃_k(s) replaced by U diag(ρ_1k, ..., ρ_mk)U'. If the eigenvalues ρ_ik are distinct for at least one k = 1, ..., p, as implied by Assumption 3, then these first-order conditions are only satisfied by a matrix in the same equivalence class as U (defined by permutation and sign change of the columns), so that d(U, Û) →p 0. □

¹ Although we have not been able to prove the triangle inequality, numerically it appears that d(·, ·) satisfies this property; the original distance D(·, ·) violates it.
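The estimator of Summary 1 (steps 1–3) can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's code: in particular, step 3 replaces the adapted F–G algorithm with the eigenvectors of the pooled matrix ∑_k Φ̃_k, which yields the common eigenvectors only in the idealized case where the Φ̃_k share them exactly:

```python
import numpy as np

def mm_cpc_estimate(x, p=10):
    """Sketch of Summary 1, steps 1-3 (illustrative names throughout)."""
    n, m = x.shape
    # Step 1: unconditional standardization s_t = Sigma^{-1/2} x_t
    lam, pmat = np.linalg.eigh(x.T @ x / n)
    s = x @ (pmat @ np.diag(lam ** -0.5) @ pmat.T)
    # Step 2: S_t = s_t s_t' - I_m and its autocovariance/autocorrelation matrices
    S = np.einsum('ti,tj->tij', s, s) - np.eye(m)
    gamma = [sum(S[t] @ S[t - k] for t in range(k, n)) / n for k in range(p + 1)]
    l0, p0 = np.linalg.eigh(gamma[0])
    g0 = p0 @ np.diag(l0 ** -0.5) @ p0.T          # Gamma_0(s)^{-1/2}
    phi = [g0 @ g @ g0 for g in gamma[1:]]
    phi_tilde = [(f + f.T) / 2 for f in phi]      # symmetrized Phi_tilde_k
    # Step 3 (simplified stand-in for F-G): eigenvectors of the pooled matrix
    return np.linalg.eigh(sum(phi_tilde))[1]      # columns: estimate of U

rng = np.random.default_rng(2)
u_hat = mm_cpc_estimate(rng.standard_normal((1500, 3)), p=5)
```

The returned matrix is orthogonal by construction, and is identified only up to permutation and sign change of its columns, as discussed above.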
The next step is to derive the asymptotic distribution of the estimator. A starting point would be to derive conditions under which joint asymptotic normality applies to (Φ̃_1(s), ..., Φ̃_p(s)), which would lead to asymptotic normality of Û, analogously to the results obtained by Flury (1984) and Beaghen (1997). It is clear that such results would require, at the minimum, finite eighth moments of y_it, which is likely to be violated in practical applications, and is therefore not considered here. Moreover, asymptotic normality of the parameter estimators in Û is in itself not a very useful result, unless it helps us to make inference on the parameters directly characterizing the volatility dynamics, or helps us to evaluate estimation uncertainty in volatility and correlation forecasts. We expect that such results are more easily obtained by bootstrap procedures; we leave this for future work.

4. Monte Carlo simulations

In this section we study the finite-sample performance of the estimator proposed in this paper, in comparison with maximum likelihood. We focus on a trivariate system (m = 3), and we consider four different data-generating processes (DGPs) for the conditionally uncorrelated process {y_t}_{t≥1}. The (relative) efficiency of the two estimators is evaluated using the root mean square distance (RMSD), i.e., the square root of the average of d(U, Û)² over 5000 Monte Carlo replications. For both the MM-CPC estimator and the ML estimator, the distance d(U, Û) is invariant to U, in the sense that if Û_1 and Û_2 are estimates based on data generated using U_1 and U_2, respectively, then d(U_1, Û_1) = d(U_2, Û_2). Therefore, the choice of U in the DGP is irrelevant. Three sample sizes, n ∈ {500, 1000, 2000}, are considered. We first consider two DGPs in which the components of y_t are independent Gaussian GARCH(1, 1) processes, so that the maximum likelihood estimator is based on a well-specified model:
• DGP A: (α_1, β_1) = (0.055, 0.94); (α_2, β_2) = (0.16, 0.8); (α_3, β_3) = (0.25, 0.65);
• DGP B: (α_1, β_1) = (0.095, 0.9); (α_2, β_2) = (0.26, 0.7); (α_3, β_3) = (0.25, 0.65).

Under DGP A, all components of y_t have finite kurtosis, so that Assumption 2 is satisfied. The autocorrelation decay rate α + β varies from 0.995 to 0.9, reflecting empirically relevant persistence in the autocorrelation functions of y²_it. The initial autocorrelations are such that the three autocorrelation functions cross each other, implying that the average autocorrelations (over p lags) of the three series can be close to each other (depending on p). Under DGP B, the first and second components have infinite kurtosis; this DGP is included to investigate how sensitive the estimator is to deviations from the finite fourth moment assumption. The number of lags used in the MM-CPC estimator is fixed at p = 100. Preliminary simulations have indicated that, for the type of persistence in autocorrelations considered here, the efficiency does not improve much beyond p = 50, so that p = 100 may be considered a conservative choice. It should be emphasized that the improvements over p = 1 are substantial: the RMSD of the MM estimator based on only the first lag is typically about 3–5 times larger than that of the MM-CPC estimator with p = 100. The results for the first two DGPs are given in Fig. 1, top panels.
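A Gaussian GARCH(1,1) component of the kind used in these DGPs can be simulated as follows. This is a hedged sketch, not the paper's simulation code; in particular, setting the intercept ω = 1 − α − β so that the unconditional variance equals one is our normalization assumption:

```python
import numpy as np

def simulate_garch11(n, alpha, beta, rng):
    """y_t = sqrt(h_t) * eps_t, h_t = omega + alpha * y_{t-1}^2 + beta * h_{t-1},
    with Gaussian eps_t and omega = 1 - alpha - beta (unit unconditional variance)."""
    y = np.empty(n)
    omega = 1.0 - alpha - beta
    h = 1.0                                   # start at the unconditional variance
    for t in range(n):
        y[t] = np.sqrt(h) * rng.standard_normal()
        h = omega + alpha * y[t] ** 2 + beta * h
    return y

# DGP A, mixed through an arbitrary orthogonal link matrix U (illustrative)
rng = np.random.default_rng(3)
params_a = [(0.055, 0.94), (0.16, 0.8), (0.25, 0.65)]
y = np.column_stack([simulate_garch11(2000, a, b, rng) for a, b in params_a])
u = np.linalg.qr(rng.standard_normal((3, 3)))[0]
s = y @ u.T                                   # observations s_t = U y_t
```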
Fig. 1. Root mean square distance of MM and ML estimators.
We observe that for the first two DGPs, the MM-CPC estimator has an RMSD about two and a half times larger than that of the ML estimator. This should be seen as the price we pay for using a more robust and computationally less intensive estimator than ML. The RMSD of both estimators appears to decrease with the sample size at approximately the rate n^{−1/2}. (In a bivariate model where U is characterized by a single angle φ, it can be shown that the average d(U, Û)² is approximately equal to half the mean squared error of φ̂.) Most striking is that the MM estimator in DGP B, which does not satisfy Assumption 2 because some of its components have infinite kurtosis, displays the same qualitative behaviour as in DGP A, and in fact has a smaller RMSD. Therefore, although the definition of the moment matrices Γ_k(s) and Φ_k(s) requires the existence of fourth moments, we observe that the estimator is not negatively affected by a departure from this assumption.

To investigate the effect of misspecification on the relative performance of the two methods, we now consider the following data-generating processes. DGP C is an extended GARCH specification, of the form

h_1t = 0.005 + 0.035y²_{1,t−1} + 0.02y²_{3,t−1} + 0.94h_{1,t−1},
h_2t = 0.04 + 0.08y²_{1,t−1} + 0.08y²_{2,t−1} + 0.8h_{2,t−1},
h_3t = 0.1 + 0.25y²_{3,t−1} + 0.65h_{3,t−1},

where the standardized innovations ε_it now follow a standardized Student's t(5) distribution (with ε_it independent of ε_jt, i ≠ j). Extending the results of Hafner (2003), it can be shown that the components of y_t have finite kurtosis under this DGP. Clearly, the ML estimator is now based on a misspecified likelihood (assuming independent GARCH processes), whereas the MM estimator is still valid. Finally, DGP D has the same GARCH parameters as DGP A, but now the standardized innovations ε_it follow a standardized Student's t(3) distribution (implying infinite kurtosis for all components of y_t).
In this case the misspecification of the likelihood function concerns only the shape of the innovation density, not the volatility dynamics. The results in the bottom panels of Fig. 1 show that for these two DGPs and the sample sizes considered here, the MM estimator is in fact more efficient than the ML estimator. Most striking about the results for DGP C is that the RMSD appears to decrease more slowly as a function of the sample size than in the other cases. Comparison of the results for DGP D and DGP A (which differ only in the innovation distribution) shows that the performance of the MM estimator
improves in the presence of fat-tailed innovations (despite the lack of finite fourth moments), whereas the ML estimator performs much worse. The superiority of the ML estimator for the first two DGPs indicates that it is the preferred method of estimation when the dimension of the problem is not too large and the latent processes are known to be independent Gaussian GARCH processes. On the other hand, the results for DGPs C and D illustrate that maximum likelihood estimation can be sensitive to both volatility misspecification and innovation distribution misspecification, and that the MM-CPC method can be more robust in such cases. Furthermore, for larger dimensions, likelihood maximization will run into convergence problems, especially for misspecified models, which is not the case for the MM-CPC method.

5. Empirical application to European sector indices

In this section we analyze an empirical GO-GARCH model for the Dow Jones STOXX 600 European stock market industry indices. From www.stoxx.com, we downloaded daily data, from the beginning of January 1992 through the end of January 2010, on the 10 industry indices, yielding n = 4650 daily log-returns. Table 1 provides some descriptive statistics. The sample means are relatively small, which is due to the recent financial crisis: by the end of 2009 many stock prices were back at their 2003 levels. At the high end we find Basic Materials; at the low end we have Consumer Services. The latter is also among the least volatile industries, whereas the most volatile is the Technology industry. Note that while the first-order autocorrelation is statistically significant for some of the industries, the coefficients are all small in size. For the analysis, all first-order autocorrelations are removed from the data. Estimates not reported in the paper show that all unconditional correlations range between 0.5 and 0.9, which suggests that the industries exhibit strong linkages.
The industries that show some of the highest correlations include Industrials, Financials and Consumer Services. Industries that exhibit comparatively low correlations include Oil and Gas, Technology and Health Care.

Table 1
Annualized means and standard deviations, and first-order autocorrelations of returns.

Industry            Mean     Std. dev.   ac(1)
Basic materials     0.085    0.235        0.027*
Consumer goods      0.060    0.169       −0.004
Consumer services   0.028    0.187        0.013
Financials          0.036    0.239        0.057**
Industrials         0.049    0.196        0.058**
Health care         0.072    0.184        0.024*
Oil and Gas         0.065    0.231       −0.003
Technology          0.037    0.314        0.040**
Telecom             0.050    0.251        0.026*
Utilities           0.065    0.176       −0.020

* Significant at the 10% level.
** Significant at the 5% level.

For our method of moments estimator, denoted by Û_MM, we used p = 250 lags. For the maximum likelihood estimator, denoted by Û_ML, we assume independent Gaussian GARCH(1, 1) factors. The distance between the two estimates, d(Û_MM, Û_ML) = 0.385, indicates that the two methods provide different estimates of the link matrix U. How large these differences really are, and what their consequences are, can best be seen by comparing the estimates of the GARCH parameters, and ultimately the estimates of the volatilities and correlations. The estimates of the GARCH parameters are shown in Table 2. For most factors, we observe that the MM and ML estimates are not that different from each other.

Table 2
Estimated GARCH parameters for the factors.

Factor   (α̂, β̂)_ML        (α̂, β̂)_MM
1        (0.055, 0.943)    (0.051, 0.948)
2        (0.074, 0.920)    (0.083, 0.910)
3        (0.088, 0.894)    (0.082, 0.901)
4        (0.105, 0.892)    (0.089, 0.908)
5        (0.088, 0.903)    (0.064, 0.931)
6        (0.027, 0.968)    (0.028, 0.967)
7        (0.033, 0.965)    (0.032, 0.966)
8        (0.075, 0.923)    (0.060, 0.938)
9        (0.024, 0.973)    (0.026, 0.971)
10       (0.053, 0.937)    (0.035, 0.961)

Fig. 2 presents the estimated time-varying annualized volatilities of the 10 industry returns, obtained from the GO-GARCH model estimated by maximum likelihood or the method of moments. These are compared with the volatilities obtained from univariate GARCH(1, 1) models for the returns, to check whether the GO-GARCH model produces sensible estimates. We observe that for all returns, the three volatility estimates follow similar patterns, which are also remarkably similar across industries. After a relatively stable period with low volatilities in the 1990s, volatilities tend to increase and display more variation around the time of the burst of the internet bubble in 2000. After a few years this is again followed by a stable period, which ends with the peak of the credit crisis in September–October 2008, leading to a sharp increase in volatility, followed by a gradual decline. By late January 2010 (the end of our sample), volatilities are back at their 2007 levels. Industries mainly differ in the impact of the internet bubble crash, which, as expected, clearly led to higher volatilities in the Technology and Telecom industries than elsewhere. The main difference between the outcomes of the different estimation methods is the height of the volatility peaks. For example, for October 30, 2008, the GO-GARCH-based estimated volatilities of the Technology returns are 1.31 (ML) and 1.25 (MM), respectively, whereas the univariate GARCH-based volatility estimate is 0.67.
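For reference, annualized volatilities such as those just discussed are obtained from daily conditional variances by scaling with the number of trading days per year; the ≈252-day convention in this sketch is our assumption, as the text does not state the factor used:

```python
import numpy as np

def annualized_vol(h_daily, trading_days=252):
    """Convert daily conditional variances h_t into annualized volatilities
    (scaling convention assumed, not taken from the paper)."""
    return np.sqrt(trading_days * np.asarray(h_daily, dtype=float))

# a daily variance of 0.0004 (i.e. 2% daily volatility) corresponds to
# an annualized volatility of about 0.32
vol = annualized_vol([0.0004])[0]
```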
Closer inspection reveals that on average, the ML-based volatilities are closer to the univariate GARCH estimates than the MM-based estimates, although the differences are very small; this is most evident in the results for the Technology and Telecom industries. Note that the 10 industries yield a total of 45 different pairs. There would be little value added in reporting all 45 estimates of the conditional correlation processes. Pairs whose conditional correlation exhibits considerable variation over time would be
most interesting, and would provide a test of whether the different estimators identify the same trends and patterns in conditional correlation. The Oil and Gas industry is an example of such an industry. Correlations between Industrials and Oil and Gas, for example, are seen to fall as low as 0.25 in 2002, and then climb to levels as high as 0.85 in 2009. For many other pairs, especially those with high unconditional correlation, we observe relatively little variation in conditional correlation over time (which leaves little room for the two estimators to disagree). Fig. 3 presents estimates of the conditional correlations between the Oil and Gas industry and a selection of six other industries. We observe that, although the estimates of U show some differences, MM and ML largely agree on the conditional correlations. What is apparent is that correlations are relatively low during the first period of higher volatility (between 1999 and 2003), while they climb from around 0.50 to 0.80 during the second period of high volatility in our sample (between 2008 and 2009). This shows that correlations need not necessarily be higher in periods of higher volatility. What sets these two periods apart is that between 1999 and 2003 the rise in volatility was more gradual, at a time when stock prices too were rising (which was the case for all industries except Technology and Telecom; not reported here). The rise in volatility in 2008 was more abrupt, at a time when stock prices started coming down as the financial crisis unfolded. This is consistent with earlier findings that correlations tend to increase in times of crisis, which are characterized by periods of extreme volatility combined with falling stock prices (see e.g. Longin and Solnik (2001), Ang and Bekaert (2002), Ang and Chen (2002), Das and Uppal (2004) and Bekaert et al. (2005)).

6. Conclusion

We have put forward a method of moments estimator for the factor loading matrix in GO-GARCH models.
The method is based on the common eigenvectors of suitably defined sample autocorrelation matrices of squares and cross-products of the observed data. This means that estimation does not require any Newton-type optimization of an objective function, so that it is free of numerical convergence problems regardless of the dimension. The parameters from the univariate GARCH-type models can be estimated separately for each individual factor, given our estimate for the factor loading matrix, which makes the method particularly easy to implement. Our method of estimation provides an alternative to the estimator originally proposed by van der Weide (2002), which jointly estimates parameters that feature both in the factor loading matrix and the univariate GARCH-type specifications by means of maximum likelihood. ML estimates of the factor loading matrix thus depend on the choice of GARCH-type models used to specify the likelihood function. In a Monte Carlo experiment, the ML estimator is found to be more efficient than the MM estimator when the likelihood function is consistent with the data-generating process. The loss in efficiency is the price the MM estimator pays for its numerical convenience (no optimization required), and the fact that no assumptions need to be made concerning model specification for the individual factors. On the other hand, Monte Carlo simulations show that the MM estimator can be more efficient than ML when the latter is based on a misspecified likelihood function. Moreover, the MM estimator is a welcome alternative whenever ML experiences convergence difficulties. ML estimation can become problematic when the dimension is particularly large and/or when the model used to specify the likelihood function is considerably misspecified, while the MM estimator does not suffer from such problems.
Fig. 2. Estimated annualized volatilities: MM, ML, and univariate GARCH.
Fig. 3. Estimated correlations: Oil and Gas versus selected industries.
Acknowledgements

Helpful comments from two anonymous referees and the guest editor, Franz Palm, are gratefully acknowledged.

References

Alexander, C., 2001. Orthogonal GARCH. In: Alexander, C. (Ed.), Mastering Risk. Financial Times-Prentice Hall, London, pp. 21–28.
Ang, A., Bekaert, G., 2002. International asset allocation under regime switching. Review of Financial Studies 15, 1137–1187.
Ang, A., Chen, J., 2002. Asymmetric correlations of equity portfolios. Journal of Financial Economics 63, 443–494.
Bauwens, L., Laurent, S., Rombouts, J.V.K., 2006. Multivariate GARCH models: a survey. Journal of Applied Econometrics 21, 79–109.
Beaghen, M., 1997. Canonical Variate Analysis and Related Methods with Longitudinal Data. Ph.D. Dissertation, Virginia Polytechnic Institute and State University. http://scholar.lib.vt.edu/theses/available/etd-11997-212717/unrestricted/etd.pdf.
Bekaert, G., Harvey, C.R., Ng, A., 2005. Market integration and contagion. Journal of Business 78, 39–69.
Bollerslev, T., 2008. Glossary to ARCH (GARCH). CREATES Research Paper 2008-49. ftp://ftp.econ.au.dk/creates/rp/08/rp08_49.pdf.
Boswijk, H.P., van der Weide, R., 2006. Wake me up before you GO-GARCH. UvA-Econometrics Discussion Paper 2006/03. http://www.ase.uva.nl/pp/bin/381fulltext.pdf.
Boswijk, H.P., van der Weide, R., 2008. Testing the Number of Factors in GO-GARCH Models. Working Paper, University of Amsterdam.
Comte, F., Lieberman, O., 2003. Asymptotic theory for multivariate GARCH processes. Journal of Multivariate Analysis 84, 61–84.
Das, S., Uppal, R., 2004. Systemic risk and international portfolio choice. Journal of Finance 59, 2809–2834.
Diebold, F.X., Nerlove, M., 1989. The dynamics of exchange rate volatility: a multivariate latent factor ARCH model. Journal of Applied Econometrics 4, 1–21.
Ding, Z., 1994. Time Series Analysis of Speculative Returns. Ph.D. Dissertation, University of California at San Diego.
Doz, C., Renault, E., 2006. Factor stochastic volatility in mean models: a GMM approach. Econometric Reviews 25, 275–309.
Engle, R.F., Ng, V.K., Rothschild, M., 1990. Asset pricing with a factor-ARCH covariance structure: empirical estimates for treasury bills. Journal of Econometrics 45, 213–237.
Fan, J., Wang, M., Yao, Q., 2008. Modelling multivariate volatilities via conditionally uncorrelated components. Journal of the Royal Statistical Society Series B 70, 679–702.
Flury, B.N., 1984. Common principal components in k groups. Journal of the American Statistical Association 79, 892–898.
Flury, B.N., Gautschi, W., 1986. An algorithm for simultaneous orthogonal transformation of several positive definite matrices to nearly diagonal form. SIAM Journal on Scientific and Statistical Computing 7, 169–184.
Hafner, C., 2003. Fourth moment structure of multivariate GARCH models. Journal of Financial Econometrics 1, 26–54.
Hafner, C., 2008. Temporal aggregation of multivariate GARCH processes. Journal of Econometrics 142, 467–483.
He, C., Teräsvirta, T., 1999. Properties of moments of a family of GARCH processes. Journal of Econometrics 92, 173–192.
Jensen, S.T., Rahbek, A., 2007. On the law of large numbers for (geometrically) ergodic Markov chains. Econometric Theory 23, 761–766.
Lanne, M., Saikkonen, P., 2007. A multivariate generalized orthogonal factor GARCH model. Journal of Business & Economic Statistics 25, 61–75.
Ledoit, O., Wolf, M., 2004. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88, 365–411.
Longin, F., Solnik, B., 2001. Extreme correlation of international equity markets. Journal of Finance 56, 649–676.
Rigobon, R., 2003. Identification through heteroskedasticity. Review of Economics and Statistics 85, 777–792.
Sentana, E., Fiorentini, G., 2001. Identification, estimation and testing of conditionally heteroskedastic factor models. Journal of Econometrics 102, 143–164.
Silvennoinen, A., Teräsvirta, T., 2009. Multivariate GARCH models. In: Andersen, T.G., Davis, R.A., Kreiss, J.-P., Mikosch, T. (Eds.), Handbook of Financial Time Series. Springer, New York, pp. 201–229.
Urga, G., 2007. Common features in economics and finance: an overview of recent developments. Journal of Business & Economic Statistics 25, 2–11.
van der Weide, R., 2002. GO-GARCH: a multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics 17, 549–564.
Vrontos, I.D., Dellaportas, P., Politis, D.N., 2003. A full-factor multivariate GARCH model. Econometrics Journal 6, 312–334.