Abstract
This paper proposes simple tests of error cross-sectional dependence that are applicable to a variety of panel data models, including stationary and unit root dynamic heterogeneous panels with short T and large N. The proposed tests are based on the average of pair-wise correlation coefficients of the OLS residuals from the individual regressions in the panel. They can be used to test for cross-sectional dependence of any fixed order p, as well as for the case where no a priori ordering of the cross-sectional units is assumed; these are referred to as the \(\hbox {CD}(p)\) and \(\hbox {CD}\) tests, respectively. The asymptotic distribution of these tests is derived and their power functions are analyzed under different alternatives. It is shown that these tests are correctly centred for fixed N and T and are robust to single or multiple breaks in the slope coefficients and/or error variances. The small sample properties of the tests are investigated using Monte Carlo experiments and compared to those of the Lagrange multiplier test of Breusch and Pagan. It is shown that the tests have the correct size in very small samples and satisfactory power, and, as predicted by the theory, they are quite robust to the presence of unit roots and structural breaks. The use of the \(\hbox {CD}\) test is illustrated by applying it to study the degree of dependence in per capita output innovations across countries within a given region and across countries in different regions. The results show significant evidence of cross-dependence in output innovations across many countries and regions in the world.
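For concreteness, the statistics named in the abstract can be sketched in Python (NumPy). This is a minimal illustration, not the author's code: it assumes a balanced panel stored as a \(T\times N\) array, residuals from unit-by-unit OLS regressions with an intercept, and the standard forms \(\hbox {CD}=\sqrt{2T/(N(N-1))}\sum _{i<j}{\hat{\rho }}_{ij}\), \(\hbox {CD}(p)=\sqrt{2T/(p(2N-p-1))}\sum _{s=1}^{p}\sum _{i=1}^{N-s}{\hat{\rho }}_{i,i+s}\) and the Breusch–Pagan statistic \(T\sum _{i<j}{\hat{\rho }}_{ij}^{2}\), which are not restated in this section. All function names are hypothetical.

```python
import math

import numpy as np


def unit_ols_residuals(Y, X):
    """Residuals from N separate OLS regressions, each with an intercept.

    Y: (T, N) array of y_it; X: (T, N, k) array of regressors x_it.
    Returns a (T, N) array of residuals e_it.
    """
    T, N = Y.shape
    E = np.empty((T, N))
    for i in range(N):
        Z = np.column_stack([np.ones(T), X[:, i, :]])  # intercept + k regressors
        coef, *_ = np.linalg.lstsq(Z, Y[:, i], rcond=None)
        E[:, i] = Y[:, i] - Z @ coef
    return E


def pairwise_rho(E):
    """rho_ij = sum_t e_it e_jt / sqrt(sum_t e_it^2 * sum_t e_jt^2)."""
    norms = np.sqrt((E ** 2).sum(axis=0))
    return (E.T @ E) / np.outer(norms, norms)


def cd_tests(E, p=1):
    """CD, CD(p) and Breusch-Pagan LM statistics from a (T, N) residual matrix.

    Assumes 1 <= p < N so that the CD(p) scaling factor is well defined.
    """
    T, N = E.shape
    R = pairwise_rho(E)
    iu = np.triu_indices(N, k=1)  # all pairs i < j
    cd = math.sqrt(2.0 * T / (N * (N - 1))) * R[iu].sum()
    # CD(p): only the pairs (i, i+s), s = 1..p, under an a priori ordering.
    local = sum(R[i, i + s] for s in range(1, p + 1) for i in range(N - s))
    cd_p = math.sqrt(2.0 * T / (p * (2 * N - p - 1))) * local
    lm = T * (R[iu] ** 2).sum()  # compared with chi2, N(N-1)/2 df
    p_cd = math.erfc(abs(cd) / math.sqrt(2.0))  # two-sided N(0,1) p-value
    return {"CD": cd, "CD_p": cd_p, "LM": lm, "p_CD": p_cd}


if __name__ == "__main__":
    # Hypothetical example with independent units (no cross-sectional dependence).
    rng = np.random.default_rng(0)
    T, N, k = 20, 10, 1
    X = rng.standard_normal((T, N, k))
    Y = 1.0 + 0.5 * X[:, :, 0] + rng.standard_normal((T, N))
    print(cd_tests(unit_ols_residuals(Y, X), p=1))
```

Under the null of no cross-sectional dependence, CD is compared with standard normal critical values and LM with a \(\chi ^{2}\) distribution with \(N(N-1)/2\) degrees of freedom (the latter being appropriate for fixed N as \(T\rightarrow \infty \)).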
Notes
The assumption that \(u_{it}\)’s are serially uncorrelated is not restrictive and can be accommodated by including a sufficient number of lagged values of \(y_{it}\) amongst the regressors.
The standard fixed effects estimator also assumes that \(\sigma _{i}^{2}=\sigma ^{2}\).
The requirement \(T>k+1\) can be relaxed under the slope homogeneity assumption, \( \varvec{\beta }_{i}=\varvec{\beta }\), in which case fixed effects residuals can be used in place of \(e_{it}\) in the construction of the CD statistic.
Similar results can also be obtained for fixed or random effects models. It suffices to replace the OLS residuals used in the computation of \({\hat{\rho }}_{ij}\) with the associated residuals from the fixed or random effects specifications. However, the CD test based on the individual-specific OLS residuals is robust to slope and error-variance heterogeneity, whereas tests based on the fixed or random effects residuals are not.
For the case of strictly exogenous regressors and Gaussian errors, it can be shown that \(E\left( {\hat{\rho }}_{ij}^{2}\right) =Tr({\mathbf {M}}_{i}{\mathbf {M}} _{j})/(T-k-1)^{2}\). I am grateful to Aman Ullah for drawing my attention to this result.
See “Appendix A” for a proof.
See, for example, Nickell (1981).
See, for example, Cliff and Ord (1973).
Another possibility that could be more relevant for the analysis of economic and financial panels would be to set \(w_{ij}=1\), if the “economic distance” between the \(i\mathrm{th}\) and the \(j\mathrm{th}\) cross-sectional units is less than a threshold, \({\bar{d}}\), and \(w_{ij}=0\), otherwise.
These assumptions allow for the inclusion of lagged dependent variables amongst the regressors and can be relaxed further to take account of non-stationary I(1) regressors.
In the spatial literature, it is typically assumed that \(a_{i}=b_{i}=0.5\) and \({\mathbf {W}}\) is known as the “rook” formation.
Note that under \(H_{\ell TN}\), \({\mathbf {u}}_{_{\circ }t}=\left( {\mathbf {I}} _{N}+\frac{\delta }{\sqrt{NT}}{\mathbf {W}}\right) \varvec{\Sigma }^{1/2} \varvec{\varepsilon }_{\circ t}+O_{p}\left( \frac{1}{NT}\right) .\)
The asymptotic power function of the CD test is also symmetric under homogeneous alternatives, \(\varvec{\gamma }_{i}=\varvec{\gamma }\).
The PWT code for the series is RGDPL and is constructed in international dollars, with 1996 as the reference year. For further details, see Heston et al. (2002).
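The footnote result \(E\left( {\hat{\rho }}_{ij}^{2}\right) =Tr({\mathbf {M}}_{i}{\mathbf {M}}_{j})/(T-k-1)^{2}\) can be checked numerically. The sketch below assumes that each individual regression contains an intercept and the same number k of strictly exogenous regressors, so that \({\mathbf {M}}_{i}={\mathbf {I}}_{T}-{\mathbf {Z}}_{i}({\mathbf {Z}}_{i}^{\prime }{\mathbf {Z}}_{i})^{-1}{\mathbf {Z}}_{i}^{\prime }\) with \({\mathbf {Z}}_{i}=(\varvec{\tau }_{T},{\mathbf {X}}_{i})\). The function names are hypothetical, and the Monte Carlo average should agree with the formula only up to simulation error.

```python
import numpy as np


def annihilator(X):
    """M = I_T - Z (Z'Z)^{-1} Z' for Z = [1, X], with X a (T, k) regressor matrix."""
    T = X.shape[0]
    Z = np.column_stack([np.ones(T), X])
    return np.eye(T) - Z @ np.linalg.solve(Z.T @ Z, Z.T)


def expected_rho_sq(Xi, Xj):
    """Tr(M_i M_j) / (T - k - 1)^2, the footnote's value of E(rho_ij^2)."""
    T, k = Xi.shape
    return np.trace(annihilator(Xi) @ annihilator(Xj)) / (T - k - 1) ** 2


def simulated_rho_sq(Xi, Xj, reps=20000, seed=0):
    """Monte Carlo average of rho_ij^2 with independent N(0,1) errors for i and j."""
    rng = np.random.default_rng(seed)
    T = Xi.shape[0]
    Mi, Mj = annihilator(Xi), annihilator(Xj)
    # Each row is (M u)' for a fresh error vector u, since M is symmetric.
    ei = rng.standard_normal((reps, T)) @ Mi
    ej = rng.standard_normal((reps, T)) @ Mj
    rho = (ei * ej).sum(axis=1) / np.sqrt((ei ** 2).sum(axis=1) * (ej ** 2).sum(axis=1))
    return float(np.mean(rho ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    Xi, Xj = rng.standard_normal((12, 2)), rng.standard_normal((12, 2))
    print(expected_rho_sq(Xi, Xj), simulated_rho_sq(Xi, Xj))
```

When \({\mathbf {X}}_{i}={\mathbf {X}}_{j}\), \({\mathbf {M}}_{i}{\mathbf {M}}_{j}={\mathbf {M}}_{i}\) is idempotent with trace \(T-k-1\), so the formula reduces to \(1/(T-k-1)\).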
References
Ahn SG, Lee YH, Schmidt P (2001) GMM estimation of linear panel data models with time-varying individual effects. J Econom 102:219–255
Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic Publishers, Dordrecht
Anselin L (2001) Spatial econometrics. In: Baltagi B (ed) A companion to theoretical econometrics. Blackwell, Oxford
Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah A, Giles DEA (eds) Handbook of applied economic statistics. Marcel Dekker, New York
Bailey N, Kapetanios G, Pesaran MH (2016) Exponent of cross-sectional dependence: estimation and inference. J Appl Econom 31:929–960
Baltagi BH, Feng Q, Kao C (2012) A Lagrange multiplier test for cross-sectional dependence in a fixed effects panel data model. J Econom 170:164–177
Baltagi BH, Kao C, Peng B (2015) On testing for sphericity with non-normality in a fixed effects panel data model. Stat Probab Lett 98:123–130
Baltagi BH, Song SH, Koh W (2003) Testing panel data regression models with spatial error correlation. J Econom 117:123–150
Barro RJ (1997) Determinants of economic growth: a cross-country empirical study. MIT Press, Cambridge, MA
Breusch TS, Pagan AR (1980) The Lagrange multiplier test and its application to model specifications in econometrics. Rev Econ Stud 47:239–253
Chen J, Gao J, Li D (2012) A new diagnostic test for cross-sectional uncorrelatedness in nonparametric panel data models. Econom Theory 28:1144–1163
Cliff A, Ord JK (1973) Spatial autocorrelation. Pion, London
Cliff A, Ord JK (1981) Spatial processes: models and applications. Pion, London
Conley TG, Topa G (2002) Socio-economic distance and spatial patterns in unemployment. J Appl Econom 17:303–327
De Hoyos RE, Sarafidis V (2006) Testing for cross-sectional dependence in panel-data models. Stata J 6:482–496
Haining RP (2003) Spatial data analysis: theory and practice. Cambridge University Press, Cambridge
Heston A, Summers R, Aten B (2002) Penn World Table Version 6.1. Center for International Comparisons at the University of Pennsylvania (CICUP), October 2002
Hsiao C, Pesaran MH, Pick A (2012) Diagnostic tests of cross section independence for nonlinear panel data models. Oxford Bull Econ Stat 74:253–277
Jensen PS, Schmidt TD (2011) Testing for cross-sectional dependence in regional panel data. Spat Econ Anal 6:423–450
Lee K, Pesaran MH, Smith R (1997) Growth and convergence in a multi-country empirical stochastic Solow model. J Appl Econom 12:357–392
Mao G (2018) Testing for sphericity in a two-way error components panel data model. Econom Rev 37:491–506
Moran PAP (1948) The interpretation of statistical maps. Biometrika 35:255–60
Ng S (2006) Testing cross-section correlation in panel data using spacings. J Bus Econ Stat 24:12–23
Nickell S (1981) Biases in dynamic models with fixed effects. Econometrica 49:1417–1426
Pesaran MH (2006) Estimation and inference in large heterogeneous panels with cross section dependence. Econometrica 74:967–1012
Pesaran MH (2015) Testing weak cross-sectional dependence in large panels. Econom Rev 34:1089–1117
Pesaran MH, Schuermann T, Weiner SM (2004) Modeling regional interdependencies using a global error-correcting macroeconomic model. J Bus Econ Stat 22:129–181
Pesaran MH, Timmermann A (2005) Small sample properties of forecasts from autoregressive models under structural breaks. J Econom 129:183–217
Pesaran MH, Ullah A, Yamagata T (2008) A bias-adjusted LM test of error cross section independence. Econom J 11:105–127
Sarafidis V, Wansbeek T (2012) Cross-sectional dependence in panel data analysis. Econom Rev 31:483–531
Sarafidis V, Yamagata T, Robertson D (2009) A test of cross section dependence for a linear dynamic panel model with regressors. J Econom 148:149–161
Swamy PAVB (1970) Efficient inference in a random coefficient regression model. Econometrica 38:311–323
Zellner A (1962) An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc 57:348–368
Ethics declarations
Ethical approval
This article does not contain any studies with animals performed by any of the authors.
Conflict of interest
The author declares that he has no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
I am grateful to Mutita Akusuwan and Takashi Yamagata for providing me with excellent research assistance, and for carrying out the Monte Carlo simulations. I would also like to thank Ron Smith, Takashi Yamagata, Aman Ullah and Daniele Massacci for helpful comments. Financial support from the ESRC (Grant No. RES-000-23-0135) is gratefully acknowledged.
Appendices
Appendix A: Properties of residuals in regression models subject to structural breaks
Abstracting from the cross-sectional index i, the regression model (11) with a single break can be written as
where the \(k\times 1\) slope coefficients, \(\varvec{\beta }_{j}\), and the error variances, \(\sigma _{j}^{2},\) for \(j=1,2\), are subject to a single break at time \(t=T_{1}\), and \(\varepsilon _{t}\sim iid(0,1)\). The implied intercepts are given by
The unconditional means of \(y_{t}\) and \({\mathbf {x}}_{t}\), namely \({\mu }_{y}\) and \(\varvec{\mu }_{x}\), are not subject to change. We also assume that \({\mathbf {x}}_{t}\) follows the covariance stationary process:
where \(\varvec{\nu }_{t}\) and \(\varepsilon _{t^{\prime }}\) are independently distributed for all t and \(t^{\prime }\). We also assume that the innovations in the \({\mathbf {x}}_{t}\) process, \(\varvec{\nu } _{t}\), are symmetrically distributed around zero.
Suppose now that the breaks are ignored and the residuals, \(e_{t}\), are computed by running the ordinary least squares regression of \(y_{t}\) on \( {\mathbf {x}}_{t}\) over the whole sample, \(t=1,2,\ldots ,T\). We have
where
In what follows we establish that, for all \(t=1,2,\ldots ,T\), the OLS residuals \( e_{t}\) are odd functions of the disturbances, \(\varepsilon _{t}\), and that \( E(e_{t})=0\).
We first note that
where \(\lambda _{1}=T_{1}/T,\) \(\overline{{\mathbf {x}}}_{1}=\frac{1}{T_{1}} \sum \nolimits _{t=1}^{T_{1}}{\mathbf {x}}_{t},\) \(\overline{{\mathbf {x}}}_{2}=\frac{ 1}{T-T_{1}}\sum \nolimits _{t=T_{1}+1}^{T}{\mathbf {x}}_{t},\) etc. Hence, for \( t\le T_{1}\)
and for \(t>T_{1}\)
Also since
then for all t,
Consider now the residuals defined by (33) and note that
Hence, it is sufficient to show that the second term has zero expectation for all t. Under the data generating mechanism
where
Therefore,
Since \({\mathbf {x}}_{t}\) and \(\varepsilon _{t^{\prime }}\) are independently distributed for all t and \(t^{\prime }\), the last two terms have zero unconditional expectations. Also using (32), it is easily seen that
which establishes that \({\mathbf {x}}_{t}-\overline{{\mathbf {x}}}\) is an odd function of \(\left\{ \varvec{\nu }_{t}\right\} \), the innovations in \( {\mathbf {x}}_{t}\). Since \({\mathbf {Q}}\) and \({\mathbf {Q}}_{1}\) are even functions of \(\left\{ \varvec{\nu }_{t}\right\} \), it follows that \({\mathbf {Q}}_{j} {\mathbf {Q}}^{-1}({\mathbf {x}}_{t}-\overline{{\mathbf {x}}})\), for \(j=1,2\), are also odd functions of \(\left\{ \varvec{\nu }_{t}\right\} \) and, in view of the symmetry of \(\left\{ \varvec{\nu }_{t}\right\} \), have zero mean unconditionally. Thus, under our assumptions, \(e_{t}\) and \(\xi _{t}=\left( \sum _{\tau =1}^{T}e_{\tau }^{2}\right) ^{-1/2}e_{t}\) are odd functions of \( \left\{ \varepsilon _{t},\varvec{\nu }_{t}\right\} \), and therefore have zero expectations for all t, despite the breaks in the slopes and the error variances. This result continues to hold under multiple breaks, and even if \( \varvec{\Psi }_{i}\) is subject to one or more breaks. The key assumptions are the symmetry of the innovations, \(\varepsilon _{t}\) and \(\varvec{\nu }_{t}\), and the time-invariance of the unconditional means of \(y_{t}\) and \({\mathbf {x}} _{t}\).
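The zero-mean property can be illustrated by simulation. The following is a minimal Monte Carlo sketch, not part of the original proof: it assumes k = 1, Gaussian (hence symmetric) innovations, and a stationary AR(1) process for \(x_{t}\) with a hypothetical coefficient psi, since the exact form of the \({\mathbf {x}}_{t}\) process is not reproduced here. The function name and parameter values are illustrative. The option mu_y lets the unconditional mean of \(y_{t}\) shift at the break, which deliberately violates the key assumption.

```python
import numpy as np


def mean_residual_profile(beta=(1.0, 2.0), sigma=(1.0, 2.0), mu_y=(1.0, 1.0),
                          mu_x=1.0, psi=0.5, T=20, T1=10, reps=20000, seed=0):
    """Monte Carlo average of the full-sample OLS residual e_t at each t.

    y_t = alpha_j + beta_j x_t + sigma_j eps_t, with j = 1 for t <= T1 and
    j = 2 afterwards, and alpha_j = mu_y[j] - beta_j * mu_x.  The break is
    ignored in estimation.  x_t is ASSUMED to be a stationary Gaussian AR(1).
    """
    rng = np.random.default_rng(seed)
    # Draw x_0 from the stationary distribution, then simulate x_1..x_T.
    prev = mu_x + rng.standard_normal(reps) / np.sqrt(1.0 - psi ** 2)
    x = np.empty((reps, T))
    for t in range(T):
        prev = mu_x + psi * (prev - mu_x) + rng.standard_normal(reps)
        x[:, t] = prev
    # Column t holds period t + 1, so t >= T1 marks periods after T1.
    post = np.arange(T) >= T1
    b = np.where(post, beta[1], beta[0])
    s = np.where(post, sigma[1], sigma[0])
    alpha = np.where(post, mu_y[1], mu_y[0]) - b * mu_x  # implied intercepts
    y = alpha + b * x + s * rng.standard_normal((reps, T))
    # OLS of y_t on an intercept and x_t over t = 1..T, residuals via demeaning.
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    bhat = (xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)
    e = yd - bhat[:, None] * xd
    return e.mean(axis=0)


if __name__ == "__main__":
    print(np.round(mean_residual_profile(), 3))                 # constant means
    print(np.round(mean_residual_profile(mu_y=(0.0, 2.0)), 3))  # mean of y_t shifts
```

With constant unconditional means the averages should be close to zero at every t, up to simulation error, despite the breaks in slope and variance. When mu_y shifts at the break, the averages become clearly negative before \(T_{1}\) and positive after it.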
Appendix B: Residuals from AR(p) models subject to breaks
Consider the AR(p) model defined over the period \(t=1,2,\ldots ,T\), and assumed to have been subject to a single structural break at time \(T_{1}\):
where \(\varepsilon _{t}\thicksim iid(0,1)\) for all t,
\(\varvec{\beta }_{j}=(\beta _{j1},\beta _{j2},\ldots ,\beta _{jp})^{\prime }\) and \(\varvec{\tau }_{p}\) is a \(p\times 1\) unit vector.
Suppose that the structural break is ignored and residuals are computed by estimating the AR(p) model in \(y_{t}\) using the OLS regression of \(y_{t}\) on an intercept and \({\mathbf {x}}_{t}=(y_{t-1},y_{t-2},\ldots ,y_{t-p})^{\prime }\), making use of the available observations \(\digamma _{T}=\left( y_{1-p},y_{2-p},\ldots ,y_{0},y_{1},\ldots ,y_{T}\right) \). In this case, the fitted residuals are given by
where \({\mathbf {x}}_{t}=(y_{t-1},y_{t-2},\ldots ,y_{t-p})^{\prime }\), \({\hat{\varvec{\beta }}}=({\hat{\beta }}_{1},{\hat{\beta }}_{2},\ldots ,{\hat{\beta }}_{p})^{\prime }\), \({\mathbf {y}}=\left( y_{1},y_{2},\ldots ,y_{T}\right) ^{\prime }\), \({\mathbf {X}}=\left( {\mathbf {y}}_{0},{\mathbf {y}}_{-1},\ldots ,{\mathbf {y}}_{-p+1}\right) \), \({\mathbf {y}}_{-j+1}=\left( y_{-j+1},y_{-j+2},\ldots ,y_{T-j}\right) ^{\prime }\), \(\varvec{\tau }_{T}\) is a \(T\times 1\) vector of ones, and \({\mathbf {M}}\equiv {\mathbf {M}}_{\tau }={\mathbf {I}}_{T}-\varvec{\tau }_{T}(\varvec{\tau }_{T}^{\prime }\varvec{\tau }_{T})^{-1}\varvec{\tau }_{T}^{\prime }\). In what follows, we shall establish that \(E(e_{t})=0\) for \(t=1,2,\ldots ,T\) so long as \(\mu _{1}=\mu _{2}\), \( \varepsilon _{t}\) is symmetrically distributed, and \(E(e_{t})\) exists. We shall provide a proof for the stationary case with a single break, although the result holds much more generally, both in the presence of multiple breaks and if there are unit roots in the pre- and/or post-break processes.
In the case where the pre-break regime is stationary, the distribution of the initial values, \({\mathbf {x}}_{p}=(y_{1-p},y_{2-p},\ldots ,y_{0})^{\prime },\) can be written as
where \({\mathbf {V}}_{p}\) is a positive definite matrix.
Using (40) and (35) for \(t=1,2,\ldots ,T\), in matrix notation we have
where \({\mathbf {y}}^{*}=\left( {\mathbf {x}}_{p}^{\prime },{\mathbf {y}}^{\prime }\right) ^{\prime }\) and \(\varvec{\varepsilon }^{*}=(\varepsilon _{1-p},\varepsilon _{2-p},\ldots ,\varepsilon _{0},\varepsilon _{1},\varepsilon _{2},\ldots ,\varepsilon _{T})^{\prime }\).
The sub-matrices, \({\mathbf {B}}_{ij}\), depend only on the slope coefficients, \( \varvec{\beta }_{1}\) and \(\varvec{\beta }_{2}\), and are as defined in “Appendix B” of Pesaran and Timmermann (2005). \({\mathbf {I}}_{T_{1}}\) and \( {\mathbf {I}}_{T_{2}}\) are identity matrices of order \(T_{1}\) and \(T_{2}\), respectively, \(T_{2}=T-T_{1}\), \(\varvec{\varepsilon }^{*}\sim ({\mathbf {0}} ,{\mathbf {I}}_{T+p})\), and \(\varvec{\psi }_{p}\) is a lower triangular Cholesky factor of \({\mathbf {V}}_{p}\), namely \({\mathbf {V}}_{p}=\varvec{\psi }_{p}\varvec{\psi }_{p}^{\prime }\).
Using (41), it is easily seen that
where \({\mathbf {G}}_{j}\) are \(T\times (T+p)\) selection matrices defined by \( {\mathbf {G}}_{j}=({\mathbf {0}}_{T\times (p-j)}\varvec{\vdots }{\mathbf {I}}_{T}\varvec{\vdots }{\mathbf {0}}_{T\times j})\), \({\mathbf {H}}={\mathbf {B}}^{-1}{\mathbf {D}}\), and \({\mathbf {c}}={\mathbf {B}}^{-1}{\mathbf {d}}\). In particular,
and
However, as shown in Pesaran and Timmermann (2005, “Appendix B”), under \(\mu _{1}=\mu _{2}=\mu \), \({\mathbf {G}}_{j}{\mathbf {c}}=\mu \varvec{\tau }_{T}\), and the (i, j) element of \({\mathbf {X}}^{\prime }{\mathbf {M}}{\mathbf {X}}\) is given by \( \varvec{\varepsilon }^{*\prime }{\mathbf {H}}^{\prime }{\mathbf {G}} _{i}^{\prime }{\mathbf {M}}_{\tau }{\mathbf {G}}_{j}{\mathbf {H}}\varvec{\varepsilon }^{*}\), for \(i,j=1,2,\ldots ,p\), and the \(j\mathrm{th}\) element of \({\mathbf {X}}^{\prime }{\mathbf {M}}{\mathbf {y}}\) by \(\varvec{\varepsilon }^{*\prime }{\mathbf {H}}^{\prime } {\mathbf {G}}_{j}^{\prime }{\mathbf {M}}_{\tau }{\mathbf {G}}_{0}{\mathbf {H}}\varvec{\varepsilon }^{*}\), for \(j=1,2,\ldots ,p\). Hence, under \(\mu _{1}=\mu _{2}\), both \({\mathbf {X}}^{\prime }{\mathbf {M}}{\mathbf {X}}\) and \({\mathbf {X}}^{\prime }{\mathbf {M}}{\mathbf {y}}\) are quadratic forms in \(\varvec{\varepsilon }^{*}\), so that \({\hat{\varvec{\beta }}}=({\mathbf {X}}^{\prime }{\mathbf {M}}{\mathbf {X}})^{-1}{\mathbf {X}}^{\prime }{\mathbf {M}}{\mathbf {y}}\) is an even function of \(\varvec{\varepsilon }^{*}\). Similarly, using (39) and recalling that \({\mathbf {G}}_{j}{\mathbf {c}}=\mu \varvec{\tau }_{T}\), we have
Using this result in (37) and noting that for \(t\le T_{1}\)
we have
Similarly, for \(t>T_{1}\):
It is now easily seen that in both regimes \(e_{t}\) is an odd function of the standardized errors, \(\varepsilon _{t}\), \(t=-p+1,-p+2,\ldots ,T\), and under the distributional symmetry of the errors, we have
Note that for \(\xi _{t}\) to be well defined we need \(T>p+1\), and, since \(\left| \xi _{t}\right| \le 1\), \(E\left( \xi _{t}\right) \) exists for all \(T>p+1\). In contrast, the conditions for the existence of the moments of \(e_{t}\) are much more complicated and demanding. For example, \(E\left( e_{t}\right) \) exists if \(E\left( {\hat{\beta }} _{i}\right) \) exists, and a sufficient condition for the latter is known in the literature only for the simple case of \(p=1\). In this case,
and \(E\left( {\hat{\beta }}_{1}\right) \) exists if \(T>3\) (see Pesaran and Timmermann 2005).
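A companion simulation for the AR(1) case (p = 1) can be sketched as follows. It assumes the specification \(y_{t}=\mu _{j}(1-\beta _{j})+\beta _{j}y_{t-1}+\sigma _{j}\varepsilon _{t}\), with j = 1 for \(t\le T_{1}\) and j = 2 afterwards, Gaussian errors, and \(y_{0}\) drawn from the stationary pre-break distribution; this parameterization is our reading of the model, and the function name and parameter values are hypothetical. It averages the standardized residuals \(\xi _{t}=e_{t}/(\sum _{\tau =1}^{T}e_{\tau }^{2})^{1/2}\), whose mean exists for \(T>p+1\).

```python
import numpy as np


def mean_xi_profile(beta=(0.5, 0.9), sigma=(1.0, 2.0), mu=(1.0, 1.0),
                    T=20, T1=10, reps=20000, seed=1):
    """Monte Carlo average of xi_t = e_t / sqrt(sum_tau e_tau^2) for an AR(1).

    ASSUMED model: y_t = mu_j (1 - beta_j) + beta_j y_{t-1} + sigma_j eps_t,
    with j = 1 for t <= T1 and j = 2 afterwards, Gaussian eps_t, and y_0 drawn
    from the stationary pre-break distribution (requires |beta_1| < 1).
    The break is ignored in estimation.
    """
    rng = np.random.default_rng(seed)
    prev = mu[0] + sigma[0] * rng.standard_normal(reps) / np.sqrt(1.0 - beta[0] ** 2)
    ylag = np.empty((reps, T))
    y = np.empty((reps, T))
    for t in range(T):
        j = 0 if t < T1 else 1  # column t holds period t + 1
        ylag[:, t] = prev
        prev = mu[j] * (1.0 - beta[j]) + beta[j] * prev + sigma[j] * rng.standard_normal(reps)
        y[:, t] = prev
    # OLS of y_t on an intercept and y_{t-1}, t = 1..T, residuals via demeaning.
    xd = ylag - ylag.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    bhat = (xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)
    e = yd - bhat[:, None] * xd
    xi = e / np.sqrt((e ** 2).sum(axis=1, keepdims=True))
    return xi.mean(axis=0)


if __name__ == "__main__":
    print(np.round(mean_xi_profile(), 4))                       # mu_1 = mu_2
    print(np.round(mean_xi_profile(beta=(0.5, 0.5), sigma=(1.0, 1.0),
                                   mu=(0.0, 4.0)), 4))          # mu_1 != mu_2
```

With \(\mu _{1}=\mu _{2}\) the averages should be zero up to simulation error, even though both the slope and the error variance change at \(T_{1}\). Setting mu=(0.0, 4.0) breaks the condition \(\mu _{1}=\mu _{2}\), and the averages should then move away from zero, most visibly around the break date.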
About this article
Cite this article
Pesaran, M.H. General diagnostic tests for cross-sectional dependence in panels. Empir Econ 60, 13–50 (2021). https://doi.org/10.1007/s00181-020-01875-7
DOI: https://doi.org/10.1007/s00181-020-01875-7
Keywords
- Cross-sectional dependence
- Spatial dependence
- Diagnostic tests
- Dynamic heterogeneous panels
- Empirical growth