Abstract
In the last years there has been a growing interest in proposing methods for estimating covariance functions for geostatistical data. Among these, maximum likelihood estimators have nice features when we deal with a Gaussian model. However maximum likelihood becomes impractical when the number of observations is very large. In this work we review some solutions and we contrast them in terms of loss of statistical efficiency and computational burden. Specifically we focus on three types of weighted composite likelihood functions based on pairs and we compare them with the method of covariance tapering. Asymptotic properties of the three estimation methods are derived. We illustrate the effectiveness of the methods through theoretical examples, simulation experiments and by analyzing a data set on yearly total precipitation anomalies at weather stations in the United States.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)
Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B 70, 825–848 (2008)
Bevilacqua, M., Gaetan, C., Mateu, J., Porcu, E.: Estimating space and space–time covariance functions for large data sets: a weighted composite likelihood approach. J. Am. Stat. Assoc. 107, 268–280 (2012)
Caragea, P., Smith, R.: Approximate likelihoods for spatial processes. Department of Statistics, Iowa State University, Technical report (2006)
Cressie N.: Statistics for Spatial Data, revised edn. Wiley, New York (1993)
Cressie, N., Johannesson, G.: Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B 70, 209–226 (2008)
Curriero, F., Lele, S.: A composite likelihood approach to semivariogram estimation. J. Agric. Biol. Environ. Stat. 4, 9–28 (1999)
Davis, R., Yau, C.-Y.: Comments on pairwise likelihood in time series models. Stat. Sin. 21, 255–277 (2011)
Doukhan, P.: Mixing Properties and Examples. Springer, New York (1994)
Du, J., Zhang, H., Mandrekar, V.S.: Fixed-domain asymptotic properties of tapered maximum likelihood estimators. Ann. Stat. 37, 3330–3361 (2009)
Eidsvik, J., Reich, B., Wheeler, M., Niemi, J.: Estimation and prediction in spatial models with block composite likelihoods’. J. Comput. Gr. Stat. 15(3), 502–523 (2013)
Fuentes, M.: Approximate likelihood for large irregularly spaced spatial data. J. Am. Stat. Assoc. 102, 321–331 (2007)
Furrer, R., Sain, S.R.: Spam: a sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random rields. J. Stat. Softw. 36, 1–25 (2010)
Gneiting, T.: Compactly supported correlation functions. J. Multivar. Anal. 83, 493–508 (2002)
Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102, 359–378 (2007)
Guyon, X.: Random Fields on a Network. Springer, New York (1995)
Heagerty, P., Lumley, T.: Window subsampling of estimating functions with application to regression models. J. Am. Stat. Assoc. 95, 197–211 (2000)
Huang, C., Zhang, H., Robeson, S.: On the validity of commonly used covariance and variogram functions on the sphere. Math. Geosci. 43, 721–733 (2011)
Jenish, N., Prucha, I.R.: Central limit theorems and uniform laws of large numbers for arrays of random fields. J. Econ. 150, 86–98 (2009)
Joe, H., Lee, Y.: On weighting of bivariate margins in pairwise likelihood. J. Multivar. Anal. 100, 670–685 (2009)
Kaufman, C.G., Schervish, M.J., Nychka, D.W.: Covariance tapering for likelihood-based estimation in large spatial data sets. J. Am. Stat. Assoc. 103, 1545–1555 (2008)
Lee, Y., Lahiri, S.: Least squares variogram fitting by spatial subsampling. J. R. Stat. Soc. B 64, 837–854 (2002)
Lindgren, F., Rue, H., Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B 73, 423–498 (2011)
Lindsay, B.: Composite likelihood methods. Contemp. Math. 80, 221–239 (1988)
Lindsay, B.G., Yi, G.Y., Sun, J.: Issues and strategies in the selection of composite likelihoods. Stat. Sin. 21, 71–105 (2011)
Mardia, K.V., Marshall, J.: Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71, 135–146 (1984)
Newey, W.K.: Uniform convergence in probability and stochastic equicontinuity. Econometrica 59, 1161–1167 (1991)
Padoan, S., Bevilacqua, M.: CompRandFld: Composite-likelihood Based Analysis of Random Fields. R package version 1.0.3, (2013)
Rue, H., Tjelmeland, H.: Fitting Gaussian Markov random fields to Gaussian fields. Scand. J. Stat. 29, 31–49 (2002)
Shaby, B., Ruppert, D.: Tapered covariance: Bayesian estimation and asymptotics. J. Comput. Gr. Stat. 21, 433–452 (2012)
Stein, M.: A modeling approach for large spatial datasets. J. Korean Stat. Soc. 37, 3–10 (2008)
Stein, M.: Statistical properties of covariance tapers. J. Comput. Gr. Stat. 1, 91–110 (2013)
Stein, M., Chi, Z., Welty, L.: Approximating likelihoods for large spatial data sets. J. R. Stat. Soc. B 66, 275–296 (2004)
Varin, C., Reid, N., Firth, D.: An overview of composite likelihood methods. Stat. Sin. 21, 5–42 (2011)
Vecchia, A.: Estimation and model identification for continuous spatial processes. J. R. Stat. Soc. B 50, 297–312 (1988)
Wendland, H.: Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv. Comput. Math. 4, 389–396 (1995)
Whittle, P.: On stationary processes in the plane. Biometrika 49, 305–314 (1954)
Zhang, H., Wang, Y.: Kriging and cross-validation for massive spatial data. Environmetrics 21, 290–304 (2010)
Acknowledgments
Research work of Moreno Bevilacqua was partially supported by grant FONDECYT 11121408 from the Chilean government. We thank the anonymous referees and the Associate Editor for their constructive comments that have improved a lot the first version of this work.
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix 1
Asymptotic results can be been proved for spatial processes which are observed at finitely many locations in the sampling region. In this case we deal with an increasing domain setup where the sampling region is unbounded. In the sequel we suppose that the mean function for the random field is known and without loss of generality this is zero. More precisely, we consider a weakly dependent random field \(\{Z(s), s\in S\}\) defined over an arbitrary lattice \(S\) in \(\mathbb {R}^{d}\) that is not necessarily regular. The lattice \(S\) is equipped with the metric \(\delta (s_k,s_l)=\max _{1\le l \le d}|s_{i,l}-s_{j,l}|\) and the distance between any subsets \(A,B\subset S\) is defined as \(\delta (A,B)=\inf \{\delta (s_k,s_l): \, s_k\in A\text { and } s_l\in B\}\). We denote
where \(\mathcal {F}(E)\) is the \(\sigma \)-algebra generated by the random variables \(\{Z(s),\, s\in E\}\). The \(\alpha \)-mixing coefficient (Doukhan 1994) for the random field \(\{Z(s), s\in S\}\) is defined as
where \(|C|\) is the cardinality of the set \(C\).
We make the following assumptions:
- C1::
-
\(S\) is infinite, locally finite: for all \(s\in S\) and \(r>0, |\mathcal {B}(s,r)\cap S|=O(r^{d})\), with \(\mathcal {B}(s,r)\) \(d\)-dimensional ball of center \(s\) and radius \(r\); moreover there exists a set of neighbourhoods, \(V_s\subset S\), such that \(|V_s|\) is uniformly bounded.
- C2::
-
\(D_n\) is an increasing sequence of finite subsets of \(S:\, d_n=|D_n|\rightarrow \infty \) as \(n\rightarrow \infty \).
- C3::
-
\(Z\) is a Gaussian random field with covariance function \(C(h;\theta )\), with \(\theta \in \Theta .\, \Theta \) is a compact set of \(\mathbb {R}^p\). The function \(\theta \mapsto C(h;\theta )\) has continuous second order partial derivatives with respect to \(\theta \in \Theta \), and these functions are continuous with respect to \(h\) and \(\inf _{ \theta \in \Theta }C(h;\theta )> 0\);
- C4::
-
The true unknown value of the parameter \(\theta \), namely \(\theta ^*\), is an interior point of \(\Theta \).
- C5::
-
The Gaussian random field is \(\alpha \)-mixing with mixing coefficient \(\alpha (m)=\alpha (\infty ,\infty ,m)\) satisfying:
- (C5a):
-
\(\exists \,\eta >0\) s.t. \(\sum _{k,l\in D_{n}}\alpha (\delta (k,l))^{ \frac{\eta }{2+\eta }}=O(d_{n})\),
- (C5b):
-
\(\sum _{m\ge 0}m^{d-1}\alpha (m)<\infty \);
- C6::
-
Let
$$\begin{aligned} g_k(Y(k);\theta )=-(1/2)\sum _{l\in V_k, l\ne k}l_{(k,l)}(\theta ) \end{aligned}$$with \(Y(k)=(Z(s), s\in V_k)\), \(l_{(k,l)}=l_{kl}, l_{k|l}\) or \(l_{k-l}\). The functions \(l_{kl}, l_{k|l}\), and \(l_{k-l}\) are defined as in (10),(11) and (12), respectively. The composite likelihood is defined as
$$\begin{aligned}&Q_n(\theta )=\frac{1}{d_n}\sum _{k\in D_n}g_k(Y(k);\theta ); \end{aligned}$$(28)and the composite likelihood estimator is given by
$$\begin{aligned}&\widehat{\theta }_n=\mathrm{argmin }_{\theta \in \Theta }Q_n(\theta ). \end{aligned}$$ - C7::
-
the function \(\overline{Q}_n(\theta )=\mathrm {E}_{\theta ^*}[Q_n(\theta )]\) has a unique global minimum over \(\Theta \) at \(\theta ^*\), the true value.
Remarks
-
1.
Assumptions C1-2 are quite general. For instance we can consider a rectangular lattice, as in Shaby and Ruppert (2012), \(D_n\subset \Delta \mathbb {Z}^d\), for a fixed \(\Delta >0\), and \(D_n \subset D_{n+1}\) for all \(n\).
-
2.
The \(\alpha -\)mixing assumption C5 are a bit hard to check in general. It is satisfied when we consider compactly supported correlation functions, like the taper functions (2) and (3). When we consider a rectangular lattice the condition is satisfied for a stationary Gaussian random field with correlation function \(C(h;\theta )=O(\Vert h\Vert ^{-c})\), for some \(c> d\) and its spectral density bounded below (Doukhan 1994, Corollary 2, p. 59). In our examples this condition is satisfied by the exponential model.
-
3.
The assumption C7 is an identifiability condition. For each \(s\), the function \(\mathrm {E}_{\theta ^*}[g_s(Y_s;\theta ) ]\) has a global minimum at \(\theta ^*\) according the Kullback-Leibler inequality but in the multidimensional case (\(p >1\)) \(\theta ^*\) fails, in general, to be the unique minimizer. Assumption \(C7\) is not satisfied, for instance, when we consider a rectangular lattice and a compactly supported correlation functions, see (2) and (3), for instance, and \(\Delta > d\). In the case of the covariance models (20), (21) and (22) the condition is clearly satisfied.
-
4.
The assumption C6 is satisfied if we suppose a cut-off weight function for \(w_{kl}\).
-
5.
Any individual log-likelihood \(l_{(i,j)}\) can be written as
$$\begin{aligned}&l_{(k,l)}=c_{1}(\theta ,k-l)+c_{2}(\theta , k-l)Z_k^2+c_{3}(\theta ,k-l)Z_k^2 \\&\quad +c_{4}(\theta ,k-l)Z_k Z_l, \end{aligned}$$where the functions \(c_{i}, i=1,\ldots ,4\) are \(\mathcal {C}^{2}\) functions with respect to \(\theta \).
1.1 Consistency
Given the previous assumptions C1-C7, \(\widehat{\theta }_n\) is a consistent estimator for \(\theta _0\) provided that \(\sup _{\theta \in \Theta }|Q_n(\theta )-\overline{Q}_n(\theta )|\rightarrow 0\) in probability, as \(n\rightarrow \infty \). According Corollary 2.2 in Newey (1991), we have to prove that
-
1.
for each \(\theta \in \Theta , Q_n(\theta )-\overline{Q}_n(\theta )\rightarrow 0\) in probability, as \(n\rightarrow \infty \);
-
2.
for \(M_n=O_p(1)\),
$$\begin{aligned} |\overline{Q}_n(\theta ') - \overline{Q}_n(\theta )|\le M_n \Vert \theta ' -\theta \Vert . \end{aligned}$$
We sketch the proof for \(l_{(k,l)}=l_{kl}\), the same arguments apply for the other sub-likelihoods, using the fourth remark.
-
1.
We prove that \(\sup _{k\in D_n}\mathrm {E}[(\sup _{\theta \in \Theta } g_k(Y(k);\theta ))^{2+\eta }]<\infty \), for \(\eta >0\). In fact, we have
$$\begin{aligned} g_k(Y(k);\theta )&= \frac{1}{2}\sum _{l\in V_k, l\ne k}\left\{ 2\log \sigma ^2+\log (1-\rho ^2_{kl}) \right. \\&\left. +\frac{Z_k^2+Z_l^2-2 \rho _{kl} Z_kZ_l}{\sigma ^2(1-\rho ^2_{kl} )}\right\} \\&\le \sum _{l\in V_k, l\ne k}\log \sigma ^2+\frac{1}{2}\log (1-\rho ^2_{kl})\\&+\frac{Z_k^2+Z_l^2}{\sigma ^2(1-\rho ^2_{kl} )} \\&\le c_1 |V_k|\log \sigma ^2+c_2|V_k|Z_k^2\\&+c_2\sum _{l\in V_k, l\ne k} Z_l^2 \end{aligned}$$and \(|V_k| \) is uniformly bounded according the assumption C6. The uniform bounded moments \(g_k(Y(k);\theta )\) entail uniform \(L^1\) integrability of \(g_k\) and with the assumption C5 we obtain (Jenish and Prucha 2009, Theorem 3)
$$\begin{aligned}&Q_n(\theta )-\overline{Q}_n(\theta )=d_n^{-1}\sum _{k\in D_n} \left\{ g_k(Y(k),\theta ) \right. \\&\quad \left. -\mathrm {E}_\theta [g_k(Y(k),\theta )]\right\} \rightarrow 0, \text { in probability} \end{aligned}$$ -
2.
We have
$$\begin{aligned}&|g_k(Y(k);\theta ')-g_k(Y(k);\theta ) | \\&\quad =\frac{1}{2}\sum _{l\in V_k, l\ne k} \left| 2\log \frac{\sigma ^{'2}}{\sigma ^{2}}+\log \frac{1-\rho ^{'2}_{kl}}{1-\rho ^{2}_{kl}} \right. \\&\quad \quad \quad +(Z^2_k+Z^2_l)\left[ \frac{1}{\sigma ^{'2}(1-\rho ^{'2}_{kl} )}-\frac{1}{\sigma ^2(1-\rho ^2_{kl} )}\right] \\&\quad \quad \quad - \left. 2Z_kZ_l\left[ \frac{\rho ^{'}_{kl} }{\sigma ^{'2}(1-\rho ^{'2}_{kl} )}-\frac{\rho _{kl} }{\sigma ^2(1-\rho ^2_{kl} )}\right] \right| \\&\quad \le c_1 |V_k|\Vert \theta '- \theta \Vert +c_2 (|V_k| Z_k^2\\&\quad \quad +\sum _{l\in V_k, l\ne k} Z_l^2)\Vert \theta '- \theta \Vert \\&|\overline{Q}_n(\theta ') - \overline{Q}_n(\theta )| \\&\quad \le d_n^{-1}\sum _{k\in D_n}| q_k(\theta ') - q_k(\theta ) | \\&\quad \le c_3 d_n^{-1}\sum _{k\in D_n}(1+Z_k^2+\sum _{l\in V_k, l\ne k}Z_l^2)\Vert \theta '- \theta \Vert \\&\quad =M_n\Vert \theta '- \theta \Vert \end{aligned}$$for some positive constants \(c_1, c_2\) and \(c_3\) and \(M_n=c_3 d_n^{-1}\sum _{k\in D_n}(1+Z_k^2+\sum _{l\in V_k, l\ne k}Z_l^2)\). Since \(\mathrm {E}_\theta [M_n]<\infty \), we obtain the desired result.
1.2 Asymptotic normality
We make the additional assumption:
- N1::
-
there exists two symmetric nonnegative definite matrices \(H\) and \(J\) such that for large \(n\):
$$\begin{aligned}&J_{n}=\mathrm {var}_{\theta ^*}(\sqrt{d_{n}}\nabla Q_{n}(\theta ^* ))\ge J\,\, \hbox {and} \,\,H_{n} \\&\quad =\mathrm {E}_{\theta ^* }(\nabla ^2Q_{n}(\theta ^* ))\ge I. \end{aligned}$$
where if \(A\) and \(B\) are two symmetric matrices, \(A \ge B\) means that \(A-B\) is a semipositive definite matrix.
We note that because \(g_{s}\) is a \(\mathcal {C}^{2}\) and \(\Theta \) is a compact space there exists a random variable \(h(Y(s)), \mathrm {E}_{\theta }(h(Y(s)))<\infty \) satisfying:
Moreover for all \(s\in S\), \(\mathrm {E}_{\theta }[\frac{\partial }{\partial \theta _k} g_{s}(\theta )]=0\), because \(g_s\) is a sum of log-likelihoods, and it is easy to show that we have that \(\sup _{s\in S,\,\theta \in \Theta }\mathrm {E}_\theta \left[ \left| \frac{\partial }{\partial \theta _k} g_{s}(\theta )\right| ^{2+\eta }\right] <\infty \) and \(\sup _{s\in S,\,\theta \in \Theta }\mathrm {E}_\theta \left[ \left| \frac{\partial ^2}{\partial \theta _k\theta _l} g_{s}(\theta )\right| ^{2+\eta }\right] <\infty \), for all \(\eta >0\).
Under the condition C1-C7 and N1, conditions (H1-H2-H3) of Theorem 3.4.5 in Guyon (1995) are satisfied and
where \({I}_p\) is the \(p\times p\) identity matrix.
Appendix 2
In this Section we provide the formulas for the Godambe information matrix associated to \(pl_a (\theta ), a=M,C,D\) for a Gaussian random field \(\{Z(s)\}\) with mean function \(E(Z(s))=\mu \), and covariance function \(Cov(Z(s_i),Z(s_j))=\sigma ^2\rho (s_i-s_j;\phi )=\sigma ^2 \rho _{ij}\). We denote
where \(\theta \) is the vector of the unknown parameters.
The Godambe information matrix is given by
Let \(B_{ij}, G_{ij}\) and \(U_{ij}\) be defined in Sect. 3. Moreover we define
For \(pl_M(\theta )\) and \(pl_C(\theta ), \theta =(\phi ,\sigma ^2,\mu )^\intercal \), the pairwise score functions are given by
and for \(pl_D(\theta )\), \(\theta =(\phi ,\sigma ^2)^\intercal \), we have
where \(\displaystyle { \alpha _{ij} =(1+ \rho _{ij})^{-1}{\sqrt{1+ \rho _{ij}^{2}}} {}}\), and \( \kappa _{ij} =(1- \rho _{ij})^{-1}\nabla \rho _{ij} \). Moreover we have
In order to derive the correlations in the \(J_a(\theta ), a=M,C,D\) matrices, we can exploit the following formula that holds for a stationary Gaussian random field:
After some algebra, we have:
-
\(Cov(Q_{ij},Q_{lk})=\sigma ^4 \frac{\rho _{il}+\rho _{ik}+\rho _{jl}+\rho _{jk}}{\sqrt{1+\rho _{ij}}\sqrt{1+\rho _{lk}}} \rho _{ij} \rho _{lk} \)
-
\(Cov(G_{ij},G_{kl})=\sigma ^2 (\rho _{ik}-\rho _{il}\rho _{kl}- \rho _{ij}\rho _{jk}+ \rho _{ij}\rho _{kl}\rho _{jk}) \)
-
\(Cov(G^2_{ij},G^2_{kl})=2[Cov(G_{ij},G_{kl})]^2\)
-
\( Cov(U_{ij},U_{kl})=\sigma ^2 (\rho _{ik}-\rho _{il}-\rho _{jk}+\rho _{jl}) \)
-
\( Cov(U_{ij}^2,U_{kl}^2)=2[Cov(U_{ij},U_{kl})]^2\)
-
\(Cov(B_{ij},B_{kl})= 2\sigma ^4[\rho _{ik}^2+\rho _{il}^2-2\rho _{kl}\rho _{ik}\rho _{il}+\rho _{jk}^2\) \(+\rho _{jl}^2-2\rho _{kl}\rho _{jk}\rho _{jl} -2\rho _{ij}\rho _{ik}\rho _{jk}-2\rho _{ij}\rho _{il}\rho _{jl} \)
-
\(Cov(F_{ij},F_{kl})\) \(\quad =\rho _{ij}\rho _{kl}Cov(B_{ij},B_{kl})\) \(\quad +\sigma ^4(1-\rho _{ij}^2)(1-\rho _{kl}^2)(\rho _{ik}\rho _{jl}+\rho _{il}\rho _{jk})\) \(\quad -2\sigma ^4\rho _{ij}(1-\rho _{kl}^2)(\rho _{ik}-\rho _{jk})(\rho _{il}-\rho _{jl})\) \(\quad -2\sigma ^4\rho _{kl}(1-\rho _{ij}^2)(\rho _{ik}-\rho _{il})(\rho _{jk}-\rho _{jl})\)
-
\(Cov(F_{ij},B_{kl})=\rho _{ij}Cov(B_{ij},B_{kl})\) \( -2\sigma ^4(1-\rho _{ij}^2)(\rho _{ik}-\rho _{il})(\rho _{jk}-\rho _{jl})\)
-
\(Cov(F_{ij},G^2_{kl})\) \(\quad =2\sigma ^4[\rho _{ij}\rho ^2_{ik}+\rho _{ij}\rho ^2_{kl}\rho ^2_{il}-2\rho _{ij}\rho _{kl}\rho _{ik}\rho _{il}\) \(\quad + \rho _{ij}\rho ^2_{jk}\ +\rho _{ij}\rho ^2_{kl}\rho ^2_{jl} -2\rho _{ij}\rho _{kl}\rho _{jk}\rho _{jl} \) \(\quad -(1+\rho ^2_{ij})\rho _{ik}\rho _{jk}-(1+\rho ^2_{ij})\rho ^2_{kl}\rho _{il}\rho _{jl}\) \(\quad + (1+\rho ^2_{ij})\rho _{kl}(\rho _{ik}\rho _{jl}+\rho _{il}\rho _{jk})]\)
Rights and permissions
About this article
Cite this article
Bevilacqua, M., Gaetan, C. Comparing composite likelihood methods based on pairs for spatial Gaussian random fields. Stat Comput 25, 877–892 (2015). https://doi.org/10.1007/s11222-014-9460-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11222-014-9460-6