On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing

Statistics and Computing 29, 483–500 (2019)

Abstract

We present a novel method for the estimation of variance parameters in generalised linear mixed models. The method has its roots in the work of Harville (J Am Stat Assoc 72(358):320–338, 1977), but it is able to deal with models whose precision matrix for the random effect vector is linear in the inverses of the variance parameters (i.e., the precision parameters). We call the method SOP (separation of overlapping precision matrices). SOP is based on applying the method of successive approximations to easy-to-compute estimate updates of the variance parameters. These estimate updates have an appealing form: they are the ratio of a (weighted) sum of squares to a quantity related to effective degrees of freedom. We provide the necessary and sufficient conditions for these estimates to be strictly positive. An important application field of SOP is penalised regression estimation of models where multiple quadratic penalties act on the same regression coefficients. We discuss in detail two such models: penalised splines for locally adaptive smoothness and for hierarchical curve data. Several data examples in these settings are presented.


References

  • Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88(421), 9–25 (1993)

  • Camarda, C.G., Eilers, P.H., Gampe, J.: Sums of smooth exponentials to decompose complex series of counts. Stat. Model. 16(4), 279–296 (2016)

  • Crainiceanu, C.M., Ruppert, D., Carroll, R.J., Joshi, A., Goodner, B.: Spatially adaptive Bayesian penalized splines with heteroscedastic errors. J. Comput. Graph. Stat. 16(2), 265–288 (2007)

  • Crump, S.L.: The present status of variance component analysis. Biometrics 7(1), 1–16 (1951)

  • Cui, Y., Hodges, J.S., Kong, X., Carlin, B.P.: Partitioning degrees of freedom in hierarchical and other richly-parameterized models. Technometrics 52, 124–136 (2010)

  • Currie, I.D., Durban, M.: Flexible smoothing with P-splines: a unified approach. Stat. Model. 2(4), 333–349 (2002)

  • Currie, I.D., Durban, M., Eilers, P.H.C.: Generalized linear array models with applications to multidimensional smoothing. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(2), 259–280 (2006)

  • Davies, P.L., Gather, U., Meise, M., Mergel, D., Mildenberger, T.: Residual-based localization and quantification of peaks in X-ray diffractograms. Ann. Appl. Stat. 2(3), 861–886 (2008)

  • Davies, P.L., Gather, U., Meise, M., Mergel, D., Mildenberger, T., Bernholt, T., Hofmeister, T.: diffractometry: baseline identification and peak decomposition for X-ray diffractograms. R package version 0.1-10 (2018)

  • Djeundje, V.A., Currie, I.D.: Appropriate covariance-specification via penalties for penalized splines in mixed models for longitudinal data. Electron. J. Stat. 4, 1202–1224 (2010)

  • Durban, M., Aguilera-Morillo, M.C.: On the estimation of functional random effects. Stat. Model. 17(1–2), 50–58 (2017)

  • Durban, M., Harezlak, J., Wand, M.P., Carroll, R.J.: Simple fitting of subject-specific curves for longitudinal data. Stat. Med. 24(8), 1153–1167 (2005)

  • Eilers, P.H.C.: Discussion of Verbyla et al. J. R. Stat. Soc. Ser. C (Appl. Stat.) 48, 300–311 (1999)

  • Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11(2), 89–121 (1996)

  • Engel, B.: The analysis of unbalanced linear models with variance components. Stat. Neerl. 44, 195–219 (1990)

  • Engel, B., Buist, W.: Analysis of a generalized linear mixed model: a case study and simulation results. Biom. J. 38(1), 61–80 (1996)

  • Engel, B., Keen, A.: A simple approach for the analysis of generalized linear mixed models. Stat. Neerl. 48(1), 1–22 (1994)

  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)

  • Gilmour, A.R., Thompson, R., Cullis, B.R.: Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4), 1440–1450 (1995)

  • Goldsmith, J., Bobb, J., Crainiceanu, C.M., Caffo, B., Reich, D.: Penalized functional regression. J. Comput. Graph. Stat. 20(4), 830–851 (2011)

  • Goldsmith, J., Crainiceanu, C.M., Caffo, B., Reich, D.: Longitudinal penalized functional regression for cognitive outcomes on neuronal tract measurements. J. R. Stat. Soc. Ser. C (Appl. Stat.) 61(3), 453–469 (2012)

  • Goldsmith, J., Scheipl, F., Huang, L., Wrobel, J., Gellar, J., Harezlak, J., McLean, M.W., Swihart, B., Xiao, L., Crainiceanu, C., Reiss, P.T.: refund: Regression with Functional Data. R package version 0.1-16 (2016)

  • Graser, H.-U., Smith, S.P., Tier, B.: A derivative-free approach for estimating variance components in animal models by restricted maximum likelihood. J. Anim. Sci. 64, 1362–1373 (1987)

  • Green, P.J.: Penalized likelihood for general semi-parametric regression models. Int. Stat. Rev. 55(3), 245–259 (1987)

  • Greven, S., Scheipl, F.: A general framework for functional regression modelling. Stat. Model. 17(1–2), 1–35 (2017)

  • Groll, A., Tutz, G.: Variable selection for generalized linear mixed models by L1-penalized estimation. Stat. Comput. 24(2), 137–154 (2014)

  • Harville, D.A.: Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72(358), 320–338 (1977)

  • Harville, D.A.: Matrix Algebra from a Statistician’s Perspective. Springer, Berlin (1997)

  • Hastie, T.J., Tibshirani, R.J.: Generalized Additive Models. Chapman & Hall, London (1990)

  • Heckman, N., Lockhart, R., Nielsen, J.D.: Penalized regression, mixed effects models and appropriate modelling. Electron. J. Stat. 7, 1517–1552 (2013)

  • Henderson, C.R.: Selection index and expected genetic advance. Stat. Genet. Plant Breed. 982, 141–163 (1963)

  • Hunter, D.R., Li, R.: Variable selection using MM algorithms. Ann. Stat. 33(4), 1617–1642 (2005)

  • Johnson, D.L., Thompson, R.: Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. J. Dairy Sci. 78, 449–456 (1995)

  • Karas, M., Brzyski, D., Dzemidzic, M., Goñi, J., Kareken, D.A., Randolph, T.W., Harezlak, J.: Brain connectivity-informed regularization methods for regression. Stat. Biosci. (2017). https://doi.org/10.1007/s12561-017-9208-x

  • Krivobokova, T.: Smoothing parameter selection in two frameworks for penalized splines. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 75(4), 725–741 (2013). https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssb.12010

  • Krivobokova, T., Crainiceanu, C.M., Kauermann, G.: Fast adaptive penalized splines. J. Comput. Graph. Stat. 17(1), 1–20 (2008)

  • Lee, D.-J.: Smoothing mixed model for spatial and spatio-temporal data. PhD thesis, Department of Statistics, Universidad Carlos III de Madrid, Spain (2010)

  • McCullagh, P., Nelder, J.: Generalized Linear Models. Chapman and Hall/CRC Monographs on Statistics and Applied Probability Series, 2nd edn. Chapman & Hall, London (1989)

  • Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58(3), 545–554 (1971)

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2018)

  • Reiss, P.T., Ogden, R.T.: Smoothing parameter selection for a class of semiparametric linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 71(2), 505–523 (2009)

  • Rodríguez-Álvarez, M.X., Durban, M., Lee, D.-J., Eilers, P.H.C.: Fast estimation of multidimensional adaptive P-spline models. In: Friedl, H., Wagner, H. (eds.) Proceedings of the 30th International Workshop on Statistical Modelling, pp. 330–335. arXiv:1610.06861 (2015a)

  • Rodríguez-Álvarez, M.X., Lee, D.-J., Kneib, T., Durban, M., Eilers, P.H.C.: Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Stat. Comput. 25, 941–957 (2015b)

  • Rodríguez-Álvarez, M.X., Durban, M., Lee, D.-J., Eilers, P.H.C., Gonzalez, F.: Spatio-temporal adaptive penalized splines with application to neuroscience. In: Dupuy, J.-F., Josse, J. (eds.) Proceedings of the 31st International Workshop on Statistical Modelling, pp. 267–272. arXiv:1610.06860 (2016)

  • Rodríguez-Álvarez, M.X., Boer, M.P., van Eeuwijk, F.A., Eilers, P.H.: Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat. Stat. 23, 52–71 (2018)

  • Ruppert, D., Carroll, R.J.: Spatially-adaptive penalties for spline fitting. Aust. N. Z. J. Stat. 42(2), 205–223 (2000)

  • Ruppert, D., Wand, M.P., Carroll, R.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)

  • Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78(4), 719–727 (1991)

  • Simpkin, A., Newell, J.: An additive penalty P-spline approach to derivative estimation. Comput. Stat. Data Anal. 68, 30–43 (2013)

  • Smith, S.P.: Estimation of genetic parameters in non-linear models. In: Gianola, D., Hammond, K. (eds.) Advances in Statistical Methods for Genetic Improvement of Livestock. Advanced Series in Agricultural Sciences, vol. 18. Springer, Berlin, Heidelberg (1990)

  • Taylor, J.D., Verbyla, A.P., Cavanagh, C., Newberry, M.: Variable selection in linear mixed models using an extended class of penalties. Aust. N. Z. J. Stat. 54(4), 427–449 (2012)

  • Tibshirani, R.J.: Adaptive piecewise polynomial estimation via trend filtering. Ann. Stat. 42(1), 285–323 (2014)

  • Wand, M.P.: Smoothing and mixed models. Comput. Stat. 18(2), 223–249 (2003)

  • Wood, S.N.: Fast stable direct fitting and smoothness selection for generalized additive models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(3), 495–518 (2008)

  • Wood, S.N.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(1), 2–36 (2011)

  • Wood, S.N.: Generalized Additive Models: An Introduction with R, 2nd edn. Chapman & Hall/CRC, London (2017)

  • Wood, S.N., Fasiolo, M.: A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. Biometrics 73, 1071–1081 (2017)

  • Wood, S.N., Pya, N., Säfken, B.: Smoothing parameter and model selection for general smooth models. J. Am. Stat. Assoc. 111(516), 1548–1563 (2016)

  • Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 36(4), 1509–1533 (2008)


Acknowledgements

This research was supported by the Basque Government through the BERC 2018-2021 program, and by the Spanish Ministry of Economy and Competitiveness (MINECO) through the BCAM Severo Ochoa excellence accreditation SEV-2013-0323 and through the projects MTM2017-82379-R (funded by AEI/FEDER, UE; acronym “AFTERAM”), MTM2014-52184-P and MTM2014-55966-P. The MRI/DTI data were collected at Johns Hopkins University and the Kennedy Krieger Institute. We are grateful to Pedro Caro and Iain Currie for useful discussions, to Martin Boer and Cajo ter Braak for their detailed reading of the paper and their many suggestions, and to Bas Engel for sharing his knowledge with us. We are also grateful to the two peer referees for their constructive comments on the paper.

Author information

Corresponding author

Correspondence to María Xosé Rodríguez-Álvarez.

Appendices

A Proof of Theorem 1

Proof

We first note that the first-order partial derivatives of the (approximate) REML log-likelihood function can be expressed as (see, e.g., Rodríguez-Álvarez et al. 2015b)

$$\begin{aligned} \frac{\partial {l}}{\partial {\sigma _{k_l}^2}} = - \frac{1}{2}\hbox {trace}\left( {\varvec{Z}}^{\top }{\varvec{P}}{\varvec{Z}}\frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}}\right) + \frac{1}{2}\widehat{\varvec{\alpha }}^{\top }{\varvec{G}}^{-1}\frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}}{\varvec{G}}^{-1}\widehat{\varvec{\alpha }}. \end{aligned}$$

Given that \({\varvec{G}}\) is a positive definite matrix, we have the identity

$$\begin{aligned} \frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}} = - {\varvec{G}}\frac{\partial {{\varvec{G}}^{-1}}}{\partial {\sigma _{k_l}^2}}{\varvec{G}}, \end{aligned}$$

and, since \({\varvec{G}}_{k}^{-1} = \sum _{l = 1}^{p_k}\sigma _{k_l}^{-2}\varvec{\Lambda }_{k_l}\) implies \(\partial {{\varvec{G}}^{-1}}/\partial {\sigma _{k_l}^2} = -\sigma _{k_l}^{-4}\hbox {diag}\left( {\varvec{0}}^{(1)},\varvec{\Lambda }_{k_l},{\varvec{0}}^{(2)}\right) \), it follows that

$$\begin{aligned} \frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}} = \frac{1}{\sigma ^{4}_{k_l}}\hbox {diag}\left( {\varvec{0}}^{(1)},{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k,{\varvec{0}}^{(2)}\right) , \end{aligned}$$

where \({\varvec{0}}^{(1)}\) and \({\varvec{0}}^{(2)}\) are zero square matrices of appropriate dimensions.

The first-order partial derivatives of the REML log-likelihood function are then expressed as

$$\begin{aligned} 2\frac{\partial {l}}{\partial {\sigma _{k_l}^2}} = - \frac{1}{\sigma _{k_l}^4}\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k\right) + \frac{1}{\sigma _{k_l}^4}\widehat{\varvec{\alpha }}_k^{\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k. \end{aligned}$$

When the REML estimates are positive, they are obtained by equating this expression to zero (see, e.g., Engel 1990), which gives

$$\begin{aligned} \frac{\widehat{\varvec{\alpha }}^{\top }_k\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k}{\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k\right) } = 1. \end{aligned}$$

We now multiply both sides by \(\sigma _{k_l}^2\), evaluate the left-hand side at the previous iterates and take the right-hand side as the update, obtaining

$$\begin{aligned} {\widehat{\sigma }}^{2}_{k_l} &= \frac{\widehat{\varvec{\alpha }}_k^{[t]\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k^{[t]}}{\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}^{[t]}{\varvec{Z}}_k{\varvec{G}}_k^{[t]}\varvec{\Lambda }_{k_l}{\varvec{G}}_k^{[t]}\right) }{\widehat{\sigma }}_{k_l}^{2[t]}\\ &= \frac{\widehat{\varvec{\alpha }}_k^{[t]\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k^{[t]}}{\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}^{[t]}{\varvec{Z}}_k{\varvec{G}}_k^{[t]}\frac{\varvec{\Lambda }_{k_l}}{{\widehat{\sigma }}_{k_l}^{2[t]}}{\varvec{G}}_k^{[t]}\right) }. \end{aligned}$$

\(\square \)
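
As a simple illustration of the form of this update (our own specialisation, not part of the original result), consider a single random component with one variance parameter, \(\varvec{\Lambda }_{k_1} = {\varvec{I}}_{q_k}\) and hence \({\varvec{G}}_k = \sigma _{k_1}^2{\varvec{I}}_{q_k}\). The update then reduces to

$$\begin{aligned} {\widehat{\sigma }}^{2}_{k_1} = \frac{\widehat{\varvec{\alpha }}_k^{[t]\top }\widehat{\varvec{\alpha }}_k^{[t]}}{{\widehat{\sigma }}_{k_1}^{2[t]}\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}^{[t]}{\varvec{Z}}_k\right) }, \end{aligned}$$

i.e., a sum of squares divided by a quantity that can be shown to equal the effective degrees of freedom \(\hbox {ED}_{k_1}\) used in Appendix C, recovering a Fellner–Schall-type update (cf. Schall 1991; Wood and Fasiolo 2017).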

B Proof of Theorem 2

Proof

First, let us recall some notation and introduce some needed results. We write \({\varvec{P}} = {\varvec{V}}^{-1} - {\varvec{V}}^{-1}{\varvec{X}}\left( {\varvec{X}}^{\top }{\varvec{V}}^{-1}{\varvec{X}}\right) ^{-1}{\varvec{X}}^{\top }{\varvec{V}}^{-1}\), where \({\varvec{V}} = {\varvec{R}} + {\varvec{Z}}{\varvec{G}}{\varvec{Z}}^{\top }\) and \({\varvec{R}} = \phi {\varvec{W}}^{-1}\), with \({\varvec{W}}\) the diagonal weight matrix involved in the Fisher scoring algorithm.

Denote as \({\mathcal {C}}\left( {\varvec{A}}\right) \) the linear space spanned by the columns of \({\varvec{A}}\), and let \({\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}} = {\varvec{X}}\left( {\varvec{X}}^{\top }{\varvec{V}}^{-1}{\varvec{X}}\right) ^{-1}{\varvec{X}}^{\top }{\varvec{V}}^{-1}\) be the orthogonal projection matrix for \({\mathcal {C}}\left( {\varvec{X}}\right) \) with respect to \({\varvec{V}}^{-1}\). It is easy to show that

$$\begin{aligned} {\varvec{P}}&= {\varvec{V}}^{-1}\left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \\&= \left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) {\varvec{V}}^{-1}\left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) . \end{aligned}$$

By Theorem 14.2.9 in Harville (1997), \({\varvec{P}}\) is a (symmetric) positive semi-definite matrix. In addition,

$$\begin{aligned} {\varvec{P}}{\varvec{X}} = {\varvec{0}}, \end{aligned}$$

and

$$\begin{aligned} {\hbox {rank}}({\varvec{P}})&= \hbox {rank}\left( {\varvec{V}}^{-1}\left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \right) \\&= \hbox {rank}\left( \left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \right) \\&= n - \hbox {rank}\left( {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \\&= n - \hbox {rank}\left( {\varvec{X}}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} \hbox {ker}\left( {\varvec{P}}\right) = {\mathcal {C}}\left( {\varvec{X}}\right) , \end{aligned}$$
(26)

i.e., \({\varvec{P}}{\varvec{x}} = {\varvec{0}}\) if and only if \({\varvec{x}}\) is in \({\mathcal {C}}\left( {\varvec{X}}\right) \).

Let \(\varvec{\Lambda }_{k_l} = {\varvec{U}}{\varvec{\Sigma }}{\varvec{U}}^{\top }\) be the eigenvalue decomposition of \(\varvec{\Lambda }_{k_l}\). Note that \(\varvec{\Lambda }_{k_l} = {\varvec{U}}_{+}{\varvec{\Sigma }}_{+}{\varvec{U}}_{+}^{\top }\), where \({\varvec{U}}_{+}\) and \({\varvec{\Sigma }}_{+}\) are the sub-matrices corresponding to the nonzero eigenvalues. Then

$$\begin{aligned} \widehat{\varvec{\alpha }}_k^{\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k&= \widehat{\varvec{\alpha }}_k^{\top }{\varvec{U}}_{+}{\varvec{\Sigma }}_{+}{\varvec{U}}_{+}^{\top }\widehat{\varvec{\alpha }}_k\\&= {\varvec{z}}^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}{\varvec{\Sigma }}_{+}{\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{z}} \geqslant 0, \end{aligned}$$

with equality holding if and only if \({\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{z}} = {\varvec{0}}\) (since \({\varvec{\Sigma }}_{+}\) is positive definite). Thus, using result (26), the equality holds if \({\varvec{z}}\) is in \({\mathcal {C}}\left( {\varvec{X}}\right) \) or \({\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}\right) \subset {\mathcal {C}}\left( {\varvec{X}}\right) \). By Lemma 4.2.2 and Corollary 4.5.2 in Harville (1997), we have

$$\begin{aligned} {\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}\right)&= {\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}\right) \subset {\mathcal {C}}\left( {\varvec{X}}\right) \\&\iff \hbox {rank}\left( {\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l},{\varvec{X}}\right) = \hbox {rank}\left( {\varvec{X}}\right) . \end{aligned}$$

Regarding the denominator of the REML-based estimate updates, we follow a similar reasoning. Using Corollary 14.7.5 (and Theorem 14.2.9) in Harville (1997), we have

$$\begin{aligned} \hbox {trace}&\left( {\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k\right) \\&= \hbox {trace}\left( {\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}{\varvec{\Sigma }}_{+}\right) \geqslant 0, \end{aligned}$$

with equality holding if and only if \({\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+} = {\varvec{0}}\). Again, this equality holds if and only if \({\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}\right) \subset {\mathcal {C}}\left( {\varvec{X}}\right) \) (i.e., \(\iff \hbox {rank}\left( {\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l},{\varvec{X}}\right) = \hbox {rank}\left( {\varvec{X}}\right) \)). \(\square \)
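
The preliminary results used in this proof are easy to check numerically. The following minimal sketch (ours, with a randomly generated positive definite \({\varvec{V}}\) and an arbitrary full-rank \({\varvec{X}}\); all names are illustrative) verifies that \({\varvec{P}}\) is symmetric and positive semi-definite, that \({\varvec{P}}{\varvec{X}} = {\varvec{0}}\), and that \(\hbox {rank}({\varvec{P}}) = n - \hbox {rank}({\varvec{X}})\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n))
V = A @ A.T + n * np.eye(n)        # a positive definite stand-in for V
Vi = np.linalg.inv(V)

# P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1
P = Vi - Vi @ X @ np.linalg.inv(X.T @ Vi @ X) @ X.T @ Vi

assert np.allclose(P, P.T)                      # symmetric
assert np.linalg.eigvalsh(P).min() > -1e-10     # positive semi-definite
assert np.allclose(P @ X, 0)                    # kernel contains C(X)
assert np.linalg.matrix_rank(P) == n - np.linalg.matrix_rank(X)
```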

C Estimating algorithm

This appendix summarises the steps of the estimating algorithm for model (10) based on the SOP method. Recall that interest lies in estimating the model

$$\begin{aligned} g\left( {\varvec{\mu }}\right) = {\varvec{X}}\varvec{\beta } + {\varvec{Z}}\varvec{\alpha } = {\varvec{X}}\varvec{\beta } + \sum _{k=1}^{c}{\varvec{Z}}_k\varvec{\alpha }_k, \end{aligned}$$

where \({\varvec{Z}} = \left[ {\varvec{Z}}_{1},\ldots ,{\varvec{Z}}_{c}\right] \), \(\varvec{\alpha } = \left( \varvec{\alpha }_{1}^{\top },\ldots ,\varvec{\alpha }_{c}^{\top }\right) ^{\top }\), \(\varvec{\alpha }_k\sim N\left( {\varvec{0}}, {\varvec{G}}_k\right) \) with \( {\varvec{G}}_{k}^{-1} = \sum _{l = 1}^{p_k}\sigma _{k_l}^{-2}\varvec{\Lambda }_{k_l}\), and \(\varvec{\alpha }\sim N\left( {\varvec{0}}, {\varvec{G}}\right) \) with \({\varvec{G}} = \bigoplus _{k = 1}^{c}{\varvec{G}}_{k}\).

  1. Initialise

    Set initial values for \(\widehat{{\varvec{\mu }}}^{[0]}\) and the variance parameters \({\widehat{\sigma }}_{k_l}^{2[0]}\) (\(l = 1,\ldots ,p_k\) and \(k = 1, \ldots , c\)). In those situations where \(\phi \) is unknown, establish an initial value for this parameter, \({\widehat{\phi }}^{[0]}\). Set \(t = 0\).

  2. Step 1

    Construct the working response vector \({\varvec{z}}\) and the matrix of weights \({\varvec{W}}\) as follows

    $$\begin{aligned} z_i = g({\widehat{\mu }}_i^{[t]}) + (y_i - {\widehat{\mu }}_i^{[t]})g^{\prime }({\widehat{\mu }}_i^{[t]}), \\ w_{ii} = \left\{ g'({\widehat{\mu }}_i^{[t]})^2\nu ({\widehat{\mu }}_i^{[t]})\right\} ^{-1}. \end{aligned}$$
    1. Step 1.1

      Given the initial estimates of variance parameters, estimate \(\varvec{\alpha }\) and \(\varvec{\beta }\) by solving

      $$\begin{aligned} \underbrace{ \begin{bmatrix} {\varvec{X}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{X}}&{\varvec{X}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{Z}} \\ {\varvec{Z}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{X}}&{\varvec{Z}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{Z}} + {{\varvec{G}}^{[t]}}^{-1} \end{bmatrix}}_{{\varvec{C}}^{[t]}} \begin{bmatrix} \varvec{{\widehat{\beta }}}\\ \varvec{{\widehat{\alpha }}} \end{bmatrix} = \begin{bmatrix} {\varvec{X}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{z}}\\ {\varvec{Z}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{z}} \end{bmatrix},\nonumber \\ \end{aligned}$$
      (27)

      where \({\varvec{R}}^{[t]} = {\widehat{\phi }}^{[t]}{\varvec{W}}^{-1}\). Let \(\varvec{{\widehat{\alpha }}}^{[t]}\) and \(\varvec{{\widehat{\beta }}}^{[t]}\) be these estimates.

    2. Step 1.2

      Update the variance parameters as follows

      $$\begin{aligned} {\widehat{\sigma }}^{2}_{k_l} = \frac{\widehat{\varvec{\alpha }}_k^{{[t]}\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k^{[t]}}{\hbox {ED}_{k_l}^{[t]}}, \end{aligned}$$

      and, when necessary,

      $$\begin{aligned} {\widehat{\phi }} = \frac{\left( {\varvec{z}}-{\varvec{X}}\varvec{{\widehat{\beta }}}^{[t]}-{\varvec{Z}}\varvec{{\widehat{\alpha }}}^{[t]}\right) ^{\top }{\varvec{W}}\left( {\varvec{z}}-{\varvec{X}}\varvec{{\widehat{\beta }}}^{[t]}-{\varvec{Z}}\varvec{{\widehat{\alpha }}}^{[t]}\right) }{n - \hbox {rank}({\varvec{X}}) - \sum _{k=1}^c\sum _{l=1}^{p_k}\hbox {ED}_{k_l}^{[t]}}, \end{aligned}$$

      with

      $$\begin{aligned} \hbox {ED}_{k_l}^{[t]} = \hbox {trace}\left( \left( {\varvec{G}}^{[t]}_k - {{\varvec{C}}^{[t]}}^{*}_{kk}\right) \frac{\varvec{\Lambda }_{k_l}}{{{\widehat{\sigma }}_{k_l}^{2[t]}}}\right) . \end{aligned}$$

      Recall that \({{\varvec{C}}^{[t]}}^{*}\) denotes the inverse of \({\varvec{C}}^{[t]}\) in (27), and \({{\varvec{C}}^{[t]}}^{*}_{kk}\) is that partition of \({{\varvec{C}}^{[t]}}^{*}\) corresponding to the k-th random component \(\varvec{\alpha }_k\).

    3. Step 1.3

      Repeat Step 1.1 and Step 1.2, replacing \({\widehat{\sigma }}_{k_l}^{2[t]}\), and, if updated, \({\widehat{\phi }}^{[t]}\), by \({\widehat{\sigma }}_{k_l}^{2}\) and \({\widehat{\phi }}\), until convergence. For the examples presented in Sect. 5, we use the REML deviance as the convergence criterion.

  3. Step 2

Repeat Step 1, replacing the variance parameters and the model’s fixed and random effects (and thus \(\widehat{{\varvec{\mu }}}^{[t]}\)) by those obtained in the last iteration of Steps 1.1–1.3, until convergence.

It is worth noting that, for notational convenience, in the examples described in Sect. 4 the precision matrix was rewritten as \({\varvec{G}}^{-1} = \sum _{l=1}^{p}\sigma _l^{-2}\widetilde{\varvec{\Lambda }}_l\), where \(p = \sum _{k=1}^{c}p_k\) is the number of variance parameters and the \(\widetilde{\varvec{\Lambda }}_l\) are the matrices \(\varvec{\Lambda }_{k_l}\) padded out with zeroes. Here, the \(\widetilde{\varvec{\Lambda }}_l\) are of dimension \(q \times q\), where \( q = \sum _{k=1}^{c}q_k\) is the number of random effects coefficients. Under this specification, the estimating algorithm remains essentially the same, but in Step 1.2, \(\varvec{\alpha }^{[t]}_k\), \({\varvec{G}}^{[t]}_k\) and \(\varvec{\Lambda }_{k_l}\) are replaced by, respectively, \(\varvec{\alpha }^{[t]}\), \({\varvec{G}}^{[t]}\) and \(\widetilde{\varvec{\Lambda }}_l\), and \({{\varvec{C}}^{[t]}}^{*}_{kk}\) becomes the partition of \({{\varvec{C}}^{[t]}}^{*}\) corresponding to the full random vector \(\varvec{\alpha }\) (and is thus the same for all variance parameters).
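
To make the algorithm concrete, the following is a minimal sketch (ours, not the authors’ implementation) of Steps 1.1–1.3 for the simplest special case: a Gaussian response with identity link, so that \({\varvec{z}} = {\varvec{y}}\), \({\varvec{W}} = {\varvec{I}}_n\) and the outer Step 2 loop is not needed, and a single random component with \({\varvec{G}} = \sigma ^2{\varvec{I}}_q\) and \(\varvec{\Lambda } = {\varvec{I}}_q\), in which case \(\hbox {ED} = q - \hbox {trace}\left( {\varvec{C}}^{*}_{kk}\right) /\sigma ^2\). All variable names, dimensions and the simulated data are illustrative, and convergence is monitored here on the variance parameters rather than on the REML deviance used in the paper.

```python
import numpy as np

# Illustrative data: y = intercept + Z alpha + noise
rng = np.random.default_rng(1)
n, q = 200, 10
X = np.ones((n, 1))                       # fixed-effects design (intercept)
Z = rng.standard_normal((n, q))           # random-effects design
y = 1.5 + Z @ (0.7 * rng.standard_normal(q)) + 0.5 * rng.standard_normal(n)

sig2, phi = 1.0, 1.0                      # initial variance parameters
for _ in range(200):
    # Step 1.1: Henderson's mixed model equations, system (27),
    # with R = phi * I and G^-1 = I / sig2
    C = np.block([[X.T @ X,             X.T @ Z],
                  [Z.T @ X, Z.T @ Z + (phi / sig2) * np.eye(q)]]) / phi
    rhs = np.concatenate([X.T @ y, Z.T @ y]) / phi
    C_inv = np.linalg.inv(C)
    beta, alpha = np.split(C_inv @ rhs, [1])
    # Step 1.2: effective degrees of freedom and variance updates;
    # with Lambda = I, ED = q - trace(C*_kk) / sig2
    ED = q - np.trace(C_inv[1:, 1:]) / sig2
    sig2_new = (alpha @ alpha) / ED
    resid = y - X @ beta - Z @ alpha
    phi_new = (resid @ resid) / (n - 1 - ED)   # rank(X) = 1 here
    # Step 1.3: iterate until the variance parameters stabilise
    if abs(sig2_new - sig2) + abs(phi_new - phi) < 1e-10:
        break
    sig2, phi = sig2_new, phi_new

print(sig2, phi)
```

For a non-Gaussian response, this inner loop would be wrapped in the Fisher scoring iteration of Steps 1 and 2, recomputing \({\varvec{z}}\) and \({\varvec{W}}\) from the current \(\widehat{{\varvec{\mu }}}^{[t]}\) at each pass.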

D Factor-by-curve hierarchical curve model

This appendix describes in detail the factor-by-curve interaction model discussed in Sect. 5.3, i.e.,

$$\begin{aligned} y_{ij} = f_{z_j}\left( t_i\right) + g_j\left( t_i\right) + \varepsilon _{ij} \;\; 1\le i \le s,\;1\le j \le m, \end{aligned}$$

where \(z_j = 1\) if the j-th individual is affected by MS (case) and \(z_j = 0\) otherwise (control). Let us order the data with the observations on controls first, followed by those on MS patients. In matrix notation, the model can be expressed as

$$\begin{aligned} {\varvec{y}} = [{\varvec{Q}}\otimes {\varvec{B}}]{\varvec{\theta }} + [{\varvec{I}}_{m}\otimes \breve{{\varvec{B}}}]\breve{{\varvec{\theta }}} + \varvec{\varepsilon }, \end{aligned}$$
(28)

with \({\varvec{B}}\), \(\breve{{\varvec{B}}}\), \(\breve{{\varvec{\theta }}}\) as defined in Sect. 4.2, and \(\varvec{\varepsilon } = \left( \varvec{\varepsilon }_{\cdot ,1}^{\top },\ldots , \varvec{\varepsilon }_{\cdot ,m}^{\top }\right) ^{\top }\), with \(\varvec{\varepsilon }_{\cdot j}= \left( \varepsilon _{1j},\ldots , \varepsilon _{sj}\right) ^{\top }\). Matrix \({\varvec{Q}}\) is any suitable contrast matrix of dimension \(m \times 2\), where \(m = m_0 + m_1\), with \(m_0\) being the number of controls and \(m_1\) the number of MS patients. For our application, we consider

$$\begin{aligned} {\varvec{Q}} = \begin{pmatrix} {\varvec{1}}_{m_0} & {\varvec{0}}_{m_0}\\ {\varvec{0}}_{m_1} & {\varvec{1}}_{m_1} \end{pmatrix}, \end{aligned}$$

and a different amount of smoothing is assumed for \(f_{0}\) and \(f_{1}\); i.e., the penalty matrix acting on the vector of coefficients \({\varvec{\theta }}\) is of the form

$$\begin{aligned} {\varvec{P}} = \begin{pmatrix} \lambda _{1}{\varvec{D}}_q^{\top }{\varvec{D}}_q & {\varvec{0}}_{c \times c}\\ {\varvec{0}}_{c \times c} & \lambda _{2}{\varvec{D}}_q^{\top }{\varvec{D}}_q \end{pmatrix}. \end{aligned}$$

The reformulation as a mixed model proceeds in a similar fashion to that described in Sect. 4.2, in this case with

$$\begin{aligned} {\varvec{X}} &= [{\varvec{Q}}\otimes {\varvec{B}}{\varvec{U}}_{0}],\\ {\varvec{Z}} &= [{\varvec{Q}}\otimes {\varvec{B}}{\varvec{U}}_{+}:{\varvec{I}}_m\otimes \breve{{\varvec{B}}}], \end{aligned}$$

and

$$\begin{aligned} {\varvec{G}}^{-1} = \begin{pmatrix} \sigma _{1}^{-2}{\varvec{\Sigma }}_{+} & {\varvec{0}}_{d \times d} & {\varvec{0}}_{d\times \left( m \breve{d}\right) }\\ {\varvec{0}}_{d\times d} & \sigma _{2}^{-2}{\varvec{\Sigma }}_{+} & {\varvec{0}}_{d\times \left( m \breve{d}\right) }\\ {\varvec{0}}_{\left( m \breve{d}\right) \times d} & {\varvec{0}}_{\left( m \breve{d}\right) \times d} & {\varvec{I}}_{m}\otimes \breve{{\varvec{G}}}^{-1} \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} \breve{{\varvec{G}}}^{-1} = \sigma _3^{-2}\breve{{\varvec{D}}}_{\breve{q}}^{\top }\breve{{\varvec{D}}}_{\breve{q}} + \sigma _4^{-2}{\varvec{I}}_{\breve{d}}. \end{aligned}$$
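
As an illustration of the constructions above, the following minimal sketch (ours; random matrices stand in for the B-spline bases \({\varvec{B}}\) and \(\breve{{\varvec{B}}}\), and all dimensions are invented) assembles \({\varvec{Q}}\) and the Kronecker-product design blocks of model (28):

```python
import numpy as np

m0, m1, s = 4, 3, 25        # controls, MS cases, time points (illustrative)
c, d_breve = 10, 6          # basis dimensions for f_z and g_j (illustrative)
m = m0 + m1

# contrast matrix Q (m x 2): controls first, then cases
Q = np.zeros((m, 2))
Q[:m0, 0] = 1.0
Q[m0:, 1] = 1.0

rng = np.random.default_rng(0)
B = rng.standard_normal((s, c))              # stand-in for the basis B
B_breve = rng.standard_normal((s, d_breve))  # stand-in for the basis B-breve

# design blocks of model (28): [Q kron B] theta + [I_m kron B-breve] theta-breve
X_pop = np.kron(Q, B)                   # population-level block
Z_subj = np.kron(np.eye(m), B_breve)    # subject-level block
assert X_pop.shape == (s * m, 2 * c)
assert Z_subj.shape == (s * m, m * d_breve)
```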


Cite this article

Rodríguez-Álvarez, M.X., Durban, M., Lee, DJ. et al. On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing. Stat Comput 29, 483–500 (2019). https://doi.org/10.1007/s11222-018-9818-2
