Abstract
Factor analysis models have been one of the most popular multivariate methods for data analysis among psychometricians, behavioral and educational researchers. But these models, originally developed for normally distributed observed variables, can be seriously affected by the presence of influential observations and censored data. Motivated by this situation, in this paper we propose a likelihood-based estimation for a multivariate Tobit confirmatory factor analysis model using the Student-t distribution (t-TCFA model). An EM-type algorithm is developed for computing the maximum likelihood estimates, obtaining as a byproduct the standard errors of the fixed effects and the exact likelihood value. Unlike other approaches proposed in the literature, our exact EM-type algorithm uses closed form expressions at the E-step based on the first two moments of a truncated multivariate Student-t distribution with the advantage that these expressions can be computed using standard statistical software. The performance of the proposed methods is illustrated through a simulation study and the analysis of a real dataset of early grade reading assessment test scores.
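To make the data structure behind the t-TCFA model concrete, the following is a minimal simulation sketch, not the authors' implementation. It relies on the standard scale-mixture representation of the Student-t distribution (a normal vector divided by the square root of a Gamma variable). The function name `simulate_t_tcfa`, the parameter values, the diagonal error variances `psi`, and the choice of left-censoring at a single common `limit` are all illustrative assumptions; the censoring scheme actually used for the EGRA data may differ.

```python
import math
import random


def simulate_t_tcfa(n, beta, loadings, psi, nu, limit, seed=0):
    """Draw n observations from a Student-t factor model, then left-censor.

    Hypothetical helper for illustration only:
      y = beta + loadings @ z + e, with (z, e) jointly Student-t through a
      shared mixing variable u ~ Gamma(shape=nu/2, rate=nu/2).
    Values at or below `limit` are recorded as `limit` (assumed scheme).
    """
    rng = random.Random(seed)
    p, q = len(loadings), len(loadings[0])
    data, censored = [], []
    for _ in range(n):
        # random.gammavariate takes (shape, scale); scale = 2/nu gives rate nu/2.
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
        # Latent factors: standard normal scaled by 1/sqrt(u).
        z = [rng.gauss(0.0, 1.0) / math.sqrt(u) for _ in range(q)]
        row, flags = [], []
        for j in range(p):
            err = rng.gauss(0.0, math.sqrt(psi[j])) / math.sqrt(u)
            y = beta[j] + sum(loadings[j][k] * z[k] for k in range(q)) + err
            flags.append(y <= limit)   # True when the response is censored
            row.append(max(y, limit))  # observed value under left-censoring
        data.append(row)
        censored.append(flags)
    return data, censored


# Illustrative values: four observed variables, two factors.
beta = [0.0, 0.0, 0.0, 0.0]
loadings = [[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.7]]
psi = [0.5, 0.5, 0.5, 0.5]
y_obs, cens = simulate_t_tcfa(200, beta, loadings, psi, nu=4.0, limit=-1.0, seed=1)
```

An estimation routine would see only `y_obs` and `cens`; the latent factors and the mixing variable are the quantities that an EM-type algorithm treats as missing data.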
Notes
This type of censoring scheme relies on the assumption that the time allotted for the task was not sufficient to better estimate the responses of the students.
References
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)
Arellano-Valle, R.B., Bolfarine, H.: On some characterizations of the t-distribution. Stat. Probab. Lett. 25, 79–85 (1995)
Arellano-Valle, R.B., Bolfarine, H., Lachos, V.H.: Skew-normal linear mixed models. J. Data Sci. 3, 415–438 (2005)
Arellano-Valle, R.B., Genton, M.G.: Multivariate extended skew-t distributions and related families. Metron 68(3), 201–234 (2010)
Azzalini, A., Genton, M.: Robust likelihood methods based on the skew-t and related distributions. Int. Stat. Rev. 76, 1490–1507 (2008)
Bozdogan, H.: Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52(3), 345–370 (1987)
Brown, T., Moore, M.: Confirmatory factor analysis. In: Hoyle, R.H. (ed.) Handbook of Structural Equation Modeling, pp. 361–379. Guilford Press, New York (2012)
Costa, D.R., Lachos, V.H., Bazan, J.L., Azevedo, C.L.N.: Estimation methods for multivariate Tobit confirmatory factor analysis. Comput. Stat. Data Anal. 79, 248–260 (2014)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38 (1977)
DiStefano, C., Zhu, M., Mindrila, D.: Understanding and using factor scores: considerations for the applied researcher. Pract. Assess. Res. Eval. 14(20), 1–11 (2009)
Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: Multivariate normal and t distributions. R package version 0.9-8 (2009). URL: http://CRAN.R-project.org/package=mvtnorm
Ho, H.J., Lin, T.-I., Chen, H.-Y., Wang, W.-L.: Some results on the truncated multivariate t distribution. J. Stat. Plan. Inference 142(1), 25–40 (2012)
Jacqmin-Gadda, H., Thiebaut, R., Chene, G., Commenges, D.: Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics 1(4), 355–368 (2000)
Kamakura, W.A., Wedel, M.: Exploratory Tobit factor analysis for multivariate censored data. Multivar. Behav. Res. 36, 53–82 (2001)
Lange, K.L., Little, R.J., Taylor, J.M.: Robust statistical modeling using the t distribution. J. Am. Stat. Assoc. 84(408), 881–896 (1989)
Lin, T.-I., Wu, P.H., McLachlan, G.J., Lee, S.X.: The skew-t factor analysis model. arXiv preprint arXiv:1310.5336 (2013)
Louis, T.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. B 44, 226–233 (1982)
Lucas, A.: Robustness of the Student t based M-estimator. Commun. Stat. 26, 1165–1182 (1997)
Matos, L.A., Lachos, V.H., Balakrishnan, N., Labra, F.V.: Influence diagnostics in linear and nonlinear mixed-effects models with censored data. Comput. Stat. Data Anal. 57(1), 450–464 (2013a)
Matos, L.A., Prates, M.O., Chen, M.-H., Lachos, V.H.: Likelihood-based inference for mixed-effects models with censored response using the multivariate-t distribution. Statistica Sinica 23, 1323–1342 (2013b)
McLachlan, G., Bean, R.: Extension of the mixture of factor analyzers model to incorporate the multivariate \(t\)-distribution. Comput. Stat. Data Anal. 51, 5327–5338 (2007)
Meng, X.L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80, 267–278 (1993)
Muthén, B.O.: Tobit factor analysis. Br. J. Math. Stat. Psychol. 42, 241–250 (1989)
Prates, M.O., Costa, D.R., Lachos, V.H.: Generalized linear mixed models for correlated binary data with t-link. Stat. Comput. (2013). doi:10.1007/s11222-013-9423-3
R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2014). URL: http://www.R-project.org
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Vaida, F., Fitzgerald, A., DeGruttola, V.: Efficient hybrid EM for linear and nonlinear mixed effects models with censored response. Comput. Stat. Data Anal. 51(12), 5718–5730 (2007)
Vaida, F., Liu, L.: Fast implementation for Normal mixed effects models with censored response. J. Comput. Graph. Stat. 18(4), 797–817 (2009)
Wang, W., Lin, T.: An efficient ECM algorithm for maximum likelihood estimation in mixtures of \(t\)-factor analyzers. Comput. Stat. 28, 751–769 (2013)
Wu, L.: Mixed Effects Models for Complex Data. Chapman & Hall/CRC, Boca Raton (2010)
Zhang, J., Li, J., Liu, C.: Robust factor analysis using the multivariate \(t\)-distribution. Statistica Sinica 24, 291–312 (2014)
Zhou, X., Liu, X.: The Monte Carlo EM method for estimating multivariate Tobit latent variable models. J. Stat. Comput. Simul. 79, 1095–1107 (2009)
Zhou, X., Tan, C.: Maximum likelihood estimation of Tobit factor analysis for multivariate t-distribution. Commun. Stat. 39, 1–16 (2010)
Acknowledgments
The authors are grateful to the editor, associate editor and two anonymous reviewers for their valuable comments and suggestions that greatly improved this paper. We would also like to thank Dr. Jorge Bazán for supplying the EGRA data. The research of Luis M. Castro was supported by Grant FONDECYT 1130233 from the Chilean government and Grant 2012/19445-0 from FAPESP-Brazil. Denise Costa acknowledges support from CAPES-Brazil. Marcos Prates acknowledges support from CNPq and FAPEMIG-Brazil (Grant APQ-00570-13). The research of Victor Lachos was supported by FAPESP-Brazil (Grant 14/02938-9) and by CNPq-Brazil (Grant 305054/2011-2).
Appendices
Appendix A: Proofs of Propositions
Proof of Proposition 1
The proof of \((i)\) is straightforward from Equation (1). The proof of \((ii)\) follows from Proposition 4 given in Arellano-Valle and Genton (2010) by setting \(\lambda =\tau =0\).
Proof of Proposition 2
If \(\mathbf{X}\sim t_p(\varvec{\mu },\varvec{\varSigma },\nu )\), then we can write
Then, it follows that
which concludes the proof.
Proof of Proposition 3
If \(\mathbf{X}\sim t_p(\varvec{\mu },\varvec{\varSigma },\nu )\), and using the result given in Proposition 1-\((ii)\), we have
and the proof concludes by noting that
where \(\mathbf{X}_2\sim t_{p_2}\left( \varvec{\mu }_{2.1},\widetilde{\varvec{\varSigma }}^*_{22.1},\nu +p_1+2r \right) .\)
Appendix B: Details of the EM algorithm
First, we introduce Lemma 1, which is used in our procedures. Its proof can be found in Arellano-Valle et al. (2005).
Lemma 1
Let \(\mathbf{Y}\sim N_p(\varvec{\mu },\varvec{\varSigma })\) and \( \mathbf{x}\sim N_q(\varvec{\eta },\varvec{\varOmega })\). Then,
where \(\varvec{\varLambda }= (\varvec{\varOmega }^{-1} + \mathbf{A}^{\top }\varvec{\varSigma }^{-1}\mathbf{A})^{-1}\).
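The display of Lemma 1 is not reproduced here. Under these hypotheses, the identity usually stated in this literature is the Gaussian product factorization \(\phi _p(\mathbf{y}|\varvec{\mu }+\mathbf{A}\mathbf{x},\varvec{\varSigma })\,\phi _q(\mathbf{x}|\varvec{\eta },\varvec{\varOmega }) = \phi _p(\mathbf{y}|\varvec{\mu }+\mathbf{A}\varvec{\eta },\varvec{\varSigma }+\mathbf{A}\varvec{\varOmega }\mathbf{A}^{\top })\,\phi _q(\mathbf{x}|\varvec{\eta }+\varvec{\varLambda }\mathbf{A}^{\top }\varvec{\varSigma }^{-1}(\mathbf{y}-\varvec{\mu }-\mathbf{A}\varvec{\eta }),\varvec{\varLambda })\); treating this as the content of the lemma is an assumption. The sketch below checks the scalar case (\(p=q=1\)) numerically. The helper names and parameter values are illustrative only.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def lemma_sides(y, x, mu, a, sigma2, eta, omega2):
    """Return both sides of the assumed identity in the scalar case p = q = 1."""
    # Scalar analogue of Lambda = (Omega^{-1} + A' Sigma^{-1} A)^{-1}.
    lam = 1.0 / (1.0 / omega2 + a * a / sigma2)
    # Left side: conditional density of y given x times the density of x.
    lhs = normal_pdf(y, mu + a * x, sigma2) * normal_pdf(x, eta, omega2)
    # Right side: marginal density of y times the conditional density of x given y.
    cond_mean = eta + lam * a * (y - mu - a * eta) / sigma2
    rhs = normal_pdf(y, mu + a * eta, sigma2 + a * a * omega2) * normal_pdf(x, cond_mean, lam)
    return lhs, rhs


# Arbitrary illustrative values; the two sides should agree up to rounding.
lhs, rhs = lemma_sides(y=1.3, x=-0.4, mu=0.5, a=2.0, sigma2=1.5, eta=0.2, omega2=0.8)
```

A factorization of this kind is what allows the latent factors to be integrated out in closed form, leaving a marginal density for the observed variables and a normal conditional density for the factors.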
The derivatives of the function \(Q(\varvec{\theta }|\varvec{\theta }^{(k)})\) with respect to \(\varvec{\beta }\), \(\varvec{\varLambda }\) and \(\varvec{\varPsi }\) lead to
where
Setting these derivatives equal to zero and solving yields the ML estimates presented in (8)–(10).
Appendix C: Complementary results of the simulation study
Figures 10 and 11 present the absolute bias and the MSE of \(\lambda _{42}\), \(\varPsi _{33}\) and \(\varPsi _{44}\).
Cite this article
Castro, L.M., Costa, D.R., Prates, M.O. et al. Likelihood-based inference for Tobit confirmatory factor analysis using the multivariate Student-t distribution. Stat Comput 25, 1163–1183 (2015). https://doi.org/10.1007/s11222-014-9502-0
DOI: https://doi.org/10.1007/s11222-014-9502-0