Abstract
This paper deals with the problem of maximum likelihood estimation for a mixture of skew Student-t-normal distributions, which is a novel model-based tool for clustering heterogeneous (multiple groups) data in the presence of skewed and heavy-tailed outcomes. We present two analytically simple EM-type algorithms for iteratively computing the maximum likelihood estimates. The observed information matrix is derived for obtaining the asymptotic standard errors of parameter estimates. A small simulation study is conducted to demonstrate the superiority of the skew Student-t-normal distribution compared to the skew t distribution. The proposed methodology is particularly useful for analyzing multimodal asymmetric data as produced by major biotechnological platforms like flow cytometry. We provide such an application with the help of an illustrative example.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Azzalini, A.: The skew-normal distribution and related multivariate families (with discussion). Scand. J. Stat. 32, 159–188 (2005)
Azzalini, A., Capitaino, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. B 65, 367–389 (2003)
Barndorff-Nielsen, O.E.: Normal inverse Gaussian distributions and stochastic volatility modelling. Scand. J. Stat. 24, 1–13 (1997)
Basford, K.E., Greenway, D.R., McLachlan, G.J., Peel, D.: Standard errors of fitted means under normal mixture. Comput. Stat. 12, 1–17 (1997)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Singapore (2006)
Cabral, C.R.B., Bolfarine, H., Pereira, J.R.G.: Bayesian density estimation using skew student-t-normal mixtures. Comput. Stat. Data Anal. 52, 5075–5090 (2008)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. B 39, 1–38 (1977)
Fraley, C., Raftery, A.E.: How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1998)
Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–612 (2002)
Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)
Frühwirth-Schnatter, S., Pyne, S.: Bayesian inference for finite mixtures of univariate and multivariate skew normal and skew-t distributions. Biostatistics 11, 317–336 (2010)
Glynn, E.F.: FCSExtract Utility. Stowers Institute for Medical Research. Online available at: http://research.stowers-institute.org/efg/ScientificSoftware/Utility/FCSExtract/ (2006)
Gómez, H.W., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18, 395–407 (2007)
Hahne, F., LeMeur, N., Brinkman, R.R., Ellis, B., Haaland, P., Sarkar, D., Spidlen, J., Strain, E., Gentleman, R.: flowCore: a Bioconductor package for high throughput flow cytometry. BMC Bioinform. 10, 106 (2009)
Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)
Keribin, C.: Consistent estimation of the order of mixture models. Sankhyā 62, 49–66 (2000)
Li, J.Q., Barron, A.R.: Mixture density estimation. In: Advances in Neural Information Processing Systems 12. MIT Press, Cambridge (2000)
Lin, T.I.: Maximum likelihood estimation for multivariate skew normal mixture models. J. Multivar. Anal. 100, 257–265 (2009)
Lin, T.I.: Robust mixture modeling using multivariate skew t distributions. Stat. Comput. 20, 343–356 (2010)
Lin, T.I., Lee, J.C., Hsieh, W.J.: Robust mixture modeling using the skew t distribution. Stat. Comput. 17, 81–92 (2007a)
Lin, T.I., Lee, J.C., Yen, S.Y.: Finite mixture modelling using the skew normal distribution. Stat. Sin. 17, 909–927 (2007b)
Liu, C.H., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 633–648 (1994)
Louis, T.A.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. B 44, 226–233 (1982)
McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Application to Clustering. Dekker, New York (1988)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, New York (2008)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
McNicholas, P.D., Murphy, T.B.: Parsimonious Gaussian mixture models. Stat. Comput. 18, 285–296 (2008)
Meinicke, P., Brodag, T., Fricke, W.F., Waack, S.: P-value based visualization of codon usage data. Algorithms Mol. Biol. 1, 10 (2006)
Meng, X.L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80, 267–278 (1993)
Nadarajah, S., Kotz, S.: Skewed distributions generated by the normal kernel. Stat. Probab. Lett. 65, 269–277 (2003)
Pyne, S., Hu, X., Wang, K., Rossin, E., Lin, T.I., Maier, L., Baecher-Allan, C., McLachlan, G.J., Tamayo, P., Hafler, D.A., De Jager, P.L., Mesirov, J.P.: Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. USA 106, 8519–8524 (2009)
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2008)
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate skew distributions with application to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Titterington, D.M., Smith, A.F.M., Markov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)
Vlassis, N., Likas, A.: A greedy EM algorithm for Gaussian mixture learning. Neural Process. Lett. 15, 77–87 (2002)
Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Proceedings of DICTA 2009, Conference of Digital Image Computing: Techniques and Applications, Melbourne, pp. 526–531. IEEE Computer Society, Los Alamitos (2009)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ho, H.J., Pyne, S. & Lin, T.I. Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Stat Comput 22, 287–299 (2012). https://doi.org/10.1007/s11222-010-9225-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11222-010-9225-9