Abstract
Joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data has been proposed to study the relationship between diseases and their related biomarkers. However, statistical inference for the joint latent class modeling approach has proved very challenging because of the computational complexity of finding maximum likelihood estimates. In this article, we propose a series of composite likelihoods for maximum composite likelihood estimation, as well as an enhanced Monte Carlo expectation–maximization (MCEM) algorithm for maximum likelihood estimation, in the context of joint latent class models. Theoretically, the maximum composite likelihood estimates are consistent and asymptotically normal. Numerically, we show that, compared with the MCEM algorithm that maximizes the full likelihood, the composite likelihood approach coupled with the quasi-Newton method not only substantially reduces the computational complexity and run time but also retains comparable estimation efficiency.
References
Bellio R, Varin C (2005) A pairwise likelihood approach to generalized linear models with crossed random effects. Stat Model 5:217–227
Booth JG, Hobert JP (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc, Ser B 61:265–285
Buck Louis GM, Weiner JM, Whitcomb BW, Sperrazza R, Schisterman EF, Lobdell DT, Crickard K, Greizerstein H, Kostyniak PJ (2005) Environmental PCB exposure and risk of endometriosis. Hum Reprod 20(1):279–285
Byrd RH, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16:1190–1208
Cave M, Appana S, Patel M, Falkner KC, McClain CJ, Brock G (2010) Polychlorinated biphenyls, lead, and mercury are associated with liver disease in American adults: NHANES 2003–2004. Environ Health Perspect 118(12):1735–1742
Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (2008) National Health and Nutrition Examination Survey Data. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2003–2004, Hyattsville
Chao HR, Wang SL, Lee WJ, Wang YF, Päpke O (2007) Levels of polybrominated diphenyl ethers (PBDEs) in breast milk from central Taiwan and their relation to infant birth outcome and maternal menstruation effects. Environ Int 33(2):239–245
Chan JS, Kuk AY (1997) Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 53:86–97
Clayton D, Rasbash J (1999) Estimation in large crossed random-effect models by data augmentation. J R Stat Soc, Ser A 162:425–436
Coull BA, Hobert JP, Ryan LM, Holmes LB (2001) Crossed random effect models for multiple outcomes in a study of teratogenesis. J Am Stat Assoc 96(456):1194–1204
Ding G, Shi R, Gao Y, Zhang Y, Kamijima M, Sakai K, Wang G, Feng C, Tian Y (2012) Pyrethroid pesticide exposure and risk of childhood acute lymphocytic leukemia in Shanghai. Environ Sci Technol 46(24):13480–13487
Gennings C, Sabo R, Carney E (2010) Identifying subsets of complex mixtures most associated with complex diseases. Epidemiology 21(4):S77–S84
Geyer CJ, Thompson EA (1992) Constrained Monte Carlo maximum likelihood for dependent data (with discussion). J R Stat Soc, Ser B 54(3):657–699
Giboney PT (2005) Mildly elevated liver transaminase levels in the asymptomatic patient. Am Fam Physician 71(6):1105–1110
Herbstman JB, Sjödin A, Jones R, Kurzon M, Lederman SA, Rauh VA, Needham LL, Wang R, Perera FP (2008) Prenatal exposure to PBDEs and neurodevelopment. Epidemiology 19(6):S348
Kortenkamp A (2008) Low dose mixture effects of endocrine disrupters: implications for risk assessment and epidemiology. Int J Androl 31(2):233–237
Kratz A, Ferraro M, Sluss PM, Lewandrowski KB (2004) Case records of the Massachusetts General Hospital: laboratory values. N Engl J Med 351(15):1549–1563
Lin X (1997) Variance component testing in generalised linear models with random effects. Biometrika 84:309–326
Lindsay B (1998) Composite likelihood methods. Contemp Math 80:220–239
Main KM, Kiviranta H, Virtanen HE, Sundqvist E, Tuomisto JT, Tuomisto J, Vartiainen T, Skakkebaek NE, Toppari J (2007) Flame retardants in placenta and breast milk and cryptorchidism in newborn boys. Environ Health Perspect 115(10):1519–1526
McCulloch CE (1997) Maximum likelihood algorithms for generalized linear mixed models. J Am Stat Assoc 92:162–170
Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York
Olsen MK, Schafer JL (2001) A two-part random-effects model for semicontinuous longitudinal data. J Am Stat Assoc 96:730–745
Pinheiro JC, Chao EC (2006) Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. J Comput Graph Stat 15:58–81
Renard D, Molenberghs G, Geys H (2004) A pairwise likelihood approach to estimation in multilevel probit models. Comput Stat Data Anal 44(4):649–667
Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21:5–42
Xie Y, Chen Z, Albert PS (2013) A crossed random effects modeling approach for estimating diagnostic accuracy from ordinal ratings without a gold standard. Stat Med 32(20):3472–3485
Zhang B, Chen Z, Albert PS (2012) Latent class models for joint analysis of disease prevalence and high-dimensional semicontinuous biomarker data. Biostatistics 13(1):74–88
Acknowledgments
We sincerely thank the two anonymous reviewers, the Associate Editor, and the Editors for their valuable comments, which have substantially improved this manuscript. The views expressed in this article are those of the authors and do not necessarily represent the views of the US Food and Drug Administration.
Appendix 1: Model selection
In practice, after carrying out MCLE and MLE in joint latent class modeling for each fixed K, data analysts need to determine the optimal number of latent classes. In the context of joint latent class modeling, a unified model selection strategy that applies to both MCLE and MLE is preferable. Here, we propose to employ the simulated likelihood approach (Geyer and Thompson 1992; Xie et al. 2013), combined with the Akaike information criterion (AIC), to select the best K. Let \(\hat{\varvec{\theta }}\) be the estimates obtained from the MCLE or MLE procedure. Note that the marginal likelihood (11), or equivalently (8), is an integral (or sum) over the two latent processes \(L_i\) and \(\mathbf {b}_j\). By Monte Carlo integration, the maximized likelihood \(L(\hat{\varvec{\theta }})\) can be approximated by
where \(\Lambda \) is the total number of sampling realizations (\(\Lambda =10^{6}\) in the case study analysis), \(L_i^{(t)}\) is the \(t\)th simulated realization from the multinomial distribution Multinomial\((1,(\hat{\pi }_0,\ldots ,\hat{\pi }_{K-1}))\) for the \(i\)th subject, and \(\mathbf {b}_j^{(t)}\) is the \(t\)th simulated realization from \(N\left( (0, 0)^\prime , \left( \begin{array}{cc} \hat{\sigma }_0^2 & \hat{\rho }\hat{\sigma }_0\hat{\sigma }_1 \\ \hat{\rho }\hat{\sigma }_0\hat{\sigma }_1 & \hat{\sigma }_1^2 \\ \end{array} \right) \right) \) for the \(j\)th biomarker. Once \(\hat{L}(\hat{\varvec{\theta }})\) is obtained, the AIC values can be calculated accordingly.
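The simulated likelihood step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each Monte Carlo draw samples one class label per subject and one bivariate random effect per biomarker, evaluates the conditional log-likelihood of the data given those latent values, and averages the resulting likelihoods on the log scale. The function names, the Gaussian `toy_cond_loglik` (a stand-in for the paper's two-part semicontinuous model, which is not reproduced here), and the parameter count in the usage lines are all hypothetical.

```python
import numpy as np


def log_mean_exp(a):
    """Numerically stable log of the mean of exp(a)."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))


def simulated_loglik(y, pi_hat, cov_hat, cond_loglik, n_draws=1000, seed=0):
    """Monte Carlo approximation of the log marginal likelihood.

    y           : (subjects x biomarkers) data array
    pi_hat      : estimated class probabilities (length K)
    cov_hat     : estimated 2x2 covariance of the biomarker random effects
    cond_loglik : hypothetical callable (y, classes, b) -> log-likelihood
                  of the data given the sampled latent values
    """
    rng = np.random.default_rng(seed)
    n_subjects, n_biomarkers = y.shape
    log_terms = np.empty(n_draws)
    for t in range(n_draws):
        # One Multinomial(1, pi_hat) draw per subject, stored as a class index.
        classes = rng.choice(len(pi_hat), size=n_subjects, p=pi_hat)
        # One bivariate normal random effect per biomarker.
        b = rng.multivariate_normal(np.zeros(2), cov_hat, size=n_biomarkers)
        log_terms[t] = cond_loglik(y, classes, b)
    # Average the conditional likelihoods over draws, on the log scale.
    return log_mean_exp(log_terms)


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params


def toy_cond_loglik(y, classes, b, class_means=(0.0, 1.0)):
    """Stand-in conditional model (assumption): y_ij ~ N(mean[L_i] + b_j0, 1)."""
    mu = np.asarray(class_means)[classes][:, None] + b[:, 0][None, :]
    return float(np.sum(-0.5 * (y - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)))


# Usage on simulated toy data (all values here are illustrative).
toy_rng = np.random.default_rng(42)
toy_y = toy_rng.normal(size=(5, 3))
toy_ll = simulated_loglik(toy_y, [0.6, 0.4], 0.25 * np.eye(2),
                          toy_cond_loglik, n_draws=500, seed=1)
print("simulated log-likelihood:", toy_ll)
print("AIC:", aic(toy_ll, n_params=6))
```

Working on the log scale avoids underflow when the conditional likelihood is a product over many subject-biomarker pairs. To compare candidate values of K, the same routine would be run once per fitted model and the model with the smallest AIC retained.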
Cite this article
Zhang, B., Liu, W., Zhang, H. et al. Composite likelihood and maximum likelihood methods for joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data. Comput Stat 31, 425–449 (2016). https://doi.org/10.1007/s00180-015-0597-3
DOI: https://doi.org/10.1007/s00180-015-0597-3