Composite likelihood and maximum likelihood methods for joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data

Bo Zhang¹,
Wei Liu²,
Hui Zhang³,
Qihui Chen⁴ &
Zhiwei Zhang¹

Abstract

Joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data has been proposed to study the relationship between diseases and their related biomarkers. However, statistical inference for joint latent class models has proved very challenging because of the computational complexity of seeking maximum likelihood estimates. In this article, we propose a series of composite likelihoods for maximum composite likelihood estimation, as well as an enhanced Monte Carlo expectation–maximization (MCEM) algorithm for maximum likelihood estimation, in the context of joint latent class models. Theoretically, the maximum composite likelihood estimates are consistent and asymptotically normal. Numerically, we show that, compared with the MCEM algorithm that maximizes the full likelihood, the composite likelihood approach coupled with the quasi-Newton method not only substantially reduces computational complexity and duration but also retains comparable estimation efficiency.

References

Bellio R, Varin C (2005) A pairwise likelihood approach to generalized linear models with crossed random effects. Stat Model 5:217–227
Booth JG, Hobert JP (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc, Ser B 61:265–285
Buck Louis GM, Weiner JM, Whitcomb BW, Sperrazza R, Schisterman EF, Lobdell DT, Crickard K, Greizerstein H, Kostyniak PJ (2005) Environmental PCB exposure and risk of endometriosis. Hum Reprod 20(1):279–285
Byrd RH, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16:1190–1208
Cave M, Appana S, Patel M, Falkner KC, McClain CJ, Brock G (2010) Polychlorinated biphenyls, lead, and mercury are associated with liver disease in American adults: NHANES 2003–2004. Environ Health Perspect 118(12):1735–1742
Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (2008) National Health and Nutrition Examination Survey Data. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2003–2004, Hyattsville
Chao HR, Wang SL, Lee WJ, Wang YF, Päpke O (2007) Levels of polybrominated diphenyl ethers (PBDEs) in breast milk from central Taiwan and their relation to infant birth outcome and maternal menstruation effects. Environ Int 33(2):239–245
Chan JS, Kuk AY (1997) Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 53:86–97
Clayton D, Rasbash J (1999) Estimation in large crossed random-effect models by data augmentation. J R Stat Soc, Ser A 162:425–436
Coull BA, Hobert JP, Ryan LM, Holmes LB (2001) Crossed random effect models for multiple outcomes in a study of teratogenesis. J Am Stat Assoc 96(456):1194–1204
Ding G, Shi R, Gao Y, Zhang Y, Kamijima M, Sakai K, Wang G, Feng C, Tian Y (2012) Pyrethroid pesticide exposure and risk of childhood acute lymphocytic leukemia in Shanghai. Environ Sci Technol 46(24):13480–13487
Gennings C, Sabo R, Carney E (2010) Identifying subsets of complex mixtures most associated with complex diseases. Epidemiology 21(4):S77–S84
Geyer CJ, Thompson EA (1992) Constrained Monte Carlo maximum likelihood for dependent data (with discussion). J R Stat Soc, Ser B 54(3):657–699
Giboney PT (2005) Mildly elevated liver transaminase levels in the asymptomatic patient. Am Fam Physician 71(6):1105–1110
Herbstman JB, Sjödin A, Jones R, Kurzon M, Lederman SA, Rauh VA, Needham LL, Wang R, Perera FP (2008) Prenatal exposure to PBDEs and neurodevelopment. Epidemiology 19(6):S348
Kortenkamp A (2008) Low dose mixture effects of endocrine disrupters: implications for risk assessment and epidemiology. Int J Androl 31(2):233–237
Kratz A, Ferraro M, Sluss PM, Lewandrowski KB (2004) Case records of the Massachusetts General Hospital: laboratory values. N Engl J Med 351(15):1549–1563
Lin X (1997) Variance component testing in generalised linear models with random effects. Biometrika 84:309–326
Lindsay B (1988) Composite likelihood methods. Contemp Math 80:220–239
Main KM, Kiviranta H, Virtanen HE, Sundqvist E, Tuomisto JT, Tuomisto J, Vartiainen T, Skakkebaek NE, Toppari J (2007) Flame retardants in placenta and breast milk and cryptorchidism in newborn boys. Environ Health Perspect 115(10):1519–1526
McCulloch CE (1997) Maximum likelihood algorithms for generalized linear mixed models. J Am Stat Assoc 92:162–170
Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York
Olsen MK, Schafer JL (2001) A two-part random-effects model for semicontinuous longitudinal data. J Am Stat Assoc 96:730–745
Pinheiro JC, Chao EC (2006) Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. J Comput Graph Stat 15:58–81
Renard D, Molenberghs G, Geys H (2004) A pairwise likelihood approach to estimation in multilevel probit models. Comput Stat Data Anal 44(4):649–667
Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21:5–42
Xie Y, Chen Z, Albert PS (2013) A crossed random effects modeling approach for estimating diagnostic accuracy from ordinal ratings without a gold standard. Stat Med 32(20):3472–3485
Zhang B, Chen Z, Albert PS (2012) Latent class models for joint analysis of disease prevalence and high-dimensional semicontinuous biomarker data. Biostatistics 13(1):74–88

Acknowledgments

We sincerely thank the two anonymous reviewers, the Associate Editor, and the Editors for their valuable comments, which have substantially improved this manuscript. The views expressed in this article are those of the authors and do not necessarily represent the views of the US Food and Drug Administration.

Author information

Authors and Affiliations

Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, 20993, USA
Bo Zhang & Zhiwei Zhang
Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, People’s Republic of China
Wei Liu
Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN, 38105, USA
Hui Zhang
Department of Applied Economics, College of Economics and Management, China Agricultural University, Beijing, 100083, People’s Republic of China
Qihui Chen

Corresponding author

Correspondence to Wei Liu.

Appendix 1: Model selection

In practice, after carrying out MCLE and MLE in joint latent class modeling for each fixed K, data analysts need to determine the optimal number of latent classes. In the context of joint latent class modeling, a unified model selection strategy that can be applied to both MCLE and MLE is preferable. Here, we propose to employ the simulated likelihood approach (Geyer and Thompson 1992; Xie et al. 2013), combined with the Akaike information criterion (AIC), to select the best K. Let $\hat{\varvec{\theta }}$ be the estimate obtained from the MCLE or MLE procedure. Note that the marginal likelihood (11), or equivalently (8), is obtained by integrating (summing) over the two latent processes $L_i$ and $\mathbf {b}_j$. By Monte Carlo integration, the maximized likelihood $L(\hat{\varvec{\theta }})$ can be approximated by

$$\begin{aligned} \hat{L}(\hat{\varvec{\theta }})&= \frac{1}{\Lambda } \sum _{\lambda =1}^{\Lambda } \Bigg [ \prod _{i=1}^{I} \Bigg \{ \frac{e^{\,y_i\left( \hat{\beta }_0+\hat{\beta }_1L_i^{(\lambda )}+ \mathbf {w}_{i}^{\prime } \hat{\varvec{\gamma }}\right) }}{1+e^{\hat{\beta }_0+\hat{\beta }_1L_i^{(\lambda )}+\mathbf {w}_{i}^{\prime }\hat{\varvec{\gamma }}}} \\&\quad \times \prod _{j=1}^{J}\frac{f^{\,u_{ij}}_{V_{ij}|L_i^{(\lambda )}, \mathbf {b}_j^{(\lambda )}, \mathbf {z}_{ij}}(v_{ij})\, e^{u_{ij}\left( \hat{\eta }_0+\hat{\eta }_1h\left( \hat{\mu }_{ij}(L_i^{(\lambda )}, \mathbf {b}_j^{(\lambda )}, \mathbf {z}_{ij}), \mathbf {t}_{ij},\hat{\varvec{\zeta }}\right) \right) }}{1+ e^{\hat{\eta }_0+\hat{\eta }_1h\left( \hat{\mu }_{ij}(L_i^{(\lambda )}, \mathbf {b}_j^{(\lambda )}, \mathbf {z}_{ij}), \mathbf {t}_{ij},\hat{\varvec{\zeta }}\right) }} \Bigg \} \Bigg ] \end{aligned}$$

where $\Lambda $ is the total number of simulated realizations ($\Lambda =10^{6}$ in the analysis of the case study), $L_i^{(\lambda )}$ is the $\lambda $th simulated realization from the multinomial distribution Multinomial$(1,(\hat{\pi }_0,\ldots , \,\hat{\pi }_{K-1}))$ for the ith subject, and $\mathbf {b}_j^{(\lambda )}$ is the $\lambda $th simulated realization from $N\left( (0, 0)^\prime , \left( \begin{array}{cc} \hat{\sigma }_0^2 &{} \hat{\rho }\hat{\sigma }_0\hat{\sigma }_1 \\ \hat{\rho }\hat{\sigma }_0\hat{\sigma }_1 &{} \hat{\sigma }_1^2 \\ \end{array} \right) \right) $ for the jth biomarker. Once $\hat{L}(\hat{\varvec{\theta }})$ is obtained for each candidate K, the corresponding AIC values can be calculated accordingly.
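As an illustration only, the simulated-likelihood step could be sketched in Python as follows. This is not the authors' implementation. The callable `cond_loglik` is a hypothetical user-supplied function that returns the logarithm of the bracketed product for one simulated realization of the latent classes and random effects. The toy example keeps only the disease part of the model, with no covariates and made-up parameter values, so that the sketch runs on its own. The AIC is taken in its usual form, $-2\log \hat{L}(\hat{\varvec{\theta }}) + 2p$, with $p$ the number of free parameters. The average is computed on the log scale because a product over all subjects and biomarkers can underflow in floating point.

```python
import numpy as np
from scipy.special import expit, logsumexp


def simulated_log_likelihood(cond_loglik, pi_hat, cov_b_hat, n_subjects,
                             n_biomarkers, n_draws=10_000, seed=0):
    """Monte Carlo estimate of log L(theta_hat).

    cond_loglik(L, b) is a hypothetical callable returning the log of the
    bracketed product for one draw; L has shape (n_subjects,) and
    b has shape (n_biomarkers, 2).
    """
    rng = np.random.default_rng(seed)
    n_classes = len(pi_hat)
    # Latent class of each subject, one row per simulated realization.
    L = rng.choice(n_classes, size=(n_draws, n_subjects), p=pi_hat)
    # Bivariate normal random effects of each biomarker.
    b = rng.multivariate_normal(np.zeros(2), cov_b_hat,
                                size=(n_draws, n_biomarkers))
    logliks = np.array([cond_loglik(L[s], b[s]) for s in range(n_draws)])
    # Log of the sample mean of the likelihoods, computed stably.
    return logsumexp(logliks) - np.log(n_draws)


def aic(log_lik, n_params):
    """Akaike information criterion: -2 log-likelihood + 2 * (free parameters)."""
    return -2.0 * log_lik + 2.0 * n_params


# Toy setting (all values are assumptions made for illustration).
y = np.array([1, 0, 1])                    # disease indicators of 3 subjects
beta0, beta1 = -0.5, 1.0                   # logistic coefficients
pi_hat = np.array([0.6, 0.4])              # class probabilities, K = 2
cov_b_hat = np.array([[1.0, 0.3],
                      [0.3, 1.0]])         # random-effect covariance


def toy_cond_loglik(L, b):
    # Disease part only; the biomarker part (and hence b) is omitted here.
    p = expit(beta0 + beta1 * L)
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))


log_lik_hat = simulated_log_likelihood(toy_cond_loglik, pi_hat, cov_b_hat,
                                       n_subjects=3, n_biomarkers=2,
                                       n_draws=20_000, seed=1)
# Three free parameters in the toy model: beta0, beta1 and one class probability.
print(log_lik_hat, aic(log_lik_hat, n_params=3))
```

In a model selection run, the same routine would be called once for each candidate K with the corresponding estimates, and the K with the smallest AIC would be retained.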

About this article

Cite this article

Zhang, B., Liu, W., Zhang, H. et al. Composite likelihood and maximum likelihood methods for joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data. Comput Stat 31, 425–449 (2016). https://doi.org/10.1007/s00180-015-0597-3

Received: 31 October 2014
Accepted: 12 June 2015
Published: 05 July 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s00180-015-0597-3
