Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering

Ming Tang^1,2,
Chao Gao^1,2,
Stephen A. Goutman^3,
Alexandr Kalinin^1,4,
Bhramar Mukherjee^2,
Yuanfang Guan^4 &
Ivo D. Dinov^1,4,5 (ORCID: orcid.org/0000-0003-3825-4375)

Abstract

Amyotrophic lateral sclerosis (ALS) is a complex progressive neurodegenerative disorder with an estimated prevalence of about 5 per 100,000 people in the United States. In this study, ALS disease progression is measured by the change in the Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) score over time. The study aims to provide clinical decision support for timely forecasting of the ALS trajectory, as well as accurate and reproducible computable phenotypic clustering of participants. Patient data were extracted from the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge data, most of which come from the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT) archive. We employed model-based and model-free machine-learning methods to predict the change in the ALSFRS score over time. Using training and testing data, we quantified and compared the performance of the different techniques. We also used unsupervised machine-learning methods to cluster the patients into separate computable phenotypes and to interpret the derived subcohorts. Direct prediction of univariate clinical outcomes using model-based (linear models) or model-free (machine-learning techniques: random forest and Bayesian additive regression trees, BART) methods was only moderately successful. The correlation coefficients between clinically observed changes in ALSFRS scores and the corresponding predicted values were 0.427 (random forest) and 0.545 (BART). The reliability of these results was assessed using internal statistical cross-validation as well as external data validation. Unsupervised clustering generated very reliable and consistent partitions of the patient cohort into four computable phenotypic subgroups. These clusters were explicated by identifying specific salient clinical features included in the PRO-ACT archive that discriminate between the derived subcohorts.
Alternative analytical methods differ in how well they forecast specific clinical phenotypes. Although predicting univariate clinical outcomes may be challenging, our results suggest that modern data science strategies are useful in clustering patients and generating evidence-based ALS hypotheses about complex interactions of multivariate factors. Predicting univariate clinical outcomes using the PRO-ACT data yields only marginal accuracy (about 70%). However, unsupervised clustering of participants into subgroups generates stable, reliable and consistent (exceeding 95%) computable phenotypes whose explication requires interpretation of multivariate sets of features.
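
As a minimal illustration of the evaluation described above, the following sketch computes a per-patient ALSFRS rate of change and the Pearson correlation between observed and predicted changes. It assumes the change is summarized as a least-squares slope in points per month, which the abstract does not specify. The function names and all visit and prediction values are hypothetical and are not taken from PRO-ACT.

```python
import numpy as np


def alsfrs_slope(months, scores):
    """Least-squares slope of ALSFRS score versus time (points per month).

    Assumption: progression is summarized as a linear slope; the study may
    have used a different definition of "change over time".
    """
    months = np.asarray(months, dtype=float)
    scores = np.asarray(scores, dtype=float)
    slope, _intercept = np.polyfit(months, scores, deg=1)
    return float(slope)


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])


if __name__ == "__main__":
    # Hypothetical visit schedule and scores for one patient.
    months = [0, 3, 6, 9, 12]
    scores = [38, 36, 33, 31, 28]
    print("slope (points/month):", alsfrs_slope(months, scores))

    # Hypothetical observed vs. predicted slopes for four patients.
    observed = [-0.83, -0.40, -1.20, -0.10]
    predicted = [-0.70, -0.55, -0.90, -0.30]
    print("Pearson r:", pearson_r(observed, predicted))
```

In this framing, a value such as the reported 0.545 would be the output of `pearson_r` on held-out patients, with the predictions supplied by whichever fitted model is being evaluated.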

Highlights

• Used a large ALS data archive covering 8,000 patients and 3 million records, including 200 clinical features tracked over 12 months.

• Employed model-based and model-free methods to predict ALSFRS changes over time, cluster patients into cohorts, and derive computable phenotypes.

• Research findings include stable, reliable, and consistent (95%) patient stratification into computable phenotypes. However, clinical explication of the results requires interpretation of multivariate information.
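
One way to quantify how consistently two clustering runs partition the same patients is the Rand index, the fraction of patient pairs on which both partitions agree (grouped together in both, or separated in both). The highlights do not state which measure underlies the reported 95% consistency, so the sketch below is an assumed stand-in rather than the authors' procedure. The function name and the label vectors are hypothetical.

```python
import numpy as np


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement if both partitions put the two items in
    the same cluster, or both put them in different clusters. The result is
    invariant to how cluster labels are numbered.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("label vectors must be 1-D and of equal length")
    n = a.size
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    same_a = a[:, None] == a[None, :]   # co-membership matrix, partition A
    same_b = b[:, None] == b[None, :]   # co-membership matrix, partition B
    upper = np.triu_indices(n, k=1)     # each unordered pair exactly once
    return float(np.mean(same_a[upper] == same_b[upper]))


if __name__ == "__main__":
    # Hypothetical cluster assignments from two runs on the same 8 patients.
    run_1 = [0, 0, 1, 1, 2, 2, 3, 3]
    run_2 = [2, 2, 0, 0, 3, 3, 1, 0]   # relabeled, one patient moved
    print("Rand index:", rand_index(run_1, run_2))
```

Because the index compares co-membership rather than label values, two runs that find the same four groups under different numbering still score 1.0.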

Acknowledgements

Colleagues from the Statistics Online Computational Resource (SOCR), Center for Complexity and Self-management of Chronic Disease (CSCD), Big Data Discovery Science (BDDS), and the Michigan Institute for Data Science (MIDAS) provided constructive feedback about this study.

Data used in the preparation of this article were obtained from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) Database. As such, the following organizations and individuals within the PRO-ACT Consortium contributed to the design and implementation of the PRO-ACT Database and/or provided data, but did not participate in the analysis of the data or the writing of this report: Neurological Clinical Research Institute, MGH; Northeast ALS Consortium; Novartis; Prize4Life Israel; Regeneron Pharmaceuticals, Inc.; Sanofi; Teva Pharmaceutical Industries, Ltd.

Finally, the authors are deeply indebted to the journal editors and the anonymous reviewers who provided valuable recommendations and constructive critiques that improved the manuscript.

Funding

This research was partially supported by NSF grants 1734853, 1636840, 1416953, 0716055 and 1023115, NIH grants P20 NR015331, P50 NS091856, UL1TR002240, P30 DK089503, U54 EB020406, P30 AG053760, and K23 ES027221, and the Elsie Andresen Fiske Research Fund. These funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Ming Tang, Chao Gao and Stephen A. Goutman contributed equally to this work.

Authors and Affiliations

Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, 48109, USA
Ming Tang, Chao Gao, Alexandr Kalinin & Ivo D. Dinov
Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
Ming Tang, Chao Gao & Bhramar Mukherjee
Department of Neurology, University of Michigan, Ann Arbor, MI, 48109, USA
Stephen A. Goutman
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
Alexandr Kalinin, Yuanfang Guan & Ivo D. Dinov
Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
Ivo D. Dinov

Contributions

MT: developed techniques, conducted analyses, and wrote manuscript.

CG: developed techniques, conducted analyses, and wrote manuscript.

SAG: conceptualized the study and wrote manuscript.

AK: informatics, data analytics, and wrote manuscript.

BM: biostatistical methodology and wrote manuscript.

YG: conducted analyses, and wrote manuscript.

IDD: conceptualized the study, developed methods, conducted analyses, and wrote manuscript.

Corresponding author

Correspondence to Ivo D. Dinov.

Ethics declarations

Ethics Approval and Consent to Participate

University of Michigan Institutional Review Board (IRB) approval (HUM00115107) was obtained prior to managing, processing and analyzing the PRO-ACT data.

Competing Interests

S.A.G. has received research support from the NIH/NIEHS (K23ES027221), the Agency for Toxic Substances and Disease Registry/Centers for Disease Control, the ALS Association, Target ALS, Cytokinetics, and Neuralstem, Inc., and has consulted for Cytokinetics.

Electronic supplementary material

ESM 1 (DOCX 433 kb)

ESM 2 (PDF 77 kb)

About this article

Cite this article

Tang, M., Gao, C., Goutman, S.A. et al. Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering. Neuroinform 17, 407–421 (2019). https://doi.org/10.1007/s12021-018-9406-9

Published: 20 November 2018
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s12021-018-9406-9
