Abstract
Multi-modal data provide complementary information for clinical prediction, but missing data in clinical cohorts limit the number of subjects available for multi-modal learning. Imputing missing multi-modal data is challenging for existing methods when 1) the missing data span heterogeneous modalities (e.g., image vs. non-image); or 2) one modality is largely missing. In this paper, we address imputation of missing data by modeling the joint distribution of multi-modal data. Motivated by the partial bidirectional generative adversarial net (PBiGAN), we propose a new Conditional PBiGAN (C-PBiGAN) method that imputes one modality by combining conditional knowledge from another modality. Specifically, C-PBiGAN introduces a conditional latent space in a missing-data imputation framework that jointly encodes the available multi-modal data, along with a class regularization loss on imputed data to recover discriminative information. To our knowledge, it is the first generative adversarial model that addresses multi-modal missing imputation by modeling the joint distribution of image and non-image data. We validate our model with both the National Lung Screening Trial (NLST) dataset and an external clinical validation cohort. The proposed C-PBiGAN achieves significant improvements in lung cancer risk estimation compared with representative imputation methods (e.g., AUC values increase on both the NLST (+2.9%) and in-house (+4.3%) datasets compared with PBiGAN, p < 0.05).
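As a rough illustration of what imputing one modality from another through a joint distribution means, the sketch below uses conditional Gaussian imputation on synthetic feature vectors. This is a deliberately simple stand-in and not the C-PBiGAN model: there is no adversarial training, no conditional latent space, and no class regularization loss. The function names (`fit_joint_gaussian`, `impute_b_given_a`), the feature dimensions, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_joint_gaussian(a, b):
    """Estimate the mean and covariance of the stacked vector [a, b]
    from subjects for whom both modalities are observed."""
    joint = np.hstack([a, b])
    return joint.mean(axis=0), np.cov(joint, rowvar=False)


def impute_b_given_a(a_obs, mean, cov, dim_a, rng=None):
    """Impute modality b for subjects that only have modality a.

    Returns the conditional mean E[b | a] when rng is None; otherwise
    draws one sample per subject from the conditional Gaussian p(b | a).
    """
    mu_a, mu_b = mean[:dim_a], mean[dim_a:]
    s_aa = cov[:dim_a, :dim_a]
    s_ab = cov[:dim_a, dim_a:]
    s_bb = cov[dim_a:, dim_a:]
    # w = S_aa^{-1} S_ab, obtained by solving a linear system
    w = np.linalg.solve(s_aa, s_ab)
    cond_mean = mu_b + (a_obs - mu_a) @ w
    if rng is None:
        return cond_mean
    # Conditional covariance: S_bb - S_ba S_aa^{-1} S_ab (symmetrized)
    cond_cov = s_bb - s_ab.T @ w
    cond_cov = (cond_cov + cond_cov.T) / 2.0
    noise = rng.multivariate_normal(
        np.zeros(len(mu_b)), cond_cov, size=len(a_obs)
    )
    return cond_mean + noise


# Synthetic stand-ins: `a` plays the role of the available modality,
# `b` the role of the partially missing one (hypothetical toy data).
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=(n, 3))
true_w = np.array([[1.0, -0.5], [0.5, 2.0], [0.0, 1.0]])
b = a @ true_w + 0.1 * rng.normal(size=(n, 2))

# Pretend modality b is missing for the first 200 subjects.
missing = np.zeros(n, dtype=bool)
missing[:200] = True

mean, cov = fit_joint_gaussian(a[~missing], b[~missing])
b_hat = impute_b_given_a(a[missing], mean, cov, dim_a=3)

# Compare against unconditional mean imputation of b.
rmse = np.sqrt(np.mean((b_hat - b[missing]) ** 2))
rmse_mean = np.sqrt(np.mean((b[~missing].mean(axis=0) - b[missing]) ** 2))
print(f"conditional RMSE: {rmse:.3f}, mean-imputation RMSE: {rmse_mean:.3f}")
```

In this toy setting the conditional estimate tracks the held-out values far more closely than mean imputation does, because it uses the observed modality. A Gaussian model cannot represent images or strongly non-linear dependencies, which is the gap that generative adversarial approaches such as the one proposed here aim to fill.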
Acknowledgement
This research was supported by NSF CAREER 1452485, R01 EB017230 and R01 CA253923. This study was supported in part by U01 CA196405 to Massion. This project was supported in part by the National Center for Research Resources, Grant UL1 RR024975-01, and is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06. This study was funded in part by the Martineau Innovation Fund Grant through the Vanderbilt-Ingram Cancer Center Thoracic Working Group and NCI Early Detection Research Network 2U01CA152662 to PPM.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Gao, R. et al. (2021). Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing Imputation Perspective. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science, vol 12905. Springer, Cham. https://doi.org/10.1007/978-3-030-87240-3_62
DOI: https://doi.org/10.1007/978-3-030-87240-3_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87239-7
Online ISBN: 978-3-030-87240-3
eBook Packages: Computer Science, Computer Science (R0)