Abstract
Our method addresses unified multi-modal learning in a diverse and imbalanced setting, two key features that distinguish medical modalities from the extensively studied natural ones. Unlike existing works that assume a fixed or maximum number of modalities, our model not only manages arbitrary missing-modality scenarios but is also capable of handling new modalities and unseen combinations. We argue that the key to such an any-combination model is the proper design of alignment, which should guarantee both modality invariance across diverse inputs and effective modeling of complementarities within the unified metric space. Instead of exact cross-modal alignment, we propose to decouple these two functions into representation-level and task-level alignment, which we empirically show are both indispensable for this task. Moreover, we introduce a tunable modality-agnostic Transformer to unify the representation learning process, which significantly reduces modality-specific parameters and enhances the scalability of our model. Experiments show that the proposed method enables a single model to handle all possible combinations of the six seen modalities and two new modalities in Alzheimer’s Disease diagnosis, with superior performance on longer combinations.
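To make the two decoupled alignment objectives concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: one shared, modality-agnostic Transformer encodes whichever modalities happen to be present, a representation-level loss pulls same-subject embeddings from different modalities together in a unified metric space, and a task-level loss requires a shared diagnosis head to work on every available modality embedding and on their fusion. All module names, the InfoNCE-style contrastive formulation, the mean fusion, and the hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a modality-agnostic encoder
# trained with decoupled representation-level and task-level alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAgnosticEncoder(nn.Module):
    def __init__(self, dim=128, n_modalities=8, depth=2, heads=4):
        super().__init__()
        # Only the per-modality type embedding is modality-specific;
        # the Transformer weights are shared across all modalities.
        self.type_emb = nn.Embedding(n_modalities, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, tokens, modality_id):
        # tokens: (B, L, dim) pre-tokenized features of one modality
        b = tokens.size(0)
        x = tokens + self.type_emb.weight[modality_id]
        x = torch.cat([self.cls.expand(b, -1, -1), x], dim=1)
        return self.encoder(x)[:, 0]  # (B, dim) per-modality embedding

def representation_alignment(embs, temperature=0.07):
    # embs: list of (B, dim), one entry per *available* modality.
    # InfoNCE-style pairwise loss (an assumed choice): embeddings of the
    # same subject across modalities are positives; other subjects in the
    # batch are negatives.
    loss, n = 0.0, 0
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            zi = F.normalize(embs[i], dim=-1)
            zj = F.normalize(embs[j], dim=-1)
            logits = zi @ zj.t() / temperature
            target = torch.arange(zi.size(0), device=zi.device)
            loss = loss + F.cross_entropy(logits, target)
            n += 1
    return loss / max(n, 1)

def task_alignment(embs, head, labels):
    # Task-level alignment: the same diagnosis head must classify correctly
    # from each modality embedding alone and from their fused mean.
    fused = torch.stack(embs).mean(0)
    loss = F.cross_entropy(head(fused), labels)
    for e in embs:
        loss = loss + F.cross_entropy(head(e), labels)
    return loss / (len(embs) + 1)

# Usage with an arbitrary modality subset (here modalities 0, 3, 5 present):
enc, head = ModalityAgnosticEncoder(), nn.Linear(128, 3)  # 3 diagnosis classes
batch = {m: torch.randn(4, 16, 128) for m in (0, 3, 5)}   # toy tokens
labels = torch.randint(0, 3, (4,))
embs = [enc(t, m) for m, t in batch.items()]
loss = representation_alignment(embs) + task_alignment(embs, head, labels)
loss.backward()
```

In this sketch, because the encoder weights are shared and only the type embeddings are modality-specific, supporting an extra modality grows the model by a single embedding row, which mirrors the scalability claim in the abstract.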
Acknowledgments
This study was funded by the General Research Fund of the Hong Kong Research Grants Council (No. 15218521) and by a grant under the Theme-based Research Scheme of the Hong Kong Research Grants Council (No. T45-401/22-N).
Ethics declarations
Disclosure of Interests
Author Feng Yidan has received research grants from the General Research Fund of the Hong Kong Research Grants Council (No. 15218521) and the Theme-based Research Scheme of the Hong Kong Research Grants Council (No. T45-401/22-N).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, Y., Gao, B., Deng, S., Qiu, A., Qin, J. (2024). Unified Multi-modal Learning for Any Modality Combinations in Alzheimer’s Disease Diagnosis. In: Linguraru, M.G., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15003. Springer, Cham. https://doi.org/10.1007/978-3-031-72384-1_46
DOI: https://doi.org/10.1007/978-3-031-72384-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72383-4
Online ISBN: 978-3-031-72384-1
eBook Packages: Computer Science, Computer Science (R0)