Abstract
Echocardiography (echo) is an ultrasound imaging modality widely used for a range of cardiovascular diagnosis tasks. Because echo-based diagnosis suffers from inter-observer variability, arising from differences in image acquisition and in experience-dependent interpretation, vision-based machine learning (ML) methods have gained popularity as a secondary layer of verification. For such safety-critical applications, any proposed ML method must offer a level of explainability alongside good accuracy. It must also be able to process several echo videos acquired from different heart views, and the interactions among them, to produce predictions for a variety of cardiovascular measurement and interpretation tasks. Prior work either lacks explainability or is limited in scope to a single cardiovascular task. To remedy this, we propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability while enabling multi-video training, in which the interplay among echo image patches within a frame, among all frames of a video, and among videos is captured according to the downstream task. We demonstrate the flexibility of our framework on two critical tasks: ejection fraction (EF) estimation and aortic stenosis (AS) severity detection. Our model achieves mean absolute errors of 4.15 and 4.84 for single- and dual-video EF estimation, respectively, and an accuracy of 96.5% for AS detection, while providing informative task-specific attention maps and prototype-based explainability.
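The three-level attention hierarchy described in the abstract (patches within a frame, frames within a video, videos within a study) can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch implementation of that idea; all module names, dimensions, the mean-pooled token aggregation, and the scalar regression head are assumptions for illustration, not the authors' actual GEMTrans implementation.

import torch
import torch.nn as nn

def encoder(dim, heads, depth):
    # A stack of standard batch-first transformer encoder layers.
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class MultiLevelEchoTransformer(nn.Module):
    # Hypothetical three-level attention stack: patch -> frame -> video.
    def __init__(self, dim=768, heads=8, depth=2):
        super().__init__()
        self.spatial = encoder(dim, heads, depth)      # patches within one frame
        self.temporal = encoder(dim, heads, depth)     # frames within one video
        self.cross_video = encoder(dim, heads, depth)  # videos within one study
        self.head = nn.Linear(dim, 1)                  # e.g., scalar EF regression

    def forward(self, x):
        # x: (batch, videos, frames, patches, dim) of pre-embedded patch tokens
        b, v, f, p, d = x.shape
        x = self.spatial(x.reshape(b * v * f, p, d)).mean(dim=1)   # frame embeddings
        x = self.temporal(x.reshape(b * v, f, d)).mean(dim=1)      # video embeddings
        x = self.cross_video(x.reshape(b, v, d)).mean(dim=1)       # study embedding
        return self.head(x)

# Usage: one study with two views, 16 frames each, 196 patch tokens per frame.
model = MultiLevelEchoTransformer()
ef = model(torch.randn(1, 2, 16, 196, 768))  # -> shape (1, 1)

Because each level is a standard self-attention encoder, its attention weights can be read out per patch, per frame, and per video, which is what makes task-specific attention maps at every level possible.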
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Mokhtari, M., Ahmadi, N., Tsang, T.S.M., Abolmaesumi, P., Liao, R. (2024). GEMTrans: A General, Echocardiography-Based, Multi-level Transformer Framework for Cardiovascular Diagnosis. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds.) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol. 14349. Springer, Cham. https://doi.org/10.1007/978-3-031-45676-3_1
Print ISBN: 978-3-031-45675-6
Online ISBN: 978-3-031-45676-3