Attention-based multimodal fusion with contrast for robust clinical prediction in the face of missing modalities

Published: 01 September 2023

Abstract

Objective:

With the increasing amount and variety of healthcare data, multimodal machine learning that supports integrated modeling of structured and unstructured data is an increasingly important tool for clinical prediction tasks. However, it is non-trivial to manage the differences in dimensionality, volume, and temporal characteristics of data modalities in the context of a shared target task. Furthermore, patients vary substantially in the availability of their data, while existing multimodal modeling methods typically assume data completeness and lack a mechanism to handle missing modalities.

Methods:

We propose a Transformer-based fusion model with modality-specific tokens that summarize their corresponding modalities, enabling effective cross-modal interaction while accommodating missing modalities in the clinical context. The model is further refined by inter-modal, inter-sample contrastive learning to improve the representations for better predictive performance. We denote the model Attention-based cRoss-MOdal fUsion with contRast (ARMOUR). We evaluate ARMOUR using two input modalities (structured measurements and unstructured text), six clinical prediction tasks, and two evaluation regimes, either including or excluding samples with missing modalities.
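
The abstract describes the architecture only at a high level. As a rough sketch, and under the assumption that the modality-specific tokens act as learnable summary vectors prepended to a shared Transformer encoder, that missing modalities are simply masked out of attention, and that the inter-modal, inter-sample contrast is an InfoNCE-style objective over the resulting summaries, the idea could be written in PyTorch as follows; all names, dimensions, and masking details here are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only (not the authors' code): a minimal Transformer
# fusion module with learnable modality-specific summary tokens, a padding
# mask that drops missing modalities from attention, and an InfoNCE-style
# inter-modal, inter-sample contrastive loss. Names, shapes, and the
# masking scheme are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityTokenFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_modalities=2):
        super().__init__()
        # One learnable summary token per modality (e.g. measurements, notes).
        self.mod_tokens = nn.Parameter(torch.randn(n_modalities, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, features, present):
        # features: list of per-modality tensors, each of shape (B, L_m, d_model);
        #           missing modalities are passed as zero placeholders.
        # present:  (B, n_modalities) boolean mask, False where a modality is missing.
        batch = present.shape[0]
        tokens = self.mod_tokens.unsqueeze(0).expand(batch, -1, -1)  # (B, M, d)
        seq = torch.cat([tokens] + features, dim=1)
        # Key-padding mask: attention ignores the summary token and the feature
        # positions of any modality that is absent for a given patient.
        masks = [~present]
        for m, feats in enumerate(features):
            masks.append(~present[:, m:m + 1].expand(-1, feats.shape[1]))
        pad_mask = torch.cat(masks, dim=1)  # (B, M + sum_m L_m)
        fused = self.encoder(seq, src_key_padding_mask=pad_mask)
        # The first M positions hold the fused per-modality summaries.
        return fused[:, : self.mod_tokens.shape[0], :]


def inter_modal_info_nce(z_a, z_b, temperature=0.1):
    # Summaries of two modalities from the same patient are positives;
    # summaries from other patients in the batch act as negatives.
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(z_a.shape[0], device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In such a setup, downstream prediction heads would read the fused summary vectors and the contrastive term would be added to the task loss; the actual ARMOUR formulation, loss weighting, and treatment of placeholder inputs may differ.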

Results:

Our model shows improved performance over unimodal and multimodal baselines in both evaluation regimes, whether patients with missing modalities are included in or excluded from the input. Contrastive learning improves the representation power and is shown to be essential for better results. The simple setup of modality-specific tokens enables ARMOUR to handle patients with missing modalities and allows comparison with existing unimodal benchmark results.

Conclusion:

We propose a multimodal model for robust clinical prediction that achieves improved performance while accommodating patients with missing modalities. This work could inspire future research on effectively incorporating multiple, more complex modalities of clinical data into a single model.

Cited By

  • (2024) Modular Quantitative Temporal Transformer for Biobank-Scale Unified Representations. Artificial Intelligence in Medicine, pp. 212–226. https://doi.org/10.1007/978-3-031-66535-6_24. Online publication date: 9-Jul-2024.

    Published In

    Journal of Biomedical Informatics, Volume 145, Issue C
    September 2023
    198 pages

    Publisher

    Elsevier Science

    San Diego, CA, United States

    Author Tags

    1. Multimodal modeling
    2. Clinical prediction
    3. Missing modality
    4. Natural language processing
    5. Machine learning

    Qualifiers

    • Research-article
