Attention-based multimodal fusion with contrast for robust clinical prediction in the face of missing modalities

Published: 01 September 2023

Abstract

Objective:

With the increasing amount and variety of healthcare data, multimodal machine learning that supports integrated modeling of structured and unstructured data is an increasingly important tool for clinical prediction tasks. However, it is non-trivial to manage the differences in dimensionality, volume, and temporal characteristics of data modalities in the context of a shared target task. Furthermore, patients vary substantially in the availability of their data, while existing multimodal modeling methods typically assume data completeness and lack a mechanism to handle missing modalities.

Methods:

We propose a Transformer-based fusion model with modality-specific tokens that summarize their corresponding modalities, enabling effective cross-modal interaction while accommodating missing modalities in the clinical context. The model is further refined by inter-modal, inter-sample contrastive learning to improve the representations for better predictive performance. We denote the model Attention-based cRoss-MOdal fUsion with contRast (ARMOUR). We evaluate ARMOUR using two input modalities (structured measurements and unstructured text), six clinical prediction tasks, and two evaluation regimes, either including or excluding samples with missing modalities.
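
The abstract describes the architecture only at a high level. As a rough sketch, and under the assumption that the modality-specific tokens act as learnable summary vectors prepended to a shared Transformer encoder, that missing modalities are simply masked out of attention, and that the inter-modal, inter-sample contrast is an InfoNCE-style objective over the resulting summaries, the idea could be written in PyTorch as follows; all names, dimensions, and masking details here are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only (not the authors' code): a minimal Transformer
# fusion module with learnable modality-specific summary tokens, a padding
# mask that drops missing modalities from attention, and an InfoNCE-style
# inter-modal, inter-sample contrastive loss. Names, shapes, and the
# masking scheme are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityTokenFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_modalities=2):
        super().__init__()
        # One learnable summary token per modality (e.g. measurements, notes).
        self.mod_tokens = nn.Parameter(torch.randn(n_modalities, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, features, present):
        # features: list of per-modality tensors, each of shape (B, L_m, d_model);
        #           missing modalities are passed as zero placeholders.
        # present:  (B, n_modalities) boolean mask, False where a modality is missing.
        batch = present.shape[0]
        tokens = self.mod_tokens.unsqueeze(0).expand(batch, -1, -1)  # (B, M, d)
        seq = torch.cat([tokens] + features, dim=1)
        # Key-padding mask: attention ignores the summary token and the feature
        # positions of any modality that is absent for a given patient.
        masks = [~present]
        for m, feats in enumerate(features):
            masks.append(~present[:, m:m + 1].expand(-1, feats.shape[1]))
        pad_mask = torch.cat(masks, dim=1)  # (B, M + sum_m L_m)
        fused = self.encoder(seq, src_key_padding_mask=pad_mask)
        # The first M positions hold the fused per-modality summaries.
        return fused[:, : self.mod_tokens.shape[0], :]


def inter_modal_info_nce(z_a, z_b, temperature=0.1):
    # Summaries of two modalities from the same patient are positives;
    # summaries from other patients in the batch act as negatives.
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(z_a.shape[0], device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In such a setup, downstream prediction heads would read the fused summary vectors and the contrastive term would be added to the task loss; the actual ARMOUR formulation, loss weighting, and treatment of placeholder inputs may differ.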

Results:

Our model shows improved performance over unimodal and multimodal baselines in both evaluation regimes, whether patients with missing modalities are included in or excluded from the input. Contrastive learning improves the representation power and is shown to be essential for better results. The simple setup of modality-specific tokens enables ARMOUR to handle patients with missing modalities and allows comparison with existing unimodal benchmark results.

Conclusion:

We propose a multimodal model for robust clinical prediction that achieves improved performance while accommodating patients with missing modalities. This work could inspire future research on effectively incorporating multiple, more complex modalities of clinical data into a single model.

Cited By

  • (2024) Modular Quantitative Temporal Transformer for Biobank-Scale Unified Representations. Artificial Intelligence in Medicine, pp. 212–226. https://doi.org/10.1007/978-3-031-66535-6_24. Online publication date: 9-Jul-2024.

    Published In

    Journal of Biomedical Informatics, Volume 145, Issue C
    September 2023
    198 pages

    Publisher

    Elsevier Science

    San Diego, CA, United States

    Author Tags

    1. Multimodal modeling
    2. Clinical prediction
    3. Missing modality
    4. Natural language processing
    5. Machine learning

    Qualifiers

    • Research-article
