Computer Science > Computer Vision and Pattern Recognition

arXiv:2204.09817 (cs)

[Submitted on 21 Apr 2022 (v1), last revised 21 Jul 2022 (this version, v4)]

Title:Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing

Authors:Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie Hyland, Maria Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay


Abstract:Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research. Biomedical text with its complex semantics poses additional challenges in vision--language modelling compared to the general domain, and previous work has used insufficiently adapted models that lack domain-specific language understanding. In this paper, we show that principled textual semantic modelling can substantially improve contrastive learning in self-supervised vision--language processing. We release a language model that achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and novel language pretraining objective leveraging semantics and discourse characteristics in radiology reports. Further, we propose a self-supervised joint vision--language approach with a focus on better text modelling. It establishes new state-of-the-art results on a wide range of publicly available benchmarks, in part by leveraging our new domain-specific language model. We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision--language processing. A broad evaluation, including on this new dataset, shows that our contrastive learning approach, aided by textual-semantic modelling, outperforms prior methods in segmentation tasks, despite only using a global-alignment objective.
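
The abstract names a contrastive, global-alignment objective between images and reports but does not give its form. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style loss over one embedding per image and one per report. It is not the authors' implementation: the function names, the temperature value, and the use of plain Python lists are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over a list of scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.5):
    # Hypothetical helper: image_embs[k] and text_embs[k] are assumed to be
    # the global embeddings of a matching image-report pair.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity of every image with every report, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]

    # Image-to-text direction: each image should rank its own report first.
    i2t = -sum(log_softmax(sim[k])[k] for k in range(n)) / n

    # Text-to-image direction: same computation on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    t2i = -sum(log_softmax(sim_t[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings with the reports swapped.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = symmetric_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

The loss uses only one vector per image and one per report, which is what makes it a global rather than a local (region-to-phrase) alignment objective.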

Comments:	To appear in ECCV 2022. Code: this https URL Dataset: this https URL Demo Notebook: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2204.09817 [cs.CV]
	(or arXiv:2204.09817v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2204.09817
Journal reference:	Computer Vision - ECCV 2022, LNCS vol 13696, pp 1-21
Related DOI:	https://doi.org/10.1007/978-3-031-20059-5_1

Submission history

From: Ozan Oktay
[v1] Thu, 21 Apr 2022 00:04:35 UTC (13,264 KB)
[v2] Tue, 17 May 2022 00:30:53 UTC (13,264 KB)
[v3] Thu, 14 Jul 2022 20:45:13 UTC (11,308 KB)
[v4] Thu, 21 Jul 2022 14:46:17 UTC (11,308 KB)
