Computer Science > Computer Vision and Pattern Recognition

arXiv:2205.01883 (cs)

[Submitted on 4 May 2022]

Title: All You May Need for VQA are Image Captions

Authors: Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut

Abstract: Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation. In this paper, we propose a method that automatically derives VQA examples at volume by leveraging the abundance of existing image-caption annotations, combined with neural models for textual question generation. We show that the resulting data is of high quality. VQA models trained on our data improve state-of-the-art zero-shot accuracy by double digits and achieve a level of robustness that is lacking in the same model trained on human-annotated VQA data.

Comments:	2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2205.01883 [cs.CV]
	(or arXiv:2205.01883v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2205.01883

Submission history

From: Soravit Changpinyo
[v1] Wed, 4 May 2022 04:09:23 UTC (2,157 KB)