Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO

Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang


Abstract
By supporting multi-modal retrieval training and evaluation, image captioning datasets have spurred remarkable progress on representation learning. Unfortunately, datasets have limited cross-modal associations: images are not paired with other images, captions are only paired with other captions of the same image, there are no negative associations and there are missing positive cross-modal associations. This undermines research into how inter-modality learning impacts intra-modality tasks. We address this gap with Crisscrossed Captions (CxC), an extension of the MS-COCO dataset with human semantic similarity judgments for 267,095 intra- and inter-modality pairs. We report baseline results on CxC for strong existing unimodal and multimodal models. We also evaluate a multitask dual encoder trained on both image-caption and caption-caption pairs that crucially demonstrates CxC’s value for measuring the influence of intra- and inter-modality learning.
Anthology ID:
2021.eacl-main.249
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Editors:
Paola Merlo, Jörg Tiedemann, Reut Tsarfaty
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2855–2870
URL:
https://aclanthology.org/2021.eacl-main.249
DOI:
10.18653/v1/2021.eacl-main.249
Cite (ACL):
Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, and Yinfei Yang. 2021. Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2855–2870, Online. Association for Computational Linguistics.
Cite (Informal):
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO (Parekh et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.249.pdf
Data
CxC, Conceptual Captions, Flickr30k, MS COCO