Computer Science > Computer Vision and Pattern Recognition

arXiv:2207.01334 (cs)

[Submitted on 4 Jul 2022 (v1), last revised 3 Aug 2022 (this version, v2)]

Title: Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022

Authors: Kevin Qinghong Lin, Alex Jinpeng Wang, Rui Yan, Eric Zhongcong Xu, Rongcheng Tu, Yanru Zhu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Wei Liu, Mike Zheng Shou

Abstract: In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for the EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge. Specifically, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer egocentric VLP along three directions: the pretraining dataset, the pretraining objective, and the development set. Building on these three designs, we develop a pretrained video-language model that can transfer its egocentric video-text representation to the MIR benchmark. Furthermore, we devise an adaptive multi-instance max-margin loss to fine-tune the model effectively and apply the dual-softmax technique for reliable inference. Our best single model achieves 47.39% mAP and 61.44% nDCG on the challenge test set. The code is available at this https URL.
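The abstract names dual-softmax as an inference-time step without describing it. As a rough illustration only, the sketch below shows one common formulation of dual-softmax re-weighting: each video-text similarity is multiplied by a softmax prior computed along the opposite axis. This is an assumption about the general technique, not necessarily the exact variant used in the report. The function names, the toy similarity matrix, and the temperature value are all hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(sim, temperature=0.1):
    """Re-weight a video-by-text similarity matrix (hypothetical helper).

    sim[i][j] is the similarity between video i and text j. For each
    text (column), a softmax over all videos gives a prior; the original
    similarity is then multiplied by that prior.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    prior = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = softmax([sim[i][j] for i in range(n_rows)], temperature)
        for i in range(n_rows):
            prior[i][j] = column[i]
    return [[sim[i][j] * prior[i][j] for j in range(n_cols)]
            for i in range(n_rows)]


# Toy example (made-up numbers): both videos initially rank text 0 first.
sim = [[0.9, 0.85],
       [0.8, 0.3]]
reweighted = dual_softmax(sim, temperature=0.1)
best_text = [max(range(len(row)), key=lambda j: row[j]) for row in reweighted]
print(best_text)
```

In this toy case the raw similarities assign both videos to text 0. After re-weighting, video 0 ranks text 1 first and video 1 keeps text 0, because the column-wise prior penalizes a text that several videos compete for.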

Comments:	To appear in CVPRW22. 5 pages, 2 figures, 2 tables. Code: this https URL. This is the EPIC challenge technical report for EgoVLP (arXiv:2206.01670). See also the Ego4D challenge technical report (arXiv:2207.01622).
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2207.01334 [cs.CV]
	(or arXiv:2207.01334v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2207.01334

Submission history

From: Qinghong Lin
[v1] Mon, 4 Jul 2022 11:32:48 UTC (661 KB)
[v2] Wed, 3 Aug 2022 12:08:50 UTC (660 KB)
