Computer Science > Computer Vision and Pattern Recognition

arXiv:2404.10054 (cs)

[Submitted on 15 Apr 2024]

Title: AIGeN: An Adversarial Approach for Instruction Generation in VLN

Authors: Niyati Rawal, Roberto Bigazzi, Lorenzo Baraldi, Rita Cucchiara

Abstract: In the last few years, research interest in Vision-and-Language Navigation (VLN) has grown significantly. VLN is a challenging task in which an agent follows human instructions to navigate a previously unknown environment and reach a specified goal. Recent work in the literature explores different ways to augment the available instruction datasets with synthetic training data to improve navigation performance. In this work, we propose AIGeN, a novel architecture inspired by Generative Adversarial Networks (GANs) that produces meaningful and well-formed synthetic instructions to improve the performance of navigation agents. The model is composed of a Transformer decoder (GPT-2) and a Transformer encoder (BERT). During training, the decoder generates sentences from a sequence of images describing the agent's path to a particular point, while the encoder discriminates between real and fake instructions. We experimentally evaluate the quality of the generated instructions and perform extensive ablation studies. Additionally, we use AIGeN to generate synthetic instructions for 217K trajectories in the Habitat-Matterport 3D (HM3D) dataset and show that they improve the performance of an off-the-shelf VLN method. The validation analysis, conducted on REVERIE and R2R, highlights the promising aspects of our proposal, which achieves state-of-the-art performance.
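
The adversarial setup described in the abstract, where a generator (GPT-2) proposes instructions and a discriminator (BERT) scores them as real or fake, can be illustrated with the standard GAN losses. The sketch below is a minimal illustration that assumes the conventional binary cross-entropy discriminator loss and the non-saturating generator loss; the abstract does not state AIGeN's exact training objective, and the function names `discriminator_loss` and `generator_loss` are hypothetical. Inputs are the discriminator's probabilities that each instruction is human-written.

```python
import math


def discriminator_loss(p_real, p_fake, eps=1e-12):
    """Binary cross-entropy loss for the discriminator (assumed standard GAN form).

    p_real: probabilities D assigns to "real" for human-written instructions.
    p_fake: probabilities D assigns to "real" for generated instructions.
    Loss = -mean(log D(real)) - mean(log(1 - D(fake))).
    """
    real_term = -sum(math.log(max(p, eps)) for p in p_real) / len(p_real)
    fake_term = -sum(math.log(max(1.0 - p, eps)) for p in p_fake) / len(p_fake)
    return real_term + fake_term


def generator_loss(p_fake, eps=1e-12):
    """Non-saturating generator loss: rewards fooling D, i.e. pushing D(fake) toward 1.

    Loss = -mean(log D(fake)).
    """
    return -sum(math.log(max(p, eps)) for p in p_fake) / len(p_fake)


if __name__ == "__main__":
    # A discriminator that separates real from fake fairly well:
    # its own loss is low, while the generator's loss is high.
    print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))  # ≈ 0.3285
    print(generator_loss([0.1, 0.2]))                   # ≈ 1.9560
```

Because sampled tokens are discrete, a text GAN cannot backpropagate `generator_loss` through the sampling step directly; policy-gradient or Gumbel-softmax relaxations are common workarounds, and this abstract does not say which mechanism, if any, AIGeN uses.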

Comments:	Accepted to 7th Multimodal Learning and Applications Workshop (MULA 2024) at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)
Cite as:	arXiv:2404.10054 [cs.CV]
	(or arXiv:2404.10054v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.10054

Submission history

From: Roberto Bigazzi
[v1] Mon, 15 Apr 2024 18:00:30 UTC (3,717 KB)