Computer Science > Artificial Intelligence

arXiv:2406.14343 (cs)

[Submitted on 20 Jun 2024 (v1), last revised 22 Jul 2024 (this version, v5)]

Title: IWISDM: Assessing instruction following in multimodal models at scale

Authors: Xiaoxuan Lei, Lucas Gomez, Hao Yuan Bai, Pouya Bashivan

Abstract: The ability to perform complex tasks from detailed instructions is key to many remarkable achievements of our species. As humans, we are capable of performing not only a wide variety of tasks but also very complex ones that may entail hundreds or thousands of steps to complete. Large language models and their more recent multimodal counterparts that integrate textual and visual inputs have achieved unprecedented success in performing complex tasks. Yet most existing benchmarks are confined to single-modality inputs (either text or vision), narrowing the scope of multimodal assessments, particularly for instruction following in multimodal contexts. To bridge this gap, we introduce the instructed-Virtual VISual Decision Making (iWISDM) environment, engineered to generate a limitless array of vision-language tasks of varying complexity. Using iWISDM, we compiled three distinct benchmarks of instruction-following visual tasks across varying complexity levels and evaluated several newly developed multimodal models on these benchmarks. Our findings establish iWISDM as a robust benchmark for assessing the instructional adherence of both existing and emergent multimodal models and highlight a large gap between these models' ability to precisely follow instructions and that of humans. The code of iWISDM is available on GitHub at this https URL.

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.14343 [cs.AI]
	(or arXiv:2406.14343v5 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2406.14343

Submission history

From: Xiaoxuan Lei
[v1] Thu, 20 Jun 2024 14:09:54 UTC (14,117 KB)
[v2] Sun, 23 Jun 2024 01:51:53 UTC (14,118 KB)
[v3] Tue, 25 Jun 2024 15:12:01 UTC (14,118 KB)
[v4] Wed, 3 Jul 2024 21:44:23 UTC (14,118 KB)
[v5] Mon, 22 Jul 2024 03:25:19 UTC (14,118 KB)
