Computer Science > Computer Vision and Pattern Recognition

arXiv:2311.09064 (cs)

[Submitted on 15 Nov 2023]

Title: Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

Authors: Yeongbin Kim, Gautam Singh, Junyeong Park, Caglar Gulcehre, Sungjin Ahn

Abstract:Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under a latent world dynamics. The framework provides benefits such as the possibility to jointly optimize for systematic perception and imagination, a range of difficulty levels, and the ability to control the fraction of possible factor combinations used during training. We provide a comprehensive evaluation of various baseline models on SVIB, offering insight into the current state-of-the-art in systematic visual imagination. We hope that this benchmark will help advance visual systematic compositionality.
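
To make the idea of controlling the fraction of factor combinations seen in training concrete, here is a minimal sketch of such a split. It is an illustration under stated assumptions, not the benchmark's actual generation procedure: the factor names and values (`shape`, `color`, `size`), the function names, and the `train_fraction` parameter are all hypothetical.

```python
import itertools
import math
import random


def split_combinations(factors, train_fraction, seed=0):
    """Split all factor combinations into a training set and a held-out set.

    factors: dict mapping a (hypothetical) factor name to its list of values.
    train_fraction: fraction of all combinations exposed during training.
    """
    names = sorted(factors)
    combos = list(itertools.product(*(factors[n] for n in names)))
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(combos)
    n_train = math.ceil(train_fraction * len(combos))
    return names, combos[:n_train], combos[n_train:]


def covers_all_values(factors, names, combos):
    """Return True if every individual factor value appears in some combo."""
    for i, name in enumerate(names):
        seen = {combo[i] for combo in combos}
        if seen != set(factors[name]):
            return False
    return True


# Hypothetical factors, chosen only for illustration.
factors = {
    "shape": ["cube", "sphere", "cylinder", "cone"],
    "color": ["red", "green", "blue", "yellow"],
    "size": ["small", "medium", "large", "huge"],
}
names, train, held_out = split_combinations(factors, train_fraction=0.4, seed=0)
print(len(train), "training combinations,", len(held_out), "held out")
print("all single values seen in training:", covers_all_values(factors, names, train))
```

A lower `train_fraction` leaves more combinations unseen and so makes the generalization test harder. The coverage check matters because a held-out combination only tests systematic generalization if each of its individual factor values was seen during training in some other combination.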

Comments:	Published as a conference paper at NeurIPS 2023. The first two authors contributed equally. To download the benchmark, visit this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2311.09064 [cs.CV]
	(or arXiv:2311.09064v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.09064

Submission history

From: Gautam Singh
[v1] Wed, 15 Nov 2023 16:02:13 UTC (6,175 KB)
