Computer Science > Computer Vision and Pattern Recognition

arXiv:2403.14628 (cs)

[Submitted on 21 Mar 2024 (v1), last revised 30 Aug 2024 (this version, v2)]

Title:Zero-Shot Multi-Object Scene Completion

Authors:Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rares Ambrus, Sergey Zakharov

Abstract:We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image. Despite notable advancements in single-object 3D shape completion, high-quality reconstructions in highly cluttered real-world multi-object scenes remain a challenge. To address this issue, we propose OctMAE, an architecture that leverages an Octree U-Net and a latent 3D MAE to achieve high-quality and near real-time multi-object scene completion through both local and global geometric reasoning. Because a naive 3D MAE can be computationally intractable and memory intensive even in the latent space, we introduce a novel occlusion masking strategy and adopt 3D rotary embeddings, which significantly improve the runtime and scene completion quality. To generalize to a wide range of objects in diverse scenes, we create a large-scale photorealistic dataset, featuring a diverse set of 12K 3D object models from the Objaverse dataset, which are rendered in multi-object scenes with physics-based positioning. Our method outperforms the current state-of-the-art on both synthetic and real-world datasets and demonstrates a strong zero-shot capability.

Comments:	Published at ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.14628 [cs.CV]
	(or arXiv:2403.14628v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.14628

Submission history

From: Shun Iwase
[v1] Thu, 21 Mar 2024 17:59:59 UTC (19,503 KB)
[v2] Fri, 30 Aug 2024 05:34:25 UTC (29,971 KB)
