Computer Science > Computer Vision and Pattern Recognition

arXiv:2311.17618 (cs)

[Submitted on 29 Nov 2023 (v1), last revised 1 Dec 2023 (this version, v3)]

Title: ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

Authors: Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, Tao Chen

Abstract: The advent of large language models, enabling flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored. By achieving instruction-based shape generations, versatile multimodal generative shape models can significantly benefit various fields like 3D virtual construction and network-aided design. In this work, we present ShapeGPT, a shape-included multi-modal framework to leverage strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a word-sentence-paragraph framework to discretize continuous shapes into shape words, further assembles these words for shape sentences, as well as integrates shape with instructional text for multi-modal paragraphs. To learn this shape-language model, we use a three-stage training scheme, including shape representation, multimodal alignment, and instruction-based generation, to align shape-language codebooks and learn the intricate correlations among these modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing.
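
The word-sentence-paragraph idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the encoder, the codebook, or the token format, so the 2-D features, the four-entry codebook, the `<shape_k>` token names, and the `<shape_start>`/`<shape_end>` delimiters below are all assumptions made purely for illustration. The sketch assumes nearest-neighbour lookup in a codebook as the discretization step.

```python
from typing import List, Sequence


def nearest_code(feature: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `feature` (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(feature, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def shape_to_words(features: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[str]:
    """Discretize continuous features into hypothetical 'shape word' tokens."""
    return [f"<shape_{nearest_code(f, codebook)}>" for f in features]


def build_paragraph(instruction: str, words: Sequence[str]) -> str:
    """Join shape words into a 'sentence' and attach it to instruction text."""
    sentence = " ".join(words)
    return f"{instruction} <shape_start> {sentence} <shape_end>"


# Hypothetical toy data: a 4-entry codebook and three local shape features.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 0.8], [0.2, 1.1]]

words = shape_to_words(features, codebook)
paragraph = build_paragraph("Describe this shape:", words)
print(paragraph)
```

In this toy setup each feature maps to one discrete token, the tokens form a sequence a language model could consume, and the final string mixes that sequence with instruction text. A real system would learn the codebook and features rather than hard-code them.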

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.17618 [cs.CV]
	(or arXiv:2311.17618v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.17618

Submission history

From: Fukun Yin
[v1] Wed, 29 Nov 2023 13:26:29 UTC (4,977 KB)
[v2] Thu, 30 Nov 2023 08:46:05 UTC (4,977 KB)
[v3] Fri, 1 Dec 2023 12:46:13 UTC (6,086 KB)
