Computer Science > Computer Vision and Pattern Recognition

arXiv:2402.12908 (cs)

[Submitted on 20 Feb 2024 (v1), last revised 14 Oct 2024 (this version, v3)]

Title: RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

Authors: Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui

Abstract: Diffusion models have achieved remarkable advancements in text-to-image generation. However, existing models still struggle with multiple-object compositional generation. In this paper, we propose RealCompo, a new training-free and transfer-friendly text-to-image generation framework that leverages the respective advantages of text-to-image models and spatial-aware image diffusion models (e.g., those conditioned on layouts, keypoints, or segmentation maps) to enhance both the realism and the compositionality of the generated images. An intuitive and novel balancer is proposed to dynamically balance the strengths of the two models during the denoising process, allowing plug-and-play use of any model without extra training. Extensive experiments show that RealCompo consistently outperforms state-of-the-art text-to-image models and spatial-aware image diffusion models in multiple-object compositional generation while keeping satisfactory realism and compositionality of the generated images. Notably, RealCompo can be seamlessly extended with a wide range of spatial-aware image diffusion models and stylized diffusion models. Our code is available at: this https URL

Comments:	NeurIPS 2024. Project: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2402.12908 [cs.CV]
	(or arXiv:2402.12908v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.12908

Submission history

From: Ling Yang
[v1] Tue, 20 Feb 2024 10:56:52 UTC (3,702 KB)
[v2] Fri, 24 May 2024 08:26:46 UTC (3,575 KB)
[v3] Mon, 14 Oct 2024 07:27:37 UTC (3,575 KB)
