Computer Science > Computer Vision and Pattern Recognition

arXiv:2303.13126 (cs)

[Submitted on 23 Mar 2023 (v1), last revised 14 Jul 2023 (this version, v3)]

Title:MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models

Authors:Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wenjing Yang

Abstract:The advent of open-source AI communities has produced a cornucopia of powerful text-guided diffusion models that are trained on various datasets. However, few explorations have been conducted on ensembling such models to combine their strengths. In this work, we propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that can empower the fused text-guided diffusion models to achieve more controllable generation. Specifically, we experimentally find that the responses of classifier-free guidance are highly related to the saliency of generated images. Thus, we propose to trust different models in their areas of expertise by blending the predicted noises of two diffusion models in a saliency-aware manner. SNB is training-free and can be completed within a DDIM sampling process. Additionally, it can automatically align the semantics of two noise spaces without requiring additional annotations such as masks. Extensive experiments show the impressive effectiveness of SNB in various applications. Project page is available at this https URL.
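
The blending idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract only says that classifier-free guidance responses correlate with saliency and that the two models' predicted noises are blended in a saliency-aware way. Everything concrete below is an assumption made for illustration, including the function names (`guided_noise`, `saliency_map`, `blend_noises`), the use of the absolute guidance response averaged over channels, the spatial softmax with a `scale` parameter, and the hard per-pixel mask.

```python
import numpy as np


def guided_noise(eps_cond, eps_uncond, guidance_weight):
    """Standard classifier-free guidance: push the unconditional
    prediction toward the text-conditional one."""
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)


def saliency_map(eps_cond, eps_uncond, scale=1.0):
    """Assumed saliency proxy: a spatial softmax over the magnitude of
    the guidance response, averaged over channels.
    Inputs have shape (C, H, W); the result has shape (H, W)."""
    response = np.abs(eps_cond - eps_uncond).mean(axis=0)
    logits = scale * response.ravel()
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()
    return weights.reshape(response.shape)


def blend_noises(eps_a, eps_b, sal_a, sal_b):
    """Assumed blending rule: at each pixel, keep the noise of whichever
    model has the higher saliency there."""
    mask = (sal_a >= sal_b).astype(eps_a.dtype)  # (H, W), 1 where model A wins
    blended = mask[None] * eps_a + (1.0 - mask[None]) * eps_b
    return blended, mask
```

In a sampler, a step like this would run once per denoising iteration: each model predicts its conditional and unconditional noise for the current latent, the two guided noises are blended, and the blended noise is passed to the usual DDIM update. Because the mask is recomputed from the guidance responses at every step, no hand-drawn mask is needed in this sketch, which matches the abstract's claim only at a high level.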

Comments:	Accepted by ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.13126 [cs.CV]
	(or arXiv:2303.13126v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.13126

Submission history

From: Jing Zhao
[v1] Thu, 23 Mar 2023 09:30:39 UTC (17,127 KB)
[v2] Sat, 25 Mar 2023 14:38:16 UTC (24,924 KB)
[v3] Fri, 14 Jul 2023 09:36:35 UTC (24,924 KB)
