arXiv:2402.10210 (cs)

[Submitted on 15 Feb 2024]

Title: Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Authors: Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu

Abstract: Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.

Comments:	28 pages, 8 figures, 10 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:2402.10210 [cs.LG]
	(or arXiv:2402.10210v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.10210

Submission history

From: Quanquan Gu
[v1] Thu, 15 Feb 2024 18:59:18 UTC (32,965 KB)
