Computer Science > Computer Vision and Pattern Recognition

arXiv:2302.12228 (cs)

[Submitted on 23 Feb 2023 (v1), last revised 5 Mar 2023 (this version, v3)]

Title: Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Authors:Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or


Abstract: Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches suffer from lengthy training times, high storage requirements, or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach. Our key insight is that by underfitting on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain. Specifically, we employ two components. First, an encoder that takes as input a single image of a target concept from a given domain (e.g., a specific face) and learns to map it into a word embedding representing the concept. Second, a set of regularized weight offsets for the text-to-image model that learn how to effectively ingest additional concepts. Together, these components are used to guide the learning of unseen concepts, allowing us to personalize a model using only a single image and as few as 5 training steps. This accelerates personalization from dozens of minutes to seconds while preserving quality.
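The two components in the abstract can be illustrated with a deliberately tiny toy, sketched below under heavy assumptions. The "encoder" here is a single hypothetical linear map from image features to a word embedding (the paper's encoder is a neural network), and the "regularized weight offsets" are modeled as an offset `dW` added to a frozen linear layer, trained for 5 gradient steps with an assumed L2 penalty `lam * ||dW||^2`. The function names `encode`, `tune_offsets`, and `loss`, and all numbers, are hypothetical and chosen only for illustration; the paper's actual architecture, objective, and regularizer may differ.

```python
"""Toy sketch (hypothetical): encoder-initialized embedding + regularized weight offsets.

Plain Python lists only. This illustrates the shape of the idea, not the paper's method.
"""


def matvec(m, v):
    # Multiply matrix m (list of rows) by vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def apply_offsets(base_w, dw):
    # Effective weights = frozen base weights + learned offsets.
    return [[b + d for b, d in zip(brow, drow)] for brow, drow in zip(base_w, dw)]


def encode(image_features, enc_weights):
    # Hypothetical encoder: one linear map from image features to a word embedding.
    return matvec(enc_weights, image_features)


def loss(base_w, dw, embedding, target, lam=0.01):
    # Fit term ||(base_w + dW) e - target||^2 plus assumed L2 penalty on the offsets.
    out = matvec(apply_offsets(base_w, dw), embedding)
    fit = sum((o - t) ** 2 for o, t in zip(out, target))
    penalty = lam * sum(x * x for row in dw for x in row)
    return fit + penalty


def tune_offsets(base_w, embedding, target, steps=5, lr=0.1, lam=0.01):
    """Learn offsets dW by plain gradient descent; base_w itself is never modified."""
    rows, cols = len(base_w), len(base_w[0])
    dw = [[0.0] * cols for _ in range(rows)]
    for _ in range(steps):
        out = matvec(apply_offsets(base_w, dw), embedding)
        residual = [o - t for o, t in zip(out, target)]
        for i in range(rows):
            for j in range(cols):
                # d/d dW_ij of the fit term is 2 * r_i * e_j; of the penalty, 2 * lam * dW_ij.
                grad = 2 * residual[i] * embedding[j] + 2 * lam * dw[i][j]
                dw[i][j] -= lr * grad
    return dw


if __name__ == "__main__":
    emb = encode([1.0, 0.5], [[0.5, 0.0], [0.0, 1.0]])
    base = [[1.0, 0.0], [0.0, 1.0]]
    target = [1.0, 0.0]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    offsets = tune_offsets(base, emb, target, steps=5)
    print("loss before:", loss(base, zero, emb, target))
    print("loss after: ", loss(base, offsets, emb, target))
```

The penalty keeps the offsets small, so the tuned layer stays close to the frozen base weights. This is one simple reading of "regularized weight offsets" in the toy setting; the encoder-produced embedding acts as a starting point so that only a handful of steps are run.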

Comments:	Project page at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Cite as:	arXiv:2302.12228 [cs.CV]
	(or arXiv:2302.12228v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2302.12228

Submission history

From: Rinon Gal
[v1] Thu, 23 Feb 2023 18:46:41 UTC (20,565 KB)
[v2] Sun, 26 Feb 2023 18:59:29 UTC (20,736 KB)
[v3] Sun, 5 Mar 2023 15:48:51 UTC (20,736 KB)
