Computer Science > Computer Vision and Pattern Recognition

arXiv:2404.15275 (cs)

[Submitted on 23 Apr 2024 (v1), last revised 25 Jun 2024 (this version, v3)]

Title:ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

Authors:Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang

Abstract:Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to balance training efficiency and identity preservation: they either require tedious case-by-case fine-tuning or tend to miss identity details during video generation. In this study, we present ID-Animator, a zero-shot human-video generation approach that performs personalized video generation from a single reference facial image without further training. ID-Animator builds on existing diffusion-based video generation backbones, adding a face adapter that encodes ID-relevant embeddings from learnable facial latent queries. To facilitate the extraction of identity information in video generation, we introduce an ID-oriented dataset construction pipeline that incorporates unified human attributes and action captioning techniques from a constructed facial image pool. Based on this pipeline, a random reference training strategy is further devised to precisely capture the ID-relevant embeddings with an ID-preserving loss, thus improving the fidelity and generalization capacity of our model for ID-specific video generation. Extensive experiments demonstrate the superiority of ID-Animator over previous models in generating personalized human videos. Moreover, our method is highly compatible with popular pre-trained T2V models such as AnimateDiff and with various community backbone models, showing high extensibility in real-world video generation applications where identity preservation is highly desired. Our code and checkpoints are released at this https URL.

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.15275 [cs.CV]
	(or arXiv:2404.15275v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.15275

Submission history

From: Shengju Qian
[v1] Tue, 23 Apr 2024 17:59:43 UTC (22,863 KB)
[v2] Tue, 14 May 2024 07:18:16 UTC (41,462 KB)
[v3] Tue, 25 Jun 2024 16:57:27 UTC (40,410 KB)
