Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.02482 (cs)

[Submitted on 2 Jul 2024 (v1), last revised 3 Jul 2024 (this version, v2)]

Title: Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Authors: Fei Shen, Hu Ye, Sibo Liu, Jun Zhang, Cong Wang, Xiao Han, Wei Yang

Abstract: Recent research showcases the considerable potential of conditional diffusion models for generating consistent stories. However, current methods, which predominantly generate stories in an autoregressive and excessively caption-dependent manner, often underrate the contextual consistency and relevance of frames during sequential generation. To address this, we propose Rich-contextual Conditional Diffusion Models (RCDMs), a novel two-stage approach designed to enhance the semantic and temporal consistency of story generation. In the first stage, a frame-prior transformer diffusion model predicts the frame semantic embedding of the unknown clip by aligning the semantic correlations between the captions and frames of the known clip. The second stage establishes a robust model with rich contextual conditions, including reference images of the known clip, the predicted frame semantic embedding of the unknown clip, and text embeddings of all captions. By jointly injecting these rich contextual conditions at the image and feature levels, RCDMs can generate semantically and temporally consistent stories. Moreover, unlike autoregressive models, RCDMs can generate consistent stories with a single forward inference. Our qualitative and quantitative results demonstrate that the proposed RCDMs outperform existing methods in challenging scenarios. The code and model will be available at this https URL.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.02482 [cs.CV]
	(or arXiv:2407.02482v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.02482

Submission history

From: Fei Shen
[v1] Tue, 2 Jul 2024 17:58:07 UTC (3,757 KB)
[v2] Wed, 3 Jul 2024 18:17:01 UTC (3,757 KB)
