LVCD: Reference-based Lineart Video Colorization with Diffusion Models

Published: 19 November 2024

Abstract

We propose the first video diffusion framework for reference-based lineart video colorization. Unlike previous works that rely solely on image generative models to colorize lineart frame by frame, our approach leverages a large-scale pretrained video diffusion model to generate colorized animation videos. This leads to more temporally consistent results and better handling of large motions. First, we introduce Sketch-guided ControlNet, which provides additional control for finetuning an image-to-video diffusion model for controllable video synthesis, enabling the generation of animation videos conditioned on lineart. We then propose Reference Attention to facilitate the transfer of colors from the reference frame to other frames containing fast and expansive motions. Finally, we present a novel scheme for sequential sampling, incorporating the Overlapped Blending Module and Prev-Reference Attention, to extend the video diffusion model beyond its original fixed-length limitation for long video colorization. Both qualitative and quantitative results demonstrate that our method significantly outperforms state-of-the-art techniques in frame and video quality as well as temporal consistency. Moreover, our method can generate high-quality, temporally consistent long animation videos with large motions, which was not achievable in previous works. Our code and model are available at https://luckyhzt.github.io/lvcd.
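As context for the Reference Attention component described above, the following is a minimal sketch of how a reference-attention layer of this kind could look. It assumes PyTorch (2.0 or newer for F.scaled_dot_product_attention); the function name, tensor layout, and the omission of learned query/key/value projections are illustrative simplifications, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def reference_attention(frame_tokens: torch.Tensor,
                        ref_tokens: torch.Tensor,
                        num_heads: int = 8) -> torch.Tensor:
    """Illustrative sketch: each frame attends to its own spatial tokens
    plus the tokens of the colored reference frame, so color information
    can propagate to frames with large motions.

    frame_tokens: (B, T, N, C) spatial tokens of T lineart frames
    ref_tokens:   (B, N, C)    spatial tokens of the reference frame
    returns:      (B, T, N, C)
    """
    B, T, N, C = frame_tokens.shape
    head_dim = C // num_heads

    # Queries come from each frame; keys/values are the frame's own tokens
    # concatenated with the reference tokens broadcast to every frame.
    q = frame_tokens.reshape(B * T, N, C)
    ref = ref_tokens.unsqueeze(1).expand(B, T, N, C).reshape(B * T, N, C)
    kv = torch.cat([q, ref], dim=1)  # (B*T, 2N, C)

    def split_heads(x: torch.Tensor) -> torch.Tensor:
        # (B*T, L, C) -> (B*T, num_heads, L, head_dim)
        return x.reshape(x.shape[0], x.shape[1], num_heads, head_dim).transpose(1, 2)

    out = F.scaled_dot_product_attention(split_heads(q), split_heads(kv), split_heads(kv))
    return out.transpose(1, 2).reshape(B, T, N, C)

if __name__ == "__main__":
    # Toy example: 1 video, 4 frames, 16x16 latent grid (N=256 tokens), 64 channels.
    frames = torch.randn(1, 4, 256, 64)
    reference = torch.randn(1, 256, 64)
    print(reference_attention(frames, reference).shape)  # torch.Size([1, 4, 256, 64])
```

In a full attention block the queries, keys, and values would also pass through learned linear projections; the sketch only shows how widening the key/value set with reference-frame tokens lets colors transfer across frames.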



      Published In

ACM Transactions on Graphics, Volume 43, Issue 6
December 2024
1828 pages
EISSN: 1557-7368
DOI: 10.1145/3702969
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 November 2024
      Published in TOG Volume 43, Issue 6

      Author Tags

      1. lineart video colorization
      2. diffusion models
      3. animation

      Qualifiers

      • Research-article

      Funding Sources

      • GRF grant from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region, China
