LVCD: Reference-based Lineart Video Colorization with Diffusion Models

Published: 19 November 2024

Abstract

We propose the first video diffusion framework for reference-based lineart video colorization. Unlike previous works that rely solely on image generative models to colorize lineart frame by frame, our approach leverages a large-scale pretrained video diffusion model to generate colorized animation videos. This leads to more temporally consistent results and better handling of large motions. First, we introduce Sketch-guided ControlNet, which provides additional control for finetuning an image-to-video diffusion model for controllable video synthesis, enabling the generation of animation videos conditioned on lineart. We then propose Reference Attention to facilitate the transfer of colors from the reference frame to other frames containing fast and expansive motions. Finally, we present a novel scheme for sequential sampling, incorporating the Overlapped Blending Module and Prev-Reference Attention, to extend the video diffusion model beyond its original fixed-length limitation for long video colorization. Both qualitative and quantitative results demonstrate that our method significantly outperforms state-of-the-art techniques in frame and video quality as well as temporal consistency. Moreover, our method can generate high-quality, temporally consistent long animation videos with large motions, which was not achievable in previous works. Our code and model are available at https://luckyhzt.github.io/lvcd.
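As context for the Reference Attention component described above, the following is a minimal sketch of how a reference-attention layer of this kind could look. It assumes PyTorch (2.0 or newer for F.scaled_dot_product_attention); the function name, tensor layout, and the omission of learned query/key/value projections are illustrative simplifications, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def reference_attention(frame_tokens: torch.Tensor,
                        ref_tokens: torch.Tensor,
                        num_heads: int = 8) -> torch.Tensor:
    """Illustrative sketch: each frame attends to its own spatial tokens
    plus the tokens of the colored reference frame, so color information
    can propagate to frames with large motions.

    frame_tokens: (B, T, N, C) spatial tokens of T lineart frames
    ref_tokens:   (B, N, C)    spatial tokens of the reference frame
    returns:      (B, T, N, C)
    """
    B, T, N, C = frame_tokens.shape
    head_dim = C // num_heads

    # Queries come from each frame; keys/values are the frame's own tokens
    # concatenated with the reference tokens broadcast to every frame.
    q = frame_tokens.reshape(B * T, N, C)
    ref = ref_tokens.unsqueeze(1).expand(B, T, N, C).reshape(B * T, N, C)
    kv = torch.cat([q, ref], dim=1)  # (B*T, 2N, C)

    def split_heads(x: torch.Tensor) -> torch.Tensor:
        # (B*T, L, C) -> (B*T, num_heads, L, head_dim)
        return x.reshape(x.shape[0], x.shape[1], num_heads, head_dim).transpose(1, 2)

    out = F.scaled_dot_product_attention(split_heads(q), split_heads(kv), split_heads(kv))
    return out.transpose(1, 2).reshape(B, T, N, C)

if __name__ == "__main__":
    # Toy example: 1 video, 4 frames, 16x16 latent grid (N=256 tokens), 64 channels.
    frames = torch.randn(1, 4, 256, 64)
    reference = torch.randn(1, 256, 64)
    print(reference_attention(frames, reference).shape)  # torch.Size([1, 4, 256, 64])
```

In a full attention block the queries, keys, and values would also pass through learned linear projections; the sketch only shows how widening the key/value set with reference-frame tokens lets colors transfer across frames.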



      Published In

ACM Transactions on Graphics, Volume 43, Issue 6
December 2024
1828 pages
EISSN: 1557-7368
DOI: 10.1145/3702969
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 November 2024
      Published in TOG Volume 43, Issue 6

      Author Tags

      1. lineart video colorization
      2. diffusion models
      3. animation

      Qualifiers

      • Research-article

      Funding Sources

      • GRF grant from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region, China
