DOI: 10.1145/3394171.3413526
Research Article

Stable Video Style Transfer Based on Partial Convolution with Depth-Aware Supervision

Published: 12 October 2020

Abstract

As an important research topic in digital media art, neural-learning-based video style transfer has attracted increasing attention. Many recent works incorporate optical flow into image style transfer frameworks to preserve inter-frame coherency and prevent flicker. However, these methods rely heavily on paired datasets of content video and stylized video, which are often difficult to obtain. Another limitation of existing methods is that, while maintaining inter-frame coherency, they introduce strong ghosting artifacts. To address these problems, this paper makes the following contributions: (1) it presents a novel training framework for video style transfer that does not depend on a video dataset of the target style; (2) it is the first to focus on the ghosting problem present in most previous works, using a partial convolution-based strategy to exploit inter-frame context and correlation, together with an additional depth loss that constrains the generated frames, suppressing ghosting artifacts while preserving stability. Extensive experiments demonstrate that our method produces natural and stable video frames in the target style. Qualitative and quantitative comparisons also show that the proposed approach outperforms previous works in terms of overall image quality and inter-frame stability. To facilitate future research, we publish our experiment code at https://github.com/Huage001/Artistic-Video-Partial-Conv-Depth-Loss.
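The partial convolution the abstract refers to (Liu et al., 2018) re-weights each convolution window by the fraction of valid pixels beneath it, so a network can draw on masked, warped previous-frame context without letting invalid regions bleed into the output. The following is a minimal single-channel NumPy sketch of the operator itself, not the authors' network; the function name and the loop-based layout are ours for illustration.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution (after Liu et al., 2018):
    convolve only over pixels marked valid by a binary mask,
    re-normalize each window by its count of valid pixels, and
    produce an updated mask for the next layer."""
    k = weight.shape[0]          # square kernel, size k x k
    pad = k // 2
    xp = np.pad(x * mask, pad)   # invalid pixels contribute zero
    mp = np.pad(mask, pad)
    H, W = x.shape
    out = np.zeros((H, W))
    new_mask = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            valid = mp[i:i + k, j:j + k].sum()
            if valid > 0:
                conv = (xp[i:i + k, j:j + k] * weight).sum()
                # re-normalize by (window size / valid-pixel count)
                out[i, j] = conv * (k * k / valid) + bias
                new_mask[i, j] = 1.0  # window saw >= 1 valid pixel
    return out, new_mask

# Tiny check: with a fully valid mask and an all-ones 3x3 kernel, the
# centre pixel reduces to a plain box filter (sum of 0..8 = 36).
x = np.arange(9, dtype=float).reshape(3, 3)
out, m = partial_conv2d(x, np.ones((3, 3)), np.ones((3, 3)))
```

Near image borders (or mask holes) the re-normalization scales the partial sum up, which is what keeps partially valid windows from being systematically darker than fully valid ones.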

Supplementary Material

ZIP File (mmfp2161aux.zip)
* PartialConvDepthLossVST.wmv: Supplementary video providing a better visual demonstration of our video style transfer method.
MP4 File (3394171.3413526.mp4)
We propose a novel training framework for video style transfer that learns the general style of a set of images and relies only on a target image dataset rather than a video dataset. Meanwhile, ours is the first work to focus on the ghosting problem present in most previous works: we use a partial convolution-based strategy to exploit inter-frame context and correlation, together with an additional depth loss that constrains the generated frames, suppressing ghosting artifacts while preserving stability. Extensive experiments demonstrate that our method produces natural and stable video frames in the target style. Qualitative and quantitative comparisons also show that the proposed approach outperforms previous works in terms of overall image quality and inter-frame stability. To facilitate future research, we publish our experiment code at https://github.com/Huage001/Artistic-Video-Partial-Conv-Depth-Loss.
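For readers unfamiliar with the flow-based coherency term that methods in this space optimize, here is a minimal NumPy sketch of an occlusion-masked temporal consistency loss: the previous stylized frame is warped by optical flow and compared with the current one only where the flow is reliable. This is an illustrative formulation, not the paper's exact loss; the function names and the nearest-neighbour warp are our simplifications.

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp `frame` by a dense flow field (nearest-neighbour
    sampling for brevity; real systems use bilinear sampling)."""
    H, W = frame.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return frame[src_y, src_x]

def temporal_loss(stylized_t, stylized_prev, flow, occlusion_mask):
    """Mean squared difference between the current stylized frame and
    the warped previous stylized frame, counted only where the flow is
    reliable (occlusion_mask == 1)."""
    diff = (stylized_t - warp(stylized_prev, flow)) ** 2
    n = occlusion_mask.sum()
    return (diff * occlusion_mask).sum() / max(n, 1.0)

# A frame shifted left by one pixel, with flow (+1, 0) and the wrapped
# last column masked out, incurs zero temporal loss.
prev = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
cur = np.roll(prev, -1, axis=1)
mask = np.ones((4, 4))
mask[:, -1] = 0.0
```

The occlusion mask is typically derived from a forward-backward flow consistency check, so dis-occluded regions are free to take on new stylized content instead of being forced to match a stale warp.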


Cited By

View all
  • (2023) "Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer." 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18329-18338. DOI: 10.1109/CVPR52729.2023.01758. Online publication date: Jun-2023.
  • (2023) "A Temporal Consistency Enhancement Algorithm Based on Pixel Flicker Correction." Neural Information Processing, 65-78. DOI: 10.1007/978-981-99-1639-9_6. Online publication date: 15-Apr-2023.
  • (2021) "AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer." 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 6629-6638. DOI: 10.1109/ICCV48922.2021.00658. Online publication date: Oct-2021.


Information & Contributors

Information

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cycle-gan
  2. ghosting artifact
  3. optical flow
  4. video style transfer

Qualifiers

  • Research-article

Funding Sources

  • National High Technology Research and Development Program of China
  • National Natural Science Foundation of China
  • Innovation Fund of State Key Laboratory for Novel Software Technology

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

