DOI: 10.1145/3394171.3413951
research-article

RT-VENet: A Convolutional Network for Real-time Video Enhancement

Published: 12 October 2020

Abstract

Real-time video enhancement is in great demand due to the extensive use of live video applications, but existing approaches fall far short of the strict requirements for speed and stability. We present a novel convolutional network that performs high-quality enhancement on 1080p videos at 45 FPS with a single CPU, which gives it high potential for real-world deployment. The proposed network builds on a lightweight image network and is further consolidated for temporal consistency with a temporal feature aggregation (TFA) module. Unlike most image translation networks, which use decoders to generate target images, our network discards decoders and employs only an encoder and a small head. The network predicts color mapping functions instead of pixel values in a grid-like container, which fits the CNN structure well and also makes the enhancement scalable to any video resolution. Furthermore, the temporal consistency of the output is enforced by the TFA module, which utilizes the learned temporal coherence of semantics across frames. We also demonstrate on benchmark datasets that the mapping representation generalizes to various enhancement tasks, such as relighting, retouching and dehazing. Our approach achieves state-of-the-art performance and runs about 10 times faster than the current real-time method on high-resolution videos.
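
To illustrate why predicting mapping functions on a grid decouples the network from the output resolution, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each grid cell stores a per-channel affine mapping (gain and bias) and that coefficients are bilinearly interpolated at each pixel; the abstract does not specify the form of the mapping functions or the interpolation scheme. The names `sample_grid` and `apply_mapping_grid` are hypothetical.

```python
# Illustrative sketch only. The affine form of the mapping, the grid layout
# and every name here are assumptions, not details taken from the paper.

def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def sample_grid(grid, u, v):
    """Bilinearly interpolate grid coefficients at normalised (u, v) in [0, 1]."""
    rows, cols = len(grid), len(grid[0])
    y, x = v * (rows - 1), u * (cols - 1)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
    ty, tx = y - y0, x - x0
    n = len(grid[0][0])
    top = [lerp(grid[y0][x0][k], grid[y0][x1][k], tx) for k in range(n)]
    bot = [lerp(grid[y1][x0][k], grid[y1][x1][k], tx) for k in range(n)]
    return [lerp(top[k], bot[k], ty) for k in range(n)]

def apply_mapping_grid(frame, grid):
    """Apply a coarse grid of colour mappings to a frame of any resolution.

    frame: H x W nested list of (r, g, b) floats in [0, 1].
    grid:  rows x cols nested list of
           (gain_r, gain_g, gain_b, bias_r, bias_g, bias_b) tuples.
    """
    h, w = len(frame), len(frame[0])
    out = []
    for i, row in enumerate(frame):
        v = i / (h - 1) if h > 1 else 0.0
        out_row = []
        for j, pixel in enumerate(row):
            u = j / (w - 1) if w > 1 else 0.0
            c = sample_grid(grid, u, v)
            # Per-channel affine mapping, clamped to the valid range.
            out_row.append(tuple(
                min(1.0, max(0.0, c[k] * pixel[k] + c[k + 3]))
                for k in range(3)))
        out.append(out_row)
    return out

# The same 1 x 2 grid can be applied to frames of different widths.
grid = [[(1.0, 1.0, 1.0, 0.0, 0.0, 0.0), (2.0, 2.0, 2.0, 0.0, 0.0, 0.0)]]
narrow = apply_mapping_grid([[(0.25, 0.25, 0.25)] * 3], grid)
wide = apply_mapping_grid([[(0.25, 0.25, 0.25)] * 5], grid)
```

In this sketch the grid size is fixed by the (hypothetical) predictor, while the per-pixel cost is only an interpolation and a multiply-add, so the same predicted grid serves a 3-pixel or a 5-pixel row equally well.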

Supplementary Material

ZIP File (mmfp0356aux.zip)
Supplemental Material for RT-VENet.
MP4 File (3394171.3413951.mp4)
This video provides a recorded presentation related to our work.

    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN:9781450379885
    DOI:10.1145/3394171

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning
    2. image/video enhancement
    3. relighting
    4. retouching

    Qualifiers

    • Research-article

    Funding Sources

    • Natural Science Foundation of China

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 28
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 26 Nov 2024

    Cited By

    • (2024) Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8 (1-23). DOI: 10.1145/3664653. Online publication date: 29-Jun-2024
    • (2024) Adaptive Locally-Aligned Transformer for low-light video enhancement. Computer Vision and Image Understanding 240 (103916). DOI: 10.1016/j.cviu.2023.103916. Online publication date: Mar-2024
    • (2022) Real-time Image Enhancement with Attention Aggregation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:2s (1-19). DOI: 10.1145/3564607. Online publication date: 26-Sep-2022
    • (2022) Multi-task Video Enhancement for Dental Interventions. Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (177-187). DOI: 10.1007/978-3-031-16449-1_18. Online publication date: 17-Sep-2022
