research-article

RT-VENet: A Convolutional Network for Real-time Video Enhancement

Authors:

Henrik Turbell,

Yan Lu

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

Pages 4088 - 4097

https://doi.org/10.1145/3394171.3413951

Published: 12 October 2020

Abstract

Real-time video enhancement is in great demand due to the extensive usage of live video applications, but existing approaches are far from satisfying the strict requirements of speed and stability. We present a novel convolutional network that performs high-quality enhancement on 1080p videos at 45 FPS on a single CPU, which gives it high potential for real-world deployment. The proposed network builds on a light-weight image network and is further strengthened for temporal consistency with a temporal feature aggregation (TFA) module. Unlike most image translation networks, which use decoders to generate target images, our network discards decoders and employs only an encoder and a small head. Instead of pixel values, the network predicts color mapping functions in a grid-like container, which fits the CNN structure well and makes the enhancement scalable to any video resolution. Furthermore, the temporal consistency of the output is enforced by the TFA module, which utilizes the learned temporal coherence of semantics across frames. We also demonstrate on benchmark datasets that the mapping representation generalizes to various enhancement tasks, such as relighting, retouching and dehazing. Our approach achieves state-of-the-art performance and runs about 10 times faster than the current real-time method on high-resolution videos.
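
The idea of predicting mapping functions on a coarse grid, rather than pixel values, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each grid cell stores a piecewise-linear tone curve (control values at evenly spaced knots on [0, 1]) and that each pixel simply uses the curve of the cell covering it; the abstract does not specify the curve form or any interpolation between cells. The names `apply_curve` and `enhance` are hypothetical.

```python
from typing import List

Curve = List[float]          # control values at evenly spaced knots on [0, 1]
Image = List[List[float]]    # single-channel image, intensities in [0, 1]


def apply_curve(curve: Curve, x: float) -> float:
    """Evaluate a piecewise-linear curve (at least 2 knots) at intensity x."""
    x = min(max(x, 0.0), 1.0)            # clamp to the valid intensity range
    pos = x * (len(curve) - 1)           # fractional knot index
    lo = min(int(pos), len(curve) - 2)   # left knot of the active segment
    t = pos - lo                         # position inside the segment
    return (1.0 - t) * curve[lo] + t * curve[lo + 1]


def enhance(image: Image, grid: List[List[Curve]]) -> Image:
    """Map every pixel through the curve of the grid cell that covers it.

    Assumption: nearest-cell lookup by relative position, so the same
    small grid can be applied to an image of any resolution.
    """
    h, w = len(image), len(image[0])
    gh, gw = len(grid), len(grid[0])
    out = []
    for y in range(h):
        gy = min(y * gh // h, gh - 1)
        row = []
        for x in range(w):
            gx = min(x * gw // w, gw - 1)
            row.append(apply_curve(grid[gy][gx], image[y][x]))
        out.append(row)
    return out


# A 1x2 grid: identity on the left half, a brightening curve on the right half.
identity = [0.0, 0.5, 1.0]
brighten = [0.0, 0.75, 1.0]
grid = [[identity, brighten]]

# The same grid is applied to two different resolutions.
small = [[0.5, 0.5]]
large = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
print(enhance(small, grid))
print(enhance(large, grid))
```

In this sketch the grid size is independent of the image size, which is one way to read the abstract's claim that the representation scales to any video resolution: the expensive prediction would happen at grid resolution, and only the cheap per-pixel curve lookup runs at full resolution.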

Supplementary Material

ZIP File (mmfp0356aux.zip)

Supplemental Material for RT-VENet.

24.74 MB

MP4 File (3394171.3413951.mp4)

Real-time video enhancement is in great demand due to the extensive usage of live video applications, but existing approaches are far from satisfying the strict requirements of speed and stability. We present a novel convolutional network that performs high-quality enhancement on 1080p videos at 45 FPS on a single CPU, which gives it high potential for real-world deployment. The proposed network builds on a light-weight image network and is further strengthened for temporal consistency with a temporal feature aggregation (TFA) module. Instead of pixel values, the network predicts color mapping functions in a grid-like container, which fits the CNN structure well and makes the enhancement scalable to any video resolution. Our approach achieves state-of-the-art performance and runs about 10 times faster than the current real-time method on high-resolution videos. This video provides a recorded presentation of our work.

24.34 MB

Index Terms

RT-VENet: A Convolutional Network for Real-time Video Enhancement
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
        Computational photography

Information & Contributors

Information

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

October 2020

4889 pages

ISBN:9781450379885

DOI:10.1145/3394171

General Chairs:
Chang Wen Chen, Chinese University of Hong Kong, Shenzhen, China
Rita Cucchiara, UNIMORE, Italy
Xian-Sheng Hua, Alibaba Group, China

Program Chairs:
Guo-Jun Qi, Futurewei Technologies, USA
Elisa Ricci, UNITN & Fondazione Bruno Kessler, Italy
Zhengyou Zhang, Tencent, China
Roger Zimmermann, National University of Singapore, Singapore

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 October 2020

Qualifiers

Research-article

Funding Sources

Natural Science Foundation of China

Conference

MM '20

Sponsor:

SIGMM

MM '20: The 28th ACM International Conference on Multimedia

October 12 - 16, 2020

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 434
Downloads (Last 12 months): 28
Downloads (Last 6 weeks): 2

Reflects downloads up to 26 Nov 2024

Citations

Cited By

Pan J, Liu X, Bai Y, Zhai D, Jiang J, Zhao D. (2024). Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8 (1-23). Online publication date: 29-Jun-2024.
https://dl.acm.org/doi/10.1145/3664653
Cao Y, Su Y, Deng J, Zhang Y, Wu Q. (2024). Adaptive Locally-Aligned Transformer for low-light video enhancement. Computer Vision and Image Understanding 240 (103916). Online publication date: Mar-2024.
https://doi.org/10.1016/j.cviu.2023.103916
Gao Q, Li J, Zhao T, Wang Y. (2022). Real-time Image Enhancement with Attention Aggregation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:2s (1-19). Online publication date: 26-Sep-2022.
https://dl.acm.org/doi/10.1145/3564607
Katsaros E, Ostrowski P, Włódarczak K, Lewandowska E, Ruminski J, Siupka-Mróz D, Lassmann Ł, Jezierska A, Węsierski D. (2022). Multi-task Video Enhancement for Dental Interventions. Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (177-187). Online publication date: 17-Sep-2022.
https://doi.org/10.1007/978-3-031-16449-1_18
