Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer

Published: 01 January 2023

Abstract

Existing low-light video enhancement methods are dominated by Convolutional Neural Networks (CNNs) trained in a supervised manner. Because paired dynamic low-/normal-light videos are difficult to collect in real-world scenes, these methods are usually trained on synthetic, static, and uniform-motion videos, which undermines their generalization to real-world scenes. Additionally, they typically suffer from temporal inconsistency (e.g., flickering artifacts and motion blur) when handling large-scale motions, since the local perception property of CNNs limits their ability to model long-range dependencies in both the spatial and temporal domains. To address these problems, we propose, to the best of our knowledge, the first unsupervised method for low-light video enhancement, named LightenFormer, which models long-range intra- and inter-frame dependencies with a spatial-temporal co-attention transformer to enhance brightness while maintaining temporal consistency. Specifically, an effective yet lightweight S-curve Estimation Network (SCENet) is first proposed to estimate pixel-wise S-shaped non-linear curves (S-curves) that adaptively adjust the dynamic range of an input video. Next, to model the temporal consistency of the video, we present a Spatial-Temporal Refinement Network (STRNet) to refine the enhanced video. The core module of STRNet is a novel Spatial-Temporal Co-attention Transformer (STCAT), which exploits multi-scale self- and cross-attention interactions to capture long-range correlations among frames in both the spatial and temporal domains for implicit motion estimation. To enable unsupervised training, we further propose two non-reference loss functions based on the invertibility of the S-curve and the noise independence among frames. Extensive experiments on the SDSD and LLIV-Phone datasets demonstrate that LightenFormer outperforms state-of-the-art methods.
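Note: the abstract does not give the exact parameterization of the S-curves or of the invertibility-based loss. The short Python sketch below is only a hypothetical illustration of the general idea of a pixel-wise, invertible S-shaped tone curve and the kind of round-trip check an inverse-consistency, non-reference loss could build on; the functions s_curve and s_curve_inverse and the per-pixel parameter alpha are assumptions for illustration, not the paper's method.

import numpy as np

def s_curve(x, alpha):
    # Hypothetical pixel-wise S-shaped tone curve on [0, 1]; alpha has the
    # same spatial shape as x, so every pixel gets its own curve steepness.
    return x**alpha / (x**alpha + (1.0 - x)**alpha)

def s_curve_inverse(y, alpha):
    # Closed-form inverse of the curve above; having an exact inverse is what
    # an inverse-consistency (non-reference) loss can exploit.
    r = (y / np.clip(1.0 - y, 1e-8, None))**(1.0 / alpha)
    return r / (1.0 + r)

# Toy round-trip check on a dark frame.
frame = np.random.rand(4, 4) * 0.3            # dark input in [0, 0.3)
alpha = np.full_like(frame, 0.5)              # alpha < 1 lifts dark values
enhanced = s_curve(frame, alpha)              # brightened frame
recovered = s_curve_inverse(enhanced, alpha)  # map back toward the input
print(np.abs(recovered - frame).max())        # ~0 up to numerical precision

Because the mapping has a closed-form inverse, the round-trip error can be penalized without any reference frame, which is the general principle behind an invertibility-based non-reference loss.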


Published In

IEEE Transactions on Image Processing, Volume 32, 2023, 5324 pages
Publisher: IEEE Press