Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer

Published: 01 January 2023


Existing low-light video enhancement methods are dominated by convolutional neural networks (CNNs) trained in a supervised manner. Because paired dynamic low/normal-light videos are difficult to collect in real-world scenes, these methods are usually trained on synthetic, static, and uniform-motion videos, which undermines their generalization to real-world scenes. In addition, they typically suffer from temporal inconsistency (e.g., flickering artifacts and motion blur) when handling large-scale motion, since the local receptive fields of CNNs limit their ability to model long-range dependencies in both the spatial and temporal domains. To address these problems, we propose LightenFormer, to the best of our knowledge the first unsupervised method for low-light video enhancement, which models long-range intra- and inter-frame dependencies with a spatial-temporal co-attention transformer to enhance brightness while maintaining temporal consistency. Specifically, we first propose an effective yet lightweight S-curve Estimation Network (SCENet) that estimates pixel-wise S-shaped non-linear curves (S-curves) to adaptively adjust the dynamic range of the input video. Next, to model temporal consistency across frames, we present a Spatial-Temporal Refinement Network (STRNet) that refines the enhanced video. The core module of STRNet is a novel Spatial-Temporal Co-attention Transformer (STCAT), which exploits multi-scale self- and cross-attention interactions among frames to capture long-range correlations in both the spatial and temporal domains, serving as implicit motion estimation. To enable unsupervised training, we further propose two non-reference loss functions based on the invertibility of the S-curve and the independence of noise across frames. Extensive experiments on the SDSD and LLIV-Phone datasets demonstrate that LightenFormer outperforms state-of-the-art methods.
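
The abstract does not state SCENet's curve formula, so the following is only a minimal sketch of what a pixel-wise, invertible S-curve can look like. It assumes a hypothetical piecewise power curve through (0, 0), a pivot (p, q), and (1, 1), with its own exponent `alpha` per pixel: for `alpha > 1` the curve is convex below the pivot and concave above it (the S shape), and choosing `q > p` raises the pivot, which brightens the mid-tones. Each branch is a monotone power function, so the curve has a closed-form inverse, the kind of property an invertibility-based loss can exploit. The names `s_curve`, `s_curve_inverse`, and `enhance_frame`, and the parameters `p`, `q`, `alpha`, are illustrative and not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-pixel curve parameters (p, q, alpha); in the paper they
# would be predicted by SCENet, here they are simply given.
Params = Tuple[float, float, float]


def s_curve(x: float, p: float, q: float, alpha: float) -> float:
    """Hypothetical S-curve on [0, 1] passing through (0, 0), (p, q), (1, 1).

    For alpha > 1 it is convex below the pivot and concave above it;
    q > p raises the pivot and therefore brightens mid-tones.
    """
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0 and alpha > 0.0):
        raise ValueError("require 0 < p < 1, 0 < q < 1, alpha > 0")
    if x <= p:
        return q * (x / p) ** alpha
    return 1.0 - (1.0 - q) * ((1.0 - x) / (1.0 - p)) ** alpha


def s_curve_inverse(y: float, p: float, q: float, alpha: float) -> float:
    """Closed-form inverse of s_curve for the same (p, q, alpha)."""
    if y <= q:
        return p * (y / q) ** (1.0 / alpha)
    return 1.0 - (1.0 - p) * ((1.0 - y) / (1.0 - q)) ** (1.0 / alpha)


def enhance_frame(frame: Sequence[float], params: Sequence[Params]) -> List[float]:
    """Apply one curve per pixel of a flattened frame; params[i] belongs to pixel i."""
    return [s_curve(x, *theta) for x, theta in zip(frame, params)]
```

For example, with `(p, q, alpha) = (0.3, 0.6, 1.5)`, `enhance_frame([0.2, 0.4], [(0.3, 0.6, 1.5)] * 2)` maps both pixels to brighter values (about 0.33 and 0.68), and `s_curve_inverse` recovers the inputs up to floating-point error.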

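STCAT's exact architecture (multi-scale layout, projections, normalization) is not described in the abstract. As a rough illustration of the self- versus cross-attention distinction it builds on, the sketch below implements plain scaled dot-product attention over per-frame token features: self-attention takes queries, keys, and values from the same frame, while cross-attention takes queries from the current frame and keys/values from a neighboring frame, so each token can gather information from wherever similar content appears in the other frame. This is why attention can act as implicit motion estimation without an explicit optical-flow step. The learned query/key/value projections are omitted (identity), the two branches are fused by simple summation, and the names `attention` and `co_attention` are hypothetical; all of these are simplifications of this sketch.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def co_attention(frame_t: List[Vector], frame_ref: List[Vector]) -> List[Vector]:
    """Hypothetical fusion of intra-frame self-attention and inter-frame cross-attention.

    Each frame is a list of token feature vectors (e.g., one per pixel or patch).
    """
    intra = attention(frame_t, frame_t, frame_t)      # spatial: same frame
    inter = attention(frame_t, frame_ref, frame_ref)  # temporal: neighboring frame
    return [[a + b for a, b in zip(u, v)] for u, v in zip(intra, inter)]
```

In a practical video transformer, `frame_ref` would range over several neighboring frames and attention would be computed at multiple scales (as the abstract states for STCAT) to keep the cost manageable; both are left out here.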

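The abstract names the two non-reference losses only by the principle behind each, so the sketch below is one plausible reading, not the paper's definitions. The invertibility term assumes that the refined output, mapped back through the inverse of the same pixel-wise curve, should reproduce the low-light input; under this reading it discourages the refinement stage from drifting away from the input content. The noise-independence term follows the Noise2Noise principle (Lehtinen et al., ICML 2018): if two aligned frames share the same clean signal but carry independent zero-mean noise, regressing one noisy frame onto the other with a squared error drives the prediction toward the clean signal rather than the noise. Frame alignment (a static scene), the choice of L1 and L2 distances, and all function names are assumptions; the inverse curve is passed in as a callable so the block stays self-contained.

```python
from typing import Callable, List, Sequence, Tuple

Params = Tuple[float, float, float]                # hypothetical per-pixel curve parameters
Inverse = Callable[[float, float, float, float], float]


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_sq_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def invertibility_loss(low: Sequence[float], refined: Sequence[float],
                       params: Sequence[Params], inverse: Inverse) -> float:
    """Map the refined frame back through the inverse curve and compare with the input."""
    recovered = [inverse(y, *theta) for y, theta in zip(refined, params)]
    return mean_abs_error(recovered, low)


def noise_independence_loss(denoise: Callable[[List[float]], List[float]],
                            frame_t: List[float], frame_next: List[float]) -> float:
    """Noise2Noise-style loss: the target is the aligned neighboring noisy frame.

    If both frames share a clean signal and carry independent zero-mean noise,
    the squared-error minimizer in expectation is the clean signal.
    """
    return mean_sq_error(denoise(frame_t), frame_next)
```

As a sanity check on the second term: for a constant clean value c and independent noise of variance σ² in each frame, predicting c gives an expected loss of σ², whereas copying the current noisy frame gives 2σ², so the loss rewards removing noise that the target frame cannot predict.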



Published In

IEEE Transactions on Image Processing, Volume 32, Issue
5324 pages


IEEE Press


  • Research-article

