Dual Frequency Transformer for Efficient SDR-to-HDR Translation

  • Research Article
  • Machine Intelligence Research

Abstract

SDR-to-HDR translation converts the abundant standard-dynamic-range (SDR) media resources to high-dynamic-range (HDR) ones, which can represent high-contrast scenes and thus provide more realistic visual experiences. While recent vision Transformers have achieved promising performance on many low-level vision tasks, few works have attempted to leverage Transformers for SDR-to-HDR translation. In this paper, we are among the first to investigate the performance of Transformers for SDR-to-HDR translation. We find that directly applying the self-attention mechanism can introduce artifacts into the results because it models long-range dependencies between the low-frequency and high-frequency components inappropriately. Taking this into account, we advance the self-attention mechanism and present dual frequency attention (DFA), which uses self-attention to separately encode the low-frequency structural information and the high-frequency detail information. Based on the proposed DFA, we further design a multi-scale feature fusion network, named the dual frequency Transformer (DFT), for efficient SDR-to-HDR translation. Extensive experiments on the HDRTV1K dataset demonstrate that our DFT achieves better quantitative and qualitative performance than recent state-of-the-art methods. The code of our DFT is publicly available at https://github.com/CS-GangXu/DFT.
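To make the dual frequency idea concrete, below is a minimal, hypothetical sketch of a DFA-style block in PyTorch. It is not the authors' implementation: the low-pass decomposition via average pooling and bilinear upsampling, the per-band use of standard multi-head self-attention, and the linear fusion layer are all assumptions made here for illustration. The paper's actual decomposition, attention design, and multi-scale fusion differ in detail.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualFrequencyAttention(nn.Module):
    """Illustrative DFA-style block (hypothetical; not the paper's code)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        # One self-attention branch per frequency band.
        self.low_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.high_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # merge the two encoded bands

    def forward(self, x):
        # x: (B, C, H, W) feature map; C must be divisible by num_heads.
        b, c, h, w = x.shape
        # Assumed low-pass decomposition: blur by pooling, then upsample;
        # the residual carries the high-frequency details.
        low = F.interpolate(F.avg_pool2d(x, 2), size=(h, w),
                            mode='bilinear', align_corners=False)
        high = x - low
        # Flatten spatial positions into token sequences: (B, H*W, C).
        low_t = low.flatten(2).transpose(1, 2)
        high_t = high.flatten(2).transpose(1, 2)
        # Encode structure and detail separately with self-attention.
        low_out, _ = self.low_attn(low_t, low_t, low_t, need_weights=False)
        high_out, _ = self.high_attn(high_t, high_t, high_t, need_weights=False)
        # Fuse the two bands and restore the spatial layout.
        out = self.fuse(torch.cat([low_out, high_out], dim=-1))
        return out.transpose(1, 2).reshape(b, c, h, w)

# Example: a 64-channel feature map from a 32x32 crop.
block = DualFrequencyAttention(dim=64, num_heads=4)
y = block(torch.randn(1, 64, 32, 32))  # -> torch.Size([1, 64, 32, 32])

Note that full self-attention over all H x W tokens scales quadratically with resolution, so an efficient attention variant would be needed for high-resolution inputs; the sketch only illustrates the band-split-then-attend idea.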



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61922046 and 62276145), the National Key Research and Development Program of China (No. 2018AAA0100400), and the Fundamental Research Funds for the Central Universities, China (No. 63223049).

Author information


Corresponding author

Correspondence to Qibin Hou.

Ethics declarations

Ming-Ming Cheng is an Associate Editor for Machine Intelligence Research and was not involved in the editorial review of, or the decision to publish, this article. All authors declare that there are no other competing interests.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Gang Xu received the B.Sc. degree in information security from Xidian University, China in 2018. He is currently a Ph.D. candidate in computer science and technology at the Media Computing Laboratory, Nankai University, China.

His research interests include computer vision and deep learning, especially video frame interpolation, video super-resolution, and standard-dynamic-range to high-dynamic-range translation.

Qibin Hou received the Ph.D. degree in computer science and technology from the School of Computer Science, Nankai University, China in 2019. He then worked as a research fellow at the National University of Singapore, Singapore. He is now an associate professor at the School of Computer Science, Nankai University, China. He has published more than 30 papers in top conferences and journals, including T-PAMI, CVPR, ICCV, and NeurIPS.

His research interests include deep learning and computer vision.

Ming-Ming Cheng received the Ph.D. degree in computer science and technology from Tsinghua University, China in 2012. He then spent two years as a research fellow with Professor Philip Torr at Oxford, UK. He is now a professor at Nankai University, China, leading the Media Computing Laboratory. He has received research awards including the ACM China Rising Star Award, the IBM Global SUR Award, and support from the CCF-Intel Young Faculty Researcher Program. He serves on the editorial boards of IEEE TPAMI and TIP.

His research interests include computer graphics, computer vision and image processing.

About this article

Cite this article

Xu, G., Hou, Q. & Cheng, MM. Dual Frequency Transformer for Efficient SDR-to-HDR Translation. Mach. Intell. Res. 21, 538–548 (2024). https://doi.org/10.1007/s11633-023-1418-8

