Abstract
SDR-to-HDR translation converts abundant standard-dynamic-range (SDR) media resources into high-dynamic-range (HDR) ones, which can represent high-contrast scenes and provide more realistic visual experiences. While recent vision Transformers have achieved promising performance in many low-level vision tasks, few works have attempted to leverage Transformers for SDR-to-HDR translation. In this paper, we are among the first to investigate the performance of Transformers for SDR-to-HDR translation. We find that directly applying the self-attention mechanism may introduce artifacts into the results because it models long-range dependencies between the low-frequency and high-frequency components in an inappropriate way. Taking this into account, we advance the self-attention mechanism and present a dual frequency attention (DFA), which leverages self-attention to separately encode low-frequency structural information and high-frequency detail information. Based on the proposed DFA, we further design a multi-scale feature fusion network, named the dual frequency Transformer (DFT), for efficient SDR-to-HDR translation. Extensive experiments on the HDRTV1K dataset demonstrate that our DFT achieves better quantitative and qualitative performance than recent state-of-the-art methods. The code of our DFT is publicly available at https://github.com/CS-GangXu/DFT.
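To make the dual-frequency idea concrete, the following is a minimal PyTorch sketch of how such an attention block could be organized: features are split into a low-frequency (structural) band via blurring and a high-frequency (detail) residual, self-attention is applied within each band separately, and the two bands are fused. This is an illustrative reconstruction, not the authors' released DFA; the pooling-based band split, the `DualFrequencyAttention` name, the head count, and the linear fusion are all assumptions.

```python
# A minimal sketch of the dual-frequency-attention idea from the abstract:
# attend to low-frequency structure and high-frequency detail separately,
# then fuse. NOT the authors' released DFA; band split and fusion are
# assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualFrequencyAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, pool: int = 4):
        super().__init__()
        self.pool = pool
        self.attn_low = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_high = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map.
        b, c, h, w = x.shape
        # Low-frequency branch: blur via average pooling + upsampling.
        low = F.interpolate(F.avg_pool2d(x, self.pool), size=(h, w),
                            mode="bilinear", align_corners=False)
        # High-frequency branch: the residual detail.
        high = x - low
        # Flatten spatial dims into token sequences: (B, H*W, C).
        low_t = low.flatten(2).transpose(1, 2)
        high_t = high.flatten(2).transpose(1, 2)
        # Self-attention within each frequency band only.
        low_t, _ = self.attn_low(low_t, low_t, low_t)
        high_t, _ = self.attn_high(high_t, high_t, high_t)
        # Fuse the two bands and restore the (B, C, H, W) layout.
        out = self.fuse(torch.cat([low_t, high_t], dim=-1))
        return out.transpose(1, 2).reshape(b, c, h, w) + x


if __name__ == "__main__":
    dfa = DualFrequencyAttention(dim=32)
    y = dfa(torch.randn(1, 32, 16, 16))
    print(y.shape)  # torch.Size([1, 32, 16, 16])
```

Keeping the attention within each band prevents a single global attention map from mixing structure tokens with detail tokens, which is the failure mode the abstract attributes to vanilla self-attention.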
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61922046 and 62276145), the National Key Research and Development Program of China (No. 2018AAA0100400), and the Fundamental Research Funds for the Central Universities, China (No. 63223049).
Ethics declarations
Ming-Ming Cheng is an Associate Editor for Machine Intelligence Research and was not involved in the editorial review of, or the decision to publish, this article. All authors declare that there are no other competing interests.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Gang Xu received the B.Sc. degree in information security from Xidian University, China in 2018. He is currently a Ph.D. candidate in computer science and technology at the Media Computing Laboratory, Nankai University, China.
His research interests include computer vision and deep learning, especially video frame interpolation, video super-resolution, and standard-dynamic-range to high-dynamic-range translation.
Qibin Hou received the Ph.D. degree in computer science and technology from the School of Computer Science, Nankai University, China in 2019. He then worked as a research fellow at the National University of Singapore, Singapore. He is now an associate professor at the School of Computer Science, Nankai University, China. He has published more than 30 papers in top conferences and journals, including T-PAMI, CVPR, ICCV and NeurIPS.
His research interests include deep learning and computer vision.
Ming-Ming Cheng received the Ph.D. degree in computer science and technology from Tsinghua University, China in 2012. He then spent two years as a research fellow with Professor Philip Torr at the University of Oxford, UK. He is now a professor at Nankai University, China, where he leads the Media Computing Laboratory. He has received research awards including the ACM China Rising Star Award, the IBM Global SUR Award, and the CCF-Intel Young Faculty Researcher Program award. He serves on the editorial boards of IEEE TPAMI and TIP.
His research interests include computer graphics, computer vision and image processing.
About this article
Cite this article
Xu, G., Hou, Q. & Cheng, MM. Dual Frequency Transformer for Efficient SDR-to-HDR Translation. Mach. Intell. Res. 21, 538–548 (2024). https://doi.org/10.1007/s11633-023-1418-8