Abstract
SDR-to-HDR translation converts abundant standard-dynamic-range (SDR) media resources into high-dynamic-range (HDR) ones, which can represent high-contrast scenes and provide more realistic visual experiences. While recent vision Transformers have achieved promising performance in many low-level vision tasks, few works have attempted to leverage Transformers for SDR-to-HDR translation. In this paper, we are among the first to investigate the performance of Transformers for SDR-to-HDR translation. We find that directly applying the self-attention mechanism may introduce artifacts into the results because it models long-range dependencies between the low-frequency and high-frequency components in an inappropriate way. Taking this into account, we advance the self-attention mechanism and present a dual frequency attention (DFA), which leverages self-attention to separately encode low-frequency structural information and high-frequency detail information. Based on the proposed DFA, we further design a multi-scale feature fusion network, named the dual frequency Transformer (DFT), for efficient SDR-to-HDR translation. Extensive experiments on the HDRTV1K dataset demonstrate that our DFT achieves better quantitative and qualitative performance than recent state-of-the-art methods. The code of our DFT is publicly available at https://github.com/CS-GangXu/DFT.
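To make the dual-frequency idea concrete, the following is a minimal PyTorch sketch of how such an attention block could be organized: features are split into a low-frequency (structural) band via blurring and a high-frequency (detail) residual, self-attention is applied within each band separately, and the two bands are fused. This is an illustrative reconstruction, not the authors' released DFA; the pooling-based band split, the `DualFrequencyAttention` name, the head count, and the linear fusion are all assumptions.

```python
# A minimal sketch of the dual-frequency-attention idea from the abstract:
# attend to low-frequency structure and high-frequency detail separately,
# then fuse. NOT the authors' released DFA; band split and fusion are
# assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualFrequencyAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, pool: int = 4):
        super().__init__()
        self.pool = pool
        self.attn_low = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_high = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map.
        b, c, h, w = x.shape
        # Low-frequency branch: blur via average pooling + upsampling.
        low = F.interpolate(F.avg_pool2d(x, self.pool), size=(h, w),
                            mode="bilinear", align_corners=False)
        # High-frequency branch: the residual detail.
        high = x - low
        # Flatten spatial dims into token sequences: (B, H*W, C).
        low_t = low.flatten(2).transpose(1, 2)
        high_t = high.flatten(2).transpose(1, 2)
        # Self-attention within each frequency band only.
        low_t, _ = self.attn_low(low_t, low_t, low_t)
        high_t, _ = self.attn_high(high_t, high_t, high_t)
        # Fuse the two bands and restore the (B, C, H, W) layout.
        out = self.fuse(torch.cat([low_t, high_t], dim=-1))
        return out.transpose(1, 2).reshape(b, c, h, w) + x


if __name__ == "__main__":
    dfa = DualFrequencyAttention(dim=32)
    y = dfa(torch.randn(1, 32, 16, 16))
    print(y.shape)  # torch.Size([1, 32, 16, 16])
```

Keeping the attention within each band prevents a single global attention map from mixing structure tokens with detail tokens, which is the failure mode the abstract attributes to vanilla self-attention.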
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61922046 and 62276145), the National Key Research and Development Program of China (No. 2018AAA0100400), and the Fundamental Research Funds for the Central Universities, China (No. 63223049).
Ethics declarations
Ming-Ming Cheng is an Associate Editor for Machine Intelligence Research and was not involved in the editorial review of, or the decision to publish, this article. All authors declare that there are no other competing interests.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Gang Xu received the B.Sc. degree in information security from Xidian University, China in 2018. He is currently a Ph.D. candidate in computer science and technology at the Media Computing Laboratory, Nankai University, China.
His research interests include computer vision and deep learning, especially video frame interpolation, video super-resolution, and standard-dynamic-range to high-dynamic-range translation.
Qibin Hou received the Ph.D. degree in computer science and technology from the School of Computer Science, Nankai University, China in 2019. He then worked as a research fellow at the National University of Singapore, Singapore. He is now an associate professor at the School of Computer Science, Nankai University, China. He has published more than 30 papers in top conferences and journals, including T-PAMI, CVPR, ICCV and NeurIPS.
His research interests include deep learning and computer vision.
Ming-Ming Cheng received the Ph.D. degree in computer science and technology from Tsinghua University, China in 2012. He then spent two years as a research fellow with Professor Philip Torr at the University of Oxford, UK. He is now a professor at Nankai University, China, where he leads the Media Computing Laboratory. He has received research awards including the ACM China Rising Star Award, the IBM Global SUR Award, and the CCF-Intel Young Faculty Researcher Program award. He serves on the editorial boards of IEEE TPAMI and TIP.
His research interests include computer graphics, computer vision and image processing.
About this article
Cite this article
Xu, G., Hou, Q. & Cheng, MM. Dual Frequency Transformer for Efficient SDR-to-HDR Translation. Mach. Intell. Res. 21, 538–548 (2024). https://doi.org/10.1007/s11633-023-1418-8