SymSwin: Multi-Scale-Aware Super-Resolution of Remote Sensing Images Based on Swin Transformers
Figure 1. (a) Overall architecture of SymSwin, containing three main functional stages; the core deep feature extraction stage consists of SyMWBs and CRAAs. (b) Detailed illustration of the SyMWB composition. (c) Detailed illustration of the CRAA module. (d) Detailed illustration of the Swin-DCFF layer, where SW-SA denotes conventional shifted-window self-attention. (e) Detailed illustration of DCFF.
Figure 2. Illustration of the SyMW mechanism. In the legend, one grid pattern denotes the window for SyMWB_i, another denotes the window for SyMWB_{i+1}, and the shaded map denotes the feature map of SyMWB_i. Each feature map represents the output of a whole block, and the grid overlaid on it denotes the window size used on that feature map. The illustration demonstrates intuitively that SyMW provides multi-scale context.
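To make the symmetric window schedule of Figure 2 concrete, here is a minimal sketch of how per-block window sizes could drive window partitioning. The sizes in `SYMW_WINDOW_SIZES` and the helper `window_partition` are illustrative assumptions, not SymSwin's released code.

```python
# Minimal sketch of a symmetric multi-scale window schedule (assumed values).
import torch

# Hypothetical per-block window sizes: small -> large -> small, mirroring the
# symmetric schedule suggested by Figure 2; the actual sizes used by SymSwin
# are not given in this excerpt.
SYMW_WINDOW_SIZES = [4, 8, 16, 8, 4]

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    # -> (num_windows * B, ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws, ws, C)

if __name__ == "__main__":
    feat = torch.randn(1, 64, 64, 96)  # (B, H, W, C)
    for i, ws in enumerate(SYMW_WINDOW_SIZES):
        windows = window_partition(feat, ws)
        print(f"SyMWB_{i}: window {ws}x{ws} -> {windows.shape[0]} windows")
```

Smaller windows attend locally while larger ones see wider context, which is the multi-scale effect the caption describes.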
Figure 3. Illustration of the CRAA module, containing two main functional stages. In the CRA stage, we compute the correlation between contexts with different receptive fields to achieve flexible fusion; in the AFF stage, we adaptively enhance the fused feature.
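Read alongside Figure 3, a CRAA-style block can be sketched as cross-attention between contexts of two receptive fields followed by an adaptive gate. This is a hedged approximation under our own assumptions (the `CRAASketch` class and the sigmoid gate standing in for AFF), not the authors' implementation.

```python
# Hedged sketch of a CRAA-style block: cross-attention between two contexts
# with different receptive fields (CRA), then an adaptive fusion gate (AFF).
import torch
import torch.nn as nn

class CRAASketch(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, small_ctx: torch.Tensor, large_ctx: torch.Tensor):
        # CRA: query the small-receptive-field context with the large one, so
        # cross-scale correlations decide how the two contexts are mixed.
        fused, _ = self.cra(query=small_ctx, key=large_ctx, value=large_ctx)
        # AFF: an adaptive gate re-weights the fused feature before the residual.
        g = self.gate(torch.cat([small_ctx, fused], dim=-1))
        return small_ctx + g * fused

tokens = torch.randn(2, 4096, 96)  # (batch, H*W tokens, channels)
out = CRAASketch(96)(tokens, tokens)
print(out.shape)  # torch.Size([2, 4096, 96])
```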
Figure 4. Illustration of the SWT process. The color space conversion maps an image from RGB to YCrCb, and we select the Y-band value, which represents the luminance information. LF denotes the low-frequency sub-band and HF denotes the high-frequency sub-bands; the HF sketches directly depict edges in the horizontal, vertical, and diagonal directions.
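The decomposition in Figure 4 can be reproduced with an off-the-shelf stationary wavelet transform. In this sketch, the wavelet family ("haar") and the single decomposition level are our assumptions, since the excerpt does not specify them.

```python
# Sketch of the SWT step in Figure 4: convert RGB to luminance, then decompose
# it into one low-frequency (LF) and three directional high-frequency (HF)
# sub-bands that keep the input resolution.
import numpy as np
import pywt

rgb = np.random.rand(256, 256, 3).astype(np.float64)  # stand-in image in [0, 1]

# ITU-R BT.601 luminance: Y = 0.299 R + 0.587 G + 0.114 B
y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

# Stationary (undecimated) 2-D wavelet transform, one level.
coeffs = pywt.swt2(y, wavelet="haar", level=1)
cA, (cH, cV, cD) = coeffs[0]  # LF, then horizontal/vertical/diagonal HF
print(cA.shape, cH.shape)     # both (256, 256): sub-bands are not downsampled
```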
Figure 5. Visualization examples of ×4 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. PSNR and SSIM values are listed below each patch; the best result is highlighted in bold red font and the second best in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 6. Visualization examples of ×3 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. PSNR and SSIM values are listed below each patch; the best result is highlighted in bold red font and the second best in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 7. Comparison of the feature maps extracted by each layer of the backbone with and without multi-scale representations, illustrating the different regions of interest the networks tend to focus on. Colors closer to red denote stronger attention.
Abstract
1. Introduction
- We propose the symmetric multi-scale window (SyMW) mechanism, which equips the backbone with the capability to capture the multi-scale characteristics of RS data and to generate more precise contexts.
- We introduce the cross-receptive field-adaptive attention (CRAA) module into every block of our backbone to model dependencies across multi-scale representations, effectively enhancing the information.
- In addition, we train SymSwin with a novel U-shape wavelet transform (UWT) loss, which leverages frequency features to facilitate more effective image restoration (see the sketch after this list).
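As referenced above, here is a minimal sketch of a wavelet-domain training loss in the spirit of UWT: pixel-space L1 plus L1 on SWT sub-bands of the luminance channel. The sub-band weights `lf_w` and `hf_w` and the single-level decomposition are placeholders; the actual U-shape weighting scheme is defined in the paper, not reproduced here.

```python
# Hedged sketch of a UWT-style loss: pixel L1 plus L1 over SWT sub-bands
# of the luminance channel. Weights and wavelet choice are assumptions.
import numpy as np
import pywt

def luminance(rgb: np.ndarray) -> np.ndarray:
    # ITU-R BT.601 luminance, matching the Y-band selection in Figure 4.
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def uwt_like_loss(sr: np.ndarray, hr: np.ndarray,
                  pixel_w: float = 1.0, lf_w: float = 0.1, hf_w: float = 0.1) -> float:
    loss = pixel_w * np.abs(sr - hr).mean()
    sa, (sh, sv, sd) = pywt.swt2(luminance(sr), "haar", level=1)[0]
    ha, (hh, hv, hd) = pywt.swt2(luminance(hr), "haar", level=1)[0]
    loss += lf_w * np.abs(sa - ha).mean()  # low-frequency sub-band term
    for s_band, h_band in ((sh, hh), (sv, hv), (sd, hd)):
        loss += hf_w * np.abs(s_band - h_band).mean()  # directional HF terms
    return float(loss)

sr = np.random.rand(128, 128, 3)
hr = np.random.rand(128, 128, 3)
print(uwt_like_loss(sr, hr))
```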
2. Related Works
2.1. Transformer-Based Image SR
2.2. Multi-Scale Representation Mechanism in Single RS Image SR
2.3. SR Methods Combining with Wavelet Transform
3. Methodology
3.1. Overview of SymSwin Architecture
3.2. SymSwin Backbone with Multi-Scale Representations
3.2.1. Symmetric Multi-Scale Window (SyMW) Mechanism
3.2.2. Cross-Receptive Field-Adaptive Attention (CRAA) Module
3.3. U-Shape Wavelet Transform (UWT) Loss
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Evaluation Metrics
4.1.3. Implementation Details
4.2. Comparative Experiments
4.2.1. Quantitative Results
4.2.2. Qualitative Results
4.3. Ablation Studies
4.4. Visual Demonstration of Multi-Scale Representation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Algorithm | Scale | NWPU-RESISC45 PSNR | SSIM | LPIPS | CLIPscore | DIOR PSNR | SSIM | LPIPS | CLIPscore
---|---|---|---|---|---|---|---|---|---
SwinIR | ×2 | 31.562 | 0.906 | 0.117 | 0.974 | 32.354 | 0.892 | 0.121 | 0.971
DAT | ×2 | 31.210 | 0.899 | 0.123 | 0.973 | 32.337 | 0.892 | 0.121 | 0.971
HAT | ×2 | 31.502 | 0.905 | 0.118 | 0.974 | 32.362 | 0.892 | 0.121 | 0.971
NGswin | ×2 | 31.520 | 0.906 | 0.117 | 0.974 | 32.325 | 0.892 | 0.121 | 0.971
SRFormer | ×2 | 31.531 | 0.906 | 0.118 | 0.975 | 32.334 | 0.891 | 0.121 | 0.970
HiT-SR | ×2 | 31.498 | 0.905 | 0.117 | 0.975 | 32.351 | 0.892 | 0.120 | 0.971
TransENet | ×2 | 31.605 | 0.904 | 0.120 | 0.973 | 32.426 | 0.892 | 0.121 | 0.969
TTST | ×2 | 31.571 | 0.906 | 0.117 | 0.974 | 32.335 | 0.892 | 0.123 | 0.971
SymSwin (Ours) | ×2 | 31.612 | 0.906 | 0.118 | 0.973 | 32.810 | 0.893 | 0.120 | 0.971
SwinIR | ×3 | 23.269 | 0.679 | 0.304 | 0.911 | 24.766 | 0.709 | 0.287 | 0.934
DAT | ×3 | 23.233 | 0.678 | 0.301 | 0.915 | 24.707 | 0.709 | 0.284 | 0.936
HAT | ×3 | 23.169 | 0.675 | 0.305 | 0.913 | 24.656 | 0.708 | 0.287 | 0.935
NGswin | ×3 | 23.099 | 0.680 | 0.312 | 0.888 | 24.611 | 0.706 | 0.289 | 0.934
SRFormer | ×3 | 23.326 | 0.679 | 0.305 | 0.911 | 24.766 | 0.709 | 0.288 | 0.934
HiT-SR | ×3 | 23.194 | 0.680 | 0.300 | 0.914 | 24.595 | 0.708 | 0.284 | 0.936
TransENet | ×3 | 23.591 | 0.685 | 0.311 | 0.902 | 24.994 | 0.712 | 0.293 | 0.932
TTST | ×3 | 23.205 | 0.678 | 0.302 | 0.913 | 24.713 | 0.709 | 0.285 | 0.935
SymSwin (Ours) | ×3 | 23.603 | 0.688 | 0.296 | 0.913 | 24.994 | 0.718 | 0.283 | 0.936
SwinIR | ×4 | 27.694 | 0.744 | 0.303 | 0.876 | 27.784 | 0.769 | 0.267 | 0.914
DAT | ×4 | 27.715 | 0.745 | 0.303 | 0.875 | 27.988 | 0.773 | 0.263 | 0.916
HAT | ×4 | 27.708 | 0.744 | 0.303 | 0.876 | 27.910 | 0.771 | 0.266 | 0.915
NGswin | ×4 | 27.684 | 0.743 | 0.303 | 0.876 | 27.913 | 0.770 | 0.266 | 0.915
SRFormer | ×4 | 27.656 | 0.743 | 0.304 | 0.875 | 27.868 | 0.769 | 0.268 | 0.917
HiT-SR | ×4 | 27.759 | 0.746 | 0.299 | 0.875 | 24.043 | 0.674 | 0.331 | 0.894
TransENet | ×4 | 27.531 | 0.736 | 0.308 | 0.874 | 27.709 | 0.764 | 0.274 | 0.907
TTST | ×4 | 27.716 | 0.745 | 0.303 | 0.875 | 27.964 | 0.772 | 0.264 | 0.914
SymSwin (Ours) | ×4 | 28.044 | 0.747 | 0.283 | 0.891 | 28.021 | 0.774 | 0.264 | 0.915
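For reference, the PSNR/SSIM columns above are conventionally computed as in the generic skimage sketch below. The setup (data range, color handling, no border cropping) is our assumption, not the authors' evaluation script.

```python
# Generic sketch of PSNR/SSIM computation; exact evaluation settings
# (color space, cropping) used in the paper may differ.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

hr = np.random.rand(256, 256, 3)  # stand-in ground-truth patch in [0, 1]
sr = np.clip(hr + 0.01 * np.random.randn(*hr.shape), 0, 1)  # reconstruction

psnr = peak_signal_noise_ratio(hr, sr, data_range=1.0)
ssim = structural_similarity(hr, sr, data_range=1.0, channel_axis=-1)
print(f"PSNR: {psnr:.3f} dB  SSIM: {ssim:.3f}")
```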
Class | SwinIR | DAT | HAT | TransENet | TTST | SymSwin |
---|---|---|---|---|---|---|
Airplane | 27.701 | 27.714 | 27.725 | 27.410 | 27.725 | 28.335 |
Airport | 27.951 | 27.968 | 27.946 | 27.822 | 27.943 | 28.235 |
Baseball_diamond | 28.809 | 28.814 | 28.802 | 28.555 | 28.815 | 29.248 |
Basketball_court | 26.158 | 26.189 | 26.169 | 25.923 | 26.181 | 26.660 |
Beach | 29.688 | 29.708 | 29.727 | 29.632 | 29.722 | 29.874 |
Bridge | 29.654 | 29.670 | 29.666 | 29.519 | 29.679 | 29.861 |
Chaparral | 25.420 | 25.402 | 25.450 | 25.172 | 25.447 | 25.753 |
Church | 26.467 | 26.502 | 26.514 | 26.317 | 26.508 | 26.859 |
Circular_farmland | 33.179 | 33.208 | 33.180 | 32.913 | 33.187 | 33.527 |
Cloud | 33.956 | 33.969 | 33.967 | 33.841 | 33.978 | 34.100 |
Commercial_area | 25.045 | 25.093 | 25.075 | 24.927 | 25.083 | 25.347 |
Dense_residential | 22.883 | 22.959 | 22.895 | 22.732 | 22.906 | 23.410 |
Desert | 30.996 | 30.998 | 31.019 | 30.951 | 30.997 | 31.133 |
Forest | 26.180 | 26.184 | 26.185 | 26.119 | 26.183 | 26.265 |
Freeway | 27.169 | 27.163 | 27.189 | 26.982 | 27.191 | 27.622 |
Golf_course | 30.964 | 30.978 | 30.977 | 30.797 | 30.967 | 31.235 |
Ground_track_field | 25.814 | 25.835 | 25.828 | 25.690 | 25.828 | 26.068 |
Harbor | 21.758 | 21.801 | 21.781 | 21.585 | 21.771 | 22.281 |
Industrial_area | 26.182 | 26.219 | 26.187 | 25.970 | 26.197 | 26.643 |
Intersection | 23.077 | 23.114 | 23.093 | 22.926 | 23.111 | 23.864 |
Island | 33.053 | 33.046 | 33.047 | 32.906 | 33.059 | 33.238 |
Lake | 28.689 | 28.688 | 28.695 | 28.628 | 28.687 | 28.808 |
Meadow | 31.773 | 31.773 | 31.760 | 31.714 | 31.776 | 31.840 |
Medium_residential | 26.050 | 26.072 | 26.079 | 25.916 | 26.087 | 26.459 |
Mobile_home_park | 23.501 | 23.542 | 23.517 | 23.316 | 23.541 | 24.024 |
Mountain | 28.925 | 28.936 | 28.930 | 28.854 | 28.915 | 29.036 |
Overpass | 26.435 | 26.520 | 26.478 | 26.140 | 26.525 | 27.073 |
Palace | 24.440 | 24.458 | 24.456 | 24.262 | 24.474 | 24.768 |
Parking_lot | 23.254 | 23.307 | 23.268 | 23.026 | 23.314 | 23.998 |
Railway | 25.783 | 25.805 | 25.811 | 25.605 | 25.810 | 26.081 |
Railway_station | 26.600 | 26.621 | 26.613 | 26.419 | 26.616 | 26.974 |
Rectangular_farmland | 30.696 | 30.731 | 30.703 | 30.474 | 30.706 | 31.000 |
River | 29.515 | 29.525 | 29.517 | 29.407 | 29.518 | 29.675 |
Roundabout | 25.179 | 25.214 | 25.191 | 25.026 | 25.207 | 25.461 |
Runway | 33.034 | 33.049 | 33.025 | 32.733 | 33.087 | 33.959 |
Sea_ice | 31.522 | 31.517 | 31.529 | 31.388 | 31.528 | 31.736 |
Ship | 29.676 | 29.677 | 29.667 | 29.478 | 29.694 | 29.987 |
Snowberg | 23.856 | 23.862 | 23.893 | 23.771 | 23.875 | 23.997 |
Sparse_residential | 27.241 | 27.248 | 27.261 | 27.141 | 27.262 | 27.398 |
Stadium | 27.405 | 27.422 | 27.442 | 27.184 | 27.439 | 27.819 |
Storage_tank | 27.846 | 27.893 | 27.867 | 27.624 | 27.858 | 28.363 |
Tennis_court | 26.143 | 26.170 | 26.152 | 26.010 | 26.176 | 26.487 |
Terrace | 28.195 | 28.196 | 28.219 | 28.016 | 28.219 | 28.508 |
Thermal_power | 27.534 | 27.615 | 27.565 | 27.342 | 27.578 | 27.997 |
Wetland | 30.822 | 30.821 | 30.821 | 30.709 | 30.829 | 30.979 |
Average | 27.694 | 27.715 | 27.708 | 27.531 | 27.716 | 28.044 |
Model | Model0 (Base) | Model1 | Model2 | Model3 | Model4 (SymSwin)
---|---|---|---|---|---
Params | 11.900 M | 12.560 M | 14.713 M | 15.035 M | 15.035 M
SyMW | w/o | w | w/o | w | w
CRAA | w/o | w/o | w | w | w
UWT | w/o | w/o | w/o | w/o | w
PSNR | 27.694 | 27.749 | 27.707 | 27.776 | 28.044
SSIM | 0.744 | 0.747 | 0.744 | 0.747 | 0.747
LPIPS | 0.303 | 0.301 | 0.303 | 0.300 | 0.283
CLIPscore | 0.876 | 0.874 | 0.876 | 0.875 | 0.891
Model | Parameters (×4/×3/×2) | FLOPs (×4/×3/×2) |
---|---|---|
SwinIR | 11.900 M/11.937 M/11.752 M | 50.546 G/48.836 G/48.045 G |
DAT | 11.212 M/11.249 M/11.064 M | 46.618 G/44.907 G/44.117 G |
HAT | 20.572 M/20.609 M/20.424 M | 85.707 G/83.997 G/83.207 G |
NGswin | 14.672 M/14.709 M/14.524 M | 51.246 G/49.536 G/48.745 G |
SRFormer | 10.440 M/10.477 M/10.292 M | 44.580 G/42.869 G/42.079 G |
HiT-SR | 10.418 M/10.455 M/10.270 M | 47.300 G/45.590 G/44.800 G |
TransENet | 9.404 M/9.441 M/9.256 M | 12.536 G/8.804 G/6.569 G |
TTST | 18.367 M/18.403 M/18.219 M | 76.842 G/75.132 G/74.341 G |
SymSwin | 15.035 M/15.072 M/14.887 M | 63.046 G/61.336 G/60.545 G |
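The Parameters column in the table above can be reproduced by summing tensor element counts, as in the minimal sketch below; the FLOPs column additionally requires a profiler whose settings are not given in this excerpt.

```python
# Minimal parameter-count helper matching the "x.xxx M" format used above.
import torch.nn as nn

def count_params(model: nn.Module) -> str:
    n = sum(p.numel() for p in model.parameters())
    return f"{n / 1e6:.3f} M"

print(count_params(nn.Conv2d(3, 96, 3)))  # tiny example layer -> "0.003 M"
```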
Model | Scale | Parameters | FLOPs | PSNR | SSIM | LPIPS | CLIPscore
---|---|---|---|---|---|---|---
SymSwin | ×4 | 15.035 M | 63.046 G | 28.044 | 0.747 | 0.283 | 0.891
SymSwin | ×3 | 15.072 M | 61.336 G | 23.603 | 0.688 | 0.296 | 0.913
SymSwin | ×2 | 14.887 M | 60.545 G | 31.612 | 0.906 | 0.118 | 0.973
SymSwin-Light | ×4 | 12.905 M | 54.988 G | 28.019 | 0.758 | 0.284 | 0.887
SymSwin-Light | ×3 | 12.942 M | 53.277 G | 23.793 | 0.692 | 0.301 | 0.906
SymSwin-Light | ×2 | 12.757 M | 52.487 G | 31.415 | 0.905 | 0.118 | 0.973
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jiao, D.; Su, N.; Yan, Y.; Liang, Y.; Feng, S.; Zhao, C.; He, G. SymSwin: Multi-Scale-Aware Super-Resolution of Remote Sensing Images Based on Swin Transformers. Remote Sens. 2024, 16, 4734. https://doi.org/10.3390/rs16244734