research-article
Open access
DOI: 10.1145/3615886.3627751

A Detailed Analysis on the Use of General-purpose Vision Transformers for Remote Sensing Image Segmentation

Published: 20 November 2023 Publication History

Abstract

Image segmentation is currently an active research topic in Earth observation through remote sensing. Recent research has proposed many new models designed specifically for remote sensing image segmentation, often with sophisticated architectures and purpose-built mechanisms for this domain. Our work, on the other hand, explores the use of recent general-purpose image segmentation Transformer models in this same context, with emphasis on the adopted training strategy and its influence on segmentation performance. Our objective is to assess the degree to which domain-specific architectures are indeed required to achieve state-of-the-art results, and to assess the role of training strategies in the performance of general models. We tested different model sizes and a variety of training strategy options, including adaptations to 4-channel inputs, over two datasets used in previous studies. Results show that general-purpose models are indeed competitive with the current state of the art, without relying on purpose-built architectures for remote sensing images.
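The abstract mentions adapting models to 4-channel inputs but does not say how. One common approach, shown here purely as an illustrative assumption and not as the authors' method, is to widen the first projection layer of an RGB-pretrained model and initialize the new channel with the mean of the pretrained RGB filters. The function name `expand_input_channels` and the nested-list weight layout `[out_channels][in_channels][height][width]` are hypothetical; a real implementation would operate on the tensors of a deep learning framework.

```python
def expand_input_channels(weights, extra=1):
    """Return a copy of `weights` with `extra` input channels appended.

    `weights` is a nested list shaped [out_ch][in_ch][kh][kw] (assumed layout).
    Each new channel is initialized to the mean of the existing input
    channels, which is one possible heuristic, not the paper's recipe.
    """
    expanded = []
    for kernel in weights:
        in_ch = len(kernel)
        kh, kw = len(kernel[0]), len(kernel[0][0])
        # Per-position average over the existing input channels.
        mean_slice = [
            [sum(kernel[c][i][j] for c in range(in_ch)) / in_ch for j in range(kw)]
            for i in range(kh)
        ]
        # Copy the pretrained channels, then append the new ones.
        new_kernel = [[row[:] for row in channel] for channel in kernel]
        for _ in range(extra):
            new_kernel.append([row[:] for row in mean_slice])
        expanded.append(new_kernel)
    return expanded


# One output filter, three 1x1 input channels (R, G, B).
rgb_weights = [[[[1.0]], [[2.0]], [[3.0]]]]
rgbx_weights = expand_input_channels(rgb_weights)
print(len(rgbx_weights[0]))  # number of input channels after expansion
```

Mean initialization keeps the pretrained RGB filters untouched and gives the extra band a starting point on the same scale; other choices, such as zero or random initialization, are equally plausible and would need to be compared empirically.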



    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    GeoAI '23: Proceedings of the 6th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
    November 2023
    135 pages
    ISBN:9798400703485
    DOI:10.1145/3615886
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Computer Vision
    2. Deep Learning
    3. Image Segmentation
    4. Remote Sensing
    5. Vision Transformers

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Portuguese Recovery and Resilience Plan
    • Fundação para a Ciência e Tecnologia - FCT

    Conference

    SIGSPATIAL '23
    Acceptance Rates

    Overall Acceptance Rate 17 of 25 submissions, 68%
