DOI: 10.1145/3627377.3627448
research-article

End-to-End Remote Sensing Image Scene Classification with Vision Transformers

Published: 04 December 2023

Abstract

In recent years, advances in deep learning have drawn widespread attention to the Vision Transformer (ViT) as an emerging image classification method. Remote sensing image classification is an important task in remote sensing with broad application prospects. This paper explores a remote sensing image classification method based on the Vision Transformer, addressing the limitations of traditional convolutional neural networks in global perception, contextual information capture, and positional encoding. The Vision Transformer is a deep neural network built on the self-attention mechanism; it can capture global context information in images and has achieved remarkable performance in a variety of computer vision tasks. Its classification performance is evaluated and compared on remote sensing datasets. The experimental results show that the ViT-based method achieves high accuracy and generalizes well. Compared with traditional convolutional neural networks, it captures global features in remote sensing images more effectively and scales better to large-scale remote sensing image data. On several remote sensing image datasets, the model performs well relative to state-of-the-art methods. Specifically, the Vision Transformer achieves average classification accuracies of 95.41%, 98.26%, 93.74%, and 95.25% on the AID, UC-Merced, NWPU-RESISC45, and OPTIMAL-31 datasets, respectively.
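To make the "global context" claim concrete, the following is a minimal sketch of how self-attention lets every image patch attend to every other patch. It is not the authors' implementation: the toy 4x4 image, the helper names `split_into_patches` and `self_attention`, and the use of identity query/key/value projections are all assumptions made for illustration. A real Vision Transformer adds learned projections, multiple heads, position embeddings, and a classification token.

```python
import math

def split_into_patches(image, patch):
    """Cut a square grid (list of rows) into flattened patch vectors."""
    size = len(image)
    tokens = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: queries, keys, and values are the raw patch vectors
    (identity projections); a trained model learns these projections.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights, outputs = [], []
    for query in tokens:
        # Score this patch against every patch in the image, itself included.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # The output is a weighted mix of all patch vectors.
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return weights, outputs

# Hypothetical 4x4 single-channel image, split into four 2x2 patches.
image = [[0.1, 0.2, 0.9, 0.8],
         [0.0, 0.1, 0.7, 0.9],
         [0.5, 0.4, 0.2, 0.1],
         [0.6, 0.5, 0.0, 0.3]]
tokens = split_into_patches(image, patch=2)
weights, outputs = self_attention(tokens)
```

Each row of `weights` is a probability distribution over all patches, so even a single layer mixes information from the whole image, whereas a convolution only sees a local neighborhood at each layer.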

Published In

ICBDT '23: Proceedings of the 2023 6th International Conference on Big Data Technologies
September 2023
441 pages
ISBN: 9798400707667
DOI: 10.1145/3627377
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tag

1. Multi-head attention

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ICBDT 2023
