Search Results (8)

Search Parameters:
Keywords = multiclass road segmentation

25 pages, 115458 KiB  
Article
RSAM-Seg: A SAM-Based Model with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation
by Jie Zhang, Yunxin Li, Xubing Yang, Rui Jiang and Li Zhang
Remote Sens. 2025, 17(4), 590; https://doi.org/10.3390/rs17040590 - 8 Feb 2025
Viewed by 815
Abstract
High-resolution remote sensing satellites have revolutionized remote sensing research, yet accurately segmenting specific targets from complex satellite imagery remains challenging. While the Segment Anything Model (SAM) has emerged as a promising universal segmentation model, its direct application to remote sensing imagery yields suboptimal results. To address these limitations, we propose RSAM-Seg, a novel deep learning model adapted from SAM and designed specifically for remote sensing applications. Our model incorporates two key components: Adapter-Scale and Adapter-Feature modules. The Adapter-Scale modules, integrated within Vision Transformer (ViT) blocks, enhance model adaptability through learnable transformations, while the Adapter-Feature modules, positioned between ViT blocks, generate image-informed prompts by incorporating task-specific information. Extensive experiments across four binary and two multi-class segmentation scenarios demonstrate the superior performance of RSAM-Seg, which achieves an F1 score of 0.815 in cloud detection, 0.834 in building segmentation, and 0.755 in road extraction, consistently outperforming established architectures such as U-Net, DeepLabV3+, and Segformer. Moreover, RSAM-Seg shows improvements of up to 56.5% in F1 score compared to the original SAM. In addition, RSAM-Seg maintains robust performance in few-shot learning scenarios, achieving an F1 score of 0.656 with only 1% of the training data and increasing to 0.815 with full data availability. Furthermore, RSAM-Seg can detect missing areas within the ground truth of certain datasets, highlighting its potential for completion.
(This article belongs to the Special Issue Advanced AI Technology for Remote Sensing Analysis)
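The abstract describes Adapter-Scale modules inserted inside the ViT blocks and Adapter-Feature modules between them that generate image-informed prompts. The following is only a minimal PyTorch sketch of the general adapter idea (a bottleneck branch added residually inside a transformer block, with an optional additive prompt); all module names, dimensions, and the fusion scheme here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an adapter-augmented ViT block (not the authors' code).
from typing import Optional

import torch
import torch.nn as nn


class AdapterScale(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""

    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.5):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale  # assumed scaling factor for the adapter branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.up(self.act(self.down(x)))


class AdaptedViTBlock(nn.Module):
    """Pre-norm ViT block with an adapter on the MLP branch and an optional additive prompt."""

    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.adapter = AdapterScale(dim)

    def forward(self, x: torch.Tensor, prompt: Optional[torch.Tensor] = None) -> torch.Tensor:
        if prompt is not None:  # image-informed prompt fused additively (an assumption)
            x = x + prompt
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.adapter(self.mlp(self.norm2(x)))
        return x


tokens = torch.randn(2, 196, 768)        # (batch, patch tokens, embedding dim)
print(AdaptedViTBlock()(tokens).shape)   # torch.Size([2, 196, 768])
```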
Figure 1: Comparison of segmentation results on various scenarios using point and box prompt modes, as well as the segment everything mode. Green squares indicate box prompt positions, green points show point prompt coordinates, and red areas represent predicted segmentation results. In Segment Everything results, each color represents a distinct segmented instance.
Figure 2: The architecture and key components of RSAM-Seg. (A) The overall structure of RSAM-Seg. The encoder's ViT blocks are modified to include the internal Adapter-Scale components and interleaved Adapter-Feature layers for enhanced image information extraction (⊕ denotes feature fusion), while the mask decoder remains unchanged and promptless. Each stage's dimensional changes are marked in the figure, where "B" represents batch size. (B) The structure of the modified transformer block and Adapter-Scale, where "Learned Embedding from (i − 1)-th ViT Block" represents features processed by the previous transformer layer, and "Reshaped Prompts" are dimension-adjusted prompts generated by Adapter-Feature for the ViT block. (C) The structure of Adapter-Feature, which extracts HFC features through Fourier transform for prompt generation.
Figure 3: Visual examples from different datasets depicting various scenes, including clouds, buildings, fields, and roads. Each remote sensing image is shown with its corresponding mask; GT denotes ground truth.
Figure 4: Visual examples from the CSWV and GID datasets, illustrating cloud-like scenes in mountainous regions and diverse land-cover scenarios.
Figure 5: Precision vs. recall across various datasets, showing the performance of different segmentation models (U-Net, DeepLabV3+, Segformer, SAM(center +), SAM(center −), SAM(manual), and RSAM-Seg) in different types of scenarios.
Figure 6: Comparison of cloud segmentation results on the 38-Cloud dataset for RSAM-Seg, SAM, U-Net, DeepLabV3+, and Segformer. For SAM(manual), green points indicate manually annotated positive samples, blue points represent negative samples, and red masks show the segmentation results. For SAM(center), + and − denote the segmentation results where the center points are marked as positive and negative classes, respectively.
Figure 7: Comparison of field segmentation results on the Sentinel-2 dataset with RSAM-Seg, SAM, U-Net, DeepLabV3+, and Segformer. Annotation conventions are the same as in Figure 6.
Figure 8: Comparison of building segmentation results on the Inria dataset with RSAM-Seg, SAM, U-Net, DeepLabV3+, and Segformer. Annotation conventions are the same as in Figure 6.
Figure 9: Comparison of road segmentation results on the DG-Road dataset using RSAM-Seg, SAM, U-Net, DeepLabV3+, and Segformer. Annotation conventions are the same as in Figure 6.
Figure 10: Comparison of cloud and snow segmentation results on the CSWV dataset with the original imagery, ground truth, U-Net, SAM (in the point and segment everything modes), and RSAM-Seg.
Figure 11: Comparison of land-cover multi-class segmentation results on the GID dataset with ground truth, U-Net, SAM, and RSAM-Seg.
Figure 12: Visualization results of the ablation study for RSAM-Seg on the 38-Cloud dataset, where the complete RSAM-Seg model shows the best performance compared to its ablated versions.
Figure 13: Comparison of images, ground truth, HFC features, and RSAM-Seg results across four distinct scenarios of clouds, fields, buildings, and roads, demonstrating segmentation through HFC feature extraction.
Figure 14: Examples of few-shot segmentation results on the 38-Cloud dataset.
Figure 15: Examples of completion results on the DG-Road dataset.
14 pages, 4833 KiB  
Article
Automatic Road Extraction from Historical Maps Using Transformer-Based SegFormers
by Elif Sertel, Can Michael Hucko and Mustafa Erdem Kabadayı
ISPRS Int. J. Geo-Inf. 2024, 13(12), 464; https://doi.org/10.3390/ijgi13120464 - 21 Dec 2024
Viewed by 1403
Abstract
Historical maps are valuable sources of geospatial data for various geography-related applications, providing insightful information about historical land use, transportation infrastructure, and settlements. While transformer-based segmentation methods have been widely applied to image segmentation tasks, they have mostly focused on satellite images. There is a growing need to explore transformer-based approaches for geospatial object extraction from historical maps, given their superior performance over traditional convolutional neural network (CNN)-based architectures. In this research, we aim to automatically extract five different road types from historical maps, using a road dataset digitized from the scanned Deutsche Heereskarte 1:200,000 Türkei (DHK 200 Turkey) maps. We applied variants of the transformer-based SegFormer model and evaluated the effects of different encoders, batch sizes, loss functions, optimizers, and augmentation techniques on road extraction performance. Our best results, with an intersection over union (IoU) of 0.5411 and an F1 score of 0.7017, were achieved using the SegFormer-B2 model, the Adam optimizer, and the focal loss function. All SegFormer-based experiments outperformed previously reported CNN-based segmentation models on the same dataset. In general, increasing the batch size and using larger SegFormer variants (from B0 to B2) improved the accuracy metrics. Additionally, the choice of augmentation techniques significantly influenced the outcomes. Our results demonstrate that SegFormer models substantially enhance true positive predictions and result in higher precision values. These findings suggest that the output weights could be directly applied to transfer learning for similar historical maps and to the inference of additional DHK maps, while offering a promising architecture for future road extraction studies.
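The best configuration reported above pairs SegFormer-B2 with the Adam optimizer and focal loss. As a small, hedged illustration of the loss side of that recipe, the sketch below implements a standard multi-class focal loss in PyTorch; the class count, gamma value, and ignore index are assumptions, and the authors' exact formulation may differ.

```python
# Minimal multi-class focal loss sketch (gamma and class count are assumptions).
import torch
import torch.nn.functional as F


def focal_loss(logits: torch.Tensor, target: torch.Tensor,
               gamma: float = 2.0, ignore_index: int = 255) -> torch.Tensor:
    """Down-weights easy pixels so rare road classes dominate the gradient.

    logits: (B, C, H, W) raw class scores; target: (B, H, W) integer labels.
    """
    ce = F.cross_entropy(logits, target, reduction="none", ignore_index=ignore_index)
    pt = torch.exp(-ce)                       # model's probability for the true class
    return ((1.0 - pt) ** gamma * ce).mean()  # ignored pixels contribute zero loss


# Toy usage: six labels (background plus five road types, assumed from the abstract).
logits = torch.randn(2, 6, 64, 64, requires_grad=True)
labels = torch.randint(0, 6, (2, 64, 64))
loss = focal_loss(logits, labels)
loss.backward()
print(float(loss))
```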
Figure 1: (a) An example of the DHK 200 Map Sheet D-V for İzmit; (b) part of the legend focusing on road types; (c) selected road type explanations with our English translations (Source: https://digitalarchive.mcmaster.ca/, accessed on 17 December 2024).
Figure 2: Predictions on the test set of the DHK 200 Turkey map. (a) Input image. (b) Ground truth. (c) Experiment 1 (B0-Adam-Dice-Old). (d) Experiment 6 (B0-AdamW-Focal-New). (e) Experiment 8 (B0-Adam-Focal-New). (f) Experiment 10 (B1-AdamW-Focal-New). (g) Experiment 11 (B1-Adam-Focal-New). (h) Experiment 13 (B1-AdamW-Focal-New). (i) Experiment 14 (B1-Adam-Focal-New). (j) Experiment 15 (U-Net++-TIMM-Adam-Dice-Old) [11].
Figure 3: Class-wise F1 scores of the best five experiments.
Figure 4: Class-wise IoUs of the best five experiments.
16 pages, 11666 KiB  
Article
A Lightweight Building Extraction Approach for Contour Recovery in Complex Urban Environments
by Jiaxin He, Yong Cheng, Wei Wang, Zhoupeng Ren, Ce Zhang and Wenjie Zhang
Remote Sens. 2024, 16(5), 740; https://doi.org/10.3390/rs16050740 - 20 Feb 2024
Cited by 3 | Viewed by 1614
Abstract
High-spatial-resolution maps of urban buildings play a crucial role in urban planning, emergency response, and disaster management. However, challenges such as missing building contours due to occlusion (occlusion between buildings of different heights and buildings obscured by trees), uneven contour extraction due to the mixing of building edges with other feature elements (roads, vehicles, and trees), and slow training on high-resolution image data hinder efficient and accurate building extraction. To address these issues, we propose a semantic segmentation model composed of a lightweight backbone, a coordinate attention module, and a pooling fusion module, which achieves lightweight building extraction and adaptive recovery of spatial contours. Comparative experiments were conducted on a dataset featuring typical urban building instances in China and on the Mapchallenge dataset, comparing our method with several classical and mainstream semantic segmentation algorithms. The results demonstrate the effectiveness of our approach, which achieves excellent mean intersection over union (mIoU) and frames per second (FPS) scores on both datasets (China dataset: 85.11% and 110.67 FPS; Mapchallenge dataset: 90.27% and 117.68 FPS). Quantitative evaluations indicate that our model not only significantly improves computational speed but also ensures high accuracy in the extraction of urban buildings from high-resolution imagery. Specifically, on a typical urban building dataset from China, our model shows an accuracy improvement of 0.64% and a speed increase of 70.03 FPS compared to the baseline model. On the Mapchallenge dataset, our model achieves an accuracy improvement of 0.54% and a speed increase of 42.39 FPS compared to the baseline model. Our research indicates that lightweight networks show significant potential in urban building extraction tasks. In the future, segmentation accuracy and prediction speed can be further balanced by adjusting the deep learning model or introducing remote sensing indices, which can be applied to research scenarios such as greenfield extraction or multi-class target extraction.
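The model described above combines a lightweight backbone with a coordinate attention module and a pooling fusion module. The sketch below shows one common way a coordinate-attention-style block can be written in PyTorch (pooling along height and width separately, then re-weighting the feature map); the channel counts and reduction ratio are illustrative assumptions, not the authors' exact module.

```python
# Illustrative coordinate-attention-style block (channel sizes are assumptions).
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Pools along height and width separately, then re-weights the feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        xh = self.pool_h(x)                            # direction-aware descriptors
        xw = self.pool_w(x).permute(0, 1, 3, 2)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (B, C, H, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * ah * aw                                       # re-weighted features


feat = torch.randn(2, 64, 128, 128)
print(CoordinateAttention(64)(feat).shape)  # torch.Size([2, 64, 128, 128])
```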
Figure 1: Improved network structure.
Figure 2: Coordinate attention module.
Figure 3: SP-ASPP module.
Figure 4: Examples of typical urban buildings in China.
Figure 5: Mapchallenge building dataset.
Figure 6: Model visualization results on a dataset of buildings in typical Chinese cities.
Figure 7: Comparative experimental results for the Mapchallenge building dataset.
Figure 8: Comparative experimental results of the Typical Urban Buildings in China dataset.
12 pages, 10744 KiB  
Communication
Application of C-LSTM Networks to Automatic Labeling of Vehicle Dynamic Response Data for Bridges
by Ryota Shin, Yukihiko Okada and Kyosuke Yamamoto
Sensors 2022, 22(9), 3486; https://doi.org/10.3390/s22093486 - 3 May 2022
Cited by 6 | Viewed by 2589
Abstract
Maintaining the bridges that support road infrastructure is critical to the economy and human life. Structural health monitoring of bridges using vibration includes direct monitoring and drive-by monitoring. Drive-by monitoring uses a vehicle equipped with accelerometers to drive over bridges and estimates the bridge's health from the vehicle vibration obtained. In this study, we attempt to identify the on-bridge driving segments in vehicle vibration data for the practical application of drive-by monitoring. We developed an in-vehicle sensor system that can measure three-dimensional behavior, and we propose a new problem of identifying the driving segment of vehicle vibration on a bridge from data measured in a field experiment. The "on a bridge" label was assigned based on the peaks in the vehicle vibration when running over joints. A supervised binary classification model using C-LSTM (Convolutional Long Short-Term Memory) networks was constructed and applied to the measured data, achieving high accuracy. The remaining challenge is to build a model that can be applied to bridges without joints; therefore, future work is needed to derive an on-bridge label from bridge vibration and to extend the model to a multi-class model.
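To make the C-LSTM idea concrete, here is a minimal PyTorch sketch of a convolution-plus-LSTM binary classifier over fixed-length acceleration windows; the window length, channel count (three axes), and layer sizes are assumptions, not the configuration used in the paper.

```python
# Minimal C-LSTM sketch for "on a bridge" window classification (sizes are assumptions).
import torch
import torch.nn as nn


class CLSTMClassifier(nn.Module):
    """1-D convolutions extract local vibration patterns; an LSTM models their order."""

    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, samples), e.g. windows of three-axis acceleration
        f = self.conv(x).transpose(1, 2)       # (batch, time, features)
        _, (h_n, _) = self.lstm(f)             # last hidden state summarizes the window
        return self.head(h_n[-1]).squeeze(-1)  # raw logit; apply sigmoid for a probability


windows = torch.randn(8, 3, 512)               # 8 windows of 512 samples (assumed shape)
print(CLSTMClassifier()(windows).shape)        # torch.Size([8])
```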
Figure 1: Vehicle and accelerometer installation points.
Figure 2: Vehicle measurement sensor system used for measurement.
Figure 3: The GPS receiver installed on the bridge.
Figure 4: The measured vehicle vibration and position data.
Figure 5: Photo of Bridge A: (a) rubber joints for bridges; (b) general view.
Figure 6: Data measured on the left side of the front wheels (unsprung weight) when running over Bridge A.
Figure 7: Data measured when driving on Bridge A, with bridge labels.
18 pages, 3934 KiB  
Article
Multi-Class Strategies for Joint Building Footprint and Road Detection in Remote Sensing
by Christian Ayala, Carlos Aranda and Mikel Galar
Appl. Sci. 2021, 11(18), 8340; https://doi.org/10.3390/app11188340 - 8 Sep 2021
Cited by 4 | Viewed by 2960
Abstract
Building footprints and road networks are important inputs for a great number of services. For instance, building maps are useful for urban planning, whereas road maps are essential for disaster response services. Traditionally, building and road maps are manually generated by remote sensing experts or land surveying, occasionally assisted by semi-automatic tools. In the last decade, deep learning-based approaches have demonstrated their capabilities to extract these elements automatically and accurately from remote sensing imagery. The building footprint and road network detection problem can be considered a multi-class semantic segmentation task, that is, a single model performs a pixel-wise classification on multiple classes, optimizing the overall performance. However, depending on the spatial resolution of the imagery used, both classes may coexist within the same pixel, drastically reducing their separability. In this regard, binary decomposition techniques, which have been widely studied in the machine learning literature, have proved useful for addressing multi-class problems. Accordingly, the multi-class problem can be split into multiple binary semantic segmentation sub-problems, specializing different models for each class. Nevertheless, in these cases, an aggregation step is required to obtain the final output labels. Additionally, other novel approaches, such as multi-task learning, may come in handy to further increase the performance of the binary semantic segmentation models. Since there is no certainty as to which strategy should be adopted to accurately tackle a multi-class remote sensing semantic segmentation problem, this paper performs an in-depth study to shed light on the issue. For this purpose, open-access Sentinel-1 and Sentinel-2 imagery (at 10 m) is considered for extracting buildings and roads, making use of the well-known U-Net convolutional neural network. It is worth stressing that building and road classes may coexist within the same pixel when working at such a low spatial resolution, posing a challenging problem. Accordingly, a robust experimental study is developed to assess the benefits of the decomposition strategies and their combination with a multi-task learning scheme. The obtained results demonstrate that decomposing the considered multi-class remote sensing semantic segmentation problem into multiple binary ones using a One-vs.-All binary decomposition technique leads to better results than the standard direct multi-class approach. Additionally, the benefits of using a multi-task learning scheme to push the performance of binary segmentation models are also shown.
(This article belongs to the Special Issue Computer Vision in the Era of Deep Learning)
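A key point of the abstract is that binary decomposition needs an aggregation step to recover a single label map. The sketch below shows one simple aggregation rule for a One-vs.-All setup with separate building and road models (take the strongest positive class per pixel, fall back to background below a threshold); the threshold and class encoding are assumptions, not necessarily the aggregation used in the paper.

```python
# Hypothetical aggregation of One-vs.-All binary segmentation outputs.
import torch


def aggregate_ova(building_prob: torch.Tensor, road_prob: torch.Tensor,
                  threshold: float = 0.5) -> torch.Tensor:
    """Combine per-class probabilities from independent binary models into one label map.

    Inputs are (H, W) probabilities; output labels: 0 = background, 1 = building, 2 = road.
    """
    stacked = torch.stack([building_prob, road_prob])    # (2, H, W)
    best_prob, best_cls = stacked.max(dim=0)             # strongest positive class per pixel
    return torch.where(best_prob >= threshold, best_cls + 1, torch.zeros_like(best_cls))


# Toy usage with random stand-ins for the two binary model outputs.
building = torch.rand(4, 4)
road = torch.rand(4, 4)
print(aggregate_ova(building, road))
```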
Figure 1: U-Net architecture [40] adapted to a multi-class semantic segmentation problem, including building footprints and road networks. Note that the final 1×1 convolutional layer performs a classification into 3 classes since the background is also considered.
Figure 2: OVO decomposition scheme adapted to a multi-class semantic segmentation problem, including building footprints and road networks.
Figure 3: OVA decomposition scheme adapted to a multi-class semantic segmentation problem, including building footprints and road networks.
Figure 4: Multi-task architectures depending on how the parameters of the hidden layers are shared.
Figure 5: Dataset generation pipeline for a generic area of interest.
Figure 6: Geographical distribution of the dataset (green: training set; red: testing set).
Figure 7: Qualitative results for the benefits of using decomposition strategies. Visual comparison of the results obtained with the direct multi-class approach vs. OVA and OVO binary decomposition strategies for a zone taken from Pamplona city in the test set. TP are presented in green, FP in blue, FN in red and TN in white.
Figure 8: Qualitative results for the benefits of using decomposition strategies. Visual comparison of the results obtained with the direct multi-class approach vs. OVA and OVO binary decomposition strategies for a zone taken from Barcelona North city in the test set. TP are presented in green, FP in blue, FN in red and TN in white.
Figure 9: Qualitative results for the benefits of using a multi-task learning scheme. Visual comparison of the results obtained with the OVA binary decomposition strategy and its multi-task variant, for a zone taken from Pamplona city in the test set. TP are presented in green, FP in blue, FN in red and TN in white.
Figure 10: Qualitative results for the benefits of using a multi-task learning scheme. Visual comparison of the results obtained with the OVA binary decomposition strategy and its multi-task variant, for a zone taken from Barcelona North city in the test set. TP are presented in green, FP in blue, FN in red and TN in white.
15 pages, 5623 KiB  
Article
Automatic Road Extraction from Historical Maps Using Deep Learning Techniques: A Regional Case Study of Turkey in a German World War II Map
by Burak Ekim, Elif Sertel and M. Erdem Kabadayı
ISPRS Int. J. Geo-Inf. 2021, 10(8), 492; https://doi.org/10.3390/ijgi10080492 - 21 Jul 2021
Cited by 30 | Viewed by 5891
Abstract
Scanned historical maps are available from different sources in various scales and contents. Automatic geographical feature extraction from these historical maps is an essential task for deriving valuable spatial information on the characteristics and distribution of transportation infrastructures and settlements and for conducting quantitative and geometrical analysis. In this research, we used the Deutsche Heereskarte 1:200,000 Türkei (DHK 200 Turkey) maps as the base geoinformation source to reconstruct the past transportation networks using a deep learning approach. Five different road types were digitized and labeled to be used as inputs for the proposed deep learning-based segmentation approach. We adapted the U-Net++ and ResNeXt50_32×4d architectures to produce multi-class segmentation masks and perform feature extraction to determine the various road types accurately. We achieved remarkable results, with 98.73% overall accuracy, a 41.99% intersection over union, and a 46.61% F1 score. The proposed method can be applied to DHK maps of other countries to automatically extract different road types and can be used for transfer learning on other historical maps.
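The reported 98.73% overall accuracy alongside 41.99% IoU and 46.61% F1 illustrates how strongly class imbalance separates these metrics in multi-class road extraction. As a small aid, the sketch below computes per-class IoU and F1 from a confusion matrix with NumPy; the six-label setup (background plus five road types) is an assumption drawn from the abstract.

```python
# Per-class IoU and F1 from a confusion matrix (six labels assumed from the abstract).
import numpy as np


def per_class_metrics(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Return per-class IoU and F1 for flattened prediction / ground-truth label maps."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)  # rows = truth, columns = prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return iou, f1


# Toy usage with random label maps; real inputs would be predicted and digitized road masks.
rng = np.random.default_rng(0)
gt = rng.integers(0, 6, size=(256, 256))
pred = rng.integers(0, 6, size=(256, 256))
iou, f1 = per_class_metrics(pred, gt, num_classes=6)
print(np.round(iou, 3), np.round(f1, 3))
```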
Figure 1: DHK 200 Turkey, 3rd special edition coverage of versions 6612, 6606, and 6613 from 1943.
Figure 2: The 27 DHK 200 Turkey images or sheets used in the study.
Figure 3: An example legend illustrating the road types in a DHK 200 Turkey map.
Figure 4: Dataset samples of historical maps and corresponding ground truth masks. (a,c,e,g) are examples of original historical map patches; (b,d,f,h) are respective ground truth masks.
Figure 5: Color representation of road types used in the classification scheme.
Figure 6: Road type distribution of the ground truth mask.
Figure 7: Training, validation, and test set road type distributions of the ground truth mask.
Figure 8: Overall road type classification workflow. Given an input image, the U-Net++ architecture creates predicted road types.
Figure 9: Qualitative assessment results. From left to right: input patch, ground truth mask, and road type prediction produced by the DNN model.
Figure 10: Class-wise F1, precision, recall, and IoU score comparison.
Figure 11: Normalized confusion matrix.
17 pages, 5020 KiB  
Article
An Innovative Intelligent System with Integrated CNN and SVM: Considering Various Crops through Hyperspectral Image Data
by Shiuan Wan, Mei-Ling Yeh and Hong-Lin Ma
ISPRS Int. J. Geo-Inf. 2021, 10(4), 242; https://doi.org/10.3390/ijgi10040242 - 7 Apr 2021
Cited by 16 | Viewed by 3186
Abstract
Generation of a thematic map is important for scientists and agriculture engineers analyzing different crops in a given field. Remote sensing data are well accepted for image classification over vast areas of crop investigation. However, most current research has focused on the classification of pixel-based image data. This study was carried out to develop a multi-category crop hyperspectral image classification system to identify the major crops in the Chiayi Golden Corridor. Hyperspectral image data from CASI (Compact Airborne Spectrographic Imager) were used as the experimental data. A two-stage classification was designed to evaluate the performance of the image classification. More specifically, the study used a multi-class classification combining a support vector machine (SVM) and a convolutional neural network (CNN) for image classification analysis. SVM is a supervised learning model that analyzes data used for classification. CNN is a class of deep neural networks applied to analyzing visual imagery. The image classification comparison was made among four crops (paddy rice, potatoes, cabbages, and peanuts), roads, and structures. In the first stage, the support vector machine handled the hyperspectral image classification through pixel-based analysis. Then, the convolutional neural network improved the classification of image details through various blocks (cells) of segmentation in the second stage. A series of discussions and analyses of the results is presented. A repair module was also designed to link the CNN and SVM stages and remove classification errors.
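To illustrate the first, pixel-based stage of the two-stage scheme, the sketch below trains a multi-class RBF SVM on synthetic stand-in spectra with scikit-learn; the band count, class labels, and hyperparameters are assumptions, and the real pipeline would feed CASI pixel spectra and pass the result to the cell-based CNN stage.

```python
# Toy first-stage pixel classifier with an RBF SVM (synthetic data, assumed hyperparameters).
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for hyperspectral pixels: rows = pixels, columns = spectral bands (72 is assumed).
# The six labels mirror the classes named in the abstract (four crops, roads, structures).
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 72))
y = rng.integers(0, 6, size=3000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
svm = SVC(kernel="rbf", C=10.0, gamma="scale")  # pixel-wise multi-class SVM (first stage)
svm.fit(X_train, y_train)
print(classification_report(y_test, svm.predict(X_test)))
# A second stage would regroup neighbouring pixels into cells and let a small CNN
# re-label each cell, repairing isolated pixel-level errors.
```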
Figure 1: Study area. (a) Hyperspectral image (April 2016). (b) Ground truth data.
Figure 2: The convolutional neural network (CNN) model for prediction.
Figure 3: Research steps of the study.
Figure 4: The progress of the Cell program derived from the regional object classification (ROC) program. The line represents the margin of the linear regression function fitted to the coordinates of the blue pixels: (a) the ROC program finds seeds of area and similarity; (b) removal of the tiny parts to be integrated.
Figure 5: Thematic map of the first stage (SVM).
Figure 6: Resized window of the CNN selection by the Cell program and the prepared inputs for the CNN.
Figure 7: Accuracy of feature selection with PCA 8, epoch 30.
Figure 8: Accuracy of feature selection with PCA 24, epoch 30.
37 pages, 41216 KiB  
Article
Pixel-Wise Classification of High-Resolution Ground-Based Urban Hyperspectral Images with Convolutional Neural Networks
by Farid Qamar and Gregory Dobler
Remote Sens. 2020, 12(16), 2540; https://doi.org/10.3390/rs12162540 - 7 Aug 2020
Cited by 14 | Viewed by 5180
Abstract
Using ground-based, remote hyperspectral images from 0.4–1.0 micron in ∼850 spectral channels—acquired with the Urban Observatory facility in New York City—we evaluate the use of one-dimensional Convolutional Neural Networks (CNNs) for pixel-level classification and segmentation of built and natural materials in urban environments. We find that a multi-class model trained on hand-labeled pixels containing Sky, Clouds, Vegetation, Water, Building facades, Windows, Roads, Cars, and Metal structures yields an accuracy of 90–97% for three different scenes. We assess the transferability of this model by training on one scene and testing on another with significantly different illumination conditions and/or different content. This results in a significant (∼45%) decrease in the model precision and recall, as does training on all scenes at once and testing on the individual scenes. These results suggest that while CNNs are powerful tools for pixel-level classification of very high-resolution spectral data of urban environments, retraining between scenes may be necessary. Furthermore, we test the dependence of the model on several instrument- and data-specific parameters, including reduced spectral resolution (down to 15 spectral channels) and the number of available training instances. The results are strongly class-dependent; however, we find that the classification of natural materials is particularly robust, especially the Vegetation class, with a precision and recall >94% for all scenes and model transfers and >90% with only a single training instance.
(This article belongs to the Special Issue Feature Extraction and Data Classification in Hyperspectral Imaging)
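The abstract above and the figure captions below describe one-dimensional CNNs over ~848-channel pixel spectra, nine material classes, and a best-performing kernel size of 50. The sketch below is a minimal PyTorch rendition of that kind of per-pixel spectral classifier; the layer widths and pooling choice are assumptions rather than the authors' Model 1 or Model 2.

```python
# Minimal per-pixel spectral classifier sketch (layer widths are assumptions).
import torch
import torch.nn as nn


class SpectralCNN(nn.Module):
    """1-D CNN that classifies a single pixel from its 848-channel spectrum."""

    def __init__(self, n_classes: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=50, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=50, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, spectra: torch.Tensor) -> torch.Tensor:
        # spectra: (batch, n_channels) standardized pixel spectra
        x = self.features(spectra.unsqueeze(1))  # add a single input channel dimension
        return self.classifier(x.flatten(1))     # (batch, n_classes) class logits


batch = torch.randn(16, 848)                     # 16 pixel spectra with 848 channels
print(SpectralCNN()(batch).shape)                # torch.Size([16, 9])
```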
Figure 1: (top left) Map of New York City showing the location of the Urban Observatory (green dot), coverage of the south-facing scenes (red triangle) and north-facing scene (blue triangle). (bottom left) A graphic representation of our hyperspectral datacube, where each spatial pixel has 848 wavelength channels that can be used to capture a full spectrum as seen in the two examples provided. (right) RGB (610 nm, 540 nm, and 475 nm) representations of the scenes imaged by the Urban Observatory's hyperspectral imaging systems. Scenes 1-a and 1-b (top and middle) are south-facing pointings covering Downtown and North Brooklyn at 2 pm and 6 pm, respectively. Scene 2 (bottom) is a north-facing pointing acquired with a different instrument covering northern Brooklyn and Manhattan. The scenes include a broad range of material compositions including Vegetation, Water, Concrete, Glass, Sky, Clouds, etc.
Figure 2: A false-color image of Scene 1-a from Figure 1 for which the color mappings of the RGB channels are chosen to differentiate several natural and human-built materials in the scene. In particular, the blue channel is mapped to ∼740 nm, a wavelength for which vegetation has a strong peak, and the vegetation pixels appear predominantly blue in this false-color representation. Built structures with peaks at other wavelengths that are mapped to the red and green channels are differentiated by the red and yellow tints of buildings in the scene.
Figure 3: (left) Locations of manually classified pixels superimposed over a grayscale image of the scenes in Figure 1. (center) The color code and number of pixels for each class. Manually labeled pixel locations were chosen to be broadly representative of the spatial distribution of each class in the scene. (right) Mean of the standardized pixel spectra for each class of manually classified pixels in each scene (with a constant offset added to each spectrum for visual clarity).
Figure 4: Architecture of CNN Model 1 (top) and Model 2 (bottom) used for the classification and segmentation of the hyperspectral data shown in Figure 1 and trained on the hand-labeled examples shown in Figure 3. The two models differ only in the size and number of convolutional filters.
Figure 5: Accuracy of training and testing instances (left), and weighted average precision, recall, and F1 measure (right) of testing instances from the CNN model trained and tested on Scene 1-a with filters of different kernel sizes in the two convolutional layers. Maximum performance is obtained from kernel size 50, indicating that, in general, it is the spectral features of size ∼35 nm that optimally discriminate between classes in urban scenes.
Figure 6: Image segmentation results for Model 1 and Model 2 (see Figure 4) for each of the three scenes in Figure 1. For each model and each scene, the training/testing set consisted of the manually labeled pixels in Figure 3, with Model 2 generating a characteristically larger mean F1 score for a given scene. In addition, qualitative inspection of unlabeled pixels shows that Model 2 outperforms Model 1 on the segmentation task for each scene. These results are quantified in Table 1.
Figure 7: Image segmentation results from CNN Model 2 for each of the three scenes in Figure 1, both with (left) and without (right) spatial information included and used to classify pixels in each scene. Model 2 with spatial information included marginally outperformed the variant without when segmenting each scene. The per-class results are quantified in Table 2.
Figure 8: Image segmentation results from Model 2 trained and tested on Scene 1-a (top left), Scene 1-b (middle left), and Scene 2 (bottom left), compared to training Model 2 on all scenes at once and testing on each individually (right). The model architectures and hyperparameters are shown in Figure 4. Qualitative inspection shows the models trained on each image separately marginally outperformed the single model trained on all images at once. These results are quantified in Table 3 on a by-class basis.
Figure 9: Image segmentation results from Models 1 (left) and 2 (right) trained on Scene 1-a (top) and applied to Scene 1-b (middle) and Scene 2 (bottom). The model architectures and hyperparameters are shown in Figure 4 (although spatial information is not used when transferring models between scenes). Qualitative inspection shows that Models 1 and 2 performed comparably to one another when transferred, though transferring either model results in a significant reduction in performance relative to training on each image separately. However, Vegetation is the only class of pixels that was correctly and accurately classified by both models in all transfer testing scenarios. These results are quantified in Table 4.
Figure 10: Accuracy and the weighted average of the F1 measure, precision, and recall (left to right, respectively) of CNN Model 2 trained on labeled data for each scene with varying spectral resolutions. The performance shows a decreasing trend with decreasing spectral resolution in all three scenes, with the model trained and tested on Scene 1-a outperforming Scenes 1-b and 2 at all spectral resolutions.
Figure 11: Image segmentation results from Model 2 trained and tested on Scene 1-a (top), Scene 1-b (center), and Scene 2 (bottom), with the HSIs at the full spectral resolution of 848 channels (left) as opposed to those at the minimum spectral resolution of 15 channels (right). Qualitative inspection shows that the model with full spectral resolution outperformed that with reduced spectral resolution in all three scenes. The difference in performance is more pronounced in Scenes 1-b and 2 than in Scene 1-a. Identification of Vegetation pixels stands out as the only class in all images to be unaffected by the decrease in spectral resolution.
Figure 12: Per-class precision (blue), recall (yellow), and F1 measure (red) of CNN Model 2 trained and tested on the hand-labeled spectra from Scene 1-a with different pixel spectral resolutions. In general, reducing the spectral resolution results in decreased performance metrics for human-built classes, while all metrics for the classification of natural material classes are relatively unaffected by the change in spectral resolution, highlighting the uniqueness of their spectral features even at low resolution.
Figure 13: Overall training and testing accuracy (left) and weighted mean precision, recall, and F1 score (right) of CNN Model 2 trained on a variable number of hand-labeled examples from Scene 1-a and tested on the testing set of the same scene. As the training set decreases in size, the overall accuracy, mean precision, and mean recall decrease by as much as ∼40%.
Figure 14: The same as Figure 13 but for per-class performance. Clouds and Vegetation pixels maintain a high precision and recall regardless of the number of training instances used, while the remaining classes show reduced precision, recall, or both as the number of training pixels is reduced.
Figure 15: The spectra of each class of pixels in Scene 1-a as classified by Model 2, together with the mean spectra of the training and testing instances used in creating the model. Both Sky and Clouds pixels contain significant saturation for the sample from 450–650 nm, while Vegetation pixels show peaks at ∼550 and ∼725 nm, consistent with enhanced chlorophyll reflectivity at those wavelengths. The remaining classes show very subtle differences in spectral shape.
Figure A1: Confusion matrices used to derive the metrics in Table 1 for the classification of hand-labeled pixel spectra using Model 1 (left) and Model 2 (right) trained and tested on each scene separately.
Figure A2: Confusion matrices used to derive the metrics in Table 2 for the classification of hand-labeled pixel spectra using Model 2 with (left) and without (right) the spatial features included for each scene separately.
Figure A3: Confusion matrices used to derive the metrics in Table 3 for the classification of hand-labeled pixel spectra using Model 2 trained and tested on each scene separately (left) as opposed to a single model trained on all scenes simultaneously (right).
Figure A4: Confusion matrices used to derive the metrics in Table 4 for the classification of hand-labeled pixel spectra using Model 1 (left) and Model 2 (right) trained on Scene 1-a and transferred to Scenes 1-b and 2.
Figure A5: (left) Segmentation maps of Scene 1-a (top), Scene 1-b (middle), and Scene 2 (bottom) using GBDT. (center) Feature importance of the GBDT model on each scene. (right) Confusion matrices from the testing sets of each scene.
Figure A6: (left) Segmentation maps from applying PCA to Scene 1-a (top), Scene 1-b (middle), and Scene 2 (bottom), then using SVM to classify the pixels. (right) Confusion matrices from the testing sets of each scene.