Search Results (368)

Search Parameters:
Keywords = semantic spatial structure

19 pages, 1397 KiB  
Article
ASOD: Attention-Based Salient Object Detector for Strip Steel Surface Defects
by Hongzhou Yue, Xirui Li, Yange Sun, Li Zhang, Yan Feng and Huaping Guo
Electronics 2025, 14(5), 831; https://doi.org/10.3390/electronics14050831 - 20 Feb 2025
Abstract
The accurate and efficient detection of steel surface defects remains challenging due to complex backgrounds, diverse defect types, and varying defect scales. The existing CNN-based methods often struggle with capturing long-range dependencies and handling complex background noise, resulting in suboptimal performance. Meanwhile, although Transformer-based approaches are effective in modeling global context, they typically require large-scale datasets and are computationally expensive, limiting their practicality for industrial applications. To address these challenges, we introduce a novel attention-based salient object detector, called the ASOD, to enhance the effectiveness of detectors for strip steel surface defects. In particular, we first design a novel channel-attention-based block including global max/average pooling to focus on the relevant channel-wise features while suppressing irrelevant channel responses, where max pooling extracts the dominant features of local regions while discarding irrelevant ones, and average pooling captures the overall features while smoothing out local details. Then, a new block based on spatial attention is designed to emphasize the areas with strip steel surface defects while suppressing irrelevant background areas. In addition, a new cross-spatial-attention-based block is designed to fuse the feature maps at multiple scales filtered through the proposed channel and spatial attention to produce features with better semantic and spatial information, such that the detector adapts to strip steel defects of multiple sizes. The experiments show that the ASOD achieves superior performance across multiple evaluation metrics, with a weighted F-measure of 0.9559, a structure measure of 0.9230, a Pratt's figure of merit of 0.0113, and a mean absolute error of 0.0144. In addition, the ASOD demonstrates strong robustness to noise interference, maintaining consistently high performance even with 10–20% dataset noise, which confirms its stability and reliability.
(This article belongs to the Section Artificial Intelligence)
Show Figures

Figure 1. The architecture of the ASOD includes a multiscale feature extraction module (MFEM), bottleneck module (BM), and a feature fusion module (FFM). The BM includes spatial-attention-based blocks (SABs) followed by channel-attention-based blocks (CABs). The FFM includes residual decoder blocks (RDBs) followed by cross-spatial-attention-based blocks (CSABs).
Figure 2. The structure of the bottleneck module includes SAB and CAB.
Figure 3. The structure of RDB.
Figure 4. The structure of CSAB includes two SABs.
Figure 5. Three types of defects.
Figure 6. Convergence curve for SD-saliency-900.
Figure 7. Visual results of salient maps: (a) input images, (b) real labels, (c) RCRR, (d) 2LSG, (e) BC, (f) SMD, (g) MIL, (h) PFANet, (i) NLDF, (j) DSS, (k) R3Net, (l) BMPM, (m) PoolNet, (n) PiCANet, (o) CPD, (p) BASNet, (q) EDRNet, and (r) our model.
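The channel-attention block described in this abstract (global max and average pooling feeding a shared weighting of channel responses) follows the same general pattern as common channel-attention designs. The sketch below is a generic PyTorch illustration of that pattern, not the authors' exact CAB; the reduction ratio and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic channel-attention block: global max/avg pooling -> shared MLP -> sigmoid gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # overall (global) statistics per channel
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # dominant local responses per channel
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Combine both pooled descriptors, then reweight the input channels.
        w = self.gate(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        return x * w

if __name__ == "__main__":
    feats = torch.randn(2, 64, 56, 56)           # a dummy feature map
    print(ChannelAttention(64)(feats).shape)     # torch.Size([2, 64, 56, 56])
```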
22 pages, 3475 KiB  
Article
Uncertainty-Aware Adaptive Multiscale U-Net for Low-Contrast Cardiac Image Segmentation
by A. S. M. Sharifuzzaman Sagar, Muhammad Zubair Islam, Jawad Tanveer and Hyung Seok Kim
Appl. Sci. 2025, 15(4), 2222; https://doi.org/10.3390/app15042222 - 19 Feb 2025
Abstract
Medical image analysis is critical for diagnosing and planning treatments, particularly in addressing heart disease, a leading cause of mortality worldwide. Precise segmentation of the left atrium, a key structure in cardiac imaging, is essential for detecting conditions such as atrial fibrillation, heart failure, and stroke. However, its complex anatomy, subtle boundaries, and inter-patient variations make accurate segmentation challenging for traditional methods. Recent advancements in deep learning, especially semantic segmentation, have shown promise in addressing these limitations by enabling detailed, pixel-wise classification. This study proposes a novel segmentation framework, Adaptive Multiscale U-Net (AMU-Net), combining Convolutional Neural Networks (CNNs) and transformer-based encoder–decoder architectures. The framework introduces a Contextual Dynamic Encoder (CDE) for extracting multi-scale features and capturing long-range dependencies. An Adaptive Feature Decoder Block (AFDB), leveraging an Adaptive Feature Attention Block (AFAB), improves boundary delineation. Additionally, a Spectral Synthesis Fusion Head (SFFH) synthesizes spectral and spatial features, enhancing segmentation performance in low-contrast regions. To ensure robustness, data augmentation techniques such as rotation, scaling, and flipping are applied. Laplacian approximation is employed for uncertainty estimation, enabling interpretability and identifying regions of low confidence. Our proposed model achieves a Dice score of 93.35, a Precision of 94.12, and a Recall of 92.78, outperforming existing methods.
Show Figures

Figure 1. Overall structure of AMU-Net for medical image analysis.
Figure 2. Overall structure of the CDE encoder block, along with the Modulated Predictive Coding Module (MPCM), used in our model.
Figure 3. The overall structure of the proposed DMSA module used in the encoder block.
Figure 4. Overall structure of AFDB used in our proposed AMU-Net.
Figure 5. Illustration of the Adaptive Fusion Attention Block.
Figure 6. An illustration of the overall framework of the SFFH.
Figure 7. Acquired loss and Dice score during the training process of AMU-Net.
Figure 8. The visualization results of AMU-Net to evaluate the performance of the model.
Figure 9. The visualization results of FPs and FNs on challenging images.
Figure 10. The visualization results of different models along with FP and FN.
Figure 11. Uncertainty estimation of the predicted results using Laplacian approximation.
Figure 12. Calibration error of the different data shift intensities for baseline and Bayesian models. Diamonds represent outliers.
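The Dice score, precision, and recall quoted in this abstract are standard overlap metrics for binary segmentation masks. Below is a minimal NumPy sketch of how they can be computed from a binarized prediction and ground truth; the smoothing constant eps is an assumption to avoid division by zero and is not part of the paper.

```python
import numpy as np

def dice_precision_recall(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Compute Dice, precision, and recall for binary masks (values in {0, 1})."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dice, precision, recall

if __name__ == "__main__":
    pred = np.zeros((4, 4), dtype=int); pred[1:3, 1:3] = 1   # toy prediction
    gt = np.zeros((4, 4), dtype=int); gt[1:3, 1:4] = 1       # toy ground truth
    print(dice_precision_recall(pred, gt))  # Dice = 0.8, precision = 1.0, recall ~ 0.667
```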
32 pages, 124914 KiB  
Article
CNN–Transformer Hybrid Architecture for Underwater Sonar Image Segmentation
by Juan Lei, Huigang Wang, Zelin Lei, Jiayuan Li and Shaowei Rong
Remote Sens. 2025, 17(4), 707; https://doi.org/10.3390/rs17040707 - 19 Feb 2025
Abstract
The salient object detection (SOD) of forward-looking sonar images plays a crucial role in underwater detection and rescue tasks. However, the existing SOD algorithms find it difficult to effectively extract salient features and spatial structure information from images with scarce semantic information, uneven intensity distribution, and high noise. Convolutional neural networks (CNNs) have strong local feature extraction capabilities, but they are easily constrained by the receptive field and lack the ability to model long-range dependencies. Transformers, with their powerful self-attention mechanism, are capable of modeling the global features of a target, but they tend to lose a significant amount of local detail. Mamba effectively models long-range dependencies in long sequence inputs through a selection mechanism, offering a novel approach to capturing long-range correlations between pixels. However, since the saliency of image pixels does not exhibit sequential dependencies, this somewhat limits Mamba's ability to fully capture global contextual information during the forward pass. Inspired by multimodal feature fusion learning, we propose a hybrid CNN–Transformer–Mamba architecture, termed FLSSNet. FLSSNet is built upon a CNN and Transformer backbone network, integrating four core submodules to address various technical challenges: (1) The asymmetric dual encoder–decoder (ADED) is capable of simultaneously extracting features from different modalities and systematically modeling both local contextual information and global spatial structure. (2) The Transformer feature converter (TFC) module optimizes the multimodal feature fusion process through feature transformation and channel compression. (3) The long-range correlation attention (LRCA) module enhances CNN's ability to model long-range dependencies through the collaborative use of convolutional kernels, selective sequential scanning, and attention mechanisms, while effectively suppressing noise interference. (4) The recursive contour refinement (RCR) module refines edge contour information through a layer-by-layer recursive mechanism, achieving greater precision in boundary details. The experimental results show that FLSSNet exhibits outstanding competitiveness among 25 state-of-the-art SOD methods, achieving MAE and Eξ values of 0.04 and 0.973, respectively.
(This article belongs to the Special Issue Ocean Remote Sensing Based on Radar, Sonar and Optical Techniques)
Show Figures

Figure 1. Examples of FLS images containing various noise sources. The red region indicates the ground truth of the salient target, the purple region represents areas with intensity inconsistency, the blue region indicates multipath noise, and the yellow region represents shadow noise.
Figure 2. The overall structure of the proposed FLSSNet. The method employs a two-stage strategy: the first stage utilizes an asymmetric dual encoder–decoder structure for saliency feature extraction, while the second stage further refines the feature maps using the recursive refine module. In the first stage, the image is input into two encoders to obtain feature information from different modalities. Simultaneously, the Transformer feature converter module is used to transform and compress information from these modalities. Next, the long-range correlation attention module integrates multi-level features and reduces feature redundancy. Finally, the recursive refine module is employed to further enhance the precision of feature prediction.
Figure 3. The overall structure of the proposed Transformer feature converter (TFC) module. The TFC module mainly consists of the residual channel attention module (RCAM) and the multi-scale dual self-attention mechanism module. MHSA stands for the multi-head self-attention mechanism.
Figure 4. The overall structure of the proposed long-range correlation attention (LRCA) module. The LRCA module primarily consists of a multi-directional convolution module (MDC), an omnidirectional selective scan module (OSSM), and an attention module.
Figure 5. The overall structure of the proposed recursive block (a) and recursive contour extraction (RECM) module (b). N denotes the number of RECMs, and m represents the sequence number of recursive blocks in the hierarchy of RCR.
Figure 6. Different types of samples: (a) bottle; (b) can; (c) tire; (d) chain; (e) hook; (f) standing bottle; (g) drink carton; (h) shampoo bottle; (i) valve; (j) propeller; (k) wall.
Figure 7. Examples of noise in FLS images. The red areas indicate the ground truth. The blue areas represent small targets that are easily lost due to spatial positioning. The yellow areas show shadow noise caused by occlusions. The cyan areas depict scattering noise caused by sound waves encountering suspended particles, bubbles, and other media. The purple areas illustrate pseudo-target noise caused by reflection noise from water waves.
Figure 8. The comparison between FLSSNet and its comparison models on the PR curve (a) and F-measure curve (b).
Figure 9. Visual display of FLSSNet and comparison models. The red box indicates significant differences.
Figure 10. Visualization results of side outputs at different levels of the recursive block.
Figure 11. (a,b) Quantitative comparison of the variant models within the CNN–Transformer hybrid backbone architecture in the PR (precision–recall) and F-measure curves.
Figure 12. Visualization results of variant models in the CNN–Transformer hybrid backbone architecture.
Figure 13. The visualization results of a single module in a pure CNN backbone architecture.
Figure 14. Visualization of ablation experiments using MDC and OSSM in the CNN–Transformer hybrid architecture and the pure CNN architecture.
Figure 15. At different levels of RCR, the visualization results of $X^l_{(n-1)}$ and $X^c_{(n-1)}$ in the second layer of RCEM are presented. Here, $(s1, s2, \ldots, s5)$ represent the hierarchical sequence of RCEM, while L and C, respectively, denote the specific visualization results of $X^l_{(n-1)}$ and $X^c_{(n-1)}$.
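One of the two metrics reported above, MAE, is simply the mean absolute difference between the predicted saliency map and the ground truth on a [0, 1] scale. A minimal sketch follows; the rescaling of 8-bit inputs is an assumption about the input format.

```python
import numpy as np

def saliency_mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a predicted saliency map and its ground truth.

    Both inputs are rescaled to [0, 1] before comparison (8-bit inputs assumed otherwise).
    """
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    if pred.max() > 1.0:
        pred /= 255.0
    if gt.max() > 1.0:
        gt /= 255.0
    return float(np.abs(pred - gt).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.random((128, 128))               # dummy saliency prediction
    gt = (pred > 0.5).astype(np.float64)        # dummy binary ground truth
    print(round(saliency_mae(pred, gt), 3))
```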
17 pages, 4266 KiB  
Article
Hyperspectral Image Change Detection Method Based on the Balanced Metric
by Xintao Liang, Xinling Li, Qingyan Wang, Jiadong Qian and Yujing Wang
Sensors 2025, 25(4), 1158; https://doi.org/10.3390/s25041158 - 13 Feb 2025
Abstract
Change detection, as a popular research direction for dynamic monitoring of land cover change, usually uses hyperspectral remote-sensing images as data sources. Hyperspectral images have rich spatial–spectral information, but traditional change detection methods have limited ability to express the features of hyperspectral images, making it difficult to identify the complex detailed features, semantic features, and spatial–temporal correlation features in bi-temporal hyperspectral images. Effectively using the abundant spatial and spectral information in hyperspectral images to complete change detection is a challenging task. This paper proposes a hyperspectral image change detection method based on the balanced metric, which uses the spatiotemporal attention module to translate bi-temporal hyperspectral images to the same eigenspace, uses the deep Siamese network structure to extract deep semantic features and shallow spatial features, and measures sample features according to the Euclidean distance. In the training phase, the model is optimized by minimizing the loss between distance maps and label maps. In the testing phase, the prediction map is generated by simple thresholding of the distance maps. Experiments show that, on the four datasets, the proposed method achieves a good change detection effect.
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1. Flow block diagram.
Figure 2. Attention module of the change detection model.
Figure 3. Feature extractor of the change detection model.
Figure 4. The datasets used in this paper: (a) Farm dataset; (b) River dataset; (c) Babara dataset; (d) Bayarea dataset.
Figure 5. Comparison of change detection effects on the Farm dataset; the red box highlights the differences in the detection results. (a) Change detection result of the CNN model; (b) change detection result of the Siam-Resnet model; (c) change detection result of the CSA-net model; (d) change detection result of the method in this paper; (e) ground-truth map.
Figure 6. Feature visualization comparison diagram of the three methods. (a) t-SNE feature extracted by the CNN method; (b) t-SNE feature extracted by the Siam-Resnet method; (c) t-SNE feature extracted by the method in this paper.
Figure 7. Measurement feature comparison diagram of the two methods. (a) Measurement feature of the Siam-Resnet method; (b) measurement feature of the method in this paper; (c) measurement feature of the label map.
Figure 8. Change detection results on different datasets. (a) CNN result on River dataset; (b) Siam-Resnet result on River dataset; (c) this paper's method's result on River dataset; (d) ground truth of River dataset; (e) CNN result on Bayarea dataset; (f) Siam-Resnet result on Bayarea dataset; (g) this paper's method's result on Bayarea dataset; (h) ground truth of Bayarea dataset; (i) CNN result on Babara dataset; (j) Siam-Resnet result on Babara dataset; (k) this paper's method's result on Babara dataset; (l) ground truth of Babara dataset.
Figure 9. Change detection performance comparison on three datasets.
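The core measurement step described in this abstract is a per-pixel Euclidean distance between deep features of the two acquisition dates, thresholded at test time to produce the change map. The sketch below reproduces only that step in PyTorch with a small placeholder encoder standing in for the paper's Siamese feature extractor; the layer sizes and threshold value are assumptions.

```python
import torch
import torch.nn as nn

class TinySiameseCD(nn.Module):
    """Toy Siamese change detector: shared encoder, per-pixel Euclidean distance, threshold."""
    def __init__(self, in_bands: int = 30, feat_dim: int = 16):
        super().__init__()
        # Placeholder shared encoder standing in for the paper's deep feature extractor.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_bands, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, 3, padding=1),
        )

    def forward(self, t1: torch.Tensor, t2: torch.Tensor, thr: float = 1.0):
        f1, f2 = self.encoder(t1), self.encoder(t2)          # same weights for both dates
        dist = torch.norm(f1 - f2, dim=1)                    # (B, H, W) Euclidean distance map
        change_map = (dist > thr).float()                    # simple thresholding at test time
        return dist, change_map

if __name__ == "__main__":
    t1, t2 = torch.randn(1, 30, 64, 64), torch.randn(1, 30, 64, 64)
    dist, cmap = TinySiameseCD()(t1, t2)
    print(dist.shape, cmap.unique())
```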
26 pages, 27528 KiB  
Article
A Stereo Visual-Inertial SLAM Algorithm with Point-Line Fusion and Semantic Optimization for Forest Environments
by Bo Liu, Hongwei Liu, Yanqiu Xing, Weishu Gong, Shuhang Yang, Hong Yang, Kai Pan, Yuanxin Li, Yifei Hou and Shiqing Jia
Forests 2025, 16(2), 335; https://doi.org/10.3390/f16020335 - 13 Feb 2025
Abstract
Accurately localizing individual trees and identifying species distribution are critical tasks in forestry remote sensing. Visual Simultaneous Localization and Mapping (visual SLAM) algorithms serve as important tools for outdoor spatial positioning and mapping, mitigating signal loss caused by tree canopy obstructions. To address these challenges, a semantic SLAM algorithm called LPD-SLAM (Line-Point-Distance Semantic SLAM) is proposed, which integrates stereo cameras with an inertial measurement unit (IMU), with contributions including dynamic feature removal, an individual tree data structure, and semantic point distance constraints. LPD-SLAM is capable of performing individual tree localization and tree species discrimination tasks in forest environments. In mapping, LPD-SLAM reduces false species detection and filters dynamic objects by leveraging a deep learning model and a novel individual tree data structure. In optimization, LPD-SLAM incorporates point and line feature reprojection error constraints along with semantic point distance constraints, which improve robustness and accuracy by introducing additional geometric constraints. Due to the lack of publicly available forest datasets, we validate the proposed algorithm on eight experimental plots, which are selected to cover different seasons, various tree species, and different data collection paths, ensuring the dataset's diversity and representativeness. The experimental results indicate that the average root mean square error (RMSE) of the trajectories of LPD-SLAM is reduced by up to 81.2% compared with leading algorithms. Meanwhile, the mean absolute error (MAE) of LPD-SLAM in tree localization is 0.24 m, which verifies its excellent performance in forest environments.
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
Show Figures

Figure 1. System framework.
Figure 2. Example of real-time system operation.
Figure 3. Data collection equipment.
Figure 4. Generation of the semantic segmentation mask.
Figure 5. Semantic feature extraction.
Figure 6. Stereo vision geometry.
Figure 7. Extraction of stereo point and line features.
Figure 8. Establishment of global individual tree database.
Figure 9. Postex multi-functional tree measurement system.
Figure 10. TSI acquisition of ground truth trajectory data.
Figure 11. Experimental data.
Figure 12. Individual tree localization coordinate comparison on 8 experimental plots.
Figure 13. Trajectory comparison of different algorithms on 8 experimental plots.
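The trajectory RMSE and tree-localization MAE reported above are typically computed from per-pose and per-tree position errors against ground truth. A minimal NumPy sketch is given below; it assumes the estimated and reference trajectories are already time-associated and expressed in the same coordinate frame (no alignment step is shown).

```python
import numpy as np

def trajectory_rmse(est_xyz: np.ndarray, gt_xyz: np.ndarray) -> float:
    """RMSE of translational error between time-associated (N, 3) trajectories."""
    assert est_xyz.shape == gt_xyz.shape
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # per-pose Euclidean error
    return float(np.sqrt(np.mean(err ** 2)))

def localization_mae(est_trees: np.ndarray, gt_trees: np.ndarray) -> float:
    """Mean absolute position error of matched individual-tree locations, shape (M, 2) each."""
    return float(np.mean(np.linalg.norm(est_trees - gt_trees, axis=1)))

if __name__ == "__main__":
    gt = np.cumsum(np.random.default_rng(1).normal(size=(100, 3)), axis=0)     # dummy ground truth
    est = gt + np.random.default_rng(2).normal(scale=0.05, size=gt.shape)      # dummy estimate
    print(round(trajectory_rmse(est, gt), 3))
```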
25 pages, 13626 KiB  
Article
Fine-Tuning LLM-Assisted Chinese Disaster Geospatial Intelligence Extraction and Case Studies
by Yaoyao Han, Jiping Liu, An Luo, Yong Wang and Shuai Bao
ISPRS Int. J. Geo-Inf. 2025, 14(2), 79; https://doi.org/10.3390/ijgi14020079 - 11 Feb 2025
Abstract
The extraction of disaster geospatial intelligence (DGI) from social media data with spatiotemporal attributes plays a crucial role in real-time disaster monitoring and emergency decision-making. However, conventional machine learning approaches struggle with semantic complexity and the limited Chinese disaster corpus. Recent advancements in large language models (LLMs) offer new opportunities to overcome these challenges due to their enhanced semantic comprehension and multi-task learning capabilities. This study investigates the potential application of LLMs in disaster intelligence extraction and proposes an efficient, scalable method for multi-hazard DGI extraction. Building upon a unified ontological framework encompassing core natural disaster elements, this method employs parameter-efficient low-rank adaptation (LoRA) fine-tuning to optimize open-source Chinese LLMs using a meticulously curated instruction-tuning dataset. It achieves simultaneous identification of multi-hazard intelligence cues and extraction of disaster spatial entity attributes from unstructured Chinese social media texts through unified semantic parsing and structured knowledge mapping. Compared to pre-trained models such as BERT and ERNIE, the proposed method was shown to achieve state-of-the-art evaluation results, with the highest recognition accuracy (F1-score: 0.9714) and the best performance in structured information generation (BLEU-4 score: 92.9649). Furthermore, we developed and released DGI-Corpus, a Chinese instruction-tuning dataset covering various disaster types, to support the research and application of LLMs in this field. Lastly, the proposed method was applied to analyze the spatiotemporal evolution patterns of the Zhengzhou "7.20" flood disaster. This study enhances the efficiency of natural disaster monitoring and emergency management, offering technical support for disaster response and mitigation decision-making.
Show Figures

Figure 1. Overall methodological flowchart.
Figure 2. Disaster information ontology model.
Figure 3. Intelligence clue instruction data example. The blue boxes denote the corresponding English translations of the Chinese texts.
Figure 4. Implementation process of LoRA fine-tuning.
Figure 5. Case analysis area.
Figure 6. Temporal variations of rainfall, tweet volume, and DGI volume. In subfigures (A–F), rainfall is represented by blue bar charts, while the orange curve illustrates fluctuations in tweet volume. Other colored curves in the subplots depict cumulative changes in DGI volume under 24-h interval conditions.
Figure 7. Spatiotemporal evolution patterns of DGI hotspots at 12-h intervals.
Figure 8. Spatial overlay analysis of intelligence hotspots, waterlogging points, and collapse points.
Figure 9. The spatiotemporal distribution of the public's emergency demands from 20 July to 22 July 2021.
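LoRA fine-tuning of an open-source LLM, as used in this study, is commonly set up with Hugging Face's peft library: low-rank adapter matrices are injected into selected projection layers while the base weights stay frozen. The sketch below shows only that wiring; the base model name, rank, and target modules are illustrative placeholders, not the paper's actual configuration.

```python
# Minimal LoRA setup sketch using Hugging Face transformers + peft.
# The base model name, rank, and target modules below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "some-open-source-chinese-llm"      # placeholder, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the adapter weights are trainable
```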
18 pages, 4325 KiB  
Article
Hybrid U-Net Model with Visual Transformers for Enhanced Multi-Organ Medical Image Segmentation
by Pengsong Jiang, Wufeng Liu, Feihu Wang and Renjie Wei
Information 2025, 16(2), 111; https://doi.org/10.3390/info16020111 - 6 Feb 2025
Abstract
Medical image segmentation is an essential process that facilitates the precise extraction and localization of diseased areas from medical pictures. It can provide clear and quantifiable information to support clinicians in making final decisions. However, because CNNs lack explicit modeling of global relationships, they are unable to fully exploit the long-range dependencies among image locations. In this paper, we propose a novel model that extracts local and global semantic features from images by utilizing a CNN and a visual transformer in the encoder. It is important to note that the self-attention mechanism treats a 2D image as a 1D sequence of patches, which can potentially disrupt the image's inherent 2D spatial structure. Therefore, we build the transformer structure on visual attention and large kernel attention, and we add a residual convolutional attention module (RCAM) and multi-scale fusion convolution (MFC) to the decoder. These components help the model better capture crucial features and fine details, improving the detail and accuracy of the segmentation results. On the Synapse multi-organ segmentation (Synapse) and automated cardiac diagnostic challenge (ACDC) datasets, our model performed better than previous models, demonstrating that it is more precise and robust in multi-organ medical image segmentation.
(This article belongs to the Section Artificial Intelligence)
Show Figures

Graphical abstract.
Figure 1. Overview of the proposed network structure.
Figure 2. Layer schematic of visual transformers.
Figure 3. Configuration of the residual convolutional attention module.
Figure 4. Multi-scale fusion convolution, where (a) is normal convolution, and (b,c) represent dilated convolutions with dilation rates of 2 and 3, respectively.
Figure 5. Comparison of model predictions on the Synapse dataset.
Figure 6. Comparison of model predictions on the ACDC dataset.
Figure 7. Loss curves and DSC curves for training and validation on the ACDC dataset. (a) Loss curves for training and validation. (b) DSC curves for training and validation.
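The abstract notes that self-attention treats a 2D image as a 1D sequence of patches. The sketch below shows that flattening step as it is commonly implemented in vision transformers, using a strided convolution as the patch embedding; the patch size and embedding dimension are assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Turn a 2D image into a 1D sequence of patch tokens, as used by vision transformers."""
    def __init__(self, in_channels: int = 1, patch_size: int = 16, embed_dim: int = 256):
        super().__init__()
        # A strided convolution is equivalent to splitting the image into non-overlapping
        # patches and linearly projecting each patch.
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(x)                 # (B, D, H/P, W/P)
        tokens = tokens.flatten(2)            # (B, D, N) -- the 2D grid flattened to a sequence
        return tokens.transpose(1, 2)         # (B, N, D) sequence of N patch tokens

if __name__ == "__main__":
    image = torch.randn(1, 1, 224, 224)       # a single-channel medical image
    seq = PatchEmbedding()(image)
    print(seq.shape)                          # torch.Size([1, 196, 256])
```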
17 pages, 3362 KiB  
Article
Truck Lifting Accident Detection Method Based on Improved PointNet++ for Container Terminals
by Yang Shen, Xintai Man, Jiaqi Wang, Yujie Zhang and Chao Mi
J. Mar. Sci. Eng. 2025, 13(2), 256; https://doi.org/10.3390/jmse13020256 - 30 Jan 2025
Abstract
In container terminal operations, truck lifting accidents pose a serious threat to the safety and efficiency of automated equipment. Traditional detection methods using visual cameras and single-line Light Detection and Ranging (LiDAR) are insufficient for capturing three-dimensional spatial features, leading to reduced detection accuracy. Moreover, the boundary features of key accident objects, such as containers, truck chassis, and wheels, are often blurred, resulting in frequent false and missed detections. To tackle these challenges, this paper proposes an accident detection method based on multi-line LiDAR and an improved PointNet++ model. This method uses multi-line LiDAR to collect point cloud data from operational lanes in real time and enhances the PointNet++ model by integrating a multi-layer perceptron (MLP) and a mixed attention mechanism (MAM), optimizing the model's ability to extract local and global features. This results in high-precision semantic segmentation and accident detection of critical structural point clouds, such as containers, truck chassis, and wheels. Experiments confirm that the proposed method achieves superior performance compared to current mainstream algorithms in terms of point cloud segmentation accuracy and stability. In engineering tests across various real-world conditions, the model exhibits strong generalization capability.
(This article belongs to the Special Issue Sustainable Maritime Transport and Port Intelligence)
Show Figures

Figure 1. Nine different types of truck lifting accidents.
Figure 2. LiDAR installation and data collection. (a) Shows LiDAR installation; (b) shows a point cloud of a container being lifted as captured by the LiDAR.
Figure 3. Truck lifting accident detection process.
Figure 4. Improved PointNet++ network structure.
Figure 5. Multi-layer perceptron.
Figure 6. Feature extraction module based on mixed attention mechanism.
Figure 7. Channel attention mechanism.
Figure 8. Self-attention mechanism.
Figure 9. Point cloud dataset collection process. (a) Shows the collection process for a truck lifting accident involving a 20-foot container; (b) shows the collection process for a truck lifting accident involving a 40-foot container.
Figure 10. Visualization of point cloud segmentation results. (a) Shows a container remaining stationary without being lifted; (b) shows a container being lifted successfully under normal conditions; (c) shows a lifting accident occurring with the front lock engaged; (d) shows a lifting accident occurring with the rear lock engaged.
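The improved PointNet++ described above builds on multi-layer perceptrons that are shared across all points before attention-based aggregation. The sketch below shows that shared-MLP pattern on a raw point cloud using 1D convolutions with kernel size 1; the layer widths are assumptions, and the mixed attention module is omitted.

```python
import torch
import torch.nn as nn

class SharedPointMLP(nn.Module):
    """Pointwise (shared) MLP over a point cloud, the basic PointNet/PointNet++ building block."""
    def __init__(self, in_dim: int = 3, dims=(64, 128, 256)):
        super().__init__()
        layers, prev = [], in_dim
        for d in dims:
            # Conv1d with kernel size 1 applies the same MLP weights to every point.
            layers += [nn.Conv1d(prev, d, kernel_size=1), nn.BatchNorm1d(d), nn.ReLU(inplace=True)]
            prev = d
        self.mlp = nn.Sequential(*layers)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> per-point features (B, N, C)
        feats = self.mlp(points.transpose(1, 2))
        return feats.transpose(1, 2)

if __name__ == "__main__":
    cloud = torch.randn(2, 2048, 3)            # two LiDAR sweeps with 2048 points each
    print(SharedPointMLP()(cloud).shape)       # torch.Size([2, 2048, 256])
```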
17 pages, 5511 KiB  
Article
Semantic-Guided Transformer Network for Crop Classification in Hyperspectral Images
by Weiqiang Pi, Tao Zhang, Rongyang Wang, Guowei Ma, Yong Wang and Jianmin Du
J. Imaging 2025, 11(2), 37; https://doi.org/10.3390/jimaging11020037 - 26 Jan 2025
Abstract
The hyperspectral remote sensing images of agricultural crops contain rich spectral information, which can provide important details about crop growth status, diseases, and pests. However, existing crop classification methods face several key limitations when processing hyperspectral remote sensing images, primarily in the following aspects. First, the images contain complex backgrounds: various background elements may have spectral characteristics similar to those of the crops, and this spectral similarity makes the classification model susceptible to background interference, thus reducing classification accuracy. Second, the differences in crop scales increase the difficulty of feature extraction. In different image regions, the scale of crops can vary significantly, and traditional classification methods often struggle to effectively capture this information. Additionally, due to the limitations of spectral information, especially under multi-scale variation backgrounds, the extraction of crop information becomes even more challenging, leading to instability in the classification results. To address these issues, a semantic-guided transformer network (SGTN) is proposed, which aims to effectively overcome the limitations of existing deep learning methods and improve crop classification accuracy and robustness. First, a multi-scale spatial–spectral information extraction (MSIE) module is designed to effectively handle the variations of crops at different scales in the image, thereby extracting richer and more accurate features and reducing the impact of scale changes. Second, a semantic-guided attention (SGA) module is proposed, which enhances the model's sensitivity to crop semantic information, further reducing background interference and improving the accuracy of crop area recognition. By combining the MSIE and SGA modules, the SGTN can focus on the semantic features of crops at multiple scales, thus generating more accurate classification results. Finally, a two-stage feature extraction structure is employed to further optimize the extraction of crop semantic features and enhance classification accuracy. The results show that on the Indian Pines, Pavia University, and Salinas benchmark datasets, the overall accuracies of the proposed model are 98.24%, 98.34%, and 97.89%, respectively. Compared with other methods, the model achieves better classification accuracy and generalization performance. In the future, the SGTN is expected to be applied to more agricultural remote sensing tasks, such as crop disease detection and yield prediction, providing more reliable technical support for precision agriculture and agricultural monitoring.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
Show Figures

Figure 1. The SGTN model framework, where the symbol ⊗ represents element-wise multiplication.
Figure 2. Illustration of the MSIE module, where the symbol ⊕ denotes element-wise summation.
Figure 3. Illustration of the SGA module.
Figure 4. False-color image and ground truth for the IP dataset. (a) False-color image. (b) Ground truth.
Figure 5. False-color image and ground truth for the SA dataset. (a) False-color image. (b) Ground truth.
Figure 6. False-color image and ground truth for the PU dataset. (a) False-color image. (b) Ground truth.
Figure 7. Experimental results for different spatial sizes on the three datasets: (a) IP. (b) SA. (c) PU.
Figure 8. Classification result maps of different models on the IP dataset. (a) False-color image. (b) Ground truth. (c) DFFN. (d) HSST. (e) SSAN. (f) SSFTTnet. (g) MASSFormer. (h) MSSTT. (i) SGTN.
Figure 9. Classification result maps of different models on the SA dataset. (a) False-color image. (b) Ground truth. (c) DFFN. (d) HSST. (e) SSAN. (f) SSFTTnet. (g) MASSFormer. (h) MSSTT. (i) SGTN.
Figure 10. Classification result maps of different models on the PU dataset. (a) False-color image. (b) Ground truth. (c) DFFN. (d) HSST. (e) SSAN. (f) SSFTTnet. (g) MASSFormer. (h) MSSTT. (i) SGTN.
22 pages, 11693 KiB  
Article
Development of Navigation Network Models for Indoor Path Planning Using 3D Semantic Point Clouds
by Jiwei Hou, Patrick Hübner and Dorota Iwaszczuk
Appl. Sci. 2025, 15(3), 1151; https://doi.org/10.3390/app15031151 - 23 Jan 2025
Abstract
Accurate and efficient path planning in indoor environments relies on high-quality navigation networks that faithfully represent the spatial and semantic structure of the environment. Three-dimensional semantic point clouds provide valuable spatial and semantic information for navigation tasks. However, extracting detailed navigation networks from 3D semantic point clouds remains a challenge, especially in complex indoor spaces like staircases and multi-floor environments. This study presents a comprehensive framework for developing and extracting robust navigation network models, specifically designed for indoor path planning applications. The main contributions include (1) a preprocessing pipeline that ensures high accuracy and consistency of the input semantic point cloud data; (2) a moving window algorithm for refined node extraction in staircases to enable seamless navigation across vertical spaces; and (3) a lightweight, JSON-based storage structure for efficient network representation and integration. Additionally, we present a more comprehensive sub-node extraction method for hallways to enhance network continuity. We validated the method using two datasets—the public S3DIS dataset and the self-collected HoloLens 2 dataset—and demonstrated its effectiveness through Dijkstra-based path planning. The generated navigation networks supported practical scenarios such as wheelchair-accessible path planning and seamless multi-floor navigation. These findings highlight the practical value of our approach for modern indoor navigation systems, with potential applications in smart building management, robotics, and emergency response.
(This article belongs to the Special Issue Current Research in Indoor Positioning and Localization)
Show Figures

Figure 1. The pipeline for extracting a JSON-encoded navigation network for indoor path planning from 3D semantic point clouds.
Figure 2. The illustration of a composite hallway converted into multiple simple, straight hallways without turns.
Figure 3. The illustration of the extracted staircase and elevator.
Figure 4. The illustration of the extracted doors and virtual doors from the hallway.
Figure 5. An example of the file tree structure after data preprocessing.
Figure 6. An illustration of the moving window approach for staircase node extraction. The red arrow indicates the moving direction along the Z-axis: (a) side view, (b) top view.
Figure 7. Principal component analysis (PCA) of the hallway point cloud: red arrow for the main direction, yellow for the secondary.
Figure 8. An illustration of multiple sub-node extraction for longer hallways.
Figure 9. An example of the extracted data structure in JSON format.
Figure 10. The three areas of the S3DIS dataset used in this study: (a) Area_1, (b) Area_4, and (c) Area_6. All images are screenshots from the original dataset, and the ceilings are removed for better visualization.
Figure 11. The self-collected 3D data and its corresponding HoloLens 2 movement trajectory: (a) floor 0, (b) floor 1, (c) floor 2, (d) floor 3, and (e) floor 4. The trajectory starts at red and ends at blue.
Figure 12. Color-labeled visualization of basic indoor units of our self-collected data: (a) floor 0, (b) floor 1, (c) floor 2, (d) floor 3, and (e) floor 4. In this case, stairwells are depicted in orange, elevator lobbies in yellow, hallways in blue, and kitchens in green.
Figure 13. Indoor navigation elements extracted from various spaces in our self-collected dataset, with floor 1 as an example.
Figure 14. The results of the extracted indoor navigation network with the S3DIS dataset: (a) Area_1, (b) Area_4, and (c) Area_6.
Figure 15. The results of the extracted indoor navigation network with the self-collected dataset: (a) floor 0, (b) floor 1, (c) floor 2, (d) floor 3, and (e) floor 4.
Figure 16. The results of the shortest path planning for the single-floor navigation network extracted from the S3DIS dataset: (a) Area_1; (b) Area_4; (c) Area_6. The route starts at purple and ends at red.
Figure 17. The results of the shortest path planning across multiple floors for the navigation network extracted from our self-collected dataset: (a) without barrier-free access limitation; (b) with barrier-free access limitation; (c) staircase-only route. The route starts at purple and ends at red.
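The navigation network above is stored as a lightweight JSON structure and queried with Dijkstra's algorithm, optionally restricted to barrier-free (wheelchair-accessible) edges. The sketch below illustrates that combination on a toy two-floor graph; the JSON field names and costs are invented for illustration and are not the paper's actual schema.

```python
import heapq
import json

# Toy navigation network; field names ("nodes", "edges", "accessible") are illustrative only.
network_json = """
{
  "nodes": ["room_101", "hallway_1", "stair_1", "elevator_1", "hallway_2", "room_201"],
  "edges": [
    {"from": "room_101",   "to": "hallway_1",  "cost": 2.0, "accessible": true},
    {"from": "hallway_1",  "to": "stair_1",    "cost": 4.0, "accessible": false},
    {"from": "hallway_1",  "to": "elevator_1", "cost": 6.0, "accessible": true},
    {"from": "stair_1",    "to": "hallway_2",  "cost": 4.0, "accessible": false},
    {"from": "elevator_1", "to": "hallway_2",  "cost": 4.0, "accessible": true},
    {"from": "hallway_2",  "to": "room_201",   "cost": 2.0, "accessible": true}
  ]
}
"""

def dijkstra(net: dict, start: str, goal: str, barrier_free: bool = False):
    """Shortest path on an undirected navigation graph, optionally wheelchair-accessible only."""
    adj = {n: [] for n in net["nodes"]}
    for e in net["edges"]:
        if barrier_free and not e["accessible"]:
            continue  # skip stairs when a barrier-free route is requested
        adj[e["from"]].append((e["to"], e["cost"]))
        adj[e["to"]].append((e["from"], e["cost"]))
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if goal not in dist:
        return None, float("inf")
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]

if __name__ == "__main__":
    net = json.loads(network_json)
    print(dijkstra(net, "room_101", "room_201"))                     # may use the stairs
    print(dijkstra(net, "room_101", "room_201", barrier_free=True))  # elevator-only route
```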
18 pages, 1801 KiB  
Article
Bi-Att3DDet: Attention-Based Bi-Directional Fusion for Multi-Modal 3D Object Detection
by Xu Gao, Yaqian Zhao, Yanan Wang, Jiandong Shang, Chunmin Zhang and Gang Wu
Sensors 2025, 25(3), 658; https://doi.org/10.3390/s25030658 - 23 Jan 2025
Abstract
Currently, multi-modal 3D object detection methods have become a key area of research in the field of autonomous driving. Fusion is an essential factor affecting performance in multi-modal object detection. However, previous methods still suffer from the inability to effectively fuse features from LiDAR and RGB images, resulting in a low utilization rate of complementary information between depth and semantic texture features. At the same time, existing methods may not adequately capture the structural information in Region of Interest (RoI) features when extracting them. Structural information plays a crucial role in RoI features. It encompasses the position, size, and orientation of objects, as well as the relative positions and spatial relationships between objects. Its absence can result in false or missed detections. To solve the above problems, we propose a multi-modal sensor fusion network, Bi-Att3DDet, which mainly consists of a Self-Attentive RoI Feature Extraction module (SARoIFE) and a Feature Bidirectional Interactive Fusion module (FBIF). Specifically, SARoIFE captures the relationships between different positions in RoI features through the self-attention mechanism to obtain high-quality RoI features, preparing them for the fusion stage. FBIF performs bidirectional interaction between LiDAR and pseudo RoI features to make full use of the complementary information. We perform comprehensive experiments on the KITTI dataset, and our method notably demonstrates a 1.55% improvement at the hard difficulty level and a 0.19% improvement in the mean Average Precision (mAP) metric on the test dataset.
(This article belongs to the Section Vehicular Sensing)
Show Figures

Figure 1. The existing methods have the problems of missing detection in complex scenes and the easy mis-detection of distant objects. The top two rows and the bottom row depict the ground truth and the detected results of SFD [19], respectively. The light blue ellipse shows the false detection and the yellow one gives the missed object. (a) False detection. (b) Missed detection.
Figure 2. LiDAR–camera fusion strategy. (a) Serial fusion: limited by LiDAR sensors. (b) Parallel fusion: dual streams work independently. (c) Bidirectional interactive fusion (ours): intermodal bidirectional interactions make full use of complementary information.
Figure 3. Bi-Att3DDet framework.
Figure 4. Illustration of self-attentive RoI feature extraction.
Figure 5. Illustration of Feature Bidirectional Interactive Fusion.
Figure 6. Comparison of visualization results between our method and the SFD method on KITTI. These images are the ground truth, SFD result, and Bi-Att3DDet (ours) result from left to right; the first three rows are false detection and the last two are missed detection in order from top to bottom. In the 3D detection results, the green box is the ground truth box and the red box is the prediction box. The blue circles denote false positives, and the yellow circles indicate undetected objects.
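SARoIFE, as summarized above, applies self-attention across the positions inside each RoI so that relationships between positions are encoded in the RoI feature. The sketch below shows a generic multi-head self-attention pass over pooled RoI feature grids using torch.nn.MultiheadAttention; the grid size, channel count, and head count are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class RoISelfAttention(nn.Module):
    """Self-attention over the spatial positions of pooled RoI features (generic sketch)."""
    def __init__(self, channels: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (num_rois, C, S, S) pooled grid per RoI
        n, c, s, _ = roi_feats.shape
        tokens = roi_feats.flatten(2).transpose(1, 2)        # (num_rois, S*S, C)
        attended, _ = self.attn(tokens, tokens, tokens)      # relate every position to every other
        tokens = self.norm(tokens + attended)                # residual connection
        return tokens.transpose(1, 2).reshape(n, c, s, s)

if __name__ == "__main__":
    rois = torch.randn(8, 128, 7, 7)        # 8 RoIs with 7x7 pooled features
    print(RoISelfAttention()(rois).shape)   # torch.Size([8, 128, 7, 7])
```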
17 pages, 5264 KiB  
Article
Automated Road Extraction from Satellite Imagery Integrating Dense Depthwise Dilated Separable Spatial Pyramid Pooling with DeepLabV3+
by Arpan Mahara, Md Rezaul Karim Khan, Liangdong Deng, Naphtali Rishe, Wenjia Wang and Seyed Masoud Sadjadi
Appl. Sci. 2025, 15(3), 1027; https://doi.org/10.3390/app15031027 - 21 Jan 2025
Abstract
Road extraction is a sub-domain of remote sensing applications; it is a subject of extensive and ongoing research. The procedure of automatically extracting roads from satellite imagery encounters significant challenges due to the multi-scale and diverse structures of roads, and improvement in this field is still needed. Convolutional neural networks (CNNs), especially the DeepLab series known for its proficiency in semantic segmentation due to its efficiency in interpreting multi-scale objects' features, address some of these challenges caused by the varying nature of roads. The present work proposes the utilization of DeepLabV3+, the latest version of the DeepLab series, by introducing an innovative Dense Depthwise Dilated Separable Spatial Pyramid Pooling (DenseDDSSPP) module and integrating it in the place of the conventional Atrous Spatial Pyramid Pooling (ASPP) module. This modification enhances the extraction of complex road structures from satellite images. This study hypothesizes that the integration of DenseDDSSPP with a CNN backbone network and a Squeeze-and-Excitation block will generate an efficient dense feature map by focusing on relevant features, leading to more precise and accurate road extraction from remote sensing images. The Results Section presents a comparison of our model's performance against state-of-the-art models, demonstrating better results that highlight the effectiveness and success of the proposed approach.
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)
Show Figures

Figure 1. Depiction of different convolutions.
Figure 2. Comparison of architectural designs: ASPP vs. DenseDDSSPP modules.
Figure 3. Architecture of DeepLabV3+ with DenseDDSSPP module. The SE block in the decoder operates on a tensor of dimensions 1×1×1792, where 1×1 represents the spatial dimensions and 1792 denotes the number of channels.
Figure 4. A visualization of Xception's layers. The dotted line indicates the omission of intermediate layers for conciseness.
Figure 5. Squeeze-and-Excitation block.
Figure 6. Comparative results of road extraction from the Massachusetts dataset. The figure presents a side-by-side comparison of road extracted by various models, including the proposed model, against the ground truth, highlighting the effectiveness of each approach in synthesizing accurate road extraction.
Figure 7. Comparative results of road extraction from the DeepGlobe road dataset. The figure presents a side-by-side comparison of road extracted by various models, including the proposed model, against the ground truth.
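The DenseDDSSPP module combines depthwise separable convolutions at multiple dilation rates with a Squeeze-and-Excitation block. The sketch below shows the two primitive building blocks in PyTorch as a generic illustration, not the authors' exact module; channel counts and the reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class DilatedDepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel, dilated) convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int, dilation: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: global average pooling -> bottleneck MLP -> channel gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))             # squeeze the spatial dimensions
        return x * w[:, :, None, None]              # excite (reweight) the channels

if __name__ == "__main__":
    x = torch.randn(1, 256, 64, 64)
    y = DilatedDepthwiseSeparableConv(256, 128, dilation=6)(x)
    print(SqueezeExcite(128)(y).shape)              # torch.Size([1, 128, 64, 64])
```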
28 pages, 13922 KiB  
Article
Multi-Class Guided GAN for Remote-Sensing Image Synthesis Based on Semantic Labels
by Zhenye Niu, Yuxia Li, Yushu Gong, Bowei Zhang, Yuan He, Jinglin Zhang, Mengyu Tian and Lei He
Remote Sens. 2025, 17(2), 344; https://doi.org/10.3390/rs17020344 - 20 Jan 2025
Abstract
In the scenario of limited labeled remote-sensing datasets, the model's performance is constrained by the insufficient availability of data. Generative model-based data augmentation has emerged as a promising solution to this limitation. While existing generative models perform well in natural scene domains (e.g., faces and street scenes), their performance in remote sensing is hindered by severe data imbalance and the semantic similarity among land-cover classes. To tackle these challenges, we propose the Multi-Class Guided GAN (MCGGAN), a novel network for generating remote-sensing images from semantic labels. Our model features a dual-branch architecture with a global generator that captures the overall image structure and a multi-class generator that improves the quality and differentiation of land-cover types. To integrate these generators, we design a shared-parameter encoder for consistent feature encoding across the two branches, and a spatial decoder that synthesizes outputs from the class generators, preventing overlap and confusion. Additionally, we employ a perceptual loss (L_VGG) to assess perceptual similarity between generated and real images, and a texture matching loss (L_T) to capture fine texture details. To evaluate the quality of image generation, we tested multiple models on two custom datasets (one from Chongzhou, Sichuan Province, and another from Wuzhen, Zhejiang Province, China) and the public dataset LoveDA. The results show that MCGGAN achieves improvements of 52.86 in FID, 0.0821 in SSIM, and 0.0297 in LPIPS compared to the Pix2Pix baseline. We also conducted comparative experiments to assess the semantic segmentation accuracy of the U-Net before and after incorporating the generated images. The results show that data augmentation with the generated images leads to an improvement of 4.47% in FWIoU and 3.23% in OA across the Chongzhou and Wuzhen datasets. Experiments show that MCGGAN can be effectively used as a data augmentation approach to improve the performance of downstream remote-sensing image segmentation tasks.
Show Figures

Figure 1. The network structure of MCGGAN.
Figure 2. The structure of the shared-parameter encoder.
Figure 3. The module structure of the multi-class generator.
Figure 4. The three datasets used for MCGGAN.
Figure 5. The schematic diagram of the ablation experiment plan.
Figure 6. The ablation experiment on the Chongzhou dataset.
Figure 7. The ablation experiment on the Wuzhen dataset.
Figure 8. The loss function for the three dual-branch models in ablation experiments.
Figure 9. The partial DBGAN-generated images: left, semantic label; middle, generated image; right, real image. (a) Chongzhou and (b) Wuzhen.
Figure 10. The generated results for the Chongzhou dataset.
Figure 11. The generated results for the Wuzhen dataset.
Figure 12. The generated results for the LoveDA dataset.
Figure 13. CAM visualization results of different U-Net layers in real remote-sensing images. I, II, and III, respectively, represent the visual results of the second downsampling module, the bottleneck layer between the encoder and decoder, and the output features of the last upsampling module.
Figure 14. CAM visualization results of different U-Net layers in generated images. I, II, and III, respectively, represent the visual results of the second downsampling module, the bottleneck layer between the encoder and decoder, and the output features of the last upsampling module.
28 pages, 21353 KiB  
Article
ThermalGS: Dynamic 3D Thermal Reconstruction with Gaussian Splatting
by Yuxiang Liu, Xi Chen, Shen Yan, Zeyu Cui, Huaxin Xiao, Yu Liu and Maojun Zhang
Remote Sens. 2025, 17(2), 335; https://doi.org/10.3390/rs17020335 - 19 Jan 2025
Viewed by 720
Abstract
Thermal infrared (TIR) images capture temperature in a non-invasive manner, making them valuable for generating 3D models that reflect the spatial distribution of thermal properties within a scene. Current TIR image-based 3D reconstruction methods primarily assume static conditions: they capture the spatial distribution of thermal radiation but cannot represent its temporal dynamics. Two key challenges hinder progress in this field: the absence of dedicated datasets and the lack of effective methods for dynamic 3D representation. To address these challenges, we propose a novel dynamic thermal 3D reconstruction method, named ThermalGS, based on 3D Gaussian Splatting (3DGS). ThermalGS employs a data-driven approach to directly learn both scene structure and dynamic thermal representation, using RGB and TIR images as input. The position, orientation, and scale of Gaussian primitives are guided by the RGB mesh. We introduce feature encoding and embedding networks to integrate semantic and temporal information into the Gaussian primitives, allowing them to capture dynamic thermal radiation characteristics. Moreover, we construct the Thermal Scene Day-and-Night (TSDN) dataset, which includes multi-view, high-resolution aerial RGB reference images and TIR images captured at five different times throughout the day and night, providing a benchmark for dynamic thermal 3D reconstruction tasks. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the TSDN dataset, with an average absolute temperature error of 1 °C and the ability to predict surface temperature variations over time. Full article
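The abstract describes mapping per-Gaussian semantic features plus a time embedding to thermal radiation coefficients. The sketch below illustrates that general idea in PyTorch; the sinusoidal time encoding, the dimensions, and the names (ThermalRadiationHead, sem_dim, out_dim) are assumptions made for illustration and do not reproduce the paper's exact networks.

```python
import math
import torch
import torch.nn as nn

class ThermalRadiationHead(nn.Module):
    """Illustrative sketch: an MLP maps per-Gaussian semantic features plus a
    time embedding to thermal radiation coefficients. All dimensions and the
    encoding scheme are assumptions, not the ThermalGS implementation."""

    def __init__(self, sem_dim=32, time_freqs=6, hidden=128, out_dim=16):
        super().__init__()
        self.time_freqs = time_freqs
        in_dim = sem_dim + 2 * time_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),  # e.g. SH-like thermal coefficients per Gaussian
        )

    def encode_time(self, t):
        # t: (N, 1) normalized time of day in [0, 1); sinusoidal encoding (assumed)
        freqs = (2.0 ** torch.arange(self.time_freqs, device=t.device, dtype=t.dtype)) * math.pi
        angles = t * freqs  # broadcasts to (N, time_freqs)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, sem_feat, t):
        # sem_feat: (N, sem_dim) per-Gaussian semantic features (e.g. distilled from a segmenter)
        # t: (N, 1) query time for each Gaussian
        x = torch.cat([sem_feat, self.encode_time(t)], dim=-1)
        return self.mlp(x)

# Usage sketch: predict coefficients for 10,000 Gaussians at one query time.
head = ThermalRadiationHead()
coeffs = head(torch.randn(10_000, 32), torch.full((10_000, 1), 0.4))
```

A rendered TIR image would then be obtained by splatting the Gaussians with these time-dependent coefficients and comparing against the captured thermal images, as the abstract's training objective describes.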
Show Figures

Graphical abstract
Figure 1: UAV platform and cameras. DJI Matrice 300 RTK (left), SHARE PSDK 102S-V3 (center) and DJI H20T (right).
Figure 2: Route planning. The left side shows the acquisition routes for RGB images and the right side shows the thermal image acquisition routes.
Figure 3: Reference model visualization. (a) is a textured mesh model, (b) is a non-textured mesh model.
Figure 4: Comparison of sample images. (a) shows the original R-JPG image, where the gray scale represents pixel values ranging from 0 to 255. (b) displays the normalized radiative temperature image. (c) illustrates the radiative temperature image with the region of interest (ROI) extracted, where the background is set to transparent. In all panels, the shades of gray represent a temperature range of −15 to 25 degrees Celsius.
Figure 5: Overview. We use RGB images and the mesh models reconstructed from them as the initial input, generating initial 3D Gaussians from each triangle on the mesh and assigning them semantic features with the help of SAM [61]. Then, our carefully designed MLP transforms the semantic features into radiation features and integrates them with spherical harmonics, enabling the learning of thermal radiation behavior on different material surfaces. The embedding of time features further endows the 3D Gaussians with the ability to change dynamically. Finally, we reconstruct the entire dynamic scene by minimizing the differences between the rendered images at different times and their corresponding ground truth (GT).
Figure 6: Visualization of Gaussian semantic features. (a) is the RGB texture rendering image, and (b) is the visualization result of the semantic feature vector from the corresponding view.
Figure 7: Example comparison of thermal renderings and GT at different time periods in static experiments. The red and green squares in each image correspond to the areas displayed in the magnified view at the bottom.
Figure 8: Visual comparison of the rendering quality of our method and existing methods on the Cross-Time dataset (as shown in Table 2). The red and green squares in each image correspond to the areas displayed in the magnified view at the bottom.
Figure 9: Continuous variation visualization. The images are arranged from left to right and top to bottom, illustrating the continuous changes in thermal radiation across 84 time nodes from 10:35 a.m. to 12:44 a.m. the following day. To enhance the visibility of these changes, we transformed the rendered grayscale images into pseudo-color.
Figure 10: Visualization of the comparison between the predicted temperature and the temperature error. The left column displays the ground truth (GT) temperature images, the right column shows the predicted temperature images, and the middle column visualizes the error between the GT and predicted temperatures. The error is normalized using the min-max method and mapped to a red-blue color scale, where regions closer to red represent larger absolute temperature errors, and regions closer to blue represent smaller errors.
36 pages, 6278 KiB  
Review
Toward a Sense of Place Unified Conceptual Framework Based on a Narrative Review: A Way of Feeding Place-Based GIS
by Ahmed Rezeg, Stéphane Roche and Emmanuel Eveno
Land 2025, 14(1), 170; https://doi.org/10.3390/land14010170 - 15 Jan 2025
Viewed by 539
Abstract
Space and place are two of the main concepts in several fields of knowledge, such as human geography, environmental psychology, urban sociology, architecture, and urban planning. Space is an objective, structured concept: mainly a physical location characterized by measurable dimensions and geographical coordinates. Place, by contrast, is a location that holds meaning and value for an individual or a group, created through human experience and social interactions. Sense of place is thus a set of perceived meanings and attitudinal ties toward a place (conative, affective, and cognitive bonds). From a geospatial perspective, the subjective aspect of sense of place is difficult to depict in a cartographic projection. From this angle, Place-Based Geographic Information Systems represent a set of initiatives that attempt to combine the objectivity of spaces and the subjectivity of places in digital systems, and that integrate spatial semantic characteristics as described by places’ users. In this paper, the methodological approach is mainly based on a systematic analysis and search of the scientific literature. It is a narrative review inspired by and based on a scoping review strategy, following the JBI methodology and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist. This bibliographic analysis aims to understand how sense of place is characterized and what its components are. The results take the form of a synthesis of the conceptual approaches most prominent in the literature, together with a conceptual model encompassing the full conceptual specificities of the sense of place concept. Full article
(This article belongs to the Special Issue Place-Based Urban Planning)
Show Figures

Figure 1: PRISMA diagram [44] demonstrating the study selection approach.
Figure 2: The four types of placemaking [75].
Figure 3: The place diagram [53].
Figure 4: Examples of Airbnb’s online place-based experiences. (A) A virtual visit to Chernobyl (https://medium.com/airbnb-engineering/zooming-towards-human-connection-66bb6e45161c, accessed on 20 January 2023). (B) A girl shares her music activity with guests (https://news.airbnb.com/enjoy-the-magic-of-airbnb-experiences-from-the-comfort-of-your-home/, accessed on 20 January 2023).
Figure 5: Calvium’s hybrid space diagram (https://medium.com/@gemmacampbell/building-the-hybrid-space-ba406426ffeb, accessed on 10 February 2023).
Figure 6: Calvium’s digital placemaking diagram [54].
Figure 7: The fundamental idea of bilateral interaction characterizes the sense of place (© Ahmed Rezeg, 2023).
Figure 8: Proposed conceptual unified model of sense of place (© Ahmed Rezeg, 2023).
Figure 9: Integration of subjective “sense of place” data into a PBGIS database (© Ahmed Rezeg, 2023).