Search Results (1,021)

Search Parameters:
Keywords = attention U-Net

18 pages, 19699 KiB  
Article
Enhancing Left Ventricular Segmentation in Echocardiograms Through GAN-Based Synthetic Data Augmentation and MultiResUNet Architecture
by Vikas Kumar, Nitin Mohan Sharma, Prasant K. Mahapatra, Neeti Dogra, Lalit Maurya, Fahad Ahmad, Neelam Dahiya and Prashant Panda
Diagnostics 2025, 15(6), 663; https://doi.org/10.3390/diagnostics15060663 (registering DOI) - 9 Mar 2025
Abstract
Background: Accurate segmentation of the left ventricle in echocardiograms is crucial for the diagnosis and monitoring of cardiovascular diseases. However, this process is hindered by the limited availability of high-quality annotated datasets and the inherent complexities of echocardiogram images. Traditional methods often struggle to generalize across varying image qualities and conditions, necessitating a more robust solution. Objectives: This study aims to enhance left ventricular segmentation in echocardiograms by developing a framework that integrates Generative Adversarial Networks (GANs) for synthetic data augmentation with a MultiResUNet architecture, providing a more accurate and reliable segmentation method. Methods: We propose a GAN-based framework that generates synthetic echocardiogram images and their corresponding segmentation masks, augmenting the available training data. The synthetic data, along with real echocardiograms from the EchoNet-Dynamic dataset, were used to train the MultiResUNet architecture. MultiResUNet incorporates multi-resolution blocks, residual connections, and attention mechanisms to effectively capture fine details at multiple scales. Additional enhancements include atrous spatial pyramid pooling (ASPP) and scaled exponential linear units (SELUs) to further improve segmentation accuracy. Results: The proposed approach significantly outperforms existing methods, achieving a Dice Similarity Coefficient of 95.68% and an Intersection over Union (IoU) of 91.62%. This represents improvements of 2.58% in Dice and 4.84% in IoU over previous segmentation techniques, demonstrating the effectiveness of GAN-based augmentation in overcoming data scarcity and improving segmentation performance. Conclusions: The integration of GAN-generated synthetic data and the MultiResUNet architecture provides a robust and accurate solution for left ventricular segmentation in echocardiograms. This approach has the potential to enhance clinical decision-making in cardiovascular medicine by improving the accuracy of automated diagnostic tools, even in the presence of limited and complex training data.
(This article belongs to the Special Issue Artificial Intelligence in Cardiovascular Diseases (2024))
Figures:
Figure 1. Procedural framework of the methodology.
Figure 2. GAN architecture.
Figure 3. The architecture of the MultiResUNet model.
Figure 4. ResPath block structure.
Figure 5. Synthetic images with corresponding masks generated through the GAN.
Figure 6. Dice coefficient vs. epoch and IoU vs. epoch during training (left two plots) and validation (right two plots), respectively.
Figure 7. ROC curve for all test images.
Figure 8. (a) 2D projection plot of high-dimensional image features before and after the GAN-based approach; (b) Local Outlier Factor (LOF) anomaly detection. Blue points correspond to the original dataset, red points to the synthetic images generated by the GAN, and black markers to outliers detected using LOF.
Figure 9. Boxplots of 10-fold cross-validation results of the proposed approach.
Figure 10. MultiResUNet result vs. expert-annotated LV segmentation. Green represents true-positive pixels, red indicates false-positive pixels, and blue highlights false-negative pixels.
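For reference, the Dice Similarity Coefficient and IoU reported above are standard overlap metrics and reduce to a few lines of code. A minimal PyTorch sketch for binary masks follows (the function names are ours, not the authors'):

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.float().flatten(), target.float().flatten()
    intersection = (pred * target).sum()
    return (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, target = pred.float().flatten(), target.float().flatten()
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    return (intersection + eps) / (union + eps)

# Sanity check: a perfect prediction gives Dice = IoU = 1.0.
mask = torch.ones(1, 128, 128)
print(dice_coefficient(mask, mask).item(), iou(mask, mask).item())
```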
30 pages, 34905 KiB  
Article
Text-Guided Synthesis in Medical Multimedia Retrieval: A Framework for Enhanced Colonoscopy Image Classification and Segmentation
by Ojonugwa Oluwafemi Ejiga Peter, Opeyemi Taiwo Adeniran, Adetokunbo MacGregor John-Otumu, Fahmi Khalifa and Md Mahmudur Rahman
Algorithms 2025, 18(3), 155; https://doi.org/10.3390/a18030155 (registering DOI) - 9 Mar 2025
Viewed by 71
Abstract
The lack of extensive, varied, and thoroughly annotated datasets impedes the advancement of artificial intelligence (AI) for medical applications, especially colorectal cancer detection. Models trained with limited diversity often display biases, especially when utilized on disadvantaged groups. Generative models (e.g., DALL-E 2, Vector-Quantized Generative Adversarial Network (VQ-GAN)) have been used to generate images, but not colonoscopy data for intelligent data augmentation. This study developed an effective method for producing synthetic colonoscopy image data, which can be used to train advanced medical diagnostic models for robust colorectal cancer detection and treatment. Text-to-image synthesis was performed using fine-tuned Visual Large Language Models (LLMs): Stable Diffusion and DreamBooth Low-Rank Adaptation produce authentic-looking images, with an average Inception Score of 2.36 across three datasets. The validation accuracies of the classification models Big Transfer (BiT), Fixed Resolution Residual Next Generation Network (FixResNeXt), and Efficient Neural Network (EfficientNet) were 92%, 91%, and 86%, respectively; Vision Transformer (ViT) and Data-Efficient Image Transformers (DeiT) reached 93%. Second, for polyp segmentation, ground-truth masks were generated using the Segment Anything Model (SAM), and five segmentation models (U-Net, Pyramid Scene Parsing Network (PSPNet), Feature Pyramid Network (FPN), Link Network (LinkNet), and Multi-scale Attention Network (MANet)) were adopted. FPN produced excellent results, with an Intersection over Union (IoU) of 0.64, an F1 score of 0.78, a recall of 0.75, and a Dice coefficient of 0.77, demonstrating strong segmentation accuracy and overlap metrics, with particularly robust, balanced detection capability as shown by the high F1 score and Dice coefficient. This highlights how AI-generated medical images can improve colonoscopy analysis, which is critical for early colorectal cancer detection.
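As context for the text-to-image pipeline described above, here is a minimal sketch of generating a synthetic image with Stable Diffusion plus LoRA weights via the Hugging Face diffusers library; the base checkpoint, the LoRA path, and the prompt are illustrative assumptions, not the authors' actual configuration:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a base Stable Diffusion checkpoint (model ID is an assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach DreamBooth/LoRA weights fine-tuned on colonoscopy images
# ("./colonoscopy-lora" is a hypothetical path).
pipe.load_lora_weights("./colonoscopy-lora")

# Generate one synthetic colonoscopy frame from a text prompt.
image = pipe(
    "endoscopic view of a colon polyp, clinical photography",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("synthetic_polyp.png")
```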
35 pages, 1085 KiB  
Article
Multi-Channel Speech Enhancement Using Labelled Random Finite Sets and a Neural Beamformer in Cocktail Party Scenario
by Jayanta Datta, Ali Dehghan Firoozabadi, David Zabala-Blanco and Francisco R. Castillo-Soria
Appl. Sci. 2025, 15(6), 2944; https://doi.org/10.3390/app15062944 (registering DOI) - 8 Mar 2025
Viewed by 278
Abstract
In this research, a multi-channel target speech enhancement scheme is proposed that is based on a deep learning (DL) architecture and assisted by multi-source tracking using a labeled random finite set (RFS) framework. A neural minimum variance distortionless response (MVDR) beamformer is adopted, in which a residual dense convolutional graph U-Net is applied in a generative adversarial network (GAN) setting to model the beamformer for target speech enhancement under reverberant conditions involving multiple moving speech sources. The input dataset for this neural architecture is constructed by applying multi-sensor generalized labeled multi-Bernoulli (MS-GLMB) filtering, which belongs to the labeled RFS framework, to estimate the sources' positions and their associated labels at each time frame with high accuracy under undesirable factors like reverberation and background noise. The tracked positions and labels help to correctly discriminate the target source from the interferers across all time frames and to generate time–frequency (T-F) masks corresponding to the target source from the output of a time-varying MVDR beamformer. These T-F masks constitute the target label set used to train the proposed deep neural architecture to perform target speech enhancement. The exploitation of MS-GLMB filtering and the time-varying MVDR beamformer provides spatial information about the sources, in addition to spectral information, within the neural speech enhancement framework during training. Moreover, the GAN framework exploits adversarial optimization as an alternative to maximum likelihood (ML)-based frameworks, which further boosts target speech enhancement under reverberant conditions. Computer simulations demonstrate that the proposed approach outperforms existing state-of-the-art DL-based methods that do not incorporate the labeled RFS-based approach: it achieves 75% ESTOI and a PESQ of 2.70, compared with 46.74% ESTOI and a PESQ of 1.84 for Mask-MVDR with a self-attention mechanism at a reverberation time (RT60) of 550 ms.
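For reference, the classical MVDR beamformer underlying this architecture computes, per frequency bin, the weights w = R⁻¹d / (dᴴR⁻¹d) for a noise covariance R and steering vector d. A minimal NumPy sketch follows; the array geometry and toy signals are illustrative assumptions:

```python
import numpy as np

def mvdr_weights(R: np.ndarray, d: np.ndarray) -> np.ndarray:
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    Rinv_d = np.linalg.solve(R, d)          # R^{-1} d without an explicit inverse
    return Rinv_d / (d.conj().T @ Rinv_d)

# Toy example: 4-microphone array, single frequency bin.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 1000)) + 1j * rng.standard_normal((4, 1000))
R = noise @ noise.conj().T / noise.shape[1]   # sample noise covariance
d = np.exp(-1j * np.pi * np.arange(4) * np.sin(np.deg2rad(30)))  # steering vector, 30° look direction
w = mvdr_weights(R, d)
print(abs(w.conj().T @ d))  # distortionless constraint: |w^H d| ≈ 1
```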
24 pages, 8059 KiB  
Article
MMRAD-Net: A Multi-Scale Model for Precise Building Extraction from High-Resolution Remote Sensing Imagery with DSM Integration
by Yu Gao, Huiming Chai and Xiaolei Lv
Remote Sens. 2025, 17(6), 952; https://doi.org/10.3390/rs17060952 - 7 Mar 2025
Viewed by 127
Abstract
High-resolution remote sensing imagery (HRRSI) presents significant challenges for building extraction tasks due to its complex terrain structures, multi-scale features, and rich spectral and geometric information. Traditional methods often face limitations in effectively integrating multi-scale features while maintaining a balance between detailed and global semantic information. To address these challenges, this paper proposes an innovative deep learning network, Multi-Source Multi-Scale Residual Attention Network (MMRAD-Net). This model is built upon the classical encoder–decoder framework and introduces two key components: the GCN OA-SWinT Dense Module (GSTDM) and the Res DualAttention Dense Fusion Block (R-DDFB). Additionally, it incorporates Digital Surface Model (DSM) data, presenting a novel feature extraction and fusion strategy. Specifically, the model enhances building extraction accuracy and robustness through hierarchical feature modeling and a refined cross-scale fusion mechanism, while effectively preserving both detail information and global semantic relationships. Furthermore, we propose a Hybrid Loss, which combines Binary Cross-Entropy Loss (BCE Loss), Dice Loss, and an edge-sensitive term to further improve the precision of building edges and foreground reconstruction capabilities. Experiments conducted on the GF-7 and WHU datasets validate the performance of MMRAD-Net, demonstrating its superiority over traditional methods in boundary handling, detail recovery, and adaptability to complex scenes. On the GF-7 Dataset, MMRAD-Net achieved an F1-score of 91.12% and an IoU of 83.01%. On the WHU Building Dataset, the F1-score and IoU were 94.04% and 88.99%, respectively. Ablation studies and transfer learning experiments further confirm the rationality of the model design and its strong generalization ability. These results highlight that innovations in multi-source data fusion, multi-scale feature modeling, and detailed feature fusion mechanisms have enhanced the accuracy and robustness of building extraction.
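A hybrid loss of the shape described above (BCE + Dice + an edge-sensitive term) can be sketched compactly. In the sketch below the weighting coefficients and the Sobel-gradient edge term are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target, w_bce=1.0, w_dice=1.0, w_edge=0.5, eps=1e-7):
    """BCE + Dice + edge-sensitive term for binary segmentation (weights are assumptions)."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)

    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

    # Edge term: L1 distance between Sobel gradients of prediction and target.
    sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    gx_p, gx_t = F.conv2d(prob, sobel, padding=1), F.conv2d(target, sobel, padding=1)
    gy_p, gy_t = (F.conv2d(prob, sobel.transpose(2, 3), padding=1),
                  F.conv2d(target, sobel.transpose(2, 3), padding=1))
    edge = (gx_p - gx_t).abs().mean() + (gy_p - gy_t).abs().mean()

    return w_bce * bce + w_dice * dice + w_edge * edge

# Usage: logits and target are (N, 1, H, W) tensors.
loss = hybrid_loss(torch.randn(2, 1, 64, 64),
                   torch.randint(0, 2, (2, 1, 64, 64)).float())
```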
15 pages, 3702 KiB  
Article
Multiple Differential Convolution and Local-Variation Attention UNet: Nucleus Semantic Segmentation Based on Multiple Differential Convolution and Local-Variation Attention
by Xiaoming Sun, Shilin Li, Yongji Chen, Junxia Chen, Hao Geng, Kun Sun, Yuemin Zhu, Bochao Su and Hu Zhang
Electronics 2025, 14(6), 1058; https://doi.org/10.3390/electronics14061058 - 7 Mar 2025
Viewed by 81
Abstract
Accurate nucleus segmentation is a crucial task in biomedical image analysis. While convolutional neural networks (CNNs) have achieved notable progress in this field, challenges remain due to the complexity and heterogeneity of cell images, especially in overlapping regions of nuclei. To address the limitations of current methods, we propose a mechanism of multiple differential convolution and local-variation attention in CNNs, leading to the so-called multiple differential convolution and local-variation attention U-Net (MDLA-UNet). The multiple differential convolution employs multiple differential operators to capture gradient and direction information, improving the network's capability to detect edges. The local-variation attention uses a level-1 Haar discrete wavelet transform to obtain approximate features, and then derives high-frequency features to enhance the global context and local detail variation of the feature maps. Results on the MoNuSeg, TNBC, and CryoNuSeg datasets demonstrate that the proposed method outperforms existing methods on cells with complex boundaries and details. The proposed MDLA-UNet thus captures fine edges and details in feature maps, improving the segmentation of nuclei with blurred boundaries and overlapping regions.
(This article belongs to the Special Issue Feature Papers in "Computer Science & Engineering", 2nd Edition)
Figures:
Graphical abstract.
Figure 1. The network architecture of the MDLA-UNet.
Figure 2. The multiple differential convolution block.
Figure 3. HD operators for horizontal differential convolution blocks.
Figure 4. Illustration of the Haar DWT.
Figure 5. The local-variation attention block.
Figure 6. Comparison experiments against state-of-the-art models on the MoNuSeg, TNBC, and CryoNuSeg datasets.
Figure 7. Findings from the ablation experiments on the MoNuSeg, TNBC, and CryoNuSeg datasets.
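The level-1 Haar decomposition used in the local-variation attention splits a feature map into one low-frequency approximation and three high-frequency detail sub-bands at half resolution. A minimal PyTorch sketch with stride-2 Haar filters follows; this is a generic formulation, not the paper's exact implementation (sub-band orientation naming conventions vary):

```python
import torch
import torch.nn.functional as F

def haar_dwt_level1(x: torch.Tensor):
    """Level-1 2D Haar DWT: returns (LL, LH, HL, HH) sub-bands at half resolution.
    x: (N, C, H, W) with even H and W."""
    lo = torch.tensor([1.0, 1.0]) / 2 ** 0.5   # Haar low-pass
    hi = torch.tensor([1.0, -1.0]) / 2 ** 0.5  # Haar high-pass
    kernels = torch.stack([
        torch.outer(lo, lo),  # LL: approximation
        torch.outer(hi, lo),  # LH: detail sub-band
        torch.outer(lo, hi),  # HL: detail sub-band
        torch.outer(hi, hi),  # HH: diagonal detail
    ]).unsqueeze(1)                            # (4, 1, 2, 2)
    n, c, h, w = x.shape
    out = F.conv2d(x.reshape(n * c, 1, h, w), kernels, stride=2)
    out = out.reshape(n, c, 4, h // 2, w // 2)
    return out[:, :, 0], out[:, :, 1], out[:, :, 2], out[:, :, 3]

ll, lh, hl, hh = haar_dwt_level1(torch.randn(1, 64, 32, 32))
print(ll.shape)  # torch.Size([1, 64, 16, 16])
```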
22 pages, 9220 KiB  
Article
E2S: A UAV-Based Levee Crack Segmentation Framework Using the Unsupervised Deblurring Technique
by Fangyi Wang, Zhaoli Wang, Xushu Wu, Di Wu, Haiying Hu, Xiaoping Liu and Yan Zhou
Remote Sens. 2025, 17(5), 935; https://doi.org/10.3390/rs17050935 - 6 Mar 2025
Viewed by 122
Abstract
The accurate detection and monitoring of levee cracks is critical for maintaining the structural integrity and safety of flood protection infrastructure. Yet the application of UAVs for automatic, rapid levee crack detection remains limited, and there is a lack of effective deblurring methods specifically tailored to UAV-based levee crack images. In this study, we present E2S, a novel two-stage framework specifically designed for UAV-based levee crack segmentation, which leverages an unsupervised deblurring technique to enhance image quality. In the first stage, we introduce an Improved CycleGAN model that performs motion deblurring on UAV-captured images, effectively enhancing crack visibility while preserving crucial structural details. The enhanced images are then fed into the second stage, where an Attention U-Net is employed for precise crack segmentation. The experimental results demonstrate that the E2S framework significantly outperforms traditional supervised models, achieving an F1-score of 81.3% and a crack IoU of 71.84%, surpassing the best-performing baseline, UNet++. The findings confirm that integrating unsupervised image enhancement can substantially benefit downstream segmentation tasks, providing a robust and scalable solution for automated levee crack monitoring.
Figures:
Figure 1. (a) The geographical location of the study area. (b) A satellite image of the investigated levee section. (c) A UAV-captured photograph of the levee during data collection.
Figure 2. The architecture of the generator in the Improved CycleGAN model.
Figure 3. The architecture of the Improved CycleGAN model, incorporating RDB and an enhanced loss function.
Figure 4. The architecture of the attention gate.
Figure 5. The architecture of the Attention U-Net.
Figure 6. The overall workflow of the E2S methodology.
Figure 7. An illustration of the SIFT-based offset distance calculation. (a) Detection and matching of SIFT feature points between the original UAV image (left) and the enhanced image (right); green circles mark detected feature points, and connecting lines indicate matched points across the two images. (b) A conceptual representation of matched feature points and the offset distance calculation; blue points denote feature positions in the original image, and yellow points their corresponding positions in the enhanced image. The Euclidean distance between each matched point pair quantifies the displacement introduced by the enhancement process, serving as the SIFT-based offset distance.
Figure 8. A visual comparison of crack image enhancement results from the ablation experiments. (a) Input images. (b) Baseline-CycleGAN. (c) SSIM-CycleGAN. (d) RDB-CycleGAN. (e) Improved CycleGAN.
Figure 9. Crack segmentation results of five networks on the test set. (a) Input image. (b) Ground truth. (c) E2S. (d) UNet++. (e) Attention U-Net. (f) U-Net. (g) DeepLabv3+.
Figure 10. A comparison of False Negative (FN) performance among E2S, Attention U-Net, and U-Net++. The window highlights the area prone to FN errors; FN pixels are shown in yellow. "Attn" denotes Attention.
Figure 11. Instances of artifacts leading to incorrect segmentation results. (a,b) Original UAV-captured images. (c,d) The corresponding enhanced images, where artifacts impacted segmentation accuracy.
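The attention gate of Figure 4 follows the standard additive formulation from the original Attention U-Net (Oktay et al.): a gating signal from the decoder re-weights encoder skip features before concatenation. A minimal PyTorch sketch (channel sizes are illustrative, and equal spatial sizes of skip and gate are assumed for brevity):

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate for U-Net skip connections."""
    def __init__(self, f_skip: int, f_gate: int, f_int: int):
        super().__init__()
        self.w_x = nn.Conv2d(f_skip, f_int, kernel_size=1)   # project skip features
        self.w_g = nn.Conv2d(f_gate, f_int, kernel_size=1)   # project gating signal
        self.psi = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(f_int, 1, kernel_size=1),
            nn.Sigmoid(),                                    # attention coefficients in [0, 1]
        )

    def forward(self, x, g):
        # x: encoder skip features; g: decoder gating signal (same spatial size assumed).
        alpha = self.psi(self.w_x(x) + self.w_g(g))          # (N, 1, H, W)
        return x * alpha                                     # suppress irrelevant regions

gate = AttentionGate(f_skip=64, f_gate=64, f_int=32)
out = gate(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```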
24 pages, 4633 KiB  
Article
Load Equipment Segmentation and Assessment Method Based on Multi-Source Tensor Feature Fusion
by Xiaoli Zhang, Congcong Zhao, Wenjie Lu and Kun Liang
Electronics 2025, 14(5), 1040; https://doi.org/10.3390/electronics14051040 - 5 Mar 2025
Viewed by 294
Abstract
The state monitoring of power load equipment plays a crucial role in ensuring its normal operation. However, in densely deployed environments, the target equipment often exhibits low clarity, making real-time warnings challenging. In this study, a load equipment segmentation and assessment method based on multi-source tensor feature fusion (LSA-MT) is proposed. First, a lightweight residual block based on the attention mechanism is introduced into the backbone network to emphasize key features of load devices and enhance target segmentation efficiency. Second, a 3D edge detail feature perception module is designed to facilitate multi-scale feature fusion while preserving the boundary detail features of different devices, thereby improving local recognition accuracy. Finally, tensor decomposition and reorganization are employed to guide visual feature reconstruction in conjunction with equipment monitoring images, while tensor mapping of equipment monitoring data is utilized for automated fault classification. The experimental results demonstrate that LSA-MT produces visually clearer segmentations than models such as the classic UNet++ and the more recent EGE-UNet when segmenting multiple load devices, achieving Dice and mIoU scores of 92.48 and 92.90, respectively. Regarding classification across the four datasets, the average accuracy reaches 92.92%. These findings demonstrate the effectiveness of the LSA-MT method for load equipment fault alarms and grid operation and maintenance.
Figures:
Figure 1. The architecture of LSA-MT.
Figure 2. The architecture of LRB-AM.
Figure 3. The architecture of 3DPM.
Figure 4. The architecture of EA-MTF.
Figure 5. Load equipment visual feature representation and enhancement process.
Figure 6. Multi-load device segmentation visualization results. Column (a) shows a device load image used for testing, and column (b) a manually labeled ground-truth segmentation. Columns (c–h) show the segmentation results of the comparison models, and column (i) the segmentation results achieved using LSA-MT.
Figure 7. Trend graph of load equipment segmentation assessment results.
Figure 8. Trends in metrics for each model on equipment state assessments and open datasets.
Figure 9. Grad-CAM visualization results of ablation experiments.
23 pages, 1078 KiB  
Article
Enhanced U-Net for Infant Brain MRI Segmentation: A (2+1)D Convolutional Approach
by Lehel Dénes-Fazakas, Levente Kovács, György Eigner and László Szilágyi
Sensors 2025, 25(5), 1531; https://doi.org/10.3390/s25051531 - 28 Feb 2025
Viewed by 345
Abstract
Background: Infant brain tissue segmentation from MRI data is a critical task in medical imaging, particularly challenging due to the evolving nature of tissue contrasts in the early months of life. The difficulty increases as gray matter (GM) and white matter (WM) intensities converge, making accurate segmentation challenging. This study aims to develop an improved U-net-based model to enhance the precision of automatic segmentation of cerebro-spinal fluid (CSF), GM, and WM in 10 infant brain MRIs using the iSeg-2017 dataset. Methods: The proposed method utilizes a U-net architecture with (2+1)D convolutional layers and skip connections. Preprocessing includes intensity normalization using histogram alignment to standardize MRI data across different records. The model was trained on the iSeg-2017 dataset, which comprises T1-weighted and T2-weighted MRI data from ten infant subjects. Cross-validation was performed to evaluate the model's segmentation performance. Results: The model achieved an average accuracy of 92.2%, improving on previous methods by 0.7%. Sensitivity, precision, and Dice similarity scores were used to evaluate the performance, showing high levels of accuracy across different tissue types. The model demonstrated a slight bias toward misclassifying GM and WM, indicating areas for potential improvement. Conclusions: The results suggest that the U-net architecture is highly effective in segmenting infant brain tissues from MRI data. Future work will explore enhancements such as attention mechanisms and dual-network processing to further improve segmentation accuracy.
Figures:
Figure 1. (2+1)-dimensional convolution.
Figure 2. Workflow of the U-net approach. The input, a 3-dimensional MRI scan, is passed to the U-net, which processes it using (2+1)D convolution; the segmented images are obtained at the output of the last layer.
Figure 3. Structure of an encoder block.
Figure 4. The proposed (2+1)D U-net architecture.
Figure 5. Structure of the bridge part.
Figure 6. Structure of a decoder block.
Figure 7. Boxplot of different segmentation benchmark metrics.
Figure 8. Benchmark values obtained for individual records and tissue types in panels (a–c); accuracy rates obtained for individual records in panel (d).
Figure 9. All slices of a segmented brain. The three shades of green, from dark to light, represent correctly segmented pixels of the three main tissue types (CSF, GM, and WM, respectively), while red indicates misclassified pixels.
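A (2+1)D convolution factorizes a full 3D convolution into a 2D spatial convolution followed by a 1D convolution along the third axis, reducing parameters and inserting an extra nonlinearity between the two factors. A minimal PyTorch sketch follows; the channel counts are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """Factorized 3D convolution: (1, k, k) spatial conv, then (k, 1, 1) along depth."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        p = k // 2
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, k, k), padding=(0, p, p))
        self.relu = nn.ReLU(inplace=True)     # nonlinearity between the two factors
        self.depth = nn.Conv3d(mid_ch, out_ch, kernel_size=(k, 1, 1), padding=(p, 0, 0))

    def forward(self, x):                     # x: (N, C, D, H, W)
        return self.depth(self.relu(self.spatial(x)))

block = Conv2Plus1D(in_ch=2, mid_ch=16, out_ch=16)  # e.g., T1- and T2-weighted channels
out = block(torch.randn(1, 2, 8, 64, 64))
print(out.shape)  # torch.Size([1, 16, 8, 64, 64])
```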
17 pages, 72606 KiB  
Article
Classification of Large Scale Hyperspectral Remote Sensing Images Based on LS3EU-Net++
by Hengqian Zhao, Zhengpu Lu, Shasha Sun, Pan Wang, Tianyu Jia, Yu Xie and Fei Xu
Remote Sens. 2025, 17(5), 872; https://doi.org/10.3390/rs17050872 - 28 Feb 2025
Viewed by 165
Abstract
To address the limitation that existing hyperspectral classification methods are mainly oriented to small-scale images, this paper proposes a new large-scale hyperspectral remote sensing image classification method, LS3EU-Net++ (Lightweight Encoder and Integrated Spatial Spectral Squeeze and Excitation U-Net++). The method optimizes the U-Net++ architecture by introducing a lightweight encoder and the Spatial Spectral Squeeze and Excitation (S3E) attention module, maintaining powerful feature extraction capability while significantly reducing the training cost. In addition, the model employs a composite loss function combining focal loss and Jaccard loss, which focuses more on difficult samples, thus improving pixel-level accuracy and classification results. To solve the sample imbalance problem in hyperspectral images, this paper also proposes a data enhancement strategy based on "copy–paste", which effectively increases the diversity of the training dataset. Experiments on large-scale satellite hyperspectral remote sensing images from the Zhuhai-1 satellite demonstrate that LS3EU-Net++ is superior to the U-Net++ benchmark: the overall accuracy (OA) improves by 5.35%, and the mean Intersection over Union (mIoU) by 12.4%. These findings suggest that the proposed method provides a robust solution for large-scale hyperspectral image classification, effectively balancing accuracy and computational efficiency.
(This article belongs to the Topic Hyperspectral Imaging and Signal Processing)
Figures:
Figure 1. Schematic diagram of the proposed method.
Figure 2. Percentage of samples per class in the large-scale dataset.
Figure 3. Data-enhanced samples and their corresponding original samples: (a) is the original sample of (d), (b) of (e), and (c) of (f). The red boxes show the portion of bare soil that changed after data enhancement.
Figure 4. Rate of change in the percentage of samples from each class after data augmentation.
Figure 5. Schematic diagram of (a) a common residual CNN and (b) the lightweight MobileNetV2.
Figure 6. Schematic diagram of the S3E module.
Figure 7. True-color display of one sub-image and its corresponding ground truth from LHSI-A.
Figure 8. Results of the test set experiment, where green is vegetation, red is buildings, blue is water, yellow is bare soil, and black is background: (f–j) are the labeled true-color display plots for (a–e), and (k–o) the predicted plots for (a–e), respectively.
Figure 9. True-color display (a), ground-truth map (b), and the predicted map of LS3EU-Net++ (c) on the LHSI-B dataset, where green is vegetation, red is buildings, blue is water, and yellow is bare soil.
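A composite focal-plus-Jaccard objective of the kind described above can be sketched in a few lines; the values of γ and α and the equal weighting below are assumptions, not the paper's tuned hyperparameters:

```python
import torch
import torch.nn.functional as F

def focal_jaccard_loss(logits, target, gamma=2.0, alpha=0.25, w_jaccard=1.0, eps=1e-7):
    """Focal loss (down-weights easy pixels) + Jaccard/IoU loss, binary case."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)          # probability of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    focal = (alpha_t * (1 - p_t) ** gamma * bce).mean()

    inter = (prob * target).sum()
    union = prob.sum() + target.sum() - inter
    jaccard = 1 - (inter + eps) / (union + eps)              # soft IoU loss

    return focal + w_jaccard * jaccard

loss = focal_jaccard_loss(torch.randn(2, 1, 64, 64),
                          torch.randint(0, 2, (2, 1, 64, 64)).float())
```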
26 pages, 4394 KiB  
Article
Neural Network Models for Prostate Zones Segmentation in Magnetic Resonance Imaging
by Saman Fouladi, Luca Di Palma, Fatemeh Darvizeh, Deborah Fazzini, Alessandro Maiocchi, Sergio Papa, Gabriele Gianini and Marco Alì
Information 2025, 16(3), 186; https://doi.org/10.3390/info16030186 - 28 Feb 2025
Viewed by 123
Abstract
Prostate cancer (PCa) is one of the most common tumors diagnosed in men worldwide, with approximately 1.7 million new cases expected by 2030. Most cancerous lesions in PCa are located in the peripheral zone (PZ); therefore, accurate identification of the location of the lesion is essential for effective diagnosis and treatment. Zonal segmentation in magnetic resonance imaging (MRI) scans is critical and plays a key role in pinpointing cancerous regions and treatment strategies. In this work, we report on the development of three advanced neural network-based models: one based on ensemble learning, one on Meta-Net, and one on YOLO-V8. They were tailored for the segmentation of the central gland (CG) and PZ using a small dataset of 90 MRI scans for training, 25 MRIs for validation, and 24 scans for testing. The ensemble learning method, combining U-Net-based models (Attention-Res-U-Net, Vanilla-Net, and V-Net), achieved an IoU of 79.3% and DSC of 88.4% for CG and an IoU of 54.5% and DSC of 70.5% for PZ on the test set. Meta-Net, used for the first time in segmentation, demonstrated an IoU of 78% and DSC of 88% for CG, while YOLO-V8 outperformed both models with an IoU of 80% and DSC of 89% for CG and an IoU of 58% and DSC of 73% for PZ.
(This article belongs to the Special Issue Detection and Modelling of Biosignals)
Figures:
Figure 1. The architecture and details of the Att-R-Net neural network: (a) the overall network and (b) the architecture of the attention block. The Double Conv contains two convolution layers with the ReLU activation function, and the gating signal contains one convolution layer with the ReLU activation function.
Figure 2. The architecture and details of the Vanilla-Net neural network.
Figure 3. The architecture and details of the V-Net neural network: (a) the overall network and (b) the architecture of the V-Net block.
Figure 4. The structure of Meta-Net.
Figure 5. Confusion matrix of each judge-ANN.
Figure 6. Diagram of the YOLOv8 network structure (figure by the authors). The CBS component consists of convolution, batch normalization, and SiLU activation functions; the SPPF is built from three tiers of max-pooling integrated with two CBS units, as in [44].
Figure 7. Segmentation results of the prostate zones using the ensemble model on three examples from the test set. Columns from left to right show the original image, original mask, and predicted mask of CG and PZ (brown: PZ; light green: CG).
Figure 8. Segmentation results of Meta-Net: (left) original image, (middle) ground truth, and (right) predicted segmentation mask.
Figure 9. Detection and segmentation results of YOLO-V8: (left) original image, (middle) ground truth, and (right) predicted segmentation mask.
Figure 10. Comparison of the IoU and DSC results obtained on the test set.
Figure 11. Comparison of the DSC results for CG segmentation from related works and the models of this study on the test set [12–18,21].
Figure 12. Comparison of the DSC results for PZ segmentation from related works and the models of this study on the test set [12–21].
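Ensembling several U-Net-style segmenters is commonly done either by averaging their per-pixel probabilities or by majority-voting their binarized masks. The abstract does not spell out the ensembling rule, so the sketch below shows both generic strategies under that assumption, with the three models treated as opaque callables:

```python
import torch

def ensemble_probs(models, x):
    """Soft ensemble: average per-pixel foreground probabilities."""
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(m(x)) for m in models])  # (M, N, 1, H, W)
    return probs.mean(dim=0)

def ensemble_vote(models, x, thresh=0.5):
    """Hard ensemble: majority vote over binarized masks."""
    with torch.no_grad():
        masks = torch.stack([(torch.sigmoid(m(x)) > thresh).float() for m in models])
    return (masks.mean(dim=0) > 0.5).float()

# Usage (hypothetical names): models = [att_res_unet, vanilla_net, v_net]
# mask = ensemble_vote(models, batch)
```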
19 pages, 21661 KiB  
Article
U-SwinFusionNet: High Resolution Snow Cover Mapping on the Tibetan Plateau Based on FY-4A
by Xi Kan, Xu Liu, Zhou Zhou, Jing Wang, Linglong Zhu, Lei Gong and Jiangeng Wang
Water 2025, 17(5), 706; https://doi.org/10.3390/w17050706 - 28 Feb 2025
Viewed by 152
Abstract
The Qinghai–Tibet Plateau (QTP), one of China’s most snow-rich regions, has an extremely fragile ecosystem, with drought being the primary driver of ecological degradation. Given that the water resources in this region predominantly exist in the form of snow, high-spatiotemporal-resolution snow mapping is essential for understanding snow distribution and managing snow water resources effectively. However, although FY-4A/AGRI is capable of obtaining wide-area remote sensing data, only the first to third bands have a resolution of 1 km, which greatly limits its ability to produce high-resolution snow maps. This study proposes U-SwinFusionNet (USFNet), a deep learning-based snow cover retrieval algorithm that leverages the multi-scale advantages of FY-4A/AGRI remote sensing data in the shortwave infrared and visible bands. By integrating 1 km and 2 km resolution remote sensing imagery with auxiliary terrain information, USFNet effectively enhances snow cover mapping accuracy. The proposed model innovatively combines Swin Transformer and convolutional neural networks (CNNs) to capture both global contextual information and local spatial details. Additionally, an Attention Feature Fusion Module (AFFM) is introduced to align and integrate features from different modalities through an efficient attention mechanism, while the Feature Complementation Module (FCM) facilitates interactions between the encoded and decoded features. As a result, USFNet produces snow cover maps with a spatial resolution of 1 km. Experimental comparisons with Artificial Neural Networks (ANNs), Random Forest (RF), U-Net, and ResNet-FSC demonstrate that USFNet exhibits superior robustness, enhanced snow cover continuity, and lower error rates. The model achieves a correlation coefficient of 0.9126 and an R2 of 0.7072. Compared to the MOD10A1 snow product, USFNet demonstrates an improved sensitivity to fragmented and low-snow-cover areas while ensuring more natural snow boundary transitions.
Figures:
Figure 1. Overall structure of the USFNet.
Figure 2. Schematic diagram of the study area. The outlined regions represent the areas included in the dataset after filtering.
Figure 3. Intermediate image of the FSC true-value labeling process.
Figure 4. Structural diagrams of the model's modules: (a) CSFEM structure, (b) AFFM structure, (c) FSM structure.
Figure 5. Loss curves. "Train loss with 480 TIF" and "Validation loss with 480 TIF" represent the train and validation loss curves over the entire training set; "Train loss with 240 TIF" and "Validation loss with 240 TIF" correspond to the curves computed on half of the training set.
Figure 6. Comparison of inversion results from the various models. The highlighted areas show where the differences between the methods are most significant.
Figure 7. Comparison between the FSC estimation results of USFNet and the MOD10A1 snow product. The highlighted area shows the significant difference between the MOD10A1 FSC and FY-4A/AGRI FSC. The spatial resolution of FY-4A, Landsat8 FSC, MOD10A1 FSC, and OUR FSC is 1 km; the MOD10A1 NDSI has a native resolution of 500 m.
Figure 8. FY-4A/AGRI remote sensing images of the area in Figure 7 from 9:00 a.m. to 2:00 p.m.
15 pages, 1792 KiB  
Article
Application Research on Contour Feature Extraction of Solidified Region Image in Laser Powder Bed Fusion Based on SA-TransUNet
by Mengxiang Dang, Xin Zhou, Guorong Huang, Xuede Wang, Ting Zhang, Ying Tian, Guoquan Ding and Hanyu Gao
Appl. Sci. 2025, 15(5), 2602; https://doi.org/10.3390/app15052602 - 28 Feb 2025
Viewed by 289
Abstract
The solidified state after melting of the forming layer in laser powder bed fusion (LPBF) directly reflects the final forming quality. Compared with the powder layer and the melt pool, defects in contour parts are easier to recognize and remove in time by monitoring and processing the solidified region after the forming layer melts. To explore the use of solidified region images for defect contour detection in the forming layer, an improved image segmentation model based on TransUNet is designed to extract image features of the solidified region as process data, on which basis this paper analyzes the similarities and differences between forming process data and CT scan data. Addressing the large data volume and significant feature scale variation of solidified region images obtained during the LPBF process, an SA-TransUNet semantic segmentation model integrating an SE attention mechanism and an ASPP multi-scale feature extraction module is developed to achieve high-precision solidified region image segmentation, with an IoU and Dice coefficient of up to 94.24% and 97.02%, respectively. By extracting the solidified region image of the LPBF forming layer with this model and computing the geometric feature values of its contour, a comparative analysis is conducted against the corresponding contour geometric feature values of CT scan images of the formed part, verifying the feasibility of the proposed solidified region image extraction method for contour defect detection.
(This article belongs to the Section Additive Manufacturing Technologies)
Figures:
Figure 1. Data preparation flowchart. (a) Schematic diagram of the installation of monitoring equipment; (b) forming layer images; (c) data-enhanced images.
Figure 2. SA-TransUNet network model. (a) Schematic of the Transformer layer; (b) architecture of the proposed SA-TransUNet.
Figure 3. SE attention module.
Figure 4. ASPP multi-scale feature extraction module.
Figure 5. Visualization of comparison results.
Figure 6. The correlation between solidified region images and CT slice images. (a) Trends in geometric feature changes; (b) slice layers corresponding to turning points. The orange dash-dotted lines match the layer numbers at the turning points of the curves with the marked uppercase letters for subsequent analysis.
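The SE (squeeze-and-excitation) attention named above is a standard channel-attention block: global average pooling, a two-layer bottleneck MLP, and a sigmoid rescaling of the channels. A minimal PyTorch sketch follows; the reduction ratio of 16 is the common default from the original SE paper, an assumption rather than this paper's setting:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (Hu et al., 2018)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze to a bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # excite back to full width
            nn.Sigmoid(),                                # per-channel weights in [0, 1]
        )

    def forward(self, x):                                # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))                  # global average pool -> (N, C)
        return x * w[:, :, None, None]                   # rescale each channel

se = SEBlock(64)
print(se(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```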
16 pages, 11961 KiB  
Article
Dual-Encoder UNet-Based Narrowband Uncooled Infrared Imaging Denoising Network
by Minghe Wang, Pan Yuan, Su Qiu, Weiqi Jin, Li Li and Xia Wang
Sensors 2025, 25(5), 1476; https://doi.org/10.3390/s25051476 - 27 Feb 2025
Viewed by 181
Abstract
Uncooled infrared imaging systems have significant potential in industrial hazardous gas leak detection. However, the use of narrowband filters to match gas spectral absorption peaks leads to a low level of incident energy captured by uncooled infrared cameras. This results in a mixture of fixed pattern noise and Gaussian noise, while existing denoising methods for uncooled infrared images struggle to effectively address this mixed noise, severely hindering the extraction and identification of actual gas leak plumes. This paper presents a UNet-structured dual-encoder denoising network specifically designed for narrowband uncooled infrared images. Based on the distinct characteristics of Gaussian random noise and row–column stripe noise, we developed a basic scale residual attention (BSRA) encoder and an enlarged scale residual attention (ESRA) encoder. These two encoder branches perform noise perception and encoding across different receptive fields, allowing for the fusion of noise features from both scales. The combined features are then input into the decoder for reconstruction, resulting in high-quality infrared images. Experimental results demonstrate that our method effectively denoises composite noise, achieving the best results according to both objective metrics and subjective evaluations. This research method significantly enhances the signal-to-noise ratio of narrowband uncooled infrared images, demonstrating substantial application potential in fields such as industrial hazardous gas detection, remote sensing imaging, and medical imaging.
(This article belongs to the Special Issue Optical Sensors for Industrial Applications)
Figures:
Figure 1. Noise characterization of narrowband infrared images.
Figure 2. Architecture of DER-UNet.
Figure 3. Structure of the RCAG module.
Figure 4. Denoising results for an architectural scene.
Figure 5. Denoising results for a street scene.
Figure 6. Comparative results of the ablation study.
Figure 7. Process of uncooled infrared image noise: (a) uncooled infrared image noise; (b) image noise following internal baffle calibration; (c) image noise after averaging multiple frames.
Figure 8. Denoising results for the real scene.
Figure 9. Denoising results for a gas plume image.
Figure 10. Denoising results of the gas plume image sequence.
20 pages, 3819 KiB  
Article
Research on Precise Segmentation and Center Localization of Weeds in Tea Gardens Based on an Improved U-Net Model and Skeleton Refinement Algorithm
by Zhiyong Cao, Shuai Zhang, Chen Li, Wei Feng, Baijuan Wang, Hao Wang, Ling Luo and Hongbo Zhao
Agriculture 2025, 15(5), 521; https://doi.org/10.3390/agriculture15050521 - 27 Feb 2025
Viewed by 166
Abstract
The primary objective of this research was to develop an efficient method for accurately identifying and localizing weeds in ecological tea garden environments, aiming to enhance the quality and yield of tea production. Weed competition poses a significant challenge to tea production, particularly due to the small size of weed plants, their color similarity to tea trees, and the complexity of their growth environment. A dataset comprising 5366 high-definition images of weeds in tea gardens has been compiled to address this challenge. An enhanced U-Net model, incorporating a Double Attention Mechanism and an Atrous Spatial Pyramid Pooling module, is proposed for weed recognition. The results of the ablation experiments show that the model significantly improves the recognition accuracy and the Mean Intersection over Union (MIoU), which are enhanced by 4.08% and 5.22%, respectively. In addition, to meet the demand for precise weed management, a method for determining the center of weed plants by integrating the center of mass and skeleton structure has been developed. The skeleton was extracted through a preprocessing step and a refinement algorithm, and the relative positional relationship between the intersection point of the skeleton and the center of mass was cleverly utilized to achieve up to 82% localization accuracy. These results provide technical support for the research and development of intelligent weeding equipment for tea gardens, which helps to maintain the ecology of tea gardens and improve production efficiency and also provides a reference for weed management in other natural ecological environments.
(This article belongs to the Special Issue Applications of Remote Sensing in Agricultural Soil and Crop Mapping)
Figures:
Figure 1. Sampling pictures: (a) Digitaria sanguinalis; (b) Setaria viridis; (c) Chenopodium.
Figure 2. Segmentation effect diagrams for several models: (a) original image; (b) hand-labeled; (c) U-Net; (d) DeepLabV3; (e) YOLOv8-Seg.
Figure 3. Segmentation effect comparison: (a) original image; (b) hand-labeled; (c) U-Net segmented image.
Figure 4. The improved U-Net network structure with CBAM and ASPP modules.
Figure 5. The overall framework of the centering approach.
Figure 6. Binary map of weed plants with noise removal: (a) before preprocessing; (b) after preprocessing.
Figure 7. Binary partitioning of weed plants: (a) before preprocessing; (b) central holes removed; (c) leaf adhesion removed.
Figure 8. Skeleton extraction results of two thinning algorithms: (a) original image; (b) Hilditch thinning skeleton; (c) Zhang–Suen thinning skeleton.
Figure 9. Comparison of the region-of-interest radius: (a) radius of r/4; (b) radius of r/8. The blue points are skeleton intersections, the red points are center-of-mass positions, and the green points are the center points obtained by the algorithm.
Figure 10. Schematic diagram of the center point of the algorithm label.
Figure 11. Center positioning effect.
Figure 12. The positioning deviation line chart.
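The skeleton-plus-centroid localization described above can be prototyped with standard tooling: scikit-image's `skeletonize` implements Zhang–Suen-style thinning, and image moments give the center of mass. The sketch below is a minimal illustration under those assumptions; the neighbor-count intersection test is a simple heuristic, not the paper's refinement algorithm:

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def weed_center(mask: np.ndarray):
    """Locate a plant center from a binary mask via skeleton intersections + centroid."""
    skel = skeletonize(mask > 0)                       # 1-pixel-wide skeleton
    cy, cx = ndimage.center_of_mass(mask > 0)          # center of mass of the blob

    # Skeleton intersections: skeleton pixels with 3+ skeleton neighbors.
    neighbors = ndimage.convolve(skel.astype(int), np.ones((3, 3), int),
                                 mode="constant") - skel
    ys, xs = np.nonzero(skel & (neighbors >= 3))
    if len(ys) == 0:
        return cy, cx                                  # fall back to the centroid
    # Pick the skeleton intersection closest to the center of mass.
    i = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return float(ys[i]), float(xs[i])

# Toy cross-shaped "plant" mask.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:44, 28:36] = 1                                 # stem
mask[28:36, 12:52] = 1                                 # leaves
print(weed_center(mask))
```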
19 pages, 1098 KiB  
Article
Deep Learning Based Pile-Up Correction Algorithm for Spectrometric Data Under High-Count-Rate Measurements
by Yiwei Huang, Xiaoying Zheng, Yongxin Zhu, Tom Trigano, Dima Bykhovsky and Zikang Chen
Sensors 2025, 25(5), 1464; https://doi.org/10.3390/s25051464 - 27 Feb 2025
Viewed by 123
Abstract
Gamma-ray spectroscopy is essential in nuclear science, enabling the identification of radioactive materials through energy spectrum analysis. However, high count rates lead to pile-up effects, resulting in spectral distortions that hinder accurate isotope identification and activity estimation. This phenomenon highlights the need for automated and precise approaches to pile-up correction. We propose a novel deep learning (DL) framework that plugs count rate information of pile-up signals into a 2D attention U-Net for energy spectrum recovery. The input to the model is an Energy–Duration matrix constructed from preprocessed pulse signals. Temporal and spatial features are jointly extracted, with count rate information embedded to enhance robustness under high count rate conditions. Training data were generated using an open-source simulator based on a public gamma spectrum database. The model's performance was evaluated using Kullback–Leibler (KL) divergence, Mean Squared Error (MSE), Energy Resolution (ER), and Full Width at Half Maximum (FWHM). Results indicate that the proposed framework effectively predicts accurate spectra, minimizing errors even under severe pile-up effects. This work provides a robust framework for addressing pile-up effects in gamma-ray spectroscopy, presenting a practical solution for automated, high-accuracy spectrum estimation. The integration of temporal and spatial learning techniques offers promising prospects for advancing high-activity nuclear analysis applications.
(This article belongs to the Special Issue Spectral Detection Technology, Sensors and Instruments, 2nd Edition)
Figures:
Figure 1. An example of the signal pile-up phenomenon and its effect on spectrum analysis. At a count rate of 0.05, that is 0.05 × 10⁷ cps, the duty cycle is 0.82 with a theoretical pile-up probability of 0.558 (numerical details are introduced in Section 4). (a) Illustration of electrical pulses and their stacking. (b) Raw pile-up signal and pile-up clusters (red). (c) The true spectrum of ¹³⁷Cs (blue) and the distorted spectrum (orange) obtained by directly processing the pile-up clusters.
Figure 2. An example of a 1:1 mixture of ¹³⁷Cs and ⁶⁰Co.
Figure 3. Workflow of the proposed method.
Figure 4. Schematic diagram of the neural network for embedding count rate features. The encoding and decoding of data are represented by white cubes.
Figure 5. Architecture of the Attention U-Net model used for Energy–Duration matrix recovery. The model processes the input Energy–Duration matrix (of size D1 × E1 × C1) through an encoder–decoder structure with skip connections and attention gates (AGs). The encoder extracts hierarchical features through convolution and max-pooling layers, reducing spatial dimensions (D, E) while increasing channel depth (C). The decoder progressively upsamples the feature maps, integrating encoder features via skip connections and attention mechanisms to selectively focus on relevant regions. The decoder output is the recovered matrix with the same spatial dimensions as the input.
Figure 6. Pile-up spectrum recovery results of the proposed method under different high count rates and multi-source measurements. (a) λ = 0.055, DC = 1.12, P = 0.674, ⁶⁰Co/¹³³Ba, KL = 0.07, MSE = 0.7 × 10⁻⁷. (b) λ = 0.265, DC = 5.41, P = 0.996, ⁶⁰Co/¹³³Ba, KL = 0.35, MSE = 32.14 × 10⁻⁷. (c) λ = 0.09, DC = 1.84, P = 0.841, ⁵⁷Co/²²Na, KL = 0.06, MSE = 0.80 × 10⁻⁷. (d) λ = 0.23, DC = 4.69, P = 0.991, ⁵⁷Co/²²Na, KL = 0.17, MSE = 3.37 × 10⁻⁷. (e) λ = 0.195, DC = 3.98, P = 0.981, ¹⁵²Eu/²²⁵Ac, KL = 0.20, MSE = 9.02 × 10⁻⁷. (f) λ = 0.265, DC = 5.41, P = 0.996, ¹⁵²Eu/²²⁵Ac, KL = 0.24, MSE = 7.43 × 10⁻⁷. The results demonstrate highly accurate peak estimation at increasing levels of pile-up and spectral distortion. The blue curve is the real energy spectrum; the orange curve is the estimate of the proposed method.
Figure 7. Comparison of spectra estimated by traditional methods and the proposed DL method (λ = 0.095, DC = 1.94, P = 0.856, ⁵⁷Co/²²Na), showing that the proposed approach provides more accurate estimations. (a) Blue: true spectrum; green: Method 1; orange: Method 2. (b) Blue: true spectrum; orange: proposed method.
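The KL divergence reported above compares the estimated spectrum against the ground truth treated as discrete probability distributions, KL(p ‖ q) = Σᵢ pᵢ log(pᵢ/qᵢ). A minimal NumPy sketch follows; the smoothing epsilon is an assumption to guard against empty bins:

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) between two spectra treated as discrete distributions."""
    p = np.asarray(p, dtype=float) + eps   # smooth to avoid log(0) / division by zero
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()        # normalize counts to probabilities
    return float(np.sum(p * np.log(p / q)))

# Toy example: a true spectrum vs. a slightly distorted estimate.
true_spec = np.array([0.0, 5.0, 40.0, 5.0, 0.5])
est_spec = np.array([0.2, 6.0, 37.0, 6.0, 1.0])
print(kl_divergence(true_spec, est_spec))  # small positive value
```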