Search Results (450)

Search Parameters:
Keywords = shape and image texture

14 pages, 933 KiB  
Systematic Review
Diagnostic Accuracy of Radiomics in the Early Detection of Pancreatic Cancer: A Systematic Review and Qualitative Assessment Using the Methodological Radiomics Score (METRICS)
by María Estefanía Renjifo-Correa, Salvatore Claudio Fanni, Luis A. Bustamante-Cristancho, Maria Emanuela Cuibari, Gayane Aghakhanyan, Lorenzo Faggioni, Emanuele Neri and Dania Cioni
Cancers 2025, 17(5), 803; https://doi.org/10.3390/cancers17050803 - 26 Feb 2025
Viewed by 165
Abstract
Background/Objectives: Pancreatic ductal adenocarcinoma (PDAC) is an aggressive and lethal malignancy with increasing incidence and low survival rate, primarily due to the late detection of the disease. Radiomics has demonstrated its utility in recognizing patterns and anomalies not perceptible to the human eye. This systematic literature review aims to assess the application of radiomics in the analysis of pancreatic parenchyma images to identify early indicators predictive of PDAC. Methods: A systematic search of original research papers was performed on three databases: PubMed, Embase, and Scopus. Two reviewers applied the inclusion and exclusion criteria, and one expert solved conflicts for selecting the articles. After extraction and analysis of the data, there was a quality assessment of these articles using the Methodological Radiomics Score (METRICS) tool. The METRICS assessment was carried out by two raters, and conflicts were solved by a third reviewer. Results: Ten articles for analysis were retrieved. CT scan was the diagnostic imaging used in all the articles. All the studies were retrospective and published between 2019 and 2024. The main objective of the articles was to generate radiomics-based machine learning models able to differentiate pancreatic tumors from healthy tissue. The reported diagnostic performance of the model chosen yielded very high results, with a diagnostic accuracy between 86.5% and 99.2%. Texture and shape features were the most frequently implemented. The METRICS scoring assessment demonstrated that three articles obtained a moderate quality, five a good quality, and, finally, two articles yielded excellent quality. The lack of external validation and available model, code, and data were the major limitations according to the qualitative assessment. Conclusions: There is high heterogeneity in the research question regarding radiomics and pancreatic cancer. The principal limitations of the studies were mainly due to the nature of the trials and the considerable heterogeneity of the radiomic features reported. Nonetheless, the work in this field is promising, and further studies are still required to adopt radiomics in the early detection of PDAC. Full article
(This article belongs to the Special Issue Multimodality Imaging for More Precise Radiotherapy)
Show Figures
Figure 1: Study selection process flowchart according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [18].
Figure 2: Distribution of METRICS quality categorization.
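The review finds that texture and shape features were the radiomics features most frequently implemented across the ten studies. As a rough illustration only (not taken from any of the reviewed papers), the Python sketch below shows how such features can be pulled from a CT volume and a tumor mask with the open-source pyradiomics package; the file names and enabled feature classes are placeholders.

# Minimal sketch: shape and texture radiomics features from a CT volume and a
# segmentation mask using pyradiomics. Paths and settings are illustrative only.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("shape")       # 3D shape descriptors
extractor.enableFeatureClassByName("glcm")        # gray-level co-occurrence texture
extractor.enableFeatureClassByName("firstorder")  # intensity statistics

features = extractor.execute("pancreas_ct.nrrd", "tumor_mask.nrrd")  # placeholder files
for name, value in features.items():
    if not name.startswith("diagnostics"):        # skip pyradiomics metadata entries
        print(name, value)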
23 pages, 3368 KiB  
Article
SDKU-Net: A Novel Architecture with Dynamic Kernels and Optimizer Switching for Enhanced Shadow Detection in Remote Sensing
by Gilberto Alvarado-Robles, Isac Andres Espinosa-Vizcaino, Carlos Gustavo Manriquez-Padilla and Juan Jose Saucedo-Dorantes
Computers 2025, 14(3), 80; https://doi.org/10.3390/computers14030080 - 23 Feb 2025
Viewed by 668
Abstract
Shadows in remote sensing images often introduce challenges in accurate segmentation due to their variability in shape, size, and texture. To address these issues, this study proposes the Supervised Dynamic Kernel U-Net (SDKU-Net), a novel architecture designed to enhance shadow detection in complex remote sensing scenarios. SDKU-Net integrates dynamic kernel adjustment, a combined loss function incorporating Focal and Tversky Loss, and optimizer switching to effectively tackle class imbalance and improve segmentation quality. Using the AISD dataset, the proposed method achieved state-of-the-art performance with an Intersection over Union (IoU) of 0.8552, an F1-Score of 0.9219, an Overall Accuracy (OA) of 96.50%, and a Balanced Error Rate (BER) of 5.08%. Comparative analyses demonstrate SDKU-Net’s superior performance against established methods such as U-Net, U-Net++, MSASDNet, and CADDN. Additionally, the model’s efficient training process, requiring only 75 epochs, highlights its potential for resource-constrained applications. These results underscore the robustness and adaptability of SDKU-Net, paving the way for advancements in shadow detection and segmentation across diverse fields. Full article
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)
Show Figures
Figure 1: Methodological framework of the SDKU-Net model.
Figure 2: Architecture of SDKU-Net featuring dynamic kernel adjustments, Batch Normalization (BN), and residual connections for optimized segmentation.
Figure 3: Example of a mask overlaid on an original image from the AISD dataset, highlighting the imbalance between shadowed and non-shadowed regions.
Figure 4: Diagram of a dynamic convolutional block with residual addition: the blue block represents the input feature map; the purple blocks correspond to the dynamic convolutional layers with their kernel sizes, each followed by Batch Normalization (BN) and a ReLU activation; the light blue block represents the shortcut-connection convolution; and the gray block corresponds to the final output after residual addition and ReLU activation.
Figure 5: IoU evaluation process used to dynamically adjust kernel sizes and optimizers based on validation metrics.
Figure 6: Optimizer-switching mechanism alternating between Adam and Nadam based on model performance metrics.
Figure 7: Kernel size adjustment process based on patience counters and IoU performance metrics during training.
Figure 8: Qualitative comparison of shadow detection results for different methods: (a) original image, (b) ground truth, (c) U-Net (F1 = 0.9095), (d) U-Net++ (F1 = 0.9095), (e) DSSNet (F1 = 0.9120), (f) MSASDNet (F1 = 0.9352), (g) CADDN (F1 = 0.9045), (h) proposed method (F1 = 0.9293).
Figure 9: Comparison of shadow segmentation results for the second sample: (a) original image, (b) ground truth, (c) U-Net (F1: 0.9250), (d) U-Net++ (F1: 0.9374), (e) DSSNet (F1: 0.8912), (f) MSASDNet (F1: 0.9443), (g) CADDN (F1: 0.9272), (h) proposed method (SDKU-Net, F1: 0.9509).
Figure 10: Comparison of shadow segmentation results for the third sample: (a) original image, (b) ground truth, (c) U-Net (F1: 0.8964), (d) U-Net++ (F1: 0.9241), (e) DSSNet (F1: 0.9057), (f) MSASDNet (F1: 0.8667), (g) CADDN (F1: 0.9293), (h) proposed method (SDKU-Net, F1: 0.9565).
Figure 11: SDKU-Net segmentation results in complex areas, showcasing its ability to capture intricate shadow details.
Figure 12: Generalization of SDKU-Net, which detects shadows absent from the ground truth annotations.
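The abstract describes a combined loss built from Focal and Tversky terms to handle the class imbalance between shadowed and non-shadowed pixels. Below is a minimal PyTorch-style sketch of that combination for binary masks; the weights alpha, beta, gamma, and lam are assumptions for illustration, not the authors' published values.

# Hedged sketch of a combined Focal + Tversky loss for binary shadow segmentation.
import torch
import torch.nn.functional as F

def focal_tversky_combo(logits, target, alpha=0.7, beta=0.3, gamma=2.0, lam=0.5, eps=1e-6):
    """logits, target: float tensors of shape (N, 1, H, W); target values in {0, 1}."""
    prob = torch.sigmoid(logits)
    # Tversky index: penalizes false negatives and false positives asymmetrically.
    tp = (prob * target).sum(dim=(1, 2, 3))
    fn = ((1 - prob) * target).sum(dim=(1, 2, 3))
    fp = (prob * (1 - target)).sum(dim=(1, 2, 3))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    tversky_loss = (1 - tversky).mean()
    # Focal term: down-weights easy pixels so the rare shadow class is not swamped.
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)
    focal_loss = ((1 - p_t) ** gamma * bce).mean()
    return lam * focal_loss + (1 - lam) * tversky_loss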
22 pages, 9103 KiB  
Article
IRST-CGSeg: Infrared Small Target Detection Based on Clustering-Guided Graph Learning and Hierarchical Features
by Guimin Jia, Tao Chen, Yu Cheng and Pengyu Lu
Electronics 2025, 14(5), 858; https://doi.org/10.3390/electronics14050858 - 21 Feb 2025
Viewed by 282
Abstract
Infrared small target detection (IRSTD) aims to segment small targets from an infrared clutter background. However, the long imaging distance, complex background, and extremely limited number of target pixels pose great challenges for IRSTD. In this paper, we propose a new IRSTD method based on the deep graph neural network to fully extract and fuse the texture and structural information of images. Firstly, a clustering algorithm is designed to divide the image into several subgraphs as prior knowledge to guide the initialization of the graph structure of the infrared image, and the image texture features are integrated into graph construction. Then, a graph feature extraction module is designed, which guides nodes to interact with features within their subgraph via the adjacency matrix. Finally, a hierarchical graph texture feature fusion module is designed to concatenate and stack the structure and texture information at different levels to realize IRSTD. Extensive experiments have been conducted, and the experimental results demonstrate that the proposed method has a high intersection over union (IoU) and probability of detection (Pd) on public datasets and the self-constructed dataset, indicating that it has fine shape segmentation and accurate positioning for infrared small targets. Full article
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)
Show Figures
Figure 1: IRST-CGSeg structure.
Figure 2: ROC curve on the SCISTD1 dataset.
Figure 3: ROC curve on the SCISTD2 dataset.
Figure 4: Visual comparison of different methods on the SCISTD1 dataset. The red box represents correct detection, the yellow box false detection, and the blue box missed detection.
Figure 5: Visual comparison of different methods on the SCISTD2 dataset. The red box represents correct detection, the yellow box false detection, and the blue box missed detection.
Figure 6: Visualization maps of the impact of GFFB on the SCISTD1 dataset.
Figure 7: Visualization maps of the impact of GFFB on the SCISTD2 dataset.
Figure 8: Visualization maps of the hierarchical graph texture fusion mechanism.
Figure 9: Image samples with various complex backgrounds; (a–e) show building, ground, plant, sea, and sky–cloud backgrounds, respectively.
Figure 10: ROC curves under building environments.
Figure 11: ROC curves under ground environments.
Figure 12: ROC curves under plant environments.
Figure 13: ROC curves under sea environments.
Figure 14: ROC curves under sky–cloud environments.
Figure 15: Visual comparison of different methods on five backgrounds. The red box represents correct detection, the yellow box false detection, and the blue box missed detection.
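The abstract quotes IoU and probability of detection (Pd) as the evaluation metrics. The sketch below computes both from binary prediction and ground-truth masks; the hit criterion (any pixel overlap with a ground-truth blob counts as a detection) is a simplifying assumption rather than the paper's exact rule.

# Pixel-level IoU and a target-level probability of detection (Pd) for IRSTD masks.
import numpy as np
from scipy import ndimage

def iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

def probability_of_detection(pred, gt):
    labels, n_targets = ndimage.label(gt)   # each ground-truth blob is one small target
    if n_targets == 0:
        return 1.0
    hits = sum(bool(pred[labels == k].any()) for k in range(1, n_targets + 1))
    return hits / n_targets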
29 pages, 12323 KiB  
Article
Quantitative Remote Sensing Supporting Deep Learning Target Identification: A Case Study of Wind Turbines
by Xingfeng Chen, Yunli Zhang, Wu Xue, Shumin Liu, Jiaguo Li, Lei Meng, Jian Yang, Xiaofei Mi, Wei Wan and Qingyan Meng
Remote Sens. 2025, 17(5), 733; https://doi.org/10.3390/rs17050733 - 20 Feb 2025
Viewed by 284
Abstract
Small Target Detection and Identification (TDI) methods for Remote Sensing (RS) images are mostly inherited from the deep learning models of the Computer Vision (CV) field. Compared with natural images, RS images not only have common features such as shape and texture but also contain unique quantitative information such as spectral features. Therefore, RS TDI in the CV field, which does not use Quantitative Remote Sensing (QRS) information, has the potential to be explored. With the rapid development of high-resolution RS satellites, RS wind turbine detection has become a key research topic for power intelligent inspection. To test the effectiveness of integrating QRS information with deep learning models, the case of wind turbine TDI from high-resolution satellite images was studied. The YOLOv5 model was selected for research because of its stability and high real-time performance. The following methods for integrating QRS and CV for TDI were proposed: (1) Surface reflectance (SR) images obtained using quantitative Atmospheric Correction (AC) were used to make wind turbine samples, and SR data were input into the YOLOv5 model (YOLOv5_AC). (2) A Convolutional Block Attention Module (CBAM) was added to the YOLOv5 network to focus on wind turbine features (YOLOv5_AC_CBAM). (3) Based on the identification results of YOLOv5_AC_CBAM, the spectral, geometric, and textural features selected using expert knowledge were extracted to conduct threshold re-identification (YOLOv5_AC_CBAM_Exp). Accuracy increased from 90.5% to 92.7%, then to 93.2%, and finally to 97.4%. The integration of QRS and CV for TDI showed tremendous potential to achieve high accuracy, and QRS information should not be neglected in RS TDI. Full article
Show Figures
Graphical abstract
Figure 1: Statistics on the number of RS HSR satellites launched in China, the USA, and the world; the "total" refers to the total number of RS satellites launched (https://space.oscar.wmo.int/, last accessed on 23 July 2024).
Figure 2: The basic information of LuoJiaSET's DOTA dataset for TDI (https://captain-whu.github.io/DOTA/index.html, last accessed on 12 May 2024).
Figure 3: The process of GF-2 L1-level image preprocessing. The pixels of GF-2 L1-level images represent the DN value. GF-2 L1-level images may have topographic distortion, which can be eliminated by orthographic correction. The whole process of quantitative AC includes three steps: radiometric calibration, path radiation correction, and adjacency effect correction. After quantitative AC, the pixels in the image represent the spectral reflectance of the ground object. Image fusion consists of fusing the multispectral and panchromatic images of the GF-2 satellite to improve image resolution. GF-2 L1-level images with wind turbines underwent two different preprocessing stages to obtain two different sets of images: SR images and DN value images.
Figure 4: A comparison of images before (left) and after (right) AC. After quantitative AC, the quality and clarity of images were significantly improved.
Figure 5: Labeled examples of wind turbine samples. The labeled samples were obtained through the manual annotation of horizontal boxes. By way of visual annotation, the wind turbine body and the shadow projected on the ground were included in the labeled box as targets.
Figure 6: An overview of the specific research route. The SR data represent the sample database with spectral information after AC. The DN value and SR sample databases were split into training, validation, and testing sets in an 8:1:1 ratio. The three RGB bands of DN value data were used for YOLOv5 model training and testing, and the three RGB bands of SR data were used for YOLOv5_AC and YOLOv5_AC_CBAM training and testing. Based on the identification results of YOLOv5_AC_CBAM, YOLOv5_AC_CBAM_Exp utilized the feature information of the four bands of SR data.
Figure 7: The YOLOv5_AC_CBAM model structure. The red boxes are the added CBAMs in the neck part [41].
Figure 8: The structure of the CBAM attention mechanism (adapted with permission from Ref. [43]). MaxPool finds the maximum value of the feature points in the neighborhood, while AvgPool finds the average of the feature points in the neighborhood. The shared MLP means that the two layers of the neural network are shared.
Figure 9: Six common types of false detection ground objects, including striated ground, green land, road, farmland, building, and power tower. These false detection ground objects are similar in that they are made up of high-reflection and low-shadow pixels; in terms of geometric, texture, and other features, they are significantly different from wind turbines.
Figure 10: Average spectral curves for multiple ground objects (a) and wind turbines (b) with error bars. The sample size for each ground object was set at 20 pixels, and the average spectral reflectance refers to the average value over the 20 adjacent pixels in the area of the ground object. In the spectral curve of the wind turbine, the reflectance of most wind turbine bodies was lowest in the blue band yet greater than 0.2. The shadow of wind turbines consists of dark pixels, so its reflectance was low. The reflectance of most false detection ground objects was lower than 0.2 in the blue band.
Figure 11: Statistical histograms of the blue-band reflectance of false detection ground objects (a) and wind turbines (b). "Frequency" refers to the number of pixels corresponding to a certain reflectance value in the prediction box; "Reflectance" ranges from 0 to 1. Most false detection ground objects have few pixels with reflectance exceeding 0.2 in the blue band, whereas wind turbines have relatively more such pixels. Therefore, the spectral-feature differences between false detection ground objects and wind turbines can be separated in the blue band.
Figure 12: Comparison of the predicted bounding box sizes of false detection ground objects (a–d) and wind turbine targets (e,f).
Figure 13: A comparison of the characteristic GLCM parameters of false detection ground objects and wind turbine targets. "Angle" refers to the four angle values in the GLCM. Apart from CON, it is challenging to distinguish between false detections and wind turbines using HOM, DIS, ENT, ASM, and COR. At the π/4 and 3π/4 angles, the CON of the wind turbine is higher than that of most false detections, so the textural feature differences between false detections and wind turbines can be effectively extracted from CON.
Figure 14: A flowchart of dynamic threshold re-identification aided by high-confidence target feature information. Target identification was performed on the GF-2 SR images using YOLOv5_AC_CBAM. Based on the identification results, objects with a prediction confidence score greater than 0.8 were classified as high-confidence targets, while those with lower scores were considered low-confidence objects and were re-identified. The image texture, quantitative geometry, and spectral reflectance features extracted from the high-confidence targets were used to dynamically adjust the threshold conditions for re-identification, given by Equations (5)–(8). Ultimately, objects that met the conditions of Equations (5)–(8) were confirmed as targets, while those that did not were classified as non-targets.
Figure 15: The missed detection (a) and false detection (b) objects of the YOLOv5 identification. The red boxes mark missed detections and the orange boxes mark false detections of the YOLOv5 model.
Figure 16: The accuracy improvement effects of the three identification methods. The rightmost number represents the identification accuracy and the accuracy increase on the wind turbine test images. Compared with YOLOv5, the identification accuracy of YOLOv5_AC increased by 2.2%; compared with YOLOv5_AC, YOLOv5_AC_CBAM increased by 0.5%; and on the basis of YOLOv5_AC_CBAM, YOLOv5_AC_CBAM_Exp improved by 4.2%. When QRS feature information was added for threshold re-identification, YOLOv5_AC_CBAM_Exp achieved the highest improvement.
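Figures 10 and 11 motivate the spectral part of the threshold re-identification step: most wind-turbine pixels exceed a blue-band surface reflectance of 0.2, while most false detections do not. The sketch below captures only that spectral check; the 10% pixel-fraction rule is an assumption, and the paper's full conditions (Equations (5)–(8)) also involve geometric and GLCM texture features that are omitted here.

# Spectral re-check of a low-confidence detection box using blue-band surface reflectance.
import numpy as np

def spectral_recheck(blue_band_sr, box, reflectance_thr=0.2, frac_thr=0.10):
    """blue_band_sr: 2D surface-reflectance array; box: (row0, row1, col0, col1)."""
    r0, r1, c0, c1 = box
    patch = blue_band_sr[r0:r1, c0:c1]
    bright_fraction = np.mean(patch > reflectance_thr)   # share of 'bright' blue-band pixels
    return bright_fraction >= frac_thr                   # True -> keep as turbine candidate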
20 pages, 21510 KiB  
Article
Visual Localization Method for Fastener-Nut Disassembly and Assembly Robot Based on Improved Canny and HOG-SED
by Xiangang Cao, Mengzhen Zuo, Guoyin Chen, Xudong Wu, Peng Wang and Yizhe Liu
Appl. Sci. 2025, 15(3), 1645; https://doi.org/10.3390/app15031645 - 6 Feb 2025
Viewed by 601
Abstract
Visual positioning accuracy is crucial for ensuring the successful execution of nut disassembly and assembly tasks by a fastener-nut disassembly and assembly robot. However, disturbances such as on-site lighting changes, abnormal surface conditions of nuts, and complex backgrounds formed by ballast in complex railway environments can lead to poor visual positioning accuracy of the fastener nuts, thereby affecting the success rate of the robot’s continuous disassembly and assembly operations. Additionally, the existing method of detecting fasteners first and then positioning nuts has poor applicability in the field. A direct positioning algorithm for spiral rail spikes that combines an improved Canny algorithm with shape feature similarity determination is proposed in response to these issues. Firstly, CLAHE enhances the image, reducing the impact of varying lighting conditions in outdoor work environments on image details. Then, to address the difficulties in extracting the edges of rail spikes caused by abnormal conditions such as water stains, rust, and oil stains on the nuts themselves, the Canny algorithm is improved through three stages, filtering optimization, gradient boosting, and adaptive thresholding, to reduce the impact of edge loss on subsequent rail spike positioning results. Finally, considering the issue of false fitting due to background interference, such as ballast in gradient Hough transformations, the differences in texture and shape features between the rail spike and interference areas are analyzed. The HOG is used to describe the shape features of the area to be screened, and the similarity between the screened area and the standard rail spike template features is compared based on the standard Euclidean distance to determine the rail spike area. Spiral rail spikes are discriminated based on shape features, and the center coordinates of the rail spike are obtained. Experiments were conducted using images collected from the field, and the results showed that the proposed algorithm, when faced with complex environments with multiple interferences, has a correct detection rate higher than 98% and a positioning error mean of 0.9 mm. It exhibits excellent interference resistance and meets the visual positioning accuracy requirements for robot nut disassembly and assembly operations in actual working environments. Full article
(This article belongs to the Section Applied Industrial Technologies)
Show Figures
Figure 1: Fastener-nut disassembly and assembly robot.
Figure 2: The working principle of the robot.
Figure 3: Overall algorithmic process.
Figure 4: Four-direction gradient template.
Figure 5: Fitting results of two methods for circles: (a) least squares; (b) gradient Hough transform.
Figure 6: Filtering algorithm flow.
Figure 7: Experimental sample classification: (a) normal category; (b) water stain category; (c) rust category; (d) oil stain category.
Figure 8: Image enhancement results: (a) the original image; (b) the histogram of the original image; (c) the enhanced image; (d) the histogram of the enhanced image.
Figure 9: Image filtering effect: (a) original image; (b) enhanced image; (c) Gaussian filtering; (d) median filtering; (e) mean filtering; (f) bilateral filtering.
Figure 10: Comparison of fastener edge detection results on (a) a normal surface, (b) a surface with water stains, (c) a rusty surface, and (d) a surface with oil stains; (a1)–(d1) show the traditional Canny results and (a2)–(d2) the improved Canny results for the same four surfaces.
Figure 11: Circle detection results: (a) spike detection successful; (b) spike detection failed.
Figure 12: Spike area screening results: (a) successful filtering results; (b) unsuccessful filtering results.
Figure 13: The success rate of rail spike area screening for different algorithms.
Figure 14: Number of filtering failures for different increment values.
Figure 15: Spike center positioning error.
Figure 16: Positioning results under noise interference: (a) normal surface; (b) surface with water stains; (c) rusty surface; (d) surface with oil stains.
Figure 17: Railway environment rail spike center positioning results: (a) railway ballast interference; (b) water stain interference; (c) water stain interference, shifted to the right; (d) rust interference; (e) rust interference, uneven lighting; (f) oil stain interference, shifted to the left; (g) oil stain interference, background noise; (h) special environment.
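The final screening step describes candidate regions with HOG and compares them with a standard rail-spike template using the standardized Euclidean distance. The sketch below follows that idea with scikit-image and SciPy; the HOG parameters and acceptance threshold are assumptions, not the paper's values, and all patches are assumed to share the same size.

# HOG description of a candidate region plus standardized-Euclidean-distance matching.
import numpy as np
from skimage.feature import hog
from scipy.spatial.distance import seuclidean

def hog_vec(gray_patch):
    return hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def is_rail_spike(candidate_patch, template_patches, threshold=25.0):
    cand = hog_vec(candidate_patch)
    temps = np.stack([hog_vec(t) for t in template_patches])
    variance = temps.var(axis=0) + 1e-8                  # per-dimension variance for SED
    dists = [seuclidean(cand, t, variance) for t in temps]
    return min(dists) < threshold                        # close to a template -> rail spike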
18 pages, 2622 KiB  
Article
Transformer-Based Explainable Model for Breast Cancer Lesion Segmentation
by Huina Wang, Lan Wei, Bo Liu, Jianqiang Li, Jinshu Li, Juan Fang and Catherine Mooney
Appl. Sci. 2025, 15(3), 1295; https://doi.org/10.3390/app15031295 - 27 Jan 2025
Viewed by 809
Abstract
Breast cancer is one of the most prevalent cancers among women, with early detection playing a critical role in improving survival rates. This study introduces a novel transformer-based explainable model for breast cancer lesion segmentation (TEBLS), aimed at enhancing the accuracy and interpretability of breast cancer lesion segmentation in medical imaging. TEBLS integrates a multi-scale information fusion approach with a hierarchical vision transformer, capturing both local and global features by leveraging the self-attention mechanism. This model addresses the limitations of existing segmentation methods, such as the inability to effectively capture long-range dependencies and fine-grained semantic information. Additionally, TEBLS incorporates visualization techniques to provide insights into the segmentation process, enhancing the model’s interpretability for clinical use. Experiments demonstrate that TEBLS outperforms traditional and existing deep learning-based methods in segmenting complex breast cancer lesions with variations in size, shape, and texture, achieving a mean DSC of 81.86% and a mean AUC of 97.72% on the CBIS-DDSM test set. Our model not only improves segmentation accuracy but also offers a more explainable framework, which has the potential to be used in clinical settings. Full article
(This article belongs to the Special Issue Machine Learning and Reasoning for Reliable and Explainable AI)
Show Figures
Figure 1: Overview of the framework. The model begins with patch partitioning and linear embedding to transform input images into sequential data. The encoder consists of multiple swin transformer blocks, patch merging layers, and skip connections for feature fusion. The bottleneck layer processes the encoded features with Multi-Head Self-Attention (MSA) and Multi-Layer Perceptron (MLP) modules. The decoder includes patch expansion, normalization, and linear projection layers to produce high-resolution segmentation outputs. The output is processed through global average pooling and a softmax layer for pixel-level classification, with Grad-CAM used to visualize the results.
Figure 2: The flowchart illustrates the main structure and data flow of TEBLS. The rounded rectangles represent the input and output modules of the model, while the gray rectangles indicate the input data preprocessing module. The blue rectangles represent the transformer-based dense nested feature fusion network module proposed in this paper, with the green rectangular frames representing the encoder and decoder parts of this module, which consist of three swin transformer blocks and swin transformer upsampling, respectively. The yellow rectangles represent the lightweight channel enhancement method based on the multi-scale features module proposed in this paper, which includes group convolution and channel transformation.
Figure 3: The loss and score curve during the model training process indicates that the loss function stabilized when epoch = 150.
Figure 4: A performance comparison of different models in terms of parameter complexity, inference time, and segmentation accuracy. (A) A parameter count comparison shows that TEBLS has the fewest parameters, highlighting its lightweight nature. (B) An inference time comparison, demonstrating that TEBLS was the most efficient model, with faster processing compared to Swin-Unet++ and Swin-Unet. (C) A confusion matrix showing the results of the TEBLS model on the test set shows that the model's sensitivity was 0.7602. (D) Segmentation performance, where TEBLS outperformed other models by accurately capturing lesion regions with clear details at image edges and within lesion areas.
Figure 5: Visualizations of TEBLS outputs. The set includes the original input images, ground truth segmentations, TEBLS predictions, Grad-CAM visualizations highlighting model focus, and superimposed images showing the overlay of Grad-CAM heatmaps on the original images. The Grad-CAM visualizations help illustrate which areas of the image TEBLS prioritized during segmentation, providing insight into the model's decision-making process (TP: true positive; FP: false positive; FN: false negative).
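The abstract reports a mean DSC of 81.86% and a mean AUC of 97.72% on the CBIS-DDSM test set. A minimal sketch of how those two metrics are typically computed from a per-pixel probability map is given below; binarizing at 0.5 is an assumption for illustration.

# Dice similarity coefficient (DSC) and AUC from a predicted probability map.
import numpy as np
from sklearn.metrics import roc_auc_score

def dice(pred_mask, gt_mask, eps=1e-7):
    inter = np.logical_and(pred_mask, gt_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

def evaluate(prob_map, gt_mask):
    dsc = dice(prob_map > 0.5, gt_mask.astype(bool))            # assumed 0.5 threshold
    auc = roc_auc_score(gt_mask.ravel().astype(int), prob_map.ravel())
    return dsc, auc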
21 pages, 5349 KiB  
Article
RST-DeepLabv3+: Multi-Scale Attention for Tailings Pond Identification with DeepLab
by Xiangrui Feng, Caiyong Wei, Xiaojing Xue, Qian Zhang and Xiangnan Liu
Remote Sens. 2025, 17(3), 411; https://doi.org/10.3390/rs17030411 - 25 Jan 2025
Viewed by 454
Abstract
Tailing ponds are used to store tailings or industrial waste discharged after beneficiation. Identifying these ponds in advance can help prevent pollution incidents and reduce their harmful impacts on ecosystems. Tailing ponds are traditionally identified via manual inspection, which is time-consuming and labor-intensive. Therefore, tailing pond identification based on computer vision is of practical significance for environmental protection and safety. In the context of identifying tailings ponds in remote sensing, a significant challenge arises due to high-resolution images, which capture extensive feature details (such as shape, location, and texture), complicated by the mixing of tailings with other waste materials. This results in substantial intra-class variance and limited inter-class variance, making accurate recognition more difficult. Therefore, to monitor tailing ponds, this study utilized an improved version of DeepLabv3+, which is a widely recognized deep learning model for semantic segmentation. We introduced the multi-scale attention modules, ResNeSt and SENet, into the DeepLabv3+ encoder. The split-attention module in ResNeSt captures multi-scale information when processing multiple sets of feature maps, while the SENet module focuses on channel attention, improving the model's ability to distinguish tailings ponds from other materials in images. Additionally, the tailing pond semantic segmentation dataset NX-TPSet was established based on Gaofen-6 (GF-6) imagery. The ablation experiments show that the recognition accuracy (intersection over union, IoU) of the RST-DeepLabv3+ model improved by 1.19% to 93.48% over DeepLabv3+. The multi-attention module enables the model to integrate multi-scale features more effectively, which not only improves segmentation accuracy but also directly contributes to more reliable and efficient monitoring of tailings ponds. The proposed approach achieves top performance on two benchmark datasets, NX-TPSet and TPSet, demonstrating its effectiveness as a practical and superior method for real-world tailing pond identification. Full article
Show Figures
Figure 1: Overall architecture of the RST-DeepLabv3+ model. We incorporated a multi-attention module into DeepLabv3+ to dynamically reallocate weights across different channels.
Figure 2: Architecture of the Squeeze-and-Excitation (SE) block. The SE module introduces a novel architectural unit that performs feature recalibration by explicitly modeling channel-wise relationships. This process consists of two main operations: squeeze and excitation.
Figure 3: Architecture of the split-attention networks.
Figure 4: Geographical location of Lingwu City, Shizuishan City, Ningxia Province, China. (a) Sentinel-2 satellite image of Shizuishan City. (b) Sentinel-2 satellite image of Lingwu City.
Figure 5: Example of semantic segmentation datasets for tailings ponds.
Figure 6: Visualization results of ablation experiments on the NX-TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) DeepLabv3+, (d) DeepLabv3+ + SENet, (e) DeepLabv3+ + ResNeSt, (f) RST-DeepLabv3+. Orange rectangles are missed areas, and red rectangles are error areas.
Figure 7: Visualization results of ablation experiments on the TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) DeepLabv3+, (d) DeepLabv3+ + SENet, (e) DeepLabv3+ + ResNeSt, (f) RST-DeepLabv3+. Orange rectangles are missed areas, and red rectangles are error areas.
Figure 8: Comparison of experimental results on the NX-TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) PSP-Net, (d) U-Net, (e) DeepLabv3+, (f) RST-DeepLabv3+. Orange rectangles are missed areas; red rectangles are error areas.
Figure 9: Comparison of experimental results on the TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) PSP-Net, (d) U-Net, (e) DeepLabv3+, (f) RST-DeepLabv3+. Orange rectangles are missed areas; red rectangles are error areas.
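The Figure 2 caption summarizes the SE block as channel-wise feature recalibration through squeeze and excitation. Below is a minimal PyTorch-style sketch of that standard block; the reduction ratio of 16 is the common default, not necessarily the value used in RST-DeepLabv3+.

# Squeeze-and-Excitation block: global average pooling, bottleneck MLP, sigmoid gating.
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                   # squeeze: per-channel global average
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # excitation: channel-wise rescaling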
19 pages, 7485 KiB  
Article
Design of an Optimal Convolutional Neural Network Architecture for MRI Brain Tumor Classification by Exploiting Particle Swarm Optimization
by Sofia El Amoury, Youssef Smili and Youssef Fakhri
J. Imaging 2025, 11(2), 31; https://doi.org/10.3390/jimaging11020031 - 24 Jan 2025
Viewed by 660
Abstract
The classification of brain tumors using MRI scans is critical for accurate diagnosis and effective treatment planning, though it poses significant challenges due to the complex and varied characteristics of tumors, including irregular shapes, diverse sizes, and subtle textural differences. Traditional convolutional neural network (CNN) models, whether handcrafted or pretrained, frequently fall short in capturing these intricate details comprehensively. To address this complexity, an automated approach employing Particle Swarm Optimization (PSO) has been applied to create a CNN architecture specifically adapted for MRI-based brain tumor classification. PSO systematically searches for an optimal configuration of architectural parameters—such as the types and numbers of layers, filter quantities and sizes, and neuron numbers in fully connected layers—with the objective of enhancing classification accuracy. This performance-driven method avoids the inefficiencies of manual design and iterative trial and error. Experimental results indicate that the PSO-optimized CNN achieves a classification accuracy of 99.19%, demonstrating significant potential for improving diagnostic precision in complex medical imaging applications and underscoring the value of automated architecture search in advancing critical healthcare technology. Full article
(This article belongs to the Special Issue Learning and Optimization for Medical Imaging)
Show Figures
Figure 1: Architecture of a CNN.
Figure 2: Convolution operation.
Figure 3: Flowchart of the algorithm starting with parameter initialization, swarm creation, and fitness evaluation. Each particle iteratively updates its position and velocity using pBest and gBest until the maximum number of iterations is reached.
Figure 4: An encoded CNN architecture with two convolutional layers: the first with 16 kernels of size 5 × 5 and the second with 32 kernels of size 3 × 3, each followed by 2 × 2 max pooling. The architecture ends with two fully connected layers containing 128 and 4 neurons.
Figure 5: Calculation of the difference between two particles.
Figure 6: Separating FC layers from other layers.
Figure 7: Velocity calculation of a single particle.
Figure 8: Particle velocity calculation when gBest and pBest are the same.
Figure 9: Updating the architecture of a particle.
Figure 10: Samples from the dataset.
Figure 11: Data distribution.
Figure 12: Progression of the gBest model's accuracy through iterations.
Figure 13: Progression of gBest training and validation loss.
Figure 14: Progression of gBest training and validation accuracy.
Figure 15: The gBest confusion matrix.
Figure 16: Confusion matrix illustrating the gBest model's classification accuracy and error distribution on the BTD-MRI test set.
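Figure 3 outlines the PSO loop: initialize a swarm, evaluate fitness, and repeatedly update each particle's velocity and position from its personal best (pBest) and the global best (gBest). The toy sketch below conveys only that loop over two continuous hyperparameters with a stand-in fitness function; the paper's particles encode whole layer configurations, which requires the custom difference and velocity operations shown in Figures 5–9.

# Toy PSO loop over two continuous hyperparameters (illustrative stand-in fitness).
import numpy as np

rng = np.random.default_rng(0)

def fitness(pos):                 # placeholder for "train the CNN, return validation accuracy"
    n_filters, log_lr = pos
    return -((n_filters - 64) ** 2) / 1e4 - (log_lr + 3) ** 2

n_particles, n_iters, w, c1, c2 = 10, 20, 0.7, 1.5, 1.5
pos = rng.uniform([8.0, -5.0], [128.0, -1.0], size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best hyperparameters found:", gbest)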
20 pages, 8734 KiB  
Article
Enhancing Blood Cell Diagnosis Using Hybrid Residual and Dual Block Transformer Network
by Vishesh Tanwar, Bhisham Sharma, Dhirendra Prasad Yadav and Ashutosh Dhar Dwivedi
Bioengineering 2025, 12(2), 98; https://doi.org/10.3390/bioengineering12020098 - 22 Jan 2025
Viewed by 723
Abstract
Leukemia is a life-threatening blood cancer that affects a large cross-section of the population, which underscores the great need for timely, accurate, and efficient diagnostic solutions. Traditional methods are time-consuming, subject to human vulnerability, and do not always grasp the subtle morphological differences that form the basic discriminatory features among different leukemia subtypes. The proposed residual vision transformer (ResViT) model breaks these limitations by combining the advantages of ResNet-50 for high dimensional feature extraction and a vision transformer for global attention to the spatial features. ResViT can extract low-level features like texture and edges as well as high-level features like patterns and shapes from the leukemia cell images. Furthermore, we designed a dual-stream ViT with a convolution stream for local details and a transformer stream for capturing the global dependencies, which enables ResViT to pay attention to multiple image regions simultaneously. The evaluation results of the proposed model on the two datasets were more than 99%, which makes it an excellent candidate for clinical diagnostics. Full article
(This article belongs to the Special Issue Machine Learning and Deep Learning Applications in Healthcare)
Show Figures
Figure 1: Proposed ResViT architecture for the diagnosis of leukemia cells.
Figure 2: Confusion matrices obtained from each of the 5-fold cross-validations on dataset-1.
Figure 3: Confusion matrices of dataset-2.
Figure 4: The accuracy and loss over the training and validation sets of dataset-1 are presented in (a) and (b), respectively.
Figure 5: The accuracy and loss over the training and validation sets of dataset-2 are presented in (a) and (b), respectively.
Figure 6: ROC curves for the proposed vision transformer (ViT) model in the classification of 15 leukemia cell types.
Figure 7: Comparison of the results of the proposed ResViT model on two different datasets.
Figure 8: Comparison of the training and validation time on dataset-1 and dataset-2.
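ResViT pairs a ResNet-50 feature extractor with transformer-style global attention over spatial locations. The sketch below is a generic hybrid of that kind, not the published architecture: the layer counts, token dimension, and the 15-class head are assumptions, and the dual-stream design described in the abstract is reduced to a single CNN-then-transformer path for brevity.

# Hedged sketch of a CNN backbone feeding spatial tokens into a transformer encoder.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class HybridResNetTransformer(nn.Module):
    def __init__(self, num_classes=15, depth=2, nhead=8):
        super().__init__()
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])     # -> (N, 2048, 7, 7)
        layer = nn.TransformerEncoderLayer(d_model=2048, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x):                          # x: (N, 3, 224, 224)
        fmap = self.cnn(x)                         # local texture/edge features from the CNN
        tokens = fmap.flatten(2).transpose(1, 2)   # (N, 49, 2048) spatial tokens
        tokens = self.transformer(tokens)          # global self-attention across locations
        return self.head(tokens.mean(dim=1))       # average-pooled token classification

logits = HybridResNetTransformer()(torch.randn(2, 3, 224, 224))       # smoke test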
14 pages, 2060 KiB  
Article
Detection of Acromion Types in Shoulder Magnetic Resonance Image Examination with Developed Convolutional Neural Network and Textural-Based Content-Based Image Retrieval System
by Mehmet Akçiçek, Mücahit Karaduman, Bülent Petik, Serkan Ünlü, Hursit Burak Mutlu and Muhammed Yildirim
J. Clin. Med. 2025, 14(2), 505; https://doi.org/10.3390/jcm14020505 - 14 Jan 2025
Viewed by 663
Abstract
Background: The morphological type of the acromion may play a role in the etiopathogenesis of various pathologies, such as shoulder impingement syndrome and rotator cuff disorders. Therefore, it is important to determine the acromion's morphological types accurately and quickly. This study aimed to detect the acromion shape, which is one of the etiological causes of chronic shoulder disorders that may cause a decrease in work capacity and quality of life, on shoulder MR images by developing a new model for image retrieval in Content-Based Image Retrieval (CBIR) systems. Methods: Image retrieval was performed in CBIR systems using Convolutional Neural Network (CNN) architectures and textural-based methods as the basis. Feature maps of the images were extracted to measure image similarities in the developed CBIR system. Feature extraction was performed with Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), Darknet53, and Densenet201 architectures, and the Minimum Redundancy Maximum Relevance (mRMR) feature selection method was used for feature selection. The feature maps obtained after the dimensionality reduction process were combined. The Euclidean distance and Peak Signal-to-Noise Ratio (PSNR) were used as similarity measurement methods. Image retrieval was performed using features obtained from CNN architectures and textural-based models to compare the performance of the proposed method. Results: The highest Average Precision (AP) value was reached in the PSNR similarity measurement method with 0.76 in the proposed model. Conclusions: The proposed model is promising for accurately and rapidly determining morphological types of the acromion, thus aiding in the diagnosis and understanding of chronic shoulder disorders. Full article
(This article belongs to the Section Nuclear Medicine & Radiology)
Show Figures
Figure 1: Graphical representation of acromion subtypes (a) and sample images of the shoulder MRI data set's Type 1, Type 2, Type 3, and Type 4 classes (b).
Figure 2: Extraction of the feature map for the CBIR method.
Figure 3: The proposed model in the CBIR system.
Figure 4: Twenty images retrieved for a queried image.
Figure 5: Average P-R curve of the Type 1 class.
Figure 6: Average P-R curve of the Type 2 class.
Figure 7: Average P-R curve of the Type 3 class.
Figure 8: Average P-R curve of the Type 4 class.
Figure 9: Average P-R curve of the data set.
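The proposed CBIR pipeline builds a feature map per image from textural and deep descriptors, reduces it with mRMR, and ranks the gallery by Euclidean distance or PSNR. The sketch below keeps only the simplest part of that idea, HOG plus a uniform-LBP histogram ranked by Euclidean distance; the Darknet53/DenseNet201 features, mRMR selection, and PSNR ranking are omitted, the descriptor parameters are assumptions, and all images are assumed to share the same size.

# Simplified textural CBIR: describe images with HOG + LBP, rank by Euclidean distance.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def describe(gray):
    h = hog(gray, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")      # values 0..9
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([h, lbp_hist])

def retrieve(query_img, gallery_imgs, top_k=20):
    q = describe(query_img)
    feats = np.stack([describe(g) for g in gallery_imgs])
    order = np.argsort(np.linalg.norm(feats - q, axis=1))   # smaller distance = more similar
    return order[:top_k]                                    # indices of the top-k matches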
23 pages, 6144 KiB  
Article
Based on the Geometric Characteristics of Binocular Imaging for Yarn Remaining Detection
by Ke Le and Yanhong Yuan
Sensors 2025, 25(2), 339; https://doi.org/10.3390/s25020339 - 9 Jan 2025
Viewed by 474
Abstract
The automated detection of yarn margins is crucial for ensuring the continuity and quality of production in textile workshops. Traditional methods rely on workers visually inspecting the yarn margin to determine the timing of replacement; these methods fail to provide real-time data and cannot meet the precise scheduling requirements of modern production. The complex environmental conditions in textile workshops, combined with the cylindrical shape and repetitive textural features of yarn bobbins, limit the application of traditional visual solutions. Therefore, we propose a visual measurement method based on the geometric characteristics of binocular imaging: First, all contours in the image are extracted, and the distance sequence between the contours and the centroid is extracted. This sequence is then matched with a predefined template to identify the contour information of the yarn bobbin. Additionally, four equations for the tangent line from the camera optical center to the edge points of the yarn bobbin contour are established, and the angle bisectors of each pair of tangents are found. By solving the system of equations for these two angle bisectors, their intersection point is determined, giving the radius of the yarn bobbin. This method overcomes the limitations of monocular vision systems, which lack depth information and suffer from size measurement errors due to the insufficient repeat positioning accuracy when patrolling back and forth. Next, to address the self-occlusion issues and matching difficulties during binocular system measurements caused by the yarn bobbin surface’s repetitive texture, an imaging model is established based on the yarn bobbin’s cylindrical characteristics. This avoids pixel-by-pixel matching in binocular vision and enables the accurate measurement of the remaining yarn margin. The experimental data show that the measurement method exhibits high precision within the recommended working distance range, with an average error of only 0.68 mm. Full article
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>The distribution of weaving machines in a textile workshop: (<b>a</b>) a real textile workshop; (<b>b</b>) the production layout of the textile workshop.</p>
Full article ">Figure 2
<p>A schematic diagram of binocular stereovision measurement: (<b>a</b>) the principle of binocular triangulation. Here, <math display="inline"><semantics> <mrow> <mi>P</mi> </mrow> </semantics></math> is a point in the world coordinate system, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>p</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>p</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the image points on the image planes <math display="inline"><semantics> <mrow> <mi>L</mi> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>R</mi> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the epipolars. (<b>b</b>) A basic model of a pinhole camera. The length in world coordinates is imaged as the pixel on the imaging plane through the camera’s optical center <math display="inline"><semantics> <mrow> <mi>O</mi> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>f</mi> </mrow> </semantics></math> is the camera focal length, and <math display="inline"><semantics> <mrow> <mi>Z</mi> </mrow> </semantics></math> is the distance between point and the binocular camera. (<b>c</b>) The process of binocular pixel matching.</p>
Full article ">Figure 3
<p>The monocular camera imaging process, where light rays pass through the cylinder, cross the image plane at points <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>, and converge at the optical center <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>O</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>. Point <math display="inline"><semantics> <mrow> <mi>O</mi> </mrow> </semantics></math> represents the center of the cylinder’s circular cross-section.</p>
Full article ">Figure 4
<p>Binocular camera imaging process.</p>
Full article ">Figure 5
<p>The imaging process of the cross-section of the yarn bobbin along a vertical axis. Here, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>c</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>c</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the horizontal coordinates of the camera optical centers on the pixel plane; the outer contour points <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <mo> </mo> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the real-space points <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>L</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>L</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>R</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <mo> </mo> <msub> <mrow> <mi>R</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>; <math display="inline"><semantics> <mrow> <mi>f</mi> </mrow> </semantics></math> is the camera focal length; and <math display="inline"><semantics> <mrow> <mi>b</mi> </mrow> </semantics></math> is the baseline of the binocular camera.</p>
Figure 6">
Figure 6
The process of locating the contour of the yarn bobbin using the centroid distance: (a) the centroid distance sequence template of the yarn bobbin contour; (b) the centroid distance sequence of the yarn bobbin contour; (c) a schematic of the matching process, where the first row shows the extracted contours, the second row shows the corresponding centroid distance sequences, and the third row shows the results of the cross-correlation function between the extracted centroid distance sequences and the centroid distance sequence template.
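The matching idea in (c) can be imitated by describing each candidate contour with a centroid-distance signature and scoring it against a stored template with a normalized circular cross-correlation. A rough numpy sketch, assuming contours arrive as N×2 arrays of (x, y) points; the resampling length and normalization are my own choices, not the paper's:

```python
import numpy as np

def centroid_distance_sequence(contour, n_samples=128):
    """Distance from the contour centroid to each boundary point,
    resampled to a fixed length so sequences are comparable."""
    contour = np.asarray(contour, dtype=float)
    centroid = contour.mean(axis=0)
    dist = np.linalg.norm(contour - centroid, axis=1)
    idx = np.linspace(0, len(dist) - 1, n_samples)
    seq = np.interp(idx, np.arange(len(dist)), dist)
    return seq / (seq.max() + 1e-9)          # scale-invariant signature

def match_score(seq, template):
    """Peak of the circular normalized cross-correlation between a
    candidate signature and a template of the same length."""
    a = seq - seq.mean()
    b = template - template.mean()
    corr = np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))).real
    return corr.max() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
```

A candidate contour is accepted as the bobbin when its score against the template exceeds a chosen threshold; the circular correlation makes the score insensitive to where the contour tracing starts.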
Figure 7">
Figure 7
A schematic of epipolar geometry, where P is a point in the world coordinate system, p1 and p2 are its image points on the image planes L and R, and the epipoles e1 and e2 are the intersections of the baseline O1O2 with the image planes L and R. The plane formed by O1, O2, and P is called the epipolar plane: (a) the original epipolar geometry diagram; (b) the epipolar geometry diagram after epipolar rectification.
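Rectification as in (b) is usually computed once from a prior stereo calibration so that corresponding points end up on the same image row. A hedged OpenCV sketch; every calibration value below is a placeholder rather than a number from this work:

```python
import cv2
import numpy as np

# Placeholder intrinsics/extrinsics standing in for a prior stereo calibration.
K1 = K2 = np.array([[1200.0, 0, 640], [0, 1200.0, 360], [0, 0, 1]])
dist1 = dist2 = np.zeros(5)
R = np.eye(3)                           # rotation between the two cameras
T = np.array([[-0.06], [0.0], [0.0]])   # 6 cm baseline along x
size = (1280, 720)

# Rectifying rotations R1, R2 and the new projection matrices P1, P2.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, dist1, K2, dist2, size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, size, cv2.CV_32FC1)
# left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
```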
Figure 8">
Figure 8
The contour localization process: (a) the original image; (b) contours detected using structured forests; (c) contours after filtering; (d) extracted contour centroids; (e) the centroid distance sequence image; (f) the image after localization of the yarn bobbin contours.
Figure 9
Measurement results at different distances: (a) a schematic of measurements of the yarn bobbin at different distances, where numbers 1–12 represent the sequential positions on the camera mount and the arrow indicates the direction of camera movement; (b) the measured yarn bobbin samples, from left to right: yarn bobbin1 and yarn bobbin2.
Figure 10
Measurement results at different distances: (a) the measurement results of yarn bobbin1 at different distances; (b) the measurement results of yarn bobbin2 at different distances.
Figure 11
Measurement results at different angles: (a) a schematic of measurements at different camera positions, where numbers 1–14 represent the sequential positions on the camera mount and the arrow indicates the direction of camera movement; (b) the measured yarn bobbin samples, from left to right: yarn bobbin1, yarn bobbin2, and yarn bobbin3; (c) the measured sizes at different camera positions.
Figure 12
Yarn bobbin samples, where the numbers indicate the samples in ascending order of yarn bobbin size: (a) a sample captured in the laboratory; (b) a sample captured in the production workshop.
Figure 13
Measurement results. Points that the binocular vision method failed to match produced excessively large errors and are not shown in the error bar chart: (a) a comparison of measurement errors in the yarn bobbin radius using different methods in a laboratory environment; (b) a comparison of measurement errors in the yarn bobbin radius using different methods in a production workshop environment.
">
14 pages, 3058 KiB  
Article
A Combined Frame Difference and Convolution Method for Moving Vehicle Detection in Satellite Videos
by Xin Luo, Jiatian Li, Xiaohui A and Yuxi Deng
Sensors 2025, 25(2), 306; https://doi.org/10.3390/s25020306 - 7 Jan 2025
Viewed by 462
Abstract
To address the challenges of missed detections caused by insufficient shape and texture features and blurred boundaries in existing detection methods, this paper introduces a novel moving vehicle detection approach for satellite videos. The proposed method leverages frame difference and convolution to effectively integrate spatiotemporal information. First, a frame difference module (FDM) is designed, combining frame difference and convolution. This module extracts motion features between adjacent frames using frame difference, refines them through backpropagation in the neural network, and integrates them with the current frame to compensate for the missing motion features in single-frame images. Next, the initial features are processed by a backbone network to further extract spatiotemporal feature information. The neck incorporates deformable convolution, which adaptively adjusts convolution kernel sampling positions, optimizing feature representation and enabling effective multiscale information integration. Additionally, shallow large-scale feature maps, which use smaller receptive fields to focus on small targets and reduce background interference, are fed into the detection head. To enhance small-target feature representation, a small-target self-reconstruction module (SR-TOD) is introduced between the neck and the detection head. Experiments using the Jilin-1 satellite video dataset demonstrate that the proposed method outperforms comparison models, significantly reducing missed detections caused by weak color and texture features and blurred boundaries. For the satellite-video moving vehicle detection task, this method achieves notable improvements, with an average F1-score increase of 3.9% and a per-frame processing speed enhancement of 7 s compared to the next best model, DSFNet. Full article
(This article belongs to the Section Vehicular Sensing)
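As a rough illustration of the frame-difference idea described in the abstract above (not the authors' FDM itself), the absolute differences of adjacent frames can be embedded by a small convolution and fused with the current frame; the module layout and channel counts below are assumptions of mine:

```python
import torch
import torch.nn as nn

class FrameDifferenceSketch(nn.Module):
    """Toy stand-in for a frame-difference module: encode |I_t - I_{t-1}|
    and |I_{t+1} - I_t|, then fuse the motion cue with the current frame."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.diff_conv = nn.Conv2d(2 * in_ch, feat_ch, 3, padding=1)
        self.fuse_conv = nn.Conv2d(feat_ch + in_ch, feat_ch, 3, padding=1)

    def forward(self, prev_f, cur_f, next_f):
        diffs = torch.cat([(cur_f - prev_f).abs(), (next_f - cur_f).abs()], dim=1)
        motion = torch.relu(self.diff_conv(diffs))
        return torch.relu(self.fuse_conv(torch.cat([motion, cur_f], dim=1)))

# Three consecutive RGB frames of the same size.
m = FrameDifferenceSketch()
out = m(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(out.shape)   # torch.Size([1, 16, 64, 64])
```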
Show Figures

Figure 1
The general framework of the methodology proposed in this paper.
Figure 2">
Figure 2
Efficient multiscale attention (EMA) module [30].
Figure 3
Frame difference module (FDM) architecture.
Figure 4
Feature extraction network architecture.
Figure 5
Self-reconstructed tiny object detection framework (SR-TOD).
Figure 6
Sample presentation of the Jilin-1 satellite dataset.
Figure 7
Results of different comparison methods in the test dataset. Green boxes: TP. Red boxes: FN. Blue boxes: FP.
Figure 8
Visual comparison of the traditional frame difference method and the use of FDM. (a) Example of Jilin-1 satellite image (green dots represent ground truth). (b) Traditional frame difference method. (c) Using the frame difference module (FDM).
">
29 pages, 17674 KiB  
Article
Noise-Perception Multi-Frame Collaborative Network for Enhanced Polyp Detection in Endoscopic Videos
by Haoran Li, Guoyong Zhen, Chengqun Chu, Yuting Ma and Yongnan Zhao
Electronics 2025, 14(1), 62; https://doi.org/10.3390/electronics14010062 - 27 Dec 2024
Viewed by 598
Abstract
The accurate detection and localization of polyps during endoscopic examinations are critical for early disease diagnosis and cancer prevention. However, the presence of artifacts and noise, along with the high similarity between polyps and surrounding tissues in color, shape, and texture, complicates polyp detection in video frames. To tackle these challenges, we deployed multivariate regression analysis to refine the model and introduced a Noise-Suppressing Perception Network (NSPNet) designed for enhanced performance. NSPNet leverages wavelet transform to enhance the model’s resistance to noise and artifacts while improving a multi-frame collaborative detection strategy for dynamic polyp detection in endoscopic videos, efficiently utilizing temporal information to strengthen features across frames. Specifically, we designed a High-Low Frequency Feature Fusion (HFLF) framework, which allows the model to capture high-frequency details more effectively. Additionally, we introduced an improved STFT-LSTM Polyp Detection (SLPD) module that utilizes temporal information from video sequences to enhance feature fusion in dynamic environments. Lastly, we integrated an Image Augmentation Polyp Detection (IAPD) module to improve performance on unseen data through preprocessing enhancement strategies. Extensive experiments demonstrate that NSPNet outperforms nine SOTA methods across four datasets on key performance metrics, including F1-score and recall. Full article
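The high-/low-frequency separation that the HFLF idea builds on can be reproduced with a single-level 2D discrete wavelet transform. A minimal PyWavelets sketch; the choice of the Haar wavelet and the way the detail bands are combined are my own assumptions, not details stated in the abstract:

```python
import numpy as np
import pywt

def split_frequencies(gray_frame):
    """One-level 2D DWT: cA is the low-frequency approximation,
    (cH, cV, cD) carry horizontal/vertical/diagonal high-frequency detail."""
    cA, (cH, cV, cD) = pywt.dwt2(gray_frame.astype(np.float32), "haar")
    high = np.sqrt(cH**2 + cV**2 + cD**2)   # combined detail magnitude
    return cA, high

frame = np.random.rand(256, 256)
low, high = split_frequencies(frame)
print(low.shape, high.shape)   # (128, 128) (128, 128)
```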
Show Figures

Figure 1
The first image shows normal content without artifacts or noise. The second image has artifacts, noise, and camera shake, reducing quality. In the third to sixth image sequence, intermediate frames show artifacts, while surrounding frames are normal, highlighting the contrast.
Figure 2
Schematic diagram of the model architecture. (a) illustrates the perception region of earlier models, where noise and artifacts were included during training, negatively impacting model performance. In contrast, (b) demonstrates the use of wavelet transform for frequency decomposition, which enhances detail capture and noise suppression, enabling precise recognition of complex textures and colors.
Figure 3
Overall framework diagram. IAPD applies diverse augmentations to enhance adaptability, HFLF leverages wavelet decomposition and attention for feature optimization, and SLPD integrates multi-frame collaboration with spatiotemporal attention for dynamic information extraction.
Figure 4
HFLF framework overview. (a) illustrates the architecture of the early model, which extracts and outputs features through multiple convolutional layers but fails to fully consider the frequency components and feature selectivity. (b) presents the improved model framework, which incorporates WaveletNet to independently extract high- and low-frequency features, while integrated attention mechanisms further enhance feature selectivity and suppress noise. (c) demonstrates the improved model's ability to process features across different frequency bands, highlighting the complementary nature of high-frequency detail capture and low-frequency semantic representation. (d) illustrates the model's interpretability, offering a clear visualization of the learned features to aid understanding.
Figure 5
This figure illustrates the design of the SLPD-Module, which integrates spatial and temporal modeling using Convolutional Neural Networks (CNNs), deformable convolutions, LSTM networks, and attention mechanisms to capture dynamic inter-frame associations and enhance polyp detection performance in video sequences. (a) demonstrates the process of feature handling during the training and testing phases, including feature optimization and detection result generation. (b) depicts the architecture of the SLPD-Module, which combines spatial and temporal modeling to capture dynamic information in video sequences, thereby improving polyp detection performance.
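The temporal side of such a module can be approximated by feeding per-frame feature vectors through an LSTM and letting the resulting context gate the centre frame's features. A toy PyTorch sketch whose feature sizes and gating scheme are assumptions of mine, not the paper's design:

```python
import torch
import torch.nn as nn

class TemporalFusionSketch(nn.Module):
    """Fuse a short clip of per-frame feature vectors with an LSTM and
    gate the centre frame's features with the resulting context."""
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.gate = nn.Linear(hidden, feat_dim)

    def forward(self, clip_feats):            # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(clip_feats)   # h_n: (1, B, hidden)
        weights = torch.sigmoid(self.gate(h_n[-1]))        # (B, feat_dim)
        centre = clip_feats[:, clip_feats.size(1) // 2]    # centre frame
        return centre * weights

fusion = TemporalFusionSketch()
print(fusion(torch.rand(2, 5, 256)).shape)   # torch.Size([2, 256])
```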
Figure 6">
Figure 6
Training loss curves. The training loss curves illustrate the progressive convergence of different models during the training process. Notably, NSPNet demonstrates a more rapid and stable convergence compared to STFT and other models, with its final loss consistently lower than that of STFT.
Figure 7
Detailed comparison of results. Green represents the ground truth, and yellow represents the predicted box.
Figure 8
Attention mechanism. By dividing multi-head attention into chunks and inputting multi-frame features into an LSTM for temporal information extraction, adaptive enhancement is achieved. The attention mechanism splits the features into multiple chunks, processing one chunk at a time to reduce memory consumption. Each chunk computes queries, keys, and values, and generates attention weights using matrix multiplication and Softmax operations.
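Chunking the attention computation as described trades one large score matrix for several small ones while producing the same result. A small numpy sketch of the idea; the chunk size and tensor shapes are illustrative only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunked_attention(q, k, v, chunk=64):
    """Scaled dot-product attention computed over query chunks so that
    only a (chunk x N) score matrix is held in memory at a time."""
    d = q.shape[-1]
    out = np.empty_like(q)
    for s in range(0, q.shape[0], chunk):
        scores = q[s:s + chunk] @ k.T / np.sqrt(d)   # (chunk, N)
        out[s:s + chunk] = softmax(scores) @ v
    return out

q = k = v = np.random.rand(512, 32)
print(chunked_attention(q, k, v).shape)   # (512, 32)
```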
Figure 9">
Figure 9
This figure demonstrates how the IAPD module enhances model robustness and adaptability by applying various image augmentation techniques, such as Gaussian blur, scaling, rotation, and mixed transformations, to improve generalization in low-quality and diverse datasets.
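A comparable augmentation stack can be assembled from standard torchvision transforms; the kernel sizes, angles, and probabilities below are guesses for illustration, not the settings used in the paper:

```python
import torch
from torchvision import transforms

# Illustrative augmentation pipeline: blur, scale, rotate, plus colour jitter.
augment = transforms.Compose([
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))], p=0.3),
    transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

frame = torch.rand(3, 288, 288)          # a dummy RGB frame tensor in [0, 1]
print(augment(frame).shape)              # torch.Size([3, 256, 256])
```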
Figure 10">
Figure 10
This figure illustrates the model's training and convergence process, showcasing the progressive optimization of focus areas through Class Activation Mapping (CAM). It visually reflects the model's enhanced ability to concentrate on dynamic target features and suppress background interference from the initial iterations to the convergence stage.
Figure A1
Detailed comparison of results. Green represents the ground truth, and yellow represents the predicted box.
Figure A2
The results presented here highlight our model's performance on videos. Green outlines represent the ground truth, while yellow outlines indicate the predicted bounding boxes. Our model demonstrates outstanding performance even under challenging conditions, such as severe artifact interference (indicated by blue arrows) and extremely uneven illumination (indicated by purple arrows).
">
21 pages, 6234 KiB  
Article
Data-Efficient Bone Segmentation Using Feature Pyramid-Based SegFormer
by Naohiro Masuda, Keiko Ono, Daisuke Tawara, Yusuke Matsuura and Kentaro Sakabe
Sensors 2025, 25(1), 81; https://doi.org/10.3390/s25010081 - 26 Dec 2024
Viewed by 625
Abstract
The semantic segmentation of bone structures demands pixel-level classification accuracy to create reliable bone models for diagnosis. While Convolutional Neural Networks (CNNs) are commonly used for segmentation, they often struggle with complex shapes due to their focus on texture features and limited ability to incorporate positional information. As orthopedic surgery increasingly requires precise automatic diagnosis, we explored SegFormer, an enhanced Vision Transformer model that better handles spatial awareness in segmentation tasks. However, SegFormer’s effectiveness is typically limited by its need for extensive training data, which is particularly challenging in medical imaging, where obtaining labeled ground truths (GTs) is a costly and resource-intensive process. In this paper, we propose two models and their combination to enable accurate feature extraction from smaller datasets by improving SegFormer. Specifically, these include the data-efficient model, which deepens the hierarchical encoder by adding convolution layers to transformer blocks and increases feature map resolution within transformer blocks, and the FPN-based model, which enhances the decoder through a Feature Pyramid Network (FPN) and attention mechanisms. Testing our model on spine images from the Cancer Imaging Archive and our own hand and wrist dataset, ablation studies confirmed that our modifications outperform the original SegFormer, U-Net, and Mask2Former. These enhancements enable better image feature extraction and more precise object contour detection, which is particularly beneficial for medical imaging applications with limited training data. Full article
(This article belongs to the Section Biomedical Sensors)
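The FPN referenced in the abstract above merges a coarse, semantically strong map with finer maps through lateral 1×1 convolutions and top-down upsampling. A minimal PyTorch sketch of this generic pattern with made-up channel counts; it is not the authors' decoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Top-down feature pyramid over three encoder stages (coarse to fine)."""
    def __init__(self, in_channels=(256, 128, 64), out_ch=64):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, c5, c4, c3):           # coarse -> fine encoder maps
        p5 = self.lateral[0](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[2](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        return [self.smooth[i](p) for i, p in enumerate((p5, p4, p3))]

fpn = TinyFPN()
outs = fpn(torch.rand(1, 256, 16, 16), torch.rand(1, 128, 32, 32), torch.rand(1, 64, 64, 64))
print([o.shape for o in outs])   # all maps share 64 channels at 16, 32, 64 px
```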
Show Figures

Figure 1
SegFormer architecture.
Figure 2
Proposed model architecture.
Figure 3
Data-efficient encoder architecture.
Figure 4
Model-wise IoU for spine images.
Figure 5
Model-wise IoU for hand and wrist images.
Figure 6
Model-wise IoU for femur images.
Figure A1
Datasets.
Figure A1 Cont.
Datasets.
Figure A2
Spine segmentation.
Figure A2 Cont.
Spine segmentation.
Figure A3
Hand and wrist segmentation.
Figure A3 Cont.
Hand and wrist segmentation.
Figure A4
Femur segmentation.
Figure A4 Cont.
Femur segmentation.
">
24 pages, 5004 KiB  
Article
SymSwin: Multi-Scale-Aware Super-Resolution of Remote Sensing Images Based on Swin Transformers
by Dian Jiao, Nan Su, Yiming Yan, Ying Liang, Shou Feng, Chunhui Zhao and Guangjun He
Remote Sens. 2024, 16(24), 4734; https://doi.org/10.3390/rs16244734 - 18 Dec 2024
Viewed by 805
Abstract
Despite the successful application of remote sensing images in agriculture, meteorology, and geography, their relatively low spatial resolution hinders further applications. Super-resolution technology is introduced to overcome this limitation. It is a challenging task due to the variations in object size and texture in remote sensing images. To address this problem, we present SymSwin, a super-resolution model based on the Swin transformer that aims to capture multi-scale context. The symmetric multi-scale window (SyMW) mechanism is proposed and integrated into the backbone, making it capable of perceiving features of various sizes. First, the SyMW mechanism captures discriminative contextual features from multi-scale representations using the corresponding attentive window sizes. Subsequently, a cross-receptive field-adaptive attention (CRAA) module is introduced to model the relations among multi-scale contexts and to realize adaptive fusion. Furthermore, RS data exhibit poor spatial resolution, leading to insufficient visual information when only spatial supervision is applied. Therefore, a U-shape wavelet transform (UWT) loss is proposed to facilitate the training process from the frequency domain. Extensive experiments demonstrate that our method achieves superior performance in both quantitative metrics and visual quality compared with existing algorithms. Full article
Show Figures

Figure 1
(a) Overall architecture of SymSwin, containing three main functional stages. The chief deep feature extraction stage involves SyMWBs and CRAAs. (b) Detailed illustration of the SyMWB composition. (c) Detailed illustration of the CRAA module. (d) Detailed illustration of the Swin-DCFF layer. SW-SA denotes conventional shifted-window self-attention. (e) Detailed illustration of DCFF.
Figure 2">
Figure 2
Indication of the SyMW mechanism, where the legend markers denote the window for SyMWB_i, the window for SyMWB_(i+1), and the feature map of SyMWB_i, respectively. Each feature map represents the extraction of a whole block. The grid denotes the window size used on each feature map. The illustration intuitively demonstrates that the SyMW can provide multi-scale context.
Figure 3
Illustration of the CRAA module, containing two main functional stages. During the CRA stage, we calculate the correlation between contexts with different receptive fields and achieve flexible fusion. During the AFF stage, we adaptively enhance the fused feature.
Figure 4
Illustration of the SWT process. The color space conversion converts an image from RGB space to YCrCb space, and we select the Y-band value, representing the luminance information. LF denotes the low-frequency sub-band, and HF denotes the high-frequency sub-bands. The sketches of HF directly depict the edges in the horizontal, vertical, and diagonal directions.
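The steps in this caption (RGB to YCrCb, keep the luminance channel, split it into one low-frequency and three high-frequency sub-bands) can be reproduced with OpenCV and PyWavelets. The sketch below assumes a single-level Haar transform and an L1 comparison of the sub-bands, which is one plausible way to supervise in the frequency domain rather than the paper's exact UWT loss:

```python
import cv2
import numpy as np
import pywt

def luminance_subbands(rgb_image):
    """Y-channel wavelet sub-bands: LF approximation plus the
    horizontal, vertical, and diagonal HF detail bands."""
    ycrcb = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YCrCb)
    y = ycrcb[..., 0].astype(np.float32)
    lf, (hf_h, hf_v, hf_d) = pywt.dwt2(y, "haar")
    return lf, (hf_h, hf_v, hf_d)

def wavelet_l1_loss(sr_rgb, hr_rgb):
    """L1 distance between the sub-bands of a super-resolved image and
    its reference, one way to add frequency-domain supervision."""
    sr_bands = luminance_subbands(sr_rgb)
    hr_bands = luminance_subbands(hr_rgb)
    loss = np.abs(sr_bands[0] - hr_bands[0]).mean()
    for a, b in zip(sr_bands[1], hr_bands[1]):
        loss += np.abs(a - b).mean()
    return loss

img = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
print(wavelet_l1_loss(img, img))   # 0.0 for identical images
```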
Figure 5">
Figure 5
Visualization examples of the ×4 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 5 Cont.
Visualization examples of the ×4 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 6
Visualization examples of the ×3 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 6 Cont.
Visualization examples of the ×3 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 7
A comparison of the visualized feature maps extracted by each layer of the backbone with and without multi-scale representations, illustrating the different regions of interest the networks tend to focus on. Colors closer to red denote stronger attention.
">