Search Results (450)

Search Parameters:
Keywords = shape and image texture

14 pages, 933 KiB  
Systematic Review
Diagnostic Accuracy of Radiomics in the Early Detection of Pancreatic Cancer: A Systematic Review and Qualitative Assessment Using the Methodological Radiomics Score (METRICS)
by María Estefanía Renjifo-Correa, Salvatore Claudio Fanni, Luis A. Bustamante-Cristancho, Maria Emanuela Cuibari, Gayane Aghakhanyan, Lorenzo Faggioni, Emanuele Neri and Dania Cioni
Cancers 2025, 17(5), 803; https://doi.org/10.3390/cancers17050803 - 26 Feb 2025
Viewed by 165
Abstract
Background/Objectives: Pancreatic ductal adenocarcinoma (PDAC) is an aggressive and lethal malignancy with increasing incidence and low survival rate, primarily due to the late detection of the disease. Radiomics has demonstrated its utility in recognizing patterns and anomalies not perceptible to the human eye. This systematic literature review aims to assess the application of radiomics in the analysis of pancreatic parenchyma images to identify early indicators predictive of PDAC. Methods: A systematic search of original research papers was performed on three databases: PubMed, Embase, and Scopus. Two reviewers applied the inclusion and exclusion criteria, and one expert solved conflicts for selecting the articles. After extraction and analysis of the data, there was a quality assessment of these articles using the Methodological Radiomics Score (METRICS) tool. The METRICS assessment was carried out by two raters, and conflicts were solved by a third reviewer. Results: Ten articles for analysis were retrieved. CT scan was the diagnostic imaging used in all the articles. All the studies were retrospective and published between 2019 and 2024. The main objective of the articles was to generate radiomics-based machine learning models able to differentiate pancreatic tumors from healthy tissue. The reported diagnostic performance of the model chosen yielded very high results, with a diagnostic accuracy between 86.5% and 99.2%. Texture and shape features were the most frequently implemented. The METRICS scoring assessment demonstrated that three articles obtained a moderate quality, five a good quality, and, finally, two articles yielded excellent quality. The lack of external validation and available model, code, and data were the major limitations according to the qualitative assessment. Conclusions: There is high heterogeneity in the research question regarding radiomics and pancreatic cancer. The principal limitations of the studies were mainly due to the nature of the trials and the considerable heterogeneity of the radiomic features reported. Nonetheless, the work in this field is promising, and further studies are still required to adopt radiomics in the early detection of PDAC. Full article
(This article belongs to the Special Issue Multimodality Imaging for More Precise Radiotherapy)
Show Figures
Figure 1: Study selection process flowchart according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [18].
Figure 2: Distribution of METRICS quality categorization.
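The review finds that texture and shape features were the radiomics features most frequently implemented across the ten studies. As a rough illustration only (not taken from any of the reviewed papers), the Python sketch below shows how such features can be pulled from a CT volume and a tumor mask with the open-source pyradiomics package; the file names and enabled feature classes are placeholders.

# Minimal sketch: shape and texture radiomics features from a CT volume and a
# segmentation mask using pyradiomics. Paths and settings are illustrative only.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("shape")       # 3D shape descriptors
extractor.enableFeatureClassByName("glcm")        # gray-level co-occurrence texture
extractor.enableFeatureClassByName("firstorder")  # intensity statistics

features = extractor.execute("pancreas_ct.nrrd", "tumor_mask.nrrd")  # placeholder files
for name, value in features.items():
    if not name.startswith("diagnostics"):        # skip pyradiomics metadata entries
        print(name, value)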
23 pages, 3368 KiB  
Article
SDKU-Net: A Novel Architecture with Dynamic Kernels and Optimizer Switching for Enhanced Shadow Detection in Remote Sensing
by Gilberto Alvarado-Robles, Isac Andres Espinosa-Vizcaino, Carlos Gustavo Manriquez-Padilla and Juan Jose Saucedo-Dorantes
Computers 2025, 14(3), 80; https://doi.org/10.3390/computers14030080 - 23 Feb 2025
Viewed by 668
Abstract
Shadows in remote sensing images often introduce challenges in accurate segmentation due to their variability in shape, size, and texture. To address these issues, this study proposes the Supervised Dynamic Kernel U-Net (SDKU-Net), a novel architecture designed to enhance shadow detection in complex remote sensing scenarios. SDKU-Net integrates dynamic kernel adjustment, a combined loss function incorporating Focal and Tversky Loss, and optimizer switching to effectively tackle class imbalance and improve segmentation quality. Using the AISD dataset, the proposed method achieved state-of-the-art performance with an Intersection over Union (IoU) of 0.8552, an F1-Score of 0.9219, an Overall Accuracy (OA) of 96.50%, and a Balanced Error Rate (BER) of 5.08%. Comparative analyses demonstrate SDKU-Net’s superior performance against established methods such as U-Net, U-Net++, MSASDNet, and CADDN. Additionally, the model’s efficient training process, requiring only 75 epochs, highlights its potential for resource-constrained applications. These results underscore the robustness and adaptability of SDKU-Net, paving the way for advancements in shadow detection and segmentation across diverse fields. Full article
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)
Show Figures
Figure 1: Methodological framework of the SDKU-Net model.
Figure 2: Architecture of SDKU-Net featuring dynamic kernel adjustments, Batch Normalization (BN), and residual connections for optimized segmentation.
Figure 3: Example of a mask overlaid on an original image from the AISD dataset, highlighting the imbalance between shadowed and non-shadowed regions.
Figure 4: Diagram of a dynamic convolutional block with residual addition: the blue block represents the input feature map; the purple blocks correspond to the dynamic convolutional layers with their kernel sizes, each followed by Batch Normalization (BN) and a ReLU activation; the light blue block represents the shortcut-connection convolution; and the gray block corresponds to the final output after residual addition and ReLU activation.
Figure 5: IoU evaluation process used to dynamically adjust kernel sizes and optimizers based on validation metrics.
Figure 6: Optimizer-switching mechanism alternating between Adam and Nadam based on model performance metrics.
Figure 7: Kernel size adjustment process based on patience counters and IoU performance metrics during training.
Figure 8: Qualitative comparison of shadow detection results for different methods: (a) original image, (b) ground truth, (c) U-Net (F1 = 0.9095), (d) U-Net++ (F1 = 0.9095), (e) DSSNet (F1 = 0.9120), (f) MSASDNet (F1 = 0.9352), (g) CADDN (F1 = 0.9045), (h) proposed method (F1 = 0.9293).
Figure 9: Comparison of shadow segmentation results for the second sample: (a) original image, (b) ground truth, (c) U-Net (F1: 0.9250), (d) U-Net++ (F1: 0.9374), (e) DSSNet (F1: 0.8912), (f) MSASDNet (F1: 0.9443), (g) CADDN (F1: 0.9272), (h) proposed method (SDKU-Net, F1: 0.9509).
Figure 10: Comparison of shadow segmentation results for the third sample: (a) original image, (b) ground truth, (c) U-Net (F1: 0.8964), (d) U-Net++ (F1: 0.9241), (e) DSSNet (F1: 0.9057), (f) MSASDNet (F1: 0.8667), (g) CADDN (F1: 0.9293), (h) proposed method (SDKU-Net, F1: 0.9565).
Figure 11: SDKU-Net segmentation results in complex areas, showcasing its ability to capture intricate shadow details.
Figure 12: Generalization of SDKU-Net, which detects shadows absent from the ground truth annotations.
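The abstract describes a combined loss built from Focal and Tversky terms to handle the class imbalance between shadowed and non-shadowed pixels. Below is a minimal PyTorch-style sketch of that combination for binary masks; the weights alpha, beta, gamma, and lam are assumptions for illustration, not the authors' published values.

# Hedged sketch of a combined Focal + Tversky loss for binary shadow segmentation.
import torch
import torch.nn.functional as F

def focal_tversky_combo(logits, target, alpha=0.7, beta=0.3, gamma=2.0, lam=0.5, eps=1e-6):
    """logits, target: float tensors of shape (N, 1, H, W); target values in {0, 1}."""
    prob = torch.sigmoid(logits)
    # Tversky index: penalizes false negatives and false positives asymmetrically.
    tp = (prob * target).sum(dim=(1, 2, 3))
    fn = ((1 - prob) * target).sum(dim=(1, 2, 3))
    fp = (prob * (1 - target)).sum(dim=(1, 2, 3))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    tversky_loss = (1 - tversky).mean()
    # Focal term: down-weights easy pixels so the rare shadow class is not swamped.
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)
    focal_loss = ((1 - p_t) ** gamma * bce).mean()
    return lam * focal_loss + (1 - lam) * tversky_loss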
22 pages, 9103 KiB  
Article
IRST-CGSeg: Infrared Small Target Detection Based on Clustering-Guided Graph Learning and Hierarchical Features
by Guimin Jia, Tao Chen, Yu Cheng and Pengyu Lu
Electronics 2025, 14(5), 858; https://doi.org/10.3390/electronics14050858 - 21 Feb 2025
Viewed by 282
Abstract
Infrared small target detection (IRSTD) aims to segment small targets from an infrared clutter background. However, the long imaging distance, complex background, and extremely limited number of target pixels pose great challenges for IRSTD. In this paper, we propose a new IRSTD method based on the deep graph neural network to fully extract and fuse the texture and structural information of images. Firstly, a clustering algorithm is designed to divide the image into several subgraphs as prior knowledge to guide the initialization of the graph structure of the infrared image, and the image texture features are integrated into graph construction. Then, a graph feature extraction module is designed, which guides nodes to interact with features within their subgraph via the adjacency matrix. Finally, a hierarchical graph texture feature fusion module is designed to concatenate and stack the structure and texture information at different levels to realize IRSTD. Extensive experiments have been conducted, and the experimental results demonstrate that the proposed method has a high intersection over union (IoU) and probability of detection (Pd) on public datasets and the self-constructed dataset, indicating that it has fine shape segmentation and accurate positioning for infrared small targets. Full article
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)
Show Figures
Figure 1: IRST-CGSeg structure.
Figure 2: ROC curve on the SCISTD1 dataset.
Figure 3: ROC curve on the SCISTD2 dataset.
Figure 4: Visual comparison of different methods on the SCISTD1 dataset. The red box represents correct detection, the yellow box false detection, and the blue box missed detection.
Figure 5: Visual comparison of different methods on the SCISTD2 dataset. The red box represents correct detection, the yellow box false detection, and the blue box missed detection.
Figure 6: Visualization maps of the impact of GFFB on the SCISTD1 dataset.
Figure 7: Visualization maps of the impact of GFFB on the SCISTD2 dataset.
Figure 8: Visualization maps of the hierarchical graph texture fusion mechanism.
Figure 9: Image samples with various complex backgrounds; (a–e) show building, ground, plant, sea, and sky–cloud backgrounds, respectively.
Figure 10: ROC curves under building environments.
Figure 11: ROC curves under ground environments.
Figure 12: ROC curves under plant environments.
Figure 13: ROC curves under sea environments.
Figure 14: ROC curves under sky–cloud environments.
Figure 15: Visual comparison of different methods on five backgrounds. The red box represents correct detection, the yellow box false detection, and the blue box missed detection.
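The abstract quotes IoU and probability of detection (Pd) as the evaluation metrics. The sketch below computes both from binary prediction and ground-truth masks; the hit criterion (any pixel overlap with a ground-truth blob counts as a detection) is a simplifying assumption rather than the paper's exact rule.

# Pixel-level IoU and a target-level probability of detection (Pd) for IRSTD masks.
import numpy as np
from scipy import ndimage

def iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

def probability_of_detection(pred, gt):
    labels, n_targets = ndimage.label(gt)   # each ground-truth blob is one small target
    if n_targets == 0:
        return 1.0
    hits = sum(bool(pred[labels == k].any()) for k in range(1, n_targets + 1))
    return hits / n_targets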
29 pages, 12323 KiB  
Article
Quantitative Remote Sensing Supporting Deep Learning Target Identification: A Case Study of Wind Turbines
by Xingfeng Chen, Yunli Zhang, Wu Xue, Shumin Liu, Jiaguo Li, Lei Meng, Jian Yang, Xiaofei Mi, Wei Wan and Qingyan Meng
Remote Sens. 2025, 17(5), 733; https://doi.org/10.3390/rs17050733 - 20 Feb 2025
Viewed by 284
Abstract
Small Target Detection and Identification (TDI) methods for Remote Sensing (RS) images are mostly inherited from the deep learning models of the Computer Vision (CV) field. Compared with natural images, RS images not only have common features such as shape and texture but also contain unique quantitative information such as spectral features. Therefore, RS TDI in the CV field, which does not use Quantitative Remote Sensing (QRS) information, has the potential to be explored. With the rapid development of high-resolution RS satellites, RS wind turbine detection has become a key research topic for power intelligent inspection. To test the effectiveness of integrating QRS information with deep learning models, the case of wind turbine TDI from high-resolution satellite images was studied. The YOLOv5 model was selected for research because of its stability and high real-time performance. The following methods for integrating QRS and CV for TDI were proposed: (1) Surface reflectance (SR) images obtained using quantitative Atmospheric Correction (AC) were used to make wind turbine samples, and SR data were input into the YOLOv5 model (YOLOv5_AC). (2) A Convolutional Block Attention Module (CBAM) was added to the YOLOv5 network to focus on wind turbine features (YOLOv5_AC_CBAM). (3) Based on the identification results of YOLOv5_AC_CBAM, the spectral, geometric, and textural features selected using expert knowledge were extracted to conduct threshold re-identification (YOLOv5_AC_CBAM_Exp). Accuracy increased from 90.5% to 92.7%, then to 93.2%, and finally to 97.4%. The integration of QRS and CV for TDI showed tremendous potential to achieve high accuracy, and QRS information should not be neglected in RS TDI. Full article
Show Figures
Graphical abstract
Figure 1: Statistics on the number of RS HSR satellites launched in China, the USA, and the world; the "total" refers to the total number of RS satellites launched (https://space.oscar.wmo.int/, last accessed on 23 July 2024).
Figure 2: The basic information of LuoJiaSET's DOTA dataset for TDI (https://captain-whu.github.io/DOTA/index.html, last accessed on 12 May 2024).
Figure 3: The process of GF-2 L1-level image preprocessing. The pixels of GF-2 L1-level images represent the DN value. GF-2 L1-level images may have topographic distortion, which can be eliminated by orthographic correction. The whole process of quantitative AC includes three steps: radiometric calibration, path radiation correction, and adjacency effect correction. After quantitative AC, the pixels in the image represent the spectral reflectance of the ground object. Image fusion consists of fusing the multispectral and panchromatic images of the GF-2 satellite to improve image resolution. GF-2 L1-level images with wind turbines underwent two different preprocessing stages to obtain two different sets of images: SR images and DN value images.
Figure 4: A comparison of images before (left) and after (right) AC. After quantitative AC, the quality and clarity of images were significantly improved.
Figure 5: Labeled examples of wind turbine samples. The labeled samples were obtained through the manual annotation of horizontal boxes. By way of visual annotation, the wind turbine body and the shadow projected on the ground were included in the labeled box as targets.
Figure 6: An overview of the specific research route. The SR data represent the sample database with spectral information after AC. The DN value and SR sample databases were split into training, validation, and testing sets in an 8:1:1 ratio. The three RGB bands of DN value data were used for YOLOv5 model training and testing, and the three RGB bands of SR data were used for YOLOv5_AC and YOLOv5_AC_CBAM training and testing. Based on the identification results of YOLOv5_AC_CBAM, YOLOv5_AC_CBAM_Exp utilized the feature information of the four bands of SR data.
Figure 7: The YOLOv5_AC_CBAM model structure. The red boxes are the added CBAMs in the neck part [41].
Figure 8: The structure of the CBAM attention mechanism (adapted with permission from Ref. [43]). MaxPool finds the maximum value of the feature points in the neighborhood, while AvgPool finds the average of the feature points in the neighborhood. The shared MLP means that the two layers of the neural network are shared.
Figure 9: Six common types of false detection ground objects, including striated ground, green land, road, farmland, building, and power tower. These false detection ground objects are similar in that they are made up of high-reflection and low-shadow pixels; in terms of geometric, texture, and other features, they are significantly different from wind turbines.
Figure 10: Average spectral curves for multiple ground objects (a) and wind turbines (b) with error bars. The sample size for each ground object was set at 20 pixels, and the average spectral reflectance refers to the average value over the 20 adjacent pixels in the area of the ground object. In the spectral curve of the wind turbine, the reflectance of most wind turbine bodies was lowest in the blue band yet greater than 0.2. The shadow of wind turbines consists of dark pixels, so its reflectance was low. The reflectance of most false detection ground objects was lower than 0.2 in the blue band.
Figure 11: Statistical histograms of the blue-band reflectance of false detection ground objects (a) and wind turbines (b). "Frequency" refers to the number of pixels corresponding to a certain reflectance value in the prediction box; "Reflectance" ranges from 0 to 1. Most false detection ground objects have few pixels with reflectance exceeding 0.2 in the blue band, whereas wind turbines have relatively more such pixels. Therefore, the spectral-feature differences between false detection ground objects and wind turbines can be separated in the blue band.
Figure 12: Comparison of the predicted bounding box sizes of false detection ground objects (a–d) and wind turbine targets (e,f).
Figure 13: A comparison of the characteristic GLCM parameters of false detection ground objects and wind turbine targets. "Angle" refers to the four angle values in the GLCM. Apart from CON, it is challenging to distinguish between false detections and wind turbines using HOM, DIS, ENT, ASM, and COR. At the π/4 and 3π/4 angles, the CON of the wind turbine is higher than that of most false detections, so the textural feature differences between false detections and wind turbines can be effectively extracted from CON.
Figure 14: A flowchart of dynamic threshold re-identification aided by high-confidence target feature information. Target identification was performed on the GF-2 SR images using YOLOv5_AC_CBAM. Based on the identification results, objects with a prediction confidence score greater than 0.8 were classified as high-confidence targets, while those with lower scores were considered low-confidence objects and were re-identified. The image texture, quantitative geometry, and spectral reflectance features extracted from the high-confidence targets were used to dynamically adjust the threshold conditions for re-identification, given by Equations (5)–(8). Ultimately, objects that met the conditions of Equations (5)–(8) were confirmed as targets, while those that did not were classified as non-targets.
Figure 15: The missed detection (a) and false detection (b) objects of the YOLOv5 identification. The red boxes mark missed detections and the orange boxes mark false detections of the YOLOv5 model.
Figure 16: The accuracy improvement effects of the three identification methods. The rightmost number represents the identification accuracy and the accuracy increase on the wind turbine test images. Compared with YOLOv5, the identification accuracy of YOLOv5_AC increased by 2.2%; compared with YOLOv5_AC, YOLOv5_AC_CBAM increased by 0.5%; and on the basis of YOLOv5_AC_CBAM, YOLOv5_AC_CBAM_Exp improved by 4.2%. When QRS feature information was added for threshold re-identification, YOLOv5_AC_CBAM_Exp achieved the highest improvement.
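Figures 10 and 11 motivate the spectral part of the threshold re-identification step: most wind-turbine pixels exceed a blue-band surface reflectance of 0.2, while most false detections do not. The sketch below captures only that spectral check; the 10% pixel-fraction rule is an assumption, and the paper's full conditions (Equations (5)–(8)) also involve geometric and GLCM texture features that are omitted here.

# Spectral re-check of a low-confidence detection box using blue-band surface reflectance.
import numpy as np

def spectral_recheck(blue_band_sr, box, reflectance_thr=0.2, frac_thr=0.10):
    """blue_band_sr: 2D surface-reflectance array; box: (row0, row1, col0, col1)."""
    r0, r1, c0, c1 = box
    patch = blue_band_sr[r0:r1, c0:c1]
    bright_fraction = np.mean(patch > reflectance_thr)   # share of 'bright' blue-band pixels
    return bright_fraction >= frac_thr                   # True -> keep as turbine candidate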
20 pages, 21510 KiB  
Article
Visual Localization Method for Fastener-Nut Disassembly and Assembly Robot Based on Improved Canny and HOG-SED
by Xiangang Cao, Mengzhen Zuo, Guoyin Chen, Xudong Wu, Peng Wang and Yizhe Liu
Appl. Sci. 2025, 15(3), 1645; https://doi.org/10.3390/app15031645 - 6 Feb 2025
Viewed by 601
Abstract
Visual positioning accuracy is crucial for ensuring the successful execution of nut disassembly and assembly tasks by a fastener-nut disassembly and assembly robot. However, disturbances such as on-site lighting changes, abnormal surface conditions of nuts, and complex backgrounds formed by ballast in complex railway environments can lead to poor visual positioning accuracy of the fastener nuts, thereby affecting the success rate of the robot’s continuous disassembly and assembly operations. Additionally, the existing method of detecting fasteners first and then positioning nuts has poor applicability in the field. A direct positioning algorithm for spiral rail spikes that combines an improved Canny algorithm with shape feature similarity determination is proposed in response to these issues. Firstly, CLAHE enhances the image, reducing the impact of varying lighting conditions in outdoor work environments on image details. Then, to address the difficulties in extracting the edges of rail spikes caused by abnormal conditions such as water stains, rust, and oil stains on the nuts themselves, the Canny algorithm is improved through three stages, filtering optimization, gradient boosting, and adaptive thresholding, to reduce the impact of edge loss on subsequent rail spike positioning results. Finally, considering the issue of false fitting due to background interference, such as ballast in gradient Hough transformations, the differences in texture and shape features between the rail spike and interference areas are analyzed. The HOG is used to describe the shape features of the area to be screened, and the similarity between the screened area and the standard rail spike template features is compared based on the standard Euclidean distance to determine the rail spike area. Spiral rail spikes are discriminated based on shape features, and the center coordinates of the rail spike are obtained. Experiments were conducted using images collected from the field, and the results showed that the proposed algorithm, when faced with complex environments with multiple interferences, has a correct detection rate higher than 98% and a positioning error mean of 0.9 mm. It exhibits excellent interference resistance and meets the visual positioning accuracy requirements for robot nut disassembly and assembly operations in actual working environments. Full article
(This article belongs to the Section Applied Industrial Technologies)
Show Figures
Figure 1: Fastener-nut disassembly and assembly robot.
Figure 2: The working principle of the robot.
Figure 3: Overall algorithmic process.
Figure 4: Four-direction gradient template.
Figure 5: Fitting results of two methods for circles: (a) least squares; (b) gradient Hough transform.
Figure 6: Filtering algorithm flow.
Figure 7: Experimental sample classification: (a) normal category; (b) water stain category; (c) rust category; (d) oil stain category.
Figure 8: Image enhancement results: (a) the original image; (b) the histogram of the original image; (c) the enhanced image; (d) the histogram of the enhanced image.
Figure 9: Image filtering effect: (a) original image; (b) enhanced image; (c) Gaussian filtering; (d) median filtering; (e) mean filtering; (f) bilateral filtering.
Figure 10: Comparison of fastener edge detection results on (a) a normal surface, (b) a surface with water stains, (c) a rusty surface, and (d) a surface with oil stains; (a1)–(d1) show the traditional Canny results and (a2)–(d2) the improved Canny results for the same four surfaces.
Figure 11: Circle detection results: (a) spike detection successful; (b) spike detection failed.
Figure 12: Spike area screening results: (a) successful filtering results; (b) unsuccessful filtering results.
Figure 13: The success rate of rail spike area screening for different algorithms.
Figure 14: Number of filtering failures for different increment values.
Figure 15: Spike center positioning error.
Figure 16: Positioning results under noise interference: (a) normal surface; (b) surface with water stains; (c) rusty surface; (d) surface with oil stains.
Figure 17: Railway environment rail spike center positioning results: (a) railway ballast interference; (b) water stain interference; (c) water stain interference, shifted to the right; (d) rust interference; (e) rust interference, uneven lighting; (f) oil stain interference, shifted to the left; (g) oil stain interference, background noise; (h) special environment.
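The final screening step describes candidate regions with HOG and compares them with a standard rail-spike template using the standardized Euclidean distance. The sketch below follows that idea with scikit-image and SciPy; the HOG parameters and acceptance threshold are assumptions, not the paper's values, and all patches are assumed to share the same size.

# HOG description of a candidate region plus standardized-Euclidean-distance matching.
import numpy as np
from skimage.feature import hog
from scipy.spatial.distance import seuclidean

def hog_vec(gray_patch):
    return hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def is_rail_spike(candidate_patch, template_patches, threshold=25.0):
    cand = hog_vec(candidate_patch)
    temps = np.stack([hog_vec(t) for t in template_patches])
    variance = temps.var(axis=0) + 1e-8                  # per-dimension variance for SED
    dists = [seuclidean(cand, t, variance) for t in temps]
    return min(dists) < threshold                        # close to a template -> rail spike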
18 pages, 2622 KiB  
Article
Transformer-Based Explainable Model for Breast Cancer Lesion Segmentation
by Huina Wang, Lan Wei, Bo Liu, Jianqiang Li, Jinshu Li, Juan Fang and Catherine Mooney
Appl. Sci. 2025, 15(3), 1295; https://doi.org/10.3390/app15031295 - 27 Jan 2025
Viewed by 809
Abstract
Breast cancer is one of the most prevalent cancers among women, with early detection playing a critical role in improving survival rates. This study introduces a novel transformer-based explainable model for breast cancer lesion segmentation (TEBLS), aimed at enhancing the accuracy and interpretability of breast cancer lesion segmentation in medical imaging. TEBLS integrates a multi-scale information fusion approach with a hierarchical vision transformer, capturing both local and global features by leveraging the self-attention mechanism. This model addresses the limitations of existing segmentation methods, such as the inability to effectively capture long-range dependencies and fine-grained semantic information. Additionally, TEBLS incorporates visualization techniques to provide insights into the segmentation process, enhancing the model’s interpretability for clinical use. Experiments demonstrate that TEBLS outperforms traditional and existing deep learning-based methods in segmenting complex breast cancer lesions with variations in size, shape, and texture, achieving a mean DSC of 81.86% and a mean AUC of 97.72% on the CBIS-DDSM test set. Our model not only improves segmentation accuracy but also offers a more explainable framework, which has the potential to be used in clinical settings. Full article
(This article belongs to the Special Issue Machine Learning and Reasoning for Reliable and Explainable AI)
Show Figures
Figure 1: Overview of the framework. The model begins with patch partitioning and linear embedding to transform input images into sequential data. The encoder consists of multiple swin transformer blocks, patch merging layers, and skip connections for feature fusion. The bottleneck layer processes the encoded features with Multi-Head Self-Attention (MSA) and Multi-Layer Perceptron (MLP) modules. The decoder includes patch expansion, normalization, and linear projection layers to produce high-resolution segmentation outputs. The output is processed through global average pooling and a softmax layer for pixel-level classification, with Grad-CAM used to visualize the results.
Figure 2: The flowchart illustrates the main structure and data flow of TEBLS. The rounded rectangles represent the input and output modules of the model, while the gray rectangles indicate the input data preprocessing module. The blue rectangles represent the transformer-based dense nested feature fusion network module proposed in this paper, with the green rectangular frames representing the encoder and decoder parts of this module, which consist of three swin transformer blocks and swin transformer upsampling, respectively. The yellow rectangles represent the lightweight channel enhancement method based on the multi-scale features module proposed in this paper, which includes group convolution and channel transformation.
Figure 3: The loss and score curve during the model training process indicates that the loss function stabilized when epoch = 150.
Figure 4: A performance comparison of different models in terms of parameter complexity, inference time, and segmentation accuracy. (A) A parameter count comparison shows that TEBLS has the fewest parameters, highlighting its lightweight nature. (B) An inference time comparison, demonstrating that TEBLS was the most efficient model, with faster processing compared to Swin-Unet++ and Swin-Unet. (C) A confusion matrix showing the results of the TEBLS model on the test set shows that the model's sensitivity was 0.7602. (D) Segmentation performance, where TEBLS outperformed other models by accurately capturing lesion regions with clear details at image edges and within lesion areas.
Figure 5: Visualizations of TEBLS outputs. The set includes the original input images, ground truth segmentations, TEBLS predictions, Grad-CAM visualizations highlighting model focus, and superimposed images showing the overlay of Grad-CAM heatmaps on the original images. The Grad-CAM visualizations help illustrate which areas of the image TEBLS prioritized during segmentation, providing insight into the model's decision-making process (TP: true positive; FP: false positive; FN: false negative).
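The abstract reports a mean DSC of 81.86% and a mean AUC of 97.72% on the CBIS-DDSM test set. A minimal sketch of how those two metrics are typically computed from a per-pixel probability map is given below; binarizing at 0.5 is an assumption for illustration.

# Dice similarity coefficient (DSC) and AUC from a predicted probability map.
import numpy as np
from sklearn.metrics import roc_auc_score

def dice(pred_mask, gt_mask, eps=1e-7):
    inter = np.logical_and(pred_mask, gt_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

def evaluate(prob_map, gt_mask):
    dsc = dice(prob_map > 0.5, gt_mask.astype(bool))            # assumed 0.5 threshold
    auc = roc_auc_score(gt_mask.ravel().astype(int), prob_map.ravel())
    return dsc, auc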
21 pages, 5349 KiB  
Article
RST-DeepLabv3+: Multi-Scale Attention for Tailings Pond Identification with DeepLab
by Xiangrui Feng, Caiyong Wei, Xiaojing Xue, Qian Zhang and Xiangnan Liu
Remote Sens. 2025, 17(3), 411; https://doi.org/10.3390/rs17030411 - 25 Jan 2025
Viewed by 454
Abstract
Tailing ponds are used to store tailings or industrial waste discharged after beneficiation. Identifying these ponds in advance can help prevent pollution incidents and reduce their harmful impacts on ecosystems. Tailing ponds are traditionally identified via manual inspection, which is time-consuming and labor-intensive. Therefore, tailing pond identification based on computer vision is of practical significance for environmental protection and safety. In the context of identifying tailings ponds in remote sensing, a significant challenge arises due to high-resolution images, which capture extensive feature details (such as shape, location, and texture), complicated by the mixing of tailings with other waste materials. This results in substantial intra-class variance and limited inter-class variance, making accurate recognition more difficult. Therefore, to monitor tailing ponds, this study utilized an improved version of DeepLabv3+, which is a widely recognized deep learning model for semantic segmentation. We introduced the multi-scale attention modules, ResNeSt and SENet, into the DeepLabv3+ encoder. The split-attention module in ResNeSt captures multi-scale information when processing multiple sets of feature maps, while the SENet module focuses on channel attention, improving the model's ability to distinguish tailings ponds from other materials in images. Additionally, the tailing pond semantic segmentation dataset NX-TPSet was established based on Gaofen-6 (GF-6) imagery. The ablation experiments show that the recognition accuracy (intersection over union, IoU) of the RST-DeepLabv3+ model improved by 1.19% to 93.48% over DeepLabv3+. The multi-attention module enables the model to integrate multi-scale features more effectively, which not only improves segmentation accuracy but also directly contributes to more reliable and efficient monitoring of tailings ponds. The proposed approach achieves top performance on two benchmark datasets, NX-TPSet and TPSet, demonstrating its effectiveness as a practical and superior method for real-world tailing pond identification. Full article
Show Figures
Figure 1: Overall architecture of the RST-DeepLabv3+ model. We incorporated a multi-attention module into DeepLabv3+ to dynamically reallocate weights across different channels.
Figure 2: Architecture of the Squeeze-and-Excitation (SE) block. The SE module introduces a novel architectural unit that performs feature recalibration by explicitly modeling channel-wise relationships. This process consists of two main operations: squeeze and excitation.
Figure 3: Architecture of the split-attention networks.
Figure 4: Geographical location of Lingwu City, Shizuishan City, Ningxia Province, China. (a) Sentinel-2 satellite image of Shizuishan City. (b) Sentinel-2 satellite image of Lingwu City.
Figure 5: Example of semantic segmentation datasets for tailings ponds.
Figure 6: Visualization results of ablation experiments on the NX-TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) DeepLabv3+, (d) DeepLabv3+ + SENet, (e) DeepLabv3+ + ResNeSt, (f) RST-DeepLabv3+. Orange rectangles are missed areas, and red rectangles are error areas.
Figure 7: Visualization results of ablation experiments on the TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) DeepLabv3+, (d) DeepLabv3+ + SENet, (e) DeepLabv3+ + ResNeSt, (f) RST-DeepLabv3+. Orange rectangles are missed areas, and red rectangles are error areas.
Figure 8: Comparison of experimental results on the NX-TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) PSP-Net, (d) U-Net, (e) DeepLabv3+, (f) RST-DeepLabv3+. Orange rectangles are missed areas; red rectangles are error areas.
Figure 9: Comparison of experimental results on the TPSet semantic segmentation dataset. Columns: (a) original image, (b) ground truth, (c) PSP-Net, (d) U-Net, (e) DeepLabv3+, (f) RST-DeepLabv3+. Orange rectangles are missed areas; red rectangles are error areas.
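The Figure 2 caption summarizes the SE block as channel-wise feature recalibration through squeeze and excitation. Below is a minimal PyTorch-style sketch of that standard block; the reduction ratio of 16 is the common default, not necessarily the value used in RST-DeepLabv3+.

# Squeeze-and-Excitation block: global average pooling, bottleneck MLP, sigmoid gating.
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                   # squeeze: per-channel global average
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # excitation: channel-wise rescaling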
19 pages, 7485 KiB  
Article
Design of an Optimal Convolutional Neural Network Architecture for MRI Brain Tumor Classification by Exploiting Particle Swarm Optimization
by Sofia El Amoury, Youssef Smili and Youssef Fakhri
J. Imaging 2025, 11(2), 31; https://doi.org/10.3390/jimaging11020031 - 24 Jan 2025
Viewed by 660
Abstract
The classification of brain tumors using MRI scans is critical for accurate diagnosis and effective treatment planning, though it poses significant challenges due to the complex and varied characteristics of tumors, including irregular shapes, diverse sizes, and subtle textural differences. Traditional convolutional neural network (CNN) models, whether handcrafted or pretrained, frequently fall short in capturing these intricate details comprehensively. To address this complexity, an automated approach employing Particle Swarm Optimization (PSO) has been applied to create a CNN architecture specifically adapted for MRI-based brain tumor classification. PSO systematically searches for an optimal configuration of architectural parameters—such as the types and numbers of layers, filter quantities and sizes, and neuron numbers in fully connected layers—with the objective of enhancing classification accuracy. This performance-driven method avoids the inefficiencies of manual design and iterative trial and error. Experimental results indicate that the PSO-optimized CNN achieves a classification accuracy of 99.19%, demonstrating significant potential for improving diagnostic precision in complex medical imaging applications and underscoring the value of automated architecture search in advancing critical healthcare technology. Full article
(This article belongs to the Special Issue Learning and Optimization for Medical Imaging)
Show Figures
Figure 1: Architecture of a CNN.
Figure 2: Convolution operation.
Figure 3: Flowchart of the algorithm starting with parameter initialization, swarm creation, and fitness evaluation. Each particle iteratively updates its position and velocity using pBest and gBest until the maximum number of iterations is reached.
Figure 4: An encoded CNN architecture with two convolutional layers: the first with 16 kernels of size 5 × 5 and the second with 32 kernels of size 3 × 3, each followed by 2 × 2 max pooling. The architecture ends with two fully connected layers containing 128 and 4 neurons.
Figure 5: Calculation of the difference between two particles.
Figure 6: Separating FC layers from other layers.
Figure 7: Velocity calculation of a single particle.
Figure 8: Particle velocity calculation when gBest and pBest are the same.
Figure 9: Updating the architecture of a particle.
Figure 10: Samples from the dataset.
Figure 11: Data distribution.
Figure 12: Progression of the gBest model's accuracy through iterations.
Figure 13: Progression of gBest training and validation loss.
Figure 14: Progression of gBest training and validation accuracy.
Figure 15: The gBest confusion matrix.
Figure 16: Confusion matrix illustrating the gBest model's classification accuracy and error distribution on the BTD-MRI test set.
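Figure 3 outlines the PSO loop: initialize a swarm, evaluate fitness, and repeatedly update each particle's velocity and position from its personal best (pBest) and the global best (gBest). The toy sketch below conveys only that loop over two continuous hyperparameters with a stand-in fitness function; the paper's particles encode whole layer configurations, which requires the custom difference and velocity operations shown in Figures 5–9.

# Toy PSO loop over two continuous hyperparameters (illustrative stand-in fitness).
import numpy as np

rng = np.random.default_rng(0)

def fitness(pos):                 # placeholder for "train the CNN, return validation accuracy"
    n_filters, log_lr = pos
    return -((n_filters - 64) ** 2) / 1e4 - (log_lr + 3) ** 2

n_particles, n_iters, w, c1, c2 = 10, 20, 0.7, 1.5, 1.5
pos = rng.uniform([8.0, -5.0], [128.0, -1.0], size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best hyperparameters found:", gbest)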
20 pages, 8734 KiB  
Article
Enhancing Blood Cell Diagnosis Using Hybrid Residual and Dual Block Transformer Network
by Vishesh Tanwar, Bhisham Sharma, Dhirendra Prasad Yadav and Ashutosh Dhar Dwivedi
Bioengineering 2025, 12(2), 98; https://doi.org/10.3390/bioengineering12020098 - 22 Jan 2025
Viewed by 723
Abstract
Leukemia is a life-threatening blood cancer that affects a large cross-section of the population, which underscores the great need for timely, accurate, and efficient diagnostic solutions. Traditional methods are time-consuming, subject to human vulnerability, and do not always grasp the subtle morphological differences that form the basic discriminatory features among different leukemia subtypes. The proposed residual vision transformer (ResViT) model breaks these limitations by combining the advantages of ResNet-50 for high dimensional feature extraction and a vision transformer for global attention to the spatial features. ResViT can extract low-level features like texture and edges as well as high-level features like patterns and shapes from the leukemia cell images. Furthermore, we designed a dual-stream ViT with a convolution stream for local details and a transformer stream for capturing the global dependencies, which enables ResViT to pay attention to multiple image regions simultaneously. The evaluation results of the proposed model on the two datasets were more than 99%, which makes it an excellent candidate for clinical diagnostics. Full article
(This article belongs to the Special Issue Machine Learning and Deep Learning Applications in Healthcare)
Show Figures
Figure 1: Proposed ResViT architecture for the diagnosis of leukemia cells.
Figure 2: Confusion matrices obtained from each of the 5-fold cross-validations on dataset-1.
Figure 3: Confusion matrices of dataset-2.
Figure 4: The accuracy and loss over the training and validation sets of dataset-1 are presented in (a) and (b), respectively.
Figure 5: The accuracy and loss over the training and validation sets of dataset-2 are presented in (a) and (b), respectively.
Figure 6: ROC curves for the proposed vision transformer (ViT) model in the classification of 15 leukemia cell types.
Figure 7: Comparison of the results of the proposed ResViT model on two different datasets.
Figure 8: Comparison of the training and validation time on dataset-1 and dataset-2.
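ResViT pairs a ResNet-50 feature extractor with transformer-style global attention over spatial locations. The sketch below is a generic hybrid of that kind, not the published architecture: the layer counts, token dimension, and the 15-class head are assumptions, and the dual-stream design described in the abstract is reduced to a single CNN-then-transformer path for brevity.

# Hedged sketch of a CNN backbone feeding spatial tokens into a transformer encoder.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class HybridResNetTransformer(nn.Module):
    def __init__(self, num_classes=15, depth=2, nhead=8):
        super().__init__()
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])     # -> (N, 2048, 7, 7)
        layer = nn.TransformerEncoderLayer(d_model=2048, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x):                          # x: (N, 3, 224, 224)
        fmap = self.cnn(x)                         # local texture/edge features from the CNN
        tokens = fmap.flatten(2).transpose(1, 2)   # (N, 49, 2048) spatial tokens
        tokens = self.transformer(tokens)          # global self-attention across locations
        return self.head(tokens.mean(dim=1))       # average-pooled token classification

logits = HybridResNetTransformer()(torch.randn(2, 3, 224, 224))       # smoke test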
14 pages, 2060 KiB  
Article
Detection of Acromion Types in Shoulder Magnetic Resonance Image Examination with Developed Convolutional Neural Network and Textural-Based Content-Based Image Retrieval System
by Mehmet Akçiçek, Mücahit Karaduman, Bülent Petik, Serkan Ünlü, Hursit Burak Mutlu and Muhammed Yildirim
J. Clin. Med. 2025, 14(2), 505; https://doi.org/10.3390/jcm14020505 - 14 Jan 2025
Viewed by 663
Abstract
Background: The morphological type of the acromion may play a role in the etiopathogenesis of various pathologies, such as shoulder impingement syndrome and rotator cuff disorders. Therefore, it is important to determine the acromion's morphological types accurately and quickly. This study aimed to detect the acromion shape, which is one of the etiological causes of chronic shoulder disorders that may cause a decrease in work capacity and quality of life, on shoulder MR images by developing a new model for image retrieval in Content-Based Image Retrieval (CBIR) systems. Methods: Image retrieval was performed in CBIR systems using Convolutional Neural Network (CNN) architectures and textural-based methods as the basis. Feature maps of the images were extracted to measure image similarities in the developed CBIR system. Feature extraction was performed with Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), Darknet53, and Densenet201 architectures, and the Minimum Redundancy Maximum Relevance (mRMR) feature selection method was used for feature selection. The feature maps obtained after the dimensionality reduction process were combined. The Euclidean distance and Peak Signal-to-Noise Ratio (PSNR) were used as similarity measurement methods. Image retrieval was performed using features obtained from CNN architectures and textural-based models to compare the performance of the proposed method. Results: The highest Average Precision (AP) value was reached in the PSNR similarity measurement method with 0.76 in the proposed model. Conclusions: The proposed model is promising for accurately and rapidly determining morphological types of the acromion, thus aiding in the diagnosis and understanding of chronic shoulder disorders. Full article
(This article belongs to the Section Nuclear Medicine & Radiology)
Show Figures
Figure 1: Graphical representation of acromion subtypes (a) and sample images of the shoulder MRI data set's Type 1, Type 2, Type 3, and Type 4 classes (b).
Figure 2: Extraction of the feature map for the CBIR method.
Figure 3: The proposed model in the CBIR system.
Figure 4: Twenty images retrieved for a queried image.
Figure 5: Average P-R curve of the Type 1 class.
Figure 6: Average P-R curve of the Type 2 class.
Figure 7: Average P-R curve of the Type 3 class.
Figure 8: Average P-R curve of the Type 4 class.
Figure 9: Average P-R curve of the data set.
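The proposed CBIR pipeline builds a feature map per image from textural and deep descriptors, reduces it with mRMR, and ranks the gallery by Euclidean distance or PSNR. The sketch below keeps only the simplest part of that idea, HOG plus a uniform-LBP histogram ranked by Euclidean distance; the Darknet53/DenseNet201 features, mRMR selection, and PSNR ranking are omitted, the descriptor parameters are assumptions, and all images are assumed to share the same size.

# Simplified textural CBIR: describe images with HOG + LBP, rank by Euclidean distance.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def describe(gray):
    h = hog(gray, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")      # values 0..9
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([h, lbp_hist])

def retrieve(query_img, gallery_imgs, top_k=20):
    q = describe(query_img)
    feats = np.stack([describe(g) for g in gallery_imgs])
    order = np.argsort(np.linalg.norm(feats - q, axis=1))   # smaller distance = more similar
    return order[:top_k]                                    # indices of the top-k matches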
23 pages, 6144 KiB  
Article
Based on the Geometric Characteristics of Binocular Imaging for Yarn Remaining Detection
by Ke Le and Yanhong Yuan
Sensors 2025, 25(2), 339; https://doi.org/10.3390/s25020339 - 9 Jan 2025
Viewed by 474
Abstract
The automated detection of yarn margins is crucial for ensuring the continuity and quality of production in textile workshops. Traditional methods rely on workers visually inspecting the yarn margin to determine the timing of replacement; these methods fail to provide real-time data and cannot meet the precise scheduling requirements of modern production. The complex environmental conditions in textile workshops, combined with the cylindrical shape and repetitive textural features of yarn bobbins, limit the application of traditional visual solutions. Therefore, we propose a visual measurement method based on the geometric characteristics of binocular imaging: First, all contours in the image are extracted, and the distance sequence between the contours and the centroid is extracted. This sequence is then matched with a predefined template to identify the contour information of the yarn bobbin. Additionally, four equations for the tangent line from the camera optical center to the edge points of the yarn bobbin contour are established, and the angle bisectors of each pair of tangents are found. By solving the system of equations for these two angle bisectors, their intersection point is determined, giving the radius of the yarn bobbin. This method overcomes the limitations of monocular vision systems, which lack depth information and suffer from size measurement errors due to the insufficient repeat positioning accuracy when patrolling back and forth. Next, to address the self-occlusion issues and matching difficulties during binocular system measurements caused by the yarn bobbin surface’s repetitive texture, an imaging model is established based on the yarn bobbin’s cylindrical characteristics. This avoids pixel-by-pixel matching in binocular vision and enables the accurate measurement of the remaining yarn margin. The experimental data show that the measurement method exhibits high precision within the recommended working distance range, with an average error of only 0.68 mm. Full article
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>The distribution of weaving machines in a textile workshop: (<b>a</b>) a real textile workshop; (<b>b</b>) the production layout of the textile workshop.</p>
Full article ">Figure 2
<p>A schematic diagram of binocular stereovision measurement: (<b>a</b>) the principle of binocular triangulation. Here, <math display="inline"><semantics> <mrow> <mi>P</mi> </mrow> </semantics></math> is a point in the world coordinate system, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>p</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>p</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the image points on the image planes <math display="inline"><semantics> <mrow> <mi>L</mi> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>R</mi> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the epipolars. (<b>b</b>) A basic model of a pinhole camera. The length in world coordinates is imaged as the pixel on the imaging plane through the camera’s optical center <math display="inline"><semantics> <mrow> <mi>O</mi> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>f</mi> </mrow> </semantics></math> is the camera focal length, and <math display="inline"><semantics> <mrow> <mi>Z</mi> </mrow> </semantics></math> is the distance between point and the binocular camera. (<b>c</b>) The process of binocular pixel matching.</p>
Full article ">Figure 3
<p>The monocular camera imaging process, where light rays pass through the cylinder, cross the image plane at points <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>, and converge at the optical center <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>O</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>. Point <math display="inline"><semantics> <mrow> <mi>O</mi> </mrow> </semantics></math> represents the center of the cylinder’s circular cross-section.</p>
Full article ">Figure 4
<p>Binocular camera imaging process.</p>
Full article ">Figure 5
<p>The imaging process of the cross-section of the yarn bobbin along a vertical axis. Here, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>c</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>c</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the horizontal coordinates of the camera optical centers on the pixel plane; the outer contour points <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>l</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <mo> </mo> <msub> <mrow> <mi>r</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math> are the real-space points <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>L</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>L</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mrow> <mi>R</mi> </mrow> <mrow> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <mo> </mo> <msub> <mrow> <mi>R</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> </semantics></math>; <math display="inline"><semantics> <mrow> <mi>f</mi> </mrow> </semantics></math> is the camera focal length; and <math display="inline"><semantics> <mrow> <mi>b</mi> </mrow> </semantics></math> is the baseline of the binocular camera.</p>
Figure 6">
Figure 6
The process of locating the contour of the yarn bobbin using the centroid distance: (a) the centroid distance sequence template of the yarn bobbin contour; (b) the centroid distance sequence of the yarn bobbin contour; (c) a schematic of the matching process, where the first row shows the extracted contours, the second row shows the corresponding centroid distance sequences, and the third row shows the results of the cross-correlation function between the extracted centroid distance sequences and the centroid distance sequence template.
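The matching idea in (c) can be imitated by describing each candidate contour with a centroid-distance signature and scoring it against a stored template with a normalized circular cross-correlation. A rough numpy sketch, assuming contours arrive as N×2 arrays of (x, y) points; the resampling length and normalization are my own choices, not the paper's:

```python
import numpy as np

def centroid_distance_sequence(contour, n_samples=128):
    """Distance from the contour centroid to each boundary point,
    resampled to a fixed length so sequences are comparable."""
    contour = np.asarray(contour, dtype=float)
    centroid = contour.mean(axis=0)
    dist = np.linalg.norm(contour - centroid, axis=1)
    idx = np.linspace(0, len(dist) - 1, n_samples)
    seq = np.interp(idx, np.arange(len(dist)), dist)
    return seq / (seq.max() + 1e-9)          # scale-invariant signature

def match_score(seq, template):
    """Peak of the circular normalized cross-correlation between a
    candidate signature and a template of the same length."""
    a = seq - seq.mean()
    b = template - template.mean()
    corr = np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))).real
    return corr.max() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
```

A candidate contour is accepted as the bobbin when its score against the template exceeds a chosen threshold; the circular correlation makes the score insensitive to where the contour tracing starts.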
Figure 7">
Figure 7
A schematic of epipolar geometry, where P is a point in the world coordinate system, p1 and p2 are its image points on the image planes L and R, and the epipoles e1 and e2 are the intersections of the baseline O1O2 with the image planes L and R. The plane formed by O1, O2, and P is called the epipolar plane: (a) the original epipolar geometry diagram; (b) the epipolar geometry diagram after epipolar rectification.
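Rectification as in (b) is usually computed once from a prior stereo calibration so that corresponding points end up on the same image row. A hedged OpenCV sketch; every calibration value below is a placeholder rather than a number from this work:

```python
import cv2
import numpy as np

# Placeholder intrinsics/extrinsics standing in for a prior stereo calibration.
K1 = K2 = np.array([[1200.0, 0, 640], [0, 1200.0, 360], [0, 0, 1]])
dist1 = dist2 = np.zeros(5)
R = np.eye(3)                           # rotation between the two cameras
T = np.array([[-0.06], [0.0], [0.0]])   # 6 cm baseline along x
size = (1280, 720)

# Rectifying rotations R1, R2 and the new projection matrices P1, P2.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, dist1, K2, dist2, size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, size, cv2.CV_32FC1)
# left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
```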
Figure 8">
Figure 8
The contour localization process: (a) the original image; (b) contours detected using structured forests; (c) contours after filtering; (d) extracted contour centroids; (e) the centroid distance sequence image; (f) the image after localization of the yarn bobbin contours.
Figure 9
Measurement results at different distances: (a) a schematic of measurements of the yarn bobbin at different distances, where numbers 1–12 represent the sequential positions on the camera mount and the arrow indicates the direction of camera movement; (b) the measured yarn bobbin samples, from left to right: yarn bobbin1 and yarn bobbin2.
Figure 10
Measurement results at different distances: (a) the measurement results of yarn bobbin1 at different distances; (b) the measurement results of yarn bobbin2 at different distances.
Figure 11
Measurement results at different angles: (a) a schematic of measurements at different camera positions, where numbers 1–14 represent the sequential positions on the camera mount and the arrow indicates the direction of camera movement; (b) the measured yarn bobbin samples, from left to right: yarn bobbin1, yarn bobbin2, and yarn bobbin3; (c) the measured sizes at different camera positions.
Figure 12
Yarn bobbin samples, where the numbers indicate the samples in ascending order of yarn bobbin size: (a) a sample captured in the laboratory; (b) a sample captured in the production workshop.
Figure 13
Measurement results. Points that the binocular vision method failed to match produced excessively large errors and are not shown in the error bar chart: (a) a comparison of measurement errors in the yarn bobbin radius using different methods in a laboratory environment; (b) a comparison of measurement errors in the yarn bobbin radius using different methods in a production workshop environment.
">
14 pages, 3058 KiB  
Article
A Combined Frame Difference and Convolution Method for Moving Vehicle Detection in Satellite Videos
by Xin Luo, Jiatian Li, Xiaohui A and Yuxi Deng
Sensors 2025, 25(2), 306; https://doi.org/10.3390/s25020306 - 7 Jan 2025
Viewed by 462
Abstract
To address the challenges of missed detections caused by insufficient shape and texture features and blurred boundaries in existing detection methods, this paper introduces a novel moving vehicle detection approach for satellite videos. The proposed method leverages frame difference and convolution to effectively integrate spatiotemporal information. First, a frame difference module (FDM) is designed, combining frame difference and convolution. This module extracts motion features between adjacent frames using frame difference, refines them through backpropagation in the neural network, and integrates them with the current frame to compensate for the missing motion features in single-frame images. Next, the initial features are processed by a backbone network to further extract spatiotemporal feature information. The neck incorporates deformable convolution, which adaptively adjusts convolution kernel sampling positions, optimizing feature representation and enabling effective multiscale information integration. Additionally, shallow large-scale feature maps, which use smaller receptive fields to focus on small targets and reduce background interference, are fed into the detection head. To enhance small-target feature representation, a small-target self-reconstruction module (SR-TOD) is introduced between the neck and the detection head. Experiments using the Jilin-1 satellite video dataset demonstrate that the proposed method outperforms comparison models, significantly reducing missed detections caused by weak color and texture features and blurred boundaries. For the satellite-video moving vehicle detection task, this method achieves notable improvements, with an average F1-score increase of 3.9% and a per-frame processing speed enhancement of 7 s compared to the next best model, DSFNet. Full article
(This article belongs to the Section Vehicular Sensing)
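As a rough illustration of the frame-difference idea described in the abstract above (not the authors' FDM itself), the absolute differences of adjacent frames can be embedded by a small convolution and fused with the current frame; the module layout and channel counts below are assumptions of mine:

```python
import torch
import torch.nn as nn

class FrameDifferenceSketch(nn.Module):
    """Toy stand-in for a frame-difference module: encode |I_t - I_{t-1}|
    and |I_{t+1} - I_t|, then fuse the motion cue with the current frame."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.diff_conv = nn.Conv2d(2 * in_ch, feat_ch, 3, padding=1)
        self.fuse_conv = nn.Conv2d(feat_ch + in_ch, feat_ch, 3, padding=1)

    def forward(self, prev_f, cur_f, next_f):
        diffs = torch.cat([(cur_f - prev_f).abs(), (next_f - cur_f).abs()], dim=1)
        motion = torch.relu(self.diff_conv(diffs))
        return torch.relu(self.fuse_conv(torch.cat([motion, cur_f], dim=1)))

# Three consecutive RGB frames of the same size.
m = FrameDifferenceSketch()
out = m(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(out.shape)   # torch.Size([1, 16, 64, 64])
```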
Show Figures

Figure 1
The general framework of the methodology proposed in this paper.
Figure 2">
Figure 2
Efficient multiscale attention (EMA) module [30].
Figure 3
Frame difference module (FDM) architecture.
Figure 4
Feature extraction network architecture.
Figure 5
Self-reconstructed tiny object detection framework (SR-TOD).
Figure 6
Sample presentation of the Jilin-1 satellite dataset.
Figure 7
Results of different comparison methods in the test dataset. Green boxes: TP. Red boxes: FN. Blue boxes: FP.
Figure 8
Visual comparison of the traditional frame difference method and the use of FDM. (a) Example of Jilin-1 satellite image (green dots represent ground truth). (b) Traditional frame difference method. (c) Using the frame difference module (FDM).
">
29 pages, 17674 KiB  
Article
Noise-Perception Multi-Frame Collaborative Network for Enhanced Polyp Detection in Endoscopic Videos
by Haoran Li, Guoyong Zhen, Chengqun Chu, Yuting Ma and Yongnan Zhao
Electronics 2025, 14(1), 62; https://doi.org/10.3390/electronics14010062 - 27 Dec 2024
Viewed by 598
Abstract
The accurate detection and localization of polyps during endoscopic examinations are critical for early disease diagnosis and cancer prevention. However, the presence of artifacts and noise, along with the high similarity between polyps and surrounding tissues in color, shape, and texture, complicates polyp detection in video frames. To tackle these challenges, we deployed multivariate regression analysis to refine the model and introduced a Noise-Suppressing Perception Network (NSPNet) designed for enhanced performance. NSPNet leverages wavelet transform to enhance the model’s resistance to noise and artifacts while improving a multi-frame collaborative detection strategy for dynamic polyp detection in endoscopic videos, efficiently utilizing temporal information to strengthen features across frames. Specifically, we designed a High-Low Frequency Feature Fusion (HFLF) framework, which allows the model to capture high-frequency details more effectively. Additionally, we introduced an improved STFT-LSTM Polyp Detection (SLPD) module that utilizes temporal information from video sequences to enhance feature fusion in dynamic environments. Lastly, we integrated an Image Augmentation Polyp Detection (IAPD) module to improve performance on unseen data through preprocessing enhancement strategies. Extensive experiments demonstrate that NSPNet outperforms nine SOTA methods across four datasets on key performance metrics, including F1-score and recall. Full article
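The high-/low-frequency separation that the HFLF idea builds on can be reproduced with a single-level 2D discrete wavelet transform. A minimal PyWavelets sketch; the choice of the Haar wavelet and the way the detail bands are combined are my own assumptions, not details stated in the abstract:

```python
import numpy as np
import pywt

def split_frequencies(gray_frame):
    """One-level 2D DWT: cA is the low-frequency approximation,
    (cH, cV, cD) carry horizontal/vertical/diagonal high-frequency detail."""
    cA, (cH, cV, cD) = pywt.dwt2(gray_frame.astype(np.float32), "haar")
    high = np.sqrt(cH**2 + cV**2 + cD**2)   # combined detail magnitude
    return cA, high

frame = np.random.rand(256, 256)
low, high = split_frequencies(frame)
print(low.shape, high.shape)   # (128, 128) (128, 128)
```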
Show Figures

Figure 1
The first image shows normal content without artifacts or noise. The second image has artifacts, noise, and camera shake, reducing quality. In the third to sixth image sequence, intermediate frames show artifacts, while surrounding frames are normal, highlighting the contrast.
Figure 2
Schematic diagram of the model architecture. (a) illustrates the perception region of earlier models, where noise and artifacts were included during training, negatively impacting model performance. In contrast, (b) demonstrates the use of wavelet transform for frequency decomposition, which enhances detail capture and noise suppression, enabling precise recognition of complex textures and colors.
Figure 3
Overall framework diagram. IAPD applies diverse augmentations to enhance adaptability, HFLF leverages wavelet decomposition and attention for feature optimization, and SLPD integrates multi-frame collaboration with spatiotemporal attention for dynamic information extraction.
Figure 4
HFLF framework overview. (a) illustrates the architecture of the early model, which extracts and outputs features through multiple convolutional layers but fails to fully consider the frequency components and feature selectivity. (b) presents the improved model framework, which incorporates WaveletNet to independently extract high- and low-frequency features, while integrated attention mechanisms further enhance feature selectivity and suppress noise. (c) demonstrates the improved model's ability to process features across different frequency bands, highlighting the complementary nature of high-frequency detail capture and low-frequency semantic representation. (d) illustrates the model's interpretability, offering a clear visualization of the learned features to aid understanding.
Figure 5
This figure illustrates the design of the SLPD-Module, which integrates spatial and temporal modeling using Convolutional Neural Networks (CNNs), deformable convolutions, LSTM networks, and attention mechanisms to capture dynamic inter-frame associations and enhance polyp detection performance in video sequences. (a) demonstrates the process of feature handling during the training and testing phases, including feature optimization and detection result generation. (b) depicts the architecture of the SLPD-Module, which combines spatial and temporal modeling to capture dynamic information in video sequences, thereby improving polyp detection performance.
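The temporal side of such a module can be approximated by feeding per-frame feature vectors through an LSTM and letting the resulting context gate the centre frame's features. A toy PyTorch sketch whose feature sizes and gating scheme are assumptions of mine, not the paper's design:

```python
import torch
import torch.nn as nn

class TemporalFusionSketch(nn.Module):
    """Fuse a short clip of per-frame feature vectors with an LSTM and
    gate the centre frame's features with the resulting context."""
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.gate = nn.Linear(hidden, feat_dim)

    def forward(self, clip_feats):            # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(clip_feats)   # h_n: (1, B, hidden)
        weights = torch.sigmoid(self.gate(h_n[-1]))        # (B, feat_dim)
        centre = clip_feats[:, clip_feats.size(1) // 2]    # centre frame
        return centre * weights

fusion = TemporalFusionSketch()
print(fusion(torch.rand(2, 5, 256)).shape)   # torch.Size([2, 256])
```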
Figure 6">
Figure 6
Training loss curves. The training loss curves illustrate the progressive convergence of different models during the training process. Notably, NSPNet demonstrates a more rapid and stable convergence compared to STFT and other models, with its final loss consistently lower than that of STFT.
Figure 7
Detailed comparison of results. Green represents the ground truth, and yellow represents the predicted box.
Figure 8
Attention mechanism. By dividing multi-head attention into chunks and inputting multi-frame features into an LSTM for temporal information extraction, adaptive enhancement is achieved. The attention mechanism splits the features into multiple chunks, processing one chunk at a time to reduce memory consumption. Each chunk computes queries, keys, and values, and generates attention weights using matrix multiplication and Softmax operations.
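Chunking the attention computation as described trades one large score matrix for several small ones while producing the same result. A small numpy sketch of the idea; the chunk size and tensor shapes are illustrative only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunked_attention(q, k, v, chunk=64):
    """Scaled dot-product attention computed over query chunks so that
    only a (chunk x N) score matrix is held in memory at a time."""
    d = q.shape[-1]
    out = np.empty_like(q)
    for s in range(0, q.shape[0], chunk):
        scores = q[s:s + chunk] @ k.T / np.sqrt(d)   # (chunk, N)
        out[s:s + chunk] = softmax(scores) @ v
    return out

q = k = v = np.random.rand(512, 32)
print(chunked_attention(q, k, v).shape)   # (512, 32)
```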
Figure 9">
Figure 9
This figure demonstrates how the IAPD module enhances model robustness and adaptability by applying various image augmentation techniques, such as Gaussian blur, scaling, rotation, and mixed transformations, to improve generalization in low-quality and diverse datasets.
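A comparable augmentation stack can be assembled from standard torchvision transforms; the kernel sizes, angles, and probabilities below are guesses for illustration, not the settings used in the paper:

```python
import torch
from torchvision import transforms

# Illustrative augmentation pipeline: blur, scale, rotate, plus colour jitter.
augment = transforms.Compose([
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))], p=0.3),
    transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

frame = torch.rand(3, 288, 288)          # a dummy RGB frame tensor in [0, 1]
print(augment(frame).shape)              # torch.Size([3, 256, 256])
```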
Figure 10">
Figure 10
This figure illustrates the model's training and convergence process, showcasing the progressive optimization of focus areas through Class Activation Mapping (CAM). It visually reflects the model's enhanced ability to concentrate on dynamic target features and suppress background interference from the initial iterations to the convergence stage.
Figure A1
Detailed comparison of results. Green represents the ground truth, and yellow represents the predicted box.
Figure A2
The results presented here highlight our model's performance on videos. Green outlines represent the ground truth, while yellow outlines indicate the predicted bounding boxes. Our model demonstrates outstanding performance even under challenging conditions, such as severe artifact interference (indicated by blue arrows) and extremely uneven illumination (indicated by purple arrows).
">
21 pages, 6234 KiB  
Article
Data-Efficient Bone Segmentation Using Feature Pyramid-Based SegFormer
by Naohiro Masuda, Keiko Ono, Daisuke Tawara, Yusuke Matsuura and Kentaro Sakabe
Sensors 2025, 25(1), 81; https://doi.org/10.3390/s25010081 - 26 Dec 2024
Viewed by 625
Abstract
The semantic segmentation of bone structures demands pixel-level classification accuracy to create reliable bone models for diagnosis. While Convolutional Neural Networks (CNNs) are commonly used for segmentation, they often struggle with complex shapes due to their focus on texture features and limited ability to incorporate positional information. As orthopedic surgery increasingly requires precise automatic diagnosis, we explored SegFormer, an enhanced Vision Transformer model that better handles spatial awareness in segmentation tasks. However, SegFormer’s effectiveness is typically limited by its need for extensive training data, which is particularly challenging in medical imaging, where obtaining labeled ground truths (GTs) is a costly and resource-intensive process. In this paper, we propose two models and their combination to enable accurate feature extraction from smaller datasets by improving SegFormer. Specifically, these include the data-efficient model, which deepens the hierarchical encoder by adding convolution layers to transformer blocks and increases feature map resolution within transformer blocks, and the FPN-based model, which enhances the decoder through a Feature Pyramid Network (FPN) and attention mechanisms. Testing our model on spine images from the Cancer Imaging Archive and our own hand and wrist dataset, ablation studies confirmed that our modifications outperform the original SegFormer, U-Net, and Mask2Former. These enhancements enable better image feature extraction and more precise object contour detection, which is particularly beneficial for medical imaging applications with limited training data. Full article
(This article belongs to the Section Biomedical Sensors)
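The FPN referenced in the abstract above merges a coarse, semantically strong map with finer maps through lateral 1×1 convolutions and top-down upsampling. A minimal PyTorch sketch of this generic pattern with made-up channel counts; it is not the authors' decoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Top-down feature pyramid over three encoder stages (coarse to fine)."""
    def __init__(self, in_channels=(256, 128, 64), out_ch=64):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, c5, c4, c3):           # coarse -> fine encoder maps
        p5 = self.lateral[0](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[2](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        return [self.smooth[i](p) for i, p in enumerate((p5, p4, p3))]

fpn = TinyFPN()
outs = fpn(torch.rand(1, 256, 16, 16), torch.rand(1, 128, 32, 32), torch.rand(1, 64, 64, 64))
print([o.shape for o in outs])   # all maps share 64 channels at 16, 32, 64 px
```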
Show Figures

Figure 1
SegFormer architecture.
Figure 2
Proposed model architecture.
Figure 3
Data-efficient encoder architecture.
Figure 4
Model-wise IoU for spine images.
Figure 5
Model-wise IoU for hand and wrist images.
Figure 6
Model-wise IoU for femur images.
Figure A1
Datasets.
Figure A1 Cont.
Datasets.
Figure A2
Spine segmentation.
Figure A2 Cont.
Spine segmentation.
Figure A3
Hand and wrist segmentation.
Figure A3 Cont.
Hand and wrist segmentation.
Figure A4
Femur segmentation.
Figure A4 Cont.
Femur segmentation.
">
24 pages, 5004 KiB  
Article
SymSwin: Multi-Scale-Aware Super-Resolution of Remote Sensing Images Based on Swin Transformers
by Dian Jiao, Nan Su, Yiming Yan, Ying Liang, Shou Feng, Chunhui Zhao and Guangjun He
Remote Sens. 2024, 16(24), 4734; https://doi.org/10.3390/rs16244734 - 18 Dec 2024
Viewed by 805
Abstract
Despite the successful application of remote sensing images in agriculture, meteorology, and geography, their relatively low spatial resolution hinders further applications. Super-resolution technology is introduced to overcome this limitation. It is a challenging task due to the variations in object size and texture in remote sensing images. To address this problem, we present SymSwin, a super-resolution model based on the Swin transformer that aims to capture multi-scale context. The symmetric multi-scale window (SyMW) mechanism is proposed and integrated into the backbone, making it capable of perceiving features of various sizes. First, the SyMW mechanism captures discriminative contextual features from multi-scale representations using the corresponding attentive window sizes. Subsequently, a cross-receptive field-adaptive attention (CRAA) module is introduced to model the relations among multi-scale contexts and to realize adaptive fusion. Furthermore, RS data exhibit poor spatial resolution, leading to insufficient visual information when only spatial supervision is applied. Therefore, a U-shape wavelet transform (UWT) loss is proposed to facilitate the training process from the frequency domain. Extensive experiments demonstrate that our method achieves superior performance in both quantitative metrics and visual quality compared with existing algorithms. Full article
Show Figures

Figure 1
(a) Overall architecture of SymSwin, containing three main functional stages. The chief deep feature extraction stage involves SyMWBs and CRAAs. (b) Detailed illustration of the SyMWB composition. (c) Detailed illustration of the CRAA module. (d) Detailed illustration of the Swin-DCFF layer. SW-SA denotes conventional shifted-window self-attention. (e) Detailed illustration of DCFF.
Figure 2">
Figure 2
Indication of the SyMW mechanism, where the legend markers denote the window for SyMWB_i, the window for SyMWB_(i+1), and the feature map of SyMWB_i, respectively. Each feature map represents the extraction of a whole block. The grid denotes the window size used on each feature map. The illustration intuitively demonstrates that the SyMW can provide multi-scale context.
Figure 3
Illustration of the CRAA module, containing two main functional stages. During the CRA stage, we calculate the correlation between contexts with different receptive fields and achieve flexible fusion. During the AFF stage, we adaptively enhance the fused feature.
Figure 4
Illustration of the SWT process. The color space conversion converts an image from RGB space to YCrCb space, and we select the Y-band value, representing the luminance information. LF denotes the low-frequency sub-band, and HF denotes the high-frequency sub-bands. The sketches of HF directly depict the edges in the horizontal, vertical, and diagonal directions.
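The steps in this caption (RGB to YCrCb, keep the luminance channel, split it into one low-frequency and three high-frequency sub-bands) can be reproduced with OpenCV and PyWavelets. The sketch below assumes a single-level Haar transform and an L1 comparison of the sub-bands, which is one plausible way to supervise in the frequency domain rather than the paper's exact UWT loss:

```python
import cv2
import numpy as np
import pywt

def luminance_subbands(rgb_image):
    """Y-channel wavelet sub-bands: LF approximation plus the
    horizontal, vertical, and diagonal HF detail bands."""
    ycrcb = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YCrCb)
    y = ycrcb[..., 0].astype(np.float32)
    lf, (hf_h, hf_v, hf_d) = pywt.dwt2(y, "haar")
    return lf, (hf_h, hf_v, hf_d)

def wavelet_l1_loss(sr_rgb, hr_rgb):
    """L1 distance between the sub-bands of a super-resolved image and
    its reference, one way to add frequency-domain supervision."""
    sr_bands = luminance_subbands(sr_rgb)
    hr_bands = luminance_subbands(hr_rgb)
    loss = np.abs(sr_bands[0] - hr_bands[0]).mean()
    for a, b in zip(sr_bands[1], hr_bands[1]):
        loss += np.abs(a - b).mean()
    return loss

img = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
print(wavelet_l1_loss(img, img))   # 0.0 for identical images
```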
Figure 5">
Figure 5
Visualization examples of the ×4 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 5 Cont.
Visualization examples of the ×4 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 6
Visualization examples of the ×3 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 6 Cont.
Visualization examples of the ×3 super-resolution reconstruction results for the algorithms compared in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. The PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red font and the second-ranked in blue font. The inset on the right is a magnified view of the region enclosed by the red bounding box in the main image. Zoom in for better observation.
Figure 7
A comparison of the visualized feature maps extracted by each layer of the backbone with and without multi-scale representations, illustrating the different regions of interest the networks tend to focus on. Colors closer to red denote stronger attention.
">