Search Results (2,364)

Search Parameters:
Keywords = deep metric learning

20 pages, 1728 KiB  
Article
Drug–Target Affinity Prediction Based on Cross-Modal Fusion of Text and Graph
by Jucheng Yang and Fushun Ren
Appl. Sci. 2025, 15(6), 2901; https://doi.org/10.3390/app15062901 (registering DOI) - 7 Mar 2025
Abstract
Drug–target affinity (DTA) prediction is a critical step in virtual screening and significantly accelerates drug development. However, existing deep learning-based methods relying on single-modal representations (e.g., text or graphs) struggle to fully capture the complex interactions between drugs and targets. This study proposes CM-DTA, a cross-modal feature fusion model that integrates drug textual representations and molecular graphs with target protein amino acid sequences and structural graphs, enhancing feature diversity and expressiveness. The model employs the multi-perceptive neighborhood self-attention aggregation strategy to capture first- and second-order neighborhood information, overcoming limitations in graph isomorphism networks (GIN) for structural representation. The experimental results on the Davis and KIBA datasets show that CM-DTA significantly improves the performance of drug–target affinity prediction, achieving higher accuracy and better prediction metrics compared to state-of-the-art (SOTA) models. Full article
Figures: (1) overview of representative methods for DTA prediction; (2) overall architecture of CM-DTA; (3) multi-perceptive neighborhood self-attention aggregation strategy; (4) cross-modal bidirectional adaptive guided fusion strategy; (5) single guided-attention cross-modal fusion module; (6) data preprocessing; (7) prediction performance on the Davis and KIBA datasets; (8) real affinity against predicted values on the Davis and KIBA datasets.
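As a rough illustration of the cross-modal fusion idea summarized above (not the CM-DTA implementation; the module name, embedding sizes, and gating scheme are assumptions), here is a two-branch PyTorch sketch that fuses a text-derived and a graph-derived drug embedding with a learned gate before regressing an affinity score:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch: fuse a text-derived and a graph-derived embedding."""
    def __init__(self, text_dim=256, graph_dim=256, hidden=128):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.graph_proj = nn.Linear(graph_dim, hidden)
        # a simple bidirectional gate: each modality weights the other
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.regressor = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text_emb, graph_emb):
        t = torch.relu(self.text_proj(text_emb))
        g = torch.relu(self.graph_proj(graph_emb))
        gate = self.gate(torch.cat([t, g], dim=-1))
        fused = torch.cat([gate * t, (1 - gate) * g], dim=-1)
        return self.regressor(fused).squeeze(-1)  # predicted affinity

# toy usage: a batch of 4 drug-target pairs with precomputed embeddings
model = CrossModalFusion()
affinity = model(torch.randn(4, 256), torch.randn(4, 256))
print(affinity.shape)  # torch.Size([4])
```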
22 pages, 2432 KiB  
Article
A Framework for Integrating Deep Learning and Symbolic AI Towards an Explainable Hybrid Model for the Detection of COVID-19 Using Computerized Tomography Scans
by Vengai Musanga, Serestina Viriri and Colin Chibaya
Information 2025, 16(3), 208; https://doi.org/10.3390/info16030208 (registering DOI) - 7 Mar 2025
Abstract
The integration of Deep Learning and Symbolic Artificial Intelligence (AI) offers a promising hybrid framework for enhancing diagnostic accuracy and explainability in critical applications such as COVID-19 detection using computerized tomography (CT) scans. This study proposes a novel hybrid AI model that leverages the strengths of both approaches: the automated feature extraction and classification capabilities of Deep Learning and the logical reasoning and interpretability of Symbolic AI. Key components of the model include the adaptive deformable module, which improves spatial feature extraction by addressing variations in lung anatomy, and the attention-based encoder, which enhances feature saliency by focusing on critical regions within CT scans. Experimental validation using performance metrics such as F1-score, accuracy, precision, and recall demonstrates the model’s significant improvement over baseline configurations, achieving near-perfect accuracy (99.16%) and F1-score (0.9916). This hybrid AI framework not only achieves state-of-the-art diagnostic performance but also ensures interpretability through its symbolic reasoning layer, facilitating its adoption in healthcare settings. The findings underscore the potential of combining advanced machine learning techniques with symbolic approaches to create robust and transparent AI systems for critical medical applications. Full article
Figures: graphical abstract; (1) framework for image classification using Deep Learning methods; (2) framework for a Symbolic AI system; (3) overview of the hybrid AI model architecture integrating Deep Learning and Symbolic AI; (4) confusion matrix; (5) training and validation accuracy; (6) training and validation loss; (7) faithful analysis: impact of feature removal; (8) comparison of consistency across methods; (9) k-fold cross-validation performance.
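A minimal sketch of the neuro-symbolic idea, assuming an invented decision rule and confidence threshold rather than the paper's symbolic reasoning layer: a hand-written rule post-processes a classifier's softmax probabilities and returns an explanation string.

```python
import torch
import torch.nn.functional as F

def symbolic_layer(probs, min_confidence=0.9):
    """Hypothetical symbolic post-processing of neural class probabilities.

    Rule: accept the network's 'covid' call only if its confidence exceeds a
    threshold; otherwise defer to a human reader. Returns (label, explanation).
    """
    classes = ["normal", "covid"]
    conf, idx = probs.max(dim=-1)
    conf, idx = conf.item(), int(idx)
    if classes[idx] == "covid" and conf >= min_confidence:
        return "covid", f"network confidence {conf:.2f} >= {min_confidence}"
    if classes[idx] == "covid":
        return "refer", f"confidence {conf:.2f} below threshold, refer to radiologist"
    return "normal", f"network confidence {conf:.2f} for 'normal'"

# toy usage with made-up logits standing in for a CNN's output on one CT scan
logits = torch.tensor([0.3, 2.1])
label, reason = symbolic_layer(F.softmax(logits, dim=-1))
print(label, "-", reason)
```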
21 pages, 2488 KiB  
Article
Classification of Mycena and Marasmius Species Using Deep Learning Models: An Ecological and Taxonomic Approach
by Fatih Ekinci, Guney Ugurlu, Giray Sercan Ozcan, Koray Acici, Tunc Asuroglu, Eda Kumru, Mehmet Serdar Guzel and Ilgaz Akata
Sensors 2025, 25(6), 1642; https://doi.org/10.3390/s25061642 - 7 Mar 2025
Viewed by 100
Abstract
Fungi play a critical role in ecosystems, contributing to biodiversity and providing economic and biotechnological value. In this study, we developed a novel deep learning-based framework for the classification of seven macrofungi species from the genera Mycena and Marasmius, leveraging their unique ecological and morphological characteristics. The proposed approach integrates a custom convolutional neural network (CNN) with a self-organizing map (SOM) adapted for supervised learning and a Kolmogorov–Arnold Network (KAN) layer to enhance classification performance. The experimental results demonstrate significant improvements in classification metrics when using the CNN-SOM and CNN-KAN architectures. Additionally, advanced pretrained models such as MaxViT-S and ResNetV2-50 achieved high accuracy rates, with MaxViT-S achieving 98.9% accuracy. Statistical analyses using the chi-square test confirmed the reliability of the results, emphasizing the importance of validating evaluation metrics statistically. This research represents the first application of SOM in fungal classification and highlights the potential of deep learning in advancing fungal taxonomy. Future work will focus on optimizing the KAN architecture and expanding the dataset to include more fungal classes, further enhancing classification accuracy and ecological understanding. Full article
Figures: (1) the proposed CNN-SOM architecture; (2) a macroscopic overview of Mycena species; (3) a macroscopic overview of Marasmius species.
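The chi-square check mentioned in the abstract can be reproduced in outline with SciPy; the correct/incorrect counts below are invented purely to show the mechanics.

```python
from scipy.stats import chi2_contingency

# Hypothetical correct/incorrect counts for two classifiers on the same test set
# (rows: models, columns: [correct, incorrect]); the numbers are made up.
table = [
    [989, 11],   # e.g. a MaxViT-S-like model
    [962, 38],   # e.g. a plain CNN baseline
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
# A small p-value suggests the accuracy difference is unlikely to be chance.
```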
52 pages, 29859 KiB  
Review
2D Object Detection: A Survey
by Emanuele Malagoli and Luca Di Persio
Mathematics 2025, 13(6), 893; https://doi.org/10.3390/math13060893 (registering DOI) - 7 Mar 2025
Viewed by 141
Abstract
Object detection is a fundamental task in computer vision, aiming to identify and localize objects of interest within an image. Over the past two decades, the domain has changed profoundly, evolving into an active and fast-moving field while simultaneously becoming the foundation for a wide range of modern applications. This survey provides a comprehensive review of the evolution of 2D generic object detection, tracing its development from traditional methods relying on handcrafted features to modern approaches driven by deep learning. The review systematically categorizes contemporary object detection methods into three key paradigms: one-stage, two-stage, and transformer-based, highlighting their development milestones and core contributions. The paper provides an in-depth analysis of each paradigm, detailing landmark methods and their impact on the progression of the field. Additionally, the survey examines some fundamental components of 2D object detection such as loss functions, datasets, evaluation metrics, and future trends. Full article
(This article belongs to the Special Issue Advanced Research in Image Processing and Optimization Methods)
Figures: graphical abstract; (1) object detection pipeline (bounding boxes, class labels, confidence scores); (2) milestones of 2D generic object detection, from traditional handcrafted-feature detectors (Viola–Jones, HOG, DPM) to deep learning-based two-stage, one-stage, and transformer-based detectors; (3) rectangular features selected by AdaBoost in the Viola–Jones detector, and an example detection with the deformable part-based model (DPM); (4) the R-CNN architecture; (5) Mask R-CNN results on MS-COCO; (6) the SSD architecture; (7) anchor-free keypoint-based methods (CornerNet, CenterNet, ExtremeNet); (8) FCOS bounding-box encoding and center-ness; (9) the YOLOv1 detection pipeline; (10) the DETR detection pipeline; (11) the ViT-FRCNN detection pipeline; (12) example images and annotations from Pascal VOC, MS-COCO, and Open Images.
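Since the survey also covers evaluation metrics, a compact reference implementation of the quantity underlying most of them, intersection over union (IoU) between two axis-aligned boxes in (x1, y1, x2, y2) format, may be helpful.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```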
30 pages, 1422 KiB  
Article
A Comparative Analysis of Compression and Transfer Learning Techniques in DeepFake Detection Models
by Andreas Karathanasis, John Violos and Ioannis Kompatsiaris
Mathematics 2025, 13(5), 887; https://doi.org/10.3390/math13050887 (registering DOI) - 6 Mar 2025
Viewed by 129
Abstract
DeepFake detection models play a crucial role in ambient intelligence and smart environments, where systems rely on authentic information for accurate decisions. These environments, integrating interconnected IoT devices and AI-driven systems, face significant threats from DeepFakes, potentially leading to compromised trust, erroneous decisions, and security breaches. To mitigate these risks, neural-network-based DeepFake detection models have been developed. However, their substantial computational requirements and long training times hinder deployment on resource-constrained edge devices. This paper investigates compression and transfer learning techniques to reduce the computational demands of training and deploying DeepFake detection models, while preserving performance. Pruning, knowledge distillation, quantization, and adapter modules are explored to enable efficient real-time DeepFake detection. An evaluation was conducted on four benchmark datasets: “SynthBuster”, “140k Real and Fake Faces”, “DeepFake and Real Images”, and “ForenSynths”. It compared compressed models with uncompressed baselines using widely recognized metrics such as accuracy, precision, recall, F1-score, model size, and training time. The results showed that a compressed model at 10% of the original size retained only 56% of the baseline accuracy, but fine-tuning in similar scenarios increased this to nearly 98%. In some cases, the accuracy even surpassed the original’s performance by up to 12%. These findings highlight the feasibility of deploying DeepFake detection models in edge computing scenarios. Full article
(This article belongs to the Special Issue Ambient Intelligence Methods and Applications)
Figures: (1) pruning of convolutional neural networks; (2) knowledge distillation in the teacher–student framework; (3) quantization of deep neural network parameters; (4) low-rank factorization; (5) transfer learning across different tasks; (6) CNN with an adapter module for transfer learning; (7) knowledge distillation for transfer learning; (8) "Dogs vs. cats" dataset example; (9) sample ROC curves for the SynthBuster dataset.
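Two of the compression techniques compared in the paper, magnitude pruning and post-training dynamic quantization, have standard PyTorch entry points; the tiny model below is only a stand-in for a detector backbone, and the pruning ratio is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in model; a real DeepFake detector would be a CNN or ViT backbone.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 2))

# 1) Unstructured L1 magnitude pruning: zero out 50% of the smallest weights.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

# 2) Post-training dynamic quantization of the linear layers to int8.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    logits = quantized(torch.randn(1, 512))
print(logits.shape)  # torch.Size([1, 2])
```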
20 pages, 3271 KiB  
Article
Fine-Tuned Machine Learning Classifiers for Diagnosing Parkinson’s Disease Using Vocal Characteristics: A Comparative Analysis
by Mehmet Meral, Ferdi Ozbilgin and Fatih Durmus
Diagnostics 2025, 15(5), 645; https://doi.org/10.3390/diagnostics15050645 - 6 Mar 2025
Viewed by 82
Abstract
Background/Objectives: This paper is significant in highlighting the importance of early and precise diagnosis of Parkinson’s Disease (PD) that affects both motor and non-motor functions to achieve better disease control and patient outcomes. This study seeks to assess the effectiveness of machine learning algorithms optimized to classify PD based on vocal characteristics to serve as a non-invasive and easily accessible diagnostic tool. Methods: This study used a publicly available dataset of vocal samples from 188 people with PD and 64 controls. Acoustic features like baseline characteristics, time-frequency components, Mel Frequency Cepstral Coefficients (MFCCs), and wavelet transform-based metrics were extracted and analyzed. The Chi-Square test was used for feature selection to determine the most important attributes that enhanced the accuracy of the classification. Six different machine learning classifiers, namely SVM, k-NN, DT, NN, Ensemble and Stacking models, were developed and optimized via Bayesian Optimization (BO), Grid Search (GS) and Random Search (RS). Accuracy, precision, recall, F1-score and AUC-ROC were used for evaluation. Results: It has been found that Stacking models, especially those fine-tuned via Grid Search, yielded the best performance with 92.07% accuracy and an F1-score of 0.95. In addition to that, the choice of relevant vocal features, in conjunction with the Chi-Square feature selection method, greatly enhanced the computational efficiency and classification performance. Conclusions: This study highlights the potential of combining advanced feature selection techniques with hyperparameter optimization strategies to enhance machine learning-based PD diagnosis using vocal characteristics. Ensemble models proved particularly effective in handling complex datasets, demonstrating robust diagnostic performance. Future research may focus on deep learning approaches and temporal feature integration to further improve diagnostic accuracy and scalability for clinical applications. Full article
Figures: (1) gender distribution of samples in the dataset; (2) proposed methodology; (3) proposed Stacking Learning method; (4) ROC curves of the classifiers optimized with BO, RS, and GS parameters; (5) comparison of AUC values across models and optimization methods; (6) SHAP summary plots for the GS-Ensemble model (feature contributions for Class 0 and Class 1).
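A scikit-learn sketch of the stacking-plus-grid-search setup described above; the base learners, parameter grid, and synthetic data are illustrative choices, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for vocal-feature vectors (PD vs. control).
X, y = make_classification(n_samples=400, n_features=30, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)), ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Small illustrative grid over the base learners' key hyperparameters.
grid = GridSearchCV(
    stack,
    param_grid={"svm__C": [0.1, 1, 10], "knn__n_neighbors": [3, 5, 7]},
    scoring="f1",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```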
31 pages, 875 KiB  
Article
Hierarchical Traffic Engineering in 3D Networks Using QoS-Aware Graph-Based Deep Reinforcement Learning
by Robert Kołakowski, Lechosław Tomaszewski, Rafał Tępiński and Sławomir Kukliński
Electronics 2025, 14(5), 1045; https://doi.org/10.3390/electronics14051045 - 6 Mar 2025
Viewed by 120
Abstract
Ubiquitous connectivity is envisioned through the integration of terrestrial (TNs) and non-terrestrial networks (NTNs). However, NTNs face multiple routing and Quality of Service (QoS) provisioning challenges due to the mobility of network nodes. Distributed Software-Defined Networking (SDN) combined with Multi-Agent Deep Reinforcement Learning (MADRL) is widely used to introduce programmability and intelligent Traffic Engineering (TE) in TNs, yet applying DRL to NTNs is hindered by frequently changing state sizes, model scalability, and coordination issues. This paper introduces 3DQR, a novel TE framework that combines hierarchical multi-controller SDN, hierarchical MADRL based on Graph Neural Networks (GNNs), and network topology predictions for QoS path provisioning, effective load distribution, and flow rejection minimisation in future 3D networks. To enhance SDN scalability, introduced are metrics and path operations abstractions to facilitate domain agents coordination by the global agent. To the best of the authors’ knowledge, 3DQR is the first routing scheme to integrate MADRL and GNNs for optimising centralised routing and path allocation in SDN-based 3D mobile networks. The evaluations show up to a 14% reduction in flow rejection rate, a 50% improvement in traffic distribution, and effective QoS class prioritisation compared to baseline techniques. 3DQR also exhibits strong transfer capabilities, giving consistent performance gains in previously unseen environments. Full article
(This article belongs to the Special Issue Future Generation Non-Terrestrial Networks)
Figures: (1) overall view of the 3DQR concept and interactions across components; (2) M-QR/QR and DGA architecture and data flow; (3) path allocation in the 3DQR concept; (4) architecture and interactions of DQR/M-QR and DGA (current and target Q-networks and loss function); (5) complexity comparison of 3DQR, H-SP, and SP routing for different average node degrees; (6) episodic reward of the 3DQR model in a low-traffic environment; (7) rejected flows per test setup; (8) rejected flow rate per QoS class; (9) change of flow rejection rate per QoS class compared to baseline H-SP routing; (10) standard deviation of link utilisation per domain for different loads; (11) performance comparison of H-SP routing and the 3DQR model (flow rejection rate and load distribution); (12) impact of the aggregation interval on flow rerouting and rejection rate.
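At the core of the DRL agents mentioned here is the usual temporal-difference target for a Q-network; the snippet below illustrates that generic update only and does not reproduce the paper's GNN-based agents or reward design.

```python
import torch

def dqn_targets(rewards, next_q_values, done, gamma=0.99):
    """Generic Q-learning targets: y = r + gamma * max_a' Q_target(s', a')
    for non-terminal transitions, y = r otherwise."""
    max_next_q = next_q_values.max(dim=1).values
    return rewards + gamma * max_next_q * (1.0 - done)

# toy batch: 3 transitions, 4 actions (e.g. candidate next-hop paths)
rewards = torch.tensor([1.0, -0.5, 0.0])
next_q = torch.randn(3, 4)
done = torch.tensor([0.0, 0.0, 1.0])
print(dqn_targets(rewards, next_q, done))
```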
13 pages, 2648 KiB  
Article
Comparative Study of Deep Transfer Learning Models for Semantic Segmentation of Human Mesenchymal Stem Cell Micrographs
by Maksim Solopov, Elizaveta Chechekhina, Anna Kavelina, Gulnara Akopian, Viktor Turchin, Andrey Popandopulo, Dmitry Filimonov and Roman Ishchenko
Int. J. Mol. Sci. 2025, 26(5), 2338; https://doi.org/10.3390/ijms26052338 - 6 Mar 2025
Viewed by 186
Abstract
The aim of this study is to conduct a comparative assessment of the effectiveness of neural network models—U-Net, DeepLabV3+, SegNet and Mask R-CNN—for the semantic segmentation of micrographs of human mesenchymal stem cells (MSCs). A dataset of 320 cell micrographs annotated by cell biology experts was created. The models were trained using a transfer learning method based on ImageNet pre-trained weights. As a result, the U-Net model demonstrated the best segmentation accuracy according to the metrics of the Dice coefficient (0.876) and the Jaccard index (0.781). The DeepLabV3+ and Mask R-CNN models also showed high performance, although slightly lower than U-Net, while SegNet exhibited the least accurate results. The obtained data indicate that the U-Net model is the most suitable for automating the segmentation of MSC micrographs and can be recommended for use in biomedical laboratories to streamline the routine analysis of cell cultures. Full article
Figures: (1) training curves (pixel accuracy and loss) for U-Net, DeepLabV3+, SegNet, and Mask R-CNN; (2) optimal prediction thresholds for each model according to the Dice coefficient, Jaccard index, and pixel accuracy; (3) comparison of segmentation performance on the DC, JI, and PA metrics; (4) example segmentations of MSC micrographs (original images, ground-truth masks, and predicted masks).
Full article ">
17 pages, 8074 KiB  
Article
Automated Segmentation of Breast Cancer Focal Lesions on Ultrasound Images
by Dmitry Pasynkov, Ivan Egoshin, Alexey Kolchev, Ivan Kliouchkin, Olga Pasynkova, Zahraa Saad, Anis Daou and Esam Mohamed Abuzenar
Sensors 2025, 25(5), 1593; https://doi.org/10.3390/s25051593 - 5 Mar 2025
Viewed by 189
Abstract
Ultrasound (US) remains the main modality for the differential diagnosis of changes revealed by mammography. However, the US images themselves are subject to various types of noise and artifacts from reflections, which can worsen the quality of their analysis. Deep learning methods have a number of disadvantages, including the often insufficient substantiation of the model, and the complexity of collecting a representative training database. Therefore, it is necessary to develop effective algorithms for the segmentation, classification, and analysis of US images. The aim of the work is to develop a method for the automated detection of pathological lesions in breast US images and their segmentation. A method is proposed that includes two stages of video image processing: (1) searching for a region of interest using a random forest classifier, which classifies normal tissues, (2) selecting the contour of the lesion based on the difference in brightness of image pixels. The test set included 52 ultrasound videos which contained histologically proven suspicious lesions. The average frequency of lesion detection per frame was 91.89%, and the average accuracy of contour selection according to the IoU metric was 0.871. The proposed method can be used to segment a suspicious lesion. Full article
(This article belongs to the Section Sensing and Imaging)
Figures: (1) frames from two ultrasound video sequences with histologically proven mucinous and ductal breast cancer; (2) block diagram of the proposed method for identifying a lesion contour in ultrasound video frames; (3) original ultrasound images and images with marked tissues (skin, fat, fibrous tissue, glandular tissue, artifacts); (4) tissue classification by the random forest classifier, morphological post-processing, and ground truth; (5) rays drawn from the lesion centre with pixel brightness and brightness-gradient profiles; (6) gradient-extremum points and the averaged cubic regression; (7) tissue classification results on frames from the video sequences with ground truths; (8) unclassified objects after shape filtering and the resulting ROIs; (9) segmented lesion contours versus the specialist's outlines; (10) frame-by-frame processing of a video.
Full article ">
20 pages, 2690 KiB  
Article
Creating a Parallel Corpus for the Kazakh Sign Language and Learning
by Aigerim Yerimbetova, Bakzhan Sakenov, Madina Sambetbayeva, Elmira Daiyrbayeva, Ulmeken Berzhanova and Mohamed Othman
Appl. Sci. 2025, 15(5), 2808; https://doi.org/10.3390/app15052808 - 5 Mar 2025
Viewed by 178
Abstract
Kazakh Sign Language (KSL) is a crucial communication tool for individuals with hearing and speech impairments. Deep learning, particularly Transformer models, offers a promising approach to improving accessibility in education and communication. This study analyzes the syntactic structure of KSL, identifying its unique grammatical features and deviations from spoken Kazakh. A custom parser was developed to convert Kazakh text into KSL glosses, enabling the creation of a large-scale parallel corpus. Using this resource, a Transformer-based machine translation model was trained, achieving high translation accuracy and demonstrating the feasibility of this approach for enhancing communication accessibility. The research highlights key challenges in sign language processing, such as the limited availability of annotated data. Future work directions include the integration of video data and the adoption of more comprehensive evaluation metrics. This paper presents a methodology for constructing a parallel corpus through gloss annotations, contributing to advancements in sign language translation technology. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
Figures: (1) an example of the grammatical structure of KSL; (2) KSL translation algorithm; (3) the Transformer model architecture; (4) stages of processing text data; (5) learning process (loss function); (6) BLEU score.
Full article ">
24 pages, 23486 KiB  
Article
Influence of Model Size and Image Augmentations on Object Detection in Low-Contrast Complex Background Scenes
by Harman Singh Sangha and Matthew J. Darr
AI 2025, 6(3), 52; https://doi.org/10.3390/ai6030052 - 5 Mar 2025
Viewed by 119
Abstract
Background: Bigger and more complex models are often developed for challenging object detection tasks, and image augmentations are used to train a robust deep learning model for small image datasets. Previous studies have suggested that smaller models provide better performance compared to bigger models for agricultural applications, and not all image augmentation methods contribute equally to model performance. An important part of these studies was also to define the scene of the image. Methods: A standard definition was developed to describe scenes in real-world agricultural datasets by reviewing various image-based machine-learning applications in the agriculture literature. This study primarily evaluates the effects of model size in both one-stage and two-stage detectors on model performance for low-contrast complex background applications. It further explores the influence of different photo-metric image augmentation methods on model performance for standard one-stage and two-stage detectors. Results: For one-stage detectors, a smaller model performed better than a bigger model. Whereas in the case of two-stage detectors, model performance increased with model size. In image augmentations, some methods considerably improved model performance and some either provided no improvement or reduced the model performance in both one-stage and two-stage detectors compared to the baseline. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
Figures: (1) an image scene described with low-level (red, grey, cloudy) and high-level (car, house, in front of) descriptors; (2) examples of staged images used for machine-learning applications (maturity levels in aroma and pear tomato); (3) number of instances per descriptor for real-world agricultural scenes; (4) architecture of some two-stage detectors; (5) architecture of some one-stage detectors; (6) RetinaNet; (7) sample images from the Global Wheat Dataset; (8) visual examples of random brightness, contrast, saturation, color distortion, and RGB-to-grayscale augmentations; (9) poor versus good detection overlap between ground-truth and predicted boxes; (10) a sample precision-recall curve; (11) mAP and testing loss for the five augmentations with RetinaNet; (12) the same comparison with Faster-RCNN.
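The photometric augmentations evaluated here map onto standard torchvision transforms; the parameter ranges below are arbitrary examples rather than the study's settings.

```python
from PIL import Image
import torchvision.transforms as T

# Illustrative photometric pipeline; ranges are placeholders.
photometric = T.Compose([
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
    T.RandomGrayscale(p=0.1),
])

img = Image.new("RGB", (256, 256), color=(120, 160, 90))  # stand-in for a wheat image
augmented = photometric(img)
print(augmented.size, augmented.mode)
```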
20 pages, 3774 KiB  
Article
Aspect-Based Sentiment Analysis Through Graph Convolutional Networks and Joint Task Learning
by Hongyu Han, Shengjie Wang, Baojun Qiao, Lanxue Dang, Xiaomei Zou, Hui Xue and Yingqi Wang
Information 2025, 16(3), 201; https://doi.org/10.3390/info16030201 - 5 Mar 2025
Viewed by 148
Abstract
Aspect-based sentiment analysis (ABSA) through joint task learning aims to simultaneously identify aspect terms and predict their sentiment polarities. However, existing methods face two major challenges: (1) Most existing studies focus on the sentiment polarity classification task, ignoring the critical role of aspect term extraction, leading to insufficient performance in capturing aspect-related information; (2) existing methods typically model the two tasks independently, failing to effectively share underlying features and semantic information, which weakens the synergy between the tasks and limits the overall performance of the model. In order to resolve these issues, this research suggests a unified framework model through joint task learning, named MTL-GCN, to simultaneously perform aspect term extraction and sentiment polarity classification. The proposed model utilizes dependency trees combined with self-attention mechanisms to generate new weight matrices, emphasizing the locational information of aspect terms, and optimizes the graph convolutional network (GCN) to extract aspect terms more efficiently. Furthermore, the model employs the multi-head attention (MHA) mechanism to process input data and uses its output as the input to the GCN. Next, GCN models the graph structure of the input data, capturing the relationships between nodes and global structural information, fully integrating global contextual semantic information, and generating deep-level contextual feature representations. Finally, the extracted aspect-related features are fused with global features and applied to the sentiment classification task. The proposed unified framework achieves state-of-the-art performance, as evidenced by experimental results on four benchmark datasets. MTL-GCN outperforms baseline models in terms of F1ATE, accuracy, and F1SC metrics, as demonstrated by experimental results on four benchmark datasets. Additionally, comparative and ablation studies further validate the rationale and effectiveness of the model design. Full article
Figures: graphical abstract; (1) aspect-based sentiment analysis via graph convolutional networks and multi-task learning (aspect term extraction and sentiment polarity classification); (2) input embedding generation (token, segment, and position embeddings) for an example sentence; (3) the dependency tree of the example sentence; (4) effect of the number of GCN layers on the ATE and SC tasks; (5) attention layer visualization for the ATE and SC tasks.
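The graph-convolution step in MTL-GCN follows the standard propagation rule H' = sigma(A_hat · H · W) over a dependency-derived adjacency matrix; the sketch below implements only that generic rule, not the paper's full joint-task architecture.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W), where A_hat is a
    row-normalized adjacency matrix with self-loops."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, h):
        adj = adj + torch.eye(adj.size(0))          # add self-loops
        adj = adj / adj.sum(dim=1, keepdim=True)    # row-normalize
        return torch.relu(adj @ self.linear(h))

# toy dependency graph over 5 tokens with 64-d contextual embeddings
adj = torch.zeros(5, 5)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = adj[2, 4] = adj[4, 2] = 1.0
layer = SimpleGCNLayer(64, 32)
out = layer(adj, torch.randn(5, 64))
print(out.shape)  # torch.Size([5, 32])
```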
23 pages, 7793 KiB  
Article
A New, Robust, Adaptive, Versatile, and Scalable Abandoned Object Detection Approach Based on DeepSORT Dynamic Prompts, and Customized LLM for Smart Video Surveillance
by Merve Yilmazer and Mehmet Karakose
Appl. Sci. 2025, 15(5), 2774; https://doi.org/10.3390/app15052774 - 4 Mar 2025
Viewed by 150
Abstract
Video cameras are one of the important elements in ensuring security in public areas. Videos inspected by expert personnel using traditional methods may have a high error rate and take a long time to complete. In this study, a new deep learning-based method is proposed for the detection of abandoned objects, such as bags, suitcases, and suitcases left unsupervised in public areas. Transfer learning-based keyframe detection was first performed to remove unnecessary and repetitive frames from the ABODA dataset. Then, human and object classes were detected using the weights of the YOLOv8l model, which has a fast and effective object detection feature. Abandoned object detection is achieved by tracking classes in consecutive frames with the DeepSORT algorithm and measuring the distance between them. In addition, the location information of the human and object classes in the frames was analyzed by a large language model supported by prompt engineering. Thus, an explanation output regarding the location, size, and estimation rate of the object and human classes was created for the authorities. It is observed that the proposed model produces promising results comparable to the state-of-the-art methods for suspicious object detection from videos with success metrics of 97.9% precision, 97.0% recall, and 97.4% f1-score. Full article
Figures: (1) abandoned object detection based on background subtraction; (2) block diagram of the proposed approach; (3) block diagram of keyframe detection based on ResNet101v2; (4) block diagram of YOLOv8-based person and object detection; (5) abandoned object detection flowchart; (6) performance comparison of YOLOv8 sub-architectures; (7) mean average precision change curve and precision/recall curve; (8) person-object detection model input and output; (9) YOLOv8-based person and object detection model outputs; (10) object and human detection image with a sample txt file; (11) prompt and sample output generated for the LLM; (12) comparison of abandoned object detection TP, FP, and FN values; (13) ablation experiment results; (14) outputs of the proposed method; (15) precision comparison of the proposed method.
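The abandonment test described above (track people and objects, then flag objects no person stays near) reduces to a distance check between tracked boxes; the helper below is a schematic version with an arbitrary pixel threshold.

```python
import math

def box_center(box):
    """Center of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def is_abandoned(object_box, person_boxes, min_distance=150.0):
    """Flag an object as abandoned if no tracked person is within `min_distance`
    pixels of its center. Threshold and units are illustrative."""
    ox, oy = box_center(object_box)
    for pb in person_boxes:
        px, py = box_center(pb)
        if math.hypot(ox - px, oy - py) < min_distance:
            return False
    return True

bag = (400, 300, 450, 360)
people = [(80, 200, 140, 380), (600, 100, 660, 300)]
print(is_abandoned(bag, people))  # True: nobody is near the bag
```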
16 pages, 37656 KiB  
Article
Smoke and Fire-You Only Look Once: A Lightweight Deep Learning Model for Video Smoke and Flame Detection in Natural Scenes
by Chenmeng Zhao, Like Zhao, Ka Zhang, Yinghua Ren, Hui Chen and Yehua Sheng
Fire 2025, 8(3), 104; https://doi.org/10.3390/fire8030104 - 4 Mar 2025
Viewed by 228
Abstract
Owing to the demand for smoke and flame detection in natural scenes, this paper proposes a lightweight deep learning model, SF-YOLO (Smoke and Fire-YOLO), for video smoke and flame detection in such environments. Firstly, YOLOv11 is employed as the backbone network, combined with the C3k2 module based on a two-path residual attention mechanism, and a target detection head frame with an embedded attention mechanism. This combination enhances the response of the unobscured regions to compensate for the feature loss in occluded regions, thereby addressing the occlusion problem in dynamic backgrounds. Then, a two-channel loss function (W-SIoU) based on dynamic tuning and intelligent focusing is designed to enhance loss computation in the boundary regions, thus improving the YOLOv11 model’s ability to recognize targets with ambiguous boundaries. Finally, the algorithms proposed in this paper are experimentally validated using the self-generated dataset S-Firedata and the public smoke and flame virtual dataset M4SFWD. These datasets are derived from internet smoke and flame video frame extraction images and open-source smoke and flame dataset images, respectively. The experimental results demonstrate, compared with deep learning models such as YOLOv8, Gold-YOLO, and Faster-RCNN, the SF-YOLO model proposed in this paper is more lightweight and exhibits higher detection accuracy and robustness. The metrics mAP50 and mAP50-95 are improved by 2.5% and 2.4%, respectively, in the self-made dataset S-Firedata, and by 0.7% and 1.4%, respectively, in the publicly available dataset M4SFWD. The research presented in this paper provides practical methods for the automatic detection of smoke and flame in natural scenes, which can further enhance the effectiveness of fire monitoring systems. Full article
Figures: (1) overall technical flowchart of the algorithm; (2) the two modules of C3k2_DWR; (3) SEAMHead's fully connected network architecture; (4) W-SIoU schematic; (5)-(10) qualitative smoke and flame detection comparisons of Centernet, Faster-RCNN, Gold-YOLO, YOLOv7, YOLOv8, YOLOv11, and SF-YOLO for remote-sensing fire targets, multi-target scenes, small targets, dark environments, occluded targets, and fire-like targets; (11) SF-YOLO detection results on the Los Angeles hill fire.
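The self-built S-Firedata set is described as coming from frame extraction of internet smoke and flame videos; a typical OpenCV loop for that kind of frame sampling looks like the sketch below (file names and the sampling stride are placeholders).

```python
import cv2
import os

def extract_frames(video_path, out_dir, every_n=30):
    """Save every n-th frame of a video as a JPEG; a stride of 30 is arbitrary."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# print(extract_frames("smoke_clip.mp4", "s_firedata_frames"))  # hypothetical paths
```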
19 pages, 5493 KiB  
Article
YOLO-SWD—An Improved Ship Recognition Algorithm for Feature Occlusion Scenarios
by Ruyan Zhou, Mingkang Gu and Haiyan Pan
Appl. Sci. 2025, 15(5), 2749; https://doi.org/10.3390/app15052749 - 4 Mar 2025
Viewed by 285
Abstract
Ship detection and recognition hold significant application value in both military and civilian domains. With the continuous advancement of deep learning technologies, multi-category ship detection and recognition methods based on deep learning have garnered increasing attention. However, challenges such as feature occlusion caused by interfering objects, cloudy and foggy weather leading to feature loss, and insufficient accuracy in remote sensing imagery persist. This study aims to enhance the accuracy and robustness of ship recognition by improving deep learning-based object detection models, enabling the algorithm to perform ship detection and recognition tasks effectively in feature-occluded scenarios. In this research, we propose a ship detection and recognition algorithm based on YOLOv11. YOLOv11 possesses stronger feature extraction capabilities and its multi-branch structure effectively captures features of targets at different scales. Three improved modules are introduced: the DLKA module enhances the perception of local details and global context through dynamic deformable convolution and large receptive field attention mechanisms; the CKSP module improves the model’s ability to extract target boundaries and shapes; and the WTHead enhances the diversity and robustness of feature extraction. Comparative experiments with classical object detection models on visible and SAR datasets, which include a variety of feature occlusion scenarios, show that our proposed model achieved the best results across multiple metrics, specifically, our method achieved a mAP of 83.9%, surpassing the second-best result by 2.7%. Full article
Figures: (1) common feature occlusion phenomena in remote-sensing ship imagery (cloud or fog occlusion, and occlusion by other ships or port obstacles); (2) examples from internationally common ship datasets and images from the proposed dataset; (3) overall architecture of the proposed model; (4) structure of the CKSP module; (5) structure of the DLKA module; (6) structure of the WTConv module; (7) training mAP curves and loss function curves; (8) qualitative comparison results with missed and wrong detections marked.