Search Results (2,364)

Search Parameters:
Keywords = deep metric learning

20 pages, 1728 KiB  
Article
Drug–Target Affinity Prediction Based on Cross-Modal Fusion of Text and Graph
by Jucheng Yang and Fushun Ren
Appl. Sci. 2025, 15(6), 2901; https://doi.org/10.3390/app15062901 (registering DOI) - 7 Mar 2025
Abstract
Drug–target affinity (DTA) prediction is a critical step in virtual screening and significantly accelerates drug development. However, existing deep learning-based methods relying on single-modal representations (e.g., text or graphs) struggle to fully capture the complex interactions between drugs and targets. This study proposes CM-DTA, a cross-modal feature fusion model that integrates drug textual representations and molecular graphs with target protein amino acid sequences and structural graphs, enhancing feature diversity and expressiveness. The model employs the multi-perceptive neighborhood self-attention aggregation strategy to capture first- and second-order neighborhood information, overcoming limitations in graph isomorphism networks (GIN) for structural representation. The experimental results on the Davis and KIBA datasets show that CM-DTA significantly improves the performance of drug–target affinity prediction, achieving higher accuracy and better prediction metrics compared to state-of-the-art (SOTA) models. Full article
Figures: (1) overview of representative methods for DTA prediction; (2) overall architecture of CM-DTA; (3) multi-perceptive neighborhood self-attention aggregation strategy; (4) cross-modal bidirectional adaptive guided fusion strategy; (5) single guided-attention cross-modal fusion module; (6) data preprocessing; (7) prediction performance on the Davis and KIBA datasets; (8) real affinity against predicted values on the Davis and KIBA datasets.
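As a rough illustration of the cross-modal fusion idea summarized above (not the CM-DTA implementation; the module name, embedding sizes, and gating scheme are assumptions), here is a two-branch PyTorch sketch that fuses a text-derived and a graph-derived drug embedding with a learned gate before regressing an affinity score:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch: fuse a text-derived and a graph-derived embedding."""
    def __init__(self, text_dim=256, graph_dim=256, hidden=128):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.graph_proj = nn.Linear(graph_dim, hidden)
        # a simple bidirectional gate: each modality weights the other
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.regressor = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text_emb, graph_emb):
        t = torch.relu(self.text_proj(text_emb))
        g = torch.relu(self.graph_proj(graph_emb))
        gate = self.gate(torch.cat([t, g], dim=-1))
        fused = torch.cat([gate * t, (1 - gate) * g], dim=-1)
        return self.regressor(fused).squeeze(-1)  # predicted affinity

# toy usage: a batch of 4 drug-target pairs with precomputed embeddings
model = CrossModalFusion()
affinity = model(torch.randn(4, 256), torch.randn(4, 256))
print(affinity.shape)  # torch.Size([4])
```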
22 pages, 2432 KiB  
Article
A Framework for Integrating Deep Learning and Symbolic AI Towards an Explainable Hybrid Model for the Detection of COVID-19 Using Computerized Tomography Scans
by Vengai Musanga, Serestina Viriri and Colin Chibaya
Information 2025, 16(3), 208; https://doi.org/10.3390/info16030208 (registering DOI) - 7 Mar 2025
Abstract
The integration of Deep Learning and Symbolic Artificial Intelligence (AI) offers a promising hybrid framework for enhancing diagnostic accuracy and explainability in critical applications such as COVID-19 detection using computerized tomography (CT) scans. This study proposes a novel hybrid AI model that leverages the strengths of both approaches: the automated feature extraction and classification capabilities of Deep Learning and the logical reasoning and interpretability of Symbolic AI. Key components of the model include the adaptive deformable module, which improves spatial feature extraction by addressing variations in lung anatomy, and the attention-based encoder, which enhances feature saliency by focusing on critical regions within CT scans. Experimental validation using performance metrics such as F1-score, accuracy, precision, and recall demonstrates the model’s significant improvement over baseline configurations, achieving near-perfect accuracy (99.16%) and F1-score (0.9916). This hybrid AI framework not only achieves state-of-the-art diagnostic performance but also ensures interpretability through its symbolic reasoning layer, facilitating its adoption in healthcare settings. The findings underscore the potential of combining advanced machine learning techniques with symbolic approaches to create robust and transparent AI systems for critical medical applications. Full article
Figures: graphical abstract; (1) framework for image classification using Deep Learning methods; (2) framework for a Symbolic AI system; (3) overview of the hybrid AI model architecture integrating Deep Learning and Symbolic AI; (4) confusion matrix; (5) training and validation accuracy; (6) training and validation loss; (7) faithful analysis: impact of feature removal; (8) comparison of consistency across methods; (9) k-fold cross-validation performance.
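A minimal sketch of the neuro-symbolic idea, assuming an invented decision rule and confidence threshold rather than the paper's symbolic reasoning layer: a hand-written rule post-processes a classifier's softmax probabilities and returns an explanation string.

```python
import torch
import torch.nn.functional as F

def symbolic_layer(probs, min_confidence=0.9):
    """Hypothetical symbolic post-processing of neural class probabilities.

    Rule: accept the network's 'covid' call only if its confidence exceeds a
    threshold; otherwise defer to a human reader. Returns (label, explanation).
    """
    classes = ["normal", "covid"]
    conf, idx = probs.max(dim=-1)
    conf, idx = conf.item(), int(idx)
    if classes[idx] == "covid" and conf >= min_confidence:
        return "covid", f"network confidence {conf:.2f} >= {min_confidence}"
    if classes[idx] == "covid":
        return "refer", f"confidence {conf:.2f} below threshold, refer to radiologist"
    return "normal", f"network confidence {conf:.2f} for 'normal'"

# toy usage with made-up logits standing in for a CNN's output on one CT scan
logits = torch.tensor([0.3, 2.1])
label, reason = symbolic_layer(F.softmax(logits, dim=-1))
print(label, "-", reason)
```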
21 pages, 2488 KiB  
Article
Classification of Mycena and Marasmius Species Using Deep Learning Models: An Ecological and Taxonomic Approach
by Fatih Ekinci, Guney Ugurlu, Giray Sercan Ozcan, Koray Acici, Tunc Asuroglu, Eda Kumru, Mehmet Serdar Guzel and Ilgaz Akata
Sensors 2025, 25(6), 1642; https://doi.org/10.3390/s25061642 - 7 Mar 2025
Viewed by 100
Abstract
Fungi play a critical role in ecosystems, contributing to biodiversity and providing economic and biotechnological value. In this study, we developed a novel deep learning-based framework for the classification of seven macrofungi species from the genera Mycena and Marasmius, leveraging their unique ecological and morphological characteristics. The proposed approach integrates a custom convolutional neural network (CNN) with a self-organizing map (SOM) adapted for supervised learning and a Kolmogorov–Arnold Network (KAN) layer to enhance classification performance. The experimental results demonstrate significant improvements in classification metrics when using the CNN-SOM and CNN-KAN architectures. Additionally, advanced pretrained models such as MaxViT-S and ResNetV2-50 achieved high accuracy rates, with MaxViT-S achieving 98.9% accuracy. Statistical analyses using the chi-square test confirmed the reliability of the results, emphasizing the importance of validating evaluation metrics statistically. This research represents the first application of SOM in fungal classification and highlights the potential of deep learning in advancing fungal taxonomy. Future work will focus on optimizing the KAN architecture and expanding the dataset to include more fungal classes, further enhancing classification accuracy and ecological understanding. Full article
Figures: (1) the proposed CNN-SOM architecture; (2) a macroscopic overview of Mycena species; (3) a macroscopic overview of Marasmius species.
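The chi-square check mentioned in the abstract can be reproduced in outline with SciPy; the correct/incorrect counts below are invented purely to show the mechanics.

```python
from scipy.stats import chi2_contingency

# Hypothetical correct/incorrect counts for two classifiers on the same test set
# (rows: models, columns: [correct, incorrect]); the numbers are made up.
table = [
    [989, 11],   # e.g. a MaxViT-S-like model
    [962, 38],   # e.g. a plain CNN baseline
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
# A small p-value suggests the accuracy difference is unlikely to be chance.
```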
52 pages, 29859 KiB  
Review
2D Object Detection: A Survey
by Emanuele Malagoli and Luca Di Persio
Mathematics 2025, 13(6), 893; https://doi.org/10.3390/math13060893 (registering DOI) - 7 Mar 2025
Viewed by 141
Abstract
Object detection is a fundamental task in computer vision, aiming to identify and localize objects of interest within an image. Over the past two decades, the domain has changed profoundly, evolving into an active and fast-moving field while simultaneously becoming the foundation for a wide range of modern applications. This survey provides a comprehensive review of the evolution of 2D generic object detection, tracing its development from traditional methods relying on handcrafted features to modern approaches driven by deep learning. The review systematically categorizes contemporary object detection methods into three key paradigms: one-stage, two-stage, and transformer-based, highlighting their development milestones and core contributions. The paper provides an in-depth analysis of each paradigm, detailing landmark methods and their impact on the progression of the field. Additionally, the survey examines some fundamental components of 2D object detection such as loss functions, datasets, evaluation metrics, and future trends. Full article
(This article belongs to the Special Issue Advanced Research in Image Processing and Optimization Methods)
Figures: graphical abstract; (1) object detection pipeline (bounding boxes, class labels, confidence scores); (2) milestones of 2D generic object detection, from traditional handcrafted-feature detectors (Viola–Jones, HOG, DPM) to deep learning-based two-stage, one-stage, and transformer-based detectors; (3) rectangular features selected by AdaBoost in the Viola–Jones detector, and an example detection with the deformable part-based model (DPM); (4) the R-CNN architecture; (5) Mask R-CNN results on MS-COCO; (6) the SSD architecture; (7) anchor-free keypoint-based methods (CornerNet, CenterNet, ExtremeNet); (8) FCOS bounding-box encoding and center-ness; (9) the YOLOv1 detection pipeline; (10) the DETR detection pipeline; (11) the ViT-FRCNN detection pipeline; (12) example images and annotations from Pascal VOC, MS-COCO, and Open Images.
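Since the survey also covers evaluation metrics, a compact reference implementation of the quantity underlying most of them, intersection over union (IoU) between two axis-aligned boxes in (x1, y1, x2, y2) format, may be helpful.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```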
30 pages, 1422 KiB  
Article
A Comparative Analysis of Compression and Transfer Learning Techniques in DeepFake Detection Models
by Andreas Karathanasis, John Violos and Ioannis Kompatsiaris
Mathematics 2025, 13(5), 887; https://doi.org/10.3390/math13050887 (registering DOI) - 6 Mar 2025
Viewed by 129
Abstract
DeepFake detection models play a crucial role in ambient intelligence and smart environments, where systems rely on authentic information for accurate decisions. These environments, integrating interconnected IoT devices and AI-driven systems, face significant threats from DeepFakes, potentially leading to compromised trust, erroneous decisions, and security breaches. To mitigate these risks, neural-network-based DeepFake detection models have been developed. However, their substantial computational requirements and long training times hinder deployment on resource-constrained edge devices. This paper investigates compression and transfer learning techniques to reduce the computational demands of training and deploying DeepFake detection models, while preserving performance. Pruning, knowledge distillation, quantization, and adapter modules are explored to enable efficient real-time DeepFake detection. An evaluation was conducted on four benchmark datasets: “SynthBuster”, “140k Real and Fake Faces”, “DeepFake and Real Images”, and “ForenSynths”. It compared compressed models with uncompressed baselines using widely recognized metrics such as accuracy, precision, recall, F1-score, model size, and training time. The results showed that a compressed model at 10% of the original size retained only 56% of the baseline accuracy, but fine-tuning in similar scenarios increased this to nearly 98%. In some cases, the accuracy even surpassed the original’s performance by up to 12%. These findings highlight the feasibility of deploying DeepFake detection models in edge computing scenarios. Full article
(This article belongs to the Special Issue Ambient Intelligence Methods and Applications)
Figures: (1) pruning of convolutional neural networks; (2) knowledge distillation in the teacher–student framework; (3) quantization of deep neural network parameters; (4) low-rank factorization; (5) transfer learning across different tasks; (6) CNN with an adapter module for transfer learning; (7) knowledge distillation for transfer learning; (8) "Dogs vs. cats" dataset example; (9) sample ROC curves for the SynthBuster dataset.
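Two of the compression techniques compared in the paper, magnitude pruning and post-training dynamic quantization, have standard PyTorch entry points; the tiny model below is only a stand-in for a detector backbone, and the pruning ratio is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in model; a real DeepFake detector would be a CNN or ViT backbone.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 2))

# 1) Unstructured L1 magnitude pruning: zero out 50% of the smallest weights.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

# 2) Post-training dynamic quantization of the linear layers to int8.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    logits = quantized(torch.randn(1, 512))
print(logits.shape)  # torch.Size([1, 2])
```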
20 pages, 3271 KiB  
Article
Fine-Tuned Machine Learning Classifiers for Diagnosing Parkinson’s Disease Using Vocal Characteristics: A Comparative Analysis
by Mehmet Meral, Ferdi Ozbilgin and Fatih Durmus
Diagnostics 2025, 15(5), 645; https://doi.org/10.3390/diagnostics15050645 - 6 Mar 2025
Viewed by 82
Abstract
Background/Objectives: This paper is significant in highlighting the importance of early and precise diagnosis of Parkinson’s Disease (PD) that affects both motor and non-motor functions to achieve better disease control and patient outcomes. This study seeks to assess the effectiveness of machine learning algorithms optimized to classify PD based on vocal characteristics to serve as a non-invasive and easily accessible diagnostic tool. Methods: This study used a publicly available dataset of vocal samples from 188 people with PD and 64 controls. Acoustic features like baseline characteristics, time-frequency components, Mel Frequency Cepstral Coefficients (MFCCs), and wavelet transform-based metrics were extracted and analyzed. The Chi-Square test was used for feature selection to determine the most important attributes that enhanced the accuracy of the classification. Six different machine learning classifiers, namely SVM, k-NN, DT, NN, Ensemble and Stacking models, were developed and optimized via Bayesian Optimization (BO), Grid Search (GS) and Random Search (RS). Accuracy, precision, recall, F1-score and AUC-ROC were used for evaluation. Results: It has been found that Stacking models, especially those fine-tuned via Grid Search, yielded the best performance with 92.07% accuracy and an F1-score of 0.95. In addition to that, the choice of relevant vocal features, in conjunction with the Chi-Square feature selection method, greatly enhanced the computational efficiency and classification performance. Conclusions: This study highlights the potential of combining advanced feature selection techniques with hyperparameter optimization strategies to enhance machine learning-based PD diagnosis using vocal characteristics. Ensemble models proved particularly effective in handling complex datasets, demonstrating robust diagnostic performance. Future research may focus on deep learning approaches and temporal feature integration to further improve diagnostic accuracy and scalability for clinical applications. Full article
Figures: (1) gender distribution of samples in the dataset; (2) proposed methodology; (3) proposed Stacking Learning method; (4) ROC curves of the classifiers optimized with BO, RS, and GS parameters; (5) comparison of AUC values across models and optimization methods; (6) SHAP summary plots for the GS-Ensemble model (feature contributions for Class 0 and Class 1).
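A scikit-learn sketch of the stacking-plus-grid-search setup described above; the base learners, parameter grid, and synthetic data are illustrative choices, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for vocal-feature vectors (PD vs. control).
X, y = make_classification(n_samples=400, n_features=30, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)), ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Small illustrative grid over the base learners' key hyperparameters.
grid = GridSearchCV(
    stack,
    param_grid={"svm__C": [0.1, 1, 10], "knn__n_neighbors": [3, 5, 7]},
    scoring="f1",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```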
31 pages, 875 KiB  
Article
Hierarchical Traffic Engineering in 3D Networks Using QoS-Aware Graph-Based Deep Reinforcement Learning
by Robert Kołakowski, Lechosław Tomaszewski, Rafał Tępiński and Sławomir Kukliński
Electronics 2025, 14(5), 1045; https://doi.org/10.3390/electronics14051045 - 6 Mar 2025
Viewed by 120
Abstract
Ubiquitous connectivity is envisioned through the integration of terrestrial (TNs) and non-terrestrial networks (NTNs). However, NTNs face multiple routing and Quality of Service (QoS) provisioning challenges due to the mobility of network nodes. Distributed Software-Defined Networking (SDN) combined with Multi-Agent Deep Reinforcement Learning (MADRL) is widely used to introduce programmability and intelligent Traffic Engineering (TE) in TNs, yet applying DRL to NTNs is hindered by frequently changing state sizes, model scalability, and coordination issues. This paper introduces 3DQR, a novel TE framework that combines hierarchical multi-controller SDN, hierarchical MADRL based on Graph Neural Networks (GNNs), and network topology predictions for QoS path provisioning, effective load distribution, and flow rejection minimisation in future 3D networks. To enhance SDN scalability, introduced are metrics and path operations abstractions to facilitate domain agents coordination by the global agent. To the best of the authors’ knowledge, 3DQR is the first routing scheme to integrate MADRL and GNNs for optimising centralised routing and path allocation in SDN-based 3D mobile networks. The evaluations show up to a 14% reduction in flow rejection rate, a 50% improvement in traffic distribution, and effective QoS class prioritisation compared to baseline techniques. 3DQR also exhibits strong transfer capabilities, giving consistent performance gains in previously unseen environments. Full article
(This article belongs to the Special Issue Future Generation Non-Terrestrial Networks)
Figures: (1) overall view of the 3DQR concept and interactions across components; (2) M-QR/QR and DGA architecture and data flow; (3) path allocation in the 3DQR concept; (4) architecture and interactions of DQR/M-QR and DGA (current and target Q-networks and loss function); (5) complexity comparison of 3DQR, H-SP, and SP routing for different average node degrees; (6) episodic reward of the 3DQR model in a low-traffic environment; (7) rejected flows per test setup; (8) rejected flow rate per QoS class; (9) change of flow rejection rate per QoS class compared to baseline H-SP routing; (10) standard deviation of link utilisation per domain for different loads; (11) performance comparison of H-SP routing and the 3DQR model (flow rejection rate and load distribution); (12) impact of the aggregation interval on flow rerouting and rejection rate.
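At the core of the DRL agents mentioned here is the usual temporal-difference target for a Q-network; the snippet below illustrates that generic update only and does not reproduce the paper's GNN-based agents or reward design.

```python
import torch

def dqn_targets(rewards, next_q_values, done, gamma=0.99):
    """Generic Q-learning targets: y = r + gamma * max_a' Q_target(s', a')
    for non-terminal transitions, y = r otherwise."""
    max_next_q = next_q_values.max(dim=1).values
    return rewards + gamma * max_next_q * (1.0 - done)

# toy batch: 3 transitions, 4 actions (e.g. candidate next-hop paths)
rewards = torch.tensor([1.0, -0.5, 0.0])
next_q = torch.randn(3, 4)
done = torch.tensor([0.0, 0.0, 1.0])
print(dqn_targets(rewards, next_q, done))
```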
13 pages, 2648 KiB  
Article
Comparative Study of Deep Transfer Learning Models for Semantic Segmentation of Human Mesenchymal Stem Cell Micrographs
by Maksim Solopov, Elizaveta Chechekhina, Anna Kavelina, Gulnara Akopian, Viktor Turchin, Andrey Popandopulo, Dmitry Filimonov and Roman Ishchenko
Int. J. Mol. Sci. 2025, 26(5), 2338; https://doi.org/10.3390/ijms26052338 - 6 Mar 2025
Viewed by 186
Abstract
The aim of this study is to conduct a comparative assessment of the effectiveness of neural network models—U-Net, DeepLabV3+, SegNet and Mask R-CNN—for the semantic segmentation of micrographs of human mesenchymal stem cells (MSCs). A dataset of 320 cell micrographs annotated by cell biology experts was created. The models were trained using a transfer learning method based on ImageNet pre-trained weights. As a result, the U-Net model demonstrated the best segmentation accuracy according to the metrics of the Dice coefficient (0.876) and the Jaccard index (0.781). The DeepLabV3+ and Mask R-CNN models also showed high performance, although slightly lower than U-Net, while SegNet exhibited the least accurate results. The obtained data indicate that the U-Net model is the most suitable for automating the segmentation of MSC micrographs and can be recommended for use in biomedical laboratories to streamline the routine analysis of cell cultures. Full article
Figures: (1) training curves (pixel accuracy and loss) for U-Net, DeepLabV3+, SegNet, and Mask R-CNN; (2) optimal prediction thresholds for each model according to the Dice coefficient, Jaccard index, and pixel accuracy; (3) comparison of segmentation performance on the DC, JI, and PA metrics; (4) example segmentations of MSC micrographs (original images, ground-truth masks, and predicted masks).
Full article ">
17 pages, 8074 KiB  
Article
Automated Segmentation of Breast Cancer Focal Lesions on Ultrasound Images
by Dmitry Pasynkov, Ivan Egoshin, Alexey Kolchev, Ivan Kliouchkin, Olga Pasynkova, Zahraa Saad, Anis Daou and Esam Mohamed Abuzenar
Sensors 2025, 25(5), 1593; https://doi.org/10.3390/s25051593 - 5 Mar 2025
Viewed by 189
Abstract
Ultrasound (US) remains the main modality for the differential diagnosis of changes revealed by mammography. However, the US images themselves are subject to various types of noise and artifacts from reflections, which can worsen the quality of their analysis. Deep learning methods have a number of disadvantages, including the often insufficient substantiation of the model, and the complexity of collecting a representative training database. Therefore, it is necessary to develop effective algorithms for the segmentation, classification, and analysis of US images. The aim of the work is to develop a method for the automated detection of pathological lesions in breast US images and their segmentation. A method is proposed that includes two stages of video image processing: (1) searching for a region of interest using a random forest classifier, which classifies normal tissues, (2) selecting the contour of the lesion based on the difference in brightness of image pixels. The test set included 52 ultrasound videos which contained histologically proven suspicious lesions. The average frequency of lesion detection per frame was 91.89%, and the average accuracy of contour selection according to the IoU metric was 0.871. The proposed method can be used to segment a suspicious lesion. Full article
(This article belongs to the Section Sensing and Imaging)
Figures: (1) frames from two ultrasound video sequences with histologically proven mucinous and ductal breast cancer; (2) block diagram of the proposed method for identifying a lesion contour in ultrasound video frames; (3) original ultrasound images and images with marked tissues (skin, fat, fibrous tissue, glandular tissue, artifacts); (4) tissue classification by the random forest classifier, morphological post-processing, and ground truth; (5) rays drawn from the lesion centre with pixel brightness and brightness-gradient profiles; (6) gradient-extremum points and the averaged cubic regression; (7) tissue classification results on frames from the video sequences with ground truths; (8) unclassified objects after shape filtering and the resulting ROIs; (9) segmented lesion contours versus the specialist's outlines; (10) frame-by-frame processing of a video.
Full article ">
20 pages, 2690 KiB  
Article
Creating a Parallel Corpus for the Kazakh Sign Language and Learning
by Aigerim Yerimbetova, Bakzhan Sakenov, Madina Sambetbayeva, Elmira Daiyrbayeva, Ulmeken Berzhanova and Mohamed Othman
Appl. Sci. 2025, 15(5), 2808; https://doi.org/10.3390/app15052808 - 5 Mar 2025
Viewed by 178
Abstract
Kazakh Sign Language (KSL) is a crucial communication tool for individuals with hearing and speech impairments. Deep learning, particularly Transformer models, offers a promising approach to improving accessibility in education and communication. This study analyzes the syntactic structure of KSL, identifying its unique grammatical features and deviations from spoken Kazakh. A custom parser was developed to convert Kazakh text into KSL glosses, enabling the creation of a large-scale parallel corpus. Using this resource, a Transformer-based machine translation model was trained, achieving high translation accuracy and demonstrating the feasibility of this approach for enhancing communication accessibility. The research highlights key challenges in sign language processing, such as the limited availability of annotated data. Future work directions include the integration of video data and the adoption of more comprehensive evaluation metrics. This paper presents a methodology for constructing a parallel corpus through gloss annotations, contributing to advancements in sign language translation technology. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
Figures: (1) an example of the grammatical structure of KSL; (2) KSL translation algorithm; (3) the Transformer model architecture; (4) stages of processing text data; (5) learning process (loss function); (6) BLEU score.
Full article ">
24 pages, 23486 KiB  
Article
Influence of Model Size and Image Augmentations on Object Detection in Low-Contrast Complex Background Scenes
by Harman Singh Sangha and Matthew J. Darr
AI 2025, 6(3), 52; https://doi.org/10.3390/ai6030052 - 5 Mar 2025
Viewed by 119
Abstract
Background: Bigger and more complex models are often developed for challenging object detection tasks, and image augmentations are used to train a robust deep learning model for small image datasets. Previous studies have suggested that smaller models provide better performance compared to bigger models for agricultural applications, and not all image augmentation methods contribute equally to model performance. An important part of these studies was also to define the scene of the image. Methods: A standard definition was developed to describe scenes in real-world agricultural datasets by reviewing various image-based machine-learning applications in the agriculture literature. This study primarily evaluates the effects of model size in both one-stage and two-stage detectors on model performance for low-contrast complex background applications. It further explores the influence of different photo-metric image augmentation methods on model performance for standard one-stage and two-stage detectors. Results: For one-stage detectors, a smaller model performed better than a bigger model. Whereas in the case of two-stage detectors, model performance increased with model size. In image augmentations, some methods considerably improved model performance and some either provided no improvement or reduced the model performance in both one-stage and two-stage detectors compared to the baseline. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
Figures: (1) an image scene described with low-level (red, grey, cloudy) and high-level (car, house, in front of) descriptors; (2) examples of staged images used for machine-learning applications (maturity levels in aroma and pear tomato); (3) number of instances per descriptor for real-world agricultural scenes; (4) architecture of some two-stage detectors; (5) architecture of some one-stage detectors; (6) RetinaNet; (7) sample images from the Global Wheat Dataset; (8) visual examples of random brightness, contrast, saturation, color distortion, and RGB-to-grayscale augmentations; (9) poor versus good detection overlap between ground-truth and predicted boxes; (10) a sample precision-recall curve; (11) mAP and testing loss for the five augmentations with RetinaNet; (12) the same comparison with Faster-RCNN.
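The photometric augmentations evaluated here map onto standard torchvision transforms; the parameter ranges below are arbitrary examples rather than the study's settings.

```python
from PIL import Image
import torchvision.transforms as T

# Illustrative photometric pipeline; ranges are placeholders.
photometric = T.Compose([
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
    T.RandomGrayscale(p=0.1),
])

img = Image.new("RGB", (256, 256), color=(120, 160, 90))  # stand-in for a wheat image
augmented = photometric(img)
print(augmented.size, augmented.mode)
```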
20 pages, 3774 KiB  
Article
Aspect-Based Sentiment Analysis Through Graph Convolutional Networks and Joint Task Learning
by Hongyu Han, Shengjie Wang, Baojun Qiao, Lanxue Dang, Xiaomei Zou, Hui Xue and Yingqi Wang
Information 2025, 16(3), 201; https://doi.org/10.3390/info16030201 - 5 Mar 2025
Viewed by 148
Abstract
Aspect-based sentiment analysis (ABSA) through joint task learning aims to simultaneously identify aspect terms and predict their sentiment polarities. However, existing methods face two major challenges: (1) Most existing studies focus on the sentiment polarity classification task, ignoring the critical role of aspect term extraction, leading to insufficient performance in capturing aspect-related information; (2) existing methods typically model the two tasks independently, failing to effectively share underlying features and semantic information, which weakens the synergy between the tasks and limits the overall performance of the model. In order to resolve these issues, this research suggests a unified framework model through joint task learning, named MTL-GCN, to simultaneously perform aspect term extraction and sentiment polarity classification. The proposed model utilizes dependency trees combined with self-attention mechanisms to generate new weight matrices, emphasizing the locational information of aspect terms, and optimizes the graph convolutional network (GCN) to extract aspect terms more efficiently. Furthermore, the model employs the multi-head attention (MHA) mechanism to process input data and uses its output as the input to the GCN. Next, GCN models the graph structure of the input data, capturing the relationships between nodes and global structural information, fully integrating global contextual semantic information, and generating deep-level contextual feature representations. Finally, the extracted aspect-related features are fused with global features and applied to the sentiment classification task. The proposed unified framework achieves state-of-the-art performance, as evidenced by experimental results on four benchmark datasets. MTL-GCN outperforms baseline models in terms of F1ATE, accuracy, and F1SC metrics, as demonstrated by experimental results on four benchmark datasets. Additionally, comparative and ablation studies further validate the rationale and effectiveness of the model design. Full article
Figures: graphical abstract; (1) aspect-based sentiment analysis via graph convolutional networks and multi-task learning (aspect term extraction and sentiment polarity classification); (2) input embedding generation (token, segment, and position embeddings) for an example sentence; (3) the dependency tree of the example sentence; (4) effect of the number of GCN layers on the ATE and SC tasks; (5) attention layer visualization for the ATE and SC tasks.
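The graph-convolution step in MTL-GCN follows the standard propagation rule H' = sigma(A_hat · H · W) over a dependency-derived adjacency matrix; the sketch below implements only that generic rule, not the paper's full joint-task architecture.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W), where A_hat is a
    row-normalized adjacency matrix with self-loops."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, h):
        adj = adj + torch.eye(adj.size(0))          # add self-loops
        adj = adj / adj.sum(dim=1, keepdim=True)    # row-normalize
        return torch.relu(adj @ self.linear(h))

# toy dependency graph over 5 tokens with 64-d contextual embeddings
adj = torch.zeros(5, 5)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = adj[2, 4] = adj[4, 2] = 1.0
layer = SimpleGCNLayer(64, 32)
out = layer(adj, torch.randn(5, 64))
print(out.shape)  # torch.Size([5, 32])
```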
23 pages, 7793 KiB  
Article
A New, Robust, Adaptive, Versatile, and Scalable Abandoned Object Detection Approach Based on DeepSORT Dynamic Prompts, and Customized LLM for Smart Video Surveillance
by Merve Yilmazer and Mehmet Karakose
Appl. Sci. 2025, 15(5), 2774; https://doi.org/10.3390/app15052774 - 4 Mar 2025
Viewed by 150
Abstract
Video cameras are one of the important elements in ensuring security in public areas. Videos inspected by expert personnel using traditional methods may have a high error rate and take a long time to complete. In this study, a new deep learning-based method is proposed for the detection of abandoned objects, such as bags, suitcases, and suitcases left unsupervised in public areas. Transfer learning-based keyframe detection was first performed to remove unnecessary and repetitive frames from the ABODA dataset. Then, human and object classes were detected using the weights of the YOLOv8l model, which has a fast and effective object detection feature. Abandoned object detection is achieved by tracking classes in consecutive frames with the DeepSORT algorithm and measuring the distance between them. In addition, the location information of the human and object classes in the frames was analyzed by a large language model supported by prompt engineering. Thus, an explanation output regarding the location, size, and estimation rate of the object and human classes was created for the authorities. It is observed that the proposed model produces promising results comparable to the state-of-the-art methods for suspicious object detection from videos with success metrics of 97.9% precision, 97.0% recall, and 97.4% f1-score. Full article
Figures: (1) abandoned object detection based on background subtraction; (2) block diagram of the proposed approach; (3) block diagram of keyframe detection based on ResNet101v2; (4) block diagram of YOLOv8-based person and object detection; (5) abandoned object detection flowchart; (6) performance comparison of YOLOv8 sub-architectures; (7) mean average precision change curve and precision/recall curve; (8) person-object detection model input and output; (9) YOLOv8-based person and object detection model outputs; (10) object and human detection image with a sample txt file; (11) prompt and sample output generated for the LLM; (12) comparison of abandoned object detection TP, FP, and FN values; (13) ablation experiment results; (14) outputs of the proposed method; (15) precision comparison of the proposed method.
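The abandonment test described above (track people and objects, then flag objects no person stays near) reduces to a distance check between tracked boxes; the helper below is a schematic version with an arbitrary pixel threshold.

```python
import math

def box_center(box):
    """Center of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def is_abandoned(object_box, person_boxes, min_distance=150.0):
    """Flag an object as abandoned if no tracked person is within `min_distance`
    pixels of its center. Threshold and units are illustrative."""
    ox, oy = box_center(object_box)
    for pb in person_boxes:
        px, py = box_center(pb)
        if math.hypot(ox - px, oy - py) < min_distance:
            return False
    return True

bag = (400, 300, 450, 360)
people = [(80, 200, 140, 380), (600, 100, 660, 300)]
print(is_abandoned(bag, people))  # True: nobody is near the bag
```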
16 pages, 37656 KiB  
Article
Smoke and Fire-You Only Look Once: A Lightweight Deep Learning Model for Video Smoke and Flame Detection in Natural Scenes
by Chenmeng Zhao, Like Zhao, Ka Zhang, Yinghua Ren, Hui Chen and Yehua Sheng
Fire 2025, 8(3), 104; https://doi.org/10.3390/fire8030104 - 4 Mar 2025
Viewed by 228
Abstract
Owing to the demand for smoke and flame detection in natural scenes, this paper proposes a lightweight deep learning model, SF-YOLO (Smoke and Fire-YOLO), for video smoke and flame detection in such environments. Firstly, YOLOv11 is employed as the backbone network, combined with the C3k2 module based on a two-path residual attention mechanism, and a target detection head frame with an embedded attention mechanism. This combination enhances the response of the unobscured regions to compensate for the feature loss in occluded regions, thereby addressing the occlusion problem in dynamic backgrounds. Then, a two-channel loss function (W-SIoU) based on dynamic tuning and intelligent focusing is designed to enhance loss computation in the boundary regions, thus improving the YOLOv11 model’s ability to recognize targets with ambiguous boundaries. Finally, the algorithms proposed in this paper are experimentally validated using the self-generated dataset S-Firedata and the public smoke and flame virtual dataset M4SFWD. These datasets are derived from internet smoke and flame video frame extraction images and open-source smoke and flame dataset images, respectively. The experimental results demonstrate, compared with deep learning models such as YOLOv8, Gold-YOLO, and Faster-RCNN, the SF-YOLO model proposed in this paper is more lightweight and exhibits higher detection accuracy and robustness. The metrics mAP50 and mAP50-95 are improved by 2.5% and 2.4%, respectively, in the self-made dataset S-Firedata, and by 0.7% and 1.4%, respectively, in the publicly available dataset M4SFWD. The research presented in this paper provides practical methods for the automatic detection of smoke and flame in natural scenes, which can further enhance the effectiveness of fire monitoring systems. Full article
Figures: (1) overall technical flowchart of the algorithm; (2) the two modules of C3k2_DWR; (3) SEAMHead's fully connected network architecture; (4) W-SIoU schematic; (5)-(10) qualitative smoke and flame detection comparisons of Centernet, Faster-RCNN, Gold-YOLO, YOLOv7, YOLOv8, YOLOv11, and SF-YOLO for remote-sensing fire targets, multi-target scenes, small targets, dark environments, occluded targets, and fire-like targets; (11) SF-YOLO detection results on the Los Angeles hill fire.
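The self-built S-Firedata set is described as coming from frame extraction of internet smoke and flame videos; a typical OpenCV loop for that kind of frame sampling looks like the sketch below (file names and the sampling stride are placeholders).

```python
import cv2
import os

def extract_frames(video_path, out_dir, every_n=30):
    """Save every n-th frame of a video as a JPEG; a stride of 30 is arbitrary."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# print(extract_frames("smoke_clip.mp4", "s_firedata_frames"))  # hypothetical paths
```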
19 pages, 5493 KiB  
Article
YOLO-SWD—An Improved Ship Recognition Algorithm for Feature Occlusion Scenarios
by Ruyan Zhou, Mingkang Gu and Haiyan Pan
Appl. Sci. 2025, 15(5), 2749; https://doi.org/10.3390/app15052749 - 4 Mar 2025
Viewed by 285
Abstract
Ship detection and recognition hold significant application value in both military and civilian domains. With the continuous advancement of deep learning technologies, multi-category ship detection and recognition methods based on deep learning have garnered increasing attention. However, challenges such as feature occlusion caused by interfering objects, cloudy and foggy weather leading to feature loss, and insufficient accuracy in remote sensing imagery persist. This study aims to enhance the accuracy and robustness of ship recognition by improving deep learning-based object detection models, enabling the algorithm to perform ship detection and recognition tasks effectively in feature-occluded scenarios. In this research, we propose a ship detection and recognition algorithm based on YOLOv11. YOLOv11 possesses stronger feature extraction capabilities and its multi-branch structure effectively captures features of targets at different scales. Three improved modules are introduced: the DLKA module enhances the perception of local details and global context through dynamic deformable convolution and large receptive field attention mechanisms; the CKSP module improves the model’s ability to extract target boundaries and shapes; and the WTHead enhances the diversity and robustness of feature extraction. Comparative experiments with classical object detection models on visible and SAR datasets, which include a variety of feature occlusion scenarios, show that our proposed model achieved the best results across multiple metrics, specifically, our method achieved a mAP of 83.9%, surpassing the second-best result by 2.7%. Full article
Figures: (1) common feature occlusion phenomena in remote-sensing ship imagery (cloud or fog occlusion, and occlusion by other ships or port obstacles); (2) examples from internationally common ship datasets and images from the proposed dataset; (3) overall architecture of the proposed model; (4) structure of the CKSP module; (5) structure of the DLKA module; (6) structure of the WTConv module; (7) training mAP curves and loss function curves; (8) qualitative comparison results with missed and wrong detections marked.