Search Results (1,448)

Search Parameters:
Keywords = DenseNet201

40 pages, 36566 KiB  
Article
Web-Based AI System for Detecting Apple Leaf and Fruit Diseases
by Serra Aksoy, Pinar Demircioglu and Ismail Bogrekci
AgriEngineering 2025, 7(3), 51; https://doi.org/10.3390/agriengineering7030051 - 20 Feb 2025
Abstract
The present study seeks to improve the accuracy and reliability of disease identification in apple fruits and leaves through the use of state-of-the-art deep learning techniques. The research investigates several state-of-the-art architectures, including Xception, InceptionV3, InceptionResNetV2, EfficientNetV2M, MobileNetV3Large, ResNet152V2, DenseNet201, and NASNetLarge. Among the models evaluated, ResNet152V2 performed best in the classification of apple fruit diseases, with 92% accuracy, whereas Xception proved most effective in the classification of apple leaf diseases, with 99% accuracy. The models correctly recognized common apple diseases such as blotch, scab, rot, and other leaf infections, demonstrating their applicability to agricultural diagnostics. An important by-product of this research is a web application, easily accessible via Gradio, that performs real-time disease detection on apple fruit and leaf images uploaded by users. The app returns predicted disease labels along with confidence values and detailed information on symptoms and management. The system also includes a visualization tool for the inner workings of the neural network, enabling greater transparency and trust in the diagnostic process. Future research will aim to widen the scope of the system to other crop species, with larger disease databases, and to further improve explainability to facilitate real-world agricultural application.
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)
Figures:
Figure 1: Architecture of the Proposed Model for Apple Fruit Diseases Classification.
Figure 2: Architecture of the Proposed Model for Apple Leaves Diseases Classification.
Figure 3: Experimental Workflow (on the left); Dataset Structure (on the right).
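The Gradio-based web front end described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical wrapper around a trained Keras classifier; the model file name, input size, and class labels are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of a Gradio image-classification front end (assumed details:
# model file, 224x224 input size, and class names are illustrative only).
import numpy as np
import gradio as gr
import tensorflow as tf

model = tf.keras.models.load_model("apple_leaf_classifier.h5")  # hypothetical path
CLASS_NAMES = ["healthy", "scab", "rot", "blotch"]               # hypothetical labels

def predict(image: np.ndarray) -> dict:
    """Resize the uploaded image, run the classifier, and return label confidences."""
    x = tf.image.resize(image, (224, 224)) / 255.0
    probs = model.predict(x[tf.newaxis, ...], verbose=0)[0]
    return {name: float(p) for name, p in zip(CLASS_NAMES, probs)}

demo = gr.Interface(fn=predict, inputs=gr.Image(type="numpy"),
                    outputs=gr.Label(num_top_classes=3),
                    title="Apple disease detection (sketch)")

if __name__ == "__main__":
    demo.launch()
```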
24 pages, 5275 KiB  
Article
Force Map-Enhanced Segmentation of a Lightweight Model for the Early Detection of Cervical Cancer
by Sabina Umirzakova, Shakhnoza Muksimova, Jushkin Baltayev and Young Im Cho
Diagnostics 2025, 15(5), 513; https://doi.org/10.3390/diagnostics15050513 - 20 Feb 2025
Abstract
Background/Objectives: Accurate and efficient segmentation of cervical cells is crucial for the early detection of cervical cancer, enabling timely intervention and treatment. Existing segmentation models face challenges with complex cellular arrangements, such as overlapping cells and indistinct boundaries, and are often computationally intensive, which limits their deployment in resource-constrained settings. Methods: In this study, we introduce a lightweight and efficient segmentation model specifically designed for cervical cell analysis. The model employs a MobileNetV2 architecture for feature extraction, ensuring a minimal parameter count conducive to real-time processing. To enhance boundary delineation, we propose a novel force map approach that drives pixel adjustments inward toward the centers of cells, thus improving cell separation in densely packed areas. Additionally, we integrate extreme point supervision to refine segmentation outcomes using minimal boundary annotations, rather than full pixel-wise labels. Results: Our model was rigorously trained and evaluated on a comprehensive dataset of cervical cell images. It achieved a Dice Coefficient of 0.87 and a Boundary F1 Score of 0.84, performances that are comparable to those of advanced models but with considerably lower inference times. The optimized model operates at approximately 50 frames per second on standard low-power hardware. Conclusions: By effectively balancing segmentation accuracy with computational efficiency, our model addresses critical barriers to the widespread adoption of automated cervical cell segmentation tools. Its ability to perform in real time on low-cost devices makes it an ideal candidate for clinical applications and deployment in low-resource environments. This advancement holds significant potential for enhancing access to cervical cancer screening and diagnostics worldwide, thereby supporting broader healthcare initiatives. Full article
Figures:
Figure 1: Improved contextual integration and feature extraction with MobileNetV2 for accurate cell segmentation.
Figure 2: Description of the SipakMed dataset.
Figure 3: Results of image segmentation using the proposed model.
Figure 4: Performance curves for model robustness and overfitting prevention.
28 pages, 5098 KiB  
Article
A Methodological Framework for AI-Assisted Diagnosis of Ovarian Masses Using CT and MR Imaging
by Pratik Adusumilli, Nishant Ravikumar, Geoff Hall and Andrew F. Scarsbrook
J. Pers. Med. 2025, 15(2), 76; https://doi.org/10.3390/jpm15020076 - 19 Feb 2025
Abstract
Background: Ovarian cancer encompasses a diverse range of neoplasms originating in the ovaries, fallopian tubes, and peritoneum. Despite being one of the commonest gynaecological malignancies, there are no validated screening strategies for early detection. A diagnosis typically relies on imaging, biomarkers, and multidisciplinary team discussions. The accurate interpretation of CTs and MRIs may be challenging, especially in borderline cases. This study proposes a methodological pipeline to develop and evaluate deep learning (DL) models that can assist in classifying ovarian masses from CT and MRI data, potentially improving diagnostic confidence and patient outcomes. Methods: A multi-institutional retrospective dataset was compiled, supplemented by external data from the Cancer Genome Atlas. Two classification workflows were examined: (1) whole-volume input and (2) lesion-focused region of interest. Multiple DL architectures, including ResNet, DenseNet, transformer-based UNeST, and Attention Multiple-Instance Learning (MIL), were implemented within the PyTorch-based MONAI framework. The class imbalance was mitigated using focal loss, oversampling, and dynamic class weighting. The hyperparameters were optimised with Optuna, and balanced accuracy was the primary metric. Results: For a preliminary dataset, the proposed framework demonstrated feasibility for the multi-class classification of ovarian masses. The initial experiments highlighted the potential of transformers and MIL for identifying the relevant imaging features. Conclusions: A reproducible methodological pipeline for DL-based ovarian mass classification using CT and MRI scans has been established. Future work will leverage a multi-institutional dataset to refine these models, aiming to enhance clinical workflows and improve patient outcomes. Full article
(This article belongs to the Special Issue Artificial Intelligence Applications in Precision Oncology)
Figures:
Graphical abstract.
Figure 1: Patient selection flow diagram.
Figure 2: Model development process.
Figure 3: Matplotlib visualisation of input data.
Figure 4: TensorBoard visualisation of the model training process, illustrating real-time monitoring of metrics and performance.
Figure 5: Schematic of the 3D ResNet architecture (ResBlocks with Conv-BN-ReLU sequences and skip connections, global average pooling, and a fully connected classification layer).
Figure 6: Schematic of the 3D DenseNet architecture (dense blocks with concatenated Conv-BN-ReLU outputs, transition blocks, global average pooling, and a fully connected classification layer).
Figure 7: Schematic of the 3D UNesT architecture (nested-transformer encoder producing multi-scale feature maps, skip-connected decoder, and a classification head).
Figure 8: Schematic of the 3D attention-based Multiple-Instance Learning (MIL) pipeline (patch-level ResNet-50 embeddings weighted by an attention block and aggregated for classification).
Figure 9: Grad-CAM visualisation: the upper image demonstrates a "hot" (red) malignant lesion and the lower image demonstrates "hot" (red) regions of omental disease.
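Because the pipeline above is built on the PyTorch-based MONAI framework with focal loss used against class imbalance, a compact sketch may help illustrate the idea. The channel count, number of classes, and focal-loss parameters below are illustrative assumptions, not the authors' settings.

```python
# Sketch of a 3D DenseNet classifier in MONAI with a hand-rolled focal loss
# (class count, gamma, and class weights are illustrative assumptions).
import torch
import torch.nn.functional as F
from monai.networks.nets import DenseNet121

NUM_CLASSES = 3
model = DenseNet121(spatial_dims=3, in_channels=1, out_channels=NUM_CLASSES)

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss: down-weights easy examples to counter class imbalance."""
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp()
    ce = F.nll_loss(log_probs, target, weight=alpha, reduction="none")
    focal_weight = (1.0 - probs.gather(1, target.unsqueeze(1)).squeeze(1)) ** gamma
    return (focal_weight * ce).mean()

# One illustrative training step on a dummy volume batch.
volumes = torch.randn(2, 1, 64, 64, 64)   # (batch, channel, D, H, W)
labels = torch.tensor([0, 2])
loss = focal_loss(model(volumes), labels)
loss.backward()
```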
33 pages, 3144 KiB  
Article
CNN-Based Optimization for Fish Species Classification: Tackling Environmental Variability, Class Imbalance, and Real-Time Constraints
by Amirhosein Mohammadisabet, Raza Hasan, Vishal Dattana, Salman Mahmood and Saqib Hussain
Information 2025, 16(2), 154; https://doi.org/10.3390/info16020154 - 19 Feb 2025
Abstract
Automated fish species classification is essential for marine biodiversity monitoring, fisheries management, and ecological research. However, challenges such as environmental variability, class imbalance, and computational demands hinder the development of robust classification models. This study investigates the effectiveness of convolutional neural network (CNN)-based models and hybrid approaches to address these challenges. Eight CNN architectures, including DenseNet121, MobileNetV2, and Xception, were compared alongside traditional classifiers like support vector machines (SVMs) and random forest. DenseNet121 achieved the highest accuracy (90.2%), leveraging its superior feature extraction and generalization capabilities, while MobileNetV2 balanced accuracy (83.57%) with computational efficiency, processing images in 0.07 s, making it ideal for real-time deployment. Advanced preprocessing techniques, such as data augmentation, turbidity simulation, and transfer learning, were employed to enhance dataset robustness and address class imbalance. Hybrid models combining CNNs with traditional classifiers achieved intermediate accuracy with improved interpretability. Optimization techniques, including pruning and quantization, reduced model size by 73.7%, enabling real-time deployment on resource-constrained devices. Grad-CAM visualizations further enhanced interpretability by identifying key image regions influencing predictions. This study highlights the potential of CNN-based models for scalable, interpretable fish species classification, offering actionable insights for sustainable fisheries management and biodiversity conservation. Full article
(This article belongs to the Special Issue Machine Learning and Data Mining: Innovations in Big Data Analytics)
Figures:
Graphical abstract.
Figure 1: Framework for the research methodology in fish species classification.
Figure 2: Example of augmented images.
Figure 3: Confusion matrix for DenseNet121 performance.
Figure 4: Training and validation loss for DenseNet121.
Figure 5: Grad-CAM heatmap analysis for a single class.
Figure 6: Comparative Grad-CAM visualizations across multiple classes.
Figure 7: Turbidity simulation results and model predictions.
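To make the transfer-learning and post-training quantization steps described above concrete, here is a minimal Keras sketch under assumed settings; the input size, number of fish classes, and the choice of dynamic-range quantization are illustrative and not taken from the paper.

```python
# Sketch: DenseNet121 transfer learning followed by post-training quantization.
# Input size, class count, and training data are illustrative assumptions.
import tensorflow as tf

NUM_SPECIES = 10
base = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained backbone for the first training stage

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_SPECIES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets not shown here

# Post-training dynamic-range quantization to shrink the model for edge deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open("fish_classifier_quant.tflite", "wb") as f:
    f.write(converter.convert())
```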
15 pages, 3085 KiB  
Article
Early Detection of Skin Diseases Across Diverse Skin Tones Using Hybrid Machine Learning and Deep Learning Models
by Akasha Aquil, Faisal Saeed, Souad Baowidan, Abdullah Marish Ali and Nouh Sabri Elmitwally
Information 2025, 16(2), 152; https://doi.org/10.3390/info16020152 - 19 Feb 2025
Abstract
Skin diseases in melanin-rich skin often present diagnostic challenges due to the unique characteristics of darker skin tones, which can lead to misdiagnosis or delayed treatment. This disparity impacts millions within diverse communities, highlighting the need for accurate, AI-based diagnostic tools. In this paper, we investigated the performance of three machine learning methods, Support Vector Machines (SVMs), Random Forest (RF), and Decision Trees (DTs), combined with state-of-the-art (SOTA) deep learning models (EfficientNet, MobileNetV2, and DenseNet121) for predicting skin conditions using dermoscopic images from the HAM10000 dataset. The features were extracted using the deep learning models, with the labels encoded numerically. To address the data imbalance, SMOTE and resampling techniques were applied. Additionally, Principal Component Analysis (PCA) was used for feature reduction, and fine-tuning was performed to optimize the models. The results demonstrated that RF with DenseNet121 achieved a superior accuracy of 98.32%, followed by SVM with MobileNetV2 at 98.08%, and Decision Tree with MobileNetV2 at 85.39%. The proposed methods outperform the SVM combined with the SOTA EfficientNet model, validating the robustness of the proposed approaches. Evaluation metrics such as accuracy, precision, recall, and F1-score were used to benchmark performance, showcasing the potential of these methods in advancing skin disease diagnostics for diverse populations.
(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
Figures:
Figure 1: Phases of CRISP-DM.
Figure 2: Frequency of skin lesion types.
Figure 3: Distribution of age.
Figure 4: Sex distribution.
Figure 5: Distribution of lesion localization.
Figure 6: Age distribution across lesion types.
Figure 7: Distribution of lesion types by sex.
Figure 8: Model accuracy graph for SVM-MobileNetV2.
Figure 9: Random Forest DenseNet121 accuracy model.
Figure 10: Validation accuracy of Decision Tree.
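A minimal sketch of the hybrid deep-feature plus classical-classifier pipeline described above, assuming images are already loaded as arrays; the placeholder data, PCA setting, and Random Forest configuration are illustrative rather than the authors' exact choices.

```python
# Sketch: deep features from DenseNet121, SMOTE rebalancing, PCA reduction,
# and a Random Forest classifier (all hyperparameters are illustrative).
import numpy as np
import tensorflow as tf
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

extractor = tf.keras.applications.DenseNet121(include_top=False, pooling="avg",
                                               weights="imagenet")

def extract_features(images: np.ndarray) -> np.ndarray:
    """images: (N, 224, 224, 3) in [0, 255]; returns (N, 1024) pooled features."""
    x = tf.keras.applications.densenet.preprocess_input(images.astype("float32"))
    return extractor.predict(x, verbose=0)

# Placeholder stand-ins for dermoscopic images and encoded labels
# (three illustrative classes here; HAM10000 has seven).
X_img = np.random.randint(0, 256, size=(70, 224, 224, 3))
y = np.array([0] * 40 + [1] * 20 + [2] * 10)

X = extract_features(X_img)
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)   # rebalance minority classes

clf = make_pipeline(PCA(n_components=0.95), RandomForestClassifier(n_estimators=300))
clf.fit(X_bal, y_bal)
```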
22 pages, 2391 KiB  
Article
Terrestrial Storage of Biomass (Biomass Burial): A Natural, Carbon-Efficient, and Low-Cost Method for Removing CO2 from Air
by Jeffrey A. Amelse
Appl. Sci. 2025, 15(4), 2183; https://doi.org/10.3390/app15042183 - 18 Feb 2025
Abstract
Terrestrial Storage of Biomass (TSB) is a Negative Emission Technology for removing CO2 already in the atmosphere. TSB is compared to other NETs and is shown to be a natural, carbon-efficient, and low-cost option. Nature performs the work of removal by growing biomass via photosynthesis. The key to permanent sequestration is to bury the biomass in pits designed to minimize the decomposition. The chemistry of biomass formation and decomposition is reviewed to provide best practices for the TSB burial pit design. Methane formation from even a small amount of decomposition has been raised as a concern. This concern is shown to be unfounded due to a great difference in time constants for methane formation and its removal from the air by ozone oxidation. Methane has a short lifetime in air of only about 12 years. Woody biomass decomposition undergoes exponential decay spread over hundreds to thousands of years. It is inherently slow due to the cross-linking and dense packing of cellulose, which means that the attack can only occur at the surface. A model that couples the slow and exponential decay of the rate of methane formation with the fast removal by oxidation shows that methane will peak at a very small fraction of the buried biomass carbon within about 10 years and then rapidly decline towards zero. The implication is that no additional equipment needs to be added to TSB to collect and burn the methane. Certified carbon credits are listed on various exchanges. The US DOE has recently issued grants for TSB development. Full article
(This article belongs to the Special Issue CCUS: Paving the Way to Net Zero Emissions Technologies)
Figures:
Figure 1: Relative species concentrations as a function of pH (Sturm [28]).
Figure 2: The effect of adding Ca²⁺ to the carbonic acid buffer system (Sturm [28]).
Figure 3: US bioethanol build-up compared to gasoline and crude oil exports. Data source: EIA [38].
Figure 4: Pathways for the decomposition of complex municipal waste (Emcon Associates [46]).
Figure 5: Evolution of biogas composition during waste decomposition (Emcon Associates [46]).
Figure 6: The fraction of biomass carbon converted to methane (blue) and the fraction of biomass carbon converted to methane that remains in the atmosphere (magenta).
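The coupling argument above (slow, exponentially decaying methane generation versus fast atmospheric removal) can be written as a simple two-term balance. The sketch below is not the author's model code: the decomposition time constant and methane yield fraction are illustrative assumptions, and only the roughly 12-year atmospheric methane lifetime comes from the abstract. The exact peak time and height depend entirely on the assumed constants.

```python
# Sketch: atmospheric methane stock M(t) from slowly decomposing buried biomass.
# dM/dt = generation(t) - M / tau_air, with generation decaying exponentially.
# Time constants and the methane yield fraction are illustrative assumptions.
import numpy as np

tau_decomp = 500.0   # years, e-folding time of biomass decomposition (assumed)
tau_air = 12.0       # years, atmospheric methane lifetime (per the abstract)
f_ch4 = 0.01         # fraction of buried carbon ultimately released as methane (assumed)

dt = 0.1
t = np.arange(0.0, 200.0, dt)
generation = (f_ch4 / tau_decomp) * np.exp(-t / tau_decomp)  # per unit buried carbon, per year

M = np.zeros_like(t)
for i in range(1, len(t)):
    # Explicit Euler step: slow source minus fast first-order removal by oxidation.
    dM = generation[i - 1] - M[i - 1] / tau_air
    M[i] = M[i - 1] + dM * dt

peak = int(np.argmax(M))
print(f"Under these assumed constants, the methane stock peaks near t = {t[peak]:.0f} years "
      f"at {M[peak]:.2e} of the buried carbon, then declines toward zero.")
```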
20 pages, 7127 KiB  
Article
Cross-Attention Adaptive Feature Pyramid Network with Uncertainty Boundary Modeling for Mass Detection in Digital Breast Tomosynthesis
by Xinyu Ma, Haotian Sun, Gang Yuan, Yufei Tang, Jie Liu, Shuangqing Chen and Jian Zheng
Bioengineering 2025, 12(2), 196; https://doi.org/10.3390/bioengineering12020196 - 17 Feb 2025
Abstract
Computer-aided detection (CADe) of masses in digital breast tomosynthesis (DBT) is crucial for early breast cancer diagnosis. However, the variability in the size and morphology of breast masses and their resemblance to surrounding tissues present significant challenges. Current CNN-based CADe methods, particularly those that use Feature Pyramid Networks (FPN), often fail to integrate multi-scale information effectively and struggle to handle dense glandular tissue with high-density or iso-density mass lesions due to the unidirectional integration and progressive attenuation of features, leading to high false positive rates. Additionally, the commonly indistinct boundaries of breast masses introduce uncertainty in boundary localization, which makes traditional Dirac boundary modeling insufficient for precise boundary regression. To address these issues, we propose the CU-Net network, which efficiently fuses multi-scale features and accurately models blurred boundaries. Specifically, the CU-Net introduces the Cross-Attention Adaptive Feature Pyramid Network (CA-FPN), which enhances the effectiveness and accuracy of feature interactions through a cross-attention mechanism to capture global correlations across multi-scale feature maps. Simultaneously, the Breast Density Perceptual Module (BDPM) incorporates breast density information to weight intermediate features, thereby improving the network’s focus on dense breast regions susceptible to false positives. For blurred mass boundaries, we introduce Uncertainty Boundary Modeling (UBM) to model the positional distribution function of predicted bounding boxes for masses with uncertain boundaries. In comparative experiments on an in-house clinical DBT dataset and the BCS-DBT dataset, the proposed method achieved sensitivities of 89.68% and 72.73% at 2 false positives per DBT volume (FPs/DBT), respectively, significantly outperforming existing state-of-the-art detection methods. This method offers clinicians rapid, accurate, and objective diagnostic assistance, demonstrating substantial potential for clinical application. Full article
(This article belongs to the Section Biosignal Processing)
Figures:
Figure 1: Illustration of mass edges that are blurred or obscured by dense glandular tissue; blurry edges are indicated by red dashed ellipses.
Figure 2: Overall architecture of the proposed method. The BDPM and CA-FPN further integrate the features extracted by the backbone; the UBM is placed in the regression branch of the detection head to predict more accurate 2D bounding boxes.
Figure 3: Architecture of CA-FPN. The module directly connects deep features with shallow features, preventing the gradual attenuation of feature transmission seen in traditional FPN.
Figure 4: Architecture of BDPM. The module weights the intermediate features of the network using breast density information to enhance the network's focus on dense breast regions.
Figure 5: Three-dimensional aggregation: 2D detection results are fused along the z-axis to yield the final 3D detection results.
Figure 6: FROC curves of the comparison methods [25,27,35,36,37,38,39] on the mass-detection task.
Figure 7: Detection result visualization of different models (green boxes: ground truth; aqua blue boxes: true positives; yellow boxes: false positives).
Figure 8: Heatmap visualization of the detection results using Grad-CAM (red: higher network attention; blue: lower attention; green boxes: ground truth). The proposed method effectively focuses on the ground truth regions, thereby achieving accurate detection results.
19 pages, 11226 KiB  
Article
Evaluation of Weed Infestations in Row Crops Using Aerial RGB Imaging and Deep Learning
by Plamena D. Nikolova, Boris I. Evstatiev, Atanas Z. Atanasov and Asparuh I. Atanasov
Agriculture 2025, 15(4), 418; https://doi.org/10.3390/agriculture15040418 - 16 Feb 2025
Abstract
One of the important factors negatively affecting the yield of row crops is weed infestations. Using non-contact detection methods allows for a rapid assessment of weed infestations’ extent and management decisions for practical weed control. This study aims to develop and demonstrate a methodology for early detection and evaluation of weed infestations in maize using UAV-based RGB imaging and pixel-based deep learning classification. An experimental study was conducted to determine the extent of weed infestations on two tillage technologies, plowing and subsoiling, tailored to the specific soil and climatic conditions of Southern Dobrudja. Based on an experimental study with the DeepLabV3 classification algorithm, it was found that the ResNet-34-backed model ensures the highest performance compared to different versions of ResNet, DenseNet, and VGG backbones. The achieved performance reached precision, recall, F1 score, and Kappa, respectively, 0.986, 0.986, 0.986, and 0.957. After applying the model in the field with the investigated tillage technologies, it was found that a higher level of weed infestation is observed in subsoil deepening areas, where 4.6% of the area is infested, compared to 0.97% with the plowing treatment. This work contributes novel insights into weed management during the critical early growth stages of maize, providing a robust framework for optimizing weed control strategies in this region. Full article
(This article belongs to the Section Digital Agriculture)
Figures:
Figure 1: Geographic location of the experimental field area: (a) approximate location on the map of Bulgaria; (b) the experimental field.
Figure 2: Overview of the applied methodology.
Figure 3: Sample training data creation: two weed-infested ROIs (the blue regions) and one uninfested area (the white region).
Figure 4: Types of weed infestations in the experimental field: (a) Chenopodium album, (b) Cirsium arvense, (c) Polygonum aviculare, (d) Sorghum halepense.
Figure 5: The merged training images and the marked regions of interest: weed-infested areas in blue and weed-free areas in white.
Figure 6: The generated orthomosaic of the investigated field, the camera locations (yellow squares), and the flight path (orange line).
Figure 7: Classification map based on the trained ResNet-34-based model: "weed" in green and "no-weed" in white for subsoil deepening; "weed" in red and "no-weed" in yellow for the plowing treatment.
Figure 8: Close-up image of the shadow dropped by the high-voltage electric pole; the area classified as "weed" by the ResNet-34 model is marked in green.
Figure 9: Examples from the orthomosaic map with the overlaid "weed" polygons: (a) subsoil deepening; (b) plowing.
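For the pixel-based DeepLabV3 classification described above, one commonly used way to pair DeepLabV3 with a ResNet-34 backbone is the segmentation_models_pytorch package; the abstract does not say which implementation the authors used, so the sketch below is only one plausible setup, with the two-class ("weed" / "no-weed") head, tile size, and loss as assumptions.

```python
# Sketch: DeepLabV3 with a ResNet-34 encoder for weed / no-weed pixel classification.
# Library choice, class count, and input size are assumptions for illustration.
import torch
import segmentation_models_pytorch as smp

model = smp.DeepLabV3(
    encoder_name="resnet34",        # the backbone reported to perform best
    encoder_weights="imagenet",     # transfer learning from ImageNet
    in_channels=3,                  # RGB orthomosaic tiles
    classes=2,                      # weed vs. no-weed
)

loss_fn = smp.losses.DiceLoss(mode="multiclass")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy tile batch.
tiles = torch.randn(4, 3, 512, 512)          # (batch, channels, H, W)
masks = torch.randint(0, 2, (4, 512, 512))   # per-pixel class indices
logits = model(tiles)                        # (4, 2, 512, 512)
loss = loss_fn(logits, masks)
loss.backward()
optimizer.step()
```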
14 pages, 10065 KiB  
Article
Automatic Evaluation of Bone Age Using Hand Radiographs and Pancorporal Radiographs in Adolescent Idiopathic Scoliosis
by Ifrah Andleeb, Bilal Zahid Hussain, Julie Joncas, Soraya Barchi, Marjolaine Roy-Beaudry, Stefan Parent, Guy Grimard, Hubert Labelle and Luc Duong
Diagnostics 2025, 15(4), 452; https://doi.org/10.3390/diagnostics15040452 - 13 Feb 2025
Abstract
Background/Objectives: Adolescent idiopathic scoliosis (AIS) is a complex, three-dimensional spinal deformity that requires monitoring of skeletal maturity for effective management. Accurate bone age assessment is important for evaluating developmental progress in AIS. Traditional methods rely on ossification center observations, but recent advances in deep learning (DL) might pave the way for automatic grading of bone age. Methods: The goal of this research is to propose a new deep neural network (DNN) and evaluate class activation maps for bone age assessment in AIS using hand radiographs. We developed a custom neural network based on DenseNet201 and trained it on the RSNA Bone Age dataset. Results: The model achieves an average mean absolute error (MAE) of 4.87 months on a clinical test dataset of more than 250 AIS patients. To enhance transparency and trust, we introduced Score-CAM, an explainability tool that reveals the regions of interest contributing to accurate bone age predictions. We compared our model with the BoneXpert system, demonstrating similar performance, which signifies the potential of our approach to reduce inter-rater variability and expedite clinical decision-making. Conclusions: This study outlines the role of deep learning in improving the precision and efficiency of bone age assessment, particularly for AIS patients. Future work involves the detection of other regions of interest and the integration of other ossification centers.
(This article belongs to the Section Medical Imaging and Theranostics)
Figures:
Figure 1: (a) EOS pancorporal imaging system; (b) radiograph of an AIS patient; (c) matching hand/wrist images [4].
Figure 2: Sample hand and wrist radiograph from the RSNA Bone Age dataset.
Figure 3: Distribution of bone age of the children in months.
Figure 4: Deep learning model architecture for bone age prediction.
Figure 5: DenseNet201 model architecture for bone age prediction.
Figure 6: Graphical comparison of VGG16, VGG19, Inception, and the proposed model trained on the same dataset.
Figure 7: Actual age versus predicted age results.
Figure 8: Loss and MAE (months) curve plots for the proposed model.
Figure 9: Score-CAM results of the proposed neural network, highlighting the image regions important for determining bone age (blue: most significant areas; red: least significant).
Figure 10: Comparison of the performance of the proposed model with BoneXpert.
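A minimal sketch of a DenseNet201-based regression head of the kind described above, trained with an MAE objective in months; the input size, head layout, and optimizer settings are illustrative assumptions, not the authors' exact architecture.

```python
# Sketch: DenseNet201 backbone with a single-output regression head for bone age
# in months (head layout and hyperparameters are illustrative assumptions).
import tensorflow as tf

backbone = tf.keras.applications.DenseNet201(include_top=False, weights="imagenet",
                                              input_shape=(256, 256, 3))

inputs = tf.keras.Input(shape=(256, 256, 3))
x = tf.keras.applications.densenet.preprocess_input(inputs)
x = backbone(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(256, activation="relu")(x)
x = tf.keras.layers.Dropout(0.3)(x)
bone_age_months = tf.keras.layers.Dense(1, activation="linear")(x)

model = tf.keras.Model(inputs, bone_age_months)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="mean_absolute_error",        # MAE in months, as reported
              metrics=["mean_absolute_error"])
# model.fit(train_ds, validation_data=val_ds, epochs=30)  # RSNA Bone Age data not shown
```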
22 pages, 11164 KiB  
Article
Acoustic Emission-Based Pipeline Leak Detection and Size Identification Using a Customized One-Dimensional DenseNet
by Faisal Saleem, Zahoor Ahmad, Muhammad Farooq Siddique, Muhammad Umar and Jong-Myon Kim
Sensors 2025, 25(4), 1112; https://doi.org/10.3390/s25041112 - 12 Feb 2025
Abstract
Effective leak detection and leak size identification are essential for maintaining the operational safety, integrity, and longevity of industrial pipelines. Traditional methods often suffer from high noise sensitivity, limited adaptability to non-stationary signals, and excessive computational costs, which limits their feasibility for real-time monitoring applications. This study presents a novel acoustic emission (AE)-based pipeline monitoring approach, integrating Empirical Wavelet Transform (EWT) for adaptive frequency decomposition with customized one-dimensional DenseNet architecture to achieve precise leak detection and size classification. The methodology begins with EWT-based signal segmentation, which isolates meaningful frequency bands to enhance leak-related feature extraction. To further improve signal quality, adaptive thresholding and denoising techniques are applied, filtering out low-amplitude noise while preserving critical diagnostic information. The denoised signals are processed using a DenseNet-based deep learning model, which combines convolutional layers and densely connected feature propagation to extract fine-grained temporal dependencies, ensuring the accurate classification of leak presence and severity. Experimental validation was conducted on real-world AE data collected under controlled leak and non-leak conditions at varying pressure levels. The proposed model achieved an exceptional leak detection accuracy of 99.76%, demonstrating its ability to reliably differentiate between normal operation and multiple leak severities. This method effectively reduces computational costs while maintaining robust performance across diverse operating environments. Full article
(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors 2025)
Figures:
Figure 1: Graphical workflow of the proposed methodology.
Figure 2: Flowchart of the signal preprocessing steps.
Figure 3: Intrinsic mode functions for (a) a non-leak signal and (b) a leak signal.
Figure 4: One-dimensional CNN architecture.
Figure 5: DenseNet architecture.
Figure 6: Experimental setup for pipeline leak detection.
Figure 7: Pipeline architecture for the experiment.
Figure 8: AE signals at 13-bar pressure: (a) normal; (b) leak.
Figure 9: AE signals at 18-bar pressure: (a) normal; (b) leak.
Figure 10: Confusion matrices for leak detection of (a) the proposed method, (b) 1D CNN, (c) LSTM, and (d) XGBoost.
Figure 11: Confusion matrices for leak size identification of (a) the proposed method, (b) 1D CNN, (c) LSTM, and (d) XGBoost.
Figure 12: t-SNE plots for leak detection of (a) the proposed method, (b) 1D CNN, (c) LSTM, and (d) XGBoost.
Figure 13: t-SNE plots for leak size identification of (a) the proposed method, (b) 1D CNN, (c) LSTM, and (d) XGBoost.
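Since the detector above is a customized one-dimensional DenseNet, a compact sketch of a 1D dense block may help illustrate the idea; the growth rate, layer count, and classifier head below are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: a one-dimensional dense block stack for AE signal classification.
# Growth rate, depth, and head size are illustrative assumptions.
import torch
import torch.nn as nn

class DenseLayer1D(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm1d(in_channels), nn.ReLU(inplace=True),
            nn.Conv1d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # Dense connectivity: concatenate the new features onto all previous ones.
        return torch.cat([x, self.block(x)], dim=1)

class DenseNet1D(nn.Module):
    def __init__(self, num_classes: int = 4, growth_rate: int = 16, num_layers: int = 4):
        super().__init__()
        channels = 32
        layers = [nn.Conv1d(1, channels, kernel_size=7, stride=2, padding=3)]
        for _ in range(num_layers):
            layers.append(DenseLayer1D(channels, growth_rate))
            channels += growth_rate
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                  nn.Linear(channels, num_classes))

    def forward(self, x):            # x: (batch, 1, signal_length)
        return self.head(self.features(x))

model = DenseNet1D()
logits = model(torch.randn(8, 1, 4096))   # e.g., denoised AE segments
print(logits.shape)                       # torch.Size([8, 4])
```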
18 pages, 8926 KiB  
Article
Research on Damage Detection Methods for Concrete Beams Based on Ground Penetrating Radar and Convolutional Neural Networks
by Ning Liu, Ya Ge, Xin Bai, Zi Zhang, Yuhao Shangguan and Yan Li
Appl. Sci. 2025, 15(4), 1882; https://doi.org/10.3390/app15041882 - 12 Feb 2025
Abstract
Ground penetrating radar (GPR) is a mature and important research method in the field of structural non-destructive testing. However, when the detection target scale is small and the amount of data collected is limited, it poses a serious challenge for this research method. In order to verify the applicability of typical one-dimensional radar signals combined with convolutional neural networks (CNN) in the non-destructive testing of concrete structures, this study created concrete specimens with embedded defects (voids, non-dense solids, and cracks) commonly found in concrete structures in a laboratory setting. High-frequency GPR equipment is used for data acquisition, A-scan data corresponding to different defects is extracted as a training set, and appropriate labeling is carried out. The extracted original radar signals were taken as the input of the CNN model. At the same time, in order to improve the sensitivity of the CNN models to specific damage types, the spectrums of A-scan are also used as part of the training datasets of the CNN models. In this paper, two CNN models with different dimensions are used to train the datasets and evaluate the classification results; one is the traditional one-dimensional CNN model, and the other is the classical two-dimensional CNN architecture AlexNet. In addition, the finite difference time domain (FDTD) model of three-dimensional complex media is established by gprMax, and the propagation characteristics of GPR in concrete media are simulated. The results of applying this method to both simulated and experimental data show that combining the A-scan data of ground penetrating radar and their spectrums as input with the CNN model can effectively identify different types of damage and defects inside the concrete structure. Compared with the one-dimensional CNN model, AlexNet has obvious advantages in extracting complex signal features and processing high-dimensional data. The feasibility of this method in the research field of damage detection of concrete structures has been verified. Full article
(This article belongs to the Special Issue Ground Penetrating Radar: Data, Imaging, and Signal Analysis)
Figures:
Figure 1: Diagram of the GPR concrete-distress detection principle.
Figure 2: Process of the one-dimensional convolution layer.
Figure 3: Process of the two-dimensional convolution layer.
Figure 4: Process of the pooling layer.
Figure 5: Diagram of the one-dimensional CNN.
Figure 6: Diagram of the two-dimensional CNN.
Figure 7: (a) H_B0: reinforced concrete beam; (b) H_BF1: 40 mm diameter PVC pipe and 60 mm side length non-dense solid; (c) H_BF2: 25 mm diameter PVC pipe and 30 mm side length plastic foam.
Figure 8: A heterogeneous numerical model of concrete.
Figure 9: Models of concrete beams with defects: (a) H_B0; (b) H_BF1; (c) H_BF2.
Figure 10: Concrete surface line tracks.
Figure 11: The setting of prefabricated defects in concrete beams: (a) non-dense material and void; (b) cracks generated during the experiment.
Figure 12: (a) GSSI GPR equipment and (b) measurement process.
Figure 13: Radar profile features of different defects ((a-c) simulation results; (d-f) experiment results).
Figure 14: Comparison of (a) simulated A-scans and (b) spectra of simulated A-scans for three defect types; (c) experimental A-scans and (d) spectra of experimental A-scans for three defect types.
Figure 15: Training curves of the two convolutional networks: (a) the one-dimensional CNN; (b) the two-dimensional CNN.
Figure 16: Classification results of the one-dimensional CNN model for simulated data: (a) the A-scan; (b) the spectrum of the A-scan.
Figure 17: Classification results of the two-dimensional CNN model for simulated data: (a) the A-scan; (b) the spectrum of the A-scan.
Figure 18: Classification results of the one-dimensional CNN model for experimental data: (a) the A-scan; (b) the spectrum of the A-scan.
Figure 19: Classification results of the two-dimensional CNN model for experimental data: (a) the A-scan; (b) the spectrum of the A-scan.
Figure 20: Classification results of the two-dimensional CNN model for merged data: (a) the A-scan; (b) the spectrum of the A-scan.
Figure 21: Accuracy values for training on experimental and simulated data using (a) the one-dimensional CNN model and (b) the two-dimensional CNN model.
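The spectrum-augmented input described above amounts to pairing each A-scan with its magnitude spectrum before feeding the CNN; the sampling interval, normalization, and synthetic example below are illustrative assumptions.

```python
# Sketch: build (A-scan, spectrum) training inputs for the 1D CNN.
# Sampling step and normalization choices are illustrative assumptions.
import numpy as np

def ascan_with_spectrum(ascan: np.ndarray, dt_ns: float = 0.02):
    """Return the normalized A-scan, its amplitude spectrum, and the frequency axis.

    ascan: 1D GPR trace sampled every dt_ns nanoseconds.
    """
    trace = (ascan - ascan.mean()) / (ascan.std() + 1e-12)   # zero-mean, unit variance
    spectrum = np.abs(np.fft.rfft(trace))                     # one-sided amplitude spectrum
    freqs_ghz = np.fft.rfftfreq(trace.size, d=dt_ns)          # cycles per ns = GHz
    spectrum /= spectrum.max() + 1e-12
    return trace, spectrum, freqs_ghz

# Example with a synthetic wavelet standing in for a measured A-scan.
t = np.arange(512) * 0.02
synthetic = np.sin(2 * np.pi * 1.6 * t) * np.exp(-((t - 3.0) ** 2))  # ~1.6 GHz burst
trace, spectrum, freqs = ascan_with_spectrum(synthetic)
print(trace.shape, spectrum.shape)   # (512,) (257,)
```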
24 pages, 16681 KiB  
Article
A Deep Ensemble Learning Approach Based on a Vision Transformer and Neural Network for Multi-Label Image Classification
by Anas W. Abulfaraj and Faisal Binzagr
Big Data Cogn. Comput. 2025, 9(2), 39; https://doi.org/10.3390/bdcc9020039 - 11 Feb 2025
Abstract
Convolutional Neural Networks (CNNs) have proven to be very effective in image classification due to their powerful feature learning capability. Traditional approaches have considered the problem of multiclass classification, where the goal is to classify a set of objects at once. However, co-occurrence can make the discriminative features of the target less salient and may lead to overfitting of the model, resulting in lower performance. To address this, we propose a multi-label classification ensemble model including a Vision Transformer (ViT) and CNNs for directly detecting one or multiple objects in an image. First, we improve the MobileNetV2 and DenseNet201 models using extra convolutional layers to strengthen image classification. In detail, three convolution layers are applied in parallel at the end of both models. ViT can learn dependencies among distant positions as well as local detail, making it an effective tool for multi-label classification. Finally, an ensemble learning algorithm is used to combine the classification predictions of the ViT, the modified MobileNetV2, and the modified DenseNet201 branches for increased image classification accuracy using a voting system. The performance of the proposed model is examined on four benchmark datasets, achieving accuracies of 98.24%, 98.89%, 99.91%, and 96.69% on PASCAL VOC 2007, PASCAL VOC 2012, MS-COCO, and NUS-WIDE 318, respectively, showing that our framework can enhance current state-of-the-art methods.
Figures:
Figure 1: Graphical illustration of the proposed model.
Figure 2: Impact of preprocessing steps. Images are sourced from the publicly available PASCAL VOC 2007 [62], PASCAL VOC 2012 [63], MS-COCO [64], and NUS-WIDE [65] datasets.
Figure 3: Encoder-based transformer with multi-head attention and MLP attention blocks.
Figure 4: Proposed modified MobileNetV2 model blocks.
Figure 5: Proposed modified DenseNet201 model.
Figure 6: Sample images in each dataset (PASCAL VOC 2007 [62], PASCAL VOC 2012 [63], MS-COCO [64], and NUS-WIDE [65]).
Figure 7: Training and validation graph comparison for (a) Dataset A, (b) Dataset B, (c) subDataset C, and (d) subDataset D.
Figure 8: Comparison of mAP on all selected datasets using ViT, modified MobileNetV2, modified DenseNet201, and the proposed model.
Figure 9: Predicted labels on each dataset using the proposed model (PASCAL VOC 2007 [62], PASCAL VOC 2012 [63], MS-COCO [64], and NUS-WIDE [65]).
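The soft-voting step described above can be illustrated in a few lines: per-label probabilities from the three branches are averaged and thresholded. The threshold and the stand-in probability arrays are assumptions for illustration.

```python
# Sketch: soft-voting ensemble for multi-label prediction from three branches
# (ViT, modified MobileNetV2, modified DenseNet201). Threshold is an assumption.
import numpy as np

def ensemble_multilabel(prob_vit: np.ndarray, prob_mnv2: np.ndarray,
                        prob_dn201: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Average per-label sigmoid probabilities and threshold to 0/1 label vectors."""
    mean_probs = (prob_vit + prob_mnv2 + prob_dn201) / 3.0
    return (mean_probs >= threshold).astype(int)

# Stand-in branch outputs for a batch of 2 images over 20 PASCAL VOC classes.
rng = np.random.default_rng(0)
p1, p2, p3 = (rng.random((2, 20)) for _ in range(3))
print(ensemble_multilabel(p1, p2, p3))
```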
18 pages, 16173 KiB  
Article
Comparative Analysis of Deep Learning Architectures for Macular Hole Segmentation in OCT Images: A Performance Evaluation of U-Net Variants
by H. M. S. S. Herath, S. L. P. Yasakethu, Nuwan Madusanka, Myunggi Yi and Byeong-Il Lee
J. Imaging 2025, 11(2), 53; https://doi.org/10.3390/jimaging11020053 - 11 Feb 2025
Abstract
This study presents a comprehensive comparison of U-Net variants with different backbone architectures for Macular Hole (MH) segmentation in optical coherence tomography (OCT) images. We evaluated eleven architectures, including U-Net combined with InceptionNetV4, VGG16, VGG19, ResNet152, DenseNet121, EfficientNet-B7, MobileNetV2, Xception, and Transformer. Models were assessed using the Dice coefficient and HD95 metrics on the OIMHS dataset. While HD95 proved unreliable for small regions like MH, often returning ‘nan’ values, the Dice coefficient provided consistent performance evaluation. InceptionNetV4 + U-Net achieved the highest Dice coefficient (0.9672), demonstrating superior segmentation accuracy. Although considered state-of-the-art, Transformer + U-Net showed poor performance in MH and intraretinal cyst (IRC) segmentation. Analysis of computational resources revealed that MobileNetV2 + U-Net offered the most efficient performance with minimal parameters, while InceptionNetV4 + U-Net balanced accuracy with moderate computational demands. Our findings suggest that CNN-based backbones, particularly InceptionNetV4, are more effective than Transformer architectures for OCT image segmentation, with InceptionNetV4 + U-Net emerging as the most promising model for clinical applications. Full article
Figures:
Figure 1: (a) U-Net encoder replaced by a CNN encoder; (b) workflow of this study.
Figure 2: Dice coefficient trends.
Figure 3: Metrics across models.
Figure 4: Performance comparison.
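Because the comparison above leans on the Dice coefficient (with HD95 proving unreliable for very small structures such as the macular hole), a short sketch of the per-class Dice computation may be useful; the smoothing constant and toy masks are assumptions.

```python
# Sketch: Dice coefficient for a binary mask of one class (e.g., the macular hole).
# The smoothing term avoids division by zero on empty masks and is an assumption.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, smooth: float = 1e-6) -> float:
    """pred and target are binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)

pred = np.zeros((8, 8), dtype=int); pred[2:5, 2:5] = 1
target = np.zeros((8, 8), dtype=int); target[3:6, 3:6] = 1
print(round(dice_coefficient(pred, target), 3))  # 0.444: 4 overlapping pixels out of 9 + 9
```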
21 pages, 3599 KiB  
Article
Using Deep Learning to Identify Deepfakes Created Using Generative Adversarial Networks
by Jhanvi Jheelan and Sameerchand Pudaruth
Computers 2025, 14(2), 60; https://doi.org/10.3390/computers14020060 - 10 Feb 2025
Abstract
Generative adversarial networks (GANs) have revolutionised various fields by creating highly realistic images, videos, and audio, thus enhancing applications such as video game development and data augmentation. However, this technology has also given rise to deepfakes, which pose serious challenges due to their potential to create deceptive content. Thousands of media reports have informed us of such occurrences, highlighting the urgent need for reliable detection methods. This study addresses the issue by developing a deep learning (DL) model capable of distinguishing between real and fake face images generated by StyleGAN. Using a subset of the 140K real and fake face dataset, we explored five different models: a custom CNN, ResNet50, DenseNet121, MobileNet, and InceptionV3. We leveraged the pre-trained models to utilise their robust feature extraction and computational efficiency, which are essential for distinguishing between real and fake features. Through extensive experimentation with various dataset sizes, preprocessing techniques, and split ratios, we identified the optimal ones. The 20k_gan_8_1_1 dataset produced the best results, with MobileNet achieving a test accuracy of 98.5%, followed by InceptionV3 at 98.0%, DenseNet121 at 97.3%, ResNet50 at 96.1%, and the custom CNN at 86.2%. All of these models were trained on only 16,000 images and validated and tested on 2000 images each. The custom CNN model was built with a simpler architecture of two convolutional layers and, hence, lagged in accuracy due to its limited feature extraction capabilities compared with deeper networks. This research work also included the development of a user-friendly web interface that allows deepfake detection by uploading images. The web interface backend was developed using Flask, enabling real-time deepfake detection, allowing users to upload images for analysis and demonstrating a practical use for platforms in need of quick, user-friendly verification. This application demonstrates significant potential for practical applications, such as on social media platforms, where the model can help prevent the spread of fake content by flagging suspicious images for review. This study makes important contributions by comparing different deep learning models, including a custom CNN, to understand the balance between model complexity and accuracy in deepfake detection. It also identifies the best dataset setup that improves detection while keeping computational costs low. Additionally, it introduces a user-friendly web tool that allows real-time deepfake detection, making the research useful for social media moderation, security, and content verification. Nevertheless, identifying specific features of GAN-generated deepfakes remains challenging due to their high realism. Future works will aim to expand the dataset by using all 140,000 images, refine the custom CNN model to increase its accuracy, and incorporate more advanced techniques, such as Vision Transformers and diffusion models. The outcomes of this study contribute to the ongoing efforts to counteract the negative impacts of GAN-generated images. Full article
Figures:
Figure 1: Example of a GAN [12].
Figure 2: Fake images generated by StyleGAN from [10].
Figure 3: Examples of face images in the 140K real and fake face dataset [9].
Figure 4: Cropping operation on an image.
Figure 5: Detailed architecture of the system.
Figure 6: Website interacting with the server.
Figure 7: Web interface of the application showing a correct prediction.
Figure 8: Flowchart showing the image prediction process for the user.
Figure 9: Graph for the analysis of the results of gan_8_1_1.
Figure 10: Bar chart comparing the accuracy of all models.
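The Flask backend described above boils down to an upload endpoint that runs the classifier and returns a label; the sketch below assumes a saved MobileNet-based Keras model with a sigmoid output and a 0.5 decision threshold, all of which are illustrative.

```python
# Sketch: Flask endpoint for real/fake face prediction (model path, input size,
# and decision threshold are illustrative assumptions).
import io
import numpy as np
from PIL import Image
from flask import Flask, request, jsonify
import tensorflow as tf

app = Flask(__name__)
model = tf.keras.models.load_model("mobilenet_deepfake.h5")   # hypothetical file

@app.route("/predict", methods=["POST"])
def predict():
    file = request.files.get("image")
    if file is None:
        return jsonify({"error": "no image uploaded"}), 400
    img = Image.open(io.BytesIO(file.read())).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype="float32")[None, ...] / 255.0
    prob_fake = float(model.predict(x, verbose=0)[0][0])       # sigmoid output assumed
    return jsonify({"label": "fake" if prob_fake >= 0.5 else "real",
                    "confidence": prob_fake})

if __name__ == "__main__":
    app.run(debug=False)
```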
17 pages, 1944 KiB  
Article
Pediatric Pneumonia Recognition Using an Improved DenseNet201 Model with Multi-Scale Convolutions and Mish Activation Function
by Petra Radočaj, Dorijan Radočaj and Goran Martinović
Algorithms 2025, 18(2), 98; https://doi.org/10.3390/a18020098 - 10 Feb 2025
Abstract
Pediatric pneumonia remains a significant global health issue, particularly in low- and middle-income countries, where it contributes substantially to mortality in children under five. This study introduces a deep learning model for pediatric pneumonia diagnosis from chest X-rays that surpasses the performance of state-of-the-art methods reported in the recent literature. Using a DenseNet201 architecture with a Mish activation function and multi-scale convolutions, the model was trained on a dataset of 5856 chest X-ray images, achieving high performance: 0.9642 accuracy, 0.9580 precision, 0.9506 sensitivity, 0.9542 F1 score, and 0.9507 specificity. These results demonstrate a significant advancement in diagnostic precision and efficiency within this domain. By achieving the highest accuracy and F1 score compared to other recent work using the same dataset, our approach offers a tangible improvement for resource-constrained environments where access to specialists and sophisticated equipment is limited. While the need for high-quality datasets and adequate computational resources remains a general consideration for deep learning applications, our model’s demonstrably superior performance establishes a new benchmark and offers the delivery of more timely and precise diagnoses, with the potential to significantly enhance patient outcomes. Full article
(This article belongs to the Special Issue Machine Learning in Medical Signal and Image Processing (3rd Edition))
Figures:
Figure 1: Study workflow for pneumonia recognition using a DenseNet architecture, Mish activation function, and multi-scale convolutions: (1) input data preparation for the healthy and pneumonia image classes; (2) implementation of the proposed deep learning model; (3) evaluation of the model's pneumonia-recognition performance, including accuracy assessment.
Figure 2: Samples of the pneumonia and healthy chest X-ray images used in this study.
Figure 3: Confusion matrix for the proposed methodology (counts with class-specific percentages).
Figure 4: Grad-CAM heatmaps demonstrating model focus in pediatric pneumonia diagnosis.
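The multi-scale convolutions and Mish activation described above can be sketched roughly as a parallel-branch head on top of a DenseNet201 feature extractor; the kernel sizes, channel counts, and fusion scheme are assumptions, not the authors' exact design.

```python
# Sketch: multi-scale convolution block with Mish activation on top of
# DenseNet201 features (kernel sizes and channel counts are assumptions).
import torch
import torch.nn as nn
from torchvision import models

class MultiScaleMishHead(nn.Module):
    def __init__(self, in_channels: int = 1920, num_classes: int = 2):
        super().__init__()
        # Parallel branches with different receptive fields, each followed by Mish.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels, 256, kernel_size=k, padding=k // 2),
                          nn.BatchNorm2d(256), nn.Mish())
            for k in (1, 3, 5)
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(256 * 3, num_classes)

    def forward(self, feats):
        multi_scale = torch.cat([b(feats) for b in self.branches], dim=1)
        return self.classifier(self.pool(multi_scale).flatten(1))

backbone = models.densenet201(weights="IMAGENET1K_V1").features   # outputs 1920 channels
model = nn.Sequential(backbone, MultiScaleMishHead())
logits = model(torch.randn(2, 3, 224, 224))   # e.g., X-rays replicated to 3 channels
print(logits.shape)                           # torch.Size([2, 2])
```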