Search Results (519)

Search Parameters:
Keywords = automated annotation

13 pages, 2648 KiB  
Article
Comparative Study of Deep Transfer Learning Models for Semantic Segmentation of Human Mesenchymal Stem Cell Micrographs
by Maksim Solopov, Elizaveta Chechekhina, Anna Kavelina, Gulnara Akopian, Viktor Turchin, Andrey Popandopulo, Dmitry Filimonov and Roman Ishchenko
Int. J. Mol. Sci. 2025, 26(5), 2338; https://doi.org/10.3390/ijms26052338 - 6 Mar 2025
Viewed by 151
Abstract
The aim of this study is to conduct a comparative assessment of the effectiveness of neural network models—U-Net, DeepLabV3+, SegNet and Mask R-CNN—for the semantic segmentation of micrographs of human mesenchymal stem cells (MSCs). A dataset of 320 cell micrographs annotated by cell biology experts was created. The models were trained using a transfer learning method based on ImageNet pre-trained weights. As a result, the U-Net model demonstrated the best segmentation accuracy according to the metrics of the Dice coefficient (0.876) and the Jaccard index (0.781). The DeepLabV3+ and Mask R-CNN models also showed high performance, although slightly lower than U-Net, while SegNet exhibited the least accurate results. The obtained data indicate that the U-Net model is the most suitable for automating the segmentation of MSC micrographs and can be recommended for use in biomedical laboratories to streamline the routine analysis of cell cultures. Full article
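The Dice coefficient and Jaccard index used above are standard overlap metrics for binary segmentation masks. As a reference point only (a minimal NumPy sketch, not the authors' code), they can be computed as follows:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def jaccard_index(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Jaccard (IoU) = |A∩B| / |A∪B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

# Toy example: a predicted mask that covers most of a ground-truth cell region.
gt = np.zeros((64, 64), dtype=np.uint8)
gt[16:48, 16:48] = 1
pred = np.zeros_like(gt)
pred[18:48, 16:46] = 1
print(dice_coefficient(pred, gt), jaccard_index(pred, gt))
```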
Show Figures

Figure 1: Training graphs of neural network models for segmenting micrographs of mesenchymal stem cells (MSCs). Dynamics of changes in pixel accuracy (PA) and loss function for the investigated models on training and validation samples of micrographs during training: (a) U-Net, (b) DeepLabV3+, (c) SegNet, and (d) Mask R-CNN.
Figure 2: Optimal prediction thresholds for U-Net, DeepLabV3+, SegNet, and Mask R-CNN segmentation models (from top to bottom) according to the Dice coefficient (DC), Jaccard index (JI) and PA metrics (from left to right). The optimal thresholds are defined as the maximum values of the functional dependencies of the metric on the threshold value. To plot the dependencies, the average value of each metric was calculated over 64 images from the validation sample at a given value of the varying threshold. The graphs show the mean values (blue line) with standard deviations (highlighted in gray).
Figure 3: Comparison of the performance of segmentation models based on DC (a), JI (b), and PA (c) metrics. The charts show the distribution of metric values for each model. * p < 0.05; ** p < 0.01; **** p < 0.0001; ns—differences are not significant.
Figure 4: Examples of segmentation of MSC micrographs by neural network models: original images, ground truth masks, and masks predicted by U-Net, DeepLabV3+, SegNet, and Mask R-CNN models. The micrographs were captured at a magnification of 40×.
7 pages, 7488 KiB  
Proceeding Paper
Enhancing Fabric Detection and Classification Using YOLOv5 Models
by Makara Mao, Jun Ma, Ahyoung Lee and Min Hong
Eng. Proc. 2025, 89(1), 33; https://doi.org/10.3390/engproc2025089033 - 3 Mar 2025
Viewed by 80
Abstract
The YOLO series is widely recognized for its efficiency in the real-time detection of objects within images and videos. Accurately identifying and classifying fabric types in the textile industry is vital to ensuring quality, managing supply, and increasing customer satisfaction. We developed a method for fabric type classification and object detection using the YOLOv5 architecture. The model was trained on a diverse dataset containing images of different fabrics, including cotton, hanbok, dyed cotton yarn, and a plain cotton blend. Dataset preparation included data collection, annotation, and data augmentation during training to improve model generalization. The model’s performance was evaluated using precision, recall, and F1-score. The developed model detected and classified fabrics with an accuracy of 81.08%. YOLOv5s ran faster than the other model variants. The model can be used for automated quality control, inventory tracking, and retail analytics. The deep learning-based object detection method with YOLOv5 addresses challenges related to fabric classification, improving the capability and productivity of manufacturing and operations. Full article
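Precision, recall, and F1-score, used above to evaluate the fabric classifier, can be computed per class from predicted and true labels. A small illustrative sketch with scikit-learn (the labels below are hypothetical, not the paper's data):

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels for four fabric classes; not the paper's dataset.
classes = ["cotton", "hanbok", "dyed_yarn", "cotton_blend"]
y_true = ["cotton", "cotton", "hanbok", "dyed_yarn", "cotton_blend", "hanbok"]
y_pred = ["cotton", "hanbok", "hanbok", "dyed_yarn", "cotton_blend", "hanbok"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=classes, average=None, zero_division=0
)
for cls, p, r, f in zip(classes, precision, recall, f1):
    print(f"{cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```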
Show Figures

Figure 1: Implementation process for YOLOv5 model.
Figure 2: Architecture and processes of YOLOv5.
Figure 3: Images of samples of each category of fabric.
Figure 4: YOLOv5 labeling.
Figure 5: Results from testing the types of fabrics.
18 pages, 3451 KiB  
Article
Integrating Neural Networks for Automated Video Analysis of Traffic Flow Routing and Composition at Intersections
by Maros Jakubec, Michal Cingel, Eva Lieskovská and Marek Drliciak
Sustainability 2025, 17(5), 2150; https://doi.org/10.3390/su17052150 - 2 Mar 2025
Viewed by 302
Abstract
Traffic flow at intersections is influenced by spatial design, control methods, technical equipment, and traffic volume. This article focuses on detecting traffic flows at intersections using video recordings, employing a YOLO-based framework for automated analysis. We compare manual evaluation with machine processing to demonstrate the efficiency improvements in traffic engineering tasks through automated traffic data analysis. The output data include traditionally immeasurable parameters, such as speed and vehicle gaps within the observed intersection area. The traffic analysis incorporates findings from monitoring groups of vehicles, focusing on their formation and speed as they traverse the intersection. Our proposed system for monitoring and classifying traffic flow was implemented at a selected intersection in the city of Zilina, Slovak Republic, as part of a pilot study for this research initiative. Based on evaluations using local data, the YOLOv9c detection model achieved an mAP50 of 98.2% for vehicle localization and classification across three basic classes: passenger cars, trucks, and buses. Despite the high detection accuracy of the model, the automated annotations for vehicle entry and exit at the intersection showed varying levels of accuracy compared to manual evaluation. On average, the mean absolute error between annotations by traffic specialists and the automated framework for the most frequent class, passenger cars, was 2.73 across all directions at 15 min intervals. This indicates that approximately three passenger cars per 15 min interval were either undetected or misclassified. Full article
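The reported mean absolute error of 2.73 passenger cars per 15-minute interval compares manual counts against the automated framework's counts. A toy sketch of that comparison (made-up counts, not the study's data):

```python
import numpy as np

# Hypothetical 15-minute passenger-car counts for one approach direction.
manual_counts    = np.array([120, 98, 134, 110, 87, 142])
automated_counts = np.array([118, 101, 130, 112, 85, 139])

mae = np.mean(np.abs(manual_counts - automated_counts))
print(f"Mean absolute error per 15-min interval: {mae:.2f} vehicles")
```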
Show Figures

Figure 1: The structure of the YOLOv9 network.
Figure 2: Map of the intersection selected for the pilot analysis and a traffic diagram of the intersection, where (A–C) indicate the directions from which vehicles approach the intersection.
Figure 3: Video recording setup at the intersection: (a) camera on a tripod placed indoors; (b) close-up of the camera setup; (c) camera display showing the live view of the monitored intersection.
Figure 4: Proposed framework for evaluating traffic flow routing.
Figure 5: Comparison of vehicle types in MS COCO [31] and those commonly found on Slovak roads.
Figure 6: Examples of vehicle detection performed using YOLOv9c.
Figure 7: Example of evaluating the formation of vehicle groups on the Pod Hájom street profile.
30 pages, 10408 KiB  
Article
Weed Detection in Lily Fields Using YOLOv7 Optimized by Chaotic Harris Hawks Algorithm for Underground Resource Competition
by Junjie Tang, Huafei Wang, Mingyue Zhao, Ping Sun, Yutao Hao and Zhiyuan Zhu
Symmetry 2025, 17(3), 370; https://doi.org/10.3390/sym17030370 - 28 Feb 2025
Viewed by 168
Abstract
Lilies, a key cash crop in Lanzhou, China, widely planted in coal-based fields, cultivated fields, and gardens, face significant yield and quality reduction due to weed infestation, which competes for essential nutrients, water, and light. To address this challenge, we propose an advanced weed detection method that combines symmetry-based convolutional neural networks with metaheuristic optimization. A dedicated weed detection dataset is constructed through extensive field investigation, data collection, and annotation. To enhance detection efficiency, we introduce an optimized YOLOv7-Tiny model, integrating dynamic pruning and knowledge distillation, which reduces computational complexity while maintaining high accuracy. Additionally, a novel Chaotic Harris Hawks Optimization (CHHO) algorithm, incorporating chaotic mapping initialization and differential evolution, is developed to fine-tune YOLOv7-Tiny parameters and activation functions. Experimental results demonstrate that the optimized YOLOv7-Tiny achieves a detection accuracy of 92.53% outperforming traditional models while maintaining efficiency. This study provides a high-performance, lightweight, and scalable solution for real-time precision weed management in lily fields, offering valuable insights for agricultural automation and smart farming applications. Full article
(This article belongs to the Special Issue Symmetry in Nonlinear Dynamics and Chaos II)
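The CHHO variant described above seeds Harris Hawks Optimization with a chaotic map (Figure 6 of the paper shows a Tent-map curve). The sketch below only illustrates Tent-map population initialization, one plausible reading of that step; the map parameter, seed, and bounds are assumptions, not the paper's settings:

```python
import numpy as np

def tent_map_population(n_hawks: int, dim: int, lower: float, upper: float,
                        mu: float = 0.7, seed: float = 0.37) -> np.ndarray:
    """Initialize a candidate population with the chaotic Tent map instead of uniform noise."""
    pop = np.empty((n_hawks, dim))
    x = seed
    for i in range(n_hawks):
        for j in range(dim):
            # Tent map: x -> x/mu if x < mu else (1 - x)/(1 - mu); values stay in (0, 1).
            x = x / mu if x < mu else (1.0 - x) / (1.0 - mu)
            pop[i, j] = lower + x * (upper - lower)
    return pop

# Example: 20 candidate hyper-parameter vectors in [0, 1]^4.
print(tent_map_population(20, 4, 0.0, 1.0)[:3])
```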
Show Figures

Figure 1: Samples of lily-weed images. (a) Lily. (b) Aotou amaranth. (c) Teloxys aristata. (d) Amaranthus revolexus. (e) Dysphania schraderiana. (f) Lactuca indica. (g) Convolvulus arvensis.
Figure 2: YOLOv7 network model structure diagram: (a) YOLOv7 network architecture backbone network. (b) The specific structure of each network module in YOLOv7.
Figure 3: Knowledge distillation diagram.
Figure 4: Parameter optimization of the YOLOv7-tiny model by CHHO.
Figure 5: Weed marking diagram.
Figure 6: Chaotic curve of the Tent system.
Figure 7: Performance histogram.
Figure 8: Chaotic scatter plot.
Figure 9: Convergence curves of some unimodal, multimodal, and fixed-dimensional functions: (a) F15, (b) F1, (c) F3, (d) F10, (e) F12, and (f) F21.
Figure 10: Results of weed detection in lily fields under different shielding conditions. (a) Cover of lily by weeds. (b) Slight occlusion. (c) Complete occlusion.
Figure 11: Comparison of recognition accuracy of the lightweight YOLOv7-tiny network model under different dataset sizes.
Figure 12: Comparison between the improved model and the original model. (a) YOLOv7-tiny. (b) Our model.
Figure 13: P and P-R change curves during the training process of the improved model (panels (a) and (b)).
Full article ">
19 pages, 1675 KiB  
Article
A Method for Polyp Segmentation Through U-Net Network
by Antonella Santone, Mario Cesarelli and Francesco Mercaldo
Bioengineering 2025, 12(3), 236; https://doi.org/10.3390/bioengineering12030236 - 26 Feb 2025
Viewed by 139
Abstract
Early detection of colorectal polyps through endoscopic colonoscopy is crucial in reducing colorectal cancer mortality. While automated polyp segmentation has been explored to enhance detection accuracy and efficiency, challenges remain in achieving precise boundary delineation, particularly for small or flat polyps. In this work, we propose a novel U-Net-based segmentation framework specifically optimized for real-world endoscopic colonoscopy data. Unlike conventional approaches, our method leverages high-resolution frames with pixel-level ground-truth annotations to achieve superior segmentation performance. The U-Net architecture, with its symmetric encoder-decoder design and skip connections, is further adapted to enhance both high-level contextual understanding and fine-grained detail preservation. Our model has been rigorously evaluated on a real-world dataset, demonstrating state-of-the-art accuracy in polyp boundary segmentation, even in challenging cases. By improving detection consistency and reducing observer variability, our approach provides a robust tool to support gastroenterologists in clinical decision-making. Beyond real-time clinical applications, this work contributes to advancing automated and standardized polyp detection, paving the way for more reliable AI-assisted endoscopic analysis. Full article
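The U-Net encoder-decoder with skip connections described above pairs low-resolution context with high-resolution detail. A minimal one-level PyTorch sketch of that pattern (an illustration of the architecture, not the authors' model or hyper-parameters):

```python
import torch
import torch.nn as nn

def double_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 conv + ReLU blocks, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: encoder, bottleneck, decoder joined by a skip connection."""
    def __init__(self, in_ch: int = 3, n_classes: int = 1):
        super().__init__()
        self.enc = double_conv(in_ch, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec = double_conv(64, 32)          # 64 = 32 (upsampled) + 32 (skip)
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)

    def forward(self, x):
        e = self.enc(x)                          # high-resolution features
        b = self.bottleneck(self.pool(e))        # low-resolution context
        d = self.up(b)
        d = self.dec(torch.cat([d, e], dim=1))   # skip connection restores detail
        return self.head(d)

# Sanity check on a dummy frame.
print(TinyUNet()(torch.randn(1, 3, 128, 128)).shape)  # -> torch.Size([1, 1, 128, 128])
```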
Show Figures

Figure 1: The main steps of the proposed method.
Figure 2: A comparison of the outputs of the proposed U-Net model trained for colon polyp segmentation at different training epochs (1, 25, 50, 75, and 100). The left image is the original image, the middle image shows the original mask, and the right image shows the mask predicted by the proposed U-Net model.
Figure 3: The training and validation loss of the proposed U-Net model.
Figure 4: Several prediction examples from the proposed U-Net model, with the related Grad-CAM visualizations generated for explanation.
20 pages, 3550 KiB  
Article
Ambiance Preservation Augmenting for Semantic Segmentation of Pediatric Burn Skin Lesions
by Laura Florea, Corneliu Florea, Constantin Vertan and Silviu Bădoiu
Mathematics 2025, 13(5), 758; https://doi.org/10.3390/math13050758 - 25 Feb 2025
Viewed by 238
Abstract
Burn injuries pose a significant threat to human life, with high morbidity and mortality rates. Accurate diagnosis, including the assessment of burn area and depth, is essential for effective treatment and can sometimes be lifesaving. However, access to specialized medical professionals is often limited, particularly in remote or underserved regions. To address this challenge and alleviate the burden on healthcare providers, researchers are investigating automated diagnostic tools. The severity of the burn and the affected body surface area are critical factors in diagnosis. From a computer vision perspective, this requires semantic segmentation of burn images to assess the affected area and determine burn severity. In collaboration with medical personnel, we have gathered a dataset of in situ images from a local children’s hospital annotated by specialist burn surgeons. However, due to the limited amount of data, we propose a two-step augmentation approach: training with synthetic burn images and controlling the encoder by ambiance preservation. The latter is a technique that forces the encoder to represent closely the embeddings of images that are similar and is a key contribution of this paper. The method is evaluated on the BAMSI database, demonstrating that the proposed augmentations lead to better performance compared with strong baselines and other potential algorithmic improvements. Full article
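The "ambiance preservation" idea above constrains the encoder so that embeddings of similar images stay close. The paper's exact formulation (weighted by an adaptive Dice coefficient, per its Figure 6) is not reproduced here; the following PyTorch sketch only illustrates the general pattern of penalizing embedding distance between image pairs judged similar, with the pair weights treated as given inputs:

```python
import torch

def ambiance_style_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                        similarity_w: torch.Tensor) -> torch.Tensor:
    """Pull embeddings of similar image pairs together.

    emb_a, emb_b: (batch, dim) encoder embeddings of paired images.
    similarity_w: (batch,) weights in [0, 1]; in the paper these come from an
    adaptive Dice overlap of the burn masks (treated here as a given input).
    """
    dist = torch.norm(emb_a - emb_b, dim=1)       # Euclidean distance per pair
    return (similarity_w * dist ** 2).mean()

# Toy usage with random embeddings and weights.
a, b = torch.randn(8, 256), torch.randn(8, 256)
w = torch.rand(8)
print(ambiance_style_loss(a, b, w))
```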
Show Figures

Figure 1: Image examples from the used databases: (a,b) pediatric burns from the BAMSI database, (c) Kaggle burns, and (d,e) healthy skin images (Abdominal Skin Dataset and hand gesture recognition, respectively).
Figure 2: The method schematic. The proposed method assumes training of a deep model (G-Cascade) using, sequentially, three types of data (unlabeled, synthetic, and labeled). The trained model is further used to assist medical personnel by predicting the burned area and the degree of burn in images with burns.
Figure 3: The Pyramid Vision Transformer (PVT) [32], which is used as the encoder. The model contains four stages, each made from a patch projection layer and an encoder transformer.
Figure 4: The G-Cascade model [29] using a PVT encoder [32]. The embeddings are captured at the exit of the encoder and concatenated to form the feature descriptor e.
Figure 5: The process used to create synthetic burn skin images. First, given healthy skin images with annotations, the inscribed box is computed. Then, a random burn image and a random healthy skin image are selected and combined to create a synthetic burned skin image. The resulting image has a segmentation label map generated from the location where the burn was overlaid.
Figure 6: Computation of the adaptive Dice coefficient, used as a weighting factor when computing the ambiance preservation loss. The background label (marked with black) is not taken into account; only the burn classes are considered. The Dice coefficient uses the ratio of intersection over union for each class, marked for two examples: in the upper part, a pair of synthetic burns considered to have burn grade IIA (class 2); in the lower part, real burn images, which may contain all classes. In this example, both contain class 2, so the term for class 2 is non-zero, while only one contains class 3 (grade IIB), so the intersection is null.
Figure 7: The training process is split into three stages: (pre)training in a supervised manner with synthetic burns, partially supervised training using the ambiance preservation technique over the encoder embeddings of all images, and supervised training on the annotated images only.
Figure 8: Examples of images with pediatric skin burns (original images, first column), ground truth (second column), and predictions with the solution from [23] (third column) and the proposed solution (fourth column). Annotations and predictions in this example are color-coded as pink (grade IIA), magenta (grade III), and cyan (grade IV).
16 pages, 86590 KiB  
Article
Automated Detection of Araucaria angustifolia (Bertol.) Kuntze in Urban Areas Using Google Earth Images and YOLOv7x
by Mauro Alessandro Karasinski, Ramon de Sousa Leite, Emmanoella Costa Guaraná, Evandro Orfanó Figueiredo, Eben North Broadbent, Carlos Alberto Silva, Erica Kerolaine Mendonça dos Santos, Carlos Roberto Sanquetta and Ana Paula Dalla Corte
Remote Sens. 2025, 17(5), 809; https://doi.org/10.3390/rs17050809 - 25 Feb 2025
Viewed by 394
Abstract
This study addresses the urgent need for effective methods to monitor and conserve Araucaria angustifolia, a critically endangered species of immense ecological and cultural significance in southern Brazil. Using high-resolution satellite images from Google Earth, we apply the YOLOv7x deep learning model to detect this species in two distinct urban contexts in Curitiba, Paraná: isolated trees across the urban landscape and A. angustifolia individuals within forest remnants. Data augmentation techniques, including image rotation, hue and saturation adjustments, and mosaic augmentation, were employed to increase the model’s accuracy and robustness. Through a 5-fold cross-validation, the model achieved a mean Average Precision (AP) of 90.79% and an F1-score of 88.68%. Results show higher detection accuracy in forest remnants, where the homogeneous background of natural landscapes facilitated the identification of trees, compared to urban areas where complex visual elements like building shadows presented challenges. To reduce false positives, especially misclassifications involving palm species, additional annotations were introduced, significantly enhancing performance in urban environments. These findings highlight the potential of integrating remote sensing with deep learning to automate large-scale forest inventories. Furthermore, the study highlights the broader applicability of the YOLOv7x model for urban forestry planning, offering a cost-effective solution for biodiversity monitoring. The integration of predictive data with urban forest maps reveals a spatial correlation between A. angustifolia density and the presence of forest fragments, suggesting that the preservation of these areas is vital for the species’ sustainability. The model’s scalability also opens the door for future applications in ecological monitoring across larger urban areas. As urban environments continue to expand, understanding and conserving key species like A. angustifolia is critical for enhancing biodiversity, resilience, and addressing climate change. Full article
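The 5-fold cross-validation used to report the mean AP and F1-score above partitions the annotated image set so every image is held out exactly once. A minimal sketch of that split with scikit-learn (the file names are placeholders, not the study's tiles):

```python
from sklearn.model_selection import KFold

# Placeholder image identifiers; in practice these would be the annotated tiles.
images = [f"curitiba_tile_{i:04d}.png" for i in range(100)]

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(images), start=1):
    print(f"fold {fold}: {len(train_idx)} training tiles, {len(val_idx)} validation tiles")
    # Train the detector on images[train_idx], evaluate AP / F1 on images[val_idx].
```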
Show Figures

Figure 1: Location of the study area in the city of Curitiba, Paraná, Brazil. The highlighted neighborhoods (Batel, Centro, Jardim Botânico, Jardim das Américas, Rebouças, and Santa Felicidade) were used to train and test the YOLOv7x model. The gray area indicates regions where the available images did not have the same quality as the others and, therefore, were not included in the study.
Figure 2: Components of a bounding box. (bx, by) represent the X and Y coordinates of the center of the bounding box; w represents the width and h the height of the bounding box.
Figure 3: Learning curve performance of YOLOv7x in the detection of A. angustifolia in the city of Curitiba, Paraná, Brazil.
Figure 4: Frequency distribution of individuals classified as forest and isolated individuals.
Figure 5: Overview of A. angustifolia distribution by YOLOv7x in Curitiba, Paraná. (a) Forest areas. (b) Kernel density map (trees/ha). (c) Predicted trees. (d) Uncertainty distribution for predicted trees.
Figure 6: Examples of prediction results: (a) detection in the context of isolated trees; (b) detection in forest fragments; (c) a false negative caused by building shadows; (d) a false positive due to confusion with palm trees; (e) a false positive caused by confusion with the shadow projection of an A. angustifolia.
11 pages, 375 KiB  
Article
High-Performance Genome Annotation for a Safer and Faster-Developing Phage Therapy
by Antoine Culot, Guillaume Abriat and Kieran P. Furlong
Viruses 2025, 17(3), 314; https://doi.org/10.3390/v17030314 - 25 Feb 2025
Viewed by 273
Abstract
Phage therapy, which uses phages to decrease bacterial load in an ecosystem, introduces a multitude of gene copies (bacterial and phage) into said ecosystem. While it is widely accepted that phages have a significant impact on ecology, the mechanisms underlying their impact are not well understood. It is therefore paramount to understand what is released into said ecosystem, to avoid alterations with difficult-to-predict—but potentially huge—consequences. An in-depth annotation of therapeutic phage genomes is therefore essential. Currently, the average published phage genome has only 20–30% functionally annotated genes, which represents a hurdle to overcome to deliver safe phage therapy, for both patients and the environment. This study aims to compare the effectiveness of manual versus automated phage genome annotation methods. Twenty-seven phage genomes were annotated using the SEA-PHAGES and Rime Bioinformatics protocols. The structural (gene calling) and functional annotation results were compared. The results suggest that during the structural annotation step, the SEA-PHAGES method was able to identify an average of 1.5 more genes per phage (typically a frameshift gene) and 5.3 gene start sites per phage. Despite this difference, the impact on functional annotation appeared to be limited: on average, 1.2 genes per phage had erroneous functions caused by the structural annotation. Rime Bioinformatics’ tool (rTOOLS, v2) performed better at assigning functions, especially where the SEA-PHAGES method assigned hypothetical proteins: 7.0 genes per phage had a better functional annotation on average, compared to 1.7 for SEA-PHAGES. The method comparison detailed in this article indicates that (1) manual structural annotation is marginally superior to rTOOLS automated structural annotation; (2) rTOOLS automated functional annotation is superior to manual functional annotation. Previously, the only way to obtain a high-quality annotation was by using manual protocols, such as SEA-PHAGES. In the relatively new field of phage therapy, which requires support to advance, manual work can be problematic due to its high cost. Rime Bioinformatics’ rTOOLS software allows for time and money to be saved by providing high-quality genome annotations that are comparable to manual results, enabling a safer and faster-developing phage therapy. Full article
(This article belongs to the Collection Phage Therapy)
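The structural-annotation comparison above counts, per phage, genes and start sites that differ between two gene-calling methods. A simplified sketch of that bookkeeping (toy coordinates, not the study's genomes; matching genes by shared stop and strand is an assumption for illustration):

```python
# Each gene call is (start, stop, strand); toy values for one phage genome.
manual_calls = {(100, 450, "+"), (500, 950, "+"), (1000, 1600, "-")}
auto_calls   = {(120, 450, "+"), (500, 950, "+")}

def key(call):
    # Treat calls with the same stop coordinate and strand as the same gene.
    return (call[1], call[2])

manual_by_key = {key(c): c for c in manual_calls}
auto_by_key = {key(c): c for c in auto_calls}

missed_genes = set(manual_by_key) - set(auto_by_key)
different_starts = [k for k in manual_by_key.keys() & auto_by_key.keys()
                    if manual_by_key[k][0] != auto_by_key[k][0]]

print(f"genes missed by automated calling: {len(missed_genes)}")
print(f"shared genes with a different start site: {len(different_starts)}")
```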
Show Figures

Figure 1: Impact of bioinformatics research quality on science and the phage therapy field. (A) A poorly annotated gene is published in an online database and re-used by the next researcher: a vicious cycle of database poisoning starts. (B) A poor bioinformatics analysis is used to assess a phage’s safety for phage therapy. Phages with a phage-therapy-incompatible gene content pass the bioinformatics screening and cause budget over-runs, delays, or project failure. This is reported to funders and regulators, who stop investing in the phage therapy field. In time, fewer funds are available for the next projects, which increases failure rates.
Figure 2: The RimeTOOLS pipeline for phage genome annotation.
24 pages, 5275 KiB  
Article
Force Map-Enhanced Segmentation of a Lightweight Model for the Early Detection of Cervical Cancer
by Sabina Umirzakova, Shakhnoza Muksimova, Jushkin Baltayev and Young Im Cho
Diagnostics 2025, 15(5), 513; https://doi.org/10.3390/diagnostics15050513 - 20 Feb 2025
Viewed by 171
Abstract
Background/Objectives: Accurate and efficient segmentation of cervical cells is crucial for the early detection of cervical cancer, enabling timely intervention and treatment. Existing segmentation models face challenges with complex cellular arrangements, such as overlapping cells and indistinct boundaries, and are often computationally intensive, which limits their deployment in resource-constrained settings. Methods: In this study, we introduce a lightweight and efficient segmentation model specifically designed for cervical cell analysis. The model employs a MobileNetV2 architecture for feature extraction, ensuring a minimal parameter count conducive to real-time processing. To enhance boundary delineation, we propose a novel force map approach that drives pixel adjustments inward toward the centers of cells, thus improving cell separation in densely packed areas. Additionally, we integrate extreme point supervision to refine segmentation outcomes using minimal boundary annotations, rather than full pixel-wise labels. Results: Our model was rigorously trained and evaluated on a comprehensive dataset of cervical cell images. It achieved a Dice Coefficient of 0.87 and a Boundary F1 Score of 0.84, performances that are comparable to those of advanced models but with considerably lower inference times. The optimized model operates at approximately 50 frames per second on standard low-power hardware. Conclusions: By effectively balancing segmentation accuracy with computational efficiency, our model addresses critical barriers to the widespread adoption of automated cervical cell segmentation tools. Its ability to perform in real time on low-cost devices makes it an ideal candidate for clinical applications and deployment in low-resource environments. This advancement holds significant potential for enhancing access to cervical cancer screening and diagnostics worldwide, thereby supporting broader healthcare initiatives. Full article
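The force map described above drives boundary pixels inward toward cell centers. One plausible way to illustrate that idea is a vector field derived from the distance transform of a cell mask; the sketch below is a conceptual illustration only, not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy binary mask with two touching "cells".
mask = np.zeros((64, 64), dtype=np.uint8)
mask[10:30, 10:30] = 1
mask[28:50, 28:50] = 1

# Distance to the background grows toward each cell's interior.
dist = distance_transform_edt(mask)

# The gradient of the distance map points from boundaries toward cell centers,
# giving an inward "force" vector at every foreground pixel.
fy, fx = np.gradient(dist)
force = np.stack([fx, fy], axis=-1)
print(force.shape, np.abs(force[mask == 1]).mean())
```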
Show Figures

Figure 1: Improved contextual integration and feature extraction with MobileNetV2 for accurate cell segmentation.
Figure 2: Description of the SipakMed dataset.
Figure 3: Results of image segmentation using the proposed model.
Figure 4: Performance curves for model robustness and overfitting prevention.
26 pages, 4229 KiB  
Article
YOLOSAMIC: A Hybrid Approach to Skin Cancer Segmentation with the Segment Anything Model and YOLOv8
by Sevda Gül, Gökçen Cetinel, Bekir Murat Aydin, Devrim Akgün and Rabia Öztaş Kara
Diagnostics 2025, 15(4), 479; https://doi.org/10.3390/diagnostics15040479 - 16 Feb 2025
Viewed by 341
Abstract
Background/Objective: The rising global incidence of skin cancer emphasizes the urgent need for reliable and accurate diagnostic tools to aid early intervention. This study introduces YOLOSAMIC (YOLO and SAM in Cancer Imaging), a fully automated segmentation framework that integrates YOLOv8 for lesion detection, and the Segment Anything Model (SAM)-Box for precise segmentation. The objective is to develop a reliable segmentation system that handles complex skin lesion characteristics without requiring manual intervention. Methods: A hybrid database comprising 3463 public and 765 private dermoscopy images was built to enhance model generalizability. YOLOv8 was employed to localize lesions through bounding box detection, while SAM-Box refined the segmentation process. The model was trained and evaluated under four scenarios to assess its robustness. Additionally, an ablation study examined the impact of grayscale conversion, image blur, and model pruning on segmentation performance. Results: YOLOSAMIC demonstrated high segmentation accuracy, achieving Dice and Jaccard scores of 0.9399 and 0.9112 on the public database and 0.8990 and 0.8445 on the hybrid dataset. Conclusions: The proposed YOLOSAMIC framework provides a robust, fully automated solution for skin lesion segmentation, eliminating the need for manual annotation. Integrating YOLOv8 and SAM-Box enhances segmentation precision, making it a valuable decision-support tool for dermatologists. Full article
(This article belongs to the Special Issue Artificial Intelligence in Clinical Medical Imaging: 2nd Edition)
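The YOLOSAMIC pipeline above chains a YOLO detector with SAM's box prompt. A rough sketch of that hand-off, assuming the publicly available ultralytics and segment_anything packages with their published interfaces; the weight paths, model size, and image file are placeholders, and this is not the authors' code:

```python
import cv2
import numpy as np
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("lesion.jpg"), cv2.COLOR_BGR2RGB)

# 1) Localize the lesion with a YOLO detector (weights path is a placeholder).
detector = YOLO("yolov8_lesion.pt")
boxes = detector(image)[0].boxes.xyxy.cpu().numpy()

# 2) Refine each detection into a mask with SAM's box prompt (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)

for box in boxes:
    masks, scores, _ = predictor.predict(box=box[None, :], multimask_output=False)
    print("lesion mask pixels:", int(masks[0].sum()), "score:", float(scores[0]))
```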
Show Figures

Figure 1: The database of the study.
Figure 2: YOLOv8 architecture.
Figure 3: SAM-based scheme of the proposed study.
Figure 4: The flowchart of the study.
Figure 5: The training performance of YOLOv8 on the public database.
Figure 6: The training performance of YOLOv8 on the hybrid database.
Figure 7: Segmentation results of challenging cases (blue and red lines show the predicted and ground-truth lesion boundaries, respectively).
Figure 8: SAM outputs for a challenging skin lesion image.
17 pages, 2395 KiB  
Article
Automated Dataset-Creation and Evaluation Pipeline for NER in Russian Literary Heritage
by Kenan Kassab, Nikolay Teslya and Ekaterina Vozhik
Appl. Sci. 2025, 15(4), 2072; https://doi.org/10.3390/app15042072 - 16 Feb 2025
Viewed by 386
Abstract
Developing robust and reliable models for Named Entity Recognition (NER) in the Russian language presents significant challenges due to the linguistic complexity of Russian and the limited availability of suitable training datasets. This study introduces a semi-automated methodology for building a customized Russian dataset for NER specifically designed for literary purposes. The paper provides a detailed description of the methodology employed for collecting and proofreading the dataset, outlining the pipeline used for processing and annotating its contents. A comprehensive analysis highlights the dataset’s richness and diversity. Central to the proposed approach is the use of a voting system to facilitate the efficient elicitation of entities, enabling significant time and cost savings compared to traditional methods of constructing NER datasets. The voting system is described theoretically and mathematically to highlight its impact on enhancing the annotation process. The results of testing the voting system with various thresholds show its impact in increasing the overall precision by 28% compared to using only the state-of-the-art model for auto-annotating. The dataset is meticulously annotated and thoroughly proofread, ensuring its value as a high-quality resource for training and evaluating NER models. Empirical evaluations using multiple NER models underscore the dataset’s importance and its potential to enhance the robustness and reliability of NER models in the Russian language. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)
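The voting system above keeps an entity only when enough annotator models agree on it; raising the threshold trades recall for precision. A small sketch of span-level voting (toy spans and hypothetical model outputs, not the paper's pipeline):

```python
from collections import Counter

# Each model proposes (start, end, label) spans for one sentence (toy values).
model_outputs = [
    {(0, 7, "PER"), (25, 31, "LOC")},
    {(0, 7, "PER"), (25, 31, "LOC"), (40, 48, "ORG")},
    {(0, 7, "PER")},
]

def vote(outputs, threshold: int):
    """Keep spans proposed by at least `threshold` of the models."""
    counts = Counter(span for spans in outputs for span in spans)
    return {span for span, n in counts.items() if n >= threshold}

print(vote(model_outputs, threshold=2))  # {(0, 7, 'PER'), (25, 31, 'LOC')}
```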
Show Figures

Figure 1: The pipeline for annotating the dataset.
Figure 2: The structure of the dataset.
Figure 3: The distribution of entity types within the dataset.
Figure 4: A visualized sample from the dataset.
Figure 5: The pipeline for the enhancement approach with the voting system.
Figure 6: The results of the voting system as the threshold is varied on the test set.
Figure 7: The total number of entities fixed by each system.
Figure 8: The percentages of missed and incorrect annotations using only DeepPavlov.
Figure 9: The percentages of missed and incorrect annotations using the enhancement approach.
17 pages, 766 KiB  
Article
Semi-Supervised Relation Extraction Corpus Construction and Models Creation for Under-Resourced Languages: A Use Case for Slovene
by Timotej Knez, Miha Štravs and Slavko Žitnik
Information 2025, 16(2), 143; https://doi.org/10.3390/info16020143 - 15 Feb 2025
Viewed by 208
Abstract
The goal of relation extraction is to recognize head and tail entities in a document and determine a relation between them. While a lot of progress was made in solving automated relation extraction in widely used languages such as English, the use of these methods for under-resourced languages and domains is limited due to the lack of training data. In this work, we present a pipeline using distant supervision for constructing a relation extraction corpus in an arbitrary language. The corpus construction combines Wikipedia documents in the target language with relations in the WikiData knowledge graph. We demonstrate the process by constructing a new corpus for relation extraction in the Slovene language. Our corpus captures 20 unique relation types. The final corpus contains 811,032 relations annotated in 244,437 sentences. We use the corpus to train models using three architectures and evaluate them on the task of Slovene relation extraction. We achieve comparable performance to approaches on English data. Full article
(This article belongs to the Special Issue Information Extraction and Language Discourse Processing)
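Distant supervision as used above labels a sentence with a relation whenever both entities of a known knowledge-graph triple appear in it. A toy sketch of that alignment step (illustrative strings and triples only, not the SloREL pipeline, which additionally handles Slovene inflection and entity linking):

```python
# Known triples from the knowledge graph: (head, relation, tail) — toy values.
kg_triples = [
    ("Ljubljana", "capital_of", "Slovenia"),
    ("France Preseren", "born_in", "Vrba"),
]

# Toy sentences, assumed to be already entity-normalized.
sentences = [
    "Ljubljana is the capital of Slovenia and its largest city.",
    "France Preseren was born in Vrba in 1800.",
]

def align(sentence, triples):
    """Label a sentence with every triple whose head and tail both occur in it."""
    return [(h, r, t) for h, r, t in triples if h in sentence and t in sentence]

for s in sentences:
    print(align(s, kg_triples))
```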
Show Figures

Figure 1: Overview of the semi-supervised pipeline for constructing the SloREL corpus.
Figure 2: Distribution of relations. P0 represents that there is no relation between entities. The description of each relation type is presented in Table A1 of Appendix A.
Figure 3: The architecture of the LSTM model. The model uses the LSTM representation of the main relation as well as the representations of all other relations in a sentence to make the final prediction.
Figure 4: The architecture of the model using a pretrained language model.
Figure 5: Part of the RECON network responsible for generating entity embeddings based on the relations in the knowledge graph. The inputs to the module are all relations in a knowledge graph where the target entity is the head of the relation.
Figure 6: The trade-off between precision and recall for each of the four models on the Wikipedia dataset. The models using pretrained language models (orange and green) performed worse than the LSTM and RECON models (red and blue).
Figure 7: The trade-off between precision and recall for each of the four models. The models were trained on the SloREL dataset and tested on the 24ur.com dataset. Since we have no information about the number of all relations, we measure the recall of the model in terms of the number of correctly predicted relations.
15 pages, 27507 KiB  
Article
Detection of Flexible Pavement Surface Cracks in Coastal Regions Using Deep Learning and 2D/3D Images
by Carlos Sanchez, Feng Wang, Yongsheng Bai and Haitao Gong
Sensors 2025, 25(4), 1145; https://doi.org/10.3390/s25041145 - 13 Feb 2025
Viewed by 395
Abstract
Pavement surface distresses are analyzed by transportation agencies to determine section performance across their pavement networks. To efficiently collect and evaluate thousands of lane-miles, automated processes utilizing image-capturing techniques and detection algorithms are applied to perform these tasks. However, the precision of this novel technology often leads to inaccuracies that must be verified by pavement engineers. Developments in artificial intelligence and machine learning (AI/ML) can aid in the progress of more robust and precise detection algorithms. Deep learning models are efficient for visual distress identification of pavement. With the use of 2D/3D pavement images, surface distress analysis can help train models to efficiently detect and classify surface distresses that may be caused by traffic loading, weather, aging, and other environmental factors. The formation of these distresses is developing at a higher rate in coastal regions, where extreme weather phenomena are more frequent and intensive. This study aims to develop a YOLOv5 model with 2D/3D images collected in the states of Louisiana, Mississippi, and Texas in the U.S. to establish a library of data on pavement sections near the Gulf of Mexico. Images with a resolution of 4096 × 2048 are annotated by utilizing bounding boxes based on a class list of nine distress and non-distress objects. Along with emphasis on efforts to detect cracks in the presence of background noise on asphalt pavements, six scenarios for augmentation were made to evaluate the model’s performance based on flip probability in the horizontal and vertical directions. The YOLOv5 models are able to detect defined distresses consistently, with the highest mAP50 scores ranging from 0.437 to 0.462 throughout the training scenarios. Full article
(This article belongs to the Special Issue Sensing and Imaging for Defect Detection: 2nd Edition)
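The six augmentation scenarios above vary the probability of horizontal and vertical flips applied to the 2D/3D pavement images and their box labels. A minimal sketch of that augmentation (the probabilities and the example box are illustrative, not the study's settings):

```python
import random
import numpy as np

def random_flip(image: np.ndarray, boxes: np.ndarray,
                p_horizontal: float = 0.5, p_vertical: float = 0.5):
    """Flip an image and its normalized YOLO boxes (cx, cy, w, h) with the given probabilities."""
    boxes = boxes.copy()
    if random.random() < p_horizontal:
        image = image[:, ::-1]
        boxes[:, 0] = 1.0 - boxes[:, 0]   # mirror box centers in x
    if random.random() < p_vertical:
        image = image[::-1, :]
        boxes[:, 1] = 1.0 - boxes[:, 1]   # mirror box centers in y
    return image, boxes

img = np.zeros((2048, 4096), dtype=np.uint8)        # a 4096 x 2048 range image
labels = np.array([[0.25, 0.60, 0.10, 0.05]])       # one crack bounding box (toy values)
aug_img, aug_labels = random_flip(img, labels, p_horizontal=1.0, p_vertical=0.0)
print(aug_labels)
```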
Show Figures

Figure 1: The 2D/3D pavement data collection equipment: (a) the pavement survey vehicle with the laser sensor attached to the rear roof; (b) the HyMIT laser sensor with line laser emitter and camera.
Figure 2: Location of pavement surface data collection sites in coastal states of the United States of America: (a) the pavement sections in Louisiana and Mississippi; (b) the pavement sections in Texas.
Figure 3: Custom annotation software interface: (a) a 3D range image of the pavement surface; (b) a 2D intensity image of the pavement surface. Both images have identical bounding box annotations, with pixel brightness in (a) indicating depth into or from the pavement surface.
Figure 4: Labeled distresses and common objects in 3D images from the dataset: (a) an asphalt surface with multiple transverse cracks, longitudinal cracks, lane longitudinal cracks, and joints; (b) the presence of surface failures and alligator cracks with thin hairline cracks; (c) a large block crack covering the majority of the lane width; (d) sealed transverse and sealed longitudinal crack appearance on 3D images.
Figure 5: Distribution of labels in the current dataset.
Figure 6: YOLOv5 architecture with cross-stage partial networks (BottleNeckCSP), spatial pyramid pooling (SPP), concatenation (Concat), convolution (Conv1x1, Conv3x3), and upsampling (UpSample).
Figure 7: BottleNeckCSP architecture. Inputs in both the BottleNeckCSP and Bottleneck modules are split into two parts: one part goes through a convolutional layer, while the other is sent through multiple layers before rejoining in the Concatenation module.
Figure 8: Prediction boxes generated via YOLOv5 detection. Original surface images, predictions from Scenarios 1–6 (S1 to S6), and the ground truth (GT) are displayed with color-coded boxes: cracks of transverse (blue), sealed transverse (cyan), longitudinal (aquamarine), sealed longitudinal (dark blue), lane longitudinal (pink), block (red), alligator (yellow), and failures (green). Joints (white) are not visualized.
16 pages, 511 KiB  
Article
Hybrid Machine Learning and Deep Learning Approaches for Insult Detection in Roman Urdu Text
by Nisar Hussain, Amna Qasim, Gull Mehak, Olga Kolesnikova, Alexander Gelbukh and Grigori Sidorov
AI 2025, 6(2), 33; https://doi.org/10.3390/ai6020033 - 8 Feb 2025
Viewed by 685
Abstract
This study introduces a new model for detecting insults in Roman Urdu, filling an important gap in natural language processing (NLP) for low-resource languages. The transliterated nature of Roman Urdu also poses specific challenges from a computational linguistics perspective, including non-standardized grammar, variation in spellings for the same word, and high levels of code-mixing with English, which together make automated insult detection for Roman Urdu a highly complex problem. To address these problems, we created a large-scale dataset with 46,045 labeled comments from social media websites such as Twitter, Facebook, and YouTube. This is the first dataset for insult detection in Roman Urdu created and annotated with insulting and non-insulting content. Advanced preprocessing methods such as text cleaning, text normalization, and tokenization are used in the study, as well as feature extraction using TF–IDF through unigram (Uni), bigram (Bi), trigram (Tri), and their unions: Uni+Bi+Trigram. We compared ten machine learning algorithms (logistic regression, support vector machines, random forest, gradient boosting, AdaBoost, and XGBoost) and three deep learning topologies (CNN, LSTM, and Bi-LSTM). Ensemble models proved to give the highest F1-scores, reaching 97.79%, 97.78%, and 95.25%, respectively, for AdaBoost, decision tree, and the TF–IDF Uni+Bi+Trigram configurations. Deep learning models also performed on par, with CNN achieving an F1-score of 97.01%. Overall, the results highlight the utility of n-gram features and the combination of robust classifiers in detecting insults. This study makes strides in improving NLP for Roman Urdu and lays a foundation for further research into pre-trained transformers and hybrid approaches, which could overcome the limitations of existing systems and platforms. The study also has practical implications, mainly for building automated moderation tools that make online spaces safer, especially on South Asian social media platforms. Full article
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
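The Uni+Bi+Trigram TF–IDF configuration above corresponds to an n-gram range of (1, 3); combined with a boosted classifier, a minimal scikit-learn sketch looks like this (the toy comments and labels are invented for illustration, not drawn from the Roman Urdu corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

# Toy Roman Urdu-style comments with insult labels (1 = insulting), for illustration only.
texts = ["tum bohat ache ho", "tum pagal ho", "ye kaam zabardast hai", "bewakoof insan"]
labels = [0, 1, 0, 1]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),   # unigram + bigram + trigram features
    AdaBoostClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["tum zabardast ho"]))
```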
Show Figures

Figure 1: Proposed methodology.
Figure 2: Top F1-score for ML and DL models.
26 pages, 906 KiB  
Article
Large Language Models as Kuwaiti Annotators
by Hana Alostad
Big Data Cogn. Comput. 2025, 9(2), 33; https://doi.org/10.3390/bdcc9020033 - 8 Feb 2025
Viewed by 357
Abstract
Stance detection for low-resource languages, such as the Kuwaiti dialect, poses a significant challenge in natural language processing (NLP) due to the scarcity of annotated datasets and specialized tools. This study addresses these limitations by evaluating the effectiveness of open large language models (LLMs) in automating stance detection through zero-shot and few-shot prompt engineering, with a focus on the potential of open-source models to achieve performance levels comparable to those of closed-source alternatives. We also highlight the critical distinctions between zero- and few-shot learning, emphasizing their significance for addressing the challenges posed by low-resource languages. Our evaluation involved testing 11 LLMs on a manually labeled dataset of social media posts, including GPT-4o, Gemini Pro 1.5, Mistral-Large, Jais-30B, and AYA-23. As expected, closed-source models such as GPT-4o, Gemini Pro 1.5, and Mistral-Large demonstrated superior performance, achieving maximum F1 scores of 95.4%, 95.0%, and 93.2%, respectively, in few-shot scenarios with English as the prompt template language. However, open-source models such as Jais-30B and AYA-23 achieved competitive results, with maximum F1 scores of 93.0% and 93.1%, respectively, under the same conditions. Furthermore, statistical analysis using ANOVA and Tukey’s HSD post hoc tests revealed no significant differences in overall performance among GPT-4o, Gemini Pro 1.5, Mistral-Large, Jais-30B, and AYA-23. This finding underscores the potential of open-source LLMs as cost-effective and privacy-preserving alternatives for low-resource language annotation. This is the first study comparing LLMs for stance detection in the Kuwaiti dialect. Our findings highlight the importance of prompt design and model consistency in improving the quality of annotations and pave the way for NLP solutions for under-represented Arabic dialects. Full article
(This article belongs to the Special Issue Generative AI and Large Language Models)
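The zero- and few-shot prompting described above comes down to how the instruction, labeled examples, and target post are laid out in the prompt. A hedged sketch of a few-shot stance-detection template (the wording, label set, and examples are illustrative assumptions, not the study's actual templates):

```python
def build_stance_prompt(post, examples=None):
    """Compose a zero-shot (no examples) or few-shot stance-detection prompt."""
    lines = [
        "You are annotating Kuwaiti-dialect social media posts.",
        "Classify the stance of the post toward the given topic as FAVOR, AGAINST, or NONE.",
    ]
    for text, label in examples or []:          # few-shot: prepend labeled examples
        lines.append(f"Post: {text}\nStance: {label}")
    lines.append(f"Post: {post}\nStance:")
    return "\n\n".join(lines)

few_shot = [("الموضوع هذا زين وايد", "FAVOR"), ("ما اوافق على هالقرار", "AGAINST")]
print(build_stance_prompt("شرايكم في القرار الجديد؟", examples=few_shot))
```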
Show Figures

Figure 1: Arabic and English prompt templates.
Figure 2: LLMs as Kuwaiti annotators: experimental steps.
Figure 3: Performance comparison of mean Macro F1 across LLMs.
Figure 4: Performance comparison across LLMs based on prompt language and prompt type.
Figure 5: Tukey’s HSD test results: pairwise comparisons of models.
Figure 6: Tukey’s HSD test results: pairwise comparisons of zero-shot and few-shot prompt templates.
Figure 7: Tukey’s HSD test results: pairwise comparisons of prompt templates 1, 2, and 3.
Figure 8: Tukey’s HSD test results: pairwise comparisons of Arabic and Mixed language prompt templates (English instructions, Kuwaiti dialect examples, and tweet text).
Figure 9: LLM response consistency analysis using variance in Macro F1 scores.