Search Results (207)

Search Parameters:
Keywords = Comprehensive-YOLOv5

17 pages, 1370 KiB  
Article
FL-YOLOv8: Lightweight Object Detector Based on Feature Fusion
by Ying Xue, Qijin Wang, Yating Hu, Yu Qian, Long Cheng and Hongqiang Wang
Electronics 2024, 13(23), 4653; https://doi.org/10.3390/electronics13234653 - 25 Nov 2024
Abstract
In recent years, anchor-free object detectors have become predominant in deep learning. The YOLOv8 model, a real-time anchor-free object detector, is versatile and influential and efficiently detects objects across multiple scales. However, the model's generalization performance is limited, the feature fusion in the neck module relies heavily on its structural design and on dataset size, and small objects are particularly difficult to localize and detect. To address these issues, we propose the FL-YOLOv8 object detector, an improvement on YOLOv8s. Firstly, we introduce the FSDI module in the neck, enhancing semantic information across all layers and incorporating rich detailed features through straightforward layer-hopping connections. This module integrates both high-level and low-level information to improve the accuracy and efficiency of image detection. Meanwhile, the model structure is optimized and the LSCD module is constructed in the detection head; adopting a lightweight shared convolutional detection head reduces the number of parameters and the computation of the model by 19% and 10%, respectively. Our model achieves a comprehensive performance of 45.5% on the general-purpose COCO dataset, surpassing the baseline by 0.8 percentage points. To further validate the effectiveness of the method, experiments were also performed on domain-specific urine sediment data (FCUS22), and the category-level detection results further support the FL-YOLOv8 object detection algorithm. Full article
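The lightweight shared convolutional detection head mentioned above can be illustrated with a short sketch. This is not the paper's LSCD module; it is a minimal PyTorch example, with assumed channel widths and output size, showing why reusing one convolution stack across pyramid levels cuts head parameters compared with separate per-level heads.

```python
import torch
import torch.nn as nn

class SharedDetectHead(nn.Module):
    """Minimal shared detection head: one conv stack reused on every pyramid level."""
    def __init__(self, channels: int = 128, num_outputs: int = 84):  # 84 is a placeholder width
        super().__init__()
        # The same weights are applied to P3, P4 and P5 instead of three separate heads.
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SiLU(),
        )
        self.predict = nn.Conv2d(channels, num_outputs, 1)

    def forward(self, feats):
        # feats: list of [B, C, Hi, Wi] maps from the neck (equal channel width assumed).
        return [self.predict(self.shared(f)) for f in feats]

if __name__ == "__main__":
    head = SharedDetectHead()
    feats = [torch.randn(1, 128, s, s) for s in (80, 40, 20)]
    print([o.shape for o in head(feats)])
    print("parameters:", sum(p.numel() for p in head.parameters()))
```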
27 pages, 10500 KiB  
Article
YOLOv8-GDCI: Research on the Phytophthora Blight Detection Method of Different Parts of Chili Based on Improved YOLOv8 Model
by Yulong Duan, Weiyu Han, Peng Guo and Xinhua Wei
Agronomy 2024, 14(11), 2734; https://doi.org/10.3390/agronomy14112734 - 20 Nov 2024
Viewed by 323
Abstract
Smart farms are crucial in modern agriculture, but current object detection algorithms cannot detect chili Phytophthora blight accurately. To solve this, we introduced the YOLOv8-GDCI model, which can detect the disease on leaves, fruits, and stem bifurcations. The model uses RepGFPN for feature fusion, Dysample upsampling for accuracy, CA attention for feature capture, and Inner-MPDIoU loss for small object detection. In addition, we created a dataset of chili Phytophthora blight on leaves, fruits, and stem bifurcations, and conducted comparative experiments. The results show that the YOLOv8-GDCI model performs strongly across a comprehensive set of indicators. In comparison with the YOLOv8n model, the YOLOv8-GDCI model demonstrates an improvement of 0.9% in precision, an increase of 1.8% in recall, and a notable enhancement of 1.7% in average precision. Although the FPS decreases slightly, it still exceeds the industry standard for real-time object detection (FPS > 60), thus meeting the requirements for real-time detection. Full article
(This article belongs to the Section Precision and Digital Agriculture)
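The Inner-MPDIoU loss named in the abstract belongs to the family of IoU-based bounding-box losses; its exact formulation is in the paper. The sketch below shows only the plain IoU term that such losses build on, using corner-format boxes (x1, y1, x2, y2) with made-up example coordinates.

```python
import torch

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Plain IoU for paired boxes in (x1, y1, x2, y2) format; a and b have shape [N, 4]."""
    lt = torch.max(a[:, :2], b[:, :2])          # intersection top-left corners
    rb = torch.min(a[:, 2:], b[:, 2:])          # intersection bottom-right corners
    wh = (rb - lt).clamp(min=0)                 # clamp handles non-overlapping pairs
    inter = wh[:, 0] * wh[:, 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)

# Example: two predicted boxes against two ground-truth boxes (toy values).
pred = torch.tensor([[10., 10., 50., 50.], [0., 0., 20., 20.]])
gt   = torch.tensor([[12., 12., 48., 52.], [30., 30., 60., 60.]])
print(box_iou(pred, gt))  # IoU-family losses typically minimize 1 - IoU plus penalty terms
```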
Figure 1. Image data enhancement. (a) Original image, (b) Random flipping, (c) Gaussian noise, (d) Random clipping, (e) Brightness change.
Figure 2. The distribution chart of dataset labels. (a) The amount of data in the training set, and how many instances there are for each category. (b) The size and number of bounding boxes. (c) The position of the center point relative to the entire image. (d) The aspect ratio of the target in the image compared to the entire image.
Figure 3. The structure of YOLOv8.
Figure 4. The structure of PAN (a) and GFPN (b).
Figure 5. Multi-scale feature fusion network structure and module design. (a) RepGFPN removes the up-sample connection and uses the CSPStage module for feature fusion. (b) The CSPStage module performs feature fusion operations; ×N means that there are N structures identical to those in the dashed box. (c) The 3 × 3 Rep technique reparameterizes the model, decreasing computational requirements and improving model efficiency.
Figure 6. The design of DySample's modules and its network architecture. (a) X represents the input feature, X′ the upsampled feature, and S the sampling set; the sampling point generator produces a sampling set, which is then used to resample the input feature through the grid sampling function. (b) X1, X2, and X3 represent offsets with a size of 2gs² × H × W; O is the generated offset, G the original grid, and σ the sigmoid function.
Figure 7. The structure of SE (a) and CBAM (b).
Figure 8. The structure of CA.
Figure 9. The network structure diagram of our YOLOv8-GDCI algorithm.
Figure 10. Visual demonstration using different feature networks. (a) Truth, (b) PAN, (c) GFPN, (d) BiFPN, (e) RepGFPN.
Figure 11. Contrast experiments of different loss functions in loss values.
Figure 12. The visualization results in different environments.
19 pages, 4118 KiB  
Article
Complex Indoor Human Detection with You Only Look Once: An Improved Network Designed for Human Detection in Complex Indoor Scenes
by Yufeng Xu and Yan Fu
Appl. Sci. 2024, 14(22), 10713; https://doi.org/10.3390/app142210713 - 19 Nov 2024
Viewed by 442
Abstract
Indoor human detection based on artificial intelligence helps to monitor the safety status and abnormal activities of the human body at any time. However, the complex indoor environment and background pose challenges to the detection task. The YOLOv8 algorithm is a cutting-edge technology in the field of object detection, but it is still affected by indoor low-light environments and large changes in human scale. To address these issues, this article proposes a novel method based on YOLOv8 called CIHD-YOLO, which is specifically designed for indoor human detection. The method proposed in this article combines the spatial pyramid pooling of the backbone with an efficient partial self-attention, enabling the network to effectively capture long-range dependencies and establish global correlations between features, obtaining feature information at different scales. At the same time, the GSEAM module and GSCConv were introduced into the neck network to compensate for the loss caused by differences in lighting levels by combining depth-wise separable convolution and residual connections, enabling it to extract effective features from visual data with poor illumination levels. A dataset specifically designed for indoor human detection, the HCIE dataset, was constructed and used to evaluate the model proposed in this paper. The research results show that compared with the original YOLOv8s framework, the detection accuracy has been improved by 2.67%, and the required floating-point operations have been reduced. The comprehensive case analysis and comparative evaluation highlight the superiority and effectiveness of this method in complex indoor human detection tasks. Full article
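The GSEAM and GSCConv modules are specific to this paper, but the building block the abstract names, a depth-wise separable convolution combined with a residual connection, is standard. A minimal PyTorch sketch with assumed layer sizes (not the authors' exact design):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableResidual(nn.Module):
    """Depth-wise separable convolution with a residual (skip) connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Depth-wise: one 3x3 filter per channel (groups=channels).
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Point-wise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        y = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return x + y  # the residual path preserves the original signal

x = torch.randn(1, 64, 40, 40)
print(DepthwiseSeparableResidual()(x).shape)
```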
Figure 1. Network architecture of YOLOv8.
Figure 2. Network architecture of CIHD-YOLO.
Figure 3. Network architecture of spatial pyramid pooling with effective partial self-attention (SPPEPSA).
Figure 4. Network architecture of RepNCSP and RepNBottleneck. (a) RepNCSP; (b) RepNBottleneck.
Figure 5. Network architecture of the generalized separated and enhancement aggregation network (GSEAM).
Figure 6. Network architecture of global spatial and channel reconstruction convolution (GSCConv).
Figure 7. Network architecture of spatial and channel reconstruction convolution (SCConv).
Figure 8. Example of indoor human detection images in the HCIE dataset.
Figure 9. Dataset label distribution. (a) The position of the bounding box center point relative to the entire image; (b) The aspect ratio of the target in the image relative to the entire image.
Figure 10. Box loss curve for model training.
Figure 11. Curves of mAP50 and mAP50-95 during the training process.
Figure 12. Test image set under low illumination.
Figure 13. Test image set of small-scale human body.
Figure 14. Test image set rotated by a 30° angle.
18 pages, 5301 KiB  
Article
Research and Design of an Active Light Source System for UAVs Based on Light Intensity Matching Model
by Rui Ming, Tao Wu, Zhiyan Zhou, Haibo Luo and Shahbaz Gul Hassan
Drones 2024, 8(11), 683; https://doi.org/10.3390/drones8110683 - 19 Nov 2024
Viewed by 352
Abstract
The saliency feature is a key factor in achieving vision-based tracking for multi-UAV control. However, due to the complex and variable environments encountered during multi-UAV operations, such as changes in lighting conditions and scale variations, the UAV's visual features may degrade, especially under high-speed movement, ultimately resulting in failure of the vision tracking task and reducing the stability and robustness of swarm flight. Therefore, this paper proposes an adaptive active light source system based on light intensity matching to address the issue of visual feature loss caused by environmental light intensity and scale variations in multi-UAV collaborative navigation. The system consists of three components: an environment sensing and control module, a variable active light source module, and a light source power module. This paper first designs the overall framework of the active light source system, detailing the functions of each module and their collaborative working principles. Furthermore, optimization experiments are conducted on the variable active light source module. By comparing the recognition effects of the variable active light source module under different parameters, the best configuration is selected. In addition, to improve the robustness of the active light source system under different lighting conditions, this paper also constructs a light source color matching model based on light intensity matching. By collecting and comparing visible light images of different color light sources under various intensities and constructing the light intensity matching model using the comprehensive peak signal-to-noise ratio parameter, the model is optimized to ensure the best vision tracking performance under different lighting conditions. Finally, to validate the effectiveness of the proposed active light source system, quantitative and qualitative recognition comparison experiments were conducted in eight different scenarios with UAVs equipped with active light sources. The experimental results show that the UAV equipped with an active light source improved the recall of the YOLOv7 and RT-DETR recognition algorithms by 30% and 29.6%, the mAP50 by 21% and 19.5%, and the recognition accuracy by 13.1% and 13.6%, respectively. Qualitative experiments also demonstrated that the active light source effectively improved the recognition success rate under low lighting conditions. Extensive qualitative and quantitative experiments confirm that the UAV active light source system based on light intensity matching proposed in this paper effectively enhances the effectiveness and robustness of vision-based tracking for multi-UAVs, particularly in complex and variable environments. This research provides an efficient and computationally effective solution for vision-based multi-UAV systems, further enhancing the visual tracking capabilities of multi-UAVs under complex conditions. Full article
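The light-source matching model described above is built around a peak signal-to-noise ratio (PSNR) criterion. PSNR itself is standard; the sketch below computes it for 8-bit images with NumPy on synthetic data. The aggregation into the paper's comprehensive CREC-PSNR parameter is not reproduced here.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two images of equal shape (8-bit assumed)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Example with synthetic data: a clean frame versus a noisier capture of the same scene.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
noisy = np.clip(clean + rng.normal(0, 5, clean.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```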
Figure 1. Light intensity matched active light source system for UAVs. Note: 1. 5V-DC power supply interface; 2. MCU; 3. Light intensity sensor module; 4. Red laser constant voltage control module; 5. Blue laser constant voltage control module; 6. Light cover; 7. Red laser emission module; 8. Blue laser emission module.
Figure 2. Active light source device system workflow.
Figure 3. Different sizes of light shield.
Figure 4. Experimental principle of active light source shield size selection.
Figure 5. RD and IREL for different sizes of light shield.
Figure 6. Principle of the optimal light source color selection experiment.
Figure 7. Experimental setup for optimal light source color selection: blue light source (a), red light source (b).
Figure 8. Relationship between PSNR values of different colors and light intensity changes.
Figure 9. CREC-PSNR variation with luminance.
Figure 10. Nonlinear fitting curves for red and blue.
Figure 11. Comparison of the UAV equipped with the active light source and the conventional UAV: UAV with active light source (a), UAV without active light source (b). Note: 1. Active light source; 2. Support module; 3. UAV.
Figure 12. Data comparison of UAVs with and without active light source in various scenarios.
22 pages, 5176 KiB  
Article
A Reparameterization Feature Redundancy Extract Network for Unmanned Aerial Vehicles Detection
by Shijie Zhang, Xu Yang, Chao Geng and Xinyang Li
Remote Sens. 2024, 16(22), 4226; https://doi.org/10.3390/rs16224226 - 13 Nov 2024
Viewed by 366
Abstract
In unmanned aerial vehicles (UAVs) detection, challenges such as occlusion, complex backgrounds, motion blur, and inference time often lead to false detections and missed detections. General object detection frameworks encounter difficulties in adequately tackling these challenges, leading to substantial information loss during network downsampling, inadequate feature fusion, and an inability to meet real-time requirements. In this paper, we propose a Real-Time Small Object Detection YOLO (RTSOD-YOLO) model to tackle the various challenges faced in UAVs detection. We further enhance the adaptive nature of the Adown module by incorporating an adaptive spatial attention mechanism. This mechanism processes the downsampled feature maps, enabling the model to better focus on key regions. Secondly, to address the issue of insufficient feature fusion, we employ combined serial and parallel triple feature encoding (TFE). This approach fuses scale-sequence features from both shallow features and twice-encoded features, resulting in a new small-scale object detection layer. While enhancing the global context awareness of the existing detection layers, this also enriches the small-scale object detection layer with detailed information. Since rich redundant features often ensure a comprehensive understanding of the input, which is a key characteristic of deep neural networks, we propose a more efficient redundant feature generation module. This module generates more feature maps with fewer parameters. Additionally, we introduce reparameterization techniques to compensate for potential feature loss while further improving the model’s inference speed. Experimental results demonstrate that our proposed RTSOD-YOLO achieves superior detection performance, with mAP50/mAP50:95 reaching 97.3%/51.7%, which represents an improvement of 3%/3.5% over YOLOv8 and is 2.6%/0.1% higher than YOLOv10. Additionally, it has the lowest parameter count and FLOPs, making it highly efficient in terms of computational resources. Full article
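The redundant-feature generation module in RTSOD-YOLO is the paper's own design. As a hedged illustration of the general idea, generating extra feature maps from cheap operations instead of full convolutions, in the style of Ghost-type blocks, consider this PyTorch sketch:

```python
import torch
import torch.nn as nn

class CheapRedundantFeatures(nn.Module):
    """Produce half of the output channels with a full conv and the rest with a cheap
    depth-wise conv applied to those primary maps (Ghost-style idea, not the paper's module)."""
    def __init__(self, in_ch: int = 64, out_ch: int = 128):
        super().__init__()
        primary = out_ch // 2
        self.primary_conv = nn.Conv2d(in_ch, primary, 3, padding=1)
        self.cheap_conv = nn.Conv2d(primary, out_ch - primary, 3, padding=1, groups=primary)

    def forward(self, x):
        p = self.primary_conv(x)
        c = self.cheap_conv(p)                 # "redundant" maps derived cheaply from p
        return torch.cat([p, c], dim=1)

block = CheapRedundantFeatures()
print(block(torch.randn(1, 64, 80, 80)).shape)   # -> [1, 128, 80, 80]
print("parameters:", sum(p.numel() for p in block.parameters()))
```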
Figure 1. The architecture of RTSOD-YOLO.
Figure 2. Convolution and Adown downsampling. (a) Convolution; (b) Adown.
Figure 3. C2f and RFR-Block module. (a) Structure of C2f; (b) Structure of RFR-Block.
Figure 4. RepConv schematic diagram.
Figure 5. Triple feature encoding module.
Figure 6. Scale-sequence feature fusion module.
Figure 7. Separated and Enhancement Attention Module.
Figure 8. The unmanned aerial vehicle dataset.
Figure 9. Detection performance in different scenarios.
Figure 10. Test dataset augmented with random erasing.
Figure A1. Summary of training.
Figure A2. Confusion matrices of different models in different scenarios (occlusion, strong light irradiation, and dim scenes). (a) YOLOv5; (b) YOLOv8; (c) YOLOv9; (d) YOLOv10; (e) RTSOD-YOLO.
17 pages, 1906 KiB  
Article
Advancing Indoor Epidemiological Surveillance: Integrating Real-Time Object Detection and Spatial Analysis for Precise Contact Rate Analysis and Enhanced Public Health Strategies
by Ali Baligh Jahromi, Koorosh Attarian, Ali Asgary and Jianhong Wu
Int. J. Environ. Res. Public Health 2024, 21(11), 1502; https://doi.org/10.3390/ijerph21111502 - 13 Nov 2024
Viewed by 506
Abstract
In response to escalating concerns about the indoor transmission of respiratory diseases, this study introduces a sophisticated software tool engineered to accurately determine contact rates among individuals in enclosed spaces—essential for public health surveillance and disease transmission mitigation. The tool applies YOLOv8, a cutting-edge deep learning model that enables precise individual detection and real-time tracking from video streams. An innovative feature of this system is its dynamic circular buffer zones, coupled with an advanced 2D projective transformation to accurately overlay video data coordinates onto a digital layout of the physical environment. By analyzing the overlap of these buffer zones and incorporating detailed heatmap visualizations, the software provides an in-depth quantification of contact instances and spatial contact patterns, marking an advancement over traditional contact tracing and contact counting methods. These enhancements not only improve the accuracy and speed of data analysis but also furnish public health officials with a comprehensive framework to develop more effective non-pharmaceutical infection control strategies. This research signifies a crucial evolution in epidemiological tools, transitioning from manual, simulation, and survey-based tracking methods to automated, real time, and precision-driven technologies that integrate advanced visual analytics to better understand and manage disease transmission in indoor settings. Full article
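The projection and buffer-zone logic described above can be sketched with OpenCV: a 2D projective transformation (homography) maps detected positions from video coordinates onto the floor plan, and a contact is counted when two circular buffer zones overlap. The four correspondences and the 1 m radius below are hypothetical placeholders, not values from the study.

```python
import cv2
import numpy as np

# Four image points (pixels) and their known floor-plan coordinates (metres): placeholders.
image_pts = np.float32([[100, 400], [1180, 420], [1020, 80], [220, 70]])
floor_pts = np.float32([[0, 0], [10, 0], [10, 6], [0, 6]])
H = cv2.getPerspectiveTransform(image_pts, floor_pts)   # 2D projective transformation

def to_floor_plan(pixel_xy: np.ndarray) -> np.ndarray:
    """Map detected foot points from video coordinates to floor-plan coordinates."""
    pts = pixel_xy.reshape(-1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

def count_contacts(positions_m: np.ndarray, radius_m: float = 1.0) -> int:
    """Count pairs whose circular buffer zones overlap (centre distance < 2 * radius)."""
    contacts = 0
    for i in range(len(positions_m)):
        for j in range(i + 1, len(positions_m)):
            if np.linalg.norm(positions_m[i] - positions_m[j]) < 2 * radius_m:
                contacts += 1
    return contacts

detections = np.array([[300, 350], [320, 360], [900, 200]], dtype=np.float32)
floor_positions = to_floor_plan(detections)
print(floor_positions, "contacts:", count_contacts(floor_positions))
```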
Figure 1. Flowchart; steps include initialization, object detection using YOLOv8, real-time human tracking, dynamic buffer zones, spatial analysis, people counting and density analysis, and data handling and visualization.
Figure 2. Detecting and tracking individuals in an indoor environment: a count of 5 individuals, each with their track line (green line) and track ID (yellow numbers).
Figure 3. Transformation of occupants onto the 2D floor plan.
Figure 4. Interaction duration analysis across tracked individuals.
Figure 5. Comparative spatial interaction heatmaps depicting density and movement patterns at time 1 s and time 31 s during the experiment.
20 pages, 6129 KiB  
Article
Optimized YOLOv5 Architecture for Superior Kidney Stone Detection in CT Scans
by Khasanov Asliddin Abdimurotovich and Young-Im Cho
Electronics 2024, 13(22), 4418; https://doi.org/10.3390/electronics13224418 - 11 Nov 2024
Viewed by 569
Abstract
The early and accurate detection of kidney stones is crucial for effective treatment and improved patient outcomes. This paper proposes a novel modification of the YOLOv5 model, specifically tailored for detecting kidney stones in CT images. Our approach integrates the squeeze-and-excitation (SE) block within the C3 block of the YOLOv5m architecture, thereby enhancing the ability of the model to recalibrate channel-wise dependencies and capture intricate feature relationships. This modification leads to significant improvements in the detection accuracy and reliability. Extensive experiments were conducted to evaluate the performance of the proposed model against standard YOLOv5 variants (nano-sized, small, and medium-sized). The results demonstrate that our model achieves superior performance metrics, including higher precision, recall, and mean average precision (mAP), while maintaining a balanced inference speed and model size suitable for real-time applications. The proposed methodology incorporates advanced noise reduction and data augmentation techniques to ensure the preservation of critical features and enhance the robustness of the training dataset. Additionally, a novel color-coding scheme for bounding boxes improves the clarity and differentiation of the detected stones, facilitating better analysis and understanding of the detection results. Our comprehensive evaluation using essential metrics, such as precision, recall, mAP, and intersection over union (IoU), underscores the efficacy of the proposed model for detecting kidney stones. The modified YOLOv5 model offers a robust, accurate, and efficient solution for medical imaging applications and represents a significant advancement in computer-aided diagnosis and kidney stone detection. Full article
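The squeeze-and-excitation block referenced above is a published, standard module: global pooling squeezes each channel to a scalar, a small bottleneck MLP produces per-channel gates, and the input is rescaled channel-wise. A compact, generic PyTorch version (not the exact placement inside the authors' C3 block):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global pooling -> bottleneck MLP -> channel-wise rescaling."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # squeeze spatial dimensions
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                   # per-channel gates in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                        # recalibrate channel responses

print(SEBlock(256)(torch.randn(2, 256, 20, 20)).shape)
```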
Figure 1. Architecture of YOLOv5 with a C3 block.
Figure 2. Bottleneck blocks, each consisting of two convolutional layers with a residual connection, contribute to parameter reduction while preserving the representational capacity of the model. (a) C3 block with three convolutions; (b) SE block; (c) Bottleneck of the C3 block.
Figure 3. Comparative analysis of different YOLOv5 model variants (nano-sized, small, and medium) along with the proposed modified YOLOv5 model for the detection of kidney stones in CT images.
Figure 4. Different coloring approaches for bounding boxes of detected objects.
24 pages, 4899 KiB  
Article
Enhancing YOLOv8’s Performance in Complex Traffic Scenarios: Optimization Design for Handling Long-Distance Dependencies and Complex Feature Relationships
by Bingyu Li, Qiao Meng, Xin Li, Zhijie Wang, Xin Liu and Siyuan Kong
Electronics 2024, 13(22), 4411; https://doi.org/10.3390/electronics13224411 - 11 Nov 2024
Viewed by 549
Abstract
In recent years, the field of deep learning and computer vision has increasingly focused on the problem of vehicle target detection, becoming the forefront of many technological innovations. YOLOv8, as an efficient vehicle target detection model, has achieved good results in many scenarios. However, when faced with complex traffic scenarios, such as occluded targets, small target detection, changes in lighting, and variable weather conditions, YOLOv8 still has insufficient detection accuracy and robustness. To address these issues, this paper delves into the optimization strategies of YOLOv8 in the field of vehicle target detection, focusing on the EMA module in the backbone part and replacing the original SPPF module with focal modulation technology, all of which effectively improved the model’s performance. At the same time, modifications to the head part were approached with caution to avoid unnecessary interference with the original design. The experiment used the UA-DETRAC dataset, which contains a variety of traffic scenarios, a rich variety of vehicle types, and complex dynamic environments, making it suitable for evaluating and validating the performance of traffic monitoring systems. The 5-fold cross-validation method was used to ensure the reliability and comprehensiveness of the evaluation results. The final results showed that the improved model’s precision rate increased from 0.859 to 0.961, the recall rate from 0.83 to 0.908, and the mAP50 from 0.881 to 0.962. Meanwhile, the optimized YOLOv8 model demonstrated strong robustness in terms of detection accuracy and the ability to adapt to complex environments. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Image and Video Processing)
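The 5-fold cross-validation protocol mentioned in the abstract is standard; here is a minimal sketch with scikit-learn's KFold. The image list and the train/evaluate hooks are hypothetical placeholders, not the authors' pipeline.

```python
from sklearn.model_selection import KFold
import numpy as np

image_ids = np.arange(1000)               # placeholder for the UA-DETRAC frame indices
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(image_ids)):
    # train_model / evaluate_model are hypothetical hooks for the detector pipeline.
    # model = train_model(image_ids[train_idx])
    # scores.append(evaluate_model(model, image_ids[val_idx]))
    print(f"fold {fold}: {len(train_idx)} training frames, {len(val_idx)} validation frames")

# Final metrics (precision, recall, mAP50) would be averaged over the five folds:
# print("mean mAP50:", np.mean(scores))
```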
Figure 1. YOLOv8 Structural Framework.
Figure 2. SPPF Structural Framework.
Figure 3. Self-Attention.
Figure 4. Focal Modulation.
Figure 5. EMA Structural Framework.
Figure 6. Reconstructing the Backbone Network.
Figure 7. Comparison of Original and Our Data.
Figure 8. Five-Fold Cross-Validation.
Figure 9. Results.
Figure 10. P_curve.
Figure 11. Congested Scene.
Figure 12. Night Scene.
Figure 13. Rainy Scene.
Figure 14. Haze Scene.
Figures 15-30. Detection results 1-4 from YOLOv8, our model, EfficientDet, and Rank-DETR.
Figures 31-33. Original Images 1-3.
Figures 34-36. YOLOv8 Heatmaps 1-3.
Figures 37-39. Our Heatmaps 1-3.
16 pages, 4399 KiB  
Article
Lightweight Vehicle Detection Based on Mamba_ViT
by Ze Song, Yuhai Wang, Shuobo Xu, Peng Wang and Lele Liu
Sensors 2024, 24(22), 7138; https://doi.org/10.3390/s24227138 - 6 Nov 2024
Viewed by 362
Abstract
Vehicle detection algorithms are essential for intelligent traffic management and autonomous driving systems. Current vehicle detection algorithms largely rely on deep learning techniques, enabling the automatic extraction of vehicle image features through convolutional neural networks (CNNs). However, in real traffic scenarios, relying only on a single feature extraction unit makes it difficult to fully understand the vehicle information in the traffic scenario, thus affecting the vehicle detection effect. To address this issue, we propose a lightweight vehicle detection algorithm based on Mamba_ViT. First, we introduce a new feature extraction architecture (Mamba_ViT) that separates shallow and deep features and processes them independently to obtain a more complete contextual representation, ensuring comprehensive and accurate feature extraction. Additionally, a multi-scale feature fusion mechanism is employed to enhance the integration of shallow and deep features, leading to the development of a vehicle detection algorithm named Mamba_ViT_YOLO. The experimental results on the UA-DETRAC dataset show that our proposed algorithm improves mAP@50 by 3.2% compared to the latest YOLOv8 algorithm, while using only 60% of the model parameters. Full article
(This article belongs to the Section Intelligent Sensors)
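Mamba_ViT's specific blocks (iRMB, VMamba) are beyond this listing, but the core idea stated above, routing shallow and deep features through separate branches and then fusing them at a common scale, can be hedged into a small generic sketch. The branch layers below are simple stand-ins, not the paper's modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitFusion(nn.Module):
    """Process shallow (high-resolution) and deep (low-resolution) features independently,
    then fuse them; a generic stand-in for the separation-and-fusion idea."""
    def __init__(self, shallow_ch: int = 64, deep_ch: int = 256, out_ch: int = 128):
        super().__init__()
        self.shallow_branch = nn.Conv2d(shallow_ch, out_ch, 3, padding=1)   # local detail
        self.deep_branch = nn.Conv2d(deep_ch, out_ch, 1)                    # semantic context
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, shallow, deep):
        s = self.shallow_branch(shallow)
        d = F.interpolate(self.deep_branch(deep), size=s.shape[-2:], mode="nearest")
        return self.fuse(torch.cat([s, d], dim=1))

out = SplitFusion()(torch.randn(1, 64, 80, 80), torch.randn(1, 256, 20, 20))
print(out.shape)   # -> [1, 128, 80, 80]
```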
Figure 1. Mamba_ViT_YOLO.
Figure 2. Mamba_ViT.
Figure 3. iRMB.
Figure 4. VMamba.
Figure 5. Bidirectional feature pyramid network structure.
Figure 6. Sample images from the UA-DETRAC dataset.
Figure 7. Heat map comparison results: (a,c,e) YOLOv8 heat maps; (b,d,f) Mamba_ViT heat maps.
Figure 8. Comparison of detection results: (a,c,e,g) YOLOv8 detection results; (b,d,f,h) Mamba_ViT_YOLO detection results.
16 pages, 5783 KiB  
Article
LG-YOLOv8: A Lightweight Safety Helmet Detection Algorithm Combined with Feature Enhancement
by Zhipeng Fan, Yayun Wu, Wei Liu, Ming Chen and Zeguo Qiu
Appl. Sci. 2024, 14(22), 10141; https://doi.org/10.3390/app142210141 - 6 Nov 2024
Viewed by 495
Abstract
In the realm of construction site monitoring, ensuring the proper use of safety helmets is crucial. Addressing the issues of high parameter values and sluggish detection speed in current safety helmet detection algorithms, a feature-enhanced lightweight algorithm, LG-YOLOv8, was introduced. Firstly, we introduce C2f-GhostDynamicConv as a powerful tool. This module enhances feature extraction to represent safety helmet wearing features, aiming to improve the efficiency of computing resource utilization. Secondly, the Bi-directional Feature Pyramid (BiFPN) was employed to further enrich the feature information, integrating feature maps from various levels to achieve more comprehensive semantic information. Finally, to enhance the training speed of the model and achieve a more lightweight outcome, we introduce a novel lightweight asymmetric detection head (LADH-Head) to optimize the original YOLOv8-n’s detection head. Evaluations on the SWHD dataset confirm the effectiveness of the LG-YOLOv8 algorithm. Compared to the original YOLOv8-n algorithm, our approach achieves a mean Average Precision (mAP) of 94.1%, a 59.8% reduction in parameters, a 54.3% decrease in FLOPs, a 44.2% increase in FPS, and a 2.7 MB compression of the model size. Therefore, LG-YOLOv8 has high accuracy and fast detection speed for safety helmet detection, which realizes real-time accurate detection of safety helmets and an ideal lightweight effect. Full article
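BiFPN is a published design whose distinctive step is fast normalized weighted fusion of the incoming feature maps. A minimal sketch of just that fusion step, assuming the inputs already share resolution and channel count (the resizing convolutions of a full BiFPN layer are omitted):

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """BiFPN-style fusion: out = sum_i(w_i * x_i) / (sum_j w_j + eps), with learnable w_i >= 0."""
    def __init__(self, num_inputs: int = 2, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.weights)               # keep the fusion weights non-negative
        w = w / (w.sum() + self.eps)               # normalize without a softmax (cheaper)
        return sum(wi * x for wi, x in zip(w, inputs))

fuse = FastNormalizedFusion(num_inputs=2)
a, b = torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)
print(fuse([a, b]).shape)
```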
Figure 1. Network structure of YOLOv8.
Figure 2. Conventional convolution and the Ghost module. (a) The convolutional layer; (b) The Ghost module.
Figure 3. The structure of DynamicConv.
Figure 4. Comparison of C2f and C2f-GhostDynamicConv modules: (a) C2f module; (b) C2f-GhostDynamicConv module.
Figure 5. (a) FPN introduces a top-down path to fuse multiscale features from the third to the seventh level (P3-P7); (b) PANet enhances the FPN (Feature Pyramid Network) by incorporating an additional bottom-up pathway; (c) BiFPN offers a superior balance between accuracy and efficiency.
Figure 6. YOLOv8-n detection head.
Figure 7. Network structure of the lightweight asymmetric detection head (LADH-Head).
Figure 8. Example of the experimental dataset (reprinted from [35]).
Figure 9. Changes in key metrics during YOLOv8-n and LG-YOLOv8 training.
Figure 10. Changes in loss during YOLOv8-n and LG-YOLOv8 training.
Figure 11. Histogram comparison of results of different algorithms.
Figure 12. Visualization results for different scenarios (adapted from ref. [35]). (a) Pictures in the original dataset and helmet detection pictures in different scenarios; (b) Base model YOLOv8-n; (c) Improved model LG-YOLOv8.
25 pages, 33901 KiB  
Article
Impact of Adverse Weather and Image Distortions on Vision-Based UAV Detection: A Performance Evaluation of Deep Learning Models
by Adnan Munir, Abdul Jabbar Siddiqui, Saeed Anwar, Aiman El-Maleh, Ayaz H. Khan and Aqsa Rehman
Drones 2024, 8(11), 638; https://doi.org/10.3390/drones8110638 - 4 Nov 2024
Viewed by 1037
Abstract
Unmanned aerial vehicle (UAV) detection in real-time is a challenging task despite the advances in computer vision and deep learning techniques. The increasing use of UAVs in numerous applications has generated worries about possible risks and misuse. Although vision-based UAV detection methods have been proposed in recent years, a standing open challenge and overlooked issue is that of adverse weather. This work is the first, to the best of our knowledge, to investigate the impact of adverse weather conditions and image distortions on vision-based UAV detection methods. To achieve this, a custom training dataset was curated with images containing a variety of UAVs in diverse complex backgrounds. In addition, this work develops a first-of-its-kind dataset, to the best of our knowledge, with UAV-containing images affected by adverse conditions. Based on the proposed datasets, a comprehensive benchmarking study is conducted to evaluate the impact of adverse weather and image distortions on the performance of popular object detection methods such as YOLOv5, YOLOv8, Faster-RCNN, RetinaNet, and YOLO-NAS. The experimental results reveal the weaknesses of the studied models and the performance degradation due to adverse weather, highlighting avenues for future improvement. The results show that even the best UAV detection model’s performance degrades in mean average precision (mAP) by 50.62 points in torrential rain conditions, by 52.40 points in high noise conditions, and by 77.0 points in high motion blur conditions. To increase the selected models’ resilience, we propose and evaluate a strategy to enhance the training of the selected models by introducing weather effects in the training images. For example, the YOLOv5 model with the proposed enhancement strategy gained +35.4, +39.3, and +44.9 points higher mAP in severe rain, noise, and motion blur conditions respectively. The findings presented in this work highlight the advantages of considering adverse weather conditions during model training and underscore the significance of data enrichment for improving model generalization. The work also accentuates the need for further research into advanced techniques and architectures to ensure more reliable UAV detection under extreme weather conditions and image distortions. Full article
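The enhancement strategy described above injects weather-like effects into training images. A hedged NumPy/OpenCV sketch of two of the distortions studied, additive Gaussian noise and linear motion blur; the severity values and file path are illustrative, not those used in the paper.

```python
import cv2
import numpy as np

def add_gaussian_noise(img: np.ndarray, sigma: float = 25.0) -> np.ndarray:
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def add_motion_blur(img: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    # Horizontal motion-blur kernel: a normalized row of ones.
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size
    return cv2.filter2D(img, -1, kernel)

img = cv2.imread("uav_frame.jpg")        # placeholder path to a training image
if img is not None:
    augmented = add_motion_blur(add_gaussian_noise(img, sigma=25.0), kernel_size=15)
    cv2.imwrite("uav_frame_weather.jpg", augmented)
```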
Figure 1. The impact of adverse weather conditions on the performance of UAV detection models. The confidence score of the original YOLOv5 model drops, while the original YOLOv8 fails to detect the UAV; with the proposed enhancement strategy, the enhanced YOLOv8 model detects it with good confidence. Without the proposed enhancement, Faster-RCNN shows false positives but yields none after the enhancement. The original RetinaNet fails to detect the UAV, whereas with the proposed enhancement it detects it with higher confidence.
Figure 2. The overall methodology proposed in this work for investigating the impact of adverse effects on vision-based UAV detection methods and enhancements through adverse weather effects-aware training.
Figure 3. Sample images from the Complex Background Dataset (CBD) with UAVs.
Figure 4. Sample blurred images with varying blur: clean (no blur), low, medium, and high.
Figure 5. Samples from the adverse noise test dataset (ANTD) at three levels of noise severity: low, medium, and high.
Figure 6. Sample images from the Rainy Test Dataset (RTD).
Figure 7. Network architecture for YOLOv5 [55].
Figure 8. Network architecture for RetinaNet [56].
Figure 9. Network architecture for YOLOv8 [57].
Figure 10. Faster R-CNN architecture [11].
Figure 11. Sample UAV detection results of Faster-RCNN, YOLO-NAS, RetinaNet, YOLOv5, and YOLOv8 on the Complex Backgrounds Dataset (CBD).
Figure 12. Sample failure cases of each model on CBD.
Figure 13. Detection results and improvement comparison between Faster-RCNN and the enhanced Faster-RCNN model; the enhanced version clearly achieved higher detection results. For MBTD, the original Faster-RCNN fails to detect the UAV.
Figure 14. Detection results and improvement comparison between YOLOv5 and the enhanced YOLOv5 model; the enhanced version clearly achieved higher detection results.
Figure 15. Sample Grad-CAM results for all proposed datasets. YOLOv5 shows high scores for the UAV class on all datasets; for RTD, the focus falls on the UAV and the image edges, which can cause a false positive (darker red indicates higher activation values).
20 pages, 12767 KiB  
Article
A Real-Time End-to-End Framework with a Stacked Model Using Ultrasound Video for Cardiac Septal Defect Decision-Making
by Siti Nurmani, Ria Nova, Ade Iriani Sapitri, Muhammad Naufal Rachmatullah, Bambang Tutuko, Firdaus Firdaus, Annisa Darmawahyuni, Anggun Islami, Satria Mandala, Radiyati Umi Partan, Akhiar Wista Arum and Rio Bastian
J. Imaging 2024, 10(11), 280; https://doi.org/10.3390/jimaging10110280 - 3 Nov 2024
Viewed by 655
Abstract
Echocardiography is the gold standard for the comprehensive diagnosis of cardiac septal defects (CSDs). Currently, echocardiography diagnosis is primarily based on expert observation, which is laborious and time-consuming. With digitization, deep learning (DL) can be used to improve the efficiency of the diagnosis. This study presents a real-time end-to-end framework tailored for pediatric ultrasound video analysis for CSD decision-making. The framework employs an advanced real-time architecture based on You Only Look Once (Yolo) techniques for CSD decision-making with high accuracy. Leveraging the state of the art with the Yolov8l (large) architecture, the proposed model achieves a robust performance in real-time processes. It can be observed that the experiment yielded a mean average precision (mAP) exceeding 89%, indicating the framework’s effectiveness in accurately diagnosing CSDs from ultrasound (US) videos. The Yolov8l model exhibits precise performance in the real-time testing of pediatric patients from Mohammad Hoesin General Hospital in Palembang, Indonesia. Based on the results of the proposed model using 222 US videos, it exhibits 95.86% accuracy, 96.82% sensitivity, and 98.74% specificity. During real-time testing in the hospital, the model exhibits a 97.17% accuracy, 95.80% sensitivity, and 98.15% specificity; only 3 out of the 53 US videos in the real-time process were diagnosed incorrectly. This comprehensive approach holds promise for enhancing clinical decision-making and improving patient outcomes in pediatric cardiology. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
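The accuracy, sensitivity, and specificity values reported above follow the usual confusion-matrix definitions. The sketch below computes them from raw true/false positive/negative counts; the counts shown are placeholders, not the study's data.

```python
def screening_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity (recall of the abnormal class), and specificity."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # proportion of true defects that are caught
        "specificity": tn / (tn + fp),   # proportion of normal cases correctly cleared
    }

# Placeholder counts for a video-level decision task.
print(screening_metrics(tp=120, tn=90, fp=2, fn=5))
```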
Figure 1. The real-time end-to-end framework.
Figure 2. The US standard cardiac views A4CH, A5CH, PLAX, PSAX, and SC with color Doppler echocardiography.
Figure 3. Annotation of chamber wall and cardiac defect.
Figure 4. A total of 16 samples for performance comparison with 72 variants of the Yolo model for CSD prediction to select the best model.
Figure 5. Normal-abnormal classification performance for five architectures.
Figure 6. Normal and abnormal cardiac classification results in terms of training and validation loss.
Figure 7. View classification results in terms of training and validation loss.
Figure 8. CSD detection performance using our framework on five standard views.
Figure 9. All CSD detection performances by patient.
Figure 10. Sample CSD detection images in five views.
Figure 11. The proposed model for CSD detection in a color Doppler echocardiography case.
28 pages, 27981 KiB  
Article
Acoustic Imaging Learning-Based Approaches for Marine Litter Detection and Classification
by Pedro Alves Guedes, Hugo Miguel Silva, Sen Wang, Alfredo Martins, José Almeida and Eduardo Silva
J. Mar. Sci. Eng. 2024, 12(11), 1984; https://doi.org/10.3390/jmse12111984 - 3 Nov 2024
Viewed by 591
Abstract
This paper introduces an advanced acoustic imaging system leveraging multibeam water column data at various frequencies to detect and classify marine litter. This study encompasses (i) the acquisition of test tank data for diverse types of marine litter at multiple acoustic frequencies; (ii) the creation of a comprehensive acoustic image dataset with meticulous labelling and formatting; (iii) the implementation of sophisticated classification algorithms, namely support vector machine (SVM) and convolutional neural network (CNN), alongside cutting-edge detection algorithms based on transfer learning, including single-shot multibox detector (SSD) and You Only Look Once (YOLO), specifically YOLOv8. The findings reveal discrimination between different classes of marine litter across the implemented algorithms for both detection and classification. Furthermore, cross-frequency studies were conducted to assess model generalisation, evaluating the performance of models trained on one acoustic frequency when tested with acoustic images based on different frequencies. This approach underscores the potential of multibeam data in the detection and classification of marine litter in the water column, paving the way for developing novel research methods in real-life environments. Full article
(This article belongs to the Special Issue Applications of Underwater Acoustics in Ocean Engineering)
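The polar and Cartesian acoustic images referred to above come from the same water-column data: each multibeam sample is indexed by beam angle and range, and the Cartesian view simply re-projects those samples onto a metric grid. A hedged NumPy sketch with a synthetic beam/range grid (not the Kongsberg M3 data format):

```python
import numpy as np

# Synthetic water-column data: amplitude indexed by (beam, range_bin).
num_beams, num_bins = 128, 512
angles = np.linspace(-np.pi / 3, np.pi / 3, num_beams)      # 120-degree swath (assumed)
ranges = np.linspace(0.5, 30.0, num_bins)                    # metres (assumed)
amplitude = np.random.rand(num_beams, num_bins)              # stand-in for sonar returns

# Re-project every (angle, range) sample into Cartesian coordinates.
beam_grid, range_grid = np.meshgrid(angles, ranges, indexing="ij")
x = range_grid * np.sin(beam_grid)       # across-track position (m)
y = range_grid * np.cos(beam_grid)       # along-beam / depth direction (m)

# Rasterize onto a Cartesian image grid with a simple nearest-cell assignment.
res = 0.1                                                    # 10 cm per pixel (assumed)
cols = ((x - x.min()) / res).astype(int)
rows = ((y - y.min()) / res).astype(int)
cartesian = np.zeros((rows.max() + 1, cols.max() + 1))
cartesian[rows, cols] = amplitude
print("Cartesian image shape:", cartesian.shape)
```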
Figure 1. Marine litter in the water column. Courtesy of Unsplash by Naja Jensen.
Figure 2. Kongsberg M3 Multibeam High-Frequency Echosounder system setup in the test tank. (a) Test tank setup; (b) MBES capturing the wooden deck in the water column.
Figure 3. Marine debris used for the test tank dataset. PVC squares (1); PVC traffic cone (2); wooden deck (3); vinyl sheet (4); fish net (5).
Figure 4. High-level architecture for the MBES sensor and acoustic imaging for detection and classification problems.
Figure 5. Raw acoustic images of a PVC square at the same range, with varying FOV due to the different acoustic frequencies. (a) Raw acoustic image at 1200 kHz; (b) Raw acoustic image at 1400 kHz.
Figure 6. Cartesian acoustic image of a PVC square in the water column.
Figure 7. Polar acoustic image of a PVC square in the water column.
Figure 8. Class activation map applied to the CNN with a polar image of a PVC square as input.
Figure 9. SSD model inference on two polar acoustic images with multiple targets, with target detection confidences.
Figure 10. YOLOv8 model inference on polar acoustic images with multiple targets, with target detection confidences.
21 pages, 14443 KiB  
Article
High-Precision Defect Detection in Solar Cells Using YOLOv10 Deep Learning Model
by Lotfi Aktouf, Yathin Shivanna and Mahmoud Dhimish
Solar 2024, 4(4), 639-659; https://doi.org/10.3390/solar4040030 - 1 Nov 2024
Viewed by 609
Abstract
This study presents an advanced defect detection approach for solar cells using the YOLOv10 deep learning model. Leveraging a comprehensive dataset of 10,500 solar cell images annotated with 12 distinct defect types, our model integrates Compact Inverted Blocks (CIBs) and Partial Self-Attention (PSA) modules to enhance feature extraction and classification accuracy. Training on the Viking cluster with state-of-the-art GPUs, our model achieved remarkable results, including a mean Average Precision (mAP@0.5) of 98.5%. Detailed analysis of the model’s performance revealed exceptional precision and recall rates for most defect classes, notably achieving 100% accuracy in detecting black core, corner, fragment, scratch, and short circuit defects. Even for challenging defect types such as a thick line and star crack, the model maintained high performance, with accuracies of 94% and 96%, respectively. The Recall–Confidence and Precision–Recall curves further demonstrate the model’s robustness and reliability across varying confidence thresholds. This research not only advances the state of automated defect detection in photovoltaic manufacturing but also underscores the potential of YOLOv10 for real-time applications. Our findings suggest significant implications for improving the quality control process in solar cell production. Although the model demonstrates high accuracy across most defect types, certain subtle defects, such as thick lines and star cracks, remain challenging, indicating potential areas for further optimization in future work. Full article
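The mAP@0.5 figure quoted above is the mean, over defect classes, of the average precision computed from each class's precision-recall curve at an IoU threshold of 0.5. A minimal sketch of the AP integration step; the detection-to-ground-truth matching that produces the TP/FP flags is assumed to have been done already, and the toy numbers are illustrative.

```python
import numpy as np

def average_precision(scores: np.ndarray, is_tp: np.ndarray, num_gt: int) -> float:
    """All-point-interpolated AP from per-detection confidences and TP/FP flags."""
    order = np.argsort(-scores)                       # rank detections by confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Make precision monotonically non-increasing, then integrate over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    precision = np.concatenate(([precision[0]], precision))
    return float(np.sum(np.diff(recall) * precision[1:]))

# Toy example: five detections for one defect class, four ground-truth boxes.
scores = np.array([0.95, 0.9, 0.8, 0.6, 0.3])
is_tp = np.array([True, True, False, True, True])
print(f"AP@0.5 for this class: {average_precision(scores, is_tp, num_gt=4):.3f}")
# mAP@0.5 is the mean of this value over the defect classes.
```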
Figure 1. YOLOv10 model architecture: (a) the Compact Inverted Block (CIB); (b) the Partial Self-Attention module (PSA); (c) the overall YOLOv10 model architecture.
Figure 2. Examples of defect types in the EL Solar Cells dataset [34]. The dataset includes 12 classes of defects: line crack, star crack, finger interruption, black core, vertical dislocation, horizontal dislocation, thick line, scratch, fragment, corner, short circuit, and printing error. Each defect class is highlighted with colored bounding boxes for visual reference.
Figure 3. Pair plot showing the distribution and relationships between the bounding box coordinates (x, y) and dimensions (width, height) in the training dataset, which helps visualize the data distribution and correlations essential for effective model training.
Figure 4. Recall–Confidence curve for the YOLOv10 model across different defect classes in the EL Solar Cells dataset, illustrating how recall varies with the confidence threshold for each class.
Figure 5. Detection results of the YOLOv10 model on the EL Solar Cells dataset, showing various defect types, including cracks, finger interruptions, star cracks, and black core defects, with bounding boxes and labels indicating the detected defects.
Figure 6. Normalized confusion matrix for the YOLOv10 model on the EL Solar Cells dataset; diagonal elements represent correct predictions and off-diagonal elements indicate misclassifications.
Figure 7. Precision–Recall curve for the YOLOv10 model on the EL Solar Cells dataset, illustrating the trade-off between precision and recall for each defect class. The mean Average Precision (mAP@0.5) across all classes is 0.985.
17 pages, 2483 KiB  
Article
Fire and Smoke Detection in Complex Environments
by Furkat Safarov, Shakhnoza Muksimova, Misirov Kamoliddin and Young Im Cho
Fire 2024, 7(11), 389; https://doi.org/10.3390/fire7110389 - 29 Oct 2024
Viewed by 575
Abstract
Fire detection is a critical task in environmental monitoring and disaster prevention, with traditional methods often limited in their ability to detect fire and smoke in real time over large areas. The rapid identification of fire and smoke in both indoor and outdoor environments is essential for minimizing damage and ensuring timely intervention. In this paper, we propose a novel approach to fire and smoke detection by integrating a vision transformer (ViT) with the YOLOv5s object detection model. Our modified model leverages the attention-based feature extraction capabilities of ViTs to improve detection accuracy, particularly in complex environments where fires may be occluded or distributed across large regions. By replacing the CSPDarknet53 backbone of YOLOv5s with ViT, the model is able to capture both local and global dependencies in images, resulting in more accurate detection of fire and smoke under challenging conditions. We evaluate the performance of the proposed model using a comprehensive Fire and Smoke Detection Dataset, which includes diverse real-world scenarios. The results demonstrate that our model outperforms baseline YOLOv5 variants in terms of precision, recall, and mean average precision (mAP), achieving a mAP@0.5 of 0.664 and a recall of 0.657. The modified YOLOv5s with ViT shows significant improvements in detecting fire and smoke, particularly in scenes with complex backgrounds and varying object scales. Our findings suggest that the integration of ViT as the backbone of YOLOv5s offers a promising approach for real-time fire detection in both urban and natural environments. Full article
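The patch-splitting step at the heart of the ViT backbone described above is easy to show in isolation: the image is cut into non-overlapping patches and each patch is projected to a token vector before the self-attention layers. A hedged PyTorch sketch of patch embedding; the patch size and embedding width are illustrative, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each one to a token vector."""
    def __init__(self, patch: int = 16, in_ch: int = 3, dim: int = 384):
        super().__init__()
        # A strided convolution is the standard trick: kernel size = stride = patch size.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                          # x: [B, 3, H, W]
        tokens = self.proj(x)                      # [B, dim, H/patch, W/patch]
        return tokens.flatten(2).transpose(1, 2)   # [B, num_patches, dim]

tokens = PatchEmbedding()(torch.randn(1, 3, 640, 640))
print(tokens.shape)    # -> [1, 1600, 384]; these tokens feed the transformer encoder
```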
Figure 1. The basic architecture of the vision transformer (ViT) integrated into the YOLOv5s framework. The image is divided into patches, each treated as an input token for the transformer, which processes these patches and captures both local and global dependencies through self-attention. This architecture enhances the model’s ability to detect fire and smoke, particularly in complex environments where objects may be occluded or distributed across large areas.
Figure 2. Modified YOLOv5s model with ViT as the backbone. The attention-based feature extraction of ViT is the key component responsible for improved detection accuracy; replacing the CSPDarknet53 backbone with ViT allows the model to capture long-range dependencies and spatial relationships more effectively, leading to more accurate detection of fire and smoke in challenging environments.
Figure 3. The data augmentation process, including random flip, random rotation, and ColorJitter.
Figure 4. Training results on the fire and smoke dataset.
Figure 5. Visualization of the results.