Search Results (244)

Search Parameters:
Keywords = Comprehensive-YOLOv5

22 pages, 13198 KiB  
Article
UAV Localization in Urban Area Mobility Environment Based on Monocular VSLAM with Deep Learning
by Mutagisha Norbelt, Xiling Luo, Jinping Sun and Uwimana Claude
Drones 2025, 9(3), 171; https://doi.org/10.3390/drones9030171 - 26 Feb 2025
Viewed by 290
Abstract
Unmanned Aerial Vehicles (UAVs) play a major role in different applications, including surveillance, mapping, and disaster relief, particularly in urban environments. This paper presents a comprehensive framework for UAV localization in outdoor environments using monocular ORB-SLAM3 integrated with optical flow and YOLOv5 for enhanced performance. The proposed system addresses the challenges of accurate localization in dynamic outdoor environments where traditional GPS methods may falter. By leveraging the capabilities of ORB-SLAM3, the UAV can effectively map its environment while simultaneously tracking its position using visual information from a single camera. The integration of optical flow techniques allows for accurate motion estimation between consecutive frames, which is critical for maintaining accurate localization amidst dynamic changes in the environment. YOLOv5 is a highly efficient model utilized for real-time object detection, enabling the system to identify and classify dynamic objects within the UAV’s field of view. This dual approach of using both optical flow and deep learning enhances the robustness of the localization process by filtering out dynamic features that could otherwise cause mapping errors. Experimental results show that the combination of monocular ORB-SLAM3, optical flow, and YOLOv5 significantly improves localization accuracy and reduces trajectory errors compared to traditional methods. In terms of absolute trajectory error and average tracking time, the suggested approach performs better than ORB-SLAM3 and DynaSLAM. Our technique is especially well suited to real-time SLAM applications in dynamic situations due to its potential to achieve lower latency and greater accuracy. These improvements increase overall efficiency and ensure more dependable performance in a variety of scenarios. The framework effectively distinguishes between static and dynamic elements, allowing for more reliable map construction and navigation. The results show that our proposed method (U-SLAM) reduces APE by up to 43.47% and RPE by 26.47% on the S000 sequence, and its accuracy is higher for sequences with moving objects and greater motion within the image. Full article
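
The dynamic-feature filtering step described above, using YOLOv5 detections to discard ORB feature points that fall on moving objects before they corrupt tracking and mapping, can be illustrated with a short sketch. This is a minimal illustration rather than the authors' implementation; the dynamic class list and the (x1, y1, x2, y2) pixel box format are assumptions.

```python
import numpy as np

# Hypothetical set of classes treated as dynamic in an urban scene.
DYNAMIC_CLASSES = {"person", "car", "bus", "truck", "bicycle", "motorbike"}

def filter_dynamic_keypoints(keypoints, detections):
    """Drop ORB keypoints that fall inside any dynamic-object bounding box.

    keypoints  : (N, 2) array of (x, y) pixel coordinates from the ORB extractor.
    detections : list of dicts {"cls": str, "box": (x1, y1, x2, y2)} from YOLOv5.
    Returns a boolean mask of keypoints considered static (safe for mapping).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    static_mask = np.ones(len(keypoints), dtype=bool)
    for det in detections:
        if det["cls"] not in DYNAMIC_CLASSES:
            continue
        x1, y1, x2, y2 = det["box"]
        inside = ((keypoints[:, 0] >= x1) & (keypoints[:, 0] <= x2) &
                  (keypoints[:, 1] >= y1) & (keypoints[:, 1] <= y2))
        static_mask &= ~inside
    return static_mask

# Toy example: two keypoints, one of which lands on a detected car.
kps = [(50, 60), (200, 220)]
dets = [{"cls": "car", "box": (180, 200, 260, 280)}]
print(filter_dynamic_keypoints(kps, dets))  # [ True False ]
```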

Figure 1. Diagram overview of UAV localization based on VSLAM with deep learning.
Figure 2. Flowchart of the proposed approach.
Figure 3. Principle of the YOLO algorithm structure.
Figure 4. YOLOv5 C3_X structure.
Figure 5. Input images processed with the YOLOv5 method; the original RGB images are displayed on the left, while the processed images are displayed on the right.
Figure 6. Results of training and testing the proposed YOLOv5 model.
Figure 7. Flowchart of ORB-SLAM.
Figure 8. ORB feature point extraction in urban area environments.
Figure 9. Optical flow and depth estimation from the input image.
Figure 10. Absolute Pose Error (APE) distribution on the S000 sequence for the proposed method compared to existing methods (x-axis: time in seconds; y-axis: error in meters). Left: APE of U-SLAM (ours); right: APE of ORB-SLAM3.
Figure 11. Relative Pose Error (RPE) distribution on the S000 sequence for the proposed method compared to existing methods (x-axis: time in seconds; y-axis: error in meters). Left: RPE of U-SLAM (ours); right: RPE of ORB-SLAM3.
Figure 12. S000 sequence relative pose error distribution graphs: left, RPE of U-SLAM; right, RPE of ORB-SLAM3.
Figure 13. Comparative graphs of three 3D paths: the ground truth is shown by the gray dashed lines, the ORB-SLAM3 trajectory by the green lines, and the U-SLAM trajectory by the blue lines.
Figure 14. Comparison of the three trajectories in xyz coordinates (left, S000-xyz) and rpy orientations (right, S000-rpy): the ground truth is shown by the gray dashed lines, the ORB-SLAM3 computed trajectories by the green lines, and the U-SLAM estimated trajectories by the blue lines.
26 pages, 17568 KiB  
Article
Research on Apple Detection and Tracking Count in Complex Scenes Based on the Improved YOLOv7-Tiny-PDE
by Dongxuan Cao, Wei Luo, Ruiyin Tang, Yuyan Liu, Jiasen Zhao, Xuqing Li and Lihua Yuan
Agriculture 2025, 15(5), 483; https://doi.org/10.3390/agriculture15050483 - 24 Feb 2025
Viewed by 189
Abstract
Accurately detecting apple fruit can crucially assist in estimating the fruit yield in apple orchards in complex scenarios. In such environments, the factors of density, leaf occlusion, and fruit overlap can affect the detection and counting accuracy. This paper proposes an improved YOLOv7-Tiny-PDE network model based on the YOLOv7-Tiny model to detect and count apples from data collected by drones, considering various occlusion and lighting conditions. First, within the backbone network, we replaced the simplified efficient layer aggregation network (ELAN) with partial convolution (PConv), reducing the network parameters and computational redundancy while maintaining the detection accuracy. Second, in the neck network, we used a dynamic detection head to replace the original detection head, effectively suppressing the background interference and capturing the background information more comprehensively, thus enhancing the detection accuracy for occluded targets and improving the fruit feature extraction. To further optimize the model, we replaced the bounding box loss function, switching from CIoU to EIoU. For fruit counting across video frames in complex occlusion scenes, we integrated the improved model with the DeepSort tracking algorithm based on Kalman filtering and motion trajectory prediction with a cascading matching algorithm. According to experimental results, compared with the baseline YOLOv7-Tiny, the improved model reduced the total parameters by 22.2% and computation complexity by 18.3%. Additionally, in data testing, precision (P) improved by 0.5%; recall (R) rose by 2.7%; the mAP and F1 scores rose by 4% and 1.7%, respectively; and the MOTA value improved by 2%. The improved model is more lightweight and maintains high detection accuracy; hence, it can be applied to detection and counting tasks in complex orchards and provides a new solution for fruit yield estimation using lightweight devices. Full article
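
The abstract mentions replacing the CIoU box regression loss with EIoU. The sketch below follows the published EIoU definition (an IoU term plus penalties on center distance and on width/height differences, each normalized by the enclosing box); it is an illustrative PyTorch implementation, not the authors' code, and assumes boxes in (x1, y1, x2, y2) format.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4).

    EIoU = 1 - IoU + center_dist^2 / diag^2 + dw^2 / cw^2 + dh^2 / ch^2,
    where cw, ch are the width/height of the smallest enclosing box.
    """
    # Intersection and union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Smallest enclosing box and its squared diagonal
    elt = torch.min(pred[:, :2], target[:, :2])
    erb = torch.max(pred[:, 2:], target[:, 2:])
    cw = (erb[:, 0] - elt[:, 0]).clamp(min=eps)
    ch = (erb[:, 1] - elt[:, 1]).clamp(min=eps)
    diag2 = cw ** 2 + ch ** 2 + eps

    # Center distance and width/height differences
    pc = (pred[:, :2] + pred[:, 2:]) / 2
    tc = (target[:, :2] + target[:, 2:]) / 2
    center2 = ((pc - tc) ** 2).sum(dim=1)
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])

    return 1 - iou + center2 / diag2 + dw ** 2 / (cw ** 2) + dh ** 2 / (ch ** 2)
```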

Figure 1. Apple images collected from the orchard, including the (a) upward view, (b) downward view, (c) frontlit, (d) backlit, (e) leaf and branch occlusions, and (f) fruit occlusion.
Figure 2. Augmented apple images: (a) original image; (b) brightness adjustment; (c) geometric transformation; (d) Gaussian noise; (e) blur; (f) image composition; (g) mosaic.
Figure 3. Sample images of lightly occluded apples.
Figure 4. Sample images of densely occluded apples.
Figure 5. Apple images annotated using LabelImg. The green rectangular boxes are labeled as target bounding boxes.
Figure 6. Structure diagram of the YOLOv7-Tiny model.
Figure 7. Structure diagram of the improved YOLOv7-Tiny-PDE model.
Figure 8. Convolution architectures: (a) conventional convolution; (b) depthwise/grouped convolution; (c) partial convolution.
Figure 9. Structure of DyHead.
Figure 10. Flowchart of the improved YOLOv7-Tiny-PDE combined with DeepSort.
Figure 11. Detection performance under light occlusion.
Figure 12. Detection performance under heavy occlusion.
Figure 13. Recognition performance under different lighting conditions.
Figure 14. Recognition performance under backlit conditions.
Figure 15. Fruit-counting results.
27 pages, 7551 KiB  
Article
RDRM-YOLO: A High-Accuracy and Lightweight Rice Disease Detection Model for Complex Field Environments Based on Improved YOLOv5
by Pan Li, Jitao Zhou, Huihui Sun and Jian Zeng
Agriculture 2025, 15(5), 479; https://doi.org/10.3390/agriculture15050479 - 23 Feb 2025
Viewed by 274
Abstract
Rice leaf diseases critically threaten global rice production by reducing crop yield and quality. Efficient disease detection in complex field environments remains a persistent challenge for sustainable agriculture. Existing deep learning-based methods for rice leaf disease detection struggle with inadequate sensitivity to subtle disease features, high computational complexity, and degraded accuracy under complex field conditions, such as background interference and fine-grained disease variations. To address these limitations, this research aims to develop a lightweight yet high-accuracy detection model tailored for complex field environments that balances computational efficiency with robust performance. We propose RDRM-YOLO, an enhanced YOLOv5-based network, integrating four key improvements: (i) a cross-stage partial network fusion module (Hor-BNFA) is integrated within the backbone network’s feature extraction stage to enhance the model’s ability to capture disease-specific features; (ii) a spatial depth conversion convolution (SPDConv) is introduced to expand the receptive field, enhancing the extraction of fine-grained features, particularly from small disease spots; (iii) SPDConv is also integrated into the neck network, where the standard convolution is replaced with a lightweight GsConv to increase the accuracy of disease localization, category prediction, and inference speed; and (iv) the WIoU Loss function is adopted in place of CIoU Loss to accelerate convergence and enhance detection accuracy. The model is trained and evaluated utilizing a comprehensive dataset of 5930 field-collected and augmented sample images comprising four prevalent rice leaf diseases: bacterial blight, leaf blast, brown spot, and tungro. Experimental results demonstrate that our proposed RDRM-YOLO model achieves state-of-the-art performance with a detection accuracy of 94.3%, and a recall of 89.6%. Furthermore, it achieves a mean Average Precision (mAP) of 93.5%, while maintaining a compact model size of merely 7.9 MB. Compared to Faster R-CNN, YOLOv6, YOLOv7, and YOLOv8 models, the RDRM-YOLO model demonstrates faster convergence and achieves the optimal result values in Precision, Recall, mAP, model size, and inference speed. This work provides a practical solution for real-time rice disease monitoring in agricultural fields, offering a very effective balance between model simplicity and detection performance. The proposed enhancements are readily adaptable to other crop disease detection tasks, thereby contributing to the advancement of precision agriculture technologies. Full article
(This article belongs to the Section Digital Agriculture)
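
The SPDConv block referenced above replaces a strided convolution with a space-to-depth rearrangement followed by a non-strided convolution, so fine detail such as small disease spots is not discarded during downsampling. A minimal PyTorch sketch for scale = 2 is shown below; the channel counts and activation are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a non-strided convolution (scale = 2).

    The space-to-depth step moves every 2x2 spatial block into the channel
    dimension (C -> 4C, H,W -> H/2,W/2), so downsampling loses no pixels;
    the following stride-1 convolution then mixes the stacked channels.
    """
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * in_ch, out_ch, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        # Interleave the four 2x2 sub-grids along the channel axis.
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

# Example: a 64-channel feature map is downsampled from 80x80 to 40x40.
feat = torch.randn(1, 64, 80, 80)
print(SPDConv(64, 128)(feat).shape)  # torch.Size([1, 128, 40, 40])
```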

Figure 1. Samples of four rice leaf diseases from dataset I: (a) bacterial blight sample; (b) leaf blast sample; (c) brown spot sample; (d) tungro sample.
Figure 2. RDRM-YOLO model structure diagram.
Figure 3. C3 module structure diagram.
Figure 4. gnConv structural diagram. Note: Proj is a convolution operation that can change the number of channels; DWConv is a deep separable convolutional layer; C is the number of channels; Mul is a multiplication operation.
Figure 5. HorBlock module structure diagram. Note: ⊗ represents weighted operation.
Figure 6. Hor-BNFA module structure diagram. Note: c1, c2, y1, y2, y3, y4 are feature information.
Figure 7. Schematic of SPDConv when scale = 2.
Figure 8. GsConv structure diagram.
Figure 9. The prediction box and the ground truth box of the loss function CIoU.
Figure 10. Rice leaf disease detection results of RDRM-YOLO: (a) bacterial blight; (b) leaf blast; (c) brown spot; (d) tungro.
Figure 11. Precision-Recall curves of the RDRM-YOLO model.
Figure 12. Visual heat maps of rice leaf disease detection of RDRM-YOLO.
Figure 13. Comparison of the detection effects among different models: (a) original graphs; (b) Faster R-CNN; (c) YOLOv7; (d) YOLOv8; and (e) RDRM-YOLO.
Figure 14. Confusion matrix of the RDRM-YOLO model on the test set.
Figure 15. Comparison of the training curves of different models.
Figure 16. Lightweight and inference speed comparison of different models.
Figure 17. Performance comparison of different models.
19 pages, 10954 KiB  
Article
YOLOv8-CBSE: An Enhanced Computer Vision Model for Detecting the Maturity of Chili Pepper in the Natural Environment
by Yane Ma and Shujuan Zhang
Agronomy 2025, 15(3), 537; https://doi.org/10.3390/agronomy15030537 - 23 Feb 2025
Viewed by 176
Abstract
In order to accurately detect the maturity of chili peppers under different lighting and natural environmental scenarios, in this study, we propose a lightweight maturity detection model, YOLOv8-CBSE, based on YOLOv8n. By replacing the C2f module in the original model with the designed C2CF module, the model integrates the advantages of convolutional neural networks and Transformer architecture, improving the model’s ability to extract local features and global information. Additionally, SRFD and DRFD modules are introduced to replace the original convolutional layers, effectively capturing features at different scales and enhancing the diversity and adaptability of the model through the feature fusion mechanism. To further improve detection accuracy, the EIoU loss function is used instead of the CIoU loss function to provide more comprehensive loss information. The results showed that the average precision (AP) of YOLOv8-CBSE for mature and immature chili peppers was 90.75% and 85.41%, respectively, with F1 scores and a mean average precision (mAP) of 81.69% and 88.08%, respectively. Compared with the original YOLOv8n, the F1 score and mAP of the improved model increased by 0.46% and 1.16%, respectively. The detection effect for chili pepper maturity under different scenarios was improved, which proves the robustness and adaptability of YOLOv8-CBSE. YOLOv8-CBSE also maintains a lightweight design with a model size of only 5.82 MB, enhancing its suitability for real-time applications on resource-constrained devices. This study provides an efficient and accurate method for detecting chili peppers in natural environments, which is of great significance for promoting intelligent and precise agricultural management. Full article
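
As a quick sanity check on the figures quoted above, the reported mAP is simply the mean of the two per-class AP values; the one-liner below reproduces it.

```python
ap_mature, ap_immature = 90.75, 85.41            # per-class AP (%) reported in the abstract
print(round((ap_mature + ap_immature) / 2, 2))   # 88.08, matching the reported mAP
```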

Figure 1. Images of chili pepper in different scenarios: (a) fair-light; (b) backlight; (c) leaf occlusion; (d) fruit overlap; (e) dense.
Figure 2. The structure of YOLOv8n.
Figure 3. The structure of the improved model YOLOv8-CBSE.
Figure 4. ConvFormer module structure.
Figure 5. SRFD module structure.
Figure 6. DRFD module structure.
Figure 7. Box plot of YOLOv8n and YOLOv8-CBSE with 5-fold cross-validation of mAP values. The red triangle represents the mean and the purple line represents the median.
Figure 8. Thermal map visualization results before and after model improvement: (a) fair-light; (b) backlight; (c) leaf occlusion; (d) fruit overlap; (e) dense.
Figure 9. Box loss curve and mAP curve before and after model improvement: (a) box loss curve; (b) mAP curve.
Figure 10. Results of maturity detection of chili pepper by different target detection models under different natural environments: (a) fair-light; (b) backlight; (c) leaf occlusion; (d) fruit overlap; (e) dense.
28 pages, 7478 KiB  
Article
A Comparative Study of YOLO Series (v3–v10) with DeepSORT and StrongSORT: A Real-Time Tracking Performance Study
by Khadijah Alkandary, Ahmet Serhat Yildiz and Hongying Meng
Electronics 2025, 14(5), 876; https://doi.org/10.3390/electronics14050876 - 23 Feb 2025
Viewed by 208
Abstract
Many previous studies have explored the integration of a specific You Only Look Once (YOLO) model with real-time trackers like Deep Simple Online and Realtime Tracker (DeepSORT) and Strong Simple Online and Realtime Tracker (StrongSORT). However, few have conducted a comprehensive and in-depth analysis of integrating the family of YOLO models with these real-time trackers to study the performance of the resulting pipeline and draw critical conclusions. This work aims to fill this gap, with the primary objective of investigating the effectiveness of integrating lightweight versions of the YOLO series with the real-time DeepSORT and StrongSORT tracking algorithms for real-time object tracking in a computationally limited environment. This work will systematically compare various lightweight YOLO versions, from YOLO version 3 (YOLOv3) to YOLO version 10 (YOLOv10), combined with both tracking algorithms. It will evaluate their performance using detailed metrics across diverse and challenging real-world datasets: the Multiple Object Tracking 2017 (MOT17) and Multiple Object Tracking 2020 (MOT20) datasets. The goal of this work is to assess the robustness and accuracy of these light models in multiple complex real-world environments in scenarios with limited computational resources. Our findings reveal that YOLO version 5 (YOLOv5), when combined with either tracker (DeepSORT or StrongSORT), offers not only a solid baseline in terms of the model’s size (enabling real-time performance on edge devices) but also competitive overall performance (in terms of Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP)). The results suggest a strong correlation between the choice of YOLO version and the tracker’s overall performance. Full article
(This article belongs to the Section Artificial Intelligence)
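
For readers less familiar with the tracking metrics compared in this study, MOTA aggregates misses, false positives, and identity switches over all frames relative to the total number of ground-truth objects. The sketch below uses the standard CLEAR-MOT definition and made-up totals; it is not tied to any specific detector/tracker pairing evaluated in the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy (CLEAR-MOT definition).

    All arguments are totals accumulated over every frame of a sequence.
    """
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Toy example with illustrative per-sequence totals.
print(mota(false_negatives=120, false_positives=80, id_switches=15, num_gt=2000))  # 0.8925
```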

Figure 1. YOLOv1 architecture [1].
Figure 2. YOLOv2 architecture [21].
Figure 3. YOLOv3 architecture [21].
Figure 4. YOLOv4 architecture [8].
Figure 5. YOLOv5 architecture [22].
Figure 6. YOLOv6 architecture [10].
Figure 7. YOLOv7 architecture [23].
Figure 8. YOLOv8 architecture [12].
Figure 9. Network architectures of (a) PAN, (b) RevCol, (c) deep supervision, and (d) PGI for YOLOv9 [24].
Figure 10. Network architectures of (a) CSPNet, (b) ELAN, and (c) GELAN for YOLOv9 [24].
Figure 11. The dual assignment strategy [13].
Figure 12. DeepSORT architecture.
Figure 13. StrongSORT architecture.
Figure 14. Chosen scenes from MOT17 and MOT20 [17,18]: (a) the MOT17 scenes; (b) the MOT20 scenes.
Figure 15. Comparison regarding number of frames and density between MOT20 and MOT17 scenes [17,18].
Figure 16. Experiment pipeline.
Figure 17. DeepSORT: precision vs. recall on MOT17 vs. MOT20 datasets.
Figure 18. DeepSORT: MOTA vs. IDsw on MOT17 and MOT20 datasets.
Figure 19. Output of DeepSORT with YOLOv5 on the MOT17-05 scene.
Figure 20. Output of DeepSORT with YOLOv3 on the MOT20-01 scene.
Figure 21. An ID switch scenario for untuned YOLOv3.
Figure 22. StrongSORT: precision vs. recall on MOT17-05 vs. MOT20-03 datasets.
Figure 23. StrongSORT: MOTP vs. IDsw on MOT17-05 vs. MOT20-03 datasets.
Figure 24. Output of StrongSORT with YOLOv5 on the MOT17-05 scene.
Figure 25. Output of StrongSORT with YOLOv5 on the MOT20-03 scene.
25 pages, 2431 KiB  
Article
Comparative Performance Evaluation of YOLOv5, YOLOv8, and YOLOv11 for Solar Panel Defect Detection
by Rahima Khanam, Tahreem Asghar and Muhammad Hussain
Solar 2025, 5(1), 6; https://doi.org/10.3390/solar5010006 - 21 Feb 2025
Viewed by 432
Abstract
The reliable operation of photovoltaic (PV) systems is essential for sustainable energy production, yet their efficiency is often compromised by defects such as bird droppings, cracks, and dust accumulation. Automated defect detection is critical for addressing these challenges in large-scale solar farms, where manual inspections are impractical. This study evaluates three YOLO object detection models (YOLOv5, YOLOv8, and YOLOv11) on a comprehensive dataset to identify solar panel defects. YOLOv5 achieved the fastest inference time (7.1 ms per image) and high precision (94.1%) for cracked panels. YOLOv8 excelled in recall for rare defects, such as bird drops (79.2%), while YOLOv11 delivered the highest mAP@0.5 (93.4%), demonstrating a balanced performance across the defect categories. Despite the strong performance for common defects like dusty panels (mAP@0.5 > 98%), bird drop detection posed challenges due to dataset imbalances. These results highlight the trade-offs between accuracy and computational efficiency, providing actionable insights for deploying automated defect detection systems to enhance PV system reliability and scalability. Full article
(This article belongs to the Special Issue Recent Advances in Solar Photovoltaic Protection)
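
Per-image inference times such as the 7.1 ms reported for YOLOv5 are typically obtained by averaging many timed forward passes after a warm-up phase. The sketch below shows a generic way to measure this in PyTorch; the toy model is a stand-in, not one of the evaluated detectors, and explicit CUDA synchronization is included because GPU kernels execute asynchronously.

```python
import time
import torch
import torch.nn as nn

def mean_latency_ms(model, input_shape=(1, 3, 640, 640), warmup=10, iters=100):
    """Average forward-pass latency in milliseconds for a given input shape."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(warmup):              # warm up caches and kernels
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()         # wait for all kernels to finish
    return (time.perf_counter() - start) * 1000 / iters

# Stand-in model: a small conv stack, purely to make the sketch runnable.
toy_detector = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.SiLU(), nn.Conv2d(16, 32, 3, 2, 1))
print(f"{mean_latency_ms(toy_detector):.2f} ms / image")
```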

Figure 1. YOLO model evolution.
Figure 2. Representative examples of classes in the solar panel dataset.
Figure 3. Confusion matrix for YOLO models.
Figure 4. F1–confidence curves of YOLO models.
Figure 5. Precision–recall (PR) curves of YOLO models.
25 pages, 5090 KiB  
Article
Research on Intelligent Verification of Equipment Information in Engineering Drawings Based on Deep Learning
by Zicheng Zhang and Yurou He
Electronics 2025, 14(4), 814; https://doi.org/10.3390/electronics14040814 - 19 Feb 2025
Viewed by 237
Abstract
This paper focuses on the crucial task of automatic recognition and understanding of table structures in engineering drawings and document processing. Given the importance of tables in information display and the urgent need for automated processing of tables in the digitalization process, an intelligent verification method is proposed. This method integrates multiple key techniques: YOLOv10 is used for table object recognition, achieving a precision of 0.891, a recall rate of 0.899, mAP50 of 0.922, and mAP50-95 of 0.677 in table recognition, demonstrating strong target detection capabilities; the improved LORE algorithm is adopted to extract table structures, breaking through the limitations of the original algorithm by segmenting large-sized images, with a table extraction accuracy rate reaching 91.61% and significantly improving the accuracy of handling complex tables; RapidOCR is utilized to achieve text recognition and cell correspondence, solving the problem of text-cell matching; for equipment name semantic matching, a method based on BERT is introduced and calculated using a comprehensive scoring method. Meanwhile, an improved cuckoo search algorithm is proposed to optimize the adjustment factors, avoiding local optima through sine optimization and the catfish effect. Experiments show the accuracy of equipment name matching in semantic similarity calculation approaches 100%. Finally, the paper provides a concrete system practice to prove the effectiveness of the algorithm. In conclusion, through experimental comparisons, this method exhibits excellent performance in table area location, structure recognition, and semantic matching and is of great significance and practical value in advancing table data processing technology in engineering drawings. Full article
(This article belongs to the Section Artificial Intelligence)
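
The equipment-name matching step described above compares BERT representations of two name strings. A minimal sketch using mean-pooled hidden states and cosine similarity is given below; the checkpoint name and pooling scheme are illustrative assumptions (the paper uses first-last layer average pooling and a composite score with optimized adjustment factors, which are not reproduced here).

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-chinese"   # assumed checkpoint; engineering drawings are often Chinese
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def embed(texts):
    """Mean-pooled last-hidden-state embeddings for a list of strings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

def name_similarity(a, b):
    ea, eb = embed([a, b])
    return torch.nn.functional.cosine_similarity(ea, eb, dim=0).item()

# Hypothetical equipment names from a drawing table vs. a master equipment list.
print(name_similarity("centrifugal water pump", "water pump (centrifugal)"))
```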

Figure 1. The framework of intelligent verification methods.
Figure 2. The framework of YOLOv10.
Figure 3. Illustration of the improved LORE algorithm.
Figure 4. First-last layer average pooling.
Figure 5. Improved cuckoo search algorithm flowchart.
Figure 6. Iteration curve of algorithm training effectiveness.
Figure 7. Display of recognition results.
Figure 8. Comparison of the recognition process of this paper's algorithm with the original LORE algorithm.
Figure 9. Iteration curves of the three functions for CS and ICS: (a) iteration curves of function F1; (b) iteration curves of function F2; (c) iteration curves of function F3.
Figure 10. Iteration curves of the three algorithms.
Figure 11. Schematic diagram of system process, model, and components.
Figure 12. Matching result system screenshot.
21 pages, 14761 KiB  
Article
GeoIoU-SEA-YOLO: An Advanced Model for Detecting Unsafe Behaviors on Construction Sites
by Xuejun Jia, Xiaoxiong Zhou, Zhihan Shi, Qi Xu and Guangming Zhang
Sensors 2025, 25(4), 1238; https://doi.org/10.3390/s25041238 - 18 Feb 2025
Viewed by 276
Abstract
Unsafe behaviors on construction sites are a major cause of accidents, highlighting the need for effective detection and prevention. Traditional methods like manual inspections and video surveillance often lack real-time performance and comprehensive coverage, making them insufficient for diverse and complex site environments. This paper introduces GeoIoU-SEA-YOLO, an enhanced object detection model integrating the Geometric Intersection over Union (GeoIoU) loss function and Structural-Enhanced Attention (SEA) mechanism to improve accuracy and real-time detection. GeoIoU enhances bounding box regression by considering geometric characteristics, excelling in the detection of small objects, occlusions, and multi-object interactions. SEA combines channel and multi-scale spatial attention, dynamically refining feature map weights to focus on critical features. Experiments show that GeoIoU-SEA-YOLO outperforms YOLOv3, YOLOv5s, YOLOv8s, and SSD, achieving high precision (mAP@0.5 = 0.930), recall, and small object detection in complex scenes, particularly for unsafe behaviors like missing safety helmets, vests, or smoking. Ablation studies confirm the independent and combined contributions of GeoIoU and SEA to performance gains, providing a reliable solution for intelligent safety management on construction sites. Full article
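
The SEA mechanism above is described as combining channel attention with multi-scale spatial attention. Its exact formulation is not given in this listing, so the sketch below shows only a generic squeeze-and-excitation style channel-attention block as an illustration of what the channel branch of such a mechanism does; it is not the authors' SEA module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention.

    Global-average-pooled channel statistics pass through a small bottleneck
    MLP and a sigmoid; the resulting per-channel weights rescale the feature map.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))      # (B, C) channel weights in [0, 1]
        return x * w.view(b, c, 1, 1)

print(ChannelAttention(64)(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```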

Figure 1. YOLOv5 network structure diagram.
Figure 2. The architecture of the SEA-YOLO network.
Figure 3. Schematic diagram of the GeoIoU structure.
Figure 4. Workflow of the Structural-Enhanced Attention (SEA) mechanism.
Figure 5. Example of datasets.
Figure 6. Comparison of Grad-CAM visualizations across different mainstream algorithms. (a–c) represent different picture examples.
Figure 7. Comparison of detection results in ablation experiments.
Figure 8. Comparison of loss function graph with baseline method.
24 pages, 13033 KiB  
Article
Detection of Parabolic Antennas in Satellite Inverse Synthetic Aperture Radar Images Using Component Prior and Improved-YOLOv8 Network in Terahertz Regime
by Liuxiao Yang, Hongqiang Wang, Yang Zeng, Wei Liu, Ruijun Wang and Bin Deng
Remote Sens. 2025, 17(4), 604; https://doi.org/10.3390/rs17040604 - 10 Feb 2025
Viewed by 417
Abstract
Inverse Synthetic Aperture Radar (ISAR) images of space targets and their key components are very important. However, this method suffers from numerous drawbacks, including a low Signal-to-Noise Ratio (SNR), blurred edges, significant variations in scattering intensity, and limited data availability, all of which constrain its recognition capabilities. The terahertz (THz) regime has shown excellent capability for space detection, revealing fine details of target structures. However, in ISAR images, as the observation aperture moves, the imaging features of the extended structures (ESs) undergo significant changes, posing challenges to the subsequent recognition performance. In this paper, a parabolic antenna is taken as the research object. An innovative approach for identifying this component is proposed that effectively exploits the Component Prior and Imaging Characteristics (CPICs). In order to tackle the challenges associated with component identification in satellite ISAR imagery, this study employs the Improved-YOLOv8 model, which was developed by incorporating the YOLOv8 algorithm, an adaptive detection head known as the Dynamic head (Dyhead) that utilizes an attention mechanism, and a regression box loss function called Wise Intersection over Union (WIoU), which addresses the issue of varying sample difficulty. After being trained on the simulated dataset, the model demonstrated a considerable enhancement in detection accuracy over the five base models, reaching an mAP50 of 0.935 and an mAP50-95 of 0.520. Compared with YOLOv8n, it improved by 0.192 and 0.076 in mAP50 and mAP50-95, respectively. Ultimately, the effectiveness of the suggested method is confirmed through the execution of comprehensive simulations and anechoic chamber tests. Full article
(This article belongs to the Special Issue Advanced Spaceborne SAR Processing Techniques for Target Detection)
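
Both mAP50 and mAP50-95 are reported above; the latter is simply AP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below performs that averaging with made-up per-threshold values (not the paper's numbers).

```python
import numpy as np

def map_50_95(ap_per_threshold):
    """Mean AP over the IoU thresholds 0.50:0.05:0.95 (expects 10 values)."""
    assert len(ap_per_threshold) == 10
    return float(np.mean(ap_per_threshold))

# Illustrative per-threshold AP values for one class (not from the paper).
ap = [0.93, 0.90, 0.86, 0.80, 0.72, 0.60, 0.45, 0.30, 0.15, 0.05]
print(map_50_95(ap))  # 0.576
```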

Figure 1. The overall framework diagram of the proposed method.
Figure 2. The observational geometry for space-based terahertz radar in detecting space targets.
Figure 3. Geometry projection diagram of ISAR imaging.
Figure 4. Parabolic antenna imaging characteristics: (a) three typical observation apertures; (b) scattering intensity versus azimuth angle; (c) the specular point; (d) the edge pair-points; (e) the ellipse arc.
Figure 5. Satellite CAD model with 5 main scattering components (left) and its geometry and size (right).
Figure 6. Imaging results and corresponding CAD under three typical observation apertures.
Figure 7. Structure diagram of Improved-YOLOv8.
Figure 8. Structure diagram of Dyhead.
Figure 9. The training samples under different apertures.
Figure 10. The distribution of bounding boxes within the dataset.
Figure 11. The mAP50 (left) and mAP50-95 (right) of different networks in the training set.
Figure 12. A comparison of the detection performance of different algorithms on EM data.
Figure 13. mAP50 and mAP50-95 of different networks.
Figure 14. PR curves for three different objects.
Figure 15. Anechoic chamber experiment and satellite mock-up presentation: (a) terahertz radar technology system; (b) satellite model for anechoic chamber experiment.
Figure 16. Comparison of performance between different networks on anechoic chamber data.
24 pages, 5866 KiB  
Article
A Data-Driven Approach for Automatic Aircraft Engine Borescope Inspection Defect Detection Using Computer Vision and Deep Learning
by Thibaud Schaller, Jun Li and Karl W. Jenkins
J. Exp. Theor. Anal. 2025, 3(1), 4; https://doi.org/10.3390/jeta3010004 - 5 Feb 2025
Viewed by 441
Abstract
Regular aircraft engine inspections play a crucial role in aviation safety. However, traditional inspections are often performed manually, relying heavily on the judgment and experience of operators. This paper presents a data-driven deep learning framework capable of automatically detecting defects on reactor blades. Specifically, this study develops Deep Neural Network models to detect defects in borescope images using various datasets, based on Computer Vision and YOLOv8n object detection techniques. Firstly, reactor blade images are collected from public resources and then annotated and preprocessed into different groups based on Computer Vision techniques. In addition, synthetic images are generated using Deep Convolutional Generative Adversarial Networks and a manual data augmentation approach by randomly pasting defects onto reactor blade images. YOLOv8n-based deep learning models are subsequently fine-tuned and trained on these dataset groups. The results indicate that the model trained on wide-shot blade images performs better overall at detecting defects on blades compared to the model trained on zoomed-in images. The comparison of multiple models’ results reveals inherent uncertainties in model performance: while some models trained on data enhanced by Computer Vision techniques may appear more reliable for some types of defect detection, the relationship between these techniques and the resulting performance cannot be generalized. The impact of epochs and optimizers on the model’s performance indicates that incorporating rotated images and selecting an appropriate optimizer are key factors for effective model training. Furthermore, models trained solely on artificially generated images from collages perform poorly at detecting defects in real images. A potential solution is to train the model on both synthetic and real images. Future work will focus on improving the framework’s performance and conducting a more comprehensive uncertainty analysis by utilizing larger and more diverse datasets, supported by enhanced computational power. Full article
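
One of the synthetic-data strategies described above pastes defect crops at random positions onto clean blade images. A simplified PIL sketch is shown below; the file paths are placeholders, and the real pipeline (alpha masks, blending, and YOLO-format label generation for every paste) is more involved.

```python
import random
from PIL import Image

def paste_defect(blade_path, defect_path, out_path):
    """Paste a defect crop at a random location on a blade image and
    return the pasted bounding box as (x1, y1, x2, y2) for label generation."""
    blade = Image.open(blade_path).convert("RGB")
    defect = Image.open(defect_path).convert("RGBA")   # alpha channel acts as the paste mask
    max_x = blade.width - defect.width
    max_y = blade.height - defect.height
    x, y = random.randint(0, max_x), random.randint(0, max_y)
    blade.paste(defect, (x, y), mask=defect)
    blade.save(out_path)
    return (x, y, x + defect.width, y + defect.height)

# Placeholder paths; a real run would loop over many blade/defect pairs.
# box = paste_defect("blade_0001.jpg", "crack_crop.png", "synthetic_0001.jpg")
```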

Figure 1. An aircraft engine borescope inspection automation framework based on Computer Vision and deep learning.
Figure 2. Supervision of objects studied with CVAT.
Figure 3. Different objects and defects studied.
Figure 4. Example images from Dataset 1 with wide view (top) and Dataset 2 with zoom-in view (bottom).
Figure 5. Normalized confusion matrix of Model1_100.
Figure 6. Confusion matrix of Model1_100.
Figure 7. Normalized confusion matrix of Model2_100.
Figure 8. Confusion matrix of Model2_100.
Figure 9. Normalized confusion matrix of Model3_50v1.
Figure 10. Normalized confusion matrix of Model3_50v2.
Figure 11. Precision/recall curves for Model3_50v1 and Model3_50v2.
Figure 12. Normalized confusion matrix of Model3_100.
Figure 13. Normalized confusion matrix of Model3_2x50.
Figure 14. Precision/recall curves for Model3_100 and Model3_2x50.
Figure 15. Image examples from Datasets 3, 4, 5, and 6.
Figure 16. Result examples from Models 3_3x50, 4, 5, and 6 on new images.
Figure 17. Decomposition of a synthetic image.
Figure 18. Examples of synthetic images, with a single defect and several backgrounds.
Figure 19. A synthetic image with a defect on the blade edge.
Figure 20. Images generated by DCGAN showing similar characteristics.
Figure 21. Comparison between real images and images produced by DCGAN.
18 pages, 5370 KiB  
Article
Research on Blood Cell Image Detection Method Based on Fourier Ptychographic Microscopy
by Mingjing Li, Le Yang, Shu Fang, Xinyang Liu, Haijiao Yun, Xiaoli Wang, Qingyu Du, Ziqing Han and Junshuai Wang
Sensors 2025, 25(3), 882; https://doi.org/10.3390/s25030882 - 31 Jan 2025
Viewed by 491
Abstract
Autonomous Fourier Ptychographic Microscopy (FPM) is a technology widely used in the field of pathology. It is compatible with high resolution and large field-of-view imaging and can observe more image details. Red blood cells play an indispensable role in assessing the oxygen-carrying capacity of the human body and in screening for clinical diagnosis and treatment needs. In this paper, the blood cell data set is constructed based on the FPM system experimental platform. Before training, four enhancement strategies are adopted for the blood cell image data to improve the generalization and robustness of the model. A blood cell detection algorithm based on SCD-YOLOv7 is proposed. Firstly, the C-MP (Convolutional Max Pooling) module and DELAN (Deep Efficient Learning Automotive Network) module are used in the feature extraction network to optimize the feature extraction process and improve the extraction ability of overlapping cell features by considering the characteristics of channels and spatial dimensions. Secondly, through the Sim-Head detection head, the global information of the deep feature map and the local details of the shallow feature map are fully utilized to improve the performance of the algorithm for small-target detection. The mean average precision (mAP) is used as a comprehensive indicator for evaluating the performance of object detection algorithms; it measures the accuracy and robustness of a model by calculating the average precision (AP) across different categories or thresholds. Finally, the Focal-EIoU (Focal Extended Intersection over Union) loss function is introduced, which not only improves the convergence speed of the model but also significantly improves the accuracy of blood cell detection. Through quantitative and qualitative analysis of ablation experiments and comparative experimental results, the detection accuracy of the SCD-YOLOv7 algorithm on the blood cell data set reached 92.4%, an increase of 7.2%, while the computation was reduced by 14.6 G. Full article
(This article belongs to the Section Sensing and Imaging)
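
The abstract defines mAP in terms of per-class average precision (AP). The sketch below shows the standard all-point interpolation of AP from a precision-recall curve; the example curve is illustrative and not taken from the paper.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve with all-point interpolation.

    recall    : increasing recall values at each score threshold.
    precision : precision values at the same thresholds.
    """
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the area over segments where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy precision-recall curve, not taken from the paper.
print(average_precision(np.array([0.2, 0.5, 0.8, 0.9]),
                        np.array([1.0, 0.9, 0.75, 0.6])))  # 0.755
```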

Figure 1. Data set production with Fourier ptychographic microscopy imaging technology.
Figure 2. SCD-YOLOv7 network structure.
Figure 3. The network structure of YOLOv7.
Figure 4. The network structure of SCD-YOLOv7.
Figure 5. MP module before and after improvement.
Figure 6. ELAN module before and after improvement.
Figure 7. Sim-Head detection head.
Figure 8. SIMAM attention mechanism.
Figure 9. P-R curve of the SCD-YOLOv7 model.
Figure 10. F1-score curve of the SCD-YOLOv7 model.
Figure 11. Blood cell detection effect diagram.
26 pages, 29753 KiB  
Article
YOLO-SSW: An Improved Detection Method for Printed Circuit Board Surface Defects
by Tizheng Yuan, Zhengkuo Jiao and Naizhe Diao
Mathematics 2025, 13(3), 435; https://doi.org/10.3390/math13030435 - 28 Jan 2025
Viewed by 813
Abstract
Accurately recognizing tiny defects on printed circuit boards (PCBs) remains a significant challenge due to the abundance of small targets and complex background textures. To tackle this issue, this article proposes a novel YOLO-SPD-SimAM-WIoU (YOLO-SSW) network, based on an improved YOLOv8 algorithm, to detect tiny PCB defects with greater accuracy and efficiency. Firstly, a high-resolution feature layer (P2) is incorporated into the feature fusion part to preserve detailed spatial information of small targets. Secondly, a Non-strided Convolution with Space-to-Depth (Conv-SPD) module is incorporated to retain fine-grained information by replacing traditional strided convolutions, which helps maintain spatial resolution. Thirdly, the Simple Parameter-Free Attention Module (SimAM) is integrated into the backbone to enhance feature extraction and noise resistance, focusing the model’s attention on small targets in relevant areas. Finally, the Wise-IoU (WIoU) loss function is adopted to dynamically adjust gradient gains, reducing the impact of low-quality examples, thereby enhancing localization accuracy. Comprehensive evaluations on publicly available PCB defect datasets have demonstrated that the proposed YOLO-SSW model significantly outperforms several state-of-the-art models, achieving a mean average precision (mAP) of 98.4%. Notably, compared to YOLOv8s, YOLO-SSW improved the mAP, precision, and recall by 0.8%, 0.6%, and 0.8%, respectively, confirming its accuracy and effectiveness. Full article
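
SimAM, integrated into the backbone above, is a parameter-free attention module that reweights each activation with an energy-based saliency term. The sketch below follows the commonly used formulation of SimAM; how it is wired into YOLO-SSW's backbone is not reproduced here.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: per-neuron weights from an energy function.

    Activations that deviate more from their channel mean receive larger
    weights; e_lambda is a small regularizer from the original SimAM paper.
    """
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)    # squared deviation per pixel
        v = d.sum(dim=(2, 3), keepdim=True) / n              # per-channel variance estimate
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5          # inverse energy
        return x * torch.sigmoid(e_inv)

print(SimAM()(torch.randn(2, 32, 20, 20)).shape)  # torch.Size([2, 32, 20, 20])
```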

Figure 1. Framework of YOLO-SSW.
Figure 2. High-resolution P2 layer architecture.
Figure 3. Conv-SPD illustration when scale = 2 for feature map processing.
Figure 4. Structure diagram of the SimAM attention module.
Figure 5. Bounding box regression loss.
Figure 6. Six defects on the PCB surface: (a) missing hole; (b) mouse bite; (c) open circuit; (d) short; (e) spur; (f) spurious copper.
Figure 7. Training loss comparison between YOLOv8 and YOLO-SSW: (a) loss curve for YOLOv8; (b) loss curve for YOLO-SSW.
Figure 8. Training performance comparison between YOLOv8 and YOLO-SSW. Red lines represent YOLO-SSW, and blue lines represent YOLOv8: (a) mAP@0.5; (b) recall; (c) precision.
Figure 9. Detection outcomes of advanced object detection models.
18 pages, 8134 KiB  
Article
YOLOv8-WD: Deep Learning-Based Detection of Defects in Automotive Brake Joint Laser Welds
by Jiajun Ren, Haifeng Zhang and Min Yue
Appl. Sci. 2025, 15(3), 1184; https://doi.org/10.3390/app15031184 - 24 Jan 2025
Viewed by 632
Abstract
The rapid advancement of industrial automation in the automotive manufacturing sector has heightened demand for welding quality, particularly in critical component welding, where traditional manual inspection methods are inefficient and prone to human error, leading to low defect recognition rates that fail to meet modern manufacturing standards. To address these challenges, an enhanced YOLOv8-based algorithm for steel defect detection, termed YOLOv8-WD (weld detection), was developed to improve accuracy and efficiency in identifying defects in steel. We implemented a novel data augmentation strategy with various image transformation techniques to enhance the model’s generalization across different welding scenarios. The Efficient Vision Transformer (EfficientViT) architecture was adopted to optimize feature representation and contextual understanding, improving detection accuracy. Additionally, we integrated the Convolution and Attention Fusion Module (CAFM) to effectively combine local and global features, enhancing the model’s ability to capture diverse feature scales. Dynamic convolution (DyConv) techniques were also employed to generate convolutional kernels based on input images, increasing model flexibility and efficiency. Through comprehensive optimization and tuning, our research achieved a mean average precision (map) at IoU 0.5 of 90.5% across multiple datasets, contributing to improved weld defect detection and offering a reliable automated inspection solution for the industry. Full article
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
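
The data augmentation strategy mentioned above combines several image transformations (the figure list below includes rotation, translation, flips, and brightness adjustment). A simple torchvision sketch of such a pipeline is given here; the parameter values are assumptions, and in a detection setting the bounding boxes would have to be transformed consistently with the image, which this classification-style pipeline does not handle.

```python
from torchvision import transforms

# Illustrative augmentation pipeline mirroring the transformation types listed
# in Figure 3 below; the exact magnitudes and probabilities are assumptions.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3),
    transforms.ToTensor(),
])

# Usage: augmented = augment(pil_image)  # pil_image is a PIL.Image of a weld joint
```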

Figure 1. Laser welding defect detection platform for automobile brake joints.
Figure 2. Weld defect data.
Figure 3. Extended sample sketch: (a) original image; (b) rotated image; (c) translated image; (d) translated image after vertical flipping; (e) image flipped vertically and horizontally; (f) brightness-adjusted image.
Figure 4. Extended sample sketch.
Figure 5. A memory-efficient sandwich layout.
Figure 6. Illustration of the proposed convolution and attention fusion module (CAFM).
Figure 7. The Grad-CAM comparison diagram (from left to right: original image, YOLOv8, and YOLOv8-WD visualization heatmaps).
Figure 8. Comparison of accuracy curves.
Figure 9. Recall curve comparison.
Figure 10. mAP@50 curve comparison.
Figure 11. Loss curve comparison.
Figure 12. Comparison of mAP@50 for different defects on the validation set before and after improvements.
Figure 13. Test set results (from left to right: original image, YOLOv8 detection image, and YOLOv8-WD detection image): (a) depression recognition map comparison; (b) slag recognition map comparison; (c) scratch recognition map comparison (content in the pink box); (d) bubble recognition map comparison.
20 pages, 5288 KiB  
Article
A Study on Multi-Scale Behavior Recognition of Dairy Cows in Complex Background Based on Improved YOLOv5
by Zheying Zong, Zeyu Ban, Chunguang Wang, Shuai Wang, Wenbo Yuan, Chunhui Zhang, Lide Su and Ze Yuan
Agriculture 2025, 15(2), 213; https://doi.org/10.3390/agriculture15020213 - 19 Jan 2025
Viewed by 699
Abstract
The daily behaviors of dairy cows, including standing, drinking, eating, and lying down, are closely associated with their physical health. Efficient and accurate recognition of dairy cow behaviors is crucial for timely monitoring of their health status and enhancing the economic efficiency of farms. To address the challenges posed by complex scenarios and significant variations in target scales in dairy cow behavior recognition within group farming environments, this study proposes an enhanced recognition method based on YOLOv5. Four Shuffle Attention (SA) modules are integrated into the upsampling and downsampling processes of the YOLOv5 model’s neck network to enhance deep feature extraction of small-scale cow targets and focus on feature information, while maintaining network complexity and real-time performance. The C3 module of the model was enhanced by incorporating Deformable convolution (DCNv3), which improves the accuracy of cow behavior characteristic identification. Finally, the original detection head was replaced with a Dynamic Detection Head (DyHead) to improve the efficiency and accuracy of cow behavior detection across different scales in complex environments. An experimental dataset comprising complex backgrounds, multiple behavior categories, and multi-scale targets was constructed for comprehensive validation. The experimental results demonstrate that the improved YOLOv5 model achieved a mean Average Precision (mAP) of 97.7%, representing a 3.7% improvement over the original YOLOv5 model. Moreover, it outperformed comparison models, including YOLOv4, YOLOv3, and Faster R-CNN, in complex background scenarios, multi-scale behavior detection, and behavior type discrimination. Ablation experiments further validate the effectiveness of the SA, DCNv3, and DyHead modules. The research findings offer a valuable reference for real-time monitoring of cow behavior in complex environments throughout the day. Full article
(This article belongs to the Section Digital Agriculture)
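
Shuffle Attention, used in the improved neck above, splits channels into groups and relies on a channel-shuffle step so that information flows between groups. The sketch below shows only that shuffle primitive (as introduced in ShuffleNet); the full SA module with its channel and spatial branches is not reproduced.

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups: (B, C, H, W) -> (B, C, H, W).

    Reshape to (B, groups, C//groups, H, W), swap the two group axes, and
    flatten back, so channels from different groups become adjacent.
    """
    b, c, h, w = x.shape
    assert c % groups == 0
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

print(channel_shuffle(torch.randn(1, 8, 4, 4), groups=2).shape)  # torch.Size([1, 8, 4, 4])
```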

Figure 1. Plan of the dairy cow test site. Note: camera 1 (Dahua P40A20-WT-1) captures the outdoor activity areas I and II of the cows; camera 2 (Xiaomi Smart Camera 3 Pan and Tilt Version) captures the indoor feeding area of the cows.
Figure 2. Dairy cow test site: (a) diagram of the camera in the cow feeding area; (b) diagram of the cow activity area camera.
Figure 3. Example of a cow behavior shot: (a) surveillance video scene; (b) example of cow behavior.
Figure 4. Behavioral labeling analysis of cows.
Figure 5. Improved network structure of the YOLOv5 model.
Figure 6. Shuffle Attention module structure.
Figure 7. Principle of deformable convolution: (a) conventional convolution; (b) deformable convolution; (c) special deformable convolution; (d) special deformable convolution. Green indicates the sampling points of the regular convolution, and blue indicates the dynamically sampled points of the deformable convolution.
Figure 8. DCNv3 convolutional implementation process.
Figure 9. DyHead structure.
Figure 10. Loss function variation during training.
Figure 11. Model accuracy changes during training.
Figure 12. Recognition results of the improved YOLOv5 model: (a) standing behavior of outdoor dairy cows; (b) drinking behavior of outdoor dairy cows; (c) lying behavior of outdoor dairy cows; (d) feeding behavior of indoor dairy cows.
Figure 13. Visual comparison of the original model and attention models: (a) no attention mechanism; (b) SE attention mechanism; (c) EMA attention mechanism; (d) SA attention mechanism. Red indicates the highest attention weight; yellow indicates a medium-high attention weight; green indicates a medium attention weight; blue indicates a low attention weight; dark blue indicates the lowest attention weight.
16 pages, 3776 KiB  
Article
MDA-DETR: Enhancing Offending Animal Detection with Multi-Channel Attention and Multi-Scale Feature Aggregation
by Haiyan Zhang, Huiqi Li, Guodong Sun and Feng Yang
Animals 2025, 15(2), 259; https://doi.org/10.3390/ani15020259 - 17 Jan 2025
Viewed by 514
Abstract
Conflicts between humans and animals in agricultural and settlement areas have recently increased, resulting in significant resource loss and risks to human and animal lives. This growing issue presents a global challenge. This paper addresses the detection and identification of offending animals, particularly in obscured or blurry nighttime images. This article introduces Multi-Channel Coordinated Attention and Multi-Dimension Feature Aggregation (MDA-DETR). It integrates multi-scale features for enhanced detection accuracy, employing a Multi-Channel Coordinated Attention (MCCA) mechanism to incorporate location, semantic, and long-range dependency information and a Multi-Dimension Feature Aggregation Module (DFAM) for cross-scale feature aggregation. Additionally, the VariFocal Loss function is utilized to assign pixel weights, enhancing detail focus and maintaining accuracy. In the dataset section, this article uses a dataset from the Northeast China Tiger and Leopard National Park, which includes images of six common offending animal species. In the comprehensive experiments on the dataset, the mAP50 index of MDA-DETR was 1.3%, 0.6%, 0.3%, 3%, 1.1%, and 0.5% higher than RT-DETR-r18, yolov8n, yolov9-C, DETR, Deformable-detr, and DCA-yolov8, respectively, indicating that MDA-DETR is superior to other advanced methods. Full article
(This article belongs to the Special Issue Animal–Computer Interaction: Advances and Opportunities)
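
The VariFocal Loss mentioned above weights positive samples by their target quality score while focally down-weighting easy negatives. The sketch below follows the commonly used formulation; the hyperparameters and the way MDA-DETR applies it are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Varifocal loss on raw logits.

    target_score is 0 for negatives and the target quality (e.g., IoU with the
    matched ground truth) for positives; negatives are focally down-weighted.
    """
    pred = pred_logits.sigmoid()
    weight = target_score + alpha * pred.pow(gamma) * (target_score == 0).float()
    loss = F.binary_cross_entropy_with_logits(pred_logits, target_score, reduction="none")
    return (loss * weight).sum()

# Toy example: two negatives and one positive with quality score 0.8.
logits = torch.tensor([-2.0, 0.5, 1.5])
targets = torch.tensor([0.0, 0.0, 0.8])
print(varifocal_loss(logits, targets))
```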

Figure 1. The overall structure of MDA-DETR.
Figure 2. The structure diagram of MCCA.
Figure 3. Structure of Fuse and RepC3 in DFAM.
Figure 4. Comparison of ablation experiments. (The orange animal in the upper right corner of the third column is the dataset icon.)
Figure 5. Comparison between ours and advanced methods (badger, black bear, leopard cat).
Figure 6. Comparison between ours and advanced methods (red fox, yellow weasel, wild boar).