Search Results (207)

Search Parameters:
Keywords = Comprehensive-YOLOv5

17 pages, 1370 KiB  
Article
FL-YOLOv8: Lightweight Object Detector Based on Feature Fusion
by Ying Xue, Qijin Wang, Yating Hu, Yu Qian, Long Cheng and Hongqiang Wang
Electronics 2024, 13(23), 4653; https://doi.org/10.3390/electronics13234653 - 25 Nov 2024
Abstract
In recent years, anchor-free object detectors have become predominant in deep learning. The YOLOv8 model, a real-time anchor-free object detector, is versatile and influential and efficiently detects objects across multiple scales. However, the model's generalization performance is limited, the feature fusion in the neck module relies heavily on its structural design and on dataset size, and small objects are particularly difficult to localize and detect. To address these issues, we propose the FL-YOLOv8 object detector, an improvement on YOLOv8s. Firstly, we introduce the FSDI module in the neck, enhancing semantic information across all layers and incorporating rich detailed features through straightforward layer-hopping connections. This module integrates both high-level and low-level information to improve the accuracy and efficiency of image detection. Meanwhile, the model structure is optimized and the LSCD module is constructed in the detection head; adopting a lightweight shared convolutional detection head reduces the number of parameters and the computation of the model by 19% and 10%, respectively. Our model achieves a comprehensive performance of 45.5% on the general-purpose COCO dataset, surpassing the baseline by 0.8 percentage points. To further validate the effectiveness of the method, experiments were also performed on domain-specific urine sediment data (FCUS22), and the category-level detection results further support the FL-YOLOv8 object detection algorithm. Full article
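The lightweight shared convolutional detection head mentioned above can be illustrated with a short sketch. This is not the paper's LSCD module; it is a minimal PyTorch example, with assumed channel widths and output size, showing why reusing one convolution stack across pyramid levels cuts head parameters compared with separate per-level heads.

```python
import torch
import torch.nn as nn

class SharedDetectHead(nn.Module):
    """Minimal shared detection head: one conv stack reused on every pyramid level."""
    def __init__(self, channels: int = 128, num_outputs: int = 84):  # 84 is a placeholder width
        super().__init__()
        # The same weights are applied to P3, P4 and P5 instead of three separate heads.
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SiLU(),
        )
        self.predict = nn.Conv2d(channels, num_outputs, 1)

    def forward(self, feats):
        # feats: list of [B, C, Hi, Wi] maps from the neck (equal channel width assumed).
        return [self.predict(self.shared(f)) for f in feats]

if __name__ == "__main__":
    head = SharedDetectHead()
    feats = [torch.randn(1, 128, s, s) for s in (80, 40, 20)]
    print([o.shape for o in head(feats)])
    print("parameters:", sum(p.numel() for p in head.parameters()))
```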
27 pages, 10500 KiB  
Article
YOLOv8-GDCI: Research on the Phytophthora Blight Detection Method of Different Parts of Chili Based on Improved YOLOv8 Model
by Yulong Duan, Weiyu Han, Peng Guo and Xinhua Wei
Agronomy 2024, 14(11), 2734; https://doi.org/10.3390/agronomy14112734 - 20 Nov 2024
Viewed by 323
Abstract
Smart farms are crucial in modern agriculture, but current object detection algorithms cannot detect chili Phytophthora blight accurately. To solve this, we introduced the YOLOv8-GDCI model, which can detect the disease on leaves, fruits, and stem bifurcations. The model uses RepGFPN for feature fusion, Dysample upsampling for accuracy, CA attention for feature capture, and Inner-MPDIoU loss for small object detection. In addition, we created a dataset of chili Phytophthora blight on leaves, fruits, and stem bifurcations, and conducted comparative experiments. The results show that the YOLOv8-GDCI model performs strongly across a comprehensive set of indicators. In comparison with the YOLOv8n model, the YOLOv8-GDCI model demonstrates an improvement of 0.9% in precision, an increase of 1.8% in recall, and a notable enhancement of 1.7% in average precision. Although the FPS decreases slightly, it still exceeds the industry standard for real-time object detection (FPS > 60), thus meeting the requirements for real-time detection. Full article
(This article belongs to the Section Precision and Digital Agriculture)
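The Inner-MPDIoU loss named in the abstract belongs to the family of IoU-based bounding-box losses; its exact formulation is in the paper. The sketch below shows only the plain IoU term that such losses build on, using corner-format boxes (x1, y1, x2, y2) with made-up example coordinates.

```python
import torch

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Plain IoU for paired boxes in (x1, y1, x2, y2) format; a and b have shape [N, 4]."""
    lt = torch.max(a[:, :2], b[:, :2])          # intersection top-left corners
    rb = torch.min(a[:, 2:], b[:, 2:])          # intersection bottom-right corners
    wh = (rb - lt).clamp(min=0)                 # clamp handles non-overlapping pairs
    inter = wh[:, 0] * wh[:, 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)

# Example: two predicted boxes against two ground-truth boxes (toy values).
pred = torch.tensor([[10., 10., 50., 50.], [0., 0., 20., 20.]])
gt   = torch.tensor([[12., 12., 48., 52.], [30., 30., 60., 60.]])
print(box_iou(pred, gt))  # IoU-family losses typically minimize 1 - IoU plus penalty terms
```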
Figure 1. Image data enhancement. (a) Original image, (b) Random flipping, (c) Gaussian noise, (d) Random clipping, (e) Brightness change.
Figure 2. The distribution chart of dataset labels. (a) The amount of data in the training set, and how many instances there are for each category. (b) The size and number of bounding boxes. (c) The position of the center point relative to the entire image. (d) The aspect ratio of the target in the image compared to the entire image.
Figure 3. The structure of YOLOv8.
Figure 4. The structure of PAN (a) and GFPN (b).
Figure 5. Multi-scale feature fusion network structure and module design. (a) RepGFPN removes the up-sample connection and uses the CSPStage module for feature fusion. (b) The CSPStage module performs feature fusion operations; ×N means that there are N structures identical to those in the dashed box. (c) The 3 × 3 Rep technique reparameterizes the model, decreasing computational requirements and improving model efficiency.
Figure 6. The design of DySample's modules and its network architecture. (a) X represents the input feature, X′ the upsampled feature, and S the sampling set; the sampling point generator produces a sampling set, which is then used to resample the input feature through the grid sampling function. (b) X1, X2, and X3 represent offsets with a size of 2gs² × H × W; O is the generated offset, G the original grid, and σ the sigmoid function.
Figure 7. The structure of SE (a) and CBAM (b).
Figure 8. The structure of CA.
Figure 9. The network structure diagram of our YOLOv8-GDCI algorithm.
Figure 10. Visual demonstration using different feature networks. (a) Truth, (b) PAN, (c) GFPN, (d) BiFPN, (e) RepGFPN.
Figure 11. Contrast experiments of different loss functions in loss values.
Figure 12. The visualization results in different environments.
19 pages, 4118 KiB  
Article
Complex Indoor Human Detection with You Only Look Once: An Improved Network Designed for Human Detection in Complex Indoor Scenes
by Yufeng Xu and Yan Fu
Appl. Sci. 2024, 14(22), 10713; https://doi.org/10.3390/app142210713 - 19 Nov 2024
Viewed by 442
Abstract
Indoor human detection based on artificial intelligence helps to monitor the safety status and abnormal activities of the human body at any time. However, the complex indoor environment and background pose challenges to the detection task. The YOLOv8 algorithm is a cutting-edge technology in the field of object detection, but it is still affected by indoor low-light environments and large changes in human scale. To address these issues, this article proposes a novel method based on YOLOv8 called CIHD-YOLO, which is specifically designed for indoor human detection. The method proposed in this article combines the spatial pyramid pooling of the backbone with an efficient partial self-attention, enabling the network to effectively capture long-range dependencies and establish global correlations between features, obtaining feature information at different scales. At the same time, the GSEAM module and GSCConv were introduced into the neck network to compensate for the loss caused by differences in lighting levels by combining depth-wise separable convolution and residual connections, enabling it to extract effective features from visual data with poor illumination levels. A dataset specifically designed for indoor human detection, the HCIE dataset, was constructed and used to evaluate the model proposed in this paper. The research results show that compared with the original YOLOv8s framework, the detection accuracy has been improved by 2.67%, and the required floating-point operations have been reduced. The comprehensive case analysis and comparative evaluation highlight the superiority and effectiveness of this method in complex indoor human detection tasks. Full article
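The GSEAM and GSCConv modules are specific to this paper, but the building block the abstract names, a depth-wise separable convolution combined with a residual connection, is standard. A minimal PyTorch sketch with assumed layer sizes (not the authors' exact design):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableResidual(nn.Module):
    """Depth-wise separable convolution with a residual (skip) connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Depth-wise: one 3x3 filter per channel (groups=channels).
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Point-wise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        y = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return x + y  # the residual path preserves the original signal

x = torch.randn(1, 64, 40, 40)
print(DepthwiseSeparableResidual()(x).shape)
```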
Figure 1. Network architecture of YOLOv8.
Figure 2. Network architecture of CIHD-YOLO.
Figure 3. Network architecture of spatial pyramid pooling with effective partial self-attention (SPPEPSA).
Figure 4. Network architecture of RepNCSP and RepNBottleneck. (a) RepNCSP; (b) RepNBottleneck.
Figure 5. Network architecture of the generalized separated and enhancement aggregation network (GSEAM).
Figure 6. Network architecture of global spatial and channel reconstruction convolution (GSCConv).
Figure 7. Network architecture of spatial and channel reconstruction convolution (SCConv).
Figure 8. Example of indoor human detection images in the HCIE dataset.
Figure 9. Dataset label distribution. (a) The position of the bounding box center point relative to the entire image; (b) The aspect ratio of the target in the image relative to the entire image.
Figure 10. Box loss curve for model training.
Figure 11. Curves of mAP50 and mAP50-95 during the training process.
Figure 12. Test image set under low illumination.
Figure 13. Test image set of small-scale human body.
Figure 14. Test image set rotated by a 30° angle.
18 pages, 5301 KiB  
Article
Research and Design of an Active Light Source System for UAVs Based on Light Intensity Matching Model
by Rui Ming, Tao Wu, Zhiyan Zhou, Haibo Luo and Shahbaz Gul Hassan
Drones 2024, 8(11), 683; https://doi.org/10.3390/drones8110683 - 19 Nov 2024
Viewed by 352
Abstract
The saliency feature is a key factor in achieving vision-based tracking for multi-UAV control. However, due to the complex and variable environments encountered during multi-UAV operations, such as changes in lighting conditions and scale variations, the UAV's visual features may degrade, especially under high-speed movement, ultimately resulting in failure of the vision tracking task and reducing the stability and robustness of swarm flight. Therefore, this paper proposes an adaptive active light source system based on light intensity matching to address the issue of visual feature loss caused by environmental light intensity and scale variations in multi-UAV collaborative navigation. The system consists of three components: an environment sensing and control module, a variable active light source module, and a light source power module. This paper first designs the overall framework of the active light source system, detailing the functions of each module and their collaborative working principles. Furthermore, optimization experiments are conducted on the variable active light source module. By comparing the recognition effects of the variable active light source module under different parameters, the best configuration is selected. In addition, to improve the robustness of the active light source system under different lighting conditions, this paper also constructs a light source color matching model based on light intensity matching. By collecting and comparing visible light images of different color light sources under various intensities and constructing the light intensity matching model using the comprehensive peak signal-to-noise ratio parameter, the model is optimized to ensure the best vision tracking performance under different lighting conditions. Finally, to validate the effectiveness of the proposed active light source system, quantitative and qualitative recognition comparison experiments were conducted in eight different scenarios with UAVs equipped with active light sources. The experimental results show that the UAV equipped with an active light source improved the recall of the YOLOv7 and RT-DETR recognition algorithms by 30% and 29.6%, the mAP50 by 21% and 19.5%, and the recognition accuracy by 13.1% and 13.6%, respectively. Qualitative experiments also demonstrated that the active light source effectively improved the recognition success rate under low lighting conditions. Extensive qualitative and quantitative experiments confirm that the UAV active light source system based on light intensity matching proposed in this paper effectively enhances the effectiveness and robustness of vision-based tracking for multi-UAVs, particularly in complex and variable environments. This research provides an efficient and computationally effective solution for vision-based multi-UAV systems, further enhancing the visual tracking capabilities of multi-UAVs under complex conditions. Full article
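The light-source matching model described above is built around a peak signal-to-noise ratio (PSNR) criterion. PSNR itself is standard; the sketch below computes it for 8-bit images with NumPy on synthetic data. The aggregation into the paper's comprehensive CREC-PSNR parameter is not reproduced here.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two images of equal shape (8-bit assumed)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Example with synthetic data: a clean frame versus a noisier capture of the same scene.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
noisy = np.clip(clean + rng.normal(0, 5, clean.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```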
Figure 1. Light intensity matched active light source system for UAVs. Note: 1. 5V-DC power supply interface; 2. MCU; 3. Light intensity sensor module; 4. Red laser constant voltage control module; 5. Blue laser constant voltage control module; 6. Light cover; 7. Red laser emission module; 8. Blue laser emission module.
Figure 2. Active light source device system workflow.
Figure 3. Different sizes of light shield.
Figure 4. Experimental principle of active light source shield size selection.
Figure 5. RD and IREL for different sizes of light shield.
Figure 6. Principle of the optimal light source color selection experiment.
Figure 7. Experimental setup for optimal light source color selection: blue light source (a), red light source (b).
Figure 8. Relationship between PSNR values of different colors and light intensity changes.
Figure 9. CREC-PSNR variation with luminance.
Figure 10. Nonlinear fitting curves for red and blue.
Figure 11. Comparison of the UAV equipped with the active light source and the conventional UAV: UAV with active light source (a), UAV without active light source (b). Note: 1. Active light source; 2. Support module; 3. UAV.
Figure 12. Data comparison of UAVs with and without active light source in various scenarios.
22 pages, 5176 KiB  
Article
A Reparameterization Feature Redundancy Extract Network for Unmanned Aerial Vehicles Detection
by Shijie Zhang, Xu Yang, Chao Geng and Xinyang Li
Remote Sens. 2024, 16(22), 4226; https://doi.org/10.3390/rs16224226 - 13 Nov 2024
Viewed by 366
Abstract
In unmanned aerial vehicles (UAVs) detection, challenges such as occlusion, complex backgrounds, motion blur, and inference time often lead to false detections and missed detections. General object detection frameworks encounter difficulties in adequately tackling these challenges, leading to substantial information loss during network downsampling, inadequate feature fusion, and an inability to meet real-time requirements. In this paper, we propose a Real-Time Small Object Detection YOLO (RTSOD-YOLO) model to tackle the various challenges faced in UAVs detection. We further enhance the adaptive nature of the Adown module by incorporating an adaptive spatial attention mechanism. This mechanism processes the downsampled feature maps, enabling the model to better focus on key regions. Secondly, to address the issue of insufficient feature fusion, we employ combined serial and parallel triple feature encoding (TFE). This approach fuses scale-sequence features from both shallow features and twice-encoded features, resulting in a new small-scale object detection layer. While enhancing the global context awareness of the existing detection layers, this also enriches the small-scale object detection layer with detailed information. Since rich redundant features often ensure a comprehensive understanding of the input, which is a key characteristic of deep neural networks, we propose a more efficient redundant feature generation module. This module generates more feature maps with fewer parameters. Additionally, we introduce reparameterization techniques to compensate for potential feature loss while further improving the model’s inference speed. Experimental results demonstrate that our proposed RTSOD-YOLO achieves superior detection performance, with mAP50/mAP50:95 reaching 97.3%/51.7%, which represents an improvement of 3%/3.5% over YOLOv8 and is 2.6%/0.1% higher than YOLOv10. Additionally, it has the lowest parameter count and FLOPs, making it highly efficient in terms of computational resources. Full article
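The redundant-feature generation module in RTSOD-YOLO is the paper's own design. As a hedged illustration of the general idea, generating extra feature maps from cheap operations instead of full convolutions, in the style of Ghost-type blocks, consider this PyTorch sketch:

```python
import torch
import torch.nn as nn

class CheapRedundantFeatures(nn.Module):
    """Produce half of the output channels with a full conv and the rest with a cheap
    depth-wise conv applied to those primary maps (Ghost-style idea, not the paper's module)."""
    def __init__(self, in_ch: int = 64, out_ch: int = 128):
        super().__init__()
        primary = out_ch // 2
        self.primary_conv = nn.Conv2d(in_ch, primary, 3, padding=1)
        self.cheap_conv = nn.Conv2d(primary, out_ch - primary, 3, padding=1, groups=primary)

    def forward(self, x):
        p = self.primary_conv(x)
        c = self.cheap_conv(p)                 # "redundant" maps derived cheaply from p
        return torch.cat([p, c], dim=1)

block = CheapRedundantFeatures()
print(block(torch.randn(1, 64, 80, 80)).shape)   # -> [1, 128, 80, 80]
print("parameters:", sum(p.numel() for p in block.parameters()))
```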
Figure 1. The architecture of RTSOD-YOLO.
Figure 2. Convolution and Adown downsampling. (a) Convolution; (b) Adown.
Figure 3. C2f and RFR-Block module. (a) Structure of C2f; (b) Structure of RFR-Block.
Figure 4. RepConv schematic diagram.
Figure 5. Triple feature encoding module.
Figure 6. Scale-sequence feature fusion module.
Figure 7. Separated and Enhancement Attention Module.
Figure 8. The unmanned aerial vehicle dataset.
Figure 9. Detection performance in different scenarios.
Figure 10. Test dataset augmented with random erasing.
Figure A1. Summary of training.
Figure A2. Confusion matrices of different models in different scenarios (occlusion, strong light irradiation, and dim scenes). (a) YOLOv5; (b) YOLOv8; (c) YOLOv9; (d) YOLOv10; (e) RTSOD-YOLO.
17 pages, 1906 KiB  
Article
Advancing Indoor Epidemiological Surveillance: Integrating Real-Time Object Detection and Spatial Analysis for Precise Contact Rate Analysis and Enhanced Public Health Strategies
by Ali Baligh Jahromi, Koorosh Attarian, Ali Asgary and Jianhong Wu
Int. J. Environ. Res. Public Health 2024, 21(11), 1502; https://doi.org/10.3390/ijerph21111502 - 13 Nov 2024
Viewed by 506
Abstract
In response to escalating concerns about the indoor transmission of respiratory diseases, this study introduces a sophisticated software tool engineered to accurately determine contact rates among individuals in enclosed spaces—essential for public health surveillance and disease transmission mitigation. The tool applies YOLOv8, a cutting-edge deep learning model that enables precise individual detection and real-time tracking from video streams. An innovative feature of this system is its dynamic circular buffer zones, coupled with an advanced 2D projective transformation to accurately overlay video data coordinates onto a digital layout of the physical environment. By analyzing the overlap of these buffer zones and incorporating detailed heatmap visualizations, the software provides an in-depth quantification of contact instances and spatial contact patterns, marking an advancement over traditional contact tracing and contact counting methods. These enhancements not only improve the accuracy and speed of data analysis but also furnish public health officials with a comprehensive framework to develop more effective non-pharmaceutical infection control strategies. This research signifies a crucial evolution in epidemiological tools, transitioning from manual, simulation, and survey-based tracking methods to automated, real time, and precision-driven technologies that integrate advanced visual analytics to better understand and manage disease transmission in indoor settings. Full article
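The projection and buffer-zone logic described above can be sketched with OpenCV: a 2D projective transformation (homography) maps detected positions from video coordinates onto the floor plan, and a contact is counted when two circular buffer zones overlap. The four correspondences and the 1 m radius below are hypothetical placeholders, not values from the study.

```python
import cv2
import numpy as np

# Four image points (pixels) and their known floor-plan coordinates (metres): placeholders.
image_pts = np.float32([[100, 400], [1180, 420], [1020, 80], [220, 70]])
floor_pts = np.float32([[0, 0], [10, 0], [10, 6], [0, 6]])
H = cv2.getPerspectiveTransform(image_pts, floor_pts)   # 2D projective transformation

def to_floor_plan(pixel_xy: np.ndarray) -> np.ndarray:
    """Map detected foot points from video coordinates to floor-plan coordinates."""
    pts = pixel_xy.reshape(-1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

def count_contacts(positions_m: np.ndarray, radius_m: float = 1.0) -> int:
    """Count pairs whose circular buffer zones overlap (centre distance < 2 * radius)."""
    contacts = 0
    for i in range(len(positions_m)):
        for j in range(i + 1, len(positions_m)):
            if np.linalg.norm(positions_m[i] - positions_m[j]) < 2 * radius_m:
                contacts += 1
    return contacts

detections = np.array([[300, 350], [320, 360], [900, 200]], dtype=np.float32)
floor_positions = to_floor_plan(detections)
print(floor_positions, "contacts:", count_contacts(floor_positions))
```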
Figure 1. Flowchart; steps include initialization, object detection using YOLOv8, real-time human tracking, dynamic buffer zones, spatial analysis, people counting and density analysis, and data handling and visualization.
Figure 2. Detecting and tracking individuals in an indoor environment: a count of 5 individuals, each with their track line (green line) and track ID (yellow numbers).
Figure 3. Transformation of occupants onto the 2D floor plan.
Figure 4. Interaction duration analysis across tracked individuals.
Figure 5. Comparative spatial interaction heatmaps depicting density and movement patterns at time 1 s and time 31 s during the experiment.
20 pages, 6129 KiB  
Article
Optimized YOLOv5 Architecture for Superior Kidney Stone Detection in CT Scans
by Khasanov Asliddin Abdimurotovich and Young-Im Cho
Electronics 2024, 13(22), 4418; https://doi.org/10.3390/electronics13224418 - 11 Nov 2024
Viewed by 569
Abstract
The early and accurate detection of kidney stones is crucial for effective treatment and improved patient outcomes. This paper proposes a novel modification of the YOLOv5 model, specifically tailored for detecting kidney stones in CT images. Our approach integrates the squeeze-and-excitation (SE) block within the C3 block of the YOLOv5m architecture, thereby enhancing the ability of the model to recalibrate channel-wise dependencies and capture intricate feature relationships. This modification leads to significant improvements in the detection accuracy and reliability. Extensive experiments were conducted to evaluate the performance of the proposed model against standard YOLOv5 variants (nano-sized, small, and medium-sized). The results demonstrate that our model achieves superior performance metrics, including higher precision, recall, and mean average precision (mAP), while maintaining a balanced inference speed and model size suitable for real-time applications. The proposed methodology incorporates advanced noise reduction and data augmentation techniques to ensure the preservation of critical features and enhance the robustness of the training dataset. Additionally, a novel color-coding scheme for bounding boxes improves the clarity and differentiation of the detected stones, facilitating better analysis and understanding of the detection results. Our comprehensive evaluation using essential metrics, such as precision, recall, mAP, and intersection over union (IoU), underscores the efficacy of the proposed model for detecting kidney stones. The modified YOLOv5 model offers a robust, accurate, and efficient solution for medical imaging applications and represents a significant advancement in computer-aided diagnosis and kidney stone detection. Full article
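The squeeze-and-excitation block referenced above is a published, standard module: global pooling squeezes each channel to a scalar, a small bottleneck MLP produces per-channel gates, and the input is rescaled channel-wise. A compact, generic PyTorch version (not the exact placement inside the authors' C3 block):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global pooling -> bottleneck MLP -> channel-wise rescaling."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # squeeze spatial dimensions
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                   # per-channel gates in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                        # recalibrate channel responses

print(SEBlock(256)(torch.randn(2, 256, 20, 20)).shape)
```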
Figure 1. Architecture of YOLOv5 with a C3 block.
Figure 2. Bottleneck blocks, each consisting of two convolutional layers with a residual connection, contribute to parameter reduction while preserving the representational capacity of the model. (a) C3 block with three convolutions; (b) SE block; (c) Bottleneck of the C3 block.
Figure 3. Comparative analysis of different YOLOv5 model variants (nano-sized, small, and medium) along with the proposed modified YOLOv5 model for the detection of kidney stones in CT images.
Figure 4. Different coloring approaches for bounding boxes of detected objects.
24 pages, 4899 KiB  
Article
Enhancing YOLOv8’s Performance in Complex Traffic Scenarios: Optimization Design for Handling Long-Distance Dependencies and Complex Feature Relationships
by Bingyu Li, Qiao Meng, Xin Li, Zhijie Wang, Xin Liu and Siyuan Kong
Electronics 2024, 13(22), 4411; https://doi.org/10.3390/electronics13224411 - 11 Nov 2024
Viewed by 549
Abstract
In recent years, the field of deep learning and computer vision has increasingly focused on the problem of vehicle target detection, becoming the forefront of many technological innovations. YOLOv8, as an efficient vehicle target detection model, has achieved good results in many scenarios. However, when faced with complex traffic scenarios, such as occluded targets, small target detection, changes in lighting, and variable weather conditions, YOLOv8 still has insufficient detection accuracy and robustness. To address these issues, this paper delves into the optimization strategies of YOLOv8 in the field of vehicle target detection, focusing on the EMA module in the backbone part and replacing the original SPPF module with focal modulation technology, all of which effectively improved the model’s performance. At the same time, modifications to the head part were approached with caution to avoid unnecessary interference with the original design. The experiment used the UA-DETRAC dataset, which contains a variety of traffic scenarios, a rich variety of vehicle types, and complex dynamic environments, making it suitable for evaluating and validating the performance of traffic monitoring systems. The 5-fold cross-validation method was used to ensure the reliability and comprehensiveness of the evaluation results. The final results showed that the improved model’s precision rate increased from 0.859 to 0.961, the recall rate from 0.83 to 0.908, and the mAP50 from 0.881 to 0.962. Meanwhile, the optimized YOLOv8 model demonstrated strong robustness in terms of detection accuracy and the ability to adapt to complex environments. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Image and Video Processing)
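The 5-fold cross-validation protocol mentioned in the abstract is standard; here is a minimal sketch with scikit-learn's KFold. The image list and the train/evaluate hooks are hypothetical placeholders, not the authors' pipeline.

```python
from sklearn.model_selection import KFold
import numpy as np

image_ids = np.arange(1000)               # placeholder for the UA-DETRAC frame indices
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(image_ids)):
    # train_model / evaluate_model are hypothetical hooks for the detector pipeline.
    # model = train_model(image_ids[train_idx])
    # scores.append(evaluate_model(model, image_ids[val_idx]))
    print(f"fold {fold}: {len(train_idx)} training frames, {len(val_idx)} validation frames")

# Final metrics (precision, recall, mAP50) would be averaged over the five folds:
# print("mean mAP50:", np.mean(scores))
```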
Figure 1. YOLOv8 Structural Framework.
Figure 2. SPPF Structural Framework.
Figure 3. Self-Attention.
Figure 4. Focal Modulation.
Figure 5. EMA Structural Framework.
Figure 6. Reconstructing the Backbone Network.
Figure 7. Comparison of Original and Our Data.
Figure 8. Five-Fold Cross-Validation.
Figure 9. Results.
Figure 10. P_curve.
Figure 11. Congested Scene.
Figure 12. Night Scene.
Figure 13. Rainy Scene.
Figure 14. Haze Scene.
Figures 15-30. Detection results 1-4 from YOLOv8, our model, EfficientDet, and Rank-DETR.
Figures 31-33. Original Images 1-3.
Figures 34-36. YOLOv8 Heatmaps 1-3.
Figures 37-39. Our Heatmaps 1-3.
16 pages, 4399 KiB  
Article
Lightweight Vehicle Detection Based on Mamba_ViT
by Ze Song, Yuhai Wang, Shuobo Xu, Peng Wang and Lele Liu
Sensors 2024, 24(22), 7138; https://doi.org/10.3390/s24227138 - 6 Nov 2024
Viewed by 362
Abstract
Vehicle detection algorithms are essential for intelligent traffic management and autonomous driving systems. Current vehicle detection algorithms largely rely on deep learning techniques, enabling the automatic extraction of vehicle image features through convolutional neural networks (CNNs). However, in real traffic scenarios, relying only on a single feature extraction unit makes it difficult to fully understand the vehicle information in the traffic scenario, thus affecting the vehicle detection effect. To address this issue, we propose a lightweight vehicle detection algorithm based on Mamba_ViT. First, we introduce a new feature extraction architecture (Mamba_ViT) that separates shallow and deep features and processes them independently to obtain a more complete contextual representation, ensuring comprehensive and accurate feature extraction. Additionally, a multi-scale feature fusion mechanism is employed to enhance the integration of shallow and deep features, leading to the development of a vehicle detection algorithm named Mamba_ViT_YOLO. The experimental results on the UA-DETRAC dataset show that our proposed algorithm improves mAP@50 by 3.2% compared to the latest YOLOv8 algorithm, while using only 60% of the model parameters. Full article
(This article belongs to the Section Intelligent Sensors)
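Mamba_ViT's specific blocks (iRMB, VMamba) are beyond this listing, but the core idea stated above, routing shallow and deep features through separate branches and then fusing them at a common scale, can be hedged into a small generic sketch. The branch layers below are simple stand-ins, not the paper's modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitFusion(nn.Module):
    """Process shallow (high-resolution) and deep (low-resolution) features independently,
    then fuse them; a generic stand-in for the separation-and-fusion idea."""
    def __init__(self, shallow_ch: int = 64, deep_ch: int = 256, out_ch: int = 128):
        super().__init__()
        self.shallow_branch = nn.Conv2d(shallow_ch, out_ch, 3, padding=1)   # local detail
        self.deep_branch = nn.Conv2d(deep_ch, out_ch, 1)                    # semantic context
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, shallow, deep):
        s = self.shallow_branch(shallow)
        d = F.interpolate(self.deep_branch(deep), size=s.shape[-2:], mode="nearest")
        return self.fuse(torch.cat([s, d], dim=1))

out = SplitFusion()(torch.randn(1, 64, 80, 80), torch.randn(1, 256, 20, 20))
print(out.shape)   # -> [1, 128, 80, 80]
```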
Figure 1. Mamba_ViT_YOLO.
Figure 2. Mamba_ViT.
Figure 3. iRMB.
Figure 4. VMamba.
Figure 5. Bidirectional feature pyramid network structure.
Figure 6. Sample images from the UA-DETRAC dataset.
Figure 7. Heat map comparison results: (a,c,e) YOLOv8 heat maps; (b,d,f) Mamba_ViT heat maps.
Figure 8. Comparison of detection results: (a,c,e,g) YOLOv8 detection results; (b,d,f,h) Mamba_ViT_YOLO detection results.
16 pages, 5783 KiB  
Article
LG-YOLOv8: A Lightweight Safety Helmet Detection Algorithm Combined with Feature Enhancement
by Zhipeng Fan, Yayun Wu, Wei Liu, Ming Chen and Zeguo Qiu
Appl. Sci. 2024, 14(22), 10141; https://doi.org/10.3390/app142210141 - 6 Nov 2024
Viewed by 495
Abstract
In the realm of construction site monitoring, ensuring the proper use of safety helmets is crucial. Addressing the issues of high parameter values and sluggish detection speed in current safety helmet detection algorithms, a feature-enhanced lightweight algorithm, LG-YOLOv8, was introduced. Firstly, we introduce C2f-GhostDynamicConv as a powerful tool. This module enhances feature extraction to represent safety helmet wearing features, aiming to improve the efficiency of computing resource utilization. Secondly, the Bi-directional Feature Pyramid (BiFPN) was employed to further enrich the feature information, integrating feature maps from various levels to achieve more comprehensive semantic information. Finally, to enhance the training speed of the model and achieve a more lightweight outcome, we introduce a novel lightweight asymmetric detection head (LADH-Head) to optimize the original YOLOv8-n’s detection head. Evaluations on the SWHD dataset confirm the effectiveness of the LG-YOLOv8 algorithm. Compared to the original YOLOv8-n algorithm, our approach achieves a mean Average Precision (mAP) of 94.1%, a 59.8% reduction in parameters, a 54.3% decrease in FLOPs, a 44.2% increase in FPS, and a 2.7 MB compression of the model size. Therefore, LG-YOLOv8 has high accuracy and fast detection speed for safety helmet detection, which realizes real-time accurate detection of safety helmets and an ideal lightweight effect. Full article
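BiFPN is a published design whose distinctive step is fast normalized weighted fusion of the incoming feature maps. A minimal sketch of just that fusion step, assuming the inputs already share resolution and channel count (the resizing convolutions of a full BiFPN layer are omitted):

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """BiFPN-style fusion: out = sum_i(w_i * x_i) / (sum_j w_j + eps), with learnable w_i >= 0."""
    def __init__(self, num_inputs: int = 2, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.weights)               # keep the fusion weights non-negative
        w = w / (w.sum() + self.eps)               # normalize without a softmax (cheaper)
        return sum(wi * x for wi, x in zip(w, inputs))

fuse = FastNormalizedFusion(num_inputs=2)
a, b = torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)
print(fuse([a, b]).shape)
```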
Figure 1. Network structure of YOLOv8.
Figure 2. Conventional convolution and the Ghost module. (a) The convolutional layer; (b) The Ghost module.
Figure 3. The structure of DynamicConv.
Figure 4. Comparison of C2f and C2f-GhostDynamicConv modules: (a) C2f module; (b) C2f-GhostDynamicConv module.
Figure 5. (a) FPN introduces a top-down path to fuse multiscale features from the third to the seventh level (P3-P7); (b) PANet enhances the FPN (Feature Pyramid Network) by incorporating an additional bottom-up pathway; (c) BiFPN offers a superior balance between accuracy and efficiency.
Figure 6. YOLOv8-n detection head.
Figure 7. Network structure of the lightweight asymmetric detection head (LADH-Head).
Figure 8. Example of the experimental dataset (reprinted from [35]).
Figure 9. Changes in key metrics during YOLOv8-n and LG-YOLOv8 training.
Figure 10. Changes in loss during YOLOv8-n and LG-YOLOv8 training.
Figure 11. Histogram comparison of results of different algorithms.
Figure 12. Visualization results for different scenarios (adapted from ref. [35]). (a) Pictures in the original dataset and helmet detection pictures in different scenarios; (b) Base model YOLOv8-n; (c) Improved model LG-YOLOv8.
25 pages, 33901 KiB  
Article
Impact of Adverse Weather and Image Distortions on Vision-Based UAV Detection: A Performance Evaluation of Deep Learning Models
by Adnan Munir, Abdul Jabbar Siddiqui, Saeed Anwar, Aiman El-Maleh, Ayaz H. Khan and Aqsa Rehman
Drones 2024, 8(11), 638; https://doi.org/10.3390/drones8110638 - 4 Nov 2024
Viewed by 1037
Abstract
Unmanned aerial vehicle (UAV) detection in real-time is a challenging task despite the advances in computer vision and deep learning techniques. The increasing use of UAVs in numerous applications has generated worries about possible risks and misuse. Although vision-based UAV detection methods have been proposed in recent years, a standing open challenge and overlooked issue is that of adverse weather. This work is the first, to the best of our knowledge, to investigate the impact of adverse weather conditions and image distortions on vision-based UAV detection methods. To achieve this, a custom training dataset was curated with images containing a variety of UAVs in diverse complex backgrounds. In addition, this work develops a first-of-its-kind dataset, to the best of our knowledge, with UAV-containing images affected by adverse conditions. Based on the proposed datasets, a comprehensive benchmarking study is conducted to evaluate the impact of adverse weather and image distortions on the performance of popular object detection methods such as YOLOv5, YOLOv8, Faster-RCNN, RetinaNet, and YOLO-NAS. The experimental results reveal the weaknesses of the studied models and the performance degradation due to adverse weather, highlighting avenues for future improvement. The results show that even the best UAV detection model’s performance degrades in mean average precision (mAP) by 50.62 points in torrential rain conditions, by 52.40 points in high noise conditions, and by 77.0 points in high motion blur conditions. To increase the selected models’ resilience, we propose and evaluate a strategy to enhance the training of the selected models by introducing weather effects in the training images. For example, the YOLOv5 model with the proposed enhancement strategy gained +35.4, +39.3, and +44.9 points higher mAP in severe rain, noise, and motion blur conditions respectively. The findings presented in this work highlight the advantages of considering adverse weather conditions during model training and underscore the significance of data enrichment for improving model generalization. The work also accentuates the need for further research into advanced techniques and architectures to ensure more reliable UAV detection under extreme weather conditions and image distortions. Full article
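The enhancement strategy described above injects weather-like effects into training images. A hedged NumPy/OpenCV sketch of two of the distortions studied, additive Gaussian noise and linear motion blur; the severity values and file path are illustrative, not those used in the paper.

```python
import cv2
import numpy as np

def add_gaussian_noise(img: np.ndarray, sigma: float = 25.0) -> np.ndarray:
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def add_motion_blur(img: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    # Horizontal motion-blur kernel: a normalized row of ones.
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size
    return cv2.filter2D(img, -1, kernel)

img = cv2.imread("uav_frame.jpg")        # placeholder path to a training image
if img is not None:
    augmented = add_motion_blur(add_gaussian_noise(img, sigma=25.0), kernel_size=15)
    cv2.imwrite("uav_frame_weather.jpg", augmented)
```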
Figure 1. The impact of adverse weather conditions on the performance of UAV detection models. The confidence score of the original YOLOv5 model drops, while the original YOLOv8 fails to detect the UAV; with the proposed enhancement strategy, the enhanced YOLOv8 model detects it with good confidence. Without the proposed enhancement, Faster-RCNN shows false positives but yields none after the enhancement. The original RetinaNet fails to detect the UAV, whereas with the proposed enhancement it detects it with higher confidence.
Figure 2. The overall methodology proposed in this work for investigating the impact of adverse effects on vision-based UAV detection methods and enhancements through adverse weather effects-aware training.
Figure 3. Sample images from the Complex Background Dataset (CBD) with UAVs.
Figure 4. Sample blurred images with varying blur: clean (no blur), low, medium, and high.
Figure 5. Samples from the adverse noise test dataset (ANTD) at three levels of noise severity: low, medium, and high.
Figure 6. Sample images from the Rainy Test Dataset (RTD).
Figure 7. Network architecture for YOLOv5 [55].
Figure 8. Network architecture for RetinaNet [56].
Figure 9. Network architecture for YOLOv8 [57].
Figure 10. Faster R-CNN architecture [11].
Figure 11. Sample UAV detection results of Faster-RCNN, YOLO-NAS, RetinaNet, YOLOv5, and YOLOv8 on the Complex Backgrounds Dataset (CBD).
Figure 12. Sample failure cases of each model on CBD.
Figure 13. Detection results and improvement comparison between Faster-RCNN and the enhanced Faster-RCNN model; the enhanced version clearly achieved higher detection results. For MBTD, the original Faster-RCNN fails to detect the UAV.
Figure 14. Detection results and improvement comparison between YOLOv5 and the enhanced YOLOv5 model; the enhanced version clearly achieved higher detection results.
Figure 15. Sample Grad-CAM results for all proposed datasets. YOLOv5 shows high scores for the UAV class on all datasets; for RTD, the focus falls on the UAV and the image edges, which can cause a false positive (darker red indicates higher activation values).
20 pages, 12767 KiB  
Article
A Real-Time End-to-End Framework with a Stacked Model Using Ultrasound Video for Cardiac Septal Defect Decision-Making
by Siti Nurmani, Ria Nova, Ade Iriani Sapitri, Muhammad Naufal Rachmatullah, Bambang Tutuko, Firdaus Firdaus, Annisa Darmawahyuni, Anggun Islami, Satria Mandala, Radiyati Umi Partan, Akhiar Wista Arum and Rio Bastian
J. Imaging 2024, 10(11), 280; https://doi.org/10.3390/jimaging10110280 - 3 Nov 2024
Viewed by 655
Abstract
Echocardiography is the gold standard for the comprehensive diagnosis of cardiac septal defects (CSDs). Currently, echocardiography diagnosis is primarily based on expert observation, which is laborious and time-consuming. With digitization, deep learning (DL) can be used to improve the efficiency of the diagnosis. This study presents a real-time end-to-end framework tailored for pediatric ultrasound video analysis for CSD decision-making. The framework employs an advanced real-time architecture based on You Only Look Once (Yolo) techniques for CSD decision-making with high accuracy. Leveraging the state of the art with the Yolov8l (large) architecture, the proposed model achieves a robust performance in real-time processes. It can be observed that the experiment yielded a mean average precision (mAP) exceeding 89%, indicating the framework’s effectiveness in accurately diagnosing CSDs from ultrasound (US) videos. The Yolov8l model exhibits precise performance in the real-time testing of pediatric patients from Mohammad Hoesin General Hospital in Palembang, Indonesia. Based on the results of the proposed model using 222 US videos, it exhibits 95.86% accuracy, 96.82% sensitivity, and 98.74% specificity. During real-time testing in the hospital, the model exhibits a 97.17% accuracy, 95.80% sensitivity, and 98.15% specificity; only 3 out of the 53 US videos in the real-time process were diagnosed incorrectly. This comprehensive approach holds promise for enhancing clinical decision-making and improving patient outcomes in pediatric cardiology. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
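The accuracy, sensitivity, and specificity values reported above follow the usual confusion-matrix definitions. The sketch below computes them from raw true/false positive/negative counts; the counts shown are placeholders, not the study's data.

```python
def screening_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity (recall of the abnormal class), and specificity."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # proportion of true defects that are caught
        "specificity": tn / (tn + fp),   # proportion of normal cases correctly cleared
    }

# Placeholder counts for a video-level decision task.
print(screening_metrics(tp=120, tn=90, fp=2, fn=5))
```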
Figure 1. The real-time end-to-end framework.
Figure 2. The US standard cardiac views A4CH, A5CH, PLAX, PSAX, and SC with color Doppler echocardiography.
Figure 3. Annotation of chamber wall and cardiac defect.
Figure 4. A total of 16 samples for performance comparison with 72 variants of the Yolo model for CSD prediction to select the best model.
Figure 5. Normal-abnormal classification performance for five architectures.
Figure 6. Normal and abnormal cardiac classification results in terms of training and validation loss.
Figure 7. View classification results in terms of training and validation loss.
Figure 8. CSD detection performance using our framework on five standard views.
Figure 9. All CSD detection performances by patient.
Figure 10. Sample CSD detection images in five views.
Figure 11. The proposed model for CSD detection in a color Doppler echocardiography case.
28 pages, 27981 KiB  
Article
Acoustic Imaging Learning-Based Approaches for Marine Litter Detection and Classification
by Pedro Alves Guedes, Hugo Miguel Silva, Sen Wang, Alfredo Martins, José Almeida and Eduardo Silva
J. Mar. Sci. Eng. 2024, 12(11), 1984; https://doi.org/10.3390/jmse12111984 - 3 Nov 2024
Viewed by 591
Abstract
This paper introduces an advanced acoustic imaging system leveraging multibeam water column data at various frequencies to detect and classify marine litter. This study encompasses (i) the acquisition of test tank data for diverse types of marine litter at multiple acoustic frequencies; (ii) the creation of a comprehensive acoustic image dataset with meticulous labelling and formatting; (iii) the implementation of sophisticated classification algorithms, namely support vector machine (SVM) and convolutional neural network (CNN), alongside cutting-edge detection algorithms based on transfer learning, including single-shot multibox detector (SSD) and You Only Look Once (YOLO), specifically YOLOv8. The findings reveal discrimination between different classes of marine litter across the implemented algorithms for both detection and classification. Furthermore, cross-frequency studies were conducted to assess model generalisation, evaluating the performance of models trained on one acoustic frequency when tested with acoustic images based on different frequencies. This approach underscores the potential of multibeam data in the detection and classification of marine litter in the water column, paving the way for developing novel research methods in real-life environments. Full article
(This article belongs to the Special Issue Applications of Underwater Acoustics in Ocean Engineering)
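The polar and Cartesian acoustic images referred to above come from the same water-column data: each multibeam sample is indexed by beam angle and range, and the Cartesian view simply re-projects those samples onto a metric grid. A hedged NumPy sketch with a synthetic beam/range grid (not the Kongsberg M3 data format):

```python
import numpy as np

# Synthetic water-column data: amplitude indexed by (beam, range_bin).
num_beams, num_bins = 128, 512
angles = np.linspace(-np.pi / 3, np.pi / 3, num_beams)      # 120-degree swath (assumed)
ranges = np.linspace(0.5, 30.0, num_bins)                    # metres (assumed)
amplitude = np.random.rand(num_beams, num_bins)              # stand-in for sonar returns

# Re-project every (angle, range) sample into Cartesian coordinates.
beam_grid, range_grid = np.meshgrid(angles, ranges, indexing="ij")
x = range_grid * np.sin(beam_grid)       # across-track position (m)
y = range_grid * np.cos(beam_grid)       # along-beam / depth direction (m)

# Rasterize onto a Cartesian image grid with a simple nearest-cell assignment.
res = 0.1                                                    # 10 cm per pixel (assumed)
cols = ((x - x.min()) / res).astype(int)
rows = ((y - y.min()) / res).astype(int)
cartesian = np.zeros((rows.max() + 1, cols.max() + 1))
cartesian[rows, cols] = amplitude
print("Cartesian image shape:", cartesian.shape)
```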
Figure 1. Marine litter in the water column. Courtesy of Unsplash by Naja Jensen.
Figure 2. Kongsberg M3 Multibeam High-Frequency Echosounder system setup in the test tank. (a) Test tank setup; (b) MBES capturing the wooden deck in the water column.
Figure 3. Marine debris used for the test tank dataset. PVC squares (1); PVC traffic cone (2); wooden deck (3); vinyl sheet (4); fish net (5).
Figure 4. High-level architecture for the MBES sensor and acoustic imaging for detection and classification problems.
Figure 5. Raw acoustic images of a PVC square at the same range, with varying FOV due to the different acoustic frequencies. (a) Raw acoustic image at 1200 kHz; (b) Raw acoustic image at 1400 kHz.
Figure 6. Cartesian acoustic image of a PVC square in the water column.
Figure 7. Polar acoustic image of a PVC square in the water column.
Figure 8. Class activation map applied to the CNN with a polar image of a PVC square as input.
Figure 9. SSD model inference on two polar acoustic images with multiple targets, with target detection confidences.
Figure 10. YOLOv8 model inference on polar acoustic images with multiple targets, with target detection confidences.
21 pages, 14443 KiB  
Article
High-Precision Defect Detection in Solar Cells Using YOLOv10 Deep Learning Model
by Lotfi Aktouf, Yathin Shivanna and Mahmoud Dhimish
Solar 2024, 4(4), 639-659; https://doi.org/10.3390/solar4040030 - 1 Nov 2024
Viewed by 609
Abstract
This study presents an advanced defect detection approach for solar cells using the YOLOv10 deep learning model. Leveraging a comprehensive dataset of 10,500 solar cell images annotated with 12 distinct defect types, our model integrates Compact Inverted Blocks (CIBs) and Partial Self-Attention (PSA) modules to enhance feature extraction and classification accuracy. Training on the Viking cluster with state-of-the-art GPUs, our model achieved remarkable results, including a mean Average Precision (mAP@0.5) of 98.5%. Detailed analysis of the model’s performance revealed exceptional precision and recall rates for most defect classes, notably achieving 100% accuracy in detecting black core, corner, fragment, scratch, and short circuit defects. Even for challenging defect types such as a thick line and star crack, the model maintained high performance, with accuracies of 94% and 96%, respectively. The Recall–Confidence and Precision–Recall curves further demonstrate the model’s robustness and reliability across varying confidence thresholds. This research not only advances the state of automated defect detection in photovoltaic manufacturing but also underscores the potential of YOLOv10 for real-time applications. Our findings suggest significant implications for improving the quality control process in solar cell production. Although the model demonstrates high accuracy across most defect types, certain subtle defects, such as thick lines and star cracks, remain challenging, indicating potential areas for further optimization in future work. Full article
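The mAP@0.5 figure quoted above is the mean, over defect classes, of the average precision computed from each class's precision-recall curve at an IoU threshold of 0.5. A minimal sketch of the AP integration step; the detection-to-ground-truth matching that produces the TP/FP flags is assumed to have been done already, and the toy numbers are illustrative.

```python
import numpy as np

def average_precision(scores: np.ndarray, is_tp: np.ndarray, num_gt: int) -> float:
    """All-point-interpolated AP from per-detection confidences and TP/FP flags."""
    order = np.argsort(-scores)                       # rank detections by confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Make precision monotonically non-increasing, then integrate over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    precision = np.concatenate(([precision[0]], precision))
    return float(np.sum(np.diff(recall) * precision[1:]))

# Toy example: five detections for one defect class, four ground-truth boxes.
scores = np.array([0.95, 0.9, 0.8, 0.6, 0.3])
is_tp = np.array([True, True, False, True, True])
print(f"AP@0.5 for this class: {average_precision(scores, is_tp, num_gt=4):.3f}")
# mAP@0.5 is the mean of this value over the defect classes.
```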
Figure 1. YOLOv10 model architecture: (a) the Compact Inverted Block (CIB); (b) the Partial Self-Attention module (PSA); (c) the overall YOLOv10 model architecture.
Figure 2. Examples of defect types in the EL Solar Cells dataset [34]. The dataset includes 12 classes of defects: line crack, star crack, finger interruption, black core, vertical dislocation, horizontal dislocation, thick line, scratch, fragment, corner, short circuit, and printing error. Each defect class is highlighted with colored bounding boxes for visual reference.
Figure 3. Pair plot showing the distribution and relationships between the bounding box coordinates (x, y) and dimensions (width, height) in the training dataset, which helps visualize the data distribution and correlations essential for effective model training.
Figure 4. Recall–Confidence curve for the YOLOv10 model across different defect classes in the EL Solar Cells dataset, illustrating how recall varies with the confidence threshold for each class.
Figure 5. Detection results of the YOLOv10 model on the EL Solar Cells dataset, showing various defect types, including cracks, finger interruptions, star cracks, and black core defects, with bounding boxes and labels indicating the detected defects.
Figure 6. Normalized confusion matrix for the YOLOv10 model on the EL Solar Cells dataset; diagonal elements represent correct predictions and off-diagonal elements indicate misclassifications.
Figure 7. Precision–Recall curve for the YOLOv10 model on the EL Solar Cells dataset, illustrating the trade-off between precision and recall for each defect class. The mean Average Precision (mAP@0.5) across all classes is 0.985.
17 pages, 2483 KiB  
Article
Fire and Smoke Detection in Complex Environments
by Furkat Safarov, Shakhnoza Muksimova, Misirov Kamoliddin and Young Im Cho
Fire 2024, 7(11), 389; https://doi.org/10.3390/fire7110389 - 29 Oct 2024
Viewed by 575
Abstract
Fire detection is a critical task in environmental monitoring and disaster prevention, with traditional methods often limited in their ability to detect fire and smoke in real time over large areas. The rapid identification of fire and smoke in both indoor and outdoor environments is essential for minimizing damage and ensuring timely intervention. In this paper, we propose a novel approach to fire and smoke detection by integrating a vision transformer (ViT) with the YOLOv5s object detection model. Our modified model leverages the attention-based feature extraction capabilities of ViTs to improve detection accuracy, particularly in complex environments where fires may be occluded or distributed across large regions. By replacing the CSPDarknet53 backbone of YOLOv5s with ViT, the model is able to capture both local and global dependencies in images, resulting in more accurate detection of fire and smoke under challenging conditions. We evaluate the performance of the proposed model using a comprehensive Fire and Smoke Detection Dataset, which includes diverse real-world scenarios. The results demonstrate that our model outperforms baseline YOLOv5 variants in terms of precision, recall, and mean average precision (mAP), achieving a mAP@0.5 of 0.664 and a recall of 0.657. The modified YOLOv5s with ViT shows significant improvements in detecting fire and smoke, particularly in scenes with complex backgrounds and varying object scales. Our findings suggest that the integration of ViT as the backbone of YOLOv5s offers a promising approach for real-time fire detection in both urban and natural environments. Full article
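The patch-splitting step at the heart of the ViT backbone described above is easy to show in isolation: the image is cut into non-overlapping patches and each patch is projected to a token vector before the self-attention layers. A hedged PyTorch sketch of patch embedding; the patch size and embedding width are illustrative, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each one to a token vector."""
    def __init__(self, patch: int = 16, in_ch: int = 3, dim: int = 384):
        super().__init__()
        # A strided convolution is the standard trick: kernel size = stride = patch size.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                          # x: [B, 3, H, W]
        tokens = self.proj(x)                      # [B, dim, H/patch, W/patch]
        return tokens.flatten(2).transpose(1, 2)   # [B, num_patches, dim]

tokens = PatchEmbedding()(torch.randn(1, 3, 640, 640))
print(tokens.shape)    # -> [1, 1600, 384]; these tokens feed the transformer encoder
```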
Figure 1. The basic architecture of the vision transformer (ViT) integrated into the YOLOv5s framework. The image is divided into patches, each treated as an input token for the transformer, which processes these patches and captures both local and global dependencies through self-attention. This architecture enhances the model’s ability to detect fire and smoke, particularly in complex environments where objects may be occluded or distributed across large areas.
Figure 2. Modified YOLOv5s model with ViT as the backbone. The attention-based feature extraction of ViT is the key component responsible for improved detection accuracy; replacing the CSPDarknet53 backbone with ViT allows the model to capture long-range dependencies and spatial relationships more effectively, leading to more accurate detection of fire and smoke in challenging environments.
Figure 3. The data augmentation process, including random flip, random rotation, and ColorJitter.
Figure 4. Training results on the fire and smoke dataset.
Figure 5. Visualization of the results.