YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection
Figure 1. YOLOv8 network structure diagram.
Figure 2. YOLOv8 network detail structure diagram.
Figure 3. Improved YOLOv8 network structure diagram.
Figure 4. Improvement scheme at the head.
Figure 5. Structure of SPD-Conv.
Figure 6. The GAM attention module.
Figure 7. Proportion of drone sizes in the image (darker colors indicate more drones).
Figure 8. Display of dataset diversity. (a) Multi-rotor drone; (b) fixed-wing drone; (c–f) several difficult samples, containing extremely small drones, blurred drones, or complex environments.
Figure 9. Comparison between our model and YOLOv8s (parameters, model size, and FPS are normalized separately).
Figure 10. (a,c,e) Missed detections of YOLOv8s; (b,d,f) detection results of the improved model on the same images.
Figure 11. (a,c,e) Partial false detections of YOLOv8s, marked by yellow boxes; (b,d,f) detection results of the improved model on the same images.
Figure 12. Drone sizes in the self-built dataset (darker colors indicate more drones).
Figure 13. Selected samples from the self-built dataset. (a–c) Drone imagery from different time periods; (d–i) several difficult samples, including very small drones, drones photographed from high altitude, or complex environments.
Figure 14. Comparison between our model and YOLOv8s (self-built dataset).
Figure 15. Actual test results.
Highlights
- We improve upon the state-of-the-art YOLOv8 model, proposing a high-performance and highly generalizable model for detecting tiny UAV targets.
- To address the small size of UAV targets, a high-resolution detection branch is added to the detection head, enhancing the model’s ability to detect tiny targets. At the same time, the large-target prediction head and its related feature extraction and fusion layers are pruned, reducing network redundancy and lowering the model’s parameter count.
- To improve multi-scale feature extraction, SPD-Conv replaces standard Conv modules, better retaining the features of tiny targets and reducing the probability of missed UAV detections. In addition, the multi-scale fusion module incorporates the GAM attention mechanism to strengthen the fusion of target features and reduce the probability of false detections. Used together, SPD-Conv and GAM strengthen the model’s ability to detect tiny targets.
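The key idea behind SPD-Conv is that downsampling by strided convolution discards pixels, which disproportionately hurts tiny targets; space-to-depth instead moves every pixel into the channel dimension losslessly, and a non-strided convolution follows. A minimal NumPy sketch of the space-to-depth step (the function name and `scale` parameter are ours for illustration; the actual module also includes the subsequent non-strided convolution):

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) feature map into (C * scale^2, H/scale, W/scale).

    No pixel is discarded: each of the scale*scale phase-shifted sub-grids
    becomes its own group of channels, so a following non-strided convolution
    still sees every input value.
    """
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0, "spatial dims must divide evenly"
    slices = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(slices, axis=0)

feat = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = space_to_depth(feat)
print(out.shape)  # (8, 2, 2): channels x4, spatial halved, element count unchanged
```

Note that `out` contains exactly the same values as `feat`, merely rearranged, which is the lossless-downsampling property the highlight refers to.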
Abstract
1. Introduction
2. Related Work
3. Methods
3.1. YOLOv8 Network Structure
3.2. Improved YOLOv8 UAV Detection Model
3.2.1. Improvement of the Detection Head
- A. Adding a tiny-target detection head
- B. Removing the large-target detection head
3.2.2. Improvement of the Feature Extraction Module
3.2.3. Improvement of the Feature Fusion Module
4. Experimental Preparation and Results
4.1. Dataset Introduction
4.2. Network Setup and Training
4.2.1. Loss Function Setting
4.2.2. Network Training
4.3. Evaluation Indicators
- (1) Precision and recall are calculated as follows:
- (2) The average precision (AP) and mean average precision (mAP) are calculated as follows:
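For completeness, the standard definitions behind these indicators, where TP, FP, and FN denote true positives, false positives, and false negatives, and N is the number of classes (with a single UAV class, mAP reduces to AP):

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_0^1 P(R)\,\mathrm{d}R, \qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```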
4.4. Ablation Experiments
- Adding the tiny-target detection head improved the model’s P, R, and mAP by 10.8%, 13.5%, and 8.3%, respectively, indicating that a high-resolution detection head can effectively enhance the detection of tiny targets. At the same time, after the large-target detection layer was pruned, the parameter count fell by 70.2% and the model size by 67%, while R remained unchanged, P fell by only 0.3%, and mAP by 0.9%, indicating that the low-resolution detection head contributes little to tiny UAV detection and introduces substantial network redundancy.
- The results of improved models 3, 4, 5, and 6 show that the SPD-Conv modification chiefly improves the model’s recall R, indicating that replacing Conv with SPD-Conv in the backbone better retains the features of tiny targets and reduces the probability of missing them. Adding GAM chiefly improves the model’s precision P, indicating that the GAM attention module in the neck benefits the network’s feature fusion and reduces the probability of false detections. With both SPD-Conv and GAM added, P, R, and mAP all improved, although the parameter count and model size increased slightly.
- Comparing the results of improved model 6 (i.e., our model) and model 1 (i.e., the base model), as shown in Figure 9, the tiny-head, SPD-Conv, and GAM modules add some inference time: the improved model reaches 221 FPS, lower than the base model’s 285 FPS, but still sufficient to meet real-time requirements in actual deployment. Beyond that, our model improves markedly on the base model: P, R, and mAP rise by 11.9%, 15.2%, and 9%, respectively, while the parameter count and model size fall by 59.9% and 57.9%, proving the effectiveness and practicality of the improved model.
4.5. Comparative Experiments
- Comparing YOLOv7 and YOLOv7-tiny shows that although YOLOv7’s parameter count and model size are much higher than the other models’, its P, R, and mAP are the worst. Conversely, YOLOv7-tiny achieves good detection accuracy with a smaller parameter count and model size. The reason is that the drones in the TIB-Net dataset are small and the images contain few drone features, so the more complex YOLOv7 network may learn many useless background features, which in turn degrades detection.
- The TIB-Net detection network sits at the other extreme: it maintains good detection accuracy with a parameter count and model size far smaller than the other models’. One disadvantage is apparent, however; its FPS is only 5, far from meeting the needs of real-time UAV detection.
- YOLOv5-s yields the best overall performance apart from our model: its FPS of 256 leads all models, and its P and R values are well balanced. YOLOX also detects well, but its R and FPS are slightly lower than those of YOLOv5-s, and its model size is too large.
- The improved model proposed in this paper outperforms the other models in P, R, and mAP, and ranks near the top in parameter count, model size, and FPS: its parameter count and model size are second only to TIB-Net’s, and its FPS is slightly lower than that of YOLOv5-s and YOLOv7-tiny but still meets the deployment requirements of real-time detection. Overall, the tiny-UAV detection network proposed in this paper achieves a good balance of detection accuracy, model size, and detection speed and meets the specifications of practical engineering applications.
4.6. Self-Built Dataset Experiment
5. Conclusions and Outlook
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Shi, X.; Yang, C.; Xie, W.; Liang, C.; Shi, Z.; Chen, J. Anti-Drone System with Multiple Surveillance Technologies: Architecture, Implementation, and Challenges. IEEE Commun. Mag. 2018, 56, 68–74.
- Chen, Q.Q.; Feng, Z.W.; Zhang, G.B. Dynamic modelling and simulation of anti-UAV tethered-net capture system. J. Natl. Univ. Def. Technol. 2022, 44, 9–15.
- Ikuesan, R.A.; Ganiyu, S.O.; Majigi, M.U.; Opaluwa, Y.D.; Venter, H.S. Practical Approach to Urban Crime Prevention in Developing Nations. In Proceedings of the 3rd International Conference on Networking, Information Systems & Security, Marrakech, Morocco, 31 March–2 April 2020; pp. 1–8.
- Mahmood, S.A. Anti-Drone System: Threats and Challenges. In Proceedings of the 2019 First International Conference of Computer and Applied Sciences (CAS), Baghdad, Iraq, 18–19 December 2019; p. 274.
- Wu, X.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
- Zhao, Z.Q.; Zheng, P.; Xu, S.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
- Garcia, A.J.; Lee, J.M.; Kim, D.S. Anti-drone system: A visual-based drone detection using neural networks. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 21–23 October 2020; pp. 559–561.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bay, H.; Tuytelaars, T.; van Gool, L. Surf: Speeded up robust features. In Proceedings of the Computer Vision—ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Proceedings, Part I 9. Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417.
- Dai, J.; Wu, L.; Wang, P. Overview of UAV Target Detection Algorithms Based on Deep Learning. In Proceedings of the 2021 IEEE 2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 17–19 December 2021; Volume 2, pp. 736–745.
- Zuo, Y. Target Detection System of Agricultural Economic Output Efficiency Based on Kruskal Algorithm. In Proceedings of the 2022 IEEE 2nd International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, India, 2–3 December 2022; pp. 1–5.
- Li, S.; Yu, J.; Wang, H. Damages detection of aero-engine blades via deep learning algorithms. IEEE Trans. Instrum. Meas. 2023, 72, 5009111.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Jiao, L.C.; Zhang, F.; Liu, F.; Yang, S.Y.; Li, L.L.; Feng, Z.X.; Qu, R. A Survey of Deep Learning-Based Object Detection. IEEE Access 2019, 7, 128837–128868.
- Sun, H.; Yang, J.; Shen, J.; Liang, D.; Ning-Zhong, L.; Zhou, H. TIB-Net: Drone Detection Network with Tiny Iterative Backbone. IEEE Access 2020, 8, 130697–130707.
- He, J.; Liu, M.; Yu, C. UAV reaction detection based on multi-scale feature fusion. In Proceedings of the 2022 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), Xi’an, China, 28–30 October 2022; pp. 640–643.
- Wastupranata, L.M.; Munir, R. UAV Detection using Web Application Approach based on SSD Pre-Trained Model. In Proceedings of the 2021 IEEE International Conference on Aerospace Electronics and Remote Sensing Technology (ICARES), Virtual, 3–4 November 2021; pp. 1–6.
- Tao, Y.; Zongyang, Z.; Jun, Z.; Xinghua, C.; Fuqiang, Z. Low-altitude small-sized object detection using lightweight feature-enhanced convolutional neural network. J. Syst. Eng. Electron. 2021, 32, 841–853.
- Ye, T.; Zhang, J.; Li, Y.; Zhang, X.; Zhao, Z.; Li, Z. CT-Net: An Efficient Network for Low-Altitude Object Detection Based on Convolution and Transformer. IEEE Trans. Instrum. Meas. 2022, 71, 2507412.
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276.
- Ma, J.; Yao, Z.; Xu, C.; Chen, S. Multi-UAV real-time tracking algorithm based on improved PP-YOLO and Deep-SORT. J. Comput. Appl. 2022, 42, 2885.
- Li, H.; Yang, J.; Mao, Y.; Hu, Q.; Du, Y.; Peng, J.; Liu, C. A UAV detection algorithm combined with lightweight network. In Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; Volume 5, pp. 1865–1872.
- Liu, Y.; Liu, D.; Wang, B.; Chen, B. Mob-YOLO: A Lightweight UAV Object Detection Method. In Proceedings of the 2022 International Conference on Sensing, Measurement & Data Analytics in the era of Artificial Intelligence (ICSMD), Harbin, China, 30 November–2 December 2022; pp. 1–6.
- Liu, R.; Xiao, Y.; Li, Z.; Cao, H. Research on the anti-UAV distributed system for airports: YOLOv5-based auto-targeting device. In Proceedings of the 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 20–22 May 2022; pp. 864–867.
- Xue, S.; Wang, Y.; Lü, Q.; Cao, G. Anti-occlusion target detection algorithm for anti-UAV system based on YOLOX-drone. Chin. J. Eng. 2023, 45, 1539–1549.
- Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A Modified YOLOv8 Detection Network for UAV Aerial Image Recognition. Drones 2023, 7, 304.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. arXiv 2022, arXiv:2208.03641.
- Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561.
- Liu, S.; Wang, Y.; Yu, Q.; Liu, H.; Peng, Z. CEAM-YOLOv7: Improved YOLOv7 Based on Channel Expansion and Attention Mechanism for Driver Distraction Behavior Detection. IEEE Access 2022, 10, 129116–129124.
- Zhang, L.; Wang, M.; Liu, K.; Xiao, M.; Wen, Z.; Man, J. An Automatic Fault Detection Method of Freight Train Images Based on BD-YOLO. IEEE Access 2022, 10, 39613–39626.
- Fang, Y.; Guo, X.; Chen, K.; Zhou, Z.; Ye, Q. Accurate and automated detection of surface knots on sawn timbers using YOLO-V5 model. BioResources 2021, 16, 5390.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
Parameter | Setting
---|---
Epochs | 150
Warmup epochs | 3
Warmup momentum | 0.8
Batch size | 8
Image size | 640
Initial learning rate | 0.01
Final learning rate | 0.01
Patience | 50
Optimizer | SGD
NMS IoU | 0.7
Momentum | 0.937
Mask ratio | 4
Weight decay | 0.0005
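For readers reproducing the setup, the parameters above map directly onto the Ultralytics YOLOv8 hyperparameter keys. A sketch of the corresponding configuration fragment (key names follow the library's default hyperparameter file; this is our reconstruction from the table, not the authors' actual file):

```yaml
# YOLOv8 training configuration matching the table above (sketch).
epochs: 150
warmup_epochs: 3
warmup_momentum: 0.8
batch: 8
imgsz: 640
lr0: 0.01          # initial learning rate
lrf: 0.01          # final learning rate (as a fraction of lr0)
patience: 50       # early-stopping patience in epochs
optimizer: SGD
iou: 0.7           # NMS IoU threshold
momentum: 0.937
mask_ratio: 4
weight_decay: 0.0005
```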
Components | 1 | 2 | 3 | 4 | 5 | 6
---|---|---|---|---|---|---
+Tiny-Head | | √ | √ | √ | √ | √
−Large-Head | | | √ | √ | √ | √
+SPD-Conv | | | | √ | | √
+GAM | | | | | √ | √
P | 81.4% | 92.2% | 91.9% | 92.6% | 93.1% | 93.3%
R | 78.1% | 91.6% | 91.6% | 92.8% | 92.2% | 93.3%
mAP | 86.1% | 94.4% | 93.5% | 94.9% | 93.6% | 95.1%
Parameters/million | 11.126 | 10.852 | 3.527 | 4.209 | 3.785 | 4.467
Model size/MB | 21.9 | 22.1 | 7.3 | 8.7 | 7.9 | 9.2
FPS | 285 | 217 | 259 | 232 | 246 | 221
Methods | P | R | mAP | Parameters/million | Model size/MB | FPS
---|---|---|---|---|---|---
TIB-Net | 87.6% | 87.0% | 89.4% | 0.163 | 0.681 | 5
YOLOv5-s | 88.1% | 90.9% | 91.2% | 7.013 | 14.3 | 256
YOLOX-s | 90.5% | 80.6% | 88.7% | 9.0 | 62.5 | 132
YOLOv7 | 64.2% | 56.0% | 52.4% | 36.480 | 74.7 | 104
YOLOv7-tiny | 85.0% | 82.6% | 85.0% | 6.007 | 12.2 | 227
Ours | 93.3% | 93.3% | 95.1% | 4.467 | 9.2 | 221
Model | P | R | mAP
---|---|---|---
YOLOv8s | 88.8% | 73.9% | 85.2%
Ours | 97.0% | 89.5% | 95.3%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhai, X.; Huang, Z.; Li, T.; Liu, H.; Wang, S. YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection. Electronics 2023, 12, 3664. https://doi.org/10.3390/electronics12173664