IRSDD-YOLOv5: Focusing on the Infrared Detection of Small Drones
Figure 1. Example image of an infrared small drone. The drone is marked with a red border and enlarged in the lower right corner.
Figure 2. Examples from the SIDD dataset, from top to bottom: city scene, mountain scene, sea scene and sky background. The drone targets in the images are marked with red circles.
Figure 3. Segmentation process of the IRSDD-YOLOv5 network.
Figure 4. Overall architecture of IRSDD-YOLOv5. A PANet-like structure is used in the neck network, and the red part is the infrared small drone detection module added to the neck network. The four prediction heads use the feature maps generated by the neck network to fuse information about the targets. The number of each module is marked in orange on its left side.
Figure 5. The specific structure of the C3 and SPPF modules.
Figure 6. The top part of the figure shows how IRSTDM works; the bottom part shows the module in detail.
Figure 7. Example procedure for calculating the NWD between two boxes.
Figure 8. Visualization of target detection results. The leftmost column is the input image; from left to right are the mask maps produced by mainstream segmentation methods. The target area marked by a red circle is enlarged in the upper right corner. GT denotes the ground-truth mask region of the target.
Figure 9. Three-dimensional visualization of qualitative results of different instance segmentation methods. From left to right: the input image and the segmentation results of BlendMask, BoxInst, CondInst and Mask R-CNN. From top to bottom: city, mountain, sea and sky scenes.
Figure 10. Three-dimensional visualization of qualitative results of different instance segmentation methods. From left to right: the input image, the segmentation results of YOLACT++, YOLOv5 and IRSDD-YOLOv5, and the ground truth (GT). From top to bottom: city, mountain, sea and sky scenes.
Figure 11. Infrared drone images in different scenes compared with their three-dimensional visualizations.
Figure 12. The process of stitching four images, each containing a single object, into one image containing four objects.
Figure 13. An example of narrowing the detection area by introducing prior information from radar. The yellow sector indicates the approximate location of the drone.
Abstract
1. Introduction
- (1)
- Generative Adversarial Networks
- (2)
- Multi-Scale Learning
- (3)
- Use of Feature Context Information
- (4)
- Improvement of Loss Function
- (1)
- We added an infrared small target detection module (IRSTDM) to the advanced segmentation detector YOLOv5 to effectively extract and retain infrared small target information.
- (2)
- We introduced the normalized Wasserstein distance (NWD) to optimize the bounding box loss function, aiming to solve the problem that IoU-based loss functions are overly sensitive to the position of small targets.
- (3)
- We built and published a new SIDD dataset, carried out pixel-level annotation on infrared drone images in the dataset and published all masks used for segmentation.
- (4)
- We conducted extensive experiments comparing the proposed IRSDD-YOLOv5 with eight mainstream segmentation methods on the SIDD dataset and verified that the proposed method outperforms the state-of-the-art methods.
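The NWD mentioned in contribution (2) can be sketched as follows. This is a minimal illustration based on the commonly used closed-form formulation in which boxes are modeled as 2-D Gaussians; the scaling constant `c` and the `beta`-weighted combination with an IoU term are illustrative assumptions, not the paper's exact implementation.

```python
import math

def nwd(box1, box2, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    Each box is modeled as a 2-D Gaussian N([cx, cy], diag(w^2/4, h^2/4));
    the squared second-order Wasserstein distance between such Gaussians
    has the closed form below. The constant c (a dataset-dependent scale,
    assumed here) maps the distance into a (0, 1] similarity, so identical
    boxes score 1.0 regardless of their absolute size.
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0)
    return math.exp(-math.sqrt(w2_sq) / c)

def box_loss(box_pred, box_gt, iou, beta=0.5):
    # Hypothetical combined regression loss: a beta-weighted mix of an
    # IoU term and an NWD term, mirroring the beta ablation in Sec. 3.4.
    return (1.0 - beta) * (1.0 - iou) + beta * (1.0 - nwd(box_pred, box_gt))
```

Unlike IoU, the NWD term still provides a smooth, non-zero gradient when a predicted box and a tiny ground-truth box do not overlap at all, which is why it is attractive for small-target regression.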
2. Materials and Methods
2.1. Dataset
2.2. Proposed Method
2.2.1. Overall Architecture
2.2.2. Infrared Small Drone Detection Module
2.2.3. Optimization of Loss Function
3. Results and Discussion
3.1. Evaluation Metrics
- (1)
- Average precision (AP): The main evaluation metrics are AP, AP50 and APs. AP is the mean AP over IoU thresholds of {0.5, 0.55, …, 0.95}, AP50 is the AP at an IoU threshold of 0.5, and APs is the AP for small targets (smaller than 32 × 32 pixels).
- (2)
- Parameters (Para): The number of parameters is the count of learnable weights in the model, i.e., the entries of the weight matrices of its convolutional and fully connected layers. It is a key property of a machine learning model and reflects, to some extent, the model's complexity.
- (3)
- Frames per second (FPS): The higher the FPS value, the better the real-time processing ability of the model when running the detection algorithm under the same hardware conditions. FPS = 1/latency, where latency is the time the network takes to process one image.
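The metric conventions above can be sketched as follows; this is a minimal illustration of the COCO-style definitions the metrics follow, and the helper names are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# AP averages the per-threshold AP over IoU = 0.50, 0.55, ..., 0.95;
# AP50 uses only the single 0.50 threshold.
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]

def is_small_target(w, h):
    # COCO "small" convention used by APs: area below 32 x 32 pixels.
    return w * h < 32 * 32

def fps_from_latency(latency_s):
    # FPS = 1 / latency, with latency the per-image inference time.
    return 1.0 / latency_s
```

For example, a per-image latency of 50 ms corresponds to 20 FPS.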
3.2. Implementation Details
3.3. Comparison with the Latest Methods
3.3.1. Quantitative Results
3.3.2. Qualitative Results
3.3.3. Analysis of the Reasons for Detection Accuracy
3.4. Ablation Studies
- (1)
- Ablation study for small target detection module
- (2)
- Ablation study for loss functions
3.5. Future Outlook
3.5.1. Multiple Infrared Drone Detection
3.5.2. Introduction of a Priori Clue Detection
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Bai, X.; Zhou, F.G. Analysis of new top-hat transformation and the application for infrared dim small target detection. Pattern Recognit. 2010, 43, 2145–2156. [Google Scholar] [CrossRef]
- Gao, C.; Meng, D.; Yang, Y.; Wang, Y.; Zhou, X.; Hauptmann, A.G. Infrared Patch-Image Model for Small Target Detection in a Single Image. IEEE Trans. Image Process. 2013, 22, 4996–5009. [Google Scholar] [CrossRef] [PubMed]
- Yan, F.; Xu, G.; Wang, J.; Wu, Q.; Wang, Z. Infrared Small Target Detection via Schatten Capped pNorm-Based Non-Convex Tensor Low-Rank Approximation. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
- Rosser, K.; Nguyen, T.X.B.; Moss, P.; Chahl, J. Low complexity visual drone track navigation using long-wavelength infrared. J. Field Robot. 2021, 38, 882–897. [Google Scholar] [CrossRef]
- Zheng, L.; Peng, Y.; Ye, Z.; Jiang, R.; Zhou, T. Infrared Small drone Target Detection Algorithm Based on Enhanced Adaptive Feature Pyramid Networks. IEEE Access 2022, 10, 115988–115995. [Google Scholar]
- Fang, H.; Xia, M.; Zhou, G.; Chang, Y.; Yan, L. Infrared Small drone Target Detection Based on Residual Image Prediction via Global and Local Dilated Residual Networks. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer International Publishing: Cham, Switzerland, 2015. [Google Scholar]
- Li, X.; Hu, X.; Wang, Z.; Du, Z. Research progress of instance segmentation based on deep learning. Comput. Eng. Appl. 2021, 57, 60–67. [Google Scholar]
- Chen, H.; Sun, K.; Tian, Z.; Shen, C.; Huang, Y.; Yan, Y. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Dai, J.; He, K.; Sun, J. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Tian, Z.; Zhang, B.; Chen, H.; Shen, C. Instance and Panoptic Segmentation Using Conditional Convolutions. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 669–680. [Google Scholar] [CrossRef]
- Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT++ Better Real-Time Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1108–1121. [Google Scholar] [CrossRef]
- Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and fast instance segmentation. Adv. Neural Inf. Process. Syst. 2020, 33, 17721–17732. [Google Scholar]
- He, K.M.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. Available online: https://arxiv.org/pdf/2207.02696v1.pdf (accessed on 26 October 2022).
- Hu, J.; Ren, L.J.; Fei, X.; Sang, N. A review of infrared weak target detection algorithms. Meas. Control. Technol. 2018, 4–9. [Google Scholar]
- Li, J.H.; Zhang, P.; Wang, X.W.; Huang, S.Z. A review of infrared weak target detection algorithms. Chin. J. Graph. 2020, 25, 1739–1753. [Google Scholar]
- Wei, C.; Li, Q.; Xu, J.; Yang, J.; Jiang, S. DRUNet: A Method for Infrared Point Target Detection. Appl. Sci. 2022, 12, 9299. [Google Scholar] [CrossRef]
- Xu, H.; Zhong, S.; Zhang, T.; Zou, X. Multi-Scale Multi-Level Residual Feature Fusion for Real-Time Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5002116. [Google Scholar] [CrossRef]
- Xu, Y.; Wan, M.; Zhang, X.; Wu, J.; Chen, Y.; Chen, Q.; Gu, G. Infrared Small Target Detection Based on Local Contrast-Weighted Multidirectional Derivative. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5000816. [Google Scholar] [CrossRef]
- Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense Nested Attention Network for Infrared Small Target Detection. IEEE Trans. Image Process. 2023, 32, 1745–1758. [Google Scholar] [CrossRef]
- Wang, Z.; Yang, J.; Pan, Z.; Liu, Y.; Lei, B.; Hu, Y. APAFNet: Single-Frame Infrared Small Target Detection by Asymmetric Patch Attention Fusion. IEEE Geosci. Remote Sens. Lett. 2023, 20, 7000405. [Google Scholar] [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual Generative Adversarial Networks for Small Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Kim, J.-H.; Hwang, Y. GAN-Based Synthetic Data Augmentation for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5002512. [Google Scholar] [CrossRef]
- Noh, J.; Bae, W.; Lee, W.; Seo, J.; Kim, G. Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.M.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
- Cui, L.S.; Ma, R.; Lv, P.; Jiang, X.C.; Gao, Z.M.; Zhou, B.; Xu, M.L. MDSSD: Multi-scale deconvolutional single shot detector for small objects. Sci. China-Inf. Sci. 2020, 63, 3. [Google Scholar] [CrossRef]
- Zagoruyko, S.; Lerer, A.; Lin, T.-Y.; Pinheiro, P.O.; Gross, S.; Chintala, S.; Dollar, P. A multipath network for object detection. arXiv 2016, arXiv:1604.02135. [Google Scholar]
- Guan, L.T.; Wu, Y.; Zhao, J.Q. SCAN: Semantic Context Aware Network for Accurate Small Object Detection. Int. J. Comput. Intell. Syst. 2018, 11, 951–961. [Google Scholar] [CrossRef]
- Hu, P.; Ramanan, D. Finding tiny faces. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia (MM '16), Amsterdam, The Netherlands, 15–19 October 2016. [Google Scholar]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the 34th AAAI Conference on Artificial Intelligence/32nd Innovative Applications of Artificial Intelligence Conference/10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
- Yu, X.; Gong, Y.; Jiang, N.; Ye, Q.; Han, Z. Scale Match for Tiny Person Detection. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020. [Google Scholar]
- Hui, B.W.; Song, Z.Y.; Fan, H.Q.; Zhong, P.; Hu, W.D.; Zhang, X.F.; Ling, J.G.; Su, H.Y.; Jin, W.; Zhang, Y.J.; et al. Weak aircraft target detection and tracking dataset with infrared images in ground/air context. Chin. Sci. Data 2020, 5, 291–302. [Google Scholar]
- Xu, S.-Y.; Chu, K.-B.; Zhang, J.; Feng, C.-T. An improved YOLOv3 algorithm for small target detection. Electro-Opt. Control. 2022, 29, 35–39. [Google Scholar]
City scene:

| Methods | AP | AP50 | APs | Para (M) | FPS |
|---|---|---|---|---|---|
| BlendMask | 0.621 | 0.940 | 0.613 | 287.6 | 8.2 |
| BoxInst | 0.197 | 0.538 | 0.197 | 273.8 | 9.6 |
| CondInst | 0.565 | 0.936 | 0.564 | 272.6 | 9.6 |
| SOLOv2 | 0.620 | 0.936 | 0.607 | 372.0 | 8.6 |
| Mask R-CNN | 0.629 | 0.937 | 0.621 | 353.3 | 3.97 |
| YOLOv5 | 0.477 | 0.929 | 0.469 | 15.2 | 29.68 |
| YOLOv7 | 0.440 | 0.877 | 0.435 | 76.3 | 16.28 |
| YOLACT++ | 0.423 | 0.902 | -- | 199.0 | 10.77 |
| Ours | 0.473 | 0.939 | 0.469 | 15.9 | 25.91 |
Mountain scene:

| Methods | AP | AP50 | APs | Para (M) | FPS |
|---|---|---|---|---|---|
| BlendMask | 0.423 | 0.775 | 0.423 | 287.6 | 8.99 |
| BoxInst | -- | 0.013 | -- | 273.8 | 10.26 |
| CondInst | 0.284 | 0.731 | 0.284 | 272.6 | 10.35 |
| SOLOv2 | -- | -- | -- | 372.0 | 9.06 |
| Mask R-CNN | 0.416 | 0.749 | 0.416 | 353.3 | 4.04 |
| YOLOv5 | 0.278 | 0.760 | 0.278 | 15.2 | 40.08 |
| YOLOv7 | 0.269 | 0.746 | 0.269 | 76.3 | 20.79 |
| YOLACT++ | 0.177 | 0.625 | -- | 199.0 | 11.93 |
| Ours | 0.277 | 0.798 | 0.277 | 15.9 | 29.03 |
Sea surface scene:

| Methods | AP | AP50 | APs | Para (M) | FPS |
|---|---|---|---|---|---|
| BlendMask | 0.455 | 0.842 | 0.456 | 287.6 | 7.83 |
| BoxInst | -- | -- | -- | 273.8 | 8.97 |
| CondInst | 0.292 | 0.819 | 0.292 | 272.6 | 8.86 |
| SOLOv2 | -- | -- | -- | 372.0 | 8.08 |
| Mask R-CNN | 0.463 | 0.933 | 0.463 | 353.3 | 4.04 |
| YOLOv5 | 0.334 | 0.894 | 0.334 | 15.2 | 20.64 |
| YOLOv7 | 0.355 | 0.930 | 0.355 | 76.3 | 13.82 |
| YOLACT++ | 0.163 | 0.445 | -- | 199.0 | 9.42 |
| Ours | 0.375 | 0.934 | 0.375 | 15.9 | 16.01 |
Sky background:

| Methods | AP | AP50 | APs | Para (M) | FPS |
|---|---|---|---|---|---|
| BlendMask | 0.725 | 0.986 | 0.717 | 287.6 | 7.81 |
| BoxInst | 0.395 | 0.806 | 0.397 | 273.8 | 9.06 |
| CondInst | 0.673 | 0.977 | 0.648 | 272.6 | 8.91 |
| SOLOv2 | 0.686 | 0.934 | 0.664 | 372.0 | 7.95 |
| Mask R-CNN | 0.711 | 0.987 | 0.703 | 351.3 | 4.03 |
| YOLOv5 | 0.592 | 0.977 | 0.570 | 15.2 | 21.68 |
| YOLOv7 | 0.580 | 0.974 | 0.561 | 76.3 | 14.61 |
| YOLACT++ | 0.561 | 0.958 | -- | 199.0 | 9.74 |
| Ours | 0.593 | 0.977 | 0.578 | 15.9 | 16.43 |
Per-sequence signal-to-clutter ratio (SCR) and scene average:

| Scenes | Sec1 | Sec2 | Sec3 | Sec4 | Sec5 | Average |
|---|---|---|---|---|---|---|
| Urban | 2.950 | 1.848 | 6.366 | 0.686 | 0.622 | 2.494 |
| Mountain | 0.880 | 0.506 | 0.283 | 1.280 | 1.822 | 0.954 |
| Sea | 5.219 | 2.255 | 3.952 | 4.852 | 3.744 | 4.004 |
| Sky | 3.769 | 5.898 | 3.790 | 2.600 | 7.594 | 4.730 |
| With/without IRSTDM | AP (Mountain) | AP50 (Mountain) | APs (Mountain) | AP (Sea) | AP50 (Sea) | APs (Sea) |
|---|---|---|---|---|---|---|
| With | 0.277 | 0.798 | 0.277 | 0.375 | 0.934 | 0.375 |
| Without | 0.279 | 0.773↓ | 0.279 | 0.339↓ | 0.898↓ | 0.339↓ |
| β | AP (Mountain) | AP50 (Mountain) | APs (Mountain) | AP (Sea) | AP50 (Sea) | APs (Sea) |
|---|---|---|---|---|---|---|
| -- | 0.278 | 0.760 | 0.278 | 0.345 | 0.912 | 0.345 |
| 0.1 | 0.278 | 0.760 | 0.278 | 0.345 | 0.912 | 0.345 |
| 0.2 | 0.275 | 0.767 | 0.275 | 0.346 | 0.914 | 0.346 |
| 0.3 | 0.286 | 0.792 | 0.286 | 0.348 | 0.894 | 0.348 |
| 0.4 | 0.293 | 0.793 | 0.293 | 0.311 | 0.910 | 0.311 |
| 0.5 | 0.284 | 0.783 | 0.284 | 0.351 | 0.911 | 0.351 |
| 0.6 | 0.279 | 0.773 | 0.279 | 0.339 | 0.898 | 0.339 |
| 0.7 | 0.282 | 0.748 | 0.282 | 0.368 | 0.905 | 0.368 |
| 0.8 | 0.286 | 0.771 | 0.286 | 0.359 | 0.934 | 0.359 |
| 0.9 | 0.273 | 0.732 | 0.273 | 0.334 | 0.906 | 0.334 |
| 1.0 | 0.275 | 0.732 | 0.275 | 0.346 | 0.889 | 0.346 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yuan, S.; Sun, B.; Zuo, Z.; Huang, H.; Wu, P.; Li, C.; Dang, Z.; Zhao, Z. IRSDD-YOLOv5: Focusing on the Infrared Detection of Small Drones. Drones 2023, 7, 393. https://doi.org/10.3390/drones7060393