
LODNU: lightweight object detection network in UAV vision

Published in The Journal of Supercomputing

Abstract

With the development of unmanned aerial vehicle (UAV) technology, using UAVs for object detection has become a research focus. However, most current deep-learning-based object detection algorithms are resource-intensive and difficult to deploy on embedded devices with limited memory and computing power, such as UAVs. To meet these challenges, this paper proposes a lightweight object detection network in UAV vision (LODNU) based on YOLOv4, which meets the requirements of resource-constrained devices while preserving detection accuracy. Building on YOLOv4, LODNU reconstructs the backbone network with depth-wise separable convolutions to reduce the number of model parameters, and embeds improved coordinate attention in the backbone to strengthen the extraction of key object features. Meanwhile, an adaptive scale weighted feature fusion module is added to the path aggregation network to improve the accuracy of multi-scale object detection. In addition, to balance the proportions of sample sizes, we propose a patching data augmentation method. LODNU achieves performance close to that of YOLOv4 with far fewer parameters. Compared with YOLOv4, LODNU strikes a better balance between model size and accuracy on the VisDrone2019 dataset. Experiments show that LODNU has only 13.1% of the parameters and 13.6% of the floating-point operations of YOLOv4. The mean average precision of LODNU on the VisDrone2019 dataset is 31.4%, which is 77% of that of YOLOv4.
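The parameter savings from replacing standard convolutions with depth-wise separable convolutions, as LODNU does in its backbone, can be illustrated with a simple count. A standard k×k convolution uses k·k·C_in·C_out weights, while a depth-wise separable convolution uses k·k·C_in (depth-wise) plus C_in·C_out (point-wise 1×1) weights. The sketch below is illustrative only; the function names and the example layer sizes are our own and do not come from the paper:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dws_conv_params(k, c_in, c_out):
    """Weights in a depth-wise separable convolution:
    one k x k depth-wise filter per input channel,
    followed by a 1x1 point-wise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 256 input and 256 output channels.
std = conv_params(3, 256, 256)      # 589,824 weights
dws = dws_conv_params(3, 256, 256)  # 67,840 weights
print(f"reduction factor: {std / dws:.1f}x")  # roughly 8.7x
```

For a 3×3 kernel the reduction factor approaches k² = 9 as the channel count grows, which is consistent in spirit with the roughly 7–8× parameter reduction the paper reports for the full network (where other modules also contribute).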


Availability of data and materials

Not applicable.

References

  1. Gupta H, Verma OP (2022) Monitoring and surveillance of urban road traffic using low altitude drone images: a deep learning approach. Multimed Tools Appl 81(14):19683–19703

  2. Wang S, Zhao J, Ta N, Zhao X, Xiao M, Wei H (2021) A real-time deep learning forest fire monitoring algorithm based on an improved pruned+ kd model. J Real Time Image Proc 18(6):2319–2329

  3. Huang Z, Zhang T, Liu P, Lu X (2020) Outdoor independent charging platform system for power patrol UAV. In: 2020 12th IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), pp 1–5

  4. Han J, Zhang D, Cheng G et al (2014) Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning. IEEE Trans Geosci Remote Sens 53(6):3325–3337

  5. Shi Z, Yu X, Jiang Z et al (2013) Ship detection in high-resolution optical imagery based on anomaly detector and local shape feature. IEEE Trans Geosci Remote Sens 52(8):4511–4523

  6. Everingham M, Eslami S, Van Gool L et al (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136

  7. Russakovsky O, Deng J, Su H et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252

  8. Lin T-Y, Maire M, Belongie S et al (2014) Microsoft coco: common objects in context. In: European Conference on Computer Vision. Springer, pp 740–755

  9. Liu L, Ouyang W, Wang X et al (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318

  10. He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778

  11. Huang G, Liu Z, Van Der Maaten L et al (2017) Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4700–4708

  12. Girshick R, Donahue J, Darrell T et al (2015) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142–158

  13. Lin T-Y, Goyal P, Girshick R et al (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2980–2988

  14. Lin T-Y, Dollár P, Girshick R et al (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and pattern Recognition, pp 2117–2125

  15. Liu S, Qi L, Qin H et al (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 8759–8768

  16. Liu W, Anguelov D, Erhan D et al (2016) Ssd: single shot multibox detector. In: European Conference on Computer Vision. Springer, pp 21–37

  17. Redmon J, Divvala S, Girshick R et al (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 779–788

  18. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934

  19. Chen Y, Chen X, Chen L, He D, Zheng J, Xu C, Lin Y, Liu L (2021) UAV lightweight object detection based on the improved yolo algorithm. In: Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering, pp 1502–1506

  20. Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13713–13722

  21. Ren S, He K, Girshick R et al (2017) Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149

  22. Dai J, Li Y, He K et al (2016) R-fcn: object detection via region-based fully convolutional networks. Adv Neural Inf Process Syst 29:379–387

  23. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141

  24. Yu F, Wang D, Shelhamer E et al (2018) Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2403–2412

  25. Kim S-W, Kook H-K, Sun J-Y et al (2018) Parallel feature pyramid network for object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 234–250

  26. Ghiasi G, Lin T-Y, Le QV (2019) Nas-fpn: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7036–7045

  27. Iandola FN, Han S, Moskewicz MW et al (2016) Squeezenet: alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360

  28. Howard AG, Zhu M, Chen B et al (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

  29. Sandler M, Howard A, Zhu M et al (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4510–4520

  30. Howard A, Sandler M, Chu G et al (2019) Searching for mobilenetv3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 1314–1324

  31. Zhang X, Zhou X, Lin M et al (2018) Shufflenet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6848–6856

  32. Ma N, Zhang X, Liu M et al (2021) Activate or not: learning customized activation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8032–8042

  33. Ramachandran P, Zoph B, Le QV (2017) Searching for activation functions. arXiv preprint arXiv:1710.05941

  34. He K, Zhang X, Ren S et al (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1026–1034

  35. Selvaraju RR, Cogswell M, Das A et al (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp 618–626

  36. Wang C-Y, Bochkovskiy A, Liao H-YM (2021) Scaled-yolov4: scaling cross stage partial network. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 13024–13033

  37. Jocher G (2020) Yolov5. https://github.com/ultralytics/yolov5. Accessed 26 Nov 2021

  38. Sun W, Dai L, Zhang X, Chang P, He X (2022) RSOD: real-time small object detection algorithm in uav-based traffic monitoring. Appl Intell 52(8):8448–8463

  39. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 10012–10022

  40. Wang H, Wu Z, Liu Z, Cai H, Zhu L, Gan C, Han S (2020) Hat: hardware-aware transformers for efficient natural language processing. arXiv preprint arXiv:2005.14187

  41. Mehta S, Ghazvininejad M, Iyer S, Zettlemoyer L, Hajishirzi H (2020) Delight: deep and light-weight transformer. arXiv preprint arXiv:2008.00623

Acknowledgements

This study is supported by Open Project of Key Laboratory of Ministry of Public Security for Road Traffic Safety, No.2021Zdsyskfkt04.

Funding

This study is supported by Open Project of Key Laboratory of Ministry of Public Security for Road Traffic Safety, No.2021Zdsyskfkt04.

Author information

Authors and Affiliations

Authors

Contributions

NC wrote the main manuscript text. YL provided some suggestions for revision of the manuscript. ZY provided funding. ZL suggested the structure of the manuscript. SW provided some support on the experimental equipment. JW gave some help to the typesetting of the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yan Li.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Code availability

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, N., Li, Y., Yang, Z. et al. LODNU: lightweight object detection network in UAV vision. J Supercomput 79, 10117–10138 (2023). https://doi.org/10.1007/s11227-023-05065-x

