Abstract
With the development of unmanned aerial vehicle (UAV) technology, using UAVs to detect objects has become a research focus. However, most current deep-learning-based object detection algorithms are resource intensive, making them difficult to deploy on embedded devices with limited memory and computing power, such as UAVs. To meet these challenges, this paper proposes a lightweight object detection network in UAV vision (LODNU) based on YOLOv4, which can meet the application requirements of resource-constrained devices while maintaining detection accuracy. Building on YOLOv4, LODNU uses depth-wise separable convolution to reconstruct the backbone network and reduce the number of model parameters, and embeds an improved coordinate attention module in the backbone to strengthen the extraction of key object features. Meanwhile, an adaptive scale-weighted feature fusion module is added to the path aggregation network to improve the accuracy of multi-scale object detection. In addition, to balance the sample sizes across classes, we propose a patching data augmentation method. LODNU achieves performance close to that of YOLOv4 with far fewer parameters, striking a better balance between model size and accuracy on the VisDrone2019 dataset. Experimental results show that LODNU has only 13.1% of the parameters and 13.6% of the floating-point operations of YOLOv4. The mean average precision of LODNU on the VisDrone2019 dataset is 31.4%, which is 77% of that of YOLOv4.
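As context for the parameter savings the abstract attributes to depth-wise separable convolution, the sketch below (not from the paper; the 256-channel, 3 × 3 layer shape is an illustrative assumption) compares the parameter count of a standard convolution with its depth-wise separable factorization:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    # Standard convolution: one k x k filter per (input, output) channel pair.
    return c_in * c_out * k * k

def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    # Depth-wise step: one k x k filter applied per input channel,
    # followed by a 1x1 point-wise convolution that mixes channels.
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative layer: 256 -> 256 channels, 3x3 kernel (biases omitted).
std = conv_params(256, 256, 3)          # 589,824 parameters
sep = dw_separable_params(256, 256, 3)  # 67,840 parameters
print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For this layer the factorization keeps roughly 11.5% of the parameters, which is the same order of reduction the paper reports for the whole model (13.1% of YOLOv4's parameters), though LODNU's exact savings also depend on the attention and fusion modules it adds.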
Availability of data and materials
Not applicable.
Acknowledgements
This study is supported by Open Project of Key Laboratory of Ministry of Public Security for Road Traffic Safety, No.2021Zdsyskfkt04.
Funding
This study is supported by Open Project of Key Laboratory of Ministry of Public Security for Road Traffic Safety, No.2021Zdsyskfkt04.
Contributions
NC wrote the main manuscript text. YL provided suggestions for revising the manuscript. ZY provided funding. ZL suggested the structure of the manuscript. SW provided support with the experimental equipment. JW helped with the typesetting of the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Code availability
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, N., Li, Y., Yang, Z. et al. LODNU: lightweight object detection network in UAV vision. J Supercomput 79, 10117–10138 (2023). https://doi.org/10.1007/s11227-023-05065-x