
Real-time detection algorithm for non-motorized vehicles based on D-YOLO model

  • Collection: 1229: Multimedia Data Analysis for Smart City Environment Safety
  • Published in: Multimedia Tools and Applications

Abstract

In complex traffic scenarios, a fast and precise real-time detection system for non-motorized vehicles is crucial for safe driving. D-YOLO is a lightweight real-time detection method for non-motorized vehicles based on an enhanced YOLOv4-tiny. Because the computing capability of mobile devices is typically constrained, we begin by reducing the number of model parameters. We then add dilated convolution and depthwise separable convolution to the network's Cross Stage Partial network (CSPNet) to produce DCSPNet, which offers improved performance. Coordinate Attention (CA) is introduced to strengthen the network's ability to extract effective features, and a Spatial Pyramid Pooling (SPP) module is added to the model's neck network to enhance the feature representation of non-motorized vehicles in the feature layer. Finally, we evaluate the proposed model on our dataset. The experimental results show that D-YOLO has a model size of only 6.7 MB, which is 16.5 MB smaller than YOLOv4-tiny; its detection speed is about 25% faster than that of YOLOv4-tiny; it has approximately 58% fewer model parameters; and it achieves a mAP of 70.36%, which is 2.01% higher than YOLOv4-tiny. D-YOLO thus delivers both accuracy and real-time performance, satisfying the demands of real-time detection of non-motorized vehicles in intelligent traffic scenarios.
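The parameter savings reported above come largely from replacing standard convolutions with depthwise separable ones inside DCSPNet. The following sketch (illustrative only, not the authors' code; function names are our own) compares the weight counts of the two layer types, which shows why the substitution shrinks the model so sharply:

```python
# Hypothetical illustration: parameter counts for a standard k x k convolution
# versus a depthwise separable convolution (depthwise + 1x1 pointwise), the
# substitution D-YOLO makes inside its CSP blocks. Bias terms are ignored.

def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution: one k x k x c_in filter per output channel."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1 x 1 pointwise conv that mixes channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    c_in, c_out, k = 128, 256, 3  # example channel sizes, chosen for illustration
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(std, sep, round(sep / std, 3))  # the separable layer keeps ~1/9 + 1/c_out of the weights
```

For 3x3 kernels the ratio is roughly 1/9 plus 1/c_out, so stacking such layers throughout the backbone plausibly accounts for a parameter reduction of the magnitude the abstract reports.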


Code Availability

The code that supports the findings of this study is available from the corresponding author upon reasonable request.

Availability of data and materials

Datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.


Acknowledgements

This study was supported in part by grants from The National Natural Science Foundation of China, Grant/Award Numbers: 61461053, 61461054.


Author information

Contributions

Yushan Li: Conceptualization, Software, Writing – Original draft preparation.

Hongwei Ding: Supervision, Writing – Original draft preparation.

Peng Hu: Data curation, Visualization.

Zhijun Yang: Data curation, Investigation.

Guanbo Wang: Supervision, Software, Validation.

Corresponding author

Correspondence to Hongwei Ding.

Ethics declarations

Consent for Publication

The Author agrees to publication in the Journal indicated below and also to publication of the article in English by Springer in Springer's corresponding English-language journal.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Li, Y., Ding, H., Hu, P. et al. Real-time detection algorithm for non-motorized vehicles based on D-YOLO model. Multimed Tools Appl 83, 61673–61696 (2024). https://doi.org/10.1007/s11042-023-14385-2

