Abstract
In complex traffic scenarios, a fast and accurate real-time detection system for non-motorized vehicles is crucial for safe driving. D-YOLO is a lightweight real-time detection method for non-motorized vehicles based on an enhanced YOLOv4-tiny. Because the computing capability of mobile devices is typically constrained, we first reduce the number of model parameters. We then introduce dilated convolution and depthwise separable convolution into the network’s Cross Stage Partial network (CSPNet) to produce DCSPNet, which offers improved performance. Coordinate Attention (CA) is incorporated to strengthen the network’s ability to extract effective features, and a spatial pyramid pooling (SPP) module is added to the neck network to enrich the feature representation of non-motorized vehicles in the feature layers. Finally, we evaluate the proposed model on our dataset. The experimental results show that D-YOLO has a model size of only 6.7 MB, 16.5 MB smaller than YOLOv4-tiny; its detection speed is about 25% faster than that of YOLOv4-tiny; it has approximately 58% fewer model parameters; and its mAP of 70.36% is 2.01% higher than that of YOLOv4-tiny. These results show that D-YOLO delivers both accuracy and real-time performance, satisfying the demands of real-time detection of non-motorized vehicles in intelligent traffic scenarios.
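The parameter savings reported above stem largely from replacing standard convolutions with depthwise separable convolutions. A minimal sketch of the parameter-count arithmetic is given below; the channel widths (256 in, 256 out, 3 × 3 kernel) are illustrative values chosen for this example, not the actual layer sizes of D-YOLO.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Depthwise separable convolution: a k x k depthwise convolution
    (one filter per input channel) followed by a 1 x 1 pointwise
    convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 256, 256)  # 3*3*256*256 = 589,824 weights
dsc = dsc_params(3, 256, 256)   # 3*3*256 + 256*256 = 67,840 weights
print(std, dsc, round(std / dsc, 2))  # reduction factor of about 8.69x
```

For a k × k kernel the reduction factor approaches 1/c_out + 1/k², which is why swapping in depthwise separable convolutions shrinks the backbone substantially at a modest cost in representational power.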
Code Availability
The code that supports the findings of this study is available from the corresponding author upon reasonable request.
Availability of data and materials
Datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
This study was supported in part by grants from The National Natural Science Foundation of China, Grant/Award Numbers: 61461053, 61461054.
Funding
The National Natural Science Foundation of China, Grant/Award Numbers: 61461053, 61461054
Author information
Authors and Affiliations
Contributions
Yushan Li: Conceptualization, Software, Writing - original draft preparation.
Hongwei Ding: Supervision, Writing - original draft preparation.
Peng Hu: Data curation, Visualization.
Zhijun Yang: Data curation, Investigation.
Guanbo Wang: Supervision, Software, Validation.
Corresponding author
Ethics declarations
Consent for Publication
The author agrees to publication in the journal indicated below and also to publication of the article in English by Springer in Springer’s corresponding English-language journal.
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Ding, H., Hu, P. et al. Real-time detection algorithm for non-motorized vehicles based on D-YOLO model. Multimed Tools Appl 83, 61673–61696 (2024). https://doi.org/10.1007/s11042-023-14385-2