CALYOLOv4: lightweight YOLOv4 target detection based on coordinated attention

Huilin Wang¹, Huaming Qian¹, Shuai Feng¹ & Shuya Yan¹

Abstract

YOLOv4, a widely used deep learning-based target detection algorithm, performs many redundant convolutional computations. These consume substantial memory and computational resources, which makes the algorithm difficult to deploy on mobile devices with limited computing power and storage. To address this problem, we propose CALYOLOv4, a lightweight YOLOv4 target detection algorithm based on coordinated attention. First, we replace the CSPDarknet53 backbone feature extraction network with MobileNetv2CA, which incorporates a coordinated attention mechanism, to reduce the number of network parameters and strengthen the network's attention. Second, we replace the standard convolutions in the network with depthwise separable convolutions and mixed depthwise convolutions (MixConv), further reducing the network's parameters and computation. Finally, we replace PANet with a weighted bidirectional feature pyramid network (BiFPN) as the feature fusion network so that features at different scales are fused fully. Tests on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv4, CALYOLOv4 has 89.1% fewer total model parameters and is 1.71 times faster, reaching 65 frames per second on an NVIDIA GeForce RTX 3060, with detection accuracies of 81.0% and 29.6%, respectively, achieving the best balance of accuracy and speed. These results demonstrate the feasibility and effectiveness of the proposed algorithm.
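To make the parameter savings from the second step concrete, the following is a minimal sketch that counts the weights in a standard convolution, a depthwise separable convolution, and a MixConv-style variant. The function names and the layer widths (256 input channels, 512 output channels) are hypothetical and chosen only for illustration; they are not taken from the paper, and pairing the MixConv depthwise stage with a 1 x 1 pointwise convolution is an assumption of this sketch.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def mixconv_separable_params(c_in, c_out, kernel_sizes):
    """Depthwise stage split into equal channel groups, one kernel size per group.

    Assumption of this sketch: the mixed depthwise stage is followed by a
    1 x 1 pointwise conv, mirroring the depthwise separable layout.
    """
    groups = len(kernel_sizes)
    if c_in % groups != 0:
        raise ValueError("c_in must be divisible by the number of kernel sizes")
    per_group = c_in // groups
    depthwise = sum(k * k * per_group for k in kernel_sizes)
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out = 256, 512  # hypothetical layer widths, not from the paper
    std = standard_conv_params(c_in, c_out, 3)
    dws = depthwise_separable_params(c_in, c_out, 3)
    mix = mixconv_separable_params(c_in, c_out, [3, 5])
    print(f"standard 3x3 conv:        {std:,d} weights")
    print(f"depthwise separable 3x3:  {dws:,d} weights ({dws / std:.1%} of standard)")
    print(f"MixConv (3x3 and 5x5):    {mix:,d} weights ({mix / std:.1%} of standard)")
```

For a k x k kernel, the ratio of depthwise separable to standard weights is 1/c_out + 1/k², so a single 3 x 3 layer shrinks to roughly one ninth of its original size when c_out is large. The 89.1% reduction reported in the abstract is for the whole network and also reflects the backbone and feature fusion changes, so it cannot be derived from this per-layer count alone.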

Availability of data and materials

Research data are not shared.

Funding

This work was supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).

Author information

Authors and Affiliations

College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, China
Huilin Wang, Huaming Qian, Shuai Feng & Shuya Yan

Contributions

HW wrote the main manuscript text. SF and SY revised the syntax. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H., Qian, H., Feng, S. et al. CALYOLOv4: lightweight YOLOv4 target detection based on coordinated attention. J Supercomput 79, 18947–18969 (2023). https://doi.org/10.1007/s11227-023-05380-3

Accepted: 03 May 2023
Published: 22 May 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11227-023-05380-3