
\(\eta\)-repyolo: real-time object detection method based on \(\eta\)-RepConv and YOLOv8

  • Research
  • Published:
Journal of Real-Time Image Processing

Abstract

Deep learning-based object detection methods often suffer from excessive model parameters, high complexity, and poor real-time performance. In response, the YOLO series, particularly YOLOv5s through YOLOv8s, was developed to balance real-time processing and accuracy. Nevertheless, YOLOv8's precision can fall short in certain applications. To address this, we introduce \(\eta\)-RepYOLO, a real-time object detection method built upon the \(\eta\)-RepConv structure and designed to maintain detection speed while improving accuracy. We first design a backbone network named \(\eta\)-EfficientRep, built from two purpose-designed units, the \(\eta\)-RepConv and the \(\eta\)-RepC2f module, which are re-parameterized to yield an efficient inference model that extracts detailed feature maps from images. We then propose the enhanced \(\eta\)-RepPANet and \(\eta\)-RepAFPN as the model's detection necks, incorporating \(\eta\)-RepC2f for optimized feature fusion and thereby strengthening the neck. Finally, we develop an advanced decoupled detection head in which \(\eta\)-RepConv replaces the traditional \(3 \times 3\) convolution, yielding a marked increase in detection precision at the inference stage. With the two neck variants, \(\eta\)-RepPANet and \(\eta\)-RepAFPN, \(\eta\)-RepYOLO achieves mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and AP of 45.3%/45.8% on the MS COCO dataset, respectively, a significant advance over YOLOv8s. The model parameters are reduced to 10.8M/8.8M, 3.6%/21.4% fewer than YOLOv8s, giving a more streamlined detection model.
The detection speeds measured on an RTX 3060 are 116 FPS/81 FPS, a substantial improvement over YOLOv8s. In summary, our approach delivers competitive performance and a more lightweight alternative to the SOTA YOLO models, making it a robust choice for real-time object detection applications.
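The re-parameterization the abstract refers to rests on the linearity of convolution: a training-time block with parallel \(3 \times 3\), \(1 \times 1\), and identity branches can be folded into a single \(3 \times 3\) kernel for inference, so the deployed model pays for only one convolution per block. The sketch below illustrates this RepVGG-style folding on a single-channel example using NumPy; it is a minimal illustration of the general principle, not the paper's \(\eta\)-RepConv itself, whose exact branch design is defined in the article.

```python
import numpy as np

def conv2d(x, k):
    """Naive 'same' 2D cross-correlation of a single-channel image with a 3x3 kernel."""
    H, W = x.shape
    xp = np.pad(x, 1)  # zero padding so output size matches input
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

k3 = rng.standard_normal((3, 3))              # 3x3 branch
k1 = rng.standard_normal((1, 1))              # 1x1 branch
ident = np.zeros((3, 3))
ident[1, 1] = 1.0                             # identity branch as a centered 3x3 kernel

# Training-time structure: sum of three parallel branches
y_multi = conv2d(x, k3) + conv2d(x, np.pad(k1, 1)) + x

# Inference-time structure: all branches folded into one 3x3 kernel
k_fused = k3 + np.pad(k1, 1) + ident
y_fused = conv2d(x, k_fused)

print(np.allclose(y_multi, y_fused))  # the fused conv reproduces the multi-branch output
```

In practice the folding also absorbs each branch's batch-normalization statistics into the fused weights and bias, so the inference-time block is a single plain convolution with no accuracy loss relative to the trained multi-branch block.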


Data availability

No datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported by Key-Area Research and Development Program of Guangdong Province under Grant (Funding No.: 2020B0909020001) and National Natural Science Foundation of China (Funding No.: 61573113).

Author information

Authors and Affiliations

Authors

Contributions

Shuai Feng wrote the main manuscript text. Wenna Wang, Huilin Wang, and Huaming Qian modified the syntax. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Feng, S., Qian, H., Wang, H. et al. \(\eta\)-repyolo: real-time object detection method based on \(\eta\)-RepConv and YOLOv8. J Real-Time Image Proc 21, 81 (2024). https://doi.org/10.1007/s11554-024-01462-4
