
\(\eta\)-repyolo: real-time object detection method based on \(\eta\)-RepConv and YOLOv8

  • Research
  • Published:
Journal of Real-Time Image Processing

Abstract

Deep learning-based object detection methods often suffer from excessive model parameters, high complexity, and poor real-time performance. In response, the YOLO series, particularly YOLOv5s through YOLOv8s, was developed to balance real-time processing and accuracy. Nevertheless, YOLOv8's precision can fall short in certain applications. To address this, we introduce \(\eta\)-RepYOLO, a real-time object detection method built upon the \(\eta\)-RepConv structure and designed to maintain detection speed while improving accuracy. We first design a backbone network named \(\eta\)-EfficientRep, built from two purpose-designed units, the \(\eta\)-RepConv and the \(\eta\)-RepC2f module, which are re-parameterized to yield an efficient inference model that extracts detailed feature maps from images. We then propose the enhanced \(\eta\)-RepPANet and \(\eta\)-RepAFPN as the model's detection necks, incorporating \(\eta\)-RepC2f for optimized feature fusion and thereby strengthening the neck. Finally, we develop an advanced decoupled detection head in which \(\eta\)-RepConv replaces the traditional \(3 \times 3\) convolution, yielding a marked increase in detection precision at the inference stage. With the two neck variants, \(\eta\)-RepPANet and \(\eta\)-RepAFPN, \(\eta\)-RepYOLO achieves mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and AP of 45.3%/45.8% on the MS COCO dataset, respectively, a significant advance over YOLOv8s. The model parameters are reduced to 10.8M/8.8M, 3.6%/21.4% fewer than YOLOv8s, giving a more streamlined detection model.
The detection speeds measured on an RTX 3060 are 116 FPS/81 FPS, a substantial improvement over YOLOv8s. In summary, our approach delivers competitive performance and a more lightweight alternative to the SOTA YOLO models, making it a robust choice for real-time object detection applications.
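The re-parameterization the abstract refers to rests on the linearity of convolution: a training-time block with parallel \(3 \times 3\), \(1 \times 1\), and identity branches can be folded into a single \(3 \times 3\) kernel for inference, so the deployed model pays for only one convolution per block. The sketch below illustrates this RepVGG-style folding on a single-channel example using NumPy; it is a minimal illustration of the general principle, not the paper's \(\eta\)-RepConv itself, whose exact branch design is defined in the article.

```python
import numpy as np

def conv2d(x, k):
    """Naive 'same' 2D cross-correlation of a single-channel image with a 3x3 kernel."""
    H, W = x.shape
    xp = np.pad(x, 1)  # zero padding so output size matches input
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

k3 = rng.standard_normal((3, 3))              # 3x3 branch
k1 = rng.standard_normal((1, 1))              # 1x1 branch
ident = np.zeros((3, 3))
ident[1, 1] = 1.0                             # identity branch as a centered 3x3 kernel

# Training-time structure: sum of three parallel branches
y_multi = conv2d(x, k3) + conv2d(x, np.pad(k1, 1)) + x

# Inference-time structure: all branches folded into one 3x3 kernel
k_fused = k3 + np.pad(k1, 1) + ident
y_fused = conv2d(x, k_fused)

print(np.allclose(y_multi, y_fused))  # the fused conv reproduces the multi-branch output
```

In practice the folding also absorbs each branch's batch-normalization statistics into the fused weights and bias, so the inference-time block is a single plain convolution with no accuracy loss relative to the trained multi-branch block.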


Data availability

No datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported by Key-Area Research and Development Program of Guangdong Province under Grant (Funding No.: 2020B0909020001) and National Natural Science Foundation of China (Funding No.: 61573113).

Author information

Authors and Affiliations

Authors

Contributions

Shuai Feng wrote the main manuscript text. Wenna Wang, Huilin Wang, and Huaming Qian modified the syntax. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Feng, S., Qian, H., Wang, H. et al. \(\eta\)-repyolo: real-time object detection method based on \(\eta\)-RepConv and YOLOv8. J Real-Time Image Proc 21, 81 (2024). https://doi.org/10.1007/s11554-024-01462-4
