Abstract
Deep learning-based object detection methods often suffer from excessive model parameters, high complexity, and subpar real-time performance. In response, scholars have developed the YOLO series, particularly the YOLOv5s through YOLOv8s methods, to strike a balance between real-time processing and accuracy. Nevertheless, YOLOv8's precision can fall short in certain applications. To address this, we introduce a real-time object detection method called \(\eta\)-RepYOLO, built upon the \(\eta\)-RepConv structure and designed to maintain consistent detection speed while improving accuracy. We first craft a backbone network named \(\eta\)-EfficientRep, which uses the strategically designed \(\eta\)-RepConv unit and \(\eta\)-RepC2f module and is reparameterized to produce an efficient inference model; this model achieves superior performance by extracting detailed feature maps from images. We then propose the enhanced \(\eta\)-RepPANet and \(\eta\)-RepAFPN as the model's detection necks, adding \(\eta\)-RepC2f for optimized feature fusion and thus boosting the neck's functionality. We further develop an advanced decoupled detection head in which \(\eta\)-RepConv replaces the traditional \(3 \times 3\) conv, yielding a marked increase in detection precision at the inference stage. With the two neck modules, \(\eta\)-RepPANet and \(\eta\)-RepAFPN, the proposed \(\eta\)-RepYOLO achieves an mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and an AP of 45.3%/45.8% on the MS COCO dataset, respectively, a significant advancement over YOLOv8s. Additionally, the model parameters of \(\eta\)-RepYOLO are reduced to 10.8M/8.8M, 3.6%/21.4% fewer than those of YOLOv8s, culminating in a more streamlined detection model.
The detection speeds measured on an RTX 3060 are 116 FPS/81 FPS, a substantial improvement over YOLOv8s. In summary, our approach delivers competitive performance and offers a more lightweight alternative to the SOTA YOLO models, making it a robust choice for real-time object detection applications.
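The abstract's central mechanism is structural reparameterization: a multi-branch convolutional unit used during training is algebraically collapsed into a single convolution for inference, keeping accuracy while cutting latency. The exact \(\eta\)-RepConv branch layout is not specified in this excerpt, so the following is a minimal single-channel sketch of the general RepVGG-style idea it builds on, where a 3×3 branch, a 1×1 branch, and an identity branch fuse into one 3×3 kernel (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def conv2d(x, w):
    """Naive stride-1, 'same'-padded 2D cross-correlation of a single-channel map."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w3 = rng.standard_normal((3, 3))   # training-time 3x3 branch
w1 = rng.standard_normal((1, 1))   # training-time 1x1 branch

# Training-time multi-branch output: 3x3 branch + 1x1 branch + identity
y_branches = conv2d(x, w3) + conv2d(x, w1) + x

# Inference-time fusion: because convolution is linear, pad the 1x1 kernel
# into the centre of a 3x3 kernel and represent the identity as a 3x3
# kernel whose centre is 1, then sum -- one equivalent 3x3 conv remains.
w_fused = w3.copy()
w_fused[1, 1] += w1[0, 0] + 1.0
y_fused = conv2d(x, w_fused)

assert np.allclose(y_branches, y_fused)
```

In a real network the same linearity argument also folds each branch's batch normalization into the conv weights before summing, which is what makes the inference model both single-path and BN-free.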
Data availability
No datasets were generated or analyzed during the current study.
Acknowledgements
This work was supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B0909020001 and the National Natural Science Foundation of China under Grant 61573113.
Author information
Contributions
Shuai Feng wrote the main manuscript text. Wenna Wang, Huilin Wang, and Huaming Qian modified the syntax. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Feng, S., Qian, H., Wang, H. et al. \(\eta\)-repyolo: real-time object detection method based on \(\eta\)-RepConv and YOLOv8. J Real-Time Image Proc 21, 81 (2024). https://doi.org/10.1007/s11554-024-01462-4