
Adaptive anchor box mechanism to improve the accuracy in the object detection system

Published in Multimedia Tools and Applications (2019)

Abstract

Recently, most state-of-the-art object detection systems have adopted an anchor box mechanism to simplify the detection model. The neural network only needs to regress the mapping from anchor boxes to ground-truth boxes; prediction boxes can then be computed from the network outputs and the default anchor boxes. However, as the problem becomes more complex, the number of default anchor boxes grows, with a large risk of over-fitting during training. In this paper, we adopt an adaptive anchor box mechanism in which one anchor box can cover more ground-truth boxes, so the network needs only a few adaptive anchor boxes to solve the same problem and the model becomes more robust. The sizes of the adaptive anchor boxes are adjusted automatically according to the depth collected by a Time-of-Flight (ToF) camera, and the network regresses the aspect ratios of the anchor boxes to obtain the final prediction boxes. The experimental results demonstrate that the proposed method produces more accurate detections: with the adaptive anchor box mechanism, the mean Average Precision (mAP) of the YOLO-v2 and YOLO-v3 networks increases markedly on open public datasets and on our self-built battery image dataset. Moreover, visual comparisons of the predictions illustrate that the proposed adaptive anchor box mechanism achieves better performance than the original anchor box mechanism.
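The core idea of the abstract — anchor box sizes that adapt to the depth reported by the ToF camera — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the inverse-depth scaling rule follows from the pinhole camera model (apparent size of a fixed-size object is inversely proportional to its distance), and the function name, base anchor size, and reference depth are all assumptions introduced here for illustration.

```python
def adaptive_anchor(base_w, base_h, depth, ref_depth=1.0):
    """Scale a base anchor box, defined at ref_depth, to the depth
    measured at the anchor's location.

    Under a pinhole camera model the apparent size of an object of
    fixed physical size is inversely proportional to its depth, so a
    single base anchor can cover objects at many distances instead of
    requiring one default anchor per scale.
    """
    scale = ref_depth / depth
    return base_w * scale, base_h * scale

# The same 32x32 base anchor shrinks for a farther object ...
print(adaptive_anchor(32.0, 32.0, depth=2.0))   # (16.0, 16.0)
# ... and grows for a nearer one.
print(adaptive_anchor(32.0, 32.0, depth=0.5))   # (64.0, 64.0)
```

The network then only needs to regress the aspect ratio (and small residual offsets) on top of these depth-scaled anchors, which is the reduction in free parameters the abstract credits for the improved robustness.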


[Figs. 1–13: images omitted in this preview.]


References

  1. Battery Dataset. Available online: https://github.com/shadowdyj/battery-object-dataset/tree/master

  2. Chen Z, Kim J (2019) Multi-scale pedestrian detection using skip pooling and recurrent convolution. Multimed Tools Appl 78(2):1719–1736

  3. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pp 886–893

  4. Girshick R (2015) Fast R-CNN. arXiv preprint arXiv:1504.08083

  5. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, pp 580–587

  6. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. Springer, Berlin, pp 346–361

  7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  8. Hussain KF et al (2018) A comprehensive study of the effect of spatial resolution and color of digital images on vehicle classification. IEEE Trans Intell Transp Syst

  9. Karaimer HC, Brown MS (2016) A software platform for manipulating the camera imaging pipeline. In: European Conference on Computer Vision. Springer, Cham, pp 429–444

  10. Kim J-U, Kang H-B (2018) A new 3D object pose detection method using LIDAR shape set. Sensors 18:882

  11. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, pp 1097–1105

  12. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016. Springer, Berlin, pp 21–37

  13. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110

  14. Mao J, Xiao T, Jiang Y, Cao Z (2017) What can help pedestrian detection? arXiv preprint arXiv:1705.02757v1

  15. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24:971–987

  16. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL Visual Object Classes Challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html (accessed on 21 June 2018)

  17. Pogorelov K, Riegler M, Eskeland SL et al (2017) Efficient disease detection in gastrointestinal videos – global features versus neural networks. Multimed Tools Appl (3):1–33

  18. Portaz M, Kohl M, Chevallet JP et al (2018) Object instance identification with fully convolutional networks. Multimed Tools Appl (2):1–18

  19. Ramanath R, Snyder WE, Yoo Y et al (2005) Color image processing pipeline. IEEE Signal Process Mag 22(1):34–43

  20. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, pp 779–788

  21. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  22. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767

  23. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of the Advances in Neural Information Processing Systems, Montreal, pp 91–99

  24. Lai K, Bo L, Ren X et al (2011) A large-scale hierarchical multi-view RGB-D object dataset. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation, pp 1817–1824

  25. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  26. Ye T, Wang B, Song P, Li J (2018) Automatic railway traffic object detection system using feature fusion refine neural network under shunting mode. Sensors 18:1916

  27. Yoganand AV, Kavida AC, Devi R (2018) Face detection approach from video with the aid of KPCM and improved neural network classifier. Multimed Tools Appl 77(24):31763–31785

  28. Zhao B, Zhao B, Tang L, Han Y, Wang W (2018) Deep spatial-temporal joint feature representation for video object detection. Sensors 18:774


Acknowledgments

This work was supported by the National Natural Science Foundation of China (61873077, U1609216 and 61806062) and by the Zhejiang Provincial Key Lab of Equipment Electronics.

Author information


Corresponding author

Correspondence to Yuxiang Yang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gao, M., Du, Y., Yang, Y. et al. Adaptive anchor box mechanism to improve the accuracy in the object detection system. Multimed Tools Appl 78, 27383–27402 (2019). https://doi.org/10.1007/s11042-019-07858-w

