
Adaptive anchor box mechanism to improve the accuracy in the object detection system

Published in Multimedia Tools and Applications (2019)

Abstract

Recently, most state-of-the-art object detection systems have adopted an anchor box mechanism to simplify the detection model. The neural network only needs to regress the mapping from anchor boxes to ground-truth boxes; prediction boxes can then be computed from the network outputs and the default anchor boxes. However, as the problem becomes more complex, the number of default anchor boxes grows, with a large risk of over-fitting during training. In this paper, we adopt an adaptive anchor box mechanism in which one anchor box can cover more ground-truth boxes, so the network needs only a few adaptive anchor boxes to solve the same problem and the model becomes more robust. The sizes of the adaptive anchor boxes are adjusted automatically according to the depth collected by a Time-of-Flight (ToF) camera, and the network regresses the aspect ratios of the anchor boxes to obtain the final prediction boxes. The experimental results demonstrate that the proposed method produces more accurate detections: with the adaptive anchor box mechanism, the mean Average Precision (mAP) of the YOLO-v2 and YOLO-v3 networks increases markedly on open public datasets and on our self-built battery image dataset. Moreover, visual comparisons of the predictions illustrate that the proposed adaptive anchor box mechanism achieves better performance than the original anchor box mechanism.
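The core idea of the abstract — anchor box sizes that adapt to the depth reported by the ToF camera — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the inverse-depth scaling rule follows from the pinhole camera model (apparent size of a fixed-size object is inversely proportional to its distance), and the function name, base anchor size, and reference depth are all assumptions introduced here for illustration.

```python
def adaptive_anchor(base_w, base_h, depth, ref_depth=1.0):
    """Scale a base anchor box, defined at ref_depth, to the depth
    measured at the anchor's location.

    Under a pinhole camera model the apparent size of an object of
    fixed physical size is inversely proportional to its depth, so a
    single base anchor can cover objects at many distances instead of
    requiring one default anchor per scale.
    """
    scale = ref_depth / depth
    return base_w * scale, base_h * scale

# The same 32x32 base anchor shrinks for a farther object ...
print(adaptive_anchor(32.0, 32.0, depth=2.0))   # (16.0, 16.0)
# ... and grows for a nearer one.
print(adaptive_anchor(32.0, 32.0, depth=0.5))   # (64.0, 64.0)
```

The network then only needs to regress the aspect ratio (and small residual offsets) on top of these depth-scaled anchors, which is the reduction in free parameters the abstract credits for the improved robustness.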


[Figs. 1–13: images omitted in this preview.]


References

  1. Battery Dataset. Available online: https://github.com/shadowdyj/battery-object-dataset/tree/master

  2. Chen Z, Kim J (2019) Multi-scale pedestrian detection using skip pooling and recurrent convolution. Multimed Tools Appl 78(2):1719–1736

  3. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pp 886–893

  4. Girshick R (2015) Fast R-CNN. arXiv preprint arXiv:1504.08083

  5. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, pp 580–587

  6. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. Springer, Berlin, pp 346–361

  7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  8. Hussain KF et al (2018) A comprehensive study of the effect of spatial resolution and color of digital images on vehicle classification. IEEE Trans Intell Transp Syst

  9. Karaimer HC, Brown MS (2016) A software platform for manipulating the camera imaging pipeline. In: European Conference on Computer Vision. Springer, Cham, pp 429–444

  10. Kim J-U, Kang H-B (2018) A new 3D object pose detection method using LIDAR shape set. Sensors 18:882

  11. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, pp 1097–1105

  12. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016. Springer, Berlin, pp 21–37

  13. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110

  14. Mao J, Xiao T, Jiang Y, Cao Z (2017) What can help pedestrian detection? arXiv preprint arXiv:1705.02757v1

  15. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24:971–987

  16. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL Visual Object Classes Challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html (accessed on 21 June 2018)

  17. Pogorelov K, Riegler M, Eskeland SL et al (2017) Efficient disease detection in gastrointestinal videos – global features versus neural networks. Multimed Tools Appl (3):1–33

  18. Portaz M, Kohl M, Chevallet JP et al (2018) Object instance identification with fully convolutional networks. Multimed Tools Appl (2):1–18

  19. Ramanath R, Snyder WE, Yoo Y et al (2005) Color image processing pipeline. IEEE Signal Process Mag 22(1):34–43

  20. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, pp 779–788

  21. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  22. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767

  23. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of the Advances in Neural Information Processing Systems, Montreal, pp 91–99

  24. Lai K, Bo L, Ren X et al (2011) A large-scale hierarchical multi-view RGB-D object dataset. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation, pp 1817–1824

  25. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  26. Ye T, Wang B, Song P, Li J (2018) Automatic railway traffic object detection system using feature fusion refine neural network under shunting mode. Sensors 18:1916

  27. Yoganand AV, Kavida AC, Devi R (2018) Face detection approach from video with the aid of KPCM and improved neural network classifier. Multimed Tools Appl 77(24):31763–31785

  28. Zhao B, Zhao B, Tang L, Han Y, Wang W (2018) Deep spatial-temporal joint feature representation for video object detection. Sensors 18:774


Acknowledgments

This work was supported by the National Natural Science Foundation of China (61873077, U1609216 and 61806062) and by the Zhejiang Provincial Key Lab of Equipment Electronics.

Author information


Corresponding author

Correspondence to Yuxiang Yang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gao, M., Du, Y., Yang, Y. et al. Adaptive anchor box mechanism to improve the accuracy in the object detection system. Multimed Tools Appl 78, 27383–27402 (2019). https://doi.org/10.1007/s11042-019-07858-w

