Dynamic multi-scale loss optimization for object detection

Published in: Multimedia Tools and Applications

Abstract

As deep object detectors continue to improve through advanced model architectures, imbalance problems in the training process have received growing attention. Multi-scale detection is a common paradigm in object detection frameworks, yet each scale is typically treated equally during training. In this paper, we study the objective imbalance in multi-scale detector training. We argue that the losses at different scale levels are neither equally important nor independent. Unlike existing solutions that set multi-task weights, we dynamically optimize the loss weight of each scale level during training. Specifically, we propose Adaptive Variance Weighting (AVW) to balance the multi-scale loss according to its statistical variance. We then develop a novel Reinforcement Learning Optimization (RLO) to decide the weighting scheme probabilistically during training. The method makes better use of the multi-scale training loss without adding computational complexity or learnable parameters for backpropagation. Without bells and whistles, the proposed method improves ATSS by 0.9 AP on the MS COCO benchmark. It also achieves 82.1 mAP on the Pascal VOC 2007 test set, outperforming other reinforcement-learning-based methods.
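The abstract does not give the formulas behind AVW or RLO, so the following is only a minimal sketch of the two general ideas it names: weighting per-scale losses by a variance statistic, and choosing a weighting scheme probabilistically from a reward signal. Everything here is an assumption made for illustration, not the authors' method: the class names `VarianceScaleWeighter` and `SchemeSelector`, the sliding window, the choice to give larger weights to levels with larger loss variance, and the gradient-bandit style preference update.

```python
import math
import random
from collections import deque
from statistics import pvariance


class VarianceScaleWeighter:
    """Hypothetical helper: weights each scale level by the variance of its recent losses."""

    def __init__(self, num_levels, window=50, eps=1e-12):
        # One bounded loss history per scale level (e.g. per pyramid level).
        self.history = [deque(maxlen=window) for _ in range(num_levels)]
        self.eps = eps

    def update(self, level_losses):
        # Record the latest scalar loss of every scale level.
        for hist, loss in zip(self.history, level_losses):
            hist.append(float(loss))

    def weights(self):
        n = len(self.history)
        # Fall back to equal weights until every level has at least two samples.
        if any(len(h) < 2 for h in self.history):
            return [1.0] * n
        variances = [pvariance(h) for h in self.history]
        total = sum(variances)
        if total <= self.eps:
            return [1.0] * n
        # Assumed direction: higher variance gives a higher weight.
        # Weights are normalised to sum to n, so their mean is 1.
        return [n * v / total for v in variances]


class SchemeSelector:
    """Hypothetical softmax policy over candidate weighting schemes."""

    def __init__(self, schemes, lr=0.1, seed=0):
        self.schemes = list(schemes)
        self.prefs = [0.0] * len(self.schemes)
        self.lr = lr
        self.rng = random.Random(seed)

    def probabilities(self):
        # Numerically stable softmax over the preferences.
        m = max(self.prefs)
        exps = [math.exp(p - m) for p in self.prefs]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self):
        # Draw the index of a scheme according to the current probabilities.
        indices = range(len(self.schemes))
        return self.rng.choices(indices, weights=self.probabilities())[0]

    def reward(self, index, value):
        # Gradient-bandit style update without a baseline: raise the
        # preference of the chosen scheme in proportion to the reward.
        probs = self.probabilities()
        for i in range(len(self.prefs)):
            indicator = 1.0 if i == index else 0.0
            self.prefs[i] += self.lr * value * (indicator - probs[i])
```

In a training loop, one might call `update` with the per-level losses at each step, multiply each level's loss by the matching entry of `weights()` before summing, and feed `reward` with some measure of training progress. Which reward the paper actually uses is not stated in the abstract.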

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61572214 and by the Seed Foundation of Huazhong University of Science and Technology (2020kfyXGYJ114).

Author information

Corresponding author

Correspondence to Qi Feng.

Ethics declarations

Conflict of Interests

All the authors have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. The authors declare that they have no conflict of interest. Each author certifies that this manuscript has not been and will not be submitted to or published in any other publication.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Luo, Y., Cao, X., Zhang, J. et al. Dynamic multi-scale loss optimization for object detection. Multimed Tools Appl 82, 2349–2367 (2023). https://doi.org/10.1007/s11042-022-13164-9

  • DOI: https://doi.org/10.1007/s11042-022-13164-9
