Abstract
As deep object detectors continue to improve through advanced model architectures, imbalance problems in the training process have received growing attention. Multi-scale detection is a common paradigm in object detection frameworks, yet each scale is treated equally during training. In this paper, we carefully study the objective imbalance in multi-scale detector training. We argue that the losses at different scale levels are neither equally important nor independent. Unlike existing solutions that set multi-task weights, we dynamically optimize the loss weight of each scale level during training. Specifically, we propose Adaptive Variance Weighting (AVW) to balance the multi-scale loss according to its statistical variance. We then develop a novel Reinforcement Learning Optimization (RLO) to decide the weighting scheme probabilistically during training. The proposed method makes better use of the multi-scale training loss without adding computational complexity or learnable parameters for backpropagation. Without bells and whistles, it improves ATSS by 0.9 AP on the MS COCO benchmark and achieves 82.1 mAP on the Pascal VOC 2007 test set, outperforming other reinforcement-learning-based methods.
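The idea of weighting each scale level's loss by a variance statistic can be illustrated with a minimal sketch. This is not the AVW formulation from the paper, which the abstract does not spell out. The function names, the use of a recent loss history per level, the choice to give larger weights to levels with larger variance, and the normalization to a mean weight of 1 are all assumptions made for illustration.

```python
from statistics import pvariance


def variance_weights(loss_history, eps=1e-12):
    """Return one weight per scale level from recent per-level loss values.

    loss_history: list with one entry per scale level; each entry is a
    non-empty list of recent loss values recorded for that level.
    Assumption (not from the paper): weights are proportional to the
    population variance of each level's loss and are rescaled so that
    they average to 1. Plain values are used, so no gradient flows
    through the weights.
    """
    variances = [pvariance(history) for history in loss_history]
    total = sum(variances)
    num_levels = len(variances)
    if total < eps:
        # All levels are flat: fall back to equal weighting.
        return [1.0] * num_levels
    return [num_levels * v / total for v in variances]


def weighted_multiscale_loss(level_losses, weights):
    """Combine the current per-level losses into one scalar training loss."""
    return sum(w * loss for w, loss in zip(weights, level_losses))


if __name__ == "__main__":
    # Hypothetical loss histories for three pyramid levels.
    history = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
    weights = variance_weights(history)
    print(weights)
    print(weighted_multiscale_loss([1.0, 1.0, 1.0], weights))
```

Because the weights here are computed from recorded loss values rather than learned, the sketch adds no parameters to backpropagation, in keeping with the property claimed in the abstract. The direction and exact form of the weighting would need to be taken from the paper itself.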
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61572214 and by the Seed Foundation of Huazhong University of Science and Technology (2020kfyXGYJ114).
Ethics declarations
Conflict of Interests
All authors have participated sufficiently in the work to take public responsibility for its content, including participation in the concept, design, analysis, writing, or revision of the manuscript. The authors declare that they have no conflict of interest. Each author certifies that this manuscript has not been and will not be submitted to or published in any other publication.
About this article
Cite this article
Luo, Y., Cao, X., Zhang, J. et al. Dynamic multi-scale loss optimization for object detection. Multimed Tools Appl 82, 2349–2367 (2023). https://doi.org/10.1007/s11042-022-13164-9
DOI: https://doi.org/10.1007/s11042-022-13164-9