research-article

Decouple and align classification and regression in one-stage object detection

Published: 18 December 2023

Abstract

Current one-stage object detectors use dense prediction to produce classification and regression results at the same point on the feature map. Because the two tasks have different attributes, they are typically trained with separate detection heads, which may attend to different feature regions. Yet both predictions describe the same object, and in post-processing in particular they are expected to be consistent with each other. This inherent contradiction can seriously degrade detector performance. To solve this problem, we propose a flexible and effective one-stage object detector that decouples and aligns classification and regression (DAOD), treating the two subtasks from several complementary aspects. Specifically, we first propose a regression subtask spatial decouple module that addresses the spatial sensitivity of regression by efficiently sampling information from the regression result map to strengthen localization. Then, we propose a dynamic aligned label assignment strategy for sample selection, which guides the network to focus on better-aligned features during training. Finally, we introduce harmonic supervision to align the two results while preserving the independence of each task. Extensive experiments on the COCO dataset demonstrate the effectiveness of DAOD at negligible additional overhead. Notably, DAOD with a ResNeXt-101-64×4d-DCN backbone achieves 50.0 AP on MS-COCO test-dev with single-model, single-scale testing.
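The abstract does not spell out how the dynamic aligned label assignment scores candidate locations. As a rough illustration only, the sketch below borrows the task-alignment metric from TOOD, t = s^α · u^β, where s is the predicted classification score for the ground-truth class and u is the IoU between the predicted box and the ground-truth box; DAOD's actual strategy may differ. The function and parameter names (`assign_aligned_positives`, `top_k`, `alpha`, `beta`) and the demo values are hypothetical.

```python
# Minimal sketch of alignment-aware label assignment for one ground-truth box.
# ASSUMPTION: the abstract gives no formula for DAOD's assignment, so this uses
# a TOOD-style task-alignment metric t = s**alpha * u**beta as a stand-in.
# All names (assign_aligned_positives, top_k, alpha, beta) are hypothetical.
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]              # (x, y) location in image coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def inside(p: Point, box: Box) -> bool:
    """True if point p lies inside (or on the border of) box."""
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def assign_aligned_positives(
    points: Sequence[Point],
    cls_scores: Sequence[float],  # predicted score for the gt class at each point
    pred_boxes: Sequence[Box],    # box regressed at each point
    gt_box: Box,
    top_k: int = 3,
    alpha: float = 1.0,
    beta: float = 6.0,
) -> Tuple[List[int], List[float]]:
    """Pick the top_k best-aligned points as positives and give each a soft target.

    A point scores highly only if BOTH its classification score s and the IoU u
    of its regressed box are high, so the selected samples are ones where the
    two subtasks agree.
    """
    scored = []  # entries are (alignment t, IoU u, point index)
    for i, (p, s, b) in enumerate(zip(points, cls_scores, pred_boxes)):
        if not inside(p, gt_box):
            continue  # only points inside the gt box are candidates
        u = iou(b, gt_box)
        scored.append((s ** alpha * u ** beta, u, i))

    # Highest alignment first; ties broken by lower index for determinism.
    scored.sort(key=lambda c: (-c[0], c[2]))
    chosen = scored[:top_k]
    if not chosen:
        return [], []

    # Normalise t per instance so the best positive's target equals the
    # largest IoU among the chosen positives (TOOD-style normalisation).
    max_t = max(t for t, _, _ in chosen)
    max_u = max(u for _, u, _ in chosen)
    targets = [t / max_t * max_u if max_t > 0 else 0.0 for t, _, _ in chosen]
    return [i for _, _, i in chosen], targets


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    pts = [(2.0, 2.0), (5.0, 5.0), (8.0, 8.0), (15.0, 15.0)]
    scores = [0.9, 0.5, 0.2, 0.99]
    boxes = [(0, 0, 10, 10), (0, 0, 10, 5), (1, 1, 9, 9), (0, 0, 10, 10)]
    pos, tgt = assign_aligned_positives(pts, scores, boxes, gt, top_k=2)
    print(pos)  # -> [0, 2]
```

Because t is a product, a location with a confident classification score but a poorly regressed box (or the reverse) ranks low, which is one concrete way to make sample selection favour aligned features. In the demo, the point at (15, 15) has the highest score but lies outside the ground-truth box, so it is never a candidate. The defaults α = 1 and β = 6 follow the values reported for TOOD, not values confirmed for DAOD.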

Published In

The Visual Computer: International Journal of Computer Graphics, Volume 40, Issue 11 (Nov 2024), 840 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 18 December 2023
Accepted: 21 November 2023

Author Tags

1. Deep learning
2. Object detection
3. Decouple
4. Alignment
5. Label assignment
