Decouple and align classification and regression in one-stage object detection

Authors: Zhaoyan Fang, Niannian Chen, Yong Jiang, Yong Fan

The Visual Computer, Volume 40, Issue 11

Pages 7773 - 7786

https://doi.org/10.1007/s00371-023-03207-z

Published: 18 December 2023

Abstract

Current one-stage object detectors use dense prediction, producing classification and regression results at the same point on the feature map. Because the two tasks have different properties, they are typically trained with separate detection heads, which can lead each head to focus on different feature regions. Yet both heads ultimately act on the same object, and in post-processing in particular their outputs are expected to agree. This inherent contradiction can seriously degrade detector performance. To address it, we propose DAOD, a flexible and effective one-stage object detector that decouples and aligns classification and regression from several aspects. Specifically, we first propose a spatial decouple module for the regression subtask, which addresses the spatial sensitivity of regression by efficiently sampling information from the regression result map to strengthen localization. We then propose a dynamic aligned label assignment strategy for sample selection, guiding the network to focus on better-aligned features during training. Finally, we introduce harmonic supervision to align the results of the two tasks while preserving each task's independence. Extensive experiments on the COCO dataset demonstrate the effectiveness of DAOD at negligible additional overhead. Notably, DAOD with a ResNeXt-101-64×4d-DCN backbone achieves 50.0 AP with single-model, single-scale testing on MS-COCO test-dev.
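The misalignment the abstract describes can be illustrated with a small, self-contained sketch. It is not the DAOD method and does not come from the paper: the boxes, scores, threshold, and helper names (`iou`, `nms`) are all hypothetical values chosen for illustration. It only shows how standard score-ranked non-maximum suppression can discard the best-localized box when the classification score and the localization quality disagree.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thr: float = 0.5) -> List[int]:
    """Greedy NMS: rank by classification score, drop overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thr for k in keep):
            keep.append(i)
    return keep


# Hypothetical toy data: one ground-truth box and two candidate detections.
gt: Box = (10.0, 10.0, 50.0, 50.0)
boxes: List[Box] = [
    (10.0, 10.0, 50.0, 50.0),  # perfectly localized, but low class score
    (14.0, 14.0, 54.0, 54.0),  # shifted box, but high class score
]
scores = [0.60, 0.90]

keep = nms(boxes, scores)
kept_iou = iou(boxes[keep[0]], gt)
best_iou = max(iou(b, gt) for b in boxes)

print("kept indices:", keep)
print("IoU of kept box with ground truth:", round(kept_iou, 3))
print("best IoU among candidates:", round(best_iou, 3))
```

In this toy case the shifted box wins because ranking uses only the classification score, so the surviving detection is less accurate than one that was suppressed. Aligning the two predictions, as the abstract proposes, is aimed at reducing exactly this kind of disagreement.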

Index Terms

Decouple and align classification and regression in one-stage object detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
        Object recognition
      2. Computer vision tasks
        Scene understanding
  2. Machine learning
    1. Machine learning approaches

Index terms have been assigned to the content through auto-classification.

Published In

The Visual Computer: International Journal of Computer Graphics Volume 40, Issue 11

Nov 2024

840 pages

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 18 December 2023

Accepted: 21 November 2023
