
Category-Level 6D Object Pose and Size Estimation Using Self-supervised Deep Prior Deformation Networks

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

It is difficult to precisely annotate object instances and their semantics in 3D space, and synthetic data are therefore used extensively for such tasks, e.g., category-level 6D object pose and size estimation. However, the ease of annotation in synthetic domains comes at the cost of a synthetic-to-real (Sim2Real) domain gap. In this work, we address this issue in the setting of Sim2Real, unsupervised domain adaptation for category-level 6D object pose and size estimation. We propose a method built upon a novel Deep Prior Deformation Network, abbreviated as DPDN. DPDN learns to deform features of categorical shape priors to match those of object observations, and can thus establish deep correspondence in the feature space for direct regression of object poses and sizes. To reduce the Sim2Real domain gap, we formulate a novel self-supervised objective upon DPDN via consistency learning: we apply two rigid transformations to each object observation in parallel and feed them into DPDN respectively to yield dual sets of predictions. On top of this parallel learning, an inter-consistency term enforces cross consistency between the dual predictions to improve the sensitivity of DPDN to pose changes, while intra-consistency terms enforce self-adaptation within each branch. We train DPDN on the training sets of both the synthetic CAMERA25 and real-world REAL275 datasets; our results outperform existing methods on the REAL275 test set under both the unsupervised and supervised settings. Ablation studies further verify the efficacy of our designs. Our code is released publicly at https://github.com/JiehongLin/Self-DPDN.
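The self-supervised objective described above, two parallel rigid augmentations whose predictions must agree once mapped back to a common frame, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `predict_pose` is a hypothetical stand-in for DPDN's pose head (the real network regresses rotation, translation, and size from deformed prior features), and only the inter-consistency term is shown.

```python
import numpy as np

def random_rigid_transform(rng):
    """Sample a random rotation (via QR of a Gaussian matrix) and translation."""
    A = rng.normal(size=(3, 3))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))   # fix column signs so Q is uniquely determined
    if np.linalg.det(Q) < 0:      # force a proper rotation (det = +1)
        Q[:, 0] = -Q[:, 0]
    t = rng.normal(scale=0.1, size=3)
    return Q, t

def predict_pose(points):
    """Hypothetical stand-in for DPDN's pose head: returns an identity
    'rotation' and the centroid as 'translation', purely for illustration."""
    return np.eye(3), points.mean(axis=0)

def inter_consistency(points, rng):
    """Apply two rigid transforms in parallel, predict a pose for each view,
    map both predictions back to the original frame, and penalise disagreement."""
    R1, t1 = random_rigid_transform(rng)
    R2, t2 = random_rigid_transform(rng)
    Ra, ta = predict_pose(points @ R1.T + t1)   # prediction for view 1
    Rb, tb = predict_pose(points @ R2.T + t2)   # prediction for view 2
    # Undo each augmentation so the dual predictions share one frame.
    Ra_c, ta_c = R1.T @ Ra, R1.T @ (ta - t1)
    Rb_c, tb_c = R2.T @ Rb, R2.T @ (tb - t2)
    # L2-style inter-consistency penalty between the dual predictions.
    return np.linalg.norm(Ra_c - Rb_c) + np.linalg.norm(ta_c - tb_c)
```

A predictor that is perfectly equivariant to rigid transformations drives this penalty to zero; the dummy centroid predictor above keeps the translation term consistent but not the rotation term, which is exactly the kind of pose insensitivity the loss is meant to penalise. The paper's full objective additionally includes intra-consistency terms within each branch.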



Acknowledgements

This work is supported in part by the Guangdong R&D key project of China (No. 2019B010155001) and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X183).

Author information

Corresponding author

Correspondence to Kui Jia.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lin, J., Wei, Z., Ding, C., Jia, K. (2022). Category-Level 6D Object Pose and Size Estimation Using Self-supervised Deep Prior Deformation Networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_2


  • DOI: https://doi.org/10.1007/978-3-031-20077-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20076-2

  • Online ISBN: 978-3-031-20077-9

  • eBook Packages: Computer Science (R0)
