Abstract
The key challenges in learning a robust 6D pose are noise in RGB/RGB-D images, sparsity of point clouds, and severe occlusion; exploiting object geometric information is critical to addressing them. In this work, we present a novel pipeline for 6DoF object pose estimation. Unlike previous methods that directly regress pose parameters or predict keypoints, we tackle this challenging task with a point-pair based approach that leverages geometric information as much as possible. Specifically, at the representation learning stage, we build a point cloud network that mimics the local modeling of CNNs to encode the point cloud, extracting effective geometric features while projecting the point cloud into a high-dimensional space. Moreover, we design a coordinate conversion network that regresses the point cloud into the object coordinate system in a decoding fashion. The pose is then computed with a point-pair matching algorithm. Experimental results show that our method achieves state-of-the-art performance on several datasets.
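To illustrate the final step of such a pipeline, the sketch below recovers a rigid pose from correspondences between regressed object-coordinate points and observed camera-frame points using a standard Kabsch/SVD alignment. This is a minimal illustration under our own assumptions, not the paper's exact point-pair matching algorithm; the function name and the toy data are hypothetical.

```python
# Minimal sketch: estimate a 6D pose (R, t) from point pairs, where each camera-frame
# point corresponds to a point regressed in the object coordinate system.
# This uses least-squares (Kabsch) alignment; the paper's matching step may differ.
import numpy as np

def align_point_pairs(obj_pts: np.ndarray, cam_pts: np.ndarray):
    """Estimate R, t such that cam_pts ~= obj_pts @ R.T + t.

    obj_pts: (N, 3) points predicted in the object coordinate system.
    cam_pts: (N, 3) corresponding points observed in the camera frame.
    """
    c_obj = obj_pts.mean(axis=0)                    # centroid in object frame
    c_cam = cam_pts.mean(axis=0)                    # centroid in camera frame
    H = (obj_pts - c_obj).T @ (cam_pts - c_cam)     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                        # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_cam - R @ c_obj
    return R, t

# Toy usage: recover a known pose from synthetic, slightly noisy correspondences.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obj = rng.standard_normal((500, 3))
    a = np.pi / 6
    R_gt = np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])
    t_gt = np.array([0.1, -0.2, 0.8])
    cam = obj @ R_gt.T + t_gt + 0.001 * rng.standard_normal((500, 3))
    R_est, t_est = align_point_pairs(obj, cam)
    print(np.allclose(R_est, R_gt, atol=1e-2), np.allclose(t_est, t_gt, atol=1e-2))
```

In practice, such an alignment is typically wrapped in an outlier-robust loop (e.g., RANSAC over sampled point pairs) because some regressed object coordinates are unreliable under occlusion.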
Data availability
The data/reanalysis that support the findings of this study are publicly available online at https://drive.google.com/file/d/1if4VoEXNx9W3XCn0Y7Fp15B4GpcYbyYi/view and https://drive.google.com/drive/folders/19ivHpaKm9dOrr12fzC8IDFczWRPFxho7.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, the Fundamental Research Funds for the Central Universities of China under Grant N2104008, and the Central Government Guides Local Science and Technology Development Special Fund under Grant 2021JH6/10500129.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, Z., Chu, H., Wang, F. et al. HFE-Net: hierarchical feature extraction and coordinate conversion of point cloud for object 6D pose estimation. Neural Comput & Applic 36, 3167–3178 (2024). https://doi.org/10.1007/s00521-023-09241-1