Multi-View Metal Parts Pose Estimation Based on a Single Camera
<p>Figure 1. The left figure shows that metal parts are usually reflective, with appearances that vary across views. The region marked by the red curve in the right figure shows that depth information is sometimes missing due to this reflectivity.</p>
<p>Figure 2. Overview of the proposed multi-view metal parts pose estimation method. In the first stage, this work detects metal parts and estimates the pose of each detected object; the red bounding boxes show the detection results. The proposed method then calculates the next best view based on ray tracing and drives the arm-mounted camera to that viewpoint, where the poses of the metal parts are estimated again. In the last stage, the proposed method combines the known camera poses with the estimated metal part poses from each view to refine the global scene.</p>
<p>Figure 3. How the number of additional hit points is calculated. This work computes all the hit points in the first view <math display="inline"><semantics> <mrow> <mi>s</mi> <mi>e</mi> <msub> <mi>t</mi> <mn>0</mn> </msub> </mrow> </semantics></math>, rotates the camera around the voxels, and counts the hit points <math display="inline"><semantics> <mrow> <mi>s</mi> <mi>e</mi> <msub> <mi>t</mi> <mi>i</mi> </msub> </mrow> </semantics></math> for each candidate view. The simulated camera is not fixed: it rotates around the objects at 5-degree intervals about the x-axis, at 5-degree intervals about the y-axis, and at 10-degree intervals about the z-axis. The voxels in the first view correspond to the metal parts whose pose estimation score is above the threshold. By calculating the additional hit points between each new view and the first view, <math display="inline"><semantics> <mrow> <mi>a</mi> <mi>d</mi> <msub> <mi>d</mi> <mi>i</mi> </msub> <mo>=</mo> <mi>s</mi> <mi>e</mi> <msub> <mi>t</mi> <mi>i</mi> </msub> <mo>−</mo> <mfenced separators="" open="(" close=")"> <mi>s</mi> <mi>e</mi> <msub> <mi>t</mi> <mi>i</mi> </msub> <mo>&</mo> <mi>s</mi> <mi>e</mi> <msub> <mi>t</mi> <mn>0</mn> </msub> </mfenced> </mrow> </semantics></math>, this work obtains the next best view.</p>
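The caption above describes a selection rule over candidate views: each simulated view yields a set of ray-tracing hit points, and the next best view maximizes the hits not already covered by the first view, i.e. add_i = set_i − (set_i & set_0). A minimal sketch of that rule, with illustrative function names and toy hit-point ids (not the paper's implementation):

```python
# Hypothetical sketch of next-best-view selection via hit-point sets.
# Each candidate view contributes a set of hit-point (voxel) ids from
# ray tracing; the next best view maximizes the newly covered points.
from itertools import product

def candidate_orientations():
    """Enumerate simulated camera orientations: 5-degree steps about
    the x- and y-axes, 10-degree steps about the z-axis."""
    return product(range(0, 360, 5), range(0, 360, 5), range(0, 360, 10))

def next_best_view(hits_by_view, first_view):
    """Pick the view whose hit set adds the most points over the first view."""
    set_0 = hits_by_view[first_view]
    best_view, best_gain = first_view, -1
    for view, set_i in hits_by_view.items():
        gain = len(set_i - (set_i & set_0))  # add_i = set_i - (set_i & set_0)
        if gain > best_gain:
            best_view, best_gain = view, gain
    return best_view, best_gain

# Toy example: three views with synthetic hit-point ids.
hits = {
    "v0": {1, 2, 3, 4},
    "v1": {3, 4, 5},      # adds 1 new point over v0
    "v2": {5, 6, 7, 8},   # adds 4 new points over v0
}
view, gain = next_best_view(hits, "v0")
```

Here `v2` wins because all four of its hit points are unseen from `v0`; in the paper the sets come from ray tracing against the voxels of already-estimated parts.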
<p>Figure 4. The information from multiple viewpoints is integrated to estimate an object’s 6D pose. Using keypoint heatmaps from individual views, this work calculates multi-view uncertainty estimates. The image indicated by the green arrow shows the heatmap of the corresponding keypoint, and the red arrows mark the corresponding metal part in different views. These estimates are then used to filter and rank candidate poses, demonstrating improved accuracy and reliability compared to existing methods.</p>
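The caption above describes ranking candidate poses by uncertainty estimates aggregated from per-view keypoint heatmaps. One simple way to realize this idea, sketched below with an illustrative confidence measure (the heatmap peak) and an illustrative aggregation rule (the mean across views) — the paper's exact formulation may differ:

```python
# Hypothetical sketch: rank candidate poses by keypoint-heatmap confidence
# aggregated over views. Peak response and mean aggregation are assumptions
# for illustration, not the paper's exact uncertainty model.
import numpy as np

def keypoint_confidence(heatmap):
    """Peak heatmap response as a per-view confidence score."""
    return float(heatmap.max())

def rank_candidates(candidates):
    """candidates: {pose_id: [heatmap per view]}. Rank by mean confidence."""
    scores = {
        pose_id: float(np.mean([keypoint_confidence(h) for h in maps]))
        for pose_id, maps in candidates.items()
    }
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores

# Toy example: two candidate poses, each observed from two views.
cands = {
    "pose_a": [np.full((8, 8), 0.2), np.full((8, 8), 0.4)],
    "pose_b": [np.full((8, 8), 0.9), np.full((8, 8), 0.7)],
}
order, scores = rank_candidates(cands)
```

A low aggregated score would then trigger the filtering step, discarding the candidate rather than fusing it into the global scene.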
<p>Figure 5. Illustration of using scanning spray to determine the pose of metal parts relative to an industrial-grade camera. (<b>a</b>) Image of metal parts without scanning spray. (<b>b</b>) Image of metal parts with scanning spray applied. (<b>c</b>) Selected keypoints are used to align the point cloud with the model and obtain the ground truth pose. (<b>d</b>) The rendered color indicates the estimated pose obtained through the proposed pose estimation method, captured using a RealSense camera.</p>
<p>Figure 6. Qualitative results of pose estimation for a single metal part from a single view. Each scene is organized into two rows: the first row shows the raw image, and the second row presents the rendered pose of the object.</p>
<p>Figure 7. Illustration of using scanning spray to determine the poses of multiple metal parts relative to an industrial-grade camera. (<b>a</b>) Point cloud of the metal parts without scanning spray. (<b>b</b>) Point cloud of the metal parts coated with scanning spray. (<b>c</b>) Manual segmentation of the point cloud for each metal part. (<b>d</b>) Alignment of the point cloud using keypoints to obtain the ground truth pose; the overlaid color represents the ground truth pose.</p>
<p>Figure 8. Pose estimation results for multiple metal parts from a single view. The first column displays the raw image, and the second column shows the object detection results. In the first row, the third column presents the predicted semantic keypoints for each target, and the last column displays the rendered poses of these targets. In the second row, the third column depicts the pose of each metal part estimated by PVNet, and the last column presents the estimated poses of all targets. Green 3D bounding boxes indicate the ground truth, and blue bounding boxes represent the estimated poses. The comparison demonstrates that our method surpasses PVNet in estimating the poses of shiny metal parts.</p>
<p>Figure 9. Qualitative results of the multi-view pose estimation process for metal parts. The first row displays input RGB images from various viewpoints. The second row illustrates the 2D detection results for the metal parts; the red bounding boxes are the detections. The third row depicts the rendered estimated poses for the current view; objects left unrendered have a low confidence score in their estimation. The fourth row presents the final pose refinement. In each scene, the second column shows the subsequent viewpoint and the corresponding estimation results for the first column. By employing the next-best-view planning method, the proposed method identifies more estimated objects than using only the first view.</p>
<p>Figure 10. Examples of comparison results. The first column displays the specified camera trajectory used to guide the robot arm, and the second column shows the corresponding estimation results, overlaid with colorfully rendered poses. The third column demonstrates how the camera was moved to the next best view using the proposed method, and the fourth column presents the outcomes of our method, also overlaid with colorfully rendered poses. The last row shows the final pose refinement results after employing the two camera-movement strategies for two views. In the last row, the red curves for the specified-trajectory method indicate the metal parts whose poses were not estimated. The figure illustrates that the proposed method can estimate more metal parts than the specified-trajectory method.</p>
<p>Figure 11. Examples of pose estimation and grasping experiments. After estimating the poses of all the metal parts, this work drives the robot arm to grasp the objects. The upper left corner shows the raw image taken from the camera and the objects rendered with their estimated poses.</p>
Abstract
1. Introduction
2. Related Work
2.1. Single-View 6D Pose Estimation
2.2. Multi-View 6D Pose Estimation
3. Methods
3.1. Approach Overview
3.2. Single View Pose Estimation
3.3. The Next-Best-View Planning
3.4. Scene-Level Pose Refinement
4. Experiments
4.1. The Ground Truth Poses
4.2. Single-View Pose Estimation Experiment
4.3. Multi-View Experiments
4.4. Grasping Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, J.; Sun, W.; Yang, H.; Liu, C.; Zhang, X.; Mian, A. Domain-Generalized Robotic Picking via Contrastive Learning-Based 6-D Pose Estimation. IEEE Trans. Ind. Inform. 2024, 1–12. [Google Scholar] [CrossRef]
- Li, D.; Mu, Q.; Yuan, Y.; Wu, S.; Tian, Y.; Hong, H.; Jiang, Q.; Liu, F. 6D Pose Estimation Based on 3D Edge Binocular Reprojection Optimization for Robotic Assembly. IEEE Robot. Autom. Lett. 2023, 8, 8319–8326. [Google Scholar] [CrossRef]
- Zhuang, C.; Li, S.; Ding, H. Instance segmentation based 6D pose estimation of industrial objects using point clouds for robotic bin-picking. Robot. Comput.-Integr. Manuf. 2023, 82, 102541. [Google Scholar] [CrossRef]
- Lin, X.; Wang, D.; Zhou, G.; Liu, C.; Chen, Q. Transpose: 6d object pose estimation with geometry-aware transformer. Neurocomputing 2024, 589, 127652. [Google Scholar] [CrossRef]
- Li, G.; Li, Y.; Ye, Z.; Zhang, Q.; Kong, T.; Cui, Z.; Zhang, G. Generative category-level shape and pose estimation with semantic primitives. In Proceedings of the Conference on Robot Learning, PMLR, Atlanta, GA, USA, 6–9 November 2023; pp. 1390–1400. [Google Scholar]
- Wu, C.; Chen, L.; Wang, S.; Yang, H.; Jiang, J. Geometric-aware dense matching network for 6D pose estimation of objects from RGB-D images. Pattern Recognit. 2023, 137, 109293. [Google Scholar] [CrossRef]
- Petitjean, T.; Wu, Z.; Demonceaux, C.; Laligant, O. OLF: RGB-D adaptive late fusion for robust 6D pose estimation. In Proceedings of the Sixteenth International Conference on Quality Control by Artificial Vision, SPIE, Albi, France, 6–8 June 2023; Volume 12749, pp. 132–140. [Google Scholar]
- Algabri, R.; Shin, H.; Lee, S. Real-time 6DoF full-range markerless head pose estimation. Expert Syst. Appl. 2024, 239, 122293. [Google Scholar] [CrossRef]
- He, Z.; Li, Q.; Zhao, X.; Wang, J.; Shen, H.; Zhang, S.; Tan, J. ContourPose: Monocular 6-D Pose Estimation Method for Reflective Textureless Metal Parts. IEEE Trans. Robot. 2023, 39, 4037–4050. [Google Scholar] [CrossRef]
- He, Z.; Wu, M.; Zhao, X.; Zhang, S.; Tan, J. A Generative Feature-to-Image Robotic Vision Framework for 6D Pose Measurement of Metal Parts. IEEE/ASME Trans. Mechatron. 2021, 27, 3198–3209. [Google Scholar] [CrossRef]
- Peng, S.; Liu, Y.; Huang, Q.; Zhou, X.; Bao, H. Pvnet: Pixel-wise voting network for 6dof pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4561–4570. [Google Scholar]
- Yang, J.; Xue, W.; Ghavidel, S.; Waslander, S.L. 6d pose estimation for textureless objects on rgb frames using multi-view optimization. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 2905–2912. [Google Scholar]
- Chang, J.; Kim, M.; Kang, S.; Han, H.; Hong, S.; Jang, K.; Kang, S. GhostPose: Multi-view pose estimation of transparent objects for robot hand grasping. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 5749–5755. [Google Scholar]
- Parisotto, T.; Mukherjee, S.; Kasaei, H. MORE: Simultaneous multi-view 3D object recognition and pose estimation. Intell. Serv. Robot. 2023, 16, 497–508. [Google Scholar] [CrossRef]
- Opromolla, R.; Fasano, G.; Rufino, G.; Grassi, M. A model-based 3D template matching technique for pose acquisition of an uncooperative space object. Sensors 2015, 15, 6360–6382. [Google Scholar] [CrossRef]
- He, Z.; Jiang, Z.; Zhao, X.; Zhang, S.; Wu, C. Sparse template-based 6-D pose estimation of metal parts using a monocular camera. IEEE Trans. Ind. Electron. 2019, 67, 390–401. [Google Scholar] [CrossRef]
- Sundermeyer, M.; Marton, Z.C.; Durner, M.; Brucker, M.; Triebel, R. Implicit 3d orientation learning for 6d object detection from rgb images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 699–715. [Google Scholar]
- Schmeckpeper, K.; Osteen, P.R.; Wang, Y.; Pavlakos, G.; Chaney, K.; Jordan, W.; Zhou, X.; Derpanis, K.G.; Daniilidis, K. Semantic keypoint-based pose estimation from single RGB frames. arXiv 2022, arXiv:2204.05864. [Google Scholar] [CrossRef]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef] [PubMed]
- Kreiss, S.; Bertoni, L.; Alahi, A. Openpifpaf: Composite fields for semantic keypoint detection and spatio-temporal association. IEEE Trans. Intell. Transp. Syst. 2021, 23, 13498–13511. [Google Scholar] [CrossRef]
- Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 2009, 81, 155–166. [Google Scholar] [CrossRef]
- Park, K.; Patten, T.; Vincze, M. Pix2pose: Pixel-wise coordinate regression of objects for 6d pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7668–7677. [Google Scholar]
- Haugaard, R.L.; Buch, A.G. Surfemb: Dense and continuous correspondence distributions for object pose estimation with learnt surface embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 6749–6758. [Google Scholar]
- Song, C.; Song, J.; Huang, Q. Hybridpose: 6d object pose estimation under hybrid representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 431–440. [Google Scholar]
- Collet, A.; Srinivasa, S.S. Efficient multi-view object recognition and full pose estimation. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–8 May 2010; pp. 2050–2055. [Google Scholar]
- Duffhauss, F.; Demmler, T.; Neumann, G. MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 3568–3575. [Google Scholar]
- Kaskman, R.; Shugurov, I.; Zakharov, S.; Ilic, S. 6 dof pose estimation of textureless objects from multiple rgb frames. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part II 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 612–630. [Google Scholar]
- Labbé, Y.; Carpentier, J.; Aubry, M.; Sivic, J. Cosypose: Consistent multi-view multi-object 6d pose estimation. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 574–591. [Google Scholar]
- Zou, Z.X.; Huang, S.S.; Mu, T.J.; Wang, Y.P. ObjectFusion: Accurate object-level SLAM with neural object priors. Graph. Model. 2022, 123, 101165. [Google Scholar] [CrossRef]
- Lin, S.; Wang, J.; Xu, M.; Zhao, H.; Chen, Z. Contour-SLAM: A Robust Object-Level SLAM Based on Contour Alignment. IEEE Trans. Instrum. Meas. 2023, 72, 1–12. [Google Scholar] [CrossRef]
- Maninis, K.K.; Popov, S.; Nießner, M.; Ferrari, V. Vid2cad: Cad model alignment using multi-view constraints from videos. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1320–1327. [Google Scholar] [CrossRef] [PubMed]
- Deng, X.; Mousavian, A.; Xiang, Y.; Xia, F.; Bretl, T.; Fox, D. PoseRBPF: A Rao–Blackwellized particle filter for 6-D object pose tracking. IEEE Trans. Robot. 2021, 37, 1328–1342. [Google Scholar] [CrossRef]
- Yang, S.; Scherer, S. Cubeslam: Monocular 3-d object slam. IEEE Trans. Robot. 2019, 35, 925–938. [Google Scholar] [CrossRef]
- Li, A.; Schoellig, A.P. Multi-view keypoints for reliable 6d object pose estimation. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 6988–6994. [Google Scholar]
- Duffhauss, F.; Koch, S.; Ziesche, H.; Vien, N.A.; Neumann, G. Symfm6d: Symmetry-aware multi-directional fusion for multi-view 6d object pose estimation. IEEE Robot. Autom. Lett. 2023, 8, 5315–5322. [Google Scholar] [CrossRef]
- Li, C.; Bai, J.; Hager, G.D. A unified framework for multi-view multi-class object pose estimation. In Proceedings of the European Conference on Computer Vision (Eccv), Munich, Germany, 8–14 September 2018; pp. 254–269. [Google Scholar]
- Chen, X.; Hu, J.; Jin, C.; Li, L.; Wang, L. Understanding domain randomization for sim-to-real transfer. arXiv 2021, arXiv:2110.03239. [Google Scholar]
- Chum, O.; Matas, J. Optimal randomized RANSAC. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1472–1482. [Google Scholar] [CrossRef] [PubMed]
- Bonaventura, X.; Feixas, M.; Sbert, M.; Chuang, L.; Wallraven, C. A survey of viewpoint selection methods for polygonal models. Entropy 2018, 20, 370. [Google Scholar] [CrossRef]
- Jiang, J.; Luo, X.; Luo, Q.; Qiao, L.; Li, M. An overview of hand-eye calibration. Int. J. Adv. Manuf. Technol. 2022, 119, 77–97. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, C.; Jiang, X. Multi-View Metal Parts Pose Estimation Based on a Single Camera. Sensors 2024, 24, 3408. https://doi.org/10.3390/s24113408