DOI:10.1145/3664647.3681131

VoCAPTER: Voting-based Pose Tracking for Category-level Articulated Object via Inter-frame Priors

Published: 28 October 2024

Abstract

Articulated objects are common in daily life. However, existing category-level articulated pose estimation methods mostly focus on predicting 9D poses from statistical point cloud observations. In this paper, we address category-level, online, robust 9D pose tracking of articulated objects and propose VoCAPTER, a novel 3D Voting-based Category-level Articulated object Pose TrackER. VoCAPTER efficiently updates poses between adjacent frames by using the partial observation from the current frame together with the estimated per-part 9D poses from the previous frame. Specifically, drawing on the prior that motion is continuous between frames, we first canonicalize the input point cloud, which casts pose tracking as the estimation of an inter-frame pose increment. Then, to make tracking robust, we leverage features that remain SE(3)-invariant during motion. This is achieved through a voting-based articulation tracking algorithm, which identifies keyframes as reference states for accurate pose updating throughout the entire video sequence. We evaluate VoCAPTER on a synthetic dataset and in real-world scenarios; the results demonstrate its ability to generalize to diverse and complicated scenes and provide evidence of its superiority and robustness in multi-frame pose tracking of articulated objects. We believe this work can facilitate progress in fields including robotics, embodied intelligence, and augmented reality. The code will be made publicly available.
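The canonicalize-then-increment idea in the abstract can be illustrated for a single rigid part with a minimal NumPy sketch. This is not the VoCAPTER implementation: the voting-based estimator is replaced here by a closed-form Kabsch fit with known point correspondences, the per-axis scale is assumed to stay constant between frames, and all function and variable names (`canonicalize`, `kabsch`, `update_pose`, `rot_z`, `template`) are hypothetical.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z-axis (helper for the toy example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def canonicalize(points, R_prev, t_prev):
    """Express observed points (N, 3) in the previous frame's part coordinates.

    Row-vector form of R_prev^T (p - t_prev).
    """
    return (points - t_prev) @ R_prev

def kabsch(src, dst):
    """Least-squares rigid fit (dR, dt) such that dst ~= src @ dR.T + dt.

    Stand-in for a learned increment estimator; assumes row i of src
    corresponds to row i of dst.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    dR = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    dt = dst_c - dR @ src_c
    return dR, dt

def update_pose(R_prev, t_prev, dR, dt):
    """Compose the previous pose with the increment found in its own frame."""
    return R_prev @ dR, t_prev + R_prev @ dt

# Toy example: one part that moves slightly between two frames.
rng = np.random.default_rng(0)
template = rng.normal(size=(50, 3))       # canonical part shape (assumed known)
scale = np.array([1.0, 2.0, 0.5])         # per-axis scale, held fixed here
R_prev, t_prev = rot_z(0.30), np.array([0.10, -0.20, 1.00])
R_true, t_true = rot_z(0.35), np.array([0.12, -0.18, 1.02])

observed = (template * scale) @ R_true.T + t_true    # current-frame points
canon = canonicalize(observed, R_prev, t_prev)       # nearly aligned already
dR, dt = kabsch(template * scale, canon)             # small residual motion
R_cur, t_cur = update_pose(R_prev, t_prev, dR, dt)
```

Because the motion between adjacent frames is small, the canonicalized cloud is already close to the scaled template, so the estimator only has to recover a small residual rotation and translation rather than the full pose.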


    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. category-level objects
    2. inter-frame priors
    3. pose tracking
    4. voting

    Qualifiers

    • Research-article

    Funding Sources

    • 2019YFE0125700

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 23
    • Downloads (Last 12 months): 23
    • Downloads (Last 6 weeks): 23
    Reflects downloads up to 18 Nov 2024
