

EgoHDM: A Real-time Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System

Published: 19 November 2024

Abstract

We present EgoHDM, an online egocentric-inertial human motion capture (mocap), localization, and dense mapping system. Our system uses six inertial measurement units (IMUs) and a commodity head-mounted RGB camera. EgoHDM is the first human mocap system that offers dense scene mapping in near real-time. Further, it is fast and robust to initialize, and it fully closes the loop between physically plausible, map-aware global human motion estimation and mocap-aware 3D scene reconstruction. To achieve this, we design a tightly coupled mocap-aware dense bundle adjustment and a physics-based body pose correction module that leverages a local body-centric elevation map. The latter introduces a novel terrain-aware contact PD controller, which enables characters to make physical contact with the given local elevation map, thereby reducing human floating and penetration. We demonstrate the performance of our system on established synthetic and real-world benchmarks. The results show that our method reduces human localization, camera pose, and mapping error by 41%, 71%, and 46%, respectively, compared to the state of the art. Our qualitative evaluations on newly captured data further demonstrate that EgoHDM can handle challenging scenarios on non-flat terrain, including stepping over stairs and outdoor scenes in the wild. Our project page: https://handiyin.github.io/EgoHDM/
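The terrain-aware contact PD controller mentioned in the abstract can be illustrated with a minimal sketch: a proportional-derivative term drives a foot's height toward the local elevation-map height, which counteracts both floating (foot above terrain) and penetration (foot below terrain). This is an illustrative toy example, not the authors' implementation; the function name, gains, and nearest-cell map lookup are all assumptions.

```python
import numpy as np

def pd_contact_correction(foot_pos, foot_vel, elevation_map, cell_size,
                          kp=120.0, kd=12.0):
    """Toy terrain-aware PD correction (hypothetical, not from the paper).

    foot_pos: (x, y, z) foot position in metres.
    foot_vel: (vx, vy, vz) foot velocity.
    elevation_map: 2D array of terrain heights; cell_size maps metres to cells.
    Returns a vertical corrective force pushing the foot toward the terrain.
    """
    # Look up the terrain height under the foot (nearest cell, clamped to bounds).
    i = int(np.clip(round(foot_pos[0] / cell_size), 0, elevation_map.shape[0] - 1))
    j = int(np.clip(round(foot_pos[1] / cell_size), 0, elevation_map.shape[1] - 1))
    terrain_h = elevation_map[i, j]

    # PD law on the vertical height error: positive when the foot penetrates
    # (pushes up), negative when it floats (pulls down); damped by foot velocity.
    err = terrain_h - foot_pos[2]
    return kp * err - kd * foot_vel[2]

# Example: on flat ground at height 0, a foot 5 cm below the surface
# receives an upward (positive) corrective force.
flat = np.zeros((10, 10))
f = pd_contact_correction(np.array([0.5, 0.5, -0.05]), np.zeros(3), flat, 0.1)
```

In the full system such a correction would act inside the physics-based pose optimization rather than as an isolated force, but the sketch captures the core idea: contact is enforced against the local elevation map instead of an assumed flat ground plane.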




    Published In

    ACM Transactions on Graphics  Volume 43, Issue 6
    December 2024
    1828 pages
EISSN: 1557-7368
DOI: 10.1145/3702969

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in TOG Volume 43, Issue 6


    Author Tags

    1. IMUs
    2. SLAM
    3. real-time
    4. pose estimation

    Qualifiers

    • Research-article
