PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-Based Motion Capture

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Data scarcity is a crucial factor limiting model performance in IMU-based human motion capture. However, effective data augmentation for IMU-based motion capture is challenging, since it has to capture the physical relations and constraints of the human body while maintaining the data distribution and quality. We propose PoseAugment, a novel pipeline incorporating VAE-based pose generation and physical optimization. Given a pose sequence, the VAE module can generate an unlimited number of poses with both high fidelity and diversity while preserving the data distribution. The physical module optimizes poses to satisfy physical constraints with minimal motion restrictions. High-quality IMU data are then synthesized from the augmented poses for training motion capture models. Experiments show that PoseAugment outperforms previous data augmentation and pose generation methods in terms of motion capture accuracy, revealing the strong potential of our method to alleviate the data collection burden for IMU-based motion capture and related tasks driven by human poses.
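As a rough illustration of the last pipeline step (synthesizing IMU data from poses), the sketch below derives accelerations from a sequence of joint positions with a second-order central finite difference. This is a common way to synthesize accelerations from motion data and is not taken from the paper; the function name `synthesize_acceleration`, the `fps` parameter, and the edge handling are assumptions made for this example.

```python
import numpy as np

def synthesize_acceleration(positions, fps=60.0):
    """Approximate accelerations from positions sampled at a fixed rate.

    positions: array of shape (T, ...), time on the first axis,
               e.g. (T, 3) for one sensor or (T, J, 3) for J joints.
    fps:       sampling rate in frames per second (assumed value).
    Returns an array of the same shape as `positions`.
    """
    positions = np.asarray(positions, dtype=float)
    if positions.shape[0] < 3:
        raise ValueError("need at least 3 frames for a central difference")
    dt = 1.0 / fps
    acc = np.zeros_like(positions)
    # Central difference: a_t = (p_{t+1} - 2 p_t + p_{t-1}) / dt^2
    acc[1:-1] = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    # The first and last frames have no two-sided neighbours; copy the nearest value.
    acc[0] = acc[1]
    acc[-1] = acc[-2]
    return acc
```

For a position that grows quadratically in time, this estimate recovers the constant acceleration exactly, which makes it easy to sanity-check.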

Notes

  1. Note that the raw IMU signals (in device local frames) would first be processed into the global frame before motion capture in practice.
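The frame conversion mentioned in this note can be sketched as a per-frame rotation of each local measurement. This is a minimal example, not the paper's preprocessing: it assumes each frame comes with a 3×3 sensor-to-global rotation matrix (for instance from the IMU's orientation estimate), and the name `to_global_frame` is hypothetical. Gravity compensation and calibration are left out.

```python
import numpy as np

def to_global_frame(acc_local, sensor_to_global):
    """Rotate per-frame local accelerations into the global frame.

    acc_local:        array of shape (T, 3), measured in the device frame.
    sensor_to_global: array of shape (T, 3, 3), one rotation matrix per
                      frame mapping device coordinates to global ones.
    Returns an array of shape (T, 3).
    """
    acc_local = np.asarray(acc_local, dtype=float)
    rotations = np.asarray(sensor_to_global, dtype=float)
    # For every frame t: a_global[t] = R[t] @ a_local[t]
    return np.einsum("tij,tj->ti", rotations, acc_local)

# Usage: a sensor rotated 90 degrees about the global z-axis.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
acc_global = to_global_frame([[1.0, 0.0, 0.0]], [rot_z_90])
# The local x-axis reading now points along the global y-axis.
```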

Acknowledgements

This work is supported by the Natural Science Foundation of China under Grant No. 62132010, Beijing Key Lab of Networked Multimedia, Institute for Artificial Intelligence, Tsinghua University (THUAI), Beijing National Research Center for Information Science and Technology (BNRist), 2025 Key Technological Innovation Program of Ningbo City under Grant No. 2022Z080, Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science Park No. Z221100006722018, and Science and Technology Innovation Key R&D Program of Chongqing.

Author information

Corresponding author

Correspondence to Chun Yu.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 3021 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Z., Yu, C., Liang, C., Shi, Y. (2025). PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-Based Motion Capture. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15090. Springer, Cham. https://doi.org/10.1007/978-3-031-73411-3_4

  • DOI: https://doi.org/10.1007/978-3-031-73411-3_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73410-6

  • Online ISBN: 978-3-031-73411-3

  • eBook Packages: Computer Science (R0)
