InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12365)

Abstract

Analysis of hand-hand interactions is a crucial step towards better understanding human behavior. However, most research in 3D hand pose estimation has focused on the isolated single-hand case. Therefore, we propose, for the first time, (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image. The proposed InterHand2.6M consists of 2.6M labeled single and interacting hand frames under various poses from multiple subjects. Our InterNet simultaneously performs 3D single and interacting hand pose estimation. In our experiments, we demonstrate large gains in 3D interacting hand pose estimation accuracy when leveraging the interacting hand data in InterHand2.6M. We also report the accuracy of InterNet on InterHand2.6M, which serves as a strong baseline for this new dataset. Finally, we show 3D interacting hand pose estimation results from general images. Our code and dataset are available (https://mks0601.github.io/InterHand2.6M/).

Notes

  1. There were two settings. Setting 1: on average 34 RGB and 46 monochrome cameras (80 cameras total), 350 lights, and 90 fps. Setting 2: on average 139 color cameras, 450 lights, and 30 fps. Due to camera failures, not all cameras were operational; thus, each capture had a slightly different number of cameras.

  2. Examples of hand sequences are described in the supplementary material.

  3. The human annotation procedure is described in the supplementary material.

Acknowledgments

We would like to thank Alexander Hypes, David Whitewolf, Eric Brockmeyer, Kevyn McPhail, Mark Pitts, Matt Stewart, Michal Perdoch, Scott Ardisson, Steven Krenn, and Timothy Godisart for building the capture system, Autumn Trimble, Danielle Belko, Junko Saragih, Laura Millerschoen, Lucas Evans, Rohan Bali, Taylor Koska, and Xiaomin Luo for the data capture and annotation efforts, and Chenglei Wu, Jason Saragih, Tomas Simon, and Yaser Sheikh for constructive feedback on data collection and the paper. This work was partially supported by the Next-Generation Information Computing Development Program (NRF-2017M3C4A7069369) and the Visual Turing Test project (IITP-2017-0-01780) funded by the Ministry of Science and ICT of Korea.

Author information

Corresponding author

Correspondence to Kyoung Mu Lee.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 45751 KB)

Supplementary material 2 (mp4 9007 KB)

Supplementary material 3 (mp4 8480 KB)

Supplementary material 4 (mp4 18943 KB)

Supplementary material 5 (pdf 2424 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Moon, G., Yu, SI., Wen, H., Shiratori, T., Lee, K.M. (2020). InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12365. Springer, Cham. https://doi.org/10.1007/978-3-030-58565-5_33

  • DOI: https://doi.org/10.1007/978-3-030-58565-5_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58564-8

  • Online ISBN: 978-3-030-58565-5

  • eBook Packages: Computer Science, Computer Science (R0)
