Abstract
We present an approach for real-time estimation of 3D hand shape and pose from a single RGB image. To achieve real-time performance, we utilize an efficient Convolutional Neural Network (CNN): MobileNetV3-Small to extract key features from an input image. The extracted features are then sent to an iterative 3D regression module to infer camera parameters, hand shapes and joint angles for projecting and articulating a 3D hand model. By combining the deep neural network with the differentiable hand model, we can train the network with supervision from 2D and 3D annotations in an end-to-end manner. Experiments on two publicly available datasets demonstrate that our approach matches the accuracy of most existing methods while running at over 110 Hz on a GPU or 75 Hz on a CPU.
Supported by Agency for Science, Technology and Research (A*STAR), Nanyang Technological University (NTU) and the National Healthcare Group (NHG). Project code: RFP/19003.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Baek, S., Kim, K.I., Kim, T.: Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. In: CVPR, pp. 1067–1076 (2019)
Bazarevsky, V., Zhang, F.: On-device, real-time hand tracking with mediapipe. Google AI Blog, August 2019
Boukhayma, A., de Bem, R., Torr, P.H.S.: 3D hand shape and pose from images in the wild. In: CVPR, pp. 10835–10844 (2019)
Cai, Y., Ge, L., Cai, J., Yuan, J.: Weakly-supervised 3D hand pose estimation from monocular RGB images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 678–694. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_41
Carreira, J., Agrawal, P., Fragkiadaki, K., Malik, J.: Human pose estimation with iterative error feedback. In: CVPR, pp. 4733–4742 (2016)
Ge, L., et al.: 3D hand shape and pose estimation from a single RGB image. In: CVPR, pp. 10825–10834 (2019)
Gouidis, F., Panteleris, P., Oikonomidis, I., Argyros, A.A.: Accurate hand keypoint localization on mobile devices. In: MVA, pp. 1–6 (2019)
Gower, J.: Generalized procrustes analysis. Psychometrika 40(1), 33–51 (1975)
Hampali, S., Rad, M., Oberweger, M., Lepetit, V.: HOnnotate: a method for 3D annotation of hand and object poses. In: CVPR, pp. 3193–3203 (2020)
Hasson, Y., et al.: Learning joint reconstruction of hands and manipulated objects. In: CVPR, pp. 11799–11808 (2019)
Howard, A., et al.: Searching for mobilenetv3. In: ICCV, pp. 1314–1324 (2019)
Iqbal, U., Molchanov, P., Breuel, T., Gall, J., Kautz, J.: Hand pose estimation via latent 2.5D heatmap regression. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 125–143. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_8
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR, pp. 7122–7131 (2018)
Kulon, D., Güler, R.A., Kokkinos, I., Bronstein, M., Zafeiriou, S.: Weakly-supervised mesh-convolutional hand reconstruction in the wild. In: CVPR (2020)
Lim, G.M., Jatesiktat, P., Kuah, C.W.K., Ang, W.T.: Camera-based hand tracking using a mirror-based multi-view setup. In: EMBC, pp. 5789–5793 (2020)
Mueller, F., et al.: Ganerated hands for real-time 3D hand tracking from monocular RGB. In: CVPR, pp. 49–59 (2018)
Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. ACM TOG 36(6) (2017)
Spurr, A., Song, J., Park, S., Hilliges, O.: Cross-modal deep variational hand pose estimation. In: CVPR, pp. 89–98 (2018)
Zhang, J., Jiao, J., Chen, M., Qu, L., Xu, X., Yang, Q.: A hand pose tracking benchmark from stereo matching. In: ICIP, pp. 982–986 (2017)
Zhang, X., Li, Q., Mo, H., Zhang, W., Zheng, W.: End-to-end hand mesh recovery from a monocular RGB image. In: ICCV, pp. 2354–2364 (2019)
Zhou, X., Wan, Q., Zhang, W., Xue, X., Wei, Y.: Model-based deep hand pose estimation. In: IJCAI, pp. 2421–2427 (2016)
Zhou, Y., Habermann, M., Xu, W., Habibie, I., Theobalt, C., Xu, F.: Monocular real-time hand shape and motion capture using multi-modal data. In: CVPR (2020)
Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M.J., Brox, T.: Freihand: a dataset for markerless capture of hand pose and shape from single RGB images. In: ICCV, pp. 813–822 (2019)
Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: ICCV, pp. 4913–4921 (2017)
Acknowledgments
The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lim, G.M., Jatesiktat, P., Ang, W.T. (2020). MobileHand: Real-Time 3D Hand Shape and Pose Estimation from Color Image. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_52
Download citation
DOI: https://doi.org/10.1007/978-3-030-63820-7_52
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer ScienceComputer Science (R0)