
Research article

Robust vision-based glove pose estimation for both hands in virtual reality

Published: 15 September 2023

Abstract

In virtual reality (VR) applications, haptic gloves provide feedback and more direct control than bare hands do. Most VR gloves contain flex and inertial measurement sensors for tracking the finger joints of a single hand; however, they lack a mechanism for tracking two-hand interactions. In this paper, a vision-based method is proposed for improved two-handed glove tracking that requires only a single camera attached to a VR headset. A photorealistic glove data generation framework was established to synthesize large quantities of training data for identifying the left glove, the right glove, or both in images with complex backgrounds. We also incorporated a “glove pose hypothesis” into the training stage, in which spatial cues regarding relative joint positions were exploited to accurately predict glove positions under severe self-occlusion or motion blur. In our experiments, a system based on the proposed method achieved 94.06% accuracy on a validation set and tracked gloves at 65 frames per second on a consumer graphics processing unit.
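
A note on the method: the “glove pose hypothesis” described above amounts to a spatial prior on relative joint positions. As a minimal sketch of how such a constraint could enter a training loss (an illustrative assumption, not the authors' implementation; the function name, PyTorch framing, joint count, and pair_weight factor are all hypothetical), one option is to penalize errors in pairwise joint offsets alongside the usual per-joint error:

    import torch

    def glove_pose_loss(pred, target, pair_weight=0.5):
        """Hypothetical loss combining per-joint error with a relative-position term.

        pred, target: (batch, num_joints, 3) predicted / ground-truth joint positions.
        """
        # Standard per-joint position error.
        joint_loss = torch.mean(torch.norm(pred - target, dim=-1))

        # Relative-position term: compare all pairwise joint offsets, which
        # encode bone directions and overall hand shape. This rewards
        # predictions that preserve the glove's spatial structure even when
        # individual joints are occluded or motion-blurred.
        pred_rel = pred.unsqueeze(2) - pred.unsqueeze(1)        # (B, J, J, 3)
        target_rel = target.unsqueeze(2) - target.unsqueeze(1)  # (B, J, J, 3)
        rel_loss = torch.mean(torch.norm(pred_rel - target_rel, dim=-1))

        return joint_loss + pair_weight * rel_loss

    # Example usage with a 21-joint hand skeleton (a common convention, assumed here).
    pred = torch.randn(8, 21, 3)
    target = torch.randn(8, 21, 3)
    loss = glove_pose_loss(pred, target)

Weighting the pairwise term lets the network trade off absolute joint accuracy against structural consistency; the abstract suggests the latter is what keeps predictions stable under self-occlusion and motion blur.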



    Published In

    Virtual Reality, Volume 27, Issue 4
    Dec 2023, 861 pages
    ISSN: 1359-4338
    EISSN: 1434-9957

    Publisher

    Springer-Verlag, Berlin, Heidelberg

    Publication History

    Published: 15 September 2023
    Accepted: 21 August 2023
    Received: 02 February 2023

    Author Tags

    1. Glove tracking
    2. Glove dataset
    3. Hand tracking
    4. Vision-based tracking
    5. Hand pose estimation
    6. Haptic glove
