
Research article

Robust vision-based glove pose estimation for both hands in virtual reality

Published: 15 September 2023

Abstract

In virtual reality (VR) applications, haptic gloves provide feedback and more direct control than bare hands do. Most VR gloves contain flex and inertial measurement sensors for tracking the finger joints of a single hand; however, they lack a mechanism for tracking two-hand interactions. In this paper, a vision-based method is proposed for improved two-handed glove tracking that requires only a single camera attached to a VR headset. A photorealistic glove data generation framework was established to synthesize large quantities of training data for identifying the left glove, the right glove, or both in images with complex backgrounds. We also incorporated a “glove pose hypothesis” into the training stage, in which spatial cues regarding relative joint positions were exploited to accurately predict glove positions under severe self-occlusion or motion blur. In our experiments, a system based on the proposed method achieved 94.06% accuracy on a validation set and tracked gloves at 65 frames per second on a consumer graphics processing unit.
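
A note on the method: the “glove pose hypothesis” described above amounts to a spatial prior on relative joint positions. As a minimal sketch of how such a constraint could enter a training loss (an illustrative assumption, not the authors' implementation; the function name, PyTorch framing, joint count, and pair_weight factor are all hypothetical), one option is to penalize errors in pairwise joint offsets alongside the usual per-joint error:

    import torch

    def glove_pose_loss(pred, target, pair_weight=0.5):
        """Hypothetical loss combining per-joint error with a relative-position term.

        pred, target: (batch, num_joints, 3) predicted / ground-truth joint positions.
        """
        # Standard per-joint position error.
        joint_loss = torch.mean(torch.norm(pred - target, dim=-1))

        # Relative-position term: compare all pairwise joint offsets, which
        # encode bone directions and overall hand shape. This rewards
        # predictions that preserve the glove's spatial structure even when
        # individual joints are occluded or motion-blurred.
        pred_rel = pred.unsqueeze(2) - pred.unsqueeze(1)        # (B, J, J, 3)
        target_rel = target.unsqueeze(2) - target.unsqueeze(1)  # (B, J, J, 3)
        rel_loss = torch.mean(torch.norm(pred_rel - target_rel, dim=-1))

        return joint_loss + pair_weight * rel_loss

    # Example usage with a 21-joint hand skeleton (a common convention, assumed here).
    pred = torch.randn(8, 21, 3)
    target = torch.randn(8, 21, 3)
    loss = glove_pose_loss(pred, target)

Weighting the pairwise term lets the network trade off absolute joint accuracy against structural consistency; the abstract suggests the latter is what keeps predictions stable under self-occlusion and motion blur.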



    Published In

    Virtual Reality, Volume 27, Issue 4
    Dec 2023, 861 pages
    ISSN: 1359-4338
    EISSN: 1434-9957

    Publisher

    Springer-Verlag, Berlin, Heidelberg

    Publication History

    Published: 15 September 2023
    Accepted: 21 August 2023
    Received: 02 February 2023

    Author Tags

    1. Glove tracking
    2. Glove dataset
    3. Hand tracking
    4. Vision-based tracking
    5. Hand pose estimation
    6. Haptic glove
