
3D Finger CAPE: Clicking Action and Position Estimation under Self-Occlusions in Egocentric Viewpoint

Published: 18 April 2015

Abstract

In this paper we present a novel framework for the simultaneous detection of clicking actions and estimation of occluded fingertip positions from egocentrically viewed single-depth image sequences. For the detection and estimation, a novel probabilistic inference based on knowledge priors of clicking motion and clicked position is presented. Based on the detection and estimation results, we achieve fine-grained bare-hand interaction with virtual objects from the egocentric viewpoint. Our contributions include: (i) rotation- and translation-invariant finger clicking action and position estimation that combines 2D image-based fingertip detection with 3D hand posture estimation in the egocentric viewpoint; (ii) a novel spatio-temporal random forest, which performs the detection and estimation efficiently in a single framework; and (iii) a selection process utilizing the proposed clicking action detection and position estimation in an arm-reachable AR/VR space, requiring no additional device. Experimental results show that the proposed method delivers promising performance under the frequent self-occlusions that arise when selecting objects in AR/VR space while wearing an HMD with an attached egocentric depth camera.
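The abstract's spatio-temporal random forest jointly detects the clicking action (classification) and estimates the clicked fingertip position (regression) in one framework. The sketch below is only a rough illustrative analogue of that idea, not the authors' implementation: a toy forest of depth-1 trees whose split tests compare a per-frame feature at two different frames of a sliding temporal window, with each leaf jointly storing a click label and a mean clicked position. The window size, per-frame features, and all data are synthetic and hypothetical.

```python
import random

WINDOW, N_FEAT = 5, 2            # frames per window, features per frame (hypothetical)

def synth_window(click):
    """Synthetic window: feature 0 mimics fingertip depth; a click is a sharp mid-window dip."""
    w = [[random.gauss(0.5, 0.02) for _ in range(N_FEAT)] for _ in range(WINDOW)]
    pos = (random.uniform(0.0, 1.0), random.uniform(0.0, 1.0))   # fake clicked position
    if click:
        w[WINDOW // 2][0] -= 0.3
    return w, click, pos

def apply_test(test, w):
    # A "spatio-temporal" binary test: compare feature f at frames t1 and t2.
    t1, t2, f, thr = test
    return w[t1][f] - w[t2][f] > thr

class StumpTree:
    """Depth-1 tree: one spatio-temporal test, two joint (label, position) leaves."""
    def fit(self, data):
        def purity(test):
            sides = {True: [0, 0], False: [0, 0]}
            for w, y, _ in data:
                sides[apply_test(test, w)][int(y)] += 1
            return sum(max(counts) for counts in sides.values())
        cands = [tuple(random.sample(range(WINDOW), 2)) +
                 (random.randrange(N_FEAT), random.gauss(0.0, 0.15))
                 for _ in range(20)]
        self.test = max(cands, key=purity)       # keep the best-separating test
        self.leaf = {}
        for side in (True, False):
            items = [(y, p) for w, y, p in data if apply_test(self.test, w) == side]
            clicks = [p for y, p in items if y]
            label = bool(items) and sum(y for y, _ in items) * 2 > len(items)
            mean = (tuple(sum(p[i] for p in clicks) / len(clicks) for i in (0, 1))
                    if clicks else None)
            self.leaf[side] = (label, mean)      # joint classification + regression leaf

    def predict(self, w):
        return self.leaf[apply_test(self.test, w)]

def train_forest(n_trees, data):
    forest = []
    for _ in range(n_trees):
        boot = [random.choice(data) for _ in data]   # bootstrap sample per tree
        tree = StumpTree()
        tree.fit(boot)
        forest.append(tree)
    return forest

def predict(forest, w):
    votes = [tree.predict(w) for tree in forest]
    click = sum(label for label, _ in votes) * 2 > len(votes)   # majority vote
    pts = [p for _, p in votes if p is not None]
    pos = (tuple(sum(p[i] for p in pts) / len(pts) for i in (0, 1))
           if pts else None)                                    # average position votes
    return click, pos

random.seed(0)
train = [synth_window(i % 2 == 0) for i in range(200)]
forest = train_forest(25, train)
test_set = [synth_window(i % 2 == 0) for i in range(100)]
accuracy = sum(predict(forest, w)[0] == y for w, y, _ in test_set) / 100
print(accuracy)   # classification accuracy on the synthetic test set
```

Averaging the leaves' position votes across trees loosely echoes Hough-forest-style voting; a real implementation would use much deeper trees, depth-image features, and the paper's motion and position priors rather than this two-feature toy.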


Cited By

  • (2024) Virtual Task Environments Factors Explored in 3D Selection Studies. Proceedings of the 50th Graphics Interface Conference, pp. 1–16. DOI: 10.1145/3670947.3670983.
  • (2024) Interactions with 3D virtual objects in augmented reality using natural gestures. The Visual Computer, 40(9):6449–6462. DOI: 10.1007/s00371-023-03175-4.
  • (2023) Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in Augmented Reality. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–14. DOI: 10.1145/3544548.3580704.
  • (2023) Analysis of the Hands in Egocentric Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):6846–6866. DOI: 10.1109/TPAMI.2020.2986648.
  • (2020) HGR: Hand-Gesture-Recognition Based Text Input Method for AR/VR Wearable Devices. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 744–751. DOI: 10.1109/SMC42975.2020.9283348.
  • (2020) 3D hand mesh reconstruction from a monocular RGB image. The Visual Computer, 36(10–12):2227–2239. DOI: 10.1007/s00371-020-01908-3.
  • (2020) SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation. Computer Vision – ECCV 2020, pp. 122–139. DOI: 10.1007/978-3-030-58610-2_8.
  • (2019) Opisthenar. Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 963–971. DOI: 10.1145/3332165.3347867.
  • (2019) Estimation of 3D human hand poses with structured pose prior. IET Computer Vision, 13(8):683–690. DOI: 10.1049/iet-cvi.2018.5480.
  • (2019) Estimation of the Distance Between Fingertips Using Silhouette and Texture Information of Dorsal of Hand. Advances in Visual Computing, pp. 471–481. DOI: 10.1007/978-3-030-33720-9_36.



    Published In

    IEEE Transactions on Visualization and Computer Graphics, Volume 21, Issue 4, April 2015, 119 pages.

    Publisher: IEEE Educational Activities Department, United States.


    Author Tags

    1. fingertip position estimation
    2. hand tracking
    3. spatio-temporal forest
    4. selection
    5. augmented reality
    6. computer vision
    7. self-occlusion
    8. clicking action detection

    Qualifiers

    • Research-article

