
3D Finger CAPE: Clicking Action and Position Estimation under Self-Occlusions in Egocentric Viewpoint

Published: 18 April 2015

Abstract

In this paper we present a novel framework for the simultaneous detection of clicking actions and estimation of occluded fingertip positions from egocentrically viewed single-depth image sequences. For the detection and estimation, a novel probabilistic inference based on knowledge priors of clicking motion and clicked position is presented. Based on the detection and estimation results, we achieve fine-grained bare-hand interaction with virtual objects from the egocentric viewpoint. Our contributions include: (i) rotation- and translation-invariant finger clicking action and position estimation that combines 2D image-based fingertip detection with 3D hand posture estimation in the egocentric viewpoint; (ii) a novel spatio-temporal random forest, which performs the detection and estimation efficiently in a single framework; and (iii) a selection process utilizing the proposed clicking action detection and position estimation in an arm-reachable AR/VR space, requiring no additional device. Experimental results show that the proposed method delivers promising performance under the frequent self-occlusions that arise when selecting objects in AR/VR space while wearing an HMD with an attached egocentric depth camera.
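The abstract's spatio-temporal random forest jointly detects the clicking action (classification) and estimates the clicked fingertip position (regression) in one framework. The sketch below is only a rough illustrative analogue of that idea, not the authors' implementation: a toy forest of depth-1 trees whose split tests compare a per-frame feature at two different frames of a sliding temporal window, with each leaf jointly storing a click label and a mean clicked position. The window size, per-frame features, and all data are synthetic and hypothetical.

```python
import random

WINDOW, N_FEAT = 5, 2            # frames per window, features per frame (hypothetical)

def synth_window(click):
    """Synthetic window: feature 0 mimics fingertip depth; a click is a sharp mid-window dip."""
    w = [[random.gauss(0.5, 0.02) for _ in range(N_FEAT)] for _ in range(WINDOW)]
    pos = (random.uniform(0.0, 1.0), random.uniform(0.0, 1.0))   # fake clicked position
    if click:
        w[WINDOW // 2][0] -= 0.3
    return w, click, pos

def apply_test(test, w):
    # A "spatio-temporal" binary test: compare feature f at frames t1 and t2.
    t1, t2, f, thr = test
    return w[t1][f] - w[t2][f] > thr

class StumpTree:
    """Depth-1 tree: one spatio-temporal test, two joint (label, position) leaves."""
    def fit(self, data):
        def purity(test):
            sides = {True: [0, 0], False: [0, 0]}
            for w, y, _ in data:
                sides[apply_test(test, w)][int(y)] += 1
            return sum(max(counts) for counts in sides.values())
        cands = [tuple(random.sample(range(WINDOW), 2)) +
                 (random.randrange(N_FEAT), random.gauss(0.0, 0.15))
                 for _ in range(20)]
        self.test = max(cands, key=purity)       # keep the best-separating test
        self.leaf = {}
        for side in (True, False):
            items = [(y, p) for w, y, p in data if apply_test(self.test, w) == side]
            clicks = [p for y, p in items if y]
            label = bool(items) and sum(y for y, _ in items) * 2 > len(items)
            mean = (tuple(sum(p[i] for p in clicks) / len(clicks) for i in (0, 1))
                    if clicks else None)
            self.leaf[side] = (label, mean)      # joint classification + regression leaf

    def predict(self, w):
        return self.leaf[apply_test(self.test, w)]

def train_forest(n_trees, data):
    forest = []
    for _ in range(n_trees):
        boot = [random.choice(data) for _ in data]   # bootstrap sample per tree
        tree = StumpTree()
        tree.fit(boot)
        forest.append(tree)
    return forest

def predict(forest, w):
    votes = [tree.predict(w) for tree in forest]
    click = sum(label for label, _ in votes) * 2 > len(votes)   # majority vote
    pts = [p for _, p in votes if p is not None]
    pos = (tuple(sum(p[i] for p in pts) / len(pts) for i in (0, 1))
           if pts else None)                                    # average position votes
    return click, pos

random.seed(0)
train = [synth_window(i % 2 == 0) for i in range(200)]
forest = train_forest(25, train)
test_set = [synth_window(i % 2 == 0) for i in range(100)]
accuracy = sum(predict(forest, w)[0] == y for w, y, _ in test_set) / 100
print(accuracy)   # classification accuracy on the synthetic test set
```

Averaging the leaves' position votes across trees loosely echoes Hough-forest-style voting; a real implementation would use much deeper trees, depth-image features, and the paper's motion and position priors rather than this two-feature toy.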


Cited By

  • (2024) Virtual Task Environments Factors Explored in 3D Selection Studies. Proceedings of the 50th Graphics Interface Conference, pp. 1–16. DOI: 10.1145/3670947.3670983.
  • (2024) Interactions with 3D virtual objects in augmented reality using natural gestures. The Visual Computer, 40(9):6449–6462. DOI: 10.1007/s00371-023-03175-4.
  • (2023) Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in Augmented Reality. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–14. DOI: 10.1145/3544548.3580704.
  • (2023) Analysis of the Hands in Egocentric Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):6846–6866. DOI: 10.1109/TPAMI.2020.2986648.
  • (2020) HGR: Hand-Gesture-Recognition Based Text Input Method for AR/VR Wearable Devices. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 744–751. DOI: 10.1109/SMC42975.2020.9283348.
  • (2020) 3D hand mesh reconstruction from a monocular RGB image. The Visual Computer, 36(10–12):2227–2239. DOI: 10.1007/s00371-020-01908-3.
  • (2020) SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation. Computer Vision – ECCV 2020, pp. 122–139. DOI: 10.1007/978-3-030-58610-2_8.
  • (2019) Opisthenar. Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 963–971. DOI: 10.1145/3332165.3347867.
  • (2019) Estimation of 3D human hand poses with structured pose prior. IET Computer Vision, 13(8):683–690. DOI: 10.1049/iet-cvi.2018.5480.
  • (2019) Estimation of the Distance Between Fingertips Using Silhouette and Texture Information of Dorsal of Hand. Advances in Visual Computing, pp. 471–481. DOI: 10.1007/978-3-030-33720-9_36.



    Published In

    IEEE Transactions on Visualization and Computer Graphics, Volume 21, Issue 4, April 2015, 119 pages.

    Publisher: IEEE Educational Activities Department, United States.


    Author Tags

    1. fingertip position estimation
    2. hand tracking
    3. spatio-temporal forest
    4. selection
    5. augmented reality
    6. computer vision
    7. self-occlusion
    8. clicking action detection

    Qualifiers

    • Research-article

