Abstract
Hand and fingertip tracking is a crucial part of egocentric vision interaction, and it remains challenging because of factors such as dynamic environments and hand deformation. We propose a convolutional neural network (CNN) based method for real-time, accurate hand tracking and fingertip detection in RGB sequences captured by an egocentric mobile camera. First, we build a large-scale dataset, Ego-Finger, which covers a wide range of scenarios and provides human-labeled ground truth. Second, we propose a two-stage CNN pipeline: the Attention-based Hand Tracker (AHT), inspired by human vision, and the Multi-Points Fingertip Detector (MFD), which exploits the physical constraints of the hand. Compared with state-of-the-art methods, the proposed method achieves promising results while running in real time.
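The abstract does not describe how the two stages are connected, so the following is only a minimal sketch of a generic localize-then-detect pipeline, not the authors' implementation. It assumes the first stage returns a hand bounding box in pixels and the second stage returns fingertip coordinates normalized to the cropped patch. The names `two_stage`, `crop`, `to_image_coords`, `fake_tracker`, and `fake_detector` are hypothetical stand-ins for AHT and MFD.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height) in image pixels
Point = Tuple[float, float]       # (x, y)
Image = List[List[int]]           # rows of pixel values (toy representation)


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def to_image_coords(points: List[Point], box: Box) -> List[Point]:
    """Map points normalized to the patch ([0, 1] range) back to image pixels."""
    x, y, w, h = box
    return [(x + px * w, y + py * h) for px, py in points]


def two_stage(
    image: Image,
    track_hand: Callable[[Image], Box],
    detect_points: Callable[[Image], List[Point]],
) -> Tuple[Box, List[Point]]:
    """Stage 1 localizes the hand; stage 2 runs only on the cropped patch."""
    box = track_hand(image)
    patch = crop(image, box)
    return box, to_image_coords(detect_points(patch), box)


# Hypothetical stand-ins for the two networks (assumptions, not from the paper).
def fake_tracker(image: Image) -> Box:
    return (2, 1, 4, 2)


def fake_detector(patch: Image) -> List[Point]:
    return [(0.5, 0.5)]   # one fingertip at the patch center


image = [[0] * 8 for _ in range(4)]
box, fingertips = two_stage(image, fake_tracker, fake_detector)
print(box, fingertips)
```

Running the second stage on a crop keeps its input small and lets it predict in a hand-centered frame; the only bookkeeping needed is the mapping back to image coordinates.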
Notes
1. Ego-Finger dataset is available at:
Acknowledgement
This research is supported in part by the MSRA University Collaboration Fund (No.: FY16-RES-THEME-075), the Science and Technology Planning Project of Guangdong Province (Grant No.: 2015B010130003, 2015B010101004, 2016A010101014), and the Fundamental Research Funds for the Central Universities (Grant No.: 2015ZZ027).
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, X., Huang, Y., Zhang, X., Jin, L. (2016). Fingertip in the Eye: An Attention-Based Method for Real-Time Hand Tracking and Fingertip Detection in Egocentric Videos. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 662. Springer, Singapore. https://doi.org/10.1007/978-981-10-3002-4_12
DOI: https://doi.org/10.1007/978-981-10-3002-4_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3001-7
Online ISBN: 978-981-10-3002-4
eBook Packages: Computer Science, Computer Science (R0)