A Real-Time Hand Posture Recognition System Using Deep Neural Networks

Published: 31 March 2015

Abstract

Hand posture recognition (HPR) is a challenging task, owing both to the difficulty of detecting and tracking hands with ordinary cameras and to the limitations of traditional, manually selected features. In this article, we propose a two-stage HPR system for sign language recognition using a Kinect sensor. In the first stage, we propose an effective algorithm for hand detection and tracking. The algorithm combines color and depth information and does not require a uniformly colored or stable background. It handles situations in which the hands are very close to other parts of the body or are not the nearest objects to the camera, and it tolerates occlusion of the hands by the face or by the other hand. In the second stage, we apply deep neural networks (DNNs) to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. Experiments verify that the proposed system works quickly and accurately and achieves a recognition accuracy as high as 98.12%.
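To make the idea of combining color and depth cues concrete, the following is a minimal sketch and not the algorithm proposed in the article, whose details the abstract does not give. It assumes a registered RGB image and depth map in millimetres, a fixed Cb/Cr skin-color range of the kind commonly used in the literature, and a fixed working depth band. The function names, the thresholds, and the depth band are all illustrative assumptions.

```python
import numpy as np

def rgb_to_cbcr(rgb):
    """Convert an HxWx3 uint8 RGB image to Cb and Cr planes (BT.601, full range)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def hand_candidate_mask(rgb, depth_mm, near_mm=400, far_mm=1500,
                        cb_range=(77, 127), cr_range=(133, 173)):
    """Return a boolean mask of pixels that are skin-colored AND inside a depth band.

    All thresholds are assumptions chosen for illustration, not values
    taken from the article.
    """
    cb, cr = rgb_to_cbcr(rgb)
    # Color cue: pixel chrominance falls inside the assumed skin range.
    skin = ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
    # Depth cue: pixel lies inside the assumed working distance from the sensor.
    in_band = (depth_mm >= near_mm) & (depth_mm <= far_mm)
    # Fusion: keep only pixels supported by both cues.
    return skin & in_band

# Tiny example: two skin-toned pixels (one near, one far) and one blue pixel.
rgb = np.array([[[220, 170, 140], [220, 170, 140], [0, 0, 255]]], dtype=np.uint8)
depth = np.array([[800, 3000, 800]], dtype=np.uint16)
print(hand_candidate_mask(rgb, depth))
```

A mask like this only yields hand candidates; it would also keep the face and other skin-colored regions in the band, and separating those and tracking the hands through occlusion is the part the article's first stage addresses.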



Published In

ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 2
Special Section on Visual Understanding with RGB-D Sensors
May 2015
381 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2753829
Editor: Huan Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 March 2015
Accepted: 01 January 2014
Revised: 01 November 2013
Received: 01 July 2013
Published in TIST Volume 6, Issue 2


Author Tags

  1. Kinect
  2. deep neural networks
  3. hand tracking
  4. posture recognition

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Natural Science Foundation of China (NSFC)
