research-article

A Real-Time Hand Posture Recognition System Using Deep Neural Networks

Authors:

Houqiang Li

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 6, Issue 2

Article No.: 21, Pages 1-23

https://doi.org/10.1145/2735952

Published: 31 March 2015

Abstract

Hand posture recognition (HPR) is a challenging task, both because hands are difficult to detect and track with ordinary cameras and because traditional hand-crafted features are limited. In this article, we propose a two-stage HPR system for sign language recognition using a Kinect sensor. In the first stage, we propose an effective algorithm for hand detection and tracking. The algorithm combines color and depth information and does not require a uniformly colored or stable background. It handles situations in which the hands are very close to other parts of the body or are not the objects nearest to the camera, and it tolerates occlusion of a hand by the face or by the other hand. In the second stage, we apply deep neural networks (DNNs) to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. Experiments show that the proposed system is fast and accurate, achieving a recognition accuracy of up to 98.12%.
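The abstract does not give the detection algorithm's details. The Python sketch below is one minimal way to combine the two cues it names, color and depth, and is not the authors' implementation. Assumptions: skin is detected with a widely used YCbCr range (Cb 77 to 127, Cr 133 to 173) computed with the ITU-R BT.601 full-range conversion; depth is in millimetres with 0 meaning no reading; `seed_depth` is a hypothetical hand-depth estimate, for example one carried over from tracking in the previous frame; and `hand_mask`, `largest_component`, and `normalize_patch` are illustrative names. The last function crops and resamples the hand region to a fixed size, one common way to remove translation and scale before an image reaches a network; it does nothing about rotation.

```python
from collections import deque

# Assumption: a widely used YCbCr skin-color range, not the authors' values.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(pixel):
    """Classify one (r, g, b) pixel as skin using BT.601 full-range Cb/Cr."""
    r, g, b = pixel
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def hand_mask(rgb, depth, seed_depth, band=100):
    """Mark pixels that look like skin AND lie within +/- band (mm) of seed_depth.

    rgb is a list of rows of (r, g, b) tuples; depth is a list of rows of
    millimetre readings, where 0 means "no reading". seed_depth is a
    hypothetical hand-depth estimate, e.g. carried over from the previous frame.
    """
    return [
        [d > 0 and abs(d - seed_depth) <= band and is_skin(px)
         for px, d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]


def largest_component(mask):
    """Return the (y, x) pixels of the largest 4-connected True region."""
    h, w = len(mask), len(mask[0])
    seen, best = set(), set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or (y, x) in seen:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            component, queue = set(), deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                component.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


def normalize_patch(region, size=32):
    """Crop the region's bounding box and resample it to size x size (nearest
    neighbour), removing translation and scale. Rotation is NOT handled."""
    if not region:
        raise ValueError("empty region")
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    y0, x0 = min(ys), min(xs)
    bh, bw = max(ys) - y0 + 1, max(xs) - x0 + 1
    return [
        [1 if (y0 + i * bh // size, x0 + j * bw // size) in region else 0
         for j in range(size)]
        for i in range(size)
    ]
```

On a toy frame where the hand (at 800 mm) touches the face (at 1200 mm) in the image plane, a color-only mask merges both into one blob, whereas the depth band keeps only the hand. The tracking and occlusion handling described in the abstract are outside the scope of this sketch.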

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Tracking
      2. Computer vision tasks
        Scene understanding

Recommendations

A real time vision-based hand gestures recognition system
ISICA'10: Proceedings of the 5th international conference on Advances in computation and intelligence

Hand gesture recognition is an important aspect in Human-Computer interaction, and can be used in various applications, such as virtual reality and computer games. In this paper, we propose a real time hand gesture recognition system. It includes three ...

Real-time Hand Tracking Using Kinect
ICDSP '18: Proceedings of the 2nd International Conference on Digital Signal Processing

Real-time hand tracking is fundamental to human gesture recognition. However, due to the huge computation, previous studies are either off-line or limited to given poses. In order to satisfy the requirement of real-time hand tracking, in this paper we ...

Robust hand posture recognition integrating multi-cue hand tracking
Edutainment'10: Proceedings of the Entertainment for education, and 5th international conference on E-learning and games

This paper proposes a robust real-time method for hand tracking and hand posture recognition. Dealing with complex background, scale-invariance and rotation-invariance are the difficulties for hand posture recognition. To solve these difficulties, we ...

Information

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 2

Special Section on Visual Understanding with RGB-D Sensors

May 2015

381 pages

ISSN: 2157-6904

EISSN: 2157-6912

DOI: 10.1145/2753829

Editor:
Huan Liu
Arizona State University

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 March 2015

Accepted: 01 January 2014

Revised: 01 November 2013

Received: 01 July 2013

Published in TIST Volume 6, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Natural Science Foundation of China (NSFC)

Bibliometrics & Citations

Article Metrics

Total Citations: 97

Total Downloads: 1,335

Downloads (Last 12 months): 77

Downloads (Last 6 weeks): 9

Reflects downloads up to 18 Nov 2024

Cited By

Pujahari R, Khan R, Yadav S. 2024. Applications of AI and Deep Learning in Biomedicine and Healthcare. In Applications of Parallel Data Processing for Biomedical Imaging, 93-124. Online publication date: 31 May 2024.
https://doi.org/10.4018/979-8-3693-2426-4.ch006

Hu Z, Wang S, Ou C, Ge A, Li X. 2024. Study on Gesture Recognition Method with Two-Stream Residual Network Fusing sEMG Signals and Acceleration Signals. Sensors 24, 9 (2702). Online publication date: 24 Apr 2024.
https://doi.org/10.3390/s24092702

Hama Rawf K, Abdulrahman A, Mohammed A. 2024. Improved Recognition of Kurdish Sign Language Using Modified CNN. Computers 13, 2 (37). Online publication date: 28 Jan 2024.
https://doi.org/10.3390/computers13020037

Ataguba G, Orji R. 2024. Toward the design of persuasive systems for a healthy workplace: a real-time posture detection. Frontiers in Big Data 7. Online publication date: 17 Jun 2024.
https://doi.org/10.3389/fdata.2024.1359906

Mudumbai K, Karthik M, Shalini N, Trisha N. 2024. Sign Language Recognizer Using Convolutional Neural Networks. In 2024 2nd World Conference on Communication & Computing (WCONF), 1-7. Online publication date: 12 Jul 2024.
https://doi.org/10.1109/WCONF61366.2024.10692298

Zhao W, Zhou W, Hu H, Wang M, Li H. 2024. Self-Supervised Representation Learning With Spatial-Temporal Consistency for Sign Language Recognition. IEEE Transactions on Image Processing 33, 4188-4201. Online publication date: 1 Jan 2024.
https://dl.acm.org/doi/10.1109/TIP.2024.3416881

Hu J, Li L, Gan N, Huang Q, Li Z, Wu Q. 2024. Piano Practice Assistant System Based on Hand Gesture Recognition. In 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI), 615-619. Online publication date: 31 May 2024.
https://doi.org/10.1109/ICECAI62591.2024.10675100

Xi J, Zhang W, Xu Z, Zhu S, Tang L, Zhao L. 2024. Three-dimensional dynamic gesture recognition method based on convolutional neural network. High-Confidence Computing (100280). Online publication date: Nov 2024.
https://doi.org/10.1016/j.hcc.2024.100280

Banerjee K, Singh A, Akhtar N, Vats I. 2024. Machine-Learning-Based Accessibility System. SN Computer Science 5, 3. Online publication date: 28 Feb 2024.
https://doi.org/10.1007/s42979-024-02615-9

Zhao W, Hu H, Zhou W, Shi J, Li H, Williams B, Chen Y, Neville J. 2023. BEST. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 3597-3605. Online publication date: 7 Feb 2023.
https://dl.acm.org/doi/10.1609/aaai.v37i3.25470