research-article

Convolutional Neural Networks and Long Short-Term Memory for skeleton-based human activity and hand gesture recognition

Published: 01 April 2018

Abstract

Highlights

• Combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) recurrent network for skeleton-based human activity and hand gesture recognition.
• A two-stage training strategy that first focuses on training the CNN and then adjusts the full CNN+LSTM method.
• A data augmentation method for spatiotemporal 3D data sequences.
• An exhaustive experimental study on publicly available benchmarks, comparing against the most representative state-of-the-art methods.
• A comparison across different CPU and GPU platforms.

In this work, we address human activity and hand gesture recognition using 3D data sequences obtained from full-body and hand skeletons, respectively. To this end, we propose a deep learning-based approach for temporal 3D pose recognition that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) recurrent network. We also present a two-stage training strategy that first focuses on training the CNN and then adjusts the full method (CNN+LSTM). Experiments show that this training method obtains better results than a single-stage training strategy. Additionally, we propose a data augmentation method, which has also been validated experimentally. Finally, we perform an extensive experimental study on publicly available benchmarks. The results show that the proposed approach reaches state-of-the-art performance compared with the methods identified in the literature. The best results were obtained on small datasets, where the proposed data augmentation strategy has the greatest impact.
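The abstract does not specify layer sizes or exactly how the CNN and the LSTM are connected. Purely as an illustration, the minimal sketch below assumes that each frame of a skeleton sequence is a flat vector of 3D joint coordinates, that the CNN applies 1D convolutions over time, and that an LSTM reads the resulting per-step feature vectors, with its final hidden state feeding a softmax classifier. All function names, sizes and the random weights are hypothetical and are not the authors' configuration; neither the two-stage training nor the data augmentation is shown.

```python
import math
import random

# Hypothetical layout: a skeleton sequence is a list of T frames, each a flat
# list of J*3 joint coordinates (x, y, z). Weights are random placeholders.


def conv1d_relu(seq, kernels, width):
    """Temporal 1D convolution (valid padding) followed by ReLU.
    seq: T x C_in; kernels: C_out filters, each a width x C_in matrix."""
    c_in = len(seq[0])
    out = []
    for t in range(len(seq) - width + 1):
        frame = []
        for k in kernels:
            s = sum(k[i][c] * seq[t + i][c] for i in range(width) for c in range(c_in))
            frame.append(max(0.0, s))  # ReLU
        out.append(frame)
    return out


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def lstm_last_state(seq, W, hidden):
    """Standard LSTM cell run over seq; returns the final hidden state.
    W maps gate name ('i', 'f', 'o', 'g') to a hidden x (n_in + hidden + 1)
    matrix; the last column acts as the bias."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = x + h + [1.0]  # [input, previous hidden state, bias term]
        pre = {name: [sum(w * v for w, v in zip(row, z)) for row in W[name]]
               for name in "ifog"}
        i_g = [sigmoid(v) for v in pre["i"]]    # input gate
        f_g = [sigmoid(v) for v in pre["f"]]    # forget gate
        o_g = [sigmoid(v) for v in pre["o"]]    # output gate
        cand = [math.tanh(v) for v in pre["g"]]  # candidate cell values
        c = [f * cc + i * g for f, cc, i, g in zip(f_g, c, i_g, cand)]
        h = [o * math.tanh(cc) for o, cc in zip(o_g, c)]
    return h


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(n_coords, n_filters, width, hidden, n_classes, seed=0):
    rng = random.Random(seed)
    return {
        "width": width,
        "hidden": hidden,
        "conv": [rand_matrix(width, n_coords, rng) for _ in range(n_filters)],
        "lstm": {g: rand_matrix(hidden, n_filters + hidden + 1, rng) for g in "ifog"},
        "fc": rand_matrix(n_classes, hidden + 1, rng),
    }


def cnn_lstm_predict(seq, params):
    """CNN features per time step -> LSTM -> linear layer -> class probabilities."""
    feats = conv1d_relu(seq, params["conv"], params["width"])
    h = lstm_last_state(feats, params["lstm"], params["hidden"])
    logits = [sum(w * v for w, v in zip(row, h + [1.0])) for row in params["fc"]]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(1)
    T, J = 20, 5  # hypothetical: 20 frames, 5 joints
    seq = [[rng.uniform(-1, 1) for _ in range(J * 3)] for _ in range(T)]
    params = init_params(n_coords=J * 3, n_filters=8, width=3, hidden=16, n_classes=4)
    probs = cnn_lstm_predict(seq, params)
    print(len(probs), round(sum(probs), 6))  # prints: 4 1.0
```

Using only the last hidden state is one common way to turn a sequence into a single prediction; averaging per-step outputs is another. The abstract does not say which the authors use, so treat this choice as an assumption of the sketch.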

    Information

    Published In

    Pattern Recognition, Volume 76, Issue C
    April 2018
    669 pages

    Publisher

    Elsevier Science Inc.

    United States

    Author Tags

    1. Convolutional Neural Network
    2. Deep learning
    3. Hand gesture recognition
    4. Human activity recognition
    5. Long Short-Term Memory
    6. Real-time
    7. Recurrent neural network
