Multi-Stream CNN-LSTM Network with Partition Strategy for Human Action Recognition

Authors:

Zhuang Tianming,

Wang Bintao

BIC '21: Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing

Pages 431 - 435

https://doi.org/10.1145/3448748.3448815

Published: 21 March 2021

Abstract

Human action recognition is widely applied in computer vision and has been an active research topic for decades. In recent years, the prevalence of depth sensors and the introduction of real-time skeleton estimation algorithms based on depth images have drawn increasing attention to action recognition from skeleton sequences. Most existing work focuses on extracting the spatial information of different joints within a frame, but does not fully consider the combination of temporal and spatial features. In addition, most previous work treats all joints as equally significant, which is inconsistent with the physiological characteristics and kinematics of the human body. This paper therefore proposes a human joint partition strategy that divides the 25 human joints into groups. It also designs a CNN-LSTM framework that models the spatio-temporal characteristics of skeleton sequence data simultaneously, extracting both the spatial-domain information of different joints within a frame and the temporal-domain information embedded in consecutive frames.
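
The partition idea can be illustrated with a minimal sketch. The abstract does not say how the 25 joints are grouped, so everything specific here is an assumption: the five-part grouping, the part names, the Kinect-v2-style joint ordering behind the indices, and the helper names `check_partition` and `split_streams` are all hypothetical. The sketch only shows how a skeleton sequence (frames x 25 joints x 3 coordinates) might be split into one stream per body part.

```python
from typing import Dict, List, Sequence

# Hypothetical five-part partition of 25 joints (0-based indices), assuming
# a Kinect-v2-style joint ordering. The paper's actual partition may differ.
PARTITION: Dict[str, List[int]] = {
    "torso": [0, 1, 2, 3, 20],
    "left_arm": [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}


def check_partition(partition: Dict[str, List[int]], num_joints: int = 25) -> None:
    """Raise ValueError unless every joint index appears in exactly one part."""
    seen = sorted(i for part in partition.values() for i in part)
    if seen != list(range(num_joints)):
        raise ValueError("partition must cover each joint exactly once")


def split_streams(
    sequence: Sequence[Sequence[Sequence[float]]],
    partition: Dict[str, List[int]] = PARTITION,
) -> Dict[str, List[List[List[float]]]]:
    """Split a (frames x joints x 3) sequence into one stream per body part."""
    check_partition(partition)
    return {
        name: [[list(frame[j]) for j in joints] for frame in sequence]
        for name, joints in partition.items()
    }


if __name__ == "__main__":
    # Synthetic 4-frame sequence: joint j in frame t has coordinates (j, t, 0).
    demo = [[(float(j), float(t), 0.0) for j in range(25)] for t in range(4)]
    for name, stream in split_streams(demo).items():
        print(name, len(stream), "frames x", len(stream[0]), "joints")
```

In a multi-stream design of the kind the abstract describes, each per-part stream could then be passed to its own CNN-LSTM branch, with the branch outputs fused before classification. That wiring is likewise an assumption and is not shown here.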

Cited By

Zhang C, Fu W, Tian C, Cheng X, Tian Y, Yu H. (2024). 3D Convolutional Network based micro-gesture recognition. Proceedings of the ACM Turing Award Celebration Conference - China 2024, pp. 193-198. Online publication date: 5-Jul-2024.
https://dl.acm.org/doi/10.1145/3674399.3674459
Gaya-Morey F, Manresa-Yee C, Buades-Rubio J. (2024). Deep learning for computer vision based activity recognition and fall detection of the elderly: a systematic review. Applied Intelligence 54:19, pp. 8982-9007. Online publication date: 8-Jul-2024.
https://doi.org/10.1007/s10489-024-05645-1

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Activity recognition and understanding

Published In

BIC '21: Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing

January 2021

445 pages

ISBN: 9781450390002

DOI: 10.1145/3448748

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Arizona

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

BIC 2021

BIC 2021: 2021 International Conference on Bioinformatics and Intelligent Computing

January 22 - 24, 2021

Harbin, China

Bibliometrics

Total citations: 2
Total downloads: 72
Downloads (last 12 months): 18
Downloads (last 6 weeks): 3

Reflects downloads up to 15 Jan 2025
