DOI: 10.1145/3448748.3448815
research-article

Multi-Stream CNN-LSTM Network with Partition Strategy for Human Action Recognition

Published: 21 March 2021

Abstract

Human action recognition is widely applied in computer vision, which has made it a popular research topic over the past decades. In recent years, two developments have drawn increasing attention to human action recognition from skeleton sequences: the prevalence of depth sensors, and real-time skeleton estimation algorithms based on depth images. Most existing work focuses on extracting spatial information from the different joints within a frame, but it does not fully consider how temporal and spatial features can be combined. In addition, most previous work treats all joints as equally significant, which does not reflect the physiological characteristics and kinematics of the human body. This paper therefore proposes a human joint partition strategy that divides the 25 skeleton joints into parts. It also designs a CNN-LSTM framework that models the spatio-temporal characteristics of skeleton sequence data jointly. The framework extracts both the spatial-domain information of the different joints in each frame and the temporal-domain information embedded in consecutive frames.
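The following is a minimal NumPy sketch that makes the two ideas in the abstract concrete: partitioning the joints, then combining a per-frame spatial CNN with a temporal LSTM in one stream per part. It is not the authors' implementation, because the abstract does not state the actual joint grouping, layer sizes or fusion scheme.

The sketch makes these assumptions:
- The joints follow the Kinect v2 / NTU RGB+D 25-joint ordering.
- The partition is a hypothetical five-part grouping, `PARTS`: torso, two arms and two legs.
- Each frame's spatial CNN is a 1D convolution over the joints of a part (`spatial_conv`).
- The temporal model is a single-layer LSTM over frames (`lstm`).
- Streams are fused by concatenating their final hidden states.
- All weights are random and untrained.

```python
import numpy as np

# Hypothetical 5-part partition of a 25-joint skeleton. Indices are 0-based and
# ASSUME the Kinect v2 / NTU RGB+D joint ordering; the paper's grouping may differ.
PARTS = {
    "torso": [0, 1, 2, 3, 20],          # spine base, spine mid, neck, head, spine shoulder
    "left_arm": [4, 5, 6, 7, 21, 22],   # shoulder, elbow, wrist, hand, hand tip, thumb
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg": [12, 13, 14, 15],       # hip, knee, ankle, foot
    "right_leg": [16, 17, 18, 19],
}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def spatial_conv(frame, kernel):
    """Valid 1D convolution over the joint axis of a single frame, then ReLU.

    frame: (J, C) joint coordinates; kernel: (K, C, F) filters.
    Returns an array of shape (J - K + 1, F).
    """
    k = kernel.shape[0]
    out = np.stack([
        np.einsum("kc,kcf->f", frame[j:j + k], kernel)
        for j in range(frame.shape[0] - k + 1)
    ])
    return np.maximum(out, 0.0)


def lstm(xs, W, U, b):
    """Single-layer LSTM over a (T, D) sequence; returns the final hidden state.

    W: (D, 4H), U: (H, 4H), b: (4H,). Gate order: input, forget, candidate, output.
    """
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    for x in xs:
        i, f, g, o = np.split(x @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def make_stream_params(rng, n_joints, coord_dim=3, k=3, n_filters=8, hidden=16):
    """Random (untrained) weights for one part's CNN + LSTM stream."""
    d = (n_joints - k + 1) * n_filters  # flattened per-frame CNN feature size
    return {
        "kernel": rng.normal(scale=0.1, size=(k, coord_dim, n_filters)),
        "W": rng.normal(scale=0.1, size=(d, 4 * hidden)),
        "U": rng.normal(scale=0.1, size=(hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
    }


def multi_stream_features(seq, params):
    """seq: (T, 25, 3) skeleton sequence -> concatenated final states of all streams."""
    feats = []
    for name, joints in PARTS.items():
        p = params[name]
        part = seq[:, joints, :]  # (T, J_part, 3): only this part's joints
        # Spatial step: CNN over the joints of each frame, flattened to (T, D).
        per_frame = np.stack([spatial_conv(f, p["kernel"]).ravel() for f in part])
        # Temporal step: LSTM across consecutive frames.
        feats.append(lstm(per_frame, p["W"], p["U"], p["b"]))
    return np.concatenate(feats)


rng = np.random.default_rng(0)
params = {name: make_stream_params(rng, len(j)) for name, j in PARTS.items()}
seq = rng.normal(size=(30, 25, 3))  # 30 frames of random 3D joint coordinates
features = multi_stream_features(seq, params)
print(features.shape)  # (80,) = 5 streams x 16 hidden units
```

Each part gets its own convolution and LSTM weights, so joints in different body parts do not share parameters. In practice a classifier, such as a softmax layer, would sit on top of the fused vector; the sketch omits it.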

Cited By

  • (2024) 3D Convolutional Network based micro-gesture recognition. Proceedings of the ACM Turing Award Celebration Conference - China 2024, pp. 193-198. DOI: 10.1145/3674399.3674459. Online publication date: 5-Jul-2024.
  • (2024) Deep learning for computer vision based activity recognition and fall detection of the elderly: a systematic review. Applied Intelligence 54(19), pp. 8982-9007. DOI: 10.1007/s10489-024-05645-1. Online publication date: 8-Jul-2024.


    Information

    Published In

    ACM Other conferences
    BIC '21: Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing
    January 2021
    445 pages
    ISBN:9781450390002
    DOI:10.1145/3448748
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    In-Cooperation

    • University of Arizona

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Action Recognition
    2. Computer Vision
    3. Deep Learning
    4. Human Skeleton

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    BIC 2021

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 18
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 15 Jan 2025
