DOI: 10.1145/3529399.3529425

A Sliding Window Based Approach With Majority Voting for Online Human Action Recognition using Spatial Temporal Graph Convolutional Neural Networks

Published: 10 June 2022

Abstract

Nowadays, Human Action Recognition (HAR) has become an important topic, as it is widely used in video surveillance, human-robot collaboration in industry, and other applications. Developing accurate and efficient algorithms remains difficult because of the high variability of human shapes and postures and the complexity of human movements, and even more so when working with continuous/untrimmed data streams. While HAR from segmented/trimmed sequences has been studied and developed intensively in recent years, online HAR, on the other hand, remains challenging and less developed. In this paper, we propose a Sliding Window and Majority Voting skeleton-based approach for online HAR using Spatial Temporal Graph Convolutional Neural Networks (STGCN-SWMV). Our method is evaluated on two online skeleton-based datasets, OAD and UOW. The obtained results exceed those of state-of-the-art algorithms and demonstrate the efficiency of the proposed approach.
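The core idea described above (classify overlapping skeleton windows with an ST-GCN, then assign each frame the majority label of the windows that cover it) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the window length, stride, and the stgcn_predict stub are assumptions standing in for a trained Spatial Temporal Graph Convolutional Network.

    from collections import Counter, defaultdict
    import numpy as np

    def stgcn_predict(window: np.ndarray) -> int:
        # Placeholder for a trained ST-GCN classifier; `window` has shape
        # (C, W, V): channels, frames in the window, skeleton joints.
        # A real system would run the graph convolutional network here.
        return int(window.mean() > 0)  # dummy decision, for illustration only

    def online_har(stream: np.ndarray, window_len: int = 30, stride: int = 5) -> list:
        # Label every frame of an untrimmed skeleton stream of shape (C, T, V).
        # Each sliding window is classified once; every frame then takes the
        # majority vote over all windows that covered it.
        _, num_frames, _ = stream.shape
        votes = defaultdict(list)  # frame index -> labels of covering windows
        for start in range(0, num_frames - window_len + 1, stride):
            label = stgcn_predict(stream[:, start:start + window_len, :])
            for t in range(start, start + window_len):
                votes[t].append(label)
        # Frames never covered by a full window stay unlabeled (-1).
        return [Counter(votes[t]).most_common(1)[0][0] if t in votes else -1
                for t in range(num_frames)]

    rng = np.random.default_rng(0)
    demo_stream = rng.standard_normal((3, 120, 25))  # 3 coords, 120 frames, 25 joints
    print(online_har(demo_stream)[:10])

Because the stride is smaller than the window length, each frame collects several window-level predictions, and the per-frame majority vote smooths them into a stable online label sequence.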

      Published In

      ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies
      March 2022
      291 pages
ISBN: 9781450395748
DOI: 10.1145/3529399

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Majority Voting
      2. Online Human Action Recognition
      3. Sliding Window
      4. Spatial Temporal Graph Convolutional Neural Networks

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICMLT 2022

      Cited By

• Smarter Aging: Developing a Foundational Elderly Activity Monitoring System With AI and GUI Interface. IEEE Access, vol. 12, pp. 74499–74523 (2024). https://doi.org/10.1109/ACCESS.2024.3405954
• Online human motion analysis in industrial context. Engineering Applications of Artificial Intelligence, vol. 131 (May 2024). https://doi.org/10.1016/j.engappai.2024.107850
• A hybrid deep learning framework for daily living human activity recognition with cluster-based video summarization. Multimedia Tools and Applications (April 2024). https://doi.org/10.1007/s11042-024-19022-0
• Towards Recognition of Human Actions in Collaborative Tasks with Robots: Extending Action Recognition with Tool Recognition Methods. Sensors, vol. 23, no. 12, article 5718 (June 2023). https://doi.org/10.3390/s23125718
• Digital twin of an industrial workstation. Engineering Applications of Artificial Intelligence, vol. 118 (February 2023). https://doi.org/10.1016/j.engappai.2022.105655
• ConvST-LSTM-Net: convolutional spatiotemporal LSTM networks for skeleton-based human action recognition. International Journal of Multimedia Information Retrieval, vol. 12, no. 2 (October 2023). https://doi.org/10.1007/s13735-023-00301-9
