DOI: 10.1145/3529399.3529425

A Sliding Window Based Approach With Majority Voting for Online Human Action Recognition using Spatial Temporal Graph Convolutional Neural Networks

Published: 10 June 2022

Abstract

Nowadays, Human Action Recognition (HAR) has become an important topic, as it is widely used in video surveillance, human-robot collaboration in industry, and other applications. Developing accurate and efficient algorithms remains difficult because of the high variability of human shapes and postures and the complexity of human movements, and even more so when working with continuous/untrimmed data streams. While HAR from segmented/trimmed sequences has been studied and developed intensively in recent years, online HAR, on the other hand, remains challenging and less developed. In this paper, we propose a Sliding Window and Majority Voting skeleton-based approach for online HAR using Spatial Temporal Graph Convolutional Neural Networks (STGCN-SWMV). Our method is evaluated on two online skeleton-based datasets, OAD and UOW. The obtained results exceed those of state-of-the-art algorithms and demonstrate the efficiency of the proposed approach.
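The core idea described above (classify overlapping skeleton windows with an ST-GCN, then assign each frame the majority label of the windows that cover it) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the window length, stride, and the stgcn_predict stub are assumptions standing in for a trained Spatial Temporal Graph Convolutional Network.

    from collections import Counter, defaultdict
    import numpy as np

    def stgcn_predict(window: np.ndarray) -> int:
        # Placeholder for a trained ST-GCN classifier; `window` has shape
        # (C, W, V): channels, frames in the window, skeleton joints.
        # A real system would run the graph convolutional network here.
        return int(window.mean() > 0)  # dummy decision, for illustration only

    def online_har(stream: np.ndarray, window_len: int = 30, stride: int = 5) -> list:
        # Label every frame of an untrimmed skeleton stream of shape (C, T, V).
        # Each sliding window is classified once; every frame then takes the
        # majority vote over all windows that covered it.
        _, num_frames, _ = stream.shape
        votes = defaultdict(list)  # frame index -> labels of covering windows
        for start in range(0, num_frames - window_len + 1, stride):
            label = stgcn_predict(stream[:, start:start + window_len, :])
            for t in range(start, start + window_len):
                votes[t].append(label)
        # Frames never covered by a full window stay unlabeled (-1).
        return [Counter(votes[t]).most_common(1)[0][0] if t in votes else -1
                for t in range(num_frames)]

    rng = np.random.default_rng(0)
    demo_stream = rng.standard_normal((3, 120, 25))  # 3 coords, 120 frames, 25 joints
    print(online_har(demo_stream)[:10])

Because the stride is smaller than the window length, each frame collects several window-level predictions, and the per-frame majority vote smooths them into a stable online label sequence.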

      Published In

      ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies
      March 2022
      291 pages
ISBN: 9781450395748
DOI: 10.1145/3529399

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Majority Voting
      2. Online Human Action Recognition
      3. Sliding Window
      4. Spatial Temporal Graph Convolutional Neural Networks

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICMLT 2022

      Cited By

• Smarter Aging: Developing a Foundational Elderly Activity Monitoring System With AI and GUI Interface. IEEE Access, vol. 12, pp. 74499–74523 (2024). https://doi.org/10.1109/ACCESS.2024.3405954
• Online human motion analysis in industrial context. Engineering Applications of Artificial Intelligence, vol. 131 (May 2024). https://doi.org/10.1016/j.engappai.2024.107850
• A hybrid deep learning framework for daily living human activity recognition with cluster-based video summarization. Multimedia Tools and Applications (April 2024). https://doi.org/10.1007/s11042-024-19022-0
• Towards Recognition of Human Actions in Collaborative Tasks with Robots: Extending Action Recognition with Tool Recognition Methods. Sensors, vol. 23, no. 12, article 5718 (June 2023). https://doi.org/10.3390/s23125718
• Digital twin of an industrial workstation. Engineering Applications of Artificial Intelligence, vol. 118 (February 2023). https://doi.org/10.1016/j.engappai.2022.105655
• ConvST-LSTM-Net: convolutional spatiotemporal LSTM networks for skeleton-based human action recognition. International Journal of Multimedia Information Retrieval, vol. 12, no. 2 (October 2023). https://doi.org/10.1007/s13735-023-00301-9
