Abstract
Action recognition based on 2D information faces intrinsic difficulties such as occlusion and viewpoint variation, and suffers especially under complicated changes of perspective. In this paper, we present a straightforward and efficient approach to 3D human action recognition based on skeleton sequences. A rough geometric feature, termed planes of 3D joint motions vector (PoJM3D), is extracted from the raw skeleton data to capture omnidirectional short-term motion cues. A customized 3D convolutional neural network then learns a global long-term representation of spatial appearance and temporal motion using a scheme called dynamic temporal sparse sampling (DTSS). Extensive experiments on three public benchmark datasets, including UTD-MVAD, UTD-MHAD, and CAS-YNU-MHAD, demonstrate the effectiveness of our method compared to the current state of the art in cross-view evaluation, with significant improvement in cross-subject evaluation. The code of our proposed approach is released on GitHub.
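To make the two ingredients of the abstract concrete, the sketch below illustrates (a) a plane-of-motion feature built from consecutive joint displacement vectors, and (b) TSN-style segment-then-sample frame selection. Both are minimal sketches under assumptions: the exact PoJM3D construction and DTSS scheme are defined in the paper, and the function names `plane_normals` and `sparse_sample` are illustrative, not the authors' code.

```python
import numpy as np

def plane_normals(joints, eps=1e-8):
    """Sketch of a plane-of-motion cue for a skeleton sequence.

    joints: array of shape (T, J, 3) holding 3D positions of J joints
    over T frames. For each joint, two consecutive displacement vectors
    span a local motion plane; its unit normal is their cross product.
    Returns an array of shape (T-2, J, 3).
    """
    motion = np.diff(joints, axis=0)            # (T-1, J, 3) displacement vectors
    normals = np.cross(motion[:-1], motion[1:]) # (T-2, J, 3) plane normals
    norms = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.maximum(norms, eps)     # normalize, guarding degenerate cases

def sparse_sample(num_frames, num_segments, rng=None):
    """Sketch of temporal sparse sampling (assumed TSN-like).

    Splits [0, num_frames) into num_segments equal segments and draws one
    frame index per segment, covering the whole sequence with few frames.
    Requires num_frames >= num_segments.
    """
    rng = rng or np.random.default_rng(0)
    bounds = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    return np.array([rng.integers(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])])
```

The normals feed the short-term geometric feature for each frame triple, while the sampled indices pick which frames the 3D CNN sees, so a long sequence is summarized by a fixed-size clip.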
Acknowledgement
The study is supported by the National Natural Science Foundation of China (61772508, U1713213), the Key Research and Development Program of Guangdong Province (grant number 2019B090915001), the CAS Key Technology Talent Program, and Shenzhen Technology Projects (JCYJ20170413152535587, JCYJ20180507182610734).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Zhao, Q., Sun, S., Ji, X., Wang, L., Cheng, J. (2019). View Invariant Human Action Recognition Using 3D Geometric Features. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science(), vol 11743. Springer, Cham. https://doi.org/10.1007/978-3-030-27538-9_48
Print ISBN: 978-3-030-27537-2
Online ISBN: 978-3-030-27538-9