Abstract
Action recognition based on 2D information faces intrinsic difficulties such as occlusion and viewpoint variation, and suffers especially under complicated changes of perspective. In this paper, we present a straightforward and efficient approach to 3D human action recognition based on skeleton sequences. A rough geometric feature, termed planes of 3D joint motions vector (PoJM3D), is extracted from the raw skeleton data to capture omnidirectional short-term motion cues. A customized 3D convolutional neural network then learns a global long-term representation of spatial appearance and temporal motion using a scheme called dynamic temporal sparse sampling (DTSS). Extensive experiments on three public benchmark datasets, including UTD-MVAD, UTD-MHAD, and CAS-YNU-MHAD, demonstrate the effectiveness of our method compared to the current state of the art in cross-view evaluation, with significant improvement in cross-subject evaluation. The code of our proposed approach is released on GitHub.
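To make the two ingredients of the abstract concrete, the sketch below illustrates (a) a plane-of-motion feature built from consecutive joint displacement vectors, and (b) TSN-style segment-then-sample frame selection. Both are minimal sketches under assumptions: the exact PoJM3D construction and DTSS scheme are defined in the paper, and the function names `plane_normals` and `sparse_sample` are illustrative, not the authors' code.

```python
import numpy as np

def plane_normals(joints, eps=1e-8):
    """Sketch of a plane-of-motion cue for a skeleton sequence.

    joints: array of shape (T, J, 3) holding 3D positions of J joints
    over T frames. For each joint, two consecutive displacement vectors
    span a local motion plane; its unit normal is their cross product.
    Returns an array of shape (T-2, J, 3).
    """
    motion = np.diff(joints, axis=0)            # (T-1, J, 3) displacement vectors
    normals = np.cross(motion[:-1], motion[1:]) # (T-2, J, 3) plane normals
    norms = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.maximum(norms, eps)     # normalize, guarding degenerate cases

def sparse_sample(num_frames, num_segments, rng=None):
    """Sketch of temporal sparse sampling (assumed TSN-like).

    Splits [0, num_frames) into num_segments equal segments and draws one
    frame index per segment, covering the whole sequence with few frames.
    Requires num_frames >= num_segments.
    """
    rng = rng or np.random.default_rng(0)
    bounds = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    return np.array([rng.integers(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])])
```

The normals feed the short-term geometric feature for each frame triple, while the sampled indices pick which frames the 3D CNN sees, so a long sequence is summarized by a fixed-size clip.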
Acknowledgement
The study is supported by the National Natural Science Foundation of China (61772508, U1713213), the Key Research and Development Program of Guangdong Province (grant number 2019B090915001), the CAS Key Technology Talent Program, and Shenzhen Technology Projects (JCYJ20170413152535587, JCYJ20180507182610734).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Zhao, Q., Sun, S., Ji, X., Wang, L., Cheng, J. (2019). View Invariant Human Action Recognition Using 3D Geometric Features. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science(), vol 11743. Springer, Cham. https://doi.org/10.1007/978-3-030-27538-9_48
Print ISBN: 978-3-030-27537-2
Online ISBN: 978-3-030-27538-9