View Invariant Human Action Recognition Using 3D Geometric Features

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11743)


Abstract

Action recognition based on 2D information faces intrinsic difficulties such as occlusion and viewpoint variation, and suffers especially under complicated changes of perspective. In this paper, we present a straightforward and efficient approach to 3D human action recognition based on skeleton sequences. A coarse geometric feature, termed planes of 3D joint motion vectors (PoJM3D), is extracted from the raw skeleton data to capture omnidirectional short-term motion cues. A customized 3D convolutional neural network learns a global long-term representation of spatial appearance and temporal motion through a scheme called dynamic temporal sparse sampling (DTSS). Extensive experiments on three public benchmark datasets (UTD-MVAD, UTD-MHAD, and CAS-YNU-MHAD) demonstrate the effectiveness of our method compared to the current state of the art in cross-view evaluation, and a significant improvement in cross-subject evaluation. The code for our proposed approach is released on GitHub.
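The abstract does not spell out how dynamic temporal sparse sampling works, but schemes of this kind typically divide a variable-length sequence into a fixed number of equal segments and draw one frame per segment, so that long sequences are covered end to end at a fixed input size. The sketch below illustrates that general idea for skeleton sequences; the function name, segment count, and the train-time random jitter versus eval-time centre pick are illustrative assumptions, not the authors' exact DTSS procedure.

```python
import numpy as np

def sparse_sample(num_frames, num_segments, training=True, rng=None):
    """Split a sequence into equal segments and pick one frame index per segment.

    Illustrative sketch of segment-based sparse temporal sampling (not the
    paper's exact DTSS): training draws a random frame within each segment
    (temporal jitter); evaluation takes the segment centre for determinism.
    """
    rng = rng or np.random.default_rng()
    # Segment boundaries spread evenly over [0, num_frames)
    edges = np.linspace(0, num_frames, num_segments + 1)
    starts, ends = edges[:-1], edges[1:]
    if training:
        idx = rng.uniform(starts, ends).astype(int)  # one random draw per segment
    else:
        idx = ((starts + ends) / 2).astype(int)      # deterministic segment centres
    return np.clip(idx, 0, num_frames - 1)

# Example: pick 8 frame indices from a 120-frame skeleton sequence
print(sparse_sample(120, 8, training=False))  # -> [  7  22  37  52  67  82  97 112]
```

Sampling a fixed number of segments regardless of sequence length is what lets a 3D CNN with a fixed temporal input size see the whole action rather than only a short clip.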



Acknowledgement

This study was supported by the National Natural Science Foundation of China (61772508, U1713213), the Key Research and Development Program of Guangdong Province (grant number 2019B090915001), the CAS Key Technology Talent Program, and Shenzhen Technology Projects (JCYJ20170413152535587, JCYJ20180507182610734).

Author information

Corresponding author

Correspondence to Jun Cheng.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhao, Q., Sun, S., Ji, X., Wang, L., Cheng, J. (2019). View Invariant Human Action Recognition Using 3D Geometric Features. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science, vol. 11743. Springer, Cham. https://doi.org/10.1007/978-3-030-27538-9_48

  • DOI: https://doi.org/10.1007/978-3-030-27538-9_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27537-2

  • Online ISBN: 978-3-030-27538-9

  • eBook Packages: Computer Science (R0)
