Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition

Applied Intelligence

Abstract

Action recognition methods based on spatial-temporal skeleton graphs have been widely applied. Earlier approaches generally model the spatial and temporal graphs separately, while more recent work captures the correlation between the temporal and spatial dimensions within a spatial-temporal graph. However, existing methods have several limitations: (1) the graphs are defined by the human body structure, which is not flexible enough; (2) the extraction of non-local neighborhood features is not powerful enough; (3) attention modules are limited to a single scale; and (4) the fusion of multiple data streams is not sufficiently effective. This work proposes a multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition that addresses these issues. The method uses an adaptive topology graph with an adaptive connection coefficient to optimize the graph topology during training according to the input data. An optimal high-order adjacency matrix is constructed to balance the weight bias, so that non-local neighborhood features are captured precisely. Moreover, a multi-scale attention mechanism aggregates information from multiple ranges, which lets the graph convolution focus on the most informative nodes, frames, and channels. To further improve performance, a multi-stream framework is proposed to aggregate the high-order information of the skeleton. Experimental results on NTU-RGBD and Kinetics-Skeleton show that the proposed method achieves better results than existing methods.
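The abstract does not give the formulas, so the following is only a minimal NumPy sketch of two ideas it names: a high-order adjacency matrix that avoids over-weighting nearby joints, and a fixed body graph adjusted by a trainable graph with a connection coefficient. It assumes one common construction, in which the k-th matrix marks joint pairs whose shortest-path distance is exactly k, and an additive form `a_fixed + alpha * b_learned` for the adaptive topology. All function names, the additive form, the row normalization, and the toy five-joint chain are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def k_hop_adjacency(adj, k):
    """Return a 0/1 matrix marking joint pairs whose shortest-path distance is exactly k."""
    n = adj.shape[0]
    if k == 0:
        return np.eye(n)
    a_hat = adj + np.eye(n)  # self-loops make "reachable within k steps" cumulative
    within_k = np.linalg.matrix_power(a_hat, k) > 0
    within_prev = np.linalg.matrix_power(a_hat, k - 1) > 0
    return (within_k & ~within_prev).astype(float)


def row_normalize(adj):
    """Divide each row by its sum so every neighborhood averages its members."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)


def adaptive_adjacency(a_fixed, b_learned, alpha):
    """Fixed graph plus a trainable graph scaled by a coefficient (assumed additive form)."""
    return a_fixed + alpha * b_learned


def multi_hop_graph_conv(x, adj, weights, b_learned, alpha):
    """x: (joints, in_channels); weights: one (in_channels, out_channels) matrix per hop."""
    out = 0.0
    for k, w in enumerate(weights):
        a_k = row_normalize(k_hop_adjacency(adj, k))
        out = out + adaptive_adjacency(a_k, b_learned, alpha) @ x @ w
    return out


# Toy skeleton (hypothetical): a chain of five joints 0-1-2-3-4.
chain = np.zeros((5, 5))
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))                             # 3 input channels per joint
weights = [rng.standard_normal((3, 4)) for _ in range(3)]   # hops 0, 1, 2
b_learned = np.zeros((5, 5))                                # stands in for a trained parameter
features = multi_hop_graph_conv(x, chain, weights, b_learned, alpha=0.5)
print(features.shape)
```

Because each hop gets its own normalized matrix, distant joints are not drowned out by the many short walks that dominate plain powers of the adjacency matrix. In a trainable model, `b_learned` and `alpha` would be parameters updated by backpropagation rather than constants.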

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

This study was supported by the Key-Area Research and Development Program of Guangdong Province (2018B010109001, 2020B1111010002, 2019B020214001).

Author information

Corresponding authors

Correspondence to Lianfang Tian or Du Qiliang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, L., Tian, L., Du, Q. et al. Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition. Appl Intell 53, 14838–14854 (2023). https://doi.org/10.1007/s10489-022-04179-8

  • DOI: https://doi.org/10.1007/s10489-022-04179-8
