Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition

Lubin Yu^1,2,
Lianfang Tian^1,3,
Du Qiliang^1,4,5 &
…
Jameel Ahmed Bhutto^6

Abstract

Action recognition methods based on spatial-temporal skeleton graphs have been applied extensively. Previous approaches generally model the spatial and temporal graphs separately, while more recent work captures the correlation between the temporal and spatial dimensions in a spatial-temporal graph. However, existing methods have several issues: (1) the existing graphs are defined by the human body structure, which is not flexible enough; (2) the approach to extracting non-local neighborhood features is insufficiently powerful; (3) attention modules are limited to a single scale; and (4) the fusion of multiple data streams is not sufficiently effective. This work proposes a novel multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition that addresses these issues. The method uses an adaptive topology graph with an adaptive connection coefficient to optimize the graph topology during training according to the input data. An optimal high-order adjacency matrix is constructed to balance the weight bias, so that non-local neighborhood features are captured precisely. Moreover, a multi-scale attention mechanism aggregates information from multiple ranges, which makes the graph convolution focus on the more informative nodes, frames, and channels. To further improve performance, a novel multi-stream framework is proposed to aggregate the high-order information of the skeleton. Experimental results on the NTU-RGBD and Kinetics-Skeleton datasets show that the proposed method achieves better results than existing methods.
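
The abstract does not say how the high-order adjacency matrix is built, so the following is only a minimal sketch of one common idea from the skeleton-graph literature, not the authors' construction. It assumes that "weight bias" refers to the way powers of the adjacency matrix weight nearby joints more heavily than distant ones, and it contrasts those powers with a k-hop matrix that marks only joints at exactly k hops. The five-joint chain and all function names are hypothetical and chosen for illustration.

```python
from collections import deque


def hop_distances(num_nodes, edges):
    """All-pairs shortest-path hop counts, computed by BFS from every node."""
    neighbors = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def k_hop_adjacency(dist, k):
    """Binary matrix whose entry (i, j) is 1 only if j is exactly k hops from i."""
    n = len(dist)
    return [[1 if dist[i][j] == k else 0 for j in range(n)] for i in range(n)]


def row_normalize(matrix):
    """Divide each row by its sum so every node averages over its neighbors."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def matmul(a, b):
    """Plain square matrix product for small illustrative graphs."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(num_nodes, edges, k):
    """(A + I)^k: counts length-k walks with self-loops, biased to close nodes."""
    a = [[1 if i == j else 0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    power = [[1 if i == j else 0 for j in range(num_nodes)]
             for i in range(num_nodes)]
    for _ in range(k):
        power = matmul(power, a)
    return power


# Hypothetical 5-joint chain 0-1-2-3-4 standing in for a limb of the skeleton.
CHAIN_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
DIST = hop_distances(5, CHAIN_EDGES)
TWO_HOP = k_hop_adjacency(DIST, 2)
TWO_WALKS = walk_counts(5, CHAIN_EDGES, 2)
```

For joint 0 of this chain, the second power of the self-looped adjacency matrix gives weights of 2, 2 and 1 to joints 0, 1 and 2, so the closest joints dominate. The exact 2-hop matrix selects only joint 2, which is the balancing effect assumed in this sketch.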

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

This study was supported by the Key-Area Research and Development Program of Guangdong Province (2018B010109001, 2020B1111010002, 2019B020214001).

Author information

Authors and Affiliations

School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Lubin Yu, Lianfang Tian & Du Qiliang
The Fifth Electronics Research Institute of Ministry of Industry and Information Technology, Guangzhou, China
Lubin Yu
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China
Lianfang Tian
Sino-Singapore International Joint Research Institute, Guangzhou, China
Du Qiliang
Key Laboratory of Autonomous Systems and Network Control of Ministry of Education, Guangzhou, China
Du Qiliang
School of Computer, Huanggang Normal University, Huanggang, China
Jameel Ahmed Bhutto

Corresponding authors

Correspondence to Lianfang Tian or Du Qiliang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, L., Tian, L., Du, Q. et al. Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition. Appl Intell 53, 14838–14854 (2023). https://doi.org/10.1007/s10489-022-04179-8

Accepted: 13 September 2022
Published: 04 November 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10489-022-04179-8
