Abstract
Action recognition methods based on spatial-temporal skeleton graphs have been widely applied. Previous approaches generally model the spatial and temporal graphs separately. Recently, many researchers have sought to capture the correlations between the temporal and spatial dimensions of spatial-temporal graphs. However, existing methods have several issues: (1) the graphs in existing models are defined by the human body structure, which is not flexible enough; (2) the approaches for extracting non-local neighborhood features are insufficiently powerful; (3) attention modules are limited to a single scale; and (4) the fusion of multiple data streams is not sufficiently effective. This work proposes a novel multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition that addresses these issues. The method uses an adaptive topology graph with an adaptive connection coefficient to optimize the graph topology during training according to the input data. We construct an optimal high-order adjacency matrix that balances the weight bias and thereby captures non-local neighborhood features precisely. Moreover, we design a multi-scale attention mechanism that aggregates information from multiple ranges, making the graph convolution focus on the more informative nodes, frames, and channels. To further improve performance, a novel multi-stream framework is proposed to aggregate the high-order information of the skeleton. Experimental results on the NTU-RGBD and Kinetics-Skeleton datasets show that the proposed method outperforms existing methods.
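The abstract states that a high-order adjacency matrix is built to "balance the weight bias" when capturing non-local neighbors, but it does not give the construction. As a hedged illustration only, the sketch below uses the disentangled k-hop formulation known from MS-G3D (Liu et al., 2020), which may differ from the authors' "optimal" matrix. In that formulation the k-th adjacency links only joints exactly k hops apart (plus self-loops). Plain powers of A + I, by contrast, count walks and so weight near joints more heavily than distant ones. All function names are hypothetical, and the 4-joint chain is a toy stand-in for a real skeleton.

```python
from collections import deque
from math import sqrt

# Hypothetical helpers for illustration; not the paper's implementation.

def hop_distances(num_nodes, edges):
    """All-pairs shortest-path hop counts on an undirected graph, via BFS from each node."""
    neighbors = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    inf = float("inf")
    dist = [[inf] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] == inf:  # first visit gives the shortest hop count
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

def k_hop_adjacency(dist, k):
    """Binary adjacency linking nodes exactly k hops apart, plus self-loops (assumed MS-G3D-style)."""
    n = len(dist)
    return [[1.0 if (i == j or dist[i][j] == k) else 0.0 for j in range(n)] for i in range(n)]

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops keep every degree >= 1."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3)]  # toy chain 0-1-2-3
    dist = hop_distances(4, edges)
    a_plus_i = k_hop_adjacency(dist, 1)  # k = 1 reduces to A + I
    print("(A+I)^2 row 0:", matmul(a_plus_i, a_plus_i)[0])
    print("2-hop row 0:  ", k_hop_adjacency(dist, 2)[0])
    # One propagation step over 2-hop neighbors of a 1-D feature per joint.
    features = [[1.0], [2.0], [3.0], [4.0]]
    print("aggregated:   ", matmul(normalize(k_hop_adjacency(dist, 2)), features))
```

On the chain, row 0 of (A + I)^2 is [2, 2, 1, 0]: joint 0 and its direct neighbor each get twice the weight of the 2-hop joint. The disentangled 2-hop row [1, 0, 1, 0] weights joint 0 and its 2-hop neighbor equally and drops the 1-hop neighbor, which is handled at scale k = 1 instead.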
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
This study was supported by the Key-Area Research and Development Program of Guangdong Province (2018B010109001, 2020B1111010002, 2019B020214001).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, L., Tian, L., Du, Q. et al. Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition. Appl Intell 53, 14838–14854 (2023). https://doi.org/10.1007/s10489-022-04179-8
DOI: https://doi.org/10.1007/s10489-022-04179-8