Abstract
Graph Convolutional Networks (GCNs) have become the standard paradigm in skeleton-based human action recognition research. As a core component of a GCN, the construction of the graph topology often has a significant impact on classification accuracy. Because a fixed physical graph topology cannot capture the non-physical connections of the human body, existing methods build dynamic graph structures to capture more flexible node relationships. This paper proposes a novel attentional weighting strategy-based dynamic GCN (AWD-GCN). We construct a new dynamic adjacency matrix that uses an attention weighting mechanism to simultaneously capture the dynamic relationships among the three partitions of the human skeleton across multiple actions, so as to fully extract discriminative action features. In addition, considering the importance of skeletal node position features for distinguishing actions, we propose new multi-scale position attention and multi-level attention mechanisms. We use multi-scale modelling to capture the complex relationships among skeletal node position features, which helps distinguish human actions at different spatial scales. Extensive experiments on two challenging datasets, NTU-RGB+D and Skeleton-Kinetics, demonstrate the effectiveness and superiority of our method.
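The general idea behind an attention-weighted dynamic adjacency matrix can be sketched as follows. This is a minimal NumPy illustration of the mechanism the abstract describes (learnable attention weights over skeleton partitions, combined with a data-dependent adjacency term, as in adaptive GCNs such as 2s-AGCN), not the authors' implementation; all names (`awd_gcn_layer`, `alpha`, `theta`, `phi`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_adjacency(X, theta, phi):
    """Data-dependent adjacency: pairwise similarity of projected node
    features, normalized row-wise with softmax. Shape: (N, N)."""
    return softmax((X @ theta) @ (X @ phi).T, axis=-1)

def awd_gcn_layer(X, A_parts, W_parts, alpha, theta, phi):
    """One graph-convolution layer over K skeleton partitions
    (e.g. root / centripetal / centrifugal). softmax(alpha) supplies
    the attention weight of each partition's adjacency branch."""
    part_w = softmax(alpha)                       # attention over partitions
    A_dyn = dynamic_adjacency(X, theta, phi)      # shared dynamic term
    out = np.zeros((X.shape[0], W_parts[0].shape[1]))
    for k, (A, W) in enumerate(zip(A_parts, W_parts)):
        deg = A.sum(axis=1, keepdims=True)        # row-normalize: D^-1 A
        deg[deg == 0] = 1.0
        A_hat = A / deg + A_dyn                   # static partition + dynamic
        out += part_w[k] * (A_hat @ X @ W)
    return out

# Toy example: 5 joints, 3-dim features, 4 output channels, 3 partitions.
N, C, C_out, K = 5, 3, 4, 3
X = rng.standard_normal((N, C))
A_parts = [np.eye(N) for _ in range(K)]           # placeholder adjacencies
W_parts = [rng.standard_normal((C, C_out)) for _ in range(K)]
alpha = np.zeros(K)                               # uniform attention at init
theta = rng.standard_normal((C, 2))
phi = rng.standard_normal((C, 2))
Y = awd_gcn_layer(X, A_parts, W_parts, alpha, theta, phi)
print(Y.shape)  # (5, 4)
```

In a trained network `alpha`, `theta`, and `phi` would be learned parameters, so the per-partition weighting and the dynamic adjacency both adapt to the action being recognized.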
Data availability
The data and code used to support the findings of this study are available from the corresponding author upon request (001600@nuist.edu.cn).
Acknowledgements
Research in this article is supported by the National Natural Science Foundation of China (No. 42075130). The financial support of Jiangsu Austin Optronics Technology Co., Ltd. is deeply appreciated. We also thank the reviewers and editors for their valuable comments on this article.
Author information
Contributions
All authors drafted, read, and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, K., Jin, J., Shen, C. et al. Attentional weighting strategy-based dynamic GCN for skeleton-based action recognition. Multimedia Systems 29, 1941–1954 (2023). https://doi.org/10.1007/s00530-023-01082-1