Abstract
Graph Convolutional Networks (GCNs) have become the standard paradigm in skeleton-based human action recognition research. As a core component of a GCN, the construction of the graph topology often has a significant impact on classification accuracy. Because a fixed physical graph topology cannot capture the non-physical connections of the human body, existing methods build dynamic graph structures to capture more flexible node relationships. This paper proposes a novel attentional weighting strategy-based dynamic GCN (AWD-GCN). We construct a new dynamic adjacency matrix that uses an attention weighting mechanism to simultaneously capture the dynamic relationships among the three partitions of the human skeleton across multiple actions, so as to fully extract discriminative action features. In addition, considering the importance of skeletal node position features for distinguishing actions, we propose new multi-scale position attention and multi-level attention mechanisms. We use multi-scale modelling to capture the complex relationships among skeletal node position features, which helps distinguish human actions at different spatial scales. Extensive experiments on two challenging datasets, NTU-RGB+D and Skeleton-Kinetics, demonstrate the effectiveness and superiority of our method.
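The general idea behind an attention-weighted dynamic adjacency matrix can be sketched as follows. This is a minimal NumPy illustration of the mechanism the abstract describes (learnable attention weights over skeleton partitions, combined with a data-dependent adjacency term, as in adaptive GCNs such as 2s-AGCN), not the authors' implementation; all names (`awd_gcn_layer`, `alpha`, `theta`, `phi`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_adjacency(X, theta, phi):
    """Data-dependent adjacency: pairwise similarity of projected node
    features, normalized row-wise with softmax. Shape: (N, N)."""
    return softmax((X @ theta) @ (X @ phi).T, axis=-1)

def awd_gcn_layer(X, A_parts, W_parts, alpha, theta, phi):
    """One graph-convolution layer over K skeleton partitions
    (e.g. root / centripetal / centrifugal). softmax(alpha) supplies
    the attention weight of each partition's adjacency branch."""
    part_w = softmax(alpha)                       # attention over partitions
    A_dyn = dynamic_adjacency(X, theta, phi)      # shared dynamic term
    out = np.zeros((X.shape[0], W_parts[0].shape[1]))
    for k, (A, W) in enumerate(zip(A_parts, W_parts)):
        deg = A.sum(axis=1, keepdims=True)        # row-normalize: D^-1 A
        deg[deg == 0] = 1.0
        A_hat = A / deg + A_dyn                   # static partition + dynamic
        out += part_w[k] * (A_hat @ X @ W)
    return out

# Toy example: 5 joints, 3-dim features, 4 output channels, 3 partitions.
N, C, C_out, K = 5, 3, 4, 3
X = rng.standard_normal((N, C))
A_parts = [np.eye(N) for _ in range(K)]           # placeholder adjacencies
W_parts = [rng.standard_normal((C, C_out)) for _ in range(K)]
alpha = np.zeros(K)                               # uniform attention at init
theta = rng.standard_normal((C, 2))
phi = rng.standard_normal((C, 2))
Y = awd_gcn_layer(X, A_parts, W_parts, alpha, theta, phi)
print(Y.shape)  # (5, 4)
```

In a trained network `alpha`, `theta`, and `phi` would be learned parameters, so the per-partition weighting and the dynamic adjacency both adapt to the action being recognized.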
Data availability
The data and code used to support the findings of this study are available from the corresponding author upon request (001600@nuist.edu.cn).
Acknowledgements
Research in this article is supported by the National Natural Science Foundation of China (No. 42075130). The financial support of Jiangsu Austin Optronics Technology Co., Ltd. is deeply appreciated. We also thank the reviewers and editors for their valuable comments on this article.
Author information
Contributions
All authors drafted, read, and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, K., Jin, J., Shen, C. et al. Attentional weighting strategy-based dynamic GCN for skeleton-based action recognition. Multimedia Systems 29, 1941–1954 (2023). https://doi.org/10.1007/s00530-023-01082-1