Multi-Perspective Representation to Part-Based Graph for Group Activity Recognition
Figure 1. Illustration of the importance of local parts. (a) “Walking” is strongly related to the legs. (b) Players on the same team have a similar holistic appearance although they perform different actions, while players on different teams look holistically different but locally similar when they perform similar actions. The group activities in (c,d) are queuing and talking, respectively; the persons in the two pictures look similar in terms of most body parts, and the interactions of their heads and hands are the critical cues.
Figure 2. Overview of the proposed framework for group activity recognition. It consists of two branches that model static and dynamic representations. In each branch, RoIAlign extracts the feature maps of actors, which are then decomposed into sets of parts. Both branches incorporate an intra-actor part graph module and an inter-actor part graph module to explore intra-actor and inter-actor interactions among the part-level features. Finally, the part features are embedded into feature vectors and fed into classifiers for individual action and group activity recognition; the prediction scores of the two branches are late-fused to obtain the final results.
Figure 3. Illustration of the intra-actor part graph module. The blue and green cubes denote the original feature maps extracted by the backbone and the refined feature maps, respectively. Each feature map can be treated as a set of part features. A graph is built for each individual, with nodes in different colors representing different parts, and a GCN passes messages among the parts to refine the original feature maps. To reduce the number of parameters, the weights of the intra-actor part graph module are shared across all individuals.
Figure 4. The difference in graph construction between (a) a conventional graph network and (b) our proposed inter-actor part graph. Different colors represent different individuals in the scene.
Figure 5. Visualization of the part-level inter-actor interaction relations captured by the visual relation graph on the Volleyball Dataset. (a) Input frame with ground-truth bounding boxes; each actor is assigned a number in [0, 11]. (b) The N × N dependency matrix, where N denotes the number of actors in the video; values decrease as the color changes from blue to white. (c) Examples of part-level weight vectors.
Figure 6. Confusion matrix for the I3D + Inception-v3 model on the Volleyball Dataset. “l-” and “r-” abbreviate “Left” and “Right” in the group activity labels.
Figure 7. Confusion matrix for the I3D + Inception-v3 model on the Collective Activity Dataset.
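To make the extraction stage described in the Figure 2 caption concrete, the following is a minimal PyTorch sketch of part feature extraction: RoIAlign crops each actor from the backbone feature map, and the resulting K × K grid is flattened into K² part nodes. The function name, tensor shapes, and the choice K = 3 (9 parts, the best-performing setting in the ablation tables below) are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of RoIAlign-based part decomposition; shapes are assumptions.
import torch
from torchvision.ops import roi_align

def extract_part_features(feature_map, boxes, K=3):
    """Crop per-actor features and split each crop into K*K part features.

    feature_map: (B, D, H, W) backbone output for B frames.
    boxes: list of B tensors, each (N_b, 4) with actor boxes given as
           (x1, y1, x2, y2) in feature-map coordinates.
    Returns: (total_actors, K*K, D) -- one D-dim feature per part.
    """
    # RoIAlign resamples every actor box to a K x K grid; each grid cell
    # plays the role of one body "part" of that actor.
    rois = roi_align(feature_map, boxes, output_size=(K, K))  # (BN, D, K, K)
    BN, D = rois.shape[:2]
    # Flatten the K x K grid into K*K part nodes.
    return rois.view(BN, D, K * K).permute(0, 2, 1)           # (BN, K*K, D)

# Toy usage: one frame, 12 actors (as in the Volleyball Dataset).
fmap = torch.randn(1, 1024, 45, 80)
boxes = [torch.tensor([[6.0 * i, 5.0, 6.0 * i + 8.0, 35.0] for i in range(12)])]
parts = extract_part_features(fmap, boxes)
print(parts.shape)  # torch.Size([12, 9, 1024])
```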
Abstract
1. Introduction
- We propose a part-level relation modeling method that explores fine-grained interactive representations for intra-actor and inter-actor parts.
- An intra-actor part graph module explores the structural information of each individual, yielding discriminative individual representations from the interactions between different parts of a single person. An inter-actor part graph module explores latent part-level context by capturing the visual and location relations between the same part features of different actors (a minimal sketch of both modules follows this list).
- The experimental results on two publicly available datasets, the Volleyball and Collective Activity datasets, demonstrate that the proposed method achieves state-of-the-art performance.
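As referenced above, here is a minimal sketch of the two graph modules under two simplifying assumptions: both graphs use an embedded dot-product adjacency (in the spirit of the visual relation graph visualized in Figure 5) followed by a single GCN layer in the style of Kipf and Welling [27], and the location relation term of the inter-actor module is omitted. Class names and shapes are illustrative, not the authors' exact design.

```python
# Hedged sketch of the intra-/inter-actor part graph modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartGraph(nn.Module):
    """One GCN step over a batch of small dense graphs given as (G, M, D):
    G graphs, M nodes each, D-dim node features."""
    def __init__(self, dim):
        super().__init__()
        self.theta = nn.Linear(dim, dim)  # relation embedding (query side)
        self.phi = nn.Linear(dim, dim)    # relation embedding (key side)
        self.gcn = nn.Linear(dim, dim)    # GCN weight, shared over all graphs

    def forward(self, x):                 # x: (G, M, D)
        # Dense adjacency from embedded dot-product similarity.
        A = torch.softmax(
            self.theta(x) @ self.phi(x).transpose(1, 2) / x.size(-1) ** 0.5,
            dim=-1)                          # (G, M, M)
        return x + F.relu(self.gcn(A @ x))   # message passing + residual

class IntraActorPartGraph(PartGraph):
    """One graph per actor over its P parts; weights shared across actors."""
    def forward(self, x):                    # x: (N, P, D)
        return super().forward(x)

class InterActorPartGraph(PartGraph):
    """One graph per part index over all N actors (Figure 4b)."""
    def forward(self, x):                    # x: (N, P, D)
        out = super().forward(x.transpose(0, 1))  # per-part graphs: (P, N, D)
        return out.transpose(0, 1)                # back to (N, P, D)

# Toy usage: 12 actors, 9 parts, 256-dim part features.
x = torch.randn(12, 9, 256)
x = IntraActorPartGraph(256)(x)
x = InterActorPartGraph(256)(x)
print(x.shape)  # torch.Size([12, 9, 256])
```

The only difference between the two modules is the grouping axis: the intra-actor graph connects the P parts of one actor, while the inter-actor graph connects the same part index across all N actors, mirroring the construction contrasted in Figure 4.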
2. Related Work
2.1. Group Activity Recognition
2.2. CNN for Action Recognition
3. Methodology
3.1. Overview
3.2. Part Feature Extraction
3.3. Intra-Actor Part Graph Module
3.4. Inter-Actor Part Graph Module
3.5. Fusion of Multiple Branches
3.6. Training Objective
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Metrics
4.4. Ablation Study
4.5. Comparison to the State of the Art
4.6. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yan, R.; Xie, L.; Tang, J.; Shu, X.; Tian, Q. HiGCIN: Hierarchical Graph-based Cross Inference Network for Group Activity Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020. [Google Scholar] [CrossRef] [PubMed]
- Qi, M.; Wang, Y.; Qin, J.; Li, A.; Luo, J.; Van Gool, L. stagNet: An attentive semantic RNN for group activity and individual action recognition. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 549–565. [Google Scholar] [CrossRef]
- Wu, L.F.; Wang, Q.; Jian, M.; Qiao, Y.; Zhao, B.X. A comprehensive review of group activity recognition in videos. Int. J. Autom. Comput. 2021, 18, 334–350. [Google Scholar] [CrossRef]
- Amer, M.R.; Xie, D.; Zhao, M.; Todorovic, S.; Zhu, S.C. Cost-sensitive top-down/bottom-up inference for multiscale activity recognition. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 187–200. [Google Scholar]
- Amer, M.R.; Lei, P.; Todorovic, S. HiRF: Hierarchical random field for collective activity recognition in videos. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 572–585. [Google Scholar]
- Lan, T.; Wang, Y.; Yang, W.; Robinovitch, S.N.; Mori, G. Discriminative latent models for recognizing contextual group activities. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1549–1562. [Google Scholar] [CrossRef] [Green Version]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
- Wang, L.; Xiong, Y.; Wang, Z.; Qiao, Y.; Lin, D.; Tang, X.; Van Gool, L. Temporal segment networks: Towards good practices for deep action recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 20–36. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231. [Google Scholar] [CrossRef] [Green Version]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703. [Google Scholar]
- Ibrahim, M.S.; Muralidharan, S.; Deng, Z.; Vahdat, A.; Mori, G. A hierarchical deep temporal model for group activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1971–1980. [Google Scholar]
- Yan, R.; Tang, J.; Shu, X.; Li, Z.; Tian, Q. Participation-contributed temporal dynamic model for group activity recognition. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea, 22–26 October 2018; pp. 1292–1300. [Google Scholar]
- Gavrilyuk, K.; Sanford, R.; Javan, M.; Snoek, C.G. Actor-transformers for group activity recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 839–848. [Google Scholar]
- Wu, J.; Wang, L.; Wang, L.; Guo, J.; Wu, G. Learning actor relation graphs for group activity recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9964–9974. [Google Scholar]
- Tang, Y.; Wei, Y.; Yu, X.; Lu, J.; Zhou, J. Graph Interaction Networks for Relation Transfer in Human Activity Videos. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 2872–2886. [Google Scholar] [CrossRef]
- Li, S.; Cao, Q.; Liu, L.; Yang, K.; Liu, S.; Hou, J.; Yi, S. GroupFormer: Group activity recognition with clustered spatial-temporal transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13668–13677. [Google Scholar]
- Choi, W.; Savarese, S. Understanding collective activities of people from videos. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 1242–1257. [Google Scholar] [CrossRef]
- Lan, T.; Sigal, L.; Mori, G. Social roles in hierarchical models for human activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 1354–1361. [Google Scholar]
- Deng, Z.; Vahdat, A.; Hu, H.; Mori, G. Structure inference machines: Recurrent neural networks for analyzing relations in group activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4772–4781. [Google Scholar]
- Wang, M.; Ni, B.; Yang, X. Recurrent modeling of interaction context for collective activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3048–3056. [Google Scholar]
- Shu, X.; Zhang, L.; Sun, Y.; Tang, J. Host-parasite: Graph LSTM-in-LSTM for group activity recognition. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 663–674. [Google Scholar] [CrossRef]
- Wu, L.; Yang, Z.; He, J.; Jian, M.; Xu, Y.; Xu, D.; Chen, C.W. Ontology-based global and collective motion patterns for event classification in basketball videos. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 2178–2190. [Google Scholar] [CrossRef] [Green Version]
- Bagautdinov, T.; Alahi, A.; Fleuret, F.; Fua, P.; Savarese, S. Social scene understanding: End-to-end multi-person action localization and collective activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4315–4324. [Google Scholar]
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Lu, L.; Lu, Y.; Yu, R.; Di, H.; Zhang, L.; Wang, S. GAIM: Graph attention interaction model for collective activity recognition. IEEE Trans. Multimed. 2019, 22, 524–539. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645. [Google Scholar]
- Jiang, H.; Learned-Miller, E. Face detection with the faster R-CNN. In Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 650–657. [Google Scholar]
- Palaz, D.; Collobert, R. Analysis of CNN-Based Speech Recognition System Using Raw Speech as Input; Technical Report; Idiap: Martigny, Switzerland, 2015. [Google Scholar]
- Kowsher, M.; Alam, M.A.; Uddin, M.J.; Ahmed, F.; Ullah, M.W.; Islam, M.R. Detecting Third Umpire Decisions & Automated Scoring System of Cricket. In Proceedings of the International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 11–12 July 2019; pp. 1–8. [Google Scholar]
- Ciaburro, G.; Iannace, G. Improving Smart Cities Safety Using Sound Events Detection Based on Deep Neural Network Algorithms. Informatics 2020, 7, 23. [Google Scholar] [CrossRef]
- Qiu, Z.; Yao, T.; Mei, T. Learning spatio-temporal representation with pseudo-3D residual networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5533–5541. [Google Scholar]
- Song, X.; Lan, C.; Zeng, W.; Xing, J.; Sun, X.; Yang, J. Temporal–spatial mapping for action recognition. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 748–759. [Google Scholar] [CrossRef]
- Kong, L.; Huang, D.; Qin, J.; Wang, Y. A joint framework for athlete tracking and action recognition in sports videos. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 532–548. [Google Scholar] [CrossRef]
- Zhang, J.; Hu, H.; Liu, Z. Appearance-and-Dynamic Learning With Bifurcated Convolution Neural Network for Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 1593–1606. [Google Scholar] [CrossRef]
- Wang, J.; Hu, J.; Li, S.; Yuan, Z. Revisiting Hard Example for Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 546–556. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. arXiv 2014, arXiv:1406.2199. [Google Scholar]
- Sun, D.; Yang, X.; Liu, M.Y.; Kautz, J. Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8934–8943. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Pramono, R.R.A.; Chen, Y.T.; Fang, W.H. Empowering Relational Network by Self-attention Augmented Conditional Random Fields for Group Activity Recognition. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 71–90. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; Volume 30, p. 3. [Google Scholar]
- Tu, C.; Liu, H.; Liu, Z.; Sun, M. Cane: Context-aware network embedding for relation modeling. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1722–1731. [Google Scholar]
- Tian, H.; Xu, N.; Liu, A.A.; Zhang, Y. Part-Aware Interactive Learning for Scene Graph Generation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 3155–3163. [Google Scholar]
- Tang, Y.; Lu, J.; Wang, Z.; Yang, M.; Zhou, J. Learning semantics-preserving attention and contextual interaction for group activity recognition. IEEE Trans. Image Process. 2019, 28, 4997–5012. [Google Scholar] [CrossRef]
- Azar, S.M.; Atigh, M.G.; Nickabadi, A.; Alahi, A. Convolutional relational machine for group activity recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7892–7901. [Google Scholar]
- Carreira, J.; Zisserman, A. Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6299–6308. [Google Scholar]
- Choi, W.; Shahid, K.; Savarese, S. What are they doing?: Collective activity classification using spatio-temporal relationship among people. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, Kyoto, Japan, 27 September–4 October 2009; pp. 1282–1289. [Google Scholar]
- Zach, C.; Pock, T.; Bischof, H. A duality based approach for realtime TV-L1 optical flow. In Proceedings of the Joint Pattern Recognition Symposium, Heidelberg, Germany, 12–14 September 2007; pp. 214–223. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Fang, H.S.; Xie, S.; Tai, Y.W.; Lu, C. RMPE: Regional multi-person pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2334–2343. [Google Scholar]
- Yuan, H.; Ni, D. Learning Visual Context for Group Activity Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 3261–3269. [Google Scholar]
- Shu, T.; Todorovic, S.; Zhu, S.C. CERN: Confidence-energy recurrent network for group activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5523–5531. [Google Scholar]
- Hu, G.; Cui, B.; He, Y.; Yu, S. Progressive relation learning for group activity recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 980–989. [Google Scholar]
- Yuan, H.; Ni, D.; Wang, M. Spatio-temporal dynamic inference network for group activity recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 7476–7485. [Google Scholar]
- Kim, D.; Lee, J.; Cho, M.; Kwak, S. Detector-free weakly supervised group activity recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022. [Google Scholar]
- Tang, Y.; Wang, Z.; Li, P.; Lu, J.; Yang, M.; Zhou, J. Mining semantics-preserving attention for group activity recognition. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea, 22–26 October 2018; pp. 1283–1291. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
Ablation study. MCA = multi-class classification accuracy (%); VD = Volleyball Dataset; CAD = Collective Activity Dataset; -S/-D denote the static and dynamic branches; V/L denote the visual and location relation graphs.

| Models | MCA(VD)-S | MCA(VD)-D | MCA(CAD)-S | MCA(CAD)-D |
|---|---|---|---|---|
| Non-Part Graph | 90.8 | 91.3 | 86.0 | 87.9 |
| w/o Inter-Actor Part Graph (V+L) | 91.2 | 92.1 | 86.9 | 88.5 |
| w/o Inter-Actor Part Graph (V) | 91.4 | 92.7 | 87.1 | 90.7 |
| w/o Inter-Actor Part Graph (L) | 91.4 | 92.4 | 87.3 | 89.0 |
| w/o Intra-Actor Part Graph | 91.7 | 92.8 | 87.9 | 89.6 |
| Ours (RGB) | 92.3 | 93.3 | 88.9 | 91.6 |
Effect of the number of parts per actor.

| Number of Parts | MCA(VD) | MCA(CAD) |
|---|---|---|
| 4 | 92.7 | 89.3 |
| 9 | 93.3 | 91.6 |
| 16 | 92.9 | 91.0 |
| 25 | 93.2 | 90.3 |
Comparison with the state of the art. G-/I- prefixes denote group-activity and individual-action accuracy; MPCA = mean per-class accuracy (%).

| Method | Backbone | G-MCA(VD) | I-MCA(VD) | G-MCA(CAD) | G-MPCA(CAD) |
|---|---|---|---|---|---|
| HDTM (RGB) [14] | AlexNet | 81.9 | - | 81.5 | - |
| PCTDM (RGB+Flow) [15] | AlexNet | 87.7 | - | - | 92.2 |
| CERN (RGB) [57] | VGG16 | 83.3 | - | 87.2 | 88.3 |
| stagNet (RGB) [2] | VGG16 | 89.3 | - | 89.1 | 91.0 |
| PRL (RGB) [58] | VGG16 | 91.4 | - | - | 93.8 |
| HiGCIN (RGB) [1] | ResNet18 | 91.4 | - | 93.4 | 93.0 |
| SPTS (RGB+Flow) [61] | VGG | 90.7 | - | - | 95.7 |
| SSU (RGB) [26] | Inception-v3 | 90.6 | 81.8 | - | - |
| ARG (RGB) [17] | Inception-v3 | 92.5 | 83.0 | 91.0 | - |
| CRM (RGB+Flow) [50] | I3D | 93.0 | - | 85.8 | 94.2 |
| Actor-Transformer (RGB+Flow) [16] | I3D | 93.0 | 83.7 | 92.8 | 98.5 |
| Actor-Transformer (Pose+Flow) [16] | HRNet+I3D | 94.4 | 85.9 | 91.2 | - |
| ACRF (RGB+Flow) [45] | I3D+FPN+ResNet50 | 94.5 | 83.1 | 94.6 | - |
| STDIN (RGB) [59] | ResNet18 | 93.3 | - | - | 95.3 |
| STBiP (RGB+Pose) [56] | HRNet+VGG16 | 94.7 | - | - | 96.4 |
| GroupFormer (RGB+Flow) [19] | I3D | 94.9 | 84.0 | 94.7 | - |
| Detector-Free (RGB) [60] | ResNet18 | 90.5 | - | - | - |
| Ours (RGB+Flow) | I3D | 94.1 | 84.1 | 92.2 | 98.8 |
| Ours (RGB+Flow) | I3D+Inception-v3 | 94.8 | 85.0 | 93.2 | 98.8 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Wu, L.; Lang, X.; Xiang, Y.; Wang, Q.; Tian, M. Multi-Perspective Representation to Part-Based Graph for Group Activity Recognition. Sensors 2022, 22, 5521. https://doi.org/10.3390/s22155521