DOI: 10.1145/3316615.3316722 · ICSCA '19 Conference Proceedings · Research article

Cooperative Hierarchical Framework for Group Activity Recognition: From Group Detection to Multi-activity Recognition

Published: 19 February 2019

Abstract

Deep neural network algorithms have shown promising performance on many tasks in the computer vision field. Several neural network-based methods have been proposed to recognize group activities from video sequences. However, several challenges remain when multiple groups with different activities appear within a scene. The strong correlation that exists among individual motions, groups, and activities can be exploited to detect groups and recognize their concurrent activities. Motivated by these observations, we propose a unified deep learning framework for detecting multiple groups and recognizing their corresponding collective activities based on a Long Short-Term Memory (LSTM) network. In this framework, we use a pre-trained convolutional neural network (CNN) to extract features from the frames and the appearances of persons. An objective function is proposed to learn the amount of pairwise interaction between persons. The obtained individual features are passed to a clustering algorithm to detect groups in the scene. Then, an LSTM-based model is used to recognize group activities. In parallel, a scene-level CNN followed by an LSTM is used to extract and learn scene-level features. Finally, the activities from the group level and the scene context level are integrated to infer the collective activity. The proposed method is evaluated on the benchmark Collective Activity Dataset and compared with several baselines. The experimental results show its competitive performance on the collective activity recognition task.
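To make the described pipeline concrete, the following is a minimal, hypothetical sketch of its stages (pre-trained CNN features, clustering-based group detection, group-level and scene-level LSTMs, and late fusion). The choice of ResNet50 and DBSCAN, the hand-crafted pairwise distance, all layer sizes, and the fusion weights are illustrative assumptions, not the authors' exact architecture.

# Hypothetical sketch of the abstract's pipeline; names and hyper-parameters are assumptions.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import DBSCAN
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

NUM_ACTIVITIES = 5   # assumed number of collective activity classes
T = 10               # assumed number of frames per sequence
FEAT_DIM = 2048      # pooled ResNet50 feature size

# 1. Pre-trained CNN used as a frozen feature extractor for person crops and whole frames.
cnn = ResNet50(weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False

def extract_features(images):
    """images: float array of shape (N, 224, 224, 3), preprocessed person crops or frames."""
    return cnn.predict(images, verbose=0)                       # (N, FEAT_DIM)

# 2. Group detection: a hand-crafted stand-in for the learned pairwise interaction,
#    combining appearance similarity and spatial proximity, then clustered with DBSCAN.
def detect_groups(person_feats, positions, eps=0.4, min_samples=2):
    d_app = cdist(person_feats, person_feats, metric="cosine")  # appearance distance
    d_pos = cdist(positions, positions)                         # spatial distance
    d_pos = d_pos / (d_pos.max() + 1e-8)
    dist = 0.5 * d_app + 0.5 * d_pos
    return DBSCAN(eps=eps, min_samples=min_samples,
                  metric="precomputed").fit_predict(dist)       # group label per person

# 3. Sequence classifier: an LSTM over per-frame descriptors, used both for detected
#    groups (pooled person features) and for the scene-level CNN features.
def build_sequence_classifier():
    x = layers.Input(shape=(T, FEAT_DIM))
    h = layers.LSTM(256)(x)
    y = layers.Dense(NUM_ACTIVITIES, activation="softmax")(h)
    return Model(x, y)

group_model = build_sequence_classifier()   # recognizes each detected group's activity
scene_model = build_sequence_classifier()   # scene-level CNN features followed by an LSTM

# 4. Late fusion of group-level and scene-level predictions (simple weighted average here).
def infer_collective_activity(group_probs, scene_probs, w=0.5):
    """group_probs: (num_groups, NUM_ACTIVITIES); scene_probs: (NUM_ACTIVITIES,)."""
    fused = w * group_probs.mean(axis=0) + (1 - w) * scene_probs
    return int(np.argmax(fused))

In the paper, the pairwise interaction is learned with a dedicated objective function rather than the fixed distance used here, and the integration of group-level and scene-level activities may likewise be learned rather than averaged.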


Cited By

  • (2022) A Group Discovery Method Based on Collaborative Filtering and Knowledge Graph for IoT Scenarios. IEEE Transactions on Computational Social Systems, 9(1), 279-290. DOI: 10.1109/TCSS.2021.3050622. Online publication date: Feb 2022.


    Published In
    ICSCA '19: Proceedings of the 2019 8th International Conference on Software and Computer Applications
    February 2019
    611 pages
    ISBN:9781450365734
    DOI:10.1145/3316615
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    In-Cooperation

    • University of New Brunswick

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. CNN
    2. Clustering Algorithm
    3. Group Activity Recognition
    4. LSTM

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICSCA '19
