DOI: 10.1145/3460426.3463591
short-paper

Spatio-Temporal Activity Detection and Recognition in Untrimmed Surveillance Videos

Published: 01 September 2021

Abstract

This work presents a spatio-temporal activity detection and recognition framework for untrimmed surveillance videos. The framework is a three-step pipeline: object detection, tracking, and activity recognition. It relies on the YOLOv4 architecture for object detection and on Euclidean distance for tracking, while the activity recognizer uses a 3D convolutional deep learning architecture that employs spatio-temporal boundaries and treats recognition as a multi-label classification problem. Evaluation experiments on the VIRAT dataset show accurate detection of temporal boundaries and recognition of activities in untrimmed videos, with multi-label activity recognition performing better than multi-class recognition.
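
The abstract states only that tracking relies on Euclidean distance and gives no further detail. As a rough illustration of what such a step might look like, the sketch below links detections across frames by greedily matching bounding-box centroids to the nearest existing track. The class name `EuclideanTracker`, the `max_distance` threshold, and the greedy matching rule are all assumptions made for this example; they are not taken from the paper.

```python
import math


def centroid(box):
    """Return the center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


class EuclideanTracker:
    """Greedy nearest-centroid tracker (illustrative, not the paper's code)."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # assumed gating threshold, in pixels
        self.tracks = {}                  # track id -> last known centroid
        self.next_id = 0

    def update(self, boxes):
        """Return one track id per box for the current frame."""
        centers = [centroid(b) for b in boxes]
        # Every (distance, track id, detection index) pair, closest first.
        pairs = sorted(
            (math.dist(prev, cur), tid, i)
            for tid, prev in self.tracks.items()
            for i, cur in enumerate(centers)
        )
        assigned = {}        # detection index -> track id
        used_tracks = set()
        for dist, tid, i in pairs:
            if dist > self.max_distance:
                break        # remaining pairs are even farther apart
            if tid in used_tracks or i in assigned:
                continue
            assigned[i] = tid
            used_tracks.add(tid)
        # Detections with no nearby track start new tracks.
        for i in range(len(centers)):
            if i not in assigned:
                assigned[i] = self.next_id
                self.next_id += 1
        # Keep only tracks seen in this frame (no occlusion handling).
        self.tracks = {tid: centers[i] for i, tid in assigned.items()}
        return [assigned[i] for i in range(len(boxes))]
```

In a pipeline like the one described, the boxes passed to `update` would come from the object detector for each frame, and the per-track sequences of boxes would define the spatio-temporal regions handed to the activity recognizer. A production tracker would normally also tolerate missed detections for a few frames, which this sketch deliberately omits.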



    Published In

    ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
    August 2021
    715 pages
    ISBN: 9781450384636
    DOI: 10.1145/3460426
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D-convolutional neural networks
    2. activity detection
    3. activity recognition
    4. spatiotemporal boundaries detection

    Qualifiers

    • Short-paper

    Funding Sources

    • European Commission

    Conference

    ICMR '21

    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

    Cited By

    • (2024) Evaluating Performance and Trends in Interactive Video Retrieval: Insights From the 12th VBS Competition. IEEE Access 12, 79342-79366. DOI: 10.1109/ACCESS.2024.3405638. Online publication date: 2024
    • (2024) Actor-Centric Spatio-Temporal Feature Extraction for Action Recognition. Computer Vision and Image Processing, 586-599. DOI: 10.1007/978-3-031-58181-6_50. Online publication date: 3-Jul-2024
    • (2023) Analysis and comparison of various deep learning models to implement suspicious activity recognition in CCTV surveillance. Intelligent Decision Technologies 17:4, 917-942. DOI: 10.3233/IDT-230469. Online publication date: 20-Nov-2023
    • (2023) Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS. Multimedia Systems 29:6, 3481-3504. DOI: 10.1007/s00530-023-01143-5. Online publication date: 24-Aug-2023
    • (2022) Intelligence-Led Policing and the New Technologies Adopted by the Hellenic Police. Digital 2:2, 143-163. DOI: 10.3390/digital2020009. Online publication date: 29-Mar-2022
    • (2022) GabriellaV2: Towards better generalization in surveillance videos for Action Detection. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 122-132. DOI: 10.1109/WACVW54805.2022.00018. Online publication date: Jan-2022
