short-paper

Spatio-Temporal Activity Detection and Recognition in Untrimmed Surveillance Videos

Authors:

Konstantinos Gkountakos,

Despoina Touska,

Konstantinos Ioannidis,

Theodora Tsikrika,

Stefanos Vrochidis,

Ioannis Kompatsiaris

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

Pages 451 - 455

https://doi.org/10.1145/3460426.3463591

Published: 01 September 2021

Abstract

This work presents a spatio-temporal activity detection and recognition framework for untrimmed surveillance videos, built as a three-step pipeline: object detection, tracking, and activity recognition. The framework uses the YOLO v4 architecture for object detection and Euclidean distance for tracking, while the activity recognizer is a 3D convolutional deep learning architecture that employs spatio-temporal boundaries and treats recognition as multi-label classification. Evaluation experiments on the VIRAT dataset show accurate detection of temporal boundaries and recognition of activities in untrimmed videos, with multi-label activity recognition performing better than multi-class recognition.
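
The abstract names two ideas that a small example can make concrete: linking detections across frames by Euclidean distance, and the difference between multi-label and multi-class decisions. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class `EuclideanTracker`, the `max_distance` gate, the greedy matching order, the 0.5 threshold, and the activity names are all hypothetical choices made for illustration; the paper's abstract does not specify them.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def centroid(box: Box) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


class EuclideanTracker:
    """Greedy nearest-centroid tracker (illustrative assumption only)."""

    def __init__(self, max_distance: float = 50.0) -> None:
        # Hypothetical gate: detections farther than this start a new track.
        self.max_distance = max_distance
        self.tracks: Dict[int, Tuple[float, float]] = {}
        self._next_id = 0

    def update(self, boxes: List[Box]) -> List[int]:
        """Assign a track id to each detection box of the current frame."""
        assigned: List[int] = []
        used = set()  # track ids already matched in this frame
        for box in boxes:
            c = centroid(box)
            best_id, best_dist = None, self.max_distance
            for track_id, previous in self.tracks.items():
                if track_id in used:
                    continue
                d = math.dist(c, previous)  # Euclidean distance
                if d <= best_dist:
                    best_id, best_dist = track_id, d
            if best_id is None:
                # No existing track is close enough: open a new one.
                best_id = self._next_id
                self._next_id += 1
            self.tracks[best_id] = c
            used.add(best_id)
            assigned.append(best_id)
        # Stale tracks are never dropped here; a real system would expire them.
        return assigned


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def multi_label_predict(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: every activity whose independent score passes the threshold."""
    return [name for name, z in logits.items() if sigmoid(z) >= threshold]


def multi_class_predict(logits: Dict[str, float]) -> str:
    """Multi-class: exactly one activity, the one with the highest score."""
    return max(logits, key=logits.get)


if __name__ == "__main__":
    tracker = EuclideanTracker(max_distance=50.0)
    print(tracker.update([(0, 0, 10, 10), (100, 100, 110, 110)]))
    print(tracker.update([(102, 101, 112, 111), (3, 2, 13, 12)]))
    scores = {"walking": 2.0, "carrying": 0.3, "running": -1.5}  # made-up logits
    print(multi_label_predict(scores), multi_class_predict(scores))
```

In this sketch a tracked person can be assigned several concurrent activities under the multi-label rule, whereas the multi-class rule forces a single choice per clip, which is the distinction the abstract's comparison rests on.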

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Activity recognition and understanding

Information

Published In

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

August 2021

715 pages

ISBN:9781450384636

DOI:10.1145/3460426

General Chairs: Wen-Huang Cheng (National Yang Ming Chiao Tung University, Taiwan), Mohan Kankanhalli (National University of Singapore, Singapore), Meng Wang (Hefei University of Technology, China)

Program Chairs: Wei-Ta Chu (National Cheng Kung University, Taiwan), Jiaying Liu (Peking University, China), Marcel Worring (University of Amsterdam, Netherlands)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

European Commission

Conference

ICMR '21

Sponsor:

SIGMM

ICMR '21: International Conference on Multimedia Retrieval

August 21 - 24, 2021

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Article Metrics

Total Citations: 6

Total Downloads: 179

Downloads (Last 12 months): 16

Downloads (Last 6 weeks): 4

Reflects downloads up to 14 Nov 2024

Citations

Cited By

Vadicamo L, Arnold R, Bailer W, Carrara F, Gurrin C, Hezel N, Li X, Lokoc J, Lubos S, Ma Z, Messina N, Nguyen T, Peska L, Rossetto L, Sauter L, Schöffmann K, Spiess F, Tran M, Vrochidis S. (2024). Evaluating Performance and Trends in Interactive Video Retrieval: Insights From the 12th VBS Competition. IEEE Access 12, 79342-79366. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3405638

Anil K, Bouroche M, Schoen-Phelan B. (2024). Actor-Centric Spatio-Temporal Feature Extraction for Action Recognition. Computer Vision and Image Processing, 586-599. Online publication date: 3-Jul-2024.
https://doi.org/10.1007/978-3-031-58181-6_50

Saluja D, Kukreja H, Saini A, Tegwal D, Nagrath P, Hemanth J. (2023). Analysis and comparison of various deep learning models to implement suspicious activity recognition in CCTV surveillance. Intelligent Decision Technologies 17:4, 917-942. Online publication date: 20-Nov-2023.
https://dl.acm.org/doi/10.3233/IDT-230469

Lokoč J, Andreadis S, Bailer W, Duane A, Gurrin C, Ma Z, Messina N, Nguyen T, Peška L, Rossetto L, Sauter L, Schall K, Schoeffmann K, Khan O, Spiess F, Vadicamo L, Vrochidis S. (2023). Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS. Multimedia Systems 29:6, 3481-3504. Online publication date: 24-Aug-2023.
https://dl.acm.org/doi/10.1007/s00530-023-01143-5

Gkougkoudis G, Pissanidis D, Demertzis K. (2022). Intelligence-Led Policing and the New Technologies Adopted by the Hellenic Police. Digital 2:2, 143-163. Online publication date: 29-Mar-2022.
https://doi.org/10.3390/digital2020009

Dave I, Scheffer Z, Kumar A, Shiraz S, Rawat Y, Shah M. (2022). GabriellaV2: Towards better generalization in surveillance videos for Action Detection. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 122-132. Online publication date: Jan-2022.
https://doi.org/10.1109/WACVW54805.2022.00018
