short-paper

Violence Detection in Videos based on CNN feature for ConvLSTM2D

Authors:

Thanh-Sang Vu-Ngoc,

Lam-Thuy Le-Nhi,

Thai-Binh Nguyen,

The-Bao Pham

ICDAR '24: Proceedings of the 5th ACM Workshop on Intelligent Cross-Data Analysis and Retrieval

Pages 33 - 36

https://doi.org/10.1145/3643488.3660306

Published: 11 June 2024

Abstract

Violence is prevalent in most countries worldwide, so an effective system that can detect, alert on, and help prevent violence through video surveillance is an important goal. In this study, we develop an automated system that classifies video footage as violent or non-violent. Specifically, we introduce a method that combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN) and uses both image and motion features. The CNN is based on the VGG19 architecture, and the recurrent component uses Convolutional Long Short-Term Memory (ConvLSTM). The CNN extracts meaningful representations from the input frames; these features are then fed into the ConvLSTM, which learns contextual information across frames. Experimental results are promising: the approach reaches an accuracy of 97.96% on the Hockey dataset, 97.92% on the combined Hockey and Movies dataset, and 96.9% on the combined Hockey, Movies, and Violent Flow dataset.
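The CNN-then-ConvLSTM arrangement described in the abstract can be illustrated by tracing tensor shapes through such a pipeline. The sketch below is framework-free and computes shapes only. It is not the authors' implementation: the clip length (20 frames), the 224x224 frame size, the choice to take features from the end of the VGG19 convolutional base, the 64 ConvLSTM filters, and all function names are assumptions made for illustration. Only the VGG19 block structure (five blocks, each ending in a 2x2 max-pool) comes from the standard architecture.

```python
# Shape walk-through only: no weights, no training, no deep-learning framework.
# Clip length, frame size, and ConvLSTM filter count are illustrative assumptions.

# Output channels of the five convolutional blocks in standard VGG19.
VGG19_BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def vgg19_feature_shape(height, width):
    """Return (H, W, C) after the VGG19 convolutional base.

    Each block uses 3x3 'same' convolutions (spatial size unchanged)
    followed by a 2x2 max-pool with stride 2 (spatial size halved).
    """
    channels = 3  # RGB input
    for channels in VGG19_BLOCK_CHANNELS:
        height //= 2
        width //= 2
    return (height, width, channels)


def convlstm_state_shape(feature_shape, filters):
    """Return (H, W, filters): the final hidden state of a 'same'-padded ConvLSTM.

    The recurrence runs over frames, so the frame axis is consumed, while
    'same' padding keeps the spatial grid of the incoming feature maps.
    """
    height, width, _ = feature_shape
    return (height, width, filters)


def trace_pipeline(frames=20, height=224, width=224, lstm_filters=64):
    """Trace shapes from a raw clip to the two-class output (hypothetical sizes)."""
    per_frame = vgg19_feature_shape(height, width)
    state = convlstm_state_shape(per_frame, lstm_filters)
    return {
        "clip": (frames, height, width, 3),
        "cnn_features": (frames,) + per_frame,  # one feature map per frame
        "convlstm_state": state,                # summary of the whole clip
        "flattened": state[0] * state[1] * state[2],
        "classes": 2,                           # violent vs. non-violent
    }


if __name__ == "__main__":
    for name, shape in trace_pipeline().items():
        print(f"{name:>15}: {shape}")
```

Under these assumed sizes, each frame is reduced to a 7x7x512 feature map before the recurrent stage, so the ConvLSTM operates on a short sequence of small spatial grids rather than on raw frames.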

Index Terms

Violence Detection in Videos based on CNN feature for ConvLSTM2D
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection

Published In

ICDAR '24: Proceedings of the 5th ACM Workshop on Intelligent Cross-Data Analysis and Retrieval

June 2024

48 pages

ISBN:9798400705496

DOI:10.1145/3643488

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

This research is funded by the University of Economics Ho Chi Minh City (UEH), Vietnam.

Conference

ICMR '24

Sponsor:

SIGMM

ICMR '24: International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket, Thailand
