Abstract
With the rapid development of violence detection for surveillance cameras, demand for systems that automatically recognize violent events is growing. Violence detection has become an active research field in image processing and machine learning. Existing work falls into two categories: hand-crafted methods and deep learning methods. Although hand-crafted methods are effective, their computational cost may be prohibitive for practical applications. Deep learning techniques, in turn, usually rely on 3D Convolutional Networks (3D ConvNets) for this task. To improve the accuracy of these networks, meaningful regions and temporal changes in videos should be taken into account. The performance of a 3D ConvNet can therefore be improved by selecting significant temporal information and attending to specific regions in the two spatial dimensions. In this work, we propose a novel 3D ConvNet together with a technique for extracting interest frames. The Structural Similarity Index Measure (SSIM) is used to extract interest frames as significant temporal information; it compares the statistical features of two consecutive frames. The sixteen video frames with the smallest SSIM are treated as dominant motion frames and are passed to the 3D ConvNet for classification. In addition, a spatial attention module focuses the network on specific regions. The proposed method is evaluated on three benchmark datasets. The results show that, in terms of accuracy, our scheme outperforms existing approaches.
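The interest frame selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: SSIM is computed once over the whole grayscale frame rather than averaged over local windows (a library routine such as scikit-image's `structural_similarity` would do the latter), each consecutive pair's score is attributed to the later frame of the pair, and the selected frames are returned in temporal order. The names `global_ssim` and `select_interest_frames` are hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM between two grayscale frames, computed over the whole frame.

    Assumption: a single global window is used for brevity; the original
    SSIM formulation averages the same expression over local windows.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def select_interest_frames(frames, num_frames=16):
    """Indices of the frames that differ most from their predecessor.

    frames: array of shape (T, H, W) holding grayscale frames.
    Returns num_frames indices in temporal order (an assumption; the
    abstract does not say how the selected frames are ordered).
    """
    frames = np.asarray(frames)
    if len(frames) - 1 < num_frames:
        raise ValueError("need at least num_frames + 1 frames")
    # scores[j] is the SSIM between frame j and frame j + 1.
    scores = np.array(
        [global_ssim(frames[j], frames[j + 1]) for j in range(len(frames) - 1)]
    )
    # Lowest SSIM means the largest structural change; the pair's score is
    # attributed to its later frame (assumption), hence the + 1.
    lowest = np.argsort(scores, kind="stable")[:num_frames] + 1
    return np.sort(lowest)
```

Under these assumptions, a clip loaded as an array `clip` of shape `(T, H, W)` would yield its sixteen interest frames with `clip[select_interest_frames(clip)]`, and the resulting stack would serve as the temporal input to the 3D ConvNet.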
About this article
Cite this article
Mahmoodi, J., Nezamabadi-pour, H. & Abbasi-Moghadam, D. Violence detection in videos using interest frame extraction and 3D convolutional neural network. Multimed Tools Appl 81, 20945–20961 (2022). https://doi.org/10.1007/s11042-022-12532-9
DOI: https://doi.org/10.1007/s11042-022-12532-9