Abstract
Abnormal behavior detection, action recognition, and fight and violence detection in videos are areas that have attracted considerable interest in recent years. In this work, we propose an architecture that combines a Bidirectional Gated Recurrent Unit (BiGRU) and a 2D Convolutional Neural Network (CNN) to detect violence in video sequences. The CNN extracts spatial features from each frame, while the BiGRU extracts temporal and local motion features from the CNN features of multiple frames. The proposed end-to-end deep learning network is tested on three public datasets with varying scene complexity and achieves accuracies of up to 98%. These results are promising and demonstrate the effectiveness of the proposed end-to-end approach.
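The data flow described in the abstract (per-frame spatial features fed to a recurrent unit that reads the sequence in both directions) can be illustrated with a minimal, untrained sketch in plain Python. Everything specific here is an assumption made for illustration and not taken from the paper: `frame_features` is a hypothetical stand-in for the 2D CNN, the weights are random, biases are omitted, and concatenating the final forward and backward states before a sigmoid output is only one common way to combine the two directions.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """A GRU cell with update (z), reset (r) and candidate (n) gates; biases omitted."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrn"}   # input weights
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrn"}  # recurrent weights

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # Blend the previous state with the candidate state using the update gate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def run_gru(cell, seq):
    """Run the cell over a sequence and return the final hidden state."""
    h = [0.0] * cell.hid_dim
    for x in seq:
        h = cell.step(x, h)
    return h

def frame_features(frame):
    """Hypothetical stand-in for the 2D CNN: one mean intensity per pixel row."""
    return [sum(row) / len(row) for row in frame]

def violence_score(frames, fwd, bwd, w_out):
    feats = [frame_features(f) for f in frames]           # spatial step, frame by frame
    h = run_gru(fwd, feats) + run_gru(bwd, feats[::-1])   # temporal step, both directions
    return sigmoid(sum(w * v for w, v in zip(w_out, h)))  # score in (0, 1)

rng = random.Random(0)
# Six toy grayscale frames of 4 rows by 8 columns.
frames = [[[rng.random() for _ in range(8)] for _ in range(4)] for _ in range(6)]
fwd, bwd = GRUCell(4, 3, rng), GRUCell(4, 3, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(6)]
score = violence_score(frames, fwd, bwd, w_out)
```

With random weights the score carries no meaning; in an end-to-end setup the feature extractor, both GRU directions, and the output layer would be trained jointly on labeled clips.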
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference number RGPIN-2018-06233.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Traoré, A., Akhloufi, M.A. (2020). 2D Bidirectional Gated Recurrent Unit Convolutional Neural Networks for End-to-End Violence Detection in Videos. In: Campilho, A., Karray, F., Wang, Z. (eds) Image Analysis and Recognition. ICIAR 2020. Lecture Notes in Computer Science, vol 12131. Springer, Cham. https://doi.org/10.1007/978-3-030-50347-5_14
DOI: https://doi.org/10.1007/978-3-030-50347-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50346-8
Online ISBN: 978-3-030-50347-5
eBook Packages: Computer Science, Computer Science (R0)