Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System
Figure 1. Structural flowchart of the anomaly surveillance and security system for multi-modal fusion.
Figure 2. Multi-modal anomaly surveillance and security workflow.
Figure 3. Open datasets for anomaly detection.
Figure 4. Data generation sample using Grand Theft Auto V.
Figure 5. Data process flow for multiple deep learning models.
Figure 6. (a) Labeling frames for the detection model; (b) XML annotation file and anomaly video for the classification model.
Figure 7. 3D-AE workflow with skip-frame methodology.
Figure 8. 3D-AE structure.
Figure 9. SlowFast neural network structure.
Figure 10. ROC curve and AUC score on the GTA dataset.
Figure 11. AUC score for each class.
Figure 12. Experimental results based on anomaly score for the classes (a) explosion, (b) assault, (c) shooting, and (d) trespass.
Figure 13. Experimental results of SlowFast models using the GTA dataset (our dataset).
Figure 14. Experimental results on the GTA (our dataset), UCF-101, and HMDB-51 datasets.
Figure 15. Experimental results of preprocessing schemes (box crop vs. random crop).
Figure 16. Assault data sample constructed from the real world.
Figure 17. Proposed system test results: (a) input data, (b) classification result, and (c) video anomaly detection (VAD) result masked by the classification result.
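
The captions mention a 3D-AE with a skip-frame methodology and results reported as an anomaly score, but they do not spell out either mechanism. The sketch below illustrates one possible reading under two assumptions that are not taken from the paper: skip-frame sampling builds each fixed-length 3D-AE input from every `skip`-th frame, and the anomaly score is the per-clip reconstruction error min-max normalized over a video, a common choice for autoencoder-based detectors. All function names are hypothetical, and the 3D-AE itself is left out.

```python
def skip_frame_clip(frames, start, length, skip):
    """Hypothetical helper: take `length` frames from `start`, keeping every `skip`-th frame.

    With skip > 1, a clip of fixed length spans a longer time window.
    """
    indices = [start + i * skip for i in range(length)]
    if indices[-1] >= len(frames):
        raise IndexError("clip runs past the end of the video")
    return [frames[i] for i in indices]


def reconstruction_error(clip, reconstruction):
    """Mean squared error between two clips given as flat lists of pixel values."""
    if len(clip) != len(reconstruction):
        raise ValueError("clip and reconstruction must have the same size")
    return sum((x - y) ** 2 for x, y in zip(clip, reconstruction)) / len(clip)


def normalized_anomaly_scores(errors):
    """Min-max normalize the per-clip errors of one video to [0, 1] (assumed scoring rule)."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        return [0.0] * len(errors)
    return [(e - lo) / (hi - lo) for e in errors]


# Toy usage: frame indices stand in for frames (not data from the paper).
frames = list(range(100))
print(skip_frame_clip(frames, start=10, length=4, skip=3))        # [10, 13, 16, 19]
print(reconstruction_error([0.0, 1.0, 2.0], [0.0, 1.0, 4.0]))    # 1.3333333333333333
print(normalized_anomaly_scores([1.0, 2.0, 5.0]))                 # [0.0, 0.25, 1.0]
```

Normalizing per video makes scores comparable within one video but not across videos; in a full pipeline the reconstruction would come from the trained 3D-AE.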
Abstract
1. Introduction
2. Related Works
2.1. Unsupervised Learning-Based Video Anomaly Detection
2.2. Supervised Learning-Based Video Anomaly Classification
3. Convergent Analysis and Preprocessing
3.1. Dataset
3.2. Data Generation and Supplement
3.3. Data Preprocessing Module
4. System Materials and Methods
4.1. Anomaly Detection
4.1.1. Three-Dimensional Convolutional AutoEncoder
4.1.2. Skip-Frame Methodology
4.2. Anomaly Classification
5. Experimental Results
5.1. Experimental Setup
5.2. Evaluation of Anomaly Detection Models
5.3. Evaluation of Anomaly Classification Models with 3D-AE Preprocessing
5.4. Evaluation of the System Performance in Real World
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

Model | Dataset | AUC Score (%) |
---|---|---|
Skip-frame-based 3D AutoEncoder | Avenue | 84.67 |
Skip-frame-based 3D AutoEncoder | ShanghaiTech | 75.97 |
Skip-frame-based 3D AutoEncoder | GTA (ours) | 79.95 |
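
The AUC scores above measure how well the anomaly score separates normal from anomalous frames. The following minimal sketch assumes frame-level binary labels and one score per frame, with higher meaning more anomalous; the table does not state the evaluation granularity. It uses the rank (Mann-Whitney) form of the AUC: the probability that a randomly chosen anomalous frame scores higher than a randomly chosen normal one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) ROC AUC.

    labels: 0 for normal, 1 for anomalous (assumed frame-level labels).
    scores: anomaly scores, higher meaning more anomalous.
    Ties between an anomalous and a normal frame count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one normal and one anomalous frame")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, not data from the paper.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(f"AUC = {roc_auc(labels, scores):.4f}")  # AUC = 0.8889
```

The pairwise loop is O(P·N) and only shows the definition; for full-length videos, a sort-based implementation such as `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity.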

Model | Dataset | Preprocessing | Accuracy (%) | Loss |
---|---|---|---|---|
SlowFast 4 × 16 ResNet-50 | GTA (ours) | Box crop | 78.75 | 0.54 |
SlowFast 4 × 16 ResNet-101 | GTA (ours) | Box crop | 75.00 | 0.67 |
SlowFast 8 × 8 ResNet-50 | GTA (ours) | Box crop | 85.00 | 0.53 |
SlowFast 8 × 8 ResNet-101 | GTA (ours) | Box crop | 80.00 | 0.57 |
SlowFast 8 × 8 ResNet-50 | GTA (ours) | Random crop | 77.50 | 0.71 |
SlowFast 8 × 8 ResNet-101 | UCF-101 | Random crop | 72.31 | 1.02 |
SlowFast 8 × 8 ResNet-101 | HMDB-51 | Random crop | 37.50 | 5.81 |
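
In the model names, T × τ follows the SlowFast convention of Feichtenhofer et al.: the Slow pathway takes T frames at a temporal stride of τ, and the Fast pathway takes αT frames at stride τ/α. Both 4 × 16 and 8 × 8 therefore cover the same 64-frame window, with 8 × 8 sampling the Slow pathway twice as densely. A minimal sketch of the resulting frame indices follows; the α values are illustrative assumptions, since the table does not report them.

```python
def slowfast_indices(start, T, tau, alpha):
    """Raw-frame indices for one SlowFast clip.

    Slow pathway: T frames at temporal stride tau.
    Fast pathway: alpha * T frames at temporal stride tau // alpha.
    alpha is an assumed hyperparameter here (not given in the table).
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha")
    fast_stride = tau // alpha
    slow = [start + i * tau for i in range(T)]
    fast = [start + i * fast_stride for i in range(alpha * T)]
    return slow, fast


# (T, tau, alpha): the alpha values are illustrative assumptions.
for T, tau, alpha in [(4, 16, 8), (8, 8, 4)]:
    slow, fast = slowfast_indices(0, T, tau, alpha)
    print(f"{T}x{tau}: span={T * tau} frames, slow={slow}, "
          f"fast={len(fast)} frames at stride {tau // alpha}")
```

With these assumed α values, both Fast pathways see 32 frames at stride 2, so the two configurations differ only in how many of those frames the Slow pathway keeps (4 versus 8).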

Data Amount | True Positive (TP) | Recall (%) |
---|---|---|
1345 | 1123 | 83.49 |
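
Recall is the share of anomalous samples that the system detects. If Data Amount is read as the number of ground-truth anomalous samples (an assumption, because the table does not define it), the false negatives are FN = 1345 − 1123 = 222, and the reported value follows:

$$
\text{Recall} = \frac{TP}{TP + FN} = \frac{1123}{1123 + 222} = \frac{1123}{1345} \approx 0.8349 = 83.49\%
$$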

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jeong, J.-h.; Jung, H.-h.; Choi, Y.-h.; Park, S.-h.; Kim, M.-s. Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System. Sensors 2023, 23, 9214. https://doi.org/10.3390/s23229214
Jeong J-h, Jung H-h, Choi Y-h, Park S-h, Kim M-s. Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System. Sensors. 2023; 23(22):9214. https://doi.org/10.3390/s23229214
Chicago/Turabian Style: Jeong, Jae-hyeok, Hwan-hee Jung, Yong-hoon Choi, Seong-hee Park, and Min-suk Kim. 2023. "Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System." Sensors 23, no. 22: 9214. https://doi.org/10.3390/s23229214
APA Style: Jeong, J.-h., Jung, H.-h., Choi, Y.-h., Park, S.-h., & Kim, M.-s. (2023). Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System. Sensors, 23(22), 9214. https://doi.org/10.3390/s23229214