Deep Learning for Abnormal Human Behavior Detection in Surveillance Videos—A Survey
Figure 1. Trend in the number of publications on deep learning for abnormal human behavior detection over the past five years (2019–2023).
Figure 2. Distribution of related published papers on abnormal human behavior detection by search engines.
Figure 3. The organizational structure of the survey.
Figure 4. Sample abnormal behavior images from each dataset listed in Table 2.
Figure 5. Reconstruction-based AHB detection results using AE on the CUHK dataset.
Figure 6. Reconstruction-based AHB detection results using VAE on the CUHK dataset.
Figure 7. Reconstruction-based AHB detection using CAE on the Ped2 and CUHK datasets.
Figure 8. Generative AHB detection results on the Ped1, Ped2, CUHK, and ST datasets.
Figure 9. Weakly supervised AHB detection results on the UCF-Crime and ST datasets.
Abstract
1. Introduction
1.1. Literature Review Methodology
1.2. Contributions of the Paper
- Categorizing deep learning techniques for abnormal human behavior detection into three main detection approaches: unsupervised, partially supervised, and fully supervised.
- Discussing the strengths and drawbacks of each learning scheme for training a deep learning model for abnormal human behavior detection.
- Conducting a comprehensive comparison of the performance of deep-learning-based abnormal human behavior detection techniques on popular benchmark datasets.
- Exploring open research issues in the field of abnormal human behavior detection in surveillance videos.
1.3. Organization of the Paper
2. Abnormal Human Behavior Detection
2.1. Types of Abnormal Behaviors
2.1.1. Short-Term Abnormal Behaviors
2.1.2. Long-Term Abnormal Behaviors
2.2. Prior Research on Abnormal Behavior Recognition
3. Datasets
4. Deep Learning Techniques for Abnormal Human Behavior Detection
4.1. Unsupervised Approach
4.1.1. Reconstruction-Based Detection
4.1.2. Generative Detection
4.2. Partially Supervised Approach
4.2.1. Semi-Supervised Detection
4.2.2. Weakly Supervised Detection
4.3. Fully Supervised Approach
4.4. Summary: Advantages and Disadvantages
5. Open Research Issues
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ito, R.; Tsukada, M.; Kondo, M.; Matsutani, H. An Adaptive Abnormal Behavior Detection using Online Sequential Learning. In Proceedings of the 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), New York, NY, USA, 1–3 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 436–440. [Google Scholar] [CrossRef]
- Antonakaki, P.; Kosmopoulos, D.; Perantonis, S.J. Detecting abnormal human behaviour using multiple cameras. Signal Process. 2009, 89, 1723–1738. [Google Scholar] [CrossRef]
- Kim, D.; Kim, H.; Mok, Y.; Paik, J. Real-Time Surveillance System for Analyzing Abnormal Behavior of Pedestrians. Appl. Sci. 2021, 11, 6153. [Google Scholar] [CrossRef]
- Yoon, Y.-I.; Chun, J.-A. Tracking Model for Abnormal Behavior from Multiple Network CCTV Using the Kalman Filter. In Computer Science and Its Applications: Ubiquitous Information Technologies; Springer: Berlin/Heidelberg, Germany, 2015; pp. 933–939. [Google Scholar] [CrossRef]
- Park, H.-J. A Study on Monitoring System for an Abnormal Behaviors by Object’s Tracking. J. Digit. Contents Soc. 2013, 14, 589–596. [Google Scholar] [CrossRef]
- Patwal, A.; Diwakar, M.; Tripathi, V.; Singh, P. An investigation of videos for abnormal behavior detection. Procedia Comput. Sci. 2023, 218, 2264–2272. [Google Scholar] [CrossRef]
- Tay, N.C.; Connie, T.; Ong, T.S.; Teoh, A.B.J.; Teh, P.S. A Review of Abnormal Behavior Detection in Activities of Daily Living. IEEE Access 2023, 11, 5069–5088. [Google Scholar] [CrossRef]
- Wu, C.; Cheng, Z. A Novel Detection Framework for Detecting Abnormal Human Behavior. Math. Probl. Eng. 2020, 2020, 6625695. [Google Scholar] [CrossRef]
- Yan, M.; Xiong, Y.; She, J. Memory Clustering Autoencoder Method for Human Action Anomaly Detection on Surveillance Camera Video. IEEE Sens. J. 2023, 23, 20715–20728. [Google Scholar] [CrossRef]
- Sinulingga, H.R.; Kong, S.G. Key-Frame Extraction for Reducing Human Effort in Object Detection Training for Video Surveillance. Electronics 2023, 12, 2956. [Google Scholar] [CrossRef]
- Wei, H.; Kehtarnavaz, N. Simultaneous Utilization of Inertial and Video Sensing for Action Detection and Recognition in Continuous Action Streams. IEEE Sens. J. 2020, 20, 6055–6063. [Google Scholar] [CrossRef]
- Kim, B.; Lee, J. A Video-Based Fire Detection Using Deep Learning Models. Appl. Sci. 2019, 9, 2862. [Google Scholar] [CrossRef]
- Wu, Q.; Zhou, Y.; Wu, X.; Liang, G.; Ou, Y.; Sun, T. Real-time running detection system for UAV imagery based on optical flow and deep convolutional networks. IET Intell. Transp. Syst. 2020, 14, 278–287. [Google Scholar] [CrossRef]
- Zhao, Z.; Lan, S.; Zhang, S. Human Pose Estimation based Speed Detection System for Running on Treadmill. In Proceedings of the 2020 International Conference on Culture-Oriented Science & Technology (ICCST), Beijing, China, 28–31 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 524–528. [Google Scholar] [CrossRef]
- Chen, K.-Y.; Shin, J.; Hasan, M.A.M.; Liaw, J.-J. Deep Transfer Learning Based Real Time Fitness Movement Identification. In Proceedings of the 2022 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), Shah Alam, Malaysia, 25 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 102–106. [Google Scholar] [CrossRef]
- Cao, Y.; Fan, S.; Cheng, W.; Zhao, Y.; Zheng, H.; Zhao, H. Human Body Movement Velocity Estimation Based on Binocular Video Streams. In Proceedings of the 2022 3rd International Conference on Computer Vision. Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 20–22 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 977–985. [Google Scholar] [CrossRef]
- Lao, S.; Wang, D.; Li, F.; Zhang, H. Human running detection: Benchmark; baseline. Comput. Vis. Image Underst. 2016, 153, 143–150. [Google Scholar] [CrossRef]
- Ha, T.V.; Nguyen, H.M.; Thanh, S.H.; Nguyen, B.T. Fall detection using mixtures of convolutional neural networks. Multimed. Tools Appl. 2023, 83, 18091–18118. [Google Scholar] [CrossRef]
- Yan, J.; Wang, X.; Shi, J.; Hu, S. Skeleton-Based Fall Detection with Multiple Inertial Sensors Using Spatial-Temporal Graph Convolutional Networks. Sensors 2023, 23, 2153. [Google Scholar] [CrossRef] [PubMed]
- Zi, X.; Chaturvedi, K.; Braytee, A.; Li, J.; Prasad, M. Detecting Human Falls in Poor Lighting: Object Detection and Tracking Approach for Indoor Safety. Electronics 2023, 12, 1259. [Google Scholar] [CrossRef]
- Zheng, K.; Li, B.; Li, Y.; Chang, P.; Sun, G.; Li, H.; Zhang, J. Fall detection based on dynamic key points incorporating preposed attention. Math. Biosci. Eng. 2023, 20, 11238–11259. [Google Scholar] [CrossRef]
- Hoang, V.-H.; Lee, J.W.; Piran, M.J.; Park, C.-S. Advances in Skeleton-Based Fall Detection in RGB Videos: From Handcrafted to Deep Learning Approaches. IEEE Access 2023, 11, 92322–92352. [Google Scholar] [CrossRef]
- Wastupranata, L.M.; Munir, R. Convolutional neural network-based crowd detection for COVID-19 social distancing protocol from unmanned aerial vehicles onboard camera. J. Appl. Remote Sens. 2023, 17, 44502. [Google Scholar] [CrossRef]
- Kalshetty, R.; Parveen, A. Abnormal event detection model using an improved ResNet101 in context aware surveillance system. Cogn. Comput. Syst. 2023, 5, 153–167. [Google Scholar] [CrossRef]
- Alafif, T.; Hadi, A.; Allahyani, M.; Alzahrani, B.; Alhothali, A.; Alotaibi, R.; Barnawi, A. Hybrid Classifiers for Spatio-Temporal Abnormal Behavior Detection, Tracking, and Recognition in Massive Hajj Crowds. Electronics 2023, 12, 1165. [Google Scholar] [CrossRef]
- Bhuiyan, M.R.; Abdullah, J.; Hashim, N.; Al Farid, F.; Uddin, J. Hajj pilgrimage abnormal crowd movement monitoring using optical flow and FCNN. J. Big Data 2023, 10, 86. [Google Scholar] [CrossRef]
- Hanif, M.S.; Bilal, M.; Balamash, A.S.; Al-Saggaf, U.M. Hypotheses Generation and Verification Based Framework for Crowd Anomaly Detection in Single-Scene Surveillance Videos. Trait. Signal 2023, 40, 115–122. [Google Scholar] [CrossRef]
- Castellano, G.; Cotardo, E.; Mencar, C.; Vessio, G. Density-based clustering with fully-convolutional networks for crowd flow detection from drones. Neurocomputing 2023, 526, 169–179. [Google Scholar] [CrossRef]
- Zubair, M.; Ali, A.; Naeem, S.; Anam, S. Video Streams for The Detection of Thrown Objects from Expressways. In Proceedings of the MOL2NET’22, Conference on Molecular, Biomedical & Computational Sciences and Engineering, 8th Ed.—MOL2NET: FROM MOLECULES TO NETWORKS, Paris, France, 1–15 January 2023; p. 13932. [Google Scholar] [CrossRef]
- Ali, M.M. Real-time video anomaly detection for smart surveillance. IET Image Process 2023, 17, 1375–1388. [Google Scholar] [CrossRef]
- Mahankali, S.; Kabbin, S.V.; Nidagundi, S.; Srinath, R. Identification of Illegal Garbage Dumping with Video Analytics. In Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 19–22 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2403–2407. [Google Scholar] [CrossRef]
- Chaturvedi, K.; Dhiman, C.; Vishwakarma, D.K. Fight detection with spatial and channel wise attention-based ConvLSTM model. Expert Syst. 2024, 41, e13474. [Google Scholar] [CrossRef]
- Pervaiz, M.; Shorfuzzaman, M.; Alsufyani, A.; Jalal, A.; Alsuhibany, S.A.; Park, J. Tracking and Analysis of Pedestrian’s Behavior in Public Places. Comput. Mater. Contin. 2023, 74, 841–853. [Google Scholar] [CrossRef]
- Alarfaj, M.; Pervaiz, M.; Ghadi, Y.Y.; al Shloul, T.; Alsuhibany, S.A.; Jalal, A.; Park, J. Automatic Anomaly Monitoring in Public Surveillance Areas. Intell. Autom. Soft Comput. 2023, 35, 2655–2671. [Google Scholar] [CrossRef]
- Jebur, S.A.; Hussein, K.A.; Hoomod, H.K.; Alzubaidi, L. Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection. Computers 2023, 12, 175. [Google Scholar] [CrossRef]
- Bashir, M.; Rundensteiner, E.A.; Ahsan, R. A deep learning approach to trespassing detection using video surveillance data. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3535–3544. [Google Scholar] [CrossRef]
- Zhang, Z.; Zaman, A.; Xu, J.; Liu, X. Artificial intelligence-aided railroad trespassing detection and data analytics: Methodology and a case study. Accid. Anal. Prev. 2022, 168, 106594. [Google Scholar] [CrossRef]
- Grabušić, S.; Barić, D. A Systematic Review of Railway Trespassing: Problems and Prevention Measures. Sustainability 2023, 15, 13878. [Google Scholar] [CrossRef]
- Zaman, A.; Ren, B.; Liu, X. Artificial Intelligence-Aided Automated Detection of Railroad Trespassing. Transp. Res. Rec. J. Transp. Res. Board. 2019, 2673, 25–37. [Google Scholar] [CrossRef]
- Havârneanu, G.M. Behavioural and organisational interventions to prevent trespass and graffiti vandalism on railway property. Proc. Inst. Mech. Eng. F J. Rail Rapid. Transit. 2017, 231, 1078–1087. [Google Scholar] [CrossRef]
- Zhang, T.; Aftab, W.; Mihaylova, L.; Langran-Wheeler, C.; Rigby, S.; Fletcher, D.; Maddock, S.; Bosworth, G. Recent Advances in Video Analytics for Rail Network Surveillance for Security, Trespass and Suicide Prevention—A Survey. Sensors 2022, 22, 4324. [Google Scholar] [CrossRef] [PubMed]
- Bamaqa, A.; Sedky, M.; Bosakowski, T.; Bastaki, B.B.; Alshammari, N.O. SIMCD: SIMulated crowd data for anomaly detection and prediction. Expert Syst. Appl. 2022, 203, 117475. [Google Scholar] [CrossRef]
- Mehmood, A. Abnormal Behavior Detection in Uncrowded Videos with Two-Stream 3D Convolutional Neural Networks. Appl. Sci. 2021, 11, 3523. [Google Scholar] [CrossRef]
- Pouyan, S.; Charmi, M.; Azarpeyvand, A.; Hassanpoor, H. Propounding First Artificial Intelligence Approach for Predicting Robbery Behavior Potential in an Indoor Security Camera. IEEE Access 2023, 11, 60471–60489. [Google Scholar] [CrossRef]
- Chen, H.; Bohush, R.; Kurnosov, I.; Ma, G.; Weichen, Y.; Ablameyko, S. Detection of Appearance and Behavior Anomalies in Stationary Camera Videos Using Convolutional Neural Networks. Pattern Recognit. Image Anal. 2022, 32, 254–265. [Google Scholar] [CrossRef]
- Patel, A.S.; Vyas, R.; Vyas, O.P.; Ojha, M.; Tiwari, V. Motion-compensated online object tracking for activity detection and crowd behavior analysis. Vis. Comput. 2023, 39, 2127–2147. [Google Scholar] [CrossRef] [PubMed]
- Wahyono; Harjoko, A.; Dharmawan, A.; Adhinata, F.D.; Kosala, G.; Jo, K.-H. Loitering Detection Using Spatial-Temporal Information for Intelligent Surveillance Systems on a Vision Sensor. J. Sens. Actuator Netw. 2023, 12, 9. [Google Scholar] [CrossRef]
- Huang, T.; Han, Q.; Min, W.; Li, X.; Yu, Y.; Zhang, Y. Loitering Detection Based on Pedestrian Activity Area Classification. Appl. Sci. 2019, 9, 1866. [Google Scholar] [CrossRef]
- Dwivedi, N.; Singh, D.K.; Kushwaha, D.S. An Approach for Unattended Object Detection through Contour Formation using Background Subtraction. Procedia Comput. Sci. 2020, 171, 1979–1988. [Google Scholar] [CrossRef]
- Agarwal, H.; Singh, G.; Siddiqui, M.A. Classification of Abandoned and Unattended Objects, Identification of Their Owner with Threat Assessment for Visual Surveillance. In Proceedings of 3rd International Conference on Computer Vision and Image Processing; Chaudhuri, B., Nakagawa, M., Khanna, P., Kumar, S., Eds.; Springer: Singapore, 2020; pp. 221–232. [Google Scholar] [CrossRef]
- Htun, B.; Sein, M.M. Observation of Unattended or Removed Object in Public Area for Security Monitoring System. In Genetic and Evolutionary Computing; Springer International Publishing: Cham, Switzerland, 2017; pp. 45–53. [Google Scholar] [CrossRef]
- Park, H.; Park, S.; Joo, Y. Robust Real-time Detection of Abandoned Objects using a Dual Background Model. KSII Trans. Internet Inf. Syst. 2020, 14, 771–788. [Google Scholar] [CrossRef]
- Bangare, P.S.; Bangare, S.L.; Yawle, R.U.; Patil, S.T. Detection of human feature in abandoned object with modern security alert system using Android Application. In Proceedings of the 2017 International Conference on Emerging Trends & Innovation in ICT (ICEI), Pune, India, 3–5 February 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 139–144. [Google Scholar] [CrossRef]
- Planinc, R.; Kampel, M. Detecting Unusual Inactivity by Introducing Activity Histogram Comparisons. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications, SCITEPRESS—Science and and Technology Publications, Lisbon, Portugal, 5–8 January 2014; pp. 313–320. [Google Scholar] [CrossRef]
- Koehler, S.; Goldhammer, M.; Bauer, S.; Zecha, S.; Doll, K.; Brunsmann, U.; Dietmayer, K. Stationary Detection of the Pedestrian’s Intention at Intersections. IEEE Intell. Transp. Syst. Mag. 2013, 5, 87–99. [Google Scholar] [CrossRef]
- Yi, S.; Li, H.; Wang, X. Pedestrian Behavior Modeling From Stationary Crowds With Applications to Intelligent Surveillance. IEEE Trans. Image Process. 2016, 25, 4354–4368. [Google Scholar] [CrossRef] [PubMed]
- Deep, S.; Zheng, X.; Karmakar, C.; Yu, D.; Hamey, L.G.C.; Jin, J. A Survey on Anomalous Behavior Detection for Elderly Care Using Dense-Sensing Networks. IEEE Commun. Surv. Tutor. 2020, 22, 352–370. [Google Scholar] [CrossRef]
- Nayak, R.; Pati, U.C.; Das, S.K. A comprehensive review on deep learning-based methods for video anomaly detection. Image Vis. Comput. 2021, 106, 104078. [Google Scholar] [CrossRef]
- Choudhry, N.; Abawajy, J.; Huda, S.; Rao, I. A Comprehensive Survey of Machine Learning Methods for Surveillance Videos Anomaly Detection. IEEE Access 2023, 11, 114680–114713. [Google Scholar] [CrossRef]
- Patrikar, D.R.; Parate, M.R. Anomaly detection using edge computing in video surveillance system: Review. Int. J. Multimed. Inf. Retr. 2022, 11, 85–110. [Google Scholar] [CrossRef]
- Xefteris, V.-R.; Tsanousa, A.; Meditskos, G.; Vrochidis, S.; Kompatsiaris, I. Performance, Challenges, and Limitations in Multimodal Fall Detection Systems: A Review. IEEE Sens. J. 2021, 21, 18398–18409. [Google Scholar] [CrossRef]
- Roka, S.; Diwakar, M.; Singh, P.; Singh, P. Anomaly behavior detection analysis in video surveillance: A critical review. J. Electron. Imaging 2023, 32, 42106. [Google Scholar] [CrossRef]
- Newaz, N.T.; Hanada, E. The Methods of Fall Detection: A Literature Review. Sensors 2023, 23, 5212. [Google Scholar] [CrossRef] [PubMed]
- Jenga, K.; Catal, C.; Kar, G. Machine learning in crime prediction. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 2887–2913. [Google Scholar] [CrossRef]
- Pandiaraja, P.; Saarumathi, R.; Parashakthi, M.; Logapriya, R. An Analysis of Abnormal Event Detection and Person Identification from Surveillance Cameras using Motion Vectors with Deep Learning. In Proceedings of the 2023 Second International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 2–4 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1225–1232. [Google Scholar] [CrossRef]
- Zhou, Z.-H.; Schwenker, F. Partially Supervised Learning; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar] [CrossRef]
- Ren, J.; Xia, F.; Liu, Y.; Lee, I. Deep Video Anomaly Detection: Opportunities and Challenges. In Proceedings of the 2021 International Conference on Data Mining Workshops (ICDMW), Auckland, New Zealand, 7–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 959–966. [Google Scholar] [CrossRef]
- Hao, Y.; Tang, Z.; Alzahrani, B.; Alotaibi, R.; Alharthi, R.; Zhao, M.; Mahmood, A. An End-to-End Human Abnormal Behavior Recognition Framework for Crowds With Mentally Disordered Individuals. IEEE J. Biomed. Health Inf. 2022, 26, 3618–3625. [Google Scholar] [CrossRef] [PubMed]
- Zhang, C.; Li, G.; Xu, Q.; Zhang, X.; Su, L.; Huang, Q. Weakly Supervised Anomaly Detection in Videos Considering the Openness of Events. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21687–21699. [Google Scholar] [CrossRef]
- Zhu, S.; Chen, C.; Sultani, W. Video Anomaly Detection for Smart Surveillance. In Computer Vision; Springer International Publishing: Cham, Switzerland, 2020; pp. 1–8. [Google Scholar] [CrossRef]
- Wang, Y.; Qin, C.; Bai, Y.; Xu, Y.; Ma, X.; Fu, Y. Making Reconstruction-based Method Great Again for Video Anomaly Detection. In Proceedings of the 2022 IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA, 28 November–1 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1215–1220. [Google Scholar] [CrossRef]
- Ganokratanaa, T.; Aramvith, S.; Sebe, N. Anomaly Event Detection Using Generative Adversarial Network for Surveillance Videos. In Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China, 18–21 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1395–1399. [Google Scholar] [CrossRef]
- Popoola, O.P.; Wang, K. Video-Based Abnormal Human Behavior Recognition—A Review. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2012, 42, 865–878. [Google Scholar] [CrossRef]
- Wu, X.; Ou, Y.; Qian, H.; Xu, Y. A detection system for human abnormal behavior. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 1204–1208. [Google Scholar] [CrossRef]
- Fei, F.; Fang, Z.; Shu, L. A fast algorithm based on human visual system for abnormal event detection. In Proceedings of the 2017 International Conference on Computer, Information and Telecommunication Systems (CITS), Dalian, China, 21–23 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 185–189. [Google Scholar] [CrossRef]
- Tran, C.H.; Kong, S.G. An Iterative Learning Scheme with Binary Classifier for Improved Event Detection in Surveillance Video. Electronics 2023, 12, 3275. [Google Scholar] [CrossRef]
- Jin, C.; Wang, T.; Alhusaini, N.; Zhao, S.; Liu, H.; Xu, K.; Zhang, J. Video Fire Detection Methods Based on Deep Learning: Datasets, Methods, and Future Directions. Fire 2023, 6, 315. [Google Scholar] [CrossRef]
- Cao, X.; Su, Y.; Geng, X.; Wang, Y. YOLO-SF: YOLO for Fire Segmentation Detection. IEEE Access 2023, 11, 111079–111092. [Google Scholar] [CrossRef]
- Yam, C.; Nixon, M.S.; Carter, J.N. On the relationship of human walking and running: Automatic person identification by gait. In Object Recognition Supported by User Interaction for Service Robots; IEEE Computer Society: Washington, DC, USA, 2002; pp. 287–290. [Google Scholar] [CrossRef]
- Gutiérrez, J.; Martin, S.; Rodriguez, V. Human stability assessment and fall detection based on dynamic descriptors. IET Image Process 2023, 17, 3177–3195. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Shubber, M.S.M.; Al-Ta’i, Z.T.M. A review on video violence detection approaches. Int. J. Nonlinear Anal. Appl. (IJNAA) 2022, 13, 1117–1130. [Google Scholar] [CrossRef]
- Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
- Espinosa, R.; Ponce, H.; Gutiérrez, S.; Martínez-Villaseñor, L.; Brieva, J.; Moya-Albor, E. A vision-based approach for fall detection using multiple cameras and convolutional neural networks: A case study using the UP-Fall detection dataset. Comput. Biol. Med. 2019, 115, 103520. [Google Scholar] [CrossRef]
- Gomes, M.E.N.; Macêdo, D.; Zanchettin, C.; de-Mattos-Neto, P.S.G.; Oliveira, A. Multi-human Fall Detection and Localization in Videos. Comput. Vis. Image Underst. 2022, 220, 103442. [Google Scholar] [CrossRef]
- Chandrakala, S.; Vignesh, L.K.P. V2AnomalyVec: Deep Discriminative Embeddings for Detecting Anomalous Activities in Surveillance Videos. IEEE Trans. Comput. Soc. Syst. 2022, 9, 1307–1316. [Google Scholar] [CrossRef]
- Gandapur, M.Q. E2E-VSDL: End-to-end video surveillance-based deep learning model to detect and prevent criminal activities. Image Vis. Comput. 2022, 123, 104467. [Google Scholar] [CrossRef]
- Sivachandiran, S.; Mohan, K.J.; Nazer, G.M. Deep Learning driven automated person detection and tracking model on surveillance videos. Meas. Sens. 2022, 24, 100422. [Google Scholar] [CrossRef]
- Ahn, J.; Park, J.; Lee, S.S.; Lee, K.-H.; Do, H.; Ko, J. SafeFac: Video-based smart safety monitoring for preventing industrial work accidents. Expert. Syst. Appl. 2023, 215, 119397. [Google Scholar] [CrossRef]
- Onyema, E.M.; Balasubaramanian, S.; Suguna S, K.; Iwendi, C.; Prasad, B.V.V.S.; Edeh, C.D. Remote monitoring system using slow-fast deep convolution neural network model for identifying anti-social activities in surveillance applications. Meas. Sens. 2023, 27, 100718. [Google Scholar] [CrossRef]
- Hussain, A.; Khan, S.U.; Khan, N.; Rida, I.; Alharbi, M.; Baik, S.W. Low-light aware framework for human activity recognition via optimized dual stream parallel network. Alex. Eng. J. 2023, 74, 569–583. [Google Scholar] [CrossRef]
- Ullah, H.; Munir, A. Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework. J. Imaging 2023, 9, 130. [Google Scholar] [CrossRef] [PubMed]
- Mao, J.; Zhou, P.; Wang, X.; Yao, H.; Liang, L.; Zhao, Y.; Zhang, J.; Ban, D.; Zheng, H. A health monitoring system based on flexible triboelectric sensors for intelligence medical internet of things and its applications in virtual reality. Nano Energy 2023, 118, 108984. [Google Scholar] [CrossRef]
- Kshirsagar, A.P.; Azath, H. YOLOv3-based human detection and heuristically modified-LSTM for abnormal human activities detection in ATM machine. J. Vis. Commun. Image Represent. 2023, 95, 103901. [Google Scholar] [CrossRef]
- Alzubaidi, L.; Bai, J.; Al-Sabaawi, A.; Santamaría, J.; Albahri, A.S.; Al-dabbagh, B.S.N.; Fadhel, M.A.; Manoufali, M.; Zhang, J.; Al-Timemy, A.H.; et al. A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications. J. Big Data 2023, 10, 46. [Google Scholar] [CrossRef]
- Baxter, R.H.; Robertson, N.M.; Lane, D.M. Human behaviour recognition in data-scarce domains. Pattern Recognit. 2015, 48, 2377–2393. [Google Scholar] [CrossRef]
- Tu, H.; Allanach, J.; Singh, S.; Pattipati, K.R.; Willett, P. Information integration via hierarchical and hybrid bayesian networks. IEEE Trans. Syst. Man Cybern.—Part A Syst. Hum. 2006, 36, 19–33. [Google Scholar] [CrossRef]
- Duong, H.-T.; Le, V.-T.; Hoang, V.T. Deep Learning-Based Anomaly Detection in Video Surveillance: A Survey. Sensors 2023, 23, 5024. [Google Scholar] [CrossRef] [PubMed]
- Lavee, G.; Rivlin, E.; Rudzsky, M. Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2009, 39, 489–504. [Google Scholar] [CrossRef]
- Gawlikowski, J.; Tassi, C.R.N.; Ali, M.; Lee, J.; Humt, M.; Feng, J.; Kruspe, A.; Triebel, R.; Jung, P.; Roscher, R.; et al. A survey of uncertainty in deep neural networks. Artif. Intell. Rev. 2023, 56, 1513–1589. [Google Scholar] [CrossRef]
- Myagmar-Ochir, Y.; Kim, W. A Survey of Video Surveillance Systems in Smart City. Electronics 2023, 12, 3567. [Google Scholar] [CrossRef]
- Şengönül, E.; Samet, R.; Al-Haija, Q.A.; Alqahtani, A.; Alturki, B.; Alsulami, A.A. An Analysis of Artificial Intelligence Techniques in Surveillance Video Anomaly Detection: A Comprehensive Survey. Appl. Sci. 2023, 13, 4956. [Google Scholar] [CrossRef]
- Wang, T.; Miao, Z.; Chen, Y.; Zhou, Y.; Shan, G.; Snoussi, H. AED-Net: An Abnormal Event Detection Network. Engineering 2019, 5, 930–939. [Google Scholar] [CrossRef]
- Hu, J.; Zhu, E.; Wang, S.; Liu, X.; Guo, X.; Yin, J. An Efficient and Robust Unsupervised Anomaly Detection Method Using Ensemble Random Projection in Surveillance Videos. Sensors 2019, 19, 4145. [Google Scholar] [CrossRef] [PubMed]
- Liu, Q.; Zhou, X. A Fully Connected Network Based on Memory for Video Anomaly Detection. In Proceedings of the 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), Chengdu, China, 26–28 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 221–226. [Google Scholar] [CrossRef]
- Chang, Y.; Tu, Z.; Xie, W.; Luo, B.; Zhang, S.; Sui, H.; Yuan, J. Video anomaly detection with spatio-temporal dissociation. Pattern Recognit. 2022, 122, 108213. [Google Scholar] [CrossRef]
- Wang, X.; Che, Z.; Jiang, B.; Xiao, N.; Yang, K.; Tang, J.; Ye, J.; Wang, J.; Qi, Q. Robust Unsupervised Video Anomaly Detection by Multipath Frame Prediction. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 2301–2312. [Google Scholar] [CrossRef] [PubMed]
- Wang, Y.; Liu, T.; Zhou, J.; Guan, J. Video anomaly detection based on spatio-temporal relationships among objects. Neurocomputing 2023, 532, 141–151. [Google Scholar] [CrossRef]
- Liu, Y.; Guo, Z.; Liu, J.; Li, C.; Song, L. OSIN: Object-Centric Scene Inference Network for Unsupervised Video Anomaly Detection. IEEE Signal Process Lett. 2023, 30, 359–363. [Google Scholar] [CrossRef]
- Li, N.; Chang, F.; Liu, C. A Self-Trained Spatial Graph Convolutional Network for Unsupervised Human-Related Anomalous Event Detection in Complex Scenes. IEEE Trans. Cogn. Dev. Syst. 2023, 15, 737–750. [Google Scholar] [CrossRef]
- Sampath, D.K.; Kumar, K. Abnormal Crowd Behaviour Detection in Surveillance Videos Using Spatiotemporal Inter-Fused Autoencoder. Int. J. Intell. Eng. Syst. 2023, 16, 470–481. [Google Scholar] [CrossRef]
- Wang, T.; Qiao, M.; Lin, Z.; Li, C.; Snoussi, H.; Liu, Z.; Choi, C. Generative Neural Networks for Anomaly Detection in Crowded Scenes. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1390–1399. [Google Scholar] [CrossRef]
- Xu, M.; Yu, X.; Chen, D.; Wu, C.; Jiang, Y. An Efficient Anomaly Detection System for Crowded Scenes Using Variational Autoencoders. Appl. Sci. 2019, 9, 3337. [Google Scholar] [CrossRef]
- Yan, S.; Smith, J.S.; Lu, W.; Zhang, B. Abnormal Event Detection From Videos Using a Two-Stream Recurrent Variational Autoencoder. IEEE Trans. Cogn. Dev. Syst. 2020, 12, 30–42. [Google Scholar] [CrossRef]
- Wang, T.; Xu, X.; Shen, F.; Yang, Y. A Cognitive Memory-Augmented Network for Visual Anomaly Detection. IEEE/CAA J. Autom. Sin. 2021, 8, 1296–1307. [Google Scholar] [CrossRef]
- Cho, M.; Kim, T.; Kim, W.J.; Cho, S.; Lee, S. Unsupervised video anomaly detection via normalizing flows with implicit latent features. Pattern Recognit. 2022, 129, 108703. [Google Scholar] [CrossRef]
- Huang, C.; Wu, Z.; Wen, J.; Xu, Y.; Jiang, Q.; Wang, Y. Abnormal Event Detection Using Deep Contrastive Learning for Intelligent Video Surveillance System. IEEE Trans. Ind. Inf. 2022, 18, 5171–5179. [Google Scholar] [CrossRef]
- Wang, L.; Tan, H.; Zhou, F.; Zuo, W.; Sun, P. Unsupervised Anomaly Video Detection via a Double-Flow ConvLSTM Variational Autoencoder. IEEE Access 2022, 10, 44278–44289. [Google Scholar] [CrossRef]
- Slavic, G.; Baydoun, M.; Campo, D.; Marcenaro, L.; Regazzoni, C. Multilevel Anomaly Detection Through Variational Autoencoders and Bayesian Models for Self-Aware Embodied Agents. IEEE Trans. Multimed. 2022, 24, 1399–1414. [Google Scholar] [CrossRef]
- Liu, Y.; Yang, D.; Fang, G.; Wang, Y.; Wei, D.; Zhao, M.; Cheng, K.; Liu, J.; Song, L. Stochastic video normality network for abnormal event detection in surveillance videos. Knowl. Based Syst. 2023, 280, 110986. [Google Scholar] [CrossRef]
- Chu, W.; Xue, H.; Yao, C.; Cai, D. Sparse Coding Guided Spatiotemporal Feature Learning for Abnormal Event Detection in Large Videos. IEEE Trans. Multimed. 2019, 21, 246–255. [Google Scholar] [CrossRef]
- Duman, E.; Erdem, O.A. Anomaly Detection in Videos Using Optical Flow and Convolutional Autoencoder. IEEE Access 2019, 7, 183914–183923. [Google Scholar] [CrossRef]
- Yan, M.; Meng, J.; Zhou, C.; Tu, Z.; Tan, Y.-P.; Yuan, J. Detecting spatiotemporal irregularities in videos via a 3D convolutional autoencoder. J. Vis. Commun. Image Represent. 2020, 67, 102747. [Google Scholar] [CrossRef]
- Bahrami, M.; Pourahmadi, M.; Vafaei, A.; Shayesteh, M.R. A comparative study between single and multi-frame anomaly detection and localization in recorded video streams. J. Vis. Commun. Image Represent. 2021, 79, 103232. [Google Scholar] [CrossRef]
- Asad, M.; Yang, J.; Tu, E.; Chen, L.; He, X. Anomaly3D: Video anomaly detection based on 3D-normality clusters. J. Vis. Commun. Image Represent. 2021, 75, 103047. [Google Scholar] [CrossRef]
- Li, B.; Leroux, S.; Simoens, P. Decoupled appearance and motion learning for efficient anomaly detection in surveillance video. Comput. Vis. Image Underst. 2021, 210, 103249. [Google Scholar] [CrossRef]
- Wang, J.; Zhang, J.; Ji, G.; Sheng, B. Criss-Cross Attention Based Auto Encoder for Video Anomaly Event Detection. Intell. Autom. Soft Comput. 2022, 34, 1629–1642. [Google Scholar] [CrossRef]
- Kommanduri, R.; Ghorai, M. Bi-READ: Bi-Residual AutoEncoder based feature enhancement for video anomaly detection. J. Vis. Commun. Image Represent. 2023, 95, 103860. [Google Scholar] [CrossRef]
- Taghinezhad, N.; Yazdi, M. A New Unsupervised Video Anomaly Detection Using Multi-Scale Feature Memorization and Multipath Temporal Information Prediction. IEEE Access 2023, 11, 9295–9310. [Google Scholar] [CrossRef]
- Jeong, J.; Jung, H.; Choi, Y.; Park, S.; Kim, M. Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System. Sensors 2023, 23, 9214. [Google Scholar] [CrossRef] [PubMed]
- Li, N.; Chang, F. Video anomaly detection and localization via multivariate gaussian fully convolution adversarial autoencoder. Neurocomputing 2019, 369, 92–105. [Google Scholar] [CrossRef]
- Ganokratanaa, T.; Aramvith, S.; Sebe, N. Unsupervised Anomaly Detection and Localization Based on Deep Spatiotemporal Translation Network. IEEE Access 2020, 8, 50312–50329. [Google Scholar] [CrossRef]
- Li, Y.; Cai, Y.; Liu, J.; Lang, S.; Zhang, X. Spatio-Temporal Unity Networking for Video Anomaly Detection. IEEE Access 2019, 7, 172425–172432. [Google Scholar] [CrossRef]
- Chen, D.; Wang, P.; Yue, L.; Zhang, Y.; Jia, T. Anomaly detection in surveillance video based on bidirectional prediction. Image Vis. Comput. 2020, 98, 103915. [Google Scholar] [CrossRef]
- Patil, P.W.; Dudhane, A.; Murala, S. End-to-End Recurrent Generative Adversarial Network for Traffic and Surveillance Applications. IEEE Trans. Veh. Technol. 2020, 69, 14550–14562. [Google Scholar] [CrossRef]
- Liu, S.; Yang, E.; Fang, K. Self-Learning pLSA Model for Abnormal Behavior Detection in Crowded Scenes. IEICE Trans. Inf. Syst. 2021, E104.D, 473–476. [Google Scholar] [CrossRef]
- Wu, R.; Li, S.; Chen, C.; Hao, A. Improving video anomaly detection performance by mining useful data from unseen video frames. Neurocomputing 2021, 462, 523–533. [Google Scholar] [CrossRef]
- Yang, Z.; Liu, J.; Wu, P. Bidirectional Retrospective Generation Adversarial Network for Anomaly Detection in Videos. IEEE Access 2021, 9, 107842–107857. [Google Scholar] [CrossRef]
- Chen, D.; Yue, L.; Chang, X.; Xu, M.; Jia, T. NM-GAN: Noise-modulated generative adversarial network for video anomaly detection. Pattern Recognit. 2021, 116, 107969. [Google Scholar] [CrossRef]
- Ganokratanaa, T.; Aramvith, S.; Sebe, N. Video anomaly detection using deep residual-spatiotemporal translation network. Pattern Recognit. Lett. 2022, 155, 143–150. [Google Scholar] [CrossRef]
- Yu, J.; Lee, Y.; Yow, K.C.; Jeon, M.; Pedrycz, W. Abnormal Event Detection and Localization via Adversarial Event Prediction. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 3572–3586. [Google Scholar] [CrossRef]
- Zhong, Y.; Chen, X.; Jiang, J.; Ren, F. A cascade reconstruction model with generalization ability evaluation for anomaly detection in videos. Pattern Recognit. 2022, 122, 108336. [Google Scholar] [CrossRef]
- Aslam, N.; Rai, P.K.; Kolekar, M.H. A3N: Attention-based adversarial autoencoder network for detecting anomalies in video sequence. J. Vis. Commun. Image Represent. 2022, 87, 103598. [Google Scholar] [CrossRef]
- Hao, Y.; Li, J.; Wang, N.; Wang, X.; Gao, X. Spatiotemporal consistency-enhanced network for video anomaly detection. Pattern Recognit. 2022, 121, 108232. [Google Scholar] [CrossRef]
- Li, Q.; Yang, R.; Xiao, F.; Bhanu, B.; Zhang, F. Attention-based anomaly detection in multi-view surveillance videos. Knowl. Based Syst. 2022, 252, 109348. [Google Scholar] [CrossRef]
- Zhao, L.; Wang, S.; Wang, S.; Ye, Y.; Ma, S.; Gao, W. Enhanced Surveillance Video Compression With Dual Reference Frames Generation. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 1592–1606. [Google Scholar] [CrossRef]
- Huang, H.; Zhao, B.; Gao, F.; Chen, P.; Wang, J.; Hussain, A. A Novel Unsupervised Video Anomaly Detection Framework Based on Optical Flow Reconstruction and Erased Frame Prediction. Sensors 2023, 23, 4828. [Google Scholar] [CrossRef]
- Li, G.; He, P.; Li, H.; Zhang, F. Adversarial composite prediction of normal video dynamics for anomaly detection. Comput. Vis. Image Underst. 2023, 232, 103686. [Google Scholar] [CrossRef]
- Pedrycz, W.; Waletzky, J. Fuzzy clustering with partial supervision. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 1997, 27, 787–795. [Google Scholar] [CrossRef]
- Sikdar, A.; Chowdhury, A.S. An adaptive training-less framework for anomaly detection in crowd scenes. Neurocomputing 2020, 415, 317–331. [Google Scholar] [CrossRef]
- Singh, G.; Kapoor, R.; Khosla, A. Optical Flow-Based Weighted Magnitude and Direction Histograms for the Detection of Abnormal Visual Events Using Combined Classifier. Int. J. Cogn. Inform. Nat. Intell. 2021, 15, 12–30. [Google Scholar] [CrossRef]
- Khaire, P.; Kumar, P. A semi-supervised deep learning based video anomaly detection framework using RGB-D for surveillance of real-world critical environments. Forensic Sci. Int. Digit. Investig. 2022, 40, 301346. [Google Scholar] [CrossRef]
- Pramanik, A.; Sarkar, S.; Pal, S.K. Video surveillance-based fall detection system using object-level feature thresholding. Knowl. Based Syst. 2023, 280, 110992. [Google Scholar] [CrossRef]
- Hu, X.; Dai, J.; Huang, Y.; Yang, H.; Zhang, L.; Chen, W.; Yang, G.; Zhang, D. A weakly supervised framework for abnormal behavior detection and localization in crowded scenes. Neurocomputing 2020, 383, 270–281. [Google Scholar] [CrossRef]
- Degardin, B.; Proença, H. Iterative weak/self-supervised classification framework for abnormal events detection. Pattern Recognit. Lett. 2021, 145, 50–57. [Google Scholar] [CrossRef]
- Ullah, W.; Hussain, T.; Khan, Z.A.; Haroon, U.; Baik, S.W. Intelligent dual stream CNN and echo state network for anomaly detection. Knowl. Based Syst. 2022, 253, 109456. [Google Scholar] [CrossRef]
- Yi, S.; Fan, Z.; Wu, D. Batch feature standardization network with triplet loss for weakly-supervised video anomaly detection. Image Vis. Comput. 2022, 120, 104397. [Google Scholar] [CrossRef]
- Liu, Y.; Liu, J.; Zhao, M.; Li, S.; Song, L. Collaborative Normality Learning Framework for Weakly Supervised Video Anomaly Detection. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 2508–2512. [Google Scholar] [CrossRef]
- Kamoona, A.M.; Gostar, A.K.; Bab-Hadiashar, A.; Hoseinnezhad, R. Multiple instance-based video anomaly detection using deep temporal encoding–decoding. Expert. Syst. Appl. 2023, 214, 119079. [Google Scholar] [CrossRef]
- Thakare, K.V.; Sharma, N.; Dogra, D.P.; Choi, H.; Kim, I.-J. A multi-stream deep neural network with late fuzzy fusion for real-world anomaly detection. Expert. Syst. Appl. 2022, 201, 117030. [Google Scholar] [CrossRef]
- Krishna, N.S.; Bhattu, S.N.; Somayajulu, D.V.L.N.; Kumar, N.V.N.; Reddy, K.J.S. GssMILP for anomaly classification in surveillance videos. Expert. Syst. Appl. 2022, 203, 117451. [Google Scholar] [CrossRef]
- Ullah, W.; Hussain, T.; Ullah, F.U.M.; Lee, M.Y.; Baik, S.W. TransCNN: Hybrid CNN and transformer mechanism for surveillance anomaly detection. Eng. Appl. Artif. Intell. 2023, 123, 106173. [Google Scholar] [CrossRef]
- Shao, W.; Xiao, R.; Rajapaksha, P.; Wang, M.; Crespi, N.; Luo, Z.; Minerva, R. Video anomaly detection with NTCN-ML: A novel TCN for multi-instance learning. Pattern Recognit. 2023, 143, 109765. [Google Scholar] [CrossRef]
- Chen, H.; Mei, X.; Ma, Z.; Wu, X.; Wei, Y. Spatial–temporal graph attention network for video anomaly detection. Image Vis. Comput. 2023, 131, 104629. [Google Scholar] [CrossRef]
- Tang, J.; Wang, Z.; Hao, G.; Wang, K.; Zhang, Y.; Wang, N.; Liang, D. SAE-PPL: Self-guided attention encoder with prior knowledge-guided pseudo labels for weakly supervised video anomaly detection. J. Vis. Commun. Image Represent. 2023, 97, 103967. [Google Scholar] [CrossRef]
- Zhang, B.; Xue, J. Weakly-supervised anomaly detection with a Sub-Max strategy. Neurocomputing 2023, 560, 126770. [Google Scholar] [CrossRef]
- Wang, L.; Wang, X.; Liu, F.; Li, M.; Hao, X.; Zhao, N. Attention-guided MIL weakly supervised visual anomaly detection. Measurement 2023, 209, 112500. [Google Scholar] [CrossRef]
- Ullah, W.; Ullah, F.U.M.; Khan, Z.A.; Baik, S.W. Sequential attention mechanism for weakly supervised video anomaly detection. Expert. Syst. Appl. 2023, 230, 120599. [Google Scholar] [CrossRef]
- Lv, H.; Zhou, C.; Cui, Z.; Xu, C.; Li, Y.; Yang, J. Localizing Anomalies From Weakly-Labeled Videos. IEEE Trans. Image Process. 2021, 30, 4505–4515. [Google Scholar] [CrossRef] [PubMed]
- Jebur, S.A.; Hussein, K.A.; Hoomod, H.K.; Alzubaidi, L.; Santamaría, J. Review on Deep Learning Approaches for Anomaly Event Detection in Video Surveillance. Electronics 2022, 12, 29. [Google Scholar] [CrossRef]
- Mahadevan, V.; Li, W.; Bhalodia, V.; Vasconcelos, N. Anomaly detection in crowded scenes. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1975–1981. [Google Scholar] [CrossRef]
- Luo, W.; Liu, W.; Gao, S. A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 341–349. [Google Scholar] [CrossRef]
- Sultani, W.; Chen, C.; Shah, M. Real-World Anomaly Detection in Surveillance Videos. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 6479–6488. [Google Scholar] [CrossRef]
- Lu, C.; Shi, J.; Jia, J. Abnormal Event Detection at 150 FPS in MATLAB. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 2720–2727. [Google Scholar] [CrossRef]
- Detection of Unusual Crowd Activity Dataset. n.d. Available online: https://mha.cs.umn.edu/proj_events.shtml#crowd (accessed on 14 June 2024).
- Ferryman, J.; Shahrokni, A. PETS2009: Dataset and challenge. In Proceedings of the 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Snowbird, UT, USA, 7–9 December 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–6. [Google Scholar] [CrossRef]
- Adam, A.; Rivlin, E.; Shimshoni, I.; Reinitz, D. Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 555–560. [Google Scholar] [CrossRef]
- Degardin, B.; Proenca, H. Human Activity Analysis: Iterative Weak/Self-Supervised Learning Frameworks for Detecting Abnormal Events. In Proceedings of the 2020 IEEE International Joint Conference on Biometrics (IJCB), Houston, USA, 28 September–1 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7. [Google Scholar] [CrossRef]
- Leyva, R.; Sanchez, V.; Li, C.-T. The LV dataset: A realistic surveillance video dataset for abnormal event detection. In Proceedings of the 2017 5th International Workshop on Biometrics and Forensics (IWBF), Coventry, UK, 4–5 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar] [CrossRef]
- Akti, S.; Tataroglu, G.A.; Ekenel, H.K. Vision-based Fight Detection from Surveillance Cameras. In Proceedings of the 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), Istanbul, Turkey, 6–9 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar] [CrossRef]
- Nievas, E.B.; Suarez, O.D.; García, G.B.; Sukthankar, R. Violence Detection in Video Using Computer Vision Techniques. In Computer Analysis of Images and Patterns; Real, P., Diaz-Pernil, D., Molina-Abril, H., Berciano, A., Kropatsch, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 332–339. [Google Scholar] [CrossRef]
- Hassner, T.; Itcher, Y.; Kliper-Gross, O. Violent flows: Real-time detection of violent crowd behavior. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1–6. [Google Scholar] [CrossRef]
- Martínez-Villaseñor, L.; Ponce, H.; Brieva, J.; Moya-Albor, E.; Núñez-Martínez, J.; Peñafort-Asturiano, C. UP-Fall Detection Dataset: A Multimodal Approach. Sensors 2019, 19, 1988. [Google Scholar] [CrossRef]
- Gu, C.; Sun, C.; Ross, D.A.; Vondrick, C.; Pantofaru, C.; Li, Y.; Vijayanarasimhan, S.; Toderici, G.; Ricco, S.; Sukthankar, R.; et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 6047–6056. [Google Scholar] [CrossRef]
- Auvinet, E.; Rougier, C.; Meunier, J.; St-Arnaud, A.; Rousseau, J. Multiple Cameras Fall Dataset; Tech. Rep. 1350; DIRO-Université de Montréal: Montréal, QC, Canada, 2010; p. 24. [Google Scholar]
- Kwolek, B.; Kepski, M. Human fall detection on embedded platform using depth maps and wireless accelerometer. Comput. Methods Progr. Biomed. 2014, 117, 489–501. [Google Scholar] [CrossRef]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Wang, L.; Shi, J.; Song, G.; Shen, I. Object Detection Combining Recognition and Segmentation. In Computer Vision—ACCV 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 189–199. [Google Scholar] [CrossRef]
- Reddy, K.K.; Shah, M. Recognizing 50 human action categories of web videos. Mach. Vis. Appl. 2013, 24, 971–981. [Google Scholar] [CrossRef]
- Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv 2012, arXiv:1212.0402. [Google Scholar] [CrossRef]
- Krasin, I.; Duerig, T.; Alldrin, N.; Ferrari, V.; Abu-El-Haija, S.; Kuznetsova, A.; Rom, H.; Uijlings, J.; Popov, S.; Veit, A.; et al. OpenImages: A Public Dataset for Large-Scale Multi-Label And Multi-Class Image Classification. 2017. Dataset. Available online: https://github.com/openimages (accessed on 12 June 2024).
- CMU Graphics Lab Motion Capture Database. n.d. Available online: http://mocap.cs.cmu.edu/ (accessed on 3 June 2024).
- Ryoo, M.S.; Aggarwal, J.K. UT-Interaction Dataset, ICPR Contest on Semantic Description of Human Activities (SDHA). 2010. Available online: https://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html (accessed on 3 June 2024).
- Peliculas Movies Fight Detection Dataset. n.d. Available online: http://academictorrents.com/details/70e0794e2292fc051a13f05ea6f5b6c16f3d3635/tech&h%20it=1&filelist=1 (accessed on 12 June 2024).
- Mehran, R.; Oyama, A.; Shah, M. Abnormal crowd behavior detection using social force model. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 935–942. [Google Scholar] [CrossRef]
- Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A large video database for human motion recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2556–2563. [Google Scholar] [CrossRef]
- Carreira, J.; Noland, E.; Banki-Horvath, A.; Hillier, C.; Zisserman, A. A short note about kinetics-600. arXiv 2018, arXiv:1808.01340. [Google Scholar] [CrossRef]
- Liu, J.; Luo, J.; Shah, M. Recognizing realistic actions from videos “in the wild”. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1996–2003. [Google Scholar] [CrossRef]
- Cinelli, L.P.; Marins, M.A.; da Silva, E.A.B.; Netto, S.L. Variational Methods for Machine Learning with Applications to Deep Networks; Springer International Publishing: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
- Oliveira, E.E.; Rodrigues, M.; Pereira, J.P.; Lopes, A.M.; Mestric, I.I.; Bjelogrlic, S. Unlabeled learning algorithms and operations: Overview and future trends in defense sector. Artif. Intell. Rev. 2024, 57, 66. [Google Scholar] [CrossRef]
- Ribeiro, M.; Lazzaretti, A.E.; Lopes, H.S. A study of deep convolutional auto-encoders for anomaly detection in videos. Pattern Recognit. Lett. 2018, 105, 13–22. [Google Scholar] [CrossRef]
- Masci, J.; Meier, U.; Cireşan, D.; Schmidhuber, J. Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction. In Artificial Neural Networks and Machine Learning—ICANN 2011; Honkela, T., Duch, W., Girolami, M., Kaski, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 52–59. [Google Scholar] [CrossRef]
- Jovanovic, M.; Campbell, M. Generative Artificial Intelligence: Trends and Prospects. Computer 2022, 55, 107–112. [Google Scholar] [CrossRef]
- Simmler, N.; Sager, P.; Andermatt, P.; Chavarriaga, R.; Schilling, F.-P.; Rosenthal, M.; Stadelmann, T. A Survey of Un-, Weakly-, and Semi-Supervised Learning Methods for Noisy, Missing and Partial Labels in Industrial Vision Applications. In Proceedings of the 2021 8th Swiss Conference on Data Science (SDS), Lucerne, Switzerland, 9 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 26–31. [Google Scholar] [CrossRef]
- Yu, J.; Kim, J.-G.; Gwak, J.; Lee, B.-G.; Jeon, M. Abnormal event detection using adversarial predictive coding for motion and appearance. Inf. Sci. 2022, 586, 59–73. [Google Scholar] [CrossRef]
- Huang, C.; Wen, J.; Xu, Y.; Jiang, Q.; Yang, J.; Wang, Y.; Zhang, D. Self-Supervised Attentive Generative Adversarial Networks for Video Anomaly Detection. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 9389–9403. [Google Scholar] [CrossRef]
- Antoine, V.; Guerrero, J.A.; Romero, G. Possibilistic fuzzy c-means with partial supervision. Fuzzy Sets Syst. 2022, 449, 162–186. [Google Scholar] [CrossRef]
- Oliver, A.; Odena, A.; Raffel, C.; Cubuk, E.D.; Goodfellow, I.J. Realistic evaluation of deep semi-supervised learning algorithms. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 3239–3250. [Google Scholar]
- Tian, Z.; Wang, W.; Zhou, K.; Song, X.; Shen, Y.; Liu, S. Weighted Pseudo-Labels and Bounding Boxes for Semisupervised SAR Target Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5193–5203. [Google Scholar] [CrossRef]
- Park, S.; Kim, H.; Kim, M.; Kim, D.; Sohn, K. Normality Guided Multiple Instance Learning for Weakly Supervised Video Anomaly Detection. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2664–2673. [Google Scholar] [CrossRef]
- Xu, Z.; Zeng, X.; Ji, G.; Sheng, B. Improved Anomaly Detection in Surveillance Videos with Multiple Probabilistic Models Inference. Intell. Autom. Soft Comput. 2022, 31, 1703–1717. [Google Scholar] [CrossRef]
- Peyre, J.; Laptev, I.; Schmid, C.; Sivic, J. Weakly-Supervised Learning of Visual Relations. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 5189–5198. [Google Scholar] [CrossRef]
- Vu, T.-H.; Boonaert, J.; Ambellouis, S.; Taleb-Ahmed, A. Multi-Channel Generative Framework and Supervised Learning for Anomaly Detection in Surveillance Videos. Sensors 2021, 21, 3179. [Google Scholar] [CrossRef] [PubMed]
- Yu, B.X.B.; Chang, J.; Wang, H.; Liu, L.; Wang, S.; Wang, Z.; Lin, J.; Xie, L.; Li, H.; Lin, Z.; et al. Visual Tuning. ACM Comput. Surv. 2024. [Google Scholar] [CrossRef]
Survey (Year) | Scope: Datasets | Scope: Deep Learning | Scope: Application | Scope: Metrics Comparison | Merits | Limitations |
---|---|---|---|---|---|---|
Patrikar and Parate [60] (2022) | ✓ | P | ✓ | ✓ | | Lacks thorough exploration of machine learning in the context of abnormal behavior detection |
Myagmar-Ochir and Kim [101] (2023) | P | P | ✓ | P | Surveys VSS for smart city applications | The explanation of unsupervised learning methods is incomplete |
Duong, Le, and Hoang [98] (2023) | ✓ | ✓ | - | - | | Does not include quantitative comparisons using metrics among research results |
Choudhry et al. [59] (2023) | ✓ | ✓ | ✓ | P | | The scope does not focus on image-based detection |
Ours (2024) | ✓ | ✓ | ✓ | ✓ | | |
Dataset | Characteristics | Merits | Challenges | Composition |
---|---|---|---|---|
Ped1 [171] | Anomalies include bikers, skaters, small carts, and people crossing | Ground truth annotations provided with binary flags per frame. Some clips include pixel-level masks for anomaly localization assessment | Perspective distortion in the scene, which might limit generalization | 34 training videos, 36 test videos |
Ped2 [171] | Pedestrian movement parallel to the camera plane from a top angle | Focuses on abnormal pedestrian motion patterns. Ground truth annotations provided with binary flags per frame. Some clips include pixel-level masks for anomaly localization assessment | Smaller dataset compared to Ped1 | 16 training videos, 12 testing videos |
ST [172] | Includes abnormal behaviors caused by sudden motion, such as chasing and brawling | Pixel-level ground truth annotations of abnormal events | Complex lighting conditions and camera angles from all sides | 270,000 training frames, 13 anomaly scenes |
UCF-Crime [173] | Anomalies in real-world environments include abuse, arrest, arson, assault, accident, burglary, explosion, fighting, robbery, shooting, stealing, shoplifting, and vandalism | Extensive and diverse anomaly types relevant to public safety, with high-quality annotations by trained annotators, video-level labels for training, temporal annotations for testing, and a balanced set of 950 anomalous and 950 normal videos | Limited to surveillance footage, excluding other potential sources of anomalies | 1900 videos with 13 classes |
CUHK [174] | Contains unusual events such as running, throwing objects, and loitering | High frame rate detection (141.34 fps) | Unusual incidents with slight camera shake | 30,652 frames, 14 abnormal classes |
UMN [175] | Crowd behavior scenarios, where each video consists of a normal starting section and an abnormal ending section | Focuses on crowd behavior under panic conditions | Abnormal behavior typically appears at the end of the videos, which can lead to model overfitting | 22 videos, 11 abnormal scenarios |
UBI-Fights [178] | Various fighting scenarios in indoor and outdoor environments, with videos resized to 640 × 360 pixels and set to 30 fps | Provides a wide diversity of fighting scenarios with detailed frame-level annotations | Imbalance between fight and normal videos | 1000 videos, where 216 videos contain a fight event, and 784 depict normal daily life situations |
Category | Research | Anomaly Datasets | AUC (Ped1) | AUC (Ped2) | AUC (CUHK) | Strengths | Drawbacks |
---|---|---|---|---|---|---|---|
Auto-encoders (AE) | Wang et al. [103] (2019) | Ped1, Ped2, UMN | 0.897 | 0.913 | N/A | Proposed a network for detecting abnormal events, which integrates a PCA network with kernel PCA | Reliance on hyperparameters and foreground detection may lead to false negatives by erroneously removing valid objects |
Auto-encoders (AE) | Hu et al. [104] (2019) | Ped1, Ped2, CUHK | 0.809 | 0.959 | 0.842 | Developed a three-stage framework for fast unsupervised anomaly detection in videos | May fail to detect instances such as a person walking with a bike |
Auto-encoders (AE) | Liu and Zhou [105] (2022) | Ped1, Ped2, CUHK | N/A | 0.968 | 0.875 | Proposed a memory-based connected network for video anomaly detection, utilizing an auto-encoder for reconstruction | The scoring threshold must be tuned for each environment |
Auto-encoders (AE) | Chang et al. [106] (2022) | Ped2, ST, CUHK | N/A | 0.967 | 0.871 | Proposed an auto-encoder for learning spatial and temporal regularity | Only detected abnormal events without classifying the object |
Auto-encoders (AE) | Wang et al. [107] (2022) | Ped1, Ped2, ST, CUHK | 0.849 | 0.964 | 0.883 | Proposed unsupervised video anomaly detection with frame prediction and noise tolerance loss | Requires strategies for hyperparameter selection and model inference to ensure efficiency and accuracy |
Auto-encoders (AE) | Wang et al. [108] (2023) | Ped2, ST, CUHK | N/A | 0.984 | 0.861 | Proposed a pluggable spatio-temporal relationship attention module for indicating object relationships | Unable to fully utilize and understand the implicit video information |
Auto-encoders (AE) | Liu et al. [109] (2023) | Ped2, ST, CUHK | N/A | 0.983 | 0.917 | Proposed object-centric scene inference network for unsupervised video anomaly detection | Unable to identify the relationship between moving objects and background scenes |
Auto-encoders (AE) | Li et al. [110] (2023) | ST, CUHK | N/A | N/A | 0.883 | Proposed unsupervised algorithm based on skeleton features, eliminating manual specification of normal training data | May miss detecting some instances of abnormal pedestrian brawling but accurately identifies normal walking |
Auto-encoders (AE) | Yan et al. [9] (2023) | ST, UCF-Crime | N/A | N/A | N/A | Utilized auto-encoders and memory clustering to detect abnormal human actions | Challenges in crowd human pose prediction and conflicts in auto-encoders and clustering training |
Auto-encoders (AE) | Sampath and Kumar [111] (2023) | Ped1, Ped2, UMN | 0.902 | 0.997 | N/A | Proposed a spatiotemporal inter-fused auto-encoder for abnormal behavior detection | Reliant on a single modality, using only cameras for abnormal behavior detection |
Variational Auto-encoders (VAE) | Wang et al. [112] (2019) | Ped1, CUHK, UMN, PETS | 0.943 | N/A | 0.876 | Used two VAEs for anomaly detection in crowded scenes | Very challenging due to the frame complexity |
Variational Auto-encoders (VAE) | Xu et al. [113] (2019) | Ped1, Ped2, ST | 0.957 | 0.923 | N/A | Introduced novel unsupervised VAE-based video anomaly detection approach | Dataset failure cases can hinder abnormal behavior detection performance |
Variational Auto-encoders (VAE) | Yan et al. [114] (2020) | Ped1, Ped2, CUHK, Subway | 0.750 | 0.910 | 0.796 | Proposed two-stream VAE structure: appearance and motion streams | Requires additional resources for optical flow computation |
Variational Auto-encoders (VAE) | Wang et al. [115] (2021) | Ped2, ST | N/A | 0.962 | N/A | Proposed a cognitive memory-augmented network for decision-making based on past memory | Challenging to obtain normal sample distribution due to the dataset size |
Variational Auto-encoders (VAE) | Cho et al. [116] (2022) | Ped2, ST, UCF-Crime, CUHK, LV, UBI-Fights | N/A | 0.992 | 0.880 | Proposed implicit two-path auto-encoder with normal feature distribution modeling using normalizing flow | AE and normalizing flow model struggle to distinguish abnormal scenes due to visual similarity |
Variational Auto-encoders (VAE) | Huang et al. [117] (2022) | Ped2, CUHK, ST | N/A | 0.981 | 0.888 | Proposed temporal-aware contrastive network for unsupervised AHB detection | Requires hyperparameter tuning to balance contrastive loss and task loss |
Variational Auto-encoders (VAE) | Wang et al. [118] (2022) | Ped1, Ped2, CUHK | 0.884 | 0.888 | 0.872 | Proposed double-flow convolutional LSTM with VAE probability calculation results | Challenging to detect small foreground target objects |
Variational Auto-encoders (VAE) | Slavic et al. [119] (2022) | Subway, CUHK | N/A | N/A | 0.862 | Proposed self-aware embodied agents for abnormal behavior detection, leveraging VAE regularization features | Challenging to detect camouflaged human objects in the background |
Variational Auto-encoders (VAE) | Liu et al. [120] (2023) | Ped2, ST, CUHK | N/A | 0.984 | 0.907 | Proposed stochastic video normality network for unsupervised anomaly detection | Highly sensitive to hyperparameter settings |
Convolutional Auto-encoders (CAE) | Chu et al. [121] (2019) | Ped1, Ped2, CUHK, Subway | 0.909 | 0.902 | 0.937 | Presented novel unsupervised spatiotemporal feature learning for video anomaly detection | Performance still unsatisfactory compared to fully supervised learning, which has made great progress |
Convolutional Auto-encoders (CAE) | Duman and Erdem [122] (2019) | Ped1, Ped2, CUHK | 0.924 | 0.929 | 0.895 | Detected AHB by generating reconstructed dense optical flow maps | Struggles to model distant activities |
Convolutional Auto-encoders (CAE) | Yan et al. [123] (2020) | Ped2, CUHK | N/A | 0.892 | N/A | Developed a 3D CAE for spatiotemporal irregularity detection in videos | Deeper layers in 3D convolutional auto-encoder may be unhelpful due to limited data |
Convolutional Auto-encoders (CAE) | Bahrami et al. [124] (2021) | Ped2, ST, CUHK | N/A | 0.975 | 0.801 | Proposed single-frame analysis and consideration of consecutive frames | Increased training time due to larger spatiotemporal architecture parameters |
Convolutional Auto-encoders (CAE) | Asad et al. [125] (2021) | Ped1, Ped2, CUHK, ST, Subway | 0.898 | 0.958 | 0.892 | Proposed two-staged CAE framework for AHB detection | Takes a long time to train due to a large number of backpropagation iterations |
Convolutional Auto-encoders (CAE) | Li et al. [126] (2021) | Ped1, Ped2, CUHK, ST | 0.850 | 0.951 | 0.888 | Proposed CAE with extractor and latent code prediction for future frames | As training anomalies increase, the AUC score decreases |
Convolutional Auto-encoders (CAE) | Wang et al. [127] (2022) | Ped2, CUHK | N/A | 0.953 | 0.840 | Combined criss-cross attention and bi-directional ConvLSTM in auto-encoder for AHB detection | AUC score improvement possible with added spatial and temporal features |
Convolutional Auto-encoders (CAE) | Kommanduri and Ghorai [128] (2023) | Ped1, Ped2, CUHK | 0.847 | 0.977 | 0.867 | Designed an end-to-end trainable bi-residual convolutional auto-encoder with long–short projection skip connections | Suffers from visual similarity and occlusions |
Convolutional Auto-encoders (CAE) | Taghinezhad and Yazdi [129] (2023) | Ped1, Ped2, CUHK | 0.838 | 0.976 | 0.890 | Introduced unsupervised video anomaly detection framework based on frame prediction | Significant improvements were not achieved in refined abnormality scores due to noise |
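
For readers unfamiliar with the AUC columns in these comparison tables, the following is a minimal sketch of how a frame-level AUC can be obtained for a reconstruction-based detector. It assumes that each frame is given as a flat list of pixel intensities, that the anomaly score is the min-max normalised mean squared reconstruction error, and that ground truth is a binary flag per frame. The function names `frame_scores` and `roc_auc` are hypothetical, and the surveyed papers may use different scoring or evaluation protocols.

```python
def frame_scores(frames, reconstructions):
    """Per-frame anomaly score (assumed form): mean squared reconstruction
    error, min-max normalised to [0, 1] over the whole video."""
    errors = []
    for frame, recon in zip(frames, reconstructions):
        squared = [(a - b) ** 2 for a, b in zip(frame, recon)]
        errors.append(sum(squared) / len(squared))
    lo, hi = min(errors), max(errors)
    if hi == lo:  # all frames reconstructed equally well
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen abnormal frame scores higher than a randomly chosen normal frame
    (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one normal and one abnormal frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: four two-pixel frames; the last two are labelled abnormal.
frames = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
recons = [[0.0, 0.1], [0.1, 0.0], [0.5, 0.5], [0.9, 0.9]]
labels = [0, 0, 1, 1]
print(roc_auc(labels, frame_scores(frames, recons)))  # 1.0
```

The pairwise comparison is quadratic in the number of frames, which is acceptable for illustration; a rank-based formulation gives the same value more efficiently on full test sets.
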
Research | Anomaly Datasets | AUC (Ped1) | AUC (Ped2) | AUC (CUHK) | AUC (ST) | Strengths | Drawbacks |
---|---|---|---|---|---|---|---|
Li and Chang [131] (2019) | Ped1, Ped2, CUHK, UMN | 0.850 | 0.916 | 0.842 | N/A | Built on a two-stream framework for simultaneous appearance and motion anomaly detection | The lower AUC value is due to the noise removal of abnormal frames |
Li et al. [133] (2019) | Ped1, Ped2, CUHK | 0.838 | 0.966 | 0.845 | N/A | Proposed novel spatiotemporal framework for video anomaly detection | Often fails to capture spatial characteristics due to camera angles |
Ganokratanaa et al. [132] (2020) | Ped1, Ped2, UMN, CUHK | 0.985 | 0.955 | 0.879 | N/A | Proposed a spatiotemporal AHB detection and localization | Fails to detect abnormal events with similar object speeds |
Wu et al. [137] (2021) | Ped1, Ped2, CUHK, ST | 0.885 | 0.989 | 0.847 | 0.728 | Used two independent GANs to predict optical flows or color frames | The model needs updates. |
Yang et al. [138] (2021) | CUHK, Ped1, Ped2, ST | 0.847 | 0.976 | 0.886 | 0.745 | Proposed bidirectional prediction generator: forward and backward | The model struggles with small human objects in the presence of perspective distortion |
Ganokratanaa et al. [140] (2022) | Ped1, Ped2, CUHK, UMN | 0.988 | 0.976 | 0.908 | N/A | Introduced unsupervised deep residual spatiotemporal translation network for video anomaly detection and localization | May struggle to distinguish similar abnormal events from normal patterns |
Yu et al. [141] (2022) | Ped1, Ped2, CUHK, Subway, UCF-Crime | 0.979 | 0.979 | 0.949 | N/A | Proposed adversarial event prediction to detect rare pattern events in abnormal human behaviors | Absence of background detection preprocessing leads to slightly lower performance metrics in various scenarios |
Zhong et al. [142] (2022) | Ped1, Ped2, CUHK, ST | 0.826 | 0.977 | 0.889 | 0.707 | Proposed cascade model: frame reconstruction and optical flow network with GAN | The average optical flow prediction error of normal frames increases due to perspective phenomena in datasets |
Aslam et al. [143] (2022) | Ped1, Ped2, CUHK, ST | 0.907 | 0.977 | 0.894 | 0.869 | Proposed end-to-end trainable two-stream attention-based adversarial auto-encoder network | Struggles to learn typical features with small datasets |
Hao et al. [144] (2022) | Ped1, Ped2, CUHK, ST | 0.825 | 0.969 | 0.866 | 0.738 | Proposed spatiotemporal consistency-enhanced network | Based on 3D CNN, struggles to converge if object size varies significantly |
Yu et al. [205] (2022) | Ped1, Ped2, CUHK, UCF-Crime | 0.975 | 0.971 | 0.947 | N/A | Proposed adversarial predictive coding for abnormal event detection and localization | Requires large-scale dataset and motion data but increases the computational cost |
Huang et al. [147] (2023) | Ped2, ST, CUHK | N/A | 0.977 | 0.897 | 0.758 | Predicted future frames using previous video frames and optical flow | Requires computing a large number of parameters |
Huang et al. [206] (2023) | Ped1, Ped2, ST, CUHK | 0.921 | 0.976 | 0.888 | 0.743 | Proposed self-supervised attentive GAN for video anomaly detection | Detection speed decreases with increasing input frame numbers |
Li et al. [148] (2023) | Ped2, CUHK, ST | N/A | 0.968 | 0.887 | 0.767 | Explored the feasibility of adversarial composite prediction for learning normal video dynamics | Difficult to determine skip intervals when foreground motion amplitudes are large |
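The AUC values reported in these comparison tables summarize how well a model's anomaly scores separate abnormal frames from normal ones. As a minimal sketch, assuming per-frame anomaly scores and binary ground-truth labels (1 for abnormal, 0 for normal), the AUC can be computed as the fraction of (abnormal, normal) frame pairs that are ranked correctly. The function name `frame_level_auc` and the sample values are hypothetical; the individual papers may use different evaluation protocols.

```python
import numpy as np


def frame_level_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen abnormal frame
    receives a higher score than a randomly chosen normal frame,
    with ties counted as half a correct ranking.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    abnormal = scores[labels == 1]
    normal = scores[labels == 0]
    if abnormal.size == 0 or normal.size == 0:
        raise ValueError("need at least one abnormal and one normal frame")
    # Compare every abnormal score with every normal score.
    diff = abnormal[:, None] - normal[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (abnormal.size * normal.size))


# Hypothetical scores for four frames; the last two are abnormal.
example_auc = frame_level_auc([0.1, 0.8, 0.2, 0.9], [0, 0, 1, 1])
print(example_auc)
```

The pairwise comparison is quadratic in the number of frames, which keeps the sketch short; for full test sets a sort-based implementation is the usual choice.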
Approach | Research | Anomaly Datasets | AUC (Ped1) | AUC (Ped2) | AUC (UCF-Crime) | AUC (ST) | Strengths | Drawbacks |
---|---|---|---|---|---|---|---|---|
Semi-supervised | Sikdar and Chowdhury [150] (2020) | Ped1, Ped2, CUHK, UMN, ST | 0.945 | 0.979 | N/A | N/A | Proposed adaptive training-less anomaly detection method | Performance lag due to sparse dataset and difficulty in constructing local descriptors |
Semi-supervised | Singh et al. [151] (2021) | CUHK, PETS, UMN | N/A | N/A | N/A | N/A | Proposed algorithm for suspicious event detection based on direction and magnitude | Not suitable for real-time application due to time-consuming optical flow calculation for each frame |
Semi-supervised | Wu et al. [137] (2021) | Ped1, CUHK | 0.885 | 0.989 | N/A | N/A | Implemented semi-supervised re-learning scheme to boost the baseline approach. Constructed a new training set selectively from the original testing set | Model performance is positively related to the baseline deep model, but occasional failure cases still occur |
Weakly supervised | Li et al. [145] (2022) | Ped1, Ped2, UCF-Crime, ST | 0.833 | 0.954 | 0.785 | 0.903 | Proposed attention-based multiple-instance learning using attention-based features and a stringent loss | Not robust to significant occlusion |
Weakly supervised | Hu et al. [154] (2020) | Ped1, Ped2, CUHK, UMN, Subway | N/A | N/A | N/A | N/A | Trained a discriminative classifier for anomaly detection with weakly labeled data | Unable to achieve end-to-end detection of abnormal behaviors |
Weakly supervised | Degardin and Proença [155] (2021) | Ped1, Ped2, UBI-Fights, UCF-Crime | 0.819 | 0.819 | 0.769 | N/A | Introduced an iterative learning framework, based on weakly and self-supervised paradigms | Performance gap between indoor and outdoor scenarios |
Weakly supervised | Ullah et al. [156] (2022) | UCF-Crime, Surveillance Fight, Hockey Fight, Violent Flows, ST | N/A | N/A | 0.858 | 0.849 | Introduced a dual-stream CNN framework for detecting anomalous events in surveillance and non-surveillance environments | Some highly complex video sequences are mispredicted, contributing to model failure cases |
Weakly supervised | Yi et al. [157] (2022) | UCF-Crime, ST | N/A | N/A | 0.843 | 0.977 | Presented a scheme to assess anomaly degree and used triplet loss to optimize the network | Limited discrimination for unseen normal events, leading to high false alarm rates |
Weakly supervised | Liu et al. [158] (2022) | Ped2, ST, UCF-Crime | N/A | 0.914 | 0.831 | 0.882 | Proposed a collaborative normality learning framework to address weakly supervised video anomaly detection | Some false detection cases due to image obscuration and low resolution |
Weakly supervised | Ullah et al. [162] (2023) | Ped2, CUHK, ST | N/A | 0.984 | N/A | 0.946 | Proposed a weakly supervised hybrid CNN- and transformer-based framework to learn anomalous events using video-level labels | Transformer approach requires more computational resources due to model parameter variation |
Weakly supervised | Shao et al. [163] (2023) | UCF-Crime, ST | N/A | N/A | 0.851 | 0.953 | Enhanced temporal features for the entire video sequence, redefining integrity and coherence | Limited interpretability |
Weakly supervised | Chen et al. [164] (2023) | Ped2, ST, UCF-Crime, TAD | N/A | 0.974 | 0.803 | 0.972 | Proposed a spatial-temporal graph attention network to address video anomaly detection | Local discriminative representations may deteriorate in long videos with complex scenes, resulting in underfitting |
Weakly supervised | Tang et al. [165] (2023) | UCF-Crime, ST | N/A | N/A | 0.843 | 0.967 | Prior knowledge guided pseudo label generator and improved self-guided attention encoder | High training time cost, and pseudo-label generator not robust enough |
Weakly supervised | Zhang and Xue [166] (2023) | UCF-Crime, Ped2 | N/A | 0.941 | 0.832 | N/A | Proposed sub-Max method for anomaly detection | Pixel-level AUC result is suboptimal |
Weakly supervised | Wang et al. [167] (2023) | UCF-Crime, ST | N/A | N/A | 0.815 | 0.940 | Proposed attention mechanism-guided multi-instance learning weakly supervised video anomaly detection method | Difficulty in detecting anomalies in low-resolution video, challenging to evaluate confusing actions without additional context |
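Several weakly supervised entries in this comparison rely on multiple-instance learning with video-level labels: each video is treated as a bag of segments, and only the bag is labeled. The following is a minimal sketch of one common formulation, a hinge ranking loss that pushes the highest-scoring segment of an abnormal video above the highest-scoring segment of a normal video. It is an illustrative assumption, not the loss used by any specific paper listed here; the function name `mil_ranking_loss`, the `margin` parameter, and the sample scores are hypothetical.

```python
import numpy as np


def mil_ranking_loss(abnormal_bag, normal_bag, margin=1.0):
    """Hinge ranking loss between two bags of segment scores.

    abnormal_bag: scores for segments of a video labeled abnormal
    normal_bag:   scores for segments of a video labeled normal
    The loss is zero once the top abnormal segment exceeds the
    top normal segment by at least `margin`.
    """
    top_abnormal = float(np.max(abnormal_bag))
    top_normal = float(np.max(normal_bag))
    return max(0.0, margin - top_abnormal + top_normal)


# Hypothetical segment scores in [0, 1] for two videos.
loss = mil_ranking_loss([0.1, 0.9, 0.3], [0.2, 0.3, 0.1])
print(loss)
```

Only the maximum of each bag enters the loss, which reflects the weak label: an abnormal video is assumed to contain at least one abnormal segment, while every segment of a normal video is assumed normal.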
Research | Anomaly Datasets | Framework | Strengths | Drawbacks |
---|---|---|---|---|
Espinosa et al. [84] (2019) | UP-Fall | CNN | Presented multi-camera vision-based fall detection and classification system using CNN | The efficacy depends on image quality, camera position, and subject presence |
Gomes et al. [85] (2022) | AVA, MCF, UR-Fall | CNN, Kalman Filter | Combined CNN and Kalman filter for fall tracking | Fall detection faces challenges in scenarios like crouching and sitting |
Sivachandiran et al. [88] (2022) | VOC2007, Penn-Fudan | CNN | Enhanced model for person detection and tracking on surveillance videos | Difficulty in hyperparameter tuning for optimum results |
Ahn et al. [89] (2023) | OpenImages | CNN | Designed vision-based factory safety monitoring system for detecting human presence on assembly lines | Low detection performance due to lens distortion issues |
Michael Onyema et al. [90] (2023) | CMU, UT-Interaction, PEL, Hockey Fight, WED, Ped1, Ped2 | CNN | Designed slow–fast CNN for abnormal behavior identification in surveillance videos | Consumes more computational time, increasing costs |
Hussain et al. [91] (2023) | HMDB51, UCF-50, YouTube Action | Dual-stream CNN | Proposed dual-stream network combining image enhancement, convolutional, and transformer techniques | Unable to detect motion on edge devices |
Ullah and Munir [92] (2023) | HMDB51, UCF-50, UCF-101, YouTube Action, Kinetics-600 | DA-CNN, Bi-GRU | Proposed cascaded spatial-temporal discriminative feature-learning framework for human activity recognition in video streams | May produce non-zero probabilities for some action classes |
Kshirsagar and Azath [94] (2023) | YouTube Action | CNN | Used heuristic-assisted deep learning techniques for detecting suspicious human activities at automated teller machines | Accuracy is slightly lower than state-of-the-art methods |
Approach | Methods | Advantages | Disadvantages |
---|---|---|---|
Unsupervised | Reconstruction-based Detection (AE, VAE, CAE) | | |
Unsupervised | Generative Detection (GAN) | | |
Partially Supervised | Semi-supervised Detection, Weakly Supervised Detection | | |
Fully Supervised | CNN, LSTM, GRU | | |
Approach | Methods | Open Research Issues |
---|---|---|
Unsupervised | Reconstruction-based Detection (AE, VAE, CAE) | |
Unsupervised | Generative Detection (GAN) | |
Partially Supervised | Semi-supervised Detection, Weakly Supervised Detection | |
Fully Supervised | CNN, LSTM, GRU | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wastupranata, L.M.; Kong, S.G.; Wang, L. Deep Learning for Abnormal Human Behavior Detection in Surveillance Videos—A Survey. Electronics 2024, 13, 2579. https://doi.org/10.3390/electronics13132579