Criminal Intention Detection at Early Stages of Shoplifting Cases by Using 3D Convolutional Neural Networks
Figure 1. Different situations may be recorded by surveillance cameras. Suspicious behavior is not the crime itself; however, particular situations will make us distrust a person if we consider their behavior to be “suspicious”.
Figure 2. Video segmentation using the moments obtained from the Pre-Crime Behavior (PCB) method.
Figure 3. Graphical representation of the process for suspicious behavior sample extraction.
Figure 4. Architecture of the DL model used for this investigation. The depth of the kernel for the 3D convolution is adjusted to 10, 30, or 90 frames, according to each particular experiment (see Section 4).
Figure 5. Overview of the experimental setup followed in this work. For a detailed description of the parameters and the list of samples considered for each experiment, please consult Appendix A.
Figure 6. Interaction plot of depth (10, 30, and 90 frames) and resolution (32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels) using the accuracy values obtained from experiment P01.
Figure 7. Interaction plot of the proportion of the base set used for training (80%, 70%, and 60%) and resolution (32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels) using the accuracy values obtained from experiment P02.
Figure 8. Interaction plot of depth (10, 30, and 90 frames) and resolution (32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels) using the accuracy values obtained from experiment P03.
Figure 9. Interaction plot of depth (10, 30, and 90 frames) and resolution (32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels) using the accuracy values obtained from experiment P04 (using 60% of the dataset for training).
Figure 10. Interaction plot of depth (10, 30, and 90 frames) and resolution (32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels) using the accuracy values obtained from experiment P04 (using 70% of the dataset for training).
Figure 11. Confusion matrices for the best model generated for each configuration in the confirmatory experiment.
Abstract
1. Introduction
- It describes a methodology, the PCB method, to unify the processing and division of criminal video samples into useful segments that can later be used to feed a Deep Learning (DL) model.
- It represents the first implementation of a 3DCNN architecture to detect criminal intentions before an offender shows suspicious behavior.
- It provides a set of experiments to validate the results, confirming that the proposed approach is suitable for such a challenging task: to detect criminal intention even before the suspect begins to behave suspiciously.
2. Background and Related Work
3. Methodology
3.1. Description of the Dataset
3.2. The Pre-Crime Behavior Method
- Identify the instant where the offender appears for the first time in the video. We refer to this moment as the First Appearance Moment (FAM). The analysis of suspicious behavior starts from this moment.
- Detect the moment when the offender undoubtedly commits a crime. This moment is referred to as the Strict Crime Moment (SCM). It contains the evidence necessary to argue that a crime has been committed.
- Between the FAM and the SCM, find the moment where the offender starts acting suspiciously. The Comprehensive Crime Moment (CCM) starts as soon as we detect that the offender acts suspiciously in the video.
- After the SCM, locate the moment where the crime ends (when everything seems to be ordinary again). If the video sample started from this instant, we would have no evidence of any crime committed in the past. This moment is known as the Back to Normality Moment (B2NM).
- Pre-Crime Behavior Segment (PCBS). The PCBS is the video segment between the FAM and the CCM. This segment contains the information needed to study how people behave before committing a crime or even acting suspiciously; most human observers would fail to predict that a crime is about to occur by watching only the PCBS (see the code sketch after this list).
- Suspicious Behavior Segment (SBS). The SBS is the video segment contained between the CCM and the SCM. The SBS provides specific information about an offender’s behavior before committing a crime.
- Crime Evidence Segment (CES). The CES is the video segment between the SCM and the B2NM. This segment contains the evidence needed to accuse a person of committing a crime.
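To make the segmentation concrete, the following Python sketch cuts an annotated clip into the three segments defined above, given the four moments as frame indices. The names used here (`PCBAnnotation`, `segment_video`) and the example indices are illustrative assumptions; the paper annotates these moments manually and does not publish this code.

```python
# Minimal sketch of the PCB segmentation step, assuming the four moments
# (FAM, CCM, SCM, B2NM) have been annotated manually as frame indices.
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PCBAnnotation:
    fam: int   # First Appearance Moment
    ccm: int   # Comprehensive Crime Moment (suspicious behavior starts)
    scm: int   # Strict Crime Moment (crime is undoubtedly committed)
    b2nm: int  # Back to Normality Moment


def segment_video(frames: Sequence, ann: PCBAnnotation):
    """Split a frame sequence into the three PCB segments."""
    assert ann.fam <= ann.ccm <= ann.scm <= ann.b2nm, "moments must be ordered"
    pcbs = frames[ann.fam:ann.ccm]   # Pre-Crime Behavior Segment
    sbs = frames[ann.ccm:ann.scm]    # Suspicious Behavior Segment
    ces = frames[ann.scm:ann.b2nm]   # Crime Evidence Segment
    return pcbs, sbs, ces


# Example: a 900-frame clip annotated at frames 50, 300, 600, and 850.
if __name__ == "__main__":
    frames = list(range(900))  # placeholder for decoded video frames
    pcbs, sbs, ces = segment_video(frames, PCBAnnotation(50, 300, 600, 850))
    print(len(pcbs), len(sbs), len(ces))  # 250 300 250
```

In this setting, the PCBS is the segment of interest, since it precedes any visibly suspicious behavior.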
3.3. 3D Convolutional Neural Networks
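Since this section introduces 3D convolutions without listing code, here is a minimal Keras-style sketch of a 3DCNN binary classifier whose input follows the (depth, height, width, channels) convention used for the depth and resolution parameters in Section 4. The filter counts, pooling sizes, and optimizer are assumptions for illustration only; the architecture actually evaluated is the one shown in Figure 4.

```python
# Minimal 3DCNN sketch (Keras), assuming grayscale clips of shape
# (depth, height, width, 1). Layer sizes are illustrative, not the exact
# architecture reported in the paper (see Figure 4).
import tensorflow as tf
from tensorflow.keras import layers, models


def build_3dcnn(depth: int = 10, height: int = 60, width: int = 80) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(depth, height, width, 1)),
        layers.Conv3D(32, kernel_size=(3, 3, 3), activation="relu", padding="same"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),
        layers.Conv3D(64, kernel_size=(3, 3, 3), activation="relu", padding="same"),
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(2, activation="softmax"),  # normal vs. suspicious
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Example: a model for 10-frame clips at 80 x 60 pixels.
model = build_3dcnn(depth=10, height=60, width=80)
model.summary()
```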
3.4. Metrics
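The experiments report accuracy values and confusion matrices over the normal and suspicious classes. Assuming the standard definitions (with FP and FN denoting the misclassified samples of each class, alongside the TP and TN listed in the abbreviations), accuracy is:

```latex
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}
```

For example, a test set with 20 TP, 18 TN, 4 FP, and 6 FN gives ACC = 38/48 ≈ 79.2%.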
4. Experiments and Results
4.1. Preliminary Experiments
- Training set size. The percentage of samples from the base dataset used for training. The possible values are 80%, 70%, and 60%. Note that, in an attempt to test our approach in different situations, the base dataset changes for each particular experiment.
- Depth. The number of consecutive frames used for 3D convolution. The values allowed for this parameter are 10, 30, and 90 frames.
- Resolution. The size of the input images (in pixels). We used four different values for resolution: 32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels.
- Flip. As discussed before, we applied a horizontal flipping procedure to all the samples to increase their number, which doubled the size of the set. Stating that a set has been flipped indicates that the frames in those videos have been mirrored horizontally (a sketch of this procedure follows this list).
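As an illustration of how depth and the flipping procedure shape the training samples, the sketch below splits a grayscale video into fixed-depth clips and doubles the set by mirroring each clip. The function names and NumPy-based preprocessing are assumptions; the authors' actual preprocessing pipeline is not reproduced here.

```python
# Minimal sketch of clip preparation and horizontal-flip augmentation,
# assuming a video is a NumPy array of grayscale frames (height, width).
import numpy as np
from typing import List


def to_clips(frames: np.ndarray, depth: int) -> List[np.ndarray]:
    """Split a (num_frames, height, width) array into non-overlapping clips
    of `depth` consecutive frames, dropping any incomplete tail."""
    n_clips = frames.shape[0] // depth
    return [frames[i * depth:(i + 1) * depth] for i in range(n_clips)]


def flip_horizontally(clip: np.ndarray) -> np.ndarray:
    """Mirror every frame of a clip along its width axis."""
    return clip[:, :, ::-1]


# Example: a fake 100-frame video at 80 x 60 pixels, depth = 10.
video = np.random.rand(100, 60, 80).astype(np.float32)
clips = to_clips(video, depth=10)
augmented = clips + [flip_horizontally(c) for c in clips]  # doubles the samples
print(len(clips), len(augmented))  # 10 20
```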
4.1.1. Experiment P01—Effect of the Depth (In Balanced Datasets)
4.1.2. Experiment P02—Effect of the Training Set Size (In Balanced Datasets)
4.1.3. Experiment P03—Effect of the Depth (In Unbalanced Datasets)
4.1.4. Experiment P04—Effect of the Data Augmentation Technique (In Balanced Datasets)
4.2. Confirmatory Experiments
4.2.1. Experiment C01—Effect of the Depth (In Larger Balanced and Unbalanced Datasets with Data Augmentation)
4.2.2. Experiment C02—Aiming for the Best Model
4.3. Discussion
- The system can be used to classify normal and suspicious behavior given the proper conditions.
- The PCB method exhibits some limitations, as it is still a manual process.
- The training times observed suggest that training time is not necessarily related to accuracy.
- There is an apparent relationship between a model's performance and its number of parameters.
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
3DCNN | Three-Dimensional Convolutional Neural Network |
ACC | Accuracy |
ANOVA | Analysis of Variance |
B2NM | Back to Normality Moment |
CCM | Comprehensive Crime Moment |
CES | Crime Evidence Segment |
DL | Deep Learning |
FAM | First Appearance Moment |
GPU | Graphics Processing Unit |
NRSS | National Retail Security Survey |
PCB | Pre-Crime Behavior method |
PCBS | Pre-Crime Behavior Segment |
SBS | Suspicious Behavior Segment |
SCM | Strict Crime Moment |
TN | True Negative |
TP | True Positive |
VFOA | Visual Focus Of Attention |
Appendix A. Experimental Description
- Preliminary experiment P01:
- Preliminary experiment P02:
- Preliminary experiment P03:
- Preliminary experiment P04:
  - Training set size: 60% and 70% of the base dataset.
  - Depth: 10 frames.
  - Resolutions: 32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels.
  - Epochs: 100.
  - Runs: Three per configuration.
- Confirmatory experiment C01:
  - Training set size: 70% of the base dataset.
  - Depth: 10, 30, and 90 frames.
  - Resolutions: 32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels.
  - Epochs: 100.
  - Runs: Three per configuration.
- Confirmatory experiment C02:
  - Training set size: 70% of the base dataset.
  - Depth: 10, 30, and 90 frames.
  - Resolutions: 32 × 24, 40 × 30, 80 × 60, and 160 × 120 pixels.
  - Epochs: 100.
  - Runs: Three per configuration.
  - Cross validation: 10 folds.
  - SBT_balanced_240_70t
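For reference, a configuration grid like the one above (e.g., confirmatory experiment C01) can be enumerated as in the sketch below. The `train_and_evaluate` function is a placeholder standing in for training the 3DCNN of Figure 4 and measuring accuracy; it is not the authors' code, and the returned values are dummies.

```python
# Illustrative enumeration of the experimental grid described in Appendix A
# (confirmatory experiment C01): three runs per (depth, resolution) pair.
import itertools
import random

DEPTHS = [10, 30, 90]                                     # frames per clip
RESOLUTIONS = [(32, 24), (40, 30), (80, 60), (160, 120)]  # width x height, pixels
RUNS_PER_CONFIG = 3
TRAIN_FRACTION = 0.70                                     # 70% of the base dataset


def train_and_evaluate(depth, width, height, train_fraction, seed):
    """Placeholder: train a 3DCNN for this configuration and return accuracy."""
    random.seed(seed)
    return round(random.uniform(0.5, 0.9), 3)  # stand-in for a real accuracy value


results = {}
for depth, (width, height) in itertools.product(DEPTHS, RESOLUTIONS):
    accs = [train_and_evaluate(depth, width, height, TRAIN_FRACTION, seed=run)
            for run in range(RUNS_PER_CONFIG)]
    results[(depth, width, height)] = sum(accs) / len(accs)

for config, mean_acc in results.items():
    print(config, mean_acc)
```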
Appendix B. Normal Behavior Samples
ID | File | Begin | End | ID | File | Begin | End |
---|---|---|---|---|---|---|---|
1 | Normal_Videos001_x264.mp4 | 0:00 | 0:18 | 31 | Normal_Videos023_x264.mp4 | 0:00 | 0:59 |
2 | Normal_Videos002_x264.mp4 | 0:00 | 0:55 | 32 | Normal_Videos024_x264.mp4 | 0:00 | 0:36 |
3 | Normal_Videos003_x264.mp4 | 0:00 | 1:34 | 33 | Normal_Videos029_x264.mp4 | 0:00 | 0:29 |
4 | Normal_Videos004_x264.mp4 | 0:00 | 0:31 | 34 | Normal_Videos030_x264.mp4 | 0:00 | 1:00 |
5 | Normal_Videos005_x264.mp4 | 0:00 | 0:13 | 35 | Normal_Videos034_x264.mp4 | 0:00 | 0:44 |
6 | Normal_Videos006_x264.mp4 | 0:00 | 0:15 | 36 | Normal_Videos036_x264.mp4 | 0:00 | 0:44 |
7 | Normal_Videos007_x264.mp4 | 0:00 | 0:37 | 37 | Normal_Videos039_x264.mp4 | 0:00 | 1:00 |
8 | Normal_Videos008_x264.mp4 | 0:00 | 1:26 | 38 | Normal_Videos041_x264.mp4 | 0:00 | 0:42 |
9 | Normal_Videos009_x264.mp4 | 0:08 | 0:17 | 39 | Normal_Videos043_x264.mp4 | 0:00 | 0:58 |
10 | Normal_Videos010_x264.mp4 | 0:00 | 0:35 | 40 | Normal_Videos044_x264.mp4 | 0:00 | 1:24 |
11 | Normal_Videos011_x264.mp4 | 0:00 | 0:30 | 41 | Normal_Videos047_x264.mp4 | 0:00 | 1:00 |
12 | Normal_Videos012_x264.mp4 | 0:00 | 1:18 | 42 | Normal_Videos048_x264.mp4 | 0:00 | 0:56 |
13 | Normal_Videos013_x264.mp4 | 0:00 | 0:40 | 43 | Normal_Videos049_x264.mp4 | 0:00 | 1:00 |
14 | Normal_Videos014_x264.mp4 | 0:00 | 0:50 | 44 | Normal_Videos051_x264.mp4 | 0:00 | 1:19 |
15 | Normal_Videos015_x264.mp4 | 0:00 | 0:16 | 45 | Normal_Videos052_x264.mp4 | 0:00 | 0:11 |
16 | Normal_Videos017_x264.mp4 | 0:00 | 0:28 | 46 | Normal_Videos053_x264.mp4 | 0:00 | 0:13 |
17 | Normal_Videos020_x264.mp4 | 0:00 | 0:16 | 47 | Normal_Videos054_x264.mp4 | 0:00 | 1:06 |
18 | Normal_Videos021_x264.mp4 | 0:00 | 1:05 | 48 | Normal_Videos055_x264.mp4 | 0:00 | 0:08 |
19 | Normal_Videos022_x264.mp4 | 0:00 | 0:13 | 49 | Normal_Videos056_x264.mp4 | 0:00 | 0:52 |
20 | Normal_Videos025_x264.mp4 | 0:00 | 0:25 | 50 | Normal_Videos057_x264.mp4 | 0:00 | 1:00 |
21 | Normal_Videos026_x264.mp4 | 0:00 | 1:31 | 51 | Normal_Videos058_x264.mp4 | 0:00 | 0:33 |
22 | Normal_Videos027_x264.mp4 | 0:00 | 2:44 | 52 | Normal_Videos059_x264.mp4 | 0:00 | 1:01 |
23 | Normal_Videos028_x264.mp4 | 0:00 | 5:21 | 53 | Normal_Videos061_x264.mp4 | 0:00 | 1:00 |
24 | Normal_Videos032_x264.mp4 | 0:00 | 0:28 | 54 | Normal_Videos062_x264.mp4 | 0:00 | 0:52 |
25 | Normal_Videos033_x264.mp4 | 0:00 | 0:56 | 55 | Normal_Videos063_x264.mp4 | 0:00 | 0:12 |
26 | Normal_Videos035_x264.mp4 | 0:04 | 8:00 | 56 | Normal_Videos064_x264.mp4 | 0:12 | 1:10 |
27 | Normal_Videos037_x264.mp4 | 0:00 | 0:15 | 57 | Normal_Videos065_x264.mp4 | 0:00 | 0:29 |
28 | Normal_Videos038_x264.mp4 | 0:00 | 1:39 | 58 | Normal_Videos066_x264.mp4 | 0:00 | 0:34 |
29 | Normal_Videos042_x264.mp4 | 0:00 | 1:45 | 59 | Normal_Videos067_x264.mp4 | 0:00 | 0:36 |
30 | Normal_Videos045_x264.mp4 | 0:00 | 0:52 | 60 | Normal_Videos073_x264.mp4 | 0:08 | 0:30 |
Appendix C. Suspicious Behavior Samples
ID | File | Begin | End | ID | File | Begin | End |
---|---|---|---|---|---|---|---|
1 | Shoplifting001_x264.mp4 | 0:00 | 0:41 | 31 | Shoplifting034_x264.mp4 | 2:56 | 3:08 |
2 | Shoplifting005_x264.mp4 | 0:00 | 0:25 | 32 | Shoplifting034_x264.mp4 | 3:12 | 3:39 |
3 | Shoplifting006_x264.mp4 | 0:09 | 0:57 | 33 | Shoplifting034_x264.mp4 | 3:42 | 3:43 |
4 | Shoplifting008_x264.mp4 | 2:10 | 2:52 | 34 | Shoplifting034_x264.mp4 | 3:47 | 4:04 |
5 | Shoplifting009_x264.mp4 | 0:29 | 2:26 | 35 | Shoplifting034_x264.mp4 | 4:09 | 4:34 |
6 | Shoplifting010_x264.mp4 | 0:19 | 0:24 | 36 | Shoplifting036_x264.mp4 | 0:56 | 1:44 |
7 | Shoplifting010_x264.mp4 | 0:43 | 0:51 | 37 | Shoplifting037_x264.mp4 | 0:00 | 0:38 |
8 | Shoplifting012_x264.mp4 | 1:25 | 4:26 | 38 | Shoplifting038_x264.mp4 | 0:50 | 1:20 |
9 | Shoplifting012_x264.mp4 | 4:38 | 5:53 | 39 | Shoplifting039_x264.mp4 | 0:14 | 1:10 |
10 | Shoplifting014_x264.mp4 | 5:51 | 6:23 | 40 | Shoplifting040_x264.mp4 | 0:00 | 0:27 |
11 | Shoplifting014_x264.mp4 | 6:29 | 11:43 | 41 | Shoplifting040_x264.mp4 | 0:34 | 1:00 |
12 | Shoplifting014_x264.mp4 | 12:03 | 18:46 | 42 | Shoplifting040_x264.mp4 | 1:06 | 2:24 |
13 | Shoplifting014_x264.mp4 | 19:01 | 27:43 | 43 | Shoplifting040_x264.mp4 | 2:36 | 4:39 |
14 | Shoplifting015_x264.mp4 | 0:24 | 1:07 | 44 | Shoplifting040_x264.mp4 | 4:50 | 5:38 |
15 | Shoplifting016_x264.mp4 | 0:00 | 0:15 | 45 | Shoplifting040_x264.mp4 | 5:48 | 7:12 |
16 | Shoplifting017_x264.mp4 | 0:00 | 0:12 | 46 | Shoplifting042_x264.mp4 | 0:00 | 1:04 |
17 | Shoplifting018_x264.mp4 | 0:00 | 0:14 | 47 | Shoplifting044_x264.mp4 | 0:00 | 6:09 |
18 | Shoplifting018_x264.mp4 | 0:27 | 0:37 | 48 | Shoplifting047_x264.mp4 | 0:00 | 0:32 |
19 | Shoplifting019_x264.mp4 | 0:06 | 0:08 | 49 | Shoplifting047_x264.mp4 | 0:34 | 0:43 |
20 | Shoplifting020_x264.mp4 | 1:04 | 1:17 | 50 | Shoplifting047_x264.mp4 | 0:47 | 0:50 |
21 | Shoplifting021_x264.mp4 | 0:00 | 1:09 | 51 | Shoplifting047_x264.mp4 | 0:53 | 0:59 |
22 | Shoplifting024_x264.mp4 | 0:00 | 0:27 | 52 | Shoplifting048_x264.mp4 | 0:11 | 0:25 |
23 | Shoplifting025_x264.mp4 | 0:00 | 0:56 | 53 | Shoplifting049_x264.mp4 | 0:00 | 0:33 |
24 | Shoplifting028_x264.mp4 | 0:06 | 0:20 | 54 | Shoplifting051_x264.mp4 | 0:15 | 2:32 |
25 | Shoplifting028_x264.mp4 | 0:23 | 0:26 | 55 | Shoplifting052_x264.mp4 | 0:07 | 0:29 |
26 | Shoplifting029_x264.mp4 | 0:06 | 0:27 | 56 | Shoplifting052_x264.mp4 | 0:34 | 0:54 |
27 | Shoplifting031_x264.mp4 | 0:00 | 0:04 | 57 | Shoplifting052_x264.mp4 | 1:04 | 1:29 |
28 | Shoplifting033_x264.mp4 | 0:00 | 0:22 | 58 | Shoplifting052_x264.mp4 | 1:35 | 2:12 |
29 | Shoplifting034_x264.mp4 | 0:25 | 2:36 | 59 | Shoplifting052_x264.mp4 | 2:16 | 2:39 |
30 | Shoplifting034_x264.mp4 | 2:42 | 2:53 | 60 | Shoplifting053_x264.mp4 | 0:00 | 0:43 |
References
- National Retail Federation. 2020 National Retail Security Survey; National Retail Federation: Washington, DC, USA, 2020. [Google Scholar]
- Ba, S.O.; Odobez, J. Recognizing Visual Focus of Attention From Head Pose in Natural Meetings. IEEE Trans. Syst. Man, Cybern. Part B (Cybernetics) 2009, 39, 16–33. [Google Scholar] [CrossRef] [Green Version]
- Nayak, N.M.; Sethi, R.J.; Song, B.; Roy-Chowdhury, A.K. Modeling and Recognition of Complex Human Activities. In Visual Analysis of Humans: Looking at People; Springer: London, UK, 2011; pp. 289–309. [Google Scholar] [CrossRef]
- Rankin, S.; Cohen, N.; Maclennan-Brown, K.; Sage, K. CCTV Operator Performance Benchmarking. In Proceedings of the 2012 IEEE International Carnahan Conference on Security Technology (ICCST), Newton, MA, USA, 15–18 October 2012; pp. 325–330. [Google Scholar] [CrossRef]
- DeepCam. Official Website. 2018. Available online: https://deepcamai.com/ (accessed on 6 April 2019).
- FaceFirst. Official Website. 2019. Available online: https://www.facefirst.com/ (accessed on 6 April 2019).
- Geng, X.; Li, G.; Ye, Y.; Tu, Y.; Dai, H. Abnormal Behavior Detection for Early Warning of Terrorist Attack. In AI 2006: Advances in Artificial Intelligence; Sattar, A., Kang, B.H., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1002–1009. [Google Scholar]
- Berjon, D.; Cuevas, C.; Moran, F.; Garcia, N. GPU-based implementation of an optimized nonparametric background modeling for real-time moving object detection. IEEE Trans. Consum. Electron. 2013, 59, 361–369. [Google Scholar] [CrossRef]
- Hati, K.K.; Sa, P.K.; Majhi, B. LOBS: Local background subtracter for video surveillance. In Proceedings of the 2012 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, Hyderabad, India, 5–7 December 2012; pp. 29–34. [Google Scholar] [CrossRef]
- Joshila Grace, L.K.; Reshmi, K. Face recognition in surveillance system. In Proceedings of the 2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), Coimbatore, India, 19–20 March 2015; pp. 1–5. [Google Scholar] [CrossRef]
- Nurhopipah, A.; Harjoko, A. Motion Detection and Face Recognition for CCTV Surveillance System. IJCCS (Indones. J. Comput. Cybern. Syst.) 2018, 12, 107. [Google Scholar] [CrossRef] [Green Version]
- Hou, L.; Wan, W.; Han, K.; Muhammad, R.; Yang, M. Human detection and tracking over camera networks: A review. In Proceedings of the 2016 International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, China, 11–12 July 2016; pp. 574–580. [Google Scholar] [CrossRef]
- Kim, J.S.; Yeom, D.H.; Joo, Y.H. Fast and robust algorithm of tracking multiple moving objects for intelligent video surveillance systems. IEEE Trans. Consum. Electron. 2011, 57, 1165–1170. [Google Scholar] [CrossRef]
- Ling, T.S.; Meng, L.K.; Kuan, L.M.; Kadim, Z.; Baha’a Al-Deen, A.A. Colour-based Object Tracking in Surveillance Application. In Proceedings of the International MultiConference of Engineers and Computer Scientists 2009 (IMECS 2009), Hong Kong, China, 18–20 March 2009; Volume 1. [Google Scholar]
- Kang, J.; Kwak, S. Loitering Detection Solution for CCTV Security System. J. Korea Multimed. Soc. 2014, 17. [Google Scholar] [CrossRef]
- Chang, J.Y.; Liao, H.H.; Chen, L.G. Localized detection of abandoned luggage. EURASIP J. Adv. Signal Process. 2010, 2010, 675784. [Google Scholar] [CrossRef] [Green Version]
- Alvar, M.; Torsello, A.; Sanchez-Miralles, A.; Armingol, J.M. Abnormal behavior detection using dominant sets. Mach. Vis. Appl. 2014, 25, 1351–1368. [Google Scholar] [CrossRef]
- Wang, T.; Qiao, M.; Deng, Y.; Zhou, Y.; Wang, H.; Lyu, Q.; Snoussi, H. Abnormal event detection based on analysis of movement information of video sequence. Optik Int. J. Light Electron Opt. 2018, 152, 50–60. [Google Scholar] [CrossRef]
- Wu, S.; Wong, H.; Yu, Z. A Bayesian Model for Crowd Escape Behavior Detection. IEEE Trans. Circuits Syst. Video Technol. 2014, 24, 85–98. [Google Scholar] [CrossRef]
- Ouivirach, K.; Gharti, S.; Dailey, M.N. Automatic Suspicious Behavior Detection from a Small Bootstrap Set. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP 2012), Rome, Italy, 24–26 February 2012; pp. 655–658. [Google Scholar] [CrossRef]
- Sabokrou, M.; Fathy, M.; Moayed, Z.; Klette, R. Fast and accurate detection and localization of abnormal behavior in crowded scenes. Mach. Vis. Appl. 2017, 28, 965–985. [Google Scholar] [CrossRef]
- Tsushita, H.; Zin, T.T. A Study on Detection of Abnormal Behavior by a Surveillance Camera Image. In Big Data Analysis and Deep Learning Applications; Zin, T.T., Lin, J.C.W., Eds.; Springer: Singapore, 2019; pp. 284–291. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4489–4497. [Google Scholar] [CrossRef] [Green Version]
- Intel. Official Website. 2020. Available online: https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html (accessed on 9 February 2020).
- Hassner, T.; Itcher, Y.; Kliper-Gross, O. Violent flows: Real-time detection of violent crowd behavior. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 1–6. [Google Scholar] [CrossRef]
- Bermejo Nievas, E.; Deniz Suarez, O.; Bueno García, G.; Sukthankar, R. Violence Detection in Video Using Computer Vision Techniques. In Computer Analysis of Images and Patterns; Real, P., Diaz-Pernil, D., Molina-Abril, H., Berciano, A., Kropatsch, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 332–339. [Google Scholar]
- Sultani, W.; Chen, C.; Shah, M. Real-World Anomaly Detection in Surveillance Videos. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6479–6488. [Google Scholar] [CrossRef] [Green Version]
- Nasaruddin, N.; Muchtar, K.; Afdhal, A.; Dwiyantoro, A.P.J. Deep anomaly detection through visual attention in surveillance videos. J. Big Data 2020, 87. [Google Scholar] [CrossRef]
- University of Central Florida. UCF-Crime Dataset. 2018. Available online: https://webpages.uncc.edu/cchen62/dataset.html (accessed on 23 April 2019).
- Ishikawa, T.; Zin, T.T. A Study on Detection of Suspicious Persons for Intelligent Monitoring System. In Big Data Analysis and Deep Learning Applications; Zin, T.T., Lin, J.C.W., Eds.; Springer: Singapore, 2019; pp. 292–301. [Google Scholar]
- Afra, S.; Alhajj, R. Early warning system: From face recognition by surveillance cameras to social media analysis to detecting suspicious people. Phys. A: Stat. Mech. Appl. 2020, 540, 123151. [Google Scholar] [CrossRef]
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. WIDER FACE: A Face Detection Benchmark. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5525–5533. [Google Scholar] [CrossRef] [Green Version]
- Amos, B.; Ludwiczuk, B.; Satyanarayanan, M. OpenFace: A General-Purpose Face Recognition Library with Mobile Applications; Technical Report, CMU-CS-16-118; CMU School of Computer Science: Pittsburgh, PA, USA, 2016. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
- Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 87–102. [Google Scholar]
- He, T.; Mao, H.; Yi, Z. Moving object recognition using multi-view three-dimensional convolutional neural networks. Neural Comput. Appl. 2017, 28, 3827–3835. [Google Scholar] [CrossRef]
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231. [Google Scholar] [CrossRef] [Green Version]
- Zhang, L.; Zhu, G.; Shen, P.; Song, J. Learning Spatiotemporal Features Using 3DCNN and Convolutional LSTM for Gesture Recognition. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 3120–3128. [Google Scholar] [CrossRef]
- Ogwueleka, F.N.; Misra, S.; Colomo-Palacios, R.; Fernandez, L. Neural Network and Classification Approach in Identifying Customer Behavior in the Banking Sector: A Case Study of an International Bank. Hum. Factors Ergon. Manuf. Serv. Ind. 2015, 25, 28–42. [Google Scholar] [CrossRef]
- Cai, X.; Hu, F.; Ding, L. Detecting Abnormal Behavior in Examination Surveillance Video with 3D Convolutional Neural Networks. In Proceedings of the 2016 6th International Conference on Digital Home (ICDH), Guangzhou, China, 2–4 December 2016; pp. 20–24. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef] [Green Version]
- Varol, G.; Laptev, I.; Schmid, C. Long-Term Temporal Convolutions for Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1510–1517. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Alfaifi, R.; Artoli, A.M. Human Action Prediction with 3D-CNN. SN Comput. Sci. 2020, 1. [Google Scholar] [CrossRef]
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection: A Survey. ACM Comput. Surv. 2009, 41. [Google Scholar] [CrossRef]
- Jiang, F.; Yuan, J.; Tsaftaris, S.A.; Katsaggelos, A.K. Anomalous Video Event Detection Using Spatiotemporal Context. Comput. Vis. Image Underst. 2011, 115, 323–333. [Google Scholar] [CrossRef]
- Sabokrou, M.; Fathy, M.; Hoseini, M. Video anomaly detection and localisation based on the sparsity and reconstruction error of auto-encoder. Electron. Lett. 2016, 52, 1122–1124. [Google Scholar] [CrossRef]
- Vaswani, N.; Roy Chowdhury, A.; Chellappa, R. Activity recognition using the dynamics of the configuration of interacting objects. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 16–22 June 2003; Volume 2. [Google Scholar]
- Ko, K.E.; Sim, K.B. Deep convolutional framework for abnormal behavior detection in a smart surveillance system. Eng. Appl. Artif. Intell. 2018, 67, 226–234. [Google Scholar] [CrossRef]
- Bouma, H.; Vogels, J.; Aarts, O.; Kruszynski, C.; Wijn, R.; Burghouts, G. Behavioral profiling in CCTV cameras by combining multiple subtle suspicious observations of different surveillance operators. In Signal Processing, Sensor Fusion, and Target Recognition XXII; Kadar, I., Ed.; International Society for Optics and Photonics, SPIE: San Diego, CA, USA, 2013; Volume 8745, pp. 436–444. [Google Scholar] [CrossRef] [Green Version]
- Koller, C.I.; Wetter, O.E.; Hofer, F. ‘Who’s the Thief?’ The Influence of Knowledge and Experience on Early Detection of Criminal Intentions. Appl. Cogn. Psychol. 2016, 30, 178–187. [Google Scholar] [CrossRef]
- Grant, D.; Williams, D. The importance of perceiving social contexts when predicting crime and antisocial behaviour in CCTV images. Leg. Criminol. Psychol. 2011, 16, 307–322. [Google Scholar] [CrossRef]
- Koller, C.I.; Wetter, O.E.; Hofer, F. What Is Suspicious When Trying to be Inconspicuous? Criminal Intentions Inferred From Nonverbal Behavioral Cues. Perception 2015, 44, 679–708. [Google Scholar] [CrossRef]
- Troscianko, T.; Holmes, A.; Stillman, J.; Mirmehdi, M.; Wright, D.; Wilson, A. What happens next? The predictability of natural behaviour viewed through CCTV cameras. Perception 2004, 33, 87–101. [Google Scholar] [CrossRef]
- Altemir, V. La comunicación no verbal como herramienta en la videovigilancia. In Comportamiento no Verbal: Más Allá de la Comunicación y el Lenguaje; Pirámide: Madrid, Spain, 2016; pp. 225–228. [Google Scholar]
- Kim, H.; Jeong, Y.S. Sentiment Classification Using Convolutional Neural Networks. Appl. Sci. 2019, 9, 2347. [Google Scholar] [CrossRef] [Green Version]
- Roth, H.R.; Yao, J.; Lu, L.; Stieger, J.; Burns, J.E.; Summers, R.M. Detection of Sclerotic Spine Metastases via Random Aggregation of Deep Convolutional Neural Network Classifications. In Recent Advances in Computational Methods and Clinical Applications for Spine Imaging; Yao, J., Glocker, B., Klinder, T., Li, S., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 3–12. [Google Scholar] [CrossRef] [Green Version]
- Fujimoto Lab. 3D Convolutional Neural Network for Video Classification, Code Repository. 2017. Available online: https://github.com/kcct-fujimotolab/3DCNN (accessed on 28 April 2019).
- Google. Google Colaboratory. 2017. Available online: https://colab.research.google.com/ (accessed on 29 April 2019).
Paper | Behaviors to Detect | Dataset Size | Criminal/Incident Videos |
---|---|---|---|
Bouma et al. [52] | Theft and pickpocketing | 8 videos | 5 |
Ishikawa and Zin [31] | Loitering | 6 videos | 6 |
Koller et al. [53] | Theft | 12 videos | 12 |
Tsushita and Zin [22] | Snatch theft | 19 videos | 9 |
Grant and Williams [54] | Violent crimes against people or property | 24 videos | 12 |
Koller et al. [55] | Bomb and theft | 26 videos | 18 |
Ko and Sim [51] | Hand shaking, hugging, kicking, punching, pointing, and pushing | 50 videos | 50 * |
Troscianko et al. [56] | Fights, assaults, car crimes and vandalism | 100 videos | 18 |
Resolution (Pixels) | ||||
---|---|---|---|---|
Depth (Frames) | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
10 | 83.3 | 75.0 | 66.6 | 50.0 |
75.0 | 66.6 | 83.3 | 50.0 | |
91.6 | 75.0 | 83.3 | 41.6 | |
83.3 | 72.2 | 77.7 | 69.3 | |
30 | 83.3 | 83.3 | 83.3 | 83.3 |
75.0 | 66.6 | 75.0 | 50.0 | |
50.0 | 75.0 | 50.0 | 75.0 | |
69.4 | 75.0 | 69.4 | 69.4 | |
90 | 83.3 | 66.6 | 50.0 | 50.0 |
75.0 | 75.0 | 75.0 | 50.0 | |
50.0 | 50.0 | 58.3 | 75.0 | |
69.4 | 63.9 | 61.1 | 58.3 |
Resolution (Pixels) | ||||
---|---|---|---|---|
Training | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
80% | 66.6 | 75.0 | 66.6 | 50.0 |
75.0 | 75.0 | 58.3 | 50.0 | |
75.0 | 66.6 | 75.0 | 41.6 | |
72.2 | 72.2 | 66.6 | 47.2 | |
70% | 77.7 | 72.2 | 66.6 | 77.7 |
66.6 | 77.7 | 72.2 | 77.7 | |
61.1 | 72.2 | 66.6 | 72.2 | |
68.5 | 74.0 | 68.5 | 75.9 | |
60% | 62.5 | 66.6 | 70.8 | 72.2 |
58.3 | 66.6 | 50.0 | 66.6 | |
70.8 | 70.8 | 62.5 | 72.2 | |
63.9 | 68.0 | 61.1 | 70.3 |
Resolution (Pixels) | ||||
---|---|---|---|---|
Depth (Frames) | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
10 | 66.6 | 70.3 | 62.0 | 74.1 |
66.6 | 62.9 | 66.6 | 77.7 | |
66.6 | 70.3 | 77.7 | 85.1 | |
66.6 | 67.8 | 68.8 | 79.0 | |
30 | 70.3 | 55.5 | 81.4 | 77.7 |
66.6 | 66.6 | 77.7 | 77.7 | |
70.3 | 74.0 | 81.4 | 85.1 | |
69.1 | 65.4 | 80.2 | 80.2 | |
90 | 66.6 | 62.9 | 81.4 | 66.6 |
70.3 | 62.9 | 81.4 | 66.6 | |
70.3 | 70.3 | 81.4 | 66.6 | |
69.1 | 65.4 | 81.4 | 66.6 |
Resolution (Pixels) | ||||
---|---|---|---|---|
Flipped | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
FALSE | 72.9 | 70.8 | 79.1 | 77.0 |
70.8 | 72.9 | 79.1 | 83.3 | |
70.8 | 70.8 | 72.9 | 70.8 | |
71.5 | 71.5 | 77.0 | 77.0 | |
TRUE | 70.8 | 77.0 | 83.3 | 72.9 |
75.0 | 75.0 | 87.5 | 68.7 | |
75.0 | 79.1 | 79.1 | 70.8 | |
73.6 | 77.0 | 83.3 | 70.8 |
Resolution (Pixels) | ||||
---|---|---|---|---|
Flipped | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
FALSE | 69.4 | 66.6 | 72.2 | 72.2 |
77.7 | 72.2 | 75.0 | 83.3 |
80.5 | 75.0 | 66.6 | 83.3 |
75.9 | 71.3 | 71.3 | 79.6 |
TRUE | 80.5 | 66.6 | 75.0 | 77.7 |
75.0 | 77.7 | 86.1 | 77.7 |
75.0 | 72.2 | 83.3 | 80.5 |
76.8 | 72.2 | 81.5 | 78.6 |
Resolution (Pixels) | ||||
---|---|---|---|---|
Depth (Frames) | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
10 | 75.0 | 72.2 | 83.3 | 77.7 |
84.7 | 86.1 | 86.1 | 77.7 | |
66.6 | 68.0 | 91.6 | 80.5 | |
75.4 | 75.4 | 87.0 | 78.6 | |
30 | 80.5 | 66.6 | 76.3 | 86.1 |
77.7 | 80.5 | 86.1 | 90.2 | |
75.0 | 81.9 | 75.0 | 81.9 | |
77.7 | 76.3 | 79.1 | 86.1 | |
90 | 69.4 | 72.2 | 83.3 | 50.0 |
75.0 | 79.1 | 81.9 | 77.7 | |
79.1 | 75.0 | 83.3 | 50.0 | |
74.5 | 75.4 | 82.8 | 59.2 |
Resolution (Pixels) | ||||
---|---|---|---|---|
Depth (Frames) | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
10 | 83.3 | 68.5 | 81.4 | 79.6 |
64.8 | 70.3 | 77.7 | 77.7 | |
57.4 | 66.6 | 79.6 | 79.6 | |
68.5 | 68.5 | 79.6 | 79.0 | |
30 | 72.2 | 66.6 | 87.0 | 87.0 |
81.4 | 70.3 | 61.1 | 74.0 | |
74.0 | 62.9 | 81.4 | 68.5 | |
75.9 | 66.6 | 76.5 | 76.5 | |
90 | 68.5 | 74.0 | 59.2 | 70.3 |
83.3 | 74.0 | 70.3 | 66.6 | |
70.3 | 72.2 | 81.4 | 66.6 | |
74.0 | 73.4 | 70.3 | 67.8 |
Resolution (Pixels) | |||||
---|---|---|---|---|---|
Number of Samples (Normal/Suspicious) | Depth (Frames) | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
120/120 | 10 | 70.3 (0.0476) | 71.8 (0.0468) | 73.0 (0.0717) | 73.1 (0.0661) |
30 | 70.1 (0.0574) | 71.9 (0.055) | 73.6 (0.0821) | 71.6 (0.0999) | |
120/60 | 10 | 69.4 (0.0686) | 68.7 (0.0569) | 75.0 (0.0689) | 75.7 (0.0638) |
30 | 71.6 (0.0533) | 69.1 (0.0576) | 74.8 (0.0500) | 73.9 (0.0543) |
Resolution (Pixels) | |||||
---|---|---|---|---|---|
Number of Samples (Normal/Suspicious) | Depth (Frames) | 32 × 24 | 40 × 30 | 80 × 60 | 160 × 120 |
120/120 | 10 | 118 | 157 | 475 | 1714 |
30 | 257 | 364 | 1304 | 4952 | |
90 | 688 | 1011 | 3879 | 15,415 | |
120/60 | 10 | 96 | 126 | 369 | 1356 |
30 | 196 | 279 | 1027 | 3918 | |
90 | 518 | 758 | 2929 | 11,655 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).