Abstract
A substantial increase in the number of surveillance camera systems has not delivered the promised deterrent effects or investigative case evidence, and the usefulness of these systems has been underwhelming. A potential solution to the practical needs of camera monitors is computer vision (CV)-enhanced camera networks that can provide automated real-time video analysis, quick processing of query-based searches by monitors, and accurate summaries of archived video files. The development and testing of four CV algorithms in computer vision laboratories are presented, and the implications for society of their possible adoption by security agencies are discussed.
Notes
Monitor perception failure occurs when there is little visual change present in long video stream stretches and a monitor’s attention shifts to non-visual tasks such as conversations or daydreaming (Bredemeier and Simons 2012; Fougnie and Marois 2007; Mack and Rock 1998; Memmert 2006; Most et al. 2005; Sasse 2010). Monitoring video streams has been reported to significantly increase perceptual failure (Hyman et al. 2009; Most et al. 2005).
The first alternate approach (termed tubelets) utilized selective sampling to produce sequences of bounding boxes for action localization (Jain et al. 2014). The second competing approach, termed poselets, developed a relational model for action detection which initially decomposes human actions into temporal ‘key poses’ and then into spatial ‘action parts’ (Wang et al. 2014). Our method (T-CNN) was compared with these two prior state-of-the-art action detection approaches on our collected real-world action detection dataset, which includes crime-related events/actions such as fighting, car accidents, and robbery. The ROC curves of these approaches were plotted. At each false-positive rate, the higher the true-positive rate, the more accurately the method detects actions. However, our method sometimes missed real events. For example, when the event/action region is very small in the video because of a long-distance camera view, our method missed the detection due to limited information. False positives occur when events/actions are very similar in appearance or motion, such as robbery and burglary. This similarity can confuse the CV algorithm, leading to false positives for those actions, such as labeling a robbery as a burglary. The figure below reflects that the T-CNN approach was superior to the two alternate CV approaches.
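The ROC comparison described in this note can be illustrated with a minimal sketch. The labels and scores below are invented for illustration and are not the study's data; `roc_points` and `tpr_at` are hypothetical helper names, and the sketch assumes a clip counts as detected when its confidence score is at or above the threshold.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative clip")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a clip is "detected" when score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def tpr_at(points: List[Tuple[float, float]], max_fpr: float) -> float:
    """Best true-positive rate reachable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Made-up data: 1 = an action really occurred in the clip, 0 = it did not.
labels = [1, 1, 0, 1, 0, 0]
method_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # hypothetical stronger detector
method_b = [0.9, 0.4, 0.8, 0.6, 0.7, 0.2]  # hypothetical weaker detector

for name, scores in (("A", method_a), ("B", method_b)):
    print(name, round(tpr_at(roc_points(scores, labels), 0.34), 2))
```

Comparing the true-positive rate of two methods at the same false-positive rate is the reading of the ROC plot that the note describes: the method with the higher value at a given false-positive rate detects more of the real actions for the same false-alarm cost.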
A few real-world action/event detection examples produced by the computer vision algorithm can be viewed here: https://docs.google.com/presentation/d/1MINyHYIuotHTttUrjSKdCIKuR_LrW4eNCht_0kiDjgU/edit?usp=sharing.
The method generated a regression model such that anomalous video segment instances have higher anomaly R² scores than the normal segments. The anomaly scores are not identical to the R² values familiar to social science research, but they analogously vary between 0 and 1, with scores closer to 1 denoting more anomalous video clips. To manage output levels and avoid event swamping, the threshold can be raised (closer to 1) to decrease, or lowered (closer to 0) to increase, the number of clips labeled as anomalies.
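The effect of the threshold on output volume can be sketched as follows. The clip scores are invented, `flag_anomalies` is a hypothetical helper, and the sketch assumes that a clip is labeled anomalous when its score is at or above the threshold (the note does not state the exact rule).

```python
from typing import List


def flag_anomalies(scores: List[float], threshold: float) -> List[int]:
    """Return the indices of clips whose anomaly score meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical anomaly scores for six video clips (0 = normal, 1 = highly anomalous).
clip_scores = [0.05, 0.12, 0.81, 0.33, 0.95, 0.47]

loose = flag_anomalies(clip_scores, 0.3)   # low threshold: more clips flagged
strict = flag_anomalies(clip_scores, 0.8)  # high threshold: fewer clips flagged

print("threshold 0.3 flags clips", loose)
print("threshold 0.8 flags clips", strict)
```

Under this assumption, a monitoring center that is being swamped with alerts would move the threshold toward 1, accepting that some weaker anomalies go unflagged.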
Recall refers to the share of the actual (ground-truth) events in a video that a method correctly identifies, and precision refers to the share of identified instances that are correct. For example, if a query was to identify police cars in a video that contained 10 police cars (its ground truth), and 8 of them were correctly identified, the recall would be 8 of 10 or 80%. If 1 of 8 segments identified as police cars was incorrect, the misclassification rate would be 1 of 8 or 12.5%, which corresponds to a precision of 7 of 8 or 87.5%. In practical terms, the goal is to maximize both recall and precision, that is, to minimize both missed events and misclassified instances.
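The police-car example can be worked through with the standard information-retrieval definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). This is a minimal sketch; `precision_recall` is a hypothetical helper, and the two scenarios are one reading of the note's figures rather than results from the study.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute (precision, recall) from true-positive, false-positive, false-negative counts."""
    retrieved = tp + fp  # everything the method labeled as a police car
    relevant = tp + fn   # every police car actually in the video
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall


# Scenario 1: 10 police cars in the video, 8 found, none of the 8 wrong.
p1, r1 = precision_recall(tp=8, fp=0, fn=2)

# Scenario 2: 8 segments labeled as police cars, 1 of them wrong,
# so 7 of the 10 real police cars were found and 3 were missed.
p2, r2 = precision_recall(tp=7, fp=1, fn=3)

print(f"scenario 1: precision={p1:.3f} recall={r1:.3f}")
print(f"scenario 2: precision={p2:.3f} recall={r2:.3f}")
```

The second scenario shows why the two measures are reported together: a single wrong label lowers precision (7 of 8) and, because that segment was not a real police car, recall as well (7 of 10).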
References
Abdi, H., D. Valentin, and B. Edelman. 1999. Neural networks. Thousand Oaks, CA: Sage.
Adams, A., and J. Ferryman. 2015. The future of video analytics for surveillance and its ethical implications. Security Journal 28 (3): 272–289.
Alexandrie, G. 2017. Surveillance cameras and crime: A review of randomized and natural experiments. Journal of Scandinavian Studies of Criminology and Crime Prevention 18 (2): 210–222.
Andrews, S., I. Tsochantaridis, and T. Hofmann. 2003. Support vector machines for multiple-instance learning. In Advances in neural information processing systems, 577–584. Cambridge: MIT.
Ashby, M.P. 2017. The value of CCTV surveillance cameras as an investigative tool: An empirical analysis. European Journal on Criminal Policy and Research 23 (3): 441–459.
Baldwin, D.A., and J.A. Baird. 2001. Discerning intentions in dynamic human action. Trends in Cognitive Sciences 5: 171–178.
Barrett, H., P. Todd, G. Miller, and P.W. Blythe. 2005. Accurate judgments of intention from motion cues alone: A cross-cultural study. Evolution and Human Behavior 26: 313–331.
Bredemeier, K., and D. Simons. 2012. Working memory and inattentional blindness. Psychonomic Bulletin and Review 19: 239–244.
Bulwa, D., and M.B. Stannard. 2007. Is it worth the cost? San Francisco Chronicle, August 17. https://www.sfgate.com/news/article/Is-it-worth-the-cost-2546948.php. Downloaded 8 Oct 2019.
Chen, B.W., J.-C. Wang, and J.F. Wang. 2009. A novel video summarization based on mining the story-structure & semantic relations among concept entities. IEEE Transactions on Multimedia 11 (2): 295–312.
Coetzer, B., B. Josephs, and J. van der Merwe. 2011. Information management and video analytics: The future of intelligent video surveillance. Rijeka: INTECH Open Access Publisher.
Davenport, J. 2007. Tens of thousands of CCTV cameras, yet 80% of crime unsolved. Evening Standard, September 19. https://www.standard.co.uk/news/tens-of-thousands-of-cctv-cameras-yet-80-of-crime-unsolved-6684359.html. Downloaded 8 Oct 2019.
Donald, C. 2005. How many monitors should a CCTV operator view. CCTV Image, Spring, 35–36.
Dowling, C., A. Morgan, A. Gannoni, and P. Jorna. 2019. How do police use CCTV footage in criminal investigations? Trends and Issues in Crime and Criminal Justice 575: 1–14.
Edwards, R. 2008. Police say CCTV is an ‘utter fiasco’. The Telegraph, May 6. https://www.telegraph.co.uk/news/uknews/1932769/Police-say-CCTV-is-utter-fiasco-as-most-footage-is-unusable.html. Downloaded 8 Oct 2019.
Edwards, R. 2009. Seven of ten murders solved by CCTV. The Telegraph, January 1. https://www.telegraph.co.uk/news/uknews/law-and-order/4060443/Seven-of-ten-murders-solved-by-CCTV.html. Downloaded 8 Oct 2019.
Evangelopoulos, G., A. Zlatintsi, G. Skoumas, K. Rapantzikos, A. Potamianos, P. Maragos, and Y. Avrithis. 2009. Video event detection and summarization using audio, visual and text saliency. In ICASSP IEEE international conference on acoustics, speech and signal processing.
Faber, L., N. Maurits, and M. Lorist. 2012. Mental fatigue affects visual selective attention. PLoS ONE 7 (10): e48073.
Ferguson, A. 2017. Policing predictive policing. Washington University Law Review 94: 1115–1194.
Fougnie, D., and R. Marois. 2007. Executive working memory load induces inattentional blindness. Psychonomic Bulletin and Review 14 (1): 142–147.
Gao, Y., D. Wang, J. Yong, and H. Gu. 2009. Dynamic video summarization using two level redundancy detection. Multimedia Tools and Applications 42 (2): 233–250.
Gerell, M. 2016. Hot spot policing with actively monitored CCTV Cameras. International Criminal Justice Review 24 (2): 187–201.
Ghosh, J., Y.J. Lee, and K. Grauman. 2012. Discovering important people and objects for egocentric video summarization. In IEEE conference on computer vision and pattern recognition, Providence, RI, pp. 1346–1353.
Gill, M. 2003. CCTV. Leicester: Perpetuity Press.
Gong, S., C.C. Loy, and T. Xiang. 2011. Security and surveillance. In Visual analysis of humans, 455–472. London: Springer.
Goold, B. 2004. CCTV and policing. Oxford: Oxford University Press.
Gowsikhaa, D., S. Abirami, and R. Baskaran. 2014. Automated human behavior analysis from surveillance videos: A survey. Artificial Intelligence Review 42 (4): 1–19.
Graham, S. 1996. CCTV-Big Brother or friendly eye in the sky? T AND CP 65: 57–59.
Gygli, M., H. Grabner, and L. Van Gool. 2015. Video summarization by learning submodular mixtures of objectives. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Hesse, L. 2002. The transition from video motion detection to intelligent scene discrimination and target tracking in automated video surveillance systems. Security Journal 15 (2): 69–78.
Hier, S., J. Greenberg, K. Walby, and D. Lett. 2007. Media, communication and the establishment of public camera surveillance programmes in Canada. Media, Culture, and Society 29 (5): 727–751.
Honovich, J. 2008. Is public CCTV effective? July 7. https://ipvm.com/reports/is-public-cctv-effective. Downloaded 27 Sept 2019.
Hyman, I., E. Boss, S. Matthew, B. Wise, M. McKenzie, E. Kira, and J. Caggiano. 2009. Did you see the unicycling clown? Inattentional blindness while walking and talking on a cell phone. Applied Cognitive Psychology 24 (5): 597–607.
Idrees, H., M. Shah, and R. Surette. 2018. Enhancing camera surveillance using computer vision: A research note. Policing: An International Journal 41 (2): 292–307.
Jain, M., J. Van Gemert, H. Jégou, P. Bouthemy, and C.G. Snoek. 2014. Action localization with tubelets from motion. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 740–747.
Keval, H., and M. Sasse. 2010. “Not the usual suspects”: A study of factors reducing the effectiveness of CCTV. Security Journal 23 (2): 134–154.
Kuehne, H., H. Jhuang, E. Garrote, T. Poggio, and T. Serre. 2011. HMDB: a large video database for human motion recognition. In International conference on computer vision, pp. 2556–2563.
Kulesza, A., and B. Taskar. 2012. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning 5 (2–3): 123–286.
La Vigne, N., S. Lowry, J. Markman, and A. Dwyer. 2011. Evaluating the use of public surveillance cameras for crime control and prevention. Washington, D.C.: Urban Institute, Justice Policy Center. https://www.urban.org/UploadedPDF/412403-Evaluating-the-Use-of-Public-Surveillance-Cameras-for-Crime-Control-and-Prevention.pdf.
Leman-Langlois, S. 2002. The myopic panopticon: The social consequences of policing through the lens. Policing and Society 13 (1): 43–58.
Mack, A., and I. Rock. 1998. Inattentional blindness. Cambridge, MA: MIT Press.
Marx, G. 1988. Undercover: Police surveillance in America. Berkeley: University of California Press.
Memmert, D. 2006. The effects of eye movement, age, and expertise on inattentional blindness. Consciousness and Cognition 15 (3): 620–627. https://doi.org/10.1016/j.concog.2006.01.001.
Morgan, A., and M. Coughlan. 2018. Police use of CCTV on the rail network. Trends and Issues in Crime and Criminal Justice 561: 1–17.
Morgan, A., and C. Dowling. 2019. Does CCTV help police solve crime? Trends and Issues in Crime and Criminal Justice 576: 1–14.
Most, S.B., B.J. Scholl, E.R. Clifford, and D.J. Simons. 2005. What you see is what you set: Sustained inattentional blindness and the capture of awareness. Psychological Review 112 (1): 217–242.
Näsholm, E., S. Rohlfing, and J.D. Sauer. 2014. Pirate stealth or inattentional blindness? The effects of target relevance and sustained attention on security monitoring for experienced and naïve operators. PLoS ONE 9 (1): e86157. https://doi.org/10.1371/journal.pone.0086157.
Norris, C., and G. Armstrong. 1999. The Maximum Surveillance Society: The rise of CCTV. Oxford: Berg.
Piza, E., J. Caplan, and L. Kennedy. 2014a. CCTV as a tool for early police intervention: Preliminary lessons from nine case studies. Security Journal 30: 247–265. https://doi.org/10.1057/sj.2014.17.
Piza, E., J. Caplan, and L. Kennedy. 2014b. Is the punishment more certain? An analysis of CCTV detections and enforcement. Justice Quarterly 31 (6): 1015–1043.
Piza, E., B. Welsh, D. Farrington, and A. Thomas. 2019. CCTV surveillance for crime prevention: A 40-year systematic review with meta-analysis. Criminology & Public Policy 18: 135–159.
Prenzler, T., and E. Wilson. 2019. The Ipswich (Queensland) safe city program: an evaluation. Security Journal 32: 137–152.
Ratcliffe, J.H., T. Taniguchi, and R.B. Taylor. 2009. The crime reduction effects of public CCTV cameras: a multi-method spatial approach. Justice Quarterly 26 (4): 746–770.
Ren, S., K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp. 91–99.
Sandhu, A. 2017. ‘I’m glad that was on camera’: A case study of police officers’ perceptions of cameras. Policing and Society 29 (2): 223–235.
Sasse, A. 2010. Not seeing the crime for the cameras? Communications of the ACM 53: 22–25.
Scheitle, C.P., and C. Halligan. 2018. Explaining the adoption of security measures by places of worship: Perceived risk of victimization and organizational structure. Security Journal 31 (10): 1–23.
Shah, M. 2017. Project Report: Studying the impact of video analytics for pre, live and post event analysis on outcomes of criminal justice, July 2016–December 2016. Orlando, FL: University of Central Florida Center for Research on Computer Vision. Funded by U.S. Department of Justice, NIJ-2015-R2-CX-K025.
Surette, R. 2005. The thinking eye: Pros and cons of second generation CCTV surveillance systems. Policing: An International Journal of Police Strategies and Management 28 (1): 152–173.
Surette, R. 2006. CCTV and citizen guardianship suppression: A questionable proposition. Police Quarterly 9: 100–125.
Surette, R. 2015. Media, crime, and criminal justice: Images, realities, and policies. Stamford, CT: Cengage.
Taylor, E. 2010. Evaluating CCTV: Why the findings are inconsistent, inconclusive and ultimately irrelevant. Crime Prevention and Community Safety 12 (4): 209–232.
The Scotsman. 2008. CCTV: Does it actually work? The Scotsman, May 28. https://www.scotsman.com/news-2-15012/cctv-does-it-actually-work-1-1169849. Downloaded 8 Oct 2019.
Thomas, J., and K. Cook. 2006. A visual analytics agenda. IEEE Computer Graphics and Applications 26 (1): 10–13.
Tickner, A., and E. Poulton. 1973. Monitoring up to 16 synthetic television pictures showing a great deal of movement. Ergonomics 16: 381–401.
Uijlings, J.R., K.E. Van De Sande, T. Gevers, and A. Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision 104 (2): 154–171.
Wang, L., Y. Qiao, and X. Tang. 2014. Video action detection with relational dynamic-poselets. In European conference on computer vision, pp. 565–580. Cham: Springer.
Welsh, B., and D. Farrington. 2002. Crime prevention effects of closed circuit television: A systematic review. Home Office Research Study 252. London: Home Office.
Welsh, B., and D. Farrington. 2004. Evidence-based crime prevention: The effectiveness of CCTV. Crime Prevention and Community Safety 6: 21–33.
Welsh, B., and D. Farrington. 2009. Public area CCTV and crime prevention: An updated systematic review and meta-analysis. Justice Quarterly 26 (4): 716–745.
Welsh, B., D. Farrington, and S. Taheri. 2015. Effectiveness and social costs of public area surveillance for crime prevention. Annual Review of Law and Social Science 11: 111–130.
Yang, M., S. Ji, W. Xu, J. Wang, F. Lv, K. Yu, Y. Gong, M. Dikmen, D.J. Li, and T.S. Huang. 2009. Detecting human actions in surveillance video. In TREC video retrieval evaluation workshop.
Ye, G., D. Liu, J. Wang, and S. Chang. 2013. Large-scale video hashing via structure learning. In Proceedings of the IEEE international conference on computer vision, pp. 2272–2279.
Yeung, S., A. Fathi, and L. Fei-Fei. 2014. Videoset: Video summary evaluation through text. arXiv preprint. arXiv:1406.5824.
Zhao, B., and E.P. Xing. 2014. Quasi real-time summarization for consumer videos. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Zhou, Z.H. 2018. A brief introduction to weakly supervised learning. National Science Review 5 (1): 44–53.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Chen, C., Surette, R. & Shah, M. Automated monitoring for security camera networks: promise from computer vision labs. Secur J 34, 389–409 (2021). https://doi.org/10.1057/s41284-020-00230-w
DOI: https://doi.org/10.1057/s41284-020-00230-w