Automated monitoring for security camera networks: promise from computer vision labs

  • Original Article
  • Security Journal

Abstract

A substantial increase in the number of surveillance camera systems has not delivered the promised deterrent effects or investigative case evidence, and the systems' usefulness has been underwhelming. A potential solution to the practical needs of camera monitors is computer vision (CV)-enhanced camera networks that can provide automated real-time video analysis, quick processing of monitors' query-based searches, and accurate summaries of archived video files. The development and testing of four CV algorithms in computer vision laboratories are presented, and the implications for society of their possible adoption by security agencies are discussed.

Notes

  1. Monitor perception failure occurs when there is little visual change present in long video stream stretches and a monitor’s attention shifts to non-visual tasks such as conversations or daydreaming (Bredemeier and Simons 2012; Fougnie and Marois 2007; Mack and Rock 1998; Memmert 2006; Most et al. 2005; Sasse 2010). Monitoring video streams has been reported to significantly increase perceptual failure (Hyman et al. 2009; Most et al. 2005).

  2. The first alternate approach (termed tubelets) utilized selective sampling to produce sequences of bounding boxes for action localization (Jain et al. 2014). The second competing approach, termed poselets, developed a relational model for action detection which first decomposes human actions into temporal ‘key poses’ and then into spatial ‘action parts’ (Wang et al. 2014). Our method (T-CNN) was compared with these two prior state-of-the-art action detection approaches on our collected real-world action detection dataset, which includes crime-related events/actions such as fighting, car accidents, and robbery. The ROC curves of the three approaches were plotted. At each false-positive rate, the higher the true-positive rate, the more accurately the method detects actions. However, our method sometimes missed real events. For example, when the event/action region is very small in the video because of a long-distance camera view, our method missed the detection because too little information was available. False positives occur when events/actions are very similar in appearance or motion, such as robbery and burglary; the similarity confuses the CV algorithm, which may, for example, label a robbery as a burglary. The figure below reflects that the T-CNN approach was superior to the two alternate CV approaches.

    [Figure a: ROC curves for the T-CNN, tubelet, and poselet approaches]

    A few real-world action/event detection examples using the computer vision algorithm can be viewed here: https://docs.google.com/presentation/d/1MINyHYIuotHTttUrjSKdCIKuR_LrW4eNCht_0kiDjgU/edit?usp=sharing.

  3. The method generated a regression model such that anomalous video segment instances have higher anomaly R² scores than normal segments. The anomaly scores are not identical to the R² values familiar to social science research, but they analogously vary between 0 and 1, with scores closer to 1 denoting more anomalous video clips. To manage output levels and avoid event swamping, the threshold value can be raised (closer to 1) to decrease, or lowered (closer to 0) to increase, the number of clips labeled as anomalies.

  4. The first dataset, UT Egocentric (UTE), included four daily-life egocentric videos, each 3–5 h long, compiled by Ghosh et al. (2012). The second dataset, a set of television episodes, contained four videos, each roughly 45 min long, from Yeung et al. (2014).

  5. Recall refers to the share of the actual events in a video that a method correctly identifies, and precision refers to the share of identified instances that are correct. For example, suppose a query was to identify police cars in a video that contained 10 police cars (its ground truth). If 8 of those police cars were correctly identified, recall would be 8 of 10, or 80%. If instead the method returned 8 segments and 1 of them was not a police car, precision would be 7 of 8, or 87.5% (a precision error of 12.5%), and recall would be 7 of 10, or 70%. In practical terms, the goal is to maximize recall and minimize the precision error.
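
The score thresholding in note 3 and the recall and precision arithmetic in note 5 can be sketched together as follows. This is a minimal illustration, not the authors' implementation: it assumes a clip is flagged when its score is at or above the threshold, it uses the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN), and the function names, scores, and labels are hypothetical.

```python
from typing import Sequence


def flag_anomalies(scores: Sequence[float], threshold: float) -> list[bool]:
    """Flag a clip when its score (0 to 1) is at or above the threshold.

    Assumption: the flagging rule is 'score >= threshold'.
    """
    return [s >= threshold for s in scores]


def precision_recall(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> tuple[float, float]:
    """Return (precision, recall) for flagged clips against ground truth."""
    tp = sum(p and a for p, a in zip(predicted, actual))        # correct detections
    fp = sum(p and not a for p, a in zip(predicted, actual))    # false alarms
    fn = sum((not p) and a for p, a in zip(predicted, actual))  # missed events
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 12 clips, the first 10 of which truly contain a police car.
actual = [True] * 10 + [False] * 2
scores = [0.9, 0.8, 0.85, 0.7, 0.95, 0.6, 0.75, 0.3, 0.2, 0.4, 0.65, 0.1]

for threshold in (0.5, 0.25):
    predicted = flag_anomalies(scores, threshold)
    precision, recall = precision_recall(predicted, actual)
    print(
        f"threshold={threshold}: flagged={sum(predicted)}, "
        f"precision={precision:.3f}, recall={recall:.3f}"
    )
```

With these made-up scores, the 0.5 threshold flags 8 clips, 7 of them correct, so precision is 7 of 8 and recall is 7 of 10. Lowering the threshold to 0.25 flags 10 clips, which shows how the threshold controls the number of alerts a monitor receives.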

References

  • Abdi, H., D. Valentin, and B. Edelman. 1999. Neural networks. Thousand Oaks, CA: Sage.

  • Adams, A., and J. Ferryman. 2015. The future of video analytics for surveillance and its ethical implications. Security Journal 28 (3): 272–289.

  • Alexandrie, G. 2017. Surveillance cameras and crime: A review of randomized and natural experiments. Journal of Scandinavian Studies of Criminology and Crime Prevention 18 (2): 210–222.

  • Andrews, S., I. Tsochantaridis, and T. Hofmann. 2003. Support vector machines for multiple-instance learning. In Advances in neural information processing systems, 577–584. Cambridge: MIT.

  • Ashby, M.P. 2017. The value of CCTV surveillance cameras as an investigative tool: An empirical analysis. European Journal on Criminal Policy and Research 23 (3): 441–459.

  • Baldwin, D.A., and J.A. Baird. 2001. Discerning intentions in dynamic human action. Trends in Cognitive Sciences 5: 171–178.

  • Barrett, H., P. Todd, G. Miller, and P.W. Blythe. 2005. Accurate judgments of intention from motion cues alone: A cross-cultural study. Evolution and Human Behavior 26: 313–331.

  • Bredemeier, K., and D. Simons. 2012. Working memory and inattentional blindness. Psychonomic Bulletin and Review 19: 239–244.

  • Bulwa, D., and M.B. Stannard. 2007. Is it worth the cost? San Francisco Chronicle, August 17. https://www.sfgate.com/news/article/Is-it-worth-the-cost-2546948.php. Downloaded 8 Oct 2019.

  • Chen, B.W., J.-C. Wang, and J.F. Wang. 2009. A novel video summarization based on mining the story-structure & semantic relations among concept entities. IEEE Transactions on Multimedia 11 (2): 295–312.

  • Coetzer, B., B. Josephs, and J. van der Merwe. 2011. Information management and video analytics: The future of intelligent video surveillance. Rijeka: INTECH Open Access Publisher.

  • Davenport, J. 2007. Tens of thousands of CCTV cameras, yet 80% of crime unsolved. Evening Standard, September 19. https://www.standard.co.uk/news/tens-of-thousands-of-cctv-cameras-yet-80-of-crime-unsolved-6684359.html. Downloaded 8 Oct 2019.

  • Donald, C. 2005. How many monitors should a CCTV operator view. CCTV Image, Spring, 35–36.

  • Dowling, C., A. Morgan, A. Gannoni, and P. Jorna. 2019. How do police use CCTV footage in criminal investigations? Trends and Issues in Crime and Criminal Justice 575: 1–14.

  • Edwards, R. 2008. Police say CCTV is an ‘utter fiasco’. The Telegraph, May 6. https://www.telegraph.co.uk/news/uknews/1932769/Police-say-CCTV-is-utter-fiasco-as-most-footage-is-unusable.html. Downloaded 8 Oct 2019.

  • Edwards, R. 2009. Seven of ten murders solved by CCTV. The Telegraph, January 1. https://www.telegraph.co.uk/news/uknews/law-and-order/4060443/Seven-of-ten-murders-solved-by-CCTV.html. Downloaded 8 Oct 2019.

  • Evangelopoulos, G., A. Zlatintsi, G. Skoumas, K. Rapantzikos, A. Potamianos, P. Maragos, and Y. Avrithis. 2009. Video event detection and summarization using audio, visual and text saliency. In ICASSP IEEE international conference on acoustics, speech and signal processing.

  • Faber, L., N. Maurits, and M. Lorist. 2012. Mental fatigue affects visual selective attention. PLoS ONE 7 (10): e48073.

  • Ferguson, A. 2017. Policing predictive policing. Washington University Law Review 94: 1115–1194.

  • Fougnie, D., and R. Marois. 2007. Executive working memory load induces inattentional blindness. Psychonomic Bulletin and Review 14 (1): 142–147.

  • Gao, Y., D. Wang, J. Yong, and H. Gu. 2009. Dynamic video summarization using two level redundancy detection. Multimedia Tools and Applications 42 (2): 233–250.

  • Gerell, M. 2016. Hot spot policing with actively monitored CCTV Cameras. International Criminal Justice Review 24 (2): 187–201.

  • Ghosh, J., Y.J. Lee, and K. Grauman. 2012. Discovering important people and objects for egocentric video summarization. In IEEE conference on computer vision and pattern recognition, Providence, RI, pp. 1346–1353.

  • Gill, M. 2003. CCTV. Leicester: Perpetuity Press.

  • Gong, S., C.C. Loy, and T. Xiang. 2011. Security and surveillance. In Visual analysis of humans, 455–472. London: Springer.

  • Goold, B. 2004. CCTV and policing. Oxford: Oxford University Press.

  • Gowsikhaa, D., S. Abirami, and R. Baskaran. 2014. Automated human behavior analysis from surveillance videos: A survey. Artificial Intelligence Review 42 (4): 1–19.

  • Graham, S. 1996. CCTV: Big Brother or friendly eye in the sky? Town and Country Planning 65: 57–59.

  • Gygli, M., H. Grabner, and L. Van Gool. 2015. Video summarization by learning submodular mixtures of objectives. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Hesse, L. 2002. The transition from video motion detection to intelligent scene discrimination and target tracking in automated video surveillance systems. Security Journal 15 (2): 69–78.

  • Hier, S., J. Greenberg, K. Walby, and D. Lett. 2007. Media, communication and the establishment of public camera surveillance programmes in Canada. Media, Culture, and Society 29 (5): 727–751.

  • Honovich, J. 2008. Is public CCTV effective? July 7. https://ipvm.com/reports/is-public-cctv-effective. Downloaded 27 Sept 2019.

  • Hyman, I., E. Boss, S. Matthew, B. Wise, M. McKenzie, E. Kira, and J. Caggiano. 2009. Did you see the unicycling clown? Inattentional blindness while walking and talking on a cell phone. Applied Cognitive Psychology 24 (5): 597–607.

  • Idrees, H., M. Shah, and R. Surette. 2018. Enhancing camera surveillance using computer vision: A research note. Policing: An International Journal 41 (2): 292–307.

  • Jain, M., J. Van Gemert, H. Jégou, P. Bouthemy, and C.G. Snoek. 2014. Action localization with tubelets from motion. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 740–747.

  • Keval, H., and M. Sasse. 2010. “Not the usual suspects”: A study of factors reducing the effectiveness of CCTV. Security Journal 23 (2): 134–154.

  • Kuehne, H., H. Jhuang, E. Garrote, T. Poggio, and T. Serre. 2011. HMDB: a large video database for human motion recognition. In International conference on computer vision, pp. 2556–2563.

  • Kulesza, A., and B. Taskar. 2012. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning 5 (2–3): 123–286.

  • La Vigne, N., S. Lowry, J. Markman, and A. Dwyer. 2011. Evaluating the use of public surveillance cameras for crime control and prevention. Washington, D.C.: Urban Institute, Justice Policy Center. https://www.urban.org/UploadedPDF/412403-Evaluating-the-Use-of-Public-Surveillance-Cameras-for-Crime-Control-and-Prevention.pdf.

  • Leman-Langlois, S. 2002. The myopic panopticon: The social consequences of policing through the lens. Policing and Society 13 (1): 43–58.

  • Mack, A., and I. Rock. 1998. Inattentional blindness. Cambridge, MA: MIT Press.

  • Marx, G. 1988. Undercover: Police surveillance in America. Berkeley: University of California Press.

  • Memmert, D. 2006. The effects of eye movement, age, and expertise on inattentional blindness. Consciousness and Cognition 15 (3): 620–627. https://doi.org/10.1016/j.concog.2006.01.001.

  • Morgan, A., and M. Coughlan. 2018. Police use of CCTV on the rail network. Trends and Issues in Crime and Criminal Justice 561: 1–17.

  • Morgan, A., and C. Dowling. 2019. Does CCTV help police solve crime? Trends and Issues in Crime and Criminal Justice 576: 1–14.

  • Most, S.B., B.J. Scholl, E.R. Clifford, and D.J. Simons. 2005. What you see is what you set: Sustained inattentional blindness and the capture of awareness. Psychological Review 112 (1): 217–242.

  • Näsholm, E., S. Rohlfing, and J.D. Sauer. 2014. Pirate stealth or inattentional blindness? The effects of target relevance and sustained attention on security monitoring for experienced and naïve operators. PLoS ONE 9 (1): e86157. https://doi.org/10.1371/journal.pone.0086157.

  • Norris, C., and G. Armstrong. 1999. The Maximum Surveillance Society: The rise of CCTV. Oxford: Berg.

  • Piza, E., J. Caplan, and L. Kennedy. 2014a. CCTV as a tool for early police intervention: Preliminary lessons from nine case studies. Security Journal 30: 247–265. https://doi.org/10.1057/sj.2014.17.

  • Piza, E., J. Caplan, and L. Kennedy. 2014b. Is the punishment more certain? An analysis of CCTV detections and enforcement. Justice Quarterly 31 (6): 1015–1043.

  • Piza, E., B. Welsh, D. Farrington, and A. Thomas. 2019. CCTV surveillance for crime prevention: A 40-year systematic review with meta-analysis. Criminology & Public Policy 18: 135–159.

  • Prenzler, T., and E. Wilson. 2019. The Ipswich (Queensland) safe city program: an evaluation. Security Journal 32: 137–152.

  • Ratcliffe, J.H., T. Taniguchi, and R.B. Taylor. 2009. The crime reduction effects of public CCTV cameras: a multi-method spatial approach. Justice Quarterly 26 (4): 746–770.

  • Ren, S., K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp. 91–99.

  • Sandhu, A. 2017. ‘I’m glad that was on camera’: A case study of police officers’ perceptions of cameras. Policing and Society 29 (2): 223–235.

  • Sasse, A. 2010. Not seeing the crime for the cameras? Communications of the ACM 53: 22–25.

  • Scheitle, C.P., and C. Halligan. 2018. Explaining the adoption of security measures by places of worship: Perceived risk of victimization and organizational structure. Security Journal 31 (10): 1–23.

  • Shah, M. 2017. Project Report: Studying the impact of video analytics for pre, live and post event analysis on outcomes of criminal justice, July 2016–December 2016. Orlando, FL: University of Central Florida Center for Research on Computer Vision. Funded by U.S. Department of Justice, NIJ-2015-R2-CX-K025.

  • Surette, R. 2005. The thinking eye: Pros and cons of second generation CCTV surveillance systems. Policing: An International Journal of Police Strategies and Management 28 (1): 152–173.

  • Surette, R. 2006. CCTV and citizen guardianship suppression: A questionable proposition. Police Quarterly 9: 100–125.

  • Surette, R. 2015. Media, crime, and criminal justice: Images, realities, and policies. Stamford, CT: Cengage.

  • Taylor, E. 2010. Evaluating CCTV: Why the findings are inconsistent, inconclusive and ultimately irrelevant. Crime Prevention and Community Safety 12 (4): 209–232.

  • The Scotsman. 2008. CCTV: Does it actually work? The Scotsman, May 28. https://www.scotsman.com/news-2-15012/cctv-does-it-actually-work-1-1169849. Downloaded 8 Oct 2019.

  • Thomas, J., and K. Cook. 2006. A visual analytics agenda. IEEE Computer Graphics and Applications 26 (1): 10–13.

  • Tickner, A., and E. Poulton. 1973. Monitoring up to 16 synthetic television pictures showing a great deal of movement. Ergonomics 16: 381–401.

  • Uijlings, J.R., K.E. Van De Sande, T. Gevers, and A. Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision 104 (2): 154–171.

  • Wang, L., Y. Qiao, and X. Tang. 2014. Video action detection with relational dynamic-poselets. In European conference on computer vision, pp. 565–580. Cham: Springer.

  • Welsh, B., and D. Farrington. 2002. Crime prevention effects of closed circuit television: A systematic review. Home Office Research Study 252. London: Home Office.

  • Welsh, B., and D. Farrington. 2004. Evidence-based crime prevention: The effectiveness of CCTV. Crime Prevention and Community Safety 6: 21–33.

  • Welsh, B., and D. Farrington. 2009. Public area CCTV and crime prevention: An updated systematic review and meta-analysis. Justice Quarterly 26 (4): 716–745.

  • Welsh, B., D. Farrington, and S. Taheri. 2015. Effectiveness and social costs of public area surveillance for crime prevention. Annual Review of Law and Social Science 11: 111–130.

  • Yang, M., S. Ji, W. Xu, J. Wang, F. Lv, K. Yu, Y. Gong, M. Dikmen, D.J. Li, and T.S. Huang. 2009. Detecting human actions in surveillance video. In TREC video retrieval evaluation workshop.

  • Ye, G., D. Liu, J. Wang, and S. Chang. 2013. Large-scale video hashing via structure learning. In Proceedings of the IEEE international conference on computer vision, pp. 2272–2279.

  • Yeung, S., A. Fathi, and L. Fei-Fei. 2014. Videoset: Video summary evaluation through text. arXiv preprint. arXiv:1406.5824.

  • Zhao, B., and E.P. Xing. 2014. Quasi real-time summarization for consumer videos. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Zhou, Z.H. 2018. A brief introduction to weakly supervised learning. National Science Review 5 (1): 44–53.

Author information

Corresponding author

Correspondence to Ray Surette.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, C., Surette, R. & Shah, M. Automated monitoring for security camera networks: promise from computer vision labs. Secur J 34, 389–409 (2021). https://doi.org/10.1057/s41284-020-00230-w

  • DOI: https://doi.org/10.1057/s41284-020-00230-w
