Abstract
Intelligent video surveillance is one of the most challenging tasks in computer vision because it demands high reliability, real-time processing, and robustness on low-resolution video. In this paper we address these challenges with a unified system for indexing and retrieval based on recent advances in deep learning. We show that a single-stage object detector such as YOLOv2 can serve as a very efficient tool for event detection, key frame selection, and scene recognition. The motivation behind our approach is that the feature maps computed by the deep detector encode not only the categories of the objects present in the image but also their locations, which automatically discards background information. We also address the problem of low video quality by introducing a lightweight convolutional network for object description and retrieval. Preliminary experimental results on several video surveillance datasets demonstrate the effectiveness of the proposed system.
Supported by Foxstream: http://www.foxstream.fr.
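The idea that detector outputs carry both category and location, and can therefore drive key frame selection, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses already-decoded detections as a stand-in for the detector's feature maps, and the names `frame_descriptor`, `cosine_distance`, and `select_key_frames`, the grid size, and the threshold are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumed input format (not from the paper): one detection is
# (class_id, confidence, x_center, y_center), with coordinates in [0, 1].
Detection = Tuple[int, float, float, float]


def frame_descriptor(dets: Sequence[Detection], num_classes: int, grid: int = 4) -> List[float]:
    """Build a grid x grid x num_classes occupancy vector for one frame.

    Each entry holds the highest confidence of a given class in a given
    grid cell. Cells without detections stay at zero, so background
    contributes nothing to the descriptor.
    """
    desc = [0.0] * (grid * grid * num_classes)
    for class_id, conf, x, y in dets:
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        idx = (row * grid + col) * num_classes + class_id
        desc[idx] = max(desc[idx], conf)
    return desc


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, with explicit handling of empty frames."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 and norm_b == 0.0:
        return 0.0  # two empty frames are treated as identical
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an object appeared or disappeared entirely
    return 1.0 - dot / (norm_a * norm_b)


def select_key_frames(
    frames: Sequence[Sequence[Detection]],
    num_classes: int,
    threshold: float = 0.3,  # hypothetical value, would need tuning
    grid: int = 4,
) -> List[int]:
    """Keep a frame when its descriptor differs enough from the last key frame.

    The first frame is always kept as the initial reference.
    """
    key_frames: List[int] = []
    last_desc = None
    for index, dets in enumerate(frames):
        desc = frame_descriptor(dets, num_classes, grid)
        if last_desc is None or cosine_distance(desc, last_desc) > threshold:
            key_frames.append(index)
            last_desc = desc
    return key_frames
```

With this sketch, a frame becomes a key frame when an object class appears, disappears, or moves to a different grid cell, while small shifts inside one cell are ignored. A real system would more plausibly compare the detector's intermediate feature maps, as the abstract suggests, rather than this coarse occupancy vector.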
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Durand, T., He, X., Pop, I., Robinault, L. (2019). Utilizing Deep Object Detector for Video Surveillance Indexing and Retrieval. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_41
DOI: https://doi.org/10.1007/978-3-030-05716-9_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science, Computer Science (R0)