Abstract
In recent years, several learning-based methods have been proposed to detect and locate humans in real time using convolutional neural networks (CNNs). However, these methods require high-performance graphics processing units (GPUs). To resolve this problem, a preprocessing procedure based on video segmentation is proposed to speed up face detection. In addition, an acceleration toolkit is employed in this study to perform face detection in real time on a standard central processing unit (CPU). Experimental results indicate that the proposed method achieves an F1-score of 93.2% and runs at 4.5 times real-time speed with one CPU on 155,883 test frames from the RAI dataset, YouTube, and YOUKU. Notably, when the video sequence contains fewer frames with human faces, the highest speed is nearly 18 times faster than without video segmentation.
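The idea of using segmentation to avoid unnecessary detector calls can be illustrated with a minimal sketch. The abstract does not specify the segmentation criterion or the rule for skipping frames, so everything here is an assumption: frames are modelled as flat sequences of pixel values, a shot boundary is declared when the mean absolute difference between consecutive frames exceeds a hypothetical `threshold`, and a shot is skipped entirely when its first frame contains no face. The names `split_into_shots`, `detect_with_shot_gating`, and the `detect_faces` callable are illustrative and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]          # hypothetical stand-in for a decoded frame
Box = Tuple[int, int, int, int]  # hypothetical face box (x, y, width, height)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_into_shots(frames: Sequence[Frame], threshold: float) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges, one per shot.

    Assumption: a new shot starts wherever consecutive frames differ
    by more than `threshold`. This is a stand-in criterion only.
    """
    if not frames:
        return []
    shots = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


def detect_with_shot_gating(
    frames: Sequence[Frame],
    detect_faces: Callable[[Frame], List[Box]],
    threshold: float,
) -> Tuple[List[List[Box]], int]:
    """Run the detector on the first frame of each shot and process the
    remaining frames of that shot only if the probe frame contains a face.

    Returns per-frame detections and the number of detector calls made.
    """
    results: List[List[Box]] = [[] for _ in frames]
    calls = 0
    for start, end in split_into_shots(frames, threshold):
        probe = detect_faces(frames[start])
        calls += 1
        results[start] = probe
        if not probe:
            continue  # assumed rule: no face in the probe frame, skip the shot
        for i in range(start + 1, end):
            results[i] = detect_faces(frames[i])
            calls += 1
    return results, calls
```

Under these assumptions, the saving grows with the share of shots that contain no faces, which is consistent with the abstract's observation that the largest speed-up occurs on sequences with few face frames. The skipping rule trades recall for speed: a face that enters partway through a shot would be missed by this sketch.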
Cite this article
Liu, H., Fan, Z., Chen, Q. et al. Enhancing face detection in video sequences by video segmentation preprocessing. Appl Intell 53, 2897–2907 (2023). https://doi.org/10.1007/s10489-022-03608-y
DOI: https://doi.org/10.1007/s10489-022-03608-y