Abstract
Scene text in video provides important semantic clues that help people sense their environment, so it carries valuable information for scene understanding. Detecting text in natural scenes is challenging because of low resolution, low contrast, cluttered backgrounds, and varying illumination. This paper therefore proposes a new approach to video scene text detection based on a saliency edge map (SEM), which combines saliency and edge features. The saliency map helps detect text against cluttered backgrounds, whereas the edge map suits text with low resolution and varying illumination. First, we compute the saliency map and the edge map of the video frame or image separately. The saliency map keeps most of the salient regions in the frame and thereby removes some of the complicated background. The edge map captures edge features, which are insensitive to illumination changes and remain usable in low-resolution or low-contrast regions. We then integrate the two maps into the SEM, which preserves the advantages of both. Finally, using a Gaussian mixture model (GMM), we divide the SEM into three kinds of components: bright characters, dark characters, and background, and we perform connected component analysis on these components to obtain the text regions. Experimental evaluations on public datasets such as ICDAR 2003, ICDAR 2013, MSRA-TD500, and SVT, and on a news video dataset, demonstrate that our method significantly outperforms four other text detection algorithms in recall, precision, F-score, and detection speed, especially for challenging text with different alignments, character sizes, languages, and appearances, or under uneven illumination.
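The final stage described above (a three-component GMM followed by connected component analysis) can be illustrated with a minimal sketch. It is not the paper's implementation. The toy grid `sem`, the helper names `fit_gmm_1d`, `classify`, and `connected_components`, the use of plain EM on scalar pixel values, the treatment of the middle component as background, and 4-connectivity are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from collections import deque


def fit_gmm_1d(values, k=3, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative)."""
    lo, hi = min(values), max(values)
    # Deterministic start: means spread evenly over the value range.
    means = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    variances = [((hi - lo) / (2 * k)) ** 2 + 1e-6] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances (with a floor).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, values)) / nj + 1e-6
    # Sort by mean so index 0 is the darkest and index k-1 the brightest.
    return sorted(zip(weights, means, variances), key=lambda c: c[1])


def classify(x, params):
    """Return the index of the most likely component (log domain)."""
    scores = [math.log(w + 1e-300) - 0.5 * math.log(v) - (x - m) ** 2 / (2 * v)
              for w, m, v in params]
    return scores.index(max(scores))


def connected_components(mask):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected True regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Hypothetical toy SEM: mid values stand in for background, high values for
# bright characters, low values for dark characters (an assumption).
sem = [
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.9, 0.9, 0.5, 0.1, 0.5],
    [0.5, 0.9, 0.9, 0.5, 0.1, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
]
params = fit_gmm_1d([v for row in sem for v in row])
labels = [[classify(v, params) for v in row] for row in sem]
bright_boxes = connected_components([[l == 2 for l in row] for row in labels])
dark_boxes = connected_components([[l == 0 for l in row] for row in labels])
print("bright:", bright_boxes, "dark:", dark_boxes)
```

In practice one would more likely use library routines for both steps (for example a mixture-model fitter and an image labeling function) on real saliency and edge maps; the pure-Python version is used here only to keep the sketch self-contained.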
Acknowledgments
The work reported in this paper is supported by the Beijing Natural Science Foundation (4173073) and by the Surface Project of Beijing Committee of Education under Grant No. KM201710028021.
Cite this article
Huang, X. Automatic video scene text detection based on saliency edge map. Multimed Tools Appl 78, 34819–34838 (2019). https://doi.org/10.1007/s11042-019-08045-7