Automatic video scene text detection based on saliency edge map

Xiaodong Huang ORCID: orcid.org/0000-0002-7953-750X¹

Abstract

Video scene text carries valuable information for scene understanding, because it gives people important semantic clues about their environment. Detecting text in natural scenes is challenging because of low resolution, low contrast, cluttered backgrounds and varying illumination. This paper therefore proposes a new approach that detects video scene text from a saliency edge map, which combines saliency and edge features. The saliency map helps to detect text against cluttered backgrounds, whereas the edge map suits scene text with low resolution and varying illumination. First, we compute the saliency map and the edge map of the video frame/image separately. The saliency map retains most of the salient regions in the video frame/image and thereby removes some of the complicated background. The edge map captures edge features, which are insensitive to illumination changes and to low-resolution/low-contrast regions. We then integrate the edge map and the saliency map into a saliency edge map (SEM), which preserves the advantages of both. Finally, a Gaussian mixture model (GMM) divides the SEM into three kinds of components (bright characters, dark characters and background), and we perform connected component analysis on these three components to obtain the text regions. Experimental evaluations on public datasets (ICDAR 2003, ICDAR 2013, MSRA-TD500 and SVT) and on a news video dataset demonstrate that our method significantly outperforms four other text detection algorithms in terms of recall, precision, F-score and detection speed, especially under challenges such as text with different alignments, character sizes, languages and appearances, and uneven illumination.
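The pipeline summarized in the abstract (saliency map, edge map, fusion into an SEM, three-component GMM, connected component analysis) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the saliency detector, the edge detector or the fusion rule, so the toy saliency (deviation from the mean gray level), the toy edge strength (gradient magnitude) and the pixelwise product used as fusion are all placeholders, and every function name is hypothetical. The sketch orders the mixture components by mean only; how the paper maps them to bright characters, dark characters and background is not reproduced here.

```python
from collections import deque
import numpy as np

def normalize(a):
    """Scale an array to [0, 1]; a constant array maps to zeros."""
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a, dtype=float)

def saliency_map(gray):
    """Toy saliency (assumption): absolute deviation from the mean gray level."""
    return normalize(np.abs(gray - gray.mean()))

def edge_map(gray):
    """Toy edge strength (assumption): finite-difference gradient magnitude."""
    gy, gx = np.gradient(gray.astype(float))
    return normalize(np.hypot(gx, gy))

def saliency_edge_map(gray):
    """Assumed fusion rule: pixelwise product of the two normalized maps."""
    return saliency_map(gray) * edge_map(gray)

def fit_gmm_1d(x, k=3, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances, labels), components sorted by mean.
    """
    x = x.astype(float).ravel()
    means = np.linspace(x.min(), x.max(), k)
    variances = np.full(k, x.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities, computed in the log domain for stability.
        log_p = (np.log(weights + 1e-12)
                 - 0.5 * np.log(2 * np.pi * variances)
                 - (x[:, None] - means) ** 2 / (2 * variances))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters; small constants guard empty components.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    order = np.argsort(means)
    # Hard labels from the last E-step, renumbered so 0 is the lowest mean.
    labels = np.argmax(resp[:, order], axis=1)
    return weights[order], means[order], variances[order], labels

def connected_boxes(mask):
    """Bounding boxes (top, left, bottom, right), inclusive, of 4-connected regions."""
    rows, cols = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        top, left, bottom, right = r0, c0, r0, c0
        while queue:
            r, c = queue.popleft()
            top, bottom = min(top, r), max(bottom, r)
            left, right = min(left, c), max(right, c)
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        boxes.append((int(top), int(left), int(bottom), int(right)))
    return boxes

def detect(gray):
    """Run the sketch: SEM -> 3-component GMM -> connected components per class."""
    sem = saliency_edge_map(gray)
    _, _, _, labels = fit_gmm_1d(sem, k=3)
    labels = labels.reshape(sem.shape)
    boxes = {k: connected_boxes(labels == k) for k in range(3)}
    return sem, labels, boxes

# Synthetic frame: mid-gray background with one bright and one dark block.
frame = np.full((32, 48), 0.5)
frame[8:16, 6:14] = 1.0
frame[8:16, 30:38] = 0.0
sem, labels, boxes = detect(frame)
```

In a real system the two toy maps would be replaced by proper saliency and edge detectors, and the candidate boxes would still need filtering to reject non-text regions; the sketch only shows how the stages connect.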

Acknowledgments

The work reported in this paper is supported by the Beijing Natural Science Foundation (4173073) and the Surface Project of Beijing Committee of Education under Grant No. KM201710028021.

Author information

Authors and Affiliations

Capital Normal University, Beijing, 100048, China
Xiaodong Huang

Corresponding author

Correspondence to Xiaodong Huang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, X. Automatic video scene text detection based on saliency edge map. Multimed Tools Appl 78, 34819–34838 (2019). https://doi.org/10.1007/s11042-019-08045-7

Received: 08 October 2018
Revised: 15 July 2019
Accepted: 26 July 2019
Published: 10 August 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11042-019-08045-7
