Abstract
Text detection is a fundamental step toward text recognition and understanding, and it supports many image analysis techniques. In this paper, we propose an effective scene text detection method comprising three major steps: connected component (CC) extraction, character linking, and text/non-text classification. First, for CC extraction, we design an adaptive color reduction scheme that analyzes the image color histogram to select color centers and generates a variable number of color layers according to the color complexity of each image. Then, for character linking, an adjacent character model is built by training an extreme learning machine (ELM), rather than by setting various thresholds as in previous approaches. Finally, a hybrid text verification strategy combines a convolutional neural network (CNN) with an ELM for text/non-text classification and performs better than either classifier alone. Experimental results on several publicly available datasets illustrate the effectiveness of our method, and comparisons with state-of-the-art algorithms demonstrate that it is competitive.
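To make the ELM component concrete, the following is a minimal sketch of a generic extreme learning machine classifier in the standard formulation: a fixed random hidden layer followed by output weights obtained in closed form by regularized least squares. It is not the authors' implementation. The three pair features (height ratio, normalized gap, color difference), the synthetic data, and the names `ELMClassifier` and `make_toy_pairs` are assumptions introduced here for illustration only; the abstract does not specify which features the adjacent character model uses.

```python
import numpy as np


class ELMClassifier:
    """Generic single-hidden-layer ELM: random hidden weights, closed-form output weights."""

    def __init__(self, n_hidden=50, reg=1e3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # regularization constant C (larger means a weaker penalty)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        t = np.where(np.asarray(y) == 1, 1.0, -1.0)  # targets in {-1, +1}
        # Hidden-layer parameters are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Ridge-regularized least squares: beta = (H^T H + I/C)^-1 H^T t
        A = H.T @ H + np.eye(self.n_hidden) / self.reg
        self.beta = np.linalg.solve(A, H.T @ t)
        return self

    def decision_function(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


def make_toy_pairs(n, rng):
    """Synthetic CC-pair features (hypothetical): label 1 = adjacent characters."""
    adjacent = np.column_stack([
        rng.normal(1.0, 0.1, n),   # height ratio close to 1
        rng.normal(0.5, 0.1, n),   # small normalized gap
        rng.normal(0.1, 0.05, n),  # similar color
    ])
    other = np.column_stack([
        rng.normal(2.5, 0.3, n),   # mismatched heights
        rng.normal(3.0, 0.5, n),   # large gap
        rng.normal(0.7, 0.1, n),   # different color
    ])
    X = np.vstack([adjacent, other])
    y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_train, y_train = make_toy_pairs(100, rng)
    X_test, y_test = make_toy_pairs(50, rng)
    model = ELMClassifier().fit(X_train, y_train)
    accuracy = (model.predict(X_test) == y_test).mean()
    print(f"held-out accuracy on toy pairs: {accuracy:.2f}")
```

Because only the output weights are fitted, training reduces to a single linear solve. A learned pair model of this kind replaces a set of separately chosen per-feature thresholds with one decision function over all features jointly.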
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable suggestions.
Additional information
This work is partly supported by the National Natural Science Foundation of China (Grant Nos. 61172184, 61379107, 61402539, and 61573380), Program for New Century Excellent Talents in University of Education Ministry in China (Grant No. NCET-13-0603), Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130162110016) and the Fundamental Research Funds for the Central Universities of Central South University (Grant No. 2015zzts052).
Cite this article
Wu, H., Zou, B., Zhao, Yq. et al. Scene text detection using adaptive color reduction, adjacent character model and hybrid verification strategy. Vis Comput 33, 113–126 (2017). https://doi.org/10.1007/s00371-015-1156-1
DOI: https://doi.org/10.1007/s00371-015-1156-1