Abstract
Scene text detection is important for a wide range of scientific and industrial processes. Compared with text detection in documents, text detection in natural scenes is challenging because the text is subject to varying orientations, scales, brightness, and complex backgrounds. Furthermore, scenes can contain multiple scripts, which limits the performance of detection algorithms. In this paper, we propose a state-of-the-art algorithm for text detection on a bilingual natural scene dataset. The framework consists of (a) a Faster R-CNN that extracts probable text regions from the scene images, (b) rearrangement of each text region as consecutive frames along the time axis, followed by extraction of global and local shape features from the three orthogonal planes, and (c) a simple and effective classifier that labels each region as text or non-text from its extracted features. Compared with other text detection techniques, the proposed framework improves overall text detection accuracy. The algorithm is validated on the bilingual text detection dataset MSRA-TD500, where it achieves a promising F1 score of 0.70.
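To make step (b) concrete, the following is a minimal sketch of gradient-orientation histograms collected from the three orthogonal planes (XY, XT, YT) of a frame volume. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how a region is rearranged into frames, so `region_to_volume` (equal-width vertical strips stacked along the time axis) is hypothetical, as are the function names, the 9-bin histogram, and the per-plane L2 normalisation.

```python
import numpy as np


def region_to_volume(region, n_frames=4):
    """Hypothetical rearrangement (an assumption, not taken from the paper):
    cut a 2-D grayscale region into equal-width vertical strips and stack
    them as consecutive frames, giving a volume of shape (T, H, W)."""
    h, w = region.shape
    strip_w = w // n_frames
    strips = [region[:, i * strip_w:(i + 1) * strip_w] for i in range(n_frames)]
    return np.stack(strips, axis=0).astype(np.float64)


def orientation_histogram(plane, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    for a single 2-D plane (each side must have at least 2 samples)."""
    g_rows, g_cols = np.gradient(plane)
    magnitude = np.hypot(g_cols, g_rows)
    # Fold signed angles into [0, pi) so opposite directions share a bin.
    angle = np.mod(np.arctan2(g_rows, g_cols), np.pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    return hist


def three_plane_features(volume, bins=9):
    """Accumulate orientation histograms over every XY, XT and YT slice of
    the volume, L2-normalise each plane's histogram, and concatenate."""
    t, h, w = volume.shape
    xy = sum(orientation_histogram(volume[k], bins) for k in range(t))
    xt = sum(orientation_histogram(volume[:, j, :], bins) for j in range(h))
    yt = sum(orientation_histogram(volume[:, :, i], bins) for i in range(w))
    parts = []
    for hist in (xy, xt, yt):
        norm = np.linalg.norm(hist)
        parts.append(hist / norm if norm > 0 else hist)
    return np.concatenate(parts)


if __name__ == "__main__":
    # Random stand-in for a cropped candidate region (no real data implied).
    region = np.random.default_rng(0).random((32, 64))
    features = three_plane_features(region_to_volume(region, n_frames=4))
    print(features.shape)
```

The resulting vector has `3 * bins` entries, one histogram per plane, and could be passed to any off-the-shelf classifier for the text versus non-text decision in step (c).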
Funding
Shantou University, NTF17016, Alex Noel Joseph Raj; Department of Science and Technology, Ministry of Science and Technology, INTRUSRFBR382, Ruban Nersisson; Basic and Applied Basic Research Foundation of Guangdong Province, No. 2020B1515120061, Zhemin Zhuang.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest or competing interests.
About this article
Cite this article
Joseph Raj, A.N., Junmin, C., Nersisson, R. et al. Bilingual text detection from natural scene images using faster R-CNN and extended histogram of oriented gradients. Pattern Anal Applic 25, 1001–1013 (2022). https://doi.org/10.1007/s10044-022-01066-3
DOI: https://doi.org/10.1007/s10044-022-01066-3