Bilingual text detection from natural scene images using faster R-CNN and extended histogram of oriented gradients

Alex Noel Joseph Raj¹,
Chen Junmin¹,
Ruban Nersisson² (ORCID: orcid.org/0000-0003-1695-3618),
Vijayalakshmi G. V. Mahesh³ &
Zhemin Zhuang¹


Abstract

Scene text detection is important for a wide range of scientific and industrial processes. Compared with text detection in documents, text detection in natural scenes is challenging because scene text is subject to varying orientation, scale, and brightness, and often appears against complex backgrounds. Furthermore, a scene can contain multiple scripts, which limits the performance of detection algorithms. In this paper, we propose a text detection algorithm for bilingual natural scene images. The framework consists of (a) a Faster R-CNN that extracts probable text regions from the scene images; (b) rearrangement of each text region into consecutive frames along a time axis, followed by extraction of global and local shape features from the three orthogonal planes; and (c) a simple and effective classifier that labels each region as text or non-text from the extracted features. Compared with other text detection techniques, the proposed framework improves overall text detection accuracy. The algorithm is validated on the bilingual text detection dataset MSRA-TD500, on which it achieves an F1 score of 0.70.
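Step (b) of the framework, histogram-of-gradients features taken from three orthogonal planes of a stacked volume, can be illustrated with a minimal NumPy sketch. The abstract does not say how a region is split into frames, which HOG variant is used, or what exactly "global" and "local" mean, so everything below is an assumption: the hypothetical helper `to_volume` cuts the region into equal-width vertical strips stacked along a time axis T; `orientation_histogram` is a simplified HOG (magnitude-weighted, unsigned orientations, no block normalization); and `hog_top` treats "global" as one histogram per plane and "local" as per-cell histograms on a 2 x 2 grid, each summed over all slices of the XY, XT and YT plane types and L2-normalized.

```python
import numpy as np


def orientation_histogram(plane, bins=9, grid=(1, 1)):
    """Magnitude-weighted histogram of unsigned gradient orientations in [0, 180)
    degrees, computed per cell of a grid and concatenated. Simplified HOG:
    no block normalization and no bin interpolation."""
    plane = plane.astype(np.float64)
    gy, gx = np.gradient(plane)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # fold to unsigned orientation
    row_idx = np.array_split(np.arange(plane.shape[0]), grid[0])
    col_idx = np.array_split(np.arange(plane.shape[1]), grid[1])
    cells = []
    for r in row_idx:
        for c in col_idx:
            hist, _ = np.histogram(ang[np.ix_(r, c)], bins=bins,
                                   range=(0.0, 180.0),
                                   weights=mag[np.ix_(r, c)])
            cells.append(hist)
    return np.concatenate(cells)


def to_volume(region, num_frames=4):
    """HYPOTHETICAL rearrangement: cut a grayscale region (H x W) into
    `num_frames` equal-width vertical strips, left to right, and stack them
    along a new time axis, giving a volume of shape (T, H, W // T)."""
    h, w = region.shape
    strip = w // num_frames
    if num_frames < 2 or h < 2 or strip < 2:
        raise ValueError("region too small for the requested number of frames")
    frames = [region[:, t * strip:(t + 1) * strip] for t in range(num_frames)]
    return np.stack(frames).astype(np.float64)


def hog_top(region, num_frames=4, bins=9, grid=(2, 2)):
    """Gradient-histogram features from three orthogonal planes (XY, XT, YT).
    ASSUMPTION: 'global' = one histogram per plane, 'local' = per-cell
    histograms on `grid`; both are summed over every slice of a plane type."""
    vol = to_volume(region, num_frames)
    t_len, height, width = vol.shape
    planes = {
        "XY": [vol[t] for t in range(t_len)],         # each frame, (H, W)
        "XT": [vol[:, y, :] for y in range(height)],  # fixed row, (T, W)
        "YT": [vol[:, :, x] for x in range(width)],   # fixed column, (T, H)
    }
    parts = []
    for name in ("XY", "XT", "YT"):
        global_hist = sum(orientation_histogram(p, bins) for p in planes[name])
        local_hist = sum(orientation_histogram(p, bins, grid) for p in planes[name])
        for hist in (global_hist, local_hist):
            norm = np.linalg.norm(hist)
            parts.append(hist / norm if norm > 0 else hist)  # L2-normalize
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    region = rng.integers(0, 256, size=(32, 64))  # stand-in for a cropped proposal
    print(hog_top(region).shape)  # → (135,)
```

Because the histograms are summed over slices, the descriptor length (3 plane types x (9 + 4 x 9) bins = 135) does not depend on the region size, so proposals of different sizes yield comparable vectors. Such a vector could then be passed to whatever simple classifier is chosen for step (c); the abstract does not name one.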



Funding

Shantou University (NTF17016), Alex Noel Joseph Raj; Department of Science and Technology, Ministry of Science and Technology (INTRUSRFBR382), Ruban Nersisson; Basic and Applied Basic Research Foundation of Guangdong Province (No. 2020B1515120061), Zhemin Zhuang.

Author information

Authors and Affiliations

Key Laboratory of Digital Signal and Image Processing of Guangdong Province, Department of Electronic Engineering, College of Engineering, Shantou University, Shantou, 515063, China
Alex Noel Joseph Raj, Chen Junmin & Zhemin Zhuang
School of Electrical Engineering, Vellore Institute of Technology, Vellore, 632014, India
Ruban Nersisson
Department of Electronics and Communication Engineering, BMS Institute of Technology and Management, Bangalore, Karnataka, India
Vijayalakshmi G. V. Mahesh


Corresponding author

Correspondence to Ruban Nersisson.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Joseph Raj, A.N., Junmin, C., Nersisson, R. et al. Bilingual text detection from natural scene images using faster R-CNN and extended histogram of oriented gradients. Pattern Anal Applic 25, 1001–1013 (2022). https://doi.org/10.1007/s10044-022-01066-3


Received: 11 June 2020
Accepted: 12 March 2022
Published: 06 April 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s10044-022-01066-3
