Bilingual text detection from natural scene images using faster R-CNN and extended histogram of oriented gradients

  • Industrial and Commercial Application
  • Pattern Analysis and Applications

Abstract

Scene text detection is important for a wide range of scientific and industrial processes. Compared with text detection in documents, text detection in natural scenes is challenging because scene text is subject to varying orientations, scales, brightness, and complex backgrounds. Furthermore, scenes can contain multiple scripts, which limits the performance of detection algorithms. In this paper, we propose a state-of-the-art algorithm for text detection on a bilingual natural scene dataset. The framework consists of (a) Faster R-CNN, employed to extract probable text regions from the scene images; (b) rearrangement of the text regions as consecutive frames along the time axis, followed by extraction of global and local shape features from the three orthogonal planes; and (c) a simple and effective classifier that labels the features extracted from each region as text or non-text. Compared with other text detection techniques, the proposed framework improves overall text detection accuracy. The algorithm is validated on the bilingual text detection dataset MSRA-TD500, where it achieves a promising F1 score of 0.70.
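Step (b) of the framework can be illustrated with a minimal sketch. The preview does not give the details of the extended histogram of oriented gradients, so the code below is not the authors' method: it is a generic orthogonal-planes gradient histogram, assuming grayscale candidate regions that are resized to a common size, stacked as frames, and summarized by magnitude-weighted orientation histograms over the XY, XT, and YT slices. The function names (`resize_nearest`, `orientation_histogram`, `three_plane_descriptor`), the bin count, and the nearest-neighbour resizing are all illustrative assumptions.

```python
import numpy as np


def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D grayscale array to (height, width)."""
    h, w = size
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[np.ix_(rows, cols)]


def orientation_histogram(plane, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations of one 2-D slice."""
    g0, g1 = np.gradient(plane.astype(np.float64))  # derivatives along axis 0 and axis 1
    magnitude = np.hypot(g0, g1)
    angle = np.mod(np.arctan2(g0, g1), np.pi)  # fold orientations into [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    return hist


def three_plane_descriptor(volume, bins=9):
    """Concatenate orientation histograms accumulated over XY, XT and YT slices.

    `volume` has shape (T, H, W): T stacked regions treated as frames along a
    time axis. Each axis needs at least two samples for np.gradient.
    """
    t, h, w = volume.shape
    xy = sum(orientation_histogram(volume[k], bins) for k in range(t))
    xt = sum(orientation_histogram(volume[:, j, :], bins) for j in range(h))
    yt = sum(orientation_histogram(volume[:, :, i], bins) for i in range(w))
    feature = np.concatenate([xy, xt, yt])
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature  # L2-normalise when possible


# Hypothetical usage: three random "regions" stand in for detector proposals.
rng = np.random.default_rng(0)
regions = [rng.random((20, 30)), rng.random((16, 40)), rng.random((25, 25))]
volume = np.stack([resize_nearest(r, (16, 32)) for r in regions])
feature = three_plane_descriptor(volume)
```

The resulting vector has `3 * bins` entries and could be passed to any off-the-shelf classifier for the text versus non-text decision in step (c); which classifier the paper uses is not stated in the preview.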

Funding

Shantou University (NTF17016), Alex Noel Joseph Raj; Department of Science and Technology, Ministry of Science and Technology (INTRUSRFBR382), Ruban Nersisson; Basic and Applied Basic Research Foundation of Guangdong Province (No. 2020B1515120061), Zhemin Zhuang.

Author information

Corresponding author

Correspondence to Ruban Nersisson.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest or competing interest.

About this article

Cite this article

Joseph Raj, A.N., Junmin, C., Nersisson, R. et al. Bilingual text detection from natural scene images using faster R-CNN and extended histogram of oriented gradients. Pattern Anal Applic 25, 1001–1013 (2022). https://doi.org/10.1007/s10044-022-01066-3

  • DOI: https://doi.org/10.1007/s10044-022-01066-3