Arbitrary shape scene text detector with accurate text instance generation based on instance-relevant contexts

Multimedia Tools and Applications

Abstract

Scene text detection methods based on deep segmentation techniques have achieved promising performance in recent years. However, accurately detecting arbitrary shape text instances remains challenging, especially when the instances vary widely in pattern. In this paper, we propose an arbitrary shape text detector that learns and combines instance-relevant contexts to generate accurate text instances of different patterns in scene images. Besides the instance-aware context, which is learned to distinguish adjacent text instances, the instance-relevant contexts also include the instance shape-aware context, which is learned by the shared segmentation-based subnet to indicate the distribution of text instances. The proposed instance formation algorithm then leverages the connectivity and similarity of a text instance to segment the corresponding instance polygon; the instance-relevant contexts guide this process so that dense text instances are separated effectively and complete arbitrary shape instances are reconstructed robustly. Moreover, the formed instance polygons are refined with the local geometric features of text strokes predicted by a trainable regression-based subnet, which helps alleviate the effect of imprecise pixel-level text annotations and yields more accurate boundaries. Extensive experiments on four challenging datasets demonstrate that the proposed method effectively improves the detection accuracy and robustness of the text detector.
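
The idea of forming instances from connectivity and similarity can be illustrated with a small region-growing sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a minimal illustration under stated assumptions: the shape-aware context is reduced to a binary text mask, the instance-aware context is reduced to one scalar embedding per pixel, and the names `form_instances` and `sim_thresh` are hypothetical.

```python
from collections import deque


def form_instances(text_mask, embedding, sim_thresh=0.5):
    """Label text pixels by region growing (illustrative sketch only).

    A pixel joins an instance only if it is 4-connected to a pixel already
    in that instance (connectivity) and its embedding lies within
    sim_thresh of the instance's seed embedding (similarity).
    """
    h, w = len(text_mask), len(text_mask[0])
    labels = [[0] * w for _ in range(h)]  # 0 means background / unlabeled
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not text_mask[sy][sx] or labels[sy][sx]:
                continue
            # Start a new instance from the first unlabeled text pixel.
            count += 1
            seed = embedding[sy][sx]
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    if not text_mask[ny][nx] or labels[ny][nx]:
                        continue
                    if abs(embedding[ny][nx] - seed) > sim_thresh:
                        continue  # connected, but belongs to another instance
                    labels[ny][nx] = count
                    queue.append((ny, nx))
    return labels, count


# Two touching instances (columns 0-1 and 2-3) plus one isolated instance.
mask = [[1, 1, 1, 1, 0, 1],
        [1, 1, 1, 1, 0, 1]]
emb = [[0.0, 0.1, 1.0, 1.1, 0.0, 2.0],
       [0.0, 0.1, 1.0, 0.9, 0.0, 2.0]]
labels, count = form_instances(mask, emb)
print(count)
print(labels)
```

In this toy input, connectivity alone would merge columns 0 to 3 into one region, because the mask is contiguous there; the similarity test on the embeddings is what splits them into two instances.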

Acknowledgements

This work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2022D01A17), the NSFC (No. 62176155), and the Shanghai Municipal Science and Technology Major Project, China (No. 2021SHZDZX0102). The authors thank all editors and reviewers for their helpful suggestions and constructive comments.

Author information

Corresponding author

Correspondence to Haiyan Li.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Zhang, Y., Bayramli, B. et al. Arbitrary shape scene text detector with accurate text instance generation based on instance-relevant contexts. Multimed Tools Appl 82, 17827–17852 (2023). https://doi.org/10.1007/s11042-022-13897-7

  • DOI: https://doi.org/10.1007/s11042-022-13897-7
