Arbitrary shape scene text detector with accurate text instance generation based on instance-relevant contexts

Haiyan Li¹,² (ORCID: orcid.org/0000-0003-4674-9108), Yangsong Zhang¹, Bayram Bayramli¹ & Hongtao Lu¹

Abstract

Scene text detection methods based on deep segmentation have achieved promising performance in recent years. However, accurately detecting arbitrary shape text instances remains challenging, especially when the instances vary widely in pattern. In this paper, we propose an arbitrary shape text detector that learns and combines instance-relevant contexts to generate accurate text instances of different patterns in scene images. The instance-relevant contexts comprise two parts: the instance-aware context, which is learned to distinguish adjacent text instances, and the instance shape-aware context, which is learned by the shared segmentation-based subnet to indicate the distribution of text instances. The proposed instance formation algorithm then leverages the connectivity and similarity of a text instance to segment the corresponding instance polygon; the instance-relevant contexts guide this process so that dense text instances are separated effectively and complete arbitrary shape instances are reconstructed robustly. Moreover, the formed instance polygons are refined with the local geometric features of text strokes predicted by a trainable regression-based subnet, which helps alleviate the effect of imprecise pixel-level text annotations on boundary generation. Extensive experiments on four challenging datasets demonstrate that the proposed method effectively improves the detector's accuracy and robustness.
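
The abstract describes forming instances from connectivity and similarity but gives no procedure, so the following is only an illustrative sketch of that general idea, not the authors' algorithm. It assumes seeded region growing: each instance starts from a seed region, and an unlabeled text pixel joins a neighboring instance only if it is 4-connected to it and its embedding is close to the mean embedding of that instance's seed. The function name `form_instances`, the threshold `max_dist`, and the one-dimensional per-pixel embedding are all hypothetical simplifications.

```python
from collections import deque


def form_instances(text_mask, seed_labels, embeddings, max_dist=0.5):
    """Grow seed labels over text pixels (hypothetical sketch).

    text_mask:   2D list of 0/1, 1 marks a predicted text pixel
    seed_labels: 2D list of ints, 0 = unlabeled, k > 0 = seed of instance k
    embeddings:  2D list of floats, a 1-D embedding per pixel (assumption)
    max_dist:    largest allowed |embedding - seed mean| for a pixel to join
    """
    h, w = len(text_mask), len(text_mask[0])
    labels = [row[:] for row in seed_labels]  # do not modify the input

    # Mean embedding of each seed region stands in for the instance identity.
    sums, counts = {}, {}
    for y in range(h):
        for x in range(w):
            k = labels[y][x]
            if k > 0:
                sums[k] = sums.get(k, 0.0) + embeddings[y][x]
                counts[k] = counts.get(k, 0) + 1
    means = {k: sums[k] / counts[k] for k in sums}

    # Breadth-first growth from all seed pixels at once.
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x] > 0)
    while queue:
        y, x = queue.popleft()
        k = labels[y][x]
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            # Connectivity: only unlabeled text pixels adjacent to instance k.
            if not text_mask[ny][nx] or labels[ny][nx] != 0:
                continue
            # Similarity: the pixel must be close to the seed mean of k.
            if abs(embeddings[ny][nx] - means[k]) <= max_dist:
                labels[ny][nx] = k
                queue.append((ny, nx))
    return labels


# Two touching instances in a single row of text pixels.
mask = [[1, 1, 1, 1, 1, 1]]
seeds = [[1, 0, 0, 0, 0, 2]]
emb = [[0.0, 0.1, 0.1, 0.1, 0.9, 1.0]]
print(form_instances(mask, seeds, emb))  # [[1, 1, 1, 1, 2, 2]]
```

In this toy row, connectivity alone would split the pixels evenly between the two seeds; the similarity test is what assigns the fourth pixel to instance 1, which mirrors the abstract's point about separating dense, adjacent text.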

Acknowledgements

This work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2022D01A17), the NSFC (No. 62176155), and the Shanghai Municipal Science and Technology Major Project, China (grant no. 2021SHZDZX0102). The authors would like to thank all editors and reviewers for their helpful suggestions and constructive comments.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, MOE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, China
Haiyan Li, Yangsong Zhang, Bayram Bayramli & Hongtao Lu
Department of Computer Science and Technology, Kashi University, Kashgar, 844000, China
Haiyan Li

Corresponding author

Correspondence to Haiyan Li.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Zhang, Y., Bayramli, B. et al. Arbitrary shape scene text detector with accurate text instance generation based on instance-relevant contexts. Multimed Tools Appl 82, 17827–17852 (2023). https://doi.org/10.1007/s11042-022-13897-7

Received: 23 March 2021
Revised: 01 July 2022
Accepted: 12 September 2022
Published: 19 October 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11042-022-13897-7
