Abstract
Scene text detection methods based on deep segmentation have achieved promising performance in recent years. However, accurately detecting arbitrary-shape text instances remains challenging, especially when text patterns are diverse. In this paper, we propose an arbitrary-shape text detector that learns and combines instance-relevant contexts to generate accurate text instances of different patterns in scene images. The instance-relevant contexts comprise the instance-aware context, which is learned to distinguish adjacent text instances, and the instance shape-aware context, which is learned by the shared segmentation-based subnet to indicate the distribution of text instances. The proposed instance formation algorithm then leverages the connectivity and similarity of a text instance to segment the corresponding instance polygon; the instance-relevant contexts guide this process so that dense text instances are separated effectively and complete arbitrary-shape instances are reconstructed robustly. Moreover, the formed instance polygons are refined with local geometric features of text strokes predicted by a trainable regression-based subnet, which helps alleviate the effect of imprecise pixel-level text annotations on boundary generation. Extensive experiments on four challenging datasets demonstrate that the proposed method effectively improves detection accuracy and robustness.
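To make the idea of forming instances from connectivity and similarity concrete, the following is a minimal sketch and not the authors' algorithm. It assumes three hypothetical inputs that the abstract does not specify: a binary text mask, a map of labelled seed regions (one per instance), and a per-pixel embedding standing in for the instance-aware context. The function name `form_instances`, the Euclidean distance test, and the `max_dist` threshold are all illustrative assumptions.

```python
from collections import deque

import numpy as np


def form_instances(text_mask, seed_labels, embedding, max_dist=0.5):
    """Grow labelled seeds over 4-connected text pixels with similar embeddings.

    text_mask:   (H, W) bool array, True where a pixel is text (assumed input).
    seed_labels: (H, W) int array, 0 for unlabelled, k > 0 for seed k (assumed input).
    embedding:   (H, W, D) float array of per-pixel features (assumed input).
    max_dist:    hypothetical similarity threshold on embedding distance.
    """
    height, width = text_mask.shape
    labels = seed_labels.copy()

    # Represent each instance by the mean embedding of its seed pixels.
    instance_ids = [int(i) for i in np.unique(seed_labels) if i != 0]
    centers = {i: embedding[seed_labels == i].mean(axis=0) for i in instance_ids}

    # Breadth-first expansion starting from every seed pixel.
    queue = deque(zip(*np.nonzero(seed_labels)))
    while queue:
        y, x = queue.popleft()
        inst = int(labels[y, x])
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < height and 0 <= nx < width):
                continue
            # Connectivity: only unlabelled text pixels can be claimed.
            if labels[ny, nx] != 0 or not text_mask[ny, nx]:
                continue
            # Similarity: the pixel must lie close to the instance center.
            if np.linalg.norm(embedding[ny, nx] - centers[inst]) > max_dist:
                continue
            labels[ny, nx] = inst
            queue.append((ny, nx))
    return labels
```

In this sketch the connectivity test keeps growth inside the text mask, while the similarity test stops one instance from spreading into a touching neighbour whose embedding differs, which is the role the abstract attributes to the instance-relevant contexts.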
Acknowledgements
This work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2022D01A17), NSFC (No. 62176155), and the Shanghai Municipal Science and Technology Major Project, China, under grant no. 2021SHZDZX0102. The authors would like to thank all editors and reviewers for their helpful suggestions and constructive comments.
Ethics declarations
Conflict of Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
About this article
Cite this article
Li, H., Zhang, Y., Bayramli, B. et al. Arbitrary shape scene text detector with accurate text instance generation based on instance-relevant contexts. Multimed Tools Appl 82, 17827–17852 (2023). https://doi.org/10.1007/s11042-022-13897-7