DOI: 10.1145/3343031.3350988
research-article
Open access

A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning

Published: 15 October 2019

Abstract

Detecting scene text of arbitrary shapes has been a challenging task in recent years. In this paper, we propose a novel segmentation-based text detector, namely SAST, which employs a context attended multi-task learning framework based on a Fully Convolutional Network (FCN) to learn various geometric properties for reconstructing polygonal representations of text regions. Taking the sequential characteristics of text into consideration, a Context Attention Block is introduced to capture long-range dependencies of pixel information and obtain a more reliable segmentation. In post-processing, a Point-to-Quad assignment method is proposed to cluster pixels into text instances by integrating both high-level object knowledge and low-level pixel information in a single shot. Moreover, the polygonal representation of arbitrarily-shaped text can be extracted much more effectively with the proposed geometric properties. Experiments on several benchmarks, including ICDAR2015, ICDAR2017-MLT, SCUT-CTW1500, and Total-Text, demonstrate that SAST achieves better or comparable performance in terms of accuracy. Furthermore, the proposed algorithm runs at 27.63 FPS on SCUT-CTW1500 with an Hmean of 81.0% on a single NVIDIA Titan Xp graphics card, surpassing most of the existing segmentation-based methods.
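
The Hmean quoted in the abstract is the harmonic mean of precision and recall (the F-measure) commonly reported on text detection benchmarks. The sketch below only illustrates that formula. The function names and the match counts are assumptions made for illustration and are not taken from the paper, and the benchmark-specific rule for matching detections to ground truth is not modeled.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(num_matched: int, num_detected: int, num_ground_truth: int):
    """Return (precision, recall, hmean) from match counts.

    num_matched: detections matched to a ground-truth region
    num_detected: all detections produced by the detector
    num_ground_truth: all annotated text regions
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, hmean(precision, recall)


# Hypothetical counts for illustration only, not results from the paper.
p, r, h = detection_scores(num_matched=80, num_detected=100, num_ground_truth=90)
print(f"precision={p:.3f} recall={r:.3f} hmean={h:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high Hmean by trading away recall for precision or the reverse.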

      Published In

      MM '19: Proceedings of the 27th ACM International Conference on Multimedia
      October 2019
      2794 pages
      ISBN:9781450368896
      DOI:10.1145/3343031
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. arbitrarily-shaped text detection
      2. fcn
      3. real-time segmentation

      Qualifiers

      • Research-article

      Acceptance Rates

      MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;
      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Article Metrics

      • Downloads (Last 12 months): 209
      • Downloads (Last 6 weeks): 15
      Reflects downloads up to 13 Feb 2025

      Cited By

      • (2025) HI-CMAIM: Hybrid Intelligence-Based Multi-Source Unstructured Chinese Map Annotation Interpretation Model. Remote Sensing 17:2 (204). DOI: 10.3390/rs17020204. Online publication date: 8-Jan-2025
      • (2025) FSANet: Feature shuffle and adaptive channel attention network for arbitrary shape scene text detection. Neurocomputing 624 (129443). DOI: 10.1016/j.neucom.2025.129443. Online publication date: Apr-2025
      • (2025) EK-Net++: Real-time scene text detection with expand kernel distance and Epoch Adaptive Weight. Expert Systems with Applications 267 (126159). DOI: 10.1016/j.eswa.2024.126159. Online publication date: Apr-2025
      • (2025) Artistic-style text detector and a new Movie-Poster dataset. Expert Systems with Applications 261 (125544). DOI: 10.1016/j.eswa.2024.125544. Online publication date: Feb-2025
      • (2024) Scene Text Detection Using HRNet and Spatial Attention Mechanism. Programming and Computer Software 49:8 (954-965). DOI: 10.1134/S0361768823080212. Online publication date: 24-Jan-2024
      • (2024) Arbitrary Shape Text Detection via Boundary Transformer. IEEE Transactions on Multimedia 26 (1747-1760). DOI: 10.1109/TMM.2023.3286657. Online publication date: 1-Jan-2024
      • (2024) Text Position-Aware Pixel Aggregation Network With Adaptive Gaussian Threshold: Detecting Text in the Wild. IEEE Transactions on Circuits and Systems for Video Technology 34:1 (286-298). DOI: 10.1109/TCSVT.2023.3285096. Online publication date: Jan-2024
      • (2024) Modelling Studies of Automatic Container Code Recognition System for Real Time Implementation. 2024 IEEE Symposium on Industrial Electronics & Applications (ISIEA) (1-6). DOI: 10.1109/ISIEA61920.2024.10607358. Online publication date: 6-Jul-2024
      • (2024) TL-DREN: Transfer Learning Based Detection and Recognition of Electricity Nameplates. 2024 5th International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI) (328-332). DOI: 10.1109/ICHCI63580.2024.10808048. Online publication date: 27-Sep-2024
      • (2024) Enhancing Corporate Data Security: A MobileNetV3-Based Approach for Complex Scene Payment Numeric Text Data Recognition. 2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL) (729-736). DOI: 10.1109/CVIDL62147.2024.10604145. Online publication date: 19-Apr-2024
