TKDN: Scene Text Detection via Keypoints Detection

Yuanshun Cui, Jie Li, Hu Han, Shiguang Shan & Xilin Chen

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11365)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

In the past few years, great effort has been devoted to scene text detection. Nevertheless, efficient text detection in the wild remains a challenging problem. Methods for general object detection are usually limited in handling the arbitrary orientations and large aspect ratios of scene text. In this paper, we present a novel scene text detection method, the text keypoint detection network (TKDN), which treats text detection as a text keypoint detection task performed in a coarse-to-fine scheme. Specifically, TKDN first generates coarse text instance regions using ResNet50 together with a feature pyramid network (FPN) and a region proposal network (RPN). Within the coarse text regions, we then perform text keypoint detection, bounding box classification and regression, and text region segmentation in a multi-task way. In the inference stage, an effective post-processing algorithm combines the outputs of the three branches to obtain the final text keypoint detection results. The proposed TKDN approach outperforms the state-of-the-art approach and achieves an F-measure of 82.0% on the public-domain ICDAR2015 database.
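The F-measure reported above is the harmonic mean of detection precision and recall. The following is a minimal sketch of that arithmetic only, not the authors' evaluation code: the function names are illustrative assumptions, and the rule that decides when a detection matches a ground-truth text instance is outside the sketch.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0.0:
        # Avoid division by zero when nothing was matched.
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def detection_scores(num_matched: int, num_detected: int, num_ground_truth: int):
    """Return (precision, recall, F-measure) from raw match counts.

    num_matched:      detections matched to a ground-truth text instance
    num_detected:     all detections produced by the detector
    num_ground_truth: all annotated text instances
    The counts are hypothetical inputs; how a match is decided is not modeled here.
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)
```

For example, with hypothetical counts of 8 matched detections out of 10 detections and 16 ground-truth instances, `detection_scores(8, 10, 16)` gives a precision of 0.8, a recall of 0.5, and an F-measure of about 0.615. These numbers are illustrative and are not taken from the paper.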

Acknowledgement

This research was supported in part by the Natural Science Foundation of China (grants 61732004, 61390511, and 61672496), External Cooperation Program of Chinese Academy of Sciences (CAS) (grant GJHZ1843), and Youth Innovation Promotion Association CAS (2018135).

Author information

Authors and Affiliations

Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
Yuanshun Cui, Jie Li, Hu Han, Shiguang Shan & Xilin Chen
University of Chinese Academy of Sciences, Beijing, 100049, China
Yuanshun Cui, Jie Li, Shiguang Shan & Xilin Chen
Peng Cheng Laboratory, Shenzhen, China
Hu Han
CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China
Shiguang Shan

Corresponding author

Correspondence to Hu Han.

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C.V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

About this paper

Cite this paper

Cui, Y., Li, J., Han, H., Shan, S., Chen, X. (2019). TKDN: Scene Text Detection via Keypoints Detection. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_15

DOI: https://doi.org/10.1007/978-3-030-20873-8_15
Published: 26 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20872-1
Online ISBN: 978-3-030-20873-8
eBook Packages: Computer Science, Computer Science (R0)
