Abstract
In the past few years, great efforts have been devoted to scene text detection. Nevertheless, efficient text detection in the wild remains a challenging problem. Methods for general object detection are usually limited in handling the arbitrary orientations and large aspect ratios of scene text. In this paper, we present a novel scene text detection method, the text keypoint detection network (TKDN), which treats text detection as a text keypoint detection task performed in a coarse-to-fine scheme. Specifically, TKDN first generates coarse text instance regions using ResNet50 together with a feature pyramid network (FPN) and a region proposal network (RPN). Within the coarse text regions, it then performs text keypoint detection, bounding box classification and regression, and text region segmentation in a multi-task way. In the inference stage, an effective post-processing algorithm combines the outputs of the three branches to obtain the final text keypoint detection results. The proposed TKDN approach outperforms the state of the art and achieves an F-measure of 82.0% on the public-domain ICDAR2015 database.
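For readers unfamiliar with the metric, the F-measure reported above is conventionally the harmonic mean of detection precision and recall. The following is a minimal sketch under that assumption; the function name `f_measure` and the example values are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F-measure)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing is detected correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only; these are not results from the paper.
example_precision = 0.80
example_recall = 0.80
print(f"F-measure: {f_measure(example_precision, example_recall):.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot reach a high F-measure by trading away recall for precision or vice versa.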
Acknowledgement
This research was supported in part by the Natural Science Foundation of China (grants 61732004, 61390511, and 61672496), External Cooperation Program of Chinese Academy of Sciences (CAS) (grant GJHZ1843), and Youth Innovation Promotion Association CAS (2018135).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cui, Y., Li, J., Han, H., Shan, S., Chen, X. (2019). TKDN: Scene Text Detection via Keypoints Detection. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_15
DOI: https://doi.org/10.1007/978-3-030-20873-8_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20872-1
Online ISBN: 978-3-030-20873-8
eBook Packages: Computer Science (R0)