Abstract
In this paper, we introduce a very large dataset of Chinese text in the wild. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, detecting and recognizing text in natural images remains a challenging problem, especially for complex character sets such as Chinese. Lack of training data has long been an obstacle, particularly for deep learning methods, which require massive amounts of training data. We describe a newly created dataset containing about 1 million Chinese characters, covering 3 850 unique characters, annotated by experts in over 30 000 street view images. The dataset is challenging and diverse, including planar text, raised text, text under poor illumination, distant text, partially occluded text, etc. For each character, the annotation includes the underlying character, its bounding box, and six attributes, which indicate the character's background complexity, appearance, style, etc. Besides the dataset, we give baseline results using state-of-the-art methods for three tasks: character recognition (top-1 accuracy of 80.5%), character detection (average precision (AP) of 70.9%), and text line detection (average edit distance (AED) of 22.1). The dataset, source code, and trained models are publicly available.
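The per-character annotation described in the abstract (underlying character, bounding box, six attributes) maps naturally onto a small record type. The following is a minimal Python sketch under stated assumptions, not the released file format: the class name `CharAnnotation`, the helper `summarize`, the `(x, y, width, height)` bounding-box convention, and the six attribute names are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

# Hypothetical names for the six per-character attributes. The abstract only
# says they describe background complexity, appearance, style, etc.
ATTRIBUTES: FrozenSet[str] = frozenset(
    {"occluded", "bgcomplex", "distorted", "raised", "wordart", "handwritten"}
)


@dataclass(frozen=True)
class CharAnnotation:
    """One annotated character instance (illustrative, not the official schema)."""

    text: str                                 # the underlying character, e.g. "中"
    bbox: Tuple[float, float, float, float]   # assumed (x, y, width, height) in pixels
    attributes: FrozenSet[str] = frozenset()  # any subset of ATTRIBUTES

    def __post_init__(self) -> None:
        # Normalize to a frozenset so instances stay hashable even if a plain set is passed.
        object.__setattr__(self, "attributes", frozenset(self.attributes))
        if len(self.text) != 1:
            raise ValueError("each annotation covers exactly one character")
        if self.bbox[2] <= 0 or self.bbox[3] <= 0:
            raise ValueError("bounding box needs positive width and height")
        unknown = self.attributes - ATTRIBUTES
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")


def summarize(annotations: Iterable[CharAnnotation]) -> Tuple[int, int, Counter]:
    """Return (number of instances, number of unique characters, per-attribute counts)."""
    anns = list(annotations)
    unique_chars = {a.text for a in anns}
    attr_counts = Counter(name for a in anns for name in a.attributes)
    return len(anns), len(unique_chars), attr_counts


if __name__ == "__main__":
    sample = [
        CharAnnotation("中", (10, 20, 32, 32), frozenset({"raised"})),
        CharAnnotation("国", (44, 20, 30, 32), frozenset({"raised", "occluded"})),
        CharAnnotation("中", (80, 22, 28, 30)),
    ]
    n, n_unique, counts = summarize(sample)
    print(n, n_unique, counts["raised"], counts["handwritten"])  # 3 2 2 0
```

Storing the attributes as a set rather than six boolean fields keeps per-attribute statistics to a single `Counter` expression, and the distinction between instances and unique characters mirrors the "about 1 million characters from 3 850 unique ones" count above. A loader for the real annotation files would need to map their actual field names onto this record.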
About this article
Cite this article
Yuan, TL., Zhu, Z., Xu, K. et al. A Large Chinese Text Dataset in the Wild. J. Comput. Sci. Technol. 34, 509–521 (2019). https://doi.org/10.1007/s11390-019-1923-y
DOI: https://doi.org/10.1007/s11390-019-1923-y