A Large Chinese Text Dataset in the Wild

Tai-Ling Yuan¹, Zhe Zhu², Kun Xu¹, Cheng-Jun Li³, Tai-Jiang Mu¹ & Shi-Min Hu¹


Abstract

In this paper, we introduce a very large dataset of Chinese text in the wild. Optical character recognition (OCR) in document images is well studied, and many commercial tools are available. However, detecting and recognizing text in natural images remains challenging, especially for complex character sets such as Chinese. A persistent obstacle is the lack of training data, particularly for deep learning methods, which require massive amounts of it. We describe a newly created dataset of about 1 million Chinese character instances, covering 3 850 unique characters, annotated by experts in over 30 000 street view images. The dataset is challenging and diverse, containing planar text, raised text, text under poor illumination, distant text, partially occluded text, etc. For each character, the annotation includes its underlying character, its bounding box, and six attributes that indicate the character's background complexity, appearance, style, etc. Besides the dataset, we give baseline results using state-of-the-art methods for three tasks: character recognition (top-1 accuracy of 80.5%), character detection (average precision, AP, of 70.9%), and text line detection (average edit distance, AED, of 22.1). The dataset, source code, and trained models are publicly available.
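The abstract names a per-character annotation record and three evaluation metrics. As a rough illustration only, the Python sketch below models one annotation and computes top-1 accuracy and an average edit distance. The `CharAnnotation` class, its field names, the `(x, y, width, height)` box convention, and the treatment of attributes as boolean flags are hypothetical assumptions, not the dataset's published format. Likewise, AED is taken here as the plain mean Levenshtein distance over (prediction, ground truth) string pairs, which may differ from the paper's exact protocol for matching predicted and ground-truth text lines.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple


# Hypothetical record layout: field names and types are illustrative assumptions,
# not the dataset's actual annotation schema.
@dataclass
class CharAnnotation:
    text: str                                  # the underlying Chinese character
    bbox: Tuple[float, float, float, float]    # assumed (x, y, width, height) in pixels
    attributes: Dict[str, bool] = field(default_factory=dict)  # assumed boolean flags


def top1_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of characters whose highest-scoring prediction equals the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute, or match at cost 0
        prev = curr
    return prev[-1]


def average_edit_distance(pairs: Sequence[Tuple[str, str]]) -> float:
    """Mean edit distance over (predicted, ground-truth) string pairs; lower is better."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    return sum(edit_distance(p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # "occluded" is a hypothetical attribute key used only for illustration.
    sample = CharAnnotation(text="中", bbox=(10.0, 20.0, 32.0, 32.0),
                            attributes={"occluded": False})
    print(sample)
    print(top1_accuracy(["中", "国", "人"], ["中", "国", "大"]))
    print(average_edit_distance([("北京大学", "北京大学"), ("清华", "清华大学")]))  # 1.0
```

Note that a lower AED is better, whereas higher top-1 accuracy and AP are better. AP is left out of the sketch because it depends on an IoU matching threshold and a ranking procedure that the abstract does not specify.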

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Tai-Ling Yuan, Kun Xu, Tai-Jiang Mu & Shi-Min Hu
Department of Radiology, Duke University, Durham, NC, 27708, U.S.A.
Zhe Zhu
Tencent Technology (Beijing) Co. Ltd., Beijing, 100080, China
Cheng-Jun Li

Corresponding author

Correspondence to Tai-Jiang Mu.

Electronic supplementary material

ESM 1

(PDF 697 kb)

About this article

Cite this article

Yuan, TL., Zhu, Z., Xu, K. et al. A Large Chinese Text Dataset in the Wild. J. Comput. Sci. Technol. 34, 509–521 (2019). https://doi.org/10.1007/s11390-019-1923-y

Received: 24 December 2018
Revised: 20 March 2019
Published: 10 May 2019
Issue Date: May 2019
DOI: https://doi.org/10.1007/s11390-019-1923-y

Similar content being viewed by others

DevChar: An Extensive Dataset for Optical Character Recognition of Devanagari Characters

GNHK: A Dataset for English Handwriting in the Wild

KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark
