Abstract
Extracting text from natural scene images has become a vital research problem. The variability in size, color, background, and alignment of characters makes text recognition in natural scene images a demanding challenge. Another recent challenge is the development and expansion of intelligent transportation systems, especially the recognition of traffic signs, which helps ensure safer and easier driving. A scene-text dataset that serves as a benchmark for generalizing researchers' algorithms is therefore critical. This study, as one of the first in the field of text-based traffic signs, prepares a Persian-English multilingual dataset (PESTD) containing 5832 instances of letters, digits, and symbols in three categories: Persian, English, and Persian-English. Because the calligraphy of numbers and letters is similar in Persian (Farsi), Arabic, and Urdu, PESTD can be used in all countries where these languages are written. To prepare the PESTD instances, text detection was performed on traffic signs in Iran. The CRAFT feature extraction algorithm was combined with YOLO and the Tesseract engine as an effective step toward recognizing cursive and multilingual scripts despite their specific challenges. Experimental results show that YOLOv5 outperforms its older versions on the evaluation criteria. Accuracy and F1-score on PESTD reach 95.3% and 92.3%, respectively.
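The evaluation criteria reported above (accuracy and F1-score) are standard functions of the confusion-matrix counts. The following is a minimal sketch of those two metrics; the counts used below are purely hypothetical for illustration and are not the paper's data:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical detection counts, for illustration only
tp, tn, fp, fn = 90, 5, 10, 10
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.826
print(round(f1_score(tp, fp, fn), 3))      # 0.9
```

Note that accuracy and F1 can diverge, as in the reported 95.3% vs. 92.3%, since F1 ignores true negatives and penalizes false positives and false negatives symmetrically.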
Data availability
The datasets generated during the current study are available in the Persian-English-Scene-Text-Dataset (PESTD) repository, [Link].
Funding
The author(s) received no financial support for this article’s research, authorship, and publication.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The author(s) declared no potential conflicts of interest concerning this article’s research, authorship, and publication.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rashtehroudi, A.R., Akoushideh, A. & Shahbahrami, A. PESTD: a large-scale Persian-English scene text dataset. Multimed Tools Appl 82, 34793–34808 (2023). https://doi.org/10.1007/s11042-023-15062-0