
PESTD: a large-scale Persian-English scene text dataset

Published in: Multimedia Tools and Applications

Abstract

Extracting text from natural scene images has become a vital task. Uncertainty in the size, color, background, and alignment of characters makes text recognition in natural scene images a demanding challenge. Another recent challenge is the development and expansion of intelligent systems in the field of transportation, especially for the recognition of traffic signs, which help ensure safer and easier driving. A scene-text dataset that serves as a benchmark for generalizing researchers’ algorithms is therefore critical. This study, one of the first in the field of text-based traffic signs, presents a Persian-English multilingual dataset (PESTD) of 5832 instances comprising letters, digits, and symbols in three categories: Persian, English, and Persian-English. Because the calligraphy of numbers and letters is similar across Persian (Farsi), Arabic, and Urdu, PESTD can be used in all countries where these languages are written. To prepare the PESTD instances, text detection was performed on traffic signs in Iran. The CRAFT feature-extraction algorithm was combined with YOLO and the Tesseract engine to take an effective step toward recognizing cursive and multilingual scripts despite their specific challenges. Experimental results show that the values of the evaluation criteria for YOLOv5 are better than those of its older versions: accuracy and F1-score on PESTD reach 95.3% and 92.3%, respectively.
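The abstract reports results in terms of accuracy and F1-score. As a reminder of how those criteria are defined, the following is a minimal sketch computing both from confusion-matrix counts; the counts used in the usage example are illustrative placeholders, not the paper's actual tallies.

```python
# Standard detection/recognition evaluation metrics, computed from
# true positives (tp), true negatives (tn), false positives (fp),
# and false negatives (fn).

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Illustrative counts only (not from the paper):
    tp, tn, fp, fn = 90, 5, 3, 2
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"F1-score = {f1_score(tp, fp, fn):.3f}")
```

Note that F1 simplifies to 2·tp / (2·tp + fp + fn), so with the placeholder counts above it equals 180/185.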


Data availability

The datasets generated during the current study are available in the Persian-English-Scene-Text-Dataset (PESTD) repository, [Link].


Funding

The author(s) received no financial support for this article’s research, authorship, and publication.

Author information

Corresponding author

Correspondence to Alireza Akoushideh.

Ethics declarations

Conflict of interest

The author(s) declared no potential conflicts of interest concerning this article’s research, authorship, and publication.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rashtehroudi, A.R., Akoushideh, A. & Shahbahrami, A. PESTD: a large-scale Persian-English scene text dataset. Multimed Tools Appl 82, 34793–34808 (2023). https://doi.org/10.1007/s11042-023-15062-0

