DOI: 10.1109/ICDAR.2009.155
Article

A New Arabic Printed Text Image Database and Evaluation Protocols

Published: 26 July 2009

Abstract

We report on the creation of a database composed of images of printed Arabic words. The purpose of this database is the large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style text recognition systems for Arabic. The challenges addressed by the database lie in the variability of the sizes, fonts and styles used to generate the images. Particular attention is given to low-resolution images, where anti-aliasing introduces noise into the characters to be recognized. The database is synthetically generated using a lexicon of 113,284 words, 10 Arabic fonts, 10 font sizes and 4 font styles. It contains 45,313,600 single-word images totaling more than 250 million characters. Ground-truth annotation is provided for each image. The database is called APTI, for Arabic Printed Text Images.
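The image count quoted in the abstract can be checked against the generation parameters. The short Python sketch below assumes that every lexicon word is rendered once for each font, size and style combination; the abstract does not state this explicitly, but its figures are consistent with a full cross product. The constant and function names are illustrative and do not come from the paper.

```python
# Generation parameters as stated in the abstract.
LEXICON_WORDS = 113_284
NUM_FONTS = 10
NUM_SIZES = 10
NUM_STYLES = 4


def total_word_images(words: int, fonts: int, sizes: int, styles: int) -> int:
    """Image count, assuming a full cross product of word x font x size x style."""
    return words * fonts * sizes * styles


# Number of distinct renderings per lexicon word under the same assumption.
renderings_per_word = NUM_FONTS * NUM_SIZES * NUM_STYLES

total_images = total_word_images(LEXICON_WORDS, NUM_FONTS, NUM_SIZES, NUM_STYLES)

print(f"renderings per word: {renderings_per_word}")
print(f"total word images:   {total_images:,}")
```

Under this assumption each word appears in 400 renderings, and the product matches the 45,313,600 images reported in the abstract.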


    Published In

    ICDAR '09: Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
    July 2009
    1425 pages
    ISBN: 9780769537252

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. Arabic Text Recognition System
    2. OCR
    3. benchmarking
    4. text image databases
