DOI: 10.1145/3323933.3324082
research-article

Printed Arabic Text Database for Automatic Recognition Systems

Published: 16 April 2019

Abstract

Document image analysis and recognition are important topics in artificial intelligence because they are necessary for document retrieval. The availability of a database with good script samples is therefore a key requirement for machine-learning processes. Good printed-text databases exist for Latin-script languages, but databases with Arabic samples are lacking. This paper presents a new comprehensive database called PATD (Printed Arabic Text Database), which contains 810 images scanned in grayscale at different resolutions, leading to 2,954 images (smartphone-captured images) under varying capture conditions (blurred, at different angles, and under different lighting). It is based on ten newspapers with different structures and contains open-vocabulary, multi-font, multi-size, and multi-style text. The database is described in detail and is intended for the research community.

    Published In

    ICCTA '19: Proceedings of the 2019 5th International Conference on Computer and Technology Applications
    April 2019
    206 pages
    ISBN:9781450371810
    DOI:10.1145/3323933

    In-Cooperation

    • Istanbul Technical University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Arabic Printed Text Database
    2. Arabic Text Recognition system
    3. Arabic language
    4. Database
    5. Document images

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCTA 2019

    Cited By

    • (2024) Deep Learning-based Arabic Optical Character Recognition: A New Comprehensive Dataset at Character and Word Levels. 2024 15th International Conference on Information and Communication Systems (ICICS), pp. 1-6. DOI: 10.1109/ICICS63486.2024.10638273. Online publication date: 13-Aug-2024
    • (2023) Computational Analysis of Printed Arabic Text Database for Natural Language Processing. Cognitive Studies | Études cognitives. DOI: 10.11649/cs.3027. Online publication date: 31-Dec-2023
    • (2021) Offline Pashto Characters Dataset for OCR Systems. Security and Communication Networks, vol. 2021. DOI: 10.1155/2021/3543816. Online publication date: 1-Jan-2021
    • (2019) A Convolutional Neural Network for Arabic Document Analysis. 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 1-6. DOI: 10.1109/ISSPIT47144.2019.9001779. Online publication date: Dec-2019
