DOI: 10.1145/1568296.1568308

Robust named entity detection using an Arabic offline handwriting recognition system

Published: 23 July 2009

Abstract

Text from Arabic optical handwriting recognition (OHR) systems can provide key indexing information. In particular, the text is rich in named entities (NEs), and detecting such entities is critical for search applications. Traditional approaches for detecting NEs in optical character recognition (OCR) output look for them in the single-best recognition results. Because the single-best output inevitably contains recognition errors, such approaches usually yield low recall. Given that a lattice is more likely to contain the correct answer, we explore NE detection from word lattices produced by our Arabic handwriting recognition system. Since the improvement in recall is accompanied by a large number of false positives, we use confidence scores derived from posterior scores to control precision. We show a 7% improvement in true detects at the same false acceptance rate when using lattices instead of the 1-best hypothesis for NE lookup.
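
The contrast between 1-best lookup and lattice lookup with posterior-based confidence can be illustrated with a minimal sketch. This is not the authors' system: the toy lattice, the transliterated words, the name list, and the function names (`edge_posteriors`, `one_best`, `detect_names`) are all hypothetical, names are assumed to be single words, and edge scores are assumed to be plain probabilities on a DAG whose node ids are already in topological order.

```python
from collections import defaultdict

# Hypothetical toy word lattice: (start_node, end_node, word, score).
# Node ids are assumed to be in topological order; 0 is the start, 3 the end.
EDGES = [
    (0, 1, "mr", 1.0),
    (1, 2, "salim", 0.6),   # recognizer's preferred (wrong) hypothesis
    (1, 2, "salem", 0.4),   # correct name, only present in the lattice
    (2, 3, "arrived", 1.0),
]


def edge_posteriors(edges, start, end):
    """Forward-backward pass: posterior of each edge over all lattice paths."""
    fwd = defaultdict(float)
    bwd = defaultdict(float)
    fwd[start] = 1.0
    bwd[end] = 1.0
    # Forward scores: visit edges in order of increasing start node.
    for s, e, _, p in sorted(edges, key=lambda x: x[0]):
        fwd[e] += fwd[s] * p
    # Backward scores: visit edges in order of decreasing end node.
    for s, e, _, p in sorted(edges, key=lambda x: x[1], reverse=True):
        bwd[s] += bwd[e] * p
    total = fwd[end]
    return [(s, e, w, fwd[s] * p * bwd[e] / total) for s, e, w, p in edges]


def one_best(edges, start, end):
    """Highest-scoring single path through the lattice (Viterbi-style)."""
    best = {start: (1.0, [])}
    for s, e, w, p in sorted(edges, key=lambda x: x[0]):
        if s not in best:
            continue
        score = best[s][0] * p
        if e not in best or score > best[e][0]:
            best[e] = (score, best[s][1] + [w])
    return best[end][1]


def detect_names(edges, names, threshold, start=0, end=3):
    """Look up names on every lattice edge; keep hits above the threshold."""
    hits = {}
    for _, _, word, post in edge_posteriors(edges, start, end):
        if word in names and post >= threshold:
            hits[word] = max(post, hits.get(word, 0.0))
    return hits


if __name__ == "__main__":
    names = {"salem"}  # hypothetical name list
    print("1-best hits:", [w for w in one_best(EDGES, 0, 3) if w in names])
    print("lattice hits:", detect_names(EDGES, names, threshold=0.3))
```

In this toy case the 1-best path contains "salim", so a lookup against the name list finds nothing, while the lattice lookup recovers "salem" as long as the threshold is at or below its posterior. Raising the threshold trades recall for precision, which is the role the abstract assigns to the confidence scores.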


Cited By

  • (2023) Analysis of Cursive Text Recognition Systems: A Systematic Literature Review. ACM Transactions on Asian and Low-Resource Language Information Processing 22:7 (1-30). DOI: 10.1145/3592600. Online publication date: 13-Apr-2023
  • (2018) A Deterministic Algorithm for Arabic Character Recognition Based on Letter Properties. Artificial Intelligence - Emerging Trends and Applications. DOI: 10.5772/intechopen.76944. Online publication date: 27-Jun-2018

Published In

AND '09: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
July 2009
127 pages
ISBN: 9781605584966
DOI: 10.1145/1568296

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Arabic handwriting recognition
  2. lattice search
  3. named entity detection
  4. optical character recognition

Qualifiers

  • Research-article

Conference

AND '09

Acceptance Rates

AND '09 Paper Acceptance Rate: 15 of 22 submissions, 68%
Overall Acceptance Rate: 15 of 22 submissions, 68%
