Al Azawi et al., 2014 - Google Patents

Context-dependent confusions rules for building error model using weighted finite state transducers for OCR post-processing

Document ID: 16861185594583290922
Author: Al Azawi M; Breuel T
Publication year: 2014
Publication venue: 2014 11th IAPR International Workshop on Document Analysis Systems

Snippet

In this paper, we propose a new technique to correct OCR errors by means of weighted finite state transducers (WFST) with context-dependent confusion rules. We translate the OCR confusions that appear in the recognition outputs into edit operations, e.g., insertions …
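
The idea of turning OCR confusions into context-dependent edit operations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the alignment here uses the standard-library `difflib.SequenceMatcher` as a stand-in for a WFST-based alignment, the context is assumed to be a single character to the left, and the function names `edit_operations` and `confusion_weights` as well as the sample strings are hypothetical. The negative-log weights only mimic the kind of costs a weighted transducer might carry.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def edit_operations(ocr: str, truth: str):
    """Yield (left_context, ocr_chunk, truth_chunk) for each mismatch.

    An empty ocr_chunk means the OCR dropped characters (an insertion is
    needed); an empty truth_chunk means the OCR added spurious characters
    (a deletion is needed); otherwise the pair is a substitution.
    """
    # difflib is an assumption here, used in place of a WFST alignment.
    matcher = SequenceMatcher(None, ocr, truth, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # One character of left context; "^" marks the start of the string.
        left = ocr[i1 - 1] if i1 > 0 else "^"
        yield left, ocr[i1:i2], truth[j1:j2]


def confusion_weights(pairs):
    """Count confusion rules over (ocr, truth) pairs and map each rule
    to a negative-log relative frequency (lower weight = more frequent)."""
    counts = Counter(
        op for ocr, truth in pairs for op in edit_operations(ocr, truth)
    )
    total = sum(counts.values())
    return {op: -math.log(n / total) for op, n in counts.items()}


# Hypothetical training pairs: (OCR output, ground truth).
pairs = [("tbe cat", "the cat"), ("tbe dog", "the dog"), ("rnodel", "model")]
weights = confusion_weights(pairs)
for rule, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(rule, round(weight, 3))
```

In this sketch a rule such as `("t", "b", "h")` reads as "after `t`, the OCR output `b` should be `h`". Keeping the left context in the rule key is what makes the confusion context-dependent; a real error model would compile such rules into a transducer rather than a dictionary.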

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2845—Using very large corpora, e.g. the world wide web [WWW]
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/273—Orthographic correction, e.g. spelling checkers, vowelisation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
- G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
- G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00852—Recognising whole cursive words
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection

Similar Documents

Publication	Publication Date	Title
US9176936B2 (en)	2015-11-03	Transliteration pair matching
Azmi et al.	2015	A survey of automatic Arabic diacritization techniques
CN107870901B (en)	2023-05-12	Method, recording medium, apparatus and system for generating similar text from translation source text
Kanthak et al.	2005	Novel reordering approaches in phrase-based statistical machine translation
Bluche et al.	2014	The A2iA Arabic handwritten text recognition system at the OpenHaRT2013 evaluation
Kolak et al.	2003	A generative probabilistic OCR model for NLP applications
JP2005505869A (en)	2005-02-24	Identifying character strings
Alkanhal et al.	2012	Automatic stochastic Arabic spelling correction with emphasis on space insertions and deletions
US11423237B2 (en)	2022-08-23	Sequence transduction neural networks
Alabau et al.	2014	Improving on-line handwritten recognition in interactive machine translation
Li et al.	2012	Improving text normalization using character-blocks based models and system combination
KR20230009564A (en)	2023-01-17	Learning data correction method and apparatus thereof using ensemble score
Tennage et al.	2018	Transliteration and byte pair encoding to improve Tamil to Sinhala neural machine translation
Roy et al.	2019	Unsupervised context-sensitive Bangla spelling correction with character n-gram
Kinaci	2018	Spelling correction using recurrent neural networks and character level n-gram
Al Azawi et al.	2014	Context-dependent confusions rules for building error model using weighted finite state transducers for OCR post-processing
Kaur et al.	2019	Spell checker for Punjabi language using deep neural network
Peng et al.	2015	An empirical study of Chinese name matching and applications
Singh et al.	2016	Urdu to Punjabi machine translation: An incremental training approach
Torunoglu-Selamet et al.	2016	Exploring spelling correction approaches for Turkish
Farooq et al.	2009	Phrase-based correction model for improving handwriting recognition accuracies
JP6564709B2 (en)	2019-08-21	Sentence rewriting device, method, and program
Al Azawi et al.	2014	Character-level alignment using WFST and LSTM for post-processing in multi-script recognition systems-A comparative study
Habeeb et al.	2021	Three N-grams Based Language Model for Auto-correction of Speech Recognition Errors
Manohar et al.	2015	Spellchecker for Malayalam using finite state transition models