DOI: 10.1145/2501115.2501127
Research article

An efficient parametrization of character degradation model for semi-synthetic image generation

Published: 24 August 2013

Abstract

This paper presents an efficient parametrization method for generating synthetic noise on document images. By specifying the desired categories and amount of noise, the user can generate synthetic document images exhibiting most of the degradations observed in real document images (ink splotches, white specks or streaks). Because the method can simulate different amounts and kinds of noise, it can be used to evaluate the robustness of many document image analysis methods. It also makes it possible to generate data for algorithms that rely on a learning process. The degradation model presented in [7] needs eight parameters to randomly generate noise regions. We propose an extension of this model that sets these eight parameters automatically so as to generate precisely what the user wants (amount and category of noise). Our proposal consists of three steps. First, Nsp seed-points (i.e. centres of noise regions) are selected by an adaptive procedure. Then, these seed-points are classified into three categories of noise using a heuristic rule. Finally, the size of each noise region is set by a random process in order to generate degradations that are as realistic as possible.
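The three steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the adaptive procedure, the heuristic rule, or the random process, so everything below is an assumption made for illustration only: seed-points are drawn uniformly from the ink/background boundary, the three category labels are borrowed from the examples in the abstract, the classification rule is a hypothetical neighbourhood count, and region sizes come from a clamped exponential draw. The names `plan_noise`, `boundary_pixels`, `classify_seed` and `draw_size` are hypothetical and do not come from the paper.

```python
import random

# Assumed labels, borrowed from the examples given in the abstract.
CATEGORIES = ("ink_splotch", "white_speck", "streak")


def boundary_pixels(img):
    """Return (y, x) of pixels with a 4-neighbour of the opposite value.

    img is a list of rows holding 1 (ink) or 0 (background).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx] != img[y][x]:
                    out.append((y, x))
                    break
    return out


def classify_seed(img, seed):
    """Hypothetical heuristic, not the rule used in the paper."""
    y, x = seed
    h, w = len(img), len(img[0])
    if img[y][x] == 0:
        return "ink_splotch"  # background seed next to ink: add ink
    ink_neighbours = sum(
        img[ny][nx]
        for ny in range(max(0, y - 1), min(h, y + 2))
        for nx in range(max(0, x - 1), min(w, x + 2))
        if (ny, nx) != (y, x)
    )
    # Ink seed on a thin stroke: treat as a streak; otherwise remove ink.
    return "streak" if ink_neighbours <= 2 else "white_speck"


def draw_size(rng, mean_size=3.0, max_size=8):
    """Assumed random process: exponential draw clamped to [1, max_size]."""
    return max(1, min(max_size, round(rng.expovariate(1.0 / mean_size))))


def plan_noise(img, n_sp, seed=0):
    """Step 1: pick seed-points. Step 2: classify. Step 3: draw sizes."""
    rng = random.Random(seed)
    candidates = boundary_pixels(img)
    seeds = rng.sample(candidates, min(n_sp, len(candidates)))
    return [
        {"seed": s, "category": classify_seed(img, s), "size": draw_size(rng)}
        for s in seeds
    ]
```

Calling `plan_noise(img, n_sp=5)` on a binarized image returns up to five planned noise regions, each with a seed-point, a category and a size. Rendering those regions onto a grayscale image is left out, since it depends on the eight-parameter model of [7], which the abstract does not detail.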

References

[1] H. S. Baird. The State of the Art of Document Image Degradation Modeling. In Proc. of 4th IAPR International Workshop on Document Analysis Systems, pages 1--16, Rio de Janeiro, Brazil, 2000.
[2] M. Delalandre, E. Valveny, T. Pridmore, and D. Karatzas. Generation of Synthetic Documents for Performance Evaluation of Symbol Recognition & Spotting Systems. Int. J. Doc. Anal. Recognit., 13(3):187--207, Sept. 2010.
[3] D. D. Jian Zhai, Liu Wenyin and Q. Li. A Line Drawings Degradation Model for Performance Characterization. In Proc. 7th ICDAR, pages 1020--1024, Edinburgh, Scotland, August 2003.
[4] T. Kanungo, R. Haralick, H. Baird, W. Stuezle, and D. Madigan. A Statistical, Nonparametric Methodology for Document Degradation Model Validation. IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1209--1223, 2000.
[5] T. Kanungo, R. M. Haralick, and I. Phillips. Global and Local Document Degradation Models. In Proc. of the ICDAR, pages 730--734, Tsukuba Science City, Japan, Oct. 1993.
[6] V. Kieu, N. Journet, M. Visani, R. Mullot, and J. P. Domenger. Semi-synthetic Document Image Generation Using Texture Mapping on Scanned 3D Document Shapes. Accepted for publication in Document Analysis and Recognition (ICDAR), 2013 International Conference on, 2013.
[7] V. Kieu, M. Visani, N. Journet, J. P. Domenger, and R. Mullot. A Character Degradation Model for Grayscale Ancient Document Images. In Proc. of the ICPR, pages 685--688, Tsukuba Science City, Japan, Nov. 2012.
[8] Y. Li, D. Lopresti, G. Nagy, and A. Tomkins. Validation of Image Defect Models for Optical Character Recognition. IEEE Trans. Pattern Anal. Mach. Intell., 18(2):99--108, Feb. 1996.
[9] R. Loce and W. Lama. Halftone Banding due to Vibrations in a Xerographic Image Bar Printer. Journal of Imaging Technology, 16(1):6--11, 1990.
[10] R. F. Moghaddam and M. Cheriet. Low Quality Document Image Modeling and Enhancement. Int. J. Doc. Anal. Recognit., volume 11, pages 183--201, Berlin, Heidelberg, March 2009. Springer.
[11] M. Mori, A. Suzuki, A. Shio, and S. Ohtsuka. Generating New Samples from Handwritten Numerals Based on Point Correspondence. In Proc. 7th Int. Workshop on Frontiers in Handwriting Recognition, pages 281--290, Amsterdam, Netherlands, 2000.
[12] E. B. Smith. Modeling Image Degradations for Improving OCR. In 16th European Signal Processing Conference (EUSIPCO), pages 1--5, Lausanne, Switzerland, August 2008.
[13] T. Varga and H. Bunke. Effects of Training Set Expansion in Handwriting Recognition Using Synthetic Data. In Proc. 11th Conf. of the Int. Graphonomics Society, pages 200--203, Scottsdale, AZ, USA, Nov. 2003. Citeseer.
[14] T. Varga and H. Bunke. Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System. In Proc. 7th ICDAR, pages 618--622, Edinburgh, Scotland, August 2003.

Published In

HIP '13: Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing
August 2013
141 pages
ISBN:9781450321150
DOI:10.1145/2501115
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

  • FamilySearch

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. degradation model
  2. degradation model validation
  3. performance evaluation
  4. synthetic document image

Qualifiers

  • Research-article

Conference

HIP '13
Sponsor:
  • FamilySearch

Acceptance Rates

HIP '13 Paper Acceptance Rate: 18 of 31 submissions, 58%
Overall Acceptance Rate: 52 of 90 submissions, 58%

Cited By

  • (2022) Low-Computational-Cost Algorithm for Inclination Correction of Independent Handwritten Digits on Microcontrollers. Electronics, 11:7 (1073). DOI: 10.3390/electronics11071073. Online publication date: 29-Mar-2022
  • (2020) A Method to Generate Synthetically Warped Document Image. Computer Vision and Image Processing (270-280). DOI: 10.1007/978-981-15-4015-8_24. Online publication date: 29-Mar-2020
  • (2019) The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing. The Computer Journal. DOI: 10.1093/comjnl/bxz098. Online publication date: 2-Dec-2019
  • (2016) Automatic Selection of Parameters for Document Image Enhancement Using Image Quality Assessment. 2016 12th IAPR Workshop on Document Analysis Systems (DAS) (422-427). DOI: 10.1109/DAS.2016.53. Online publication date: Apr-2016
  • (2015) A character degradation model for color document images. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (806-810). DOI: 10.1109/ICDAR.2015.7333873. Online publication date: 23-Aug-2015
  • (2013) Robustness Assessment of Texture Features for the Segmentation of Ancient Documents. Proceedings of the 2013 27th Brazilian Symposium on Software Engineering (293-297). DOI: 10.1109/DAS.2014.22. Online publication date: 1-Oct-2013
