DOI: 10.1145/3103010.3103021

Assessing Binarization Techniques for Document Images

Published: 31 August 2017

Abstract

Image binarization is widely used for documents because monochromatic documents require far less storage space and network bandwidth for transmission than their color or even grayscale equivalents. Paper color, texture, aging, translucency, the kind and color of ink used in handwriting, the printing process, the digitization process, and similar factors all affect binarization. No single algorithm performs best across all kinds of documents. This paper presents a methodology to assess the performance of binarization algorithms for a wide variety of text documents, allowing a judicious quantitative choice of the best algorithms and their parameters.
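The abstract does not spell out the assessment procedure, so the following is only a minimal sketch of what a quantitative comparison of binarization algorithms could look like, not the authors' methodology. It assumes a grayscale image given as a flat list of 0..255 values, a hand-made ground-truth bitmap, Otsu's global thresholding as one candidate, and a plain pixel error rate as the score. The names `otsu_threshold`, `binarize`, `pixel_error_rate`, and `rank_algorithms` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels: List[int], t: int) -> List[int]:
    """Map each pixel to 1 (ink, value <= t) or 0 (paper)."""
    return [1 if p <= t else 0 for p in pixels]


def pixel_error_rate(result: List[int], truth: List[int]) -> float:
    """Fraction of pixels where the result disagrees with the ground truth."""
    wrong = sum(1 for r, g in zip(result, truth) if r != g)
    return wrong / len(truth)


def rank_algorithms(
    pixels: List[int],
    truth: List[int],
    candidates: Dict[str, Callable[[List[int]], List[int]]],
) -> List[Tuple[str, float]]:
    """Score every candidate on one image and sort by ascending error."""
    scores = [(name, pixel_error_rate(fn(pixels), truth))
              for name, fn in candidates.items()]
    return sorted(scores, key=lambda item: item[1])


# Illustrative data only: four dark (ink) pixels and four bright (paper) pixels.
image = [10, 12, 11, 200, 210, 205, 13, 198]
ground_truth = [1, 1, 1, 0, 0, 0, 1, 0]
candidates = {
    "otsu": lambda px: binarize(px, otsu_threshold(px)),
    "fixed_5": lambda px: binarize(px, 5),  # deliberately poor parameter
}
ranking = rank_algorithms(image, ground_truth, candidates)
```

In practice the same ranking would be repeated over many images grouped by document characteristics (paper color, ink, degradation), since the abstract stresses that no single algorithm wins on every kind of document.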



    Published In

    DocEng '17: Proceedings of the 2017 ACM Symposium on Document Engineering
    August 2017
    242 pages
    ISBN:9781450346894
    DOI:10.1145/3103010
    © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Sponsors

    In-Cooperation

    • SIGDOC: ACM Special Interest Group on Systems Documentation

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. back-to-front interference
    2. big-data
    3. binarization
    4. bleeding
    5. documents
    6. image filtering
    7. show through

    Qualifiers

    • Research-article

    Funding Sources

    • CNPq - Brazilian Government

    Conference

    DocEng '17
    DocEng '17: ACM Symposium on Document Engineering 2017
    September 4 - 7, 2017
    Valletta, Malta

    Acceptance Rates

    DocEng '17 Paper Acceptance Rate 13 of 71 submissions, 18%;
    Overall Acceptance Rate 194 of 564 submissions, 34%
