short-paper

Evaluating deep neural networks for image document enhancement

Authors:

Lucas N. Kirsten,

Ricardo Piccoli,

Ricardo Ribani

DocEng '21: Proceedings of the 21st ACM Symposium on Document Engineering

Article No.: 24, Pages 1 - 4

https://doi.org/10.1145/3469096.3474938

Published: 16 August 2021

Abstract

This work evaluates six state-of-the-art deep neural network (DNN) architectures applied to the problem of enhancing camera-captured document images. The results from each network were evaluated both qualitatively (i.e., by manual visual inspection) and quantitatively (i.e., using Image Quality Assessment (IQA) metrics), and were also compared with an existing approach based on traditional computer vision techniques. The best-performing architectures generally produced good enhancement compared to the existing algorithm, showing that it is possible to use DNNs for document image enhancement. Furthermore, the best-performing architectures could serve as a baseline for future investigations of document enhancement using deep learning techniques. The main contributions of this paper are a baseline of deep learning techniques that can be further improved to provide better results, and an evaluation methodology that uses IQA metrics to quantitatively compare the images produced by the neural networks to a ground truth.
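The quantitative comparison described in the abstract (scoring each network's output against a ground-truth image) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the IQA metrics used, so PSNR is swapped in here as a simple, well-known full-reference metric, and the function names `psnr` and `rank_models` as well as the synthetic images are hypothetical.

```python
import numpy as np


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between a ground truth and an output.

    PSNR is an assumption made for illustration; it stands in for whichever
    full-reference IQA metrics the paper actually uses.
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(enhanced, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def rank_models(ground_truth, outputs):
    """Score each model output against the ground truth, best score first.

    `outputs` maps a (hypothetical) model name to its enhanced image.
    """
    scores = {name: psnr(ground_truth, image) for name, image in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Synthetic stand-ins for a clean document image and two network outputs.
    rng = np.random.default_rng(0)
    ground_truth = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
    outputs = {
        "model_a": np.clip(ground_truth + rng.normal(0, 2, ground_truth.shape), 0, 255),
        "model_b": np.clip(ground_truth + rng.normal(0, 20, ground_truth.shape), 0, 255),
    }
    for name, score in rank_models(ground_truth, outputs):
        print(f"{name}: {score:.2f} dB")
```

A higher PSNR means the output is closer to the ground truth pixel by pixel. Perceptual or structural metrics may rank the same outputs differently, which is one reason an evaluation like this typically reports several metrics alongside visual inspection.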

Index Terms

Computing methodologies

Published In

DocEng '21: Proceedings of the 21st ACM Symposium on Document Engineering

August 2021

178 pages

ISBN:9781450385961

DOI:10.1145/3469096

General Chairs: Patrick Healy (University of Limerick, Ireland), Mihai Bilauca (University of Limerick, Ireland)

Program Chair: Alexandra Bonnici (University of Malta, Malta)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DocEng '21

Sponsor:

SIGWEB

DocEng '21: ACM Symposium on Document Engineering 2021

August 24 - 27, 2021

Limerick, Ireland

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
