DOI: 10.1145/2683483.2683550
research-article

Table Extraction from Document Images using Fixed Point Model

Published: 14 December 2014

Abstract

This paper presents a novel learning-based framework for identifying tables in scanned document images. Table detection is formulated as a structured labeling problem: the model learns the layout of the document and labels each entity as table header, table trailer, table cell, or non-table region. We develop features that encode foreground block characteristics and contextual information. These features are fed to a fixed point model, which learns the inter-relationships between blocks. The fixed point model attains a contraction mapping and provides a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed point model captures context information, in terms of the neighbourhood layout, more efficiently. Experiments on images drawn from the UW-III (University of Washington) dataset, the UNLV dataset, and our own dataset of document images with multicolumn page layouts show the applicability of our algorithm to layout analysis and table detection.
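
The idea of labeling blocks by iterating to a fixed point can be illustrated with a small sketch. It is not the authors' implementation: the paper's model is learned from training data, whereas this toy version assumes hand-set per-block scores (`unary`), a hypothetical `coupling` weight, a simple chain of neighbouring blocks, and a softmax update chosen only because a small coupling keeps the update a contraction. Each block's label distribution is recomputed from its own scores plus the average distribution of its neighbours until nothing changes.

```python
import math

LABELS = ["header", "trailer", "cell", "non-table"]

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fixed_point_labels(unary, neighbours, coupling=0.5, init=None,
                       tol=1e-9, max_iter=200):
    """Iterate q_i = softmax(unary_i + coupling * mean of neighbours' q).

    unary, coupling and the softmax form are illustrative assumptions,
    not the learned function used in the paper.
    """
    n, k = len(unary), len(unary[0])
    q = [list(row) for row in init] if init else [[1.0 / k] * k for _ in range(n)]
    iterations = 0
    for iterations in range(1, max_iter + 1):
        new_q = []
        for i in range(n):
            nbrs = neighbours[i]
            # Context: average label distribution over neighbouring blocks.
            context = [sum(q[j][c] for j in nbrs) / len(nbrs) if nbrs else 0.0
                       for c in range(k)]
            new_q.append(softmax([unary[i][c] + coupling * context[c]
                                  for c in range(k)]))
        # Largest change in any probability since the previous sweep.
        change = max(abs(a - b) for new, old in zip(new_q, q)
                     for a, b in zip(new, old))
        q = new_q
        if change < tol:
            break
    return q, iterations

# Toy page: five vertically stacked blocks; block 2 has no evidence of its own.
unary = [[2, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0], [0, 0, 2, 0], [0, 2, 0, 0]]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]

q, iterations = fixed_point_labels(unary, neighbours)
predicted = [LABELS[row.index(max(row))] for row in q]
print(predicted, "after", iterations, "iterations")
```

In this toy setting the uninformative middle block takes its label from its neighbours, and starting from a different initial labeling leads to the same result, which is the practical meaning of a unique fixed point.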

Information & Contributors

Information

Published In

ICVGIP '14: Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing
December 2014
692 pages
ISBN: 9781450330619
DOI: 10.1145/2683483
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Conditional Random Fields
  2. Fixed Point Model
  3. Layout analysis
  4. Structured labeling
  5. Table recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICVGIP '14

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 29
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 09 Nov 2024

Citations

Cited By

  • (2023) Hindi Text Summarization Using Sequence to Sequence Neural Network. ACM Transactions on Asian and Low-Resource Language Information Processing, 22:10 (1-18). DOI: 10.1145/3624013. Online publication date: 13-Oct-2023
  • (2022) End-to-end table structure recognition and extraction in heterogeneous documents. Applied Soft Computing, 123 (108942). DOI: 10.1016/j.asoc.2022.108942. Online publication date: Jul-2022
  • (2022) Robust Detection of Tables in Documents Using Scores from Table Cell Cores. SN Computer Science, 3:2. DOI: 10.1007/s42979-022-01041-z. Online publication date: 12-Feb-2022
  • (2022) Automatic Conversion of Table Contents from PDF Technical Specification Documents into Database Using AI Optical Character Recognition (OCR). Proceedings of International Conference on Communication and Computational Technologies (283-291). DOI: 10.1007/978-981-19-3951-8_22. Online publication date: 27-Sep-2022
  • (2022) An Approach to Convert Compound Document Image to Editable Replica. Advances in Information Communication Technology and Computing (599-607). DOI: 10.1007/978-981-19-0619-0_52. Online publication date: 10-May-2022
  • (2021) Template-based NLG for tabular data using BERT. 2021 Grace Hopper Celebration India (GHCI) (1-5). DOI: 10.1109/GHCI50508.2021.9514032. Online publication date: 19-Feb-2021
  • (2021) TabAug: Data Driven Augmentation for Enhanced Table Structure Recognition. Document Analysis and Recognition – ICDAR 2021 (585-601). DOI: 10.1007/978-3-030-86331-9_38. Online publication date: 2-Sep-2021
  • (2020) On automated workflow for fine-tuning deep neural network models for table detection in document images. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO) (1130-1133). DOI: 10.23919/MIPRO48935.2020.9245241. Online publication date: 28-Sep-2020
  • (2020) On Graph-Based Verification for PDF Table Detection. 2020 Ivannikov Ispras Open Conference (ISPRAS) (91-95). DOI: 10.1109/ISPRAS51486.2020.00020. Online publication date: Dec-2020
  • (2019) Automatic localization and extraction of tables from handheld mobile-camera captured handwritten document images. Journal of Intelligent & Fuzzy Systems, 36:3 (2527-2544). DOI: 10.3233/JIFS-181242. Online publication date: 26-Mar-2019
