DOI: 10.1145/1244002.1244144

A table-form extraction with artefact removal

Published: 11 March 2007 Publication History

Abstract

We present a novel methodology for extracting the structure of table-forms filled in by hand. The method identifies the line intersections of the table-form, then detects and corrects wrong intersections produced by faulty line segments or by table artefacts. Examples of artefacts are overlapping data, broken segments, and smudges. A novel method for artefact identification and deletion is also proposed. The last step extracts the table-form cells.
A database of 350 table-form images was used for evaluation, showing that the artefact identification method improves the performance of the table-form structure extractor. The proposed approach reached a success rate of 85%.
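
The pipeline outlined above (find line intersections, ignore marks that are not ruling lines, then derive cells) can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not detail. It is a simplified stand-in that assumes a clean binary image with axis-aligned, one-pixel-wide lines, and it treats any ink run shorter than a hypothetical `min_len` threshold as an artefact. The function names and the threshold are illustrative assumptions.

```python
def long_runs(image, min_len):
    """Mark pixels lying on a horizontal ink run of at least min_len pixels."""
    mask = [[0] * len(row) for row in image]
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                # Short runs (handwriting, smudges) are left unmarked.
                if x - start >= min_len:
                    for i in range(start, x):
                        mask[y][i] = 1
            else:
                x += 1
    return mask


def transpose(grid):
    """Swap rows and columns of a rectangular grid."""
    return [list(col) for col in zip(*grid)]


def find_intersections(image, min_len):
    """Return (row, col) points where a long horizontal and vertical run cross."""
    horizontal = long_runs(image, min_len)
    vertical = transpose(long_runs(transpose(image), min_len))
    return [
        (y, x)
        for y, row in enumerate(image)
        for x in range(len(row))
        if horizontal[y][x] and vertical[y][x]
    ]


def cells_from_intersections(points):
    """Build (top, left, bottom, right) cells from adjacent intersection rows/columns."""
    ys = sorted({y for y, _ in points})
    xs = sorted({x for _, x in points})
    present = set(points)
    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            corners = {(top, left), (top, right), (bottom, left), (bottom, right)}
            # Keep a cell only if all four corners were detected.
            if corners <= present:
                cells.append((top, left, bottom, right))
    return cells


# Toy 5x7 form: two cells side by side, plus a stray ink pixel at row 2, column 1.
image = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]

points = find_intersections(image, min_len=4)
print(points)
print(cells_from_intersections(points))
```

On this toy input the stray pixel forms runs shorter than `min_len` in both directions, so it produces no intersection and the two cells are recovered. Real scanned forms with skew, broken segments, or thick lines would need the kind of correction step the abstract describes, which this sketch does not attempt.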

Published In

SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
March 2007
1688 pages
ISBN: 1595934804
DOI: 10.1145/1244002

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document segmentation
  2. handwritten data
  3. table-form extraction
  4. table-form recognition

Qualifiers

  • Article

Conference

SAC07
Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
