DOI: 10.1145/585058.585086
Article

Recognizing records from the extracted cells of microfilm tables

Published: 08 November 2002

Abstract

Microfilm documents contain a wealth of information, but extracting and organizing this information by hand is slow, error-prone, and tedious. As an initial step toward automating access to this information, in this paper we describe an algorithmic process that automatically identifies record patterns in microfilm tables for pre-specified application domains. Our table-processing algorithm accepts an XML input file describing the individual cells of a table taken from a microfilm document and, for each record in the document, finds the cells that together make up that record. Two key features drive the algorithm: (1) geometric layout and (2) label matching against a given domain-specific application ontology. The algorithm achieved an accuracy of 92% on our test corpus of genealogical microfilm tables.
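To make the two cues concrete, the following Python sketch shows one way geometric layout and ontology label matching could combine on a simple grid-like table. It is not the authors' algorithm: the XML schema (`<cell x= y= w= h=>`), the three-concept `ONTOLOGY`, the row tolerance `row_tol`, the one-record-per-row assumption, and the sample table are all hypothetical simplifications chosen for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical toy ontology: lower-cased column labels -> domain concepts.
ONTOLOGY = {
    "name": "Person.Name",
    "age": "Person.Age",
    "birthplace": "Person.Birthplace",
}


@dataclass
class Cell:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    text: str

    @property
    def cy(self) -> float:
        """Vertical centre of the cell."""
        return self.y + self.h / 2


def parse_cells(xml_text: str) -> list[Cell]:
    """Read <cell x= y= w= h=>text</cell> elements (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    return [
        Cell(float(c.get("x")), float(c.get("y")), float(c.get("w")),
             float(c.get("h")), (c.text or "").strip())
        for c in root.iter("cell")
    ]


def x_overlap(a: Cell, b: Cell) -> float:
    """Length of the horizontal overlap between two cells (0 if disjoint)."""
    return max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))


def recognize_records(cells: list[Cell], row_tol: float = 5.0) -> list[dict[str, str]]:
    # Cue 1, label matching: cells whose text names an ontology concept are headers.
    headers = [c for c in cells if c.text.lower() in ONTOLOGY]
    values = [c for c in cells if c.text.lower() not in ONTOLOGY]
    if not headers:
        return []

    # Cue 2, geometric layout: cluster value cells into rows by vertical centre.
    rows: list[list[Cell]] = []
    for cell in sorted(values, key=lambda c: c.cy):
        if rows and abs(rows[-1][0].cy - cell.cy) <= row_tol:
            rows[-1].append(cell)
        else:
            rows.append([cell])

    # Each row becomes one record; a cell is filed under the header it overlaps most.
    records = []
    for row in rows:
        record: dict[str, str] = {}
        for cell in row:
            best = max(headers, key=lambda hdr: x_overlap(hdr, cell))
            if x_overlap(best, cell) > 0:
                record[ONTOLOGY[best.text.lower()]] = cell.text
        records.append(record)
    return records


# Made-up three-column table: one header row and two data rows.
SAMPLE = """<table>
  <cell x="0" y="0" w="100" h="20">Name</cell>
  <cell x="100" y="0" w="40" h="20">Age</cell>
  <cell x="140" y="0" w="100" h="20">Birthplace</cell>
  <cell x="0" y="20" w="100" h="20">John Smith</cell>
  <cell x="100" y="20" w="40" h="20">34</cell>
  <cell x="140" y="20" w="100" h="20">Ohio</cell>
  <cell x="0" y="40" w="100" h="20">Mary Smith</cell>
  <cell x="100" y="40" w="40" h="20">29</cell>
  <cell x="140" y="40" w="100" h="20">Utah</cell>
</table>"""

for rec in recognize_records(parse_cells(SAMPLE)):
    print(rec)
```

The row-per-record assumption is the weakest part of this sketch: tables whose records span several lines, or whose header labels are misspelled, would need a more tolerant grouping step and fuzzier label matching.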

Published In

DocEng '02: Proceedings of the 2002 ACM symposium on Document engineering
November 2002
168 pages
ISBN: 1581135947
DOI: 10.1145/585058
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automated recognition of record patterns
  2. geometric layout
  3. microfilm tables
  4. ontology matching

Qualifiers

  • Article

Conference

DocEng '02

Acceptance Rates

DocEng '02 paper acceptance rate: 21 of 46 submissions (46%)
Overall acceptance rate: 194 of 564 submissions (34%)

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 25 Nov 2024.

Cited By

  • (2020) Pytheas. Proceedings of the VLDB Endowment 13(12): 2075-2089. DOI: 10.14778/3407790.3407810. Online publication date: 14-Sep-2020.
  • (2009) An RDF-Based Blackboard Architecture for Improving Table Analysis. Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, pp. 916-920. DOI: 10.1109/ICDAR.2009.81. Online publication date: 26-Jul-2009.
  • (2008) An Adaptative Recognition System Using a Table Description Language for Hierarchical Table Structures in Archival Documents. Graphics Recognition. Recent Advances and New Opportunities, pp. 9-20. DOI: 10.1007/978-3-540-88188-9_2. Online publication date: 1-Apr-2008.
  • (2005) Automating the extraction of data from HTML tables with unknown structure. Data & Knowledge Engineering 54(1): 3-28. DOI: 10.1016/j.datak.2004.10.004. Online publication date: 1-Jul-2005.
  • (2005) A minimal and sufficient way of introducing external knowledge for table recognition in archival documents. Proceedings of the 6th International Conference on Graphics Recognition: Ten Years Review and Future Perspectives, pp. 206-217. DOI: 10.1007/11767978_19. Online publication date: 25-Aug-2005.
  • (2003) Ontology generation from tables. Proceedings of the Fourth International Conference on Web Information Systems Engineering (WISE 2003), pp. 242-249. DOI: 10.1109/WISE.2003.1254487. Online publication date: 2003.
