Abstract
This paper presents a new approach to table structure recognition and layout analysis. The recognition process differs significantly from existing approaches in that it performs a bottom-up clustering of given word segments, whereas conventional table structure recognizers all rely on detecting separators, such as delineating lines or significant white space, to analyze a page top-down. The subsequent analysis of the recognized layout elements is based on the construction of a tile structure and detects row- and/or column-spanning cells as well as sparse tables with a high degree of confidence. The overall system is completely domain independent and can optionally ignore textual content. It can therefore be applied to arbitrary mixed-mode documents (with or without tables) in any language, and it even operates on low-quality OCR documents (e.g. facsimiles).
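To make the idea of bottom-up clustering of word segments concrete, here is a minimal sketch. The abstract does not state the clustering criterion, so the rule used here (two words belong to the same block if they sit on adjacent text lines and their horizontal extents overlap) is an assumption made for illustration. The names `Word`, `overlaps`, and `cluster_words` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word segment as delivered by OCR (hypothetical structure)."""
    text: str
    line: int    # index of the text line the word sits on
    x0: float    # left edge of the bounding box
    x1: float    # right edge of the bounding box


def overlaps(a: Word, b: Word) -> bool:
    """True if the horizontal extents of the two words intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def cluster_words(words: list[Word]) -> list[list[int]]:
    """Group word indices into blocks, bottom-up.

    Assumed rule: starting from a seed word, repeatedly pull in every
    word on the line directly above or below that horizontally overlaps
    a word already in the block. No separators (rules, white space)
    are consulted.
    """
    by_line: dict[int, list[int]] = {}
    for i, w in enumerate(words):
        by_line.setdefault(w.line, []).append(i)

    unvisited = set(range(len(words)))
    blocks: list[list[int]] = []
    for seed in range(len(words)):
        if seed not in unvisited:
            continue
        unvisited.remove(seed)
        stack, block = [seed], []
        while stack:
            i = stack.pop()
            block.append(i)
            for neighbour_line in (words[i].line - 1, words[i].line + 1):
                for j in by_line.get(neighbour_line, []):
                    if j in unvisited and overlaps(words[i], words[j]):
                        unvisited.remove(j)
                        stack.append(j)
        blocks.append(sorted(block))
    return blocks


# A two-column, two-row table: each column ends up as one block.
words = [
    Word("Item", 0, 0, 40), Word("Price", 0, 100, 150),
    Word("Apple", 1, 0, 50), Word("1.20", 1, 110, 140),
]
print(cluster_words(words))  # [[0, 2], [1, 3]]
```

In this sketch, table columns emerge as separate blocks because nothing links them vertically, whereas words in running text would typically chain together into one block through overlaps between consecutive lines.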
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kieninger, T., Dengel, A. (1999). The T-Recs Table Recognition and Analysis System. In: Lee, SW., Nakano, Y. (eds) Document Analysis Systems: Theory and Practice. DAS 1998. Lecture Notes in Computer Science, vol 1655. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48172-9_21
DOI: https://doi.org/10.1007/3-540-48172-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66507-6
Online ISBN: 978-3-540-48172-0
eBook Packages: Springer Book Archive