Learning to detect tables in document images using line and text information

DOI: 10.1145/3184066.3184091
Published: 02 February 2018

Abstract

Table detection is a crucial step in many document analysis applications because tables present essential information to readers in a structured manner. It remains a challenging problem owing to the variety of table structures and the complexity of document layouts. This paper presents a hybrid method with three fundamental steps for detecting table zones: classification of the regions, detection of tables formed by intersecting horizontal and vertical lines, and identification of tables made up of only parallel lines. Experiments on the UW-III dataset show promising results.
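
The second step named in the abstract, finding tables formed by intersecting horizontal and vertical lines, might be sketched as below. This is an illustration only and not the authors' implementation: it assumes ruling lines have already been extracted as axis-aligned segments, and the names `Segment`, `intersects`, `table_candidates`, and the pixel tolerance `tol` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """Axis-aligned ruling line from (x0, y0) to (x1, y1), with x0 <= x1 and y0 <= y1."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def horizontal(self) -> bool:
        return self.y0 == self.y1


def intersects(h: Segment, v: Segment, tol: int = 2) -> bool:
    """True if horizontal h and vertical v cross or touch within tol pixels (assumed tolerance)."""
    return (h.x0 - tol <= v.x0 <= h.x1 + tol) and (v.y0 - tol <= h.y0 <= v.y1 + tol)


def table_candidates(segments: List[Segment], tol: int = 2) -> List[Tuple[int, int, int, int]]:
    """Group segments linked by horizontal/vertical intersections (union-find) and
    return the bounding box of each group with at least two lines in each direction."""
    parent = list(range(len(segments)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every horizontal segment with each vertical segment it meets.
    for i, a in enumerate(segments):
        for j, b in enumerate(segments):
            if a.horizontal and not b.horizontal and intersects(a, b, tol):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Segment]] = {}
    for i, s in enumerate(segments):
        groups.setdefault(find(i), []).append(s)

    boxes = []
    for members in groups.values():
        n_h = sum(s.horizontal for s in members)
        n_v = len(members) - n_h
        if n_h >= 2 and n_v >= 2:  # a closed grid needs two lines in each direction
            boxes.append((min(s.x0 for s in members), min(s.y0 for s in members),
                          max(s.x1 for s in members), max(s.y1 for s in members)))
    return sorted(boxes)
```

Groups that contain lines in only one direction are left out here; under the abstract's description they would be handled by the separate third step for tables made up of only parallel lines.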

Cited By

  • (2024) A New Method “ProjectionP” for Table Structure Recognition. Computer Information Systems and Industrial Management, 74-88. DOI: 10.1007/978-3-031-71115-2_6. Online publication date: 27-Sep-2024
  • (2021) Automatic Table Detection, Structure Recognition and Data Extraction from Document Images. International Journal of Innovative Technology and Exploring Engineering 10:9, 73-79. DOI: 10.35940/ijitee.I9349.0710921. Online publication date: 30-Jul-2021
  • (2018) Localization of scores and average in Algerian baccalaureate transcripts. 2018 International Conference on Signal, Image, Vision and their Applications (SIVA), 1-6. DOI: 10.1109/SIVA.2018.8661108. Online publication date: Nov-2018

    Information & Contributors

    Information

    Published In

    ICMLSC '18: Proceedings of the 2nd International Conference on Machine Learning and Soft Computing
    February 2018
    198 pages
    ISBN:9781450363365
    DOI:10.1145/3184066

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. document layout analysis
    2. random forest
    3. support vector machine
    4. table detection

    Qualifiers

    • Research-article

    Conference

    ICMLSC 2018
