DOI: 10.1145/1135777.1135951

Visually guided bottom-up table detection and segmentation in web documents

Published: 23 May 2006

Abstract

In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that uses the visual rendition of a Web page rather than the HTML code. Our algorithm works bottom-up by grouping word bounding boxes into larger groups and uses a set of heuristics. It has already been implemented and a preliminary evaluation on about 6000 Web documents has been carried out.
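
The abstract does not spell out the grouping heuristics, so the following is only a minimal sketch of what one bottom-up grouping step over word bounding boxes could look like. The `Box` type, the `group_words` function, and the `max_gap` threshold are all hypothetical names and values chosen for illustration; the merge rule (vertical overlap plus a small horizontal gap) is an assumed stand-in, not the AllRight heuristics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in rendered-page pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def vertical_overlap(a: Box, b: Box) -> float:
    """Height of the vertical range shared by a and b (0 if disjoint)."""
    return max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))


def horizontal_gap(a: Box, b: Box) -> float:
    """Horizontal distance between a and b (negative if they overlap)."""
    return max(a.left, b.left) - min(a.right, b.right)


def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both a and b."""
    return Box(min(a.left, b.left), min(a.top, b.top),
               max(a.right, b.right), max(a.bottom, b.bottom))


def group_words(words: List[Box], max_gap: float = 8.0) -> List[Box]:
    """Merge word boxes into larger groups (candidate cells).

    Assumed heuristic: a word joins an existing group if the two
    overlap vertically and are at most max_gap pixels apart
    horizontally; otherwise it starts a new group.
    """
    groups: List[Box] = []
    for word in sorted(words, key=lambda b: (b.top, b.left)):
        for i, group in enumerate(groups):
            if (vertical_overlap(group, word) > 0
                    and horizontal_gap(group, word) <= max_gap):
                groups[i] = union(group, word)
                break
        else:
            groups.append(word)
    return groups


# Two rendered rows: "Name | Price" and "Apple pie | 3.50".
words = [
    Box(0, 0, 40, 10), Box(100, 0, 140, 10),
    Box(0, 20, 38, 30), Box(42, 20, 60, 30), Box(100, 20, 130, 30),
]
groups = group_words(words)
```

In this toy input, the two words of "Apple pie" are close enough to merge into one group, while the widely spaced column entries stay separate. A real system would need further passes (for example, aligning groups into columns and rows) that this sketch does not attempt.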

Published In

WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN: 1595933239
DOI: 10.1145/1135777

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. table detection
  2. web information extraction

Qualifiers

  • Article

Conference

WWW06

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2022) Semiautomated Generation of Logic Rules for Tabular Information in Building Codes to Support Automated Code Compliance Checking. Journal of Computing in Civil Engineering 36:1. DOI: 10.1061/(ASCE)CP.1943-5487.0001000. Online publication date: Jan-2022
  • (2022) Table understanding: Problem overview. WIREs Data Mining and Knowledge Discovery 13:1. DOI: 10.1002/widm.1482. Online publication date: 21-Nov-2022
  • (2020) A Rule-Based Method for Table Detection in Website Images. IEEE Access 8, pages 81022-81033. DOI: 10.1109/ACCESS.2020.2990901. Online publication date: 2020
  • (2018) Table Analysis and Information Extraction for Medical Laboratory Reports. 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), pages 193-199. DOI: 10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00043. Online publication date: Aug-2018
  • (2013) Feature-based object identification for web automation. Proceedings of the 28th Annual ACM Symposium on Applied Computing, pages 742-749. DOI: 10.1145/2480362.2480504. Online publication date: 18-Mar-2013
  • (2013) ICDAR 2013 Table Competition. Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, pages 1449-1453. DOI: 10.1109/ICDAR.2013.292. Online publication date: 25-Aug-2013
  • (2011) Enabling efficient browsing and manipulation of web tables on smartphone. Proceedings of the 14th international conference on Human-computer interaction: towards mobile and intelligent interaction environments - Volume Part III, pages 117-126. DOI: 10.5555/2027296.2027311. Online publication date: 9-Jul-2011
  • (2011) A versatile model for web page representation, information extraction and content re-packaging. Proceedings of the 11th ACM symposium on Document engineering, pages 129-138. DOI: 10.1145/2034691.2034721. Online publication date: 19-Sep-2011
  • (2011) Enabling Efficient Browsing and Manipulation of Web Tables on Smartphone. Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments, pages 117-126. DOI: 10.1007/978-3-642-21616-9_14. Online publication date: 2011
  • (2011) Visual webpage block importance prediction using conditional random fields. Journal of the American Society for Information Science and Technology 62:11, pages 2225-2235. DOI: 10.1002/asi.21605. Online publication date: 1-Nov-2011
