DOI: 10.1145/1066677.1066826
Article

Automatic wrapper maintenance for semi-structured web sources using results from previous queries

Published: 13 March 2005

Abstract

In recent years, significant attention has been paid to the problem of building wrappers for extracting data from semi-structured web sources. However, because web sources are autonomous, they may change in ways that invalidate existing wrappers. In this paper, we present new heuristics and algorithms to address the problem of automatic wrapper maintenance. Our approach collects query results during wrapper operation and later uses them to generate new sets of examples from which a new wrapper can be induced when the source changes.
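
The abstract describes the maintenance loop only at a high level. As a rough illustration, here is a minimal Python sketch under stated assumptions. It assumes a wrapper whose output is a list of attribute-to-string records, stores those values while the wrapper still works, and, after the source changes, searches the new page for the stored values to obtain labeled examples. The induction step swaps in a simple left/right-delimiter (LR-style) learner in the spirit of Kushmerick's wrapper classes. This is not the paper's algorithm. ResultStore, label_examples, induce_lr and extract are hypothetical names, and the sketch does not model the paper's heuristics or any filtering of spurious matches.

```python
import os
from collections import defaultdict


class ResultStore:
    """Hypothetical store of values the wrapper returned while it still worked."""

    def __init__(self):
        self.values = defaultdict(set)  # attribute name -> observed string values

    def record(self, rows):
        # rows: list of dicts mapping attribute name -> extracted string
        for row in rows:
            for attr, value in row.items():
                if value:
                    self.values[attr].add(value)


def label_examples(page, store):
    """Locate previously seen values in the changed page; each hit becomes a labeled span."""
    examples = defaultdict(list)  # attribute -> list of (start, end) offsets
    for attr, values in store.values.items():
        for value in values:
            start = page.find(value)
            while start != -1:
                examples[attr].append((start, start + len(value)))
                start = page.find(value, start + 1)
    return examples


def common_suffix(strings):
    # os.path.commonprefix compares plain strings character by character
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_lr(page, spans, context=20):
    """Learn a (left, right) delimiter pair shared by every labeled span (LR-style rule)."""
    lefts = [page[max(0, s - context):s] for s, _ in spans]
    rights = [page[e:e + context] for _, e in spans]
    left, right = common_suffix(lefts), os.path.commonprefix(rights)
    return (left, right) if left and right else None


def extract(page, left, right):
    """Apply an LR rule: return every substring found between `left` and the next `right`."""
    out, pos = [], 0
    while True:
        s = page.find(left, pos)
        if s == -1:
            return out
        s += len(left)
        e = page.find(right, s)
        if e == -1:
            return out
        out.append(page[s:e])
        pos = e + len(right)


if __name__ == "__main__":
    store = ResultStore()
    # Results collected while the old wrapper still worked (made-up sample data).
    store.record([{"title": "Dune", "price": "9.99"}, {"title": "Emma", "price": "7.50"}])
    # The source has since changed its markup.
    new_page = ("<li><b>Dune</b> <i>9.99</i></li><li><b>Emma</b> <i>7.50</i></li>"
                "<li><b>Ulysses</b> <i>12.00</i></li>")
    examples = label_examples(new_page, store)
    rule = induce_lr(new_page, examples["title"])
    print(rule, extract(new_page, *rule))
    # prints ('<li><b>', '</b> <i>') ['Dune', 'Emma', 'Ulysses']
```

With so few matched records, the learned delimiters can be over-specific. On the same page, the rule learned for `price` ends with the next record's opening tags, so it misses the final price. This is one reason naive value matching alone is not enough in practice.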



Published In

SAC '05: Proceedings of the 2005 ACM symposium on Applied computing
March 2005
1814 pages
ISBN:1581139640
DOI:10.1145/1066677

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. examples
  2. extraction
  3. maintenance
  4. web
  5. wrapper


Conference

SAC05: The 2005 ACM Symposium on Applied Computing
March 13-17, 2005
Santa Fe, New Mexico

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%


Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 1
Reflects downloads up to 18 Nov 2024.

Cited By
  • (2022) Landmarks and regions: a robust approach to data extraction. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 993-1009. DOI: 10.1145/3519939.3523705. Online publication date: 9 June 2022.
  • (2013) TEX. Knowledge-Based Systems 39, pp. 109-123. DOI: 10.1016/j.knosys.2012.10.009. Online publication date: 1 February 2013.
  • (2012) WebSelF. Proceedings of the 12th International Conference on Web Engineering, pp. 347-361. DOI: 10.1007/978-3-642-31753-8_28. Online publication date: 23 July 2012.
  • (2011) Automatic Wrapper Adaptation by Tree Edit Distance Matching. Combinations of Intelligent Methods and Applications, pp. 41-54. DOI: 10.1007/978-3-642-19618-8_3. Online publication date: 2011.
  • (2008) Information Extraction. Foundations and Trends in Databases 1(3), pp. 261-377. DOI: 10.1561/1900000003. Online publication date: 1 March 2008.
  • (2006) Preloading browsers for optimizing automatic access to hidden web. Proceedings of the 10th East European Conference on Advances in Databases and Information Systems, pp. 171-183. DOI: 10.1007/11827252_15. Online publication date: 3 September 2006.
