ActiveDeeper: a model-based active data enrichment system

Published: 01 August 2020

Abstract

The deep web (e.g., Yelp, IMDb) is an invaluable external data source for enriching a local database with new attributes. In this paper, we present ActiveDeeper, a novel model-driven data enrichment system powered by the deep web. ActiveDeeper treats the deep web as "a labeler" and uses it to train a data enrichment model. We show that this model-based approach significantly outperforms the state-of-the-art system in real-world scenarios. We implemented ActiveDeeper as a Google Sheets add-on and made a demo video available at http://tiny.cc/activedeeper.
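
To make the "deep web as a labeler" idea concrete, the following is a minimal sketch of a generic pool-based active learning loop with uncertainty sampling. It is not ActiveDeeper's algorithm, which the abstract does not specify. Everything here is an assumption for illustration: `query_deep_web` is a hypothetical stand-in that simulates a deep web lookup, the records are plain integers, the missing attribute is binary, the "model" is a toy one-dimensional threshold, and the seed records are assumed to cover both label values.

```python
from typing import Callable, Dict, List


def fit_threshold(labeled: Dict[int, int]) -> float:
    """Toy enrichment model (assumption): a 1-D threshold halfway between
    the largest value labeled 0 and the smallest value labeled 1."""
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    return (max(zeros) + min(ones)) / 2.0


def enrich(
    records: List[int],
    labeler: Callable[[int], int],
    seed: List[int],
    budget: int,
) -> Dict[int, int]:
    """Fill in a binary attribute for every record using at most
    len(seed) + budget calls to the labeler."""
    # Label the seed records first; they must include both classes.
    labeled = {x: labeler(x) for x in seed}
    pool = [x for x in records if x not in labeled]

    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: query the record closest to the boundary.
        x = min(pool, key=lambda v: abs(v - threshold))
        labeled[x] = labeler(x)
        pool.remove(x)

    # Queried records keep their label; the model predicts the rest.
    threshold = fit_threshold(labeled)
    return {x: labeled.get(x, int(x >= threshold)) for x in records}


# Simulated deep web source (hypothetical): counts how often it is queried.
calls = {"n": 0}


def query_deep_web(x: int) -> int:
    calls["n"] += 1
    return int(x >= 37)  # hidden ground truth, unknown to the model


records = list(range(100))
enriched = enrich(records, query_deep_web, seed=[0, 99], budget=8)
print(calls["n"], "labeler queries used for", len(records), "records")
```

In this toy setup the loop spends its limited query budget on the records the model is least sure about and lets the trained model fill in the attribute for everything else, which is the general trade-off a model-based enrichment approach aims for.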

Published In

Proceedings of the VLDB Endowment  Volume 13, Issue 12
August 2020
1710 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article
