DOI: 10.1145/3132847.3132887
Research article

UFeed: Refining Web Data Integration Based on User Feedback

Published: 06 November 2017

Abstract

One of the main challenges in large-scale data integration for relational schemas is creating an accurate mediated schema and generating accurate semantic mappings between heterogeneous data sources and this mediated schema. Some applications can start with a moderately accurate mediated schema and mappings and refine them over time, an approach referred to as pay-as-you-go data integration. Automatically creating the mediated schema and mappings to bootstrap the pay-as-you-go approach has been studied extensively. However, refining the mediated schema and mappings remains an open challenge because the data sources are usually heterogeneous and use diverse and sometimes ambiguous vocabularies. In this paper, we introduce UFeed, a system that refines relational mediated schemas and mappings based on user feedback over query answers. UFeed translates user actions into refinement operations that are applied to the mediated schema and mappings to improve their quality. We experimentally verify that UFeed improves the quality of query answers over real heterogeneous data sources extracted from the web.
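To make the pay-as-you-go loop described above more concrete, here is a minimal Python sketch under stated assumptions. It assumes that a mediated attribute is a cluster of source attributes, that the mapping sends each source attribute to the cluster containing it, and that negative feedback on a query answer moves the offending source attribute into its own mediated attribute. The names `MediatedSchema`, `answer_query`, and `apply_negative_feedback`, the sample data, and the split-on-feedback rule are all hypothetical illustrations; the abstract does not specify UFeed's actual refinement operations.

```python
from dataclasses import dataclass, field


@dataclass
class MediatedSchema:
    # Hypothetical representation: mediated attribute -> cluster of
    # (source, source_attribute) pairs that are believed to match it.
    clusters: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def mapping(self) -> dict[tuple[str, str], str]:
        """Derive the source-to-mediated mapping implied by the clusters."""
        return {src: med for med, members in self.clusters.items() for src in members}


def answer_query(schema, sources, mediated_attr):
    """Return (value, provenance) pairs for one mediated attribute."""
    answers = []
    for source, attr in sorted(schema.clusters.get(mediated_attr, set())):
        for row in sources[source]:
            if attr in row:
                answers.append((row[attr], (source, attr)))
    return answers


def apply_negative_feedback(schema, mediated_attr, provenance):
    """Hypothetical refinement: move a wrongly matched source attribute
    out of its cluster into a new singleton mediated attribute."""
    schema.clusters[mediated_attr].discard(provenance)
    source, attr = provenance
    schema.clusters[f"{source}.{attr}"] = {provenance}


# Usage with made-up sources: three sites expose phone-like columns.
sources = {
    "s1": [{"phone": "555-0100"}],
    "s2": [{"tel": "555-0199"}],
    "s3": [{"fax": "555-0142"}],
}
# Bootstrapped (moderately accurate) schema wrongly groups "fax" with "phone".
schema = MediatedSchema({"phone": {("s1", "phone"), ("s2", "tel"), ("s3", "fax")}})

before = answer_query(schema, sources, "phone")  # three answers, one wrong
# The user flags the fax number as an incorrect answer to a "phone" query.
apply_negative_feedback(schema, "phone", ("s3", "fax"))
after = answer_query(schema, sources, "phone")
print([value for value, _ in after])  # ['555-0100', '555-0199']
```

Keeping provenance next to every answer is what lets a single user action on a query result be traced back to the mapping that produced it. A practical system would likely aggregate feedback from many users before restructuring the schema; this sketch applies one flag immediately only to keep the example short.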

Published In

ACM Conferences
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data integration
  2. holistic schema matching
  3. probabilistic schema matching
  4. schema mapping
  5. schema matching
  6. user feedback

Conference

CIKM '17

Acceptance Rates

CIKM '17 paper acceptance rate: 171 of 855 submissions, 20%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%

Bibliometrics

  • Total citations: 0
  • Total downloads: 162
  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0

Reflects downloads up to 22 Sep 2024.
