
DOI: 10.1145/3357384.3357854
research-article

TuneR: Fine Tuning of Rule-based Entity Matchers

Published: 03 November 2019

Abstract

A rule-based entity matching task requires the definition of an effective set of rules, which is a time-consuming and error-prone process. The typical approach is trial and error: rules are incrementally added and modified until satisfactory results are obtained. This approach requires significant human intervention, since a typical dataset needs a large number of rules, with possible interconnections that cannot be managed manually. In this paper, we propose TuneR, a software library that supports developers (i.e., coders, scientists, and domain experts) in tuning sets of matching rules. It aims to reduce human intervention by optimizing rule sets according to user-defined criteria (such as effectiveness and interpretability). Our goal is to integrate the framework into the Magellan ecosystem, thus completing the functionality developers need to perform entity matching tasks.
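The abstract frames tuning as optimizing a rule set against user-defined criteria such as effectiveness and interpretability. The following is a minimal Python sketch of that idea under our own assumptions; it is not TuneR's API or algorithm. We assume each rule is a conjunction of Jaccard-similarity thresholds on attributes and that a pair is a match if any rule in the set fires (loosely in the style of Magellan's rule-based matchers). A rule set is scored as F1 minus a per-condition penalty standing in for interpretability, and a simple greedy forward selection picks rules. The names `Rule`, `greedy_tune`, `penalty`, and the toy product data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two attribute values."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a conjunction of (attribute, min Jaccard) conditions."""
    conditions: Tuple[Tuple[str, float], ...]

    def fires(self, pair: Pair) -> bool:
        left, right = pair
        return all(jaccard(left[a], right[a]) >= t for a, t in self.conditions)


def predict(rules: Sequence[Rule], pair: Pair) -> bool:
    # A rule set labels a pair as a match if at least one rule fires.
    return any(rule.fires(pair) for rule in rules)


def f1(rules: Sequence[Rule], pairs: Sequence[Pair], labels: Sequence[bool]) -> float:
    tp = fp = fn = 0
    for pair, gold in zip(pairs, labels):
        pred = predict(rules, pair)
        if pred and gold:
            tp += 1
        elif pred:
            fp += 1
        elif gold:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def score(rules, pairs, labels, penalty: float) -> float:
    # Assumed user-defined criterion: effectiveness (F1) minus an
    # interpretability penalty for each condition in the rule set.
    size = sum(len(rule.conditions) for rule in rules)
    return f1(rules, pairs, labels) - penalty * size


def greedy_tune(candidates, pairs, labels, penalty: float = 0.01):
    """Add the single best candidate rule until no addition improves the score."""
    selected: List[Rule] = []
    best = score(selected, pairs, labels, penalty)
    while True:
        best_rule = None
        for rule in candidates:
            if rule in selected:
                continue
            s = score(selected + [rule], pairs, labels, penalty)
            if s > best:
                best, best_rule = s, rule
        if best_rule is None:
            return selected, best
        selected.append(best_rule)


# Toy labeled record pairs (hypothetical data).
pairs = [
    ({"name": "apple iphone 11", "brand": "apple"}, {"name": "iphone 11 apple", "brand": "apple"}),
    ({"name": "samsung galaxy s10", "brand": "samsung"}, {"name": "galaxy s10", "brand": "samsung"}),
    ({"name": "apple iphone 11", "brand": "apple"}, {"name": "apple ipad air", "brand": "apple"}),
    ({"name": "galaxy s10", "brand": "samsung"}, {"name": "galaxy s10", "brand": "sony"}),
]
labels = [True, True, False, False]

r_brand = Rule((("brand", 1.0),))
r_name_strict = Rule((("name", 0.9),))
r_name_loose = Rule((("name", 0.5),))
r_combo = Rule((("name", 0.5), ("brand", 1.0)))

selected, best = greedy_tune([r_brand, r_name_strict, r_name_loose, r_combo], pairs, labels)
print(selected == [r_combo], round(best, 2))  # True 0.98
```

On this toy data the search keeps only the combined name-and-brand rule, because adding any other candidate introduces a false positive and extra conditions. Greedy forward selection is just one simple search strategy chosen for brevity; larger candidate pools may call for a different optimizer.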

References

[1] Chaudhuri, S., Chen, B., Ganti, V., and Kaushik, R. Example-driven design of efficient record matching queries. In Proceedings of VLDB (2007), pp. 327-338.
[2] Fan, W., Jia, X., Li, J., and Ma, S. Reasoning about record matching rules. PVLDB 2, 1 (2009), 407-418.
[3] Feige, U., Mirrokni, V. S., and Vondrák, J. Maximizing non-monotone submodular functions. SIAM J. Comput. 40, 4 (2011), 1133-1153.
[4] Konda, P., Das, S., C., P. S. G., Doan, A., Ardalan, A., Ballard, J. R., Li, H., Panahi, F., Zhang, H., Naughton, J. F., Prasad, S., Krishnan, G., Deep, R., and Raghavendra, V. Magellan: Toward building entity matching management systems. PVLDB 9, 12 (2016), 1197-1208.
[5] Singh, R., Meduri, V. V., Elmagarmid, A. K., Madden, S., Papotti, P., Quiané-Ruiz, J., Solar-Lezama, A., and Tang, N. Synthesizing entity matching rules by examples. PVLDB 11, 2 (2017), 189-202.
[6] Sottovia, P., Paganelli, M., Guerra, F., and Vincini, M. Big data integration of heterogeneous data sources: The Re-Search Alps case study. In Proceedings of the IEEE International Congress on Big Data (BigData Congress) (2019), pp. 106-110.
[7] Wang, J., Li, G., Yu, J. X., and Feng, J. Entity matching: How similar is similar. PVLDB 4, 10 (2011), 622-633.



Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN:9781450369763
DOI:10.1145/3357384


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data deduplication
  2. data integration
  3. entity resolution

Qualifiers

  • Research-article

Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 2
Reflects downloads up to 27 Feb 2025.


Citations

Cited By

  • (2024) Explaining Entity Matching with Clusters of Words. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 2325-2337. DOI: 10.1109/ICDE60146.2024.00184. Online publication date: 13-May-2024.
  • (2023) Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching. Proceedings of the VLDB Endowment 16:11, pp. 3279-3292. DOI: 10.14778/3611479.3611525. Online publication date: 24-Aug-2023.
  • (2023) A Framework to Evaluate the Quality of Integrated Datasets. ACM SIGAPP Applied Computing Review 22:4, pp. 5-23. DOI: 10.1145/3584014.3584015. Online publication date: 10-Feb-2023.
  • (2022) CompanyName2Vec: Company Entity Matching based on Job Ads. 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), pp. 1-10. DOI: 10.1109/DSAA54385.2022.10032350. Online publication date: 13-Oct-2022.
  • (2022) Toward Data Cleaning with a Target Accuracy: A Case Study for Value Normalization. 2022 IEEE International Conference on Big Data (Big Data), pp. 3975-3981. DOI: 10.1109/BigData55660.2022.10020821. Online publication date: 17-Dec-2022.
  • (2022) Industrial Digitalization for Controlling a Dc Motor Through WNCS. Information Technology and Systems, pp. 65-73. DOI: 10.1007/978-3-030-96293-7_7. Online publication date: 2-Mar-2022.
  • (2020) Unsupervised Evaluation of Data Integration Processes. Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services, pp. 77-81. DOI: 10.1145/3428757.3429129. Online publication date: 30-Nov-2020.
