DOI: 10.1145/3488560.3498486
Research article · Open access

DAME: Domain Adaptation for Matching Entities

Published: 15 February 2022

Abstract

Entity matching (EM) identifies data records that refer to the same real-world entity. Despite years of effort to improve EM performance, existing methods still require a large amount of labeled data in each domain during training. These methods treat each domain individually and capture dataset-specific signals, which leads to overfitting on a single dataset. The knowledge learned from one dataset is not reused to better understand the EM task and make predictions on unseen datasets with fewer labeled samples. In this paper, we propose a new domain adaptation-based method that transfers task knowledge from multiple source domains to a target domain. Our method introduces a new setting for EM in which the objective is to capture task-specific knowledge by pretraining the model on multiple source domains and then testing it on a target domain. We study the zero-shot learning case on the target domain, and demonstrate that our method learns the EM task and transfers knowledge to the target domain. We also extensively study fine-tuning our model on target datasets from multiple domains, and demonstrate that our model generalizes better than state-of-the-art EM methods.
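For readers unfamiliar with the task, EM is typically framed as pairwise record classification: serialize two records, then decide whether they match. DAME itself is a pretrained-transformer model; the sketch below is not the paper's method, only a minimal illustration of the problem formulation. The `COL ... VAL ...` serialization is a common convention in transformer-based EM systems, and the Jaccard token-overlap scorer is a deliberately simple stand-in for a learned classifier; the record fields and threshold are illustrative assumptions.

```python
# Entity matching framed as pairwise record classification.
# A learned model (e.g., a pretrained transformer) would replace
# match_score; this sketch only illustrates the EM formulation.

def serialize(record: dict) -> str:
    """Flatten a record into a token sequence (one common EM convention)."""
    return " ".join(f"COL {k} VAL {v}" for k, v in sorted(record.items()))

def match_score(a: dict, b: dict) -> float:
    """Jaccard similarity over serialized tokens (a weak EM baseline)."""
    ta = set(serialize(a).lower().split())
    tb = set(serialize(b).lower().split())
    return len(ta & tb) / len(ta | tb)

def is_match(a: dict, b: dict, threshold: float = 0.5) -> bool:
    """Declare a match when token overlap exceeds the threshold."""
    return match_score(a, b) >= threshold

# Hypothetical product records from two sources.
r1 = {"title": "iphone 12 64gb black", "brand": "apple"}
r2 = {"title": "apple iphone 12 black 64gb", "brand": "apple"}
r3 = {"title": "galaxy s21 128gb", "brand": "samsung"}

print(is_match(r1, r2))  # same entity, different field ordering
print(is_match(r1, r3))  # different entities
```

In a real pipeline the serialization step is how record pairs are fed to a transformer classifier, while the scoring function is learned rather than hand-crafted.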

Supplementary Material

MP4 File (WSDM22-fp614.mp4)


Cited By

  • (2024) Matching Feature Separation Network for Domain Adaptation in Entity Matching. Proceedings of the ACM Web Conference 2024, 1975--1985. https://doi.org/10.1145/3589334.3645397
  • (2023) Estimating Link Confidence for Human-in-the-Loop Table Annotation. 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 142--149. https://doi.org/10.1109/WI-IAT59888.2023.00025
  • (2023) Adaptive deep learning for entity resolution by risk analysis. Knowledge-Based Systems, Vol. 260. https://doi.org/10.1016/j.knosys.2022.110118
  • (2022) Probing the Robustness of Pre-trained Language Models for Entity Matching. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM 2022), 3786--3790. https://doi.org/10.1145/3511808.3557673



Published In

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
February 2022
1690 pages
ISBN:9781450391320
DOI:10.1145/3488560

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. domain adaptation
  2. entity matching
  3. transfer learning

Qualifiers

  • Research-article

Conference

WSDM '22

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%
