short-paper

Landmark Explanation: An Explainer for Entity Matching Models

Authors: Andrea Baraldi, Francesco Del Buono, Matteo Paganelli, Francesco Guerra

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 4680 - 4684

https://doi.org/10.1145/3459637.3481981

Published: 30 October 2021

Abstract

State-of-the-art approaches model Entity Matching (EM) as a binary classification problem, in which Machine Learning (ML) or Deep Learning (DL) techniques evaluate whether the descriptions of a pair of entities refer to the same real-world instance. Although these approaches have been shown experimentally to be highly effective, their adoption in real scenarios is limited by the lack of interpretability of their behavior.

This paper showcases Landmark Explanation, a tool that enables generic post-hoc (model-agnostic) perturbation-based explanation systems to explain the behavior of EM models. In particular, Landmark Explanation computes local interpretations: given a description of a pair of entities and an EM model, it computes the contribution of each term to the prediction. The demonstration shows that the explanations generated by Landmark Explanation are effective even for non-matching pairs of entities, which are a challenge for explanation systems.
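
The abstract does not spell out how term contributions are computed, so the following is only a minimal sketch of the general idea of a perturbation-based local explanation for a pair of entity descriptions. It is not the tool's implementation. It uses a simple leave-one-out (occlusion) scheme rather than a surrogate-model explainer, keeps one description fixed while perturbing the other, and relies on a hypothetical stand-in model (`jaccard_match_score`) in place of a trained EM classifier. All function names and the example records are assumptions made for illustration.

```python
from typing import Callable, Dict


def jaccard_match_score(left: str, right: str) -> float:
    """Hypothetical stand-in for an EM model: token-set Jaccard similarity in [0, 1]."""
    a, b = set(left.lower().split()), set(right.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def term_contributions(
    left: str, right: str, model: Callable[[str, str], float]
) -> Dict[str, float]:
    """Leave-one-out contributions of the terms of `right`, with `left` held fixed.

    A positive value means that removing the term lowers the match score,
    so the term pushes the prediction towards "match".
    """
    tokens = right.split()
    base = model(left, right)
    contributions = {}
    for i, tok in enumerate(tokens):
        # Perturb the description by dropping exactly one term.
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])
        contributions[f"{i}:{tok}"] = base - model(left, perturbed)
    return contributions


# Illustrative records (invented for this sketch).
left = "sony camera dsc w800 black"
right = "sony dsc w800 silver"
for term, delta in term_contributions(left, right, jaccard_match_score).items():
    print(f"{term:>10}  {delta:+.3f}")
```

In this toy setting, terms shared by both descriptions receive positive contributions and the conflicting term receives a negative one. A real perturbation-based explainer would sample many perturbations and fit a local surrogate model instead of removing one term at a time.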

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Data management systems
    1. Information integration
      1. Entity resolution
      2. Mediators and data integration

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

October 2021, 4966 pages

ISBN: 9781450384469

DOI: 10.1145/3459637

General Chairs: Gianluca Demartini (The University of Queensland, Australia), Guido Zuccon (The University of Queensland, Australia)

Program Chairs: J. Shane Culpepper (RMIT University, Australia), Zi Huang (The University of Queensland, Australia), Hanghang Tong (University of Illinois at Urbana-Champaign, USA)

Copyright © 2021 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '21

CIKM '21: The 30th ACM International Conference on Information and Knowledge Management

November 1 - 5, 2021

Queensland, Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
