Evaluating Knowledge Graph Accuracy Powered by Optimized Human-machine Collaboration

Authors:

Lei Zou

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 1368 - 1378

https://doi.org/10.1145/3534678.3539233

Published: 14 August 2022

Abstract

Estimating the accuracy of an automatically constructed knowledge graph (KG) is challenging because the KG often contains a large number of entities and triples. Generally, KG construction involves two major components: information extraction (IE) and entity linking (EL). However, existing approaches focus only on evaluating triple accuracy, which indicates IE quality, and completely ignore entity accuracy. Motivated by the fact that the major advantage of machines is strong computing power while humans are skilled in correctness verification, we propose an efficient interactive method that reduces the overall cost of evaluating KG quality and produces accuracy estimates with a statistical guarantee for both triples and entities. Instead of annotating triples and entities separately, we design a general annotation cost that blends triples and entities generated from the same source text. During human verification, the machine can pre-compute and infer the triples to be annotated in the next round by speculating on human feedback. The human-machine collaborative mechanism is optimized by formulating an order selection problem over triples, which is NP-hard. We therefore propose a Monte Carlo Tree Search to guide the annotation process by finding an approximate solution. Extensive experiments demonstrate that our method incurs lower annotation cost while yielding higher-quality accuracy estimates than state-of-the-art approaches.
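
The abstract does not spell out the estimator, the annotation cost model, or the tree search, so the following is only a minimal sketch of what an "accuracy estimate with a statistical guarantee" can look like in the simplest setting. It assumes simple random sampling of triples (or entities) without replacement and a Wilson score interval, which is a standard textbook choice and not necessarily the paper's method. The function name `estimate_accuracy`, the `annotate` callback standing in for a human judge, and all default values are hypothetical.

```python
import random
from statistics import NormalDist


def estimate_accuracy(items, annotate, epsilon=0.05, alpha=0.05,
                      batch_size=20, min_samples=30, seed=0):
    """Estimate the share of correct items from randomly sampled annotations.

    All names and defaults here are illustrative assumptions.
    Returns (point_estimate, (low, high), n_annotated).
    """
    order = list(items)
    if not order:
        raise ValueError("items must be non-empty")
    random.Random(seed).shuffle(order)  # shuffled prefix = sample without replacement
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

    n = correct = 0
    low, high = 0.0, 1.0
    for start in range(0, len(order), batch_size):
        for item in order[start:start + batch_size]:
            correct += 1 if annotate(item) else 0  # one human judgement per item
            n += 1
        p = correct / n
        # Wilson score interval for a binomial proportion.
        denom = 1 + z * z / n
        center = (p + z * z / (2 * n)) / denom
        half = z * (p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5 / denom
        low, high = center - half, center + half
        if n >= min_samples and half <= epsilon:
            break  # interval is tight enough; stop asking the annotator
    return correct / n, (low, high), n
```

In this sketch the number of labels needed grows roughly as 1/epsilon^2, which is why reducing the cost of each human judgement matters for large KGs. Checking the interval after every batch is a form of optional stopping and can weaken the nominal coverage somewhat, so the guarantee should be read as approximate.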

Supplemental Material

MP4 File

This is a video for paper "Evaluating Knowledge Graph Accuracy Powered by Optimized Human-machine Collaboration".




Index Terms

Evaluating Knowledge Graph Accuracy Powered by Optimized Human-machine Collaboration
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability



Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2022

5033 pages

ISBN: 9781450393850

DOI: 10.1145/3534678

General Chairs:
Aidong Zhang (University of Virginia),
Huzefa Rangwala (Amazon/George Mason University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

the Scientific and Technological Innovation 2030 - New Generation of Artificial Intelligence Major Project
Shanghai Science and Technology Committee
National Natural Science Foundation of China

Conference

KDD '22

KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14 - 18, 2022

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total Citations: 4
Total Downloads: 425
Downloads (Last 12 months): 111
Downloads (Last 6 weeks): 11

Reflects downloads up to 18 Nov 2024

Citations

Cited By

Marchesin S, Silvello G (2024). Efficient and Reliable Estimation of Knowledge Graph Accuracy. Proceedings of the VLDB Endowment 17:9 (2392-2403). Online publication date: 6-Aug-2024.
https://dl.acm.org/doi/10.14778/3665844.3665865
Marchesin S, Silvello G, Alonso O, Serra E, Spezzano F (2024). Veracity Estimation for Entity-Oriented Search with Knowledge Graphs. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (1649-1659). Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679561
Tan J, Wang D, Sun J, Liu Z, Li X, Feng Y (2024). Towards assessing the quality of knowledge graphs via differential testing. Information and Software Technology 174 (107521). Online publication date: Oct-2024.
https://doi.org/10.1016/j.infsof.2024.107521
Zhang M, Yang G, Liu Y, Shi J, Bai X (2024). Knowledge graph accuracy evaluation: an LLM-enhanced embedding approach. International Journal of Data Science and Analytics. Online publication date: 8-Oct-2024.
https://doi.org/10.1007/s41060-024-00661-3
