Knowledge graph accuracy evaluation: an LLM-enhanced embedding approach

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

As an effective means of knowledge representation and storage, knowledge graphs have been widely used in many fields. However, as knowledge graphs rapidly grow in scale and volume, quality issues inevitably arise. To evaluate the accuracy of a knowledge graph effectively and efficiently, a common paradigm is to match the facts in the knowledge graph against specific external knowledge. In this study, an LLM-enhanced (large language model enhanced) embedding framework is designed, integrating the verification ability of large language models to further assess the embedding results. First, an optimized embedding model is proposed that uses the knowledge graph's internal structural information to measure whether the relation in a given triplet is likely to hold. Then, triplets with few supporting paths are selected as questionable, since their correctness cannot be determined confidently. Finally, the questionable triplets are filtered and passed to LLMs, which serve as external knowledge for further fact verification. These three parts together achieve automated, accurate and efficient evaluation of knowledge graphs.
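The three-stage pipeline described in the abstract can be summarized with a minimal sketch. The sketch below is illustrative only: the names evaluate_kg, score_triplet, count_support_paths and llm_verify, as well as the thresholds, are hypothetical placeholders rather than the authors' implementation; it merely shows one way embedding scores, path support and LLM verification might be combined.

# Hypothetical sketch of the three-stage evaluation pipeline; none of these
# names come from the paper.
from dataclasses import dataclass

@dataclass
class EvaluationResult:
    accepted: list   # triplets judged correct
    rejected: list   # triplets judged incorrect

def evaluate_kg(triplets, score_triplet, count_support_paths, llm_verify,
                score_threshold=0.5, min_support_paths=1):
    """triplets: iterable of (head, relation, tail) tuples.
    score_triplet:       embedding-based plausibility score (stage 1).
    count_support_paths: number of alternative paths backing a triplet (stage 2).
    llm_verify:          external fact check via an LLM, returns True/False (stage 3).
    """
    accepted, rejected = [], []
    for t in triplets:
        # Stages 1-2: accept triplets whose internal evidence is already strong.
        if score_triplet(t) >= score_threshold and count_support_paths(t) >= min_support_paths:
            accepted.append(t)
        # Stage 3: questionable triplets fall back to LLM-based fact verification.
        elif llm_verify(t):
            accepted.append(t)
        else:
            rejected.append(t)
    return EvaluationResult(accepted, rejected)

Under this sketch, the overall accuracy of the graph could then be estimated as the number of accepted triplets divided by the total number of triplets checked.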


Notes

  1. In embedding models, the number of training-set triplets is constrained by the BFS depth and the subgraph size; a minimal sketch of this sampling procedure is given below.
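As an illustration of that constraint, the following sketch (an assumption for illustration, not the paper's code) samples training triplets by breadth-first search from a seed entity: max_depth bounds the BFS depth and max_entities bounds the subgraph size, so together they cap how many triplets can enter the training set.

# Hypothetical sketch: BFS-bounded sampling of training triplets.
from collections import deque

def bfs_training_triplets(adj, seed, max_depth=2, max_entities=1000):
    """adj maps each entity to a list of (relation, neighbour) pairs."""
    visited = {seed}
    frontier = deque([(seed, 0)])
    triplets = []
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_depth:
            continue                       # BFS depth bound
        for relation, neighbour in adj.get(entity, []):
            triplets.append((entity, relation, neighbour))
            if neighbour not in visited and len(visited) < max_entities:
                visited.add(neighbour)     # subgraph size bound
                frontier.append((neighbour, depth + 1))
    return triplets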


Acknowledgements

This research is funded by the NSFC (Grant No. 72201275). The authors would like to thank the researchers in AIBD and their teams for their very helpful discussions and suggestions.

Author information

Contributions

MZ, GY and XB conceived and designed the research. MZ and JS conducted the computer simulations. All authors analysed the results and wrote the manuscript.

Corresponding author

Correspondence to Guoli Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, M., Yang, G., Liu, Y. et al. Knowledge graph accuracy evaluation: an LLM-enhanced embedding approach. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00661-3
