research-article

ZeroEA: A Zero-Training Entity Alignment Framework via Pre-Trained Language Model

Authors:

Nur Al Hasan Haldar,

Mohammad Matin Najafi,

Ge Qu

Proceedings of the VLDB Endowment, Volume 17, Issue 7

Pages 1765 - 1774

https://doi.org/10.14778/3654621.3654640

Published: 30 May 2024

Abstract

Entity alignment (EA), a crucial task in knowledge graph (KG) research, aims to identify equivalent entities across different KGs to support downstream tasks such as KG integration, text-to-SQL, and question answering. Because KGs carry rich semantic information, pre-trained language models (PLMs) have shown promise in EA thanks to their context-aware encoding capabilities. However, current PLM-based solutions face obstacles such as the need for extensive training, expensive data annotation, and inadequate incorporation of structural information. In this study, we introduce ZeroEA, a novel zero-training EA framework that captures both semantic and structural information for PLMs. Specifically, the Graph2Prompt module bridges graph structure and plain text by converting KG topology into textual context suitable for PLM input. In addition, to provide PLMs with concise, clear input text of reasonable length, we design a motif-based neighborhood filter that eliminates noisy neighbors. Comprehensive experiments and analyses on 5 benchmark datasets demonstrate the effectiveness of ZeroEA, which outperforms all leading competitors and achieves state-of-the-art performance in entity alignment. Notably, our study highlights the considerable potential of EA techniques for improving the performance of downstream tasks, thereby benefiting the broader research field.
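The two ideas named in the abstract, filtering neighbors by motif and turning graph structure into text, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy triples, the choice of a triangle as the motif, the sentence template, and the function names `build_adjacency`, `triangle_neighbors`, and `graph_to_prompt` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy KG as (head, relation, tail) triples; not from the paper.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "Europe"),
    ("France", "part_of", "Europe"),
    ("Paris", "twinned_with", "Rome"),
]


def build_adjacency(triples):
    """Build undirected adjacency sets, ignoring relation labels."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def triangle_neighbors(entity, adj):
    """Return neighbors that close at least one triangle with `entity`.

    Assumption: a triangle stands in for the motif; the abstract does not
    say which motifs the actual filter uses.
    """
    return {n for n in adj[entity] if adj[entity] & adj[n]}


def graph_to_prompt(entity, triples, keep):
    """Verbalize the triples that link `entity` to a kept neighbor."""
    parts = []
    for head, rel, tail in triples:
        linked = (head == entity and tail in keep) or (
            tail == entity and head in keep
        )
        if linked:
            parts.append(f"{head} {rel.replace('_', ' ')} {tail}")
    return f"{entity}: " + "; ".join(parts) + "."


adj = build_adjacency(TRIPLES)
keep = triangle_neighbors("Paris", adj)
print(graph_to_prompt("Paris", TRIPLES, keep))
```

In this toy graph, "Rome" shares no neighbor with "Paris", so it is dropped before the text is built. In a zero-training setting, such text would presumably be encoded by a frozen PLM and compared across KGs by embedding similarity; that step is omitted here because the abstract does not describe it.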



Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 7

March 2024

260 pages

Editors: Meihui Zhang (Beijing Institute of Technology), Cyrus Shahabi (University of Southern California)

Publisher

VLDB Endowment


Badges

Artifacts Available / v1.1
