A Domain-Oriented Entity Alignment Approach Based on Filtering Multi-Type Graph Neural Networks
Figure 1. An entity alignment example.
Figure 2. Filtering Multi-type Graph Neural Networks. A straight line with an arrow represents the relationship between an entity (blue node) and an attribute value (orange or green node), with the arrow pointing from the entity to the attribute value. In $KG_{i1}$ and $KG_{i2}$, $i$ denotes the candidate set for the $i$-th entity.
Figure 3. An example of the filtering module. $C_m$ represents the candidate set for the $m$-th entity. $KG_{m1}$ represents the sub-KG reconstructed from the entities of the original $KG_1$ that belong to the $m$-th candidate set.
Figure 4. The diagram of the structural aggregator. Black lines with arrows represent the relationships between entities and attribute values, and dotted green lines with arrows represent the indirect relationships between entities.
Figure 5. A schematic diagram of the joint embedding phase.
Abstract
1. Introduction
- Existing methods compute embeddings for all entities and relationships, although many entities represent different real-world objects and need not be compared. Excluding these noisy entities reduces the amount of computation and significantly improves the accuracy of entity alignment.
- Existing works focus on embedding structural information. However, entities in domain KGs, especially recipe KGs, have few relationship connections, so their neighborhood structure is sparse.
- Traditional methods generally aggregate and propagate the attribute names of entities but ignore attribute values. However, attribute values play a significant role in entity alignment, and each attribute value influences an entity to a different degree. Utilizing attribute values to enhance entity embedding is non-trivial.
- To address the scarcity of relationship connections in domain-oriented embedded representation and the underuse of attribute value information, we employ a structural aggregator (SA) to generate high-order neighborhood-aware embeddings of entities through attribute values. Moreover, an AA uses the self-attention mechanism to dynamically calculate the weights between entities and attribute values and generates the attribute-aware embeddings of entities. In addition, the Multi-type Graph Neural Networks enhance the aggregation of entity features.
- To exclude unnecessary computations and improve the accuracy of entity alignment, we design a filtering mechanism based on entity attributes (e.g., taste or cooking technique) for domain KGs, in which a blocker selects the candidate sets. The experiments show that the mechanism achieves both goals.
- Our approach needs only a few pre-aligned entities and does not require any pre-aligned relationships or attributes between the KGs, which reduces the cost of manual data annotation in the early stages.
- Experimental results on a real-world dataset show that our approach achieves higher accuracy and better stability than six state-of-the-art methods.
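The attention weighting between an entity and its attribute values mentioned in the contributions above can be illustrated with a minimal sketch. It assumes scaled dot-product attention, which may differ from the scoring function the authors actually use; the function names and the toy 2-d embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_aware_embedding(entity_vec, value_vecs):
    """Score each attribute-value vector against the entity vector
    (scaled dot product, an assumption), normalise the scores with
    softmax, and return (weights, weighted sum of value vectors)."""
    scale = math.sqrt(len(entity_vec))
    scores = [sum(e * v for e, v in zip(entity_vec, vec)) / scale
              for vec in value_vecs]
    weights = softmax(scores)
    dim = len(entity_vec)
    pooled = [sum(w * vec[d] for w, vec in zip(weights, value_vecs))
              for d in range(dim)]
    return weights, pooled


# Hypothetical embeddings: one recipe entity and two attribute values.
entity = [1.0, 0.0]
values = [[1.0, 0.0],   # e.g. a taste value
          [0.0, 1.0]]   # e.g. a cooking-technique value
weights, pooled = attribute_aware_embedding(entity, values)
```

The attribute value that is closer to the entity in embedding space receives the larger weight, so it contributes more to the attribute-aware embedding.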
2. Related Work
3. Problem Formalization
4. The Filtering Multi-Type Graph Neural Networks
4.1. The Filtering Module
4.2. The Embedding Module
4.2.1. The Structure-Aware Entity Embedding
4.2.2. The Attribute-Aware Entity Embedding
4.2.3. The Joint Embedding
4.2.4. The Spatial Mapping
4.3. The Alignment Module
5. Experiments and Results
5.1. The Experimental Settings
5.1.1. Datasets
5.1.2. Experimental Environment
5.1.3. Evaluation Metrics
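Hits@k and MRR, the metrics reported in the result tables, can be computed from the rank of each test entity's true counterpart. The sketch below uses the standard definitions and reports both as percentages to match the tables; the function names and the sample ranks are hypothetical.

```python
def hits_at_k(ranks, k):
    """Percentage of test entities whose true counterpart is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all test entities, as a percentage."""
    return 100.0 * sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical 1-based ranks of the correct counterpart for four test entities.
ranks = [1, 2, 10, 50]
hits1 = hits_at_k(ranks, 1)
hits10 = hits_at_k(ranks, 10)
mrr = mean_reciprocal_rank(ranks)
```

Higher values are better for both metrics; Hits@1 is the strictest, since only an exact top-ranked match counts.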
5.1.4. Baseline Methods
- GCN-Align: GCN-Align [31] employs GCN to encode both the structural and attribute information of entities into embedding vectors. The method calculates the similarity of the structural feature vectors and of the attribute feature vectors separately, and then integrates the two via a weighted summation, which serves as the criterion for assessing entity similarity.
- MuGNN: MuGNN [26] employs GNN to embed the structural information of the knowledge graph through multiple channels. It also uses an attention mechanism to assign weights to the relationships between entities, and aligns entities by calculating the similarity of their embedding features.
- HGCN: HGCN [27] is an entity alignment method that employs GCN to capture the implicit features of entities and relationships through joint learning, and iteratively learns the embedding representations of entities and relationships.
- FGWEA: FGWEA [41] is an unsupervised entity alignment framework based on the Gromov–Wasserstein distance. The method makes full use of the structural information of the knowledge graph and compares corresponding entities in different knowledge graphs by optimizing both entity semantics and knowledge graph structure.
- PEEA: PEEA [42] is a weakly supervised learning framework. In addition to absorbing structural and relational information, PEEA enhances the connections between distant entities and labeled entities by integrating positional information into the representation learning process through a Position Attention Layer (PAL).
- SDEA: SDEA [43] consists of attribute embedding and relation embedding. SDEA first employs a pre-trained transformer language model to extract the semantic information of attribute values, and then uses a GRU equipped with an attention mechanism to aggregate the structural information of neighbor entities.
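The weighted summation described for GCN-Align can be sketched as follows. This is an illustration rather than the baseline's actual code: it assumes L1 distance normalised by embedding dimension, and the weight `beta` and all function names are hypothetical.

```python
def l1_distance(u, v):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def combined_distance(struct_a, struct_b, attr_a, attr_b, beta=0.9):
    """Weighted sum of the structural and attribute distances.
    Each distance is divided by its embedding dimension so the two
    terms are on a comparable scale; smaller means more similar."""
    d_struct = l1_distance(struct_a, struct_b) / len(struct_a)
    d_attr = l1_distance(attr_a, attr_b) / len(attr_a)
    return beta * d_struct + (1.0 - beta) * d_attr


# Hypothetical embeddings of two entities from different KGs.
d = combined_distance([0.0, 0.0], [1.0, 1.0], [0.5], [0.5], beta=0.5)
```

Setting `beta` close to 1 lets the structural embedding dominate, while a smaller value gives attribute information more influence on the final ranking.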
5.2. The Experimental Results
5.2.1. The Comparison of Six Baseline Methods
5.2.2. Ablation Experiments
- Autoencoder: The autoencoder model consists of an aggregator, an encoder, and a decoder. The aggregator handles variable-length sequences of word embeddings, which a feed-forward NN cannot accept directly. The encoder and decoder are each two-layer feed-forward NNs with the Tanh activation function, trained to reconstruct the features of the entity.
- Hybrid: The hybrid model stacks an autoencoder and CTT by training the autoencoder first and then the CTT. The trained encoder of the autoencoder serves as the aggregator for CTT to generate the tuple embeddings.
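The autoencoder blocker described above can be sketched as a forward pass only. This is a minimal illustration under stated assumptions: mean pooling as the aggregator, random untrained weights, and toy layer sizes; all names are hypothetical and no training loop is shown.

```python
import math
import random


def mean_aggregate(word_vecs):
    """Aggregator: average a variable-length list of word vectors
    into one fixed-size vector."""
    dim = len(word_vecs[0])
    return [sum(v[d] for v in word_vecs) / len(word_vecs) for d in range(dim)]


def init_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def tanh_layer(x, layer):
    """Affine transform followed by the Tanh activation."""
    weights, biases = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(x, layers):
    """Apply the layers in order."""
    for layer in layers:
        x = tanh_layer(x, layer)
    return x


rng = random.Random(0)
dim, hidden, code = 4, 3, 2
encoder = [init_layer(dim, hidden, rng), init_layer(hidden, code, rng)]
decoder = [init_layer(code, hidden, rng), init_layer(hidden, dim, rng)]

# Hypothetical entity described by three word vectors.
words = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.1, 0.0, 0.1], [0.2, 0.0, 0.1, 0.0]]
x = mean_aggregate(words)            # fixed-size input
z = feed_forward(x, encoder)         # entity (tuple) embedding
x_hat = feed_forward(z, decoder)     # reconstruction
loss = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / dim
```

Training would minimise `loss` over all entities; afterwards `z` can serve as the embedding on which candidate pairs are selected.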
5.2.3. Results on Various-Scale Datasets
6. Summary
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Zeng, K.; Li, C.; Hou, L.; Li, J.; Feng, L. A comprehensive survey of entity alignment for knowledge graphs. AI Open 2021, 2, 1–13.
- [2] Shen, L.; He, R.; Huang, S. Entity alignment with adaptive margin learning knowledge graph embedding. Data Knowl. Eng. 2022, 139, 101987.
- [3] Huang, H.; Li, C.; Peng, X.; He, L.; Guo, S.; Peng, H.; Wang, L.; Li, J. Cross-knowledge-graph entity alignment via relation prediction. Knowl. Based Syst. 2022, 240, 107813.
- [4] Xu, Y.; Li, Z.; Chen, Q.; Wang, Y.; Fan, F. An Approach for Reconciling Inconsistent Pairs Based on Factor Graph. J. Comput. Res. Dev. 2020, 57, 175–187.
- [5] Huang, J.; Wang, J.; Li, Y.; Zhao, W. A Survey of Entity Alignment of Knowledge Graph Based on Embedded Representation. J. Phys. Conf. Ser. 2022, 2171, 012050.
- [6] Xu, Y.; Li, Z.; Chen, Q.; Fan, F. GL-RF: A reconciliation framework for label-free entity resolution. Front. Comput. Sci. 2018, 12, 1035–1037.
- [7] Weishan, C.; Yizhao, W.; Shun, M.; Jieyu, Z.; Yuncheng, J. Multi-heterogeneous neighborhood-aware for Knowledge Graphs alignment. Inf. Process. Manag. 2022, 59, 102790.
- [8] Usman, A.M.; Liu, J.; Xie, Z.; Liu, X.; Sheeraz, A.; Huang, B. Entity alignment based on relational semantics augmentation for multilingual knowledge graphs. Knowl. Based Syst. 2022, 252, 109494.
- [9] Chen, L.; Tian, X.; Tang, X.; Cui, J. Multi-information embedding based entity alignment. Appl. Intell. 2021, 51, 8896–8912.
- [10] Liu, J.; Chai, B.; Shang, Z. A cross-lingual medical knowledge graph entity alignment algorithm based on neural tensor network. Basic Clin. Pharmacol. Toxicol. 2021, 128, 31–32.
- [11] Zhu, B.; Bao, T.; Liu, L.; Han, J.; Wang, J.; Peng, T. Cross-lingual knowledge graph entity alignment based on relation awareness and attribute involvement. Appl. Intell. 2023, 53, 6159–6177.
- [12] Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
- [13] Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
- [14] Yao, X.; Xie, Y. An Improved Mapping Method of Comprehensive Ontology Similarity. Comput. Mod. 2014, 61–65.
- [15] Suchanek, F.M.; Serge, A.; Pierre, S. PARIS: Probabilistic alignment of relations, instances, and schema. Proc. VLDB Endow. 2011, 5, 157–168.
- [16] Shao, C.; Hu, L.; Li, J.; Wang, Z.; Chung, T.; Xia, J. RiMOM-IM: A Novel Iterative Framework for Instance Matching. J. Comput. Sci. Technol. 2016, 31, 185–197.
- [17] Li, Y.; Gao, D. Research on Entities Similarity Calculation in Knowledge Graph. J. Chin. Inf. Process. 2017, 31, 140–146.
- [18] Cohen, W.W.; Richman, J. Learning to match and cluster large high-dimensional data sets for data integration. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Montreal, QC, Canada, 23–26 July 2002; pp. 475–480.
- [19] Verykios, V.S.; Moustakides, G.V.; Elfeky, M.G. A Bayesian decision model for cost optimal record matching. VLDB J. 2003, 12, 28–40.
- [20] Li, L. Research on Entity Alignment Method for Linked Open Data. Master’s Thesis, Beijing University of Chemical Technology, Beijing, China, 2017.
- [21] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 2013, 26, 2787–2795.
- [22] Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1544–1550.
- [23] Xiao, H.; Huang, M.; Hao, Y.; Zhu, X. TransG: A generative mixture model for knowledge graph embedding. arXiv 2015, arXiv:1509.05488.
- [24] Lin, Y.; Liu, Z.; Sun, M. Modeling Relation Paths for Representation Learning of Knowledge Bases. arXiv 2015, arXiv:1506.00379.
- [25] Huang, W.; Li, G.; Jin, Z. Improved knowledge base completion by the path-augmented TransR model. In Proceedings of the Knowledge Science, Engineering and Management: 10th International Conference, Melbourne, VIC, Australia, 19–20 August 2017; pp. 149–159.
- [26] Cao, Y.; Liu, Z.; Li, C.; Liu, Z.; Li, J.; Chua, T.-S. Multi-Channel Graph Neural Network for Entity Alignment. arXiv 2019, arXiv:1908.09898.
- [27] Wu, Y.; Liu, X.; Feng, Y.; Wang, Z.; Zhao, D. Jointly Learning Entity and Relation Representations for Entity Alignment. arXiv 2019, arXiv:1909.09317.
- [28] Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of The Semantic Web: 15th International Conference, Heraklion, Greece, 3–7 June 2018; pp. 593–607.
- [29] Wang, C.; Huang, Z.; Wan, Y.; Wei, J.; Zhao, J.; Wang, P. FuAlign: Cross-lingual entity alignment via multi-view representation learning of fused knowledge graphs. Inf. Fusion 2023, 89, 41–52.
- [30] Teong, K.-S.; Soon, L.-K.; Su, T.T. Schema-agnostic entity matching using pre-trained language models. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Birmingham, UK, 19–23 October 2020; pp. 2241–2244.
- [31] Wang, Z.; Lv, Q.; Lan, X.; Zhang, Y. Cross-lingual knowledge graph alignment via graph convolutional networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 349–357.
- [32] Liu, Z.; Cao, Y.; Pan, L.; Li, J.; Chua, T.-S. Exploring and evaluating attributes, values, and structures for entity alignment. arXiv 2020, arXiv:2010.03249.
- [33] Thirumuruganathan, S.; Li, H.; Tang, N.; Ouzzani, M.; Govind, Y.; Paulsen, D.; Fung, G.; Doan, A. Deep learning for blocking in entity matching: A design space exploration. Proc. VLDB Endow. 2021, 14, 2459–2472.
- [34] Nie, H.; Han, X.; Sun, L.; Wong, C.; Chen, Q.; Wu, S.; Zhang, W. Global structure and local semantics-preserved embeddings for entity alignment. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; pp. 3658–3664.
- [35] Xiang, Y.; Zhang, Z.; Chen, J.; Chen, X.; Lin, Z.; Zheng, Y. OntoEA: Ontology-guided entity alignment via joint knowledge graph embedding. arXiv 2021, arXiv:2105.07688.
- [36] Liu, F.; Vulić, I.; Korhonen, A.; Collier, N. Learning domain-specialised representations for cross-lingual biomedical entity linking. arXiv 2021, arXiv:2105.14398.
- [37] Azzalini, F.; Jin, S.; Renzi, M.; Tanca, L. Blocking Techniques for Entity Linkage: A Semantics-Based Approach. Data Sci. Eng. 2021, 6, 20–38.
- [38] Muhammad, E.; Saravanan, T.; Shafiq, J.; Mourad, O.; Nan, T. Distributed representations of tuples for entity resolution. Proc. VLDB Endow. 2018, 11, 1454–1467.
- [39] Javdani, D.; Rahmani, H.; Allahgholi, M.; Karimkhani, F. Deepblock: A novel blocking approach for entity resolution using deep learning. In Proceedings of the 2019 5th International Conference on Web Research (ICWR), Cambridge, UK, 26–28 August 2019; pp. 41–44.
- [40] Zhang, W.; Wei, H.; Sisman, B.; Dong, X.L.; Faloutsos, C.; Page, D. Autoblock: A hands-off blocking framework for entity matching. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 744–752.
- [41] Tang, J.; Zhao, K.; Li, J. A Fused Gromov-Wasserstein Framework for Unsupervised Knowledge Graph Entity Alignment. arXiv 2023, arXiv:2305.06574.
- [42] Tang, W.; Su, F.; Sun, H.; Qi, Q.; Wang, J.; Tao, S.; Hao, Y. Weakly Supervised Entity Alignment with Positional Inspiration. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 814–822.
- [43] Zhong, Z.; Zhang, M.; Fan, J.; Dou, C. Semantics driven embedding learning for effective entity alignment. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 2127–2140.
Notation | Description |
---|---|
σ | The nonlinear activation function |
A | The adjacency matrix |
I | The identity matrix |
 | The degree matrix corresponding to the adjacency matrix |
 | The weight coefficient matrix of the -th layer |
 | The adjacency matrix with self-connection |
 | The collection of neighboring entities of the -th entity in KGs |
 | The embedding of domain KGs |
 | The bias term of the -th layer |
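The quantities in the notation table are those of a graph convolutional layer. As a hedged illustration, the sketch below implements the standard propagation rule of Kipf and Welling [12], $\sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H W + b)$ with $\tilde{A} = A + I$. The symbols $\tilde{A}$, $\tilde{D}$, $H$, $W$, $b$, the choice of ReLU for σ, and the function name are assumptions, since the table does not preserve the paper's own symbols.

```python
import math


def relu(v):
    """Rectified linear unit, used here as the nonlinear activation."""
    return max(0.0, v)


def gcn_layer(adj, feats, weight, bias, activation=relu):
    """One propagation step: act(D^-1/2 (A + I) D^-1/2 H W + b)."""
    n = len(adj)
    # Adjacency matrix with self-connections: A~ = A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Degree of each node in A~.
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalisation of A~.
    norm = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features: norm @ feats.
    f_in = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n))
            for d in range(f_in)] for i in range(n)]
    # Linear transform, bias, then activation.
    f_out = len(weight[0])
    return [[activation(sum(agg[i][d] * weight[d][o] for d in range(f_in))
                        + bias[o]) for o in range(f_out)] for i in range(n)]


# Two connected nodes with one feature each; identity weight, zero bias.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0], [3.0]]
out = gcn_layer(adj, feats, weight=[[1.0]], bias=[0.0])
```

With self-connections added, each node in this toy graph averages its own feature with its neighbour's, so both outputs equal 2.0.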
Dataset | Entity | Attribute | Attribute Triple |
---|---|---|---|
ZH-MSJ | 8156 | 18 | 95,145 |
ZH-ZHYSW | 6083 | 11 | 46,652 |
Tools | Value |
---|---|
CPU | Intel Core i7-12700H |
RAM | 32 GB |
HDD | 2 TB |
PyTorch | 1.10.2 |
Python | 3.6.2 |
SciPy | 1.5.4 |
NumPy | 1.16.2 |
Hyperparameters | Description | Value |
---|---|---|
Keep-prob | The probability of keeping a neuron active during dropout | 0.9 |
Learning rate | The step size for model parameter updates | 0.001 |
Alpha | The negative slope of LeakyReLU | 0.2 |
Max epoch | The maximum number of training epochs | 100 |
Method | Hits@1 (%) | Hits@10 (%) | Hits@50 (%) | MRR (%) |
---|---|---|---|---|
GCN-Align | 46.25 | 86.74 | 92.21 | 60.07 |
MuGNN | —— | —— | —— | —— |
HGCN | —— | —— | —— | —— |
FGWEA | 55.94 | 88.79 | 92.36 | 69.45 |
PEEA | 53.59 | 88.61 | 90.16 | 67.98 |
SDEA | 55.18 | 89.26 | 91.65 | 70.94 |
GCN-Align+ | 47.03 (0.78↑) | 86.13 (0.61↓) | 92.73 (0.52↑) | 60.41 (0.34↑) |
MuGNN+ | 53.71 | 92.48 | 96.74 | 69.09 |
HGCN+ | 59.02 | 95.38 | 97.19 | 73.23 |
FGWEA+ | 62.14 (6.20↑) | 97.24 (8.45↑) | 99.02 (6.66↑) | 76.61 (7.16↑) |
PEEA+ | 60.39 (6.80↑) | 93.87 (5.26↑) | 98.71 (8.55↑) | 74.63 (6.65↑) |
SDEA+ | 62.98 (7.80↑) | 97.45 (8.19↑) | 98.15 (6.50↑) | 78.77 (7.83↑) |
DomainEA | 64.66 | 98.07 | 98.91 | 79.02 |
CSSR (%) | Autoencoder | CTT | Hybrid |
---|---|---|---|
1.41 | 81.03 | 78.41 | 83.26 |
1.69 | 84.74 | 81.69 | 84.39 |
1.97 | 87.16 | 84.69 | 86.40 |
2.25 | 92.63 | 87.49 | 89.92 |
2.53 | 94.25 | 93.82 | 96.20 |
2.81 | 98.03 | 96.62 | 97.05 |
5.62 | 99.90 | 99.90 | 99.57 |
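The trade-off in the table above can be illustrated with a small blocking sketch. It assumes that CSSR denotes the candidate set size ratio (candidate pairs divided by all possible pairs) and that the table entries are the recall of true matches; neither definition is stated in the table itself. Top-k cosine similarity stands in for the blocker, and all names and toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def top_k_candidates(emb1, emb2, k):
    """For every entity in KG1, keep the k most similar KG2 entities."""
    pairs = set()
    for i, u in enumerate(emb1):
        ranked = sorted(range(len(emb2)),
                        key=lambda j: cosine(u, emb2[j]), reverse=True)
        pairs.update((i, j) for j in ranked[:k])
    return pairs


def cssr(pairs, n1, n2):
    """Candidate set size ratio as a percentage of all n1 * n2 pairs."""
    return 100.0 * len(pairs) / (n1 * n2)


def recall(pairs, gold):
    """Percentage of true matches that survive blocking."""
    return 100.0 * len(pairs & gold) / len(gold)


# Hypothetical entity embeddings for two tiny KGs and their true matches.
emb1 = [[1.0, 0.0], [0.0, 1.0]]
emb2 = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
gold = {(0, 0), (1, 1)}
cands = top_k_candidates(emb1, emb2, k=1)
ratio = cssr(cands, len(emb1), len(emb2))
rec = recall(cands, gold)
```

Increasing `k` raises both the ratio and the recall, which mirrors the pattern in the table: a larger candidate set retains more true matches at the cost of more comparisons.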
Method | Hits@1 (%) | Hits@10 (%) | Hits@50 (%) | MRR (%) | Time (s) |
---|---|---|---|---|---|
DomainEA-f | 58.39 | 95.73 | 97.66 | 71.14 | 49.47 |
DomainEA | 64.66 | 98.07 | 98.91 | 79.02 | 6.43 |
#Layers | Hits@1 (%) | Hits@10 (%) | Hits@50 (%) | MRR (%) |
---|---|---|---|---|
1 | 53.88 | 93.98 | 97.74 | 71.91 |
2 | 64.32 | 97.24 | 98.16 | 77.72 |
3 | 64.66 | 98.07 | 98.91 | 79.02 |
4 | 64.41 | 97.91 | 98.93 | 78.86 |
5 | 62.32 | 97.15 | 98.33 | 76.35 |
Dataset | Hits@1 (%) | Hits@10 (%) | Hits@50 (%) | MRR (%) |
---|---|---|---|---|
KG_1K | 57.24 | 95.37 | 97.48 | 74.35 |
KG_2K | 56.89 | 95.48 | 97.66 | 73.81 |
KG_3K | 60.48 | 95.98 | 98.16 | 75.22 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xu, Y.; Zhong, J.; Zhang, S.; Li, C.; Li, P.; Guo, Y.; Li, Y.; Liang, H.; Zhang, Y. A Domain-Oriented Entity Alignment Approach Based on Filtering Multi-Type Graph Neural Networks. Appl. Sci. 2023, 13, 9237. https://doi.org/10.3390/app13169237