Abstract
Entity alignment aims to identify semantic matches between entities from different groups. Traditional methods (e.g., attribute comparison-based, graph operation-based, and active learning methods) are usually supervised, relying on labeled data as prior knowledge. Because labeling training data is not trivial, researchers have turned to unsupervised methods, developing similarity-based, probabilistic, and graphical model-based methods, among others. Structure or class information has also been explored. As an important part of a knowledge graph, entities contain rich semantic information that knowledge graph embedding methods can learn well in low-dimensional vector spaces. However, existing entity alignment methods have paid little attention to knowledge graph embedding. In this paper, we propose a self-learning and embedding based method for entity alignment, called SEEA, which iteratively finds semantically aligned entity pairs and makes full use of the semantic information contained in the attributes of entities. Experiments on three realistic datasets and comparisons with several baseline methods validate the effectiveness and merits of the proposed method.
Acknowledgements
This work is supported by the National Key Research and Development Program of China under Grants 2016YFB1000902 and 2017YFC0820404, and by the National Natural Science Foundation of China under Grants 61772501, 61572473, 61572469, and 91646120.
Cite this article
Guan, S., Jin, X., Wang, Y. et al. Self-learning and embedding based entity alignment. Knowl Inf Syst 59, 361–386 (2019). https://doi.org/10.1007/s10115-018-1191-0