Name Variants for Improving Entity Discovery and Linking

Authors Albert Weichselbraun, Philipp Kuntschik, Adrian M. P. Braşoveanu

Author Details

Albert Weichselbraun
  • Swiss Institute for Information Science, University of Applied Sciences Chur, Pulvermühlestrasse 57, 7000 Chur, Switzerland
Philipp Kuntschik
  • Swiss Institute for Information Science, University of Applied Sciences Chur, Pulvermühlestrasse 57, 7000 Chur, Switzerland
Adrian M. P. Braşoveanu
  • MODUL Technology GmbH, Am Kahlenberg 1, 1190 Vienna, Austria

Albert Weichselbraun, Philipp Kuntschik, and Adrian M. P. Braşoveanu. Name Variants for Improving Entity Discovery and Linking. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 14:1-14:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Abstract

Identifying all names that refer to a particular set of named entities is a challenging task, because it often requires considering many sources of variation, such as abbreviations, aliases, hypocorisms, multilingualism, and partial matches. Each entity type can also have specific rules for name variances: person names can include titles, country and branch names are sometimes removed from organization names, and locations are often plagued by nested entities. The lack of a clear strategy for collecting, processing, and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population, since name variances are frequently used in all kinds of textual content.
This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.
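
As a rough illustration of the kind of rule-based variant generation and ambiguity marking described above, the following minimal Python sketch strips titles from person names, drops legal-form and country tokens from organization names, and flags variants shared by more than one entity. It is not the authors' implementation: the function names, the tiny word lists, and the "shared by several entities" ambiguity rule are all assumptions made for this example, standing in for the knowledge repositories, heuristics, and machine learning methods the paper refers to.

```python
# Hypothetical, deliberately tiny word lists (assumptions for this sketch);
# a real system would obtain such data from knowledge repositories.
TITLES = {"dr", "prof", "mr", "mrs", "ms", "sir"}
ORG_SUFFIXES = {"inc", "ltd", "gmbh", "ag", "corp", "llc"}
COUNTRIES = {"switzerland", "austria", "germany"}


def _tokens(name):
    """Split on whitespace and strip surrounding periods and commas."""
    return [t.strip(".,") for t in name.split() if t.strip(".,")]


def person_variants(name):
    """Rule-based variants of a person name: title removal, surname, initial."""
    tokens = [t for t in _tokens(name) if t.lower() not in TITLES]
    variants = {" ".join(tokens)}
    if len(tokens) >= 2:
        variants.add(tokens[-1])                       # surname only
        variants.add(f"{tokens[0][0]}. {tokens[-1]}")  # first initial + surname
    return variants


def organization_variants(name):
    """Variants of an organization name: drop legal forms/countries, acronym."""
    tokens = _tokens(name)
    variants = {" ".join(tokens)}
    removable = ORG_SUFFIXES | COUNTRIES
    core = [t for t in tokens if t.lower() not in removable]
    if core:
        variants.add(" ".join(core))
        if len(core) >= 2:
            variants.add("".join(t[0].upper() for t in core))  # acronym
    return variants


def mark_ambiguous(variants_by_entity):
    """Return lower-cased variants that are claimed by more than one entity."""
    owners = {}
    for entity, variants in variants_by_entity.items():
        for variant in variants:
            owners.setdefault(variant.lower(), set()).add(entity)
    return {v for v, entities in owners.items() if len(entities) > 1}


if __name__ == "__main__":
    print(sorted(person_variants("Dr. Albert Weichselbraun")))
    print(sorted(organization_variants("MODUL Technology GmbH")))
```

Generating more variants in this way tends to raise recall, while a marking step such as the hypothetical `mark_ambiguous` lets a linker treat short or shared names (for example, a bare surname) with more caution to protect precision.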

Subject Classification

ACM Subject Classification
  • Information systems → Incomplete data
  • Information systems → Inconsistent data
  • Information systems → Extraction, transformation and loading
  • Information systems → Entity resolution

Keywords
  • Named Entity Linking
  • Name Variance
  • Machine Learning
  • Linked Data


