Abstract
Natural language understanding is a key task in a wide range of applications targeting data interoperability or analytics. The analysis of domain-specific data requires specialised knowledge resources (terminologies, grammars, word vector models, lexical databases). The heterogeneity of such resources is, however, a major obstacle to their efficient use, especially in combination. This paper presents the open-source Diversicon Framework, which helps application developers find, integrate, and access lexical domain knowledge, both symbolic and statistical, in a unified manner. The major components of the framework are: (1) an API and domain knowledge model that allow applications to retrieve domain knowledge through a common interface from diverse resource types, (2) implementations of the API for some of the most commonly used symbolic and statistical knowledge sources, (3) a domain-aware knowledge base that helps integrate static lexico-semantic resources, and (4) an online catalogue that either hosts or links to existing resources from multiple domains. Support for Diversicon is already integrated into two of the most popular ontology matcher applications. We exploit this fact to validate the framework and to demonstrate its use in an example study that evaluates the effect of several common-sense and domain knowledge resources on a medical ontology matching task.
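As a rough illustration of component (1), the sketch below shows how a symbolic resource and a statistical resource might be queried through one shared interface. It is not the Diversicon API: the names `KnowledgeSource`, `SynonymLexicon`, `VectorModel`, and `best_score`, the relatedness scoring, and the toy data are all assumptions made for this example.

```python
from abc import ABC, abstractmethod
from math import sqrt


class KnowledgeSource(ABC):
    """Hypothetical common interface; not the actual Diversicon API."""

    @abstractmethod
    def relatedness(self, term_a: str, term_b: str) -> float:
        """Return a score in [0, 1] for how related two terms are."""


class SynonymLexicon(KnowledgeSource):
    """Symbolic source: two terms are related iff they share a synonym set."""

    def __init__(self, synsets):
        self.synsets = [frozenset(s) for s in synsets]

    def relatedness(self, term_a, term_b):
        shared = any(term_a in s and term_b in s for s in self.synsets)
        return 1.0 if shared else 0.0


class VectorModel(KnowledgeSource):
    """Statistical source: cosine similarity between word vectors."""

    def __init__(self, vectors):
        self.vectors = vectors

    def relatedness(self, term_a, term_b):
        a, b = self.vectors.get(term_a), self.vectors.get(term_b)
        if a is None or b is None:
            return 0.0  # unknown terms carry no evidence
        dot = sum(x * y for x, y in zip(a, b))
        norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
        # Clamp negative cosines to 0 so every source reports in [0, 1].
        return max(0.0, dot / norm) if norm else 0.0


def best_score(sources, term_a, term_b):
    """Query every source through the shared interface; keep the top score."""
    return max((s.relatedness(term_a, term_b) for s in sources), default=0.0)


# Toy data invented for illustration only.
lexicon = SynonymLexicon([{"heart attack", "myocardial infarction"}])
model = VectorModel({"fever": [1.0, 0.0], "pyrexia": [1.0, 0.0], "bone": [0.0, 1.0]})
sources = [lexicon, model]
```

A caller such as an ontology matcher would then only depend on `KnowledgeSource`, so adding or swapping a resource would not require changes to the matching code. Taking the maximum over sources is just one possible way to combine them and is chosen here for brevity.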
Notes
For example, the SNOMED ontology of medical terms contains over 500,000 concepts and 1,500,000 labels in English alone.
The vectors of antonyms, such as hot and cold, are typically treated as very closely related by these models.
The Diversicon-based SMATCH extensions are downloadable from https://github.com/s-match/.
The Diversicon-equipped version of LogMap is downloadable from https://github.com/diversicon-kb/logmap-matcher.
Cite this article
Bella, G., McNeill, F., Leoni, D. et al. Diversicon: Pluggable Lexical Domain Knowledge. J Data Semant 8, 219–234 (2019). https://doi.org/10.1007/s13740-019-00107-1
DOI: https://doi.org/10.1007/s13740-019-00107-1