Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica
Walter S (2017)
Bielefeld: Universität Bielefeld.
Bielefeld E-Dissertation | English
Abstract / Note
There is an increasing interest in providing common web users with access to structured knowledge bases such as DBpedia, for example by means of question answering systems.
All such question answering systems have in common that they have to map a natural language input, be it spoken or written, to a formal representation in order to extract the correct answer from the target knowledge base. This is also the case for systems which generate natural language text from a given knowledge base. The main challenge is how to map natural language (spoken or written) to structured data and vice versa. To this end, question answering systems require knowledge about how the vocabulary elements used in the available datasets are verbalized in natural language, covering different verbalization variants. Multilinguality of course increases the complexity of this challenge.
In this thesis we introduce M-ATOLL, a framework for automatically inducing ontology lexica in multiple languages, to find such verbalization variants.
We have instantiated the system for three languages, English, German and Spanish, by exploiting a set of language-specific dependency patterns for finding lexicalizations in text corpora. Additionally, we extended our framework to extract complex adjective lexicalizations with a machine-learning-based approach.
M-ATOLL is the first open-source and multilingual approach for the generation of ontology lexica. In this thesis we present grammatical patterns for three different languages, on which the extraction of lexicalizations relies. We provide an analysis of these patterns as well as a comparison with those proposed by other state-of-the-art systems. Additionally, we present a detailed evaluation comparing the different approaches with different settings on a publicly available gold standard, and discuss their potential and limitations.
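To make the idea of a dependency pattern concrete, the following is a minimal sketch and not M-ATOLL's actual implementation. It assumes that sentences are already dependency-parsed, that entity mentions are merged into single tokens, and that relation names follow a Universal Dependencies style (`nsubj`, `obj`/`dobj`). The `Token` class and the functions `match_transitive_verb` and `count_candidates` are hypothetical names introduced only for illustration. The sketch shows a single transitive-verb pattern: if the subject and object of a knowledge-base triple appear as subject and object of the same verb, that verb's lemma is a candidate lexicalization of the triple's property.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Token:
    """One token of a pre-parsed sentence (hypothetical, simplified format)."""
    text: str    # surface form; entity mentions are assumed merged into one token
    lemma: str   # lemma of the token
    head: int    # index of the head token, -1 for the root
    deprel: str  # dependency relation to the head


def match_transitive_verb(
    tokens: List[Token], subj_label: str, obj_label: str
) -> Optional[str]:
    """Return the lemma of a verb that governs subj_label as subject and
    obj_label as direct object, or None if the pattern does not match."""
    subj_heads = {
        t.head for t in tokens if t.deprel == "nsubj" and t.text == subj_label
    }
    for t in tokens:
        if (
            t.deprel in ("obj", "dobj")
            and t.text == obj_label
            and t.head >= 0
            and t.head in subj_heads
        ):
            return tokens[t.head].lemma
    return None


def count_candidates(
    examples: Iterable[Tuple[List[Token], str, str]]
) -> Counter:
    """Count how often each lemma is proposed across (sentence, subject, object) examples."""
    counts: Counter = Counter()
    for tokens, subj_label, obj_label in examples:
        lemma = match_transitive_verb(tokens, subj_label, obj_label)
        if lemma is not None:
            counts[lemma] += 1
    return counts


# Illustrative sentence for a triple (Barack Obama, spouse, Michelle Obama).
sentence = [
    Token("Barack Obama", "Barack Obama", 1, "nsubj"),
    Token("married", "marry", -1, "root"),
    Token("Michelle Obama", "Michelle Obama", 1, "dobj"),
]
candidates = count_candidates([(sentence, "Barack Obama", "Michelle Obama")])
print(candidates.most_common())
```

Counting candidates over many sentences, rather than accepting a single match, is one simple way to rank verbalization variants by frequency. A system covering several languages would need a separate pattern set per language, as the abstract notes.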
Year
2017
Page URI
https://pub.uni-bielefeld.de/record/2907706
Cite
Walter S. Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica. Bielefeld: Universität Bielefeld; 2017.
Walter, S. (2017). Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica. Bielefeld: Universität Bielefeld.
Walter, Sebastian. 2017. Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica. Bielefeld: Universität Bielefeld.
Walter, S., 2017. Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica, Bielefeld: Universität Bielefeld.
S. Walter, Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica, Bielefeld: Universität Bielefeld, 2017.
Walter, S.: Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica. Universität Bielefeld, Bielefeld (2017).
Walter, Sebastian. Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica. Bielefeld: Universität Bielefeld, 2017.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
Open Access
Last uploaded
2019-09-06T09:18:42Z
MD5 checksum
4a5fcb1d7093ec11d8faa6ef20901a56