This dissertation describes the creation of a large-scale, richly structured lexical knowledge base (LKB) from complex structures of labeled semantic relations. These structures were extracted automatically, using a natural language parser, from the definitions and example sentences contained in two machine-readable dictionaries. The structures were then completely inverted and propagated across all of the relevant headwords in the dictionaries to create the LKB.
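As a rough illustration of the inversion step, the sketch below indexes each extracted structure under every word it mentions rather than only under its headword. The triple representation, the relation labels, and the toy entries are assumptions made for the example, not the dissertation's actual data format.

```python
from collections import defaultdict

# Hypothetical toy data: each headword maps to (source, relation, target)
# triples that a parser might extract from its dictionary definition.
structures = {
    "car": [("car", "Hypernym", "vehicle"), ("car", "Part", "engine")],
    "truck": [("truck", "Hypernym", "vehicle"), ("truck", "Purpose", "carry")],
}

def invert(structures):
    """Index every structure under each word it mentions, not only its headword."""
    index = defaultdict(list)
    for headword, triples in structures.items():
        # Collect every word appearing anywhere in this headword's structure.
        words = {w for s, _, t in triples for w in (s, t)}
        for word in sorted(words):
            index[word].append(headword)
    return dict(index)

index = invert(structures)
print(index["vehicle"])  # ['car', 'truck']
```

After inversion, a lookup on "vehicle" reaches the structures of both "car" and "truck", even though neither structure came from the definition of "vehicle" itself.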
A method is described for efficiently accessing salient paths of semantic relations between words in the LKB using weights assigned to those paths. The weights are based on a unique computation called averaged vertex probability. Extended paths, created by joining sub-paths from two different semantic relation structures, are allowed in order to increase the coverage of the information in the LKB.
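The idea of an extended path can be sketched as follows: two sub-paths from different structures are joined wherever the first ends on the word with which the second begins. The path encoding and the weights here are hypothetical, and the product used to combine weights is only a placeholder; it is not the averaged vertex probability computation, which the abstract does not define.

```python
# Hypothetical weighted paths, each taken from a single relation structure.
# A path is (alternating words and relation labels, weight).
paths_a = [
    (["car", "Hypernym", "vehicle"], 0.6),
    (["car", "Part", "engine"], 0.3),
]
paths_b = [
    (["vehicle", "Purpose", "transport"], 0.5),
    (["engine", "Part", "piston"], 0.4),
]

def extend(paths_a, paths_b):
    """Join sub-paths wherever one ends on the word the other starts with."""
    joined = []
    for nodes_a, w_a in paths_a:
        for nodes_b, w_b in paths_b:
            if nodes_a[-1] == nodes_b[0]:
                # Placeholder weight (an assumption): product of sub-path weights.
                joined.append((nodes_a + nodes_b[1:], w_a * w_b))
    # Highest-weighted (most salient) extended paths first.
    return sorted(joined, key=lambda p: p[1], reverse=True)

extended = extend(paths_a, paths_b)
best_nodes, best_weight = extended[0]
print(best_nodes)
```

Sorting by weight means a caller can take only the top few paths between two words instead of enumerating every connection.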
A novel procedure is used to determine the similarity between words in the LKB based on the patterns of the semantic relation paths connecting those words. The patterns were obtained by extensive training using word pairs from an online thesaurus and a specially created anti-thesaurus.
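One simple way to picture training on path patterns is to count how often each pattern connects known-similar pairs versus known-dissimilar pairs. The sketch below does this with invented pattern data; the scoring formula (share of occurrences seen among similar pairs) is an assumption chosen for illustration, not the procedure used in the dissertation.

```python
from collections import Counter

# Hypothetical training data: for each word pair, the path patterns
# (sequences of relation labels) found connecting the two words.
thesaurus_pairs = [
    [("Hypernym", "HypernymOf")],
    [("Hypernym", "HypernymOf"), ("Part", "PartOf")],
]
anti_thesaurus_pairs = [
    [("Part", "PartOf")],
    [("Purpose", "Hypernym")],
]

def pattern_scores(similar, dissimilar):
    """Score each pattern by the share of its occurrences seen in similar pairs."""
    pos = Counter(p for pair in similar for p in pair)
    neg = Counter(p for pair in dissimilar for p in pair)
    # Counter returns 0 for unseen patterns, so the ratio is always defined.
    return {p: pos[p] / (pos[p] + neg[p]) for p in pos.keys() | neg.keys()}

scores = pattern_scores(thesaurus_pairs, anti_thesaurus_pairs)
for pattern, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pattern, score)
```

A pattern that appears only between thesaurus pairs scores 1.0, one that appears only between anti-thesaurus pairs scores 0.0, and the scores of the patterns linking a new word pair could then be combined into a similarity estimate.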
The similarity procedure and the path accessing mechanism are used in a procedure to infer semantic relations that are not explicitly stored in the LKB. In particular, the utility of such inferences is discussed in the context of disambiguating phrasal attachments in a natural language understanding system.
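One plausible reading of this inference step is that a relation missing from the LKB can be supported by a stored relation whose target is similar to the queried word. The sketch below illustrates that reading only; the stored triples, the similarity table, the `threshold` parameter, and the function names are all hypothetical stand-ins.

```python
# Hypothetical stored relations and a stand-in similarity table.
stored = {("eat", "Instrument", "fork"), ("write", "Instrument", "pen")}
similarity = {
    frozenset(("spoon", "fork")): 0.8,
    frozenset(("spoon", "pen")): 0.1,
}

def similar(a, b):
    """Stand-in for a similarity score between two words; 1.0 if identical."""
    return 1.0 if a == b else similarity.get(frozenset((a, b)), 0.0)

def infer(source, relation, target, threshold=0.5):
    """Return the stored triple that best supports the queried relation, if any."""
    candidates = [
        (similar(target, t), (s, r, t))
        for s, r, t in stored
        if s == source and r == relation
    ]
    best = max(candidates, default=(0.0, None))
    return best[1] if best[0] >= threshold else None

print(infer("eat", "Instrument", "spoon"))
print(infer("write", "Instrument", "spoon"))
```

In an attachment decision such as "eat soup with a spoon", support for an inferred `("eat", "Instrument", "spoon")` relation would favor attaching the prepositional phrase to the verb; the second query finds no sufficiently similar stored relation and returns `None`.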
Quantitative results indicate that the size and coverage of the LKB created in this research and the effectiveness of the methods for accessing explicit and implicit information contained therein represent significant progress toward the development of a truly broad-coverage semantic component for natural language processing.
Index Terms
- Determining similarity and inferring relations in a lexical knowledge base