At the highest level, we explore the practical problem of grammar modularity in natural language processing (NLP). The problem has two aspects: the modular design and the modular use of natural-language (NL) grammars. We define grammar modules and describe two operations on them: merging two grammar modules into a larger module, and extracting a subgrammar module from a larger module given an application context, e.g., a text and the type of information needed. Grammar modularity can be applied in various domains, especially in distributed NLP, an area that combines the Internet with NLP techniques.
To formalize this higher-level approach we use the Head-driven Phrase Structure Grammar (HPSG) formalism. We define the formalism concisely, in a way that is more amenable to implementation in procedural programming languages than previous approaches. We define grammar modules and module merging in the context of this formalism, and we present and analyze algorithms for subgrammar extraction for context-free grammars and HPSGs.
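To make the idea of subgrammar extraction concrete for the context-free case, the following is a minimal sketch, not the algorithm analyzed in this work. It assumes the application context is simply the set of words in a text, and it applies the standard removal of unproductive and unreachable rules relative to that vocabulary. The names `extract_subgrammar`, `Rule`, and `GRAMMAR` are hypothetical and chosen only for illustration.

```python
from typing import Iterable

# A rule is (left-hand side, right-hand side); this representation is an assumption.
Rule = tuple[str, tuple[str, ...]]


def extract_subgrammar(rules: list[Rule], start: str, words: Iterable[str]) -> list[Rule]:
    """Return the rules that can take part in a derivation of a string over `words`."""
    vocab = set(words)
    nonterminals = {lhs for lhs, _ in rules}

    def symbol_ok(symbol: str, productive: set[str]) -> bool:
        # A nonterminal must be productive; a terminal must occur in the text.
        if symbol in nonterminals:
            return symbol in productive
        return symbol in vocab

    # Step 1: fixed-point computation of nonterminals that derive
    # at least one string built only from the words of the text.
    productive: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(symbol_ok(s, productive) for s in rhs):
                productive.add(lhs)
                changed = True

    useful = [r for r in rules if all(symbol_ok(s, productive) for s in r[1])]

    # Step 2: keep only the rules reachable from the start symbol.
    reachable = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in useful:
            if lhs != current:
                continue
            for s in rhs:
                if s in nonterminals and s not in reachable:
                    reachable.add(s)
                    frontier.append(s)

    return [r for r in useful if r[0] in reachable]


# Toy grammar, invented for this example.
GRAMMAR: list[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("N",)),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("Det", ("the",)),
    ("Det", ("a",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("sees",)),
    ("V", ("sleeps",)),
]

subgrammar = extract_subgrammar(GRAMMAR, "S", "the dog sleeps".split())
```

For the text "the dog sleeps", the lexical rules for "a", "cat", and "sees" are dropped and all phrasal rules survive. An HPSG version would have to work over typed feature structures rather than atomic symbols, which this sketch does not attempt.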
On the practical side, we use the problem of open-domain question answering (QA) to illustrate the use and usefulness of the approach. We adopt the question-answering framework of the well-known TREC conference: the task is to find a short answer to an NL question as a substring of a document from a given document collection. We show that our approach can be used successfully with a classical information retrieval search engine.
We describe an implementation of the HPSG parser in Java. Motivated by the recent successes of probabilistic parsers, a stochastic component of the HPSG formalism is defined and implemented in the parser. The parser uses known techniques for efficient graph unification and parsing, such as hidden structure sharing.
The rest of our QA system is implemented in Perl. It includes the parts for managing grammar modules and for subgrammar extraction.
Finally, a new algorithm and an abstract machine model for graph unification are described. The advantages of this contribution include a more compact memory representation, efficient memory management within the algorithm, sub-node hidden structure sharing, and a flat structure without frequent function calls. Java and C implementations of the algorithm are given in the appendices.
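For readers unfamiliar with graph unification, the following is a textbook-style sketch of destructive unification with forwarding pointers. It is not the new algorithm or abstract machine described here: it is recursive rather than flat, it omits types, and it does not restore the inputs after a failure. The names `Node`, `deref`, `unify`, and `to_plain` are hypothetical.

```python
class Node:
    """A feature-graph node: either an atom or a set of labeled arcs."""

    def __init__(self, atom=None, **arcs):
        self.atom = atom        # atomic value, or None for a complex node
        self.arcs = dict(arcs)  # feature name -> Node
        self.forward = None     # set when this node is merged into another


def deref(node):
    """Follow forwarding pointers to the node's current representative."""
    while node.forward is not None:
        node = node.forward
    return node


def unify(a, b):
    """Destructively unify two graphs; return True on success.

    On failure the graphs may be left partially merged (an assumption
    of this sketch; practical parsers copy or undo instead).
    """
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if a.atom is not None and b.atom is not None and a.atom != b.atom:
        return False  # clashing atomic values
    if (a.atom is not None and b.arcs) or (b.atom is not None and a.arcs):
        return False  # an atom cannot also carry features
    b.forward = a     # merge b into a before recursing, so shared nodes stay shared
    if a.atom is None:
        a.atom = b.atom
    for feature, sub in b.arcs.items():
        if feature in a.arcs:
            if not unify(a.arcs[feature], sub):
                return False
        else:
            a.arcs[feature] = sub
    return True


def to_plain(node):
    """Convert an acyclic graph to nested dicts for inspection."""
    node = deref(node)
    if node.atom is not None:
        return node.atom
    return {f: to_plain(s) for f, s in node.arcs.items()}


# Subject and verb share one agreement node (a reentrancy).
agr = Node()
clause = Node(SUBJ=Node(AGR=agr), VERB=Node(AGR=agr))
singular_subject = Node(SUBJ=Node(AGR=Node(NUM=Node("sg"))))

ok = unify(clause, singular_subject)
result = to_plain(clause)
clash = unify(Node(NUM=Node("sg")), Node(NUM=Node("pl")))
```

Because the subject and verb share the `agr` node, constraining the subject's number to "sg" also constrains the verb's. This propagation through shared nodes is what structure-sharing schemes must preserve while avoiding unnecessary copying.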