Modular stochastic HPSGs for question answering
Publisher:
  • University of Waterloo
  • Computer Science Dept. University Avenue Waterloo, Ont. N2L 3G1
  • Canada
ISBN: 978-0-612-77237-3
Order Number: AAINQ77237
Pages: 320
Abstract

At the highest level, we explore the practical problem of grammar modularity in natural language processing (NLP). Two aspects of the problem are the modular design and the modular use of natural language (NL) grammars. We define grammar modules and describe two operations on them: merging two grammar modules into a larger module, and extracting a subgrammar module from a larger module given an application context, e.g., a text and the type of information needed. Grammar modularity can be applied in various domains, especially in distributed NLP, an area that combines the Internet with NLP techniques.

To formalize this higher-level approach we use the Head-driven Phrase Structure Grammar (HPSG) formalism. We define the formalism concisely, in a way that is more amenable to implementation in procedural programming languages than previous approaches. We define grammar modules and module merging in the context of this formalism, and present and analyze algorithms for subgrammar extraction for context-free grammars and HPSGs.
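To make the idea of subgrammar extraction concrete for the context-free case, here is a minimal sketch. It is not the algorithm analyzed in the thesis; it only assumes that the application context is a tokenized text and that a rule is worth keeping when it can take part in some derivation from the start symbol using only words that occur in that text. The function name `extract_subgrammar` and the rule representation are hypothetical.

```python
from collections import defaultdict


def extract_subgrammar(rules, start, text_tokens):
    """Return the rules of a CFG usable for a given text (illustrative only).

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    A symbol counts as a nonterminal if it occurs as some lhs; every other
    symbol is treated as a terminal (a word).
    """
    nonterminals = {lhs for lhs, _ in rules}
    vocabulary = set(text_tokens)

    def symbol_ok(symbol, productive):
        # Nonterminals must be productive; terminals must occur in the text.
        if symbol in nonterminals:
            return symbol in productive
        return symbol in vocabulary

    # Step 1: fixed-point computation of productive nonterminals, i.e. those
    # that derive at least one string over the text's vocabulary.
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(symbol_ok(s, productive) for s in rhs):
                productive.add(lhs)
                changed = True

    usable = [(lhs, rhs) for lhs, rhs in rules
              if all(symbol_ok(s, productive) for s in rhs)]

    # Step 2: keep only rules whose left-hand side is reachable from `start`.
    by_lhs = defaultdict(list)
    for lhs, rhs in usable:
        by_lhs[lhs].append(rhs)
    reachable, stack = set(), [start]
    while stack:
        symbol = stack.pop()
        if symbol in reachable or symbol not in by_lhs:
            continue
        reachable.add(symbol)
        for rhs in by_lhs[symbol]:
            stack.extend(s for s in rhs if s in nonterminals)

    return [(lhs, rhs) for lhs, rhs in usable if lhs in reachable]
```

Under these assumptions, a toy grammar with the lexical rules `NP -> dogs | cats` and `V -> chase | see`, applied to the text "dogs chase dogs", would keep only the `dogs` and `chase` entries together with the phrase-structure rules that use them. Extraction for HPSGs has to deal with typed feature structures rather than atomic symbols and is not covered by this sketch.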

On the practical side, we use the problem of open-domain question answering to illustrate the use and usefulness of the approach. We adopt the question-answering framework of the well-known TREC conference: the task is to find a short answer to an NL question as a substring of a document from a given document collection. We show that our novel approach can be successfully used with a classical information retrieval search engine.

We describe an implementation of the HPSG parser in Java. Motivated by the recent successes of probabilistic parsers, a stochastic component of the HPSG formalism is defined and implemented in the parser. The parser uses known techniques for efficient graph unification and parsing, such as hidden structure sharing.

The rest of our QA system is implemented in Perl. It includes the parts for managing grammar modules and for subgrammar extraction.

Finally, a new algorithm and an abstract machine model for graph unification are described. The advantages of this contribution include a more compact memory representation, efficient memory management within the algorithm, sub-node hidden structure sharing, and a flat structure without frequent function calls. A Java and a C implementation of the algorithm are given in appendices.
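For readers unfamiliar with graph unification, the following minimal sketch shows the classic textbook approach: destructive unification of feature-structure graphs using forwarding pointers. It is explicitly not the new algorithm or abstract machine described in the thesis: it is recursive rather than flat, and it implements no hidden structure sharing. The names `Node`, `deref`, `unify`, and `to_dict` are hypothetical.

```python
class Node:
    """A feature-structure node: atomic (has a value) or complex (has features)."""

    def __init__(self, value=None, features=None):
        self.value = value                    # atomic value, or None
        self.features = dict(features or {})  # feature name -> Node
        self.forward = None                   # set once merged into another node


def deref(node):
    """Follow forwarding pointers to the node's current representative."""
    while node.forward is not None:
        node = node.forward
    return node


def unify(a, b):
    """Destructively unify two nodes; return the merged node, or None on a clash.

    On failure the inputs may be left partially merged, so callers that need
    the originals should unify copies.
    """
    a, b = deref(a), deref(b)
    if a is b:
        return a
    # Two different atomic values clash.
    if a.value is not None and b.value is not None and a.value != b.value:
        return None
    # An atomic node cannot unify with a node that carries features.
    if (a.value is not None and b.features) or (b.value is not None and a.features):
        return None
    # Merge b into a; forwarding first keeps shared substructure consistent.
    b.forward = a
    if a.value is None:
        a.value = b.value
    for name, b_child in b.features.items():
        if name in a.features:
            if unify(a.features[name], b_child) is None:
                return None
        else:
            a.features[name] = b_child
    return a


def to_dict(node):
    """Render an acyclic feature structure as nested dicts for inspection."""
    node = deref(node)
    if node.value is not None:
        return node.value
    return {name: to_dict(child) for name, child in sorted(node.features.items())}
```

Because a single node can be the value of several features, information added on one path (for example a person feature under agreement) becomes visible on every path that shares that node, which is the reentrancy behavior HPSG relies on.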

Contributors
  • York University
  • Dalhousie University