DOI: 10.3115/1220175.1220238

QuestionBank: creating a corpus of parse-annotated questions

Published: 17 July 2006

Abstract

This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and a supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long-distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are: (i) using QuestionBank training data improves parser performance to 89.75% labelled bracketing f-score, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the size of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long-distance dependencies in questions from the ATIS corpus with high precision (96.82%) and low recall (39.38%). In summary, QuestionBank provides a useful new resource for parser-based QA research.
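The abstract reports results as labelled bracketing f-score, precision, and recall. As a rough illustration of how such scores are conventionally computed, here is a minimal sketch. It assumes each constituent is represented as a hypothetical `(label, start, end)` tuple and that brackets are compared as multisets; it is not the authors' evaluation code, and the toy brackets are invented for illustration only.

```python
from collections import Counter

def labelled_bracket_scores(gold, predicted):
    """Return (precision, recall, f_score) over labelled brackets.

    Each bracket is a (label, start, end) tuple. Brackets are compared as
    multisets, so a bracket that occurs twice must be matched twice.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: brackets with the same label and the same span.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Invented toy brackets for a five-token question (not taken from QuestionBank).
gold = [("SBARQ", 0, 5), ("WHNP", 0, 2), ("SQ", 2, 5), ("VP", 2, 5), ("PP", 3, 5)]
predicted = [("SBARQ", 0, 5), ("WHNP", 0, 2), ("SQ", 2, 5), ("NP", 3, 5)]

precision, recall, f_score = labelled_bracket_scores(gold, predicted)
print(f"P={precision:.2%} R={recall:.2%} F={f_score:.2%}")
```

Because the f-score is a harmonic mean, it is pulled toward the lower of the two values, which is why a method with high precision but low recall, as in finding (iv), would still score modestly on a combined measure.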

Published In

ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
July 2006
1214 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
