Learning to find answers to questions on the Web

Published: 01 May 2004

Abstract

We introduce a method for learning to find documents on the Web that contain answers to a given natural language question. In our approach, questions are transformed into new queries aimed at maximizing the probability of retrieving answers from existing information retrieval systems. The method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a training set of question/answer pairs, and automatically evaluating the candidate transformations on target information retrieval systems such as real-world general purpose search engines. At run-time, questions are transformed into a set of queries, and reranking is performed on the documents retrieved. We present a prototype search engine, Tritus, that applies the method to Web search engines. Blind evaluation on a set of real queries from a Web search engine log shows that the method significantly outperforms the underlying search engines, and outperforms a commercial search engine specializing in question answering. Our methodology cleanly supports combining documents retrieved from different search engines, resulting in additional improvement with a system that combines search results from multiple Web search engines.
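The run-time step described above (transform a question into several queries, retrieve documents, then rerank) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Tritus implementation: the transformation table, its weights, the names `question_phrase`, `transform`, `answer`, and `fake_search`, and the choice to score a document by summing the weights of the queries that retrieved it are all hypothetical. In the method itself, the phrases and weights would be learned from question/answer pairs and evaluated against each target search engine.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table: question phrase -> [(transformation, weight)].
# These entries are invented for illustration; they are not learned values.
TRANSFORMS: Dict[str, List[Tuple[str, float]]] = {
    "what is a": [("is a", 0.8), ("refers to", 0.6)],
    "who is": [("is the", 0.5)],
}


def question_phrase(question: str) -> str:
    """Return the longest known question phrase that prefixes the question."""
    q = question.lower().strip()
    matches = [p for p in TRANSFORMS if q.startswith(p + " ")]
    return max(matches, key=len) if matches else ""


def transform(question: str) -> List[Tuple[str, float]]:
    """Rewrite a question into a list of (query, weight) pairs."""
    text = question.strip().rstrip("?")
    phrase = question_phrase(question)
    if not phrase:
        # Unknown question type: fall back to the question text itself.
        return [(text, 1.0)]
    rest = text[len(phrase):].strip()
    return [(f'{rest} "{tr}"', w) for tr, w in TRANSFORMS[phrase]]


def answer(question: str,
           search: Callable[[str], List[str]],
           top_k: int = 10) -> List[str]:
    """Run every transformed query and rerank the merged results."""
    scores: Dict[str, float] = {}
    for query, weight in transform(question):
        for doc in search(query):
            # Assumed scoring: add up the weights of the queries that hit.
            scores[doc] = scores.get(doc, 0.0) + weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


def fake_search(query: str) -> List[str]:
    """Stand-in for a real search engine, backed by a tiny fixed index."""
    index = {
        'hard disk "is a"': ["doc1", "doc2"],
        'hard disk "refers to"': ["doc2", "doc3"],
    }
    return index.get(query, [])


if __name__ == "__main__":
    print(answer("What is a hard disk?", fake_search))
```

Because `answer` only needs a callable that maps a query string to a list of document identifiers, results from several engines could be merged by calling it with a wrapper that concatenates their result lists, which is one way to picture the multi-engine combination mentioned in the abstract.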


Published In

ACM Transactions on Internet Technology, Volume 4, Issue 2
May 2004
113 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/990301
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Web search
  2. information retrieval
  3. meta-search
  4. query expansion
  5. question answering

Qualifiers

  • Article
