Learning to find answers to questions on the Web

Authors:

Eugene Agichtein,

Steve Lawrence,

Luis Gravano

ACM Transactions on Internet Technology (TOIT), Volume 4, Issue 2

Pages 129 - 162

https://doi.org/10.1145/990301.990303

Published: 01 May 2004

Abstract

We introduce a method for learning to find documents on the Web that contain answers to a given natural language question. In our approach, questions are transformed into new queries aimed at maximizing the probability of retrieving answers from existing information retrieval systems. The method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a training set of question/answer pairs, and automatically evaluating the candidate transformations on target information retrieval systems such as real-world general purpose search engines. At run-time, questions are transformed into a set of queries, and reranking is performed on the documents retrieved. We present a prototype search engine, Tritus, that applies the method to Web search engines. Blind evaluation on a set of real queries from a Web search engine log shows that the method significantly outperforms the underlying search engines, and outperforms a commercial search engine specializing in question answering. Our methodology cleanly supports combining documents retrieved from different search engines, resulting in additional improvement with a system that combines search results from multiple Web search engines.
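
The run-time step described in the abstract (transform a question into a set of queries, then rerank the retrieved documents) can be illustrated with a minimal sketch. Everything named here is hypothetical: the `TRANSFORMS` table, its phrases and weights, the `rewrite` and `rerank` helpers, and the `search` callable standing in for a search engine. Summing transform weights per document is an assumed scoring rule chosen for illustration, not necessarily the one Tritus uses.

```python
from collections import defaultdict

# Hypothetical learned transforms: question phrase -> [(answer phrase, weight)].
# The phrases and weights are illustrative placeholders, not values from the paper.
TRANSFORMS = {
    "what is a": [("refers to", 2.0), ("is a", 1.0)],
    "how do i": [("steps to", 1.5)],
}


def rewrite(question):
    """Return (query, weight) pairs for the longest matching question phrase."""
    q = question.lower().rstrip("?").strip()
    for phrase in sorted(TRANSFORMS, key=len, reverse=True):
        if q.startswith(phrase + " "):
            rest = q[len(phrase):].strip()
            return [(f'{rest} "{answer}"', w) for answer, w in TRANSFORMS[phrase]]
    return [(q, 1.0)]  # no transform known: fall back to the question text


def rerank(question, search):
    """Run every rewritten query and rank documents by summed transform weight.

    `search` is any callable mapping a query string to a list of document ids
    (an assumption standing in for a real search engine).
    """
    scores = defaultdict(float)
    for query, weight in rewrite(question):
        for doc in search(query):
            scores[doc] += weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because `search` is passed in as a parameter, the same sketch could merge results from several engines by calling `rerank` with a function that queries each engine in turn, which loosely mirrors the multi-engine combination the abstract mentions.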

Cited By

Wu H, Cho H, Davies A, Jones G, Serra E, Spezzano F (2024). LLM-based Automated Web Retrieval and Text Classification of Food Sharing Initiatives. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4983-4990. DOI: 10.1145/3627673.3680090. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3680090
Yan R, Dang D, Gao H, Wu Y, Yu W (2023). A novel word-graph-based query rewriting method for question answering. Data Technologies and Applications 58:1, 1-23. DOI: 10.1108/DTA-05-2022-0187. Online publication date: 18-May-2023.
https://doi.org/10.1108/DTA-05-2022-0187
Azad H, Deepak A (2019). Query expansion techniques for information retrieval: A survey. Information Processing & Management 56:5, 1698-1735. DOI: 10.1016/j.ipm.2019.05.009. Online publication date: Sep-2019.
https://doi.org/10.1016/j.ipm.2019.05.009

Index Terms

Learning to find answers to questions on the Web
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems

Information & Contributors

Information

Published In

ACM Transactions on Internet Technology Volume 4, Issue 2

May 2004

113 pages

ISSN:1533-5399

EISSN:1557-6051

DOI:10.1145/990301

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2004

Published in TOIT Volume 4, Issue 2

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 56

Total Downloads: 2,877

Downloads (Last 12 months): 15

Downloads (Last 6 weeks): 7

Reflects downloads up to 17 Dec 2024

Citations

Cited By

Özsu M, Valduriez P (2019). Peer-to-Peer Data Management. Principles of Distributed Database Systems, 395-448. DOI: 10.1007/978-3-030-26253-2_9. Online publication date: 3-Dec-2019.
https://doi.org/10.1007/978-3-030-26253-2_9
Özsu M, Valduriez P (2019). Parallel Database Systems. Principles of Distributed Database Systems, 349-394. DOI: 10.1007/978-3-030-26253-2_8. Online publication date: 3-Dec-2019.
https://doi.org/10.1007/978-3-030-26253-2_8
Özsu M, Valduriez P (2019). Database Integration—Multidatabase Systems. Principles of Distributed Database Systems, 281-347. DOI: 10.1007/978-3-030-26253-2_7. Online publication date: 3-Dec-2019.
https://doi.org/10.1007/978-3-030-26253-2_7
Özsu M, Valduriez P (2019). Data Replication. Principles of Distributed Database Systems, 247-280. DOI: 10.1007/978-3-030-26253-2_6. Online publication date: 3-Dec-2019.
https://doi.org/10.1007/978-3-030-26253-2_6
Özsu M, Valduriez P (2019). Distributed Transaction Processing. Principles of Distributed Database Systems, 183-246. DOI: 10.1007/978-3-030-26253-2_5. Online publication date: 3-Dec-2019.
https://doi.org/10.1007/978-3-030-26253-2_5
Özsu M, Valduriez P (2019). Distributed Query Processing. Principles of Distributed Database Systems, 129-182. DOI: 10.1007/978-3-030-26253-2_4. Online publication date: 3-Dec-2019.
https://doi.org/10.1007/978-3-030-26253-2_4
Özsu M, Valduriez P (2019). Distributed Data Control. Principles of Distributed Database Systems, 91-127. DOI: 10.1007/978-3-030-26253-2_3. Online publication date: 3-Dec-2019.
https://doi.org/10.1007/978-3-030-26253-2_3