Article

Retrieving answers from frequently asked questions pages on the web

Authors:

Valentin Jijkoun,

Maarten de Rijke

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

Pages 76 - 83

https://doi.org/10.1145/1099554.1099571

Published: 31 October 2005

Abstract

We address the task of answering natural language questions by using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps: (1) fetching FAQ pages from the web; (2) automatic extraction of question/answer (Q/A) pairs from the collected pages; and (3) answering users' questions by retrieving appropriate Q/A pairs. We discuss our solutions for each of the three steps, and give detailed evaluation results on a collected corpus of about 3.6Gb of text data (293K pages, 2.8M Q/A pairs), with real users' questions sampled from a web search engine log. Specifically, we propose simple but effective methods for Q/A extraction and investigate task-specific retrieval models for answering questions. Our best model finds answers for 36% of the test questions in the top 20 results. Our overall conclusion is that FAQ pages on the web provide an excellent resource for addressing real users' information needs in a highly focused manner.
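To make step (3) and the "top 20" measure concrete, the following is a minimal sketch in Python. It is not the retrieval model evaluated in the paper: the IDF-weighted term-overlap scoring, the `question_weight` parameter, the toy Q/A pairs, and all function names are assumptions introduced only for illustration. It ranks Q/A pairs for a user question and then computes the fraction of test questions that have at least one relevant pair among the top k results.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(pairs):
    """Compute a smoothed inverse document frequency over all Q/A pairs."""
    n = len(pairs)
    df = Counter()
    for question, answer in pairs:
        df.update(set(tokenize(question + " " + answer)))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def score(query, pair, idf, question_weight=2.0):
    """Sum the IDF of query terms found in the pair.

    Matches in the question field count more than matches in the answer
    field; question_weight is an illustrative assumption, not a value
    taken from the paper.
    """
    question_terms = set(tokenize(pair[0]))
    answer_terms = set(tokenize(pair[1]))
    total = 0.0
    for term in set(tokenize(query)):
        if term in question_terms:
            total += question_weight * idf.get(term, 0.0)
        elif term in answer_terms:
            total += idf.get(term, 0.0)
    return total


def retrieve(query, pairs, idf, k=20):
    """Return the indices of up to k best-scoring pairs with a nonzero score."""
    scored = [(score(query, pair, idf), i) for i, pair in enumerate(pairs)]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [i for _, i in ranked[:k]]


def success_at_k(rankings, relevant, k=20):
    """Fraction of queries with at least one relevant pair in the top k."""
    hits = sum(1 for ranked, rel in zip(rankings, relevant) if rel & set(ranked[:k]))
    return hits / len(rankings)


# Toy data (hypothetical): three Q/A pairs and three user questions.
pairs = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What is the return policy?", "Items can be returned within 30 days."),
    ("How do I change my email address?", "Edit it under account settings."),
]
idf = build_idf(pairs)
queries = ["reset password", "return an item", "cancel subscription"]
relevant = [{0}, {1}, set()]  # indices of pairs judged relevant per query
rankings = [retrieve(q, pairs, idf, k=2) for q in queries]
print(rankings)
print(success_at_k(rankings, relevant, k=2))
```

Read this way, the abstract's headline figure corresponds to `success_at_k` with k = 20 reaching 0.36 on the test questions. Weighting the question field above the answer field is one simple way to make a retrieval model task-specific, but the weighting used here is only a placeholder.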

Index Terms

Retrieving answers from frequently asked questions pages on the web
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval

Published In

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

October 2005

854 pages

ISBN:1595931406

DOI:10.1145/1099554

General Chair:
Otthein Herzog, University of Bremen, Germany

Program Chairs:
Hans-Jörg Schek, University for Health Sciences, Medical Informatics and Technology, Austria
Norbert Fuhr, University of Duisburg-Essen, Germany
Abdur Chowdhury, America Online, USA
Wilfried Teiken, IBM T.J. Watson Research Center, USA

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM05

CIKM05: Conference on Information and Knowledge Management

October 31 - November 5, 2005

Bremen, Germany

Acceptance Rates

CIKM '05 Paper Acceptance Rate 77 of 425 submissions, 18%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
