DOI: 10.1145/1390334.1390419
research-article

Discovering key concepts in verbose queries

Published: 20 July 2008

Abstract

Current search engines do not, in general, perform well with longer, more verbose queries. One of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness. In this paper, we develop and evaluate a technique that uses query-dependent, corpus-dependent, and corpus-independent features for automatic extraction of key concepts from verbose queries. We show that our method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency. Finally, we propose a probabilistic model for integrating the weighted key concepts identified by our method into a query, and demonstrate that this integration significantly improves retrieval effectiveness for a large set of natural language description queries derived from TREC topics on several newswire and web collections.
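The abstract compares learned key-concept weights against inverse document frequency, then folds the weighted concepts back into the query. The Python sketch below illustrates only those two surrounding steps, under stated assumptions. It ranks candidate concepts by the IDF baseline, normalizes the scores into weights, and combines a document's original-query score with the weighted concept scores by linear interpolation in log space. It does not reproduce the paper's feature-based classifier or its exact probabilistic model. The function names, the interpolation parameter `lam`, the example concepts, and all frequencies and scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def idf(doc_freq: int, num_docs: int) -> float:
    """Classic inverse document frequency, log(N / df); assumes 1 <= df <= N."""
    if not 1 <= doc_freq <= num_docs:
        raise ValueError("document frequency must lie in [1, num_docs]")
    return math.log(num_docs / doc_freq)


def rank_concepts_by_idf(
    candidates: List[str], doc_freqs: Dict[str, int], num_docs: int
) -> List[Tuple[str, float]]:
    """IDF baseline: score each candidate concept and sort highest first."""
    scored = [(c, idf(doc_freqs[c], num_docs)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def normalize(scored: List[Tuple[str, float]]) -> Dict[str, float]:
    """Rescale non-negative concept scores so the weights sum to 1."""
    total = sum(score for _, score in scored)
    if total <= 0:
        raise ValueError("need at least one positive score")
    return {concept: score / total for concept, score in scored}


def interpolated_log_score(
    query_log_score: float,
    concept_log_scores: Dict[str, float],
    weights: Dict[str, float],
    lam: float = 0.8,
) -> float:
    """Hypothetical combination: lam * original-query log score plus
    (1 - lam) * weight-averaged concept log scores."""
    concept_part = sum(w * concept_log_scores[c] for c, w in weights.items())
    return lam * query_log_score + (1.0 - lam) * concept_part


if __name__ == "__main__":
    # Hypothetical document frequencies in a hypothetical 100,000-document corpus.
    doc_freqs = {"hubble telescope": 50, "positive accomplishments": 400, "launched": 5000}
    ranked = rank_concepts_by_idf(list(doc_freqs), doc_freqs, 100_000)
    for concept, score in ranked:
        print(f"{concept:26s} idf={score:.3f}")
    weights = normalize(ranked)
    # Hypothetical per-document log-likelihoods of each concept.
    doc_concept_scores = {
        "hubble telescope": -6.0,
        "positive accomplishments": -9.0,
        "launched": -7.0,
    }
    print(interpolated_log_score(-40.0, doc_concept_scores, weights))
```

The method described in the abstract would replace the IDF scores with weights predicted from query-dependent, corpus-dependent, and corpus-independent features. In this sketch, swapping in such weights changes only the input passed to `normalize`.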

Published In
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008, 934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334

Publisher
Association for Computing Machinery, New York, NY, United States

Author Tags
1. information retrieval
2. key concepts extraction
3. verbose queries

Conference
SIGIR '08

Acceptance Rates
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics
• Downloads (last 12 months): 24
• Downloads (last 6 weeks): 4
Reflects downloads up to 23 Nov 2024.

Cited By
• (2024) Space-Efficient Indexes for Uncertain Strings. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4828-4842. DOI: 10.1109/ICDE60146.2024.00367. Online publication date: 13-May-2024.
• (2024) Is your search query well-formed? A natural query understanding for patent prior art search. World Patent Information, 76, 102254. DOI: 10.1016/j.wpi.2023.102254. Online publication date: Mar-2024.
• (2023) Text Indexing for Long Patterns: Anchors are All you Need. Proceedings of the VLDB Endowment, 16(9), pp. 2117-2131. DOI: 10.14778/3598581.3598586. Online publication date: 1-May-2023.
• (2023) Boosting legal case retrieval by query content selection with large language models. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pp. 176-184. DOI: 10.1145/3624918.3625328. Online publication date: 26-Nov-2023.
• (2023) End-to-End Query Term Weighting. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 4778-4786. DOI: 10.1145/3580305.3599815. Online publication date: 6-Aug-2023.
• (2023) AgAsk: an agent to help answer farmer’s questions from scientific documents. International Journal on Digital Libraries, 25(4), pp. 569-584. DOI: 10.1007/s00799-023-00369-y. Online publication date: 19-Jun-2023.
• (2021) Pre-training for Ad-hoc Retrieval. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 1212-1221. DOI: 10.1145/3459637.3482286. Online publication date: 26-Oct-2021.
• (2021) Scientific document summarization in multi-objective clustering framework. Applied Intelligence. DOI: 10.1007/s10489-021-02376-5. Online publication date: 23-May-2021.
• (2021) Pseudo-relevance feedback based query expansion using boosting algorithm. Artificial Intelligence Review. DOI: 10.1007/s10462-021-09972-4. Online publication date: 20-Feb-2021.
• (2020) Distant Supervision for Keyphrase Extraction using Search Queries. 2020 IEEE Sixth International Conference on Big Data Computing Service and Applications (BigDataService), pp. 70-77. DOI: 10.1109/BigDataService49289.2020.00019. Online publication date: Aug-2020.
