DOI: 10.1145/1871437.1871474

Term necessity prediction

Published: 26 October 2010

Abstract

The probability that a term appears in relevant documents, P(t | R), is a fundamental quantity in several probabilistic retrieval models; however, it is difficult to estimate without relevance judgments or a relevance model. We call this value term necessity because it measures the percentage of relevant documents retrieved by the term, that is, how necessary a term's occurrence is to document relevance. Prior research typically either set this probability to a constant or estimated it from the term's inverse document frequency, neither of which was very effective.
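When relevance judgments are available, the definition above can be illustrated directly: term necessity is the fraction of judged-relevant documents that contain the term. The following is a minimal sketch under stated assumptions, not the paper's estimation method. The function name `term_necessity`, the set-of-words document representation, and the toy documents are all hypothetical.

```python
from typing import Iterable, Set


def term_necessity(term: str, relevant_docs: Iterable[Set[str]]) -> float:
    """Empirical P(t | R): the share of relevant documents containing `term`."""
    docs = list(relevant_docs)
    if not docs:
        raise ValueError("need at least one relevant document")
    # Count the relevant documents in which the term occurs at least once.
    return sum(term in doc for doc in docs) / len(docs)


# Toy relevant set (made up for illustration), each document as a set of terms.
relevant = [
    {"prognosis", "viral", "hepatitis"},
    {"hepatitis", "outlook", "treatment"},
    {"liver", "infection", "hepatitis", "prognosis"},
    {"hepatitis", "survival", "rates"},
]

for t in ("hepatitis", "prognosis"):
    print(t, term_necessity(t, relevant))
```

In this toy set "hepatitis" occurs in every relevant document, while "prognosis" is missing from half of them because those documents use other words such as "outlook" or "survival". That gap is the kind of mismatch that a low necessity value reflects.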
This paper identifies several factors that affect term necessity, for example, a term's topic centrality, synonymy and abstractness. It develops term- and query-dependent features for each factor that enable supervised learning of a predictive model of term necessity from training data. Experiments with two popular retrieval models and 6 standard datasets demonstrate that using predicted term necessity estimates as user term weights of the original query terms leads to significant improvements in retrieval accuracy.
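To see why an estimate of P(t | R) matters as a term weight, consider the Robertson-Spärck Jones weight from the Binary Independence Model, one classical place where this probability appears. The sketch below is an illustration only and is not claimed to be the weighting scheme used in the paper's experiments. It assumes that P(t | non-relevant) can be approximated by the collection document frequency; the function name `rsj_weight` and the numbers are hypothetical.

```python
import math


def rsj_weight(p_t_rel: float, doc_freq: int, num_docs: int) -> float:
    """RSJ weight log[p(1 - q) / (q(1 - p))].

    p is P(t | R), for example a predicted term necessity.
    q approximates P(t | non-relevant) by doc_freq / num_docs (an assumption).
    """
    q = doc_freq / num_docs
    if not (0.0 < p_t_rel < 1.0 and 0.0 < q < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.log(p_t_rel * (1.0 - q) / (q * (1.0 - p_t_rel)))


# A constant p = 0.5 leaves only an IDF-like factor, log((N - n) / n).
constant_p = rsj_weight(0.5, doc_freq=100, num_docs=1000)

# A higher necessity estimate raises the weight of the same term.
high_necessity = rsj_weight(0.9, doc_freq=100, num_docs=1000)

print(constant_p, high_necessity)
```

With p fixed at a constant, two terms with the same document frequency always receive the same weight; a per-term necessity estimate lets the weights differ.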

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN:9781450300995
DOI:10.1145/1871437

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ad-hoc retrieval models
  2. mismatch
  3. necessity
  4. term weighting

Qualifiers

  • Research-article

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 1
Reflects downloads up to 16 Nov 2024

Cited By

  • (2024) Listwise Generative Retrieval Models via a Sequential Learning Process. ACM Transactions on Information Systems 42:5 (1-31). DOI: 10.1145/3653712. Online publication date: 29-Apr-2024
  • (2024) Revisiting Document Expansion and Filtering for Effective First-Stage Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (186-196). DOI: 10.1145/3626772.3657850. Online publication date: 10-Jul-2024
  • (2023) First steps towards improving official statistics data accessibility in Mexico: Query expansion with neural networks and ad-hoc space vectors. Statistical Journal of the IAOS 39:3 (745-754). DOI: 10.3233/SJI-230014. Online publication date: 12-Sep-2023
  • (2023) An Efficient and Robust Semantic Hashing Framework for Similar Text Search. ACM Transactions on Information Systems 41:4 (1-31). DOI: 10.1145/3570725. Online publication date: 22-Mar-2023
  • (2022) Semantic Models for the First-Stage Retrieval: A Comprehensive Review. ACM Transactions on Information Systems 40:4 (1-42). DOI: 10.1145/3486250. Online publication date: 24-Mar-2022
  • (2022) On Natural Language User Profiles for Transparent and Scrutable Recommendation. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2863-2874). DOI: 10.1145/3477495.3531873. Online publication date: 6-Jul-2022
  • (2021) Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits. Advances in Information Retrieval (63-78). DOI: 10.1007/978-3-030-72113-8_5. Online publication date: 27-Mar-2021
  • (2020) Query Rewriting for Voice Shopping Null Queries. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (1369-1378). DOI: 10.1145/3397271.3401052. Online publication date: 25-Jul-2020
  • (2018) Refining Query Expansion Terms using Query Context. Proceedings of the 23rd Australasian Document Computing Symposium (1-4). DOI: 10.1145/3291992.3292000. Online publication date: 11-Dec-2018
  • (2018) Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (575-584). DOI: 10.1145/3209978.3209982. Online publication date: 27-Jun-2018
