DOI: 10.1007/978-3-540-31865-1_15
Article

Term frequency normalisation tuning for BM25 and DFR models

Published: 21 March 2005

Abstract

Tuning the term frequency normalisation parameter is a crucial issue in information retrieval (IR), as it has an important impact on retrieval performance. The classical pivoted normalisation approach suffers from a collection-dependence problem: it requires relevance assessments for each given collection to obtain the optimal parameter setting. In this paper, we tackle the collection-dependence problem by proposing a new tuning method based on measuring the normalisation effect. The proposed method refines and extends our methodology described in [7]. We evaluate the proposed tuning method on various TREC collections, for both normalisation 2 of the Divergence From Randomness (DFR) models and BM25's normalisation method. Results show that, for both normalisation methods, our tuning method significantly outperforms robust, empirically obtained baselines over diverse TREC collections, at a marginal computational cost.
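For readers unfamiliar with the two normalisation methods named above, the sketch below writes them out in Python, assuming their standard published forms: the BM25 term-frequency component with length-normalisation parameter b (cf. Robertson et al. [9]) and normalisation 2 of the DFR framework with parameter c (cf. Amati and van Rijsbergen [2]). The function names, the value k1 = 1.2 and the toy document lengths are illustrative assumptions. The code does not implement the paper's tuning method; it only shows how the free parameter changes the normalised term frequency of a short versus a long document.

```python
import math


def bm25_tf_component(tf: float, doc_len: float, avg_doc_len: float,
                      k1: float, b: float) -> float:
    """BM25 term-frequency component (k1 + 1) * tf / (K + tf).

    K = k1 * ((1 - b) + b * doc_len / avg_doc_len) is BM25's length
    normalisation: b = 0 disables it, b = 1 applies it fully.
    """
    K = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return (k1 + 1.0) * tf / (K + tf)


def dfr_normalisation2(tf: float, doc_len: float, avg_doc_len: float,
                       c: float) -> float:
    """DFR normalisation 2: tfn = tf * log2(1 + c * avg_doc_len / doc_len).

    A larger c makes tfn depend less on document length.
    """
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


if __name__ == "__main__":
    avg_len = 100.0                   # hypothetical average document length
    tf = 3                            # same raw term frequency in both documents
    short_doc, long_doc = 50.0, 400.0  # hypothetical document lengths

    # Sweep the BM25 parameter b (k1 fixed at an illustrative 1.2).
    for b in (0.0, 0.5, 1.0):
        short_score = bm25_tf_component(tf, short_doc, avg_len, k1=1.2, b=b)
        long_score = bm25_tf_component(tf, long_doc, avg_len, k1=1.2, b=b)
        print(f"BM25  b={b}: short={short_score:.3f} long={long_score:.3f}")

    # Sweep the normalisation 2 parameter c.
    for c in (0.5, 1.0, 7.0):
        short_score = dfr_normalisation2(tf, short_doc, avg_len, c=c)
        long_score = dfr_normalisation2(tf, long_doc, avg_len, c=c)
        print(f"Norm2 c={c}: short={short_score:.3f} long={long_score:.3f}")
```

Running the script prints both values for each parameter setting. With b = 0, BM25 gives the short and long documents the same score, and as c grows, the ratio between the short- and long-document values under normalisation 2 shrinks. Choosing b or c well for a given collection is exactly the tuning problem the paper addresses.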

References

[1] G. Amati. Probabilistic Models for Information Retrieval based on Divergence from Randomness. PhD thesis, Department of Computing Science, University of Glasgow, 2003.
[2] G. Amati and C. J. van Rijsbergen. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems (TOIS), 20(4):357-389, 2002.
[3] J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems (TOIS), 19(2):97-130, April 2001.
[4] A. Chowdhury, M. C. McCabe, D. Grossman, and O. Frieder. Document normalization revisited. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 381-382, Tampere, Finland, 2002.
[5] D. Hawking. Overview of the TREC-9 Web Track. In Proceedings of the Ninth Text REtrieval Conference (TREC-9), pages 87-94, Gaithersburg, MD, 2000.
[6] D. Hawking, E. Voorhees, N. Craswell, and P. Bailey. Overview of the TREC-8 Web Track. In Proceedings of the Eighth Text REtrieval Conference (TREC-8), pages 131-150, Gaithersburg, MD, 1999.
[7] B. He and I. Ounis. A study of parameter tuning for term frequency normalization. In Proceedings of the Twelfth ACM CIKM International Conference on Information and Knowledge Management, pages 10-16, New Orleans, LA, 2003.
[8] C. J. van Rijsbergen. Information Retrieval, 2nd edition. Department of Computer Science, University of Glasgow, 1979.
[9] S. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at TREC-4. In NIST Special Publication 500-236: The Fourth Text REtrieval Conference (TREC-4), pages 73-96, Gaithersburg, MD, 1995.
[10] C. Silverstein, M. R. Henzinger, H. Marais, and M. Moricz. Analysis of a very large web search engine query log. SIGIR Forum, 33(1):6-12, 1999.
[11] A. Singhal, C. Buckley, and M. Mitra. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 21-29, 1996.
[12] K. Sparck Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, 36:779-840, 2000.

Published In

ECIR'05: Proceedings of the 27th European conference on Advances in Information Retrieval Research
March 2005
572 pages
ISBN:3540252959
Editors: David E. Losada, Juan M. Fernández-Luna

Sponsors

  • CEPIS: Council of European Professional Informatics Societies
  • Sharp Laboratories of Europe, Ltd.
  • University of Granada
  • Microsoft Research
  • BCS-IRSG: BCS/Information Retrieval Specialist Group

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

  • (2019) A selective approach to index term weighting for robust information retrieval based on the frequency distributions of query terms. Information Retrieval, 22(6):543-569. DOI: 10.1007/s10791-018-9347-9. Online publication date: 1-Dec-2019.
  • (2017) A novel Fuzzy-PSO term weighting automatic query expansion approach using combined semantic filtering. Knowledge-Based Systems, 136(C):97-120. DOI: 10.1016/j.knosys.2017.09.004. Online publication date: 15-Nov-2017.
  • (2017) An Investigation into the Use of Document Scores for Optimisation over Rank-Biased Precision. Information Retrieval Technology, pages 197-209. DOI: 10.1007/978-3-319-70145-5_15. Online publication date: 22-Nov-2017.
  • (2016) OLFinder. Journal of Information Science, 42(5):659-674. DOI: 10.1177/0165551515605217. Online publication date: 1-Oct-2016.
  • (2015) Verboseness Fission for BM25 Document Length Normalization. Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pages 385-388. DOI: 10.1145/2808194.2809486. Online publication date: 27-Sep-2015.
  • (2012) A constraint to automatically regulate document-length normalisation. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 2443-2446. DOI: 10.1145/2396761.2398662. Online publication date: 29-Oct-2012.
  • (2010) Reverted indexing for feedback and expansion. Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pages 1049-1058. DOI: 10.1145/1871437.1871571. Online publication date: 26-Oct-2010.
  • (2007) Setting per-field normalisation hyper-parameters for the named-page finding search task. Proceedings of the 29th European Conference on IR Research, pages 468-480. DOI: 10.5555/1763653.1763709. Online publication date: 2-Apr-2007.
  • (2007) Parameter sensitivity in the probabilistic model for ad-hoc retrieval. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pages 263-272. DOI: 10.1145/1321440.1321479. Online publication date: 6-Nov-2007.
  • (2007) An axiomatic comparison of learned term-weighting schemes in information retrieval. Artificial Intelligence Review, 28(1):51-68. DOI: 10.1007/s10462-008-9074-5. Online publication date: 1-Jun-2007.
