DOI: 10.1145/2808194.2809486
short-paper

Verboseness Fission for BM25 Document Length Normalization

Published: 27 September 2015

Abstract

BM25 is probably the best-known term weighting model in Information Retrieval. Depending on the formula variant, it has two or three parameters (k1, b, and k3). This paper addresses b, the document length normalization parameter. Previous work discussed two cases for length normalization: multi-topicality and verboseness. We observe that there are actually three: multi-topicality, verboseness with word repetition (repetitiveness), and verboseness with synonyms. Based on this observation, we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we obtain results statistically indistinguishable from the optimal results, which removes the need for ground-truth-based optimization.
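
For orientation, the role of the three parameters named in the abstract can be illustrated with the standard textbook BM25 components. This is only a minimal sketch of the conventional formula, not the length normalization method proposed in the paper. The function names and the default values (k1 = 1.2, b = 0.75, k3 = 8) are assumptions chosen for illustration.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Document-side BM25 term-frequency component (textbook form).

    b controls length normalization: b = 0 ignores document length,
    b = 1 normalizes fully by doc_len / avg_doc_len.
    Default values are common choices, assumed here for illustration.
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


def bm25_query_tf(qtf, k3=8.0):
    """Query-side term-frequency component, present only in the
    three-parameter variant of the formula (assumed default k3)."""
    return qtf * (k3 + 1.0) / (qtf + k3)


if __name__ == "__main__":
    # Same term frequency, document twice the average length.
    for b in (0.0, 0.5, 1.0):
        print(b, bm25_tf(tf=3, doc_len=200, avg_doc_len=100, b=b))
```

With b = 0 the component is unaffected by document length, while larger values of b penalize the longer document more strongly. This is the sensitivity that normally makes b a tuning target.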




    Published In

    ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
    September 2015
    402 pages
    ISBN: 9781450338332
    DOI: 10.1145/2808194
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Short-paper

    Conference

    ICTIR '15

    Acceptance Rates

    ICTIR '15 Paper Acceptance Rate 29 of 57 submissions, 51%;
    Overall Acceptance Rate 235 of 527 submissions, 45%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 19 Nov 2024

    Citations

    Cited By

    • (2024) Retrieval for Extremely Long Queries and Documents with RPRS: A Highly Efficient and Effective Transformer-based Re-Ranker. ACM Transactions on Information Systems, 42:5 (1-32). DOI: 10.1145/3631938. Online publication date: 29-Apr-2024
    • (2022) Employees Turnover Rate with Pivoted Length Normalization. 2022 27th International Computer Conference, Computer Society of Iran (CSICC) (1-4). DOI: 10.1109/CSICC55295.2022.9780489. Online publication date: 23-Feb-2022
    • (2021) Simple but Effective Knowledge-Based Query Reformulations for Precision Medicine Retrieval. Information, 12:10 (402). DOI: 10.3390/info12100402. Online publication date: 29-Sep-2021
    • (2020) Weighting Passages Enhances Accuracy. ACM Transactions on Information Systems, 39:2 (1-11). DOI: 10.1145/3428687. Online publication date: 17-Dec-2020
    • (2020) On the Replicability of Combining Word Embeddings and Retrieval Models. Advances in Information Retrieval (50-57). DOI: 10.1007/978-3-030-45442-5_7. Online publication date: 8-Apr-2020
    • (2019) On Biases in Information Retrieval Models and Evaluation. ACM SIGIR Forum, 52:2 (172-173). DOI: 10.1145/3308774.3308804. Online publication date: 17-Jan-2019
    • (2019) A topic‐based term frequency normalization framework to enhance probabilistic information retrieval. Computational Intelligence, 36:2 (486-521). DOI: 10.1111/coin.12248. Online publication date: 20-Nov-2019
    • (2018) A systematic approach to normalization in probabilistic models. Information Retrieval Journal, 21:6 (565-596). DOI: 10.1007/s10791-018-9334-1. Online publication date: 30-Jun-2018
    • (2018) Weighting of Noun Phrases Based on Local Frequency of Nouns. Recent Advances on Soft Computing and Data Mining (436-445). DOI: 10.1007/978-3-319-72550-5_42. Online publication date: 12-Jan-2018
    • (2017) Back to the Sketch-Board: Integrating Keyword Search, Semantics, and Information Retrieval. Semantic Keyword-Based Search on Structured Data Sources (49-61). DOI: 10.1007/978-3-319-53640-8_5. Online publication date: 15-Feb-2017
