A study of the Dirichlet priors for term frequency normalisation

Published: 15 August 2005

Abstract

In Information Retrieval (IR), Dirichlet Priors have been applied to smoothing in the language modeling approach. In this paper, we apply Dirichlet Priors to the term frequency normalisation of the classical BM25 probabilistic model and the Divergence from Randomness PL2 model. The contributions of this paper are twofold. First, through extensive experiments on four TREC collections, we show that the newly generated models, to which the Dirichlet Priors normalisation is applied, provide robust and effective performance. Second, we propose a novel, theoretically driven approach to the automatic parameter tuning of the Dirichlet Priors normalisation. Experiments show that this tuning approach optimises the retrieval performance of the newly generated Dirichlet Priors-based weighting models.
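
For readers unfamiliar with the smoothing technique the abstract refers to, the following is a minimal sketch of the standard Dirichlet-prior smoothed estimate used in language modeling, p(t|d) = (tf + mu * p(t|C)) / (|d| + mu). It is not the normalisation formula of this paper, which the abstract does not state; the function and parameter names are hypothetical and chosen only for illustration.

```python
def dirichlet_smoothed_prob(tf, doc_len, p_collection, mu):
    """Dirichlet-prior smoothed estimate of p(term | document).

    tf           -- occurrences of the term in the document
    doc_len      -- document length in tokens
    p_collection -- probability of the term in the whole collection
    mu           -- Dirichlet prior parameter (hypothetical name), mu >= 0
    """
    if tf < 0 or doc_len < 0 or mu < 0:
        raise ValueError("tf, doc_len and mu must be non-negative")
    if doc_len + mu == 0:
        raise ValueError("doc_len + mu must be positive")
    # The collection model acts as mu pseudo-observations added to the document.
    return (tf + mu * p_collection) / (doc_len + mu)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(dirichlet_smoothed_prob(tf=3, doc_len=100, p_collection=0.001, mu=1000))
```

With mu = 0 the estimate reduces to the maximum-likelihood value tf / |d|, and as mu grows it moves towards the collection probability, so longer documents are smoothed less than shorter ones for a fixed mu.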

    Published In

    SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2005
    708 pages
    ISBN: 1595930345
    DOI: 10.1145/1076034

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Dirichlet priors
    2. term frequency normalisation
    3. weighting model

    Qualifiers

    • Article

    Conference

    SIGIR05

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
