Abstract
We introduce the hypergeometric models KL, DLH and DLLH using the divergence from randomness (DFR) approach, and we compare these models to other relevant models of IR. The hypergeometric models are based on the probability of observing two probabilities: the relative within-document term frequency and the term frequency in the entire collection. Hypergeometric models are parameter-free models of IR. Experiments show that these models perform very well on both small and very large collections. We derive their foundations from the same IR probability space used by language modelling (LM). Finally, we discuss the difference between DFR and LM. Briefly, DFR is a frequentist (Type I), or combinatorial, approach, whilst language models use a Bayesian (Type II) approach to mix the two probabilities and are thus inherently parametric.
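As a rough illustration of how the two frequencies named in the abstract can be combined without any tunable parameter, the sketch below computes a per-term Kullback-Leibler contribution between the relative within-document frequency and the relative collection frequency. This is only a sketch under stated assumptions: the abstract does not give the exact KL, DLH or DLLH formulas, so the weighting used here is the generic KL term and not necessarily the one defined in the paper, and the names `kl_term_weight`, `kl_score`, `coll_tf` and `coll_len` are hypothetical.

```python
import math


def kl_term_weight(tf, doc_len, coll_tf, coll_len):
    """Per-term KL contribution (in bits) of a document against the collection.

    tf       : occurrences of the term in the document
    doc_len  : number of tokens in the document
    coll_tf  : occurrences of the term in the whole collection
    coll_len : number of tokens in the whole collection
    Assumption: this is the generic KL term, not the paper's exact model.
    """
    if tf == 0:
        return 0.0  # a term absent from the document contributes nothing
    p_doc = tf / doc_len        # relative within-document term frequency
    p_coll = coll_tf / coll_len  # relative collection term frequency
    return p_doc * math.log2(p_doc / p_coll)


def kl_score(query_terms, doc_tf, doc_len, coll_tf, coll_len):
    """Sum the per-term weights over the query terms (no free parameters)."""
    return sum(
        kl_term_weight(doc_tf.get(t, 0), doc_len, coll_tf[t], coll_len)
        for t in query_terms
        if t in coll_tf  # skip terms unseen in the collection
    )
```

A term that is as frequent in the document as in the collection receives a weight of zero, and the weight grows as the document's relative frequency exceeds the collection's; no smoothing parameter is involved, which is the property the abstract attributes to the hypergeometric models.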
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Amati, G. (2006). Frequentist and Bayesian Approach to Information Retrieval. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_3
DOI: https://doi.org/10.1007/11735106_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33347-0
Online ISBN: 978-3-540-33348-7
eBook Packages: Computer Science, Computer Science (R0)