
DOI: 10.1007/978-3-030-32686-9_5
Article

BM25 Beyond Query-Document Similarity

Published: 07 October 2019

Abstract

The massive growth of information produced and shared online has made retrieving relevant documents a difficult task. Query Expansion (QE) based on term co-occurrence statistics has been widely applied in an attempt to improve retrieval effectiveness. However, selecting good expansion terms from co-occurrence graphs is challenging. In this paper, we present an adapted version of the BM25 model that measures the similarity between terms. First, a context-window-based approach is applied over the entire corpus to construct the term co-occurrence graph. Then, using the proposed adapted BM25, candidate expansion terms are selected according to their similarity to the whole query. This measure is distinguished by its ability to evaluate the discriminative power of terms and to select terms that are semantically related to the query. Experiments on two ad-hoc TREC collections (the standard Robust04 collection and the new TREC Washington Post collection) show that our proposal outperforms the baselines across three state-of-the-art IR models and leads to significant improvements in retrieval effectiveness.
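The abstract describes the pipeline only at a high level; the exact adapted formula is given in the full paper and is not reproduced here. The following is a minimal, hypothetical sketch of the general idea under stated assumptions: two terms co-occur when their positions differ by less than `window` tokens; each candidate term's neighbour list in the graph plays the role of a BM25 "document"; the co-occurrence count with a query term plays the role of term frequency; the candidate's total co-occurrence mass plays the role of document length; and the number of terms a query term co-occurs with plays the role of document frequency, so that query terms that co-occur with almost everything contribute little (a proxy for discriminative power). The function names, the window size, the Lucene-style non-negative IDF, and the defaults `k1=1.2`, `b=0.75` are all assumptions chosen for illustration, not values taken from the paper.

```python
import math
from collections import defaultdict


def build_cooccurrence_graph(docs, window=5):
    """Context-window co-occurrence graph (assumed definition): two distinct
    terms co-occur when their positions differ by less than `window`."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for i, term in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                other = tokens[j]
                if other != term:
                    # Undirected graph: store the count in both directions.
                    graph[term][other] += 1
                    graph[other][term] += 1
    return {t: dict(nbrs) for t, nbrs in graph.items()}


def term_bm25_scores(graph, query, k1=1.2, b=0.75):
    """Hypothetical BM25 analogue scoring every neighbour of the query terms
    against the whole query (summed over query terms, as in BM25)."""
    n_nodes = len(graph)
    lengths = {t: sum(nbrs.values()) for t, nbrs in graph.items()}
    avg_len = sum(lengths.values()) / n_nodes if n_nodes else 0.0
    query_set = set(query)
    scores = {}
    for q in query_set:
        if q not in graph:
            continue
        # "Document frequency": how many terms' neighbour lists contain q.
        df = len(graph[q])
        # Lucene-style IDF variant (assumption); never negative.
        idf = math.log((n_nodes - df + 0.5) / (df + 0.5) + 1.0)
        for cand, f in graph[q].items():
            if cand in query_set:
                continue  # do not propose query terms as their own expansion
            norm = k1 * (1 - b + b * lengths[cand] / avg_len)
            scores[cand] = scores.get(cand, 0.0) + idf * f * (k1 + 1) / (f + norm)
    return scores


def expansion_terms(graph, query, top_k=3, **bm25_params):
    """Return the top_k candidates by score; ties broken alphabetically."""
    scores = term_bm25_scores(graph, query, **bm25_params)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    # Toy corpus, purely illustrative.
    docs = [
        ["oil", "price", "rise", "crude"],
        ["crude", "oil", "export", "price"],
        ["football", "match", "score"],
    ]
    g = build_cooccurrence_graph(docs, window=3)
    print(expansion_terms(g, ["oil"], top_k=3))  # ['price', 'export', 'rise']
```

In this toy run, "price" ranks first because it co-occurs with "oil" twice, while "export" and "rise" tie on a single co-occurrence and a short neighbour list; the unrelated football terms never appear as candidates because they share no window with the query. The `b` parameter plays the same role as in document BM25, discounting candidates that co-occur heavily with many terms.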

Published In

String Processing and Information Retrieval: 26th International Symposium, SPIRE 2019, Segovia, Spain, October 7–9, 2019, Proceedings
October 2019, 536 pages
ISBN: 978-3-030-32685-2
DOI: 10.1007/978-3-030-32686-9

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Query expansion
2. Co-occurrence graph
3. BM25
4. Term discriminative power
5. Ad-hoc IR