DOI: 10.1145/3511808.3557207
short-paper
Open access

ranx.fuse: A Python Library for Metasearch

Published: 17 October 2022

Abstract

This paper presents ranx.fuse, a Python library for Metasearch. Built following a user-centered design, it provides easy-to-use tools for combining the results of multiple search engines. ranx.fuse comprises 25 Metasearch algorithms implemented with Numba, a just-in-time compiler for Python code, for efficient vector operations and automatic parallelization. Moreover, in conjunction with the Metasearch algorithms, our library implements six normalization strategies that transform the search engines' result lists to make them comparable, a mandatory step for Metasearch. Finally, as many Metasearch algorithms require a training or optimization step, ranx.fuse offers a convenient function for optimizing them that evaluates pre-defined hyper-parameter configurations via grid search. By relying on the provided functions, the user can optimally combine the results of multiple search engines in very few lines of code. ranx.fuse can also serve as a user-friendly tool for fusing the rankings computed by a first-stage retriever and a re-ranker, as a library providing several baselines for Metasearch, and as a playground to test novel normalization strategies.
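
The pipeline the abstract describes (normalize each result list, fuse the lists, then tune the fusion via grid search) can be illustrated with a minimal pure-Python sketch. This is not the ranx.fuse API: the function names, the choice of min-max normalization, the weighted score sum, the mean reciprocal rank metric, and the toy data are all assumptions made for illustration only.

```python
from itertools import product

# Hypothetical toy data: one query, two retrievers with incomparable score scales.
bm25 = {"q1": {"d1": 12.0, "d2": 9.0, "d3": 3.0}}
dense = {"q1": {"d2": 0.9, "d3": 0.8, "d1": 0.1}}
qrels = {"q1": {"d2"}}  # set of relevant documents per query


def min_max_norm(run):
    """Rescale each query's scores to [0, 1] so that runs become comparable."""
    normed = {}
    for query, docs in run.items():
        lo, hi = min(docs.values()), max(docs.values())
        span = hi - lo
        normed[query] = {
            doc: (score - lo) / span if span else 1.0
            for doc, score in docs.items()
        }
    return normed


def weighted_sum(runs, weights):
    """Fuse runs by summing each document's weighted scores across runs."""
    fused = {}
    for run, weight in zip(runs, weights):
        for query, docs in run.items():
            fused_query = fused.setdefault(query, {})
            for doc, score in docs.items():
                fused_query[doc] = fused_query.get(doc, 0.0) + weight * score
    return fused


def mean_reciprocal_rank(qrels, run):
    """Average of 1 / rank of the first relevant document for each query."""
    total = 0.0
    for query, relevant in qrels.items():
        scores = run.get(query, {})
        ranking = sorted(scores, key=scores.get, reverse=True)
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(qrels)


def grid_search(qrels, runs, step=0.25):
    """Try every weight combination on the grid that sums to 1; keep the best."""
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    best_weights, best_score = None, -1.0
    for weights in product(grid, repeat=len(runs)):
        if abs(sum(weights) - 1.0) > 1e-9:
            continue
        score = mean_reciprocal_rank(qrels, weighted_sum(runs, weights))
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


normed_runs = [min_max_norm(bm25), min_max_norm(dense)]
fused = weighted_sum(normed_runs, weights=(1.0, 1.0))  # unweighted score sum
best_weights, best_score = grid_search(qrels, normed_runs)
```

In this toy example neither retriever's raw scores can be summed meaningfully until they are normalized, which is why the abstract calls normalization a mandatory step. The grid search stands in for the optimization step: it evaluates a fixed set of weight configurations against relevance judgments and keeps the best one.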

Supplementary Material

MP4 File (CIKM-demo156.mp4)
Presentation Video




Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022
5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. fusion
  2. information retrieval
  3. metasearch
  4. tool

Qualifiers

  • Short-paper

Conference

CIKM '22

Acceptance Rates

CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
Overall Acceptance Rate 823 of 3,288 submissions, 25%



Bibliometrics

Article Metrics

  • Downloads (last 12 months): 391
  • Downloads (last 6 weeks): 22
Reflects downloads up to 16 Nov 2024.

Cited By

  • (2024) Evaluation of Temporal Change in IR Test Collections. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 3-13. DOI: 10.1145/3664190.3672530. Online publication date: 2-Aug-2024
  • (2024) Wise Fusion: Group Fairness Enhanced Rank Fusion. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 163-174. DOI: 10.1145/3627673.3679649. Online publication date: 21-Oct-2024
  • (2024) Enhancing Dataset Search with Compact Data Snippets. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1093-1103. DOI: 10.1145/3626772.3657837. Online publication date: 10-Jul-2024
  • (2024) Injecting the score of the first-stage retriever as text improves BERT-based re-rankers. Discover Computing 27:1. DOI: 10.1007/s10791-024-09435-8. Online publication date: 26-Jun-2024
  • (2023) A Comparative Study of Rank Aggregation Methods in Recommendation Systems. Entropy 25:1 (132). DOI: 10.3390/e25010132. Online publication date: 9-Jan-2023
  • (2023) Personalized Query Expansion with Contextual Word Embeddings. ACM Transactions on Information Systems 42:2 (1-35). DOI: 10.1145/3624988. Online publication date: 11-Dec-2023
  • (2023) SE-PEF: a Resource for Personalized Expert Finding. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 288-309. DOI: 10.1145/3624918.3625335. Online publication date: 26-Nov-2023
  • (2023) ranxhub: An Online Repository for Information Retrieval Runs. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3210-3214. DOI: 10.1145/3539618.3591823. Online publication date: 19-Jul-2023
  • (2023) Bibliometric Data Fusion for Biomedical Information Retrieval. 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 107-118. DOI: 10.1109/JCDL57899.2023.00026. Online publication date: Jun-2023
  • (2022) A Multi-Domain Benchmark for Personalized Search Evaluation. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 3822-3827. DOI: 10.1145/3511808.3557536. Online publication date: 17-Oct-2022
