short-paper

Open access

ranx.fuse: A Python Library for Metasearch

Authors:

Luca Romelli

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Pages 4808 - 4812

https://doi.org/10.1145/3511808.3557207

Published: 17 October 2022

Abstract

This paper presents ranx.fuse, a Python library for Metasearch. Built following a user-centered design, it provides easy-to-use tools for combining the results of multiple search engines. ranx.fuse comprises 25 Metasearch algorithms implemented with Numba, a just-in-time compiler for Python code, for efficient vector operations and automatic parallelization. Alongside the Metasearch algorithms, our library implements six normalization strategies that transform the search engines' result lists to make them comparable, a mandatory step for Metasearch. Finally, because many Metasearch algorithms require a training or optimization step, ranx.fuse offers a convenient function for optimizing them that evaluates pre-defined hyper-parameter configurations via grid search. By relying on the provided functions, the user can optimally combine the results of multiple search engines in very few lines of code. ranx.fuse can also serve as a user-friendly tool for fusing the rankings computed by a first-stage retriever and a re-ranker, as a library providing several baselines for Metasearch, and as a playground to test novel normalization strategies.
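The three steps named in the abstract (normalization, fusion, and grid-search optimization) can be illustrated with a minimal plain-Python sketch. It does not use or reproduce the ranx.fuse API: every function name, the toy result lists, the relevance judgments, and the precision-at-2 metric are assumptions made for illustration. Min-max normalization, weighted score summation, and reciprocal rank fusion are shown only as widely used examples of their kind.

```python
from itertools import product

# Toy result lists (made-up data): one {doc_id: score} dict per search engine.
run_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
run_b = {"d2": 0.9, "d3": 0.8, "d4": 0.1}
relevant = {"d2", "d3"}  # hypothetical relevance judgments for one query


def min_max_norm(run):
    """Rescale scores to [0, 1] so lists from different engines are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:  # degenerate list: all scores are equal
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def weighted_sum(runs, weights):
    """Score-based fusion: weighted sum of scores; absent documents add 0."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return fused


def reciprocal_rank_fusion(runs, k=60):
    """Rank-based fusion: sum 1 / (k + rank) over the lists containing a doc."""
    fused = {}
    for run in runs:
        ordered = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ordered, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


def ranking(fused):
    """Document ids sorted by decreasing fused score."""
    return sorted(fused, key=fused.get, reverse=True)


def precision_at_k(fused, relevant_docs, k=2):
    """Fraction of the top-k fused documents that are relevant."""
    return sum(doc in relevant_docs for doc in ranking(fused)[:k]) / k


def grid_search_weights(runs, relevant_docs, step=0.25):
    """Try every weight combination on a grid that sums to 1; keep the best."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    best_weights, best_score = None, -1.0
    for weights in product(grid, repeat=len(runs)):
        if abs(sum(weights) - 1.0) > 1e-9:
            continue
        score = precision_at_k(weighted_sum(runs, weights), relevant_docs)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


norm_a, norm_b = min_max_norm(run_a), min_max_norm(run_b)
fused_sum = weighted_sum([norm_a, norm_b], [1.0, 1.0])
fused_rrf = reciprocal_rank_fusion([run_a, run_b])
best_weights, best_score = grid_search_weights([norm_a, norm_b], relevant)

print(ranking(fused_sum))
print(ranking(fused_rrf))
print(best_weights, best_score)
```

Normalization matters only for the score-based method: reciprocal rank fusion reads ranks, so it can take the raw lists directly. A real optimization step would average the metric over many queries rather than a single one.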

Supplementary Material

MP4 File (CIKM-demo156.mp4)

Presentation Video




Index Terms

ranx.fuse: A Python Library for Metasearch
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Combination, fusion and federated search

Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

October 2022

5274 pages

ISBN:9781450392365

DOI:10.1145/3511808

General Chairs:
Mohammad Al Hasan, Indiana University Purdue University, Indianapolis, USA
Li Xiong, Emory University, Atlanta, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

CIKM '22: The 31st ACM International Conference on Information and Knowledge Management

October 17 - 21, 2022

Atlanta, GA, USA

Acceptance Rates

CIKM '22 Paper Acceptance Rate: 621 of 2,257 submissions, 28%

Overall Acceptance Rate: 823 of 3,288 submissions, 25%

Article Metrics

Total Citations: 10
Total Downloads: 876
Downloads (Last 12 months): 391
Downloads (Last 6 weeks): 22

Reflects downloads up to 16 Nov 2024

Cited By

Keller J, Breuer T, Schaer P, Oosterhuis H, Bast H, Xiong C (2024). Evaluation of Temporal Change in IR Test Collections. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 3-13. DOI: 10.1145/3664190.3672530. Online publication date: 2-Aug-2024
https://dl.acm.org/doi/10.1145/3664190.3672530
Cachel K, Rundensteiner E, Serra E, Spezzano F (2024). Wise Fusion: Group Fairness Enhanced Rank Fusion. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 163-174. DOI: 10.1145/3627673.3679649. Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679649
Chen Q, Chen J, Zhou X, Cheng G, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Enhancing Dataset Search with Compact Data Snippets. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1093-1103. DOI: 10.1145/3626772.3657837. Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657837
Askari A, Abolghasemi A, Pasi G, Kraaij W, Verberne S (2024). Injecting the score of the first-stage retriever as text improves BERT-based re-rankers. Discover Computing 27:1. DOI: 10.1007/s10791-024-09435-8. Online publication date: 26-Jun-2024
https://doi.org/10.1007/s10791-024-09435-8
Bałchanowski M, Boryczka U (2023). A Comparative Study of Rank Aggregation Methods in Recommendation Systems. Entropy 25:1 (132). DOI: 10.3390/e25010132. Online publication date: 9-Jan-2023
https://doi.org/10.3390/e25010132
Bassani E, Tonellotto N, Pasi G (2023). Personalized Query Expansion with Contextual Word Embeddings. ACM Transactions on Information Systems 42:2 (1-35). DOI: 10.1145/3624988. Online publication date: 11-Dec-2023
https://dl.acm.org/doi/10.1145/3624988
Kasela P, Pasi G, Perego R (2023). SE-PEF: a Resource for Personalized Expert Finding. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 288-309. DOI: 10.1145/3624918.3625335. Online publication date: 26-Nov-2023
https://dl.acm.org/doi/10.1145/3624918.3625335
Bassani E, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). ranxhub: An Online Repository for Information Retrieval Runs. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3210-3214. DOI: 10.1145/3539618.3591823. Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591823
Breuer T, Kreutz C, Schaer P, Tunger D (2023). Bibliometric Data Fusion for Biomedical Information Retrieval. 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 107-118. DOI: 10.1109/JCDL57899.2023.00026. Online publication date: Jun-2023
https://doi.org/10.1109/JCDL57899.2023.00026
Bassani E, Kasela P, Raganato A, Pasi G, Al Hasan M, Xiong L (2022). A Multi-Domain Benchmark for Personalized Search Evaluation. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 3822-3827. DOI: 10.1145/3511808.3557536. Online publication date: 17-Oct-2022
https://dl.acm.org/doi/10.1145/3511808.3557536
