DOI: 10.1145/3626772.3657853

FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation

Published: 11 July 2024

Abstract

Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent. With the increasing uptake of Retrieval-Augmented Generation (RAG) pipelines, federated search can play a pivotal role in sourcing relevant information across heterogeneous data sources to generate informed responses. However, existing datasets, such as those developed in the past TREC FedWeb tracks, predate the RAG paradigm shift and lack representation of modern information retrieval challenges.
To bridge this gap, we present FeB4RAG, a novel dataset specifically designed for federated search within RAG frameworks. Derived from 16 sub-collections of the widely used BEIR benchmarking collection, the dataset includes 790 information requests (akin to conversational queries) tailored for chatbot applications, along with the top results returned by each resource and associated LLM-derived relevance judgements. Additionally, to demonstrate the need for this collection, we show the impact on response generation of a high-quality federated search system for RAG compared to a naive approach to federated search, through a qualitative side-by-side comparison of the answers generated by the RAG pipeline. Our collection fosters and supports the development and evaluation of new federated search methods, especially in the context of RAG pipelines. The resource is publicly available at https://github.com/ielab/FeB4RAG.
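The contrast the abstract draws, between a naive federated search that queries every resource and a higher-quality system that first selects appropriate sources, can be sketched in a few lines. The sketch below is illustrative only, not the paper's implementation: the resource names, the toy scoring lambdas, and the `keyword_selector` are all hypothetical stand-ins for real search engines and a learned resource-selection model.

```python
def naive_federated_search(query, resources, top_k=3):
    """Naive baseline: query every resource and merge all results by score."""
    merged = []
    for name, search in resources.items():
        merged.extend((score, name, doc) for doc, score in search(query))
    merged.sort(reverse=True)
    return merged[:top_k]

def selective_federated_search(query, resources, select, top_k=3):
    """Select promising resources first, then search only those."""
    chosen = select(query, resources)
    return naive_federated_search(query, {n: resources[n] for n in chosen}, top_k)

# Toy resources: each maps a query to (document, score) pairs.
resources = {
    "medical": lambda q: [("aspirin dosage guide", 0.9 if "drug" in q else 0.1)],
    "finance": lambda q: [("bond yield primer", 0.8 if "bond" in q else 0.1)],
}

# A trivial keyword-based selector; a real system would learn this step.
def keyword_selector(query, resources):
    return ["medical"] if "drug" in query else list(resources)

results = selective_federated_search("drug interactions", resources, keyword_selector)
```

In a RAG pipeline, the merged results would then be passed to an LLM as context for answer generation; the quality of the resource-selection step directly shapes that context, which is the effect the paper's side-by-side comparison measures.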




Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2024
3164 pages
ISBN: 9798400704314
DOI: 10.1145/3626772
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. federated search
  2. large language models (LLMs)
  3. retrieval augmented generation (RAG)
  4. test collection

Qualifiers

  • Research-article

Conference

SIGIR 2024

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
