DOI: 10.1145/3626772.3657834

Research article · Open access

The Power of Noise: Redefining Retrieval for RAG Systems

Published: 11 July 2024

Abstract

Retrieval-Augmented Generation (RAG) has recently emerged as a method to extend beyond the pre-trained knowledge of Large Language Models by augmenting the original prompt with relevant passages or documents retrieved by an Information Retrieval (IR) system. RAG has become increasingly important for Generative AI solutions, especially in enterprise settings or in any domain in which knowledge is constantly refreshed and cannot be memorized in the LLM. We argue here that the retrieval component of RAG systems, be it dense or sparse, deserves increased attention from the research community, and accordingly, we conduct the first comprehensive and systematic examination of the retrieval strategy of RAG systems. We focus, in particular, on the type of passages IR systems within a RAG solution should retrieve. Our analysis considers multiple factors, such as the relevance of the passages included in the prompt context, their position, and their number. One counter-intuitive finding of this work is that the retriever's highest-scoring documents that are not directly relevant to the query (e.g., do not contain the answer) negatively impact the effectiveness of the LLM. Even more surprisingly, we discovered that adding random documents to the prompt improves LLM accuracy by up to 35%. These results highlight the need to investigate the appropriate strategies when integrating retrieval with LLMs, thereby laying the groundwork for future research in this area.
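The factors the abstract names (passage relevance, position in the context, and number of passages) can be made concrete with a small sketch. The Python below is a hypothetical illustration of those experimental axes, not the authors' code: every function and variable name is assumed for this example. It assembles a RAG prompt from a gold passage that contains the answer, high-scoring but non-relevant "distracting" passages, and randomly sampled documents, with the gold passage inserted at a configurable position.

```python
# Illustrative sketch of the prompt-assembly axes the abstract describes:
# which passages enter the context, how many, and where the relevant one
# sits. All names are hypothetical; this is not the authors' implementation.
import random

def build_context(gold: str, distracting: list[str], rand_pool: list[str],
                  n_distracting: int, n_random: int, gold_position: int) -> list[str]:
    """Compose the list of passages to be prepended to the query.

    gold          -- a passage that contains the answer
    distracting   -- retriever's top-scoring passages that do NOT contain it
    rand_pool     -- corpus passages sampled uniformly at random
    gold_position -- index at which the gold passage is inserted
    """
    context = distracting[:n_distracting] + random.sample(rand_pool, n_random)
    context.insert(min(gold_position, len(context)), gold)
    return context

def build_prompt(query: str, passages: list[str]) -> str:
    # Standard RAG prompt shape: retrieved documents first, then the question.
    docs = "\n\n".join(f"Document [{i + 1}]: {p}" for i, p in enumerate(passages))
    return (f"{docs}\n\nAnswer the following question using the documents above.\n"
            f"Question: {query}\nAnswer:")

# Example: the counter-intuitive condition -- the gold passage plus random
# documents only (no high-scoring distractors), placed closest to the query.
prompt = build_prompt(
    "Who wrote the Divine Comedy?",
    build_context(gold="Dante Alighieri wrote the Divine Comedy ...",
                  distracting=[],
                  rand_pool=["unrelated passage A", "unrelated passage B",
                             "unrelated passage C"],
                  n_distracting=0, n_random=2, gold_position=2),
)
print(prompt)
```

Under the paper's findings, one would compare configurations such as gold-plus-distractors against gold-plus-random, and sweep the number of passages and the gold position, since the reported accuracy differences hinge on exactly these choices.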




    Information

    Published In

    SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2024
    3164 pages
    ISBN: 9798400704314
    DOI: 10.1145/3626772
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. information retrieval
    2. llm
    3. rag


    Funding Sources

    • Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing
    • PNRR
    • Italian Ministry of Education and Research

    Conference

    SIGIR 2024

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%


    Cited By
    • (2025) Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review. Mathematics 13(5), 856. https://doi.org/10.3390/math13050856. Online publication date: 4-Mar-2025.
    • (2025) A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Transactions on Information Systems 43(2), 1-55. https://doi.org/10.1145/3703155. Online publication date: 24-Jan-2025.
    • (2025) Knowledge Augmented Significant Language Model-Based Chatbot for Explainable Diabetes Mellitus Prediction. 2025 19th International Conference on Ubiquitous Information Management and Communication (IMCOM), 1-8. https://doi.org/10.1109/IMCOM64595.2025.10857525. Online publication date: 3-Jan-2025.
    • (2025) From Queries to Courses: SKYRAG's Revolution in Learning Path Generation via Keyword-Based Document Retrieval. IEEE Access 13, 21434-21455. https://doi.org/10.1109/ACCESS.2025.3535618. Online publication date: 2025.
    • (2024) CRP-RAG: A Retrieval-Augmented Generation Framework for Supporting Complex Logical Reasoning and Knowledge Planning. Electronics 14(1), 47. https://doi.org/10.3390/electronics14010047. Online publication date: 26-Dec-2024.
    • (2024) Report on the 1st Workshop on Information Retrieval's Role in RAG Systems (IR-RAG 2024) at SIGIR 2024. ACM SIGIR Forum 58(2), 1-12. https://doi.org/10.1145/3722449.3722463. Online publication date: 1-Dec-2024.
    • (2024) Report on the Search Futures Workshop at ECIR 2024. ACM SIGIR Forum 58(1), 1-41. https://doi.org/10.1145/3687273.3687288. Online publication date: 7-Aug-2024.
    • (2024) Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 12-22. https://doi.org/10.1145/3673791.3698415. Online publication date: 8-Dec-2024.
    • (2024) LeKUBE: A Knowledge Update BEnchmark for Legal Domain. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 175-185. https://doi.org/10.1145/3673791.3698407. Online publication date: 8-Dec-2024.
    • (2024) Old IR Methods Meet RAG. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2559-2563. https://doi.org/10.1145/3626772.3657935. Online publication date: 10-Jul-2024.
