ChatGPT Goes Shopping: LLMs Can Predict Relevance in eCommerce Search

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Abstract

The dependence on human relevance judgments limits the development of information retrieval test collections, which are vital for evaluating retrieval systems. Since their launch, large language models (LLMs) have been applied to automate several human tasks. Recently, LLMs have started being used to provide relevance judgments for document search. In this work, our goal is to assess whether LLMs can replace human annotators in a different setting: product search in eCommerce. We conducted experiments on open and proprietary industrial datasets to measure LLMs' ability to predict relevance judgments. We found that LLM-generated relevance assessments show strong agreement (~82%) with human annotations, indicating that LLMs have an innate ability to perform relevance judgments in an eCommerce setting. We then went further and tested whether LLMs can generate annotation guidelines. Relevance assessments obtained with LLM-generated guidelines are as accurate as the ones obtained from human instructions. The source code for this work is available at https://github.com/danimtk/chatGPT-goes-shopping.
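
To make the notion of agreement between LLM and human annotations concrete, the following is a minimal sketch of how such a figure could be computed. It is not taken from the paper: the abstract does not state which agreement measure underlies the ~82% figure, so the sketch assumes plain percent agreement and, for comparison, Cohen's kappa, which corrects for chance. The function names, the label strings, and the toy data are all hypothetical.

```python
from collections import Counter


def percent_agreement(human, llm):
    """Fraction of items on which the two annotators assign the same label."""
    if len(human) != len(llm):
        raise ValueError("label lists must have the same length")
    if not human:
        raise ValueError("label lists must not be empty")
    matches = sum(h == m for h, m in zip(human, llm))
    return matches / len(human)


def cohen_kappa(human, llm):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    n = len(human)
    observed = percent_agreement(human, llm)
    human_counts = Counter(human)
    llm_counts = Counter(llm)
    # Expected agreement if both annotators labeled independently
    # according to their own label frequencies.
    labels = set(human) | set(llm)
    expected = sum(human_counts[c] * llm_counts[c] for c in labels) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy judgments for five query-product pairs (not real data).
human_labels = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
llm_labels = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant"]

if __name__ == "__main__":
    print(f"percent agreement: {percent_agreement(human_labels, llm_labels):.2f}")
    print(f"Cohen's kappa:     {cohen_kappa(human_labels, llm_labels):.2f}")
```

Percent agreement is easy to read but can look high when one label dominates, which is why a chance-corrected measure is often reported alongside it.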

B. Soviero and D. Kuhn—Work conducted during an internship at VTEX.



Acknowledgments

The authors thank Shervin Malmasi for his helpful comments and suggestions. This work has been financed in part by VTEX BRASIL (EMBRAPII PCEE1911.0140), CAPES Finance Code 001, and CNPq/Brazil.

Corresponding author

Correspondence to Viviane Pereira Moreira.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Soviero, B., Kuhn, D., Salle, A., Moreira, V.P. (2024). ChatGPT Goes Shopping: LLMs Can Predict Relevance in eCommerce Search. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14611. Springer, Cham. https://doi.org/10.1007/978-3-031-56066-8_1

  • DOI: https://doi.org/10.1007/978-3-031-56066-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56065-1

  • Online ISBN: 978-3-031-56066-8

  • eBook Packages: Computer Science (R0)
