research-article

Statistical biases in Information Retrieval metrics for recommender systems

Authors:

Alejandro Bellogín,

Pablo Castells,

Iván Cantador

Information Retrieval Journal, Volume 20, Issue 6

Pages 606–634

https://doi.org/10.1007/s10791-017-9312-z

Published: 01 December 2017

Abstract

There is an increasing consensus in the Recommender Systems community that the dominant error-based evaluation metrics are insufficient, and mostly inadequate, to properly assess the practical effectiveness of recommendations. Seeking to evaluate recommendation rankings (which largely determine the effective accuracy in matching user needs) rather than predicted rating values, researchers have started to apply Information Retrieval metrics to the evaluation of recommender systems. In this paper we analyse the main issues and potential divergences in the application of Information Retrieval methodologies to recommender system evaluation, and provide a systematic characterisation of experimental design alternatives for this adaptation. We lay out an experimental configuration framework upon which we identify and analyse specific statistical biases arising in the adaptation of Information Retrieval metrics to recommendation tasks, namely sparsity and popularity biases. These biases considerably distort the empirical measurements, hindering the interpretation and comparison of results across experiments. We develop a formal characterisation and analysis of the biases, on the basis of which we examine their causes and main factors, as well as their impact on evaluation metrics under different experimental configurations, illustrating the theoretical findings with empirical evidence. We propose two experimental design approaches that neutralise such biases to a large extent. We report experiments that validate our proposed experimental variants and compare them to alternative approaches and metrics defined in the literature with similar or related purposes.

Cited By

Erfanian, M., Jagadish, H., & Asudeh, A. (2024). Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities. Proceedings of the VLDB Endowment, 17(11), pp. 3470–3483. Online publication date: 30-Aug-2024.
https://doi.org/10.14778/3681954.3682014
Gabbolini, G., & Bridge, D. (2024). Surveying More Than Two Decades of Music Information Retrieval Research on Playlists. ACM Transactions on Intelligent Systems and Technology, 15(6), pp. 1–68. Online publication date: 12-Aug-2024.
https://dl.acm.org/doi/10.1145/3688398
Dietz, L., Šćepanović, S., Zhou, K., & Quercia, D. (2024). Exploratory Analysis of Recommending Urban Parks for Health-Promoting Activities. In Proceedings of the 18th ACM Conference on Recommender Systems, pp. 1131–1135. Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3691712

Index Terms

1. Information systems
  1. Information retrieval

Index terms have been assigned to the content through auto-classification.

Recommendations

User-centered Evaluation of Popularity Bias in Recommender Systems
UMAP '21: Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization

Recommendation and ranking systems are known to suffer from popularity bias: the tendency of the algorithm to favor a few popular items while under-representing the majority of other items. Prior research has examined various approaches for mitigating ...
Exploring potential biases towards blockbuster items in ranking-based recommendations
Popularity bias is defined as the intrinsic tendency of recommendation algorithms to feature popular items more than unpopular ones in the ranked lists they produce. When investigating the adverse effects of popularity bias, the literature ...
Popularity Bias in False-positive Metrics for Recommender Systems Evaluation
We investigate the impact of popularity bias in false-positive metrics in the offline evaluation of recommender systems. Unlike their true-positive complements, false-positive metrics reward systems that minimize recommendations disliked by users. Our ...

Information & Contributors

Information

Published In

Information Retrieval Volume 20, Issue 6

Dec 2017

88 pages

ISSN:1386-4564

© Springer Science+Business Media, LLC 2017.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 December 2017

Accepted: 19 July 2017

Received: 04 August 2016

Funding Sources

Secretaría de Estado de Investigación, Desarrollo e Innovación

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 61

Total Downloads: 0

Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 23 Nov 2024

Citations

Shevchenko, V., Belousov, N., Vasilev, A., Zholobov, V., Sosedka, A., Semenova, N., Volodkevich, A., Savchenko, A., & Zaytsev, A. (2024). From Variability to Stability: Advancing RecSys Benchmarking Practices. In R. Baeza-Yates & F. Bonchi (Eds.), Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5701–5712. Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671655
Bauer, C., Zangerle, E., & Said, A. (2024). Exploring the Landscape of Recommender Systems Evaluation: Practices and Perspectives. ACM Transactions on Recommender Systems, 2(1), pp. 1–31. Online publication date: 7-Mar-2024.
https://dl.acm.org/doi/10.1145/3629170
Zhang, A., Ma, W., Zheng, J., Wang, X., & Chua, T. (2024). Robust Collaborative Filtering to Popularity Distribution Shift. ACM Transactions on Information Systems, 42(3), pp. 1–25. Online publication date: 22-Jan-2024.
https://dl.acm.org/doi/10.1145/3627159
Huang, J., Oosterhuis, H., Mansoury, M., van Hoof, H., & de Rijke, M. (2024). Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems. In G. Hui Yang, H. Wang, S. Han, C. Hauff, G. Zuccon, & Y. Zhang (Eds.), Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 416–426. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657749
Fernandez, M., Bellogín, A., & Cantador, I. (2024). Analysing the Effect of Recommendation Algorithms on the Spread of Misinformation. In Proceedings of the 16th ACM Web Science Conference, pp. 159–169. Online publication date: 21-May-2024.
https://dl.acm.org/doi/10.1145/3614419.3644003
Gupta, S., Kaur, K., & Jain, S. (2024). EqBal-RS: Mitigating popularity bias in recommender systems. Journal of Intelligent Information Systems, 62(2), pp. 509–534. Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1007/s10844-023-00817-w
Ihemelandu, N., & Ekstrand, M. (2024). Multiple Testing for IR and Recommendation System Experiments. In Advances in Information Retrieval, pp. 449–457. Online publication date: 24-Mar-2024.
https://dl.acm.org/doi/10.1007/978-3-031-56063-7_37