research-article

Statistical biases in Information Retrieval metrics for recommender systems

Published: 01 December 2017

Abstract

There is an increasing consensus in the Recommender Systems community that the dominant error-based evaluation metrics are insufficient, and mostly inadequate, to properly assess the practical effectiveness of recommendations. Seeking to evaluate recommendation rankings—which largely determine the effective accuracy in matching user needs—rather than predicted rating values, Information Retrieval metrics have started to be applied for the evaluation of recommender systems. In this paper we analyse the main issues and potential divergences in the application of Information Retrieval methodologies to recommender system evaluation, and provide a systematic characterisation of experimental design alternatives for this adaptation. We lay out an experimental configuration framework upon which we identify and analyse specific statistical biases arising in the adaptation of Information Retrieval metrics to recommendation tasks, namely sparsity and popularity biases. These biases considerably distort the empirical measurements, hindering the interpretation and comparison of results across experiments. We develop a formal characterisation and analysis of the biases upon which we analyse their causes and main factors, as well as their impact on evaluation metrics under different experimental configurations, illustrating the theoretical findings with empirical evidence. We propose two experimental design approaches that effectively neutralise such biases to a large extent. We report experiments validating our proposed experimental variants, and comparing them to alternative approaches and metrics that have been defined in the literature with similar or related purposes.

Published In

Information Retrieval, Volume 20, Issue 6
Dec 2017
88 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 01 December 2017
Accepted: 19 July 2017
Received: 04 August 2016

Author Tags

1. Evaluation
2. Recommender systems
3. Popularity bias
4. Sparsity bias
5. Cranfield

Qualifiers

• Research-article

Cited By

• (2024) Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities. Proceedings of the VLDB Endowment 17:11 (3470-3483). DOI: 10.14778/3681954.3682014. Online publication date: 30-Aug-2024
• (2024) Surveying More Than Two Decades of Music Information Retrieval Research on Playlists. ACM Transactions on Intelligent Systems and Technology 15:6 (1-68). DOI: 10.1145/3688398. Online publication date: 12-Aug-2024
• (2024) Exploratory Analysis of Recommending Urban Parks for Health-Promoting Activities. Proceedings of the 18th ACM Conference on Recommender Systems (1131-1135). DOI: 10.1145/3640457.3691712. Online publication date: 8-Oct-2024
• (2024) From Variability to Stability: Advancing RecSys Benchmarking Practices. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (5701-5712). DOI: 10.1145/3637528.3671655. Online publication date: 25-Aug-2024
• (2024) Exploring the Landscape of Recommender Systems Evaluation: Practices and Perspectives. ACM Transactions on Recommender Systems 2:1 (1-31). DOI: 10.1145/3629170. Online publication date: 7-Mar-2024
• (2024) Robust Collaborative Filtering to Popularity Distribution Shift. ACM Transactions on Information Systems 42:3 (1-25). DOI: 10.1145/3627159. Online publication date: 22-Jan-2024
• (2024) Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (416-426). DOI: 10.1145/3626772.3657749. Online publication date: 10-Jul-2024
• (2024) Analysing the Effect of Recommendation Algorithms on the Spread of Misinformation. Proceedings of the 16th ACM Web Science Conference (159-169). DOI: 10.1145/3614419.3644003. Online publication date: 21-May-2024
• (2024) EqBal-RS: Mitigating popularity bias in recommender systems. Journal of Intelligent Information Systems 62:2 (509-534). DOI: 10.1007/s10844-023-00817-w. Online publication date: 1-Apr-2024
• (2024) Multiple Testing for IR and Recommendation System Experiments. Advances in Information Retrieval (449-457). DOI: 10.1007/978-3-031-56063-7_37. Online publication date: 24-Mar-2024
