research-article

Open access

Debiased offline evaluation of recommender systems: a weighted-sampling approach

Authors:

Derek Bridge

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

Pages 1435 - 1442

https://doi.org/10.1145/3341105.3375759

Published: 30 March 2020

Abstract

Offline evaluation of recommender systems mostly relies on historical data, which is often biased by many confounders. In such data, user-item interactions are Missing Not At Random (MNAR). Measures of recommender system performance on MNAR test data are unlikely to be reliable indicators of real-world performance unless the bias is mitigated. One way researchers try to obtain a less biased offline evaluation is to design new, supposedly unbiased performance estimators for use on MNAR test data. We investigate an alternative solution: a sampling approach. The general idea is to apply a sampling strategy to MNAR data to generate an intervened test set with less bias, that is, one in which interactions are Missing At Random (MAR) or, at least, one that is more MAR-like. An example is SKEW, a sampling strategy that aims to adjust for the confounding effect that an item's popularity has on its likelihood of being observed.

In this paper, we propose a novel formulation of the sampling approach. We compare our solution to SKEW and to two baselines that perform a random intervention on MNAR data (and hence are equivalent to no intervention in practice). We empirically validate, for the first time, the effectiveness of SKEW, and we show that our approach is a better estimator of the performance one would obtain on (unbiased) MAR test data. Our strategy offers high generality (e.g. it can also be employed for training a recommender) and low overheads (e.g. it does not require any learning).
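
To make the sampling idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that interactions are (user, item) pairs, that an item's popularity is its count of observed interactions, and that drawing with replacement is acceptable for illustration. `skew_like_sample` weights each interaction by the inverse of its item's popularity, which is one simple reading of a SKEW-style intervention; `uniform_sample` stands in for a random-intervention baseline. All function names are hypothetical.

```python
import random
from collections import Counter


def inverse_popularity_weights(interactions):
    """Return one weight per interaction: 1 / (number of interactions of its item).

    Assumption: popularity is measured as the raw interaction count of the item.
    """
    popularity = Counter(item for _, item in interactions)
    return [1.0 / popularity[item] for _, item in interactions]


def skew_like_sample(interactions, sample_size, seed=0):
    """Sample interactions so that, per draw, every item is equally likely.

    Simplification: sampling is done WITH replacement; a real test set
    would more likely be drawn without replacement.
    """
    rng = random.Random(seed)
    weights = inverse_popularity_weights(interactions)
    return rng.choices(interactions, weights=weights, k=sample_size)


def uniform_sample(interactions, sample_size, seed=0):
    """Random-intervention baseline: every interaction is equally likely,
    so the popularity skew of the source data is preserved in expectation."""
    rng = random.Random(seed)
    return rng.choices(interactions, k=sample_size)


def item_share(sample, item):
    """Fraction of sampled interactions that involve the given item."""
    return sum(1 for _, i in sample if i == item) / len(sample)


# Toy MNAR-like data: item "a" is nine times more popular than item "b".
toy = [(u, "a") for u in range(90)] + [(u, "b") for u in range(10)]

skewed = skew_like_sample(toy, 2000, seed=1)
uniform = uniform_sample(toy, 2000, seed=1)
print("share of 'a', inverse-popularity sample:", item_share(skewed, "a"))
print("share of 'a', uniform sample:", item_share(uniform, "a"))
```

On this toy data, the inverse-popularity sample should give the two items roughly equal shares, whereas the uniform sample should roughly preserve the original 90/10 skew.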

Index Terms

Debiased offline evaluation of recommender systems: a weighted-sampling approach
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

March 2020

2348 pages

ISBN:9781450368667

DOI:10.1145/3341105

Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Tomas Cerny (Baylor University)

Program Chairs: Dongwan Shin (New Mexico Tech), Alessio Bechini (University of Pisa, Italy)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Science Foundation Ireland

Conference

SAC '20

Sponsor:

SIGAPP

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing

March 30 - April 3, 2020

Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
