DOI: 10.1145/3341105.3375759
research-article
Open access

Debiased offline evaluation of recommender systems: a weighted-sampling approach

Published: 30 March 2020

Abstract

Offline evaluation of recommender systems mostly relies on historical data, which is often biased by many confounders. In such data, user-item interactions are Missing Not At Random (MNAR). Measures of recommender system performance on MNAR test data are unlikely to be reliable indicators of real-world performance unless something is done to mitigate the bias. One way that researchers try to obtain less biased offline evaluation is by designing new, supposedly unbiased performance estimators for use on MNAR test data. We investigate an alternative solution: a sampling approach. The general idea is to apply a sampling strategy to MNAR data to generate an intervened test set with less bias, that is, one in which interactions are Missing At Random (MAR) or, at least, one that is more MAR-like. An example of this is SKEW, a sampling strategy that aims to adjust for the confounding effect that an item's popularity has on its likelihood of being observed.
In this paper, we propose a novel formulation for the sampling approach. We compare our solution to SKEW and to two baselines that perform a random intervention on MNAR data (and hence are equivalent to no intervention in practice). For the first time, we empirically validate the effectiveness of SKEW, and we show that our approach is a better estimator of the performance one would obtain on (unbiased) MAR test data. Our strategy is highly general (e.g. it can also be used to train a recommender) and has low overhead (e.g. it does not require any learning).
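
As an illustration of the general sampling idea, the Python sketch below builds an intervened test set by sampling interactions without replacement, with weights inversely proportional to item popularity. The inverse-popularity weighting is an assumption about how a SKEW-style strategy might be realised; it is not the weighting scheme proposed in this paper, which the abstract does not specify. The function names and the (user, item, rating) tuple layout are hypothetical.

```python
import random
from collections import Counter


def popularity_weights(interactions):
    """Return one weight per interaction: 1 / (number of interactions of its item).

    Assumption: popularity is measured as the item's interaction count
    in the MNAR data itself.
    """
    popularity = Counter(item for _, item, _ in interactions)
    return [1.0 / popularity[item] for _, item, _ in interactions]


def weighted_sample_without_replacement(rows, weights, k, seed=0):
    """Draw k distinct rows, favouring rows with larger weights.

    Uses the Efraimidis-Spirakis scheme: give each row the key
    u ** (1 / w), with u uniform in [0, 1), and keep the k largest keys.
    """
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), idx) for idx, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return [rows[idx] for _, idx in keyed[:k]]


# Toy MNAR data: item "a" is observed far more often than item "b".
mnar = [(u, "a", 5) for u in range(90)] + [(u, "b", 3) for u in range(10)]
weights = popularity_weights(mnar)
intervened_test_set = weighted_sample_without_replacement(mnar, weights, k=20)
```

With these weights, every item carries the same total sampling mass, so interactions with rarely observed items are over-represented in the intervened test set relative to the raw MNAR data. No model is learned at any point, which matches the low-overhead property the abstract describes for sampling approaches.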


Cited By

  • (2021) Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently? Proceedings of the 15th ACM Conference on Recommender Systems, 708-713. DOI: 10.1145/3460231.3478848. Online publication date: 13-Sep-2021
  • (2021) A sampling approach to Debiasing the offline evaluation of recommender systems. Journal of Intelligent Information Systems 58:2, 311-336. DOI: 10.1007/s10844-021-00651-y. Online publication date: 10-Jul-2021
  • (2021) Centralised Quality of Experience and Service Framework Using PROMETHEE-II for Cloud Provider Selection. Intelligent Processing Practices and Tools for E-Commerce Data, Information, and Knowledge, 79-94. DOI: 10.1007/978-3-030-78303-7_5. Online publication date: 27-Nov-2021


    Published In

    SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing
    March 2020
    2348 pages
    ISBN:9781450368667
    DOI:10.1145/3341105
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bias
    2. intervened test sets
    3. offline evaluation

    Qualifiers

    • Research-article

    Conference

    SAC '20
    SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing
    March 30 - April 3, 2020
    Brno, Czech Republic

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
