
Accelerated Convergence for Counterfactual Learning to Rank

Published: 25 July 2020
DOI: 10.1145/3397271.3401069

Abstract

Counterfactual Learning To Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system. Employing such an offline learning approach has many benefits compared to an online one, but it is challenging as user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. One of the major difficulties in applying Stochastic Gradient Descent (SGD) approaches to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights: convergence is slow, especially when there are large IPS weights.
To overcome this limitation, we propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods. We prove that CounterSample converges faster and complement our theoretical findings with empirical results by performing extensive experimentation in a number of biased LTR scenarios -- across optimizers, batch sizes, and different degrees of position bias.
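
To make the setup concrete, below is a minimal, self-contained Python sketch of plain IPS-weighted SGD on logged click data. It is illustrative only: the toy data, the linear scoring model, the logistic loss, and all names (grad_logistic, propensities) are assumptions of this sketch, not the paper's implementation. It shows the source of the variance problem: each step multiplies the gradient by w_i = 1/p_i, so interactions logged under small examination propensities produce very large, noisy updates.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy logged data (hypothetical): feature vectors, click labels, and
    # examination propensities, e.g. from a position-bias model of the logger.
    X = rng.normal(size=(1000, 5))
    clicks = rng.integers(0, 2, size=1000).astype(float)
    propensities = rng.uniform(0.05, 1.0, size=1000)

    def grad_logistic(theta, x, y):
        # Gradient of the logistic loss for a linear scoring model.
        p = 1.0 / (1.0 + np.exp(-x @ theta))
        return (p - y) * x

    # Plain IPS-weighted SGD: sample uniformly, weight the gradient by 1/p_i.
    theta = np.zeros(5)
    lr = 0.01
    for _ in range(10_000):
        i = rng.integers(len(X))       # uniform draw from the log
        w = 1.0 / propensities[i]      # IPS weight; unbounded as p_i -> 0
        theta -= lr * w * grad_logistic(theta, X[i], clicks[i])

With the smallest propensity above, the largest weight is 1/0.05 = 20, so a single step can be twenty times the size of an unweighted one; this is the weight-driven variance that the paper shows slows down convergence.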

Supplementary Material

MP4 File (3397271.3401069.mp4)
This presentation is about accelerating the convergence of counterfactual learning to rank (LTR). Counterfactual LTR relies on Inverse Propensity Scoring (IPS) to debias the learning process. However, IPS-weights can introduce a large amount of variance, which in turn can slow down the learning process. To address this problem we make the following contributions in this paper: (a) we show that the convergence rate of IPS-weighted SGD scales poorly with IPS weights, (b) we introduce CounterSample: a sample-based SGD method that has provably better convergence than IPS-weighted SGD, and (c) we empirically validate these theoretical findings with experiments in a number of settings -- across optimizers, batch sizes and different severities of position bias.
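
The contrast with the sample-based method can be sketched as well. The fragment below is a sketch of the weighted-sampling idea that a method like CounterSample builds on (in the spirit of weighted-sampling SGD as studied by Needell et al. and by Zhao and Zhang), not the authors' implementation: draw example i with probability proportional to its IPS weight, then take an unweighted gradient step rescaled by the mean weight. The estimator remains unbiased for the same IPS objective, but the per-step gradient norm is bounded by the mean weight times the largest unweighted gradient, rather than by the maximum weight.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_based_step(theta, X, clicks, weights, grad_fn, lr=0.01):
        # Weight-proportional sampling: P(i) = w_i / sum_j w_j.
        probs = weights / weights.sum()
        i = rng.choice(len(X), p=probs)
        # Rescale by the mean weight, so that
        # E[step] = sum_i (w_i / W) * (W / n) * grad_i = (1/n) sum_i w_i grad_i,
        # i.e. the same expected step as IPS-weighted SGD.
        scale = weights.mean()
        return theta - lr * scale * grad_fn(theta, X[i], clicks[i])

    # Usage with the toy data and grad_logistic from the sketch above:
    #   theta = sample_based_step(theta, X, clicks, 1.0 / propensities,
    #                             grad_logistic)

For repeated draws, a practical implementation would precompute the sampling distribution once, for instance with Walker's alias method, rather than renormalizing and passing p to rng.choice at every step.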



Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN:9781450380164
DOI:10.1145/3397271


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. counterfactual learning
  2. learning to rank
  3. unbiased learning

Qualifiers

  • Research-article


Conference

SIGIR '20

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%


Cited By

  • (2024) Distillation vs. Sampling for Efficient Training of Learning to Rank Models. In Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 51--60. DOI: 10.1145/3664190.3672527. Online publication date: 2-Aug-2024.
  • (2024) CLEVER: Crafting Intelligent MISP for Cyber Threat Intelligence. In 2024 IEEE 49th Conference on Local Computer Networks (LCN), 1--9. DOI: 10.1109/LCN60385.2024.10639749. Online publication date: 8-Oct-2024.
  • (2024) Multispecies deep learning using citizen science data produces more informative plant community models. Nature Communications, Vol. 15, 1. DOI: 10.1038/s41467-024-48559-9. Online publication date: 24-May-2024.
  • (2022) Location Representations for Accelerating the Training of Next POI Recommender System. In Adjunct Proceedings of the 2022 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2022 ACM International Symposium on Wearable Computers, 133--135. DOI: 10.1145/3544793.3560370. Online publication date: 11-Sep-2022.
  • (2022) Rax: Composable Learning-to-Rank Using JAX. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3051--3060. DOI: 10.1145/3534678.3539065. Online publication date: 14-Aug-2022.
  • (2022) Reinforcement online learning to rank with unbiased reward shaping. Information Retrieval Journal, Vol. 25, 4, 386--413. DOI: 10.1007/s10791-022-09413-y. Online publication date: 4-Aug-2022.
  • (2021) RecSys 2021 Tutorial on Conversational Recommendation: Formulation, Methods, and Evaluation. In Proceedings of the 15th ACM Conference on Recommender Systems, 842--844. DOI: 10.1145/3460231.3473325. Online publication date: 13-Sep-2021.
  • (2021) How do Online Learning to Rank Methods Adapt to Changes of Intent? In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 911--920. DOI: 10.1145/3404835.3462937. Online publication date: 11-Jul-2021.
  • (2021) Adapting Interactional Observation Embedding for Counterfactual Learning to Rank. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 285--294. DOI: 10.1145/3404835.3462901. Online publication date: 11-Jul-2021.
