research-article

Open access

Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach

Authors:

Jeff M. Phillips,

Qingyao AiAuthors Info & Claims

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 1486 - 1496

https://doi.org/10.1145/3589334.3645487

Published: 13 May 2024 Publication History

Abstract

Ranking is at the core of many artificial intelligence (AI) applications, including search engines, recommender systems, etc. Modern ranking systems are often constructed with learning-to-rank (LTR) models built from user behavior signals. While previous studies have demonstrated the effectiveness of using user behavior signals (e.g., clicks) as both features and labels of LTR algorithms, we argue that existing LTR algorithms that indiscriminately treat behavior and non-behavior signals in input features could lead to suboptimal performance in practice. Because user behavior signals often have strong correlations with the ranking objective and can only be collected on items that have already been shown to users, directly using behavior signals in LTR could create an exploitation bias that hurts the system performance in the long run.

To address the exploitation bias, we propose an uncertainty-aware empirical Bayes based ranking algorithm, referred to as EBRank. Specifically, EBRank uses a sole non-behavior feature-based prior model to get a prior estimation of relevance. In the dynamic training and serving of ranking systems, EBRank uses the observed user behaviors to update posterior relevance estimation instead of concatenating behaviors as features in ranking models. Besides, EBRank additionally applies an uncertainty-aware exploration strategy to explore actively and collect user behaviors for empirical Bayesian modeling. Experiments on three public datasets show that EBRank is effective, practical and significantly outperforms state-of-the-art ranking algorithms.

Supplemental Material

MP4 File

Supplemental video

Download
6.29 MB

References

[1]

Milton Abramowitz and Irene A Stegun. 1964. Handbook of mathematical functions with formulas, graphs, and mathematical tables. Vol. 55. US Government printing office.

Digital Library

[2]

Aman Agarwal, Kenta Takatsu, Ivan Zaitsev, and Thorsten Joachims. 2019a. A general framework for counterfactual learning-to-rank. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 5--14.

Digital Library

[3]

Aman Agarwal, Ivan Zaitsev, Xuanhui Wang, Cheng Li, Marc Najork, and Thorsten Joachims. 2019b. Estimating position bias without intrusive interventions. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 474--482.

Digital Library

[4]

Eugene Agichtein, Eric Brill, and Susan Dumais. 2006. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. 19--26.

Digital Library

[5]

Qingyao Ai, Keping Bi, Jiafeng Guo, and W Bruce Croft. 2018a. Learning a deep listwise context model for ranking refinement. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 135--144.

Digital Library

[6]

Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W Bruce Croft. 2018b. Unbiased learning to rank with unbiased propensity estimation. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 385--394.

Digital Library

[7]

Qingyao Ai, Tao Yang, Huazheng Wang, and Jiaxin Mao. 2021. Unbiased Learning to Rank: Online or Offline? ACM Transactions on Information Systems (TOIS), Vol. 39, 2 (2021), 1--29.

Digital Library

[8]

Jessa Bekker, Pieter Robberechts, and Jesse Davis. 2020. Beyond the selected completely at random assumption for learning from positive and unlabeled data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 71--85.

Digital Library

[9]

Olivier Chapelle and Yi Chang. 2011. Yahoo! learning to rank challenge overview. In Proceedings of the learning to rank challenge. PMLR, 1--24.

[10]

Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on Information and knowledge management. 621--630.

Digital Library

[11]

Matej Cief, Branislav Kveton, and Michal Kompan. 2022. Pessimistic Off-Policy Optimization for Learning to Rank. arXiv preprint arXiv:2206.02593 (2022).

[12]

Daniel Cohen, Bhaskar Mitra, Oleg Lesota, Navid Rekabsaz, and Carsten Eickhoff. 2021. Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models. arXiv preprint arXiv:2105.04651 (2021).

[13]

Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An experimental comparison of click position-bias models. In Proceedings of the 2008 international conference on web search and data mining. 87--94.

Digital Library

[14]

J Shane Culpepper, Charles LA Clarke, and Jimmy Lin. 2016. Dynamic cutoff prediction in multi-stage retrieval systems. In Proceedings of the 21st Australasian Document Computing Symposium. 17--24.

Digital Library

[15]

Domenico Dato, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. 2016. Fast ranking with additive ensembles of oblivious and non-oblivious regression trees. ACM Transactions on Information Systems (TOIS), Vol. 35, 2 (2016), 1--31.

Digital Library

[16]

Arthur P Dempster, Nan M Laird, and Donald B Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), Vol. 39, 1 (1977), 1--22.

[17]

Jianfeng Gao, Wei Yuan, Xiao Li, Kefeng Deng, and Jian-Yun Nie. 2009. Smoothing clickthrough data for web search ranking. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. 355--362.

Digital Library

[18]

Parth Gupta, Tommaso Dreossi, Jan Bakus, Yu-Hsiang Lin, and Vamsi Salaka. 2020. Treating Cold Start in Product Search by Priors. In Companion Proceedings of the Web Conference 2020. 77--78.

Digital Library

[19]

Cuize Han, Pablo Castells, Parth Gupta, Xu Xu, and Vamsi Salaka. 2022. Addressing Cold Start in Product Search via Empirical Bayes. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (Atlanta, GA, USA) (CIKM '22). Association for Computing Machinery, New York, NY, USA, 3141--3151. https://doi.org/10.1145/3511808.3557066

Digital Library

[20]

Maria Heuss, Daniel Cohen, Masoud Mansoury, Maarten de Rijke, and Carsten Eickhoff. 2023. Predictive Uncertainty-based Bias Mitigation in Ranking. arXiv preprint arXiv:2309.09833 (2023).

[21]

Kalervo J"arvelin and Jaana Kek"al"ainen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), Vol. 20, 4 (2002), 422--446.

Digital Library

[22]

Olivier Jeunen and Bart Goethals. 2021. Pessimistic reward models for off-policy learning in recommendation. In Proceedings of the 15th ACM Conference on Recommender Systems. 63--74.

Digital Library

[23]

Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 781--789.

Digital Library

[24]

Branislav Kveton, Ofer Meshi, Masrour Zoghi, and Zhen Qin. 2022. On the Value of Prior in Online Learning to Rank. In International Conference on Artificial Intelligence and Statistics. PMLR, 6880--6892.

[25]

Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvári, and Masrour Zoghi. 2020. BubbleRank: Safe online learning to re-rank via implicit click feedback. In Uncertainty in Artificial Intelligence. PMLR, 196--206.

[26]

Dawen Liang and Nikos Vlassis. 2022. Local Policy Improvement for Recommender Systems. arXiv preprint arXiv:2212.11431 (2022).

[27]

Yen-Chieh Lien, Daniel Cohen, and W Bruce Croft. 2019. An assumption-free approach to the dynamic truncation of ranked lists. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval. 79--82.

Digital Library

[28]

Tie-Yan Liu et al. 2009. Learning to rank for information retrieval. Foundations and Trends® in Information Retrieval, Vol. 3, 3 (2009), 225--331.

[29]

Craig Macdonald, Rodrygo LT Santos, and Iadh Ounis. 2012. On the usefulness of query features for learning to rank. In Proceedings of the 21st ACM international conference on Information and knowledge management. 2559--2562.

Digital Library

[30]

Marco Morik, Ashudeep Singh, Jessica Hong, and Thorsten Joachims. 2020. Controlling Fairness and Bias in Dynamic Learning-to-Rank. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 429--438. https://doi.org/10.1145/3397271.3401100

Digital Library

[31]

Harrie Oosterhuis and Maarten de Rijke. 2018. Differentiable unbiased online learning to rank. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1293--1302.

Digital Library

[32]

Harrie Oosterhuis and Maarten de Rijke. 2020. Policy-aware unbiased learning to rank for top-k rankings. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 489--498.

Digital Library

[33]

Harrie Oosterhuis and Maarten de Rijke. 2021a. Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 463--471.

Digital Library

[34]

Harrie Oosterhuis and Maarten de de Rijke. 2021b. Robust Generalization and Safe Query-Specializationin Counterfactual Learning to Rank. In Proceedings of the Web Conference 2021. 158--170.

[35]

Zohreh Ovaisi, Ragib Ahsan, Yifan Zhang, Kathryn Vasilaky, and Elena Zheleva. 2020. Correcting for selection bias in learning-to-rank systems. In Proceedings of The Web Conference 2020. 1863--1873.

Digital Library

[36]

Gustavo Penha and Claudia Hauff. 2021. On the Calibration and Uncertainty of Neural Learning to Rank Models for Conversational Search. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 160--170.

[37]

Tao Qin, Tie-Yan Liu, Jun Xu, and Hang Li. 2010. LETOR: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, Vol. 13, 4 (2010), 346--374.

Digital Library

[38]

Howard Raiffa, Robert Schlaifer, et al. 1961. Applied statistical decision theory. (1961).

[39]

Haggai Roitman, Shai Erera, and Bar Weiner. 2017. Robust standard deviation estimation for query performance prediction. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval. 245--248.

Digital Library

[40]

Yuta Saito, Suguru Yaginuma, Yuta Nishino, Hayato Sakata, and Kazuhide Nakata. 2020. Unbiased recommender learning from missing-not-at-random implicit feedback. In Proceedings of the 13th International Conference on Web Search and Data Mining. 501--509.

Digital Library

[41]

Anne Schuth, Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. 2013. Lerot: An online learning to rank framework. In Proceedings of the 2013 workshop on Living labs for information retrieval evaluation. 23--26.

Digital Library

[42]

Anne Schuth, Harrie Oosterhuis, Shimon Whiteson, and Maarten de Rijke. 2016. Multileave gradient descent for fast online learning to rank. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 457--466.

Digital Library

[43]

Ashudeep Singh and Thorsten Joachims. 2018. Fairness of exposure in rankings. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2219--2228.

Digital Library

[44]

Mark D Smucker, James Allan, and Ben Carterette. 2007. A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. 623--632.

Digital Library

[45]

Anh Tran, Tao Yang, and Qingyao Ai. 2021. ULTRA: An Unbiased Learning To Rank Algorithm Toolbox. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 4613--4622.

Digital Library

[46]

Ali Vardasbi, Harrie Oosterhuis, and Maarten de Rijke. 2020. When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1475--1484.

Digital Library

[47]

Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, and Hongning Wang. 2019. Variance reduction in gradient exploration for online learning to rank. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 835--844.

Digital Library

[48]

Huazheng Wang, Ramsey Langley, Sonwoo Kim, Eric McCord-Snook, and Hongning Wang. 2018b. Efficient exploration of gradient space for online learning to rank. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 145--154.

Digital Library

[49]

Qunbo Wang, Wenjun Wu, Yuxing Qi, and Yongchi Zhao. 2021. Deep bayesian active learning for learning to rank: A case study in answer selection. IEEE Transactions on Knowledge and Data Engineering (2021).

[50]

Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to rank with selection bias in personal search. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 115--124.

Digital Library

[51]

Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018a. Position bias estimation for unbiased learning to rank in personal search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 610--618.

Digital Library

[52]

Tao Yang and Qingyao Ai. 2021. Maximizing marginal fairness for dynamic learning to rank. In Proceedings of the Web Conference 2021. 137--145.

Digital Library

[53]

Tao Yang, Shikai Fang, Shibo Li, Yulan Wang, and Qingyao Ai. 2020. Analysis of multivariate scoring functions for automatic unbiased learning to rank. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2277--2280.

Digital Library

[54]

Tao Yang, Chen Luo, Hanqing Lu, Parth Gupta, Bing Yin, and Qingyao Ai. 2022. Can clicks be both labels and features? Unbiased Behavior Feature Collection and Uncertainty-aware Learning to Rank. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 6--17.

Digital Library

[55]

Tao Yang, Zhichao Xu, Zhenduo Wang, Anh Tran, and Qingyao Ai. 2023. Marginal-Certainty-aware Fair Ranking Algorithm. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 24--32.

Digital Library

[56]

Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning. 1201--1208.

Digital Library

[57]

Jianhan Zhu, Jun Wang, Michael Taylor, and Ingemar J Cox. 2009. Risk-aware information retrieval. In European Conference on Information Retrieval. Springer, 17--28.

Digital Library

[58]

Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, and Zheng Wen. 2017. Online learning to rank in stochastic click models. In International Conference on Machine Learning. PMLR, 4199--4208. io

Index Terms

Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank

Recommendations

Can Clicks Be Both Labels and Features?: Unbiased Behavior Feature Collection and Uncertainty-aware Learning to Rank
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Using implicit feedback collected from user clicks as training labels for learning-to-rank algorithms is a well-developed paradigm that has been extensively studied and used in modern IR systems. Using user clicks as ranking features, on the other hand, ...
Learning to rank code examples for code search engines

Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to user'...
On Application of Learning to Rank for E-Commerce Search
SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

E-Commerce (E-Com) search is an emerging important new application of information retrieval. Learning to Rank (LETOR) is a general effective strategy for optimizing search engines, and is thus also a key technology for E-Com search. While the use of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN:9798400701719

DOI:10.1145/3589334

General Chairs:
Tat-Seng Chua
National University of Singapore
,
Chong-Wah Ngo
Singapore Management University
,
Proceedings Chair:
Roy Ka-Wei Lee
Singapore University of Technology and Design
,
Program Chairs:
Ravi Kumar
Google
,
Hady W. Lauw
Singapore Management University

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2024

Check for updates

Author Tags

Qualifiers

Research-article

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
219
Total Downloads

Downloads (Last 12 months)219
Downloads (Last 6 weeks)47

Reflects downloads up to 13 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents