Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3442381.3449794acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

An Alternative Cross Entropy Loss for Learning-to-Rank

Published: 03 June 2021 Publication History

Abstract

Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set—as a surrogate to a typically non-differentiable ranking metric. Despite their empirical success, existing listwise methods are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross entropy-based learning-to-rank loss function that is theoretically sound, is a convex bound on NDCG—a popular ranking metric—and is consistent with NDCG under learning scenarios common in information retrieval. Furthermore, empirical evaluation of an implementation of the proposed method with gradient boosting machines on benchmark learning-to-rank datasets demonstrates the superiority of our proposed formulation over existing algorithms in quality and robustness.

References

[1]
Sebastian Bruch, Shuguang Han, Michael Bendersky, and Marc Najork. 2020. A Stochastic Treatment of Learning to Rank Scoring Functions. In Proceedings of the 13th International Conference on Web Search and Data Mining. 61–69.
[2]
Sebastian Bruch, Xuanhui Wang, Mike Bendersky, and Marc Najork. 2019. An Analysis of the Softmax Cross Entropy Loss for Learning-to-Rank with Binary Relevance. In Proceedings of the 2019 ACM SIGIR International Conference on the Theory of Information Retrieval.
[3]
Sebastian Bruch, Masrour Zoghi, Mike Bendersky, and Marc Najork. 2019. Revisiting Approximate Metric Optimization in the Age of Deep Neural Networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval.
[4]
Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning. 89–96.
[5]
Christopher J.C. Burges. 2010. From RankNet to LambdaRank to LambdaMART: An Overview. Technical Report MSR-TR-2010-82. Microsoft Research.
[6]
Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning. 129–136.
[7]
Olivier Chapelle and Yi Chang. 2011. Yahoo! Learning to Rank Challenge Overview. 1–24.
[8]
Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected Reciprocal Rank for Graded Relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management. 621–630.
[9]
Olivier Chapelle and Mingrui Wu. 2010. Gradient Descent Optimization of Smoothed Information Retrieval Metrics. Information Retrieval 13, 3 (June 2010), 216–235.
[10]
Stéphan Clémençon and Nicolas Vayatis. 2008. Empirical Performance Maximization for Linear Rank Statistics. In Proceedings of the 21st International Conference on Neural Information Processing Systems(Vancouver, British Columbia, Canada). 305–312.
[11]
Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An Experimental Comparison of Click Position-bias Models. In Proceedings of the 2008 International Conference on Web Search and Data Mining (Palo Alto, California, USA). 87–94.
[12]
Felipe Cucker and Steve Smale. 2002. On the mathematical foundations of learning. Bull. Amer. Math. Soc. 39 (2002), 1–49.
[13]
Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 29, 5 (2001), 1189–1232.
[14]
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4 (2002), 422–446.
[15]
Thorsten Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 217–226.
[16]
Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining (Cambridge, United Kingdom). 781–789.
[17]
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems 30. 3146–3154.
[18]
Yanyan Lan, Tie-Yan Liu, Zhiming Ma, and Hang Li. 2009. Generalization Analysis of Listwise Learning-to-rank Algorithms. In Proceedings of the 26th Annual International Conference on Machine Learning (Montreal, Quebec, Canada). 577–584.
[19]
Yanyan Lan, Yadong Zhu, Jiafeng Guo, Shuzi Niu, and Xueqi Cheng. 2014. Position-Aware ListMLE: A Sequential Learning Process for Ranking. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (Quebec City, Quebec, Canada). 449–458.
[20]
Tie-Yan Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3, 3 (2009), 225–331.
[21]
Donald A Metzler, W Bruce Croft, and Andrew Mccallum. 2005. Direct maximization of rank-based metrics for information retrieval. CIIR report 429. University of Massachusetts.
[22]
Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, and Marc Najork. 2019. Self-Attentive Document Interaction Networks for Permutation Equivariant Ranking. arxiv:1910.09676
[23]
Tao Qin and Tie-Yan Liu. 2013. Introducing LETOR 4.0 Datasets. arxiv:1306.2597
[24]
Tao Qin, Tie-Yan Liu, and Hang Li. 2010. A general approximation framework for direct optimization of information retrieval measures. Information Retrieval 13, 4 (2010), 375–397.
[25]
Pradeep Ravikumar, Ambuj Tewari, and Eunho Yang. 2011. On NDCG Consistency of Listwise Ranking Methods. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Vol. 15. PMLR, 618–626.
[26]
Cynthia Rudin. 2009. The P-Norm Push: A Simple Convex Ranking Algorithm That Concentrates at the Top of the List. Journal of Machine Learning Research 10 (Dec. 2009), 2233–2271.
[27]
Cynthia Rudin and Yining Wang. 2018. Direct Learning to Rank and Rerank. In Proceedings of Artificial Intelligence and Statistics AISTATS.
[28]
Michael Taylor, John Guiver, Stephen Robertson, and Tom Minka. 2008. SoftRank: Optimizing Non-smooth Rank Metrics. In Proceedings of the 1st International Conference on Web Search and Data Mining. 77–86.
[29]
Ambuj Tewari and Sougata Chaudhuri. 2015. Generalization Error Bounds for Learning to Rank: Does the Length of Document Lists Matter?. In Proceedings of the 32nd International Conference on Machine Learning (Lille, France). 315–323.
[30]
Maksims N. Volkovs and Richard S. Zemel. [n.d.]. BoltzRank: learning to maximize expected ranking gain. In Proceedings of the 26th Annual International Conference on Machine Learning (Montreal, Quebec, Canada). 1089–1096.
[31]
Xuanhui Wang, Cheng Li, Nadav Golbandi, Michael Bendersky, and Marc Najork. 2018. The LambdaLoss Framework for Ranking Metric Optimization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy). 1313–1322.
[32]
Qiang Wu, Christopher JC Burges, Krysta M Svore, and Jianfeng Gao. 2010. Adapting boosting for information retrieval measures. Information Retrieval 13, 3 (2010), 254–270.
[33]
Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. 2008. Listwise approach to learning to rank: theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning. 1192–1199.
[34]
Jun Xu and Hang Li. 2007. AdaRank: A Boosting Algorithm for Information Retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 391–398.
[35]
Jun Xu, Tie-Yan Liu, Min Lu, Hang Li, and Wei-Ying Ma. 2008. Directly Optimizing Evaluation Measures in Learning to Rank. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Singapore, Singapore). 107–114.
[36]
Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Alexander Grushetsky, Yonghui Wu, Petr Mitrichev, Ethan Sterling, Nathan Bell, Walker Ravina, and Hai Qian. 2020. Interpretable Learning-to-Rank with Generalized Additive Models. arxiv:2005.02553

Cited By

View all
  • (2024)A Self-Distilled Learning to Rank Model for Ad Hoc RetrievalACM Transactions on Information Systems10.1145/368178442:6(1-28)Online publication date: 25-Jul-2024
  • (2024)A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor SearchProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657931(2261-2265)Online publication date: 10-Jul-2024
  • (2024)Enhancing cross entropy with a linearly adaptive loss function for optimized classification performanceScientific Reports10.1038/s41598-024-78858-614:1Online publication date: 9-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WWW '21: Proceedings of the Web Conference 2021
April 2021
4054 pages
ISBN:9781450383127
DOI:10.1145/3442381
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Information Retrieval
  2. Learning to Rank
  3. Ranking Metric Optimization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '21
Sponsor:
WWW '21: The Web Conference 2021
April 19 - 23, 2021
Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)84
  • Downloads (Last 6 weeks)9
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Self-Distilled Learning to Rank Model for Ad Hoc RetrievalACM Transactions on Information Systems10.1145/368178442:6(1-28)Online publication date: 25-Jul-2024
  • (2024)A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor SearchProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657931(2261-2265)Online publication date: 10-Jul-2024
  • (2024)Enhancing cross entropy with a linearly adaptive loss function for optimized classification performanceScientific Reports10.1038/s41598-024-78858-614:1Online publication date: 9-Nov-2024
  • (2024)ITNR: Inversion Transformer-based Neural Ranking for cancer drug recommendationsComputers in Biology and Medicine10.1016/j.compbiomed.2024.108312172(108312)Online publication date: Apr-2024
  • (2023)Distributionally robust learning-to-rank under the Wasserstein metricPLOS ONE10.1371/journal.pone.028357418:3(e0283574)Online publication date: 30-Mar-2023
  • (2023)LambdaRank Gradients are IncoherentProceedings of the 32nd ACM International Conference on Information and Knowledge Management10.1145/3583780.3614948(1777-1786)Online publication date: 21-Oct-2023
  • (2023)Multi-Label Learning to Rank through Multi-Objective OptimizationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599870(4605-4616)Online publication date: 6-Aug-2023
  • (2023)An Architectural Technical Debt Index Based on Machine Learning and Architectural SmellsIEEE Transactions on Software Engineering10.1109/TSE.2023.328617949:8(4169-4195)Online publication date: 1-Aug-2023
  • (2023)COLTR: Semi-Supervised Learning to Rank With Co-Training and Over-Parameterization for Web SearchIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327075035:12(12542-12555)Online publication date: 26-Apr-2023
  • (2023)Multilayer self‐attention residual network for code searchConcurrency and Computation: Practice and Experience10.1002/cpe.765035:9Online publication date: 13-Feb-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media