research-article

An Alternative Cross Entropy Loss for Learning-to-Rank

Author:

Sebastian BruchAuthors Info & Claims

WWW '21: Proceedings of the Web Conference 2021

Pages 118 - 126

https://doi.org/10.1145/3442381.3449794

Published: 03 June 2021 Publication History

Abstract

Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set—as a surrogate to a typically non-differentiable ranking metric. Despite their empirical success, existing listwise methods are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross entropy-based learning-to-rank loss function that is theoretically sound, is a convex bound on NDCG—a popular ranking metric—and is consistent with NDCG under learning scenarios common in information retrieval. Furthermore, empirical evaluation of an implementation of the proposed method with gradient boosting machines on benchmark learning-to-rank datasets demonstrates the superiority of our proposed formulation over existing algorithms in quality and robustness.

References

[1]

Sebastian Bruch, Shuguang Han, Michael Bendersky, and Marc Najork. 2020. A Stochastic Treatment of Learning to Rank Scoring Functions. In Proceedings of the 13th International Conference on Web Search and Data Mining. 61–69.

Digital Library

[2]

Sebastian Bruch, Xuanhui Wang, Mike Bendersky, and Marc Najork. 2019. An Analysis of the Softmax Cross Entropy Loss for Learning-to-Rank with Binary Relevance. In Proceedings of the 2019 ACM SIGIR International Conference on the Theory of Information Retrieval.

Digital Library

[3]

Sebastian Bruch, Masrour Zoghi, Mike Bendersky, and Marc Najork. 2019. Revisiting Approximate Metric Optimization in the Age of Deep Neural Networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval.

Digital Library

[4]

Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning. 89–96.

Digital Library

[5]

Christopher J.C. Burges. 2010. From RankNet to LambdaRank to LambdaMART: An Overview. Technical Report MSR-TR-2010-82. Microsoft Research.

[6]

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning. 129–136.

Digital Library

[7]

Olivier Chapelle and Yi Chang. 2011. Yahoo! Learning to Rank Challenge Overview. 1–24.

[8]

Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected Reciprocal Rank for Graded Relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management. 621–630.

Digital Library

[9]

Olivier Chapelle and Mingrui Wu. 2010. Gradient Descent Optimization of Smoothed Information Retrieval Metrics. Information Retrieval 13, 3 (June 2010), 216–235.

Digital Library

[10]

Stéphan Clémençon and Nicolas Vayatis. 2008. Empirical Performance Maximization for Linear Rank Statistics. In Proceedings of the 21st International Conference on Neural Information Processing Systems(Vancouver, British Columbia, Canada). 305–312.

Digital Library

[11]

Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An Experimental Comparison of Click Position-bias Models. In Proceedings of the 2008 International Conference on Web Search and Data Mining (Palo Alto, California, USA). 87–94.

Digital Library

[12]

Felipe Cucker and Steve Smale. 2002. On the mathematical foundations of learning. Bull. Amer. Math. Soc. 39 (2002), 1–49.

[13]

Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 29, 5 (2001), 1189–1232.

[14]

Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4 (2002), 422–446.

Digital Library

[15]

Thorsten Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 217–226.

Digital Library

[16]

Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining (Cambridge, United Kingdom). 781–789.

Digital Library

[17]

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems 30. 3146–3154.

Digital Library

[18]

Yanyan Lan, Tie-Yan Liu, Zhiming Ma, and Hang Li. 2009. Generalization Analysis of Listwise Learning-to-rank Algorithms. In Proceedings of the 26th Annual International Conference on Machine Learning (Montreal, Quebec, Canada). 577–584.

Digital Library

[19]

Yanyan Lan, Yadong Zhu, Jiafeng Guo, Shuzi Niu, and Xueqi Cheng. 2014. Position-Aware ListMLE: A Sequential Learning Process for Ranking. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (Quebec City, Quebec, Canada). 449–458.

[20]

Tie-Yan Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3, 3 (2009), 225–331.

Digital Library

[21]

Donald A Metzler, W Bruce Croft, and Andrew Mccallum. 2005. Direct maximization of rank-based metrics for information retrieval. CIIR report 429. University of Massachusetts.

[22]

Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, and Marc Najork. 2019. Self-Attentive Document Interaction Networks for Permutation Equivariant Ranking. arxiv:1910.09676

[23]

Tao Qin and Tie-Yan Liu. 2013. Introducing LETOR 4.0 Datasets. arxiv:1306.2597

[24]

Tao Qin, Tie-Yan Liu, and Hang Li. 2010. A general approximation framework for direct optimization of information retrieval measures. Information Retrieval 13, 4 (2010), 375–397.

Digital Library

[25]

Pradeep Ravikumar, Ambuj Tewari, and Eunho Yang. 2011. On NDCG Consistency of Listwise Ranking Methods. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Vol. 15. PMLR, 618–626.

[26]

Cynthia Rudin. 2009. The P-Norm Push: A Simple Convex Ranking Algorithm That Concentrates at the Top of the List. Journal of Machine Learning Research 10 (Dec. 2009), 2233–2271.

[27]

Cynthia Rudin and Yining Wang. 2018. Direct Learning to Rank and Rerank. In Proceedings of Artificial Intelligence and Statistics AISTATS.

[28]

Michael Taylor, John Guiver, Stephen Robertson, and Tom Minka. 2008. SoftRank: Optimizing Non-smooth Rank Metrics. In Proceedings of the 1st International Conference on Web Search and Data Mining. 77–86.

Digital Library

[29]

Ambuj Tewari and Sougata Chaudhuri. 2015. Generalization Error Bounds for Learning to Rank: Does the Length of Document Lists Matter?. In Proceedings of the 32nd International Conference on Machine Learning (Lille, France). 315–323.

[30]

Maksims N. Volkovs and Richard S. Zemel. [n.d.]. BoltzRank: learning to maximize expected ranking gain. In Proceedings of the 26th Annual International Conference on Machine Learning (Montreal, Quebec, Canada). 1089–1096.

[31]

Xuanhui Wang, Cheng Li, Nadav Golbandi, Michael Bendersky, and Marc Najork. 2018. The LambdaLoss Framework for Ranking Metric Optimization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy). 1313–1322.

Digital Library

[32]

Qiang Wu, Christopher JC Burges, Krysta M Svore, and Jianfeng Gao. 2010. Adapting boosting for information retrieval measures. Information Retrieval 13, 3 (2010), 254–270.

Digital Library

[33]

Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. 2008. Listwise approach to learning to rank: theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning. 1192–1199.

Digital Library

[34]

Jun Xu and Hang Li. 2007. AdaRank: A Boosting Algorithm for Information Retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 391–398.

Digital Library

[35]

Jun Xu, Tie-Yan Liu, Min Lu, Hang Li, and Wei-Ying Ma. 2008. Directly Optimizing Evaluation Measures in Learning to Rank. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Singapore, Singapore). 107–114.

Digital Library

[36]

Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Alexander Grushetsky, Yonghui Wu, Petr Mitrichev, Ethan Sterling, Nathan Bell, Walker Ravina, and Hai Qian. 2020. Interpretable Learning-to-Rank with Generalized Additive Models. arxiv:2005.02553

Cited By

Keshvari SSaeedi FSadoghi Yazdi HEnsan F(2024)A Self-Distilled Learning to Rank Model for Ad Hoc RetrievalACM Transactions on Information Systems10.1145/368178442:6(1-28)Online publication date: 25-Jul-2024
https://dl.acm.org/doi/10.1145/3681784
Vecchiato TLucchese CNardini FBruch SHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor SearchProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657931(2261-2265)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657931
Shim J(2024)Enhancing cross entropy with a linearly adaptive loss function for optimized classification performanceScientific Reports10.1038/s41598-024-78858-614:1Online publication date: 9-Nov-2024
https://doi.org/10.1038/s41598-024-78858-6
Show More Cited By

Recommendations

On Application of Learning to Rank for E-Commerce Search
SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

E-Commerce (E-Com) search is an emerging important new application of information retrieval. Learning to Rank (LETOR) is a general effective strategy for optimizing search engines, and is thus also a key technology for E-Com search. While the use of ...
Learning to rank by optimizing expected reciprocal rank
AIRS'11: Proceedings of the 7th Asia conference on Information Retrieval Technology

Learning to rank is one of the most hot research areas in information retrieval, among which listwise approach is an important research direction and the methods that directly optimizing evaluation metrics in listwise approach have been used for ...
Enhanced Learning to Rank using Cluster-loss Adjustment
CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data

Most Learning To Rank (LTR) algorithms like Ranking SVM, RankNet, LambdaRank and LambdaMART use only relevance label judgments as ground truth for training. But in common scenarios like ranking of information cards (google now, other personal assistants),...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WWW '21: Proceedings of the Web Conference 2021

April 2021

4054 pages

ISBN:9781450383127

DOI:10.1145/3442381

Editors:
Jure Leskovec
Stanford
,
Marko Grobelnik
Jožef Stefan Institute
,
Marc Najork
Google
,
Jie Tang
Tsinghua University
,
Leila Zia
Wikimedia Foundation

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '21

Sponsor:

SIGWEB

WWW '21: The Web Conference 2021

April 19 - 23, 2021

Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

17
Total Citations
View Citations
548
Total Downloads

Downloads (Last 12 months)84
Downloads (Last 6 weeks)9

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Keshvari SSaeedi FSadoghi Yazdi HEnsan F(2024)A Self-Distilled Learning to Rank Model for Ad Hoc RetrievalACM Transactions on Information Systems10.1145/368178442:6(1-28)Online publication date: 25-Jul-2024
https://dl.acm.org/doi/10.1145/3681784
Vecchiato TLucchese CNardini FBruch SHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor SearchProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657931(2261-2265)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657931
Shim J(2024)Enhancing cross entropy with a linearly adaptive loss function for optimized classification performanceScientific Reports10.1038/s41598-024-78858-614:1Online publication date: 9-Nov-2024
https://doi.org/10.1038/s41598-024-78858-6
Sotudian SPaschalidis I(2024)ITNR: Inversion Transformer-based Neural Ranking for cancer drug recommendationsComputers in Biology and Medicine10.1016/j.compbiomed.2024.108312172(108312)Online publication date: Apr-2024
https://doi.org/10.1016/j.compbiomed.2024.108312
Sotudian SChen RPaschalidis I(2023)Distributionally robust learning-to-rank under the Wasserstein metricPLOS ONE10.1371/journal.pone.028357418:3(e0283574)Online publication date: 30-Mar-2023
https://doi.org/10.1371/journal.pone.0283574
Marcuzzi FLucchese COrlando SFrommholz IHopfgartner FLee MOakes MLalmas MZhang MSantos R(2023)LambdaRank Gradients are IncoherentProceedings of the 32nd ACM International Conference on Information and Knowledge Management10.1145/3583780.3614948(1777-1786)Online publication date: 21-Oct-2023
https://dl.acm.org/doi/10.1145/3583780.3614948
Mahapatra DDong CChen YMomma MSingh ASun YAkoglu LGunopulos DYan XKumar ROzcan FYe J(2023)Multi-Label Learning to Rank through Multi-Objective OptimizationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599870(4605-4616)Online publication date: 6-Aug-2023
https://dl.acm.org/doi/10.1145/3580305.3599870
Sas DAvgeriou P(2023)An Architectural Technical Debt Index Based on Machine Learning and Architectural SmellsIEEE Transactions on Software Engineering10.1109/TSE.2023.328617949:8(4169-4195)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.1109/TSE.2023.3286179
Li YXiong HWang QKong LLiu HLi HBian JWang SChen GDou DYin D(2023)COLTR: Semi-Supervised Learning to Rank With Co-Training and Over-Parameterization for Web SearchIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327075035:12(12542-12555)Online publication date: 26-Apr-2023
https://dl.acm.org/doi/10.1109/TKDE.2023.3270750
Hu HLiu JZhang X(2023)Multilayer self‐attention residual network for code searchConcurrency and Computation: Practice and Experience10.1002/cpe.765035:9Online publication date: 13-Feb-2023
https://doi.org/10.1002/cpe.7650
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents