DOI:10.1145/2911451.2911511

Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines

Published: 07 July 2016

Abstract

A robust retrieval system ensures that user experience is not damaged by the presence of poorly-performing queries. Such robustness can be measured by risk-sensitive evaluation measures, which assess the extent to which a system performs worse than a given baseline system. However, using a single, particular system as the baseline is problematic, because retrieval performance varies considerably among IR systems across topics. A single system therefore generally fails to provide enough information about the real baseline performance for every topic under consideration, and hence fails to measure the real risk associated with any given system. Based upon the Chi-squared statistic, we propose a new measure, ZRisk, which exhibits more promise since it takes multiple baselines into account when measuring risk, and a derivative measure called GeoRisk, which enhances ZRisk by also taking into account the overall magnitude of effectiveness. This paper demonstrates the benefits of ZRisk and GeoRisk upon TREC data, and shows how to exploit GeoRisk for risk-sensitive learning to rank, thereby making use of multiple baselines within the learning objective function to obtain effective yet risk-averse/robust ranking systems. Experiments using 10,000 topics from the MSLR learning-to-rank dataset demonstrate the efficacy of the proposed Chi-squared-based objective function.
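The abstract's ZRisk/GeoRisk idea can be sketched in code. The following is a minimal illustration, not the paper's exact formulation: the marginal-based expected score, the extra weighting of below-expectation topics via an `alpha` parameter, and the use of the standard normal CDF to fold ZRisk into GeoRisk are assumptions reconstructed from the abstract's description (Chi-squared-style residuals against multiple baselines, combined with overall effectiveness magnitude).

```python
import math
from statistics import NormalDist

def zrisk_georisk(scores, alpha=1.0):
    """Sketch of a Chi-squared-style risk computation over a
    systems x topics effectiveness matrix `scores` (positive values).

    Treating the matrix like a contingency table, an expected score
    e_ij is derived from the row and column marginals; the residual
    z_ij measures how far system i falls above or below expectation
    on topic j, and shortfalls (z_ij < 0) are weighted more heavily
    via alpha. Details are assumptions for illustration only.
    """
    n_sys, n_top = len(scores), len(scores[0])
    row = [sum(r) for r in scores]
    col = [sum(scores[i][j] for i in range(n_sys)) for j in range(n_top)]
    grand = sum(row)
    phi = NormalDist().cdf  # standard normal CDF
    zrisk, georisk = [], []
    for i in range(n_sys):
        z_i = 0.0
        for j in range(n_top):
            e = row[i] * col[j] / grand            # expected score from marginals
            z = (scores[i][j] - e) / math.sqrt(e)  # standardised residual
            z_i += z if z >= 0 else (1 + alpha) * z  # penalise shortfalls
        zrisk.append(z_i)
        mean_eff = row[i] / n_top
        # combine effectiveness magnitude with (normalised) risk
        georisk.append(math.sqrt(mean_eff * phi(z_i / n_top)))
    return zrisk, georisk
```

Because alpha only scales the negative residuals, raising it can never increase a system's ZRisk; a system that beats expectation on every topic is unaffected, while one with shortfalls is pushed down.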





Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. baselines
  2. learning-to-rank
  3. risk-sensitive evaluation

Qualifiers

  • Research-article

Funding Sources

  • TUBITAK

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 Paper Acceptance Rate: 62 of 341 submissions, 18%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%


Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 16 Nov 2024


Cited By

  • (2023) Selective Query Processing: A Risk-Sensitive Selection of Search Configurations. ACM Transactions on Information Systems 42(1), 1-35. DOI: 10.1145/3608474. Online publication date: 21-Aug-2023.
  • (2022) Risk-Sensitive Deep Neural Learning to Rank. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 803-813. DOI: 10.1145/3477495.3532056. Online publication date: 6-Jul-2022.
  • (2022) A bias-variance evaluation framework for information retrieval systems. Information Processing and Management 59(1). DOI: 10.1016/j.ipm.2021.102747. Online publication date: 1-Jan-2022.
  • (2021) Defining an Optimal Configuration Set for Selective Search Strategy - A Risk-Sensitive Approach. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 1335-1345. DOI: 10.1145/3459637.3482422. Online publication date: 26-Oct-2021.
  • (2021) Bayesian System Inference on Shallow Pools. Advances in Information Retrieval, 209-215. DOI: 10.1007/978-3-030-72240-1_17. Online publication date: 28-Mar-2021.
  • (2020) Bayesian Inferential Risk Evaluation On Multiple IR Systems. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 339-348. DOI: 10.1145/3397271.3401033. Online publication date: 25-Jul-2020.
  • (2019) Evaluating Risk-Sensitive Text Retrieval. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1455. DOI: 10.1145/3331184.3331423. Online publication date: 18-Jul-2019.
  • (2019) Risk-Sensitive Learning to Rank with Evolutionary Multi-Objective Feature Selection. ACM Transactions on Information Systems 37(2), 1-34. DOI: 10.1145/3300196. Online publication date: 14-Feb-2019.
  • (2019) Joint Optimization of Cascade Ranking Models. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 15-23. DOI: 10.1145/3289600.3290986. Online publication date: 30-Jan-2019.
  • (2019) On the Pluses and Minuses of Risk. Information Retrieval Technology, 81-93. DOI: 10.1007/978-3-030-42835-8_8. Online publication date: 7-Nov-2019.
