DOI: 10.1145/3209978.3210024
An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric

Published: 27 June 2018

Abstract

Many evaluation metrics have been defined to assess the effectiveness of ad-hoc retrieval and search result diversification systems. However, it is often unclear which evaluation metric should be used to analyze the performance of retrieval systems for a specific task. Axiomatic analysis is an informative mechanism for understanding the fundamentals of metrics and their suitability for particular scenarios. In this paper, we define a constraint-based axiomatic framework to study the suitability of existing metrics in search result diversification scenarios. The analysis informed the definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known Rank-Biased Precision metric -- that takes into account redundancy and the user effort associated with the inspection of documents in the ranking. Our experiments over standard diversity evaluation campaigns show that the proposed metric captures quality criteria reflected by different metrics, making it suitable in the absence of knowledge about particular features of the scenario under study.
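The proposed metric adapts Rank-Biased Precision (RBP) by Moffat and Zobel. As context, here is a minimal sketch of standard RBP, which models a user who inspects each next rank with persistence probability p; the RBU adaptation itself, with its redundancy and effort terms, is defined in the paper body and is not reproduced here:

```python
def rbp(relevances, p=0.8):
    """Rank-Biased Precision (Moffat & Zobel, 2008):
    RBP = (1 - p) * sum_{i>=1} p^(i-1) * r_i,
    where r_i in [0, 1] is the relevance of the document at rank i
    and p is the probability that the user continues to the next rank."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevances))
```

For example, with p = 0.5 a single relevant document at rank 1 yields RBP = 0.5, since the geometric discount halves the weight of each successive rank.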



    Published In

    SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
    June 2018
    1509 pages
    ISBN:9781450356572
    DOI:10.1145/3209978

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. axiomatic analysis
    2. evaluation
    3. search result diversification

    Qualifiers

    • Research-article

    Funding Sources

    • Spanish Government
    • Australian Research Council

    Conference

    SIGIR '18

    Acceptance Rates

    SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months)32
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 19 Sep 2024


    Cited By

    • (2024) Properties of Group Fairness Measures for Rankings. ACM Transactions on Social Computing. DOI: 10.1145/3674883. Online publication date: 27-Aug-2024.
    • (2024) Result Diversification in Search and Recommendation: A Survey. IEEE Transactions on Knowledge and Data Engineering, 36(10), 5354-5373. DOI: 10.1109/TKDE.2024.3382262. Online publication date: Oct-2024.
    • (2023) Information Retrieval Evaluation Measures Defined on Some Axiomatic Models of Preferences. ACM Transactions on Information Systems, 42(3), 1-35. DOI: 10.1145/3632171. Online publication date: 8-Nov-2023.
    • (2023) A Versatile Framework for Evaluating Ranked Lists in Terms of Group Fairness and Relevance. ACM Transactions on Information Systems, 42(1), 1-36. DOI: 10.1145/3589763. Online publication date: 30-May-2023.
    • (2023) Taking Search to Task. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, 1-13. DOI: 10.1145/3576840.3578288. Online publication date: 19-Mar-2023.
    • (2022) On the Effect of Ranking Axioms on IR Evaluation Metrics. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 13-23. DOI: 10.1145/3539813.3545153. Online publication date: 23-Aug-2022.
    • (2022) Towards Formally Grounded Evaluation Measures for Semantic Parsing-based Knowledge Graph Question Answering. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 3-12. DOI: 10.1145/3539813.3545146. Online publication date: 23-Aug-2022.
    • (2022) Perceptions of Diversity in Electronic Music: the Impact of Listener, Artist, and Track Characteristics. Proceedings of the ACM on Human-Computer Interaction, 6(CSCW1), 1-26. DOI: 10.1145/3512956. Online publication date: 7-Apr-2022.
    • (2022) Diversity in the Music Listening Experience: Insights from Focus Group Interviews. Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, 272-276. DOI: 10.1145/3498366.3505778. Online publication date: 14-Mar-2022.
    • (2022) Ranking Interruptus. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 588-598. DOI: 10.1145/3477495.3532051. Online publication date: 6-Jul-2022.
