DOI: 10.1145/3209978.3210024
An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric

Published: 27 June 2018

Abstract

Many evaluation metrics have been defined to assess the effectiveness of ad-hoc retrieval and search result diversification systems. However, it is often unclear which evaluation metric should be used to analyze the performance of retrieval systems for a specific task. Axiomatic analysis is an informative mechanism for understanding the fundamentals of metrics and their suitability for particular scenarios. In this paper, we define a constraint-based axiomatic framework to study the suitability of existing metrics in search result diversification scenarios. The analysis informed the definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known Rank-Biased Precision metric -- that takes into account redundancy and the user effort associated with the inspection of documents in the ranking. Our experiments over standard diversity evaluation campaigns show that the proposed metric captures quality criteria reflected by different metrics, making it suitable in the absence of knowledge about particular features of the scenario under study.
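The proposed metric adapts Rank-Biased Precision (RBP) by Moffat and Zobel. As context, here is a minimal sketch of standard RBP, which models a user who inspects each next rank with persistence probability p; the RBU adaptation itself, with its redundancy and effort terms, is defined in the paper body and is not reproduced here:

```python
def rbp(relevances, p=0.8):
    """Rank-Biased Precision (Moffat & Zobel, 2008):
    RBP = (1 - p) * sum_{i>=1} p^(i-1) * r_i,
    where r_i in [0, 1] is the relevance of the document at rank i
    and p is the probability that the user continues to the next rank."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevances))
```

For example, with p = 0.5 a single relevant document at rank 1 yields RBP = 0.5, since the geometric discount halves the weight of each successive rank.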



    Published In

    SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
    June 2018
    1509 pages
    ISBN:9781450356572
    DOI:10.1145/3209978

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. axiomatic analysis
    2. evaluation
    3. search result diversification

    Qualifiers

    • Research-article

    Funding Sources

    • Spanish Government
    • Australian Research Council

    Conference

    SIGIR '18

    Acceptance Rates

    SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months)32
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 19 Sep 2024


    Cited By

    • (2024) Properties of Group Fairness Measures for Rankings. ACM Transactions on Social Computing. DOI: 10.1145/3674883. Online publication date: 27-Aug-2024.
    • (2024) Result Diversification in Search and Recommendation: A Survey. IEEE Transactions on Knowledge and Data Engineering, 36(10), 5354-5373. DOI: 10.1109/TKDE.2024.3382262. Online publication date: Oct-2024.
    • (2023) Information Retrieval Evaluation Measures Defined on Some Axiomatic Models of Preferences. ACM Transactions on Information Systems, 42(3), 1-35. DOI: 10.1145/3632171. Online publication date: 8-Nov-2023.
    • (2023) A Versatile Framework for Evaluating Ranked Lists in Terms of Group Fairness and Relevance. ACM Transactions on Information Systems, 42(1), 1-36. DOI: 10.1145/3589763. Online publication date: 30-May-2023.
    • (2023) Taking Search to Task. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, 1-13. DOI: 10.1145/3576840.3578288. Online publication date: 19-Mar-2023.
    • (2022) On the Effect of Ranking Axioms on IR Evaluation Metrics. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 13-23. DOI: 10.1145/3539813.3545153. Online publication date: 23-Aug-2022.
    • (2022) Towards Formally Grounded Evaluation Measures for Semantic Parsing-based Knowledge Graph Question Answering. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 3-12. DOI: 10.1145/3539813.3545146. Online publication date: 23-Aug-2022.
    • (2022) Perceptions of Diversity in Electronic Music: the Impact of Listener, Artist, and Track Characteristics. Proceedings of the ACM on Human-Computer Interaction, 6(CSCW1), 1-26. DOI: 10.1145/3512956. Online publication date: 7-Apr-2022.
    • (2022) Diversity in the Music Listening Experience: Insights from Focus Group Interviews. Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, 272-276. DOI: 10.1145/3498366.3505778. Online publication date: 14-Mar-2022.
    • (2022) Ranking Interruptus. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 588-598. DOI: 10.1145/3477495.3532051. Online publication date: 6-Jul-2022.
