DOI: 10.1145/1277741.1277755

Reliable information retrieval evaluation with incomplete and biased judgements

Published: 23 July 2007

Abstract

Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique.
We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially if the set of manual judgements is rich in documents, but highly biased against some systems.
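
The contrast between conventional pooled evaluation and bpref can be illustrated with a small sketch. It is not the method proposed in the paper. It only shows, under stated assumptions, why a system that did not contribute to the pool is penalised when unjudged documents are counted as non-relevant, and how bpref avoids this by ignoring them. The function names, the `qrels` dictionary layout, and the toy document IDs are hypothetical. The bpref variant follows the commonly used formulation that normalises by min(R, N), where R and N are the numbers of judged relevant and judged non-relevant documents.

```python
from typing import Dict, List


def average_precision(ranking: List[str], qrels: Dict[str, bool]) -> float:
    """Average precision; documents missing from qrels count as non-relevant."""
    num_rel = sum(1 for is_rel in qrels.values() if is_rel)
    if num_rel == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, False):
            hits += 1
            total += hits / rank
    return total / num_rel


def bpref(ranking: List[str], qrels: Dict[str, bool]) -> float:
    """bpref; documents missing from qrels (unjudged) are skipped entirely."""
    num_rel = sum(1 for is_rel in qrels.values() if is_rel)
    num_nonrel = len(qrels) - num_rel
    if num_rel == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc not in qrels:
            continue  # unjudged: neither rewarded nor penalised
        if qrels[doc]:
            if nonrel_seen == 0:
                total += 1.0
            else:
                penalty = min(nonrel_seen, num_rel) / min(num_rel, num_nonrel)
                total += 1.0 - penalty
        else:
            nonrel_seen += 1
    return total / num_rel


# Toy judgements obtained from a pool: d1 and d3 relevant, d2 and d4 not.
qrels = {"d1": True, "d2": False, "d3": True, "d4": False}

# A system that contributed to the pool: every retrieved document is judged.
pooled_run = ["d1", "d2", "d3", "d4"]
# A system outside the pool: u1 and u2 were never judged.
unpooled_run = ["u1", "d1", "u2", "d3"]

print(average_precision(pooled_run, qrels), bpref(pooled_run, qrels))
print(average_precision(unpooled_run, qrels), bpref(unpooled_run, qrels))
```

In this toy case the unpooled run scores lower than the pooled run on average precision (0.5 versus roughly 0.83) only because its unjudged documents are assumed non-relevant, while bpref gives it 1.0 because no judged non-relevant document precedes its relevant ones.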

    Published In

    SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2007
    946 pages
    ISBN: 9781595935977
    DOI: 10.1145/1277741
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. evaluation
    2. incomplete judgments
    3. information retrieval

    Qualifiers

    • Article

    Conference

    SIGIR07: The 30th Annual International SIGIR Conference
    July 23 - 27, 2007
    Amsterdam, The Netherlands

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
