DOI: 10.1145/1835449.1835541

Reusable test collections through experimental design

Published: 19 July 2010

Abstract

Portable, reusable test collections are a vital part of research and development in information retrieval. Reusability is difficult to assess, however. The standard approach--simulating judgment collection when groups of systems are held out, then evaluating those held-out systems--only works when there is a large set of relevance judgments to draw on during the simulation. As test collections adapt to larger and larger corpora, it becomes less and less likely that there will be sufficient judgments for such simulation experiments. Thus we propose a methodology for information retrieval experimentation that collects evidence for or against the reusability of a test collection while judgments are being made. Using this methodology along with the appropriate statistical analyses, researchers will be able to estimate the reusability of their test collections while building them and implement "course corrections" if the collection does not seem to be achieving desired levels of reusability. We show the robustness of our design to inherent sources of variance, and provide a description of an actual implementation of the framework for creating a large test collection.
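The hold-out simulation the abstract describes can be sketched in a few lines: pool judgments from a subset of systems, evaluate all systems against that pool, and compare the resulting ranking to the one obtained from the full pool. The sketch below is illustrative only; the system count, pool depth, simulated relevance rates, and the treatment of unjudged documents as nonrelevant are all invented assumptions, not the paper's actual experimental setup.

```python
import itertools
import random

random.seed(0)

# Hypothetical setup: 6 systems, 10 topics, 100 documents per topic.
N_SYS, N_TOPICS, N_DOCS, POOL_DEPTH = 6, 10, 100, 10

# Simulated truth: roughly 10% of documents are relevant to each topic.
rels = [{d for d in range(N_DOCS) if random.random() < 0.1}
        for _ in range(N_TOPICS)]

def make_system(quality):
    """A synthetic system: higher quality ranks relevant docs higher."""
    runs = []
    for t in range(N_TOPICS):
        scores = {d: quality * (d in rels[t]) + random.random()
                  for d in range(N_DOCS)}
        runs.append(sorted(scores, key=scores.get, reverse=True))
    return runs

systems = [make_system(q) for q in [2.0, 1.6, 1.2, 0.8, 0.4, 0.1]]

def pool(contributors):
    """Depth-k pool: the judged documents per topic come only from
    the contributing systems' top-k results."""
    return [{d for s in contributors for d in s[t][:POOL_DEPTH]}
            for t in range(N_TOPICS)]

def avg_prec(run_t, judged_t, rels_t):
    """Average precision, treating unjudged documents as nonrelevant."""
    hits, total = 0, 0.0
    for i, d in enumerate(run_t, 1):
        if d in judged_t and d in rels_t:
            hits += 1
            total += hits / i
    return total / max(1, len(rels_t & judged_t))

def ranking(judged):
    """Rank all systems by mean average precision over the topics."""
    maps = [sum(avg_prec(s[t], judged[t], rels[t])
                for t in range(N_TOPICS)) / N_TOPICS
            for s in systems]
    return sorted(range(N_SYS), key=lambda i: maps[i], reverse=True)

def kendall_tau(a, b):
    """Kendall's tau between two rankings of the same items."""
    pos_a = {x: i for i, x in enumerate(a)}
    pos_b = {x: i for i, x in enumerate(b)}
    conc = disc = 0
    for x, y in itertools.combinations(a, 2):
        agree = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        conc += agree > 0
        disc += agree < 0
    n = len(a)
    return (conc - disc) / (n * (n - 1) / 2)

# Hold each system out of pooling in turn; a high tau between the
# full-pool ranking and the held-out ranking is evidence the pool
# ranks non-contributing systems fairly.
full = ranking(pool(systems))
for held_out in range(N_SYS):
    contributors = [s for i, s in enumerate(systems) if i != held_out]
    tau = kendall_tau(full, ranking(pool(contributors)))
    print(f"hold out system {held_out}: tau = {tau:.2f}")
```

The point of the paper is that this kind of simulation requires many judgments to already exist; its proposed design instead gathers the evidence for reusability while the judgments are being collected.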





Published In

SIGIR '10: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2010, 944 pages
ISBN: 9781450301534
DOI: 10.1145/1835449

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. evaluation
    2. information retrieval
    3. reusability
    4. test collections


    Acceptance Rates

SIGIR '10 Paper Acceptance Rate: 87 of 520 submissions, 17%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%


Cited By

    • (2022) Information Retrieval Evaluation. Online publication date: 10-Mar-2022
    • (2019) Traversing semantically annotated queries for task-oriented query recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems, pages 511-515. 10.1145/3298689.3346994. Online publication date: 10-Sep-2019
    • (2017) Building Test Collections. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1407-1410. 10.1145/3077136.3082064. Online publication date: 7-Aug-2017
    • (2016) Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1199-1201. 10.1145/2911451.2914803. Online publication date: 7-Jul-2016
    • (2016) A Short Survey on Online and Offline Methods for Search Quality Evaluation. In Information Retrieval, pages 38-87. 10.1007/978-3-319-41718-9_3. Online publication date: 26-Jul-2016
    • (2015) Statistical Significance Testing in Information Retrieval. In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pages 7-9. 10.1145/2808194.2809445. Online publication date: 27-Sep-2015
    • (2015) Pooling-based continuous evaluation of information retrieval systems. Information Retrieval Journal, 18(5):445-472. 10.1007/s10791-015-9266-y. Online publication date: 8-Sep-2015
    • (2014) Improving test collection pools with machine learning. In Proceedings of the 19th Australasian Document Computing Symposium, pages 2-9. 10.1145/2682862.2682864. Online publication date: 26-Nov-2014
    • (2013) Evaluation in Music Information Retrieval. Journal of Intelligent Information Systems, 41(3):345-369. 10.1007/s10844-013-0249-4. Online publication date: 1-Dec-2013
    • (2012) The Reusability of a Diversified Search Test Collection. In Information Retrieval Technology, pages 26-38. 10.1007/978-3-642-35341-3_3. Online publication date: 2012
