Reproduce and Improve: An Evolutionary Approach to Select a Few Good Topics for Information Retrieval Evaluation

Published: 29 September 2018

Abstract

Effectiveness evaluation of information retrieval systems by means of a test collection is a widely used methodology. However, it is rather expensive in terms of resources, time, and money; therefore, many researchers have proposed methods for a cheaper evaluation. One particular approach, on which we focus in this article, is to use fewer topics: in TREC-like initiatives, system effectiveness is usually evaluated as the average effectiveness on a set of n topics (usually n=50, although more than 1,000 have also been adopted); instead of using the full set, it has been proposed to find small subsets of a few good topics that evaluate the systems as similarly as possible to the full set. The computational complexity of the task has so far limited the analyses that could be performed. We develop a novel and efficient approach based on a multi-objective evolutionary algorithm. The higher efficiency of our new implementation allows us to reproduce some notable results on topic set reduction, as well as to perform new experiments that generalize and improve them. We show that our approach both reproduces the main state-of-the-art results and lets us analyze the effect of the collection, metric, and pool depth used for the evaluation. Finally, unlike previous studies, which have been mainly theoretical, we also discuss some practical topic selection strategies, integrating results of automatic evaluation approaches.
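To make the approach concrete, here is a minimal sketch of the topic-subset search idea. It is not the article's implementation: the article relies on a multi-objective evolutionary algorithm, whereas this toy version runs a single-objective (mu + lambda) loop over hypothetical random data, scoring each candidate subset of c topics by the Kendall's tau correlation between the system ranking it induces and the ranking induced by the full topic set. All identifiers and dimensions (scores, n_systems, n_topics, c, population and generation sizes) are illustrative assumptions, not values from the article.

# Minimal, illustrative sketch (not the article's implementation): evolve a
# subset of c topics whose per-system mean scores rank systems as similarly
# as possible to the full topic set. Data here are hypothetical random values.
import random

import numpy as np
from scipy.stats import kendalltau

rng = random.Random(42)
n_systems, n_topics, c = 100, 50, 10                    # toy dimensions (assumptions)
scores = np.random.default_rng(0).random((n_systems, n_topics))  # stand-in for per-topic AP
full_means = scores.mean(axis=1)                        # system ranking on the full set

def fitness(subset):
    """Kendall's tau between the subset-based and full-set system rankings."""
    sub_means = scores[:, sorted(subset)].mean(axis=1)
    tau, _ = kendalltau(sub_means, full_means)
    return tau

def mutate(subset):
    """Swap one topic inside the subset for one currently outside it."""
    inside = sorted(subset)
    outside = [t for t in range(n_topics) if t not in subset]
    inside[rng.randrange(len(inside))] = rng.choice(outside)
    return frozenset(inside)

# Simple (mu + lambda) evolutionary loop; the article works with a
# multi-objective algorithm, which is omitted here for brevity.
population = [frozenset(rng.sample(range(n_topics), c)) for _ in range(30)]
for _ in range(200):
    offspring = [mutate(p) for p in population]
    population = sorted(set(population) | set(offspring), key=fitness, reverse=True)[:30]

best = population[0]
print("best subset:", sorted(best), "tau:", round(fitness(best), 3))

In a realistic use, scores would contain per-topic effectiveness values (e.g., average precision) of actual TREC runs, and the search would be repeated for several subset cardinalities so that the correlation achievable with few topics can be compared against the full 50-topic set.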

Published In

Journal of Data and Information Quality, Volume 10, Issue 3
Special Issue on Reproducibility in IR: Evaluation Campaigns, Collections and Analyses
September 2018
94 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3282439
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 September 2018
Accepted: 01 July 2018
Revised: 01 April 2018
Received: 01 October 2017
Published in JDIQ Volume 10, Issue 3


Author Tags

  1. Test collection
  2. evolutionary algorithms
  3. few topics
  4. reproducibility
  5. topic selection strategy
  6. topic sets

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 25 Nov 2024

Cited By

  • (2023) How Many Crowd Workers Do I Need? On Statistical Power when Crowdsourcing Relevance Judgments. ACM Transactions on Information Systems 42(1), 1-26. DOI: 10.1145/3597201. Online publication date: 22-May-2023.
  • (2021) Estimation of Fair Ranking Metrics with Incomplete Judgments. Proceedings of the Web Conference 2021, 1065-1075. DOI: 10.1145/3442381.3450080. Online publication date: 19-Apr-2021.
  • (2020) Effectiveness evaluation without human relevance judgments. Information Processing and Management: an International Journal 57(2). DOI: 10.1016/j.ipm.2019.102149. Online publication date: 1-Mar-2020.
  • (2020) Fewer topics? A million topics? Both?! On topics subsets in test collections. Information Retrieval 23(1), 49-85. DOI: 10.1007/s10791-019-09357-w. Online publication date: 1-Feb-2020.
  • (2019) Towards Stochastic Simulations of Relevance Profiles. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2217-2220. DOI: 10.1145/3357384.3358123. Online publication date: 3-Nov-2019.
