
Examining Additivity and Weak Baselines

Published: 09 June 2016

Abstract

We present a study of which baseline to use when testing a new retrieval technique. In contrast to past work, we show that measuring a statistically significant improvement over a weak baseline is not a good predictor of whether a similar improvement will be measured on a strong baseline. Sometimes strong baselines are made worse when a new technique is applied. We investigate whether conducting comparisons against a range of weaker baselines can increase confidence that an observed effect will also show improvements on a stronger baseline. Our results indicate that this is not the case: at best, testing against a range of baselines means that an experimenter can be more confident that the new technique is unlikely to significantly harm a strong baseline. Examining recent past work, we present evidence that the information retrieval (IR) community continues to test against weak baselines. This is unfortunate as, in light of our experiments, we conclude that the only way to be confident that a new technique is a contribution is to compare it against nothing less than the state of the art.
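The kind of comparison the abstract describes can be sketched as a paired significance test over per-topic effectiveness scores. The sketch below is ours, not the paper's protocol: the per-topic MAP scores are invented, the `paired_t` helper is a hypothetical illustration, and only the Python standard library is used.

```python
import math

def paired_t(a, b):
    """Paired t statistic over per-topic score differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1          # t statistic, dof

# Hypothetical per-topic MAP scores (illustrative only, not from the paper).
weak       = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.10]  # weak baseline
weak_new   = [0.14, 0.16, 0.11, 0.19, 0.15, 0.12, 0.17, 0.13]  # + new technique
strong     = [0.30, 0.28, 0.25, 0.33, 0.29, 0.27, 0.31, 0.26]  # strong baseline
strong_new = [0.31, 0.27, 0.26, 0.32, 0.30, 0.26, 0.32, 0.25]  # + new technique

t_weak, dof = paired_t(weak_new, weak)      # large t: significant gain
t_strong, _ = paired_t(strong_new, strong)  # t near zero: gain vanishes
print(f"weak: t = {t_weak:.2f}, strong: t = {t_strong:.2f} (dof = {dof})")
```

With 7 degrees of freedom the two-tailed 5% critical value is about 2.36, so in this invented example the same additive technique tests as a significant win over the weak baseline while leaving the strong baseline statistically unchanged, which is exactly the non-additivity the abstract warns about.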




    Published In

    ACM Transactions on Information Systems, Volume 34, Issue 4
    September 2016
    217 pages
    ISSN: 1046-8188
    EISSN: 1558-2868
    DOI: 10.1145/2954381

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 June 2016
    Accepted: 01 January 2016
    Revised: 01 November 2015
    Received: 01 July 2015
    Published in TOIS Volume 34, Issue 4


    Author Tags

    1. baselines
    2. evaluation
    3. information retrieval

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Australian Research Council's Discovery Projects scheme
    • NICTA


    Cited By

    • (2024) What Happened in CLEF… For Another While? Experimental IR Meets Multilinguality, Multimodality, and Interaction, 3-57. DOI: 10.1007/978-3-031-71736-9_1. Online publication date: 14-Sep-2024.
    • (2023) DECAF: A Modular and Extensible Conversational Search Framework. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3075-3085. DOI: 10.1145/3539618.3591913. Online publication date: 19-Jul-2023.
    • (2023) A Geometric Framework for Query Performance Prediction in Conversational Search. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1355-1365. DOI: 10.1145/3539618.3591625. Online publication date: 19-Jul-2023.
    • (2022) A bias–variance evaluation framework for information retrieval systems. Information Processing and Management 59(1). DOI: 10.1016/j.ipm.2021.102747. Online publication date: 1-Jan-2022.
    • (2021) A Comparison between Term-Independence Retrieval Models for Ad Hoc Retrieval. ACM Transactions on Information Systems 40(3), 1-37. DOI: 10.1145/3483612. Online publication date: 8-Dec-2021.
    • (2021) Component-based Analysis of Dynamic Search Performance. ACM Transactions on Information Systems 40(3), 1-47. DOI: 10.1145/3483237. Online publication date: 22-Nov-2021.
    • (2021) The Simplest Thing That Can Possibly Work: (Pseudo-)Relevance Feedback via Text Classification. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval, 123-129. DOI: 10.1145/3471158.3472261. Online publication date: 11-Jul-2021.
    • (2020) Examining the Additivity of Top-k Query Processing Innovations. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 1085-1094. DOI: 10.1145/3340531.3412000. Online publication date: 19-Oct-2020.
    • (2020) Supervised approaches for explicit search result diversification. Information Processing & Management 57(6). DOI: 10.1016/j.ipm.2020.102356. Online publication date: Nov-2020.
    • (2020) Predicting the Size of Candidate Document Set for Implicit Web Search Result Diversification. Advances in Information Retrieval, 410-417. DOI: 10.1007/978-3-030-45442-5_51. Online publication date: 14-Apr-2020.
