Measuring ineffectiveness
EM Voorhees - Proceedings of the 27th annual international ACM SIGIR conference on …, 2004 - dl.acm.org
An evaluation methodology that targets ineffective topics is needed to support research on obtaining more consistent retrieval across topics. Using average values of traditional evaluation measures is not an appropriate methodology because it emphasizes effective topics: poorly performing topics' scores are by definition small, and they are therefore difficult to distinguish from the noise inherent in retrieval evaluation. We examine two new measures that emphasize a system's worst topics. While these measures focus on different aspects of retrieval behavior than traditional measures, the measures are less stable than traditional measures and the margin of error associated with the new measures is large relative to the observed differences in scores.
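The snippet does not specify the two new measures the abstract refers to. Purely as an illustration of the underlying point, that averaging hides poorly performing topics, the sketch below contrasts the arithmetic mean of per-topic average precision with a geometric mean, a well-known alternative that weights a system's worst topics much more heavily. The scores are invented for the example and are not from the paper.

```python
import math

def arithmetic_map(ap_scores):
    """Mean average precision: dominated by high-scoring topics."""
    return sum(ap_scores) / len(ap_scores)

def geometric_map(ap_scores, eps=1e-5):
    """Geometric mean of AP: near-zero per-topic scores pull the mean
    down sharply, so it emphasizes a system's worst topics.
    eps guards against log(0) for topics scoring exactly zero."""
    return math.exp(sum(math.log(max(s, eps)) for s in ap_scores) / len(ap_scores))

# Two hypothetical systems with identical arithmetic means:
sys_a = [0.50, 0.50, 0.50, 0.50]   # consistent across topics
sys_b = [0.95, 0.95, 0.09, 0.01]   # strong on some topics, failing on others

print(arithmetic_map(sys_a), arithmetic_map(sys_b))  # both 0.50
print(geometric_map(sys_a), geometric_map(sys_b))    # ~0.50 vs ~0.17
```

Under the arithmetic mean the two systems are indistinguishable; the geometric mean separates them by penalizing the near-failures, which is the kind of behavior a worst-topics measure is meant to capture.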