DOI: 10.1145/1571941.1572019
research-article

Towards methods for the collective gathering and quality control of relevance assessments

Published: 19 July 2009

Abstract

Growing interest in online collections of digital books and video content motivates the development and optimization of adequate retrieval systems. However, traditional methods for collecting relevance assessments to tune system performance are challenged by the nature of digital items in such collections, where assessors are faced with a considerable effort to review and assess content by extensive reading, browsing, and within-document searching. The extra strain is caused by the length and cohesion of the digital item and the dispersion of topics within it. We propose a method for the collective gathering of relevance assessments using a social game model to instigate participants' engagement. The game provides incentives for assessors to follow a predefined review procedure and makes provisions for the quality control of the collected relevance judgments. We discuss the approach in detail, and present the results of a pilot study conducted on a book corpus to validate the approach. Our analysis reveals intricate relationships between the affordances of the system, the incentives of the social game, and the behavior of the assessors. We show that the proposed game design achieves two designated goals: the incentive structure motivates endurance in assessors and the review process encourages truthful assessment.
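The paper itself details the game's review procedure and incentive structure; as a purely illustrative sketch of the kind of quality control such a pipeline needs, the snippet below computes Cohen's kappa, a standard chance-corrected agreement measure between two assessors' judgments. The function name and the sample judgments are hypothetical and do not come from the paper.

```python
# Illustrative only: pairwise assessor agreement (Cohen's kappa) is one
# standard quality-control check for collected relevance judgments.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two assessors' relevance judgments on the same
    items, corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where the two assessors match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label independently.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(labels_a) | set(labels_b)
    )
    if expected == 1.0:  # both assessors used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)

# Two assessors judging the same ten book pages (1 = relevant, 0 = not).
a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

A kappa near 0 means the assessors agree no more than chance would predict; values well above 0 suggest the judgments are consistent enough to pool into a test collection.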




    Published In

    cover image ACM Conferences
    SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
    July 2009
    896 pages
    ISBN:9781605584836
    DOI:10.1145/1571941
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. relevance assessments
    2. social game
    3. test collection construction

    Qualifiers

    • Research-article

    Conference

    SIGIR '09

    Acceptance Rates

    Overall Acceptance Rate: 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0

    Reflects downloads up to 26 Jan 2025

    Cited By

    • (2023) Relevance Judgment Convergence Degree – A Measure of Inconsistency among Assessors for Information Retrieval. Proceedings of the 30th International Conference on Information Systems Development. DOI: 10.62036/ISD.2022.38
    • (2023) Relevance Judgment Convergence Degree – A Measure of Assessors Inconsistency for Information Retrieval Datasets. Advances in Information Systems Development, 149-168. DOI: 10.1007/978-3-031-32418-5_9
    • (2019) The Practice of Crowdsourcing. Synthesis Lectures on Information Concepts, Retrieval, and Services 11(1), 1-149. DOI: 10.2200/S00904ED1V01Y201903ICR066
    • (2019) Idiom-based features in sentiment analysis: Cutting the Gordian knot. IEEE Transactions on Affective Computing. DOI: 10.1109/TAFFC.2017.2777842
    • (2017) Building Test Collections. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1407-1410. DOI: 10.1145/3077136.3082064
    • (2016) Collaborative construction of metadata and full-text dataset. 2016 XI Latin American Conference on Learning Objects and Technology (LACLO), 1-6. DOI: 10.1109/LACLO.2016.7751767
    • (2015) Phrase detectives. Proceedings of the 24th International Conference on Artificial Intelligence, 4202-4206. DOI: 10.5555/2832747.2832841
    • (2015) On the Relation Between Assessor's Agreement and Accuracy in Gamified Relevance Assessment. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 605-614. DOI: 10.1145/2766462.2767727
    • (2015) Bibliography. Games with a Purpose (GWAPs), 127-134. DOI: 10.1002/9781119136309.biblio
    • (2014) PageFetch 2. Proceedings of the First International Workshop on Gamification for Information Retrieval, 38-41. DOI: 10.1145/2594776.2594784
