High-throughput crowdsourcing mechanisms for complex tasks

Guido Sautter¹ & Klemens Böhm¹


Abstract

Crowdsourcing has been identified as a way to facilitate large-scale data processing that requires human input. However, working with a large anonymous user community also poses new challenges. In particular, both possible misjudgment and dishonesty threaten the quality of the results. Common countermeasures are based on redundancy, which gives rise to a tradeoff between result quality and throughput. Ideally, countermeasures should (1) maintain high throughput and (2) ensure high result quality at the same time. Existing research on crowdsourcing mostly focuses on result quality and pays little attention to throughput, let alone to the tradeoff between the two. One reason is that the number of tasks (atomic units of work) is usually small. A further problem is that the tasks themselves are small as well. Consequently, existing mechanisms for improving result quality do not scale to the number or complexity of tasks that arise, for instance, in proofreading and processing digitized legacy literature. This paper proposes novel mechanisms that (1) are independent of the size and complexity of tasks and (2) allow result quality to be traded for throughput to a significant extent. Both mathematical analyses and extensive simulations demonstrate the effectiveness of the proposed mechanisms.
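To make the redundancy tradeoff concrete, the sketch below assumes the simplest redundancy scheme: plain majority voting over k independent workers on a binary task, each wrong with the same probability p. This is a textbook illustration, not the mechanism proposed in the paper; the function names and the rate of 100 completed assignments per hour are hypothetical.

```python
from math import comb


def majority_error(p: float, k: int) -> float:
    """Probability that a strict majority of k independent workers, each wrong
    with probability p, yields the wrong answer on a binary task (k odd)."""
    if k % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    # The voted result is wrong iff more than half of the k votes are wrong.
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k // 2 + 1, k + 1))


def throughput(assignments_per_hour: float, k: int) -> float:
    """Completed tasks per hour when every task is assigned to k workers."""
    return assignments_per_hour / k


if __name__ == "__main__":
    p = 0.2  # hypothetical per-worker error probability
    for k in (1, 3, 5, 7):
        print(f"k={k}: error={majority_error(p, k):.4f}, tasks/hour={throughput(100, k):.1f}")
```

With p = 0.2, moving from one to five workers per task lowers the majority-vote error from 0.2 to roughly 0.058 while dividing throughput by five, which is the kind of tradeoff the abstract refers to.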


Notes

This is the mode we use in our evaluation.
Note that ‘hypothesis’ does not mean ‘a research hypothesis of ours’ in this context; it means the hypothesis that a user has a sufficiently low error probability to be eligible for a vote boost.
Note that the parameter values increase exponentially, so the plots in the figure actually are linear.
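The note on hypotheses refers to testing whether a user's error probability is low enough for a vote boost. The preview does not show which test the paper uses; purely as an illustration, the following sketch assumes a one-sided exact binomial test: a user whose answers were checked on a number of tasks, with a given number of errors, is deemed eligible only if that few errors would have probability at most alpha under an error probability of p0. The names eligible_for_boost, p0 and alpha are hypothetical.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def eligible_for_boost(errors: int, checked: int, p0: float, alpha: float = 0.05) -> bool:
    """Hypothetical eligibility rule based on a one-sided exact binomial test.

    H0: the user's true error probability is at least p0.
    The user is eligible only if H0 is rejected at level alpha, i.e. if
    observing this few errors would have probability <= alpha under p0.
    """
    if checked <= 0:
        return False  # no checked tasks, no evidence, no boost
    if not 0 <= errors <= checked:
        raise ValueError("errors must lie between 0 and checked")
    return binom_cdf(errors, checked, p0) <= alpha


if __name__ == "__main__":
    print(eligible_for_boost(errors=0, checked=30, p0=0.1))
    print(eligible_for_boost(errors=0, checked=20, p0=0.1))
```

For example, a user with no errors on 30 checked tasks passes at p0 = 0.1 and alpha = 0.05 (0.9^30 ≈ 0.042), whereas the same record on only 20 tasks does not (0.9^20 ≈ 0.122).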


Author information

Authors and Affiliations

Computer Science Department, Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
Guido Sautter & Klemens Böhm


Corresponding author

Correspondence to Guido Sautter.


About this article

Cite this article

Sautter, G., Böhm, K. High-throughput crowdsourcing mechanisms for complex tasks. Soc. Netw. Anal. Min. 3, 873–888 (2013). https://doi.org/10.1007/s13278-013-0114-z


Received: 08 February 2012
Revised: 20 February 2013
Accepted: 16 April 2013
Published: 08 May 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s13278-013-0114-z
