research-article

Crowdsourced Data Management: A Survey

Published: 01 September 2016

Abstract

Many important data management and analytics tasks cannot be completely addressed by automated processes. These tasks, such as entity resolution, sentiment analysis, and image recognition, can be enhanced through the use of human cognitive ability. Crowdsourcing platforms are an effective way to harness the capabilities of people (i.e., the crowd) to apply human computation for such tasks. Thus, crowdsourced data management has become an area of increasing interest in research and industry. We identify three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy or incorrect results, so effective techniques are required to achieve high quality; (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost; (3) Latency Control: Human workers can be slow, particularly compared to automated computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans consisting of multiple operators. In this paper, we survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Based on this analysis, we then outline key factors that need to be considered to improve crowdsourced data management.



Reviews

David Gary Hill

Many data management and analytics tasks, notably entity resolution, sentiment analysis, and image recognition, cannot always be fulfilled through automated software processes alone, but also require the application of human cognition. Human computation capabilities can be harnessed using crowdsourcing platforms. This paper "surveys and synthesizes a [broad range] of existing studies on crowdsourced data management" and then "outlines key factors that [should] be considered to improve crowdsourced data management."

A major focus of the paper is on three key problems in crowdsourced data management, namely quality control, cost control, and latency control. Quality control covers how to prevent low-quality results, "such as eliminating low-quality workers." Cost control addresses how to ensure that costs are no higher than necessary to complete the crowdsourcing tasks. One way of doing this is to use pruning algorithms to eliminate unnecessary tasks. Latency control covers strategies, such as pricing, for meeting established time constraints.

The paper gives considerable attention to crowdsourced operators that have been proposed to improve real-world applications, including filtering, find, and search operators. "Crowdsourcing systems that integrate [crowdsourced] relational database management systems ... to process computer-hard queries" are discussed. Two crowdsourcing platforms, Amazon Mechanical Turk and CrowdFlower, are examined. The paper is very thorough, clear, and detailed. Readers who follow crowdsourced data management should find this paper a very valuable reference.

Online Computing Reviews Service


Information

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 28, Issue 9
September 2016
289 pages

Publisher

IEEE Educational Activities Department

United States

Qualifiers

  • Research-article

Cited By

  • (2024) Naive Bayes classifiers over missing data. Proceedings of the 41st International Conference on Machine Learning, pp. 3913-3934. DOI: 10.5555/3692070.3692227. Online publication date: 21-Jul-2024
  • (2024) Efficient online crowdsourcing with complex annotations. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, pp. 10119-10127. DOI: 10.1609/aaai.v38i9.28876. Online publication date: 20-Feb-2024
  • (2024) Color Theme Evaluation through User Preference Modeling. ACM Transactions on Applied Perception 21:3, pp. 1-35. DOI: 10.1145/3665329. Online publication date: 21-May-2024
  • (2024) Towards secure and trustworthy crowdsourcing: challenges, existing landscape, and future directions. Wireless Networks 30:5, pp. 4329-4341. DOI: 10.1007/s11276-022-03015-8. Online publication date: 1-Jul-2024
  • (2024) Multilabel classification using crowdsourcing under budget constraints. Knowledge and Information Systems 66:2, pp. 841-877. DOI: 10.1007/s10115-023-01973-9. Online publication date: 1-Feb-2024
  • (2023) From Large Language Models to Databases and Back: A Discussion on Research and Education. ACM SIGMOD Record 52:3, pp. 49-56. DOI: 10.1145/3631504.3631518. Online publication date: 2-Nov-2023
  • (2023) Mitigating Voter Attribute Bias for Fair Opinion Aggregation. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pp. 170-180. DOI: 10.1145/3600211.3604660. Online publication date: 8-Aug-2023
  • (2023) On Dynamically Pricing Crowdsourcing Tasks. ACM Transactions on Knowledge Discovery from Data 17:2, pp. 1-27. DOI: 10.1145/3544018. Online publication date: 20-Feb-2023
  • (2023) In Pursuit of Beauty: Aesthetic-Aware and Context-Adaptive Photo Selection in Crowdsensing. IEEE Transactions on Knowledge and Data Engineering 35:9, pp. 9364-9377. DOI: 10.1109/TKDE.2023.3237969. Online publication date: 1-Sep-2023
  • (2023) A Generative Answer Aggregation Model for Sentence-Level Crowdsourcing Tasks. IEEE Transactions on Knowledge and Data Engineering 35:4, pp. 3299-3312. DOI: 10.1109/TKDE.2022.3142821. Online publication date: 1-Apr-2023
