Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Zhen Nie

Abstract

High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce CROWDAQ, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that CROWDAQ simplifies data annotation significantly on a diverse set of data collection use cases and we hope it will be a convenient tool for the community.

Anthology ID: 2020.emnlp-demos.17
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: October
Year: 2020
Address: Online
Editors: Qun Liu, David Schlangen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 127–134
URL: https://aclanthology.org/2020.emnlp-demos.17
DOI: 10.18653/v1/2020.emnlp-demos.17
Cite (ACL): Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, and Zhen Nie. 2020. Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 127–134, Online. Association for Computational Linguistics.
Cite (Informal): Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ (Ning et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-demos.17.pdf
Optional supplementary material: 2020.emnlp-demos.17.OptionalSupplementaryMaterial.pdf
