Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Zhen Nie


Abstract
High-quality, large-scale data are key to the success of AI systems. However, large-scale data annotation efforts often face a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) ensuring reproducibility. To address these problems, we introduce CROWDAQ, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that CROWDAQ significantly simplifies data annotation across a diverse set of data collection use cases, and we hope it will be a convenient tool for the community.
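To give a concrete sense of what a re-usable, quality-controlled pipeline specification could look like, the sketch below defines a hypothetical annotation pipeline with instructions, an automatically graded qualification exam, and the main task's UI components. The field names, JSON layout, and the `grade_exam` helper are illustrative assumptions only, not CROWDAQ's actual schema or API; see the paper and repository for the real format.

```python
# Illustrative sketch only: a hypothetical JSON-style specification for an
# annotation pipeline. Field names and structure are assumptions for
# illustration, not CROWDAQ's real schema.
import json

pipeline_spec = {
    "instructions": {
        "title": "Question answering annotation",
        "body_markdown": "Read the passage and write a question about it.",
    },
    "exam": {
        # Annotators must pass this automatically graded exam to qualify.
        "passing_score": 0.8,
        "questions": [
            {
                "prompt": "Which of these is a valid question for the passage?",
                "options": ["A", "B", "C"],
                "answer": "B",
            }
        ],
    },
    "task": {
        # UI components the annotator interacts with for each item.
        "components": [
            {"type": "text", "name": "passage"},
            {"type": "text-input", "name": "question"},
            {"type": "span-selection", "name": "answer", "source": "passage"},
        ],
    },
}


def grade_exam(spec, responses):
    """Grade an annotator's exam responses against the spec's answer key."""
    questions = spec["exam"]["questions"]
    correct = sum(1 for q, r in zip(questions, responses) if r == q["answer"])
    score = correct / len(questions)
    return score >= spec["exam"]["passing_score"], score


if __name__ == "__main__":
    # Serializing the spec to JSON is what makes the pipeline easy to save,
    # share, and re-run on a new batch of data.
    print(json.dumps(pipeline_spec, indent=2)[:200], "...")
    qualified, score = grade_exam(pipeline_spec, ["B"])
    print(f"qualified={qualified}, score={score:.2f}")
```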
Anthology ID: 2020.emnlp-demos.17
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: October
Year: 2020
Address: Online
Editors: Qun Liu, David Schlangen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 127–134
URL: https://aclanthology.org/2020.emnlp-demos.17
DOI: 10.18653/v1/2020.emnlp-demos.17
Cite (ACL): Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, and Zhen Nie. 2020. Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 127–134, Online. Association for Computational Linguistics.
Cite (Informal): Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ (Ning et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-demos.17.pdf
Optional supplementary material: 2020.emnlp-demos.17.OptionalSupplementaryMaterial.pdf