Modeling Ambiguity with Many Annotators and Self-Assessments of Annotator Certainty

Melanie Andresen, Michael Vauth, Heike Zinsmeister


Abstract
Most annotation efforts assume that annotators will agree on labels if the annotation categories are well-defined and documented in annotation guidelines. However, this is not always true. For instance, content-related questions such as ‘Is this sentence about topic X?’ are unlikely to elicit the same answer from all annotators. Additional specifications in the guidelines help to some extent, but guidelines can quickly become overspecified with rules that cannot be justified by the research question. In this study, we model the semantic category ‘illness’ and its use in a gradual way. For this purpose, we (i) ask many annotators (30 votes per item, 960 items) for their opinion in a crowdsourcing experiment, (ii) ask annotators to indicate their certainty with respect to their annotation, and (iii) compare this across two different text types. We show that the results of multiple annotations and average annotator certainty correlate, but many ambiguities can only be captured if several people contribute. The annotated data allow us to filter for sentences with high or low agreement and to analyze the causes of disagreement, thus gaining a better understanding of people’s perception of illness, as an example of a semantic category, as well as of the content of our annotated texts.
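To make the reported agreement/certainty comparison concrete, the following is a minimal sketch (not the authors' published code) of how per-item agreement over 30 crowd votes could be correlated with average self-assessed certainty. The record fields item_id, label, and certainty, and the choice of a Spearman correlation, are assumptions made for illustration only.

    # Minimal sketch: correlate per-item label agreement with average
    # self-assessed annotator certainty. Field names are hypothetical.
    from collections import Counter, defaultdict
    from scipy.stats import spearmanr

    def item_agreement(labels):
        """Fraction of votes for the majority label (1.0 = unanimous)."""
        counts = Counter(labels)
        return counts.most_common(1)[0][1] / len(labels)

    def agreement_certainty_correlation(annotations):
        """annotations: iterable of dicts with item_id, label, certainty."""
        labels = defaultdict(list)
        certainties = defaultdict(list)
        for a in annotations:
            labels[a["item_id"]].append(a["label"])
            certainties[a["item_id"]].append(a["certainty"])
        items = sorted(labels)
        agreement = [item_agreement(labels[i]) for i in items]
        avg_cert = [sum(certainties[i]) / len(certainties[i]) for i in items]
        # Returns (correlation coefficient, p-value).
        return spearmanr(agreement, avg_cert)

Under this sketch, agreement of 1.0 marks unanimous items, while values near chance level for a binary label flag the ambiguous sentences that the filtering described in the abstract would surface.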
Anthology ID:
2020.law-1.5
Volume:
Proceedings of the 14th Linguistic Annotation Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain
Editors:
Stefanie Dipper, Amir Zeldes
Venue:
LAW
SIG:
SIGANN
Publisher:
Association for Computational Linguistics
Pages:
48–59
URL:
https://aclanthology.org/2020.law-1.5
Cite (ACL):
Melanie Andresen, Michael Vauth, and Heike Zinsmeister. 2020. Modeling Ambiguity with Many Annotators and Self-Assessments of Annotator Certainty. In Proceedings of the 14th Linguistic Annotation Workshop, pages 48–59, Barcelona, Spain. Association for Computational Linguistics.
Cite (Informal):
Modeling Ambiguity with Many Annotators and Self-Assessments of Annotator Certainty (Andresen et al., LAW 2020)
PDF:
https://aclanthology.org/2020.law-1.5.pdf