Momresp: A Bayesian Model for Multi-Annotator Document Labeling

Paul Felt, Robbie Haertel, Eric Ringger, Kevin Seppi


Abstract
Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MomResp, a model that incorporates information from both natural data clusters and annotations from multiple annotators to infer ground-truth labels and annotator reliability for the document classification task. We implement this model and show dramatic improvements over majority vote in two kinds of situations: those where annotations are scarce and annotation quality is low, and those where annotators disagree consistently. Because MomResp predictions are subject to label switching, we introduce a solution that finds nearly optimal predicted class reassignments in a variety of settings using only information available to the model at inference time. Although MomResp does not perform well in annotation-rich situations, we show evidence suggesting how this shortcoming may be overcome in future work.
Anthology ID:
L14-1107
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3704–3711
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1153_Paper.pdf
Cite (ACL):
Paul Felt, Robbie Haertel, Eric Ringger, and Kevin Seppi. 2014. Momresp: A Bayesian Model for Multi-Annotator Document Labeling. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3704–3711, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Momresp: A Bayesian Model for Multi-Annotator Document Labeling (Felt et al., LREC 2014)