On the practice of error analysis for machine translation evaluation

Sara Stymne, Lars Ahrenberg


Abstract
Error analysis is a means of assessing machine translation output in qualitative terms, and it can serve as a basis for generating error profiles for different systems. As with other subjective approaches to evaluation, it runs the risk of low inter-annotator agreement, yet papers that apply error analysis to MT very often do not discuss this aspect at all. In this paper, we report results from a comparative evaluation of two systems where agreement was initially low, and discuss the different measures we took to improve it. We compared the effects of using more or less fine-grained taxonomies, and of restricting the analysis to short sentences only. We report results on inter-annotator agreement before and after these measures were taken, on the error categories that are most likely to be confused, and on the possibility of establishing error profiles even in the absence of high inter-annotator agreement.
Anthology ID:
L12-1417
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1785–1790
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/717_Paper.pdf
Cite (ACL):
Sara Stymne and Lars Ahrenberg. 2012. On the practice of error analysis for machine translation evaluation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1785–1790, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
On the practice of error analysis for machine translation evaluation (Stymne & Ahrenberg, LREC 2012)