Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point

Liane Guillou, Christian Hardmeier
Abstract
We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset of human judgements of the correctness of pronoun translations from the PROTEST test suite. Although the metrics show some correlation with the human judgements, a range of issues limits their performance. We therefore recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.
Anthology ID:
D18-1513
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4797–4802
URL:
https://aclanthology.org/D18-1513
DOI:
10.18653/v1/D18-1513
Cite (ACL):
Liane Guillou and Christian Hardmeier. 2018. Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4797–4802, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point (Guillou & Hardmeier, EMNLP 2018)
PDF:
https://aclanthology.org/D18-1513.pdf
Video:
https://aclanthology.org/D18-1513.mp4