Are we Estimating or Guesstimating Translation Quality?

Shuo Sun, Francisco Guzmán, Lucia Specia


Abstract
Recent advances in pre-trained multilingual language models have led to state-of-the-art results on the task of quality estimation (QE) for machine translation. A carefully engineered ensemble of such models won the QE shared task at WMT19. Our in-depth analysis, however, shows that the success of using pre-trained language models for QE is over-estimated due to three issues we observed in current QE datasets: (i) the distributions of quality scores are imbalanced and skewed towards good quality scores; (ii) QE models can perform well on these datasets while looking at only source or translated sentences; (iii) they contain statistical artifacts that correlate well with human-annotated QE labels. Our findings suggest that although QE models might capture the fluency of translated sentences and the complexity of source sentences, they cannot model the adequacy of translations effectively.
Anthology ID:
2020.acl-main.558
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6262–6267
URL:
https://aclanthology.org/2020.acl-main.558
DOI:
10.18653/v1/2020.acl-main.558
Cite (ACL):
Shuo Sun, Francisco Guzmán, and Lucia Specia. 2020. Are we Estimating or Guesstimating Translation Quality?. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6262–6267, Online. Association for Computational Linguistics.
Cite (Informal):
Are we Estimating or Guesstimating Translation Quality? (Sun et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.558.pdf
Software:
2020.acl-main.558.Software.txt
Video:
http://slideslive.com/38929307