Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models

Yuming Zhai, Gabriel Illouz, Anne Vilnat


Abstract
Human-generated non-literal translations reflect the richness of human languages and are sometimes indispensable to ensure adequacy and fluency. Non-literal translations are difficult to produce even for human translators, especially for foreign language learners, and machine translation systems have yet to match human performance in this respect. To foster the study of appropriate and creative non-literal translations, automatically detecting them in parallel corpora is an important step, which can benefit downstream NLP tasks or help construct teaching materials for translation. This article demonstrates that generic sentence representations produced by a pre-trained cross-lingual language model can be fine-tuned to solve this task. We show that there exists a moderate positive correlation between the predicted probability of a sentence being a human translation and the proportion of non-literal translations in that sentence. The fine-tuning experiments achieve an accuracy of 80.16% when predicting the presence of non-literal translations in a sentence and an accuracy of 85.20% when distinguishing literal and non-literal translations at the phrase level. We further conduct a linguistic error analysis and propose directions for future work.
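The fine-tuning approach described in the abstract amounts to training a classification head on top of cross-lingual sentence-pair representations. Below is a minimal illustrative sketch, not the authors' actual code (see the linked repository for that): it assumes pooled sentence-pair embeddings of a hypothetical dimension have already been produced by a frozen cross-lingual encoder, and trains a binary head to separate literal from non-literal pairs.

```python
# Illustrative sketch only: the encoder, embedding size, and data here are
# hypothetical stand-ins, not the paper's actual model or corpus.
import torch
import torch.nn as nn

EMB_DIM = 1024  # hypothetical pooled-embedding size (XLM-style models use 1024)

class NonLiteralClassifier(nn.Module):
    """Binary head: literal (0) vs. non-literal (1) translation pair."""
    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(emb_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 1),
        )

    def forward(self, pair_emb: torch.Tensor) -> torch.Tensor:
        # pair_emb: pooled encoder output for a source/target sentence pair
        return torch.sigmoid(self.head(pair_emb)).squeeze(-1)

# One toy training step on random stand-in embeddings
torch.manual_seed(0)
model = NonLiteralClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
emb = torch.randn(8, EMB_DIM)            # stand-in for encoder output
labels = torch.randint(0, 2, (8,)).float()
loss = nn.functional.binary_cross_entropy(model(emb), labels)
loss.backward()
opt.step()
probs = model(emb.detach())              # per-pair probability of being non-literal
print(probs.shape)
```

In the paper's actual setup the encoder itself is fine-tuned jointly with the head; freezing it here simply keeps the sketch short.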
Anthology ID:
2020.coling-main.522
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5944–5956
URL:
https://aclanthology.org/2020.coling-main.522
DOI:
10.18653/v1/2020.coling-main.522
Cite (ACL):
Yuming Zhai, Gabriel Illouz, and Anne Vilnat. 2020. Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5944–5956, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models (Zhai et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.522.pdf
Code
yumingzhai/nlt_xlm
Data
Europarl, XNLI