Do Explicit Alignments Robustly Improve Multilingual Encoders?

Shijie Wu, Mark Dredze


Abstract
Multilingual BERT (mBERT), XLM-RoBERTa (XLMR) and other unsupervised multilingual encoders can effectively learn cross-lingual representations. Explicit alignment objectives based on bitexts like Europarl or MultiUN have been shown to further improve these representations. However, word-level alignments are often suboptimal and such bitexts are unavailable for many languages. In this paper, we propose a new contrastive alignment objective that can better utilize such signal, and examine whether these previous alignment methods can be adapted to noisier sources of aligned data: a randomly sampled 1 million pair subset of the OPUS collection. Additionally, rather than report results on a single dataset with a single model run, we report the mean and standard deviation of multiple runs with different seeds, on four datasets and tasks. Our more extensive analysis finds that, while our new objective outperforms previous work, overall these methods do not improve performance under a more robust evaluation framework. Furthermore, the gains from using a better underlying model eclipse any benefits from alignment training. These negative results dictate more care in evaluating these methods and suggest limitations in applying explicit alignment objectives.
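For intuition, below is a minimal sketch of what a contrastive alignment objective over word-aligned bitext pairs could look like. This is an illustrative InfoNCE-style formulation under assumed design choices (in-batch negatives, cosine similarity, a temperature of 0.1, and the hypothetical function name `contrastive_alignment_loss`), not the paper's exact objective.

```python
# Illustrative sketch of a contrastive word-alignment loss, assuming
# InfoNCE with in-batch negatives; not the paper's exact formulation.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_vecs, tgt_vecs, temperature=0.1):
    """Pull aligned word pairs together, push apart in-batch negatives.

    src_vecs, tgt_vecs: (n, d) tensors where row i of each is an aligned
    word pair from a bitext (e.g., extracted by a statistical word aligner),
    encoded by a multilingual encoder such as mBERT or XLMR.
    """
    src = F.normalize(src_vecs, dim=-1)
    tgt = F.normalize(tgt_vecs, dim=-1)
    # Similarity of every source word to every target word in the batch.
    logits = src @ tgt.t() / temperature  # (n, n)
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric cross-entropy: each word's true match is the diagonal entry.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

# Toy usage with random vectors standing in for contextual word embeddings.
if __name__ == "__main__":
    torch.manual_seed(0)
    src, tgt = torch.randn(8, 768), torch.randn(8, 768)
    print(contrastive_alignment_loss(src, tgt).item())
```

In this sketch, the contextual embeddings of aligned words serve as positive pairs while the other words in the batch act as negatives; the released code at shijie-wu/crosslingual-nlp contains the authors' actual implementation.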
Anthology ID:
2020.emnlp-main.362
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4471–4482
URL:
https://aclanthology.org/2020.emnlp-main.362
DOI:
10.18653/v1/2020.emnlp-main.362
Cite (ACL):
Shijie Wu and Mark Dredze. 2020. Do Explicit Alignments Robustly Improve Multilingual Encoders? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4471–4482, Online. Association for Computational Linguistics.
Cite (Informal):
Do Explicit Alignments Robustly Improve Multilingual Encoders? (Wu & Dredze, EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.362.pdf
Video:
https://slideslive.com/38939127
Code:
shijie-wu/crosslingual-nlp
Data:
OPUS-100, XNLI