The Impact of Positional Encodings on Multilingual Compression

Vinit Ravishankar, Anders Søgaard


Abstract
In order to preserve word-order information in a non-autoregressive setting, transformer architectures tend to include positional information, for instance by adding positional encodings to token embeddings. Several modifications to the sinusoidal positional encodings used in the original transformer architecture have been proposed; these include, for instance, separating position encodings from token embeddings, or directly modifying attention weights based on the distance between word pairs. We first show that, surprisingly, while these modifications tend to improve monolingual language models, none of them result in better multilingual language models. We then explain why: sinusoidal encodings were explicitly designed to facilitate compositionality by allowing linear projections over arbitrary time steps. Higher variance in multilingual training distributions requires higher compression, in which case compositionality becomes indispensable. Learned absolute positional encodings (e.g., in mBERT) tend to approximate sinusoidal embeddings in multilingual settings, but more complex positional encoding architectures lack the inductive bias to effectively learn cross-lingual alignment. In other words, while sinusoidal positional encodings were designed for monolingual applications, they are particularly useful in multilingual language models.
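The abstract's argument rests on one property of sinusoidal encodings: for any offset k there is a fixed linear map M_k with PE(pos + k) = M_k · PE(pos), independent of pos. The sketch below is a minimal illustration, assuming the standard formulation from the original transformer (sin on even dimensions, cos on odd dimensions, base 10000); the helper names `sinusoidal_encoding`, `shift_matrix_blocks` and `apply_shift` are hypothetical and not taken from the paper's code. M_k is block-diagonal, made of 2×2 rotations by angle k·w_i for each frequency w_i.

```python
import math


def sinusoidal_encoding(pos, d_model, base=10000.0):
    """Sinusoidal positional encoding (assumed standard formulation).

    Dimension 2i holds sin(pos * w_i) and dimension 2i+1 holds
    cos(pos * w_i), with frequency w_i = 1 / base**(2i / d_model).
    """
    assert d_model % 2 == 0, "d_model must be even"
    enc = []
    for i in range(d_model // 2):
        w = 1.0 / base ** (2 * i / d_model)
        enc.append(math.sin(pos * w))
        enc.append(math.cos(pos * w))
    return enc


def shift_matrix_blocks(k, d_model, base=10000.0):
    """Return the 2x2 blocks of M_k, where PE(pos + k) = M_k @ PE(pos).

    Each block depends only on the offset k, never on the absolute position.
    """
    blocks = []
    for i in range(d_model // 2):
        w = 1.0 / base ** (2 * i / d_model)
        c, s = math.cos(k * w), math.sin(k * w)
        # Angle-addition identities:
        # sin(w(p+k)) =  c*sin(wp) + s*cos(wp)
        # cos(w(p+k)) = -s*sin(wp) + c*cos(wp)
        blocks.append(((c, s), (-s, c)))
    return blocks


def apply_shift(blocks, enc):
    """Multiply an encoding vector by the block-diagonal matrix M_k."""
    out = []
    for i, ((m00, m01), (m10, m11)) in enumerate(blocks):
        x, y = enc[2 * i], enc[2 * i + 1]
        out.append(m00 * x + m01 * y)
        out.append(m10 * x + m11 * y)
    return out


if __name__ == "__main__":
    d = 16
    m3 = shift_matrix_blocks(3, d)
    shifted = apply_shift(m3, sinusoidal_encoding(20, d))
    target = sinusoidal_encoding(23, d)
    err = max(abs(a - b) for a, b in zip(shifted, target))
    print(f"max |M_3 PE(20) - PE(23)| = {err:.2e}")
```

Because M_k depends only on k, a single linear map per relative offset can be reused at every position; learned absolute encodings are not constrained to have this structure, which is one way to read the abstract's observation that they come to approximate sinusoidal ones.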
Anthology ID: 2021.emnlp-main.59
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 763–777
URL: https://aclanthology.org/2021.emnlp-main.59
DOI: 10.18653/v1/2021.emnlp-main.59
Cite (ACL): Vinit Ravishankar and Anders Søgaard. 2021. The Impact of Positional Encodings on Multilingual Compression. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 763–777, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): The Impact of Positional Encodings on Multilingual Compression (Ravishankar & Søgaard, EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.59.pdf
Video: https://aclanthology.org/2021.emnlp-main.59.mp4
Data: MultiNLI, XNLI