Probing Multilingual Sentence Representations With X-Probe

Vinit Ravishankar, Lilja Øvrelid, Erik Velldal


Abstract
This paper extends the task of probing sentence representations for linguistic insight to a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences from a parallel corpus. We find that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.
Anthology ID:
W19-4318
Volume:
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, Marek Rei
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
156–168
URL:
https://aclanthology.org/W19-4318
DOI:
10.18653/v1/W19-4318
Cite (ACL):
Vinit Ravishankar, Lilja Øvrelid, and Erik Velldal. 2019. Probing Multilingual Sentence Representations With X-Probe. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 156–168, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Probing Multilingual Sentence Representations With X-Probe (Ravishankar et al., RepL4NLP 2019)
PDF:
https://aclanthology.org/W19-4318.pdf
Data
XNLI