Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction

Meng Zhang, Yang Liu, Huanbo Luan, Maosong Sun


Abstract
Cross-lingual natural language processing hinges on the premise that there exists invariance across languages. At the word level, researchers have identified such invariance in the word embedding semantic spaces of different languages. However, in order to connect the separate spaces, cross-lingual supervision encoded in parallel data is typically required. In this paper, we attempt to establish the cross-lingual connection without relying on any cross-lingual supervision. By viewing word embedding spaces as distributions, we propose to minimize their earth mover’s distance, a measure of divergence between distributions. We demonstrate the success on the unsupervised bilingual lexicon induction task. In addition, we reveal an interesting finding that the earth mover’s distance shows potential as a measure of language difference.
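To make the central quantity concrete, here is a minimal sketch of the earth mover's distance between two tiny point sets treated as uniform distributions. It is not the authors' training procedure. The toy 2-D "embeddings", the rotation used as the cross-lingual map, and the helper names `emd_uniform` and `rotate` are all assumptions made for illustration. With equal-sized sets and uniform weights, an optimal transport plan is a one-to-one matching, so a brute-force search over permutations is exact, though only feasible for very small sets.

```python
import itertools
import math


def emd_uniform(xs, ys):
    """Earth mover's distance between two equal-sized point sets with uniform weights.

    Brute-force search over one-to-one matchings; exact, but only usable for tiny sets.
    Returns (distance, perm), where point xs[i] is matched to ys[perm[i]].
    """
    n = len(xs)
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):
        # Average ground distance moved under this matching.
        cost = sum(math.dist(xs[i], ys[perm[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


def rotate(points, theta):
    """Rotate 2-D points by theta radians (a hypothetical stand-in for a learned map)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical toy data: the "target language" space is a rotated, shuffled copy
# of the "source language" space, so no word-level correspondence is given.
source = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.5), (0.5, -1.5)]
theta_true = 0.7
order = [2, 0, 3, 1]
rotated = rotate(source, theta_true)
target = [rotated[i] for i in order]

# Before mapping, the two distributions are some distance apart.
emd_before, _ = emd_uniform(source, target)

# After applying the right transformation, the distance drops to zero and the
# optimal matching recovers the hidden correspondence (a toy "lexicon").
emd_after, matching = emd_uniform(rotate(source, theta_true), target)

print(f"EMD before mapping: {emd_before:.4f}")
print(f"EMD after mapping:  {emd_after:.4f}")
print("Recovered matching:", matching)
```

In this toy setting the transformation is known in advance. In an unsupervised setting it would have to be found by searching for the map that makes the distance small, and a realistic vocabulary would need a proper optimal transport solver rather than enumeration.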
Anthology ID:
D17-1207
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1934–1945
URL:
https://aclanthology.org/D17-1207
DOI:
10.18653/v1/D17-1207
Cite (ACL):
Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1934–1945, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction (Zhang et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1207.pdf
Video:
https://vimeo.com/238232779