BUCC2020: Bilingual Dictionary Induction using Cross-lingual Embedding

Sanjanasri JP, Vijay Krishna Menon, Soman KP


Abstract
This paper presents a deep learning system for the BUCC 2020 shared task on bilingual dictionary induction from comparable corpora. We submitted two runs for this shared task: German (de) and English (en) for the “closed track”, and Tamil (ta) and English (en) for the “open track”. Our core approach focuses on quantifying the semantics of the languages in a pair, so that the semantics of two different languages can be compared or used for transfer learning. Word embeddings make this quantification possible. We propose a deep learning approach that uses the supplied training data to generate cross-lingual embeddings, which are then used to induce a bilingual dictionary from comparable corpora.
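The final step the abstract describes, inducing a dictionary from cross-lingual embeddings, can be sketched as a nearest-neighbour lookup. This is a minimal illustration, not the authors' system: it assumes the vectors of both languages already live in one shared space, it uses cosine similarity (the abstract does not name a retrieval metric), and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def induce_dictionary(src_vecs, tgt_vecs, top_k=1):
    """Map each source word to its top_k most similar target words.

    Assumes src_vecs and tgt_vecs are dicts of word -> vector that were
    already projected into a shared cross-lingual space.
    """
    dictionary = {}
    for src_word, src_vec in src_vecs.items():
        ranked = sorted(
            tgt_vecs,
            key=lambda w: cosine(src_vec, tgt_vecs[w]),
            reverse=True,
        )
        dictionary[src_word] = ranked[:top_k]
    return dictionary


# Toy 3-dimensional vectors, invented purely for illustration.
toy_de = {"hund": [0.9, 0.1, 0.0], "haus": [0.0, 0.2, 0.9]}
toy_en = {
    "dog": [1.0, 0.0, 0.1],
    "house": [0.1, 0.1, 1.0],
    "tree": [0.1, 1.0, 0.1],
}

induced = induce_dictionary(toy_de, toy_en)
print(induced)
```

In practice the vectors would come from trained embeddings, and an exhaustive scan like this would usually be replaced by a batched or approximate nearest-neighbour search.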
Anthology ID:
2020.bucc-1.11
Volume:
Proceedings of the 13th Workshop on Building and Using Comparable Corpora
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
Venue:
BUCC
Publisher:
European Language Resources Association
Pages:
65–68
Language:
English
URL:
https://aclanthology.org/2020.bucc-1.11
Cite (ACL):
Sanjanasri JP, Vijay Krishna Menon, and Soman KP. 2020. BUCC2020: Bilingual Dictionary Induction using Cross-lingual Embedding. In Proceedings of the 13th Workshop on Building and Using Comparable Corpora, pages 65–68, Marseille, France. European Language Resources Association.
Cite (Informal):
BUCC2020: Bilingual Dictionary Induction using Cross-lingual Embedding (JP et al., BUCC 2020)
PDF:
https://aclanthology.org/2020.bucc-1.11.pdf