Automatic Verification and Augmentation of Multilingual Lexicons

Maryam Aminian, Mohamed Al-Badrashiny, Mona Diab


Abstract
We present an approach for automatic verification and augmentation of multilingual lexica. We exploit existing parallel and monolingual corpora to extract multilingual correspondents via triangulation. We demonstrate the efficacy of our approach on two publicly available resources: Tharwa, a three-way lexicon comprising Dialectal Arabic, Modern Standard Arabic and English lemmas among other information (Diab et al., 2014); and BabelNet, a multilingual thesaurus comprising over 276 languages including Arabic variant entries (Navigli and Ponzetto, 2012). Our automated approach yields an F1-score of 71.71% in generating correct multilingual correspondents against gold Tharwa, and 54.46% against gold BabelNet without any human intervention.
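As an illustration only, the idea of triangulating through a shared language and scoring the result against a gold lexicon might be sketched as follows. This is not the authors' pipeline: the function names `triangulate` and `f1_score`, the pair-list input format, and the toy entries (`da_1`, `msa_1`, and so on) are all hypothetical, and the sketch assumes word pairs have already been extracted from aligned parallel corpora.

```python
from collections import defaultdict


def triangulate(da_en_pairs, msa_en_pairs):
    """Join (DA, EN) and (MSA, EN) pairs on their shared English side.

    Hypothetical helper: returns a set of (DA, MSA, EN) triples.
    """
    # Index the MSA words by the English word they are paired with.
    msa_by_en = defaultdict(set)
    for msa, en in msa_en_pairs:
        msa_by_en[en].add(msa)

    # Every DA word sharing an English pivot with an MSA word yields a triple.
    triples = set()
    for da, en in da_en_pairs:
        for msa in msa_by_en.get(en, ()):
            triples.add((da, msa, en))
    return triples


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of triples."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy placeholder entries, not real lexicon data.
da_en = [("da_1", "book"), ("da_2", "house")]
msa_en = [("msa_1", "book"), ("msa_2", "car")]
gold = {("da_1", "msa_1", "book"), ("da_2", "msa_3", "house")}

triples = triangulate(da_en, msa_en)
score = f1_score(triples, gold)
```

In this toy run only "book" is shared by both pair lists, so one triple is produced. It matches one of the two gold triples, which gives a precision of 1.0 and a recall of 0.5.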
Anthology ID:
W16-4810
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue:
VarDial
Publisher:
The COLING 2016 Organizing Committee
Pages:
73–81
URL:
https://aclanthology.org/W16-4810
Cite (ACL):
Maryam Aminian, Mohamed Al-Badrashiny, and Mona Diab. 2016. Automatic Verification and Augmentation of Multilingual Lexicons. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 73–81, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Automatic Verification and Augmentation of Multilingual Lexicons (Aminian et al., VarDial 2016)
PDF:
https://aclanthology.org/W16-4810.pdf