Iterative Language Model Adaptation for Indo-Aryan Language Identification

Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén


Abstract
This paper presents the experiments and results obtained by the SUKI team in the Indo-Aryan Language Identification shared task of the VarDial 2018 Evaluation Campaign. The shared task was an open one, but we did not use any corpora other than what was distributed by the organizers. A total of eight teams provided results for this shared task. Our submission using a HeLI-method based language identifier with iterative language model adaptation obtained the best results in the shared task with a macro F1-score of 0.958.
Anthology ID:
W18-3907
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
66–75
URL:
https://aclanthology.org/W18-3907
Cite (ACL):
Tommi Jauhiainen, Heidi Jauhiainen, and Krister Lindén. 2018. Iterative Language Model Adaptation for Indo-Aryan Language Identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 66–75, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Iterative Language Model Adaptation for Indo-Aryan Language Identification (Jauhiainen et al., VarDial 2018)
PDF:
https://aclanthology.org/W18-3907.pdf