Language Discrimination and Transfer Learning for Similar Languages: Experiments with Feature Combinations and Adaptation

Nianheng Wu, Eric DeMattos, Kwok Him So, Pin-zhen Chen, Çağrı Çöltekin


Abstract
This paper describes the work of team tearsofjoy in the VarDial 2019 Evaluation Campaign. We developed two systems based on Support Vector Machines (SVMs): an SVM with a flat combination of features, and SVM ensembles. We participated in all language/dialect identification tasks, as well as the Moldavian vs. Romanian cross-dialect topic identification (MRC) task. Our team achieved first place in German Dialect Identification (GDI) and in MRC subtasks 2 and 3; second place in the simplified variant of Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT) and in Cuneiform Language Identification (CLI); and third and fifth place in the traditional variant of DMT and in MRC subtask 1, respectively. In most cases, the SVM with a flat combination of features performed better than the SVM ensembles. Besides describing the systems and their results, we provide a tentative comparison of the feature combination methods and present additional experiments with a method of adaptation to the test set, which may indicate potential pitfalls with some of the data sets.
Anthology ID: W19-1406
Volume: Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month: June
Year: 2019
Address: Ann Arbor, Michigan
Editors: Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 54–63
URL: https://aclanthology.org/W19-1406
DOI: 10.18653/v1/W19-1406
Cite (ACL): Nianheng Wu, Eric DeMattos, Kwok Him So, Pin-zhen Chen, and Çağrı Çöltekin. 2019. Language Discrimination and Transfer Learning for Similar Languages: Experiments with Feature Combinations and Adaptation. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 54–63, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal): Language Discrimination and Transfer Learning for Similar Languages: Experiments with Feature Combinations and Adaptation (Wu et al., VarDial 2019)
PDF: https://aclanthology.org/W19-1406.pdf