Ensemble Methods to Distinguish Mainland and Taiwan Chinese

Hai Hu, Wen Li, He Zhou, Zuoyu Tian, Yiwen Zhang, Liang Zou


Abstract
This paper describes the IUCL system at the VarDial 2019 evaluation campaign for the task of discriminating between the Mainland and Taiwan varieties of Mandarin Chinese. We first build several base classifiers, including a Naive Bayes classifier with word n-grams as features, SVMs with both character and syntactic features, and neural networks with pre-trained character/word embeddings. We then adopt ensemble methods to combine the output from the base classifiers to make final predictions. Our ensemble models achieve the highest F1 score (0.893) in the simplified Chinese track and the second highest (0.901) in the traditional Chinese track. Our results demonstrate the effectiveness and robustness of the ensemble methods.
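
The idea of combining base-classifier output can be illustrated with a minimal sketch. The abstract does not say which ensemble method the system uses, so plain majority voting is assumed here as one common option. The labels "M" and "T", the classifier names, the tie-breaking order, and the per-sentence predictions are all hypothetical placeholders, not data or results from the paper.

```python
from collections import Counter


def majority_vote(predictions, tie_break_order):
    """Return the most frequent label among the base-classifier predictions.

    Ties are resolved by the first tied label found in tie_break_order
    (an assumption of this sketch, not something stated in the abstract).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    tied = {label for label, n in counts.items() if n == best}
    for label in tie_break_order:
        if label in tied:
            return label
    raise ValueError("no tied label appears in tie_break_order")


def ensemble_predict(base_outputs, tie_break_order=("M", "T")):
    """Combine per-sentence labels from several base classifiers.

    base_outputs maps a classifier name to its list of predicted labels,
    one label per sentence, all lists in the same sentence order.
    """
    lengths = {len(labels) for labels in base_outputs.values()}
    if len(lengths) != 1:
        raise ValueError("all base classifiers must label the same sentences")
    # zip(*...) regroups the predictions sentence by sentence.
    per_sentence = zip(*base_outputs.values())
    return [majority_vote(votes, tie_break_order) for votes in per_sentence]


# Hypothetical predictions for four sentences ("M" = Mainland, "T" = Taiwan).
base_outputs = {
    "naive_bayes_word_ngrams": ["M", "T", "T", "M"],
    "svm_char_syntactic": ["M", "T", "M", "M"],
    "neural_embeddings": ["T", "T", "M", "M"],
}

final_labels = ensemble_predict(base_outputs)
print(final_labels)  # ['M', 'T', 'M', 'M']
```

With an odd number of base classifiers and two labels, ties cannot occur, so the tie-breaking order only matters when the number of classifiers is even.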
Anthology ID:
W19-1417
Volume:
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
June
Year:
2019
Address:
Ann Arbor, Michigan
Editors:
Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
165–171
URL:
https://aclanthology.org/W19-1417
DOI:
10.18653/v1/W19-1417
Cite (ACL):
Hai Hu, Wen Li, He Zhou, Zuoyu Tian, Yiwen Zhang, and Liang Zou. 2019. Ensemble Methods to Distinguish Mainland and Taiwan Chinese. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 165–171, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal):
Ensemble Methods to Distinguish Mainland and Taiwan Chinese (Hu et al., VarDial 2019)
PDF:
https://aclanthology.org/W19-1417.pdf