Fusion of Simple Models for Native Language Identification

Fabio Kepler, Ramon F. Astudillo, Alberto Abad


Abstract
In this paper we describe the approaches we explored for the 2017 Native Language Identification shared task. We focused on simple word and sub-word units, avoiding heavy use of hand-crafted features. Following recent trends, we explored linear and neural network models to compensate for the lack of rich features. Initial efforts yielded F1 scores of 82.39% and 83.77% on the development and test sets of the fusion track, and were officially submitted to the task as team L2F. After the task closed, we carried out further experiments and relied on a late fusion strategy to combine our simple proposed approaches with modifications of the baselines provided by the task. As expected, the i-vector-based sub-system dominates the performance of the system combinations and is the major contributor to our achieved scores. Our best combined system achieves F1 scores of 90.1% and 90.2% on the development and test sets of the fusion track, respectively.
Anthology ID:
W17-5048
Volume:
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Joel Tetreault, Jill Burstein, Claudia Leacock, Helen Yannakoudakis
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
423–429
URL:
https://aclanthology.org/W17-5048
DOI:
10.18653/v1/W17-5048
Cite (ACL):
Fabio Kepler, Ramon F. Astudillo, and Alberto Abad. 2017. Fusion of Simple Models for Native Language Identification. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 423–429, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Fusion of Simple Models for Native Language Identification (Kepler et al., BEA 2017)
PDF:
https://aclanthology.org/W17-5048.pdf