Multi-source morphosyntactic tagging for spoken Rusyn

Yves Scherrer, Achim Rabus


Abstract
This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn. As neither annotated corpora nor parallel corpora are electronically available for Rusyn, we propose to combine existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish and adapt them to Rusyn. Using MarMoT as the tagging toolkit, we show that a tagger trained on a balanced set of the four source languages outperforms single-language taggers by about 9%, and that additional automatically induced morphosyntactic lexicons lead to further improvements. The best observed accuracies for Rusyn are 82.4% for part-of-speech tagging and 75.5% for full morphological tagging.
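As a rough illustration of what a "balanced set of the four source languages" could look like in practice, the sketch below downsamples each source corpus to the size of the smallest one and shuffles the merged result. The function name `balance_corpora`, the toy data, and the downsampling strategy are assumptions made for illustration only; the abstract does not state how the balancing was done, and none of this is MarMoT code.

```python
import random


def balance_corpora(corpora, seed=0):
    """Downsample every corpus to the size of the smallest one, then merge.

    `corpora` maps a language code to a list of training sentences.
    Returns a shuffled list of (language, sentence) pairs.
    Hypothetical helper: one possible balancing strategy, not the paper's.
    """
    rng = random.Random(seed)
    # Size of the smallest source corpus; every language is cut down to this.
    target = min(len(sentences) for sentences in corpora.values())
    merged = []
    for lang in sorted(corpora):
        # Sample without replacement so no sentence is duplicated.
        sample = rng.sample(corpora[lang], target)
        merged.extend((lang, sentence) for sentence in sample)
    # Shuffle so the training data is not grouped by language.
    rng.shuffle(merged)
    return merged


# Toy stand-ins for the four source-language corpora (placeholder strings,
# not real annotated data).
toy_corpora = {
    "ru": [f"ru-{i}" for i in range(5)],
    "uk": [f"uk-{i}" for i in range(3)],
    "sk": [f"sk-{i}" for i in range(4)],
    "pl": [f"pl-{i}" for i in range(2)],
}
balanced = balance_corpora(toy_corpora)
```

Downsampling keeps every language equally represented at the cost of discarding data from the larger corpora; upsampling the smaller corpora would be an alternative with the opposite trade-off.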
Anthology ID:
W17-1210
Volume:
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Preslav Nakov, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann, Shevin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
84–92
URL:
https://aclanthology.org/W17-1210
DOI:
10.18653/v1/W17-1210
Cite (ACL):
Yves Scherrer and Achim Rabus. 2017. Multi-source morphosyntactic tagging for spoken Rusyn. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pages 84–92, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Multi-source morphosyntactic tagging for spoken Rusyn (Scherrer & Rabus, VarDial 2017)
PDF:
https://aclanthology.org/W17-1210.pdf
Data
MULTEXT-East, Universal Dependencies