Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian

Lauriane Aufrant, Guillaume Wisniewski, François Yvon


Abstract
Because of the small size of Romanian corpora, the performance of a PoS tagger or a dependency parser trained with standard supervised methods falls far short of the performance achieved in most languages. We therefore apply state-of-the-art methods for cross-lingual transfer to Romanian tagging and parsing, from English and several Romance languages. We compare the performance with monolingual systems trained on sets of different sizes and establish that training on a few sentences in the target language yields better results than transferring from large datasets in other languages.
Anthology ID:
L16-1241
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1520–1526
URL:
https://aclanthology.org/L16-1241
Cite (ACL):
Lauriane Aufrant, Guillaume Wisniewski, and François Yvon. 2016. Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1520–1526, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian (Aufrant et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1241.pdf