Evaluating Language Tools for Fifteen EU-official Under-resourced Languages

Diego Alves, Gaurish Thakkar, Marko Tadić


Abstract
This article presents the results of an evaluation campaign of language tools available for fifteen EU-official under-resourced languages. The evaluation was conducted within the MSC ITN CLEOPATRA action, which aims at building cross-lingual event-centric knowledge processing on top of linguistic processing chains (LPCs) for at least 24 EU-official languages. In this campaign, we concentrated on three existing NLP platforms (Stanford CoreNLP, NLP Cube, UDPipe) that all provide models for under-resourced languages, and in this first run we covered the 15 under-resourced languages for which models were available. We present the design of the evaluation campaign and report and discuss its results. We considered a difference of within a single percentage point between the reported results and our test results to be within the limits of acceptable tolerance, and we regard such results as reproducible. However, for a number of languages the results are below what was reported in the literature, and in some cases our test results are even better than those reported previously. The evaluation of NERC systems was particularly problematic. One of the reasons is the absence of a universally or cross-lingually applicable named entity classification scheme that would serve the NERC task in different languages, analogous to the Universal Dependencies scheme in the parsing task. Building such a scheme has become one of our future research directions.
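The reproducibility criterion described in the abstract can be illustrated with a minimal sketch. It assumes scores are expressed in percent and that the one-point tolerance is inclusive; the function name `classify_reproduction` and the example scores are hypothetical and are not taken from the paper.

```python
def classify_reproduction(reported: float, measured: float, tolerance: float = 1.0) -> str:
    """Compare a reported score with a re-measured one (both in percent).

    Assumption: a difference of at most `tolerance` percentage points
    (inclusive) counts as a successful reproduction.
    """
    diff = measured - reported
    if abs(diff) <= tolerance:
        return "reproducible"
    # Outside the tolerance: say in which direction the new result deviates.
    return "above reported" if diff > 0 else "below reported"


if __name__ == "__main__":
    # Hypothetical (reported, measured) pairs, for illustration only.
    examples = [
        ("tool A", 95.0, 94.5),
        ("tool B", 90.0, 86.0),
        ("tool C", 80.0, 83.0),
    ]
    for name, reported, measured in examples:
        print(name, classify_reproduction(reported, measured))
```

The three return values correspond to the three outcomes mentioned in the abstract: results within tolerance, results below those in the literature, and results better than previously reported.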
Anthology ID:
2020.lrec-1.230
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1866–1873
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.230
Cite (ACL):
Diego Alves, Gaurish Thakkar, and Marko Tadić. 2020. Evaluating Language Tools for Fifteen EU-official Under-resourced Languages. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1866–1873, Marseille, France. European Language Resources Association.
Cite (Informal):
Evaluating Language Tools for Fifteen EU-official Under-resourced Languages (Alves et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.230.pdf