A Finite-State Morphological Analyser for Evenki

Anna Zueva, Anastasia Kuznetsova, Francis Tyers


Abstract
Morphological analysis is widely acknowledged to be an important step in automated text processing for morphologically rich languages. Evenki has rich morphology, so a morphological analyser is highly desirable for processing Evenki texts and developing applications for the language. Although two morphological analysers for Evenki have already been developed, they can analyse less than half of the available Evenki corpora. This paper presents a new morphological analyser for Evenki, implemented using the Helsinki Finite-State Transducer toolkit (HFST). The lexc formalism is used to specify the morphotactic rules, which define the valid orderings of morphemes in a word. Morphophonological alternations and orthographic rules are described using the twol formalism. The lexicon is extracted from available machine-readable dictionaries. Since part of the corpora consists of texts in Evenki dialects, a version of the analyser with relaxed rules is developed for processing dialectal features. We evaluate the analyser on the available Evenki corpora, estimating precision, recall and F-score, and obtain coverage scores of between 61% and 87%.
Anthology ID:
2020.lrec-1.314
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2581–2589
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.314
Cite (ACL):
Anna Zueva, Anastasia Kuznetsova, and Francis Tyers. 2020. A Finite-State Morphological Analyser for Evenki. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2581–2589, Marseille, France. European Language Resources Association.
Cite (Informal):
A Finite-State Morphological Analyser for Evenki (Zueva et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.314.pdf