Finite-state morphological transducers for three Kypchak languages

Jonathan Washington, Ilnar Salimzyanov, Francis Tyers


Abstract
This paper describes the development of free/open-source finite-state morphological transducers for three Turkic languages (Kazakh, Tatar, and Kumyk), representing one language from each of the three sub-branches of the Kypchak branch of Turkic. The finite-state toolkit used for the work is the Helsinki Finite-State Toolkit (HFST). The paper also describes how the transducer for each subsequent closely related language took less time to develop. An evaluation shows that all the transducers have reasonable coverage (around 90%) on freely available corpora of the languages, and high precision over a manually verified test set.
Anthology ID:
L14-1143
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3378–3385
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1207_Paper.pdf
Cite (ACL):
Jonathan Washington, Ilnar Salimzyanov, and Francis Tyers. 2014. Finite-state morphological transducers for three Kypchak languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3378–3385, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Finite-state morphological transducers for three Kypchak languages (Washington et al., LREC 2014)