A Positional Tagset for Russian

Jirka Hana, Anna Feldman


Abstract
Fusional languages have rich inflection. As a consequence, tagsets capturing their morphological features are necessarily large. A natural way to make a tagset manageable is to use a structured system. In this paper, we present a positional tagset for describing morphological properties of Russian. The tagset was inspired by the Czech positional system (Hajic, 2004). We have used preliminary versions of this tagset in our previous work (e.g., Hana et al., 2004, 2006; Feldman, 2006; Feldman and Hana, 2010). Here, we systematize and extend these preliminary versions (by adding information about animacy, aspect, and reflexivity), give a more detailed description of the tagset, and provide a comparison with the Czech system. Each tag of the tagset consists of 16 positions, each encoding one morphological feature (part-of-speech, detailed part-of-speech, gender, animacy, number, case, possessor's gender and number, person, reflexivity, tense, aspect, degree of comparison, negation, voice, variant). The tagset contains approximately 2,000 tags.
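To illustrate the positional idea, here is a minimal sketch of how a 16-character tag could be split into named features. The position order follows the list in the abstract. Everything else is an assumption made for illustration and is not taken from the paper: the names `POSITIONS` and `parse_tag`, the use of `-` as a "not applicable" placeholder, and the value codes in the example tag.

```python
from typing import Dict

# The 16 positions, in the order the abstract lists them.
POSITIONS = (
    "pos", "subpos", "gender", "animacy", "number", "case",
    "possessor_gender", "possessor_number", "person", "reflexivity",
    "tense", "aspect", "degree", "negation", "voice", "variant",
)

# Assumption: a placeholder character marks features that do not apply.
NOT_APPLICABLE = "-"


def parse_tag(tag: str) -> Dict[str, str]:
    """Map each character of a positional tag to its feature name.

    Positions holding the placeholder are omitted from the result.
    """
    if len(tag) != len(POSITIONS):
        raise ValueError(
            f"expected {len(POSITIONS)} positions, got {len(tag)}: {tag!r}"
        )
    return {
        name: value
        for name, value in zip(POSITIONS, tag)
        if value != NOT_APPLICABLE
    }


# Hypothetical tag: the value codes are illustrative, not the paper's inventory.
example = "NNFIS1----------"
features = parse_tag(example)
```

Because every feature sits at a fixed offset, a single feature can be read directly by index (for example, `example[5]` for case) without parsing the rest of the tag, which is what keeps a tagset of about 2,000 tags manageable.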
Anthology ID:
L10-1555
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/807_Paper.pdf
Cite (ACL):
Jirka Hana and Anna Feldman. 2010. A Positional Tagset for Russian. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
A Positional Tagset for Russian (Hana & Feldman, LREC 2010)