A Proposal for a Part-of-Speech Tagset for the Albanian Language

Besim Kabashi, Thomas Proisl


Abstract
Part-of-speech tagging is a basic and often essential step in Natural Language Processing. Labeling the word forms of a text with fine-grained word-class information adds value to the text and can be a prerequisite for downstream processes such as dependency parsing. Corpus linguists and lexicographers also benefit greatly from the improved search options available with tagged data. The Albanian language has some properties that make it difficult to create a part-of-speech tagset. In this paper, we discuss those difficulties and present a proposal for a part-of-speech tagset that can adequately represent the underlying linguistic phenomena.
Anthology ID:
L16-1682
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4305–4310
URL:
https://aclanthology.org/L16-1682
Cite (ACL):
Besim Kabashi and Thomas Proisl. 2016. A Proposal for a Part-of-Speech Tagset for the Albanian Language. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4305–4310, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Proposal for a Part-of-Speech Tagset for the Albanian Language (Kabashi & Proisl, LREC 2016)
PDF:
https://aclanthology.org/L16-1682.pdf