Prague Dependency Style Treebank for Tamil

Loganathan Ramasamy, Zdeněk Žabokrtský


Abstract
Annotated corpora such as treebanks are important for the development of parsers and other language technology applications, as well as for understanding the language itself. Only a few languages possess such scarce resources. In this paper, we describe our efforts in syntactically annotating a small corpus (600 sentences) of Tamil. Our annotation is similar to that of the Prague Dependency Treebank (PDT) and consists of annotation at two levels, or layers: (i) the morphological layer (m-layer) and (ii) the analytical layer (a-layer). For both layers, we introduce annotation schemes: positional tagging for the m-layer and dependency relations for the a-layer. Finally, we discuss some of the issues in treebank development for Tamil.
Anthology ID:
L12-1242
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1888–1894
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/456_Paper.pdf
Cite (ACL):
Loganathan Ramasamy and Zdeněk Žabokrtský. 2012. Prague Dependency Style Treebank for Tamil. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1888–1894, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Prague Dependency Style Treebank for Tamil (Ramasamy & Žabokrtský, LREC 2012)