Czech Legal Text Treebank 1.0

Vincent Kríž, Barbora Hladká, Zdeňka Urešová


Abstract
We introduce a new member of the family of Prague dependency treebanks. The Czech Legal Text Treebank 1.0 is a morphologically and syntactically annotated corpus of 1,128 sentences. The treebank contains texts from the legal domain, namely documents from the Collection of Laws of the Czech Republic. Legal texts differ from texts in other domains in several language phenomena, influenced by a rather high frequency of very long sentences. Manual annotation of such sentences presents a new challenge. We describe a strategy and tools for this task. The resulting treebank can be explored in various ways: it can be downloaded from the LINDAT/CLARIN repository and viewed locally using the TrEd editor, or it can be accessed online using the KonText and TreeQuery tools.
Anthology ID:
L16-1378
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2387–2392
URL:
https://aclanthology.org/L16-1378
Cite (ACL):
Vincent Kríž, Barbora Hladká, and Zdeňka Urešová. 2016. Czech Legal Text Treebank 1.0. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2387–2392, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Czech Legal Text Treebank 1.0 (Kríž et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1378.pdf