French and German Corpora for Audience-based Text Type Classification

Amalia Todirascu, Sebastian Padó, Jennifer Krisch, Max Kisselew, Ulrich Heid


Abstract
This paper presents some of the results of the CLASSYN project, which investigated the classification of texts according to audience-related text types. We describe the design principles and properties of the linguistically annotated French and German corpora that we created. We report on the tools used to collect the data and on the quality of the syntactic annotation. The CLASSYN corpora comprise two text collections for investigating general text type differences between scientific and popular science text in two domains, medicine and computer science.
Anthology ID:
L12-1286
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1591–1597
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/518_Paper.pdf
Cite (ACL):
Amalia Todirascu, Sebastian Padó, Jennifer Krisch, Max Kisselew, and Ulrich Heid. 2012. French and German Corpora for Audience-based Text Type Classification. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1591–1597, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
French and German Corpora for Audience-based Text Type Classification (Todirascu et al., LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/518_Paper.pdf