The KiezDeutsch Korpus (KiDKo) Release 1.0

Ines Rehbein, Sören Schalowski, Heike Wiese


Abstract
This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multiethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data.
Anthology ID:
L14-1062
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3927–3934
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1081_Paper.pdf
Cite (ACL):
Ines Rehbein, Sören Schalowski, and Heike Wiese. 2014. The KiezDeutsch Korpus (KiDKo) Release 1.0. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3927–3934, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
The KiezDeutsch Korpus (KiDKo) Release 1.0 (Rehbein et al., LREC 2014)