An Out-of-Domain Test Suite for Dependency Parsing of German

Wolfgang Seeker, Jonas Kuhn


Abstract
We present a dependency conversion of five German test sets from five different genres. The dependency representation is made as similar as possible to that of TiGer, one of the two large syntactic treebanks of German. The purpose of these test sets is to enable researchers to evaluate dependency parsing models on data from several different text genres. We discuss some easy-to-compute statistics that demonstrate the variation and differences among the test sets, and we provide baseline experiments that test the effect of additional lexical knowledge on the out-of-domain performance of two state-of-the-art dependency parsers. Finally, we demonstrate with three small experiments that text normalization may be an important step in the standard processing pipeline when it is applied in an out-of-domain setting.
Anthology ID:
L14-1627
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4066–4073
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/809_Paper.pdf
Cite (ACL):
Wolfgang Seeker and Jonas Kuhn. 2014. An Out-of-Domain Test Suite for Dependency Parsing of German. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4066–4073, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
An Out-of-Domain Test Suite for Dependency Parsing of German (Seeker & Kuhn, LREC 2014)