Chunking Historical German

Katrin Ortmann


Abstract
Quantitative studies of historical syntax require large amounts of syntactically annotated data, which are rarely available. The application of NLP methods could reduce manual annotation effort, provided that they achieve sufficient levels of accuracy. The present study investigates the automatic identification of chunks in historical German texts. Because no training data exists for this task, chunks are extracted from modern and historical constituency treebanks and used to train a CRF-based neural sequence labeling tool. The evaluation shows that the neural chunker outperforms an unlexicalized baseline and achieves overall F-scores between 90% and 94% for different historical data sets when POS tags are used as a feature. The experiments demonstrate the usefulness of including historical training data while also highlighting the importance of reducing boundary errors to improve annotation precision.
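To make the evaluation terms in the abstract concrete, the following is a minimal sketch of chunk-level precision, recall, and F-score over BIO-encoded tag sequences. It assumes exact-match scoring of (type, start, end) spans, which the abstract does not specify; the tag names NC and PC, the helper names, and the example sentence are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Set, Tuple

# Assumed representation: (chunk type, start index, end index exclusive).
Chunk = Tuple[str, int, int]


def bio_to_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect (type, start, end) spans from a BIO tag sequence."""
    chunks: Set[Chunk] = set()
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last chunk
        # The open chunk continues only on an I- tag of the same type.
        inside = tag.startswith("I-") and tag[2:] == ctype
        if ctype is not None and not inside:
            chunks.add((ctype, start, i))
            start, ctype = None, None
        # A B- tag, or a stray I- tag with no open chunk, starts a new chunk.
        if tag.startswith("B-") or (tag.startswith("I-") and ctype is None):
            start, ctype = i, tag[2:]
    return chunks


def chunk_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over chunk spans."""
    g, p = bio_to_chunks(gold), bio_to_chunks(pred)
    tp = len(g & p)  # a chunk counts only if type and both boundaries match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical six-token sentence; the prediction ends the second chunk early.
gold = ["B-NC", "I-NC", "O", "B-PC", "I-PC", "I-PC"]
pred = ["B-NC", "I-NC", "O", "B-PC", "I-PC", "O"]
precision, recall, f1 = chunk_prf(gold, pred)
print(precision, recall, f1)
```

In this toy case a single misplaced right boundary turns an otherwise correct chunk into both a false positive and a false negative, which illustrates why boundary errors weigh on precision under exact-match scoring.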
Anthology ID:
2021.nodalida-main.19
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May 31–June 2
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
190–199
URL:
https://aclanthology.org/2021.nodalida-main.19
Cite (ACL):
Katrin Ortmann. 2021. Chunking Historical German. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 190–199, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Chunking Historical German (Ortmann, NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.19.pdf
Code:
rubcompling/nodalida2021