Fine-grained Named Entity Annotations for German Biographic Interviews

Josef Ruppenhofer, Ines Rehbein, Carolina Flinz


Abstract
We present a fine-grained NER annotation scheme with 30 labels and apply it to German data. Building on the OntoNotes 5.0 NER inventory, our scheme is adapted for a corpus of transcripts of biographic interviews by adding categories for AGE and LAN(guage), and it also features extended numeric and temporal categories. Applying the scheme to the spoken data as well as to a collection of teaser tweets from newspaper sites, we can confirm its generality for both domains and achieve good inter-annotator agreement. We also show empirically how our inventory relates to the well-established 4-category NER inventory by re-annotating a subset of the GermEval 2014 NER coarse-grained dataset with our fine-grained label inventory. Finally, we use a BERT-based system to establish baseline models for NER tagging on our two new datasets. Global results in in-domain testing are quite high on the two datasets, near what was achieved for the coarse inventory on the CoNLL 2003 data. Cross-domain testing produces much lower results due to the severe domain differences.
Anthology ID:
2020.lrec-1.566
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4605–4614
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.566
Cite (ACL):
Josef Ruppenhofer, Ines Rehbein, and Carolina Flinz. 2020. Fine-grained Named Entity Annotations for German Biographic Interviews. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4605–4614, Marseille, France. European Language Resources Association.
Cite (Informal):
Fine-grained Named Entity Annotations for German Biographic Interviews (Ruppenhofer et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.566.pdf