Reconstructing NER Corpora: a Case Study on Bulgarian

Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov, Alexander Popov


Abstract
The paper reports on the use of deep learning methods for improving a Named Entity Recognition (NER) training corpus and for predicting and annotating new types in a test corpus. We show how the annotations in a type-based corpus of named entities (NE) were populated as occurrences within it, thus ensuring density of the training information. A deep learning model was adopted for discovering inconsistencies in the initial annotation and for learning new NE types. The evaluation results improve after data curation, randomization and deduplication.
Anthology ID:
2020.lrec-1.571
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4647–4652
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.571
Cite (ACL):
Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov, and Alexander Popov. 2020. Reconstructing NER Corpora: a Case Study on Bulgarian. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4647–4652, Marseille, France. European Language Resources Association.
Cite (Informal):
Reconstructing NER Corpora: a Case Study on Bulgarian (Marinova et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.571.pdf