A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News

Corbèn Poot, Andreas van Cranenburgh


Abstract
We evaluate a rule-based (Lee et al., 2013) and neural (Lee et al., 2018) coreference system on Dutch datasets of two domains: literary novels and news/Wikipedia text. The results provide insight into the relative strengths of data-driven and knowledge-driven systems, as well as the influence of domain, document length, and annotation schemes. The neural system performs best on news/Wikipedia text, while the rule-based system performs best on literature. The neural system shows weaknesses with limited training data and long documents, while the rule-based system is affected by annotation differences. The code and models used in this paper are available at https://github.com/andreasvc/crac2020
Anthology ID:
2020.crac-1.9
Volume:
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
December
Year:
2020
Address:
Barcelona, Spain (online)
Editors:
Maciej Ogrodniczuk, Vincent Ng, Yulia Grishina, Sameer Pradhan
Venue:
CRAC
Publisher:
Association for Computational Linguistics
Pages:
79–90
URL:
https://aclanthology.org/2020.crac-1.9
Cite (ACL):
Corbèn Poot and Andreas van Cranenburgh. 2020. A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News. In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, pages 79–90, Barcelona, Spain (online). Association for Computational Linguistics.
Cite (Informal):
A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News (Poot & van Cranenburgh, CRAC 2020)
PDF:
https://aclanthology.org/2020.crac-1.9.pdf
Code:
andreasvc/crac2020