N³ - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format

Michael Röder, Ricardo Usbeck, Sebastian Hellmann, Daniel Gerber, Andreas Both


Abstract
Extracting Linked Data following the Semantic Web principle from unstructured sources has become a key challenge for scientific research. Named Entity Recognition and Disambiguation are two basic operations in this extraction process. One step towards the realization of the Semantic Web vision and the development of highly accurate tools is the availability of data for validating the quality of processes for Named Entity Recognition and Disambiguation as well as for algorithm tuning. This article presents three novel, manually curated and annotated corpora (N³). All of them are based on a free license and stored in the NLP Interchange Format to leverage the Linked Data character of our datasets.
Anthology ID:
L14-1662
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3529–3533
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/856_Paper.pdf
Cite (ACL):
Michael Röder, Ricardo Usbeck, Sebastian Hellmann, Daniel Gerber, and Andreas Both. 2014. N³ - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3529–3533, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
N³ - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format (Röder et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/856_Paper.pdf