Aleda, a free large-scale entity database for French

Benoît Sagot, Rosa Stern


Abstract
Named entity recognition, which focuses on the identification of the span and type of named entity mentions in texts, has long drawn the attention of the NLP community. However, many real-life applications need to know which real entity each mention refers to. This task, often referred to as entity resolution and linking, requires an inventory of entities to serve as a reference. In this paper, we describe how we extracted such a resource for French from freely available resources (the French Wikipedia and the GeoNames database). We describe the results of an intrinsic evaluation of the resulting entity database, named Aleda, as well as those of a task-based evaluation in the context of a named entity detection system. We also compare it with the NLGbAse database (Charton and Torres-Moreno, 2010), a resource with similar objectives.
Anthology ID:
L12-1664
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1273–1276
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1124_Paper.pdf
Cite (ACL):
Benoît Sagot and Rosa Stern. 2012. Aleda, a free large-scale entity database for French. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1273–1276, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Aleda, a free large-scale entity database for French (Sagot & Stern, LREC 2012)