A Dataset for Anaphora Analysis in French Emails

Hani Guenoune, Kevin Cousot, Mathieu Lafourcade, Melissa Mekaoui, Cédric Lopez


Abstract
In 2019, about 293 billion emails were sent worldwide every day. They are a valuable source of information and knowledge for professionals. Since the 1990s, many studies have been conducted on emails and have highlighted the need for resources for numerous NLP tasks. Because few resources are available for French, very few studies on French emails have been conducted. Anaphora resolution in emails is an unexplored area, and annotated resources are needed, at least to answer a first question: does email communication have specific features that must be addressed to tackle the anaphora resolution task? To answer this question, 1) we build a French email corpus composed of 100 anonymized professional threads and make it freely available for scientific use, and 2) we provide annotations of anaphoric links in the email collection.
Anthology ID:
2020.crac-1.17
Volume:
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
December
Year:
2020
Address:
Barcelona, Spain (online)
Editors:
Maciej Ogrodniczuk, Vincent Ng, Yulia Grishina, Sameer Pradhan
Venue:
CRAC
Publisher:
Association for Computational Linguistics
Pages:
165–175
URL:
https://aclanthology.org/2020.crac-1.17
Cite (ACL):
Hani Guenoune, Kevin Cousot, Mathieu Lafourcade, Melissa Mekaoui, and Cédric Lopez. 2020. A Dataset for Anaphora Analysis in French Emails. In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, pages 165–175, Barcelona, Spain (online). Association for Computational Linguistics.
Cite (Informal):
A Dataset for Anaphora Analysis in French Emails (Guenoune et al., CRAC 2020)
PDF:
https://aclanthology.org/2020.crac-1.17.pdf