ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions

Olga Uryupina, Ron Artstein, Antonella Bristot, Federica Cavicchio, Kepa Rodriguez, Massimo Poesio


Abstract
This paper presents the second release of the ARRAU dataset, a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Since the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively: we have doubled the corpus size, expanded the range of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research on complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.
Anthology ID:
L16-1326
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2058–2062
URL:
https://aclanthology.org/L16-1326
Cite (ACL):
Olga Uryupina, Ron Artstein, Antonella Bristot, Federica Cavicchio, Kepa Rodriguez, and Massimo Poesio. 2016. ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2058–2062, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions (Uryupina et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1326.pdf