Variation in Coreference Strategies across Genres and Production Media

Berfin Aktaş, Manfred Stede


Abstract
In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution for both spoken language and social media, we undertake a corpus study involving the various genre sections of OntoNotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that have previously been applied individually to different data sets, we find fairly clear patterns of “behavior” for the different genres/media. Besides their relevance to psycholinguistic investigation (why do we employ different coreference strategies when we write or speak?) and to the placement of Twitter on the spoken–written continuum, we see our results as a contribution toward genre-/media-specific coreference resolution.
Anthology ID:
2020.coling-main.508
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5774–5785
URL:
https://aclanthology.org/2020.coling-main.508
DOI:
10.18653/v1/2020.coling-main.508
Cite (ACL):
Berfin Aktaş and Manfred Stede. 2020. Variation in Coreference Strategies across Genres and Production Media. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5774–5785, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Variation in Coreference Strategies across Genres and Production Media (Aktaş & Stede, COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.508.pdf
Code:
berfingit/coreference-variation
Data:
OntoNotes 5.0