PotTS: The Potsdam Twitter Sentiment Corpus

Uladzimir Sidarenka


Abstract
In this paper, we introduce a novel comprehensive dataset of 7,992 German tweets, which were manually annotated by two human experts with fine-grained opinion relations. The rich annotation scheme used for this corpus includes such sentiment-relevant elements as opinion spans, their respective sources and targets, and emotionally laden terms with their possible contextual negations and modifiers. Various inter-annotator agreement studies, which were carried out at different stages of work on these data (at the initial training phase, upon an adjudication step, and after the final annotation run), reveal that labeling evaluative judgements in microblogs is an inherently difficult task even for professional coders. These difficulties, however, can be alleviated by letting the annotators revise each other’s decisions. Once these decisions have been rechecked, the experts can proceed with the annotation of further messages, staying at a fairly high level of agreement.
Anthology ID:
L16-1181
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1133–1141
URL:
https://aclanthology.org/L16-1181
Cite (ACL):
Uladzimir Sidarenka. 2016. PotTS: The Potsdam Twitter Sentiment Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1133–1141, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
PotTS: The Potsdam Twitter Sentiment Corpus (Sidarenka, LREC 2016)
PDF:
https://aclanthology.org/L16-1181.pdf