XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann


Abstract
We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We annotate the dataset with Plutchik’s eight core emotions plus a neutral label, creating a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
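As a minimal sketch of what a multilabel annotation over Plutchik's core emotions plus neutral could look like in practice, the snippet below encodes a set of emotions as a multi-hot vector. The label ordering, the `encode` and `decode` helper names, and the example sentence labels are assumptions made for illustration; they are not taken from the XED release and may not match its file format.

```python
# Plutchik's eight core emotions plus neutral.
# NOTE: this ordering is an assumption for illustration, not the XED file format.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]


def encode(emotions):
    """Return a multi-hot list with one 0/1 slot per entry in LABELS."""
    emotions = set(emotions)
    unknown = emotions - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in emotions else 0 for label in LABELS]


def decode(vector):
    """Return the label names whose slot is set, in LABELS order."""
    if len(vector) != len(LABELS):
        raise ValueError("vector length does not match number of labels")
    return [label for label, flag in zip(LABELS, vector) if flag]


# A hypothetical sentence annotated with two emotions at once (multilabel).
vector = encode({"joy", "trust"})
print(vector)
print(decode(vector))
```

A multi-hot vector of this kind is a common target representation for multilabel classifiers, since a sentence may carry several emotions at once.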
Anthology ID:
2020.coling-main.575
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
6542–6552
URL:
https://aclanthology.org/2020.coling-main.575
DOI:
10.18653/v1/2020.coling-main.575
Cite (ACL):
Emily Öhman, Marc Pàmies, Kaisla Kajava, and Jörg Tiedemann. 2020. XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6542–6552, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection (Öhman et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.575.pdf
Code:
Helsinki-NLP/XED
Data:
XED, GoEmotions