TWEETSUM: Event oriented Social Summarization Dataset

Ruifang He, Liangliang Zhao, Huanyu Liu


Abstract
As social media has grown popular, millions of users produce a vast number of short, noisy messages whenever a hot event happens. Social summarization systems are therefore increasingly critical for helping people quickly grasp the core information. However, publicly available, high-quality, large-scale social summarization datasets are rare. Constructing such a corpus is difficult and expensive because short texts have very complex social characteristics. In this paper, we construct TWEETSUM, a new event-oriented dataset for social summarization. The original data is collected from Twitter and covers 12 real-world hot events, with a total of 44,034 tweets and 11,240 users. Each event has four expert summaries, and we also evaluate the annotation quality. In addition, we collect further social signals (i.e., user relations, hashtags, and user profiles) and build a user relation network for each event. Besides a detailed description of the dataset, we report the performance of several typical extractive summarization methods on TWEETSUM to establish baselines. To support further research, we will release this dataset to the public.
Anthology ID:
2020.coling-main.504
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5731–5736
URL:
https://aclanthology.org/2020.coling-main.504
DOI:
10.18653/v1/2020.coling-main.504
Cite (ACL):
Ruifang He, Liangliang Zhao, and Huanyu Liu. 2020. TWEETSUM: Event oriented Social Summarization Dataset. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5731–5736, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
TWEETSUM: Event oriented Social Summarization Dataset (He et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.504.pdf