Centroid-based Text Summarization through Compositionality of Word Embeddings

Gaetano Rossiello, Pierpaolo Basile, Giovanni Semeraro


Abstract
Textual similarity is a crucial aspect of many extractive text summarization methods. A bag-of-words representation cannot capture the semantic relationships between concepts when comparing strongly related sentences that have no words in common. To overcome this issue, in this paper we propose a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings. Evaluations on multi-document and multilingual datasets demonstrate the effectiveness of the continuous vector representation of words compared to the bag-of-words model. Despite its simplicity, our method achieves good performance even in comparison to more complex deep learning models. Our method is unsupervised and can be adopted in other summarization tasks.
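The abstract describes the approach only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes sentences are already tokenized and builds both sentence vectors and the centroid vector by summing word vectors. It then scores each sentence by cosine similarity to the centroid and greedily skips near-duplicates of sentences already chosen. The toy vectors in `EMB`, the hand-picked `centroid_words` argument (how centroid words are selected is not covered here), and the `max_sentences` and `redundancy` parameters are all hypothetical. The first two printed values show the bag-of-words problem from the abstract: two related sentences with no words in common.

```python
import math
from collections import Counter

# Hypothetical toy 3-d word vectors, invented for illustration only;
# a real system would load pre-trained word embeddings instead.
EMB = {
    "car": [1.0, 0.1, 0.0],
    "automobile": [0.9, 0.2, 0.0],
    "engine": [0.8, 0.3, 0.1],
    "vehicle": [0.95, 0.15, 0.05],
    "banana": [0.0, 0.1, 1.0],
    "fruit": [0.1, 0.0, 0.9],
    "the": [0.3, 0.3, 0.3],
}

def embed(tokens, emb):
    """Compose a text vector by summing the vectors of in-vocabulary words."""
    dim = len(next(iter(emb.values())))
    vec = [0.0] * dim
    for tok in tokens:
        if tok in emb:
            vec = [a + b for a, b in zip(vec, emb[tok])]
    return vec

def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def bow_cosine(a, b):
    """Cosine similarity of raw term-count (bag-of-words) vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(c * c for c in ca.values()))
    nb = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(sentences, emb, centroid_words, max_sentences=2, redundancy=0.99):
    """Greedy centroid-based extraction; returns selected sentence indices."""
    centroid = embed(centroid_words, emb)
    vecs = [embed(s, emb) for s in sentences]
    # Rank sentences by similarity to the centroid, best first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[i], centroid), reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Skip a sentence that is too similar to one already chosen.
        if all(cosine(vecs[i], vecs[j]) <= redundancy for j in chosen):
            chosen.append(i)
    return sorted(chosen)  # report in original document order

sentences = [
    ["the", "car", "engine"],
    ["the", "banana", "fruit"],
    ["automobile", "engine"],
    ["the", "vehicle"],
]

# No shared words, so the bag-of-words similarity is zero...
print(bow_cosine(sentences[2], sentences[3]))  # 0.0
# ...while the composed embeddings are close.
print(round(cosine(embed(sentences[2], EMB), embed(sentences[3], EMB)), 3))  # 0.978
print(summarize(sentences, EMB, ["car", "vehicle", "engine"]))  # [2, 3]
```

In this toy run sentence 0 ranks second but is skipped as redundant with sentence 2, so sentence 3 fills the second slot. Summing keeps composition order-independent and cheap. Greedy redundancy filtering is one simple way to avoid repeating near-identical sentences.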
Anthology ID:
W17-1003
Volume:
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
George Giannakopoulos, Elena Lloret, John M. Conroy, Josef Steinberger, Marina Litvak, Peter Rankel, Benoit Favre
Venue:
MultiLing
Publisher:
Association for Computational Linguistics
Pages:
12–21
URL:
https://aclanthology.org/W17-1003
DOI:
10.18653/v1/W17-1003
Cite (ACL):
Gaetano Rossiello, Pierpaolo Basile, and Giovanni Semeraro. 2017. Centroid-based Text Summarization through Compositionality of Word Embeddings. In Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, pages 12–21, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Centroid-based Text Summarization through Compositionality of Word Embeddings (Rossiello et al., MultiLing 2017)
PDF:
https://aclanthology.org/W17-1003.pdf
Code
gaetangate/text-summarizer