Can Topic Modelling benefit from Word Sense Information?

Adriana Ferrugento, Hugo Gonçalo Oliveira, Ana Alves, Filipe Rodrigues


Abstract
This paper proposes a new topic model that exploits word sense information in order to discover less redundant and more informative topics. Word sense information is obtained from WordNet, and the discovered topics are groups of synsets rather than mere surface words. A key feature is that all the known senses of a word are considered, together with their probabilities. Alternative configurations of the model are described and compared to each other and to LDA, the most popular topic model. However, the results obtained suggest that enriching LDA with word sense information brings no benefit.
Anthology ID:
L16-1540
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3387–3393
URL:
https://aclanthology.org/L16-1540
Cite (ACL):
Adriana Ferrugento, Hugo Gonçalo Oliveira, Ana Alves, and Filipe Rodrigues. 2016. Can Topic Modelling benefit from Word Sense Information? In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3387–3393, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Can Topic Modelling benefit from Word Sense Information? (Ferrugento et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1540.pdf
Code:
aferrugento/SemLDA