Word Mover’s Embedding: From Word2Vec to Document Embedding

Lingfei Wu, Ian En-Hsu Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, Michael J. Witbrock


Abstract
While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending to generate unsupervised sentences or documents embeddings. Recent work has demonstrated that a distance measure between documents called Word Mover’s Distance (WMD) that aligns semantically similar words, yields unprecedented KNN classification accuracy. However, WMD is expensive to compute, and it is hard to extend its use beyond a KNN classifier. In this paper, we propose the Word Mover’s Embedding (WME), a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings. In our experiments on 9 benchmark text classification datasets and 22 textual similarity tasks, the proposed technique consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on problems of short length.
Anthology ID:
D18-1482
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
4524–4534
Language:
URL:
https://aclanthology.org/D18-1482
DOI:
10.18653/v1/D18-1482
Bibkey:
Cite (ACL):
Lingfei Wu, Ian En-Hsu Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, and Michael J. Witbrock. 2018. Word Mover’s Embedding: From Word2Vec to Document Embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4524–4534, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Word Mover’s Embedding: From Word2Vec to Document Embedding (Wu et al., EMNLP 2018)
Copy Citation:
PDF:
https://aclanthology.org/D18-1482.pdf
Attachment:
 D18-1482.Attachment.zip
Code
 IBM/WordMoversEmbeddings