Manifold Learning-based Word Representation Refinement Incorporating Global and Local Information

Wenyu Zhao, Dong Zhou, Lin Li, Jinjun Chen


Abstract
Recent studies show that word embedding models often underestimate similarities between similar words and overestimate similarities between distant words. As a result, the word similarities obtained from embedding models are inconsistent with human judgment. Manifold learning-based methods are widely used to refine word representations by re-embedding word vectors from the original embedding space into a new, refined semantic space. These methods mainly focus on preserving local geometry information by performing a weighted locally linear combination between words and their neighbors twice. However, the reconstruction weights are easily influenced by the choice of neighboring words, and the whole combination process is time-consuming. In this paper, we propose two novel word representation refinement methods leveraging isometry feature mapping and local tangent space, respectively. Unlike previous methods, our first method corrects pre-trained word embeddings by preserving the global geometry information of all words instead of the local geometry information between words and their neighbors. Our second method refines word representations by aligning the original and refined embedding spaces based on local tangent space instead of performing a weighted locally linear combination twice. Experimental results on standard semantic relatedness and semantic similarity tasks show that our methods outperform various state-of-the-art baselines for word representation refinement.
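
As a rough illustration of what "preserving global geometry information" can mean, the sketch below computes approximate geodesic distances over a k-nearest-neighbor graph, which is the distance step that Isomap-style methods (the isometry feature mapping named above) rely on before re-embedding. It is not the authors' implementation: the function names `euclidean` and `geodesic_distances`, the parameter `k`, and the toy vectors are all assumptions made for this example, and a full method would still need to embed the resulting distance matrix (for instance with classical multidimensional scaling).

```python
import math


def euclidean(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def geodesic_distances(vectors, k):
    """Approximate geodesic distances over a symmetric k-NN graph.

    Hypothetical helper for illustration: edges connect each vector to its
    k nearest neighbors, and all-pairs shortest paths (Floyd-Warshall) give
    the distances measured along the graph rather than through the space.
    """
    n = len(vectors)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Indices of the k nearest neighbors of i, excluding i itself.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(vectors[i], vectors[j]),
        )[:k]
        for j in neighbors:
            d = euclidean(vectors[i], vectors[j])
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)  # keep the graph symmetric
    # Floyd-Warshall: relax every pair through every intermediate node.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# Toy "embeddings": five points on a semicircle (assumed data, not real words).
angles = (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4, math.pi)
toy_vectors = [(math.cos(t), math.sin(t)) for t in angles]

geo = geodesic_distances(toy_vectors, k=2)
straight = euclidean(toy_vectors[0], toy_vectors[-1])
```

On this toy data the graph distance between the two endpoints, `geo[0][4]`, is larger than the straight-line distance `straight`, because the path has to follow the curve through intermediate points. Preserving such along-the-manifold distances for all pairs of points, rather than only the weights between each point and its neighbors, is the general idea behind a global approach.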
Anthology ID:
2020.coling-main.301
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3401–3412
URL:
https://aclanthology.org/2020.coling-main.301
DOI:
10.18653/v1/2020.coling-main.301
Cite (ACL):
Wenyu Zhao, Dong Zhou, Lin Li, and Jinjun Chen. 2020. Manifold Learning-based Word Representation Refinement Incorporating Global and Local Information. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3401–3412, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Manifold Learning-based Word Representation Refinement Incorporating Global and Local Information (Zhao et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.301.pdf