Unsupervised Sentence-embeddings by Manifold Approximation and Projection

Subhradeep Kayal


Abstract
The concept of unsupervised universal sentence encoders has gained traction recently, wherein pre-trained models generate effective task-agnostic fixed-dimensional representations for phrases, sentences and paragraphs. Such methods are of varying complexity, from simple weighted-averages of word vectors to complex language-models based on bidirectional transformers. In this work we propose a novel technique to generate sentence-embeddings in an unsupervised fashion by projecting the sentences onto a fixed-dimensional manifold with the objective of preserving local neighbourhoods in the original space. To delineate such neighbourhoods we experiment with several set-distance metrics, including the recently proposed Word Mover’s distance, while the fixed-dimensional projection is achieved by employing a scalable and efficient manifold approximation method rooted in topological data analysis. We test our approach, which we term EMAP or Embeddings by Manifold Approximation and Projection, on six publicly available text-classification datasets of varying size and complexity. Empirical results show that our method consistently performs similarly to or better than several alternative state-of-the-art approaches.
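The first stage described in the abstract, delineating neighbourhoods with a set-distance between sentences, can be illustrated with a minimal sketch. It is not the author's implementation: the symmetric Hausdorff distance stands in here for the set-distance (the abstract names Word Mover's distance as one option), and the toy word vectors, `hausdorff`, and `pairwise_distances` are hypothetical names and values chosen for illustration.

```python
import math
from itertools import combinations

# Toy 2-d word vectors; hypothetical values, not taken from any real model.
TOY_VECTORS = {
    "cat": (1.0, 0.0),
    "kitten": (0.9, 0.1),
    "dog": (0.8, 0.3),
    "stock": (0.0, 1.0),
    "market": (0.1, 0.9),
}


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty sets of vectors."""
    def directed(x, y):
        # Largest distance from a point in x to its nearest point in y.
        return max(min(math.dist(p, q) for q in y) for p in x)

    return max(directed(a, b), directed(b, a))


def pairwise_distances(sentences, vectors=TOY_VECTORS, set_distance=hausdorff):
    """Build a symmetric sentence-by-sentence distance matrix.

    Each sentence is treated as the set of vectors of its known words.
    Assumes every sentence contains at least one known word.
    """
    sets = [[vectors[w] for w in s.split() if w in vectors] for s in sentences]
    n = len(sets)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = set_distance(sets[i], sets[j])
    return dist


sentences = ["cat kitten", "dog cat", "stock market"]
D = pairwise_distances(sentences)
```

A matrix like `D` could then be passed to any manifold learner that accepts precomputed distances (for example, umap-learn's `UMAP(metric="precomputed")`, an assumption about tooling rather than something the abstract specifies) to obtain the fixed-dimensional sentence embeddings.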
Anthology ID:
2021.eacl-main.1
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Editors:
Paola Merlo, Jörg Tiedemann, Reut Tsarfaty
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1–11
URL:
https://aclanthology.org/2021.eacl-main.1
DOI:
10.18653/v1/2021.eacl-main.1
Cite (ACL):
Subhradeep Kayal. 2021. Unsupervised Sentence-embeddings by Manifold Approximation and Projection. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1–11, Online. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Sentence-embeddings by Manifold Approximation and Projection (Kayal, EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.1.pdf
Code:
DeepK/distance-embed