Auto-Encoding Dictionary Definitions into Consistent Word Embeddings

Tom Bosc, Pascal Vincent


Abstract
Monolingual dictionaries are widespread and semantically rich resources. This paper presents a simple model that learns to compute word embeddings by processing dictionary definitions and trying to reconstruct them. It exploits the inherent recursivity of dictionaries by encouraging consistency between the representations it uses as inputs and the representations it produces as outputs. The resulting embeddings are shown to capture semantic similarity better than regular distributional methods and other dictionary-based methods. In addition, our method shows strong performance when trained exclusively on dictionary data and generalizes in one shot.
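The abstract describes an autoencoder that reads a word's definition, compresses it into that word's embedding, and adds a consistency term tying the produced embedding back to the input embeddings used inside other definitions. A minimal, hypothetical sketch of that idea (the class name, dimensions, and LSTM/linear choices are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class DefinitionAutoencoder(nn.Module):
    """Hypothetical sketch: encode a definition into one vector (the defined
    word's embedding), decode it to reconstruct the definition's words, and
    penalize disagreement between that output embedding and the input
    embedding used when the same word appears inside other definitions."""

    def __init__(self, vocab_size, dim):
        super().__init__()
        self.input_emb = nn.Embedding(vocab_size, dim)  # embeddings fed in
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.Linear(dim, vocab_size)       # scores over def. words

    def forward(self, definition_ids, defined_word_id):
        # Encode the definition; the last hidden state becomes the embedding.
        _, (h, _) = self.encoder(self.input_emb(definition_ids))
        word_emb = h[-1]                                # (batch, dim)
        logits = self.decoder(word_emb)                 # reconstruction scores
        # Consistency penalty: the output embedding should match the input
        # embedding of the defined word, exploiting dictionary recursivity.
        consistency = ((word_emb - self.input_emb(defined_word_id)) ** 2).mean()
        return logits, consistency

model = DefinitionAutoencoder(vocab_size=100, dim=16)
ids = torch.randint(0, 100, (2, 5))   # two toy definitions, 5 tokens each
logits, penalty = model(ids, torch.tensor([3, 7]))
```

The reconstruction loss would be a cross-entropy over `logits` against the definition's words; the released code (`tombosc/cpae`, linked below) contains the authors' actual implementation.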
Anthology ID:
D18-1181
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1522–1532
URL:
https://aclanthology.org/D18-1181
DOI:
10.18653/v1/D18-1181
Cite (ACL):
Tom Bosc and Pascal Vincent. 2018. Auto-Encoding Dictionary Definitions into Consistent Word Embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1522–1532, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Auto-Encoding Dictionary Definitions into Consistent Word Embeddings (Bosc & Vincent, EMNLP 2018)
PDF:
https://aclanthology.org/D18-1181.pdf
Attachment:
 D18-1181.Attachment.zip
Code:
 tombosc/cpae