Exploration of register-dependent lexical semantics using word embeddings

Andrey Kutuzov, Elizaveta Kuzmenko, Anna Marakasova


Abstract
We present an approach to detecting differences in lexical semantics across English language registers, using word embedding models from the distributional semantics paradigm. Models trained on register-specific subcorpora of the British National Corpus (BNC) are used to compare lists of nearest associates for particular words and to draw conclusions about how their meaning shifts depending on the register in which they are used. The models are evaluated on the task of register classification using the deep inverse regression approach. Additionally, we present a demo web service that features most of the described models and allows users to explore word meanings in different English registers and to detect the register affiliation of arbitrary texts. The code for the service can be easily adapted to any set of underlying models.
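As a rough illustration of comparing nearest associates across registers, the sketch below uses two tiny hand-made "models" (plain dictionaries of invented 2-dimensional vectors, not trained embeddings and not the paper's data). The register names, the helper names `nearest_associates` and `neighbour_overlap`, and the use of Jaccard overlap as a shift indicator are all assumptions made for this example; the abstract does not specify how the neighbour lists are compared.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_associates(model, word, k=2):
    """Return the k words closest to `word` in one register-specific model."""
    target = model[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in model.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


def neighbour_overlap(model_a, model_b, word, k=2):
    """Jaccard overlap of the two neighbour lists (assumed shift indicator).

    A value near 0 suggests the word keeps different company in the two
    registers; a value near 1 suggests similar usage.
    """
    a = set(nearest_associates(model_a, word, k))
    b = set(nearest_associates(model_b, word, k))
    return len(a & b) / len(a | b)


# Invented toy vectors standing in for two register-specific models.
academic = {
    "cell": [1.0, 0.0],
    "tissue": [0.9, 0.1],
    "membrane": [0.8, 0.2],
    "prison": [0.0, 1.0],
    "jail": [0.1, 0.9],
}
fiction = {
    "cell": [0.0, 1.0],
    "tissue": [0.9, 0.1],
    "membrane": [0.8, 0.2],
    "prison": [0.1, 0.9],
    "jail": [0.2, 0.8],
}

print(nearest_associates(academic, "cell"))
print(nearest_associates(fiction, "cell"))
print(neighbour_overlap(academic, fiction, "cell"))
```

With real models, the dictionaries would be replaced by embeddings trained separately on each register subcorpus, and `k` would typically be larger than 2.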
Anthology ID:
W16-4005
Volume:
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Erhard Hinrichs, Marie Hinrichs, Thorsten Trippel
Venue:
LT4DH
Publisher:
The COLING 2016 Organizing Committee
Pages:
26–34
URL:
https://aclanthology.org/W16-4005
Cite (ACL):
Andrey Kutuzov, Elizaveta Kuzmenko, and Anna Marakasova. 2016. Exploration of register-dependent lexical semantics using word embeddings. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pages 26–34, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Exploration of register-dependent lexical semantics using word embeddings (Kutuzov et al., LT4DH 2016)
PDF:
https://aclanthology.org/W16-4005.pdf
Code:
ElizavetaKuzmenko/dsm_genres