Discrete Latent Variable Representations for Low-Resource Text Classification

Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu


Abstract
While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient. We consider several approaches to learning discrete latent variable models for text in the case where exact marginalization over these variables is intractable. We compare the performance of the learned representations as features for low-resource document and sentence classification. Our best models outperform the previous best reported results with continuous representations in these low-resource settings, while learning significantly more compressed representations. Interestingly, we find that an amortized variant of Hard EM performs particularly well in the lowest-resource regimes.
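The abstract highlights an amortized variant of Hard EM. The paper's variant amortizes the E-step with a neural encoder over text; as a minimal illustration of the core Hard EM idea only, the sketch below runs hard (argmax) E-steps and exact M-steps on a toy 1-D two-cluster problem (all data and initial values here are hypothetical):

```python
import random

# Toy Hard EM on a two-component 1-D mixture (illustrative only; the
# paper amortizes the E-step with a neural encoder over text).
random.seed(0)
data = [random.gauss(-3.0, 0.5) for _ in range(50)] + \
       [random.gauss(3.0, 0.5) for _ in range(50)]
means = [-1.0, 1.0]  # arbitrary initialization of cluster means

for _ in range(10):
    # E-step: hard argmax assignment of the discrete latent variable
    assign = [min(range(2), key=lambda k: abs(x - means[k])) for x in data]
    # M-step: maximize the complete-data likelihood given hard assignments
    for k in range(2):
        members = [x for x, a in zip(data, assign) if a == k]
        if members:
            means[k] = sum(members) / len(members)

print(sorted(means))  # cluster means recovered near the true centers
```

Soft EM would instead weight each point by its posterior over clusters; the hard argmax makes the latent assignment discrete, which is the property the paper exploits for compact representations.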
Anthology ID: 2020.acl-main.437
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2020
Address: Online
Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4831–4842
URL: https://aclanthology.org/2020.acl-main.437
DOI: 10.18653/v1/2020.acl-main.437
Cite (ACL): Shuning Jin, Sam Wiseman, Karl Stratos, and Karen Livescu. 2020. Discrete Latent Variable Representations for Low-Resource Text Classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4831–4842, Online. Association for Computational Linguistics.
Cite (Informal): Discrete Latent Variable Representations for Low-Resource Text Classification (Jin et al., ACL 2020)
PDF: https://aclanthology.org/2020.acl-main.437.pdf
Video: http://slideslive.com/38929414
Code: shuningjin/discrete-text-rep
Data: AG News