Improved Language Modeling by Decoding the Past

Siddhartha Brahma


Abstract
Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. This biases the model towards retaining more contextual information, in turn improving its ability to predict the next token. With negligible overhead in the number of parameters and training time, our Past Decode Regularization (PDR) method improves perplexity on the Penn Treebank dataset by up to 1.8 points and by up to 2.3 points on the WikiText-2 dataset, over strong regularized baselines using a single softmax. With a mixture-of-softmax model, we show gains of up to 1.0 perplexity points on these datasets. In addition, our method achieves 1.169 bits-per-character on the Penn Treebank Character dataset for character level language modeling.
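To make the abstract's core idea concrete, here is a minimal PyTorch sketch of the past-decode idea as described above: the predicted next-token distribution at step t is also used to decode (reconstruct) the last token of the context at step t, and that auxiliary cross-entropy is added to the usual language-modeling loss. This is an illustrative sketch only; the module and parameter names (LSTMLanguageModelWithPDR, lambda_pdr, past_decoder) are hypothetical and not taken from the paper's implementation, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMLanguageModelWithPDR(nn.Module):
    """LSTM language model with a past-decode regularization term (sketch)."""

    def __init__(self, vocab_size, emb_dim=400, hidden_dim=1150, lambda_pdr=0.001):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, emb_dim)           # project back to embedding space
        self.decoder = nn.Linear(emb_dim, vocab_size)        # next-token softmax
        self.past_decoder = nn.Linear(emb_dim, vocab_size)   # small extra decoder for the past token
        self.lambda_pdr = lambda_pdr

    def forward(self, inputs, targets):
        # inputs:  (batch, seq) context tokens w_1 .. w_T
        # targets: (batch, seq) next tokens   w_2 .. w_{T+1}
        emb = self.embedding(inputs)
        hidden, _ = self.lstm(emb)
        features = self.proj(hidden)                         # (batch, seq, emb_dim)
        logits = self.decoder(features)                      # next-token logits

        # Standard language-modeling loss: predict the next token.
        lm_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                  targets.reshape(-1))

        # Past-decode regularization: form a context vector from the predicted
        # next-token distribution via the embedding matrix, then decode the
        # last context token (inputs) from it.
        probs = F.softmax(logits, dim=-1)                    # (batch, seq, vocab)
        context_vec = probs @ self.embedding.weight          # (batch, seq, emb_dim)
        past_logits = self.past_decoder(context_vec)
        pdr_loss = F.cross_entropy(past_logits.reshape(-1, past_logits.size(-1)),
                                   inputs.reshape(-1))

        # Total loss biases the model towards retaining contextual information.
        return lm_loss + self.lambda_pdr * pdr_loss
```

The extra decoder adds only a small number of parameters and one additional cross-entropy computation per step, consistent with the abstract's claim of negligible overhead in parameters and training time.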
Anthology ID:
P19-1142
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1468–1476
URL:
https://aclanthology.org/P19-1142
DOI:
10.18653/v1/P19-1142
Cite (ACL):
Siddhartha Brahma. 2019. Improved Language Modeling by Decoding the Past. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1468–1476, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Improved Language Modeling by Decoding the Past (Brahma, ACL 2019)
PDF:
https://aclanthology.org/P19-1142.pdf
Data
Penn Treebank, WikiText-2