From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

Zihang Dai, Qizhe Xie, Eduard Hovy


Abstract
In this work, we study the credit assignment problem in reward-augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and entropy-regularized reinforcement learning. Inspired by this connection, we propose two sequence prediction algorithms: one extending RAML with fine-grained credit assignment, and the other improving Actor-Critic with systematic entropy regularization. On two benchmark datasets, we show that the proposed algorithms outperform RAML and Actor-Critic, respectively, providing new alternatives for sequence prediction.
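For readers unfamiliar with the two objectives the abstract connects, the following is a minimal sketch of their standard sequence-level forms from prior work; the notation is an assumption for illustration and is not taken from the paper, whose contribution is the token-level counterpart of this connection. RAML trains with a cross-entropy loss against an exponentiated-payoff distribution around the reference y*, while entropy-regularized RL adds a temperature-scaled entropy bonus to the expected reward; up to constants, the two objectives differ only in the direction of a KL divergence.

% Exponentiated-payoff ("reward-augmented") distribution at temperature \tau (assumed notation)
q(y \mid y^{*}; \tau) \propto \exp\left( r(y, y^{*}) / \tau \right)

% RAML objective: cross-entropy against q rather than the one-hot reference
\mathcal{L}_{\mathrm{RAML}}(\theta) = - \sum_{y} q(y \mid y^{*}; \tau) \, \log p_{\theta}(y \mid x)

% Entropy-regularized RL objective: expected reward plus a temperature-scaled entropy bonus
\mathcal{J}_{\mathrm{ER}}(\theta) = \mathbb{E}_{y \sim p_{\theta}(\cdot \mid x)}\left[ r(y, y^{*}) \right] + \tau \, \mathcal{H}\!\left( p_{\theta}(\cdot \mid x) \right)

% Up to additive constants, the two objectives are the two directions of the same KL divergence:
% \mathcal{L}_{\mathrm{RAML}} = \mathrm{KL}(q \,\|\, p_{\theta}) + \mathrm{const}, \qquad
% -\mathcal{J}_{\mathrm{ER}} = \tau \, \mathrm{KL}(p_{\theta} \,\|\, q) + \mathrm{const}

Per the abstract, the paper studies the token-level counterpart of this relationship, which motivates the fine-grained credit assignment for RAML and the entropy regularization for Actor-Critic.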
Anthology ID:
P18-1155
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1672–1682
URL:
https://aclanthology.org/P18-1155
DOI:
10.18653/v1/P18-1155
Cite (ACL):
Zihang Dai, Qizhe Xie, and Eduard Hovy. 2018. From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1672–1682, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction (Dai et al., ACL 2018)
PDF:
https://aclanthology.org/P18-1155.pdf
Note:
P18-1155.Notes.pdf
Poster:
P18-1155.Poster.pdf
Code:
zihangdai/ERAC-VAML
Data:
MS COCO