Using Semantic Similarity as Reward for Reinforcement Learning in Sentence Generation

Go Yasui, Yoshimasa Tsuruoka, Masaaki Nagata


Abstract
Traditional model training for sentence generation employs cross-entropy loss as the loss function. While cross-entropy loss has convenient properties for supervised learning, it cannot evaluate sentences as a whole and lacks flexibility. We present an approach that trains the generation model on the estimated semantic similarity between output and reference sentences, alleviating the problems of training with cross-entropy loss. We use a BERT-based scorer fine-tuned on the Semantic Textual Similarity (STS) task to estimate semantic similarity, and train the model on the estimated scores through reinforcement learning (RL). Our experiments show that reinforcement learning with a semantic similarity reward improves BLEU scores over a baseline LSTM NMT model.
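As a concrete illustration of the training scheme the abstract describes, the following is a minimal sketch of a REINFORCE-style policy-gradient update with a semantic-similarity reward, assuming a PyTorch setup. The similarity_score function is a hypothetical stand-in for the paper's BERT-based STS scorer (a dummy token-overlap score is used here so the sketch runs), and reinforce_loss is an illustrative policy-gradient loss, not the authors' implementation.

import torch
import torch.nn.functional as F

def similarity_score(hypothesis_ids, reference_ids):
    # Hypothetical stand-in for a BERT-based scorer fine-tuned on STS.
    # The real reward would be the predicted similarity between the decoded
    # hypothesis and the reference; token overlap keeps the sketch runnable.
    rewards = []
    for hyp, ref in zip(hypothesis_ids, reference_ids):
        hyp_set, ref_set = set(hyp.tolist()), set(ref.tolist())
        rewards.append(len(hyp_set & ref_set) / max(len(ref_set), 1))
    return torch.tensor(rewards)

def reinforce_loss(logits, sampled_ids, rewards, baseline=0.0):
    # logits: (batch, seq_len, vocab); sampled_ids: (batch, seq_len).
    # REINFORCE: scale each sequence log-probability by its (baselined) reward.
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    seq_log_probs = token_log_probs.sum(dim=1)
    advantage = rewards - baseline
    return -(advantage * seq_log_probs).mean()

# Toy usage with random tensors, just to exercise the shapes.
batch, seq_len, vocab = 2, 5, 100
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
sampled = torch.multinomial(F.softmax(logits, dim=-1).view(-1, vocab), 1).view(batch, seq_len)
reference = torch.randint(0, vocab, (batch, seq_len))
loss = reinforce_loss(logits, sampled, similarity_score(sampled, reference))
loss.backward()

In this style of training, the reward model is held fixed while the generator is updated with the policy gradient; the baseline argument (a placeholder scalar here) would normally be chosen to reduce gradient variance. The paper itself should be consulted for the exact reward and baseline formulation.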
Anthology ID:
P19-2056
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Fernando Alva-Manchego, Eunsol Choi, Daniel Khashabi
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
400–406
URL:
https://aclanthology.org/P19-2056
DOI:
10.18653/v1/P19-2056
Cite (ACL):
Go Yasui, Yoshimasa Tsuruoka, and Masaaki Nagata. 2019. Using Semantic Similarity as Reward for Reinforcement Learning in Sentence Generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 400–406, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Using Semantic Similarity as Reward for Reinforcement Learning in Sentence Generation (Yasui et al., ACL 2019)
PDF:
https://aclanthology.org/P19-2056.pdf
Data
GLUE