Deep Reinforcement Learning for NLP

Many Natural Language Processing (NLP) tasks (including generation, language grounding, reasoning, information extraction, coreference resolution, and dialog) can be formulated as deep reinforcement learning (DRL) problems. However, because language is discrete and the space of all sentences is infinite, formulating NLP tasks as reinforcement learning problems raises many challenges. In this tutorial, we provide a gentle introduction to the foundations of deep reinforcement learning, as well as practical DRL solutions in NLP. We describe recent advances in designing deep reinforcement learning methods for NLP, with a special focus on generation, dialogue, and information extraction. Finally, we discuss why these methods succeed and when they may fail, aiming to provide practical advice about applying deep reinforcement learning to real-world NLP problems.


Tutorial Description
Deep Reinforcement Learning (DRL) (Mnih et al., 2015) is an emerging research area that involves intelligent agents that learn to reason in Markov Decision Processes (MDP). Recently, DRL has achieved many stunning breakthroughs in Atari games (Mnih et al., 2013) and the game of Go (Silver et al., 2016). In addition, DRL methods have gained significantly more attention in NLP in recent years, because many NLP tasks can be formulated as DRL problems that involve incremental decision making. DRL methods can naturally combine embedding-based representation learning with reasoning, and can optimize for a variety of non-differentiable rewards. However, a key challenge in applying deep reinforcement learning techniques to real-world-scale NLP problems is model design. This tutorial draws connections from theories of deep reinforcement learning to practical applications in NLP.
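To make the notion of a non-differentiable reward concrete, the following is a minimal, illustrative sketch (not from the tutorial itself): a unigram-precision score between a generated sentence and a reference, a toy stand-in for metrics such as BLEU. Because the score is computed over discrete token matches, it has no gradient with respect to model parameters, which is exactly why policy-based RL methods are used to optimize it.

```python
def unigram_precision(hypothesis, reference):
    """Toy non-differentiable reward: the fraction of hypothesis
    tokens that also appear in the reference. An illustrative
    stand-in for BLEU-style generation rewards."""
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = set(reference.lower().split())
    if not hyp_tokens:
        return 0.0
    return sum(tok in ref_tokens for tok in hyp_tokens) / len(hyp_tokens)
```

A policy-gradient learner would treat this score as the scalar reward for a sampled sentence, sidestepping the lack of differentiability.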
We further discuss several critical issues in DRL solutions for NLP tasks, including (1) the efficient and practical design of the action space, state space, and reward functions; (2) the trade-off between exploration and exploitation; and (3) the goal of incorporating linguistic structures in DRL. To address the model design issue, we discuss several recent solutions (He et al., 2016b; Li et al., 2016; Xiong et al., 2017). We then focus on a new case study of hierarchical deep reinforcement learning for video captioning (Wang et al., 2018b), discussing techniques for leveraging hierarchies in DRL for NLP generation problems. This tutorial aims to introduce deep reinforcement learning methods to researchers in the NLP community. We do not assume any particular prior knowledge in reinforcement learning. The intended length of the tutorial is 3 hours, including a coffee break.
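One recurring design question above is how to score actions when the action space is a large set of natural-language strings. A toy sketch of the per-action scoring idea (in the spirit of deep Q-learning with separate state and action representations, as in He et al., 2016b, but not their implementation) is to embed the state text and each candidate action text, and take Q(s, a) as their inner product. The hashed bag-of-words embedding below is a deterministic stand-in for learned encoders.

```python
def _bucket(token, dim):
    # Stable hash bucket for a token (stand-in for a learned embedding).
    return sum(ord(c) for c in token) % dim

def embed(text, dim=16):
    """Toy text embedding: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[_bucket(tok, dim)] += 1.0
    return vec

def q_value(state_vec, action_vec):
    # Q(s, a) modeled as the inner product of the two embeddings,
    # so each candidate action is scored without enumerating a
    # fixed output layer over all possible sentences.
    return sum(s * a for s, a in zip(state_vec, action_vec))

def best_action(state_text, action_texts):
    """Pick the candidate action with the highest Q(s, a)."""
    s = embed(state_text)
    scores = [q_value(s, embed(a)) for a in action_texts]
    return max(range(len(scores)), key=lambda i: scores[i])
```

The key design point is that the action side is computed per candidate, so the same network handles an unbounded, text-valued action space.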

Outline
Representation Learning, Reasoning (Learning to Search), and Scalability are three closely related research subjects in Natural Language Processing. In this tutorial, we touch on the intersection of all three research subjects, covering various aspects of the theory of modern deep reinforcement learning methods, and show their successful applications in NLP. This tutorial is organized in three parts:
• Foundations of Deep Reinforcement Learning. First, we will provide a brief overview of reinforcement learning (RL), and discuss the classic settings in RL. We describe classic methods such as Markov Decision Processes, REINFORCE (Williams, 1992), and Q-learning (Watkins, 1989). We introduce model-free and model-based reinforcement learning approaches, and the widely used policy gradient methods. In this part, we also introduce the modern renovation of deep reinforcement learning (Mnih et al., 2015), with a focus on games (Mnih et al., 2013; Silver et al., 2016).
• Practical Deep Reinforcement Learning: Case Studies in NLP. Second, we will focus on designing practical DRL models for NLP tasks.
In particular, we will take the first deep reinforcement learning solution for dialogue (Li et al., 2016) as a case study. We describe the main contributions of this work, including its design of the reward functions and why they are necessary for dialog. We then introduce the gigantic action space issue for deep Q-learning in NLP (He et al., 2016a,b), along with several solutions. To conclude this part, we discuss interesting applications of DRL in NLP, including information extraction and reasoning.
• Lessons Learned, Future Directions, and Practical Advice for DRL in NLP. Third, we switch from theoretical presentations to an interactive demonstration and discussion session, aiming to transfer the theory of DRL into practical insights. More specifically, we will discuss three important issues: problem formulation and model design, exploration vs. exploitation, and the integration of linguistic structures in DRL. We will showcase a recent study (Wang et al., 2018b) that leverages hierarchical deep reinforcement learning for language and vision, and extend the discussion. Practical advice, including programming advice, will be provided as part of the demonstration.
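The REINFORCE algorithm covered in the first part above can be illustrated with a minimal, self-contained sketch. The two-armed bandit environment, reward values, and hyperparameters here are illustrative assumptions, not material from the tutorial: a softmax policy over action preferences is updated by ascending the gradient of log pi(a) scaled by the observed reward.

```python
import math
import random

def softmax(prefs):
    """Softmax over action preferences (logits)."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(true_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a toy multi-armed bandit: sample an action from
    the softmax policy, observe its reward, and apply the update
    prefs[i] += lr * reward * (indicator(i == a) - pi(i)),
    which is the policy-gradient direction for log pi(a)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_rewards)
    for _ in range(steps):
        probs = softmax(prefs)
        # Sample an action from the current stochastic policy.
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = true_rewards[a]
        # Policy-gradient update: grad of log pi(a) w.r.t. prefs.
        for i in range(len(prefs)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * reward * grad
    return softmax(prefs)

# After training, the policy should concentrate on the
# higher-reward action.
final_probs = reinforce_bandit([1.0, 0.0])
```

In the NLP settings discussed above, the same update applies with sentences as action sequences and a non-differentiable score (e.g., a dialogue reward) in place of the bandit payoff.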

History
The full content of this tutorial has not yet been presented elsewhere, but some parts of it have been presented at the following venues in recent years: