The Amazing World of Neural Language Generation

Neural Language Generation (NLG) – using neural network models to generate coherent text – is among the most promising methods for automated text creation. Recent years have seen a paradigm shift in neural text generation, driven by advances in deep contextual language modeling (e.g., LSTMs, GPT, GPT-2) and transfer learning (e.g., ELMo, BERT). While these tools have dramatically improved the state of NLG, particularly for low-resource tasks, state-of-the-art NLG models still face many challenges: a lack of diversity in generated text, commonsense violations in depicted situations, difficulties in making use of factual information, and difficulties in designing reliable evaluation metrics. In this tutorial, we will present an overview of the current state-of-the-art in neural network architectures and how they have shaped recent research directions in text generation. We will discuss how and why these models succeed or fail at generating coherent text, and provide insights on several applications.


Introduction
Natural Language Generation (NLG) forms the basis of many Natural Language Processing (NLP) tasks such as document summarization, machine translation, image captioning, conversational dialogue, and creative writing, making it an essential component in human-machine communication. With recent progress in training deep neural networks, there has been a paradigm shift from template-based approaches to neural methods as the predominant building blocks for text generation systems. Specifically, the rich representation learning capabilities of neural networks have allowed NLG models to be trained directly from large amounts of training data, significantly reducing the need for manual feature engineering.
Many benefits have emerged from this new research direction. First, the prototypical framework for training neural networks in an end-to-end fashion has allowed a diverse array of contextual information to be incorporated into text generation systems (Vaswani et al., 2017; Radford et al., 2019; Ziegler et al., 2019; Keskar et al., 2019), allowing for a richer range of stylistic variability in generated text. Simultaneously, the combination of deep neural networks, large-scale text data, and cheap computational power has accelerated new developments in neural network language models.
However, NLG models still face many challenges, which are the focus of a growing body of work. Examples of such limitations are the lack of diversity in generated texts, difficulty in controlling the discourse coherence of generated text, the lack of commonsense in generated outputs, an uncertain reliance on provided factual information, and more general open questions on architecture design and optimization settings.
In this tutorial, we will start with an introduction to neural language generation, presenting neural language models and encoder-decoder models. We will then discuss the capabilities and limitations of recent text generation models, the suitable architectures for text generation in various specific applications, and provide insights into why and how these generation models can be adapted for a particular task (Wiseman et al., 2017; Li et al., 2017; See et al., 2017; Xie, 2017). The discussion of evaluation metrics will range from classical n-gram matching to recent progress on text generation evaluation metrics. Finally, the tutorial will conclude by presenting and discussing major current research directions in the field of neural language generation. All materials (including slides, code, and demos) will be publicly available online on the day of the tutorial. We do not assume any particular prior knowledge in text generation or language modeling. Familiarity with standard neural network modules (LSTM/CNN/Transformer) is a plus but not required. The intended length of the tutorial is 3 hours, including a coffee break.

Overview
This tutorial will mainly focus on recent advances in neural networks for language generation and will have minimal coverage of traditional methods. We will provide an overview of recent progress in neural language generation for those working in this research area, and will also introduce this exciting research area to NLP researchers who are not familiar with the newest advances in neural text generation. This tutorial is designed for anyone with a basic background in NLP or deep learning, which makes it accessible to any attendee of an NLP conference.

Tutorial Organization
Fundamentals and Progression of Neural Text Generation. Interest in neural text generation was recently catalyzed by the renaissance of neural network research in natural language processing, particularly with the development of neural language models and encoder-decoder models. Unlike classical language generation methods, neural language generation models require minimal templates and hand-designed rules, massively reducing the time needed to design and build a new text generation system.
In particular, language models and encoder-decoder models conveniently allow the incorporation of context such as previous or parallel sentences, as exemplified in machine translation models. However, the spectrum of applications of NLG systems extends far beyond machine translation and can involve: (1) complex reasoning processes that go beyond a semantically-preserving mapping from one language to another, for instance to model discourse, dialogue flows, or multi-hop reasoning; (2) a wide range of context information, from memory to multi-modalities like images or speech; and (3) challenging evaluation, as multiple generated outputs can be simultaneously valid for a given context (so-called high-entropy tasks). The tutorial will highlight some of these topics and provide a comprehensive overview of the advances of neural language generation.
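To ground this part of the tutorial, the sketch below shows a minimal neural language model in PyTorch (one possible framework among those we will use; this is an illustrative sketch rather than the tutorial's official code). The vocabulary size, dimensions, and the toy batch of token ids are placeholders.

```python
# Minimal sketch of a neural language model: embed tokens, run an LSTM,
# and predict each next token with a cross-entropy loss (teacher forcing).
import torch
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        hidden_states, _ = self.lstm(self.embed(tokens))
        return self.out(hidden_states)  # logits over the next token at each position

model = TinyLanguageModel()
tokens = torch.randint(0, 10000, (2, 12))   # toy batch of token ids
logits = model(tokens[:, :-1])              # predict token t+1 from tokens up to t
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
)
loss.backward()
```

An encoder-decoder model follows the same pattern, except that the decoder is additionally conditioned on the encoder's representation of a source context (e.g., the sentence to be translated).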
Technical Details for Training and Optimizing Neural Text Generation. Much of the recent progress in neural language generation can be characterized as approaches that address some of the above-mentioned issues. By investigating the differences between language generation and other sequential modeling problems, novel training methods (e.g., reinforcement learning or imitation learning) can be designed to capture long-term dependencies in generation. New decoding methods like top-k sampling, nucleus sampling (Holtzman et al., 2019), or penalized sampling (Keskar et al., 2019) have been invented to address diversity issues.
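To make the decoding discussion concrete, here is a hedged sketch of top-k and nucleus (top-p) sampling over a single vector of next-token logits, in the spirit of Holtzman et al. (2019); it is an illustration we provide here, not code taken from the cited papers.

```python
# Illustrative sketch: filter a next-token distribution with top-k and/or top-p
# (nucleus) truncation, then sample one token id from what remains.
import torch

def sample_next_token(logits, top_k=0, top_p=1.0, temperature=1.0):
    logits = logits / temperature
    if top_k > 0:
        # keep only the k highest-scoring tokens
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float("-inf")
    if top_p < 1.0:
        # keep the smallest set of tokens whose cumulative probability exceeds top_p
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
        to_remove = cum_probs > top_p
        to_remove[1:] = to_remove[:-1].clone()  # shift so the boundary token is kept
        to_remove[0] = False                    # always keep the most probable token
        logits[sorted_idx[to_remove]] = float("-inf")
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

next_token = sample_next_token(torch.randn(10000), top_k=50, top_p=0.9)
```

Penalized sampling (Keskar et al., 2019) can be layered on top of this step by discounting the logits of already-generated tokens before sampling, which further reduces repetition.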
Finally, smarter ways to incorporate various contextual information into neural network models (Golovanov et al., 2019; Ziegler et al., 2019; Radford et al., 2019; Keskar et al., 2019) provide more flexibility as well as a better reliance of the model on its conditioning inputs.
Evaluation of Text Generation. There is also a formidable challenge in designing better metrics to evaluate the quality of generated texts, one that stems from the open-ended nature of these models' outputs. Leveraging recent advances in representation learning, the field of neural language generation has been able to move beyond evaluation methods based on n-gram matching and incorporate promising approaches to design more reliable evaluation metrics. This tutorial will cover recent progress in this field and highlight pressing issues with the current state of experimental reporting in NLG. Together with evaluation, we will overview several text generation benchmarks commonly used in the field.
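As a toy illustration of why n-gram matching struggles with high-entropy generation tasks, the short sketch below computes a BLEU-style clipped n-gram precision on made-up sentences: a perfectly reasonable paraphrase receives a low score simply because it shares few exact n-grams with the single reference.

```python
# Illustrative sketch of n-gram-matching evaluation (clipped n-gram precision).
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    cand_ngrams = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref_ngrams = Counter(zip(*[reference[i:] for i in range(n)]))
    overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped matches
    return overlap / max(sum(cand_ngrams.values()), 1)

reference = "the cat sat on the mat".split()
candidate = "a cat was sitting on the mat".split()   # acceptable paraphrase
print(ngram_precision(candidate, reference, n=1))     # ~0.57 unigram precision
print(ngram_precision(candidate, reference, n=2))     # ~0.33 bigram precision
```

Learned metrics built on contextual representations aim to credit such semantically adequate outputs even when surface overlap with the reference is low.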
Lessons Learned, Future Directions and Practical Advances in Neural Text Generation. The last part of this tutorial will discuss practical issues when using cutting-edge language generation techniques. Most of the content covered in this part will have corresponding code or demos implemented in a standard deep learning framework like PyTorch or TensorFlow. In the concluding part of the tutorial, we will provide a summary of current and future research directions as well as some open questions to open the discussion.

Diversity and Inclusion
Diversity. The backgrounds of the instructors of this tutorial are evenly distributed between academia and industry. The instructors are an assistant professor at the University of Virginia (Yangfeng Ji), a senior Ph.D. student at the University of Washington with years of industry research experience (Antoine Bosselut), and two senior research scientists in industry (Thomas Wolf and Asli Celikyilmaz), both with years of industry research experience. The tutorial instructors also come from different countries and continents (the Netherlands and the USA).

Schedule
The tutorial will be 3 hours long.
1. Introduction to Natural Language Generation (15 minutes long): This section will introduce the tutorial by presenting the recent impact of neural network modeling approaches on the field. We will briefly overview the classical text generation pipeline, and introduce the basic building blocks of neural text generation: language modeling and encoder-decoder frameworks. We will also discuss the limitations of simple encoder-decoder frameworks and motivate the rest of the tutorial.
2. Building blocks of Neural Network Models for Language Generation (60 minutes long): This section will comprise three closely related topics corresponding to three fundamental aspects of building a neural language generation system: (1) designing the model architecture; (2) choosing a learning strategy, such as scheduled sampling (Bengio et al., 2015), unlikelihood training (Welleck et al., 2019), or reinforcement/imitation learning (Kreutzer et al., 2018; Tan et al., 2018; Huang et al., 2019; Du and Ji, 2019), which can help alleviate exposure bias (He et al., 2019) and repetition issues and improve the handling of long-term rewards; and (3) selecting a decoding strategy, from classical methods like greedy decoding, beam search, and random sampling up to more recent techniques like top-k sampling, nucleus sampling (Holtzman et al., 2019), or penalized sampling (Keskar et al., 2019). This section will cover classical techniques (30% of the time) and mainly focus on recent progress on these topics (70% of the time).
3. Break (20 minutes)
4. Generation with Rich Context (25 minutes long): This section will discuss recent work on incorporating various types of context information in neural language generation. Going beyond the simple context provided by a single sentence, we will overview the growing body of work exploring strategies to incorporate different types of context, either textual, e.g., syntactic, topic, and discourse information (Clark et al., 2018), or beyond text, including knowledge graphs, databases, and images (Parthasarathi and Pineau, 2018; Dinan et al., 2018).

5. Benchmarks and Evaluation (30 minutes long): Given the diversity of text generation tasks and domains, it can be challenging to design reliable benchmarks and evaluation metrics (Lowe et al., 2017; Reiter, 2018; See et al., 2019). In this section, we will summarize the current status of these topics.

6. Building Neural Models for Generation (20 minutes long): This section will provide a hands-on exercise, using existing deep learning packages, to build a neural language generation model. It will also demonstrate how different learning/decoding strategies can have a strong impact on the quality of generated texts (a minimal sketch of such a comparison is given after this schedule).
7. Open problems and directions (10 minutes long): In this final section, we will summarize the topics covered in the tutorial and point to a selection of open problems and future research directions.
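As referenced in item 6, the following is a minimal sketch of the kind of hands-on comparison we have in mind, here using a pretrained GPT-2 through the Hugging Face transformers library (one convenient setup; the tutorial notebooks may differ), contrasting greedy decoding with nucleus sampling.

```python
# Illustrative sketch: compare greedy decoding and nucleus sampling with GPT-2.
# The prompt and generation length are placeholders chosen for the example.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("Neural text generation is", return_tensors="pt")

# Greedy decoding: always take the most probable next token (often repetitive).
greedy_ids = model.generate(input_ids, max_length=40)

# Nucleus sampling: sample from the smallest set of tokens covering 90% of the mass.
sampled_ids = model.generate(input_ids, max_length=40, do_sample=True, top_p=0.9, top_k=0)

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(sampled_ids[0], skip_special_tokens=True))
```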

Breadth
We estimate that 30% of the tutorial will cover recent work by the tutorial presenters, and the rest will cover cutting-edge research by other researchers.
Audience Size. Based on the increasing interest in natural language generation (a larger growth rate in submissions compared to other areas of NLP), we anticipate that between 150 and 200 attendees will be interested in this tutorial.

Information about the Presenters
Special Requirements. The tutorial will require internet access for participants to be able to access the slides and, optionally, to access hands-on coding notebooks.
Open Access. We agree to allow the publication of our slides and a video recording of our tutorial in the ACL Anthology. All our materials will additionally be posted on our tutorial website.
Small Reading List.