Learning to Control the Fine-grained Sentiment for Story Ending Generation

Automatic story ending generation is an interesting and challenging task in natural language generation. Previous studies are mainly limited to generating coherent, reasonable, and diversified story endings, and few works focus on controlling the sentiment of story endings. This paper focuses on generating a story ending which meets a given fine-grained sentiment intensity. There are two major challenges in this task. The first is the lack of a story corpus with fine-grained sentiment labels. The second is the difficulty of explicitly controlling sentiment intensity when generating endings. Therefore, we propose a generic and novel framework which consists of a sentiment analyzer and a sentimental generator, respectively addressing the two challenges. The sentiment analyzer adopts a series of methods to acquire sentiment intensities for the story dataset. The sentimental generator introduces the sentiment intensity into the decoder via a Gaussian Kernel Layer to control the sentiment of the output. To the best of our knowledge, this is the first endeavor to control the fine-grained sentiment for story ending generation without manually annotated sentiment labels. Experiments show that our proposed framework can generate story endings which are not only more coherent and fluent but also better able to meet the given sentiment intensity.


Introduction
Story ending generation aims at completing the plot and concluding a story given a story context. Previous works mainly study how to generate a coherent, reasonable, and diversified story ending (Guan et al., 2018). However, few of them focus on controllable story ending generation, especially controlling the sentiment of the ending. Yao et al. (2018b) is the only work on controlling the sentiment for story ending generation. However, their work requires manually labeling the story dataset with sentiment labels (happy, sad, unknown), which is time-consuming and labor-intensive. What's more, they only focus on coarse-grained sentiment. (* Equal contribution. Our code and data can be found at https://github.com/luofuli/sentimental-story-ending)

Figure 1: An example of the input story context and output story endings for this task. All of the story endings are coherent with the story context but express different sentiment intensities. Story context: Sally really loves to play soccer. She joined a team with her friends and she plays everyday. Her coach and her teammates are all really fun. Sally practiced extra hard for her first match. Example endings at increasing sentiment intensities:
"She still lost the game and was very upset."
0.3: "She almost won the game, but eventually lost."
0.5: "The game ended with a draw."
0.7: "She eventually won the game."
0.9: "She won the game and was very proud of her team."
Different from previous work, we propose the task of controlling the sentiment for story ending generation at a fine-grained level, without any human annotation of the story dataset. Taking Figure 1 as an example, given the same story context, our goal is to generate a story ending that satisfies the given sentiment intensity, where 0 denotes the most negative and 1 denotes the most positive, following the setting of the sentiment intensity prediction task (Abdou et al., 2018; Akhtar et al., 2018). For the proposed task, there are two major challenges. First, how to annotate the story corpus with sentiment intensities. Second, how to incorporate fine-grained sentiment control into a generative model.

Figure 2: The overview of the proposed framework, which consists of a sentiment analyzer and a sentimental generator. During training, the target sentiment intensity s is computed by the sentiment analyzer. During testing, users can input any sentiment intensity to control the sentiment for story ending generation.
In this work, we propose a framework which consists of a sentiment analyzer and a sentimental generator. To address the first challenge, the sentiment analyzer adopts three methods, including an unsupervised rule-based method, a regression model, and a domain-adversarial regression model, to acquire sentiment intensities for the story training corpus. To address the second challenge, the sentimental generator uses a sentiment intensity controlled sequence-to-sequence model (SIC-Seq2Seq) to generate a story ending which expresses the given sentiment intensity. It introduces an explicit sentiment intensity control variable into the Seq2Seq model via a Gaussian Kernel Layer to guide the generation.
Experiments show the effectiveness and generality of the proposed framework, since it can generate story endings which are not only coherent and fluent but also able to better meet the given sentiment intensity.

Overview
Here we formulate the task of fine-grained sentiment controllable story ending generation. Given the story context x = (x_1, ..., x_m), which consists of m sentences, and the target sentiment intensity s, the goal of this task is to generate a story ending y that is coherent with the story context x and expresses the target sentiment intensity s. Note that the sentiment intensity s ∈ [0, 1].
Although existing datasets for story ending generation can provide paired data (x, y), the true sentiment s of y is not observable. To remedy this, the sentiment analyzer S employs several methods to acquire the sentiment intensity s of y. Then the sentimental generator G takes the story context x and the sentiment of the story ending s as input to generate the story ending y. The overview of our proposed framework is presented in Figure  2, which is composed of two modules: a sentiment analyzer S and a sentimental generator G. The next two sections will show detailed configurations in each module.

Sentiment Analyzer
The sentiment analyzer S aims to predict the sentiment intensity s of the gold story ending y to construct the paired data (x, s; y). As the first attempt to solve the proposed task, we explore three kinds of sentiment analyzers, as follows.
Rule-based (RB): VADER (Hutto and Gilbert, 2014) is a rule-based unsupervised model for sentiment analysis. We use it to extract the sentiment intensity s of y and then scale s to [0, 1].
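To illustrate the idea (without reproducing VADER's actual lexicon or heuristics), a toy rule-based scorer might look as follows; the lexicon entries here are invented for illustration, and `scale` maps a VADER-style compound score from [-1, 1] into the [0, 1] intensity range used in this task:

```python
# Toy rule-based sentiment intensity scorer in the spirit of VADER.
# The lexicon and scoring are illustrative stand-ins, not VADER itself.
LEXICON = {"won": 0.6, "proud": 0.8, "lost": -0.6, "upset": -0.7, "draw": 0.0}

def raw_score(sentence):
    """Average lexicon score over matched tokens, in [-1, 1]."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def scale(score):
    """Map a VADER-style compound score from [-1, 1] to [0, 1]."""
    return (score + 1.0) / 2.0

s = scale(raw_score("She won the game and was very proud of her team."))
```

In practice the real VADER lexicon and its intensity heuristics (negation, intensifiers, punctuation) would replace the toy `LEXICON` and `raw_score` above.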

Regression Model (RM):
We first train a linear regression model R on the Stanford Sentiment Treebank (SST) (Socher et al., 2013) dataset, which is widely-used for sentiment analysis. Then we use R to acquire the sentiment intensity of y.
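As a sketch of this step, the following fits a least-squares linear regressor on a tiny hypothetical phrase set standing in for SST (bag-of-words features; `corpus`, `featurize`, and `predict` are illustrative names, not the paper's implementation):

```python
import numpy as np

# Hypothetical mini-corpus standing in for SST phrases with intensity labels.
corpus = [("terrible awful", 0.05), ("bad", 0.3), ("okay", 0.5),
          ("good", 0.7), ("great wonderful", 0.95)]
vocab = sorted({w for text, _ in corpus for w in text.split()})

def featurize(text):
    """Bag-of-words count vector over the toy vocabulary."""
    counts = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            counts[vocab.index(w)] += 1
    return counts

X = np.stack([featurize(t) for t, _ in corpus])
y = np.array([s for _, s in corpus])
# Least-squares fit of a linear regressor R(x) = x . w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(text):
    """Predicted sentiment intensity, clipped to [0, 1]."""
    return float(np.clip(featurize(text) @ w, 0.0, 1.0))
```

The real model R in the paper is trained on the full SST dataset with a learned sentence encoder rather than toy bag-of-words counts.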
Domain-Adversarial (DA): In the absence of sentiment annotations for the story dataset, domain adaptation can provide an effective solution, since there exist labeled datasets for a similar task but from a different domain. We use adversarial learning (Ganin and Lempitsky, 2015) to extract a domain-independent feature which not only performs well on the SST sentiment regression task but also misleads the domain discriminator. Finally, we use the adapted regression model to acquire the sentiment intensity s of y.
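The gradient reversal trick of Ganin and Lempitsky (2015) can be sketched in a few lines: the layer is the identity on the forward pass and flips (and scales) the gradient on the backward pass, so the feature extractor learns to mislead the domain discriminator. The scalar setup and the value of `lam` below are illustrative assumptions, not the paper's implementation:

```python
# Minimal illustration of a Gradient Reversal Layer (GRL).
# Forward: identity. Backward: multiply the incoming gradient by -lam,
# so the encoder is trained to make domain classification *harder*.
lam = 1.0  # GRL scaling factor (an assumption for this sketch)

def grl_forward(feature):
    """Identity in the forward direction."""
    return feature

def grl_backward(grad_from_discriminator, lam=lam):
    """Reversed, scaled gradient passed back to the encoder."""
    return -lam * grad_from_discriminator

# Suppose the discriminator's loss gradient asks the feature to move by
# +0.4 to classify the domain better; through the GRL the encoder instead
# receives the opposite signal.
g_disc = 0.4
g_encoder = grl_backward(g_disc)
```

In a real implementation this would be a custom autograd function (e.g. a PyTorch `Function` overriding `backward`) inserted between the encoder and the domain discriminator.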

Sentimental Generator
The sentimental generator G aims to generate story endings that match the target sentiment intensities s. It consists of an encoder and a decoder equipped with a Gaussian Kernel Layer.
The encoder maps the input story context x into a compact vector that captures its essential context features. Specifically, we use a standard bi-directional LSTM as the encoder. All context words x_i are represented by their semantic embeddings E as the input, and the concatenation of the final forward and backward hidden states serves as the initial hidden state of the decoder.

The decoder aims to generate a story ending which accords with the target sentiment intensity s. As shown in Figure 3, the probability of generating a target word is composed of two probabilities:

P(y_t) = α · P_R(y_t) + β · P_S(y_t)

where P_R(y_t) denotes the semantic generation probability, P_S(y_t) denotes the sentiment generation probability, and α and β are trainable coefficients. Specifically, P_R(y_t) is defined as follows:

P_R(y_t = w) = w^T softmax(W_R h_t + b_R)

where w is a one-hot indicator vector of word w, W_R and b_R are trainable parameters, and h_t is the t-th hidden state of the LSTM decoder with the attention mechanism (Luong et al., 2015). P_S(y_t) measures the generation probability of the target word given the target sentiment intensity s. Beyond their semantic embeddings, all words also have sentiment embeddings U, which reflect their sentiment properties. A Gaussian Kernel Layer (Luong et al., 2015) is used to encourage words whose sentiment intensity is near the target sentiment s, and P_S(y_t) is defined as follows:

P_S(y_t = w) ∝ exp( -(S(u_w) - s)^2 / (2σ^2) )

where σ^2 is the variance, S maps the sentiment embedding u_w of word w into a real value via trainable parameters W_U and b_U, and the target sentiment intensity s is the mean of the Gaussian distribution.
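As a concrete illustration, here is a minimal sketch of how the sentiment generation probability could be computed from word-level sentiment values. The vocabulary and its sentiment scalars are hypothetical stand-ins for the learned mapping S(·) (i.e., W_U and b_U applied to sentiment embeddings), and the variance of 1 follows the experiment settings:

```python
import math

# Sketch of the sentiment generation probability from a Gaussian Kernel
# Layer. Each word's learned scalar S(u_w) is hard-coded here for
# illustration instead of being computed from sentiment embeddings.
word_sentiment = {"upset": 0.1, "lost": 0.25, "draw": 0.5,
                  "won": 0.75, "proud": 0.9}

def sentiment_probs(target_s, sigma2=1.0):
    """Gaussian kernel centered at the target intensity, normalized
    over the (toy) vocabulary."""
    kernel = {w: math.exp(-(v - target_s) ** 2 / (2 * sigma2))
              for w, v in word_sentiment.items()}
    z = sum(kernel.values())
    return {w: k / z for w, k in kernel.items()}

probs = sentiment_probs(0.9)  # favors words near intensity 0.9
```

Words whose sentiment value lies close to the target s receive the largest kernel mass, which is exactly how the layer steers decoding toward the requested intensity.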

Dataset
We choose the widely-used ROCStories corpus (Mostafazadeh et al., 2016) which consists of 100k five-sentence stories. We split the data into a training set with 93,126 stories, a validation set with 5,173 stories and a test set with 5,175 stories.

Baselines
Since there is no directly related work on this task, we design an intuitive pipeline (generate-and-modify) as the baseline. It first generates a story ending using a general sequence-to-sequence model with attention (Luong et al., 2015), and then modifies the sentiment of the story ending towards the target sentiment intensity via a fine-grained sentiment modification method (Liao et al., 2018). We call this baseline Seq2Seq + SentiMod.

Experiment Settings
We tune hyper-parameters on the validation set. For the RM and DA sentiment analyzers, we implement the encoder as a 3-layer bidirectional LSTM with a hidden size of 512. We implement the regression module as an MLP with 1 hidden layer of size 32. For domain adaptation, we implement the domain discriminator as an MLP with 1 hidden layer of size 32. A Gradient Reversal Layer is added to the domain discriminator. For the sentimental generator, both the semantic and sentiment embeddings are 256-dimensional and randomly initialized. We implement both the encoder and the decoder as 1-layer bidirectional LSTMs with a hidden size of 512. The variance σ^2 of the Gaussian Kernel Layer is set to 1. The batch size is 32 and the dropout rate (Srivastava et al., 2014) is 0.5. We use the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate of 0.0003.

Evaluation Metrics
For the proposed task, there are no existing accepted metrics. We propose both automatic evaluation and human evaluation for this task.

Automatic Evaluation
Sentiment Consistency: We propose the pairwise sentiment consistency (SentiCons) to evaluate the consistency of two lists of sentiment intensities. For two lists A and B with the same length n, SentiCons(A, B) is calculated by

SentiCons(A, B) = (2 / (n(n-1))) · Σ_{1≤i<j≤n} I[ (A_i - A_j)(B_i - B_j) ≥ 0 ]

where n is the length of the lists and I is the indicator function. To evaluate the performance of the sentiment analyzer, we calculate the SentiCons between human-annotated sentiment intensities and model-predicted sentiment intensities of the gold story endings in the test set (H-M SentiCons). To evaluate the performance of the sentimental generator, for each story context in the test set, we generate five story endings with five target sentiment intensities ranging over [0, 1]. Then we calculate the SentiCons between the input target sentiment intensities and the sentiment intensities of the outputs as predicted by the best sentiment analyzer (I-O SentiCons).

BLEU: For each story in the test set, we take the context x and the human-annotated sentiment intensity s of the gold story ending y as input. The corresponding output is ŷ. Then we calculate the BLEU (Papineni et al., 2002) score between y and ŷ as the overall quality of the generated story endings.
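The pairwise SentiCons metric can be sketched in code as follows; since the original formula is reconstructed here, treat the tie-handling (ties count as agreement) as an assumption:

```python
from itertools import combinations

def senticons(A, B):
    """Pairwise sentiment consistency between two equal-length lists of
    sentiment intensities: the fraction of index pairs whose relative
    order agrees (ties counting as agreement on either side)."""
    assert len(A) == len(B) and len(A) >= 2
    pairs = list(combinations(range(len(A)), 2))
    agree = sum((A[i] - A[j]) * (B[i] - B[j]) >= 0 for i, j in pairs)
    return agree / len(pairs)
```

For example, two lists with identical orderings score 1.0, and two lists with fully reversed orderings score 0.0.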

Human Evaluation
We hire two evaluators who are skilled in English to evaluate the generated story endings. For each story in the test set, we distribute the story context, five target sentiment intensities, and the corresponding generated story endings to the evaluators. Evaluators are required to score the generated endings from 1 to 5 in terms of three criteria: Coherency, Fluency, and Sentiment. Coherency measures whether the endings are coherent with the context. Fluency measures whether the endings are fluent. Sentiment measures how well the endings express the target sentiment intensities.

Evaluation Results
Table 1 shows the automatic evaluation results of the three sentiment analyzers. We find that: (1) The rule-based method RB performs the best. This accords with the fact that story endings in the ROCStories corpus are simple and contain relatively obvious emotional words. (2) DA cannot improve the performance of sentiment analysis in our task compared to RM. We hypothesize that this is because the domains of the labeled SST corpus and the ROCStories corpus differ so much that domain adaptation is hurt.

The automatic and human evaluation results of the four generation models are shown in Table 2 and Table 3, respectively. We have the following observations: (1) The three models based on our proposed framework show no obvious performance differences in terms of BLEU, Coherency, and Fluency. Meanwhile, all of them largely outperform the Seq2Seq+SentiMod baseline, which does not follow our framework. This shows the effectiveness of the proposed framework. (2) H-M SentiCons, which measures the performance of the sentiment analyzer, is marginally consistent with the I-O SentiCons and Sentiment scores, which measure the performance of the sentimental generator. This accords with our expectations, because the sentimental generator takes the sentiment intensity predicted by the sentiment analyzer as the input signal for controlling the sentiment of the output.
From a comprehensive perspective, our framework can better control the sentiment while guaranteeing the coherency and fluency.

Case Study
We provide an example of story ending generation with five different target sentiment intensities in Table 4. This demonstrates that our proposed framework can generate more fluent and coherent story endings than the Seq2Seq + SentiMod baseline, which does not follow our framework. More importantly, our framework at the same time has better control over the sentiment tendencies of the generated story endings, e.g. "in trouble" → "embarrassed" → "able to" → "excited" → "happy" and "new car".

Story Context
Madison really wanted to buy a new car. She applied to work at different restaurants around town. One day a local restaurant hired her to be their new waitress! Molly worked very hard as a waitress and earned a lot of tips.

Outputs (Seq2Seq + SentiMod)
s = 0.1: Dates sangria and drinks went loved the drinks!
s = 0.3: Madison was never in once some showed up.
s = 0.5: Madison's finally cut and delicious wine.
s = 0.7: Madison was happy so new great hospital!
s = 0.9: Tom and satisfied big meal and sweet!

Outputs (SIC-Seq2Seq + RB)
s = 0.1: Madison got in trouble for not buying the car again.
s = 0.3: Madison was so embarrassed that she threw her car out.
s = 0.5: Madison was able to buy her car.
s = 0.7: Madison was so excited to be able to buy her car!
s = 0.9: Madison was happy to have a new car and be happy with her new car!

Related Work
Story Generation: Automatic story generation has attracted interest over the past few years. Recently, many approaches have been proposed to generate better stories in terms of coherence (Jain et al., 2017), rationality, and topic-consistency (Yao et al., 2018a). However, most story generation methods lack the ability to receive guidance from users to achieve a specific goal. Only a few works focus on the controllability of story generation, especially on sentiment. Tambwekar et al. (2018) introduce a policy gradient learning approach to ensure that the model ends with a specific type of event given in advance. Yao et al. (2018b) use manually annotated story data to control the ending valence and storyline of story generation. Different from them, our proposed framework can acquire distant sentiment labels without depending on human annotations.
Sentimental Text Generation: Generating sentimental and emotional texts is a key step towards building intelligent and controllable natural language generation systems. To date, several works on dialogue generation (Zhou and Wang, 2018) and text sentiment transfer (Luo et al., 2019) have studied generating emotional or sentimental text. They typically pre-define a binary sentiment label (positive/negative) or a small limited set of emotions, such as "anger" and "love". Different from them, controlling the fine-grained sentiment (a numeric value) for story ending generation is not limited to a few emotion labels, so we cannot embed each sentiment label into a separate vector as usual. Therefore, we propose to introduce the numeric sentiment value via a Gaussian Kernel Layer.

Conclusion and Future Work
In this paper, we make the first endeavor to control the fine-grained sentiment for story ending generation. The proposed framework is generic and novel, and does not need any human annotation of the story dataset. Experiments show the effectiveness of the proposed framework in controlling the sentiment intensity under both automatic and human evaluation. Future work can combine the analyzer and the generator via joint training, hopefully achieving better results.