Letting Emotions Flow: Success Prediction by Modeling the Flow of Emotions in Books

Books have the power to make us feel happiness, sadness, pain, surprise, or sorrow. An author's dexterity in the use of these emotions captivates readers and makes it difficult for them to put the book down. In this paper, we model the flow of emotions over a book using recurrent neural networks and quantify its usefulness in predicting a book's success. We obtained the best weighted F1-score of 69% for predicting books' success in a multitask setting (simultaneously predicting the success and genre of books).


Introduction
Books have the power to evoke a multitude of emotions in their readers. They can make readers laugh at a comic scene, cry at a tragic scene, and even feel pity or hate for the characters. Specific patterns of emotion flow within books can compel the reader to finish the book, and possibly pursue similar books in the future. Like a musical arrangement, the right emotional rhythm can arouse readers, but even a slight variation in the composition might turn them away. Vonnegut (1981) discussed the potential of plotting emotions in stories on the "Beginning-End" and the "Ill Fortune-Great Fortune" axes. Reagan et al. (2016) used mathematical tools like Singular Value Decomposition, agglomerative clustering, and Self-Organizing Maps (Kohonen et al., 2001) to generate basic shapes of stories. They found that stories are dominated by six different shapes, and even correlated these shapes with the success of books. Mohammad (2011) visualized emotion densities across books of different genres. He found that the progression of emotions varies with the genre; for example, there is a stronger progression into darkness in horror stories than in comedy. Likewise, Kar et al. (2018) showed that movies with similar flows of emotions across their plot synopses were assigned similar sets of tags by viewers. As an example, in Figure 1, we draw the flow of emotions across the book Alice in Wonderland. The plot shows continuous changes in trust, fear, and sadness, which relate to the main character getting into and out of trouble. These patterns present the emotional arcs of the story. Even though they do not reveal the actual plot, they indicate major events happening in the story.
In this paper, we hypothesize that readers enjoy emotional rhythm and thus that modeling emotion flow will help predict a book's potential success. In addition, we show that using the entire content of the book yields better results. Considering only a fragment, as done in earlier work that focuses mainly on style (Maharjan et al., 2017; Ashok et al., 2013), disregards important emotional changes. Similar to Maharjan et al. (2017), we also find that adding genre as an auxiliary task improves success prediction.

Methodology
We extract emotion vectors from different chunks of a book and feed them to a recurrent neural network (RNN) to model the sequential flow of emotions. We aggregate the encoded sequences into a single book vector using an attention mechanism. Attention models have been successfully used in various Natural Language Processing tasks (Wang et al., 2016; Yang et al., 2016; Hermann et al., 2015; Chen et al., 2016; Rush et al., 2015; Luong et al., 2015). This final vector, which is emotionally aware, is used for success prediction. The source code and data for this paper can be downloaded from https://github.com/sjmaharjan/emotion_flow.

Representation of Emotions: The NRC Emotion Lexicons provide ∼14K words (Version 0.92) and their binary associations with eight types of elementary emotions (anger, anticipation, joy, trust, disgust, sadness, surprise, and fear) from the Hourglass of emotions model, along with polarity (positive and negative) (Mohammad and Turney, 2010, 2013). These lexicons have been shown to be effective in tracking emotions in literary texts (Mohammad, 2011).
Inputs: Let X be a collection of books, where each book x ∈ X is represented by a sequence of n chunk emotion vectors, x = (x_1, x_2, ..., x_n), where x_i is the aggregated emotion vector for chunk i, as shown in Figure 2. We divide the book into n chunks based on the number of sentences. We then create an emotion vector for each sentence by counting the presence of the sentence's words in each of the ten categories (eight emotions plus two polarities) of the NRC Emotion Lexicons. Thus, each sentence emotion vector has a dimension of 10. Finally, we aggregate the sentence emotion vectors of a chunk into a single chunk emotion vector by taking their element-wise average and standard deviation. Mathematically, the ith chunk emotion vector x_i is defined as:

x_i = (1/N) Σ_{j=1}^{N} s_ij ⊕ sqrt( (1/N) Σ_{j=1}^{N} (s_ij − s̄_i)² )

where ⊕ denotes concatenation, N is the number of sentences in the chunk, and s_ij and s̄_i are the jth sentence emotion vector and the mean of the sentence emotion vectors for the ith chunk, respectively. Each chunk vector thus has a dimension of 20. The motivation behind using the standard deviation as a feature is to capture the dispersion of emotions within a chunk.

Model: We use bidirectional Gated Recurrent Units (GRUs) to summarize the contextual emotion flow information from both directions. The forward and backward GRUs read the sequence from x_1 to x_n and from x_n to x_1, respectively, computing the forward hidden states (h_1^f, ..., h_n^f) and the backward hidden states (h_1^b, ..., h_n^b). The annotation for each chunk x_i is obtained by concatenating its forward and backward hidden states, i.e. h_i = [h_i^f ; h_i^b]. We then learn the relative importance of these hidden states for the classification task and combine them by taking the weighted sum of all h_i to form the final book vector r:

α_i = softmax( v^T selu(W_a h_i + b_a) )
r = Σ_{i=1}^{n} α_i h_i

where α_i are the attention weights, W_a is the weight matrix, b_a is the bias, v is the weight vector, and selu (Klambauer et al., 2017) is the nonlinear activation function. Finally, we apply a linear transformation that maps the book vector r to the number of classes.
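The chunk-vector construction above can be sketched as follows. This is a minimal illustration: the three-word toy lexicon stands in for the ∼14K-entry NRC lexicon, and all words and association values are invented for the example.

```python
import numpy as np

# Toy stand-in for the NRC Emotion Lexicon: word -> 10-dim binary vector
# (eight emotions + two polarities); the real lexicon has ~14K entries.
EMOTIONS = ["anger", "anticipation", "joy", "trust", "disgust",
            "sadness", "surprise", "fear", "positive", "negative"]
LEXICON = {
    "happy": np.array([0, 1, 1, 1, 0, 0, 0, 0, 1, 0]),
    "death": np.array([1, 0, 0, 0, 0, 1, 0, 1, 0, 1]),
    "gift":  np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 0]),
}

def sentence_vector(sentence):
    """Count lexicon hits per emotion category -> 10-dim sentence vector."""
    vec = np.zeros(len(EMOTIONS))
    for word in sentence.lower().split():
        if word in LEXICON:
            vec += LEXICON[word]
    return vec

def chunk_vector(sentences):
    """Mean and std of the chunk's sentence vectors -> 20-dim chunk vector."""
    s = np.stack([sentence_vector(sent) for sent in sentences])
    return np.concatenate([s.mean(axis=0), s.std(axis=0)])

chunk = chunk_vector(["a happy gift", "death came suddenly"])
assert chunk.shape == (20,)  # 10 means followed by 10 standard deviations
```

The standard-deviation half of the vector is what captures the within-chunk dispersion of emotions described above.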
In the single-task (ST) setting, where we only predict success, we apply a sigmoid activation to get the final prediction probabilities and compute errors using the binary cross entropy loss. In the multitask (MT) setting, where we predict both success and genre (Maharjan et al., 2017), we additionally apply a softmax activation to get the prediction probabilities for genre. Here, we add the losses from both tasks, i.e. L_total = L_suc + L_gen (where L_suc and L_gen are the success and genre task losses, respectively), and then train the network using backpropagation.
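A minimal numpy sketch of this loss combination follows; the logits and labels are invented for illustration, whereas in the real model they would come from the linear transformation of the attended book vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def multitask_loss(success_logit, genre_logits, y_success, y_genre):
    """L_total = L_suc + L_gen: binary cross entropy for success,
    categorical cross entropy for genre."""
    p_suc = sigmoid(success_logit)
    l_suc = -(y_success * np.log(p_suc) + (1 - y_success) * np.log(1 - p_suc))
    p_gen = softmax(genre_logits)
    l_gen = -np.log(p_gen[y_genre])
    return l_suc + l_gen

# Illustrative call: one success logit, three genre logits
loss = multitask_loss(0.3, np.array([1.0, 0.2, -0.5]), y_success=1, y_genre=0)
assert loss > 0
```

In the ST setting only the first term would be computed; the MT setting backpropagates the sum of both.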

Dataset
We experimented with the dataset introduced by Maharjan et al. (2017). The dataset consists of 1,003 books from eight different genres collected from Project Gutenberg. The authors considered only those books that were reviewed by at least ten reviewers. They categorized these books into two classes, Successful (654 books) and Unsuccessful (349 books), based on the average rating for the books on the Goodreads website. They considered only the first 1K sentences of each book.

Baselines
We compare our proposed method with the following baselines:

Majority Class: The majority class in the training data is success. This baseline obtains a weighted F1-score of 0.506 for all the test instances.

SentiWordNet+SVM: Maharjan et al. (2017) used SentiWordNet (Baccianella et al., 2010) to compute sentiment features, along with counts of different Part of Speech (PoS) tags, for every 50 consecutive sentences (20 chunks from 1K sentences) and used an SVM classifier.

NRC+SVM: We concatenate the chunk emotion vectors (x_i) created using the NRC lexicons and feed them to an SVM classifier. We experiment by varying the number of book chunks.
These baseline methods do not incorporate the sequential flow of emotions across the book and treat the features independently of one another.
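The NRC+SVM feature construction can be sketched as follows; random data stands in for real chunk emotion vectors, and the flattened matrix would then be fed to an off-the-shelf SVM classifier (e.g. from scikit-learn).

```python
import numpy as np

def nrc_svm_features(chunk_vectors):
    """Concatenate a book's 20-dim chunk emotion vectors into one flat
    feature vector. The SVM then treats every dimension independently,
    discarding the sequential order of the chunks."""
    return np.concatenate(chunk_vectors)

# Illustrative data: 3 books, 20 chunks each, 20-dim chunk vectors
rng = np.random.default_rng(0)
books = [rng.random((20, 20)) for _ in range(3)]
X = np.stack([nrc_svm_features(b) for b in books])
assert X.shape == (3, 400)  # one row per book, ready for an SVM classifier
```

Flattening the sequence is precisely what loses the ordering information that the RNN model preserves.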

Experimental Setup
We experimented with the same random stratified splits of a 70:30 training-to-test ratio as used by Maharjan et al. (2017). We use the SVM algorithm for the baselines and an RNN for our proposed emotion flow method. We tuned the C hyperparameter of the SVM classifier by performing grid search over the values {1e-4, ..., 1e4}, using three-fold cross validation on the training split. For the experiments with RNNs, we first took a random stratified split of 20% of the training data as a validation set. We then tuned the RNN hyperparameters by running 20 experiments with randomly selected hyperparameter values, including the weight initialization (Glorot Uniform (Glorot and Bengio, 2010), LeCun Uniform (LeCun et al., 1998)) and the learning rate for Adam (Kingma and Ba, 2015).

Results

Table 1 presents the results. Our proposed method performs better than the baseline methods and obtains the highest weighted F1-score of 0.690. The results highlight the importance of taking into account the sequential flow of emotions across books to predict how much readers will like a book. We obtain better performance when we feed the sequences of emotion chunk vectors to an RNN. The performance decreases with the SVM classifier, which discards this sequential information by treating the features independently of one another. Moreover, increasing the granularity of the emotions by increasing the number of chunks seems to be helpful for success prediction. However, we see a slight decrease in performance beyond 50 chunks (weighted F1-scores of 0.662 and 0.664 for 60 and 100 chunks, respectively).
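The 20-trial random hyperparameter search described in the setup above might look like the following sketch. The two initializers are the ones named in the text; the remaining hyperparameter names and value ranges are illustrative assumptions, not values from the paper.

```python
import random

SEARCH_SPACE = {
    "init": ["glorot_uniform", "lecun_uniform"],  # named in the setup above
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],    # illustrative Adam rates
    "hidden_units": [32, 64, 128],                # illustrative GRU sizes
}

def sample_configs(n_trials, seed=13):
    """Draw n_trials random configurations from the search space."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
            for _ in range(n_trials)]

configs = sample_configs(20)  # 20 random trials, as in the setup above
assert len(configs) == 20
```

Each configuration would then be trained on the training split and scored on the 20% validation split, keeping the best-performing one.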
The results also show that the MT setting is beneficial over the ST setting, whether we consider the first 1K sentences or the entire book. This finding is consistent with Maharjan et al. (2017). Like them, we suspect that the auxiliary task of genre classification acts as a regularizer.
Considering only the first 1K sentences of books may miss important details, especially when the only input to the model is the distribution of emotions. It is necessary to include information from the later chapters and the climax of the story, as they gradually reveal the answers to the suspense, the events, and the emotional ups and downs in characters that build up through the course of the book. Accordingly, our results show that it is important to consider emotions from the entire book rather than from just the first 1K sentences.

From Figure 3, we see that using the attention mechanism to aggregate vectors is better than just concatenating the final forward and backward hidden states to represent the book, in both ST and MT settings. We also observe that the multitask approach performs better than the single-task one regardless of the number of chunks and the use of attention.

Figure 4 plots the heatmap of the average attention weights for test books grouped by their genre. The model has learned that the last two to three chunks, which represent the climax, are most important for predicting success. Since this is a bidirectional RNN model, the hidden representation of each chunk carries information from the whole book; thus, using only the last chunks would probably result in lower performance. The weight visualization also shows an interesting pattern for Poetry, where attention is distributed across different regions of the book. This may be due to sudden important events or abrupt changes in emotions. For Short stories, the initial chunks also receive some weight, suggesting the importance of the premise.

Climax Emotions: Since the last chunk is assigned more weight than the other chunks, we used information gain to rank the features of that chunk. From Figure 5, we see that features capturing the variation of different emotions are ranked higher than features capturing the average scores.
This suggests that readers tend to enjoy the emotional ups and downs portrayed in books, making the standard deviation features more important than the average features for the same emotions. Table 2 shows the mean (µ) and standard deviation (σ) of different emotions in the last chunk, computed over all the data and further broken down by the Successful and Unsuccessful labels. We see that authors generally end books with a higher rate of positive words (µ = 0.888) than negative words (µ = 0.599), and the difference is significant (p < 0.001). Similarly, the means for anticipation, joy, trust, and fear are higher than those for sadness, surprise, anger, and disgust. This further validates that authors prefer happy endings. Moving on to the Successful and Unsuccessful categories, we see that the means for Successful books are higher than those for Unsuccessful books for anger, anticipation, disgust, fear, joy, and sadness (highly significant, p < 0.001). We observe the same pattern for trust and surprise, although the p value is only p < 0.02 in this case. Moreover, the standard deviations of all emotions are significantly different across the two categories (p < 0.001). Thus, both the emotion concentration (µ) and the variation (σ) of Successful books are higher than those of Unsuccessful books for all emotions in the NRC lexicon.

Emotion Shapes: We visualize the prominent emotion flow shapes in the dataset using the K-means clustering algorithm. We took the average joy across 50 chunks for all books and clustered them into 100 different clusters. We then plotted the smoothed centroids of clusters containing ≥ 20 books. We found two distinct shapes: "Man in the hole" (fall to rise) and "Tragedy" or "Riches to rags" (fall). Figure 6 shows these centroid plots. The plot also shows that the "Tragedy" shapes have an overall lower value of joy than the "Man in the hole" shapes.
Upon analyzing the distribution of Successful and Unsuccessful books within these shapes, we found that the "Man in the hole" shapes have a higher number of successful books, whereas the "Tragedy" shapes show the opposite.
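The shape-clustering analysis can be sketched with a small, self-contained K-means. The paper clusters the average-joy curves of real books into 100 clusters; here two synthetic curve families stand in for the "Man in the hole" and "Tragedy" shapes, and the minimal K-means implementation (with deterministic farthest-first initialization) is our own illustration rather than the paper's exact procedure.

```python
import numpy as np

def kmeans(curves, k, iters=20):
    """Minimal K-means over per-book emotion curves
    (rows = books, columns = average joy per chunk)."""
    # Deterministic farthest-first initialization
    centroids = [curves[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(curves - c, axis=1) for c in centroids],
                       axis=0)
        centroids.append(curves[dists.argmax()])
    centroids = np.stack(centroids)
    for _ in range(iters):
        # Assign each curve to its nearest centroid
        d = np.linalg.norm(curves[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids (keep the old one if a cluster empties)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = curves[labels == j].mean(axis=0)
    return labels, centroids

# Synthetic joy curves over 50 chunks
t = np.linspace(0, 1, 50)
man_in_hole = np.stack([0.5 + 0.6 * np.abs(t - 0.5) + 0.01 * i
                        for i in range(5)])        # fall, then rise
tragedy = np.stack([0.6 - 0.4 * t + 0.01 * i
                    for i in range(5)])            # steady fall
labels, cents = kmeans(np.vstack([man_in_hole, tragedy]), k=2)
```

With well-separated families like these, the two recovered centroids approximate the "fall to rise" and "fall" shapes, mirroring the centroid plots in Figure 6.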

Conclusions and Future Work
In this paper, we showed that modeling emotions as a flow, by capturing the emotional content at different stages of a book, improves prediction accuracy. We learned that most of the attention weight is given to the last fragment in all genres, except for Poetry, where other fragments seem to be relevant as well. We also showed empirically that adding an attention mechanism is better than just considering the last forward and backward hidden states of the RNN. We found two distinct emotion flow shapes and found that the clusters with the "Tragedy" shape had more unsuccessful books than successful ones. In future work, we will explore how these flows of emotions can be used to detect important events that result in suspenseful scenes. We will also apply hierarchical methods that exploit the logical grouping of books (sequences of paragraphs forming chapters and sequences of chapters forming a book) to build books' emotional representations.

References

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200-2204.

Danqi Chen, Jason Bolton, and Christopher D. Manning. 2016. A thorough examination of the CNN/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2358-2367. http://www.aclweb.org/anthology/P16-1223.

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of AISTATS, pages 249-256.