Sentiment Forecasting in Dialog

Sentiment forecasting in dialog aims to predict the polarity of the next utterance to come, and can help speakers revise their utterances when generating sentiment-bearing utterances. However, the polarity of the next utterance is normally hard to predict, since its content is not yet available. In this study, we propose a Neural Sentiment Forecasting (NSF) model to address these inherent challenges. In particular, we employ a neural simulation model to simulate the next utterance based on the context (the previous utterances encountered). Moreover, we employ a sequence influence model to learn both pair-wise and sequence-wise influence. Empirical studies illustrate the importance of the proposed sentiment forecasting task, and justify the effectiveness of our NSF model over several strong baselines.


Introduction
Developing intelligent chatbots is of great appeal to both industry and academia. However, building such an intelligent chatbot is challenging, as it involves a series of high-level natural language processing techniques, such as sentiment analysis of utterances in dialog.
Previous studies on sentiment classification focus on determining the polarity (positive or negative) of a single document (Pang and Lee, 2008; Amplayo et al., 2018). In comparison, only a few studies focus on determining the polarity of utterances in dialog (Herzig et al., 2016; Majumder et al., 2018). However, all of these studies focus on determining the polarity of existing utterances. It may be more important to predict the polarity of the next utterance, which is yet to come. In the example in Figure 1, although B expresses a positive sentiment in the second utterance, A still shows a negative sentiment in the response. In this case, if B knew that A would be very upset after B's first utterance, B might revise that utterance to make A feel more comfortable. Hence, predicting the polarity of the next utterance can help speakers improve their utterances, which is important in automatic dialogue applications such as customer service.
To this end, we propose a new task, called sentiment forecasting in dialog, which aims to predict the polarity of the next utterance, which is yet to come. In this paper, we focus on tackling two inherent challenges: one is how to simulate the next utterance, and the other is how to learn the influence of the context on the next utterance. The motivations are as follows: since the next utterance is yet to come, it would be helpful if we could simulate it from the context; moreover, since the polarity of the next utterance can be strongly influenced by the context, it is necessary to model that influence.
In this paper, we propose a Neural Sentiment Forecasting (NSF) model to address the above challenges. In particular, a neural simulation model is employed to simulate the next utterance based on the existing utterances. In addition, a hierarchical influence model is employed to learn the influence of the existing utterances by considering both pair-wise and sequence-wise influence between the existing utterance sequence and the next utterance. Empirical studies illustrate the importance of the proposed sentiment forecasting task in dialog, and also show the effectiveness of our NSF model over several strong baselines.

Neural Sentiment Forecasting Modeling
As illustrated in Figure 1, given an existing utterance sequence {u_1, u_2, ..., u_{n−1}} in a dialog d, we aim to predict the polarity (positive, negative, or neutral) of the next utterance u_n. Note that the next utterance u_n does not yet exist in the dialog, and we do not know the polarities of the utterances in the existing dialog sequence. Figure 2 shows an overview of the proposed Neural Sentiment Forecasting (NSF) model. We first learn the representation of each utterance u_i. Secondly, we employ a neural simulation model to simulate the representation of the next utterance û_n based on the existing utterance representations. Thirdly, we employ a hierarchical sequence influence model to learn the influence of the existing utterance sequence on the next utterance û_n. Finally, we predict the polarity ŷ of the next utterance û_n based on the simulation model and the influence of the existing utterances. In the following, we discuss these components one by one.

Existing Utterance Representation
First, we need to learn the representation of the existing utterances in the dialog. Given an utterance u_i with m words {w_1, w_2, ..., w_m}, we transform each token w_t into a real-valued vector x_t using its word embedding (Mikolov et al., 2013). We employ an LSTM model (Hochreiter and Schmidhuber, 1997) over u_i to generate a hidden vector sequence (h_1, h_2, ..., h_m). At each step t, the hidden vector h_t of the LSTM is computed from the current input vector x_t and the previous hidden vector h_{t−1}: h_t = LSTM(x_t, h_{t−1}). The initial state and all standard LSTM parameters are randomly initialized and tuned during training. In this way, we use H_i = h_m as the representation of utterance u_i in the context.
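As an illustrative sketch, the utterance encoder above can be written in a few lines of numpy. The gate layout, initialization scale, and the sizes (embedding 64, hidden 32, matching the experimental setting) are assumptions for the example, not the authors' exact implementation.

```python
import numpy as np

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step (Hochreiter & Schmidhuber, 1997); gates stacked as [i; f; o; g]."""
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations, shape (4d,)
    i = 1 / (1 + np.exp(-z[:d]))          # input gate
    f = 1 / (1 + np.exp(-z[d:2*d]))       # forget gate
    o = 1 / (1 + np.exp(-z[2*d:3*d]))     # output gate
    g = np.tanh(z[3*d:])                  # candidate cell state
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def encode_utterance(X, W, U, b):
    """Run the LSTM over word vectors X (m, emb); return the last hidden state H_i = h_m."""
    d = U.shape[1]
    h, c = np.zeros(d), np.zeros(d)
    for x in X:
        h, c = lstm_cell(x, h, c, W, U, b)
    return h

rng = np.random.default_rng(0)
emb, d = 64, 32
W = rng.normal(scale=0.1, size=(4 * d, emb))
U = rng.normal(scale=0.1, size=(4 * d, d))
b = np.zeros(4 * d)
H_i = encode_utterance(rng.normal(size=(15, emb)), W, U, b)  # a 15-word utterance
```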

Neural Utterance Simulation Model
After learning the representations H = {H_1, H_2, ..., H_{n−1}} of the existing utterance sequence, we employ a neural utterance simulation model to simulate the next utterance û_n from H; an overview of the model is illustrated in Figure 3. In particular, since the polarities of utterances from the same speaker are correlated, a same-speaker concatenation model is used to concatenate the utterances from the same speaker as u_n into a basic simulation. Moreover, since the polarity of u_n is influenced by its context in the dialog, a dialog attention model is employed to consider the influence of the dialog sequence when simulating u_n.
Same-Speaker Concatenation. To construct a basic simulation of u_n, we concatenate the representations of the utterances from the same speaker as u_n: ū_n = [H_2; H_4; ...; H_{n−2}], where {H_2, H_4, ..., H_{n−2}} denotes the subsequence spoken by the same speaker as u_n, and ū_n is the basic simulation of u_n.
Dialog Attention. Given the basic simulation ū_n, we use a dialog attention model to learn the influence of the utterance sequence on ū_n when simulating u_n.
First, we learn the dialog representation d by concatenating the utterance sequence: d = [H_1; H_2; ...; H_{n−1}]. Then, we use an attention mechanism to learn the influence of the dialog representation d on the basic simulation ū_n. The attention model outputs a continuous vector û_n ∈ R^{d×1} as a weighted sum of the hidden states of d: û_n = Σ_j α_j h_dj, where α_j ∈ [0, 1] is the weight of h_dj and Σ_j α_j = 1. For each hidden state h_dj ∈ d, the weight α_j is computed by a scoring function over h_dj and ū_n. In this way, the vector û_n is learned as the simulation of u_n from the existing utterance sequence H.
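The simulation step (same-speaker concatenation followed by dialog attention) can be sketched as below. The additive scoring function `v_a · tanh(W_a [h; ū_n])` is an assumption for illustration; the paper's exact scoring function is not reproduced here.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def simulate_next(H, same_speaker_idx, Wa, va):
    """Simulate û_n: concatenate same-speaker states into a basic simulation ū_n,
    then take an attention-weighted sum over all context states."""
    u_basic = np.concatenate([H[i] for i in same_speaker_idx])   # ū_n
    scores = np.array([va @ np.tanh(Wa @ np.concatenate([h, u_basic])) for h in H])
    alpha = softmax(scores)                                      # α_j, sums to 1
    return (alpha[:, None] * np.asarray(H)).sum(axis=0)          # û_n = Σ_j α_j h_j

rng = np.random.default_rng(1)
d = 32
H = [rng.normal(size=d) for _ in range(3)]     # H_1..H_3 from the context
Wa = rng.normal(scale=0.1, size=(d, 2 * d))    # one same-speaker state here, so input is 2d
va = rng.normal(scale=0.1, size=d)
u_hat = simulate_next(H, [1], Wa, va)          # u_2 shares u_4's speaker in the top-4 setting
```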

Hierarchical Sequence Influence Model
After simulating the next utterance û_n from the existing utterance sequence, we employ a hierarchical sequence influence model to learn the influence of the existing utterance sequence d on the simulated next utterance û_n. An overview of the hierarchical sequence influence model is shown in Figure 4. The model considers both pair-wise and sequence-wise influence. The pair-wise influence model learns the influence of each individual utterance on the next utterance, while the sequence-wise influence model learns the influence of the whole utterance sequence on the next utterance. Finally, we integrate the representations from the pair-wise and sequence-wise influence models into a unified representation, so as to learn the influence of the whole utterance sequence collectively.

Pair-wise Influence Model
First, we employ an attention mechanism to learn the pair-wise influence of utterance u_i on the simulated next utterance û_n. The pair-wise attention model outputs a continuous vector v^p_i ∈ R^{d×1} by feeding in the hidden representation vectors H_n = {h_n1, h_n2, ..., h_nm} from û_n. Specifically, v^p_i is computed as a weighted sum: v^p_i = Σ_j β_j h_nj, where m is the number of hidden states, β_j ∈ [0, 1] is the weight of h_nj, and Σ_j β_j = 1. For each hidden state h_ij ∈ H_i from the hidden representation of u_i, the weight is computed by a scoring function over h_ij and û_n. The vector v^p_i is used as the representation of the influence of u_i on û_n.

Sequence-wise Influence Model
After learning the pair-wise influence of each utterance on the simulated next utterance, we propose a sequence-wise influence model to learn the influence of the whole utterance sequence on the next utterance using an attention mechanism. The sequence-wise attention model outputs a continuous vector v^s ∈ R^{d×1} by feeding in the hidden representation vectors H_n = {h_n1, h_n2, ..., h_nm} from û_n: v^s = Σ_j γ_j h_nj,
where γ_j ∈ [0, 1] is the weight of h_nj and Σ_j γ_j = 1. For each hidden state h_j from the hidden representation of the dialog representation d (Eq. 2), the weight is computed by a scoring function over h_j and û_n. The vector v^s is used as the representation of the sequence-wise influence of the whole utterance sequence d on û_n.

Integrating Pair-wise and Sequence-wise Influence
After learning the representations {v^p_1, v^p_2, ..., v^p_{n−1}} from the pair-wise influence model and the representation v^s from the sequence-wise influence model, we integrate them into a unified representation for sentiment forecasting of the next utterance.
We first concatenate the representations {v^p_1, v^p_2, ..., v^p_{n−1}} from the pair-wise influence model into a unified pair-wise representation v^p = [v^p_1; v^p_2; ...; v^p_{n−1}]. We then concatenate the pair-wise representation v^p with the sequence-wise representation v^s into a unified representation v = [v^p; v^s]. We use v as the representation of the simulated next utterance û_n, incorporating both pair-wise and sequence-wise influence, for forecasting the polarity of u_n.
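The integration step is plain concatenation, as the following sketch shows (vector sizes follow the experimental setting of three context utterances and hidden size 32; the random vectors are placeholders for the influence-model outputs):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32
v_pair = [rng.normal(size=d) for _ in range(3)]   # v^p_1..v^p_3 from the pair-wise model
v_seq = rng.normal(size=d)                        # v^s from the sequence-wise model

v_p = np.concatenate(v_pair)                      # unified pair-wise representation
v = np.concatenate([v_p, v_seq])                  # final representation for forecasting
```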

Sentiment Forecasting of Next Utterance
After learning the representation v of the simulated next utterance û_n with both pair-wise and sequence-wise influence, we employ a multi-layer perceptron to predict its polarity (positive, negative, or neutral). Since there are three mutually exclusive sentiment categories, the task is a multi-class classification task. Formally, given an input vector v, a hidden layer induces a set of high-level features: H_p = f(W^h_p v + b^h_p), where f is a nonlinear activation. H_p is then fed to a softmax output layer: ŷ = softmax(W_p H_p + b_p). Here, W^h_p, b^h_p, W_p, and b_p are model parameters.
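A minimal sketch of this output layer follows; tanh as the hidden activation and the layer sizes are assumptions for the example.

```python
import numpy as np

def forecast(v, W_h, b_h, W_p, b_p):
    """Hidden layer then softmax over the three polarities (positive/negative/neutral)."""
    H_p = np.tanh(W_h @ v + b_h)          # high-level features
    z = W_p @ H_p + b_p
    e = np.exp(z - z.max())
    return e / e.sum()                    # ŷ: a probability distribution over 3 classes

rng = np.random.default_rng(3)
v = rng.normal(size=128)                                # integrated representation
W_h = rng.normal(scale=0.1, size=(32, 128)); b_h = np.zeros(32)
W_p = rng.normal(scale=0.1, size=(3, 32));   b_p = np.zeros(3)
y_hat = forecast(v, W_h, b_h, W_p, b_p)
```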

Model Training
Given the utterance sequence {u_1, u_2, ..., u_{n−1}} in a dialog d_i and the gold polarity y_i of the next utterance û_n, our training objective is to minimize the cross-entropy loss over a set of training examples: L(θ) = −Σ_i y_i log ŷ_i + (λ/2)||θ||², where ŷ_i is the predicted label distribution, θ is the set of model parameters, and λ is the coefficient of L2 regularization.
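The objective can be sketched as below (λ and the toy predictions are illustrative values only):

```python
import numpy as np

def loss(y_true, y_pred, params, lam=1e-4):
    """Mean cross-entropy over training examples plus L2 regularization on parameters."""
    ce = -np.mean([np.log(p[y]) for y, p in zip(y_true, y_pred)])
    l2 = lam * sum(np.sum(w ** 2) for w in params)
    return ce + l2

y_true = [0, 2, 1]                                 # gold polarity indices
y_pred = [np.array([0.7, 0.2, 0.1]),
          np.array([0.1, 0.1, 0.8]),
          np.array([0.2, 0.6, 0.2])]               # predicted distributions ŷ_i
L = loss(y_true, y_pred, params=[np.ones((2, 2))])
```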

Data and Setting
In all experiments, the DailyDialog dataset (Li et al., 2017) is used to study the importance of the sentiment forecasting task and to evaluate the performance of the proposed NSF model. The dataset contains 13,118 multi-turn dialogs, with roughly 8 speaker turns per dialog and about 15 tokens per utterance on average.
Since the original dataset contains six kinds of emotion 1 and some emotions account for less than 5% of the data, we merge all emotions into three sentiment categories: positive (joy), negative (other emotions), and neutral (no emotion).
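The merging rule can be expressed as a simple mapping; the fine-grained label names below are illustrative (they may not match the dataset's exact label strings), but the rule follows the text: joy maps to positive, no emotion to neutral, and every other emotion to negative.

```python
# Illustrative label names; only the merging rule is taken from the text.
SENTIMENT = {
    "no emotion": "neutral",
    "joy": "positive",
    # all remaining emotions are merged into "negative"
    "anger": "negative",
    "disgust": "negative",
    "fear": "negative",
    "sadness": "negative",
    "surprise": "negative",
}
```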
To construct a sentiment-rich dataset, we select only the dialogs that contain at least one emotional utterance, which yields 7,395 dialogs. We randomly split the dataset into training/test sets with 4,435/2,960 dialogs. For each dialog, we take the first four utterances for our experiments: the first three are treated as the existing utterances ({u_1, u_2, u_3}), and the last is treated as the unknown utterance (u_n). Note that we do not observe the content of u_n; we must forecast its sentiment.
The vocabulary size is 9,888, the embedding size is set to 64, and the hidden size of all models is set to 32. All model parameters are optimized with AdaGrad (Duchi et al., 2011). F1-measure (F1) is used to evaluate performance in each sentiment category (positive and negative), and micro-averaged F1-measure is used to evaluate overall performance.
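The micro-averaged F1 used for overall evaluation pools true positives, false positives, and false negatives across all classes before computing F1, as this sketch shows:

```python
def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = sum(g != p for g, p in zip(gold, pred))   # every wrong prediction is a FP...
    fn = fp                                        # ...and a FN for the gold class
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

gold = ["positive", "negative", "neutral", "negative"]
pred = ["positive", "neutral", "neutral", "negative"]
score = micro_f1(gold, pred)   # 3 of 4 correct → 0.75
```

Note that for single-label multi-class prediction, pooled micro-F1 coincides with accuracy.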

Experimental Results
In this subsection, we present experimental results to illustrate the importance of the sentiment forecasting task and to show the effectiveness of the proposed NSF model.

Comparison with Baselines
We first compare the proposed Neural Sentiment Forecasting (NSF) model with several strong baselines:
• LSTM_i is a single-utterance sentiment forecasting model: it uses an LSTM to learn the representation of u_i (1 ≤ i ≤ 3) (Section 3.1), and then uses this representation to forecast the sentiment of the next utterance u_n.
• LSTM_seq is a sequence-based sentiment forecasting model: it employs an LSTM to learn the dialog representation d from the existing utterance sequence {u_1, u_2, u_3} (Eq. 2), and then uses d to forecast the sentiment of u_n.
• ICON takes one utterance together with the previous k utterances as input, uses a GRU to model inter-personal dependency in the previous utterances, and stores all history in a memory network.
• DialogRNN employs recurrent neural networks to keep track of the individual states of utterances and uses this information for sentiment classification in dialog (Majumder et al., 2018). It reports the best results in dialog sentiment classification.
Note that, since ICON and DialogRNN in Table 1 were designed for dialog sentiment classification rather than forecasting, we use u_2 to simulate u_n for these two models 2. From the results in Table 1, we find that the sequence-based LSTM_seq outperforms the utterance-based LSTM_i, which indicates the importance of the dialog sequence for forecasting sentiment and suggests considering the influence of the whole dialog sequence when forecasting the sentiment of the next utterance.
DialogRNN also outperforms the utterance-based LSTM_i, which again indicates the importance of the dialog sequence for sentiment classification.
The proposed NSF model significantly outperforms all the baselines, which shows that both neural simulation and the influence of dialog structure should be considered when forecasting the sentiment of the next utterance.

Comparison with Different Simulation Models
We then analyze the effectiveness of various neural simulation models:
• UniSim_i is a basic single-utterance simulation model, which uses u_i to simulate u_n and forecasts its polarity.
• DualSim_{i,j} is a dual-utterance simulation model, which concatenates the representations of u_i and u_j to simulate u_n and forecasts its polarity.
• SeqSim_{d→i} is a sequence-based simulation model: it employs an attention mechanism to learn the influence of the dialog representation d on u_i for simulating u_n, and forecasts its polarity. It is discussed in Section 3.2.
From the results in Table 2, we find that UniSim_i, which employs only a single utterance to simulate the next utterance, does not achieve good performance.
DualSim_{i,j}, which employs two utterances to simulate the next utterance, outperforms the single-utterance UniSim_i, which indicates the effectiveness of context for simulating the next utterance.
The sequence-attention-based SeqSim_{d→i} outperforms both the single-utterance and dual-utterance models, and SeqSim_{d→2} outperforms all the other simulation models. This indicates the effectiveness of dialog attention and of the same speaker's utterance (u_2 and u_4 are from the same speaker). Hence, we use SeqSim_{d→2} as the neural simulation model in this study.

Comparison with Different Influence Models
After learning the simulated next utterance û_n from SeqSim_{d→2}, we analyze the influence of the dialog sequence on the next utterance with different neural influence models:
• UniIf_{i→n} employs an attention mechanism to learn the influence of utterance u_i on the next utterance u_n.
• PairIf learns pair-wise influence on u_n by concatenating the representations from UniIf_{i→n}; it is discussed in Section 3.3.1.
• SeqIf learns sequence-wise influence from the utterance sequence to u_n; it is discussed in Section 3.3.2.
From the results in Table 3, we find that UniIf_{i→n}, which considers only the influence of a single utterance u_i, does not achieve good performance.
The pair-wise influence model PairIf outperforms all the single-utterance influence models, which indicates the importance of pair-wise influence for sentiment forecasting. In addition, the sequence-wise influence model SeqIf outperforms PairIf, which shows that sequence-wise influence is even more important than pair-wise influence.
Finally, the proposed NSF model significantly outperforms all the other influence models, which shows that both pair-wise and sequence-wise influence should be considered when forecasting the sentiment of the next utterance.

Analysis and Discussion
In this section, we present some statistics and analysis to discuss our motivations and illustrate the importance of the proposed sentiment forecasting task.

Sentiment Correlation between Existing Utterances and the Next Utterance
We analyze the sentiment correlation between the next utterance u_4 and each existing utterance u_i (u_i ∈ {u_1, u_2, u_3}) in a dialog. Table 5 shows the conditional probability P(u_n|u_i): given the polarity of u_i, the probability that u_n (n = 4) has the same polarity as u_i. From the table, we find that the polarities of all the existing utterances are correlated with u_n. In addition, the average conditional probability P(u_4|u_2) is much higher than for the other utterances, which may be because u_2 and u_4 are from the same speaker, and the polarity of utterances from the same speaker does not change much in most situations.
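The statistic above can be estimated by simple counting, as sketched below on toy data (the dialogs here are illustrative only, not the paper's actual counts):

```python
from collections import Counter

def cond_same_polarity(dialogs, i):
    """Estimate P(u_4 has the same polarity as u_i | polarity of u_i) by counting."""
    same, total = Counter(), Counter()
    for d in dialogs:              # each dialog: polarity labels of u_1..u_4
        total[d[i]] += 1
        if d[3] == d[i]:
            same[d[i]] += 1
    return {pol: same[pol] / total[pol] for pol in total}

# Toy dialogs, illustration only.
dialogs = [["neu", "pos", "neu", "pos"],
           ["pos", "pos", "neg", "pos"],
           ["neu", "neg", "neu", "neg"],
           ["pos", "neu", "pos", "pos"]]
p_u2 = cond_same_polarity(dialogs, 1)   # condition on u_2, which shares u_4's speaker
```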

Case Study
We select three examples from the test data to illustrate the effectiveness of the proposed NSF model compared with the LSTM_seq model in Table 4. In the first example, although the next utterance is not related to the existing utterances, the proposed NSF model still predicts the correct polarity. This may be because NSF can simulate the next utterance from the existing utterance sequence.
In the second example, the polarity of the next utterance is related to the third utterance; the proposed NSF model predicts the correct polarity by considering the pair-wise influence between the third utterance and the next utterance. Meanwhile, by considering the sequence-wise influence of the existing utterances, the proposed NSF model predicts the correct polarity in the third example.
In summary, NSF is much more effective because it considers both neural simulation and the influence of dialog structure when forecasting the sentiment of the next utterance.

Conclusion
In this paper, we propose a novel and important task, called sentiment forecasting in dialog, which aims to forecast the polarity of the next utterance to come. The task poses two challenges: one is how to simulate the next utterance in order to predict its polarity, and the other is how to learn the influence of the existing utterance sequence on the next utterance's polarity. We propose a Neural Sentiment Forecasting (NSF) model to address both challenges. In particular, a neural simulation model is used to simulate the next utterance from the existing utterance sequence, and a hierarchical influence model is used to learn the influence of the existing utterances by considering both pair-wise and sequence-wise influence. Empirical studies illustrate the importance of the proposed sentiment forecasting task and show the effectiveness of our NSF model over several strong baselines.