Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates

Inferring the agreement/disagreement relation in debates, especially in online debates, is one of the fundamental tasks in argumentation mining. Expressions of agreement/disagreement usually rely on argumentative expressions in the text as well as on interactions between participants in the debate. Previous works usually lack the capability of jointly modeling these two factors. To alleviate this problem, this paper proposes a hybrid neural attention model which combines self and cross attention mechanisms to locate salient parts of the textual context and of the interaction between users. Experimental results on three (dis)agreement inference datasets show that our model outperforms the state-of-the-art models.


Introduction
The rise of various discussion forums and online debate platforms has given users many opportunities to express themselves and argue with each other. Online argumentation and discussion are typically initiated and driven by participants' expressions of agreement or disagreement. Inferring agreement/disagreement in online debates is crucial for many other tasks in the broader analysis of social media and argumentation mining, such as stance identification (Somasundaran and Wiebe, 2010), claim/argument extraction (Hidey et al., 2017) and persuasion analysis (Tan et al., 2016).
It is observed that the expression of agreement/disagreement in debates can be decomposed into two factors: 1) the self-expression of claims and 2) argumentative expressions that interact with other participants. To illustrate this observation, we show an example in Figure 1, a quote-response pair (Q-R pair) from the 4forums online debate website. The response expresses disagreement with the claim in the quote text. The mark ? at the end of the sentence carries the author's strong emotion, while the phrase why doesn't He answer refers to the claim God IS GOOD and expresses a refutation of it.
Quote: God IS GOOD all the time …
Response: Then why doesn't He answer prayers like He says He will in the Bible ?
Label: Disagree
Figure 1: Sampled Q-R pair with the topic of evolution, where the words colored red deliver the crucial meaning of the text itself, while the words colored blue clarify the interactive relation between users.
Previous works on agreement/disagreement inference mainly focus on exploiting features to model semantic information, which only reveals the author's self-expression (Rosenthal and McKeown, 2015; Menini and Tonelli, 2016). These existing models treat agreement/disagreement inference as an ordinary sentiment classification problem and ignore the interactions between participants in the discussion. In order to jointly leverage the semantic information of the text and the interactions between Q-R pairs, we regard (dis)agreement inference as a special case of Natural Language Inference (NLI) (Rocktäschel et al., 2016) and propose a hybrid neural attention model for this problem. The proposed model consists of two kinds of attention: 1) self attention locates salient parts in the text of the quote and the response, and 2) cross attention captures the interactive argumentation between Q-R pairs. The fusion of self and cross attention is capable of jointly modeling the two important factors for inferring (dis)agreement in debates.
The main contributions of this paper are: (1) We propose a neural attention model for (dis)agreement inference which casts this problem as a natural language inference task. The proposed model incorporates self and cross attention mechanisms, jointly capturing the significant parts of the current context and extracting interactive relations between Q-R pairs. (2) Experimental results on three datasets show that the proposed model significantly improves on the performance (measured by F1 score and accuracy) of state-of-the-art models by 1% on average. The visualization of the extracted attention demonstrates that the different attention mechanisms work effectively on different aspects of (dis)agreement inference.

Related Work
With the development of social forums, works on (dis)agreement inference have shifted to online debate. Abbott et al. (2011) Menini and Tonelli (2016) develop an SVM classifier to detect disagreement, relying on three kinds of features (sentiment-based, semantic and surface features) extracted from both the whole text and its topic-related part. However, the performance of all these models depends heavily on the quality of hand-crafted features, and these representations cannot reflect the interaction between quote and response.
In other NLP tasks, end-to-end deep learning approaches with attention mechanisms have shown impressive results. The attention mechanism was proposed by Bahdanau et al. (2014) in machine translation for selecting the alignment between original words and foreign words before translation. For document classification, Yang et al. (2016) apply a hierarchical attention from word level to sentence level with a learnable context vector. In Natural Language Inference (NLI), Liu et al. (2016) construct an inner-attention with a mean pooling vector to pick out the important parts of the text itself. Hao et al. (2017) propose a cross attention that models the mutual influence between question and answer for Question Answering (QA). However, there is no neural attention model incorporating both contextual and interactive information in the scenario of (dis)agreement inference.

Model
The overall architecture of our model is shown in Figure 2. It comprises two parallel bidirectional LSTM (Hochreiter and Schmidhuber, 1997) networks as quote and response encoders, and two attention components that extract self and cross attention, respectively.

Quote and Response Encoder
A quote of length T is denoted as $[q_1, q_2, \cdots, q_T]$, where $q_t \in \mathbb{R}^{d_e}$ is the $d_e$-dimensional representation of the t-th word in the text sequence. Similarly, the corresponding response is represented as $[r_1, r_2, \cdots, r_T]$, which shares the same vector space with the quote. To model the dependency relations within the text sequence, we use a bidirectional LSTM (BiLSTM) to encode the quote and the response. The BiLSTM consists of a forward $\overrightarrow{\mathrm{LSTM}}$, which reads the text from $x_1$ to $x_T$, and a backward $\overleftarrow{\mathrm{LSTM}}$, which reads it from $x_T$ to $x_1$. Through concatenation of the forward and backward hidden states, we obtain the 2d-dimensional representation of each time step.

Attention Component
After encoding the implicit word semantics, we acquire the representation of both quote and response.

Self Attention
The first source taken into consideration is the text sequence itself, i.e., the attention from the quote to the quote itself and from the response to the response itself. When expressing an opinion, people tend to center on several keywords which convey the main idea. Thus, in some sense, self attention is a kind of dependency parsing that drives the model to focus on salient parts of the context. Here, for the quote, the self attention signal is computed with $\delta$, a transformation mapping a 2d-dimensional vector into a scalar value, with learnable weight $W_S \in \mathbb{R}^{T}$ and bias $b_S \in \mathbb{R}$. Similarly, with another parallel transformation, the self attention signal of the response can be calculated in the same way. Then we obtain more compact representations of the quote and the response, $Q_S, R_S \in \mathbb{R}^{2d}$, each derived from the weighted sum.
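The self attention pooling described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the scoring function is assumed to be a tanh of an affine map, the scores are assumed to be normalized with a softmax, and the weight vector here has one entry per state dimension. The names `self_attention`, `weight` and `bias` are hypothetical.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(hidden, weight, bias):
    """Score each encoder state, normalize, and pool.

    hidden: list of T encoder states, each a list of 2d floats.
    weight, bias: hypothetical parameters of the assumed scoring map.
    Returns (attention weights, attention-weighted sum of the states).
    """
    # Assumed scoring function: tanh of an affine map of each state.
    scores = [math.tanh(sum(w * x for w, x in zip(weight, h)) + bias)
              for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Compact representation: weighted sum of the encoder states.
    pooled = [sum(a * h[k] for a, h in zip(weights, hidden))
              for k in range(dim)]
    return weights, pooled

# Toy example: T = 3 states of dimension 2d = 2.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights, q_s = self_attention(states, weight=[0.5, 0.5], bias=0.0)
```

In the toy example the third state receives the largest weight because its score is the highest, so it dominates the pooled vector `q_s`.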

Cross Attention
Another prominent facet comes from the relation within each Q-R pair, i.e., the attention from the quote to the response and from the response to the quote. In both agreement and disagreement cases, the quote and the response provide a precise context for each other. The cross attention integrates this interactive influence, which produces more specific features for (dis)agreement inference. As discussed above, the cross attention signals $c_t^Q, c_t^R$ for the quote and the response are computed with $\gamma$ and $\gamma'$, two parallel transformations with learnable weights $W_C, W'_C \in \mathbb{R}^{T}$ and biases $b_C, b'_C \in \mathbb{R}$. The representations of the whole sequences that carry the cross attention signal are $Q_C, R_C \in \mathbb{R}^{2d}$.
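One way to realize such a cross attention is sketched below in plain Python. It is a sketch under assumptions rather than the paper's exact formulation: the interaction is assumed to be a cross matrix product between quote and response states, each word is assumed to be scored from its row (quote) or column (response) with a tanh of an affine map whose weight has length T, and the scores are assumed to be normalized with a softmax. The function and parameter names are hypothetical.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pool(weights, states):
    """Attention-weighted sum of a list of state vectors."""
    dim = len(states[0])
    return [sum(a * h[k] for a, h in zip(weights, states)) for k in range(dim)]

def cross_attention(quote, response, w_q, b_q, w_r, b_r):
    """quote, response: lists of T states (2d floats each).

    w_q, w_r: hypothetical weight vectors of length T; b_q, b_r: scalars.
    """
    # Cross matrix product: entry [i][j] compares quote word i
    # with response word j.
    cross = [[dot(q, r) for r in response] for q in quote]
    # Quote words are scored from rows, response words from columns.
    q_scores = [math.tanh(dot(w_q, row) + b_q) for row in cross]
    r_scores = [math.tanh(dot(w_r, col) + b_r) for col in zip(*cross)]
    c_q, c_r = softmax(q_scores), softmax(r_scores)
    return (c_q, pool(c_q, quote)), (c_r, pool(c_r, response))

# Toy example: T = 2 words per side, state dimension 2d = 2.
quote = [[1.0, 0.0], [0.0, 1.0]]
response = [[1.0, 0.0], [1.0, 0.0]]
(c_q, q_c), (c_r, r_c) = cross_attention(
    quote, response, w_q=[1.0, 1.0], b_q=0.0, w_r=[1.0, 1.0], b_r=0.0)
```

Here the first quote word matches both response words, so it receives the larger cross attention weight, while the two identical response words share their weight equally.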

Hybrid Attention
In order to combine the advantages of self attention and cross attention, we design a hybrid attention that yields a more specific representation for the quote and the response, where $Q, R \in \mathbb{R}^{6d}$ and $\oplus$ is the vector concatenation operation.
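A hedged sketch of one combination that is consistent with the stated dimensions: it assumes the hybrid representation concatenates the element-wise sum of the self and cross vectors (2d) with the two vectors themselves (4d), giving 6d in total. This choice is an assumption suggested by the BiLSTM-sum and BiLSTM-concat ablation names; the function name `hybrid` is hypothetical.

```python
def hybrid(self_vec, cross_vec):
    """Assumed hybrid: (self + cross) concatenated with self and cross."""
    summed = [s + c for s, c in zip(self_vec, cross_vec)]
    # List concatenation plays the role of the vector concatenation operator.
    return summed + self_vec + cross_vec

# Toy example with 2d = 2, so the result has 6d = 6 entries.
q = hybrid([1.0, 2.0], [0.5, -1.0])
```

With the toy inputs, `q` holds the sum `[1.5, 1.0]` followed by the self vector and the cross vector.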

(Dis)agreement Inference
Finally, the quote representation Q and the response representation R are concatenated into a vector v. We use a fully-connected network to project the 12d-dimensional representation into an n-dimensional vector space, i.e.,
$$\hat{y} = \mathrm{softmax}(W_l v + b_l)$$
where $\hat{y} \in \mathbb{R}^{n}$ is the predicted probability distribution for (dis)agreement inference, and $W_l$ and $b_l$ are the parameters of the softmax layer. In a supervised learning framework, we train our model in an end-to-end way. Given a set of training data $\{(Q_i, R_i), y_i\}$, let $\hat{y}_i$ denote the predicted probability distribution; the goal of training is to minimize the cross-entropy loss:
$$L = -\sum_{i} \sum_{j} y_i^{j} \log \hat{y}_i^{j}$$
where i is the index of the quote-response pair, j is the index of the class, and $y_i$ is the ground truth of the corresponding pair.
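The classification step and its loss can be illustrated with a small plain-Python sketch. It assumes one-hot ground-truth vectors and a plain sum over pairs and classes; the names `predict` and `cross_entropy` and the toy parameter values are hypothetical.

```python
import math

def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(v, W, b):
    """Fully-connected layer followed by softmax.

    v: feature vector; W: one weight row per class; b: one bias per class.
    """
    logits = [sum(w * x for w, x in zip(row, v)) + b_j
              for row, b_j in zip(W, b)]
    return softmax(logits)

def cross_entropy(y_true, y_pred):
    """Sum over pairs i and classes j of -y_i^j * log(y_hat_i^j)."""
    return -sum(t * math.log(p)
                for yt, yp in zip(y_true, y_pred)
                for t, p in zip(yt, yp) if t > 0)

# Toy example: one Q-R pair, n = 2 classes, a 2-dimensional feature vector.
probs = predict([1.0, -1.0], W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
loss = cross_entropy([[1.0, 0.0]], [probs])
```

For a one-hot label, the loss reduces to the negative log-probability assigned to the correct class.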

Experiment and Results
As in prior work, we concentrate on direct disagreement and agreement between quote-response (Q-R) pairs. Specifically, in the proposed model, the number of hidden units is 128 and all word embeddings are initialized with 300-dimensional GloVe vectors (Pennington et al., 2014). The lengths of both quote and response are set to 64, padded where necessary. The model is optimized with Adam, with learning rate 1e-3, β = (0.9, 0.999), ε = 1e-8 and weight decay 1e-5. All models are trained with mini-batches of 32 instances and 5-fold cross validation.
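Fixing both sequences to 64 tokens can be sketched as follows. The text only mentions padding; truncating longer sequences and the `<pad>` token name are assumptions made for this illustration, and `pad_or_truncate` is a hypothetical helper.

```python
def pad_or_truncate(tokens, length=64, pad="<pad>"):
    """Return exactly `length` tokens: cut long inputs, pad short ones."""
    clipped = tokens[:length]
    return clipped + [pad] * (length - len(clipped))

# Toy examples with a short target length for readability.
short = pad_or_truncate(["why", "not"], length=4)
long = pad_or_truncate(["a", "b", "c", "d", "e"], length=4)
```

Padding on the right keeps the original word order intact, so the encoder sees the real tokens first.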

Datasets
We conduct experiments on the three most commonly used (dis)agreement inference datasets. Table 1 shows the details of these datasets.
• Internet Argument Corpus (IAC) (Walker et al., 2012) is a corpus crawled from the online political debate site 4forums.com. Following prior work, we compute the average score for each pair and convert it into a binary label, with [−5, −1] as disagreement and [+1, +5] as agreement.
• Debatepedia (DP) (Menini and Tonelli, 2016) is a corpus crawled from debatepedia.org, an online encyclopedia of debates.
• Agreement by Create Debaters (ABCD) (Rosenthal and McKeown, 2015) is developed from createdebate.com with labels of agreement, disagreement and neutral.
Following the original settings, the comparison experiments are conducted on a training set balanced by downsampling and on the full test set.
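The IAC label conversion described in the list above can be sketched as a small function. How pairs whose average falls strictly between −1 and +1 are handled is not stated; returning `None` for them (so they can be filtered out) is an assumption, and `iac_label` is a hypothetical name.

```python
def iac_label(scores):
    """Map the annotator scores of one Q-R pair to a binary label."""
    avg = sum(scores) / len(scores)
    if -5 <= avg <= -1:
        return "disagreement"
    if 1 <= avg <= 5:
        return "agreement"
    # Assumption: averages strictly between -1 and +1 get no label.
    return None

# Toy examples with three annotator scores per pair.
labels = [iac_label([-3, -2, -4]), iac_label([2, 1, 3]), iac_label([0, 1, -1])]
```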
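Balancing a training set by downsampling can be sketched as below. The sketch assumes every class is reduced to the size of the smallest class by uniform random sampling; the function name `downsample`, the fixed seed and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def downsample(pairs, seed=0):
    """pairs: list of (example, label). Return a class-balanced subset."""
    by_label = defaultdict(list)
    for example, label in pairs:
        by_label[label].append((example, label))
    # Every class is cut down to the size of the smallest class.
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced

# Toy training set: 2 agreement pairs and 5 disagreement pairs.
train = ([("a%d" % i, "agree") for i in range(2)]
         + [("d%d" % i, "disagree") for i in range(5)])
balanced = downsample(train)
```

Only the training split is balanced this way; the test split is left untouched so that the evaluation reflects the original class distribution.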

Comparison with Baseline Methods
As shown in Table 2, we compare our model, by accuracy and average F1-score in percentage, with what is to our knowledge the best performing model on each dataset. These models are reported in (Abbott et al., 2011; Menini and Tonelli, 2016; Rosenthal and McKeown, 2015) as Naive Bayes (NB), JRip χ² (a rule-based classifier using χ² for feature selection), SVM and Maximum Entropy (ME), exploiting a rich suite of features including n-grams, sentiment lexicons and syntax. We also analyze the contribution of each component in an ablation experiment. BiLSTM-sum and BiLSTM-concat refer to variants in which only the sum or only the concat operation, respectively, is applied to the self and cross attention. Results show that BiLSTM-hybrid gives the best performance across all datasets regardless of data size. For a smaller dataset such as IAC, our model outperforms the previous best methods by 8.8%. This outcome is consistent on the larger datasets, with a significant improvement of 19.6% on DP. More importantly, the texts in DP are longer than in the other datasets, so an ordinary BiLSTM suffers from vanishing gradients and performs poorly; it is the hybrid attention that makes the difference. As for ABCD, compared with ME based on textual features, our BiLSTM-hybrid also gives a superior average F1 in 3-way inference. Since ABCD is a corpus annotated by meta-thread rules, the ME that additionally uses conversational structure attains the best performance. We consider this a corpus-specific feature with weak generalization ability.
In addition, we adapt an NLI-oriented model proposed by Liu et al. (2016) as a stronger baseline, which comprises inner-attention with mean pooling. The mean pooling of the text encoder serves as the summary representation with which inner-attention picks out the important parts of the text itself. It is similar to our self attention but works at a coarse-grained level, from text to word. The results imply that our BiLSTM-hybrid, which additionally models interaction with fine-grained word-to-word attention, performs better.
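For reference, the two evaluation measures can be computed as in the sketch below. It assumes that "average F1" means the unweighted mean of the per-class F1 scores (macro average); the function names and the toy labels are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of pairs whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def average_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed meaning of average F1)."""
    f1s = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: four Q-R pairs, one agreement pair misclassified.
gold = ["agree", "agree", "disagree", "disagree"]
pred = ["agree", "disagree", "disagree", "disagree"]
acc = accuracy(gold, pred)
f1 = average_f1(gold, pred)
```

In the toy example the per-class F1 scores are 2/3 for agreement and 0.8 for disagreement, so the average differs from the accuracy of 0.75.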

Qualitative Analysis
To validate that the different attentions focus on different parts of the text sequence, we visualize the outputs of the self attention layer and the cross attention layer for a Q-R pair of disagreement from IAC. As shown in Figure 3, a darker color indicates a larger weight in the corresponding attention vector.
In the quote, the self attention selects good, which is exactly the point that the quote wants to argue. Similarly, the self attention selects ? in the response, which indicates a rhetorical mood that shows disagreement. On the other hand, even though why doesn't he answer in the response receives less weight from the self attention, the cross attention highlights it along with god in the quote. When inspecting the cross matrix product of this pair, Figure 4 demonstrates that our method is able to model the reference between god is good and why doesn't he answer in the whole interactive context.

Conclusion
In this paper, we propose a hybrid attention based neural network for (dis)agreement inference in debates. The main motivation is to jointly leverage self attention for the textual context and cross attention for the interactions between users, in order to improve inference on agreement/disagreement relations. Experimental results show that our model outperforms several strong baselines. Visualization of the attention extracted by our model illustrates that it is effective in capturing the main point from different aspects.