Detecting Aggression and Toxicity using a Multi Dimension Capsule Network

In the era of social media, hate speech, trolling, and verbal abuse have become common issues. We present an approach to automatically classify such statements using a new deep learning architecture. Our model comprises a Multi-Dimension Capsule Network that generates sentence representations, which we use for classification. We further provide an analysis of how our model interprets such statements. We compare our results with state-of-the-art classification algorithms and demonstrate our model's effectiveness. The model can also handle comments written in both Hindi and English, as found in the TRAC dataset. We additionally report results on Kaggle's Toxic Comment Classification dataset.


Introduction
Many people refrain from expressing themselves or giving opinions online for fear of harassment and abuse. Twitter has admitted that such behavior is driving users away from its platform, and some users are even forced to change their location. Because of this, combating hate speech and abusive behavior has become a high-priority area for major companies like Facebook, Twitter, YouTube, and Microsoft. With ever-increasing content on such platforms, it is impossible to detect toxic comments or hate speech manually.
Earlier work on Capsule network based deep learning architectures for classifying toxic comments has shown that these networks perform well compared to other deep learning architectures (Srivastava et al., 2018). In this paper, we investigate the performance of a multi-dimension Capsule network, as opposed to a fixed-dimension Capsule network, for capturing a sentence representation, and we discuss how well it captures the features necessary for classifying such sentences. For our experiments we use two datasets: TRAC-1, which contains comments in both Hindi and English scraped from Facebook and Twitter, and Kaggle's Toxic Comment Classification Challenge, which is a multilabel classification task. In our experiments, we found that our model can handle transliterated comments, which is another major challenge in this task. Since TRAC-1 was crawled from public Facebook pages and Twitter, mainly on Indian topics, it contains code-mixed text. This type of data is more commonly observed in real-world scenarios.

Related Work
Numerous machine learning methods for detecting inappropriate comments in online forums exist today. Traditional approaches include Naive Bayes classifiers (Kwok and Wang, 2013; Chen et al., 2012; Dinakar et al., 2011), logistic regression (Waseem, 2016; Davidson et al., 2017; Wulczyn et al., 2017; Burnap and Williams, 2015), support vector machines (Xu et al., 2012; Dadvar et al., 2013; Schofield and Davidson, 2017), and random forests. However, deep learning models, for instance convolutional neural networks (Gambäck and Sikdar, 2017; Potapova and Gordeev, 2016) and variants of recurrent neural networks (Pavlopoulos et al., 2017; Gao and Huang, 2017; Pitsilis et al., 2018; Zhang et al., 2018), have shown promising results and achieved better accuracies. Recent work in toxic comment classification (van Aken et al.) compared different deep learning and shallow approaches on these datasets and proposed an ensemble model that outperforms all individual approaches. Further, (Nikhil et al., 2018) and (Kumar et al., 2018) proposed LSTMs with attention on the TRAC dataset for better classification. Capsule networks have been shown to work well on images (Sabour et al., 2017), and recently these networks have also been investigated for text classification (Yang et al., 2018). (Srivastava et al., 2018) proposed a Capsule Net based classifier for both datasets used in this study and showed that it works better than the previous state-of-the-art methods. We propose to extend this work by modifying it into a multi-dimension Capsule network, taking inspiration from multi-filter CNNs (Kim, 2014a).

Multi Dimension Capsule Net for Classification
We describe our multi-dimension Capsule Net architecture in this section; it consists primarily of 5 layers, as shown in Fig 1. To get the initial sentence representation, we concatenate individual word representations obtained from pretrained fastText embeddings (Joulin et al., 2016). The sentence representation is then passed through a feature extraction layer consisting of BiLSTM units. This representation is next passed through the Primary and Convolutional Capsule Layers to extract the high-level features of a sentence. Finally, the features are passed through a classification layer to calculate the class probabilities.
Word Embedding Layer: To get the initial sentence representation, we use a weight matrix W ∈ R^(d_w × |V|), where d_w is the fixed vector dimension and |V| is the vocabulary size. The vector in column w_i of W represents the lexical semantics of word w_i, obtained by pre-training an unsupervised model on a large corpus (Mikolov et al., 2013; Pennington et al., 2014; Joulin et al., 2016). Feature Extraction Layer: This layer consists of BiLSTM units to capture the contextual information among the words of a sentence. As proposed in (Schuster and Paliwal, 1997), we obtain both the forward and backward hidden states and concatenate them, giving context vectors of dimension 2 × d_sen. We use BiLSTMs for feature extraction, as opposed to the CNNs used as a feature extraction layer for capsules in (Yang et al., 2018) and (Sabour et al., 2017), since CNNs pose the difficulty of choosing an optimal window size (Lai et al., 2015), which could introduce noise.
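As a minimal NumPy sketch of the embedding lookup described above (the vocabulary, dimensions, and random values here are hypothetical; in the paper the columns of W come from pretrained fastText vectors):

```python
import numpy as np

# Toy vocabulary and embedding matrix W of shape (d_w, |V|):
# column i holds the vector for word i (random stand-ins here).
vocab = {"the": 0, "comment": 1, "is": 2, "toxic": 3}
d_w = 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d_w, len(vocab)))

def embed(sentence):
    """Look up the column of W for each word, giving one row per word."""
    ids = [vocab[w] for w in sentence.split()]
    return W[:, ids].T  # shape (sentence_length, d_w)

X = embed("the comment is toxic")
print(X.shape)  # (4, 4): 4 words, each a d_w-dimensional vector
```

The resulting matrix X is what the BiLSTM feature extraction layer would consume, one timestep per word.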
Primary Capsule Layer: In (Sabour et al., 2017), the authors proposed replacing the scalar outputs of CNNs with highly informative vectors consisting of "instantiation parameters". These parameters are meant to capture the local order of words and their semantic representations (Yang et al., 2018). We extend the model proposed in (Srivastava et al., 2018) to capture different features from the input by varying the dimension of the capsules. Just as (Kim, 2014b) showed that different window sizes allow capturing different N-gram features from the input, we hypothesize that by varying the dimension of the capsules we can capture different instantiation parameters from the input. For context vectors C_i, we use different shared weight matrices W^b ∈ R^(d × 2d_sen) to produce capsules p_i = g(W^b C_i + b), where g is the nonlinear squash activation (Sabour et al., 2017), d is the capsule dimension, and d_sen is the number of LSTM units used to capture input features. The factor d can be varied to change a capsule's dimension, and thereby the instantiation parameters it captures. The capsules are then stacked together to create a capsule feature map, P = [p_1, p_2, p_3, ..., p_C] ∈ R^(N × C × d), consisting of a total of N × C capsules of dimension d.
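The squash activation and the idea of two capsule groups with different dimensions can be sketched in NumPy as follows (a toy illustration, not the paper's implementation; N, d_sen, and the weight values are made up, while the dimensions 15 and 20 match the experiment section):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity of Sabour et al. (2017): shrinks short
    vectors toward 0 and long vectors toward unit length."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

rng = np.random.default_rng(0)
N, d_sen = 10, 8                      # N BiLSTM context vectors of width 2*d_sen
C = rng.normal(size=(N, 2 * d_sen))

# One shared weight matrix W_b per capsule dimension d; varying d is the
# "multi-dimension" idea, analogous to multiple filter widths in a CNN.
caps_maps = []
for d in (15, 20):
    W_b = 0.1 * rng.normal(size=(d, 2 * d_sen))
    p = squash(C @ W_b.T)             # (N, d) capsules for this dimension
    caps_maps.append(p)

print([m.shape for m in caps_maps])   # [(10, 15), (10, 20)]
```

Each group yields its own capsule feature map; by construction every capsule's length stays below 1 after squashing.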
The Dynamic Routing algorithm was proposed in (Sabour et al., 2017) to calculate the agreement between capsules. The routing process introduces a coupling effect between the capsules of level l and level (l+1), controlling the connection strengths between child and parent capsules. The output of a parent capsule j is given by v_j = g(Σ_i c_ij û_j|i), with û_j|i = W^s u_i, where c_ij is the coupling coefficient between capsule i of layer l and capsule j of layer (l+1), determined by iterative dynamic routing, and W^s is the shared weight matrix between layers l and (l+1). The routing process can be interpreted as computing soft attention between lower- and higher-level capsules.
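Routing-by-agreement can be written compactly in NumPy; the sketch below follows the algorithm of Sabour et al. (2017) on made-up capsule counts and dimensions (it is illustrative, not the paper's code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, r=3):
    """Routing-by-agreement (Sabour et al., 2017).
    u_hat: (num_child, num_parent, d) candidate parent capsules u_hat_{j|i}.
    Returns parent capsules v (num_parent, d) and couplings c (num_child, num_parent)."""
    n_i, n_j, _ = u_hat.shape
    b = np.zeros((n_i, n_j))                                  # logits, initially equal
    for _ in range(r):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_ij = softmax over parents
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum per parent
        v = squash(s)
        b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 4, 5))     # 6 child capsules, 4 parents, dim 5
v, c = dynamic_routing(u_hat, r=3)
print(v.shape)                          # (4, 5)
```

Each child's coupling coefficients sum to 1, which is what justifies reading the routing process as soft attention from child to parent capsules.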
Convolutional Capsule Layer: Similar to (Sabour et al., 2017) and (Yang et al., 2018), the capsules in this layer are connected to lower-level capsules. The connection strengths are calculated by multiplying the input with a transformation matrix, followed by the routing algorithm. The candidate parent capsule û_j|i is computed as û_j|i = W^s_ij u_i, where u_i is the child capsule and W^s is the weight shared between capsules i and j. The coupling strength between a child-parent capsule pair is determined by the routing algorithm, which produces the parent feature map in r iterative rounds via c_ij = exp(b_ij) / Σ_k exp(b_ik). The logits b_ij, which are initially equal, determine how strongly capsule i should be coupled with capsule j. The capsules are then flattened into a single layer and multiplied by a transformation matrix W_FC, followed by the routing algorithm, to compute the final sentence representation s_k. The sentence representation is finally passed through a softmax layer to calculate the class probabilities.
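The final classification step can be sketched as follows. This is a loose stand-in for the paper's readout: the paper only states that the sentence representation s_k is passed through a softmax layer, so the per-class capsules, the linear readout W_out, and all values below are assumptions for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
s_k = rng.normal(size=(3, 16))        # hypothetical: one final capsule per TRAC class

# Assumed readout: score each class capsule with a shared linear layer,
# then normalize the scores into class probabilities with softmax.
W_out = 0.1 * rng.normal(size=(16,))
probs = softmax(s_k @ W_out)
print(probs.shape, round(float(probs.sum()), 6))  # (3,) 1.0
```

However the scores are produced, the softmax guarantees a valid probability distribution over the three aggression classes.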

Kaggle Toxic Comment Classification
In 2018, Kaggle hosted a competition named Toxic Comment Classification. The dataset consists of Wikipedia talk page comments and was contributed by Conversation AI. Each comment carries a multilabel annotation over a total of 6 classes, namely toxic, severe toxic, obscene, threat, insult, and identity hate. We split the labeled data (159,571 sentences) into training (90%) and validation (10%) sets, with a separate test set of 153,164 sentences.

TRAC dataset
It is a dataset for aggression identification and contains 15,000 comments in both Hindi and English. The task is to classify comments into the following categories: Overtly Aggressive (OAG), Covertly Aggressive (CAG), and Non-Aggressive (NAG). We used the train, dev, and test splits provided by the organizers of the task.

Experiments
As a preprocessing step, we performed case-folding of all words and removed punctuation. The tokenization code was taken from (Devlin et al., 2018), which properly separates word tokens and special characters.
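A simplified stand-in for this preprocessing (case-folding, punctuation removal, whitespace tokenization) is sketched below; it is not the tokenizer of (Devlin et al., 2018), just a minimal approximation of the described steps:

```python
import re

def preprocess(text):
    """Casefold, strip punctuation, and split on whitespace.
    A simplified sketch of the paper's preprocessing pipeline."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return text.split()

print(preprocess("This COMMENT, frankly, is rude!!!"))
# ['this', 'comment', 'frankly', 'is', 'rude']
```

In practice one would also decide how to handle emoji, URLs, and code-mixed transliterations, which this sketch leaves untouched.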
For training all our classification models, we used fastText embeddings of dimension 300 trained on Common Crawl. For out-of-vocabulary (OOV) words we initialized the embeddings randomly. For feature extraction, we used 200 LSTM units each for capturing the forward and backward contexts (400 in total). We used 20 capsules of dimension 15 and another 20 of dimension 20 for all experiments. We kept the number of routing iterations at 3, as more iterations could introduce overfitting. To further avoid overfitting, we set the dropout value to 0.4. We used cross-entropy as the loss function and Adam (with default values) as the optimizer for all models. We obtained these hyperparameter values by tuning several models on the validation set and finally selecting the model with the minimum validation loss.

Results and Analysis
We report results on a total of 3 datasets, two of which belong to the TRAC-1 dataset. Our evaluation metric for TRAC-1 is the F1 score, while for the Kaggle dataset it is ROC-AUC. We performed better on all datasets except the TRAC Twitter data, on which our model could not beat the previous Capsule Network. We used very strong and recent baselines for comparison, including the model of (Raffel and Ellis, 2015), Hierarchical CNN (Conneau et al., 2017), and Bi-LSTM with max pooling (Lai et al., 2015); full numbers are given in Table 1. We now analyze examples on which our model makes mistakes, picking samples from the TRAC Facebook English dataset. For the analysis, we use LIME (Ribeiro et al., 2016), which perturbs the input data to understand the relationship between the input and the output. It fits a local interpretable model to approximate the model in question and tries to create explanations of the input data.
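The core idea behind LIME-style perturbation can be illustrated with a toy leave-one-out version: remove each word in turn and measure how much the predicted score drops. The classifier below is a hypothetical keyword-based stand-in, not the paper's trained model, and real LIME fits a local linear model over many random perturbations rather than using single-word deletions:

```python
def toy_classifier(tokens):
    """Hypothetical stand-in for the trained model: probability that a
    comment is aggressive, driven here by a made-up keyword list."""
    bad = {"stupid", "idiot"}
    return min(1.0, 0.1 + 0.45 * sum(t in bad for t in tokens))

def word_importance(tokens, clf):
    """Leave-one-out perturbation: a word's importance is the drop in
    the predicted score when that word is removed from the input."""
    base = clf(tokens)
    return {w: base - clf([t for j, t in enumerate(tokens) if j != i])
            for i, w in enumerate(tokens)}

scores = word_importance(["you", "are", "stupid"], toy_classifier)
print(max(scores, key=scores.get))  # 'stupid' contributes most
```

This mirrors what the LIME heatmaps in our analysis show: which words the model leans on, which is exactly what exposes failures on sarcasm, where every individual word looks neutral.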
From the confusion matrix in Fig 3, we can observe that the model is most often confused by predicting CAG comments as NAG. This may be because the individual words in such a sentence do not sound aggressive, so the model labels it as neutral, while in reality the sentence as a whole is sarcastic. For example, refer to Fig 4, which goes wrong because the words the model focuses on are all neutral, but when combined they amount to sarcasm.

Conclusion and Future Work
We reported our results against several strong state-of-the-art deep learning architectures and showed better results with our Capsule network. We also analyzed some misclassifications made by the model and tried to explain them using heatmaps of the weights obtained from the model. For future work, as mentioned in (Sabour et al., 2017), there can be several methods to train capsules, and we would like to explore them. We also want to try different loss functions such as spread loss, focal loss, and margin loss. Finally, we would like to explore the competence of capsules on other NLP tasks and study their workings using the investigation techniques seen in (Yang et al., 2018).

Figure 1: Multi Dimension Capsule Network

Figure 3: Confusion matrix for TRAC dataset

Figure 4: CAG comment predicted as NAG comment

Figure 6: NAG comment predicted as OAG comment

The sarcasm in the Fig 4 example concerns bridging the gap between the poor and the middle class. Secondly, the model also incorrectly predicts NAG and OAG comments as CAG in equal measure; this is because certain comments against the government are mostly present in the CAG class. Refer to Fig 6 and Fig 4: in these comments, the government or some government official is being criticized, the attack is not directly pointed, and there is hidden aggression.

Table 1: Results of various architectures on publicly available datasets