Pay “Attention” to your Context when Classifying Abusive Language

The goal of any social media platform is to facilitate healthy and meaningful interactions among its users. But more often than not, such platforms become an avenue for wanton attacks. We propose an experimental study that has three aims: 1) to provide us with a deeper understanding of current data sets that focus on different, sometimes overlapping, types of abusive language (racism, sexism, hate speech, offensive language, and personal attacks); 2) to investigate what type of attention mechanism (contextual vs. self-attention) is better for abusive language detection using deep learning architectures; and 3) to investigate whether stacked architectures provide an advantage over simple architectures for this task.


INTRODUCTION
Any social interaction involves an exchange of viewpoints and thoughts. But these views and thoughts can be caustic. Often we see that users resort to verbal abuse to win an argument or overshadow someone's opinion. On Twitter, people from every sphere have experienced online abuse, be it a famous celebrity with millions of followers or someone representing a marginalized community such as the LGBTQ community, women, and more. We want to harness Natural Language Processing (NLP) for social good and aid in the process of flagging abusive tweets and users. Detecting abuse on Twitter can be challenging, particularly because the text is often noisy. Abuse can also have different facets. [10] released one of the initial data sets from Twitter with the goal of identifying what constitutes racism and sexism. [9] pointed out that hate speech is different from offensive language and released a data set of 25k tweets with the goal of distinguishing hate speech from offensive language. They find that racist and homophobic tweets are more likely to be classified as hate speech but sexist tweets are generally classified as offensive. [4] introduced a large, hand-coded corpus of online harassment data for studying the nature of harassing comments and the culture of trolling.

Stop saying dumb blondes with pretty faces as you need a pretty face to pull them off !!! #mkr
In Islam women must be locked in their houses and Muslims claim this is treating them well
Table 1: Tweets from the [10] data set demonstrating online abuse.

Keeping these motivations in mind, we make the following salient contributions:
• We build a deep context-aware attention-based model for abusive behavior detection on Twitter. To the best of our knowledge, ours is the first work that exploits context-aware attention for this task.
• Our model is robust and achieves consistent performance gains on all three abusive language data sets.
• We show how context-aware attention helps in focusing on certain abusive keywords when they are used in a specific context, and how it improves the performance of abusive behavior detection.

RELATED WORK
Existing approaches to abusive text detection can be broadly divided into two categories: 1) feature-intensive machine learning algorithms such as Logistic Regression (LR) and Multilayer Perceptron (MLP), and 2) deep learning models, which learn feature representations on their own. [10] released the popular data set of 16k tweets annotated as belonging to the sexism, racism, or none class, and provided a feature-engineered model for detection of abuse in their corpus. [9] use a similar handcrafted, feature-engineered model to identify offensive language and distinguish it from hate speech. [2] experiment with multiple deep learning architectures for the task of hate speech detection on Twitter using the same data set by [10]. Their best-reported F1-score is achieved using Long Short-Term Memory networks (LSTMs) + Gradient Boosting.
On the data set released by [10], [5] experiment with a two-step approach of detecting abusive language first and then classifying it into specific types, i.e., racist, sexist, or none. They achieve their best results using a Hybrid Convolutional Neural Network (CNN), with the intuition that character-level input would counter purposely or mistakenly misspelled words and made-up vocabulary. [6] ran experiments on the Gazzetta data set and the DETOX system ([12]) and show that a Recurrent Neural Network (RNN) coupled with deep, classification-specific attention outperforms the previous state of the art in abusive comment moderation. In their more recent work, [7] explored how user embeddings, user-type embeddings, and user-type biases can improve their previous RNN-based model on the Gazzetta data set. Attentive neural networks have been shown to perform well on a variety of NLP tasks ([13], [11]). [13] use hierarchical contextual attention for text classification (i.e., attention at both the word and the sentence level) on six large-scale text classification tasks and demonstrate that the proposed architecture outperforms previous methods by a substantial margin. We primarily focus on word-level attention because most tweets consist of a single sentence.
The best choice for modeling tweets was Long Short-Term Memory networks (LSTMs) because of their ability to capture long-term dependencies: they introduce a gating mechanism that ensures proper gradient propagation through the network. We use bidirectional LSTMs because of their inherent capability of capturing information from both the past and the future states. A bidirectional LSTM (BiLSTM) consists of a forward LSTM $\overrightarrow{f}$ that reads the sentence from $x_1$ to $x_T$ and a backward LSTM $\overleftarrow{f}$ that reads the sentence from $x_T$ to $x_1$, where $T$ is the number of words in the sentence under consideration and $x_i$ is the $i$-th word in the sentence. We obtain the final annotation for a given word $x_i$ by concatenating the annotations from both directions (Eq. 1). [1] show that LSTMs can benefit from depth in space. Stacking multiple recurrent hidden layers on top of each other, just as feed-forward layers are stacked in conventional deep networks, gives performance gains. Hence we choose stacked LSTMs for our experiments.
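The concatenation referred to as Eq. 1 can be written as in the sketch below. The notation is partly assumed: $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$ are illustrative names for the forward and backward annotations of word $x_i$, and writing $\overrightarrow{f}(x_i)$ is shorthand, since each LSTM state also depends on the words read before it in that direction.

```latex
\overrightarrow{h_i} = \overrightarrow{f}(x_i), \qquad
\overleftarrow{h_i} = \overleftarrow{f}(x_i), \qquad
h_i = \left[\, \overrightarrow{h_i} \,;\, \overleftarrow{h_i} \,\right],
\quad i = 1, \dots, T
```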

Word Attention
Data Set    Tweets Count
[10]        15,844
[9]         25,112
[4]         20,362
Table 2: Data sets and their total tweet counts.

The attention mechanism assigns a weight to each word annotation obtained from the BiLSTM layer. We compute the fixed representation $v$ of the whole message as a weighted sum of all the word annotations, which is then fed to a final fully-connected softmax layer to obtain the class probabilities. We first feed the LSTM output $h_i$ of each word $x_i$ through a Multilayer Perceptron to get $u_i$ as its hidden representation. $u_c$ is our word-level context vector, which is randomly initialized and learned as we train our network. Once $u_i$ is obtained, we calculate the importance of the word as the similarity of $u_i$ with $u_c$ and get a normalized importance weight $\alpha_i$ through a softmax function. The context vector $u_c$ can be seen as a filter over all the words that selects the more important ones, like that used in the LSTM. Figure 2 shows the high-level architecture of this model. $W_h$ and $b_h$ are the attention layer's weights and biases. More formally,
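The word-attention steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a one-layer MLP with a tanh activation (the text does not name the activation) and a dot product as the similarity, i.e. $u_i = \tanh(W_h h_i + b_h)$, $\alpha_i = \exp(u_i^\top u_c) / \sum_j \exp(u_j^\top u_c)$, and $v = \sum_i \alpha_i h_i$. The function name `word_attention` and all dimensions are hypothetical, and random values stand in for BiLSTM outputs and learned parameters.

```python
import math
import random


def word_attention(H, W_h, b_h, u_c):
    """Return (alpha, v) for a list H of T word annotations, each of size d.

    W_h is an (a x d) matrix, b_h and u_c are vectors of size a.
    """
    # u_i = tanh(W_h h_i + b_h); the tanh activation is an assumption.
    U = [
        [math.tanh(sum(w * h for w, h in zip(row, h_i)) + b)
         for row, b in zip(W_h, b_h)]
        for h_i in H
    ]
    # Similarity of each u_i with the context vector u_c (dot product).
    scores = [sum(u * c for u, c in zip(u_i, u_c)) for u_i in U]
    # Softmax over the words, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    alpha = [e / norm for e in exps]
    # v = sum_i alpha_i * h_i: the fixed-size representation of the message.
    d = len(H[0])
    v = [sum(a * h_i[k] for a, h_i in zip(alpha, H)) for k in range(d)]
    return alpha, v


# Toy usage: 5 words, annotation size 4, attention size 3 (all hypothetical).
random.seed(0)
T, d, a = 5, 4, 3
H = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
W_h = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(a)]
b_h = [0.0] * a
u_c = [random.uniform(-0.5, 0.5) for _ in range(a)]
alpha, v = word_attention(H, W_h, b_h, u_c)
print("attention weights:", [round(x, 3) for x in alpha])
print("message vector:", [round(x, 3) for x in v])
```

In a trained network, `W_h`, `b_h`, and `u_c` would be learned jointly with the rest of the model, and `v` would be passed to the final softmax layer.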

EXPERIMENTS
In this section we first describe the data sets and then present our results on these three data sets. We also show some examples where our model failed. Finally, we show how attention helps us understand the model better.

Data Sets
We used the three benchmark data sets for abusive content detection on Twitter. At the time of the experiment, the [10] data set had a total of 15,844 tweets, of which 1,924 were labelled as racism, 3,058 as sexism, and 10,862 as none. The [9] data set had a total of 25,112 tweets, of which 1,498 were labelled as hate speech, 19,326 as offensive language, and 4,288 as neither. The [4] data set had 20,362 tweets, of which 5,235 were positive harassment examples and 15,127 were negative.
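The degree of class imbalance implied by these counts can be checked with a few lines of Python. The counts are taken from the paragraph above; the dictionary layout, the label strings, and the helper name `class_shares` are illustrative choices.

```python
# Label counts as reported above, keyed by the citation of each data set.
counts = {
    "[10]": {"racism": 1924, "sexism": 3058, "none": 10862},
    "[9]": {"hate speech": 1498, "offensive language": 19326, "neither": 4288},
    "[4]": {"harassment": 5235, "non-harassment": 15127},
}


def class_shares(label_counts):
    """Return the fraction of tweets that carries each label."""
    total = sum(label_counts.values())
    return {label: n / total for label, n in label_counts.items()}


for name, label_counts in counts.items():
    total = sum(label_counts.values())
    shares = class_shares(label_counts)
    summary = ", ".join(f"{label}: {share:.1%}" for label, share in shares.items())
    print(f"{name} ({total} tweets): {summary}")
```

In each data set a single class accounts for well over half of the tweets (roughly 69%, 77%, and 74%, respectively).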
We refer to the [10] data set as D1, the [9] data set as D2, and the [4] data set as D3. For tweet tokenization, we use Ekphrasis, a text processing tool built specially for text from social platforms such as Twitter.
[3] use a large collection of Twitter messages (330M) to generate word embeddings, with a vocabulary size of 660K words, using GloVe ([8]). We use these pre-trained word embeddings to initialize the first layer (embedding layer) of our neural networks.

Results
The network is trained at a learning rate of 0.001 for 10 epochs, with a dropout of 0.2 to prevent over-fitting. The results are averaged over 10-fold cross-validation for D1 and D3 and 5-fold cross-validation for D2, because [9] reported results using 5-fold CV. Because of class imbalance in all our data sets, we report weighted F1 scores.
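For reference, a weighted F1 score is commonly computed as the mean of the per-class F1 scores, with each class weighted by its share of the true labels. The sketch below follows that common definition in plain Python; the text does not say which implementation was used, and the function name `weighted_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy example with an imbalanced label distribution.
y_true = ["none", "none", "none", "none", "sexism", "sexism"]
y_pred = ["none", "none", "none", "sexism", "sexism", "none"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6667
```

In the toy example the per-class F1 scores are 0.75 (none) and 0.5 (sexism), so the weighted score is (4/6)(0.75) + (2/6)(0.5) = 2/3. Because the majority class carries the largest weight, it dominates the score, which is why the score is usually read alongside per-class results.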
Table 3 shows our results in detail. We compare our model with the best models reported in each paper. Because [4] is a data set paper, we cannot fill the corresponding row. * denotes the numbers from baseline papers. All the results were reproducible except for the one marked red. For the Waseem and Hovy (2016) data set, Badjatiya et al. (2017) claim that using Gradient Boosting with LSTM embeddings obtained from random word embeddings boosted their performance by 12 F1 points, from 81.0 to 93.0. When we tried to reproduce the result, we did not find any significant improvement over 81. The results show that our model performs robustly on all three data sets.

Table 3: Data sets and the results of different models. We reproduced the results for each model on three of the data sets.

We also share some examples from the three data sets in Figure 2 that our BiLSTM attention model could not classify correctly. On closer investigation, we find that most cases where our model fails are instances where the annotation is noisy or the difference between classes is blurred and subtle.

Why Contextual Attention?
The attention mechanism enables our neural network to focus on the relevant parts of the input more than the irrelevant parts while performing a prediction task. But relevance often depends on the context, so the importance of words is highly context dependent. For example, the word islam may appear in the realm of Racism as well as in any normal conversation. The top tweet in Figure 3

Attention Heat Map Visualization
The color intensity corresponds to the weight given to each word by the contextual attention.
Figure 4: The first tweet is a sexist tweet from [10], whereas the second tweet is an example of a racist tweet from the same data set. The third tweet is from the [9] data set and is labelled as offensive language.

CONCLUSION AND FUTURE WORK
We successfully built a deep context-aware attention-based model and applied it to the task of abusive tweet detection. We ran experiments on three relevant data sets and empirically showed that our model is robust when it comes to detecting abuse on Twitter. We also showed how context-aware attention helps us interpret the model's performance by visualizing the attention weights and conducting a thorough error analysis. As for future work, we want to experiment with a model that learns user embeddings from their historical tweets. We also want to model abusive text classification on Twitter by taking tweets in context, because standalone tweets often don't give a clear picture of a tweet's intent.

Figure 2: The first tweet is from the [10] data set, the second is from the [9] data set, and the third, from the [4] data set, belongs to the None class, while the bottom tweet belongs to the Racism class.

Figure 3: An example showing how our model captures diverse context and assigns context-dependent weights to the same word in two different tweets.