AbuseAnalyzer: Abuse Detection, Severity and Target Prediction for Gab Posts

While the extensive popularity of online social media platforms has made information dissemination faster, it has also resulted in widespread online abuse of different types, such as hate speech, offensive language, and sexist and racist opinions. Detection and curtailment of such abusive content is critical for avoiding its psychological impact on victim communities, and thereby preventing hate crimes. Previous works have focused on classifying user posts into various forms of abusive behavior, but there has hardly been any focus on estimating the severity of abuse and its target. In this paper, we present a first-of-its-kind dataset of 7,601 posts from Gab which looks at online abuse from the perspective of presence of abuse, severity and target of abusive behavior. We also propose a system to address these tasks, obtaining an accuracy of ∼80% for abuse presence, ∼82% for abuse target prediction, and ∼65% for abuse severity prediction.


Introduction
In recent times, Online Social Media (OSM) has become an indispensable part of our lives. Not only do these websites connect billions of people around the world, but they also serve as a platform for expressing opinions and sharing information quickly. However, OSM platforms have recently been criticized for the propagation of fake (Shu et al., 2017) and hateful content (Fortuna and Nunes, 2018). Such cases of online abuse have also translated into real-world hate crimes. Abuse in social media is spread across a wide spectrum, from mild expressions of attitudes and beliefs to strong violent threats. Inspired by hate theories from the Anti-Defamation League (ADL), we broadly classify forms of abuse as 'Biased Attitude', 'Act of Bias and Discrimination' and 'Violence and Genocide'. Moreover, abusive content could be targeted at specific individuals (e.g., a politician, a celebrity, etc.) or particular groups (a country, LGBTQ+, a religion, a gender, an organization, etc.). Detection of such abusive content is critical for avoiding its psychological impact on victim communities, and thereby preventing hate crimes. Particular abuse cases can be prioritized if the severity of abuse can be automatically assessed. Further, identifying whether the abuse target is a person or a large group is critical for estimating the set of people potentially affected, and thereby whether the abuse could lead to real-world crimes and at what scale. Hence, in this paper, we propose three abuse prediction tasks: prediction of abuse presence, abuse severity prediction and abuse target prediction.
Traditional OSM websites are reasonably moderated; broadly abusive content can still be found on them, but finding abusive behaviour of differing severity is a 'needle in a haystack' kind of challenge. In contrast to other OSM platforms, Gab is relatively unexplored and presents a wider spectrum of online abusive behaviour due to its liberal moderation policy. Hence, we gathered a dataset from Gab and contribute the labeled posts to the community in the hope of promoting deeper research on abusive content analysis. Gab is an alt-right social media website launched in 2016. By the end of July 2019, it had grown to 1,000,000 registered users and web traffic of 5.1 million visits per day. Our key contributions in this paper are as follows:
• We contribute an abuse analysis dataset comprising 7,601 Gab posts with finer classification labels associated with presence, severity and target of abuse. The code and dataset are publicly available.
• We experiment with traditional machine learning (ML) classifiers with TF-IDF features for the three abuse prediction tasks. We also experiment with two deep learning (DL) based methods. Our best method achieves accuracy values of ∼80% for abuse presence, ∼82% for abuse target prediction, and ∼65% for abuse severity prediction.
Disclaimer: This paper contains examples of hate content used only for illustrative purposes; reader discretion is advised.

Related Work
Several past works have explored different kinds of online abuse (like racism, sexism, etc.) on traditionally studied platforms like Twitter (Kwok and Wang, 2013; Waseem and Hovy, 2016; Davidson et al., 2017; ElSherief et al., 2018) and on some newer web communities like 4chan and Whisper (Hine et al., 2017; Silva et al., 2016). But web communities differ from each other through subtleties in language and demographic differences. Gab poses an altogether different challenge, as it differs from older web groups primarily in its use of online communities to congregate, organize, and disseminate information in weaponized form (Marwick and Lewis, 2017). Some previous papers (Lima et al., 2018; Mathew et al., 2019; Finkelstein et al., 2018) have presented basic statistical analysis of data extracted from Gab. Recently, Qian et al. (2019) presented a dataset of 33,776 Gab posts annotated with binary hate/non-hate labels. While some papers have focused on racism versus sexism (Badjatiya et al., 2017), others have focused on sarcasm, cyber-bullying, etc. (Founta et al., 2019). Initial works in this area focused on feature engineering based methods. With the emergence of deep learning, most of the recent works (Founta et al., 2019; Serrà et al., 2017; Park and Fung, 2017) have relied on deep learning techniques for abuse detection. To the best of our knowledge, there is no publicly available corpus or prediction system which focuses on fine-grained abusive language classification across all three tasks: prediction of abuse presence, abuse severity prediction and abuse target prediction.

Abuse Severity
The Anti-Defamation League presents a pyramid of hate from a sociological perspective (also supported by Sandu and Lyamouri-Bajja (2018)). Although the behaviors at each level of the hate pyramid negatively impact individuals and groups, the behaviors have more life-threatening consequences as one moves up the pyramid. Given the five levels in the pyramid, we initially considered five classes of abuse severity. However, we found that some of the classes were infrequent and some were very similar to each other. Hence, we finally settled on three clear classes of abuse, as follows.
Biased Attitude: Contains posts related to trolling, accusations, sarcasm and insensitive remarks. These posts project minor hate and negation of intimacy towards an individual/group through jokes or by blaming the target for some mishap. E.g., "At lease my choice of president waits till they are out of diapers.", "Good news. No wonder they tried to f*ck her nomination over hard."
Act of Bias and Discrimination: Consists of posts in the categories of sexism, racism, xenophobia, homophobia and other forms of abuse which are based on some presumption or stereotype without well-founded arguments. Additionally, we consider aggression based on ideologies (leftist, rightist, feminist, etc.). This class also covers posts which induce a sense of inferiority on the basis of the target's physical ('ugly', 'deformed', etc.), mental ('stupid', 'cretin', etc.) or moral deficiency ('sl*t', 'fraud', etc.). E.g., "Jews are tragic for the world.", "you're are a real f*cking braindead piece of trash b*tchboy."
Violence and Genocide: Contains abusive behaviours like violent threats, intimidation and extremism. These posts contain statements of intent to inflict harm that would result in serious and lasting bodily impact, where an individual could die or be significantly injured. They reflect intense anger/fear towards the targeted individual/group and depict them as a threat to society. In many cases, the posts refer to adverse activities like deportations. E.g., "antiwhite whites deserve public hangings", "I know what is going on! I f*ck his mammy and I hurt her. Sorry mam!"
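Because the three classes are ordered by how life-threatening their consequences are, they can be treated as an ordinal scale. The following is a minimal Python sketch of that idea; the names `Severity` and `most_severe` are hypothetical and do not come from the released code. It shows how the most severe label can be retained when a post plausibly fits more than one class.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Illustrative ordinal encoding of the three severity classes (least to most severe)."""
    BIASED_ATTITUDE = 1
    ACT_OF_BIAS_AND_DISCRIMINATION = 2
    VIOLENCE_AND_GENOCIDE = 3


def most_severe(candidates):
    """Return the highest-severity class among the candidate labels for one post."""
    if not candidates:
        raise ValueError("at least one candidate label is required")
    # IntEnum members compare by their integer value, so max() picks the most severe.
    return max(candidates)
```

An integer encoding like this is only one possible design; it makes the ordering explicit without implying that the gaps between classes are equal.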

Abuse Targets
A comment targeted towards a particular community impacts a larger audience than one targeted towards a particular individual. Hence, it is important to predict the target of an abusive post as one of the following three classes.
Individual (Second-Person): Targets the person being mentioned in the post. Generally, terms like '@username', 'you' and 'your' are used to refer to the target. E.g., "No, but I do realize that you're full of sh*t and know it.", "@username is serving a purpose or just a load of hot air."
Individual (Third-Person): Targets a third person. Usually, these posts use terms like 'he', 'she', etc., or mention the name/username of the target. E.g., "His predatory sexual behavior is still evident.", "Another pedophile circles the wagons."
Group: Targets a group/organization based on ideology, race, gender, religion, work industry or some other basis. Such posts contain terms like 'you all' or 'they', or refer to a group in an indirect manner. E.g., "We have some shit stirrers afoot today. Ignore them", "Why not set dead muslims on the curb in a trash bag?"
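The lexical cues listed for each class can be turned into a naive rule-based guess of the target. The sketch below only illustrates those cues and is not the system evaluated in this paper. The function name `guess_target`, the exact cue lists (including 'his', 'her' and 'them', which are added here as assumptions) and the order in which the rules are checked are all hypothetical.

```python
import re

# Cue words based on the class descriptions; the lists are illustrative, not exhaustive.
GROUP_CUES = ("you all", "they", "them")
SECOND_PERSON_CUES = ("you", "your")
THIRD_PERSON_CUES = ("he", "she", "his", "her")


def _has_cue(text, cues):
    """True if any cue occurs in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def guess_target(post):
    """Guess the target class of a post from surface cues alone."""
    text = post.lower().strip()
    # Group cues are checked first so that "you all" is not read as second person.
    if _has_cue(text, GROUP_CUES):
        return "Group"
    # A leading @mention is treated as directly addressing a person.
    if text.startswith("@") or _has_cue(text, SECOND_PERSON_CUES):
        return "Individual (Second-Person)"
    if _has_cue(text, THIRD_PERSON_CUES):
        return "Individual (Third-Person)"
    return "Unknown"
```

Rules of this kind are brittle: a post that mentions a third person by username and then switches to 'you' would be assigned to the second-person class, and a group referred to only indirectly would be missed.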

AbuseAnalyzer Dataset and Results
Our dataset contains 7,601 Gab posts classified on three different aspects: presence of abuse, abuse severity and abuse target. Of the 4,120 abusive posts, the distribution by severity is 'Biased Attitude': 1,830, 'Act of Bias and Discrimination': 1,807, and 'Violence and Genocide': 483. For the target classes, 389 posts are in 'Individual (Second-Person)', 1,330 in 'Individual (Third-Person)', and 2,401 in the 'Group' class. The code and dataset are publicly available.
Data Extraction and Pre-processing: We obtained a collection of 8.4 million Gab posts from http://files.pushshift.io/gab/ covering a period of 4 months, from July to October 2018. We used a high-precision lexicon of racial, sexist, xenophobic, extremist and other derogatory terms, aggregated from multiple sources, to filter 7,601 posts written in English for the annotation process. While we made efforts to strike a balance between abusive and non-abusive posts, we made no effort to maintain balance within the abuse severity or abuse target classes.
Annotation Procedure: Four annotators with fluent English skills were provided clear guidelines (refined iteratively) for annotating the posts across all three abuse prediction tasks. In case a post could belong to more than one severity class, annotators were asked to mark the higher severity class (based on life-threatening consequences), to avoid multi-labels. Each example was annotated by exactly three annotators, and all disagreements were resolved by involving all the annotators. As a measure of inter-annotator agreement, we observed a Cohen's Kappa score (Cohen, 1960) of (1) 0.719 for presence/absence of abuse, (2) 0.720 for presence+target, and (3) 0.683 for presence+severity classification. In each case the Kappa score is close to 0.7, which indicates good agreement among the annotators.
Dataset Statistics and Analysis: Table 1 shows the distribution of the 'Target' labels among each of the 'Severity' classes.
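For reference, Cohen's Kappa for two annotators compares the observed agreement p_o with the agreement p_e expected by chance from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it in plain Python. The function name and the handling of the degenerate case are assumptions made for illustration, and how pairwise scores were aggregated over the three annotators per post is not specified here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        # Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Kappa is lower than raw agreement whenever some agreement is expected by chance, which is why it is preferred over simple percentage agreement for imbalanced label sets.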
We observe that the majority of the abusive posts are against the 'Group' class, specifically for the 'Act of Bias and Discrimination' class, which is intuitive since this category covers the topics of racism, sexism, etc.
Table 1: Distribution of posts across various abuse severity and abuse target classes.
Table 2 shows popular unigrams and bigrams for various severity and target classes. We observe that: (1) Community-related unigrams and bigrams like 'jew', 'muslim', etc. are quite frequent for the 'Act of Bias and Discrimination' class, which is in line with the nature of posts on Gab.
(2) Violent n-grams like 'kill' and 'the holocaust' are present in the 'Violence and Genocide' class. (3) Second-person pronouns like "you", "yourself", etc. are frequent in the 'Individual (Second-Person)' class. (4) Third-person pronouns and bigrams like "he", "she", "hes a", etc. are frequent in the 'Individual (Third-Person)' class. (5) Multiplicity-indicating n-grams like "these people", "them", etc. are popular in the 'Group' class.
Prediction Results: We experiment with multiple statistical ML methods (Support Vector Machines (SVM), XGBoost and Logistic Regression (LR)) using TF-IDF features. We also trained two deep learning based models: (1) Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) using transfer learning, and (2) GloVe-based (Pennington et al., 2014) Long Short-Term Memory (Hochreiter and Schmidhuber, 1997) networks (referred to as GloVe+LSTM). With BERT, we use an additional 2-layer multi-layer perceptron (MLP) for classification with a dropout value of 0.2. We trained both DL networks using the Adam optimizer (Kingma and Ba, 2014). Table 3 shows 5-fold cross-validation accuracy (micro F1) and macro F1 for each of the methods. We observe that our BERT-based model outperforms the other methods, with SVM being the best of the ML models.
Confusion matrices: We show the confusion matrices for the abuse target and severity prediction tasks in Tables 4 and 5, respectively. The entries denote the sum of examples over the 5 folds of cross-validation.
Error Analysis: Table 6 presents cases where AbuseAnalyzer misclassifies the examples. We present some interesting cases for each of the three abuse prediction tasks. For the task of predicting the presence of abuse, we see that terms like 'black' and 'muslims', which refer to communities prone to online abuse, pose a challenge for the classifier.
For example, the first post in Table 6 talks about the adoption of a girl belonging to the black community. This example is non-abusive, but it is wrongly classified as abusive due to the presence of potentially racial terms. Similar is the case with the second post, which reports news of the arrest of muslim jihadists. In example 4 in Table 6, the presence of the pronoun 'you', along with the overall sarcastic tone of the post, led the system to predict the target class as 'Individual (Second-Person)', whereas the ground truth label was 'Group', as the post conveys a racist ideology against Jews. Example 5 presents an interesting case which trolls the concerned person while making a general statement about the world; due to the presence of terms like 'evil' along with 'world', the system got confused. In example 6, the reference to the third person is first made using '@usermention', but later the pronoun 'you' is used to refer to this person; this change in the way of referencing confused the system. Example 7 in Table 6 is a sexist comment on the target which blames her for making a false accusation of rape, but the presence of an extremist term like 'rape' led the classifier to make an error. Example 8 presents a case of an extremist post which propagates hate in a subtle way. The post talks about killing immigrants from across the border. This phenomenon was common in other posts where the hate was expressed in a very subtle way without using any explicit terms. In example 9 we have a case of trolling, where the person posting has trolled national socialists.
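The two measures reported in Table 3 can be derived from a confusion matrix like those in Tables 4 and 5. For single-label classification, micro-averaged F1 equals accuracy, whereas macro F1 averages the per-class F1 scores and therefore weights a rare class such as 'Violence and Genocide' as heavily as a frequent one. The following sketch uses a made-up 3×3 matrix, not values from this paper, and the function names are illustrative.

```python
def accuracy(cm):
    """Accuracy from a square confusion matrix cm[true][pred].

    For single-label classification this equals micro-averaged F1.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted c, true class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


# Toy counts (rows: true class, columns: predicted class); not taken from Tables 4 or 5.
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 3, 2],
]
```

On imbalanced data such as the severity labels, macro F1 can fall noticeably below accuracy when the smallest class is predicted poorly, which is why reporting both is informative.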

Task | Post | Predicted | Ground truth
Target | [...] | Group | Individual (Second-Person)
Target | My tweet to this creature usermention You scrubbed your Social Media history but its too late The FBI is investigating you now You better lawyer up You wont do well in Prison. | Individual (Second-Person) | Individual (Third-Person)
Severity | Rape Im sure she was begging for it Doesnt look like a rape scene to me | Violence and Genocide | Act of Bias and Discrimination
Severity | As immigrants flow across US border American guns go south | Act of Bias and Discrimination | Violence and Genocide
Severity | How do yall national socialists feel now that the democrats are adopting national socialist policies instead of marxist policies | Act of Bias and Discrimination | Biased Attitude
Table 6: Sample cases where AbuseAnalyzer predicts incorrectly in comparison to the ground truth.

Conclusion
In this paper, we presented a novel dataset of 7,601 Gab posts labeled for abuse presence, target and severity. We experimented with both statistical and deep learning based models for each of these tasks and showed that the BERT-based model performs the best. There are several open avenues for future work, such as exploring context-based abuse detection. Another direction is to annotate multimodal data using the presented annotation scheme and use it for the task of abuse detection.