One-step and Two-step Classification for Abusive Language Detection on Twitter

Automatic abusive language detection is a difficult but important task for online social media. Our research explores a two-step approach, which first classifies whether language is abusive and then classifies it into specific types, and compares it with a one-step approach, which performs a single multi-class classification to detect sexist and racist language. On a public English Twitter corpus of 20 thousand tweets labeled for sexism and racism, our approach shows a promising performance of 0.827 F-measure using HybridCNN in one step and 0.824 F-measure using logistic regression in two steps.


Introduction
Fighting abusive language online is becoming more and more important in a world where online social media plays a significant role in shaping people's minds (Perse and Lambe, 2016). Nevertheless, major social media companies like Twitter find it difficult to tackle this problem (Meyer, 2016), as the huge number of posts cannot be moderated with human resources alone. Warner and Hirschberg (2012) and Burnap and Williams (2015) are among the early works to use machine learning based classifiers for detecting abusive language. Djuric et al. (2015) incorporated distributed word representations (Mikolov et al., 2013). Nobata et al. (2016) combined pre-defined language elements and word embeddings to train a regression model. Waseem (2016) used logistic regression with n-grams and user-specific features such as gender and location. Davidson et al. (2017) conducted a deeper investigation into different types of abusive language. Badjatiya et al. (2017) experimented with deep learning-based models using ensemble gradient boost classifiers to perform multi-class classification on sexist and racist language. All of these approaches perform classification in a single step.
Many have addressed the difficulty of defining abusive language while annotating data, because such judgments are often subjective (Ross et al., 2016) and depend on context (Waseem and Hovy, 2016; Schmidt and Wiegand, 2017). This makes it harder for non-experts to annotate without a certain amount of domain knowledge (Waseem, 2016).
In this research, we experiment with a two-step approach that first detects abusive language and then classifies it into specific types, and compare it with a one-step approach that performs a single multi-class classification on sexist and racist language.
Moreover, we explore applying a convolutional neural network (CNN) to tackle the task of abusive language detection. We use three kinds of CNN models that use both character-level and word-level inputs to perform classification on different dataset segmentations. We measure the performance and ability of each model to capture characteristics of abusive language.

Methodology
We propose to implement three CNN-based models to classify sexist and racist abusive language: CharCNN, WordCNN, and HybridCNN. The major difference among these models is whether the input features are characters, words, or both.
The key components are the convolutional layers, each of which computes a one-dimensional convolution over the previous input with multiple filter sizes and large feature map sizes. Using different filter sizes is equivalent to looking at a sentence through differently sized windows simultaneously. Max-pooling is performed after the convolution to capture the feature that is most significant to the output.
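The interaction of multiple filter sizes and 1-max-pooling can be sketched in plain Python. This is a toy forward pass with one random filter per window size; real models learn many feature maps per size, and the dimensions here are illustrative:

```python
import random

def conv1d(seq, filters):
    """Slide each filter (a list of weight vectors) over the sequence
    and return one feature map per filter (valid convolution)."""
    maps = []
    for f in filters:
        w = len(f)  # filter width (window size)
        fmap = [
            sum(f[i][d] * seq[t + i][d]
                for i in range(w) for d in range(len(seq[0])))
            for t in range(len(seq) - w + 1)
        ]
        maps.append(fmap)
    return maps

def one_max_pool(feature_maps):
    """Keep only the strongest activation of each feature map."""
    return [max(fmap) for fmap in feature_maps]

# Toy example: a sequence of 6 token vectors of dimension 4,
# convolved with filters of widths 1, 2, and 3 simultaneously.
random.seed(0)
seq = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
filters = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(w)]
           for w in (1, 2, 3)]
pooled = one_max_pool(conv1d(seq, filters))  # one scalar per filter
```

Each filter width yields a feature map of a different length, but after 1-max-pooling every filter contributes exactly one scalar, so the pooled vector has a fixed size regardless of sentence length.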

CharCNN
CharCNN is a modification of the character-level convolutional network in (Zhang et al., 2015). Each character in the input sentence is first transformed into a one-hot encoding over an alphabet of 70 characters: 26 English letters, 10 digits, 33 punctuation and special characters, and a newline character. All other non-standard characters are removed. Zhang et al. (2015) use 7 convolution and max-pooling layers, 2 fully-connected layers, and 1 softmax layer, but because of the relatively small size of our dataset, we also designed a shallower version with 2 convolution and max-pooling layers, 1 fully-connected layer, and 1 softmax layer with dropout, to prevent overfitting.
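The character one-hot encoding can be sketched as follows. The exact set of 33 punctuation and special characters below is an illustrative assumption in the spirit of Zhang et al. (2015), not necessarily the set used in the experiments:

```python
import string

# A 70-symbol alphabet: 26 letters, 10 digits, 33 punctuation/special
# characters (this particular set is illustrative), and a newline.
ALPHABET = string.ascii_lowercase + string.digits + \
    "-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{} " + "\n"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def one_hot_encode(sentence):
    """Map each in-alphabet character to a one-hot vector;
    characters outside the alphabet are dropped."""
    vectors = []
    for ch in sentence.lower():
        if ch in CHAR_INDEX:
            vec = [0] * len(ALPHABET)
            vec[CHAR_INDEX[ch]] = 1
            vectors.append(vec)
    return vectors

encoded = one_hot_encode("Stop!")  # 5 one-hot vectors of length 70
```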

WordCNN
WordCNN is the CNN-static version proposed by Kim (2014). The input sentence is first segmented into words, which are converted into 300-dimensional word2vec embeddings trained on 100 billion words from Google News (Mikolov et al., 2013). Incorporating pre-trained vectors is a widely-used method to improve performance, especially when using a relatively small dataset. We set the embeddings to be non-trainable since our dataset is small.
We also propose to segment some out-of-vocabulary phrases. Since tweets often contain hashtags such as #womenagainstfeminism and #feminismisawful, we use the wordsegment library (Segaran and Hammerbacher, 2009) to capture more words.
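The idea behind hashtag segmentation can be illustrated with a simplified greedy longest-match splitter. The real wordsegment library scores candidate splits with corpus n-gram frequencies; the hand-made vocabulary here is purely for demonstration:

```python
def segment(hashtag, vocab):
    """Split a hashtag body by repeatedly taking the longest prefix
    found in the vocabulary; unknown stretches fall back to single
    characters. A toy stand-in for the wordsegment library."""
    text, words = hashtag.lstrip("#").lower(), []
    while text:
        for end in range(len(text), 0, -1):
            if text[:end] in vocab or end == 1:
                words.append(text[:end])
                text = text[end:]
                break
    return words

VOCAB = {"women", "against", "feminism", "is", "awful"}  # toy vocabulary
parts = segment("#womenagainstfeminism", VOCAB)
```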

HybridCNN
We design HybridCNN, a variation of WordCNN, since WordCNN has the limitation of taking only word features as input. Abusive language often contains purposely or mistakenly misspelled words and made-up vocabulary such as #feminazi.
Therefore, since CharCNN and WordCNN do not use character and word inputs at the same time, we design HybridCNN to test whether a model can capture features from both levels of input.
HybridCNN has two input channels. Each channel is fed into convolutional layers with three filter windows of different sizes. The outputs of the convolutions are concatenated into one vector after 1-max-pooling. The vector is then fed into the final softmax layer to perform classification (see Figure 1).
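The final combination step can be sketched as follows: the pooled feature vectors from the two channels are concatenated and passed through a softmax layer. The feature and class dimensions below are illustrative, and the weights are random rather than learned:

```python
import math, random

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def hybrid_forward(word_pooled, char_pooled, weights, bias):
    """Concatenate the 1-max-pooled features of the word and character
    channels, then apply a final softmax layer over the classes."""
    features = word_pooled + char_pooled
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

random.seed(1)
word_pooled = [random.random() for _ in range(3)]  # e.g. 3 word-level filters
char_pooled = [random.random() for _ in range(3)]  # e.g. 3 char-level filters
n_classes, n_feat = 3, 6                           # none / sexism / racism
weights = [[random.uniform(-1, 1) for _ in range(n_feat)]
           for _ in range(n_classes)]
bias = [0.0] * n_classes
probs = hybrid_forward(word_pooled, char_pooled, weights, bias)
```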

Datasets
We used the two English Twitter datasets (Waseem and Hovy, 2016; Waseem, 2016) published as unshared tasks for the 1st Workshop on Abusive Language Online (ALW1). They contain tweets with sexist and racist comments. Waseem and Hovy (2016) created a list of criteria based on critical race theory and had an expert annotate the corpus. First, we concatenated the two datasets into one and then divided it into three datasets for one-step and two-step classification (Table 1). The one-step dataset is a segmentation for multi-class classification. For two-step classification, we merged the sexism and racism labels into one abusive label. Finally, we created another dataset containing only abusive language to train a second classifier that distinguishes "sexism" and "racism", given that an instance is classified as "abusive".
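The dataset construction for the two-step setting amounts to a simple label remapping, sketched below on hypothetical (text, label) pairs:

```python
def make_two_step_datasets(tweets):
    """tweets: list of (text, label) with label in {"none", "sexism",
    "racism"}. Returns the first-step binary dataset (abusive vs. none)
    and the second-step dataset restricted to abusive tweets."""
    step1 = [(t, "abusive" if y in ("sexism", "racism") else "none")
             for t, y in tweets]
    step2 = [(t, y) for t, y in tweets if y in ("sexism", "racism")]
    return step1, step2

tweets = [("a", "none"), ("b", "sexism"), ("c", "racism")]
step1, step2 = make_two_step_datasets(tweets)
```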

Training and Evaluation
Figure 1: Architecture of HybridCNN

We performed two classification experiments: (1) detecting "none", "sexist", and "racist" language (one-step), and (2) detecting "abusive" language, then further classifying it into "sexist" or "racist" (two-step). The purpose of these experiments was to see whether dividing the problem space into two steps makes the detection more effective.
We trained the models using mini-batch stochastic gradient descent with the Adam optimizer (Kingma and Ba, 2014). For more efficient training on an imbalanced dataset, each mini-batch of size 32 was sampled with an equal distribution over all labels. Training continued until the loss on the evaluation set no longer decreased. All results are averages over 10-fold cross validation.
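Label-balanced mini-batch sampling can be sketched as follows. Sampling with replacement per label (so minority classes are oversampled) is an assumption of this sketch, not a detail stated in the paper:

```python
import random

def balanced_batch(examples_by_label, batch_size):
    """Sample a mini-batch with equal counts per label, drawing with
    replacement so minority classes are effectively oversampled."""
    labels = sorted(examples_by_label)
    per_label = batch_size // len(labels)
    batch = []
    for lab in labels:
        batch += [(random.choice(examples_by_label[lab]), lab)
                  for _ in range(per_label)]
    random.shuffle(batch)
    return batch

random.seed(0)
data = {"none": list(range(100)),   # majority class
        "sexism": list(range(10)),  # minority classes are drawn
        "racism": list(range(5))}   # just as often per batch
batch = balanced_batch(data, 32)
```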
As evaluation metrics, we used precision, recall, and F1 scores, weighted-averaged over the labels to account for class imbalance. For this reason, the averaged F1 may not lie between the averaged precision and recall.
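The support-weighted averaging, and why the averaged F1 can fall outside the precision-recall interval, can be shown with a small worked example (the per-label numbers are made up for illustration):

```python
def weighted_prf(per_label_stats):
    """per_label_stats: {label: (precision, recall, f1, support)}.
    Returns support-weighted averages of precision, recall, and F1.
    Because F1 is averaged directly (not recomputed from the averaged
    precision and recall), it need not lie between them."""
    total = sum(s for _, _, _, s in per_label_stats.values())
    return tuple(
        sum(stat[i] * stat[3] for stat in per_label_stats.values()) / total
        for i in range(3)
    )

stats = {"none":   (0.90, 0.95, 0.92, 80),
         "sexism": (0.70, 0.50, 0.58, 15),
         "racism": (0.60, 0.40, 0.48, 5)}
p, r, f1 = weighted_prf(stats)  # f1 ends up below both p and r
```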
As baselines, we used the character n-gram logistic regression classifier (indicated as LR in Tables 2-4) from Waseem and Hovy (2016), a Support Vector Machine (SVM) classifier, and FastText (Joulin et al., 2016), which uses averaged bag-of-words representations to classify sentences and was the second-best single model on the same dataset after CNN (Badjatiya et al., 2017).

Hyperparameters
For hyperparameter tuning, we evaluated on the validation set. These are the hyperparameters used for evaluation.

One-step Classification
The results of the one-step multi-class classification are shown in the top part of Table 2.
Our newly proposed HybridCNN performs best, giving an improvement over the result of WordCNN. We attribute this improvement to the additional character input channel. We assume that the reason CharCNN performs worse than WordCNN is that the dataset is too small for a character-based model to capture word-level features by itself.
Baseline methods tend to have a high averaged F1 but low scores on the racism and sexism labels due to low recall.

Two-step Classification
The two-step approach that combines two binary classifiers shows results comparable to the one-step approach. The results of combining the two classifiers are shown in the bottom part of Table 3.
Combining two logistic regression classifiers in the two-step approach performs about as well as one-step HybridCNN and outperforms the one-step logistic regression classifier by more than 10 F1 points. This is surprising since logistic regression uses fewer features than HybridCNN.
Furthermore, using HybridCNN in the first step to detect abusive language and logistic regression in the second step to classify racism and sexism worked better than using HybridCNN alone.

Table 4 shows the results of abusive language classification. HybridCNN also performs best for abusive language detection, followed by WordCNN and logistic regression. Table 5 shows the results of classifying into sexism and racism given that the language is abusive. The second classifier performs well in predicting the specific type (in this case, sexism or racism).

Since the precision and recall scores of the "abusive" label are higher than those of "racism" and "sexism" in the one-step approach, the two-step approach can perform as well as the one-step approach.
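The composition of the two steps can be sketched as a simple pipeline. The classifier interfaces and keyword rules below are hypothetical stand-ins; in the experiments, any combination of the trained models (e.g. HybridCNN then logistic regression) can fill these slots:

```python
def two_step_predict(tweet, abuse_clf, type_clf):
    """First decide abusive vs. none; only tweets flagged abusive are
    passed to the second classifier, which picks the specific type.
    abuse_clf and type_clf are any callables returning a label."""
    if abuse_clf(tweet) != "abusive":
        return "none"
    return type_clf(tweet)  # "sexism" or "racism"

# Toy stand-in classifiers based on keyword lookup:
abuse_clf = lambda t: "abusive" if "#feminazi" in t or "idiot" in t else "none"
type_clf = lambda t: "sexism" if "#feminazi" in t else "racism"

pred = two_step_predict("typical #feminazi tweet", abuse_clf, type_clf)
```

A design consequence of this composition is that the second classifier never sees non-abusive tweets, so its errors only affect instances the first step already flagged.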

Conclusion and Future work
We explored a two-step approach that combines two classifiers: one to classify abusive language and another to classify a specific type of sexist or racist comment, given that the language is abusive. With many different machine learning classifiers, including our proposed HybridCNN, which takes both character and word features as input, we showed the potential of the two-step approach compared to the one-step approach, which is simply a multi-class classification. In this way, we can boost the performance of simpler models like logistic regression, which are faster and easier to train, and combine different types of classifiers, such as convolutional neural networks and logistic regression, depending on their performance on different datasets.
We believe the two-step approach has potential because large abusive language datasets with specific labels such as profanity, sexism, racism, or homophobia are more difficult to acquire than those simply flagged as abusive.
For this reason, in the future we would like to explore training the two-step classifiers on separate datasets (for example, a large dataset of abusive language for the first-step classifier and a smaller dataset with specific labels for the second-step classifier) to build a more robust and detailed abusive language detector.