Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection

Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework, since the limited length of the textual messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) to leverage the common information conveyed by emojis across different languages and improve the learned cross-lingual representation of short text messages, with the goal of performing zero-shot abusive language detection. We compare the results obtained with the original MLM to those obtained by our method, showing improved performance on German, Italian and Spanish.


Introduction
The extensive use of large-scale self-supervised pretraining has greatly contributed to recent progress in many Natural Language Processing (NLP) tasks (Devlin et al., 2019; Liu et al., 2019; Conneau and Lample, 2019). In this context, masked language modelling objectives represent one of the main novelties of these approaches: some tokens of an input sequence are randomly masked, and the objective is to predict these masked positions taking the corrupted sequence as input. Still, little attention has been devoted to the adaptation of these techniques to tasks dealing with social media data, probably because they are characterized by a very domain-specific language, with high variability and instability. Nevertheless, all these challenges make social media data an interesting testbed for novel deep-learning architectures, around the research question: how could the masking mechanism be adapted to target social media language?
In this paper, we address the above issue by adapting a novel architecture for cross-lingual models called XLM (Conneau and Lample, 2019) to zero-shot abusive language detection, a task that has gained increasing importance given the recent surge in abusive online behavior and the need to develop reliable and efficient methods to detect it. In particular, we evaluate two methods to pre-train bilingual language models, one similar to the original XLM masked model, and the other based on a novel hybrid emoji-based masked model. We then evaluate them on zero-shot abusive language detection for Italian, German and Spanish, showing that, although our results are below the state-of-the-art in a monolingual setting, the proposed solutions to adapt XLM to social media data are beneficial and can be effectively extended to other languages.
In the following, Section 2 discusses the related work. Section 3 describes our approach to train cross-lingual models for social media data classification, while Section 4 presents the experimental setup. Section 5 reports on the evaluation results, while Section 6 summarizes our findings.

Related work
The focus of this paper is the abusive language detection task, which has been widely explored in recent years thanks to numerous datasets, approaches and shared tasks covering different languages (Waseem et al., 2017; Fišer et al., 2018; Carmona et al., 2018; Wiegand et al., 2018; Bosco et al., 2018; Zampieri et al., 2019b; Roberts et al., 2019). An increasing number of approaches have been proposed to detect this kind of message (for a survey on the task, see (Schmidt and Wiegand, 2017) and (Fortuna and Nunes, 2018)).
Abusive language detection is usually framed as a supervised learning problem, with approaches ranging from combinations of manually crafted features such as n-grams (Wulczyn et al., 2017), syntactic features (Nobata et al., 2016) and linguistic features (Yin et al., 2009) to more recent neural networks (Park and Fung, 2017; Zhang and Tepper, 2018; Agrawal and Awekar, 2018; Corazza et al., 2018). Lee et al. (2018) present a comparative study of various learning models on the Hate and Abusive Speech on Twitter dataset (Founta et al., 2018), while Zampieri et al. (2019a) build the Offensive Language Identification Dataset and experiment with SVMs, BiLSTMs and CNNs both on binary abusive language classification and on a more fine-grained categorization. Our work deals with the same task, addressed from a cross-lingual perspective.
In recent years, some proposals have been made to tackle abusive language detection in a cross-lingual framework (Sohn and Lee, 2019; Pamungkas and Patti, 2019; Casula et al., 2020), with some attempts at zero-shot learning (Stappen et al., 2020). Most systems, however, rely on pretrained models and do not investigate the potential of in-domain data for pretraining. Additionally, as regards masked language models, we are not aware of any work in the literature modifying the masking mechanism for this task.
3 Cross-Lingual Language Models

MLM and HE-MLM training objectives
Our basic architecture relies on the XLM approach described in (Conneau and Lample, 2019), specifically developed to learn joint multilingual representations enabling knowledge transfer across languages. In particular, we borrow from XLM the method developed for unsupervised machine translation, which relies on the Masked Language Model (MLM) objective (Devlin et al., 2019) applied to multiple monolingual datasets as pretraining. We adopt the unsupervised approach because the alternative (i.e., the supervised one based on Translation Language Modeling) would need to be trained on parallel data, which are not available at scale for social media. As in XLM, we use Byte Pair Encoding (BPE) (Sennrich et al., 2016) to learn a shared vocabulary of common subwords between the languages. This technique has proven beneficial to the alignment of embeddings from different languages when they share common traits, such as alphabet and digits.
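To make the shared-vocabulary idea concrete, the following toy sketch learns BPE merges over a joint multilingual word list. It illustrates the principle only, not the fastBPE implementation actually used; the function name and its interface are ours:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Toy BPE learning: repeatedly merge the most frequent adjacent
    symbol pair, starting from single characters. Subwords shared
    across languages end up in a common vocabulary."""
    # Each word is represented as a tuple of symbols (initially characters).
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Re-segment every word with the new merge applied.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

Running this on a tiny corpus such as `["low", "low", "lower"]` with two merges first joins `l`+`o`, then `lo`+`w`.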
Following the original approach to MLM, 15% of the tokens in a sentence are selected; of these, 80% are replaced by the mask token, 10% by a random token, and 10% are kept unchanged. In order to reduce the impact of relatively frequent words on the model, tokens are sampled according to a multinomial distribution whose weights are proportional to the square root of their inverse frequency. While the original XLM operates on streams of text split by sentence separators, we split the stream of tweets so that each example contains exactly one tweet.
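The selection and corruption procedure can be sketched as follows. This is a minimal illustration under our own naming conventions, not code from the XLM repository; the `counts` and `vocab` arguments are assumptions about how corpus statistics would be passed in:

```python
import math
import random

MASK = "<mask>"

def select_mlm_targets(tokens, counts, rng, vocab=None):
    """Pick ~15% of positions for MLM, favoring rare tokens, then
    corrupt them with the 80/10/10 mask/random/keep scheme."""
    vocab = vocab if vocab is not None else tokens
    # Weights proportional to sqrt of inverse corpus frequency.
    weights = [math.sqrt(1.0 / counts[t]) for t in tokens]
    n_select = max(1, round(0.15 * len(tokens)))
    # Weighted sampling without replacement (Efraimidis-Spirakis keys).
    keys = [rng.random() ** (1.0 / w) for w in weights]
    positions = sorted(range(len(tokens)), key=lambda i: -keys[i])[:n_select]
    corrupted, targets = list(tokens), {}
    for i in positions:
        targets[i] = tokens[i]
        r = rng.random()
        if r < 0.8:                 # 80%: replace with the mask token
            corrupted[i] = MASK
        elif r < 0.9:               # 10%: replace with a random token
            corrupted[i] = rng.choice(vocab)
        # remaining 10%: keep the token unchanged
    return corrupted, targets
```

The model is then trained to predict `targets` from `corrupted`.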
Since using a standard pre-trained language model to classify irregular data obtained from social networks would prove very challenging, we try to adapt our cross-lingual model to social media data as much as possible. Specifically, we rely on two main intuitions: first, emojis are linked to emotion expressions, which in turn correlate with various forms of online harassment (Arslan et al., 2019); second, emojis can be seen as a common trait of tweets across different languages, maintaining a similar meaning at least among Indo-European languages (Lu et al., 2016). The data used in this paper show a good coverage of emojis, with 16.82% of the tweets containing at least one emoji for English, 16.15% for German, 7.68% for Italian, and 18.39% for Spanish. Furthermore, in these datasets the most frequent emojis are shared among all four languages, with 'red heart', 'face with tears of joy', 'thinking face' and 'smiling face with heart-eyes' among the top ten emojis in each dataset. We therefore compare a standard masked language model with one that targets emoji prediction instead of the cloze task (Taylor, 1953). However, since not every tweet contains an emoji, we adopt a hybrid approach: when no emojis are present, we train the previously described MLM objective; when emojis are found, we select them as the masking candidates, masked 80% of the time, replaced by a random token 10% of the time, or kept unchanged 10% of the time, as in MLM. With this technique, which we call Hybrid Emoji-based Masked Language Model (HE-MLM), we can use all the available data while also leveraging the common information conveyed by emojis.
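The hybrid candidate selection can be sketched as follows. All names are illustrative; for brevity the no-emoji fallback is shown as a uniform 15% sample, whereas the actual MLM objective uses the frequency-weighted sampling described above:

```python
import random

MASK = "<mask>"

def select_hemlm_targets(tokens, is_emoji, rng):
    """Hybrid emoji-based target selection: emoji positions become the
    masking candidates when present; otherwise fall back to standard
    MLM selection over ~15% of positions."""
    candidates = [i for i, t in enumerate(tokens) if is_emoji(t)]
    if not candidates:
        # No emojis in this tweet: plain MLM fallback (uniform here).
        n = max(1, round(0.15 * len(tokens)))
        candidates = rng.sample(range(len(tokens)), n)
    corrupted, targets = list(tokens), {}
    for i in candidates:
        targets[i] = tokens[i]
        r = rng.random()
        if r < 0.8:                 # 80%: mask
            corrupted[i] = MASK
        elif r < 0.9:               # 10%: random token
            corrupted[i] = rng.choice(tokens)
        # else 10%: keep unchanged
    return corrupted, targets
```

Every tweet thus contributes a training example, whether or not it contains an emoji.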
We also test a variant of MLM and HE-MLM in which we put special tokens "<emoji>" and "</emoji>" around all emojis in the dataset. Since we are effectively performing two different tasks with the same model, this allows it to distinguish between normal words and emojis in the text while training the masked language models.
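Wrapping emojis with the marker tokens amounts to a simple preprocessing pass, sketched here (the marker strings follow the paper; the helper itself is hypothetical):

```python
def wrap_emojis(tokens, is_emoji):
    """Surround every emoji token with <emoji> ... </emoji> markers so
    the model can tell emojis apart from ordinary words."""
    out = []
    for t in tokens:
        out.extend(["<emoji>", t, "</emoji>"] if is_emoji(t) else [t])
    return out
```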

Fine-tuning for abusive language detection
In order to assess how invariant our tweet embeddings are with respect to the input language, we create a zero-shot framework in which the system is trained only on English tweets and evaluated on multiple languages. In particular, we first load the pretrained transformer and attach a single feed-forward layer on top of the encoder, with a single sigmoid-activated output neuron. The entire model is then fine-tuned on the English hate speech detection dataset using a binary cross-entropy loss. The system uses early stopping, with the minimum F1 score between the two classes as stopping criterion, relying on a balanced validation set that contains all languages. Finally, performance is evaluated on the German, Italian and Spanish test sets to assess how our classifier performs on the different languages using the bilingual models.
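The early-stopping criterion, the minimum of the two per-class F1 scores on the validation set, can be sketched as follows (a self-contained illustration; the function name is ours):

```python
def min_class_f1(preds, golds):
    """Compute F1 for each of the two classes (0 = non-hate, 1 = hate)
    and return the minimum, used as the early-stopping criterion."""
    f1s = []
    for cls in (0, 1):
        tp = sum(p == cls and g == cls for p, g in zip(preds, golds))
        fp = sum(p == cls and g != cls for p, g in zip(preds, golds))
        fn = sum(p != cls and g == cls for p, g in zip(preds, golds))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return min(f1s)
```

Using the minimum rather than the macro average penalizes a model that collapses onto the majority class, since the neglected class drags the criterion to zero.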
4 Experimental setting

Datasets
Since we run our classification in a zero-shot scenario, we use English data for training, and tweets in German, Italian and Spanish for validation and test. The datasets we used and the corresponding number of tweets are reported in Table 1. To guarantee a comparable setting for our experiments, we carefully investigated data samples and the annotation schemes adopted for the different languages, concluding that the tweet content as well as the binary annotation tagsets (hate-speech/offensive vs. other) of the datasets are similar enough to use them in the same classification framework. The class distribution is also similar, with the abusive class covering around 30% of the tweets in each dataset.
To pre-train our cross-lingual language models with in-domain data, we gather 5 million tweets for each of the target languages (i.e., English, German, Italian and Spanish) through the Twitter Streaming API, using the stopwords of the target language as a filter to query the API, as in (Scheffler, 2014).

Data splitting
Concerning the dataset splits into training and test instances, for the English dataset, since no standardized split is provided, we randomly selected 60% of the dataset for training, 20% for validation and 20% for testing. For the German and Italian datasets, we use the training and test splits provided by the Germeval and Evalita task organisers, respectively; in both cases, we use 20% of the training set as validation set. Whenever we split the datasets ourselves, we use the train_test_split function from scikit-learn (Pedregosa et al., 2011) with 42 as seed value. Finally, for Spanish, we use the training, development and test sets provided by the HatEval task organisers.
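The 60/20/20 split for English can be sketched with the standard library as follows. This only illustrates the idea; the paper actually uses scikit-learn's train_test_split with seed 42, and the function name here is ours:

```python
import random

def split_60_20_20(examples, seed=42):
    """Shuffle deterministically, then cut into 60% train,
    20% validation and 20% test."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    n_train = int(0.6 * len(data))
    n_val = int(0.2 * len(data))
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test
```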
For each combination of languages tested in our experiments (i.e., English-German, English-Italian and English-Spanish), the validation set is obtained by keeping the language-specific validation set as is and undersampling the English one to the same size, so that each language has the same weight during the early stopping phase.
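The undersampling step can be sketched as follows (reusing seed 42 here is our assumption; the paper does not state which seed, if any, governs this particular step):

```python
import random

def balanced_validation(en_val, target_val, seed=42):
    """Downsample the English validation set to the size of the
    language-specific one, so both languages weigh equally in the
    early-stopping criterion."""
    rng = random.Random(seed)
    return rng.sample(list(en_val), len(target_val)) + list(target_val)
```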
Before classification, the text is lowercased and all accents are removed; it is then tokenized with the Moses tokenizer (Koehn et al., 2007) and segmented into subword units with the fastBPE implementation 1. We evaluate the classifier performance over a maximum of 100 training epochs, and use an early stopping mechanism with a patience of 5. The selected model is then used to evaluate performance on the test set.
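The lowercasing and accent-removal steps can be sketched with the standard library (Moses tokenization and BPE segmentation are omitted; note that this strips only combining diacritics, so emojis survive untouched):

```python
import unicodedata

def normalize(text):
    """Lowercase the text and strip diacritics: decompose characters
    (NFD) and drop combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
```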

Pretraining methods
Since we want to assess the impact of emojis on the pretraining results, we train four different configurations:
• the base MLM training objective;
• the base MLM training objective with <emoji> tokens;
• the HE-MLM training objective;
• the HE-MLM training objective with <emoji> tokens.
For each configuration, we pretrain two models in order to reduce the impact of random initialization on the final results and we fine-tune each model 10 times (20 total). The final results are obtained by averaging the results of these 20 runs.

Evaluation
We report the experimental results for each language in Tables 2, 3 and 4. For all languages, training is performed using only English data.
1 https://github.com/glample/fastBPE

Results for German (Table 2) show that using in-domain unlabeled data from Twitter instead of pre-trained models yields an improvement in performance on English, while on German the model is not able to outperform the pre-trained model. In this case, however, the pre-trained model only learns the non-hate class, while the other three models all achieve non-zero recall on both classes. Besides the baseline, the HE-MLM model with <emoji> tokens is the best performing one on the German data, while on English the best performance is achieved by the vanilla MLM model.
We also evaluate MLM and HE-MLM for zero-shot Italian hate speech classification, comparing the configurations with and without <emoji> tokens as in the previous experiments (Table 3). For English, the best performing model is HE-MLM with emoji tokens, while on Italian the HE-MLM model without tokens is better in terms of macro-averaged F1. When comparing configurations, we observe that the MLM model with emoji tokens has a better F1 score than the plain MLM one on the non-hate-speech class, while the plain MLM model performs better on the hate class. This results in the plain MLM model having a better macro-averaged F1 for Italian, while the MLM model with emoji tokens shows a higher average F1 on English. When considering the hybrid emoji-based models, HE-MLM achieves a higher F1 for the hate speech class in English and for the non-hate class in Italian, resulting in a higher macro-averaged F1.

On all the runs, the classifier achieves a lower performance on German than on the other two languages, while the results on Italian and Spanish are comparable. This confirms the findings in (Corazza et al., 2020) suggesting that, even when using the same classification framework, experimental setting and amount of training data, offensive speech detection on German achieves lower performance than on other languages. There may be two reasons for this: on the one hand, German may have inherent characteristics that make it more challenging for abusive language detection; for example, the presence of compound words makes hashtag splitting more error-prone. On the other hand, the Germeval dataset was built by sampling data from specific users and avoiding keyword-based queries, so as to obtain the highest possible variability in the offensive language.
This led to the creation of a very challenging dataset, where lexical overlap between training and test data is limited and where hate speech is not associated with specific topics or keywords, as suggested in (Wiegand et al., 2019).

Conclusions
In this paper, we present a novel zero-shot framework for multilingual abusive language detection. We compare two cross-lingual language models, i.e., standard MLM and a hybrid emoji-based version of MLM (HE-MLM), highlighting that the latter shows some advantages when used on social media data. First, whenever emojis are available, the pre-training step is aimed at predicting tokens that are inherently more relevant to the final abusive language detection task than random tokens. Second, emojis convey similar meanings in the languages that we consider, serving as a common trait between languages during pre-training. We also put <emoji> tokens around emojis to help the system discriminate between the two training objectives when using HE-MLM.
The proposed methods represent a novel contribution with respect to social media data processing and abusive language detection. Our aim is not to create a system comparable with monolingual state-of-the-art solutions, but to investigate the possibility of using an unsupervised approach for zero-shot cross-lingual abusive language detection. As a first step in this direction, we focused on four European languages for which similar data were available. The only existing work dealing with zero-shot abusive language detection, presented in (Stappen et al., 2020), focuses on a single language pair and, while obtaining promising results, relies on the English and Spanish corpora annotated for HatEval 2019, which follow the same guidelines and focus on hate against immigrants and women. Our approach aims to be more robust, comparing datasets annotated for different shared tasks which may adopt slightly different guidelines.
In the near future, we plan to further extend the social media-specific datasets we are collecting to pre-train HE-MLM, since the 5 million tweets we used for each language constitute a small corpus compared to those underlying standard pre-trained language models. Then, to investigate whether our results generalise to typologically different languages, we will test our approach on additional abusive language datasets covering other languages (Ousidhoum et al., 2019; Zampieri et al., 2020).