CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech

Although there is an unprecedented effort to provide adequate responses in terms of laws and policies to hate content on social media platforms, dealing with hatred online is still a tough problem. Tackling hate speech in the standard way of content deletion or user suspension may be charged with censorship and overblocking. One alternate strategy, that has received little attention so far by the research community, is to actually oppose hate content with counter-narratives (i.e. informed textual responses). In this paper, we describe the creation of the first large-scale, multilingual, expert-based dataset of hate-speech/counter-narrative pairs. This dataset has been built with the effort of more than 100 operators from three different NGOs that applied their training and expertise to the task. Together with the collected data we also provide additional annotations about expert demographics, hate and response type, and data augmentation through translation and paraphrasing. Finally, we provide initial experiments to assess the quality of our data.


Introduction
Together with the rapid growth of social media platforms, the amount of user-generated content is steadily increasing. At the same time, abusive and offensive language can spread quickly and is difficult to monitor. Defining hate speech is challenging because the concept is broad and its nuances vary across cultures and languages. For instance, according to UNESCO, hate speech refers to "expressions that advocate incitement to harm based upon the targets being identified with a certain social or demographic group" (Gagliardone et al., 2015).
Victims of hate speech are usually targeted because of aspects such as gender, race, religion, sexual orientation, or physical appearance. For example, Sentence 1 shows explicit hostility towards a specific group with no reasons explained.
(1) I hate Muslims. They should not exist.
Online hate speech can deepen prejudice and stereotypes (Citron and Norton, 2011) and bystanders may receive false messages and consider them correct.
Although Social Media Platforms (SMPs) and governmental organizations have devoted unprecedented attention to taking adequate actions against hate speech by implementing laws and policies (Gagliardone et al., 2015), these measures do not seem to achieve the desired effect, since hate content is continuously evolving and adapting, making its identification a tough problem (Davidson et al., 2017).
The standard approach used on SMPs to prevent hate spreading is the suspension of user accounts or deletion of hate comments, while trying to weigh the right to freedom of speech. Another strategy, which has received little attention so far, is to use counter-narratives. A counter-narrative (sometimes called counter-comment or counter-speech) is a response that provides non-negative feedback through fact-bound arguments and is considered the most effective approach to withstand hate speech (Benesch, 2014; Schieb and Preuss, 2016). In fact, it preserves the right to freedom of speech and counters stereotypes and misleading information with credible evidence. It can also alter the viewpoints of haters and bystanders by encouraging the exchange of opinions and mutual understanding, and can help de-escalate the conversation. A counter-narrative such as the one in Sentence 2 is a non-negative, appropriate response to Sentence 1, while the one in Sentence 3 is not, since it escalates the conversation.
(2) Muslims are human too. People can choose their own religion.
(3) You are truly one stupid backwards thinking idiot to believe negativity about Islam.
In this respect, some NGOs are tackling hatred online by training operators to monitor SMPs and to produce appropriate counter-narratives when necessary. Still, manual intervention against hate speech is a toil of Sisyphus, and automating the countering procedure would increase the efficacy and effectiveness of hate countering (Munger, 2017).
As a first step in the above direction, we have nichesourced the collection of a dataset of counter-narratives to 3 different NGOs. Nichesourcing is a specific form of outsourcing that harnesses the computational efforts from niche groups of experts rather than the 'faceless crowd' (De Boer et al., 2012). Nichesourcing combines the strengths of the crowd with those of professionals (De Boer et al., 2012; Oosterman et al., 2014). In our case we organized several data collection sessions with NGO operators, who are trained experts, specialized in writing counter-narratives that are meant to fight hatred and de-escalate the conversation. In this way we build the first large-scale, multilingual, publicly available, expert-based dataset of hate speech/counter-narrative pairs for English, French and Italian, focusing on the hate phenomenon of Islamophobia. The construction of this dataset involved more than 100 operators and more than 500 person-hours of data collection. After the data collection phase, we hired three non-expert annotators, who performed additional tasks that did not require specific domain expertise (200 person-hours of work): paraphrasing original hate content to augment the number of pairs per language, annotating hate content sub-topic and counter-narrative type, and translating content from Italian and French to English to have parallel data across languages. This additional annotation ensures that the dataset can be used for several NLP tasks related to hate speech.
The remainder of the paper is structured as follows. First, we briefly discuss related work on hate speech in Section 2. Then, in Section 3, we introduce our CONAN dataset and some descriptive statistics, followed by a quantitative and qualitative analysis of our dataset in Section 4. We conclude with our future work in Section 5.

Related Work
Hate datasets. Several hate speech datasets are publicly available, usually including a binary annotation, i.e. whether the content is hateful or not (Reynolds et al., 2011; Rafiq et al., 2015; Hosseinmardi et al., 2015; de Gibert et al., 2018; ElSherief et al., 2018). Also, several shared tasks have released their datasets for hate speech detection in different languages. For instance, there is the German abusive language identification on SMPs at Germeval (Bai et al., 2018), or the hate speech and misogyny identification for Italian at EVALITA (Del Vigna et al., 2017; Fersini et al., 2018) and for Spanish at IberEval (Ahluwalia et al., 2018; Shushkevich and Cardiff, 2018). Bilingual hate speech datasets are also available for Spanish and English (Pamungkas et al., 2018). Waseem and Hovy (2016) released 16k annotated tweets containing 3 offense types: sexist, racist and neither. Ross et al. (2017) first released a German hate speech dataset of 541 tweets targeting the refugee crisis and then offered insights for the improvement of hate speech detection by providing multiple labels for each hate speech.
It should be noted that, due to copyright limitations, hate speech datasets are usually distributed as a list of tweet IDs, making them ephemeral and prone to data loss (Klubička and Fernández, 2018). For this reason, Sprugnoli et al. (2018) created a multi-turn annotated WhatsApp dataset for Italian on cyberbullying, using simulation sessions with teenagers to overcome the data collection/loss problem.
Hate detection. Several works have investigated online English hate speech detection and the types of hate speech. Owing to the availability of current datasets, researchers often use supervised approaches to tackle hate speech detection on SMPs including blogs (Warner and Hirschberg, 2012; Djuric et al., 2015; Gitari et al., 2015), Twitter (Xiang et al., 2012; Silva et al., 2016; Mathew et al., 2018a), Facebook (Del Vigna et al., 2017), and Instagram (Zhong et al., 2016). The predominant approaches are to build a classifier trained on various features derived from lexical resources (Gitari et al., 2015; Williams, 2015, 2016), n-grams (Sood et al., 2012; Nobata et al., 2016) and knowledge bases (Dinakar et al., 2012), or to utilize deep neural networks (Badjatiya et al., 2017). In addition, other approaches have been proposed to detect subcategories of hate speech such as anti-black (Kwok and Wang, 2013) and racist (Badjatiya et al., 2017). Silva et al. (2016) studied the prevalent hate categories and targets on Twitter and Whisper, but limited hate speech only to the form of I <intensity> <user intent> <any word>. A comprehensive overview of recent approaches on hate speech detection using NLP can be found in (Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018).
Hate countering. Lastly, we should mention that a very limited number of studies have been conducted on counter-narratives (Benesch, 2014; Schieb and Preuss, 2016; Ernst et al., 2017; Mathew et al., 2018b). Mathew et al. (2018b) collected YouTube comments that contain counter-narratives to YouTube videos of hatred. Schieb and Preuss (2016) studied the effectiveness of counter-narratives on Facebook via a simulation model. The study of Wright et al. (2017) shows that some arguments among strangers induce favorable changes in discourse and attitudes. To our knowledge, there exists only one very recent seminal work (Mathew et al., 2018a), focusing on the idea of collecting hate message/counter-narrative pairs from Twitter. They used a simple pattern in the form (I <hate> <category>) to first extract hate tweets and then manually annotate counter-narratives found in the responses. Still, there are several shortcomings of their approach: (i) this dataset already lost more than 60% of the pairs in a small time interval (content deletion) since only tweet IDs are distributed, (ii) it is only in English, (iii) the dataset was collected from a specific template, which limits the coverage of hate speech, and (iv) many of these answers come from ordinary web users and contain, for example, offensive text that does not meet the de-escalation intent of NGOs and the standards/quality of their operators' responses.
Considering the aforementioned works, we can reasonably state that no suitable corpus of counter-narratives is available for our purposes, especially because the natural 'countering' data that can be found on SMPs, such as example 3, often does not meet the required standards. For this reason we decided to build CONAN, a dataset of COunter NArratives through Nichesourcing.

CONAN Dataset
In this section, we describe the characteristics that we intend our dataset to possess, the nichesourcing methodology we employed to collect the data and the further expansion of the dataset together with the annotation procedures. Moreover, we give some descriptive statistics and analysis for the collected data. CONAN can be downloaded at the following link: https://github.com/marcoguerini/CONAN.

Fundamentals of the Dataset
Considering the shortcomings of the existing datasets and our aim to provide a reliable resource to the research community, we want CONAN to comply with the following characteristics:
Copy-free data. We want to provide a dataset that is not ephemeral, by releasing only copy-free textual data that can be directly exploited by researchers without data loss across time, as originally pointed out in (Klubička and Fernández, 2018).
Multilingual data. Our dataset is produced as a multilingual resource to allow for cross lingual studies and approaches. In particular, it contains hate speech/counter-narrative pairs for English, French, and Italian.
Expert-based data. The hate speech/counter-narrative pairs have been collected through nichesourcing to three different NGOs from the United Kingdom, France and Italy. Therefore, both the responses and the hate speech itself are expert-based and composed by operators, specifically trained to oppose online hate speech.
Protecting operator's identity. We aim to create a secure dataset that will not disclose the identity of operators in order to protect them against being tracked and attacked online by hate spreaders. This might be the case if we were to collect their real SMP activities, following a procedure similar to the one in Mathew et al. (2018a). Therefore our data collection was based on simulated SMP activity.

Dataset Collection
We have followed the same data collection procedure for each language to grant the same conditions and comparability of the results. The data collection has been conducted along the following steps:
1. Hate speech collection. For each language we asked two native speaker experts (NGO trainers) to write around 50 prototypical Islamophobic short hate texts. This step was used to ensure that: (i) the sample uniformly covers the typical 'arguments' against Islam as much as possible, (ii) we can distribute to the NLP community the original hate speech as well as its counter-narrative.
2. Preparation of data collection forms. We prepared three online forms (one per language) with the same instructions for the operators translated into the corresponding language. For each language, we prepared 2 types of forms: in the first, users can respond to hate text prepared by NGO trainers; in the second, users can write their own hate text and counter-narratives at the same time. In each form operators were first asked to anonymously provide their demographic profile including age, gender, and education level, and then to compose up to 5 counter-narratives for each hate text.
3. Counter-narrative instructions. The operators were already trained to follow the guidelines of the NGOs for creating proper counter-narratives. Such guidelines are highly consistent across languages and across NGOs, and are similar to those in the 'Get the Trolls Out' project. These guidelines emphasize using fact-bounded information and non-offensive language in order to avoid escalating the discussion, as outlined in Table 1. Furthermore, for our specific data collection task, operators were asked to follow their intuitions without over-thinking and to compose reasonable responses. The motivation for this instruction was to collect as much and as diverse data as possible, since for current AI technologies (such as deep learning approaches) quantity and quality are of paramount importance and few perfect examples do not provide enough generalization evidence. Other than this instruction and the fact of using a form (instead of responding on an SMP), operators carried out their normal counter messaging activities.
4. Data collection sessions. For each language, we performed three data collection sessions on different days. Each session lasted roughly three hours and had a variable number of operators, usually around 20 (depending on their availability). Operators are different from NGO trainers and might change across sessions. Operators were gathered in the same room (NGO premises) with a computer, and received a brief introduction from the NGO trainer. This introduction was about our specific counter-narrative collection task, as described above. A sample of the collected data for the three languages is given in Table 2.

Dataset Augmentation and Annotation
After the data collection phase, we hired three non-expert annotators, who performed additional work that did not require specific domain expertise. Their work amounted to roughly 200 hours. In particular they were asked to (i) paraphrase original hate content to augment the number of pairs per language, (ii) annotate hate speech sub-topics and counter-narrative types, and (iii) translate content from French and Italian to English to have parallel data across languages. To guarantee data quality, after the annotation and the augmentation phase, a validation procedure was conducted by NGO trainers on the newly generated data for their specific language.
Table 1: Counter-narrative guidelines.
Don't be abusive: Before submitting a response, make sure the response does not spread any hate, bigotry, prejudice or illegal content. We want to maintain the conversations peaceful and not to degenerate into a conflict. We are talking about people not categories.
Think about the objectives: Before writing a response, think about the effect it may create and the one you want to obtain. Paying attention to the objectives will help use proper words.
Call for influential users: Enlisting influential supporters (civic leaders, politicians, subject experts) will help bring attention and deepen the effect to counter-narrative.
Use credible evidence: The information in hate speech may be confusing and misleading. Ask for clarification when necessary. Counter it with credible evidence and use reliable sources.
Think about the tone: We can demonstrate understanding and support to those who might be attacked. Be careful of using sarcasm, humour, parody and satire. We can use them, if we are able to master it as they run the danger of being antagonistic.
Table 2: Sample of the collected data.
Hate Speech: Every Muslim is a potential terrorist.
Counter-Narrative: Every Muslim is also a potential peacemaker, doctor, philanthropist... What's your point?
Hate Speech: I musulmani violentano le nostre donne e vanno castrati.
Counter-Narrative: Good evening, if I understood correctly, you're telling me that all adult men of Islamic faith present in Italy, raped, rape or are going to rape Italian women? Can you bring in data in support of your statement?
Hate Speech: Le voile est contraire à la laïcité. (The veil is contrary to secularism.)
Counter-Narrative: Bien au contraire la laïcité permet à tout citoyen de vivre librement sa confession. (On the contrary, secularism allows every citizen to freely profess his faith.)
Paraphrasing for augmenting data pairs. Recent deep learning approaches are data hungry, and data augmentation is a way to mitigate the problem. For instance, to improve text classification performance for sexism, new tweets are generated by replacing words in original tweets with synonyms from ConceptNet (Sharifirad et al., ...). [...] (Sennrich et al., 2016) and gold standard repetition (Chatterjee et al., 2017) that have been used in sequence-to-sequence Machine Translation. In all these tasks, adding the synthetic pairs to the original data always results in significant improvements in the performance.
In line with the idea of artificially augmenting pairs, and since in our dataset we have many responses for few hate speeches, we produced two manual paraphrases of each hate speech and paired them with the counter-narratives of the original one. Therefore we increased the number of our pairs by three times in each language.
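The pairing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `augment_pairs`, the dictionary layout and the placeholder strings are hypothetical and do not reflect the actual CONAN file format.

```python
from typing import Dict, List, Tuple


def augment_pairs(
    responses: Dict[str, List[str]],
    paraphrases: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Pair each counter-narrative with the original hate text and with
    every manual paraphrase of that hate text."""
    pairs: List[Tuple[str, str]] = []
    for hate_text, counter_narratives in responses.items():
        # The original text comes first, followed by its paraphrases.
        variants = [hate_text] + paraphrases.get(hate_text, [])
        for variant in variants:
            for counter_narrative in counter_narratives:
                pairs.append((variant, counter_narrative))
    return pairs


# Placeholder data (hypothetical, not taken from the dataset).
responses = {"HS-A": ["CN-1", "CN-2"], "HS-B": ["CN-3"]}
paraphrases = {
    "HS-A": ["HS-A paraphrase 1", "HS-A paraphrase 2"],
    "HS-B": ["HS-B paraphrase 1", "HS-B paraphrase 2"],
}

original_count = sum(len(cns) for cns in responses.values())
augmented = augment_pairs(responses, paraphrases)
print(original_count, len(augmented))
```

With two paraphrases per hate text, every original pair yields two additional pairs, which is why the number of pairs triples.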
Counter-narrative type annotation. In this task, we asked the annotators to label each counter-narrative with types.
Based on the counter-narrative classes proposed by Benesch et al. (2016) and Mathew et al. (2018b), we defined the following set of types: PRESENTATION OF FACTS, POINTING OUT HYPOCRISY OR CONTRADICTION, WARNING OF CONSEQUENCES, AFFILIATION, POSITIVE TONE, NEGATIVE TONE, HUMOR, COUNTER-QUESTIONS, OTHER. With respect to the original guidelines, we added a new type of counter-narrative called COUNTER-QUESTIONS to cover expressions/replies using a question that can be thought-provoking or asking for more evidence from the hate speaker. In fact, a preliminary analysis showed that this category is quite frequent among operator responses. Finally, each counter-narrative can be labeled with more than one type, thus making the annotation more fine-grained. Two annotators per language annotated all the counter-narratives independently. A reconciliation phase was then performed for the disagreement cases.
Hate speech sub-topic annotation. We labeled sub-topics of hate content to have an annotation that can be used both for fine-grained hate speech classification, and for exploring the correlation between hate sub-topics and counter-narrative types. The following sub-topics are determined for the annotation based on the guidelines used by NGOs to identify hate messages (mostly consistent across languages): CULTURE, criticizing Islamic culture or particular aspects such as religious events or clothes; ECONOMICS, hate statements about Muslims taking European workplaces or not contributing economically to the society; CRIMES, hate statements about Muslims committing actions against the law; RAPISM, a very frequent topic in hate speech, for this reason it has been isolated from the previous category; TERRORISM, accusing Muslims of being terrorists, killers, preparing attacks; WOMEN OPPRESSION, criticizing Muslims for their behavior against women; HISTORY, stating that we should hate Muslims because of historical events; OTHER/GENERIC, everything that does not fall into the above categories.
As before, two annotators per language annotated all the material. Also in this annotation task, a reconciliation phase was performed for the disagreement cases.
Parallel corpus of language pairs. To allow studying cross-language approaches to counter-narratives and more generally to increase language portability, we also translated the French and the Italian pairs (i.e. hate speech and counter-narratives) to English. Similar motivations can be found in using zero-shot learning to translate between language pairs unseen during training (Johnson et al., 2017). With parallel corpora we can exploit cross-lingual word embeddings to enable knowledge transfer between languages (Schuster et al., 2018).

Dataset Statistics
In total we had more than 500 hours of data collection with NGOs, where we collected 4078 hate speech/counter-narrative pairs; specifically, 1288 pairs for English, 1719 pairs for French, and 1071 pairs for Italian. At least 111 operators participated in the 9 data collection sessions and each counter-narrative needed about 8 minutes on average to be composed. The paraphrasing of hate messages and the translation of French and Italian pairs to English brought the total number of pairs to more than 15 thousand. Regarding the token length of counter-narratives, we observe that there is a consistency across the three languages with 14 tokens on average for French, and 21 for Italian and English. Considering counter-narrative length in terms of characters, only a small portion (2% for English, 1% for French, and 5% for Italian) contains more than 280 characters, which is the character limit per message on Twitter, one of the key SMPs for hate speech research. Further details on the dataset can be found in Table 3.
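The counts above can be checked with a short Python sketch. The per-language figures are taken from the text; how the augmented total is composed is an assumption made here for illustration (paraphrasing triples each language's pairs, and the translated French and Italian pairs are counted once as additional English pairs). Table 3 remains the authoritative breakdown. The helper `share_over_limit` is likewise a hypothetical name.

```python
from typing import List

# Original pairs per language, as reported in the text.
original = {"English": 1288, "French": 1719, "Italian": 1071}
total_original = sum(original.values())

# Two paraphrases per hate text triple the pairs in each language.
augmented = {lang: 3 * n for lang, n in original.items()}

# Assumption: translated French and Italian pairs are added once each.
translated_to_english = original["French"] + original["Italian"]
grand_total = sum(augmented.values()) + translated_to_english


def share_over_limit(texts: List[str], limit: int = 280) -> float:
    """Fraction of texts longer than `limit` characters."""
    return sum(len(text) > limit for text in texts) / len(texts)


print(total_original, grand_total)
```

Under this assumption the total exceeds 15 thousand, in line with the figure given in the text.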
Regarding demographics, the majority of responses were written by operators that held a bachelor's or a higher degree (95% for English, 65% for French, and 69% for Italian). As shown in Table 4, there is a good balance in responses with regard to declared gender, with a slight predominance of counter-narratives written by female operators in English and Italian (53% and 55% respectively), while a slight predominance of counter-narratives written by male operators is present in French (61%). Finally, the predominant age bin is 21-30 for English and Italian.
Considering the annotation tasks, we give the distribution of counter-narrative types per language in Table 5. As can be seen in the table, there is a consistency across the languages such that FACTS, QUESTION, DENOUNCING, and HYPOCRISY are the most frequent counter-narrative types. Before the reconciliation phase, the agreement between the annotators was moderate: Cohen's Kappa of 0.55 over the three languages. This can be partially explained by the complexity of the messages, which often fall under more than one category (two labels were assigned in more than 50% of the cases). On the other hand, for hate speech sub-topic annotation, the agreement between the annotators was very high even before the reconciliation phase (Cohen's Kappa of 0.92 over the three languages). A possible reason is that such messages represent short and prototypical hate arguments, as explicitly requested from the NGO trainers. In fact, the vast majority has only one label. In Table 6, we give a distribution of hate speech sub-topics per language. As can be observed in the table, the labels are distributed quite evenly among sub-topics and across languages; in particular, CULTURE, ISLAMIZATION, GENERIC, and TERRORISM are the most frequent sub-topics.
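For readers unfamiliar with the agreement measure, the following is a minimal sketch of Cohen's Kappa for two annotators who each assign a single label per item. It is not the authors' evaluation script: how multi-label counter-narrative annotations were handled is not stated here, and the label sequences below are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's Kappa for two annotators, one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented sub-topic labels for six items (hypothetical data).
annotator_1 = ["CULTURE", "TERRORISM", "CULTURE", "CRIMES", "TERRORISM", "CULTURE"]
annotator_2 = ["CULTURE", "TERRORISM", "CRIMES", "CRIMES", "TERRORISM", "CULTURE"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

In this toy case the annotators agree on five of six items (observed agreement 5/6) against a chance agreement of 1/3, giving a Kappa of 0.75.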

Evaluation
In order to assess the quality of our dataset, we ran a series of preliminary experiments that involved three annotators to judge hate speech/counter-narrative pairs along a yes/no dimension.
Augmentation reliability. The first experiment was meant to assess how natural a pair is when coupling a counter-narrative with the manual paraphrase of the original hate speech it refers to. We administered 120 pairs to the subjects to be evaluated: 20 were kept as they are, so as to have an upper bound representing ORIGINAL pairs. In 50 pairs we replaced the hate speech with a PARAPHRASE, and in the 50 remaining pairs, we randomly matched a hate speech with a counter-narrative from another hate speech (UNRELATED baseline). Results show that in the ORIGINAL condition hate speech and counter-narrative were considered clearly tied 85% of the time, followed by 74% in the PARAPHRASE condition and only 4% in the UNRELATED baseline. This difference is statistically significant with p < .001 (χ² test). This indicates that the quality of augmented pairs is almost as good as that of original pairs.
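To illustrate the kind of test reported above, here is a sketch of a χ² test of independence on a 3x2 contingency table (condition by yes/no judgment). The counts are hypothetical: they assume a single judgment per pair, chosen so that the proportions match the reported 85%, 74% and 4%. The paper involved three annotators and does not state how judgments were aggregated, so the resulting statistic is not a reproduction of the reported one.

```python
import math
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic: float) -> float:
    """Survival function of the chi-square distribution, valid for 2 degrees of freedom only."""
    return math.exp(-statistic / 2.0)


# Hypothetical counts: rows are ORIGINAL, PARAPHRASE, UNRELATED; columns are yes, no.
table = [[17, 3], [37, 13], [2, 48]]
statistic, dof = chi_square_independence(table)
p_value = p_value_two_dof(statistic)
print(dof, p_value < 0.001)
```

The closed-form p-value is used only because the table has exactly two degrees of freedom; a statistics library would be the usual choice for the general case.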
Augmentation for counter-narrative selection. Once we assessed the quality of augmented pairs, we focused on the possible contribution of the paraphrases also in standard information retrieval approaches that have been used as baselines in dialogue systems (Lowe et al., 2015; Mazaré et al., 2018b). We first collected a small sample of natural/real hate speech from Twitter using relevant keywords (such as "stop Islam") and manually selected those that were effectively hate speech.
We then compared 2 tf-idf response retrieval models by calculating the tf-idf matrix using the following document variants: (i) hate speech and counter-narrative response, (ii) hate speech, its 2 paraphrases, and counter-narrative response. The final response for a given sample tweet is selected by finding the highest score among the cosine similarities between the tf-idf vectors of the sample and all the documents in a model. For each of the 100 natural hate tweets, we then provided 2 answers (one per approach) selected from our English database. Annotators were then asked to evaluate the responses with respect to their relevancy/relatedness to the given tweet. Results show that introducing the augmented data as a part of the tf-idf model provides a 9% absolute increase in the percentage of the agreed 'very relevant' responses, i.e. from 18% to 27%. This difference is statistically significant with p < .01 (χ² test). This result is especially encouraging since it shows that the augmented data can be helpful in improving even a basic automatic counter-narrative selection model.
Impact of Demographics. The final experiment was designed to assess whether demographic information can have a beneficial effect on the task of counter-narrative selection/production. In this experiment, we selected a subsample of 230 pairs from our dataset written by 4 male and 4 female operators that were controlled for age (i.e. same age range). We then presented our subjects with each pair in isolation and asked them to state whether they would definitely use that particular counter-narrative for that hate speech or not. Note that, in this case, we did not ask whether the counter-narrative was relevant, but if they would use that given counter-narrative text to answer the paired hate speech. The results show that in the SAMEGENDER configuration (gender declared by the operator who wrote the message and gender declared by the annotator are the same), appreciation was expressed 47% of the time, while it decreases to 32% in the DIFFERENTGENDER configuration (gender declared by the operator who wrote the message and gender declared by the annotator are different). This difference is statistically significant with p < .001 (χ² test), indicating that even if operators were following the same guidelines and were instructed on the same possible arguments to build counter-narratives, there is still an effect of their gender on the produced text, and this effect contributes to the counter-narrative preference in a SAMEGENDER configuration.
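The tf-idf response retrieval comparison described under "Augmentation for counter-narrative selection" can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, the function names and the placeholder texts are all assumptions, since the paper does not specify these details.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def build_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency over the indexed documents."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse tf-idf vector; terms unseen at indexing time are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query: str, documents: List[str], responses: List[str]) -> Tuple[str, float]:
    """Return the response whose document is most similar to the query, with its score."""
    docs = [tokenize(d) for d in documents]
    idf = build_idf(docs)
    query_vec = tfidf(tokenize(query), idf)
    scores = [cosine(query_vec, tfidf(d, idf)) for d in docs]
    best = max(range(len(docs)), key=scores.__getitem__)
    return responses[best], scores[best]


# Placeholder entries (hypothetical, neutral stand-ins for dataset items).
entries = [
    {"hate": "alpha claim", "paraphrases": ["alpha statement", "assertion about alpha"], "cn": "reply one"},
    {"hate": "beta claim", "paraphrases": ["beta remark", "comment on beta"], "cn": "reply two"},
]
cns = [e["cn"] for e in entries]
docs_plain = [f'{e["hate"]} {e["cn"]}' for e in entries]  # variant (i)
docs_augmented = [" ".join([e["hate"], *e["paraphrases"], e["cn"]]) for e in entries]  # variant (ii)

query = "some remark"
response_i, score_i = retrieve(query, docs_plain, cns)
response_ii, score_ii = retrieve(query, docs_augmented, cns)
print(score_i, response_ii)
```

In this toy setup the query shares vocabulary only with a paraphrase, so variant (i) has no matching document (score 0), while variant (ii) retrieves the second response. This mirrors the intuition behind adding paraphrases to the index, without implying anything about the size of the effect on real data.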

Conclusion
As online hate content rises massively, responding to it with counter-narratives as a combating strategy draws the attention of international organizations. Although a fast and effective responding mechanism can benefit from an automatic generation system, the lack of large datasets of appropriate counter-narratives hinders tackling the problem through supervised approaches such as deep learning. In this paper, we described CONAN: the first large-scale, multilingual, and expert-based hate speech/counter-narrative dataset for English, French, and Italian. The dataset consists of 4078 pairs over the 3 languages. Together with the collected data we also provided several types of metadata: expert demographics, hate speech sub-topic and counter-narrative type. Finally, we expanded the dataset through translation and paraphrasing.
As future work, we intend to continue collecting more data for Islam and to include other hate targets such as migrants or LGBT+, in order to put the dataset at the service of other organizations and further research. Moreover, as a future direction, we want to use the CONAN dataset to develop a counter-narrative generation tool that can support NGOs in fighting hate speech online, considering counter-narrative type as an input feature.