Hunting for Troll Comments in News Community Forums

There are different definitions of what a troll is. Certainly, a troll can be somebody who teases people to make them angry, or somebody who offends people, or somebody who wants to dominate any single discussion, or somebody who tries to manipulate people's opinion (sometimes for money), etc. The last definition is the one that dominates the public discourse in Bulgaria and Eastern Europe, and this is our focus in this paper. In our work, we examine two types of opinion manipulation trolls: paid trolls that have been revealed from leaked reputation management contracts and mentioned trolls that have been called such by several different people. We show that these definitions are sensible: we build a classifier that can distinguish a post by such a paid troll from one by a non-troll with 81-82% accuracy; the same classifier achieves 81-82% accuracy on so-called mentioned troll vs. non-troll posts.


Introduction
The practice of using Internet trolls for opinion manipulation has been a reality since the rise of the Internet and community forums. It has been shown that user opinions about products, companies, and politics can be influenced by opinions posted by other users in online forums and social networks (Dellarocas, 2006). This makes it easy for companies and political parties to gain popularity by paying for "reputation management" to people who post fake opinions from fake profiles in discussion forums and social networks.
Opinion manipulation campaigns are often launched using "personal management software" that allows a user to open multiple accounts and to appear like several different people. Over time, some forum users developed a sensitivity to trolls and started publicly exposing them. Yet, it is hard for forum administrators to block them, as trolls take care not to formally violate the forum rules. In our work, we examine two types of opinion manipulation trolls: paid trolls that have been revealed from leaked "reputation management contracts" 1 and "mentioned trolls" that have been called such by several different people.
* This research started at Sofia University.

Related Work
Troll detection has been addressed using analysis of the semantics in posts (Cambria et al., 2010) and domain-adapting sentiment analysis (Seah et al., 2015). There are also studies on general troll behavior (Herring et al., 2002; Buckels et al., 2014).
Astroturfing and misinformation have been addressed in the context of political elections using mapping and classification of massive streams of microblogging data (Ratkiewicz et al., 2011). Fake profile detection has been studied in the context of cyber-bullying (Galán-García et al., 2014).
A related research line is on offensive language use (Xu and Zhu, 2010). This is related to cyberbullying, which has been detected using sentiment analysis, graph-based approaches over signed social networks (Ortega et al., 2012; Kumar et al., 2014), and lexico-syntactic features about users' writing style (Chen et al., 2012).
Trustworthiness of statements on the Web is another relevant research direction (Rowe and Butters, 2009). Detecting untruthful and deceptive information has been studied using both psychology and computational linguistics (Ott et al., 2011).
A related problem is Web spam detection, which has been addressed using spam keyword spotting (Dave et al., 2003), lexical affinity of arbitrary words to spam content (Hu and Liu, 2004), and frequency of punctuation and word co-occurrence (Li et al., 2006). See (Castillo and Davison, 2011) for an overview on adversarial web search.
1 The independent Bulgarian media outlet Bivol published a leaked contract that described the following services in favor of the government: "Monthly posting online of 250 comments by virtual users with varied, typical and evolving profiles from different (non-recurring) IP addresses to inform, promote, balance or counteract. The intensity of the provided online presence will be adequately distributed and will correspond to the political situation in the country." See https://bivol.bg/en/category/b-files-en/b-files-trolls-en
In our previous work, we focused on finding opinion manipulation troll users (Mihaylov et al., 2015a) and on modeling the behavior of exposed vs. paid trolls (Mihaylov et al., 2015b). Here, we go beyond user profiles and try to detect individual troll vs. non-troll comments in a news community forum based on both text and metadata.

Data
We crawled the largest community forum in Bulgaria, that of Dnevnik.bg, a daily newspaper (in Bulgarian) that requires users to be signed in to read and comment. The platform allows users to comment on news, to reply to other users' comments, and to vote on them with thumbs up/down. We crawled the Bulgaria, Europe, and World categories for the period 01-Jan-2013 to 01-Apr-2015, together with comments and user profiles: 34,514 publications on 232 topics with 13,575 tags and 1,930,818 comments (897,806 of them replies) by 14,598 users; see Table 1. We then extracted comments by paid trolls vs. mentioned trolls vs. non-trolls; see Table 2.
Paid troll comments: We collected them from the leaked reputation management documents, which included 10,150 paid troll comments: 2,000 on Facebook and 8,150 in news community forums. The latter included 650 posted in the forum of Dnevnik.bg, which we used in our experiments.
Mentioned troll comments: We further collected 1,140 comments that have been replied to with an accusation of being troll comments. We considered a comment as a potential accusation if (i) it was a reply to a comment, and (ii) it contained words such as troll or murzi(lka). Two annotators checked these comments and found 578 actual accusations. The inter-annotator agreement was substantial: Cohen's Kappa of 0.82. Moreover, a simple bag-of-words classifier could find these 578 accusations with an F1-score of 0.85.
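The candidate filter and the agreement statistic described above can be illustrated with a minimal sketch. It is not the authors' code: the comment fields (`text`, `reply_to`) and the transliterated keyword stems are assumptions made for illustration, and a real filter over the Bulgarian forum would match the Cyrillic word forms.

```python
from collections import Counter

# Hypothetical keyword stems (transliterated); the forum text is Bulgarian,
# so a real filter would match the corresponding Cyrillic forms.
ACCUSATION_STEMS = ("troll", "murzi")


def is_candidate_accusation(comment):
    """True if the comment is a reply and contains a troll-related word."""
    text = comment["text"].lower()
    is_reply = comment["reply_to"] is not None
    return is_reply and any(stem in text for stem in ACCUSATION_STEMS)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

Candidates that pass the filter would then go to the two annotators, and `cohen_kappa` would be computed over their accusation / no-accusation decisions.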

Here are some examples (translated):
Accusation: "To comment from 'Prorok Ilia': I can see that you are a red troll by the words that you are using"
Accused troll's comment: This Boyko is always in your mind! You only think of him. We like Boko the Potato (the favorite of the Lamb), the way we like the Karlies.
Paid troll's comment: in the previous protests, the entire country participated, but now we only see the paid fans of GERB. These are not true protests, but chaotic happenings.
Non-troll comments are those posted by users that have at least 100 comments in the forum and have never been accused of being trolls. We selected 650 non-troll comments for the paid trolls, and another 578 for the mentioned trolls, as follows: for each paid or mentioned troll comment, we selected a non-troll comment at random from the same thread. Thus, we have two separate non-troll sets of 650 and of 578 comments.
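A minimal sketch of this thread-matched sampling is shown below. The data layout (a `thread_id` key and a per-thread pool of eligible non-troll comments) is hypothetical, and two details are assumptions that the text does not specify: threads without an eligible non-troll comment are skipped, and the same non-troll comment may be drawn more than once.

```python
import random


def sample_matched_non_trolls(troll_comments, non_troll_pool, seed=0):
    """For each troll comment, draw one non-troll comment from the same thread.

    troll_comments: list of dicts with a "thread_id" key.
    non_troll_pool: dict mapping thread_id -> list of comments by users with
        at least 100 comments who were never accused of being trolls.
    Troll comments whose thread has no eligible non-troll comment are skipped
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    matched = []
    for troll in troll_comments:
        candidates = non_troll_pool.get(troll["thread_id"], [])
        if candidates:
            matched.append(rng.choice(candidates))
    return matched
```

Sampling from the same thread keeps the topic roughly constant between the positive and the negative example, so the classifier cannot simply learn which news stories attract trolls.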

Features
We train a classifier to distinguish troll (paid or mentioned) vs. non-troll comments using the following features:
Bag of words. We use words and their frequencies as features, after stopword filtering.
Bag of stems. We further experiment with bag of stems, where we stem the words with the BulStem stemmer (Nakov, 2003a; Nakov, 2003b).
Word n-grams. We also experiment with 2- and 3-word n-grams.
Char n-grams. We further use character n-grams, where for each word token we extract all n consecutive characters. We use n-grams of length 3 and 4 only, as other values did not help.
Word prefix. For each word token, we extract the first 3 or 4 consecutive characters.
Word suffix. For each word token, we take the last 3 or 4 consecutive characters.
Emoticons. We extract the standard HTML-based emoticons used in the forum of Dnevnik.bg.
Punctuation count. We count the number of exclamation marks, dots, and question marks, both single and elongated, the number of words, and the number of ALL CAPS words.
Metadata. We use the time of comment posting (worktime: 9:00-19:00h vs. night: 21:00-6:00h), part of the week (workdays: Mon-Fri vs. weekend: Sat-Sun), and the rank of the comment divided by the number of comments in the thread.
Word2Vec clusters. We trained word2vec on 80M words from 34,514 publications and 1,930,818 comments in our forum, obtaining 268,617 word vectors, which we grouped into 5,372 clusters using K-Means clustering, and then we use these clusters as features.
Sentiment. We use features derived from the MPQA Subjectivity Lexicon (Wilson et al., 2005), the NRC Emotion Lexicon (Mohammad and Turney, 2013), and the lexicon of Hu and Liu (2004). Originally these lexicons were built for English, but we translated them to Bulgarian using Google Translate. Then, we reused the sentiment analysis pipeline from (Velichkov et al., 2014), which we adapted for Bulgarian.
Bad words. We use the number of bad words in the comment as a feature. The words come from the Bad words list v2.0, which contains 458 bad words collected for a filter of forum or IRC channels in English. 6 We translated this list to Bulgarian using Google Translate and we removed duplicates to obtain Bad Words Bg 1. We further used the above word2vec model to find the three most similar words for each bad word in Bad Words Bg 1, and we constructed another lexicon: Bad Words Bg 3. 7 Finally, we generate two features: one for each lexicon.
6 http://urbanoalvarez.es/blog/2008/04/04/bad-words-list/
7 https://github.com/tbmihailov/gate-lang-bulgarian-gazetteers/ - GATE resources for Bulgarian, including sentiment lexicons, bad words lexicons, politicians' names, etc.
Mentions. We noted that trolls use diminutive names or humiliating nicknames when referring to politicians that they do not like, but use full or family names for people that they respect. Based on these observations, we constructed several lexicons with Bulgarian politician names, their variations and nicknames (see footnote 7), and we generated a mention count feature for each lexicon.
POS tag distribution. We also use features based on part of speech (POS). We tag using GATE (Cunningham et al., 2011) with a simplified model trained on a transformed version of the BulTreeBank-DP (Simov et al., 2002). For each POS tag type, we take the number of occurrences in the text divided by the total number of tokens. We use both fine-grained and coarse-grained POS tags, e.g., from the POS tag Npmsi, we generate three tags: Npmsi, N and Np.
Named entities. We also use the occurrence of named entities as features. For extracting named entities such as location, country, person name, date unit, etc., we use the lexicons that come with GATE's ANNIE (Cunningham et al., 2002) pipeline, which we translated to Bulgarian. In future work, we plan to use a better named entity recognizer based on CRF (Georgiev et al., 2009).
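Several of the surface features listed in this section are simple enough to illustrate directly. The following is a minimal sketch, not the authors' implementation: all function and feature names are hypothetical, and some details are assumptions, namely how single and elongated punctuation are told apart (runs of length one vs. longer runs), how the boundaries of the time intervals are treated, and that coarse-grained POS tags are the one- and two-character prefixes of the fine-grained tag, as in the Npmsi example.

```python
import re
from collections import Counter
from datetime import datetime


def char_ngrams(token, n):
    """All runs of n consecutive characters inside one word token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def affix_features(tokens, sizes=(3, 4)):
    """Counts of word prefixes and suffixes of length 3 and 4."""
    feats = Counter()
    for tok in tokens:
        for n in sizes:
            if len(tok) >= n:
                feats["prefix=" + tok[:n]] += 1
                feats["suffix=" + tok[-n:]] += 1
    return feats


def punctuation_features(text):
    """Single and elongated '!', '.', '?' runs, word count, ALL CAPS count."""
    feats = {}
    for name, ch in (("excl", "!"), ("dot", "."), ("quest", "?")):
        runs = re.findall(re.escape(ch) + "+", text)
        feats[name + "_single"] = sum(len(r) == 1 for r in runs)
        feats[name + "_elongated"] = sum(len(r) > 1 for r in runs)
    words = re.findall(r"\w+", text)  # \w also matches Cyrillic letters
    feats["n_words"] = len(words)
    feats["n_all_caps"] = sum(w.isupper() and len(w) > 1 for w in words)
    return feats


def metadata_features(posted_at, rank, thread_size):
    """Time-of-day, part-of-week, and relative position in the thread."""
    hour = posted_at.hour
    return {
        "worktime": int(9 <= hour < 19),           # 9:00-19:00h
        "night": int(hour >= 21 or hour < 6),      # 21:00-6:00h
        "weekend": int(posted_at.weekday() >= 5),  # Sat-Sun
        "relative_rank": rank / thread_size,
    }


def expand_pos_tag(tag):
    """Fine-grained tag plus its coarse one- and two-character prefixes."""
    return list(dict.fromkeys([tag, tag[:1], tag[:2]]))
```

For example, `expand_pos_tag("Npmsi")` yields the three tags from the text, and the resulting tag counts would then be divided by the number of tokens to obtain the POS distribution.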

Experiments and Evaluation
We train and evaluate an L2-regularized logistic regression with LIBLINEAR (Fan et al., 2008) as implemented in SCIKIT-LEARN (Pedregosa et al., 2011), with all features scaled to the [0, 1] interval. As we have perfectly balanced sets of 650 positive and 650 negative examples for paid trolls vs. non-trolls, and 578 positive and 578 negative examples for mentioned trolls vs. non-trolls, the baseline accuracy is 50%. Below, we report F-score and accuracy with cross-validation.
Table 3 shows the results for distinguishing comments by mentioned trolls from those by non-trolls, using all features as well as when excluding individual feature groups. We can see that excluding character n-grams, word suffixes, and word prefixes, as well as excluding bag of words with stems or stop words, yields performance gains; the most sizable gain comes from excluding char n-grams, which yields one point of improvement. Excluding bad word usage and emoticons also improves performance, but only marginally, which might be because they are covered by the bag-of-words features. Excluding any of the other features hurts performance; the two most important features to keep are metadata (as it allows us to see the time of posting) and bag of words without stopwords (which captures the vocabulary choices that distinguish mentioned trolls from regular users).
Table 4 shows the results for telling apart comments by paid trolls vs. those by non-trolls, using cross-validation and ablation with the same features as for the mentioned trolls. There are several interesting observations. First, the overall accuracy for finding paid trolls is slightly higher: 81.02 vs. 79.24 for mentioned trolls. The most helpful feature is again metadata, but this time it is less helpful (excluding it yields a drop of 5 points vs. 8 points before). The least helpful features are again character n-grams. The remaining features fall in between, and most of them yield better performance when excluded, which suggests that there is a lot of redundancy in the features.
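The evaluation protocol described in this section (scaling to [0, 1], L2-regularized logistic regression with the LIBLINEAR solver, cross-validation, and leave-one-group-out ablation) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' code: the number of folds is not given in the text, so 5 is a placeholder, the regularization strength is left at the library default, and the data are synthetic stand-ins with made-up group names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def cv_accuracy(X, y, folds=5):
    """Mean cross-validated accuracy of L2-regularized logistic regression."""
    model = make_pipeline(
        MinMaxScaler(),                          # scale each feature to [0, 1]
        LogisticRegression(solver="liblinear"),  # default penalty is L2
    )
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


def ablation(groups, y, folds=5):
    """Accuracy with all feature groups, and with each group left out in turn.

    groups: dict mapping a group name to an (n_samples, n_features) array;
        at least two groups are needed so that something remains after removal.
    """
    results = {"all": cv_accuracy(np.hstack(list(groups.values())), y, folds)}
    for left_out in groups:
        kept = [X for name, X in groups.items() if name != left_out]
        results["all - " + left_out] = cv_accuracy(np.hstack(kept), y, folds)
    return results


# Synthetic stand-in data: 200 balanced examples and two made-up groups.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 100)
groups = {
    "metadata": y[:, None] + rng.normal(0.0, 0.5, size=(200, 2)),  # informative
    "noise": rng.normal(0.0, 1.0, size=(200, 3)),                  # uninformative
}
scores = ablation(groups, y)
```

A group whose removal lowers the score relative to `scores["all"]` is helpful in combination with the others, while a group whose removal raises it is redundant or noisy, which is how the ablation tables are read above.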
Next, we look at individual feature groups. Table 5 shows the results for comments by mentioned trolls vs. those by non-trolls. We can see that the metadata features are by far the most important: using them alone outperforms the results when using all features by 3.5 points. The reason could be that most troll comments are replies to other comments, while those by non-trolls are mostly not replies. Adding other features such as sentiment-based features, bad words, POS, and punctuation hurts the performance significantly. Features such as bad words are at the very bottom: they do not apply to all comments and thus are of little use alone; the same holds for mentions and sentiment features, which are also quite weak in isolation. These results suggest that mentioned trolls are not that different from non-trolls in terms of language use, but differ mainly in how they reply to other users.
Table 6 shows a somewhat different picture for comments by paid trolls vs. those by non-trolls. The biggest difference is that metadata features are not so useful. Also, the strongest feature set is the combination of sentiment, bad words distribution, POS, metadata, and punctuation. This suggests that paid trolls are smart enough to post during the same time intervals and days of the week as non-trolls, but their comments differ slightly from those of non-trolls in sentiment and bad word use. Features based on words are also very helpful because paid trolls have to defend pre-specified key points, which limits their vocabulary use, while non-trolls are free to express themselves as they wish.

Discussion
Overall, we have seen that our classifier for telling apart comments by mentioned trolls vs. those by non-trolls performs almost equally well for paid trolls vs. non-trolls, where the non-troll comments are sampled from the same threads that the troll comments come from. Moreover, the most and the least important features in the ablation experiments are also similar. This suggests that mentioned trolls are very similar to paid trolls (except for their reply rate and their time and day of posting patterns).
However, using just mentions might be a "witch hunt": some users could have been accused of being "trolls" unfairly. One way to test this is to look not at comments but at users, and to see which users were called trolls by several different other users. Table 7 shows the results for distinguishing users with a given number of alleged troll comments from non-troll users; the classification is based on all comments by the corresponding users. We can see that finding users who have been called trolls more often is easier, which suggests that they might indeed be trolls.
Table 7: Mentioned troll vs. non-troll users (not comments!). Experiments with different minimum numbers of mentions for January 2015. "Diff" is the difference from the majority class baseline.
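The user-level selection and the "Diff" column can be made concrete with a small sketch. The input layout is hypothetical: accusations are assumed to be available as (accused user, accused comment id) pairs, and a user's mention count is taken to be the number of distinct comments of theirs that were called troll comments.

```python
from collections import defaultdict


def users_with_min_mentions(accusations, min_mentions):
    """Users with at least min_mentions distinct alleged troll comments.

    accusations: iterable of (accused_user, accused_comment_id) pairs;
        repeated accusations against the same comment count once.
    """
    accused_comments = defaultdict(set)
    for user, comment_id in accusations:
        accused_comments[user].add(comment_id)
    return {u for u, ids in accused_comments.items() if len(ids) >= min_mentions}


def diff_from_majority_baseline(accuracy, n_positive, n_negative):
    """Accuracy minus the accuracy of always predicting the larger class."""
    baseline = max(n_positive, n_negative) / (n_positive + n_negative)
    return accuracy - baseline
```

Unlike the comment-level experiments, the user-level sets need not be balanced, which is why the comparison is made against the majority class baseline rather than against 50%.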

Conclusion and Future Work
We have presented experiments in predicting whether a comment is written by a troll or not, where we define a troll as somebody who was called such by other people. We have shown that this is a useful definition and that comments by mentioned trolls are similar to those by confirmed paid trolls.