Predicting the Role of Political Trolls in Social Media

We investigate the political roles of “Internet trolls” in social media. Political trolls, such as the ones linked to the Russian Internet Research Agency (IRA), have recently gained enormous attention for their ability to sway public opinion and even influence elections. Analysis of the online traces of trolls has shown different behavioral patterns, which target different slices of the population. However, this analysis is manual and labor-intensive, thus making it impractical as a first-response tool for newly-discovered troll farms. In this paper, we show how to automate this analysis by using machine learning in a realistic setting. In particular, we show how to classify trolls according to their political role —left, news feed, right— by using features extracted from social media, i.e., Twitter, in two scenarios: (i) in a traditional supervised learning scenario, where labels for trolls are available, and (ii) in a distant supervision scenario, where labels for trolls are not available, and we rely on more-commonly-available labels for news outlets mentioned by the trolls. Technically, we leverage the community structure and the text of the messages in the online social network of trolls represented as a graph, from which we extract several types of learned representations, i.e., embeddings, for the trolls. Experiments on the “IRA Russian Troll” dataset show that our methodology improves over the state-of-the-art in the first scenario, while providing a compelling case for the second scenario, which has not been explored in the literature thus far.


Introduction
Internet "trolls" are users of an online community who quarrel and upset people, seeking to sow discord by posting inflammatory content. More recently, organized "troll farms" devoted to political opinion manipulation have also emerged.
Such farms usually consist of state-sponsored agents who control a set of pseudonymous user accounts and personas, the so-called "sockpuppets", which disseminate misinformation and propaganda in order to sway opinions, destabilize the society, and even influence elections (Linvill and Warren, 2018).
The behavior of political trolls has been analyzed in different recent circumstances, such as the 2016 US Presidential Elections and the Brexit referendum in the UK (Linvill and Warren, 2018; Llewellyn et al., 2018). However, this kind of analysis requires painstaking and time-consuming manual labor to sift through the data and to categorize the trolls according to their actions. Our goal in the current paper is to automate this process with the help of machine learning (ML). In particular, we focus on the case of the 2016 US Presidential Elections, for which a public dataset from Twitter is available. For this case, we consider only accounts that post content in English, and we wish to divide the trolls into some of the functional categories identified by Linvill and Warren (2018): left troll, right troll, and news feed.
We consider two possible scenarios. The first, prototypical ML scenario is supervised learning, where we want to learn a function from users to categories {left, right, news feed}, and the ground-truth labels for the troll users are available. This scenario has been considered previously in the literature by Kim et al. (2019). Unfortunately, a solution for such a scenario is not directly applicable to a real-world use case. Suppose a new troll farm trying to sway the upcoming European or US elections has just been discovered. While the identities of the accounts might be available, the labels to learn from would not be present. Thus, any supervised machine learning approach would fall short of being a fully automated solution to our initial problem.
A more realistic scenario assumes that labels for troll accounts are not available. In this case, we need to use some external information in order to learn a labeling function. Indeed, we leverage more persistent entities and their labels: news media. We assume a learning scenario with distant supervision, where labels for news media are available. By combining these labels with a citation graph from the troll accounts to the news media, we can infer the final labeling of the accounts themselves without any need for manual labeling.
One advantage of using distant supervision is that we can get insights about the behavior of a newly-discovered troll farm quickly and effortlessly. Unlike troll accounts in social media, which usually have a high churn rate, news media accounts in social media are quite stable. Therefore, the latter can be used as an anchor point to understand the behavior of trolls for which data may not be available.
We rely on embeddings extracted from social media. In particular, we use a combination of embeddings built on the user-to-user mention graph, the user-to-hashtag mention graph, and the text of the tweets of the troll accounts. We further explore several possible approaches using label propagation for the distant supervision scenario.
As a result of our approach, we improve the classification accuracy by more than 5 percentage points for the supervised learning scenario. The distant supervision scenario has not previously been considered in the literature, and is one of the main contributions of this paper. We show that, even with the labels hidden from the ML algorithm, we can recover 78.5% of the correct labels.
The contributions of this paper can be summarized as follows:
• We predict the political role of Internet trolls (left, news feed, right) in a realistic, unsupervised scenario, where labels for the trolls are not available, and which has not been explored in the literature before.
• We propose a novel distant supervision approach for this scenario, based on graph embeddings, BERT, and label propagation, which projects the more-commonly-available labels for news media onto the trolls who cited these media.
• We improve over the state of the art in the traditional, fully supervised setting, where training labels are available.
2 Related Work

Trolls and Opinion Manipulation
The promise of social media to democratize content creation (Kaplan and Haenlein, 2010) has been accompanied by many malicious attempts to spread misleading information over this new medium, which quickly got populated by sockpuppets (Kumar et al., 2017), Internet water army (Chen et al., 2013), astroturfers (Ratkiewicz et al., 2011), and seminar users (Darwish et al., 2017).
Several studies have shown that trust is an important factor in online relationships (Ho et al., 2012; Ku, 2012; Hsu et al., 2014; Elbeltagi and Agag, 2016; Ha et al., 2016), but building trust is a long-term process, and our understanding of it is still in its infancy (Salo and Karjaluoto, 2007). This makes it easy for politicians and companies to manipulate user opinions in forums (Dellarocas, 2006; Li et al., 2016; Zhuang et al., 2018).
Trolls. Social media have seen the proliferation of fake news and clickbait (Hardalov et al., 2016; Karadzhov et al., 2017a), aggressiveness (Moore et al., 2012), and trolling (Cole, 2015). The latter is often understood as malicious online behavior intended to disrupt interactions, to aggravate interacting partners, and to lure them into fruitless argumentation (Chen et al., 2013). Here we are interested in studying not just any trolls, but those that engage in opinion manipulation (Mihaylov et al., 2015a,b, 2018). This latter definition of troll has also become prominent in the general public discourse recently. Del Vicario et al. (2016) have also suggested that the spreading of misinformation online is fostered by the presence of polarization and echo chambers in social media (Garimella et al., 2016, 2017, 2018).
Trolling behavior is present and has been studied in all kinds of online media: online magazines (Binns, 2012), social networking sites (Cole, 2015), online computer games (Thacker and Griffiths, 2012), online encyclopedia (Shachaf and Hara, 2010), and online newspapers (Ruiz et al., 2011), among others.
Sockpuppet is a related notion, and refers to a person who assumes a false identity in an Internet community and then speaks to or about themselves while pretending to be another person. The term has also been used to refer to opinion manipulation, e.g., in Wikipedia (Solorio et al., 2014). Sockpuppets have been identified by using authorship-identification techniques and link analysis (Bu et al., 2013). It has also been shown that sockpuppets differ from ordinary users in their posting behavior, linguistic traits, and social network structure (Kumar et al., 2017).
Internet Water Army is a literal translation of the Chinese term wangluo shuijun, which is a metaphor for a large number of people who are well organized to flood the Internet with purposeful comments and articles. The Internet water army has allegedly been used in China by the government (also known as the 50 Cent Party), as well as by a number of private organizations.
Astroturfing is an effort to simulate a political grass-roots movement. It has attracted strong interest from political science, and research on it has focused on massive streams of microblogging data (Ratkiewicz et al., 2011).
Identification of malicious accounts in social media includes detecting spam accounts (Almaatouq et al., 2016; McCord and Chuah, 2011), fake accounts (Fire et al., 2014; Cresci et al., 2015), and compromised and phishing accounts (Adewole et al., 2017). Fake profile detection has also been studied in the context of cyber-bullying (Galán-García et al., 2016). A related problem is that of Web spam detection, which has been addressed as a text classification problem (Sebastiani, 2002), e.g., using spam keyword spotting (Dave et al., 2003), lexical affinity of arbitrary words to spam content (Hu and Liu, 2004), or frequency of punctuation and word co-occurrence (Li et al., 2006).
For example, Castillo et al. (2011) leverage user reputation, author writing style, and various time-based features, Canini et al. (2011) analyze the interaction of content and social network structure, and Morris et al. (2012) study how Twitter users judge truthfulness. Zubiaga et al. (2016) study how people handle rumors in social media, and found that users with higher reputation are more trusted, and thus can spread rumors more easily. Lukasik et al. (2015) use temporal patterns to detect rumors and to predict their frequency, and Zubiaga et al. (2016) focus on conversational threads. More recent work has focused on credibility and factuality in community forums (Nakov et al., 2017; Mihaylova et al., 2018, 2019; Mihaylov et al., 2018).

Understanding the Role of Political Trolls
None of the above work has focused on understanding the role of political trolls. The only closely relevant work is that of Kim et al. (2019), who predict the roles of the Russian trolls on Twitter by leveraging social theory and Actor-Network Theory approaches. They characterize trolls using the digital traces they leave behind, which they model using a time-sensitive semantic edit distance. For this purpose, they use the "IRA Russian Troll" dataset (Linvill and Warren, 2018), which we also use in our experiments. However, we take a very different approach based on graph embeddings, which we show to be superior to their method in the supervised setup. We further experiment with a new, and arguably more realistic, setup based on distant supervision, where labels are not available. To the best of our knowledge, this setup has not been explored in previous work.

Graph Embeddings
Graph embeddings are machine learning techniques that model and capture the key features of a graph automatically. They can be trained either in a supervised or in an unsupervised manner (Cai et al., 2018). The produced embeddings are latent vector representations that map each vertex v ∈ V of a graph G to a d-dimensional vector. The vectors capture the underlying structure of the graph by putting "similar" vertices close together in the vector space. By expressing our data as a graph structure, we can leverage and extract critical insights about the topology and the contextual relationships between the vertices in the graph.
In mathematical terms, graph embeddings can be expressed as a function f : V → R^d from the set of vertices V to a set of embeddings, where d is the dimensionality of the embeddings. The function f can be represented as a matrix of dimensions |V| × d. In our experiments, we train graph embeddings in an unsupervised manner using node2vec (Grover and Leskovec, 2016), which is based on random walks over the graph. Essentially, this is an application of the well-known skip-gram model (Mikolov et al., 2013) from word2vec to random walks on graphs.
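To make the random-walk idea concrete, the walk-generation step can be sketched as follows. This is a minimal, uniform walk generator in plain Python over a toy graph (names are hypothetical); real node2vec biases the walks with its return and in-out parameters p and q, and the walks are then fed to a skip-gram implementation (e.g., gensim's Word2Vec) to obtain the embeddings.

```python
import random

def random_walks(adj, num_walks=10, walk_len=8, seed=0):
    """Generate uniform random walks over an adjacency dict.

    With p = q = 1, node2vec's biased walks reduce to the uniform
    walks below (i.e., DeepWalk-style walks). Each walk is later
    treated as a "sentence" for a skip-gram model.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy mention graph; the resulting walks would be fed to a skip-gram
# model to obtain d-dimensional node embeddings.
adj = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": ["u1"]}
walks = random_walks(adj)
```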
Besides node2vec, there have been a number of competing proposals for building graph embeddings; see (Cai et al., 2018) for an extensive overview of the topic. For example, SNE (Liao et al., 2018) models both the graph structure and some node attributes. Similarly, LINE (Tang et al., 2015) represents each node as the concatenation of two embedded vectors that model first- and second-order proximity, and TriDNR (Pan et al., 2016) represents nodes by coupling several neural network models. For our experiments, we use node2vec, as we do not have access to user attributes: the users have been banned from Twitter, their accounts have been suspended, and we only have access to their tweets thanks to the "IRA Russian Trolls" dataset.

Method
Given a set of known political troll users (each user being represented as a collection of their tweets), we aim to detect their role: left, right, or news feed. Linvill and Warren (2018) describe these roles as follows: Right Trolls spread nativist and right-leaning populist messages. Such trolls support the candidacy and Presidency of Donald Trump and denigrate the Democratic Party; moreover, they often send divisive messages about mainstream and moderate Republicans.
Left Trolls send socially liberal messages and discuss gender, sexual, religious, and, especially, racial identity. Many of their tweets seem intentionally divisive, attacking mainstream Democratic politicians, particularly Hillary Clinton, while supporting Bernie Sanders prior to the elections.
News Feed Trolls overwhelmingly present themselves as US local news aggregators, linking to legitimate regional news sources and tweeting about issues of local interest.
Technically, we leverage the community structure and the text of the messages in the social network of political trolls represented as a graph, from which we learn and extract several types of vector representations, i.e., troll user embeddings. Then, armed with these representations, we tackle the following tasks:
T1: a fully supervised learning task, where we have labeled training data with example trolls and their roles;
T2: a distant supervision learning task, in which labels for the troll roles are not available at training time, and thus we use labels for news media as a proxy, from which we infer labels for the troll users.

Embeddings
We use two graph-based (user-to-hashtag and user-to-mentioned-user) and one text-based (BERT) embedding representations.

U2H
We build a bipartite, undirected User-to-Hashtag (U2H) graph, where the nodes are users and hashtags, and there is an edge (u, h) between a user node u and a hashtag node h if user u uses hashtag h in their tweets. This graph is bipartite, as there are no edges connecting two user nodes or two hashtag nodes. We run node2vec (Grover and Leskovec, 2016) on this graph, and we extract the embeddings for the users (we ignore the hashtag embeddings). We use 128 dimensions for the output embeddings. These embeddings capture how similar troll users are based on their usage of hashtags.
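For concreteness, the construction of the U2H edge list can be sketched as follows. This is a schematic reconstruction with toy usernames and tweets, not the authors' actual pipeline; node names are prefixed so the two sides of the bipartite graph cannot collide.

```python
import re

def build_u2h_edges(tweets_by_user):
    """Build the bipartite User-to-Hashtag (U2H) edge list from raw tweets.

    Users get a 'u:' prefix and hashtags an 'h:' prefix; hashtags are
    lowercased so that, e.g., #MAGA and #maga map to the same node.
    """
    edges = set()
    for user, tweets in tweets_by_user.items():
        for tweet in tweets:
            for tag in re.findall(r"#(\w+)", tweet):
                edges.add(("u:" + user, "h:" + tag.lower()))
    return sorted(edges)

# Hypothetical users and tweets, for illustration only:
edges = build_u2h_edges({
    "troll_1": ["Vote now! #MAGA #maga"],
    "troll_2": ["Storm coming to Ohio #news"],
})
# -> [('u:troll_1', 'h:maga'), ('u:troll_2', 'h:news')]
```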

U2M
We build an undirected User-to-Mentioned-User (U2M) graph, where the nodes are users, and there is an edge (u, v) between two nodes if user u mentions user v in their tweets (i.e., u has authored a tweet that contains "@v"). We run node2vec on this graph, and we extract the embeddings for the users. As we are interested only in the troll users, we ignore the embeddings of users who are only mentioned by other trolls. We use 128 dimensions for the output embeddings. The embeddings extracted from this graph capture how similar troll users are according to the targets of their discussions on the social network.

BERT
BERT offers state-of-the-art text embeddings based on the Transformer (Devlin et al., 2019). We use the pre-trained BERT-large, uncased model, which has 24 layers, 1024 hidden units, 16 attention heads, and 340M parameters, and which yields output embeddings with 1024 dimensions. Given a tweet, we generate an embedding for it by averaging the representations of the BERT tokens from the penultimate layer of the neural network. To obtain a representation for a user, we average the embeddings of all their tweets. The embeddings extracted from the text capture how similar users are according to their use of language.
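The two-level averaging (tokens within a tweet, then tweets within a user) can be sketched as follows. Plain Python lists with toy 2-dimensional vectors stand in for the penultimate-layer BERT outputs, which a real pipeline would obtain from the pre-trained model.

```python
def user_embedding(tweet_token_vectors):
    """Average token vectors within each tweet, then average the
    per-tweet vectors to obtain a single user representation.

    `tweet_token_vectors` is a list of tweets, each a list of token
    vectors (toy stand-ins for BERT's penultimate-layer outputs).
    """
    def mean(vectors):
        n = len(vectors)
        return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

    per_tweet = [mean(tokens) for tokens in tweet_token_vectors]
    return mean(per_tweet)

# Two toy "tweets" with 2-dimensional "token embeddings":
u = user_embedding([
    [[1.0, 0.0], [3.0, 0.0]],  # tweet 1 averages to [2.0, 0.0]
    [[0.0, 4.0]],              # tweet 2 averages to [0.0, 4.0]
])
# u == [1.0, 2.0]
```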
3.2 Fully Supervised Learning (T1)

Given a set of troll users for which we have labels, we use the above embeddings as a representation to train a classifier. We use an L2-regularized logistic regression (LR) classifier. Each troll user is an example, and the label for the user is available for training thanks to manual labeling. We can therefore use cross-validation to evaluate the predictive performance of the model, and thus the predictive power of the features. We experiment with two ways of combining features: embedding concatenation and model ensembling. Embedding concatenation concatenates the feature vectors from different embeddings into a longer feature vector, which we then use to train the LR model. Model ensembling instead trains a separate model with each kind of embedding, and then merges the predictions of the different models by averaging the posterior probabilities for the different classes. Henceforth, we denote embedding concatenation with the symbol ∥ and model ensembling with ⊕. For example, U2H ∥ U2M is a model trained on the concatenation of the U2H and the U2M embeddings, while U2H ⊕ BERT represents the average predictions of two models, one trained on U2H embeddings and one on BERT.
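The two combination strategies can be sketched as follows. This is a minimal illustration with toy vectors and hand-written posteriors; in the actual setup, the posteriors come from the LR classifiers described above.

```python
def concat_features(*embeddings):
    """Embedding concatenation: stack the vectors into one longer
    feature vector, to be fed to a single classifier."""
    combined = []
    for e in embeddings:
        combined.extend(e)
    return combined

def ensemble_posteriors(posteriors):
    """Model ensembling: average the class posteriors produced by
    separately trained models."""
    classes = posteriors[0].keys()
    return {c: sum(p[c] for p in posteriors) / len(posteriors)
            for c in classes}

x = concat_features([0.1, 0.2], [0.9])   # -> [0.1, 0.2, 0.9]
p = ensemble_posteriors([
    {"left": 0.6, "right": 0.4},         # model 1 (e.g., on U2H)
    {"left": 0.2, "right": 0.8},         # model 2 (e.g., on BERT)
])                                       # -> {"left": 0.4, "right": 0.6}
```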

Distant Supervision (T2)
In the distant supervision scenario, we assume that we do not have access to user labels. Given a set of troll users without labels, we use the embeddings described in Section 3.1, together with the mentions of news media by the troll users, to create proxy models. We assume that labels for the news media are readily available, as news media are stable sources of information with a low churn rate.
We propagate labels from the given media to the troll users who mention them, according to the following media-to-user mapping:

LEFT → left,  RIGHT → right,  CENTER → news feed.   (1)

This propagation can be done in different ways: (a) by training a proxy model for media and then applying it to users, or (b) by additionally using label propagation (LP) for semi-supervised learning.
Let us first describe the proxy-model propagation (a). Let M be the set of media, and U the set of users. We say that a user u ∈ U mentions a medium m ∈ M if u posts a tweet that contains a link to the website of m. We denote the set of users that mention the medium m as C_m ⊆ U.
We can therefore create a representation for a medium by aggregating the embeddings of the users that mention it. Such a representation is convenient, as it lies in the same space as the user representations. In particular, given a medium m ∈ M, we compute its representation R(m) as

R(m) = (1/|C_m|) Σ_{u ∈ C_m} R(u),   (2)

where R(u) is the representation of user u, i.e., one (or a concatenation) of the embeddings described in Section 3.1. Finally, we can train an LR model that uses R(m) as features and the label l(m) of the medium as the target. This model can be applied to predict the label of a user u by using the same type of representation R(u) and the label mapping in Equation 1.
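The aggregation in Equation 2 amounts to a simple average of user embeddings, sketched below with toy 2-dimensional vectors and hypothetical user IDs.

```python
def media_representation(user_embeddings, citers):
    """Compute R(m) as the average of the embeddings of the users C_m
    that mention medium m, so that media representations lie in the
    same vector space as user representations."""
    vectors = [user_embeddings[u] for u in citers]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 2-dimensional user embeddings and a hypothetical citer set C_m:
R = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
r_m = media_representation(R, ["u1", "u2"])   # -> [0.5, 0.5]
```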
Label Propagation (b) is a transductive, graph-based, semi-supervised machine learning algorithm that, given a small set of labeled examples, assigns labels to previously unlabeled examples. The label of each example changes depending on the labels of its neighbors in a properly-defined graph.
More formally, given a partially-labeled dataset of examples X = X_l ∪ X_u, where X_l are labeled examples with labels Y_l and X_u are unlabeled examples, and a similarity graph G(X, E), the label propagation algorithm finds the set of unknown labels Y_u such that the number of discordant pairs (u, v) ∈ E with y_u ≠ y_v is minimized, where y_z is the label assigned to example z. The algorithm works as follows: at every iteration of propagation, each unlabeled node updates its label to the most frequent one among its neighbors. LP reaches convergence when each node has the same label as the majority of its neighbors. We define two different versions of LP by creating two different versions of the similarity graph G.
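A minimal version of the majority-vote propagation can be sketched as follows, on a tiny toy graph with two labeled media nodes and two unlabeled user nodes. Tie-breaking and scheduling details vary across LP implementations; this sketch fixes the seed labels and breaks ties alphabetically.

```python
def label_propagation(adj, seed_labels, max_iter=100):
    """Majority-vote label propagation on an undirected graph.

    `adj` maps each node to its neighbors; `seed_labels` holds the
    fixed labels of the labeled nodes (the media, in our setting).
    Unlabeled nodes repeatedly adopt the most frequent label among
    their labeled neighbors until no label changes.
    """
    labels = dict(seed_labels)
    for _ in range(max_iter):
        changed = False
        for node in adj:
            if node in seed_labels:          # seed labels stay fixed
                continue
            counts = {}
            for nbr in adj[node]:
                if nbr in labels:
                    counts[labels[nbr]] = counts.get(labels[nbr], 0) + 1
            if counts:
                best = max(sorted(counts), key=counts.get)
                if labels.get(node) != best:
                    labels[node] = best
                    changed = True
        if not changed:
            break
    return labels

# Two labeled media nodes (m1, m2) and two unlabeled user nodes (u1, u2):
adj = {"m1": ["u1"], "m2": ["u2"], "u1": ["m1", "u2"], "u2": ["m2"]}
out = label_propagation(adj, {"m1": "left", "m2": "right"})
# out["u1"] == "left", out["u2"] == "right"
```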

LP1 Label Propagation using direct mention.
In the first case, the set of edges among the users U in the similarity graph G consists of the union of the 2-hop closures of the U2H and the U2M graphs. That is, for any two users u, v ∈ U, there is an edge (u, v) ∈ E in the similarity graph if u and v share a common hashtag or a common user mention. The graph therefore uses the same information that is available to the embeddings.
To this graph, which so far encompasses only the set of users U, we add connections to the set of media M: we add an edge (u, m) if u ∈ C_m. Then, we run the label propagation algorithm, which propagates the labels from the labeled nodes M to the unlabeled nodes U, using the mapping from Equation 1.

LP2 Label Propagation based on a similarity graph.
In this case, we use the same representation for the media as in the proxy model case above, as described by Equation 2. Then, we build a similarity graph among media and users based on their embeddings. For each pair x, y ∈ U ∪ M, there is an edge (x, y) in the similarity graph if sim(R(x), R(y)) ≥ τ, where sim is a similarity function between vectors, e.g., cosine similarity, and τ is a user-specified parameter that regulates the sparseness of the similarity graph.
Finally, we perform label propagation on the similarity graph defined by the embedding similarity, with the set of nodes corresponding to M starting with labels, and with the set of nodes corresponding to U starting without labels.
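The edge materialization rule for LP2 can be sketched as follows, using pure-Python cosine similarity over toy 2-dimensional embeddings (the real embeddings are the 128-dimensional node2vec vectors).

```python
import math

def similarity_edges(vectors, tau=0.55):
    """Materialize an edge (x, y) whenever cos(R(x), R(y)) >= tau;
    tau trades off graph density against edge precision."""
    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm = (math.sqrt(sum(p * p for p in a))
                * math.sqrt(sum(q * q for q in b)))
        return dot / norm

    names = sorted(vectors)
    return [(x, y)
            for i, x in enumerate(names)
            for y in names[i + 1:]
            if cosine(vectors[x], vectors[y]) >= tau]

# Toy 2-dimensional embeddings: u1 and u2 are nearly parallel, while
# m1 is orthogonal to u1, so only the edge (u1, u2) survives.
edges = similarity_edges({"u1": [1.0, 0.0], "u2": [0.9, 0.1], "m1": [0.0, 1.0]})
# -> [('u1', 'u2')]
```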

IRA Russian Troll Tweets
Our main dataset contains 2,973,371 tweets by 2,848 Twitter users, which the US House Intelligence Committee has linked to the Russian Internet Research Agency (IRA). The data was collected and published by Linvill and Warren (2018), and then made available online. The time span covers the period from February 2012 to May 2018.
The trolls belong to the following manually assigned roles: Left Troll, Right Troll, News Feed, Commercial, Fearmonger, Hashtag Gamer, Non-English, and Unknown. Kim et al. (2019) have argued that the first three categories are not only the most frequent, but also the most interesting ones. Moreover, focusing on these troll types allows us to establish a connection between troll types and the political bias of the news media they mention. Table 1 shows a summary of the troll role distribution, the total number of tweets per role, as well as examples of troll usernames and tweets.

Media Bias/Fact Check
We use data from Media Bias/Fact Check (MBFC) to label the news media sites. MBFC divides news media into the following bias categories: Extreme-Left, Left, Center-Left, Center, Center-Right, Right, and Extreme-Right. We reduce the granularity to three categories by grouping Extreme-Left and Left as LEFT, Extreme-Right and Right as RIGHT, and Center-Left, Center-Right, and Center as CENTER. Table 2 shows some basic statistics about the resulting media dataset. Similarly to the IRA dataset, the distribution is right-heavy.

Experimental Setup
For each user in the IRA dataset, we extracted all the links from their tweets, expanded them recursively if they were shortened, extracted the domain of each link, and checked whether it could be found in the MBFC dataset. By grouping these relationships by media, we constructed the sets of users C_m that mention a given medium m ∈ M.
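The domain-normalization step of this matching can be sketched as follows, using only the standard library. The URL is a generic, hypothetical example; real tweet links would first need to be un-shortened.

```python
from urllib.parse import urlparse

def media_domain(url):
    """Normalize an (already un-shortened) link to a bare domain for
    lookup in the MBFC list; a leading 'www.' is stripped."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

# Hypothetical URL, for illustration:
d = media_domain("https://www.example-news.com/2016/11/story.html")
# -> "example-news.com"
```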
The U2H graph consists of 108,410 nodes and 443,121 edges, while the U2M graph has 591,793 nodes and 832,844 edges. We ran node2vec on each graph to extract 128-dimensional vectors for each node. We used these vectors as features for the fully supervised and for the distant-supervision scenarios. For Label Propagation, we used an empirical threshold for edge materialization of τ = 0.55, to obtain a reasonably sparse similarity graph.
We used two evaluation measures: accuracy and macro-averaged F1, i.e., the average over the three classes of the per-class F1 score (the harmonic mean of precision and recall). In the supervised scenario, we performed 5-fold cross-validation. In the distant-supervision scenario, we propagated labels from the media to the users; hence, in the latter case, the user labels were used only for evaluation.
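For reference, the macro-averaged F1 used here can be computed as follows (a self-contained implementation; in practice one would typically use scikit-learn's f1_score with average='macro').

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: the per-class F1 (harmonic mean of precision
    and recall), averaged uniformly over the classes in `gold`."""
    scores = []
    for c in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)

gold = ["left", "left", "right", "news feed"]
pred = ["left", "right", "right", "news feed"]
# left: F1 = 2/3; right: F1 = 2/3; news feed: F1 = 1
f1 = macro_f1(gold, pred)   # -> 7/9, i.e., about 0.778
```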

Evaluation Results
Table 3 shows the evaluation results. Each line of the table represents a different combination of features, models, or techniques. As mentioned in Section 3, the symbol '∥' denotes a single model trained on the concatenation of the features, while the symbol '⊕' denotes an averaging of individual models trained on each feature separately. The tags 'LP1' and 'LP2' denote the two label propagation versions, by mention and by similarity, respectively.
We can see that accuracy and macro-averaged F1 are strongly correlated and yield very consistent rankings for the different models.Thus, henceforth we will focus our discussion on accuracy.
We can see in Table 3 that it is possible to predict the roles of the troll users using distant supervision with relatively high accuracy. Indeed, the results for T2 are lower than their T1 counterparts by only about 10 and 20 points absolute in terms of accuracy and F1, respectively. This is impressive considering that the models for T2 have no access to labels for the troll users.
Looking at the individual features, for both T1 and T2, the embeddings from U2M outperform those from U2H and from BERT. One possible reason is that the U2M graph is larger, and thus contains more information. It is also possible that the social circle of a troll user is more indicative than the hashtags they use. Finally, the textual content on Twitter is quite noisy, and thus the BERT embeddings perform slightly worse when used alone.
All our models with a single type of embedding easily outperform the model of Kim et al. (2019). The difference is even larger when combining the embeddings, be it by concatenating the embedding vectors or by training separate models and then combining the posteriors of their predictions.
By concatenating the U2M and the U2H embeddings (U2H ∥ U2M), we fully leverage the hashtag and the mention representations in the latent space, achieving an accuracy of 88.7 for T1 and 78.0 for T2, which is slightly better than training separate models and then averaging their posteriors (U2H ⊕ U2M): 88.3 for T1 and 77.9 for T2. Adding BERT embeddings to the combination yields further improvements and follows a similar trend, where feature concatenation works better, yielding an accuracy of 89.2 for T1 and 78.2 for T2 (compared to 89.0 for T1 and 78.0 for T2 for U2H ⊕ U2M ⊕ BERT).
Adding label propagation yields further improvements, both for LP1 and for LP2, with the latter being slightly superior: 89.6 vs. 89.3 accuracy for T1, and 78.5 vs. 78.3 for T2.
Overall, our methodology achieves sizable improvements over previous work, reaching an accuracy of 89.6 vs. 84.0 for Kim et al. (2019) in the fully supervised case. Moreover, it achieves 78.5 accuracy in the distant-supervision case, which is only 11 points behind the result for T1, and about 10 points above the majority-class baseline.

Ablation Study
We performed different experiments with the hyper-parameters of the graph embeddings. With smaller dimensionality (i.e., using 16 dimensions instead of 128), we noticed a decrease in accuracy of 2-3 points absolute across the board. Moreover, we found that using all of the data for learning the embeddings was better than focusing only on the users that we target in this study, namely left, right, and news feed: using the rest of the data adds context to the embedding space and makes the target labels more contextually distinguishable. Specifically, we observed a drop in accuracy of 5-6 points absolute when training our embeddings only on tweets by trolls labeled as left, right, and news feed.

Comparison to Full Supervision
Next, we compared our results to the work of Kim et al. (2019), who studied a fully supervised learning scenario based on Tarde's Actor-Network Theory. They paid more attention to the content of the tweets, applying a text-distance metric in order to capture the semantic distance between two sequences. In contrast, we focus on critical elements of information that are salient on Twitter: hashtags and user mentions. By building a connection between users, hashtags, and user mentions, we effectively filter out the noise and focus only on the most sensitive type of context, automatically capturing features from this network via graph embeddings.

Table 4: Leveraging user embeddings to predict the bias of the media cited by troll users.

Reverse Classification: Media from Trolls
Table 4 shows an experiment in distant supervision for reverse classification, where we trained a model on the IRA dataset with the troll labels, and then applied that model to the representations of the media in the MBFC dataset, where each medium is represented as the average of the embeddings of the users who cited it. We can see that we improve over the baseline by 20 points absolute in terms of accuracy, and by 41 points absolute in terms of macro-averaged F1. We can also see in Table 4 that the relative ordering of the different models in terms of performance is consistent with that for the experiments in the previous section. This suggests that the relationship between trolls and media goes both ways: we can use labels for media to label users, and we can use labels for troll users to label media.
Conclusion and Future Work

We have proposed a novel approach for analyzing the behavior patterns of political trolls according to their political leaning (left vs. news feed vs. right) using features from social media, i.e., from Twitter. We experimented with two scenarios: (i) supervised learning, where labels for trolls are provided, and (ii) distant supervision, where such labels are not available, and we rely on more-commonly-available labels for the news outlets cited by the trolls. Technically, we leveraged the community structure and the text of the messages in the online social network of trolls represented as a graph, from which we extracted several types of representations, i.e., embeddings, for the trolls. Our experiments on the "IRA Russian Troll" dataset have shown improvements over the state of the art in the supervised scenario, while providing a compelling case for the distant-supervision scenario, which has not been explored before.

In future work, we plan to apply our methodology to other political events, such as Brexit, as well as to other election campaigns around the world in connection with which large-scale troll campaigns have been revealed. We further plan experiments with other graph embedding methods and with other social media. Finally, the relationship between media bias and the trolls' political roles that we have highlighted in this paper is extremely interesting: we have shown how to use it to go from the media space to the user space and vice versa, but so far we have only scratched the surface in terms of understanding the process that generated these data and its possible applications.

Table 1: Statistics and examples from the IRA Russian Trolls Tweets dataset.

Table 2: Summary statistics about the Media Bias/Fact Check (MBFC) dataset.

Table 3: Predicting the role of the troll users using full vs. distant supervision.