Account Deletion Prediction on RuNet: A Case Study of Suspicious Twitter Accounts Active During the Russian-Ukrainian Crisis

Social networks change dynamically over time: some accounts are created, while others are deleted or become private. This ephemerality at both the account level and the content level results from a combination of privacy concerns, spam, and deceptive behaviors. In this study we analyze a large dataset of 180,340 accounts active during the Russian-Ukrainian crisis to discover a series of predictive features for the removal or shutdown of a suspicious account. We find that unlike previously reported profile and network features, lexical features form the basis for highly accurate prediction of the deletion of an account.


Introduction
Social media plays an important role in the lives of millions of people. One seventh of the world's population uses social media services such as Twitter and Facebook every day. There is no doubt that social media has positive effects on society by helping us to connect, communicate, access and spread information, and share our interests. Social media services have been effectively used to coordinate disaster responses (Sakaki et al., 2010), enhance emergency situational awareness (Yin et al., 2012) and coordinate crisis events (Bruno, 2011).
However, social media can also have negative effects on society. Social bots and spammers spread misinformation, 1 deceptive content, propaganda (Berger, 2015), and manipulative campaigns over social networks on a large scale and extremely fast, e.g., several thousand retweets in a few minutes (Ferrara, 2015). Early detection of suspicious accounts that may be spreading misinformation and manipulative or deceptive content is extremely important to ensure a safer and healthier environment in social media (Bamman et al., 2012b; Subrahmanian et al., 2016).
1 Syrian hackers claim AP hack that tipped stock market by $136 billion. Is it terrorism? (Fisher, 2013).
In this work we present an approach for automatically detecting deleted accounts in RuNet 2 collected during the Russian-Ukrainian crisis in 2014-2015. We focused on this data because news media reported several cases of misbehavior and deceptive content spread by suspended or allegedly deleted accounts on Twitter relevant to the crisis. 3 Unlike the existing work on social bot prediction (Ferrara et al., 2014), suspended account analysis (Thomas et al., 2011), and non-personal and spam user detection (Lin and Huang, 2013; Guo and Chen, 2014), we focus on the much harder task of automatically identifying fraudulent accounts (sometimes called trolls 4 ). Unlike social bots or spam accounts, troll profiles on Twitter and other social networks, e.g., LiveJournal and VKontakte, are created to look like real users. Trolls have follower and friend counts similar to those of legitimate users, engage in communication with other users, express opinions, etc. This is why they are very difficult to detect compared to social bots 1 or spam accounts. Recent work on bot detection 5 analyzed 20,500 Twitter accounts that tweeted similar statements around key breaking news and events. The study suggested that bots follow many other bots, have no favorites and no timezone, and never interact with other users through @replies and @mentions.
This is the first work that focuses on building predictive models and analyzing the effectiveness of different features to detect deleted accounts (including trolls 6 ) on Twitter using deeper linguistic analysis of user-generated content in Russian and Ukrainian, sentiment and emotion features, text embeddings and topics, in addition to profile, network, and behavior clues.

Dataset
To collect our data we sampled Twitter accounts that used crisis-related keywords in Russian or Ukrainian 7 from the 1% Twitter feed from Mar 2014 to Mar 2015. For example, a translated tweet with crisis-relevant keywords (underlined) is: A cache of rocket-propelled grenades was found in Kyiv which could be used for terrorist attacks.
The original dataset had 3.5 million users who used crisis-relevant keywords during this period. We then re-crawled a random sample of 1 million accounts within a couple of months (Jun 2015) of the initial data collection (Mar 2015). We discovered that 30% of previously active accounts had been deleted. We re-crawled these accounts in Dec 2015 to validate that the accounts deleted as of Mar 2015 still remained deleted as of Dec 2015. We call this portion of the data deleted accounts, D = 94,170. We then randomly sampled the same number of accounts that were still active, i.e., not deleted as of Mar 2015 and still active as of Dec 2015. We call this portion of the data non-deleted accounts, D̄ = 94,170. For each user u ∈ D ∪ D̄ we were able to access at least 20 tweets with crisis-relevant keywords as well as user profile metadata.
5 Social Network Analysis Reveals Full Scale of Kremlin's Twitter Bot Campaign (Lawrence, 2015).
6 We cannot guarantee that these accounts were spreading deceptive content. However, after manual inspection of the tweets from 100 deleted accounts we found that all 100 accounts display characteristics and behavior shared by those involved in spreading deceptive content: for example, they only post or repost tweets relevant to the crisis, and there is high ngram/string similarity among their tweets.
7 Our lexicon of crisis-related keywords has been built independently by three native speakers of Russian and Ukrainian. The final lexicon contains 53 keywords in both languages, e.g., Crimea, revolution, Donetsk, ceasefire, NATO, EU, etc.
In Table 1 we outline a comprehensive list of the features we used to build our models. We significantly expanded the list of features that have been previously used for bot detection on Twitter (Ferrara et al., 2014). In addition to previously used account and behavior features, our models rely on deeper linguistic analysis of the content (tweets) generated by users, topics and embeddings, as well as visual and affect (sentiment and emotion) features. We outline the details of how we extracted lexical and affect features below.
BoW features. Since Russian and Ukrainian are morphologically rich languages, we lemmatized words using the pymorphy2 package 8 to reduce sparsity and ensure better model generalization. We extracted bag-of-words (BoW) features from the preprocessed, lemmatized tweets, excluding all stopwords and words with a frequency of less than five. We ran our experiments varying the word ngram size (unigrams, bigrams and trigrams) for binary vs. normalized frequency-based features.
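The BoW extraction described above can be sketched roughly as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the pluggable `lemmatize` callable (where a pymorphy2-based lemmatizer would go), applying the frequency cutoff to ngrams rather than to individual words, and normalizing counts by the user's total ngram count are all assumptions.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Return all word ngrams of order 1..max_n, joined by spaces."""
    out = []
    for k in range(1, max_n + 1):
        out.extend(" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return out


def bow_features(user_tweets, lemmatize=lambda w: w, stopwords=frozenset(),
                 max_n=1, min_freq=5, binary=False):
    """Build per-user BoW features.

    user_tweets: dict mapping a user id to a list of tokenized tweets.
    lemmatize:   hypothetical hook; identity by default.
    Returns a dict mapping each user id to {ngram: value}.
    """
    per_user = {}
    corpus_counts = Counter()
    for user, tweets in user_tweets.items():
        counts = Counter()
        for tokens in tweets:
            lemmas = [lemmatize(t.lower()) for t in tokens]
            lemmas = [w for w in lemmas if w not in stopwords]
            counts.update(ngrams(lemmas, max_n))
        per_user[user] = counts
        corpus_counts.update(counts)

    # Assumption: drop ngrams seen fewer than min_freq times in the corpus.
    vocab = {g for g, c in corpus_counts.items() if c >= min_freq}

    features = {}
    for user, counts in per_user.items():
        kept = {g: c for g, c in counts.items() if g in vocab}
        total = sum(kept.values())
        if binary:
            features[user] = {g: 1.0 for g in kept}
        else:
            # Assumption: "normalized frequency" means relative frequency.
            features[user] = {g: c / total for g, c in kept.items()} if total else {}
    return features
```

Calling `bow_features(data, max_n=3, binary=True)` would give the binary trigram variant, while the default call gives normalized unigram frequencies.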
LSA features. We performed linear dimensionality reduction on the normalized frequency-based BoW feature vectors described above using Latent Semantic Analysis (Dumais, 2004), implemented as truncated Singular Value Decomposition (SVD) in scikit-learn. 9 Similarly, we performed linear dimensionality reduction on feature vectors extracted from hashtags and mentions. We varied the number of dimensions c ∈ {50, 100, 500} to get the best F1 and report the results for c = 100.
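To make the LSA step concrete, here is a small sketch of truncated SVD applied to a user-by-term matrix. It is not the scikit-learn implementation mentioned above: it assumes NumPy is available, computes a full SVD and keeps the top components, which is only practical for small dense matrices. The function name `lsa_reduce` is hypothetical.

```python
import numpy as np


def lsa_reduce(X, n_components):
    """Project the rows of X onto the top singular directions.

    X: (n_users, n_terms) array of BoW feature values.
    Returns an (n_users, k) array with k = min(n_components, rank bound).
    """
    X = np.asarray(X, dtype=float)
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, len(s))
    # U[:, :k] * s[:k] equals X @ Vt[:k].T, the reduced representation.
    return U[:, :k] * s[:k]
```

With the setting reported in the text, the call would be `lsa_reduce(X, 100)`; for a large sparse matrix, a sparse truncated solver would be the more realistic choice.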
Embeddings. We learned word embeddings for Russian using Word2Vec's skip-gram and CBOW models (Mikolov et al., 2013) as implemented in the gensim package 11 with a layer size of 50. The embeddings are learned on the same corpus of 1 million tweets as the LDA topics. After learning the embeddings, we measure the cosine similarity between pairs of word embeddings and assign words to clusters using spectral clustering over the resulting word-word similarity matrix.
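The word-word similarity matrix that feeds the clustering step can be illustrated with a short sketch. It covers only the cosine-similarity computation; the embeddings are assumed to be given as a plain dict of vectors, the function names are hypothetical, and the clustering itself (for example a spectral clustering routine that accepts a precomputed affinity matrix) is left out. Since such routines typically expect non-negative affinities, negative cosine values may need rescaling, which is also an assumption here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Build a word-word cosine similarity matrix.

    embeddings: dict mapping a word to its embedding vector.
    Returns (words, S), where S[i][j] is the similarity of words[i] and words[j].
    """
    words = sorted(embeddings)
    S = [[cosine(embeddings[a], embeddings[b]) for b in words] for a in words]
    return words, S
```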
Affect features. Finally, to extract sentiment features we predict a polarity score for every tweet of each user using a state-of-the-art sentiment classification system for Russian. Polarity scores range from -2 (negative) to +2 (positive), with 0 being neutral. We calculate the mean polarity score and the proportions of positive, negative and neutral tweets for every user (Dickerson et al., 2014).
To extract emotion features, we predict one of Ekman's six emotions (sadness, joy, fear, disgust, surprise and anger) for each tweet using an approach recently developed by Mohammad and Kiritchenko (2015) and Volkova and Bachrach (2015). As with the sentiment features, we use the six emotion proportions per user as features.
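The per-user aggregation of affect features can be sketched as follows, assuming the per-tweet polarity scores and emotion labels have already been predicted by external classifiers. The function names and the rule for bucketing a score (positive above 0, negative below 0, neutral at exactly 0) are assumptions, since the text does not state the thresholds.

```python
from collections import Counter

EMOTIONS = ("sadness", "joy", "fear", "disgust", "surprise", "anger")


def sentiment_features(polarities):
    """Aggregate per-tweet polarity scores (in [-2, 2]) for one user."""
    n = len(polarities)
    if n == 0:
        raise ValueError("user has no tweets")
    # Assumption: sign of the score decides the polarity bucket.
    pos = sum(1 for p in polarities if p > 0)
    neg = sum(1 for p in polarities if p < 0)
    return {
        "mean_polarity": sum(polarities) / n,
        "prop_positive": pos / n,
        "prop_negative": neg / n,
        "prop_neutral": (n - pos - neg) / n,
    }


def emotion_features(labels):
    """Proportion of a user's tweets assigned to each of the six emotions."""
    n = len(labels)
    if n == 0:
        raise ValueError("user has no tweets")
    counts = Counter(labels)
    return {"prop_" + e: counts[e] / n for e in EMOTIONS}
```

For a user with polarity scores `[2, -1, 0, 1]`, the mean polarity is 0.5 and the positive, negative and neutral proportions are 0.5, 0.25 and 0.25.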

Classification Results
In Table 3 we present account deletion classification results using individual feature types. We report our results using 10-fold cross validation on a balanced set of 188,340 deleted and non-deleted accounts.

Table 1:
Profile (account and behavior) features, |f_prof| = 12: days since account creation, number of followers, number of friends, number of favorites, number of tweets, friend-to-follow ratio, name length in chars, bio in chars, screen name length in chars, screen name length in words, bio length words, avg. number of tweets per hour.
Visual features, |f_vis| = 658: bag-of-words (BoW) on profile background color, profile link color, text color, sidebar color, background tile, sidebar border color, default profile image.
Syntactic features, |f_syn| = 14: aver. tweet length in words, aver. tweet length in chars, retweet rate: prop. of RTs to tweets, uppercase word rate, elongated word rate, repeated mixed punctuation rate, prop. of tweets with links, tweets that are retweets (RTs), prop. of tweets with mentions, hashtags, punctuation, emoticons, mention, hashtag, url rate per word.
We found that lexical features are the most predictive, yielding F1 as high as 0.87. Interestingly, we found that frequency-based features outperform binary features. This means that for account deletion prediction, what matters is not only what users say but also how often they say it. We also found that higher-order ngrams only slightly outperform unigram features. When the dimensionality of the feature space is reduced from 110K to 1,000 (Embeddings), 1,000 (LDA), and 100 (LSA), classification results drop by 0.11, 0.06 and 0.03, respectively. Syntactic features extracted using shallow linguistic analysis yield a lower F1 than lexical features but, at 0.81, a higher F1 than the remaining non-lexical features.
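The evaluation protocol (10-fold cross validation on a balanced set, scored with F1) can be sketched in a few lines. This is an illustration under assumptions, not the authors' setup: the folds are stratified by class, F1 is computed for the deleted class (label 1) and averaged over folds, and `train_and_predict` is a hypothetical callable standing in for whatever classifier is used.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(X, y, train_and_predict, k=10, seed=0):
    """Mean F1 over k folds.

    train_and_predict(X_train, y_train, X_test) must return predicted labels.
    """
    scores = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        preds = train_and_predict([X[i] for i in train],
                                  [y[i] for i in train],
                                  [X[i] for i in held_out])
        scores.append(f1_score([y[i] for i in held_out], preds))
    return sum(scores) / len(scores)
```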

Feature Analysis
To show that the differences between deleted and non-deleted accounts are statistically significant, we performed a Mann-Whitney U-test on account, affect and syntactic features (Mann and Whitney, 1947). We found all differences to be significant (p-value ≤ 0.001). We outline our key findings below.
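For readers unfamiliar with the test, the following sketch computes the Mann-Whitney U statistic for one feature across the two groups, with a two-sided p-value from the normal approximation. It is a simplified illustration: ties receive average ranks and a tie correction is applied to the variance, but no continuity correction is used, and the function name is hypothetical. A statistics library would be the usual choice for real analyses.

```python
import math


def mann_whitney_u(a, b):
    """Return (U statistic for sample a, two-sided p-value).

    The p-value uses the normal approximation with a tie correction and
    without a continuity correction, so it is only reasonable for samples
    that are not tiny.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values with ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0.0:
        return u_a, 1.0  # all values tied: no evidence of a difference
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_value
```

In the setting of this section, `a` and `b` would hold the values of a single feature (for example, the number of followers) for deleted and non-deleted accounts, respectively.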
Profile differences. Deleted accounts have fewer followers than non-deleted accounts, but they have more friends. They have fewer favorites and fewer tweets than non-deleted accounts, and a significantly lower friend-to-follower ratio. Deleted accounts have significantly shorter bios, but longer user names.

Syntactic differences. Deleted accounts generate shorter tweets and use fewer elongated words, capitalized words and repeated punctuation. They have lower hashtag, mention and url per word ratios. They produce significantly fewer retweets, tweets with hashtags, urls and mentions, and tweets with punctuation and emoticons than non-deleted accounts.

Sentiment and emotion differences. Deleted accounts produce fewer positive tweets and more negative and neutral tweets compared to non-deleted accounts. Deleted accounts express less anger, but significantly more sadness and fear in their tweets. Both account types produce comparable amounts of joy, disgust and surprise emotions. We present examples of the most discriminative ngram, mention, hashtag and topic features learned by our models in Table 2.

Conclusion
We presented the first work on suspicious account deletion prediction in RuNet. We analyzed the predictive power of a variety of previously unexplored features including lexical features, topics, hashtags, mentions, sentiments and emotions, in addition to the existing profile and behavior features. We found that deleted and non-deleted accounts on Twitter not only have different profiles, but also exhibit significant differences in the topics, hashtags and lexical terms they mention, the ways they generate tweets (syntactic differences), and the sentiments and emotions they express. All of these differences allow building highly accurate models for detecting suspicious accounts in social media.