Calls to Action on Social Media: Detection, Social Impact, and Censorship Potential

Calls to action on social media are known to be an effective means of mobilization in social movements, and a frequent target of censorship. We investigate the possibility of their automatic detection and their potential for predicting real-world protest events, using historical data from the Bolotnaya protests in Russia (2011-2013). We find that political calls to action can be annotated and detected with relatively high accuracy, and that in our sample their volume has a moderate positive correlation with rally attendance.


Introduction
Calls to action (CTAs) are known to be an effective means of mobilization in social networks (P.D. Guidry et al., 2014; Savage et al., 2016), and they are also known to be a target for censorship by authoritarian states (King et al., 2013, 2014). However, to the best of our knowledge, they have not been systematically evaluated for their potential for automatic detection and for predicting offline protest events. We contribute a case study of political CTAs in historical data on the Bolotnaya protests in Russia (2011-2013). We identify 14 core and borderline types of political CTAs, and we show that they are relatively easy both to annotate (with IAA of 0.78) and to classify (F1 of 0.77, even with a small amount of annotated data). All of this puts them at high risk of censorship, but also opens the possibility of tracking such censorship. We also find that in the Bolotnaya data, the volume of CTAs on social media has a moderate positive correlation with actual rally attendance.

Social movements differ in their goals (reform or preservation of the status quo), the size of the group they target, their methods, and other factors (Snow et al., 2004), but their success always ultimately depends on successful mobilization of new participants. The role of social media in that mobilization has been clear since the Arab Spring (Dewey et al., 2012). Social media fundamentally changed social movements, enabling new formats of protest, a new model of power, and greater activity outside of formal social organizations (Earl et al., 2015). Expert judgement is famously unreliable for predicting political events (Tetlock, 2017). So, if social media play such an important role in social movements, can they also be used to track and perhaps predict real-world events? By now, hundreds of studies have explored various kinds of forecasting based on social media (Phillips et al., 2017; Agarwal and Sureka, 2015), from economic factors to civil unrest.
Most of them show that their techniques do have predictive merit, although some skepticism is warranted (Gayo-Avello et al., 2013).
Most of the civil unrest prediction work is done on Twitter and news, sometimes in combination with other sources such as blogs and various economic indicators (Ramakrishnan et al., 2014; Manrique et al., 2013). The basic instrument of analysis in most of these studies is a time series of social media activity on a given topic (Hua et al., 2013). Data filtering is typically performed via protest-related keywords, hashtags, geolocation, or known activist accounts. Many studies also rely on some combination of spatiotemporal features (e.g. Ertugrul et al. (2019); Zhao et al. (2015)). The texts of posts can be mined for structured event-related information, or dense meaning representations can be used without identifying specific features, such as doc2vec representations of news articles and social media streams (Ning et al., 2016). Additionally, social network structure (Renaud et al., 2019) and activity cascades (Cadena et al., 2015) were also found useful, as was mining and incorporating demographic information (Compton et al., 2014).
The typically used features extracted from social media text include time, date, and place mentions, sentiment polarity of the post, and the presence of violent words (Benkhelifa et al., 2014; Bahrami et al., 2018). Another popular approach relies on manually created lexicons of protest-related vocabulary (such as "police", "molotov", "corruption", etc.) combined with event-specific names of politicians, activists, etc. (Spangler and Smith, 2019; Mishler et al., 2017). Korolov et al. (2016) identify possible stages of mobilization in a social movement (sympathy, awareness, motivation, ability to take part). To the best of our knowledge, CTAs have not been systematically investigated for their predictive potential.

Censorship in social media
Similarly to the systems used to predict offline events, many current censorship systems seem to rely on keywords (MacKinnon, 2009;Verkamp and Gupta, 2012;Chaabane et al., 2014;Zhu et al., 2013). However, it is highly likely that states engaging in suppression of collective action are researching more sophisticated options, and it is therefore imperative that censorship monitors also have better tools to monitor what gets deleted.
Much of the research on Internet censorship focuses on China, where there does not seem to be a single policy enforced everywhere: local organizations and companies show significant variation in their implementations (Knockel et al., 2017; Miller, 2018; Knockel, 2018). This depends not only on the goals of the platform and its ties to or dependence on the government, but also on market forces: a competing platform that found a way to censor less would be more attractive to users (Ling, 2010). The actual process also varies based on the available resources: larger companies likely have significant censorship staff (Li and Rajagopalan, 2013), while others might rely only on simple keyword filtering. Finally, even at the government level not all criticism is disallowed: a significant degree of freedom seems to be allowed with respect to local social movements that are unlikely to become a threat to the regime (Qin et al., 2017).
Calls to action seem to be an obvious candidate for the type of verbal message most strongly associated with social movements, and they are known to be an effective means of mobilization in social networks (P.D. Guidry et al., 2014; Savage et al., 2016). In particular, King et al. (2013, 2014) report that censors sometimes let through materials that are simply critical of the government, but flag materials with collective action potential (such as calls to attend a rally or support the opposition). The effort to shut down collective action is clear, for example, from the fact that Instagram was simply shut down for 4 days while photos of Hong Kong protests were trending (Ma, 2016).
To the best of our knowledge, the censorship potential of CTAs has also not been specifically addressed in the context of political protests.

Case study: Bolotnaya protests, Russia
Our case study is the 2011-2013 Russian protests, of which the best known is the "March of the Millions" on May 6, 2012 on Bolotnaya Square in Moscow. The movement was widespread, with protests in many smaller Russian cities and towns. The protesters were opposing fraudulent elections and government corruption. This was the largest protest movement in Russia since the 1990s.
The experiments discussed below rely on the "Bolotnaya" dataset that contains posts, likes, and groups of users from VKontakte, the largest Russian social network. The main statistics for the dataset are shown in Table 1. It was created by the New Media Center (Moscow, Russia) in 2014 on the basis of a list of 476 protest groups compiled by Yandex (the largest Russian search engine). The data is used under an agreement with the New Media Center.

Enikolopov et al. (2018) report that the number of VK users in different locations was in itself associated with higher protest activity, and that locations where the user base was fractured between VKontakte and Facebook had fewer protests, which overall suggests that the main role of social media was the ease of coordination (rather than the actual spreading of information critical of the government). This is consistent with the reported role of Facebook in Egypt's Tahrir Square protests (Tufekci and Wilson, 2012). If these conclusions are correct, then a higher volume of CTAs should in itself also be a factor in higher protest attendance.

Defining Calls to Action
Prototypical CTAs are imperatives prompting the addressee to perform some action, such as "Don't let the government tell you what to think!". This seems like a straightforward category to annotate, but in reality CTAs may be expressed in various ways, including both direct and indirect speech acts. There are many borderline cases that would, in the absence of clear guidelines, decrease inter-annotator agreement (IAA). There is relevant work on the task of identification of requests in emails (Lampert et al., 2010) and intention classification for dialogue agents (Quinn and Zaiane, 2014), but, to the best of our knowledge, this work is the first to create a detailed schema for CTA annotation in the context of a political protest.
The current work on censorship is concerned not so much with CTAs in particular, but with a broader category of "material with collective action potential". King et al. (2013) defines such materials as those that '(a) involve protest or organized crowd formation outside the Internet; (b) related to individuals who have organized or incited collective action on the ground in the past; or (c) relate to nationalism or nationalist sentiment that have incited protest or collective action in the past.' In other words, this definition only concerns offline events, and does not include various forms of "crowd protesting" such as calls to share information critical of the government.
Based on extensive manual analysis of samples from the Bolotnaya dataset, we identified 5 core and 9 borderline cases for political CTAs, shown in Figure 2. Since we were interested in CTAs for social movements, we excluded any other CTAs that would formally fit the criteria, such as invitations, marketing CTAs, etc. We also excluded other protest-related posts, such as reports of protest events. Of the core and borderline CTA cases, we chose to consider 8 as CTAs.
This choice does not have a firm theoretical underpinning and would vary depending on the researcher's perspective and the case study. For example, in our Bolotnaya data we opted not to include broad rhetorical questions like "For how much longer shall we put up with this?", but in a different context (especially in a different culture) they could be key. Inter-annotator agreement depends on how explicitly the guidelines describe the chosen policy.

Annotation study
Pilot data analysis made it clear that the CTA and non-CTA classes were not balanced. Since CTAs overall constitute a small portion of all posts, we pre-selected the data for annotation using a manually created seed list of 155 protest-related keywords and phrases, such as "participate", "share", "join", "fair elections", etc.
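As a rough sketch, the pre-selection step can be implemented as a simple case-insensitive substring filter. The keyword list below is illustrative only; the actual study used a manually curated seed list of 155 protest-related Russian keywords and phrases.

```python
# Illustrative stand-in for the 155-entry seed list used in the study.
SEED_KEYWORDS = ["participate", "share", "join", "fair elections"]

def preselect(posts, keywords=SEED_KEYWORDS):
    """Keep only posts mentioning at least one seed keyword (case-insensitive)."""
    selected = []
    for post in posts:
        text = post.lower()
        if any(kw in text for kw in keywords):
            selected.append(post)
    return selected
```

Such a filter trades recall for annotation efficiency: posts without any seed keyword are never shown to annotators, so the seed list has to be broad enough to cover the phrasing typical of the movement.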
We used our schema to develop detailed annotation guidelines for an annotation study on 1000 VKontakte posts from the Russian Bolotnaya data. The annotation was performed at the level of the full post, not individual sentences. We considered a post a CTA if it included even one instance of a political CTA as defined above. Ambiguous cases were treated as political CTAs as long as they could function as such: for example, "Join us tomorrow!" could refer to both a protest and a birthday party.
Each post was annotated by 3 native Russian speakers, using the classification interface of the Prodigy annotation tool. The inter-annotator agreement as estimated by Krippendorff's alpha was 0.78. In the end, we obtained 871 posts on which at least 2 annotators agreed; 300 of them were identified as CTAs, and 571 as non-CTAs. This was used as the training dataset for the work described in subsequent sections.
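For reference, Krippendorff's alpha for nominal labels (the agreement statistic we report) can be computed directly from (annotator, item, label) triples. This is a minimal sketch; the function name and data layout here are ours.

```python
from collections import Counter, defaultdict

def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha for nominal labels.
    `ratings` is a list of (annotator, item, label) triples."""
    by_item = defaultdict(Counter)
    for _annotator, item, label in ratings:
        by_item[item][label] += 1
    # Only items rated by at least 2 annotators contribute.
    units = [c for c in by_item.values() if sum(c.values()) >= 2]
    n = sum(sum(c.values()) for c in units)  # total pairable ratings
    totals = Counter()
    for c in units:
        totals.update(c)
    # Observed disagreement: disagreeing rating pairs within each item.
    d_o = sum(
        counts[label] * (m - counts[label]) / (m - 1)
        for counts in units
        for m in (sum(counts.values()),)
        for label in counts
    ) / n
    # Expected disagreement from the pooled label distribution.
    d_e = sum(totals[lab] * (n - totals[lab]) for lab in totals) / (n * (n - 1))
    return 1.0 - d_o / d_e
```

Alpha of 1 indicates perfect agreement; values around 0 indicate agreement at chance level.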

Classification
In our experiments, we randomly split the collected CTA dataset into train and test parts in an 80/20% ratio. We selected Logistic Regression (LR) and a Support Vector Machine classifier with a linear kernel (SVC) as our baseline models. Both models were used as implemented in the scikit-learn library. In both cases we used TF-IDF representations of both the original posts and posts lemmatized with the pymorphy library (Korobov, 2015). We picked the best regularization hyperparameters for each model through cross-validation based on the average F1 score over 5 folds.

Examples for each CTA type listed in Figure 2:
1. Everybody, join us tomorrow in Sakharov square!
2. If you love Russia, if you love your home city of Smolensk, start the fight with the crooks and thieves!
3. Do not form a line or arrange to meet in a specific place.
4. Invite foreign press and TV: let them see what is going on in our capital!
5. Observers in Kaluga, please respond!
6. That's ok, we will tell them what we think of them even in the square in front of the Central market!
7. I suggest we put on white stripes on our arms as a symbol of honest elections. That's easy to do!
8. On the 10th of March we should come in large numbers!
9. You can download the leaflet with the invitation here.
10. This is the beginning! We will start activities when we have 50 members. We repeat, participation in this group can only be active.
11. I do like the idea of the government's resignation, but I think your slogans are too emotional. Furthermore, I'm against calling an early election.
12. Out with you, McFaul! And take Putin and Medvedev with you, together with Nemtsov and Chirikova!
13. Is THAT really our choice? (rhetorical) Today at 10 pm Vlad and I are going to post the leaflets around the city. Who wants to help us? (factual)
14. Together we will get rid of Putin's lies and dictatorship!
15. Everybody, come to my birthday party on Saturday!
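A minimal sketch of the linear baseline, assuming scikit-learn's standard pipeline API; the hyperparameter grid below is illustrative, as the paper does not list the exact values searched.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

def build_cta_classifier():
    """TF-IDF + logistic regression baseline; the regularization strength C
    is picked by cross-validation on average F1 over 5 folds, as in the paper."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    return GridSearchCV(
        pipeline,
        param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
        scoring="f1",
        cv=5,
    )
```

The SVC baseline follows the same pattern with `sklearn.svm.LinearSVC` in place of logistic regression; for the lemmatized variant, the posts would be lemmatized before vectorization.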
The current state-of-the-art deep learning approaches rely on large Transformer-based models pre-trained on large text corpora and then fine-tuned for a given task. In particular, we tried two versions of BERT (Devlin et al., 2019): the multilingual model released in the PyTorch repository of BERT, and the Russian version (RuBERT) released by DeepPavlov. The latter model is initialized as multilingual BERT and further fine-tuned on Russian Wikipedia and news corpora (Kuratov and Arkhipov, 2019). Both models have 12 layers and 180M parameters. We trained both models for 40 epochs with a batch size of 32 and a learning rate of 5e-5.
In addition to BERT, we experimented with the contextual embedder of the ELMo model (Peters et al., 2018) pre-trained for Russian and released by DeepPavlov. The posts were split into sentences using the NLTK library, and each sentence token was encoded by the ELMo embedder into a 1024-dimensional vector. The classification was performed by a standard LSTM network (Hochreiter and Schmidhuber, 1997) with a hidden size of 256 units, followed by a linear layer. We trained the network for 25 epochs with a learning rate of 0.001.
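The LSTM classifier over pre-computed ELMo vectors can be sketched in PyTorch as follows; the class name is ours, and a random tensor stands in for the actual 1024-dimensional ELMo token embeddings.

```python
import torch
import torch.nn as nn

class SentenceLSTMClassifier(nn.Module):
    """LSTM over per-token ELMo vectors (1024-d), hidden size 256,
    followed by a linear layer producing CTA / non-CTA logits."""

    def __init__(self, embed_dim=1024, hidden=256, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, x):           # x: (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)  # h_n: (1, batch, hidden)
        return self.out(h_n[-1])    # logits: (batch, num_classes)
```

Training would use a standard cross-entropy loss over the two classes, with sentences padded to a common length within each batch.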
The results of all the classification experiments are shown in Table 2. The best performance was achieved by RuBERT, with the LSTM over ELMo a close second. The effect of lemmatization with the linear classifiers is inconsistent. Interestingly, simple logistic regression over lemmatized TF-IDF representations of the posts is only 4 F1 points below ELMo, which suggests that the overall classification task is not very difficult.

CTAs for Predicting social unrest
To estimate the potential usefulness of CTAs as indicators of offline protest events, we ran the trained RuBERT CTA classifier over 91K posts from the Bolotnaya dataset falling in the date range from Dec 2011 through Jul 2013. Figure 3 shows the volume of posts identified as CTAs, plotted against Wikipedia data on the attendance of individual rallies. When no attendance data is available, we assume that there were no protest events. The two green lines correspond to the upper and lower attendance estimates; the blue line shows the detected CTAs.
Despite the noisiness and incompleteness of the available protest data (see subsection 9.1), the Pearson correlation between attendance estimates and the number of detected CTAs is about 0.4, which is conventionally considered "moderate". This could make CTAs a useful additional signal for systems based on spatiotemporal, demographic, and/or network activity features.

We also conducted experiments to estimate the real-world effect of likes and reposts of CTA posts. Intuitively, one would expect that a higher number of likes and reposts of CTA posts should result in higher attendance at protest rallies. To see whether that was the case for the Bolotnaya data, we calculated the number of shares and likes on posts detected as CTAs by our classifier, and on all other posts in the sample. Figure 4 shows these numbers plotted against the attendance of the protest events.
The pattern we actually observed in the Bolotnaya data is different: before the March of the Millions, the average number of both reposts and likes spikes before a protest event and drops after it, corresponding to the preparation for and aftermath of a major event. Interestingly, after the March of the Millions there was considerable like/repost activity that did not result in any larger events. This can be attributed to the introduction of the anti-protest laws that effectively stifled the movement: the link between social media and real-world activity clearly becomes weaker.
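The correlation analysis above amounts to computing Pearson's r over two aligned time series, e.g. per-period CTA counts versus attendance estimates. A minimal pure-Python sketch (the series passed in would come from the classifier output and the attendance records):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric series,
    e.g. per-period CTA counts vs. estimated rally attendance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values near 1 or -1 indicate strong linear association; the ~0.4 reported above falls in the conventional "moderate" band.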

Censoring CTAs
Our annotated dataset is quite small (only 871 posts), and this is on purpose: our point is that even with such a small (and unbalanced) dataset it is already possible to obtain a reasonably good classifier (and its performance would likely improve with more data). This is an additional factor in censorship potential: if a system for detecting CTAs can be built quickly and cheaply, it is highly likely that such systems are already being developed by the well-funded research teams employed by authoritarian states. Our study should at least level the playing field for censorship monitors, as will be discussed below. The guidelines we developed will be made available to researchers on request.
The data specific to our Bolotnaya case study will not be openly released because, 8 years after the events, the issues that were driving them continue to be key factors in the activities of the Russian opposition movements. In particular, Russia has just experienced a new wave of protests estimated to be the largest since 2012 (Wilpert, 2019), also driven by the issues of corruption and fair elections, and resulting in hundreds of arrests (BBC, 2019a,b). Many of the key political figures on both sides are also the same. All this makes our Bolotnaya data potentially useful for censoring new protests.
The situation has actually become worse for protesters: since 2011, a range of new laws restricting activity on social media have come into force. Social network users and popular bloggers are personally identifiable (via their phone numbers), VPNs are illegal, and social network operators are obliged to store activity data for 6 months and decrypt them for the authorities (House, 2018; Wesolowsky, 2019). Activists can be imprisoned for sharing "inauthentic and illegal information of social importance", a broad formulation that is interpreted freely by the authorities (Schreck, 2017).

Web monitoring potential
As discussed above, materials with collective action potential are already undergoing active censorship in authoritarian states, and it is highly likely that classifiers similar to ours are actually already in place. We hope that our study would somewhat level the playing field for those who combat the censorship.
In particular, if authoritarian states are able to detect CTAs for censorship, it is equally possible to use CTA classifiers in monitoring systems that scan the web for removed content and report on the ongoing censorship. At present, monitoring efforts rely on manual and keyword-based analysis (MacKinnon, 2009; Verkamp and Gupta, 2012; Chaabane et al., 2014; Zhu et al., 2013). Note that if data on what is being censored were continually collected, the censors would actually "help" the monitoring efforts: by flagging and removing content they would essentially be providing free annotation.
One more use of CTA classifiers would be to help the protesters to find new creative ways of expressing their views that would pass the automatic filters, such as the Chinese egao phenomenon (Horsburgh, 2014;Yates and Hasmath, 2017). Providing an independent web-service with which the activists could check how easy their message is to flag would arguably boost such creativity and provide the activists with their own weapons in the linguistic race against the authoritarian states.

Limitations
The present study is limited in several ways. First, the small size of the annotated data only provides a lower bound on the performance of CTA classifiers, which would likely increase with more annotated data. However, our point was not to achieve the best possible performance, but to show that automatic detection of CTAs is possible even with relatively little data.
The second limitation comes from the lack of reliable attendance data for the Bolotnaya protests, a situation pervasive in authoritarian states with tight control over media and civic organizations. For example, the official police report for the Kaluga square event stated 8,000 people, while opposition politicians reported 100-120,000. According to bloggers, there were 30,000 people, and a Russian parliamentarian estimated 50-60,000. However, this limitation would impact any prediction method, and it arises precisely in the situations in which the most important protest activity is happening.
Last but not least, the whole field of forecasting with social media data suffers from a lack of common best practices, which is aggravated by the impossibility of replicating most results due to data sharing concerns (Phillips et al., 2017). This study is no exception: the Bolotnaya data was used under an agreement with the New Media Center, and we cannot release it publicly. Without major changes in the accessibility of social network data for researchers, the only way forward for the field seems to be partial validation through similar patterns uncovered in other case studies.

Conclusion
Calls to action are a vital part of mobilization effort in social movements, but, to the best of our knowledge, their potential for censorship and predicting offline protest events has not yet been evaluated.
We examine political calls to action in a case study on historical data from the Bolotnaya protests in Russia (2011-2013). We identify 14 core and borderline types of political CTAs, and we show that they are relatively easy to annotate (with IAA of 0.78) and detect automatically (F1 of 0.77, even with a small amount of annotated data), which puts them at high risk of censorship in authoritarian states. We also find that in the Bolotnaya data, the volume of CTAs on social media has a moderate positive correlation with actual rally attendance.