#SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns

We consider the task of automatically identifying participants’ motivations in the public health campaign Movember and investigate the impact of the different motivations on the amount of campaign donations raised. Our classification scheme is based on the Social Identity Model of Collective Action (van Zomeren et al., 2008). We find that automatic classification based on Movember profiles is fairly accurate, while automatic classification based on tweets is challenging. Using our classifier, we find a strong relation between types of motivations and donations. Our study is a first step towards scaling-up collective action research methods.


Introduction
Social media is a valuable source for studying health-related behaviors (De Choudhury, 2014). For example, Twitter was used for disease surveillance (Lamb et al., 2013; Aramaki et al., 2011), and was studied for its role in disseminating medical information (Desai et al., 2012) and organizing public health campaigns (Emery et al., 2014; Wehner et al., 2014). Social media data provides many opportunities to study social phenomena such as health campaigns, but statistics based on aggregating across social media users only provide a big picture of the phenomenon. A deeper analysis of such phenomena requires fine-grained information about the involved users. Since such information is often not readily available, numerous studies have appeared on automatically inferring user characteristics (Bamman et al., 2014; Eisenstein et al., 2010; Nguyen et al., 2013).
In the context of health campaigns, social scientists have been interested in the motivations of the participants (Cugelman et al., 2011). Knowledge about individual motivations helps to explain the emergence and effectiveness of collective action, such as volunteering (Bekkers and Wiepking, 2011) or mobilizing other people. The Social Identity Model of Collective Action (SIMCA) (van Zomeren et al., 2008) identifies three key motivations of participants: 1) social identification with the campaign organization and community, 2) a perception of injustice about the cause, and 3) collective efficacy, the collective belief that the campaign can make a difference. Taken together, these three motivations predict the chance that an individual will participate in collective action, such as an online health campaign. Aggregating motivations to the group level may explain the effectiveness of online health campaigns. Social scientists, however, have not used computational methods to measure these motivations (Johnston et al., 2009), so their analyses are often confined to small datasets.
Our study is a first step towards scaling-up collective action research methods. To do so, we explore automatic classification of the motivation types according to the SIMCA model. We analyze the global health campaign Movember (movember.com), which aims to raise funds and awareness of men-related health issues by engaging online conversations. Movember's fundraisers ask their friends to sponsor their moustache and their efforts in the month of November. The funds are donated to research concerned with men-related health issues, such as prostate cancer.
Movember participants provide their motivations in their Movember profile. For example, a participant writing 'In honor of my Grandfather' could be considered to have an injustice motivation, while 'To lead the brave men of Team [...] (and our exceptionally understanding significant others) in epic moustachery.' indicates a social identification motivation. Because such explicit motivation statements are not available for many online health campaigns, we also explore motivation classification based on the tweets of participants during the campaign instead.
Our paper makes the following contributions:
• We automatically classify the motivations of Movember participants and explore the use of free-text motivations provided in Movember profiles and tweets posted by the participants during the campaign (Section 3).
• We apply our classifier to all US and UK Movember profiles and find that participants with an injustice motivation raise significantly more funds (Section 4).

Dataset
In this section we discuss the collection and the annotation of the data.

Collection
We collect data from two different sources.

Movember Profiles
We focus on participants from the two countries with the highest number of English speaking Movember participants: the United States and the United Kingdom. From Movember we obtained the identifiers of all participants of these two countries and we crawled all US and UK Movember profiles in May 2015.
We extracted information such as the name, motivation (free-text), amount raised and whether the participant was part of a team. We collected 166,422 US and 138,546 UK profiles.

Twitter Data
We link Movember participants to Twitter accounts based on tweets with a link to a Movember profile in 2013 and 2014 (e.g., 'please support my moustache [LINK]'). If the Levenshtein distance between the name of the author of the tweet and the name in the Movember profile was 1 or less, we considered it a match (in total: 5,519 users). Manual inspection of 100 matches showed that this method was highly precise (100% precision). However, some matches were missed due to the low Levenshtein distance threshold. For each Twitter user in our dataset, we collected the last 3,600 tweets. We kept all tweets written between October 18 and December 14 (2 weeks before and after the campaign). For each user, we used tweets from either 2013 or 2014, depending on whether the user posted a tweet with a Movember link at least once during the period, giving preference to the year 2014.
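The name-matching rule can be illustrated with a minimal sketch. It assumes names are compared as raw strings with unit-cost edits; the function names `levenshtein` and `names_match` are hypothetical and any name normalization used in the study is not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def names_match(twitter_name: str, profile_name: str, max_distance: int = 1) -> bool:
    """Treat two names as the same person if their edit distance is at most max_distance."""
    return levenshtein(twitter_name, profile_name) <= max_distance
```

With the threshold of 1, `names_match("John Smith", "Jon Smith")` holds, while a pair such as "John Smith" and "Jane Smith" is rejected, which is consistent with the high precision and the missed matches noted above.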

Annotation
We annotated the campaign participants based on the motivations they provided in the 'My motivation' section of their Movember profiles. The motivation categories in our codebook are based on the Social Identity Model of Collective Action (van Zomeren et al., 2008):
• Injustice: A shared emotion that includes both affective (e.g., anger) and cognitive perceptions (ideology) of an unfair situation (van Zomeren et al., 2008). It covers the ideological motivation to join a campaign, when potential participants compare the cause and the situation of patients with their personal values (Klandermans, 2004). For example, 'my dad', 'I had testicular cancer' or 'because men's health is important to me'.
• Social identity: A sense of belonging together that emerges out of common attributes, experiences and external labels (van Zomeren et al., 2008). Participants may have social motivations to identify with the online health campaign, while not being interested in the cause (Kristofferson et al., 2014). This category includes psychological benefits, such as reputation or fun, that the social interactions of a campaign provide. For example, 'my friends asked me again to join them', or 'a great excuse to grow a stache'.
• Collective efficacy: The shared belief that one's group is capable of resolving its grievances through a campaign (Bandura, 2000; Klandermans, 2004; van Zomeren et al., 2008), for example by stating 'this campaign can make a difference!'.
Multiple motivations may be assigned to a single campaign participant. Motivation texts that recurred verbatim and frequently (more than 50 times, based on data analysis) were most likely prefilled texts. They were not annotated, because it was unclear whether participants used these 'default' motivations on purpose. For example, the most frequent motivation 'my motivation is to use the power of the moustache to have an everlasting impact on the face of mens health' appeared in 104k profiles. The inter-rater reliability, calculated using Cohen's Kappa on 200 double annotations, was found to be satisfactory to good: injustice (0.71), social identity (0.67) and collective efficacy (0.47) (Landis and Koch, 1977).
Table 1: Results on free-text motivations: precision (P), recall (R), F1 score and AUC.
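For readers unfamiliar with the agreement statistic reported for the double annotations, the following is a minimal sketch of Cohen's Kappa for one binary motivation label. The function name `cohen_kappa` and the input format (two equally long lists of labels, one per annotator) are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    # according to their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both annotators always use one label
    return (observed - expected) / (1.0 - expected)
```

Because each motivation type is annotated as present or absent, the statistic would be computed once per motivation type, which is why three separate values are reported.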
From the set of Movember participants with matched Twitter accounts, we annotated a randomly selected subset of 2,108 participants. 21.8% of the participants had more than one motivation type assigned. We randomly split our dataset into a training and test set (

Classification Experiments
In this section we present results on automatically identifying the motivations of Movember participants. Because participants may have multiple motivation types, we train binary classification models for each motivation type separately. We use logistic regression with L2 regularization, implemented using the Scikit-learn toolkit (Pedregosa et al., 2011). We report results on the test set using precision, recall, F1 score and the Area Under Curve (AUC) metric. Note that a majority class classifier achieves an AUC of 0.5. Feature development and parameter tuning were done based on cross-validation on the training set. Based on the same set of Movember participants, we explore the use of two different types of data: the provided free-text motivations in Movember profiles (Section 3.1) and tweets of the participants (Section 3.2).

Free-text Movember Motivations
All text is lowercased and tokenized. We explore the following features: 1) Token unigrams and bigrams (frequency values), 2) LDA with 20 topics (Blei et al., 2003) trained on text from all US and UK Movember profiles (with the topic proportions as feature values), 3) Text length, and 4) Country (US=1, UK=0) to control for prior motivation distributions in the two countries. The token features alone already yield high performance, and adding the other features brings no notable improvement (Table 1). The features with the highest weight are shown in Table 4. The performance numbers are in line with the obtained inter-annotator agreement. For example, the performance is highest on the injustice category, which also had the highest inter-annotator agreement (and vice versa for collective efficacy).
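The token-based setup described above can be sketched as follows. This is an illustration under stated assumptions, not the configuration used in the experiments: the regularization strength `C=1.0`, the default scikit-learn tokenizer, the helper names, and the six toy training texts (taken from the codebook examples) are all placeholders. Only the overall shape follows the text: lowercased unigram and bigram counts feeding one L2-regularized logistic regression per motivation type.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

MOTIVATIONS = ("injustice", "social_identity", "collective_efficacy")

# Toy data for illustration only (codebook examples, not the annotated dataset).
TOY_TEXTS = [
    "In honor of my Grandfather",
    "I had testicular cancer",
    "because men's health is important to me",
    "my friends asked me again to join them",
    "a great excuse to grow a stache",
    "this campaign can make a difference!",
]
TOY_LABELS = {
    "injustice":           [1, 1, 1, 0, 0, 0],
    "social_identity":     [0, 0, 0, 1, 1, 0],
    "collective_efficacy": [0, 0, 0, 0, 0, 1],
}


def train_motivation_classifiers(texts, labels):
    """Fit one independent binary classifier per motivation type."""
    models = {}
    for motivation in MOTIVATIONS:
        model = make_pipeline(
            # Lowercased unigram and bigram counts.
            CountVectorizer(lowercase=True, ngram_range=(1, 2)),
            # scikit-learn's LogisticRegression uses an L2 penalty by default.
            LogisticRegression(C=1.0, max_iter=1000),
        )
        model.fit(texts, labels[motivation])
        models[motivation] = model
    return models


def motivation_probabilities(models, text):
    """Probability of each motivation type being present in one text."""
    return {name: float(model.predict_proba([text])[0, 1])
            for name, model in models.items()}
```

Training the three classifiers independently allows a single profile to receive more than one motivation type, matching the multi-label annotation scheme.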
The lengths of texts alone have predictive power. The texts are short (on average 158.4 characters), but there are marked differences between motivation types. Participants with an injustice motivation write longer motivations: the average length of their texts is 213.74 characters in the training set, compared to 148.24 characters (social identity) and 130.93 characters (collective efficacy).
Table 3: Results on tweets: precision (P), recall (R), F1 score and AUC.

Tweets
In this section, we present experiments on identifying the motivations based on Twitter data.

Preprocessing
Many of the tweets posted during the time of the campaign are not about the campaign itself. Based on manually selected character sequences, we separate relevant from non-relevant tweets. The tweets are tokenized using the CMU POS tagger (Owoputi et al., 2013). The average number of tweets per user during the studied period is 109.1 (median: 46.0) and the average number of relevant tweets is 8.0 (median: 4.0).
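The relevance filter can be pictured as a simple substring match. The sketch below is only an assumption about its general shape: the actual character sequences are not listed in this section, so the entries in `RELEVANT_SEQUENCES` are hypothetical placeholders, as are the function names.

```python
# Hypothetical character sequences; the manually selected ones are not given here.
RELEVANT_SEQUENCES = ("movember", "moustache", "mustache")


def is_relevant(tweet_text, sequences=RELEVANT_SEQUENCES):
    """True if the lowercased tweet contains any of the character sequences."""
    text = tweet_text.lower()
    return any(sequence in text for sequence in sequences)


def split_by_relevance(tweets):
    """Separate campaign-related tweets from all other tweets."""
    relevant = [t for t in tweets if is_relevant(t)]
    non_relevant = [t for t in tweets if not is_relevant(t)]
    return relevant, non_relevant
```

Matching on character sequences rather than whole tokens also catches hashtags and compounds such as "#Movember2014" without extra tokenization.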

Features
We explore the same features as with the free-text motivations and several new features:
• Unigram and bigram tokens: URLs and user mentions are replaced by generic tokens. We only keep tokens used by at least 10 Twitter users and we use their normalized frequency.
• User mentions: The Twitter accounts that are mentioned.
• LDA with 20 topics (Blei et al., 2003). The model is trained on 1.5M tweets from 2013 and 2014 about the Movember campaign.
• Behavior: Fraction of retweets, tweets that contain a user mention, hashtag, URL, or are a reply. Number of days with a tweet about Movember. Fraction of tweets in each week.
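As an illustration of the behavior features in the list above, the following sketch computes them for one user. The tweet representation (a dict with the hypothetical keys `text`, `is_retweet`, `is_reply` and `day`), the regular expressions, and the use of the substring "movember" to detect campaign tweets are all assumptions made for this example.

```python
import re
from collections import Counter
from datetime import date


def behavior_features(tweets, period_start):
    """Per-user behavior features from a list of tweet dicts (hypothetical schema)."""
    n = len(tweets)
    if n == 0:
        return {}

    def fraction(predicate):
        return sum(1 for t in tweets if predicate(t)) / n

    features = {
        "frac_retweet": fraction(lambda t: t["is_retweet"]),
        "frac_reply": fraction(lambda t: t["is_reply"]),
        "frac_mention": fraction(lambda t: re.search(r"@\w+", t["text"]) is not None),
        "frac_hashtag": fraction(lambda t: re.search(r"#\w+", t["text"]) is not None),
        "frac_url": fraction(lambda t: re.search(r"https?://\S+", t["text"]) is not None),
        # Number of distinct days with at least one tweet about the campaign.
        "movember_days": len({t["day"] for t in tweets
                              if "movember" in t["text"].lower()}),
    }
    # Fraction of the user's tweets that fall in each week of the studied period.
    weeks = Counter((t["day"] - period_start).days // 7 for t in tweets)
    for week, count in sorted(weeks.items()):
        features[f"frac_week_{week}"] = count / n
    return features
```

A call such as `behavior_features(user_tweets, date(2014, 10, 18))` would return one feature dictionary per user, which can then be combined with the token and topic features.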

Results
The results are reported in Table 3. The URLs and behavior features were excluded from the run with the combined feature set, because their individual results suggest no predictive power (possibly due to the small training set). The scores are fairly low and just above the 0.5 AUC value of a random classifier. To test whether the best performing classifiers for each motivation type (based on their AUC scores) are significantly better than a random classifier, we use permutation tests. We permute the labels to break the link with the features and calculate the AUC scores of the classifiers by training and testing on 1,000 such permutations. The best classifiers for the injustice and social identity motivation types are significantly better than random (p < 0.01), but the collective efficacy classifier is significant only at a weaker level (p < 0.05).
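The logic of such a permutation test can be sketched as follows. Note that this is a simplified variant, not the procedure described above: instead of retraining the classifier on each permuted label set, it keeps the classifier scores fixed and only shuffles the labels they are evaluated against. The function names, the default seed, and the add-one p-value estimate are assumptions of this sketch.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels give an AUC at least as high as observed."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between labels and scores
        if auc(shuffled, scores) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)
```

With 1,000 permutations the smallest attainable p-value under this estimate is 1/1001, so thresholds such as 0.01 and 0.05 can both be resolved.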
To understand the low performance numbers, we took a closer look at the task and the data. First, we aimed to get a sense of the difficulty of the task. In a small experiment based on 100 Twitter users from the test set, one of the authors read the tweets and tried to identify the motivations. The results were also low (injustice: 0.488, social identity: 0.548, and collective efficacy: 0.590), suggesting that the task in itself is also difficult for humans.
The task is challenging because many users only post a few tweets about the campaign. In our data, 382 users have only one relevant tweet and 1,271 users have 5 relevant tweets or less. Furthermore, many of the tweets posted during the campaign focus on the Movember community (Bravo and Hoffman-Goetz, 2015; Dwi Prasetyo et al., 2015), making it hard to distinguish between the different motivations. For example, instagram.com is among the top three hostnames for all motivation types. Sometimes participants do explicitly mention their motivation (e.g., 'In honour of my dad, [..], I'm growing a horrible moustache for an incredible cause, #Movember. Donate here: [LINK]'), but such instances are rare and in general the motivations of participants are much less visible through their tweets.
Social media plays a large role in mediating social relationships and users adapt their behavior to the online communities they are participating in (Danescu-Niculescu-Mizil et al., 2013; Nguyen and Rosé, 2011). This may explain why most participants, regardless of their motivation, emphasize the Movember community and its practices (such as the growing of moustaches) in their tweets. Various studies within the emerging field of Computational Social Science (Lazer et al., 2009) have found that Twitter tends to be a good reflection of society (Lamb et al., 2013). However, our results emphasize that the nature of the platform used influences how humans behave, and that this should be taken into account when interpreting the data. In the case of Movember, Twitter data alone could give a misleading view of the motivations of the campaign's participants.

Motivations and Campaign Behavior
In this section we present a linear regression analysis (n=90,484) of how motivations affect campaign donations by applying our classifier to all US and UK Movember profile texts. Participants of the Movember campaign can be part of a team. We therefore included actual team membership as a control variable, as we expect that team members increase fundraisers' effort due to peer pressure. In our analysis, we exclude all participants that have predefined motivations (214,484 out of the 304,968 profiles), because these may not reflect the actual motivation.
The social identity motivation is the most frequent in both countries, but the countries differ in their distributions regarding the injustice and collective efficacy motivations (Table 5). On average, US participants donate more than UK participants (Table 6). US campaign participants with an injustice motivation raise significantly (coef = 91.525, p < 0.001) more money than participants with a social identity (coef = −5.479, not significant) or collective efficacy motivation (coef = −5.765, p < 0.1). Participants who are part of a team raise significantly (coef = 75.849, p < 0.001) more money than participants without a team. Similar results were obtained for the UK. Furthermore, participants with a social identity motivation are more often part of a team (UK: 58% vs. 51% of the participants without a social identity motivation, US: 80% vs. 76%). The regression analysis reveals that being part of a team has a stronger and more positive effect on the amounts raised than the expression of identity as a motivation in the Movember profiles. Our findings are in line with recent slacktivism research, which proposes that people who express social motivations are reluctant to give more than token support due to a lack of interest in the campaign's cause (Kristofferson et al., 2014). Actual team membership, however, contributes to the effectiveness of online fundraising.
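The structure of this regression, with the amount raised as the outcome and the three predicted motivations plus team membership as binary predictors, can be sketched with a dependency-free ordinary least squares fit. Everything below is illustrative: the helper names are hypothetical, the synthetic data are generated from arbitrary coefficients that are not the estimates reported above, and significance testing is omitted. In practice a statistics package would be used.

```python
from itertools import product


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (intercept first)."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def synthetic_example():
    """Noise-free toy data; columns: injustice, social identity, efficacy, team."""
    rows = list(product([0, 1], repeat=4))
    # Arbitrary illustrative coefficients, NOT the paper's estimates.
    y = [20.0 + 10.0 * i + 0.0 * s - 2.0 * e + 5.0 * t for i, s, e, t in rows]
    return rows, y
```

Calling `ols(*synthetic_example())` recovers the generating coefficients, with the intercept first, followed by the injustice, social identity, collective efficacy and team terms. Including team membership as a separate column is what allows its effect to be compared with that of an expressed social identity motivation.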

Conclusion
We explored the task of automatically identifying the motivations of Movember campaign participants. A classifier based on Movember profile texts performed better than a classifier based on Twitter data, possibly due to the role of Twitter in building social relationships. Based on US and UK Movember data, we found a strong link between motivations and donations, and between motivations and team membership. Classification of motivations might help campaign organizers to improve their communication strategies. Our study is limited to the Movember campaign. Future research might extend to other types of online collective action, such as online petitions and open source communities. We also plan to explore larger datasets and features based on network structures.