When does a compliment become sexist? Analysis and classification of ambivalent sexism using Twitter data

Sexism is prevalent in today's society, both offline and online, and poses a credible threat to social equality with respect to gender. According to ambivalent sexism theory (Glick and Fiske, 1996), it comes in two forms: hostile and benevolent. While hostile sexism is characterized by an explicitly negative attitude, benevolent sexism is more subtle. Previous work on computationally detecting sexism online is restricted to identifying the hostile form. Our objective is to investigate the less pronounced form of sexism demonstrated online. We achieve this by creating and analyzing a dataset of tweets that exhibit benevolent sexism. Using Support Vector Machines (SVM), sequence-to-sequence models and the FastText classifier, we classify tweets into the 'Hostile', 'Benevolent' or 'Others' class depending on the kind of sexism they exhibit. We achieve an F1-score of 87.22% using the FastText classifier. Our work helps analyze and understand the widely prevalent ambivalent sexism in social media.


Introduction
Sexism, as given by the Oxford dictionary, is the 'prejudice, stereotyping, or discrimination, typically against women, on the basis of sex'. Sexism is rife in society's belief system and its manifestation online is not uncommon (Eadicicco, 2014). For example, the Australian game show My Kitchen Rules often prompts sexist tweets against its female participants. E.g.: 'Trying to find something pretty about these blonde idiots. #MKR'. However, evidence suggests that sexist remarks may not always express negative emotion (Becker and Wright, 2011). For instance, the Rio Olympics shed light on the blatant as well as seemingly innocuous sexism that female athletes face, when, after the victory of 3-time Olympian Corey Cogdell-Unrein in women's trap shooting, the Chicago Tribune tweeted, 'Wife of a Bears' lineman wins a bronze medal today in Rio Olympics'. Katie Ledecky's record-breaking win in the 400-meter freestyle race was applauded by a lot of people while simultaneously commenting that 'she swims like a man'. These are examples of the benign form of sexism prevalent today.
In their seminal paper, Glick and Fiske (1997) proposed ambivalent sexism theory, which describes two related but opposite orientations towards a particular gender: (i) Hostile Sexism (HS), i.e., sexist antipathy, and (ii) Benevolent Sexism (BS), i.e., a subjectively positive view towards men or women. Hostile sexism is angry, harsh and expresses an explicitly negative viewpoint. E.g.: 'Jus gonna say it...again....DUMB BITCH! #MKR'. Benevolent sexism, on the other hand, is often disguised as a compliment. E.g.: 'They're probably surprised at how smart you are, for a girl'. Moreover, there is a reverence for the stereotypical role of women as mothers, daughters and wives. BS puts women on a pedestal, but reinforces their subordination. E.g.: 'No man succeeds without a good woman besides him. Wife or mother. If it is both, he is twice as blessed'. Despite the positive feelings of BS, its underpinnings lie in masculine dominance and in stereotyping both men and women. It shares the common assumption that women inhabit restricted domestic roles and are the 'weaker sex'. Although it may not be immediately apparent, this also implicitly stereotypes men.
Sexism has far-reaching consequences for women as well as men. It has been seen that despite its seemingly positive and inoffensive tone, benevolent sexism has worse effects than hostile sexism on women's cognitive performance (Dardenne et al., 2007). Furthermore, the experiments conducted by Russo et al. (2014) demonstrate how system justification (Jost et al., 2004; Jost and Kay, 2005) and benevolent sexism are positively correlated. Additionally, they conclude that gender inequality is promoted not only by hostile sexism but also by the subtler and more deceptive benevolent sexism.
Recently, efforts have been made to detect sexist content on the internet. Some of the tweets in Waseem and Hovy's (2016) publicly available hate speech dataset of 16k tweets are sexist. But, as expected in a hate speech corpus, these sexist tweets express only hostile sexism. It is evident that existing approaches for detecting sexism online have overlooked benevolent sexism.
In order to address the above shortcoming, we propose computational models to automatically classify a tweet into one of three classes:
• Benevolent: the tweet exhibits subjectively positive sentiment but is sexist
• Hostile: the tweet exhibits explicitly negative emotion and is sexist
• Others: the tweet is not sexist
To the best of our knowledge, there has not been any previous study that computationally identifies benevolent sexism and classifies sexist content into two different classes depending on the nature of the sexism.
The rest of the paper is organized as follows. Section 2 presents existing literature in related areas like hate speech detection, sentiment analysis and identification of sexist content from social psychology point of view. Section 3 illustrates the process of dataset creation and annotation for BS tweets. Additionally, it describes the available dataset of HS tweets that we used for our experiments. Section 4 and 5 describe the technical aspects of the experiments conducted for the classification of tweets. We discuss the results of the experiments in Section 6 before concluding the paper in Section 7.

Related Work
A considerable amount of work has been done in social psychology on the identification of sexist content and its impact. Research has provided evidence that not only men but also women endorse sexist beliefs (Barreto and Ellemers, 2005; Glick et al., 2000; Jackman, 1994; Kilianski and Rudman, 1998; Swim et al., 2005). Becker and Wagner (2008) introduce the Gender Identity Model (GIM), using social identity theory (SIT) (Hogg, 2016) and social role theory (SRT) (Eagly et al., 2000), to explain women's endorsement of sexist beliefs. They conclude that women reject benevolent and hostile sexism when they identify strongly with the category 'women' and have a progressive outlook. In contrast, gender role preference has a weaker effect, or none at all, on sexist beliefs when women do not strongly identify with their gender in-group.
The work by Bolukbasi et al. (2016) revealed hidden gender bias in Word2Vec. They showed how Word2Vec word embeddings were sexist because of the bias in the news articles that made up the Word2Vec training corpus. For a relation like 'father : doctor :: mother : x', Word2Vec gives x = nurse, and the query 'man : computer programmer :: woman : x' returns x = homemaker. To address this skew, they transformed the vector space using a method called 'hard de-biasing' and removed the bias.
Hate speech detection, which includes the identification of sexist content, has garnered a lot of attention in recent times. Djuric et al. (2015) address this problem in online user comments. Using neural networks, they learn distributed low-dimensional text representations in which semantically similar comments and words reside in the same part of the vector space. They then feed these representations to a linear classifier to separate hateful and clean comments. Davidson et al. (2017) use a hate speech lexicon to collect tweets containing hate speech keywords. They train a multi-class classifier to separate these tweets into one of three classes: those containing hate speech, those with only offensive language, and those with neither. A hate speech dataset containing sexist tweets has been made publicly available by Waseem and Hovy (2016). This dataset contains 16k tweets that fall into one of three classes: sexist, racist or neither. They list a set of annotation criteria based on critical race theory and then use Support Vector Machines (SVM) with handcrafted features to classify tweets. However, one of the major drawbacks of the described approaches and datasets is that they take into account only hostile sexist tweets.
To better understand the nature of sexism, sentiment analysis can be applied. In recent times, sentiment analysis of Twitter data has received a lot of attention (Pak and Paroubek, 2010). Some of the early works, by Go et al. (2009) and Bermingham and Smeaton (2010), use distant supervision to acquire sentiment data. They show that using unigrams, bigrams and part-of-speech (POS) tags as features, SVM outperforms other classifiers like Naive Bayes and MaxEnt. To remove the need for feature engineering, Agarwal et al. (2011) use POS-specific prior polarity features and a tree kernel for sentiment analysis. To detect contextual polarity using phrase-level sentiment analysis, Wilson et al. (2005) identify whether a phrase is neutral or polar and, if polar, disambiguate the polarity of the expression. State-of-the-art sentiment analyzers use deep learning techniques such as Convolutional Neural Networks (CNN) (Dos Santos and Gatti, 2014) and Recursive Neural Networks (Tang et al., 2015) to learn features automatically from the input text.

Dataset
For the purpose of classifying tweets on the basis of the type of sexism, we required a dataset that displayed benevolent sexism (BS). Hence, we created our own corpus of tweets belonging to the 'Benevolent' class. In addition, we used the publicly available hate speech corpus (Waseem and Hovy, 2016) to collect tweets belonging to the 'Hostile' and 'Others' classes: tweets labelled 'sexist' and 'neither' in the hate speech dataset make up the 'Hostile' and 'Others' classes in our corpus, respectively. The distribution of tweets in the combined corpus is shown in Table 1. For the creation of the benevolent sexist dataset, we collected a total of 95,292 tweets. Out of these, we manually identified 7,205 BS tweets (including retweets). This dataset is publicly available. However, after removing retweets, only 712 unique tweets remained. The total number of tokens in the created dataset is 74,874. The mean length of BS tweets is 80.95, with a standard deviation of 25.75. The dataset also contains the metadata of each tweet, such as the username, time of creation, geographic location, number of retweets and number of likes.

Data Collection
We collected data using the public Twitter Search API. The terms queried were common phrases and hashtags that are generally used when exhibiting benevolent sexism. Some of them were: 'as good as a man', 'like a man', 'for a girl', 'smart for a girl', 'love of a woman', '#adaywithoutwomen', '#womensday', '#everydaysexism' and '#weareequal'. These queries led to a dataset of tweets that were sexist in nature towards both women and men. E.g.: 'He is a man who can't act like a man' is sexist towards men. We extracted only tweets that were in English. After we had manually identified benevolent tweets (explained in Section 3.2), we asked three 23-year-old non-activist feminists to cross-validate the collected unique tweets to remove any annotator bias. Fleiss' kappa was calculated to assess the reliability of the agreement between the validators. It was found to be 0.74, which corresponds to 'substantial agreement' between the annotators (Fleiss et al., 1969).
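The agreement computation above can be sketched as follows. This is a minimal, generic implementation of Fleiss' kappa for illustration, not the script the authors used; the rating matrix shown is invented.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a matrix of shape (items x categories),
    where ratings[i][j] counts the raters assigning item i to category j.
    Assumes the same number of raters rated every item."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    # Per-item observed agreement P_i
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement P_e from overall category proportions
    total = n_items * n_raters
    n_cats = len(ratings[0])
    p_cats = [sum(row[j] for row in ratings) / total for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)

# Three validators, two labels; perfect agreement on every tweet gives kappa = 1.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # → 1.0
```

A value of 0.74, as reported above, falls in the 'substantial agreement' band of the conventional interpretation scale.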

Identification
To identify and annotate BS, we made use of the ambivalent sexism theory proposed by Glick and Fiske (1997) in social psychology. Sexism is hypothesized to encompass three sources of male ambivalence: Paternalism, Gender Differentiation and Heterosexuality. Each of these three components have two types, one of them results in hostile sexism and the other gives rise to benevolent sexism.
• Paternalism: Paternalism encompasses dominative paternalism and protective paternalism. Supporters of the former hold the view that women are not fully competent adults (Brehm, 1992; Peplau et al., 1983), whereas those who support the latter view women as the weaker sex, who need to be loved, cherished and protected (Peplau et al., 1983; Tavris et al., 1984). Protective paternalism (e.g., 'What is man without the love of a woman!') results in benevolent sexism, whereas dominative paternalism results in hostile sexism.
• Gender Differentiation: Akin to dominative paternalism, competitive gender differentiation justifies patriarchy in the society by viewing men as ones having governing capabilities in the society (Tajfel, 2010). This gives rise to hostile sexism. On the other hand, complementary gender differentiation results in benevolent sexism as it shows women having favourable traits that men stereotypically lack (Eagly and Mladinic, 1994).
• Heterosexuality: Similarly, heterosexual intimacy gives rise to benevolent sexism by viewing women as romantic objects with a genuine desire for psychological closeness (Berscheid et al., 1989), while heterosexual hostility is shown in cases where, for some men, sexual attraction towards women may not be separate from the desire to dominate them (Bargh and Raymond, 1995; Pryor et al., 1995). This results in hostile sexism. Table 2 shows some example tweets that highlight the ambivalent sexist attitude towards women. In order to clearly identify benevolent sexism, we studied each tweet and analyzed whether it showed any one of the three behaviors: protective paternalism, complementary gender differentiation, or heterosexual intimacy. If a tweet exhibited any one of the above, we annotated it as benevolently sexist.

Comparison of Hostile and Benevolent Sexist Tweets
Hostile and benevolent sexist tweets in the combined dataset differ markedly in their word distributions. Table 3 shows the most common content words used in hostile and benevolent tweets. Apart from the words 'girl(s)' and 'women', which are frequent in both kinds of tweets (as sexism is commonly expressed against females), we see that the high-frequency content words differ significantly between the two classes.

Most frequent trigrams in hostile and benevolent tweets are shown in Table 4. As hypothesized, trigrams in benevolent tweets express positive attitudes, while trigrams in hostile tweets express explicitly negative attitudes.

Hostile                  Benevolent
kat and andre            think like man
sexist don't like        act like man
call sexist whatever     act like lady
sexist can't stand       last love man
blondes pretty faces     first love woman
dumb blondes pretty      love like woman
sexist hate female       without love woman
don't like female        lady think like
comedians aren't funny   man love like

Table 4: Most frequent trigrams in HS and BS tweets.

Table 5 illustrates the most frequent adjectives used in hostile and benevolent tweets. We observe that frequent adjectives in HS tweets display negative sentiment, whereas adjectives in BS tweets display positive sentiment.
Hostile   Benevolent
dumb      real
hot       strong
bad       beautiful
stupid    better
awful     great

Table 5: Most frequent adjectives in HS and BS tweets.
All the above illustrations are in line with our hypothesis which states that sexism in the benevolent form is camouflaged as a compliment and is hence difficult to pinpoint; whereas, hostile sexism is evidently negative and can be easily identified as sexist.

Pre-processing
Pre-processing of tweets involved the removal of usernames, punctuation, emoticons, hyperlinks/URLs and the RT tag.
Stop words were intentionally retained. The reason for this is that each tweet can contain a maximum of only 140 characters, and removing stop words would only lead to a loss of information. For example, in the tweet 'Every guy should admit that #adaywithoutwomen is not a day worth living', stop-word removal would delete 'not', which would change a BS tweet into an HS tweet.
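A minimal sketch of these pre-processing steps, assuming simple regular expressions; the exact patterns used in the paper are not specified, so the ones below are illustrative.

```python
import re

def preprocess(tweet):
    """Strip the RT tag, usernames, hyperlinks/URLs, and punctuation/emoticons,
    while keeping stop words intact, as described above."""
    tweet = re.sub(r"\bRT\b", " ", tweet)        # retweet tag
    tweet = re.sub(r"@\w+", " ", tweet)          # usernames
    tweet = re.sub(r"https?://\S+", " ", tweet)  # hyperlinks/URLs
    tweet = re.sub(r"[^\w\s#']", " ", tweet)     # punctuation and emoticons
    return re.sub(r"\s+", " ", tweet).strip()    # collapse leftover whitespace

print(preprocess("RT @user: She plays well, for a girl :) http://t.co/xyz"))
# → "She plays well for a girl"
```

Note that hashtags and the word 'not' survive, which matters for the BS/HS distinction discussed above.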

Methodology
For classification of tweets into one of the three classes: 'Benevolent', 'Hostile' and 'Others', we made use of machine learning techniques described below.

SVM
Support Vector Machines (SVM) (Cortes and Vapnik, 1995) are supervised learning models used for classification. To classify the tweets in our dataset, we used term frequency-inverse document frequency (TF-IDF) (Salton and Buckley, 1988) as a feature, as it captures the importance of a given word in a document. TF-IDF is calculated as:

tf-idf(t, d) = f(t, d) · log( N / |{d ∈ D : t ∈ d}| )

where f(t, d) indicates the number of times term t appears in document d, N is the total number of documents, and |{d ∈ D : t ∈ d}| represents the total number of documents in which t occurs.
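As a sanity check, the formula can be computed directly over a toy corpus. This is an illustrative sketch using the natural logarithm (the log base used in the paper is not stated), and it assumes the term occurs in at least one document.

```python
import math

def tf_idf(term, doc, corpus):
    """tf-idf(t, d) = f(t, d) * log(N / |{d in D : t in d}|),
    matching the TF-IDF formula described above."""
    f_td = doc.count(term)                    # raw frequency of t in d
    n_docs = len(corpus)                      # N, total documents
    df = sum(1 for d in corpus if term in d)  # documents containing t
    return f_td * math.log(n_docs / df)

corpus = [["like", "a", "man"], ["for", "a", "girl"]]
print(round(tf_idf("man", corpus[0], corpus), 3))  # → 0.693, i.e. 1 * ln(2/1)
```

A term such as 'a' that appears in every document gets a score of 0, which is exactly the down-weighting of uninformative words that motivates the feature.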
We ensure that SVM uses the TF-IDF features to construct a separating hyperplane for the given labelled training data and classify new tweets into one of the three classes: 'Benevolent', 'Hostile', or 'Others'. To find the optimal hyperplane, SVM finds the decision boundary that maximizes the margin by minimizing ||w||:

minimize (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1 for all training examples (x_i, y_i)

where w is the weight vector, x_i is an input vector and b is the bias.

Sequence to Sequence model
A basic sequence-to-sequence model consists of an encoder and a decoder (Sutskever et al., 2014). For our experiment, we made use of a bi-directional RNN encoder-decoder (Schuster and Paliwal, 1997) with an attention mechanism that employs Long Short-Term Memory (LSTM) cells (Hochreiter and Schmidhuber, 1997) to modulate the flow of information. The encoder reads the input sequence and generates an intermediate hidden representation of fixed length, c_o, given by:

c_o = Σ_{t=1}^{T} α_{ot} h_t

where h_t denotes the hidden representation of x_t computed by the bi-directional RNN, α_{ot} ∈ [0, 1] and Σ_t α_{ot} = 1. A learned alignment model computes the weight α_{ot} for each c_o such that:

α_{ot} = exp(e_{ot}) / Σ_{k=1}^{T} exp(e_{ok})

where e_{ot} scores how well the input around position t aligns with the output at position o. The decoder then maps the intermediate representation into one of the 'Benevolent', 'Hostile' or 'Others' classes by computing the posterior probability:

p(y_o | y_1, ..., y_{o-1}, x) = g(ȳ_{o-1}, s_o, c_o), for o = 1, ..., O

where the lengths of the output and the input are O and T respectively, s_o is the decoder hidden state, ȳ_o is the vector representation of y_o, i.e., a one-hot vector followed by a neural projection layer for dimension reduction, and g(·) is a softmax function.
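The attention step above can be sketched with a small numerical example. Here the alignment scores e_ot are supplied directly rather than produced by a learned model; the softmax normalization and the context-vector sum follow the equations above.

```python
import numpy as np

def attention_context(scores, hidden_states):
    """Given alignment scores e_ot over T encoder positions and the encoder
    hidden states h_t, compute the weights alpha_ot via a softmax and the
    context vector c_o = sum_t alpha_ot * h_t."""
    e = np.asarray(scores, dtype=float)
    alpha = np.exp(e - e.max())  # numerically stable softmax
    alpha /= alpha.sum()         # weights now lie in [0, 1] and sum to 1
    c = alpha @ np.asarray(hidden_states, dtype=float)
    return alpha, c

# Three encoder states of dimension 2; the second position aligns best.
alpha, c = attention_context([0.1, 2.0, 0.1],
                             [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(int(np.argmax(alpha)))  # → 1, most weight on the best-aligned position
```

The context vector c is thus dominated by the hidden state of the best-aligned input token, which is how the model attends to the sexism-bearing words in a tweet.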

FastText
The FastText classifier, made available by Facebook AI Research, has proven to be efficient for text classification. It is often on par with deep learning classifiers in terms of accuracy, and much faster for training and evaluation. FastText uses a bag of words and a bag of n-grams as features for text classification; the bag-of-n-grams feature captures partial information about the local word order. FastText allows word vectors to be updated through back-propagation during training, allowing the model to fine-tune word representations according to the task at hand. The model is trained using stochastic gradient descent with a linearly decaying learning rate.
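What the bag-of-n-grams feature captures can be seen in a few lines. This pure-Python sketch enumerates the unigram and bigram features that FastText would then hash into its feature table; the hashing itself and subword features are omitted.

```python
def bag_of_ngrams(tokens, n_max=2):
    """Enumerate word n-grams up to length n_max; bigrams and higher
    preserve partial information about the local word order."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats

print(bag_of_ngrams(["act", "like", "a", "lady"]))
# → ['act', 'like', 'a', 'lady', 'act like', 'like a', 'a lady']
```

A pure bag of words would score 'act like a lady' and 'a lady like act' identically; the bigrams 'act like' and 'like a' are what distinguish them.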

Experiments and Results
The experiments conducted for the classification of tweets are described below. We trained and tested our algorithms only on unique tweets to avoid learning any bias from retweets. For evaluating the experiments, we use precision, recall and F-measure.

Polarity Detection
To detect the polarity of each tweet, we experimented with rule-based sentiment analysis techniques using linguistic features. First, all tweets were tagged for part-of-speech (POS) using the Penn Treebank tagset (Marcus et al., 1993). We then used the Stanford Shallow Parser (Pradhan et al., 2004) to chunk the tweets into phrases. For each phrase in a tweet, we calculated a positive score and a negative score using SentiWordNet (Baccianella et al., 2010) and a subjectivity lexicon (Taboada et al., 2011). The overall sentiment score of a tweet was calculated by summing the individual scores of its phrases. If the overall sentiment score of the tweet was greater than 0, the tweet was marked as positive; if it was less than 0, the tweet was marked as negative; otherwise the tweet was marked as neutral. Table 6 shows the results of this basic sentiment analysis of the tweets in the dataset.
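The scoring rule can be sketched as follows. The mini-lexicon here is hypothetical and stands in for the SentiWordNet and subjectivity-lexicon scores; the phrase chunking is assumed to have already been done.

```python
# Hypothetical per-word scores standing in for SentiWordNet (illustrative only).
LEXICON = {"smart": 0.5, "love": 0.6, "dumb": -0.7, "hate": -0.8}

def tweet_polarity(phrases):
    """Sum per-phrase scores over the whole tweet; the sign of the total
    decides the label, following the rule described above."""
    score = sum(LEXICON.get(w, 0.0) for phrase in phrases for w in phrase)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(tweet_polarity([["so", "smart"], ["for", "a", "girl"]]))  # → "positive"
```

This illustrates the finding discussed later: a benevolently sexist tweet scores positive under such a rule precisely because it is phrased as a compliment.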

SVM
For the purpose of our experiment, we used TF-IDF as a feature for SVM to classify previously unseen tweets into one of the three classes. We implemented SVM using the scikit-learn (Pedregosa et al., 2011) library. Table 7 shows the precision, recall and F1-score after performing 10-fold cross-validation.
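A sketch of such a pipeline using scikit-learn. The tiny corpus, the labels, and the choice of a linear kernel (`LinearSVC`) are illustrative assumptions, not the paper's configuration or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy examples for each of the three classes.
tweets = [
    "she swims like a man", "so smart for a girl",      # benevolent
    "dumb blonde idiots", "can't stand these women",    # hostile
    "great match last night", "lovely weather today",   # others
]
labels = ["benevolent", "benevolent", "hostile", "hostile", "others", "others"]

# TF-IDF features feeding a linear SVM, mirroring the setup described above.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["so smart for a girl"]))
```

In the paper's actual setup, the same pipeline shape would be evaluated with 10-fold cross-validation (e.g. via `sklearn.model_selection.cross_val_score`) rather than on training data.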

Sequence to Sequence model
The implementation of the described Sequence to Sequence model was done using the tf-seq2seq framework (Britz et al., 2017) for TensorFlow (Abadi et al., 2016). The experiment was conducted after splitting the data into training and test sets in a 7 : 3 ratio. For 1000 epochs, with a batch size of 32, the precision, recall and F1-score are shown in Table 7.

FastText
The training set and the test set were split in a 7 : 3 ratio for FastText.

Discussion
Using basic linguistic features, rule-based polarity detection shows that tweets exhibiting benevolent sexism have positive polarity, whereas tweets exhibiting hostile sexism have negative polarity. This is in accordance with our hypothesis, which states that benevolent sexism expresses a positive outlook, in contrast to hostile sexism, which displays negative emotion. For the classification of tweets into the 'Benevolent', 'Hostile' or 'Others' class, Support Vector Machines (SVM) and the Sequence to Sequence (Seq2Seq) classifier were implemented as baseline experiments. For SVM, the precision for the 'Benevolent' and 'Hostile' classes is unusually high, whereas the recall, specifically for the 'Hostile' class, is quite low. This implies that only 69% of the BS tweets and 33% of the HS tweets in the previously unseen test set were labelled correctly. Comparing this with the results of the Seq2Seq model, we observe that although its precision for the 'Benevolent' and 'Hostile' classes is not as high as that of SVM, its recall is 77% and 65% respectively for the two classes, better than the recall achieved using SVM. Seq2Seq takes into account the structure of the tweet, unlike the TF-IDF feature used in SVM, which is invariant to word order. This results in better recall.
The number of tweets in the 'Others' class is significantly greater than the number of tweets in the 'Hostile' and 'Benevolent' classes combined. The performance of SVM and Sequence to Sequence models is known to improve as the size and variety of the training data increase. This is reflected in the high precision, recall and comparable F1-scores achieved for the 'Others' class using the two models.
Overall, SVM gives a slightly better F1-score for the 'Benevolent' and 'Others' classes, whereas the Sequence to Sequence classifier performs better for the 'Hostile' class. FastText outperforms both of the above classifiers, with an F1-score of 0.87 for Prec@1. Since a tweet has a limited number of characters and may not exhibit long-range dependencies, FastText successfully captures the word order of a tweet using its bag-of-n-grams feature. This, combined with the fact that FastText has fewer parameters to tune, results in its performing better than the proposed Seq2Seq model.

Conclusion and Future Work
We presented a detailed analysis for the detection and classification of sexism in Twitter data by building a combined corpus of benevolent and hostile sexist tweets. Using ambivalent sexism theory, we annotated tweets that showed sexism in the benevolent form. A limitation of our approach is that the method of gathering benevolently sexist tweets was biased towards the initial search terms and likely missed many forms of benevolent sexism. In the future, we aim to address this concern by increasing the size of the dataset, using the aforementioned ambivalent sexism theory, while also addressing the comparatively small number of unique benevolently sexist tweets. We additionally plan to take into consideration the gender of the user, the geographic location of a tweet and its length as features for future experiments.
Apart from understanding and identifying various kinds of sexism, the created dataset can additionally be used to recognize and analyze the events that trigger sexism online. The methods described can also be used in contexts outside of social media, such as workplace communications, as a means for automated assessment and eventual intervention. While the problem is far from solved, our experiments can be treated as a baseline for future work.
Our work is a step towards building a gender-neutral society. The insights derived from the analysis and experiments presented in this paper may prove beneficial in understanding the prevalence of ambivalent sexism in social media data and serve as a starting point for further work in this field.