SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories

With the recent rise of #MeToo, an increasing number of personal stories about sexual harassment and sexual abuse have been shared online. In order to push forward the fight against such harassment and abuse, we present the task of automatically categorizing and analyzing various forms of sexual harassment, based on stories shared on the online forum SafeCity. For the labels of groping, ogling, and commenting, our single-label CNN-RNN model achieves an accuracy of 86.5%, and our multi-label model achieves a Hamming score of 82.5%. Furthermore, we present analysis using LIME, first-derivative saliency heatmaps, activation clustering, and embedding visualization to interpret neural model predictions and demonstrate how this helps extract features that can help automatically fill out incident reports, identify unsafe areas, avoid unsafe practices, and ‘pin the creeps’.


Introduction
The hashtag #MeToo 1 has been prevalent on various social media platforms as a campaign centered around sharing stories of sexual harassment in an act of solidarity with other victims and spreading awareness of a widespread and endemic issue. With vast amounts of personal stories on the internet, it is important that we make scientific use of this data to push these movements forward and enable real-world change. Manually sorting and comprehending the information shared in these stories is an arduous task, and the power of natural language processing (NLP) can serve as the missing link between online activism and real change.
We present several neural NLP models that allow us to automatically classify, aggregate, and analyze vast amounts of harassment data found on social media, providing an effective tool for spreading awareness, increasing understanding, and allowing faster action. This large-scale automatic categorization, summarization, and analysis of personal abuse stories can help activist groups enlighten the public and advocate for social change in a timely manner.
1 https://metoomvmt.org
Example story: "Was walking on the street, a guy stands leaning on the gate of his home and whistles and calls out to me the whole time I cross his home. I take that road daily, except Sunday, to go to music classes. This went on for a month and I finally quit music classes. I was 13 then."
We present single-label and multi-label classification of diverse forms of sexual harassment present in abuse stories shared online through the forum SafeCity, a crowd-sourcing platform for personal stories of sexual harassment and abuse. Each story includes one or more tagged forms of sexual harassment, along with a description of the occurrence. For example, the description "My college was nearby. This happened all the time. Guys passing comments, staring, trying to touch. Frustrating" is positive for three classes: commenting, ogling/staring, and touching/groping.
We use CNN-RNN architectures (with character-level CNN embeddings and bidirectional RNNs) to classify the three forms of sexual harassment mentioned above using both single- and multi-label setups. Our models achieve strong performances of 80-86% on these setups. This automatic classification of different forms of sexual harassment can help victims and authorities to partially automate and speed up the process of filling online sexual violence reporting forms (see Figure 1), which usually requires the victim to detail each form of sexual harassment that took place. The act of partially filling out the report (by our classifier) in itself makes it more likely for the victim to file a report. A study by the Bureau of Justice found that victims who report sexual assault are more likely to seek medical treatment for injuries, which also allows for more immediate prosecution and a better chance of finding DNA evidence to convict the offender (Rennison, 2002). Further, it can also be used to fulfill the need to automatically categorize and summarize large numbers of online testimonials describing or reporting sexual harassment.
Next, in order to further utilize these stories as an important tool for harassment understanding and to help prevent similar situations from happening to others, we present interpretability analysis of our neural classification results in the forms of LIME analysis, first-derivative saliency heatmaps, activation clustering, and t-SNE embedding visualization. We show how these analysis techniques hold promise as avenues for future work and can potentially provide insightful clues towards building (1) a tool to analyze the most common circumstances around each distinct form of harassment to provide more detailed and accurate safety advice, (2) a map of unsafe areas to help others avoid dangerous spaces, and (3) an unofficial sex offender registry that marks frequently-mentioned offenders to warn potential victims. This paper seeks to provide an avenue to utilize the millions of stories shared on social media describing instances of sexual harassment, including #MeToo, #WhyILeft, and #YesAllWomen. With this task and analysis, we hope that these stories can be used to prevent future sexual harassment.

Related Work
Analyzing personal sexual harassment stories from online social forums is fairly unexplored, to the best of our knowledge. However, recent works in a similar vein include detecting the presence of domestic abuse stories on social media sites (Schrading et al., 2015a; Schrading, 2015; Schrading et al., 2015b). In more distantly related work, NLP has been used for various socially-driven tasks, such as detecting the presence of cyberbullying or incivility (Ziegele et al., 2018; Founta et al., 2018; Chen et al., 2012; Zhao et al., 2016; Agrawal and Awekar, 2018; Van Hee et al., 2018), and detecting and providing aid for signs of depression or suicidal thoughts (Pestian et al., 2010; Yazdavar et al., 2017; Stepanov et al., 2017; Fitzpatrick et al., 2017).

Classification Models
For our single-label binary classification task, the two output classes can be [commenting, non-commenting], [ogling, non-ogling], or [groping, non-groping]. For our multi-label scenario, there are a total of 8 combinations (true or false for each of the three types of sexual harassment), including the combination in which none of the three classes is present in the description.
CNN: For each input description, an embedding layer and a convolutional layer are applied, followed by a max-pooling layer (Collobert et al., 2011). Filters of varying window sizes are applied to each window of word vectors, and the result is passed through a softmax layer to produce probabilities over the output classes.
LSTM-RNN: As CNNs are not designed to capture sequential relationships (Pascanu et al., 2014), we adopted an RNN model in which word vectors are fed into an LSTM layer, the final state of which is fed into a fully-connected layer. The result is passed through a softmax layer to output the probability over all output classes.
CNN-RNN: As both models have strengths and weaknesses, we experimented with a hybrid architecture in which the LSTM-RNN model (everything after its embedding layer) is stacked on top of the CNN model, taking the convolutional output before max-pooling as its input (related to Zhou et al. (2015)). For single-label models, the final fully-connected layer is fed into a softmax to give final output probabilities.

Multi-Label Classification
We also present multi-label classification (Boutell et al., 2004; Tsoumakas and Katakis, 2006; Katakis et al., 2008), which allows models to predict multiple categories simultaneously for the same input. We further utilized CNN-based character embeddings in addition to word embeddings, and also employed bidirectional RNNs (see Figure 2). The outputs of the final fully-connected layer (F) are fed into a sigmoid function. The classification for each category (C) is positive (1) if the sigmoid output is at or above a threshold t, a hyperparameter, and negative (0) if it is below t, giving the equation $C = \mathbb{1}(\sigma(F) \geq t)$.
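As an illustration of the thresholding rule above, the following minimal sketch applies the sigmoid and the threshold independently to each label. The logit values and the threshold of 0.5 are hypothetical and chosen only for the example; they are not values from our experiments.

```python
import math

LABELS = ["commenting", "ogling", "groping"]


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Apply C = 1(sigmoid(F) >= t) independently to each label's output."""
    return [1 if sigmoid(f) >= threshold else 0 for f in logits]


# Hypothetical outputs F of the final fully-connected layer for one description.
example_logits = [2.0, -1.0, 0.3]
decisions = predict_labels(example_logits, threshold=0.5)
predicted = [name for name, c in zip(LABELS, decisions) if c == 1]
print(decisions, predicted)
```

Raising the threshold makes each label harder to assign, which trades recall for precision on a per-label basis.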

Dataset
SafeCity is, to the best of our knowledge, the largest publicly-available online forum for reporting sexual harassment. Its motto is "pin the creeps". Victims of sexual harassment share personal stories, with the objective of spreading awareness of ongoing sexual harassment and showcasing location-based trends. The language styles of SafeCity forums are very diverse, and therefore can potentially be used for a variety of test cases, such as emails or tweets.
Each of the 9,892 stories includes a description of the incident, the location, and tagged forms of harassment, with all identifying information removed. SafeCity has explicitly given us permission to use this data. The dataset contains text descriptions submitted by forum users, along with tags for 13 forms of sexual harassment. We chose the three densest categories (groping/touching, staring/ogling, and commenting) as our dataset, as the others were more sparse. Each description may fall into none, some, or all of the categories.

Evaluation
The single-label models were evaluated using accuracy. The multi-label models were evaluated using exact match ratio and Hamming score (calculated as the complement of Hamming loss). Hamming loss was used as detailed by Tsoumakas and Katakis (2006). Hamming loss is equal to 1 over |D| (number of multi-label samples), multiplied by the sum of the symmetric differences between the predictions (Z) and the true labels (Y), divided by the number of labels (L), giving $\mathrm{HammingLoss} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \,\Delta\, Z_i|}{L}$.
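The two multi-label metrics can be sketched in a few lines. The label matrices below are hypothetical toy data (columns are commenting, ogling, groping), used only to show how the quantities are computed.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots where prediction and truth disagree."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    mismatches = sum(
        sum(1 for t, p in zip(true_row, pred_row) if t != p)
        for true_row, pred_row in zip(y_true, y_pred)
    )
    return mismatches / (n_samples * n_labels)


def hamming_score(y_true, y_pred):
    """Complement of Hamming loss."""
    return 1.0 - hamming_loss(y_true, y_pred)


def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Hypothetical true and predicted label vectors for four descriptions.
y_true = [[1, 1, 1], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
y_pred = [[1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]

print(hamming_score(y_true, y_pred), exact_match_ratio(y_true, y_pred))
```

In this toy case two of the twelve label slots are wrong, so the Hamming score is 10/12, while only two of the four descriptions match exactly. Exact match ratio is the stricter of the two metrics, since a single wrong label counts the whole description as an error.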

Training Details
All models have a vocabulary size of 10,000 and use AdamOptimizer (Kingma and Ba, 2015) with a learning rate of 1e-4. All gradient norms are clipped to 2.0 (Pascanu et al., 2013; Graves, 2013).
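For readers unfamiliar with gradient norm clipping, the sketch below rescales a set of gradients so that their norm does not exceed 2.0. It assumes clipping by the global norm over all gradient vectors, which is one common variant and an assumption here; the gradient values are hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm=2.0):
    """Rescale all gradient vectors so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= max_norm:
        return [list(vec) for vec in grads], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads], global_norm


# Hypothetical gradients for two parameter vectors; their joint norm is 5.0.
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=2.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Clipping leaves the direction of the update unchanged and only shortens it, which helps stabilize training of recurrent layers.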

Results
See Table 1 for single-label results on the selected harassment categories, where CNN-RNN was the best performing model compared to several non-neural and neural baselines. See Table 2 for multi-label classification results, where the Hamming score for the multi-label CNN-RNN model is 82.5%, showing potential for real-world use as well as substantial future research scope.

Analysis
We provide various visualization techniques to analyze our models. Each of these techniques employs a different approach and offers new information or supports previous findings.
[Figure 3. Left: "A man standing too close to me in a semicrowded metro continued to touch me indecently till pushed away." (True label: Groping. Predicted: Groping.) Middle: "The guy at first was staring at me and later started passing cheap comments."]

Word Embedding Visualization
We selected seed words that corresponded to class labels and found the nearest neighbors of each seed word's vector by reducing the dimensionality of the word embeddings using t-SNE (see Table 3) (Maaten and Hinton, 2008). This form of visualization not only ensures that our model has learned appropriate word embeddings, but also demonstrates that each form of sexual harassment has a unique and distinct context. Furthermore, this shows that our model learns related words and concepts for each type of harassment.
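The nearest-neighbor lookup for a seed word can be sketched as follows. This is a simplified illustration that ranks neighbors by cosine similarity directly in the embedding space and omits the t-SNE dimensionality reduction step; the three-dimensional vectors are hypothetical and are not learned embeddings from our model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(seed, embeddings, k=2):
    """Return the k words whose vectors are most similar to the seed word's."""
    seed_vec = embeddings[seed]
    scored = [
        (cosine(seed_vec, vec), word)
        for word, vec in embeddings.items()
        if word != seed
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


# Hypothetical toy embeddings, for illustration only.
toy_embeddings = {
    "groping": [0.9, 0.1, 0.0],
    "touched": [0.8, 0.2, 0.1],
    "grabbed": [0.85, 0.05, 0.1],
    "staring": [0.1, 0.9, 0.1],
    "ogling": [0.05, 0.95, 0.0],
    "whistled": [0.0, 0.1, 0.9],
    "commenting": [0.1, 0.0, 0.95],
}

for seed in ["groping", "ogling", "commenting"]:
    print(seed, nearest_neighbors(seed, toy_embeddings, k=2))
```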
LIME Analysis
LIME analysis (Ribeiro et al., 2016), or Local Interpretable Model-Agnostic Explanation, interprets the local reasoning of a model around an instance. The LIME explanation (ξ) is found by minimizing L, a measure of how unfaithful the interpretable model (g) is in approximating the probability that an input (x) belongs to a certain class (f) in the locally defined area (π_x), summed with a complexity measure Ω, giving $\xi(x) = \operatorname*{argmin}_{g} \; L(f, g, \pi_x) + \Omega(g)$. In Figure 3 (left), the words "touch", "man", and the collective words "indecently till pushed away" are the most important to the local classification of "groping". Furthermore, the word "metro" has importance in the classification, suggesting that this may be a frequent location in which groping takes place. In Figure 3 (middle), the words with the most importance are "comments" and "staring", indicating that ogling may coincide with commenting very frequently. In Figure 3 (right), the words "ogling", "sexual", and "commenting" had the most importance, which further supports the notion that ogling and commenting often occur together. As verified by the data, ogling and commenting together is more common than ogling alone.
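To make the LIME objective concrete, the sketch below fits a locally weighted linear surrogate to a stand-in classifier. Several simplifications are assumptions of this example and not part of our setup: the black-box model is a hypothetical scoring function (`black_box_prob` with made-up word scores), all word-removal masks are enumerated instead of sampled, the distance is the fraction of removed words, and a small ridge penalty plays the role of the complexity term Ω.

```python
import itertools
import math

# Hypothetical stand-in for the black-box classifier f (not our trained model).
WORD_SCORES = {"touch": 2.5, "indecently": 1.5, "metro": 0.8, "man": 0.5}


def black_box_prob(words):
    """Toy P(groping | words): sigmoid over summed per-word scores."""
    score = sum(WORD_SCORES.get(w, 0.0) for w in words) - 2.0
    return 1.0 / (1.0 + math.exp(-score))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lime_word_weights(words, kernel_width=0.75, ridge=1e-3):
    """Fit a weighted linear surrogate g on word-presence masks around x."""
    d = len(words)
    rows, targets, weights = [], [], []
    for mask in itertools.product([0, 1], repeat=d):
        kept = [w for w, keep in zip(words, mask) if keep]
        distance = 1.0 - sum(mask) / d  # fraction of words removed
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))  # pi_x
        rows.append(list(mask) + [1.0])  # last column is the intercept
        targets.append(black_box_prob(kept))
    n = d + 1
    # Weighted ridge normal equations: (X^T W X + ridge * I) beta = X^T W y.
    xtwx = [[0.0] * n for _ in range(n)]
    xtwy = [0.0] * n
    for w, row, y in zip(weights, rows, targets):
        for i in range(n):
            xtwy[i] += w * row[i] * y
            for j in range(n):
                xtwx[i][j] += w * row[i] * row[j]
    for i in range(d):  # penalize word coefficients, not the intercept
        xtwx[i][i] += ridge
    coefs = solve_linear_system(xtwx, xtwy)
    return dict(zip(words, coefs[:d]))


example_words = ["man", "touch", "me", "indecently", "metro"]
word_weights = lime_word_weights(example_words)
ranked_words = sorted(word_weights, key=word_weights.get, reverse=True)
print(ranked_words)
```

The surrogate's coefficients rank the words by their local contribution to the prediction, which is the quantity visualized in the LIME figures. The example assumes each word appears once in the input.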

First Derivative Saliency
Saliency heatmaps (Simonyan et al., 2014; Li et al., 2016) illustrate which words of an input have the biggest impact on the final classification by taking the gradient of the final scores outputted by the neural network (S) with respect to the embedding (E), given the true label (L), giving $\frac{\partial S_L}{\partial E}$. While LIME analysis and first derivative saliency are both used to find word-level contributions, first derivative saliency is model-dependent and gives reasoning behind classification based on the whole model, in contrast to the locally-faithful, model-agnostic LIME analysis technique.
In Figure 4 (left), the word "commenting" and the words "one boy" have the most influence on the classification. The influence of the word "lighting" indicates poor lighting is often present in situations where sexual harassment takes place. In Figure 4 (middle), the classification of "commenting" was most influenced by the word "commenting", followed by the word "age". This suggests the possibility of using descriptors of offenders as a classification tool. Figure 4 (right) is an incorrectly classified example. We see that the word "body", followed by "language", had the most influence on the classification of this example as "commenting". Our model identifies synonyms and hyponyms like the word "language" in relation to the category of commenting. However, the true label was "non-commenting", as the word was not used in a context of sexual language, but rather as "vague language" and "body language".
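The saliency computation can be illustrated on a deliberately tiny model. In the sketch below, the class score is a sum of softplus units over each word's embedding, which is an assumption made so that the gradient has a simple closed form; the weight vector `U` and the two-dimensional embeddings are hypothetical. Each word's saliency is the norm of the gradient of the class score with respect to that word's embedding, and a finite-difference check confirms the analytic gradient.

```python
import math

U = [1.0, -0.5]  # hypothetical weight vector for the "commenting" class

# Hypothetical two-dimensional word embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "he": [0.0, 0.1],
    "kept": [0.1, 0.2],
    "commenting": [2.0, -1.0],
    "loudly": [0.5, 0.0],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def class_score(embedded):
    """Toy class score S: sum over words of softplus(U . E_i)."""
    return sum(math.log1p(math.exp(dot(U, e))) for e in embedded)


def analytic_gradient(embedded):
    """dS/dE_i = sigmoid(U . E_i) * U for each word i."""
    grads = []
    for e in embedded:
        gate = 1.0 / (1.0 + math.exp(-dot(U, e)))
        grads.append([gate * u for u in U])
    return grads


def numerical_gradient(embedded, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for i, vec in enumerate(embedded):
        row = []
        for k in range(len(vec)):
            plus = [list(v) for v in embedded]
            minus = [list(v) for v in embedded]
            plus[i][k] += eps
            minus[i][k] -= eps
            row.append((class_score(plus) - class_score(minus)) / (2 * eps))
        grads.append(row)
    return grads


def word_saliency(embedded):
    """Saliency of each word: L2 norm of the gradient w.r.t. its embedding."""
    return [math.sqrt(sum(g * g for g in grad)) for grad in analytic_gradient(embedded)]


tokens = ["he", "kept", "commenting", "loudly"]
embedded = [TOY_EMBEDDINGS[t] for t in tokens]
saliency = word_saliency(embedded)
most_salient = tokens[saliency.index(max(saliency))]
print(list(zip(tokens, saliency)), most_salient)
```

In a full neural model the same gradient would be obtained by backpropagation through all layers, so the resulting heatmap reflects the whole model and not a local surrogate.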

Activation Clustering
Activation clustering (Girshick et al., 2014; Aubakirova and Bansal, 2016) accesses the activation values of all n neurons and treats the activation values per input as coordinates in n-dimensional space. K-means clustering was performed to group activation clusters and find common themes in these reports. Activation clustering is distinct from both LIME analysis and first derivative saliency in that it finds patterns and clusterings at a description-level.
Circumstances of Harassment: One of the clusters was classified as "ogling": {'a group of boys was standing near us and were making weird expressions and as we moved away they started following'; 'a group of guys lurking around the theater...'}. Another cluster was classified as "commenting": {'a group of men were standing who commented on every girl who passed by the', 'a group of boys were standing there... as we started moving one of them commented on us'}. Both of these clusters contained examples describing circumstances of the harassment, following the pattern of "a group of boys/men were standing/lurking and..." It can be inferred that certain forms of sexual harassment are more likely to happen with large groups of men. Activation clustering can identify the circumstances of harassment, helping potential victims to be better prepared.
Location and Time of Harassment: Some clusters contain examples that point to specific locations of harassment, e.g., a groping cluster: {'i was in the bus and there was this man who purposely fell on me and touched me inappropriately'; 'while traveling in a crowded bus most of the time men try to grind their intimate part over my body'; 'i was in the bus when a man standing tried to put his d**k on my hand'}. Specific locations can also be found: {'the gurgaon sohna road is very unsafe at night especially if you are alone with no street lights'; 'kurla station really gets scary at night once i was trying to get a train from kurla station around 10'; 'mathura highway , not enough lights on the way during nights so is not safe for a individual to journey'}. Notice that the second cluster examples also contain the word "night". With data that contains more specific locations or times of day, activation clusters can serve as an automatic way to map out unsafe areas based on location and time of day.
Identifying Offenders: Examples from another groping cluster include: {'...her step father abused her physically for a year'; 'one of the girl of about 6 years got raped by her own father'; 'it happened at my house my brother harassed me and also misbehaved with me one night its been six months'}. This shows that clusters can point to common relationships or titles for offenders. This phenomenon can be presumed to happen with names of offenders as well. If many reports have been filed around this offender, clusters will form around his/her name. Instead of a case of "he said, she said", activation clustering provides an avenue towards "he said, they said", as clusters form when multiple reports have been filed around the same name.
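The clustering step itself, in which each description's activation values are treated as a point in n-dimensional space and grouped with k-means, can be sketched as follows. The activation vectors and report identifiers are hypothetical, n is set to 3 for readability, and the centroids are initialized from the first k points to keep the example deterministic (an assumption of this sketch).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical activation vectors (one per description), for illustration only.
report_ids = ["bus_1", "road_1", "bus_2", "road_2", "bus_3", "road_3"]
activations = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.8, 0.2, 0.1],
    [0.0, 0.9, 0.2],
    [1.0, 0.0, 0.1],
    [0.2, 0.7, 0.0],
]

assignments, centroids = kmeans(activations, k=2)
clusters = {}
for report, cluster in zip(report_ids, assignments):
    clusters.setdefault(cluster, []).append(report)
print(clusters)
```

Reading the descriptions that fall into the same cluster is what surfaces shared themes such as a location, a time of day, or a type of offender.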
The main purpose of our visualization techniques is to explain what the black-box deep learning models are learning, such as locations, offenders, or times of day. With more detailed data in the future, we may be able to uncover more nuanced circumstances behind harassment.

Conclusion
We presented the novel task of identifying various forms of sexual harassment in personal stories. Our accurate multi-label classification models illustrate the plausibility of automatically filling out incident reports. Using visualization techniques, we found circumstances surrounding forms of harassment and the possibility of automatically identifying safe areas and repeat offenders. In future work, we hope to experiment with the transferability of our model to other datasets to encompass the diverse mediums through which these personal stories are shared. Honoring the courage that these victims demonstrated in sharing their stories online, we use these descriptions not only to help summarize online testimonials and provide more detailed safety advice, but also to help others report similar occurrences to hopefully prevent future sexual harassment from occurring.