HENIN: Learning Heterogeneous Neural Interaction Networks for Explainable Cyberbullying Detection on Social Media

In the computational detection of cyberbullying, existing work has largely focused on building generic classifiers that rely exclusively on text analysis of social media sessions. Despite their empirical success, we argue that a critical missing piece is model explainability, i.e., why a particular media session is detected as cyberbullying. In this paper, therefore, we propose a novel deep model, HEterogeneous Neural Interaction Networks (HENIN), for explainable cyberbullying detection. HENIN contains the following components: a comment encoder, a post-comment co-attention sub-network, and session-session and post-post interaction extractors. Extensive experiments conducted on real datasets not only exhibit the promising performance of HENIN, but also highlight evidential comments so that one can understand why a media session is identified as cyberbullying.


Introduction
In recent years, cyberbullying has become one of the most pressing online risks among youth and has raised serious concerns in society. Cyberbullying is commonly defined as the electronic transmission of insulting or embarrassing comments, photos, or videos, as illustrated in Figure 1. Harmful bullying behavior can include posting rumors, threats, pejorative labels, and sexual remarks. Research from the American Psychological Association and the White House has revealed that more than 40% of young people in the US indicate that they have been bullied on social media platforms (Dinakar et al., 2012). The growing prevalence of cyberbullying on social media has detrimental societal effects: victims may experience lower self-esteem, increased suicidal ideation, and a variety of negative emotional responses (Hinduja and Patchin, 2014). Therefore, it has become critically important to be able to detect and prevent cyberbullying on social media. Research in computer science has aimed at identifying, predicting, and ultimately preventing cyberbullying through a better understanding of the nature and key characteristics of online cyberbullying.
In the literature, existing efforts toward automatically detecting cyberbullying have primarily focused on textual analysis of user comments, including keywords (Nahar et al., 2013; Nand et al., 2016) and sentiment analysis (Dani et al., 2017). These studies attempt to build a generic binary classifier that takes high-dimensional text features as input and makes predictions accordingly. Despite their satisfactory detection performance in practice, these models largely overlook the temporal information of cyberbullying behaviors. They also ignore user interactions in social networks. Furthermore, the majority of these methods focus on detecting cyberbullying sessions effectively but cannot explain "why" a media session was detected as cyberbullying. Given a sequence of comments with user attributes, we think sequential learning can allow us to better exploit and model the evolution of and correlations among individual comments. In addition, graph-based learning can enable us to represent and learn how users interact with each other in a session.
This work aims to detect cyberbullying by jointly exploring explainable information from user comments on social media. To this end, we build an explainable cyberbullying detection framework, HEterogeneous Neural Interaction Networks (HENIN), through a coherent process. HENIN consists of three main components that learn various interactions among heterogeneous information displayed in social media sessions. A comment encoder is created to learn the representations of user comments through a hierarchical self-attention neural network so that the semantic and syntactic cues on cyberbullying can be captured. We create a post-comment co-attention mechanism to learn the interactions between a posted text and its comments. Moreover, two graph convolutional networks are leveraged to learn the latent representations depicting how sessions interact with one another in terms of users, and how posts are correlated with each other in terms of words.
Specifically, we address several challenges in this work: (a) how to perform explainable cyberbullying detection that can boost detection performance, (b) how to highlight explainable comments without the ground truth, (c) how to model the correlation between posted text and user comments, and (d) how to model the interactions between sessions in terms of users, and the interactions between textual posts in terms of words. Our solutions to these challenges result in a novel framework HENIN.
Our contributions are summarized as follows. (1) We study a novel problem of explainable cyberbullying detection on social media. (2) We provide a novel model, HENIN¹, which jointly exploits posted text, user comments, and the interactions between sessions and between posts to learn the latent representations for cyberbullying detection. (3) Experiments conducted on Instagram and Vine datasets exhibit the promising performance of HENIN, as well as the evidential comments and words that HENIN highlights when detecting cyberbullying media sessions with explanations.

Related Work
Relevant studies can be categorized into social context-based and user comment-based approaches. Social context-based approaches utilize three categories of features: user-based, post-based, and network-based. (a) Post-based features rely on text analysis to identify evidence of cyberbullying (e.g., profane words) on social media (Nahar et al., 2013; Nand et al., 2016). Xu et al. (2012) point out that Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) can be used to learn latent representations of posts. In addition, SICD (Dani et al., 2017) further models post sentiments for cyberbullying detection. (b) User-based features are extracted from user profiles to measure their characteristics. Gender-specific features, a user's past posts, account registration time, and frequently used words are useful user-based features (Dadvar et al., 2013). (c) Existing studies (Cheng et al., 2019b; Tu et al., 2018; Wang et al., 2017) also show that network-based features are effective in detecting cyberbullying. These features are learned by constructing propagation networks or interaction networks that depict how posts are spread and how users interact with each other. User comment-based approaches utilize the sequence of user comments to detect cyberbullying of the source post. CONcISE (Yao et al., 2019) is a sequential hypothesis testing method conducted on the comment sequence to select the significant comment features. Raisi and Huang (2018) detect harassment-based cyberbullying by identifying expert-provided key phrases from user comments.
¹ The code of the HENIN model is available at: https://github.com/HsinYu7330/HENIN

Problem Statement
Let $S = \{s_1, s_2, ..., s_M\}$ denote a corpus of $M$ social media sessions. Each media session contains the posted text and its subsequent comments. Let $P$ be a posted text consisting of $N$ words $\{w_1, w_2, ..., w_N\}$, and let $C = \{c_1, c_2, ..., c_T\}$ be a set of $T$ comments related to the post $P$, where each comment $c_j = \{w^j_1, w^j_2, ..., w^j_{Q_j}\}$ contains $Q_j$ words. Let $G_{ss} = (V_S, E_S)$ be a session-session weighted graph, in which we consider each media session as a node $s \in V_S$ and the similarity between sessions as an edge weight $e_{(s_i, s_j)} \in E_S$. Let $G_{pp} = (V_P, E_P)$ be a post-post weighted graph, in which we consider each posted text as a node $p \in V_P$ and the similarity between posts as an edge weight $e_{(p_i, p_j)} \in E_P$. We treat cyberbullying detection as a binary classification problem, i.e., each media session is associated with a binary label $y \in \{0, 1\}$, with 1 representing a bullying session and 0 representing a non-bullying session. At the same time, we aim to learn a ranked list $RC$ over all comments in $\{c_j\}_{j=1}^{T}$ according to their degree of explainability, where $RC_k$ denotes the $k$-th most explainable comment. The explainability of a comment denotes how strongly it influences the decision on whether the media session is cyberbullying or not. Formally, we define the problem of Explainable Cyberbullying Detection as follows.
Problem: Given a posted text $P$, a set of related comments $C$, the session graph $G_{ss}$, and the post graph $G_{pp}$, the goal is to learn a cyberbullying detection function $f: (P, C, G_{ss}, G_{pp}) \to (\hat{y}, RC)$ that maximizes the prediction accuracy with explainable comments ranked highest in $RC$.
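As a reading aid, the input and output of the detection function can be pictured with a small Python sketch. The `MediaSession` container, the `rank_comments` and `to_output` helpers, the 0.5 decision threshold, and all numbers are illustrative assumptions rather than part of the HENIN code; the sketch only shows how a predicted probability and per-comment weights would map to the pair of a label and a ranked comment list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MediaSession:
    """Hypothetical container for one media session."""
    post: List[str]            # words of the posted text P
    comments: List[List[str]]  # T comments, each a list of words


def rank_comments(attention: List[float]) -> List[int]:
    """Return comment indices ordered from highest to lowest weight."""
    return sorted(range(len(attention)), key=lambda j: attention[j], reverse=True)


def to_output(prob: float, attention: List[float],
              threshold: float = 0.5) -> Tuple[int, List[int]]:
    """Map a bullying probability and comment weights to (y_hat, RC)."""
    y_hat = 1 if prob >= threshold else 0
    return y_hat, rank_comments(attention)


session = MediaSession(
    post=["nice", "pic"],
    comments=[["so", "cute"], ["you", "are", "ugly"], ["lol"]],
)
# Made-up numbers standing in for the outputs of a trained model.
y_hat, rc = to_output(prob=0.83, attention=[0.10, 0.65, 0.25])
print(y_hat, rc)
```

Here `rc[0]` is the index of the comment that the (assumed) weights mark as most explainable.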

The proposed HENIN Model
In this section, we present the details of the proposed HENIN, which jointly learns hierarchical self-attention and graph convolutional neural networks for cyberbullying detection. It consists of four major components (Figure 2): (1) a comment encoder (including word-level and sentence-level encoders), (2) a post-comment co-attention mechanism, (3) session-session and post-post interaction extractors, and (4) a cyberbullying prediction component.
The comment encoder maps the linguistic features of comments to latent representations through hierarchical word-level and sentence-level self-attention networks. The explainability degree of comments is learned through the attention weights within sentence-level self-attention learning. The post-comment co-attention mechanism operates at the level of word embeddings, and learns the mutual interactions between the posted text and comments. The session-session interaction extractor and the post-post interaction extractor, in turn, aim at modeling how users interact across media sessions and how words are correlated across posts, through two graph convolutional neural networks. Finally, the cyberbullying prediction is made by concatenating the representations of the aforementioned three components.

Comment Encoding
A set of comments related to the given media session contains linguistic cues at the word and sentence levels. Textual usages in comments carry different degrees of importance for explaining why the session is detected as cyberbullying. For example, consider the comment "how the fuck are you even a fucking fan you cunt if you just talk shit about harry fuck you kaitlyn!" from a cyberbullying media session extracted from the Instagram dataset (see Section 5.1). The words "fuck" and "shit" contribute stronger signals of apparent and evidential emotion than the other words. Meanwhile, this comment strongly expresses malicious remarks to someone, and therefore it is not only more explainable but also useful to determine whether it is a cyberbullying session.
Several studies have shown that hierarchical attention neural networks can learn improved document representations that highlight important words and sentences for classification (Cheng et al., 2019a). Inspired by these studies, we adopt a hierarchical neural network to model word-level and sentence-level representations through self-attention mechanisms. Specifically, we first learn the comment embedding vector by utilizing the word encoder with self-attention. Then we learn the comment representations through the sentence encoder with self-attention.
Word Encoder. Given a comment $c_j$ with $m$ words, we first embed the words into a latent space via the pre-trained word2vec model (Mikolov et al., 2013). Then we capture the contextual relations among words within the comment by calculating scaled dot-product attention (Vaswani et al., 2017). Specifically, let the word embeddings be the input vectors $x_i$. The query vector sequence $q_i$, the key vector sequence $k_i$, and the value vector sequence $v_i$ can be obtained by linear transformation, i.e., $q_i = w_q x_i$, $k_i = w_k x_i$, and $v_i = w_v x_i$, where $w_q$, $w_k$, and $w_v$ are the learnable parameters of the network. Next we compute the dot products of the query with all keys, divide each by $\sqrt{d_k}$ ($d_k$ is the dimension of the keys), and apply a softmax function to obtain the attention weights on the values: $a_i = \mathrm{softmax}(q_i k_i^\top / \sqrt{d_k})$, where $a_i$ is an attention weight vector that measures the importance of each word in the comment. Finally, each word's hidden representation can be obtained by computing the dot product of the attention weights $a_i$ and the value vector sequence $v_i$. We take the average of the learned representations to generate the comment vector $c_j$, given by: $c_j = \frac{1}{m} \sum_{i=1}^{m} a_i v_i$.
Sentence Encoder. Similar to the word encoder, we utilize scaled dot-product attention to encode each media session. The aim is to capture the context information at the sentence level, and to generate the media session representation of post $P_i$, denoted by $s_i$, from the learned comment embedding vectors $\{c_1, c_2, ..., c_k\}$. Every post's sentence embedding $s$ will be used as a feature for cyberbullying prediction.
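The word encoder can be illustrated with a minimal pure-Python sketch of scaled dot-product self-attention followed by averaging. This is a sketch under stated assumptions, not the authors' implementation: the function names (`linear`, `encode_comment`), the row-vector convention (each embedding is multiplied on the right by a weight matrix), and the toy identity weights are all made up for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(x, W):
    """Row vector x (length d) times matrix W (d x d_out)."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]


def encode_comment(X, Wq, Wk, Wv):
    """Self-attention over the word embeddings X of one comment.

    Returns the averaged comment vector and the per-word attention weights.
    """
    Q = [linear(x, Wq) for x in X]
    K = [linear(x, Wk) for x in X]
    V = [linear(x, Wv) for x in X]
    d_k = len(K[0])
    hidden, weights = [], []
    for q in Q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        a = softmax(scores)
        weights.append(a)
        # Attention-weighted sum of the value vectors.
        hidden.append([sum(a[t] * V[t][c] for t in range(len(V)))
                       for c in range(len(V[0]))])
    m = len(hidden)
    comment_vec = [sum(h[c] for h in hidden) / m for c in range(len(hidden[0]))]
    return comment_vec, weights


# Toy example: two 2-dimensional word embeddings and identity projections.
I2 = [[1.0, 0.0], [0.0, 1.0]]
c_vec, attn = encode_comment([[1.0, 0.0], [0.0, 1.0]], I2, I2, I2)
print(c_vec, attn)
```

The same routine could be applied a second time to the comment vectors of a session to obtain a session-level vector, which mirrors the role of the sentence encoder.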

Post-Comment Co-attention Mechanism
To model the interaction between posted text and comments, we propose a post-comment co-attention mechanism that learns the semantic word-level correlation between posted text and comments. That is, we intend to simultaneously learn and derive the attention weights of words in the posted text and in the comments. First, similar to comment encoding, word embeddings of a posted text are obtained by a pre-trained word2vec model. We adopt recurrent neural networks with bidirectional gated recurrent units (GRU) to model word sequences from both directions. The bidirectional GRU contains the forward GRU $\overrightarrow{f}$ that reads posted text $p_i$ from word $w^i_1$ to $w^i_m$ and the backward GRU $\overleftarrow{f}$ that reads posted text $p_i$ from word $w^i_m$ to $w^i_1$, given by: $\overrightarrow{h}^i_t = \overrightarrow{f}(w^i_t)$, $t \in \{1, ..., m\}$, and $\overleftarrow{h}^i_t = \overleftarrow{f}(w^i_t)$, $t \in \{m, ..., 1\}$. We obtain the embedding of word $p^i_t$ in a posted text by concatenating its forward and backward hidden states, i.e., $p^i_t = [\overrightarrow{h}^i_t, \overleftarrow{h}^i_t]$. Then we can construct the feature matrix of the words of the posted text, $P = [p_1, ..., p_N]$. Similarly, the feature matrix of comments $C = [c_1, ..., c_T]$ can be derived.
The proposed co-attention mechanism attends to the posted text words and the comments simultaneously. By extending the co-attention formulation (Lu et al., 2016; Cui et al., 2019), we first compute the affinity matrix $L \in \mathbb{R}^{T \times N}$: $L = \tanh(C^\top W_l P)$, where $W_l$ is a matrix of learnable weights. The affinity matrix $L$ is used to transform the comment attention space to the posted text attention space, and vice versa for $L^\top$. As a result, we can consider the affinity matrix as a feature matrix, and learn to predict the posted text and comment attention maps $H_p$ and $H_c$ as follows: $H_p = \tanh(W_p P + (W_c C) L)$ and $H_c = \tanh(W_c C + (W_p P) L^\top)$, where $W_p$ and $W_c$ are matrices of learnable parameters. The attention weights of posted text and comments, $a_p$ and $a_c$, can be obtained by: $a_p = \mathrm{softmax}(w_{hp}^\top H_p)$ and $a_c = \mathrm{softmax}(w_{hc}^\top H_c)$, where $w_{hp}$ and $w_{hc}$ are vectors of learnable weight parameters. Based on the above attention weights, the posted text and comment attention vectors are obtained by calculating the weighted sums of the posted text features and comment features via: $\hat{p} = \sum_{i=1}^{N} a^p_i p_i$ and $\hat{c} = \sum_{j=1}^{T} a^c_j c_j$, where $\hat{p}$ and $\hat{c}$ are the learned feature vectors for posted text and comments, respectively, through the co-attention mechanism.
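To make the matrix shapes in the co-attention computation concrete, the following pure-Python sketch walks through the affinity matrix, the two attention maps, and the weighted sums. It is a sketch under assumptions: the feature matrices store one column per word or comment, the transposes follow the usual co-attention formulation of Lu et al. (2016), and the helper names and all numeric values are made up.

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def tanh_m(A):
    return [[math.tanh(v) for v in row] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def co_attention(P, C, Wl, Wp, Wc, w_hp, w_hc):
    """P is d x N (post words), C is d x T (comments)."""
    L = tanh_m(matmul(matmul(transpose(C), Wl), P))                       # T x N
    Hp = tanh_m(add(matmul(Wp, P), matmul(matmul(Wc, C), L)))             # k x N
    Hc = tanh_m(add(matmul(Wc, C), matmul(matmul(Wp, P), transpose(L))))  # k x T
    a_p = softmax(matmul([w_hp], Hp)[0])   # one weight per post word
    a_c = softmax(matmul([w_hc], Hc)[0])   # one weight per comment
    # Weighted sums of the columns of P and C.
    p_hat = [sum(a_p[i] * P[r][i] for i in range(len(a_p))) for r in range(len(P))]
    c_hat = [sum(a_c[j] * C[r][j] for j in range(len(a_c))) for r in range(len(C))]
    return p_hat, c_hat, a_p, a_c


# Toy shapes: d = 2 features, N = 3 post words, T = 2 comments, k = 2.
P = [[0.1, 0.4, -0.2], [0.3, -0.1, 0.5]]
C = [[0.2, -0.3], [0.6, 0.1]]
Wl = [[1.0, 0.0], [0.0, 1.0]]
Wp = [[0.5, 0.1], [-0.2, 0.3]]
Wc = [[0.4, -0.1], [0.2, 0.2]]
p_hat, c_hat, a_p, a_c = co_attention(P, C, Wl, Wp, Wc, [0.7, -0.5], [0.3, 0.9])
print(a_p, a_c)
```

The weights `a_c` give one score per comment, which is the kind of quantity a reader could inspect to see which comments the co-attention step emphasizes.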

Interaction Extractors
To learn and represent the potential interactions between two sessions as well as two text posts, we utilize multilayer neural networks that operate on graph data based on the layers of graph convolutional networks (GCN) (Kipf and Welling, 2016). GCN is able to induce embedding vectors of nodes based on features of their neighborhoods. We create two multi-layer GCNs to learn the embeddings of the given session s i and its posted text P i from the session-session graph G ss and the post-post graph G pp , respectively.
Session-session Interaction Extractor. Let $X = (x_1, x_2, ..., x_n) \in \mathbb{R}^{n \times p}$ be the vectors of user participation in all sessions, where $n$ is the number of sessions and $p$ is the number of users. Each vector $x_i$ is a multi-hot encoding that indicates which users participate in session $s_i$. Let the matrix $R_{ss}$ be the representations of all sessions learned from the session-session graph $G_{ss} = (X, A)$, where $A \in \mathbb{R}^{n \times n}$ encodes the pairwise relationships between sessions (such as cosine similarity, which is used by default). We exploit GCN to learn $R_{ss}$. GCN contains one input layer, several propagation layers, and a final output layer (Kipf and Welling, 2016). At deeper layers, the nodes indirectly receive more information from farther nodes in the graph. Given the input feature matrix $X^{(0)} = X$ and the graph structure matrix $A$, GCN performs the layer-wise propagation in hidden layers via $X^{(k+1)} = \rho(\hat{A} X^{(k)} W^{(k)})$, where $k = 0, 1, ..., K-1$ and $W^{(k)}$ is the matrix of learnable parameters in the $k$-th layer. $\rho$ is a non-linear activation function, such as ReLU, and $X^{(k+1)}$ denotes the activation output of the $k$-th layer. $\hat{A}$ is the normalized symmetric adjacency matrix, $\hat{A} = D^{-1/2} A D^{-1/2}$, where $D$ is the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$. Finally, the graph representations $R_{ss} = [r_{ss}]$ can be obtained from the output layer, which uses softmax as the activation function.
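A single propagation step of the session-session extractor can be sketched in pure Python as follows. The sketch assumes a cosine-similarity adjacency built from multi-hot participation vectors and the standard symmetric normalization of Kipf and Welling; the helper names, the toy participation matrix, and the weight values are hypothetical, and only one hidden ReLU layer is shown (the softmax output layer is omitted).

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity_graph(X):
    """Dense weighted adjacency: A[i][j] = cosine(x_i, x_j)."""
    return [[cosine(xi, xj) for xj in X] for xi in X]


def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(A)
    return [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(A_hat, X, W):
    """One propagation step: ReLU(A_hat X W)."""
    Z = matmul(matmul(A_hat, X), W)
    return [[max(0.0, v) for v in row] for row in Z]


# Three sessions, three users; X[i][u] = 1 if user u commented in session i.
X = [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
A = similarity_graph(X)
A_hat = normalize_adjacency(A)
W0 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # made-up 3 x 2 weights
H1 = gcn_layer(A_hat, X, W0)
print(H1)
```

Stacking several such layers lets each session receive information from sessions that are several similarity hops away.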
Post-post Interaction Extractor. Similar to the session-session interaction extractor, we represent each posted text in the graph $G_{pp}$ as a real-valued vector $x_i$ by using the word embedding vector of post $P_i$ as the initial feature. By applying GCNs as described above, we can derive the graph representations of all posts, denoted by $R_{pp} = [r_{pp}]$.

Cyberbullying Prediction
By concatenating the sentence embedding vector $s$, the post-comment co-attention feature vectors $\hat{p}$ and $\hat{c}$, the session interaction representation $r_{ss}$, and the post interaction representation $r_{pp}$, we generate the prediction via a fully-connected layer, given by: $\hat{y} = \sigma([\hat{p}, \hat{c}, s, r_{ss}, r_{pp}] W_f + b_f)$, where $\hat{y}$ is the predicted probability of label 1 (i.e., cyberbullying), $W_f$ and $b_f$ are the learnable weights and biases, and $\sigma$ is the sigmoid function. Let $y \in \{0, 1\}$ denote the ground-truth label of a media session. The goal is to minimize the cross-entropy loss function: $\mathcal{L}(\Theta) = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})$, where $\Theta$ denotes all parameters of the network. The parameters in the network are learned through the Adam optimizer (Kingma and Ba, 2014), which is an adaptive learning rate method that uses estimates of the first and second moments of the gradient to adapt the learning rate for each weight of the neural network. We choose Adam since it is generally regarded as being fairly robust to the choice of hyperparameters, and it is widely used for training neural networks.
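The prediction step and its loss can be illustrated with a short pure-Python sketch: concatenate the feature vectors, apply one linear layer with a sigmoid, and score the result with binary cross-entropy. The function names, the clipping constant `eps`, and every numeric value below are illustrative assumptions, not trained parameters.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(parts, w_f, b_f):
    """Concatenate feature vectors and apply a single sigmoid unit."""
    features = [v for part in parts for v in part]
    z = sum(x * w for x, w in zip(features, w_f)) + b_f
    return sigmoid(z)


def cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one session; eps avoids log(0)."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


# Made-up 1-dimensional stand-ins for p_hat, c_hat, s, r_ss, r_pp.
parts = [[0.2], [0.7], [0.1], [0.9], [0.4]]
w_f = [0.5, 1.0, -0.3, 0.8, 0.2]
y_hat = predict(parts, w_f, b_f=-0.5)
loss = cross_entropy(1, y_hat)
print(y_hat, loss)
```

In practice the loss would be averaged over a batch of sessions and minimized with a gradient-based optimizer such as Adam.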

Experiments
We aim to answer the following evaluation questions. EQ1: Can HENIN improve the cyberbullying media session classification performance? EQ2: How effective is each component of HENIN? EQ3: Is HENIN able to perform accurate early detection of cyberbullying sessions? EQ4: Can HENIN highlight comments that can explain why a media session is detected as cyberbullying?

Datasets and Settings
We use two social media datasets, whose statistics are shown in Table 1. One is the Instagram dataset, which contains image descriptions and user comments. The other is from Vine, a mobile application that allows users to record and edit looping videos of a few seconds. The texts of both datasets are in English.
We compare our HENIN model with several methods, including classification models such as Logistic Regression (LR) and Random Forest (RF). For these, we collect the posted text and all related comments of a session as a document, and embed the session into a latent space via a pre-trained doc2vec model (Le and Mikolov, 2014). Then we use the session representations as input features to train the LR and RF classifiers. In addition, we compare HENIN with three end-to-end deep learning models: RNN, GRU, and GRU with attention (GRU+A). We also compare HENIN with a recent advance, CONcISE (Yao et al., 2019), which has a sequential hypothesis testing-based mechanism to produce timely and accurate detection of cyberbullying. For a fair comparison with CONcISE, we follow their settings by using their suggested key terms: "ugly", "shut", "suck", "gay", "beautiful", "sick", "bitch", "work", "hate", and "fuck". We provide the hyperparameter settings to enable reproducibility.

Cyberbullying Detection Performance
To answer EQ1, we first compare our HENIN with baseline methods. To evaluate the performance of cyberbullying detection methods, we use the following metrics, which are commonly used to evaluate classifiers: Accuracy (Acc), Precision (Pre), Recall (Rec), and F1-Score (F1). To make the experiments more robust and reliable, we randomly choose 80% of media sessions for training and the remaining 20% for testing. We repeat the process 5 times, and report the average values. The results are shown in Table 2. We find that the proposed HENIN consistently outperforms the competing methods across the two datasets on Accuracy, Recall, and F1, i.e., on all metrics except Precision. Although RF and RNN lead to higher Precision scores on the Instagram and Vine datasets, respectively, their performance on the other metrics is not stable. It is also worth noting that models with attention mechanisms, i.e., HENIN and GRU+A, tend to produce better performance. This implies the importance of modeling contextual correlation and contribution at either the word or the sentence level for the detection of cyberbullying.
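For reference, the four metrics can be computed from binary labels and predictions as in the following sketch. The function name and the toy label vectors are made up for illustration, and label 1 (bullying) is treated as the positive class; the zero-division fallbacks are a convention chosen here.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"Acc": acc, "Pre": pre, "Rec": rec, "F1": f1}


# Toy example: 5 sessions, 3 of them bullying.
scores = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(scores)
```

Averaging these values over the five random 80/20 splits gives numbers of the kind reported in Table 2.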

Ablation Analysis for HENIN
To answer EQ2, we further investigate the effect of each component in the proposed HENIN model by evaluating reduced variants of HENIN, each of which removes one component. The results are shown in Figure 3. The ablation analysis of HENIN brings two insights. First, all three components (i.e., the comment encoder, the session-session and post-post interactions, and the posted text-comment co-attention) contribute clearly to the performance improvement. Second, when the model does not consider the representations learned from session and post interactions, the performance drops by 14% and 9.6% in terms of F1-Score and Accuracy on Instagram, and by 30.7% and 6% on Vine. In other words, the "-G" variants hurt the performance most. The results suggest that modeling interactions between sessions and between posts through GCNs in HENIN is important.

Early Detection of Cyberbullying
To answer EQ3, we examine whether HENIN can accurately detect cyberbullying sessions at early stages. In other words, we aim to understand how a model performs given only a partial proportion of the observed comments. Here we choose GRU as the baseline for comparison. Specifically, for each media session, we sort all comments by response time, and then include various fractions of the comments in the training and testing sets. We utilize Precision@k and Accuracy as the evaluation metrics, where k = 10. The results are shown in Figure 4 and Figure 5. From the figures, we can see that our proposed HENIN achieves much better performance when only a few comments are observed (i.e., the fraction of comments is lower than 40%). In contrast, the GRU model needs at least 50% of the comments on both datasets to reach performance as good as that of HENIN. In short, the results show that HENIN is able to produce quite accurate early detection of cyberbullying sessions.
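The truncation used in this early-detection setup can be sketched as follows. The `(timestamp, text)` tuple layout, the function name, and the rounding rule (round up, keep at least one comment) are assumptions made for illustration; the text above does not specify how fractional counts are rounded.

```python
import math


def observe_fraction(comments, fraction):
    """Keep the earliest `fraction` of a session's comments.

    comments: list of (timestamp, text) pairs in arbitrary order.
    """
    ordered = sorted(comments, key=lambda c: c[0])  # sort by response time
    if not ordered:
        return []
    keep = max(1, math.ceil(fraction * len(ordered)))
    return [text for _, text in ordered[:keep]]


# Toy session with four comments and made-up timestamps.
comments = [(30, "c"), (10, "a"), (20, "b"), (40, "d")]
early = observe_fraction(comments, 0.5)
print(early)
```

A model would then be trained and tested on the truncated sessions for each fraction of interest.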

Explainability and Case Study
Explainability. To answer EQ4, we evaluate the explainability of our HENIN model from the perspective of comments. We choose GRU+A as the baseline for comment explainability since it can learn attention weights for comments as a kind of explainability. Specifically, we want to see if the top-ranked explainable comments determined by our HENIN are more likely to be related to the major contexts in cyberbullying media sessions. We randomly choose 10 media sessions, each containing at least 20 but no more than 50 comments, to evaluate the comment explainability ranking list RC. We then build the ground-truth ranking list by rating each comment with an explainability score from {0, 1, 2, 3, 4}, where 0 means "not explainable at all", 1 means "not explainable", 2 means "neutral", 3 means "somewhat explainable", and 4 means "highly explainable (highly malicious)". We invite three domain experts to perform the ground-truth ratings for every comment. The average rating scores are used to generate the ranking list. Therefore, for each media session, we have two lists of top-k comments, $L^{(1)} = \{L^{(1)}_1, ..., L^{(1)}_k\}$ from HENIN and $L^{(2)} = \{L^{(2)}_1, ..., L^{(2)}_k\}$ from GRU+A, where the top-k comments are selected using the comment attention weights from high to low. To estimate the rank-aware explainability of comments, we utilize Normalized Discounted Cumulative Gain (NDCG) (Järvelin and Kekäläinen, 2002) and Precision@k as the evaluation metrics. We empirically set k = 10.
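The two ranking metrics can be sketched in pure Python as below. The exact variants are assumptions: this sketch uses linear gain with a log2 position discount for NDCG, and defines Precision@k as the overlap between a model's top-k comments and the top-k comments by average expert rating, divided by k. Function names and the toy ratings are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranking, ratings, k):
    """ranking: comment indices from most to least attended.
    ratings: average expert rating of each comment (0 to 4)."""
    gains = [ratings[j] for j in ranking]
    best = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(gains, k) / best if best > 0 else 0.0


def precision_at_k(ranking, ratings, k):
    """Share of the model's top-k comments that are in the expert top-k."""
    truth = sorted(range(len(ratings)), key=lambda j: ratings[j], reverse=True)[:k]
    return len(set(ranking[:k]) & set(truth)) / k


# Toy session with four comments and made-up average ratings.
ratings = [4.0, 0.3, 2.0, 3.3]
good = ndcg_at_k([0, 3, 2, 1], ratings, 3)   # ranking agrees with the experts
poor = ndcg_at_k([1, 2, 3, 0], ratings, 3)   # least explainable comment first
prec = precision_at_k([0, 1, 2, 3], ratings, 2)
print(good, poor, prec)
```

A ranking that matches the expert order attains an NDCG of 1, while placing low-rated comments first lowers the score.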
The results are shown in Figure 6, where media sessions are sorted in descending order by the discrepancy in the metrics between the two methods, i.e., NDCG@k(HENIN) − NDCG@k(GRU+A). From the figures, we make two observations. First, among 10 Vine media sessions, HENIN obtains higher precision scores than GRU+A in 6 cases. The overall mean precision scores over the 10 cases for HENIN and GRU+A are 0.51 and 0.41, respectively. Second, similar results can be found for the NDCG scores. HENIN is superior to GRU+A in 7 cases, and two cases have equal NDCG scores. The overall mean NDCG scores over the 10 cases for HENIN and GRU+A are 0.57 and 0.36, respectively. These results demonstrate that the attention weights of HENIN are able to highlight more evidential comments than GRU+A, and that its explainability can be verified.
Case Study. We further demonstrate the explainable comments that HENIN correctly ranks high but GRU+A misses. These cases are presented in Figure 7. We find that: (1) HENIN ranks more evidential comments higher than non-explainable comments. For example, the top-1 comment "What a bitch tell him to hmu and ill kill his bitch ass for hitting a woman" contains explicit vulgar and malicious text that can explain why this media session is detected as cyberbullying. (2) HENIN gives higher attention weights to explainable comments than to neutral and unrelated comments. For example, the unrelated comment "Court-dawg Jimecia Bandy Donishia Phillips" has an attention weight of 0.070, which is lower than that of the explainable comment "if a bitch hit a nigga wit a object damn right we gon retaliate", with an attention weight of 0.219. Therefore, the latter comment is selected as more important evidence for cyberbullying prediction. In short, HENIN is able to not only accurately detect cyberbullying sessions, but also highlight evidential comments as explanations.

HENIN Hyperparameter Analysis
Since we have shown that the graph-based interactions between sessions and between posts have a great impact on the detection (Section 5.3), we further investigate how different hyperparameters of the GCNs affect the performance. Here we study two hyperparameters. One is the number of GCN layers. The other is the choice of similarity measure used to construct the matrix $A$ for GCN. The results of stacking different numbers of GCN layers are shown in Table 3. We can see that stacking more GCN layers improves the performance by around 1.1% in terms of F1 on Instagram and 2.2% on Vine. The weight matrix $A$ for GCN is obtained by calculating the similarity for all pairs of nodes in the graph. We compare three commonly used similarity measures: cosine similarity, $\cos(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\| \|x_j\|}$, Jaccard similarity, $\mathrm{jac}(x_i, x_j)$, and Euclidean similarity, $\mathrm{euc}(x_i, x_j)$. The results are shown in Table 4. We can see that on the Instagram dataset, using Euclidean similarity improves the performance by 4.9% and 2.8% in terms of F1 and Accuracy, respectively. On the Vine dataset, Jaccard similarity outperforms the other two measures, improving F1 and Accuracy by 1.2% and 1.7%, respectively. The results suggest that the similarity measure used to construct the weight matrix should be chosen per dataset, since it can affect the performance.
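The three similarity measures can be compared on multi-hot participation vectors with the short sketch below. Only the cosine formula is taken from the text; the Jaccard function uses the standard set-based definition (shared users divided by all users in either session), and turning Euclidean distance into a similarity via 1 / (1 + distance) is purely an assumption made here for illustration. All function names are hypothetical.

```python
import math


def cosine_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def jaccard_sim(u, v):
    """Standard Jaccard index on multi-hot vectors (assumed definition)."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return inter / union if union else 0.0


def euclidean_sim(u, v):
    """Assumed mapping from Euclidean distance to a similarity in (0, 1]."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + dist)


def weight_matrix(X, sim):
    """Pairwise similarity matrix A for all nodes."""
    return [[sim(xi, xj) for xj in X] for xi in X]


# Two sessions that share one of three users.
x1, x2 = [1, 1, 0], [0, 1, 1]
A_jac = weight_matrix([x1, x2], jaccard_sim)
print(cosine_sim(x1, x2), jaccard_sim(x1, x2), euclidean_sim(x1, x2))
```

Swapping the `sim` argument of `weight_matrix` is all that is needed to rebuild the graph under a different measure.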

Conclusion
Cyberbullying detection on social media has attracted growing attention in recent years. It is also crucial to understand why a media session is detected as cyberbullying. Thus we study the novel problem of explainable cyberbullying detection, which aims at improving detection performance and highlighting explainable comments. We propose a novel deep learning-based model, HEterogeneous Neural Interaction Networks (HENIN), to learn various feature representations from comment encodings, post-comment co-attention, and graph-based interactions between sessions and posts. Experimental results exhibit both the promising performance and the evidential explanations of HENIN. We also find that the learning of graph-based session-session and post-post interactions contributes most to the performance. Such results can encourage future studies to develop advanced graph neural networks that better represent the interactions between heterogeneous information. In addition, it is worthwhile to further model information propagation and the temporal correlation of comments in the future.