Sentiment Analysis of Tweets Using Heterogeneous Multi-layer Network Representation and Embedding

Sentiment classification on tweets often needs to deal with under-specificity, noise, and multilingual content. This study proposes a heterogeneous multi-layer network-based representation of tweets that generates multiple representations of a tweet to address these issues. The generated representations are ensembled and classified using a neural early-fusion approach. Further, we propose a centrality-aware random walk for node embedding and tweet representation suitable for the multi-layer network. Extensive experimental analysis shows that the proposed method can address the problems of under-specificity, noisy text, and multilingual content present in tweets, and provides better classification performance than its text-based counterparts. Moreover, the proposed centrality-aware random walk provides better representations than unbiased and other biased counterparts.


Introduction
With the growing popularity of Twitter, sentiment analysis of tweets has drawn the attention of several researchers from both academia and industry in recent times. Unlike other regular texts, sentiment analysis of Twitter text poses plenty of challenges because of various characteristics, such as (i) under-specificity due to the text length limit, (ii) free-form writing, such as the presence of user-defined hashtags, mentions, and emoticons, and (iii) noisy text due to the presence of short forms, long forms, multilingual and transliterated text, and misspellings. Researchers try to address these problems by adopting various methods such as task-specific representation learning (Singh et al., 2020; Pham and Le, 2018; Fu et al., 2018; Tang et al., 2016; Kim, 2014) and incorporating additional information such as hashtags.

* Equal contributions.
This paper proposes a novel approach to handle the above issues using a heterogeneous multi-layer network representation of a tweet. A multi-layer network is a network formed by connecting different layers of networks; for example, a heterogeneous multi-layer network can be formed by connecting layers of networks of mentions, hashtags, and keywords. Multi-layer networks have been shown to provide promising performance in other tasks such as community detection and clustering (Hanteer and Rossi, 2019; Luo et al., 2020), node classification (Li et al., 2018; Zitnik and Leskovec, 2017; Ghorbani et al., 2019), and representation learning in graphs (Cen et al., 2019; Ni et al., 2018). A tweet or a collection of tweets can be represented by a multi-layer network. One advantage of a network-based representation is that a network can be expanded by adding nodes or shrunk by removing nodes. The motivations for using a multi-layer network in this paper are as follows: (i) the semantic relations between keywords, hashtags, and mentions can be captured by applying an effective network embedding method, and (ii) noise and under-specificity can be reduced by expanding the network with related nodes or shrinking it after removing unrelated nodes. Further, co-occurring keywords, hashtags, and mentions often share semantic relationships (Wang et al., 2016; Weston et al., 2014; Qadir and Riloff, 2013; Wang et al., 2011).

This paper makes four major contributions. First, it transforms a tweet into a multi-layer network. Second, it proposes a centrality-aware random walk over the multi-layer network. Third, it generates multiple representations of a tweet using the proposed centrality-aware random walk and builds an early-fusion based neural sentiment classifier. Fourth, it addresses under-specificity and noisy text for sentiment classification by expanding or shrinking the network representing the tweets.
Sentiment classification is a domain-dependent task (Karamibekr and Ghorbani, 2012). Therefore, we evaluate the proposed method over datasets from different domains. In extensive experimental evaluations, the proposed method is found to outperform its counterparts in the majority of cases. To the best of our knowledge, this study is the first of its kind to investigate the sentiment classification task by transforming tweets into heterogeneous multi-layer networks.
The rest of the paper is organized as follows. Section 2 presents the literature related to this study. Section 3 presents the proposed framework. The experimental setup is described in Section 4. The results and observations are analyzed in Section 5. Finally, Section 6 concludes the study.

Related studies
Sentiment analysis is a long-standing research area. Initial work on sentiment classification can be traced back to the early 2000s (Turney, 2002; Pang et al., 2002; Turney and Littman, 2003). There have been several paradigm shifts in sentiment analysis methods: from statistical methods (Turney, 2002; Pang et al., 2002; Turney and Littman, 2003) to rule-based (Prabowo and Thelwall, 2009), lexicon-based (Taboada et al., 2011; Balamurali et al., 2011; Mohammad et al., 2009), feature-based (Kouloumpis et al., 2011; Barbosa and Feng, 2010), and deep neural network approaches (Kim, 2014; Severyn and Moschitti, 2015). The majority of recent studies focus on the application of neural network models. Therefore, this section briefly reviews a few of the recent and related studies that have exploited graph and neural models.
The authors of (Violos et al., 2016) use a homogeneous network known as a word graph to represent a document by connecting co-occurring words in the document. Three different networks are created for the positive, negative, and neutral classes using the documents in the respective classes. Using these networks, a document is represented by a three-dimensional vector defined by the three sentiment classes, whose elements correspond to the similarity between the word graph of the document and the word graph of the respective sentiment class. The vector thus obtained is used for classifying the document. Similarly, the authors of (Bijari et al., 2020) construct a co-occurrence word graph of a document collection and generate word embeddings using Node2Vec (Grover and Leskovec, 2016). The embeddings thus obtained are used to represent words in the text and build a classifier using a Convolutional Neural Network (CNN) model. Further, the studies (Gui et al., 2017; Zhao et al., 2017) show the advantages of exploiting the relationships between keywords, sentiment, products, and users in sentiment analysis.

Proposed framework
As mentioned earlier, the proposed method has four distinct components: (i) representation of a tweet or a collection of tweets using a multi-layer network, (ii) a centrality-aware random walk over the multi-layer network, (iii) tweet classification using multiple representations generated from the multi-layer network of a tweet, and (iv) reduction of noise in a tweet by expanding or shrinking the network. This section discusses the details of these components. Figure 1 shows a high-level schematic diagram of the proposed model using a heterogeneous multi-layer network.

Representation of tweets using multi-layer network

To capture both the co-occurrence and sequential characteristics of keywords, hashtags, and mentions in a tweet, the proposed network consists of both directed and undirected edges. An edge $e_{x,y} \in E$ is directed if x and y occur sequentially next to each other in a tweet, and undirected if x and y co-occur in a tweet. An example of the proposed multi-layer network for the tweet "@asadmunir38 Modi is agressive since #UriAttack, #BurhanWani & PM speech @UNGAPak needs to start dialogue with neighbours India, Afghan" is shown in Figure 1. The edge set $E = \{A \cup B\}$ comprises a set of intra-layer adjacency matrices $A = \{A_1, A_2, \ldots, A_L\}$, with matrix $A_i \in \mathbb{R}^{N_i \times N_i}$ in each layer $i$, and a set of bipartite matrices $B_{i,j} \in \mathbb{R}^{N_i \times N_j}$ representing the cross-layer associations between layer $i$ and layer $j$.
The supra-adjacency matrix S of the multi-layer network stacks these matrices in block form,

$$S = \begin{bmatrix} A_H & B_{HM} & B_{HK} \\ B_{MH} & A_M & B_{MK} \\ B_{KH} & B_{KM} & A_K \end{bmatrix} \quad (1)$$

where H, M, and K denote the hashtag, mention, and keyword layers, respectively. The intra-layer associations $A_i$ lie on the main diagonal of S, and the cross-layer connections $B_{i,j}$ lie on the off-diagonal blocks. Further, $A_K$, $B_{HK}$, $B_{KH}$, $B_{MK}$, and $B_{KM}$ are asymmetric matrices, while the other blocks of S are symmetric. A tweet or a collection of tweets can thus be represented as a multi-layer network, as discussed above.
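As an illustration, a minimal sketch of building such a multi-layer tweet network from a tokenized tweet might look like the following (the token-to-layer rule and the edge policy here are simplified assumptions, not the paper's exact implementation):

```python
from itertools import combinations

def build_multilayer_network(tokens):
    """Build a toy heterogeneous multi-layer network from tweet tokens.
    Layers: H (hashtags), M (mentions), K (keywords).
    Directed edges link sequentially adjacent tokens; undirected
    (symmetric) edges link every co-occurring token pair."""
    def layer(tok):
        return "H" if tok.startswith("#") else "M" if tok.startswith("@") else "K"

    nodes = {t: layer(t) for t in tokens}
    directed = {(a, b) for a, b in zip(tokens, tokens[1:])}
    undirected = set()
    for a, b in combinations(set(tokens), 2):
        undirected.add((a, b))
        undirected.add((b, a))
    return nodes, directed, undirected

nodes, directed, undirected = build_multilayer_network(
    ["@asadmunir38", "modi", "aggressive", "#uriattack"])
print(nodes["#uriattack"])                 # 'H'
print(("modi", "aggressive") in directed)  # True
```

In a full implementation, the per-layer adjacency matrices $A_i$ and the bipartite matrices $B_{i,j}$ of Equation 1 would be assembled from these node and edge sets.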
Centrality aware random-walk with restart for heterogeneous multi-layer network

To generate random-walk sequences from the proposed multi-layer tweet network, we extend the random walk used in the PageRank algorithm (Brin and Page, 1998). Given a row-stochastic adjacency matrix A of a network, the PageRank of the nodes in the network can be defined by the vector

$$\pi_{t} = (1-\delta)\, A^{\top} \pi_{t-1} + \delta\, \pi_0 \quad (2)$$
where $\pi_t$ is the stationary probability distribution vector giving the probability that a random walker stays at a particular node at time t, the restart probability δ ∈ [0, 1] denotes the probability of jumping to a random node, and $\pi_0$ is the initial stationary probability vector. As in (Li and Patra, 2010), the above random walk can be extended to our heterogeneous multi-layer tweet network in the following manner. If λ ∈ (0, 1) is the probability that a random walker jumps to a different layer while surfing, then with L layers and assuming jumps to any of the remaining layers are equiprobable, the transition probability matrix M, i.e., the column-normalized supra-adjacency matrix S in Equation 1, is modified as

$$M_{ij} = \begin{cases} (1-\lambda)\, \bar{S}_{ij} & \text{if nodes } i \text{ and } j \text{ are in the same layer} \\ \frac{\lambda}{L-1}\, \bar{S}_{ij} & \text{otherwise} \end{cases} \quad (3)$$

where $\bar{S}$ denotes the column-normalized S. That is, for a node whose bipartite association exists, a random surfer can stay in the same layer with probability (1 − λ) or transit to a different layer with probability λ/(L − 1). Now, Equation 2 can be re-written as

$$\pi_t = (1-\delta)\, M\, \pi_{t-1} + \delta\, \pi_{rs}, \qquad \pi_{rs} = \sum_{i \in \{H, M, K\}} \eta_i\, \pi_0^i \quad (4)$$

where $\eta_i$ denotes the importance of layer i, $\pi_0^i$ denotes the initial stationary distribution of nodes in layer i, and $\sum_{i \in \{H,M,K\}} \eta_i = 1$. Here, $\pi_t \in \mathbb{R}^{(N_H + N_M + N_K)}$ is the stationary probability distribution of a random surfer on the heterogeneous multi-layer network at time t.
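A compact NumPy sketch of this multi-layer walk with restart (power iteration of Equation 4 on a toy supra-adjacency matrix, with a uniform restart vector; the layer reweighting and normalization details are simplified assumptions):

```python
import numpy as np

def multilayer_rwr(S, layers, lam=0.3, delta=0.15, iters=100):
    """Random walk with restart on a supra-adjacency matrix S.
    Intra-layer transitions are weighted by (1 - lam), cross-layer
    ones by lam / (L - 1); delta is the restart probability."""
    n = S.shape[0]
    L = len(set(layers))
    W = np.zeros_like(S, dtype=float)
    for i in range(n):
        for j in range(n):
            w = (1 - lam) if layers[i] == layers[j] else lam / (L - 1)
            W[i, j] = w * S[i, j]
    col = W.sum(axis=0)
    M = W / np.where(col > 0, col, 1)   # column-normalize -> transition matrix
    pi = np.full(n, 1.0 / n)
    pi0 = np.full(n, 1.0 / n)           # uniform restart vector for illustration
    for _ in range(iters):
        pi = (1 - delta) * M @ pi + delta * pi0
    return pi

# Toy network: 2 hashtag nodes and 2 keyword nodes.
S = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
pi = multilayer_rwr(S, layers=["H", "H", "K", "K"])
print(pi.round(3), pi.sum())            # stationary distribution; sums to 1
```

The paper replaces the uniform restart vector with the centrality-aware $\pi_{rs}$ described next.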
In this study, we propose to personalize the above PageRank algorithm using the global importance of nodes in the proposed heterogeneous multi-layer network. In Equation 4, the restart probability vector $\pi_{rs}$ is interpreted as the layer-importance-weighted sum of the centrality-based initial stationary probabilities of nodes. This interpretation needs not only the node centrality scores but also the layer importances. MultiRank (Rahmede et al., 2018), a centrality estimate for multiplex networks formulated using a modified version of the PageRank algorithm, can estimate both the node centrality scores and the layer influences. MultiRank uses a layer-influence-weighted aggregated adjacency matrix and a weighted bipartite matrix that relates nodes to layers to determine the node and layer centrality scores simultaneously. We specifically change the definitions of these two matrices to customize the MultiRank algorithm for estimating centrality scores over the heterogeneous multi-layer network representation of tweets. Once the centrality scores are calculated, we modify $\pi_{rs}$ of Equation 4 by replacing each $\eta_i$ with the respective influence score of layer i and each initial stationary vector $\pi_0^i$ with the node centrality scores in layer i. In the customized MultiRank algorithm, we tuned the free parameters (as described in the original paper) while calculating the centrality scores: i) to suppress or enhance the contribution of low-centrality nodes, ii) to take into account the elite layers that contain a few highly central nodes, and iii) whether or not to normalize layer influences by the weighted layer in-strength. We tuned the restart parameter in MultiRank and the multi-layer random walks in the range [0.5, 0.85]. In this study, the MultiRank algorithm and the multi-layer random walks gave the best performance with the restart parameter set to 0.5 and 0.85, respectively.
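The personalized restart vector of Equation 4 can then be assembled by weighting each layer's node-centrality distribution by its influence score, for example as follows (the centrality and influence values here are illustrative, not actual MultiRank output):

```python
import numpy as np

def centrality_restart_vector(layer_centralities, layer_influences):
    """Stack per-layer node centrality distributions, each scaled by its
    layer influence eta_i; influences sum to 1, so the result is a
    probability vector over all nodes in the multi-layer network."""
    parts = []
    for name, cent in layer_centralities.items():
        cent = np.asarray(cent, dtype=float)
        cent = cent / cent.sum()                 # normalize within the layer
        parts.append(layer_influences[name] * cent)
    return np.concatenate(parts)

# Hypothetical centrality scores and layer influences (H > M > K).
pi_rs = centrality_restart_vector(
    {"H": [3.0, 1.0], "M": [2.0, 2.0], "K": [1.0, 1.0, 2.0]},
    {"H": 0.5, "M": 0.3, "K": 0.2})
print(pi_rs.round(3), pi_rs.sum())               # probability vector; sums to 1
```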
Furthermore, since the average number of tokens per tweet in our training dataset is 29, we set the walk length to 30 and the number of walks to 10. All the free parameters are tuned based on end-task performance.

Table 1: Different embedding and neural methods

Node embedding methods (embedding dimension 128; other hyper-parameters as suggested in the respective literature):
- FastText Embedding (FT) (Bojanowski et al., 2017)
- Multi-View Embedding (MVE) (Qu et al., 2017)
- Multiplex Network Embedding (MNE)
- Sentiment Hashtag Embedding (SHE) (Singh et al., 2020)

Deep-learning models and hyper-parameters:
- Convolutional Neural Network (CNN): 3 kernels, 128 filters, ReLU activation function
- Bidirectional Long Short-Term Memory (Bi-LSTM): 64 LSTM units, ReLU activation function

Classification of tweets represented with a multi-layer network
Let $G_i$ be the multi-layer network representing a tweet $T_i$. Over this network, we generate n node sequences $\{S_1, S_2, \ldots, S_n\}$ using the proposed random walk, each maintained to have a length of m nodes. With the n random sequences and the original tweet, we have (n + 1) sentences representing the tweet $T_i$. Each word in these sentences is represented by a vector obtained from an appropriate embedding method; this paper considers the embedding methods listed in Table 1, trained over a large collection of tweets. For each node sequence $S_j$, we apply a neural model (Bi-LSTM (Chen et al., 2017) or CNN (Kim, 2014)) to generate a representation of the sequence: the last hidden-state output after passing the node sequence through the Bi-LSTM, or the vector obtained after the pooling step of the CNN. We thus obtain (n + 1) vectors for each tweet. We concatenate these (n + 1) vectors and feed them to a feed-forward dense layer with three neurons (one each for positive, negative, and neutral), classifying the sentiment of the tweet with a softmax activation in the output layer, as shown in Figure 1. We use the Keras deep learning framework to build the proposed model.
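The early-fusion step itself is simple. As an illustration, the following sketch (in plain NumPy rather than Keras, with made-up dimensions and random weights standing in for trained ones) concatenates the (n + 1) sequence representations and applies a dense softmax layer:

```python
import numpy as np

def early_fusion_probs(seq_reprs, W, b):
    """Concatenate (n+1) sequence representations and apply a dense
    layer with softmax over 3 sentiment classes."""
    fused = np.concatenate(seq_reprs)        # shape: ((n+1)*d,)
    logits = W @ fused + b                   # shape: (3,)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()

# Toy example: n = 3 walk representations + 1 tweet representation, d = 4.
rng = np.random.default_rng(0)
reprs = [rng.standard_normal(4) for _ in range(4)]
W = rng.standard_normal((3, 16))             # (classes, (n+1)*d)
b = np.zeros(3)
probs = early_fusion_probs(reprs, W, b)
print(probs.shape, probs.sum())              # (3,) and probabilities sum to 1
```

In the actual model, the per-sequence representations come from the Bi-LSTM or CNN encoders, and W and b are learned end-to-end.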
We calculate the error loss (∆) for the classifier using the well-known cross-entropy loss,

$$\Delta = -\sum_{i=1}^{l} \sum_{c=1}^{C} t_{ic} \log(s_{ic})$$

where C is the number of sentiment classes, $t_{ic}$ is the ground truth for class c of tweet i, l is the total number of training samples, and $s_{ic}$ is the predicted probability of class c for sample i.
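As a sanity check, a minimal NumPy version of this loss (summed over samples, matching the equation above) might look like:

```python
import numpy as np

def cross_entropy(t, s, eps=1e-12):
    """Cross-entropy loss summed over samples.
    t: one-hot ground truth, shape (l, C); s: predicted probabilities, shape (l, C)."""
    return -np.sum(t * np.log(s + eps))

# One sample whose true class is predicted with probability 0.7.
t = np.array([[1.0, 0.0, 0.0]])
s = np.array([[0.7, 0.2, 0.1]])
print(round(cross_entropy(t, s), 4))  # -ln(0.7) ≈ 0.3567
```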

Network expansion and shrinking
One of the motivations for using the multi-layer network to represent a tweet lies in its flexibility to expand or shrink the network. Given a set of existing nodes in a tweet network as query nodes, the idea is to identify the most related or most noisy nodes by exploiting a multi-layer network built from a global tweet collection. We consider the most central and most similar neighboring nodes of the query nodes as potential expansion candidates. To reduce the search space, we first select the top k query nodes ranked by the nodes' centrality scores in the tweet network view, where the centrality scores are calculated from the whole tweet collection. We then find the neighbors of the selected nodes and rank them using a weighted combination of similarity and centrality scores via the scoring function

$$score(u) = \alpha \cdot sim(v, u) + (1 - \alpha) \cdot centrality(u), \quad u \in N_v$$

where $N_v$ denotes the neighbouring nodes of v, sim(v, u) denotes the cosine similarity between the node embeddings of v and u, and centrality(u) denotes the centrality score of node u in the global network. In this study, we weight the cosine similarity and centrality scores equally by setting α = 0.5. The top-ranked neighbouring nodes are selected using the above scoring function and added to the network in their respective layers, following the edge policy discussed in Section 3.1.
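A minimal sketch of this expansion-scoring step (with hypothetical helper names and hand-made embeddings and centrality scores) could look like:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expansion_candidates(query_nodes, neighbors, emb, centrality, alpha=0.5, top=3):
    """Score each neighbor u of a query node v by
    alpha * cos(v, u) + (1 - alpha) * centrality(u), highest first."""
    scored = []
    for v in query_nodes:
        for u in neighbors.get(v, []):
            score = alpha * cosine(emb[v], emb[u]) + (1 - alpha) * centrality[u]
            scored.append((score, u))
    scored.sort(reverse=True)
    return [u for _, u in scored[:top]]

# Toy data: two query nodes, each with one candidate neighbor.
emb = {"#uriattack": np.array([1.0, 0.0]),
       "modi": np.array([0.0, 1.0]),
       "#surgicalstrike": np.array([0.9, 0.1]),
       "speech": np.array([0.1, 0.9])}
neighbors = {"#uriattack": ["#surgicalstrike"], "modi": ["speech"]}
centrality = {"#surgicalstrike": 0.8, "speech": 0.2}
print(expansion_candidates(["#uriattack", "modi"], neighbors, emb, centrality))
```

In the paper's setting, the embeddings come from the trained node embedding models and the centrality scores from the customized MultiRank algorithm.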
The above node expansion method finds new nodes that share a semantic relation with the query nodes. However, for the sentiment analysis task, we are interested in adding only sentiment-bearing nodes, so among the nodes selected for expansion we retain only those carrying the dominant sentiment class, while the rest, with less dominant sentiment classes, are removed from the tweet network. The Sentiment Hashtag Embedding (SHE) method proposed in (Singh et al., 2020) is used to estimate the sentiment orientation of a node, with the same experimental setup as described in that work.

Dataset

Sentiment classification is a domain-dependent task (Karamibekr and Ghorbani, 2012). The Societal dataset contains 18% non-English tweets (i.e., Hindi and code-mixed with English), of which 1,626 code-mixed tweets and 1,505 tweets with fewer than five keywords are kept unseen for evaluating our proposed model. Hashtags and mentions cover 11% and 15%, respectively, of the 39,428 unique vocabulary terms of the Societal dataset. This dataset is used to build sentiment classifiers and to construct a multi-layer network for generating node embeddings. Details of the dataset are shown in Table 2.

Embedding method
We investigate the efficacy of our proposed multi-layer network using four different node embedding methods, namely Multiplex Network Embedding (MNE) (Zhang et al., 2018), Multi-View Embedding (MVE) (Qu et al., 2017), FastText (FT) (Bojanowski et al., 2017), and Sentiment Hashtag Embedding (SHE) (Singh et al., 2020), as listed in Table 1. These embedding methods need a collection of node sequences. This study therefore represents the tweet corpus as one expanded multi-layer network, built by combining the individual tweet networks, and generates node sequences from it via a random walk. For experimental comparison, we investigate three random walk methods for generating the node sequences: the unbiased random walk used in MNE, the biased random walk used in Node2Vec (N2V) (Grover and Leskovec, 2016), and the proposed centrality-aware biased random walk. Moreover, to investigate the efficacy of our proposed random walk (RW), we model the generated biased RW sequences with the FastText embedding model, which we refer to as Biased FT (BFT) in Table 3.

Selection of n random walks
A random walker can generate many node sequences starting from a node in a given network, but not all of these sequences are useful. To identify the node sequences of interest, we use a simple second-order Markov chain based language model (Lafferty and Zhai, 2001) to calculate the probability of generating a node sequence given a tweet network. This study retains the top three random-walk sequences.
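Such a selection step might be sketched as follows (a toy add-one-smoothed second-order model over node trigrams; the paper's actual scoring model may differ in its smoothing and estimation details):

```python
import math
from collections import defaultdict

def train_second_order(sequences):
    """Count second-order transitions (u, v) -> w from a corpus of node sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for u, v, w in zip(seq, seq[1:], seq[2:]):
            counts[(u, v)][w] += 1
    return counts

def sequence_log_prob(seq, counts, vocab_size, smooth=1.0):
    """Add-one smoothed log-probability of a walk under the second-order model."""
    logp = 0.0
    for u, v, w in zip(seq, seq[1:], seq[2:]):
        ctx = counts.get((u, v), {})
        total = sum(ctx.values())
        logp += math.log((ctx.get(w, 0) + smooth) / (total + smooth * vocab_size))
    return logp

def top_n_walks(walks, corpus, n=3):
    """Keep the n walks most probable under a model trained on the corpus."""
    vocab = {w for s in corpus for w in s}
    counts = train_second_order(corpus)
    return sorted(walks, key=lambda s: sequence_log_prob(s, counts, len(vocab)),
                  reverse=True)[:n]

print(top_n_walks([["a", "b", "x"], ["a", "b", "c"]],
                  [["a", "b", "c", "d"]] * 3, n=1))
```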

Results and observations
In Table 3, we show the performance of two sentiment classifiers, CNN (Nguyen and Nguyen, 2018) and Bi-LSTM, in terms of accuracy and F-Macro scores over the Societal dataset using a 10-fold cross-validation approach, for the four embedding models of our choice, namely Multiplex Network Embedding (MNE) (Zhang et al., 2018), Multi-View Embedding (MVE) (Qu et al., 2017), FastText (FT) (Bojanowski et al., 2017), and Sentiment Hashtag Embedding (SHE) (Singh et al., 2020). We consider the work of (Nguyen and Nguyen, 2018) as the baseline model for text-based sentiment classification of tweets. We considered only the top few walks (3, 5, and 7) with the highest probability; experiments show that considering the top 3 walks provides the best results. The code for this paper is available at: https://github.com/gloitongbam/SA_Hetero_Net. The best performances in both metrics pertain to [C] Biased RW with SHE embedding for both classifiers. We find that the N2V-style global topology-based biasing is less useful for sentiment prediction than our biased approach, which uses centrality scores intuitively. Among the embedding models, we observe that Biased FT and SHE give competitive performances. We believe Biased FT performs competitively because it is trained on centrality-aware random walks, additionally augmented with sentiment-polarized nodes, whereas SHE systematically embeds sentiment information and is also aided by the biased tweet-graph view, which makes it the strongest performer for sentiment classification. To gauge the importance of generating node sequences with an effective RW method over the proposed network, we investigate another experimental setup in which the nodes selected for expansion (both sentiment-polarized and non-polarized) are randomly shuffled with the tweet text. We refer to these as the T+Shuffle Filtered and Unfiltered methods, for shuffling of sentiment-polarized and non-polarized node expansions, respectively, in [D].
For Bi-LSTM, we can see that [D] Unfiltered beats text-only prediction, which signifies that the list of selected nodes, though randomly shuffled, is informative enough. For both classifiers, [D] Filtered outperforms text-only prediction on average by 0.8% and 2.4%, respectively, signifying that the nodes selected by the sentiment-polarized node expansion method aid performance. We next examine the advantage of node sequences over a randomly shuffled list of the same nodes.
[D] Unfiltered is comparable with the [B] view, upon which the Biased RWs are seen to improve. Likewise, walks in the [C] view, which is comparable to [D] Filtered, are seen to improve upon the performance of the latter.

Novelty of centrality-aware walks
It is evident from the results shown above that our proposed biased random walks are useful for effective representation of tweets. One may be further interested in knowing how far these Biased RW sequences can improve the performance of an embedding model. We conduct a pilot study by creating three versions of the FastText algorithm: the original word embedding based version (FT), an Unbiased RW sequence-based version (Unbiased FT), and a Biased RW sequence-based version (Biased FT), as summarized in Figure 2 (the plot shows different scales, but the values are the same up to round-off error). Biased FT beats tweet-based FT in 6 out of 10 cases by an average of 1.11%. Biased FT also beats Unbiased FT in 6 of 10 cases by an average of 1.37%. Although Unbiased FT generally performs worse than the original FT, in the case of sentiment-polarized node expansion it consistently outperforms FT, which again demonstrates the effectiveness of the sentiment-polarized node expansion method.

Novelty of sentiment polarized node expansion
In this section, we further analyze the effectiveness of node expansion for the sentiment classification task. The box plots in Figure 3 summarize the performances of the tweet-network representations (shown in Table 3) with sentiment-polarized node expansion, with non-polarized node expansion, and without node expansion, over the different RW algorithms (i.e., Unbiased, Node2Vec, Biased). From the figure, it is observed that for each RW method, the node expansion based representations beat the tweet representations without any node expansion. Precisely, sentiment-polarized node expansion beats the classifiers with non-polarized node expansion and without node expansion by average margins of 9.19% and 10.57%, respectively. Further, non-polarized node expansion beats the classifiers without node expansion by 1.38%. From Figure 3, we observe two aspects: i) expanding the tweet network with semantically related nodes makes the performance of the centrality-based biasing algorithm more reliable, and ii) the box plot for sentiment-polarized node expansion has a small variance, indicating that it is a stable, reliable method for enhancing the tweet network view. Hence we conclude that extending the networked view of a tweet with a few semantically similar, central nodes serves our purpose well, and that performance is further enhanced by a considerable margin when only the sentiment-polarized nodes related to the tweet are added.

Response on under-specified Tweets
We consider tweets having fewer than five keywords (including hashtags and mentions) as under-specified. Tweets with few keywords, although informative, can pose challenges to sentiment classifiers due to under-specificity. For this study, we use the CNN-based classifiers trained with Biased FT embeddings to classify the under-specified tweets. Figure 4(a) shows the CNN-based classifiers' performance for the different types of tweet representations. From the figure, we observe that the sentiment classifier trained without any node expansion performs better than the classifier trained on tweet text only. This observation shows the power of optimally selected n random-walk sequences as an alternative representation of tweets. Among the no-expansion methods, Biased RW sequences give the best performance, beating tweet-text-only prediction by 5.7% and Unbiased RW by 3.82%. We see similar performance trends for RW-based sequences in the case of sentiment-polarized node expansion. Moreover, sentiment-polarized node expansion strategically mitigates the problem of under-specified tweets by extending the tweet-network view to include less-noisy, informative nodes, so that the generated walks are more diverse and discriminative. The last pair of columns covers one special scenario in which we give the original tweet text plus a list of randomly shuffled sentiment-polarized nodes to the sentiment classifier. This combination (T+Filtered) outperforms tweet-only prediction by 3.9%, showing that the nodes selected for expansion are important for inference. However, T+Biased without node expansion, and T+Unbiased and T+Biased with sentiment-polarized node expansion, beat T+Filtered by margins of 1.8%, 2.7%, and 6.4% accuracy, respectively. This supports the claim that random-walk sequences are a stronger representation of tweets than the mere inclusion of a shuffled list of semantically related words in the tweet text.

Response on Multilingual tweet
Figure 4(b) shows sentiment classification performance over the multilingual tweets, i.e., tweet text written in code-mixed language. This plot follows trends similar to Figure 4(a), but with two striking observations. In the case of multilingual tweets, since the co-occurrence of multilingual words is rare, our proposed node expansion methods are useful for retrieving semantically related co-occurring English words that can aid inference; the plot verifies this intuition. We can see the jump in prediction results with sentiment-polarized node expansion for T+Unbiased, T+N2V, and T+Biased over their counterparts in the previous group (without node expansion), with margins of 4.6%, 3.2%, and 0.1% accuracy, respectively. It is also interesting to see the large performance improvement of T+Biased without node expansion over tweet-only prediction, by a margin of 4.75% accuracy, which we believe is due to the power of the interpretable, centrality-score-aided, optimally selected Biased RW sequences of multilingual words.

Evaluation on SemEval datasets
We further investigate the performance of the proposed method on two popular Twitter datasets used in SemEval challenges for sentiment analysis: SemEval-2013 (https://www.cs.york.ac.uk/semeval-2013/task2/) and SemEval-2016 (http://saifmohammad.com/WebPages/StanceDataset.htm). For this study, we use the train and test splits provided with the datasets. Figures 5(a) and (b) show the performance of the CNN classifier trained over the different types of tweet representation for the SemEval-2013 and SemEval-2016 datasets, respectively. For training the CNN classifier, we use Biased FT embeddings trained on the challenge datasets. Our proposed centrality-aware biased random walk with sentiment-polarized node expansion achieves the best performance: up to 64% accuracy and 60% F-macro on SemEval-2013, and up to 77% accuracy and 54% F-macro on SemEval-2016. Further, comparing text-based tweet representations with network-based representations without node expansion, we observe that on both datasets the representations without node expansion could hardly beat the text-based representations in the F-macro measure. However, for the SemEval-2016 dataset, our proposed method outperforms the text-based representation in both evaluation measures. We see substantial performance gains for N2V RW on both datasets when augmented with any node expansion. For SemEval-2016, it is notable that Unbiased and Biased RW-based sequences give almost comparable performance in terms of accuracy; however, the Biased RW view consistently outperforms the Unbiased view in the F-macro measure on both datasets for every node expansion case. This indicates that our method consistently performs better than its counterpart methods.

Conclusion
This study investigates the efficacy of transforming tweets into heterogeneous multi-layer networks for the sentiment classification task. Our proposed centrality-aware random-walk method generates walk sequences that capture semantic relations better than its unbiased and biased random walk counterparts. From various experimental observations, it is evident that sentiment-oriented node expansion can reduce under-specificity and noise in a tweet and enhance its representation. The proposed method outperforms its text-based counterpart in a majority of cases.

A Appendix
Here, we show some additional experiment results and their implications in support of our proposed framework for sentiment classification.
A.1 Interpretation of node centrality scores and layer influences

In Table 4 and Figure 8, we show three example tweets and their ranked centrality scores calculated by our proposed method. The first example is about a terrorist attack in India and Indian Prime Minister Modi's reaction to it. In the simple multi-layer view of the tweet, we see that india, pm, modi, and speech, keywords related to how India reacts, have higher centrality than the attack #uriattack and the terrorist named #burhanwani. It is interesting to look at the lists of nodes selected by our plain and sentiment-polarized node expansion methods in Table 4: the nodes for expansion related to #uriattack concern the surgical strike, the home minister, the defense minister, and the soldiers killed in this attack, and have higher ranks. The second tweet is an under-specified tweet in which India's Prime Minister greets soldiers. Here, our node expansion methods correctly infer that this greeting relates to India's success in the #surgicalstrike as India's reaction to #uriattack. Keywords related to the war, casualties, and related emotions, such as army, pak, loc, diplomatic, refute, lose, collateral, pray, and roar, rank higher in the centrality-score based ranking. Example 3 is a multilingual tweet whose main theme is the Goods & Services Tax (GST), a bill related to tax payment adopted by the Indian government in 2017. Although the original tweet mentions @narendramodi, the PM of India, and uses Hindi keywords, the nodes selected for expansion rightfully capture the finance ministry (@arunjaitley, @finminindia), the home ministry (@amitshah), economic transformation, and mostly positive sentiments about it. Also, as we create one large multi-layer heterogeneous network from the tweet corpus to train the node embedding methods, the layer influence calculated by our method ranks the hashtag layer above the mention layer, followed by the keyword layer (H > M > K).
This ranking is intuitive, as most of the influential nodes are in the hashtag (trending topics) and mention (Twitter handles of important personalities) networks. The keyword layer, by contrast, contains a large number of keywords, and the large population of less frequently used keywords brings down the overall influence score of this layer.
A.2 Novelty of centrality score-based tweet network representation

We created box plots of the aggregated performances of the three competing methods for tweet network representation and RW sequence generation (Unbiased, Node2Vec, and Biased, as in Table 3). From Figure 6, for each networked view of tweets (NE, No NE, SNE), it is evident that our centrality score-based RW sequences provide better tweet representations than the unbiased and Node2Vec-biased RW sequences. Node2Vec biasing does not appear to yield an intuitive networked view for tweet sentiment classification. Our proposed centrality-aware RW sequences beat Node2Vec by 3.1% and unbiased RWs by 1.7% on average.