Modeling Inter-Aspect Dependencies for Aspect-Based Sentiment Analysis

Aspect-based Sentiment Analysis is a fine-grained task of sentiment classification for multiple aspects in a sentence. Present neural models exploit the aspect and its contextual information in the sentence but largely ignore inter-aspect dependencies. In this paper, we incorporate this pattern by classifying all aspects of a sentence simultaneously while modeling the temporal dependencies among their corresponding sentence representations using recurrent networks. Results on the benchmark SemEval 2014 dataset suggest the effectiveness of our proposed approach.


Introduction
Aspect-based Sentiment Analysis (ABSA) is a fine-grained task of sentiment classification. Opinionated sentences in reviews, debates, etc., often comprise multiple aspects with varied sentiment polarities. An important subtask of ABSA is aspect or aspect-term classification, which involves predicting the sentiment of the aspects embodied in a sentence (Young et al., 2017). Present works in the literature approach this task by analyzing associations between aspects and their contexts provided in the sentence. In this work, we argue that to classify an aspect into sentiment categories, knowledge of the surrounding aspects, their sentiment orientations, and the resulting inter-dependencies is beneficial.
Inter-aspect dependencies abound in sentences with multiple aspects. Largely ignored in the present literature, these dependencies may reveal themselves in many forms, such as a) Incomplete information, where a certain aspect does not carry enough contextual information to convey its sentiment. In such cases, the surrounding aspects and their sentiment tones become crucial to fill the contextual gap. As an example, in the sentence The menu is very limited - I think we counted 4 or 5 entries., the subsentence I think ... entries containing the aspect entries does not provide the required sentiment unless considered together with the aspect menu. Here, the negative sentiment of menu induces the same sentiment in entries. b) Sentiment influence in conjunctions, in which the sentiment of an aspect influences the succeeding aspects due to the presence of conjunctions. In particular, in sentences containing conjunctions such as and, not only, also, but, however, though, etc., aspects tend to share or contrast their sentiments. In the sentence Food is usually very good, though I wonder about freshness of raw vegetables, the aspect raw vegetables has no sentiment marker linked to it. However, the positive sentiment of food, due to the word good, together with the conjunction though determines the sentiment of raw vegetables to be negative. Thus, aspects, when arranged as a sequence, reveal high correlation and interplay of sentiments.
In this paper, we facilitate such phenomena by proposing a neural network in which information is shared among the aspects by means of a Long Short-Term Memory (LSTM) network (Hochreiter and Schmidhuber, 1997). In other words, we model the sequential relationship between the aspects as per their occurrence in the sentence. Specifically, our model first takes a sentence along with all of its aspect-terms and generates a sentential representation relative to each aspect, yielding better aspect-oriented features (Tang et al., 2016a). This is done using an attention-based LSTM network, where the attention mechanism enables the model to focus on the key parts of the sentence that modulate the sentiment of the aspects. To further guide the attention process, the model incorporates aspect information at the word level by concatenating the aspect representation with each word (Wang et al., 2016). Finally, to capture the inter-aspect dependencies, the aspect-based sentential representations are ordered as a sequence and temporally modeled using another LSTM. Each timestep of this LSTM corresponds to a particular aspect. The hidden state output of each timestep is then projected through a dense layer and fed to a softmax classifier to predict the polarity of the corresponding aspect. To the best of our knowledge, the use of inter-aspect dependencies in neural models is unprecedented and fills a significant gap in the literature.
In the remainder of the paper, Section 2 summarizes existing work; Section 3 describes the proposed approach in detail; Section 4 gives training and dataset details, followed by results and a qualitative case study. Finally, Section 5 concludes the paper.

Related Work
Traditional methods in this field leveraged sentiment lexicons to solve the task (Rao and Ravichandran, 2009; Perez-Rosas et al., 2012), whereas present methods have transitioned to neural approaches. Tang et al. (2016a) introduced the idea of aspect-based sentential representations, which generate a custom representation of the sentence based on the aspect. This approach has been heavily adapted by modern works. Wang et al. (2016) built on this framework and introduced an attention mechanism for generating these sentential features. They also incorporated aspect information into the attention module by concatenating the aspect representation with the words. More recently, Ma et al. (2017) proposed a model in which the context and aspect representations interact with each other's attention mechanisms to generate the overall representation. Tay et al. (2017) proposed modeling word-aspect associations using circular correlation as an improvement over Wang et al.'s work. ABSA has also been approached from a question-answering perspective, where memory networks have played a major role (Tang et al., 2016b). Our work differs from all of these works, since we train all aspects of a particular sentence together and capitalize on inter-aspect dependency modeling, which they ignore.

Proposed Approach
Let us take a sentence $S = [w_1, \ldots, w_n]$ having $n$ words. Each word is represented as a low-dimensional real-valued vector of size $d_{em}$, called a word embedding. To get the embeddings, we use the pre-trained GloVe vectors (Pennington et al., 2014) with $d_{em} = 300$. We can thus represent $S$ as a matrix in $\mathbb{R}^{d_{em} \times n}$.
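As an illustration, the embedding lookup can be sketched as follows; the tiny vocabulary, the random table E, and the small dimension $d_{em} = 4$ are toy stand-ins for the pre-trained GloVe vectors used in the paper:

```python
import numpy as np

# Toy stand-in for the GloVe embedding table (the paper uses d_em = 300;
# a small dimension is used here purely for illustration).
d_em = 4
vocab = {"the": 0, "menu": 1, "is": 2, "very": 3, "limited": 4}
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), d_em))  # |V| x d_em embedding matrix

def embed(sentence):
    """Map a tokenized sentence to its d_em x n matrix S."""
    ids = [vocab[w] for w in sentence]
    return E[ids].T  # shape (d_em, n)

S = embed(["the", "menu", "is", "very", "limited"])
```

Each column of `S` is one word embedding, matching the $\mathbb{R}^{d_{em} \times n}$ layout above.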
The sentence $S$ also contains $m$ aspect-terms (or aspects) $A_1, \ldots, A_m$, enumerated as per their order of occurrence in the sentence. The goal is to determine the sentiment label of each of these $m$ aspects of $S$.
The proposed model comprises two distinct phases (Figure 1). The first phase involves the generation of the aspect-based sentential representations $s_1, \ldots, s_m$, where vector $s_i$ is created by coupling aspect $A_i$ with sentence $S$. The second phase models the inter-aspect dependencies in a sentence using an LSTM, which is followed by the sentiment prediction for all the aspects.

Phase 1: Aspect-based sentential representations
Below, we describe the methodology used to generate the $i$-th aspect-based sentential representation $s_i$ for aspect $A_i$ and sentence $S$.

Given sentence $S$ and aspect-term $A_i$, the model first generates the aspect representation $t_i$. This is done by passing $A_i$ through an LSTM, named $LSTM_a$, with internal dimension $d_a$. $LSTM_a$'s final hidden state vector $h_a^{A_i} \in \mathbb{R}^{d_a}$ is taken to be this representation, i.e., $t_i = h_a^{A_i}$. Following this, an attention-based LSTM model is used to create $s_i$ from $S$ and $t_i$ (Wang et al., 2016). First, each word vector $w_j$ in $S$ is concatenated with the aspect representation $t_i$ to create a comprehensive feature vector $x_i^j = (w_j ; t_i) \in \mathbb{R}^{d_{em}+d_a}$, where $;$ is the concatenation operator. We then take this new sequence representation $X_i = [x_i^1, \ldots, x_i^n]$ and apply an LSTM, named $LSTM_s$ with dimension $d_s$, to model the long-term temporal dependencies within the sentence. The hidden state vectors across all $n$ timesteps form the matrix $H_i = [h_s^1, \ldots, h_s^n] \in \mathbb{R}^{d_s \times n}$.

Attention: An attention mechanism is applied on $H_i$ to obtain an attention vector $\alpha$, which is in turn used to generate a weighted representation of $H_i$. We take this weighted representation to be the $i$-th aspect's sentential representation $s_i$. The previous concatenation of words with the aspect representation infuses aspect information into the attention process. This enables the attention mechanism to focus on the segments of the sentence relevant to the aspect. The overall attention mechanism to generate $s_i$ is summarized as:

$$P_i = \tanh(W_b^{\top} H_i) \quad (1)$$
$$\alpha = \mathrm{softmax}(W_h^{\top} P_i) \quad (2)$$
$$s_i = H_i\, \alpha^{\top} \quad (3)$$

where $W_h \in \mathbb{R}^{n \times 1}$ and $W_b \in \mathbb{R}^{d_s \times n}$ are projection parameters to be learnt during training and $d_s$ is the dimension of the final sentence vector, i.e., $s_i \in \mathbb{R}^{d_s}$.
The overall process described above is applied individually to each of the $m$ aspects to obtain the sentential representations $s_1, \ldots, s_m$.
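A minimal NumPy sketch of the attention step, using the parameter shapes stated above ($W_h \in \mathbb{R}^{n \times 1}$, $W_b \in \mathbb{R}^{d_s \times n}$); the random matrices stand in for learned weights and for the $LSTM_s$ hidden states, and the exact functional form is one plausible reading consistent with those shapes:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis-flattened scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def aspect_attention(H, W_h, W_b):
    """
    Weighted sentential representation s_i from LSTM_s hidden states.
    H:   (d_s, n) hidden states for one aspect-conditioned sentence
    W_h: (n, 1) and W_b: (d_s, n) projection parameters (random here)
    """
    P = np.tanh(W_b.T @ H)        # (n, n)
    alpha = softmax(W_h.T @ P)    # (1, n) attention weights over words
    return (H @ alpha.T).ravel()  # (d_s,) representation s_i

d_s, n = 6, 5
rng = np.random.default_rng(1)
H = rng.normal(size=(d_s, n))
s_i = aspect_attention(H, rng.normal(size=(n, 1)), rng.normal(size=(d_s, n)))
```

The returned `s_i` is a convex combination of the columns of `H`, i.e., of the per-word hidden states.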

Phase 2: Inter-aspect relationship
To capture the implicit inter-aspect dependencies, we model the sentential representations as a sequence $[s_1, \ldots, s_m]$, following the order of occurrence of their corresponding aspect-terms in sentence $S$. An LSTM, named $LSTM_{ad}$ with dimension $d_{ad}$, is then applied on this sequence. At the $i$-th timestep, its hidden state is projected to a vector of dimension equal to the number of classes to predict. Finally, a softmax operation is applied on this vector to get the prediction probabilities for the sentiment of the $i$-th aspect-term of sentence $S$. The transitions are as follows:

$$[h_{ad}^1, \ldots, h_{ad}^m] = LSTM_{ad}([s_1, \ldots, s_m]) \quad (4)$$
$$\hat{y}_i = \mathrm{softmax}(W_{ad}\, h_{ad}^i) \quad (5)$$

Here, $\hat{y}_i \in \mathbb{R}^C$ is the predicted probability distribution for the $i$-th aspect of sentence $S$, where $C$ is the number of sentiment classes, $W_{ad} \in \mathbb{R}^{C \times d_{ad}}$ is a parameter, and $\mathrm{softmax}(z)_j = e^{z_j} / \sum_k e^{z_k}$.

Loss Function: We use categorical cross-entropy as the loss function, averaged over all aspects of a sentence. The stochastic loss for sentence $S$ is thus calculated as:

$$L_S = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{C} y_{i,j} \log \hat{y}_{i,j} + \lambda \lVert \theta \rVert_2 \quad (6)$$

Here, $m$ is the number of aspects in the sentence and $C$ is the number of sentiment categories. $y_i$ is the one-hot ground-truth vector of the $i$-th aspect of sentence $S$ and $\hat{y}_{i,j}$ is its predicted probability of belonging to sentiment class $j$. $\lambda$ is the $L_2$-regularization term and $\theta$ is the parameter set, i.e., $\theta = \{W_{[h,b,ad]}, LSTM_{[a,s,ad]}\}$, where $LSTM_{[\cdot]}$ represents the internal parameters of that LSTM.
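The per-sentence loss above can be sketched in a few lines of NumPy; the helper name `sentence_loss` and the toy predictions are illustrative only:

```python
import numpy as np

def sentence_loss(Y_true, Y_pred, theta=None, lam=0.0):
    """
    Categorical cross-entropy averaged over the m aspects of one
    sentence, with optional L2 regularization on the parameter set.
    Y_true: (m, C) one-hot ground truths; Y_pred: (m, C) softmax outputs.
    """
    m = Y_true.shape[0]
    ce = -np.sum(Y_true * np.log(Y_pred)) / m
    reg = lam * sum(np.sum(p ** 2) for p in (theta or []))
    return ce + reg

# Two aspects, three classes (e.g., positive / negative / neutral).
Y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
Y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
loss = sentence_loss(Y_true, Y_pred)
```

Because the ground truths are one-hot, only the log-probability assigned to each aspect's true class contributes to the cross-entropy term.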

Experimentation
Training details: To perform experiments and subsequent hyperparameter tuning, we first split the training set randomly in the ratio 9:1 to get a held-out validation set. For optimization, we use the Adam optimizer (Kingma and Ba, 2014) with learning rate 0.01. Dimensions are set as follows: $d_a = 100$, $d_s = d_{ad} = 300$. To facilitate batch processing, we pad sentences having fewer aspects with dummy aspects and apply masking schemes. For termination, we use early stopping with a patience value of 10, monitored on the validation loss.
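The dummy-aspect padding and masking scheme might look like the following sketch; `pad_aspects` and `max_m` are hypothetical names introduced here, not from the paper:

```python
import numpy as np

def pad_aspects(aspect_reps, max_m):
    """
    Pad a sentence's list of aspect representations with zero-vector
    'dummy' aspects up to max_m, and return a boolean mask marking the
    real aspects, so the loss can ignore padded positions in a batch.
    """
    d = aspect_reps[0].shape[0]
    m = len(aspect_reps)
    padded = np.zeros((max_m, d))
    padded[:m] = np.stack(aspect_reps)
    mask = np.zeros(max_m, dtype=bool)
    mask[:m] = True
    return padded, mask

reps = [np.ones(3), 2 * np.ones(3)]  # a sentence with m = 2 aspects
padded, mask = pad_aspects(reps, max_m=4)
```

Averaging the loss only over `mask == True` positions keeps the dummy aspects from contributing gradients.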
Dataset: We conduct our experiments on the dataset of SemEval 2014 Task 4, containing customer reviews of restaurants and laptops. Each review has one or more aspects with their corresponding polarities. The polarity of an aspect can be positive, negative, neutral, or conflict; we consider only the first three labels for classification. Table 1 contains the statistics of the dataset.

Results: Table 2 presents the results of our proposed model along with state-of-the-art methods. Our model significantly surpasses the performance of ATAE-LSTM (Wang et al., 2016). Given that ATAE's architecture has a strong correlation to our aspect-based sentential generator (see Figure 1), their work can be regarded as a baseline for our model. This reinforces our hypothesis that a model capable of capturing inter-aspect dependencies indeed performs better. We also compare our model to the recently proposed IAN (Ma et al., 2017). On both datasets, our model performs competitively with IAN and produces a nominal improvement. Given that IAN explores the inter-dependencies of aspects with their contexts, while we model the inter-dependencies between aspects, an interesting direction would be to explore IAN within our proposed setting (Phase 2 of Figure 1). We leave this path as an option for future research.

Ablations: We also analyze the importance of different modules of our proposed model. Specifically, we try the following variants. (a) Without attention: in this setting, we omit the attention mechanism while generating the aspect-based sentential representation $s_i$ (Equations 1-3). Instead, we define $s_i$ to be $h_s^n$, i.e., the last hidden state vector of $LSTM_s$ with inputs $S$ and $A_i$. Removing attention degrades the performance of our model on the Restaurant and Laptop datasets by 4% and 3%, respectively. This signifies the importance of the attention mechanism in deriving the aspect-based sentential representations.
(b) With Hadamard fusion: instead of concatenating $w_j$ and $t_i$, we use the Hadamard product, i.e., the element-wise multiplication of the two vectors. Although this variation reduces the total number of parameters of the network, it does not benefit the model and performs worse than simple concatenation. Numerous other fusion methods, such as tensor fusion, compact bilinear pooling (Gao et al., 2016), attention-based fusion (Hazarika et al., 2018), etc., are applicable; their analysis, however, is not the focus of this paper.
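For concreteness, the two fusion variants compared here can be contrasted in a few lines of NumPy (toy vectors, with $d_{em} = d_a = 3$):

```python
import numpy as np

# Two ways to fuse word vector w_j with aspect representation t_i.
# Concatenation (used in the model) keeps both vectors intact but grows
# the LSTM_s input size to d_em + d_a; the Hadamard variant requires
# d_em == d_a and yields a smaller input of dimension d_em.
w_j = np.array([0.5, -1.0, 2.0])
t_i = np.array([1.0, 0.5, -1.0])

x_concat = np.concatenate([w_j, t_i])  # dimension d_em + d_a
x_hadamard = w_j * t_i                 # element-wise product, dimension d_em
```

The smaller Hadamard input shrinks the $LSTM_s$ parameter count, which explains the parameter reduction noted above.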

Case Study
A qualitative study of the test-set classifications by our model reveals its capability to learn inter-aspect dependencies (Section 1). For the sentence I love the keyboard and the screen, the model correctly identifies the sentiment of screen as positive, which is hinted at by the positive aspect keyboard and the conjunction and. In another case, for the sentence The best thing about this laptop is the price along with some of the newer features, the aspect features is correctly classified as positive, influenced by the aspect price and the positive word best. This shows that our model performs well in classifying joint aspects linked by conjunctions. For the slightly harder case of incomplete information, our model also fares well on sentences exhibiting this pattern. For example, the sentence Boot up slowed significantly after all windows updates were installed has the aspect windows update, which does not have a clear sentiment orientation but is implicitly dependent on the aspect boot up, which has a negative sentiment. This sentence, too, was correctly classified by our model. Moreover, the above examples were all incorrectly classified by ATAE. This reaffirms our hypothesis that the ability to learn inter-aspect dependencies is a crucial factor in the task of ABSA.

Conclusion
In this paper, we present a way to incorporate inter-aspect dependencies into the task of Aspect-based Sentiment Analysis. Our results suggest that capturing such information indeed improves prediction. Through this work, we hope that future research will include this idea in its methods.