METNet: A Mutual Enhanced Transformation Network for Aspect-based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) aims to determine the sentiment polarity of each specific aspect in a given sentence. Existing research has recognized the importance of the aspect for the ABSA task and has derived many interactive learning methods that model the context based on a specific aspect. However, current interaction mechanisms are ill-equipped to learn complex sentences with multiple aspects, and these methods underestimate the representation learning of the aspect. To address these two problems, we propose a mutual enhanced transformation network (METNet) for the ABSA task. First, the aspect enhancement module in METNet improves the representation learning of the aspect with contextual semantic features, which gives the aspect more abundant information. Second, METNet designs and implements a hierarchical structure, which enhances the representations of aspect and context iteratively. Experimental results on the SemEval 2014 datasets demonstrate the effectiveness of METNet, and we further show that METNet is outstanding in multi-aspect scenarios.


Introduction
Aspect-based sentiment analysis (ABSA) aims to determine the orientation of sentiment expressed towards each aspect in a sentence (Liu, 2012). For instance, in the sentence "Although the service is not that great, I still like the food", the user mentions two aspects, "service" and "food", and expresses negative and positive sentiment towards them respectively. The ABSA task consists of two subtasks: aspect extraction and aspect sentiment classification. In this paper, we assume that the aspects are given and focus only on aspect sentiment classification. In the ABSA task, multiple aspects can appear in one sentence. When predicting the sentiment of the current aspect, words related to other aspects with different sentiment tendencies may become noise. Therefore, how to effectively model the semantic relationship between the given aspect and the context words is an important challenge.
The traditional methods mainly rely on manually designed features, which is labor-intensive, and this representation approach has almost reached its performance bottleneck (Ma et al., 2017). Boosted by the recent development of deep learning techniques, some studies utilize the attention mechanism to address these problems, and many neural attention models have been proposed (Wang et al., 2016; Tang et al., 2016b; Chen et al., 2017; Ma et al., 2017). In these works, the model generally obtains the aspect representation first and then applies the attention mechanism to extract the context features related to the given aspect for sentiment prediction. However, the attention mechanism has some drawbacks. When a sentence contains multiple aspects with different sentiment tendencies, opinion modifiers of other aspects are noise for the current aspect, and it is hard for the attention mechanism to differentiate the opinion modifiers of multiple aspects, which directly affects the final prediction. For example, in the sentence "I like coming back to Mac OS but this laptop is lacking in speaker quality compared to my $400 old HP laptop", the attention mechanism should pay more attention to the opinion word "like" with positive sentiment for the aspect "Mac OS". However, the attention mechanism tends to involve irrelevant opinion words, such as "lacking" with negative sentiment, which interferes with the sentiment prediction for "Mac OS". To this end, researchers have put forward many works (e.g., Ran et al., 2019) to overcome the shortcomings inherent in the attention mechanism for the ABSA task. However, most of these works are devoted to designing complex neural networks to improve the representation learning of the context; few focus on how to improve the representation learning of the aspect. In recent work, Yang et al. (2019) learn effective representations of the aspect and context alternately with an iteration mechanism and finally obtain more accurate predictions.
To make the aspect play a better role in the ABSA task, we propose the mutual enhanced transformation network (METNet). METNet improves the representation learning of the aspect and extracts more effective context features based on the enhanced aspect representation. Inspired by TNet and Coattention-MemNet (Yang et al., 2019), METNet utilizes a hierarchical structure to learn the representations of aspect and context iteratively, and we name each computational layer a Bidirectional Enhancement Transformation (BET) layer. Each BET layer has three parts: a bidirectional LSTM, an aspect enhancement module, and a set of aspect-specific transformation (AST) units. Specifically, BET first learns the context dependency of the sentence with a bidirectional LSTM. Then, the aspect enhancement module utilizes the extracted context features to improve the aspect representation. After that, the AST units fuse the aspect information into each context word to obtain a sentence-level context representation. The contexts and aspects passed through multiple BET layers are eventually fed into a sentiment feature extractor. Since GCAE (Xue and Li, 2018) adds aspect information when extracting sentiment features, which further strengthens the connection between aspect and context, we replace the vanilla CNN feature extractor with GCAE. Moreover, to help GCAE extract sentiment features more accurately, we utilize relative position information to scale the input of GCAE.
The main contributions of this work are summarized as follows: (1) We propose an aspect enhancement module which utilizes the extracted context features to enhance aspect representation. The enhanced aspect representation is utilized to obtain the more effective context representation. (2) Based on the aspect enhancement module, we propose the mutual enhanced transformation network (METNet) for aspect-based sentiment analysis which learns the representations of aspect and context alternately and iteratively. Experimental results confirm the effectiveness of METNet, and METNet is outstanding in multi-aspect scenarios.

Related Works
The traditional methods (Boiy and Moens, 2009; Kiritchenko et al., 2014) for the ABSA task mainly utilize manually designed features such as sentiment lexicons, n-grams, and dependency information, which are labor-intensive to construct. Moreover, the quality of the features directly affects the classification accuracy of these methods.
With the development of deep learning in natural language processing, many neural network models (Tang et al., 2016a; Wang et al., 2016; Ma et al., 2017; Yang et al., 2019; Ran et al., 2019) have been proposed. These methods automatically learn the sentiment features of a sentence and achieve good results. Tang et al. (2016a) adopted two LSTMs to model the two clauses of a given sentence and then integrated the encoded features from the two LSTMs for prediction.
Recently, because the attention mechanism can explicitly capture the semantic relationship between the given aspect and the words in a sentence, attention-based neural models have attracted growing interest. Wang et al. (2016) proposed an attention-based LSTM network. Tang et al. (2016b) adopted an end-to-end memory network architecture where each computational layer is based on the attention mechanism. Another line of work utilized the attention mechanism to generate different aspect representations based on individual context words and then extracted context features based on these tailor-made aspect representations.
Existing methods pay little attention to how to improve the representation learning of the aspect. Inspired by TNet  and Coattention-MemNet (Yang et al., 2019), we propose the mutual enhanced transformation network (METNet) for aspect-based sentiment analysis. The main differences between our method and existing methods are as follows: (1) METNet improves the representation learning of the aspect. (2) We propose the Bidirectional Enhancement Transformation (BET) component which can be repeated by the hierarchical structure to obtain more effective representations of aspect and context.

Model Overview
In this section, we describe our proposed METNet model which is shown in Figure 1. The METNet model can be roughly divided into three parts, namely the BERT Layer, the Bidirectional Enhancement Transformation (BET), and the Convolutional Feature Extractor.

BERT Layer
BERT has been successfully applied to various NLP tasks, such as question answering and dialog systems. The BERT layer uses pre-trained BERT to generate the word representations of a sequence. Suppose that a sentence consists of m words and an aspect contains n words. Then, we can obtain the vector representation x = {x_1, x_2, ..., x_m} ∈ R^{m×d} of the sentence and the vector representation a = {a_1, a_2, ..., a_n} ∈ R^{n×d} of the aspect from the BERT layer, where d denotes the dimension of the BERT output layer.
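As a shape sketch of this encoding step: the `fake_bert` function below is a hypothetical stand-in for the pre-trained model's last hidden layer (loading real BERT weights is outside the scope of this sketch), and only the output shapes match the description above.

```python
import numpy as np

d = 768          # dimension of the BERT-base output layer
m, n = 12, 2     # sentence length m, aspect length n

rng = np.random.default_rng(0)

def fake_bert(num_tokens: int) -> np.ndarray:
    """Hypothetical stand-in for BERT: one d-dimensional vector per token."""
    return rng.standard_normal((num_tokens, d))

x = fake_bert(m)   # sentence representation x in R^{m x d}
a = fake_bert(n)   # aspect representation a in R^{n x d}

print(x.shape, a.shape)
```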

Bidirectional Enhancement Transformation (BET)
The bidirectional enhancement transformation (BET) layer in Figure 1 is introduced in this section, while its details are shown in Figure 2. Each BET layer contains three parts, namely a Bi-directional LSTM Layer, an Aspect Enhancement Module, and a set of Aspect-Specific Transformation (AST) Units. The BiLSTM layer first generates the contextualized word representations based on the input. Then, the aspect enhancement module uses the word representations to enhance the aspect representation. Finally, the AST units generate the aspect-specific word representations based on the contextualized word representations and the enhanced aspect representation. Details are described below.
Bi-directional LSTM Layer: As mentioned earlier, we first use BERT to encode the sentence. BERT's model architecture is a multi-layer bidirectional Transformer encoder (Devlin et al., 2019). Yan et al. (2019) analyzed the shortcomings of the Transformer, namely that it weakens the directional and relative position information of the text. However, relative position information is important for the ABSA task. Therefore, we place a BiLSTM after BERT to learn the context dependency of the text.
As shown in Figure 1, we repeat the BET layer in a hierarchical structure. The input of the BiLSTM layer in the bottom BET layer is the context representation outputted by BERT, while the input of the BiLSTM layer in each subsequent BET layer comes from the outputs of the AST units in the previous BET layer.
We formulate the word representations outputted by the forward LSTM as {h→_1, h→_2, ..., h→_m} ∈ R^{m×d_h}, where d_h denotes the number of hidden units. Similarly, the backward LSTM outputs a set of hidden states {h←_1, h←_2, ..., h←_m} ∈ R^{m×d_h}. Finally, we concatenate the two hidden state lists to obtain the word representations h^(l) = {h^(l)_1, h^(l)_2, ..., h^(l)_m} ∈ R^{m×2d_h}.

Figure 2: (a) Architecture of the l-th BET layer; (b) details of the AST unit.

Aspect Enhancement Module: Before introducing this module in detail, we first describe how to obtain the initial aspect representation. Specifically, we feed the aspect vector a = {a_1, a_2, ..., a_n} ∈ R^{n×d} outputted by BERT into another BiLSTM and then apply average pooling to the obtained hidden state vectors h_a = {h_a1, h_a2, ..., h_an} ∈ R^{n×2d_h}. Finally, we get the initial aspect representation v_a^(0) ∈ R^{2d_h}, as shown in Figure 1.

We take the bottom BET layer as an example to introduce the aspect enhancement module in detail. First, based on the contextualized context representation h^(1) outputted by the BiLSTM, we use an average pooling layer to obtain a vector v_h^(1) ∈ R^{2d_h}, namely the contextual vector. Then, we use a basic feature fusion method, point-wise addition, to fuse the contextual vector into the initial aspect representation, which can be formulated as:

v_a^(1) = v_a^(0) + v_h^(1)

This is an enhancement operation on the aspect. By analogy, the final aspect representation is v_a^(L) ∈ R^{2d_h}, and we unfold this formula as follows:

v_a^(L) = v_a^(0) + Σ_{i=1}^{L} v_h^(i)    (1)

where v_h^(i) represents the contextual vector in the i-th BET layer. From Eq. 1, we can see that the aspect representation is gradually strengthened by different contextual vectors across the BET layers.
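The enhancement recursion above can be sketched in NumPy; here the BiLSTM outputs are random stand-in matrices and all dimensions are illustrative, so this only demonstrates the pooling and point-wise addition, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d_h, L = 10, 4, 2          # sentence length, LSTM hidden units, number of BET layers

# Initial aspect vector v_a^(0): average-pooled BiLSTM states of the aspect words.
v_a = rng.standard_normal(2 * d_h)
v_a0 = v_a.copy()

contextual_vectors = []
for layer in range(1, L + 1):
    h = rng.standard_normal((m, 2 * d_h))   # stand-in for BiLSTM output h^(l)
    v_h = h.mean(axis=0)                    # contextual vector v_h^(l) via average pooling
    contextual_vectors.append(v_h)
    v_a = v_a + v_h                         # point-wise addition: v_a^(l) = v_a^(l-1) + v_h^(l)

# Unfolded form of Eq. 1: v_a^(L) = v_a^(0) + sum over layers of v_h^(i)
assert np.allclose(v_a, v_a0 + np.sum(contextual_vectors, axis=0))
print(v_a.shape)
```

The final assertion checks that the layer-by-layer additions indeed unfold into the summation form of Eq. 1.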
As shown in Figure 2, the aspect vector v_a^(l), l ∈ [1, L−1], flows in two directions: one to the AST units in the same BET layer, and the other to the aspect enhancement module in the next BET layer.
Aspect-Specific Transformation (AST) Unit: As mentioned earlier, one direction of the aspect vector v_a^(l) goes to the AST units in the same BET layer. The role of the AST units is to generate aspect-specific word representations. The AST unit uses a structure similar to the CPT module in TNet.
The AST unit takes the aspect vector v_a^(l) and the word representation h_i^(l) as input. The concatenation of the two is fed into a fully connected layer to obtain the i-th aspect-specific word representation:

h̃_i^(l) = g(W[h_i^(l) : v_a^(l)] + b)

where g(*) is a non-linear activation function, and ":" denotes vector concatenation. W and b are the weight matrix and bias respectively. There is also an information protection mechanism to ensure that the context dependency information captured by the BiLSTM is not lost. This mechanism strengthens the transmission and reuse of features and can be formulated as:

ĥ_i^(l) = h̃_i^(l) + h_i^(l)

where ĥ_i^(l) is the output of the AST unit.
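A minimal sketch of one AST unit, assuming tanh as the non-linear activation g and small random parameters; the residual addition implements the information protection mechanism described above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, d_h = 10, 4
dim = 2 * d_h                             # BiLSTM output dimension

h = rng.standard_normal((m, dim))         # contextualized word representations h_i^(l)
v_a = rng.standard_normal(dim)            # enhanced aspect vector v_a^(l)
W = rng.standard_normal((dim, 2 * dim)) * 0.1
b = np.zeros(dim)

def ast_unit(h_i: np.ndarray) -> np.ndarray:
    concat = np.concatenate([h_i, v_a])   # [h_i : v_a], vector concatenation
    h_tilde = np.tanh(W @ concat + b)     # aspect-specific transformation
    return h_i + h_tilde                  # information protection: keep BiLSTM features

out = np.stack([ast_unit(h[i]) for i in range(m)])
print(out.shape)
```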

Convolutional Feature Extractor
In this subsection, we introduce the feature extractor for extracting sentiment features, a gated convolutional network (GCAE) proposed by Xue and Li (2018). GCAE differs from a vanilla CNN: the ReLU gate in GCAE receives the aspect information to control the propagation of sentiment features, which further enhances the connection between aspect and context. Also, Chen et al. (2017) introduced position information, which makes the model pay more attention to sentiment modifiers closer to the current aspect, thereby improving the classification accuracy of the model in multi-aspect scenarios. Inspired by this work, a variable p_i is introduced in our model to measure the relative position between the i-th context word and the current aspect. Specifically, p_i is calculated as follows 1:

p_i = 1 − (k + n − i)/C,  if 1 ≤ i < k + n
p_i = 1 − (i − k)/C,      if k + n ≤ i ≤ m
p_i = 0,                  if i > m

where k is the index of the first aspect word, C is a pre-specified constant, and n is the length of the aspect. Then, we multiply the word representation outputted by the i-th AST unit in the L-th BET layer by the weight p_i:

x_i = p_i · ĥ_i^(L)

Then, we feed X = {x_1, x_2, ..., x_m} and the final aspect vector v_a^(L) into the gated convolutional network to generate a feature map c:

c_i = relu(X_{i:i+k−1} * W_a + V_a v_a^(L) + b_a) × tanh(X_{i:i+k−1} * W_s + b_s)

where k is the kernel size, and W_a, V_a, b_a, W_s, and b_s are learnable parameters. × denotes element-wise multiplication. Then, we apply max pooling (Kim, 2014) over the feature maps of s kernels to obtain the sentence representation z:

z = [max(c^1), max(c^2), ..., max(c^s)]

Finally, we pass z to a fully connected layer for sentiment prediction:

ŷ = softmax(W_f z + b_f)

1 As we perform sentence padding, it is possible that the index i is larger than the actual length m of the sentence.
where W f and b f are learnable parameters.

Model training
METNet can be trained in an end-to-end manner within a supervised learning framework to optimize all the parameters, notated as Θ. Cross entropy with L2 regularization is used as the loss function, which is defined as:

L(Θ) = − Σ_{i=1}^{O} y_i log ŷ_i + λ‖Θ‖²

where y_i denotes the ground truth and ŷ_i is the estimated probability for each sentiment. O stands for the number of sentiment polarities, and λ is the coefficient for L2 regularization.

Parameters Setting: We use the BERT-base-uncased pre-trained model, which contains 12 layers with a hidden dimension of 768. All weight matrices are initialized by sampling from the uniform distribution U(−0.1, 0.1). We adopt the dropout strategy after BERT with a dropout rate of 0.5. The coefficient for L2 regularization is set to 10^−4. The number of BiLSTM hidden units d_h, the constant C, and the size k and number s of the convolution kernels are set to 384, 40, 3, and 50, respectively. We train the model with the Adadelta (Zeiler, 2012) optimizer and set the learning rate to 1.
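The loss for a single example can be checked numerically; the probabilities and the stand-in parameter list below are made-up illustrative values:

```python
import numpy as np

O = 3                                  # number of sentiment polarities
y = np.array([0.0, 1.0, 0.0])          # one-hot ground truth
y_hat = np.array([0.2, 0.7, 0.1])      # estimated probabilities (illustrative)
lam = 1e-4                             # L2 coefficient lambda
theta = [np.ones((4, 4)), np.ones(4)]  # stand-in for the parameter set Theta

ce = -np.sum(y * np.log(y_hat))        # cross entropy: -sum_i y_i log(y_hat_i)
l2 = lam * sum(np.sum(p ** 2) for p in theta)  # lambda * ||Theta||^2
loss = ce + l2

print(round(float(loss), 4))
```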

Compared Methods
To justify the effectiveness of our METNet, we compare it with the following methods.
SVM (Kiritchenko et al., 2014): It is a traditional support vector machine based model with extensive feature engineering.
AE-LSTM (Wang et al., 2016): AE-LSTM is an attention-based LSTM network, which uses the attention mechanism to calculate the correlation between aspect and words in sentence.
ATAE-LSTM (Wang et al., 2016): ATAE-LSTM is an extension of AE-LSTM. Considering the importance of aspect for sentiment prediction, ATAE-LSTM adds aspect embedding to the input of the model.
IAN (Ma et al., 2017): IAN interactively learns attentions in the context and aspect and generates the representations for context and aspect separately.
MemNet (Tang et al., 2016b): MemNet contains multiple computational layers that share parameters. Each layer is a model based on content attention and location attention and assigns weights to each context word to obtain a sentence representation.
Cabasc: Cabasc uses two attention enhancement mechanisms to flexibly model the word order information, the aspect information, and the correlation between each word and the aspect.
TNet-LF: TNet-LF dynamically computes the importance of each aspect word based on each context word rather than the whole sentence.
TNet-ATT(+AS) (Tang et al., 2019): TNet-ATT(+AS) is developed based on TNet-LF. TNet-ATT(+AS) proposes an algorithm that can automatically mine attention supervision information, thereby improving the model's insufficient learning of low-frequency words with sentiment polarity.
IARM (Majumder et al., 2018): IARM generates independent aspect-aware sentence representations for all aspects in a sentence to help predict the sentiment polarity of current aspect.
HGMN (Ran et al., 2019): HGMN distills out aspect-specific effective text spans in sentence instead of only the aggregated contextual representation based on attention score.
Coattention-MemNet (Yang et al., 2019): Coattention-MemNet learns the key features from the aspect and context alternately with an iteration mechanism.

Table 2: Experimental results (%). The best results are in bold. The results with symbol "#" are reproduced under the same conditions as the original paper. Starred (*) results are from Dong et al. (2014) and starred (**) results are from . Other results are retrieved from the original papers.

Table 2 reports the performance of each model on the three datasets. The main evaluation metrics are Accuracy and Macro-averaged F1-score.

Experimental Results and Analysis
Analysis of METNet: We compare the classification accuracy of METNet with all baselines; the main results are shown in Table 2. As the results show, METNet outperforms all baselines on the Laptop and Restaurant datasets. Among the baselines, TNet-ATT(+AS) achieves the best result on the Laptop dataset due to its progressive self-supervised attention learning approach, and HGMN achieves the best result on the Restaurant dataset due to its hierarchical gate mechanism. METNet achieves 0.75% and 0.17% accuracy improvements on the Laptop and Restaurant datasets over TNet-ATT(+AS) and HGMN respectively, which indicates the effectiveness of METNet. METNet did not perform as well on the Twitter dataset; the reason may be that METNet is less suited to a dataset composed of single-aspect sentences.
In addition, to highlight the advantages of METNet in multi-aspect scenarios, we conduct further experiments and include the relevant results of IARM from its original paper. Specifically, we first remove the single-aspect samples from the Laptop and Restaurant test sets and mark the new datasets as Laptop* and Restaurant*. Then, we apply the trained TNet-LF and METNet to the Laptop* and Restaurant* datasets and present the results in Table 3. We find that METNet achieves significant accuracy improvements, which indicates its effectiveness in multi-aspect scenarios.

Effects of Aspect Enhancement Module: We validate that the aspect enhancement module is effective for the ABSA task by comparing METNet with METNet w/o aspect enhancement module. Note that METNet w/o X denotes METNet with X removed; that is, METNet w/o aspect enhancement module is the version of METNet with the aspect enhancement module removed. We conduct ablation experiments, and the results are shown in Table 2. Compared with METNet w/o aspect enhancement module, METNet achieves 0.31%, 1.16%, and 1.30% accuracy improvements on the Laptop, Restaurant, and Twitter datasets respectively, which demonstrates the effectiveness of the aspect enhancement module.
On this basis, we also conduct a set of experiments to examine the importance of the aspect enhancement module in multi-aspect scenarios. We apply the trained METNet w/o aspect enhancement module to the Laptop* and Restaurant* datasets and then record and analyze the results. In Table 3, METNet achieves 1.57% and 1.56% accuracy improvements on Laptop* and Restaurant* compared with METNet w/o aspect enhancement module, which effectively demonstrates the importance of aspect information in multi-aspect scenarios. We also observe in Table 3 that METNet w/o aspect enhancement module outperforms TNet-LF and IARM, which may benefit from the BET layer we designed and the application of GCAE.
Effects of the Number of BET Layers: From Table 4, we can see that METNet achieves the best results when the layer number LN is 2. We also tried larger values of LN, and the classification accuracy generally becomes worse, probably because more parameters increase the training difficulty.

Case Study
We pick some test examples to further evaluate the performance of METNet and present several example sentences in Table 4. In sentences (5) and (6), the sentiment of each aspect is determined by related opinion words, and the proposed METNet makes correct predictions even without using the attention mechanism. Moreover, METNet is better at handling long and complex sentences with multiple aspects than ATAE-LSTM and TNet. For example, sentences (1), (6), and (7) are typical long reviews involving multiple aspects, and METNet makes correct predictions on all of their aspects. This is because, compared with ATAE-LSTM and TNet, METNet has the aspect enhancement module, which gives the aspect rich semantic information and enhances the connection between aspect and context. In addition, we find that METNet, ATAE-LSTM, and TNet all perform poorly on sentences (2) and (3). In these cases, the difficulty of prediction comes from comparisons between aspects, which require inference over implicit semantics and remain quite challenging for neural network models.

Sample Sentences | ATAE-LSTM | TNet | METNet
(1) Would you ever believe that when you complain about over an hour wait [N], when they tell you it will be 20-30 minutes, the manager [N] tells the bartender [O] to spill the drinks [O] you just paid for? | N, N, N, O | N, N, N, O | N, N, O, O
(2) New hamburger with special sauce [P] is ok - at least better than big mac [N]! | O, P | N, N | P, P
(3) Price [N] was higher when purchased on mac when compared to price [P] showing on pc when I bought this product. | N, N | N, N | N, N
(4) Great food [P] but the service [N] was dreadful! | P, P | P, N | P, N
(5) They really provide a relaxing, laid-back atmosphere [P]. | P | P | P
(6) Not only did they have amazing, sandwiches [P], soup [P], pizza [P] etc, but their homemade sorbets [P] are out of this world! | P, P, P, P | N, P, P, P | P, P, P, P
(7) This is one great place to eat pizza [P] more out but not a good place for take-out pizza [N]. | P, P | P, N | P, N

Conclusion
In this paper, we present an end-to-end solution, the mutual enhanced transformation network (METNet), to address the issue of insufficient aspect representation learning in the ABSA task. First, we propose the Bidirectional Enhancement Transformation (BET) component to improve the representation learning of the aspect, which meanwhile achieves alternating learning of aspect and context. Second, METNet uses a hierarchical structure to iteratively learn aspect and context representations. Experimental results demonstrate the effectiveness of METNet for aspect-based sentiment analysis. In particular, METNet performs well in both single-aspect and multi-aspect scenarios. As future work, we will consider how to use sentiment lexicons and the attention mechanism to further improve aspect representation learning.