Synchronous Double-channel Recurrent Network for Aspect-Opinion Pair Extraction

Opinion entity extraction is a fundamental task in fine-grained opinion mining. Related studies generally extract aspects and/or opinion expressions without recognizing the relations between them. However, the relations are crucial for downstream tasks, including sentiment classification, opinion summarization, etc. In this paper, we explore the Aspect-Opinion Pair Extraction (AOPE) task, which aims at extracting aspects and opinion expressions in pairs. To deal with this task, we propose the Synchronous Double-channel Recurrent Network (SDRN), mainly consisting of an opinion entity extraction unit, a relation detection unit, and a synchronization unit. The opinion entity extraction unit and the relation detection unit are developed as two channels to extract opinion entities and relations simultaneously. Furthermore, within the synchronization unit, we design an Entity Synchronization Mechanism (ESM) and a Relation Synchronization Mechanism (RSM) to enhance the mutual benefit between the above two channels. To verify the performance of SDRN, we manually build three datasets based on the SemEval 2014 and 2015 benchmarks. Extensive experiments demonstrate that SDRN achieves state-of-the-art performance.


Introduction
Opinion entity extraction, which aims at identifying aspects and/or opinion expressions in review sentences, is an important task in fine-grained opinion mining. Recently, considerable research has focused on this task. Specifically, Liu et al. (2012), Li and Lam (2017) and Li et al. (2018) explored aspect term extraction, and Fan et al. (2019) extracted opinion phrases with given aspects. Meanwhile, many studies dealt with aspect and opinion term co-extraction (Xu et al., 2013; Liu et al., 2015; Wang et al., 2017; Wang and Pan, 2019; Dai and Song, 2019). These studies have shown the importance of opinion entity extraction and achieved great progress. However, they neglect to recognize the relations between aspects and opinion expressions.
While aspect-opinion relation detection is one of the key parts of an opinion mining system (Hu and Liu, 2004; Popescu and Etzioni, 2005; Zhuang et al., 2006), it is often neglected or assumed to be given beforehand, which leaves a significant gap to subsequent opinion mining tasks. For instance, as shown in Figure 1, we can obtain the aspect {food} and the opinion expressions {nice-looking, delicious} from opinion entity extraction. Although both nice-looking and delicious express positive sentiment, they describe food from the appearance and taste perspectives, respectively. Therefore, only with the relations between aspects and opinion expressions, e.g., the pair ⟨food, delicious⟩, can more fine-grained subsequent tasks be executed, such as pair-level sentiment classification, pair-level opinion clustering, etc.
To bridge the gap between opinion entity extraction and subsequent tasks, we explore the Aspect-Opinion Pair Extraction (AOPE) task, which aims at extracting aspects and opinion expressions along with their relations. Notably, AOPE is not only necessary for subsequent tasks, but also beneficial to both opinion entity extraction and relation detection. However, studies on AOPE are very limited. Early works (Hu and Liu, 2004; Zhuang et al., 2006) approached aspect-opinion pair extraction in a pipeline manner by dividing it into two isolated tasks. Yang and Cardie (2013), Klinger and Cimiano (2013b) and Katiyar and Cardie (2016) attempted to extract opinion entities and relations jointly, but without considering the interaction between opinion entity extraction and relation detection, which limits the performance.
Therefore, AOPE remains a rather challenging task. First, the relational structure of aspects and opinion expressions within a sentence can be complicated, requiring the model to be effective and flexible in detecting relations. For example, the relations can be one-to-many, many-to-one, and even embedded or overlapped. Second, opinion entity extraction and relation detection are not two independent tasks as in other multi-task learning problems but rely on each other, hence posing a key challenge on how to fuse and learn the two subtasks properly. Third, how to synchronize opinion entity extraction with relation detection so that the two mutually promote each other is another primary challenge.
To address the aforementioned challenges, we propose the Synchronous Double-channel Recurrent Network (SDRN). Specifically, we first utilize Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) to learn context representations. Then, the double-channel recurrent network, which consists of an opinion entity extraction unit and a relation detection unit, is constructed to extract aspects, opinion expressions, and relations simultaneously. To enable information interaction between the above two channels, we design a synchronization unit which contains an Entity Synchronization Mechanism (ESM) and a Relation Synchronization Mechanism (RSM). Extensive experiments verify that our model achieves state-of-the-art performances. In summary, our contributions are three-fold:

• We explore the AOPE task, which is valuable and critical for downstream tasks but remains under-investigated.

• We propose an end-to-end neural model, SDRN (available at https://github.com/NKU-IIPLab/SDRN). By adopting BERT as the encoding layer, SDRN can learn rich context semantics. By designing the double-channel network and two synchronization mechanisms, SDRN processes opinion entity extraction and relation detection jointly and makes them mutually beneficial.

• We manually build three datasets based on the SemEval 2014 and 2015 benchmarks for the AOPE task. Extensive experiments are conducted to verify that our model achieves state-of-the-art performances.

Related Work
Aspect-opinion pair extraction is a critical task in fine-grained opinion mining. Early studies approached this task in a pipeline manner. Hu and Liu (2004) used association mining to identify aspects and extracted the adjacent adjectives as opinions. Zhuang et al. (2006) extracted aspects and opinion expressions first, and then mined the relations with dependency relation templates. Popescu and Etzioni (2005) proposed an unsupervised model to extract aspects and corresponding opinions from reviews with pre-defined rules. Although the above methods achieved great progress, they generally suffered from error propagation. To avoid error propagation, recent studies propose joint learning methods. Klinger and Cimiano (2013a) adopted an Imperatively Defined Factor graph (IDF) to analyze the inter-dependencies between aspects and opinion expressions. Klinger and Cimiano (2013b) presented a joint inference model based on IDF to extract aspect terms, opinion terms, and their relations. Yang and Cardie (2013) employed Integer Linear Programming (ILP) to jointly identify opinion-related entities and their associated relations. However, these works were generally based on shallow machine learning methods and depended on hand-crafted features.
To automatically capture features, neural network methods have been applied to various fine-grained opinion mining tasks. Xu et al. (2018) used a Convolutional Neural Network (CNN) to extract aspects. Wang et al. (2016), Wang et al. (2017), and Wang and Pan (2019) used deep learning methods to deal with aspect and opinion term co-extraction. Li et al. (2018) focused on aspect term extraction and adopted an attention mechanism to exploit the latent relations between aspect and opinion terms. Hu et al. (2019) used BERT to extract aspects and corresponding sentiments. For AOPE, Katiyar and Cardie (2016) explored LSTM-based models to jointly extract opinion entities and their relations with three optimization methods. However, this method neglects the interaction between opinion entity extraction and relation detection.
Therefore, AOPE is still under-investigated and needs further research. In this paper, we further explore this task and propose a neural model, SDRN.

Model
Given a review sentence S, the Aspect-Opinion Pair Extraction (AOPE) task aims to obtain a collection of aspect-opinion pairs C = {⟨a_m, o_m⟩}^M_{m=1} from S, where a_m and o_m represent the aspect and the opinion expression, respectively.
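As a concrete illustration of the task's input and output, consider the running example from the introduction; the data structure below is our own sketch, not the paper's format:

```python
# Hypothetical illustration of the AOPE output format: for a review sentence
# S, the model returns a collection C of <aspect, opinion> pairs. One aspect
# may participate in several pairs (one-to-many relations).

def aope_output_example():
    sentence = "The food is nice-looking and delicious"
    # Each pair <a_m, o_m> links an aspect to an opinion expression that
    # modifies it.
    pairs = [("food", "nice-looking"), ("food", "delicious")]
    return sentence, pairs

sentence, pairs = aope_output_example()
```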
To deal with AOPE task, we propose Synchronous Double-channel Recurrent Network (SDRN). The overall framework of SDRN is illustrated in Figure 2. Specifically, we first adopt BERT as the encoding layer to learn the context representations. Then, an opinion entity extraction unit and a relation detection unit are constructed as double channels to extract aspects, opinion expressions, and relations simultaneously. Furthermore, a synchronization unit is designed to enable information interaction between the double channels. To capture high-level representations, we recurrently execute the above units. After multiple recurrent steps, we adopt an inference layer to obtain aspect-opinion pairs.
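The recurrent execution described above can be sketched as the following control flow; the callable names (entity_unit, relation_unit, esm, rsm) are our own placeholders for the units defined in the remainder of this section:

```python
# Control-flow sketch of SDRN's double-channel recurrence, assuming the four
# components are supplied as callables. The encoder output H_s stays fixed
# across steps; only the synchronization semantics U and R are updated.

def sdrn_forward(H_s, entity_unit, relation_unit, esm, rsm, T=2):
    """Run the double-channel network for T recurrent steps and return the
    final label sequence Y and attention (relation) matrix G."""
    U = None  # entity synchronization semantics (initialized to zero in the paper)
    R = None  # relation synchronization semantics (likewise zero-initialized)
    Y = G = None
    for t in range(T):
        Y = entity_unit(H_s, R)    # channel 1: sequence labeling (CRF)
        G = relation_unit(H_s, U)  # channel 2: supervised self-attention
        U = esm(H_s, Y)            # entity semantics for the next step
        R = rsm(H_s, G)            # relation semantics for the next step
    return Y, G
```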

Encoding Layer
Given a review sentence S, we first tokenize it using the WordPiece vocabulary (Wu et al., 2016) and add tokens [CLS] and [SEP] to the beginning and the end of the tokenized sentence, respectively. As a result, we obtain the input sequence X = {x 1 , x 2 , ..., x N } with N tokens for each sentence.
Inspired by the success of BERT (Devlin et al., 2019), we adopt it as the encoder to learn the contextual semantics. For each token x_i, the initial embedding e_i is constructed by summing the corresponding token embedding e^w_i, segment embedding e^s_i, and position embedding e^p_i. Then, the embedding sequence E = {e_1, e_2, ..., e_N} is fed into BERT, which consists of stacked Transformer blocks with multiple self-attention heads (Vaswani et al., 2017). We take the output of the last Transformer block as the context representation sequence H^s = {h^s_1, h^s_2, ..., h^s_N}.
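The input construction can be sketched numerically as follows; the dimensions are toy values rather than BERT's actual sizes (BERT-base uses 768):

```python
import numpy as np

# Sketch of the BERT-style input construction: each token's initial embedding
# is the element-wise sum of a WordPiece token embedding, a segment embedding,
# and a position embedding. Random values stand in for learned parameters.

rng = np.random.default_rng(0)
N, d = 6, 8                    # 6 tokens ([CLS] ... [SEP]), toy dimension 8
e_w = rng.normal(size=(N, d))  # token (WordPiece) embeddings
e_s = np.zeros((N, d))         # single-sentence input: one segment, all zeros here
e_p = rng.normal(size=(N, d))  # position embeddings
E = e_w + e_s + e_p            # initial embedding sequence fed into the encoder
```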

Opinion Entity Extraction Unit
The opinion entity extraction unit, which aims at extracting the aspects and the opinion expressions, is developed as one channel of SDRN. To deal with this sequence labeling task, we couple a Conditional Random Field (CRF) (Lafferty et al., 2001) upon the encoding layer, which serves as the opinion entity extraction unit. Formally, CRF adopts a state score matrix P ∈ R^{N×K} to model the mappings between tokens and labels, and a transition score matrix Q ∈ R^{K×K} to model the relations between adjacent labels, where K denotes the dimension of the label space. For a sequence of predicted labels Y_t = {y_{t,1}, y_{t,2}, ..., y_{t,N}} at the t-th recurrent step, we define its score as follows:

s(X, Y_t) = Σ_{i=1}^{N} P_{i, y_{t,i}} + Σ_{i=2}^{N} Q_{y_{t,i-1}, y_{t,i}},  with  P = H^o_t W_p + b_p,

where H^o_t = {h^o_{t,1}, ..., h^o_{t,N}} denotes the input hidden representation sequence at the t-th recurrent step for the opinion entity extraction unit, which is calculated with the context representation sequence H^s and the relation synchronization semantics R_{t-1}. The details will be described in Section 3.3.2. The matrices W_p ∈ R^{d_o×K} and b_p ∈ R^{N×K} are model parameters, where d_o denotes the dimension of the hidden representation h^o_{t,i}. Then, the probability of the predicted sequence Y_t can be calculated as follows:

p(Y_t | X) = exp(s(X, Y_t)) / Σ_{Y' ∈ Y_X} exp(s(X, Y')),

where Y_X denotes all possible label sequences. During training, we maximize the likelihood p(Y | X) of the gold label sequence at the last step. During decoding, we use the Viterbi algorithm to find the label sequence with the maximum score.
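The Viterbi decoding step can be sketched generically as follows, given the state scores P and transition scores Q described above (a textbook implementation, not the authors' code):

```python
import numpy as np

# Toy Viterbi decoder over CRF scores: P[i, k] is the state score of label k
# at token i, and Q[k, l] is the transition score from label k to label l.
# Returns the label sequence with the maximum total score.

def viterbi(P, Q):
    N, K = P.shape
    dp = np.zeros((N, K))                # best score of a path ending in label k at token i
    back = np.zeros((N, K), dtype=int)   # back-pointers for path recovery
    dp[0] = P[0]
    for i in range(1, N):
        # candidates[k, l]: best path ending in k at i-1, then transition k->l
        candidates = dp[i - 1][:, None] + Q + P[i][None, :]
        back[i] = candidates.argmax(axis=0)
        dp[i] = candidates.max(axis=0)
    # follow back-pointers from the best final label
    path = [int(dp[-1].argmax())]
    for i in range(N - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```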

Relation Detection Unit
To extract opinion entities and relations simultaneously, we design a relation detection unit as the other channel of SDRN. Considering the complicated relations between aspects and opinion expressions, we devise a supervised self-attention mechanism as the relation detection unit to flexibly model token-level relations without the sequential limitation.
At the t-th recurrent step, we first compute the attention matrix G_t ∈ R^{N×N}, whose element g^t_{i,j} represents the degree of correlation between the i-th token and the j-th token, as follows:

g^t_{i,j} = γ(h^r_{t,i}, h^r_{t,j}) = sigmoid((W^3_r)^T tanh(W^1_r h^r_{t,i} + W^2_r h^r_{t,j})),

where γ is the score function and h^r_{t,i} denotes the input hidden representation of the i-th token for the relation detection unit. Note that the hidden representation sequence is calculated with the context representation sequence H^s and the entity synchronization semantics U_{t-1}. The details will be described in Section 3.3.1. The matrices W^1_r ∈ R^{d_r×d_r}, W^2_r ∈ R^{d_r×d_r}, and W^3_r ∈ R^{d_r×1} are model parameters, where d_r is the dimension of the hidden representation h^r_{t,i}. At the last step T, we further introduce supervision into the calculation of the attention matrix G_T by maximizing the likelihood:

max Σ_{i=1}^{N} Σ_{j=1}^{N} log p(z_{i,j} | x_i, x_j),

where the gold relation matrix Z ∈ R^{N×N} consists of elements z_{i,j}, and the relation probability can be calculated as follows:

p(z_{i,j} = 1 | x_i, x_j) = g^T_{i,j},   p(z_{i,j} = 0 | x_i, x_j) = 1 - g^T_{i,j},

where z_{i,j} = 1 denotes that there is a relation between the i-th token and the j-th token, and z_{i,j} = 0 otherwise. With this supervision, the attention can be guided to capture the correlations between tokens more effectively.
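A minimal numerical sketch of the pairwise score computation follows; the parameter shapes match those given in the text, while the sigmoid normalization of scores into probabilities is our assumption (the unit only requires that g_{i,j} measure token-pair correlation):

```python
import numpy as np

# Sketch of the token-pair score gamma from the relation detection unit, with
# W1, W2 in R^{d_r x d_r} and W3 in R^{d_r x 1} as stated in the text.

def relation_scores(H_r, W1, W2, W3):
    """H_r: (N, d_r) hidden representations; returns an (N, N) score matrix
    with entries in (0, 1)."""
    left = H_r @ W1.T    # (N, d_r): W1 h_i for every i
    right = H_r @ W2.T   # (N, d_r): W2 h_j for every j
    # broadcast to all (i, j) pairs, apply tanh, project with W3
    scores = np.tanh(left[:, None, :] + right[None, :, :]) @ W3  # (N, N, 1)
    return 1.0 / (1.0 + np.exp(-scores[..., 0]))                 # sigmoid (assumed)
```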

Synchronization Unit
Since the above two channels are interdependent, it is important to synchronize their information and make them mutually beneficial. To this end, we design Entity Synchronization Mechanism (ESM) and Relation Synchronization Mechanism (RSM) to update the hidden representation sequences H o t and H r t by exchanging the high-level information.

Entity Synchronization Mechanism
Considering that opinion entities are generally phrases, both opinion entity semantics and token-level interactions are crucial in detecting relations. For instance, given the aspect 'hot dog' and the opinion expression 'tasty', there is no relation between 'hot' and 'tasty' when only token-level interaction is considered, but it is easy to detect the relation if we utilize the semantics of the aspect 'hot dog'.
Accordingly, we design ESM to capture the corresponding entity semantics for each token and integrate these semantics into the hidden representation sequence H^r_{t+1}. Specifically, based on the predicted label sequence Y_t and its probability obtained from the opinion entity extraction unit, the entity semantics u_{t,i} of the i-th token at the t-th recurrent step can be calculated as follows:

u_{t,i} = Σ_{j=1}^{N} ϕ(B^t_{i,j}) h^s_j,

where B^t_{i,j} is the label probability of the j-th token if the i-th token and the j-th token belong to the same entity, and zero otherwise; ϕ(·) is a normalization function.
To integrate both the context representation h^s_i and the entity semantics u_{t,i}, we calculate the hidden representation h^r_{t+1,i} as follows:

h^r_{t+1,i} = σ((W^4_r)^T h^s_i + (W^5_r)^T u_{t,i}),

where W^4_r ∈ R^{d_s×d_r} and W^5_r ∈ R^{d_s×d_r} are model parameters, d_s is the dimension of the context representation, and σ is the activation function, which can be the tanh or sigmoid function. Note that we use a zero matrix to initialize the entity semantics sequence U_0 = {u_{0,1}, u_{0,2}, ..., u_{0,N}}.
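ESM's aggregation can be sketched as follows; the toy matrix B mimics the 'hot dog' example above, and the exact form of the normalization ϕ is our assumption:

```python
import numpy as np

# Toy sketch of the Entity Synchronization Mechanism: for each token i, the
# entity semantics u_i aggregate the context representations of tokens that
# the CRF placed in the same entity, weighted by their (row-normalized)
# label probabilities B[i, j]. B[i, j] = 0 when i and j are in different
# entities.

def entity_semantics(H, B):
    """H: (N, d) context representations; B: (N, N) same-entity label probs."""
    W = B / np.maximum(B.sum(axis=1, keepdims=True), 1e-9)  # phi: row-normalize
    return W @ H                                            # u_i = sum_j W[i,j] h_j

# 'hot dog' example: tokens 0-1 form one entity, token 2 is a separate one
H = np.eye(3)
B = np.array([[0.9, 0.8, 0.0],
              [0.8, 0.9, 0.0],
              [0.0, 0.0, 1.0]])
U = entity_semantics(H, B)
```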

Relation Synchronization Mechanism
Since the relations between opinion entities can provide clues for opinion entity extraction, it's important to encode the relation semantics. For example, if 'overrated' is used to modify 'pizza', this relation could provide guidance to extract the aspect 'pizza' and the opinion expression 'overrated'.
Thus, we design RSM to capture the semantics reflecting the relations and to update the hidden representation sequence H^o_{t+1}. Concretely, at the t-th recurrent step, we calculate the relation semantics r_{t,i} of the i-th token with the correlation degrees g^t_{i,j} from the relation detection unit:

r_{t,i} = Σ_{j=1}^{N} ϕ(φ(g^t_{i,j})) h^s_j,

where ϕ(·) is the same normalization function as in Eq. (9). To avoid noise, we utilize φ(·) to filter out correlation scores below the given threshold β. Then, we combine the relation semantics r_{t,i} and the context representation h^s_i to obtain the hidden representation h^o_{t+1,i}:

h^o_{t+1,i} = σ((W^1_o)^T h^s_i + (W^2_o)^T r_{t,i}),

where W^1_o ∈ R^{d_s×d_o} and W^2_o ∈ R^{d_s×d_o} are model parameters. Similar to ESM, the initial relation semantics sequence R_0 = {r_{0,1}, r_{0,2}, ..., r_{0,N}} is set to zero.
Particularly, the integration methods used in ESM and RSM can also make the proposed SDRN easy to optimize, which is similar to the shortcut connections (He et al., 2016).
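RSM's threshold-filtered aggregation can be sketched analogously; the filter φ and normalization ϕ are implemented here in one plausible way, not necessarily the authors':

```python
import numpy as np

# Sketch of the Relation Synchronization Mechanism: the relation semantics
# r_i aggregate the context representations of tokens correlated with token
# i, after dropping correlation scores below the threshold beta (phi then
# re-normalizes the surviving scores per row).

def relation_semantics(H, G, beta=0.1):
    """H: (N, d) context representations; G: (N, N) correlation scores."""
    F = np.where(G >= beta, G, 0.0)                         # phi-filter: drop weak scores
    W = F / np.maximum(F.sum(axis=1, keepdims=True), 1e-9)  # normalize rows
    return W @ H                                            # r_i = sum_j W[i,j] h_j

H = np.eye(3)
G = np.array([[0.05, 0.90, 0.05],
              [0.90, 0.05, 0.05],
              [0.05, 0.05, 0.05]])
R = relation_semantics(H, G, beta=0.1)
```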

Joint Learning
To synchronously learn the proposed two channels, we fuse the loss functions from the two channels. For the opinion entity extraction unit, given the gold label sequence Y, we minimize the negative log-likelihood loss function at the last step as follows:

L_e = -log p(Y | X).   (14)

For the relation detection unit, we convert the gold annotation into a one-hot matrix, where 0 denotes no relation and 1 denotes the existence of a relation between two tokens. Then, we minimize the cross-entropy loss between the predicted distribution p̂(z_{i,j} | x_i, x_j) at the last step and the gold distribution p(z_{i,j} | x_i, x_j) as follows:

L_r = -Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{z_{i,j} ∈ {0,1}} p(z_{i,j} | x_i, x_j) log p̂(z_{i,j} | x_i, x_j).   (15)

Then, the two parts are combined to construct the loss objective of the entire model:

L = L_e + L_r.   (16)

The optimization problem in Eq. (16) can be solved using any gradient descent method. In this paper, we adopt the BERTAdam method.
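The combined objective can be sketched numerically as follows; the unweighted sum of the two losses is our assumption, since the text only states that the two parts are combined:

```python
import numpy as np

# Numerical sketch of the joint objective: the entity loss is the CRF
# negative log-likelihood of the gold sequence (passed in as a scalar here),
# and the relation loss is a token-pair binary cross-entropy.

def relation_loss(P_hat, Z):
    """P_hat: (N, N) predicted relation probabilities; Z: (N, N) 0/1 gold."""
    eps = 1e-9  # numerical guard against log(0)
    return -np.mean(Z * np.log(P_hat + eps) + (1 - Z) * np.log(1 - P_hat + eps))

def joint_loss(entity_nll, P_hat, Z):
    return entity_nll + relation_loss(P_hat, Z)

Z = np.array([[0, 1], [1, 0]])
P_hat = np.array([[0.1, 0.9], [0.9, 0.1]])
loss = joint_loss(0.5, P_hat, Z)
```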

Inference Layer
Because SDRN synchronously processes opinion entity extraction and relation detection, an inference layer is introduced to generate aspect-opinion pairs based on the results of the two channels.
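Such an inference step can be sketched as follows, assuming the attention matrix G and the predicted entity spans are given; the span-averaged scoring mirrors the correlation degree used to pair aspects with opinion expressions:

```python
import numpy as np

# Sketch of pair inference: the correlation degree between an aspect span and
# an opinion span averages the attention weights G over all token pairs drawn
# from the two spans, and the pair is kept when the degree exceeds a
# threshold (0.5 in the experiments). Spans are inclusive token indices.

def pair_score(G, a_span, o_span):
    i0, i1 = a_span  # inclusive start/end token indices of the aspect
    j0, j1 = o_span  # inclusive start/end token indices of the opinion
    return G[i0:i1 + 1, j0:j1 + 1].mean()

def extract_pairs(G, aspects, opinions, threshold=0.5):
    return [(a, o) for a in aspects for o in opinions
            if pair_score(G, a, o) > threshold]
```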
With the label sequence Y_T predicted by the opinion entity extraction unit at the last recurrent step, we can obtain the aspect set A = {a_1, a_2, ..., a_{l_A}} with l_A aspects and the opinion set O = {o_1, o_2, ..., o_{l_O}} with l_O opinion expressions. Then, the relations between aspects and opinion expressions are determined according to the attention matrix G_T from the relation detection unit. For instance, given an aspect a = {x_{i^a_S}, ..., x_{i^a_E}} and an opinion expression o = {x_{i^o_S}, ..., x_{i^o_E}}, the correlation degree δ between them can be calculated as follows:

δ = (1 / (|a| · |o|)) Σ_{i=i^a_S}^{i^a_E} Σ_{j=i^o_S}^{i^o_E} g^T_{i,j},

where |a| and |o| denote the lengths of the aspect and the opinion expression. The pair ⟨a, o⟩ is extracted only if δ is higher than a given threshold δ̂.


Experiments


Datasets

We conduct experiments on SemEval 2014 (Pontiki et al., 2014), SemEval 2015 (Pontiki et al., 2015), the MPQA version 2.0 corpus (Wiebe et al., 2005), and the J.D. Power and Associates Sentiment Corpora (JDPA) (Kessler et al., 2010). The statistics of these benchmark datasets are shown in Table 1. For the SemEval 2014 and 2015 datasets, we manually build relations between aspects and opinion expressions because the original datasets only contain gold-standard annotations for aspects. Note that we follow the annotations for opinion expressions provided by Wang et al. (2016) and Wang et al. (2017).

Experimental Setting
We adopt the BERT-BASE model, which consists of 12 Transformer blocks with 12 self-attention heads, as the encoding layer of SDRN. The dimensions of both the embeddings and the context representation in BERT-BASE are 768. To enhance the information interaction between the double channels, we set the recurrent step to 2. During training, we use the BERTAdam optimizer with a 0.1 warmup rate. The learning rate is set to 2e-5 and 0.001 for fine-tuning BERT and training our model, respectively. Meanwhile, we set the batch size to 10 and the dropout rate to 0.5. Via cross-validation, the other hyper-parameters are set as follows: d_o = 250, d_r = 250, β = 0.1, and δ̂ = 0.5.

Evaluation
We use the F1-score to evaluate the performance of SDRN. We consider a predicted aspect-opinion pair to be correct if the gold-standard annotations contain a pair identical to the prediction. Besides, following Katiyar and Cardie (2016), we report the Binary Overlap F1-score for the MPQA dataset.
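The exact-match pair F1 computation can be sketched as follows (computed over a single toy prediction set; the pairs are invented for illustration):

```python
# Sketch of the exact-match pair F1 evaluation: a predicted <aspect, opinion>
# pair counts as correct only if the identical pair appears in the gold
# annotations.

def pair_f1(predicted, gold):
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)                  # exact-match true positives
    if not pred or not gold or tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)  # precision and recall
    return 2 * p * r / (p + r)

f1 = pair_f1([("food", "delicious"), ("service", "slow")],
             [("food", "delicious"), ("food", "nice-looking")])
```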

Baselines
To achieve a comprehensive comparative analysis of SDRN, we compare it with two kinds of models: Pipeline methods and Joint methods. Pipeline models are denoted in the form '{*}+{#}', where '*' is the opinion entity extraction method and '#' is the relation detection method.

Pipeline method
For Pipeline methods, we first select five advanced extraction models to recognize opinion entities. Then, we train the relation detection unit (RD), separated from SDRN, with BERT to detect relations. The details of RD are described in Section 3.2.2. The outputs of the extraction models are fed into the RD model to predict relations and obtain aspect-opinion pairs. The five extraction models are described as follows: • HAST (Li et al., 2018) exploits two useful clues, namely opinion summary and aspect detection history, to extract aspects with the help of opinion information. Note that HAST can also extract aspects and opinion expressions simultaneously.
• DE-CNN (Xu et al., 2018) is a simple but effective CNN model employing two types of pre-trained embeddings: general-purpose and domain-specific embeddings. We trained two DE-CNN models for aspect and opinion expression extraction, respectively.
• IMN (He et al., 2019) is an interactive multitask learning network which jointly learns multiple tasks, including aspect and opinion term co-extraction, aspect-level sentiment classification, etc.
• SPAN (Hu et al., 2019) is a span-based extraction framework based on BERT. We trained two SPAN models for aspect and opinion expression extraction, respectively.
• RINANTE (Dai and Song, 2019) is a weakly supervised opinion entity extraction model trained with human-labeled data and rule-labeled auxiliary data.

Joint method
To sufficiently verify the performance of SDRN, we also compare it with Joint models: IDF (Klinger and Cimiano, 2013b), CRF+ILP (Yang and Cardie, 2013), and LSTM+SLL+RLL (Katiyar and Cardie, 2016). The details can be found in Section 2.

Experimental Results
We demonstrate and analyze the experimental results to answer the following research questions:

• How does SDRN perform compared with the baselines on the AOPE task?

• Can the performance of the opinion entity extraction subtask be improved by joint learning with relation detection?

• Does the synchronization unit promote information interaction and further enhance the joint learning?

Pair Extraction
The comparison results of aspect-opinion pair extraction are shown in Table 2 and Table 3. According to the results, SDRN consistently obtains state-of-the-art performance on all five datasets. Compared to the best pipeline model, SDRN outperforms SPAN+RD by 2.31%, 1.14% and 3.39% on 14-Res, 14-Lap and 15-Res, respectively. This indicates that the joint model can effectively avoid the error propagation caused by pipeline models. Furthermore, SPAN+RD outperforms the other baselines, which shows that BERT can capture rich context representations. Besides, HAST+RD, IMN+RD and RINANTE+RD, which utilize aspect and opinion term co-extraction models, achieve better performance than DE-CNN+RD. This shows that it is helpful to detect relations by considering the latent relations between aspects and opinion expressions during the extraction phase.
We also compare SDRN with joint models on the JDPA and MPQA datasets, and the results are reported using 10-fold cross-validation. According to Table 3, our model brings significant improvements without any hand-crafted features. Particularly, for pair extraction, the results of IDF Joint are 7.4% and 10.5% inferior to IDF Pipeline on the JDPA Camera and JDPA Car datasets. This illustrates that joint models may perform worse than pipeline models without adequate information interaction between opinion entity extraction and relation detection.

Opinion Entity Extraction
Although our task aims to identify aspect-opinion pairs, it is interesting to investigate the performance of opinion entity extraction. Hence, we compare SDRN with representative aspect and opinion expression extraction methods. The results are shown in Table 4. SDRN clearly achieves state-of-the-art results on the three datasets, which proves that opinion entity extraction can be significantly improved by joint training with relation detection. Besides, the aspect and opinion term co-extraction models are generally superior to aspect term extraction models, which demonstrates that jointly extracting aspects and opinion expressions benefits both. HAST and SPAN are special cases of aspect term extraction models, because HAST extracts aspects with the help of opinion semantics, and SPAN adopts BERT as the backbone model.

Synchronization Unit
To investigate the efficacy of the synchronization unit composed of ESM and RSM, we perform an ablation study and list the results in the second block of Table 4.

[Table 4: Experimental results of opinion entity extraction (F1-score, %). A and O represent the aspect extraction and the opinion expression extraction, respectively. Methods marked with '†' are aspect and opinion term co-extraction models; the others are aspect term extraction models. Results marked with '*' are reproduced by us; the others are copied from the original papers. The improvements over baselines are significant (p < 0.05).]

Compared with Pipeline models, 'SDRN w/o ESM&RSM' is less competitive, which demonstrates that joint learning alone is not superior to the pipeline manner. By utilizing ESM or RSM, the performance is improved, which shows that either mechanism is helpful. Notably, the contribution of ESM is slightly larger than that of RSM. Moreover, with both synchronization mechanisms, SDRN surpasses all the baselines.

Convergence and Sensitivity Study
Figure 3 shows the performance of SDRN with different numbers of recurrent steps. It can be observed that the performance of SDRN first increases and then becomes steady or declines slightly as the step number increases. For 15-Res, the limited amount of training data may be the cause of the performance decline. The best results are generally obtained with two steps on all three datasets, indicating that two recurrent steps are enough for SDRN to exploit the interaction information.

Visualization and Case Study
In order to verify the relation detection capability of SDRN, we visualize the attention scores in Figure 4. It shows that SDRN can accurately capture the relations between aspects and opinion expressions, even in complex reviews. To analyze the effect of the joint learning and the synchronization unit, some predictions of SDRN, 'SDRN w/o ESM&RSM' and SPAN+RD are listed in Table 5. It can be concluded that SPAN+RD suffers from error propagation. For example, it divides 'selection of food' into 'selection' and 'food' in Review #2, and misses 'laid-back' in Review #3. In the pipeline manner, it is impossible to obtain a correct pair once the entities are extracted incorrectly in the first step. Due to the lack of information interaction, 'SDRN w/o ESM&RSM' generally makes relation detection errors when relations are complex. For example, it extracts the erroneous pair ⟨receiver, superlatives⟩ in Review #1, and fails to detect the relation between 'decor' and 'laid-back' in Review #3. In contrast, our model effectively avoids the above problems.

Conclusion
In this paper, we explored the Aspect-Opinion Pair Extraction (AOPE) task and proposed the Synchronous Double-channel Recurrent Network (SDRN). Specifically, the opinion entity extraction unit and the relation detection unit are designed to extract aspects, opinion expressions, and their relations simultaneously. The two units update themselves in a recurrent manner and form two channels, respectively. Meanwhile, the synchronization unit is devised to integrate high-level interaction information and enable the mutual benefit between opinion entity extraction and relation detection. Extensive experiments showed that our model achieves state-of-the-art performance.