Active Sentiment Domain Adaptation

Domain adaptation is an important technique for handling the domain dependence problem in sentiment analysis. Existing methods usually rely on sentiment classifiers trained in source domains. However, their performance may decline sharply if the distributions of sentiment features in the source and target domains differ significantly. In this paper, we propose an active sentiment domain adaptation approach to handle this problem. Instead of source domain sentiment classifiers, our approach adapts general-purpose sentiment lexicons to the target domain with the help of a small number of labeled samples, which are selected and annotated in an active learning mode, as well as the domain-specific sentiment similarities among words mined from unlabeled samples of the target domain. A unified model is proposed to fuse different types of sentiment information and train a sentiment classifier for the target domain. Extensive experiments on benchmark datasets show that our approach can train an accurate sentiment classifier with fewer labeled samples.


Introduction
Sentiment classification is widely known as a domain-dependent problem (Liu, 2012; Pang and Lee, 2008; Blitzer et al., 2007). This is because different domains usually have many different sentiment expressions. For example, "lengthy" and "boring" are frequently used in the Book domain to express negative sentiment. However, they are rare in the Kitchen appliance domain. Moreover, the same word or phrase may convey different sentiments in different domains. For instance, "unpredictable" is frequently used to express positive sentiment in the Movie domain (e.g., "The plot of this movie is fun and unpredictable"). However, it tends to be used as a negative word in the Kitchen appliance domain (e.g., "Even holding heat is unpredictable. It is just terrible!"). Thus, every domain has many domain-specific sentiment expressions, which cannot be captured by other domains. The performance of directly applying a general sentiment classifier, or a sentiment classifier trained in other domains, to the target domain is usually suboptimal.
Since there are a large number of domains in user-generated content, it is impractical to manually annotate enough samples for each domain to train an accurate domain-specific sentiment classifier. Thus, sentiment domain adaptation, which transfers the sentiment classifier trained in a source domain with sufficient labeled data to a target domain with no or scarce labeled data, has been widely studied (Blitzer et al., 2007; He et al., 2011; Glorot et al., 2011). Existing sentiment domain adaptation methods are mainly based on transfer learning techniques. Many of them try to learn a new feature representation to augment or replace the original feature space in order to reduce the gap of sentiment feature distributions between source and target domains (Glorot et al., 2011). For example, Blitzer et al. (2007) proposed to learn a latent representation for domain-specific words from both source and target domains by using pivot features as a bridge. The advantage of these methods is that no labeled data in the target domain is needed. However, when the distributions of sentiment features in the source and target domains differ significantly, the performance of domain adaptation will decline sharply. In some cases, the performance of adaptation is even lower than that without adaptation, which is usually known as negative transfer (Pan and .
In this paper, we propose an active sentiment domain adaptation approach to handle this problem by incorporating both general sentiment information and a small number of actively selected labeled samples from target domain. More specifically, in our approach the general sentiment information extracted from sentiment lexicons is adapted to target domain using domain-specific sentiment similarities among words. The general sentiment information is regarded as a "background" domain to transfer. The word similarities are extracted from unlabeled samples of target domain using both syntactic rules and co-occurrence patterns. Then we actively select and annotate a small number of informative samples from target domain in an active learning manner. These labeled samples are incorporated into our approach to improve the performance of sentiment domain adaptation. A unified model is proposed to incorporate different types of sentiment information to train sentiment classifier for target domain. Extensive experiments were conducted on benchmark datasets. The experimental results show that our approach can train accurate sentiment classifiers and reduce the manual annotation effort.

Sentiment Domain Adaptation
Sentiment classification is well known as a highly domain-dependent task, and domain adaptation is widely studied in sentiment analysis to handle this problem (Blitzer et al., 2007; He et al., 2011; Glorot et al., 2011). Existing sentiment domain adaptation methods are mainly based on transfer learning techniques (Pan and , where sentiment classifiers are trained in one or multiple source domains with sufficient labeled samples, and then applied to a target domain where there are no or only scarce labeled samples. In order to reduce the gap of sentiment feature distributions between source and target domains, many sentiment domain adaptation methods try to learn a new feature representation to augment or replace the original feature space. For example, a sentiment domain adaptation method based on the spectral feature alignment (SFA) algorithm has been proposed. In this method, several domain-independent features are first manually selected, and the associations between domain-specific features and domain-independent features are computed. A bipartite graph is then built in which domain-independent and domain-specific features are regarded as two types of nodes. The domain-specific features are grouped into several clusters using a spectral clustering algorithm, and these clusters are used to augment the original feature representations. Glorot et al. (2011) proposed a sentiment domain adaptation method based on a deep learning technique, i.e., Stacked Denoising Autoencoders. They learned the parameters of neural networks using unlabeled samples from both source and target domains, and used the hidden nodes of the neural networks as the latent feature representations of both domains. Then they trained sentiment classifiers using source domain labeled data in this new feature space and applied them to the target domain. The advantage of these sentiment domain adaptation methods is that they do not rely on labeled data in the target domain.
However, they have a common shortcoming: when the distributions of sentiment features in the source and target domains differ significantly, the performance of domain adaptation will decline sharply. In some cases, negative transfer may happen (Blitzer et al., 2007), which means the performance of adaptation is worse than that without adaptation (Pan and . Different from many existing sentiment domain adaptation methods, in our approach we adapt the general sentiment information in sentiment lexicons to the target domain with the help of a small number of labeled samples which are selected and annotated in an active learning mode. Since the sentiment words in general-purpose sentiment lexicons usually convey consistent sentiment polarities in different domains, and the actively selected labeled samples contain rich domain-specific sentiment information of the target domain, our approach can effectively reduce the risk of negative transfer. The usefulness of labeled samples from the target domain in sentiment domain adaptation has been observed in previous research (Choi and Cardie, 2009; Chen et al., 2011). For example, Choi and Cardie (2009) proposed to adapt a sentiment lexicon to a specific domain by exploiting both the relations among words which co-occur in the same sentiment expressions and the relations between words and labeled sentiment expressions. However, the labeled samples used in these methods are randomly selected, while in our approach we actively select informative samples from the target domain to annotate. Thus, our approach has the potential to reduce the manual annotation effort.

Active Learning
Active learning is a useful technique in scenarios where unlabeled data is abundant but labels are difficult or expensive to obtain (Tong and Koller, 2002; Settles, 2010). By actively selecting informative samples to label, active learning can effectively reduce the annotation effort and improve the classification performance under a limited budget (Li et al., 2012). An important problem in active learning is how to evaluate the informativeness of unlabeled samples (Fu et al., 2013). Different methods have been applied to select informative samples, such as uncertainty sampling (Zhu et al., 2010; Yang et al., 2015), query-by-committee (Freund et al., 1997), and so on. In our approach, uncertainty combined with density is used to measure the informativeness of samples. A major difference between our approach and existing active learning methods is that in existing methods the parameters of the initial classifier are either initialized as zero (Cesa-Bianchi et al., 2006) or learned from a set of randomly selected samples (Settles, 2010). In contrast, the initial sentiment classifier in our approach is constructed by adapting the general sentiment information to the target domain via the domain-specific sentiment similarities among words.
There are a few works that apply active learning methods to the sentiment domain adaptation task (Rai et al., 2010). For example, Rai et al. (2010) proposed an online active learning algorithm for sentiment domain adaptation. They started with a sentiment classifier trained on the labeled samples of a source domain. Then they sequentially selected informative samples in the target domain to annotate, with a probability positively related to classification uncertainty. The newly annotated samples were used to update the sentiment classifier in an online learning manner. Another active learning method has been proposed for cross-domain sentiment classification. In this method two sentiment classifiers are trained, one on the labeled samples of the source domain and the other on the labeled samples of the target domain. A query-by-committee strategy is then used to select the informative instances from the target domain. Different from these methods, our approach does not rely on the labeled data of source domains. Instead, in our approach the general sentiment information in sentiment lexicons is actively adapted to the target domain, and this information usually generalizes better across domains than a sentiment classifier trained in a single source domain. In addition, our approach can incorporate the domain-specific sentiment similarities among words mined from unlabeled samples of the target domain, which are not considered in these methods.

Notations
First we introduce several notations that will be used in the remainder of this paper. Denote the general sentiment information extracted from a general-purpose sentiment lexicon as $p \in \mathbb{R}^{D \times 1}$, where $D$ is the vocabulary size. If the $i$-th word is labeled as positive (or negative) in the sentiment lexicon, then $p_i = +1$ (or $p_i = -1$). Otherwise, $p_i = 0$. Following many previous works in sentiment classification (Blitzer et al., 2007), we select a linear classifier as the sentiment classifier, and denote the linear classification model as $w \in \mathbb{R}^{D \times 1}$. We use $f(x_i, y_i, w)$ to represent the loss of classifying the $i$-th labeled sample in the target domain under the classification model $w$, where $f$ is the classification loss function, $x_i \in \mathbb{R}^{D \times 1}$ is the feature vector of this sample, and $y_i$ is its sentiment label. In this paper we focus on binary sentiment classification, so $y_i \in \{+1, -1\}$. In addition, we select log loss for $f$. Thus, $f(x_i, y_i, w) = \log(1 + \exp(-y_i w^T x_i))$. Besides, we use $S \in \mathbb{R}^{D \times D}$ to represent the sentiment similarities among words extracted from unlabeled samples of the target domain.
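As a small illustration of this notation, the following Python sketch builds the prior vector $p$ from a lexicon and evaluates the log loss. The function names and the toy vocabulary are hypothetical and serve only as an example; they are not part of our implementation.

```python
import math

def lexicon_prior(vocabulary, positive_words, negative_words):
    """Build p: +1 for lexicon-positive words, -1 for lexicon-negative words, 0 otherwise."""
    return [1 if t in positive_words else -1 if t in negative_words else 0
            for t in vocabulary]

def log_loss(x, y, w):
    """Log loss f(x, y, w) = log(1 + exp(-y * w^T x)) for a label y in {+1, -1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Toy vocabulary (hypothetical): one positive, one negative, one neutral word.
p = lexicon_prior(["great", "terrible", "pan"], {"great"}, {"terrible"})
```

With a zero weight vector the loss of any sample equals $\log 2$, which corresponds to a classifier that is maximally uncertain.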

Domain-Specific Sentiment Similarities
Next we introduce the extraction of domain-specific sentiment similarities among words from unlabeled samples of the target domain. Two types of similarities are extracted in this paper. The first one is based on syntactic rules, and is inspired by (Hatzivassiloglou and McKeown, 1997; Huang et al., 2014). If two words have the same POS tag, such as adjective, verb, or adverb, and they are connected by the coordinating conjunction "and" in the same sentence, then we assume that they convey the same sentiment polarity. In addition, if two words are connected by the adversative conjunction "but" and have the same POS tag, then they are assumed to have opposite sentiment polarities. Denote $S^r \in \mathbb{R}^{D \times D}$ as the sentiment similarities extracted from unlabeled samples according to syntactic rules. The similarity score $S^r_{i,j}$ between words $i$ and $j$ is defined in Eq. (1), where $N^s_{i,j}$ and $N^o_{i,j}$ are the frequencies of words $i$ and $j$ having the same or opposite sentiments respectively according to the syntactic rules, and $\alpha_1$ is a positive smoothing factor. If two words share the same sentiment much more frequently than they have opposite sentiments, then they will have a larger positive sentiment similarity score. Note that $S^r_{i,j}$ can be negative according to Eq. (1). Here we focus on sentiment similarity rather than dissimilarity, and set all the negative values in $S^r$ to zero. The range of $S^r_{i,j}$ is therefore $[0, 1]$. The second type of sentiment similarity is extracted according to the co-occurrence patterns among words. It is inspired by the observation that words frequently co-occurring with each other not only have a high probability of having similar semantics, but also tend to share similar sentiments (Turney, 2002; Velikovich et al., 2010; Yogatama and Smith, 2014; Tang et al., 2015; Hamilton et al., 2016). In this paper, we compute the co-occurrence between words in the context of a document. Denote $\mathcal{D}$ as the set of all documents, and $N^i_d$ as the frequency of word $i$ appearing in document $d$.
Then, the sentiment similarity score $S^c_{i,j}$ between words $i$ and $j$ based on their co-occurrence patterns is defined in Eq. (2), where $\alpha_2$ is a positive smoothing parameter. If two words frequently co-occur with each other in many documents, then they will have a high sentiment similarity score according to Eq. (2). The range of $S^c_{i,j}$ is also $[0, 1]$. Denote $S^c \in \mathbb{R}^{D \times D}$ as the set of all sentiment similarities extracted according to co-occurrence patterns.
The sentiment similarities extracted according to syntactic rules are usually of high accuracy. However, their coverage is limited, because the word pairs detected by these syntactic rules are sparse. In contrast, the coverage of sentiment similarities extracted from co-occurrence patterns is quite wide because a document is a long context, while their accuracy is not as high as that of the similarities based on syntactic rules. Thus, we propose to combine these two types of sentiment similarities to obtain a balance between accuracy and coverage. Denote $S \in \mathbb{R}^{D \times D}$ as the final sentiment similarities among words, obtained by combining $S^r$ and $S^c$ with a combination coefficient $\theta$. In this paper we set $\theta$ to 0.5, which means that we regard these two types of sentiment similarities as equally important.
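The following Python sketch illustrates how the two similarity scores and their combination could be computed. The functional forms are assumptions chosen only to satisfy the properties stated above (a smoothed score that may be negative before clipping, a co-occurrence score in $[0, 1]$, and a weighted combination with coefficient $\theta$); they are not necessarily identical to Eqs. (1) and (2). All function names are hypothetical.

```python
def rule_similarity(n_same, n_opp, alpha1=1.0):
    """Assumed form: (N^s - N^o) / (N^s + N^o + alpha1), with negative values set to 0."""
    score = (n_same - n_opp) / (n_same + n_opp + alpha1)
    return max(score, 0.0)

def cooccurrence_similarity(counts_i, counts_j, alpha2=1.0):
    """Assumed form: sum_d min(N_d^i, N_d^j) / (sum_d max(N_d^i, N_d^j) + alpha2).

    counts_i and counts_j map a document id to the frequency of the word in it.
    """
    docs = set(counts_i) | set(counts_j)
    overlap = sum(min(counts_i.get(d, 0), counts_j.get(d, 0)) for d in docs)
    total = sum(max(counts_i.get(d, 0), counts_j.get(d, 0)) for d in docs)
    return overlap / (total + alpha2)

def combine(s_rule, s_cooc, theta=0.5):
    """Assumed convex combination of the two scores with coefficient theta."""
    return theta * s_rule + (1.0 - theta) * s_cooc

# Two words joined by "and" three times and by "but" once.
s_r = rule_similarity(3, 1)
# Per-document frequencies of the same two words.
s_c = cooccurrence_similarity({"d1": 2, "d2": 1}, {"d1": 1, "d3": 1})
s = combine(s_r, s_c)
```

Under these assumed forms, a pair that is mostly joined by "but" receives a rule-based score of zero, so only the co-occurrence evidence contributes to its final similarity.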

Initial Sentiment Classifier Construction
In this section, we introduce the construction of the initial sentiment classifier that starts the active learning process. Existing active learning methods usually randomly select a set of unlabeled samples to annotate and then train the initial classifier on them (Settles, 2010). However, these randomly selected samples may be redundant and not informative enough. In this paper, we propose to build the initial sentiment classifier by adapting the general sentiment information to the target domain via domain-specific sentiment similarities, as formulated in Eq. (3), where $w_0 \in \mathbb{R}^{D \times 1}$ is the initial sentiment classifier, $\alpha$ is a positive regularization coefficient, $p_i$ is the prior sentiment polarity of word $i$ in sentiment lexicons, and $S_{i,j}$ is the sentiment similarity score between words $i$ and $j$. Eq. (3) is motivated by (Bengio et al., 2006), and the quadratic cost criterion is equivalent to label propagation. In Eq. (3), the term $-\sum_{i=1}^{D} p_i w_i$ means that if a word $i$ is labeled as a positive (or negative) word in a general-purpose sentiment lexicon, i.e., $p_i > 0$ (or $p_i < 0$), then we constrain its sentiment weight in the sentiment classifier to be positive (or negative) as well. Otherwise, a penalty will be added to the objective function. In addition, the term $\sum_{i=1}^{D} \sum_{j \neq i} S_{i,j} (w_i - w_j)^2$ means that if two words share high sentiment similarity, then we constrain them to have similar sentiment weights in the sentiment classifier. For example, if we find that "great" and "easy" have high sentiment similarity in the Kitchen appliances domain (e.g., "This is a great pan and easy to wash"), and "great" is a positive sentiment word in many sentiment lexicons, then we can infer that "easy" may also be a positive sentiment word in this domain by propagating the sentiment information from "great" to "easy". In this way, the general sentiment information can be adapted to many domain-specific sentiment expressions in the target domain.
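To make the propagation effect concrete, the following Python sketch minimizes an objective built from the two terms discussed above by plain gradient descent. It is only an illustration under stated assumptions: the small squared $L_2$ penalty with weight `lam` is added here to keep the toy problem bounded and is an assumption of this sketch, and the function name, step size, and toy vocabulary are hypothetical.

```python
def initial_classifier(p, S, alpha=0.1, lam=1.0, steps=500, lr=0.1):
    """Minimize -sum_i p_i w_i + alpha * sum_{i != j} S_ij (w_i - w_j)^2 + lam * ||w||^2.

    The lam term is an assumption of this sketch (it keeps the problem bounded).
    """
    D = len(p)
    w = [0.0] * D
    for _ in range(steps):
        grad = []
        for k in range(D):
            # Gradient of the lexicon term and of the assumed L2 penalty.
            g = -p[k] + 2.0 * lam * w[k]
            # Gradient of the similarity term; w_k appears as both w_i and w_j.
            for j in range(D):
                if j != k:
                    g += 2.0 * alpha * (S[k][j] + S[j][k]) * (w[k] - w[j])
            grad.append(g)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Toy example: "great" is in the lexicon, "easy" is similar to "great".
vocab = ["great", "easy", "pan"]
p = [1, 0, 0]
S = [[0.0, 0.8, 0.0],
     [0.8, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
w0 = initial_classifier(p, S)
```

In this toy setting "easy" receives a small positive weight although it is absent from the lexicon, while "pan", which has no similarity link to any lexicon word, keeps a weight of zero.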

Query Strategy
Active learning methods iteratively select the most informative instances to label and add them to the training set (Settles, 2010). Thus, an important issue in these methods is how to measure the informativeness of unlabeled samples. In this paper, we select classification uncertainty as the informativeness measure, which has been proven effective in many active learning methods (Zhu et al., 2010; Yang et al., 2015). Since we focus on binary sentiment classification and the classification loss function is log loss, the classification uncertainty $U(x)$ of an unlabeled instance $x$ is defined in Eq. (4), where $w$ is the linear sentiment classification model. The range of $U(x)$ is $[0, 1]$. If $|w^T x|$ is large, which means that the current sentiment classifier is confident in classifying this instance, then the uncertainty of $x$ (i.e., $U(x)$) will be low. If $|w^T x|$ is close to 0, then the sentiment classifier is very uncertain about this instance, probably because the sentiment expressions in this instance are not covered by the current sentiment classifier, and the uncertainty of the instance $x$ will be high. In this case, annotating this instance and adding it to the training set is beneficial, because it can provide information about unknown sentiment expressions and has the potential to quickly improve the quality of the target domain sentiment classifier. However, many researchers have found that unlabeled instances with high uncertainty can be outliers, whose labels are useless or even misleading (Settles, 2010; Zhu et al., 2010). Thus, here we combine uncertainty with representativeness to avoid outliers. Density has been shown to be an effective measure of representativeness in active learning methods (Zhu et al., 2010; Hajmohammadi et al., 2015). Here we use the k-nearest neighbour based density proposed by Zhu et al. (2010) as the representativeness measure, which is formulated in Eq. (5), where $N(x)$ is the set of the $k$ most similar instances of $x$.
The final informativeness score of an unlabeled sample is a linear combination of uncertainty and density, formulated in Eq. (6), where $\eta(t) \in [0, 1]$ is the combination coefficient at the $t$-th iteration. In this paper, we select for $\eta(t)$ a monotonically increasing function of $t$ that depends on the total number of iterations $T$. It means that at the initial iterations we put more emphasis on instances with high representativeness, because the initial sentiment classifier built by adapting the general sentiment information via the domain-specific sentiment similarities is relatively weak, and we prefer to select instances with more popular sentiment expressions to annotate. As more and more labeled samples are added to the training set and the sentiment classifier becomes stronger, we gradually focus on more difficult instances, i.e., those having higher classification uncertainty scores.
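The query strategy can be sketched in Python as follows. The functional forms below are assumptions that merely satisfy the properties stated above and are not necessarily those of Eqs. (4) to (6): uncertainty is taken as $1 - |P(+1|x) - P(-1|x)|$ under the logistic model, density as the mean cosine similarity to the $k$ most similar pool samples, and $\eta(t) = t/T$ as one possible monotonically increasing schedule. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def uncertainty(w, x):
    """Assumed form: 1 - |sigmoid(s) - sigmoid(-s)| = 1 - |tanh(s / 2)| with s = w^T x."""
    return 1.0 - abs(math.tanh(0.5 * dot(w, x)))

def cosine(a, b):
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0

def density(i, pool, k):
    """Assumed form: mean cosine similarity between pool[i] and its k most similar samples."""
    sims = sorted((cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i),
                  reverse=True)[:k]
    return sum(sims) / len(sims) if sims else 0.0

def informativeness(u, d, t, T):
    """Linear combination eta(t) * U + (1 - eta(t)) * density, with assumed eta(t) = t / T."""
    eta = t / T
    return eta * u + (1.0 - eta) * d

# Toy pool of three feature vectors: two identical samples and one outlier.
pool = [[1, 0], [1, 0], [0, 1]]
densities = [density(i, pool, k=1) for i in range(len(pool))]
```

With this schedule the score equals the density at $t = 0$ and the uncertainty at $t = T$, which mirrors the shift from representative samples to difficult samples described above.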

Active Domain Adaptation
Based on the previous discussions, in this section we introduce the complete procedure of our active sentiment domain adaptation (ASDA) approach. Different from existing sentiment domain adaptation methods, which rely on the sentiment classifier trained in source domains to transfer, in our approach we regard the general sentiment information in sentiment lexicons as the "background" domain and adapt it to the target domain with the help of a small number of labeled samples which are selected and annotated in an active learning mode. First, we build an initial sentiment classifier according to Eq. (3) by adapting the general sentiment information to the target domain using the domain-specific sentiment similarities among words mined from unlabeled samples of the target domain. Second, we compute the density of each unlabeled sample in U according to Eq. (5). Then we repeat the following steps until the annotation budget has run out. First, we compute the uncertainty of each unlabeled sample in U according to Eq. (4), and then compute their informativeness by combining uncertainty with density according to Eq. (6). Next, we select the unlabeled sample with the highest informativeness from U and manually annotate its sentiment polarity. Then we add it to the set of labeled samples L and remove it from U. After that we retrain the sentiment classifier for the target domain based on the general sentiment information $p$, the labeled samples L, and the domain-specific sentiment similarities $S$ as follows:
$\arg\min_{w} \; -\sum_{i=1}^{D} p_i w_i + \alpha \sum_{i=1}^{D} \sum_{j \neq i} S_{i,j} (w_i - w_j)^2 + \beta \sum_{x_i \in L} \log(1 + \exp(-y_i w^T x_i)) + \lambda \|w\|_2^2$, (7)
where $\alpha$, $\beta$, and $\lambda$ are nonnegative coefficients. By the term $-\sum_{i=1}^{D} p_i w_i$ we constrain the target domain sentiment classifier learned by our approach to be consistent with the general sentiment information. In this way, the general sentiment information extracted from sentiment lexicons can be adapted to the target domain. The term $\sum_{i=1}^{D} \sum_{j \neq i} S_{i,j} (w_i - w_j)^2$ is motivated by label propagation (Bengio et al., 2006).
If two words tend to have high sentiment similarity with each other according to many unlabeled samples of the target domain, then we constrain their sentiment weights in the target domain sentiment classifier to be similar as well. The term $\sum_{x_i \in L} \log(1 + \exp(-y_i w^T x_i))$ means that we hope to minimize the empirical classification loss on the labeled samples of the target domain. By this term the sentiment information in the labeled samples is incorporated into the learning of the target domain sentiment classifier. The $L_2$-norm regularization term is introduced to control model complexity. The sentiment classifier trained in Eq. (7) is further used at the next iteration of active sentiment domain adaptation until the whole budget of manual annotation has been used. Then we obtain the final sentiment classifier of the target domain. The complete algorithm of our active sentiment domain adaptation (ASDA) approach is summarized in Algorithm 1.
Algorithm 1 Active sentiment domain adaptation.
1: Input: The set of unlabeled samples U, the general sentiment information p, the domain-specific sentiment similarities S, and the total annotation budget N.
2: Output: Target domain sentiment classifier w.
3: Train the initial sentiment classifier w_0 (Eq. (3)).
4: Compute the density of each sample x_i in U (Eq. (5)).
5: Initialize the set of labeled samples L = ∅, the iteration number t = 0, and the sentiment classifier w = w_0.
6: while t < N do
7:   t = t + 1.
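A minimal Python sketch of the procedure described in this section is given below. It is an illustration under several assumptions rather than our implementation: the uncertainty, density, and $\eta(t) = t/T$ forms are assumed, the objective is minimized by plain gradient descent, the initial classifier is obtained from the same objective with an empty labeled set (so the $\lambda$ term is also applied there), and `oracle` is a hypothetical callback standing in for the human annotator.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cosine(a, b):
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0

def density(i, pool, k):
    # Assumed form: mean cosine similarity to the k most similar pool samples.
    sims = sorted((cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i),
                  reverse=True)[:k]
    return sum(sims) / len(sims) if sims else 0.0

def uncertainty(w, x):
    # Assumed form: 1 - |P(+1|x) - P(-1|x)| = 1 - |tanh(w^T x / 2)|.
    return 1.0 - abs(math.tanh(0.5 * dot(w, x)))

def train(p, S, labeled, alpha=0.1, beta=1.0, lam=1.0, steps=300, lr=0.05):
    """Gradient descent on the objective with lexicon, similarity, log-loss and L2 terms."""
    D = len(p)
    w = [0.0] * D
    for _ in range(steps):
        grad = [-p[k] + 2.0 * lam * w[k] for k in range(D)]
        for k in range(D):
            for j in range(D):
                if j != k:
                    grad[k] += 2.0 * alpha * (S[k][j] + S[j][k]) * (w[k] - w[j])
        for x, y in labeled:
            # d/dw log(1 + exp(-y w^T x)) = -y * sigmoid(-y w^T x) * x
            coeff = -y * 0.5 * (1.0 - math.tanh(0.5 * y * dot(w, x)))
            for k in range(D):
                grad[k] += beta * coeff * x[k]
        w = [w[k] - lr * grad[k] for k in range(D)]
    return w

def asda(pool, oracle, p, S, budget, k=2):
    """Select one sample per iteration until the annotation budget is used up."""
    w = train(p, S, [])                      # initial classifier from lexicon + similarities
    dens = [density(i, pool, k) for i in range(len(pool))]
    unlabeled = set(range(len(pool)))
    labeled = []
    for t in range(1, budget + 1):
        eta = t / budget                     # assumed schedule eta(t) = t / T
        best = max(sorted(unlabeled),
                   key=lambda i: eta * uncertainty(w, pool[i]) + (1.0 - eta) * dens[i])
        unlabeled.remove(best)
        labeled.append((pool[best], oracle(best)))   # ask the annotator for a label
        w = train(p, S, labeled)             # retrain with all sentiment information
    return w, labeled

# Toy run: features are counts of ["great", "easy", "unpredictable"].
pool = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
gold = [1, 1, -1, -1]
S = [[0.0, 0.8, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.0]]
w, labeled = asda(pool, lambda i: gold[i], p=[1, 0, 0], S=S, budget=2)
```

Retraining from scratch at every iteration keeps the sketch short; a warm start from the previous classifier would be a natural refinement.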

Datasets
The dataset used in our experiments is the Amazon product review dataset collected by Blitzer et al. (2007), which is widely used in sentiment analysis and domain adaptation research (Bollegala et al., 2011). This dataset contains product reviews in four domains, i.e., Book, DVD, Electronics, and Kitchen appliances. In each domain, 1,000 positive and 1,000 negative reviews as well as a large number of unlabeled samples are included. The detailed statistics of this dataset are summarized in Table 1. Following many previous works (Blitzer et al., 2007; Bollegala et al., 2011), unigrams and bigrams were used to build feature vectors in our experiments. We randomly split the labeled samples in each domain into two parts of equal size. The first part was used as test data, and the second part was used as the pool of "unlabeled" samples to perform active learning. The general sentiment information was extracted from Bing Liu's sentiment lexicon (Hu and Liu, 2004), which is one of the state-of-the-art general-purpose sentiment lexicons. The domain-specific sentiment similarities among words were extracted from the large-scale unlabeled samples. The total number of samples actively selected by our approach to annotate was set to 100. The values of α, β, and λ were set to 0.1, 1, and 1 respectively. We repeated each experiment 10 times independently and report the average results.
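The feature construction and data split can be sketched as follows. This is only an illustration: whitespace tokenization, the fixed random seed, and the function names are assumptions of the sketch and not details of our experimental pipeline.

```python
import random
from collections import Counter

def ngram_features(tokens):
    """Count the unigrams and bigrams of a tokenized review."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(list(tokens) + bigrams)

def split_in_half(samples, seed=0):
    """Randomly split labeled samples into a test half and an active learning pool half."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Assumed whitespace tokenization of a toy review.
features = ngram_features("not good at all".split())
test_part, pool_part = split_in_half(range(10))
```

Bigrams such as "not good" let a linear classifier assign a weight to a negated expression that differs from the weight of "good" alone.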

Algorithm Effectiveness
First we conducted several experiments to explore the effectiveness of our active sentiment domain adaptation (ASDA) approach. We hope to answer two questions via these experiments: 1) whether the domain-specific sentiment similarities among words mined from unlabeled samples of target domain are useful for adapting the general sentiment information to target domain; 2) whether a small number of samples which are actively selected and annotated in target domain can help improve the domain adaptation performance. In our experiments, we implemented different versions of our ASDA approach using different combinations of sentiment information. The first one is Lexicon, which means only using the general sentiment information and no domain adaptation is conducted. It serves as a baseline. The second one is Lexicon+SentiSim, which means adapting general sentiment information to target domain using domain-specific sentiment similarities, but labeled samples of target domain are not incorporated. The third one is Lexicon+SentiSim+Label, which is the complete ASDA approach. The experimental results are summarized in Fig. 1.
According to Fig. 1, the performance of Lexicon is suboptimal. This is because general sentiment lexicons cannot capture the domain-specific sentiment expressions in the target domain (Choi and Cardie, 2009). Lexicon+SentiSim performs significantly better than Lexicon, which validates that the sentiment similarities among words extracted from unlabeled samples of the target domain contain rich domain-specific sentiment information, and can help propagate the general sentiment information to many domain-specific sentiment expressions. Besides, after incorporating a small number of labeled samples which are actively selected and annotated by our approach in an active learning mode, the performance of our sentiment domain adaptation approach is significantly improved. This is because, although these labeled samples are limited in number and cannot cover all the sentiment expressions in the target domain, they can provide sentiment information about popular domain-specific sentiment expressions, which can be propagated to other sentiment expressions in the target domain during the domain adaptation process. Thus, the above experimental results validate the effectiveness of our approach.
We also conducted several experiments to verify the advantage of the actively selected samples over randomly selected samples and validate the effectiveness of our active learning algorithm. We also compared the dynamic weighting scheme for combining uncertainty and density with the constant weighting scheme. The experimental results are summarized in Fig. 2. According to Fig. 2, our approach with actively selected samples performs better than that with randomly selected samples. It indicates that these actively selected samples are more informative than randomly selected samples for sentiment domain adaptation. In addition, our approach with dynamic weighting scheme in combining uncertainty and density outperforms that with constant weighting scheme, which implies that it is beneficial to emphasize representative samples at initial iterations and gradually focus on difficult samples at later iterations. Thus, the experimental results validate the effectiveness of our active learning algorithm.

Performance Evaluation
In this section we conducted experiments to evaluate the performance of our approach by comparing it with several baseline methods. The methods to be compared include: 1) MPQA and Bing-Liu, using two state-of-the-art sentiment lexicons, i.e., MPQA (Wilson et al., 2005) and Bing Liu's lexicon (Hu and Liu, 2004), for sentiment classification following the suggestions in (Hu and Liu, 2004); 2) SVM, LS, and LR, three popular supervised sentiment classification methods, i.e., support vector machine (Pang et al., 2002), least squares (Hu et al., 2013), and logistic regression (Wu et al., 2015); 3) ZIAL, the zero initialized active learning method (Cesa-Bianchi et al., 2006); 4) LIAL, the active learning method initialized by randomly selected labeled data (Settles, 2010); 5) SCL (Blitzer et al., 2007) and SFA, two well-known sentiment domain adaptation methods; 6) ILP, adapting sentiment lexicons to the target domain via integer linear programming (Choi and Cardie, 2009); 7) AODA, the active online domain adaptation method (Rai et al., 2010); 8) ALCD, the active learning method for cross-domain sentiment classification; 9) ASDA, our active sentiment domain adaptation approach. For the above methods, if labeled target domain samples are needed in training, the number of labeled samples was set to 100, and if source domain labeled samples are needed in training, the number of labeled samples was set to 1,000. The parameters in baseline methods were tuned via cross-validation. The experimental results are summarized in Table 2. According to Table 2, the performance of directly applying sentiment lexicons to the target domain is suboptimal. This is because there are many domain-specific sentiment expressions that are not covered by these general-purpose sentiment lexicons (Choi and Cardie, 2009).
In addition, the performance of supervised sentiment classification methods such as SVM, LS, and LR is also limited, because the labeled samples for training are extremely scarce. The active learning methods such as ZIAL (Cesa-Bianchi et al., 2006) and LIAL (Settles, 2010) perform relatively better, because they can actively select informative samples to annotate and learn from. Our approach outperforms both of them. This is because, besides the labeled samples, our approach also adapts the general sentiment information in sentiment lexicons to the target domain and incorporates it into the learning of the target domain sentiment classifier. Our approach also performs better than state-of-the-art domain adaptation methods such as SCL (Blitzer et al., 2007) and SFA. This implies that a small number of actively selected labeled samples from the target domain are beneficial for sentiment domain adaptation. ILP (Choi and Cardie, 2009) tries to adapt a sentiment lexicon to the target domain, which is similar to our approach. ILP relies on labeled samples to extract the relations among words and the relations between words and sentiment expressions. However, labeled samples in the target domain are usually limited, and the sentiment information in the many unlabeled samples is not exploited in ILP. Thus, our approach can outperform it. Similar to our approach, AODA (Rai et al., 2010) and ALCD also apply active learning to domain adaptation. The major difference is that in our approach the general sentiment information extracted from sentiment lexicons is adapted to the target domain, while in AODA and ALCD the sentiment classifier trained in source domains is transferred. The superior performance of our approach implies that the general sentiment information has better generalization ability than the sentiment classifier trained in a specific source domain, and is more suitable for sentiment domain adaptation.
We further conducted several experiments to validate the advantage of our approach in training an accurate sentiment classifier for the target domain with only a few labeled samples. We varied the annotation budget, i.e., the number of labeled samples, from 100 to 1,000. The learning curve of our ASDA approach in the Book domain is shown in Fig. 3. We also included a purely supervised sentiment classification method, i.e., SVM, in Fig. 3 as a baseline for comparison. Fig. 3 shows that our ASDA approach consistently outperforms SVM when the same number of labeled samples is used. The performance advantage of our approach is more significant when labeled samples are scarce. For example, the performance of our approach with only 200 labeled samples is similar to that of SVM with more than 800 labeled samples. Thus, the experimental results validate that by adapting the general sentiment information to the target domain and selecting the most informative samples to annotate and learn from, our approach can effectively reduce the manual annotation effort, and can train an accurate sentiment classifier for the target domain with far fewer labeled samples.

Parameter Analysis
In this section, we conducted several experiments to explore the influence of parameter settings on the performance of our approach. α and β are the two most important parameters in our approach, which control the relative importance of the domain-specific sentiment similarities and the actively selected samples, respectively, in training the sentiment classifier for the target domain. The experimental results for parameters α and β are summarized in Fig. 4.
According to Fig. 4, when α and β are too small, the performance of our approach is not optimal. This is because the useful sentiment information in domain-specific sentiment similarities mined from unlabeled samples and the actively selected labeled samples of target domain is not fully exploited. Thus, the performance of our approach improves when these parameters increase from a small value. However, when these parameters become too large, the performance of our approach starts to decline. This is because when β is too large the sentiment classifier learned by our approach is mainly decided by the limited labeled samples, and the general sentiment information extracted from sentiment lexicons is not fully exploited. When α is too large, the information in domain-specific sentiment similarities is overemphasized, and many different words will have nearly the same sentiment weights. Thus, the performance of our approach in these scenarios is also not optimal. A moderate value of α and β is most suitable for our approach.

Conclusion
In this paper we present an active sentiment domain adaptation approach to train an accurate sentiment classifier for the target domain with fewer labeled samples. In our approach, the general sentiment information in sentiment lexicons is adapted to the target domain with the help of a small number of labeled samples which are selected and annotated in an active learning mode. Both classification uncertainty and density are considered when selecting informative samples to label. In addition, we extract domain-specific sentiment similarities among words from unlabeled samples of the target domain based on both syntactic rules and co-occurrence patterns, and incorporate them into the domain adaptation process to propagate the general sentiment information to many domain-specific sentiment words in the target domain. We also propose a unified model to incorporate different types of sentiment information to train the sentiment classifier for the target domain. Experimental results on benchmark datasets show that our approach can train an accurate sentiment classifier and at the same time reduce the manual annotation effort.