KinGDOM: Knowledge-Guided DOMain Adaptation for Sentiment Analysis

Cross-domain sentiment analysis has received significant attention in recent years, prompted by the need to combat the domain gap between different applications that make use of sentiment analysis. In this paper, we take a novel perspective on this task by exploring the role of external commonsense knowledge. We introduce a new framework, KinGDOM, which utilizes the ConceptNet knowledge graph to enrich the semantics of a document by providing both domain-specific and domain-general background concepts. These concepts are learned by training a graph convolutional autoencoder that leverages inter-domain concepts in a domain-invariant manner. Conditioning a popular domain-adversarial baseline method with these learned concepts helps improve its performance over state-of-the-art approaches, demonstrating the efficacy of our proposed framework.


Introduction
Sentiment Analysis (SA) is a popular NLP task used in many applications. Current models trained for this task, however, cannot be reliably deployed due to the distributional mismatch between the training and evaluation domains (Daumé III and Marcu, 2006). Domain adaptation, a case of transductive transfer learning, is a widely studied field of research that can be effectively used to tackle this problem (Wilson and Cook, 2018).
Research in the field of cross-domain SA has proposed diverse approaches, which include learning domain-specific sentiment words/lexicons (Sarma et al., 2018; Hamilton et al., 2016b), co-occurrence based learning (Blitzer et al., 2007a), and domain-adversarial learning (Ganin et al., 2016), among others. In this work, we adopt the domain-adversarial framework and attempt to improve it further by infusing commonsense knowledge using ConceptNet, a large-scale knowledge graph (Speer et al., 2017).

Figure 1: ConceptNet provides networks with background concepts that enhance their semantic understanding. For example, for a target sentence from the electronics domain, "The software came with decent screen savers", comprising domain-specific terms like screen saver or wallpaper, ConceptNet helps connect them to general concepts like design, thus allowing a network to better understand their meaning. Furthermore, an inter-domain conceptual bridge can also be established to connect source and target domains (wallpaper and sketch have similar conceptual notions under the link design).
Augmenting neural models with external knowledge bases (KBs) has shown benefits across a range of NLP applications (IV et al., 2019; Bi et al., 2019). Despite their popularity, efforts to incorporate KBs into the domain-adaptation framework have been sporadic (Wang et al., 2008; Xiang et al., 2010). To this end, we identify multiple advantages of using commonsense KBs for domain adaptation.
First, KBs help in grounding text to real entities, factual knowledge, and commonsense concepts. Commonsense KBs, in particular, provide a rich source of background concepts, related by commonsense links, which can enhance the semantics of a piece of text by providing both domain-specific and domain-general concepts (Zhong et al., 2019; Agarwal et al., 2015) (see Fig. 1). For cross-domain SA, word polarities might vary among different domains. For example, heavy can be a positive feature for a truck, but a negative feature for a smartphone. It is, however, difficult to assign contextual polarities solely from data, especially when there is no supervision (Boia et al., 2014). In this domain-specific scenario, commonsense knowledge provides a dynamic way to enhance the context and help models understand sentiment-bearing terms and opinion targets through its structural relations (Cambria et al., 2018). It also often aids in unearthing implicitly expressed sentiment (Balahur et al., 2011). Second, domains often share relations through latent semantic concepts (Kim et al., 2017a). For example, the notions of wallpaper (from electronics) and sketch (from books) can be associated via related concepts such as design (see Fig. 1). Multi-relational KBs provide a natural way to leverage such inter-domain relationships. These connections can help models understand target-specific terms by associating them with known domain-general or even source-specific concepts.
Following these intuitions, we propose a two-step modular framework, KinGDOM (Knowledge-Guided DOMain adaptation), which utilizes a commonsense KB for domain adaptation. KinGDOM first trains a shared graph autoencoder using a graph convolutional network (GCN) on ConceptNet, so as to learn: 1) inter-domain conceptual links through multiple inference steps across neighboring concepts; and 2) domain-invariant concept representations due to shared autoencoding. It then extracts document-specific sub-graph embeddings and feeds them to a popular domain-adversarial model, DANN (Ganin et al., 2016). Additionally, we train a shared autoencoder on these extracted graph embeddings to promote further domain-invariance (Glorot et al., 2011).
Our main contributions in this work are: 1. We propose KinGDOM, a domain-adversarial framework that uses an external KB (ConceptNet) for unsupervised domain adaptation. KinGDOM learns domain-invariant features of KB concepts using a graph autoencoding strategy.

2. We demonstrate, through experiments, that KinGDOM surpasses state-of-the-art methods on the Amazon-reviews dataset (Blitzer et al., 2007b), thus validating our claim that external knowledge can aid the task of cross-domain SA.
In the remainder of the paper, §2 discusses related work and compares KinGDOM to it; §3 presents the task definition and preliminaries; §4 introduces our proposed framework, KinGDOM; §5 describes the experimental setup, followed by results and extensive analyses in §6; finally, §7 concludes the paper.

Related Work
Domain adaptation methods can be broadly categorized into three approaches: a) instance selection (Jiang and Zhai, 2007; Chen et al., 2011; Cao et al., 2018), b) self-labeling (He and Zhou, 2011), and c) representation learning (Glorot et al., 2011; Chen et al., 2012; Tzeng et al., 2014). Our focus is on the third category, which has emerged as a popular approach in this era of deep representation learning (Ruder, 2019; Poria et al., 2020).
Domain-adversarial Training. Our work deals with domain-adversarial approaches (Kouw and Loog, 2019); in particular, we extend DANN (Ganin et al., 2016). Despite its popularity, DANN cannot model domain-specific information (e.g., indicators such as tasty, delicious for the kitchen domain) (Peng et al., 2018b). Rectifications include shared-private encoders that model both domain-invariant and domain-specific features (Li et al., 2012; Bousmalis et al., 2016a; Kim et al., 2017b; Chang et al., 2019), using adversarial and orthogonality losses (Liu et al., 2017). Although we do not use private encoders, we posit that our model is capable of capturing domain-specificity via the sentence-specific concept graph. Our approach is also flexible enough to be adapted to the shared-private encoder setup.

External Knowledge. The use of external knowledge has been explored in both inductive and transductive settings (Banerjee, 2007; Deng et al., 2018). A few works have explored external knowledge in domain adaptation based on Wikipedia as auxiliary information, using co-clustering (Wang et al., 2008) and semi-supervised learning (SSL) (Xiang et al., 2010). SSL has also been explored by Alam et al. (2018) in the Twitter domain. Although we share a similar motivation, there exist crucial differences. Primarily, we learn graph embeddings at the concept level, not across complete instances. Also, we do not classify each concept node in the graph, which renders SSL inapplicable to our setup.
Domain Adaptation on Graphs. With the advent of graph neural networks, graph-based methods have become a new trend (Ghosal et al., 2019) in diverse NLP tasks such as emotion recognition in conversations. Graph-based domain adaptation is categorized based on the availability of cross-domain connections. For domain-exclusive graphs, approaches include SSL with GCNs (Shen and Chung, 2019) and domain-adversarial learning (Dai et al., 2019). For cross-domain connected graphs, co-regularized training (Ni et al., 2018) and joint embedding (Xu et al., 2017) have been explored. We also utilize GCNs to learn node representations in our cross-domain ConceptNet graph. However, rather than using explicit divergence measures or domain-adversarial losses for domain invariance, we uniquely adopt a shared-autoencoder strategy on GCNs. Such ideas have been explored in vector-based approaches (Glorot et al., 2011; Chen et al., 2012).

Sentiment Analysis. One line of work models domain-dependent word embeddings (Sarma et al., 2018; Shi et al., 2018; K Sarma et al., 2019) or domain-specific sentiment lexicons (Hamilton et al., 2016a), while others attempt to learn representations based on co-occurrences of domain-specific terms with domain-independent terms (Blitzer et al., 2007a; Pan et al., 2010; Sharma et al., 2018). Our work is related to approaches that address domain-specificity in the target domain (Peng et al., 2018b; Bhatt et al., 2015). Some works attempt to model target-specificity by mapping domain-general information to domain-specific representations using domain descriptor vectors. In contrast, we relate domain-specific terms by modeling their relations with other terms in knowledge bases like ConceptNet.

Task Definition
Domain adaptation deals with training models that can perform inference reliably in multiple domains. Across domains, it is assumed that the feature and label spaces are the same but with discrepancies in their feature distributions. In our setup, we consider two domains: a source domain D_s and a target domain D_t with different marginal data distributions, i.e., P_{D_s}(x) ≠ P_{D_t}(x). This scenario, also known as covariate shift (Elsahar and Gallé, 2019), is predominant in SA applications and arises primarily with shifts in topics, causing a difference in vocabulary usage and the corresponding semantic and sentiment associations.
We consider unsupervised domain adaptation, where we are provided with labeled instances only from the source domain D_s. This is a realistic setting, as curating annotations for the target domain is often expensive as well as time consuming. Given this setup, our goal is to train a classifier that achieves good classification performance on the target domain.

Domain-Adversarial Neural Network
We base our framework on the domain-adversarial neural network (DANN) proposed by Ganin et al. (2016). DANN learns a shared mapping of both source and target domain instances, M(x_s) and M(x_t), such that a classifier C trained for the source domain can be directly applied to the target domain. Training of C is performed using the cross-entropy loss:

L_C = − E_{(x_s, y_s)} [ Σ_{k=1}^{K} 1_{[k = y_s]} log C(M(x_s)) ],

where K is the number of labels. Both the mapping function M and the classifier C are realized using neural layers with parameters θ_M and θ_C.
Adversarial Loss. The core idea of DANN is to reduce the domain gap by learning common representations that are indistinguishable to a domain discriminator. To learn a domain-invariant mapping, DANN uses an adversarial discriminator D_adv with parameters θ_D, whose job is to distinguish between source and target instances, M(x_s) vs. M(x_t). It is trained using the cross-entropy loss:

L_adv_D = − E_{x_s} [ log D_adv(M(x_s)) ] − E_{x_t} [ log(1 − D_adv(M(x_t))) ].

The mapping function then learns domain invariance by pitting itself against the discriminator in a minimax optimization with loss L_adv_M = −L_adv_D (Tzeng et al., 2017). This setup forces the features to become discriminative for the main learning task and indistinguishable across domains. The point estimates of the parameters are decided at a saddle point using the minimax objective:

min_{θ_M, θ_C} max_{θ_D} ( L_C − λ L_adv_D ),

where λ is a hyper-parameter. The minimax objective is realized by reversing the gradients of L_adv_D when back-propagating through M.

Figure 2: Overview of KinGDOM. Step 1 (Knowledge Graph Training) uses a GCN to learn concept representations. Step 2 (Domain-adversarial Training) feeds concept features to DANN.
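The gradient-reversal trick can be sketched as follows. This is an illustrative numpy mock-up, not the authors' implementation: the layer acts as the identity in the forward pass and multiplies incoming gradients by −λ in the backward pass.

```python
import numpy as np

def grad_reverse_forward(x):
    # Identity in the forward pass: the discriminator sees M(x) unchanged.
    return x

def grad_reverse_backward(grad_output, lam):
    # Backward pass: gradients of L_adv_D are negated (and scaled by lambda)
    # before flowing into the mapping M, realizing the minimax objective.
    return -lam * grad_output

features = np.array([0.3, -1.2, 0.7])
upstream_grad = np.array([0.1, 0.1, -0.2])
reversed_grad = grad_reverse_backward(upstream_grad, lam=1.0)
```

In an autodiff framework this is usually implemented as a custom function whose backward pass negates and scales the gradient, so a single optimizer step minimizes L_C while ascending on L_adv_D.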

Our Proposed Method
KinGDOM aims to improve the DANN approach by leveraging an external knowledge source, i.e., ConceptNet. Such a knowledge base is particularly useful for domain adaptation as it contains both domain-specific and domain-general knowledge. Unlike traditional word embeddings and semantic knowledge graphs (e.g., WordNet), ConceptNet is unique in that it contains commonsense-related information. We posit that both these properties of ConceptNet are highly useful for domain adaptation. KinGDOM follows the two-step approach described below:

Step 1: This step deals with training on a domain-aggregated sub-graph of ConceptNet. In particular, it involves: a) creating a sub-graph of ConceptNet based on all domains (§4.1); b) training a graph-convolutional autoencoder to learn concept embeddings (Schlichtkrull et al., 2018) (§4.2).
Step 2: After the graph autoencoder is trained, a) we extract and pool document-relevant features from the trained graph for each instance in the dataset (§4.3); b) the corresponding graph feature vector is then fed into the DANN architecture for adversarial training (Ganin et al., 2016). To further enforce domain invariance, we also introduce a shared autoencoder to reconstruct the graph features (§4.4).

Step 1a) Domain-Aggregated Commonsense Graph Construction
We construct our domain-aggregated graph from ConceptNet (Speer et al., 2017). First, we introduce the following notation: the ConceptNet graph is represented as a directed labeled graph G = (V, E, R), with concepts/nodes v_i ∈ V and labeled edges (v_i, r_ij, v_j) ∈ E, where r_ij ∈ R is the relation type of the edge between v_i and v_j. The concepts in ConceptNet are unigram words or n-gram phrases. For instance, one such triplet from ConceptNet is [baking-oven, AtLocation, kitchen].
ConceptNet has approximately 34 million edges, from which we extract a subset. From the training documents of all domains in our dataset, we first extract the set of all unique nouns, adjectives, and adverbs. These extracted words are treated as seeds that we use to filter ConceptNet into a sub-graph. In particular, we extract all the triplets from G which are within a distance of 1 from any of these seed concepts, resulting in a sub-graph G′ = (V′, E′, R′) with approximately 356k nodes and 900k edges. This sub-graph thus contains concepts across all domains along with inter-concept links. Looking at the sub-graph G′ through the lens of each domain, we can observe the top-10 relations within each domain in Table 1.
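The seed-based filtering above can be sketched as follows, on a toy triplet list standing in for the 34M-edge graph (the triplets here are hypothetical examples):

```python
# Toy triplet list standing in for ConceptNet (hypothetical examples).
TRIPLETS = [
    ("baking-oven", "AtLocation", "kitchen"),
    ("wallpaper", "RelatedTo", "design"),
    ("sketch", "RelatedTo", "design"),
    ("film", "IsA", "movie"),
]

def extract_subgraph(triplets, seeds):
    # Keep every triplet within distance 1 of a seed concept,
    # i.e. any edge whose head or tail is a seed word.
    seeds = set(seeds)
    return [t for t in triplets if t[0] in seeds or t[2] in seeds]

sub = extract_subgraph(TRIPLETS, {"wallpaper", "sketch"})
```

Because seeds come from all domains, triplets such as (wallpaper, RelatedTo, design) and (sketch, RelatedTo, design) both survive, creating the inter-domain bridges discussed in §1.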

Step 1b) Knowledge Graph Pre-training
To utilize G′ in our task, we first need to compute a representation of its nodes. We do this by training a graph autoencoder model to perform link prediction. The model takes as input an incomplete set of edges Ê′ from E′ in G′ and then assigns scores to possible edges (c_1, r, c_2), determining how likely these edges are to be in E′. Following Schlichtkrull et al. (2018), our graph autoencoder model consists of an R-GCN entity encoder and a DistMult scoring decoder.
Encoder Module. We employ the Relational Graph Convolutional Network (R-GCN) encoder from Schlichtkrull et al. (2018) as our graph encoder network. The power of this model comes from its ability to accumulate relational evidence over multiple inference steps from the local neighborhood around a given concept. The neighborhood-based convolutional feature transformation ensures that distinct domains are connected via underlying concepts and influence each other to create enriched domain-aggregated feature vectors.
Precisely, our encoder module consists of two R-GCN encoders stacked upon one another. The initial concept feature vector g_i is initialized randomly and thereafter transformed into the domain-aggregated feature vector h_i ∈ R^d using the two-step graph convolution process:

h_i^{(l+1)} = σ( Σ_{r ∈ R} Σ_{j ∈ N_i^r} (1 / c_{i,r}) W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} ),

where N_i^r denotes the neighbouring concepts of concept i under relation r ∈ R; c_{i,r} is a normalization constant which can either be set in advance, such that c_{i,r} = |N_i^r|, or be learned in a gradient-based setup; σ is an activation function such as ReLU; and W_r^{(l)}, W_0^{(l)} are learnable weight matrices. This stack of transformations effectively accumulates the normalized sum of the local neighborhood, i.e., the neighborhood information for each concept in the graph. The self-connection ensures self-dependent feature transformation.
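A minimal numpy sketch of one such R-GCN layer, assuming dense per-relation weight matrices and the fixed normalizer c_{i,r} = |N_i^r| (shapes and the toy graph are illustrative, not the authors' implementation):

```python
import numpy as np

def rgcn_layer(H, edges, W_rel, W_self):
    """One R-GCN layer. H: (n, d_in) node features; edges: list of
    (src, rel, dst); W_rel: (num_rels, d_in, d_out); W_self: (d_in, d_out)."""
    out = H @ W_self                              # self-connection term
    counts = {}                                   # |N_i^r| per (node, relation)
    for _, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    for src, rel, dst in edges:
        # normalized message from a neighbour under relation rel
        out[dst] += (H[src] @ W_rel[rel]) / counts[(dst, rel)]
    return np.maximum(out, 0.0)                   # ReLU activation

H = np.eye(2)                    # two nodes with 2-d one-hot features
W_rel = np.ones((1, 2, 2))       # a single relation type
W_self = np.eye(2)
out = rgcn_layer(H, [(0, 0, 1)], W_rel, W_self)
```

Stacking two such layers, as in the paper, lets information flow across two hops, which is what connects concepts from different domains through shared intermediate concepts.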
Decoder Module. DistMult factorization (Yang et al., 2014) is used as the scoring function. For a triplet (c_i, r, c_j), the score s is obtained as follows:

s(c_i, r, c_j) = σ( h_{c_i}^T R_r h_{c_j} ),

where σ is the logistic function and h_{c_i}, h_{c_j} ∈ R^d are the R-GCN encoded feature vectors for concepts c_i, c_j. Each relation r ∈ R is also associated with a diagonal matrix R_r ∈ R^{d×d}.
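Since R_r is diagonal, the bilinear form reduces to an elementwise product; a sketch on toy vectors (the embeddings below are hypothetical):

```python
import numpy as np

def distmult_score(h_i, h_j, r_diag):
    # s = sigmoid(h_i^T diag(r) h_j); r_diag holds the diagonal of R_r.
    logit = np.sum(h_i * r_diag * h_j)
    return 1.0 / (1.0 + np.exp(-logit))

h_wallpaper = np.array([1.0, 1.0])    # hypothetical concept embeddings
h_design = np.array([1.0, 1.0])
r_related = np.array([0.0, 0.0])      # hypothetical relation diagonal
score = distmult_score(h_wallpaper, h_design, r_related)
```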
Training. We train our graph autoencoder model using negative sampling (Schlichtkrull et al., 2018). For triplets in Ê′ (positive samples), we create an equal number of negative samples by randomly corrupting the positive triplets. The corruption is performed by randomly modifying either one of the constituting concepts or the relation, creating the overall set of samples denoted by T. The task is set as a binary classification between the positive/negative triplets, where the model is trained with the standard cross-entropy loss:

L = − (1 / 2|Ê′|) Σ_{(c_i, r, c_j, y) ∈ T} ( y log s(c_i, r, c_j) + (1 − y) log(1 − s(c_i, r, c_j)) ),

where y is 1 for positive samples and 0 for negative samples.
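The corruption step can be sketched as follows (toy entity and relation sets; the real sampler operates over Ê′):

```python
import random

def corrupt(triplet, entities, relations, rng):
    # Replace exactly one of (head, relation, tail) with a different
    # random entity/relation to create a negative sample.
    h, r, t = triplet
    slot = rng.randrange(3)
    if slot == 0:
        h = rng.choice([e for e in entities if e != h])
    elif slot == 1:
        r = rng.choice([x for x in relations if x != r])
    else:
        t = rng.choice([e for e in entities if e != t])
    return (h, r, t)

rng = random.Random(0)
entities = ["wallpaper", "sketch", "design", "film"]
relations = ["RelatedTo", "IsA"]
pos = ("wallpaper", "RelatedTo", "design")
neg = corrupt(pos, entities, relations, rng)
```

Each positive triplet gets label y = 1 and each corrupted one y = 0, and both are scored by DistMult under the binary cross-entropy loss above.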
Once the graph autoencoder is trained, it ensures that target domain-specific concepts can be explained via domain-general concepts, and further via inter-domain knowledge. In other words, the encoded node representations h_i capture commonsense graph information in the form of domain-specific and domain-general features, and are thus effective for the downstream task when there is a distributional shift during evaluation.

Step 2a) Commonsense Graph Feature Extraction
The trained graph autoencoder model, as explained in §4.2, can be used for feature extraction. We now describe the methodology to extract the document-specific commonsense graph features for a particular document x: 1) The first step is to extract the set of all unique nouns, adjectives, and adverbs present in the document. We call this set W.
2) Next, we extract a subgraph from G ′ , where we take all triplets for which both the constituting nodes are either in W or are within the vicinity of radius 1 of any of the words in W. We call this graph G ′ W .
3) We then make a forward pass of G ′ W through the encoder of the pre-trained graph autoencoder model. This results in feature vectors h j for all unique nodes j in G ′ W .
4) Finally, we average over the feature vectors h j for all unique nodes in G ′ W , to obtain the commonsense graph features x cg for document x.
We surmise that since most documents will have both domain-specific and domain-general words in W, x cg will inherently capture the commonsense information likely to be helpful during domain adaptation.
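Step 4 above amounts to a mean-pool over the encoder outputs; a sketch with hypothetical embeddings:

```python
import numpy as np

def document_graph_feature(node_embeddings, doc_nodes):
    # x_cg: average of the pre-trained R-GCN embeddings h_j over all
    # unique nodes of the document-specific sub-graph G'_W.
    vecs = [node_embeddings[n] for n in set(doc_nodes)]
    return np.mean(vecs, axis=0)

emb = {"screen saver": np.array([2.0, 0.0]),   # hypothetical h_j vectors
       "design": np.array([0.0, 2.0])}
x_cg = document_graph_feature(emb, ["screen saver", "design", "design"])
```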

Step 2b) Domain-adversarial Training
We feed the commonsense graph feature x_cg, pooled from G′_W for document x (§4.3), into the DANN architecture (see §3.2). We learn an encoder function for the graph vector, z_grp = M′(x_cg) with parameters θ_G, and combine its representation with the DANN encoder output z_dann = M(x) to get the final feature representation [z_dann; z_grp] of the document x. Here, [a; b] represents concatenation.
The task classifier C and domain discriminator D_adv now take this modified representation [z_dann; z_grp] as input instead of only z_dann. To further enforce domain invariance in the encoded graph representation z_grp, we treat it as the hidden code of a traditional autoencoder and add a shared decoder D_recon (with parameters θ_R) with a mean-squared-error reconstruction loss:

L_recon = ‖ x_cg − D_recon(z_grp) ‖².

We hypothesize that if θ_R can reconstruct graph features for both domains, then it ensures stronger domain-invariance constraints on z_grp. The final optimization of this domain-adversarial setup is based on the minimax objective:

min_{θ_M, θ_G, θ_C, θ_R} max_{θ_D} ( L_C + γ L_recon − λ L_adv_D ),

where λ and γ are hyper-parameters.
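A sketch of the combined objective on scalar loss values (the loss values and λ, γ below are hypothetical placeholders):

```python
import numpy as np

def reconstruction_loss(x_cg, x_recon):
    # Mean-squared error between the graph feature and its reconstruction.
    return float(np.mean((x_cg - x_recon) ** 2))

def total_objective(l_task, l_recon, l_adv_d, lam=1.0, gamma=1.0):
    # Minimized over theta_M, theta_G, theta_C, theta_R; maximized over
    # theta_D via gradient reversal of the adversarial term.
    return l_task + gamma * l_recon - lam * l_adv_d

l_recon = reconstruction_loss(np.array([1.0, 1.0]), np.array([0.0, 0.0]))
total = total_objective(l_task=1.0, l_recon=0.5, l_adv_d=0.25)
```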

Dataset
We consider the Amazon-reviews benchmark dataset for domain adaptation in SA (Blitzer et al., 2007b). This corpus consists of Amazon product reviews across four domains: Books, DVDs, Electronics, and Kitchen appliances. Each review is associated with a rating denoting its sentiment polarity. Reviews with ratings up to 3 stars are considered to carry negative sentiment, and those with 4 or 5 stars positive sentiment. The dataset follows a balanced distribution between both labels, yielding 2k labeled training instances for each domain. Testing uses 3k-6k samples per domain for evaluation. We follow pre-processing similar to that done by Ganin et al. (2016) and Ruder and Plank (2018), where each review is encoded into a 5000-dimensional tf-idf weighted bag-of-words (BOW) feature vector of unigrams and bigrams.
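A minimal pure-Python sketch of such a tf-idf unigram+bigram vectorizer (a toy idf definition log(N/df) is used here; real pipelines typically use a library vectorizer with smoothing):

```python
import math
from collections import Counter

def ngrams(text):
    toks = text.lower().split()
    return toks + [" ".join(p) for p in zip(toks, toks[1:])]  # uni+bigrams

def tfidf_vectors(docs, max_features=5000):
    # Vocabulary: the max_features terms with highest document frequency.
    doc_counts = [Counter(ngrams(d)) for d in docs]
    df = Counter()
    for c in doc_counts:
        df.update(c.keys())
    vocab = [t for t, _ in df.most_common(max_features)]
    idx = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    vecs = []
    for c in doc_counts:
        v = [0.0] * len(vocab)
        for term, tf in c.items():
            if term in idx:                       # out-of-vocab terms dropped
                v[idx[term]] = tf * math.log(n / df[term])
        vecs.append(v)
    return vocab, vecs

vocab, vecs = tfidf_vectors(["good camera", "good book"])
```

Note how "good", which appears in every document, gets zero weight, while domain-indicative terms like "camera" receive a positive idf.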

Training Details
We follow Ganin et al. (2016) in training our network. Our neural layers, i.e., the DANN encoder (M), graph feature encoder (M′), graph feature reconstructor (D_recon), task classifier (C), and domain discriminator (D_adv), are implemented with 100-dimensional fully connected layers. We use a cyclic λ as per Ganin et al. (2016) and γ = 1 after validating with γ ∈ {0.5, 1, 2}. A dropout of 25% is used in the fully connected layers, and the model is trained with the Adam optimizer (Kingma and Ba, 2015).
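The λ schedule of the original DANN paper ramps λ smoothly from 0 to 1 as training progresses; a sketch (whether KinGDOM uses exactly this form is an assumption, the formula below is from Ganin et al., 2016):

```python
import math

def dann_lambda(p):
    # p in [0, 1] is training progress; lambda ramps smoothly from 0
    # towards 1 (schedule from Ganin et al., 2016).
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0

start, end = dann_lambda(0.0), dann_lambda(1.0)
```

Starting λ near 0 lets the encoder learn discriminative features before the adversarial pressure on the discriminator kicks in.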

Baseline Methods
In this paper, to inspect the role of external commonsense knowledge and analyze the improvement in performance it brings, we intentionally use BOW features and compare against other baseline models that also use BOW features. This issue has also been addressed by Poria et al. (2020). The flexibility of KinGDOM allows other approaches, such as mSDA, CNN, etc., to be easily incorporated, which we plan to analyze in the future. We compare KinGDOM with the following unsupervised domain adaptation baseline methods: DANN (Ganin et al., 2016) is the domain-adversarial method on which we build KinGDOM (§3.2); DANN+ is the DANN model with an Adam optimizer instead of the original SGD optimizer, with the network architecture and the rest of the hyperparameters kept the same; Variational Fair Autoencoder (VFAE) (Louizos et al., 2015) learns latent representations independent of sensitive domain knowledge, while retaining enough task information, by using an MMD-based loss; Central Moment Discrepancy (CMD) (Zellinger et al., 2017) is a regularization method which minimizes the difference between feature representations by utilizing the equivalent representation of probability distributions by moment sequences; Asym (Saito et al., 2017) is an asymmetric tri-training framework that uses three neural networks asymmetrically for domain adaptation; MT-Tri (Ruder and Plank, 2018) is similar to Asym, but uses multi-task learning; Domain Separation Networks (DSN) (Bousmalis et al., 2016b) learn to extract shared and private components of each domain; as per Peng et al. (2018a), DSN stands as the present state-of-the-art method for unsupervised domain adaptation; Task Refinement Learning (TRL) (Ziser and Reichart, 2019) is an unsupervised domain adaptation framework which iteratively trains a Pivot Based Language Model to gradually increase the information exposed about each pivot; TAT is the transferable adversarial training setup that generates examples which help in modelling the domain shift, and adversarially trains classifiers to make consistent predictions over these transferable examples; CoCMD (Peng et al., 2018a) is a co-training method based on the CMD regularizer which simultaneously extracts domain-specific and domain-invariant features and trains a classifier on them. CoCMD, however, is SSL-based, as it uses labeled data from the target domain. Although it falls outside the regime of unsupervised domain adaptation, we report its results to provide a full picture to the reader.

Results and Analysis
As mentioned in §5.3, we reimplemented the baseline DANN model using the Adam optimizer and observed that its results have been notably under-reported in much of the unsupervised domain adaptation literature for sentiment analysis (see Table 2). In the original DANN implementation (Ganin et al., 2016), Stochastic Gradient Descent (SGD) was used as the optimizer. However, in DANN+, using the Adam optimizer leads to a substantial performance jump that outperforms many of the recent advanced domain adaptation methods: CMD (Zellinger et al., 2017), VFAE (Louizos et al., 2015), Asym (Saito et al., 2017), and MT-Tri (Ruder and Plank, 2018).
We compare the performance of KinGDOM with its base models, DANN and DANN+. As observed in Fig. 3, KinGDOM improves over both, which we attribute to the infused commonsense knowledge. Next, we look at Table 2, where comparisons are made with other baselines, including the state-of-the-art DSN approach. As observed, KinGDOM outperforms DSN in all the task scenarios, indicating the efficacy of our approach. Blitzer et al. (2007b), in their original work, noted that domain transfer across the two groups {DVD, Books} and {Electronics, Kitchen} is particularly challenging. Interestingly, in our results, we observe the highest gains when the source and target domains are from these separate groups (e.g., Kitchen → DVD, Kitchen → Books, Electronics → Books).
In Table 2, we also compare KinGDOM against CoCMD and TAT. Although CoCMD is a semi-supervised method, KinGDOM surpasses its performance in several of the twelve domain-pair combinations and matches its overall result without using any labelled samples from the target domain. TAT is the state-of-the-art method for unsupervised domain adaptation on the Amazon reviews dataset when used with 30,000 bag-of-words (BOW) features. Interestingly, KinGDOM with 5000 BOW features can match TAT with 30,000 BOW features, and outperforms TAT by around 1.6% overall when used with the same 30,000 BOW features. Our reimplementation of DANN, DANN+, with 30,000 BOW features also surpasses TAT by 0.5%. These results indicate that external knowledge, when added to a simple architecture such as DANN, can surpass sophisticated state-of-the-art models such as DSN and TAT. Our primary intention in using DANN as the base model is to highlight the role of knowledge-base infusion in domain adaptation, devoid of sophisticated models and complex neural maneuvering. Nevertheless, the flexibility of KinGDOM allows it to be combined with advanced models too (e.g., DSN, TAT), which we believe could perform even better. We intend to analyze this in the future.

Ablation Studies
We further analyze our framework and challenge our design choices. Specifically, we consider three variants of our architecture based on alternative ways to condition DANN with the graph features. Each of these variants reveals important clues regarding the invariance properties and task appropriateness of z_grp. Variant 1 uses separate decoders D_recon for the source and target domains. In Variant 2, the domain classifier D_adv takes only z_dann as input, whereas the sentiment classifier C takes the concatenated feature [z_dann; z_grp]. Finally, in Variant 3, D_adv takes [z_dann; z_grp] as input, whereas C takes only z_dann. As seen in Fig. 4, all three variants perform worse than KinGDOM. For Variant 1, this suggests that the shared decoder aids learning invariant representations and helps target-domain classification. For Variant 2, removing z_grp from the domain classifier diminishes the domain-invariance capabilities, thus making the domain classifier stronger and leading to a drop in sentiment classification performance. For Variant 3, removing z_grp from the sentiment classifier C degrades performance. This indicates that in KinGDOM, z_grp contains task-appropriate features retrieved from external knowledge (see §1).
Besides ablations, we also examine alternatives to the knowledge graph and to the bag-of-words representation of the documents. For the former, we consider replacing ConceptNet with WordNet (Fellbaum, 2010), a lexical knowledge graph with conceptual-semantic and lexical connections. We find the performance of KinGDOM with WordNet to be 1% worse than with ConceptNet in terms of average accuracy, indicating the compatibility of ConceptNet with our framework. However, the competitive performance with WordNet also suggests the usability of our framework with any structural resource comprising inter-domain connections. For the latter, we use GloVe-averaged embeddings with DANN. GloVe is a popular word embedding method which captures semantics using co-occurrence statistics (Pennington et al., 2014). Results in Fig. 4 show that GloVe alone does not provide the amount of conceptual semantics available in ConceptNet.

Case Studies
We delve further into our results and qualitatively analyze KinGDOM. We look at a particular test document from the DVD domain, for which KinGDOM predicts the correct sentiment both when the source domain is Electronics and when it is Books. In the same settings, DANN mispredicts this document. Looking at the corresponding document-specific sub-graph, we observe conceptual links to both domain-general concepts and domain-specific concepts from the source domain. In Fig. 5, we can see the domain-specific terms CGI and film related to the general concept graphic, which is further linked to domain-specific concepts like graphics card and writing from Electronics and Books, respectively. This example shows how KinGDOM might use these additional concepts to enhance the semantics required for sentiment prediction.

Conclusion
In this paper, we explored the role of external commonsense knowledge for domain adaptation. We introduced a domain-adversarial framework called KinGDOM, which relies on an external commonsense KB (ConceptNet) to perform unsupervised domain adaptation. We showed that we can learn domain-invariant features for the concepts in the KB by using a graph convolutional autoencoder. Using the standard Amazon benchmark for domain adaptation in sentiment analysis, we showed that our framework exceeds the performance of previously proposed methods for the same task. Our experiments demonstrate the usefulness of external knowledge for the task of cross-domain sentiment analysis. Our code is publicly available at https://github.com/declare-lab/kingdom.