A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network

In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction performance than previous state-of-the-art embedding models on two benchmark datasets WN18RR and FB15k-237.


Introduction
Large-scale knowledge bases (KBs), such as YAGO (Suchanek et al., 2007), Freebase (Bollacker et al., 2008) and DBpedia (Lehmann et al., 2015), are usually databases of triples representing the relationships between entities in the form of facts (head entity, relation, tail entity) denoted as (h, r, t), e.g., (Melbourne, cityOf, Australia). These KBs are useful resources in many applications such as semantic search and ranking (Kasneci et al., 2008; Schuhmacher and Ponzetto, 2014; Xiong et al., 2017), question answering (Zhang et al., 2016; Hao et al., 2017) and machine reading (Yang and Mitchell, 2017). However, the KBs are still incomplete, i.e., missing a lot of valid triples (Socher et al., 2013; West et al., 2014). Therefore, much research work has been devoted to knowledge base completion, or link prediction, which predicts whether a triple (h, r, t) is valid or not (Bordes et al., 2011).
Many embedding models have been proposed to learn vector or matrix representations for entities and relations, obtaining state-of-the-art (SOTA) link prediction results (Nickel et al., 2016a). In these embedding models, valid triples obtain lower implausibility scores than invalid triples. Let us take the well-known embedding model TransE (Bordes et al., 2013) as an example. In TransE, entities and relations are represented by k-dimensional vector embeddings. TransE employs a transitional characteristic to model relationships between entities: it assumes that if (h, r, t) is a valid fact, the embedding of the head entity h plus the embedding of the relation r should be close to the embedding of the tail entity t, i.e. $v_h + v_r \approx v_t$ (here, $v_h$, $v_r$ and $v_t$ are the embeddings of h, r and t, respectively). That is, the TransE score $\|v_h + v_r - v_t\|_p$ of a valid triple (h, r, t) should be close to 0 and smaller than the score $\|v_{h'} + v_{r'} - v_{t'}\|_p$ of an invalid triple (h', r', t'). The transitional characteristic in TransE also implies the global relationships among same dimensional entries of $v_h$, $v_r$ and $v_t$.
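To make the TransE intuition concrete, the following is a minimal sketch of the implausibility score as a p-norm of the translation residual. The function name `transe_score` and the toy 3-dimensional embeddings are illustrative assumptions, not values from the paper.

```python
def transe_score(v_h, v_r, v_t, p=1):
    """Implausibility score ||v_h + v_r - v_t||_p for p = 1 or p = 2."""
    residual = [h + r - t for h, r, t in zip(v_h, v_r, v_t)]
    if p == 1:
        return sum(abs(d) for d in residual)
    if p == 2:
        return sum(d * d for d in residual) ** 0.5
    raise ValueError("this sketch only supports p = 1 or p = 2")


# Toy embeddings (made-up values): v_h + v_r equals v_t exactly,
# while v_t_bad plays the role of a corrupted tail entity.
v_h = [0.25, 0.5, -0.25]
v_r = [0.25, -0.5, 0.5]
v_t = [0.5, 0.0, 0.25]
v_t_bad = [-0.5, 1.0, 0.25]

valid = transe_score(v_h, v_r, v_t)
invalid = transe_score(v_h, v_r, v_t_bad)
print(valid, invalid)
```

In this toy setting the valid triple receives the lower score, which is the ordering an embedding model of this kind is trained to produce.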
Other transition-based models extend TransE to additionally use projection vectors or matrices to translate head and tail embeddings into the relation vector space, such as: TransH (Wang et al., 2014), TransR (Lin et al., 2015b), TransD (Ji et al., 2015), STransE (Nguyen et al., 2016b) and TranSparse. Furthermore, DistMult (Yang et al., 2015) and ComplEx (Trouillon et al., 2016) use a tri-linear dot product to compute the score for each triple. Recent research has shown that using relation paths between entities in the KBs could help to get contextual information for improving KB completion performance (Lin et al., 2015a; Luo et al., 2015; Guu et al., 2015; Toutanova et al., 2016; Nguyen et al., 2016a). See other embedding models for KB completion in Nguyen (2017).
Recently, convolutional neural networks (CNNs), originally designed for computer vision (LeCun et al., 1998), have received significant research attention in natural language processing (Collobert et al., 2011; Kim, 2014). A CNN learns non-linear features to capture complex relationships with remarkably fewer parameters than fully connected neural networks. Inspired by the success in computer vision, Dettmers et al. (2018) proposed ConvE, the first model applying CNN to the KB completion task. In ConvE, only $v_h$ and $v_r$ are reshaped and then concatenated into an input matrix which is fed to the convolution layer. Different filters of the same 3 × 3 shape are operated over the input matrix to output feature map tensors. These feature map tensors are then vectorized and mapped into a vector via a linear transformation. A dot product of this vector with $v_t$ then returns a score for (h, r, t). See a formal definition of the ConvE score function in Table 1. It is worth noting that ConvE focuses on the local relationships among different dimensional entries in each of $v_h$ or $v_r$, i.e., ConvE does not observe the global relationships among same dimensional entries of an embedding triple $(v_h, v_r, v_t)$, so that ConvE ignores the transitional characteristic in transition-based models, which is one of the most useful intuitions for the task.
In this paper, we present ConvKB, an embedding model which proposes a novel use of CNN for the KB completion task. In ConvKB, each entity or relation is associated with a unique k-dimensional embedding. Let $v_h$, $v_r$ and $v_t$ denote the k-dimensional embeddings of h, r and t, respectively. For each triple (h, r, t), the corresponding triple of k-dimensional embeddings $(v_h, v_r, v_t)$ is represented as a k × 3 input matrix. This input matrix is fed to the convolution layer where different filters of the same 1 × 3 shape are used to extract the global relationships among same dimensional entries of the embedding triple. That is, these filters are repeatedly operated over every row of the input matrix to produce different feature maps. The feature maps are concatenated into a single feature vector, whose dot product with a weight vector produces a score for the triple (h, r, t). This score is used to infer whether the triple (h, r, t) is valid or not.
Table 1 (columns: Model; the score function f(h, r, t); table body not recovered). g denotes a non-linear function. * denotes a convolution operator. · denotes a dot product. concat denotes a concatenation operator. $\hat{v}$ denotes a 2D reshaping of v. Ω denotes a set of filters.
Our contributions in this paper are as follows:
• We introduce ConvKB, a novel embedding model of entities and relationships for knowledge base completion. ConvKB models the relationships among same dimensional entries of the embeddings. This implies that ConvKB generalizes transitional characteristics in transition-based embedding models.
• We evaluate ConvKB on two benchmark datasets: WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova and Chen, 2015). Experimental results show that ConvKB obtains better link prediction performance than previous SOTA embedding models. In particular, ConvKB obtains the best mean rank and the highest Hits@10 on WN18RR, and produces the highest mean reciprocal rank and highest Hits@10 on FB15k-237.

Proposed ConvKB model
A knowledge base G is a collection of valid factual triples in the form of (head entity, relation, tail entity) denoted as (h, r, t) such that h, t ∈ E and r ∈ R where E is a set of entities and R is a set of relations. Embedding models aim to define a score function f giving an implausibility score for each triple (h, r, t) such that valid triples receive lower scores than invalid triples. Table 1 presents score functions in previous SOTA models.
We denote the dimensionality of embeddings by k such that each embedding triple $(v_h, v_r, v_t)$ is viewed as a matrix $A = [v_h, v_r, v_t] \in \mathbb{R}^{k \times 3}$, and $A_{i,:} \in \mathbb{R}^{1 \times 3}$ denotes the i-th row of A.
Suppose that we use a filter $\omega \in \mathbb{R}^{1 \times 3}$ in the convolution layer. ω aims not only to examine the global relationships between same dimensional entries of the embedding triple $(v_h, v_r, v_t)$, but also to generalize the transitional characteristics in the transition-based models. ω is repeatedly operated over every row of A to finally generate a feature map $v = [v_1, v_2, ..., v_k] \in \mathbb{R}^k$ as:
$$v_i = g\left(\omega \cdot A_{i,:} + b\right)$$
where $b \in \mathbb{R}$ is a bias term and g is some activation function such as ReLU.
Our ConvKB uses different filters in $\mathbb{R}^{1 \times 3}$ to generate different feature maps. Let Ω and τ denote the set of filters and the number of filters, respectively, i.e. τ = |Ω|, resulting in τ feature maps. These τ feature maps are concatenated into a single vector in $\mathbb{R}^{\tau k \times 1}$, whose dot product with a weight vector $w \in \mathbb{R}^{\tau k \times 1}$ gives a score for the triple (h, r, t). Figure 1 illustrates the computation process in ConvKB.
Formally, we define the ConvKB score function f as follows:
$$f(h, r, t) = \mathrm{concat}\left(g\left([v_h, v_r, v_t] * \Omega\right)\right) \cdot w$$
where Ω and w are shared parameters, independent of h, r and t; * denotes a convolution operator; and concat denotes a concatenation operator.
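The computation described above (each 1 × 3 filter slid over every row of the k × 3 input matrix, an activation, concatenation of the τ feature maps, and a dot product with w) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names, the per-filter bias list and the toy values are assumptions.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def convkb_score(v_h, v_r, v_t, filters, biases, w, g=relu):
    """Score a triple from its three k-dimensional embeddings.

    filters: tau filters, each a length-3 list (a 1x3 filter)
    biases:  tau bias terms, one per filter (an assumption of this sketch)
    w:       weight vector of length tau * k
    """
    rows = list(zip(v_h, v_r, v_t))  # the k rows of the k x 3 input matrix
    features = []
    for omega, b in zip(filters, biases):
        # One filter applied to every row yields one feature map of length k;
        # extending the list concatenates the feature maps.
        features.extend(
            g(omega[0] * h + omega[1] * r + omega[2] * t + b) for h, r, t in rows
        )
    if len(features) != len(w):
        raise ValueError("w must have length tau * k")
    return sum(x * w_i for x, w_i in zip(features, w))


# Toy example with k = 2 and tau = 2 (made-up values).
v_h, v_r, v_t = [1.0, 2.0], [0.5, -1.0], [1.0, 0.5]
filters = [[1.0, 1.0, -1.0], [0.5, 0.0, 0.5]]
biases = [0.0, 0.0]
w = [1.0, -1.0, 2.0, 0.0]

score = convkb_score(v_h, v_r, v_t, filters, biases, w)
print(score)
```

Because the filters and w are shared across all triples, only the embeddings change from one triple to the next.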
If we only use one filter ω (i.e. using τ = 1) with a fixed bias term b = 0 and the activation function $g(x) = |x|$ or $g(x) = x^2$, and fix ω = [1, 1, −1] and w = 1 (the all-ones vector) during training, ConvKB reduces to the plain TransE model (Bordes et al., 2013). So our ConvKB model can be viewed as an extension of TransE to further model global relationships.
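We use the Adam optimizer (Kingma and Ba, 2014) to train ConvKB by minimizing the loss function $\mathcal{L}$ (Trouillon et al., 2016) with $L_2$ regularization on the weight vector w of the model:
$$\mathcal{L} = \sum_{(h,r,t) \in \{G \cup G'\}} \log\left(1 + \exp\left(l_{(h,r,t)} \cdot f(h, r, t)\right)\right) + \frac{\lambda}{2}\|w\|_2^2$$
in which $l_{(h,r,t)} = 1$ for $(h, r, t) \in G$ and $l_{(h,r,t)} = -1$ for $(h, r, t) \in G'$; here G' is a collection of invalid triples generated by corrupting valid triples in G.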
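As a rough illustration of the training objective, the sketch below assumes the logistic (softplus) loss of Trouillon et al. (2016) with label +1 for valid triples and −1 for corrupted ones, plus an L2 penalty (λ/2)‖w‖² on the weight vector. The function names are hypothetical, and the default λ = 0.001 is the value reported in the training setup.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def convkb_loss(valid_scores, invalid_scores, w, lam=0.001):
    """Softplus loss over valid and corrupted triples plus L2 penalty on w."""
    data_term = sum(softplus(f) for f in valid_scores)       # label +1
    data_term += sum(softplus(-f) for f in invalid_scores)   # label -1
    l2_term = 0.5 * lam * sum(w_i * w_i for w_i in w)
    return data_term + l2_term


w = [0.0, 0.0]
well_separated = convkb_loss([-2.0], [2.0], w)  # valid low, invalid high
badly_separated = convkb_loss([2.0], [-2.0], w)
print(well_separated, badly_separated)
```

Minimizing this quantity pushes the scores of valid triples down and the scores of corrupted triples up, matching the convention that valid triples receive lower implausibility scores.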

Datasets
We evaluate ConvKB on two benchmark datasets: WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova and Chen, 2015). WN18RR and FB15k-237 are subsets of two common datasets, WN18 and FB15k (Bordes et al., 2013), respectively. As noted by Toutanova and Chen (2015), WN18 and FB15k are easy because they contain many reversible relations. Knowing that relations are reversible therefore allows us to easily predict the majority of test triples, e.g. state-of-the-art results on both WN18 and FB15k are obtained by using a simple reversal rule as shown in Dettmers et al. (2018). WN18RR and FB15k-237 were therefore created so as not to suffer from this reversible relation problem, which makes the knowledge base completion task more realistic.
Table 3 (caption; beginning not recovered): ... (2018) where Hits@10 and MRR are rounded to 2 decimal places on WN18RR. The last 4 rows report results of models that exploit information about relation paths (KB$_{LRN}$, R-GCN+ and Neural LP) or textual mentions derived from a large external corpus (Node+LinkFeat). The best score is in bold, while the second best score is underlined.

Evaluation protocol
In the KB completion or link prediction task (Bordes et al., 2013), the purpose is to predict a missing entity given a relation and another entity, i.e., inferring h given (r, t) or inferring t given (h, r). The results are calculated based on ranking the scores produced by the score function f on test triples. Following Bordes et al. (2013), for each valid test triple (h, r, t), we replace either h or t by each of the other entities in E to create a set of corrupted triples. We use the "Filtered" setting protocol (Bordes et al., 2013), i.e., not taking into account any corrupted triples that appear in the KB. We rank the valid test triple and corrupted triples in ascending order of their scores. We employ three common evaluation metrics: mean rank (MR), mean reciprocal rank (MRR), and Hits@10 (i.e., the proportion of the valid test triples ranking in the top 10 predictions). Lower MR, higher MRR or higher Hits@10 indicate better performance.
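The ranking protocol can be sketched as follows. This is a simplified illustration under stated assumptions: `score_fn` is any function returning an implausibility score, ties are broken in favour of the test triple, and the entity names and toy scores are made up.

```python
def filtered_rank(score_fn, test_triple, entities, known_triples, corrupt="tail"):
    """Rank of a valid test triple among its corrupted triples ("Filtered" setting)."""
    h, r, t = test_triple
    target = score_fn(h, r, t)
    rank = 1
    for e in entities:
        candidate = (h, r, e) if corrupt == "tail" else (e, r, t)
        # Skip the test triple itself and any corrupted triple that is in the KB.
        if candidate == test_triple or candidate in known_triples:
            continue
        if score_fn(*candidate) < target:  # lower score = more plausible
            rank += 1
    return rank


def summarize(ranks):
    n = len(ranks)
    return {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
        "Hits@10": sum(r <= 10 for r in ranks) / n,
    }


# Toy data: four entities, one relation, hand-picked scores.
entities = ["a", "b", "c", "d"]
known_triples = {("a", "r", "b"), ("a", "r", "c")}
toy_scores = {
    ("a", "r", "b"): 0.5, ("a", "r", "a"): 0.9, ("a", "r", "c"): 0.1,
    ("a", "r", "d"): 0.2, ("b", "r", "b"): 0.7, ("c", "r", "b"): 0.8,
    ("d", "r", "b"): 0.6,
}
score_fn = lambda h, r, t: toy_scores[(h, r, t)]

test_triple = ("a", "r", "b")
ranks = [
    filtered_rank(score_fn, test_triple, entities, known_triples, corrupt="tail"),
    filtered_rank(score_fn, test_triple, entities, known_triples, corrupt="head"),
]
metrics = summarize(ranks)
print(ranks, metrics)
```

Note that ("a", "r", "c") has a lower score than the test triple but does not hurt its rank, because it is itself a known valid triple and is filtered out.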
To learn our model parameters including entity and relation embeddings, filters ω and the weight vector w, we use Adam (Kingma and Ba, 2014) and select its initial learning rate ∈ {5e-6, 1e-5, 5e-5, 1e-4, 5e-4}. We use ReLU as the activation function g. We fix the batch size at 256 and set the $L_2$-regularizer λ at 0.001 in our objective function. The filters ω are initialized by a truncated normal distribution or by [0.1, 0.1, −0.1]. We select the number of filters τ ∈ {50, 100, 200, 400, 500}. We run ConvKB up to 200 epochs and use outputs from the last epoch for evaluation. The highest Hits@10 scores on the validation set are obtained when using k = 50, τ = 500, the truncated normal distribution for filter initialization, and the initial learning rate at 1e-4 on WN18RR; and k = 100, τ = 50, [0.1, 0.1, −0.1] for filter initialization, and the initial learning rate at 5e-6 on FB15k-237.
Table 3 compares the experimental results of our ConvKB model with previously published results, using the same experimental setup. Table 3 shows that ConvKB obtains the best MR and highest Hits@10 scores on WN18RR and also the highest MRR and Hits@10 scores on FB15k-237.
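The hyper-parameter search space stated in the training setup can be enumerated as a simple grid. The dictionary keys below are hypothetical names chosen for this sketch; the values are those listed in the text, and the embedding size k is left out of the grid because only its selected values are reported.

```python
from itertools import product

learning_rates = [5e-6, 1e-5, 5e-5, 1e-4, 5e-4]
num_filters = [50, 100, 200, 400, 500]
filter_inits = ["truncated_normal", "[0.1, 0.1, -0.1]"]

# Settings that are fixed across all runs.
fixed = {
    "optimizer": "adam",
    "activation": "relu",
    "batch_size": 256,
    "l2_lambda": 0.001,
    "max_epochs": 200,
}

grid = [
    dict(fixed, learning_rate=lr, num_filters=tau, filter_init=init)
    for lr, tau, init in product(learning_rates, num_filters, filter_inits)
]

# Configurations reported as best on the validation sets (by Hits@10).
best_wn18rr = dict(fixed, k=50, num_filters=500,
                   filter_init="truncated_normal", learning_rate=1e-4)
best_fb15k237 = dict(fixed, k=100, num_filters=50,
                     filter_init="[0.1, 0.1, -0.1]", learning_rate=5e-6)

print(len(grid))
```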

Main experimental results
ConvKB does better than the closely related model TransE on both experimental datasets, especially on FB15k-237 where ConvKB gains significant improvements of 347 − 257 = 90 in MR (which is about 26% relative improvement) and 0.396 − 0.294 = 0.102 in MRR (which is 34+% relative improvement), and also obtains 51.7 − 46.5 = 5.2% absolute improvement in Hits@10. Previous work shows that TransE obtains very competitive results (Lin et al., 2015a; Nickel et al., 2016b; Trouillon et al., 2016; Nguyen et al., 2016a). However, when comparing the CNN-based embedding model ConvE with other models, Dettmers et al. (2018) did not experiment with TransE. We reconfirm previous findings that TransE in fact is a strong baseline model, e.g., TransE obtains better MR and Hits@10 than ConvE on WN18RR.
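The absolute and relative improvements over TransE on FB15k-237 quoted above can be reproduced with a few lines of arithmetic. The helper name `improvement` is an assumption of this sketch; the input numbers are the ones given in the text.

```python
def improvement(baseline, new, lower_is_better=False):
    """Return (absolute gain, relative gain in percent of the baseline)."""
    gain = (baseline - new) if lower_is_better else (new - baseline)
    return gain, 100.0 * gain / baseline


mr_gain, mr_rel = improvement(347, 257, lower_is_better=True)   # mean rank
mrr_gain, mrr_rel = improvement(0.294, 0.396)                   # MRR
hits_gain, _ = improvement(46.5, 51.7)                          # Hits@10 (%)

print(mr_gain, round(mr_rel, 1))
print(round(mrr_gain, 3), round(mrr_rel, 1))
print(round(hits_gain, 1))
```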
ConvKB obtains better scores than ConvE on both datasets (except MRR on WN18RR and MR on FB15k-237), thus showing the usefulness of taking transitional characteristics into account. In particular, on FB15k-237, ConvKB achieves improvements of 0.396 − 0.316 = 0.080 in MRR (which is about 25% relative improvement) and 51.7 − 49.1 = 2.6% in Hits@10, while both ConvKB and ConvE produce similar MR scores. ConvKB also obtains a 25% relatively higher MRR score than the relation path-based model KB$_{LRN}$ on FB15k-237. In addition, ConvKB gives better Hits@10 than KB$_{LRN}$; however, KB$_{LRN}$ gives better MR than ConvKB. We plan to extend ConvKB with relation path information to obtain better link prediction performance in future work.

Conclusion
In this paper, we propose a novel embedding model ConvKB for the knowledge base completion task. ConvKB applies a convolutional neural network to explore the global relationships among same dimensional entries of the entity and relation embeddings, so that ConvKB generalizes the transitional characteristics in the transition-based embedding models. Experimental results show that our model ConvKB outperforms other state-of-the-art models on two benchmark datasets WN18RR and FB15k-237. Our code is available at: https://github.com/daiquocnguyen/ConvKB.
We also plan to extend ConvKB to new applications where the data can be formulated as triples. For example, inspired by the work of Vu et al. (2017) on search personalization, we can apply ConvKB to model user-oriented relationships between submitted queries and documents returned by search engines, i.e. modeling triple representations (query, user, document).