Improving Neural Knowledge Base Completion with Cross-Lingual Projections

In this paper we present a cross-lingual extension of a neural tensor network model for knowledge base completion. We exploit multilingual synsets from BabelNet to translate English triples to other languages and then augment the reference knowledge base with cross-lingual triples. We project monolingual embeddings of different languages to a shared multilingual space and use them for network initialization (i.e., as initial concept embeddings). We then train the network with triples from the cross-lingually augmented knowledge base. Results on WordNet link prediction show that leveraging cross-lingual information yields significant gains over exploiting only monolingual triples.


Introduction
In recent years we have witnessed an impressive amount of work on the automatic construction of wide-coverage Knowledge Bases (KBs), ranging from Web-scale machine reading systems like NELL (Carlson et al., 2010) all the way to large-scale ontologies like DBpedia (Bizer et al., 2009), YAGO (Hoffart et al., 2013), and BabelNet (Navigli and Ponzetto, 2012b), a multilingual KB covering a wide range of languages. All KBs, however, are incomplete. Researchers have tried to remedy KB incompleteness by constructing knowledge bases of ever increasing coverage directly from the Web (Wu et al., 2012; Dong et al., 2014) or by involving community efforts (Bollacker et al., 2008).
Neural models have recently been ubiquitously applied to various NLP tasks, and knowledge base completion (KBC) is no exception (Bordes et al., 2011; Jenatton et al., 2012; Bordes et al., 2013; Yang et al., 2015). These models represent KB concepts and relations as vectors, matrices, or, in the most expressive models, three-dimensional tensors. None of these models, however, has so far exploited cross-lingual knowledge, i.e., informational and linguistic links between different languages.
We set out to fill this gap and propose a cross-lingual extension of the neural tensor network model for knowledge base completion (henceforth NTNKBC). We develop an approach that grounds entities of the multilingual KB in a shared multilingual embedding space, obtained from monolingual word embeddings using the translation matrix model (Mikolov et al., 2013a). We then exploit cross-lingual triples from BabelNet (Navigli and Ponzetto, 2012a), a multilingual knowledge graph, as additional information for training the NTNKBC model. Our results show that joining forces across languages and the semantics of their corresponding embedding spaces yields significant performance improvements over using the monolingual signal only. We believe that a shared multilingual embedding space and cross-lingual knowledge links provide a form of additional regularization for the neural tensor network model and allow for better generalization, consequently yielding significant link prediction improvements.

Related Work
In recent years a large body of work has focused on knowledge base completion (Yang et al., 2015; Nickel et al., 2016a). External KBC approaches use outside knowledge such as text corpora (Aprosio et al., 2013) or other KBs (Bryl and Bizer, 2014) to acquire additional knowledge. The text-based external methods typically employ a form of distant supervision. They first recognize mentions of pairs of KB entities in text and observe what textual patterns hold between them. They then associate the recognized patterns with particular KB relations and finally search the corpus for other entity pairs mentioned using the same patterns (Snow et al., 2004; Mintz et al., 2009; Aprosio et al., 2013). A slight modification is the approach of West et al. (2014), where lexicalized KB relations are posed as queries to a search engine and the results are parsed to find pairs of entities between which the initially queried relation holds. Complementary to this, open information extraction methods (Etzioni et al., 2011; Faruqui and Kumar, 2015) extract large amounts of facts from text that can then be used for extending KBs (Dutta et al., 2014).
Text-centered approaches, however, simply cannot capture knowledge that is rarely made explicit in text. For example, much of the common-sense knowledge that is obvious to people, such as the fact that bananas are yellow or that humans breathe, is rarely (if ever) made explicit in textual corpora. A partial solution to this problem is provided by internal approaches that primarily rely on existing information in the KB itself (Bordes et al., 2011; Jenatton et al., 2012; Nickel et al., 2016b, inter alia) to simultaneously learn continuous representations of KB concepts and relations. These models exploit the KB structure as the ground truth for supervision. Obtaining meaningful concept and relation embeddings allows these models to infer additional KB facts from existing ones in an algebraic fashion.
KBs and text are truly synergistic sources of knowledge, as shown by complementary work that improves the quality of semantic vectors using lexicon-derived relational information. Internal models for KB completion, however, make no use of cross-lingual links between entities, which are readily available in existing multilingual resources like BabelNet (Navigli and Ponzetto, 2012b). Here, we extend the NTNKBC model with cross-lingual links from BabelNet and demonstrate how introducing additional (cross-lingual) knowledge through these links improves reasoning over the KB, in terms of better performance on the link prediction task. Our findings are, in turn, different from yet complementary to those obtained by building cross-lingual embeddings from parallel or comparable data (Upadhyay et al., 2016) or by KB-centric multilingual joint approaches to word understanding like, for instance, that of Navigli and Ponzetto (2012b). Assuming that each monolingual embedding space captures a slightly different aspect of a relation between the same concepts, we believe that introducing cross-lingual links over a shared embedding space provides an additional external regularization mechanism for the NTNKBC model.

Cross-Lingual Information for Knowledge Base Completion
In Figure 1 we highlight the main steps of our cross-lingual extension of the NTNKBC model. We first use BabelNet to translate KB triples used to train the NTNKBC model to other languages. Next we induce the multilingual embedding space by translating monolingual embedding spaces using the linear translation model (Mikolov et al., 2013a). Finally, we build cross-lingual triples and use them as training data for the NTNKBC model.
Knowledge base translation. We translate an input monolingual knowledge base $KB_s$ in the source language $s$, e.g., the English WordNet (Fellbaum, 1998), to each target language $t \in T$ of interest by associating $KB_s$ concepts and entities with those within a multilingual lexical knowledge resource, e.g., BabelNet synsets (our approach, however, can be used with any multilingual KB providing adequate lexicographic coverage). Multilingual synsets allow us to translate the triples in $KB_s$ into any of the languages covered by BabelNet. That is, we can translate source-language triples $(e_1^s, r, e_2^s)$ into the corresponding target-language triples $(e_1^t, r, e_2^t)$ for each target language.
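To make the translation step concrete, the following minimal Python sketch translates triples into a target language via a synset lookup; the `synset_translations` mapping is our own simplification standing in for BabelNet's actual interface, and all names are hypothetical.

```python
def translate_triples(triples, synset_translations, target_lang):
    """Translate (e1, r, e2) triples into target_lang via shared synsets.

    synset_translations stands in for a BabelNet lookup: it maps
    (entity, language) to that entity's lexicalization, if any.
    Triples with an untranslatable entity are skipped, which is why
    translated KBs are smaller than the source KB.
    """
    translated = []
    for e1, rel, e2 in triples:
        t1 = synset_translations.get((e1, target_lang))
        t2 = synset_translations.get((e2, target_lang))
        if t1 is not None and t2 is not None:  # both entities covered
            translated.append((t1, rel, t2))
    return translated

# Toy lookup standing in for BabelNet synsets:
synsets = {("football player", "de"): "Fußballspieler",
           ("athlete", "de"): "Sportler"}
de_triples = translate_triples(
    [("football player", "type of", "athlete")], synsets, "de")
# -> [("Fußballspieler", "type of", "Sportler")]
```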
Multilingual embedding space. We independently train monolingual word embeddings for each of the languages in $L = \{s\} \cup T$. Training monolingual word embeddings for each language separately yields mutually non-associated embedding spaces, which do not necessarily contain similar embeddings for the same concept across languages (e.g., for the English word "cat" and the German word "Katze"). This is why we need to project the embedding spaces of different languages to a shared multilingual embedding space. To this end, we use the linear mapping model of Mikolov et al. (2013a): we learn a translation matrix $\mathbf{M} \in \mathbb{R}^{d_t \times d_s}$ (where $d_s$ is the size of the word embeddings of the source language and $d_t$ of the target language) that projects source-language embeddings into the embedding space of the target language. Given training word pairs with source-language embeddings $\mathbf{x}_i$ and target-language embeddings $\mathbf{z}_i$, $\mathbf{M}$ is obtained by minimizing the following objective:

$$\min_{\mathbf{M}} \sum_{i} \left\| \mathbf{M}\mathbf{x}_i - \mathbf{z}_i \right\|^2 .$$

The obtained matrix $\mathbf{M}$ can then be used to map the embedding of any word from the source language to the embedding space of the target language. To obtain a shared multilingual embedding space, we define the embedding space of one of the languages as the target embedding space and project the embedding spaces of all other languages to that space. We train one matrix $\mathbf{M}_{t,s}$ for each language $t \in T$ that we translate $KB_s$ into, and use it to project the embeddings of $KB_t$ entities into the same embedding space as that of the source language $s$.

Neural tensor networks for knowledge base completion. The NTNKBC model represents KB relations as tensors that bilinearly link KB entities, adding them to the linear associations between entities introduced by earlier models (Bordes et al., 2011). The NTN model assigns the following score to each KB triple $(e_1, r, e_2)$:

$$g(e_1, r, e_2) = \mathbf{u}_r^{\top} \tanh\!\left( \mathbf{e}_1^{\top} W_r^{[1:k]} \mathbf{e}_2 + V_r \left[ \mathbf{e}_1 ; \mathbf{e}_2 \right] + \mathbf{b}_r \right),$$

where $W_r^{[1:k]} \in \mathbb{R}^{d \times d \times k}$ is the relation-specific tensor for relation $r$ and $\mathbf{e}_1^{\top} W_r^{[1:k]} \mathbf{e}_2$ is the bilinear tensor product of entity embeddings $\mathbf{e}_1$ and $\mathbf{e}_2$, which results in a $k$-dimensional vector in which each element is computed using a different slice $W_r^i$ of the tensor $W_r^{[1:k]}$. The matrix $V_r \in \mathbb{R}^{k \times 2d}$ linearly links the entities, $\mathbf{b}_r \in \mathbb{R}^k$ is a bias vector, and $\mathbf{u}_r \in \mathbb{R}^k$ is the vector of output-layer weights. Relation-specific tensors allow for the multi-perspective modeling of KB relations, with each tensor slice capturing one aspect of the observed relation. For example, for the relation "part of", one slice might learn that animals have limbs (from triples like (arm, part of, person)), whereas another slice could capture that machines have mechanical parts (from examples like (engine, part of, car)).
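For illustration, the translation matrix objective can be solved in closed form with ordinary least squares (Mikolov et al. (2013a) use gradient descent, but the minimizer is the same); the sketch below is our own, with hypothetical function and variable names.

```python
import numpy as np

def learn_translation_matrix(X_src, Z_tgt):
    """Learn M minimizing sum_i ||M x_i - z_i||^2.

    X_src: (n, d_s) source-language embeddings of seed word pairs.
    Z_tgt: (n, d_t) embeddings of their target-language translations.
    Returns M with shape (d_t, d_s).
    """
    # np.linalg.lstsq solves X_src @ A ~= Z_tgt; A is then M transposed.
    A, *_ = np.linalg.lstsq(X_src, Z_tgt, rcond=None)
    return A.T

# Toy usage: recover a known 4x5 mapping from synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_M = rng.normal(size=(4, 5))
M = learn_translation_matrix(X, X @ true_M.T)
projected = M @ X[0]  # a source-space vector mapped into the target space
```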
Parameter values, including relation tensors and entity embeddings, are computed by minimizing a cost function $J(\Omega)$ that couples each correct triple $F_i = (e_1^i, r_i, e_2^i)$ with corrupt triples $F_c^i = (e_1^i, r_i, e_c)$ in which one entity is replaced with a random KB entity. Correct triples are expected to score higher than corrupt triples, which is imposed by a standard margin-based objective (i.e., a perfect model scores each correct triple at least 1 higher than any of its corresponding corrupt triples):

$$J(\Omega) = \sum_{i=1}^{N} \sum_{c=1}^{C} \max\!\left( 0,\; 1 - g(F_i) + g(F_c^i) \right) + \lambda \left\| \Omega \right\|_2^2 ,$$

where $\Omega = \{W, V, U, \mathbf{b}, E\}$ is the set of all parameters, $N$ is the size of the training set, $C$ is the number of corrupt triples per correct triple, and $\lambda$ is the regularization coefficient.
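The following numpy sketch, with parameter shapes taken from the definitions above, illustrates the NTN scoring function and a single hinge term of the margin objective; it is a minimal illustration of ours, not the authors' implementation.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """Score one triple under the relation-specific NTN parameters.

    e1, e2: entity embeddings of shape (d,)
    W: relation tensor of shape (k, d, d)
    V: linear layer of shape (k, 2d); b, u: vectors of shape (k,)
    """
    bilinear = np.einsum('i,kij,j->k', e1, W, e2)  # e1^T W^[1:k] e2, one value per slice
    linear = V @ np.concatenate([e1, e2])          # V_r [e1; e2]
    return float(u @ np.tanh(bilinear + linear + b))

def hinge_term(score_correct, score_corrupt):
    """One summand of J: a corrupt triple should score at least 1 lower."""
    return max(0.0, 1.0 - score_correct + score_corrupt)

# Toy usage with d = 3 embedding dimensions and k = 4 tensor slices.
rng = np.random.default_rng(1)
d, k = 3, 4
W, V = rng.normal(size=(k, d, d)), rng.normal(size=(k, 2 * d))
b, u = rng.normal(size=k), rng.normal(size=k)
s = ntn_score(rng.normal(size=d), rng.normal(size=d), W, V, b, u)
```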
Cross-lingual neural tensor network. We extend the NTNKBC with multilingual and cross-lingual KB projections. Our hunch is that triples lexicalized in different languages can provide complementary evidence for the existence of a semantic relation between entities (cf. Section 4). Let $KB_{t_i}$ be the translation of the initial knowledge base $KB_s$ from the source language $s$ into the target language $t_i$, $t_i \in \{t_1, \ldots, t_k\}$. Our new cross-lingual knowledge base (CLKB) then contains:
1. all triples from $KB_s$;
2. all monolingual triples from each of the translated KBs $KB_{t_i}$;
3. cross-lingual triples obtained from monolingual triples by replacing one of the entities with its corresponding entity in another language.
Formally, for each original triple $(e_1^s, r, e_2^s)$, CLKB contains $k$ additional monolingual triples $(e_1^{t_i}, r, e_2^{t_i})$ and $2\binom{k+1}{2}$ corresponding cross-lingual triples, $(e_1^{l_i}, r, e_2^{l_j})$ and $(e_1^{l_j}, r, e_2^{l_i})$, for each pair of languages $(l_i, l_j) \in L \times L$, $i \neq j$, where $L = \{s\} \cup T$. For example, from the English triple (football player, type of, athlete) and its corresponding German triple (Fußballspieler, type of, Sportler), we add the cross-lingual triples (Fußballspieler, type of, athlete) and (football player, type of, Sportler) to the augmented cross-lingual knowledge base.
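The augmentation can be sketched as follows; this is a minimal illustration of ours that assumes the per-language triple lists are aligned (the i-th entry of each list translates the same source triple) and tags entities with their language to keep them distinct.

```python
from itertools import combinations

def build_clkb(triples_by_lang):
    """Build the augmented CLKB from aligned per-language triple lists.

    triples_by_lang: dict lang -> list of (e1, r, e2); the i-th triple
    in every list is assumed to translate the same source triple.
    """
    langs = list(triples_by_lang)
    n = len(next(iter(triples_by_lang.values())))
    # Items 1 and 2: all monolingual triples from every language.
    clkb = [((e1, l), r, (e2, l))
            for l in langs for (e1, r, e2) in triples_by_lang[l]]
    # Item 3: two cross-lingual triples per unordered language pair,
    # i.e. 2 * C(k+1, 2) cross-lingual triples per original triple.
    for li, lj in combinations(langs, 2):
        for i in range(n):
            (a1, r, a2) = triples_by_lang[li][i]
            (b1, _, b2) = triples_by_lang[lj][i]
            clkb.append(((a1, li), r, (b2, lj)))
            clkb.append(((b1, lj), r, (a2, li)))
    return clkb

# Example reproducing the football player / Fussballspieler case:
clkb = build_clkb({
    "en": [("football player", "type of", "athlete")],
    "de": [("Fußballspieler", "type of", "Sportler")],
})
```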
Following the NTNKBC approach, we initialize the embeddings of multi-word KB entities by averaging the embeddings of their constituent words. Finally, we translate the monolingual embeddings of all CLKB entities (obtained from the respective monolingual word embeddings) to the shared embedding space and train the NTNKBC model on the CLKB triples.
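A minimal sketch of this initialization, assuming (for illustration only) that multi-word entities follow an underscore-joined naming convention:

```python
import numpy as np

def entity_embedding(entity, word_vectors, M=None):
    """Average the constituent word vectors of a (multi-word) entity,
    then optionally project the result into the shared space with M."""
    words = entity.split("_")  # e.g. "football_player" -> two words
    vec = np.mean([word_vectors[w] for w in words], axis=0)
    return vec if M is None else M @ vec
```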

Evaluation
In line with previous work, we evaluate our approach on the link prediction task, namely the binary classification task of predicting the correctness of a KB triple $(e_1, r, e_2)$, given entities $e_1$ and $e_2$ and a semantic relation $r$.

Experimental Setting
Dataset. We perform the evaluation on WordNet (Fellbaum, 1998), i.e., the WN11 dataset, following the same evaluation setting, i.e., the same train, development, and test split, as in the evaluation of the original NTNKBC model. We translate the WN11 dataset to German (WN11DE) and Italian (WN11IT) via multilingual BabelNet synsets. Because not all WN11 synsets have German and Italian counterparts in BabelNet (cf. Navigli and Ponzetto (2012a), who report a synset coverage of almost 70% for German and Italian), WN11DE and WN11IT are somewhat smaller than WN11. The sizes of the train, development, and test portions (in terms of the number of correct triples) are given in Table 1 for each of the three monolingual WN11 datasets.

Word embeddings. We used the WaCky corpora (Baroni et al., 2009), namely UkWaC, DeWaC, and ItWaC, to train English, German, and Italian embeddings, respectively. We built 100-dimensional embeddings using the CBOW algorithm (Mikolov et al., 2013b). We then mapped the German and Italian embeddings into the English embedding space by (1) translating the 1,100 most frequent English words (1,000 pairs for training and 100 for testing) to German and Italian using Google Translate and (2) training the respective German-to-English and Italian-to-English translation matrices. The quality of the obtained translations, measured in terms of P@1 and P@5 (i.e., the percentage of cases in which the correct translation was retrieved as the most similar or among the five most similar words in the other language), is shown in Table 2. The performance levels we obtain are comparable to the translation performance reported in the original work (Mikolov et al., 2013a).
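For illustration, P@n over the held-out test pairs can be computed with a simple nearest-neighbor retrieval; the sketch below is ours (function and argument names are hypothetical) and uses cosine similarity in the target space.

```python
import numpy as np

def precision_at_n(M, X_test, Z_vocab, gold_idx, n):
    """P@n: fraction of test words whose gold translation is among the
    n nearest (cosine) target-space neighbors of the projected source.

    M: (d_t, d_s) translation matrix; X_test: (m, d_s) test embeddings;
    Z_vocab: (V, d_t) target vocabulary embeddings; gold_idx: for each
    test word, the index of its gold translation in Z_vocab.
    """
    proj = X_test @ M.T                                  # project sources
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    Z = Z_vocab / np.linalg.norm(Z_vocab, axis=1, keepdims=True)
    top_n = np.argsort(-(proj @ Z.T), axis=1)[:, :n]     # n best per word
    return float(np.mean([gold_idx[i] in top_n[i]
                          for i in range(len(X_test))]))
```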
Model configuration. The augmented CLKB contains a total of 846K triples (296K monolingual and 550K cross-lingual). Following the original NTNKBC configuration, we set the number of tensor slices to k = 4 and the corruption rate (i.e., the number of corrupt triples per correct triple) to C = 10. We also optimize the NTNKBC parameters with the minibatched L-BFGS algorithm, with minibatches of N = 20,000 triples. We use the development portion of the WN11 dataset to optimize the model hyperparameters, namely the prediction thresholds for each of the 11 relation types.
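The per-relation threshold tuning can be sketched as a search over observed development scores; the following is our own minimal illustration, not necessarily the authors' exact procedure.

```python
import numpy as np

def tune_thresholds(scores, labels, relations):
    """Per relation, pick the score threshold maximizing dev accuracy.

    scores, labels, relations: parallel arrays of NTN scores, gold
    True/False labels, and relation names for the development triples.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    relations = np.asarray(relations)
    thresholds = {}
    for rel in np.unique(relations):
        mask = relations == rel
        s, y = scores[mask], labels[mask]
        candidates = np.unique(s)  # only observed scores can change accuracy
        accs = [np.mean((s >= t) == y) for t in candidates]
        thresholds[rel] = float(candidates[int(np.argmax(accs))])
    return thresholds
```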

Results and Discussion
The link prediction performance of all models, measured on the test portion of the original WN11 dataset (containing English triples), is shown in Table 3. We compare three monolingual models (Mono-EN, Mono-DE, and Mono-IT, trained on WN11, WN11DE, and WN11IT, respectively), a multilingual model (ML-NTN, trained on the union of the three monolingual KBs), and the cross-lingual model (CL-NTN, trained on the full CLKB). Mono-EN achieves an accuracy of 85.8%, which is very close to the 86.2% accuracy reported for the original NTNKBC model. The monolingual English model Mono-EN significantly (p < 0.01) outperforms the other two monolingual models. We credit this performance gap to the significantly larger training set (38.7K entities and 112.5K triples vs. 33.4K entities and 92K triples for both German and Italian). The Italian monolingual model (Mono-IT) outperforms the German monolingual model (Mono-DE) despite comparable training set sizes, which we credit to the lower quality of the DE→EN translation matrix compared with the IT→EN translation matrix (see Table 2).
The multilingual model (ML-NTN) outperforms only one of the three monolingual models. This is not so surprising (although it might seem so at first glance) if we consider that ML-NTN merely combines three disjoint KBs, which share semantic information only through the shared embedding space and relation tensors. Without direct cross-lingual links between entities of different monolingual KBs, these signals seem insufficient to compensate for the much larger number of parameters (three times as many entities) that the ML-NTN model has to learn compared to the monolingual models.
The cross-lingual model (CL-NTN), on the other hand, significantly outperforms all monolingual models. We believe this is because adding cross-lingual triples introduces additional regularization to the model: although cross-lingual triples describe the same facts as monolingual triples (i.e., the same relations between the same entities), the facts are represented slightly differently due to imperfect embedding translation and inherent differences between languages. We believe this effect is similar to adding noise when training denoising autoencoders (Vincent et al., 2008) in order to obtain more robust entity representations. We believe that the addition of German and Italian monolingual triples has the same regularizing effect as the addition of cross-lingual triples, but their number is significantly smaller (184K compared to 550K cross-lingual triples) and alone they do not compensate for the increased model complexity (i.e., three times as many entity vectors to be learned).

Conclusion
We presented a cross-lingual extension of the NTNKBC model that leverages a multilingual knowledge graph and a multilingual embedding space. Our results indicate that using cross-lingual links between entity lexicalizations in different languages yields a better NTNKBC model. That is, our experiments imply that the cross-lingual signal, enabled through the multilingual KB and the shared multilingual embedding space, provides improved regularization for the neural KBC model. We intend to investigate whether such cross-lingual regularization can yield similar improvements for other neural KBC models and whether it can be combined with other types of regularization, such as that based on augmenting KB paths (Guu et al., 2015). We will also evaluate cross-lingually extended KB-embedding models on other high-level tasks, such as error detection and KB consistency checking.