Abstract Graphs and Abstract Paths for Knowledge Graph Completion

Knowledge graphs, which provide numerous facts in a machine-friendly format, are incomplete. Information that we induce from such graphs (e.g. entity embeddings, relation representations or patterns) will be affected by the imbalance in the information captured in the graph, which biases representations or causes us to miss potential patterns. To partially compensate for this situation we describe a method for representing knowledge graphs that captures an intensional representation of the original extensional information. This representation is very compact, and it abstracts away from individual links, allowing us to find better path candidates, as shown by the results of link prediction using this information.


Introduction
Knowledge graphs have become a very useful framework to organize and store knowledge. Their interconnected nature is not just a natural way to represent facts; it also has potential that the separate storage of facts does not have: (i) we can use it as a relational model of meaning, and jointly derive representations for nodes (entities) and edges (relations); (ii) we can explore the structure to discover systematic patterns that reveal interesting and exploitable regularities, such as paths connecting nodes in direct relations; (iii) we can discover and induce new connections.
Link prediction methods in knowledge graphs (see Nickel et al. (2016) for an overview) predict additional edges in the graph, based on induced node and edge representations that encode the structure of the graph and thus capture regularities (such as homophily). Lao and Cohen (2010) introduced a new method that predicts direct links based on paths that connect the source and target nodes. Such paths are useful not only for link prediction (Lao et al., 2011; Gardner et al., 2014), but also for finding explanations for direct links and for helping with targeted information extraction to fill in incomplete knowledge repositories (Yin et al., 2018; Zhou and Nastase, 2018).
These approaches rely on the structure of the knowledge graph, which is inherently incomplete. This incompleteness can affect the process in different ways: for example, it leads to uninformative representations for nodes with few connections, and it can cause relevant patterns/paths to be missed (or misleading patterns/paths to be derived).
In this paper we investigate whether a higher-level view of a graph (an abstract graph that captures an intensional view of the original extensional graph) can help derive more robust and informative patterns. Such patterns are paths (i.e. sequences of relations) that could be used not only for link prediction, but also for targeted information extraction for completing the graph with external information. This abstract graph will contain only one edge for each relation type, which will connect a node representing the relation's domain (or source) to a node representing its range (or target). Additional edges will link the nodes to capture set relations (intersection, subset, superset) between the different relations' domains and ranges. This step drastically reduces the graph size, making many different graph processing approaches more tractable. We investigate whether in this graph, which represents a more general version of the information in the original KG, good patterns/paths are stronger and easier to find, because the aggregated view compensates for individual missing edges throughout the graph.
We test the extracted paths through the link prediction task on Freebase (Bollacker et al., 2008) and NELL (Carlson et al., 2010a), using the experimental set-up of Gardner et al. (2014): pairs of nodes are represented using their connecting paths as features, and a model for predicting the direct relations is learned and tested on training and test sets for 24 relations in Freebase and 10 relations in NELL. Our analysis shows that we find different and much fewer paths than the PRA method does (mostly because the abstract paths do not contain back-and-forth sequences of generalizing or type relations). The paths found in the abstract graphs lead to better performance on NELL than the PRA paths, which could be explained by the fact that NELL's relation inventory was designed to capture interdependencies (Carlson et al., 2010a).
On Freebase the results we obtain are lower, but this could be due to a different negative sampling process. Inspection of the paths produced reveals that they seem to capture legitimate dependencies.

Related Work
Representing facts in a knowledge graph has multiple advantages: (i) it provides knowledge in an easily accessible and machine-friendly format; (ii) it facilitates various ways of encoding this information and deriving representations for nodes and edges that reflect their connectivity in the graph; (iii) it allows for the discovery of connectivity patterns, and possibly more.
In recent years, projecting the knowledge graph into an n-dimensional vector space, or learning embeddings for predicting missing facts, has attracted a lot of interest. Embedding models aim to map entities, relations and triples to a vector space such that additional facts can be inferred from known facts using notions of vector similarity. A class of embedding models that aim to factorize the graph are termed latent factor models. Neural network based models such as ER-MLP (Dong et al., 2014), NTN (Socher et al., 2013), RNNs (Neelakantan et al., 2015; Das et al., 2016) and Graph CNNs (Schlichtkrull et al., 2018) are examples of embedding models, while RESCAL (Nickel et al., 2012), DistMult (Yang et al., 2015), TransE (Bordes et al., 2013) and ComplEx (Trouillon et al., 2017) are examples of latent factor models.
Lao and Cohen (2010) introduced a novel way to exploit information in knowledge graphs: using weighted extracted paths as features in four different recommendation tasks, which can be modeled as typed proximity queries. The idea of using paths in the graph has then been applied to the task of link prediction (Lao et al., 2011), and extended to incorporate textual information (Gardner et al., 2014). Lao et al. (2011) obtain paths for given node pairs using random walks over the knowledge graph. To be used as features shared by multiple instances, the information about nodes on the paths is removed, transforming the actual paths into "meta-paths".
The paths themselves can be incorporated in different ways in a model: as features (Lao et al., 2011; Gardner et al., 2014), or as Horn clauses to provide rules for inference in KGs, whether directly or through scores that represent the strength of the path as a direct relation (Neelakantan et al., 2015; Guu et al., 2015), also taking into account information about intermediary nodes (Das et al., 2017; Yin et al., 2018). Gardner and Mitchell (2015) perform link prediction using random walks, but rather than attempting to connect a source and target node, they characterize the local structure around a (source or target) node using such localized paths. Using these subgraph features leads to better results for the knowledge graph completion task.
We focus here on discovering useful and explanatory paths, not on optimizing or further improving the KGC task. Using paths can lead to interpretable models because the paths can help explain the predicted fact. Meng et al. (2015) present a method to automate the induction of metapaths in large heterogeneous information networks (a.k.a. knowledge graphs) for given node pairs, even if the given node pairs are not connected by a direct relation.
Path information is also found to improve performance since paths help the model learn logical rules. However, mining paths from a large knowledge graph is often computationally expensive since it involves performing a traversal through the graph. To overcome this limitation Das et al. (2017) proposed deep reinforcement learning and Chen et al. (2018) proposed RNNs for generating paths. However, many datasets suffer from path sparsity (a lack of enough paths connecting source-target pairs), resulting in poor performance for many relations. Wang et al. (2013) have a different approach: they start with patterns in the form of first-order probabilistic rules, which they then ground in a small subgraph of a large knowledge graph.
The approach we present here combines different elements of these previous approaches in a novel way: we build an abstract graph to find patterns that would be similar to those used by Wang et al. (2013). To test the quality of these paths we ground them using the original KG and use these grounded paths in a learning framework similar to that of Gardner et al. (2014).

Abstract Graphs and Abstract Paths
Knowledge graphs are incomplete in an imbalanced way. Figures 1a-1b show how much the relation and node frequencies for Freebase 15k and NELL vary, and that numerous nodes and edges have very low frequency (each data point corresponds to a node/relation, and the value is the degree of the node/frequency of the relation respectively). Freebase and NELL have a helpful characteristic: they have strongly typed relations, i.e. the source and target of a relation have a very specific type. NELL, for example, has relations such as ActorStarredinMovie and StateHasLake, and Freebase has /film/film/rating and /book/literary_series/author, whose arguments have types such as Person, Movie, State, etc.
Previous work has shown that using node type information (provided in Freebase through the domain and range types for each relation) can help optimize computation for link prediction by filtering the entity matrix for each relation based on the relation's domain and range types (Chang et al., 2014), improve prediction by adding a factor in the loss function that accounts for the type of the entities involved in a relation (Kotnis and Nastase, 2017), or improve predictions based on paths in the graph by using the types of intermediary entities (Yin et al., 2018).
Entity types and the types of the domain and range of a relation have thus proven useful for improving link prediction models. We investigate here the hypothesis that, by relying on the strong constraints on the arguments of relations in Freebase, we can build an intensional graph of the knowledge repository that is smaller and thus easier to analyze than the full KG. We also hypothesize that at this abstract level we can induce better patterns/paths that are indicative of direct relations, because individual missing relation instances will not obfuscate useful patterns. We verify whether these patterns are good by testing their usefulness for link prediction. Finding high-quality patterns would have additional benefits, as they could be used to explain direct relations, and to fill in the KG through targeted information extraction (Zhou and Nastase, 2018).

Abstract graphs
A knowledge graph (KG) is an extensional representation of a relation schema, where each instance of a relation type r corresponds to an edge connecting two nodes, a source s and a target t, usually represented as a triple < s, r, t >. We replace this representation with an intensional representation, where we have only one edge for each relation type, and draw additional edges to capture set relations (intersection, subset, superset) between the domains and ranges of the original graph's relations. These edges are weighted with the size of the overlap between the sets. Formally, we build the abstract graph $KG_A$ in which the source node of relation $r_i$ is the set of source nodes (the domain) of $r_i$ in KG, $V_{i,s} = \{ s \mid \langle s, r_i, t \rangle \in KG \}$, and the target node of relation $r_i$ is the set of target nodes (the range) of $r_i$ in KG, $V_{i,t} = \{ t \mid \langle s, r_i, t \rangle \in KG \}$. The weight of a set relation between two nodes of $KG_A$ quantifies the overlap between the two sets.
Figure 1: Knowledge graph statistics on a logarithmic scale: relation and node frequencies for Freebase and NELL (the version used by Gardner et al. (2014) and in this paper). Every data point is the degree of a node (top plots), or the frequency of a relation (bottom plots). The data points are ordered monotonically; the x axis is just an index.
Building such a graph makes sense only for knowledge repositories that have strongly typed relations (like Freebase and NELL), but we do not require knowledge of the types of the relations' domains and ranges. Such information is not fine-grained enough: for example, the relation capital has the type City as a domain, but capital cities are a very small subset of the set of all cities. Using an "atomic" node to represent the domain/range of a relation would not allow us to make finer-grained connections and distinctions between the domains and ranges of the existing relations. Figure 2 shows a subset of the abstract graph built from the Freebase dataset. The blue edges are set relations (intersection, superset, subset) between the domains and ranges of a subset of the relations in the dataset. The black edges correspond to the actual relations in the dataset.
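The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_abstract_graph`, the `(relation, "s"/"t")` node keys and the treatment of equal sets as "subset" are assumptions made for illustration. The edge weight is the raw overlap size, following the statement that set-relation edges are weighted with the size of the overlap.

```python
from collections import defaultdict
from itertools import product


def build_abstract_graph(triples):
    """Build a toy abstract graph from (source, relation, target) triples."""
    domain = defaultdict(set)
    range_ = defaultdict(set)
    for s, r, t in triples:
        domain[r].add(s)
        range_[r].add(t)

    # One abstract node per relation domain and one per relation range.
    nodes = {}
    for r in domain:
        nodes[(r, "s")] = domain[r]
        nodes[(r, "t")] = range_[r]

    # One "proper" edge per relation type, from its domain to its range.
    relation_edges = [((r, "s"), r, (r, "t")) for r in sorted(domain)]

    # Set-relation edges between distinct nodes whose sets overlap,
    # weighted by the raw overlap size (assumption: no normalization).
    set_edges = []
    for a, b in product(nodes, repeat=2):
        if a == b:
            continue
        overlap = len(nodes[a] & nodes[b])
        if overlap == 0:
            continue
        if nodes[a] <= nodes[b]:      # equal sets are labeled "subset" here
            kind = "subset"
        elif nodes[a] >= nodes[b]:
            kind = "superset"
        else:
            kind = "intersection"
        set_edges.append((a, kind, b, overlap))
    return nodes, relation_edges, set_edges
```

The pairwise loop over nodes makes the quadratic cost in the number of relation types explicit, which is why a KG with very many relation types is expensive to abstract.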

Abstract paths
The Path Ranking Algorithm formalism originally proposed by Lao and Cohen (2010) performs two main steps to represent a pair of nodes in a graph: (i) feature selection: adding paths that connect the node pair; (ii) feature computation: associating a value with each added path.
Table 1: Graph statistics on the datasets used by Gardner et al. (2014), and their abstract versions.
Obtaining paths from a large graph is a computationally intensive problem, particularly in graphs that have numerous nodes with high degrees. Figure 1a shows that about 60% of Freebase nodes have degree higher than 10, which leads to an exponential growth in the number of paths starting in a node. Algorithms that harness path information often mine paths either by performing costly random walks (Guu et al., 2015) or traversals (Gardner et al., 2014; Neelakantan et al., 2015; Das et al., 2016), or by constructing paths through generative models (Das et al., 2017; Ding et al., 2018). Here, we adopt a different approach: we abstract the graph first, then find paths in this graph through traversal algorithms.
For a relation $r_i$, we start at its domain (source) node $V_{i,s}$ and search for a path to its range (target) node $V_{i,t}$ using breadth-first search. We constrain this path to contain at most k "proper" relations, and we do not allow consecutive set relations, thus forcing the algorithm to move from one "proper" relation to another through a set relation that connects the range of one with the domain of the next. An abstract path, just like a meta-path extracted by previous work, is a sequence of relation types $\pi_j = \langle r_{j,1}, r_{j,2}, \ldots, r_{j,m} \rangle$, some of which are "proper" relations and some of which are set relations.
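The search just described can be sketched as a breadth-first enumeration, shown below in Python. This is an illustrative sketch, not the authors' code: the function name `abstract_paths`, the `(relation, "s"/"t")` node keys and the edge tuple layouts are hypothetical, and skipping the target relation's own edge (so that the trivial one-step path is not returned) is an assumption.

```python
from collections import deque


def abstract_paths(relation_edges, set_edges, target_rel, k=5):
    """Enumerate abstract paths from the domain to the range node of target_rel.

    relation_edges: iterable of (src_node, relation, dst_node)
    set_edges:      iterable of (src_node, kind, dst_node)
    """
    out = {}
    for src, rel, dst in relation_edges:
        if rel != target_rel:  # assumption: the direct relation is excluded
            out.setdefault(src, []).append((rel, dst, True))
    for src, kind, dst in set_edges:
        out.setdefault(src, []).append((kind, dst, False))

    start, goal = (target_rel, "s"), (target_rel, "t")
    # Queue items: (node, path so far, number of proper relations, last step was a set relation)
    queue = deque([(start, (), 0, False)])
    found = []
    while queue:
        node, path, n_proper, last_set = queue.popleft()
        if node == goal and n_proper > 0:
            found.append(path)
            continue
        for label, nxt, proper in out.get(node, []):
            if proper and n_proper == k:
                continue  # would exceed the bound on "proper" relations
            if not proper and last_set:
                continue  # two consecutive set relations are not allowed
            queue.append((nxt, path + (label,), n_proper + proper, not proper))
    return found
```

Because every step either consumes one of the k "proper" relations or is a set relation that cannot be followed by another set relation, no path is longer than 2k + 1 steps, so the enumeration terminates even if the abstract graph has cycles.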
Because of the more general view of the graph, we lose information about individual paths (i.e. instances of a path in the original graph). As a result, the paths we extract are hypothetical, but each has an associated confidence score based on the frequency of occurrence of relations in the original KG, and on the strength of the connection between the range of one relation on the path and the domain of the next one. The weight of an abstract path $\pi_j$ is computed from the weights $w(r_{j,i})$ of its individual relations, where $w(r_{j,i})$ is defined differently depending on whether $r_{j,i}$ is a "proper" relation or a set relation. We use this weight to rank abstract paths for potential filtering, and to compute the weight of a path's grounding for specific node pairs. In our experiments we used k = 5.

Grounded paths
The abstract paths are hypothetical paths that could connect the source s and target t of a < s, r, t > triple. They can be used in different ways, e.g. (i) as features in a link prediction system (as in Gardner et al. (2014)), or (ii) to fill in larger portions of the graph by producing, rather than finding, groundings of the path for specific instances.
In the work presented here we test the abstract paths through the link prediction task, so we will try to ground abstract paths for relation instances in the training and test data. After finding the set of abstract paths $\{\pi_{i,r}\}$ associated with a relation r, for a given instance < s, r, t > of the relation r we can (try to) ground the paths as follows: (i) we first eliminate set relations from the abstract paths: at this point set relations between relation types' domains and ranges are not useful (they were necessary only for the connectivity and search process in the abstract graph). Set relations have no counterpart in the extensional graph, as at this level the nodes themselves make the connection between successive relations. (ii) Starting at the source node, we again follow a breadth-first traversal, constraining at each step the type of relation to follow based on the "cleaned up" abstract path.
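Steps (i) and (ii) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the name `ground_path` and the `SET_RELATIONS` labels are hypothetical, relations are followed only in their forward direction (inverse relations are not handled), and no weights are computed.

```python
from collections import defaultdict

# Assumed labels for set relations in an abstract path.
SET_RELATIONS = {"intersection", "subset", "superset"}


def ground_path(triples, abstract_path, source, target):
    """Return all groundings of abstract_path that lead from source to target."""
    # Step (i): drop set relations, keeping only "proper" relation types.
    relations = [r for r in abstract_path if r not in SET_RELATIONS]

    # Index the extensional graph: (node, relation) -> neighbouring nodes.
    neighbours = defaultdict(set)
    for s, r, t in triples:
        neighbours[(s, r)].add(t)

    # Step (ii): level-by-level (breadth-first) expansion from the source,
    # following exactly the relation type required at each step.
    frontier = [(source,)]
    for rel in relations:
        frontier = [
            walk + (rel, nxt)
            for walk in frontier
            for nxt in sorted(neighbours[(walk[-1], rel)])
        ]
    return [walk for walk in frontier if walk[-1] == target]
```

Each returned grounding alternates nodes and relation types, matching the grounded-path form < v 0 , r, v 1 , ..., v l > with v 0 = s and v l = t; an empty list means the abstract path has no grounding for that node pair.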
We compute the weight of a grounded path $gp = \langle v_0, r_{x_1}, v_1, \ldots, v_{l-1}, r_{x_l}, v_l \rangle$ (where $v_0 = s$ and $v_l = t$) as a combination of the weight of the corresponding abstract path $\pi = \langle r_1, \ldots, r_m \rangle$ (with $r_{x_i} \in \pi$) and information specific to the current node pair (s, t): the weights of the relations on the grounded path, which reflect the specificity of each relation to its source node.

Experiments
Because we want to compare the abstract paths found using the abstract graph with paths found using PRA, we use the experimental set-up of Gardner et al. (2014), where we replace the feature selection and feature computation steps with the approach presented here. A big difference will be caused by the negative sampling, which also makes the results not directly comparable. The issues are explained in the negative sampling paragraph below. The data thus obtained is used for training a linear regression model (similarly to Gardner et al. (2014)), which is tested on the provided test sets and evaluated using mean average precision (MAP).
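For reference, the standard definition of the evaluation metric can be sketched in a few lines of Python. The function names are illustrative, and what constitutes a single ranking (for example, one ranked candidate list per query) is an assumption here, since the set-up inherited from Gardner et al. (2014) is not detailed in this section.

```python
def average_precision(ranked_labels):
    """Average precision for one ranking; ranked_labels lists 1/0 (positive/negative)
    in decreasing order of model score."""
    hits = 0
    total = 0.0
    for rank, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over a non-empty list of rankings."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

Because precision is accumulated only at the ranks of positive instances, the number and difficulty of the negative samples directly affect the score, which is one reason results obtained with different negative sampling schemes are hard to compare.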

Data
We build abstract graphs and paths from the Freebase and NELL data described in Gardner et al. (2014). We then use the extracted paths for link prediction. The graphs built by Gardner et al. (2014) cover several variations, where the KGs were enhanced with < subject, verb, object > (SVO) triples extracted from dependency parses of ClueWeb documents. Table 1 shows the statistics for each original and abstract graph. The generated abstract graph is several orders of magnitude smaller than the original KG.
The abstract graph approach we present here is not well suited to the combination of the knowledge base (Freebase or NELL) with unstructured SVO triples, because we rely on strongly typed relations to build node sets. The SVO triples bring in numerous low-frequency relations that, without additional processing, are not beneficial. The results presented by Gardner et al. (2014) show that this configuration very rarely (and never overall) leads to better results than the other graph variations. The numerous relation types brought in by the SVO triples also lead to high computation time for the abstract graph: the bottleneck is the computation of set relations between the different relations' domains and ranges, which grows quadratically with the number of relation types. We therefore skip this graph variation in the rest of the experiments presented here.
Gardner et al. (2014) use these graphs to generate paths for augmenting the representation of node pairs, for link prediction, for a subset of 24 relation types from Freebase's inventory, and 10 relations from NELL. Each relation has a training and a test set, whose sizes vary quite a bit, as shown through the statistics in Table 2.
Negative sampling. The number of negative instances used by Gardner et al. (2014) is not clearly stated. Both the number of negative samples and the method of generating them can impact the results (Kotnis and Nastase, 2018). We use (up to) 200 negative samples for each positive pair: for a pair (s, t) in the provided training or test sets for each relation r, we make 100 negative samples by corrupting the source s, and 100 negative samples by corrupting the target t. The corrupted s and t are chosen from r's domain $V_{r,s}$ and range $V_{r,t}$ respectively, such that these corrupted triples are not part of the training data, the test data or the graph. If 100 such instances do not exist, we extract as many as possible.
$Neg(s, r, t) = \{ (s', r, t) \mid s' \in V_{r,s}, (s', r, t) \notin E \} \cup \{ (s, r, t') \mid t' \in V_{r,t}, (s, r, t') \notin E \}$
Because the relations are strongly typed, producing negative instances by corrupting the source/target nodes from the relation's domain and range leads to difficult negative instances. Instances with source and target nodes that do not match the argument types of the direct relation we want to predict can be filtered out before the link prediction.
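The sampling scheme can be sketched in Python as below. This is an illustration under assumptions, not the authors' code: the function name `negative_samples` and its parameters are hypothetical, `known` stands for the union of the training, test and graph triples, and uniform random selection of the corrupted nodes is an assumption, since the selection strategy is not specified.

```python
import random


def negative_samples(s, r, t, domain, range_, known, per_side=100, seed=0):
    """Corrupt the source from r's domain and the target from r's range,
    keeping only triples that are not in `known`."""
    rng = random.Random(seed)
    src_candidates = sorted(x for x in domain if x != s and (x, r, t) not in known)
    tgt_candidates = sorted(x for x in range_ if x != t and (s, r, x) not in known)
    # Take up to `per_side` samples on each side, or as many as exist.
    corrupted_sources = rng.sample(src_candidates, min(per_side, len(src_candidates)))
    corrupted_targets = rng.sample(tgt_candidates, min(per_side, len(tgt_candidates)))
    return ([(x, r, t) for x in corrupted_sources]
            + [(s, r, x) for x in corrupted_targets])
```

Drawing the corrupted nodes from the relation's own domain and range, rather than from all entities, is what makes these negatives type-consistent and therefore hard.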
Representing instances. For each of these 24 Freebase and 10 NELL relations we mine paths in the abstract graph using depth first traversal. An example of an abstract path found for the NELL relation StadiumLocatedInCity is shown in Figure 3. Each of the 24 Freebase and 10 NELL relations has a set of training and test examples. After building abstract paths, for each instance < s, r, t > in these datasets we ground the corresponding abstract paths as described in Section 3.3. For each relation type, the number of features representing the corresponding data is twice the number of abstract paths. We produce two features for each abstract path: one that is the weight of this path, and one that is the weight of its grounding for a given relation instance. If a relation instance does not have a grounding for an abstract path, the values of these features will be 0.
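The feature layout described above can be sketched as follows. The function name `instance_features` and the dictionary-based inputs are hypothetical conveniences; the weights themselves are taken as given inputs rather than computed.

```python
def instance_features(abstract_paths, path_weight, grounding_weight):
    """Build the feature vector for one relation instance.

    abstract_paths:   ordered list of abstract paths for the relation
    path_weight:      dict mapping each abstract path to its weight
    grounding_weight: dict mapping a path to the weight of its grounding for
                      this instance; paths without a grounding are absent
    """
    features = []
    for path in abstract_paths:
        if path in grounding_weight:
            features.extend([path_weight[path], grounding_weight[path]])
        else:
            features.extend([0.0, 0.0])  # no grounding: both features are 0
    return features
```

Keeping the abstract paths in a fixed order ensures that every instance of a relation yields a vector of the same length, twice the number of abstract paths.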

Results and discussion
The overall results of the experiments are presented in Table 3.
Table 3: Results on the three graph variations of Freebase and NELL as reported by Gardner et al. (2014) (G) and using abstract graphs ($KG_A$).
Overall, the results indicate that enhancing Freebase and NELL with additional facts from textual sources leads to better results, particularly when these additional facts (< subject, verb, object > triples) are processed and clustered using low-dimensional dense representations (Gardner et al. (2014) use embeddings obtained by running PCA on the matrix of SVO triples).
Freebase has 4200+ relation types, and NELL 500+. More than 500 relation types in Freebase have fewer than 10 instances, whereas NELL does not have this issue (see Figures 1a and 1b). Because we test the approach for knowledge graph completion using classification based on the patterns as features, having features that appear too few times will not help the system find a robust model. For the purpose of the presented experiments we filter the Freebase abstract graph to use only relation types that have at least 10 instances (Table 1 shows the statistics for this configuration).
Table 4: Relation results for the NELL KB. The second column is the best result for each relation reported by Gardner et al. (2014).
It is not surprising that overall the results for NELL are higher: NELL has been designed on the principle of coupled learning, where connections between different relations are the basis of the resource and its continuous growth (Carlson et al., 2010b). It also has more training data for each relation (see the table in Section 4.1). There is no consistent trend: for some relations using the paths extracted with this approach leads to better results, for others it does not (although, as mentioned above, the results are not directly comparable because we used different negative sampling methods).
A more complete picture emerges when we look at the paths found, and compare them with the paths obtained with the PRA approach. For all Freebase KG configurations, Gardner et al. (2014) have 1000 paths for most relations (approx. 6 of the relations have between 230 and 973). For NELL the number varies more, between 58 and 5509, and 6 of the relations have more than 1000 meta-paths. With the abstract graphs the numbers are much lower. For Freebase we find between 1 and 258 abstract paths, most of the relations (21) having fewer than 30 abstract paths for all KG configurations. For NELL we find between 1 and 157 paths, 5 of the relations having more than 100 abstract paths. The overlap between the sets of paths discovered with the two methods is very small: for Freebase the average overlap with respect to PRA is around 0.004 (for the different graph configurations), and with respect to the abstract paths around 0.2; for NELL it is around 0.003 relative to PRA and 0.27 relative to the abstract paths.
Table 5: Statistics of the number of instances in the training and testing sets for the relations analyzed, and the number of paths extracted for each set (in parentheses the number of abstract paths for each graph).
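The two overlap figures differ only in their denominator, as the small Python sketch below illustrates. The function name `path_overlap` is hypothetical, and how paths are matched across methods (for instance, after removing set relations) and averaged over relations is not specified here, so this only shows the per-relation ratio under the assumption that paths are compared as exact sequences.

```python
def path_overlap(pra_paths, abstract_paths):
    """Return (overlap relative to PRA, overlap relative to abstract paths).

    Both inputs are non-empty collections of hashable paths, e.g. tuples of
    relation names.
    """
    pra, abstract = set(pra_paths), set(abstract_paths)
    shared = pra & abstract
    return len(shared) / len(pra), len(shared) / len(abstract)
```

With a hypothetical relation that has 1000 PRA paths and 10 abstract paths, 2 of which are shared, the ratios are 0.002 and 0.2: the same shared set looks negligible from the PRA side and substantial from the abstract side simply because the abstract path set is so much smaller.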
We note that overall, the system found more paths than what could be grounded for the given training instances for both Freebase and NELL. Another general observation is that relations for which we found the most patterns (AthletePlaysForTeam and StateHasLake for NELL, /medicine/disease/symptoms and /film/film/rating for Freebase) do not necessarily perform the best.
NELL. The results for each relation in terms of average precision are presented in Table 4. We include the best PRA result (on any variation of the graph), as reported by Gardner et al. (2014), although since we used different negative instances the results are not directly comparable. Several of the NELL target relations have interesting patterns in the abstract graph, in particular StadiumLocatedInCity and TeamPlaysInLeague. In several cases, the algorithm has discovered "parallel" relations. For the relation WriterWroteBook, the most useful feature is the relation AgentCreated, which connects many of the source-target pairs in the WriterWroteBook relation. We found a similar situation with the relation JournalistWritesForPublication, which has WorksFor paralleling it in the graph.
Looking at specific relations, the paths extracted from the abstract graph are more focused. An example of this is the relation StadiumLocatedInCity. Numerous paths detected by PRA, including the highest-frequency ones, seem irrelevant. The paths found in the abstract graph, as the example in Figure 3 shows, seem to capture more informative relation interdependencies.
Our system does not always find high-quality patterns. It also finds surprising and most probably idiosyncratic patterns. In particular, for the StateHasLake relation, some very unexpected paths among those found had groundings for the given training data:
Agric.Prod.GrowingInStateOrProv.$^{-1}$ → Agric.Prod.GrowingInStateOrProv. → StateHasLake
MaleMovedToStateOrProv.$^{-1}$ → MaleMovedToStateOrProv. → StateHasLake
While the first rule could be justified (having lakes may favour the growing of certain types of agricultural products), the second one seems completely accidental. With a stronger filtering method based on the computed path scores we could eliminate some of these false patterns.
Freebase. The fine-grained results for Freebase, in terms of average precision, are presented in Table 5. We make the same observation as for NELL: for several relations, the paths obtained from the abstract graph are different and more focused than the PRA ones. For the relation /film/film/rating, for which the PRA approach gives very high results while the abstract graph has lower scores, some of the highest scoring paths found by PRA are presented in Table 6. For comparison we also include the highest rated paths obtained using the abstract graph. While some of these paths were also found by PRA, they are much lower in its list of extracted paths. The highest weighted paths found in the abstract graph connect specific properties of films with their rating.
An archive containing the abstract graphs, the abstract paths, the train/test data, negative samples and the groundings of the abstract paths for these relations for the variations of Freebase and NELL presented here is available from the University of Heidelberg (https://www.cl.uni-heidelberg.de/english/research/downloads/resource_pages/AbstractGraphs/AbstractGraphs.shtml).

Conclusions
We proposed and evaluated a method for obtaining paths from large knowledge graphs by compressing them into their intensional versions. We relied on the fact that these graphs have strongly typed relations, such that their domains and ranges consist of homogeneous sets that overlap only with the domains and ranges of a small number of other relations. This compression step leads to a smaller graph to work with, where we found paths that seem to capture high-quality patterns in the data. The results on link prediction on Freebase and NELL show the advantage of using such paths for some of the relations, but the task does not showcase the full potential of this representation. Further work will explore the potential of such patterns as explanatory links between directly connected nodes, or as a source of additional patterns for filling in the knowledge graphs not only with missing links, but also with missing nodes, either by predicting intermediate nodes or by using the paths as patterns for targeted information extraction.