NeuInfer: Knowledge Inference on N-ary Facts

Knowledge inference on knowledge graphs, which aims to discover implicit valid facts, has attracted extensive attention and is very helpful for improving the performance of many downstream applications. However, researchers have mainly focused on knowledge inference over binary facts. Studies on n-ary facts are relatively scarce, although such facts are also ubiquitous in the real world. This paper therefore addresses knowledge inference on n-ary facts. We represent each n-ary fact as a primary triple coupled with a set of its auxiliary descriptive attribute-value pair(s). We further propose a neural network model, NeuInfer, for knowledge inference on n-ary facts. Besides handling the common task of inferring an unknown element in a whole fact, NeuInfer can cope with a new type of task, flexible knowledge inference, which infers an unknown element in a partial fact consisting of the primary triple coupled with any number of its auxiliary description(s). Experimental results demonstrate the remarkable superiority of NeuInfer.


Introduction
With the introduction of implicit valid facts, knowledge inference on knowledge graphs improves the performance of many downstream applications, such as vertical search and question answering (Dong et al., 2015; Lukovnikov et al., 2017). Existing studies (Nickel et al., 2016; Wang et al., 2017) mainly focus on knowledge inference on binary facts, which connect two entities with a certain binary relation and are represented as triples (head entity, relation, tail entity). They attempt to infer the unknown head/tail entity or the unknown relation of a given binary fact. However, n-ary facts involving more than two entities are also ubiquitous. For example, in Freebase, more than 1/3 of the entities participate in n-ary facts (Wen et al., 2016). The fact that John Bardeen received the Nobel Prize in Physics in 1956 together with Walter Houser Brattain and William Shockley is a typical 5-ary fact. So far, only a few studies (Wen et al., 2016; Zhang et al., 2018; Guan et al., 2019) have tried to address knowledge inference on n-ary facts.
In existing studies for knowledge inference on n-ary facts, each n-ary fact is represented as a group of peer attributes and attribute values. In practice, each n-ary fact usually has a primary triple (the main focus of the n-ary fact), while the other attributes along with their corresponding attribute values are its auxiliary descriptions. Take the above 5-ary fact as an example: the primary triple is (John Bardeen, award-received, Nobel Prize in Physics), and the other attribute-value pairs, including point-in-time: 1956, together-with: Walter Houser Brattain, and together-with: William Shockley, are its auxiliary descriptions. Indeed, in YAGO (Suchanek et al., 2007) and Wikidata (Vrandečić and Krötzsch, 2014), a primary triple is identified for each n-ary fact.
The above 5-ary fact is a relatively complete example. In real-world scenarios, many n-ary facts appear only as partial ones, each consisting of a primary triple and a subset of its auxiliary description(s), due to incomplete knowledge acquisition. For example, (John Bardeen, award-received, Nobel Prize in Physics) with point-in-time: 1956, and the same triple with {together-with: Walter Houser Brattain, together-with: William Shockley}, are two typical partial facts corresponding to the above 5-ary fact. For differentiation, we call the relatively complete facts whole ones. We notice that existing studies on n-ary facts infer an unknown element in a well-defined whole fact and have not paid attention to knowledge inference on partial facts. In the following, we refer to the former as simple knowledge inference and to the latter as flexible knowledge inference.
With these considerations in mind, in this paper, by discriminating the information in the same n-ary fact, we propose a neural network model, called NeuInfer, to conduct both simple and flexible knowledge inference on n-ary facts. Our specific contributions are summarized as follows:
• We treat the information in the same n-ary fact discriminatingly and represent each n-ary fact as a primary triple coupled with a set of its auxiliary descriptive attribute-value pair(s).
• We propose a neural network model, NeuInfer, for knowledge inference on n-ary facts. NeuInfer can particularly handle the new type of task, flexible knowledge inference, which infers an unknown element in a partial fact consisting of a primary triple and any number of its auxiliary description(s).
• Experimental results validate the significant effectiveness and superiority of NeuInfer.
Related Work

Knowledge Inference on Binary Facts
Existing methods can be divided into tensor/matrix based methods, translation based methods, and neural network based ones. The quintessential tensor/matrix based method is RESCAL (Nickel et al., 2011). It relates a knowledge graph to a three-way tensor of head entities, relations, and tail entities. The embeddings of entities and relations are learned by minimizing the reconstruction error of the tensor, and binary facts corresponding to entries with large values are treated as valid. Similarly, ComplEx (Trouillon et al., 2016) relates each relation to a matrix of head and tail entities, which is decomposed and learned like RESCAL. To improve the embeddings and thus the performance of inference, researchers further introduce constraints on entities and relations (Ding et al., 2018; Jain et al., 2018).
Translation based methods date back to TransE (Bordes et al., 2013). It views each valid binary fact as a translation from the head entity to the tail entity via their relation. Thus, the score function indicating the validity of a fact is defined based on the similarity between the translation result and the tail entity. A flurry of methods followed (Wang et al., 2014; Lin et al., 2015b; Ji et al., 2015; Guo et al., 2015; Lin et al., 2015a; Xiao et al., 2016; Jia et al., 2016; Tay et al., 2017; Ebisu and Ichise, 2018; Chen et al., 2019), modifying the above translation assumption or introducing additional information and constraints. Among them, TransH (Wang et al., 2014) translates on relation-specific hyperplanes: entities are projected onto the hyperplane of the relation before translating.
Neural network based methods model the validity of binary facts or the inference processes. For example, ConvKB (Nguyen et al., 2018) treats each binary fact as a three-column matrix, which is fed into a convolution layer, followed by a concatenation layer and a fully-connected layer, to generate a validity score. Nathani et al. (2019) further propose a generalized graph attention model as the encoder to capture neighborhood features and apply ConvKB as the decoder. ConvE (Dettmers et al., 2018) models the entity inference process via 2D convolution over the reshaped and concatenated embeddings of the known entity and relation. ConvR (Jiang et al., 2019) adaptively constructs convolution filters from the relation embedding and applies these filters across the entity embedding to generate convolutional features. SENN (Guan et al., 2018) models the inference processes of head entities, tail entities, and relations via fully-connected neural networks and integrates them into a unified framework.

Knowledge Inference on N-ary Facts
As aforesaid, only a few studies handle this type of knowledge inference. The m-TransH method (Wen et al., 2016) defines n-ary relations as mappings from attribute sequences to attribute values; each n-ary fact is an instance of the corresponding n-ary relation. m-TransH then generalizes TransH (Wang et al., 2014) from binary facts to n-ary facts by attaching each n-ary relation with a hyperplane. RAE (Zhang et al., 2018) further introduces the likelihood that two attribute values co-participate in a common n-ary fact and adds the corresponding relatedness loss, multiplied by a weight factor, to the embedding loss of m-TransH. Specifically, RAE applies a fully-connected neural network to model this likelihood. Differently, NaLP (Guan et al., 2019) represents each n-ary fact directly as a set of attribute-value pairs. Convolution is then adopted to obtain the embeddings of the attribute-value pairs, and a fully-connected neural network is applied to evaluate their relatedness and finally to compute the validity score of the input n-ary fact.
In these methods, all the information in the same n-ary fact has equal status. Actually, in each n-ary fact, a primary triple can usually be identified, with the other information serving as its auxiliary description(s), as exemplified in Section 1. Moreover, these methods are deliberately designed only for inference on whole facts and have not tackled inference on partial facts, although the newly proposed flexible knowledge inference is also prevalent in practice.

The Representation of N-ary Facts
Different from the studies that define n-ary relations first and then represent n-ary facts (Wen et al., 2016; Zhang et al., 2018), we directly represent each n-ary fact as a primary triple (head entity, relation, tail entity) coupled with a set of its auxiliary description(s). Formally, given an n-ary fact Fct with the primary triple (h, r, t), m attributes, and attribute values, its representation is:

Fct = ((h, r, t), {a_1 : v_1, a_2 : v_2, . . . , a_m : v_m}),

where each a_i : v_i (i = 1, 2, . . . , m) is an attribute-value pair, also called an auxiliary description of the primary triple. An element of Fct refers to h/r/t/a_i/v_i; A_Fct = {a_1, a_2, . . . , a_m} is Fct's attribute set, and a_i may be the same as a_j (i, j = 1, 2, . . . , m, i ≠ j). For example, the representation of the 5-ary fact mentioned in Section 1 is:

((John Bardeen, award-received, Nobel Prize in Physics), {point-in-time : 1956, together-with : Walter Houser Brattain, together-with : William Shockley}).

Note that, in the real world, there is a type of complicated case where more than two entities participate in the same n-ary fact with the same primary attribute. We follow Wikidata (Vrandečić and Krötzsch, 2014) and view such cases from the aspects of the different entities. Take the case that John Bardeen, Walter Houser Brattain, and William Shockley received the Nobel Prize in Physics in 1956 as an example: besides the above 5-ary fact from the view of John Bardeen, we get two other 5-ary facts from the views of Walter Houser Brattain and William Shockley, respectively, e.g., (Walter Houser Brattain, award-received, Nobel Prize in Physics) with its corresponding auxiliary descriptions.
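The representation above can be sketched as a small data structure. The class and field names below are illustrative assumptions, not part of the paper; the point is that the primary triple is kept apart from the auxiliary pairs, and that attribute names may repeat.

```python
# Hypothetical sketch of the n-ary fact representation: a primary triple
# (h, r, t) plus auxiliary attribute-value pairs. Attributes may repeat
# (e.g. two "together-with" pairs), so descriptions are a sequence, not a dict.
from typing import NamedTuple, Tuple

class NaryFact(NamedTuple):
    head: str
    relation: str
    tail: str
    descriptions: Tuple[Tuple[str, str], ...]  # (attribute, value) pairs

fact = NaryFact(
    head="John Bardeen",
    relation="award-received",
    tail="Nobel Prize in Physics",
    descriptions=(
        ("point-in-time", "1956"),
        ("together-with", "Walter Houser Brattain"),
        ("together-with", "William Shockley"),
    ),
)

# Arity: the two entities of the primary triple plus one per auxiliary value.
arity = 2 + len(fact.descriptions)
```

A plain set of attribute-value pairs (as in NaLP) would lose both the primary/auxiliary distinction and the duplicate together-with entries.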

Task Statement
In this paper, we handle both the common simple knowledge inference and the newly proposed flexible knowledge inference. Before giving their definitions under our representation form of n-ary facts, let us define whole fact and partial fact first.
Definition 1 (Whole fact and partial fact). For the fact Fct, denote its set of auxiliary description(s) as S_d = {a_i : v_i | i = 1, 2, . . . , m}. Then a partial fact of Fct is Fct′ = ((h, r, t), S′_d), where S′_d ⊂ S_d, i.e., S′_d is a subset of S_d. We call Fct the whole fact to differentiate it from Fct′.
Notably, whole fact and partial fact are relative concepts; a whole fact is a relatively complete fact compared to its partial facts. In this paper, partial facts are introduced to imitate a typical open-world setting, where different facts of the same type may have different numbers of attribute-value pair(s).
Definition 2 (Simple knowledge inference). It aims to infer an unknown element in a whole fact.
Definition 3 (Flexible knowledge inference). It aims to infer an unknown element in a partial fact.

The NeuInfer Method
To conduct knowledge inference on n-ary facts, NeuInfer first models the validity of n-ary facts and then casts inference as a classification task. Consider an n-ary fact whose primary triple is invalid, or one in which some auxiliary description is incompatible with the primary triple; neither is a valid n-ary fact. Therefore, we believe that a valid n-ary fact has two prerequisites. On the one hand, its primary triple should be valid: if the primary triple is invalid, attaching any number of attribute-value pairs to it does not make the resulting n-ary fact valid. On the other hand, since each auxiliary description presents a qualifier to the primary triple, it should be compatible with the primary triple: even if the primary triple is valid, any incompatible attribute-value pair makes the n-ary fact invalid. NeuInfer is thus designed to characterize these two aspects and consists of two components, corresponding to the validity evaluation of the primary triple and the compatibility evaluation of the n-ary fact, respectively.

The Framework of NeuInfer
The framework of NeuInfer is illustrated in Figure 1, with the 5-ary fact presented in Section 1 as an example.
For an n-ary fact Fct, we look up the embeddings of its relation r and the attributes in A_Fct from the embedding matrix M_R ∈ R^{|R|×k} of relations and attributes, where R is the set of all relations and attributes and k is the dimension of the latent vector space. The embeddings of h, t, and the attribute values in V_Fct are looked up from the embedding matrix M_E ∈ R^{|E|×k} of entities and attribute values, where E is the set of all entities and attribute values. In what follows, embeddings are denoted by the same letters in boldface, by convention. As presented in Figure 1, these embeddings are fed into the validity evaluation component (the upper part of Figure 1) and the compatibility evaluation component (the bottom part of Figure 1) to compute the validity score of (h, r, t) and the compatibility score of Fct, respectively. These two scores are combined into the final score of Fct by a weighted sum ⊕, which is further used to compute the loss. Note that, following RAE (Zhang et al., 2018) and NaLP (Guan et al., 2019), we apply only fully-connected neural networks in NeuInfer.

Validity Evaluation
This component estimates the validity of (h, r, t), covering the acquisition of its interaction vector and the assessment of its validity, corresponding to "hrt-FCNs" and "FCN_1" in Figure 1, respectively.
Specifically, the embeddings of h, r, and t are concatenated and fed into a fully-connected neural network. After layer-by-layer learning, the last layer outputs the interaction vector o_hrt of (h, r, t):

o_hrt = f(W_{1,n_1} · f(· · · f(W_{1,1} · [h; r; t] + b_{1,1}) · · ·) + b_{1,n_1}),

where f(·) is the ReLU function, [;] denotes concatenation, n_1 is the number of neural network layers, and {W_{1,1}, W_{1,2}, . . . , W_{1,n_1}} and {b_{1,1}, b_{1,2}, . . . , b_{1,n_1}} are their weight matrices and bias vectors, respectively.
With o_hrt as the input, the validity score val_hrt of (h, r, t) is computed via a fully-connected layer followed by the sigmoid operation:

val_hrt = σ(o_hrt · W_val + b_val),

where W_val and b_val are the weight matrix and bias variable, respectively, and σ(x) = 1/(1 + e^{−x}) is the sigmoid function, which constrains val_hrt ∈ (0, 1).
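The validity evaluation can be sketched in NumPy as below. The embedding dimension, layer sizes, and random weights are illustrative assumptions (the paper tunes these); the sketch only shows the flow: concatenate [h; r; t], pass it through ReLU layers, then map to a sigmoid score.

```python
# Hedged sketch of "hrt-FCNs" + "FCN_1" with made-up sizes and weights.
import numpy as np

rng = np.random.default_rng(0)
k = 8  # embedding dimension (an illustrative assumption)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "hrt-FCNs": hidden sizes shrink by a constant difference (24 -> 18 -> 12).
sizes = [3 * k, 18, 12]
Ws = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

# "FCN_1": one fully-connected layer mapping o_hrt to a scalar.
W_val = rng.normal(scale=0.1, size=(sizes[-1], 1))
b_val = 0.0

def validity_score(h, r, t):
    o = np.concatenate([h, r, t])        # [h; r; t]
    for W, b in zip(Ws, bs):             # layer-by-layer learning with ReLU
        o = relu(o @ W + b)              # final o is the interaction vector o_hrt
    return float(sigmoid(o @ W_val + b_val))  # val_hrt in (0, 1)

h, r, t = (rng.normal(size=k) for _ in range(3))
score = validity_score(h, r, t)
```

In training, the weights would of course be learned jointly with the embeddings rather than sampled.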
For simplicity, the number of hidden nodes in each fully-connected layer of "hrt-FCNs" and "FCN_1" decreases gradually by a constant difference between adjacent layers.

Compatibility Evaluation
This component estimates the compatibility of Fct. It contains three sub-processes: the capture of the interaction vector between (h, r, t) and each auxiliary description a_i : v_i (i = 1, 2, . . . , m), the acquisition of the overall interaction vector, and the assessment of the compatibility of Fct, corresponding to "hrtav-FCNs", "min", and "FCN_2" in Figure 1, respectively.
Similar to "hrt-FCNs", we obtain the interaction vector o_{hrta_iv_i} of (h, r, t) and a_i : v_i:

o_{hrta_iv_i} = f(W_{2,n_2} · f(· · · f(W_{2,1} · [h; r; t; a_i; v_i] + b_{2,1}) · · ·) + b_{2,n_2}),

where n_2 is the number of neural network layers, and {W_{2,1}, W_{2,2}, . . . , W_{2,n_2}} and {b_{2,1}, b_{2,2}, . . . , b_{2,n_2}} are their weight matrices and bias vectors, respectively. The number of hidden nodes in each fully-connected layer also decreases gradually by a constant difference between adjacent layers, and the dimension of the resulting o_{hrta_iv_i} is d.
All the auxiliary descriptions share the same parameters in this sub-process. The overall interaction vector o_hrtav of Fct is then generated from the vectors o_{hrta_iv_i}. Before introducing this sub-process, let us first look at the principle behind it.
Straightforwardly, if Fct is valid, (h, r, t) should be compatible with each of its auxiliary descriptions. Then, the values of their interaction vector, which measure the compatibility from many different views, are all encouraged to be large. Therefore, for each dimension, the minimum over that dimension across all the interaction vectors should not be too small.
Thus, the overall interaction vector o_hrtav of (h, r, t) and its auxiliary description(s) is:

o_hrtav = min(o_{hrta_1v_1}, o_{hrta_2v_2}, . . . , o_{hrta_mv_m}),

where min(·) is the element-wise minimum function.
Then, similar to "FCN_1", we obtain the compatibility score comp_Fct of Fct:

comp_Fct = σ(o_hrtav · W_comp + b_comp),

where W_comp, of dimension d × 1, and b_comp are the weight matrix and bias variable, respectively.
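The min-pooling step can be sketched as follows. The interaction vectors and the "FCN_2" weights are invented numbers; the sketch shows how one incompatible auxiliary description (a low value in some dimension) drags the overall vector, and hence the compatibility score, down.

```python
# Hedged sketch of the compatibility pooling: element-wise minimum over the
# per-description interaction vectors, then a single layer plus sigmoid.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Interaction vectors o_{hrt a_i v_i} for m = 3 auxiliary descriptions,
# each of dimension d = 4 (illustrative values, as if output by "hrtav-FCNs").
interactions = np.array([
    [0.9, 0.8, 0.7, 0.6],
    [0.5, 0.9, 0.8, 0.7],
    [0.8, 0.2, 0.9, 0.8],   # one view flags an incompatible description
])

# Element-wise minimum over the m vectors gives the overall vector o_hrtav.
o_hrtav = interactions.min(axis=0)   # -> [0.5, 0.2, 0.7, 0.6]

# "FCN_2": a fully-connected layer of dimension d x 1, then sigmoid
# (weights assumed for illustration).
W_comp = np.full((4, 1), 0.5)
b_comp = -0.5
comp = float(sigmoid(o_hrtav @ W_comp + b_comp))
```

Because the minimum is taken per dimension, a single small entry in any interaction vector is enough to lower comp, matching the intuition that one incompatible qualifier invalidates the fact.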

Final Score and Loss Function
The final score s_Fct of Fct is the weighted sum ⊕ of the above validity score and compatibility score:

s_Fct = w · val_hrt + (1 − w) · comp_Fct,    (6)

where w ∈ (0, 1) is the weight factor. If the arity of Fct is 2, the final score equals the validity score of the primary triple (h, r, t), and Equation (6) reduces to:

s_Fct = val_hrt.

We have now obtained the final score s_Fct of Fct. In addition, Fct has a target score l_Fct. By comparing s_Fct with l_Fct, we get the binary cross-entropy loss:

L = − Σ_{Fct ∈ T ∪ T^−} (l_Fct · log s_Fct + (1 − l_Fct) · log(1 − s_Fct)).

Here, T is the training set and T^− is the set of negative samples constructed by corrupting the n-ary facts in T. Specifically, for each n-ary fact in T, we randomly replace one of its elements with a random element in E/R to generate one negative sample not contained in T.
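The scoring and loss above amount to a few lines of arithmetic. The weight w = 0.3 and the scores below are made-up example values, not tuned settings from the paper.

```python
# Hedged sketch of the final score (weighted sum) and binary cross-entropy.
import math

def final_score(val_hrt, comp_fct, w=0.3, has_aux=True):
    # Binary facts have no auxiliary descriptions, so the final score
    # reduces to the validity score of the primary triple.
    if not has_aux:
        return val_hrt
    return w * val_hrt + (1 - w) * comp_fct

def bce_loss(score, label):
    # label is 1 for facts in T, 0 for negative samples in T^-.
    return -(label * math.log(score) + (1 - label) * math.log(1 - score))

s = final_score(0.9, 0.8)       # 0.3 * 0.9 + 0.7 * 0.8 = 0.83
loss_pos = bce_loss(s, 1)       # small loss when the label agrees
loss_neg = bce_loss(s, 0)       # large loss if the same score were a negative
```

Training then sums this loss over T and the sampled T^- and backpropagates into the weights and embeddings.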
We then optimize NeuInfer via backpropagation, and Adam (Kingma and Ba, 2015) with learning rate λ is used as the optimizer.

Datasets and Metrics
We conduct experiments on two n-ary datasets. The first one is JF17K (Wen et al., 2016; Zhang et al., 2018), derived from Freebase (Bollacker et al., 2008). In JF17K, an n-ary relation of a certain type is defined by a fixed number of ordered attributes, and any n-ary fact of this relation is denoted as an ordered sequence of attribute values corresponding to those attributes. For example, all n-ary facts of the n-ary relation olympics.olympic_medal_honor have four attribute values (e.g., 2008 Summer Olympics, United States, Natalie Coughlin, and Swimming at the 2008 Summer Olympics – Women's 4×100 metre freestyle relay), corresponding to the four ordered attributes of this n-ary relation. The second one is WikiPeople (Guan et al., 2019), derived from Wikidata (Vrandečić and Krötzsch, 2014). Its n-ary facts are more diverse than JF17K's. For example, among the n-ary facts describing award-received, some have the attribute together-with while others do not. Thus, WikiPeople is the more difficult dataset.
To run NeuInfer on JF17K and WikiPeople, we transform the representation of their n-ary facts. For JF17K, we need to convert each attribute value sequence of a specific n-ary relation to a primary triple coupled with a set of its auxiliary description(s). The core of this process is to determine the primary triple, formed by merging the two primary attributes of the n-ary relation and the corresponding attribute values. The two primary attributes are selected based on RAE (Zhang et al., 2018). For each attribute of the n-ary relation, we count the number of its distinct attribute values from all the n-ary facts of this relation. The two attributes that correspond to the largest and second-largest numbers are chosen as the two primary attributes. For WikiPeople, since there is a primary triple for each n-ary fact in Wikidata, with its help, we simply reorganize a set of attribute-value pairs in WikiPeople to a primary triple coupled with a set of its auxiliary description(s).
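The primary-attribute selection for JF17K can be sketched as a distinct-value count. The relation and its facts below are fabricated for illustration; the logic is just: count distinct values per attribute across all facts of the relation, and take the top two.

```python
# Hedged sketch of primary-attribute selection (following RAE): the two
# attributes with the most distinct values form the primary triple.
from collections import defaultdict

facts = [  # hypothetical facts of one n-ary relation: {attribute: value}
    {"athlete": "A", "event": "E1", "medal": "gold",   "olympics": "2008"},
    {"athlete": "B", "event": "E2", "medal": "gold",   "olympics": "2008"},
    {"athlete": "C", "event": "E1", "medal": "silver", "olympics": "2012"},
]

distinct = defaultdict(set)
for fact in facts:
    for attr, value in fact.items():
        distinct[attr].add(value)

# Rank attributes by number of distinct values, descending (stable sort).
ranked = sorted(distinct, key=lambda a: len(distinct[a]), reverse=True)
primary_attrs = ranked[:2]   # these two attributes anchor the primary triple
```

The remaining attributes and their values then become the auxiliary descriptions of each converted fact.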
The statistics of the datasets after conversion or reorganization are outlined in Table 1, where #T rain, #V alid, and #T est are the sizes of the training set, validation set, and test set, respectively.
As for metrics, we adopt the standard Mean Reciprocal Rank (MRR) and Hits@N. For each n-ary test fact, one of its elements is removed and replaced by every element in E/R. These corrupted n-ary facts are fed into NeuInfer to obtain their final scores. The n-ary facts are then sorted by score in descending order, and the rank of the n-ary test fact is recorded. Note that, except for the n-ary test fact itself, corrupted n-ary facts that exist in the training/validation/test set are discarded before sorting. This process is repeated for all the other elements of the n-ary test fact. MRR is then the average of the reciprocal ranks, and Hits@N is the proportion of ranks less than or equal to N.
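This filtered ranking protocol can be sketched directly. The candidate names and scores below are invented; the sketch shows the filtering of known facts before ranking and the MRR/Hits@N computation.

```python
# Hedged sketch of filtered ranking, MRR, and Hits@N.

def filtered_rank(true_id, scores, known_ids):
    # scores: candidate -> model score; known_ids: corruptions that are
    # themselves valid facts in train/valid/test and must be discarded.
    kept = {c: s for c, s in scores.items()
            if c == true_id or c not in known_ids}
    ordered = sorted(kept, key=kept.get, reverse=True)
    return ordered.index(true_id) + 1   # 1-based rank of the true fact

def mrr_and_hits(ranks, n):
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= n for r in ranks) / len(ranks)
    return mrr, hits

scores = {"gold": 0.7, "known": 0.9, "other": 0.4}
rank = filtered_rank("gold", scores, known_ids={"known"})  # "known" filtered out
ranks = [rank, 2, 4]
mrr, hits3 = mrr_and_hits(ranks, n=3)
```

Without the filter, the higher-scoring but valid corruption "known" would unfairly push the true fact down to rank 2.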
Knowledge inference includes entity inference and relation inference. As presented in Table 1, the number of relations and attributes in each dataset is far smaller than that of entities and attribute values (on JF17K, |R| = 501 while |E| = 28,645; on WikiPeople, |R| = 193 while |E| = 47,765). That is, inferring a relation/attribute is much simpler than inferring an entity/attribute value. Therefore, we adopt MRR and Hits@{1, 3, 10} for entity inference, while paying attention to the more fine-grained metrics, MRR and Hits@1, for relation inference.

Simple Knowledge Inference
Simple knowledge inference includes simple entity inference and simple relation inference. For an n-ary fact, they infer one of the entities/the relation in the primary triple, or the attribute value/attribute in an auxiliary description, given the rest of the fact.

Baselines
Knowledge inference methods on n-ary facts are scarce. The representative methods are m-TransH (Wen et al., 2016) and its modified version RAE (Zhang et al., 2018), and the state-of-the-art one is NaLP (Guan et al., 2019). As m-TransH is worse than RAE, following NaLP, we do not adopt it as a baseline.

Simple Entity Inference
The experimental results of simple entity inference are reported in Table 2. NeuInfer performs much better than the best baseline NaLP, which verifies its superiority. On JF17K, the performance gap between NeuInfer and NaLP is significant: 0.151 in MRR, 14.6% in Hits@1, 16.2% in Hits@3, and 15.9% in Hits@10. On WikiPeople, NeuInfer also outperforms NaLP. This testifies to the strength of NeuInfer in treating the information in the same n-ary fact discriminatingly. By differentiating the primary triple from the auxiliary description(s), NeuInfer considers the validity of the primary triple and the compatibility between the primary triple and its auxiliary description(s), modeling each n-ary fact more appropriately and reasonably. Thus, it is not surprising that NeuInfer beats the baselines. Moreover, on the simpler JF17K (see Section 5.1), NeuInfer gains a more significant performance improvement than on WikiPeople.

Simple Relation Inference
Since RAE is designed only for simple entity inference, we compare NeuInfer only with NaLP on simple relation inference. Table 3 presents the experimental results. NeuInfer outperforms NaLP consistently: on JF17K, the improvements in MRR and Hits@1 are 0.036 and 7.0%, respectively; on WikiPeople, they are 0.030 and 9.1%, respectively. We ascribe this to the reasonable modeling of n-ary facts, which not only improves the performance of simple entity inference but also helps pick out exactly the right relations/attributes.

Ablation Study
We perform an ablation study to look deeper into the framework of NeuInfer. If we removed the compatibility evaluation component, NeuInfer would reduce to a method for binary rather than n-ary facts; since we handle knowledge inference on n-ary facts, it is inappropriate to remove this component. Thus, as the ablation, we deactivate only the validity evaluation component, denoted as NeuInfer−. The experimental comparison between NeuInfer and NeuInfer− is illustrated in Figure 2. NeuInfer outperforms NeuInfer− significantly, which suggests that the validity evaluation component plays a pivotal role in our method. Thus, each component of our method is necessary.

Flexible Knowledge Inference
The newly proposed flexible knowledge inference focuses on n-ary facts of arity greater than 2. It includes flexible entity inference and flexible relation inference. For an n-ary fact, they infer one of the entities/the relation in the primary triple given any number of its auxiliary description(s), or infer the attribute value/attribute in an auxiliary description given the primary triple and any number of the other auxiliary description(s). In existing knowledge inference methods on n-ary facts, each n-ary fact is represented as a group of peer attributes and attribute values, and these methods have not paid attention to flexible knowledge inference. Thus, we conduct this new type of task only on NeuInfer. Before elaborating on the experimental results, let us first look into the new test set used in this section.

The New Test Set
We generate the new test set as follows:
• Collect the n-ary facts of arities greater than 2 from the test set.
• For each collected n-ary fact, compute all the subsets of the auxiliary description(s). The primary triple and each subset form a new n-ary fact, which is added to the candidate set.
• Remove the n-ary facts that also exist in the training/validation set from the candidate set and then remove the duplicate n-ary facts. The remaining n-ary facts form the new test set.
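The steps above can be sketched with the standard library. The example fact is the 5-ary fact from Section 1; the deduplication against training/validation facts is indicated but left as a set difference.

```python
# Hedged sketch of the new-test-set construction: pair the primary triple
# with every subset of its auxiliary descriptions.
from itertools import combinations

def partial_facts(triple, descriptions):
    out = []
    for size in range(len(descriptions) + 1):   # all subset sizes, incl. empty
        for subset in combinations(descriptions, size):
            out.append((triple, frozenset(subset)))
    return out

triple = ("John Bardeen", "award-received", "Nobel Prize in Physics")
descs = [("point-in-time", "1956"),
         ("together-with", "Walter Houser Brattain"),
         ("together-with", "William Shockley")]

candidates = partial_facts(triple, descs)   # 2^3 = 8 candidate facts
# Facts already in the training/validation set, and duplicates, would then
# be removed, e.g. unique = set(candidates) - seen_facts.
unique = set(candidates)
```

Each auxiliary description is either kept or dropped, so a fact with m descriptions yields 2^m candidates, including the bare primary triple.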
The size of the resulting new test set on JF17K is 34,784, and that on WikiPeople is 13,833.

Flexible Entity and Relation Inference
The experimental results of flexible entity and relation inference on these new test sets are presented in Table 4. It can be observed that NeuInfer well tackles flexible entity and relation inference on partial facts, and achieves excellent performance. We also attribute this to the reasonable modeling of n-ary facts. For each n-ary fact, NeuInfer distinguishes the primary triple from other auxiliary description(s) and models them properly. Thus, NeuInfer well handles various types of entity and relation inference concerning the primary triple coupled with any number of its auxiliary description(s).

Performance under Different Scenarios
To further analyze the effectiveness of the proposed NeuInfer method, we look into the breakdown of its performance by arity, as well as over primary triples and auxiliary descriptions. Without loss of generality, we report only the experimental results on simple entity inference here. The test sets are grouped into binary and n-ary (n > 2) categories according to the arities of the facts. Table 5 presents the experimental results of simple entity inference on these two categories of JF17K and WikiPeople. From the table, we can observe that NeuInfer consistently outperforms the baselines on both categories of the simpler JF17K. On the more difficult WikiPeople, NeuInfer is comparable to the best baseline NaLP on the binary category and performs much better on the n-ary category in terms of the fine-grained MRR and Hits@1. In general, NeuInfer performs much better on JF17K than on WikiPeople, which we attribute to the simplicity of JF17K.
Where does the above performance improvement come from? Is it from inferring the head/tail entities of the primary triples or from inferring the attribute values in the auxiliary descriptions? The corresponding breakdown results are reported in Tables 6 and 7. It can be observed that NeuInfer brings more performance gain on inferring attribute values. This indicates that combining the validity of the primary triple with the compatibility between the primary triple and its auxiliary description(s) to model each n-ary fact is more effective than considering only the relatedness of attribute-value pairs, as NaLP does, especially for inferring attribute values.

Conclusions
In this paper, we distinguished the information in the same n-ary fact and represented each n-ary fact as a primary triple coupled with a set of its auxiliary description(s). We then proposed a neural network model, NeuInfer, for knowledge inference on n-ary facts. NeuInfer combines the validity evaluation of the primary triple and the compatibility evaluation of the n-ary fact to obtain the validity score of the n-ary fact. In this way, NeuInfer handles simple knowledge inference, which copes with inference on whole facts, and is also capable of the newly proposed flexible knowledge inference, which tackles inference on partial facts consisting of a primary triple coupled with any number of its auxiliary descriptive attribute-value pair(s). Experimental results manifest the merits and superiority of NeuInfer. In particular, on simple entity inference, NeuInfer outperforms the state-of-the-art method significantly in terms of all the metrics, improving Hits@3 by as much as 16.2% on JF17K.
In this paper, we used only the n-ary facts in the datasets to conduct knowledge inference. In future work, to further improve the method, we will explore introducing additional information, such as rules and external texts.