Inferring Binary Relation Schemas for Open Information Extraction

This paper presents a framework for modeling the semantic representation of binary relations produced by open information extraction systems. For each binary relation, we infer a set of preferred types for the two arguments simultaneously and generate a ranked list of type pairs, which we call schemas. All inferred types are drawn from the Freebase type taxonomy and are human readable. Our system collects 171,168 binary relations from ReVerb and produces top-ranking relation schemas with a mean reciprocal rank of 0.337.


Introduction
Open information extraction (or Open IE) is the task of extracting all sorts of relations between named entities or concepts from open-domain text corpora, without restricting itself to specific relations or patterns. State-of-the-art Open IE systems (Carlson et al., 2010; Fader et al., 2011; Schmitz et al., 2012; Nakashole et al., 2012) extract millions of binary relations with high precision from web corpora. Each extracted relation instance is a triple of the form $\langle arg_1, rel, arg_2 \rangle$, where the relation rel is a lexical or syntactic pattern, and both arguments are multi-word expressions representing the argument entities or concepts.
Whereas Open IE provides concrete relation instances, we are interested in generalizing these instances into more abstract semantic representations. In this paper, we focus on inferring the schemas of binary relations.
For example, given the binary relation "play in", an Open IE system extracts many triples of the form ⟨X, play in, Y⟩. The following relation triples are extracted by ReVerb:

    ⟨Goel Grey, played in, Cabaret⟩
    ⟨Tom Brady, play in, National Football League⟩

Informally, the goal of our system is to automatically infer a set of schemas such as $\langle t_1, \text{play in}, t_2 \rangle$, where $t_1$ and $t_2$ are two semantic types drawn from a standard knowledge base such as WordNet (Miller, 1995), Yago (Suchanek et al., 2007), Freebase (Bollacker et al., 2008) or Probase (Wu et al., 2012), and each such schema can be used to represent a set of "play in" relation instances. For the above example, two possible schemas for "play in" are:

    ⟨film actor, play in, film⟩
    ⟨athlete, play in, sports league⟩

The schema of a binary relation is useful information in NLP tasks such as context-oriented entity recognition and open-domain question answering. Suppose we are to recognize the entities in the sentence "Granger played in the NBA". "Granger" is a highly ambiguous term, while "the NBA" is probably a sports league. With the above relation schemas for "play in", the entity recognizer knows that "Granger" is more likely to be an athlete, which results in the correct linking to "Danny Granger", an NBA player, even if the Open IE system has never extracted such a fact.
One relevant technique for achieving our goal is selectional preference (SP) (Resnik, 1996; Erk, 2007; Ritter et al., 2010), which computes the most appropriate types for a specific argument of a predicate. SP is based on the idea of mutual information (Erk, 2007), which tends to select types that are unique to the relation. In other words, common types that can be used for many different relations are less preferred. However, in Open IE, many relations are related or even similar, e.g., play in, take part in and be involved in, and there is no reason for these relations not to share schemas. Therefore, the problem addressed in this paper is: given a relation and its instances, identify the smallest types that cover as many instances as possible.

Our approach first attempts to link the arguments in the relation instances to a set of possible entities in a knowledge base, and hence generates a set of $\langle e_1, e_2 \rangle$ entity pairs. We then select a pair of types $\langle t_1, t_2 \rangle$ that covers the maximum number of entity pairs. We resolve ties by selecting the smaller (more specific) types according to a type taxonomy inferred from the knowledge base.

This paper makes the following contributions: i) we define the schema inference problem for binary relations from Open IE; ii) we develop a prototype system based on Freebase and entity linking (Lin et al., 2012; Ratinov et al., 2011; Hoffart et al., 2011; Rao et al., 2013; Cai et al., 2013), which simultaneously models the type distributions of the two arguments of each binary relation; iii) our experiment on ReVerb triples shows that the top inferred schemas achieve a mean reciprocal rank (MRR) of 0.337 with respect to the human-labeled ground truth.

Problem Definition
A knowledge base K is a 5-tuple $\langle E, Alist, T, P, IsA \rangle$, where:
- E is a finite set of entities e ∈ E,
- Alist(e) = {n_1, n_2, ...} is a function which returns the set of names (or aliases) of an entity e,
- T is a finite set of types t ∈ T,
- P is a finite set of relation instances p(e_1, e_2), where p is a predicate in K,
- IsA is a finite set of entity-type pairs (e, t), representing the isA relation between entities and types. An entity belongs to at least one type.
An Open IE triple set S contains all relation instances extracted by the IE system, of the form $\langle a_1, rel, a_2 \rangle$, where $a_1$ and $a_2$ are the arguments of the extracted relation pattern rel. The set of argument pairs sharing the same relation pattern rel is denoted by $S_{rel}$.
The problem is, for each $S_{rel}$, to return a set of type pairs (or schemas) $\langle t_1, t_2 \rangle$ with $t_1, t_2 \in T$, ordered by the number of argument pairs in $S_{rel}$ that they cover. If two schemas cover the same number of argument pairs from $S_{rel}$, the schema covering the smaller number of entities is ranked higher.
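The definitions above can be sketched as a small data structure. This is a minimal illustration, not the system's implementation: the class and function names are hypothetical, the coverage counts are taken as given, and measuring "number of entities covered by a schema" as the sum of the two type sizes is an assumption made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy container mirroring the 5-tuple <E, Alist, T, P, IsA>."""
    entities: set = field(default_factory=set)   # E
    aliases: dict = field(default_factory=dict)  # Alist: entity -> set of names
    types: set = field(default_factory=set)      # T
    facts: set = field(default_factory=set)      # P as (predicate, e1, e2)
    is_a: set = field(default_factory=set)       # IsA as (entity, type)

    def type_size(self, t):
        """Number of entities that belong to type t."""
        return sum(1 for (_, t2) in self.is_a if t2 == t)


def rank_schemas(kb, coverage):
    """Order schemas by covered argument pairs (descending).

    Ties go to the schema whose types hold fewer entities
    (assumed here to be the sum of the two type sizes).
    """
    def key(schema):
        t1, t2 = schema
        return (-coverage[schema], kb.type_size(t1) + kb.type_size(t2))
    return sorted(coverage, key=key)


kb = KnowledgeBase(is_a={
    ("a", "person"), ("a", "deceased_person"),
    ("b", "person"), ("x", "location"),
})
coverage = {("person", "location"): 1, ("deceased_person", "location"): 1}
ranking = rank_schemas(kb, coverage)
```

With equal coverage, the schema built on the smaller type `deceased_person` is placed first.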

System
The workflow of our system is shown in Figure 1. The system takes Open IE relation tuples as input, then performs entity linking, relation grouping and schema ranking to translate them into a final ranked list of schemas.
(1) Entity Linking: Relation arguments are linked to entities in the knowledge base by fuzzy string matching. Each entity in the knowledge base has a unique identifier.
(2) Relation Grouping: Linked tuples sharing similar relation patterns are grouped together. Each group has a representative relation pattern, which is generated from all the patterns within the group.
(3) Schema Ranking: For each linked tuple in a relation group, argument entities are mapped to types drawn from the knowledge base. This step then ranks type pairs (schemas) by how many Open IE tuples a type pair covers and how specific its types are.

Entity Linking
In the entity linking step, each relation tuple is transformed into linked tuples $ltup = \langle e_1, rel, e_2 \rangle$ with linking scores by matching its arguments to entities in the knowledge base. To support fuzzy matching between arguments and entity aliases, we take all aliases into consideration and build an inverted index from words to aliases. The words in an alias should not be treated equally: intuitively, a word is more important if it occurs in fewer aliases (n), and vice versa. Based on the inverted index, we use an inverse document frequency (idf) score to approximate the weight of a word w. Stop words are removed from aliases, which amounts to treating their idf scores as 0.

To measure the probability of a fuzzy match from an argument (a) to an alias (n), we introduce a weighted overlap score. We then merge all the aliases of an entity to produce a similarity score sim(e, a) for the fuzzy match between an entity e and an argument a.

To control the quality of candidate entities, for an argument having m words (with stop words removed), we only keep entities that have at least one alias matching m − 1 words of the argument and a similarity score larger than a threshold τ. With the similarity scores computed, we generate the 10 best entity candidates for the subject and for the object of rel, respectively.
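The candidate generation step can be sketched as follows. The exact formulas are assumptions chosen for illustration, not the paper's definitions: the idf weight is taken as log(1 + N/n), the weighted overlap as the idf mass of shared words divided by the idf mass of the alias, and the entity similarity as the maximum overlap over its aliases. The stop-word list, the alias table and the default value of τ are likewise hypothetical.

```python
import math
from collections import defaultdict

STOP_WORDS = {"the", "of", "a", "an", "in"}  # tiny illustrative list


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_index(alias_table):
    """Inverted index: word -> set of (entity, alias) pairs containing it."""
    index = defaultdict(set)
    for entity, aliases in alias_table.items():
        for alias in aliases:
            for word in tokenize(alias):
                index[word].add((entity, alias))
    return index


def idf(word, index, n_aliases):
    """Assumed idf form: rarer words get larger weights."""
    n = len(index.get(word, ()))
    return math.log(1 + n_aliases / n) if n else 0.0


def overlap(argument, alias, index, n_aliases):
    """Assumed weighted overlap between an argument and one alias."""
    arg_words, alias_words = set(tokenize(argument)), set(tokenize(alias))
    total = sum(idf(w, index, n_aliases) for w in alias_words)
    shared = sum(idf(w, index, n_aliases) for w in arg_words & alias_words)
    return shared / total if total else 0.0


def candidates(argument, alias_table, index, tau=0.5, k=10):
    """Keep entities passing the m - 1 word filter and the tau threshold."""
    n_aliases = sum(len(a) for a in alias_table.values())
    arg_words = set(tokenize(argument))
    scored = []
    for entity, aliases in alias_table.items():
        matched = max((len(arg_words & set(tokenize(al))) for al in aliases),
                      default=0)
        if matched < len(arg_words) - 1:
            continue
        sim = max((overlap(argument, al, index, n_aliases) for al in aliases),
                  default=0.0)
        if sim > tau:
            scored.append((entity, sim))
    return sorted(scored, key=lambda pair: -pair[1])[:k]


alias_table = {
    "danny_granger": {"Danny Granger"},
    "nba": {"National Basketball Association", "NBA", "the NBA"},
    "nfl": {"National Football League", "NFL"},
}
index = build_index(alias_table)
nba_candidates = candidates("the NBA", alias_table, index)
granger_candidates = candidates("Granger", alias_table, index, tau=0.3)
```

In this toy table, "Granger" only reaches "Danny Granger" once τ is lowered, because half of the alias's idf mass is unmatched.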
Next, we model the joint similarity score (F) of the relation tuple $\langle a_1, rel, a_2 \rangle$ with each entity pair combination $\langle e_1, e_2 \rangle$ in two ways. One is a naive method which only considers the similarity between arguments and corresponding entities:

$$F(a_1, e_1, a_2, e_2, rel) = sim(e_1, a_1) \times sim(e_2, a_2). \tag{4}$$

The other method takes predicate paths between $e_1$ and $e_2$ into consideration. Let $\vec{w}$ be the word vector of rel, and $\vec{p}$ be a path of predicates connecting $e_1$ and $e_2$ in at most 2 hops. Here we say two entities $e_1$ and $e_2$ are connected in 1 hop if there exists a predicate p such that $p(e_1, e_2)$ (or $p(e_2, e_1)$) is in the knowledge base.
Similarly, $e_1$ and $e_2$ are connected in 2 hops if there exist two predicates $p_1$, $p_2$ and a transition entity $e'$, such that $p_1(e_1, e')$ (or $p_1(e', e_1)$) and $p_2(e', e_2)$ (or $p_2(e_2, e')$) are in the knowledge base.
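The 1-hop and 2-hop connectivity just defined can be enumerated with a small adjacency map. This is a sketch under the assumption that facts are stored as (predicate, subject, object) triples; the function name and the toy facts are hypothetical.

```python
from collections import defaultdict


def predicate_paths(facts, e1, e2):
    """Return predicate paths linking e1 and e2 in at most two hops.

    Edge direction is ignored, matching the "p(e1, e2) or p(e2, e1)"
    wording of the definition.
    """
    neighbours = defaultdict(set)  # entity -> {(predicate, other entity)}
    for p, s, o in facts:
        neighbours[s].add((p, o))
        neighbours[o].add((p, s))

    paths = set()
    for p1, mid in list(neighbours[e1]):
        if mid == e2:
            paths.add((p1,))          # connected in 1 hop
        elif mid != e1:
            for p2, end in neighbours[mid]:
                if end == e2:
                    paths.add((p1, p2))  # connected in 2 hops via mid
    return paths


facts = {
    ("plays_for", "tom_brady", "patriots"),
    ("member_of", "patriots", "nfl"),
}
two_hop = predicate_paths(facts, "tom_brady", "nfl")
one_hop = predicate_paths(facts, "tom_brady", "patriots")
```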
We hence define the relatedness between $\vec{p}$ and $\vec{w}$ in the form of a conditional probability $P(\vec{p} \mid \vec{w})$ according to the Naive Bayes model, and we follow IBM alignment Model 1 (Yao and Van Durme, 2014) to calculate the conditional probability between predicates and relation words. Based on the information above, we define a richer joint similarity score, considering all valid paths between $e_1$ and $e_2$:

$$F(a_1, e_1, a_2, e_2, rel) = sim(e_1, a_1) \times sim(e_2, a_2) \times \sum_{\vec{p}} P(\vec{p} \mid \vec{w}). \tag{6}$$

Due to the multiplications, the value of $P(\vec{p} \mid \vec{w})$ varies a lot among different entity pair candidates. The large deviation makes $P(\vec{p} \mid \vec{w})$ the most important term in Eq. (6), especially when none of the predicate paths is similar enough to the relation words. Therefore, we trust the factor $P(\vec{p} \mid \vec{w})$ only when there exists a similar predicate path. In practice, we use a threshold ρ to control whether to use Eq. (6) or Eq. (4). We call this an ensemble method. For each case of entity linking, if there exists one candidate entity pair satisfying $P(\vec{p} \mid \vec{w}) > \rho$, we use Eq. (6); otherwise we fall back to the naive method for the current case.
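The switch between Eq. (4) and Eq. (6) can be sketched as below. The similarity values and path probabilities are assumed to be precomputed, and the dictionary layout, function names and the default ρ are hypothetical choices for illustration.

```python
def naive_score(sim1, sim2):
    """Eq. (4): product of the two argument-entity similarities."""
    return sim1 * sim2


def path_score(sim1, sim2, path_probs):
    """Eq. (6): Eq. (4) scaled by the summed path probabilities."""
    return sim1 * sim2 * sum(path_probs)


def link(candidates, rho=0.1):
    """Pick the best entity pair for one relation tuple.

    candidates: list of dicts with keys 'pair', 'sim1', 'sim2' and
    'path_probs' (one probability per predicate path of that pair).
    Returns the chosen pair and whether path evidence was used.
    """
    use_paths = any(p > rho for c in candidates for p in c["path_probs"])

    def score(c):
        if use_paths:
            return path_score(c["sim1"], c["sim2"], c["path_probs"])
        return naive_score(c["sim1"], c["sim2"])

    return max(candidates, key=score)["pair"], use_paths


cands = [
    {"pair": ("A", "B"), "sim1": 0.9, "sim2": 0.9, "path_probs": []},
    {"pair": ("C", "D"), "sim1": 0.6, "sim2": 0.6, "path_probs": [0.5]},
]
with_paths = link(cands, rho=0.1)
without_paths = link(cands, rho=0.9)
```

When some path clears ρ, the pair with path evidence wins despite lower string similarity; otherwise the naive product decides.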

Relation Grouping
In the step of relation grouping, linked tuples with similar relation patterns form a group. Each linked tuple belongs to one unique group.
The idea is to simplify relation patterns by syntactic transformations. If two patterns share the same simplified pattern, we treat them as equivalent and put them into one group. First, since adjectives, adverbs and modal verbs rarely change the type distribution of the arguments of a relation, we remove these words from a pattern. Second, many relations from Open IE contain verbs that come in different tenses; we transform all tenses into the present tense. In addition, passive voice in a pattern, if any, is kept in the transformed pattern. A simple example below shows a group of relations:

    ⟨X, resign from, Y⟩
    ⟨X, had resigned from, Y⟩
    ⟨X, finally resigned from, Y⟩

All linked tuples with the same simplified pattern form a group. This pattern is selected as the representative pattern, like the pattern "resign from" in the above example.
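A toy version of this simplification is sketched below. It replaces part-of-speech tagging and lemmatization with small hand-written word lists, which are purely hypothetical; a real system would rely on proper taggers, and this sketch would wrongly drop "had" when it is the main verb.

```python
from collections import defaultdict

# Hypothetical lexicons standing in for a POS tagger and a lemmatizer.
DROPPED = {"finally", "really", "also",          # adverbs
           "can", "could", "may", "might",       # modal verbs
           "must", "should", "will", "would"}
PERFECT_AUX = {"have", "has", "had"}             # tense markers
BE_FORMS = {"be", "is", "are", "was", "were", "been", "being"}
LEMMAS = {"resigned": "resign", "resigns": "resign",
          "played": "play", "plays": "play"}


def simplify(pattern):
    """Strip modifiers, normalise tense, and keep passive voice."""
    simplified, after_be = [], False
    for token in pattern.lower().split():
        if token in DROPPED or token in PERFECT_AUX:
            continue
        if token in BE_FORMS:
            simplified.append("be")
            after_be = True
            continue
        # A participle right after "be" marks a passive and is left as is.
        simplified.append(token if after_be else LEMMAS.get(token, token))
        after_be = False
    return " ".join(simplified)


def group_patterns(patterns):
    """Group raw patterns under their simplified representative pattern."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[simplify(pattern)].append(pattern)
    return dict(groups)


groups = group_patterns([
    "resign from", "had resigned from", "finally resigned from",
    "was founded by",
])
```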

Schema Ranking
Given a relation group, the schema ranking step produces a ranked list of relation schemas with two constraints. Taking "play in" as an example, the ideal schemas will contain the pairs ⟨actor, film⟩ and ⟨athlete, sports league⟩.

Each linked tuple $\langle e_1, rel, e_2 \rangle$ supports the type pair $\langle t_1, t_2 \rangle$ where $(e_1, t_1), (e_2, t_2) \in IsA$ in the knowledge base. We treat these pairs equally, since it is not trivial to tell which type is more related to the argument given the relation tuple as context. Combining all tuples in one group, we define the support of a type pair tp in a group, $sup_r(tp)$ (using the representative pattern r to stand for the group), as the linked tuples in the group that support tp.

A simple intuition is to rank schemas by the size of the support. Since one entity belongs to multiple types, relation schemas with general types will be ranked higher. However, two different schemas may share the same support. For instance, given the relation "X die in Y", and supposing that the Open IE extractions and the entity linking step return correct results, the schemas ⟨person, location⟩ and ⟨deceased person, location⟩ have identical supports. The latter gives a more concrete representation of the relation, because deceased person covers fewer entities than person in the knowledge base.
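Counting support can be sketched as follows, assuming each linked tuple in a group has already been reduced to an entity pair and that the IsA relation is available as a mapping from entity to its types. All identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import product


def supports(linked_pairs, types_of):
    """Map each type pair to the set of entity pairs that support it.

    Every combination of a type of e1 with a type of e2 is credited
    equally, as no type is preferred over another at this stage.
    """
    sup = defaultdict(set)
    for e1, e2 in linked_pairs:
        for t1, t2 in product(types_of[e1], types_of[e2]):
            sup[(t1, t2)].add((e1, e2))
    return sup


# Toy "die in" group: every subject is both a person and a deceased person.
types_of = {
    "p1": {"person", "deceased_person"},
    "p2": {"person", "deceased_person"},
    "loc1": {"location"},
    "loc2": {"location"},
}
sup = supports([("p1", "loc1"), ("p2", "loc2")], types_of)
```

Here ⟨person, location⟩ and ⟨deceased_person, location⟩ end up with identical supports, which is the tie the type taxonomy is meant to break.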
Therefore, the schemas cannot be ranked by using the support alone. Next, we aim to extract the subsumption relations between types in the knowledge base, building the taxonomy of types.
We first define cover(t) as the set of all entities in type t. Intuitively, type $t_1$ is subsumed by $t_2$ if all entities in $t_1$ also belong to $t_2$, that is, $cover(t_1) \subseteq cover(t_2)$. This uses the idea of strict set inclusion. For example, we can learn that the type person subsumes types such as actor, politician and deceased person. However, strict set inclusion does not always hold in the knowledge base. For example, entities of type award winner are mostly persons, but there are still some organizations among them. The strict method fails to find the subsumption relation between award winner and person, although this subsumption actually holds with high confidence.
To resolve this problem, we use a relaxed set inclusion, where the set $cover(t_1)$ can be a subset of another set $cover(t_2)$ to a certain degree. We define the degree of the subsumption, $deg(t_1 \subseteq t_2)$, as the ratio between the numbers of entities in the two sets. If $deg(t_1 \subseteq t_2) > \epsilon$, then $t_1$ is subsumed by $t_2$, where ε is a confidence parameter determined by tuning. By scanning all types in the knowledge base, all subsumption relations with enough confidence are extracted, forming our type taxonomy.
With a type hierarchy computed by the above relaxed set inclusion, we say that a schema $\langle t_1, t_2 \rangle$ subsumes another schema $\langle t_3, t_4 \rangle$ if i) $t_1$ subsumes $t_3$ and $t_2$ subsumes $t_4$; ii) $t_1$ subsumes $t_3$ and $t_2 = t_4$; or iii) $t_2$ subsumes $t_4$ and $t_1 = t_3$. If a schema (type pair) $tp_1$ subsumes another schema $tp_2$, and their supports ($|sup_r(tp)|$) are approximately equal, we give the more specific schema $tp_2$ a higher rank in the output list. Whether two supports are roughly equal is controlled by a threshold λ determined in the experiments.
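The relaxed inclusion test can be sketched as below. The degree is assumed here to be |cover(t1) ∩ cover(t2)| / |cover(t1)|, which is one natural reading of the ratio described above rather than a confirmed formula; the function names, the toy data and the value of ε are hypothetical.

```python
from collections import defaultdict


def build_cover(is_a):
    """cover(t): the set of entities belonging to type t."""
    cover = defaultdict(set)
    for entity, t in is_a:
        cover[t].add(entity)
    return cover


def degree(cover, t1, t2):
    """Assumed degree to which t1 is included in t2."""
    if not cover[t1]:
        return 0.0
    return len(cover[t1] & cover[t2]) / len(cover[t1])


def taxonomy(cover, eps):
    """All (specific, general) type pairs whose degree exceeds eps."""
    types = list(cover)
    return {(t1, t2) for t1 in types for t2 in types
            if t1 != t2 and degree(cover, t1, t2) > eps}


# Toy data: 20 persons, 9 of them award winners, plus one winning organization.
is_a = {(f"p{i}", "person") for i in range(20)}
is_a |= {(f"p{i}", "award_winner") for i in range(9)}
is_a |= {(f"org{i}", "organization") for i in range(5)}
is_a.add(("org0", "award_winner"))

cover = build_cover(is_a)
strict = cover["award_winner"] <= cover["person"]
relaxed = taxonomy(cover, eps=0.8)
```

Strict inclusion fails because of the single organization, while the relaxed test still places award_winner under person.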
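One way to apply the three subsumption cases and the tie-breaking rule is sketched below. The "roughly equal" test is an assumption (ratio of the smaller to the larger support at least λ), as is the insertion strategy; the names and the default λ are hypothetical.

```python
def schema_subsumes(tp1, tp2, subsumed_by):
    """True if schema tp1 is more general than tp2.

    subsumed_by holds (specific_type, general_type) pairs from the taxonomy.
    """
    (t1, t2), (t3, t4) = tp1, tp2
    left = (t3, t1) in subsumed_by    # t1 subsumes t3
    right = (t4, t2) in subsumed_by   # t2 subsumes t4
    return (left and right) or (left and t2 == t4) or (right and t1 == t3)


def roughly_equal(a, b, lam):
    """Assumed test: the smaller support is at least lam of the larger."""
    return max(a, b) > 0 and min(a, b) / max(a, b) >= lam


def rerank(support_size, subsumed_by, lam=0.9):
    """Rank by support, letting specific schemas overtake general ones."""
    ranked = []
    for tp in sorted(support_size, key=support_size.get, reverse=True):
        position = len(ranked)
        for i, other in enumerate(ranked):
            if (schema_subsumes(other, tp, subsumed_by)
                    and roughly_equal(support_size[other],
                                      support_size[tp], lam)):
                position = i  # place tp just before its generalisation
                break
        ranked.insert(position, tp)
    return ranked


support_size = {
    ("person", "location"): 100,
    ("deceased_person", "location"): 100,
    ("person", "person"): 40,
}
subsumed_by = {("deceased_person", "person")}
ranking = rerank(support_size, subsumed_by)
```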

Evaluation
Freebase (Bollacker et al., 2008) is a collaboratively generated knowledge base, which contains more than 40 million entities and more than 1,700 real types. In our experiment, we use the 16 Feb. 2014 dump of Freebase as the knowledge base. ReVerb (Fader et al., 2011) is an Open IE system which aims to extract verb-based relation instances from web corpora. The released ReVerb dataset contains more than 14 million high-quality relation tuples. We observed that in ReVerb, some arguments are unlikely to be entities in Freebase, for example ⟨Metro Manila, consists of, 12 cities⟩, where the object argument is not an entity but a type. Since types are usually represented by lowercase common words, we remove a tuple if one of its arguments is lowercase, or if it is made up completely of common words in WordNet. In addition, because dates and times such as "Jan. 16th, 1981" often occur in the object argument while Freebase does not have such specific dates as entities, we use SUTime (Chang and Manning, 2012) to recognize dates as virtual entities. After cleaning, the system collects 3,234,208 tuples and 171,168 relation groups.
We first evaluate the results of entity linking. We randomly pick 200 relation instances from ReVerb and manually label their arguments with Freebase entities. For both the naive and the ensemble strategy, we evaluate precision, recall, F1 and MRR on the labeled set. An output entity pair is correct if and only if both arguments are correctly linked. Experimental results are listed in Table 1.

For the evaluation of relation schemas, we first randomly pick 50 binary relations with support larger than 500 from the system. For each relation, we select the 100 type pairs with the largest support for evaluation. We assigned 3 human annotators to label the fitness score of each type pair for the relation. The labeled score ranges from 0 to 3. We then merge these 3 label sets, forming 50 gold standard rankings. When evaluating a relation schema list from our system, we calculate the MRR score (Liu, 2009) with respect to the top schemas in the gold rankings.
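For reference, MRR over a set of relations can be computed as in the sketch below. It assumes the gold standard for each relation has been reduced to a set of top schemas; how the three annotators' labels are merged into that set is not specified here, and the function names are hypothetical.

```python
def reciprocal_rank(predicted, gold_top):
    """1 / rank of the first predicted schema found in the gold set."""
    for rank, schema in enumerate(predicted, start=1):
        if schema in gold_top:
            return 1.0 / rank
    return 0.0  # no gold schema was returned


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (predicted list, gold set) pairs."""
    return sum(reciprocal_rank(p, g) for p, g in runs) / len(runs)


runs = [
    (["a", "b", "c"], {"b"}),  # first hit at rank 2
    (["x", "y"], {"x"}),       # first hit at rank 1
    (["m"], {"z"}),            # no hit
]
mrr = mean_reciprocal_rank(runs)
```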
For comparison, we use Pointwise Mutual Information (Church and Hanks, 1990) as our baseline model, which is used in other selectional preference tasks (Resnik, 1996). We define the association score between a relation and a type pair as:

$$PMI(r, tp) = p(r, tp) \log \frac{p(r, tp)}{p(r, *)\, p(*, tp)}$$

where p(r, tp) is the joint probability of the relation and the type pair in the whole linked tuple set, and * stands for any relation or type pair. Table 2 shows the MRR scores of both the baseline model (PMI) and our approach. As the results show, our approach improves the MRR score by 10.1%. Finally, Table 3 shows some example binary relations and the schemas inferred for them by our system. We can see that with a well-defined type hierarchy, our system is able to extract both coarse-grained and fine-grained type information from entities, resulting in informative type lists.
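The baseline score can be computed from co-occurrence counts as sketched below, assuming the linked tuple set has been flattened into (relation, type pair) observations and that probabilities are estimated by relative frequency. The function name and toy data are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(observations):
    """Weighted PMI for every observed (relation, type pair) combination."""
    joint = Counter(observations)
    total = sum(joint.values())
    rel_count, tp_count = Counter(), Counter()
    for (r, tp), c in joint.items():
        rel_count[r] += c   # marginal count for p(r, *)
        tp_count[tp] += c   # marginal count for p(*, tp)

    scores = {}
    for (r, tp), c in joint.items():
        p_joint = c / total
        p_rel = rel_count[r] / total
        p_tp = tp_count[tp] / total
        scores[(r, tp)] = p_joint * math.log(p_joint / (p_rel * p_tp))
    return scores


AF = ("actor", "film")
PL = ("person", "location")
observations = [("play in", AF), ("play in", AF),
                ("play in", PL), ("die in", PL)]
scores = pmi_scores(observations)
```

In this toy set, ⟨actor, film⟩ scores above ⟨person, location⟩ for "play in" because the latter type pair is shared with "die in", which illustrates why PMI penalises types common to several relations.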

Conclusion
In summary, our work describes a data-driven approach to relation schema inference. By maximizing the support of both arguments simultaneously, our system is able to generate human-readable type pairs for a binary relation from Open IE systems. Our experiments show that the top-ranked relation schemas for each relation are accurate according to human judges. The proposed framework can be integrated with future Open IE systems.