Recognizing Textual Entailment Using Probabilistic Inference

Recognizing Textual Entailment (RTE) plays an important role in NLP applications such as question answering and information retrieval. Recent work has explored “deep” representations of text, such as discourse commitments or strict logic. However, these representations are either inconvenient for inference or lose information in translation. To overcome these limitations, we propose to use predicate-argument structures to represent the discourse commitments extracted from text. At the same time, with the help of the YAGO knowledge base, we borrow the distant supervision technique to mine implicit facts from the text. We also construct a probabilistic network over all the facts and conduct inference to judge the confidence of each fact for RTE. Experimental results show that the proposed method achieves competitive results compared with previous work.


Introduction
In natural language, there are commonly many ways to express the same or a similar meaning. To discover such different expressions, the Recognising Textual Entailment (RTE) task was proposed to judge whether the meaning of one text (denoted H) can be inferred (entailed) from another (denoted T) (Dagan et al., 2006). For many natural language processing applications that must handle the diversity of natural language, such as question answering and information retrieval, recognising textual entailment is a critical step.
The PASCAL Recognizing Textual Entailment (RTE) Challenges (Dagan et al., 2006) have witnessed a variety of excellent systems for recognizing textual entailment instances. These systems mainly employ "shallow" techniques, including heuristics, term overlap, and syntactic dependencies (Vanderwende et al., 2006; Jijkoun and de Rijke, 2005; Malakasiotis and Androutsopoulos, 2007; Haghighi et al., 2005). As Hickl (2008) stated, shallow approaches do not work well for long sentences because they miss underlying information that must be mined from the surface-level expression.
Recently, some deep techniques have been developed to mine the facts latent in the text. Hickl (2008) proposed the concept of discourse commitments, which can be seen as the set of propositions inferred from the text, and used a series of syntax-level and semantic-level rules to extract the commitments from T-H pairs. The RTE task is then reduced to identifying the commitments from T that are most likely to support the inference of the commitments from H. The work of Hickl (2008) shows that a deep understanding of text is critical to RTE performance and that discourse commitments can serve as a good medium for understanding text. However, a limitation of that work is that the extracted discourse commitments still come from the original text and do not explore the implicit meaning behind it.
Another kind of deep method first translates natural language into a logic representation and then conducts strict logic inference on that representation (de Salvo Braz et al., 2006; Tatu and Moldovan, 2006; Wotzlaw and Coote, 2013). Through logic inference, some implicit knowledge behind the text can be mined. However, it is not easy to translate natural language text into formal logic expressions, and the translation process inevitably suffers from great information loss.
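Based on the analysis above, we propose to use predicate-argument structures to represent the discourse commitments extracted from the text and, with the help of YAGO, to borrow the distant supervision technique to mine the implicit facts in the text. To judge the confidence of the new facts, we construct a probabilistic network with all the facts and adopt the Markov Logic Network (MLN) to calculate the probability of each new fact, which can then be used to recognize textual entailment.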

Our RTE System
To make full use of the underlying information in sentences and lessen the effect of the vagueness of natural language, we design an RTE system composed of three stages, as shown in Figure 1.
First, we decompose the sentences in the T-H pair into a series of discourse commitments, as Hickl (2008) did. Since the syntax of these commitments is very simple, we can directly transform them to predicates (or three-argument tuples). Then we use YAGO, a large semantic database that includes several thousand relations, to provide distant supervision. The predicates in T are matched to YAGO facts according to certain metrics. Finally, we use the Markov Logic Network (MLN) (Richardson and Domingos, 2006) to infer the correctness of the predicates in H. The MLN is constructed from the inference rules ("soft" rules) generated by the AMIE system (Galárraga et al., 2013) on YAGO. Each rule has a weight, which is trained on real-world facts. Using this framework, we can apply "soft" logic to the textual entailment recognition task.

Extracting Predicates from Sentences
The discourse commitment approach proposed by Hickl (2008), which serves as our baseline system, decomposes a sentence into a series of shorter and simpler sentences that together contain all the information of the original sentence. One of its advantages is that it can use many syntax-level and semantic-level rules to extract the underlying information of a sentence. For example, the following T-H pair can be decomposed as shown in Figure 2. The discourse commitments of T contain the information that Ayrton Senna married in 1998, which is not easy to capture with "shallow" methods.
T : Ayrton Senna was married to a doctor who lives in Austin, the capital of Texas, in 1998.
H : Ayrton Senna lives in Texas.
Since we need to infer new facts from the extracted commitments, we transform all the commitments to predicates. For example, the commitments in Figure 2 can be transformed to the predicates (or triples) shown in Figure 3. We use ReVerb (Fader et al., 2011) to extract the triples (a predicate plus two arguments). To make the inference process in the next section more convenient, we require that all arguments be NPs. Therefore, we check whether each argument in a triple contains or overlaps with any of the NPs and, if so, replace it with that NP; the predicates are then successfully extracted.
Figure 3: Text Predicates Example
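The NP replacement step can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, overlap is measured by shared lowercase tokens, and ties are broken in favor of the first NP, none of which is specified in the text. The triple and NP list are hand-written stand-ins for the output of a triple extractor and an NP chunker.

```python
def align_to_np(argument, noun_phrases):
    """Return the NP sharing the most tokens with the argument, or None if none overlaps."""
    arg_tokens = set(argument.lower().split())
    best_np, best_overlap = None, 0
    for np_text in noun_phrases:
        overlap = len(arg_tokens & set(np_text.lower().split()))
        if overlap > best_overlap:  # strict '>' keeps the first NP on ties (assumption)
            best_np, best_overlap = np_text, overlap
    return best_np


def normalize_triple(triple, noun_phrases):
    """Replace both arguments of (arg1, predicate, arg2) with overlapping NPs.

    Returns None when either argument overlaps with no NP.
    """
    arg1, predicate, arg2 = triple
    np1 = align_to_np(arg1, noun_phrases)
    np2 = align_to_np(arg2, noun_phrases)
    if np1 is None or np2 is None:
        return None
    return (np1, predicate, np2)


# Hand-written example inputs (hypothetical, modeled on the T sentence above).
noun_phrases = ["Ayrton Senna", "a doctor", "Austin", "Texas"]
triple = ("Ayrton Senna", "was married to", "a doctor who lives in Austin")
print(normalize_triple(triple, noun_phrases))
# prints ('Ayrton Senna', 'was married to', 'a doctor')
```

Here the second argument shares two tokens with "a doctor" and one with "Austin", so the longer match is chosen.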

Distant supervision with YAGO
The goal of distant supervision is to use the knowledge in YAGO (Mahdisoltani et al., 2014) to help the textual entailment recognition task. The facts in YAGO have various types of connections with each other, and we believe these connections are very useful for RTE. Therefore, the predicates in the T-H pair should be matched to YAGO facts in order to take advantage of this connection information.
Since YAGO is very large, common predicates can easily be matched to YAGO facts in most cases. However, YAGO cannot contain every predicate in T. We run the DIRT system (Lin and Pantel, 2001) on 1GB of text randomly sampled from the Gigaword corpus and, for each predicate, choose the top-10 most similar predicates as its synonymous predicates. If the original predicate cannot be found in YAGO, we instead check its top-10 similar predicates. If we still cannot find a match, the predicate has very little connection with other predicates and cannot be supervised by YAGO.
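The matching procedure with its back-off to similar predicates can be sketched as below. The function name, the data structures, and the example entries are assumptions made for illustration; in particular, the similarity lists are hand-written stand-ins for DIRT output, and the relation set is a tiny illustrative subset rather than the actual YAGO inventory.

```python
def match_predicate(predicate, yago_relations, similar_predicates, top_k=10):
    """Return the YAGO relation matched to a text predicate, or None.

    First try the predicate itself, then its top_k most similar predicates
    in ranked order; None means YAGO cannot supervise this predicate.
    """
    if predicate in yago_relations:
        return predicate
    for candidate in similar_predicates.get(predicate, [])[:top_k]:
        if candidate in yago_relations:
            return candidate
    return None


# Illustrative data (hypothetical): a few relation names and one similarity list.
yago_relations = {"isMarriedTo", "livesIn", "hasChild"}
similar_predicates = {"wed": ["tieTheKnotWith", "isMarriedTo"]}

print(match_predicate("livesIn", yago_relations, similar_predicates))  # direct match
print(match_predicate("wed", yago_relations, similar_predicates))      # back-off match
print(match_predicate("plays", yago_relations, similar_predicates))    # no match
```

Keeping the similarity lists ranked means the first hit is also the most similar predicate that YAGO knows about.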

Probabilistic Inference
The goal of the MLN (Richardson and Domingos, 2006) is to implement probabilistic inference, or "soft" logic inference. An MLN is constructed from a base of inference rules. Each rule has a weight that must be trained on real-world facts. We use AMIE to mine inference rules from YAGO.
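For reference, the joint distribution that an MLN defines over possible worlds, as given by Richardson and Domingos (2006), can be written as follows. The symbols are notation introduced here for illustration: x is a truth assignment to all ground atoms, w_i is the weight of the i-th rule, n_i(x) is the number of true groundings of that rule in x, and Z is the normalizing constant.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_{i} w_i \, n_i(x') \Big)
```

A world that violates a rule with a large weight becomes less probable but not impossible, which is what makes the inference "soft".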
AMIE (Galárraga et al., 2013) is a state-of-the-art inference rule mining system. The motivation of AMIE is that KBs themselves often already contain enough information to derive and add new facts. If, for example, a KB contains the fact that a child has a mother, then the mother's husband is most likely the father.
AMIE can mine such inference rules from large KBs. The inference rules can be used directly to construct the Markov logic network. In addition, the process of mining inference rules is quite efficient, which is very helpful for our RTE task.
We use AMIE only to extract the inference rules. Once the inference rules are prepared, we can construct an MLN. We give the related facts in YAGO to the MLN, and the weight of each inference rule is tuned to best fit these facts. After the weights are trained, the MLN is ready to use. Given the predicates in T, we first select all the related rules to construct a simple MLN and then give the MLN some facts. The MLN then calculates the probabilities of the unknown new facts. The arguments of the new facts are the permutations of all the ground atoms (or entities). For example, if we give the facts "hasChild (Clinton, Chelsea)" and "IsMarriedto (Clinton, Hillary)", the MLN will output the probability of "hasChild (Clinton, Hillary)", "hasChild (Hillary, Chelsea)", "IsMarriedto (Clinton, Chelsea)", etc. The probability of "hasChild (Hillary, Chelsea)" is expected to be the highest, so it is the most likely to be true. The MLN construction and inference can be implemented by

Table 1
| Approach | Accuracy |
| --- | --- |
| Term overlap (Zanzotto and Moschitti, 2006) | 67.50% |
| Graph Matching (MacCartney et al., 2006) | 65.33% |
| Classification-Based (Hickl et al., 2006) | 77.00% |
| Discourse Commitment (Hickl, 2008) | 84.93% |
| Strict logic (Tatu and Moldovan, 2006) | 71.59% |
| Our Framework | 85.16% |
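The Clinton example from the MLN description can be reproduced with a small brute-force sketch. This is not the system used in the paper. It assumes a single hypothetical rule, isMarriedTo(x, y) ∧ hasChild(x, z) ⇒ hasChild(y, z), with a made-up weight of 2.0. It also treats isMarriedTo as closed-world, so only hasChild atoms are queried. Marginals are computed exactly by enumerating every truth assignment, which is feasible only for toy inputs.

```python
import itertools
import math

ENTITIES = ["Clinton", "Hillary", "Chelsea"]
WEIGHT = 2.0  # hypothetical weight; in the paper, weights are trained on YAGO facts

# Evidence. isMarriedTo is closed-world here (a simplifying assumption).
MARRIED = {("Clinton", "Hillary")}
KNOWN_CHILD = {("Clinton", "Chelsea")}

# Query atoms: every hasChild(x, y) grounding that is not evidence.
QUERY = [p for p in itertools.product(ENTITIES, repeat=2) if p not in KNOWN_CHILD]


def satisfied_groundings(has_child):
    """Count true groundings of isMarriedTo(x,y) ^ hasChild(x,z) => hasChild(y,z)."""
    count = 0
    for x, y, z in itertools.product(ENTITIES, repeat=3):
        body = (x, y) in MARRIED and (x, z) in has_child
        if not body or (y, z) in has_child:
            count += 1
    return count


def marginals():
    """Exact marginal of each query atom, enumerating all 2^len(QUERY) worlds."""
    totals = {atom: 0.0 for atom in QUERY}
    z_norm = 0.0
    for values in itertools.product([False, True], repeat=len(QUERY)):
        has_child = set(KNOWN_CHILD)
        has_child.update(atom for atom, v in zip(QUERY, values) if v)
        world_weight = math.exp(WEIGHT * satisfied_groundings(has_child))
        z_norm += world_weight
        for atom, v in zip(QUERY, values):
            if v:
                totals[atom] += world_weight
    return {atom: total / z_norm for atom, total in totals.items()}


probs = marginals()
best = max(probs, key=probs.get)
print(best, round(probs[best], 4))
# prints ('Hillary', 'Chelsea') 0.8808
```

Under these assumptions the marginal of hasChild(Hillary, Chelsea) equals e^2 / (1 + e^2), about 0.88, while atoms untouched by any active rule grounding stay at 0.5. A practical system would replace the enumeration with approximate inference.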

Experiment
We evaluate the performance of our framework for RTE on the PASCAL RTE-2 and RTE-3 datasets, which contain 1600 examples. We use YAGO2 for aligning predicates and mining inference rules. YAGO2 contains more than 940K facts and about 470K entities. We run the AMIE system on YAGO2 only once to obtain all inference rules (more than 1.8K in total). For each T-H pair, we choose only the related inference rules to construct the MLN. A chosen rule must contain at least one predicate that occurs among the predicates of the T-H pair. We use the MLN for inference only when discourse commitment paraphrasing cannot identify a T-H pair as "Entailment"; that is, the MLN serves as a back-off method.
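The rule selection and back-off logic can be sketched as follows. The `Rule` class, the function names, and the 0.5 decision threshold are hypothetical; the text does not state how the MLN probability is turned into a final decision, so the threshold is purely an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A mined inference rule, reduced to the predicate names it mentions."""
    body: tuple  # predicate names in the rule body
    head: str    # predicate name in the rule head

    def predicates(self):
        return set(self.body) | {self.head}


def select_rules(rules, pair_predicates):
    """Keep rules that mention at least one predicate of the T-H pair."""
    return [rule for rule in rules if rule.predicates() & set(pair_predicates)]


def recognize(commitment_says_entailment, mln_probability, threshold=0.5):
    """Back-off decision: consult the MLN only if commitment paraphrasing fails.

    mln_probability is a zero-argument callable so that inference runs lazily.
    The threshold is an assumption, not a value from the paper.
    """
    if commitment_says_entailment:
        return True
    return mln_probability() >= threshold


# Hypothetical rule base and T-H predicates.
rules = [
    Rule(body=("isMarriedTo", "hasChild"), head="hasChild"),
    Rule(body=("wasBornIn",), head="livesIn"),
]
selected = select_rules(rules, {"hasChild"})
print(len(selected))                      # prints 1
print(recognize(False, lambda: 0.88))     # prints True
```

Passing the MLN step as a callable keeps the expensive inference from running for pairs that the commitment stage already accepts.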
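We compare our results with five baseline systems: (1) term overlap (Zanzotto and Moschitti, 2006); (2) graph matching (MacCartney et al., 2006); (3) the classification-based approach (Hickl et al., 2006); (4) discourse commitments (Hickl, 2008); and (5) strict logic (Tatu and Moldovan, 2006). The results are shown in Table 1. Since we only need to judge "Yes" or "No" for each of the 1600 examples, the precision is equal to the recall, so we report only the precision.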
According to Table 1, our framework outperforms the baseline of Hickl (2008), and the difference is significant (Wilcoxon signed-rank test, p < 0.05). The reason is that we have added an inference component to Hickl (2008)'s method. As a result, some T-H pairs that can only be judged by semantic reasoning are now handled correctly by our framework. For instance, T is "Hughes loved his wife, Gracia, and was absolutely obsessed with his little daughter Elicia." and H is "Gracia's daughter is Elicia." It is not easy for the earlier baselines to recognize this entailment, but our framework can easily recognize it as "true". In this way, our framework achieves a higher result.

Related work
The Recognizing Textual Entailment (RTE) task has been widely studied in previous work. First, there are methods based on similarity and overlap (Malakasiotis and Androutsopoulos, 2007; Jijkoun and de Rijke, 2005; Wan et al., 2006). Methods of this kind can help solve the paraphrase recognition problem, which is a subset of RTE. Another important similarity-based method is the tree kernel (Zanzotto and Moschitti, 2006), which relies on the cross-pair similarity between two pairs (T′, H′) and (T″, H″).
Second, some approaches extract the knowledge in the T-H pair and check whether the knowledge in T contains the knowledge in H. Hickl (2008) transformed the T-H pair into discourse commitments, reducing the RTE task to identifying the commitments from T that support the inference of H. Other works map the text to logical meaning representations and then apply strict logic entailment methods, possibly by invoking theorem provers.
Third, some works make use of statistical classifiers that leverage a wide variety of features. Each T-H pair is represented by a feature vector (f_1, f_2, ..., f_m). The feature vector contains the scores of different similarity measures applied to the pair, and possibly other features.
There are also other works on RTE based on predicate-argument representations with Markov Logic, such as Rios et al. (2014) and Beltagy et al. (2013). However, they did not use discourse commitments to extract predicate-argument triples, which may lead to severe information loss.

Conclusion
This paper introduced a new framework for the Recognizing Textual Entailment task. The framework makes full use of the Markov logic network for probabilistic inference. We hold that probabilistic inference is better suited to this task than strict logic methods, since transforming natural language into strict logic form can lose a lot of information, which makes it extremely hard for theorem provers to perform well.
In addition, we use the YAGO database for distant supervision. The predicates extracted from the T-H pair are first aligned with YAGO. If the alignment succeeds, the inference procedure of the MLN becomes much more accurate. The inference rules for constructing the MLN are also extracted from the YAGO database, using the AMIE system. The framework can correctly recognize entailment in T-H pairs that must be judged using inference. This is our improvement over previous work.