Acquisition of Phrase Correspondences Using Natural Deduction Proofs

How to identify, extract, and use phrasal knowledge is a crucial problem for the task of Recognizing Textual Entailment (RTE). To solve this problem, we propose a method for detecting paraphrases via natural deduction proofs of semantic relations between sentence pairs. Our solution relies on a graph reformulation of partial variable unifications and an algorithm that induces subgraph alignments between meaning representations. Experiments show that our method can automatically detect various paraphrases that are absent from existing paraphrase databases. In addition, the detection of paraphrases using proof information improves the accuracy of RTE tasks.


Introduction
Recognizing Textual Entailment (RTE) is a challenging natural language processing task that aims to judge whether one text fragment logically follows from another text fragment (Dagan et al., 2013). Logic-based approaches have been successful in representing the meanings of complex sentences, ultimately having a positive impact on RTE (Bjerva et al., 2014; Beltagy et al., 2014; Mineshima et al., 2015; Abzianidze, 2015, 2016). Although logic-based approaches succeed in capturing the meanings of functional or logical words, it is difficult to capture the meanings of content words or phrases using genuine logical inference alone. Accounting for lexical relations between content words or phrases via logical inference therefore remains a crucial problem. To solve this problem, previous logic-based approaches use knowledge databases such as WordNet (Miller, 1995) to identify lexical relations within a sentence pair. While this solution has been successful in handling word-level paraphrases, its extension to phrase-level semantic relations is still an unsolved problem. There are three main difficulties that prevent an effective identification and use of phrasal linguistic knowledge.
The first difficulty is the presence of out-of-context phrase relations in popular databases such as the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013). PPDB may suggest paraphrases that fit neither the context of the relevant text segments nor their semantic structure.
The second difficulty is finding semantic phrase correspondences between the relevant text segments. Typical approaches only rely on surface (Beltagy et al., 2013) or syntactic correspondences (Arase and Tsujii, 2017), often producing inaccurate alignments that significantly impact our inference capabilities. Instead, a mechanism to compute semantic phrase correspondences could potentially produce, if available, more coherent phrase pairs and solve the recurring issue of discontinuity.
The third difficulty is the intrinsic lack of coverage of databases for logical inference despite their large size. Whereas there is a relatively small number of possible word-to-word correspondences and thus their semantic relations can be enumerated, the same is not true for all phrase pairs that might be of interest. One alternative is to use functions of infinite domain (e.g., cosine similarity) between phrase representations (Tian et al., 2016), but these techniques are still under development, and we have not seen definitive successful applications when combined with logic systems.
In this study, we tackle these three problems. The contributions of this paper are summarized as follows: First, we propose a new method of detecting phrase correspondences through natural deduction proofs of semantic relations for a given sentence pair. Second, we show that our method automatically extracts various paraphrases that compensate for a shortage in previous paraphrase databases. Experiments show that extracted paraphrases using proof information improve the accuracy of RTE tasks.

Related Work
In this section, we review previous logical inference systems that are combined with lexical knowledge. The RTE system developed by Abzianidze (2016) uses WordNet as axioms and adds missing knowledge manually from the training dataset; however, this technique requires considerable human effort and is not extended to handle phrasal knowledge. Another RTE system uses an on-the-fly axiom injection mechanism guided by a natural deduction theorem prover. Pairs of unprovable sub-goals and plausible single premises are identified by means of a variable unification routine, and then linguistic relations between their logical predicates are checked using lexical knowledge such as WordNet and VerbOcean (Chklovski and Pantel, 2004). However, this mechanism is limited to capturing word-to-word relations within a sentence pair. Bjerva et al. (2014) propose an RTE system where WordNet relations are used as axioms for word-to-word knowledge in theorem proving. For phrasal knowledge, PPDB is used to rephrase an input sentence pair instead of translating paraphrases into axioms. However, this solution ignores logical contexts that might be necessary when applying phrasal knowledge. Moreover, it does not apply to discontinuous phrases. Beltagy et al. (2016) use WordNet and PPDB as lexical knowledge in their RTE system. To increase its coverage of phrasal knowledge, the system combines a resolution strategy to align clauses and literals in a sentence pair and a statistical classifier to identify their semantic relation. However, this strategy only considers one possible set of alignments between fragments of a sentence pair, possibly causing inaccuracies when there are repetitions of content words and meta-predicates.
In our research, we propose an automatic phrase abduction mechanism to inject phrasal knowledge during the proof construction process. In addition, we consider multiple alignments by backtracking the decisions on variable and predicate unifications, which is a more flexible strategy. We represent logical formulas using graphs, since this is a general formalism that is easy to visualize and analyze. However, we use natural deduction (see Section 3.2) as a proof system instead of Markov Logic Networks for inference. Some research has investigated graph operations for semantic parsing (Reddy et al., 2014, 2016) and abstractive summarization (Liu et al., 2015); we contribute to these ideas by proposing a subgraph mapping algorithm that is useful for performing natural language inferences.
Considerable research efforts have been focused on the identification and extraction of paraphrases. One successful technique is associated with bilingual pivoting (Bannard and Callison-Burch, 2005;Zhao et al., 2008), in which alternative phrase translations are used as paraphrases at a certain probability. However, this technique requires large bilingual parallel corpora; moreover, word alignment errors likely cause noisy paraphrases. Another strategy is to extract phrase pairs from a monolingual paraphrase corpus using alignments between syntactic trees, guided by a linguistically motivated grammar (Arase and Tsujii, 2017). The main difference between these studies and ours is that they typically attempt alignment between words or syntactic trees, whereas we perform alignments between meaning representations, which enables the acquisition of more general paraphrases by distinguishing functional words from content words. This point is important in distinguishing among different semantic relations (e.g., antonyms and synonyms). In addition, word and syntactic alignments potentially ignore coreferences, making it difficult to find relations between many-to-many sentences. Semantic alignments enable this because coreferences must refer to the same variable as the original entity.
3 Logic-based Approach to RTE

Meaning representation
In logic-based approaches to RTE, a text T and a hypothesis H are mapped onto logical formulas T′ and H′. To judge whether T entails H, we check whether T′ ⇒ H′ is a theorem in a logical system.
For meaning representations, we use Neo-Davidsonian event semantics (Parsons, 1990). In this approach, a verb is analyzed as a one-place predicate over events. Both the arguments of a verb and modifiers are linked to events by semantic roles, and the entire sentence is closed by existential quantification over events. For example, (1) is mapped onto (2).
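(1) A girl is skipping rope on a sidewalk.
(2) ∃x1∃x2∃x3∃y1(girl(x1) ∧ rope(x2) ∧ sidewalk(x3) ∧ skip(y1) ∧ (subj(y1) = x1) ∧ (obj(y1) = x2) ∧ on(y1, x3))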
We use x_i as a variable for entities and y_j for events. In this semantics, we represent all content words (e.g., girl and skip) as one-place predicates. Regarding functional words, we represent a preposition like on as a two-place predicate, e.g., on(y1, x3). We also use a small set of semantic roles such as subj and obj as functional terms and use equality (=) to connect an event and its participant, as in subj(y1) = x1. To be precise, the set of atomic formulas A in this event semantics is defined by the rule
A ::= F(t) | G(t, u) | t = u
where F(t) is a one-place predicate (for content words), G(t, u) is a two-place predicate (for prepositions), and t and u are terms. A term is defined as a constant, a variable, or a functional term of the form f(t), where f is a semantic role and t is a term. We call a formula constructed by conjunctions and existential quantifiers a basic formula in event semantics. Thus, the set of basic formulas ϕ in event semantics is defined as:
ϕ ::= A | ϕ ∧ ϕ | ∃t.ϕ
The formula in (2) is an instance of a basic formula, which captures the predicate-argument structure of a sentence. On top of the system of basic formulas, we have a full language of event semantics with negation (¬), disjunction (∨), implication (→), and a universal quantifier (∀). These operators are used to represent additional logical features.
There is a natural correspondence between basic formulas and directed acyclic graphs (DAGs). Figure 1 shows an example. In the graph representation, constants and variables correspond to vertices; both two-place predicates for prepositions (e.g., on(y1, x1)) and functional terms for semantic roles (e.g., subj(y1) = x1) are represented as edges. A one-place predicate F(t) in a logical formula can be represented as a functional relation isa(t, F), where isa is an expression relating a term t and a predicate F represented as a vertex. The isa edges are unlabeled for simplicity.
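The correspondence between basic formulas and graphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of atoms (`"pred"`, `"prep"`, `"role"`) and the function name `atoms_to_edges` are hypothetical, and the analysis of sentence (1) assumes that rope is the object x2 and sidewalk is x3.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source vertex, label, target vertex)


def atoms_to_edges(atoms: List[tuple]) -> List[Edge]:
    """Turn the atomic formulas of a basic formula into labeled edges.

    Hypothetical atom encoding, used only in this sketch:
      ("pred", F, t)     one-place predicate F(t)      -> edge t --isa--> F
      ("prep", G, t, u)  two-place predicate G(t, u)   -> edge t --G--> u
      ("role", f, t, u)  equation f(t) = u             -> edge t --f--> u
    """
    edges: List[Edge] = []
    for atom in atoms:
        kind = atom[0]
        if kind == "pred":
            _, name, term = atom
            edges.append((term, "isa", name))
        elif kind in ("prep", "role"):
            _, label, source, target = atom
            edges.append((source, label, target))
        else:
            raise ValueError(f"unknown atom kind: {kind!r}")
    return edges


# Assumed analysis of "A girl is skipping rope on a sidewalk."
atoms = [
    ("pred", "girl", "x1"),
    ("pred", "rope", "x2"),
    ("pred", "sidewalk", "x3"),
    ("pred", "skip", "y1"),
    ("role", "subj", "y1", "x1"),
    ("role", "obj", "y1", "x2"),
    ("prep", "on", "y1", "x3"),
]
edges = atoms_to_edges(atoms)
```

Variables and predicate names both become vertices here; only the edge label distinguishes an attribute (isa) from a preposition or a semantic role.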

Natural deduction and word abduction
We use the system of natural deduction (Prawitz, 1965; Troelstra and Schwichtenberg, 2000) to capture phrase correspondences from a sentence pair (T, H), following the strategies for word axiom injection developed in prior work, including Yanaka et al. (2017). The sentence pair (T, H) is first mapped to a pair of formulas (T′, H′). T′ is initially set to the premise P, and H′ is set to the goal G to be proved.
If formulas P and G are basic formulas, then the proving strategy is to decompose them into a set of atomic formulas using inference rules for conjunctions and existential quantifiers. The premise P is decomposed into a pool of premises 𝒫 = {p_i(θ_i) | i ∈ {1, ..., m}}, where each p_i(θ_i) is an atomic formula and θ_i is a list of terms appearing in p_i(θ_i). The goal G is also decomposed into a set of sub-goals 𝒢 = {g_j(θ_j) | j ∈ {1, ..., n}}, where θ_j is a list of terms appearing in g_j(θ_j).
The proof is performed by searching for a premise p_i(θ_i) whose predicate matches that of a sub-goal g_j(θ_j). If such a premise is found, then variables in θ_j are unified to those in θ_i and the sub-goal g_j(θ_j) can be removed from 𝒢. If all the sub-goals can be removed, we prove T′ → H′. In the presence of two or more variables with the same predicate, there might be multiple possible variable unifications. Modern theorem provers explore these multiple possibilities in search of a configuration that proves a theorem.
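The search over variable unifications can be illustrated with a small backtracking routine. This is a minimal sketch, not the procedure of an actual theorem prover: atoms are encoded as hypothetical `(predicate, terms)` tuples, equations such as subj(y1) = x2 are flattened to `("subj", ("y1", "x2"))`, and the toy premises and sub-goals are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, terms)


def unify_args(goal_args, prem_args, subst) -> Optional[Dict[str, str]]:
    """Extend subst so that goal terms map onto premise terms, or fail."""
    extended = dict(subst)
    for g, p in zip(goal_args, prem_args):
        # setdefault returns the existing binding if g is already unified.
        if extended.setdefault(g, p) != p:
            return None
    return extended


def prove(goals: List[Atom], premises: List[Atom],
          subst: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Remove sub-goals one by one, backtracking over premise choices."""
    subst = {} if subst is None else subst
    if not goals:
        return subst  # every sub-goal has been removed
    (pred, args), rest = goals[0], goals[1:]
    for p_pred, p_args in premises:
        if p_pred == pred and len(p_args) == len(args):
            extended = unify_args(args, p_args, subst)
            if extended is not None:
                result = prove(rest, premises, extended)
                if result is not None:
                    return result
    return None  # no premise works under the current unifications


# Two premises share the predicate dog, so the first choice must be undone.
premises = [("dog", ("x1",)), ("dog", ("x2",)), ("run", ("y1",)),
            ("subj", ("y1", "x2"))]
goals = [("dog", ("x3",)), ("run", ("y2",)), ("subj", ("y2", "x3"))]
result = prove(goals, premises)
```

In this toy case the first attempt x3 := x1 fails on the subj sub-goal, and the routine backtracks to x3 := x2, which is the kind of alternative configuration that the text describes.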
Sub-goals may remain unproved when T′ does not logically entail H′, i.e., when there are no premise predicates p_i that match g_j. In this case, the system tries word axiom injection, called word abduction. More specifically, if there is a premise p_i(θ_i) whose predicate has a linguistic relation (according to linguistic knowledge) with that of a sub-goal g_j(θ_j), then variables in θ_j are unified with those in θ_i and the sub-goal g_j(θ_j) can be removed from 𝒢. Figure 2 shows an example to illustrate how the system works. To begin with, the input sentence pair (T, H) is mapped onto a pair of formulas, (T′, H′). T′ is initially set as the premise P, and H′ as the goal G. Note that these are basic formulas, and they are thus decomposed into the following sets of formulas 𝒫 and 𝒢, respectively:
𝒫 = {lady(x1), meat(x2), cut(y1), up(y1), precisely(y1), subj(y1) = x1, obj(y1) = x2}
𝒢 = {woman(x3), meat(x4), cut(y2), into(y2, x5), piece(x5), subj(y2) = x3, obj(y2) = x4}

Graph illustration
Steps 1 to 3 in Figure 2 demonstrate the variable unification routine and word axiom injection using graphs. Note that in step 1, all variables in formulas in 𝒫 or 𝒢 are initially different.
In step 2, we run a theorem proving mechanism that uses graph terminal vertices as anchors to unify variables between formulas in 𝒫 and those in 𝒢. The premise meat(x2) in 𝒫 matches the predicate meat of the sub-goal meat(x4) in 𝒢 and the variable unification x4 := x2 is applied (and similarly for the sub-goal cut(y2) in 𝒢 with the variable unification y2 := y1).
In step 3, we use the previous variable unification on y1, the subj edge in 𝒫 and 𝒢, and the axiom ∀x.lady(x) → woman(x) from external knowledge to infer that x3 := x1.
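Steps 1 to 3 can be replayed with a short greedy routine. The sketch below makes several assumptions that go beyond the text: the premise and sub-goal lists are read off by hand from the sentences T and H of Figure 2, the sub-goals are ordered so that the anchors come first, the `KNOWLEDGE` set is a hypothetical stand-in for lookups in resources such as WordNet, and no backtracking is performed.

```python
# Hypothetical lexical knowledge: (premise predicate, goal predicate) pairs.
KNOWLEDGE = {("lady", "woman")}


def abduce(goals, premises):
    """Greedy unification with word abduction (no backtracking)."""
    subst, axioms, unproved = {}, [], []
    for pred, args in goals:
        for p_pred, p_args in premises:
            if len(p_args) != len(args):
                continue
            if p_pred != pred and (p_pred, pred) not in KNOWLEDGE:
                continue
            trial = dict(subst)
            # Accept the premise only if it agrees with earlier unifications.
            if all(trial.setdefault(g, p) == p for g, p in zip(args, p_args)):
                subst = trial
                if p_pred != pred:
                    axioms.append((p_pred, pred))  # injected word axiom
                break
        else:
            unproved.append((pred, args))  # no premise matched this sub-goal
    return subst, axioms, unproved


# T: A lady is cutting up some meat precisely
premises = [("lady", ("x1",)), ("meat", ("x2",)), ("cut", ("y1",)),
            ("up", ("y1",)), ("precisely", ("y1",)),
            ("subj", ("y1", "x1")), ("obj", ("y1", "x2"))]
# H: Some meat is being cut into pieces by a woman
goals = [("meat", ("x4",)), ("cut", ("y2",)),
         ("subj", ("y2", "x3")), ("obj", ("y2", "x4")),
         ("woman", ("x3",)), ("into", ("y2", "x5")), ("piece", ("x5",))]

subst, axioms, unproved = abduce(goals, premises)
```

Under these assumptions the routine ends with x4 := x2, y2 := y1, and x3 := x1, records the word axiom from lady to woman, and leaves the sub-goals that mention x5 unproved.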

Phrase Abduction
There is one critical reason that the word-to-word axiom injection described in Section 3.2 fails to detect phrase-to-phrase correspondences. That is, the natural deduction mechanism decomposes the goal G into atomic sub-goals that are then proved one-by-one (word-by-word), independently of each other except for the variable unification effect. This mechanism is particularly problematic when we attempt to prove phrases that resist decomposition or two-place predicates (e.g., into(x, y)), or when variable unification fails (e.g., due to inaccurate semantics). Thus, we propose a method to detect phrase-to-phrase correspondence through natural deduction proofs.

Phrase pair detection
We detect phrase-to-phrase entailing relations between T and H by finding alignments between the subgraphs of their meaning representations when T′ ⇒ H′ or T′ ⇒ ¬H′ hold. Finding subgraph alignments is a generalization of the subgraph isomorphism problem, which is NP-complete. In this paper, we approximate a solution to this problem by using a combination of a backtracking variable unification and a deterministic graph search on the neighborhood of non-unified variables.
Using our running example in Figure 2, step 4 displays our proposed subgraph alignment. The variable x5 in the graph of 𝒢 cannot be unified with any variable in the graph of 𝒫. This is a very common case in natural language inferences, as there might be concepts in H that are not directly supported by concepts in T. In this research, we propose spanning a subgraph starting at non-unified variables (e.g., x5 in 𝒢) whose boundaries are semantic roles (e.g., subj, obj). Its candidate semantics from 𝒫 are then the attributes of its corresponding unified variables from 𝒢 (e.g., cut up precisely → cut into pieces).

Graph alignments
To formalize this solution we introduce some graph notation. Let V = V_u ∪ V_ū ∪ L be the set of vertices, where V_u is the set of unified variables (e.g., x1, x2, y1), V_ū is the set of non-unified variables (e.g., x5), and L is a set of predicates (e.g., lady, woman). Let E be the set of labeled, directed edges ⟨v, l, v′⟩, where v, v′ ∈ V and the labels l may represent the functional relation isa, a preposition, or a semantic role. We denote the set of two-place predicates for prepositions as PREP and the set of functional terms for semantic roles as ARGS; e.g., ARGS = {subj, obj}. A graph that represents 𝒫 is then a tuple G_P = ⟨V_P, E_P⟩, and similarly, for 𝒢, G_G = ⟨V_G, E_G⟩. We can now define a function to span a subgraph in the neighborhood of non-unified variables v ∈ V_ū^G in the graph of 𝒢. We call a connected set of edges in which no semantic roles appear, i.e., {⟨v, l, v′⟩ | l ∉ ARGS}, a phrase set. Let E(x) be the phrase set in E such that each vertex is connected to x with an incoming or outgoing edge, that is,
E(x) = {⟨v, l, v′⟩ ∈ E | (v = x ∨ v′ = x) ∧ l ∉ ARGS}.

Figure 2: A graph representation of a theorem proving routine on basic formulas and variable unification. Dotted circles represent non-unified variables at each step, whereas edges without labels are attributes. The graph on the left side is the set of premises 𝒫 and the graph on the right side is the set of sub-goals 𝒢. Colored subgraphs represent a word or a phrase to which our axiom injection mechanism applies.
T: A lady is cutting up some meat precisely
H: Some meat is being cut into pieces by a woman
Step 1: Make graphs from formulas.
Step 3: Use graph constraints and knowledge (lady is a woman) to unify x3 := x1.
Step 4: Induce subgraph alignment with non-unified variable x5.

Note that E(x) induces a subgraph in a given graph and the condition l ∉ ARGS sets the boundaries of the subgraph by excluding the semantic roles of verb phrases. Given two phrase sets E and E′, we say E′ is reachable from E, written E ∼ E′, if E and E′ share at least one variable vertex. Let ∼* be the transitive closure of ∼. Given a set of edges E_G and a variable v, we define the extended phrase set, written Reach(v), as follows:
Reach(v) = {e ∈ E_G | E(v) ∼* {e}},
that is, the set of edges e that can be reached from v without crossing an edge with a semantic role label. This function defines a partition or equivalence class for non-unified variables v ∈ V_ū^G, and each of these partitions induces a (possibly discontinuous) phrase in 𝒢 that remains unproved.
The subgraph in 𝒫 that corresponds to each of these partitions, written Corr(v), is given by the vertices and edges connected with a path of length one to the unified variables that appear in Reach(v), that is, to the unified variables among the vertices of the subgraph of 𝒢 induced by the partition Reach(v).
A subgraph alignment between 𝒫 and 𝒢 is given by the pair ⟨Corr(v), Reach(v)⟩ for all v ∈ V_ū^G, where the phrases can be read from the predicates in the vertices and edges labeled with prepositions.
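The two sides of the alignment can be computed with a simple graph traversal. The sketch below is one possible reading of Reach(v) and Corr(v), not a definitive implementation: the function names are hypothetical, and the edge lists are written by hand for the running example after applying the unifications x3 := x1, x4 := x2, and y2 := y1 to the graph of the sub-goals.

```python
ARGS = {"subj", "obj"}  # semantic roles act as subgraph boundaries


def phrase_set(edges, x):
    """E(x): edges touching x whose label is not a semantic role."""
    return {e for e in edges if x in (e[0], e[2]) and e[1] not in ARGS}


def reach(edges, v, variables):
    """Edges reachable from v without crossing a semantic-role edge."""
    found, seen, frontier = set(), set(), [v]
    while frontier:
        x = frontier.pop()
        if x in seen:
            continue
        seen.add(x)
        for edge in phrase_set(edges, x):
            found.add(edge)
            # Only variable vertices connect phrase sets to one another.
            for end in (edge[0], edge[2]):
                if end in variables and end not in seen:
                    frontier.append(end)
    return found


def corr(premise_edges, reach_edges, unified):
    """Premise edges one step away from unified variables in Reach(v)."""
    anchors = {end for e in reach_edges for end in (e[0], e[2])
               if end in unified}
    result = set()
    for x in anchors:
        result |= phrase_set(premise_edges, x)
    return result


premise_edges = [("x1", "isa", "lady"), ("x2", "isa", "meat"),
                 ("y1", "isa", "cut"), ("y1", "isa", "up"),
                 ("y1", "isa", "precisely"),
                 ("y1", "subj", "x1"), ("y1", "obj", "x2")]
goal_edges = [("x1", "isa", "woman"), ("x2", "isa", "meat"),
              ("y1", "isa", "cut"), ("y1", "into", "x5"),
              ("x5", "isa", "piece"),
              ("y1", "subj", "x1"), ("y1", "obj", "x2")]
variables = {"x1", "x2", "x5", "y1"}
unified = {"x1", "x2", "y1"}

reach_x5 = reach(goal_edges, "x5", variables)
corr_x5 = corr(premise_edges, reach_x5, unified)
```

With these inputs, the traversal from x5 collects the edges for cut, into, and piece, and the corresponding premise edges are those for cut, up, and precisely, which reads as the pair cut up precisely and cut into pieces.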
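We define a mapping (·)• from a labeled edge ⟨v, l, v′⟩ to an atomic formula as follows:
⟨v, l, v′⟩• = v′(v) if l = isa
⟨v, l, v′⟩• = l(v, v′) if l ∈ PREP
⟨v, l, v′⟩• = (l(v) = v′) if l ∈ ARGS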
The phrase axiom generated for each non-unified variable v ∈ V_ū^G is defined as
∀θ_C.(⋀ Corr(v)• → ∃θ_R.(⋀ Reach(v)•)),
where θ_C is the set of free variables appearing in Corr(v)• and θ_R is the set of free variables appearing in Reach(v)• but not in Corr(v)• (which includes v). In Figure 2, the only non-unified variable in the sub-goal in step 4 is x5, that is, V_ū^G = {x5}. Then, starting from the variable x5, the phrase sets Reach(x5) and Corr(x5) are computed as described above. Finally, the following is the axiom generated from ⟨Corr(x5), Reach(x5)⟩ (see footnote 4):
∀y1(cut(y1) ∧ up(y1) ∧ precisely(y1) → ∃x5(into(y1, x5) ∧ piece(x5))).
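The construction of a phrase axiom from an aligned pair of edge sets can be sketched as follows. The function names and the list-based edge encoding are assumptions made for this illustration. As a simplification, atoms that already occur in the antecedent are dropped from the consequent, and the universally quantified variables are taken from the antecedent in order of first appearance.

```python
ARGS = {"subj", "obj"}


def to_atom(edge):
    """Map a labeled edge to an atomic formula, written as a string."""
    v, label, w = edge
    if label == "isa":
        return f"{w}({v})"            # attribute: predicate applied to v
    if label in ARGS:
        return f"{label}({v}) = {w}"  # semantic role as an equation
    return f"{label}({v}, {w})"       # preposition as a two-place predicate


def ordered_vars(edges, variables):
    """Variables occurring in edges, in order of first appearance."""
    seen = []
    for v, _, w in edges:
        for end in (v, w):
            if end in variables and end not in seen:
                seen.append(end)
    return seen


def phrase_axiom(corr_edges, reach_edges, variables):
    theta_c = ordered_vars(corr_edges, variables)
    theta_r = [x for x in ordered_vars(reach_edges, variables)
               if x not in theta_c]
    lhs = " ∧ ".join(to_atom(e) for e in corr_edges)
    # Simplification: skip consequent atoms already present in the antecedent.
    rhs = " ∧ ".join(to_atom(e) for e in reach_edges if e not in corr_edges)
    forall = "".join(f"∀{x}" for x in theta_c)
    exists = "".join(f"∃{x}" for x in theta_r)
    return f"{forall}({lhs} → {exists}({rhs}))"


corr_edges = [("y1", "isa", "cut"), ("y1", "isa", "up"),
              ("y1", "isa", "precisely")]
reach_edges = [("y1", "isa", "cut"), ("y1", "into", "x5"),
               ("x5", "isa", "piece")]
axiom = phrase_axiom(corr_edges, reach_edges, {"x5", "y1"})
print(axiom)
```

For the running example this prints the same axiom as the one given in the text for x5.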

Non-basic formulas
If formulas P and G are not basic formulas (i.e., they contain logical operators other than ∧ and ∃), they are decomposed according to inference rules of natural deduction. There are two types of inference rules: introduction rules decompose a goal formula into smaller sub-goals, and elimination rules decompose a formula in the pool of premises into smaller ones. Figure 3 shows introduction rules and elimination rules for decomposing non-basic formulas including negation, disjunction, implication, and a universal quantifier. By applying inference rules, a proof of non-basic formulas appearing in sub-goals can be decomposed to a set of subproofs that only have basic formulas in sub-goals. If a universal quantifier appears in premises, it is treated in the same way as other premises.

Footnote 4: Note that this axiom is logically equivalent to ∀y1(cut(y1) ∧ up(y1) ∧ precisely(y1) → ∃x5(cut(y1) ∧ into(y1, x5) ∧ piece(x5))) indicated in the colored subgraphs in step 4 of Figure 2.

Figure 3: Inference rules used for decomposing non-basic formulas. P is a premise and G is a subgoal. The initial formulas are at the top, with the formulas obtained by applying the inference rules shown below.
Figure 4: Proof process for the contradiction.
For example, consider the following sentence pair with the gold label "no" (contradiction):
T: A man is not cutting a potato
H: A man is slicing a potato into pieces
Figure 4 shows the proof process of T′ ⇒ ¬H′. To prove the contradiction, the formulas T′ and ¬H′ are set to P and G, respectively. Then, the negation in G is removed by applying the introduction rule (¬-I) to G. Here, False is the propositional constant denoting the contradiction. In the second stage of the proof, the goal is to prove False in G_0 from the two premises P and P_0. By applying (¬-E) to P, we can eliminate the negation from P, resulting in the new goal G_1.
As both the premise P_0 and the sub-goal G_1 are basic formulas, the procedure described in the previous sections applies to the pair (P_0, G_1); these basic formulas are decomposed into atomic ones, and then the word-to-word abduction generates the desired axiom ∀y1(cut(y1) → slice(y1)). Finally, the graph alignment applies in the same way as described in Figure 2, which generates the corresponding phrase axiom. Using this axiom, one can complete the proof of the contradiction between T′ and H′.

Dataset selection
We use the SemEval-2014 version of the SICK dataset (Marelli et al., 2014) for evaluation. The SICK dataset is a dataset for semantic textual similarity (STS) as well as for RTE. It was originally designed for evaluating compositional distributional semantics, so it contains logically challenging problems involving quantifiers, negation, conjunction, and disjunction, as well as inferences with lexical and phrasal knowledge. The SNLI dataset (Bowman et al., 2015) contains inference problems requiring phrasal knowledge. However, it is not concerned with logically challenging expressions; the semantic relationships between a premise and a hypothesis are often limited to synonym/hyponym lexical substitution, replacements of short phrases, or exact word matching. This is because hypotheses are often parallel to the premise in structures and vocabularies. The FraCaS dataset (Cooper et al., 1994) also contains logically complex problems. However, it is confined to purely logical inferences and thus does not contain problems requiring inferences with lexical and phrasal knowledge. For these reasons, we choose the SICK dataset to evaluate our method of using logical inference to extract phrasal knowledge.
The SICK dataset contains 9927 sentence pairs with a 5000/4927 training/test split. These sentence pairs are manually annotated with three types of labels: yes (entailment), no (contradiction), or unknown (neutral) (see Table 1 for examples). In RTE tasks, we need to consider a directional relation between words such as hypernym and hyponym to prove entailment and contradiction. Hence, to extract phrasal knowledge for RTE tasks, we use the training data whose gold label is entailment or contradiction, excluding those with the neutral label.
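The selection of sentence pairs for axiom extraction amounts to a filter on the gold label. The sketch below uses a hypothetical `Pair` record and three example pairs taken from Table 1 purely for illustration; it does not read the actual SICK files, and the split to which these three pairs belong is not stated here.

```python
from typing import List, NamedTuple


class Pair(NamedTuple):
    pair_id: int
    text: str
    hypothesis: str
    label: str  # "yes" (entailment), "no" (contradiction), or "unknown"


def select_for_extraction(pairs: List[Pair]) -> List[Pair]:
    """Keep only pairs labeled entailment or contradiction."""
    return [p for p in pairs if p.label in ("yes", "no")]


pairs = [
    Pair(3941, "A boy is looking at a calendar",
         "There is nobody checking a calendar", "no"),
    Pair(5938, "Vegetables are being put into a pot by a man",
         "Someone is pouring ingredients into a pot", "yes"),
    Pair(5930, "The man is not doing exercises",
         "Two men are fighting", "unknown"),
]
selected = select_for_extraction(pairs)
```

Only the first two pairs pass the filter, since the neutral pair carries no directional relation from which an axiom could be extracted.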

Experimental setup
For the natural deduction proofs, we used ccg2lambda (Martínez-Gómez et al., 2016), a higher-order automatic inference system, which converts CCG derivation trees into semantic representations and conducts natural deduction proofs automatically. We parsed the tokenized sentences of the premises and hypotheses using three wide-coverage CCG parsers: C&C (Clark and Curran, 2007), EasyCCG (Lewis and Steedman, 2014), and depccg (Yoshikawa et al., 2017). CCG derivation trees (parses) were converted into logical semantic representations based on Neo-Davidsonian event semantics (Section 3.1). The validation of semantic templates used for semantic representations was conducted exclusively on the trial split of the SICK dataset. We used Coq (Bertot and Castéran, 2010), an interactive natural deduction theorem prover that we run fully automatically with a number of built-in theorem-proving routines called tactics, which include first-order logic.
We compare phrase abduction with different experimental conditions. No axioms is our system without axiom injection. W2W is the previous strategy of word abduction. P2P is our strategy of phrase abduction; W2W+P2P combines phrase abduction with word abduction. In addition, we compare our system with three purely logic-based (unsupervised) approaches: The Meaning Factory (Bjerva et al., 2014), LangPro (Abzianidze, 2015), and UTexas (Beltagy et al., 2014). We also compare our system with machine learning-based approaches: the current state-of-the-art deep learning model GRU (Yin and Schütze, 2017), a log-linear regression model SemEval-2014 best (Lai and Hockenmaier, 2014), and a hybrid approach combining a logistic regression model and probabilistic logic PL+eclassif (Beltagy et al., 2016).

Extracted paraphrases
We extracted 9445 axioms from the SICK training dataset. The average proving time to extract phrasal axioms was only 3.0 seconds for one sentence pair. Table 2 shows some examples of paraphrases we extracted from the natural deduction proofs in the training set. In particular, the examples of verb phrases show our method has the potential to capture long paraphrases. None of the paraphrases in Table 2 is contained in WordNet or PPDB. There are many instances of non-contiguous phrases in the SICK dataset, in particular, verb-particle phrases. As shown in Table 2, our semantic alignment can detect non-contiguous phrases through the variable unification process, which is one of the main advantages over other shallow/syntactic methods. In addition, Table 2 shows our method is not limited to hypernym or hyponym relations, but is also capable of detecting antonym phrases.

Table 1: Examples in the SICK dataset with different entailment labels and similarity scores.
ID | Text | Hypothesis | Entailment
--- | --- | --- | ---
3941 | A boy is looking at a calendar | There is nobody checking a calendar | No
5938 | Vegetables are being put into a pot by a man | Someone is pouring ingredients into a pot | Yes
5930 | The man is not doing exercises | Two men are fighting | Unknown

Table 2: Examples of paraphrases extracted from the natural deduction proofs in the training set.
Kind | Text | Hypothesis
--- | --- | ---
noun phrase | A blond woman is sitting on the roof of a yellow vehicle, and two people are inside | A woman with blond hair is sitting on the roof of a yellow vehicle, and two people are inside
verb phrase | The person is setting fire to the cameras | Some cameras are being burned by a person with a blow torch
verb phrase | A man and a woman are walking together through the woods | A man and a woman are hiking through a wooded area
prepositional phrase | A child, who is small, is outdoors climbing steps outdoors in an area full of grass | A small child is outdoors climbing steps in a grassy area
antonym | A woman is putting make-up on | The woman is removing make-up

The machine learning-based approaches outperform W2W+P2P, but unlike these approaches, parameter estimation is not used in our method. This suggests that our method has the potential to increase the accuracy by using a classifier.

Positive examples and error analysis
Table 4 shows some positive and negative examples on RTE with the SICK dataset. For ID 9491, the sentence pair requires the paraphrase from a field of brown grass to a grassy area, which is not included in previous lexical knowledge resources. Our phrasal axiom injection can correctly generate this paraphrase from a natural deduction proof, and the system proves the entailment relation.
ID 2367 is also a positive example of phrasal axiom injection. The phrasal axiom between set fire to cameras and burn cameras with a blow torch was generated. This example shows that our semantic alignment succeeds in acquiring a general paraphrase by separating logical expressions such as some from content words and also by accounting for syntactic structures such as the passive-active alternation.

Table 4: Positive and negative examples on RTE with the SICK dataset.
ID | Sentence Pair | Gold | Pred | Axiom
--- | --- | --- | --- | ---
9491 | T: A group of four brown dogs are playing in a field of brown grass; H: Four dogs are playing in a grassy area | Yes | Yes | ∀x1(field(x1) ∧ brown(x1) ∧ grass(x1) → grassy(x1) ∧ area(x1))
3628 | T: A pan is being dropped over the meat; H: The meat is being dropped into a pan | Unk | Yes | ∀y1(pan(obj(y1)) → into(y1, obj(y1)))
408 | T: A group of explorers is walking through the grass; H: Some people are walking | Yes | Unk |

For ID 3628, the axiom shown in the table was extracted from the following sentence pair with the entailment label:
T1: A woman is putting meat in a pan
H1: Someone is dropping the meat into a pan
But the phrase drop over does not entail the phrase drop into, and a proof for the inference is overgenerated in ID 3628. We extracted all possible phrasal axioms from the training dataset, so noisy axioms can be extracted as a consequence of multiple factors such as parsing errors or potential disambiguation in the training dataset. One possible solution for decreasing such noisy axioms would be to use additive composition models (Tian et al., 2016) and asymmetric learnable scoring functions to calculate the confidence on these asymmetric entailing relations between phrases.
ID 96 is also an example of over-generation of axioms. The first axiom, ∀y1(jump(y1) → ∃x1(in(y1, x1) ∧ air(x1))), was extracted from the proof of T1 ⇒ H1:
T1: A child in a red outfit is jumping on a trampoline
H1: A little boy in red clothes is jumping in the air
The second axiom, ∀y1(man(y1) → biker(y1)), was extracted from the proof of T2 ⇒ H2:
T2: A man on a yellow sport bike is doing a wheelie and a friend on a black bike is catching up
H2: A biker on a yellow sport bike is doing a wheelie and a friend on a black bike is catching up
Although these axioms play a role in the proofs of T1 ⇒ H1 and T2 ⇒ H2, the wrong axiom ∀y1(man(y1) → biker(y1)) causes the overgeneration of a proof for the inference in ID 96. The correct one would rather be ∀x1∀y1(man(y1) ∧ on(y1, x1) ∧ bike(x1) → biker(y1)). In this case, it is necessary to bundle predicates in a noun phrase by specifying the types of a variable (entity or event) when making phrase alignments.
For ID 408, the word explorer is not contained in the training entailment dataset and hence the relevant axiom ∀x1(explorer(x1) → people(x1)) was not generated. While our logic-based method enables detecting semantic phrase correspondences in a sentence pair in an unsupervised way, our next step is to predict unseen paraphrases of this type.

Conclusion
In this paper, we proposed a method of detecting phrase correspondences through natural deduction proofs of semantic relations between sentence pairs. The key idea is to attempt a proof with automatic phrasal axiom injection by the careful management of variable sharing during the proof construction process. Our method identifies semantic phrase alignments by monitoring the proof of a theorem and detecting unproved sub-goals and logical premises. The method of detecting semantic phrase alignments would be applicable to other semantic parsing formalisms and meaning representation languages such as abstract meaning representations (AMR) (Banarescu et al., 2013). Experimental results showed that our method detected various phrase alignments including non-contiguous phrases and antonym phrases. This result may contribute to previous phrase alignment approaches. The extracted phrasal axioms improved the accuracy of RTE tasks.
In future work, we shall enhance this methodology of phrasal axiom injection to predict unseen paraphrases. The pairs of premises and sub-goals that can be detected through the proof process provide semantic alignments in a sentence pair. With the use of an additive composition model of distributional vectors, we can evaluate the validity of such semantic alignments. A combination of our phrasal axiom injection and an additive composition model of distributional vectors has the potential to detect unseen paraphrases in a sentence pair.