Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser

This paper presents a system that compositionally maps outputs of a wide-coverage Japanese CCG parser onto semantic representations and performs automated inference in higher-order logic. The system is evaluated on a textual entailment dataset. It is shown that the system solves inference problems that focus on a variety of complex linguistic phenomena, including those that are difficult to represent in the standard first-order logic.


Introduction
Logic-based semantic representations have played an important role in the study of semantic parsing and inference. For English, several methods have been proposed to map outputs of parsers based on syntactic theories like CCG (Steedman, 2000) onto logical formulas (Bos, 2015). Output formulas have been used in various tasks, including Question Answering (Lewis and Steedman, 2013) and Recognizing Textual Entailment (RTE) (Bos and Markert, 2005; Beltagy et al., 2013; Bjerva et al., 2014).
Syntactic and semantic parsing for Japanese, by contrast, has been dominated by chunk-based dependency parsing and semantic role labelling (Kudo and Matsumoto, 2002; Kawahara and Kurohashi, 2011; Hayashibe et al., 2011). Recently, the method of inducing wide-coverage CCG resources for English (Hockenmaier and Steedman, 2007) has been applied to Japanese and a robust CCG parser based on it has been developed (Uematsu et al., 2015). However, building a method to map CCG trees in Japanese onto logical formulas is not a trivial task, mainly due to the differences in syntactic structures between English and Japanese (Section 3).
There are two primary contributions of this paper. First, based on an in-depth analysis of the syntax-semantics interface in Japanese, we present the first system that compositionally derives semantic representations for a wide-coverage Japanese CCG parser. Output representations are formulas in higher-order logic (HOL) combined with Neo-Davidsonian Event Semantics (Parsons, 1990). Second, we demonstrate the capacity of HOL for textual entailment. We evaluate the system on a Japanese textual entailment dataset (Kawazoe et al., 2015), a dataset constructed in a similar way to the FraCaS dataset for English (Cooper et al., 1994; MacCartney and Manning, 2007). Although it is usually thought that HOL is infeasible for practical applications, the results show that the entire system is able to perform efficient logical inference on complex linguistic phenomena such as generalized quantifiers and intensional modifiers, phenomena that pose challenges to the standard first-order-logic-based approaches.

Background and system overview
This section provides a brief overview of the entire system as applied to RTE, a task of determining whether a given text (T) entails, contradicts, or is just consistent with, a given hypothesis (H). In logic-based approaches, the meanings of T and H are represented by logical formulas; whether the entailment relation holds is typically determined by checking whether T → H is a theorem in a logical system with the help of a knowledge base.
Currently, first-order logic (FOL) is the most popular logical system used for RTE (Bos and Markert, 2005; Lewis and Steedman, 2013; Bjerva et al., 2014). One advantage of systems based on FOL is that practical general-purpose theorem provers and model-builders are available. However, a drawback is that there are linguistic phenomena that cannot be represented in the standard FOL; a typical example is a generalized quantifier such as most (Barwise and Cooper, 1981). Accordingly, it has been standard in formal semantics of natural language to use HOL as a representation language (Montague, 1974). Although HOL does not have general-purpose theorem provers, there is room for developing an automated reasoning system specialized for natural language inference. In general, a higher-order representation makes the logical structure of a sentence more explicit than a first-order encoding does and hence can simplify the process of proof search (Miller and Nadathur, 1986). Recently, an evaluation on the FraCaS dataset (Cooper et al., 1994) showed that a higher-order inference system outperformed Boxer/Nutcracker's first-order system (Bos, 2008) in both speed and accuracy. Likewise, Abzianidze (2015) developed a higher-order prover based on a natural logic tableau system and showed that it achieved high accuracy comparable to state-of-the-art results on the SICK dataset (Marelli et al., 2014).
There are three main steps in our pipeline. The focus of this paper is on the last two components.
1. Syntactic parsing: Input sentences are mapped onto CCG trees. We use the Japanese CCG parser Jigg (Noji and Miyao, 2016; https://github.com/mynlp/jigg), a statistical parser based on the Japanese CCGbank (Uematsu et al., 2015).
2. Semantic parsing: CCG derivation trees are compositionally mapped onto semantic representations in HOL. The compositional mapping is implemented via simply typed λ-calculus in the standard way (Bos, 2008; Martínez-Gómez et al., 2016).
3. Logical inference: Theorem proving in HOL is performed to check for entailment and contradiction. Axioms and proof-search procedures are largely language-independent, so we use an existing higher-order inference system (https://github.com/mynlp/ccg2lambda) and adapt it for our purpose.

Compositional Semantics and HOL

CCG and semantic lexicon
Combinatory Categorial Grammar (CCG) (Steedman, 2000) is a lexicalized grammar formalism suitable for implementing a compositional mapping from syntax to semantics. A syntactic category of CCG is either a basic category such as S and NP or a functional category of the form X/Y or X\Y.
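The category structure just described can be sketched in a few lines of Python. This is only an illustration of basic and functional categories together with the two standard application rules of CCG; the class and function names (`Basic`, `Functional`, `forward_apply`, `backward_apply`) are hypothetical and are not taken from the parser used in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Basic:
    name: str  # e.g. "S" or "NP"

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functional:
    result: "Category"
    slash: str  # "/" looks for its argument to the right, "\\" to the left
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Basic, Functional]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application: X/Y followed by Y yields X."""
    if isinstance(left, Functional) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application: Y followed by X\\Y yields X."""
    if isinstance(right, Functional) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


S, NP = Basic("S"), Basic("NP")
intransitive = Functional(S, "\\", NP)  # S\NP
np_modifier = Functional(NP, "/", NP)   # NP/NP

print(backward_apply(NP, intransitive))  # S
print(forward_apply(np_modifier, NP))    # NP
```

Representing categories as immutable values makes the equality test in the application rules a plain structural comparison.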
The meaning of a sentence is computed from a small number of combinatory rules and the meanings of constituent words. In addition to standard combinatory rules, the Japanese CCG parser uses a small number of unary type-shifting rules (e.g., the relativization rule that changes the category S\NP to NP/NP), to which suitable meaning composition rules are given. We follow the standard method of building a semantic lexicon in CCG-based logical semantics (Bos, 2008). There are two kinds of lexical entries: (1) semantic templates that are schematic entries assigned to syntactic categories, possibly with syntactic features, and (2) lexical entries directly assigned to a limited number of logical and functional expressions. Lexical entries can be sensitive to a POS tag, a surface form, and other information contained in the parser output. Table 1 shows semantic templates for main syntactic categories. More details will be provided in Sections 3.2 and 3.3.
We use a language of standard higher-order logic (simple type theory) (Carpenter, 1997) as a representation language. Expressions in HOL are assigned semantic types. We use three basic types: E (Entity), Ev (Event), and Prop (Proposition). Thus, the semantic types of expressions in our system are defined by the rule
T ::= E | Ev | Prop | T1 ⇒ T2
First-order language can be taken as a fragment of this system; apart from logical connectives and quantifiers, all primitive expressions in first-order logic are confined to constant symbols of type E and predicates of type E ⇒ Prop, E ⇒ E ⇒ Prop, and so on. Thus, adopting higher-order language does not lead to the loss of the expressive power of first-order language.
Figure 1: The mapping from syntactic categories to semantic types. ⇒ is right-associative.
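As an illustration of the type system, the following Python sketch builds semantic types from the three basic types and a function-type constructor, prints them with right-associative ⇒, and checks whether a type belongs to the first-order predicate fragment. The names `Base`, `Arrow`, `show`, and `is_first_order_predicate` are hypothetical helpers introduced for this sketch only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Base:
    name: str  # "E", "Ev", or "Prop"


@dataclass(frozen=True)
class Arrow:
    dom: "SemType"  # argument type
    cod: "SemType"  # result type


SemType = Union[Base, Arrow]

E, Ev, Prop = Base("E"), Base("Ev"), Base("Prop")


def show(t: SemType) -> str:
    """Print a type; only a function type in argument position needs parentheses."""
    if isinstance(t, Base):
        return t.name
    dom = show(t.dom)
    if isinstance(t.dom, Arrow):
        dom = f"({dom})"
    return f"{dom} ⇒ {show(t.cod)}"


def is_first_order_predicate(t: SemType) -> bool:
    """True exactly for E ⇒ Prop, E ⇒ E ⇒ Prop, and so on."""
    if not isinstance(t, Arrow) or t.dom != E:
        return False
    return t.cod == Prop or is_first_order_predicate(t.cod)


binary_predicate = Arrow(E, Arrow(E, Prop))
event_modifier = Arrow(Arrow(Ev, Prop), Arrow(Ev, Prop))

print(show(binary_predicate))  # E ⇒ E ⇒ Prop
print(show(event_modifier))    # (Ev ⇒ Prop) ⇒ Ev ⇒ Prop
```

The second example is a genuinely higher-order type: its argument is itself a predicate, so it falls outside the first-order fragment.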
The Japanese CCG parser simplifies the standard CCG and uses two basic categories, S and NP. Accordingly, a mapping (·)• from syntactic categories to semantic types can be defined as in Figure 1. Keeping the correspondence between syntactic categories and semantic types in the semantic lexicon guarantees that a well-formed formula is compositionally derived from the meaning assignment to each leaf of a CCG derivation tree.
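A mapping of this kind can be sketched recursively. The sketch below assumes the usual clause for functional categories, in which X/Y and X\Y both map to a function from the type of Y to the type of X. The types assigned to the basic categories S and NP are defined in Figure 1 and are not reproduced here, so they are passed in as a parameter, and the values used in the example ("σ" and "ν") are opaque placeholders, not the paper's actual assignments.

```python
from typing import Dict, Tuple, Union

# A category is a basic name ("S", "NP") or a triple (result, slash, argument).
Category = Union[str, Tuple]
# A semantic type is a name or a pair (domain, codomain), read "domain ⇒ codomain".
SemType = Union[str, Tuple]


def sem_type(cat: Category, base: Dict[str, SemType]) -> SemType:
    """Map a syntactic category to a semantic type, given types for basic categories."""
    if isinstance(cat, str):
        return base[cat]
    result, _slash, arg = cat
    # Slash direction matters for word order only, not for the semantic type.
    return (sem_type(arg, base), sem_type(result, base))


# Placeholder base assignment (an assumption made for illustration only).
placeholder_base = {"S": "σ", "NP": "ν"}

print(sem_type(("S", "\\", "NP"), placeholder_base))                # ('ν', 'σ')
print(sem_type((("S", "\\", "NP"), "\\", "NP"), placeholder_base))  # ('ν', ('ν', 'σ'))
```

Because the function is total on well-formed categories, every leaf of a derivation tree receives a type, which is the property the lexicon relies on.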

Semantic composition for VPs
To model a semantics for VPs in Japanese, we adopt Neo-Davidsonian Event Semantics (Parsons, 1990; Jurafsky and Martin, 2009), which is widely used in the NLP field. For instance, the sentence (1) is analyzed as having the logical form in (2):
(1) John NOM slowly walk PAST
    'John walked slowly'
In this approach, verbs are analyzed as 1-place predicates over events; arguments and adjuncts of VPs are also analyzed as event predicates. This semantic uniformity is suitable for handling Japanese syntactic structures, in which the arguments of a VP are often implicit and thus the argument-adjunct distinction is less transparent than in languages like English (Pietroski, 2005). As is seen in (2), we adopt the unique-role requirement for case markers (Carlson, 1984); for instance, the nominative case marker does not denote the relation Nom(v, x), as in the event semantics in Boxer (Bos, 2008), but the function Nom(v) = x. This treatment allows us to make use of logical properties of equality and hence is more suited to theorem-proving in our setting.
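To make the role of equality concrete, the following is a hedged worked example. The first formula is a plausible shape for the logical form of 'John walked slowly' under the description above (verb, case marker, adverb, and tense all as conditions on one event variable); the predicate name Past is an assumption. The second line shows what the functional treatment of Nom yields by equality reasoning alone.

```latex
% Assumed shape of the logical form for "John walked slowly"
\exists v\,\bigl(\mathrm{walk}(v) \wedge \mathrm{Nom}(v) = \mathrm{john}
  \wedge \mathrm{slowly}(v) \wedge \mathrm{Past}(v)\bigr)

% Unique role: two fillers of the same role for the same event must coincide
\mathrm{Nom}(v) = x \;\wedge\; \mathrm{Nom}(v) = y \;\vdash\; x = y
\qquad\text{(symmetry and transitivity of $=$)}
```

With a relational encoding, Nom(v, x) ∧ Nom(v, y) would not yield x = y unless a separate uniqueness axiom were added for each role.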
To derive a semantic representation in event semantics compositionally, we adopt the compositional semantics of VPs in Champollion (2015) and analyze VPs themselves as introducing existential quantification over events. To derive the correct meaning for VP modifiers, the semantic type of a verb is raised so that the verb takes a modifier as argument but not vice versa. Figures 2 and 3 give example derivations.
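The idea that the verb introduces the event quantifier and takes its modifiers as arguments can be illustrated with a small continuation-style sketch in Python. This is a simplified illustration in the spirit of that analysis, not the semantic templates of Table 1; the helper names (`verb`, `modifier`, `close`) and the treatment of tense as a condition `Past(v)` are assumptions.

```python
from typing import Callable, List

Cont = Callable[[str], List[str]]  # event variable -> further conditions on it
VP = Callable[[Cont], str]         # continuation -> closed formula


def verb(pred: str) -> VP:
    """The verb itself introduces the existential quantifier over events."""
    def vp(k: Cont) -> str:
        conjuncts = [f"{pred}(v)"] + k("v")
        return "∃v(" + " ∧ ".join(conjuncts) + ")"
    return vp


def modifier(make_conjunct: Callable[[str], str]) -> Callable[[VP], VP]:
    """An argument or adjunct adds one condition inside the scope of the verb's ∃."""
    def apply(vp: VP) -> VP:
        return lambda k: vp(lambda v: [make_conjunct(v)] + k(v))
    return apply


def close(vp: VP) -> str:
    """Finish the sentence by supplying a continuation with no further conditions."""
    return vp(lambda v: [])


slowly = modifier(lambda v: f"slowly({v})")
john_nom = modifier(lambda v: f"Nom({v}) = john")
past = modifier(lambda v: f"Past({v})")

walked_slowly = close(past(john_nom(slowly(verb("walk")))))
walked = close(past(john_nom(verb("walk"))))

print(walked_slowly)  # ∃v(walk(v) ∧ slowly(v) ∧ Nom(v) = john ∧ Past(v))
print(walked)         # ∃v(walk(v) ∧ Nom(v) = john ∧ Past(v))
```

Every condition ends up inside the scope of the single quantifier contributed by the verb, regardless of the order in which modifiers attach.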
VP modifiers such as slowly license an inference from John walked slowly to John walked, an inference correctly captured by the formula in (2). In English and Japanese, however, there are intensional VP modifiers that do not license this inference pattern. Thus, the sentence John almost walked does not entail John walked (Dowty, 1979). While it is not easy to provide a desirable analysis in first-order language (Hobbs, 1985), HOL gives a perspicuous representation. In that representation, almost is a higher-order predicate having the semantic type (Ev ⇒ Prop) ⇒ Ev ⇒ Prop. The meaning assignment to VP modifiers of category S/S in Table 1 is for extensional modifiers; an intensional modifier is assigned the representation λSK.S(λJv.K(Base(J), v)) in the lexical entry, which results in a representation as in (3).
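The contrast between the two kinds of modifier can be written out as follows. Both formulas are assumed shapes that are consistent with the types stated in the text, not reproductions of the paper's numbered examples; tense is omitted for brevity.

```latex
% Extensional modifier: slowly contributes a conjunct, so it can be dropped
\exists v\,\bigl(\mathrm{walk}(v) \wedge \mathrm{Nom}(v) = \mathrm{john}
  \wedge \mathrm{slowly}(v)\bigr)
\;\vdash\;
\exists v\,\bigl(\mathrm{walk}(v) \wedge \mathrm{Nom}(v) = \mathrm{john}\bigr)

% Intensional modifier: walk occurs only as an argument of almost,
% with almost : (Ev => Prop) => Ev => Prop
\exists v\,\bigl(\mathrm{almost}(\mathrm{walk}, v)
  \wedge \mathrm{Nom}(v) = \mathrm{john}\bigr)
\;\nvdash\;
\exists v\,\bigl(\mathrm{walk}(v) \wedge \mathrm{Nom}(v) = \mathrm{john}\bigr)
```

In the second case no conjunct of the form walk(v) is available, so the inference is blocked unless an axiom relating almost(walk, v) to walk(v) is supplied.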

Semantic composition for NPs
The quantificational structure of an NP plays a crucial role in capturing basic entailment patterns such as monotonicity inference. In the case of English, quantificational structures are specified by the type of determiners (e.g. a, the, every, some, no); together with the category distinction between N and NP, which is supported in English CCGbank (Hockenmaier and Steedman, 2007), one can provide a correct representation for NPs.
By contrast, Japanese is a classifier language, where NPs freely occur without determiners in argument position (Chierchia, 1998). For example, the subject in (4) appears in argument position without accompanying any determiner.
Bekki (2010) provides a comprehensive CCG grammar for Japanese that adopts the N-NP distinction and analyzes Japanese bare NPs as accompanying the null determiner. The Japanese CCGbank, by contrast, simplifies Bekki's (2010) grammar and avoids the use of the null determiner; it does not use the category N and takes all NPs in Japanese to have the syntactic category NP. This discrepancy in NP-structure between English and Japanese poses a challenge to the standard approach to building compositional semantics.
To provide a compositional semantics adapted for the Japanese CCG, we take NPs themselves as introducing quantification over individuals, along the same lines as the semantics for VPs. The semantic type of NPs needs to be raised so that they take NP modifiers as argument (cf. the template for NP in Table 1). Figure 2 shows a derivation for the sentence in (4), where the adjective small modifies the NP dog to form a bare NP small dog. It should be noted that the predicate small(x) is correctly inserted inside the scope of the existential quantification introduced by the NP dog. The so-called privative adjectives (e.g. fake and former) are analyzed in the same way as intensional VP modifiers.
Following an earlier analysis, we analyze the non-first-order generalized quantifier most as having a higher-order logical form. Figure 3 shows an example derivation for a sentence containing the generalized quantifier most. Our system also handles floating quantifiers in Japanese.
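For intuition about why most takes two predicates as arguments, the following Python sketch evaluates it over a finite domain using the standard set-theoretic truth condition from generalized quantifier theory: most F are G holds when the Fs that are G outnumber the Fs that are not. This is a model-checking illustration only; it is not the axiomatization used by the inference system, and the example entities are invented.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def most(domain: Iterable[T],
         restrictor: Callable[[T], bool],
         scope: Callable[[T], bool]) -> bool:
    """most(F, G): more members of F satisfy G than fail to satisfy it."""
    f = [x for x in domain if restrictor(x)]
    f_and_g = [x for x in f if scope(x)]
    return len(f_and_g) > len(f) - len(f_and_g)


# Invented toy model: three dogs and one cat.
domain = ["d1", "d2", "d3", "c1"]
dogs = {"d1", "d2", "d3"}
barkers = {"d1", "d2"}
noise_makers = {"d1", "d2", "c1"}  # every barker also makes noise

print(most(domain, dogs.__contains__, barkers.__contains__))       # True
print(most(domain, dogs.__contains__, noise_makers.__contains__))  # True
```

The second call illustrates upward monotonicity in the scope argument: enlarging the scope set from barkers to noise makers preserves truth.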

Experiments
We evaluate our system on the Japanese Semantics test suite (JSeM) (Kawazoe et al., 2015), a Japanese dataset for textual entailment designed in a similar way to the FraCaS dataset for English. These datasets focus on the types of logical inferences that do not require world knowledge. JSeM has Japanese translations of FraCaS problems and an extended set of problems focusing on Japanese syntax and semantics. Each problem has one or more premises, followed by a hypothesis. There are three types of answer: yes (entailment), no (contradiction), and unknown (neutral). Each problem is annotated with the types of inference (logical entailment, presupposition, etc.) and of linguistic phenomena.
We evaluate the system on 523 problems in the dataset. We focus on problems tagged with one of five phenomena: generalized quantifier, plural, adjective, verb, and attitude. We use problems whose inference type is logical entailment, excluding anaphora and presupposition. We use Kuromoji for morphological analysis. To focus on the evaluation of semantic parsing and inference, we use gold syntactic parses, which show an upper bound on the performance of the semantic component. Gold syntactic parses are manually selected from n-best outputs of the CCG parser. For the higher-order inference system, we use previously published axioms, adapted with the necessary modifications for our event semantics. Given premises P1, ..., Pn and a hypothesis H, the system outputs yes (P1 ∧ ··· ∧ Pn → H is proved), no (P1 ∧ ··· ∧ Pn → ¬H is proved), or unknown (neither is proved in a fixed proof-search space). We set a 30-second timeout for each inference run; the system outputs unknown once it is exceeded. The current semantic lexicon has 36 templates and 113 lexical entries.
Tables 2 and 3 show the results. The system with gold syntactic parses achieved 86% accuracy on the total 523 problems, with high precision and reasonable speed. There was no timeout. The accuracy dropped to 70% when the HOL axioms were ablated (Table 3). SLC refers to the performance of a supervised learning classifier based on 5-fold cross-validation for each section. Although direct comparison is not possible, our system with gold parses outperforms it for all sections.
Out of the 523 problems, 417 are Japanese translations of the FraCaS problems. Table 4 shows a comparison between the performance of our system on this subset of the JSeM problems and the performance of an RTE system for English on the corresponding problems in the FraCaS dataset. The English system used system parses of the English C&C parser (Clark and Curran, 2007). The total accuracy of our system is comparable to that of the English system.
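The three-way output of the experimental setup (yes, no, unknown) can be sketched as a small decision procedure around a prover. In the sketch below the prover is a toy stand-in that checks propositional entailment by brute-force truth tables; it is not the higher-order prover used in the experiments, it has no timeout, and the names `entails` and `judge` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Sequence

World = Dict[str, bool]
Formula = Callable[[World], bool]


def entails(premises: Sequence[Formula], conclusion: Formula,
            atoms: Sequence[str]) -> bool:
    """Toy stand-in prover: the conclusion holds in every world satisfying the premises."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False
    return True


def judge(premises: Sequence[Formula], hypothesis: Formula,
          atoms: Sequence[str], prove=entails) -> str:
    """Return yes if H is proved, no if ¬H is proved, unknown otherwise."""
    if prove(premises, hypothesis, atoms):
        return "yes"
    if prove(premises, lambda w: not hypothesis(w), atoms):
        return "no"
    return "unknown"


atoms = ["p", "q", "r"]
premise = lambda w: w["p"] and w["q"]

print(judge([premise], lambda w: w["p"], atoms))      # yes
print(judge([premise], lambda w: not w["p"], atoms))  # no
print(judge([premise], lambda w: w["r"], atoms))      # unknown
```

Passing the prover in as a parameter keeps the decision logic independent of the underlying proof system, so a real prover with a time limit could be substituted for the toy one.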
Most errors we found are due to syntactic parse errors caused by the CCG parser, where no correct syntactic parses were found in n-best responses. Comparison between gold parses and system parses shows that correct syntactic disambiguation improves performance.

Conclusion
To our knowledge, this study provides the first semantic parsing system based on CCG that compositionally maps real texts in Japanese onto logical forms. We have also demonstrated the capacity of HOL for textual entailment. The evaluation on JSeM showed that our system performs efficient logical inference on various semantic phenomena, including those that challenge the standard FOL. The attractiveness of a logic-based system is that it is highly modular and can be extended with other components such as a robust knowledge base (Lewis and Steedman, 2013; Beltagy et al., 2013; Bjerva et al., 2014). Such an extension will be a focus of future work.