Polarity Computations in Flexible Categorial Grammar

This paper shows how to take parse trees in CCG and algorithmically determine the polarities of all the constituents. Our work uses the well-known polarization principle corresponding to function application, and we extend this with principles for type raising and composition. We provide an algorithm extending the polarity marking algorithm of van Benthem. We discuss how our system works in practice, taking input from the C&C parser.


Introduction
The main goal of this work is to take input text and automatically determine the polarity of all the words. For example, we aim to find the arrows in sentences like Every dog↓ scares↑ at least two↓ cats↑, Every dog↓ and no cat↓ sleeps=, and Most rabbits= hop↑. The ↑ notation means that whenever we use the given sentence truthfully, if we replace the marked word w with another word which is ≥ w, then the resulting sentence will still be true. So we have a semantic inference. The ↓ notation means the same thing, except that when we substitute using a word ≤ w, we again preserve truth. Finally, the = notation means that we have neither property in general; in a valid semantic inference statement, we can only replace the word with itself rather than with something larger or smaller.
For example, if we had a collection of background facts like cats ≤ animals, beagles ≤ dogs, scares ≤ startles, and one ≤ two, then our ↑ and ↓ notations on Every dog↓ scares↑ at least two↓ cats↑ would allow us to conclude Every beagle startles at least one animal.
The goal of the paper is to provide a computational system that determines the notations ↑, ↓, = on input text to the best extent possible, using either hand-created parses or output from the popular and freely available C&C CCG parser (Clark and Curran, 2007).
Using our polarity tool, we get an easy first step toward automatic inference, done with little or no semantic representation. We discuss potential applications to textual inference.
Theory We extend polarity determination for categorial grammar (CG) (see Sánchez-Valencia (1991); van Benthem (1986); van Eijck (2007); Lavalle-Martínez et al. (2017)). These papers only consider the Ajdukiewicz/Bar-Hillel (AB) flavor of CG, where the rules are restricted to the application rules (>) and (<). There is a consensus that application rules alone are too restrictive to give wide-coverage grammars. We thus extend this work to the full set of flexible combinators used in CCG. We prove that our system is sound, in a precise sense. Further, we show how to incorporate boolean reasoning (Keenan and Faltz, 1984) to get a more complete system.
A working system We have implemented our algorithm in Python. This implementation handles sentences from the C&C parser (Clark and Curran, 2007). This is a non-trivial step on top of the theoretical advance because the parses delivered by the C&C parser deviate in several respects from the semantically-oriented input that one would like for this kind of work.

An Ordered Syntax-Semantics Interface
The basis of the semantics is the syntax-semantics interface in formal semantics, especially in CG and CCG (Keenan and Faltz, 1984; Carpenter, 1998; Steedman, 2000; Jacobson, 2014). Our syntax in this small paper will consist of the lexicon shown in our examples. Here is an example:

(1) [parse tree for Fido chased Felix, omitted in this version]

This tree is not the simplest one for Fido chased Felix. We chose it to remind the reader of the CCG rules of type-raising (T) and composition (B).
Let us fix a semantics. We first select the base types e and t. We generate complex types from these by using function types x → y. We adopt a few standard abbreviations. We then fix a map from the CG categories into the types: s ↦ t, n ↦ e → t, n_pr ↦ e, np ↦ (e → t) → t, etc. (We use n_pr for proper names.) A model M is a set M together with interpretations of all the lexical items by objects of the appropriate semantic type. We use M as the semantic space for the type e, 2 = {F, T} for the type t, and the full set of functions for higher types. The interpretations of some words are fixed: determiners, conjunctions, and relative pronouns. The model thus interprets intransitive verbs by objects of type (et → t) → t and transitive verbs by objects of type (et → t) → (et → t) → t. By the Justification Theorem in Keenan and Faltz (1984), we may in fact obtain these from simpler and more natural data: for proper names we need only objects of type e, for intransitive verbs only et, and for transitive verbs only eet.
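To make the category-to-type map concrete, here is a minimal Python sketch (the encoding is our own illustration, not the implementation described later; the names Base, Fun, and CAT_TO_TYPE are hypothetical):

from dataclasses import dataclass

@dataclass(frozen=True)
class Base:
    name: str                      # 'e' or 't'

@dataclass(frozen=True)
class Fun:
    dom: "Base | Fun"              # argument type
    cod: "Base | Fun"              # result type

E, T = Base('e'), Base('t')
ET = Fun(E, T)                     # the type et of properties
GQ = Fun(ET, T)                    # (et)t: generalized quantifiers

# The category-to-type map described in the text.
CAT_TO_TYPE = {
    's': T,                        # sentences
    'n': ET,                       # common nouns
    'n_pr': E,                     # proper names
    'np': GQ,                      # noun phrases
}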
Let S be a sentence in our fragment, and let Π be a parse tree for S. Associated to Π we have a semantic parse tree, giving us a term t_S in the typed lambda calculus over the base types e and t. This term may be interpreted in each model M. For example, the interpretation corresponding to (1) is a boolean value in the model.

Polarities ↑ and ↓ In order to say what the polarity symbols mean, we need to enrich our semantic spaces from sets to preorders (Moss, 2012; Icard and Moss, 2014).
A preorder P = (P, ≤) is a set P with a relation ≤ on P which is reflexive and transitive. Fix a model M. Then each type x gives rise to a preorder P_x. We order P_t by F < T. For P_e we take the flat preorder on the universe set M underlying the model. For the higher types x → y, we take the set (P_x → P_y) of all functions and endow it with the pointwise order. In this way, every one of our semantic types is naturally endowed with the structure of a preorder in every model.
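As an illustration (our own sketch, assuming finite carriers so that the pointwise comparison is checkable), the preorders P_t and P_e and the pointwise order on function spaces can be coded directly:

# A preorder is a carrier together with a reflexive, transitive leq.
def leq_t(a, b):
    return a <= b                 # P_t: booleans as 0/1, so F < T

def leq_flat(a, b):
    return a == b                 # P_e: the flat preorder on M

def leq_pointwise(leq_cod, dom):
    """Order functions f, g : dom -> cod pointwise:
    f <= g iff f(p) <= g(p) for every p in the (finite) domain."""
    def leq(f, g):
        return all(leq_cod(f(p), g(p)) for p in dom)
    return leq

# Example: properties over a tiny universe, ordered pointwise in P_{et}.
M = ['fido', 'felix']
leq_et = leq_pointwise(leq_t, M)
dog    = lambda x: x == 'fido'
animal = lambda x: True
assert leq_et(dog, animal)        # dog <= animal in P_{et}
assert leq_flat('fido', 'fido')   # flat order: only reflexive pairs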
Each sentence S in our fragment is now interpreted in an ordered setting. This is the (mathematical) meaning of our ↑ and ↓ arrows in this paper. For example, when we write every dog↓ barks↑, this means: for all models M, if the sentence is true in M, then it remains true when the interpretation of dog is replaced by a smaller one, and when the interpretation of barks is replaced by a larger one, in the relevant preorders.

Order-enriched types using +, −, and · Following Dowty (1994), we incorporate monotonicity information into the types. Function types x → y split into three versions: the monotone version x +→ y, the antitone version x −→ y, and the full version x ·→ y. (What we wrote before as x → y is now x ·→ y.) These are all preorders using the pointwise order. We must replace all of the ordinary slash types by versions of them which carry markings.
Lexicon with order-enriched types We use S for t and N for et. Note that these are in a different font than our syntactic types s, n, and np. Then we use NP +→ S for intransitive verbs, NP+ or NP− for noun phrases with determiners, and e for proper names. For the determiners, our lexicon uses the order-enriched types in different ways:

word     type
every    N −→ NP+
some     N +→ NP+
no       N −→ NP−
most     N ·→ NP+
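One hypothetical way to encode this marked lexicon in code, with marked function types as (dom, marking, cod) triples (the triple encoding and the identifiers are ours; the entries follow the table above):

# A marked function type is a (dom, marking, cod) triple; markings
# are '+' (monotone), '-' (antitone), '.' (no information).
NP_UP   = ('et', '+', 't')        # NP+ : upward-monotone noun phrases
NP_DOWN = ('et', '-', 't')        # NP- : downward-monotone noun phrases

LEXICON = {
    'every': ('N', '-', NP_UP),   # antitone in its noun argument
    'some':  ('N', '+', NP_UP),
    'no':    ('N', '-', NP_DOWN), # yields a downward NP
    'most':  ('N', '.', NP_UP),   # no monotonicity info in the noun
    'Fido':  'e',                 # proper names: just the base type e
    'barks': ('NP', '+', 'S'),    # intransitive verbs: NP +-> S
}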

Polarizing a Parse Tree
In this section, we specify the rules (see Figure 1) by which we put markings and polarities on each node of a CCG parse tree, based on a marked/order-enriched lexicon. The next section discusses the algorithm.
Input A parse tree T in CCG as in (1), and a marked lexicon.
Output We aim to convert T to a different tree T* satisfying the following properties: (1) The semantic terms in T and T* should denote the same function in each model. (2) The lexical items in T* must receive their types from the typed lexicon. (3) The polarity of the root of T* must be ↑. (4) At each node in T*, one of the rules in our system must be matched. Most of the rules are listed in Figure 1.

Figure 1: The top line contains core rules of marking and polarity. The letters m and n stand for one of the markings +, −, or ·; d stands for ↑ or ↓ (but not =). In (I), (J), and (K), x must be a boolean category. See the charts in the text for the operations m, d ↦ md and m, n ↦ mn. [rule displays omitted]
Example For T in (1), T* could be as in (2) [marked and polarized tree omitted]. The signs + and − on the arrows are markings; markings apply to arrows only. We have a third marking, ·, but it does not figure in (2). Markings are used to tell whether a function is interpreted (in every model) by a function which is always monotone (+), always antitone (−), or neither in general (·). The arrows ↑ and ↓ are polarities. We also have a third polarity, =. Polarities attach to specific occurrences.
Explanation of the operations on markings and polarities Each rule in Figure 1 abbreviates a number of rules, and we have summarized things in terms of several operations. The chart on the left of Figure 1 combines two markings m and n into a marking mn, and the one on the right combines a marking m and a polarity d, obtaining a new polarity md.
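The two charts can be summarized by a pair of small operations. Here is a sketch in Python (our own encoding of the standard sign calculus: markings multiply like signs, a − marking flips a polarity, and · collapses a polarity to =):

def compose_markings(m, n):
    """The chart m, n -> mn for combining two markings."""
    if '.' in (m, n):
        return '.'                        # unknown monotonicity propagates
    return '+' if m == n else '-'         # signs multiply: e.g. -- gives +

def apply_marking(m, d):
    """The chart m, d -> md for combining a marking and a polarity.
    Here d is 'up' or 'down'; the = cases are handled by separate rules."""
    if m == '.':
        return '='                        # no monotonicity information
    if m == '+':
        return d                          # monotone positions preserve d
    return 'down' if d == 'up' else 'up'  # antitone positions flip d

assert compose_markings('-', '-') == '+'
assert apply_marking('-', 'up') == 'down'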
Comments on the rules In Figure 1, x, y and z are variables ranging over marked types. The application rule (>) is essentially taken from van Benthem (1986) (see also Lavalle-Martínez et al. (2017) for a survey of related algorithms); we expect that our logical system will give rise to several algorithms.
To illustrate (>), let us take m = − and d = ↑. We then have the (>) rule

(3)   (x −→ y)↑   x↓
      ─────────────── (>)
             y↑

This means: for all preorders P and Q, all f, g : P −→ Q, and all p_1, p_2 ∈ P, if f ≤ g and p_2 ≤ p_1, then f(p_1) ≤ g(p_2).
If we were to change x↓ to x↑ in (3), we would change our statement by replacing "p_2 ≤ p_1" with "p_1 ≤ p_2". If we changed it to x=, we would use "p_1 = p_2". In this way, we can read off a large number of true facts about preorders from our rules.
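As a sanity check, the statement read off from (3) can be machine-verified on a small example (a toy illustration, with integers standing in for a preorder):

# Check: for f, g antitone with f <= g pointwise, and p2 <= p1,
# the rule (3) predicts f(p1) <= g(p2).
P = [0, 1, 2]                      # a small linear preorder
f = lambda p: 2 - p                # antitone in p
g = lambda p: 3 - p                # antitone, and f <= g pointwise
for p1 in P:
    for p2 in P:
        if p2 <= p1:
            assert f(p1) <= g(p2)  # the inequality read off from (3)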
There are similar results concerning (B). Here is an example of how (B) is used, taken from (2). Fido has type NP+ = (et) +→ t, and chased above it has type NP+ +→ et. So the application of (B) results in Fido chased with type NP+ +→ t. (A small code sketch of this marking composition appears below.)

The rules (I), (J), and (K) are new. In them, x must be boolean. That is, it must belong to the smallest collection B containing t and with the property that if z ∈ B, then (y ·→ z) ∈ B for all y. B is thus the collection of types whose interpretations are naturally endowed with the structure of a complete atomic boolean algebra (Keenan and Faltz, 1984). Indeed, the soundness of (J) and (K) follows from the proof of the Justification Theorem (op. cit.). Figure 2 contains two applications of the (K) rules. First, the lexical entry for chased is e → et. The first application of (K) promotes this to NP− +→ et. The NP receives a − because its argument no cat is of type NP−. Note that the polarity flips when we do this. If we had used (J), the promotion would be to NP+ +→ et, and the result could not combine with no cat, whose type is NP−.

Several rules are not shown, including "backwards" versions of (>), (B), and (T), and also versions where all polarizations are =. This is a technical point that is not pertinent to this short version. We should mention that, due to these rules, every tree may be polarized in a trivial way, by using = at all nodes. So we are really interested in the maximally informative polarizations, the ones that make the most predictions.
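Here is the promised sketch of how markings combine under (B), replaying the Fido chased example (marked types are again encoded as hypothetical (dom, marking, cod) triples):

def compose_B(fun_outer, fun_inner):
    """(B): compose f : y -m-> z with g : x -n-> y, giving
    B f g = f . g : x -mn-> z."""
    y1, m, z = fun_outer
    x, n, y2 = fun_inner
    assert y1 == y2, "composition needs a matching middle type"
    mn = '.' if '.' in (m, n) else ('+' if m == n else '-')
    return (x, mn, z)

# Fido (type-raised) : (et) -+-> t, chased : NP+ -+-> et, so
# Fido chased : NP+ -+-> t, as in the text.
fido   = ('et', '+', 't')
chased = ('NP+', '+', 'et')
assert compose_B(fido, chased) == ('NP+', '+', 't')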
Boolean connectives, etc. We take and and or to be polymorphic, of the types B m→ (B m→ B), where B is a boolean category and m = +, −, or ·. Negation flips polarities. Relative pronouns and relative clauses can also be handled. Adjectives are taken to be of type N +→ N.
Other combinators This paper only discusses (T) and (B), but we also have rules for the other combinators used in CG, such as (S) and (W). For example, the (S) combinator is defined by S f g = λx. (f x)(g x); our system has a corresponding polarization rule [rule display omitted]. This combinator is part of the standard presentation of CCG, but it is less important in this paper because the C&C parser does not deliver parses using it.
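For readers who want the combinator itself in code, here is the definition just given, directly transcribed (a trivial but runnable sketch):

def S(f, g):
    """The (S) combinator: S f g = lambda x. (f x)(g x)."""
    return lambda x: f(x)(g(x))

# A tiny sanity check with curried functions on integers.
add = lambda x: lambda y: x + y
dbl = lambda x: 2 * x
assert S(add, dbl)(3) == 3 + 2 * 3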

Algorithmic Aspects
We have an algorithm that takes as input a CCG tree as in (1) and outputs a tree with markings and polarities, a tree which satisfies the conditions that we have listed. The algorithm has two phases, similar to van Benthem's algorithm (van Benthem, 1986) for the Ajdukiewicz/Bar-Hillel variant of CG (application rules only). Phase 1 goes down the tree from leaves to root and adds the markings, based on the rules in Figure 1. The markings on the leaves are given in the lexicon. The rest of Phase 1 is non-deterministic. We can see this from our set of rules: there are many cases where the types on top of the line permit several possible conclusions. As we go down the tree, we frequently need to postpone the choice.
Phase 2 of the algorithm computes the polarities, again following the rules, starting with the root. One always puts ↑ on the root, and then goes up the tree. This part of the algorithm is straightforward.
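To make the two phases concrete, here is a small runnable sketch for the application-only fragment (the tree encoding, helper names, and toy lexicon are ours; full CCG trees additionally need candidate sets and deferred choices, as described above):

from dataclasses import dataclass, field

@dataclass
class Node:
    word: str = None                        # set on leaves only
    children: list = field(default_factory=list)   # [function, argument]
    marked_type: object = None
    polarity: str = None

def apply_marking(m, d):
    # Repeated from the earlier sketch, to keep this block self-contained.
    if m == '.':
        return '='
    return d if m == '+' else ('down' if d == 'up' else 'up')

def mark(node, lexicon):
    """Phase 1 (leaves to root): attach marked types.  In the
    application-only fragment this is deterministic."""
    if node.word is not None:
        node.marked_type = lexicon[node.word]
        return
    fun, arg = node.children
    mark(fun, lexicon)
    mark(arg, lexicon)
    node.marked_type = fun.marked_type[2]   # (dom, m, cod) -> cod

def polarize(node, d='up'):
    """Phase 2 (root to leaves): the root gets an up arrow; the
    function child keeps the polarity, the argument child gets md."""
    node.polarity = d
    if node.children:
        fun, arg = node.children
        polarize(fun, d)
        polarize(arg, apply_marking(fun.marked_type[1], d))

# every : N -(-)-> NP+, where NP+ abbreviates (et) -+-> t.
lexicon = {'every': ('N', '-', ('et', '+', 't')),
           'dog': 'N', 'barks': 'et'}
np = Node(children=[Node(word='every'), Node(word='dog')])
s = Node(children=[np, Node(word='barks')])
mark(s, lexicon)
polarize(s)
assert np.children[1].polarity == 'down'    # every dog(down) ...
assert s.children[1].polarity == 'up'       # ... barks(up)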
The overall algorithm is in fact non-deterministic for two reasons. As we explained, Phase 1 has a non-deterministic feature. In addition, it is always possible to polarize everything with = and make similarly uninformative choices for the markings. We are really interested in the most informative polarization, the one with the fewest = polarities.
Soundness We have proved a soundness theorem for the system. Though too complicated to state in full here, it may be summarized informally as follows. Suppose we have a sentence S in English, and suppose that the lexical items in S are given semantics that conform to our assumptions. (This means that the semantics of the lexical entries must belong to the appropriate types.) Then any semantic statement about the ↑, ↓, = markings predicted by our system is correct. See Moss (2018) for details.
Completeness We have not proven the completeness of our system/algorithm, and indeed this is an open question. What completeness would mean for a system like ours is that whenever we have an input CCG parse tree and a polarization of its words which is semantically valid in the sense that it holds no matter how the nouns, verbs, etc. are interpreted, then our algorithm would detect this. This completeness would be a property of the rules and also of the polarization algorithm. The experience with similar matters in Icard and Moss (2013) suggests that completeness will be difficult.
Efficiency of our algorithm Our polarization is quite fast on the sentences which we have tried it on. We conjecture that the problem is solvable in polynomial time, but the most obvious complexity upper bound for the polarization problem is NP. The reason that the complexity is not "obviously polynomial" is that for each of the type-raising steps in the input tree, one has three choices for the raise. In more detail, suppose that the input tree contains

         x
  ─────────────── (T)
   (x → y) → y

Then our three choices for the marking correspond to the three markings on the inner arrow: (x +→ y) → y, (x −→ y) → y, and (x ·→ y) → y. Our implementation defers the choice until more of the tree is marked. But prima facie, there are an exponential number of choices. All of these remarks also apply to the applications of (I), (J), and (K); these do not occur in the input tree, and the algorithm must make a choice somehow. Thus we do not know the worst-case complexity of our algorithm.
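A toy illustration of the blowup (the encoding of the choices is ours): k independent type-raising steps give 3^k joint marking choices a priori, which is why our implementation defers the decision:

from itertools import product

MARKINGS = ('+', '-', '.')

def raise_choices(x='x', y='y'):
    """The three candidate markings for a single type-raising step."""
    return [f'({x} {m}-> {y}) -> {y}' for m in MARKINGS]

assert len(raise_choices()) == 3
# k independent raises give 3**k joint choices if decided eagerly.
k = 5
assert len(list(product(MARKINGS, repeat=k))) == 3 ** k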

What Our System Can Currently Do
We tokenized input sentences using the script from the ccg2lambda system (Martínez-Gómez et al., 2016). The tokenized sentences were then parsed using the C&C parser (Clark and Curran, 2007), which is trained on the CCGbank (Hockenmaier and Steedman, 2007). Then we run our algorithm.
We are able to take simple sentences all the way through. For example, our system correctly determines the polarities in sentences like those in the introduction [output display omitted]. As the output shows, our algorithm polarizes all words in the input. For determiners, this is actually useful. It is (arguably) background knowledge, for example, that every ≤ some; at least two ≤ at least one ≡ some; and no ≤ at most one ≤ at most two. These facts would not be part of the algorithm in this paper; rather, they would be background facts that figure into inference engines built on this work.
Problems Our end-to-end system is sound in the sense that it correctly polarizes the input semantic representations. However, it is limited by the quality of the parses coming from the C&C parser. While the parser has advantages, its output is sometimes not optimal for our purposes. For example, it assigns the supertag N/N to most, but NP/N to other quantifiers. Thus, in order to handle most, one has to manually change the parse trees. It also parses relative clauses as (no dog) (who chased a cat) died rather than (no (dog who chased a cat)) died. Furthermore, the parser sometimes behaves differently on intransitive verbs like walks than on ones like cries. Currently, we manually fix the trees when they systematically deviate from our desired parses (e.g., relative clauses). Finally, as with any syntactic parser, it delivers only one parse, so ambiguous sentences are not treated in any special way by our work.

Future Work: Inference, and Connections with Other Approaches
We certainly plan to use the algorithm in connection with inference, since this has always been a primary reason to study monotonicity and polarity. Indeed, once one has correct polarity markings, it is straightforward to use them to do inference from any background facts which can be expressed as inequalities. This would cover taxonomic statements like dog ≤ animal and also predications like John is a swimmer. Our future work will present logical systems built this way.
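As a preview of that future work, here is a toy sketch of the replacement-based inference that correct polarities license (the vocabulary, the ≤ facts, and the function names are illustrative assumptions, not our released system):

# Background facts as <= pairs (with the reflexive closure assumed).
LEQ = {('beagles', 'dogs'), ('cats', 'animals'),
       ('scares', 'startles'), ('one', 'two')}

def leq(a, b):
    return a == b or (a, b) in LEQ

def entails(polarized, replacement):
    """polarized: (word, polarity) pairs for a sentence assumed true;
    replacement: candidate words, position by position.  Returns True
    if every substitution is licensed by the polarity at its slot."""
    for (w, pol), v in zip(polarized, replacement):
        if pol == 'up' and not leq(w, v):
            return False               # up slots may only grow
        if pol == 'down' and not leq(v, w):
            return False               # down slots may only shrink
        if pol == '=' and w != v:
            return False               # = slots must stay fixed
    return True

# Every dog(down) scares(up) at least two(down) cats(up)
# entails: Every beagle startles at least one animal.
premise = [('every', '='), ('dogs', 'down'), ('scares', 'up'),
           ('two', 'down'), ('cats', 'up')]
assert entails(premise, ['every', 'beagles', 'startles', 'one', 'animals'])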