A Monotonicity Calculus and Its Completeness

One of the prominent mathematical features of natural language is the prevalence of “upward” and “downward” inferences involving determiners and other functional expressions. These inferences are associated with negative and positive polarity positions in syntax, and they also feature in computer implementations of textual entailment. Formal treatments of these phenomena began in the 1980s and have been refined and expanded in the last 10 years. This paper takes a large step in the area by extending typed lambda calculus to the ordered setting. Not only does this provide a formal tool for reasoning about upward and downward inferences in natural language, it also applies to the analysis of monotonicity arguments in mathematics more generally.


Introduction
Monotonicity reasoning is pervasive across many domains, from mathematics to natural language, indeed in any setting that deals with functions of ordered sets. A function f is monotone if it preserves order, that is, if x ≤ y implies f(x) ≤ f(y). Anti-monotone (or antitone) functions f are those that reverse order, that is, for which x ≤ y implies f(y) ≤ f(x). Natural language constructions that exhibit these patterns are ubiquitous, spanning semantic and grammatical categories. Algorithms have been devised and studied for deriving monotonicity patterns in complex expressions composed of simpler functional expressions (van Benthem, 1986; Sánchez-Valencia, 1991; van Eijck, 2007). For instance, the interaction of quantifier and temporal expressions, together with the fact that 2 ≤ 5, guarantees that Any play that lasts more than 2 hours is too long entails (is "less than" in a sense to be made precise) Any play that lasts more than 5 hours is too long. There has been recent theoretical work on monotonicity reasoning as part of a general interest in "natural logic" (Bernardi, 2002; Zamansky et al., 2006; MacCartney and Manning, 2009; Muskens, 2010; Icard, 2012; Moss, 2012; Icard and Moss, 2013; Tune, 2016), and much of this work has made its way into psycholinguistics (Geurts, 2003; Geurts and van der Slik, 2005) and natural language processing (MacCartney and Manning, 2007; Angeli and Manning, 2014; Bowman et al., 2015; Abzianidze, 2015). (For review see Icard and Moss 2014.) Whereas monotonicity reasoning in natural language is often seen as comprising a fragment of higher-order logic, we can also construe it as encoding a logical system in its own right relative to a suitably coarsened model-theoretic interpretation. In this context standard metalogical questions such as completeness can be raised. A completeness result would tell us that a proof system is sufficient to derive everything that follows on the intended model of monotonicity reasoning.
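The way monotonicity markings compose along a chain of functional expressions (the operation written • in Definition 3.1 below) can be sketched as follows. This is an illustration of ours, not an implementation from the literature; the marking alphabet follows the paper.

```python
# A minimal sketch of how monotonicity markings compose along a chain
# of function applications.  Markings: '+' monotone, '-' antitone,
# '.' unknown/neither.

def compose(m1, m2):
    """Marking of the composite when the outer marking is m1 and the
    inner marking is m2: like signs give '+', unlike give '-'."""
    if '.' in (m1, m2):
        return '.'
    return '+' if m1 == m2 else '-'

def polarity(markings):
    """Overall polarity of a position nested under the given markings,
    listed from outermost to innermost."""
    out = '+'
    for m in markings:
        out = compose(out, m)
    return out

# A position under two antitone operators is upward monotone overall.
assert polarity(['-', '-']) == '+'
assert polarity(['-', '+', '-']) == '+'
assert polarity(['-', '.']) == '.'
```

This is exactly the bookkeeping that polarity-marking algorithms for natural language perform over parse trees.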
Though our primary interest here is natural language, it bears mention that such reasoning in higher-order settings is also ubiquitous in other areas, e.g., in mathematics. Consider the convergence test for improper integrals, which states that if 0 ≤ f(x) ≤ g(x) on an interval [a, ∞), then ∫_a^∞ f(x) dx converges if ∫_a^∞ g(x) dx does. As an example of this, note that knowing that ∫_1^∞ e^(−x) dx = e^(−1) converges allows us, by monotonicity reasoning alone, to infer that ∫_1^∞ e^(−x²) dx also converges. This argument, similar to those we will be considering, depends only on the monotonicity profiles of the relevant functions (multiplication, exponentiation, etc.) on the relevant domains. The aim of the present contribution is to formulate a suitable system for monotonicity reasoning in a higher-order setting, appropriate to the task of capturing common entailment patterns in natural language in particular, and to prove a completeness result for an associated proof system. Along the way we also prove an analogue of Lyndon's (1959) Theorem for first-order logic, showing exactly when, in our general setting, a subterm occurrence stands in a monotone or antitone position. For reasons of space, we skip some of the less central proofs.
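The pointwise inequality driving this comparison-test argument can be spot-checked numerically. The following is only an illustration of ours, not part of the formal development:

```python
import math

# On [1, oo) we have 0 <= e^{-x^2} <= e^{-x} (since x^2 >= x there),
# so convergence of the integral of e^{-x} on [1, oo), whose value is
# e^{-1}, forces convergence of the integral of e^{-x^2}.
# We spot-check the pointwise domination on a grid.

def dominated(f, g, a, b, steps=1000):
    """Check 0 <= f(x) <= g(x) on an evenly spaced grid over [a, b]."""
    for i in range(steps + 1):
        x = a + (b - a) * i / steps
        if not (0 <= f(x) <= g(x)):
            return False
    return True

assert dominated(lambda x: math.exp(-x * x), lambda x: math.exp(-x), 1, 50)
# The dominating integral converges to e^{-1}:
assert math.isclose(math.exp(-1), 0.36787944117144233)
```

Of course the grid check is no proof; the point is that the argument itself uses nothing beyond the monotonicity profiles of the functions involved.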

Motivating Example
To motivate the specific formal apparatus that we will employ, consider the following small fragment. As in previous work (Icard and Moss, 2013), we will be considering an extended simply typed lambda calculus where the functional types can be "marked" with monotonicity information, + for monotone, − for antitone, and · for neither (or unknown). Suppose we have two base types t and p, corresponding to truth values and predicates (more commonly, functions from entity type to truth value type), and the following typed terms: The first statement encodes the assumption that if all members of a given category run, it can be inferred that few members of that category fly. The second statement essentially says that deftly is subsective (Kamp and Partee, 1995): deftly v'ing involves v'ing (see Figure 1). The third and fourth capture basic lexical entailments. We can then use these assumptions to derive Few marsupials deftly soar from All mammals run. A proof using our monotonicity calculus appears in Figure 1. There are several important points to notice about this example. First, in order to state and use assumptions such as the first two above, we make crucial use of lambda abstraction and β-reduction.
Second, note that we can state entailment facts between terms even when those terms have different (marked) types, as in the first two statements. For example, though in our typing system λx.x will be of type p + → p, it can nonetheless be compared with deftly because (denotations of) terms of type p + → p can be semantically "coerced" to type p · → p. This is simply because the domain D p · →p for terms of type p · → p will be the class of all functions from D p to D p , which certainly includes all the monotone functions.
Third, it can be useful to derive monotonicity information for complex terms, e.g., so that we can derive (λx. all(x)(run))(mammal) ⪯ (λx. all(x)(run))(marsupial) in one step. Theorem 8.2 below guarantees that the way we type lambda abstractions is in a sense optimal.
The framework developed in this paper is motivated by the desire to capture patterns like these. Such patterns could be derived in an inequational system of full higher-order logic: given a constant ∨ for disjunction at a given type, it is easy to see that a term f will be a monotone function just in case we have λx.λy.f(x) ⪯ λx.λy.f(x ∨ y). Proofs of facts like that above might then be derived in a higher-order logic proof system with monotonicity declarations as additional premises. We of course could not have completeness in this setting, but more importantly, we would rather like to isolate and understand what is characteristic of monotonicity reasoning as such.
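The disjunction characterization of monotonicity just mentioned can be checked exhaustively over the two-element Boolean lattice. The encoding below is an illustration of ours:

```python
from itertools import product

# Over the Booleans (False <= True), a function f is monotone iff
# f(x) <= f(x or y) for all x, y.  We verify the equivalence by brute
# force for all four unary Boolean functions.

B = [False, True]

def monotone(f):
    return all(f(x) <= f(y) for x, y in product(B, B) if x <= y)

def satisfies_disjunction_law(f):
    return all(f(x) <= f(x or y) for x, y in product(B, B))

for f in [lambda x: x, lambda x: not x, lambda x: True, lambda x: False]:
    assert monotone(f) == satisfies_disjunction_law(f)
```

The identity and constants pass both tests; negation fails both, as expected of an antitone function.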
There are many instances of this kind of reasoning outside of natural language. As a simple illustration of the main concepts and definitions, throughout the paper we will be considering a running example of elementary mathematical reasoning about real number functions.

Types and Domains
Our set T of types is defined inductively from a set B of base types b: each b ∈ B is a type, and whenever σ and τ are types and m is a marking, σ m→ τ is a type.

Definition 3.1 (Markings and types). The set Mar of markings is {+, −, ·}. We use m and m′ to denote markings. We always take Mar to be ordered with + ⪯ ·, − ⪯ ·, and m ⪯ m for all m. We also define a binary operation • on Mar by + • + = +, + • − = −, − • + = −, − • − = +; and otherwise m • m′ = ·. Notice that • is associative.

We have a natural ordering on types, where σ ⪯ τ can be read as: any term of type σ could also be considered of type τ (cf. Def. 3.5 below).

Definition 3.2 (⪯ on types). Define ⪯ ⊆ T × T to be the least preorder with the property that whenever σ′ ⪯ σ, τ ⪯ τ′, and m ⪯ m′, we have σ m→ τ ⪯ σ′ m′→ τ′.

Definition 3.3 (the relation ↑ and the operation ∨ on types). (Mar, ⪯) is an upper semilattice, so we have a join operation ∨ on it. Explicitly, m ∨ m = m for all m, and for m ≠ m′, m ∨ m′ = ·. We also define ↑ to be the smallest relation on types, and ∨ to be the smallest partial function on types, with the properties that for all σ, τ1, and τ2: 1. σ ↑ σ, and σ ∨ σ = σ.
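As a running illustration (ours, not part of the formal development), the markings, the composition •, the join ∨, and the induced preorder ⪯ on types can be sketched as follows. The tuple encoding of marked function types is an assumption of this sketch, and we read the domain contravariantly, following the proof of Lemma 7.3:

```python
# Markings '+', '-', '.'; a function type sigma m-> tau is encoded as
# the tuple (sigma, m, tau); base types are strings.

def comp(m1, m2):          # the operation: like signs '+', unlike '-'
    if '.' in (m1, m2):
        return '.'
    return '+' if m1 == m2 else '-'

def mar_le(m1, m2):        # + below ., - below ., m below m
    return m1 == m2 or m2 == '.'

def mar_join(m1, m2):      # join in the upper semilattice (Mar, <=)
    return m1 if m1 == m2 else '.'

def type_le(s, t):
    """The preorder on types: contravariant in the domain,
    covariant in the codomain and the marking."""
    if isinstance(s, str) or isinstance(t, str):
        return s == t
    (s1, m1, t1), (s2, m2, t2) = s, t
    return type_le(s2, s1) and mar_le(m1, m2) and type_le(t1, t2)

assert comp('-', '-') == '+'
assert mar_join('+', '-') == '.'
# The coercion from Section 2: p +-> p below p .-> p, not conversely.
assert type_le(('p', '+', 'p'), ('p', '.', 'p'))
assert not type_le(('p', '.', 'p'), ('p', '+', 'p'))
```

The two final assertions replay the coercion of λx.x discussed in the motivating example.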
Definition 3.6. Here is a family of pre-structures called the full pre-structures, based on an assignment of preorders D_b to base types b. One defines D_σ by recursion on the height of σ: D_{σ ·→ τ} is the set of all functions from D_σ to D_τ; D_{σ +→ τ} = {f : D_σ → D_τ : f is monotone}; and D_{σ −→ τ} = {f : D_σ → D_τ : f is antitone}. The order in all cases is the pointwise order. We define the maps π_{σ,τ} in terms of the characterization in Proposition 3.4. For n = 0, the only time we have (σ, τ) ∈ R_0 is when σ = τ; in this case, we set π_{σ,σ} to be the identity on D_σ. Notice that each π_{σ,τ} is order-preserving (since τ must be σ when n = 0), and also that π_{σ,µ} = π_{τ,µ} ◦ π_{σ,τ}.
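The function domains of a full pre-structure can be computed concretely over a finite base preorder. The encoding of functions as tuples of outputs is our own illustration:

```python
from itertools import product

# Over a 3-element chain D = {0, 1, 2}:
#   D_{sigma .-> tau} is all functions,
#   D_{sigma +-> tau} the monotone ones,
#   D_{sigma --> tau} the antitone ones,
# each carrying the pointwise order.

D = [0, 1, 2]                      # a small chain standing in for D_sigma

def functions(dom, cod):
    """All functions dom -> cod, encoded as tuples of outputs."""
    return list(product(cod, repeat=len(dom)))

def is_monotone(f):
    return all(f[x] <= f[y] for x in D for y in D if x <= y)

def is_antitone(f):
    return all(f[y] <= f[x] for x in D for y in D if x <= y)

all_f = functions(D, D)
mono  = [f for f in all_f if is_monotone(f)]
anti  = [f for f in all_f if is_antitone(f)]

assert len(all_f) == 27                # 3^3 functions in all
assert len(mono) == len(anti) == 10    # C(5, 3) monotone maps on a 3-chain
assert set(mono) <= set(all_f)         # the inclusion behind the coercions
```

The last line is the semantic fact exploited in Section 2: the monotone functions form a subset of all functions, so a term of marked type can be "considered" at the unmarked type.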
Example 3.7. Take B = {r}, with r intuitively standing for real numbers. Then we will have types in T such as r, r +→ r, and r −→ r. We build the full pre-structure using D_r = R, the reals with the usual order ≤.

The Language L_λ of Terms

Our language L_λ is a variant of the typed λ-calculus which makes use of the marked types that we saw in Section 3. We begin with a set C of constants, each coming with a unique type, and a set V of variables, also with their types. We define the language L_λ of all terms using a typing calculus. Beginning with a set of typing statements determined from C and V, we define several things simultaneously: terms with their types (denoted M : σ, N : τ, etc.), occurrences of free variables in terms, and the valence of each free variable occurrence in M.

1. Each variable x : σ is a term of type σ. Further, x occurs free in itself in the evident way, and x is the only variable that occurs free in itself. The valence is +.
2. Each constant c : σ is a term, so there are no free occurrences of any variables in c.
3. Let m ∈ Mar. We have the following rule: if M : σ m→ τ and N : σ′ with σ′ ⪯ σ, then M(N) : τ. The free occurrences of x in M(N) are the free occurrences of x in M together with the free occurrences of x in N.
Any free occurrence of x in M(N) is either an occurrence in M, or an occurrence in N. An occurrence in M keeps the valence it has in M; an occurrence in N with valence m′ there gets valence m • m′ in M(N), where M : σ m→ τ. We define FV(M) to be the variables with free occurrences in M. We define BV(M) to be the variables with bound occurrences in M. (We have not defined these, but they are defined as usual.) The main point about the valences of variable occurrences will come shortly, in Lemma 5.2.
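The valence bookkeeping for application terms, where an occurrence in the argument picks up the composition of markings (as in the proof of Theorem 6.4), can be sketched as follows. The AST encoding is our own:

```python
# Terms: ('var', name), ('const', name, marking), or ('app', M, N).
# In M(N), occurrences in M keep their valence; an occurrence in N
# with valence m2 gets m1 * m2, where m1 is the marking of M's type.

def comp(m1, m2):
    if '.' in (m1, m2):
        return '.'
    return '+' if m1 == m2 else '-'

def valences(term, x):
    """List the valences of the free occurrences of x in term,
    in left-to-right order."""
    kind = term[0]
    if kind == 'var':
        return ['+'] if term[1] == x else []
    if kind == 'const':
        return []
    _, M, N = term
    m1 = M[2] if M[0] == 'const' else '.'   # marking of the functor
    return valences(M, x) + [comp(m1, m2) for m2 in valences(N, x)]

# not(not(x)): x sits under two antitone operators, so its valence is +.
t = ('app', ('const', 'not', '-'),
            ('app', ('const', 'not', '-'), ('var', 'x')))
assert valences(t, 'x') == ['+']
```

In a fuller implementation the functor's marking would be read off its marked type; here constants simply carry their marking.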
Remark 4.1. Note that every term M has a unique type. For this reason, we often omit the type when it is not pertinent to the discussion.
Example 4.2. We build on Example 3.7. Let us take the set C of constants to be given as follows: We shall present the semantics of terms in Section 5 below; speaking of the "standard interpretation" of these constants before then is a harmless abuse. Further, we take variables x, y, z : r, and f, g : r ·→ r. Figure 2 has examples of terms, again with types and semantics under a valuation φ. We assume that the semantics interprets the constants as above. Of course, it would be more sensible to write 1 + 1 instead of +(1)(1).

Semantics of L λ : Structures
At this point, we turn to the semantics of our language. We interpret L λ in what we call structures. These are pre-structures together with additional information needed to interpret variables and constant symbols.
Definition 5.1. Let D be a pre-structure. We let Φ = Φ(D) be the set of functions φ whose domain is the set of (typed) variables, with the property that if x : σ, then φ(x) ∈ D σ . We call such functions φ valuations in D.
An interpretation function in D is a function mapping the terms M of the language L_λ together with valuations to elements of D. A structure is a pair S = (D, [[ ]]), where D is a pre-structure, and [[ ]] is an interpretation in D such that the following conditions hold: In the last point, we use our notation for modifying functions when we write φ^d_y, the valuation which agrees with φ except that y is mapped to d. Again, see Figure 2 for examples.

Positivity Entails Monotonicity; Negativity Entails Antitonicity
Recall that positive or negative occurrences of variables are syntactic notions, whereas monotonicity and antitonicity are semantic notions. One of the contributions of this paper is to explore the connection between these notions. Lemma 5.2 states, roughly, that (1) if all free occurrences of x in a term are positive, then the term's interpretation is monotone in the value assigned to x, and (2) if all are negative, it is antitone; the proof is by induction on terms. For application terms M(N), we first consider the case when M is of functional type σ +→ τ. In this case, all occurrences of x in N must be +. By the induction hypothesis, the corresponding semantic inequalities hold; we have suppressed the type information on the inequality signs ≤.
The case when M is of negative functional type is similar. This concludes our (abridged) discussion of point (1).
We turn to (2). Suppose that all free occurrences of x in M (N ) are −. Then all free occurrences of x in M are −. We again have two cases.
First, we consider the case when M is of positive functional type. Then all free occurrences of x in N must be −, and the argument proceeds as before. This concludes our work on application terms. We conclude the overall induction by considering abstraction terms λy.M. Let a ≤ b. To see that [[λy.M]]φ^a_x ≤ [[λy.M]]φ^b_x, we compare the two functions at each point d. Note that we apply the induction hypothesis to φ^d_y, not to φ. Since this holds for all d, (1) follows.

Term Substitution and Reduction
A substitution is a function s from variables to terms, sending x : σ to some s(x) : σ′ with σ′ ⪯ σ. One example is the identity substitution Id. For any substitution s, any variable x : σ, and any M : σ′ with σ′ ⪯ σ, we get a new substitution s^M_x, defined by s^M_x(x) = M, and for y ≠ x, s^M_x(y) = s(y). When the subscript/superscript notation becomes cumbersome, we might change it. For example, we usually write Id^M_x as [M/x]. The notion of capture-avoiding substitution is something of a challenge to get correct. We adopt the definitions of Stoughton (1988) and then quote the results from that paper, adapted to our setting.
Given a term M and a substitution s, we define M[s] by induction on M. It represents the result of substituting, for each x, s(x) for every free occurrence of x in M. We only use the notation M[s] when no variable occurs bound in M and free in any s(x). (That is, we insist that no variable free in any s(x) has bound occurrences in M.) In the last line of the definition, y is the least variable in some pre-set list such that y is not free in M, nor in any s(z) for z free in M. Also, s^y_x is just like s except that s^y_x(x) = y. But y can be any variable z with those properties; by Corollary 3.11 of Stoughton (1988), the choice does not matter up to ≈.

We conclude with the induction step for abstraction in the proof of Lemma 6.2. Let M be λz.N with z ≠ x. We assume our lemma for N, and we have an occurrence of x in M; its valence there is the same as the valence of the corresponding occurrence in N. Recall that M[s] is λw.N[s^w_z], with w suitably fresh. A free occurrence of y in M[s] corresponds to a free occurrence in N[s^w_z], and the valence is the same. Our result follows from the induction hypothesis.
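A minimal capture-avoiding substitution in the style just described can be sketched as follows. The toy untyped syntax and the fresh-variable policy are our own simplifications of Stoughton's scheme:

```python
# Terms: ('var', x), ('app', M, N), ('lam', x, body).
# Binders are renamed to a fresh variable before substituting under
# them, so free variables of the substituted terms cannot be captured.

def free_vars(t):
    kind = t[0]
    if kind == 'var':
        return {t[1]}
    if kind == 'app':
        return free_vars(t[1]) | free_vars(t[2])
    _, x, body = t
    return free_vars(body) - {x}

def fresh(avoid):
    """Least variable v0, v1, ... not in avoid (the 'pre-set list')."""
    i = 0
    while f'v{i}' in avoid:
        i += 1
    return f'v{i}'

def subst(t, s):
    """Apply substitution s (a dict, identity elsewhere) to t."""
    kind = t[0]
    if kind == 'var':
        return s.get(t[1], t)
    if kind == 'app':
        return ('app', subst(t[1], s), subst(t[2], s))
    _, x, body = t
    avoid = set().union(set(), *(free_vars(s.get(v, ('var', v)))
                                 for v in free_vars(body) - {x}))
    y = fresh(avoid)
    return ('lam', y, subst(body, {**s, x: ('var', y)}))

# [y/x] applied to (lambda y. x) must not capture: the binder is renamed.
out = subst(('lam', 'y', ('var', 'x')), {'x': ('var', 'y')})
assert out == ('lam', 'v0', ('var', 'y'))
```

This sketch renames every binder it passes; Stoughton's definitions rename only when needed, but the two agree up to α-equivalence.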
The next two results will guarantee that the usual reduction rules of lambda calculus involve well-defined operations on our set of terms.

Theorem 6.3 (Subject Reduction Theorem for types). The terms (λx.M)N and M[N/x] have the same type.

Proof. The type of (λx.M)N is the type of M, so the result follows from Lemma 6.1.

Theorem 6.4 (Subject Reduction Theorem for valences). Consider a free occurrence occ of y in (λx.M)N with valence m, either + or −. Also, consider the term that results from (β) reduction, M[N/x]. Then the occurrences of y in M[N/x] which correspond to occ also have valence m.
Proof. If the free occurrence of y is in λx.M , then our result is easy. So we focus on the case when it is in N . Now m = m 1 •m 2 , where m 1 is such that λx.M : σ m 1 → τ , and m 2 is the valence of occ in N . We are assuming that m 1 is either + or −. By the way we type abstractions, all free occurrences of x in M have valence m 1 . By Lemma 6.2, the occurrences of y which correspond to occ also have valence m 1 • m 2 .
Definition 6.5. Define ≈ to be the least equivalence relation between L_λ-terms closed under the usual conversion rules, including (α), (β), (η), and the congruence rule (ξ). In the (η) rule we also assume x ∉ FV(M). The following proposition guarantees that equivalent terms are assigned the same meaning.
Proposition 6.6. If M ≈ N, then for all S and φ, [[M]]φ = [[N]]φ.

We say that a term M is in normal form if it has no β- or η-redexes, these defined in the usual way.

Term Structures
In this section, we outline a method to define a pre-structure from a preorder on terms of the language. Given a term M, we denote its ≈-equivalence class by ⟨M⟩. When we define a function ι on the ≈-equivalence classes, we generally write ι⟨M⟩ rather than ι(⟨M⟩). For each type σ, let T_σ be the set of ≈-equivalence classes of terms of type σ; since a term of type σ may also be considered of type τ whenever σ ⪯ τ, we obtain inclusion maps i_σ,τ : T_σ → T_τ for σ ⪯ τ.

Proposition 7.1. The family i_σ,τ has the following functoriality properties: i_σ,σ is the identity on T_σ, and if σ ⪯ τ ⪯ µ, then i_σ,µ = i_τ,µ ◦ i_σ,τ.
Definition 7.2. A term structure T is a family {(T_τ, ⪯_τ)} of preorders, subject to the following:

Lemma 7.3. For any term structure and any type σ, if M ⪯_σ N and σ ⪯ τ, then also M ⪯_τ N. In other words, the inclusion maps i_σ,τ are order-preserving.
Proof. By induction on types. For base types the order is trivial, so suppose that M ⪯_{σ m→ τ} N, and that σ m→ τ ⪯ σ′ m′→ τ′, so that σ′ ⪯ σ, τ ⪯ τ′, and m ⪯ m′. Then M ⪯_{σ′ m′→ τ′} N: the second implication in the verifying chain is because T_{σ′} ⊆ T_σ, and the third implication is by the induction hypothesis.
Proposition 7.4. For any term structure {T_τ}_{τ∈T} there is an associated pre-structure {D_τ}_{τ∈T} with order-isomorphisms ι_τ : T_τ → D_τ, such that:

Proof. We build preorders {D_τ}_{τ∈T} and order-isomorphisms ι_τ : T_τ → D_τ using recursion on the set of types. For base types b ∈ B we simply take D_b = T_b, and ι_b is the identity. Suppose we have already defined D_σ and D_τ, and we have isomorphisms ι_σ : T_σ → D_σ and ι_τ : T_τ → D_τ. For ⟨M⟩ ∈ T_{σ m→ τ}, we define M* : D_σ → D_τ by M*(ι_σ⟨P⟩) = ι_τ⟨M(P)⟩, and we take D_{σ m→ τ} = {M* : ⟨M⟩ ∈ T_{σ m→ τ}}. In other words, we define M* exactly so that (3) is satisfied. The map M* is well-defined because ≈ respects term application. The order-embedding ι_{σ m→ τ} : T_{σ m→ τ} → D_{σ m→ τ} is given by ι_{σ m→ τ}⟨M⟩ = M*. We show this map is 1-1.
Suppose that M* = N*. Then for a fresh variable v, M(v) ≈ N(v). By rule (ξ) we also have λv.M(v) ≈ λv.N(v), and by two applications of (η) and transitivity we have M ≈ N, whence ⟨M⟩ = ⟨N⟩.
It remains only to show that {D_τ}_{τ∈T} is a well-defined pre-structure with maps π_{σ,τ} given by π_{σ,τ} = ι_τ ◦ i_{σ,τ} ◦ ι_σ^{−1}. Condition 1 in Definition 3.5 holds trivially. Condition 2 comes from condition 1 on the term structure, condition 3 from point 2, condition 4 from point 3, and condition 7 from Lemma 7.3. The functoriality properties 5 and 6 come from Proposition 7.1.

Lemma 7.5 is the main construction of semantic models for our calculus besides the full structures which we saw in Definition 3.6 and Example 5.3. In it, note that if ψ is an assignment function (a map from variables to terms), then composing with the natural map (taking terms to ≈-classes) gives a map into the term structure. So further composing with ι gives a valuation into a pre-structure. We thus define ψ* to be the valuation function given by ψ*(x) = ι⟨ψ(x)⟩ for all x. What is more, every valuation function into a model of this type is of the form ψ*, and ψ is determined uniquely up to ≈.
Lemma 7.5. Let T be a term structure, and let D be its associated pre-structure from Proposition 7.4. Define an interpretation function in D by setting [[M]]ψ* = ι⟨M[ψ]⟩. Finally, consider a term λx.M, and fix a substitution ψ. Let a ∈ D_σ, and let A be a term such that ι⟨A⟩ = a. Let y be a variable which is not free in A, and also not free in M or any ψ(z) for z a free variable of M. Then the required equation for abstraction terms follows by a direct computation. This completes the proof.

Monotonicity Entails Positivity; Antitonicity Entails Negativity
The main result of this section is Theorem 8.2, a converse (of sorts) to Lemma 5.2.

A Term Structure Built "Freely" from an Inequality
The proof of Theorem 8.2 employs a specific term structure. Let σ be a type, and let x, y, and z be distinct variables of type σ. We take T = T(x, y, z) to be the term structure obtained by defining, for each type ρ, ⟨P⟩ ⪯_ρ ⟨Q⟩ if and only if the following holds: P and Q are in normal form, there is a term S with no occurrences of y or z, and there are pairwise disjoint sets A, B, Y, and Z of occurrences of x in S such that all occurrences in A are positive, all occurrences in B are negative, and (5) holds. In other words, if ⟨P⟩ ⪯_ρ ⟨Q⟩, then we can obtain Q from P, assuming these are in normal form, by "increasing" some positive occurrences of y to z (those occurrences in A) and "decreasing" some negative occurrences of z to y (those in B). The sets Y and Z are needed in order to make the whole construction work.

Lemma 8.1. T is a term structure.

Theorem 8.2. Fix a normal form M and a variable x : σ occurring freely in M. Then: (1) (a) every free occurrence of x in M is positive if and only if (b) in every structure, [[M]] is monotone in the value assigned to x; and (2) (a) every free occurrence of x in M is negative if and only if (b) in every structure, [[M]] is antitone in the value assigned to x.

Proof. The (a) ⇒ (b) directions follow from Lemma 5.2. We show (b) ⇒ (a). We only argue that "monotone implies positive", as the argument that "antitone implies negative" is similar. Let M : τ be a term, and suppose y : σ and z : σ are distinct variables not appearing in M. Fix a normal form M and a variable x : σ that occurs freely in it. Take T to be the term structure T(x, y, z) studied in Lemma 8.1. Take φ to be the valuation generated by the identity substitution, φ(w) = ι⟨id(w)⟩ = ι⟨w⟩.
Let (D, [[ ]]) be the structure obtained from T using Lemma 7.5. We apply (1b) to this structure. By monotonicity of ι, ι⟨y⟩ ≤_σ ι⟨z⟩ in D. We thus see that in D_τ, ι⟨M[y/x]⟩ ≤_τ ι⟨M[z/x]⟩, so that ⟨M[y/x]⟩ ⪯_τ ⟨M[z/x]⟩. Notice that M[y/x] and M[z/x] are β-normal forms, since M is a β-normal form. By definition of the order in T, there is a term S and sets of free occurrences of x in S, say A, B, Y, and Z, such that all occurrences in A are positive, all occurrences in B are negative, and (6) and (7) hold. However, z does not occur in the term on the left of (6), since z does not occur in M. And so B = Z = ∅. Similarly, y does not occur in the term on the right of (7), and so Y = ∅. Hence every marked occurrence of x in S lies in A and is positive, and since these correspond exactly to the free occurrences of x in M, every free occurrence of x in M is positive, as desired.

A Complete Proof System
We come to the centerpiece of this work, the Monotonicity Calculus given by the rules of inference in Figure 3.
Syntax. Our setting is similar to equational reasoning in simply typed lambda calculus (Friedman, 1975); however, our calculus deals with inequality assertions M ⪯_σ N. We make such assertions when the types of M and N are both ⪯ σ. We use Γ for a set of inequality assertions. We write Γ ⊢ M ⪯_σ N if there is a proof of M ⪯_σ N from Γ, that is, if there is a finite tree with root M ⪯_σ N, and each node either a leaf from Γ, or an application of one of the rules in Figure 3.

Example 9.1. In Figure 4 we give a derivation using our ongoing example of real functions. The derivation is similar to the one depicted in Figure 1; we only include this one in full, for reasons of space. The proof of "1 − 1 ≤ 2 − 0" uses two basic assumptions: "0 ≤ 1" and "x ≤ x + 0 for any x." Note in particular the use of Lemma 9.5. (We could alternatively have assumed λx.x ⪯ λx.+(x)(0), in the same way we used the assumption deftly ⪯ λx.x in Figure 1.)

Semantics. Frequently we leave off the type σ in assertions S ⊨_φ M ⪯_σ N. We also write Γ ⊨ M ⪯ N if for all structures S such that S ⊨_φ G ⪯ H for all G ⪯ H ∈ Γ and all assignments φ, we also have S ⊨_φ M ⪯ N for all assignments φ.

The next result is a key fact about our system. It emphasizes the fact that we take open assertions in hypothesis sets Γ to be "universally quantified."

Proposition 9.3. Let M : σ1 m1→ τ1 and N : σ2 m2→ τ2 be terms, let m1, m2 ⪯ m, and let σ and τ be types with σ ⪯ σ1, σ2 and τ1, τ2 ⪯ τ. Thus, σ1 m1→ τ1 ⪯ σ m→ τ and σ2 m2→ τ2 ⪯ σ m→ τ. If Γ ⊢ M(x) ⪯_τ N(x) for a variable x : σ not occurring in M or N, then Γ ⊢ M ⪯_{σ m→ τ} N.

Proof. Let x be a variable of type σ which does not occur in M or N. Our hypotheses tell us that Γ ⊢ M(x) ⪯_τ N(x). By (Func), Γ ⊢ λx.M(x) ⪯_{σ m→ τ} λx.N(x). By (η), (Equiv), and (Trans), Γ ⊢ M ⪯_{σ m→ τ} N.
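To illustrate proof search over rules of this shape, here is a drastically simplified checker of ours: untyped "terms," assumptions as ordered pairs, and only analogues of (Refl), (Trans), and (Mono):

```python
# A toy derivability check for inequality assertions M <= N from a set
# gamma of assumptions.  The rule names echo Figure 3, but the real
# calculus has typed terms, markings, and several further rules.

def derive(gamma, goal, depth=4):
    """Search for a short derivation of goal = (M, N) from gamma."""
    M, N = goal
    if goal in gamma or M == N:                  # assumption or (Refl)
        return True
    if depth == 0:
        return False
    terms = {t for pair in gamma for t in pair}
    if any(derive(gamma, (M, K), depth - 1) and
           derive(gamma, (K, N), depth - 1)
           for K in terms):                      # (Trans)
        return True
    if (isinstance(M, tuple) and isinstance(N, tuple)
            and M[0] == N[0]):                   # (Mono): f monotone
        return derive(gamma, (M[1], N[1]), depth - 1)
    return False

gamma = {(2, 5), (5, 9)}
# With 2 <= 5 and a monotone f, conclude f(2) <= f(5); chain via (Trans).
assert derive(gamma, (('f', 2), ('f', 5)))
assert derive(gamma, (2, 9))
assert not derive(gamma, (9, 2))
```

The depth bound stands in for a real proof-search strategy; the point is only that the rules compose in the way the completeness theorem requires them to.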
Remark 9.4. We do not know whether Proposition 9.3 holds without (Func). This would be important if one were to revise the semantics of the calculus. Currently Γ ⊨ M ⪯ N means that for all structures S such that S ⊨_φ G ⪯ H for all G ⪯ H ∈ Γ and all assignments φ, we also have S ⊨_φ M ⪯ N for all assignments φ. Suppose we wish to change this to mean: for all S and φ such that S ⊨_φ G ⪯ H for all G ⪯ H ∈ Γ, S ⊨_φ M ⪯ N for the same φ. Then our rules (ξ) and (Func) are no longer sound. We conjecture that dropping (ξ) and (Func) results in a complete system with the revised semantic interpretation.

Soundness
We fix a structure S and a valuation φ making all sentences in Γ true in S. We show by induction on derivations from Γ that if Γ ⊢ M ⪯_σ N, with the types of M and N being σ1 ⪯ σ and σ2 ⪯ σ, then π_{σ1,σ}([[M]]φ) ≤_σ π_{σ2,σ}([[N]]φ). We use the properties of pre-structures in Definition 3.5.
The most basic derivations from Γ are the elements of Γ itself. This case is trivial.
For (Ref), we use the fact that each relation σ is reflexive. We omit the easy details on (Trans).
In the rest of this proof, we shall deal with assertions Γ M N without any notation for the overall type; that is, we shall assume that the types of M and N are exactly σ.
Then by the "pointwise property" (Definition 3.5 part 4) of S, the desired inequality of function values follows. For (Mono), let M : σ +→ τ; since [[M]]φ belongs to D_{σ +→ τ}, it is a monotone function, and soundness of the rule follows. The (Anti) rule is treated similarly. The soundness of the (Equiv) rule follows easily from Proposition 6.6.

Completeness
We next show that the monotonicity calculus is complete. Assume that Γ ⊨ M ⪯_σ N, with the types of M and N being σ1 and σ2. We shall show that Γ ⊢ M ⪯_σ N, using a term structure T whose associated structure S from Lemma 7.5 is called the canonical model of Γ. We recall Definition 7.2 and the notation there, especially the fact that each T_σ is the set of ≈-equivalence classes of terms of type σ. We order T_σ by: ⟨P⟩ ⪯_σ ⟨Q⟩ if and only if Γ ⊢ P ⪯_σ Q. This relation is well-defined by Proposition 9.2.
Claim 9.8. T is a term structure.
Lemma 9.9. If Γ ⊢ G ⪯_σ H, then S ⊨_φ G ⪯_σ H for every assignment φ. In particular, for every assertion G ⪯_σ H in Γ, S ⊨ G ⪯_σ H.
Proof. Suppose Γ ⊢ G ⪯_σ H, fix an assignment φ, and let ψ be a substitution such that ψ* = φ. By Lemma 7.5, [[G]]φ = ι⟨G[ψ]⟩ and [[H]]φ = ι⟨H[ψ]⟩. Since Γ ⊢ G[ψ] ⪯_σ H[ψ] as well, the definition of the order on T_σ gives [[G]]φ ≤_σ [[H]]φ, as desired.
We conclude the proof of completeness. We started with Γ ⊨ M ⪯ N, and now we show that Γ ⊢ M ⪯ N. By Lemma 9.9, the canonical model S satisfies Γ under all assignments. We apply this with the assignment φ given by φ(x) = ι⟨x⟩. Therefore [[M]]φ ≤_σ [[N]]φ, that is, ι⟨M⟩ ≤_σ ι⟨N⟩. Since ι is an order-isomorphism, ⟨M⟩ ⪯_σ ⟨N⟩ in T, which by definition of the order means Γ ⊢ M ⪯_σ N, as desired.

Conclusion
We have presented a calculus extending the simply typed lambda calculus with enough order-theoretic infrastructure to represent arguments about increasing and decreasing functions. The calculus provides a mathematical foundation for a style of monotonicity reasoning that is often implicit in practical NLP work (e.g., MacCartney and Manning 2007; Angeli and Manning 2014; Angeli et al. 2016). Typically this research draws upon lexical resources such as WordNet and entailment relations learned from data, and uses these assumptions as input for proof search over derivations similar to those we considered here. Functional expressions will be marked as monotone or antitone either "by hand" or in an automated way. In addition to formalizing the proof procedures used in existing work, we believe the present study also suggests further possibilities for applied work. For instance, we have shown how more complex entailment assumptions can be stated and used in a flexible way, e.g., allowing comparison between functions of different polarity types (recall the example in Section 2).
From a technical point of view, our completeness result is analogous to a standard result for simply typed lambda calculus due to Friedman (1975). While our setting is considerably more complex, there still remain open issues here that were settled in the simpler setting: e.g., how to obtain completeness for "full" structures (Ex. 5.3). We leave such questions for future work.