Computational Syntax-Semantics Interface with Type-Theory of Acyclic Recursion for Underspecified Semantics

The paper provides a technique for an algorithmic syntax-semantics interface in computational grammar with underspecified semantic representations of human language. The technique is introduced for expressions that contain NP quantifiers, by using a computational, generalised Constraint-Based Lexicalised Grammar (GCBLG) that represents major, common syntactic characteristics of a variety of approaches to formal grammar and natural language processing (NLP). Our solution can be realised by any of the grammar formalisms in the CBLG class, e.g., Head-Driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Categorial Grammar (CG). The type-theory of acyclic recursion $L^{\lambda}_{ar}$ provides a facility for representing major semantic ambiguities, as underspecification, at the object level of the formal language of $L^{\lambda}_{ar}$, without recourse to meta-language variables. Specific semantic representations can be obtained by instantiations of underspecified $L^{\lambda}_{ar}$-terms, in context. These are subject to constraints provided by a newly introduced feature-structure description of the syntax-semantics interface in GCBLG.


Introduction
Ambiguity permeates human language, in all of its manifestations, by interdependences across lexicon, syntax, semantics, discourse, context, etc. Alternative interpretations may persist even when specific context and discourse resolve or discard some specific instances in syntax and semantics. We present a computational grammar that integrates lexicon, syntax, types, constraints, and semantics. The formal facilities of the grammar have components that integrate syntactic constructions with semantic representations. The syntax-semantics interface, internally in the grammar, handles some ambiguities as phenomena of underspecification in human language.
We employ a computational grammar, which we call Generalised Constraint-Based Lexicalised Grammar (GCBLG). The formal system GCBLG uses feature-value descriptions and constraints in a grammar with a hierarchy of dependent types, which covers lexicon, phrasal structures, and semantic representations. In GCBLG, for the syntax, we use feature-value descriptions, similar to those in Sag et al. (2003), which are presented formally in Loukanova (2017a) as a class of formal languages designating mathematical structures of functional domains of linguistic information. GCBLG is a generalisation from major lexical and syntactic facilities of frameworks in the class of Constraint-Based Lexicalist Grammar (CBLG) approaches. To some extent, this is reminiscent of Vijay-Shanker and Weir (1994). We lift the idea of extending classic formal grammars to cover semantic representations with semantic underspecification via a syntax-semantics interface within computational grammar.
The grammar rules and constraints in GCBLG, in syntax and lexicon, carry semantic representations. The formal language of the semantic representations is a specialised feature-value encoding of terms of the formal language of acyclic recursion $L^{\lambda}_{ar}$, see Moschovakis (2006).
Types of $L^{\lambda}_{ar}$: The set Types is the smallest set defined recursively, e.g., by rules presented in Backus-Naur form.
Typed Vocabulary of $L^{\lambda}_{ar}$: For each type $\tau \in \mathrm{Types}$, $L^{\lambda}_{ar}$ has typed constants and typed variables of two kinds, pure variables (PV) and recursion variables (RV). The set of constants $K$ is a denumerable (e.g., finite) set. The sets of constants and of variables of both kinds are mutually distinct: $K \neq RV$, $RV \neq PV$, $K \neq PV$.
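For orientation, the type rules referred to above can be written out as follows. This is a sketch based on the presentation of the types in Moschovakis (2006), not a reproduction of a formula from this paper. The abbreviations $\tilde{e}$ and $\tilde{t}$, for state-dependent entities and state-dependent truth values, are the ones customary in that literature.

```latex
\tau \;::=\; e \;\mid\; t \;\mid\; s \;\mid\; (\tau_1 \to \tau_2)
\qquad
\tilde{e} \equiv (s \to e), \quad \tilde{t} \equiv (s \to t)
```

Here $e$ is the type of entities, $t$ the type of truth values, and $s$ the type of states, so that a term such as barks has a type of the form $(\tilde{e} \to \tilde{t})$.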
Terms of $L^{\lambda}_{ar}$: The set of the terms of $L^{\lambda}_{ar}$ is $\mathrm{Terms} = \bigcup_{\tau \in \mathrm{Types}} \mathrm{Terms}_{\tau}$, where, for every $\tau \in \mathrm{Types}$, the terms in $\mathrm{Terms}_{\tau}$ are defined recursively as follows:
• Constants: If $c \in K_{\tau}$, then $c \in \mathrm{Terms}_{\tau}$, denoted by $c : \tau$ and $c^{\tau}$.
• Variables: If $x \in PV_{\tau} \cup RV_{\tau}$, then $x \in \mathrm{Terms}_{\tau}$, denoted by $x : \tau$ and $x^{\tau}$.
• Application Terms: If $A \in \mathrm{Terms}_{(\sigma \to \tau)}$ and $B \in \mathrm{Terms}_{\sigma}$, then $A(B) \in \mathrm{Terms}_{\tau}$, denoted by $A(B) : \tau$.
• λ-Abstraction Terms: If $x \in PV_{\sigma}$ and $B \in \mathrm{Terms}_{\tau}$, then $\lambda(x)(B) \in \mathrm{Terms}_{(\sigma \to \tau)}$, denoted by $\lambda(x)(B) : (\sigma \to \tau)$.
• Recursion Terms: For any $n \geq 0$, if $A_i \in \mathrm{Terms}_{\sigma_i}$ ($i = 0, \dots, n$) and $p_i \in RV_{\sigma_i}$ ($i = 1, \dots, n$) are such that $p_1, \dots, p_n$ are pairwise different, and the sequence $\{p_1 := A_1, \dots, p_n := A_n\}$ satisfies the Acyclicity Constraint, i.e., it is acyclic, then $[A_0 \text{ where } \{p_1 := A_1, \dots, p_n := A_n\}] \in \mathrm{Terms}_{\sigma_0}$. The type assignment of the recursion term is denoted by $[A_0 \text{ where } \{p_1 := A_1, \dots, p_n := A_n\}] : \sigma_0$.
Acyclicity Constraint (AC): The sequence of assignments $\{p_1 := A_1, \dots, p_n := A_n\}$ is acyclic iff there is a function $\mathrm{rank} : \{p_1, \dots, p_n\} \to \mathbb{N}$ such that, for all $p_i, p_j \in \{p_1, \dots, p_n\}$, if $p_j$ occurs freely in $A_i$, then $\mathrm{rank}(p_j) < \mathrm{rank}(p_i)$.
We use the meta-symbol $\equiv$ for identity between expressions, e.g., $E_1 \equiv E_2$, and in abbreviations. The sets $\mathrm{FreeV}(A)$ and $\mathrm{BoundV}(A)$, respectively of the free and bound variables of a term $A$, are defined by structural recursion, in the usual way, with the exception of the recursion terms. For any given recursion term $A \equiv [A_0 \text{ where } \{p_1 := A_1, \dots, p_n := A_n\}]$, all occurrences of $p_1, \dots, p_n \in RV$ in $A$ are bound, and all other free (bound) occurrences of variables (constants) in $A_0, \dots, A_n$ are also free (bound) in $A$.
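The Acyclicity Constraint can be checked mechanically. The following is a minimal sketch, not part of the formal system: it assumes the usual condition from Moschovakis (2006), namely that rank(p_j) < rank(p_i) whenever p_j occurs freely in A_i, and it assumes that each assignment p_i := A_i is summarised only by the set of recursion variables occurring freely in A_i. The name `rank_assignment` and the dictionary `occurs` are hypothetical.

```python
from typing import Dict, Optional, Set


def rank_assignment(occurs: Dict[str, Set[str]]) -> Optional[Dict[str, int]]:
    """Try to build rank: {p_1, ..., p_n} -> N for a system of assignments.

    occurs[p_i] is the set of recursion variables occurring freely in A_i.
    Variables that are not keys of `occurs` are not assigned by this system
    and impose no constraint. Returns None if the system is cyclic.
    """
    rank: Dict[str, int] = {}
    visiting: Set[str] = set()

    def visit(p: str) -> Optional[int]:
        if p in rank:
            return rank[p]
        if p in visiting:              # p (indirectly) depends on itself
            return None
        visiting.add(p)
        r = 0
        for q in occurs[p]:
            if q not in occurs:        # q stays free in the whole term
                continue
            rq = visit(q)
            if rq is None:
                return None
            r = max(r, rq + 1)         # enforce rank(q) < rank(p)
        visiting.discard(p)
        rank[p] = r
        return r

    for p in occurs:
        if visit(p) is None:
            return None
    return rank


# {p := A_1 with q free in A_1, q := A_2} is acyclic; {p := ..q.., q := ..p..} is not.
print(rank_assignment({"p": {"q"}, "q": set()}))
print(rank_assignment({"p": {"q"}, "q": {"p"}}))
```

A returned dictionary is one witness for the rank function required by AC; `None` signals that no such function exists, so the candidate recursion term is not well-formed.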
The reduction calculus of $L^{\lambda}_{ar}$ has a set of reduction rules that reduce each $L^{\lambda}_{ar}$-term $A$ to its canonical form $\mathrm{cf}(A)$, which is unique up to congruence, i.e., $A \Rightarrow \mathrm{cf}(A)$. Informally, for every $A, B \in \mathrm{Terms}$, $A$ and $B$ are algorithmically equivalent, $A \approx B$, iff either (1) $A$ and $B$ are proper terms and their denotations are equal and computed by the same algorithm determined by $\mathrm{cf}(A)$ and $\mathrm{cf}(B)$; or (2) $A$ and $B$ are immediate, and $A$ and $B$ have the same denotations.
See Moschovakis (2006) and Loukanova (2016, 2019, 2018) for details on the denotational and algorithmic semantics of $L^{\lambda}_{ar}$. For the first representation of semantic underspecification with the theory of acyclic algorithms, see Loukanova (2007).

Underspecified Terms and Universal Syntax
Definition 1 (Underspecified $L^{\lambda}_{ar}$-Terms). For any $A \in \mathrm{Terms}$, we call $A$ an underspecified term in case $\mathrm{FreeV}(A) \cap RV \neq \emptyset$, i.e., when $A$ has free occurrences of recursion variables; otherwise $A$ is specified.
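Definition 1 can be illustrated on a small term representation. The sketch below is only an illustration under simplifying assumptions: types are omitted, the class names (`Const`, `Var`, `App`, `Lam`, `Where`) are hypothetical, and the verb is treated as a plain constant although the paper renders it by a complex term. The function `free_vars` follows the description of free and bound occurrences for recursion terms, and `is_underspecified` tests whether any free variable is a recursion variable.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str
    recursion: bool = False    # True for members of RV, False for PV


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    var: Var
    body: "Term"


@dataclass(frozen=True)
class Where:
    head: "Term"
    assignments: Tuple[Tuple[Var, "Term"], ...]


Term = Union[Const, Var, App, Lam, Where]


def free_vars(t: Term) -> FrozenSet[Var]:
    """Free variables by structural recursion."""
    if isinstance(t, Const):
        return frozenset()
    if isinstance(t, Var):
        return frozenset({t})
    if isinstance(t, App):
        return free_vars(t.fun) | free_vars(t.arg)
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    # Recursion term: p_1, ..., p_n are bound in the head and in every A_i.
    bound = frozenset(p for p, _ in t.assignments)
    parts = free_vars(t.head)
    for _, a in t.assignments:
        parts |= free_vars(a)
    return parts - bound


def is_underspecified(t: Term) -> bool:
    """A term is underspecified iff FreeV(t) contains a recursion variable."""
    return any(v.recursion for v in free_vars(t))


# b(p) where {b := barks}: p is a free recursion variable.
b, p = Var("b", True), Var("p", True)
t_b = Where(App(b, p), ((b, Const("barks")),))
print(is_underspecified(t_b))
```

Adding an assignment for `p` to the same `Where` term removes the last free recursion variable, so the resulting term counts as specified in the sense of the definition.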
We represent some of the semantic ambiguities of natural language sentences by rendering them into underspecified $L^{\lambda}_{ar}$-terms. In this paper, we consider a class of typical ambiguous sentences, which have different scope readings due to occurrences of multiple quantifier NPs in them. For any such sentence $\Phi$, a direct representation of the alternative readings, by a set of different $L^{\lambda}_{ar}$-terms, is available, e.g., $\Phi \xrightarrow{\text{render}} A_i$, $i = 1, \dots, n$, for some $n \geq 1$. Without any specific context, all of these alternative readings are potentially viable. Instead of rendering $\Phi$ into the set of these specific terms, e.g., by some (syntactic or other) analyses, we render $\Phi$ into a single, underspecified term $A$ that represents the set of the alternatives. When context information is available, e.g., by data-driven methods, individual or all $A_i$ can be derived from $A$. That can be done by instantiating the free recursion variables of $A$, and thus instantiating $A$, via expanding $A$ by adding recursion assignments that bind its free recursion variables. We impose constraints over the free recursion variables $\mathrm{FreeV}(A) \cap RV$, e.g., due to the syntactic structure of $A$, via syntax-semantics analysis, to prevent undesirable alternative instantiations and denotational interpretations $\mathrm{den}^{\mathfrak{A}}(A)(g)$.
Informally, we can restrict the instantiations of $A$ by constraints over possible bindings of recursion variables that occur in $A$. For any given $A, R \in \mathrm{Terms}$ and $p \in RV$, the relation ($R$ rBind $p$) holds between $R$ and $p$ in $A$ when $R$ recursively binds $p$ in $A$, via a sequence of recursion assignments and/or by λ-abstraction across recursion assignments. Thus, rBind provides a specification relation between underspecified and specified $L^{\lambda}_{ar}$-terms. The formal treatment of rBind, which is outside the scope of this paper, is based on the binding relation introduced in Loukanova (2017b). Here, we focus on the technique of rendering natural language expressions into underspecified terms, via the syntax-semantics interface.
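The idea of specifying an underspecified term by adding recursion assignments, subject to binding constraints, can be sketched as follows. This is a deliberately crude stand-in and not the formal rBind relation of Loukanova (2017b): terms are plain strings, a constraint is a pair (R, p) that permits only the direct assignment p := R, and the names `specify` and `allowed` are hypothetical.

```python
from typing import Dict, Set, Tuple

Assignments = Dict[str, str]    # recursion variable -> term, as text


def specify(free_rec_vars: Set[str],
            assignments: Assignments,
            new: Assignments,
            allowed: Set[Tuple[str, str]]) -> Tuple[Set[str], Assignments]:
    """Extend a recursion term with new assignments p := R.

    `allowed` holds pairs (R, p) that stand in for constraints of the
    form (R rBind p); only direct assignments are modelled here.
    Returns the remaining free recursion variables and the new assignments.
    """
    for p, r in new.items():
        if p not in free_rec_vars:
            raise ValueError(f"{p} is not a free recursion variable")
        if (r, p) not in allowed:
            raise ValueError(f"binding {p} := {r} violates the constraints")
    return free_rec_vars - set(new), {**assignments, **new}


# b(p) where {b := barks}, with p free and constrained to be bound by fido.
free, asg = specify({"p"}, {"b": "barks"}, {"p": "fido"}, {("fido", "p")})
print(free, asg)
```

In this toy setting a term counts as specified once the returned set of free recursion variables is empty; an instantiation that is not licensed by the constraints is rejected rather than silently accepted.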
For example, the terms in (3d)-(3f) render the sentence "Fido barks", via unordered, i.e., abstract, universal syntax. (3c) renders the verb "barks", which is of lexical type verb in the lexicon, to the $L^{\lambda}_{ar}$-term $T_b$. $T_b$ has, in its where-assignment, the term $barks : (\tilde{e} \to \tilde{t})$, which is not a constant, but a complex term that carries information about time in relation to the possible, underspecified time of a potential utterance, see Loukanova (2011b). While the term $T_b : \tilde{t}$ is of sentential type, it is unsaturated, because it has a free recursion variable $p$ that fills the argument slot of $b$, and there is no constraint over possible bindings of $p$.
- (3a) NP word in lexicon; semantically specified.
- (3b) V lexeme in lexicon; rendered to a constant; semantically specified.
- (3c) Inflected V word in lexicon; semantically underspecified term carrying information about time; unsaturated for $p \in \mathrm{FreeV}(T_b^{\tilde{t}})$; it becomes a constrained VP in the sentence analysis.
- (3d) Sentence: universal, unordered syntax with SynSem; semantically underspecified; SynSem constrained.
- (3e) Universal, unordered syntax with SynSem; semantically specified by the SynSem constraint.
- (3f) Semantically specified by the SynSem constraint; reduced chain assignments.
- (3g) Surface ordered syntax with SynSem; semantically specified by the SynSem constraint.
An alternative syntax-semantics analysis of the same sentence "Fido barks" can be obtained by using a λ-term $[T'_b]^{(\tilde{e} \to \tilde{t})}$ rendering a VP like "barks", as in (4), the type of which, via λ(x)-abstraction, reflects that it is of a functional type. This analysis results in the same term for rendering the sentence, as in (3g), but with different intermediate reductions of the canonical forms. The analysis uses intermediate steps with γ*-reduction, or, alternatively, γ-reduction, introduced in Loukanova (2019, 2018) for the canonical forms $\mathrm{cf}_{\gamma^*}$ and $\mathrm{cf}_{\gamma}$, correspondingly. These reduction extensions of the classic reduction from Moschovakis (2006) provide computational simplifications in many cases, like this one.
designates an actant; semantically underspecified:
In (5a)-(5e), we present stages of the syntax-semantics analysis of a sentence with one quantifier NP and an intransitive verb, via unordered, i.e., abstract, universal syntax-semantics interface. (5b) renders the verb "barks", which is of lexical type verb (a subtype of the type word) in the lexicon, to the $L^{\lambda}_{ar}$-term $T_b$. The term $T_b$ has, in its where-assignment, the term $barks : (\tilde{e} \to \tilde{t})$, which is not a constant, but a complex term that carries information about time in relation to the possible time of a potential utterance, see Loukanova (2011b). While the term $T_b : \tilde{t}$ is of sentential type, it is unsaturated, because it has a free recursion variable $p$ that fills the argument slot of $b$, and there is no constraint over possible bindings of $p$. The syntactic structure of the sentence saturates the syntactic argument of the VP, $[\text{barks}]_{VP}$, with the subject NP, e.g., $[\text{every dog}]_{NP}$.
- (5a) NP phrase, in grammar with SynSem; semantically specified.
- (5b) Inflected V word in lexicon; semantically underspecified term carrying information about time; unsaturated for $p \in \mathrm{FreeV}(T_b^{\tilde{t}})$; it becomes a constrained VP in the sentence analysis.
- (5c) Sentence: universal, unordered syntax with SynSem; semantically underspecified; SynSem constrained.
- (5d) Universal, unordered syntax with SynSem; semantically specified by the SynSem constraint.
- (5e) Surface ordered syntax with SynSem; semantically specified by the SynSem constraint.
Note that the tree structures of the syntactic analyses in Sect. 6 have unordered daughters, and, in essence, these are three-dimensional graphs.

Syntax-Semantics Interface in GCBLG by the Type-Theory of Acyclic Recursion
A formal background on generalised GCBLG is given in Loukanova (2017a). We use a formal feature-value language for type-theoretical descriptions of the computational syntax of human language. Sag et al. (2003) give a detailed introduction to the formal grammar of human language, grounded in theoretical linguistics. In this version of generalised GCBLG, semantic representations of human language expressions are provided by $L^{\lambda}_{ar}$-terms of the formal language of acyclic recursion $L^{\lambda}_{ar}$, by using feature-value structures, via the technique introduced in Loukanova (2011a). Here, we provide two of the major grammatical rules of GCBLG, enhanced with semantic representation, and a new, additional feature-value description of constraints over semantic representations, via the syntax-semantics interface. For this, we introduce a new feature SYNSEM of type synsem, with values of type list-of(propositions).
The analysed natural language expressions are rendered into $L^{\lambda}_{ar}$-terms, in canonical forms, by using feature-value representations of the recursion terms, according to the following rule:
Rendering syntactic structures into $L^{\lambda}_{ar}$-terms: Assume that a natural language expression $E$ is analysed by GCBLG as a feature-value description $F(A)$. Assume that, in $F(A)$, the value of the feature T-HEAD is an $L^{\lambda}_{ar}$-term $A_0$, and the value of the feature WHERE is a sequence $\overrightarrow{p} := \overrightarrow{A}$ of recursion assignments. Then $E$ is rendered into the recursion term $A \equiv [A_0 \text{ where } \{\overrightarrow{p} := \overrightarrow{A}\}]$. The value of the feature L-TYPE is the $L^{\lambda}_{ar}$-type of $A_0$. We use recursion variables to designate the rendering terms in the feature-value descriptions, which can be used in the combined terms.
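The rendering rule can be pictured as reading a recursion term off a feature-value description. The sketch below is an illustration only: it assumes that the described term has the form A_0 where {p_1 := A_1, ..., p_n := A_n}, with T-HEAD supplying A_0 and WHERE supplying the assignments, and it represents terms and types as plain strings. The class `FeatureDescription` and the function `render` are hypothetical names, not part of GCBLG.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeatureDescription:
    t_head: str                     # value of T-HEAD: the head term A_0
    where: List[Tuple[str, str]]    # value of WHERE: assignments (p_i, A_i)
    l_type: str                     # value of L-TYPE: the type of A_0


def render(fd: FeatureDescription) -> str:
    """Assemble the text of the recursion term A_0 where {p_1 := A_1, ...}."""
    if not fd.where:                # n = 0: no assignments, only the head
        return fd.t_head
    body = ", ".join(f"{p} := {a}" for p, a in fd.where)
    return f"{fd.t_head} where {{{body}}}"


# A toy description for the verb "barks" with a free recursion variable p.
fd = FeatureDescription(t_head="b(p)", where=[("b", "barks")], l_type="t~")
print(render(fd))
```

Keeping the head and the assignments in separate features means that a phrasal rule can combine daughters by concatenating their WHERE lists and building a new head, which is the kind of composition the grammar rules are described as performing.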
The rules HSR and HCR1 take as inputs expressions that have semantic representations in canonical forms and generate phrases with semantic representations that are in canonical forms too. The values of T-HEAD and WHERE on the left-hand side of the rules are determined by: the semantic types $T_1$ and $T_2$ of the daughter nodes; the values of T-HEAD and WHERE in the daughters' feature structures on the right-hand side; and the definition of the canonical form $\mathrm{cf}(A)$ of each term $A$.
Case 2 covers expressions such as NPs, VPs, and sentences S, where some argument slots are bound by sub-terms of the renderings of NP quantifiers, via recursion variables. The technique is exemplified by the analysis of the sentence "Every cat hugs some dog", which represents a general pattern for multiple quantifier scopes. It is presented in Fig. 3.

Scope Underspecification and Specification
Scope Underspecification: The feature-value structural description given in Fig. 3 is a pattern of one of the possible ways to represent multiple semantic scopes of quantification. The syntax-semantics interface is provided by the HSR and HCR1 rules of GCBLG, while the formal calculi of $L^{\lambda}_{ar}$ provide terms for algorithmic semantics. A sentence that has occurrences of multiple quantifier NPs can be rendered into a single $L^{\lambda}_{ar}$-term that represents the multiple possibilities simultaneously. It is underspecified, by having constrained free recursion variables filling up the argument slots that are bound by the corresponding NPs.
Scope Specification: The binding relation rBind in $L^{\lambda}_{ar}$ provides a syntax-semantics facility to constrain the possible bindings in specific scopes, by restricting the free recursion variables occurring in an underspecified term. An underspecified term can be expanded into specified terms only if they satisfy the rBind-constraints. Thus, constraints expressed by rBind also restrict the possible interpretations of the free recursion variables. For instance, the semantic rendering in Fig. 3 can be expanded into a specified semantic representation, which has to satisfy the constraints imposed by the structural combinations. Fig. 4 presents one of the available possibilities to instantiate the free recursion variables, i.e., to bind them by the corresponding NP quantifiers.
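To make the space of scope specifications concrete, the following sketch enumerates the alternative scope orders for a sentence with two quantifier NPs, such as "Every cat hugs some dog". It is an illustration in conventional generalised-quantifier notation, written as plain strings; it does not reproduce the canonical-form terms or the feature-value structures of Figs. 3 and 4, and the names `QUANTIFIERS`, `NUCLEUS`, and `scope_readings` are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical encoding: each argument variable of the verb is bound by one
# quantifier NP, given as (determiner, restrictor).
QUANTIFIERS: Dict[str, Tuple[str, str]] = {
    "x": ("every", "cat"),
    "y": ("some", "dog"),
}
NUCLEUS = "hugs(x)(y)"    # the verb with both argument slots filled by variables


def scope_readings(quantifiers: Dict[str, Tuple[str, str]],
                   nucleus: str) -> List[str]:
    """Return one formula per scope order, outermost quantifier first."""
    readings = []
    for order in permutations(quantifiers):
        body = nucleus
        for var in reversed(order):        # wrap from the innermost outwards
            det, noun = quantifiers[var]
            body = f"{det}({noun})(λ({var})({body}))"
        readings.append(body)
    return readings


for reading in scope_readings(QUANTIFIERS, NUCLEUS):
    print(reading)
```

With n quantifier NPs the enumeration yields n! candidate orders; the role of the binding constraints described above is to ensure that each argument slot is bound only by its corresponding NP, while leaving the order of the quantifiers open until disambiguating information is available.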

Grammar Analyses with Syntax-Semantics Interface in GCBLG
In this section, we provide several analyses of typical sentences with head verbs taking NPs as their syntactic, and corresponding semantic, arguments. We exemplify the classes of intransitive and transitive head verbs having NP subjects and single NP complements.
Note that in the analysis in Fig. 1, we do not render the NP that is a proper name into a term of a generalised quantifier, since that would introduce unnecessary complexity in the analysis of expressions of this kind. Furthermore, the term $[T_b]^{\tilde{t}}$ that renders the VP is as in (5b). Optionally, we can use a λ-abstraction term $[T'_b]^{(\tilde{e} \to \tilde{t})}$, and then the sentence gets rendered to an algorithmically equivalent term by the γ*-reduction calculus of $L^{\lambda}_{ar}$, see Loukanova (2019, 2018).
In Fig. 2, we present a syntax-semantics analysis of a sentence with a single quantifier NP, in the subject position of the intransitive V. Optionally, we can use a λ-abstraction term $[T'_b]^{(\tilde{e} \to \tilde{t})}$, resulting in an algorithmically equivalent term for the sentence, by the γ*-reduction calculus of $L^{\lambda}_{ar}$.
Figure 1: A Proper Name as the Specifier of an Intransitive Verb
Figure 2: Quantifier NP in Subject Position, Specified
In Fig. 3, we present a syntax-semantics analysis of a sentence with two quantifier NPs, in the subject and complement positions of the head verb V. In Fig. 4, we present its specification to one of the alternative scope distributions. In it, we have specified the VP at the intermediate level of the analysis, in the node marked by ($n_1$) VP. Pragmatically, it is more viable for this node to be underspecified, as it is in Fig. 3, with the specification made at the node ($n_0$) S, at the sentence level, e.g., when disambiguating information is available at that level.
In Fig. 5, we present an optional analysis of the same sentence, by rendering the head verb to a λ-term, which is congruent to λ( ). In such a case, there is a correspondence between the syntactic and semantic saturation types of the V and VP expressions. However, the term that renders the sentence at the node ($n_0$) S is not algorithmically equivalent to the one in Fig. 3, which naturally reflects the algorithmic steps that are used during the analyses for filling up arguments. Similarly, the specification terms corresponding to these two options are not algorithmically equivalent.

Conclusions and Future Work
We have presented how the formal language of the theory of acyclic recursion $L^{\lambda}_{ar}$ can be used for semantic representations of natural language via a syntax-semantics interface in computational grammar. We have introduced the technique by generalised GCBLG that employs feature-value descriptions. GCBLG is type-theoretical by its hierarchy of constraints, which are of dependent types, for the syntactic compositions. The feature-value descriptions embed semantic representations by the higher-order $L^{\lambda}_{ar}$-terms. We have focused on two of the major grammar rules for saturation of syntactic and semantic arguments, for underspecified semantic representations of multiple quantifier scopes. A sentence that has two (or more) quantifier NPs, with ambiguous semantic scopes, can be rendered into a single, underspecified $L^{\lambda}_{ar}$-term $A$. The key idea is that $A$ has free recursion variables saturating arguments that can be bound by the corresponding quantifier NPs in alternative orders. The phrasal rules of GCBLG introduce restrictions over possible bindings via recursion assignments in the syntax-semantics interface.
We foresee extending and implementing the technique for computational syntax-semantics interface in lexical and phrasal structures, for broader grammatical constructions.
Figure 4: Specification: Quantifier NP as Subject and Quantifier NP as Complement: de dicto