Towards Broad-coverage Meaning Representation: The Case of Comparison Structures



Introduction
Representing the underlying meaning of text has been a long-standing topic of interest in computational linguistics. Recently there has been renewed interest in computational modeling of meaning for various tasks such as semantic parsing (Zelle and Mooney, 1996; Berant and Liang, 2014). Open-domain, broad-coverage semantic representation (Banarescu et al., 2013; Bos, 2008; Allen et al., 2008) is essential for many language understanding tasks such as reading comprehension tests and question answering.
One of the most common ways of expressing evaluative sentiment towards different entities is comparison. Comparison can occur in very simple structures such as 'John is taller than Susan', or in more complicated constructions such as 'The table is longer than the sofa is wide'. So far the computational semantics of comparatives, and how they affect the meaning of the surrounding text, has not been studied effectively. That is, the existing semantic representations of comparatives have not been distinctive enough from their syntactic representations to enable deeper understanding of a sentence. For instance, the general logical form produced for the sentence 'John is taller than Susan' by the Boxer system (Bos, 2008) does not make the comparison relation explicit. Consider a more complex comparison example, 'The pizza was great, but it was still worse than the sandwich'. The state-of-the-art sentiment analysis system (Manning et al., 2014) assigns an overall 'negative' sentiment value to this sentence, which clearly lacks an understanding of the comparison happening in it. As another example, consider the generic meaning representation of the sentence 'My Mazda drove faster than his Hyundai' produced by frame-semantic parsing with the Semafor tool (Das et al., 2014), as depicted in Figure 1. It is evident that this representation does not fully capture how the semantics of the adjective fast relates to the driving event, or what it actually means for a car to drive faster than another car. More importantly, there is an ellipsis in this sentence, the resolution of which yields the complete reading 'My Mazda drove faster than his Hyundai drove fast', which is in no way captured in Figure 1.
Although the syntax and semantics of comparison in language have long been studied in linguistics (Bresnan, 1973; Cresswell, 1976; Von Stechow, 1984), computational modeling of the semantics of comparison constructions has so far received little foundational development. The lack of such a computational framework has left deeper understanding of comparison structures out of reach for existing NLP systems. In this paper we summarize our efforts on defining a joint framework for comprehensive semantic representation of comparison and ellipsis constructions. We jointly model comparison and ellipsis as inter-connected predicate-argument structures, which enables automatic ellipsis resolution. In the upcoming sections we summarize our main contributions to this topic.

A Comprehensive Semantic Framework for Comparison and Ellipsis
We introduce a novel framework for modeling the semantics of comparison and ellipsis as interconnected predicate-argument structures. According to this framework, comparison and ellipsis operators are the predicates, where each predicate has a set of arguments called its semantic frame. For example, in the sentence '[Sam] is the tallest [student] [in the gym]', the morpheme -est is the comparison operator (hence, the comparison predicate) and the entities in the brackets are the arguments.

Predicates
We consider two main categories of comparison predicates (Bakhshandeh and Allen, 2015; Bakhshandeh et al., 2016), Ordering and Extreme, each of which can grade any of four parts of speech: adjectives, adverbs, nouns, and verbs.
• Ordering: Shows the ordering of two or more entities on a scale, with the following subtypes:
- Comparatives: expressed by the morphemes more/-er and less, with '>' and '<' indicating that one degree is greater or lesser than the other.
(1) The steak is tastier than the potatoes.

- Equatives: expressed by as in constructions such as as tall or as much, with '≥' indicating that one degree equals or is greater than another.
(2) The Mazda drives as fast as the Nissan.
- Superlatives: expressed by most/-est and least, indicating that an entity or event has the 'highest' or 'lowest' degree on a scale (e.g., Superlative+: 'Joe is the most eager boy ever.').
(3) That chef made the best soup.
The details of the Extreme type can be found in earlier work (Bakhshandeh and Allen, 2015; Bakhshandeh et al., 2016).

Arguments
Each predicate takes a set of arguments that we refer to as the predicate's 'semantic frame'. The main arguments included in our framework are the following:
- Figure (Fig): the main role which is being compared.
- Ground: the main role that Figure is compared to.
- Scale: the scale for the comparison, such as length, depth, or speed. For a more detailed study of scales, please refer to the work on learning adjective scales (Bakhshandeh and Allen, 2015).
Our framework also includes 'Standard', 'Differential', 'Domain', and 'Domain Specifier' argument types. Figure 2 shows an example meaning representation based on our framework.
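As a concrete, purely illustrative rendering of such a frame, the annotation for example (1), 'The steak is tastier than the potatoes.', could be stored as a simple data structure. The class and field names below are our own sketch, not the framework's actual annotation format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ComparisonFrame:
    """One comparison predicate together with its semantic frame
    (a hypothetical container, for illustration only)."""
    predicate: str                 # the comparison operator, e.g. the morpheme '-er'
    pred_type: str                 # e.g. 'Comparative', 'Equative', 'Superlative'
    figure: str                    # Figure: the main role being compared
    ground: Optional[str] = None   # Ground: what Figure is compared to
    scale: Optional[str] = None    # Scale: the dimension of comparison

# '(1) The steak is tastier than the potatoes.'
frame = ComparisonFrame(
    predicate="-er",
    pred_type="Comparative",
    figure="the steak",
    ground="the potatoes",
    scale="tasty",
)
print(f"{frame.pred_type}: {frame.figure} > {frame.ground} on scale '{frame.scale}'")
```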

Ellipsis Structures
As mentioned earlier in Section 1, resolving ellipsis in comparison structures is crucial for language understanding, and failure to do so yields an incorrect meaning representation. In linguistics, various subtypes of elliptical constructions have been studied (Kennedy, 2003; Merchant, 2013; Yoshida et al., 2016). In our framework we mainly include the six types seen in comparison structures (Bakhshandeh et al., 2016): 'VP-deletion', 'Stripping' (the two most frequent types), 'Pseudo-gapping', 'Gapping', 'Sluicing', and 'Subdeletion'. Ellipsis most often occurs in comparative and equative comparison constructions. In comparatives, the ellipsis site is the dependent clause headed by than, and the elided material can be reconstructed from the matrix clause; for instance, the complete reading of 'My Mazda drove faster than his Hyundai' is 'My Mazda drove faster than his Hyundai drove fast', with the elided material recovered from the first clause.
Furthermore, we define three argument types for ellipsis, which help fully reconstruct the antecedent of the elided material by taking into account the existing words of the context sentence: Reference, Exclude, and How-much.
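As a toy illustration of how such arguments support reconstruction, consider recovering the complete reading of 'My Mazda drove faster than his Hyundai' from the matrix clause. The function below is a deliberately naive string-level sketch, not the resolution algorithm used in the framework:

```python
def full_reading(matrix: str, figure: str, ground: str,
                 comparative: str, base: str) -> str:
    """Reconstruct the complete reading of a comparative with an elided VP:
    copy the matrix clause into the than-clause, swapping the Figure for
    the Ground and the comparative form for its base form.
    (A naive illustrative sketch only.)"""
    elided = matrix.replace(figure, ground).replace(comparative, base)
    return f"{matrix} than {elided}"

print(full_reading("My Mazda drove faster",
                   figure="My Mazda", ground="his Hyundai",
                   comparative="faster", base="fast"))
# -> My Mazda drove faster than his Hyundai drove fast
```

Real resolution must of course operate over parse-tree nodes rather than raw strings, which is what the tree-based argument types above make possible.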

Data Collection Methodology
Given the new semantic representation, we aim to annotate corpora that enable developing and testing models. The diversity and comprehensiveness of the comparison structures represented in our dataset depend on the genre of its sentences. Earlier, we experimented with annotating semantic structures on the OntoNotes dataset (Bakhshandeh and Allen, 2015). More recently (Bakhshandeh et al., 2016), we shifted our focus to actual product and restaurant reviews, which include many natural comparison instances. For this purpose we mainly use the Google English Web Treebank, which comes with gold constituency parse trees. We augment this dataset with the Movie Reviews dataset (Pang and Lee, 2005), for which we use the Berkeley parser (Petrov et al., 2006) to obtain parse trees.
We trained linguists by asking them to read the semantic framework annotation manual summarized in Section 2. (The Google English Web Treebank is available at https://catalog.ldc.upenn.edu/LDC2012T13.) The annotations were done via our interactive two-stage tree-based annotation tool, on top of constituency parse trees. This process yielded a total of 2,800 annotated sentences. Figure 3 visualizes the distribution of predicate types across the various resources. As this figure shows, reviews are indeed a very rich resource for comparisons, containing more comparison instances than any other resource, even ones of bigger size. There are a total of 5,564 comparison arguments in our dataset, with 'Scale' and 'Figure' being the majority types. The total number of ellipsis predicates is 240, with 197 Stripping, 31 VP-deletion, and 12 Pseudo-gapping.

Predicting Semantic Structures
We model the prediction problem as joint predicate-argument prediction of comparison and ellipsis structures. In a nutshell, we define a globally normalized model for the probability distribution of comparison and ellipsis labels over all parse tree nodes, where T is the underlying constituency tree, p_C is the probability of assigning predicate type c as the predicate type, and p_Ac is the probability of assigning argument type a_c. In each of these distributions, f is the corresponding feature function. For predicates and arguments, the main features are lexical and bigram features, among many others. θ_C, θ_E, and θ_ac are the parameters of the log-linear model, which we estimate using the Stochastic Gradient Descent algorithm.

Table 2: Results of argument prediction on the test set, averaged across various argument types.
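The per-node distributions can be sketched as log-linear scorers over feature vectors. The snippet below is a locally normalized (softmax) simplification of the globally normalized model, with made-up lexical and bigram features and weights:

```python
import math

def loglinear_dist(features, weights, labels):
    """p(label | node) ∝ exp(θ_label · f(node)): a locally normalized
    softmax over candidate labels (illustrative simplification of the
    globally normalized model)."""
    scores = {y: sum(weights[y].get(feat, 0.0) for feat in features)
              for y in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(scores[y]) / z for y in labels}

# Toy lexical/bigram features for the node spanning 'tastier'
feats = ["lex=tastier", "suffix=-er", "bigram=is_tastier"]
weights = {
    "Comparative": {"suffix=-er": 2.0, "lex=tastier": 1.0},
    "Superlative": {"suffix=-est": 2.0},
    "None": {},
}
p = loglinear_dist(feats, weights, ["Comparative", "Superlative", "None"])
print(max(p, key=p.get))  # highest-probability predicate type for this node
```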
For inference, we model the problem as a structured prediction task. Given the syntactic tree of a sentence, for each node we first select the predicate type with the highest p_C. Then, for each selected comparison predicate, we find the corresponding ellipsis predicate with the highest probability p_E. We tackle argument assignment with Integer Linear Programming (ILP), where we pose domain-specific linguistic knowledge as constraints. Each specific comparison label calls for a unique set of constraints in the ILP formulation, which ensures the validity of predictions. The details of this modeling can be found in earlier work (Bakhshandeh et al., 2016).
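As a toy stand-in for the ILP, the effect of constrained argument assignment can be shown with exhaustive search over a small candidate set. The scores and the single constraint below are invented for illustration and are far simpler than the linguistically motivated constraint sets used in the actual model:

```python
from itertools import product

def best_assignment(nodes, roles, score, constraints):
    """Return the highest-scoring role->node assignment that satisfies
    every constraint (brute-force stand-in for an ILP solver)."""
    best, best_score = None, float("-inf")
    for combo in product(nodes, repeat=len(roles)):
        assign = dict(zip(roles, combo))
        if all(check(assign) for check in constraints):
            s = sum(score(role, node) for role, node in assign.items())
            if s > best_score:
                best, best_score = assign, s
    return best

# Candidate constituents for 'The steak is tastier than the potatoes.'
nodes = ["the steak", "the potatoes", "tastier"]
roles = ["Figure", "Ground"]
toy_scores = {("Figure", "the steak"): 2.0,
              ("Figure", "the potatoes"): 0.5,
              ("Ground", "the potatoes"): 1.5}
score = lambda role, node: toy_scores.get((role, node), 0.0)
# Invented constraint: Figure and Ground must be distinct nodes.
constraints = [lambda a: a["Figure"] != a["Ground"]]
print(best_assignment(nodes, roles, score, constraints))
```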

Experimental Results
We trained our ILP model on the train-dev portion of the dataset (70%) and tested on the test set (30%). Evaluation is done against the gold reference annotation, with Exact and partial (Head) credit for annotating the constituency nodes. We mainly report on two models: our comprehensive ILP model (detailed in Section 4) and a rule-based baseline. In short, the baseline encodes the same linguistically motivated ILP constraints as rules and uses a few pattern-extraction methods for finding comparison morphemes.
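Under the simplifying assumption that Head credit is granted whenever the predicted constituent still covers the gold head token, the two scoring modes can be illustrated as follows (a sketch of the idea, not the exact scoring script):

```python
def credit(pred_span, gold_span, gold_head):
    """Exact credit if the predicted span matches the gold span;
    Head credit if it at least covers the gold head token
    (an illustrative scoring rule, assumed for this sketch)."""
    if pred_span == gold_span:
        return "exact"
    if gold_head in pred_span:
        return "head"
    return "miss"

gold = ("the", "tall", "boy")
print(credit(("the", "tall", "boy"), gold, "boy"))  # exact span match
print(credit(("tall", "boy"), gold, "boy"))         # covers the head only
print(credit(("the", "tall"), gold, "boy"))         # misses the head
```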
The average results of predicate prediction (across all types) are shown in Table 1. As the results show, the scores for predicting predicates are overall high, with ellipsis predicates being the most challenging (not shown here). The baseline is competitive, which shows that linguistic patterns can capture many of the predicate types. Our model performs poorest on Equatives, achieving 71%/73% F1 score; the equative morpheme as is complex and appears in a wide variety of linguistic constructions. Our analysis shows that the errors are often due to inaccuracies in automatically generated parse trees. As Table 2 shows, predicting arguments is a more demanding task. The baseline performs very poorly at predicting arguments. Our comprehensive ILP model consistently outperforms the No Constraints model, showing the effectiveness of our linguistically motivated ILP constraints.

Conclusion
In this paper we summarized our work on an aspect of language with very rich semantics: comparison and ellipsis. The current tools and methodologies in the research community are unable to go beyond surface-level, shallow representations of comparison and ellipsis structures. We have developed a widely usable, comprehensive semantic theory of the linguistic content of comparison structures. Our representation is broad-coverage and domain-independent, and hence can be incorporated into any broad-coverage semantic parser (Banarescu et al., 2013; Allen et al., 2008; Bos, 2008) to augment its meaning representation.