Neural sentence generation from formal semantics

Sequence-to-sequence models have shown strong performance in a wide range of NLP tasks, yet their applications to sentence generation from logical representations are underdeveloped. In this paper, we present a sequence-to-sequence model for generating sentences from logical meaning representations based on event semantics. We use a semantic parsing system based on Combinatory Categorial Grammar (CCG) to obtain data annotated with logical formulas. We augment our sequence-to-sequence model with masking for predicates to constrain output sentences. We also propose a novel evaluation method for generation using Recognizing Textual Entailment (RTE): combining parsing and generation, we test whether the output sentence entails the original text and vice versa. Experiments showed that our model outperformed a baseline with respect to both BLEU scores and RTE accuracy.


Introduction
In recent years, syntactic and semantic parsing have developed and improved significantly. Syntactic parsing based on syntactic theories has achieved accuracy high enough to support various application tasks (Clark and Curran, 2007; Lewis and Steedman, 2014; Yoshikawa et al., 2017). Mapping sentences to logical formulas automatically has also been studied in depth, so there are semantic parsing systems that can produce high-quality formulas (Bos, 2008, 2015; Martínez-Gómez et al., 2016).

* This work was done prior to joining Amazon.
One advantage of using logical formulas in semantic parsing is that they have expressive power that goes beyond simple representations such as predicate-argument structures. More specifically, logical formulas can capture aspects of sentence meanings that arise from complex syntactic structures such as coordination, from functional words such as negation and quantifiers, and from the scopal interactions between them (Steedman, 2000, 2012). In combination with the restricted use of higher-order logic (HOL) developed in formal semantics, such logical formulas have recently been used for RTE (Mineshima et al., 2015; Abzianidze, 2015) and Semantic Textual Similarity (STS) (Yanaka et al., 2017) and achieved high accuracy.
Compared with these recent developments in syntactic and semantic parsing, automatic generation of sentences from expressive logical formulas has received relatively little attention, despite a long and venerable tradition of work on surface realization, including approaches based on Minimal Recursion Semantics (MRS) (Carroll et al., 1999; Carroll and Oepen, 2005) and CCG (White, 2006; White and Rajkumar, 2009). If one could generate sentences from formulas, it would be possible to perform other NLP tasks in combination with RTE, including challenging tasks such as paraphrase extraction (Levy et al., 2016) and sentence splitting and rephrasing (Narayan et al., 2017; Aharoni and Goldberg, 2018).
Meanwhile, sequence-to-sequence models have shown high performance in machine translation and many other areas of NLP (Sutskever et al., 2014), yet their applications to sentence generation from logical meaning representations are still underdeveloped, mainly due to a lack of data and the structural complexity of meaning representations (Konstas et al., 2017). To address this challenge, we introduce a first sequence-to-sequence model for sentence generation from logical formulas. We use the semantic parsing system ccg2lambda (Martínez-Gómez et al., 2016) 1 to obtain data annotated with logical formulas, including higher-order ones. Since the distinction between content words and function words plays an important role in parsing and generation, we augment the sequence-to-sequence model with masking for predicates, so that output sentences are constrained to contain the content words of the input logical formula, together with a list of function words utilized in the parsing system.
We also propose a novel evaluation method for sentence generation. BLEU (Papineni et al., 2002) is widely used to evaluate the quality of decoded sentences, but it has difficulty in assessing fine-grained meaning relations between sentences. Instead, we use an RTE system for evaluation: we test whether the output sentence entails the original text and vice versa. This idea is motivated by the assumption that, unlike surface-based metrics such as BLEU, textual entailment is sensitive to syntactic and semantic aspects of sentences, thus making it possible to distinguish fine-grained meaning relations between original and output sentences. RTE has also been shown to be effective for the evaluation of machine translation (Padó et al., 2009). Experiments show that our model outperforms a baseline with respect to both BLEU scores and RTE accuracy.

Input logical formula
For input, we use logical formulas obtained from ccg2lambda (Martínez-Gómez et al., 2016), a parsing and inference system that can be used for RTE. This system parses sentences into syntactic trees based on CCG (Steedman, 2000), a syntactic theory suitable for semantic composition from syntactic structures. The meaning of each word is specified using a lambda term. Logical formulas are obtained compositionally, by combining lambda terms in accordance with the meaning composition rules specified in the CCG tree and semantic templates. Semantic templates are defined manually based on formal semantics (Mineshima et al., 2015).
For logical formulas, we use standard Neo-Davidsonian event semantics (Parsons, 1990). For instance, the sentence Eddy walked on the green grass is represented as ∃e.(walk(e) ∧ subj(e) = eddy ∧ ∃x.(green(x) ∧ grass(x) ∧ on(e, x))). In this semantics, content words such as nouns and verbs are represented as predicates, and function words such as determiners, negation, and connectives are represented as logical operators with scope relations.

1 https://github.com/mynlp/ccg2lambda
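To illustrate the shape of such event-semantic representations, the following sketch assembles a Neo-Davidsonian formula as a plain string. The helper function and its ASCII notation (exists/& for ∃/∧) are hypothetical illustrations, not ccg2lambda's actual API or output format.

```python
def event_formula(verb, subj, prep, adj, noun):
    """Build a Neo-Davidsonian formula (as a plain string) for a sentence
    of the form '<subj> <verb>-ed <prep> the <adj> <noun>'.
    Illustrative sketch only; not ccg2lambda's output format."""
    return (f"exists e.({verb}(e) & subj(e) = {subj} & "
            f"exists x.({adj}(x) & {noun}(x) & {prep}(e,x)))")

# 'Eddy walked on the green grass'
print(event_formula("walk", "eddy", "on", "green", "grass"))
```

The content words (walk, eddy, green, grass) surface as predicates and constants, while the event variable e and the prepositional relation encode the structure contributed by function words.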
We decided not to include the following linguistic information in the input formulas: the definite-indefinite and singular-plural distinctions for NPs, and tense and aspect for VPs. The intention is to normalize these semantic differences so that the resulting formulas are easily usable in reasoning tasks based on RTE, where such fine-grained linguistic distinctions may sometimes make it more difficult to establish entailment relations between sentences. While more fine-grained linguistic information is readily obtainable in logical formulas by modifying the semantic templates, we leave testing formulas with such additional semantic information for future work.

Related Work
A large amount of work has been done on converting meaning representations to their surface forms. In addition to the works mentioned in Section 1, there has also been a line of work on generating sentences from the meaning representations used in semantic parsing systems (Wong and Mooney, 2007; Lu et al., 2009). Recently, Mei et al. (2016) proposed an end-to-end neural sentence generation model from such meaning representations. These studies use datasets annotated with meaning representations, such as ROBOCUP (www.robocup.org) and GEOQUERY (Zelle and Mooney, 1996). However, these meaning representations are much simpler than the logical formulas used in formal semantics, in that they contain neither logical operators such as disjunction and quantifiers nor the variable-binding structures of standard first-order logic.
Recent rule-based approaches to generation using formal semantics and higher-order logic include a type-theoretic system based on Grammatical Framework (GF) (Ranta, 2011) and a system called Treebank Semantics based on event semantics (Butler, 2016).
Closest to our work is that of Konstas et al. (2017), which achieved high performance in neural sentence generation from AMR graphs (Banarescu et al., 2013). While AMR has been used as an intermediate meaning representation for a wide range of tasks, it has less descriptive power than standard first-order logic (Bos, 2016). In addition, current AMRs do not support inference systems and thus cannot deal with logical inference as handled by RTE systems.

Method
We present a sequence-to-sequence model with attention for formula-to-sentence conversion.

Embedding
In using a sequence-to-sequence model, the main point to address is how to linearize logical formulas. We test two embedding methods: a token-based method, where a formula is split into its individual tokens (predicates and operators), and a graph-based method, which operates on graph representations converted from the input logical formulas.
The token-based method tokenizes logical expressions. For instance, the formula for Eddy walked on the green grass is linearized into the token sequence ∃ e . ( walk ( e ) ∧ subj ( e ) = eddy ∧ ∃ x . ( green ( x ) ∧ grass ( x ) ∧ on ( e , x ) ) ).

A graph representation, on the other hand, reflects the structure of a logical formula. We use the formula-to-graph conversion presented in Wang et al. (2017). This method converts a formula to a tree structure and then obtains its graph representation by merging the nodes for the same variable and replacing the edges of the tree accordingly. See Wang et al. (2017) for details.
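The token-based linearization can be sketched with a simple regular-expression tokenizer. This is a minimal sketch using ASCII stand-ins (exists/& for ∃/∧); the actual token inventory used in our system may differ.

```python
import re

def tokenize_formula(formula):
    # Predicates, variables, and quantifier keywords are alphanumeric
    # tokens; every other non-space character (parentheses, connectives,
    # '=', '.') becomes its own token.
    return re.findall(r"\w+|[^\w\s]", formula)

formula = "exists e.(walk(e) & subj(e) = eddy)"
print(tokenize_formula(formula))
```

Each predicate, variable, and operator thus becomes one symbol in the input sequence fed to the encoder.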

Sequence-to-Sequence with Attention
Our baseline model is a sequence-to-sequence model with an attention mechanism. Let x = (x_1, ..., x_{|x|}) and y = (y_1, ..., y_{|y|}) be an input formula and an output sentence, respectively. Then, the probability of the sentence y given a formula x is

p(y | x; Θ) = ∏_{i=1}^{|y|} p(y_i | y_{<i}, x; Θ),

where y_{<i} denotes the previously generated sequence of words at step i, x is the input formula, and Θ are the model parameters. Each factor is computed with a softmax over the output vocabulary:

p(y_i | y_{<i}, x; Θ) = softmax(f(y_{<i}, x)),   (1)

where the function f computes a score for each output word from the decoder state and the attention over the encoder hidden states h_1, ..., h_{|x|}, and LSTM_enc is an LSTM encoder, which calculates each hidden state h_j from the embedding vector v_S(x_j) and the previous hidden state:

h_j = LSTM_enc(v_S(x_j), h_{j-1}).
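The factorization above can be mirrored in a few lines of plain Python. The step scores below are made-up stand-ins for the decoder's outputs, not the actual model; the sketch only illustrates how per-step softmax probabilities multiply into a sequence probability.

```python
import math

def softmax(scores):
    # Convert raw scores into a probability distribution.
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sequence_prob(step_scores, target_ids):
    """p(y | x) = product over steps i of softmax(scores_i)[y_i]."""
    p = 1.0
    for scores, y_i in zip(step_scores, target_ids):
        p *= softmax(scores)[y_i]
    return p

# Two decoding steps over a two-word vocabulary, target sequence (0, 1).
print(sequence_prob([[2.0, 1.0], [0.5, 1.5]], [0, 1]))
```

Training then maximizes the log of this product over the training pairs, which is the log-likelihood objective described below.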
We train the entire model by optimizing the log-likelihood with respect to the training data.

Masking
Logical formulas contain predicates for content words that should invariably appear in decoded sentences. For instance, in the sentence Eddy walked on the green grass, the content words are Eddy, walked, green, and grass, while on and the are function words. Using ccg2lambda, we obtain the following formula for this sentence:

∃e.(walk(e) ∧ subj(e) = eddy ∧ ∃x.(green(x) ∧ grass(x) ∧ on(e, x)))

To utilize the information available in a logical formula, we use a masking vector m ∈ {0, 1}^N, where N is the size of the output vocabulary, which zeroes out the probabilities of words that do not appear in the formula (see Figure 1). Thus, instead of Eq. 1, we take the element-wise multiplication of the softmax probability and the mask m:

p(y_i | y_{<i}, x; Θ) ∝ m ⊙ softmax(f(y_{<i}, x)).

To construct the masking vector, we use a dictionary that maps a lemma to a list of its inflected forms, since logical formulas contain only lemmatized forms of words. The idea of using a masking vector can be seen as a simplified version of the coverage vector widely used in the line of work on chart realization initiated by Kay (1996); our method provides a simple adaptation of it to sequence-to-sequence models. We obtained the dictionary by applying the lemmatizer implemented in the C&C parser (Clark and Curran, 2007) to all training data used in the experiment.
In the previous example, there is a dictionary entry that maps walk to the list walk, walks, walked, and walking. We set 1 in m at the positions that correspond to these inflected forms (see dict1 in Figure 1). Additionally, we make function words always available at decoding by using a predefined list of those words (see dict2 in Figure 1).
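The masking step can be sketched as follows. The toy vocabulary, lemma dictionary, and function-word list below are illustrative stand-ins for the resources described above, not the actual dictionaries used in the experiments.

```python
def build_mask(predicates, vocab, lemma_dict, function_words):
    """m[k] = 1 iff vocab[k] is an inflected form of a predicate in the
    input formula, or a function word; all other words are zeroed out."""
    allowed = set(function_words)
    for pred in predicates:
        allowed.update(lemma_dict.get(pred, [pred]))
    return [1 if w in allowed else 0 for w in vocab]

def apply_mask(probs, mask):
    # Element-wise product with the softmax output, then renormalize.
    masked = [p * m for p, m in zip(probs, mask)]
    z = sum(masked)
    return [p / z for p in masked]

vocab = ["walk", "walks", "walked", "eddy", "green",
         "grass", "the", "on", "runs"]
lemma_dict = {"walk": ["walk", "walks", "walked", "walking"]}
function_words = ["the", "on", "a", "is", "are"]
predicates = ["walk", "eddy", "green", "grass"]  # from the formula above

mask = build_mask(predicates, vocab, lemma_dict, function_words)
print(mask)  # "runs" is zeroed out
```

At each decoding step, apply_mask is applied to the softmax distribution, so words absent from the formula (and not on the function-word list) can never be emitted.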

Dataset
We create a dataset annotated with logical formulas from the SNLI corpus (Bowman et al., 2015), a collection of 570,000 English sentence pairs manually labeled with entailment relations. We use 50,000 hypothesis sentences from its training portion and split them into 42,000, 4,000, and 4,000 sentences for our training, development, and test sets, respectively. We map the sentences into logical formulas using ccg2lambda, with the C&C parser converting tokenized sentences into CCG trees. Table 1 shows the number of words in the constructed corpus (vocab) and the maximum (max-len) and average (ave-len) lengths of the sequences obtained for the token-based (token) and graph-based (graph) methods; output shows the corresponding information for the output sentences.
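The split can be reproduced with a simple shuffled partition. This is a sketch; the actual selection of the 50,000 hypotheses and the random seed are not specified here.

```python
import random

def split_dataset(sentences, n_train, n_dev, seed=0):
    # Shuffle indices deterministically, then cut into three partitions.
    idx = list(range(len(sentences)))
    random.Random(seed).shuffle(idx)
    train = [sentences[i] for i in idx[:n_train]]
    dev = [sentences[i] for i in idx[n_train:n_train + n_dev]]
    test = [sentences[i] for i in idx[n_train + n_dev:]]
    return train, dev, test

# 42,000 / 4,000 / 4,000 split of the 50,000 hypothesis sentences:
# train, dev, test = split_dataset(hypotheses, 42000, 4000)
```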
As a baseline, we use Treebank Semantics (Butler, 2016), a rule-based system for parsing and generation with logical formulas based on event semantics.

Evaluation
For evaluation, AMR generation tasks (Konstas et al., 2017) use BLEU, which does not directly consider the meaning or structure of a sentence. For instance, the two sentences No one visited the old man to greet him and Someone visited the old man to greet him are superficially similar but differ in meaning.
To avoid this problem, we propose an evaluation method using parsing and RTE. Namely, we first parse an input sentence S1 to obtain a formula P and then generate a sentence S2 from the formula P. Finally, we check whether S1 entails S2 and vice versa. Our method based on RTE can detect differences in meaning in cases like the above. We measure the accuracy of RTE for unidirectional and bidirectional entailments: S1 ⇒ S2, S2 ⇒ S1, and S1 ⇔ S2. We use ccg2lambda for parsing the original and generated sentences and for proving entailment relations between them. We use 400 pairs of sentences taken from the test set for the RTE experiments. The inference system outputs yes (entailment), no (contradiction), or unknown. The gold answer is set to yes. The parsing and inference system of ccg2lambda has achieved high precision in RTE tasks; previous work reported that its precision was nearly 100% on the SICK dataset (Marelli et al., 2014). Thus, a predicted entailment (yes) judgement can serve as a reliable measure for evaluating the entailment relation between S1 and S2.

         vocab   max-len   ave-len
token    6,822   419       44
graph    6,747   145       17
output   8,875   40        8

Table 1: Vocabulary size, maximum length, and average length of the sequences for the token-based and graph-based inputs and for the output sentences.

Results

Table 2 shows BLEU scores and RTE accuracy. Here, token and graph show the results for the token-based model with attention and the graph-based model with attention, respectively, and +mask denotes the model with masking. The baseline, shown as rule, is the performance of Treebank Semantics. As shown in Table 2, all the models outperformed the baseline with respect to both BLEU score and RTE accuracy. For RTE accuracy, the increase in the score of the graph + mask model was slightly larger than that of the token + mask model.

Table 3 shows examples of decoded sentences obtained from the graph + mask model.

    Input sentence (S1)                           Decoded sentence (S2)
(1) the girls are swimming in the ocean.          the girls are swimming in the ocean.
(2) a dog is playing fetch with his owner.        a dog is playing fetch with owner.
(3) a man is sitting on the couch.                the men are sitting on a couch.
(4) a tall man.                                   the man is tall.
(5) a child is standing.                          the children are standing together.
(6) there are several people in this picture.     people are pictured in a picture.

Table 3: Examples of decoded sentences obtained from the graph + mask model.

(1) and (2) are examples that preserve the form of the input sentences. (3) is an example where singular forms are changed to plural forms, as are the articles a and the; this is because our semantics neutralizes these distinctions. The decoded sentence is nevertheless grammatically correct, with the be-verb accommodated accordingly. In (4), the input is a noun phrase, while the decoded result is a sentence. Example (5) contains an unnecessary word, together, but the subject is also changed so that the decoded sentence remains meaningful. In example (6), the there-construction in the input is removed while the same content is preserved.
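The three RTE accuracies used in the evaluation above can be computed from the per-pair judgements as follows; the judgement labels follow the yes/no/unknown scheme described in the evaluation method, and the sample data is invented for illustration.

```python
def rte_accuracies(judgements):
    """judgements: list of (fwd, bwd) pairs, where fwd is the system's
    label for S1 => S2 and bwd its label for S2 => S1 (each one of
    'yes'/'no'/'unknown'). The gold answer for every pair is 'yes'."""
    n = len(judgements)
    fwd = sum(1 for f, b in judgements if f == "yes") / n
    bwd = sum(1 for f, b in judgements if b == "yes") / n
    both = sum(1 for f, b in judgements if f == "yes" and b == "yes") / n
    return fwd, bwd, both

# Toy judgements for four sentence pairs.
judgements = [("yes", "yes"), ("yes", "unknown"),
              ("no", "yes"), ("yes", "yes")]
print(rte_accuracies(judgements))  # (0.75, 0.75, 0.5)
```

Only the bidirectional score (both directions judged yes) certifies that the decoded sentence is semantically equivalent to the input; the unidirectional scores separate losses from additions of content.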

Conclusion
To our knowledge, this is the first study to describe a neural sentence generation model from logical formulas. We also proposed a new evaluation method based on RTE. In future work, we will refine our model for generation of longer sentences and test formulas with richer semantic information.