Sequence-to-Action: End-to-End Semantic Graph Generation for Semantic Parsing

This paper proposes a neural semantic parsing approach, Sequence-to-Action, which models semantic parsing as an end-to-end semantic graph generation process. Our method simultaneously leverages the advantages of two recent promising directions in semantic parsing. First, our model uses a semantic graph to represent the meaning of a sentence, which is tightly coupled with knowledge bases. Second, by leveraging the powerful representation learning and prediction ability of neural network models, we propose an RNN model that can effectively map sentences to action sequences for semantic graph generation. Experiments show that our method achieves state-of-the-art performance on the Overnight dataset and competitive performance on the Geo and Atis datasets.

A semantic parser needs two functions, one for structure prediction and the other for semantic grounding. Traditional semantic parsers are usually based on compositional grammars, such as CCG (Zettlemoyer and Collins, 2005, 2007), DCS (Liang et al., 2011), etc. These parsers compose structure using manually designed grammars, use lexicons for semantic grounding, and exploit features for ranking candidate logical forms. Unfortunately, it is challenging to design grammars and learn accurate lexicons, especially in wide-open domains. Moreover, it is often hard to design effective features, and the learning process is not end-to-end. To resolve these problems, two promising lines of work have been proposed: semantic graph-based methods and Seq2Seq methods.
Semantic graph-based methods (Reddy et al., 2014, 2016; Bast and Haussmann, 2015; Yih et al., 2015) represent the meaning of a sentence as a semantic graph (i.e., a sub-graph of a knowledge base; see the example in Figure 1) and treat semantic parsing as a semantic graph matching/generation process. Compared with logical forms, semantic graphs are tightly coupled with knowledge bases (Yih et al., 2015) and share many commonalities with syntactic structures (Reddy et al., 2014). Therefore both the structure and semantic constraints from knowledge bases can be easily exploited during parsing (Yih et al., 2015). The main challenge of semantic graph-based parsing is how to effectively construct the semantic graph of a sentence. Currently, semantic graphs are constructed by matching with patterns (Bast and Haussmann, 2015), by transforming dependency trees (Reddy et al., 2014, 2016), or via a staged heuristic search algorithm (Yih et al., 2015). These methods are all based on manually designed, heuristic construction processes, which makes it hard for them to handle open/complex situations.
In recent years, RNN models have achieved success in sequence-to-sequence problems due to their strong ability in both representation learning and prediction, e.g., in machine translation (Cho et al., 2014). Many Seq2Seq models have also been employed for semantic parsing (Xiao et al., 2016; Dong and Lapata, 2016; Jia and Liang, 2016), where a sentence is parsed by translating it into a linearized logical form using RNN models. There is no need for high-quality lexicons, manually built grammars, or hand-crafted features. These models are trained end-to-end and can leverage attention mechanisms (Bahdanau et al., 2014; Luong et al., 2015) to learn soft alignments between sentences and logical forms.
In this paper, we propose a new neural semantic parsing framework, Sequence-to-Action, which can simultaneously leverage the advantages of semantic graph representation and the strong prediction ability of Seq2Seq models. Specifically, we model semantic parsing as an end-to-end semantic graph generation process. For example, in Figure 1, our model parses the sentence "Which states border Texas" by generating a sequence of actions [add_variable:A, add_type:state, ...]. To achieve this goal, we first design an action set that can encode the generation process of a semantic graph (including node actions such as add_variable, add_entity, and add_type; edge actions such as add_edge; and operation actions such as argmin, argmax, count, sum, etc.). We then design an RNN model that generates the action sequence for constructing the semantic graph of a sentence. Finally, we further enhance parsing by incorporating both structure and semantic constraints during decoding.
Compared with the manually designed, heuristic generation algorithms used in traditional semantic graph-based methods, our sequence-to-action method generates semantic graphs using an RNN model, which is learned end-to-end from training data. Such learnable, end-to-end generation makes our approach more effective and able to adapt to different situations.
Compared with previous Seq2Seq semantic parsing methods, our sequence-to-action model predicts a sequence of semantic graph generation actions, rather than linearized logical forms. We find that the action sequence encoding can better capture structure and semantic information, and is more compact. Parsing can also be enhanced by exploiting structure and semantic constraints. For example, in the GEO dataset, the action add_edge:next_to must be subject to the semantic constraint that its arguments are of type state and state, and the structure constraint that the edge next_to connects two nodes to form a valid graph.
The results show that our method achieves state-of-the-art performance on the OVERNIGHT dataset and competitive performance on the GEO and ATIS datasets.
The main contributions of this paper are summarized as follows:
• We propose a new semantic parsing framework, Sequence-to-Action, which models semantic parsing as an end-to-end semantic graph generation process. This new framework can synthesize the advantages of semantic graph representation and the prediction ability of Seq2Seq models.
• We design a sequence-to-action model, including an action set encoding for semantic graph generation and a Seq2Seq RNN model for action sequence prediction. We further enhance parsing by exploiting structure and semantic constraints during decoding. Experiments validate the effectiveness of our method.
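
2 Sequence-to-Action Model for End-to-End Semantic Graph Generation
Given a sentence $X = x_1, ..., x_{|X|}$, our sequence-to-action model generates a sequence of actions $Y = y_1, ..., y_{|Y|}$ for constructing the semantic graph. The conditional probability $P(Y|X)$ is decomposed as:

$$P(Y|X) = \prod_{t=1}^{|Y|} P(y_t \mid y_{<t}, X) \quad (1)$$

where $y_{<t} = y_1, ..., y_{t-1}$. To achieve the above goal, we need: 1) an action set that can encode the semantic graph generation process; 2) an encoder that encodes the natural language input $X$ into a vector representation, and a decoder that generates $y_1, ..., y_{|Y|}$ conditioned on the encoding vector. In the following we describe them in detail.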
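The chain-rule factorization of $P(Y|X)$ over actions can be illustrated with a small numerical sketch. The per-step probabilities below are made-up values, and the function name `sequence_log_prob` is hypothetical; in the actual model each factor would come from the decoder.

```python
import math

def sequence_log_prob(step_probs):
    """Return log P(Y|X) as the sum of log P(y_t | y_<t, X) over all steps."""
    return sum(math.log(p) for p in step_probs)

# Hypothetical per-step probabilities for a three-action sequence.
step_probs = [0.9, 0.8, 0.5]
log_p = sequence_log_prob(step_probs)
print(math.exp(log_p))  # product of the three factors
```

Working in log space avoids numerical underflow when action sequences become long.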

Actions for Semantic Graph Generation
Generally, a semantic graph consists of nodes (including variables, entities, and types) and edges (semantic relations), with some universal operations (e.g., argmax, argmin, count, sum, and not). To generate a semantic graph, we define six types of actions as follows:
Add Variable Node: This kind of action adds a variable node to the semantic graph. In most cases a variable node is a return node (e.g., which, what), but it can also be an intermediate variable node. We represent this kind of action as add_variable:A, where A is the identifier of the variable node.
Add Entity Node: This kind of action adds an entity node (e.g., Texas, New York) and is represented as add_entity_node:texas. An entity node corresponds to an entity in knowledge bases.
Add Type Node: This kind of action adds a type node (e.g., state, city). We represent it as add_type_node:state.
Add Edge: This kind of action adds an edge between two nodes. An edge is a binary relation in knowledge bases. This kind of action is represented as add_edge:next_to.
Operation Action: This kind of action adds an operation. An operation can be argmax, argmin, count, sum, not, etc. Because each operation has a scope, we define two actions for each operation: an operation start action, represented as start_operation:most, and an operation end action, represented as end_operation:most. The subgraph between the start and end operation actions is the operation's scope.
Argument Action: Some of the above actions need argument information, for example, which nodes the add_edge:next_to action should connect. In this paper, we design argument actions for add_type, add_edge and operation actions, and the argument actions are placed directly after their main action.
For add_type actions, we add an argument action to indicate which node the type node constrains. The argument can be a variable node or an entity node. An argument action for a type node is represented as arg:A.
For add_edge actions, we use two argument actions, arg1_node and arg2_node, represented as arg1_node:A and arg2_node:B.
We design different argument actions for different operations. For operation:sum, there are three arguments: arg-for, arg-in and arg-return. For operation:count, there are two: arg-for and arg-return. For operation:most, there are two arg-for arguments.
We can see that each action encodes both structure and semantic information, which makes it easy to capture more information for parsing and allows tight coupling with the knowledge base. Furthermore, we find that the action sequence encoding is more compact than the linearized logical form (see Section 4.4 for more details).
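To make the action encoding concrete, the sketch below replays an action sequence into a small graph structure. It is an illustration under stated assumptions, not the released implementation: the underscore spellings of the action names, the function `build_graph`, and the dictionary layout of the result are all hypothetical, and operation actions (start_operation, end_operation and their arguments) are left out for brevity.

```python
def build_graph(actions):
    """Replay node, type, edge and argument actions into a tiny semantic graph."""
    nodes, types, edges = [], [], []
    pending = None  # main action still waiting for its argument action(s)
    for action in actions:
        name, _, value = action.partition(":")
        if name in ("add_variable", "add_entity_node"):
            nodes.append(value)
        elif name == "add_type_node":
            pending = {"kind": "type", "label": value}
        elif name == "add_edge":
            pending = {"kind": "edge", "label": value}
        elif name == "arg" and pending and pending["kind"] == "type":
            types.append((pending["label"], value))  # type constrains this node
            pending = None
        elif name == "arg1_node" and pending and pending["kind"] == "edge":
            pending["arg1"] = value
        elif name == "arg2_node" and pending and "arg1" in pending:
            edges.append((pending["label"], pending["arg1"], value))
            pending = None
        else:
            raise ValueError(f"unexpected action: {action}")
    return {"nodes": nodes, "types": types, "edges": edges}

# Assumed action sequence for "Which states border Texas".
actions = [
    "add_variable:A", "add_type_node:state", "arg:A",
    "add_entity_node:texas",
    "add_edge:next_to", "arg1_node:A", "arg2_node:texas",
]
graph = build_graph(actions)
print(graph)
```

Because argument actions directly follow their main action, a single `pending` slot is enough to attach each argument to the right type node or edge.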

Neural Sequence-to-Action Model
Based on the above action encoding mechanism, this section describes our encoder-decoder model for mapping a sentence to an action sequence. Specifically, similar to the RNN model in Jia and Liang (2016), this paper employs the attention-based sequence-to-sequence RNN model. Figure 3 presents the overall structure.
Encoder: The encoder converts the input sequence $x_1, ..., x_m$ into a sequence of context-sensitive vectors $b_1, ..., b_m$ using a bidirectional RNN (Bahdanau et al., 2014). First, each word $x_i$ is mapped to its embedding vector; these vectors are then fed into a forward RNN and a backward RNN. The sequence of hidden states $h_1, ..., h_m$ is generated by recurrently applying the recurrence:

$$h_i = \mathrm{LSTM}(\phi^{(x)}(x_i), h_{i-1}) \quad (2)$$

The recurrence takes the form of an LSTM (Hochreiter and Schmidhuber, 1997). Finally, for each input position $i$, we define its context-sensitive embedding as $b_i = [h_i^F, h_i^B]$, the concatenation of the forward and backward hidden states.
Decoder: This paper uses the classical attention-based decoder (Bahdanau et al., 2014), which generates the action sequence $y_1, ..., y_n$ one action at a time. At each time step $j$, it writes $y_j$ based on the current hidden state $s_j$, then updates the hidden state to $s_{j+1}$ based on $s_j$ and $y_j$. The decoder is formally defined by the following equations:

$$s_1 = \tanh(W^{(s)} [h_m^F, h_1^B]) \quad (3)$$
$$e_{ji} = s_j^\top W^{(a)} b_i \quad (4)$$
$$a_{ji} = \frac{\exp(e_{ji})}{\sum_{i'=1}^{m} \exp(e_{ji'})} \quad (5)$$
$$c_j = \sum_{i=1}^{m} a_{ji} b_i \quad (6)$$
$$P(y_j = w \mid x, y_{1:j-1}) \propto \exp(U_w [s_j, c_j]) \quad (7)$$
$$s_{j+1} = \mathrm{LSTM}([\phi^{(y)}(y_j), c_j], s_j) \quad (8)$$

where the normalized attention scores $a_{ji}$ define a probability distribution over input words, indicating the attention probability on input word $i$ at time $j$; $e_{ji}$ is the un-normalized attention score. To incorporate constraints during decoding, an extra controller component is added; its details are described in Section 3.3.
Action Embedding. The above decoder needs the embedding of each action. As described above, each action has two parts, one for structure (e.g., add_edge) and the other for semantics (e.g., next_to).
As a result, actions may share the same structure or semantic part, e.g., add_edge:next_to and add_edge:loc have the same structure part, and add_node:A and arg_node:A have the same semantic part. To make the parameters more compact, we first embed the structure part and the semantic part independently, then concatenate them to get the final embedding. For instance, $\phi^{(y)}(\text{add\_edge:next\_to}) = [\phi^{(y)}_{\text{struct}}(\text{add\_edge}), \phi^{(y)}_{\text{sem}}(\text{next\_to})]$.

3 Constrained Semantic Parsing using Sequence-to-Action Model
In this section, we describe how to build a neural semantic parser using the sequence-to-action model. We first describe the training and the inference of our model, and then introduce how to incorporate structure and semantic constraints during decoding.

Training
Parameter Estimation. The parameters of our model include the RNN parameters $W^{(s)}$, $W^{(a)}$, $U_w$, the word embeddings $\phi^{(x)}$, and the action embeddings $\phi^{(y)}$. We estimate these parameters from training data. Given a training example with a sentence $X$ and its action sequence $Y$, we maximize the likelihood of the generated sequence of actions given $X$. The objective function is:

$$\sum_{(X, Y)} \log P(Y|X) = \sum_{(X, Y)} \sum_{t=1}^{|Y|} \log P(y_t \mid y_{<t}, X) \quad (9)$$

The standard stochastic gradient descent algorithm is employed to update the parameters.
Logical Form to Action Sequence. Currently, most semantic parsing datasets are labeled with logical forms. In order to train our model, we convert logical forms to action sequences using the semantic graph as an intermediate representation (see Figure 4 for an overview). Concretely, we transform logical forms into semantic graphs using a depth-first search from the root, and then generate the action sequence in the same order. Specifically, entities, variables and types are nodes; relations are edges. Conversely, an action sequence can be converted to a logical form in a similar way. With this algorithm, the conversion is deterministic in both directions: from logical forms to action sequences and from action sequences to logical forms.
Mechanisms for Handling Entities. Entities play an important role in semantic parsing (Yih et al., 2015). In Dong and Lapata (2016), entities are replaced with their types and unique IDs. In Jia and Liang (2016), entities are generated via an attention-based copying mechanism aided by a lexicon. This paper implements both mechanisms and compares them in experiments.
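As an illustration of the Replacing mechanism, the sketch below substitutes entity tokens with a type name plus a running ID. It is a simplified stand-in rather than the preprocessing used in the experiments: the function `replace_entities`, the toy lexicon, and the placeholder format (e.g., state0) are assumptions, and only single-token entities are handled.

```python
def replace_entities(tokens, lexicon):
    """Replace known entity tokens with '<type><id>' placeholders.

    Returns the rewritten tokens and the entity -> placeholder mapping,
    which is needed to restore entities in the predicted parse.
    """
    counters, mapping, out = {}, {}, []
    for tok in tokens:
        etype = lexicon.get(tok)
        if etype is None:
            out.append(tok)  # ordinary word, keep as-is
            continue
        if tok not in mapping:
            idx = counters.get(etype, 0)  # next free ID for this type
            counters[etype] = idx + 1
            mapping[tok] = f"{etype}{idx}"
        out.append(mapping[tok])
    return out, mapping

# Hypothetical lexicon mapping entity mentions to their types.
lexicon = {"texas": "state", "austin": "city"}
tokens, mapping = replace_entities("which states border texas".split(), lexicon)
print(tokens, mapping)
```

Keeping the mapping allows the placeholders in a predicted action sequence to be swapped back for the original entities after decoding.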

Inference
Given a new sentence $X$, we predict the action sequence by:

$$Y^* = \arg\max_{Y} P(Y|X) \quad (10)$$

where $Y$ represents an action sequence, and $P(Y|X)$ is computed using Formula (1). Beam search is used to decode the best action sequence. The semantic graph and the logical form can be derived from $Y^*$ as described above.
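The beam search step can be sketched as follows. This is a generic beam search over a toy scoring function, not the decoder of the paper: `next_log_probs`, `toy_model`, the `<end>` marker and the probabilities are all hypothetical, and a real system would obtain the next-action distribution from the RNN decoder.

```python
import math

def beam_search(next_log_probs, beam_size=5, max_len=20, end="<end>"):
    """Return the highest-scoring finished sequence and its log-probability."""
    beams = [((), 0.0)]  # (action prefix, accumulated log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for action, lp in next_log_probs(prefix).items():
                candidates.append((prefix + (action,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Sequences that emitted the end marker leave the beam.
            (finished if seq[-1] == end else beams).append((seq, score))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])

def toy_model(prefix):
    """Hypothetical next-action log-probabilities keyed by the prefix."""
    table = {
        (): {"add_variable:A": math.log(0.9), "add_entity_node:texas": math.log(0.1)},
        ("add_variable:A",): {"add_type_node:state": math.log(0.7), "<end>": math.log(0.3)},
    }
    return table.get(prefix, {"<end>": 0.0})

best_seq, best_score = beam_search(toy_model, beam_size=5)
print(best_seq, math.exp(best_score))
```

The default beam size of 5 matches the setting reported in the experiments; no length normalization is applied in this sketch.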

Incorporating Constraints in Decoding
For decoding, we generate actions sequentially. Obviously, the next action has a strong correlation with the partial semantic graph generated so far, and illegal actions can be filtered using structure and semantic constraints. Specifically, we incorporate constraints in decoding using a controller. This procedure has two steps: 1) the controller constructs the partial semantic graph using the actions generated so far; 2) the controller checks whether a newly generated action meets the structure and semantic constraints.
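A minimal sketch of such a controller is given below, restricted to add_edge actions. It is an assumption-laden illustration rather than the controller of the paper: the class `Controller`, its methods, and the one-entry schema are hypothetical. The structure check requires both endpoints to exist in the partial graph, and the semantic check compares node types against a knowledge base schema, following the next_to example of type state and state.

```python
class Controller:
    """Tracks the partial semantic graph and vets candidate add_edge actions."""

    def __init__(self, schema):
        self.schema = schema   # edge label -> (type of arg1, type of arg2)
        self.node_types = {}   # node id -> type label (None if not yet typed)

    def add_node(self, node, node_type=None):
        self.node_types[node] = node_type

    def edge_allowed(self, edge, arg1, arg2):
        # Structure constraint: both endpoints must already be in the graph.
        if arg1 not in self.node_types or arg2 not in self.node_types:
            return False
        # Semantic constraint: typed endpoints must match the schema.
        expected = self.schema.get(edge)
        if expected is None:
            return False
        return all(
            self.node_types[node] in (None, want)
            for node, want in zip((arg1, arg2), expected)
        )

    def filter(self, candidates):
        """Keep only the (edge, arg1, arg2) candidates that pass both checks."""
        return [c for c in candidates if self.edge_allowed(*c)]

# Hypothetical schema and partial graph.
ctrl = Controller({"next_to": ("state", "state")})
ctrl.add_node("A", "state")
ctrl.add_node("texas", "state")
ctrl.add_node("austin", "city")
legal = ctrl.filter([
    ("next_to", "A", "texas"),   # valid
    ("next_to", "A", "austin"),  # violates the semantic constraint
    ("next_to", "A", "B"),       # violates the structure constraint
])
print(legal)
```

In a decoder, such a filter would be applied to the candidate actions at each step before the beam is extended.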

Experiments
In this section, we assess the performance of our method and compare it with previous methods.

Datasets
We conduct experiments on three standard datasets: GEO, ATIS and OVERNIGHT.
GEO contains natural language questions about US geography paired with corresponding Prolog database queries. Following Zettlemoyer and Collins (2005), we use the standard 600/280 instance splits for training/test.
ATIS contains natural language questions about a flight database, each annotated with a lambda calculus query. Following Zettlemoyer and Collins (2007), we use the standard 4473/448 instance splits for training/test.
OVERNIGHT contains natural language paraphrases paired with logical forms across eight domains. We evaluate on the same standard train/test splits as Wang et al. (2015b).

Experimental Settings
Following the experimental setup of Jia and Liang (2016), we use 200 hidden units and 100-dimensional word vectors for sentence encoding. The dimensions of the action embeddings are tuned on the validation set of each corpus. We initialize all parameters by sampling uniformly within the interval [-0.1, 0.1]. We train our model for a total of 30 epochs with an initial learning rate of 0.1, and halve the learning rate every 5 epochs after epoch 15. We replace the word vectors of words occurring only once with a universal word vector. The beam size is set to 5. Our model is implemented in Theano (Bergstra et al., 2010), and the code and settings are released on Github: https://github.com/dongpobeyond/Seq2Act. We evaluate different systems using the standard accuracy metric, and the accuracies on the different datasets are obtained in the same way as in Jia and Liang (2016).
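The learning rate schedule can be written down explicitly. The sketch below encodes one reading of "halve the learning rate every 5 epochs after epoch 15", namely that epochs 1 to 15 use 0.1 and the rate is halved at the start of epochs 16, 21 and 26; this interpretation and the function name `learning_rate` are assumptions.

```python
def learning_rate(epoch, base=0.1, warm=15, every=5):
    """Learning rate for a 1-indexed epoch under the assumed schedule."""
    if epoch <= warm:
        return base
    # Number of halvings applied so far: one per started block of `every` epochs.
    halvings = (epoch - warm - 1) // every + 1
    return base / (2 ** halvings)

# Print the rate at the first epoch of each block over the 30 training epochs.
for epoch in (1, 16, 21, 26):
    print(epoch, learning_rate(epoch))
```

Under this reading the final five epochs run at one eighth of the initial rate.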

Overall Results
We compare our method with state-of-the-art systems on all three datasets. Because all systems use the same training/test splits, we directly use the best performances reported in their original papers for a fair comparison.
For our method, we train our model with three settings: the first is the basic sequence-to-action model without constraints, Seq2Act; the second adds structure constraints in decoding, Seq2Act (+C1); the third is the full model, which adds both structure and semantic constraints, Seq2Act (+C1+C2). Semantic constraints (C2) are stricter than structure constraints (C1), so we require C1 to be met before C2 can be met; accordingly, in our experiments we add the constraints incrementally. The overall results are shown in Tables 1 and 2. From the overall results, we can see that:
1) By synthesizing the advantages of semantic graph representation and the prediction ability of the Seq2Seq model, our method achieves state-of-the-art performance on the OVERNIGHT dataset and competitive performance on the GEO and ATIS datasets. In fact, on GEO our full model (Seq2Act+C1+C2) also gets the best test accuracy, 88.9, under the same settings, falling behind only Liang et al. (2011)*, which uses extra handcrafted lexicons, and Jia and Liang (2016)*, which uses extra augmented training data. On ATIS our full model gets the second best test accuracy of 85.5, which only falls behind Rabinovich et al.
2) Compared with the linearized logical form representation used in previous Seq2Seq baselines, our action sequence encoding is more effective for semantic parsing. On all three datasets, our basic Seq2Act model gets better results than, or matches, the Seq2Seq baselines. On GEO, the Seq2Act model achieves a test accuracy of 87.5, better than the best Seq2Seq baseline accuracy of 87.1. On ATIS, the Seq2Act model obtains a test accuracy of 84.6, the same as the best Seq2Seq baseline. On OVERNIGHT, the Seq2Act model gets a test accuracy of 78.0, better than the 77.5 of the best Seq2Seq baseline. We argue that this is because our action sequence encoding is more compact and can capture more information.
3) Structure constraints can enhance semantic parsing by ensuring the validity of the graph built from the generated action sequence. On all three datasets, Seq2Act (+C1) outperforms the basic Seq2Act model. This is because some illegal actions are filtered out during decoding.
4) By leveraging knowledge base schemas during decoding, semantic constraints are effective for semantic parsing. Compared to Seq2Act and Seq2Act (+C1), Seq2Act (+C1+C2) gets the best performance on all three datasets. This is because semantic constraints can further filter out semantically illegal actions using selectional preference and consistency between types.

Detailed Analysis
Effect of Entity Handling Mechanisms. This paper implements two entity handling mechanisms: Replacing (Dong and Lapata, 2016), which identifies entities and then replaces them with their types and IDs, and attention-based Copying (Jia and Liang, 2016). To compare the two mechanisms, we train and test with our full model; the results are shown in Table 3. We can see that the Replacing mechanism outperforms Copying on all three datasets.
Linearized Logical Form vs. Action Sequence. Table 4 shows the average length of the linearized logical forms used in previous Seq2Seq models and of the action sequences of our model on all three datasets. As we can see, the action sequence encoding is more compact than the linearized logical form encoding: the action sequence is shorter on all three datasets, with reductions in length of 35.5%, 9.2% and 28.5%, respectively. The main advantage of a shorter, more compact encoding is that it reduces the influence of the long-distance dependency problem.

Error Analysis
We perform error analysis on results and find there are mainly two types of errors.
Unseen/Informal Sentence Structure. Some test sentences have unseen syntactic structures. For example, the first case in Table 5 has an unseen and informal structure, where the entity word "Iowa" and the relation word "borders" appear ahead of the question words "how many". For this problem, we can employ sentence rewriting or paraphrasing techniques (Chen et al., 2016; Dong et al., 2017) to transform unseen sentence structures into normal ones.
Under-Mapping. As Dong and Lapata (2016) discussed, the attention model does not take the alignment history into consideration, so some words are ignored during parsing. For example, in the second case in Table 5, "first class" is ignored during the decoding process. This problem could be addressed using the explicit word coverage models used in neural machine translation (Tu et al., 2016; Cohn et al., 2016).

Table 5: Some examples for error analysis. Each example includes the sentence for parsing, with the gold parse and the predicted parse from our model.
Error type: Unseen/Informal Sentence Structure
Predicted Parse: answer(A, count(B, state(B), A))
Error type: Under-Mapping
Sentence: Please show me first class flights from indianapolis to memphis one way leaving before 10am
Gold Parse: (lambda x (and (flight x) (oneway x) (class_type x first:cl) (< (departure_time x) 1000:ti) (from x indianapolis:ci) (to x memphis:ci)))
Predicted Parse: (lambda x (and (flight x) (oneway x) (< (departure_time x) 1000:ti) (from x indianapolis:ci) (to x memphis:ci)))

5 Related Work
Semantic parsing has received significant attention for a long time (Kate and Mooney, 2006; Clarke et al., 2010; Krishnamurthy and Mitchell, 2012; Artzi and Zettlemoyer, 2013; Berant and Liang, 2014; Quirk et al., 2015; Artzi et al., 2015; Reddy et al., 2017; Chen et al., 2018). Traditional methods are mostly based on the principle of compositional semantics: they first trigger predicates using lexicons and then compose them using grammars. The prominent grammars include SCFG (Wong and Mooney, 2007; Li et al., 2015), CCG (Zettlemoyer and Collins, 2005; Kwiatkowski et al., 2011; Cai and Yates, 2013), DCS (Liang et al., 2011; Berant et al., 2013), etc. As discussed above, the main drawback of grammar-based methods is that they rely on high-quality lexicons, manually built grammars, and hand-crafted features.
In recent years, one promising direction in semantic parsing is to use a semantic graph as the representation, so that semantic parsing is modeled as a semantic graph generation process. Ge and Mooney (2009) build semantic graphs by transforming syntactic trees. Bast and Haussmann (2015) identify the structure of a semantic query using three pre-defined patterns. Reddy et al. (2014, 2016) use a Freebase-based semantic graph representation, and convert sentences to semantic graphs using CCG or dependency trees. Yih et al. (2015) generate semantic graphs using a staged heuristic search algorithm. These methods are all based on manually designed, heuristic generation processes, which may suffer from syntactic parse errors (Ge and Mooney, 2009; Reddy et al., 2014, 2016) and structure mismatch (Chen et al., 2016), and have difficulty dealing with complex sentences (Yih et al., 2015).
Another direction is to employ neural Seq2Seq models, which model semantic parsing as an end-to-end, sentence-to-logical-form machine translation problem. Dong and Lapata (2016), Jia and Liang (2016) and Xiao et al. (2016) transform word sequences into linearized logical forms. One main drawback of these methods is that it is hard to capture and exploit structure and semantic constraints using linearized logical forms. Dong and Lapata (2016) propose a Seq2Tree model to capture the hierarchical structure of logical forms.
It has been shown that structure and semantic constraints are effective for enhancing semantic parsing.Krishnamurthy et al. (2017) use type constraints to filter illegal tokens.Liang et al. (2017) adopt a Lisp interpreter with pre-defined functions to produce valid tokens.Iyyer et al. (2017) adopt type constraints to generate valid actions.Inspired by these approaches, we also incorporate both structure and semantic constraints in our neural sequence-to-action model.
Transition-based approaches are important in both dependency parsing (Nivre, 2008; Henderson et al., 2013) and AMR parsing (Wang et al., 2015a). In semantic parsing, our method is tightly coupled with knowledge bases, and constraints can be exploited for more accurate decoding. We believe this can also be used to enhance previous transition-based methods and may also be used in other parsing tasks, e.g., AMR parsing.

Conclusions
This paper proposes Sequence-to-Action, a method which models semantic parsing as an end-to-end semantic graph generation process.By leveraging the advantages of semantic graph representation and exploiting the representation learning and prediction ability of Seq2Seq models, our method achieved significant performance improvements on three datasets.Furthermore, structure and semantic constraints can be easily incorporated in decoding to enhance semantic parsing.
For future work, to address the lack of training data, we want to design weakly supervised learning algorithms that use denotations (QA pairs) as supervision. Furthermore, we want to collect labeled data by designing an interactive UI for assisted annotation, as in Yih et al. (2016), which uses semantic graphs to annotate the meaning of sentences, since semantic graphs are more natural and can be easily annotated without the need for expert knowledge.

Figure 1: Overview of our method, with a demonstration example.

Figure 3: Our attention-based Sequence-to-Action RNN model, with a controller for incorporating constraints.

Figure 4: The procedure of converting between logical form and action sequence.

Table 2: Test accuracies on the OVERNIGHT dataset, which includes eight domains: Social (Soc.), Blocks (Blo.), Basketball (Bas.), Restaurants (Res.), Calendar (Cal.), Housing (Hou.), Publications (Pub.), and Recipes (Rec.); Avg. denotes the average.
