TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation

We present TRANX, a transition-based neural semantic parser that maps natural language (NL) utterances into formal meaning representations (MRs). TRANX uses a transition system based on the abstract syntax description language for the target MR, which gives it two major advantages: (1) it is highly accurate, using information from the syntax of the target MR to constrain the output space and model the information flow, and (2) it is highly generalizable, and can easily be applied to new types of MR by just writing a new abstract syntax description corresponding to the allowable structures in the MR. Experiments on four different semantic parsing and code generation tasks show that our system is generalizable, extensible, and effective, registering strong results compared to existing neural semantic parsers.


Introduction
Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs). The target MRs can be defined according to a wide variety of formalisms. These include linguistically motivated semantic representations that are designed to capture the meaning of any sentence, such as λ-calculus (Zettlemoyer and Collins, 2005) or abstract meaning representations (Banarescu et al., 2013). Alternatively, for more task-driven approaches to semantic parsing, it is common for meaning representations to represent executable programs such as SQL queries (Zhong et al., 2017), robotic commands (Artzi and Zettlemoyer, 2013), smartphone instructions (Quirk et al., 2015), and even general-purpose programming languages like Python (Yin and Neubig, 2017; Rabinovich et al., 2017) and Java (Ling et al., 2016).
Because of these varying formalisms for MRs, the design of semantic parsers, particularly neural network-based ones, has generally focused on a small subset of tasks: in order to ensure the syntactic well-formedness of generated MRs, a parser is usually specifically designed to reflect the domain-dependent grammar of MRs in the structure of the model (Zhong et al., 2017; Xu et al., 2017). To alleviate this issue, there have been recent efforts in neural semantic parsing with general-purpose grammar models (Xiao et al., 2016; Dong and Lapata, 2018). Yin and Neubig (2017) put forward a neural sequence-to-sequence model that generates tree-structured MRs using a series of tree-construction actions, guided by the task-specific context-free grammar provided to the model a priori. Rabinovich et al. (2017) propose abstract syntax networks (ASNs), where domain-specific MRs are represented by abstract syntax trees (ASTs, Fig. 2 Left) specified under the abstract syntax description language (ASDL) framework (Wang et al., 1997). An ASN employs a modular architecture, generating an AST using specifically designed neural networks for each construct in the ASDL grammar.
Inspired by this existing research, we have developed TRANX, a TRANsition-based abstract syntaX parser for semantic parsing and code generation. TRANX is designed with the following principles in mind:
• Generalization ability TRANX employs ASTs as a general-purpose intermediate meaning representation, and the task-dependent grammar is provided to the system as external knowledge to guide the parsing process, thereby decoupling the semantic parsing procedure from the specificities of grammars.
• Extensibility TRANX uses a simple transition system to parse NL utterances into tree-structured ASTs. The transition system is designed to be easy to extend, requiring minimal engineering to adapt to tasks that need to handle extra domain-specific information.
• Effectiveness We test TRANX on four semantic parsing (ATIS, GEO) and code generation (DJANGO, WIKISQL) tasks, and demonstrate that TRANX is capable of generalizing to different domains while registering strong performance, outperforming existing neural network-based approaches on three of the four datasets (GEO, ATIS, DJANGO).
Figure 1: Workflow of TRANX

Methodology
Given an NL utterance, TRANX parses the utterance into a formal meaning representation, typically a λ-calculus logical form or a program in a domain-specific or general-purpose programming language (e.g., Python). In the following description we use Python code generation as a running example, where a programmer's natural language intents are mapped to Python source code. Fig. 1 depicts the workflow of TRANX. We will present more use cases of TRANX in § 3. The core of TRANX is a transition system. Given an input NL utterance x, TRANX employs the transition system to map the utterance x into an AST z using a series of tree-construction actions (§ 2.2). TRANX employs ASTs as the intermediate meaning representation to abstract over the domain-specific structure of MRs. This parsing process is guided by the user-defined, domain-specific grammar specified under the ASDL formalism (§ 2.1). Given the generated AST z, the parser calls the user-defined function, AST_to_MR(·), to convert the intermediate AST into a domain-specific meaning representation y, completing the parsing process. TRANX uses a probabilistic model p(z|x), parameterized by a neural network, to score each hypothesis AST (§ 2.3).

Modeling ASTs using ASDL Grammar
TRANX uses ASTs as the general-purpose, intermediate semantic representation for MRs. ASTs are commonly used to represent programming languages, and can also be used to represent other tree-structured MRs (e.g., λ-calculus). The ASDL framework is a grammatical formalism to define ASTs. See Fig. 1 for an excerpt of the Python ASDL grammar. TRANX provides APIs to read such a grammar from human-readable text files.
An ASDL grammar has two basic constructs: types and constructors. A composite type is defined by the set of constructors under that type. For example, the stmt and expr composite types in Fig. 1 refer to Python statements and expressions, respectively, each defined by a series of constructors. A constructor specifies a language construct of a particular type using its fields. For instance, the Call constructor under the composite type expr denotes function call expressions, and has three fields: func, args and keywords. Each field in a constructor is also strongly typed, which specifies the type of value the field can hold. A field with a composite type can be instantiated by constructors of the same type. For example, the func field above can hold a constructor of type expr. There are also fields with primitive types, which store values. For example, the id field of the Name constructor has the primitive type identifier, and is used to store identifier names. Similarly, the field s in the Str (string) constructor holds string literals. Finally, each field has a cardinality (single, optional ?, or sequential *), denoting the number of values the field holds.
An AST is then composed of multiple constructors, where each node on the tree corresponds to a typed field in a constructor (except for the root node, which denotes the root constructor). Depending on the cardinality of the field, a node can hold one or multiple constructors as its values. For instance, the func field with single cardinality in the ASDL grammar in Fig. 1 is instantiated with one Name constructor, while the args field with sequential cardinality has multiple child constructors.
Figure 2: (Left) Purple squares denote fields with sequential cardinality. Grey nodes denote primitive identifier fields. Fields are labeled with time steps at which they are generated. (Right) The action sequence used to construct the AST. Each action is labeled with its frontier field $n_{f_t}$. APPLYCONSTR actions are represented by their constructors.
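To make the grammar constructs described above concrete, the following is a minimal sketch of how types, constructors, and typed fields with cardinality might be represented in Python. The class names (`Field`, `Constructor`, `Grammar`) are hypothetical and chosen for illustration; they are not TRANX's actual API. The `Call`, `Name`, and `Str` fields follow the description in the text, while the `Expr` constructor and the type and cardinality of `keywords` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """A typed field of a constructor."""
    name: str
    type: str
    cardinality: str = "single"  # "single", "optional" (?) or "multiple" (*)


@dataclass(frozen=True)
class Constructor:
    """A language construct that belongs to one composite type."""
    name: str
    type: str
    fields: Tuple[Field, ...] = ()


class Grammar:
    """Indexes constructors by name and by the composite type they define."""

    def __init__(self, constructors: List[Constructor]) -> None:
        self.by_name: Dict[str, Constructor] = {c.name: c for c in constructors}
        self.by_type: Dict[str, List[Constructor]] = {}
        for c in constructors:
            self.by_type.setdefault(c.type, []).append(c)

    def constructors_of(self, type_name: str) -> List[Constructor]:
        return self.by_type.get(type_name, [])


# Hypothetical grammar fragment. `Expr` and the type of `keywords`
# are assumptions made for this sketch.
GRAMMAR = Grammar([
    Constructor("Expr", "stmt", (Field("value", "expr"),)),
    Constructor("Call", "expr", (
        Field("func", "expr"),
        Field("args", "expr", "multiple"),
        Field("keywords", "keyword", "multiple"),
    )),
    Constructor("Name", "expr", (Field("id", "identifier"),)),
    Constructor("Str", "expr", (Field("s", "string"),)),
])

call = GRAMMAR.by_name["Call"]
expr_names = [c.name for c in GRAMMAR.constructors_of("expr")]
```

Here the composite type expr is defined by the set of constructors registered under it, which mirrors the statement that a composite type is defined by its constructors.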

Transition System
Inspired by Yin and Neubig (2017) (hereafter YN17), we develop a transition system that decomposes the generation procedure of an AST into a sequence of tree-constructing actions. We now explain the transition system using our running example. Fig. 2 Right lists the sequence of actions used to construct the example AST. At a high level, the generation process starts from an initial derivation AST with a single root node, and proceeds according to a top-down, left-to-right traversal of the AST. At each time step, one of the following three types of actions is invoked to expand the opening frontier field $n_{f_t}$ of the derivation:
• APPLYCONSTR[c] actions apply a constructor c to the opening composite frontier field which has the same type as c, populating the opening node using the fields in c. If the frontier field has sequential cardinality, the action appends the constructor to the list of constructors held by the field.
• REDUCE actions mark the completion of the generation of child values for a field with optional (?) or multiple (*) cardinalities.
• GENTOKEN[v] actions populate an (empty) primitive frontier field with a token v. For example, the field $f_7$ in Fig. 2 has type identifier, and is instantiated using a single GENTOKEN action. A field of string type, like $f_8$, whose value can consist of multiple tokens (only one shown here), is filled using a sequence of GENTOKEN actions, followed by a special GENTOKEN[</f>] action that terminates the generation of token values.
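The three action types can be illustrated with a small interpreter. This is a sketch under stated assumptions, not TRANX's implementation: the dictionary-based grammar, the tuple encoding of AST nodes, the `build_ast` function, and the example utterance target `open("file.csv")` are all hypothetical. A stack of open fields plays the role of the frontier, so that expansion proceeds top-down and left-to-right.

```python
# Hypothetical grammar: constructor -> (composite type, [(field, type, cardinality)]).
GRAMMAR = {
    "Expr": ("stmt", [("value", "expr", "single")]),
    "Call": ("expr", [("func", "expr", "single"),
                      ("args", "expr", "multiple"),
                      ("keywords", "keyword", "multiple")]),
    "Name": ("expr", [("id", "identifier", "single")]),
    "Str": ("expr", [("s", "string", "single")]),
}
PRIMITIVES = {"identifier", "string"}
END_OF_FIELD = "</f>"


def build_ast(actions, root_type="stmt"):
    """Replay a sequence of actions and return the AST as nested tuples."""
    root_values = []
    # Each stack entry is an open field: (list of values, type, cardinality).
    # The top of the stack is the current frontier field.
    stack = [(root_values, root_type, "single")]
    for action in actions:
        if not stack:
            raise ValueError("no frontier field left")
        values, ftype, card = stack[-1]
        kind = action[0]
        if kind == "ApplyConstr":
            ctype, fields = GRAMMAR[action[1]]
            if ftype in PRIMITIVES or ctype != ftype:
                raise ValueError("constructor type does not match frontier")
            node = (action[1], {name: [] for name, _, _ in fields})
            values.append(node)
            if card != "multiple":
                stack.pop()  # single/optional fields hold at most one value
            # Push child fields in reverse so the leftmost is expanded first.
            for name, t, c in reversed(fields):
                stack.append((node[1][name], t, c))
        elif kind == "Reduce":
            if card == "single":
                raise ValueError("REDUCE needs optional or multiple cardinality")
            stack.pop()
        elif kind == "GenToken":
            if ftype not in PRIMITIVES:
                raise ValueError("GENTOKEN needs a primitive frontier field")
            if ftype == "string" and action[1] != END_OF_FIELD:
                values.append(action[1])  # string fields stay open
            else:
                if action[1] != END_OF_FIELD:
                    values.append(action[1])
                stack.pop()
        else:
            raise ValueError("unknown action")
    if stack:
        raise ValueError("derivation is incomplete")
    return root_values[0]


# Action sequence for the hypothetical target: open("file.csv")
ACTIONS = [
    ("ApplyConstr", "Expr"), ("ApplyConstr", "Call"),
    ("ApplyConstr", "Name"), ("GenToken", "open"),
    ("ApplyConstr", "Str"), ("GenToken", "file.csv"), ("GenToken", END_OF_FIELD),
    ("Reduce",),  # close args
    ("Reduce",),  # close keywords
]
ast_root = build_ast(ACTIONS)
```

Note that the Str constructor is appended to the still-open args field, and that args and keywords are each closed by their own REDUCE action.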
The generation completes once there is no frontier field on the derivation. TRANX then calls the user-specified function AST_to_MR(·) to convert the generated intermediate AST z into the target domain-specific MR y. TRANX provides various helper functions to ease the process of writing conversion functions. For example, our example conversion function to transform ASTs into Python source code contains only 32 lines of code. TRANX also ships with several built-in conversion functions to handle MRs commonly used in semantic parsing and code generation, like λ-calculus logical forms and SQL queries.
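As an illustration of what such a conversion function might look like, the sketch below turns a small AST into Python source text. It is not the 32-line function mentioned above: the nested-tuple AST encoding, the function name `ast_to_python`, and the choice to join string tokens with spaces are assumptions made for this example, and keyword arguments are left unhandled.

```python
def ast_to_python(node):
    """Convert a (constructor, fields) tuple into Python source text."""
    name, fields = node
    if name == "Expr":
        return ast_to_python(fields["value"][0])
    if name == "Name":
        return fields["id"][0]
    if name == "Str":
        # Assumption: tokens of a string literal are joined with spaces.
        return repr(" ".join(fields["s"]))
    if name == "Call":
        func = ast_to_python(fields["func"][0])
        args = [ast_to_python(arg) for arg in fields["args"]]
        return f"{func}({', '.join(args)})"
    raise ValueError(f"unsupported constructor: {name}")


# Hypothetical AST for a call with one string argument.
EXAMPLE_AST = ("Expr", {"value": [("Call", {
    "func": [("Name", {"id": ["open"]})],
    "args": [("Str", {"s": ["file.csv"]})],
    "keywords": [],
})]})

source = ast_to_python(EXAMPLE_AST)
```

Because the conversion only walks the tree, supporting a new MR amounts to adding one branch per constructor of its grammar.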

Computing Action Probabilities p(z|x)
Given the transition system, the probability of an AST z is decomposed into the probabilities of the sequence of actions used to generate z:
$$p(z \mid x) = \prod_{t} p(a_t \mid a_{<t}, x).$$
Following YN17, we parameterize the transition-based parser p(z|x) using a neural encoder-decoder network with augmented recurrent connections to reflect the topology of ASTs.
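In practice, a product of per-action probabilities is usually accumulated in log space to avoid numerical underflow on long action sequences. The following minimal sketch shows this for a given list of per-step probabilities; the function name `ast_log_prob` and the example values are hypothetical.

```python
import math


def ast_log_prob(action_probs):
    """Return log p(z|x) as the sum of per-action log-probabilities."""
    return sum(math.log(p) for p in action_probs)


# Hypothetical probabilities p(a_t | a_<t, x) for a three-action derivation.
step_probs = [0.9, 0.8, 0.5]
log_p = ast_log_prob(step_probs)
```

Exponentiating `log_p` recovers the product of the individual action probabilities.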
Encoder The encoder is a standard bidirectional Long Short-Term Memory (LSTM) network, which encodes the input utterance x of n tokens, $\{x_i\}_{i=1}^{n}$, into vectorial representations $\{h_i\}_{i=1}^{n}$.
Decoder The decoder is also an LSTM network, with its hidden state $s_t$ at each time step given by
$$s_t = f_{\mathrm{LSTM}}([a_{t-1} : \tilde{s}_{t-1} : p_t],\, s_{t-1}),$$
where $f_{\mathrm{LSTM}}$ is the LSTM transition function, and [:] denotes vector concatenation. $a_{t-1}$ is the embedding of the previous action. We maintain an embedding vector for each action. $\tilde{s}_t$ is the attentional vector defined as in Luong et al. (2015),
$$\tilde{s}_t = \tanh(W_c [c_t : s_t]),$$
where $c_t$ is the context vector retrieved from the input encodings $\{h_i\}_{i=1}^{n}$ using attention.
Parent Feeding $p_t$ is a vector that encodes the information of the parent frontier field $n_{f_t}$ on the derivation, which is a concatenation of two vectors: the embedding of the frontier field $n_{f_t}$, and $s_{p_t}$, the decoder's state at which the constructor of $n_{f_t}$ is generated by the APPLYCONSTR action. Parent feeding reflects the topology of tree-structured ASTs, and gives better performance on generating complex MRs like Python code (§ 3).
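The attentional vector can be sketched in a few lines of plain Python. This is an illustration under assumptions: the text only states that the context vector is retrieved using attention, so the dot-product scoring function used here is an assumed choice, and the function names and toy dimensions are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attentional_vector(encodings, s_t, W_c):
    """Compute tanh(W_c [c_t : s_t]) and the attention weights."""
    # Assumption: dot-product attention scores between h_i and s_t.
    weights = softmax([dot(h, s_t) for h in encodings])
    dim = len(encodings[0])
    # Context vector: attention-weighted sum of the input encodings.
    c_t = [sum(w * h[j] for w, h in zip(weights, encodings)) for j in range(dim)]
    concat = c_t + s_t  # [c_t : s_t]
    return [math.tanh(dot(row, concat)) for row in W_c], weights


# Toy example: two input encodings, decoder state of size 2,
# and a 2 x 4 projection matrix W_c.
H = [[1.0, 0.0], [0.0, 1.0]]
S_T = [1.0, 0.0]
W_C = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
s_tilde, attn = attentional_vector(H, S_T, W_C)
```

In this toy case the first input position receives the larger attention weight because its encoding aligns with the decoder state.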

Action Probabilities
The probability of an APPLYCONSTR[c] action with embedding $a_c$ is
$$p(a_t = \text{APPLYCONSTR}[c] \mid a_{<t}, x) = \mathrm{softmax}(a_c^{\top} W \tilde{s}_t). \quad (1)$$
REDUCE is treated as a special APPLYCONSTR action. For GENTOKEN actions, we employ a hybrid approach of generation and copying, allowing for out-of-vocabulary variable names and literals (e.g., "file.csv" in Fig. 1) in x to be directly copied to the derivation. Specifically, the action probability is defined to be the marginal probability
$$p(a_t = \text{GENTOKEN}[v] \mid a_{<t}, x) = p(\text{gen} \mid a_t, x)\, p(v \mid \text{gen}, a_t, x) + p(\text{copy} \mid a_t, x)\, p(v \mid \text{copy}, a_t, x).$$
The binary probabilities p(gen|·) and p(copy|·) are given by $\mathrm{softmax}(W \tilde{s}_t)$. The probability of generating v from a closed-set vocabulary, p(v|gen, ·), is defined similarly to Eq. (1). The probability of copying the i-th word in x is defined using a pointer network (Vinyals et al., 2015):
$$p(x_i \mid \text{copy}, a_{<t}, x) = \mathrm{softmax}(h_i^{\top} W \tilde{s}_t).$$
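The marginalization over generation and copying can be worked through numerically. The sketch below assumes that the gate, vocabulary, and copy distributions have already been computed; the function name and all numbers are hypothetical. Summing the copy probability over every source position that holds the token is an assumption of this sketch.

```python
def gen_token_prob(token, p_gen, p_copy, vocab_probs, copy_probs, source_tokens):
    """Marginal probability of emitting `token` by generating or copying."""
    # Generation path: zero if the token is out of vocabulary.
    gen_part = p_gen * vocab_probs.get(token, 0.0)
    # Copy path: sum pointer probabilities over matching source positions.
    copy_part = p_copy * sum(
        p for word, p in zip(source_tokens, copy_probs) if word == token
    )
    return gen_part + copy_part


# Hypothetical distributions for one decoding step.
SOURCE = ["load", "file.csv"]
VOCAB_PROBS = {"load": 0.7, "open": 0.3}  # p(v | gen, .)
COPY_PROBS = [0.1, 0.9]                   # p(x_i | copy, .)
P_GEN, P_COPY = 0.6, 0.4

p_oov = gen_token_prob("file.csv", P_GEN, P_COPY, VOCAB_PROBS, COPY_PROBS, SOURCE)
p_both = gen_token_prob("load", P_GEN, P_COPY, VOCAB_PROBS, COPY_PROBS, SOURCE)
```

The out-of-vocabulary token "file.csv" receives probability mass only through the copy path, whereas "load" receives mass from both paths.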

Datasets
To demonstrate the generalization and extensibility of TRANX, we deploy our parser on four semantic parsing and code generation tasks.

Semantic Parsing
We evaluate on GEO and ATIS datasets. GEO is a collection of 880 U.S. geographical questions (e.g., "Which states border Texas?"), and ATIS is a set of 5,410 inquiries of flight information (e.g., "Show me flights from Dallas to Baltimore").
The MRs in the two datasets are defined in λ-calculus logical forms (e.g., "lambda x (and (state x) (next_to x texas))" and "lambda x (and (flight x dallas) (to x baltimore))"). We use the pre-processed datasets released by Dong and Lapata (2016). We use the ASDL grammar defined in Rabinovich et al. (2017), as listed in Fig. 3.

Code Generation
We evaluate TRANX on both general-purpose (Python, DJANGO) and domain-specific (SQL, WIKISQL) code generation tasks. The DJANGO dataset (Oda et al., 2015) consists of 18,805 lines of Python source code extracted from the Django Web framework, with each line paired with an NL description. Code in this dataset covers various real-world use cases of Python, like string manipulation, I/O operation, exception handling, etc.
WIKISQL (Zhong et al., 2017) is a code generation task for domain-specific languages (i.e., SQL). It consists of 80,654 examples of NL questions (e.g., "What position did Calvin Mccarty play?") and annotated SQL queries (e.g., "SELECT Position FROM Table WHERE ...").
Table 1: Results on GEO and ATIS
Methods                                   GEO    ATIS
ZH15 (Zhao and Huang, 2015)               88.9   84.2
ZC07 (Zettlemoyer and Collins, 2007)      89.0   84.6
WKZ14 (Wang et al., 2014)                 90.4   91.3
Neural Network-based Models
SEQ2TREE (Dong and Lapata, 2016)          87.1   84.6
ASN (Rabinovich et al., 2017)
Table 2: Results on DJANGO
Methods                                   Accuracy
(Ling et al., 2016)                       31.5
SEQ2TREE (Dong and Lapata, 2016)          39.4
NMT (Neubig, 2015)                        45.1
LPN (Ling et al., 2016)                   62.3
YN17 (Yin and Neubig, 2017)               71.6
TRANX (w/o parent feeding)                72.7
TRANX (w parent feeding)                  73.7
Extending TRANX for WIKISQL In order to achieve strong results, existing parsers, like most models in Tab. 3, use specifically designed architectures to reflect the syntactic structure of SQL queries. We show that the transition system used by TRANX can be easily extended for WIKISQL with minimal engineering, while registering strong performance. First, we define a simple ASDL grammar following the syntax of SQL (Fig. 4). We then augment the transition system with a special GENTOKEN action, SELCOLUMN[k]. A SELCOLUMN[k] action is used to populate a primitive column_idx field in the Select and Condition constructors in the grammar by selecting the k-th column in the table. To compute the probability of SELCOLUMN[k] actions, we use a pointer network over column encodings, where the column encodings are given by a bidirectional LSTM network over column names in an input table. This can be implemented simply by overriding the base Parser class in TRANX and modifying the functions that compute action probabilities.
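The pointer computation behind SELCOLUMN[k] can be sketched as a softmax over bilinear scores between each column encoding and the decoder's attentional vector. This is an illustration only: it does not use TRANX's Parser class, the column encodings are given directly instead of being produced by a bidirectional LSTM, and the function names, the bilinear form, and the toy values are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sel_column_probs(column_encodings, W, s_tilde):
    """Return p(SELCOLUMN[k]) for every column k of the input table."""
    projected = [dot(row, s_tilde) for row in W]  # W s~_t
    return softmax([dot(h, projected) for h in column_encodings])


# Hypothetical encodings for a three-column table.
COLUMNS = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity, for readability
S_TILDE = [0.0, 1.0]

probs = sel_column_probs(COLUMNS, W, S_TILDE)
best_column = max(range(len(probs)), key=probs.__getitem__)
```

Since the distribution ranges over table columns, the same computation applies to tables with any number of columns.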

Results
In this section we discuss our experimental results. All results are averaged over three runs with different random seeds.
Code Generation Tab. 2 lists the results on DJANGO. TRANX achieves state-of-the-art results on DJANGO. We also find that parent feeding yields a +1 point gain in accuracy, suggesting the importance of modeling parental connections in ASTs with complex domain grammars (e.g., Python). Tab. 3 shows the results on WIKISQL. We first discuss our standard model, which only uses information from column names and does not use the contents of input tables during inference, as listed in the top two blocks in Tab. 3. We find that TRANX, although adapted to this dataset with only simple extensions, achieves impressive results and outperforms many task-specific methods. This demonstrates that TRANX is easy to extend to incorporate task-specific information, while maintaining its effectiveness. We also extend TRANX with a very simple answer pruning strategy, where we execute the candidate SQL queries in the beam against the input table, and prune those that yield empty execution results. Results are listed in the bottom two blocks in Tab. 3, where we compare with systems that also use the contents of tables. Surprisingly, this (frustratingly) simple extension yields significant improvements, outperforming many task-specific models that use specifically designed, heavily engineered neural networks to incorporate information from table contents.
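The answer pruning strategy can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the code used in the experiments: the function name, the toy table, and the candidate queries are hypothetical, and discarding candidates that fail to execute is an additional assumption of this sketch.

```python
import sqlite3


def prune_empty_results(candidates, connection):
    """Keep only candidate queries whose execution returns at least one row."""
    kept = []
    for sql in candidates:
        try:
            rows = connection.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # assumption: unexecutable candidates are also pruned
        if rows:
            kept.append(sql)
    return kept


# Hypothetical input table and beam of candidate queries.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE states (name TEXT, capital TEXT)")
connection.execute("INSERT INTO states VALUES ('Texas', 'Austin')")

beam = [
    "SELECT capital FROM states WHERE name = 'Texs'",   # empty result
    "SELECT capital FROM states WHERE name = 'Texas'",  # one row
    "SELECT capital FROM missing_table",                # fails to execute
]
kept = prune_empty_results(beam, connection)
```

Because the beam order is preserved, the highest-scoring surviving hypothesis remains first in the pruned list.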

Conclusion
We present TRANX, a transition-based abstract syntax parser. TRANX is generalizable, extensible and effective, achieving strong results on semantic parsing and code generation tasks.