Neural Enquirer: Learning to Query Tables in Natural Language

We propose NEURAL ENQUIRER, a neural network architecture for answering natural language (NL) questions given a knowledge base (KB) table. Unlike previous work on end-to-end training of semantic parsers, NEURAL ENQUIRER is fully "neuralized": it builds distributed representations of queries and KB tables, and executes queries through a series of differentiable operations. The model can be trained with gradient descent using both end-to-end and step-by-step supervision. During training, the representations of queries and the KB table are jointly optimized with the query execution logic. Our experiments show that the model can learn to execute complex NL queries on KB tables with rich structures.


Introduction
Natural language dialogue and question answering often involve querying a knowledge base (Wen et al., 2015; Berant et al., 2013). The traditional approach involves two steps. First, a given query Q is semantically parsed into an "executable" representation, often expressed in a logical form Z (e.g., SQL-like queries). Second, this representation is executed against a KB, from which an answer is obtained. For queries that involve complex semantics and logic (e.g., "Which city hosted the longest Olympic Games before the Games in Beijing?"), semantic parsing and query execution become extremely complex. For example, carefully hand-crafted features and rules are needed to correctly parse a complex query into its logical form (see the example in the lower-left corner of Figure 1). To partially overcome this complexity, recent works (Clarke et al., 2010; Liang et al., 2011; Pasupat and Liang, 2015) attempt to "backpropagate" query execution results to revise the semantic representation of a query. This approach, however, is greatly hindered by the fact that traditional semantic parsing mostly relies on rule-based features and symbolic manipulation, and suffers from an intractable search space caused by the great flexibility of natural language.
In this paper we propose NEURAL ENQUIRER, a neural network system that learns to understand NL queries and execute them on a KB table from examples of queries and answers. Unlike similar efforts along this line of research (Neelakantan et al., 2015), NEURAL ENQUIRER is a fully neuralized, end-to-end differentiable network that jointly models semantic parsing and query execution. It encodes queries and KB tables into distributed representations, and executes compositional queries against the KB through a series of differentiable operations. The model is trained using query-answer pairs, where the distributed representations of queries and the KB are optimized together with the query execution logic in an end-to-end fashion. We demonstrate on a synthetic QA task that NEURAL ENQUIRER is capable of learning to execute complex compositional NL questions.

Model
Given an NL query Q and a KB table T, NEURAL ENQUIRER executes Q against T and outputs the answer. Figure 1 gives an illustrative example. The model consists of the following components:

Query Encoder
Query Encoder abstracts the semantics of an NL query Q and encodes it into a query embedding $q \in \mathbb{R}^{d_Q}$. Let $\{x_1, x_2, \ldots, x_T\}$ be the embeddings of the words in Q, where $x_t \in \mathbb{R}^{d_W}$ is taken from an embedding matrix L. We employ a bidirectional Gated Recurrent Unit (GRU) (Bahdanau et al., 2015) to summarize the sequence of word embeddings in forward and reverse orders. q is formed by concatenating the last hidden states in the two directions.
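As a rough illustration of the encoder just described, the sketch below runs a GRU over the word embeddings in both directions and concatenates the two final states into q. It is a minimal, untrained pure-Python sketch under stated assumptions: the gate equations follow the common GRU formulation, biases are omitted, and the names `GRU` and `encode_query` as well as the toy sizes are hypothetical, not taken from the paper.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRU:
    """Minimal GRU with random, untrained weights (biases omitted for brevity)."""

    def __init__(self, d_in, d_hid, rng):
        self.d_hid = d_hid
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r and the candidate state h.
        self.W = {g: rand_matrix(d_hid, d_in, rng) for g in "zrh"}
        self.U = {g: rand_matrix(d_hid, d_hid, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [r_i * h_i for r_i, h_i in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # New state: a gated interpolation between the previous state and the candidate.
        return [(1 - z_i) * h_i + z_i * c_i for z_i, h_i, c_i in zip(z, h, cand)]

    def last_state(self, xs):
        h = [0.0] * self.d_hid
        for x in xs:
            h = self.step(x, h)
        return h


def encode_query(word_vecs, fwd, bwd):
    """q = [last forward state; last backward state], so d_Q = 2 * d_hid."""
    return fwd.last_state(word_vecs) + bwd.last_state(list(reversed(word_vecs)))


rng = random.Random(0)
d_W, d_hid = 4, 3  # toy sizes, not the paper's settings
fwd, bwd = GRU(d_W, d_hid, rng), GRU(d_W, d_hid, rng)
# Stand-ins for the embeddings x_1..x_5 of a five-word query (rows of L).
words = [[rng.uniform(-1.0, 1.0) for _ in range(d_W)] for _ in range(5)]
q = encode_query(words, fwd, bwd)
print(len(q))  # 6
```

Both final states have seen every word of the query, one reading left to right and the other right to left, which is why their concatenation serves as a summary of the whole sequence.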
We remark that Query Encoder can encode a rather general class of symbol sequences, agnostic to the actual representation of the query (e.g., natural language, SQL, etc.). The model learns the semantics of input queries through end-to-end training, making it a generic model for query understanding and query execution.

Table Encoder
Table Encoder converts a KB table T into a distributed representation, which is used as an input to the executors. Suppose T has M rows and N columns. In our model, the n-th column is associated with a field name (e.g., host city). Each cell value is a word (e.g., Beijing) in the vocabulary. We use $w_{mn}$ to denote the cell value in row m, column n, and $\mathbf{w}_{mn}$ to denote its embedding. Let $f_n$ be the embedding of the field name for column n. For each entry (cell) $w_{mn}$, Table Encoder computes a ⟨field, value⟩ composite embedding $e_{mn} \in \mathbb{R}^{d_E}$ by fusing $f_n$ and $\mathbf{w}_{mn}$ through a non-linear transformation.
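The text does not spell out the form of this non-linear transformation here. The sketch below assumes one common choice, a single tanh layer over the concatenation of the field and value embeddings, i.e. e_mn = tanh(W[f_n; w_mn] + b); the parameters `W` and `b`, the helper names and the toy sizes are illustrative assumptions.

```python
import math
import random


def composite_embedding(f_n, w_mn, W, b):
    """Assumed form: e_mn = tanh(W [f_n; w_mn] + b)."""
    x = f_n + w_mn  # list concatenation plays the role of [f_n; w_mn]
    return [math.tanh(sum(W_ij * x_j for W_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def encode_table(field_embs, cell_embs, W, b):
    """Map an M x N grid of cell embeddings to an M x N grid of e_mn vectors."""
    return [[composite_embedding(field_embs[n], row[n], W, b) for n in range(len(field_embs))]
            for row in cell_embs]


rng = random.Random(1)
M, N, d_W, d_E = 2, 3, 4, 5  # toy sizes


def vec(d):
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]


field_embs = [vec(d_W) for _ in range(N)]                     # f_1..f_N
cell_embs = [[vec(d_W) for _ in range(N)] for _ in range(M)]  # embeddings of w_mn
W = [vec(2 * d_W) for _ in range(d_E)]                        # d_E x (2 * d_W)
b = [0.0] * d_E
E = encode_table(field_embs, cell_embs, W, b)
print(len(E), len(E[0]), len(E[0][0]))  # 2 3 5
```

Because f_n enters every entry of column n, the same cell value (for example, a number) receives a different composite embedding depending on the field it appears in.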

Executor
NEURAL ENQUIRER executes an input query on a KB table through layers of execution. Each layer consists of an executor that, after learning, performs a certain operation (e.g., select, max) relevant to the input query. An executor outputs intermediate execution results, referred to as annotations, which are saved in the external memory of the executor. A query is executed sequentially through a stack of executors. Such a cascaded architecture enables the model to answer complex, compositional queries. Figure 1 gives an example, describing the operation each executor is assumed to perform for the query Q. We will demonstrate in Section 4 that the model is capable of learning the operation logic of each executor via end-to-end training.
As illustrated in Figure 2, an executor at Layer-ℓ (denoted as Executor-ℓ) consists of two major neural network components: a Reader and an Annotator. The executor processes a table row-by-row.

Reader
As illustrated in Figure 3, for the m-th row with N ⟨field, value⟩ composite embeddings $R_m = \{e_{m1}, e_{m2}, \ldots, e_{mN}\}$, the Reader fetches a read vector $r_m^{\ell}$ from $R_m$ via an attentive reading operation:
$$ r_m^{\ell} = f_R^{\ell}(R_m, q, M^{\ell-1}) = \sum_{n=1}^{N} \tilde{\omega}(f_n, q, g^{\ell-1})\, e_{mn} $$
where $M^{\ell-1}$ denotes the content of memory Layer-(ℓ−1), and $\tilde{\omega}(\cdot)$ is the normalized attention weight given by:
$$ \tilde{\omega}(f_n, q, g^{\ell-1}) = \frac{\exp\left(\omega(f_n, q, g^{\ell-1})\right)}{\sum_{n'=1}^{N} \exp\left(\omega(f_{n'}, q, g^{\ell-1})\right)} \tag{1} $$
where $\omega(\cdot)$ is modeled as a Deep Neural Network (denoted as $\mathrm{DNN}_1^{(\ell)}$). Since each executor models a specific type of computation, it should only attend to the subset of entries that are pertinent to its execution; this is what the Reader models. Our approach is related to the content-based addressing of Neural Turing Machines (Graves et al., 2014) and the attention mechanism in neural machine translation models (Bahdanau et al., 2015).
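A minimal sketch of the attentive read follows, assuming softmax normalization of the field scores. The paper models ω(·) with a DNN; to keep the example short, the hypothetical `score_field` replaces it with a single linear layer over [f_n; q; g^{ℓ−1}], which is an assumption, as are all function names. The weights depend only on the field, the query and the previous table annotation, so every row is read with the same field weights.

```python
import math


def softmax(scores):
    mx = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_field(f_n, q, g_prev, v):
    """Stand-in for omega(f_n, q, g^{l-1}): a linear score over [f_n; q; g_prev]."""
    x = f_n + q + g_prev
    return sum(v_i * x_i for v_i, x_i in zip(v, x))


def read_row(row_embs, field_embs, q, g_prev, v):
    """r_m = sum_n w_n * e_mn with w = softmax_n(omega(f_n, q, g_prev))."""
    weights = softmax([score_field(f, q, g_prev, v) for f in field_embs])
    d_E = len(row_embs[0])
    r = [sum(w * e[k] for w, e in zip(weights, row_embs)) for k in range(d_E)]
    return r, weights


# Two fields; the toy scorer strongly prefers the first one.
field_embs = [[1.0, 0.0], [0.0, 1.0]]
row_embs = [[1.0, 2.0], [3.0, 4.0]]  # e_m1, e_m2
r, weights = read_row(row_embs, field_embs, q=[1.0], g_prev=[0.0], v=[5.0, 0.0, 0.0, 0.0])
print([round(w, 4) for w in weights])  # [0.9933, 0.0067]
```

With the weight concentrated on the first field, the read vector is close to e_m1; this is the "soft select" of a single column that the executors learn.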

Annotator
The Annotator of Executor-ℓ computes row and table annotations based on the read vectors fetched by the Reader. The results are stored in the ℓ-th memory layer $M^{\ell}$, which is accessible to Executor-(ℓ+1). The only exception is the last executor, which outputs the final answer.
[Row annotations] Capturing the row-wise execution result, the annotation $a_m^{\ell}$ for row m in Executor-ℓ is given by
$$ a_m^{\ell} = \mathrm{DNN}_2^{(\ell)}\left([r_m^{\ell}; q; a_m^{\ell-1}; g^{\ell-1}]\right) $$
where $[\cdot;\cdot]$ denotes vector concatenation. $\mathrm{DNN}_2^{(\ell)}$ fuses the corresponding read vector $r_m^{\ell}$, the results saved in the previous memory layer (row and table annotations $a_m^{\ell-1}$ and $g^{\ell-1}$), and the query embedding q. Specifically,
• row annotation $a_m^{\ell-1}$ represents the local status of execution before Layer-ℓ;
• table annotation $g^{\ell-1}$ summarizes the global status of execution before Layer-ℓ;
• read vector $r_m^{\ell}$ stores the value of attentive reading;
• query embedding q encodes the overall execution agenda.
All the above values are combined through $\mathrm{DNN}_2^{(\ell)}$ to form the annotation of row m in the current layer.
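The row annotation is an ordinary feed-forward network applied to a concatenated vector. The sketch below uses three layers (matching the layer count reported for this DNN in the experiments) but assumes tanh activations, toy widths and random weights, none of which are specified here; the helper names are hypothetical. It only illustrates how r_m^ℓ, q, a_m^{ℓ−1} and g^{ℓ−1} are fused into a_m^ℓ.

```python
import math
import random


def dense(x, W, b):
    """One fully connected layer followed by tanh (the activation is an assumption)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def init_layers(sizes, rng):
    """Random (W, b) pairs for consecutive layer sizes, e.g. [20, 8, 8, 5]."""
    return [([[rng.uniform(-0.5, 0.5) for _ in range(d_in)] for _ in range(d_out)], [0.0] * d_out)
            for d_in, d_out in zip(sizes, sizes[1:])]


def row_annotation(r_m, q, a_prev, g_prev, layers):
    """a_m^l = DNN_2([r_m^l; q; a_m^{l-1}; g^{l-1}])."""
    x = r_m + q + a_prev + g_prev
    for W, b in layers:
        x = dense(x, W, b)
    return x


rng = random.Random(2)
r_m, q = [0.1] * 4, [0.2] * 6            # toy read vector and query embedding
a_prev, g_prev = [0.0] * 5, [0.0] * 5    # previous row / table annotations
layers = init_layers([4 + 6 + 5 + 5, 8, 8, 5], rng)  # three layers, output size 5
a_m = row_annotation(r_m, q, a_prev, g_prev, layers)
print(len(a_m))  # 5
```

Feeding both the local state a_m^{ℓ−1} and the global state g^{ℓ−1} lets a row's new annotation depend on what happened to the other rows, which operations such as argmax require.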
[Table annotations] Capturing the global execution state, a table annotation summarizes all row annotations via a global max pooling operation:
$$ g_k^{\ell} = \max\left(\{a_1^{\ell}(k), a_2^{\ell}(k), \ldots, a_M^{\ell}(k)\}\right) $$
where $g_k^{\ell}$, the k-th element of $g^{\ell}$, is the maximum value among the k-th elements of all row annotations.
Table 1: Example queries for each query type, with annotated SQL-like logical form templates. Example queries: "Which country hosted the longest game before the game in Athens?", "How big is the country which hosted the shortest game?", "How many people watched the earliest game that lasts for more days than the game in 1956?"
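The global max pooling that produces the table annotation is simple enough to show directly. In this small sketch (the function name is hypothetical), each component of g^ℓ is the maximum of the corresponding components of the row annotations.

```python
def table_annotation(row_annotations):
    """g^l: element-wise max over all row annotations a_1^l, ..., a_M^l."""
    return [max(column) for column in zip(*row_annotations)]


rows = [[0.1, -0.3, 0.7],
        [0.5, -0.9, 0.2],
        [-0.2, 0.4, 0.6]]
g = table_annotation(rows)
print(g)  # [0.5, 0.4, 0.7]
```

Because max is order-independent, g^ℓ does not depend on the order of the table's rows.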

Last Layer Executor
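Instead of computing annotations based on read vectors, the last executor in NEURAL ENQUIRER directly outputs the probability of an entry $w_{mn}$ in table T being the answer a:
$$ p(a = w_{mn} \mid Q, T) = \frac{\exp\left(f_{\mathrm{ANS}}^{\ell}(e_{mn}, q, a_m^{\ell-1}, g^{\ell-1})\right)}{\sum_{m'=1}^{M} \sum_{n'=1}^{N} \exp\left(f_{\mathrm{ANS}}^{\ell}(e_{m'n'}, q, a_{m'}^{\ell-1}, g^{\ell-1})\right)} \tag{2} $$
where $f_{\mathrm{ANS}}(\cdot)$ is modeled as a DNN ($\mathrm{DNN}_3^{(\ell)}$). Note that the last executor, which is devoted to returning answers, can still carry out execution in $\mathrm{DNN}_3^{(\ell)}$.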
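The answer probability is normalized over all M × N table entries rather than over a fixed answer vocabulary. The sketch below assumes a softmax normalization and a linear scorer `v` standing in for the DNN f_ANS, with the entry embedding e_mn, q, and the previous row and table annotations as inputs (an assumption); the function name and toy values are hypothetical.

```python
import math


def answer_distribution(E, q, row_ann, g_prev, v):
    """p(a = w_mn | Q, T) as a softmax over all M*N entries (linear scorer assumed)."""
    scores = {}
    for m, row in enumerate(E):
        for n, e_mn in enumerate(row):
            x = e_mn + q + row_ann[m] + g_prev
            scores[(m, n)] = sum(v_i * x_i for v_i, x_i in zip(v, x))
    mx = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - mx) for k, s in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


# A 2 x 2 table with 1-dimensional entry embeddings; the scorer looks only at e_mn.
E = [[[0.0], [1.0]],
     [[2.0], [0.5]]]
p = answer_distribution(E, q=[0.0], row_ann=[[0.0], [0.0]], g_prev=[0.0], v=[1.0, 0.0, 0.0, 0.0])
best = max(p, key=p.get)
print(best)  # (1, 0)
```

Returning a distribution over table cells means the predicted answer is always one of the entries w_mn, which is how answers are read off the table.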

Learning
NEURAL ENQUIRER can be trained in an end-to-end (N2N) fashion. Given a set of $N_D$ query-table-answer triples $\mathcal{D} = \{(Q^{(i)}, T^{(i)}, y^{(i)})\}$, the model is optimized by maximizing the log-likelihood of gold-standard answers:
$$ \mathcal{L}_{\mathrm{N2N}}(\mathcal{D}) = \sum_{i=1}^{N_D} \log p\left(a = y^{(i)} \mid Q^{(i)}, T^{(i)}\right) \tag{3} $$
The training can also be carried out with stronger guidance, i.e., step-by-step (SbS) supervision, by softly guiding the learning process via controlling the attention weights $\tilde{\omega}(\cdot)$ in Eq. (1). As an example, for Executor-1 in Figure 1, by biasing the attention weight of the host city field towards 1.0, only the value of host city will be fetched and sent to the Annotator. In this way we can "force" the executor to learn the where operation to find the row whose host city is Beijing. Formally, this is done by introducing an additional supervision signal into Eq. (3):
$$ \mathcal{L}_{\mathrm{SbS}}(\mathcal{D}) = \sum_{i=1}^{N_D} \left[ \log p\left(a = y^{(i)} \mid Q^{(i)}, T^{(i)}\right) + \alpha \sum_{\ell=1}^{L-1} \log \tilde{\omega}\left(f_{i,\ell}, q^{(i)}, g^{\ell-1}\right) \right] \tag{4} $$
where α is a tuning weight, and L is the number of executors. $f_{i,\ell}$ is the embedding of the field known a priori to be used by Executor-ℓ in answering the i-th example.
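Once the model's probabilities are known, both training objectives reduce to simple sums. The sketch below takes, for each example, the probability assigned to the gold answer and the attention weight each intermediate executor puts on its a-priori field, and evaluates the N2N log-likelihood and the SbS objective with α = 0.2 (the value used in the experiments). It assumes the SbS term is α times the summed log attention weights of Executor-1 to Executor-(L−1); the function names and numbers are hypothetical.

```python
import math


def n2n_objective(gold_probs):
    """Sum over examples of log p(a = y^(i) | Q^(i), T^(i)); training maximizes it."""
    return sum(math.log(p) for p in gold_probs)


def sbs_objective(gold_probs, gold_field_weights, alpha=0.2):
    """N2N term plus alpha * sum_l log(attention on the a-priori field f_{i,l}).

    gold_field_weights[i][l] is the Reader weight that Executor-(l+1) assigns
    to the field known to be relevant for example i (intermediate executors only).
    """
    total = 0.0
    for p, weights in zip(gold_probs, gold_field_weights):
        total += math.log(p) + alpha * sum(math.log(w) for w in weights)
    return total


gold_probs = [0.9, 0.5]                 # p(gold answer) for two examples
field_weights = [[0.8, 0.9, 1.0, 0.7],  # L = 5 executors -> 4 intermediate ones
                 [0.5, 0.5, 0.5, 0.5]]
print(round(n2n_objective(gold_probs), 4))                                   # -0.7985
print(sbs_objective(gold_probs, field_weights) < n2n_objective(gold_probs))  # True
```

The extra SbS term is zero only when every intermediate executor puts all of its attention on the prescribed field, so maximizing it pushes the attention weights towards 1.0 on those fields.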

Experiments
In this section we evaluate NEURAL ENQUIRER on synthetic QA tasks with NL queries of varying compositional depths.

Synthetic QA Task
We present a synthetic QA task with a large number of QA examples at various levels of complexity to evaluate the performance of NEURAL ENQUIRER. Starting with "artificial" tasks accelerates the development of novel deep models (Weston et al., 2015), and has gained increasing popularity in recent research on modeling symbolic computation using DNNs (Graves et al., 2014; Zaremba and Sutskever, 2014).
Our synthetic dataset consists of query-table-answer triples $\{(Q^{(i)}, T^{(i)}, y^{(i)})\}$. To generate a triple, we first randomly sample a table $T^{(i)}$ of size 10×10 from a synthetic schema of Olympic Games. The cell values of $T^{(i)}$ are drawn from a vocabulary of 120 location names and 120 numbers. Figure 4 gives an example table. Next, we sample a query $Q^{(i)}$ generated using NL templates, and obtain its gold-standard answer $y^{(i)}$ on $T^{(i)}$. Our task consists of four types of NL queries, with examples given in Table 1. We also give the logical form template for each query type. Note that (1) the same field may be described by different NL phrases (e.g., "How big is the country ..." and "What is the size of the country ..." for the country size field); (2) different fields may be referred to by the same NL pattern (e.g., "in China" for host country and "in Beijing" for host city); and (3) simple NL constituents may be grounded to complex logical operations (e.g., "after the game in Beijing" implies comparing the values of year fields).
To simulate the real-world scenario where queries of various types are issued to the model, we construct two MIXED datasets, with 25K and 100K training examples respectively, in which the four types of queries are sampled with the ratio 1 : 1 : 1 : 2. Both datasets share the same testing set of 20K examples, 5K for each type of query. We ensure that no tables or queries are shared between the training and testing sets.

Setup
[Tuning] We adopt a model with five executors. The lengths of hidden states for the GRU and the DNNs are 150 and 50, respectively. The numbers of layers for $\mathrm{DNN}_1^{(\ell)}$, $\mathrm{DNN}_2^{(\ell)}$ and $\mathrm{DNN}_3^{(\ell)}$ are 2, 3 and 3. The length of word embeddings and annotations is 20. α is 0.2. We train the model using ADADELTA (Zeiler, 2012) on a Tesla K40 GPU. Training converges fast, within 2 hours.
[Metric] We evaluate in terms of accuracy, defined as the fraction of correctly answered queries.
[Models] We compare the results of the following settings:
• SEMPRE (Pasupat and Liang, 2015) is a state-of-the-art semantic parser and serves as the baseline;
• N2N, the NEURAL ENQUIRER model trained in the end-to-end setting (Sec 4.3);
• SbS, the NEURAL ENQUIRER model trained in the step-by-step setting (Sec 4.4).

End-to-End Evaluation
Table 2 summarizes the results of SEMPRE and NEURAL ENQUIRER under different settings. We show both the individual performance for each query type and the overall accuracy. We evaluate SEMPRE only on MIXED-25K because of its long training time even on this small dataset (about 3 days).
In this section we discuss the results under the end-to-end (N2N) training setting. On MIXED-25K, the relatively low performance of SEMPRE indicates that our QA task, although synthetic, is highly nontrivial. Surprisingly, our model outperforms SEMPRE on all types of queries, with a marginal gain on simple queries (SELECT WHERE, SUPERLATIVE) and a significant improvement on complex queries (WHERE SUPERLATIVE, NEST). On MIXED-100K, our model achieves an overall accuracy of 90.6%. These results show that, on our QA task, NEURAL ENQUIRER is very effective at answering compositional NL queries, especially those with complex semantic structures, compared with the state-of-the-art system.
To further understand why our model is capable of answering compositional queries, we study the attention weights $\tilde{\omega}(\cdot)$ of the Readers (Eq. 1) for executors in intermediate layers, and the answer probability (Eq. 2) that the last executor outputs for each entry in the table. These statistics are obtained on MIXED-100K. We sample two queries (Q1 and Q2) from the testing set that our model answers correctly and visualize their corresponding values in Figure 5. To better understand the query execution process, we also give the logical forms (Z1 and Z2) of the two queries, for example:
Q1: How long was the game with the most medals that had fewer than 3,000 participants?
Z1: where # participants < 3,000, argmax(# duration, # medals)
Note that the logical forms are for reference only and are unknown to the model. We find that each executor actually learns its execution logic from just the correct answers in N2N training, which is in accordance with our assumption. The model executes Q1 in three steps, with each of the last three executors performing a specific type of operation. For each row, Executor-3 takes the value of the # participants field as input, while Executor-4 attends to the # medals field. Finally, Executor-5 outputs a high probability for the # duration field in the third row. The attention weights for Executor-1 and Executor-2 appear to be meaningless because Q1 requires only three steps of execution, and the model learns to defer the meaningful execution to the last three executors. Comparing with the logical form Z1 of Q1, we can deduce that Executor-3 "executes" the where clause in Z1 to find the set of rows R satisfying the condition, Executor-4 performs the first part of argmax to find the row r ∈ R with the maximum value of # medals, and Executor-5 outputs the value of # duration in row r.
Compared with Q1, Q2 is more complicated. According to Z2, Q2 involves an additional nested sub-query to be solved by two extra executors, and requires a total of five steps of execution. The last three executors function similarly to those in answering Q1, yet the execution logic of the first two executors (devoted to solving the sub-query) is somewhat obscure, since their attention weights are scattered instead of being centered on the ideal fields, which are highlighted by red dashed rectangles in Figure 5. We posit that this is because, during end-to-end training, the supervision signal propagated from the top layer decays along the long path down to the first two executors, i.e., the gradients vanish.

With Additional Step-by-Step Supervision
To alleviate the vanishing gradient problem when training on complex queries like Q2, we train the model in the step-by-step (SbS) setting (Eq. 4), where we encourage each intermediate executor to attend to the field that is known a priori to be relevant to its execution logic. Results are shown in Table 2 (column SbS). With the stronger supervision signal, the model significantly outperforms the N2N setting and achieves perfect accuracy on MIXED-100K. This shows that NEURAL ENQUIRER is capable of leveraging the additional supervision signal given to intermediate layers in SbS training. Revisiting Q2 in the SbS setting: in contrast to the N2N setting (Figure 5), where the attention weights of the first two executors are obscure, the weights are now perfectly skewed towards each relevant field with a value of 1.0, matching the highlighted ideal weights.

Conclusion
We propose NEURAL ENQUIRER, a fully neural, end-to-end differentiable network that learns to execute compositional natural language queries on knowledge-base tables.

Figure 1: An overview of NEURAL ENQUIRER with five executors

Figure 3: Illustration of the Reader in Executor-ℓ.


Figure 1 (lower-left corner, example logical form): where year < (select year, where host_city = Beijing), argmax(host_city, #_duration)
Figure 4: An example table in the synthetic QA task (only one row shown).
Table 1 (excerpt): query type 1, SELECT WHERE, template [select Fa, where Fb = wb]; query type 3, WHERE SUPERLATIVE, template [where Fa >|< wa, argmax/min(Fb, Fc)]. Example queries: "How many people participated in the game in Beijing?", "How long was the game with the most medals that had fewer than 3,000 participants?", "In which city was the game hosted in 2012?", "How many medals were in the first game after 2008?"

Table 2
Figure 5: Weights visualization of queries Q1 and Q2. Q2: Which country hosted the longest game before the game in Athens? Z2: where year < (select year, where host city = Athens), argmax(host country, # duration)