Sentence Rewriting for Semantic Parsing

A major challenge of semantic parsing is the vocabulary mismatch problem between natural language and the target ontology. In this paper, we propose a sentence rewriting based semantic parsing method, which can effectively resolve the mismatch problem by rewriting a sentence into a new form that has the same structure as its target logical form. Specifically, we propose two sentence-rewriting methods for two common types of mismatch: a dictionary-based method for 1-N mismatch and a template-based method for N-1 mismatch. We evaluate our sentence rewriting based semantic parser on the benchmark semantic parsing dataset WEBQUESTIONS. Experimental results show that our system outperforms the base system with a 3.4% gain in F1, generates logical forms more accurately, and parses sentences more robustly.


Introduction
Semantic parsing is the task of mapping natural language sentences into logical forms which can be executed on a knowledge base (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Kate and Mooney, 2006; Wong and Mooney, 2007; Lu et al., 2008; Kwiatkowski et al., 2010). Figure 1 shows an example of semantic parsing. Semantic parsing is a fundamental technique of natural language understanding, and has been used in many applications, such as question answering (Liang et al., 2011; He et al., 2014; Zhang et al., 2016) and information extraction (Krishnamurthy and Mitchell, 2012; Choi et al., 2015; Parikh et al., 2015).
Semantic parsing, however, is a challenging task. Due to the variety of natural language expressions, the same meaning can be expressed using different sentences. Furthermore, because logical forms depend on the vocabulary of the target ontology, a sentence will be parsed into different logical forms when using different ontologies. For example, the two sentences s1 and s2 below express the same meaning, and both can be parsed into the two different logical forms lf1 and lf2 using different ontologies.
s1: What is the population of Berlin?
s2: How many people live in Berlin?
lf1: λx.population(Berlin,x)
lf2: count(λx.person(x)∧live(x,Berlin))
Based on the above observations, one major challenge of semantic parsing is the structural mismatch between a natural language sentence and its target logical form, which is mainly caused by the vocabulary mismatch between natural language and ontologies. Intuitively, if a sentence has the same structure as its target logical form, it is easy to get the correct parse, e.g., a semantic parser can easily parse s1 into lf1 and s2 into lf2. On the contrary, it is difficult to parse a sentence into its logical form when they have different structures, e.g., s1 → lf2 or s2 → lf1.
To resolve the vocabulary mismatch problem, this paper proposes a sentence rewriting approach for semantic parsing, which rewrites a sentence into a form that has the same structure as its target logical form. Table 1 gives an example of our rewriting-based semantic parsing method. In this example, instead of parsing the sentence "What is the name of Sonia Gandhi's daughter?" directly into its structurally different logical form childOf.S.G.∧gender.female, our method first rewrites the sentence into the form "What is the name of Sonia Gandhi's female child?", which has the same structure as its logical form, and then obtains the logical form by parsing this new form. In this way, the semantic parser can get the correct parse more easily. For example, the parse obtained through the traditional method results in the wrong answer "Rahul Gandhi", because it cannot identify the vocabulary mismatch between "daughter" and child∧female. By contrast, by rewriting "daughter" into "female child", our method can resolve this vocabulary mismatch.
(a) An example using the traditional method
  s0: What is the name of Sonia Gandhi's daughter?
  l0: λx.child(S.G.,x)
  r0: {Rahul Gandhi (wrong answer), Priyanka Vadra}
(b) An example using our method
  s0: What is the name of Sonia Gandhi's daughter?
  s1: What is the name of Sonia Gandhi's female child?
  l1: λx.child(S.G.,x)∧gender(x,female)
  r1: {Priyanka Vadra}
Table 1: Examples of (a) a sentence s0, a possible logical form l0 from a traditional semantic parser, and the result r0 for the logical form l0; (b) a possible sentence s1 obtained by rewriting the original sentence s0, a possible logical form l1 for sentence s1, and the result r1 for l1. Rahul Gandhi is a wrong answer, as he is the son of Sonia Gandhi.
Specifically, we identify two common types of vocabulary mismatch in semantic parsing:
1. 1-N mismatch: a simple word may correspond to a compound formula. For example, the word "daughter" may correspond to the compound formula child∧female.
2. N-1 mismatch: a logical constant may correspond to a complicated natural language expression, e.g., the formula population can be expressed using many phrases such as "how many people" and "live in".
To resolve the above two vocabulary mismatch problems, this paper proposes two sentence rewriting algorithms: One is a dictionary-based sentence rewriting algorithm, which can resolve the 1-N mismatch problem by rewriting a word using its explanation in a dictionary. The other is a template-based sentence rewriting algorithm, which can resolve the N-1 mismatch problem by rewriting complicated expressions using paraphrase template pairs.
Given the generated rewritings of a sentence, we propose a ranking function to jointly choose the optimal rewriting and the correct logical form, by taking both the rewriting features and the semantic parsing features into consideration.
We conduct experiments on the benchmark WEBQUESTIONS dataset (Berant et al., 2013). Experimental results show that our method can effectively resolve the vocabulary mismatch problem and achieve accurate and robust performance.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes our sentence rewriting method for semantic parsing. Section 4 presents the scoring function which jointly ranks rewritings and logical forms. Section 5 discusses experimental results. Section 6 concludes this paper.

Related Work
One major challenge of semantic parsing is how to scale to open-domain settings such as Freebase and the Web. A possible solution is to learn lexicons from a large amount of web text and a knowledge base using a distantly supervised method (Krishnamurthy and Mitchell, 2012; Cai and Yates, 2013a; Berant et al., 2013). Another challenge is how to alleviate the burden of annotation. A possible solution is to employ distantly supervised techniques (Clarke et al., 2010; Liang et al., 2011; Cai and Yates, 2013b), or unsupervised techniques (Poon and Domingos, 2009; Goldwasser et al., 2011; Poon, 2013).
Several approaches have also focused on the mismatch problem. Kwiatkowski et al. (2013) addressed the ontology mismatch problem (i.e., two ontologies using different vocabularies) by first parsing a sentence into a domain-independent underspecified logical form, and then using an ontology matching model to transform this underspecified logical form to the target ontology. However, their method still has difficulty dealing with the 1-N and the N-1 mismatch problems between natural language and target ontologies. Berant and Liang (2014) addressed the structure mismatch problem between natural language and ontology by generating a set of canonical utterances for each candidate logical form, and then using a paraphrasing model to rerank the candidate logical forms. Their method addresses the mismatch problem in the reranking stage, and therefore cannot resolve it when constructing candidate logical forms. Compared with these two methods, we approach the mismatch problem in the parsing stage: by rewriting sentences into forms which are structurally consistent with their target logical forms, we can greatly reduce the difficulty of constructing the correct logical form.
Sentence rewriting (or paraphrase generation) is the task of generating new sentences that have the same meaning as the original one. Sentence rewriting has been used in many different tasks, e.g., in statistical machine translation to resolve the word order mismatch problem (He et al., 2015). To the best of our knowledge, this paper is the first work to apply sentence rewriting to the vocabulary mismatch problem in semantic parsing.

Sentence Rewriting for Semantic Parsing
As discussed before, the vocabulary mismatch between natural language and the target ontology is a big challenge in semantic parsing. In this section, we describe our sentence rewriting algorithms for solving the mismatch problem. Specifically, we solve the 1-N mismatch problem by dictionary-based rewriting and the N-1 mismatch problem by template-based rewriting. The details are as follows.

Dictionary-based Rewriting
In the 1-N mismatch case, a word will correspond to a compound formula, e.g., the target logical form of the word "daughter" is child∧female (Table 2 has more examples).
To resolve the 1-N mismatch problem, we rewrite the original word ("daughter") into an expression ("female child") which has the same structure as its target logical form (child∧female). In this paper, we rewrite words using their explanations in a dictionary. This is because each word in a dictionary is defined by a detailed explanation using simple words, which often has the same structure as its target formula. Table 2 shows how the vocabulary mismatch between a word and its logical form can be resolved using its dictionary explanation. For instance, the word "daughter" is explained as "female child" in Wiktionary, which has the same structure as child∧female.
In most cases, only common nouns will result in the 1-N mismatch problem. Therefore, in order to control the number of rewritings, this paper only rewrites the common nouns in a sentence by replacing them with their dictionary explanations. Because a sentence usually does not contain too many common nouns, the number of candidate rewritings is controllable. Given the generated rewritings of a sentence, we propose a sentence selection model to choose the best rewriting using multiple features (see Section 4 for details). Table 3 shows an example of the dictionary-based rewriting. In Table 3, the example sentence s contains two common nouns ("name" and "daughter"), therefore we generate three rewritings r1, r2 and r3.
s:  What is the name of Sonia Gandhi's daughter?
r1: What is the reputation of Sonia Gandhi's daughter?
r2: What is the name of Sonia Gandhi's female child?
r3: What is the reputation of Sonia Gandhi's female child?
Table 3: An example of the dictionary-based sentence rewriting.
Among these rewritings, the candidate rewriting r2 is what we expect, as it has the same structure as the target logical form and does not bring extra noise (i.e., replacing "name" with its explanation "reputation").
For the dictionary used in rewriting, this paper uses Wiktionary. Specifically, given a word, we use its "Translations" part in Wiktionary as its explanation. Because most 1-N mismatches are caused by common nouns, we only collect the explanations of common nouns. Furthermore, for polysemous words which have several explanations, we only use their most common explanations. Besides, we ignore explanations whose length is greater than 5.
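The candidate generation step can be illustrated with a minimal sketch. It assumes a toy in-memory dictionary (`EXPLANATIONS`, hypothetical, seeded with the two entries from Table 3), that common-noun positions come from an external POS tagger, and that the length limit of 5 is counted in words. None of these names come from the paper.

```python
from itertools import combinations

# Hypothetical toy dictionary: common noun -> most common explanation.
# A real system would load these entries from Wiktionary.
EXPLANATIONS = {
    "name": "reputation",
    "daughter": "female child",
}
MAX_EXPLANATION_LEN = 5  # assumption: the limit is measured in words


def dictionary_rewritings(tokens, common_noun_positions, explanations=EXPLANATIONS):
    """Return every rewriting obtained by replacing a non-empty subset of
    the rewritable common nouns with their dictionary explanations."""
    rewritable = [
        i for i in common_noun_positions
        if tokens[i] in explanations
        and len(explanations[tokens[i]].split()) <= MAX_EXPLANATION_LEN
    ]
    rewritings = []
    for size in range(1, len(rewritable) + 1):
        for subset in combinations(rewritable, size):
            new_tokens = [
                explanations[tok] if i in subset else tok
                for i, tok in enumerate(tokens)
            ]
            rewritings.append(" ".join(new_tokens))
    return rewritings


tokens = "What is the name of Sonia Gandhi's daughter ?".split()
# Positions 3 ("name") and 7 ("daughter") are assumed to be tagged as common nouns.
for rewriting in dictionary_rewritings(tokens, common_noun_positions=[3, 7]):
    print(rewriting)
```

With two rewritable nouns the sketch yields three candidates, matching the count in Table 3. The number of candidates grows exponentially with the number of rewritable nouns, which is why restricting rewriting to common nouns keeps the candidate set small.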

Template-based Rewriting
In the N-1 mismatch case, a complicated natural language expression will be mapped to a single logical constant. For example, consider the following mapping from the natural language sentence s to its logical form lf based on the Freebase ontology:
s:  How many people live in Berlin?
lf: λx.population(Berlin,x)
where the three expressions "how many" (count), "people" (people) and "live in" (live) together map to the predicate population.
To resolve the N-1 mismatch problem, we propose a template rewriting algorithm, which can rewrite a complicated expression into its simpler form. Specifically, we rewrite sentences based on a set of paraphrase template pairs P = {(t_i1, t_i2) | i = 1, 2, ..., n}, where each template t is a sentence with an argument slot $y, and t_i1 and t_i2 are paraphrases. In this paper, we only consider single-slot templates. Table 5 shows several paraphrase template pairs.
Template 1                                | Template 2
How many people live in $y                | What is the population of $y
What money in $y is used                  | What is the currency of $y
What school did $y go to                  | What is the education of $y
What language does $y speak officially    | What is the official language of $y
Table 5: Several paraphrase template pairs.
Given the template pair database and a sentence, our template-based rewriting algorithm works as follows:
1. Firstly, we generate a set of candidate templates ST = {st_1, st_2, ..., st_n} of the sentence by replacing each named entity within it by "$y". For example, we will generate the template "How many people live in $y" from the sentence "How many people live in Berlin".
2. Secondly, using the paraphrase template pair database, we retrieve all possible rewriting template pairs (t1, t2) with t1 ∈ ST, e.g., we can retrieve the template pair ("How many people live in $y", "What is the population of $y") using the above ST.
3. Finally, we get the rewritings by replacing the argument slot "$y" in template t2 with the corresponding named entity. For example, we get a new candidate sentence "What is the population of Berlin" by replacing "$y" in t2 with Berlin. In this way we get the rewriting we expect, since this rewriting matches its target logical form population(Berlin).
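The three steps above can be sketched as follows. This is an illustrative sketch only: `TEMPLATE_PAIRS` is a hypothetical in-memory stand-in for the paraphrase template pair database (seeded with two rows of Table 5), named entities are assumed to be supplied by an external recognizer, and retrieval is simplified to exact string lookup.

```python
# Hypothetical paraphrase template database: t1 -> list of paraphrases t2.
TEMPLATE_PAIRS = {
    "How many people live in $y": ["What is the population of $y"],
    "What money in $y is used": ["What is the currency of $y"],
}


def template_rewritings(sentence, named_entities, template_pairs=TEMPLATE_PAIRS):
    """Rewrite a sentence through single-slot paraphrase template pairs."""
    rewritings = []
    for entity in named_entities:
        if entity not in sentence:
            continue
        # Step 1: build a candidate template by abstracting one named entity.
        candidate = sentence.replace(entity, "$y", 1)
        # Step 2: retrieve all template pairs (t1, t2) with t1 equal to the candidate.
        for t2 in template_pairs.get(candidate, []):
            # Step 3: fill the slot of t2 with the original entity.
            rewritings.append(t2.replace("$y", entity))
    return rewritings


print(template_rewritings("How many people live in Berlin", ["Berlin"]))
# prints ['What is the population of Berlin']
```

Exact lookup is the simplest possible retrieval. A looser match between the sentence and t1 would return more candidates, at the cost of more noise for the ranking step to filter out.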
To control the size and measure the quality of rewritings using a specific template pair, we also define several features and the similarity between template pairs (See Section 4 for details).
To build the paraphrase template pair database, we employ the method described in Fader et al. (2014) to automatically collect paraphrase template pairs. Specifically, we use the WikiAnswers paraphrase corpus (Fader et al., 2013), which contains 23 million question clusters, where all questions in the same cluster express the same meaning. Table 6 shows two paraphrase clusters from the WikiAnswers corpus. To build paraphrase template pairs, we first replace the shared noun words in each cluster with the placeholder "$y"; then every two templates in a cluster form a paraphrase template pair. To filter out noisy template pairs, we only retain salient paraphrase template pairs whose co-occurrence count is larger than 3.
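A rough sketch of this construction is given below. It assumes each cluster arrives as a list of questions together with its shared noun (identified beforehand, e.g. by a POS tagger). The function name `build_template_pairs` and the input format are illustrative assumptions, not part of the WikiAnswers corpus or of Fader et al.'s tooling.

```python
from collections import Counter
from itertools import permutations

MIN_COUNT = 3  # keep pairs whose co-occurrence count is larger than 3


def build_template_pairs(clusters, min_count=MIN_COUNT):
    """clusters: list of (questions, shared_noun) tuples.
    Returns {(t1, t2): count} for template pairs seen in more than
    min_count clusters."""
    counts = Counter()
    for questions, shared_noun in clusters:
        # Abstract the shared noun into the slot "$y".
        templates = {q.replace(shared_noun, "$y") for q in questions if shared_noun in q}
        # Every ordered pair of distinct templates in a cluster is a candidate.
        for t1, t2 in permutations(sorted(templates), 2):
            counts[(t1, t2)] += 1
    return {pair: c for pair, c in counts.items() if c > min_count}


cities = ["Berlin", "Paris", "Rome", "Madrid"]
clusters = [
    ([f"How many people live in {c}", f"What is the population of {c}"], c)
    for c in cities
]
print(build_template_pairs(clusters))
```

In this toy input the same template pair appears in four clusters, so it passes the threshold. With only three clusters it would be filtered out.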

Sentence Rewriting based Semantic Parsing
In this section we describe our sentence rewriting based semantic parsing system. Figure 2 presents the framework of our system. Given a sentence, we first rewrite it into a set of new sentences, then we generate candidate logical forms for each new sentence using a base semantic parser, and finally we score all logical forms using a scoring function and output the best logical form as the final result.
In the following, we first introduce the base semantic parser we use, then we describe the proposed scoring function.

Base Semantic Parser
In this paper, we produce logical forms for each sentence rewriting using an agenda-based semantic parser (Berant and Liang, 2015), which is based on the lambda-DCS proposed by Liang (2013). For parsing, we use the lexicons and the grammars released by Berant et al. (2013), where lexicons are used to trigger unary and binary predicates, and grammars are used to construct logical forms. The only difference is that we also use the composition rule so that the parser can handle complicated questions involving two binary predicates, e.g., child.obama∧gender.female.
Figure 2: The framework of our sentence rewriting based semantic parsing.
For model learning and sentence parsing, the base semantic parser learns a scoring function by modeling the policy as a log-linear distribution over (partial) agenda derivations Q, and updates the policy parameters as in Berant and Liang (2015). In that update, the reward function R(h) measures the compatibility of the resulting derivation, and η is the learning rate, which is set using the AdaGrad algorithm (Duchi et al., 2011). The target history h_target is generated from the root derivation d* with the highest reward out of the K (beam size) root derivations, using local reweighting and history compression.

Scoring Function
To select the best semantic parse, we propose a scoring function which takes both sentence rewriting features and semantic parsing features into consideration. Given a sentence x, a generated rewriting x′ and the derivation d of x′, we score them using the following function:
\[ \mathrm{score}(x, x', d) = \theta_1 \cdot \phi(x, x') + \theta_2 \cdot \phi(x', d) \]
This scoring function is decomposed into two parts: one for sentence rewriting, θ1 · φ(x, x′), and the other for semantic parsing, θ2 · φ(x′, d). Following Berant and Liang (2015), we update the parameters θ2 of the semantic parsing features as in the base parser.
Definitions used in the learning algorithm (Table 7): the function REWRITING(x_i) returns a set of candidate sentences by applying sentence rewriting on sentence x_i; PARSE(p_θ, x) parses the sentence x based on the current parameters θ, using agenda-based parsing; CHOOSEORACLE(h_0) chooses the derivation with the highest reward from the root of h_0; CHOOSEORACLE(H_target) chooses the derivation with the highest reward from a set of derivations; CHOOSEORACLE(h*_target) chooses the new sentence that results in the derivation with the highest reward.
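As an illustration of how the two parts of the scoring function combine, the sketch below scores candidate (rewriting, derivation) pairs with sparse feature dictionaries and picks the best one. It assumes the two parts are simply added. The feature names and weights are invented for the example and are not learned values from the paper.

```python
def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def score(theta1, theta2, rewriting_features, parsing_features):
    """theta1 . phi(x, x') + theta2 . phi(x', d), with the sum assumed."""
    return dot(theta1, rewriting_features) + dot(theta2, parsing_features)


def best_candidate(theta1, theta2, candidates):
    """candidates: list of (rewriting, derivation, phi_rewrite, phi_parse)."""
    return max(candidates, key=lambda c: score(theta1, theta2, c[2], c[3]))


# Hypothetical weights and features, for illustration only.
theta1 = {"replace:daughter->female child": 1.0, "replace:name->reputation": -1.0}
theta2 = {"pred:child": 0.5, "pred:gender": 0.5}
candidates = [
    ("What is the name of Sonia Gandhi's daughter?", "λx.child(S.G.,x)",
     {}, {"pred:child": 1.0}),
    ("What is the name of Sonia Gandhi's female child?",
     "λx.child(S.G.,x)∧gender(x,female)",
     {"replace:daughter->female child": 1.0},
     {"pred:child": 1.0, "pred:gender": 1.0}),
]
rewriting, derivation, _, _ = best_candidate(theta1, theta2, candidates)
print(rewriting, "->", derivation)
```

Because rewriting and parsing features feed a single score, a rewriting that looks plausible on its own can still lose to one whose derivation is better supported by the parsing features.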

Parameter Learning Algorithm
To estimate the parameters θ1 and θ2, our learning algorithm uses a set of question-answer pairs (x_i, y_i). Following Berant and Liang (2015), our updates for θ1 and θ2 maximize neither the reward nor the log-likelihood. However, the reward provides a way to modulate the magnitude of the updates. Specifically, each update increases the score of the derivation which has the highest reward. Table 7 presents our learning algorithm.

Features
As described in Section 4.3, our model uses two kinds of features. The first kind is for the semantic parsing module; these are simply the same features described in Berant and Liang (2015). The second kind is for the sentence rewriting module; these features are defined over the original sentence, the generated sentence rewritings and the final derivations:
Features for dictionary-based rewriting. Given a sentence s0, when the new sentence s1 is generated by replacing a word with its explanation, w → ex, we generate four features: the first feature indicates the word replaced. The second feature indicates the replacement w → ex we used. The final two features are the POS tags of the left word and the right word of w in s0.
Features for template-based rewriting. Given a sentence s0, when the new sentence s1 is generated through a template-based rewriting t1 → t2, we generate four features: the first feature indicates the template pair (t1, t2) we used. The second feature is the similarity between the sentence s0 and the template t1, which is calculated using the word overlap between s0 and t1. The third feature is the compatibility of the template pair, which is the pointwise mutual information (PMI) between t1 and t2 in the WikiAnswers corpus. The final feature is triggered when the target logical form only contains an atomic formula (or predicate), and this feature indicates the mapping from template t2 to the predicate p.
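The two numeric template features can be made concrete with a short sketch. The paper does not specify the exact overlap measure, so Jaccard overlap over lowercased word sets is used here as one plausible choice. PMI is computed from raw counts using its standard definition, log(p(t1, t2) / (p(t1) p(t2))). The counts in the example are invented.

```python
import math


def word_overlap(sentence, template):
    """Jaccard overlap between word sets (an assumed overlap measure)."""
    s = set(sentence.lower().split())
    t = set(template.lower().split())
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def pmi(pair_count, t1_count, t2_count, total):
    """Pointwise mutual information of a template pair from raw counts."""
    p_pair = pair_count / total
    p_t1 = t1_count / total
    p_t2 = t2_count / total
    return math.log(p_pair / (p_t1 * p_t2))


print(word_overlap("How many people live in Berlin", "How many people live in $y"))
print(pmi(pair_count=4, t1_count=8, t2_count=8, total=64))
```

In the first call five of the seven distinct words are shared, giving an overlap of 5/7. In the second, the pair occurs four times as often as independence would predict, giving a PMI of log 4.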

Experiments
In this section, we assess our method and compare it with other methods.

Experimental Settings
Dataset: We evaluate all systems on the benchmark WEBQUESTIONS dataset (Berant et al., 2013), which contains 5,810 question-answer pairs. All questions are collected by crawling the Google Suggest API, and their answers are obtained using Amazon Mechanical Turk. This dataset covers several popular topics and its questions are commonly asked on the web. According to Yao (2015), 85% of questions can be answered by predicting a single binary relation. In our experiments, we use the standard train-test split (Berant et al., 2013), i.e., 3,778 questions (65%) for training and 2,032 questions (35%) for testing, and divide the training set into 3 random 80%-20% splits for development. Furthermore, to verify the effectiveness of our method on solving the vocabulary mismatch problem, we manually select 50 mismatch test examples from the WEBQUESTIONS dataset, where all sentences have structures different from those of their target logical forms, e.g., "Who is keyshia cole dad?" and "What countries have german as the official language?".
System Settings: In our experiments, we use the Freebase Search API for entity lookup. We load Freebase using Virtuoso, and execute logical forms by converting them to SPARQL and querying using Virtuoso. We learn the parameters of our system by making three passes over the training dataset, with the beam size K = 200, the dictionary rewriting size K_D = 100, and the template rewriting size K_T = 100.
Baselines: We compare our method with several traditional systems, including semantic parsing based systems (Berant et al., 2013; Berant and Liang, 2014; Berant and Liang, 2015; Yih et al., 2015), information extraction based systems (Yao and Van Durme, 2014; Yao, 2015), machine translation based systems (Bao et al., 2014), embedding based systems (Bordes et al., 2014; Yang et al., 2014), and a QA based system (Bast and Haussmann, 2015).
Evaluation: Following previous work (Berant et al., 2013), we evaluate different systems using the fraction of correctly answered questions. Because golden answers may have multiple values, we use the average F1 score as the main evaluation metric.

Experimental Results
Table 8 provides the performance of all baselines and our method. We can see that:
1. Our method achieves competitive performance: our system outperforms all baselines and gets the best F1-measure of 53.1 on the WEBQUESTIONS dataset.
2. Sentence rewriting is a promising technique for semantic parsing: By employing sentence rewriting, our system gains a 3.4% F1 improvement over the base system we used (Berant and Liang, 2015).
3. Compared to all baselines, our system gets the highest precision. This result indicates that our parser can generate more accurate logical forms by sentence rewriting. Our system also achieves the second highest recall, which is a competitive performance. Interestingly, both of the systems with the highest recall, including Bast and Haussmann (2015), rely on extra techniques such as entity linking and relation matching.
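For reference, the average F1 metric behind these comparisons can be sketched as a per-question F1 between the predicted and gold answer sets, averaged over all questions. The convention of scoring empty predictions as 0 is an assumption of this sketch, not a detail taken from the official evaluation script.

```python
def f1(predicted, gold):
    """F1 between a predicted answer set and a gold answer set."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0  # also covers empty predictions or empty gold sets
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_f1(predictions, golds):
    """Mean of the per-question F1 scores."""
    return sum(f1(p, g) for p, g in zip(predictions, golds)) / len(golds)


predictions = [{"Priyanka Vadra"}, {"Rahul Gandhi", "Priyanka Vadra"}]
golds = [{"Priyanka Vadra"}, {"Priyanka Vadra"}]
print(average_f1(predictions, golds))
```

The second prediction contains one wrong answer, so its precision is 0.5 and its F1 is 2/3. Over-generating answers is penalized even when the correct answer is included.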
The effectiveness on the mismatch problem. To analyze how common the mismatch problem is in semantic parsing, we randomly sampled 500 questions from the training data and analyzed them manually. We found that 12.2% of the sampled questions have mismatch problems: 3.8% have the 1-N mismatch problem and 8.4% have the N-1 mismatch problem.
To verify the effectiveness of our method on solving the mismatch problem, we conduct experiments on the 50 mismatch test examples, and Table 9 shows the performance. We can see that our system can effectively resolve the mismatch between natural language and the target ontology: compared to the base system, our system achieves a significant 54.5% F1 improvement.
When scaling a semantic parser to an open-domain or web setting, the mismatch problem will be more common as the ontology and language complexity increases (Kwiatkowski et al., 2013). Therefore we believe the sentence rewriting method proposed in this paper is an important technique for the scalability of semantic parsers.
The effect of different rewriting algorithms. To analyze the contribution of different rewriting methods, we perform experiments using different sentence rewriting methods, and the results are presented in Table 10. We can see that:
1. Both sentence rewriting methods improve the parsing performance, resulting in 1.8% and 3.2% F1 improvements respectively.
2. Compared with the dictionary-based rewriting method, the template-based rewriting method achieves a higher performance improvement. We believe this is because the N-1 mismatch problem is more common in the WEBQUESTIONS dataset.
3. The two rewriting methods complement each other well. The semantic parser achieves a higher performance improvement when using the two rewriting methods together.
The effect on improving robustness. We found that the template-based rewriting method can greatly improve the robustness of the base semantic parser. Specifically, the template-based method can rewrite similar sentences into a uniform template, and the (template, predicate) feature can provide additional information to reduce the uncertainty during parsing. For example, using only the uncertain alignments from the words "people" and "speak" to the two predicates official language and language spoken, the base parser parses the sentence "What does jamaican people speak?" into the incorrect logical form official language.jamaican in our experiments, rather than into the correct form language spoken.jamaican (see the final example in Table 11). By exploiting the alignment from the template "what language does $y people speak" to its predicate, our system can parse the above sentence correctly.
Table 11: Examples for which our system generates a more accurate logical form than the base semantic parser. O is the original sentence; R is the sentence generated by sentence rewriting (with the highest score for the model, including the rewriting part and the parsing part); LF is the target logical form.
The effect on the OOV problem. We found that the sentence rewriting method can also provide additional benefit for solving the OOV problem. Traditionally, if a sentence contains a word which is not covered by the lexicon, it cannot be correctly parsed. However, with the help of sentence rewriting, we may rewrite the OOV words into words which are covered by our lexicons. For example, in Table 11 the third question "What are some of the traditions of islam?" cannot be correctly parsed, as the lexicons do not cover the word "tradition". Through sentence rewriting, we can generate a new sentence "What is of the religion of islam?", where all words are covered by the lexicons; in this way the sentence can be correctly parsed.

Error Analysis
To better understand our system, we conduct error analysis on the parse results. Specifically, we randomly choose 100 questions which are not correctly answered by our system. We found that the errors are mainly caused by the following four reasons (see Table 12).
Table 12: The main reasons of parsing errors; the ratio and an example for each reason are also provided.
The first reason is the label issue. The main label issue is incompleteness, i.e., the answers of a question may not be labeled completely. For example, for the question "Who does nolan ryan play for?", our system returns 4 correct teams but the golden answer only contains 2 teams. Another label issue is incorrect labels. For example, the gold answer of the question "What state is barack obama from?" is labeled as "Illinois"; however, the correct answer is "Hawaii".
The second reason is the n-ary predicate problem (n > 2). Currently, it is hard for a parser to construct the correct logical form of n-ary predicates. For example, the question "What year did the seahawks win the superbowl?" describes an n-ary championship event, which gives the championship and the champion of the event, and expects the season. We believe that more research attention should be given to complicated cases, such as n-ary predicate parsing.
The third reason is temporal clauses. For example, the question "Who did nasri play for before arsenal?" contains a temporal clause introduced by "before". We found that temporal clauses are complicated and make it difficult for the parser to understand the sentence.
The fourth reason is the superlative case, which is a hard problem in semantic parsing. For example, to answer "What was the name of henry viii first wife?", we should choose the first one from a list ordered by time. Unfortunately, it is difficult for the current parser to decide what to order and how to order it.
There are also many other miscellaneous error cases, such as spelling errors in the question, e.g., "capitol" for "capital", "mary" for "marry".

Conclusions
In this paper, we present a novel semantic parsing method, which can effectively deal with the mismatch between natural language and the target ontology using sentence rewriting. We resolve two common types of mismatch: (i) one word in a natural language sentence vs. one compound formula in the target ontology (1-N), and (ii) one complicated expression in a natural language sentence vs. one formula in the target ontology (N-1). We then present two sentence rewriting methods: a dictionary-based method for 1-N mismatch and a template-based method for N-1 mismatch. The resulting system significantly outperforms the base system on the WEBQUESTIONS dataset.
Currently, our approach only leverages simple sentence rewriting methods. In future work, we will explore more advanced sentence rewriting methods. Furthermore, we also want to employ sentence rewriting techniques for other challenges in semantic parsing, such as spontaneous, unedited natural language input.
Jonathan Berant and Percy Liang. 2015