Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base

We consider the problem of conversational question answering over a large-scale knowledge base. To handle the huge entity vocabulary of a large-scale knowledge base, recent neural semantic parsing based approaches usually decompose the task into several subtasks and then solve them sequentially, which leads to the following issues: 1) errors in earlier subtasks are propagated and negatively affect downstream ones; and 2) each subtask cannot naturally share supervision signals with the others. To tackle these issues, we propose an innovative multi-task learning framework in which a pointer-equipped semantic parsing model is designed to resolve coreference in conversations and naturally empowers joint learning with a novel type-aware entity detection model. The proposed framework thus enables shared supervision and alleviates the effect of error propagation. Experiments on a large-scale conversational question answering dataset containing 1.6M question answering pairs over 12.8M entities show that the proposed framework improves the overall F1 score from 67% to 79% compared with previous state-of-the-art work.


Introduction
Recent decades have seen the development of AI-driven personal assistants (e.g., Siri, Alexa, Cortana, and Google Now) that often need to answer factoid questions. Meanwhile, large-scale knowledge bases (KBs) like DBPedia (Auer et al., 2007) or Freebase (Bollacker et al., 2008) have been built to store the world's facts in a structured database, which is used to support open-domain question answering (QA) in those assistants.
Neural semantic parsing based approaches (Jia and Liang, 2016; Reddy et al., 2014; Dong and Lapata, 2016; Liang et al., 2016; Dong and Lapata, 2018; Guo et al., 2018) have gained increasing attention for knowledge-based question answering (KB-QA) in recent years, since they do not rely on hand-crafted features and are easy to adapt across domains. Traditional approaches usually retrieve answers from a small KB (e.g., a small table) (Jia and Liang, 2016; Xiao et al., 2016) and have difficulty handling large-scale KBs. Many recent neural semantic parsing based approaches for KB-QA take a stepwise framework to handle this issue. For example, Liang et al. (2016), Dong and Lapata (2016), and Guo et al. (2018) first use an entity linking system to find entities in a question, and then learn a model to map the question to a logical form based on them. Dong and Lapata (2018) decompose the semantic parsing process into two stages. They first generate a rough sketch of the logical form based on low-level features, and then fill in missing details by considering both the question and the sketch.
However, these stepwise approaches have two issues. First, errors in upstream subtasks (e.g., entity detection and linking, relation classification) are propagated to downstream ones (e.g., semantic parsing), resulting in accumulated errors. For example, case studies in previous works (Yih et al., 2015; Dong and Lapata, 2016; Xu et al., 2016; Guo et al., 2018) show that entity linking error is one of the major errors leading to wrong predictions in KB-QA. Second, since models for the subtasks are learned independently, the supervision signals cannot be shared among the models for mutual benefit.
To tackle the issues mentioned above, we propose a novel multi-task semantic parsing framework for KB-QA. Specifically, an innovative pointer-equipped semantic parsing model is first designed for two purposes: 1) its built-in pointer network toward positions of entity mentions in the question naturally empowers multi-task learning in conjunction with the upstream sequence labeling subtask, i.e., entity detection; and 2) it explicitly takes into account the context of entity mentions by using the supervision of the pointer network. Besides, a type-aware entity detection method is proposed to produce accurate entity linking results, in which a joint prediction space combining entity detection and entity type is employed, and the predicted type is then used to filter entity linking results during the inference phase.
arXiv:1910.05069v1 [cs.CL]
The proposed framework has several merits. First, since the two subtasks, i.e., pointer-equipped semantic parsing and entity detection, are closely related, learning them simultaneously within a single model makes the best of the supervision and improves the performance of the KB-QA task.
Second, considering that entity type prediction is crucial for entity linking, our joint learning framework combining entity mention detection with type prediction leverages contextual information, and thus further reduces errors in entity linking.
Third, our approach is naturally beneficial to coreference resolution for conversational QA due to the rich contextual features captured for entity mentions, compared to previous works that directly employ low-level features (e.g., mean-pooling over word embeddings) as the representation of an entity. This is verified via our experiments in §4.2.
We evaluate the proposed framework on the CSQA (Saha et al., 2018) dataset, which is the largest public dataset for complex conversational question answering over a large-scale knowledge base. Experimental results show that the overall F1 score is improved by 12.56% compared with strong baselines, and the improvements are consistent for all question types in the dataset.

Task Definition
In this work, we target the problem of conversational question answering over a large-scale knowledge base. Formally, in the training data, a question U denotes a user utterance from a dialog, concatenated with the dialog history to handle ellipsis or coreference in conversations, and the question is labeled with its answer A. Besides, "IOB" (Inside-Outside-Beginning) tags and the linked KB entities are also labeled for entity mentions in U to train an entity detection model.
We employ a neural semantic parsing based approach to tackle the problem. That is, given a question, a semantic parsing model is used to produce a logical form, which is then executed on the KB to retrieve an answer. We decompose the approach into two subtasks, i.e., entity detection for entity linking and semantic parsing for logical form generation. The former employs IOB tags and the corresponding entities as supervision, while the latter uses a gold logical form as supervision, which may be obtained by conducting an intensive breadth-first search (BFS) over the KB if only final answers (i.e., weak supervision) are provided.

Approach
This section begins with a description of the grammars and logical forms used in this work. Then, the proposed model is presented, and finally, the model's training and inference are introduced.

Grammar and Logical Form
Grammar We use grammars and logical forms similar to those defined in Guo et al. (2018), with minor modifications for better adaptation to the CSQA dataset. The grammars are briefly summarized in Table 1, where each operator consists of three components: a semantic category, a function name, and a list of arguments with specified semantic categories. Semantic categories fall into two groups according to how they are instantiated: entry semantic categories (i.e., {e, p, tp, u_num} for entities, predicates, types, numbers), whose instantiations are constants parsed from a question, and intermediate semantic categories (i.e., {start, set, num, bool}), whose instantiation is the output of an operator execution.
Logical Form A KB-executable logical form is intrinsically formatted as an ordered tree where the root is the semantic category start, each child node is constrained by the nonterminal (i.e., the un-instantiated semantic category in parentheses) of its parent operator, and leaf nodes are instantiated entry semantic categories, i.e., constants.
To make the best of well-performing sequence-to-sequence (seq2seq) models (Vaswani et al., 2017; Bahdanau et al., 2015) as a base for semantic parsing, we represent a tree-structured logical form as a sequence of operators and constants via depth-first traversal over the tree. Note that, given the guidance of the grammars, we can recover the corresponding tree structure from a sequence-formatted logical form.
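The linearization and its inverse can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the arity table and the operator names (`union`, `find`) are hypothetical stand-ins for the grammar in Table 1, and constants are left un-instantiated.

```python
# Toy grammar (hypothetical): each symbol maps to the number of arguments it takes.
ARITY = {"start": 1, "union": 2, "find": 2, "e": 0, "p": 0}


def linearize(tree):
    """Depth-first (pre-order) traversal: tree -> flat token sequence."""
    symbol, children = tree
    tokens = [symbol]
    for child in children:
        tokens.extend(linearize(child))
    return tokens


def recover(tokens):
    """Rebuild the tree from the sequence, guided by the arities in the grammar."""
    def build(pos):
        symbol = tokens[pos]
        pos += 1
        children = []
        for _ in range(ARITY[symbol]):
            child, pos = build(pos)
            children.append(child)
        return (symbol, children), pos

    tree, end = build(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete logical form")
    return tree


tree = ("start", [("union", [("find", [("e", []), ("p", [])]),
                             ("find", [("e", []), ("p", [])])])])
sequence = linearize(tree)
restored = recover(sequence)
```

Because every symbol has a fixed number of arguments, the flat sequence carries enough information to rebuild the tree without explicit brackets.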

Proposed Model
The structure of our proposed Multi-task Semantic Parsing (MaSP) model is illustrated in Figure 1. The model consists of four components: word embedding, contextual encoder, entity detection and pointer-equipped logical form decoder.

Embedding and Contextual Encoder
To handle ellipsis or coreference in conversations, our model takes the current user question combined with the dialog history as the input question U. In particular, all those sentences are concatenated with a [SEP] separator, and then a special token [CTX] is appended. We apply wordpiece tokenization (Wu et al., 2016), and then use a word embedding method (Mikolov et al., 2013) to transform the tokenized question into a sequence of low-dimension distributed embeddings, i.e., $X = [x_1, \dots, x_n] \in \mathbb{R}^{d_e \times n}$, where $d_e$ denotes the embedding size and n denotes the question length.
Given the word embeddings X, we use the stacked two-layer multi-head attention mechanism of the Transformer (Vaswani et al., 2017) with learnable positional encodings as an encoder to model contextual dependencies between tokens, which results in context-aware representations $H = [h_1, \dots, h_n] \in \mathbb{R}^{d_e \times n}$. The contextual embedding of the token [CTX] is used as the semantic representation of the entire question, i.e., $h^{(ctx)} \triangleq h_n$.
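The input construction described above can be illustrated with a short sketch. It assumes whitespace splitting in place of a wordpiece tokenizer and assumes that [CTX] directly follows the last turn; the function name `build_input` is hypothetical.

```python
def build_input(history, question, sep="[SEP]", ctx="[CTX]"):
    """Join the history turns and the current question, then append [CTX]."""
    turns = list(history) + [question]
    tokens = []
    for i, turn in enumerate(turns):
        if i > 0:
            tokens.append(sep)
        # Whitespace split is a stand-in for wordpiece tokenization.
        tokens.extend(turn.split())
    tokens.append(ctx)
    return tokens


tokens = build_input(["did you mean polydor records ?"], "no , i meant deram records")
```

The last position of the resulting sequence is the [CTX] token, whose contextual embedding serves as the representation of the whole question.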

Pointer-Equipped Decoder
Given the contextual embeddings H of a question, we employ the stacked two-layer masked attention mechanism of Vaswani et al. (2017) as the decoder to produce sequence-formatted logical forms.
In each decoding step, the model first predicts a token from a small decoding vocabulary $V^{(dec)} = \{start, end, e, p, tp, u\_num, A1, \dots, A21\}$, where start and end indicate the start and end of decoding, A1, ..., A21 are defined in Table 1, and e, p, tp and u_num denote entity, predicate, type and number entries respectively. A neural classifier is established to predict the current decoding token, which is formally denoted as
$p_j^{(tk)} = \mathrm{softmax}(\mathrm{FFN}(s_j; \theta^{(tk)}))$, (1)
where $s_j$ is the decoding hidden state of the current (i.e., j-th) step, FFN(·; θ) denotes a θ-parameterized two-layer feed forward network with an activation function inside, and $p_j^{(tk)} \in \mathbb{R}^{|V^{(dec)}|}$ is the predicted distribution over $V^{(dec)}$. Superscripts in brackets denote the type rather than an index.
Then, an FFN(·) or a pointer network (Vinyals et al., 2015) is utilized to predict the instantiation for an entry semantic category (i.e., e, p, tp or u_num in $V^{(dec)}$) if necessary.
• For predicate p and type tp, two parameter-untied FFN(·) are used as
$p_j^{(p)} = \mathrm{softmax}(\mathrm{FFN}([s_j; h^{(ctx)}]; \theta^{(p)}))$, (2)
$p_j^{(t)} = \mathrm{softmax}(\mathrm{FFN}([s_j; h^{(ctx)}]; \theta^{(t)}))$, (3)
where $h^{(ctx)}$ is the semantic embedding of the entire question, $s_j$ is the current hidden state, $p_j^{(p)} \in \mathbb{R}^{N^{(p)}}$ and $p_j^{(t)} \in \mathbb{R}^{N^{(t)}}$ are the predicted distributions over the predicate and type instantiation candidates respectively, and $N^{(p)}$ and $N^{(t)}$ are the numbers of distinct predicates and types in the knowledge base.
• For entity e and number u_num, two parameter-untied pointer networks (Vinyals et al., 2015) with a learnable bilinear layer are employed to point toward the targeted entity (toward its first word if the entity consists of multiple words) and number, which are defined as follows:
$p_j^{(e)} = \mathrm{softmax}(s_j^{T} W^{(e)} H_{:,1:n-1})$, (4)
$p_j^{(n)} = \mathrm{softmax}(s_j^{T} W^{(n)} H_{:,1:n-1})$, (5)
where $H_{:,1:n-1}$ is the contextual embedding of the tokens in the question except [CTX], $W^{(e)}$ and $W^{(n)}$ are the weights of the pointer networks for entity and number, $p_j^{(e)}, p_j^{(n)} \in \mathbb{R}^{n-1}$ are the resulting distributions over positions of the input question, and n is the length of the question.
A pointer network is also used for semantic parsing by Jia and Liang (2016), where the pointer aims at copying out-of-vocabulary words from a question over a small-scale KB. In contrast, the pointer used here aims at locating the targeted entity and number in a question, which has two advantages. First, it handles the coreference problem by considering the context of entity mentions in the question. Second, it solves the problem caused by the huge entity vocabulary, reducing the size of the decoding vocabulary from several million (i.e., the number of entities in the KB) to several dozen (i.e., the length of the question). To map the pointed positions to entities in the KB, our model also detects entity mentions in the input question, as shown in the "Entity Detection" part of Figure 1.
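To make the pointer mechanism concrete, the bilinear scoring over question positions can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: matrices are lists of rows, the toy values are made up, and the function name `pointer_distribution` is hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def pointer_distribution(s_j, W, H):
    """Distribution over question positions: softmax(s_j^T W H[:, :n-1]).

    s_j: decoder hidden state, length d.
    W:   d x d bilinear weight, as a list of rows.
    H:   d x n contextual embeddings, as a list of rows; the last column is [CTX].
    """
    d = len(s_j)
    n = len(H[0])
    # projected[k] = sum_a s_j[a] * W[a][k]
    projected = [sum(s_j[a] * W[a][k] for a in range(d)) for k in range(d)]
    # Score every position except the final [CTX] column.
    scores = [sum(projected[k] * H[k][i] for k in range(d)) for i in range(n - 1)]
    return softmax(scores)


# Toy example: d = 2, n = 3 (two question tokens plus [CTX]).
identity = [[1.0, 0.0], [0.0, 1.0]]
H = [[0.0, 2.0, 5.0],
     [1.0, 0.0, 5.0]]
p_entity = pointer_distribution([1.0, 0.0], identity, H)
pointed_position = max(range(len(p_entity)), key=lambda i: p_entity[i])
```

The output has one entry per question token, so its size grows with the question length rather than with the number of entities in the KB.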

Entity Detection and Linking
We observe that multiple entities in a large-scale KB usually have the same entity text but different types, leading to named entity ambiguity. Therefore, we design a novel type-aware entity detection module in which the prediction is made in a joint space of IOB tagging and the corresponding entity type for disambiguation. In particular, the prediction space is defined as $E = \{O, \{I\text{-}ET_k, B\text{-}ET_k\}_{k=1}^{N^{(t)}}\}$, where $ET_k$ stands for the k-th entity type label, $N^{(t)}$ denotes the number of distinct entity types in the KB, and $|E| = 2 \times N^{(t)} + 1$.
The prediction for both entity IOB tagging and entity type is formulated as
$p_i^{(ed)} = \mathrm{softmax}(\mathrm{FFN}(h_i; \theta^{(ed)}))$, (6)
where $h_i$ is the contextual embedding of the i-th token in the question, and $p_i^{(ed)} \in \mathbb{R}^{|E|}$ is the predicted distribution over E.
Given the predicted IOB labels and entity types, we take the following steps for entity linking. First, the predicted IOB labels are used to locate all entities in the question and return the corresponding entity mentions. Second, an inverted index built on the KB is leveraged to find entity candidates in the KB based on each entity mention. Third, the jointly predicted entity types are used to filter out the candidates with unwanted types, and the remaining entity with the highest inverted index score is selected to substitute the pointer. This process is shown in the bottom part of Figure 2.
During the inference phase, the final logical form is derived by replacing entity pointers in the entity-pointed logical form from §3.2.2 with the entity linking results, and is then executed on the KB to retrieve an answer for the question, as shown in the top part of Figure 2.
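The three linking steps can be sketched as follows. This is a minimal sketch under stated assumptions: the entity ids (`Q_a`, `Q_b`), type ids (`T1`, `T2`), index contents and scores are invented for illustration, and the inverted index is modeled as a plain dictionary from mention text to scored candidates.

```python
def label_space(entity_types):
    """Joint IOB-and-type label set; its size is 2 * N + 1."""
    labels = ["O"]
    for etype in entity_types:
        labels += ["I-" + etype, "B-" + etype]
    return labels


def extract_mentions(tokens, labels):
    """Step 1: collect (start, mention_text, type) spans from joint labels."""
    mentions, current = [], None
    for i, (token, label) in enumerate(zip(tokens, labels)):
        if label.startswith("B-"):
            if current:
                mentions.append(current)
            current = [i, [token], label[2:]]
        elif label.startswith("I-") and current and label[2:] == current[2]:
            current[1].append(token)
        else:
            if current:
                mentions.append(current)
            current = None
    if current:
        mentions.append(current)
    return [(start, " ".join(words), etype) for start, words, etype in mentions]


def link(mention, etype, inverted_index, entity_types):
    """Steps 2-3: look up candidates, drop wrong types, keep the best score."""
    candidates = inverted_index.get(mention.lower(), [])
    kept = [(eid, score) for eid, score in candidates if etype in entity_types[eid]]
    if not kept:
        return None
    return max(kept, key=lambda pair: pair[1])[0]


# Hypothetical toy data: two KB entities share the same text but differ in type.
tokens = ["who", "owns", "deram", "records", "?"]
labels = ["O", "O", "B-T1", "I-T1", "O"]
inverted_index = {"deram records": [("Q_a", 0.90), ("Q_b", 0.95)]}
entity_types = {"Q_a": {"T1"}, "Q_b": {"T2"}}

mentions = extract_mentions(tokens, labels)
linked = {start: link(text, etype, inverted_index, entity_types)
          for start, text, etype in mentions}
```

In this toy case the candidate with the higher index score is rejected because its type disagrees with the predicted one, which is the disambiguation effect the type filter is meant to provide. The resulting dictionary is keyed by the mention's first position, the position an entity pointer would refer to.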

Learning and Inference
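Model Learning During the training phase, we first search for gold logical forms over the KB for the questions in the training data if only weak supervision is provided. Then we conduct multi-task learning for semantic parsing and entity detection. The final loss is defined as
$\mathcal{L} = \alpha \mathcal{L}^{(sp)} + \mathcal{L}^{(ed)}$, (7)
where α > 0 is a hyperparameter for a tradeoff between semantic parsing and entity detection, and $\mathcal{L}^{(sp)}$ and $\mathcal{L}^{(ed)}$ are the negative log-likelihood losses of semantic parsing and entity detection, defined as follows.
$\mathcal{L}^{(sp)} = -\frac{1}{m} \sum_{j=1}^{m} \Big[ \log p_j^{(tk)}[y_j^{(tk)}] + \sum_{c \in \{p,t,e,n\}} \mathbb{I}(y_j^{(tk)} = c) \log p_j^{(c)}[y_j^{(c)}] \Big]$, (8)
$\mathcal{L}^{(ed)} = -\frac{1}{n} \sum_{i=1}^{n} \log p_i^{(ed)}[y_i^{(ed)}]$. (9)
In the two equations above, $y_j^{(tk)}$ is the gold label for the decoding token in $V^{(dec)}$; $y_i^{(ed)}$ is the gold label for entity detection on the i-th token; $y_j^{(c)}$ is the gold instantiation label for entry category c; the indicator $\mathbb{I}(\cdot)$ activates an instantiation term only when the gold token at step j is the entry category c; and m denotes the decoding length.
Here, we use a single model to handle two subtasks simultaneously, i.e., semantic parsing and entity detection. This multi-task learning framework enables each subtask to leverage supervision signals from the other, and thus improves the final performance for KB-QA.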
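A small numeric sketch of the combined objective may help. It assumes that α scales the semantic parsing term and that each loss is averaged over its predictions; both choices are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def nll(distribution, gold_index):
    """Negative log-likelihood of the gold label under one predicted distribution."""
    return -math.log(distribution[gold_index])


def multitask_loss(sp_predictions, ed_predictions, alpha=1.5):
    """Weighted sum of the two subtask losses.

    sp_predictions: (distribution, gold_index) pairs for decoding tokens and
                    their instantiations (semantic parsing).
    ed_predictions: (distribution, gold_index) pairs, one per question token
                    (entity detection).
    """
    loss_sp = sum(nll(p, y) for p, y in sp_predictions) / len(sp_predictions)
    loss_ed = sum(nll(p, y) for p, y in ed_predictions) / len(ed_predictions)
    return alpha * loss_sp + loss_ed


# Toy distributions, invented for illustration.
loss = multitask_loss(
    sp_predictions=[([0.5, 0.5], 0)],
    ed_predictions=[([0.25, 0.75], 1)],
)
```

Because both terms are computed from the same encoder output, gradients from entity detection also update the representations used by the decoder, and vice versa.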

Grammar-Guided Inference
The grammars defined in Table 1 are utilized to filter out illegal operators in each decoding step. An operator is legitimate if the left-hand semantic category in its definition is identical to the leftmost nonterminal (i.e., un-instantiated semantic category) in the incomplete logical form parsed so far. In particular, the decoding of a logical form begins with the semantic category start. During decoding, the proposed semantic parsing model recursively rewrites the leftmost nonterminal in the logical form by 1) applying a legitimate operator for an intermediate semantic category, or 2) instantiating an entity, predicate, type or number for an entry semantic category. The decoding process terminates when no nonterminals remain.
Furthermore, beam search is incorporated to boost the performance of the proposed model during decoding. In addition, early-stage execution is performed to filter out illegal logical forms that lead to an empty intermediate result.
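The legality check on the leftmost nonterminal can be sketched as below. The mini-grammar and its operator names (`S`, `FIND`, `UNION`, `COUNT`) are hypothetical and much smaller than Table 1; beam search and early-stage execution are omitted.

```python
# Hypothetical mini-grammar: operator -> (result category, argument categories).
GRAMMAR = {
    "S": ("start", ["set"]),
    "FIND": ("set", ["e", "p"]),
    "UNION": ("set", ["set", "set"]),
    "COUNT": ("num", ["set"]),
}
ENTRY = {"e", "p", "tp", "u_num"}


def legal_actions(nonterminals):
    """Actions allowed for the leftmost un-instantiated category."""
    if not nonterminals:
        return []  # nothing left to rewrite: decoding has terminated
    leftmost = nonterminals[0]
    if leftmost in ENTRY:
        return [leftmost]  # instantiate a constant of this category
    return [op for op, (category, _) in GRAMMAR.items() if category == leftmost]


def apply_action(nonterminals, action):
    """Rewrite the leftmost nonterminal with the chosen action."""
    rest = nonterminals[1:]
    if action in ENTRY:
        return rest
    return GRAMMAR[action][1] + rest


state = ["start"]
trace = []
for action in ["S", "UNION", "FIND", "e", "p", "FIND", "e", "p"]:
    if action not in legal_actions(state):
        raise ValueError("illegal action: " + action)
    trace.append(action)
    state = apply_action(state, action)
```

In a real decoder, the scores of all actions outside `legal_actions(state)` would be masked before choosing the next token, so only grammatical sequences can be produced.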

Experimental Settings
Dataset We evaluated the proposed approach on the Complex Sequential Question Answering (CSQA) dataset (Saha et al., 2018).
Training Setups We leveraged a BFS method to search for valid logical forms for the questions in the training data. The buffer size in BFS is set to 1000. Both the embedding and hidden sizes in the model are set to 300. No pretrained embeddings are loaded for initialization, and the positional encodings are randomly initialized and learnable. The number of heads in multi-head attention is 6, and the activation function inside FFN(·) is GELU (Hendrycks and Gimpel, 2016). We used Adam (Kingma and Ba, 2015) to optimize the loss function defined in Eq. (7), where α is set to 1.5 and the learning rate is set to $10^{-4}$. We trained for 6 epochs with a batch size of 128. We also employed learning rate warmup within the first 1% of steps and linear decay for the rest. The source code is available at https://github.com/taoshen58/MaSP. More details of our implementation are described in Appendix A.
Evaluation Metrics We used the same evaluation metrics as Saha et al. (2018) and Guo et al. (2018). The F1 score (combining precision and recall) is used to evaluate questions whose answers are comprised of entities, and accuracy is used to measure questions whose answer type is boolean or number.
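For questions answered with a set of entities, precision, recall and F1 can be computed per question as in the sketch below. How the scores are aggregated over the dataset is not specified here, so this per-question, set-based formulation is an assumption, and the function name is hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for one question."""
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# One of two predicted entities is correct; one of three gold entities is found.
precision, recall, f1 = precision_recall_f1({"a", "b"}, {"b", "c", "d"})
```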
Baselines Few works target conversational question answering over a large-scale knowledge base. HRED+KVmem (Saha et al., 2018) and D2A (Guo et al., 2018) are two typical approaches, and we compared them with our proposed approach. In particular, HRED+KVmem is a memory network (Sukhbaatar et al., 2015; Li et al., 2017) based seq2seq model, which combines the HRED model (Serban et al., 2016) with a key-value memory network (Miller et al., 2016). D2A is a memory-augmented neural symbolic model for semantic parsing in KB-QA, which introduces a dialog memory manager to handle ellipsis and coreference problems in conversations.

Model Comparisons
We compared our approach (denoted as MaSP) with HRED+KVmem and D2A in Table 2.
As shown in the table, the semantic parsing based D2A significantly outperforms the memory network based text generation approach (HRED+KVmem), and thus poses a strong baseline. Further, our proposed approach (MaSP) achieves a new state-of-the-art performance, where the overall F1 score is improved by ∼12%.
Besides, the improvement is consistent across all question types, ranging from 2% to 45%. There are two possible reasons for this significant improvement. First, our approach predicts entities more accurately: the accuracy of entities in final logical forms increases from 55% to 72% compared with D2A. Second, the proposed pointer-equipped logical form decoder in the multi-task learning framework handles coreference better. For instance, given a user question with history, "What is the parent organization of that one? // Did you mean Polydor Records? // No, I meant Deram Records. Could you tell me the answer for that?" with coreference, D2A produces "(find {Polydor Records}, owned by)", whereas our approach produces "(find {Deram Records}, owned by)". This also explains the substantial improvement for Simple Question (Coreferenced) and Clarification. We also observed that the improvement of MaSP over D2A for some question types is relatively small, e.g., 1.73% for logical reasoning questions. A possible reason is that more than one entity is usually needed to compose the correct logical form for logical reasoning questions, and our current model is too shallow to parse the multiple entities. Hence, we adopted a deeper model and employed BERT (Devlin et al., 2018) as the encoder (later in §4.4), and found that the performance on logical reasoning questions is improved by 10% compared to D2A.

Ablation Study
Two aspects lead to the performance improvement, i.e., predicting the entity type in entity detection to filter candidates, and the multi-task learning framework. We conducted an ablation study in Table 3 for an in-depth understanding of their effects.

Effect of Entity Type Prediction (w/o ET) First, the entity type prediction was removed from the entity detection task, which results in a 9% drop in the overall F1 score. We argue that the performance of the KB-QA task is in line with that of entity linking. Hence, we separately evaluated the entity linking task on the test set. As illustrated in Figure 3, both precision and recall of entity linking drop significantly without filtering the entity linking results w.r.t. the predicted entity type, which verifies our hypothesis above.
Effect of Multi-Task Learning (w/o Multi) Second, to measure the effect of multi-task learning, we evaluated the KB-QA task when the two subtasks, i.e., pointer-equipped semantic parsing and entity detection, are learned separately. As shown in Table 3, the F1 score for every question type consistently drops, by 3% to 14%, compared with multi-task learning. We further evaluated the effect of multi-task learning on each subtask. As shown in Table 4, the accuracy for each component of the pointer-equipped logical form drops with separate learning. Meanwhile, we found a 0.1% F1 score reduction (99.4% vs. 99.5%) for the entity detection subtask compared to the model without multi-task learning, which has only a negligible effect on the downstream task. To sum up, the multi-task learning framework increases the accuracy of pointer-based logical form generation while keeping a satisfactory performance of entity detection, and consequently improves the final question answering performance.
Note that, when removing the entity type filter and learning the two subtasks separately are combined (i.e., w/o Both in Table 3), the proposed framework degenerates to a model that is similar to the Coarse-to-Fine semantic parsing model, another state-of-the-art KB-QA model over small-scale KBs (Dong and Lapata, 2018). Therefore, an improvement of 11% in F1 score also verifies the advantage of our proposed framework.

Model Setting Analysis
As introduced in §4.1 and evaluated in §4.2, the proposed framework is built on a relatively shallow neural network, i.e., stacked two-layer multi-head attention, which might limit its representational ability. Hence, in this section, we further explored the performance of the proposed framework by applying more sophisticated strategies. As shown in Table 5, we first replaced the encoder with the pre-trained BERT base model (Devlin et al., 2018) and fine-tuned its parameters during the training phase, which results in a 1.3% F1 score improvement over the vanilla one. Second, we increased the beam search size from 4 to 8 during decoding in the inference phase for the standard settings, which leads to a 2.3% F1 score increase.

Error Analysis
We randomly sampled 100 examples with wrong logical forms or incorrect answers to conduct an error analysis, and found that the errors mainly fall into the following categories.
Entity Ambiguity Leveraging the entity type as a filter in entity linking significantly reduces errors caused by entity ambiguity, but different entities with the same text may still belong to the same type due to the coarse granularity of entity types, in which case the filter cannot separate them. For example, it is difficult to distinguish between two persons whose names are both Bill Woods.
Wrong Predicted Logical Form The predicted components (e.g., operators, predicates and types) composing the logical form may be inaccurate, leading to a wrong answer to the question or an un-executable logical form.
Spurious Logical Form We took a BFS method to search for gold logical forms for the questions in the training set, which inevitably generates spurious logical forms (incorrect but coincidentally leading to correct answers) as training signals. Take the question "Which sexes do King Harold, Queen Lillian and Arthur Pendragon possess" as an example: a spurious logical form only retrieves the genders of "King Harold" and "Queen Lillian", yet it obtains the correct answers for the question. Spurious logical forms introduce noise into the training data and thus negatively affect the performance of KB-QA.

Related Work
Our work is aligned with semantic parsing based approaches for KB-QA. Traditional semantic parsing systems typically learn a lexicon-based parser and a scoring model to construct a logical form given a natural language question (Zettlemoyer and Collins, 2007; Wong and Mooney, 2007; Zettlemoyer and Collins, 2009; Kwiatkowski et al., 2011; Andreas et al., 2013; Artzi and Zettlemoyer, 2013; Zhao and Huang, 2014; Long et al., 2016). For example, Zettlemoyer and Collins (2009) and Artzi and Zettlemoyer (2013) learn a CCG parser, and Long et al. (2016) develop a shift-reduce parser to construct logical forms.
Neural semantic parsing approaches have gained increasing attention in recent years, eschewing the need for extensive feature engineering (Jia and Liang, 2016; Ling et al., 2016; Xiao et al., 2016). Some efforts have been made to utilize the syntax of logical forms (Rabinovich et al., 2017; Krishnamurthy et al., 2017; Cheng et al., 2017; Yin and Neubig, 2017). For example, Dong and Lapata (2016) and Alvarez-Melis and Jaakkola (2017) leverage an attention-based encoder-decoder framework to translate a natural language question into a tree-structured logical form.
Recently, to handle the huge entity vocabulary of a large-scale knowledge base, many works take a stepwise approach. For example, Liang et al. (2016), Dong and Lapata (2016), and Guo et al. (2018) first process questions using a named entity linking system to find entity candidates, and then learn a model to map a question to a logical form based on the candidates. Dong and Lapata (2018) decompose the task into two stages: first, a sketch of the logical form is predicted, and then a full logical form is generated by considering both the question and the predicted sketch.
Our proposed framework also decomposes the task into multiple subtasks but differs from existing works in several aspects. First, inspired by the pointer network (Vinyals et al., 2015), we replace entities in a logical form with the starting positions of their mentions in the question, which can be naturally used to handle the coreference problem in conversations. Second, the proposed pointer-based semantic parsing model can be intrinsically extended to learn jointly with entity detection, fully leveraging all supervision signals. Third, we alleviate the entity ambiguity problem in the entity detection and linking subtask by incorporating entity type prediction into entity mention IOB labeling to filter out the entities with unwanted types.

Conclusion
We studied the problem of conversational question answering over a large-scale knowledge base, and proposed a multi-task learning framework which learns type-aware entity detection and pointer-equipped logical form generation simultaneously. The multi-task learning framework takes full advantage of the supervision from all subtasks, and consequently increases the performance on the final KB-QA problem. Experimental results on a large-scale dataset verify the effectiveness of the proposed framework. In the future, we will test our proposed framework on more datasets and investigate potential approaches to handle spurious logical forms for weakly-supervised KB-QA.

A Model Details
A.1 Word Embedding
Given a user question sentence U, a tokenizing method (e.g., a punctuation or wordpiece tokenizer (Wu et al., 2016)) is applied to the sentence to obtain a list of tokens, i.e., $U = [u_1, u_2, \dots, u_n] \in \mathbb{R}^{N \times n}$, where each $u_i$ is a one-hot vector whose dimension equals the number of distinct tokens N in the vocabulary, and n is the length of U. Note that a special token, corresponding to [CTX], is appended to the tokenized sentence as its last token $u_n$. Then, randomly initialized or pre-trained (Mikolov et al., 2013; Pennington et al., 2014) embeddings are applied to U to transform the discrete tokens into a sequence of low-dimension distributed embeddings, i.e., $X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{d_e \times n}$, where $d_e$ is the embedding size. This process is formulated as $X = W^{(enc)} U$, where $W^{(enc)} \in \mathbb{R}^{d_e \times N}$ is the trainable word embedding weight matrix.

A.2.1 Encoder of Seq2seq Model
To model contextual dependencies between tokens and generate context-aware representations, we leverage a stacked two-layer multi-head attention mechanism with additive positional encoding (Vaswani et al., 2017). The stacking scheme is identical to that in Vaswani et al. (2017): a two-layer feed forward network with an activation function (FFN) follows each multi-head attention, and a residual connection (He et al., 2016) with layer normalization (Lei Ba et al., 2016) is applied. Each layer of this process is briefly denoted as $H = \mathrm{FFN}(\mathrm{MultiHead}(\tilde{X}, \tilde{X}, \tilde{X}))$ with first-layer input $\tilde{X} = X + W^{(pe)}$, where H is a sequence of contextual embeddings, $W^{(pe)} \in \mathbb{R}^{d_e \times n}$ is the learnable weight of the positional encoding (PE), and the three arguments of MultiHead are the value, key and query of an attention mechanism.

A.2.2 Decoder of Seq2seq Model
Similar to the token embedding in the encoder (§A.1), we embed the j-th decoder input token as $z_j$ via a randomly initialized embedding weight matrix, and use $Z = [z_1, \dots, z_m] \in \mathbb{R}^{d_e \times m}$ to represent all tokens in a gold logical form sketch, where m denotes the length of the gold sketch.
The basic structure of the proposed logical form decoder is the same as that in the original Transformer (Vaswani et al., 2017), except that only two stacked layers are used here. Each layer of the decoder is comprised, bottom-up, of self-attention with a forward mask, cross attention between decoder and encoder, and an FFN, which we briefly formulate as $S = \mathrm{FFN}(\mathrm{MultiHead}(H, H, \mathrm{MultiHead}_{mask}(Z, Z, Z)))$, where S is a sequence of decoding hidden states.

A.3 Multi-task Learning
We propose to employ a multi-task learning strategy to learn an entity detection (ED) model jointly with the pointer-equipped semantic parsing model, because the supervision information from ED, i.e., IOB tagging, provides all entity spans in the input question, which results in better performance than separate learning.
We use multi-task learning to jointly learn the semantic parsing model and ED, rather than directly equipping the semantic parsing model with span prediction (Seo et al., 2017), for three reasons: 1) with span prediction, the supervision information of entities that appear in the question but not in the gold logical form would be lost; 2) a deeper network is required to predict the end index of the target, as shown by Seo et al. (2017); and 3) the well-solved entity detection method can correct the pointer even when it deviates slightly during the inference phase, whereas a span-based model usually leads to error accumulation.

A.4 Inverted Index
For each entity text in Wikidata, we enumerated every substring whose length is not less than the length of the full text minus a threshold, and then calculated the Levenshtein distance between the full text and each substring as a score for the mapping from the substring to the corresponding full text. Since multiple entities could generate an identical substring, we kept the mappings with the largest scores and used them to build a dictionary for future queries.
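The index construction can be sketched as follows. Two points are assumptions of this sketch rather than details given above: the score is taken as the negated Levenshtein distance, so that a larger score means a closer match, and only one best mapping is stored per substring. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_index(entity_texts, threshold=2):
    """Map each long-enough substring to its best-scoring full entity text."""
    index = {}
    for text in entity_texts:
        min_len = max(1, len(text) - threshold)
        for start in range(len(text)):
            for end in range(start + min_len, len(text) + 1):
                substring = text[start:end]
                # Assumption: score = -distance, so larger means closer.
                score = -levenshtein(text, substring)
                if substring not in index or score > index[substring][1]:
                    index[substring] = (text, score)
    return index


index = build_index(["deram records", "deram"], threshold=1)
```

A query then reduces to a dictionary lookup on the detected mention text, which keeps entity linking cheap at inference time.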

B.1 Precision and Recall for Main Paper
Since we report only the F1 score in the main paper for brevity, in this section we report the corresponding recall and precision in detail: 1) Table 12 presents the results of the proposed model compared with the baselines; 2) Table 13 presents the ablation study; and 3) Table 14 provides the performance comparison after the more sophisticated strategies are applied.

B.2 Comparison to D2A
To further demonstrate that the proposed model is superior to the previous D2A model in terms of entity linking and logical form generation, we conduct the following comparisons.
First, as shown in Table 6, the average number of entity candidates produced by entity linking on the test set is 2× lower for the proposed model than for D2A, which means the proposed approach provides the downstream subtask with more accurate entity linking results. Second, we compare the proposed model with D2A in terms of logical form generation, where the logical form may be empty due to a timeout or illegal logical forms during beam search. As demonstrated in Table 7, the proposed model yields a lower ratio of empty logical forms than D2A.

Figure 1: Proposed Multi-task Semantic Parsing (MaSP) model. Note that P* and T* are predicate and entity type ids in Wikidata, where the entity type id originally starts with Q but is replaced with T for clear demonstration.

Figure 2: Transformation from entity-pointed logical form to KB-executable logical form for KB querying.
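In the loss definitions of the Model Learning paragraph, $y_j^{(p)}$, $y_j^{(t)}$, $y_j^{(e)}$ and $y_j^{(n)}$ are the gold labels for predicate, type, entity position and number position for instantiation; $p_j^{(tk)}$, $[p_j^{(c)}]_{c \in \{p,t,e,n\}}$, and $p^{(ed)}$ are defined in Eq. (1-6) respectively; and m denotes the decoding length.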

Table 1: Brief grammar definitions for logical form generation. *Instantiation of entity e, predicate p, type tp, number-in-question u_num, by the corresponding constant parsed from the question.
The CSQA dataset is the largest dataset for conversational question answering over a large-scale KB. It consists of about 1.6M question-answer pairs in ∼200K dialogs, where 152K/16K/28K dialogs are used for train/dev/test. Questions are classified into different types, e.g., simple, comparative reasoning, and logical reasoning questions. Its KB is built on Wikidata in the form of (subject, predicate, object) triplets, and consists of 21.2M triplets over 12.8M entities, 3,054 distinct entity types, and 567 distinct predicates.

Table 2: Comparisons with baselines on CSQA. The last column consists of differences between MaSP and D2A.

Table 4: Prediction accuracy on each component composing the pointer-equipped logical form.

Table 5: Comparisons with different experimental settings. "Vanilla" stands for the standard settings of our framework, i.e., MaSP. "w/ BERT" stands for incorporating BERT. "w/ Large Beam" stands for increasing the beam search size from 4 to 8.
Luke Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

Table 6: The average number of entity candidates from entity linking.

Table 7: Ratio of non-empty logical forms.