Improved Neural Relation Detection for Knowledge Base Question Answering

Relation detection is a core component of many NLP applications, including Knowledge Base Question Answering (KBQA). In this paper, we propose a hierarchical recurrent neural network enhanced by residual learning that detects KB relations given an input question. Our method uses deep residual bidirectional LSTMs to compare questions and relation names via different hierarchies of abstraction. Additionally, we propose a simple KBQA system that integrates entity linking and our proposed relation detector so that each enhances the other. Experimental results show that our approach not only achieves outstanding relation detection performance but, more importantly, helps our KBQA system achieve state-of-the-art accuracy on both single-relation (SimpleQuestions) and multi-relation (WebQSP) QA benchmarks.


Introduction
Knowledge Base Question Answering (KBQA) systems answer questions by obtaining information from KB tuples (Berant et al., 2013; Bordes et al., 2015; Bast and Haussmann, 2015; Yih et al., 2015). For an input question, these systems typically generate a KB query, which can be executed to retrieve the answers from a KB. Figure 1 illustrates the process used to parse two sample questions in a KBQA system: (a) a single-relation question, which can be answered with a single <head-entity, relation, tail-entity> KB tuple (Fader et al., 2013; Yih et al., 2014; Bordes et al., 2015); and (b) a more complex case, where some constraints need to be handled for multiple entities in the question. The KBQA system in the figure performs two key tasks: (1) entity linking, which links n-grams in questions to KB entities, and (2) relation detection, which identifies the KB relation(s) a question refers to.
The main focus of this work is to improve the relation detection subtask and further explore how it can contribute to the KBQA system. Although general relation detection 1 methods are well studied in the NLP community, such studies usually do not take the end task of KBQA into consideration. As a result, there is a significant gap between general relation detection studies and KB-specific relation detection. First, in most general relation detection tasks, the number of target relations is limited, normally smaller than 100. In contrast, in KBQA even a small KB, like Freebase2M (Bordes et al., 2015), contains more than 6,000 relation types. Second, relation detection for KBQA often becomes a zero-shot learning task, since some test instances may have relations that are unseen in the training data. For example, in the SimpleQuestions (Bordes et al., 2015) data set, 14% of the gold test relations are not observed in the gold training tuples. Third, as shown in Figure 1(b), for some KBQA tasks like WebQuestions (Berant et al., 2013), we need to predict a chain of relations instead of a single relation. This increases the number of target relation types and the sizes of candidate relation pools, further increasing the difficulty of KB relation detection. For these reasons, KB relation detection is significantly more challenging than general relation detection tasks.
This paper improves KB relation detection to cope with the problems mentioned above. First, in order to deal with the unseen relations, we propose to break the relation names into word sequences for question-relation matching. Second, noticing that original relation names can sometimes help to match longer question contexts, we propose to build both relation-level and word-level relation representations. Third, we use deep bidirectional LSTMs (BiLSTMs) to learn different levels of question representations in order to match the different levels of relation information. Finally, we propose a residual learning method for sequence matching, which makes the model training easier and results in more abstract (deeper) question representations, thus improving hierarchical matching.
Figure 1: KBQA examples and its three key components. (a) A single relation example. We first identify the topic entity with entity linking and then detect the relation asked by the question with relation detection (from all relations connecting the topic entity). Based on the detected entity and relation, we form a query to search the KB for the correct answer "Love Will Find a Way". (b) A more complex question containing two entities. By using "Grant Show" as the topic entity, we could detect a chain of relations "starring roles-series" pointing to the answer. An additional constraint detection takes the other entity "2008" as a constraint, to filter the correct answer "SwingTown" from all candidates found by the topic entity and relation.
In order to assess how the proposed improved relation detection could benefit the KBQA end task, we also propose a simple KBQA implementation composed of two-step relation detection. Given an input question and a set of candidate entities retrieved by an entity linker based on the question, our proposed relation detection model plays a key role in the KBQA process: (1) Re-ranking the entity candidates according to whether they connect to high-confidence relations detected from the raw question text by the relation detection model. This step is important to deal with the ambiguities normally present in entity linking results. (2) Finding the core relation (chains) for each topic entity 2 selected from a much smaller candidate entity set after re-ranking. The above steps are followed by an optional constraint detection step, used when the question cannot be answered by single relations (e.g., multiple entities in the question). Finally, the highest-scoring query from the above steps is used to query the KB for answers.
2 Following Yih et al. (2015), here topic entity refers to the root of the (directed) query tree; and core-chain is the directed path of relation from root to the answer node.
Our main contributions include: (i) an improved relation detection model based on hierarchical matching between questions and relations with residual learning; (ii) a demonstration that the improved relation detector enables our simple KBQA system to achieve state-of-the-art results on both single-relation and multi-relation KBQA tasks.

Related Work
Relation Extraction: Relation extraction (RE) is an important sub-field of information extraction. General research in this field usually works on a (small) pre-defined relation set, where given a text paragraph and two target entities, the goal is to determine whether the text indicates any type of relation between the entities or not. As a result RE is usually formulated as a classification task. Traditional RE methods rely on a large number of hand-crafted features (Zhou et al., 2005; Rink and Harabagiu, 2010; Sun et al., 2011). Recent research benefits a lot from the advancement of deep learning: from word embeddings (Nguyen and Grishman, 2014; Gormley et al., 2015) to deep networks like CNNs and LSTMs (Zeng et al., 2014; dos Santos et al., 2015; Vu et al., 2016) and attention models.
The above research assumes there is a fixed (closed) set of relation types, thus no zero-shot learning capability is required. The number of relations is usually not large: the widely used ACE2005 has 11/32 coarse/fine-grained relations; SemEval2010 Task8 has 19 relations; TAC-KBP2015 has 74 relations although it considers open-domain Wikipedia relations. All are far fewer than the thousands of relations in KBQA. As a result, little work in this field focuses on dealing with a large number of relations or unseen relations. Yu et al. (2016) proposed to use relation embeddings in a low-rank tensor method. However, their relation embeddings are still trained in a supervised way and the number of relations is not large in the experiments.
Relation Detection in KBQA Systems: Relation detection for KBQA also started with feature-rich approaches (Bast and Haussmann, 2015) and moved towards deep networks (Yih et al., 2015; Dai et al., 2016) and attention models (Yin et al., 2016; Golub and He, 2016). Much of the above relation detection research could naturally support large relation vocabularies and open relation sets (especially for QA with OpenIE KBs like ParaLex (Fader et al., 2013)), in order to fit the goal of open-domain question answering.
Different KBQA data sets have different levels of requirement for the above open-domain capacity. For example, most of the gold test relations in WebQuestions can be observed during training, thus some prior work on this task adopted the closed-domain assumption as in general RE research. For data sets like SimpleQuestions and ParaLex, however, the capacity to support large relation sets and unseen relations becomes more necessary. To this end, there are two main solutions: (1) use pre-trained relation embeddings (e.g. from TransE (Bordes et al., 2013)), as in (Dai et al., 2016); (2) factorize the relation names into sequences and formulate relation detection as a sequence matching and ranking task. Such factorization works because the relation names usually comprise meaningful word sequences. For example, Yin et al. (2016) split relations into word sequences for single-relation detection. Liang et al. (2016) also achieve good performance on WebQSP with word-level relation representation in an end-to-end neural programmer model. Yih et al. (2015) use character tri-grams as inputs on both question and relation sides. Golub and He (2016) propose a generative framework for single-relation KBQA which predicts relations with a character-level sequence-to-sequence model.
Another difference between relation detection in KBQA and general RE is that general RE research assumes that the two argument entities are both available. Thus it usually benefits from features (Nguyen and Grishman, 2014; Gormley et al., 2015) or attention mechanisms based on the entity information (e.g. entity types or entity embeddings). For relation detection in KBQA, such information is mostly missing because: (1) one question usually contains a single argument (the topic entity) and (2) one KB entity could have multiple types (the type vocabulary size is larger than 1,500). This makes KB entity typing itself a difficult problem, so no previous work used entity information in the relation detection model. 3

Background: Different Granularity in KB Relations
Previous research (Yih et al., 2015;Yin et al., 2016) formulates KB relation detection as a sequence matching problem. However, while the questions are natural word sequences, how to represent relations as sequences remains a challenging problem. Here we give an overview of two types of relation sequence representations commonly used in previous work.
(1) Relation Name as a Single Token (relation-level). In this case, each relation name is treated as a unique token. The problem with this approach is that it suffers from low relation coverage due to the limited amount of training data, and thus cannot generalize well to a large number of open-domain relations. For example, in Figure 1, when treating relation names as single tokens, it will be difficult to match the questions to relation names "episodes written" and "starring roles" if these names do not appear in training data: their relation embeddings $h^r$ will be random vectors and thus are not comparable to question embeddings $h^q$.
(2) Relation as Word Sequence (word-level). In this case, the relation is treated as a sequence of words from the tokenized relation name. It has better generalization, but suffers from the lack of global information from the original relation names. For example in Figure 1(b), when doing only word-level matching, it is difficult to rank the target relation "starring roles" higher compared to the incorrect relation "plays produced". This is because the incorrect relation contains the word "plays", which is more similar to the question (containing the word "play") in the embedding space.
On the other hand, if the target relation co-occurs with questions related to "tv appearance" in training, by treating the whole relation as a token (i.e. relation id), we could better learn the correspondence between this token and phrases like "tv show" and "play on".
Table 1: An example of KB relation (episodes written) with two types of relation tokens (relation names and words), and two questions asking this relation. The topic entity is replaced with token <e> which could give the position information to the deep networks. The italics show the evidence phrase for each relation token in the question.
The two types of relation representation contain different levels of abstraction. As shown in Table 1, the word-level representation focuses more on local information (words and short phrases), while the relation-level representation focuses more on global information (long phrases and skip-grams) but suffers from data sparsity. Since both levels of granularity have their own pros and cons, we propose a hierarchical matching approach for KB relation detection: for a candidate relation, our approach matches the input question to both word-level and relation-level representations to get the final ranking score. Section 4 gives the details of our proposed approach.

Improved KB Relation Detection
This section describes our hierarchical sequence matching with residual learning approach for relation detection. In order to match the question to different aspects of a relation (with different abstraction levels), we address the following three problems in learning question/relation representations.

Relation Representations from Different Granularity
We provide our model with both types of relation representation: word-level and relation-level. Therefore, the input relation becomes $r = \{r^{word}_1, \cdots, r^{word}_{M_1}\} \cup \{r^{rel}_1, \cdots, r^{rel}_{M_2}\}$, where the first $M_1$ tokens are words (e.g. {episode, written}), and the last $M_2$ tokens are relation names, e.g., {episode written} or {starring roles, series} (when the target is a chain like in Figure 1(b)). We transform each token above to its word embedding, then use two BiLSTMs (with shared parameters) to get their hidden representations $[B^{word}_{1:M_1} : B^{rel}_{1:M_2}]$ (each row vector $\beta_i$ is the concatenation between forward/backward representations at $i$). We initialize the relation sequence LSTMs with the final state representations of the word sequence, as a back-off for unseen relations. We apply one max-pooling on these two sets of vectors and get the final relation representation $h^r$.
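As an illustration of the two token granularities, the following minimal Python sketch builds the word-level and relation-level token lists for a relation or a relation chain. The function names and the assumption that relation names are stored with underscores between words are hypothetical and only serve the example; the embedding and BiLSTM steps are not shown.

```python
from typing import List, Tuple


def relation_tokens(chain: List[str]) -> Tuple[List[str], List[str]]:
    """Split a relation (or relation chain) into word-level and relation-level tokens.

    `chain` holds one relation name per hop, e.g. ["starring_roles", "series"].
    The underscore separator is an assumption about how names are stored.
    """
    # Word-level tokens: every word of every relation name, in order.
    word_tokens = [word for name in chain for word in name.split("_")]
    # Relation-level tokens: each whole relation name is a single token.
    rel_tokens = [name.replace("_", " ") for name in chain]
    return word_tokens, rel_tokens


def relation_input(chain: List[str]) -> List[str]:
    """Concatenate the M1 word tokens with the M2 relation-name tokens."""
    words, rels = relation_tokens(chain)
    return words + rels


if __name__ == "__main__":
    print(relation_input(["episodes_written"]))
    print(relation_input(["starring_roles", "series"]))
```

For a single relation this yields two word tokens followed by one relation-name token; for a two-hop chain the relation-level part contains one token per hop.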

Different Abstractions of Questions Representations
Because relation tokens of different granularity can match question phrases of different lengths (see Table 1), we hope the question representations could also comprise vectors that summarize various lengths of phrase information (different levels of abstraction), in order to match relation representations of different granularity. We deal with this problem by applying deep BiLSTMs on questions. The first-layer BiLSTM works on the word embeddings of the question words $q = \{q_1, \cdots, q_N\}$ and gets hidden representations $\Gamma^{(1)}_{1:N} = [\gamma^{(1)}_1; \cdots; \gamma^{(1)}_N]$. The second-layer BiLSTM works on $\Gamma^{(1)}_{1:N}$ to get the second set of hidden representations $\Gamma^{(2)}_{1:N}$. Since the second BiLSTM starts with the hidden vectors from the first layer, intuitively it could learn more general and abstract information compared to the first layer.
Note that the first (second) layer of question representations does not necessarily correspond to the word (relation)-level relation representations; instead, either layer of question representations could potentially match either level of relation representations. This raises the difficulty of matching between different levels of relation/question representations; the following section gives our proposal to deal with this problem. Note that without the dotted arrows of shortcut connections between the two layers, the model will only compute the similarity between the second layer of question representations and the relation, and thus is not doing hierarchical matching.

Hierarchical Matching between Relation and Question
Now we have question contexts of different lengths encoded in the first- and second-layer representations $\Gamma^{(1)}_{1:N}$ and $\Gamma^{(2)}_{1:N}$. Unlike the standard usage of deep BiLSTMs that employs the representations in the final layer for prediction, here we expect that the two layers of question representations can be complementary to each other and both should be compared to the relation representation space (Hierarchical Matching). This is important for our task since each relation token can correspond to phrases of different lengths, mainly because of syntactic variations. For example in Table 1, the relation word "written" could be matched to either the same single word in the question or a much longer phrase "be the writer of".
We could perform the above hierarchical matching by computing the similarity between each layer of question representations ($\Gamma^{(1)}_{1:N}$, $\Gamma^{(2)}_{1:N}$) and $h^r$ separately and taking the (weighted) sum of the two scores. However this does not give significant improvement (see Table 2). Our analysis in Section 6.2 shows that this naive method suffers from training difficulty, as evidenced by the fact that the converged training loss of this model is much higher than that of a single-layer baseline model. This is mainly because (1) deep BiLSTMs do not guarantee that the two levels of question hidden representations are comparable, so the training usually falls into local optima where one layer has good matching scores and the other always has weight close to 0; and (2) the training of deeper architectures itself is more difficult.
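To overcome the above difficulties, we adopt the idea from Residual Networks for hierarchical matching by adding shortcut connections between two BiLSTM layers. We propose two ways of such Hierarchical Residual Matching: (1) Connecting each pair of first- and second-layer hidden vectors $\gamma^{(1)}_i$ and $\gamma^{(2)}_i$, resulting in $\gamma'_i = \gamma^{(1)}_i + \gamma^{(2)}_i$ for each position $i$; the final question representation $h^q$ then becomes a max-pooling over all $\gamma'_i$, $1 \le i \le N$. (2) Applying max-pooling on $\Gamma^{(1)}_{1:N}$ and $\Gamma^{(2)}_{1:N}$ to get $h^{(1)}_{max}$ and $h^{(2)}_{max}$, respectively, then setting $h^q = h^{(1)}_{max} + h^{(2)}_{max}$. Finally we compute the matching score of $r$ given $q$ as $s_{rel}(r; q) = \cos(h^r, h^q)$.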
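The matching step can be sketched in plain Python as follows. This is a minimal illustration, not the actual implementation: it assumes that the shortcut connection is an element-wise sum of the two layers' hidden vectors at each position, followed by max-pooling over positions, and it uses small hand-written vectors in place of real BiLSTM outputs. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def max_pool(rows: List[Vector]) -> Vector:
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*rows)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def residual_question_vector(layer1: List[Vector], layer2: List[Vector]) -> Vector:
    """Sum the two layers position by position, then max-pool over positions."""
    summed = [[a + b for a, b in zip(g1, g2)] for g1, g2 in zip(layer1, layer2)]
    return max_pool(summed)


def match_score(h_r: Vector, layer1: List[Vector], layer2: List[Vector]) -> float:
    """Cosine similarity between the relation vector and the pooled question vector."""
    return cosine(h_r, residual_question_vector(layer1, layer2))


if __name__ == "__main__":
    # Toy hidden states for a two-word question with 2-d vectors (made-up numbers).
    layer1 = [[1.0, 0.0], [0.0, 2.0]]
    layer2 = [[0.5, 0.5], [1.0, -1.0]]
    print(residual_question_vector(layer1, layer2))
    print(match_score([1.5, 1.0], layer1, layer2))
```

Pooling each layer separately and summing the two pooled vectors would be the other natural variant; only the body of `residual_question_vector` would change.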
Intuitively, the proposed method should benefit from hierarchical training since the second layer is fitting the residues from the first layer of matching, so the two layers of representations are more likely to be complementary to each other. This also ensures the vector spaces of two layers are comparable and makes the second-layer training easier.
During training we adopt a ranking loss to maximize the margin between the gold relation $r^+$ and other relations $r^-$ in the candidate pool $R$:
$l_{rel} = \max\{0, \gamma - s_{rel}(r^+; q) + s_{rel}(r^-; q)\}$
where $\gamma$ is a constant margin parameter.
Remark: Another way of hierarchical matching consists in relying on an attention mechanism, e.g. (Parikh et al., 2016), to find the correspondence between different levels of representations. This performs below the HR-BiLSTM (see Table 2).
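A margin-based ranking loss of this kind can be written in a few lines of Python. The sketch below assumes the usual hinge form, in which the gold relation should outscore each negative relation by at least a margin; the default margin value and the choice to sum over all negatives in the pool are assumptions made for illustration.

```python
from typing import Iterable


def ranking_loss(pos_score: float, neg_scores: Iterable[float], margin: float = 0.5) -> float:
    """Sum of hinge terms max(0, margin - s(r+) + s(r-)) over the negative relations.

    `margin` is a hypothetical default, not a value taken from the paper.
    """
    return sum(max(0.0, margin - pos_score + s_neg) for s_neg in neg_scores)


if __name__ == "__main__":
    # The first negative is already separated by more than the margin and adds nothing;
    # the second one violates the margin and contributes to the loss.
    print(ranking_loss(0.9, [0.2, 0.6], margin=0.5))
```

Only negatives that score within the margin of the gold relation produce a non-zero term, so well-separated candidates do not affect the gradient.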

KBQA Enhanced by Relation Detection
This section describes our KBQA pipeline system. We make minimal efforts beyond the training of the relation detection model, making the whole system easy to build. Following previous work (Yih et al., 2015; Xu et al., 2016), our KBQA system uses an existing entity linker to produce the top-$K$ linked entities, $EL_K(q)$, for a question $q$ ("initial entity linking"). Then we generate the KB queries for $q$ following the four steps illustrated in Algorithm 1.

Algorithm 1: KBQA with two-step relation detection
Input: Question $q$, Knowledge Base KB, the initial top-$K$ entity candidates $EL_K(q)$
Output: Top query tuple $(\hat{e}, \hat{r}, \{(c, r_c)\})$
1. Entity Re-Ranking (first-step relation detection): Use the raw question text as input for a relation detector to score all relations in the KB that are associated to the entities in $EL_K(q)$; use the relation scores to re-rank $EL_K(q)$ and generate a shorter list $EL'_{K'}(q)$ containing the top-$K'$ entity candidates (Section 5.1)
2. Relation Detection: Detect relation(s) using the reformatted question text in which the topic entity is replaced by a special token <e> (Section 5.2)
3. Query Generation: Combine the scores from steps 1 and 2, and select the top pair $(\hat{e}, \hat{r})$ (Section 5.3)
4. Constraint Detection (optional): Compute similarity between $q$ and any neighbor entity $c$ of the entities along $\hat{r}$ (connected by a relation $r_c$), and add the high-scoring $c$ and $r_c$ to the query (Section 5.4)
Compared to previous approaches, the main difference is that we have an additional entity re-ranking step after the initial entity linking. We have this step because we have observed that entity linking sometimes becomes a bottleneck in KBQA systems. For example, on SimpleQuestions the best reported linker could only get 72.7% top-1 accuracy on identifying topic entities. This is usually due to the ambiguities of entity names, e.g. in Fig 1(a), there are a TV writer and a baseball player both named "Mike Kelley", who are impossible to distinguish with entity name matching alone.
Having observed that different entity candidates usually connect to different relations, here we propose to help entity disambiguation in the initial entity linking with relations detected in questions.
Sections 5.1 and 5.2 elaborate how our relation detection helps to re-rank entities in the initial entity linking, and how those re-ranked entities then enable more accurate relation detection. The KBQA end task, as a result, benefits from this process.

Entity Re-Ranking
In this step, we use the raw question text as input for a relation detector to score all relations in the KB with connections to at least one of the entity candidates in $EL_K(q)$. We call this step relation detection on entity set since it does not work on a single topic entity as in the usual setting. We use the HR-BiLSTM as described in Sec. 4. For each question $q$, after generating a score $s_{rel}(r; q)$ for each relation using HR-BiLSTM, we use the top $l$ best-scoring relations ($R^l_q$) to re-rank the original entity candidates. Concretely, for each entity $e$ and its associated relations $R_e$, given the original entity linker score $s_{linker}$ and the score of the most confident relation $r \in R^l_q \cap R_e$, we sum these two scores to re-rank the entities:
$s_{rerank}(e; q) = \alpha \cdot s_{linker}(e; q) + (1 - \alpha) \cdot \max_{r \in R^l_q \cap R_e} s_{rel}(r; q)$
Finally, we select the top $K' < K$ entities according to the score $s_{rerank}$ to form the re-ranked list $EL'_{K'}(q)$.
We use the same example in Fig 1(a) to illustrate the idea. Given the input question in the example, a relation detector is very likely to assign high scores to relations such as "episodes written", "author of" and "profession". Then, according to the connections of entity candidates in the KB, we find that the TV writer "Mike Kelley" will be scored higher than the baseball player "Mike Kelley", because the former has the relations "episodes written" and "profession". This method can be viewed as exploiting entity-relation collocation for entity linking.
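The re-ranking step can be illustrated with the short Python sketch below. It is written under stated assumptions rather than as the actual system: the two scores are combined as a weighted sum with weight `alpha` on the linker score and `1 - alpha` on the best relation score, an entity with no relation among the top-l relations receives a relation score of 0, and every identifier and number in the example is made up.

```python
from typing import Dict, List, Set, Tuple


def rerank_entities(
    linker_scores: Dict[str, float],
    entity_relations: Dict[str, Set[str]],
    relation_scores: Dict[str, float],
    top_l: int,
    top_k: int,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank entity candidates with the scores of relations detected in the question."""
    # The top-l best-scoring relations for the question.
    best = set(sorted(relation_scores, key=relation_scores.get, reverse=True)[:top_l])
    reranked = []
    for entity, s_linker in linker_scores.items():
        shared = best & entity_relations.get(entity, set())
        # Most confident relation the entity shares with the top-l set.
        # Falling back to 0.0 when nothing is shared is an assumption.
        s_rel = max((relation_scores[r] for r in shared), default=0.0)
        reranked.append((entity, alpha * s_linker + (1 - alpha) * s_rel))
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:top_k]


if __name__ == "__main__":
    linker = {"mike_kelley_writer": 0.5, "mike_kelley_baseball": 0.6}
    relations = {
        "mike_kelley_writer": {"episodes_written", "profession"},
        "mike_kelley_baseball": {"position_played"},
    }
    rel_scores = {"episodes_written": 0.9, "author_of": 0.7,
                  "profession": 0.6, "position_played": 0.1}
    print(rerank_entities(linker, relations, rel_scores, top_l=3, top_k=1))
```

In this toy run the baseball player has the higher linker score, but the writer is ranked first because one of his relations matches a relation that scores highly for the question.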

Relation Detection
In this step, for each candidate entity $e \in EL'_{K'}(q)$, we use the question text as the input to a relation detector to score all the relations $r \in R_e$ that are associated to the entity $e$ in the KB. 4 Because we have a single topic entity input in this step, we do the following question reformatting: we replace the candidate $e$'s entity mention in $q$ with a token "<e>". This helps the model better distinguish the relative position of each word compared to the entity. We use the HR-BiLSTM model to predict the score of each relation $r \in R_e$: $s_{rel}(r; e, q)$.
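The question reformatting can be sketched with a small Python helper. The function name, the case-insensitive matching and the choice to replace only the first occurrence are assumptions for illustration, and the example question is made up.

```python
import re


def replace_entity_mention(question: str, mention: str, token: str = "<e>") -> str:
    """Replace the first case-insensitive occurrence of `mention` with `token`.

    The question is returned unchanged if the mention does not occur in it.
    """
    pattern = re.compile(re.escape(mention), flags=re.IGNORECASE)
    # A function is used as the replacement so the token is inserted literally.
    return pattern.sub(lambda _match: token, question, count=1)


if __name__ == "__main__":
    print(replace_entity_mention("what tv episodes were written by Mike Kelley", "mike kelley"))
```

Because the mention is replaced in place, the remaining words keep their positions relative to the entity token.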

Constraint Detection
Similar to (Yih et al., 2015), we adopt an additional constraint detection step based on text matching. Our method can be viewed as entity linking on a KB sub-graph. It contains two steps: (1) Sub-graph generation: given the top-scoring query generated by the previous 3 steps 5 , for each node $v$ (answer node or the CVT node as in Figure 1(b)), we collect all the nodes $c$ connecting to $v$ (with relation $r_c$) with any relation, and generate a sub-graph associated to the original query. (2) Entity linking on sub-graph nodes: we compute a matching score between each n-gram in the input question (without overlapping the topic entity) and the entity name of $c$ (except for the node in the original query) by taking into account the maximum overlapping sequence of characters between them (see Appendix A for details and B for special rules dealing with date/answer type constraints). If the matching score is larger than a threshold $\theta$ (tuned on the training set), we add the constraint entity $c$ (and $r_c$) to the query by attaching it to the corresponding node $v$ on the core-chain.
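The text-matching idea behind step (2) can be illustrated in Python with the standard library's `difflib`. The exact scoring formula is given in Appendix A and is not reproduced here, so the score below (length of the longest common character substring divided by the length of the longer string) is only one plausible stand-in. The n-gram limit, the threshold value and all names in the example are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def ngrams(tokens: List[str], max_n: int = 3) -> List[str]:
    """All word n-grams of the question with 1 <= n <= max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def overlap_score(ngram: str, entity_name: str) -> float:
    """Longest common character substring length, normalised by the longer string."""
    a, b = ngram.lower(), entity_name.lower()
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


def detect_constraints(question_tokens: List[str],
                       neighbours: List[Tuple[str, str]],
                       theta: float = 0.8) -> List[Tuple[str, str]]:
    """Keep (entity name, relation) pairs whose best n-gram score exceeds theta."""
    grams = ngrams(question_tokens)
    return [(name, rel) for name, rel in neighbours
            if max((overlap_score(g, name) for g in grams), default=0.0) > theta]


if __name__ == "__main__":
    tokens = "who played <e> in the lion king movie".split()
    print(detect_constraints(tokens, [("The Lion King", "film"), ("Aladdin", "film")]))
```

A real system would also need the special handling of dates and answer types mentioned above, which this sketch leaves out.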

Task Introduction & Settings
We use the SimpleQuestions (Bordes et al., 2015) and WebQSP (Yih et al., 2016) datasets. Each question in these datasets is labeled with the gold semantic parse. Hence we can directly evaluate relation detection performance independently as well as evaluate on the KBQA end task.

SimpleQuestions (SQ): A single-relation KBQA task. The KB we use consists of a Freebase subset with 2M entities (FB2M) (Bordes et al., 2015), in order to compare with previous research. Yin et al. (2016) also evaluated their relation extractor on this data set and released their proposed question-relation pairs, so we run our relation detection model on their data set. For the KBQA evaluation, we also start with their entity linking results 6 . Therefore, our results can be compared with their reported results on both tasks.
WebQSP (WQ): A multi-relation KBQA task. We use the entire Freebase KB for evaluation purposes. Following Yih et al. (2016), we use S-MART (Yang and Chang, 2015) entity-linking outputs. 7 In order to evaluate the relation detection models, we create a new relation detection task from the WebQSP data set. 8 For each question and its labeled semantic parse: (1) we first select the topic entity from the parse; and then (2) select all the relations and relation chains (length ≤ 2) connected to the topic entity, and set the core-chain labeled in the parse as the positive label and all the others as the negative examples.
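The construction of candidate relation chains for the WebQSP relation detection task can be sketched in Python on a toy KB. The adjacency-list KB format, the entity and relation identifiers, and the function names are all hypothetical; a real Freebase traversal would need additional handling (for example of CVT nodes) that is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Toy KB: entity -> list of (relation, neighbour entity). All names are made up.
KB = Dict[str, List[Tuple[str, str]]]


def candidate_chains(kb: KB, topic: str, max_len: int = 2) -> List[Tuple[str, ...]]:
    """Enumerate distinct relation chains of length <= max_len starting at `topic`."""
    chains = set()
    frontier = [((), topic)]
    for _ in range(max_len):
        next_frontier = []
        for chain, node in frontier:
            for relation, neighbour in kb.get(node, []):
                extended = chain + (relation,)
                chains.add(extended)
                next_frontier.append((extended, neighbour))
        frontier = next_frontier
    return sorted(chains)


def label_candidates(kb: KB, topic: str,
                     gold_chain: Sequence[str]) -> List[Tuple[Tuple[str, ...], int]]:
    """Pair each candidate chain with 1 (gold core-chain) or 0 (negative example)."""
    gold = tuple(gold_chain)
    return [(chain, int(chain == gold)) for chain in candidate_chains(kb, topic)]


if __name__ == "__main__":
    kb = {
        "grant_show": [("starring_roles", "cvt_1"), ("plays_produced", "play_1")],
        "cvt_1": [("series", "swingtown"), ("from", "2008")],
    }
    for chain, label in label_candidates(kb, "grant_show", ["starring_roles", "series"]):
        print(label, chain)
```

Each question then contributes one positive chain and several negative ones, which is the form needed for a ranking loss.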
For both the relation detection experiments and the second-step relation detection in KBQA, we have entity replacement first (see Section 5.2 and Figure 1). All word vectors are initialized with 300-d pretrained word embeddings (Mikolov et al., 2013). The embeddings of relation names are randomly initialized, since existing pre-trained relation embeddings (e.g. TransE) usually support limited sets of relation names. We leave the usage of pre-trained relation embeddings to future work.
6 The two resources have been downloaded from https:

Relation Detection Results
Table 2 shows the results on two relation detection tasks. The AMPCNN result is from (Yin et al., 2016), which yielded state-of-the-art scores by outperforming several attention-based methods. We re-implemented the BiCNN model from (Yih et al., 2015), where both questions and relations are represented with the word hash trick on character tri-grams. The baseline BiLSTM with relation word sequence appears to be the best baseline on WebQSP and is close to the previous best result of AMPCNN on SimpleQuestions. Our proposed HR-BiLSTM outperformed the best baselines on both tasks by margins of 2-3% (p < 0.001 and 0.01 compared to the best baseline BiLSTM w/ words on SQ and WQ respectively). Note that using only relation names instead of words results in a weaker baseline BiLSTM model. The model yields a significant performance drop on SimpleQuestions (91.2% to 88.9%). However, the drop is much smaller on WebQSP, which suggests that unseen relations have a much bigger impact on SimpleQuestions.
Ablation Test: The bottom of Table 2 shows ablation results of the proposed HR-BiLSTM. First, hierarchical matching between questions and both relation names and relation words yields improvement on both datasets, especially for SimpleQuestions (93.3% vs. 91.2/88.8%). Second, residual learning helps hierarchical matching compared to weighted-sum and attention-based baselines (see Section 4.3). For the attention-based baseline, we tried the model from (Parikh et al., 2016) and its one-way variations, where the one-way model gives better results 10 . Note that residual learning significantly helps on WebQSP (80.65% to 82.53%), while it does not help as much on SimpleQuestions. On SimpleQuestions, even removing the deep layers only causes a small drop in performance. WebQSP benefits more from residual and deeper architecture, possibly because in this dataset it is more important to handle a larger scope of context matching.
10 We also tried to apply the same attention method on deep BiLSTM with residual connections, but it does not lead to better results compared to HR-BiLSTM. We hypothesize that the idea of hierarchical matching with attention mechanism may work better for long sequences, and the new advanced attention mechanisms (Wang and Jiang, 2016; Wang et al., 2017) might help hierarchical matching. We leave the above directions to future work.
Finally, on WebQSP, replacing BiLSTM with CNN in our hierarchical matching framework results in a large performance drop. Yet on SimpleQuestions the gap is much smaller. We believe this is because the LSTM relation encoder can better learn the composition of chains of relations in WebQSP, as it is better at dealing with longer dependencies.
Analysis: Next, we present empirical evidence that shows why our HR-BiLSTM model achieves the best scores. We use WebQSP for the analysis. First, we hypothesize that training of the weighted-sum model usually falls into local optima, since deep BiLSTMs do not guarantee that the two levels of question hidden representations are comparable. This is evidenced by the fact that during training one layer usually gets a weight close to 0 and is thus ignored. For example, one run gives us weights of -75.39/0.14 for the two layers (we take the exponential for the final weighted sum). It also gives much lower training accuracy (91.94%) compared to HR-BiLSTM (95.67%), suffering from training difficulty.
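To make the quoted weights concrete, the short Python computation below exponentiates them, as described for the weighted sum. The normalised shares are computed only to illustrate the relative contribution of the two layers and are not part of the model description.

```python
import math

# Learned layer weights quoted in the text, before exponentiation.
raw_weights = (-75.39, 0.14)

# Effective weights used in the weighted sum of the two matching scores.
effective = [math.exp(w) for w in raw_weights]

# Relative contribution of each layer (normalised for illustration only).
share = [w / sum(effective) for w in effective]

if __name__ == "__main__":
    print(effective)
    print(share)
```

The first effective weight is on the order of 1e-33 while the second is about 1.15, so the score of one layer is effectively discarded.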
Second, compared to our deep BiLSTM with shortcut connections, we hypothesize that for KB relation detection, training deep BiLSTMs is more difficult without shortcut connections. Our experiments suggest that a deeper BiLSTM does not always result in higher training accuracy. In the experiments a two-layer BiLSTM converges to 94.99%, even lower than the 95.25% achieved by a single-layer BiLSTM. Since under our setting the two-layer model captures the single-layer model as a special case (so it could potentially better fit the training data), this result suggests that the deep BiLSTM without shortcut connections might suffer more from training difficulty.
Finally, we hypothesize that HR-BiLSTM is more than a combination of two BiLSTMs with residual connections, because it encourages the hierarchical architecture to learn different levels of abstraction. To verify this, we replace the deep BiLSTM question encoder with two single-layer BiLSTMs (both on words) with shortcut connections between their hidden states. This decreases test accuracy to 76.11%. It gives similar training accuracy compared to HR-BiLSTM, indicating a more serious over-fitting problem. This indicates that the residual and deep structures both contribute to the good performance of HR-BiLSTM.

KBQA End-Task Results
Table 3 compares our system with two published baselines: (1) STAGG (Yih et al., 2015), the state-of-the-art on WebQSP 11 , and (2) AMPCNN (Yin et al., 2016), the state-of-the-art on SimpleQuestions. Since these two baselines are specially designed/tuned for one particular dataset, they do not generalize well when applied to the other dataset. In order to highlight the effect of different relation detection models on the KBQA end-task, we also implemented another baseline that uses our KBQA system but replaces HR-BiLSTM with our implementation of AMPCNN (for SimpleQuestions) or the char-3-gram BiCNN (for WebQSP) relation detectors (second block in Table 3).
Compared to the baseline relation detector (3rd row of results), our method, which includes an improved relation detector (HR-BiLSTM), improves the KBQA end task by 2-3% (4th row). Note that in contrast to previous KBQA systems, our system does not use joint inference or a feature-based re-ranking step; nevertheless, it still achieves results better than or comparable to the state-of-the-art.
The third block of the table details two ablation tests for the proposed components in our KBQA systems: (1) Removing the entity re-ranking step significantly decreases the scores. Since the re-ranking step relies on the relation detection models, this shows that our HR-BiLSTM model contributes to the good performance in multiple ways.
11 The STAGG score on SQ is from (Bao et al., 2016).

                                                      Accuracy
System                                                SQ      WQ
STAGG                                                 72.8    63.9
AMPCNN (Yin et al., 2016)                             76.4    -
Baseline: Our Method w/ baseline relation detector    75.1    60.0
Our Method                                            77.0    63.0
  w/o entity re-ranking                               74.9    60.6
  w/o constraints                                     -       58.0
Our Method (multi-detectors)                          78.7    63.9
Table 3: KBQA results on SimpleQuestions (SQ) and WebQSP (WQ) test sets. The numbers in green color are directly comparable to our results since we start with the same entity linking results.
Appendix C gives the detailed performance of the re-ranking step.
(2) In contrast to the conclusion in (Yih et al., 2015), constraint detection is crucial for our system 12 . This is probably because our joint performance on topic entity and core-chain detection is more accurate (77.5% top-1 accuracy), leaving a huge potential (77.5% vs. 58.0%) for the constraint detection module to improve.
Finally, like STAGG, which uses multiple relation detectors (see Yih et al. (2015) for the three models used), we also try to use the top-3 relation detectors from Section 6.2. As shown on the last row of Table 3, this gives a significant performance boost, resulting in a new state-of-the-art result on SimpleQuestions and a result comparable to the state-of-the-art on WebQSP.

Conclusion
KB relation detection is a key step in KBQA and is significantly different from general relation extraction tasks. We propose a novel KB relation detection model, HR-BiLSTM, that performs hierarchical matching between questions and KB relations. Our model outperforms the previous methods on KB relation detection tasks and allows our KBQA system to achieve state-of-the-art results. For future work, we will investigate the integration of our HR-BiLSTM into end-to-end systems. For example, our model could be integrated into the decoder in (Liang et al., 2016), to provide better sequence prediction. We will also investigate newly emerging datasets like GraphQuestions (Su et al., 2016) and ComplexQuestions (Bao et al., 2016) to handle more characteristics of general QA.