Open Domain Question Answering based on Text Enhanced Knowledge Graph with Hyperedge Infusion

The incompleteness of the knowledge base (KB) is a vital factor limiting the performance of question answering (QA). This paper proposes a novel QA method that leverages text information to enhance an incomplete KB. The model enriches entity representations through the semantic information contained in the text and employs graph convolutional networks to update the entity states. Furthermore, to exploit the latent structural information of the text, we treat each document as a hyperedge connecting the entities mentioned in it, complementing the deficient relations in the KB, and hypergraph convolutional networks are further applied to reason over the hypergraph-formed text. Extensive experiments on the WebQuestionsSP benchmark with different KB settings prove the effectiveness of our model.


Introduction
Open domain question answering (QA) is a challenging task that requires answering factual questions in natural language. According to the structure of the supporting information, QA systems can be divided into knowledge-based QA (KBQA) (Bordes et al., 2015) and text-based QA (TBQA) (Welbl et al., 2018; Yang et al., 2018). KBQA obtains answers from a structured knowledge base, which is easy to query and reason with but limited by the incompleteness of its well-designed triples. TBQA's supporting information is plain text, which contains rich semantic and latent structural information but is difficult for a machine to understand. These complementary properties inspire us to fuse the two kinds of data to enhance the incomplete KB and further improve the QA system's performance.
Figure 1: Example of a question with its related KB and text. The KB is incomplete for answering the question: it lacks the relation "graduated university" and the entity "University of Washington". By completing the missing information with text added as a hyperedge, we can handle the question more effectively.

Some work has already been proposed. Das et al. (2017) represent KB and text using universal schema and apply memory networks, but lack the association between KB and text. Sun et al. (2018) build a heterogeneous graph with entities and text as nodes and employ a graph-based method. Xiong et al. (2019) first encode entities in the KB by graph attention networks and then read the text with the help of accumulated entity knowledge. Although good results have been achieved, the text information is not fully utilized, especially the relational information among the entities contained in the text. Figure 1 shows an example in which the KB is insufficient to answer the question; the question can be answered adequately by using the structural information of the text to introduce high-level relationships.
In this paper, we propose a novel QA model based on a text enhanced knowledge graph, which enriches entity representations with the semantic information of the text and complements the relations in the KB through the structural information of the text. Specifically, the model first encodes the entities in the KB together with text information and applies graph convolutional networks (GCN) (Wu et al., 2020) to reason across the KB. Since a document usually mentions multiple entities, we convert the unstructured text into a structured hypergraph by regarding each document as a hyperedge connecting the entities mentioned in it, and then employ hypergraph convolutional networks (HGCN) (Feng et al., 2019; Yadati et al., 2019) to further update the entity states. Finally, the model predicts the final answers.
Our highlights are summarized as follows: 1) We propose the novel view of documents as high-order relations (hyperedges) connecting the entities mentioned in them. 2) We apply hypergraph convolutional networks to reason and propose a dual-step attention to capture the importance of different entities and documents. 3) Extensive experiments conducted on the widely used WebQuestionsSP (Yih et al., 2016) with different KB settings demonstrate that our model is effective.

Related Work
The combination of knowledge base and text in QA is a challenging task which has attracted many researchers' attention. The work of Das et al. (2017) extends universal schema to question answering and employs Key-Value Memory Networks to process text and KB. Sun et al. (2018) regard documents as heterogeneous nodes and combine them with entities in the KB to form a uniform graph. The model proposed by Xiong et al. (2019) contains a graph-attention based KB reader and a knowledge-aware text reader. Other work focuses on retrieving a small graph that contains just the question-related information (Sun et al., 2019) and on the interpretability of QA over KB and text (Sydorova et al., 2019). These methods do not consider the high-order relationships among the entities contained in the text. This paper regards the text as hyperedges and further employs hypergraph convolutional networks.
Hypergraph convolutional networks (Feng et al., 2019; Yadati et al., 2019) utilize a hypergraph structure rather than a general graph to fully represent the high-order correlations among data, and hypergraph attention further enhances the ability of representation learning with an attention module.

Task Definition
To maintain consistency and fairness, we adopt the same setting as Sun et al. (2018), which builds a subgraph for each question. Specifically, given a question $q = (w_1, w_2, \ldots, w_{|q|})$, the related sub knowledge graph $\mathcal{K} = (\mathcal{V}, \mathcal{E}, \mathcal{T})$ is extracted by Personalized PageRank (Haveliwala, 2002), where $\mathcal{V}$ is the entity set, $\mathcal{E}$ is the relation set, and $\mathcal{T}$ contains a set of triples $(v_h, r, v_t)$ indicating there is a relation $r \in \mathcal{E}$ between $v_h \in \mathcal{V}$ and $v_t \in \mathcal{V}$. A relevant text corpus $\mathcal{D} = \{d_1, d_2, \ldots, d_{|\mathcal{D}|}\}$ is also retrieved from Wikipedia by an off-the-shelf document retriever (Chen et al., 2017), where $d_i = (w_1, w_2, \ldots, w_{|d_i|})$ represents a document and the entities mentioned in documents have been linked. The task requires extracting answers from all KB and document entities. The overview of our model is shown in Figure 2.
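For concreteness, the per-question input described above can be organized as in the following minimal sketch; the field names are illustrative, not the authors' actual data schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class QuestionInstance:
    question: List[str]                  # question tokens w_1 ... w_|q|
    entities: List[str]                  # entity set V of the retrieved sub-KB
    relations: List[str]                 # relation set E
    triples: List[Tuple[int, int, int]]  # (head, relation, tail) index triples in T
    documents: List[List[str]]           # retrieved corpus D, one token list per d_i
    doc_entity_links: List[List[int]]    # indices of entities linked in each document
    answers: List[int]                   # gold answer entity indices (training only)
```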

Input Encoder
Query and Text Encoder: Let $X_q \in \mathbb{R}^{|q| \times n}$ and $X_d \in \mathbb{R}^{|d| \times n}$ be the embedding matrices of the query $q$ and a document $d \in \mathcal{D}$, where $n$ is the embedding dimension. Bi-LSTM networks (Hochreiter and Schmidhuber, 1997) are applied to encode the query and document separately, yielding hidden states $H_q \in \mathbb{R}^{|q| \times h}$ and $H_d \in \mathbb{R}^{|d| \times h}$, where $h$ is the hidden dimension of the bi-LSTM. We then compute the representations of the query, $h_q$, and the document, $h_d$, with an attention mechanism:

$$h_q = H_q^T \, \mathrm{softmax}\big(f_q(H_q)\big), \qquad h_d = H_d^T \, \mathrm{softmax}\big(f_d(H_d H_q^T)\big),$$

where $T$ represents matrix transposition, $f_q$ is a linear network which converts the $h$ dimension to 1 dimension, and $f_d$ converts the $|q|$ dimension to 1 dimension.
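A minimal PyTorch sketch of this encoder, assuming the attentive-pooling form reconstructed above (class and parameter names are ours); the query-conditioned document attention via $f_d$ follows the same pattern, with scores computed from $H_d H_q^T$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnPoolEncoder(nn.Module):
    """Bi-LSTM encoder with attentive pooling (a sketch, not the authors' code)."""
    def __init__(self, n_emb: int, h: int):
        super().__init__()
        # bidirectional LSTM whose concatenated output size is h
        self.lstm = nn.LSTM(n_emb, h // 2, bidirectional=True, batch_first=True)
        self.f_q = nn.Linear(h, 1)                 # token score: h -> 1

    def forward(self, x):                          # x: (batch, seq_len, n_emb)
        H, _ = self.lstm(x)                        # H: (batch, seq_len, h)
        alpha = F.softmax(self.f_q(H), dim=1)      # attention over tokens
        pooled = (alpha * H).sum(dim=1)            # h_q = H^T softmax(f_q(H))
        return H, pooled
```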
KB Encoder: Each entity $v \in \mathcal{V}$ is initialized by its pre-trained knowledge graph embedding $x_v \in \mathbb{R}^{n \times 1}$. Each relation is initialized by a semantic vector and its KG embedding. Specifically, for a relation $r \in \mathcal{E}$ with KG embedding $x_r \in \mathbb{R}^{n \times 1}$, we tokenize it as $r = (w_1, w_2, \ldots, w_{|r|})$ and feed it into a bi-LSTM layer with word embeddings to get the hidden states $H_r \in \mathbb{R}^{|r| \times h}$, then calculate the representation $h_r$ as follows:

$$h_r = f_{r2}\Big(\big[\, H_r^T \, \mathrm{softmax}\big(f_{r1}([H_r; \mathbf{1} h_q^T])\big) \,;\, h_q \,;\, x_r \,\big]\Big),$$

where $[\,;\,]$ denotes column-wise concatenation, $\mathbf{1} \in \mathbb{R}^{|r| \times 1}$ broadcasts the query vector over the relation tokens, $f_{r1}$ is a linear network which converts the $2h$ dimension to 1 dimension, and $f_{r2}$ converts the $2h+n$ dimension to the $h$ dimension.
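A sketch of the relation encoder under the same assumptions, reusing the AttnPoolEncoder from above; fusing the pooled semantic vector with $h_q$ and $x_r$ reflects our reading of the stated $2h+n$ input size of $f_{r2}$, not a confirmed detail.

```python
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    """Encodes a relation from its tokenized name plus its KG embedding
    (a sketch; the exact fusion is our reconstruction)."""
    def __init__(self, n_emb: int, h: int):
        super().__init__()
        self.text_enc = AttnPoolEncoder(n_emb, h)  # bi-LSTM + attentive pooling
        self.f_r2 = nn.Linear(2 * h + n_emb, h)    # (2h + n) -> h

    def forward(self, rel_tokens, h_q, x_r):
        # rel_tokens: (1, |r|, n_emb); h_q: (1, h) query vector; x_r: (1, n_emb)
        _, sem = self.text_enc(rel_tokens)         # pooled semantic vector: (1, h)
        return self.f_r2(torch.cat([sem, h_q, x_r], dim=-1))  # h_r: (1, h)
```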

Reasoning over Text Enhanced Knowledge Graph
This component utilizes text information to improve the incomplete KB by enriching entity representations and adding hyperedges, and applies GCN and HGCN for reasoning.
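The hyperedge construction itself is simple: each retrieved document becomes one hyperedge over the entities linked in it. A minimal sketch, assuming entity links are given as per-document index lists (function and variable names are ours):

```python
import torch

def build_incidence(num_entities: int, doc_entity_links) -> torch.Tensor:
    """Build the entity-hyperedge incidence matrix H of shape (|V|, |D|),
    where H[v, d] = 1 iff entity v is mentioned in document d."""
    H = torch.zeros(num_entities, len(doc_entity_links))
    for d, linked in enumerate(doc_entity_links):
        for v in linked:
            H[v, d] = 1.0
    return H
```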

GCN for Entity-Enriched KB:
To utilize the rich semantic information contained in the text, we construct a binary matrix $M$, where $M_v^d \in \mathbb{R}^{|d| \times 1}$ indicates the span of entity $v$ in document $d$, and pass information from documents to entities to form the text-aware entity representation $\tilde{x}_v$, which is concatenated with $x_v$ to give the initial node state $h_v^{(0)}$:

$$\tilde{x}_v = \sum_{d \in \mathcal{D}} H_d^T M_v^d, \qquad h_v^{(0)} = [x_v; \tilde{x}_v].$$

The model then learns the entity representation by aggregating the features of connected entities:

$$h_v^{(l_1)} = f_a\Big(\Big[\, W_1 h_v^{(l_1-1)} \,;\, \sigma\Big(\sum_{(r, v') \in N_v} W_2 [h_r; h_{v'}^{(l_1-1)}]\Big) \Big]\Big),$$

where $W_1 \in \mathbb{R}^{h \times h}$ and $W_2 \in \mathbb{R}^{h \times 2h}$ are learnable parameters, $N_v$ represents the adjacent triple set of entity $v$, $f_a$ converts the $2h$ dimension to the $h$ dimension, $l_1$ indexes the current GCN layer out of $L_1$ layers in total, and $\sigma$ is the sigmoid function.
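A PyTorch sketch of one such layer, following the reconstructed update above and batching the sum over $N_v$ with `index_add_` (module names are ours):

```python
import torch
import torch.nn as nn

class KBGCNLayer(nn.Module):
    """One GCN layer over the sub-KB (a sketch of the reconstructed update)."""
    def __init__(self, h: int):
        super().__init__()
        self.W1 = nn.Linear(h, h, bias=False)      # self transform: h -> h
        self.W2 = nn.Linear(2 * h, h, bias=False)  # message transform: 2h -> h
        self.f_a = nn.Linear(2 * h, h)             # combine self and neighborhood

    def forward(self, h_v, h_r, triples):
        # h_v: (|V|, h) entity states; h_r: (|E|, h) relation representations;
        # triples: LongTensor (num_triples, 3) of (head, relation, tail) indices.
        heads, rels, tails = triples[:, 0], triples[:, 1], triples[:, 2]
        msgs = self.W2(torch.cat([h_r[rels], h_v[heads]], dim=-1))
        agg = torch.zeros_like(h_v).index_add_(0, tails, msgs)   # sum over N_v
        return self.f_a(torch.cat([self.W1(h_v), torch.sigmoid(agg)], dim=-1))
```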
HGCN for Hypergraph-Formed Text: The model regards the plain text as hyperedges connecting the entities mentioned in it, to complement the missing relations in the KB. HGCN is employed to encode the hypergraph-formed text, and a dual-step attention captures the importance of different entities and documents. Formally, at layer $l_2$, the model first transfers the entity features to the connected hyperedges to form the document representations, and then propagates the document representations back to the entities:

$$h_d^{(l_2)} = \sigma\Big(W_5 \sum_{v \in d} \alpha_{v} h_v^{(l_2-1)}\Big), \qquad h_v^{(l_2)} = \sigma\Big(W_6 \sum_{d \ni v} \beta_{d} h_d^{(l_2)}\Big),$$

where $W_5, W_6 \in \mathbb{R}^{h \times h}$ are learnable parameters and $\alpha_v$, $\beta_d$ are the two attention distributions, over the entities in a hyperedge and over the hyperedges around an entity, respectively.
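A sketch of one HGCN layer with dual-step attention over the incidence matrix built earlier; scoring both attention steps against the question vector $h_q$ is our reading of "dual-step attention", not a confirmed detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HGCNLayer(nn.Module):
    """Entities -> hyperedges (documents) -> entities, with attention at
    both steps (a sketch; the scoring function is our assumption)."""
    def __init__(self, h: int):
        super().__init__()
        self.W5 = nn.Linear(h, h, bias=False)
        self.W6 = nn.Linear(h, h, bias=False)

    def forward(self, h_v, H_inc, h_q):
        # h_v: (|V|, h); H_inc: (|V|, |D|) incidence matrix; h_q: (h,)
        # Step 1: attend over the entities inside each hyperedge.
        scores_v = (h_v @ h_q).unsqueeze(0).expand_as(H_inc.t())   # (|D|, |V|)
        a_v = F.softmax(scores_v.masked_fill(H_inc.t() == 0, -1e9), dim=1)
        h_d = torch.sigmoid(self.W5(a_v @ h_v))                    # (|D|, h)
        # Step 2: attend over the hyperedges around each entity.
        scores_d = (h_d @ h_q).unsqueeze(0).expand_as(H_inc)       # (|V|, |D|)
        a_d = F.softmax(scores_d.masked_fill(H_inc == 0, -1e9), dim=1)
        return torch.sigmoid(self.W6(a_d @ h_d))                   # updated h_v
```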

Answer Prediction
After $L_1$ GCN layers and $L_2$ HGCN layers, the model finally predicts the probability of each entity being the answer:

$$p_v = \sigma\big(f_{out}(h_v^{(L_1+L_2)})\big),$$

where $f_{out}$ converts the $h$ dimension to 1 dimension.
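A sketch of this readout; the sigmoid output pairs with the binary cross-entropy loss reported in the training details (module names are ours).

```python
import torch
import torch.nn as nn

class AnswerPredictor(nn.Module):
    """Scores every entity as a candidate answer (a sketch)."""
    def __init__(self, h: int):
        super().__init__()
        self.f_out = nn.Linear(h, 1)   # h -> 1

    def forward(self, h_v):            # h_v: (|V|, h) final entity states
        return torch.sigmoid(self.f_out(h_v)).squeeze(-1)  # P(v is an answer)
```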

Dataset
In our experiments we adopt the dataset preprocessed by Sun et al. (2018). Table 1 shows the statistics of the dataset and the retrieved subgraphs for the questions, including KB and linked text. In particular, the average number of linked entities in the documents is 4.6, which illustrates the rationality of adopting hyperedges.

Baseline Methods
We compare our methods with the following models:
• KVMemNet (Miller et al., 2016) is an end-to-end memory network which stores KB facts and text as key-value pairs.
• GraftNet (Sun et al., 2018) combines KB and text with the early fusion strategy and applies a graph-based model.
• SG-KA Reader (Xiong et al., 2019) proposes two components to reason over the KB and incorporate entity information into the text.
• PullNet (Sun et al., 2019) is a QA framework that learns to retrieve a small subgraph related to answering the question.

Training Details
The model is implemented in PyTorch (Paszke et al., 2019) and trained on one Nvidia Tesla P40 GPU. We apply 100-dimensional TransE embeddings (Bordes et al., 2013) for entities and relations, and 300-dimensional GloVe embeddings (Pennington et al., 2014) for question and text words. The numbers of words in questions and documents are limited to 10 and 50, respectively. The hidden size is set to 100. We select the hyperparameter values by manual tuning for the best results on the validation set. The dropout rate is 0.2, and the batch size is 8.
The numbers of GCN layers ($L_1$) and HGCN layers ($L_2$) are 1 and 2, respectively. The average runtime for one epoch is 5 minutes, and we set the maximum number of epochs to 200. The number of parameters is 69 million. The Adam optimizer (Kingma and Ba, 2015) is applied to minimize the binary cross-entropy loss with a learning rate of 0.0005. The threshold for F1 is set to 0.05.
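For concreteness, a runnable sketch of this optimization setup; a placeholder linear layer stands in for the full network and random tensors replace real batches.

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 1)                       # placeholder for the full model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
loss_fn = nn.BCELoss()                          # binary cross-entropy

h_v = torch.randn(32, 100)                      # dummy final entity states
labels = torch.randint(0, 2, (32,)).float()     # dummy 0/1 answer labels
probs = torch.sigmoid(model(h_v)).squeeze(-1)   # per-entity answer probability
loss = loss_fn(probs, labels)
loss.backward()
optimizer.step()
preds = probs > 0.05                            # the reported F1 threshold
```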

Results
Main Results: The metrics adopted in the experiments are Hits@1, the accuracy of the top answer predicted by the model, and F1, which reflects the ability to predict all answers. As shown in Table 2, we experiment with our model under the KB-only, Text-only, and KB+Text settings and compare it with the baseline methods. Our model obtains competitive performance in the KB-only setting and achieves the best results in the other two settings; in particular, in the Text-only setting, Hits@1 and F1 are 1.9% and 1.8% higher than the second-best method respectively, which shows the validity of treating documents as hyperedges. The promising performance may inspire similar tasks that build plain text into a hypergraph and apply efficient HGCN. In the KB+Text setting, our method also achieves the best performance, proving that the proposed enhancement strategy can effectively enhance an incomplete KB by fully introducing the semantic and structural information implied in the text. In particular, our model improves more over its KB-only counterpart than the work of Sun et al. (2018) does, which demonstrates that treating documents as hyperedges is more productive than regarding them as heterogeneous nodes.

Results with Incomplete KB: The F1 scores demonstrate the ability of our method for multi-answer prediction. After combining the text, our model achieves the best results compared with the baseline methods, and the performance in the KB+Text setting is significantly improved over the KB-only setting. The more incomplete the KB, the more obvious the performance improvement, which shows that the model can effectively use the document information to complete and enhance the KB, and thereby further improve the performance of QA. To visualize the improvement intuitively, Figure 3 displays the increment of all models after adding text under different settings (KB+Text − KB-only). We observe that our method achieves the largest or almost the largest increment. Moreover, we notice that the text information improves the performance markedly when the KB is incomplete, but may cause extra interference when the KB is sufficient to answer the questions, which can even lead to performance degradation. This makes us consider how to effectively use text to further improve question answering under the full KB setting.

Ablation Study: To verify the effectiveness of each component of the model, Table 4 shows the results under the 10% KB setting. From the second and third rows, the attention mechanism adopted by the model is effective, especially the dual-step attention proposed at the HGCN layer, which brings a 1.2% improvement in Hits@1. The strategy of entity-enriched KB also increases Hits@1 by 0.9%, proving its validity.

Conclusion
We propose a QA method that aims to enhance an incomplete KB with text information, fully exploiting the semantic and latent structural information in the text. In particular, the text is treated as hyperedges to complement the missing relations in the KB. The model first applies GCN to encode the entity-enriched KB, then employs HGCN to further reason over the hypergraph-formed text, and finally predicts the answers. Experimental results on the WebQuestionsSP benchmark prove the effectiveness of our model and each of its components.