I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning

CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve performance with distributed representations, without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose an AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a full integrated graph encompassing the Abstract Meaning Representation (AMR) graph generated from the input question and an external commonsense knowledge graph, ConceptNet (CN). The ACP graph is then exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. This paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines.


Introduction
Commonsense is the knowledge shared by the majority of people in a society and acquired naturally in everyday life. Commonsense reasoning is the process of logical inference using commonsense information. The commonsense needed to answer the question "Blowfish requires what specific thing to live?" in Figure 1 is depicted as: "Blowfish is fish", "Fish lives in the water", and "Water includes seas and rivers." An enormous amount of pre-defined commonsense knowledge is available, and people can make inferences using this commonsense as in the following example: "Blowfish is fish." → "Fish lives in the water." → "Water includes seas and rivers." ⇒ "Blowfish lives in the sea." This chain of commonsense reasoning is naturally deduced by humans without substantial difficulty. Whereas people acquire commonsense in their lives, machines cannot learn this knowledge without assistance. A large amount of external knowledge and several reasoning steps are required for machines to learn commonsense. In recent years, various datasets (Zellers et al., 2018; Sap et al., 2019; Zellers et al., 2019) have been constructed to enable machines to reason about commonsense.
CommonsenseQA (Talmor et al., 2019) is one of the most widely researched datasets and is presented in Figure 1 (a). Studies of commonsense reasoning based on this dataset can be categorized into two mainstream approaches. The first approach uses pre-trained language models with distributed representations, which exhibit high performance on most Natural Language Processing (NLP) tasks. However, despite their high performance, these models must be trained with an excessive number of parameters and cannot explain the process of commonsense reasoning. The second approach is reasoning with a commonsense knowledge graph. The generally used commonsense knowledge graph is ConceptNet 5.5, which includes parsed representations from Open Mind Common Sense (OMCS) and other language sources such as WordNet (Bond and Foster, 2013) or DBpedia (Auer et al., 2007). In this approach, the subgraph of ConceptNet corresponding to the question is transformed into node embeddings by a graph encoder. An attention score is computed between the node embeddings and the word vectors from the language model, and the candidate with the highest score is selected as the answer. To learn the commonsense knowledge that is not observed or understood by the language models, the relations from ConceptNet play a critical role in this method. The performance is improved by utilizing relations that are not represented in the text; however, the interpretation of the question remains insufficient. Outside CommonsenseQA, the most commonly used method for this kind of problem is Knowledge-Based Question-Answering (KBQA) (Berant et al., 2013; Yih et al., 2014; Yih et al., 2015), which employs semantic representations. As this method infers the answer from the logical structure of the question using a knowledge base, the question-answering process can be explained in logical form.
In our work, Abstract Meaning Representation (AMR) (Banarescu et al., 2013), one such logical structure, is used to understand the overall reasoning process from the question to the answer. AMR is a graph-based meaning representation that symbolizes the meaning of sentences. AMR illustrates with a graph the "who is doing what to whom" implied in a sentence. The components of these graphs are not words, but rather concepts and their relations. Each concept denotes an event or an entity, and each relation represents the semantic role of the concepts.
In this paper, we enable language models to exploit the AMR graph to understand the logical structure of sentences. However, it is difficult to infer commonsense information with only an AMR graph, owing to its deficiency of commonsense knowledge about the given sentence. For example, in Figure 1 (b), the AMR graph indicates the path of the logical structure of the sentence "What does the blowfish require to live?" (require-01 → purpose → live-01 → ARG0 → blowfish); in other words, the paths from the single AMR graph lack sufficient information to predict the right answer. Therefore, for commonsense reasoning, dynamic interactions between the AMR graph and ConceptNet are indispensable to reach the correct answer.
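Concretely, such an AMR path can be traversed over the graph's (source, relation, target) triples. The following is a minimal sketch, not the authors' implementation; the triples follow Figure 1 (b) and the helper function is hypothetical:

```python
# Minimal sketch: the AMR graph of "What does the blowfish require to live?"
# represented as (source, relation, target) triples, following Figure 1 (b).
amr_triples = [
    ("require-01", "purpose", "live-01"),
    ("live-01", "ARG0", "blowfish"),
]

def find_path(triples, start, goal):
    """Depth-first search for an alternating concept/relation path."""
    adj = {}
    for src, rel, tgt in triples:
        adj.setdefault(src, []).append((rel, tgt))
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for rel, tgt in adj.get(node, []):
            stack.append((tgt, path + [rel, tgt]))
    return None

print(find_path(amr_triples, "require-01", "blowfish"))
# → ['require-01', 'purpose', 'live-01', 'ARG0', 'blowfish']
```

As the paper notes, this path alone ends at blowfish; only after ConceptNet edges such as (blowfish, AtLocation, sea) are merged in can the traversal continue to the answer.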
Thus, we propose a new compact AMR graph expanded with ConceptNet's commonsense relations and then pruned, which we call the ACP graph. The proposed method can interpret the path from the question to the answer by performing commonsense reasoning within the connected graph, such as "Blowfish needs the sea to live." (require-01 → purpose → live-01 → ARG0 → blowfish → AtLocation → sea).
The contributions of our study are as follows.
• We introduce a new graph structure, the ACP graph, which is pruned from a full integrated graph encompassing the Abstract Meaning Representation (AMR) graph generated from the input question and an external commonsense knowledge graph, ConceptNet (CN), for commonsense reasoning. This structure is represented in Levi graph (Gross et al., 2013) form to enable relation interpretations.
• We propose a graph-path reasoning framework, using which it is possible to explain the path from the question to the answer in a logical manner based on commonsense reasoning.
• Our path reasoning method exhibits a performance improvement over previous models.
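The Levi graph form mentioned in the first contribution turns each relation label into a node of its own, so that relations can be embedded and attended to just like concepts. A minimal sketch of this conversion (the function name and edge-instance suffix are our illustrative choices, not from the paper):

```python
def to_levi(triples):
    """Convert labeled edges into a Levi graph: each relation label
    becomes its own node sitting between its source and target."""
    edges = []
    for i, (src, rel, tgt) in enumerate(triples):
        rel_node = f"{rel}#{i}"      # unique node per edge instance
        edges.append((src, rel_node))
        edges.append((rel_node, tgt))
    return edges

triples = [("blowfish", "AtLocation", "sea")]
print(to_levi(triples))
# → [('blowfish', 'AtLocation#0'), ('AtLocation#0', 'sea')]
```

Because the relation now occupies a node, a path such as blowfish → AtLocation → sea can be read off (and interpreted) directly from the unlabeled edges.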

The remainder of this paper is organized as follows. In Section 2, we present the entire process of our method in detail. The experimental setup and results are explained in Section 3. A discussion of the proposed model is provided in Section 4, and Section 5 presents the conclusions. Appendix A provides related work, including ConceptNet, previous works on commonsense reasoning, and AMR.

Proposed Method
We propose a commonsense reasoning framework that uses a commonsense knowledge base on the basis of the AMR logical structure. Our framework consists of the AMR graph integrating and pruning module, the language model encoder, and the graph path learning module. As illustrated in Figure 2, we first generate the AMR graph from every question in the CommonsenseQA dataset and integrate all the nodes of the AMR with ConceptNet graphs.
As this AMR-ConceptNet full graph also includes some relations irrelevant to the question, it can misguide the interpretation of the question. For this reason, we suggest a new structure, the ACP graph, pruned according to the relation type. Thereafter, the graph path learning module takes the pruned graph as input and computes the attention score of each path using the Graph Transformer (Cai and Lam, 2019), which results in the whole-graph vector. The graph vector is finally fed into the Transformer (Vaswani et al., 2017), which models the interactions between the AMR and ConceptNet graphs and transforms it into the final graph representation. Meanwhile, the question and candidate answer from the dataset are passed through the language model encoder, producing the language vector. The concatenation of the language and graph vectors forms the final representation that is used to predict the correct answer.
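The final prediction step above, concatenating the language and graph vectors and scoring each candidate answer, can be sketched as follows. This is a toy illustration with made-up dimensions and a hand-set weight vector; the actual model uses learned parameters:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def score_candidates(lang_vecs, graph_vecs, weight):
    """For each candidate answer, concatenate its language vector and
    graph vector, project to a scalar score, then softmax over candidates."""
    scores = []
    for lv, gv in zip(lang_vecs, graph_vecs):
        joint = lv + gv                                  # concatenation
        scores.append(sum(w * x for w, x in zip(weight, joint)))
    return softmax(scores)

# toy example: 3 candidates, 2-dim language and graph vectors each
lang_vecs = [[0.2, 0.1], [0.5, 0.4], [0.1, 0.0]]
graph_vecs = [[0.3, 0.2], [0.9, 0.8], [0.1, 0.1]]
probs = score_candidates(lang_vecs, graph_vecs, weight=[1.0, 1.0, 1.0, 1.0])
print(max(range(3), key=lambda i: probs[i]))
# → 1  (the second candidate has the highest joint score)
```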
In contrast to the other models mentioned in Talmor et al. (2019), which cannot provide interpretable reasons for predicting the correct answer from the question, our proposed method produces reasoning paths that make the model transparent and interpretable. That is, the reasoning paths that have high attention weights from the graph encoder possess potentially accurate information for reasoning. These reasoning paths are depicted in Figure 5.

Figure 3: Two methods of integrating the AMR graph with the ConceptNet graph. Method (a) incorporates all of the nodes from the AMR graph with the ConceptNet graph. In contrast, method (b) removes the ConceptNet nodes that are not connected to the AMR nodes that have ARG0 and ARG1 relations. For instance, we generate an AMR graph from the sentence "What home entertainment equipment requires cable?", and we find every ConceptNet relation that includes the AMR nodes, such as (cable-HasContext-telegraphy), (require-01-RelatedTo-need), and (home-UsedFor-living). We then remove the ConceptNet nodes that are not connected to the AMR nodes that have ARG0 or ARG1 relations. In this example, the deleted nodes are living, house, and game. In addition, as require-01 is a frame node, which does not need to be expanded, need is also removed from the graph.

Graph Integrating and Pruning
As each word plays a certain role as a predicate or an argument in a sentence, the concepts of the AMR graph also carry semantic meanings in the graph structure. Hence, the AMR graph is capable of semantically interpreting questions as paths. Owing to these advantages of the graph structure and the preserved semantic interpretation, we use the AMR graph for extracting the commonsense knowledge graph.
To generate an AMR graph from raw text, we use the pre-trained model of Zhang et al. (2019), an attention-based model that treats AMR parsing as sequence-to-graph transduction. Though most of the AMR graphs are generated properly by the model, they may contain some inevitable errors in the types of relations or concepts.
We suggest effective AMR expansion and pruning rules for commonsense reasoning. We first expand the AMR graph on all nodes with ConceptNet, as illustrated in Figure 3 (a), and then prune away the ConceptNet expansions of the nodes that do not have ARG0 or ARG1 edges. Considering that ARG0 and ARG1 are the two most frequent relations, as shown in Table 1, we prune the full AMR-CN graph into a more compact graph that only retains expansions on ARG0 and ARG1 relations, which we call the ACP graph. This procedure prevents the graph from discovering a tremendous number of paths iteratively. As described in Appendix A, since a frame node is defined as a central point in the AMR graph, like require-01 in Figure 3, combining other ConceptNet relations with the root node may distract the process of path reasoning. Also, the frame node's specific meaning, additionally annotated by a number such as "-01" at the end of the word, differs from the meaning of ConceptNet's node even when the letters are identical. For example, the specific meaning of the frame node "play-11" is "play/perform music" as defined in the PropBank frameset, while ConceptNet's node "play" includes more diverse meanings such as "engage in an activity like game" or "bet or wager". Therefore, we remove the ConceptNet relations and nodes connected to frame nodes. The proposed method is depicted in Figure 3 (b).
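The expansion-and-pruning rules above can be sketched as a simple filter over edge triples. This is an illustrative simplification rather than the authors' implementation; in particular, the frame-node test here is a naive sense-suffix check:

```python
def prune_acp(amr_edges, cn_edges):
    """Keep ConceptNet expansions only for AMR nodes that participate
    in an ARG0 or ARG1 relation, and drop expansions touching frame
    nodes (concepts annotated with a sense id such as 'require-01')."""
    arg_nodes = set()
    for src, rel, tgt in amr_edges:
        if rel in ("ARG0", "ARG1"):
            arg_nodes.update((src, tgt))

    def is_frame(node):
        # naive check: trailing '-NN' sense annotation marks a frame node
        return "-" in node and node.rsplit("-", 1)[-1].isdigit()

    return [
        (s, r, t) for s, r, t in cn_edges
        if (s in arg_nodes or t in arg_nodes)
        and not (is_frame(s) or is_frame(t))
    ]

# example edges from Figure 3
amr = [("require-01", "ARG0", "equipment"), ("require-01", "ARG1", "cable"),
       ("equipment", "mod", "home")]
cn = [("cable", "HasContext", "telegraphy"),
      ("require-01", "RelatedTo", "need"),
      ("home", "UsedFor", "living")]
print(prune_acp(amr, cn))
# → [('cable', 'HasContext', 'telegraphy')]
```

As in the paper's example, the expansion of home is dropped (no ARG0/ARG1 edge) and need is dropped because require-01 is a frame node, leaving only the expansion of cable.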
A graph G = (V, E) consists of a fixed set of nodes V and relation edges E. Following this notation, the AMR graph is denoted as G_AMR = (V_amr, E_amr), and the subgraph of ConceptNet matched with the concepts that are connected to ARG0 and ARG1 is defined as G_CN^{AMR_arg} = (V_amr_arg_cn, E_amr_arg_cn). The ACP graph is then defined as the union of the two:

G_ACP = G_AMR ∪ G_CN^{AMR_arg}    (1)

Language Encoder and Graph Path Learning Module
The proposed method performs commonsense reasoning over the ACP graph and predicts the correct answer with the corresponding inference. Our model receives two types of inputs, text and graph, and converts the semantic representation into a distributed representation. To encode the text input into a distributed representation, the language encoder, a language model pre-trained on a massive corpus, takes an input formalized as "[CLS]+Question+[SEP]+candidate answer." Given the ACP graph from the graph integrating and pruning module, the graph path learning module initializes the concept node vectors as the sum of the concept embeddings using GloVe (Pennington et al., 2014) and absolute position embeddings. Inspired by the work of Cai and Lam (2019), we modify the Graph Transformer to make the model reason over the relation paths of the ACP graph. To let the model recognize the explicit graph paths, we first encode the relation between two concepts into a distributed representation using the relation encoder. The relation encoder identifies the shortest path between two concepts and represents the sequence as a relation vector by employing recurrent neural networks with Gated Recurrent Units (GRU) (Cho et al., 2014):

h_t^f = GRU_f(h_{t-1}^f, sp_t),  h_t^b = GRU_b(h_{t+1}^b, sp_t)    (2)

where sp_t indicates the t-th relation on the shortest path between the two nodes. The final relation encoding r_ij between concepts i and j is the concatenation of the final hidden states from the forward and backward GRU networks:

r_ij = [h_T^f ; h_1^b]    (3)
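The shortest relation path that the relation encoder feeds to the BiGRU can be found with a breadth-first search over the ACP edges. A minimal sketch, where the edge triples and the inverse-relation convention for traversing edges backwards are our illustrative assumptions:

```python
from collections import deque

def shortest_relation_path(edges, src, tgt):
    """BFS for the shortest relation sequence sp_1 ... sp_T between two
    concepts; this sequence is what the BiGRU encodes into r_ij."""
    adj = {}
    for s, rel, t in edges:
        adj.setdefault(s, []).append((rel, t))
        adj.setdefault(t, []).append((rel + "_inv", s))  # traverse both ways
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        node, rels = queue.popleft()
        if node == tgt:
            return rels
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, rels + [rel]))
    return None

edges = [("require-01", "purpose", "live-01"),
         ("live-01", "ARG0", "blowfish"),
         ("blowfish", "AtLocation", "sea")]
print(shortest_relation_path(edges, "require-01", "sea"))
# → ['purpose', 'ARG0', 'AtLocation']
```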
To inject this relation information into the concept representations, we follow the idea of relative position embedding (Shaw et al., 2018; Salton et al., 2017), which introduces an attention score based on both the concept representations and their relation representation. To compute the attention score, we split the relation vector r_ij, passed through a linear layer, into a forward relation encoding r_{i→j} and a backward relation encoding r_{j→i}:

[r_{i→j} ; r_{j→i}] = W_r r_ij    (4)

where W_r is a parameter matrix. This split lets the model consider the bidirectionality of the path.
Thereafter, we compute the attention score considering the concepts and their relations. Note that c_i and c_j are the concept embeddings:

s_ij = (c_i + r_{i→j}) W_q W_k^T (c_j + r_{j→i})^T
     = c_i W_q W_k^T c_j^T + c_i W_q W_k^T r_{j→i}^T + r_{i→j} W_q W_k^T c_j^T + r_{i→j} W_q W_k^T r_{j→i}^T    (5)

The first term in the last line of equation (5) is the original term in the vanilla attention mechanism, which includes the pure contents of the concepts. The second and third terms capture the relation bias with respect to the source and target, respectively. The final term represents the universal relation bias. As a result, the computed attention score updates the concept embeddings while maintaining fully-connected communication (Cai and Lam, 2019). Therefore, concept-relation interactions can be injected into the concept node vectors. The resulting concept representations are summed into the whole-graph vector and fed into the Transformer layers to model the interaction between the AMR and ConceptNet concept representations. The major advantage of this relation-enhanced attention mechanism is that it provides a fully connected view of the input graphs by making use of relational multi-head attention. Since we integrate two different concept types, from the AMR graph and ConceptNet, into a single graph, the model globally recognizes which path has high relevance to the question during interpretation. After obtaining the language and graph vectors, the model concatenates the two, feeds them into the Softmax layer, and selects the correct answer.
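The four-term decomposition in equation (5) can be checked numerically: the four dot products sum exactly to the single undecomposed score. A sketch with the projection matrices omitted (i.e., W_q and W_k taken as identity, which simplifies Cai and Lam's formulation):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relation_enhanced_score(c_i, c_j, r_fwd, r_bwd):
    """Four-term attention score: content-content, the two relation
    biases, and the universal relation bias. Projections omitted."""
    return (dot(c_i, c_j)         # (a) pure concept contents
            + dot(c_i, r_bwd)     # (b) relation bias w.r.t. the source
            + dot(r_fwd, c_j)     # (c) relation bias w.r.t. the target
            + dot(r_fwd, r_bwd))  # (d) universal relation bias

c_i, c_j = [0.1, 0.2], [0.3, 0.4]
r_fwd, r_bwd = [0.5, 0.6], [0.7, 0.8]
s = relation_enhanced_score(c_i, c_j, r_fwd, r_bwd)

# the four terms sum exactly to the undecomposed score (c_i+r_fwd)·(c_j+r_bwd)
undecomposed = dot([a + b for a, b in zip(c_i, r_fwd)],
                   [a + b for a, b in zip(c_j, r_bwd)])
print(round(s, 4))
# → 1.56
```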

Experiments
To show the effectiveness of representing a question using the proposed ACP graph in commonsense reasoning, we conduct four different experiments. We first compare the ACP graph with the ACF graph, which is expanded on all the concepts of the AMR graph, and with graphs that only utilize ConceptNet. In addition, we apply our model to language models that have different encoder structures, showing the performance enhancement presented in Table 4. Moreover, we investigate the efficacy of the proposed method on extended versions of the BERT-base model, such as BERT-large-cased and the BERT model post-trained with OMCS data. Finally, we report the performance of our model on the official test set.

Data and Experimental Setup
The CommonsenseQA dataset consists of 12,102 (v1.11) natural language questions, and each question has five candidate answers provided by Talmor et al. (2019). As predictions on the official test set can be evaluated only through the organizers every two weeks, we re-divide the official training set for experimental efficiency. We split the given training set into new training and test sets. The new training, development, and test sets include 8,500, 1,221, and 1,241 examples, respectively. We use an RTX 8000 GPU for training our model. The parameters for the graph path learning model are identical to those of Cai and Lam (2019).
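The re-split described above can be reproduced with a few lines. The seed and helper name are our own choices, and the official v1.11 training-set size of 9,741 is assumed:

```python
import random

def resplit(train_examples, dev_examples, test_size=1241, seed=42):
    """Re-split the official CommonsenseQA training set into a new
    training set and a held-out test set, keeping the dev set intact.
    Sizes follow the paper: 8,500 train / 1,221 dev / 1,241 test."""
    rng = random.Random(seed)
    shuffled = train_examples[:]
    rng.shuffle(shuffled)
    return shuffled[test_size:], dev_examples, shuffled[:test_size]

# stand-in records with the assumed official v1.11 split sizes
official_train = [{"id": i} for i in range(9741)]
official_dev = [{"id": i} for i in range(1221)]
train, dev, test = resplit(official_train, official_dev)
print(len(train), len(dev), len(test))
# → 8500 1221 1241
```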

Experimental Results
Commonsense reasoning. To demonstrate that our ACP graph is more effective than other graph features, we conduct experiments on diverse graph features that include not only the ACP graph but also the ACF graph. The ACF graph is an integrated graph combining the AMR graph with the ConceptNet subgraph matched with all concepts from the AMR graph. Furthermore, we run experiments using only ConceptNet (CN) in two manners. The first method uses the ConceptNet graph corresponding to all the question tokens separated by spaces, as depicted in Figure 4 (a). As the tokens from the question are not connected initially, the disconnection between the concept nodes may prevent our graph path learning module from reasoning over the CN graph. Therefore, we connect all of the tokens from the question to the root node to let our model perform commonsense reasoning effectively. The second method employs the pruned ConceptNet graph using the logic of the AMR graph, as depicted in Figure 4 (b). The pruned ConceptNet graph includes the subgraph of ConceptNet matched with the tokens that are connected to ARG0 and ARG1 of the AMR. For example, as the concept nodes require, equipment, and cable in Figure 3 (a) are connected with the ARG0 and ARG1 relations, only those concept nodes have ConceptNet relations. Similar to the first method, the tokens from the question are linked with a token relation to the root node. Note that these two methods do not explicitly make use of the AMR graph concepts. The ACF graph is depicted in Figure 3 (a) and is expressed as follows.
Note that the AMR graph is denoted as G_AMR = (V_amr, E_amr) and the subgraph of ConceptNet matched with all AMR concepts is denoted as G_CN^{AMR} = (V_amr_cn, E_amr_cn), so that G_ACF = G_AMR ∪ G_CN^{AMR}. The (1) CN full graph (CF) and (2) CN pruned graph (CP) are illustrated in Figure 4 and defined, respectively, as G_CF = G_token ∪ G_CN^{token} and G_CP = G_token ∪ G_CN^{AMR_arg}. The ConceptNet graph is denoted as G_CN = (V_cn, E_cn), and the subgraph of ConceptNet matched with the question tokens as G_CN^{token} = (V_token_cn, E_token_cn). For the CN pruned graph, the subgraph of ConceptNet matched with the tokens connected to ARG0 and ARG1 is defined as G_CN^{AMR_arg} = (V_amr_arg_cn, E_amr_arg_cn). In addition, V_token denotes the tokens of the sentence separated by spaces, and G_token the graph linking them to the root node.

As indicated in Table 2, whereas BERT fine-tuning alone scores only 51.59% in terms of accuracy, the models with the AMR graph or ConceptNet exceed this result, scoring over 52%. Interestingly, the ACP graph achieves the best score among all graph types. These results demonstrate that the ACF graph and the CF and CP graphs consider all possible paths, including unnecessary ones, and thus obtain insufficient performance. In other words, the ACP graph enables the graph path learning module to find reasonable paths efficiently by ignoring the irrelevant ones. Since the ACP graph provides the best results, the remaining experiments are conducted with the ACP graph feature.

Extensions of BERT. We also demonstrate the effects of the proposed method on improved BERT models, namely BERT-base-cased post-trained with OMCS and BERT-large. Previous studies mostly addressed CommonsenseQA using a post-training approach with OMCS data, a freely available crowd-sourced knowledge base of natural language statements about the world. Because of the high performance of the ACP graph feature in Table 2, we illustrate its effect on the performance of different versions of BERT.
Comparison on different language models. Table 4 presents the results of the comparison experiments on different language models with varying Transformer encoder structures. The input of the language model is formalized as "[CLS]+Question+[SEP]+candidate answer." All of the language models that use our method outperform their own fine-tuned scores, achieving 53.58% with BERT-base, 60.35% with XLNet-base, 51.08% with ALBERT-base, and 70.91% with ELECTRA-base (Clark et al., 2020) on our new test set. This implies that the concept representations obtained from our ACP graph have significant effects and stable generality on CommonsenseQA, regardless of the language model encoder type.

Figure 5: Case study of the question "What home entertainment equipment requires cable?" and candidate answers (a) radio shack, (b) substation, (c) cabinet, (d) television, (e) desk. Figure 5 (a) presents the entire AMR-CN graph, with high-attention-weight paths in BERT-base-cased w/ AMR-CN-pruned graph (ACP) marked in red. Figure 5 (b) displays the heatmap of path attention from source to target words in the path; the details of each path are presented below the heatmap.

Error Analysis
In some cases of failure, our model exhibits the following two problems:

• Difficulty in discriminating hard distractors: All candidate answers in CommonsenseQA include a hard distractor, which shares the same relation with the question. When the hard distractor exists in the ACP graph, the path learning module also uses the paths of the distractor instead of those of the correct answer. This may confuse the model into considering the distractor as the correct answer.
• AMR graph generation error: Since the AMR graph is generated from a pre-trained model, our model is at risk of using an incorrect AMR graph. An incorrectly produced AMR graph may lead the model to an incorrect interpretation and distort the path reasoning process. For example, the AMR graph generated from the question "What can help you with an illness?" is described below.

Case Study
The red edges in Figure 5 present the paths that have high attention weights for the question "What home entertainment equipment requires cable?" In Figure 5 (b), the top four paths with the highest attention weights are described. As opposed to predicting the answers simply with the ConceptNet graph connected to the question, we allow our model to learn the relevant paths inherent in the ACP graph. That is, our graph path learning module with the ACP graph is capable of commonsense reasoning by exploring these paths.

Conclusions and Future Works
We introduce a new commonsense reasoning method using the proposed ACP graph. This method outperforms models that simply learn the ConceptNet graph. Furthermore, our method can explain the answer-inference process by interpreting the logical structure of the sentences within the commonsense reasoning process. Models applying our method exhibit higher performance than the previous models. However, certain problems still remain. Though the relations ARG0 and ARG1 play most of the core roles in the AMR graph, it is still arguable that other choices of relations may lead to better results. Therefore, we will present experimental results according to different pruning rules on the CommonsenseQA task in the future. We also plan to develop an end-to-end learning model that incorporates the AMR generation model and the question-answering model to reduce the error propagation from AMR generation.