Learn to Combine Linguistic and Symbolic Information for Table-based Fact Verification

Table-based fact verification requires both linguistic reasoning and symbolic reasoning. Existing methods pay little attention to combining linguistic information and symbolic information. In this work, we propose HeterTFV, a graph-based reasoning approach that learns to combine linguistic information and symbolic information effectively. We first construct a program graph to encode programs, a kind of LISP-like logical form, and learn the semantic compositionality of the programs. Then we construct a heterogeneous graph that incorporates both linguistic information and symbolic information by introducing program nodes. Finally, we propose a graph-based reasoning approach that reasons over the multiple types of nodes to combine both types of information effectively. Experimental results on TABFACT, a large-scale benchmark dataset, demonstrate the effectiveness of our approach.


Introduction
Fact verification aims to verify whether a fact is entailed or refuted by given evidence, and has attracted increasing attention. Recent research mainly focuses on unstructured text as evidence and ignores evidence in structured or semi-structured formats. The recently proposed TABFACT dataset (Wenhu Chen and Wang, 2020) fills this gap: it is designed for the table-based fact verification problem, namely, verifying whether a statement is correct given semi-structured table evidence.
It is well accepted that symbolic information (such as count and only) plays a great role in understanding statements grounded in semi-structured evidence (Wenhu Chen and Wang, 2020). However, most existing approaches to fact verification (Thorne et al., 2018; Nie et al., 2019; Zhong et al., 2020b; Soleimani et al., 2020) focus on the understanding of natural language, namely, linguistic reasoning, but fail to consider symbolic information, which plays an important role in complex reasoning (Liang et al., 2017; Dua et al., 2019). Due to the diversity of natural language expressions, it is difficult to capture symbolic information directly from natural language. Consequently, how to leverage symbolic information effectively becomes a crucial problem. To alleviate this problem, Zhong et al. (2020a) propose a graph module network that concatenates graph-enhanced linguistic-level representations and program-guided symbolic-level representations to predict the labels. However, their method focuses on the representation of symbolic information rather than on taking advantage of the combination of both types of information. More specifically, we believe that concatenating the two types of representations is not effective enough to leverage linguistic information and symbolic information for reasoning.
In recent studies, graph neural networks have shown their powerful ability in dealing with semi-structured data (Bogin et al., 2019a; Bogin et al., 2019b). Under this consideration, we propose to use graph neural networks to learn to combine linguistic information and symbolic information simultaneously. Since the representations of different types of information fall in different embedding spaces, a heterogeneous graph structure is suitable for reasoning and aggregating over different types of nodes to combine different types of information.

Figure 1: Example from the TABFACT dataset, which requires combining both the linguistic information in the statement and the table and the symbolic information in the programs. Given a table and a statement, the goal is to predict whether the label is ENTAILED or REFUTED. A program is a kind of LISP-like logical form. The program synthesis and selection process is described in Section 3.2.
In this paper, we propose a heterogeneous graph-based neural network for table-based fact verification, named HeterTFV, which learns to combine linguistic information and symbolic information. Given a statement and a table, we first generate programs with the latent program algorithm (LPA) proposed by Wenhu Chen and Wang (2020). After that, we construct a program graph to capture the inner structure of each program and use a gated graph neural network to encode the programs and learn their semantic compositionality. Then a heterogeneous graph is constructed with statement nodes, table nodes, and program nodes to incorporate both linguistic information and symbolic information, which is expected to exploit the structure of the table and build connections among the statement, table, and programs. Finally, a graph-based neural network is proposed to reason over the constructed heterogeneous graph, enabling message passing among different types of nodes so as to combine linguistic information and symbolic information.
We conduct experiments on TABFACT (Wenhu Chen and Wang, 2020), a large-scale benchmark dataset for table-based fact verification. Experimental results show that our model outperforms all baselines and achieves state-of-the-art performance.
In summary, the main contributions of this paper are three-fold:
• We construct a heterogeneous graph by introducing program nodes, to incorporate both linguistic information and symbolic information.
• We propose a graph-based approach to reason over the constructed heterogeneous graph to perform different types of message passing processes, which makes an effective combination of linguistic information and symbolic information.
• Experimental results on the TABFACT dataset illustrate the advantage of our proposed heterogeneous graph-based approach: our model outperforms all the baseline systems and achieves a new state-of-the-art performance.

Task Definition
Given a statement and a table, our goal is to verify whether the textual statement is entailed or refuted by the given semi-structured table evidence. We evaluate our model on the TABFACT dataset (Wenhu Chen and Wang, 2020), a large-scale benchmark dataset for table-based fact verification. Figure 1 shows an example from the TABFACT dataset. Specifically, as shown in Figure 1, we use programs to represent symbolic information; a program is a kind of LISP-like logical form proposed by Wenhu Chen and Wang (2020), and the program synthesis and selection process is described in Section 3.2. To verify the statement in the example, our model is expected to combine the linguistic information in the statement and the table with the symbolic information in the programs, such as count, and, and so on.

Figure 2: Architecture of the proposed graph-based approach. Taking the statement, function names (such as eq, count, filter_eq, and so on), and the table as input, our approach first constructs a program graph to encode programs, and then constructs a heterogeneous graph among different types of nodes. Finally, a graph-based reasoning network is applied on the constructed heterogeneous graph. Blue, yellow, and purple nodes stand for table nodes, statement nodes, and program nodes, respectively.

HeterTFV
In this paper, we propose a graph-based approach for the table-based fact verification task, which takes advantage of both linguistic information and symbolic information. We first generate programs from the statement and table, construct a program graph, and encode each program in a bottom-up manner to learn semantic compositionality (Section 3.2). After that, we construct a heterogeneous graph with statement, table, and program nodes to enable reasoning over multiple types of nodes (Section 3.3). Finally, we propose a graph-based neural network to reason over the constructed heterogeneous graph (Section 3.4). Figure 2 shows the architecture of the proposed graph-based approach.

Context Encoding
Pre-trained models have shown strong text understanding ability across a great many NLP tasks. In this work, we use pre-trained BERT (Devlin et al., 2018) as the backbone of our approach. Given the statement and the table, to make our input suitable for the pre-trained BERT model, following Wenhu Chen and Wang (2020), we first linearize the table into a sequence with a template-based method (denoted as the table sequence). Then we feed the statement, the function names (such as eq, count, filter_eq, and so on, which are pre-defined in the LPA algorithm (Wenhu Chen and Wang, 2020)), and the table sequence into the pre-trained BERT model to obtain contextual representations.
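For concreteness, the following is a minimal sketch of this input construction, assuming HuggingFace's BertTokenizer; the template wording in linearize_table and the concrete function-name string are our illustrative assumptions, not the exact templates of Wenhu Chen and Wang (2020).

```python
from transformers import BertTokenizer

def linearize_table(header, rows):
    """Flatten a table into a sequence, row by row (template is illustrative)."""
    parts = []
    for i, row in enumerate(rows):
        cells = [f"{col} is {val}" for col, val in zip(header, row)]
        parts.append(f"row {i + 1} : " + " ; ".join(cells))
    return " . ".join(parts)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

statement = "there be more than 5 number of week for 50073 attendance"
function_names = "eq count filter_eq filter_greater and"  # pre-defined in LPA
table_seq = linearize_table(["week", "attendance"], [["1", "50073"], ["2", "48113"]])

# The statement, function names, and table sequence share one BERT input,
# truncated to the 512-token maximum used in our experiments.
inputs = tokenizer(statement + " [SEP] " + function_names, table_seq,
                   truncation=True, max_length=512, return_tensors="pt")
```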

Program Graph Construction and Encoding
Program Synthesis and Selection A program is a kind of logical form that conveys symbolic information. In this work, we use the LPA proposed by Wenhu Chen and Wang (2020) to perform the program synthesis and selection step. They define a plausible API set of roughly 50 different functions, such as min, max, count, average, filter, and, and so on. Each program is tree-structured. Following their setting, each case can synthesize up to 50 candidate programs. A Transformer-based two-way encoder is then trained in a weakly supervised manner, where all label-consistent programs are regarded as true programs, and is used to rank the candidate programs. Due to the issue of spurious programs, namely programs that are wrong but happen to return the correct result, we choose the top-5 predicted programs. The selected programs are used throughout the rest of the paper.

Figure 3: (a) The program graph. (b) The heterogeneous graph, whose edge types include table-to-table, statement-to-table, table-to-statement, program-to-statement, and program-to-table. We omit some columns in the table and the edges starting from program2 to program4 for brevity.
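A minimal sketch of the selection step; the scorer interface is hypothetical (in the paper it is the weakly supervised Transformer-based two-way encoder).

```python
def select_programs(candidates, scorer, k=5):
    """Rank synthesized candidate programs and keep the top-k, mitigating
    spurious programs (the scorer is assumed to return higher = better)."""
    return sorted(candidates, key=scorer, reverse=True)[:k]

# Hypothetical usage; `len` is a stand-in for the trained ranking model.
top5 = select_programs(["eq { count { ... } ; 5 }", "greater { ... }"], scorer=len)
```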
Program Graph Construction Given a program, we construct a program graph in a bottom-up manner to learn the semantic compositionality of the program. Specifically, we treat each function name and argument as a node in the graph, and then add edges between each argument and its corresponding function name. Figure 3(a) shows the structure of the program graph.
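As an illustration of this construction, the following sketch parses a program string in the LISP-like syntax into nodes and bottom-up (argument-to-function) edges; the parser is our own simplification, not the authors' code.

```python
import re

def parse_program_graph(program):
    """Parse a LISP-like program, e.g.
    'eq { count { filter_greater { all_rows ; attendance ; 50073 } } ; 5 }',
    into nodes and bottom-up (argument -> function) edges."""
    tokens = re.findall(r"[^{}\s;]+|[{};]", program)
    nodes, edges, stack = [], [], []  # stack holds indices of open function nodes
    for tok in tokens:
        if tok == "{":
            stack.append(len(nodes) - 1)  # the preceding token was a function name
        elif tok == "}":
            stack.pop()
        elif tok != ";":
            nodes.append(tok)
            if stack:  # edge from each argument/sub-function up to its function
                edges.append((len(nodes) - 1, stack[-1]))
    return nodes, edges
```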
Program Encoding We use a gated graph neural network (Li et al., 2015) to encode programs and learn the structure inside them. Since all programs are tree-structured, propagating along the edges in a bottom-up manner allows the model to learn the semantic compositionality of the programs. We denote a program as $p = \{V, E\}$, where $V = \{v_1, \ldots, v_n\}$ and $E = \{e_1, \ldots, e_m\}$ stand for the nodes and edges, respectively. The gated graph neural network is designed as follows:

$$a_i^{(l)} = \sum_{e_{j,i} \in E^{(i)}} W_g \left[ h_i^{(l-1)} \,\|\, h_j^{(l-1)} \right], \qquad h_i^{(l)} = \mathrm{GRU}\!\left(a_i^{(l)}, h_i^{(l-1)}\right),$$

where $h_i$ is the representation of the i-th node in the program graph, initialized from the output of the BERT encoder, $E^{(i)}$ represents the edges that contain node $v_i$, $l$ indicates the l-th layer of the gated graph neural network, $W_g$ is a learned parameter, and $\|$ stands for the concatenation operation. Finally, we obtain the final representation of the program with a max aggregator:

$$p = \max_{i}\, h_i^{(L)},$$

where the max is taken element-wise over all node representations of the final layer $L$.
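The following PyTorch sketch mirrors the reconstructed update above; the hidden size, number of layers, and the use of nn.GRUCell are our assumptions.

```python
import torch
import torch.nn as nn

class ProgramEncoder(nn.Module):
    """Gated graph encoder over a program graph (a sketch, not the authors'
    exact implementation; hidden size and layer count are assumptions)."""
    def __init__(self, hidden=768, layers=2):
        super().__init__()
        self.layers = layers
        self.w_g = nn.Linear(2 * hidden, hidden, bias=False)
        self.gru = nn.GRUCell(hidden, hidden)

    def forward(self, h, edges):
        # h: (num_nodes, hidden) node states initialized from BERT
        # edges: (src, dst) pairs, arguments pointing up to their functions
        src = torch.tensor([s for s, _ in edges])
        dst = torch.tensor([d for _, d in edges])
        for _ in range(self.layers):
            msg = self.w_g(torch.cat([h[dst], h[src]], dim=-1))  # per-edge message
            agg = torch.zeros_like(h).index_add_(0, dst, msg)    # sum into targets
            h = self.gru(agg, h)                                 # gated node update
        return h.max(dim=0).values  # element-wise max -> program representation
```

For example, feeding the nodes and edges produced by parse_program_graph above, together with BERT-initialized node states, yields a single vector per program.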

Heterogeneous Graph Construction
Given the statement, table, and selected programs, we construct a heterogeneous graph to perform graph-based reasoning. We treat each statement token, table token, and program as a graph node. Our graph exploits the structural information in the table and the connections among the table, statement, and programs. Figure 3(b) shows an example of our constructed graph. Our heterogeneous graph is constructed in the following steps (a construction sketch in code follows the list):
• Each table node is connected to its corresponding column node.
• All table nodes in the same row are fully-connected to each other.
• Statement nodes are connected to the table nodes that have the same content.
• Program nodes are connected to the nodes in the table and statement that appear in the program.
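As referenced above, here is a construction sketch under simplifying assumptions: exact-string matching for content links and a flat node-indexing scheme are our illustrations, not the authors' implementation.

```python
def build_hetero_edges(header, rows, statement_tokens, programs):
    """Build the typed edge list following the four rules above."""
    cell_id, nid = {}, len(header)            # column nodes take ids 0..len-1
    for r, row in enumerate(rows):
        for c, _ in enumerate(row):
            cell_id[(r, c)] = nid
            nid += 1
    stmt_id = {t: nid + i for i, t in enumerate(statement_tokens)}
    prog_id = [nid + len(statement_tokens) + k for k in range(len(programs))]

    edges = []
    for (r, c), v in cell_id.items():         # rule 1: cell -> its column node
        edges.append((v, c, "table-to-table"))
    for r in range(len(rows)):                # rule 2: same-row cells, fully connected
        ids = [cell_id[(r, c)] for c in range(len(rows[r]))]
        edges += [(a, b, "table-to-table") for a in ids for b in ids if a != b]
    for (r, c), v in cell_id.items():         # rule 3: statement/table content match
        if rows[r][c] in stmt_id:
            edges.append((stmt_id[rows[r][c]], v, "statement-to-table"))
            edges.append((v, stmt_id[rows[r][c]], "table-to-statement"))
    for k, prog in enumerate(programs):       # rule 4: program mentions
        for t, s in stmt_id.items():
            if t in prog:
                edges.append((prog_id[k], s, "program-to-statement"))
        for (r, c), v in cell_id.items():
            if rows[r][c] in prog:
                edges.append((prog_id[k], v, "program-to-table"))
    return edges
```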
In our heterogeneous graph, the representations of statement nodes and table nodes are obtained from the output of the BERT encoder, while the representations of program nodes are obtained by program encoding (Section 3.2). The first three types of edges are also employed in Zhong et al. (2020a), aiming to exploit the structure in the table and to build connections between the table and the statement. The last type of edge is expected to build connections between the programs and the table, and between the programs and the statement. Based on the constructed heterogeneous graph, we categorize the edges into five classes, namely table-to-table, statement-to-table, table-to-statement, program-to-statement, and program-to-table, where table-to-table means the edge goes from a table node to a table node and the other classes are named analogously. Different types of edges correspond to different message passing processes, which will be introduced in Section 3.4.

Reasoning over Heterogeneous Graph
Node initialization We get the initial representation of each node from the output of the pre-trained BERT model. For statement nodes and table nodes, each node representation is initialized by averaging the representations of corresponding tokens generated by the pre-trained BERT model. For program nodes, the representations are generated by the program encoder introduced in Section 3.2.
Heterogeneous Graph Layer Given the constructed heterogeneous graph and the representations of the nodes in the graph, we use a graph attention network (GAT) (Veličković et al., 2017) to obtain node representations by encoding graph-structural information, which is designed as follows:

$$\alpha_{i,j} = \frac{\exp\!\big((W_q h_i)^{\top} W_k h_j\big)}{\sum_{k \in \mathcal{N}(i)} \exp\!\big((W_q h_i)^{\top} W_k h_k\big)}, \qquad u_i = W_a \Big[ h_i \,\Big\|\, \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} W_v h_j \Big],$$

where $W_a, W_q, W_k, W_v$ are learned parameters, $h_i, h_j$ are the representations of the i-th and j-th nodes (initialized as described above), $\mathcal{N}(i)$ denotes the neighbors of the i-th node, $\alpha_{i,j}$ is the attention weight between the i-th node and the j-th node, and $\|$ denotes the concatenation operation.
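A PyTorch sketch of one such attention step for a single edge type, following the reconstruction above; the dense adjacency mask and single-head formulation are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGAT(nn.Module):
    """One attention-based message passing step over one edge type (a sketch)."""
    def __init__(self, hidden=768):
        super().__init__()
        self.w_q = nn.Linear(hidden, hidden, bias=False)
        self.w_k = nn.Linear(hidden, hidden, bias=False)
        self.w_v = nn.Linear(hidden, hidden, bias=False)
        self.w_a = nn.Linear(2 * hidden, hidden)

    def forward(self, h, adj_mask):
        # h: (N, hidden); adj_mask: (N, N) bool, True iff node j is a
        # neighbor of node i under this edge type
        scores = self.w_q(h) @ self.w_k(h).t()
        scores = scores.masked_fill(~adj_mask, float("-inf"))
        alpha = torch.nan_to_num(F.softmax(scores, dim=-1))  # rows w/o neighbors -> 0
        ctx = alpha @ self.w_v(h)                            # attended neighborhood
        return self.w_a(torch.cat([h, ctx], dim=-1))
```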
In this work, we perform table-to-table, statement-to-table, table-to-statement, program-to-statement, and program-to-table message passing processes. The first type aims to exploit the structure in the table. The second and third types are expected to build connections between the table and the statement. The last two types are expected to leverage the programs to build connections between the programs and the statement and table.
We use a GAT network for all the above message passing processes. For the i-th node, we obtain $u_i^{tt}, u_i^{st}, u_i^{ts}, u_i^{ps}, u_i^{pt}$, respectively. Then we aggregate information as follows to map the representations from different sources into the same embedding space:

$$u_i^{T} = \mathrm{ReLU}\!\big(W_t u_i^{tt} + W_s u_i^{st} + W_p u_i^{pt} + b\big), \qquad u_i^{S} = \mathrm{ReLU}\!\big(W_t u_i^{ts} + W_p u_i^{ps} + b\big),$$

where $u_i^{T}$ is the representation of the i-th table node, $u_i^{S}$ is the representation of the i-th statement node, and $W_t, W_s, W_p, b$ are learned parameters. The superscripts $tt, st, ts, ps, pt$ stand for table-to-table, statement-to-table, table-to-statement, program-to-statement, and program-to-table, respectively.
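A sketch mirroring this aggregation; the exact parameter-sharing scheme across edge types is our reconstruction and may differ from the original.

```python
import torch
import torch.nn as nn

class HeterAggregate(nn.Module):
    """Map per-edge-type messages into one embedding space (a sketch)."""
    def __init__(self, hidden=768):
        super().__init__()
        self.w_t = nn.Linear(hidden, hidden, bias=False)
        self.w_s = nn.Linear(hidden, hidden, bias=False)
        self.w_p = nn.Linear(hidden, hidden, bias=False)
        self.b = nn.Parameter(torch.zeros(hidden))

    def forward(self, u_tt, u_st, u_pt, u_ts, u_ps):
        # table nodes aggregate table-to-table, statement-to-table, program-to-table
        u_table = torch.relu(self.w_t(u_tt) + self.w_s(u_st) + self.w_p(u_pt) + self.b)
        # statement nodes aggregate table-to-statement and program-to-statement
        u_stmt = torch.relu(self.w_t(u_ts) + self.w_p(u_ps) + self.b)
        return u_table, u_stmt
```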
Besides, we add a residual connection for table nodes and statement nodes to avoid gradient vanishing as follows:

$$h_i^{(l)} = u_i^{(l)} + h_i^{(l-1)},$$

where $l$ indicates the l-th heterogeneous graph layer.
After the iteration of several layers, we perform mean pooling over all statement nodes and table nodes to get the final aggregated representation, followed by a one-layer MLP to get the final prediction.
$$\hat{y} = \mathrm{softmax}\!\big(W \big[\bar{h}^{S} \,\|\, \bar{h}^{T}\big] + b\big),$$

where $W, b$ are trainable parameters, and $\bar{h}^{S}$ and $\bar{h}^{T}$ stand for the mean-pooled representations of the statement nodes and table nodes, respectively.
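A minimal sketch of this prediction head; the concatenation order and two-way softmax output are assumptions consistent with the reconstruction above.

```python
import torch
import torch.nn as nn

hidden = 768
classifier = nn.Linear(2 * hidden, 2)  # one-layer MLP over pooled features

def predict(h_stmt, h_table):
    # h_stmt: (num_statement_nodes, hidden); h_table: (num_table_nodes, hidden)
    pooled = torch.cat([h_stmt.mean(dim=0), h_table.mean(dim=0)])
    return torch.softmax(classifier(pooled), dim=-1)  # ENTAILED vs. REFUTED
```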

Dataset and Experimental Settings
We evaluate our model on TABFACT (Wenhu Chen and Wang, 2020), a large-scale table-based fact verification dataset, which is a collection of 16k Wikipedia tables and 118k human-annotated natural language statements (92,283 samples in the training set, 12,792 in the development set, and 12,779 in the test set). Each statement is labeled as either ENTAILED or REFUTED. Samples in this dataset are expected to require combining linguistic information and symbolic information. We use accuracy to evaluate our model. The test set is further divided into a simple channel and a complex channel to distinguish difficulty, with 4,171 and 8,608 samples respectively. Besides, a small test set with 2k samples is held out for human evaluation.

We start with the BERT-base model (Devlin et al., 2018) in all of our experiments. The maximum sequence length is 512, and the batch size is 6. We set the hidden size to 768, the same as the output of the BERT-base model. We adopt the cross-entropy loss as our loss function, and the learning rate is 1e-5.
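For reference, the settings above collected as a configuration sketch; the backbone identifier is an assumption, and the optimizer is omitted because this section does not specify it.

```python
config = {
    "backbone": "bert-base-uncased",  # assumed identifier for BERT-base
    "max_seq_length": 512,
    "batch_size": 6,
    "hidden_size": 768,
    "learning_rate": 1e-5,
    "loss": "cross_entropy",
}
```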

Baselines
In this section, we describe the baseline systems used in our experiments.

Table-BERT Table-BERT (Wenhu Chen and Wang, 2020) views the table-based fact verification problem as a two-sequence binary classification problem, leveraging the pre-trained BERT model to encode the linearized tables and statements.

Latent Program Algorithm (LPA) LPA (Wenhu Chen and Wang, 2020) formulates the table-based fact verification as a weakly supervised learning problem by synthesizing latent programs and ranking the candidate programs with a Transformer-based two-way encoder (Vaswani et al., 2017).
LogicalFactChecker LogicalFactChecker (Zhong et al., 2020a) derives a program for the statement in a semantic parsing manner and proposes a program-driven module network to exploit the hierarchical structure of the program. Among the reported variants, Voting means assigning each program equal weight and voting for the result; Weighted-Voting computes a weighted-sum score for each program and votes for the result; Ranking means using the result of the top-ranked program; and Seq2Action means the programs are generated by a sequence-to-action approach (Chen et al., 2018; Iyer et al., 2018; Guo et al., 2018; Zhong et al., 2020a).

Table 1 shows our results on the TABFACT dataset (Wenhu Chen and Wang, 2020). Our model achieves state-of-the-art performance, with 72.3% accuracy on the test set. In particular, compared with Table-BERT, our model gains an improvement of 7.2% accuracy on the test set, which shows the effect of symbolic information and of reasoning over multiple types of information. Besides, our model outperforms LogicalFactChecker by 0.7% on the test set while employing the same program synthesis method (programs from LPA), which indicates that the proposed heterogeneous graph-based model combines different types of information more effectively than the concatenation operation alone.

Ablation Study
Effect of model components We examine the effect of each model component by removing them sequentially. Table 2 shows the results; all ablation experiments are conducted on the development set. We first remove the program encoding process and instead perform mean pooling over the function names and arguments in each program as its representation. The result drops by 0.7%, which shows that the proposed program encoder is effective. Then we remove all program information, namely we perform reasoning only on the table nodes and statement nodes. The performance of our model drops by 1.6%, which shows that the combination of linguistic information and symbolic information performs better than either type alone. Next, we remove the residual connection, and the result drops slightly, by 0.5%. We then remove graph-based reasoning, namely we concatenate the representations of all types of nodes directly; the result drops by 1.2%, which indicates that the proposed heterogeneous graph and graph-based reasoning network are both beneficial. Finally, we remove all the modules above, namely we simply feed the concatenated statement tokens and linearized table tokens into the pre-trained BERT model. Under this circumstance, the result drops by 2.2%. All the ablation studies above illustrate that both the program information and the proposed graph-based reasoning approach are effective for combining linguistic information and symbolic information on the table-based fact verification task.

Effect of message passing We conduct ablation experiments on the development set to better understand the different types of message passing processes. The results are shown in Table 3. We first remove the table-to-table message passing process; the result drops by 1.0%, which shows the importance of the structure in the table. Then we remove the statement-to-table message passing process, and the result drops by 0.6%. Removing the table-to-statement message passing process makes the result drop slightly, by 0.2%, which indicates that the connections between the table and the statement also have some effect. Next, we remove the program-to-statement message passing process, and the result drops by 0.8%. Finally, we remove the program-to-table message passing process, and the result drops by 0.5%. The drops from removing the program-to-statement and program-to-table message passing processes indicate that the combination of linguistic information and symbolic information plays a great role in our model. Overall, all types of message passing contribute, which supports the rationality of our constructed heterogeneous graph and the effectiveness of the proposed graph-based reasoning approach for combining different types of information.

Case study Figure 4 shows an example from our experiments. We omit some rows and columns in the table and some programs for simplicity. To verify the statement there be more than 5 number of week for 50073 attendance, the key issue is to understand more than in the statement. Among our selected programs, the first program contains the filter_greater function, which helps the model understand more than in the statement. Besides, the first selected program represents the meaning of the statement according to the definition of the latent program algorithm (LPA). The program encoding process captures the structure and learns the semantic compositionality of the program, which is beneficial for verifying the statement.
Although the second selected program does not fully represent the meaning of the statement, some keywords in it are still helpful for verifying the statement.

Error Analysis
We analyze the error examples in the TABFACT dataset (Wenhu Chen and Wang, 2020), which can be mainly categorized into three classes.
In our analysis, the first type of error is due to failure to synthesize programs. For the statement the shortest name of any competitor belongs to the skip for spain, we failed to synthesize the corresponding programs, which may be because some entities in the statement are not detected in the entity linking phase. It is hard to incorporate symbolic information without any program.
The second type of error is due to spurious programs, which is especially serious in binary classification tasks such as table-based fact verification. For the statement there be over 15 million us viewer in season 2, we selected four programs. The key to verifying this fact is to compare the number of us viewers in season 2 with 15 million. However, the program eq { count { filter_eq { all_rows ; no in season ; 15 } } ; count { filter_eq { all_rows ; no in season ; 2 } } }, for example, judges whether the number of instances in the table whose season is 15 and the number of instances whose season is 2 are equal, which misleads the model because there is not enough information to perform the comparison operation.
The third type of error is due to the diversity of program expressions: for example, the same statement can correspond to programs in different formats, which makes it difficult to select the intended one.

Fact Verification
The fact verification task aims to verify whether a textual statement is entailed or refuted by given evidence. Most existing fact verification methods focus on unstructured text as evidence. Thorne et al. (2018) release a dataset named FEVER, and several works build on it. Nie et al. (2019) modify ESIM (Chen et al., 2017) and design a neural semantic matching network for fact verification. Zhou et al. (2019) propose to reason and aggregate over claim-evidence pairs with a graph attention network (Veličković et al., 2017). Liu et al. (2020) propose a kernel graph attention network that conducts more fine-grained evidence selection and reasoning. Zhong et al. (2020b) construct a semantic-level graph and present two graph-driven representation learning mechanisms to perform reasoning over the claim-evidence graph.
Recently, there have been some studies on reasoning over semi-structured evidence. Wenhu Chen and Wang (2020) first release the TABFACT dataset, a large-scale dataset for table-based fact verification, together with two baselines: Table-BERT and the latent program algorithm (LPA). In Table-BERT, they convert a table into natural language and apply the pre-trained BERT model to predict the result. In LPA, they first parse the statement into a LISP-like program format to represent its semantics, then rank all candidates with a discriminator built on a Transformer-based two-way encoder (Vaswani et al., 2017). Zhong et al. (2020a) propose a graph-based neural module network that utilizes logical operations for fact checking by presenting a graph-based mask matrix for the self-attention mechanism. Different from their work, which focuses on how to represent symbolic information, we focus on how to combine different types of information. In our work, we propose a heterogeneous graph-based neural network that aims to make an effective combination of linguistic information and symbolic information.

Heterogeneous Graph for NLP
Graph neural networks show superior performance when dealing with structured data. Most existing works focus on homogeneous graphs, in which all nodes have the same type. However, most graphs in the real world contain different types of nodes, namely heterogeneous graphs, and a number of works based on heterogeneous graphs have emerged. Wang et al. (2019) propose a heterogeneous graph neural network based on hierarchical attention, including node-level and semantic-level attention. Linmei et al. (2019) establish a heterogeneous graph among documents, topics, and entities to perform reasoning on the semi-supervised short text classification task. Tu et al. (2019) introduce a heterogeneous graph constructed with documents, entities, and candidates for multi-hop reading comprehension. Wu et al. (2019) propose a relation-aware dual-graph convolutional network to incorporate relation information for heterogeneous knowledge graphs. Wang et al. (2020) propose a heterogeneous graph network for extractive summarization. Shangwen Lv and Hu (2020) construct a heterogeneous graph from heterogeneous external knowledge sources for commonsense question answering. Inspired by the success of the heterogeneous graph-based neural networks above, in this work we construct a heterogeneous graph among the statement, table, and programs to enable the combination of multiple types of information for the table-based fact verification task, and propose a graph-based neural network to reason over the constructed heterogeneous graph.

Conclusion
In this work, we focus on the table-based fact verification task and propose HeterTFV, a graph-based approach that learns to combine linguistic information and symbolic information effectively by reasoning over a constructed heterogeneous graph. Specifically, we first construct a program graph to encode programs. Then we construct a heterogeneous graph among tables, statements, and programs to incorporate linguistic information and symbolic information. Finally, we propose a heterogeneous graph-based neural network that performs reasoning to learn to combine both types of information. We conduct experiments on the TABFACT dataset, and the results illustrate that our model outperforms all the baseline systems by a considerable margin and achieves state-of-the-art performance. In the future, we will make more attempts to combine linguistic information and symbolic information on other tasks, such as knowledge graph reasoning or text reasoning.