Incorporating Syntax and Frame Semantics in Neural Network for Machine Reading Comprehension

Machine reading comprehension (MRC) is one of the most critical yet challenging tasks in natural language understanding (NLU), where both syntactic and semantic information are essential components for text understanding. Surprisingly, jointly modeling syntax and semantics in neural networks for MRC has never been formally reported in the literature. This paper makes the first attempt by proposing a novel Syntax and Frame Semantics model for Machine Reading Comprehension (SS-MRC), which takes full advantage of syntax and frame semantics to obtain richer text representations. Our extensive experimental results demonstrate that SS-MRC outperforms ten state-of-the-art methods on the machine reading comprehension task.


Introduction
Machine Reading Comprehension (MRC) requires machines to read and understand a text passage and answer relevant questions about it. To do so, MRC systems must be able to infer the meaning of the underlying natural language, for which both syntax and semantics are critical.
Traditional MRC methods are feature-based: they first manually craft syntactic or semantic features and then apply a standard machine learning model to identify the best answer. For instance, Sachan and Xing (2016) use the Abstract Meaning Representation (AMR) formalism in a max-margin framework, which focuses only on semantic information. Other studies take both syntax and semantics into account for MRC (Wang et al., 2015; Li et al., 2018). However, these methods rely heavily on manually defined features and are difficult to generalize to other tasks.
Recently, as large QA datasets have become available, neural methods have been proposed for MRC; even without time-consuming feature engineering, they achieve favorable results. In particular, significant progress has been made on MRC tasks by fine-tuning pre-trained general-purpose language models (Devlin et al., 2018; Radford et al., 2018). Despite the success of these neural models, a number of studies have found a large gap between MRC deep learning models and human beings (Wang and Jiang, 2019), as the models may not truly understand natural language text (Mudrakarta et al., 2018). As such, some works employ semantic knowledge in neural networks to facilitate sentence modeling (Guo et al., 2020; Zhang et al., 2018a; Zhang et al., 2018b). However, no work has focused on integrating both syntax and semantics into a neural network for MRC.
Note that FrameNet (Fillmore, 1976; Baker et al., 1998), a widely adopted knowledge base, provides rich schematic scenario representations that can be leveraged to better understand sentences. In this paper, we propose a Syntax and Frame Semantics model for Machine Reading Comprehension (SS-MRC), which fuses syntax and frame semantics into an end-to-end neural model for the MRC task. The key contributions of this work are summarized as follows:
1. To the best of our knowledge, we are the first to explore fusing syntax and frame semantics into an end-to-end neural network for the Machine Reading Comprehension (MRC) task.

Figure 1: An Annotated Sentence with Syntax and Frame Semantics.
2. We propose a novel Syntax and Frame Semantics model for Machine Reading Comprehension (SS-MRC), which takes full advantage of the syntax and frame semantics of every token in a sequence to obtain a richer and more comprehensive representation.
3. Our extensive experimental results demonstrate that our proposed SS-MRC method significantly outperforms ten state-of-the-art methods across two benchmark datasets for the MRC task.

Syntax and Frame Semantics Labeling
We employ Stanford CoreNLP (Manning et al., 2014) to analyze the syntactic structure of every sentence in a given text passage. Figure 1 shows the dependency parse (top part) of an example sentence. Words on the arrows are dependency labels; e.g., the arrow from the word choose to the word they indicates that they is the subject of choose, and the dependency label of they is nsubj, a nominal syntactic subject.
In addition, we employ SEMAFOR (Das et al., 2014) to automatically process sentences with multiple semantic annotations (Kshirsagar et al., 2015). In particular, a Frame (F) is defined as a composition of Lexical Units (LUs) and a set of Frame Elements (FEs). Given a sentence, if a word evokes a frame by matching an LU, it is called a Target (T). Figure 1 provides an example sentence (bottom part) with four Ts, namely choose, make, chocolate cake and chocolate frosting. Each T has its evoked semantic frame right below it, i.e., F_1, F_2, F_3, F_4. For each frame, its corresponding FEs are shown enclosed in the block. For example, T choose evokes the F_1:Choosing frame and has two FEs, Cognizer and Chosen, fulfilled by They and to make a chocolate cake with chocolate frosting respectively. Similarly, the evoked F_2:Manufacturing frame has three FEs, namely Manufacturer, Product and Resource, fulfilled by They, a chocolate cake and chocolate frosting, respectively. From this example, it is clear that both syntactic and frame semantic information are very useful for the MRC task.

The SS-MRC Model
Figure 2 shows our proposed SS-MRC model, consisting of five modules: three input modules (semantics, syntax, and context), one fusion module, and one answer prediction module. We first pack the passage, question, and candidate answer into a sequence x = {x_1, x_2, ..., x_n}. The input module takes the source context x and the external feature texts x_sf, i.e., the syntactic context x_s and the frame semantic context x_f. In particular, the syntactic context x_s is produced by replacing words with their dependency labels, while the frame semantic context x_f is produced by replacing words with frames and frame elements (Guo et al., 2020). Then BERT (Devlin et al., 2018) is employed to encode the source context x into a vector g_x.
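As an illustration, the two feature texts can be derived from token-aligned annotations. The sketch below is a minimal, hypothetical reconstruction: the annotation tuples are toy values loosely following Figure 1, not actual CoreNLP or SEMAFOR output.

```python
# Build the syntactic context x_s and frame semantic context x_f by
# replacing each token with its dependency label / frame annotation.
# The labels below are toy values for illustration only.

def build_feature_contexts(tokens, dep_labels, frame_labels):
    """Return (x_s, x_f): token-aligned syntax and frame contexts."""
    assert len(tokens) == len(dep_labels) == len(frame_labels)
    x_s = list(dep_labels)    # word -> dependency label
    x_f = list(frame_labels)  # word -> frame / frame element
    return x_s, x_f

tokens       = ["They",     "chose",    "to",   "make"]
dep_labels   = ["nsubj",    "root",     "mark", "xcomp"]
frame_labels = ["Cognizer", "Choosing", "O",    "Manufacturing"]

x_s, x_f = build_feature_contexts(tokens, dep_labels, frame_labels)
print(x_s)  # ['nsubj', 'root', 'mark', 'xcomp']
print(x_f)  # ['Cognizer', 'Choosing', 'O', 'Manufacturing']
```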
After that, the Syntax and Frame Semantics Fusion module fuses the syntactic context x_s and the frame semantic context x_f into a feature vector g_sf, as elaborated in the next subsection. Finally, the Answer Prediction module predicts answers based on both the source context representation g_x and the overall feature representation g_sf.

Syntax and Frame Semantics Fusion Module
In this paper, we explore three different fusion methods to generate g_sf by integrating x_s and x_f. Note that our Syntax and Frame Semantics Fusion Module is backbone-free: any existing sequence encoder, e.g., LSTM, GRU, or Transformer, can be used. In this work, we use a Bi-LSTM as our backbone model.

Siamese-based Fusion Method (SFM)
The Siamese-based Fusion Method (SFM) is a straightforward idea. Its architecture, shown in Figure 3, consists of two sub-networks (Bromley et al., 1993). We run a Bi-LSTM on the syntactic context x_s and the frame semantic context x_f independently, obtaining vectorized representations g_s and g_f, and then aggregate them into a vector g_sf:

g_sf = f(g_s ⊕ g_f)

where ⊕ denotes the concatenation of g_s and g_f, and f(·) is a non-linear transformation.
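A minimal sketch of SFM follows, with the Bi-LSTM encoder replaced by mean pooling over toy embedding vectors for brevity; the encoder, dimensions, and the choice of tanh as the non-linear transform are illustrative assumptions, not the authors' exact implementation.

```python
import math

def encode(vectors):
    """Stand-in for a Bi-LSTM encoder: mean-pool token vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sfm_fuse(xs_vecs, xf_vecs):
    """Siamese fusion: encode each context separately, then g_sf = f(g_s + g_f)."""
    g_s = encode(xs_vecs)   # syntactic context representation
    g_f = encode(xf_vecs)   # frame semantic context representation
    concat = g_s + g_f      # concatenation (the paper's circled-plus)
    return [math.tanh(v) for v in concat]  # f(.): non-linear transform

g_sf = sfm_fuse([[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [0.0, 0.0]])
print(len(g_sf))  # 4: dim(g_s) + dim(g_f)
```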

Mixed-based Fusion Method (MFM)
While the Siamese structure is easy to train, there is no interaction between the two feature texts x_s and x_f during training, which causes information loss (Wang et al., 2017). We therefore propose a Mixed-based Fusion Method (MFM), which directly concatenates the syntactic context x_s and the frame semantic context x_f into a single sequence x_sf, and then runs one Bi-LSTM on x_sf to obtain the vector g_sf.
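The difference from SFM can be sketched as follows; again, mean pooling is a self-contained stand-in for the single Bi-LSTM, and the toy vectors are invented for illustration.

```python
def encode(vectors):
    """Stand-in for the single Bi-LSTM: mean-pool token vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def mfm_fuse(xs_vecs, xf_vecs):
    """Mixed fusion: concatenate the sequences x_s and x_f, then encode once,
    so syntax and frame tokens can interact inside one encoder pass."""
    x_sf = xs_vecs + xf_vecs  # sequence-level concatenation
    return encode(x_sf)

g_sf = mfm_fuse([[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [0.0, 0.0]])
print(g_sf)  # [0.75, 0.75]
```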

Location-wise Fusion Method (LFM)
We observe that every token can concurrently carry both syntactic and frame semantic information. Thus, instead of simply mixing the two at the sentence level, as the two methods above do, we design a novel Location-wise Fusion Method (LFM) that coherently integrates syntactic and frame semantic information at the token level, obtaining a better sentence representation, as shown in Figure 4.
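The token-level idea can be sketched as below: the syntax and frame features of each position are concatenated before encoding. As in the earlier sketches, mean pooling is an assumed stand-in for the Bi-LSTM and the vectors are toy values.

```python
def lfm_fuse(xs_vecs, xf_vecs):
    """Location-wise fusion: concatenate syntax and frame features per token
    position, then encode the fused sequence (mean pooling stands in for
    the Bi-LSTM here)."""
    assert len(xs_vecs) == len(xf_vecs)  # annotations must be token-aligned
    fused = [s + f for s, f in zip(xs_vecs, xf_vecs)]  # per-token concat
    dim = len(fused[0])
    return [sum(v[i] for v in fused) / len(fused) for i in range(dim)]

g_sf = lfm_fuse([[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [0.0, 0.0]])
print(g_sf)  # [0.5, 0.5, 1.0, 1.0]
```

Unlike MFM, the pairing between a token's syntactic label and its frame annotation is preserved, rather than being lost in one long mixed sequence.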

Datasets for MRC
We employ MCTest (Richardson et al., 2013) to test the performance of different models on the multiple-choice machine comprehension task. It consists of two datasets, MCTest-160 and MCTest-500.

Implementation Details
Our implementation is based on the PyTorch implementations of BERT (Devlin et al., 2018) and Bi-LSTM (Zhang et al., 2018a). We train our models on a single Nvidia P100 GPU with 16 GB of memory. We use the Adam optimizer with a batch size of 8 and an initial learning rate of 5e-5.
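For reference, the reported training setup can be captured as a small configuration sketch; the dictionary keys are illustrative names, not taken from the authors' code.

```python
# Hyperparameters as reported for SS-MRC training; key names are
# illustrative assumptions, not the authors' actual configuration schema.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "batch_size": 8,
    "learning_rate": 5e-5,
    "device": "Nvidia P100 (16 GB)",
}
print(TRAIN_CONFIG["learning_rate"])  # 5e-05
```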

Experiment Results
Under the standard training-test setting of MCTest, Table 1 shows that our SS-MRC model achieves 87.2% and 86.7% accuracy on MCTest-160 and MCTest-500 respectively, consistently better than ten state-of-the-art methods: three feature-based models (the first block), six neural models without syntax and semantics (the second block), and one neural method with frame semantics (FSR). Recall that in Section 4 we proposed three different methods, namely SFM, MFM, and LFM, to integrate syntactic and semantic information. Table 2 shows their detailed results. We have the following observations: (1) No matter which of the three fusion methods we choose, performance is consistently better than the standard BERT model, indicating that both syntactic and frame semantic information are valuable for language understanding and thus boost reading comprehension performance.

(2) LFM performs better than SFM and MFM, signifying that the location-wise fusion method is more effective. As a token within a sentence typically has both syntactic and multiple semantic annotations, token-level integration can systematically combine the corresponding syntactic and semantic information.

Table 4: An example from MCTest that our model answers correctly.
Passage: ...They chose to make a chocolate cake with chocolate frosting. She helped measure the flour, the sugar ... They ate the chocolate cake at Julia's party with scoops of vanilla ice cream and fresh strawberries. Annie gave their dog, Sunny...
Question: What did Annie and her mother make?
Options: A) flour and sugar  *B) cake and frosting  C) ice cream and strawberries  D) Julia and Sunny
Frame Semantics: {flour, sugar, cake, frosting, ice cream, strawberries} ∈ Food; {Julia, Sunny} ∉ Food; make in both the passage and the question evokes the same frame, Manufacturing.
Syntax: cake and frosting are the obj and obl of make in the passage, and what is the obj of make in the question.
To evaluate the contributions of different key factors in our SS-MRC method, we perform three ablation studies. From the results in Table 3, we observe that both syntax and frame semantics contribute to the overall performance of our model, with frame semantics contributing more significantly than syntax. Note that the performance of SS-MRC (without syntax) is exactly the same as that of FSR (Guo et al., 2020).

Case Study
For the case study, Table 4 shows an example from MCTest that our proposed model answers correctly. Note that cake, frosting, and the other food words belong to the Food frame, while Julia and Sunny evoke two different frames, People and Animals, respectively. The target word make in both the passage and the question evokes the frame Manufacturing. As shown in Figure 1 and Figure 5, cake and frosting are the obj and obl of make in the passage, and what is the obj of make in the question, so we can infer that what in the question refers to cake and frosting in the passage.

Figure 5: Dependency parse of the question "What did Annie and her mother make?"
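The intuition behind this case can be mimicked by a toy scorer: an option is preferred when its words belong to the frame (Food) evoked by the shared target make, and when they are syntactic arguments of that target. This is only an illustration of the reasoning; the sets and scoring scheme below are invented for this example and are not the actual SS-MRC prediction module.

```python
# Toy knowledge extracted from the Table 4 example (illustrative only).
FOOD = {"flour", "sugar", "cake", "frosting", "ice cream", "strawberries"}
ARGS_OF_MAKE = {"cake", "frosting"}  # obj and obl of 'make' in the passage

def score_option(words):
    """+1 per word in the evoked Food frame, +1 per word that is a
    syntactic argument of the shared target 'make'."""
    return sum(w in FOOD for w in words) + sum(w in ARGS_OF_MAKE for w in words)

options = {
    "A": ["flour", "sugar"],
    "B": ["cake", "frosting"],
    "C": ["ice cream", "strawberries"],
    "D": ["Julia", "Sunny"],
}
best = max(options, key=lambda k: score_option(options[k]))
print(best)  # B: frame membership plus syntax alignment single it out
```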

Conclusion
We propose a novel syntax and frame semantics fusion method for MRC in a neural network, which, to the best of our knowledge, is the first attempt in this area. Our extensive experimental results demonstrate that it outperforms ten state-of-the-art methods on the challenging machine reading comprehension task.