A Frame-based Sentence Representation for Machine Reading Comprehension

Sentence representation (SR) is one of the most crucial and challenging tasks in Machine Reading Comprehension (MRC). MRC systems typically utilize only the information contained in a sentence itself, while human beings can also leverage their semantic knowledge. To bridge this gap, we propose a novel Frame-based Sentence Representation (FSR) method, which employs frame semantic knowledge to facilitate sentence modelling. Specifically, unlike existing methods that only model lexical units (LUs), we design Frame Representation Models that utilize both the LUs in a frame and Frame-to-Frame (F-to-F) relations to model frames and sentences with an attention schema. Our proposed FSR method integrates multi-frame semantic information to obtain much better sentence representations. Extensive experimental results show that it outperforms state-of-the-art methods on the machine reading comprehension task.


Introduction
Machine Reading Comprehension (MRC) requires machines to read and understand a text passage, and to answer relevant questions about it. Human beings can easily understand the meaning of a sentence based on their semantic knowledge. For instance, given the sentence Katie bought some chocolate cookies, people know that Katie is a buyer, and that chocolate cookies are goods and belong to the Food class, etc. Existing machine learning approaches, however, face great challenges in addressing complicated MRC questions, as they lack such semantic knowledge.
* Corresponding author: Ru Li.

Nevertheless, FrameNet (Fillmore, 1976; Baker et al., 1998), as a knowledge base, provides schematic scenario representations that could potentially be leveraged to better understand sentences. It has enabled the development of wide-coverage frame parsers (Gildea and Jurafsky, 2002; Das et al., 2014), as well as various real-world applications, ranging from event recognition (Liu et al., 2016), textual entailment (Burchardt et al., 2009), and question answering (Ofoghi et al., 2009) to narrative schemas (Chambers and Jurafsky, 2010) and paraphrase identification (Zhang et al., 2018). In particular, a Frame (F) is defined as a composition of Lexical Units (LUs) and a set of Frame Elements (FEs). Given a sentence, if a word evokes a frame by matching a LU, that word is called a Target (T). It is worth mentioning that FrameNet arranges related frames into a network by defining Frame-to-Frame (F-to-F) relations. Table 1 provides an example of F, FEs, LUs, T, and F-to-F, where the target word bought in the sentence Katie bought some chocolate cookies evokes the frame Commerce_buy as it matches the LU buy. Note that the target word chocolate cookies evokes a different frame, Food.

How can we utilize semantic knowledge from FrameNet? We observe that existing works mainly focus on LU vector embedding within a frame (Hermann and Blunsom, 2014; Bojanowski et al., 2017; Glavas et al., 2019), without modeling a frame as a whole. In addition, many sentences contain more than one target word and thus evoke multiple frames, but few existing methods integrate the rich multi-frame relations from FrameNet. To address these problems, in this paper we propose a novel Frame-based Sentence Representation (FSR) method, which leverages rich frame semantic knowledge, including both generalizations of LUs and F-to-F relations, to better model sentences. The key contributions of this work are summarized as follows:

1. We propose novel attention-based frame representation models, which take full advantage of LUs and F-to-F relations to model frames with an attention schema.
2. We propose a new Frame-based Sentence Representation (FSR) method that integrates multi-frame semantic information to obtain richer semantic aggregation for better sentence representation.
3. Our experimental results demonstrate our proposed frame-based sentence representation (FSR) method is very effective on Machine Reading Comprehension (MRC) task.

Frame Representation Model
In this section, we present our Frame Representation Model, considering both LUs and F-to-F relations.
Let U^{Fm} = {u^{Fm}_1, . . . , u^{Fm}_n, . . . , u^{Fm}_N} be the LU set of frame F_m, where U^{Fm} ∈ R^(H·N), N is the total number of LUs in F_m, u^{Fm}_n is the n-th LU representation of F_m, and H is the embedding dimension. t^{Fm} denotes a target word matching a LU in F_m. We propose three different frame representation models.

Lexical Units Aggregation Model (LUA)
The Lexical Units Aggregation Model (LUA) is a straightforward idea. Given a frame F_m, it averages all of its underlying LU representations u^{Fm}_n (u^{Fm}_n ∈ U^{Fm}) to represent the frame as a whole, i.e., the frame representation is the mean (1/N) Σ_{n=1}^{N} u^{Fm}_n.
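As an illustration, this unweighted averaging can be sketched as follows (a minimal sketch assuming the LU embeddings are provided as an N×H array; the function name is ours):

```python
import numpy as np

def lua_frame_representation(lu_embeddings):
    """LUA sketch: represent a frame as the unweighted average of the
    embeddings of all its lexical units."""
    U = np.asarray(lu_embeddings)  # shape (N, H): N LUs, H-dim embeddings
    return U.mean(axis=0)          # shape (H,): one vector per frame
```

Because the average ignores the sentence context, every occurrence of the same frame receives an identical vector, regardless of which LU actually evoked it.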

Lexical Units Attention Model (TLUA)
Each frame in the above LUA model has the same representation across different sentences, as LUA does not distinguish the importance of individual LUs in the frame.
To address this issue, we propose the TLUA model, which utilizes an attention scheme to automatically weight the LUs of a frame according to the target word T in the given sentence, as shown in Figure 1. More specifically, we compute the weighted sum of target word T's representation and the other LUs' representations based on their importance with respect to T. In other words, we emphasize T as it occurs in the given sentence, which reduces the potential noise introduced by irrelevant LUs in the same frame. Note that we encode a multi-word target by averaging the representations of all of its words.
Here, U^{Fm} denotes the LU set of F_m excluding t^{Fm}, so U^{Fm} ∈ R^(H·(N−1)).
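A minimal sketch of this target-conditioned weighting, assuming dot-product scoring and an equal mix of the target and the attended LUs (both choices are our assumptions, not specified above):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def tlua_frame_representation(target_vec, other_lus):
    """TLUA sketch: weight the non-target LUs of a frame by their
    similarity to the in-sentence target, then combine with the target.
    For a multi-word target, target_vec would be the mean of its words."""
    t = np.asarray(target_vec)   # (H,) target representation
    U = np.asarray(other_lus)    # (N-1, H) LUs excluding the target's LU
    attn = softmax(U @ t)        # importance of each LU w.r.t. the target
    context = attn @ U           # attention-weighted sum of LU embeddings
    return 0.5 * (t + context)   # emphasize the target over the other LUs
```

Unlike LUA, the same frame now yields different representations in different sentences, since the attention weights depend on the target that evoked it.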

Frame Relation Attention Model (FRA)
A key problem in MRC is to analyze the semantic relations among multiple sentences. We therefore propose a novel FRA model, which takes advantage of F-to-F relations to obtain much richer semantic information, as shown in Figure 2.
Given a frame F_m, F^+_m = {F_{m,1}, . . . , F_{m,w}, . . .} denotes its expanded frame set, containing all frames that can be linked to F_m through F-to-F relation chains in FrameNet; we restrict chains to at most 3 hops to keep only close relations. Note that attention schemes are designed at both the intra-frame and inter-frame levels: intra-frame attention focuses on relevant LUs, while inter-frame attention emphasizes relevant frames, avoiding influence from linked but less relevant frames.
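The hop-limited expansion of F^+_m can be sketched as a breadth-first traversal of the F-to-F relation graph (a sketch; the relation table below is an illustrative stand-in for FrameNet's actual relation inventory):

```python
from collections import deque

def expand_frames(frame, relations, max_hops=3):
    """Collect all frames reachable from `frame` through F-to-F relation
    chains of at most `max_hops` links, mirroring the 3-hop restriction
    used to keep only closely related frames."""
    seen = {frame}
    frontier = deque([(frame, 0)])
    while frontier:
        f, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for g in relations.get(f, []):
            if g not in seen:
                seen.add(g)
                frontier.append((g, hops + 1))
    seen.discard(frame)  # F_m itself is not part of its expansion
    return sorted(seen)
```

Inter-frame attention would then score each frame in the returned set against F_m, analogously to how TLUA scores LUs against the target.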

Frame-based Sentence Representation
Given a sentence s = {x_1, x_2, . . . , x_k, . . .} where each x_k is a word, let T_k be the k-th frame-evoking target of s, which evokes frame F_k. FE_ki denotes the i-th frame element of F_k, and P_ki denotes the i-th span fulfilling FE_ki. We define a frame semantic quadruple c_k = <T_k, F_k, FE_ki, P_ki>, where c_k is the k-th quadruple of s.
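The quadruple can be captured by a small data structure (a sketch; field names and types are our assumptions):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FrameQuadruple:
    """One frame semantic quadruple c_k = <T_k, F_k, FE_ki, P_ki>."""
    target: str                # T_k: the frame-evoking word(s)
    frame: str                 # F_k: the frame evoked by T_k
    elements: Tuple[str, ...]  # FE_ki: frame element names of F_k
    spans: Tuple[str, ...]     # P_ki: the text spans fulfilling each FE

# The first quadruple of "Katie bought some chocolate cookies":
c1 = FrameQuadruple(
    target="bought",
    frame="Commerce_buy",
    elements=("Buyer", "Goods"),
    spans=("Katie", "some chocolate cookies"),
)
```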

Sentence Semantic Annotations with Multiple Frames
In this paper, we employ SEMAFOR (Das et al., 2014) to automatically process sentences with multiple semantic annotations (Kshirsagar et al., 2015). Figure 3 provides an example sentence with three targets (T), namely bought, some, and chocolate cookies. Each T has its evoked semantic frame directly below it. For each frame, its FEs are shown enclosed in a block, where dark grey indicates the corresponding T, and the words fulfilling the FEs are connected to the corresponding text. For example, T bought evokes the Commerce_buy frame and has its Buyer and Goods FEs fulfilled by Katie and some chocolate cookies, respectively.
The sentence s in Figure 3 has three quadruples:

Frame Integration Representation
In Figure 4, c_k (k = 1, 2, 3) is the input. We first compute its matrix representation c^t_k, with columns denoting different kinds of semantic information. Then we formalize the sentence representation as follows, where K is the total number of quadruples in the sentence, and φ(c^t_k, P_k) is an aggregation operation used to form the frame set representation c^t based on the information of P and T in the sequence. Finally, we encode the sentence information with neural network models.
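As an illustration, with mean-pooling standing in for φ (the paper's φ also conditions on the spans P_k; plain averaging is our simplification), the integration step could look like:

```python
import numpy as np

def integrate_frames(quadruple_matrices):
    """Integration sketch: each c^t_k is an (H, M) matrix whose M columns
    hold different kinds of semantic information; pool each quadruple's
    columns, then pool over all K quadruples to get one sentence-level
    frame vector (to be fed into a downstream neural encoder)."""
    pooled = [np.asarray(C).mean(axis=1) for C in quadruple_matrices]  # K vectors of shape (H,)
    return np.stack(pooled).mean(axis=0)                              # shape (H,)
```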

Models for MRC
To better analyze the performance of our proposed method on MRC, we apply both BERT (Devlin et al., 2018) and LSTM (Hochreiter and Schmidhuber, 1997) as our neural models. We construct the input with the passage as sequence A and the question, concatenated with each candidate answer, as sequence B.
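A sketch of this input layout (pairing the question with each candidate answer as sequence B is our assumption about the standard multiple-choice setup; the token markers follow BERT's convention):

```python
def build_mrc_input(passage, question, option):
    """Assemble a BERT-style sentence pair: the passage as sequence A,
    and the question plus one candidate answer as sequence B."""
    seq_a = passage
    seq_b = f"{question} {option}"
    return f"[CLS] {seq_a} [SEP] {seq_b} [SEP]"
```

One such pair is built per answer option, and the model scores each pair to pick the most likely answer.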

Datasets for MRC
We employ MCTest (Richardson et al., 2013) to test system performance on the multiple-choice machine comprehension task. It consists of two datasets, MCTest-160 and MCTest-500.

Experiment Results
Table 2 shows that our FSR model achieves 86.1% accuracy on MCTest-160, significantly better than all nine state-of-the-art methods. It also achieves very competitive results on MCTest-500, i.e., much better than eight existing methods and only slightly worse than the BERT+DCMN+ model. This is encouraging, as our model is much simpler than BERT+DCMN+, which uses a far more sophisticated architecture.

Recall that in Section 2 we proposed three different frame representation methods, namely LUA, TLUA, and FRA. Table 3 shows their detailed results: (1) For both BERT and bi-LSTM, adding frame semantic information improves performance by several percentage points, indicating that frame information is valuable for semantic understanding.
(2) Comparing TLUA with LUA, TLUA performs better, signifying that the attention scheme in TLUA captures semantic information more accurately.
(3) Finally, FRA further improves on LUA's and TLUA's performance, as sentences within a passage typically have semantic connections with each other; it is thus beneficial to exploit F-to-F relations to enrich the semantic information.

Case Study
For the case study, Table 4 shows an example from MCTest that our method answers correctly. Both Chips and Chocolate cookies belong to the Food frame, while Flowers and Bows evoke two different frames, Plants and Accoutrements, respectively. The target words Found and Buy in the given passage/question evoke the different frames Locating and Commerce_buy; note that in FrameNet these frames are connected through their semantic relations, which helps us find the answer B) Chocolate cookies.

Conclusion
We propose a novel Frame-based Sentence Representation method, which integrates multi-frame semantic information to facilitate sentence modelling. Extensive experimental results demonstrate that it works very well on the challenging machine reading comprehension task.