Jointly Extracting Explicit and Implicit Relational Triples with Reasoning Pattern Enhanced Binary Pointer Network

Relational triple extraction is a crucial task for knowledge graph construction. Existing methods have mainly focused on explicit relational triples that are directly expressed, but usually ignore implicit triples that lack explicit expressions, which leads to serious incompleteness of the constructed knowledge graphs. Fortunately, other triples in the sentence provide supplementary information for discovering entity pairs that may have implicit relations, and the relation types between implicitly connected entity pairs can be identified with real-world relational reasoning patterns. In this paper, we propose a unified framework to jointly extract explicit and implicit relational triples. To explore entity pairs that may be implicitly connected by relations, we propose a binary pointer network that extracts overlapping relational triples relevant to each word sequentially and retains the information of previously extracted triples in an external memory. To infer the relation types of implicit relational triples, we introduce real-world relational reasoning patterns into our model and capture these patterns with a relation network. We conduct experiments on several benchmark datasets, and the results prove the validity of our method.


Introduction
Relational triple extraction is defined as automatically recognizing semantic relations with triple structures (subject, relation, object) among multiple entities in a sentence. It is a critical task for constructing Knowledge Graphs (KGs) from unlabeled corpora (Dong et al., 2014).
Early work on relational triple extraction applied pipeline methods (Zelenko et al., 2003; Chan and Roth, 2011), which ran entity recognition and relation classification separately. However, such pipeline approaches suffered from error propagation. To address this issue, recent work proposed to jointly extract entities and relations from text with feature-based methods (Yu and Lam, 2010; Li and Ji, 2014; Ren et al., 2017). Afterward, neural network-based models were proposed to eliminate hand-crafted features (Gupta et al., 2016; Zheng et al., 2017). More recently, several methods were proposed to extract overlapping triples, such as tagging-based (Dai et al., 2019; Wei et al., 2020), graph-based (Wang et al., 2018), copy-based (Zeng et al., 2018, 2019, 2020) and token pair linking models (Wang et al., 2020). Existing models achieved considerable success on extracting explicit triples, which have direct relational expressions in the sentence. However, many implicit relational triples are not explicitly expressed. For example, in Figure 1, the explicit triples are strongly indicated by key relational phrases, but the implicit relation "Live in" is not expressed explicitly. Unfortunately, existing methods usually ignore implicit triples (Zhu et al., 2019), which causes serious incompleteness of the constructed KGs and performance degradation of downstream tasks (Angeli and Manning, 2013; Jia et al., 2020; Jun et al., 2020).
Our work is motivated by several observations. First, other relational triples within a sentence provide supplementary information for discovering entity pairs that may have implicit relational connections. For example, in Figure 1, the explicit triples establish a relational connection between "Mark Spencer" and "Huntsville" through the intermediate entity "Digium". Second, the relation types of implicit relational triples can be derived through real-world reasoning patterns. For example, in Figure 1, the reasoning pattern "one lives where the company he works for is located" helps identify the type of the implicit triple as "Live in".
In this paper, we propose a unified framework for the joint extraction of explicit and implicit relational triples. We propose a Binary Pointer Network (BPtrNet), based on the pointer network (Vinyals et al., 2015), to extract overlapping relational triples relevant to each word sequentially. To discover implicitly connected entity pairs, we preserve the information of previously extracted triples in an external memory and use it to enhance the extraction at later time steps. To infer the relation types between the implicitly connected entity pairs, we augment our model with real-world relational reasoning patterns and capture the relational inference logic with a Relation Network (RN) (Santoro et al., 2017). The RN obtains a pattern-enhanced representation from the memory for each word pair. The Reasoning pattern enhanced BPtrNet (R-BPtrNet) then uses the word pair representation to compute a binary score for each candidate triple. Finally, triples with positive scores are output as the extraction result.
The main contributions of this paper are:

• We propose a unified framework to jointly extract explicit and implicit relational triples.

• To discover entity pairs that are implicitly connected by relations, we propose a BPtrNet model to extract overlapping relational triples sequentially and utilize an external memory to retain the extracted triples.

• To enhance the relation type inference of implicitly connected entity pairs, we propose to introduce relational reasoning patterns, captured with a RN, to augment our model.

• We conduct experiments on several benchmark datasets, and the experimental results demonstrate the validity of our method.

Related Work
Early work on relational triple extraction addressed this task in a pipelined manner (Zelenko et al., 2003; Zhou et al., 2005; Chan and Roth, 2011; Gormley et al., 2015). These methods first ran named entity recognition to identify all entities and then classified the relations between all entity pairs. However, such pipelined methods usually suffered from the error propagation problem and failed to capture the interactions between entities and relations.
To overcome these drawbacks, recent research focused on jointly extracting entities and relations, including feature-based models (Yu and Lam, 2010; Li and Ji, 2014; Ren et al., 2017) and neural network-based models (Gupta et al., 2016; Miwa and Bansal, 2016; Zheng et al., 2017). For example, Ren et al. (2017) proposed to jointly embed entities, relations, text features and type labels into two low-dimensional spaces. Miwa and Bansal (2016) proposed a joint model containing two long short-term memories (LSTMs) (Gers et al., 2000) with shared parameters. Zheng et al. (2017) proposed to extract relational triples directly by transforming this task into a sequence tagging problem, whose tags contain the information of entities and the relations they hold. However, they only assigned one label to each word, which means that this method failed to extract overlapping triples. Subsequent work proposed several mechanisms to solve this problem: (1) labeling tagging sequences for words (Dai et al., 2019) or entities (Yu et al., 2019; Wei et al., 2020); (2) transforming the sentence into a graph structure (Wang et al., 2018); (3) generating triple element sequences with the copy mechanism (Zeng et al., 2018, 2019, 2020; Nayak and Ng, 2020); (4) linking token pairs with a handshake tagging scheme (Wang et al., 2020). However, these methods usually ignored implicit relational triples that are not directly expressed in the sentence (Zhu et al., 2019), which leads to the incompleteness of the resulting KGs and negatively affects the performance of downstream tasks (Angeli and Manning, 2013; Jia et al., 2020).
Our work is motivated by two observations. First, other triples in the sentence provide supplementary evidence for discovering entity pairs with implicit relational connections. Second, the relation types of the implicit connections need to be identified through real-world reasoning patterns.
In this paper, we propose a unified framework for the joint extraction of explicit and implicit relational triples. We propose a binary pointer network to sequentially extract overlapping relational triples and externally keep the information of predicted triples for exploring implicitly connected entity pairs. We also propose to introduce real-world reasoning patterns into our model to help derive the relation types of implicit triples with a relation network. Experimental results on several benchmark datasets demonstrate the effectiveness of our method.

Figure 2: The overall framework of our approach.

Our Approach
The overall framework of our approach is shown in Figure 2. We introduce the Binary Pointer Network (BPtrNet) in Section 3.1, the Relation Network (RN) in Section 3.2, and the details of training and inference in Section 3.3.

Binary Pointer Network
Existing methods usually failed to extract implicit relational triples due to the lack of explicit expressions (Zhu et al., 2019). Fortunately, we observe that other triples in the sentence can help discover entity pairs that may have implicit relational connections. For instance, in the sentence "George is Judy's father and David's grandfather", the relation between Judy and David is not explicitly expressed. In this case, if we first extract the explicit triple (Judy, father, George) and keep its information in our model, we can easily establish an implicit connection between Judy and David through George because George is explicitly connected with David by the relational keyword "grandfather". Inspired by this observation, our model extracts relational triples relevant to each word sequentially and keeps all previous triples of this sentence to enhance the extraction at future time steps. This word-by-word extraction process can be regarded as transforming a text sequence into a sequence of extracting actions, which leads us to a sequence-to-sequence (seq2seq) model. Therefore, we propose a Binary Pointer Network (BPtrNet), based on a seq2seq pointer network (Vinyals et al., 2015), to jointly extract explicit and implicit relational triples. Our model first encodes the words of a sentence into vector representations (Section 3.1.1). Then, we use a binary decoder to sequentially transform the vectors into (overlapping) relational triples (Section 3.1.2). We also introduce an external memory to retain previously extracted triples for enhancing future decoding steps (Section 3.1.3).
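To make the sequential extraction concrete, here is a minimal Python sketch of the word-by-word decoding loop; `encode_fn` and `score_fn` are hypothetical placeholders for the encoder (Section 3.1.1) and the memory-aware scoring function (Sections 3.1.2 and 3.2), not names from the paper.

```python
# A minimal sketch, assuming a per-sentence memory and a 0.5 threshold.
def extract_triples(words, relations, encode_fn, score_fn):
    """Sequentially treat each word as a candidate object; keep every
    extracted triple in a memory that informs later decoding steps."""
    memory = []   # representations of triples extracted so far
    triples = []
    states = encode_fn(words)             # contextual word representations
    for i, obj in enumerate(words):       # decoding step i: object = words[i]
        for j, subj in enumerate(words):  # every word is a candidate subject
            for r in relations:
                # binary score; the memory lets earlier triples (e.g. the
                # explicit ones) help discover implicit connections
                if score_fn(states[j], states[i], r, memory) > 0.5:
                    triples.append((subj, r, obj))
                    memory.append((states[j], r, states[i]))
    return triples
```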

Encoder
Given a sentence $[w_1, \ldots, w_n]$, we first capture morphological patterns of entities with a convolutional neural network (CNN) (LeCun et al., 1989) and compute the character representation $c_i$ of the word $w_i$ ($i = 1, \ldots, n$). Then we introduce context-sensitive representations $p_{1:n}$ captured with a pre-trained Language Model (LM) to bring rich semantics and prior knowledge from large-scale unlabeled corpora. We feed $c_i$, $p_i$ and the word embedding $w_i$ into a bidirectional LSTM (BiLSTM) to compute the contextualized word representations $x_{1:n}$ and encode the sentence with another BiLSTM:

$$x_i = \mathrm{BiLSTM}([w_i; c_i; p_i]), \qquad h^E_{1:n} = \mathrm{BiLSTM}(x_{1:n}) \quad (1)$$
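As an illustration, the following PyTorch sketch implements the encoder of Equation 1 under assumed dimensions (30-dimensional character embeddings as in Section 4.2; the other sizes are hypothetical): a one-layer character CNN with max-pooling yields $c_i$, which is concatenated with the word embedding and the LM representation and passed through the two BiLSTMs.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Sketch of the encoder: char-CNN + word/LM embeddings + two BiLSTMs."""
    def __init__(self, n_chars, d_char=30, d_word=300, d_lm=1024, d_h=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_char)
        # one-layer CNN over characters; max-pooling gives c_i per word
        self.char_cnn = nn.Conv1d(d_char, d_char, kernel_size=3, padding=1)
        self.bilstm1 = nn.LSTM(d_char + d_word + d_lm, d_h,
                               bidirectional=True, batch_first=True)
        self.bilstm2 = nn.LSTM(2 * d_h, d_h,
                               bidirectional=True, batch_first=True)

    def forward(self, chars, word_emb, lm_emb):
        # chars: (n_words, max_chars); word_emb, lm_emb: (1, n_words, d)
        c = self.char_cnn(self.char_emb(chars).transpose(1, 2))
        c = c.max(dim=2).values.unsqueeze(0)    # max-pool -> c_{1:n}
        x, _ = self.bilstm1(torch.cat([c, word_emb, lm_emb], dim=-1))
        h_enc, _ = self.bilstm2(x)              # h^E_{1:n}
        return x, h_enc
```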

Binary Decoder
First, to capture the interactions between entities and relations, we recognize the entities with a span-based entity tagger (Yu et al., 2019; Wei et al., 2020) and transform the tags into vectors as part of the decoder's input (Figure 2). Specifically, we assign each token a start and an end tag to indicate whether the current token corresponds to a start or end position of an entity of a certain type:

$$\hat{y}^s_i = \mathrm{softmax}(W_s h^E_i + b_s), \qquad \hat{y}^e_i = \mathrm{softmax}(W_e h^E_i + b_e) \quad (2)$$

where $(W_s, b_s)$ and $(W_e, b_e)$ are parameters of the start and end tag classifiers, respectively. Then we obtain the entity tag embedding $e_i \in \mathbb{R}^{d_e}$ by averaging the look-up embeddings of the start and end tags. We also capture a global contextual embedding $g$ by max pooling over $h^E_{1:n}$. Then we adopt an LSTM as the decoder:

$$h^D_i = \mathrm{LSTM}([x_i; e_i; g],\, h^D_{i-1}) \quad (3)$$

Next, we introduce how to extract relational triples at the $i$-th time step. We consider the current word as the object entity, select as subjects the words that form triples with the object from all the words of the sentence, and predict the relation types between the subjects and the object. For example, in Figure 2, when the current object is "Huntsville", the model selects "Digium" as the subject and classifies the relation as "Locate in". Thus ("Digium", "Locate in", "Huntsville") is extracted as a relational triple. Multi-token entities are represented by their last words and recovered by finding the nearest start tags of the same type from their last positions. However, the original softmax pointer in (Vinyals et al., 2015) only allows an object to point to one subject, and thus fails to extract multiple triples with overlapping objects. To address this issue, we propose a binary pointer, which independently computes a binary score for each subject to form a relational triple with the current object under each relation type. Our method naturally solves the overlapping triple problem by producing multiple positive scores at one step (Figure 2). We formulate the score of the triple $(w_j, r, w_i)$ as:

$$s^{(r)}_{ji} = \sigma(r^\top h_{ji}), \qquad h_{ji} = \rho(W_{ptr}[h^E_j; h^D_i] + b_{ptr}) \quad (4)$$

and extract this candidate triple as a relational triple if $s^{(r)}_{ji}$ is higher than some threshold, such as 0.5 in our model ($i, j = 1, \ldots, n$). $\sigma$ and $\rho$ are the sigmoid and tanh functions, respectively. $r \in \mathbb{R}^{d_R}$ is the type embedding of the relation $r$. $W_{ptr}$ and $b_{ptr}$ are parameters of the binary pointer.
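A minimal PyTorch sketch of the binary pointer follows, with hypothetical dimensions and with the pairing of subject-side encoder states and the object-step decoder state assumed from the pointer-network formulation: at step $i$ every (subject position, relation type) pair gets an independent sigmoid score, so several triples can be positive at the same step.

```python
import torch
import torch.nn as nn

class BinaryPointer(nn.Module):
    """Independent sigmoid score for every (subject j, relation r) pair at
    object step i, in the spirit of Equation 4 (dimensions assumed)."""
    def __init__(self, d_enc=512, d_dec=256, d_rel=128, n_rel=24):
        super().__init__()
        self.w_ptr = nn.Linear(d_enc + d_dec, d_rel)  # (W_ptr, b_ptr)
        self.rel_emb = nn.Embedding(n_rel, d_rel)     # relation embeddings r

    def forward(self, h_enc, h_dec_i):
        # h_enc: (n, d_enc) encoder states; h_dec_i: (d_dec,) state at step i
        n = h_enc.size(0)
        pairs = torch.cat([h_enc, h_dec_i.expand(n, -1)], dim=-1)
        h_ji = torch.tanh(self.w_ptr(pairs))                    # rho(W_ptr...)
        scores = torch.sigmoid(h_ji @ self.rel_emb.weight.t())  # (n, n_rel)
        return scores   # scores[j, r] > 0.5 => extract (w_j, r, w_i)
```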

External Memory
We introduce an external memory $M$ to keep the previously extracted triples of the sentence. We first initialize $M$ as an empty set. After the decoder's extraction process at the $i$-th time step, we represent each extracted triple $t = (w_{s_t}, r_t, w_i)$ as:

$$m_t = [h^E_{s_t}; r_t; h^D_i] \quad (5)$$

Then we update the memory with the representations of the output triples $t_1, \ldots, t_{N_i}$:

$$M \leftarrow M \cup \{m_{t_1}, \ldots, m_{t_{N_i}}\} \quad (6)$$

where $N_i$ is the number of the currently extracted triples and $N = \sum_{k=1}^{i} N_k$ is the total number of triples in the memory. Note that we set and update the external memory for each sentence independently, and the memory stores only the triple representations of one single sentence. Thus triples of other sentences will not be introduced into the sentence currently being extracted. Finally, the triples in the memory are utilized to obtain the reasoning pattern-enhanced representations for future time steps, as described in Section 3.2.
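The per-sentence memory can be sketched as below; representing a triple as the concatenation of its subject state, relation embedding, and object state (Equation 5) is an assumption about the exact composition.

```python
import torch

class TripleMemory:
    """Per-sentence external memory M; it only ever stores representations
    of triples extracted from the current sentence."""
    def __init__(self):
        self.slots = []                  # m_t, one vector per extracted triple

    def update(self, subj_state, rel_emb, obj_state):
        # m_t = [h^E_{s_t}; r_t; h^D_i]  (composition assumed)
        self.slots.append(torch.cat([subj_state, rel_emb, obj_state], dim=-1))

    def read(self):
        # stack into (N, d_mem) for the Relation Network; None if empty
        return torch.stack(self.slots) if self.slots else None
```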

Relation Network for capturing patterns of relational reasoning
Relation types of implicit relational triples are difficult to infer due to the lack of explicit evidence, and thus need to be derived with real-world relational reasoning patterns. For example, in the sentence "George is Judy's father and David's grandfather", the relation type between "Judy" and "David" can be inferred as "father" using the pattern "father's father is called grandfather". Based on this fact, we propose to enhance our model by introducing real-world relational reasoning patterns. We capture the patterns with a Relation Network (RN) (Santoro et al., 2017), a neural network module specially designed for relational reasoning. A RN is essentially a composite function over a relational triple set $T$:

$$\mathrm{RN}(T) = f_\phi\big(\{g_\theta(t)\}_{t \in T}\big) \quad (7)$$

where $f_\phi$ is an aggregation function and $g_\theta$ projects a triple into a fixed-size embedding. We set the memory $M$ as the input relational triple set $T$ and utilize the RN to learn a pattern-enhanced representation $h^P_{ji}$ for the word pair $(w_j, w_i)$ at the $i$-th time step. First, $g_\theta$ reads the triple representations from $M$ and projects them with a fully-connected layer:

$$g_\theta(m_t) = \rho(W_g m_t + b_g) \quad (8)$$

Then $f_\phi$ selects useful triples with a gating network¹:

$$\alpha_t = \sigma\big(W_\alpha [g_\theta(m_t); h^E_j; h^D_i] + b_\alpha\big) \quad (9)$$

and aggregates the selected triples with the word pair to compute $h^P_{ji}$ using another fully-connected layer:

$$h^P_{ji} = \rho\Big(W_p \big[h^E_j; h^D_i; \textstyle\sum_{t} \alpha_t\, g_\theta(m_t)\big] + b_p\Big) \quad (10)$$

Finally, we modify Equation 4 as $s^{(r)}_{ji} = \sigma(r^\top h^P_{ji})$ to compute the binary scores of candidate triples. We denote our Reasoning pattern enhanced BPtrNet model as R-BPtrNet. Note that we use quite simple formulas for $f_\phi$ and $g_\theta$ because our contribution focuses on the effectiveness of introducing relational reasoning patterns for this task rather than on the model structure. Exploration of more complex structures is left for future work.
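A minimal sketch of the RN components follows, assuming the gating inputs and layer sizes of Equations 8–10: memory slots are projected by $g_\theta$, weighted by a sigmoid gate (so all weights can be near zero when the memory is unhelpful, unlike softmax attention), and the gated sum is combined with the word pair.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Sketch of g_theta (projection), the gating network, and f_phi
    (aggregation); all dimensions are assumptions."""
    def __init__(self, d_mem=896, d_pair=768, d_h=256):
        super().__init__()
        self.g_theta = nn.Linear(d_mem, d_h)
        self.gate = nn.Linear(d_h + d_pair, 1)
        self.f_phi = nn.Linear(d_pair + d_h, d_h)

    def forward(self, memory, h_enc_j, h_dec_i):
        # memory: (N, d_mem) or None; h_enc_j, h_dec_i: the word pair states
        pair = torch.cat([h_enc_j, h_dec_i], dim=-1)           # (d_pair,)
        if memory is None:                                     # empty memory
            ctx = pair.new_zeros(self.g_theta.out_features)
        else:
            g = torch.tanh(self.g_theta(memory))               # g_theta(m_t)
            pair_rep = pair.expand(g.size(0), -1)
            alpha = torch.sigmoid(self.gate(torch.cat([g, pair_rep], dim=-1)))
            ctx = (alpha * g).sum(dim=0)                       # gated sum
        # pattern-enhanced word pair representation h^P_{ji}
        return torch.tanh(self.f_phi(torch.cat([pair, ctx], dim=-1)))
```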

Training and Inference
We calculate the triple loss of a sentence as a binary cross-entropy over valid candidate triples $T_v$, whose subjects and objects are different entities (or the end words of different entities):

$$L_t = -\frac{1}{|T_v|} \sum_{t \in T_v} \big[ y_t \log s_t + (1 - y_t) \log (1 - s_t) \big] \quad (11)$$

where $s_t$ is the score of the candidate triple $t$, and $y_t = 1$ for gold triples and $0$ for others. We also train the entity tagger with a cross-entropy loss:

$$L_e = -\frac{1}{n} \sum_{i=1}^{n} \big[ \log p(\hat{y}^s_i) + \log p(\hat{y}^e_i) \big] \quad (12)$$

where $\hat{y}^s_i$ and $\hat{y}^e_i$ are the gold start and end tags of the $i$-th word, respectively. Finally, we train the R-BPtrNet with the joint loss $L = L_t + L_e$.
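Under assumed tensor layouts, the joint objective can be sketched as:

```python
import torch
import torch.nn.functional as F

def joint_loss(triple_scores, triple_labels, valid_mask,
               start_logits, end_logits, gold_start, gold_end):
    """L = L_t + L_e: binary cross-entropy over valid candidate triples plus
    cross-entropy for the start/end entity tags (layouts are assumptions)."""
    # L_t: only candidates whose subject and object are different entities
    l_t = F.binary_cross_entropy(triple_scores[valid_mask],
                                 triple_labels[valid_mask])
    # L_e: start/end tag classification for every word
    l_e = (F.cross_entropy(start_logits, gold_start)
           + F.cross_entropy(end_logits, gold_end))
    return l_t + l_e
```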
To prevent error propagation, we use the gold entity tags to identify valid candidate triples and compute the tag embeddings $e_{1:n}$ during training. We also update the memory $M$ with the gold relational triples. During inference, we extract triples from scratch and use the predicted entity tags and relational triples instead of the gold ones.

¹ We don't use the more common attention mechanism (Bahdanau et al., 2015) to select triples because the attention weights are restricted to sum to 1. If all triples in the memory are useless, they will still be assigned large weights due to this restriction, which would confuse the model.

Datasets and Evaluation Metrics
We evaluate our method on two benchmark datasets. NYT (Riedel et al., 2010) consists of sentences from the New York Times corpus and contains 24 relation types. WebNLG (Gardent et al., 2017) was created for the natural language generation task; it contains 171 relation types and was adapted for relational triple extraction by Zeng et al. (2018). Following Zeng et al. (2018), we split the sentences into three categories: Normal, SingleEntityOverlap (SEO) and EntityPairOverlap (EPO). The statistics of the two datasets are shown in Table 1. Following previous work (Zeng et al., 2018; Wei et al., 2020; Wang et al., 2020), an extracted relational triple is regarded as correct only if the relation and the heads of both the subject and the object are all correct. We report the standard micro precision, recall, and F1-scores on both datasets.
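The exact-match evaluation can be expressed compactly; reducing each triple to a (subject head, relation, object head) tuple is a direct reading of the criterion above.

```python
def micro_prf(predictions, golds):
    """Micro precision/recall/F1; predictions and golds are lists of sets of
    (subject_head, relation, object_head) tuples, one set per sentence."""
    tp = sum(len(p & g) for p, g in zip(predictions, golds))
    n_pred = sum(len(p) for p in predictions)
    n_gold = sum(len(g) for g in golds)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```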

Table 1: Statistics of the NYT and WebNLG datasets (train/test counts for the Normal, SEO, and EPO categories).

Experimental Settings
We determine the hyper-parameters on the validation sets. We use the pre-trained GloVe (Pennington et al., 2014) embeddings as $w$. We adopt a one-layer CNN with $d_c = 30$ channels to learn $c$ from 30-dimensional randomly-initialized character embeddings. We choose the state-of-the-art RoBERTa LARGE (Liu et al., 2019) as the pre-trained LM. We add 10% dropout (Srivastava et al., 2014) on the input of all LSTMs for regularization. Following previous work (Zeng et al., 2018; Wei et al., 2020; Wang et al., 2020), we set the max length of input sentences to 100. We use the Adam optimizer (Kingma and Ba, 2014) to fine-tune the LM with a learning rate of $10^{-5}$ and train the other parameters with a learning rate of $10^{-3}$. We train our model for 30/90 epochs with a batch size of 32/8 on NYT/WebNLG. At the beginning of the last 10 epochs, we load the parameters with the best validation performance and divide the learning rates by ten. Finally, we choose the best model on the validation set and report results on the test set.
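The two learning rates can be realized with Adam parameter groups; a sketch under the assumption that the LM lives in a separate submodule:

```python
import torch

def build_optimizer(model, lm_module):
    """Adam with 1e-5 for the pre-trained LM and 1e-3 for everything else."""
    lm_ids = {id(p) for p in lm_module.parameters()}
    other = [p for p in model.parameters() if id(p) not in lm_ids]
    return torch.optim.Adam([
        {"params": list(lm_module.parameters()), "lr": 1e-5},
        {"params": other, "lr": 1e-3},
    ])
```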

Performance Evaluation
We present our results on the NYT and WebNLG test sets in Table 2 and compare them with several previous state-of-the-art models:

• NovelTagging (Zheng et al., 2017) transformed this task into a sequence tagging problem but neglected the overlapping triples.

• CopyRE (Zeng et al., 2018) proposed a seq2seq model based on the copy mechanism to generate triple elements as sequences.

• CopyMTL (Zeng et al., 2020) proposed a Multi-Task Learning (MTL) framework based on CopyRE to address multi-token entities.

• WDec (Nayak and Ng, 2020) proposed an encoder-decoder architecture for this task.

• CGT UniLM (Ye et al., 2020) proposed a generative transformer module with a triple contrastive training objective.

• CASREL (Wei et al., 2020) proposed a cascade binary tagging framework.

• TPLinker (Wang et al., 2020) proposed a one-stage token pair linking model with a novel handshaking tagging scheme.

From Table 2 we have the following observations: (1) The R-BPtrNet significantly outperforms all previous non-LM methods. It demonstrates the superiority of our seq2seq-based framework to jointly extract explicit and implicit relational triples and improve the performance on this task. Additionally, the R-BPtrNet produces competitive performance to the BERT-based baseline models without using BERT. It shows that the improvements of our model come not primarily from the pre-trained LM representations, but from the introduction of relational reasoning patterns to this task.

Table 3: F1 scores on sentences with different overlapping patterns and different triple numbers. The best scores are in bold and the second-best scores are underlined. † marks scores reproduced by Wang et al. (2020).
(2) R-BPtrNet BERT outperforms BERT-based baseline models. It indicates that our method can effectively extract implicit relational triples with the assistance of the triple-retaining external memory and the pattern-capturing RN. (3) R-BPtrNet RoBERTa further outperforms R-BPtrNet BERT and other baseline methods. It indicates that the more powerful LM brings more prior knowledge and real-world relational facts, enhancing the model's ability to learn real-world relational reasoning patterns.

Performance on Different Sentence Types
To demonstrate the ability of our model to handle multiple triples and overlapping triples in a sentence, we split the test sets of the NYT and WebNLG datasets according to the overlapping patterns and the number of triples. We conduct further experiments on these subsets and report the results in Table 3, from which we can observe that: (1) R-BPtrNet RoBERTa and R-BPtrNet BERT both significantly outperform previous models on the SEO and EPO subsets of the NYT and WebNLG datasets. It proves the validity of our method to address the overlapping triple problem. Moreover, we find that implicit relational triples usually overlap with others. Therefore, the improvements on the overlapping subsets also validate the effectiveness of our method for extracting implicit relational triples.
(2) R-BPtrNet RoBERTa and R-BPtrNet BERT both bring improvements on sentences with multiple triples compared to baseline models. It indicates that our method can effectively extract multiple relational triples from a sentence. Furthermore, we observe more significant improvements as the number of triples grows. We hypothesize that this is because implicit relational triples are more likely to occur in sentences with more triples; our model extracts these implicit relational triples more accurately and thus improves the performance.

Ablation Study on Implicit Triples
We run an ablation study to investigate the contribution of each component in our model to the implicit relational triples. We manually select 134 sentences with rich implicit triples from the NYT test set. We conduct experiments on the subset using the following ablation options:

• R-BPtrNet RoBERTa and R-BPtrNet BERT are the full models using RoBERTa LARGE and BERT BASE as LMs, respectively.

• R-BPtrNet removes the pre-trained LM representations from the full model.

• BPtrNet removes the RN from the R-BPtrNet. Under this setting, we feed a gated summation of the memory into the decoder's input of the next time step.

• BPtrNet NoMem removes the external memory from the BPtrNet, which means that the previously extracted triples are not retained.

Figure 3: An ablation study on a manually selected subset with rich implicit relational triples.

Figure 4: Examples of sentences with implicit relational triples, and the predictions of the TPLinker BERT and R-BPtrNet BERT models. The example sentences are: (1) "... organizing an expedition starting in November in Jinghong, a small city in the Yunnan province in China."; (2) "We're providing a new outlet for them for distribution," said Chad Hurley, chief executive and ... of YouTube, a division of Google; (3) "... Edmund, was an influential municipal judge in Crowley, who was ... as well as a close adviser to former Louisiana Gov. Edwin Edwards ...". Bold and colored texts are entities, and different entities are distinguished with different colors. Explicit and implicit relational triples are represented by black and green solid arrows, respectively. Blue dashed arrows indicate explicit relational connections between entities that do not appear as relational triples because their relation types don't belong to the pre-defined relations of the dataset.
We compare the performance of these options with the previous BERT-based models. We also analyze the performance on predicting only the entity pairs and only the relations. We illustrate the results in Figure 3, from which we can observe that: (1) BPtrNet NoMem produces comparable results to the baseline models. We speculate that it benefits from the seq2seq structure, where the previous triples are embedded into the decoder's hidden states.
(2) BPtrNet brings huge improvements over the BPtrNet NoMem in the entity pair and triple F1 scores. It indicates that the external memory effectively helps discover entity pairs that have implicit relational connections by retaining previously extracted triples.
(3) R-BPtrNet brings significant improvements over the BPtrNet in the relation and triple F1 scores. It indicates that the RN effectively captures the relational reasoning patterns and enhances the relation type inference of implicit relations. (4) The pre-trained LMs only bring minor improvements. It proves that the effectiveness of our model comes primarily from the external memory and the introduction of relational reasoning patterns rather than from the pre-trained LMs.

Figure 4 shows the comparison of the best previous model TPLinker BERT and our R-BPtrNet BERT model on three example sentences from the implicit subset in Section 4.5. The first example contains the transitive pattern of the relation "contains". The second example contains a multi-hop relation path pattern between "Chad Hurley" and "Google" through the intermediate entity "YouTube". The third example contains a composite pattern between the siblings "Crowley" and "Edwin Edwards" with a common ancestor "Edmund". We can observe that the TPLinker BERT model fails to extract the implicit relational triples, while the R-BPtrNet BERT successfully captures various real-world reasoning patterns and effectively extracts all the implicit relational triples in the examples.

Conclusion
In this paper, we propose a unified framework to extract explicit and implicit relational triples jointly.
To discover entity pairs that may have implicit relational connections, we propose a binary pointer network to extract relational triples relevant to each word sequentially and introduce an external memory to retain the extracted triples. To derive the relation types of the implicitly connected entity pairs, we propose to introduce real-world relational reasoning patterns to this task and capture the reasoning patterns with a relation network. We conduct experiments on two benchmark datasets, and the results prove the effectiveness of our method.