Neural Open Information Extraction

Conventional Open Information Extraction (Open IE) systems are usually built on hand-crafted patterns from other NLP tools such as syntactic parsing, yet they face problems of error propagation. In this paper, we propose a neural Open IE approach with an encoder-decoder framework. Distinct from existing methods, the neural Open IE approach learns highly confident arguments and relation tuples bootstrapped from a state-of-the-art Open IE system. An empirical study on a large benchmark dataset shows that the neural Open IE system significantly outperforms several baselines, while maintaining comparable computational efficiency.


Introduction
Open Information Extraction (Open IE) involves generating a structured representation of information in text, usually in the form of triples or n-ary propositions. An Open IE system extracts not only arguments but also relation phrases from the given text, and does not rely on a pre-defined ontology schema. For instance, given the sentence "deep learning is a subfield of machine learning", the triple (deep learning; is a subfield of; machine learning) can be extracted, where the relation phrase "is a subfield of" indicates the semantic relationship between the two arguments. Open IE plays a key role in natural language understanding and fosters many downstream NLP applications such as knowledge base construction, question answering, and text comprehension. Open IE was first introduced with TEXTRUNNER (Banko et al., 2007), followed by several popular systems such as REVERB, OLLIE (Mausam et al., 2012), ClausIE (Del Corro and Gemulla, 2013), Stanford OPENIE (Angeli et al., 2015), PropS, and most recently OPENIE4 (Mausam, 2016) and OPENIE5. Although these systems have been widely used in a variety of applications, most of them were built on hand-crafted patterns over syntactic parses, so parsing errors propagate and compound at each stage (Banko et al., 2007; Gashteovski et al., 2017; Schneider et al., 2017). It is therefore essential to address these cascading errors in order to reduce the number of incorrect tuples extracted.
To this end, we propose a neural Open IE approach with an encoder-decoder framework. The encoder-decoder framework is a text generation technique that has been successfully applied to many tasks, such as machine translation (Sutskever et al., 2014; Wu et al., 2016; Gehring et al., 2017; Vaswani et al., 2017), image captioning, abstractive summarization (Rush et al., 2015; See et al., 2017) and, recently, keyphrase extraction (Meng et al., 2017). Generally, the encoder encodes the input sequence into an internal representation called the "context vector", which the decoder uses to generate the output sequence. The input and output sequences can differ in length, as there is no one-to-one correspondence between them. In this work, Open IE is cast as a sequence-to-sequence generation problem, where the input sequence is the sentence and the output sequence is the tuples with special placeholders. For instance, given the input sequence "deep learning is a subfield of machine learning", the output sequence will be "<arg1> deep learning </arg1> <rel> is a subfield of </rel> <arg2> machine learning </arg2>". We obtain the input and output sequence pairs from highly confident tuples bootstrapped from a state-of-the-art Open IE system. Experimental results on a large benchmark dataset show that the neural Open IE approach is significantly better than the other systems in precision and recall, while also reducing the dependency on other NLP tools.
The contributions of this paper are threefold. First, the encoder-decoder framework learns the sequence-to-sequence task directly, bypassing hand-crafted patterns and alleviating error propagation. Second, a large number of high-quality training examples can be bootstrapped from state-of-the-art Open IE systems, and we release them for future research. Third, we conduct comprehensive experiments on a large benchmark dataset to compare different Open IE systems and show the neural approach's promising potential.
Figure 1: The encoder-decoder model architecture for the neural Open IE system
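The sequence-to-sequence formulation described above can be illustrated with a small sketch that turns a binary tuple into a target sequence and recovers tuples from a generated sequence. The function names (`to_target`, `parse_target`) and the angle-bracket spelling of the placeholders are assumptions made for illustration, not the authors' code.

```python
import re

def to_target(arg1, rel, arg2):
    """Serialize one binary tuple as a placeholder-delimited target sequence."""
    return f"<arg1> {arg1} </arg1> <rel> {rel} </rel> <arg2> {arg2} </arg2>"

# Assumed placeholder spelling; one match per (arg1, rel, arg2) group.
_TUPLE_PATTERN = re.compile(
    r"<arg1>\s*(.*?)\s*</arg1>\s*"
    r"<rel>\s*(.*?)\s*</rel>\s*"
    r"<arg2>\s*(.*?)\s*</arg2>"
)

def parse_target(sequence):
    """Recover every (arg1, rel, arg2) tuple found in a generated sequence."""
    return [match.groups() for match in _TUPLE_PATTERN.finditer(sequence)]

target = to_target("deep learning", "is a subfield of", "machine learning")
triples = parse_target(target)
```

Parsing with a regular expression keeps malformed decoder outputs from producing partial tuples: a sequence with a missing closing placeholder simply yields no match.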

Problem Definition
Let $(X, Y)$ be a pair consisting of a sentence and its tuples, where $X = (x_1, x_2, \ldots, x_m)$ is the word sequence and $Y = (y_1, y_2, \ldots, y_n)$ is the tuple sequence extracted from $X$. The conditional probability $P(Y \mid X)$ can be decomposed as:
$$P(Y \mid X) = \prod_{i=1}^{n} p(y_i \mid y_1, y_2, \ldots, y_{i-1}; X)$$
In this work, we only consider binary extractions from sentences, leaving n-ary extractions and nested extractions for future research. In addition, we ensure that both the argument and relation phrases are sub-spans of the input sequence. Therefore, the output vocabulary equals the input vocabulary plus the placeholder symbols.
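As a minimal sketch of the two points above, the following computes the log of the decomposed probability from per-step probabilities and checks the sub-span constraint on tokenized phrases. The helper names and the example probabilities are hypothetical and chosen only for illustration.

```python
import math

def sequence_log_prob(step_probs):
    """log P(Y|X) as the sum of log p(y_i | y_1..y_{i-1}; X) over output steps."""
    return sum(math.log(p) for p in step_probs)

def is_subspan(phrase, sentence):
    """True if the tokens of `phrase` occur contiguously in `sentence`."""
    n = len(phrase)
    return any(sentence[i:i + n] == phrase
               for i in range(len(sentence) - n + 1))

sentence = "deep learning is a subfield of machine learning".split()
log_p = sequence_log_prob([0.5, 0.5, 0.25])  # illustrative per-step values
```

Working in log space avoids numerical underflow when the output sequence is long.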

Encoder-Decoder Model Architecture
The encoder-decoder framework takes a variable length input sequence to a compressed representation vector that is used by the decoder to generate the output sequence. In this work, both the encoder and decoder are implemented using Recurrent Neural Networks (RNN) and the model architecture is shown in Figure 1.
The encoder uses a 3-layer stacked Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) network to convert the input sequence $X = (x_1, x_2, \ldots, x_m)$ into a set of hidden representations $h = (h_1, h_2, \ldots, h_m)$, where each hidden state is obtained iteratively as follows:
$$h_t = \mathrm{LSTM}(x_t, h_{t-1})$$
The decoder also uses a 3-layer LSTM network to accept the encoder's output and generate a variable-length sequence $Y$ as follows:
$$s_t = \mathrm{LSTM}(y_{t-1}, s_{t-1}, c)$$
where $s_t$ is the hidden state of the decoder LSTM at time $t$ and $c$ is the context vector introduced below. We use a softmax layer to calculate the output probability of $y_t$ and select the word with the largest probability.
An attention mechanism is vital for the encoder-decoder framework, especially for our neural Open IE system, because both the arguments and the relations are sub-spans of the input sequence. We leverage the attention method proposed by Bahdanau et al. to calculate the context vector $c$ as follows:
$$c_i = \sum_{j=1}^{m} \alpha_{ij} h_j, \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{m} \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j)$$
where $a$ is an alignment model that scores how well the inputs around position $j$ and the output at position $i$ match, measured by the encoder hidden state $h_j$ and the decoder hidden state $s_{i-1}$. The encoder and decoder are jointly optimized to maximize the log probability of the output sequence conditioned on the input sequence.
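The attention step can be sketched in plain Python as follows, assuming the alignment scores for one decoding step have already been produced by the alignment model. The scoring network itself is not reproduced here, and the function names are illustrative.

```python
import math

def softmax(scores):
    """Normalize alignment scores into attention weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(scores, encoder_states):
    """Return the attention weights and the weighted sum of encoder states."""
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Two encoder positions with 2-dimensional hidden states (toy values).
states = [[1.0, 0.0], [0.0, 1.0]]
uniform_weights, uniform_context = context_vector([0.0, 0.0], states)
peaked_weights, peaked_context = context_vector([10.0, 0.0], states)
```

When one score dominates, the context vector is close to the corresponding encoder state, which is what lets the decoder focus on the input span it is currently reproducing.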

Copying Mechanism
Since most encoder-decoder methods maintain a fixed vocabulary of frequent words and convert the many long-tail words into a special symbol "<unk>", the copying mechanism (Gu et al., 2016; Gulcehre et al., 2016; See et al., 2017; Meng et al., 2017) is designed to copy words from the input sequence to the output sequence, thus enlarging the vocabulary and reducing the proportion of generated unknown words. For the neural Open IE task, the copying mechanism is even more important because the output vocabulary comes directly from the input vocabulary, except for the placeholder symbols. We simplify the copying method of See et al. (2017), so that the probability of generating the word $y_t$ comes from two parts as follows:
$$p(y_t \mid y_1, \ldots, y_{t-1}; X) = \begin{cases} p_g(y_t \mid y_1, \ldots, y_{t-1}; X) & \text{if } y_t \in V \\ \sum_{i:\, x_i = y_t} \alpha_{ti} & \text{otherwise} \end{cases}$$
where $V$ is the target vocabulary, $p_g$ is the generation probability given by the decoder's softmax layer, and $\alpha_{ti}$ is the attention weight on input position $i$ at decoding step $t$. We combine the sequence-to-sequence generation and attention-based copying to derive the final output.
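The two-part probability can be sketched as below. This is one reading of the description, under the assumption that in-vocabulary words take the decoder's softmax probability and out-of-vocabulary words take the summed attention weight over the input positions where they occur. The function name and the toy numbers are hypothetical.

```python
def word_probability(word, generation_probs, attention, source_tokens):
    """Probability of emitting `word` at one decoding step.

    generation_probs: dict mapping target-vocabulary words to softmax values.
    attention: attention weights over the source positions for this step.
    source_tokens: the input sentence as a list of tokens.
    """
    if word in generation_probs:
        # In-vocabulary word: generated by the decoder's softmax layer.
        return generation_probs[word]
    # Out-of-vocabulary word: copied, using attention mass on matching tokens.
    return sum(a for a, tok in zip(attention, source_tokens) if tok == word)

source = ["Zyzzyva", "is", "Zyzzyva"]
attention = [0.2, 0.5, 0.3]
generation = {"is": 0.6, "<unk>": 0.1}

p_copy = word_probability("Zyzzyva", generation, attention, source)
p_generate = word_probability("is", generation, attention, source)
```

A rare word that appears in the sentence can therefore still be emitted, instead of collapsing to the unknown-word symbol.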

Data
For the training data, we used the Wikipedia dump 20180101 and extracted all sentences of 40 words or fewer. OPENIE4 was used to analyze the sentences and extract all tuples with binary relations. To obtain high-quality tuples, we kept only the tuples whose confidence score is at least 0.9. In total, 36,247,584 (sentence, tuple) pairs were extracted. The training data is released for public use at https://1drv.ms/u/s!ApPZx_TWwibImHl49ZBwxOU0ktHv.
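The filtering steps above can be sketched as follows. The `Extraction` record is a hypothetical container and does not reflect OPENIE4's actual output format; only the two thresholds (40 words, confidence 0.9) come from the text.

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    """Hypothetical record for one binary extraction of a sentence."""
    sentence: str
    arg1: str
    rel: str
    arg2: str
    confidence: float

def keep(extraction, max_words=40, min_confidence=0.9):
    """Apply the sentence-length and confidence thresholds."""
    short_enough = len(extraction.sentence.split()) <= max_words
    return short_enough and extraction.confidence >= min_confidence

def filter_pairs(extractions):
    """Keep only the extractions that pass both thresholds."""
    return [ex for ex in extractions if keep(ex)]

good = Extraction("deep learning is a subfield of machine learning",
                  "deep learning", "is a subfield of", "machine learning", 0.95)
low_confidence = Extraction(good.sentence, good.arg1, good.rel, good.arg2, 0.5)
too_long = Extraction(" ".join(["w"] * 41), "a", "b", "c", 0.99)
kept = filter_pairs([good, low_confidence, too_long])
```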
For the test data, we used a large benchmark dataset that contains 3,200 sentences with 10,359 extractions. We compared against several state-of-the-art baselines, including OLLIE, ClausIE, Stanford OPENIE, PropS and OPENIE4. The evaluation metrics are precision and recall.

Parameter Settings
We implemented the neural Open IE model using OpenNMT (Klein et al., 2017), which is an open source encoder-decoder framework. We used 4 M60 GPUs for parallel training, which took 3 days. The encoder is a 3-layer bidirectional LSTM and the decoder is another 3-layer LSTM. Our model has 256-dimensional hidden states and 256-dimensional word embeddings. A vocabulary of 50k words is used for both the source and target sides. We optimized the model with SGD with an initial learning rate of 1. We trained the model for 40 epochs and started learning rate decay from the 11th epoch with a decay rate of 0.7. The dropout rate is set to 0.3. We split the data into 20 partitions and used data sampling in OpenNMT to train the model. This shortens each epoch, allowing more frequent learning rate updates and validation perplexity computation.
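The learning rate schedule described above can be written out as a short sketch. It assumes the rate is multiplied by the decay factor once per epoch, with the 11th epoch being the first to run at a decayed rate; the exact epoch boundary used by the toolkit may differ, and the function name is illustrative.

```python
def learning_rate(epoch, initial=1.0, decay=0.7, start_decay_epoch=11):
    """Learning rate in effect during the given 1-indexed epoch.

    Assumption: one multiplicative decay per epoch, starting with
    `start_decay_epoch` as the first epoch run at a decayed rate.
    """
    decays_applied = max(0, epoch - start_decay_epoch + 1)
    return initial * decay ** decays_applied

# Rates for all 40 training epochs under this assumption.
schedule = [learning_rate(epoch) for epoch in range(1, 41)]
```

Under this reading the rate stays at 1 for the first ten epochs and then shrinks geometrically, so most of the later epochs make only small parameter updates.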

Results
We used the benchmark's evaluation script to compute the precision and recall of the baseline systems and of the neural Open IE system. The precision-recall curves are shown in Figure 2. The neural Open IE system performs best among all tested systems. We also calculated the area under the precision-recall curve (AUC) for each system. The neural Open IE system with top-5 outputs achieves the best AUC score of 0.473, which is significantly better than the other systems. Although the neural Open IE model is learned from bootstrapped OPENIE4 extractions, only 11.4% of its extractions agree with OPENIE4's, while its AUC score is even better than OPENIE4's. We believe this is because the neural approach learns arguments and relations across a large number of highly confident training instances. This also indicates that the neural approach generalizes better than previous methods. We observed many cases in which the neural Open IE system correctly identifies the boundaries of arguments where OPENIE4 does not, for instance:

Input: Instead , much of numerical analysis is concerned with obtaining approximate solutions while maintaining reasonable bounds on errors .
Gold: much of numerical analysis ||| concerned ||| with obtaining approximate solutions while maintaining reasonable bounds on errors
OpenIE4: much of numerical analysis ||| is concerned with ||| obtaining approximate solutions
Neural Open IE: much of numerical analysis ||| is concerned ||| with obtaining approximate solutions while maintaining reasonable bounds on errors

This case illustrates that the neural approach is less constrained by hand-crafted patterns built on other NLP tools. It therefore reduces the error propagation effect and performs better than other systems, especially on long sentences.
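The area under the precision-recall curve reported earlier in this section can be approximated from a list of (recall, precision) points. The sketch below assumes trapezoidal integration, which may differ from what the evaluation script does, and the points are made up for illustration rather than taken from the experiments.

```python
def pr_auc(points):
    """Trapezoidal area under a curve given as (recall, precision) points."""
    ordered = sorted(points)  # integrate from low recall to high recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(ordered, ordered[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area

# Illustrative curves only; not measurements from any system.
falling_curve = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
flat_curve = [(0.5, 0.8), (0.0, 0.8)]
falling_area = pr_auc(falling_curve)
flat_area = pr_auc(flat_curve)
```

Because the area only covers the recall range a system actually reaches, a system that stops at low recall is penalized even if its precision is high there.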
We also investigated the computational cost of the different systems. For the baseline systems, we obtained the Open IE extractions using a Xeon 2.4 GHz CPU. For the neural Open IE system, we evaluated performance on an M60 GPU. The running time was measured by extracting Open IE tuples from the test dataset, which contains a total of 3,200 sentences. The results are shown in Table 1. Among the aforementioned conventional systems, OLLIE is the most efficient, taking around 160s to finish the extraction. Using a GPU, the neural approach takes 172s to extract the tuples from the test data, which is comparable to the conventional systems.

Related Work
Open IE systems have developed rapidly over the past decade (Mausam, 2016). Open IE was introduced with TEXTRUNNER (Banko et al., 2007), the first-generation system, which casts argument and relation extraction as a sequential labeling problem. The system is highly scalable and extracts facts from large-scale web content. REVERB improved over TEXTRUNNER with syntactic and lexical constraints on binary relations expressed by verbs, which more than doubles the area under the precision-recall curve. Following these efforts, the second generation, known as R2A2, was developed based on REVERB and an argument identifier, ARGLEARNER, to better extract the arguments for the relation phrases. The first- and second-generation Open IE systems extract only relations that are mediated by verbs and ignore context. To alleviate these limitations, the third-generation system OLLIE (Mausam et al., 2012) was developed, which achieves better performance by extracting relations mediated by nouns, adjectives, and more. In addition, contextual information is leveraged to improve the precision of extractions. All three generations consider only binary extractions from the text, but binary extractions are not always sufficient to represent the semantics. Therefore, SRLIE (Christensen et al., 2010) was developed to include an attribute context with a tuple when it is available. OPENIE4 was built on SRLIE together with RELNOUN (Pal and Mausam, 2016), a rule-based system for extracting noun-mediated relations. Recently, OPENIE5 improved extractions from numerical sentences (Saha et al., 2017) and broke up conjunctions in arguments to generate multiple extractions. During this period, other Open IE systems also emerged and were successfully applied in different scenarios, such as ClausIE (Del Corro and Gemulla, 2013), Stanford OPENIE (Angeli et al., 2015), PropS, and more.
The encoder-decoder framework was introduced by Cho et al. and Sutskever et al., where a multi-layered LSTM/GRU is used to map the input sequence to a vector of fixed dimensionality, and another deep LSTM/GRU then decodes the target sequence from the vector. Bahdanau et al. and Luong et al. further improved the encoder-decoder framework by integrating an attention mechanism so that the model can automatically find the parts of a source sentence that are relevant to predicting a target word. To improve the parallelization of model training, the convolutional sequence-to-sequence (ConvS2S) framework (Gehring et al., 2016, 2017) was proposed to fully parallelize training, since the number of non-linearities is fixed and independent of the input length. Recently, the transformer framework (Vaswani et al., 2017) further improved over the vanilla S2S model and ConvS2S in both accuracy and training time.
In this paper, we use the LSTM-based S2S approach to obtain binary extractions for the Open IE task. To the best of our knowledge, this is the first time that the Open IE task is addressed using an end-to-end neural approach, bypassing the handcrafted patterns and alleviating error propagation.

Conclusion and Future Work
We proposed a neural Open IE approach using an encoder-decoder framework. The neural Open IE model is trained with highly confident binary extractions bootstrapped from a state-of-the-art Open IE system, so it can generate high-quality tuples without any hand-crafted patterns from other NLP tools. Experiments show that our approach achieves very promising results on a large benchmark dataset.
For future research, we will further investigate how to generate more complex tuples, such as n-ary extractions and nested extractions, with the neural approach. Moreover, other frameworks such as convolutional sequence-to-sequence and transformer models could be applied to achieve better performance.