IRCMS at SemEval-2018 Task 7: Evaluating a Basic CNN Method and Traditional Pipeline Method for Relation Classification

This paper presents our participation in sub-task 1 (1.1 and 1.2) of SemEval 2018 task 7: Semantic Relation Extraction and Classification in Scientific Papers (Gábor et al., 2018). We experimented on this task with two methods: a CNN method and a traditional pipeline method. For both methods, we use only the context between the two entities (entities included) as input, which greatly reduces the effect of noise. For the CNN method, we construct a simple convolutional neural network that automatically learns features from raw text without any manual processing, and use the softmax function to classify each entity pair into a specific relation category. For the traditional pipeline method, we use the Hackabout method as a representative, which is described in Section 3.5. The CNN method's results are much better than the traditional pipeline method's (49.1% vs. 42.3% and 71.1% vs. 54.6%).


Introduction
Scientific papers, as a major source of new technology, are a common way to trace the dynamics of a research domain. With a large number of papers published every year, scholars cannot read all of them to extract the aspects useful for their research domain. Information extraction (IE) is a major NLP approach for analyzing scientific papers, and includes named entity recognition (NER) and relation extraction (RE). Information extraction from scientific papers aims to identify concepts and the semantic relations between them. This paper focuses on the classification of relations between related concepts in scientific papers.
Relation classification is one of the most important topics in analyzing scientific papers. Most traditional relation classification methods depend on handcrafted features or external NLP tools to derive lexical features (Surdeanu et al., 2012; Kozareva, 2012). However, these methods are time-consuming and suffer from error propagation. Additionally, traditional semantic textual similarity approaches use a large number of pairwise similarity features to represent the text, and it is difficult for such features to capture syntactic information.
To address these problems, DNN methods have been proposed and have achieved remarkable results (Qin et al., 2016a; Guo et al., 2016). This paper builds on the work of Qin et al. (2016b), which uses a CNN architecture to learn features automatically. As a result, Qin et al. (2016b) minimize the use of external toolkits and resources for part-of-speech (POS) tagging or other basic preprocessing. Additionally, Zeng et al. (2014) proposed a position feature to locate the entity pair and highlight its contribution to the semantic relation. This position feature is mapped to a few (e.g. 5) dimensions appended to each word's vector (e.g. 100 dimensions), and represents the relative distances of the current word to the first and second entities. However, this position feature can vanish because of excessive training iterations or error propagation during the training procedure. Thus we use the entity tag features of Qin et al. (2016b) to strengthen the entity pair information: the tag words ( e1s , e1e , e2s , e2e ) mark the start and end positions of the entities. Moreover, these tag features are represented as independent vectors, which avoids the vanishing defect of the position feature in Zeng et al. (2014).
As far as we know, most previous DNN methods used the entire sentence's word embeddings as input for feature extraction in relation classification (Xu et al.; Liu et al., 2015). However, our goal is relation classification rather than sentence classification. Even though the position feature or entity tag feature highlights the entities, long sentences still contain many noise words that are useless for relation classification. In previous work, Qin et al. (2016b) used only the context between the two entities and obtained a remarkable performance improvement. Thus, we adopt the context scope of Qin et al. (2016b) as our CNN's input.
The contributions of this paper can be summarized as follows. First, we construct a simple convolutional neural network architecture for relation classification without sophisticated NLP preprocessing. Second, we use a more effective context input for the network, which greatly reduces the noise from irrelevant context. Third, we use entity tag features in place of the entity position feature. Finally, we conduct experiments on the subtask 1.1 and 1.2 datasets, and the results show that the proposed approaches help improve performance.

Methodology
Our relation classification architecture is depicted in Figure 1. First, we select the scope of context words and convert them to word embeddings. In the word representation step, the entity tag features ( e1s , e1e , e2s , e2e ) are also encoded into embeddings. Then, all the embeddings are fed to three convolutional networks with kernel sizes 3, 4 and 5. Finally, the three convolution outputs are pooled into vectors of the same dimension, which are concatenated as the input of a softmax classifier.
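The word-representation step can be sketched as follows. This is a minimal illustration, not the actual implementation: the dimensions match the paper's 300-d embeddings, but the tiny vocabulary and random initialization are placeholders for the pretrained table. The key point is that the four tag words receive their own independent vectors, just like ordinary words.

```python
import numpy as np

# Minimal sketch of the word-representation step (vocabulary and random
# initialization are illustrative; the real system uses pretrained vectors).
rng = np.random.default_rng(0)
DIM = 300
pretrained = {w: rng.normal(size=DIM) for w in ("method", "are", "equivalent")}
# Each entity tag gets its own independent (trainable) embedding vector.
tag_vecs = {t: rng.normal(size=DIM) for t in ("e1s", "e1e", "e2s", "e2e")}

def embed(tokens):
    unk = np.zeros(DIM)  # fallback for out-of-vocabulary words
    return np.stack([tag_vecs.get(t, pretrained.get(t, unk)) for t in tokens])

X = embed(["e1s", "method", "e1e", "are", "equivalent"])
print(X.shape)  # (5, 300)
```

Because the tag vectors are looked up independently of the word vectors, they cannot be washed out during training the way an appended position offset can.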

Context Scope for Convolution Neural Network
Most existing DNN relation classification methods use the entire sentence's word embeddings as context information. Consider the following sentence, in which the entities bag-of-words method and segment order-sensitive methods hold a Compare relation: Further, in their optimum configuration, e1s bag-of-words method e1e are equivalent to e2s segment order-sensitive methods e2e in terms of retrieval accuracy, but much faster. The sub-sentences "Further, in their optimum configuration" and "in terms of retrieval accuracy, but much faster" have little relevance to the target relation category. In contrast, the sub-sentence "e1s bag-of-words method e1e are equivalent to e2s segment order-sensitive methods e2e" carries the information most relevant to the target relation. As a result, we extract only the context between the two entities for relation classification. When there are no words between the entities, only the two entity names are extracted.
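The context-scope extraction described above can be sketched as a simple slice between the first and last entity tags (a minimal sketch over whitespace-tokenized input; the tag token names follow the tagging scheme above):

```python
def context_scope(tokens):
    """Keep only the span from e1s through e2e (entities included),
    discarding the words outside the entity pair."""
    start = tokens.index("e1s")
    end = tokens.index("e2e")
    return tokens[start:end + 1]

sentence = ("Further , in their optimum configuration , e1s bag-of-words "
            "method e1e are equivalent to e2s segment order-sensitive "
            "methods e2e in terms of retrieval accuracy , but much faster").split()
print(" ".join(context_scope(sentence)))
# e1s bag-of-words method e1e are equivalent to e2s segment order-sensitive methods e2e
```

Only the tagged span reaches the CNN, so the irrelevant head and tail of the sentence contribute no noise.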

Convolution Neural Network Architecture
As shown in the Convolution part of Figure 1, the input embeddings are delivered to three convolutional networks with kernel sizes 3, 4 and 5, respectively, so that all 3-gram, 4-gram and 5-gram features are considered. Since each input sentence has a different length, the convolution outputs are pooled into vectors of the same dimension. Finally, we use a softmax multiclass classifier to assign the relation to a specific category.
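A numpy sketch of this forward pass is shown below. It is an illustration under assumed sizes (50-d embeddings, 8 filters per kernel size, 6 relation classes), not the trained model: in the actual system the filters and softmax weights are learned by backpropagation, and pooling over time makes the output size independent of sentence length.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, N_FILTERS, N_CLASSES = 50, 8, 6  # illustrative sizes, not the paper's

# One 1-D filter bank per kernel size (3, 4, 5), covering 3/4/5-gram features.
filters = {k: rng.normal(scale=0.1, size=(N_FILTERS, k * DIM)) for k in (3, 4, 5)}
W = rng.normal(scale=0.1, size=(3 * N_FILTERS, N_CLASSES))  # softmax weights

def forward(X):
    """X: (seq_len, DIM) sentence embedding matrix -> class probabilities."""
    pooled = []
    for k, F in filters.items():
        # Slide a width-k window over the sentence and apply the filter bank.
        windows = np.stack([X[i:i + k].ravel() for i in range(len(X) - k + 1)])
        conv = np.maximum(windows @ F.T, 0.0)  # ReLU feature maps
        pooled.append(conv.max(axis=0))        # max-over-time pooling
    z = np.concatenate(pooled) @ W             # concatenate, project to classes
    e = np.exp(z - z.max())                    # numerically stable softmax
    return e / e.sum()

p = forward(rng.normal(size=(12, DIM)))  # a random 12-token "sentence"
print(p.shape)  # (6,)
```

Max pooling collapses each feature map to its strongest activation, so a 12-token and a 40-token sentence both yield the same 24-dimensional vector before the softmax.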

Parameter Settings
The experiment settings are listed in Table 2.2. We use general English Wikipedia 300-dimensional embeddings, which cover 408 million words. After testing, we found that the parameters in Table 2.2 achieve the best performance. As there are many parameters in a CNN, we list only the primary ones in Table 2.2. For more details, we will share the whole project on our GitHub.

Effect of Position Feature
As described in the previous sections, the position feature helps improve classification performance. Moreover, results in

Effect of New Context Scope
Comparing the results in the 1st and 4th lines, we can conclude that the entity names and the words between them contain more accurate and cleaner semantic relation information. By analyzing the relations predicted in the two experiments, we find that many long-sentence instances wrongly predicted with the entire-sentence scope (scope1) were corrected with the between-entity-pair scope (scope2), as shown in Table 4.

Result of Rule-based Experiment
In our early research, we explored many heuristic rules (as in Table 3.5) for each category by observing the provided training set manually. However, the category of most sentences could not be determined by these rules. Thus, we divided the task into a two-step method: in the first step, we use the heuristic rules to classify some sentences into a specific category; in the second step, we use our CNN method to classify the remaining sentences. Before the test set was released, the heuristic rules achieved a remarkable improvement on the development set. On the contrary, the heuristic rules' filtering step drops the performance on the test set, as shown by the 7th line's result in Table 2. After analyzing the results, we noticed that the rules overfit, since all of them were derived from the training set and tuned on the development set. As a result, this caused the decrease in the final evaluation performance on the test set.

Table 4: Example of scope comparison. Sentence: "We present an implementation of the model based on finite-state models, demonstrate the e1s model's e1e ability to significantly reduce e2s character and word error rate e2e , and provide evaluation results involving extraction of translation." Prediction with scope1: Compare (False); with scope2: Result (True).
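The two-step method can be sketched as a rule pass with a CNN fallback. The keyword patterns below are invented placeholders, not the actual rules of Table 3.5, and `cnn_predict` stands in for the trained CNN classifier:

```python
# Hypothetical example rules; the real Table 3.5 rules are not reproduced here.
RULES = [
    ("compared to", "COMPARE"),
    ("is made of", "PART_WHOLE"),
]

def classify(sentence, cnn_predict):
    """Step 1: heuristic keyword rules; step 2: fall back to the CNN
    (here stubbed by the cnn_predict callable) for uncovered sentences."""
    low = sentence.lower()
    for pattern, label in RULES:
        if pattern in low:
            return label
    return cnn_predict(sentence)

print(classify("Our model is compared to the baseline", lambda s: "USAGE"))  # COMPARE
print(classify("We apply the parser to newswire", lambda s: "USAGE"))        # USAGE
```

This structure also explains the overfitting observed above: the rule list is fixed by hand from the training data, so any pattern that does not generalize silently overrides the CNN on test sentences.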

Results of Comparison Experiment
To further demonstrate the better performance of our CNN relation classification method, we also evaluated the same dataset with a more traditional NLP method based on a Multinomial Naive Bayes classifier. First, the data is extracted from the training text file. The labels are extracted and encoded using a LabelEncoder. All the words from e1 to e2 in a sentence are used to train the Multinomial Naive Bayes classifier, and these words are also lemmatized and stemmed for better prediction. However, the traditional pipeline method not only suffers from the error propagation problem, but also cannot detect some complicated semantic information such as hyponymy or synonymy. As a result, the CNN method performs better than the traditional method: the 8th line's result in Table 2 shows that our CNN method outperforms the general NLP processing method.
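A self-contained toy version of this baseline is sketched below. It is not the actual pipeline: the crude suffix stripper stands in for real lemmatization/stemming, the two training sentences and labels are invented, and the hand-rolled classifier replaces the library Naive Bayes and LabelEncoder for illustration only.

```python
import math
from collections import Counter, defaultdict

def stem(word):
    """Crude suffix stripper, standing in for proper lemmatization/stemming."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

class ToyMultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing over stemmed words."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # per-class word counts
        self.priors = Counter(labels)       # per-class document counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = [stem(w) for w in doc.lower().split()]
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = [stem(w) for w in doc.lower().split()]
        vocab_size, best = len(self.vocab), None
        for label, prior in self.priors.items():
            total = sum(self.counts[label].values())
            logp = math.log(prior)
            for t in tokens:  # smoothed per-word log-likelihoods
                logp += math.log((self.counts[label][t] + self.alpha)
                                 / (total + self.alpha * vocab_size))
            if best is None or logp > best[0]:
                best = (logp, label)
        return best[1]

# Invented between-entity word spans and labels, for illustration only.
nb = ToyMultinomialNB().fit(
    ["model compared with baseline", "we used the parser for tagging"],
    ["COMPARE", "USAGE"],
)
print(nb.predict("compared with the model"))
```

Because the model only counts (stemmed) words between the entities, it has no way to represent word order or syntax, which is one reason it trails the CNN on relations like hyponymy.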

Conclusion and Future Work
In this paper, we propose a new convolutional neural network architecture for relation classification in scientific papers. We showed that the words between the entity pair are the most important for relation classification. Our proposed method achieves a macro-F1 of 49.1 for subtask 1.1 and 71.1 for subtask 1.2. In future work, we will explore more features that are helpful for relation classification, such as entity type and preposition features. Moreover, we will explore a more flexible sub-sentence scope as the context information for relation classification.