JAIST: A two-phase machine learning approach for identifying discourse relations in newswire texts

In this paper, we present a machine learning approach for identifying shallow discourse relations in newswire text. Our approach has two phases. The argument detection phase identifies arguments and explicit connectives using the Conditional Random Fields (CRFs) learning algorithm with a set of features such as words, parts of speech (POS), and features extracted from the syntactic parse trees of sentences. The second phase, the sense classification phase, classifies arguments and explicit connectives into one of fifteen sense types using the SMO classifier with a simple feature set. The performance of the system was evaluated on three different data sets provided by the CoNLL 2015 Shared Task. Our parser was ranked 4th of 16 participating systems on F-measure when evaluated on the blind data set (strict matching).


Introduction
The shallow discourse parsing task of the CoNLL 2015 Shared Task, proposed by Xue et al. (2015), aims to extract discourse relations from newswire texts. Each discourse relation is a quadruple: two arguments, a connective, and a sense. However, the connective may be absent in the case of implicit discourse relations. Identifying discourse relations is clearly an important part of natural language understanding that benefits a wide range of natural language applications, and a number of applications of discourse information have been proposed in recent years. For example, Bach et al. (2014) used discourse information to compute the similarity score between two sentences in the task of identifying paraphrase texts, and Somasundaran et al. (2009) used discourse relations to improve the performance of opinion polarity classification.
In the past, this task has been addressed at different levels. Lin et al. (2009) used a supervised learning method to build a maximum entropy classifier for identifying implicit relations. Ghosh et al. (2011, 2012) used CRFs with a set of local and global features to recognize the arguments of discourse relations in texts. However, in contrast to the CoNLL 2015 SDP Shared Task, Ghosh et al. (2011, 2012) considered only explicit relations for which the explicit connectives were already provided.
Our approach to this shared task comprises two phases. In the first phase, we use CRFs with a set of features such as words, POS, and pattern features based on the syntactic parse trees of sentences to build models for recognizing arguments and connectives. In the second phase, we use the SMO algorithm, an optimization method for training SVMs, to build a classifier that predicts the senses of discourse relations.
The remainder of this paper is structured as follows: Section 2 describes the details of the proposed system for the shallow discourse relation identification task of the CoNLL 2015 Shared Task. Section 3 presents our experimental results and analysis. Finally, Section 4 presents our conclusions and future work.

System description
Our parser is divided into two phases. First, documents without discourse information are passed through the argument detection phase, which recognizes the components of discourse relations: both arguments and, where possible, explicit connectives. Second, the sense classification phase identifies the sense of each discourse relation using an SVM classifier and then formats the results according to the output expected by the evaluation system.

Figure 1. Workflow of the argument detection phase
The workflow of the first phase consists of two stages. In the training stage, we use machine learning algorithms to build models, which are then used to identify the boundaries of relation components in the parsing stage. To learn these models, we use common features such as words and parts of speech. In addition, we extract a set of pattern features based on the syntactic parse trees of sentences.
According to our analysis of discourse relations, the two arguments of a discourse relation may appear in different positions: in the same sentence, in two consecutive sentences, or in sentences far apart. Statistics over the training data set show that discourse relations whose two arguments are located in the same sentence or in two consecutive sentences account for a large majority (92.5%). Therefore, our system focuses on these kinds of discourse relations by building two models: one for recognizing discourse relations within the same sentence (SS) and another for recognizing discourse relations across two consecutive sentences (2CS).
To build learning models with ML algorithms, we extract features from the data set as input to the learning algorithm. Each type of discourse relation (SS-type or 2CS-type) has some common features and some type-specific features. Table 1 describes all features used in our experiments. After all required features are extracted, the training data and features are formatted as input to the machine learning tool, in which the words of discourse relations are labeled using IOB notation. We use CRF++ (Kudo, 2005), an implementation of Conditional Random Fields (Lafferty et al., 2001), to train models from the training data sets.
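As a concrete illustration, the IOB-formatted CRF++ input might look like the following minimal sketch. The label scheme (B-Arg1/I-Arg1/B-Conn/B-Arg2/I-Arg2/O) and the two feature columns shown here are our own simplification; the actual system uses many more feature columns.

```python
def to_crfpp_columns(tokens):
    """Format one sentence as CRF++ training input: one token per line,
    tab-separated feature columns with the gold IOB label last, and a
    blank line as the sentence separator expected by CRF++.

    `tokens` is a list of (word, pos, iob_label) tuples (hypothetical schema).
    """
    lines = []
    for word, pos, label in tokens:
        lines.append("\t".join([word, pos, label]))
    lines.append("")  # blank line ends the sentence
    return "\n".join(lines)

sent = [("If", "IN", "B-Conn"),
        ("prices", "NNS", "B-Arg1"),
        ("fall", "VBP", "I-Arg1"),
        (",", ",", "O"),
        ("demand", "NN", "B-Arg2"),
        ("rises", "VBZ", "I-Arg2")]
print(to_crfpp_columns(sent))
```

A CRF++ feature template file then refers to these columns by index to generate the actual unigram and bigram features used in training.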
After the models are built, they are used to predict the discourse labels of new documents in the parsing stage, and the results are then converted into the expected format.
Sections 2.1.1 and 2.1.2 describe the details of all features used in our experiments.

Common features:
• Popular language features (A-C): words, their parts of speech, and their stems.
• Connective features (D): These features indicate whether or not a word belongs to a predefined connective list. The connective lists are constructed from the connective words in the training data set and are then used to extract this feature when building the model.
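A minimal sketch of this lookup, assuming a small hand-made connective list (the real lists are harvested from the training annotations) and handling multiword connectives by matching lowercased n-grams around each token:

```python
# Hypothetical connective list; the actual lists come from the training data.
CONNECTIVES = {"because", "but", "however", "if", "although",
               "for example", "in addition", "as a result"}

def connective_feature(tokens, i, max_len=3):
    """Return 1 if the token at position i lies inside some n-gram
    (n <= max_len) found in the predefined connective list, else 0."""
    for n in range(max_len, 0, -1):
        for start in range(max(0, i - n + 1), i + 1):
            ngram = " ".join(t.lower() for t in tokens[start:start + n])
            if ngram in CONNECTIVES:
                return 1
    return 0

tokens = "Prices rose , for example , in May".split()
print([connective_feature(tokens, i) for i in range(len(tokens))])
```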

• Brown cluster features (E):
Brown clusters, introduced by Brown et al. (1992) and prepared by Turian et al. (2010), have been successfully applied in named entity recognition tasks. In Brown clustering, words in the same cluster are semantically more similar than words in different clusters. We use the Brown cluster index of a word as a feature in the ML process.
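A toy sketch of how such a lookup might work, using a tiny hand-made cluster table in place of Turian's full file. Real Brown cluster files map each word to a binary path in the hierarchical cluster tree; the prefix lengths 4 and 6 below are illustrative choices, not the paper's:

```python
# Tiny hypothetical Brown-clustering output: word -> bit-string path.
BROWN_PATHS = {
    "but": "00101",
    "however": "00101",
    "because": "001110",
    "stock": "110100",
}

def brown_features(word, prefixes=(4, 6)):
    """Return cluster-path prefix features for a word; unseen words get
    a sentinel value. Shorter prefixes of the bit string act as coarser
    clusters, letting the CRF generalize at several granularities."""
    path = BROWN_PATHS.get(word.lower())
    if path is None:
        return {f"brown:{p}": "UNK" for p in prefixes}
    return {f"brown:{p}": path[:p] for p in prefixes}

print(brown_features("However"))
print(brown_features("unseenword"))
```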
• Noun phrase, verb phrase, and clause features (F, G): the words of a noun phrase, verb phrase, or clause are often located entirely within an argument. Moreover, the beginning of an argument often coincides with the beginning of a noun phrase, verb phrase, or clause. We extract noun phrases, verb phrases, and clauses from the syntactic parse trees of sentences.
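The phrase extraction can be sketched without a full parser by walking a PTB-style bracketed parse string; the stack-based function below is our own simplification and returns token spans for NP, VP, and SBAR constituents:

```python
def phrase_spans(ptb, targets=("NP", "VP", "SBAR")):
    """Extract (label, start, end) token spans for target constituents
    from a PTB-style bracketed parse string. Tokens are the leaf words
    in left-to-right order; end is exclusive."""
    spans, stack, tok = [], [], 0
    i = 0
    while i < len(ptb):
        ch = ptb[i]
        if ch == "(":                      # open a constituent: read its label
            j = i + 1
            while ptb[j] not in " ()":
                j += 1
            stack.append((ptb[i + 1:j], tok))
            i = j
        elif ch == ")":                    # close the innermost constituent
            label, start = stack.pop()
            if label in targets:
                spans.append((label, start, tok))
            i += 1
        elif ch == " ":
            i += 1
        else:                              # leaf word: advance token counter
            while ptb[i] not in " ()":
                i += 1
            tok += 1
    return spans

parse = "(S (NP (DT The) (NN market)) (VP (VBD fell) (ADVP (RB sharply))))"
print(phrase_spans(parse))
```

Each span can then be turned into per-token features such as "begins-NP" or "inside-VP", matching the observation that argument boundaries often coincide with phrase boundaries.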

Pattern features based on syntactic parse trees
Our analysis of the training corpus shows that the syntactic information encoded in parse trees is very important for identifying discourse relations. Sentences that express discourse relations usually follow special syntactic patterns. Therefore, by extracting features based on these patterns, the system can recognize the arguments of discourse relations more accurately.
Due to the linguistic characteristics of discourse relations in sentences, each kind of discourse relation (SS-type or 2CS-type) has a different pattern feature set. The patterns based on syntactic parse trees that we use to extract features for each type are described below.

Pattern features for SS-type discourse recognition (H, I, K): We use three patterns to help recognize the argument boundaries of SS-type discourse relations. These patterns are based on the syntactic characteristics of discourse expressions using prepositions or conjunctions such as and, but, if, although, etc. For example, the patterns S_CC_S (feature H) and SBAR_CC_SBAR (feature I) indicate S nodes whose child nodes match the pattern S(.*)CC(.*)S(.*) or SBAR(.*)CC(.*)SBAR(.*). In this case, the related S nodes or SBAR nodes may be the arguments of a discourse relation. Figure 2 shows an example sentence matching the pattern S_CC_S; in this example, the matched left and right S nodes are the arguments of a discourse relation in the training data set. The third pattern is SBAR_IN_S (feature K), which matches sentences whose SBAR node has an IN node ("if", "although", "before", "after", "though") followed by an S node. If a sentence matches this pattern, the S node is often the first argument and the rest of the sentence is often the second argument of a discourse relation. Figure 3 shows an example sentence matching the pattern SBAR_IN_S.

Pattern features for 2CS-type discourse recognition (L-O): When the arguments of a discourse relation are not located in the same sentence, the task is more difficult. To build the model for identifying 2CS discourse relations, we extract pattern-based features for each pair of sentences in the training data set from their parse trees. Our analysis of the training corpus shows that if the second sentence of a pair begins with a conjunction, an adverb, or a preposition (e.g., "for example", "by comparison") or with a noun phrase followed by an adverb (e.g., "also"), then the rightmost clause of the first sentence and the leftmost clause of the second sentence may be the arguments of a discourse relation. We use patterns L-O to extract these features.

Phase 2: Sense classification
After the arguments and explicit connectives of discourse relations are identified, we need to identify their senses. The discourse relations, still lacking sense information, are passed through a classifier whose model was trained in the training stage. This model is built using the Sequential Minimal Optimization algorithm (SMO), a fast algorithm for training support vector machines (Platt, 1998), with some simple features: the connective words, the type of discourse relation (SS or 2CS), and whether the first character of the connective is capitalized. The workflow of the sense classification phase is shown in Figure 4. We use LIBSVM (Chang and Lin, 2011), a library implementing an SMO-type algorithm, to build the model and classify discourse relations into sense categories. The trained model for the sense classification task achieves an F-score of 79.8% (Precision = 80.9%, Recall = 81.6%) under 10-fold cross validation.
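The three features can be sketched as a simple dictionary per relation; in practice such dictionaries would be one-hot encoded into sparse vectors before being fed to LIBSVM (the function below is a hypothetical simplification):

```python
def sense_features(connective, relation_type):
    """Build the simple feature set used for sense classification:
    the connective string (lowercased), the relation type (SS or 2CS),
    and whether the connective starts with a capital letter, which
    indicates a sentence-initial connective."""
    return {
        "conn": connective.lower(),
        "type": relation_type,
        "capitalized": connective[:1].isupper(),
    }

print(sense_features("However", "2CS"))
print(sense_features("because", "SS"))
```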
One limitation of our sense classification step is that it only takes into account discourse relations with explicit connectives, so sense recognition for non-explicit discourse relations remains unsolved.

Evaluation results

Table 2 shows the evaluation results of our system on the three data sets provided by the CoNLL 2015 Shared Task; the rank column gives the rank of our system among the participating systems. In general, this is a difficult task, and the results are not as high as we expected. Thanks to the special syntactic patterns extracted from parse trees, the precision scores of our system are higher than those of other teams. However, these patterns cover only certain special cases, so the recall of our system is low.

The comparison of the evaluation results between explicit and non-explicit discourse relations is shown in Table 3. With the help of the special patterns based on explicit connectives and parse trees, the results for explicit discourse recognition are higher than those for non-explicit discourse recognition in both precision and recall.

The feature set based on syntactic parse trees is very important for our system. Table 4 compares two different feature sets on the development data set. The FULL feature set consists of all features, including lexical features, parts of speech, and pattern features based on syntactic parse trees. In the SHORT feature set, we remove all pattern features based on parse trees in order to evaluate their importance. The results, which consider only discourse relations within the same sentence, show a significant improvement when the FULL feature set is used instead of the SHORT one.

Conclusion
Our approach to the Shallow Discourse Parsing task at the CoNLL 2015 Shared Task was to create a two-phase system that identifies discourse relations in newswire text. The results show that our approach achieves the highest precision of all systems and was ranked 4th in terms of F1-measure under strict matching.
In the future we would like to improve the recall of our approach by exploring the use of a wider range of features.