The Instantiation Discourse Relation: A Corpus Analysis of Its Properties and Improved Detection

INSTANTIATION is a fairly common discourse relation and past work has suggested that it plays special roles in local coherence, in sentiment expression and in content selection in summarization. In this paper we provide the first systematic corpus analysis of the relation and show that relation-specific features can considerably improve the detection of the relation. We show that sentences involved in INSTANTIATION are set apart from other sentences by the use of gradable (subjective) adjectives, the occurrence of rare words and by different patterns in part-of-speech usage. Words across arguments of INSTANTIATION are connected through hypernym and meronym relations significantly more often than in other sentences, and the arguments stand out in context by being significantly less similar to each other than other adjacent sentence pairs. These factors provide substantial predictive power that improves the identification of implicit INSTANTIATION relations by more than 5% F-measure.


Introduction
In an INSTANTIATION relation, one text span explains in further detail the events, reasons, behaviors and attitudes mentioned in the other, as illustrated by the segments below:
[a] Other fundamental "reforms" of the 1986 act have been threatened as well.
[b] The House seriously considered raising the top tax rate paid by individuals with the highest incomes.
Sentence [a] mentions "other reforms" and a threat to them, but leaves unspecified what the reforms are or how they are threatened. Sentence [b] provides sufficient detail for the reader to infer more concretely what has happened.
The INSTANTIATION relation has some special properties. A study of discourse relations as indicators for content selection in single document summarization revealed that the first sentences from INSTANTIATION pairs are included in human summaries significantly more often than other sentences, and that being a first sentence in an INSTANTIATION relation is the most powerful indicator for content selection related to discourse relation sense. The sentences between which the relation holds also contain more sentiment expressions than other sentences (Trnavac and Taboada, 2013), making it a special target for sentiment analysis applications. Moreover, INSTANTIATION relations appear to play a special role in local coherence, as the flow between INSTANTIATION sentences is not explained by the major coherence theories (Kehler, 2004; Grosz et al., 1995). Many of the sentences in an INSTANTIATION relation contain entity instantiations (complex examples of set-instance anaphora), such as "several EU countries"-"the UK", "footballers"-"Wayne Rooney" and "most cosmetic purchase"-"lipstick" (McKinlay and Markert, 2011), raising further questions about the relationship between INSTANTIATIONs and key discourse phenomena.
Detecting an INSTANTIATION, however, is hard. In the Penn Discourse Treebank (PDTB) (Prasad et al., 2008), INSTANTIATION is one of the few relations that are more often implicit, i.e., expressed without a discourse marker such as "for example". Identifying implicit discourse relations is acknowledged to be a difficult task (Braud and Denis, 2015; Ji and Eisenstein, 2015; Rutherford and Xue, 2014; Biran and McKeown, 2013; Park and Cardie, 2012; Lin et al., 2009; Pitler et al., 2009), and the challenge is exacerbated by the lack of explicit INSTANTIATIONs: explicit relations have been shown to improve the detection of their implicit counterparts through data source expansion (Rutherford and Xue, 2015). Moreover, detecting INSTANTIATION also involves the skewed class distribution problem (Li and Nenkova, 2014a): although it is one of the largest classes of implicit relations, it constitutes less than 10% of all the implicit relations annotated in the PDTB.
In this work, we identify a rich set of factors that set apart each sentence in an implicit INSTANTIATION and the pair as a whole. We show that these factors improve the identification of implicit INSTANTIATION by at least 5% in F-measure and 8% in balanced accuracy compared to prior systems.

Presence of INSTANTIATION
We use the Penn Discourse Treebank (PDTB) (Prasad et al., 2008) for the analysis and experiments presented in this paper. There are a total of 1,747 INSTANTIATION relations in the PDTB, of which 83% are implicit. INSTANTIATIONs make up 8.7% of all implicit relations and form the 5th largest of the 16 second-level relations in the PDTB.

Characteristics of INSTANTIATION
We identify significant factors that characterize: (i) s1 and s2: the first and second sentence in an INSTANTIATION pair vs. all other sentences; (ii) s1 vs. s2: adjacent sentence pairs in an INSTANTIATION relation vs. all other adjacent sentence pairs.
Our analysis is conducted on the PDTB except section 23, which is reserved for testing as in prior work (Lin et al., 2014; Biran and McKeown, 2015).
Sentence length. Intuitively, longer sentences are more likely to involve details. Table 1 demonstrates that there is an average 8.4-word difference in length between the two sentences in an INSTANTIATION relation; moreover, s1 sentences are significantly shorter (by more than 5 words on average) than other sentences, and s2 sentences are significantly longer.
Rare words. For each sentence, we compute the percentage of words that are not present in the 400K vocabulary of the GloVe vector representations (Pennington et al., 2014). Table 2 shows that s1 of INSTANTIATIONs contains significantly fewer out-of-vocabulary words than either s2 or non-INSTANTIATIONs. We also compare the difference in unigram probability of content word pairs across sentence pairs, i.e., (w_i, w_j), w_i ∈ s1, w_j ∈ s2. Compared to non-INSTANTIATION, words across INSTANTIATION arguments show a significantly larger average unigram log probability difference (1.24 vs. 1.22). These numbers show that the first sentences of INSTANTIATION do not involve many unfamiliar words, an indication of higher readability (Pitler and Nenkova, 2008).
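The two rare-word measures above can be sketched as follows. This is a minimal illustration rather than the setup used in the experiments: the function names, the toy vocabulary and the unigram counts are hypothetical stand-ins for the 400K GloVe vocabulary and a real unigram model, and both the use of absolute differences with natural logarithms and the skipping of words without a count are assumptions.

```python
import math
from itertools import product


def oov_percentage(tokens, vocabulary):
    """Percentage of tokens that are missing from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t.lower() not in vocabulary)
    return 100.0 * missing / len(tokens)


def avg_log_prob_difference(words_s1, words_s2, unigram_counts):
    """Mean |log p(w_i) - log p(w_j)| over pairs (w_i in s1, w_j in s2).

    Assumption of this sketch: words without a unigram count are skipped.
    """
    total = sum(unigram_counts.values())

    def log_prob(word):
        return math.log(unigram_counts[word] / total)

    diffs = [
        abs(log_prob(a) - log_prob(b))
        for a, b in product(words_s1, words_s2)
        if a in unigram_counts and b in unigram_counts
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy usage with a hypothetical vocabulary and hypothetical counts.
toy_vocabulary = {"the", "house", "tax"}
toy_counts = {"reforms": 10, "act": 10, "rate": 80}
oov = oov_percentage(["The", "House", "frobnicated", "tax"], toy_vocabulary)
gap = avg_log_prob_difference(["reforms"], ["rate"], toy_counts)
```

In practice the content words would first be selected by part-of-speech tag, which is left out here to keep the sketch self-contained.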
Gradable adjectives. The use of gradable adjectives (Frazier et al., 2008; de Marneffe et al., 2010), such as popular, high and likely, may require further explanation to justify the appropriateness of their use. Here we compute the average percentage of gradable adjectives in a sentence. The list of adjectives is from Hatzivassiloglou and Wiebe (2000) and the respective percentages are shown in Table 3. Compared to other sentences, s1 of INSTANTIATION involves significantly more gradable adjectives.
Parts of speech. We study word categories that are heavily or rarely used with INSTANTIATION by inspecting the percentage of part-of-speech tags found in each sentence. In Table 4, we show POS tags whose presence is significantly different across arguments in INSTANTIATION but not so across non-INSTANTIATION, with significance in non-INSTANTIATION in the reverse direction denoted by †. Four cases of POS occurrences are inspected:
• more often in s1 compared to s2,
• more often in s2 compared to s1,
• more (+) or less (-) in s1 compared to non-INSTANTIATION.
We see that s1 of INSTANTIATION contains more characteristic POS usage than s2. There are more comparative adjectives and adverbs as well as fewer nouns in s1 compared to s2 in INSTANTIATION pairs. The usage of verbs also differs between the two arguments, and s1 contains more conjunctions and existential there. On the other hand, s2 contains more nouns, numbers, determiners, personal pronouns and proper nouns, which are intuitively associated with the presence of detailed information.
WordNet relations. Here we consider word-level relationships across arguments using WordNet (Fellbaum, 1998). [...] adverb content word pairs across arguments, we calculate the percentage of sentences with each type of WordNet relation. As shown in Table 5, among INSTANTIATION sentence pairs there are significantly more noun-noun pairs with hypernym or meronym relationships and verbs with an indirect hypernym relationship. We also observe significantly more semantically similar verbs (group (v)).
Lexical similarity. Finally, we inspect the similarity between the sentences in each pair as well as between each sentence in a pair and its immediate prior context; specifically:
• Between s1 and s2;
• Between s1 and C and between s2 and C, where C denotes up to two sentences immediately before s1.
We compute the Jaccard similarity between sentences using their nouns, verbs, adjectives and adverbs. INSTANTIATION arguments are significantly less similar than other adjacent sentence pairs (0.335 vs. 0.505), indicating a larger difference in content. In Table 6, both arguments of INSTANTIATION are less similar to the immediate context. While other sentence pairs follow the pattern that s2 is much less similar to the immediate context of s1, this phenomenon is not significant for INSTANTIATION.
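The similarity measure above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the inputs are assumed to be content words (nouns, verbs, adjectives and adverbs) already selected by an upstream tagger, and pooling the context sentences into a single word set is an assumption about how C is compared against a sentence.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two word collections."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def context_similarity(sentence_words, preceding_sentences):
    """Similarity between a sentence and C, the (up to) two sentences before s1.

    Assumption of this sketch: the context sentences are pooled into one set.
    """
    context_words = set()
    for words in preceding_sentences[-2:]:
        context_words.update(words)
    return jaccard(sentence_words, context_words)


# Toy usage with hand-picked content words.
s1 = ["reforms", "act", "threatened"]
s2 = ["house", "considered", "raising", "tax", "rate"]
context = [["tax", "act", "passed"], ["congress", "debated", "reforms"]]
pair_sim = jaccard(s1, s2)
s1_context_sim = context_similarity(s1, context)
```

With these toy inputs the two arguments share no content words, so their similarity is zero, while s1 overlaps with its context on "reforms" and "act".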

Experiments
In this section, we demonstrate the benefit of exploiting INSTANTIATION characteristics in the identification of the relation.
Settings. Following prior work that identifies the more detailed (second-level) relations in the PDTB (Biran and McKeown, 2015; Lin et al., 2014), we use sections 2-21 for training and section 23 for testing. The rest of the corpus is used for development. The task is to predict whether an implicit INSTANTIATION relation holds between pairs of adjacent sentences in the same paragraph. Sentence pairs with an INSTANTIATION relation constitute the positive class; all other non-explicit relations (including AltLex, EntRel and NoRel) constitute the negative class. We use Logistic Regression with class weights inversely proportional to the size of each class.
Features. The factors discussed in §3 are adopted as the only features in the classifier. We use the average values of s1 and s2 and their difference for: the number of words, the difference in number of words compared to the sentence before s1, the percentage of OOVs, gradable adjectives, POS tags and Jaccard similarity to the immediate context. We use the minimum, maximum and average differences in word-pair unigram log probability, and the average Jaccard similarity across sentence pairs. For WordNet relations, we use binary features indicating the presence of a relation.
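The feature construction and class weighting described in this section can be sketched as follows. This is a hedged illustration, not the implementation behind the reported results: the function names are hypothetical, the direction of the difference (s1 minus s2) is an assumption, and the weight normalization n_samples / (n_classes * class_size) is one common way to make weights inversely proportional to class size (it matches the "balanced" heuristic found in scikit-learn, although the text does not state which toolkit was used).

```python
from collections import Counter


def pair_features(name, value_s1, value_s2):
    """Average and difference of one per-sentence factor over a sentence pair.

    Assumption of this sketch: the difference is taken as s1 minus s2.
    """
    return {
        f"{name}_avg": (value_s1 + value_s2) / 2.0,
        f"{name}_diff": value_s1 - value_s2,
    }


def inverse_class_weights(labels):
    """Weights inversely proportional to class size.

    Uses n_samples / (n_classes * class_size); the constant factor is an
    assumption, only the inverse proportionality comes from the text.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * size) for label, size in counts.items()}


# Toy usage: one factor for one pair, and weights for a skewed label set.
features = {}
features.update(pair_features("num_words", 14, 23))
features.update(pair_features("oov_pct", 0.0, 4.3))
weights = inverse_class_weights([1] * 1 + [0] * 9)
```

With this normalization every class contributes the same total weight, so the rare positive class is not swamped by the much larger negative class during training.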
Results. To compare with our INSTANTIATION-specific classifier (Inst. specific), we show results from two state-of-the-art PDTB discourse parsers that identify second-level relations: Biran and McKeown (2015) (B&M) and Lin et al. (2014). We also compare the results with the classifier from our prior work (Li and Nenkova, 2014b) (L&N). In that work we introduce syntactic production-stick features, which minimize the occurrence of zero values in instance representation. Furthermore, we reimplemented Brown-cluster features (concatenation of clusters in each sentence) that have been shown to perform well in identifying EXPANSION, the parent class of INSTANTIATION (Braud and Denis, 2015).
Table 7 shows the precision, recall, F-measure and balanced accuracy (the average of the accuracies for the positive and the negative class) for each system. We show balanced accuracy rather than overall accuracy due to the highly skewed class distribution. For Inst. specific, we use a threshold of 0.65 for positive labels. We also use it along with L&N in a soft voting classifier, where the label is assigned to the class with the larger weighted sum of posterior probabilities from the two classifiers. Both classifiers achieve at least a 5% improvement in F-measure and an 8%-10% improvement in balanced accuracy compared to other systems. These improvements mostly come from a dramatic improvement in recall. The improvement achieved by the voting classifier also indicates that Inst. specific provides signals complementary to syntactic production rules. Note that compared to Lin et al., Inst. specific behaves very differently in precision and recall, indicating potential for further system combination.
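The evaluation measures and the two decision rules described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the voting weights default to placeholder values of 1.0 because the weights used in the experiments are not given in the text, and treating a posterior exactly at the threshold as positive is an assumption.

```python
def evaluate(gold, predicted):
    """Precision, recall, F-measure and balanced accuracy (1 = positive class)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    negative_accuracy = tn / (tn + fp) if tn + fp else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # Accuracy on the positive class equals recall.
        "balanced_accuracy": (recall + negative_accuracy) / 2.0,
    }


def threshold_label(p_positive, threshold=0.65):
    """Label a pair positive when its posterior reaches the threshold."""
    return 1 if p_positive >= threshold else 0


def soft_vote(p_pos_a, p_pos_b, weight_a=1.0, weight_b=1.0):
    """Pick the class with the larger weighted sum of posteriors.

    The weights are placeholders; the values used in the paper are unknown here.
    """
    positive = weight_a * p_pos_a + weight_b * p_pos_b
    negative = weight_a * (1.0 - p_pos_a) + weight_b * (1.0 - p_pos_b)
    return 1 if positive > negative else 0


# Toy usage on six hand-made sentence pairs.
scores = evaluate([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
vote = soft_vote(0.9, 0.3)
```

Because balanced accuracy averages the per-class accuracies, a system that labels every pair as negative scores 0.5 rather than the high overall accuracy it would obtain on such a skewed distribution.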
Finally, we analyze the confusion matrix induced by false positives and false negatives across Lin et al., B&M, Inst. specific and soft vote. In Table 8, we list relations contributing at least 10% to false positives for at least one system. Remarkably, INSTANTIATION

Conclusion
We have characterized the implicit INSTANTIATION relation by studying significant factors that discriminate individual arguments and the sentence pairs connected by the relation. We show distinctive patterns in sentence length, word usage, semantic relationships between words, and cross-argument and contextual similarity associated with INSTANTIATION. Using these factors as features, we demonstrate significant improvement in the detection of the implicit INSTANTIATION relation.