Oracle Summaries of Compressive Summarization

This paper derives an Integer Linear Programming (ILP) formulation to obtain an oracle summary of the compressive summarization paradigm in terms of ROUGE. The oracle summary is essential to reveal the upper bound performance of the paradigm. Experimental results on the DUC dataset showed that ROUGE scores of compressive oracles are significantly higher than those of extractive oracles and state-of-the-art summarization systems. These results reveal that compressive summarization is a promising paradigm and encourage us to continue with the research to produce informative summaries.


Introduction
Compressive summarization, a joint model integrating sentence extraction and sentence compression within a unified framework, has been attracting attention in recent years (Martins and Smith, 2009; Berg-Kirkpatrick et al., 2011; Almeida and Martins, 2013; Qian and Liu, 2013; Kikuchi et al., 2014; Yao et al., 2015). Since compressive summarization methods can use a sub-sentence as an atomic unit, they can pack more information into summaries than extractive methods, which employ sentences as atomic units. Thus, compressive summarization is essential when we want to produce summaries under tight length constraints. There are two approaches to compressing the input document(s) while keeping the result grammatical: one trims phrase structure trees (Berg-Kirkpatrick et al., 2011), and the other trims dependency trees obtained from the document(s) (Martins and Smith, 2009; Almeida and Martins, 2013; Qian and Liu, 2013; Kikuchi et al., 2014; Yao et al., 2015). This paper focuses on the latter approach because it has recently been receiving much attention.
To measure the performance of compressive summarization methods, ROUGE (Lin, 2004), an automatic evaluation metric, is widely used. ROUGE evaluates a system summary against a set of human-made reference summaries and gives a score in the range [0,1]. When the n-gram occurrences of the system summary agree with those in the set of reference summaries, the value is 1. However, system summaries cannot achieve ROUGE=1 in most cases, since summarization systems cannot reproduce the reference summaries. Consequently, the maximum ROUGE score that can be achieved by compressive summarization is unclear, and researchers cannot know how much room for further improvement is left. Thus, it is beneficial to reveal the upper bound summary, that is, the summary that achieves the maximum ROUGE score among those the systems can produce. The upper bound summary is known as the oracle summary. Several approaches have been proposed to obtain the oracle summary for the extractive summarization paradigm. Sipos et al. (2012) utilized a greedy algorithm, and Kubina et al. (2013) utilized exhaustive search based on heuristics. However, their oracle summaries do not always attain the optimal (maximum) ROUGE score. Recently, Hirao et al. (2017) derived an Integer Linear Programming (ILP) formulation to obtain the optimal oracle summary. Their oracle summary can help researchers understand the strict limit of the extractive summarization paradigm. However, their method cannot be applied to obtain compressive oracle summaries.
To reveal the ultimate limitation of the compressive summarization paradigm, we propose an ILP formulation to obtain a compressive oracle summary that maximizes the ROUGE score. We conducted an experimental evaluation on the Document Understanding Conference (DUC) 2004 dataset. The results demonstrated that the ROUGE scores of compressive oracle summaries outperformed those of extractive oracle summaries and those of state-of-the-art summarization methods. This indicates that compressive summarization is a promising paradigm that merits further research effort.

Definition of Compressive Oracle Summaries
Before defining the compressive oracle summary, we briefly describe ROUGE$_n$. Suppose we are given $K$ reference summaries $R=\{R_1,\ldots,R_K\}$ and a system summary $S$. Let $G=\{g^n_1,\ldots,g^n_M\}$ be the set of all n-grams appearing in the reference summaries, with $|G|=M$. ROUGE$_n$ is defined as follows:

$$\mathrm{ROUGE}_n(R,S)=\frac{\sum_{k=1}^{K}\sum_{j=1}^{M}\min\{N(g^n_j,R_k),\,N(g^n_j,S)\}}{\sum_{k=1}^{K}\sum_{j=1}^{M}N(g^n_j,R_k)} \tag{1}$$

$g^n_j$ represents the $j$-th n-gram appearing in the reference summaries. $N(g^n_j,R_k)$ and $N(g^n_j,S)$ are the numbers of occurrences of n-gram $g^n_j$ in $R_k$ and $S$, respectively. Thus, compressive oracle summaries are defined as follows:

$$O=\operatorname*{arg\,max}_{S\subseteq T}\ \mathrm{ROUGE}_n(R,S)\quad\text{s.t.}\quad \ell(S)\le L_{\max} \tag{2}$$

$T$ is the set of all valid word subsequences (see footnote 1) obtained from sentences contained in the input document(s), and $L_{\max}$ is the length limitation of the oracle summary. $\ell(S)$ indicates the number of words in the summary. Neither approximation nor exact algorithms are known for solving this problem.
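The clipped n-gram counting behind ROUGE$_n$ can be illustrated with a short Python sketch. It assumes the usual form of the metric (for each reference, the minimum of the reference count and the summary count of each n-gram, summed and divided by the total number of reference n-grams), whitespace-tokenized input, and no stopword removal or stemming; the function names are illustrative and are not taken from the ROUGE toolkit.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(references, summary, n):
    """Clipped n-gram matches divided by the total reference n-gram count."""
    sys_counts = ngrams(summary, n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        # min{N(g, R_k), N(g, S)} for every n-gram g in this reference
        matched += sum(min(c, sys_counts[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0

# Toy data (made up for illustration, not from DUC)
refs = [["dolphins", "live", "in", "the", "ocean"],
        ["most", "dolphins", "live", "in", "rivers"]]
summ = ["dolphins", "live", "in", "rivers"]
score = rouge_n(refs, summ, 2)
print(score)
```

Because the denominator depends only on the references, maximizing this score over candidate summaries is the same as maximizing the clipped match count in the numerator.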

Dependency Structure of a Sentence
In this paper, we follow the dependency tree trimming approach proposed by Filippova and Strube (2008) and Filippova and Altun (2013). They proposed rules that transform a tree that represents dependency relations between words into a tree that represents dependency relations between chunks (each consisting of a word or a word sequence). Since we can trim their dependency trees without loss of grammatical consistency, we employ these trees in our compressive summarization framework. Figure 1 shows examples.

Footnote 1: Word subsequences can be regarded as grammatical sentences. We regard rooted subtrees of dependency trees as valid word subsequences. For details, see Section 3.1.
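To make the notion of a rooted subtree concrete, the following Python sketch checks whether a set of kept chunks forms a rooted subtree of a chunk-level dependency tree. The parent map used in the example is a hypothetical dependency structure for the sentence "Most dolphins live in every ocean" (chunks numbered 1 to 4) and is an assumption made for illustration, not a tree taken from the paper.

```python
def is_rooted_subtree(parent, kept):
    """Return True if every kept chunk's parent is also kept.

    parent maps a chunk index to the index of its parent chunk,
    or to None for the root. In a tree, this condition implies that
    a non-empty kept set contains the root and is connected.
    """
    return all(parent[u] is None or parent[u] in kept for u in kept)

# Hypothetical chunk tree: [Most]1 -> [dolphins]2 -> [live]3 (root) <- [in every ocean]4
parent = {1: 2, 2: 3, 3: None, 4: 3}

print(is_rooted_subtree(parent, {2, 3}))     # "dolphins live"
print(is_rooted_subtree(parent, {2, 3, 4}))  # "dolphins live in every ocean"
print(is_rooted_subtree(parent, {1, 3}))     # "Most live": parent of chunk 1 is missing
```

An empty kept set passes the check, which corresponds to dropping the sentence entirely.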

ILP Formulation
Since the denominator of equation (1) is constant for a given set of reference summaries, we can find an oracle summary by maximizing the numerator of equation (1). Equation (3) is the objective function that corresponds to maximization of the numerator of equation (1). $z_{k,j}$ is the count of the $j$-th n-gram that is contained in both the $k$-th reference summary and the oracle summary. Equation (4) ensures that the length of the oracle summary does not exceed $L_{\max}$. $b_{i,u}$ is a binary decision variable indicating whether the $u$-th chunk in the $i$-th sentence is contained in the oracle summary, and $\ell_{i,u}$ indicates the number of words in the $u$-th chunk in the $i$-th sentence. $D$ is the set of sentences and $E_i$ is the number of chunks in the $i$-th sentence. Equations (5) and (6) represent the min operation in equation (1). $w_{i,v}$ is the $v$-th possible word sequence of length $n$ that is contained in the $i$-th sentence, and $m_{i,v}$ is a binary decision variable indicating whether $w_{i,v}$ is contained in the oracle summary. $T(g^n_j)$ is the set of index tuples $(i,v)$ whose word sequence corresponds to $g^n_j$, i.e., $T(g^n_j)=\{(i,v)\mid w_{i,v}=g^n_j\}$. Thus, $z_{k,j}=\min\{N(g^n_j,R_k),N(g^n_j,S)\}$.

Equation (7) ensures that an oracle summary consists of a set of rooted subtrees of the sentences in the entire document(s). The function $\mathrm{parent}(i,u)$ returns the index of the parent chunk of the $u$-th chunk in the dependency tree obtained from the $i$-th sentence. Equations (8) and (9) represent the dependency relation between n-grams and chunks. When we include $w_{i,v}$ in the oracle summary, we have to include all chunks that contain the words in $w_{i,v}$. In addition, when these chunks have gap(s) between them, we have to drop the chunk(s) within the gap(s). Here, $V_i(w_{i,v})$ is the set of indices of chunks that include words in $w_{i,v}$, and $U_i(w_{i,v})$ is the set of indices of chunks within the gap(s), that is, the indices lying between the smallest and largest index in $V_i(w_{i,v})$ that do not themselves belong to $V_i(w_{i,v})$.

Figure 1: Examples of trees that represent dependency relations between chunks, and word sequences (whose length is 2). Chunks are enclosed in square brackets. Note that we disregard word sequences that are generated by destroying the structure of chunks such as "live every" in S1, "dolphins in" in S2, "live wild" in S3.
S1 word sequences: w1,1: Most_dolphins, w1,2: Most_live, w1,3: Most_in, w1,4: dolphins_live, w1,5: dolphins_in, w1,6: live_in, w1,7: in_every, w1,8: every_ocean.
S2 word sequences: w2,1: Some_dolphins, w2,2: dolphins_live, w2,3: dolphins_in, w2,4: dolphins_in, w2,5: live_in, w2,6: live_in, w2,7: rivers_in, w2,8: in_some, w2,9: some_regions.
S3 word sequences: w3,1: Dolphins_usually, w3,2: Dolphins_live, w3,3: Dolphins_20-40, w3,4: Dolphins_in, w3,5: usually_live, w3,6: usually_20-40, w3,7: usually_in, w3,8: live_20-40, w3,9: live_in, w3,10: years_in, w3,11: in_the, w3,12: the_wild.

We give an example to show how chunks and word sequences are related. When we pack the bigram "live in" into an oracle summary, there are four candidates in the source document (Fig. 1). The word sequences $w_{1,6}$, $w_{2,5}$, $w_{2,6}$ and $w_{3,9}$ match "live in". Thus, $T(\text{live in})=\{(1,6),(2,5),(2,6),(3,9)\}$. Here, when we want to pack $w_{2,6}$ into the oracle summary, we have to pack both chunks $c_{2,2}$ and $c_{2,4}$ ($b_{2,2}=b_{2,4}=1$) because $V_2(w_{2,6})=\{2,4\}$. Then, we have to drop chunk $c_{2,3}$ ($b_{2,3}=0$) because $c_{2,3}$ is within the gap between chunks $c_{2,2}$ and $c_{2,4}$ ($U_2(w_{2,6})=\{3\}$). Similarly, when we pack $w_{3,9}$ into an oracle summary, we have to pack both chunks $c_{3,3}$ and $c_{3,5}$ and drop chunk $c_{3,4}$. However, this compression is not allowed since there is no dependency relationship between $c_{3,3}$ and $c_{3,5}$.
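The chunk sets $V_i(w_{i,v})$ and $U_i(w_{i,v})$ follow directly from their definitions: $V$ collects the chunks that contain the words of the sequence, and $U$ collects the chunks lying in the gaps between them. The Python sketch below assumes that chunks in a sentence are numbered consecutively from left to right and that each word of the sequence is given by the index of the chunk containing it; the function names are illustrative and do not come from the paper.

```python
def covering_and_gap_chunks(word_chunks):
    """Return (V, U) for a word sequence.

    word_chunks lists, in order, the chunk index of each word in the sequence.
    V is the set of chunks containing those words; U is the set of chunks
    strictly between the leftmost and rightmost chunk of V that are not in V.
    """
    covering = set(word_chunks)
    span = set(range(min(covering), max(covering) + 1))
    return covering, span - covering

def can_contain(kept, word_chunks):
    """A chunk selection can realize the sequence only if it keeps all of V and none of U."""
    covering, gap = covering_and_gap_chunks(word_chunks)
    return covering <= kept and not (gap & kept)

# "live in" as w_{2,6}: its words sit in chunks 2 and 4 of sentence 2
V, U = covering_and_gap_chunks([2, 4])
print(V, U)
print(can_contain({1, 2, 4}, [2, 4]))  # chunk 3 dropped
print(can_contain({2, 3, 4}, [2, 4]))  # chunk 3 kept, so the bigram is broken
```

Note that this check covers only adjacency of the words in the output; whether the kept chunks also form a rooted subtree is a separate condition.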
After solving the ILP problem, we can obtain compressive oracle summaries by collecting the chunks for which $b_{i,u}=1$.
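For reference, the constraints described in this section could be written along the following lines. This is a sketch reconstructed from the prose descriptions alone, so the exact form of the paper's equations (3) to (9) may differ. In particular, the per-chunk linking constraints for $V_i$ and $U_i$ are one common linearization and are an assumption, as is restricting the parent constraint to non-root chunks.

```latex
\begin{align*}
\text{maximize}\quad & \sum_{k=1}^{K}\sum_{j=1}^{M} z_{k,j}
  && \text{(cf. (3): numerator of ROUGE}_n\text{)}\\
\text{subject to}\quad
 & \sum_{i\in D}\sum_{u=1}^{E_i} \ell_{i,u}\, b_{i,u} \le L_{\max}
  && \text{(cf. (4): length limit)}\\
 & z_{k,j} \le \sum_{(i,v)\in T(g^n_j)} m_{i,v} \quad \forall k,j
  && \text{(cf. (5): count in the summary)}\\
 & z_{k,j} \le N(g^n_j, R_k) \quad \forall k,j
  && \text{(cf. (6): count in the reference)}\\
 & b_{i,u} \le b_{i,\mathrm{parent}(i,u)} \quad \forall i,\ \forall \text{ non-root } u
  && \text{(cf. (7): rooted subtrees)}\\
 & m_{i,v} \le b_{i,u} \quad \forall i,v,\ \forall u\in V_i(w_{i,v})
  && \text{(cf. (8): keep covering chunks)}\\
 & m_{i,v} \le 1-b_{i,u} \quad \forall i,v,\ \forall u\in U_i(w_{i,v})
  && \text{(cf. (9): drop gap chunks)}\\
 & b_{i,u}\in\{0,1\},\quad m_{i,v}\in\{0,1\},\quad z_{k,j}\in\mathbb{Z}_{\ge 0}
\end{align*}
```

Under maximization, each $z_{k,j}$ is pushed up to the smaller of its two upper bounds, which is how the two inequalities together realize the min operation.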

Experiments
To investigate the limits of the compressive summarization paradigm, we compare the ROUGE scores of compressive oracle summaries with those of extractive oracle summaries and those obtained from state-of-the-art summarization systems. Extractive oracle summaries are obtained by solving the ILP formulation proposed by Hirao et al. (2017). System summaries are taken from a public repository.

Settings
We conducted an experimental evaluation on the DUC-2004 dataset. We parsed the sentences in the dataset to obtain dependency relations between words, and then we transformed them into trees that represent the dependency relations between chunks by applying Filippova's rules (Filippova and Strube, 2008; Filippova and Altun, 2013). To solve the ILP problem, we utilized CPLEX version 12.5.1.0. We obtained and evaluated oracle summaries based on three variants of ROUGE, namely ROUGE$_1$, ROUGE$_2$ and ROUGE-SU0, with the following conditions: (1) ROUGE$_1$, utilizing unigrams excluding stopwords; (2) ROUGE$_2$, utilizing bigrams with stopwords; and (3) ROUGE-SU0, which is an extension of ROUGE$_n$, utilizing unigram and bigram (excluding skip-bigram) statistics. Table 1 shows the ROUGE scores of compressive and extractive oracle summaries, together with those of RegSum, which achieved the best ROUGE$_1$, and ICSISumm, which achieved the best ROUGE$_2$, on the DUC-2004 dataset.

Results and Discussion
We first compare the ROUGE scores of compressive oracle summaries with those of extractive oracle summaries. The best scores are obtained when we use the same ROUGE variant for both computation and evaluation (see bolded scores in Table 1). There are large differences between the best scores of extractive and compressive oracle summaries. One reason for these results is that compressive oracle summaries contain a much larger number of (sub-)sentences than extractive oracle summaries under the same length limitation. This is an advantage of compressive summarization over extractive summarization. However, we have to note that compressive oracle summaries optimized for ROUGE$_1$ may not be desirable, since they are produced by compressing sentences while ignoring context. In fact, they obtained a remarkable gain in ROUGE$_1$ score (8.3 points), while they obtained only modest gains in ROUGE$_2$ and ROUGE-SU0 (0.7 and 3.6 points, respectively). This may suggest that the resultant summaries overfit to the unigrams in the reference summaries.
When we compare the ROUGE scores of compressive oracle summaries with those of system summaries, the compressive oracle summaries outperform the state-of-the-art systems. The differences range from 11 to 17 points.
The results demonstrate that compressive summarization is a promising approach to producing more informative summaries, and that room still exists for further improvement. Thus, compressive summarization is an important research topic in which to invest our resources.

Readability evaluation
We conducted a human evaluation to compare the readability of extractive oracle summaries with that of compressive oracle summaries. We presented the oracle summaries to five human subjects and asked them to rate the summaries using an integer scale from 1 (very poor) to 5 (very good). Table 2 shows the results. Extractive oracle summaries achieved near-perfect scores. Although the scores of compressive oracle summaries are inferior to those of extractive oracle summaries, they achieved good scores.

Reference:
The Wye River accord has not been implemented. As the Israeli cabinet was considering the agreement, Islamic Jihad militants exploded a car bomb in nearby Mahane Yehuda market. The cabinet suspended ratification of the agreement, demanding the Palestinian Authority take steps against terrorism. Further, after the bombing, Israeli Prime Minister Netanyahu announced the resumption of construction of a new settlement, Har Homa, in a traditionally Arab area east of Jerusalem. Israel also demands that Arafat outlaw the military wings of Islamic Jihad and Hamas. The attack injured 24 Israelis, but only the two assailants, Sughayer and Tahayneh, were killed.
Extractive oracle summary n = 1: The procedure is part of the Wye River agreement negotiated last month. The radical group Islamic Jihad claimed responsibility Saturday for the market bombing and vowed more attacks to try to block the new peace accord. Most recently, Israel's Cabinet put off a vote to ratify the accord after a suicide bombing Friday in Jerusalem that killed the two assailants and injured 21 Israelis. David Bar-Illan, a top aide to Israeli Prime Minister Benjamin Netanyahu, said Sunday that Israel expects Palestinian leader Yasser Arafat to formally outlaw the military wings of Islamic Jihad and the larger militant group Hamas.
Compressive oracle summary n = 1: The Israeli cabinet suspended ratification of the Wye agreement. A Prime Minister Benjamin Netanyahu said that Israel would continue to build Jewish neighborhoods throughout Jerusalem including at a site in the Arab sector of the city. Netanyahu's Cabinet delayed action on the peace accord. The radical group Islamic Jihad claimed responsibility for the bombing and vowed attacks. Implementation of the Israeli-Palestinian land-for-security accord was to have begun. David Bar-Illan said that Israel expects Palestinian Yasser Arafat to outlaw the military wings of Islamic Jihad and the Hamas. Their car-bomb blew in a Jerusalem market killing men and wounding 24 people.
Extractive oracle summary n = 2: In response to the attack, the Israeli cabinet suspended ratification of the Wye agreement until there " is verification that the Palestinian authority is indeed fighting terrorism." The radical group Islamic Jihad claimed responsibility Saturday for the market bombing and vowed more attacks to try to block the new peace accord. Most recently, Israel's Cabinet put off a vote to ratify the accord after a suicide bombing Friday in Jerusalem that killed the two assailants and injured 21 Israelis. Their car-bomb blew apart two hours later in a Jerusalem market, killing both men and wounding 24 people. I'm going to Paradise. "
Compressive oracle summary n = 2: The Israeli cabinet suspended ratification of the agreement. Hassan Asfour said the Palestinian Authority condemned the attack. Two people were killed. The procedure is part of the Wye River agreement. The radical group Islamic Jihad claimed responsibility for the bombing and vowed more attacks. Israel is demanding that the military wings of two radical Islamic groups be outlawed. Implementation of the land-for-security accord was to have begun. Israel's Cabinet put off a vote to ratify the accord after a bombing in Jerusalem that killed the two assailants and injured 21 Israelis. Their car-bomb blew in a Jerusalem market killing men.

Conclusion
To reveal the ultimate limitations of the compressive summarization paradigm, this paper proposed an Integer Linear Programming (ILP) formulation to obtain compressive oracle summaries in terms of ROUGE. Evaluation results obtained from the DUC 2004 dataset demonstrated that the ROUGE scores of compressive oracle summaries are significantly superior to those of extractive oracle summaries and those of the state-of-the-art systems. These results imply that the compressive summarization paradigm is a promising direction for producing informative summaries and encourage further investment in this line of research.