Composing Elementary Discourse Units in Abstractive Summarization

In this paper, we argue that the elementary discourse unit (EDU) is a more appropriate unit of content selection than the sentence for abstractive summarization. To address the problem of composing EDUs into an informative and fluent summary, we propose a novel summarization method that first applies an EDU selection model to extract and group informative EDUs and then an EDU fusion model to fuse the EDUs in each group into one sentence. We further design a reinforcement learning mechanism that uses the EDU fusion results to reward the EDU selection actions, boosting the final summarization performance. Experiments on CNN/Daily Mail demonstrate the effectiveness of our model.


Introduction
Abstractive summarization focuses on generating fluent and concise text from an input document and has achieved considerable performance improvements with the rapid development of deep learning (See et al., 2017; Paulus et al., 2017; Celikyilmaz et al., 2018; Gehrmann et al., 2018). A recently popular and practical paradigm generates each summary sentence by independently compressing or rewriting a sentence pre-extracted from the source document (Chen and Bansal, 2018; Lebanoff et al., 2019).
However, a single document sentence usually cannot provide all the information that a summary sentence expresses, as supported by the recent study of Lebanoff et al. (2019). They show that a high percentage of summary sentences include information from more than one document sentence, and that composing a summary by only compressing sentences causes performance degradation. At the same time, in contrast to the brevity required of a summary, a document sentence usually contains trivial details and expresses a relatively independent meaning, making it difficult to combine multiple sentences into one summary sentence. We therefore seek a new summary composition unit that is more information-intensive and elementary than the sentence.
In this paper, we choose the Elementary Discourse Unit (EDU) as the summarization unit. The EDU was first proposed in Rhetorical Structure Theory (Mann and Thompson, 1988) and is defined as a clause. Its finer granularity makes the EDU more suitable than the sentence as the basic summary composition unit (Li et al., 2016). Moreover, benefiting from advances in EDU segmentation, which now reaches an accuracy of 94% (Wang et al., 2018), it is feasible to obtain EDUs from text automatically. Two problems then arise: (1) which EDUs should be selected to compose a good summary? And (2) how can the selected EDUs be assembled into a fluent summary?
To solve these problems, we need to extract information-intensive EDUs from the source document and effectively fuse related EDUs into fluent summary sentences. With this idea, and inspired by Chen and Bansal (2018), we design an abstractive summarization method composed of two parts: EDU selection and EDU fusion. EDU selection extracts informative EDUs and groups them, while EDU fusion takes the grouped EDUs as input to generate a sentence. As the EDU selection process lacks labeled training data, we use the EDU fusion results as feedback to tune the EDU selection model, which in turn influences the EDU fusion process. Here, the actor-critic reinforcement learning algorithm is employed to train our EDU-based summarization method. To the best of our knowledge, we are the first to propose a practical solution for composing EDUs in summarization. Experiments show that, compared to previous models, our EDU-based model achieves a significant improvement on the CNN/Daily Mail dataset.

Model
Our model is composed of two main modules: EDU Selection and EDU Fusion. EDU Selection extracts salient EDUs from the source document and groups the closely related EDUs; we adopt a unified end-to-end method that implements both the extraction and the grouping. EDU Fusion then takes the EDUs in a group and generates a fluent and informative sentence. To train our method, we adopt reinforcement learning to connect the two modules. Figure 1 shows the overall architecture of our method.

EDU Selection
The EDU selection model is based on a sequence-to-sequence pointer network. In the encoding stage, we use a hierarchical encoder to obtain the contextual representation of each EDU; it consists of a word-level temporal convolutional neural network (Kim, 2014) and an EDU-level Bidirectional Long Short-Term Memory network (Bi-LSTM) (Hochreiter and Schmidhuber, 1997).
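As a concrete illustration of the word-level component, the following NumPy sketch shows how a temporal convolution with max-over-time pooling turns an EDU's word embeddings into a fixed-size vector. The dimensions and random weights here are made up for illustration; in the actual model these EDU vectors are further fed into the EDU-level Bi-LSTM and everything is trained end-to-end.

```python
import numpy as np

def edu_conv_encoder(word_embs: np.ndarray, conv_w: np.ndarray, conv_b: np.ndarray) -> np.ndarray:
    """Word-level temporal CNN for one EDU (Kim, 2014 style).

    word_embs: (num_words, emb_dim) embeddings of the EDU's words.
    conv_w:    (kernel_size, emb_dim, num_filters) convolution filters.
    conv_b:    (num_filters,) bias.
    Returns a (num_filters,) EDU representation via max-over-time pooling.
    """
    k, d, f = conv_w.shape
    n = word_embs.shape[0]
    # Slide the kernel over word positions (valid convolution).
    feats = np.stack([
        np.tanh(np.einsum("kd,kdf->f", word_embs[i:i + k], conv_w) + conv_b)
        for i in range(n - k + 1)
    ])                        # (n - k + 1, num_filters)
    return feats.max(axis=0)  # max-over-time pooling

rng = np.random.default_rng(0)
edu = rng.normal(size=(7, 16))          # 7 words, 16-dim embeddings
w = rng.normal(size=(3, 16, 32)) * 0.1  # kernel size 3, 32 filters
b = np.zeros(32)
vec = edu_conv_encoder(edu, w, b)
print(vec.shape)  # (32,)
```

Max-over-time pooling makes the representation independent of the EDU's length, which is convenient because EDUs vary widely in the number of words.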
In the decoding stage, we design an LSTM decoder to identify the informative EDUs together with their group information. To group related EDUs, we introduce a special label truncate whose representation is a trainable parameter h_truncate. We also add another special label stop, with representation h_stop, to determine the end of the selection process. h_truncate and h_stop are randomly initialized and then learned during training. In each decoding step, the decoder computes a selection probability distribution over the EDUs, truncate, and stop. Assuming that at time step t the indices of the EDUs extracted so far are collected in the set Sel_t, the decoder first uses Luong attention (Luong et al., 2015) to obtain the context c_t and then computes a score s_t^i for each EDU or label by:

s_t^i = v_p^T tanh(W_p [h_i; c_t])

where i indexes an EDU, truncate, or stop, and h_i denotes the corresponding representation. v_p and W_p are trainable parameters. To avoid repeated selection of the same EDUs, we assign a score of −∞ to the EDUs in Sel_t. Note that the label truncate can be generated multiple times since it is not included in Sel_t. Finally, we obtain the selection probability at time step t by applying a softmax to the scores.
Once the decoder selects the stop label, the selection process ends, yielding a sequence composed of EDUs, truncate labels, and one stop label. The EDUs separated by truncate labels are then grouped for fusion.
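To make the grouping step concrete, here is a minimal sketch of how a decoded sequence of EDU indices and special labels is split into fusion groups. The label tokens are hypothetical names chosen for illustration.

```python
TRUNCATE, STOP = "<truncate>", "<stop>"

def group_selected_edus(decoded):
    """Split a decoded sequence into groups of EDU indices.

    `decoded` mixes EDU indices with truncate/stop labels; each truncate
    closes the current group, and stop ends the whole selection.
    """
    groups, current = [], []
    for token in decoded:
        if token == STOP:
            break
        if token == TRUNCATE:
            if current:               # ignore empty groups
                groups.append(current)
                current = []
        else:
            current.append(token)     # an EDU index
    if current:                       # flush a trailing group before stop
        groups.append(current)
    return groups

# EDUs 2 and 5 fuse into sentence 1; EDUs 0, 7, 8 into sentence 2.
print(group_selected_edus([2, 5, TRUNCATE, 0, 7, 8, STOP]))
# [[2, 5], [0, 7, 8]]
```

Each resulting group is passed to the EDU fusion module independently to produce one summary sentence.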

EDU Fusion
The EDU fusion module uses the standard pointer-generator (See et al., 2017) to generate one sentence for each group of EDUs. This design allows the model to copy words directly from the input EDUs into the generated sentence, which helps preserve cross-sentence information from the source document. At the same time, benefiting from the conditional language model training objective, the coherence of the generated sentences is greatly improved, remedying the poor readability of raw EDUs.
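As a reminder of how the pointer-generator's copy mechanism works, the sketch below mixes the generator's vocabulary distribution with the attention over the input EDU words. The dimensions are toy values; in the real module, p_gen and the attention are computed from the decoder state at each step.

```python
import numpy as np

def pointer_generator_dist(p_vocab, attention, src_ids, p_gen):
    """Final word distribution of one pointer-generator decoding step.

    p_vocab:   (vocab_size,) generator distribution over the vocabulary.
    attention: (src_len,) attention over the concatenated input EDU words.
    src_ids:   (src_len,) vocabulary id of each input word.
    p_gen:     scalar in [0, 1], probability of generating vs. copying.
    """
    final = p_gen * p_vocab
    # Copying scatters the attention mass onto the ids of the input words.
    np.add.at(final, src_ids, (1.0 - p_gen) * attention)
    return final

p_vocab = np.array([0.7, 0.1, 0.1, 0.1])  # toy 4-word vocabulary
attention = np.array([0.5, 0.5])          # two input EDU words
src_ids = np.array([3, 3])                # both map to vocab word 3
# Word 3 gets 0.8 * 0.1 from generation plus 0.2 * (0.5 + 0.5) from copying.
dist = pointer_generator_dist(p_vocab, attention, src_ids, p_gen=0.8)
print(dist)
```

Because both p_vocab and attention sum to one, the mixture remains a valid probability distribution, and words appearing in the input EDUs get boosted, which is how cross-sentence details survive into the fused sentence.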
To coordinate EDU selection and fusion for generating a good summary, we design a reinforcement learning mechanism that uses the EDU fusion results to tune the selection process, which in turn affects the fusion performance. We describe the learning process in detail in Section 3.

Learning
We first pre-train the EDU selection and EDU fusion modules separately and then use the pre-trained models as the initialization for reinforcement learning (RL).

Model Pretraining
Because summarization datasets do not label salient EDUs, we propose a greedy method to provide labeled data for pre-training. For each document-summary pair, we select several groups of EDUs from the document as the oracle EDU labels, with each group corresponding to one summary sentence. For each summary sentence, we construct its group of EDUs iteratively: starting from an empty group, we repeatedly add the document EDU that maximizes the ROUGE-L recall score between the ground-truth summary sentence and the group, stopping when no EDU can further increase the score. We use ROUGE-L recall so that the EDU selection module selects as much relevant information as possible for EDU fusion. With this dataset, we pre-train the EDU selection module. To pre-train the EDU fusion module, the input is the concatenation of the oracle EDUs and the output is the corresponding summary sentence. The two modules are pre-trained separately with maximum likelihood (ML) objectives.
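The oracle construction can be sketched as follows. This is a simplified version with a minimal LCS-based ROUGE-L recall over whitespace tokens; the actual labels would be built with a standard ROUGE implementation and proper tokenization.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_recall(reference, candidate):
    ref, cand = reference.split(), candidate.split()
    return lcs_len(ref, cand) / len(ref) if ref else 0.0

def oracle_group(summary_sentence, doc_edus):
    """Greedily pick document EDU indices maximizing ROUGE-L recall."""
    group, text, best = [], "", 0.0
    while True:
        cand = max(
            (i for i in range(len(doc_edus)) if i not in group),
            key=lambda i: rouge_l_recall(summary_sentence, text + " " + doc_edus[i]),
            default=None,
        )
        if cand is None:
            break
        score = rouge_l_recall(summary_sentence, text + " " + doc_edus[cand])
        if score <= best:
            break                 # no EDU increases the score: stop
        group.append(cand)
        text = text + " " + doc_edus[cand]
        best = score
    return group

edus = ["the team won the cup", "fans celebrated downtown", "rain delayed kickoff"]
print(oracle_group("the team won the cup and fans celebrated", edus))
# [0, 1]
```

Using recall (rather than F1) at this stage deliberately over-selects content, leaving it to the fusion module to compress the group into a concise sentence.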

Reinforcement Learning
We use the Advantage Actor-Critic (A2C) algorithm to train our model end-to-end. Following Chen and Bansal (2018), we fix the parameters of the EDU fusion module during RL training. We regard the EDU selection module as the agent, whose decoding stage is formulated as a Markov Decision Process (MDP). In each decoding step, the agent executes one selection action, i.e., selecting an EDU or a label (truncate or stop) according to the selection probability, and then receives a reward based on the EDU fusion results. For reward computation, given the i-th group of selected EDUs, we use the EDU fusion module to generate a sentence s_i and compute a score r_i measuring the overlap between s_i and the sentence gt_i in the ground-truth summary:

r_i = ROUGE-L_F1(s_i, gt_i) if i ≤ n, and 0 otherwise,

where n is the number of sentences in the ground-truth summary. For each selection action that composes group i, we set its reward to r_i / l_i, where l_i is the number of actions (selecting an EDU or truncate) used to form the group. Similar to Chen and Bansal (2018), we compute the ROUGE-1 F score between the whole generated summary and the ground-truth summary as the reward for the stop action.
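The per-group reward can then be spread uniformly over the actions that produced the group. A minimal sketch, assuming the group scores r_i have already been computed from the fusion outputs:

```python
def action_rewards(group_scores, group_sizes):
    """Spread each group reward r_i uniformly over its l_i actions.

    group_scores: r_i for each selected-EDU group (fusion output vs. gt_i).
    group_sizes:  l_i, the number of actions (EDU picks plus the closing
                  truncate) that produced group i.
    Returns one reward per action, in decoding order.
    """
    rewards = []
    for r, l in zip(group_scores, group_sizes):
        rewards.extend([r / l] * l)  # every action in group i gets r_i / l_i
    return rewards

# Two groups: r_1 = 0.5 over 2 actions, r_2 = 0.5 over 4 actions.
print(action_rewards([0.5, 0.5], [2, 4]))
# [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]
```

Dividing by l_i keeps the total reward per group constant regardless of group size, so the agent is not encouraged to inflate groups with extra EDUs.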

Results
To evaluate model performance, we compare our model (named EDUSum) with state-of-the-art extractive and abstractive summarization methods. The three extractive methods are a strong Lead-3 baseline; NN (Cheng and Lapata, 2016), which applies neural networks with attention to extract sentences directly; and REFRESH (Narayan et al., 2018), which uses reinforcement learning to rank sentences. The three abstractive methods are the Pointer Generator (See et al., 2017), a controllable text generation method (Fan et al., 2017), and Fast-Abs (Chen and Bansal, 2018), which uses reinforcement learning to connect sentence extraction and rewriting. From the results, we can also see that all the summarization methods with RL achieve comparable performance, indicating that the RL mechanism can effectively guide a system to acquire valuable information. We also design a model EDUSum_sel+RL, which is identical to EDUSum except that it removes the EDU fusion module and directly concatenates the selected EDUs as the summary. EDUSum_sel+RL performs worse with respect to R-1 and R-L, because the direct concatenation of EDUs may introduce redundancy into the summary, while EDU fusion makes the summary sentences more informative. We also note that EDUSum_sel+RL performs better than EDUSum with respect to R-2, perhaps because EDU fusion may generate some unfaithful information and needs further improvement, which we leave for future work.
Further, we conduct a thorough analysis of the EDU selection module, the main component of our method. Compared to previous work, the EDU selection module can automatically determine which EDUs, and how many, should be grouped together. This design is convenient for capturing cross-sentence information effectively. To evaluate whether capturing cross-sentence information is necessary in summarization, we add a constraint to our model: the EDU selection module can only place EDUs belonging to the same sentence into the same group. We name this model EDUSum_SameSent. From Table 2, we can see that EDUSum_SameSent behaves slightly worse than EDUSum. This makes sense because the content of each summary sentence mostly derives from one source sentence and is supplemented by some information from other sentences. We also evaluate the grouping effect of our model by removing the automatic grouping mechanism and instead grouping every K adjacent selected EDUs into one group, with K set to 1, 2, and 3 respectively, where K = 1 means no grouping at all. Table 2 shows that EDUSum_group-2 performs the best among the fixed-size settings, but worse than EDUSum and EDUSum_SameSent. This suggests that a summary sentence is usually composed of two EDUs, but that hard grouping degrades performance. Figure 2 gives a summary sentence generated by our method as an example of the advantage of our model. Our model correctly selects and groups the EDUs with similar meanings (the underlined EDUs in Sent. 1 and Sent. 2) and fuses the grouped EDUs coherently by grabbing the key entity information (i.e., the person and team information in Sent. 1) and combining it into the final summary sentence.

Human Evaluation
To evaluate the abstractive ability of our method, we conduct a human evaluation on two aspects: readability and non-redundancy. Readability measures how easy a text is to read and depends on grammaticality and coherence. Non-redundancy denotes the degree of linguistic brevity of a text in conveying the main idea. To save labor, we choose only two baselines, Fast-Abs and EDUSum_sel+RL, which perform well on the ROUGE metrics, for comparison. Since ranking is easier than scoring for annotators, we follow the evaluation method of Wu and Hu (2018). We randomly sample 50 test documents and generate their summaries with our model and the two baselines. Three annotators are asked to rank each set of three summaries with respect to readability and non-redundancy; the best is ranked first, the worst third, and ties are allowed. We then compute the average ranks of the three models, as shown in Table 3. EDUSum balances readability and non-redundancy well compared to the two baselines. Both EDUSum and EDUSum_sel+RL achieve a significant improvement in non-redundancy, because fine-grained EDUs can capture more cross-sentence information and make the summaries briefer. We can also see that EDUSum_sel+RL suffers from poor readability because it simply concatenates EDUs into a sentence, which is the main problem EDU-based models face. As for EDUSum, benefiting from EDU fusion, it achieves nearly the same readability as the sentence-based model Fast-Abs.

Conclusions
In this paper, we choose the EDU as the basic summary unit and propose a novel EDU-based summarization model, EDUSum. In our model, the EDU selection module extracts and groups salient EDUs, and the EDU fusion module converts groups of EDUs into summary sentences. We also apply reinforcement learning to coordinate EDU selection and EDU fusion for better summarization performance. With this design, EDUSum can fuse cross-sentence information and remedy the poor readability brought by raw EDUs. Compared to previous work, this work provides a feasible and effective method that makes full use of EDUs in summarization.