Discovery of Topically Coherent Sentences for Extractive Summarization

Asli Celikyilmaz (1) and Dilek Hakkani-Tur (2)
(1) Microsoft Speech Labs, (2) Microsoft Speech Labs | Microsoft Research


Abstract

Extractive methods for multi-document summarization are mainly governed by information overlap, coherence, and content constraints. We present an unsupervised probabilistic approach that models the hidden abstract concepts across documents, as well as the correlation between these concepts, in order to generate topically coherent and non-redundant summaries. Based on human evaluations, our models generate summaries with higher linguistic quality in terms of coherence, readability, and redundancy than benchmark systems. Although our system is unsupervised and optimized for topical coherence, we achieve a ROUGE score of 44.1 on the DUC-07 test set, roughly in the range of state-of-the-art supervised models.
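
For readers unfamiliar with the metric, ROUGE scores a system summary by its n-gram overlap with human reference summaries. The abstract does not say which ROUGE variant the 44.1 figure refers to, so the following is only a minimal sketch of one common form, clipped n-gram recall against a single reference. The function name `rouge_n_recall`, the whitespace tokenization, and the example sentences are assumptions made for illustration; this is not the evaluation setup used in the paper.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram recall of a candidate summary against one reference.

    Illustrative only: lowercases and splits on whitespace, with no
    stemming or stopword handling.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as often as it occurs
    # in the candidate (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences, not taken from DUC-07.
unigram_score = rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", n=1)
bigram_score = rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", n=2)
```

In this toy example, five of the six reference unigrams and three of the five reference bigrams are matched. Scores such as 44.1 are typically this kind of ratio expressed as a percentage and averaged over a test set.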




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1050.pdf