KLearn: Background Knowledge Inference from Summarization Data

Maxime Peyrard, Robert West


Abstract
The goal of text summarization is to compress documents to the relevant information while excluding background information already known to the receiver. So far, summarization researchers have given considerably more attention to relevance than to background knowledge. In contrast, this work puts background knowledge in the foreground. Building on the realization that the choices made by human summarizers and annotators contain implicit information about their background knowledge, we develop and compare techniques for inferring background knowledge from summarization data. Based on this framework, we define summary scoring functions that explicitly model background knowledge, and show that these scoring functions fit human judgments significantly better than baselines. We illustrate some of the many potential applications of our framework. First, we provide insights into human information importance priors. Second, we demonstrate that averaging the background knowledge of multiple, potentially biased annotators or corpora greatly improves summary-scoring performance. Finally, we discuss potential applications of our framework beyond summarization.
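The abstract does not spell out the scoring function itself; the sketch below illustrates, under stated assumptions, what a summary scorer that explicitly models background knowledge might look like, in the spirit of KL-divergence-based importance models. The function and parameter names (`summary_score`, `lam`), the unigram bag-of-words representation, and the specific trade-off term are illustrative assumptions, not the authors' implementation; see the code repository linked below for the actual method.

```python
import math
from collections import Counter

def distribution(tokens):
    """Empirical unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), smoothing unseen words in q with a small epsilon."""
    return sum(pw * math.log(pw / q.get(w, eps)) for w, pw in p.items())

def summary_score(summary_tokens, source_tokens, background, lam=0.5):
    """
    Score a summary: reward closeness to the source distribution
    (relevance) and divergence from the background-knowledge
    distribution (informativeness to the reader). Higher is better.
    """
    s = distribution(summary_tokens)
    d = distribution(source_tokens)
    return -kl_divergence(s, d) + lam * kl_divergence(s, background)

# Toy usage: the background distribution stands in for what the
# reader already knows; a good summary avoids restating it.
background = distribution("the sun rises in the east every day".split())
source = "the rover found water ice under the surface of mars".split()
summary = "rover found water ice on mars".split()
print(summary_score(summary, source, background))
```

Under this sketch, inferring background knowledge amounts to estimating the `background` distribution from the choices annotators make, and the abstract's averaging result corresponds to combining the distributions estimated from multiple annotators or corpora.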
Anthology ID:
2020.findings-emnlp.188
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2073–2085
URL:
https://aclanthology.org/2020.findings-emnlp.188
DOI:
10.18653/v1/2020.findings-emnlp.188
Cite (ACL):
Maxime Peyrard and Robert West. 2020. KLearn: Background Knowledge Inference from Summarization Data. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2073–2085, Online. Association for Computational Linguistics.
Cite (Informal):
KLearn: Background Knowledge Inference from Summarization Data (Peyrard & West, Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.188.pdf
Code
epfl-dlab/KLearn
Data
New York Times Annotated Corpus