A Temporally Sensitive Submodularity Framework for Timeline Summarization

Timeline summarization (TLS) creates an overview of long-running events via dated daily summaries for the most important dates. TLS differs from standard multi-document summarization (MDS) in the importance of date selection, in the interdependencies between summaries of different dates, and in having very short summaries compared to the number of corpus documents. However, we show that MDS optimization models using submodular functions can be adapted to yield well-performing TLS models by designing objective functions and constraints that model the temporal dimension inherent in TLS. Importantly, these adaptations retain the elegance and advantages of the original MDS models (clear separation of features and inference, performance guarantees and scalability, little need for supervision) that current TLS-specific models lack.


Introduction
There is an abundance of reports on events, crises and disasters. Timelines (see Table 1) summarize and date these reports in an ordered overview. Automatic Timeline Summarization (TLS) constructs such timelines from corpora that contain articles about the corresponding event.
In contrast to standard multi-document summarization (MDS), in TLS we need to explicitly model the temporal dimension of the task: specifically, we need to select the most important dates for a long-running event and summarize each of these dates. In addition, TLS deals with a much larger number of documents to summarize, which aggravates scalability and redundancy problems. These differences have significant consequences for constraints, objectives, compression rates and scalability (see Section 2.2).

2011-03-16
Security forces break up a gathering in Marjeh Square in Damascus of 150 protesters holding pictures of imprisoned relatives. Witnesses say 30 people are arrested.

2011-03-24
President Bashar al-Assad orders the formation of a committee to study how to raise living standards and lift the law covering emergency rule, in place for 48 years.

2011-03-29
Government resigns.

Due to these differences, most work on TLS has been separate from the MDS community. Instead, approaches to TLS start from scratch, optimizing task-specific heuristic criteria (Chieu and Lee, 2004; Yan et al., 2011b; Wang et al., 2016, inter alia), often with manually determined parameters (Chieu and Lee, 2004; Yan et al., 2011b) or needing supervision (Wang et al., 2016). As features and architectures are rarely reused or indeed separated from each other, it is difficult to assess reported improvements. Moreover, none of these approaches give performance guarantees for the task, which are possible in MDS models based on function optimization (McDonald, 2007; Lin and Bilmes, 2011) that yield state-of-the-art models for MDS (Hong et al., 2014; Hirao et al., 2017).
In this paper we take a step back from the differences between MDS and TLS and consider the following question: Can MDS optimization models be expanded to yield scalable, well-performing TLS models that take into account the temporal properties of TLS, while keeping MDS advantages such as modularity and performance guarantees?
In particular, we make the following contributions:
• We adapt the submodular function model of Lin and Bilmes (2011) to TLS (Section 3). This framework is scalable and modular, allowing a "plug-and-play" approach for different submodular functions. It needs little supervision or parameter tuning. We show that even this straightforward MDS adaptation equals or outperforms two strong TLS baselines on two corpora for most metrics.
• We modify the MDS-based objective function by adding temporal criteria that take date selection and interdependencies between daily summaries into account (Section 4).
• We then add more complex temporal constraints, going beyond the simple cardinality constraints in MDS (Section 5). These new constraints specify the uniformity of the timeline daily summaries and date distribution. We also give the first performance guarantees for TLS using these constraints.
• We propose a TLS evaluation framework, in which we study the effect of temporal objective functions and constraints. We show performance improvements of our temporalizations (Section 6). We also present the first oracle upper bounds for the problem and study the impact that timeline properties, such as compression rates, have on performance.

Timeline Summarization
Given a query (such as Syrian war), TLS needs to (i) extract the most important events for the query and their corresponding dates and (ii) obtain concise daily summaries for each selected date (Allan et al., 2001; Chieu and Lee, 2004; Yan et al., 2011b; Tran et al., 2015a; Wang et al., 2016).

Task Definition and Notation
A timeline is a sequence $(d_1, v_1), \ldots, (d_n, v_n)$ where the $d_i$ are dates and the $v_i$ are summaries for the dates $d_i$. Given are a query $q$ and an associated corpus $C$ that contains documents relevant to the query. The task of timeline summarization is to generate a timeline $t$ based on $C$. The number of dates in $t$ as well as the length of the daily summaries are typically controlled by the user. We denote with $U$ the set of sentences in $C$. We assume that each sentence in $U$ is dated (either by a date expression appearing in the sentence or by the publication date of the article it appears in). For a sentence $s$ we write $d(s)$ for the date of $s$.

Relation to MDS
In MDS, we also need to generate a (length-limited) summary of texts in a corpus $C$ (with an optional query $q$ used to retrieve the corpus). In the traditional DUC multi-document summarization tasks, most tasks are either not event-based at all or concentrate on one single event. In contrast, in TLS, the corpus describes an event that consists of several subevents that happen on different days. This difference has substantial effects. In MDS, criteria (such as coverage and diversity) and length constraints apply on a global level. In TLS, the whole summary is naturally divided into per-day summaries. Criteria and constraints apply on a global level as well as on a per-day level.
Even for the small number of DUC tasks that do focus on longer-running events, several differences to TLS still hold. First, the temporal dimension plays a minor role in the DUC gold standard summaries and system outputs, with few explicit datings of events and a non-temporal structure of the output, leading again to the above-mentioned differences in constraints and criteria. The ROUGE evaluation measures used in MDS (Lin, 2004) also do not take into account temporality and do not explicitly penalize wrong datings. Second, corpora in TLS typically contain thousands of documents per query (Tran et al., 2013b, 2015a). This is orders of magnitude larger than the corpora usually considered for MDS (Over and Yen, 2004). This leads to a low compression rate and requires approaches to be scalable.

Casting TLS as MDS
In the introduction, we identified several issues in existing TLS research, including lack of modularity, insufficient separation between features and model, and the lack of performance guarantees. Global constrained optimization frameworks used in MDS (McDonald, 2007; Lin and Bilmes, 2011) do separate constraints, features and inference and allow for optimal solutions or solutions with performance guarantees. They also can be used in an unsupervised manner. We now cast TLS as MDS, employing constraints and criteria used for standard MDS (Lin and Bilmes, 2011). While this ignores the temporal dimension of TLS, it will give us a baseline and a starting point for systematically incorporating temporal information.

Problem Statement and Inference
We can understand summarization as an optimization of an objective function that evaluates sets of sentences, subject to constraints. Hence, let $U$ be a set of sentences in a corpus and let $f : 2^U \to \mathbb{R}_{\geq 0}$ be a function that measures the quality of a summary. Let $I \subseteq 2^U$ be a set of constraints, i.e. the collection of sentence sets that are admissible as summaries. We then consider the optimization problem

$$\max_{S \subseteq U,\ S \in I} f(S) \qquad (1)$$

Solving Equation 1 exactly does not scale well (McDonald, 2007) and is therefore inappropriate for the large-scale data used in TLS. The greedy Algorithm 1, which iteratively constructs an output, solves the problem approximately (it is also used in McDonald (2007) and Lin and Bilmes (2011)).
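Algorithm 1 itself is not reproduced in this text. The following is a minimal sketch of a greedy procedure of this kind, assuming that the objective and the constraint test are passed in as Python callables. The names `greedy`, `f` and `is_feasible` are illustrative and not taken from the paper, and details such as tie-breaking may differ from the original algorithm.

```python
from typing import Callable, FrozenSet, Hashable, Iterable


def greedy(
    universe: Iterable[Hashable],
    f: Callable[[FrozenSet], float],
    is_feasible: Callable[[FrozenSet], bool],
) -> FrozenSet:
    """Grow a feasible set by adding the feasible element with the largest gain."""
    selected: FrozenSet = frozenset()
    candidates = set(universe)
    while candidates:
        current_value = f(selected)
        best, best_gain = None, float("-inf")
        for s in candidates:
            extended = selected | {s}
            if not is_feasible(extended):
                continue  # adding s would violate the constraints
            gain = f(extended) - current_value
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no remaining element keeps the set feasible
            break
        selected = selected | {best}
        candidates.remove(best)
    return selected


# Toy usage (hypothetical data): additive weights, at most two elements.
weights = {"a": 3.0, "b": 2.0, "c": 1.0}
result = greedy(
    weights,
    lambda S: sum(weights[s] for s in S),
    lambda S: len(S) <= 2,
)
```

In the toy usage the objective is additive, so the procedure simply picks the two heaviest elements. The interesting cases are non-additive objectives, where the gain of an element depends on what has already been selected.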

Monotonicity and Submodularity
The results obtained by GREEDY can be arbitrarily bad. However, there are performance guarantees if the objective function $f$ and the constraints $I$ are "sufficiently nice" (Calinescu et al., 2011). Many results rely on objective functions that are monotone and submodular.
From now on we assume that the function $f$ is of the form $f = \sum_{i=1}^{m} f_i$ with monotone submodular $f_i : 2^U \to [0, 1]$ ($i \in \{1, \ldots, m\}$), i.e. we normalize all $f_i$ to $[0, 1]$. By closure properties of monotonicity and submodularity, $f$ is also monotone and submodular.
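For readers who want to experiment with these two properties, here is a small brute-force sketch. It uses the standard definitions: $f$ is monotone if $f(A) \leq f(B)$ whenever $A \subseteq B$, and submodular if the gain of adding an element never increases as the set grows. The helper names are illustrative, and the exhaustive enumeration is only practical for very small ground sets.

```python
from itertools import combinations


def powerset(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(f, universe, tol=1e-12):
    """Check f(A) <= f(B) for all A ⊆ B ⊆ universe."""
    sets = list(powerset(universe))
    return all(f(a) <= f(b) + tol for a in sets for b in sets if a <= b)


def is_submodular(f, universe, tol=1e-12):
    """Check diminishing returns: f(A+x)-f(A) >= f(B+x)-f(B) for A ⊆ B, x ∉ B."""
    sets = list(powerset(universe))
    ground = frozenset(universe)
    for a in sets:
        for b in sets:
            if not a <= b:
                continue
            for x in ground - b:
                if f(a | {x}) - f(a) < f(b | {x}) - f(b) - tol:
                    return False
    return True


ground_set = {1, 2, 3, 4}
sqrt_of_size = lambda s: len(s) ** 0.5   # concave in |S|
square_of_size = lambda s: len(s) ** 2   # convex in |S|
```

On this toy ground set, the square root of the set size passes both checks, while the squared set size is monotone but violates diminishing returns.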

MDS Constraints
Constraints help to define a summary's structure, and the performance guarantee of the greedy algorithm depends on them. In MDS, typical constraints are upper bounds on the number of sentences or words, corresponding to cardinality ($|S| \leq m$) or knapsack constraints ($\sum_{s \in S} |\mathrm{words}(s)| \leq m$) for some upper bound $m$. When optimizing a submodular monotone function under such constraints, GREEDY has a performance guarantee of ≈ 0.63 and ≈ 0.39, respectively (Calinescu et al., 2011; Lin and Bilmes, 2011). That is, for cardinality constraints, the objective function value of the output is at least 0.63 times that of the optimal solution.
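As a small illustration, the two constraint types can be written as feasibility tests. The sketch below assumes sentences are plain strings and counts words by whitespace splitting, which is an assumption rather than the paper's tokenization. It also evaluates the closed forms $1 - 1/e$ and $1 - 1/\sqrt{e}$ that are commonly given for these guarantees in the submodular maximization literature, which match the rounded values 0.63 and 0.39.

```python
import math


def within_cardinality(summary, max_sentences):
    """Cardinality constraint: at most `max_sentences` sentences."""
    return len(summary) <= max_sentences


def within_knapsack(summary, max_words):
    """Knapsack constraint: total word count at most `max_words`.

    Words are counted by whitespace splitting (an assumption of this sketch).
    """
    return sum(len(sentence.split()) for sentence in summary) <= max_words


# Closed forms commonly cited for the two guarantees.
cardinality_guarantee = 1 - 1 / math.e             # about 0.632
knapsack_guarantee = 1 - 1 / math.sqrt(math.e)     # about 0.393
```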

MDS Objective Functions
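In MDS, approaches typically try to maximize coverage and diversity. In its simplest form, Lin and Bilmes (2011) model coverage as

$$f_{\mathrm{Cov}}(S) = \sum_{v \in U} \sum_{s \in S} \mathrm{sim}(v, s) \qquad (2)$$

where $\mathrm{sim} : U \times U \to \mathbb{R}_{\geq 0}$ is a sentence similarity function, e.g. cosine of word vectors. Lin and Bilmes (2011) model diversity via

$$f_{\mathrm{Div}}(S) = \sum_{i=1}^{k} \sqrt{\sum_{s \in P_i \cap S} r(s)} \qquad (3)$$

where $P_1, \ldots, P_k$ is a partition of $U$ (e.g. obtained by semantic clustering) and $r : U \to \mathbb{R}_{\geq 0}$ is a singleton reward function. We get diminished reward for adding additional sentences from one cluster.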
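A minimal sketch of the two criteria follows. It assumes the simplest readings: coverage sums the similarities between every corpus sentence and every summary sentence, and diversity applies a square root to the summed rewards within each cluster. The function names and the toy data are illustrative only.

```python
import math


def coverage(summary, universe, sim):
    """Sum of similarities between all corpus sentences and the summary."""
    return sum(sim(v, s) for v in universe for s in summary)


def diversity(summary, partition, reward):
    """Square root of summed rewards per cluster, summed over clusters."""
    total = 0.0
    for block in partition:
        total += math.sqrt(sum(reward(s) for s in block if s in summary))
    return total


# Toy data (hypothetical): three sentences in two clusters, unit rewards.
universe = ["a", "b", "c"]
partition = [{"a", "b"}, {"c"}]
unit_reward = lambda s: 1.0

same_cluster = diversity({"a", "b"}, partition, unit_reward)  # sqrt(2)
two_clusters = diversity({"a", "c"}, partition, unit_reward)  # 1 + 1
```

The toy values show the diminishing reward: two sentences from the same cluster score less than two sentences from different clusters.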

Application to TLS
Applying this MDS model to TLS as-is may not be adequate. For example, since the length constraints only limit the total number of sentences, some days in the timeline could be overrepresented. Furthermore, if objective functions ignore temporal information, we may not be able to extract sentences that describe very important events lasting only for short time periods. Instead, natural units for TLS are both the whole timeline as well as individual dates, so criteria and constraints for TLS should accommodate both units.
We now systematically add temporal information to the objective function by (i) temporalizing coverage functions, (ii) temporalizing diversity functions, and (iii) adding date selection functions. We prove the monotonicity and submodularity of all functions in the supplementary material.

Temporalizing Coverage
MDS coverage functions (Equation 2) ignore temporal information, computing coverage on a corpus-wide level. We temporalize them by modifying the similarity computation. This is a minimal but fundamental modification. Previous work in TLS noted that coverage for candidate summaries for a day $d$ should look mainly at the temporally local neighborhood, i.e. at sentences whose dates are close to $d$ (Chieu and Lee, 2004; Yan et al., 2011b). We investigate two variants of this idea. The first uses a hard cutoff (Chieu and Lee, 2004), restricting similarity computations to sentences that are at most $p$ days apart:

$$\mathrm{sim}_p(s, t) = \begin{cases} \mathrm{sim}(s, t) & \text{if } |d(s) - d(t)| \leq p \\ 0 & \text{otherwise} \end{cases}$$

The second uses a soft variant (Yan et al., 2011b). Let $g : \mathbb{N} \to \mathbb{R}_{>0}$ be monotonically non-decreasing with $g(0) = 1$. We set $\mathrm{sim}_g(s, t) = \mathrm{sim}(s, t)/g(|d(s) - d(t)|)$. Thus, all date differences are penalized, and greater date differences are penalized more.
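Both variants can be sketched as wrappers around a base similarity. The code below assumes that sentence dates are `datetime.date` objects returned by a hypothetical `date_of` callable, and it reads the reweighting function from the Model Parameters section as $g(x) = \sqrt{x} + 1$, which is one possible reading of the notation there.

```python
import math
from datetime import date


def sim_cutoff(sim, date_of, p):
    """Hard cutoff: similarity is zero for sentences more than p days apart."""
    def wrapped(s, t):
        gap = abs((date_of(s) - date_of(t)).days)
        return sim(s, t) if gap <= p else 0.0
    return wrapped


def sim_reweighted(sim, date_of, g=lambda x: math.sqrt(x) + 1):
    """Soft variant: similarity is divided by g(date difference in days)."""
    def wrapped(s, t):
        gap = abs((date_of(s) - date_of(t)).days)
        return sim(s, t) / g(gap)
    return wrapped


# Toy data (hypothetical): three sentences with a constant base similarity.
dates = {"a": date(2011, 3, 16), "b": date(2011, 3, 24), "c": date(2011, 3, 29)}
base_sim = lambda s, t: 1.0
hard = sim_cutoff(base_sim, dates.get, p=10)
soft = sim_reweighted(base_sim, dates.get)
```

With a cutoff of 10 days, `hard("a", "b")` keeps the base similarity (8 days apart) while `hard("a", "c")` is zero (13 days apart). The soft variant leaves same-day pairs unchanged because $g(0) = 1$.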

Temporalizing Diversity
As with coverage, standard MDS diversity functions (Equation 3) ignore temporal information. If the singleton reward $r$ in $f_{\mathrm{Div}}$ relies on sim, as is the case with many implementations, then temporalizing sim implicitly temporalizes diversity. We now go beyond such an implicit temporalization.
In TLS, we want to apply diversity on a temporal basis: we do not want to concentrate the summary on very few, albeit important dates, but we want date (and subevent) diversity. $f_{\mathrm{Div}}$, however, typically uses only a semantic criterion to obtain a partition, e.g. by k-means clustering of sentence vector representations (Lin and Bilmes, 2011). This may wrongly conflate events, such as two unrelated protests on different dates. We can instead employ a temporal partition. The simplest method is to partition the sentences by their date, i.e. for a temporalized diversity function $f_{\mathrm{TempDiv}}$ we have the same form as in Equation 3, but $P_i$ contains all sentences with date $d_i$, where $d_1, \ldots, d_k$ are all sentence dates.
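The date-based partition is straightforward to build. The sketch below groups sentences by date and plugs the resulting blocks into a diversity function with a square root per block, which is an assumed form for the diversity criterion. `date_of` and `reward` are hypothetical callables.

```python
import math
from collections import defaultdict


def partition_by_date(sentences, date_of):
    """Group sentences into blocks that share the same date."""
    blocks = defaultdict(set)
    for s in sentences:
        blocks[date_of(s)].add(s)
    return list(blocks.values())


def temporal_diversity(summary, sentences, date_of, reward):
    """Diversity over the date partition (square root per date block)."""
    return sum(
        math.sqrt(sum(reward(s) for s in block if s in summary))
        for block in partition_by_date(sentences, date_of)
    )


# Toy data (hypothetical): two sentences on one day, one on another.
dates = {"s1": "2011-03-16", "s2": "2011-03-16", "s3": "2011-03-24"}
one_day = temporal_diversity({"s1", "s2"}, dates, dates.get, lambda s: 1.0)
two_days = temporal_diversity({"s1", "s3"}, dates, dates.get, lambda s: 1.0)
```

Selecting sentences from two different days is rewarded more than selecting two sentences from the same day, which is the date diversity effect described above.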

Date Selection Criteria
An important part of TLS is date selection. Dedicated algorithms for date selection use frequency and patterns in date referencing to determine date importance (Tran et al., 2015b). Most date importance measures can be integrated into the objective function to allow for joint date selection and summary generation. One well-performing date selection baseline is to measure for each date how many sentences refer to it. This objective can be described by the monotone submodular function

$$f_{\mathrm{DateRef}}(S) = \sum_{d \in \{d(s) \mid s \in S\}} |\{u \in U \mid u \text{ refers to } d\}| \qquad (4)$$
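One way to read this criterion is as a sum, over the dates present in the summary, of the number of corpus sentences that refer to each date. The sketch below implements that reading. The callable `refers_to`, which returns the set of dates a sentence mentions, is a hypothetical stand-in for the output of a temporal tagger.

```python
def date_ref(summary, universe, date_of, refers_to):
    """Sum, over dates covered by the summary, of the number of referring sentences."""
    selected_dates = {date_of(s) for s in summary}
    return sum(
        sum(1 for u in universe if d in refers_to(u))
        for d in selected_dates
    )


# Toy data (hypothetical): sentence dates and the dates each sentence mentions.
sentence_dates = {"u1": "d1", "u2": "d1", "u3": "d2", "u4": "d3"}
mentions = {"u1": {"d1"}, "u2": {"d1"}, "u3": {"d2"}, "u4": set()}
universe = list(sentence_dates)

one_sentence = date_ref({"u1"}, universe, sentence_dates.get, mentions.get)
same_day = date_ref({"u1", "u2"}, universe, sentence_dates.get, mentions.get)
new_day = date_ref({"u1", "u3"}, universe, sentence_dates.get, mentions.get)
```

Under this reading, a second sentence for an already selected date adds nothing, while a sentence for a new, frequently referenced date increases the value. This is the diminishing-returns behavior expected of a submodular function.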

Combining Criteria
We combine coverage, diversity and date importance via unweighted sums for our final objective functions. An alternative would be to combine them via weighted sums learned from training data (Lin and Bilmes, 2011, 2012), but since there are only a few datasets available for training and testing TLS algorithms, we choose the unweighted sum to estimate as few parameters as possible from data.

Temporalizing Constraints
The MDS knapsack/cardinality constraints are too simple for TLS, as an overall sentence limit does not constrain a timeline to have daily summaries of roughly similar length or enforce other uniformity properties. We introduce constraints going beyond simple cardinality, and prove performance guarantees of GREEDY under such constraints.

Definition of Constraints
Typically, we have two requirements on the timeline: the total number of days should not exceed a given number $\ell$ and the length of the daily summary (in sentences) should not exceed a given number $k$ (for every day). Let $d$ be the function that assigns each sentence its date. For a set $S \subseteq U$, the requirements can be formalized as

$$|\{d(s) \mid s \in S\}| \leq \ell \qquad (5)$$

and, for all $s \in S$,

$$|\{s' \in S \mid d(s') = d(s)\}| \leq k \qquad (6)$$
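The two requirements translate directly into a feasibility test. In the sketch below, `date_of`, `max_dates` (for $\ell$) and `max_per_day` (for $k$) are illustrative names.

```python
from collections import Counter


def satisfies_tls_constraints(summary, date_of, max_dates, max_per_day):
    """True if the summary covers at most `max_dates` dates and has
    at most `max_per_day` sentences for each date."""
    per_day = Counter(date_of(s) for s in summary)
    return len(per_day) <= max_dates and all(
        count <= max_per_day for count in per_day.values()
    )


# Toy data (hypothetical): five sentences on three days.
dates = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3}
```

For example, with at most two dates and two sentences per day, the set `{"a", "b", "d"}` is admissible, whereas `{"a", "b", "c"}` has too many sentences for one day and `{"a", "d", "e"}` covers too many dates. Any subset of an admissible set is again admissible.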

Performance Guarantees
While the constraints expressed by Equations 5 and 6 are more complex than constraints used in MDS, they have a property in common: if a set $S$ fulfills the constraints (i.e. $S \in I$), then also any subset $T \subseteq S$ fulfills the constraints (i.e. $T \in I$).
Definition 1. Let $V$ be some set and $I \subseteq 2^V$ be a collection of subsets of $V$. The tuple $(V, I)$ is called an independence system if (i) $\emptyset \in I$ and (ii) $B \in I$ and $A \subseteq B$ implies $A \in I$.
Optimization theory shows that GREEDY also has performance guarantees when generalizing cardinality/knapsack constraints to "sufficiently nice" independence systems. Based on these results, we prove Lemma 1 (see the supplementary material):
Lemma 1. Let $I$ be the set of subsets of $U$ that fulfill Equations 5 and 6. Then GREEDY has a performance guarantee of $1/(k + 1)$.
The lemma implies that for the small $k$ that is typical in TLS (e.g. $k = 2$), we obtain a good approximation with reasonable constraints. However, our performance guarantees are still weaker than for MDS (for example, 0.33 for $k = 2$ compared to 0.63 in MDS). The reason for this is that our constraints are more complex, going beyond the simple well-studied cardinality and knapsack constraints. We also observe that this is a worst-case bound: in practice the performance of the algorithm may approach the exact solution (as Lin and Bilmes (2010) show for MDS). However, such an analysis is out of scope for our paper, since computing the exact solution is intractable in TLS.

Experiments
We evaluate the performance of modeling TLS as MDS and the effect of various temporalizations.

Data and Preprocessing
We run experiments on timeline17 (Tran et al., 2013b) and crisis (Tran et al., 2015a). Both data sets consist of (i) journalist-generated timelines on events such as the Syrian War as well as (ii) corresponding corpora of news articles on the topic scraped via Google News. They are publicly available and have been used in previous work (Wang et al., 2016). Table 2 shows an overview.
In the data sets, even timelines for the same topic have considerable variation. Table 3 shows properties for the five BP oil spill timelines in timeline17. There is substantial variation in range, granularity and average daily summary length.
Following previous work (Chieu and Lee, 2004; Yan et al., 2011b), we filter sentences in the corpus using keywords. For each topic we manually define a set of keywords. If any of the keywords appears in a sentence, the sentence is retained.
We identify temporal expressions with HeidelTime (Strötgen and Gertz, 2013). If a sentence $s$ contains a time expression that can be mapped to a day $d$ via HeidelTime, we set the date of $s$ to $d$ (if there are multiple expressions we take the first one). Otherwise, we set the date of $s$ to the publication date of the article which contains $s$.
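The dating rule can be summarized in a few lines. The sketch below does not call HeidelTime; it assumes that a temporal tagger has already produced, for each sentence, the list of day-level dates it mentions in textual order. The `Sentence` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Sentence:
    text: str
    publication_date: date
    # Day-level dates mentioned in the sentence, in textual order
    # (assumed to come from a temporal tagger run beforehand).
    mentioned_days: List[date] = field(default_factory=list)


def sentence_date(sentence: Sentence) -> date:
    """First mentioned day if there is one, otherwise the publication date."""
    if sentence.mentioned_days:
        return sentence.mentioned_days[0]
    return sentence.publication_date


dated = Sentence(
    "The government resigned on 29 March.",
    publication_date=date(2011, 4, 2),
    mentioned_days=[date(2011, 3, 29), date(2011, 4, 1)],
)
undated = Sentence("Protests continued.", publication_date=date(2011, 4, 2))
```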

Evaluation Metrics
Automatic evaluation of TLS is done by ROUGE (Lin, 2004). We report ROUGE-1 and ROUGE-2 $F_1$ scores for the concat, agreement and align+ m:1 metrics for TLS we presented in Martschat and Markert (2017). These metrics perform evaluation by concatenating all daily summaries, evaluating only matching days and evaluating aligned dates based on date and content similarity, respectively. We evaluate date selection using $F_1$ score.

Experimental Settings
TLS has no established settings. Ideally, reference and predicted timelines should be given the same compression parameters, such as overall length or number of days (see footnote 11). Since there is considerable variation in timeline parameters (Table 3), we evaluate against each reference timeline individually, providing systems with the parameters they need via extraction from the reference timeline, including range and needed length constraints. We set $m$ to the number of sentences in the reference timeline, $\ell$ to the number of dates in the timeline, and $k$ to the average length of the daily summaries.
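A sketch of this parameter extraction follows, assuming a reference timeline is represented as a mapping from dates to lists of sentences. This representation is an assumption of the sketch. The average daily length is returned as a plain quotient, and whether and how it is rounded before use as a constraint is left open here.

```python
def timeline_parameters(timeline):
    """Derive m (sentences), l (dates) and k (average daily length)."""
    num_dates = len(timeline)
    num_sentences = sum(len(summary) for summary in timeline.values())
    return {"m": num_sentences, "l": num_dates, "k": num_sentences / num_dates}


# The excerpt shown in Table 1, split into sentences.
excerpt = {
    "2011-03-16": [
        "Security forces break up a gathering in Marjeh Square in Damascus "
        "of 150 protesters holding pictures of imprisoned relatives.",
        "Witnesses say 30 people are arrested.",
    ],
    "2011-03-24": [
        "President Bashar al-Assad orders the formation of a committee to "
        "study how to raise living standards and lift the law covering "
        "emergency rule, in place for 48 years.",
    ],
    "2011-03-29": ["Government resigns."],
}
params = timeline_parameters(excerpt)
```

For the excerpt this gives four sentences on three dates, i.e. an average of 4/3 sentences per day.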
Most previous work uses different or unreported settings, which makes comparison difficult. For instance, Tran et al. (2013b) do not report how they obtain timeline length. Wang et al. (2015, 2016) create a constant-length summary for each day that has an article in the corpus, thereby comparing reference timelines with few days with predicted timelines that have summaries for each day.

Baselines
Past work on crisis generated summaries from headlines (Wang et al., 2016) or only used manual evaluation (Tran et al., 2015a). Past work on timeline17 evaluates with ROUGE (Tran et al., 2013b; Wang et al., 2016) but suffers from the fact that parameters for presented systems, baselines and reference timelines differ or are not reported (see above). Therefore, we reimplement two baselines that were competitive in previous work (Yan et al., 2011b; Wang et al., 2015, 2016).
Chieu. Our first baseline is CHIEU, the unsupervised approach of Chieu and Lee (2004). It operates in two stages. First, it ranks sentences based on similarity: for each sentence $s$, similarities to all sentences in a 10-day window around the date of $s$ are summed up (see footnote 12). This yields a ranked list of sentences, sorted from highest to lowest summed-up similarity. Using this list, a timeline containing one-sentence daily summaries is constructed as follows: iterating through the ranked sentence list, a sentence is added to the timeline depending on the extent of the sentences already in the timeline. The extent of a sentence $s$ is defined as the smallest window of days such that the total similarity of $s$ to sentences in this window reaches at least 80% of the similarity to the sentences in the full 10-day window. If the candidate sentence does not fall into the extent of any sentence already in the timeline, it is added to the timeline.
As we can see, the model and parameters such as daily summary length are intertwined in this approach. We therefore reimplement CHIEU exactly instead of giving it reference timeline parameters. As we describe below, we use the same sentence similarity function as Chieu and Lee (2004).
Regression. Our second baseline is REG, a supervised linear regression model (Tran et al., 2013b; Wang et al., 2015). We represent each sentence with features describing its length, number of named entities, unigram features, and averaged/summed tf-idf scores. During training, for each sentence, standard ROUGE-1 $F_1$ w.r.t. the reference summary of the sentence's date is computed. The model is trained to predict this score. During prediction, sentences are selected greedily according to predicted $F_1$ score, respecting temporal constraints defined by the reference timeline.

Footnote 11: This would mirror settings in MDS, where reference and predicted summary have the same length constraint.
Footnote 12: This corresponds to the Interest ranking proposed by Chieu and Lee (2004). We do not use the more complex Burstiness measure since Interest was found to perform at least as well in previous work when evaluated with ROUGE-based measures (Wang et al., 2015, p.c.).

Model Parameters
For all submodular models and for CHIEU we use sparse inverse-date-frequency sentence representations (Chieu and Lee, 2004). This yields a vector representation $v_s$ for each sentence $s$. We set $\mathrm{sim}(s, t) = \cos(v_s, v_t)$. We did not tune any further parameters but re-used settings from previous work. For modifications to sim when temporalizing coverage and diversity (Section 4), we use a cutoff of 10 (as in Chieu and Lee (2004)), and consider g(x) = √x + 1 for reweighting. We choose the square root since it quickly provides strong penalizations for date differences but then saturates. Following Lin and Bilmes (2011), we set the singleton reward for $f_{\mathrm{Div}}$ to $r(s) = \sum_{u \in U} \mathrm{sim}(s, u)$ and obtain the partition $P_1, \ldots, P_k$ by k-means clustering with $k = 0.2 \cdot |U|$. We obtain a temporalization $f_{\mathrm{TempDiv}}$ of diversity by considering a partition of sentences induced by their dates (see Section 4).

Results
Results are displayed in Table 4. The numbers are averaged over all timelines in the respective corpus. We test for significant differences using an approximate randomization test (Noreen, 1989) at a significance level of 0.05.
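For readers unfamiliar with approximate randomization, the following is a minimal paired sketch. It assumes per-timeline scores for two systems, uses the absolute difference of mean scores as test statistic, and randomly swaps the two systems' scores on each timeline. The test statistic, the number of trials and the seed are assumptions of this sketch, not settings reported in the paper.

```python
import random


def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Estimate a p-value for the difference in mean score of two systems.

    scores_a[i] and scores_b[i] are the scores of the two systems on the
    same timeline i. In each trial the pair is swapped with probability 0.5.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    at_least_as_extreme = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A difference would then be called significant if the returned value is below 0.05.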
Baselines. Overall, performance on crisis is much lower than on timeline17. This is because (i) the corpora in crisis contain articles for more days over a larger time span and (ii) the average percentage of article publication dates for which a summary in a corresponding reference timeline exists is 11% for timeline17 and 3% for crisis. This makes date selection more difficult. On crisis, CHIEU outperforms REG except for date selection. On timeline17, REG outperforms CHIEU for four out of seven metrics. Timelines in crisis contain fewer dates and shorter daily summaries than timelines in timeline17, which aligns well with CHIEU's redundancy post-processing.

TLS as MDS.
The model ASMDS uses standard length constraints from MDS and an objective function combining non-temporalized $f_{\mathrm{Cov}}$ and $f_{\mathrm{Div}}$. It allows us to evaluate how well standard MDS ports to TLS. Except for concat and date selection on crisis, this model outperforms both baselines, while providing the advantages of modularity, non-supervision and feature/inference separation discussed throughout the paper.

Temporalizing Constraints. The model TLSCONSTRAINTS uses the temporal constraints described in Section 5, but has the same objective function as ASMDS. Compared to ASMDS, there are improvements on all metrics on timeline17 and similar performance on crisis.

Temporalizing Criteria. We temporalize ASMDS objective functions (Section 4) via modifications of the similarity function (cutoffs/reweightings), replacing diversity by temporal diversity $f_{\mathrm{TempDiv}}$, and adding date selection $f_{\mathrm{DateRef}}$. Constraints are kept non-temporal. If modifications improve over ASMDS we also check for cumulative improvements. Modifying similarity is not effective: results drop or stay roughly the same according to most metrics. The other modifications improve performance w.r.t. most metrics, especially for date selection.
Temporalizing Constraints and Criteria. Lastly, we evaluate the joint contribution of temporalized constraints and criteria. Modifications to the similarity function have a positive effect, especially reweighting. $f_{\mathrm{DateRef}}$ provides information about date importance not encoded in the constraints, improving results on crisis.
Oracle Results. Previous research in MDS computed oracle upper bounds (e.g. Hirao et al. (2017)). To estimate TLS difficulty and our limitations, we provide the first oracle upper bound for TLS: for each sentence $s$, we compute the ROUGE-1 $F_1$ score $g_s$ w.r.t. the reference summary for the sentence's date. We then run GREEDY for $f_{\mathrm{Oracle}}(S) = \sum_{s \in S} g_s$, employing the same constraints as TLSCONSTRAINTS (see Table 7).
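Because the oracle objective is a plain sum of per-sentence scores, the gain of a sentence does not depend on what has already been selected. A greedy run then amounts to scanning the sentences in order of decreasing score and keeping each one that does not violate the constraints. The sketch below illustrates this under the date-count and per-day limits. All names and the toy scores are hypothetical.

```python
def oracle_timeline(scores, date_of, max_dates, max_per_day):
    """Greedy selection for an additive objective under TLS constraints.

    scores maps each sentence to its (precomputed) oracle score.
    """
    selected = []
    per_day = {}
    for s in sorted(scores, key=scores.get, reverse=True):
        day = date_of(s)
        count = per_day.get(day, 0)
        if count == 0 and len(per_day) >= max_dates:
            continue  # would open one date too many
        if count >= max_per_day:
            continue  # daily summary for this date is already full
        selected.append(s)
        per_day[day] = count + 1
    return selected


# Toy data (hypothetical scores and dates).
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
dates = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3}
timeline = oracle_timeline(scores, dates.get, max_dates=2, max_per_day=2)
```

In the toy run, sentence `c` is skipped because its day is full and sentence `e` is skipped because it would open a third date.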
Scores of the models are most similar to oracle results for the temporally insensitive concat metric, with gaps comparable to gaps in MDS (Hirao et al., 2017). The biggest gap is in date selection $F_1$. This also leads to higher differences in the scores of temporally sensitive metrics, highlighting the importance of temporal information.

Analysis
We now investigate where and how temporal information helps compared to ASMDS. We have already identified two potential weaknesses of modeling TLS as MDS: the low compression rate (Section 2) and the likely case that ASMDS overrepresents certain dates in a timeline (Section 3). We now analyze the behavior of ASMDS w.r.t. these points and discuss the effect of temporal information. To avoid clutter, we restrict analysis to timeline17 and report only align+ m:1 ROUGE-1 $F_1$.
Effect of Compression Rate. We hypothesize that difficulty increases as compression rate decreases. We measure compression rate in two ways. We first adopt the definition from MDS and define corpus compression rate as the number of sentences in a reference timeline divided by the number of sentences in the (unfiltered) corresponding corpus. Second, we define a TLS-specific notion called spread as the number of dates in the reference timeline divided by the maximum possible number of dates given its start and end date. For example, the timeline from Table 1 in the introduction has spread 3/14. We see that timelines with lowest compression rate/spread are indeed the hardest (Table 5). Temporal information leads to improvements in all categories.
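Both measures are simple ratios. The sketch below computes them with exact fractions and reproduces the spread of the Table 1 excerpt, whose three dates fall within a 14-day range from 2011-03-16 to 2011-03-29. The function names are illustrative.

```python
from datetime import date
from fractions import Fraction


def corpus_compression_rate(num_timeline_sentences, num_corpus_sentences):
    """Timeline sentences divided by sentences in the unfiltered corpus."""
    return Fraction(num_timeline_sentences, num_corpus_sentences)


def spread(timeline_dates):
    """Number of timeline dates divided by the number of days between
    the first and the last timeline date (both inclusive)."""
    days = sorted(set(timeline_dates))
    possible = (days[-1] - days[0]).days + 1
    return Fraction(len(days), possible)


table1_dates = [date(2011, 3, 16), date(2011, 3, 24), date(2011, 3, 29)]
table1_spread = spread(table1_dates)
```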
(Over)representation of Dates. We hypothesized that ASMDS may overrepresent certain dates. We test this hypothesis by measuring the length (in sentences) of the longest daily summary in a timeline, and computing mean and median over all timelines (Table 6). The numbers confirm the hypothesis: when modeling TLS as MDS, some daily summaries tend to be very long. By construction of the constraints employed, the effect does not occur or is much weaker for CHIEU, REG and TLSCONSTRAINTS. Temporal objective functions (as in ASMDS+$f_{\mathrm{TempDiv}}$+$f_{\mathrm{DateRef}}$) also weaken the effect substantially.
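The measurement can be sketched as follows, again assuming that a timeline is a mapping from dates to lists of sentences. The three toy timelines are hypothetical and only illustrate how a single overlong daily summary affects the mean more than the median.

```python
from statistics import mean, median


def longest_daily_summary(timeline):
    """Length in sentences of the longest daily summary of a timeline."""
    return max(len(summary) for summary in timeline.values())


# Toy timelines (hypothetical): sentences are abbreviated as short strings.
timelines = [
    {"d1": ["s"], "d2": ["s"]},
    {"d1": ["s", "s"], "d2": ["s"]},
    {"d1": ["s"] * 6, "d2": ["s"]},
]
longest = [longest_daily_summary(t) for t in timelines]
mean_longest = mean(longest)
median_longest = median(longest)
```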

Related Work
The earliest work on TLS is Allan et al. (2001), who introduce the concepts of usefulness (conceptually similar to coverage) and novelty (similar to diversity), using a simple multiplicative combination. However, neither concept is temporalized. The notion of usefulness is developed further as "interest" by Chieu and Lee (2004), which we use as one of our baselines. Chieu and Lee (2004) compute interest/coverage in a static local date-based window, instead of using global optimization as we do. They handle redundancy only during post-processing, so that the interplay between coverage and diversity is not adequately modeled. Further optimization criteria are introduced by Yan et al. (2011b,a) and Nguyen et al. (2014), but their frameworks suffer from a lack of modularity or from an unclear separation of features and architecture. Wang et al. (2015) devise a local submodular model for predicting daily summaries in TLS, but they do not model the whole timeline generation as submodular function optimization under suitable constraints. Wang et al. (2016) tackle only the task of generating daily summaries without date selection using a supervised framework, greedily optimizing per-day predicted ROUGE scores, using images and text. In contrast, Kessler et al. (2012) and Tran et al. (2015b) only tackle date selection but do not generate any summaries. We consider the full task, including date selection and summary generation.
TLS is related to standard MDS. We discussed differences in Section 2. Our framework is inspired by Lin and Bilmes (2011), who cast MDS as optimization of submodular functions under cardinality and knapsack constraints. We go beyond their work by modeling temporally sensitive objective functions as well as more complex constraints encountered in TLS.
A related task is TREC real-time summarization (RTS) (Lin et al., 2016). In contrast to TLS, this task requires online summarization by presenting the input as a stream of documents and emphasizes novelty detection and low latency. In addition, RTS focuses on social media and has a very fine-grained temporal granularity. TLS also has an emphasis on date selection and dating for algorithms and evaluation which is not present in RTS, as the social media messages are dated a priori.

Conclusions
We show that submodular optimization models for MDS can yield well-performing models for TLS, despite the differences between the tasks. Therefore we can port advantages such as modularity and separation between features and inference, which current TLS models lack. In addition, we temporalize these MDS-based models to take into account TLS-specific properties, such as timeline uniformity constraints, importance of date selection and temporally sensitive objectives. These temporalizations increase performance without losing the mentioned advantages. We prove that the ensuing functions are still submodular and that the more complex constraints still retain performance guarantees for a greedy algorithm, ensuring scalability.

Proofs

Proof of Lemma 1
For more details and the combinatorial background regarding the used lemmas and theorems we refer the reader to Calinescu et al. (2011).

Performance Guarantees for Independence Systems
In order to relate independence systems to performance guarantees for the greedy algorithm we need the notion of a base.
Definition 1. Let $(V, I)$ be an independence system. Let $X \subseteq V$. The set of bases of $X$ is

$$B(X) = \{A \subseteq X \mid A \in I \text{ and } A \cup \{e\} \notin I \text{ for all } e \in X \setminus A\}$$

In general a set can have multiple bases. The performance guarantee of the greedy algorithm depends on the relation of the largest to the smallest base.
Definition 2. Let $(V, I)$ be an independence system. Let $X \subseteq V$. We define the lower rank of $X$, $\mathrm{lr}(X)$, as the size of the smallest base of $X$. We define the upper rank of $X$, $\mathrm{ur}(X)$, as the size of the largest base of $X$.
Definition 3. Let $p \in \mathbb{N}$. An independence system $(V, I)$ is a $p$-independence system if for each $X \subseteq V$ the size of the largest base of $X$ is at most $p$ times the size of the smallest base of $X$, i.e.

$$\frac{\mathrm{ur}(X)}{\mathrm{lr}(X)} \leq p$$

We can now give the main algorithmic result needed for the proof of our lemma (Fisher et al., 1978; Calinescu et al., 2011).
Theorem 1. Let $(V, I)$ be a $p$-independence system and let $f : 2^V \to \mathbb{R}$ be a submodular monotone function. Then the greedy algorithm solves the optimization problem

$$\max_{X \subseteq V,\ X \in I} f(X) \qquad (3)$$

within the constant factor $1/(p + 1)$.
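To make the notions of base and rank concrete, the following brute-force sketch enumerates all bases for a toy instance of the timeline constraints (at most two dates, at most two sentences per date) and computes the worst ratio of largest to smallest base over all non-empty subsets. The instance and all names are hypothetical, and the enumeration is exponential, so it is only meant for tiny examples.

```python
from fractions import Fraction
from itertools import combinations

# Toy instance (hypothetical): six sentences on four dates.
DATES = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 3, "f": 4}
MAX_DATES, MAX_PER_DAY = 2, 2


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def feasible(s):
    """At most MAX_DATES dates and at most MAX_PER_DAY sentences per date."""
    per_day = {}
    for x in s:
        per_day[DATES[x]] = per_day.get(DATES[x], 0) + 1
    return len(per_day) <= MAX_DATES and all(
        count <= MAX_PER_DAY for count in per_day.values()
    )


def bases(y):
    """Feasible subsets of y that cannot be extended within y."""
    return [
        a for a in subsets(y)
        if feasible(a) and not any(feasible(a | {e}) for e in y - a)
    ]


def rank_ratio(y):
    """Upper rank divided by lower rank of a non-empty set y."""
    sizes = [len(a) for a in bases(y)]
    return Fraction(max(sizes), min(sizes))


worst_ratio = max(rank_ratio(y) for y in subsets(DATES) if y)
```

On this instance the worst ratio is 2, reached for the full sentence set: `{"e", "f"}` is a base with two elements, while `{"a", "b", "c", "d"}` is a base with four.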
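Now we can prove the lemma. Recall the constraints: for a set $S \subseteq U$ we require

$$|\{d(s) \mid s \in S\}| \leq \ell \qquad (4)$$

and, for all $s \in S$,

$$|\{s' \in S \mid d(s') = d(s)\}| \leq k \qquad (5)$$

Then the following holds: Let $I$ be the set of subsets of $U$ that fulfill Equations 4 and 5. Then $I$ is a $k$-independence system.
Proof. Let $Y \subseteq U$. We need to show that

$$\frac{\max_{A \in B(Y)} |A|}{\min_{A \in B(Y)} |A|} \leq k \qquad (6)$$

In order to show this, we show that for $m = \min\{\ell, |\{d(s) \mid s \in Y\}|\}$

$$\max_{A \in B(Y)} |A| \leq mk \qquad (8)$$

$$\min_{A \in B(Y)} |A| \geq m \qquad (9)$$

Combining both estimates, we arrive at

$$\frac{\max_{A \in B(Y)} |A|}{\min_{A \in B(Y)} |A|} \leq \frac{mk}{m} = k \qquad (10)$$

which proves the result. Together with Theorem 1, this yields the performance guarantee of $1/(k + 1)$.
Upper Bound. We first prove the upper bound (Equation 8). Let $A \in B(Y)$. We consider the equivalence relation $\sim_d$ defined on $A \times A$ by $a \sim_d a'$ if and only if $d(a) = d(a')$. This equivalence relation induces a partition of $A$ into its equivalence classes according to $\sim_d$. We therefore have

$$|A| = \sum_{[a] \in A/\sim_d} |\{a' \in A \mid d(a') = d(a)\}| \leq \sum_{[a] \in A/\sim_d} k \leq mk \qquad (12)$$

where the first inequality follows from Equation 5 and the final inequality follows from Equation 4 and $A \subseteq Y$, since these imply that $\sim_d$ has at most $m$ equivalence classes. Since the estimate holds for every $A \in B(Y)$, it follows that $\max_{A \in B(Y)} |A| \leq mk$.
Lower Bound. We now prove the lower bound (Equation 9). We use a proof by contradiction. Hence, assume that there is a base $A \in B(Y)$ with fewer than $m$ elements. Then the sentences in $A$ have fewer than $m$ distinct dates. Since $m \leq |\{d(s) \mid s \in Y\}|$, there is a sentence $y \in Y$ whose date differs from the dates of all sentences in $A$. Set $A' = A \cup \{y\}$. The sentences in $A'$ have at most $m \leq \ell$ distinct dates, so Equation 4 holds for $A'$. Moreover, $y$ is the only sentence in $A'$ with date $d(y)$, and for all other dates the number of sentences is the same as in $A$. Hence, Equation 5 holds for all $a' \in A'$, which yields $A' \in I$ and contradicts the assumption that $A$ is a base of $Y$.

Proofs of submodularity and monotonicity of the objective functions
Submodularity and monotonicity of $f_{\mathrm{Cov}}$ and $f_{\mathrm{Div}}$ is shown by Lin and Bilmes (2011). Since our modifications to the similarity functions preserve non-negativity, the temporalized versions of $f_{\mathrm{Cov}}$ are also monotone and submodular. $f_{\mathrm{TempDiv}}$ is monotone and submodular since it has the same form as $f_{\mathrm{Div}}$. Therefore we just need to show that $f_{\mathrm{DateRef}}$ is monotone and submodular.
Proof. We first show monotonicity and then submodularity. For brevity, we write $f$ instead of $f_{\mathrm{DateRef}}$.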

Table 1: Excerpt from a Syrian War Reuters timeline.

Table 2: Data set statistics.

Table 4: Results. Highest values per column/dataset are boldfaced. For the submodular models, † denotes a significant difference to CHIEU, * to REG, x to ASMDS.

Table 5: Results (align+ m:1 ROUGE-1 $F_1$) by compression rate and spread on timeline17.

Table 6: Length of longest daily summary, mean and median over all timelines on timeline17.