Automatic Identification of Aspectual Classes across Verbal Readings



Introduction
It is well known that the aspectual value of a sentence plays an important role in various NLP tasks, such as the assessment of event factuality (Saurí and Pustejovsky, 2012), automatic summarisation (Kazantseva and Szpakowicz, 2010), the detection of temporal relations (Costa and Branco, 2012) or machine translation (Meyer et al., 2013). Since, however, the aspectual value of a sentence results from a complex interplay between lexical features of the predicate and its linguistic context, the automatic detection of this aspectual value is quite challenging.
Studies on the computational modelling of aspectual classes emerged about two decades ago with the work of Passonneau (1988) and Klavans and Chodorow (1992), among others. In probably the most extensive study in the field, Siegel and McKeown (2000) extract clauses from a corpus and classify them into states and events, sorting the latter into culminated and non-culminated events in a subsequent step. The classification is based on features inspired by classic Vendlerian aspectual diagnostics, themselves collected from the corpus. Since, however, these features are collected on a type level, this method does not give satisfying results for verbs whose aspectual value varies across readings (henceforth 'aspectually polysemous verbs'), which are far from exceptional (see section 3).
This problem is directly addressed by Zarcone and Lenci (2008). These authors classify corpus clauses into the four Vendlerian aspectual categories (states, activities, accomplishments and achievements), and like Siegel and McKeown, base their classification on (classic aspectual) features collected from the corpus. However, they additionally employ some syntactic properties of the predicate, a move that enables them to better account for the influence of the linguistic context on the aspectual value of the verb across readings. Friedrich and Palmer (2014), who extend Siegel and McKeown's (2000) model to distributional features, also address the problem of aspectually polysemous verbs, by making use of instance-based syntactic and semantic features, obtained from an automatic syntactic analysis of the clause.
The approach we present here is designed to tackle the issue of aspectual variability and is complementary to the methods just described. As we know from detailed work on verbal syntax and semantics in the tradition of Dowty (1979), Levin (1993), Rappaport Hovav and Levin (1998) and subsequent work, many morpho-syntactic and semantic properties of the verb exert a strong influence on its aspectual value in context. As far as we know, no study on the computational modelling of aspectual classes has tried to systematically take advantage of these correlations between lexical properties and lexical aspect. We aim to capitalise on these correlations with the help of a rich French lexical resource, "Les Verbes Français" (Dubois and Dubois-Charlier, 1997; François et al., 2007; henceforth LVF). The LVF is a valency lexicon of French verbs providing a detailed morpho-syntactic and semantic description for each reading (use) of a verb.
Unlike in previous work, the instances we classify aspectually are verbal readings as delineated in the LVF (rather than corpus phrases). We therefore study lexical aspect on an intermediate level between the coarse-grained type (verb) level and the fine-grained corpus utterance level. Also, while in previous approaches the features are collected from corpora, those we make use of are retrieved from the lexicon entries. The substantial advantage of this approach, which makes heavy use of the colossal amount of information manually coded in the LVF, is that it enables us to fully investigate the aspectual flexibility of verbs across readings and the factors that determine it.
For our automatic aspectual classification, we first extracted verbal readings from the LVF for a set of 167 frequent verbs chosen in such a way that each of the four Vendlerian aspectual classes is roughly equally represented. A semanticist manually annotated each of the corresponding 1199 readings based on a refinement of the classic Vendlerian 4-way aspectual categorisation. This refinement is motivated by recent studies in theoretical linguistics converging on the view that the traditional quadripartite aspectual typology has to be further refined (see Hay et al., 1999; Piñón, 2006; Mittwoch, 2013, among many others). Such a refinement enables one to better account for the variable degree of aspectual flexibility among predicates, so as to e.g. distinguish between 'strictly stative' predicates (e.g. know), and those stative predicates that also naturally display an activity reading (e.g. think). This annotation provides the gold standard for our classification experiments. For each annotated reading, we then collected morpho-syntactic and semantic features from the LVF, chosen for their relevance for the aspectual value of the verb in context. Based on these features, we trained classifiers to automatically predict the aspectual class of the LVF readings.
We assessed the accuracy of our automatic aspectual classification in a task-based evaluation as follows. Costa and Branco (2012) showed that (type-based/verb-level) aspectual indicators improve temporal relation classification in TempEval challenges (Verhagen et al., 2007), which emerged in conjunction with TimeML and TimeBanks (Pustejovsky and Mani, 2003). The tasks involved in these challenges require temporal reasoning. Following Costa and Branco's example, we performed TempEval tasks on the French TempEval data, using aspectual indicators derived from the predictions generated by our classifier. This way, we could show that our aspectual classification based on lexical features is reliable.
The paper is structured as follows. Section 2 presents the resource used. Section 3 explains on which criteria verbal readings were manually annotated. Section 4 describes the features collected from the LVF. Section 5 presents the automatic aspectual classification based on these features. Section 6 presents the aspectual indicators derived from the classification. Section 7 describes how our automatic classification was evaluated through TempEval tasks.

The Resource: LVF
The LVF, which roughly covers 12 300 verbs (lemmas) for a total of 25 610 readings, is a detailed and extensive lexical resource providing a systematic description of the morpho-syntactic and syntactico-semantic properties of French verbs. The basic lexical units are readings of the verbs, determined by their defining syntactic environment (argument structure, adjuncts) and a semi-formal semantic decomposition (with a finite repertoire of 'opérateurs'). Once the idiosyncrasies are put aside, this decomposition very roughly uses the same inventory of labels and features as in the lexical templates found in e.g. Pinker (1989) or Jackendoff (1983). In Table 1, we give the sample entries for the verb élargir 'widen' to illustrate LVF's basic layout.
Syntactic description (Table 1a). Each reading of a verb is coupled with a representation of its syntactic frames. In principle, a verbal reading can be coupled with a transitive frame (labelled 'T'), a reflexively marked frame ('P') and an intransitive frame ('A', 'N') unmarked by the reflexive. The syntactic description additionally specifies some semantic features of the main arguments (e.g. whether the subject and direct object are animate and/or inanimate, whether the indirect object refers to a location, etc.). This information is often crucial for the aspectual value of the reading (e.g. a 'human-only' intransitive frame strongly indicates unergativity and hence atelicity).
Semantic description (Table 1b). Each entry in the LVF is also characterised by a semi-formal semantic decomposition providing a rough approximation of the meaning of each verbal reading. Each entry is therefore paired with a finite set of primitive semantic features and labels on the basis of which verbal readings are sorted into 14 semantic classes (e.g. psych-verbs, verbs of physical state and behaviour, etc.). The semantic features and labels used in the semantic decomposition provide other cues about the type of verb (unergative/unaccusative verbs, manner/result verbs, etc.) which is instantiated by each reading. For instance, for the reading 01 of élargir 'widen' ('élargir01' for short) in Table 1b, 'r/d +qt [p]' roughly corresponds to BECOME(more(p)) ('r/d' stands for '(make) become'; '+qt' stands for an increase along a scale). From this, one can safely infer that élargir01 is a 'degree achievement' verb.
Derivational properties. The LVF also indicates when a verb is formed through a derivational process, and in the positive case, provides information about the category of the verbal root, thus enabling one to identify deadjectival or denominal verbs. Finally, each entry specifies which suffix is used for the available reading-preserving deverbal nominalisations and adjectives (-ment, -age, -ion, -eur, -oir, -ure or zero-derived nominalisations, and -able, -ant, -é adjectives).

The annotation
We retrieved 1199 entries (verbal readings) for the selected 167 frequent verbs mentioned earlier. On average, each verb has roughly 15 readings, while 50% have more than 13. These readings were manually annotated according to a fine-grained aspectual classification on a 'telicity scale' of eight values.
At the bottom of the scale are readings that are unambiguously ('strictly') stative (i.e. for which any other aspectual value is excluded), rated with 1 (S-STA). For instance, élargir02 (see Table 1a) is rated with 1, given (a.o.) its incompatibility with the progressive. Those are distinguished from stative verbs that also display a dynamic reading (e.g. penser 'think'), rated with 2 (STA-ACT). Readings that are unambiguously dynamic and atelic ('strict activity' readings) are rated with 3 (S-ACT).
At the top are found achievement readings for which any other aspectual value is excluded, rated with 8 (S-ACH). In the middle of the scale are found 'variable telicity' readings, which have no preference for the telic use in a neutral context and are compatible both with for- and in-adverbials, rated with 4 (ACT-ACC). For instance, élargir01 is rated with 4, because (a.o.) it is compatible both with for- and in-adverbials and has no preference for the telic reading in a neutral context. These variable telicity readings are distinguished from 'weak accomplishment' readings, rated with 5 (W-ACC). Out of context, weak accomplishment readings trigger an inference of completion and have a preference for the telic use; however, they are nevertheless acceptable with a for-adverbial (on the relevant interpretation of this adverbial). For instance, remplir01 'fill' (Pierre a rempli le seau d'eau 'Peter filled the bucket with water') is rated with 5, because it by default triggers an inference of completion, but is still acceptable with a for-adverbial under the 'partitive' reinterpretation of this adverbial. Under this reinterpretation, described e.g. by Smollett (2005) or Champollion (2013), the sentence triggers an inference of non-completion (Bott, 2010; see e.g. Peter filled the bucket with water for 10 minutes). 'Strong' accomplishment readings like remplir09 (Cette nouvelle a rempli Pierre de joie 'This news filled Peter with joy') are incompatible with the partitive reinterpretation of for-adverbials (footnote 4). Those are rated with 6 (S-ACC). Finally, accomplishments that share a proper subset of properties with achievements are rated with 7 (ACC-ACH).

Table 1: Sample entries for the verb élargir 'widen'.

(a) Syntactic descriptions
id | frame | encoded information
01 | T1308 | transitive, human subject, inanimate direct object, instrumental adjunct
   | P3008 | reflexive, inanimate subject, instrumental adjunct
   | A30   | intransitive with adjunct, inanimate subject
02 | N1i   | intransitive, animate subject, prep. phrase headed by de (of)
   | A90   | intransitive with adjunct, subject human or thing
   | T3900 | transitive, inanimate subject, object human or thing

(b) The four readings illustrated by sample sentences and their semantic description
id | example (a) | semantic decomposition | sem. primitive | sem. class
01 | On élargit une route/ La route (s')élargit. | r/d+qt large | become | Transformation
02 | Cette veste élargit Paul aux épaules/ La robe élargit la taille. | d large a.som | become | Transformation
03 | On élargit ses connaissances. | r/d large abs | become | Transformation
04 | On élargit le débat à la politique étrangère. | f.ire abs V R S | directed move | Enter/Exit
(a) Literal translations. 01: One widens a road/the road is REFL widened/the road widens. 02: This jacket widens Paul 'at the' shoulders/ The dress widens the waist. 03: One widens one's knowledge. 04: One extends the debate to foreign policy.

The annotator evaluated each entry with a definite or singular indefinite internal argument, in order to abstract away from the role of the determiner in the aspectual value of the VP (see e.g. Verkuyl (1993)).
We also used a coarser-grained aspectual scale and grouped the verbal readings into the following classes: ATElic (rating 1-3), with VARiable telicity (rating 4), and TELic (5 or more). Table 2 gives an overview of the distribution of the aspectual ratings.
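The two scales can be related with a small lookup. The following is a minimal Python sketch of the mapping described above; the names `TELICITY_LABELS` and `coarse_class` are illustrative and not part of the original study.

```python
# Eight-value telicity scale: rating -> label, as defined in the annotation scheme.
TELICITY_LABELS = {
    1: "S-STA",    # strictly stative
    2: "STA-ACT",  # stative that also displays a dynamic (activity) reading
    3: "S-ACT",    # strict activity
    4: "ACT-ACC",  # variable telicity
    5: "W-ACC",    # weak accomplishment
    6: "S-ACC",    # strong accomplishment
    7: "ACC-ACH",  # accomplishment sharing properties with achievements
    8: "S-ACH",    # strict achievement
}


def coarse_class(rating: int) -> str:
    """Map a rating on the 8-value scale to ATE (1-3), VAR (4) or TEL (5-8)."""
    if rating not in TELICITY_LABELS:
        raise ValueError(f"rating must be between 1 and 8, got {rating!r}")
    if rating <= 3:
        return "ATE"
    if rating == 4:
        return "VAR"
    return "TEL"
```

With the ratings given in the text, élargir02 (rated 1) falls into ATE, élargir01 (rated 4) into VAR, and remplir01 (rated 5) into TEL.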
The first finding is that verbs display a considerable aspectual variability across readings, which confirms the need to go beyond the type level for the computational modelling of aspectual classes. The aspectual value of 2/3 of the 151 verbs with more than one reading varies with the instantiated reading (on the 8-value scale). With respect to the coarser-grained scale, roughly half of the verbs (82, for a total of 793 readings) have readings in more than one of the three overarching aspectual classes.
Footnote 4: The for-adverbial is nevertheless compatible with remplir09, but only under its (non-partitive) 'result state-related interpretation', under which it scopes over the result state, cf. Piñón (1999); see e.g. This news filled Peter with joy for ten minutes.
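The notion of variability used here can be made concrete as follows. This is a hedged sketch rather than the procedure actually used for the counts above: the function `varying_verbs` and the input layout (a dictionary from (verb, reading id) to rating) are assumptions, and the toy input only contains the four ratings quoted in the text.

```python
from collections import defaultdict


def coarse_class(rating):
    """ATE for ratings 1-3, VAR for 4, TEL for 5-8."""
    return "ATE" if rating <= 3 else "VAR" if rating == 4 else "TEL"


def varying_verbs(ratings):
    """Return the verbs whose readings differ on the fine and on the coarse scale.

    ratings: dict mapping (verb, reading_id) -> rating on the 8-value scale.
    """
    by_verb = defaultdict(list)
    for (verb, _reading_id), rating in ratings.items():
        by_verb[verb].append(rating)
    # A verb varies on the fine scale if its readings carry more than one rating.
    fine = {v for v, rs in by_verb.items() if len(set(rs)) > 1}
    # It varies on the coarse scale if they fall into more than one of ATE/VAR/TEL.
    coarse = {v for v, rs in by_verb.items()
              if len({coarse_class(r) for r in rs}) > 1}
    return fine, coarse


# Toy input: only the ratings mentioned in the text.
toy_ratings = {
    ("élargir", "01"): 4,
    ("élargir", "02"): 1,
    ("remplir", "01"): 5,
    ("remplir", "09"): 6,
}
fine_varying, coarse_varying = varying_verbs(toy_ratings)
```

On this toy input, both verbs vary on the 8-value scale, but only élargir varies on the coarse scale, since both quoted readings of remplir are telic.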

The features
The LVF connects each verbal reading with specific morphological, syntactic and semantic features. Among such features, those that influence the lexical aspect of the verb in context are known to be pervasive: verbs encoding the BECOME operator in their event structure generally have a telic use; intransitive manner verbs are mostly activity verbs (see e.g. Rappaport Hovav and Levin (1998) and subsequent work); ditransitive verbs like give are mostly result verbs (see e.g. Pylkkänen (2008)) and thus accomplishments. We took advantage of many of these features for our classification. Also, some semantic classes give very clear hints about the lexical aspect of their members. For instance, readings instantiating the class of 'enter/exit verbs' are telic, those instantiating the 'transformation' class are never atelic only, etc. (footnote 6). We also made use of features conveyed by the semantic decomposition, in particular its main component (BECOME, DO, ITER, STATE, etc.).
We also took advantage of the encoded information on the suffixes used in reading-preserving nominalisations. For instance, readings with an intransitive but no transitive frame can in principle characterise unaccusative (telic) or unergative (atelic) verbs. But only the latter undergo -eur nominalisation, as in English (see Keyser and Roeper (1984)). The availability of the -eur nominalisation is therefore a reliable aspectual feature too. Tables 3 and 4 compare features used in some previous aspectual classifications and their equivalents in the LVF. As one can check, the LVF features cover most of the features used in Siegel and McKeown (2000) and Zarcone and Lenci (2008) (footnote 7). For obvious reasons, features related to grammatical aspect conveyed by tenses are not covered in our valency lexicon. But overall, our set of features roughly corresponds to those used in previous work, for a total of 38 features.
[Tables 3 and 4: Features used in previous aspectual classifications and their equivalents in the LVF. Recoverable rows: for-adverbial (I sang for ten minutes.), LVF equivalent: durative adverbial in semantic decomposition; continuous adverb (She will live indefinitely.), LVF equivalent: +re (iterative operator) in semantic decomposition.]
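The -eur cue can be pictured as a simple rule over an entry's frames and suffixes. The sketch below is only an illustration under stated assumptions: the function name, the input representation and the return labels are hypothetical; the frame prefixes ('T' transitive, 'A'/'N' intransitive) follow the description of the syntactic frames given earlier; and treating the absence of -eur as a telic cue is a simplification, since the text only states that unergatives, not unaccusatives, undergo this nominalisation.

```python
def eur_cue(frames, nominal_suffixes):
    """Return an aspectual cue for readings with an intransitive but no transitive frame.

    frames: LVF-style frame codes, e.g. ["N1i", "A90"]; only the first letter is used.
    nominal_suffixes: suffixes of the reading-preserving nominalisations, e.g. {"-eur"}.
    """
    has_transitive = any(code.startswith("T") for code in frames)
    has_intransitive = any(code.startswith(("A", "N")) for code in frames)
    if has_transitive or not has_intransitive:
        return None  # the cue only concerns intransitive-only readings
    # Unergatives (atelic) allow -eur; unaccusatives (telic) do not.
    # Assumption: a missing -eur form is read as a (weaker) telic cue.
    return "ATE-cue" if "-eur" in nominal_suffixes else "TEL-cue"
```

In a feature-based classifier such a rule would not be applied directly; the frames and the available suffixes would rather be passed on as separate features, leaving it to the learner to weigh them.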

Classifying LVF entries
The items we classified are the 1199 readings for the 167 verbs selected. Our classification task consisted in predicting the right (coarse-grained) aspectual class for these readings (ATE, VAR or TEL). In this supervised learning setting, we applied the classifiers shown in Table 5 with the implementation provided by Weka (Hall et al., 2009), mostly with their default settings (footnote 8). We measured the performance of the classifiers by assessing the accuracy in 10-fold cross-validation, and compared it to the accuracy of a baseline classifier which always assigns the majority class (TEL, rules.ZeroR). We also performed a linear forward feature selection using the Naïve Bayes algorithm (footnote 9). This way, nine features were selected, coding, among others:
• the presence of a temporal or manner argument/adjunct in the semantic decomposition;
• the main primitive in the semantic decomposition;
• the use of the suffixes -ment and -ure in the reading-preserving nominalisation;
• the relative polysemy of the lemma (indicated by the number of its readings);
• a subject that must be inanimate;
• the presence of a reflexive reading.
The results in Table 5 show that the features retrieved from the LVF enable one to predict the aspectual class considerably better than the baseline: the accuracy ranges from 12 to almost 20 percentage points above the baseline accuracy of 48.37%. The best configuration, achieving an accuracy of 67.48%, is the lazy.kstar classifier based on the feature set reduced by feature selection.
A comparison with the results reported in previous work is difficult, due to the great discrepancies in the experimental settings (see the introduction). However, our results clearly show that the aspectual class characterising verbal readings can be predicted with a reasonable accuracy on the basis of lexical information only. They once again empirically confirm the well-documented correlations between lexical aspect and the morphosyntactic/semantic properties of the verb.
Footnote 7: The features used by Friedrich and Palmer (2014) are mainly derived from those of Siegel and McKeown (2000).
Footnote 8: For libsvm (the SVM implementation), we used a linear kernel and normalisation. We selected roughly one classifier from each class.
Footnote 9: An exhaustive search with the 38 features in this group was computationally too time-consuming.
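The evaluation protocol (k-fold cross-validation compared against a majority-class baseline) can be sketched in a few lines. The experiments reported here were run with Weka; the pure-Python code below is only a stand-in to make the protocol explicit. The small categorical Naive Bayes, the function names, the single feature `primitive` and the toy data are all assumptions introduced for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def zero_r(train_rows, train_labels):
    """Majority-class baseline: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _row: majority


def naive_bayes(train_rows, train_labels, alpha=1.0):
    """Categorical Naive Bayes with add-alpha smoothing.

    Assumes every row is a dict defining the same features.
    """
    class_counts = Counter(train_labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> value frequencies
    values_seen = defaultdict(set)       # feature -> values observed in training
    for row, label in zip(train_rows, train_labels):
        for feat, val in row.items():
            value_counts[(label, feat)][val] += 1
            values_seen[feat].add(val)
    total = len(train_labels)

    def predict(row):
        best_label, best_score = None, -math.inf
        for label, n in class_counts.items():
            score = math.log(n / total)  # log prior
            for feat, val in row.items():
                k = len(values_seen[feat]) + 1  # one extra slot for unseen values
                count = value_counts[(label, feat)][val]
                score += math.log((count + alpha) / (n + alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def cross_val_accuracy(rows, labels, make_classifier, k=10, seed=0):
    """Accuracy pooled over k folds; each item is held out exactly once."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for i in range(k):
        fold = idx[i::k]
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        clf = make_classifier([rows[j] for j in train], [labels[j] for j in train])
        correct += sum(clf(rows[j]) == labels[j] for j in fold)
    return correct / len(rows)


# Toy data (not from the LVF): one feature that fully determines the class.
toy_rows = ([{"primitive": "become"}] * 15 + [{"primitive": "do"}] * 9
            + [{"primitive": "become+qt"}] * 6)
toy_labels = ["TEL"] * 15 + ["ATE"] * 9 + ["VAR"] * 6

baseline_acc = cross_val_accuracy(toy_rows, toy_labels, zero_r)
nb_acc = cross_val_accuracy(toy_rows, toy_labels, naive_bayes)
```

On real data the gap between the two numbers is what matters: the baseline fixes the accuracy reachable without any features, so gains are read relative to it.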

Aspectual indicators
In this section, we take a more qualitative look at the results obtained in section 5. We assessed the quality of the predictions of our model (henceforth LVF-model) in two ways. Firstly, we derived aspectual indicators for the type level, describing the general 'aspectual profile' of a verb across all its readings. These are later used in the task-based evaluation described in section 7. Secondly, we looked at the aspectual values assigned to the readings of particular verbs (see indicators for the verbal readings below).
Indicators for the type-level. The aspectual indicators for the type-level are computed on the basis of the aspectual values predicted for each reading of the verb. The indicators are listed in Table 6. Among the indicators in Table 6a, one shows whether there is any variation at all, 't' assesses the presence of at least one telic reading, etc. Whereas the indicators in Table 6a provide qualitative cues, those in Table 6b convey quantitative information. The first three give the proportion of readings of a particular aspectual class. The last three are computed from the probability estimates generated by the libsvm classifier.
[Table 6: Aspectual indicators. (b) Numeric aspectual indicators: 3. Proportion of flexible readings; 4. probest.max, max of probability estimates; 5. probest.min, min of probability estimates; 6. probest.avg, average of probability estimates.]
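The computation of such type-level indicators from per-reading predictions might look as follows. This is a sketch under assumptions: only 't' and the probest.* names come from the text; the other keys ('varies', 'prop.tel', 'prop.ate', 'prop.var') are hypothetical names, and the text does not say which probability the estimates refer to, so the code simply takes one estimate per reading as given.

```python
from statistics import mean


def type_level_indicators(predictions):
    """Aggregate per-reading predictions for one verb into type-level indicators.

    predictions: list of (coarse_class, probability_estimate) pairs,
    one pair per reading, with coarse_class in {"ATE", "VAR", "TEL"}.
    """
    if not predictions:
        raise ValueError("at least one reading is required")
    classes = [cls for cls, _ in predictions]
    probs = [p for _, p in predictions]
    n = len(classes)
    return {
        # Qualitative cues
        "varies": len(set(classes)) > 1,        # any variation across readings?
        "t": "TEL" in classes,                  # at least one telic reading
        # Quantitative cues: share of readings per class
        "prop.tel": classes.count("TEL") / n,
        "prop.ate": classes.count("ATE") / n,
        "prop.var": classes.count("VAR") / n,   # 'flexible' readings
        # Summary of the classifier's probability estimates
        "probest.max": max(probs),
        "probest.min": min(probs),
        "probest.avg": mean(probs),
    }
```

Because the indicators pool all readings of a verb, a few wrong predictions at the reading level need not change the resulting profile much.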
In order to get an idea of the quality of our predictions, we computed from automatic predictions the aspectual indicators for all annotated verbs. We provide some of them in Table 7 for verbs judged aspectually polysemous by the annotator. These 'automatic' aspectual indicators are given in normal font. For the same verbs, we also computed the 'manual' aspectual indicators, i.e. those computed on the basis of the manual annotations (when possible). These are set in bold face. The verbs in Table 7a are dominantly telic, those in 7b dominantly atelic and those in 7c dominantly variable. As one can check, the dominant aspectual value is correctly assigned in most cases. Also, in most cases, the proportion of uses of the non-preferred readings closely matches the proportion obtained manually. Unsurprisingly, the sample of verbs predicted to be 'mostly telic' are mostly (quasi-)achievement verbs or strong accomplishments describing 'non-gradual' changes (verbs lexicalising changes involving a two-point scale, e.g. dead or not dead for kill, see e.g. Beavers (2008)). Unsurprisingly again, many verbs predicted to be 'mostly variable' are degree achievement verbs. More remarkably, remplir 'fill' is rightly predicted to be 'mostly telic', although it is a verb of gradual change. The model therefore preserves here the crucial distinction between degree achievements associated with a closed scale like remplir, tolerating atelic readings under some uses although they conventionally encode a maximal point (see Kennedy and Levin (2008)), and degree achievements associated with an open scale like élargir 'widen', that also accept both for- and in-adverbials, but do not show a preference for the telic reading in the absence of any adverbial. These observations suggest that even if predictions for some readings are wrong, the aspectual indicators might still rightly capture the general 'aspectual profile' of verbs at the type level.
Indicators for the verbal readings. We also inspected the predicted values for some predicates and compared them to the values assigned manually. For predicates showing a high degree of aspectual variability like élargir 'widen' (see Table 7c), the results are very good: élargir01 ('They are widening the road') is correctly analysed as VAR and élargir04 ('They are extending the majority') as TEL. Interestingly, élargir02 ('This jacket widens Pierre's shoulders') is correctly analysed as ATE, despite the fact that it is wrongly analysed by the LVF as instantiating the class of change of state verbs (see footnote 6). This suggests that the computational model could leverage the information provided by the syntactic frames associated with élargir02 (see Table 1a) to outweigh the wrongly assigned semantic class and produce the correct aspectual prediction.

Task based evaluation
Reliable automatic aspectual classifications are expected to enhance existing solutions to temporal relation classification. Thus, if our LVF-model improves such a solution, we can conclude that our learned aspectual values are reliable. We therefore evaluated the predictive power of the LVF-model on unseen verbs through such tasks, following the method proposed in Costa and Branco (2012). While Costa and Branco (2012) collected their aspectual indicators from the web and improved the temporal relation detection in the Portuguese TimeBank (PTiB), we derive ours from the predictions generated using the LVF-model, as described in section 6, and use them in TempEval tasks for the French TimeBank. The data used in these experiments are the French TempEval data, a corpus for French annotated in ISO-TimeML (FTiB in the following) described in Bittar et al. (2011). This data contains about 15 000 tokens annotated with temporal relations. Of these, roughly 2/3 are marked between 2 event arguments and 1/3 between an event and a temporal expression. The classification tasks we are concerned with deal with the automatic detection of the type of these temporal relations, namely the tasks A, B and C in the TempEval 2007 challenge. Table 8 gives an overview of the data for each of the three classification tasks. We build our experiments on top of a base system addressing these challenges and show that the performance of this base system can be improved using our aspectual indicators. Like Costa and Branco (2012), we implemented as base system the classifiers proposed for English by , which only rely on relatively simple annotation attributes. Table 9 lists the features used in the context of our FTiB data, basically the same as in  and Costa and Branco (2012). As in their work, we also determined the final set of features by performing an exhaustive search on all possible feature combinations for each task, using again the Naïve Bayes algorithm. The features marked ' ' are those finally selected this way. Using this set of features, we trained the same classifiers and under the same conditions described in section 5 on the FTiB data. The accuracy of the resulting models in 10-fold cross-validation on the three TempEval tasks is shown in italics in Table 10.
Table 7: Aspectual indicators computed from predictions and from manual annotations. Indicators in bold face are computed based on manual annotations. The names of the indicators refer to the labels used in Table 6.
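An exhaustive search over feature combinations can be sketched as below. This is an illustration, not the setup actually used: the function name and the `score` callback are assumptions; in the experiments the score would be the accuracy of a Naïve Bayes model trained on the given subset, which is replaced here by an arbitrary callable.

```python
from itertools import combinations


def exhaustive_search(features, score):
    """Return the best-scoring non-empty feature subset and its score.

    features: sequence of feature names.
    score: callable mapping a tuple of feature names to a number,
           e.g. cross-validated accuracy (supplied by the caller).
    """
    best_subset, best_score = (), float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            current = score(subset)
            # Strict comparison: on ties, the first (smallest) subset found is kept.
            if current > best_score:
                best_subset, best_score = subset, current
    return best_subset, best_score
```

The search visits 2^n - 1 subsets for n features, which is feasible for a short attribute list but quickly becomes prohibitive as n grows.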
Following again Costa and Branco (2012), we then enhanced this basic set of features with each of the aspectual indicators computed from the predictions generated by the LVF-model. The aspectual indicators are listed in Table 6; we described their computation in section 6. This way, we obtained 10 enhanced feature sets, one for each aspectual indicator. Using these feature sets and the same classifiers as before, we learned models on the FTiB data and computed their accuracy in 10-fold cross-validation.
The improvements achieved this way are shown in Table 10. Whenever an aspectual indicator improves the results of the base system, we give its accuracy (in bold face) below the accuracy of the base system. The superscripts refer to the lines in Table 6 and show which of the aspectual indicators was used to enhance the base feature set to obtain the reported improved accuracy.
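The enhancement step amounts to adding one indicator at a time to the base feature set and keeping the runs that beat the base system. A minimal sketch, assuming hypothetical function names and placeholder feature and indicator names (the accuracies in the usage lines are arbitrary placeholders, not results from this study):

```python
def enhanced_feature_sets(base_features, indicators):
    """Build one feature set per indicator: the base features plus that indicator."""
    return {ind: list(base_features) + [ind] for ind in indicators}


def improvements(base_accuracy, enhanced_accuracies):
    """Keep the indicators whose accuracy exceeds the base accuracy, best first."""
    gains = [(ind, acc) for ind, acc in enhanced_accuracies.items()
             if acc > base_accuracy]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Placeholder names and numbers, for illustration only.
base = ["attr1", "attr2"]
sets = enhanced_feature_sets(base, ["t", "probest.avg"])
better = improvements(0.60, {"t": 0.59, "probest.avg": 0.62})
```

Adding the indicators one at a time keeps their individual contributions separable, at the cost of not testing combinations of indicators.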
The results given in Table 10 show that the accuracy of 8 out of the 15 tested classifiers could be improved by 1-3 points by adding the aspectual indicators. The indicator which produced the most and largest improvements was the average over the probability estimates, suggesting that this value best reflects the dominant aspectual value of the verb. Overall, the improvement obtained through our classification is quantitatively comparable to the enhancement realised by Costa and Branco (2012): Their results show an improvement similar in size to ours for 9 out of the same 15 classifiers. They evaluate on a test set, whereas we compare accuracy in 10-fold cross-validation. This was necessary since the French TimeBank is considerably smaller (roughly 1/4 of Costa and Branco's data set for Portuguese, see PTiB column in Table 8). As mentioned earlier, a qualitative comparison is nevertheless difficult, given the substantial differences between the data and the methodology used here and there.
The results clearly show, however, that the LVF-model trained on our annotated lexical entries performs well on unseen predicates.

Conclusion and future work
This paper focuses on the issue of aspectual variability for the computational modelling of aspectual classes, by using a machine learning approach and a rich morpho-syntactic and semantic valency lexicon. In contrast to previous work, where the aspectual value of corpus clauses is determined at the type (verb) level on the basis of features retrieved from the corpus, we make use of features retrieved from the lexicon in order to predict an aspectual value for each reading of the same verb (as they are delineated in this lexicon). We first studied the performance of the classifier on a set of manually annotated verb readings. Our results experimentally confirm the theoretical assumption that a sufficiently detailed lexicon provides enough information to reliably predict the aspectual value of verbs across their readings. Secondly, we tested the predictions for unseen predicates through a task-based evaluation: we used the aspectual values predicted by the LVF-model to improve the detection of temporal relation classes in TempEval 2007 tasks for French. Our predictions resulted in improvements quantitatively similar to those achieved by Costa and Branco (2012) for Portuguese and thus confirm the reliability of our aspectual predictions for unseen verbs. The investigation reported here can be further pursued in many interesting ways. One possible line of work consists in exploring the aspectual realisation and distribution of the LVF readings in corpus data. This would also provide means to relate our findings for verbal readings to corpus instances.
Our study strongly relies on the LVF lexical database, a very extensive source of morphosyntactic and semantic information. For other languages, this kind of information, when it is available, is generally not contained in a single lexicon. Therefore, a further interesting research direction would be to evaluate the applicability of our technique to suitable information from distributed resources. In this respect, recent efforts made for linking linguistic and lexical data and making these data accessible and interoperable would certainly be very helpful. For English in particular, available suitable resources are already abundant.
One of these is the Pattern Dictionary of English Verbs (Hanks, 2008). Other interesting databases are FrameNet (Baker et al., 1998), VerbNet (Levin, 1993; Kipper Schuler, 2006) and PropBank (Palmer et al., 2005), especially since these different resources have been mapped together by Loper et al. (2007), thus giving access to both the lexical and distributional properties defining each entry.
Increasing the reliability of automatic identification of aspectual classes also opens up interesting opportunities for several NLP applications. A finer-grained and more reliable automatic assessment of aspectual classes can among others be quite useful for increasing the accuracy of textual entailment recognition, and, particularly, the sensitivity of systems to event factuality (Saurí and Pustejovsky, 2009). For instance, for telic perfective sentences, while the inference of event completion amounts to an entailment with strong accomplishments and (quasi-)achievements (at least in the absence of an adverb signalling incompletion like partly), the same inference is to some extent defeasible with weak accomplishments. Integrating finer-grained distinctions among predicates could also enable one to better disambiguate verbal modifiers like durative adverbials. A for-adverbial typically signals that the event is incomplete when it modifies a weak accomplishment; e.g., Peter filled the truck for one hour suggests that the filling event is not finished, see (Bott, 2010) a.o. However, the same adverbial does not trigger this inference when it applies to a strong accomplishment or a (quasi-)achievement. For instance, They broke the law for five days does not suggest that the breaking event is not finished. A system that performs better in the identification of fine-grained aspectual classes would therefore evaluate with more precision the probability that the reported event is completed in the actual world.
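The inference pattern described in this paragraph can be written down as a small decision rule. The sketch below is purely illustrative: the function name and the returned strings are hypothetical, it covers telic perfective sentences only, and it assumes that '(quasi-)achievements' correspond to the ACC-ACH and S-ACH labels of the annotation scale.

```python
# Assumption: '(quasi-)achievements' are equated with ACC-ACH and S-ACH.
STRONG_TELIC = {"S-ACC", "ACC-ACH", "S-ACH"}


def completion_status(label, for_adverbial=False):
    """Rough completion inference for a telic perfective sentence.

    label: fine-grained aspectual label of the verbal reading.
    for_adverbial: whether the verb is modified by a for-adverbial.
    """
    if label == "W-ACC":
        # Weak accomplishments: completion is only a defeasible inference,
        # and a for-adverbial typically signals that the event is incomplete.
        return "likely incomplete" if for_adverbial else "completion inferred (defeasible)"
    if label in STRONG_TELIC:
        # A for-adverbial does not trigger an incompletion inference here.
        return "completion entailed"
    return "no completion inference"
```

Under this rule, Peter filled the truck for one hour (weak accomplishment with a for-adverbial) comes out as likely incomplete, whereas a strong accomplishment keeps its completion entailment with or without the adverbial.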