Aspectuality Across Genre: A Distributional Semantics Approach

The interpretation of the lexical aspect of verbs in English plays a crucial role in tasks such as recognizing textual entailment and learning discourse-level inferences. We show that two elementary dimensions of aspectual class, states vs. events, and telic vs. atelic events, can be modelled effectively with distributional semantics. We find that a verb’s local context is most indicative of its aspectual class, and we demonstrate that closed class words tend to be stronger discriminating contexts than content words. Our approach outperforms previous work on three datasets. Further, we present a new dataset of human-human conversations annotated with lexical aspects and present experiments that show the correlation of telicity with genre and discourse goals.


Introduction
One of the fascinating aspects of studying the aspectual class of verbs in English is its relation with nonverbal categories. Thus, although in origin a property of the verb, the aspectual class interacts in a tight-knit fashion with other words in a sentence. Previous research has discussed the importance of predicting the aspectual classes of verbs for predicting coherence relations in text and imagery (Alikhani and Stone, 2019), predicting links in entailment graphs (Hosseini et al., 2019) and interpreting sign languages (Wilbur, 2003). In addition, knowledge about the aspectual class of a verb phrase, and its influence on the temporal extent and entailments that it licenses, has been leveraged in the past for a number of natural language understanding tasks such as temporal relation extraction (Costa and Branco, 2012), event ordering (Chambers et al., 2014; Modi and Titov, 2014), and statistical machine translation (Loáiciga and Grisot, 2016).
The Aktionsart (Vendler, 1957) of a verb determines the temporal extent of the predication as well as whether it causes a change of state for the entities involved (Filip, 2012). As Aktionsart typically refers to the lexical aspect of a verb in isolation, we adopt the terminology of Verkuyl (2005), and refer to the compositionally formed Aktionsart of a verb phrase as predicational aspect.
One of the most important distinctions of the predicational aspect of a verb is between states, such as to know or to love, and events, such as visit or swim. This distinction is important for identifying the entailments that a given verb phrase licenses, as stative predications do not, by definition, entail any change of state. This property has important consequences for a number of natural language understanding tasks such as question answering. For example, if it is known that John has arrived in Vienna, a system leveraging aspectual information will be able to infer that the completion of the event of arriving in Vienna, indicated by the perfect VP having arrived in, has caused a change of state which entails being in. Therefore, when asked Where is John?, the system will be able to produce the correct answer: Vienna. On the other hand, a predominantly stative verb such as to know, as in Eve knows a lot about quantum mechanics, does not cause a change of state for either Eve or quantum mechanics.
Telic predicates do not license consequent state inferences from their progressive VP forms to corresponding non-progressive forms. Thus, telic/atelic classifications are supported by contrastive pairs like the following:
(1) Mary was drawing a circle ↛ Mary drew a circle (telic)
(2) Mary was pushing a cart → Mary pushed a cart (atelic)
In this paper we propose to approach the problem of classifying predicational aspect with distributional semantics. Our hypothesis is that the meaning distinctions of a verb that relate to its aspectual class should be reflected in its distribution when composed with its context. We therefore intersect word vectors with their context in order to determine a VP's predicational aspect, and show that we achieve a new state-of-the-art on two datasets. We further evaluate our approach on two new genres: image captions and situated human-human conversations, thereby extending the validity of our findings across a variety of genres.

Related Work
An early approach to classifying the lexical aspectual class of a verb in context was proposed by Passonneau (1988), who applied a decompositional analysis of the verb to determine the aspectual class for verb occurrences in a restricted domain. The first general-purpose study was conducted by Siegel and McKeown (2000), who built on earlier work by Klavans and Chodorow (1992), and collected linguistic indicators for lexical aspect from a large corpus. These include the presence of in- or for-adverbials, the tense of the verb or its frequency. Siegel and McKeown (2000) subsequently applied different supervised machine learning algorithms to classify the extracted feature vectors into either states or events, or telic or atelic events. Siegel and McKeown (2000) show that their method substantially improves over a majority-class baseline. The first approach to include features derived from a distributional semantic model was proposed by Friedrich and Palmer (2014). In addition to the linguistic indicator features of Siegel and McKeown (2000), Friedrich and Palmer (2014) extract representative stative, dynamic or mixed verbs from the lexical conceptual structure (LCS) database (Dorr and Olsen, 1997) and subsequently use distributional representations to derive similarity scores for the mined verbs.
Another extension to the work of Friedrich and Palmer (2014) has been proposed by Heuschkel (2016), who refines the distributional similarity features by first contextualising a target verb with its subject or object, and only then computing the distributional similarities to the set of representative verbs from the LCS database as in Friedrich and Palmer (2014). All else being equal, Heuschkel (2016) shows that contextualising the distributional representations improves performance on the Asp-ambig dataset of Friedrich and Palmer (2014).
In contrast to this line of research, we do not make explicit use of any hand-engineered linguistic indicator features but show that these can be picked up in an unsupervised way by composing distributional semantic word representations. The linguistic indicators are furthermore frequently collected on the verb type level instead of on the token level. Similar to Falk and Martin (2016), we are concerned with classifying verb readings; however, we do not use engineered features as Falk and Martin (2016) do, but directly leverage local contextual information in the form of distributional representations. Our approach is also not reliant on the availability of a parallel corpus as in Friedrich and Gateva (2017). The major difference between our approach of using distributional word representations and previous approaches is that we are using the word representations directly for classification, rather than indirectly by computing similarity scores and using these as features. This furthermore liberates us from the requirement of having a representative seed set of verbs per class to compute the distributional similarities from.
predicational aspect research, and thereby extend the existing evaluation repertoire by a very important genre.
Annotation effort. As our starting point, we sampled 2000 utterances from the Walking Around corpus (Brennan et al., 2013) uniformly at random. The Walking Around corpus is a dataset of human-human phone conversations where one party needs to find certain landmarks on a university campus and receives directions via phone from the other party.
We chose the Walking Around corpus because its conversations are situated and in real-time, and because it contains a good distribution of stative, telic, and atelic verb phrases. After sampling the initial set of 2000 utterances, we filtered multi-sentence utterances and utterances that did not contain a verb. We furthermore removed any filled pauses (indicated by "(..)" in Table 1) that have been transcribed and marked in the dataset. Following Alikhani and Stone (2019), we annotated the first VP for predicational aspect in all utterances. For example, the last utterance of Speaker 1 in Table 1 contains multiple verbs, and we have annotated the phrase Yeah it looks like one, which is the first VP in the utterance.
The study has been approved by Rutgers's IRB. Expert annotators annotated the whole dataset and were paid an hourly rate of 15 USD. They were final year linguistics undergraduate students and were provided with an annotation protocol for their task. To assess the inter-annotator agreement, we determine Cohen's κ value. We randomly selected 200 sentences and assigned each to two annotators, obtaining a Cohen's κ of 0.81, which indicates almost perfect agreement (Viera and Garrett, 2005).
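The agreement statistic mentioned above can be illustrated with a minimal sketch. The function name `cohens_kappa` and the toy label lists are assumptions made for illustration; they are not the DIASPORA annotations and not the authors' tooling.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items that received identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy annotations (not taken from DIASPORA).
annotator_1 = ["state", "telic", "atelic", "state", "telic", "state"]
annotator_2 = ["state", "telic", "atelic", "state", "atelic", "state"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 5 of 6 items, and κ corrects that raw agreement for the agreement expected by chance given each annotator's label frequencies.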
Overall statistics. The final DIASPORA dataset contains 927 annotated utterances, consisting of 400 utterances labelled as expressing stative predicational aspect (43%), 279 labelled as telic (30%), and 248 labelled as atelic (27%). The overall average utterance length is 15.58. Table 2 lists utterance length statistics for DIASPORA per individual label. The means and medians are relatively similar across all classes, suggesting that there is no bias in terms of utterance lengths for any individual class.
The DIASPORA dataset contains 98 unique verb forms, spanning 69 lemmas, with the top 10 most frequent verb lemmas making up ≈78% of all verbs in the corpus.
The characteristic of few verbs making up a large proportion of the overall data has already been observed for captions (Alikhani and Stone, 2019). This is an expected property in DIASPORA and is due to the single-domain nature of the Walking Around corpus. Figure 1 shows the frequency distribution of the top 10 most frequent verbs and their associated label distribution. The large proportion of various forms of be is due to many utterances of either speaker referring to the current location of the subject looking for the landmark (e.g. utterances like I am currently at . . .). The label distribution of the 10 most frequent verbs shows that there are some highly skewed verbs, such as get, see or know, which have a clear majority class, whereas do or look exhibit a much more balanced, and therefore ambiguous, label distribution.

Experiments
The utility of distributional semantic word representations has been shown in a large body of works in recent years (Weeds et al. (2014), Nguyen et al. (2017), Socher et al. (2013), Bowman et al. (2015); passim). In order to compose a verb with its context we apply pointwise addition as a simple distributional composition function. Pointwise addition in neural word embeddings approximates the intersection of their contexts (Tian et al., 2017), and has been shown to be an efficient function for contextualising a word in a phrase (Arora et al., 2016; Kober et al., 2017).

Distributional Models for Predicational Aspect
Following previous work on modelling the aspectual class of a verb (Siegel and McKeown, 2000; Friedrich and Palmer, 2014), we treat the problem as a supervised classification task, y = f(x), where y represents the aspectual class of a verb, f represents a classification algorithm, and x an input vector representation of a verb in context. For all of our experiments, f is a logistic regression classifier with default hyperparameter settings. In all our experiments, the input vector x is based on 300-dimensional pre-trained skip-gram word2vec (Mikolov et al., 2013) vectors. We lowercase all words, but do not apply any other form of morphological preprocessing, which means that we retain different representations for different inflected forms of a verb, i.e. look, looks, looking, and looked are represented by 4 distinct vectors.

Classifying Aspect with Distributional Semantics
For this approach we obtain a word2vec representation x for a given verb v and feed x into a logistic regression classifier in order to predict the aspectual class of v. This approach represents a rather naïve baseline that assumes that the aspectual class of a verb is a purely lexical phenomenon on the type level and can be determined independent of any context.

Incorporating Context with Distributional Composition
Let x be a word2vec representation for a given verb v, and C be the set of context words extracted for v, with c ∈ C denoting the vector representation for an extracted context word of v. The composed representation of v, denoted by x', can then be expressed as a simple sum:
$$x' = x + \sum_{c \in C} c \qquad (1)$$
Subsequently, x' is passed through a logistic regression classifier in order to predict the aspectual class of v. This model aims to capture the compositional nature of predicational aspect by integrating local contextual information into the model.
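Composition by pointwise addition can be sketched in a few lines. The helper name `compose` and the 3-dimensional toy vectors are assumptions for illustration only; the experiments described here use 300-dimensional word2vec vectors.

```python
def compose(verb_vector, context_vectors):
    """Add the vectors of all context words to the verb vector, element-wise."""
    composed = list(verb_vector)
    for context_vector in context_vectors:
        if len(context_vector) != len(composed):
            raise ValueError("all vectors must have the same dimensionality")
        composed = [x + c for x, c in zip(composed, context_vector)]
    return composed


# Hypothetical 3-dimensional toy embeddings (real word2vec vectors have 300 dims).
verb = [0.5, -1.0, 2.0]
contexts = [[1.0, 0.5, 0.0], [0.25, 0.25, -1.0]]
composed_verb = compose(verb, contexts)
```

With an empty context set the function returns the verb vector unchanged, which corresponds to the verb-only baseline.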

Types of Context
We investigate two different kinds of context: simple linear context windows of varying length and first-order dependency contexts. For example, for the sentence in Figure 2, a linear context window of size 1 would extract Jane and to for the target verb decided, whereas a dependency-based context would extract Jane and leave. We used the Stanford NLP pipeline (Manning et al., 2014) with default settings for parsing the sentences in our datasets. For linear context windows we use sizes {1, 2, 3, 5, 10}, and for first-order dependency-based contexts we experiment with using only the head of the verb, only its children, or the full first-order context.
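A linear context window can be extracted with simple list slicing, as in the sketch below. The function name `linear_context` is hypothetical, and the token list is an assumed wording of the Figure 2 sentence, chosen so that a window of size 1 around decided yields Jane and to.

```python
def linear_context(tokens, target_index, window_size):
    """Return up to `window_size` tokens on each side of the target token."""
    left = tokens[max(0, target_index - window_size):target_index]
    right = tokens[target_index + 1:target_index + 1 + window_size]
    return left + right


# Assumed, lowercased wording of the example sentence; the target verb is at index 1.
tokens = ["jane", "decided", "to", "leave"]
window_1 = linear_context(tokens, target_index=1, window_size=1)
window_2 = linear_context(tokens, target_index=1, window_size=2)
```

Windows are truncated at sentence boundaries, so a verb at the start of a sentence simply receives fewer left-hand context words.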

Incorporating the Full Sentence
We furthermore test a model that incorporates the whole sentential context into a vector representation. The approach simply uses all words from a given sentence and composes their corresponding word2vec representations as in Equation 1 above to create an embedding for the whole sentence. Embedding a sentence by adding word vectors has been shown to be an effective method for other NLP tasks such as sentiment analysis (Iyyer et al., 2015) and recognising textual entailment (Wieting et al., 2016).
The underlying rationale behind this approach is that the aspectual class of a verb is a function of the sentence as a whole, rather than dependent on local context alone (Moens, 1987; Moens and Steedman, 1988; Dowty, 1991).

Experiments
We perform experiments that assess the suitability of distributional representations for distinguishing states from events (§ 5.1), and telic from atelic events (§ 5.2). Only a completed and telic event licenses a new consequent state. Therefore, modelling predicational aspect is important for deeper text understanding, for example for modelling cause and effect, and especially for inferring consequent states.

Experiment 1 -States vs. Events
For the distinction between states and events we perform experiments on 5 datasets in total. We use the Asp-ambig dataset by Friedrich and Palmer (2014), the SitEnt dataset by Friedrich et al. (2016), our own sub-sampled version of the SitEnt dataset, the Captions dataset by Alikhani and Stone (2019), and our own DIASPORA dataset, proposed in this work.
The Asp-ambig dataset is sampled from the Brown corpus (Francis and Kucera, 1979) and is based on 20 frequently occurring verbs whose predicational aspect changes depending on context. For each verb, Friedrich and Palmer (2014) collected 138 sentences, resulting in 2760 examples in total. The dataset contains the annotations of whether the verb in context expresses a state, event, or whether it could be both. Following Friedrich and Palmer (2014), we report accuracy using leave-one-out cross-validation.
We furthermore evaluate our approach on the SitEnt dataset (Friedrich et al., 2016). The SitEnt dataset contains 40k sentences from the MASC corpus (Ide et al., 2008) and English Wikipedia, split into separate training and test sets. We evaluate our approach on the test set, using the original split of Friedrich et al. (2016). The SitEnt dataset contains annotations for verbs in context as either expressing a state or an event, but not both. Following Friedrich et al. (2016), we report class-based F1-scores.
During model development we noticed an idiosyncrasy in the SitEnt dataset, where only 900 verb types out of 4.5k occurred with both class labels, and only 267 of them had a balanced (i.e. ambiguous) class distribution. We considered this as likely problematic as a classifier might just pick up this artefact. We therefore created a downsampled dataset, SitEnt-ambig, that only contains verb types with a balanced class distribution, randomly sub-sampling the majority class for verb types with an imbalanced class distribution.
In order to cover a wider variety of genres, we also evaluate our approach on the Captions dataset of Alikhani and Stone (2019). The dataset is based on a number of image captions corpora and contains annotations for verbs being used as states, telic events and atelic events. For this experiment we merge the telic and atelic class, resulting in a 2-class problem with 2687 instances with a class distribution of 22:78 (state:event). The dataset does not contain pre-defined training/evaluation splits. We therefore evaluate using 10-fold cross-validation and report class-based F1-scores.
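The sub-sampling used to build SitEnt-ambig could be sketched roughly as follows. This is only one plausible reading of the procedure: the function name `balance_per_verb`, the `(verb, label, sentence)` tuple layout, the fixed seed, and the choice to drop verbs that occur with a single label are all assumptions, not details confirmed by the text.

```python
import random
from collections import defaultdict


def balance_per_verb(examples, seed=0):
    """Keep only verbs seen with more than one label and equalise label counts.

    `examples` is a list of (verb, label, sentence) tuples (assumed layout).
    """
    rng = random.Random(seed)
    by_verb = defaultdict(lambda: defaultdict(list))
    for example in examples:
        verb, label = example[0], example[1]
        by_verb[verb][label].append(example)
    balanced = []
    for by_label in by_verb.values():
        if len(by_label) < 2:
            continue  # verb never occurs with a second label: not ambiguous
        # Randomly sub-sample every label down to the size of the rarest one.
        keep = min(len(items) for items in by_label.values())
        for items in by_label.values():
            balanced.extend(rng.sample(items, keep))
    return balanced


# Hypothetical toy data: "look" is ambiguous but skewed, "know" is unambiguous.
toy = [
    ("look", "state", "s1"), ("look", "state", "s2"), ("look", "state", "s3"),
    ("look", "event", "s4"),
    ("know", "state", "s5"), ("know", "state", "s6"),
]
balanced_toy = balance_per_verb(toy)
```

After balancing, a verb-only classifier can no longer exploit the fact that most verb types occur with a single label.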
Finally, we evaluate on our proposed DIASPORA dataset, again merging the telic and atelic classes for this experiment, resulting in a class distribution of 43:57 (state:event).We again report class-based F1-scores over 10-fold cross-validation.
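For readers unfamiliar with the metric, a class-based F1-score treats one label as the positive class and is computed separately for each label. The sketch below uses a hypothetical helper `class_f1` and toy predictions; it is not the evaluation code used for the reported results.

```python
def class_f1(gold, predicted, target):
    """F1-score for a single class, treating `target` as the positive label."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(g == target and p == target for g, p in pairs)
    false_pos = sum(g != target and p == target for g, p in pairs)
    false_neg = sum(g == target and p != target for g, p in pairs)
    if true_pos == 0:
        return 0.0  # no correct prediction for this class
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions.
gold = ["state", "state", "event", "event", "event"]
pred = ["state", "event", "event", "event", "state"]
f1_state = class_f1(gold, pred, "state")
f1_event = class_f1(gold, pred, "event")
```

Reporting one score per class makes it visible when a model does well on the majority class only, which a single accuracy figure can hide on skewed datasets.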

Results
Table 3 below shows the results on all datasets. We compare classifying the representation of a verb without any context, the verb with local context, and the full sentence, with a majority-class baseline and previous results in the literature. The results of using a local context are based on the best performing context window around the verb; an overview of the effect of the size of the context window is shown in Figure 3.
In general, a local context window exhibits the strongest performance, even achieving a new state-of-the-art on the Asp-ambig dataset, despite the simplicity of our setup. The strong results of the verb-only model on the SitEnt dataset, which substantially outperforms the sequence model of Friedrich et al. (2016) and the local context model, confirm our suspicion that the classifier learnt the fact that most verbs in the dataset occur unambiguously with their target label. This is furthermore reflected in the results on the SitEnt-ambig dataset, where using only the verb leads to considerably worse performance than when taking a local context window around the verb into account. While the results on SitEnt-ambig are generally low, this reflects the increased difficulty of the task as well as the simplicity of our setup, and we expect to improve on these results with higher capacity models in future work.

Analysis
In Figure 3 we show class-based F1-score performance trajectories for varying sizes of the linear context window and the dependency context across all datasets. We observe that performance typically peaks at a narrow context window of 1-3 surrounding words, with performance dropping steeply when increasing the context window. Our results also show that linear window contexts are typically better predictors for predicational aspect than dependency contexts. This is an interesting result as dependency contexts are more likely to yield content words, such as nouns, adjectives or other verbs as context, as opposed to linear context windows yielding more closed class words. We investigate this effect further by comparing the general overall performance of closed class words with content words. Figure 4 provides empirical evidence that closed class words are strong predictors of predicational aspect. The figure shows accuracies for PoS tags belonging to a closed-class group, in comparison to ones belonging to open class content words. We calculated PoS-based accuracy by counting how often a word with a given PoS tag contributed to a correct classification as opposed to an incorrect one. For example, if the PoS tag IN occurs 8 times as part of correctly classified context windows and 2 times as part of incorrectly classified ones, we estimate its accuracy as 0.8. We count the participation of a PoS tag for a correct or incorrect classification decision as evidence that the given word is a reliable predictor for a given class. We expect that words with high predictive capacity will more often occur in correctly classified context windows than in incorrectly classified ones.
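The PoS-based accuracy estimate described above amounts to a per-tag ratio of counts. The following sketch reproduces the worked example for the tag IN; the function name `pos_accuracy` and the input layout (one list of PoS tags per context window plus a correctness flag) are assumptions for illustration.

```python
from collections import Counter


def pos_accuracy(windows):
    """Estimate per-tag accuracy from (pos_tags_in_window, was_correct) pairs."""
    correct, total = Counter(), Counter()
    for pos_tags, was_correct in windows:
        for tag in pos_tags:
            total[tag] += 1
            if was_correct:
                correct[tag] += 1
    # Share of a tag's occurrences that fall into correctly classified windows.
    return {tag: correct[tag] / total[tag] for tag in total}


# Toy input mirroring the example: IN appears in 8 correct and 2 incorrect windows.
windows = [(["IN"], True)] * 8 + [(["IN"], False)] * 2
accuracy_by_tag = pos_accuracy(windows)
```

Note that this estimate credits every tag in a window equally, so it measures association with correct decisions rather than each word's individual contribution.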
Figure 4 highlights that closed class words are typically more reliable predictors for predicational aspect than content words. This is a very interesting result, given our model solely operates on the basis of composed word vectors, thus indicating that distributional representations for closed class words encode a substantial amount of information that can potentially be leveraged for fine-grained directional inferences. In order to assess the generalisation capability of distributional representations we performed a zero-shot experiment on the Asp-ambig dataset where we held out all annotated data for a specific verb for evaluation, and trained the model on the remaining data. Table 10 in Appendix D provides evidence that distributional representations capture predicational aspect of unseen verbs to a surprising extent.
Table 4 shows example sentences for two ambiguous verbs from our datasets. In the first and third sentence the preposition at and the particle up, respectively, cause the predicate to express an event. Without a preposition, verbs such as look can express stative aspect as in the second sentence. The last sentence is an interesting case where the verb stand occurs in the context of a preposition, yet the combination remains stative, as the sentence describes the arrangement of inanimate objects.

Experiment 2 -Telic vs. Atelic Events
For classifying telic and atelic events we are using the Telicity dataset of Friedrich and Gateva (2017), the Captions dataset of Alikhani and Stone (2019), as well as our own proposed DIASPORA dataset.
The Telicity dataset contains 1863 sentences extracted from the MASC corpus, where a verb in context is labelled as either telic or atelic. The dataset is imbalanced with 82% of verb occurrences being labelled as telic. We follow the experimental protocol of Friedrich and Gateva (2017) and report accuracy and class-based F1-scores, using document-based cross-validation. During our experimental work we again noticed that only 70 out of approximately 570 distinct verbs in the dataset occur with both labels. However, applying the same strategy as for the SitEnt-ambig dataset would have resulted in too little data. Therefore, given this characteristic, we again expect the classifier using the verb without any context to achieve artificially high performance.
For the Captions dataset, we omit the examples labelled as stative, leaving us with 2092 captions in total, of which 800 are annotated as telic (38%), and 1292 as atelic (62%). We perform 10-fold cross-validation and report accuracy and class-based F1-scores.
Finally, for our DIASPORA dataset, we also omit the utterances annotated as expressing stative predicational aspect, leaving us with 527 examples in total, 279 instances labelled as telic (53%), and 248 instances labelled as atelic (47%), thus representing the most balanced dataset among the three.

Results
Table 5 shows the results for all three datasets, comparing a model that only has access to the distributional representation of the target verb itself, with models that have access to a local context window and the full sentence, as well as to previous results in the literature. A result table comparing the best linear context window with the best performing dependency context window is presented in Table 9 in Appendix C.
Table 5: Results on classifying telic vs. atelic events. FG17 refers to the best performing model of Friedrich and Gateva (2017), and FG17+IC refers to the model of Friedrich and Gateva (2017) with access to additional data.
Our purely distributional models achieve competitive results, with the expected strong performance for the verb-only model, which even beats the current state-of-the-art in terms of accuracy and F1-score for the atelic class. For the Captions and DIASPORA datasets we observe similar trends as for the state vs. event datasets above, with the models that operate over a local context window typically achieving the strongest performance. Notably, the verb-only models are able to perform competitively with local context windows across all datasets. While telicity itself is not part of the morphology of English verbs, telic events frequently correlate with the past tense, such that the distributional representation for the inflected verb already encodes a substantial amount of information.

Analysis
Figure 5 shows a class-based F1-score performance trajectory across all datasets and varying context window sizes. Unlike for distinguishing states from events in Figure 3 above, predicting telicity appears to be less dependent on a small local context window surrounding the target verb. This is reflected in Figure 5, which does not contain such clear performance peaks, but is more uniform across different sizes of context windows. We furthermore show the averaged PoS-based accuracy plot in Figure 6. For predicting telicity, closed class words are less reliable predictors in comparison to content words than for modelling states and events above.
This result becomes more transparent when analysing actual sentences from our dataset.
Aspect    Verb      Example Sentences
Telic     leave     (1) Okay, I have left the building.
Atelic    walk      (2) Okay, I'm still walking towar-oh is it blue?
Telic     turn      (3) Fans turned on the players and manager.
Atelic    paddle    (4) Four kayakers paddle through the water.
Table 6 shows some example sentences from the datasets annotated for telicity. The sentences show that telicity in English is frequently associated with tense, with present tenses indicating atelic eventualities and past tense indicating a completed event. This suggests that frequently the verb by itself might be sufficient for inferring telicity as in sentences (3) and (4). In many other cases, the verb interacts with its auxiliary in a tensed construction as in sentences (1) and (2).

Conclusion
In this work, we have proposed the first dataset of human-human dialogues annotated with the aspectual class of verbs. We have proposed a compositional distributional approach for modelling the aspectual class of English verbs in context. Our results indicate that distributional models are able to learn concise representations for closed class words such as particles and prepositions, and that classifiers using composed distributional representations achieve a new state-of-the-art on three recently proposed datasets. We have furthermore contributed a qualitative analysis, providing empirical evidence for the long-standing insight of semanticists that the presence of prepositions or particles in a verb phrase tends to be a very reliable indicator of the verb's aspectual class (Vendler (1957), Dowty (1979), Moens and Steedman (1988), passim). Our model setup was intentionally kept simple as we were primarily concerned with the question whether predicational aspect can be captured with a distributional semantics approach in principle. We note that using more sophisticated models might yield even stronger results, although in preliminary tests, we did not observe any meaningful performance difference when replacing our bag-of-embeddings approach with ELMo (Peters et al., 2018) or BERT (Devlin et al., 2019).
While this work was done on English, we aim to use our methodology in a multilingual setup in future work as distributional approaches scale well with growing amounts of data and across languages.
Aspect, alongside tense, is a crucial indicator of the temporal extent of a verb as well as the entailments it licenses. In future work we plan to integrate aspectual information for improving the unsupervised construction of entailment graphs (Berant et al., 2010; Hosseini et al., 2018), as well as temporal reasoning, which has been shown recently to be difficult for distributional semantic models (Kober et al., 2019).
Aspectual information can be utilised for directional entailment detection by inferring that the event of buying something entails the state of owning that thing, but not the other way round. Determining the telicity of an event also enables fine-grained inferences about whether an event caused a change of state. For example, while the telic context of writing a sonnet in fifteen minutes entails a change to a state where a finished sonnet exists, the atelic context of writing a sonnet for fifteen minutes does not.
A Supplemental Material - Performance per Verb Type in Asp-ambig
For strongly imbalanced classes as in the case of feel, which almost always functions as a state, the majority baseline is very difficult to beat. Interestingly, the window-1 and dependency (full) approaches frequently exhibit complementary performance. For example, while for stand or look a window-based context works substantially better, for follow or carry a dependency-based context is preferable. One explanation for this behaviour is that for stand or look prepositions are frequently the most salient indicator of aspectual class as shown in Section 5. On the other hand, for follow or carry a content word, such as the subject or direct object, is frequently more salient.

D Supplemental Material -Zero Shot Generalization
To assess the generalisation capabilities of our methodology we perform a zero-shot experiment on the Asp-ambig dataset. Instead of running an evaluation for each verb individually as originally proposed by Friedrich and Palmer (2014), we evaluate the model on the data for one particular verb, say look, and train the model on all available data except the data for the held-out verb look. This way, we investigate whether distributional representations truly capture the underlying semantics of predicational aspect.
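The held-out-verb protocol can be sketched as a simple split generator. The name `leave_one_verb_out` and the `(verb, label, sentence)` tuple layout are assumptions made for this illustration; the toy examples are not drawn from Asp-ambig.

```python
def leave_one_verb_out(examples):
    """Yield (held_out_verb, train, test) splits, one per verb in the data.

    `examples` is a list of (verb, label, sentence) tuples (assumed layout).
    """
    verbs = sorted({example[0] for example in examples})
    for held_out in verbs:
        train = [example for example in examples if example[0] != held_out]
        test = [example for example in examples if example[0] == held_out]
        yield held_out, train, test


# Hypothetical toy data.
toy_examples = [
    ("look", "state", "s1"), ("look", "event", "s2"),
    ("stand", "state", "s3"), ("carry", "event", "s4"),
]
splits = list(leave_one_verb_out(toy_examples))
```

Because no example of the held-out verb appears in training, any performance above chance has to come from the context words and from similarity to other verbs in the embedding space.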
We use the same simple setup as in Section 5, with a logistic regression classifier that operates on the basis of averaged word2vec embeddings. We used a linear context window of size 1 for this experiment as this was the best performing setup for the Asp-ambig dataset in the evaluation in Section 5.
Table 10 shows the results of the zero-shot experiment in comparison to the majority class baseline. While our model underperforms the majority class baseline for the majority of verbs (the baseline is difficult to beat, especially for very skewed verbs such as feel or say), our approach beats the baseline for 3 verbs and achieves comparable performance for more than half of the verbs, while not having encountered any annotated data for the target verb during training at all. Given the simplicity of our setup, we regard that as strong evidence that a model based on distributional semantics does indeed capture a substantial amount of predicational aspect in its representations.

Figure 1: Frequency distribution of the 10 most frequent verbs (left) and associated label distribution for the 10 most frequent verbs (right).

Figure 2: With a linear context window of size 1, Jane and to would be extracted as contexts for the verb decided. With a dependency-based context, Jane and leave would be extracted.
The resulting dataset consists of 6547 examples in the training set and 1402 examples in the test set. As for the original SitEnt dataset, we report class-based F1-scores for SitEnt-ambig.

Figure 3: Class-based F1-score performance trajectories for varying sizes of the context window across all datasets.

Figure 4: Averaged accuracy scores for closed class context words in comparison to content word contexts.

Figure 5: Class-based F1-score performance trajectories for varying sizes of the context window across all datasets.

Figure 6: Averaged accuracy scores for closed class context words in comparison to content word contexts.

Figure 7 shows the PoS tag distribution of extracted contexts of the linear context window in comparison to dependency contexts. Dependency contexts, based on Universal Dependencies, overwhelmingly extract content words, whereas the linear context window predominantly tends to extract more closed class words.

Figure 7: PoS tag distribution of the extracted contexts of the linear context window vs. dependency contexts.

Table 1 lists part of a conversation from the Walking Around corpus, where the speakers identify a landmark that the second speaker needs to reach.
Table 1: Part of an example dialogue from the Walking Around corpus.

Table 2: Utterance length statistics per label in DIASPORA.

A result table comparing the best linear context window with the best performing dependency context window is presented in Table 8 in Appendix C.
Table 3: Results on classifying states vs. events. FP14 refers to Friedrich and Palmer (2014), H16 to Heuschkel (2016), and F16 to Friedrich et al. (2016).

Table 4: Example Sentences from the state vs. event datasets.
in Appendix C.

Table 6: Example Sentences from the telic vs. atelic datasets.

Table 7 shows accuracies per verb type for all 20 ambiguous verbs in the Asp-ambig dataset, comparing the majority class baseline, the models of Friedrich and Palmer (2014) and Heuschkel (2016) to our window-1 and dependency (full) approaches.
Table 7: Per verb Accuracies on the Asp-ambig dataset (Friedrich and Palmer, 2014).

C Supplemental Material - Window Contexts vs. Dependency Contexts
Tables 8 & 9 present the performance of the best performing linear context window in comparison to the best performing dependency context window, as well as the verb-only, full-sentence and majority class baselines. Overall, linear context windows perform slightly better on average than dependency context windows and, as highlighted in Section 5, this can be explained by linear context windows extracting more closed class context words (see Figure 7 in Appendix B), which tend to be stronger disambiguation signals than content words.
Table 8: Comparison between linear context windows and dependency contexts on classifying states vs. events.

Table 9: Comparison between linear context windows and dependency contexts on classifying telic vs. atelic events.

Table 10: Per verb Accuracies on the Asp-ambig dataset (Friedrich and Palmer, 2014).