Annotation of causal and aspectual structure of events in RED: a preliminary report

Causal and temporal relations among events are typically analyzed in terms of interclausal relations. Yet participants in a monoclausal event interact causally as well, and monoclausal events unfold in temporal phases. We propose an annotation scheme for the resulting analysis of event structure types. The annotation scheme is based on a fine-grained analysis of aspectual structure combined with a novel analysis of physical event types based on proposals in the theoretical linguistics literature. By decomposing complex events in a clause, we will ultimately model the overall dynamic causal network of entities interacting over time described in a text.

Most analyses of event structure in text assume that a single clause headed by a verb denotes a single event; causal, temporal and other relations between events are assumed to hold between clauses. However, participants in a monoclausal event interact causally as well, and monoclausal events unfold in temporal phases. These observations underlie the decompositional analyses of verb meaning found widely in theoretical linguistics, both formal and cognitive, and used to capture linguistic generalizations. We use and extend the model of event decomposition of Croft (2012), intended to account for crosslinguistic generalizations, in order to develop an annotation scheme for event structure in single clauses.
A major issue in annotation of verb meaning is that a verb can be construed in multiple ways, largely though not entirely depending on the clausal constructions in which it occurs. For example, one and the same verb can describe events of different aspectual types (Moens and Steedman, 1988; Croft, 2012) (examples from COCA and Google Books):
(1) a. He touched the tip of his hat, then left a few bills on the bar and slid off his stool.
b. But touching the wound again and again, and remaining concentrated on the wound is not going to heal it.
c. We can say the chair is touching the wall it is leaning against, but there really is not an encounter between them, but only a spatial relation of contiguity.
d. Her fingers touched the ball, and she gripped.
e. The desert touches the boundaries of rural and urban settlements alike.
Example (1a) describes an instant of contact (usually called semelfactive); example (1b) an activity of repeated contact; example (1c) a transitory state of maintained contact; example (1d) an achievement (instantaneous change) from noncontact to contact; and example (1e) an inherent physical relationship between two landscape entities. The different aspectual interpretations are constrained partly by the tense-aspect construction (simple past, simple present, progressive) but also partly by contextual factors: for example, (1a) and (1d) are both simple past but have different aspectual interpretations.

The same ambiguity is found in the force-dynamic structure of events denoted by a single verb. Examples (2) and (3) illustrate the ambiguity, with the standard labels for the force-dynamic type given in brackets (examples from COCA and Google Books):
(2) a. Eva stayed by his side to hold the bowl, or to wipe his face with a cool cloth. [
In order to capture the aspectual and force-dynamic structure of an event necessary to build the dynamic causal network, a semantic analysis or semantic annotation of clauses in text will have to combine three elements: annotation of the verbal "root", the unanalyzed part of verb meaning (Levin and Rappaport Hovav, 2005); annotation of the aspectual type, or aspectual image schema as we will call it; and annotation of the force-dynamic image schema, corresponding roughly to the labels found in examples (2)-(3).
The goal of an annotation scheme is to allow annotators to identify semantic types and properties in a text with a degree of reliability sufficient to make the manually annotated corpus useful for training, and also to allow automated integration with formal reasoning (Mani and Pustejovsky, 2012). We therefore propose an annotation scheme with holistic labels for aspectual and causal event types. Force-dynamic and aspectual structure are annotated separately from verb meaning and from each other, because verb meaning does not determine aspectual or force-dynamic interpretation, as noted above. The aspectual and causal event types for a clause are the product of the combination of the verb meaning and the tense-aspect and argument structure constructions of the clause. The holistic annotation relieves the annotator of identifying the contribution of the construction vs. the predicate to the event's semantic structure. The aspectual and force-dynamic annotations can be translated into the decompositional analyses described in this paper, which can in turn form the basis of formal reasoning about the events in a text.
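As a rough illustration of the three-part annotation described above, the following Python sketch stores the verbal root, the aspectual image schema, and the force-dynamic image schema as independent fields of one clause-level record. The class name `ClauseAnnotation`, its field names, and the use of a bare lemma for the root are hypothetical choices made for this example; they are not part of RED, AMR, or PropBank. The force-dynamic field is left empty for the touch examples because the text does not assign them a label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClauseAnnotation:
    """One clause-level annotation; every field name here is an assumption."""
    text: str                             # the annotated clause
    root: str                             # verbal root (a bare lemma as placeholder)
    aspect: str                           # holistic aspectual image schema label
    force_dynamics: Optional[str] = None  # holistic force-dynamic label, if known


# Examples (1a) and (1c): same root, different aspectual labels.
touch_a = ClauseAnnotation("He touched the tip of his hat", "touch",
                           "Cyclic Achievement")
touch_c = ClauseAnnotation("The chair is touching the wall", "touch",
                           "Transitory State")

# A clause for which the text gives both labels.
cool = ClauseAnnotation("The soup cooled", "cool",
                        "Directed Activity", "Change of State")
```

Keeping the three fields separate mirrors the point that the verb meaning does not determine the aspectual or force-dynamic interpretation.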
The aspect and force-dynamic annotation scheme proposed here is intended to be included in Richer Event Description (RED), which in its current form primarily annotates time, modality, and event coreference (Ikuta et al., 2014). RED is being revamped to supplement the Abstract Meaning Representation (AMR) annotation schema (Banarescu et al., 2013). AMR uses PropBank framesets (Palmer et al., 2005) for verb meanings (the verbal "root"), annotating argument roles by numbers (arg0, arg1, etc.) instead of semantic role labels, as found in VerbNet (Kipper et al., 2007) or FrameNet (Fillmore et al., 2003). The aspect and force-dynamic annotations provide the event structure from which the semantic roles of the numbered arguments can be derived. The derivation of semantic roles from argument positions in an event decomposition has been argued for by many theoretical linguists (Croft, 1991; Dowty, 1991; Van Valin and LaPolla, 1997).
In addition, other relevant semantic inferences about the events in the text may be derived from the aspectual and force-dynamic decompositional event structures. We expect that automated postprocessing of syntactically parsed, semantically annotated text will provide data for statistical analysis of the semantic interaction of constructions and verbs (more generally, complex predicates) in producing the event structure interpretation of a clause.

Table 1. Aspectual types, grouped by Vendler class (subtype: definition; example; other names in the literature).

States
  [...]: This window is broken.
  Point: holds for one point in time; The sun is at its zenith. (Mittwoch, 1988)
Achievements
  Reversible Directed: ends in a transitory state; The door opened; resettable verbs (Talmy, 1985)
  Irreversible Directed: ends in a permanent state; The window broke; nonresettable verbs (Talmy, 1985)
  Cyclic: ends in a point state, reverts to the original state immediately afterwards; The mouse squeaked; semelfactives (Smith, 1991), point events (Jackendoff, 1991), full-cycle predicates (Talmy, 1985)
Activities
  Directed: describes an incremental change in a single direction on a scale; The soup cooled; degree achievements (Dowty, 1979), gradient verbs (Talmy, 1985), gradual completion verbs (Bertinetto and Squartini, 1995)
  Undirected: describes change that is not incremental over time; The girls chanted.
Accomplishments
  Incremental: consists of a directed activity leading up to the defined result state; I read the book; accomplishments (Vendler, 1957)
  Nonincremental: consists of an undirected activity leading up to the defined result state; I repaired the computer; progressive achievements (Rothstein, 2004), runup achievements (Croft, 1998)

Annotation of aspectual structure of events
Our annotation of aspect is considerably more fine-grained than previous annotations (Friedrich and Palmer, 2014; Mathew and Katz, 2009; Siegel and McKeown, 2001; Xue and Zhang, 2014; Zarcone and Lenci, 2008). It has long been known that the four-way aspectual classification of events by Vendler (1957) into states (static), activities (dynamic, durative, unbounded), achievements (dynamic, punctual, bounded) and accomplishments (dynamic, durative, bounded) does not appear to include a number of other aspectual types that have been described in the linguistics literature. Croft (2012) argues that all of the aspectual types that have been reported can be classified as different subtypes of Vendler's four classes. Based on Croft's analysis, we annotated a small sample of sentences from the data from the shared annotation task for the NAACL 2016 4th Workshop on Events (RED annotation). From this sample we had filtered out nonfinite examples (including modals and negatives) and also perfects. The aspectual analysis of perfects is complex, as they represent the midpoint of a grammaticalization path from a resultative state construction to perfective and past event constructions (Bybee et al., 1994; Croft, 2012).
We began with a two-level annotation starting from the Vendler classes as in Table 1. However, we found that a more effective annotation began with very common paired (two-way) aspectual ambiguities at the first level. For example, He removed the bones could be construed as either a directed achievement or an incremental accomplishment. Sentence context cannot always resolve such ambiguities, but the second level of annotation will be used for such ambiguity resolution where possible (NB: we do not disambiguate reversible and irreversible directed achievements at this stage). The common paired aspectual ambiguities can be observed in a multidimensional scaling analysis of English and Japanese verbal aspectual ambiguities (Croft, 2012, 266). These ambiguities are often not easily resolvable in texts, but the common ambiguities allow us to reduce the annotation possibilities to just two image schemas. The annotation scheme is presented in Table 2.

Table 2. Aspectual annotation scheme: first-level categories (paired ambiguities), with the second-level image schemas and an example of each.

[...]
  Directed Achievement: Suddenly I remembered her name.
Expandable Action
  Directed Achievement: He died at 9:17pm.
  Accomplishment (Incremental/Nonincremental): He died in four hours.
Accomplishable Action
  Incremental Accomplishment: Jules read the magazine (in an hour).
  Directed Activity: Jules read the magazine (for an hour).
Directable Action
  Directed Activity: He was running down the alley.
  Undirected Activity: The kids ran all afternoon.
Cyclic Action
  Undirected Activity: The light was flashing.
  Cyclic Achievement (Semelfactive): The light flashed (once).
Inactive Action
  Undirected Activity: He is tasting the soup.
  Transitory State: I taste the onion in the soup.
Inherent Action
  Inherent State: The statue stands in the main square.
  Transitory State: Joe is standing outside.
Disposition
  Inherent State: She is polite.
  Undirected Activity: She is being polite.
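The two-level scheme of Table 2 can be sketched as a lookup from each first-level category to the two image schemas it allows. The sketch below is only an illustration: the names `ASPECT_PAIRS` and `annotate_aspect` are hypothetical, the parenthetical glosses in the table are omitted from the labels, and the convention of using `None` for an ambiguity that the context does not resolve is an assumption of this example.

```python
# First-level categories and the two image schemas each one allows (Table 2).
ASPECT_PAIRS = {
    "Expandable Action": ("Directed Achievement", "Accomplishment"),
    "Accomplishable Action": ("Incremental Accomplishment", "Directed Activity"),
    "Directable Action": ("Directed Activity", "Undirected Activity"),
    "Cyclic Action": ("Undirected Activity", "Cyclic Achievement"),
    "Inactive Action": ("Undirected Activity", "Transitory State"),
    "Inherent Action": ("Inherent State", "Transitory State"),
    "Disposition": ("Inherent State", "Undirected Activity"),
}


def annotate_aspect(category, schema=None):
    """Return (category, schema); schema=None records an unresolved ambiguity."""
    if category not in ASPECT_PAIRS:
        raise ValueError(f"unknown first-level category: {category!r}")
    if schema is not None and schema not in ASPECT_PAIRS[category]:
        raise ValueError(f"{schema!r} is not an option for {category!r}")
    return (category, schema)


# "The light flashed (once)." resolved at the second level.
resolved = annotate_aspect("Cyclic Action", "Cyclic Achievement")
# A clause whose context does not settle the construal.
unresolved = annotate_aspect("Accomplishable Action")
```

Restricting the second level to the two listed schemas reflects the observation that a verb in a specific sentence context is at most pairwise ambiguous most of the time.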
English is more flexible than other languages in using a single predicate form in more than one aspectual image schema. Other languages use derivational morphology to distinguish different aspectual construals of the same verb; this is true for some English Inactive Actions (She is sleeping/asleep). In fact, the paired image schemas in Table 2 understate the range of aspectual construals of English predicates; many predicates allow more than two construals. The multidimensional scaling analysis in Croft (2012) implies the semantic map of aspectual ambiguities in Figure 1. A predicate is likely to allow aspectual ambiguity in two or more connected aspectual image schemas in this semantic map. Our admittedly small sample suggests that although a verb out of sentence context may be multiply ambiguous aspectually, a verb in a specific sentence context may be only pairwise ambiguous, or not ambiguous at all, most of the time.
Croft (2012) supports his classification of aspectual types by presenting a phasal analysis of events mapped on two geometric dimensions, time (t) and qualitative state (q). These phases generate the aspectual types listed in Table 1. The basic phases Croft posits are:
• states, having one point on q persisting for different temporal durations (point, finite segment, and a segment extending to the end of the entity's timeline for permanent and inherent states);
• transitions, for punctual changes of state on q;
• a monotonic function on q for directed activities/incremental accomplishments; and
• a nonmonotonic function on q for undirected activities/nonincremental accomplishments.
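The four basic phases can be illustrated with a toy classifier over sampled q-values. This is a minimal sketch and not Croft's formalization: encoding positions on the q dimension as numbers sampled at successive times, and the function name `classify_phase`, are assumptions made only for this example.

```python
def classify_phase(q):
    """Classify a sequence of q-values sampled at successive times (toy sketch).

    q is a list of numbers standing in for positions on the qualitative
    dimension; the numeric encoding is an assumption of this example.
    """
    steps = [b - a for a, b in zip(q, q[1:])]
    if all(s == 0 for s in steps):
        return "state"                # one point on q persists over time
    if len(steps) == 1:
        return "transition"           # a single punctual change on q
    if all(s >= 0 for s in steps) or all(s <= 0 for s in steps):
        return "monotonic change"     # directed activity / incremental accomplishment
    return "nonmonotonic change"      # undirected activity / nonincremental accomplishment


state = classify_phase([2, 2, 2])
transition = classify_phase([0, 1])
directed = classify_phase([5, 4, 3, 1])      # e.g. a temperature that only falls
undirected = classify_phase([0, 1, 0, 1])    # change with no single direction
```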
A simple verb in a particular tense-aspect construction generally designates or profiles (Langacker, 1987) one phase. Other phases are also represented if they are presupposed, such as the initial rest state and inception of an event, or entailed, such as the result state of an achievement or accomplishment. Accomplishments profile the inception and completion phases as well, since they are temporally and qualitatively bounded events. Croft's graphic representations of two examples of the aspectual types in Table 2 are given in Figure 2.
Croft's graphic representations of aspectual phases need to be formalized in order to allow for temporal reasoning over the aspectual analysis of events in a text. There are several possibilities that we are currently exploring. All involve a temporal interval logic (Allen, 1984) for the time dimension. States and changes of state on the q dimension can be modeled using Dynamic Interval Temporal Logic (Pustejovsky and Moszkowicz, 2011), or by a point- or vector-based model.

Annotation of force-dynamic event structure for physical events
We present an annotation scheme for physical events that is novel but derived from linguistic research that argues that events in simple clauses are analyzed in terms of causal transmission of force interactions between participants in the event (Talmy, 1976; Talmy, 1988; Croft, 1991; Croft, 2012). We briefly summarize the transmission of force model before introducing our annotation scheme and its linguistic justification.
The concept of causation we have in mind here is the narrow one pertaining to commonsense reasoning and lexical semantics, called intrinsic causation by Ikuta et al. (2014): a local relation between events, not the broader causal complex (Hobbs, 2005) that includes all enabling conditions that must hold in order for a result to occur. Talmy (1976) develops the common "billiard ball" model of causation (Langacker, 1991), in which one participant forcefully acts on a second participant, leading to some sort of change in the second participant. Talmy (1976) defines four types of causation: physical, the basic physical force transmission relation; volitional, where an agent volitionally acts on a physical entity; affective, where an external entity causes a change in mental state; and inducive, where an agent volitionally causes a change in mental state of another agent through social interaction (communication, authority, etc.). In the following example, the force transmission from Sue to the hammer is volitional causation, from the hammer to the coconut is physical causation, and from the coconut's breaking to Greg's satisfaction is affective causation:
(4) Sue broke the coconut for Greg with a hammer.
Sue → hammer → coconut → Greg
   VOL      PHYS      AFF
Croft (1991) also extends Talmy's force transmission analysis to noncausal relations, such as the spatial relation between the paint (spatial figure) and the wall (ground) in the well-known spray/load alternation (Jack sprayed paint on the wall/Jack sprayed the wall with paint). Both sentences describe a situation in which Jack causes paint to end up on the wall. There is a semantic difference, best analyzed by Dowty (1991): the progress of the event is measured out by the change undergone by the direct object participant (the incremental theme). In the first example, the event is measured out by the paint going onto the wall; in the second example, the event is measured out by the wall getting covered by the paint.
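The causal chain of example (4) can be written down as a list of labeled links between participants. The following is a minimal sketch: the `Link` type, the `participants` helper, and the abbreviation `IND` for inducive causation are hypothetical names introduced for illustration, while `VOL`, `PHYS` and `AFF` follow the labels in the example.

```python
from typing import NamedTuple


class Link(NamedTuple):
    initiator: str
    endpoint: str
    causation: str  # Talmy's four types: PHYS, VOL, AFF, IND (IND is our abbreviation)


# Example (4): Sue broke the coconut for Greg with a hammer.
chain = [
    Link("Sue", "hammer", "VOL"),
    Link("hammer", "coconut", "PHYS"),
    Link("coconut", "Greg", "AFF"),
]


def participants(links):
    """Return participants in causal order, checking that the links connect."""
    if not links:
        return []
    for prev, nxt in zip(links, links[1:]):
        if prev.endpoint != nxt.initiator:
            raise ValueError("chain is not contiguous")
    return [links[0].initiator] + [link.endpoint for link in links]


order = participants(chain)
```

A representation of this kind records who acts on whom, but it says nothing yet about the type of change each participant undergoes.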
Croft's model allows us to decompose events into a causal chain. However, Croft's causal model only describes who is acting on whom. It is too schematic to capture the more fine-grained categories of events that are assumed in discussions of verbs, argument structure constructions and their associated meanings by linguists such as Goldberg, and Levin and Rappaport Hovav. We propose a force-dynamic annotation scheme that captures these finer-grained distinctions; at this stage, it applies to physical interactions among physical entities, possibly brought about by agents. The decompositional analysis of force-dynamic image schemas underlying our annotation scheme is based on types of force-dynamic interactions proposed by Talmy (1988) and on types of incremental themes proposed by Dowty (1991), Tenny (1994), Hay et al. (1999), and others. Table 3 gives the force-dynamic image schemas that we posit; names for event types in the linguistics literature subsumed by these schemas are given in italics.
The distinction between Force/Resist and the remaining force-dynamic image schemas in Table 3 is based on the analysis of force dynamics in Talmy (1988). Talmy proposes a model of causal interactions that is generalized from the "billiard ball" model; Talmy calls the transmitter of force the Antagonist, and the receiver of force the Agonist. In the "billiard ball" model, the Agonist has a tendency towards stasis, the Antagonist a tendency towards change, and the outcome of the force-dynamic interaction is change (of the Agonist). Talmy contrasts this with other possible combinations of participant tendency and outcome. In cases of simple contact and force exertion, such as He tapped the windowpane or They pushed the door (and it didn't budge), the Agonist tends towards stasis and the Antagonist towards change, as in "billiard ball" causation; but the outcome is stasis (that is, the Agonist is largely unchanged). In cases of maintaining or resisting, as in She was holding a ball, the force dynamics are the inverse: the Agonist (the ball) tends towards change (falling), but the Antagonist tends towards stasis, and the outcome is stasis. These nonprototypical force-dynamic image schemas, where the outcome is stasis rather than change, will be called Force and Resist respectively.
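The three combinations of tendency and outcome discussed above can be summarized as a small lookup. This is a sketch under stated assumptions: the function name `force_dynamic_type` and the string encoding of tendencies are hypothetical, and combinations that the text does not discuss return `None` rather than a guessed label.

```python
def force_dynamic_type(agonist_tendency, antagonist_tendency, outcome):
    """Map tendencies and outcome onto the three cases described in the text.

    Each argument is the string "stasis" or "change".
    """
    cases = {
        # Agonist rests, Antagonist pushes, Agonist changes.
        ("stasis", "change", "change"): "billiard-ball causation",
        # Same tendencies, but the Agonist stays largely unchanged.
        ("stasis", "change", "stasis"): "Force",
        # Agonist tends to change, Antagonist holds it in place.
        ("change", "stasis", "stasis"): "Resist",
    }
    return cases.get((agonist_tendency, antagonist_tendency, outcome))


tapped = force_dynamic_type("stasis", "change", "stasis")    # He tapped the windowpane
holding = force_dynamic_type("change", "stasis", "stasis")   # She was holding a ball
```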
The remaining force-dynamic image schemas in Table 3 are all subtypes of "billiard ball" causation. They are distinguished primarily by the type of scale on the qualitative dimension that the "theme" entity undergoes. Scales may be incremental or instantaneous (Croft, 2012; Beavers, 2013). For incremental changes, some linguists have argued that all such changes be analyzed as mereological: change proceeds part by part, as in I mowed the lawn (Krifka, 1989; Krifka, 1998; Dowty, 1991). Hay et al. (1999) have argued that all such changes be analyzed as scalar change of a property of an entity, as in The balloon expanded. Croft (2012) argues that mereological change and property change are two different types of incremental change, along with two other incremental changes discussed by Dowty (1991): holistic themes, when an entity gradually moves on a path; and representation-source themes, when the incremental change is defined on a source, as in I copied the book. Instantaneous changes also fall into the same four types (Croft, 2012).
Property theme change, as in The balloon rose or The soup cooled, constitutes the change of state force-dynamic image schema. Path theme change is found in all the different types of motion events, such as The car screeched around the corner or She tapped the ball into the pocket.
Mereological theme change accounts for the largest number of image schemas. One schema, which we will call Application, as in She wiped polish onto the table, describes putting a figure entity on (or in) a ground entity. Hence Application includes putting, inserting, combining and mixing; these differences are encoded by the verb, not the force dynamics. Its reverse (Cruse, 1973) is Removal (including separating), as in She wiped the dust off the table. Another related schema, Covering, as in He sprayed the wall with paint, describes incrementally covering (or filling) a ground entity with a figure entity. These alternating schemas differ in that Application treats the figure as the incremental theme while Covering treats the ground as the incremental theme (Dowty, 1991). They are treated as distinct frames by Fillmore and Atkins (1992) and Baker and Ruppenhofer (2002). The reverse of Covering is Uncovering (including emptying), as in I stripped the trees of bark.
The fourth type of theme, the representation-source theme, applies exclusively to replication. But replication semantically belongs to a family of event types including creation, transformation and destruction that all involve changes to the categorial identity and/or individual integrity of the theme entity.
One characteristic of this family of event types is that the events are often not incremental. For example in creation (writing, composing, painting), one revises, deletes and reorganizes (Rothstein, 2004). Also, if part of a window cracks, it is construed as broken, even if later the crack lengthens or additional cracks appear. Some creation processes such as baking bread involve different types of processes such as mixing ingredients together, letting it sit, and heating it in the oven; this process cannot be described as simply mereological or a property change. These processes all involve following a design, as in a recipe. In replication, the design is simply the structure of the source; in transformation and creation, the design is in the creator's mind, and may be more of a process like a recipe than a pre-formed product. Destruction is the reverse of creation: undoing a design. We will call this generalization of the representation-source theme a design theme.
Finally, transfer and obtaining involve a change in possession, a socially-defined characteristic of physical objects (and other entities).
The first level of force-dynamic annotation, given in Table 3, provides the type of the "core" interaction in a single-clause event. In simple clauses, the "core" force-dynamic interaction can be supplemented by an external cause (typically volitional or inducive causation) antecedent to the core interaction, and by a beneficiary (affective causation) subsequent to the core interaction. The second level of force-dynamic annotation that we propose records the presence, if any, of additional participants in the causal chain, namely a causer and/or a beneficiary, and whether the event profile includes the initiator (usually the agent) or excludes it, the latter case being the grammatically distinct passive argument structure construction with an oblique initiator. The additional annotation takes any combination, including none, of the values causer, beneficiary and passive.
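Because the second-level values combine freely, one simple way to picture the annotation is as a set drawn from three allowed values. The sketch below assumes this set encoding; the names `ALLOWED_EXTRAS` and `second_level` are hypothetical and not part of the annotation guidelines.

```python
ALLOWED_EXTRAS = frozenset({"causer", "beneficiary", "passive"})


def second_level(*extras):
    """Return the set of second-level values; an empty set means 'none'."""
    chosen = frozenset(extras)
    unknown = chosen - ALLOWED_EXTRAS
    if unknown:
        raise ValueError(f"unknown second-level value(s): {sorted(unknown)}")
    return chosen


plain = second_level()                            # core interaction only
with_beneficiary = second_level("beneficiary")    # e.g. a clause with a "for Greg" phrase
passive_with_causer = second_level("causer", "passive")
```

With three independent values, the set encoding admits eight possible second-level annotations in total.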
A preliminary annotation of a set of sentences from the NAACL 2016 4th Workshop on Events data (RED annotation) began by filtering out events in VerbNet classes for which we have not developed force-dynamic image schemas. We then annotated verbs whose VerbNet class allows more than one force-dynamic image schema: 9. Putting (Apply or Cover image schemas), 10. Removing (Remove or Uncover image schemas), 11. Sending and Carrying (Motion or Transfer), and 26. Creation and Transformation (Create or Transform). Unlike the aspectual annotation, here we concluded that the verbs in the corpus were largely unambiguous as to their force-dynamic image schema, and can be disambiguated using suitable annotation guidelines. Our work with force-dynamic image schemas is much less advanced than the analysis of aspectual image schemas, which are also smaller in number. As we scale up with a fuller set of force-dynamic image schemas, the annotation schema is likely to evolve beyond the simple scheme in Table 3.
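The selection of verbs for annotation can be pictured as a lookup from VerbNet class number to the candidate image schemas named above. This is only a sketch: the dictionary layout and the names `CANDIDATE_SCHEMAS` and `needs_disambiguation` are assumptions of this example, and only the four classes listed in the text are included.

```python
# VerbNet class number -> (class name, candidate force-dynamic image schemas),
# as listed in the text; the dictionary layout itself is an assumption.
CANDIDATE_SCHEMAS = {
    9: ("Putting", ("Apply", "Cover")),
    10: ("Removing", ("Remove", "Uncover")),
    11: ("Sending and Carrying", ("Motion", "Transfer")),
    26: ("Creation and Transformation", ("Create", "Transform")),
}


def needs_disambiguation(verbnet_class):
    """True if the class is listed and allows more than one image schema."""
    entry = CANDIDATE_SCHEMAS.get(verbnet_class)
    return entry is not None and len(entry[1]) > 1


putting_options = CANDIDATE_SCHEMAS[9][1]
```

Classes absent from the table correspond to events that were filtered out or that fall outside the classes discussed here, so the helper returns `False` for them.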
The finer-grained force-dynamic annotation, combined with the finer-grained aspectual annotation described in section 2, can be translated into the causal-aspectual event structures described in this paper, and then used for semantic analysis and inference about events and their participants. Croft (2012) integrates the phasal aspectual decomposition of an event with the causal chain decomposition of the same event. An example of Croft's integrated graphic representation for Jane mowed the lawn is given in Figure 3. Croft decomposes the event into subevents each of which has its own participant. This decomposition allows the integration of Talmy's model of transmission of force between participants and the standard model of causation as events causing other events. What one participant does/undergoes causes another participant to do/undergo something in the event. The graphic representation aligns each participant's subevent temporally, and places the causally antecedent participant below the subsequent participant.
Each participant performs/undergoes its own subevent, represented by the blue t/q aspectual representations as in Figure 2, associated with Jane and the lawn. The red vertical arrows represent the force-dynamic interaction between participants (Jane's mowing causes the lawn to be mowed, a mereological change) that is encoded by the verb and argument structure construction. The labels 'mow' and 'mown' are a shorthand for the undirected activity of Jane and the mereological incremental accomplishment of the lawn. The italicized forms indicate the words associated with components of the graphic representation, and the capitalized terms represent the argument phrases of the argument structure construction. Again, Croft's representations need to be formalized in order to allow for causal reasoning over the force-dynamic analyses of events in a text. We are looking into dynamic graph structure (Harary and Gupta, 1997), which incorporates a temporal dimension through which force-dynamic interactions between individuals change, as well as the qualitative features of those individuals (namely the q dimension); or a vector model of force dynamics similar to the model proposed by Warglien et al. (2012).

Conclusion
At this point, the annotation scheme is only a proposal based on the decompositional semantic analysis of event structure presented in this paper. The annotation scheme will continue to be field-tested so that we can report standard measures of inter-annotator agreement for the subset of physical events that it is intended to capture. Provided that this is successful, the annotation scheme will then be extended to event types other than those accounted for here. Several other physical force-dynamic image schemas must be analyzed. Emission and ingestion involve creation/destruction of the Agonist by the Antagonist, but also motion of the Agonist relative to the Antagonist. Existence and other unary valency events do not involve force-dynamic interactions, but do involve a variety of participant theme types. Other force-dynamic image schemas involve interactions between nondistinct participants, such as a human and her or his body part, and reflexive and reciprocal interactions, including the commercial transaction frame. Finally, there are also "mental dynamics", in which the interaction involves mental states or processes, and "social dynamics", in which the interaction involves social relationships and statuses.
More broadly, the decompositional analysis of monoclausal events will allow us to integrate the temporal and causal relations within monoclausal events and temporal and causal relations between events expressed in separate clauses. The result of this analysis will be a "least common denominator" description of events and event relationships that is invariant as to how events are linguistically expressed (one clause or multiple clauses), and a means to translate from linguistic expressions to this basic description of events. Using this decomposition, we can model the events reported in a text as a dynamic evolving network of individual entities, each of which is changing over time and interacting with other entities in the network, causally or in other ways.