Event Structure Representation: Between Verbs and Argument Structure Constructions

This paper proposes a novel representation of event structure by separating verbal semantics and the meaning of argument structure constructions that verbs occur in. Our model demonstrates how the two meaning representations interact. Our model thus effectively deals with various verb construals in different argument structure constructions, unlike purely verb-based approaches. However, unlike many constructionally-based approaches, we also provide a richer representation of the event structure evoked by the verb meaning.


Introduction
Verbal semantics is an area of great interest in theoretical and computational linguistics (e.g. Fillmore, 1968; Fillmore et al., 2003; Talmy, 1988; Dowty, 1991; Croft, 1991, 2012; Valin and LaPolla, 1997; Levin, 1993; Kipper et al., 2007; Ruppenhofer et al., 2016). It has been widely recognized that verb meaning plays an important role in the syntactic realization of arguments and their interpretation (Levin, 1993). VerbNet (Kipper et al., 2007) and FrameNet (Fillmore et al., 2003; Ruppenhofer et al., 2016) are large online resources on verb meanings that have been developed in recent years. VerbNet, an extensive verb classification system inspired by Levin (1993), defines verb classes based on verbal semantics and the syntactic expression of arguments. FrameNet uses the theory of Frame Semantics (Fillmore, 1982, 1985) to classify lexical units into frames based on their meaning and their semantic and syntactic combinatorial properties with other event participants. Providing an effective model to represent event structure is essential to many natural language processing (NLP) tasks. Recent meaning representation frameworks employed in NLP (Banarescu et al., 2013; Hajič et al., 2012; Abend and Rappoport, 2013) are largely concerned with identifying event participants and their roles within the event. Most meaning representations use a lexically-based approach that assumes that the lexical semantics of a verb determines the complements that occur with it in a clause.
However, lexically-based models for event structure do not provide a complete representation since verbs can occur in various argument structure constructions (Goldberg, 1995, 2006; Iwata, 2005). Depending on the semantics of the argument structure construction, a verb can be construed in many different ways. For example, a verb such as kick can occur in various semantically different constructions, as shown below (Goldberg, 1995, 11).
(2) Pat kicked the football into the stadium.
(3) Pat kicked Bob the football.
(4) Pat kicked Bob black and blue.
Kick can be construed as a verb of contact by impact when it occurs in the force construction in (1) (Levin, 1993, 148). It can be construed as a verb of throwing in the caused motion construction in (2) (Levin, 1993, 146). Kick can also be construed as a transfer verb in the transfer of possession construction in (3) or a change of state verb in the resultative construction in (4). Goldberg (1995) argues that argument structure constructions carry meanings that exist independently of verbs. She develops a constructional approach in which argument structure meaning and verb meaning combine to specify the event structure. We introduce a model in which event structure is derived from argument structure meaning and verb meaning. The argument structure meaning is based on the semantic annotation scheme developed in Croft et al. (2016, 2018), which specifies the causal interactions between participants in the event. The verb meaning is a causal network which in many cases is more elaborate than the causal chain specified by the argument structure construction, but uses the same inventory of causal relations as the argument structure meanings. The argument structure meaning is annotated on individual clauses, and the verb meaning is retrieved from a resource based on VerbNet and FrameNet.
Our event structure representation offers a richer model when compared to exclusively lexically-based or constructionally-based resources on verb meaning. We describe below how our representation captures both the constructional meaning and the verb meaning, and how we map the former onto the latter. Having a two-facet representation helps us to effectively deal with verb construals as well as more complex event structures evoked by different event types.

Constructional meaning representation
The representation of constructional meaning uses a small set of causal chains that schematically represent the event structure evoked by argument structure constructions. Causal chains consist of event participants, a limited set of force dynamic relations between participants, and information about the participants' subevents. Crosslinguistic evidence indicates that argument realization is best explained by transmission of force relations (Talmy, 1988; Croft, 1991, 2012). Force dynamic relations are defined based on existing literature on force dynamic interactions (Talmy, 1988) and event semantics (Dowty, 1991; Tenny, 1994; Hay et al., 1999; Valin and LaPolla, 1997; Verhoeven, 2007; Croft, 2012). Force dynamic relations may be causal (Talmy, 1988) or non-causal (Croft, 1991), such as a spatial relation between a figure and ground in a physical domain. Causal chains represent force dynamic image schemas that correspond to established configurations of causal and non-causal relations between participants and their subevents. The subevents for each participant are specified for qualitative features that describe the states or processes that the participant undergoes over the course of the event (Croft et al., 2017).
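The components of a causal chain listed above (participants, force dynamic relations, and subevent information) can be sketched as a small data structure. The following Python sketch is illustrative only: the class names, the `causal` flag, and the use of `None` for unspecified subevent labels are assumptions made for this example and are not part of the annotation scheme itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    role: str                       # e.g. "Agent", "Theme", "Ground"
    subevent: Optional[str] = None  # qualitative subevent label, if known


@dataclass(frozen=True)
class Relation:
    source: Participant
    target: Participant
    kind: str     # force dynamic relation, e.g. "Force" or "Path"
    causal: bool  # False for non-causal relations such as figure-ground


@dataclass
class CausalChain:
    relations: List[Relation]

    def participants(self) -> List[Participant]:
        """Participants in chain order, starting from the initiator."""
        if not self.relations:
            return []
        return [self.relations[0].source] + [r.target for r in self.relations]

    def is_linear(self) -> bool:
        """Each relation starts at the participant the previous one ended on."""
        return all(a.target == b.source
                   for a, b in zip(self.relations, self.relations[1:]))


# Hypothetical caused-motion chain: an Agent exerts force on a Theme, which
# is in a spatial Path relation with a Ground (treated here as non-causal).
agent = Participant("Agent")
theme = Participant("Theme")
ground = Participant("Ground")
chain = CausalChain([
    Relation(agent, theme, "Force", causal=True),
    Relation(theme, ground, "Path", causal=False),
])
```

Keeping the relation inventory as plain strings reflects the idea that a small, closed set of relation types is shared across constructions; a fuller implementation might use an enumeration instead.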

Why constructional causal chains aren't enough
A causal chain model of constructional meaning is not a comprehensive representation of verb meaning. A richer representation of verbal event structure is needed for various event types. One complex event type that demands a more detailed event structure representation is ingestion, as illustrated by Jill ate the chicken with chopsticks. In the causal chain analysis of the argument structure construction depicted in Figure 1, the chopsticks are analyzed as an Instrument. However, the semantic role of the chopsticks in an eating event is quite different from, and more complex than, that of a more prototypical instrument participant, such as a hammer in a breaking event (e.g. Tony broke the window with a hammer). The hammer directly causes the breaking of the window, whereas the chopsticks do not eat the food: they are used to move the food to the Agent's mouth. Consequently, one can use an argument structure construction without an Agent with break (The hammer broke the window) but not with eat (*The chopsticks ate the chicken). The causal chain in Figure 1 does not capture this fine-grained semantic distinction between these two types of instrument roles. Table 1 contains a list of event types in the physical and mental domains that require a more fine-grained event structure representation. A short description of the event structure is provided for each event type to illustrate how the causal relations between participants in these event types are too complex to be accurately represented by causal chains associated with the semantics of argument structure constructions.
In this paper, we present a verb meaning representation that aims to provide a richer model for event structure such that subtle semantic differences between participant roles can be made explicit. We accomplish this by introducing a separate richer representation for the verbal event Jill ate the chicken with chopsticks.

Table 1: Event types in the physical and mental domains that require a more fine-grained event structure representation.

Event type | Event structure | Example
Vehicular motion (e.g. drive, ride) | A Rider enters a Vehicle (or a Driver uses a vehicle) which then transports the Rider/Driver to a Destination. | Brenda went to Berlin by train.
Perception (e.g. look, listen) | A Perceiver uses an Implement which then allows the Perceiver to view a Target. | They looked at the cranes with binoculars.
Cooking (e.g. bake, cook) | A Cook puts Food in a Cooking container which then cooks the Food by emitting heat. | I baked the potatoes in the oven.
Searching/Finding (e.g. find, look for) | A Searcher searches in a Location and mentally attends to a Searched item by searching for it. The Searched item is in a spatial relation with the Location. | I searched the cave for treasure.
Creation (e.g. paint, make) | A Creator has an idea (i.e. mental experience) of a Design which then the Creator creates by producing a Creation using an Instrument. | Claire drew a picture.
Emission (e.g. flash, gush) | An Emitter creates an Emission with respect to a Ground. The Emission is also in a Path relation with the Emitter. | The well gushed oil.
(e.g. hurt, break) | An Agent's action results in an effect (e.g. harm) of the Agent, their Body part, or some other animate entity. | Tessa sprained her ankle.

Verbal meaning representation
Our representation of the verbal event structure uses a network model which consists of causal relations between participants and participants' subevents, not unlike causal chains. However, verbal networks contain richer information about the participants' causal relations that are not evoked by the argument structure construction and are therefore not represented in causal chains. Each causal network is associated with an event type evoked by the verb meaning. For example, an Ingestion network represents the event structure associated with verbs of eating. As shown in Figure 2, the Ingestion network is cyclic and nonbranching: the Eater uses the Utensil ("Manipulate" relation) to reach the Food ("Force" relation). The Food moves to the Eater's mouth ("Path" relation) and is subsequently consumed by the Eater ("Force" relation).
Unlike the causal chain representation, the verbal network representation allows for a direct causal relation between the Eater and Food. This accommodates the semantics of ingestion events in which the Eater, rather than the Utensil, consumes the Food. Two participants in the network are involved in more than one causal relation. The Eater and Food have three distinct roles in the event structure. The Eater is the Agent who initiates the event; it is the ground that is in a Path relation with the Food, and it is also the consumer of the Food. The Food is an endpoint of the Force relation; it is a motion theme that is in a Path relation with the Eater, and it is also a Patient in a Change of State event as it gets consumed.
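The Ingestion network described above can be written down as an ordered list of labeled relations. The sketch below is a minimal illustration, not the authors' implementation: the tuple encoding and the function names are assumptions, and "nonbranching" is interpreted here as "each relation starts where the previous one ended".

```python
# Ordered relations of the Ingestion network as described in the text:
# (initiator, relation, endpoint)
INGESTION = [
    ("Eater", "Manipulate", "Utensil"),
    ("Utensil", "Force", "Food"),
    ("Food", "Path", "Eater"),
    ("Eater", "Force", "Food"),
]


def is_cyclic(edges):
    """True if following the relations revisits some participant."""
    seen = {edges[0][0]}
    for _, _, endpoint in edges:
        if endpoint in seen:
            return True
        seen.add(endpoint)
    return False


def is_nonbranching(edges):
    """True if every relation starts at the endpoint of the previous one."""
    return all(a[2] == b[0] for a, b in zip(edges, edges[1:]))


def direct_relations(edges, initiator, endpoint):
    """Relation types that directly link initiator to endpoint."""
    return [rel for src, rel, tgt in edges if src == initiator and tgt == endpoint]
```

With this encoding, `direct_relations(INGESTION, "Eater", "Food")` returns the direct Force relation between Eater and Food that the causal chain analysis lacks, while `direct_relations(INGESTION, "Utensil", "Eater")` is empty.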
Since causal networks may be cyclic, the direction and ordering of causal relations within the network is more clearly represented if participants and the relations between them are depicted in a linear fashion, similarly to causal chains. "Unthreading" a linear path in the network represents the sequence of subevents better than a network representation. As shown in Figure 3, the Eater and Food occur twice in the unthreaded version of the causal network. Since the unthreaded version lays out the participants' relations in a linear chain, this representation also includes information about the change that each participant undergoes in its subevent(s). The network representation in Figure 2 does not include these labels due to a lack of space. We use the unthreaded version of verbal networks in the remainder of this paper to illustrate the mapping of the semantics of argument structure constructions onto the verbal event structure.
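Unthreading can be illustrated with a short sketch: walking the ordered relations of a network and writing down each participant as it is reached yields a linear sequence in which revisited participants simply appear again. The encoding and the function name `unthread` are assumptions made for this example.

```python
from collections import Counter

# Ordered relations of the Ingestion network: (initiator, relation, endpoint)
INGESTION = [
    ("Eater", "Manipulate", "Utensil"),
    ("Utensil", "Force", "Food"),
    ("Food", "Path", "Eater"),
    ("Eater", "Force", "Food"),
]


def unthread(edges):
    """Lay out a nonbranching network as a linear sequence of participants.

    A participant that is revisited in a cyclic network is repeated in the
    output instead of being merged with its earlier occurrence.
    """
    sequence = [edges[0][0]]
    for initiator, _, endpoint in edges:
        if initiator != sequence[-1]:
            raise ValueError("network is not a single linear path")
        sequence.append(endpoint)
    return sequence


unthreaded = unthread(INGESTION)
occurrences = Counter(unthreaded)
```

Here `unthreaded` is the sequence Eater, Utensil, Food, Eater, Food, so the Eater and the Food each occur twice and the Utensil once.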

Mapping causal chains into verbal networks
Argument structure constructions may evoke only part of the verbal event structure. That is, causal chains may evoke a subset of participants and the relations between them in the verbal network. Mapping a causal chain into a network allows us to provide a comprehensive event structure representation that accounts for the meaning of the argument structure construction as well as the meaning evoked by the verb. In many cases, there is a considerable overlap in the two types of representations, i.e. a one-to-one mapping exists between participants and their relations in the causal chain and in the verbal network. This is usually the case with simple event types, e.g. Motion or Force verbs (see Figure 6 in section 3.2 and Figure 11 in section 4). However, the mapping becomes more complicated when a causal chain is mapped into a complex network that contains additional participant relations not present in the causal chain.
Figure 4 demonstrates the mapping between a causal chain associated with the example Jill ate the chicken with chopsticks and the Ingestion network. The network representation contains additional participant relations that are not evoked by the causal chain. The correct mapping of participants from the causal chain to the network is achieved by linking participants by their subevents and relations. In addition, the sequence of subevents in the causal chain and in the network must follow the same order. As a result of this constraint, the dotted lines that link participants in causal chains and networks should not cross each other. The causal chain participants and their relations are mapped into the network as follows: Jill, the Agent in the causal chain, is linked to the Eater. Although there are two instances of Eater in the network event structure, the Agent is only linked to the Eater which is the initiator of the causal chain. This is because the Eater must be in a direct Manipulate relation with the Utensil. In addition, both the Agent and the Eater are labeled Volitional (VOL). Chopsticks are labeled Internal (INTL) in the causal chain and are therefore linked to the Internal participant in the causal network, which is the Utensil. The Patient, a change of state (COS) theme, is linked to the Food participant at the end of the verbal network which is also labeled COS.
The Food and Eater participants that are in a Path relation with each other constitute a part of the verbal event structure and are therefore represented in the causal network; however, they are not evoked by the argument structure construction. As a result, there is no direct linking of these participants to the causal chain.
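The linking procedure for this example can be approximated with a greedy, order-preserving alignment on subevent labels. This is a simplified sketch under stated assumptions, not the authors' method: it matches on a single label per participant and ignores relation types, the function name is hypothetical, and the labels given to the two middle network positions ("MOT" and "GROUND") are placeholders because the text does not state them for this network.

```python
def link_chain_to_network(chain, network):
    """Link each causal chain participant to a network position.

    Both inputs are lists of (participant, subevent_label) pairs in event
    order. Scanning left to right guarantees that links never cross.
    Returns a list of (chain_index, network_index) pairs.
    """
    links = []
    position = 0
    for chain_index, (_, label) in enumerate(chain):
        # Advance to the next network participant with the same label.
        while position < len(network) and network[position][1] != label:
            position += 1
        if position == len(network):
            raise ValueError(f"no network participant labeled {label!r}")
        links.append((chain_index, position))
        position += 1
    return links


# Causal chain for "Jill ate the chicken with chopsticks".
chain = [("Jill", "VOL"), ("chopsticks", "INTL"), ("chicken", "COS")]

# Unthreaded Ingestion network; the two middle labels are placeholders.
network = [
    ("Eater", "VOL"),
    ("Utensil", "INTL"),
    ("Food", "MOT"),
    ("Eater", "GROUND"),
    ("Food", "COS"),
]

links = link_chain_to_network(chain, network)
```

The result links Jill to the first Eater, the chopsticks to the Utensil, and the chicken to the final Food, leaving the Food and Eater in the Path relation unlinked, as in the description above.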

Structure of verbal causal networks
Examining the more complex verbal networks in Table 1 has led us to conclude that networks can be analyzed as a concatenation of less complex event types. Networks can be thought of as being made up of subchains. Each subchain denotes a force dynamic image schema that is used to describe the semantics of argument structure constructions. The internal structure of verbal networks is thus composed of subchains that can be used independently as simple networks or concatenated to each other to form complex networks.
Subchains are not random subparts of a verbal causal network. A subchain is a subpart of a complex network that can be expressed by itself with a main verb. For example, the Motion subchain can be expressed by a motion verb such as move as in He moved the ball. The Manipulate network can be expressed by a manipulate verb such as use as in He used the shovel. The Force network can be expressed with a verb of force such as hit as in He hit the ball, and the Change of State network can be expressed with a verb of change of state such as break as in The vase broke.
The concatenation analysis of causal networks can be illustrated on the unthreaded version of the Ingestion network as shown in the bottom part of Figure 5. The event structure for ingestion verbs can be analyzed as being composed of five subchains: (1) a Manipulate image schema between the Eater and the Utensil, (2) a Force image schema between the Utensil and the Food, (3) a Motion image schema between the Food and the Eater, (4) a Force image schema between the Eater and the Food, and (5) a Change of State image schema that contains only one participant, the Food. The Manipulate image schema describes a causal chain in which an Agent uses an Instrument to interact with another physical entity. The physical interaction between an Instrument and Food describes a Force image schema which, in more general terms, denotes an event in which a physical entity interacts with another physical entity (a theme) by exerting physical force and thus causing the theme to undergo some physical change, e.g. a translational motion or a change of state. Alternatively, the physical entity that initiates the Force relation comes into contact with the theme without any physical change taking place. The Motion image schema describes a causal chain in which a motion theme moves along a path with respect to some ground. The Change of State image schema describes a single-participant causal chain in which a theme undergoes a change of state. The change of state event may be initiated by an external entity, such as an Agent in this ingestion example.
Subchains denoting image schemas may be concatenated in various ways to form complex networks; however, they must be connected by one shared participant. Each participant that occurs in two subchains, i.e. as the endpoint of the first subchain and also the initiator of the next subchain in the verbal causal network, has two separate labels that describe the participant's subevent.
To illustrate this point further, consider a Motion event. Motion may be concatenated with an external cause (e.g. Force), as in the example Steve tossed the ball to the garden (VerbNet). The Agent Steve exerts force on the Moved Entity ball, which consequently undergoes motion. The Moved Entity is in a path relation with the Ground garden. The Moved Entity is both an endpoint of the Force image schema (labeled EXIST) and a motion theme in the Motion image schema (labeled MOT), as shown in Figure 6.
Each network consists of a core subchain which corresponds to a particular event type. For example, in networks with motion verbs, the core subchain consists of two participants: a motion theme or figure which is in a path relation with a ground (Talmy, 1974). To distinguish the core subchain from a concatenated subchain, participants and their relations in the core subchain are highlighted in bold, as shown in Figure 6.
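The shared-participant constraint on concatenation, and the two labels carried by the shared participant, can be sketched as follows. The `Subchain` class and the function names are assumptions made for this illustration; labels that the text does not specify are left as `None`, and single-participant subchains such as Change of State are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Subchain:
    schema: str                            # image schema, e.g. "Force"
    initiator: str
    endpoint: str
    initiator_label: Optional[str] = None  # subevent label, if stated
    endpoint_label: Optional[str] = None
    core: bool = False                     # True for the verb's core subchain


def concatenate(first: Subchain, second: Subchain) -> List[Subchain]:
    """Join two subchains, which must be connected by a shared participant."""
    if first.endpoint != second.initiator:
        raise ValueError("subchains must be connected by a shared participant")
    return [first, second]


def shared_labels(first: Subchain, second: Subchain) -> Tuple[Optional[str], Optional[str]]:
    """The two labels that the shared participant carries."""
    return (first.endpoint_label, second.initiator_label)


# "Steve tossed the ball to the garden": Force concatenated with core Motion.
force = Subchain("Force", "Agent", "Moved Entity", endpoint_label="EXIST")
motion = Subchain("Motion", "Moved Entity", "Ground",
                  initiator_label="MOT", core=True)
toss_network = concatenate(force, motion)
```

For this example, `shared_labels(force, motion)` gives EXIST and MOT for the Moved Entity, and attempting `concatenate(motion, force)` raises an error because the Ground is not the initiator of the Force subchain.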

Network participants and overlap
Verbal event structure determines the participants and their roles in causal networks. In our network representation, we include all participants that are obligatorily evoked by the verb. To ensure that our networks for event types are comprehensive, we consult VerbNet and FrameNet databases for their semantic identification of event participants (i.e. Roles in VerbNet and Core Frame Elements in FrameNet). Our labels for network participants are chosen based on the participant's role in a given verbal event structure (not unlike Frame Elements in FrameNet); the labels are not meant to be interpreted as semantic role labels.
Including only the participants that are obligatorily evoked by verbal semantics results in causal networks that are closely related but not identical. Consequently, some event types have multiple networks that partially overlap. For example, the event structures for vehicular motion (VM) verbs, such as drive and ride, overlap since they share event participants, i.e. a Rider, Vehicle, and Destination (see Figure 7). However, their event structure representations are not identical. Ride and drive evoke different initiators of the causal network, as shown in Figure 7 (cf. FrameNet's Ride_vehicle, Operate_vehicle, and Cause_motion frames). The core subchain in both VM networks is a Motion image schema which describes the relation between a Rider and Destination; however, unlike other Motion networks, the VM network is more complex since VM verbs obligatorily evoke a Vehicle as an additional participant in the event structure.
As depicted in Figure 7, the relation between the initiators (i.e. Rider and Driver) and the Vehicle in these two types of VM networks is different. In the Drive network, a Driver drives a Vehicle (Manipulate image schema) to transport a Rider (Force image schema) to a Destination (Motion image schema). Figure 8 shows a mapping of the causal chain associated with the example He drove him to the hospital to the Drive verbal causal network (see footnote 8). The Vehicle in the network is not linked to any participant in the causal chain since it is not expressed by the argument structure construction. However, it is represented in the causal network because it is evoked by the semantics of drive. Ride evokes a similar network representation that partially overlaps with the Drive network. However, in the Ride network, a Rider boards a Vehicle (Motion image schema) which transports the Rider (Force image schema) to a Destination (Motion image schema). Unlike the Drive network, the Ride network is cyclic, i.e. the Rider is involved in more than one relation. This is illustrated on the mapping of the causal chain associated with the example Brenda went to Berlin by train to the Ride network in Figure 9. The Path relation between the Rider and the Vehicle is usually not syntactically expressed in argument structure constructions with VM verbs in English; however, it is evoked by the verbal semantics of ride verbs. The Instrument is linked to the Vehicle and the Theme to the Rider in the network.
Footnote 8: Drive can also occur in an argument structure construction in which the Agent and the Theme are conflated (e.g. He drove to Santa Fe). In this example, the Agent is linked to both the Driver and Rider in the verbal network. A distinct verb for conflated Driver and Rider is used in Dutch (Jens Van Gysel, pers. comm.) and Korean (Sook-kyung Lee, pers. comm.).
Overlapping of verbal causal networks is common in our event structure representation. Another case of network overlapping can be found with the ingestion verbs eat and feed, as shown in Figure 10. Feed in (b) obligatorily evokes an external initiator, i.e. a Feeder, which is different from an Eater. The Ingestion network for eat in (a) does not include a Feeder since eat does not obligatorily evoke this participant. The two networks share most of the event participants; however, we provide a separate representation for each event structure since the networks do not overlap fully.
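The partial overlap between the Drive and Ride networks described above can be made concrete by comparing their participants and relations. The sketch below assumes a simple tuple encoding and uses "Path" as the relation name for the Motion image schema; both choices, and the function names, are illustrative assumptions.

```python
# (initiator, relation, endpoint), following the prose description.
DRIVE = [
    ("Driver", "Manipulate", "Vehicle"),
    ("Vehicle", "Force", "Rider"),
    ("Rider", "Path", "Destination"),
]
RIDE = [
    ("Rider", "Path", "Vehicle"),
    ("Vehicle", "Force", "Rider"),
    ("Rider", "Path", "Destination"),
]


def participants(edges):
    """All participants that occur in a network."""
    return {p for src, _, tgt in edges for p in (src, tgt)}


def initiator(edges):
    """The participant that starts the network."""
    return edges[0][0]


def is_cyclic(edges):
    """True if following the relations revisits some participant."""
    seen = {edges[0][0]}
    for _, _, tgt in edges:
        if tgt in seen:
            return True
        seen.add(tgt)
    return False


shared_participants = participants(DRIVE) & participants(RIDE)
shared_relations = set(DRIVE) & set(RIDE)
```

The two networks share the Rider, Vehicle, and Destination and the last two relations, but they differ in their initiator (Driver versus Rider) and only the Ride network is cyclic.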

Representing construals with causal networks
Using the analysis of image schema concatenation to form complex networks allows us to provide a more comprehensive representation of event structure for examples in which a verb meaning has different construals. As noted in the introductory section of this paper, a verb can have more than one construal depending on the argument structure construction in which it occurs. To demonstrate how our network representation deals with this issue, we will return to the construals of kick discussed in the Introduction.
Our causal chain analysis distinguishes the various meanings of kick by having a causal chain representation for the constructional semantics. However, an additional layer of information must be included to indicate which part of the event structure is evoked by the verb meaning and which part comes from the meaning of the argument structure construction. In particular, a causal chain analysis of constructional meaning does not convey that kick is a Force verb, rather than a Motion verb, when it occurs in a Motion construction or in other construals. Our model pairing constructional meaning (i.e. causal chains) with verb meaning (i.e. verbal networks) provides an event structure representation that accounts for verb construals in various constructions.

A Motion construal of kick
Kick can occur in a caused motion construction, as in Pat kicked the football into the stadium. As shown in Figure 11, the core event type in the network representation for this example is identified as Force. The Force image schema describes a causal relation between an Agent and a Force Theme evoked by the verb kick. Since the argument structure construction describes a Motion event, a Motion schema is concatenated onto the Force image schema. That is, the argument structure construction evokes a more complex event structure in which the Force Theme is also in a Path relation with a Ground. The Force Theme football is both an endpoint of the Force relation as well as a motion theme in the Motion image schema.
The two representations for the motion argument structure constructions with toss in Figure 6 and kick in Figure 11 demonstrate that adding verb meaning to the analysis of event structure allows us to differentiate the semantics of these two examples. In the network representation of toss, the core subchain is identified as a Motion image schema since toss is a motion verb. As a result, the motion theme is labeled Moved Entity. The event structure evoked by the construction Steve tossed the ball to the garden adds a Force image schema to the Motion subchain.
The network representation of the motion example with kick in Figure 11 is different. Force is identified as the core subchain since kick is a Force verb. The motion theme is labeled Force Theme. The event structure evoked by the construction Pat kicked the football into the stadium adds a Motion image schema to the Force subchain. The distinct labels for participants in each network are motivated by the core subchain which is evoked by the verb meaning.

COS and Transfer construals of kick
Our representation also allows us to differentiate the event structure evoked by the COS argument structure construction Pat kicked Bob black and blue from the verbal semantics of kick. The core event type profiles a causal relation between an Agent and a Force Theme. As shown in Figure 12, the Force Theme is identified as both the endpoint of the Force image schema and a COS theme in the COS image schema evoked by the constructional semantics.
Figure 13 shows our event structure representation for kick in a Transfer construction, as in Pat kicked Bob the football. As in the network representations in Figures 11 and 12, the core event type in the network is Force. The Transfer argument structure construction adds a Recipient Bob who is in a Control relation with the Force Theme football.
As these examples demonstrate, verbal causal networks provide more detailed information about the event structure than causal chains. Using the notion of image schema concatenation allows us to deal with various verb construals in different argument structure constructions. Our event structure representation captures both verb meaning and constructional meaning, and distinguishes one from the other.
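The three construals of kick discussed in this section can be summarized in a short sketch in which the verb contributes a fixed core subchain and the construction contributes an additional schema attached to the Force Theme. The names `VERB_CORE`, `ADDED_BY_CONSTRUCTION`, and `construe` are hypothetical, and the order of participants in the Control tuple is not meant to encode the direction of the relation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Subchain:
    schema: str
    participants: Tuple[str, ...]
    core: bool = False  # True if evoked by the verb rather than the construction


# Core subchain evoked by the verb meaning.
VERB_CORE: Dict[str, Subchain] = {
    "kick": Subchain("Force", ("Agent", "Force Theme"), core=True),
}

# Schema that each construction adds, given the theme of the core subchain.
ADDED_BY_CONSTRUCTION: Dict[str, Callable[[str], Subchain]] = {
    "caused_motion": lambda theme: Subchain("Motion", (theme, "Ground")),
    "resultative": lambda theme: Subchain("Change of State", (theme,)),
    "transfer": lambda theme: Subchain("Control", ("Recipient", theme)),
}


def construe(verb: str, construction: str) -> List[Subchain]:
    """Combine the verb's core subchain with the construction's schema."""
    core = VERB_CORE[verb]
    theme = core.participants[-1]
    return [core, ADDED_BY_CONSTRUCTION[construction](theme)]


motion_construal = construe("kick", "caused_motion")  # kicked ... into the stadium
cos_construal = construe("kick", "resultative")       # kicked Bob black and blue
transfer_construal = construe("kick", "transfer")     # kicked Bob the football
```

In all three results the core subchain is Force and the theme keeps the label Force Theme; only the schema contributed by the construction changes.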

Conclusion
In this paper, we present a model of verb meaning representation that accounts for the semantics of argument structure constructions as well as verbal event structures associated with event types. Our proposed causal networks for verb meanings represent richer event structures associated with complex event types. Our network representations can also deal with verb construals in various argument structure constructions.
The verbal causal networks are more general than VerbNet classes and subclasses, which are based on Levin's (1993) argument structure constructions. As a result, they subsume more than one VerbNet class. The networks are also more general than frames in FrameNet. In some cases, our networks link to higher order non-lexical frames in FrameNet. However, this is not always the case. In many cases, our networks link to multiple less schematic lexical frames.
Verbal networks will be stored with verbs in VerbNet in the relevant classes. For example, the Ingestion network will be linked to the following VerbNet classes: chew-39.2, dine-39.5, eat-39.1, gobble-39.3.-1, and gorge-39.6. Given the direct correspondence between verbal networks and VerbNet classes, our verbal analysis provides the same verb coverage of corpus data as VerbNet (cf. Palmer et al. (2005) for VerbNet's coverage of the Penn Treebank II). An automated analysis and linking of networks to verbal entries in corpora will use existing computational methods for verb sense disambiguation (Loper et al., 2007; Chen and Palmer, 2009; Brown et al., 2011; Peterson et al., 2016) to accomplish a correct match of verb senses to verbal networks.
A near-term objective of our work is to design a computational model that automates the mapping between the participants in the different networks. Given a causal chain, a verbal event network, and a set of possible links, the task is to determine the path through the network that describes an event. Developing such a computational model will be complicated by the multiple possible interactions of verb meaning and accompanying argument structure construction, the many possible concatenations of image schemas, the need to respect the dimensionality of the links in the causal representations, and the need to account for coercion and construal. A starting point is to recognize that argument structure constructions are defined by a small set of force dynamic relations, and these relations also define verbal networks. The next step toward a computational model will be to extract constructional meaning from raw text, to be reported on in future work.
Currently, our event structure representation covers physical and mental domains. However, there are many complex event types in the social domain that need to be analyzed. Among others, verbs of transfer of possession and communication, which make up a large portion of the verbal lexicon in the social domain, all involve complex cyclic networks which will benefit from a semantic representation that is separate from the argument structure construction meaning.