Meaning Representation of Null Instantiated Semantic Roles in FrameNet

Humans have the unique ability to infer information about participants in a scene, even if they are not mentioned in a text about that scene. Computer systems cannot do so without explicit information about those participants. This paper addresses the linguistic phenomenon of null-instantiated frame elements, i.e., implicit semantic roles, and their representation in FrameNet (FN). It motivates FN’s annotation practice, and illustrates three types of null-instantiated arguments that FrameNet tracks, noting that other lexical resources do not record such semantic-pragmatic information, despite its need in natural language understanding (NLU), and the elaborate efforts to create new datasets. It challenges the community to appeal to FN data to develop more sophisticated techniques for recognizing implicit semantic roles, and creating needed datasets. Although the annotation of null-instantiated roles was lexicographically motivated, FN provides useful information for text processing, and therefore must be considered in the design of any meaning representation for natural language understanding.


Introduction
Null instantiation as a linguistic phenomenon has received much attention in the literature on verbal argument structure. Fillmore (1986) identified idiosyncrasies of lexically licensed null arguments in near-synonymous verbs. Resnik (1996) explained the phenomenon in terms of selectional restrictions; Rappaport Hovav and Levin (1998) invoked Aktionsart. Others (Baker 2003, Ruppenhofer and Michaelis 2014) appeal to frames or constructions.
Aside from verbal argument structure, the discourse in which a sentence occurs also may license an omission. Ruppenhofer et al. (2010) initiated the task of linking events and their participants in discourse, with participating systems yielding different degrees of success. Roth and Lapata (2015) introduced techniques for semantic role labeling that use various discourse level features in an effort to identify implicit roles. With semantic role labeling (SRL) usually limited to sentence level analysis, the conundrum of identifying something absent from a text is clear, more so when the major resources do not identify or record information about implicit roles.
Efforts to create resources to use in work on developing techniques for recognizing implicit semantic roles have not yielded large datasets. For the SemEval task, Ruppenhofer et al. (2010) annotated 500 sentences from a novel. Studying nominal predicates, Chai (2010, 2012) created a dataset of 1000 examples from NomBank (Meyers et al. 2004). Roth and Frank (2015) aligned monolingual comparable texts to obtain implicit arguments, resulting in a dataset similar in size to previous datasets. Recently, Cheng and Erk (2018) used coreference information to generate additional training data; Cheng and Erk (2019) addressed the problem using an approach to generate data that scales.
Despite the need, no work addresses resources that record null instantiations (because most do not), or the representation of null-instantiated semantic roles. This paper begins to fill the gap by bringing attention to FrameNet's practice of recording information about null-instantiated semantic roles, i.e., representing the meaning of omitted arguments, a practice that no other major lexical resource observes. It also challenges the broad NLP/NLU community of resource builders, designers of linguistic annotation and meaning representation schemes, as well as developers of SRL systems to exploit and expand the data that FrameNet already provides.
The rest of the paper proceeds as follows: Section 2 provides background to FN, and describes the goals of the project's meaning representation; Section 3 covers null instantiation in FN, and provides example sentences including annotation, illustrating how FN implements its desiderata; Section 4 presents a challenge to the NLP community; and Section 5 summarizes the paper, addressing some limitations of FrameNet.
2 Background to FrameNet

General Information
FrameNet (Ruppenhofer et al. 2016) is a research and resource development project in corpus-based computational lexicography based on the principles of Frame Semantics (Fillmore 1985), whose goal is documenting the valences, i.e., the syntactic and semantic combinatorial possibilities of each item analyzed. These valence descriptions provide critical information on the mapping between form and meaning that NLP and NLU require. At the heart of the work is the semantic frame, a script-like knowledge structure that facilitates inferencing within and across events, situations, states-of-affairs, relations, and objects. FN defines a semantic frame in terms of its frame elements (FEs), or participants in the scene that the frame captures; a lexical unit (LU) is a pairing of a lemma and a frame, characterizing that LU in terms of the frame that it evokes.
To illustrate, FrameNet has defined Revenge as a situation in which an AVENGER performs a PUNISHMENT on an OFFENDER as a response to an INJURY, inflicted on an INJURED PARTY; these core frame elements uniquely define the frame. Among the LUs in Revenge are avenge.v, avenger.n, get even.v, retributory.a, revenge.v, revenge.n, vengeance.n, vengeful.a, and vindictive.a, where nouns, verbs, and adjectives are included. The linguistic realization of each frame element highlights different participants of the frame, as shown in sentence # 1, where the target of the analysis is the verb avenge.
1. (Peter AVENGER) avenged (the attack on the boys INJURY).
Sentence # 1 illustrates the instantiation of two of the frame's core frame elements: the proper noun Peter is the AVENGER and the NP the attack on the boys is the INJURY. No other core FE of the Revenge frame is instantiated in the sentence.

Meaning Representation in FrameNet
FrameNet's ultimate goal is the representation of the lexical semantics of every sentence in a text based on the relations between predicators and their dependents, which include clauses and phrases that also include predicators (Baker 2001, Baker et al. 2007: 100). FrameNet's meaning representation for these predicators was designed in accord with the principles of Frame Semantics (Fillmore 1985). For each LU that FN analyzes (annotates), the goal is to identify the linguistic material that instantiates the frame elements of the given frame, and then characterize the grammatical function and phrase type of that material. Note that annotated FEs are actually triples of information about the annotated constituent, not simply information about the constituent's semantic role. Importantly, meaning and form are inextricably tied together, where each contributes its part to the characterization of the whole. Table 1 shows the FE identified as INJURY (example # 1) as a triple of information.

Table 1: the attack on the boys
Frame Element (FE)      INJURY
Grammatical Function    Object
Phrase Type             NP

The goal of providing a valence description for each lexical unit that FN analyzes necessitates recording information about omitted arguments. FN characterizes the syntactic and semantic conditions under which an omission is possible. For sentence # 1, FrameNet's lexicographic purposes require recording information about OFFENDER and PUNISHMENT, two lexically licensed null-instantiations (Fillmore 2007).
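The triple-plus-omission record described above can be sketched as a small data structure. This is only an illustrative sketch, not FrameNet's own data format: the class and function names are hypothetical, the FE names follow FrameNet's Revenge frame (AVENGER, INJURY, OFFENDER, PUNISHMENT), and only the Table 1 triple is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FEAnnotation:
    """A frame element label plus, when overt, its grammatical function and phrase type."""
    fe: str
    text: Optional[str] = None                  # None marks an omitted (null-instantiated) FE
    grammatical_function: Optional[str] = None
    phrase_type: Optional[str] = None

    @property
    def is_null_instantiated(self) -> bool:
        return self.text is None


def omitted_fes(annotations: List[FEAnnotation]) -> List[str]:
    """Return the names of the FEs recorded as omitted, in annotation order."""
    return [a.fe for a in annotations if a.is_null_instantiated]


# Sentence 1, target "avenged" (hypothetical encoding of the example in the text).
sentence_1 = [
    FEAnnotation("AVENGER", "Peter"),  # GF and PT for this FE are not given in the text
    FEAnnotation("INJURY", "the attack on the boys", "Object", "NP"),  # Table 1
    FEAnnotation("OFFENDER"),          # omitted, recorded for the valence description
    FEAnnotation("PUNISHMENT"),        # omitted, recorded for the valence description
]

print(omitted_fes(sentence_1))
```

Keeping the omitted FEs in the same list as the overt ones mirrors the point that a valence description is incomplete unless it also states which core FEs were left unexpressed.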

Null-Instantiation (NI) in FrameNet
FN annotates information about the conceptually required semantic roles, i.e., the core FEs of a frame, even if absent from the text. FN records three types of null-instantiation, one licensed by a construction, and the others licensed lexically. FrameNet includes approximately 55,700 NI labels in its annotations; some 26% of the omissions are licensed constructionally, with the remaining 74% licensed lexically.⁵ This section very briefly addresses the first type, and then presents lexically licensed omissions.⁶

Constructional Null Instantiation
Constructional Null Instantiations are licensed by grammatical constructions. Examples of CNI are the omitted agent in a passive sentence (# 2), or the omitted subject in an imperative (# 3).
2. Sue was avenged by killing her assailant.
3. Get even with that bum.
In both sentences, the AVENGER is understood as a participant in the event, although not mentioned in the relevant clause (# 2) or sentence (# 3).

Definite Null Instantiation (DNI)
Definite Null Instantiation (DNI) identifies those missing core FEs that were mentioned previously in the text or can be understood, that is, inferred from the discourse. Consider examples # 4-5 as two contiguous lines of text, where information about a null-instantiated core FE appears in the context of the relevant piece of text, allowing the language user to infer the missing argument. Encountering # 5 signals the language user to refer back to information in # 4.
4. Wendy was astonished (at the killing of the pirate PUNISHMENT).
5. (Peter AVENGER) had avenged (the attack on the boys INJURY).
Ziem (2013, 2014) demonstrated that DNIs in spoken discourse tend to be specified in adjacent sentences, and thus also showed the relevance of frames to text coherence.
⁵ Clearly, providing the total number of sentences would be ideal; obtaining that number is not straightforward.
⁶ A full treatment of grammatical constructions is well beyond the scope of this paper. Explicit grammatical information, some of which a syntactically-parsed corpus might provide, would aid in the identification of CNIs. Still, the automatic recognition of constructions is in a relatively early stage of development (e.g., Dunietz 2018, Dunietz et al. 2017).

Indefinite Null Instantiation (INI)
Indefinite Null Instantiation (INI) is the other lexically licensed omission, and it is illustrated with the missing objects of verbs such as eat, bake, and sew. These verbs are usually transitive, but can be used intransitively (# 6-# 7).
6. Let's go out to eat.
7. Sam took his time baking.
With such verbs, language users understand the nature of the missing material without referring back to any previously mentioned entity in the discourse. In # 6 speakers will understand that the omitted object is consumable food. Cheng and Erk's (2019) recent study about implicit arguments draws on event knowledge to predict the semantic roles of omitted arguments. The work also relies upon the (psycho-linguistic) notions of entity salience and text coherence for building a computational model.
Recording null instantiation offers the ability to distinguish multiple senses of a lemma, as is apparent with different senses of the verb give, as 8b and 9b show. Only the donation sense of give allows omitting the object; give meaning "gift someone a present" does not. Thus, FN records example 8b as having a null-instantiated object only for the donation sense of give.

Complicating Factors
FN's concept of a CoreSet adds to the challenge of automatically recognizing null instantiations. Given a set of two or more core FEs in a frame, annotating just one of them satisfies FN's requirements. For example, SOURCE, PATH, and GOAL are core FEs in motion-related frames; however, not all of these FEs manifest in every sentence that describes a motion event.
Consider example 10, an instance of the Self motion frame, which defines a scene in which the SELF MOVER, a living being, moves under its own direction along a PATH.

10. (Chuck SELF MOVER) walked (to the BART station GOAL).
In 10, of the CoreSet FEs, only the GOAL is realized; FN annotates the PP to the BART station as the GOAL, along with Chuck as the SELF MOVER, and considers its job done (for that sentence). Given a CoreSet, annotating just one of its members is legitimate; however, it does not preclude annotating more than one of the FEs. Thus, FN would annotate the PATH and the GOAL FEs in 11.

11. (Chuck SELF MOVER) walked (along Center Street PATH) (to the BART station GOAL).
This state-of-affairs complicates matters for the recognition of null instantiations, as (so far) other than listing CoreSet FEs in the frame definition, FN does not directly record null-instantiated CoreSet FEs with its annotated data, although the information is available via the frame element-to-frame element relations within a frame. Additionally, lexical semantic and pragmatic phenomena contribute to the way that FrameNet distinguishes between INI and DNI, as Ruppenhofer et al. (2010) among others have noted. To illustrate, sentence 12 exemplifies the Similarity frame, in which ENTITY 1, ENTITY 2, and DIMENSION are core FEs. While FN records ENTITY 2 as DNI, it records DIMENSION as INI. Since the interpretation of the sentence relies on the accessibility of ENTITY 2 to the language user, that FE is a DNI.
12. The split went in a different direction....

(ENTITY 2 DNI) (DIMENSION INI)
In contrast, it suffices to know that ENTITY 1 and ENTITY 2 differ along some DIMENSION; a specific prior mention in the text or surrounding discourse is not necessary to interpret the sentence. As such, FN records DIMENSION as an INI. Furthermore, (assumed) prior mention in a text, i.e., beyond the boundary of the single sentence, might suggest the likelihood of a DNI interpretation. However, not all lexical items will license the same FE omission. For example, although both are defined in terms of the Arriving frame, arrive.v licenses the omission of the GOAL, while reach.v does not, as examples 13 and 14 show.
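The CoreSet behavior described in this section can be sketched in a few lines. This is a minimal sketch under stated assumptions, not FrameNet's procedure: the function names are hypothetical, and the FE inventory for the motion frame is simplified to the FEs named in the text (SELF MOVER plus the CoreSet SOURCE, PATH, GOAL).

```python
from typing import Iterable, List, Set


def unaccounted_core_fes(core_fes: Iterable[str], core_sets: Iterable[Set[str]],
                         overt: Set[str], recorded_ni: Set[str]) -> List[str]:
    """Core FEs that are neither overt, nor labelled NI, nor excused by a CoreSet."""
    accounted = set(overt) | set(recorded_ni)
    excused: Set[str] = set()
    for core_set in core_sets:
        # One realized (or NI-labelled) member satisfies the whole CoreSet.
        if core_set & accounted:
            excused |= core_set
    return [fe for fe in core_fes if fe not in accounted and fe not in excused]


def silent_core_set_members(core_fes: Iterable[str], core_sets: Iterable[Set[str]],
                            overt: Set[str], recorded_ni: Set[str]) -> List[str]:
    """CoreSet members left unexpressed and unlabelled because a sibling FE was annotated."""
    accounted = set(overt) | set(recorded_ni)
    silent: Set[str] = set()
    for core_set in core_sets:
        if core_set & accounted:
            silent |= core_set - accounted
    return [fe for fe in core_fes if fe in silent]


# Simplified, assumed FE inventory for the motion frame of examples 10 and 11.
CORE = ["SELF MOVER", "SOURCE", "PATH", "GOAL"]
CORE_SETS = [{"SOURCE", "PATH", "GOAL"}]

example_10 = {"SELF MOVER", "GOAL"}
example_11 = {"SELF MOVER", "PATH", "GOAL"}

print(unaccounted_core_fes(CORE, CORE_SETS, example_10, set()))
print(silent_core_set_members(CORE, CORE_SETS, example_10, set()))
print(silent_core_set_members(CORE, CORE_SETS, example_11, set()))
```

The second function isolates exactly the FEs that make NI recognition harder: they are core, they are absent from the sentence, and the annotation carries no NI label for them.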

Other Lexical Resources
The comparison with other lexical resources is warranted given the impetus to feature one of FN's many differentiating characteristics. No major lexical resource records information about lexically licensed implicit semantic roles.
PropBank (Palmer et al. 2005) has annotated a corpus of text with information about basic semantic propositions, also adding predicate-argument relations to the syntactic trees of the Penn Treebank. PropBank also created parallel PropBank resources for other languages and genres. It then moved on to annotate light verb constructions in multiple languages (Hwang et al. 2010). Note that PropBank's traces only record syntactically motivated omissions, not lexically licensed ones (Ellsworth et al. 2004). VerbNet (Kipper-Schuler 2005, Kipper et al. 2006) is a very large lexicon of verbs in English that extends Levin (1993) with explicitly stated syntactic and semantic information. It provides mapping to other resources, including to WordNet senses (Fellbaum 1998) and FrameNet frames. However, it does not include any information on null-instantiated arguments.
In short, the well-known and oft-used resources for text processing simply do not include the requisite information, and hence the ongoing need for researchers to construct new datasets.

A Challenge for the Community
Recent advances in the development of semantic role labeling (SRL) systems (e.g., Swayamdipta et al. 2018) offer the prospect of automating more of FrameNet's process than at present, specifically the annotation of frame elements (i.e., semantic roles). Such SRL systems are based on existing annotated FN data, and exploit a range of different machine learning techniques (Hermann et al. 2014, Kshirsagar et al. 2015, Täckström et al. 2015). Not surprisingly, none of these systems attempts to recognize null-instantiated frame elements, not least because of the difficulty of the task. Still, the data needed for doing so is available in the FN database, even if limited. Instead, these systems quietly ignore the presence of the null-instantiated information.
Efforts to identify implicit semantic roles, whether definite or indefinite null instantiations, tend to create limited data sets and focus on the different and new computational techniques that (may) improve the results (as briefly characterized in Section 1). Nevertheless, the need remains for more data on implicit semantic roles, both to facilitate the ability to recognize these null-instantiated elements and to advance the goals of SRL, as well as those of FrameNet in the long term.
As a consequence, the current work calls for the community to partner with FrameNet with the goal of designing a task that exploits the recorded NI information in the database. For example, the task might include developing a new data set that distinguishes null-instantiated CoreSet FEs from other core FEs, thereby eliminating one of the complicating factors in using the FN corpus. Also, comparing results (of NI-recognition) between the new corpus and the existing corpus (of FN's NI data) may yield useful information for future investigation. Of course, the technical details of such a task have yet to be specified. However, by garnering the collective experience of the broad NLP and NLU community, especially those who work with FN data already, FrameNet will be poised to investigate the potential benefit of using the data and to measure that benefit to determine its value.

Summary
This paper has focused on the linguistic phenomenon of null-instantiated core frame elements, i.e., implicit semantic roles, and their representation in FN. It introduced the basic concepts of FrameNet, illustrated the types of null instantiation for which FN provides information, acknowledging the lexicographically motivated annotation practice, and urged the community to leverage existing data in the FN database. Finally, it also advocated for the design of meaning representations that explicitly reference null instantiation. The ubiquity of the phenomenon in language demands doing so.
FrameNet's developers are not impervious to the complexities and FN-specific data format and annotation practice that resulted in an apparently inhospitable resource. Recall the concept of a CoreSet, which interacts with FN's annotation of NIs (as illustrated in Section 3.4). Also, while the NLTK FrameNet API allows access to NI information by annotation set in a given frame, it does not have a built-in function to query the database by valence pattern (Schneider and Wooters 2017). As a consequence, actually finding NIs is not as easy as would be desirable. Also, as others have indicated already, gaps in coverage play a role in the performance of systems that use FrameNet for different applications.
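To make the missing query concrete, the following sketch filters annotation sets by the NI labels they carry. It is an illustration only: it does not use the NLTK API, the record layout (a dictionary with "frame", "text", and "ni" keys) and the function name are hypothetical, and the three records simply re-encode examples 2, 3, and 12 from this paper.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical record layout: {"frame": str, "text": str, "ni": {FE name: NI type}}
AnnotationSet = Dict[str, Any]


def find_by_ni_pattern(annotation_sets: Iterable[AnnotationSet],
                       pattern: Dict[str, str]) -> Iterator[AnnotationSet]:
    """Yield the annotation sets whose NI labels include every (FE, NI type) pair in pattern."""
    for aset in annotation_sets:
        ni = aset["ni"]
        if all(ni.get(fe) == ni_type for fe, ni_type in pattern.items()):
            yield aset


# Examples 2, 3, and 12 from the text, encoded by hand for illustration.
records = [
    {"frame": "Revenge", "text": "Sue was avenged by killing her assailant.",
     "ni": {"AVENGER": "CNI"}},
    {"frame": "Revenge", "text": "Get even with that bum.",
     "ni": {"AVENGER": "CNI"}},
    {"frame": "Similarity", "text": "The split went in a different direction.",
     "ni": {"ENTITY 2": "DNI", "DIMENSION": "INI"}},
]

for hit in find_by_ni_pattern(records, {"AVENGER": "CNI"}):
    print(hit["text"])
```

A full valence-pattern query would also match on grammatical function and phrase type of the overt FEs; the sketch covers only the null-instantiated part of the pattern.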
The design of meaning representations for achieving natural language understanding must include the representation of null-instantiated roles. Exploiting an existing semantically rich resource to jump-start a critical aspect of the work is expedient; appealing to FrameNet is essential.