FrameNet+: Fast Paraphrastic Tripling of FrameNet

We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet contains an additional 22K lexical units, a 3-fold increase over the current FrameNet, and achieves 40% better coverage when evaluated in a practical setting on New York Times data.


Introduction
Frame semantics describes a word in relation to real-world events, entities, and activities. Frame semantic analysis can improve natural language understanding (Fillmore and Baker, 2001), and has been applied to tasks like question answering (Shen and Lapata, 2007) and recognizing textual entailment (Burchardt and Frank, 2006; Aharon et al., 2010). FrameNet (Fillmore, 1982; Baker et al., 1998) is a widely used lexical-semantic resource embodying frame semantics. It contains close to 1,000 manually defined frames, i.e., representations of concepts and their semantic properties, covering a wide array of concepts from Expensiveness to Obviousness.
Frames in FrameNet are characterized by a set of semantic roles and a set of lexical units (LUs), which are word/POS pairs that "evoke" the frame. For example, the following sentence contains a mention (i.e., target) of the Obviousness frame: "In late July, it was barely visible to the unaided eye." This particular target instantiates several semantic roles of the Obviousness frame, including a Phenomenon (it) and a Perceiver (the unaided eye). Here, the LU visible.a evokes the frame. In total, the Obviousness frame has 13 LUs including clarity.n, obvious.a, and show.v.
Table 1 (LUs for the Obviousness frame): accurate, ambiguous, apparent, apparently, audible, axiomatic, blatant, blatantly, blurred, blurry, certainly, clarify, clarity, clear, clearly, confused, confusing, conspicuous, crystal-clear, dark, definite, definitely, demonstrably, discernible, distinct, evident, evidently, explicit, explicitly, flagrant, fuzzy, glaring, imprecise, inaccurate, lucid, manifest, manifestly, markedly, naturally, notable, noticeable, obscure, observable, obvious, obviously, opaque, openly, overt, patently, perceptible, plain, precise, prominent, self-evident, show, show up, significantly, soberly, specific, straightforward, strong, sure, tangible, transparent, unambiguous, unambiguously, uncertain, unclear, undoubtedly, unequivocal, unequivocally, unspecific, vague, viewable, visibility, visible, visibly, visual, vividly, well, woolly
The semantic information in FrameNet (FN) is broadly useful for problems such as entailment (Ellsworth and Janin, 2007; Aharon et al., 2010) and knowledge base population (Mohit and Narayanan, 2003; Christensen et al., 2010; Gregory et al., 2011), and is of general enough interest to language understanding that substantial effort has focused on building parsers to map natural language onto FrameNet frames (Gildea and Jurafsky, 2002; Das and Smith, 2012). In practice, however, FrameNet's usefulness is limited by its size. FN was built entirely manually by linguistic experts.
As a result, despite many years of work, most of the words that one confronts in naturally occurring text do not appear at all in FN. For example, the word blatant is likely to evoke the Obviousness frame, but is not present in FN's list of LUs (Table 1). In fact, out of the targets we sample in this work (described in Section 4), fewer than 50% could be mapped to a correct frame using the LUs in FrameNet. This finding is consistent with what has been reported by Palmer and Sporleder (2010). Such low lexical coverage prevents FN from being used in many real-world applications.
In this work, we triple the lexical coverage of FrameNet quickly and with high precision. We do this in two stages: 1) we use rules from the Paraphrase Database (Ganitkevitch et al., 2013) to automatically paraphrase FN sentences and 2) we apply crowdsourcing to manually verify that the automatic paraphrases are of high quality. While prior efforts have entertained the idea of expanding FN's coverage (Ferrández et al., 2010; Das and Smith, 2012; Fossati et al., 2013), none have resulted in a publicly available resource that can be easily used. As our main contribution, we release FrameNet+, a large, manually vetted extension to the current FrameNet. FrameNet+ provides over 22,000 new frame/LU mappings in a format that can be readily incorporated into existing systems. We demonstrate that the expanded resource provides a 40% improvement in lexical coverage in a practical setting.

Expanding FrameNet Automatically
The Paraphrase Database (PPDB) (Ganitkevitch et al., 2013) is an enormous collection of lexical, phrasal, and syntactic paraphrases. The database is released in six sizes (S to XXXL) ranging from highest precision/lowest recall to lowest average precision/highest recall. We focus on lexical (single word) paraphrases from the XL distribution, of which there are over 370K.
Our aim is to increase the type-level coverage of FN. We use the rules in PPDB along with a 5-gram Kneser-Ney smoothed language model (Heafield et al., 2013) to paraphrase FN's full frame-annotated sentences (called fulltext). We ignore paraphrase rules which are redundant with LUs already covered by FN. This method for automatic paraphrasing has been discussed previously by Rastogi and Van Durme (2014). However, whereas their work only discussed the idea as a hypothetical way of augmenting FN, we apply the method, vet the results, and release the resulting data as a public resource.
In total, we generate 188,061 paraphrased sentences, covering 686 frames. Table 2 shows some of the paraphrases produced.
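The paraphrasing step described above can be sketched roughly as follows. This is a minimal illustration rather than the pipeline actually used: the function name `paraphrase_target`, the dictionary layouts, and the toy scoring function standing in for the 5-gram language model are all assumptions. In particular, ranking candidates by language model score and checking redundancy per frame are one plausible reading of the text, not details it states.

```python
from typing import Callable, Dict, List, Set, Tuple


def paraphrase_target(
    sentence: List[str],
    target_index: int,
    frame: str,
    ppdb_rules: Dict[str, List[str]],
    known_lus: Dict[str, Set[str]],
    lm_score: Callable[[List[str]], float],
) -> List[Tuple[float, str, List[str]]]:
    """Return (score, new_word, new_sentence) candidates, best score first."""
    original = sentence[target_index]
    candidates = []
    for replacement in ppdb_rules.get(original, []):
        # Assumption: a rule is "redundant" if the frame already lists the word.
        if replacement in known_lus.get(frame, set()):
            continue
        new_sentence = (
            sentence[:target_index] + [replacement] + sentence[target_index + 1:]
        )
        candidates.append((lm_score(new_sentence), replacement, new_sentence))
    # Assumption: higher language model score means a more fluent paraphrase.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates


# Toy data, invented for illustration (not taken from PPDB or FrameNet files).
tokens = "it was barely visible to the unaided eye".split()
toy_rules = {"visible": ["obvious", "viewable", "noticeable"]}
toy_lus = {"Obviousness": {"visible", "obvious", "clarity", "show"}}


def toy_lm(toks: List[str]) -> float:
    """Placeholder for a real n-gram model: prefers shorter sentences."""
    return -float(sum(len(t) for t in toks))


candidates = paraphrase_target(tokens, 3, "Obviousness", toy_rules, toy_lus, toy_lm)
new_words = [word for _, word, _ in candidates]
```

Here `obvious` is skipped because the toy frame already lists it, while `viewable` and `noticeable` become candidate new LUs to be vetted.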

Manual Refining with Crowdsourcing
Our automatic process produces a large number of good paraphrases, but does not address issues like word sense, and many of the paraphrased LUs alter the sentence so that it no longer evokes the intended frame. For example, PPDB proposes free as a paraphrase of open. This is a good paraphrase in the Secrecy status frame but does not hold for the Openness frame (Table 3).
We therefore refine the automatic paraphrases manually to remove paraphrased LUs which do not evoke the same frame as the original LU. We show each sentence to three unique workers on Amazon Mechanical Turk (MTurk) and ask each to judge how well the paraphrase retains the meaning of the original phrase. We use the 5-point grading scale for paraphrase proposed by Callison-Burch (2008).
To ensure that annotators perform our task conscientiously, we embed gold-standard control sentences taken from WordNet synsets. Overall, workers were 76% accurate on our controls and showed good levels of agreement: the average correlation between two annotators' ratings was ρ = 0.49. Figure 1 shows the distribution of Turkers' ratings for the 188K automatically paraphrased targets. In 44% of cases, the new LU was judged to retain the meaning of the original LU given the frame-specific context. These 85K sentences contain 22K unique frame/LU mappings which we are able to confidently add to FN, tripling the total number in the resource. Table 1 shows 69 new LUs added to the Obviousness frame.
Figure 1: Distribution of MTurk ratings for paraphrased fulltext sentences. 44% received an average rating ≥ 3, indicating the paraphrased LU was a good fit for the frame-specific context.
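The filtering of crowdsourced judgments can be illustrated with a short sketch. It assumes ratings are integers from 1 to 5, averages the three workers' ratings per sentence, and collects the unique frame/LU pairs from sentences whose average reaches a threshold. The `Judged` record, the function name, and the example ratings are hypothetical.

```python
from statistics import mean
from typing import List, NamedTuple, Set, Tuple


class Judged(NamedTuple):
    frame: str
    new_lu: str
    ratings: List[int]  # one rating per MTurk worker, assumed to lie in 1..5


def accepted_mappings(items: List[Judged], t: float = 3.0) -> Set[Tuple[str, str]]:
    """Unique (frame, LU) pairs from sentences whose average rating is >= t."""
    return {(it.frame, it.new_lu) for it in items if mean(it.ratings) >= t}


# Made-up ratings, for illustration only.
judged = [
    Judged("Secrecy status", "free", [4, 5, 3]),
    Judged("Openness", "free", [1, 2, 2]),
    Judged("Obviousness", "blatant", [3, 3, 3]),
]
accepted = accepted_mappings(judged)
```

With these toy ratings, free is kept for the Secrecy status frame but rejected for Openness, mirroring the word-sense problem described above.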

Evaluation
We aim to measure the type-level coverage improvements provided by our expanded FrameNet in a practical setting. Ideally, one would like to identify frames evoked by arbitrary sentences from natural text. To emulate this setting, we consider potentially frame-evoking LUs sampled from the New York Times. The question we ask is: does the resource contain an entry associating this LU with the frame that is actually evoked by this target?
FrameNet+. We refer to the expanded FrameNet, which contains the current FN's LUs as well as the proposed paraphrased LUs, as FrameNet+.
The size and precision of FrameNet+ can be tuned by setting a threshold t and only including LU/frame mappings for which the average MTurk rating was at least t. Setting t = 0 includes all paraphrases, even those which humans judged to be incorrect, while setting t > 5 includes no paraphrases, and is equal to the current FN. Unless otherwise specified, we set t = 3. This includes all paraphrases which were judged minimally as "retaining the meaning of the original."
Sampling LUs. We consider a word to be "potentially frame-evoking" if FN+ (t = 0) contains some entry for the word, i.e., the word is either an LU in the current FN or appears in PPDB-XL as a paraphrase of some LU in the current FN. We sample 300 potentially frame-evoking word types from the New York Times: 100 each of nouns, verbs, and adjectives. We take a stratified sample: within each POS, types are divided into buckets based on their frequency, and we sample uniformly from each bucket.
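A frequency-stratified sample of this kind could be drawn as in the sketch below. The number of buckets and their boundaries are not given in the text, so equal-size buckets over the frequency ranking are an assumption, as are the function name and the toy frequencies. The function would be called once per part of speech.

```python
import random
from typing import Dict, List


def stratified_sample(
    freqs: Dict[str, int], n_buckets: int, per_bucket: int, seed: int = 0
) -> List[str]:
    """Sample uniformly within (at most n_buckets) equal-size frequency buckets."""
    if not freqs or n_buckets < 1:
        return []
    rng = random.Random(seed)
    # Rank word types by ascending frequency; break ties alphabetically.
    ranked = sorted(freqs, key=lambda w: (freqs[w], w))
    size = -(-len(ranked) // n_buckets)  # ceiling division
    sample: List[str] = []
    for start in range(0, len(ranked), size):
        bucket = ranked[start:start + size]
        sample.extend(rng.sample(bucket, min(per_bucket, len(bucket))))
    return sample


# Invented frequencies for a handful of adjectives (illustration only).
toy_freqs = {
    "visible": 900, "clear": 850, "obvious": 400,
    "blatant": 120, "lucid": 30, "woolly": 5,
}
sample = stratified_sample(toy_freqs, n_buckets=3, per_bucket=1)
```

Sampling per bucket rather than over the whole vocabulary keeps rare types from being crowded out by frequent ones, and the other way around.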
Annotation. For each of the potentially frame-evoking words in our sample, we have expert (non-MTurk) annotators determine the frame evoked. The annotator is given the candidate LU in the context of the NYT sentence in which it occurred, and is shown the list of frames which are potentially evoked by this LU according to FrameNet+. The annotator then chooses which of the proposed frames fits the target, or determines that none do. We measure agreement by having two experts label each target. On average, agreement was good (κ = 0.56). In cases where they disagreed, the annotators discussed and came to a final consensus.
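For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's κ for two annotators. The text does not say which variant of κ was used, so Cohen's formulation is an assumption, and the labels are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators' labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Invented labels: the frame chosen for each target, or "None" if no frame fits.
expert_1 = ["Obviousness", "Obviousness", "None", "Openness"]
expert_2 = ["Obviousness", "None", "None", "Openness"]
kappa = cohens_kappa(expert_1, expert_2)
```

In this toy case the experts agree on 3 of 4 targets (p_o = 0.75) with chance agreement p_e = 5/16, giving κ = 7/11 ≈ 0.64.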

Results
We compute the coverage of a resource as the percent of targets for which the resource contained a correct LU/frame mapping. Figure 2 shows the coverage computed for the current FN compared to FN+. By including the human-vetted paraphrases, FN+ is able to return a correct LU/frame mapping for 60% of the targets in our sample, 40% more targets than were covered by the current FN. Table 4 shows some sentences covered by FN+ that are missed by the current FN.
[...] LU paraphrases (setting t = 0) provides nearly 70 LUs per frame and offers 71% coverage.
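The coverage measure defined above can be written down in a few lines. In this sketch a resource is modeled as a dictionary from LU to the set of frames it may evoke, and each target carries the expert-chosen frame (or `None` when no proposed frame fit). The names and the toy entries are hypothetical, and the resulting numbers are not the results reported here.

```python
from typing import Dict, List, Optional, Set, Tuple


def coverage(
    targets: List[Tuple[str, Optional[str]]], resource: Dict[str, Set[str]]
) -> float:
    """Percent of targets whose gold frame is listed for the LU in the resource."""
    if not targets:
        raise ValueError("need at least one target")
    hits = sum(
        1
        for lu, gold_frame in targets
        if gold_frame is not None and gold_frame in resource.get(lu, set())
    )
    return 100.0 * hits / len(targets)


# Invented targets and resources, for illustration only.
toy_targets = [
    ("visible.a", "Obviousness"),
    ("blatant.a", "Obviousness"),
    ("free.a", "Secrecy status"),
    ("table.n", None),  # annotators found no fitting frame
    ("open.a", "Openness"),
]
toy_fn = {"visible.a": {"Obviousness"}, "open.a": {"Openness", "Secrecy status"}}
toy_fn_plus = {
    **toy_fn,
    "blatant.a": {"Obviousness"},
    "free.a": {"Secrecy status"},
}

fn_cov = coverage(toy_targets, toy_fn)
fn_plus_cov = coverage(toy_targets, toy_fn_plus)
```

Targets with no fitting frame stay in the denominator, so they count against every resource equally.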

Data Release
The augmented FrameNet+ is available to download at http://www.seas.upenn.edu/~nlp/resources/FN+.zip. The resource contains over 22K new manually verified LU/frame pairs, making it three times larger than the currently available FrameNet. The release also contains 85K human-approved paraphrases of FN's fulltext. This is a huge increase over the 4K fulltext sentences currently in FN, and the new data can be easily used to retrain existing frame semantic parsers, improving their coverage at application time.

Related Work
Several efforts have worked on expanding FN coverage. Most approaches align FrameNet's LUs to WordNet or other lexical resources (Shi and Mihalcea, 2005; Johansson and Nugues, 2007; Pennacchiotti et al., 2008; Ferrández et al., 2010). Das and Smith (2011) and Das and Smith (2012) used graph-based semi-supervised methods to improve frame coverage and Hermann et al. (2014) used word and frame embeddings to improve generalization. All of these improvements are restricted to their respective tool rather than a general-use resource. In principle one of these tools could be used to annotate a large corpus in search of new LUs, but their precision on unseen predicates/LUs (our focus here) is still below 50%, considerably lower than this work. Fossati et al. (2013) added new frames to FN by collecting full frame annotations through crowdsourcing, a more complicated task that again did not result in a usable resource. Buzek et al. (2010) applied crowdsourced paraphrasing to expand training data for machine translation. Our approach differs in that we expand the number of LUs directly using automatic paraphrasing and use crowdsourcing to verify that the new LUs are correct. We apply our method in full, resulting in a large resource that can be easily incorporated into existing systems.

Conclusion
We have applied automatic paraphrasing to greatly increase the type-level lexical coverage of FrameNet, a widely used resource embodying the theory of frame semantics. We use crowdsourcing to manually verify that the newly added lexical units are correct, resulting in FrameNet+, a high-precision resource that is three times as large as the existing resource. We demonstrate that in a practical setting, the expanded resource provides a 40% increase in the number of sentences for which FN is able to identify the correct frame. The data released will improve the applicability of FN to end-use applications with diverse vocabularies.
[...] Allen Institute for Artificial Intelligence (AI2), the Human Language Technology Center of Excellence (HLTCOE), and by gifts from the Alfred P. Sloan Foundation, Google, and Facebook. This material is based in part on research sponsored by the NSF under grant IIS-1249516 and DARPA under agreement number FA8750-13-2-0017 (the DEFT program). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA or the U.S. Government.