FrameNet and Typology

FrameNet and the Multilingual FrameNet project have produced multilingual semantic annotations of parallel texts that yield extremely fine-grained typological insights. Moreover, frame semantic annotation of a wide cross-section of languages would provide information on the limits of Frame Semantics (Fillmore 1982, Fillmore 1985). Multilingual semantic annotation offers critical input for research on linguistic diversity and recurrent patterns in computational typology. Drawing on results from FrameNet annotation of parallel texts, this paper proposes frame semantic annotation as a new component to complement the state of the art in computational semantic typology.


Introduction
For some time, typologists and cognitive linguists have explored and discovered recurring crosslinguistic patterns of semantic difference. Talmy (2000) characterized languages as verb-framing or satellite-framing, depending on the locus of path information in descriptions of motion events. Nichols et al. (2004) studied basic verbs and their causative counterparts (sit, seat; fall, drop) in 80 languages, determining just four ways of treating the realization of intransitives/transitives as basic/derived. Croft's (2012) model of event structure for aspect and argument structure in diverse languages presents the causal chain as the primary semantic factor in argument realization of simple verbs. 1
1 Frame Semantics is distinct from PropBank and Unified Meaning Representation (UMR), a typologically-informed annotation scheme, both of which only peripherally address the representation of predicate-specific roles analogous to FrameNet's frame-specific FEs. PropBank has a feature termed framefiles that UMR inherited. These framefiles are syntactic in nature, bearing no relation to FrameNet's semantic frames. As developers of UMR agree, Frame Semantics is not fully integrated into Computational Typology (Gysel et al., To Appear in Künstliche Intelligenz).
How many such recurring patterns exist? How are such patterns related to each other? Because these and many other questions remain open, we suggest that annotation with semantic frames can help to find semantic universals and language-specific exceptions, just as syntactic annotation is useful for investigating syntactic typology. Such semantic frames may be very general or quite specific, depending on the nature of the research.
The goal of Computational Typology is "the development of robust language technology applicable across the world's languages" (Dubossarsky et al., 2019). As such, the computational linguistics world must exploit all resources that contribute to the community's understanding of typological phenomena in those languages. FrameNet (FN) and its related projects in diverse languages are underutilized resources that must be a part of an inclusive drive to model semantic typology.
The rest of this paper proceeds as follows: Section 2 presents FrameNet and Multilingual FN; Section 3 describes a FN study showing typological differences across diverse languages; Section 4 describes another crosslinguistic annotation study; Section 5 discusses crosslinguistic comparability of frames and presents ViToXF, a frame alignment visualization tool; and finally, Section 6 concludes the paper.

FrameNet
FrameNet (Ruppenhofer et al., 2016) is a research and resource development project in corpus-based computational lexicography grounded in the theory of Frame Semantics (Fillmore, 1985).
At the heart of the work is the semantic frame, a script-like knowledge structure that facilitates inferencing within and across events, situations, states-of-affairs, relations, and objects. FN defines a semantic frame in terms of its frame elements (FEs), or participants (and other concepts) in the scene that the frame captures; a lexical unit (LU) is a pairing of a lemma and a frame, characterizing that LU in terms of the frame that it evokes.
Example 1 illustrates annotation for the verb BUY, which FN defines in the Commerce_buy frame, with the FEs BUYER, SELLER, GOODS, and MONEY. 2
1. [Chuck BUYER] BOUGHT [a car GOODS] [from Jerry SELLER] [for $2,000 MONEY]
Along with frames and their associated annotations, FN employs a set of Frame-to-Frame Relations to link semantically related frames into a set of frame hierarchies, including Inheritance, Subframe, Precedes, Perspective_on, Inchoative_of, and Causative_of. For instance, FN defines the frame Commerce_buy as Inheriting from Getting and holding a Perspective_on relation to Commerce_goods_transaction, which is a Subframe of Commercial_transaction. Commerce_sell has the same relations, but it inherits from Giving (not Getting). Table 1 lists FN's frame-to-frame relations.
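The definitions above (frames, FEs, LUs, and frame-to-frame relations) can be pictured as a small data structure. The following is a minimal illustrative sketch, not FrameNet's actual data format or API: the class names, the `linked_frames` and `undefined_fes` helpers, and the span boundaries in `EXAMPLE_1` are assumptions introduced here, and only the frames and relations named in the text are included.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A semantic frame: a name, its frame elements (FEs), and outgoing relations."""
    name: str
    frame_elements: tuple = ()
    # Maps a relation name to the frames this frame points to,
    # e.g. {"Inheritance": ["Getting"]} means "this frame inherits from Getting".
    relations: dict = field(default_factory=dict)


@dataclass(frozen=True)
class LexicalUnit:
    """A lexical unit (LU): a lemma paired with the frame it evokes."""
    lemma: str
    frame: str


# FE lists not stated in the text are left empty rather than guessed.
FRAMES = {
    f.name: f
    for f in [
        Frame("Commerce_buy", ("BUYER", "SELLER", "GOODS", "MONEY"),
              {"Inheritance": ["Getting"],
               "Perspective_on": ["Commerce_goods_transaction"]}),
        Frame("Commerce_sell", (),
              {"Inheritance": ["Giving"],
               "Perspective_on": ["Commerce_goods_transaction"]}),
        Frame("Commerce_goods_transaction", (),
              {"Subframe": ["Commercial_transaction"]}),
        Frame("Commercial_transaction"),
        Frame("Getting"),
        Frame("Giving"),
    ]
}


def linked_frames(name, frames=FRAMES):
    """Return every frame reachable from `name` by following any relation."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        for targets in frames[current].relations.values():
            for target in targets:
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
    return seen


def undefined_fes(annotation, lu, frames=FRAMES):
    """Return FE labels in an annotation that the LU's frame does not define."""
    allowed = set(frames[lu.frame].frame_elements)
    return [fe for _, fe in annotation if fe is not None and fe not in allowed]


# Example 1 as (text span, FE label) pairs; the target verb carries no FE label.
BUY = LexicalUnit("buy.v", "Commerce_buy")
EXAMPLE_1 = [("Chuck", "BUYER"), ("bought", None), ("a car", "GOODS"),
             ("from Jerry", "SELLER"), ("for $2,000", "MONEY")]
```

Under these assumptions, `linked_frames("Commerce_buy")` collects Getting, Commerce_goods_transaction, and Commercial_transaction, while `linked_frames("Commerce_sell")` differs only in reaching Giving instead of Getting, mirroring the contrast drawn in the text.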

Multilingual FrameNet (MLFN)
Do semantic frames represent universals of human language or are they language-specific characterizations of the lexicon? Despite many and varied language-specific patterns of expression, the successful development of FN-type resources for typologically distinct languages leads to the conclusion that many frames constitute appropriate characterizations of events, situations, etc., across typologically diverse languages, especially frames for basic human experiences, like eating, drinking, and sleeping. Even frames for cultural practices are similar across languages; for instance, all commercial transactions, regardless of culture, involve the same participants (or frame elements) defined for English buy. 3 Berkeley FrameNet (BFN) has inspired the development of numerous comparable resources for languages other than English. 4 While the methods to develop these resources have differed, with each project creating frames based on its own linguistic data, all consider how their frames compare with BFN's frames for the lexicon of English (Boas, 2009).

Typology via Frames
Translations attempt meaning equivalence, so expecting them to evoke the same frames as the original text seems reasonable. Yet an analysis of frame mismatch (Ellsworth et al., 2006) reveals typological differences in motion and location vocabulary across languages. The annotation of Chapter 14 of The Hound of the Baskervilles (Doyle, 1902) in English, Japanese, Spanish, and German demonstrated that even a modest amount of annotation confirms known typological differences between English and German as satellite-framing languages vs. Spanish and (less so) Japanese as verb-framing ones. 5 The annotation also showed several patterns unrelated to these typologies. Consider example 2a, showing an original sentence; 2b is the text of one Japanese translation, and 2c is a (fairly literal) back-translation of the Japanese. 6
(b) ... Self_motion ] ので
(c) Translation: 'Eventually the whole area became slightly blurry, and was gradually wrapped up in the fog, especially as the white fog crept low along the ground.'
In this case, while the Japanese (2b) does show a verb-framed clause ("這う" 'crawl on') compared to a satellite-framed clause with a manner verb in English ("came crawling round..."), it also profiles an entirely different concept of visibility, i.e. the extent to which a sentence shows that some focal part of a scene is visible to the speaker. We hypothesize that the cline of saliency of visibility may be comparable to the cline of saliency of manner (Slobin, 2004). Japanese consistently makes visibility explicit when other languages leave visibility as an inference from the location and nature of objects (2b vs. 2a). A larger sample would show whether this phenomenon is an artifact of the sample (due to stylistics of a particular translation or the nature of the text) or a regular difference between Japanese and the other languages. Still, these results are suggestive.
To the best of our knowledge, typologists have not proposed a feature to distinguish languages based on the preferential encoding of visibility in this way. The frames approach holds power in its ability to code for vastly different domains simultaneously within the same framework.

Parallel Annotation of TED Talk
Building on results from the Hound study and the expansion of FrameNet-related projects, Global FN teams each annotated their own language's version of the TED talk "Do Schools Kill Creativity?" 7 in English, Portuguese, Japanese, and French. Since annotations with different frame inventories are hard to compare (Section 5), the teams agreed to use only the frames and LUs from Berkeley FrameNet (BFN) Release 1.7. If annotators found an appropriate BFN frame, they annotated the target language text with the BFN frame. If not, they marked the phrase with the closest available BFN frame, recording the discrepancy.
However, this exercise had problems. The policy called for teams to annotate the text completely, yet each team understood "complete annotation" differently. Also, although the English was a fairly exact transcription of the original talk, versions in other languages were briefer than a full translation would be, since they were intended as subtitles and had to match the video stream timing (Ohara, 2020).
The TED annotation reinforced a key finding of the Hound study: even with an available equivalent frame, the translated phrase may evoke a different target frame, a phenomenon known as a frame shift, analogous to translation shift (Čulo et al., 2012). Consider example 3, where two English sentences (3a) translate into a single sentence of Japanese (3b). (Partial annotation of this sentence appears in examples 4 and 5, below. As in example 2, 3c is a back-translation from the Japanese.)

(a) If you think of it, children starting school this year will be retiring in 2065. Nobody has a clue, despite all the expertise that's been on parade for the past four days, what the world will look like in five years' time. 8
(b) ... が、TEDに集まるあらゆる分野のエキスパートをもってしても５年先の世界ですらわかりません。
(c) Translation: 'Children enrolling in elementary school this year will reach retirement age in 2065, but even with the experts in every possible field gathered at TED, we don't even know about the world five years from now.'
This short passage includes several examples of frame shift. The first English sentence (roughly, the first Japanese clause) treats school as the activities that occur in the building, evoking Activity_start, while Japanese specifies that it is an elementary school, treating it as an organization of which children become a part, evoking Becoming_a_member. An alternative analysis treats the whole phrase 'enroll in elementary school' as a multiword expression, evoking Activity_start, as in English. 9

(a) [children AGENT ] [starting
The English verb phrase repeated in 5a uses the single word retiring, which evokes the Quitting frame. The Japanese 5b uses 定年を迎えます, analyzable either as a multiword expression (also evoking Quitting) or separately as 定年 (teinen) 'fixed year' and 迎えます (mukaemasu) 'welcome/go to meet'. Since Japanese workers generally retire at age 60, teinen has come to mean 'fixed retirement age'. The highly entrenched collocation with mukaemasu can imply happiness about reaching one's goal, as if meeting with a friend. Analyzed as such, mukaemasu evokes a motion frame and a backgrounded emotion frame, 10 with teinen the (metaphorical) GOAL.
9 Of course, school itself can stand for the institution or organization, the place where this is located, the activity at the school, and for the people via metonymy.
10 The Japanese mukaemasu is highly associated with happiness, which can be encoded in Frame Semantics by placing it in a frame that inherits from Arriving and uses Experiencer_focused_emotion, a frame that also contains happily.
Such data suggest that detecting frame shifts facilitates recognizing precise cultural and conceptual differences across languages. The examples above are quite specific, but form part of larger conceptual systems reflected in the lexicon of each language, such as the system of terms for older/younger classmates partially shared across Chinese, Japanese, and Korean (Davies and Ikeno, 2002). Frame annotation can help typologists take advantage of many such patterns. Work is also underway on a system to predict frame shifts, based on the TED annotation data. 11
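Once source and target expressions are aligned and annotated, identifying frame shifts reduces to comparing the frames on each side. The sketch below is a hedged illustration of that comparison only; it is not the prediction system mentioned above. The `AlignedTarget` record, the function names, and the alignment itself are assumptions. The frame labels follow one of the analyses discussed in the text, and the first target expression is given as an English gloss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedTarget:
    """A frame-evoking expression in the source, aligned to its translation."""
    source_text: str
    source_frame: str
    target_text: str
    target_frame: str


def frame_shifts(pairs):
    """Return the aligned pairs whose source and target frames differ."""
    return [p for p in pairs if p.source_frame != p.target_frame]


def shift_rate(pairs):
    """Proportion of aligned pairs that show a frame shift (0.0 if no pairs)."""
    return len(frame_shifts(pairs)) / len(pairs) if pairs else 0.0


# Hypothetical alignment of two targets from example 3, using the
# Becoming_a_member analysis for 'school' and the multiword analysis for 'retire'.
PAIRS = [
    AlignedTarget("starting school", "Activity_start",
                  "'enroll in elementary school' (gloss)", "Becoming_a_member"),
    AlignedTarget("retiring", "Quitting", "定年を迎えます", "Quitting"),
]

SHIFTED = frame_shifts(PAIRS)
```

With these two pairs, only the school expression counts as a shift. Choosing the alternative analyses described in the text (a multiword Activity_start reading for the Japanese school phrase, or a motion reading for mukaemasu) would change the outcome, which is why the chosen analysis needs to be recorded alongside the annotation.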

Frame Alignment
Comparing the annotations across FrameNet projects raises the question of the extent to which frames are universal. In the individual and joint projects, all FrameNet projects agreed on semantic frames and found BFN frames generally applicable to their language. For example, all languages have a Self_motion frame, with MOVER, SOURCE, PATH, and GOAL FEs. Thus, semantic frames provide useful generalizations both over LUs within a language and across languages. However, crosslinguistic frame relations are not limited to equivalence. A language's frames can be broader or narrower than the nearest BFN frame; they might even give a different point of view on a scene. 12 For example, English I LIKE X, with its verb in the Experiencer_focused_emotion frame, translates into Spanish Me GUSTA X ('X pleases me'), with its verb in the Experiencer_object frame.
Moreover, as Section 4 indicates, cultural differences may preclude the existence of equivalent frames, e.g., for religious concepts or legal processes, which differ greatly across cultures. The MLFN team developed several different approaches to provide quantitative measures of frame similarity across languages. Some of them rely on finding translation equivalents from the LUs in the BFN frame to those in the target language frame, using Open Multilingual Wordnet (Bond and Foster, 2013). Various measures of set overlap then give a value for the frame similarity. Other approaches use MUSE vector embeddings (Bojanowski et al., 2017); the metric can be either the mean vector similarity of all pairs of LUs in a pair of frames in the two languages or the similarity between the mean vector for the LUs in a frame in one language and the same value for a frame in the other. Both approaches are beset with problems caused by the ambiguity of words taken out of context, but nevertheless reveal interesting differences in conceptualization between languages.
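The similarity measures just described can be made concrete with a short sketch. This is an illustration under stated assumptions, not the MLFN team's implementation: Jaccard overlap stands in for the unspecified "measures of set overlap", cosine similarity is assumed as the vector similarity, the translation dictionary is a hand-made toy rather than a query to Open Multilingual Wordnet, and the two-dimensional vectors are invented stand-ins for real embeddings.

```python
import math
from itertools import product


def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def translation_overlap(source_lus, target_lus, translations):
    """Overlap between translated source-frame LUs and target-frame LUs."""
    translated = set()
    for lu in source_lus:
        translated.update(translations.get(lu, ()))
    return jaccard(translated, target_lus)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(vecs_a, vecs_b):
    """Mean cosine similarity over all cross-language pairs of LU vectors."""
    pairs = list(product(vecs_a, vecs_b))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0


def mean_vector(vecs):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def centroid_similarity(vecs_a, vecs_b):
    """Cosine similarity between the mean LU vectors of the two frames."""
    return cosine(mean_vector(vecs_a), mean_vector(vecs_b))


# Toy data (all hypothetical): LUs for a motion frame in English and Spanish.
EN_LUS = ["walk", "run", "crawl"]
ES_LUS = ["caminar", "correr", "nadar"]
TRANSLATIONS = {"walk": ["caminar", "andar"], "run": ["correr"], "crawl": ["gatear"]}
OVERLAP = translation_overlap(EN_LUS, ES_LUS, TRANSLATIONS)

# Toy 2-d "embeddings" chosen so that the two vector metrics disagree.
EN_VECS = [(1.0, 0.0), (0.0, 1.0)]
ES_VECS = [(1.0, 0.0), (1.0, 0.0)]
PAIRWISE = mean_pairwise_similarity(EN_VECS, ES_VECS)
CENTROID = centroid_similarity(EN_VECS, ES_VECS)
```

On the toy vectors the two embedding-based metrics give different values for the same pair of frames, which illustrates why the choice between averaging pairwise similarities and comparing mean vectors is a meaningful parameter. Neither sketch addresses the out-of-context ambiguity problem noted in the text, since each LU is represented by a single entry regardless of sense.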
The MLFN team also developed a tool to facilitate visualizing cross-linguistic frame similarity, called ViToXF (Visualization Tool across FrameNets). The tool provides numerous parameter settings, such as the type of alignment algorithm and the minimum level of similarity to display. Figure 1 shows the tool displaying English and Spanish alignments of motion frames. Baker and Lorenzi (2020) provide details about the alignment algorithms and the parameters of the visualization tool. These data, the tool, and the TED parallel annotation will be available for the workshop.

Concluding Remarks
Crosslinguistic frame semantic annotation highlights the tension between language-specific meaning representations and the kind of generalizations that typology needs (Haspelmath, 2020). However, to be useful, the relationships between meanings must be structured to allow the recognition of commonalities and differences. FrameNet relations provide a sufficiently general framework to explore crosslinguistic semantic differences, without prejudging the nature of such relationships. Fine-grained analysis tied to an elaborate frame hierarchy of the sort available in FrameNet allows linguistic structures to be viewed at any level of abstraction, so that computational typologists can confirm, refute, or add nuance to existing hypotheses, as well as discover previously unseen semantic patterns.