Finding metaphorical triggers through source (not target) domain lexicalization patterns

Some metaphorical mappings between source and target are obvious and appear in colloc a-tion patterns in natural language data. Howe v-er, other meta phors that structure abstract pr o-cesses or complex topics are trickier to inve s-tigate because the target domain is lexically divorced from the source. Using metaphors for the economy as a case study, this paper i n-troduces new techniques to find metaphorical tokens when target and source relationships are nonobvious. Through novel methods , c o n-stellations of source - domain triggers are ide n-tified in the data and evaluated for met a-phoricity and keyness and then grouped a c-cording to trigger potency .


Introduction
Collections of naturally occurring language data serve as repositories of metaphor and can be used to investigate lexical patterns indicative of conceptual metaphors. Since the introduction of sizable, computationally searchable corpora, metaphor researchers and corpus linguists have begun to probe questions of quantitative validity (Deignan 2005). This movement toward measurable indicators of metaphorical salience has allowed for increasing focus on the detection and numeration of conceptual metaphors, which is seen as a mechanism to shield conceptual metaphor research from ongoing methodological criticism (Gibbs 2015). For some, the validity of Conceptual Metaphor Theory (Lakoff and Johnson 1980) rests on movement away from a fine-grained qualitative analysis of exemplary data to more robust experimental and quantitative measures designed to gauge salience and nuanced details of how conceptual metaphors are lexicalized and expressed in natural discourse (Gibbs 2011;Deignan 2012).
Most analyses, however, are based on metaphorical data that are easy to mine. That is, examining a corpus for metaphorical data in which both source and target domain language is paired and collocated is a straightforward process. But, this method is not possible for some metaphorical concepts due to the nature of how target domains are instantiated. When a target domain like the ECONOMY is understood as a complex system and is based on multiple conceptual metaphors, direct lexical searches centered on target domain lexis will not recover all pertinent structural information about active source domains.
In addition to providing metaphor researchers supplementary strategies to validate impressionistic conclusions of model dominance, the methods modeled in this study offer several future lines of inquiry concerning the automated extraction of metaphorical data. The programmed extraction of metaphorical tokens is not only of interest to corpus linguists and metaphor researchers but is also a focus of attention in natural language processing and computational linguistics (Babarczy et al 2010;Tang et al. 2010). A subgroup of these researchers is focused on the automatic detection of metaphorical tokens in relation to the system of conceptual metaphors that structure a given language (cf. Shutova et. al. 2013;Dodge et al. 2015).
In this study, I offer a new mining technique centered on source, rather than target, language. The conceptual metaphors used to understand the economy are overwhelmingly activated by a particular group of metaphorical tokens. That is, specific frames that structure the economy are linked to a subset of lexemes, which consistently appear metaphorically and occur more frequently in economic discourse than in the general corpus. In this paper I will probe a subgroup of these metaphors including ECONOMY IS A SHIP and ECONOMY IS A WEATHER EVENT through an examination of source domain lexicalization patterns. I describe a systematic method of pulling common collocates of source domain labels from a general corpus, and then introduce new techniques to evaluate and group these metaphor triggers. In the case of the economy, a subset of source-domain triggers are habitually used metaphorically and occur more frequently in a corpus of economic discussion from The Economist magazine than in nonspecific English discourse. This relative frequency differential is assessed through a 'keyness score', which is a numerical measure of how frequent, on average, metaphorical triggers occur in the restricted corpus compared to the baseline corpus. Words that consistently activate the metaphorical source domain fit into a category of 'super triggers' and can then be used to mine additional corpora for more metaphorical data. Background For quite some time, metaphor analysts have been engaged in automated data extraction of one kind or another. In most cases, methodological approaches focus on target domain language. For instance, Oster (2010) relies on collocation patterns in a nonspecific corpus to show which lexical units are most associated with metaphorical description of the emotion fear. Oster gathers co-occurrence information -the lexical units that most frequently collocate with fear -to find target-specific metaphorical expressions. She uses the results of collocation searching to build a source-domain ontology, arguing that the most "relevant" metaphors are those evoked by the highest number of linked linguistic expressions (p.742). For example FEAR IS SOMETHING INSIDE THE BODY is evoked more frequently than is FEAR IS AN ANTAGONIST. Some metaphors, however, such as FEAR IS FIRE are more creatively produced because they are linked to a larger set of linguistic expressions. In Oster's approach, frequency information combines with lexical co-occurrence data to produce a source domain's "productivity and creativity index" (p. 748) -additional parameters by which source domains can be compared.
Following a similar semi-automated approach, investigators working on the MetaNet project have engaged in a corpus-driven, lexical approach to researching the alignment between target domain expressions, source domain frames, and the grammatical constructions that blend the two (David, et al. 2014;Stickles et al. 2014). Target and source word pairs, such as alleviate poverty, in which the source domain of DISEASE is evoked to understand the target domain POVERTY, are used to examine the frequency of one source domain in relation to another. Through these source-target pairings, the frequency of activation of individual frames can be compared to other frames within the same source domain. For instance, in the British National Corpus, Stickles et al. (2014) show how poverty is more frequently discussed as a disease than as a basic harm. And, when understood as a disease, speakers are more likely to discuss the treatment of the affliction of poverty than the diagnosis of the disease of poverty. Thus, at a macro level, the corpus results lead to the conclusion that AFFLICTION and TREATMENT roles in the source domain are more salient than is the role of DIAGNOSIS .
The commonality among most corpus-based approaches to metaphor research, whether through manual searching, sorting, and collection or semiautomated searching based on collocation patterns, is the focus on the target domain. In all cases described above, researchers use lexical items indicative of the target domain to find instances of conceptual metaphors. Oster's collocations searches rested on the word fear in order to find semi-fixed expressions like fight fear. Likewise, the data mining approach taken by Stickles et al. was to look for common collocates of the word poverty, such as spread, alleviate, and fight.
Many target domains, however, cannot be thoroughly investigated by searching a corpus for collocates of target lexemes because not all target language occurs near or next to source domain triggers. This is the case for metaphorical concepts that are fundamentally understood as processes not as entities, and most target domains, like the economy, are built on extremely complex conceptual ecology. Thus, the ease with which metaphorical structure can be exposed has to do with the relationship between source-domain language and the structural character of the target domain. When the target domain is cognitively complex and lexically divorced from the source, target-domain directed searching limits the extraction of relevant data.
Because of the constraints imposed by manual searching and the lexical division between source and target triggers, metaphor researchers in corpus linguistics have turned to alternative approaches (Koller et al., 2008;L'Hôte 2014;Demmen et. al. 2015). Demmen et al. (2015) use a semi-automated corpus-based approach to research "violence metaphors" active in discourse on cancer. In their method, repeated source domain verbs like fight, battle, and struggle, identified through manual searching, are grouped according to predetermined semantic fields such as "warfare" or "damaging and destroying" (p.211). These fields come from an adapted version of the UCREL2 Semantic Analysis System (USAS) tagger (Rayson et al. 2004) in Wmatrix (Rayson 2008). Their identification of relevant semantic fields, and the lexis associated with each, yields additional search tokens such as destroy and shatter, which serve as supplementary source domain triggers used to locate further metaphorical tokens in cancer discourse. This focus on the identification of sourcedomain language leads to a greater diversity of identified metaphorical lexis, feeding a systematic comparison of particular metaphorical tokens across groups of speakers and across genres of data. However, the use of pre-specified semantic domains can constrain the types of search tokens yielded because the resulting lexis is, for the most part, homogenous in nature -sets of synonymous verbs.
Like these studies, my methodology moves away from a target lexis focus, querying metaphorical language through a concentration on source language. However, the methods described below differ significantly from previous corpus-based approaches. Instead of relying on pre-specified semantic fields in the collection of candidate metaphor triggers, I use corpus collocation patterns to gather candidate triggers. This methodology results in a frame-based collection of candidate search terms, as opposed to a definition-based collection, resulting in a more comprehensive compilation strategy.

A Case Study: The Economy
As a target domain, the economy, along with basic understandings of business and finance, have been well researched within metaphor analysis both in and out of academia. Because the metaphors used to structure economic thinking are well understood, it serves as a good case study to investigate the ways in which the metaphors are instantiated in natural discourse. All data referenced in this paper comes from a 2,084,650 token corpus built from the business and finance sections of The Economist magazine (2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015).
The economy, as a complex, abstract system, does not have a particularly unique metaphorical structure. Systems of all kinds, such as social organizations, governments, corporations, climate, and physical organisms are understood primarily through the same sets of metaphors. These metaphors all have one thing in common: The source domains represent different instantiations and elaborations of material structures vis-à-vis the primary metaphor ABSTRACT SYSTEMS ARE PHYSI-CAL STRUCTURES. The structures that serve as subcases of this superordinate metaphor, however, vary: abstract complex systems can be understood through several types of concrete structure including machines, buildings, plants, and human bodies (Kövecses 2010: 156).

Metaphors for the Economy
The metaphors for the economy are varied and complex, as are the mapping within each metaphor. The partial metaphor list here comes from existing analyses (cf. McCloskey 1986;Boers 1997;Boers & Demecheleer 1997;Skorczynska & Deignan 2006;Kövecses 2010;Shenker-Osorio 2012). Illustrative data is from The Economist corpus. The overlapping nature of these related source domains helps explain the prevalence of lexical items that are compatible with more than one metaphor. Domain interdependence is hypothesized to, in part, explain how metaphorical lexis can activate more than one source domain, and the assignment of metaphorical tokens to one domain or another must be based on a close read of surrounding context. by searching for only target domain words such as economic, finance, and business because of the metaphorical complexity involved. Many source domain triggers in economic discourse are lexically removed from target domain language, and in this way, economic discourse mirrors the metaphorical disease language studied by Demmen et al. (2015) and Koller et al. (2008). Just as in these studies, I advocate for a new approach in which metaphors for the economy are investigated through the lexicalization patterns of the source domain. In this mixed-method approach, a small portion of the specialized corpus is qualitatively scanned for salient metaphors, and then quantitatively assessed by pulling source, not target, domain examples. However, unlike recently published corpus studies, I adopt an approach that integrates corpus collocation patterns. Rather than using a semantic ontology to build a set of source domain triggers, I pull triggers by looking at common collocates of source domain labels. Through the Sketch Engine interface (Kilgarriff et al. 2014), I compiled a 2,084,650 token corpus of The Economist data taken from articles in the Business and Finance sections of issues published between 2008 and 2015. In order to partially automate the identification of source domain language in this specialized corpus, I used collocation searching in a baseline corpus to identify source domain triggers (lexical items that activate one or more source domains used to structure the target concept). In this procedure, a source domain label serves as a collocation magnet to collect a list of frequent words associated with the specified domain. This method was employed to investigate source domains with tight constellations of lexical triggers.

ECONOMIC SYSTEMS ARE BUILDINGS
To investigate the SHIP source domain, I searched the academic section of the Corpus of Contemporary American English (Davies 2008-) for the most frequent significant collocates of the word ship (9L, 9R; MI > 3). (An MI score is a statistical measure of lexical attraction in the corpus and an MI greater than 3.0 is interpreted as significant (Cheng 2012).) To find weather-related language, the same collocate search was carried out with the word weather. This technique produced a list of the 100 top collocates for each source domain label. These candidate source domain triggers were then evaluated for their metaphoricity (rate of metaphorical use) in relationship to economic dis-course in the specialized corpus by manually scanning concordance lines.

Results
The search method described above yielded a set of 100 potential source domain triggers for the two economic metaphors: ECONOMY IS A SHIP and ECONOMY IS A WEATHER EVENT.

Common Collocates
I have established three distinct categories to classify source domain triggers in the specialized corpus. 'Trigger lexeme' is the term I use to indicate any lexical item in the specialized corpus that evokes one or more relevant source domains. Many words can function as trigger lexemes. Some are words very closely tied to a source domain frame; for example, the phrase on life support is directly tied to our understanding of hospitals, emergency rooms, and very sick patients, and can be used to activate the metaphor ECONOMY IS (AIL-ING) HUMAN BODY. Other words, however, activate one source domain, but that source domain structures more than one target. This would be the case for a word like circulation, which can be used to describe the movement of money (MONEY IS LIQUID) but can also be used as a source domain trigger for a different metaphor like IMMIGRATION IS THE FLOW OF WATER. Thus circulation is only counted as a trigger when used in metaphorical description of the economy. 'Significant trigger lexemes' are lexical items that have a significant rate of use as a source domain trigger, quantified as a frequency of three or more metaphorical uses in reference to the target. These triggers must also be used in metaphorical reference to the target domain in at least 20% of the instances of total use, yielding a moderate to high rate of metaphoricity.
'Super trigger lexeme' refers to lexical items unique to the specialized corpus with a significant rate of metaphorical use in reference to the specified target domain (over 20%). Uniqueness is measured by a disproportionate use in the specialized corpus compared to a baseline corpus. To quantify uniqueness, I ranked individual lexical items through a basic algorithm, which I call a 'keyness score' (Ahmad 2005). In this measure, the frequency rate of the lexical item in the specialized corpus is divided by the frequency ratio of the item in a general corpus. Any item which occurs at a higher rate in the specialized corpus will measure at a keyness score greater than 1.0. keyness(term) = F special /N special _______________ F general /N general This measure quantifies the relative frequency of a particular lexical item in the specialized corpus (compared to a baseline) and allows lexical items to be both ranked by their relative frequency and numerically compared to one another.

The SHIP Source Domain
Trigger lexemes reside on a scale of potency in their relationship to the source domain. Some triggers loosely connect to the source domain, while others consistently evoke it. Figure 3 is comprised of a subset of the 100 top collocates of the lexical item ship in the academic section of COCA (9L, 9R; MI > 3). These are all words that evoke the SHIP frame. But, again, there is wide variation in whether or not these collocates are used as source domain triggers for the metaphor ECONOMY IS A SHIP.  Certain frequent lexical items in the discourse about weather such as the word snow are never used in metaphors for the economy; whereas, other words like vagaries, part of weather discourse but less frequent in general discourse, are consistently used with a metaphorical meaning in discussions of the economy.

Limitations
Even though source domain collocations yield a large quantity of metaphorical language, there are limits. Not all trigger lexemes can be found through this automated technique. Manually tagging a subset of The Economist corpus reveals that there are robust source domain triggers that are not frequent collocates of their source frame labels. That is, there are trigger lexemes that come from our understanding of ships and weather, which do not frequently co-occur with the word ship and weather, and the fact that these triggers exist shows the limitations in using a methodology that exclusively relies on source frame collocation magnets.
An additional obstacle to this automated approach is the great variability in how dense and connected the constellation of source domain triggers is within a given source domain. In contrast to the source domains of WEATHER and SHIP, the source domain of an UNHEALTHY HUMAN BODY is a lexically diffuse conceptual domain. That is, there is no fruitful source domain label to use as a collocate magnet in COCA. Thus, the automated technique of using collocation patterns to find source domain seed language does not work here. In this case, source domain triggers have to be directly collected from The Economist corpus in order to investigate the metaphor by searching for likely source domain triggers given the identified source frame.
Standout source domain triggers among this set are the words healthy, recovery, contagion, and ailing. These lexemes, which occur frequently in the data, have high percentages of metaphorical use in reference to economic concepts and significant keyness scores, meaning they are more frequently represented in economic discourse than in academic English. We can, to some extent, explain super triggers through the overlapping of source domains, as was illustrated in Figure 1. Figure 6 contains all the super triggers from the various metaphors investigated above. About half of the total super triggers are words are compatible with more than one source domain.
Concepts like float, buoy, choppy, turbulent, and wave are central to our understanding of the movement of water, the performance of a ship on the water, and the effect of inclement weather on the water. When we think about navigating a ship through bad weather, these are concepts that come to mind. The motivated, yet semi-random, nature of this collection of super triggers should remind us that Conceptual Metaphor Theory is not predictive of how source domains will be lexicalized (Lakoff and Johnson 1980). Metaphorical mappings serve as a template to model the systematic nature of figurative language and metaphorical reasoning, and once conceptual metaphors are identified, the re-occurrence of specific source domains can be expected. But, there is no way to predict a priori exactly how a source domain will be evoked in natural discourse, nor is there a way to foresee what source domain language will be adopted into the speech community and used consistently as idiomatic jargon.

Lexical Divorce
Words evoking the target domain of metaphors for the economy are, of course, prevalent and frequent in economic discourse. There are 12 target triggers in the top 100 most frequent words in The Economist corpus. (Top 100 list includes both content and function words.) Not one source domain super trigger is in the top 100.
Bank* Invest* Market* Firm* Debt* Financ* Rate* Price* Capital* Growth* Economy* Money* Lexical divorce between source and target is evaluated by the existence or absence of collocations of common source and target triggers. Source triggers are inevitably less frequent than target triggers because, while the target is invariable, the source domains are numerous. Thus, the source domain 'super triggers' identified in Figure 6 are not among the most frequent words used when discussing economic issues. The only source domain super trigger in the top 1000 most frequent words in The Economist corpus is recovery (rank: 425). A search of the top 5000 collocates of each target lexeme reveals few source domain super triggers co-occur near or next to these target words (9R, 9L; MI>3.00). Of all source super triggers found through the source domain collocation searches described above, only a small subset appear regularly or even semi-regularly near or next to a target domain word (within the 1000 most frequent collocates). Figure 8 displays  These patterns confirm that target domain directed searching may not be a particularly efficient or effective way of finding source domain language, certainly not an effective technique to find source domain triggers with high rates of metaphoricity that do not occur near or next to target language. The distribution also indicates that lexical divorce between source and target triggers is a significant obstacle to those automated extraction systems which rely on close collocations between source and target lexis.

Conclusion
Metaphor is treated differently across research fields. Corpus linguists tend to be interested in lexical metaphors, defined as nonliteral lexis used in text. Cognitive linguists tend to be interested in conceptual metaphors -the hierarchical, abstracted system of metaphor. Researchers interested in natural language processing fall into both of these camps. But those of the latter school must contend with metaphorical source domain language that is lexically separated from target concept. This lexical divorce between source and target triggers is a substantial obstacle to automated metaphor identification systems because metaphor retrieval can't be based exclusively on pairings of source and target words.
In this paper, I have demonstrated a methodology to use nonrestricted corpus data to identify constellations of source domain lexis that serve as potential metaphorical triggers. This simple technique involves searching a baseline corpus for common collocates of identified source domain labels. Once collected, these lexemes associated with the source domain frame can then be evaluated for their metaphoricty -use in reference to the target conceptand their 'keyness' -relative frequency in topical discourse concerning the target domain. These two metrics allow for a systematic, comparative evaluation of source language along a scale of trigger 'potency'.
NLP efforts concerning metaphor can be furthered through the identification of 'significant' and 'super' source domain triggers-lexical items that reliably evoke a robust conceptual metaphor. The effort to build ontologies of conceptual metaphor, necessary for any adequate computational model of semantic processing, should occur alongside the identification of salient trigger language, and the construction of such systems will benefit from the methodologies outlined in this paper.