Catching the Common Cause: Extraction and Annotation of Causal Relations and their Participants

In this paper, we present a simple, yet effective method for the automatic identification and extraction of causal relations from text, based on a large English-German parallel corpus. The goal of this effort is to create a lexical resource for German causal relations. The resource will consist of a lexicon that describes constructions that trigger causality as well as the participants of the causal event, and will be augmented by a corpus with annotated instances for each entry, that can be used as training data to develop a system for automatic classification of causal relations. Focusing on verbs, our method harvested a set of 100 different lexical triggers of causality, including support verb constructions. At the moment, our corpus includes over 1,000 annotated instances. The lexicon and the annotated data will be made available to the research community.


Introduction
Causality is an important concept that helps us to make sense of the world around us. This is exemplified by the Causality-by-default hypothesis (Sanders, 2005) that has shown that humans, when presented with two consecutive sentences expressing a relation that is ambiguous between a causal and an additive reading, commonly interpret the relation as causal.
Despite, or maybe because of, its pervasive nature, causality is a concept that has proven to be notoriously difficult to define. Proposals have been made that describe causality from a philosophical point of view, such as the Counterfactual Theory of causation (Lewis, 1973), theories of probabilistic causation (Suppes, 1970;Pearl, 1988), and production theories like the Dynamic Force Model (Talmy, 1988).
Counterfactual Theory tries to explain causality between two events C and E in terms of conditionals such as "If C had not occurred, E would not have occurred". However, psychological studies have shown that this does not always coincide with how humans understand and draw causal inferences (Byrne, 2005). Probabilistic theories, on the other hand, try to explain causality based on the underlying probability of an event to take place in the world. The theory that has had the greatest impact on linguistic annotation of causality is probably Talmy's Dynamic Force Model, which provides a framework that tries to distinguish weak and strong causal forces, and captures different types of causality such as "letting", "hindering", "helping" or "intending".
While each of these theories manages to explain some aspects of causality, none of them seems to provide a completely satisfying account of the phenomenon under consideration. The problem of capturing and specifying the concept of causality is also reflected in linguistic annotation efforts. Human annotators often show only moderate or even poor agreement when annotating causal phenomena (Grivaz, 2010;Gastel et al., 2011). Some annotation efforts abstain altogether from reporting inter-annotator agreement.
A notable exception is Dunietz et al. (2015) who take a lexical approach and aim at building a constructicon for English causal language. By constructicon they mean "a list of English constructions that conventionally express causality" (Dunietz et al., 2015). They show that their approach dramatically increases agreement between the annotators and thus the quality of the annotations (for details see section 2). We adapt their approach of framing the annotation task as a lexicon creation process and present first steps towards building a causal constructicon for German. Our annotation scheme is based on that of Dunietz et al. (2015), but with some crucial changes (section 3).
The resource under construction contains a lexicon component with entries for lexical units (individual words and multiword expressions) for different parts of speech, augmented with annotations for each entry that can be used to develop a system for the automatic identification of causal language.
The contributions of this paper are as follows.
1. We present a bootstrapping method to identify and extract causal relations and their participants from text, based on parallel corpora.
2. We present the first version of a German causal constructicon, containing 100 entries for causal verbal expressions.
3. We provide over 1,000 annotated causal instances (and growing) for the lexical triggers, augmented by a set of negative instances to be used as training data.
The remainder of the paper is structured as follows. First, we review related work on annotating causal language (section 2). In section 3, we describe our annotation scheme and the data we use in our experiments. Sections 4, 5 and 6 present our approach and the results, and we conclude and outline future work in section 7.

Related Work
Two strands of research are relevant to our work, a) work on automatic detection of causal relations in text, and b) annotation studies that discuss the description and disambiguation of causal phenomena in natural language. As we are still in the process of building our resource and collecting training data, we will for now set aside work on automatic classification of causality such as (Mirza and Tonelli, 2014;Dunietz et al., In press) as well as the rich literature on shallow discourse parsing, and focus on annotation and identification of causal phenomena.
Early work on identification and extraction of causal relations from text heavily relied on knowledge bases (Kaplan and Berry-Rogghe, 1991;Girju, 2003). Girju (2003) identifies instances of noun-verb-noun causal relations in WordNet glosses, such as starvation (N1) causes bonyness (N2).
She then uses the extracted noun pairs to search a large corpus for verbs that link one of the noun pairs from the list, and collects these verbs. Many of the verbs are, however, ambiguous. Based on the extracted verb list, Girju selects sentences from a large corpus that contain such an ambiguous verb, and manually disambiguates the sentences to be included in a training set. She then uses the annotated data to train a decision tree classifier that can be used to classify new instances.
Our approach is similar to hers in that we also use the English verb cause as a seed to identify transitive causal verbs. In contrast to Girju's WordNet-based approach, we use parallel data and project the English tokens to their German counterparts.
Ours is not the first work that exploits parallel or comparable corpora for causality detection. Hidey and McKeown (2016) work with monolingual comparable corpora, English Wikipedia and simple Wikipedia. They use explicit discourse connectives from the PDTB (Prasad et al., 2008) as seed data and identify alternative lexicalizations for causal discourse relations. Versley (2010) classifies German explicit discourse relations without German training data, solely based on the English annotations projected to German via word-aligned parallel text. He also presents a bootstrapping approach for a connective dictionary that relies on distribution-based heuristics on word-aligned German-English text.
Like Versley (2010), most work on identifying causal language for German has been focusing on discourse connectives. Stede et al. (1998;2002) have developed a lexicon of German discourse markers that has been augmented with semantic relations (Scheffler and Stede, 2016). Another resource for German is the TüBa-D/Z that includes annotations for selected discourse connectives, with a small number of causal connectives (Gastel et al., 2011). Bögel et al. (2014) present a rule-based system for identifying eight causal German connectors in spoken multilogs, and the causal relations REASON, RESULT expressed by them.
To the best of our knowledge, ours is the first effort to describe causality in German on a broader scale, not limited to discourse connectives.

Annotation Scheme
Our annotation aims at providing a description of causal events and their participants, similar to FrameNet-style annotations (Ruppenhofer et al., 2006), but at a more coarse-grained level. In FrameNet, we have a high number of different causal frames with detailed descriptions of the actors, agents and entities involved in the event. For instance, FrameNet captures details such as the intentionality of the triggering force, to express whether or not the action was performed volitionally.
In contrast, we target a more generic representation that captures different types of causality, and that allows us to generalize over the different participants and thus makes it feasible to train an automatic system by abstracting away from individual lexical triggers. The advantage of such an approach is greater generalizability and thus higher coverage; its success, however, remains to be proven. Our annotation scheme includes the following four participant roles:
1. CAUSE - a force, process, event or action that produces an effect
2. EFFECT - the result of the process, event or action
3. ACTOR - an entity that, volitionally or not, triggers the effect
4. AFFECTED - an entity that is affected by the results of the cause
Our role set is different from Dunietz et al. (2015), who restrict the annotation of causal arguments to CAUSE and EFFECT. Our motivation for extending the label set is twofold. First, different verbal causal triggers show strong selectional preferences for specific participant roles. Compare, for instance, examples (1) and (2). The two argument slots for the verbal triggers erzeugen (produce) and erleiden (suffer) are filled with different roles. The subject slot for erzeugen expresses either CAUSE or ACTOR and the direct object encodes the EFFECT. For erleiden, on the other hand, the subject typically realises the role of the AFFECTED entity, and we often have the CAUSE or ACTOR encoded as the prepositional object of a durch (by) PP.
(1) Elektromagnetische Electromagnetic
Given that there are systematic differences between prototypical properties of the participants (e.g. an ACTOR is usually an animate, sentient being), and also in the way they combine and select their predicates, we would like to preserve this information and see if we can exploit it when training an automatic system.
In addition to the participants of a causal event, we follow Dunietz et al. (2015) and distinguish four different types of causation (CONSEQUENCE, MOTIVATION, PURPOSE, INFERENCE), and two degrees (FACILITATE, INHIBIT). The degree distinctions are inspired by Wolff et al. (2005) who see causality as a continuum from total prevention to total entailment, and describe this continuum with three categories, namely CAUSE, ENABLE and PREVENT. Dunietz et al. (2015) further reduce this inventory to a polar distinction between a positive causal relation (e.g. cause) and a negative one (e.g. prevent), as they observed that human coders were not able to reliably apply the more fine-grained inventories. The examples below illustrate the different types of causation.
(3) [Cancer]_Cause is second only to [accidents]_Cause as a cause of [death]_Effect in [children]_Affected (CONSEQUENCE)
(4) I would like to say a few words in order to highlight two points (PURPOSE)
(5) [She must be home]_Effect because [the light is on]_Cause (INFERENCE)
(6) [The decision is made]_Cause so [let us leave the matter there]_Effect (MOTIVATION)
Epistemic uses of causality are covered by the INFERENCE class while we annotate instances of speech-act causality (7) as MOTIVATION (see Sweetser (1990) for an in-depth discussion on that matter). This is also different from Dunietz et al. (2015) who only deal with causal language, not with causality in the world. We, instead, are also interested in relations that are interpreted as causal by humans, even if they are not strictly expressed as causal by a lexical marker, such as temporal relations or speech-act causality.
(7) [And if you want to say no, say no]_Effect 'Cause [there's a million ways to go]_Cause (MOTIVATION)
A final point that needs to be mentioned is that Dunietz et al. (2015) exclude items such as kill or persuade that incorporate the result (e.g. death) or means (e.g. talk) of causation as part of their meaning. Again, we follow Dunietz et al. and also exclude such cases from our lexicon.
In this work, we focus on verbal triggers of causality. Due to our extraction method (section 4), we are mostly dealing with verbal triggers that are instances of the type CONSEQUENCE. Therefore we cannot say much about the applicability of the different annotation types at this point but will leave this to future work.

Knowledge-lean extraction of causal relations and their participants
We now describe our method for automatically identifying new causal triggers from text, based on parallel corpora. Using English-German parallel data has the advantage that it allows us to use existing lexical resources for English such as WordNet (Miller, 1995) or FrameNet (Ruppenhofer et al., 2006) as seed data for extracting German causal relations. In this work, however, we focus on a knowledge-lean approach where we refrain from using preexisting resources and try to find out how far we can get if we rely on parallel text only. As a trigger, we use the English verb to cause, which always has a causal meaning.

Data
The data we use in our experiments come from the English-German part of the Europarl corpus (Koehn, 2005). The corpus is aligned on the sentence level and contains more than 1.9 million English-German parallel sentences. We tokenised and parsed the text to obtain dependency trees, using the Stanford parser (Chen and Manning, 2014) for English

Method
Step 1 First, we select all sentences in the corpus that contain a form of the English verb cause. We then restrict our set of candidates to instances of cause where both the subject and the direct object are realised as nouns, as illustrated in example (8).
Starting from these sentences, we filter our candidate set and only keep those sentences that also have German nouns aligned to the English subject and object position. Please note that we do not require that the grammatical functions of the German counterparts are also subject and object, only that they are aligned to the English core arguments. We then extract the aligned German noun pairs and use them as seed data for step 2 of the extraction process.
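Step 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the `Token` representation, the dependency labels `nsubj` and `dobj`, the coarse tags `NOUN` and `VERB`, and the alignment format (a dictionary from English token indices to German token indices) are all hypothetical, and the toy sentence pair is built only from the words mentioned for Figure 1.

```python
from dataclasses import dataclass

@dataclass
class Token:
    lemma: str
    pos: str      # coarse POS tag, e.g. "NOUN" or "VERB" (assumed tagset)
    head: int     # index of the syntactic head, -1 for the root
    deprel: str   # dependency label, e.g. "nsubj" or "dobj" (assumed labels)

def extract_noun_pairs(en, de, alignment):
    """Return German (N1, N2) lemma pairs aligned to the nominal subject
    and direct object of an English instance of the verb 'cause'."""
    pairs = []
    for v_idx, verb in enumerate(en):
        if verb.lemma != "cause" or verb.pos != "VERB":
            continue
        # Collect the nominal subject and direct object of this verb.
        args = {}
        for i, tok in enumerate(en):
            if tok.head == v_idx and tok.pos == "NOUN" and tok.deprel in ("nsubj", "dobj"):
                args[tok.deprel] = i
        if len(args) < 2:
            continue
        # Project both arguments to aligned German nouns. The German
        # grammatical function is deliberately not checked.
        de_n1 = [de[j].lemma for j in alignment.get(args["nsubj"], []) if de[j].pos == "NOUN"]
        de_n2 = [de[j].lemma for j in alignment.get(args["dobj"], []) if de[j].pos == "NOUN"]
        if de_n1 and de_n2:
            pairs.append((de_n1[0], de_n2[0]))
    return pairs

# Toy sentence pair (invented for illustration).
en = [Token("gentrification", "NOUN", 1, "nsubj"),
      Token("cause", "VERB", -1, "root"),
      Token("problem", "NOUN", 1, "dobj")]
de = [Token("Gentrifizierung", "NOUN", 1, "subj"),
      Token("führen", "VERB", -1, "root"),
      Token("zu", "ADP", 1, "pp"),
      Token("Problem", "NOUN", 2, "pn")]
alignment = {0: [0], 1: [1], 2: [3]}

noun_pairs = extract_noun_pairs(en, de, alignment)
```

In this sketch the second German noun is a prepositional object, yet the pair is still extracted, because only the alignment to the English core arguments is required.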
For Figure 1, for example, we would first identify the English subject (gentrification) and direct object (problems), project them to their German nominal counterparts (Gentrifizierung, Problemen), the first one also filling the subject slot but the second one being realised as a prepositional object. We would thus extract the lemma forms for the German noun pair (Gentrifizierung → Problem) and use it for the extraction of causal triggers in step 2 (see Algorithm 1).
Step 2 We now have a set of noun pairs that we use to search the monolingual German part of the data and extract all sentences that include one of these noun pairs. We test two settings, the first one being rather restrictive while the second one allows for more variation and thus will probably also extract more noise. We refer to the two settings as strict (setting 1) and loose (setting 2).
In setting 1, we require that the two nouns of each noun pair fill the subject and direct object slot of the same verb. In the second setting, we extract all sentences that include one of the noun pairs, with the restriction that the two nouns have a common ancestor in the dependency tree that is a direct parent of the first noun and not further away from the second noun than three steps up in the tree.
This means that the tree in Figure 1 would be ignored in the first setting, but not for setting 2.
Here we would extract the direct head of the first noun, which will give us the verb führen (lead), and extract up to three ancestors for the second noun. As the second noun, Problem, is attached to the preposition zu (to) (distance 1) which is in turn attached to the verb führen (distance 2), we would consider the example a true positive and extract the verb führen as linking our two nouns.
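The two settings can be made concrete with a small sketch over a dependency tree stored as a list of head indices. It is an illustration only: the function names, the German dependency labels `subj` and `obja`, and the toy tree (built from the words mentioned for Figure 1) are assumptions, and the strict check here accepts the two nouns in either slot.

```python
def ancestors(heads, idx, max_steps):
    """Indices of up to max_steps ancestors of token idx, nearest first."""
    path = []
    while len(path) < max_steps and heads[idx] != -1:
        idx = heads[idx]
        path.append(idx)
    return path

def strict_trigger(heads, deprels, n1, n2):
    """Setting 1: both nouns are subject and direct object of the same verb."""
    same_head = heads[n1] != -1 and heads[n1] == heads[n2]
    if same_head and {deprels[n1], deprels[n2]} == {"subj", "obja"}:
        return heads[n1]
    return None

def loose_trigger(heads, n1, n2, max_dist=3):
    """Setting 2: the direct parent of the first noun must be an ancestor
    of the second noun, at most max_dist steps up in the tree."""
    parent = heads[n1]
    if parent != -1 and parent in ancestors(heads, n2, max_dist):
        return parent
    return None

# Toy tree: Gentrifizierung(0) führt(1) zu(2) Problemen(3)
lemmas = ["Gentrifizierung", "führen", "zu", "Problem"]
heads = [1, -1, 1, 2]
deprels = ["subj", "root", "pp", "pn"]

strict_hit = strict_trigger(heads, deprels, 0, 3)
loose_hit = loose_trigger(heads, 0, 3)
```

For this tree the strict setting finds no trigger, while the loose setting returns the index of führen, which is reached from Problem in two steps (via zu).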
While the first setting is heavily biased towards transitive verbs that are causal triggers, setting 2 will also detect instances where the causal trigger is a noun, as in (9). In addition, we are also able to find support verb constructions that trigger causality, as in (10).
As both the word alignments and the dependency parses have been created automatically, we can expect a certain amount of noise in the data. Furthermore, we also have to deal with translation shifts, i.e. sentences that have a causal meaning in English but not in the German translation. A case in point is example (11) where the English cause has been translated into German by the non-causal stattfinden (take place) (12).
Using the approach outlined above, we want to identify new causal triggers to populate the lexicon. We also want to identify causal instances for these triggers for annotation, to be included in our resource. To pursue this goal and to minimize human annotation effort, we are interested in i) how many German causal verbs can be identified using this method, and ii) how many false positives are extracted, i.e. instances that cannot have a causal reading. Both questions have to be evaluated on the type level. In addition, we want to know iii) how many of the extracted candidate sentences are causal instances. This has to be decided on the token level, for each candidate sentence individually.

Results for extracting causal relations from parallel text
Step 1 Using the approach described in section 4.2, we extracted all German noun pairs from Europarl that were linked to two nouns in the English part of the corpus that filled the argument slots of the verb cause. Most of the noun pairs appeared only once, 12 pairs appeared twice, 3 pairs occurred 3 times, and the noun pair Hochwasser (floodwater) - Schaden (damage) was the most frequent one with 6 occurrences. In total, we extracted 343 unique German noun pairs from Europarl that we used as seed data to identify causal triggers in step 2.
We found 45 different verb types that linked these noun pairs, the most frequent one being, unsurprisingly, verursachen (cause) with 147 instances. Also frequent were other direct translations of cause, namely hervorrufen (induce) and auslösen (trigger), both with 31 instances, and anrichten (wreak) with 21 instances. We also found highly ambiguous translations like bringen (bring, 18 instances) and verbs that often appear in support verb constructions, like haben (have, 11 instances), as illustrated below (examples (13) and (14)).
Please note that at this point we ignore the verbs and only keep the noun pairs, to be used as seed data for the extraction of causal triggers in step 2. From examples (13) and (14) above, we extract the following two noun pairs:
Step 2 Using the 343 noun pairs extracted in step 1, we now search the monolingual part of the corpus and extract all sentences that include one of these noun pairs as arguments of the same verb. As a result, we get a list of verbal triggers that potentially have a causal reading. We now report results for the two different settings, strict and loose.
For setting 1, we harvest a list of 68 verb types. We manually filtered the list and removed instances that did not have a causal reading, amongst them most of the instances that occurred only once, such as spielen (play), schweigen (be silent), zugeben (admit), nehmen (take), finden (find).
Some of the false positives are in fact instances of causal particle verbs. In German, the verb particle can be separated from the verb stem. We did consider this for the extraction and contracted verb particles with their corresponding verb stem. However, sometimes the parser failed to assign the correct POS label to the verb particle, which is why we find instances e.g. of richten (rather than: anrichten, wreak), stellen (darstellen, pose), treten (auftreten, occur) in the list of false positives.
After manual filtering, we end up with a rather short list of 22 transitive German verbs with a causal reading for the first setting.
For setting 2 we loosen the constraints for the extraction and obtain a much larger list of 406 unique trigger types. As expected, the list also includes more noise, but is still manageable for a manual revision in a short period of time. As shown in Table 1, after filtering we obtain a final list of 79 causal triggers, out of which 48 follow the transitive pattern <N1_subj causes N2_dobj> where the subject expresses the cause and the direct object the effect. There seem to be no restrictions on what grammatical function can be expressed by what causal role but we find strong selectional preferences for the individual triggers, at least for the core arguments (Table 2). The verb verursachen (cause), for example, expresses CAUSE/ACTOR as the subject and EFFECT as the direct object while abhängen (depend) puts the EFFECT in the subject slot and realises the CAUSE as an indirect object. Often additional roles are expressed by a PP or a clausal complement. While many triggers accept either CAUSE or ACTOR to be expressed interchangeably by the same grammatical function, there also exist some triggers that are restricted to one of the roles. Zu Grunde liegen (be at the bottom of), for example, does not accept an ACTOR role as subject. These restrictions will be encoded in the lexicon, to support the annotation.

Annotation and inter-annotator agreement
From our extraction experiments based on parallel corpora (setting 2), we obtained a list of 79 causal triggers to be included in the lexicon. As we also want to have annotated training data to accompany the lexicon, we sampled the data and randomly selected N = 50 sentences for each trigger. We then started to manually annotate the data. The annotation process includes the following two subtasks:
1. Given a trigger in context, does it convey a causal meaning?
2. Given a causal trigger, which role labels apply to the first noun (N1) and the second noun (N2)?
What remains to be done is the annotation of the causal type of the instance. As noted above, the reason for postponing this annotation step is that we first wanted to create the lexicon and be confident about the annotation scheme. A complete lexicon entry for each trigger specifying the type (or types and/or constraints) will crucially support the annotation and make it not only more consistent, but also much faster.
So far, we computed inter-annotator agreement on a subsample of our data with 427 instances (and 22 different triggers), to get a first idea of the feasibility of the annotation task. The two annotators are experts in linguistic annotation (the two authors of the paper), but could not use the lexicon to guide their decisions, as this was still under construction at the time of the annotation.
We report agreement for the following two subtasks. The first task concerns the decision whether or not a given trigger is causal. Here the two annotators obtained a percentage agreement of 94.4% and a Fleiss' κ of 0.78.
An error analysis reveals that the first annotator had a stricter interpretation of causality than annotator 2. Both annotators agreed on 352 instances being causal and 51 being non-causal. However, annotator 1 also judged 24 instances as non-causal that had been rated as causal by annotator 2. Many of the disagreements concerned the two verbs bringen (bring) and bedeuten (mean) and were systematic differences that could easily be resolved and documented in the lexicon and annotation guidelines, e.g. the frequent support verb construction in example (15).
For the second task, assigning role labels to the first (N1) and the second noun (N2), it became obvious that annotating the role of the first noun is markedly more difficult than for the second noun (Table 3). The reason for this is that the Actor-Cause distinction that is relevant to the first noun is not always a trivial one. Here we also observed systematic differences in the annotations that were easy to resolve, mostly concerning the question whether organisations such as the European Union, a member state or a commission are to be interpreted as an actor rather than as a cause.
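The agreement figures for the first subtask can be recomputed from the counts given above. The sketch below assumes that no instance was rated causal by annotator 1 and non-causal by annotator 2, which follows from 352 + 51 + 24 = 427; the function name and argument names are illustrative. For two annotators, Fleiss' κ estimates chance agreement from the label distribution pooled over both annotators.

```python
def agreement_two_raters(both_yes, both_no, only_a1_yes, only_a2_yes):
    """Observed agreement and Fleiss' kappa for two raters and two labels."""
    n = both_yes + both_no + only_a1_yes + only_a2_yes
    observed = (both_yes + both_no) / n
    # Pooled proportion of "causal" labels over all 2 * n judgements.
    p_yes = (2 * both_yes + only_a1_yes + only_a2_yes) / (2 * n)
    p_no = 1 - p_yes
    expected = p_yes ** 2 + p_no ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Counts from the error analysis; the zero cell is inferred (see lead-in).
observed, kappa = agreement_two_raters(
    both_yes=352, both_no=51, only_a1_yes=0, only_a2_yes=24)
print(f"agreement = {observed:.1%}, kappa = {kappa:.2f}")
# prints: agreement = 94.4%, kappa = 0.78
```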
We think that our preliminary results are promising and confirm the findings of Dunietz et al. (2015), and expect an even higher agreement for the next round of the annotations, where we also can make use of the lexicon.

Discussion
Section 4 has shown the potential of our method for identifying and extracting causal relations from text. The advantage of our approach is that we do not depend on the existence of precompiled knowledge bases but rely on automatically preprocessed parallel text only. Our method is able to detect causal patterns across different parts of speech. Using a strong causal trigger and further constraints for the extraction, such as restricting the candidate set to sentences that have a subject and direct object NP that is linked to the target predicate, we are able to guide the extraction towards instances that, to a large degree, are in fact causal. In comparison, Girju reported a ratio of 0.32 causal sentences (2,101 out of 6,523 instances) while our method yields a ratio of 0.74 (787 causal instances out of 1,069). Unfortunately, this also reduces the variation in trigger types and is thus not a suitable method for creating a representative training set. We address this problem by loosening the constraints for the extraction, which allows us to detect a high variety of causal expressions, at a reasonable cost.
Our approach, using bilingual data, provides us with a natural environment for bootstrapping. We can now use the already known noun pairs as seed data, extract similar nouns to expand our seed set, and use the expanded set to find new causal expressions. We will explore this in our final experiment.

Bootstrapping causal relations
In this section, we want to generalise over the noun pairs that we extracted in the first step of the extraction process. For instance, given the noun pair {smoking, cancer}, we would also like to search for noun pairs expressing a similar relation, such as {alcohol, health problems} or {drugs, suffering}. Accordingly, we call this third setting boost. Sticking to our knowledge-lean approach, we do not make use of resources such as WordNet or FrameNet, but instead use word embeddings to identify similar words. For each noun pair in our list, we compute cosine similarity to all words in the embeddings and extract the 10 most similar words for each noun of the pair. We use a lemma dictionary extracted from the TüBa-D/Z treebank (release 10.0) (Telljohann et al., 2015) to look up the lemma forms for each word, and ignore all words that are not listed as a noun in our dictionary. Table 4 shows the 10 words in the embedding file that have the highest similarity to the target noun Unsicherheit (uncertainty). To minimise noise, we also set a threshold of 0.75 and exclude all words with a cosine similarity below that score. Having expanded our list, we now create new noun pairs by combining noun N1 with all similar words for N2, and N2 with all similar words for N1. We then proceed as usual and use the new, expanded noun pair list to extract new causal triggers the same way as in the loose setting. As we want to find new triggers that have not already been included in the lexicon, we discard all verb types that are already listed.
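The expansion of the seed list can be sketched as follows. This is a simplified illustration, not the code used for the experiments: the two-dimensional toy vectors and the small noun set are invented, the set `nouns` stands in for the lookup in the lemma dictionary (no lemmatisation is performed here), and the function names are hypothetical. Only the parameters k = 10 and the threshold of 0.75 come from the description above.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(word, embeddings, k=10, threshold=0.75):
    """The k nearest neighbours of word, minus those below the threshold."""
    scored = [(cosine(embeddings[word], vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)
    return [other for score, other in scored[:k] if score >= threshold]

def expand_pairs(pairs, embeddings, nouns, k=10, threshold=0.75):
    """Combine N1 with neighbours of N2, and N2 with neighbours of N1."""
    new_pairs = set()
    for n1, n2 in pairs:
        sim1 = [w for w in most_similar(n1, embeddings, k, threshold) if w in nouns]
        sim2 = [w for w in most_similar(n2, embeddings, k, threshold) if w in nouns]
        new_pairs.update((n1, w) for w in sim2)
        new_pairs.update((w, n2) for w in sim1)
    return new_pairs - set(pairs)

# Toy data, invented for illustration only.
embeddings = {
    "Rauchen": [1.0, 0.0], "Alkohol": [0.9, 0.1], "laufen": [0.95, 0.05],
    "Krebs": [0.0, 1.0], "Krankheit": [0.2, 0.9], "Haus": [0.7, 0.7],
}
nouns = {"Rauchen", "Alkohol", "Krebs", "Krankheit", "Haus"}

expanded = expand_pairs([("Rauchen", "Krebs")], embeddings, nouns)
```

In the toy data, laufen is a close neighbour of Rauchen but is dropped because it is not a noun, and Haus is dropped because its similarity to either seed noun falls below 0.75.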
Using our expanded noun pair list for extracting causal triggers, we obtain 131 candidate instances for manual inspection. As before, we remove false positives due to translation shifts and to noise and are able to identify 21 new instances of causal triggers, resulting in a total number of 100 German verbal triggers to be included in the lexicon (Table 1).

Conclusions and Future Work
We have presented a first effort to create a resource for describing German causal language, including a lexicon as well as an annotated training suite. We use a simple yet highly efficient method to detect new causal triggers, based on English-German parallel data. Our approach is knowledge-lean and succeeded in identifying and extracting 100 different types of causal verbal triggers, with only a small amount of human supervision.
Our approach offers several avenues for future work. One straightforward extension is to use other English causal triggers like nouns, prepositions, discourse connectives or causal multiword expressions, to detect German causal triggers with different parts of speech. We would also like to further exploit the bootstrapping setting, by projecting the German triggers back to English, extracting new noun pairs, and going back to German again. Another interesting setup is triangulation, where we would include a third language as a pivot to harvest new causal triggers. The intuition behind this approach is that if a causal trigger in the source language is aligned to a word in the pivot language, and that again is aligned to a word in the target language, then it is likely that the aligned token in the target language is also causal. Such a setting gives us grounds for generalisations while, at the same time, offering the opportunity to formulate constraints and filter out noise.