Experiments on bridging across languages and genres

In this paper, we introduce a typology of bridging relations applicable to multiple languages and genres. After discussing our annotation guidelines, we describe annotation experiments on the German part of our parallel coreference corpus and show that our inter-annotator agreement results are reliable, considering both antecedent selection and relation assignment. In order to validate our theoretical model on other languages, we manually transfer German annotations to the English and Russian sides of the corpus and briefly discuss first results that suggest the promise of our approach. Furthermore, for the complete exploration of extended coreference relations, we exploit an existing near-identity scheme to augment our annotations with near-identity links, and we report on the results.


Introduction
High-quality coreference resolution is necessary to establish coherence in discourse. Now that large-scale annotation efforts for identity coreference such as OntoNotes (Hovy et al., 2006) are available, it is becoming more interesting to investigate understudied coreference relations other than identity, namely near-identity and bridging.
Bridging relations are indirect relations that can only be inferred based on the knowledge shared by the speaker and the listener. They encompass a wide range of relations between anaphor and antecedent, such as part-whole or set membership. Additional complexity arises when two expressions refer to "almost" the same thing, but are neither identical nor non-identical. In this case, we speak of near-identity, which can be seen as a 'middle ground' between identity and non-identity coreference (Recasens et al., 2010).
The goals of the paper are: (i) to introduce a typology of extended coreference relations based on the related work and experimental annotation rounds; (ii) to validate our theoretical model by applying it to a multilingual and multi-genre corpus; and (iii) to explore the existing near-identity scheme using the same dataset. Our primary interest lies in developing a domain-independent typology that would serve as a basis for subsequent creation of larger annotated resources for different languages and domains.
The paper is organized as follows: Section 2 summarizes previous efforts to classify bridging and near-identity relations. Section 3 presents our corpus annotation in detail. Section 4 discusses the results, and Section 5 concludes.

Previous annotation efforts
Bridging. The concept of bridging was initially introduced by Clark (1975), who postulated that a definite description can be implicitly related to some previously mentioned entity. Clark makes a distinction between direct reference and indirect reference.
Direct reference is what we usually understand by identity coreference, when two NPs share the same referent in the real world. What we are interested in (and what is called 'bridging' in the coreference literature, as opposed to the identity relation) is indirect reference. Clark names 3 classes of indirect reference: (i) indirect reference by association, (ii) indirect reference by characterization, and (iii) a separate group encompassing reasons, causes, consequences and concurrences.
Since we only deal with noun phrase coreference for the time being, we cannot make use of the last group, as the antecedent in that case is often an event, not an object. The first two groups have much in common: they are subdivided into necessary and optional parts and roles, respectively, e.g.:
The difference between the two examples is that in (1a) the attackers is an absolutely necessary role of the mentioned event, while from (1b) we can infer that the office has one windowsill (which is not necessarily true of all offices). Necessary and optional components of entities or events vary in their predictability by the listener from absolutely necessary to quite unnecessary (Clark lists three levels of 'necessity' on this continuum).
The recent approaches to the annotation of bridging derive from two different annotation frameworks. First, bridging can be annotated as a part of the information structure (IS) of texts, along with other information status categories. Second, bridging can be seen as a separate category of textual coreference, besides identity and near-identity coreference. We will deal with bridging on the coreference level, but we consider both approaches in the review of the related work.
Bridging at the IS level. Bridging is an individual subcategory among other categories of information status, as introduced in the work of Nissim et al. (2004), subsequently enhanced and applied by Gardent et al. (2003), Ritz et al. (2008), Riester et al. (2010) and Markert et al. (2012). Results are usually reported for the entire scheme and are somewhat lower for the single categories. To our knowledge, the highest agreement for bridging anaphor recognition in particular (κ = 0.6-0.7) was reported by Markert et al. (2012), whose interpretation of bridging differs to some extent from the others (they do not restrict the annotation scope to definite noun phrases, allowing indefinite NPs to participate in bridging relations as well). However, all these approaches treat the bridging category as a whole, without making any distinctions between individual subcategories. Distinguishing between such subcategories is a more challenging task, and it is the one we are primarily interested in.
Bridging at the coreference level. Recent related literature distinguishes between the following most common types of bridging relations: part-whole, set membership and generalized possession (Poesio et al., 2004; Poesio and Artstein, 2008; Hinrichs et al., 2005). In addition to these, in the Prague Dependency Treebank, contrast was annotated as a bridging relation as well (Nedoluzhko et al., 2009). Baumann and Riester (2012) additionally annotated cases of bridging-contained NPs, where the bridging anaphor is anchored to an embedded phrase, e.g. [the ceiling of [the hotel room]].
However, these relations seem to be underspecified in the sense that part-whole is a very general relation; in contrast, we are interested in a more fine-grained classification of relations that could emerge from part-whole.
More specific relations are proposed in NLP approaches that extract bridging automatically. For example, a more complex and detailed classification of bridging relations was introduced by Gardent et al. (2003), who distinguished between 5 classes of bridging relations: set-membership, thematic (links an event to an individual via a thematic relation defined by the thematic grid of the event, e.g. murder - the murderer), definitional (the relation is given by the dictionary definition of either the target or the anchor, e.g. convalescence - the operation), co-participants, and non-lexical (the relation can be established due to discourse structure or world knowledge).
For developing a rule-based system to resolve bridging, Hou et al. (2014) used 8 relations that were based on related literature and their document set. The document set comprises 10 documents from the ISNotes corpus, which contains the Wall Street Journal portion of the OntoNotes corpus (Hovy et al., 2006). The 8 relations are:
• building - part (room - the roof);
• relative - person (the husband - she);
• geopolitical entity - job title (Japan - officials);
• role - organization;
• percentage NP (22% of the firms - 17%);
• set - member (reds and yellows - some of them);
• argument taking NP I (different instances of the same predicate in a document likely maintain the same argument fillers; Marina residents - some residents);
• argument taking NP II (an argument-taking NP in the subject position is a good indicator for bridging anaphora; Poland's first conference - the participants).
Bridging was shown to be a very complex category that poses difficulties for the annotators. It includes the following subtasks: (a) recognizing bridging anaphors and selecting their antecedents, and (b) assigning appropriate bridging types. In general, inter-annotator agreement for (a) tends to be lower than for standard identity coreference; the scores vary between 22% and 50% F1-score for selecting bridging anaphors and antecedents (Poesio and Vieira, 1998; Poesio, 2004; Nedoluzhko et al., 2009). As for types of relations, little has been reported recently. To our knowledge, only Nedoluzhko et al. (2009) reported scores for four basic relation types (average κ = 0.9), and we are not aware of any agreement studies for more complex relation sets.
In sum, corpus creation approaches to bridging classification are quite coarse-grained, while applied work (bridging resolution) tends to be very domain-specific. Both paths are rather problematic if we want to create reliable multi-genre annotated resources with a fine-grained classification of bridging relations.
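To make the two kinds of agreement figures quoted above concrete, the following is a minimal sketch of how they could be computed for two annotators: F1 over the sets of marked anaphors (or anaphor-antecedent pairs) and Cohen's κ over the relation types of the pairs both annotators agree on. The markable ids, relation labels and function names are hypothetical and serve only as an illustration; they are not taken from any of the cited studies.

```python
from collections import Counter


def f1_between_sets(items_a, items_b):
    """Symmetric F1 between the item sets marked by two annotators."""
    items_a, items_b = set(items_a), set(items_b)
    common = len(items_a & items_b)
    if common == 0:
        return 0.0
    precision = common / len(items_b)
    recall = common / len(items_a)
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations: anaphor id -> (antecedent id, relation type).
a1 = {"m3": ("m1", "part-whole"), "m5": ("m2", "part-whole"),
      "m9": ("m4", "set"), "m12": ("m10", "set"),
      "m15": ("m13", "part-whole")}
a2 = {"m3": ("m1", "part-whole"), "m5": ("m2", "part-whole"),
      "m9": ("m4", "set"), "m12": ("m10", "part-whole"),
      "m17": ("m16", "set")}

# Subtask (a): anaphor recognition and antecedent selection.
anaphor_f1 = f1_between_sets(a1.keys(), a2.keys())
pair_f1 = f1_between_sets({(k, v[0]) for k, v in a1.items()},
                          {(k, v[0]) for k, v in a2.items()})

# Subtask (b): relation types, only for pairs both annotators agree on.
agreed = sorted(k for k in a1.keys() & a2.keys() if a1[k][0] == a2[k][0])
type_kappa = cohens_kappa([a1[k][1] for k in agreed],
                          [a2[k][1] for k in agreed])

print(anaphor_f1, pair_f1, type_kappa)
```

Restricting κ to the agreed pairs means that the type agreement is conditional on subtask (a): disagreements about which anaphors exist never enter the κ computation.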
Near-identity. The concept of near-identity was introduced by Recasens et al. (2010). The near-identity relation is defined as a middle ground between identity and bridging, and it emerged out of the inter-annotator disagreements while annotating identity coreference. Near-identity holds between two NPs whose referents are almost identical, but differ in one crucial dimension. Recasens et al. (2010) introduce four main categories of near-identity relations:
• name metonymy;
• meronymy;
• class;
• spatio-temporal function.
Each of the categories includes several subcategories (not mentioned in the list above). To our knowledge, no large-scale near-identity annotation on different text genres has been done so far. Recasens et al. (2010) reported the results of their stability study only for pre-selected NP pairs. In a follow-up paper, Recasens et al. (2012) showed that explicit near-identity annotation is a very difficult task for the annotators, due to the infrequency of the near-identity links in their corpus of newswire texts, as identified by the annotators. The same annotation scheme was subsequently applied to annotate the Polish Coreference Corpus by Ogrodniczuk et al. (2014); however, the inter-annotator agreement scores were quite low (κ = 0.22).

Corpus annotation
For the annotation, we used the parallel coreference corpus of Grishina and Stede (2015), which consists of texts in three languages (English, German, Russian) and three different genres (newswire, narratives, medicine instruction leaflets). The German part of the corpus, which already contained identity coreference annotations, was given to the annotators to add bridging and near-identity links.
In order to evaluate the applicability of our annotation scheme for other languages and to speed up the annotation process, we transferred the German annotations to the English and Russian sides of the corpus.
Corpus statistics are shown in Table 1. In this section, we present statistics for German, including the number of identity, near-identity and bridging links. Details on the annotation transfer for the two other languages are provided in Section 4.

Bridging scheme
We base our work on the main principle identified by Clark (1975): We assume that the speaker intends the listener to be able to compute the shortest possible bridge from the previous knowledge to the antecedent which is therefore unique (determinate) in the natural language discourse.
Hence, only definite descriptions can be annotated as bridging anaphors. However, not all the definite descriptions that appear in a text for the first time have a bridging antecedent: some of them are definite due to the common knowledge shared by the speaker and the listener.
In our pilot experiments, we identified several bridging categories, which were common across genres, and applied them to annotate the corpus. Below, we describe these categories and give typical examples from different genres for each of them.

Physical parts - whole
One NP represents a physical part of the whole expressed by the other NP.
• the militant organisation - the offices in the whole country
• the telephone - the dial pad
• the knee - the bone

Set-membership
Sets can be represented by multiple entities or events. One can refer to a certain subset or to a single definite element of the set and bridge from this subset or element to the whole collection. We do not distinguish between sets and collections, as is done in some of the related work. Sets are homogeneous and imply that their elements are equal.
• these studies - the main study
• Pakistan major cities - the most populous city

Entity-attribute/function
An entity is a person or an object that has certain attributes characterizing it and certain functions it fulfills with respect to some other entity.
A. ENTITY-ATTRIBUTE
• Kosovo - their current policy of rejection
• Mrs. Humphries - the monotonous voice

B. ENTITY-FUNCTION
This relation involves a bridge holding between individuals with one of the related individuals being described by his profession or function with respect to the other (Gardent et al., 2003).
• Trends, the shop - Mr. Rangee, the owner
• Kosovo region - the government

Event-attribute
Core semantic frame elements of events are commonly time and place, while optional ones can include duration, participants, explanation, frequency etc. From these frame elements one can bridge to the event itself.
• the regional conflict - the trained fighters
• the attack - the security offices
• the surgical intervention - the operating room

Location-attribute
As locations we consider geographical entities that have permanent locations in the world. Such locations exhibit different semantic frames as compared to entities and events.
• the Balkans - the instability on the Balkans
• Germany - in the south
• Afghanistan - the population

Other
Other bridging relations (if any) that cannot be described using the categories presented above.
Bridging and near-identity relations are generally directed from right to left. Each markable can have only one outgoing relation, but multiple ingoing relations are allowed. Cataphoric bridging and near-identity relations (directed from left to right) are allowed if the cataphoric antecedent is semantically closer to the anaphor than the possible anaphoric antecedent. Following Baumann and Riester (2012), we annotated BRIDGING-CONTAINED NPs and marked them as such.
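The typology and the link constraints described above can be sketched as a small data structure. This is only an illustration under our own assumptions: the class and field names (BridgingType, Link, contained) and the markable ids are hypothetical and do not reflect the format of the actual annotation files.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class BridgingType(Enum):
    """The bridging categories of the scheme described above."""
    PART_WHOLE = "physical parts - whole"
    SET_MEMBERSHIP = "set-membership"
    ENTITY_ATTRIBUTE = "entity-attribute"
    ENTITY_FUNCTION = "entity-function"
    EVENT_ATTRIBUTE = "event-attribute"
    LOCATION_ATTRIBUTE = "location-attribute"
    OTHER = "other"


@dataclass(frozen=True)
class Link:
    source: str               # id of the anaphoric (or cataphoric) markable
    target: str               # id of the antecedent markable
    relation: BridgingType
    contained: bool = False   # True for bridging-contained NPs


def markables_with_multiple_outgoing(links):
    """Return ids of markables that violate the one-outgoing-link rule."""
    counts = Counter(link.source for link in links)
    return sorted(m for m, n in counts.items() if n > 1)


# Hypothetical markable ids; the comments name examples from the text.
links = [
    Link("m2", "m1", BridgingType.PART_WHOLE),          # the dial pad -> the telephone
    Link("m4", "m3", BridgingType.PART_WHOLE),          # the bone -> the knee
    Link("m6", "m5", BridgingType.LOCATION_ATTRIBUTE),  # the population -> Afghanistan
    Link("m2", "m7", BridgingType.OTHER),               # second link from m2: not allowed
]

violations = markables_with_multiple_outgoing(links)
print(violations)
```

Multiple ingoing relations need no check, since any number of links may share the same target.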

Near-identity scheme
We used the definitions provided by Recasens et al. (2010) and made an attempt to apply them to our texts. The annotators' goal was to extend existing annotations on top of the identity coreference. We only chose the four top categories mentioned in Section 2, without distinguishing among their subtypes. In order to disambiguate the category of meronymy, which is common to both near-identity and bridging, we introduced the principle of primacy, according to which, in case of doubt, identity was preferred over near-identity and near-identity over bridging. However, the annotations of our corpus exhibited a small number of near-identical markables, which was not sufficient to compute inter-annotator agreement. For that reason, we merged the annotations from the first and the second annotator; we analyse their distribution according to the near-identity types across genres in Section 4. It is worth pointing out that our results for a multi-genre corpus conform to the results obtained by Recasens et al. (2012).
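The principle of primacy amounts to a fixed preference order over relation kinds. A minimal sketch, assuming that an annotator's doubt is represented as a set of candidate relation names (a hypothetical representation chosen for illustration):

```python
# Preference order of the principle of primacy, strongest first.
PRIMACY = ("identity", "near-identity", "bridging")


def resolve_by_primacy(candidates):
    """Pick the preferred relation among those an annotator hesitates between."""
    for relation in PRIMACY:
        if relation in candidates:
            return relation
    raise ValueError("no known relation among the candidates")


# A meronymic pair that could be read as near-identity or as bridging.
choice = resolve_by_primacy({"bridging", "near-identity"})
print(choice)
```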

Bridging agreement study
We carried out an agreement study with 2 annotators (students of linguistics, native speakers of German, with prior experience in other types of corpus annotation tasks). All the markables in the texts were manually pre-selected by the author of this paper. The annotation guidelines were developed on 7 training documents, and 4 of them were given to the annotators for training. During the pilot annotation round, the annotators discussed the disagreements, and necessary changes to the guidelines were made. Inter-annotator agreement was measured on 5 documents.
Table 2 shows the distribution of the types of relations for the first (A1) and the second annotator (A2). We measured (i) F1-score for anaphor recognition (the number of common bridging anaphors) and antecedent selection (the number of common anaphor-antecedent pairs based on the commonly recognized markables) and (ii) Cohen's κ for individual categories for those pairs that both annotators agreed upon. Table 3 shows agreement results, which we consider reliable overall for bridging when compared to related work on extended coreference. We were able to achieve even higher agreement scores on bridging categories (average κ = 0.98), while introducing a wider range of relations than Nedoluzhko et al. (2009). We do not give an agreement score for set-membership because of data scarcity and the preference of A1 for other relations: A1 marked only about 0.1% of all bridging pairs as set-membership and did not agree on antecedent selection with A2 for any of them, so it was not possible to measure agreement for this category.
Table 4 shows the distribution of types for those pairs that were labelled differently by the two annotators. The most controversial category is entity-attribute/function, which correlates with this category [...]
To answer this question, we first looked at the number of bridging anaphors that actually start a new coreference chain further in the text.
On average for all the texts, only 17% of all the bridging anaphors are referred to later on. These chains are on average 3.28 markables long, which is about 0.8 markables shorter than the average length of coreference chains in the corpus (4.05). The most frequent relation that starts a new chain is entity-attribute/function (44%), followed by location-attribute (21%) and event-attribute (18%).
Secondly, we were interested in whether bridging markables correlate with the prominent coreference chains in the text. Our study showed that 56% of all the chains have bridging markables connected to them. We computed the average lengths of a target chain and a non-target chain for bridging, which are 6.1 markables and 2.4 markables, respectively. These numbers show that a target 'bridging' chain is usually longer than an average chain in the text (see above), while a 'non-bridging' chain is shorter. The longest 'bridging' chain reaches 22 markables, while the longest 'non-bridging' chain reaches only 9 markables.
We computed the correlation between the length of an identity chain and the number of bridging markables that are linked to this chain. Using Spearman's rank correlation coefficient, we found a strong correlation between the chain length and the number of bridging markables linked to the chain. Figure 1 shows the relation between the chain length and the number of its bridging markables.
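For readers unfamiliar with the measure, Spearman's rank correlation is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. The sketch below implements it in plain Python; the chain lengths and bridging counts are invented toy values, not figures from our corpus.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented toy data: one entry per identity chain.
chain_lengths = [2, 3, 5, 8, 12]
bridging_counts = [0, 1, 1, 4, 7]

rho = spearman(chain_lengths, bridging_counts)
print(round(rho, 3))
```

In practice a library routine such as scipy.stats.spearmanr would normally be used; the hand-written version only makes the ranking step explicit.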

How far can we bridge in the natural text?
Our guidelines do not limit the scope of the study at any point, allowing annotators to bridge back over an unlimited number of sentences if they find the antecedent semantically close to the anaphor. However, we postulated several principles in order to set priorities and help annotators resolve controversial issues, one of them being the principle of SEMANTIC RELATEDNESS: in case of multiple antecedent candidates, pick the one that is more semantically related to the anaphoric (or cataphoric) markable. This principle wins over the principle of PROXIMITY, according to which one has to bridge to the nearest semantically close antecedent in the text. For example:
(2) [The telephone] rang. I came into [the office] and picked up [the receiver].
In this case, we link the telephone to the office and the receiver to the telephone (because it is semantically closer), and not to the office (which is the nearer possible antecedent).
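The interplay of the two principles can be sketched as a sort key in which relatedness is compared first and proximity only breaks ties. In our annotation, relatedness is a human judgement; the numeric relatedness scores below are purely hypothetical stand-ins for that judgement, and the token distances are counted informally on example (2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    token_distance: int   # tokens between candidate and anaphor
    relatedness: float    # hypothetical stand-in for the annotator's judgement


def choose_antecedent(candidates):
    """SEMANTIC RELATEDNESS wins; PROXIMITY only decides between equals."""
    if not candidates:
        raise ValueError("no antecedent candidates")
    return max(candidates, key=lambda c: (c.relatedness, -c.token_distance))


# Antecedent candidates for "the receiver" in example (2).
candidates = [
    Candidate("the telephone", token_distance=12, relatedness=0.9),
    Candidate("the office", token_distance=5, relatedness=0.4),
]

best = choose_antecedent(candidates)
print(best.text)
```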
We computed the average bridging distance (anaphora + cataphora), which is 20.55 tokens for all texts, with the average sentence length being 24.87. The average distances for anaphora and cataphora, if computed separately, are 30.96 and -3.6 tokens, respectively. It is worth noting that the furthest bridging antecedent was found 410 tokens away from its anaphor. Finally, our study has shown that distance does not seem to correlate with prominence: both longer and shorter chains can have close and long-distance bridging anaphors.
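The sign convention behind these averages can be illustrated as follows: an antecedent that precedes the anaphor yields a positive distance, a cataphoric antecedent a negative one. The sketch assumes that distance is measured between the start-token offsets of the two markables, which is our simplification for illustration; the offsets are invented.

```python
from statistics import mean


def signed_distance(anaphor_start, antecedent_start):
    """Positive for anaphora (antecedent first), negative for cataphora."""
    return anaphor_start - antecedent_start


def distance_summary(pairs):
    """Average signed distance overall and per direction.

    `pairs` holds (anaphor_start, antecedent_start) token offsets.
    """
    distances = [signed_distance(a, b) for a, b in pairs]
    anaphoric = [d for d in distances if d > 0]
    cataphoric = [d for d in distances if d < 0]
    return {
        "all": mean(distances) if distances else None,
        "anaphora": mean(anaphoric) if anaphoric else None,
        "cataphora": mean(cataphoric) if cataphoric else None,
    }


# Invented token offsets: two anaphoric links and one cataphoric link.
pairs = [(40, 10), (55, 50), (12, 16)]
summary = distance_summary(pairs)
print(summary)
```

Because cataphoric distances are negative, the overall average is lower than the average over anaphoric links alone, which matches the pattern of the figures reported above.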

How transferable is bridging across languages and genres?
One of the main goals of our study was to introduce a classification of relations that could be applied to various languages and domains. In the following, we present (a) an analysis of the distribution of bridging and near-identity across different genres and (b) the results of the experiment on manual transfer of German annotations into English and Russian.
Different genres. Table 5 shows the percentage of near-identity and bridging in the German part of the corpus. Interestingly, all of the genres exhibit a large proportion of entity-attribute/function relations. However, in the newswire texts, other relations are almost equally distributed, as opposed to the medicine leaflets and the narratives. In narratives, we encountered many more part-whole relations than in the other genres.
As for near-identity, it is worth noting that the annotations of medical texts exhibited a very high percentage (71.43%) of spatio-temporal relations, the reason being the specificity of the texts (instruction leaflets). In narratives, we only found metonymic relations, while medical texts did not contain them. In the newswire texts, all types of relations were found, with meronymy being the most common one (76.32%).
Different languages. Taking German annotations as a starting point, we annotated the English and Russian sides of our parallel corpus. Table 6 shows the distribution of different types of relations for German, English and Russian. The resulting number of bridging anaphors for the English and Russian sides of the corpus is 188 each, which is about 44% of the total number of German bridging markables.
This 'transfer' of annotations across languages posed additional difficulties in some cases. In particular, it was more difficult to transfer existing German annotations in the newswire texts, while for the stories, all of the markables were successfully transferred. The majority of the NPs that could not be transferred fall into two groups: (a) NPs excluded by our restriction on the definiteness status of bridging markables; and (b) NPs that were already participating in identity chains. Below, we give examples for the first case in English and German:
In this example, we bridge from das Einkommen to den USA; however, in the English part income is indefinite and thus is not a bridging markable according to our guidelines.
For Russian, the lack of articles impeded the identification of bridging markables and made the decision on their definiteness much more complex. We applied the following strategy in doubtful cases in order to identify bridging markables: we used a substitution test, replacing the NP in question with the corresponding genitive NP. If the test succeeded, we considered the markable a bridging anaphor; otherwise the markable was not annotated. For example:
In this example, the door in English is definitely unique, while in Russian we need to apply our test first: дверь офиса (the door of the office) is appropriate in this case, hence there is a bridging relation between the two NPs.
The analysis of the resulting annotations has shown that our guidelines are in general applicable to the three languages in our corpus, even though there are some differences across languages and genres that we will investigate in more detail. In particular, the category of entity-attribute/function requires a more careful analysis.

Conclusions
The focus of this study was to explore extended coreference relations, namely near-identity and bridging. Our primary goal was to introduce a domain-independent typology of bridging relations that is applicable across languages. We subsequently applied our annotation scheme to a multilingual coreference corpus of three genres, and for near-identity relations we used the typology introduced in the related work. Our scheme achieves reliable inter-annotator agreement scores for anaphor and antecedent selection, and on the assignment of bridging relations. The infrequency of near-identity relations in our corpus leaves this part as a step for future work. We conducted a detailed analysis of the nature of bridging relations in the corpus, focusing on the distance between anaphor and antecedent. Furthermore, we examined the correlation between bridging and identity coreference and presented the distribution of bridging and near-identity relations across three different languages and genres.
In future work, we are interested in refining our typology by introducing a set of possible subrelations, conducting a more detailed comparative analysis of bridging relations across languages using annotation transfer, and exploring the set-membership relation and the category of near-identity in detail on a larger amount of texts. We intend to reconsider the definition of markables in our guidelines (which probably has to vary from language to language), as it was one of the main reasons for markables being missed in the annotation transfer. We aim to keep our approaches applicable to multilingual data and to different genres of text.
Our annotation guidelines and the annotated corpus will be made available via our website http://angcl.ling.uni-potsdam.de.