Bridging Relations in Polish: Adaptation of Existing Typologies

The paper presents an initial verification of existing approaches to the annotation of bridging relations by proposing a compiled model based on schemata used in previous annotation projects and testing its validity on a corpus of Polish. The categorization features structural relations, dissimilation, analogy, reference to label, class, entailment and attribution. Multiple categories can be assigned to model situations where several aspects of the relation play a part. The relations are organized hierarchically, which allows varied granularity of processing depending on computational needs. The classification is confronted with the existing annotation of other-than-identity relations in a portion of the Polish Coreference Corpus. Results of manual annotation involving two annotators and an adjudicator are presented. Findings from the process are intended to facilitate the development of annotation guidelines for a new reference-related project.


Introduction
The term bridging (bridging anaphora, indirect anaphora, associative anaphora) refers to relations between non-coreferential expressions that influence text coherence. In most cases these expressions are nominal (and we limit our analysis to such cases in this paper), although bridging between events can also be distinguished.
In this article we attempt to compile the existing taxonomies of bridging relations into a common model, validate it on corpus data and present findings from the process, which are intended to help develop annotation guidelines for a new project involving annotation of referential relations in Polish.

Related Work
Clark's classic classification of indirect implicature (Clark, 1975) lists set membership, indirect reference by association (necessary/probable/inducible parts), indirect reference by characterization (necessary/optional roles), reason, cause, consequence and concurrence.
Poesio and Artstein's annotation scheme for ARRAU (Poesio and Artstein, 2008) allows part-of, set-membership and converse relations, a choice that probably results from the successful annotation of a similarly limited number of relations in the GNOME (Poesio, 2000) and VENEX (Poesio et al., 2004) corpora. The solution is similar to Recasens' annotation of the CESS-ECE corpus (Recasens et al., 2007), which uses three basic relations and a rest type with no further subtype specification.
The Greek Coreference and Bridging Team's annotation guidelines (GCBT: Greek Coreference & Bridging Team, 2014) use contrast, possession-owner, two predicate relations, entity-property and object-function apart from the traditional set-subset and part-whole relations. Other relations (spatial, temporal, generic-specific, thematic or situational association) are represented as rest.

Compilation of Typology of Bridging Relations
The proposed initial classification unifying existing approaches is depicted in Figure 1. Each main branch represents the intended relation type; leaf relations are specified as examples only.

Metareference
This relation models links such as has-model, has-name or has-label. It covers, for example, the meta-linguistic reference of the Prague Dependency Treebank (PDT), a subtype of its non-cospecifying anaphoric relation.
(1) 'I was yesterday in a restaurant called "Delicious Fish" but I didn't like their fish at all.'

Class
The class-instance relation, which some regard as having a privileged nature, is represented similarly to the standard part-whole or set-member relations, so that reference between a class and its instance can be modelled in a unified manner.
(2) Kobiety mają prawo do takiej wolności. Dlatego dobrze, by Ewa przekonała się, że nie wszystko musi być tak, jak było w rodzinnym domu. 'Women have the right to such freedom. It is all right then for Eve to get convinced that not everything must be as it was in her family home.'

Temporal Relation
The temporal relation will be used to represent near-identical temporal aspects of an object (e.g. 'pre-war Warsaw' and 'Warsaw of today'). Note that traditional temporal expressions, such as anaphoric references to the time when the antecedent situation takes place (e.g. 'this time' and an event; a subtype of PDT's non-cospecifying anaphoric relation), will not be marked as temporal bridging relations, because the current typology is restricted to nominal expressions.

Structural Relation
Structural or meronymic relations are probably the least controversial part of the taxonomy, starting with Clark's necessary/probable/inducible parts through standard aggregation (set-subset, set-element) and composition (whole-part) to relations introducing inseparability such as whole-portion (also called segment, e.g. 'cake/slice') or whole-substance (e.g. 'cake/flour'). A ready-to-use subclassification of meronymic relations can be found e.g. in (Winston et al., 1987).

Functional Relation
The basic function-object relation (as e.g. in PDT), causal relations discussed in the literature, Clark's necessary/optional roles and Gardent's thematic relation can all be interpreted as functional relations. Most of PDT's 'other' relations, such as location-resident, event-argument or author-work, are also regarded as functional.
Clark's indirect reference by characterization also falls into this category, though it is mostly used for events rather than objects.
The most interesting aspect of the functional relation is its correspondence with Recasens' near-identity (Recasens et al., 2012). In our opinion, weak near-identity cases such as representation (e.g. between a manuscript and its content printed in a book) should be modelled as functional relations.

Analogical Relation
Both similarity relations (signaled by expressions like 'such as') and contrast relations are intended to be marked as analogical.

Attribution
Attribution is a type introduced to represent relations between an object and someone's opinion on the object (i.e., what is believed, doubted etc.) or to indicate incomplete certainty about the nature of identity between two mentions. In most projects this relation is annotated as coreference, but in the general case (e.g., when several clashing opinions are represented in one discourse) such an approach seems inappropriate.
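Taken together, the main branches introduced in this section and the leaf relations named as examples can be sketched as a small tree. The following is only an illustrative sketch, not the project's actual data model: the dictionary layout, the identifiers `TYPOLOGY`, `coarsen` and `coarsen_all`, and the spelling of the labels are assumptions made for this example. It shows how a multi-label annotation could be reduced to the granularity of the main branches.

```python
# Illustrative sketch only. Branch names follow the subsection headings above;
# leaf names are the examples mentioned in the text, not a complete inventory.
TYPOLOGY = {
    "metareference": ["has-model", "has-name", "has-label"],
    "class": ["class-instance"],
    "temporal": [],
    "structural": ["set-subset", "set-element", "whole-part",
                   "whole-portion", "whole-substance"],
    "functional": ["function-object", "location-resident",
                   "event-argument", "author-work"],
    "analogical": ["similarity", "contrast"],
    "attribution": [],
}

# Reverse index: leaf relation -> main branch.
PARENT = {leaf: branch for branch, leaves in TYPOLOGY.items() for leaf in leaves}


def coarsen(label):
    """Map a label to its main branch; a branch name maps to itself."""
    if label in TYPOLOGY:
        return label
    return PARENT[label]  # raises KeyError for labels outside the sketch


def coarsen_all(labels):
    """Coarsen a multi-label annotation of one link to a set of branches."""
    return {coarsen(label) for label in labels}


# A hypothetical link annotated with two fine-grained categories at once.
print(sorted(coarsen_all({"set-element", "contrast"})))
```

Keeping the labels of a link as a set reflects the possibility of assigning multiple categories to one relation, while the reverse index makes switching between fine and coarse granularity a simple lookup.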

From Quasi-identity To Bridging Relations
The proposed classification was initially validated on the Polish Coreference Corpus (Ogrodniczuk et al., 2015, chapter 8). During its annotation, apart from marking direct identity-of-reference, annotators were asked to identify 'quasi-identity' relations, i.e. relations distorting or distinguishing properties of an object, metaphorical relations between substance and container, set-element relations and other relations not characterized by identity or non-identity. Over 5100 instances of such relations were marked, making a useful resource for corpus-based investigation of bridging.

Preliminary Corpus-based Verification
A random sample of 5% of the quasi-identity relations (255 instances) was reviewed to provide material for the evaluation of the proposed taxonomy. The process was carried out by two annotators previously involved in the classification of quasi-identity relations in the Polish Coreference Corpus. Cases incompatible with the current proposal of the typology were marked as 'other' with three subtypes: 1) coreference, for cases where the original annotators of the Polish Coreference Corpus mistakenly marked a direct coreferential relation as quasi-identity, 2) predicate, where the relation was used to link a mention with a predicate noun, and 3) error, for cases where no relation could reasonably be identified.
The results of this experiment are presented in Table 1. The observed inter-annotator agreement was 0.50 (Cohen's κ = 0.36), which indicates that the typology is not precise enough to be used efficiently in practice.
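For reference, Cohen's κ corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it for two annotators from their label sequences. The function names and the toy labels are invented for illustration and are not the annotation data of this study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Solve the kappa formula for p_e, given p_o and kappa."""
    return (p_o - kappa) / (1 - kappa)


# Toy example with invented labels (not data from the corpus).
annotator_1 = ["set", "set", "part", "function", "set", "other"]
annotator_2 = ["set", "part", "part", "function", "class", "other"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Applying the same formula to the reported figures (0.50 and 0.36) suggests a chance agreement of roughly 0.22; this is a back-calculation from rounded values, not a figure reported in the study.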
The prevailing share of structural relations (60%) is compatible with Gardent's findings (Gardent et al., 2003, Figure 5) where 52% of the investigated relations were of meronymic type.

Error Analysis
The probable causes of divergence in the annotation are: 1) overly broad annotation categories, 2) overly vague definitions of some categories, 3) too many unclassified phenomena, 4) confusion of coreference, near-identity and other semantic relations. Some categories distinguished at the beginning turned out to be too broad. Almost 44% of the examples were classified as belonging to the set category. At the same time, this category includes very diverse examples, which calls for its division into subcategories in the prospective annotation.
The definitions of the predicative and attribution classes were not clear enough, which led to confusion. Other difficult pairs were: class and set, class and function, class and meta.
In the proposed classification the category other was included for all doubtful examples. The annotation showed that too many examples were classified as other and that quite distinct categories can be discerned among them, such as causality, connection of content or dissimilation.
In some cases making the distinction between definiteness and indefiniteness is virtually impossible. For example, when a previous part of the text includes information on a merger of companies A and B and someone then comments that the idea of a merger of companies is cost-justified, it depends on interpretation whether the comment refers to this particular merger (in which case there is a composition relation between companies and A) or makes a general statement (in which case A is an instance of the companies referred to in the subsequent statement). Such cases are a frequent cause of disagreement in our annotation.
The data show numerous coreferential links, which are reported as other since only non-coreferential relations should be present in the annotated set. This can be explained by problems with distinguishing other-than-coreferential relations from different linguistic means of expressing proper coreference, particularly in the initial phase of the annotation. A commonly observed mistake was treating mentions from indirect speech as non-coreferential with their direct speech equivalents despite their identical reference targets.
The functional category calls for subclassification; several cases were commented on as being best defined by WordNet's entailment relation (e.g. to sleep is entailed by to snore); a few others were marked as metonymy (e.g. Ottawa meaning Canada, also confused with a simple part-whole relation).
The temporal category needs to be confronted with Recasens' near-identity, which defines more aspects of dissimilation. Figure 2 presents the revised version of the typology of bridging relations based on findings from the annotation process. Contextual dissimilation can be used when a different realization or representation is referred to in the process of refocusing (Fauconnier, 1994); entailment mostly covers effects and corresponds to the reason-cause relation (war-occupation, manure-smell, competition-result etc.), while function groups general role-casting relations such as place-inhabitant, writer-work etc.

The Revised Model
Within the most coarse-grained and abundantly represented aggregation subclass, several evident subcategories were identified: collection, group and hyponymy-hypernymy. Collections are ad hoc sets of generally unrelated objects, e.g. shopping items, while elements of a group are related, e.g. members of the same organization. Hyponymy-hypernymy covers collections of objects related by a common hypernym (e.g. animals vs. monkeys, elephants etc.).
Table 2 presents statistics of the different relations observed in the analyzed set (after adjudication and conversion of annotation results to the new typology).
Table 2: Post-adjudication statistics of bridging relations.

Transitivity of Facets
An important aspect of referential associations which does not seem to be covered by existing approaches is the transitivity of basic relations, i.e. the ability to maintain a more distant but still decodable relation than just an atomic link between a pair of referents.
To illustrate the case, Example 8 shows a mixture of aggregation and composition: the link between a set and a part of one element of that set is easy to understand yet reasonably complex: my sons → my son → my son's broken leg. Example 9 shows a similar mixture of functional relation and attribution.
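The chain my sons → my son → my son's broken leg can be sketched as a path through a small graph of atomic links. This is only an illustration under stated assumptions: the edge representation, the function name `relation_chain` and the labelling of each step (aggregation for set to element, composition for whole to part) are choices made for this example rather than part of the proposed annotation scheme.

```python
from collections import deque

# Atomic links as (antecedent, anaphor) -> relation; labels are assumptions
# based on the reading of the example given in the text.
LINKS = {
    ("my sons", "my son"): "aggregation",
    ("my son", "my son's broken leg"): "composition",
}


def relation_chain(source, target, links):
    """Return the relation labels along the shortest path, or None."""
    graph = {}
    for (src, dst), rel in links.items():
        graph.setdefault(src, []).append((dst, rel))
    queue = deque([(source, [])])
    seen = {source}
    while queue:  # breadth-first search over atomic links
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt, rel in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None


print(relation_chain("my sons", "my son's broken leg", LINKS))
```

A composite relation is then simply the sequence of atomic relation types along the path, which keeps the atomic inventory small while still allowing more distant associations to be decoded.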

Conclusions
The presented unified classification of bridging relations is intended as an initial step towards the annotation of referential relations on a larger scale. The typology covers only relations available in existing models and in preliminarily annotated data, but several other aspects of referentiality should be verified against the corpus, e.g. the issues of definiteness, negation or natural ambiguity.
The experiment confirmed that clear identification of types of bridging relations is a difficult task, particularly when fine-grained distinctions are introduced. This leads to the conclusion that shallow semantics is probably insufficient to describe such a complex phenomenon as reference. New annotation guidelines taking into account discourse structure, lexical-semantic models and extra-linguistic knowledge are currently under preparation.