Semantically Constrained Multilayer Annotation: The Case of Coreference

We propose a coreference annotation scheme as a layer on top of the Universal Conceptual Cognitive Annotation foundational layer, treating units in predicate-argument structure as a basis for entity and event mentions. We argue that this allows coreference annotators to sidestep some of the challenges faced in other schemes, which do not enforce consistency with predicate-argument structure and vary widely in what kinds of mentions they annotate and how. The proposed approach is examined with a pilot annotation study and compared with annotations from other schemes.


Introduction
Unlike some NLP tasks, coreference resolution lacks an agreed-upon standard for annotation and evaluation (Poesio et al., 2016). It has been approached using a multitude of different markup schemas, and the several evaluation metrics commonly used (Pradhan et al., 2014) are controversial (Moosavi and Strube, 2016). In particular, these schemas use divergent and often language-specific syntactic criteria for defining candidate mentions in text. This includes the questions of whether to annotate entity and/or event coreference, whether to include singletons, and how to identify the precise span of complex mentions. Recognition of this limitation in the field has recently prompted the Universal Coreference initiative,1 which aims to settle on a single cross-linguistically applicable annotation standard.
We think that many issues stem from the common practice of creating mention annotations from scratch on the raw or tokenized text, and we suggest that they could be overcome by reusing structures from existing semantic annotation, thereby ensuring compatibility between the layers. We advocate for the design pattern of a semantic foundational layer, which defines a basic semantic structure that additional layers can refine or make reference to. Some form of predicate-argument structure involving entities and propositions should serve as a natural semantic foundation for a layer that groups coreferring entity and event mentions into clusters.

(Contact: jakob@cs.georgetown.edu)
Footnote 1: https://sites.google.com/view/crac2019/
Here we argue that Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013) is an ideal choice, as it defines a foundational layer of predicate-argument structure whose main design principles are cross-linguistic applicability and fast annotatability by non-experts. To that end, we develop and pilot a new layer for UCCA which adds coreference information. 2 This coreference layer is constrained by the spans already specified in the foundational predicate-argument layer. We compare these manual annotations to existing gold coreference annotations in multiple frameworks, finding a healthy level of overlap.
Our contributions are:
• A discussion of multilayer design principles informed by existing semantically annotated corpora (§2).
• A semantically-based framework for mention identification and coreference resolution as a layer of UCCA (§3). Reusing UCCA units as mentions facilitates efficient and consistent multilayer annotation. We call the framework Universal Conceptual Cognitive Coreference (UCoref).
• An in-depth comparison to three other coreference frameworks based on annotation guidelines (§4) and a pilot English dataset (§5).

Figure 1: [...] (boxes). The context is that the speaker is posting a message to a forum in which she shares her own fears and asks for advice; you is coreferent with anyone else, and them refers back to the whole first scene.3 Circled nodes indicate semantic heads/minimal spans, as determined by following State (S) and Center (C) edges. In the third sentence, Advice please!, the addressee/adviser is a salient but implicit Participant (A), which is expressed with a remote (dashed) edge to a prior mention. Remaining categories are abbreviated as: H - Parallel Scene, P - Process, E - Elaborator, D - Adverbial, F - Function.

Background and Motivation
We first consider the organization of semantic annotations in corpora, arguing that UCCA's representation of predicate-argument structure should serve as a foundation for coreference annotations.

Approaches to Semantic Multilayering
A major consideration in the design of coreference annotation schemes, as well as of meaning representations generally, is what the relevant annotation targets are and whether they should be normalized across layers when the text is annotated for multiple aspects of linguistic structure. Should coreference be annotated completely independently of decisions about syntactic phrases and semantic predicate-argument structures? On the one hand, this decoupling of the annotations might absolve the coreference annotators from having to worry about other annotation conventions in the corpus. On the other hand, it is potentially a recipe for inconsistent annotations across layers, making it more difficult to integrate information across layers for complex reasoning in natural language understanding systems. Moreover, certain details of coreference annotation may be underdetermined, such that relying on other layers would save coreference annotators and guidelines developers from having to reinvent the wheel.

Footnote 3: Following UCCA's philosophy, we interpret both fears and them mainly as evoking the emotional state of having fears (i.e., "how did you get over them" ≈ "how did you get over being afraid"). This analysis abstracts away from the more direct reading as the specific objects of fear; but either way, the proper semantic head of the first sentence has to be fears (not have), and from our flexible minimum/maximum span policy it follows that any mention coreferring with fears automatically corefers with the whole scene. Further, we interpret both anyone else and you as referring to the unknown-sized set of audience members sharing the speaker's fears. Whereas you introduces a presupposition that this set is non-empty, this is not the case for the negative polarity item anyone else. Although questionable in terms of cohesion (as the presupposition created by you fails if the answer to the first question is 'no'), this is a typical phenomenon in conversational data and can be explained by recognizing that the second question is implicitly conditional: "If so, how did you get over them?"
We can examine existing semantic annotation schemes with regard to two closely related criteria: a) anchoring, i.e., the previously determined underlying structure (characters, tokens, syntax, etc.) that defines the set of possible annotation targets in a new layer; and b) modularity, the extent to which multiple kinds of information are expressed as separate (possibly linked) structures/layers, which may be annotated in different phases.

Massively multilayer corpora. A few corpora comprise several layers of annotation, including semantics, with an emphasis on modularity of these layers. One example is OntoNotes (Hovy et al., 2006), annotated for syntax, named entities, word senses, PropBank (Palmer et al., 2005) predicate-argument structures, and coreference. Another example is GUM (Zeldes, 2017), with layers for syntactic, coreference, discourse, and document structure. Both of these resources cover multiple genres. Different layers in these resources are anchored differently, as noted below.

Token-anchored. Many semantic annotation layers are specified in terms of character or token offsets. This is the case for UCCA's Foundational Layer (§2.2), FrameNet (Fillmore and Baker, 2009), RED (O'Gorman et al., 2016), all of the layers in GUM, and the named entity and word sense annotations in OntoNotes. Though the guidelines may mention syntactic criteria for deciding what units to semantically annotate, the annotated data does not explicitly tie these layers to syntactic units, and to the best of our knowledge the annotator is not constrained by the syntactic annotation.
Syntax-anchored. Semantic annotations explicitly defined in terms of syntactic units include: PropBank (such as in OntoNotes); and the coreference annotations in the Prague Dependency Treebank (PDT; Nedoluzhko et al., 2016). In addition, PDT's "deep syntactic" tectogrammatical layer, which is built on the syntactic analytic layer, can be considered quasi-semantic (Böhmová et al., 2003).
Transformed syntax. In other cases, semantic label annotations enrich skeletal semantic representations that have been deterministically converted from syntactic structures. One example is Universal Decompositional Semantics (White et al., 2016), whose annotations are anchored with PredPatt, a way of converting Universal Dependencies trees (Nivre et al., 2016) to approximate predicate-argument structures.
Sentence-anchored. The Abstract Meaning Representation (AMR; Banarescu et al., 2013) is an example of a highly integrative (anti-modular) approach to sentence-level meaning, without anchoring below the sentence level. AMR annotations take the form of a single graph per sentence, capturing a variety of kinds of information, including predicate-argument structure, sentence focus, modality, lexical semantic distinctions, coreference, named entity typing, and entity linking ("Wikification"). English AMR annotators provide the full graph at once (with the exception of entity linking, done as a separate pass), and do not mark how pieces of the graph are anchored in tokens, which has spawned a line of research on various forms of token-level alignment for parsing (e.g., Flanigan et al., 2014; Pourdamghani et al., 2014; Chen and Palmer, 2017; Szubert et al., 2018; Liu et al., 2018). Chinese AMR, by contrast, is annotated in a way that aligns nodes with tokens (Li et al., 2016).
Semantics-anchored. The approach we explore here is the use of a semantic layer as a foundation for a different type of semantic layer. Such approaches support modularity, while still allowing annotation reuse. A recent example of this approach is multi-sentence AMR (O'Gorman et al., 2018), which links together the previously annotated per-sentence AMR graphs to indicate coreference across sentences.

UCCA's Foundational Layer
UCCA is a coarse-grained, typologically-motivated scheme for analyzing abstract semantic structures in text. It is designed to expose commonalities in semantic structure across paraphrases and translations, with a focus on predicate-argument and other semantic head-modifier relations. Formally, each text passage is annotated with a directed acyclic graph (DAG) over semantic elements called units. Each unit, corresponding to (anchored by) one or more tokens, is labeled with one or more semantic categories in relation to a parent unit.
The foundational layer4 specifies a DAG structure organized in terms of scenes (events/situations mentioned in the text). This can be seen for three sentences in figure 1, where each corresponds to a Parallel Scene (denoted by the category label H) as three events are presented in sequence. A scene unit is headed by a predicate, which is either a State (S), like these fears, or a Process (P), like get over. Most scenes have at least one Participant (A), typically an entity or location (in this case, the individuals experiencing fear). Semantic refinements of manner, aspect, modality, negation, causativity, etc. are marked with the category Adverbial (D). Time (T) is used for temporal modifiers. Within a non-scene unit, the semantic head is marked Center (C), while semantic modifiers are Elaborators (E). Function (F) applies to words considered to add no semantic content relevant to the scene structure.
Some additional structural properties are worthy of note. An unanalyzable unit indicates that a group of tokens forms a multiword expression with no internal semantic structure, like get over 'surmount'. A remote edge (reentrancy, shown as a dashed line in figure 1) makes it possible for a unit to have multiple parent units, such that the structure is not a tree. This is mainly used when a Participant is shared by multiple scenes. Texts are annotated in passages generally larger than sentences, and remote edges may cross sentence boundaries—for example, when a Participant mentioned in one sentence is implicit in the next, such as you as the implicit advice-giver in the sentence Advice please!. Implicit units are null elements used when there is a salient piece of the meaning that is implied but not expressed overtly anywhere in the passage. (If the third sentence from figure 1 were annotated in isolation, the advice-giver would be represented by an implicit unit.)

Insufficiency of the Foundational Layer
In addition to the benefits of a semantic foundational layer for coreference annotation (§2.1), we point out how adding such a layer to UCCA would rectify shortcomings of the foundational layer. First and foremost, UCCA currently lacks any representation of "true" coreference, i.e., the phenomenon that two or more explicit units are mentions of the same entity. Second, though remote edges are helpful for indicating that a Participant is shared between multiple scenes, this is problematic if the referent is mentioned multiple times in the passage. Because the information that those mentions are coreferent is missing, the choice of which mention to annotate with a remote edge is underdetermined. This leads to multiple conceptually equivalent choices that are formally distinct, opening the way for spurious disagreements. For example, the implicit advice-giver in figure 1 could be marked equally well with a remote edge to anyone else instead of you, resulting in a structurally diverging graph (taking the presented analysis as the reference).5 And third, many other implicit relations relevant to coreference (e.g., implied common sense part/whole relations, via bridging) are not exposed in the foundational layer of UCCA. A layer that annotates identity coreference could be extended with such additional information in the future.

The UCoref Layer
The underlying hypothesis of this work is that the spans of words that form referring expressions, i.e., evoke or point back to entities and events in the world, are also grouped as semantic units in the foundational layer of UCCA. This assumption is motivated by the fundamental principles of UCCA as a neo-Davidsonian theory: The basic elements of a discourse are descriptions of scenes (≈ events), and their basic elements are participants (≈ entities). We can thus automatically identify scene and participant units as referring. With this high-precision preprocessing and a small set of simple guidelines for identifying other UCCA units as referring, the process of mention identification in UCoref is very efficient. Figure 2 illustrates how UCoref interacts with the foundational layer. Four referents and six mentions (two singletons) are identified based on the criteria below.

Scene and Participant units. The vast majority of referent mentions can be identified by two simple rules: 1) All scene units are considered mentions, as they constitute descriptions of actions, movements, or states as defined in the foundational layer guidelines. 2) Similarly, all Participant units are considered mentions, as they describe entities that are contributing to or affected by a scene/event (including locations and other scenes/events).
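The two rules amount to a simple traversal of the foundational-layer graph. The sketch below illustrates this with an invented, minimal `Unit` representation; the actual UCCA data format and tooling differ, so this is a conceptual sketch rather than working UCoref preprocessing code:

```python
from dataclasses import dataclass, field

# Illustrative stand-in for a UCCA foundational-layer unit; the real UCCA
# toolkit uses a richer API. `label` is the category of the unit's incoming
# edge (e.g. "A", "P", "S", "H"), `tokens` is its anchoring token span.
@dataclass
class Unit:
    label: str
    tokens: tuple = ()
    children: list = field(default_factory=list)

SCENE_EVOKING = {"P", "S"}  # Process / State evoke a scene in their parent


def is_scene(unit):
    """A unit is a scene if one of its children is its main relation (P or S)."""
    return any(c.label in SCENE_EVOKING for c in unit.children)


def candidate_mentions(root):
    """Rule 1: all scene units are mentions.
    Rule 2: all Participant (A) units are mentions."""
    mentions = []
    stack = [root]
    while stack:
        u = stack.pop()
        if is_scene(u) or u.label == "A":
            mentions.append(u)
        stack.extend(u.children)
    return mentions
```

For the first scene of figure 1, this would mark both the scene unit and the Participant you as mentions, while the State evoker fears by itself would be left for the head-based minimum-span mapping rather than marked as a separate mention.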
Special attention should be paid to relational nouns like teacher or friend that both refer to an entity and evoke a process or state in which the entity generally or habitually participates. 6 According to the UCCA guidelines, these words are analyzed internally (as both P/S and A within a nested unit over the same span), in addition to the context-dependent incoming edge from their parent. However, the inherent scene (of teaching or friendship) is merely evoked, but not referred to, and it is usually invariant with respect to the explicit context it occurs in. Moreover, treating one span of words as two mentions would pose a significant complication. Thus, we consider these units only in their role as Participant (and not scene) mentions.
Non-scene-non-participant units. A certain subset of the remaining unit types are considered to be mention candidates. This subset comprises the categories Time, Elaborator, Relator, Quantity, and Adverbial. We give detailed guidelines for these categories, as well as for coreference markup, in the supplementary material (appendix A).
Center units. For simplicity, a referring unit with a single Center usually does not require its Center to be marked separately, as a unit always corefers with its Center (see §4 and §5.1 about how this relates to the min/max span distinction).
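Because a unit always corefers with its Center, a minimum span can be read off a maximum-span unit by descending through its head edges. The following sketch, under the assumption of a simple dict-based unit representation (not the official UCCA format), and with an illustrative analysis of the phrase since the end of 2005, shows the idea:

```python
# Hypothetical unit representation:
# {"cat": edge label, "tokens": [...], "children": [...]}

HEAD_CATS = ("C", "P", "S")  # Center, Process, State mark the semantic head


def minimum_span(unit):
    """Follow the single head child (Center or scene-evoker) downward;
    the tokens of the last head-less unit form the minimum span.
    Multi-Center units (coordination, partitives) stop the descent and
    would need the more nuanced treatment discussed in the text."""
    heads = [c for c in unit["children"] if c["cat"] in HEAD_CATS]
    if len(heads) == 1:
        return minimum_span(heads[0])
    return unit["tokens"]


# Illustrative analysis of "since the end of 2005" (Time unit):
end = {"cat": "C", "tokens": ["end"], "children": []}
np = {"cat": "C", "tokens": ["the", "end", "of", "2005"],
      "children": [{"cat": "F", "tokens": ["the"], "children": []},
                   end,
                   {"cat": "E", "tokens": ["of", "2005"], "children": []}]}
pp = {"cat": "T", "tokens": ["since", "the", "end", "of", "2005"],
      "children": [{"cat": "R", "tokens": ["since"], "children": []}, np]}
```

Here `minimum_span(pp)` descends through the two Center edges and yields the head token end, while the full token list of `pp` serves as the maximum span.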
Multi-Center units receive a different treatment: One use of multi-Center units is coordination, where each conjunct is a Center. Here we do want to mark up the conjuncts in addition to the whole coordination unit-provided the whole unit is referring by one of the other criteria-and assign them to separate coreference clusters. Another class of multi-Center units, which we call relative partitive constructions, is less straightforward to handle. Consider a phrase like the top of the mountain. The intuition given in the UCCA guidelines is that while the phrase is syntactically and, to some extent, semantically headed by top, it can only be fully understood in relation to mountain; thus, both words should be Centers. This construction is clearly less symmetric than coordination, but at this point we do not have a reliable way of formally distinguishing the two in preprocessing, purely based on the UCCA structure and categories. Thus, multi-Center units deserve a more nuanced manual UCoref analysis in future work; however, for the sake of consistency and simplicity, we treat all multi-Center units in the same way as we treat coordinations in our pilot annotation ( §5).
Implicit units. Implicit units may be identified as mentions and linked to coreferring expressions just like any other unit, as long as they meet the criteria outlined above.

Comparison with Other Schemes
The task of coreference resolution is far from trivial and has been approached from many different angles. Below we give a detailed analysis of the theoretical differences between three particular frameworks: OntoNotes (Hovy et al., 2006), GUM (Zeldes, 2017), and RED (O'Gorman et al., 2016).

Syntactic vs. semantic criteria. GUM and OntoNotes, despite not being anchored in syntax, specify syntactic criteria for mention and coreference annotation. The criteria in RED and UCoref, on the other hand, are fundamentally semantic: rough syntactic guidance is only given where appropriate and is at no time a decisive factor.

Minimum and maximum spans. The policy on mention spans is often one of two extremes: minimum spans (also called triggers or nuggets), which typically consist only of the head word or expression that sufficiently describes the type of entity or event; or maximum spans (also called full mentions), containing all arguments and modifiers. GUM and OntoNotes generally apply a maximum span policy for nominal mentions, with just a few exceptions.9 For verbal mentions, OntoNotes chooses minimum spans, whereas GUM annotates full clauses or sentences. RED always uses minimum spans, except for time expressions, which follow the TIMEX3 standard (Pustejovsky et al., 2010). One of the main advantages of UCoref is that the preexisting predicate-argument and head-modifier structures of the foundational layer allow a flexible and reliable mapping between minimum and maximum span annotations. Additionally, UCoref has 'null' spans, corresponding to implicit units in UCCA.10

Footnote 7: For event coreference specifically, see also EventCorefBank (ECB; Bejan and Harabagiu, 2010) and the TAC-KBP Event Track (Mitamura et al., 2015), which uses the ACE 2005 dataset (LDC2006T06; Doddington et al., 2004).
Footnote 8: A separate layer records all named entities, however, and non-coreferent ones can be considered singleton mentions.
Footnote 9: The GUM guidelines specify that clausal modifiers should not be included in a nominal mention.

Predication.
OntoNotes does not assert a coreference relation between copular arguments. 11 RED distinguishes several relation types depending on the "predicativeness" of the expression and in particular asserts a set-membership (i.e., non-identity) relation when the second argument is indefinite. In GUM, relation types are assigned based on different criteria, 12 and, depending on the polarity and modality of the copula, its arguments may be marked as coreferring mentions, even if they are indefinite. 13 A slightly different distinction is made in UCoref, where, thanks to the foundational layer, evokers of set-membership and attributive relations are marked as stative scenes in which the modified entity participates. Definite identity is handled in the same way as in RED, as well as relational nouns except for the special case of generic mentions (appendix A.2).
Apposition. In RED and OntoNotes, punctuation is considered a strict criterion for marking appositives, while GUM relies solely on syntactic completeness. In OntoNotes and GUM, ages specified after a person's name are considered separate appositional mentions, coreferring with the name mention they modify. UCoref takes advantage of UCCA's semantic Center-Elaborator structure, abstracting away from superficial markers like punctuation which may not be available in all genres and languages (details in appendix A.2).
Prepositions. Whereas OntoNotes and GUM stick to the syntactic notion of NPs, UCoref includes prepositions and case markers within mentions. This does not have a major effect on coreference, but contributes to consistency between languages that vary in the grammaticalization of their case marking.

Coordination. Our treatment of coordinate entity mentions is adopted and expanded from the GUM guidelines, where the span containing the full coordination is only marked up if it is antecedent to a plural pronominal mention. OntoNotes does not specify how coordinations in particular should be handled; while the guidelines state that out of head-sharing (i.e., elliptic) mentions only the largest one should be picked, we assume that coordinations of multiple explicitly headed phrases are not targeted as mentions in addition to the conjuncts. The minimum span approach of RED precludes marking full coordinations in addition to conjuncts.

Summary. That OntoNotes does not annotate singleton mentions makes it the most restrictive of the compared frameworks. Despite its emphasis on syntax, GUM is closer to our framework, as it includes singletons and marks full spans for non-singleton events; the marking of bridging relations, directed coreference links, and information status present in GUM is beyond our scope here. RED is conceptually closest to UCoref in marking all entity, time, and event mentions, except for the difference in span boundaries. This difference can largely be resolved, as we will show in §5.1.

Footnote 10: The coreference layer of the Prague Dependency Treebank (Nedoluzhko et al., 2016), quite similarly to the proposed framework, marks null mentions arising from control verbs, reciprocals, and dual dependencies (in general, null nodes arising from obligatory valency slot insertions into the tectogrammatical layer), the syntactic equivalents of implicit units and remote edges in UCCA. Further, in case the mention is the root of a nontrivial subtree, it is underspecified whether the mention spans only the root, the whole subtree, or some part of it.
Footnote 11: Neither do Poesio and Artstein (in the ARRAU corpus; 2008).
Footnote 12: In particular, the notion of bridging is interpreted differently between GUM and RED: GUM reserves it for entities that are expected (from world knowledge) to stand in some relationship (e.g., part/whole) with each other, which is reflected in a definite initial mention of the 'bridging target' (My car is broken; it's the motor). RED uses it for copular predications involving relational/occupational nouns like John is a/the killer, which are simple 'coref' (or 'ana'/'cata', if one mention is a pronoun) relations in GUM. We consider neither of these definitions in this work (see appendix A.2).
Footnote 13: See also Chinchor (1998).

Pilot Annotation
In order to evaluate the accessibility of the annotation guidelines given above and in appendix A, and facilitate empirical comparison with other schemes, we conducted a pilot annotation study.
We annotated a small English dataset consisting of subsets of the OntoNotes (LDC2013T19), RED (LDC2016T23), and GUM14 corpora with the UCCA foundational and coreference layers.15 The OntoNotes documents are taken from blog posts, the GUM documents are WikiHow instructional guides, and the RED documents are online forum discussions. Because each annotation was done by a single annotator and not reviewed, our results are to be understood as a proof of concept; measuring interannotator agreement will be necessary in the future to gauge the difficulty of the task and the quality of the guidelines/data.

Table 1 shows the distribution of tokens and UCCA foundational units, and table 2 compares the distribution of UCoref units with the respective "native" annotation schema for each corpus. About one third of all UCCA units are identified as mentions, in all corpora. The automatic candidate filtering based on UCCA categories simplifies this process for the annotator by removing about one third to one half of units. There are similar numbers of scene and Participant units (both of which are always mentions), but it is important to note that Participant units can also refer to events; this is reflected in the majority of referent units being event referents. We can also see that most of the referents in GUM, RED, and UCoref are in fact singletons, and the number of non-singleton referents is quite similar between each scheme and UCoref. Most implicit units and targets of remote edges are part of a non-singleton coreference cluster, which confirms the issue of spurious ambiguity we pointed out in §2.3.

Recovering Existing Schemes
Next we examine the differences in gold annotations between our proposed schema and existing schemas, and how we can (re)cover annotations in established schemas from our new schema. We can interpret this experiment as asking: If we had a perfect system for UCoref, could we use that to predict GUM/OntoNotes/RED-style coreference? And vice versa, if we had an oracle in one of those schemes, and possibly oracle UCoref mentions, how closely could we convert to UCoref?16

Exact mention matches. A naïve approach would be to look at the token spans covered by all mentions and reference clusters and count how often we can find an exact match between UCoref and one of the existing schemes.
In UCoref, we use maximum spans by default, but thanks to the nature of the UCCA foundational layer, minimum spans can easily be recovered from Centers and scene-evokers. For schemas with a minimum span approach, we can switch to a minimum span approach in UCoref by choosing the head unit of each maximum span unit as its representative mention. This works well between UCoref and RED, as they have similar policies for determining semantic heads, which is crucial for, e.g., light verb constructions. It would be problematic, however, when comparing to a minimum span schema that uses syntactic heads. For schemas with a non-minimum span approach, we keep only the maximum span units from UCoref and discard any heads that have been marked up representatively for their parent (e.g., as remote targets).

Fuzzy mention matches. Because our theoretical comparison in §4 exposed systematically diverging definitions of what to include in a mention span, we also apply an evaluation that abstracts away from some of these differences. We greedily identify one-to-one alignments for maximally overlapping mentions, as measured by the Dice coefficient over token spans,

dice(a, b) = 2|a ∩ b| / (|a| + |b|),

considering only pairs a ∈ A \ L_A, b ∈ B \ L_B, where L_A (L_B) records the mentions from annotation A (B) aligned thus far, and stopping when this score falls below a threshold µ. µ is a hyperparameter controlling how much overlap is required: µ = 1 corresponds to exact matches only, while µ = 0 includes all overlapping mention pairs as candidates (we report fuzzy match results for µ = 0). Once a mention is aligned, it is removed from consideration for future alignments. We align referents by the same procedure. Results are reported in table 3.
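The greedy alignment procedure above can be sketched compactly by representing each mention as a set of token indices. This is illustrative code under that simplifying assumption, not the evaluation script used for the paper:

```python
def dice(a, b):
    """Dice coefficient between two token-index sets."""
    return 2 * len(a & b) / (len(a) + len(b))


def greedy_align(mentions_a, mentions_b, mu=0.0):
    """Greedily align mention pairs in decreasing order of Dice overlap,
    one-to-one, stopping once the best remaining score falls below mu
    (mu=1.0: exact matches only; mu=0.0: any overlapping pair)."""
    pairs = [(dice(a, b), i, j)
             for i, a in enumerate(mentions_a)
             for j, b in enumerate(mentions_b)]
    aligned, used_a, used_b = [], set(), set()
    for score, i, j in sorted(pairs, key=lambda t: -t[0]):
        if score < mu or score <= 0:
            break
        if i in used_a or j in used_b:
            continue  # each mention may be aligned at most once
        aligned.append((i, j, score))
        used_a.add(i)
        used_b.add(j)
    return aligned
```

For example, aligning `[{0, 1, 2}, {4}]` against `[{1, 2}, {4}, {6}]` with the default µ = 0 pairs the exact match first (score 1.0) and then the partial overlap (score 0.8), leaving the non-overlapping mention unaligned.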

Findings
We can see in table 3 that UCoref generally covers between 60% and 80% of exact mentions in existing schemes ('R' columns); however, the proportion of UCoref units that are present in other schemes varies greatly, between 21.3% (OntoNotes) and 79.5% (RED; 'P' columns). This is generally expected based on our theoretical analysis in §4. Fuzzy matching has a great effect on the maximum span schemes in GUM and OntoNotes, resulting in up to 100% of mentions being aligned, and a lesser, but still positive effect on RED.17 We observe a similar trend for referent matches, which follows partly from the mismatch in mention annotation, and partly from diverging policies in marking coreference relations, as discussed above. Whether or not singleton event and/or entity referents are annotated has a major impact here. Below we give examples of sources of non-exact mention matches that can be resolved using fuzzy alignment.

Table 3: Exact (=) and fuzzy (≈) referent matches based on exact and aligned mentions between UCoref and GUM, OntoNotes, and RED. Precision (P) and recall (R) are measured treating gold UCoref annotation as the prediction and gold annotation in each respective existing framework as the reference. Italics indicate minimum UCoref spans are used. Implicit UCoref units are excluded from this evaluation, and children of remote edges are only counted once (for their primary edge).

GUM and OntoNotes. A phenomenon that is trivially resolvable using fuzzy alignments is punctuation, which is excluded from all UCoref units but included in GUM and OntoNotes. Another group of recovered mentions are prepositional phrases, where UCoref includes prepositions (to them, since the end of 2005) and GUM and OntoNotes do not (them, the end of 2005). As mentioned in §4, GUM deviates from its maximum span policy for clausal modifiers of noun phrases, which are stripped from the mention. Noun phrases modified in this way can be fuzzily aligned with the maximum spans in UCoref, even if the modifier is very long: people who are stuck on themselves intolerant of people different from them rude or downright arrogant (UCoref) gets aligned with people (GUM).
RED. Almost 80% of both RED and UCoref mentions match exactly, but there are some cases of divergence: 1) One subset of these are time expressions like this morning, where, as pointed out above, RED marks maximum spans. However, in UCoref these are internally analyzable-thus their Center will be extracted for minimum spans (here, morning). On the other hand, idiomatic multiword expressions (MWEs) such as verb-particle constructions (e.g., pass away 'die') are treated as unanalyzable in UCCA, but only the syntactic head (pass) is included in RED. 2) Also interesting are predicative prepositions and adverbials in copular or expletive constructions: there will be lots of good dr.s and nurses around. Here, UCoref chooses around as the (stative) scene evoker (and would mark the prepositional object as a participant, if it is explicit), while RED chooses the copula be. 3) UCCA treats some verbs as modifiers rather than predicates themselves: e.g., stopped in i m [sic] stopped feeling her move and it seemed in it seemed tom [sic] take forever. The former, as an aspectual secondary verb, is labeled Adverbial (D); the latter, which injects the perspective of the speaker, is labeled Ground (G). Since we do not generally consider these categories referring, these are not annotated as mentions in UCoref, though they are in RED.

Discussion
For the non-minimum span schemas GUM and OntoNotes, we can use a fuzzy mention alignment based on token overlap to find many pairs which aim to capture the same mention, only under different annotation conventions. RED is most similar to UCoref in defining what counts as a mention, though our corpus analysis showed that the notion of semantic heads is interpreted differently for certain constructions, where UCCA is more liberal about treating verbs as modifiers rather than heads. While counting fuzzy matches allows us to recover partially overlapping spans (time expressions, verbal MWEs), other phenomena (adverbial copula constructions, secondary verbs) have inconsistent policies between the two schemes that require more elaborate methods to align. We can thus, to some extent, use UCoref to predict RED-style annotations, with the additional gain of flexible minimum/maximum spans and cross-sentence predicate-argument structure for a whole document. Furthermore, we see that UCoref subsumes all OntoNotes mentions and nearly all GUM mentions, and is able to reconstruct coreference clusters in GUM with high recall.

Conclusion
We have defined and piloted a new, modular approach to coreference annotation based on the semantic foundational layer provided by UCCA. An oracle experiment shows high recall with respect to three existing schemes, as well as high precision with respect to the most similar of the three. We have released our annotations to fuel future investigations.
Relator (R) To illustrate, consider the two occurrences of that in the following example, both of which are Relators in UCCA: I didn't like that 1 he said the things that 2 he said.
Here, that 2 is an anaphoric reference to things, whereas that 1 is purely functional and thus should not be identified as referring. In English, the referring use corresponds to the syntactic category of relative pronouns.
Most R units, however, are non-referring expressions like prepositions, so identification of the few referring instances of Relators has to be done manually.
Quantity (Q) Partitive constructions like one of the 5 books contain mentions of two distinct referents: the 5 books and one of the 5 books. According to the v2 UCCA guidelines, these expressions are annotated as an Elaborator-Center structure with a remote edge. Such an annotation results in correct identification of the two mentions based on the guidelines given so far (by choosing the E unit and the whole X unit), without the need to identify the Quantifier (Q) unit one Q . However, in foundational layer annotations made under the UCCA v1 guidelines, the same phrase receives a flat structure (cf. the discussion of Centers above): [ one Q of R the F 5 Q books C ] X In this case, we choose the whole X unit as a mention of the one book (respecting semantics rather than morphology), and the Q unit 5 as a mention of the five books.
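To make the extraction of the two partitive mentions concrete, here is a toy sketch over a hand-built tree. The Unit class and all helper names are hypothetical illustrations, not part of any UCCA toolkit; function words (R, F) are skipped when reading off the Elaborator's content, mimicking minimum spans.

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    category: str              # UCCA edge label: C, E, Q, R, F, X, ...
    text: str = ""             # non-empty only for terminal units
    children: list = field(default_factory=list)

    def span(self):
        """Full surface string covered by this unit."""
        return self.text or " ".join(c.span() for c in self.children)

def content_span(unit):
    """Surface string without function words (Relators R, Function units F)."""
    if unit.text:
        return unit.text
    return " ".join(content_span(c) for c in unit.children
                    if c.category not in ("R", "F"))

def partitive_mentions(x_unit):
    """v2-style partitive: the whole unit ('one of the 5 books') and the
    content of its Elaborator ('5 books') are the two mentions."""
    mentions = [x_unit.span()]
    mentions += [content_span(c) for c in x_unit.children if c.category == "E"]
    return mentions

# "one of the 5 books" as an Elaborator-Center structure (remote edge omitted)
phrase = Unit("X", children=[
    Unit("C", "one"),
    Unit("E", children=[Unit("R", "of"), Unit("F", "the"),
                        Unit("Q", "5"), Unit("C", "books")]),
])
mentions = partitive_mentions(phrase)
```

The v1 flat structure would need the separate rule described above (whole X unit plus the Q unit 5), which this sketch does not cover.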
Adverbial (D) While most Adverbial units (D) are by default not considered to be referring (they describe secondary relations over events), in some cases D units can be identified as mentions (also see coordinated mentions in appendix A.2).
One such class comprises prepositional phrases like for another reason and in the majority of cases, which are annotated as D in the corpus, as they modify scenes, not entities.
Another class of Adverbial units that may be identified as referring are the so-called secondary verbs like help, want and offer, which, according to the UCCA guidelines, modify scenes evoked by primary verbs but do not themselves denote scenes. However, the relations they describe can sometimes serve as antecedents for coreference independently of the main scene: [ I A really D appreciated P that Ai ] .
In both examples, losing weight is the main scene according to UCCA; however, in the second example the object of appreciation is helping. Thus, we do mark secondary verbs as mentions, but only if they are referred back to in the way demonstrated above.

A.2 Resolving Coreference
Appositives. Appositives and titles co-occurring with (named) entity mentions are annotated as Elaborators in UCCA and are thus automatically included in the entity mention they modify. In addition, they should be marked as separate mentions, coreferring with the modified unit.
If a title or occupational noun occurs by itself or as a copular argument, we treat it as a relational noun as described in the next paragraph.
Coordinated mentions. A coordinated mention of a group of individuals, such as John, Paul, and Mary, evokes a referent that is distinct from the (possibly already evoked) referents of John, Paul, and Mary, respectively.
Extensional vs. intensional readings. Relational nouns (e.g., "the president [of Y]"), which are instantiated by a specific individual or a fixed-size set of individuals (e.g., "[X's] parents") at any given point in time, should usually be marked as coreferring with their instances, as inferred from context. This corresponds to an extensional (or set-theoretic) notion of reference: a distinct referent is identified by the individuals in which the concept manifests (its extension).
Only in clearly generic statements like
Remote edges. Different types of remote edges call for different coreference annotations. Non-head (i.e., non-Center, -State, or -Process) remote edges indicate that the same entity/scene modifies or participates in two (potentially also coreferent) unit mentions, namely its primary parent (or the primary parent of the unit it heads) and its remote parent. This corresponds to zero anaphora, or a "core" element that is implicit in one context and explicit in another. Head remote edges, however, merely indicate the category of entity/event that is shared between a full and an elliptic or anaphoric mention ("sense anaphora"; Recasens et al., 2016). E.g., books in "two of the 5 books" is category-shared between 5 books and two (books), which are separate non-coreferent mentions. Whether the primary and remote parent are coreferent or not is contingent on context.
Did anyone else have these fears ? How did you get over them ?
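The head vs. non-head distinction for remote edges amounts to a simple default rule, sketched below with hypothetical names: non-head remotes are linked automatically (zero anaphora), while head remotes are flagged for a context-dependent annotator decision (sense anaphora). This is an illustration of the policy, not an implemented component.

```python
# Head categories in UCCA: Center (C), State (S), Process (P).
HEAD_CATEGORIES = {"C", "S", "P"}

def remote_edge_default(category):
    """Default coreference decision for a remote edge of a given category.

    Non-head remotes mean the same referent fills a role in two contexts,
    so the two occurrences corefer by default. Head remotes only share the
    entity/event category, so coreference must be decided from context.
    """
    if category in HEAD_CATEGORIES:
        return "context-dependent"   # e.g., books in "two of the 5 books"
    return "coreferent"              # e.g., a remotely shared Participant (A)
```

A remote Participant (A) edge would thus be linked automatically, while a remote Center (C) edge, as in the partitive example, would be left to the annotator.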