Comparison by Conversion: Reverse-Engineering UCCA from Syntax and Lexical Semantics

Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other. To perform a systematic comparative analysis, we evaluate the mapping between meaning representations from different frameworks using two complementary methods: (i) a rule-based converter, and (ii) a supervised delexicalized parser that parses to one framework using only information from the other as features. We apply these methods to convert the STREUSLE corpus (with syntactic and lexical semantic annotations) to UCCA (a graph-structured full-sentence meaning representation). Both methods yield surprisingly accurate target representations, close to fully supervised UCCA parser quality—indicating that UCCA annotations are partially redundant with STREUSLE annotations. Despite this substantial convergence between frameworks, we find several important areas of divergence.


Comparing Meaning Representations
Several symbolic meaning representations (MRs) support human annotation of text with broad coverage (Abend and Rappoport, 2017; Oepen et al., 2019). To date, it is still not entirely clear, for each framework, which linguistic semantic phenomena it encodes and how its content compares to that of the others. It therefore behooves us to develop a firm linguistic understanding of MRs. In particular: are they merely a coarsening and rearrangement of syntactic information, such as is encoded in Universal Dependencies (UD; Nivre et al., 2016, 2020)? To what extent do they take lexical semantic properties into account? What does this suggest about the potential for exploiting simpler or better-resourced linguistic representations for improved MR parsing? Intuitively, we ask whether: sentence-level MR ?= syntax + lexical semantics. To address this question, we examine UCCA, a document-level MR often used for sentence-level semantics (see §2.1). Hershcovich et al. (2019) began to examine the relation of UCCA to syntax, contributing a corpus with gold-standard UD and UCCA parses, heuristically aligning them, and quantifying the correlations between syntactic and semantic labels. Conversely, Hershcovich et al. (2018) provided some initial evidence that other MRs can be brought to bear on the UCCA parsing task via multitask learning, but left the details of the relationship between representations to the latent (and opaque) parameters of neural models.
In this paper, we aim to close the gap between the two previous investigations by (1) building an interpretable rule-based system to convert from shallower representations (syntax and lexical semantic units/tags) into UCCA, forcing us to be linguistically precise about what UCCA captures and how it "decomposes"; and (2) training top-performing supervised parsers in a delexicalized setting with only syntactic and lexical semantic features, as a data-driven mapping corroborating the rule-based approach.
We perform our analysis on the Reviews section of the English Web Treebank (Bies et al., 2012), which has been manually annotated with UD and UCCA, and with STREUSLE for lexical semantics (§2). Although at present we only have the necessary evaluation data for English, the linguistic representations we examine have been applied to multiple languages (§2). Our approach can thus be applied crosslinguistically with minimal adaptation. Our contributions are:
• Delexicalized rule-based and supervised UCCA parsers, based only on syntax and lexical semantics.
• A linguistically motivated analysis of similarities and differences between the frameworks.

Representations under Consideration
The increasing interest in semantic representation and parsing, and the partial overlap in content between the different frameworks (Oepen et al., 2019), are the main motivation for our inquiry into the content differences between UD and STREUSLE on the one hand, and UCCA on the other. We expect our inquiry to be relevant to other schemes, both in developing a general methodology and in the insights gathered. For example, besides STREUSLE, UD also serves as the backbone of the DeComp scheme (White et al., 2016), so information about its semantic content is important there as well. Argument structural phenomena are at the heart of many MRs, providing further motivation for empirical studies of the extent to which lexical semantics and syntax can encode them.

Universal Conceptual Cognitive Annotation
Universal Conceptual Cognitive Annotation (Abend and Rappoport, 2013) targets a level of semantic granularity that abstracts away from syntactic paraphrases in a typologically motivated, cross-linguistic fashion (Sulem et al., 2015), building on Basic Linguistic Theory (Dixon, 2010, 2012), an influential framework for linguistic description. The scheme does not rely on language-specific resources and sets a low threshold for annotator training. Beyond syntactic paraphrases, UCCA encodes lexical semantic properties such as the aspectual distinction between states and processes (whether an event evolves in time or not).
UCCA has been applied to text simplification (Sulem et al., 2018b) and evaluation of text-to-text generation (Birch et al., 2016; Choshen and Abend, 2018; Sulem et al., 2018a). UCCA corpora are available for English, French and German, and pilot studies have been conducted on additional languages.
Here we summarize the principles and main distinctions in UCCA. In UCCA, an analysis of a text passage is a DAG (directed acyclic graph) over semantic elements called units. A unit corresponds to (is anchored by) one or more tokens, and is labeled with one or more semantic categories in relation to a parent unit. The principal kind of unit is a scene, denoting a situation mentioned in the sentence, typically involving a scene-evoking predicate, participants, and (perhaps) modifiers. Each predicate is labeled as either State (S) or Process (P). Figure 1 contains three scenes: one anchored by the Process took; one anchored by the Process a repair; and one anchored by the possessive pronoun our, which indicates a stative possession relation. A Participant (A) of a scene is typically an entity or location involved. Adverbials (D) modify scenes with respect to properties like negation, modality, causativity, direction, manner, etc., which do not constitute an independent situation or entity. Temporal modifiers are labeled Time (T).
Scenes in UCCA can relate to one another in one of three ways. A scene can serve as a Participant within a larger scene; a scene can serve to elaborate on a Participant within a scene (typically relative clauses); or scenes can be related by parallel linkage, in a unit that consists of Parallel Scenes (H) and possibly Linkers (L) describing how they are related. This is seen at the top level of figure 1, where the taking and repair scenes are parallel and the purposive for is a linker.
Other categories only apply to units with no predicate: a semantic head, the Center (C); modifiers of Quantity (Q); and other modifiers, called Elaborators (E). An Elaborator may itself be a scene, as in our vehicle, where the scene of possession elaborates on the vehicle entity. Similarly, blue vehicles would be analyzed with a stative scene of blueness that elaborates on the vehicles.
Apart from the main semantic content of scenes and participants, UCCA provides the following categories: Relator (R), for grammatical markers expressing how a unit relates to its parent unit (in English, these are mainly prepositions and the possessive 's); and Function (F), for other grammatical markers with minimal semantic content, such as tense auxiliaries, light verbs, and articles. Further categories are used for coordination and for expressions of speaker perspective outside the propositional structure of the sentence. Semantically opaque multi-word expressions (e.g., air conditioning in figure 1) are called unanalyzable units and are not analyzed internally. UCCA distinguishes primary edges, which always form a tree, from remote edges, which express reentrancies, such as the dotted edge from the possession scene unit to vehicle.

Universal Dependencies
UD is a syntactic dependency scheme used in many languages, aiming for cross-linguistically consistent and coarse-grained treebank annotation. Formally, UD uses bi-lexical trees, with edge labels representing syntactic relations. An example UD tree appears at the bottom of figure 1.

STREUSLE
STREUSLE (Supersense-Tagged Repository of English with a Unified Semantics for Lexical Expressions) is a corpus annotated comprehensively for several forms of lexical semantics (Schneider and Smith, 2015; Schneider et al., 2018). All kinds of multi-word expressions (MWEs) are annotated, giving each sentence a lexical semantic segmentation. Syntactic and semantic tags are then applied to individual units (single- and multi-word). The semantic tags are supersenses for noun, verb, and prepositional/possessive units. Preposition supersenses include two tiers of annotation: scene role labels represent the semantic role of the prepositional phrase marked by the preposition, and function labels represent the lexical contribution of the preposition in itself. The two labels are drawn from the same supersense inventory and are identical for many tokens.
The lexcat annotation (syntactic category of the lexical unit) is a slight extension of the Universal POS tagset, adding categories for certain MWE subtypes, such as light verb constructions (following Walsh et al., 2018) and idiomatic PPs; it also distinguishes possessive pronouns, the possessive clitic 's, and discourse expressions. Figure 1 illustrates the MWE, lexcat, and supersense layers.
STREUSLE itself is limited to English, but many of its component annotations have been applied to other languages: verbal multi-word expressions (Ramisch et al., 2018), noun and verb supersenses (Picca et al., 2008; Qiu et al., 2011; Schneider et al., 2013; Martínez Alonso et al., 2015; Hellwig, 2017), and preposition supersenses (Hwang et al., 2017; Peng et al., 2020; Hwang et al., 2020). Liu et al. (2020) presented a comprehensive lexical semantic tagger for STREUSLE, which predicts the full lexical semantic analysis from text and is freely available. Prange et al. (2019) proposed several procedures for integrating STREUSLE supersenses directly into UCCA, refining its coarse-grained categories with preposition supersenses. Enriching a supervised UCCA parser with preposition supersense features from STREUSLE improved parsing performance, and training a parser to predict supersenses jointly with UCCA improved it even further, revealing the two frameworks to be overlapping but complementary.

Related Representations
The above annotation schemes define finite inventories of coarse-grained categories to avoid depending on language-specific lexical resources, and thus can in principle be applied to any language. This fact distinguishes UCCA and STREUSLE from finer-grained sentence-structural representations like FrameNet (Baker et al., 1998; Fillmore and Baker, 2009) and the Abstract Meaning Representation (Banarescu et al., 2013), which relies on PropBank (Palmer et al., 2005). The Prague Dependency Treebank tectogrammatical layer (Böhmová et al., 2003) uses few lexicon-free roles, but its semantics is determined by a valency lexicon.
The Parallel Meaning Bank (Abzianidze et al., 2017) uses lexicon-free VerbNet (Schuler, 2005) semantic roles. The STREUSLE tagset for preposition supersenses generalizes VerbNet's role set to cover non-core arguments/adjuncts of verbs, as well as prepositional complements of nouns and adjectives. Universal Decompositional Semantics (DeComp) defines semantic roles as bundles of lexicon-free features. Cross-linguistic applicability in this case is delegated to the parser, which parses sentences in other languages to their corresponding English semantics (Zhang et al., 2018).


First Conversion Approach: Rule-based UCCA Parsing from Syntax and Lexical Semantics

Here we describe a system to produce UCCA analyses for text, given UD syntactic graphs and STREUSLE lexical semantic annotation. This is a rule-based converter that inspects the UD structure in tandem with STREUSLE annotations to build an UCCA parse. An analysis of the converter's successes and failures (§7) will, in turn, reveal the similarities and differences between the schemes. What follows is an overview of the algorithm; further details about the rules are given in appendix A.

1. MWEs and unanalyzable units. Based on STREUSLE MWE annotation, we group text tokens together into single unanalyzable semantic units when they are annotated as strong MWEs (we do not use weak MWEs, as UCCA does not encode them), except for light verbs and annotations that would lead to cycles. MWE-internal dependency edges are discarded so they will not be processed later.

2. STREUSLE supersenses and UCCA scene-evoking phrases. In a top-down traversal of the dependency parse, we visit each word's lexical unit and decide whether it evokes a main relation (scene-evoking phrase), using rules based on syntactic and lexical semantic features: copular be and stative have, adjectives (excluding a small list of quantity adjectives), existential there, non-discourse adverbs with a copula dependent, predicative prepositions and copulas introducing a predicate nominal, as well as common nouns with scene-denoting supersenses.

Secondary verb constructions. These constructions are structured differently in UCCA and UD: in "members who won't stop talking", the verb "stop" is the UD head and "talking" is a dependent. In UCCA, "stop" is D and "talking" is the main relation, labeled P. To normalize the treatment of these constructions, they are marked and eventually restructured such that the syntactic head is labeled D and the syntactic dependent is labeled as the main relation.


Second Conversion Approach: Delexicalized Supervised UCCA Parsing

Previous work tackled the UCCA parsing task using supervised learning. To complement and validate the analysis of the rule-based converter, we compare its findings to a delexicalized supervised parser, which can be seen as inducing a converter from data. By removing all word and lemma features from these parsers, and instead adding features based on gold UD and STREUSLE annotations, we obtain supervised "converters", which can be used for data-driven analysis and complement the rules.
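The MWE-grouping step of the converter can be sketched as follows. This is a minimal, hypothetical re-implementation for illustration only: the actual converter operates on CoNLL-U-Lex structures, and the input format, function name, and light-verb/cycle handling shown here are simplifications.

```python
# Sketch of the MWE-grouping step: strong MWEs become single unanalyzable
# units. Hypothetical simplified input: tokens are (index, word) pairs;
# strong MWEs are dicts with token indices and a STREUSLE-style lexcat.
def group_unanalyzable_units(tokens, strong_mwes):
    """Return a list of units, each a list of (index, word) pairs.

    MWEs with light-verb lexcats (V.LVC.*) are left ungrouped, mirroring
    the converter's exceptions; cycle checks are omitted in this sketch.
    """
    in_mwe = {}
    for mwe in strong_mwes:
        if mwe["lexcat"].startswith("V.LVC"):
            continue  # light verbs are split into separate units instead
        for i in mwe["toknums"]:
            in_mwe[i] = id(mwe)
    units, seen = [], set()
    for i, word in tokens:
        key = in_mwe.get(i)
        if key is None:
            units.append([(i, word)])  # single-word unit
        elif key not in seen:
            seen.add(key)
            # emit the whole MWE once, as one unanalyzable unit
            units.append([t for t in tokens if in_mwe.get(t[0]) == key])
    return units

tokens = [(1, "the"), (2, "air"), (3, "conditioning"), (4, "works")]
mwes = [{"toknums": [2, 3], "lexcat": "N"}]
print(group_unanalyzable_units(tokens, mwes))
```

Here "air conditioning" forms one unanalyzable unit, while the remaining tokens stay single-word units.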
TUPA. This UCCA parser (Hershcovich et al., 2017) is based on a transition-based algorithm with a neural network transition classifier, using a BiLSTM to encode the input representation, with word, lemma, and syntactic features embedded as real-valued features. We add the supersense and lexcat from STREUSLE as embedding inputs to the TUPA BiLSTM (concatenated with existing inputs). For prepositions, we add both the scene role and function (see §2).

HIT-SCIR Parser. This is a transition-based parser for several MR frameworks, including UCCA (Che et al., 2019).

Experiments
Data. We use the Reviews section from UD 2.6 English_EWT (Zeman et al., 2020), with lexical semantic annotations from STREUSLE 4.4 (Schneider and Smith, 2015; Schneider et al., 2018), and with UCCA graphs from UCCA_English-EWT v1.0.1 (Hershcovich et al., 2019). We use the standard train/development split for this dataset, and do not use the test split to avoid over-analyzing it, although all datasets contain annotations for it too. The data statistics are listed in table 1.
Rule-based converters. We evaluate the rule-based converter (§3), as well as the syntax-based converter from Hershcovich et al. (2019), which uses the UD tree and a majority-based category mapping (based on the most common UCCA category in the training set for each UD relation). This converter is oblivious to lexical semantics.
Parsers. We train TUPA v1.3 and the HIT-SCIR parser with gold-standard features from UD and STREUSLE, for equal conditions with the converters, using default hyperparameters. Categorical features are added as 20-dimensional embeddings. Scores are averaged over 3 models with different random seeds. For TUPA, we ablate UD- or STREUSLE-based features to quantify the contribution of each.

Evaluation. We use standard UCCA parsing evaluation, matching edges by the terminal yields of their endpoint units. Labeled precision, recall, and F1-score consider the edge categories when matching edges. Where an edge has multiple categories, each of them is considered separately.
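The yield-based labeled evaluation can be sketched as follows. This is a simplified, hypothetical implementation: each edge is represented by the set of terminal token indices it yields plus a single category (multi-category edges are assumed to have been expanded beforehand), which is the essence of the standard UCCA metric but omits details of the official scorer.

```python
from collections import Counter

def labeled_prf(pred_edges, gold_edges):
    """Labeled precision/recall/F1 over (yield, category) pairs.

    Each edge is (frozenset_of_token_indices, category); a predicted edge
    matches a gold edge iff both the yield and the category are identical.
    Counter intersection handles duplicate (yield, category) pairs.
    """
    pred, gold = Counter(pred_edges), Counter(gold_edges)
    matched = sum((pred & gold).values())
    p = matched / sum(pred.values()) if pred else 0.0
    r = matched / sum(gold.values()) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# One correct edge and one category confusion (P predicted for gold S):
pred = [(frozenset({1, 2}), "A"), (frozenset({3}), "P")]
gold = [(frozenset({1, 2}), "A"), (frozenset({3}), "S")]
print(labeled_prf(pred, gold))  # (0.5, 0.5, 0.5)
```

Unlabeled scores would be obtained the same way after dropping the category from each pair.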

Results
Table 2 shows the EWT Reviews dev scores. For comparison with parsers that have access to words, we also show the TUPA dev results from Hershcovich et al. (2019), who used syntactic features from the gold UD annotation and GloVe (Pennington et al., 2014); and the HIT-SCIR parser with BERT/GloVe, and with UD+STREUSLE features.
Rules with gold UD and STREUSLE close the gap between the syntax-based converter and parsers with word information, reaching the same primary labeled F1 as TUPA with word features. This is surprising, since supervised parsers usually outperform rule-based ones, and suggests that the training data (see table 1) was insufficient for the parser to learn a mapping as accurate as the complex conversion rules (described in §3). Enhancing GloVe-based HIT-SCIR with UD and STREUSLE yields similar results. However, many errors remain in both approaches, indicating that UCCA and STREUSLE are far from equivalent. We analyze these errors in §7 to investigate the frameworks and the relationship between them.
Ablations. Noticeable drops in the ablations (TUPA with UD/STREUSLE only) show that both UD-provided structure and relation/entity types from STREUSLE supersenses are needed to make up for the missing lexical information, but also that lacking UD hurts more. This is expected, as the parser resorts to guessing when it lacks sufficiently informative input, and the chance of errors when guessing the UCCA structure (for which UD is informative) is much larger than when assigning edge labels (for which STREUSLE provides more fine-grained cues).

Table 3: Dev set confusion matrix for the rule-based converter. The last column (row) shows the number of predicted (gold-standard) edges of each category that do not match any gold-standard (predicted) unit.

Analysis
Table 3 presents the EWT Reviews dev confusion matrix for the converter's output and gold UCCA.
The delexicalized parsers' confusion matrix (in appendix C) is similar. Note that we consulted the training set iteratively while developing the rules, addressing many recurring issues that would show up as prominent confusions. We proceed with an extensive error analysis of the converter, to point out similarities and delineate remaining divergences, which we stipulate constitute content differences between UCCA and the combination of syntax and lexical semantics from UD and STREUSLE. Figure 3 shows gold annotations and the converter's predictions.

High Match-Converging Analyses
Participants. As are recovered with high precision and recall. This is generally expected, as most syntactic subjects and objects, as well as some obliques and even clauses, signify scene participants. Where syntax and semantics diverge, STREUSLE supersenses can rule out unlikely candidates. The most common sources of missed As are structural errors, i.e., incorrect scene structures, overly flat units containing more than the referential words, or misinterpreted noun compounds (see §7.3 below).
Function words. As evident in table 3, Function words (F) are accurately predicted. The distinction between words that contribute to the semantic meaning and those that do not is preserved between STREUSLE and UCCA, except for some cases, mainly infinitive "to".

(Figure 3, comparing STREUSLE annotation with predicted and gold UCCA annotations for several examples, such as predicted [H [P Gets busy] ] [L so] [H [P come] [T early] ] vs. gold [H [D Gets] [S busy] ] [L so] [H [P come] [T early] ], appears here in the original layout.)

Linkers. Linkers (L) are relatively easy: they are prototypically instantiated by syntactic co- and subordinators. To the extent that these are considered adpositional by STREUSLE, their supersense helps disambiguate between inter-scene linkage and Connectors (N) of non-scenes.

Partial Match-Inferrable by Combining Syntax and Lexical Semantics
Time (T) and Quantity (Q) expressions frequently coincide with certain syntactic categories, such as adverbs and prepositions, and can typically be identified from corresponding supersenses, if available.
The converter tends to err on the conservative side, falling back to Adverbials (D) and Elaborators (E) when it cannot find sufficient explicit semantic evidence.

Low Match-Divergences or Insufficient Information
Noun compound interpretation. Lexical composition in noun compounds evokes various forms of event structures, which are underspecified by the meaning of the constituent words (Shwartz and Dagan, 2019). While compounding is often used for Elaboration, as in [E tap] [C water], this is not always the case. For example, in [C sea] [C bottom], both "sea" and "bottom" are Centers, since they reflect a part-whole relation. The modifier may also be a Participant in the scene evoked by the head, as in [A road] [P construction]. This is partially encoded in STREUSLE: the fact that the MWE "road construction" has the N.EVENT supersense indicates that it is scene-evoking, but it still does not reveal the relationship between the constituent words.
Centers.C is often unaligned due to the different notions of multi-word expressions in STREUSLE and UCCA: "tap water" is considered a strong MWE in STREUSLE, but is internally analyzed (with "water" being the Center) in UCCA (see figure 3), leading to an unmatched C.

Scene-evokers
While the concept of scenes is central to UCCA, correctly identifying scene-evoking words is one of the more difficult tasks for our converter. "Scene-ness" clearly goes beyond syntax (not all verbs evoke scenes, and scenes can be evoked by a wide range of POS), and STREUSLE supersenses in isolation are often too coarse to resolve whether a given word evokes a scene and, if so, whether it is a Process (P) or a State (S). The former decision is generally somewhat easier for the converter (recall of scene-evokers: 71.3%) than distinguishing between P (recall: 69.1%) and S (64.0%). Below we examine a few recurring phenomena involving scenes.
Relational nouns. These are a special case of scene-evoking nouns (Newell and Cheung, 2018; Meyers et al., 2004), both referring to an entity and evoking a scene in which the entity generally or habitually participates. These units have two categories in UCCA, either A|P or A|S. The converter relies here on a combination of N.PERSON or N.GROUP supersenses and lexical lists. However, these nouns' scene-ness is often not recognized, and they are confused with regular A or C.
Scene-evoking adjectives. Inspecting the high-frequency confusions, adjectives stand out as persistent error inducers. Different classes of adjectives are handled differently in UCCA: e.g., while most adjectives are scene-evoking, pertainyms (academic), inherent-composition modifiers (sugary), and quantity modifiers (many) are not. Some adjectives are ambiguous: a legal practice may refer to a behavior that is legal as opposed to illegal, in which case it should be scene-evoking, or to a law office, in which case it should not. Enriching STREUSLE with supersenses for adjectives (Tsvetkov et al., 2014) might be fruitful for such distinctions. Even with lexical disambiguation, the scene attachment of the adjective may be ambiguous: e.g., a good chef probably means a chef who cooks well, so good should be an Adverbial in the scene evoked by chef, in contrast with a tall chef, where tall is not part of the cooking scene and instead should evoke a State. Predicative adjectives, and adjective modifiers in predicative NPs, are another source of difficulty, especially when they occur in fragments: sometimes the adjective is annotated as evoking the main scene, and sometimes not. Determining this requires making various semantic distinctions, which are not fully represented in STREUSLE.

Conclusion
We have presented an extensive analysis of the similarities and differences between STREUSLE and UCCA on the EWT Reviews corpus, assisted by two complementary methods: manual rule-based conversion and delexicalized parsing. Both approaches arrived at similar results, showing that conversion between the frameworks can be moderately accurate, while also revealing important divergences, namely distinctions made in UCCA but not in STREUSLE: the semantic relation between nouns in compounds, adverbial vs. linkage uses of adverbs, and the scene-evoking status of nouns, possessives and adjectives, among others.
Enriching supervised parsers with lexical semantic features improves parsing performance when using gold input. While this paper focuses on analysis, future work will investigate using predicted features from a parser/tagger (Liu et al., 2020). This approach is expected to improve parsing performance and robustness, demonstrating the utility of linguistically informed approaches in complementing general supervised semantic parsers.

A Details of Rule-based Converter
The following is a detailed description of the rules used in the rule-based converter (§3). It gives a step-by-step overview of the algorithm for constructing an UCCA semantic graph from STREUSLE/UD annotations. It is not a full specification and omits many details of the criteria and operations, but should be helpful for understanding the full code, available at https://github.com/danielhers/streusle/blob/streusle2ucca/conllulex2ucca.py. A running example is given at each step for the sentence: There's plenty of parking, and I've never had an issue with audience members who won't stop talking or answering their cellphones.
It has the following gold annotation:

Step 0.1: Transform the UD dependency parse
Split the final preposition off from V.IAV MWEs like take care of, as it is usually not treated as part of the verbal unit in UCCA. If the remaining part is still an MWE, it is labeled V.LVC.full or V.VPC.full, depending on its syntax.
Step 0.2: Initialize lexical units under the UCCA root
• Each strong lexical expression (single-word or MWE) in STREUSLE is treated as an UCCA unit, with the following exceptions:
  - An MWE with lexcat V.LVC.cause is broken into two units: D for the light verb, which modifies the main predicate in a + unit.
  - An MWE with lexcat V.LVC.full is broken into two units: F for the light verb, which modifies the main predicate in a + unit.
  - An MWE annotation is discarded if it would lead to a cycle in the dependencies, i.e., the highest token of the MWE depends on a token outside of the MWE, which in turn depends on another token of the MWE.
• MWE-internal dependency edges are discarded so they will not be processed later.
• Punctuation is marked U.
• Mappings between units and dependency nodes are maintained.
Step 0.3: Identify which lexical units are main relations (scene-evoking)
In a top-down traversal of the dependency parse, visit each word's lexical unit and decide whether it evokes a state (S), a process (P), a scene undetermined between state and process (+), or no scene (-):
• If already labeled D, F, or +, do nothing.
• If an adjective not from a small list of quantity adjectives, label S.
• If existential there, label S. If a be verb in an existential construction, label -, and swap the positions of be and there in the dependency parse so that there is the head.
• If a non-discourse adverb with a copula dependent, label S.
• thanks and thank you are P.
• In most cases, predicative prepositions are S.
• A copula introducing a predicate nominal (non-PP) is labeled S and promoted to the head of the dependency parse, unless the nominal is scene-evoking. In the notation, "UNA" means "lexical" (it originally meant "unanalyzable").

Easy for both:
• Both systems perform well on As.
• Both systems are good at recalling Fs (system A: 84.9, system B: 89.9), but system A (in contrast to system B) has almost perfect precision (98.6 vs. 82.5).

Difficult for both:
• Both systems perform okay on Cs; system B tends to confuse Cs for As and Ps more than system A, which tends to fail at predicting units matching gold Cs entirely.
• Ds are difficult for both systems; system A underpredicts Ds more than system B, but it is also more precise.
• Es are difficult for both systems; Es often get confused (by both systems and in both directions) with Ds, As and Ss.
• Relational nouns (A|S, A|P) are very difficult for both systems; system B doesn't predict them at all, and system A predicts a few A|Ss which are mostly correct, but still misses 3/4 of them (and all A|Ps).
• Gs are very difficult for both systems; system B doesn't predict them at all, and system A predicts a few but with low precision and recall.

Differences:
• System A is better at recalling Qs.
• System A is better at recall (64.0 vs. 53.6) and precision (61.2 vs. 48.4) on Ss, but also confuses some gold Fs for Ss.
• System A is better at recalling Ts; both systems tend to confuse Ts for Ds, but system B does so more than half of the time whereas system A only a quarter.
• System A is more eager to predict Ls, and thus has higher recall (83.7 vs. 61.0) but lower precision (74.7 vs. 87.5) than system B; system A confuses some gold Fs and Rs for Ls, while system B confuses some gold Ls for Cs, Ns and Rs.
• System B is more eager to predict Ns, and thus has higher recall (76.6 vs. 66.0) but lower precision (37.1 vs. 66.0) than system A; system B confuses some gold Ls for Ns.
• System B is more eager to predict Ps, and thus has higher recall (78.6 vs. 69.1) but lower precision (56.3 vs. 73.1) than system A; system B confuses some gold Cs, Ds and Fs for Ps, while system A confuses some gold Ps for Hs.
• System B is more eager to predict Rs, and thus has higher recall (88.8 vs. 74.0) but lower precision (65.5 vs. 84.1) than system A; system B confuses some gold Fs and Ls for Rs, while system A confuses some gold Rs for Ls.

Given the small differences between the converters in terms of performance, we decided to use system A for the main analysis in the paper, as it is more modular and interpretable.

Figure 1 :
Figure 1: Example sentence from the Reviews training set (reviews-086839-0003, "We took our vehicle in for a repair to the air conditioning"), with UCCA, STREUSLE, and UD annotations. UCCA abbreviations: H = parallel scene, L = scene linker, P = process (dynamic event), S = state, A = scene participant, D = scene adverbial, E = non-scene elaborator, C = center (non-scene head), R = relator, F = functional element. The STREUSLE and UD part is adapted from Liu et al. (2020).

Figure 2 :
Figure 2: Example sentence (reviews-003418-0006, reading "Blue cross has no record of aa[sic] reversal") with gold-standard UCCA graph; STREUSLE MWEs and supersenses; and UD coarse-grained POS tags and relations (left); and UCCA graph output by the rule-based converter (right).

Figure 3 :
Figure 3: Examples of cases where the rule-based converter produced the correct UCCA annotation due to converging analyses, as well as cases where it produced a wrong annotation due to a divergence.

Table 1 :
EWT Reviews data statistics. We use only the training and development splits for analysis.
Coordination and lexical heads. Traversing the graph top-down again, coordination between scene units is labeled L (Linker), and coordination between non-scene units, N (Connector). Scene units are labeled H, and non-scene unit heads, C. Lexical heads of units are labeled C, P, or S, and scene units H where necessary; in "X of Y" constructions involving quantities/species, Y is identified as the C.
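This coordination rule can be sketched as follows. The function is a hypothetical simplification for illustration: each conjunct is assumed to carry a flag saying whether it is a scene unit, abstracting away from how scene-ness was determined in earlier steps.

```python
def label_coordination(conjuncts_are_scenes):
    """Label a coordination structure per the converter's rule.

    If the conjuncts are scene units, they become Parallel Scenes (H)
    joined by a Linker (L); otherwise the conjuncts' heads are Centers (C)
    joined by a Connector (N).
    """
    if all(conjuncts_are_scenes):
        return {"conjunct": "H", "coordinator": "L"}
    return {"conjunct": "C", "coordinator": "N"}

print(label_coordination([True, True]))    # scene coordination: H + L
print(label_coordination([False, False]))  # non-scene coordination: C + N
```

For example, in "talking or answering their cellphones" the two scene conjuncts would be labeled H with "or" as L, whereas coordinated entities would be labeled C with the coordinator as N.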

Table 2 :
Labeled F1 (in %) for primary and remote edges on the UCCA EWT Reviews dev set, for rule-based systems (top), delexicalized supervised parsers with gold UD+STREUSLE (middle), and supervised parsers with word features (bottom).
The ablated TUPA models still outperform the syntax-based converter, indicating that there are indeed structural signals in STREUSLE and semantic signals in UD, which TUPA can salvage.
(The top-down traversal order ensures the nominal is reached first.)
• If a common noun, mark as:
  - S if supersense-tagged as ATTRIBUTE, FEELING, or STATE;
  - P if ACT, PHENOMENON, PROCESS, or EVENT (with the exception of nouns denoting a part of the day);
  - a relational noun if PERSON or GROUP and matching kinship/occupation lists or suffixes;
  - - (non-scene) otherwise.
• If a verb or copula not handled above, label +.
• Else label -.
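The common-noun branch of this decision can be sketched as follows. This is a minimal illustration: the supersense strings, function name, and boolean parameters standing in for the lexical checks (nouns denoting parts of the day; kinship/occupation lists or suffixes) are hypothetical simplifications of the converter's actual logic.

```python
# Hypothetical supersense sets, following the rule above.
STATIVE = {"n.ATTRIBUTE", "n.FEELING", "n.STATE"}
EVENTIVE = {"n.ACT", "n.PHENOMENON", "n.PROCESS", "n.EVENT"}
PERSONAL = {"n.PERSON", "n.GROUP"}

def common_noun_label(supersense, is_day_part=False, matches_relational_list=False):
    """Return 'S', 'P', 'relational', or '-' for a common noun.

    is_day_part and matches_relational_list stand in for lexical checks
    that the converter performs against word lists and suffixes.
    """
    if supersense in STATIVE:
        return "S"
    if supersense in EVENTIVE and not is_day_part:
        return "P"
    if supersense in PERSONAL and matches_relational_list:
        return "relational"
    return "-"

print(common_noun_label("n.EVENT"))                                 # P
print(common_noun_label("n.PERSON", matches_relational_list=True))  # relational
```

For example, "repair" (an eventive noun) would be labeled P, while "chef" with a matching occupation suffix would be marked relational.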

Table 4 :
UCCA postprocessing category replacements in the alternative converter.
Table 5 shows the confusion matrix for the delexicalized HIT-SCIR parser on the EWT Reviews development set.

Table 5 :
Development set confusion matrix for the delexicalized HIT-SCIR parser.The last column (row), labeled ∅, shows the number of predicted (gold-standard) edges of each category that do not match any gold-standard (predicted) unit.