Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences

The patterns in which the syntax of different languages converges and diverges are often used to inform work on cross-lingual transfer. Nevertheless, little empirical work has been done on quantifying the prevalence of different syntactic divergences across language pairs. We propose a framework for extracting divergence patterns for any language pair from a parallel corpus, building on Universal Dependencies. We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation. We further present a novel dataset, a manually word-aligned subset of the Parallel UD corpus in five languages, and use it to perform a detailed corpus study. We demonstrate the usefulness of the resulting analysis by showing that it can help account for performance patterns of a cross-lingual parser.


Introduction
The assumption that the syntactic structure of a sentence is predictably related to the syntactic structure of its translation has deep roots in NLP, notably in cross-lingual transfer methods, such as annotation projection and multi-lingual parsing (Hwa et al., 2005; McDonald et al., 2011; Kozhevnikov and Titov, 2013; Rasooli and Collins, 2017, inter alia), as well as in syntax-aware machine translation (MT; Birch et al., 2008; Williams et al., 2016; Bastings et al., 2017). Relatedly, typological parameters that provide information on the dimensions of similarity between grammars of different languages have been found useful for a variety of NLP applications (Ponti et al., 2019). For example, neural MT in low-resource settings has been shown to benefit from bridging morphosyntactic differences in parallel training data by different types of preprocessing, such as reordering (Zhou et al., 2019) and hand-coded syntactic manipulations (Ponti et al., 2018). Nevertheless, little empirical work has been done on systematically quantifying the type and prevalence of syntactic divergences across languages. Moreover, previous work generally classified divergences into a small set of divergence classes, often based on theoretical considerations (Dorr, 1994) or on categorical ("hard") typological features selected in an ad-hoc manner, and left basic questions largely unaddressed, such as how often POS tags are preserved in translation and what syntactic structures are likely correspondents of different syntactic relations. See § 2.

* Work mostly done while at the Hebrew University of Jerusalem.
We propose a language-neutral, fine-grained definition of cross-linguistic morphosyntactic divergences (CLMD) that allows for their extraction using a syntactically annotated, content-word-aligned parallel corpus. Concretely, we classify CLMD based on the edge labels on the dependency paths between corresponding pairs of content words (§ 3.2). See Figure 1 for an example. 1 We further conduct a detailed corpus study, manually aligning content words in a subset of the PUD corpus (Zeman et al., 2017) over five language pairs, English-French (En-Fr), English-Russian (En-Ru), English-Chinese (En-Zh), English-Korean (En-Ko), and English-Japanese (En-Jp) (§ 3.1), and analyze the prevalence of divergences by type (§ 4). The resulting resource can be useful for MT research, by guiding the creation of challenge sets focusing on particular constructions known to be cross-linguistically divergent, as well as by guiding preprocessing of source-side sentences for better MT performance. 2 The emerging CLMD provide information not only on the macro-structure of the grammar (e.g., whether the language is pro-dropping), but also on parameters specific to certain lexical classes (e.g., modal verbs) and on probabilistic tendencies (e.g., Japanese tends to translate sequences of events, expressed in English using coordinating conjunctions, with subordinate clauses). See § 5.

1 It may appear that divergences recoverable by means of UD edge labels are purely syntactic and not morphosyntactic. However, this is not the case: the domain of "pure syntax" is not well defined from a non-theoretical perspective, and many phenomena we are dealing with, e.g., a switch from a direct-object construction to an oblique construction, often involve morphological processes, such as adding a case ending.
Further experiments demonstrate the methodology's applicative potential. First, we show that the proposed methodology can be straightforwardly automated by replacing manual parses and alignments with automatically induced ones ( § 7). We present a study done on a larger En-Zh corpus, which yields results similar to those obtained manually. Secondly, we show that the reported distribution over divergence types is predictive of the performance patterns of a zero-shot parser ( § 8).

Related Work
Comparing syntactic and semantic structures over parallel corpora is the subject of much previous work. Dorr et al. (2010) compiled a multiply-parallel corpus and annotated it with increasingly refined categories in an attempt to abstract away from syntactic detail, but did not report any systematic measurement of the distribution of divergences. Šindlerová et al. (2013), Xue et al. (2014), Sulem et al. (2015), and Damonte and Cohen (2018) studied divergences over semantic graphs and argument-structure phenomena, while a related line of work examined divergences in discourse phenomena (Šoštarić et al., 2018). Other works studied the ability of a given grammar formalism to capture CLMD in a parallel corpus (e.g., Søgaard and Wu, 2009). However, none of these works defined a general methodology for extracting and classifying CLMD.

2 The resource can be found at https://github.com/macleginn/exploring-clmd-divergences
The only previous work we are aware of to use UD for identifying CLMD is Wong et al. (2017), which addresses Mandarin-Cantonese divergences by comparing the marginal distributions of syntactic categories on the two sides (without alignment). Relatedly, Deng and Xue (2017; henceforth DX17) aligned phrase-structure trees over an En-Zh parallel corpus. Notwithstanding the similarity in the general approach, we differ from DX17 in (i) specifically targeting content words, (ii) relying on UD, which is standardized cross-linguistically and simplifies the alignment process by focusing on the level of words, 3 and (iii) addressing multiple language pairs. It should be noted that the classification of divergences presented in DX17 is rather coarse-grained. Of the seven classes in their study, five (Transitivity, Absence of function words, Category mismatch, Reordering, and Dropped elements) reflect local syntactic differences; one (Lexical encoding) covers many-to-one/one-to-many alignments and non-literal word translations; and the remaining residual type (Structural paraphrase) indiscriminately covers more substantial CLMD. We address this limitation and propose a methodology that automatically derives fine-grained CLMD from aligned annotated corpora and enables straightforward computation of their type statistics.

Fine-grained Classification of CLMD
In this section, we present a novel cross-linguistic dataset that provides a high-resolution overview of morphosyntactic differences between pairs of languages, together with a formal definition of morphosyntactic divergences based on it.
Divergences between the syntax of sentences and that of their translations can arise for a number of reasons. Setting aside semantic divergences, which are differences in the content expressed by the source and the target (Carpuat et al., 2017; Vyas et al., 2018), the remaining varieties of divergences are essentially different ways to express the same content (Fisiak, 1984; Boas, 2010), which we call CLMD.
We define CLMD empirically to be recurrent divergence patterns in the syntactic structures of sentences and their translations. While content differences may account for some of the observed syntactic divergences, by aiming for recurring patterns we expect to filter out most such cases, as they are subject to fewer grammatical constraints and should thus not yield systematic patterns of morphosyntactic divergence.
It is harder to distinguish translation artifacts from CLMD that are due to genuine differences in grammar and usage. However, translated texts are usually characterized by a higher degree of morphosyntactic transfer from the source, and rarely portray the target language as more different from the source language than it needs to be (Koppel and Ordan, 2011; Volansky et al., 2015). We therefore do not expect the process of translation to introduce spurious recurrent morphosyntactic-divergence patterns.

The Manually Aligned PUD Resource
Universal Dependencies (UD) is a framework for treebank annotation, whose objectives include satisfactory analyses of individual languages, providing a suitable basis for bringing out cross-linguistic parallelism, suitability for rapid consistent annotation and accurate automatic parsing, ease of comprehension by non-linguists, and effective support for downstream tasks. See Appendix A for a glossary of UD terms.
An important feature of the dependency analysis in UD is that content words are considered the principal components of dependency relations. Within this framework, function words are generally dependents of the content word they relate to most closely. The primacy of content words brings out cross-linguistic parallels that would be harder to detect with other annotation frameworks since function words are highly variable across languages. Importantly, dependency paths between content words do not generally contain function words. As a result, by comparing paths across languages, differences in the surface realization are often masked, and argument structure and linkage differences emphasized.
For example, a preposition accompanying a verb may be dropped in translation if the corresponding verb is transitive (cf. En went around the world vs. Ru oboshel mir, lit. 'went-around world'). Since UD attaches prepositions as dependents of the head noun of the prepositional phrase, the dependency path between the verb and the head noun is not altered.
The Parallel Universal Dependencies (PUD) corpus consists of 1000 sentences translated into various languages by professional translators. 4 In this paper, we study the Russian, French, Chinese, Japanese, and Korean versions of the PUD corpus, which were each aligned with the corresponding English corpus. 5 Each parallel corpus was aligned by a human annotator, proficient in the language of the corpus and in English. The UD tokenization is adopted in all cases. Due to the difficulty in finding annotators proficient in pairs of these languages, our annotation takes English as the source language. However, it is possible to obtain an approximate alignment between any pair of these languages, pivoting through English.
Only content words are aligned, so as to sidestep the inherently ambiguous nature of aligning function words across divergent constructions. For details on the function/content distinction we apply to words, see Appendix B. We restrict the alignment to include connected components of the following types: (1) one-to-one alignments, i.e., where a single content word is aligned with another single content word; (2) many-to-one alignments, where multiple source words are aligned with a single target word; (3) one-to-many alignments, where a single source word is aligned with multiple target words. Where a source multi-word expression is translated with a target multi-word expression, we align their headwords, to indicate that their subtrees are in correspondence with one another (e.g., English with this and French par conséquent). Most of the content words in the corpora were aligned one-to-one; such alignments account for around 90% of aligned En tokens across the corpora.
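The three permitted alignment shapes can be checked mechanically. Below is a minimal, hypothetical sketch (not the annotation tooling used for the resource) that groups alignment links into connected components over the bipartite alignment graph and labels each component; the function name and input format are our own.

```python
from collections import defaultdict

def classify_alignments(pairs):
    """Group alignment pairs (src, tgt) into connected components and
    classify each as 'one-to-one', 'many-to-one', or 'one-to-many'."""
    src_of = defaultdict(set)  # tgt index -> {src indices}
    tgt_of = defaultdict(set)  # src index -> {tgt indices}
    for s, t in pairs:
        tgt_of[s].add(t)
        src_of[t].add(s)
    components, seen = [], set()
    for s, t in pairs:
        if (s, t) in seen:
            continue
        # Traverse the bipartite alignment graph to collect the component.
        srcs, tgts, frontier = {s}, {t}, [(s, t)]
        while frontier:
            cs, ct = frontier.pop()
            seen.add((cs, ct))
            for t2 in tgt_of[cs]:
                if (cs, t2) not in seen:
                    tgts.add(t2)
                    frontier.append((cs, t2))
            for s2 in src_of[ct]:
                if (s2, ct) not in seen:
                    srcs.add(s2)
                    frontier.append((s2, ct))
        if len(srcs) == 1 and len(tgts) == 1:
            kind = 'one-to-one'
        elif len(tgts) == 1:
            kind = 'many-to-one'
        elif len(srcs) == 1:
            kind = 'one-to-many'
        else:
            kind = 'many-to-many'  # excluded by the annotation scheme
        components.append((sorted(srcs), sorted(tgts), kind))
    return components
```

Components labeled 'many-to-many' would be rejected under the scheme above, since only the first three types are admitted into the resource.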

Defining Divergences using UD
We present a framework for defining and investigating translation divergences across a variety of language pairs using UD. The framework operates on a sentence-aligned parallel corpus, where both sides are annotated with UD and content words in corresponding sentences are aligned. Let T_s = (V_s, E_s) and T_t = (V_t, E_t) be a pair of UD trees over corresponding sentences, and let CW_s ⊂ V_s and CW_t ⊂ V_t be the sets of content words in T_s and T_t, respectively. Let A ⊂ CW_s × CW_t be a token-level alignment, consisting of one-to-one, many-to-one, and one-to-many alignments. There are two ways to restrict the definition of correspondences between nodes and edges in T_s and T_t: (1) by considering only one-to-one edges, or (2) by defining a one-to-one correspondence A′ ⊂ A, traversing each many-to-one alignment (C, v_t) and retaining the single pair (v_i, v_t) such that v_i is the highest node in T_s among the nodes in C. The same is done for one-to-many alignments. 6 The first approach is preferable for analyzing syntactic-path correspondences and was followed in this presentation. The second approach is more suitable for the analysis of POS mappings, where headwords are more prominent. We then define Corresponding Syntactic Relations (CSR) as a pair (R_s, R_t) such that R_s and R_t are dependency paths in T_s and T_t, the origin and endpoint of R_s are in CW_s, and the origin and endpoint of R_t are their aligned tokens in CW_t according to A′. If the origin or the endpoint of R_s does not have a corresponding node in T_t, R_s does not have a corresponding relation in T_t. The types of R_s and R_t are the sequences of labels on the edges of the paths, optionally along with their directionality in the tree (linear order is not taken into account). Without loss of generality, we assume that R_s begins at the leftmost word of the pair in the En sentence, and R_t by definition begins at the target word corresponding to the leftmost source word. For brevity, we only present results where directionality is not taken into account.
Relations are thus written as sequences of UD edge labels separated by the '+' sign.
Token pairs that do not share a POS tag, as well as CSR whose source and target relations are not of the same type, are said to form a divergence. One-to-many and many-to-one alignments are another form of divergence.
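Assuming each tree is given as a head array with edge labels, the CSR type of an aligned pair can be read off the tree path between the two tokens. The following sketch (hypothetical names; not the paper's code) extracts the label sequence via the lowest common ancestor and flags a pair of corresponding relations as divergent when the sequences differ:

```python
def path_labels(heads, deprels, u, v):
    """Sequence of edge labels on the tree path from token u to token v.
    heads[i] is the 0-based head of token i (-1 for the root);
    deprels[i] labels the edge from heads[i] to i."""
    def ancestors(n):
        chain = [n]
        while heads[n] != -1:
            n = heads[n]
            chain.append(n)
        return chain
    au, av = ancestors(u), ancestors(v)
    common = next(n for n in au if n in av)  # lowest common ancestor
    up = [deprels[n] for n in au[:au.index(common)]]
    down = [deprels[n] for n in av[:av.index(common)]]
    return up + list(reversed(down))

def is_divergent(src, tgt, pair):
    """A pair of corresponding relations is a divergence iff the label
    sequences differ (directionality ignored, as in the tables here)."""
    (s1, s2), (t1, t2) = pair
    return path_labels(*src, s1, s2) != path_labels(*tgt, t1, t2)
```

For the tree of "He runs fast" (runs as root, He as nsubj, fast as advmod), the path from He to fast comes out as ['nsubj', 'advmod'], i.e., the two-edge type written nsubj+advmod in the notation below.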

Empirical Study of Divergences
We apply the proposed methodology to the aligned PUD corpora. We compare syntactic relations, analyzing correspondences between POS tags as well as correspondences between single-edge relations in English and target-side dependency paths.

Parts of Speech
We begin by examining the mappings between the POS tags of corresponding tokens (see Appendix C.1 for the full percentage and count matrices). We find that En POS tags of content words are mostly stable in translations to Fr and Ru (the values on the main diagonals sum to 78% and 77% of the total number of word pairs, respectively). Notable exceptions are the negative particle not, which is in a one-to-many alignment in French with ne and pas; certain types of auxiliaries analyzed as verbs in both Ru and Fr; and proper nouns, which often get mapped to Fr nouns (cf. § 9 and the discussion in Samardžić et al., 2010).
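Such a POS confusion matrix, and the diagonal share quoted above, amount to a few lines of code. A minimal sketch over (source POS, target POS) pairs drawn from one-to-one alignments; the toy data below is illustrative, not from the corpus:

```python
from collections import Counter

def pos_confusion(aligned_pairs):
    """Count (source POS, target POS) pairs over one-to-one alignments
    and report the share of the main diagonal, i.e. how often the POS
    tag is preserved in translation."""
    counts = Counter(aligned_pairs)
    total = sum(counts.values())
    diagonal = sum(c for (s, t), c in counts.items() if s == t)
    return counts, diagonal / total

# Toy input: POS tags of four aligned content-word pairs.
pairs = [('NOUN', 'NOUN'), ('NOUN', 'VERB'), ('ADJ', 'NOUN'), ('VERB', 'VERB')]
counts, preserved = pos_confusion(pairs)
```

Normalizing each row of `counts` by its row total yields the percentage matrices reported in Appendix C.1.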
The En-Zh matrix presents more divergences, with only 65% of the alignment edges connecting tokens with the same POS: 11% of nouns were translated as verbs (the reverse mapping is found, albeit to a lesser extent, in all three corpora). Most such cases involve names of actions and agents (borrowing, ruler, etc.). En negative particles are split between Zh adverbs, verbs, and auxiliaries; adjectives are quite often mapped to nouns, which form parts of compounds (e.g., social media → shèjiāo méitǐ, lit. 'social-interaction media'). Adpositions expressing spatial relations (the only type of adposition we treat as content words) are predominantly mapped to adverbs.
The En-Ko matrix is even more divergent: only 62% of the alignment edges connect matching POS. The most striking property of the En-Ko POS matrix is that NOUN serves as a "sink" for other POS: 27% of En adverbs, 56% of En adjectives, and 60.5% of En verbs correspond to Ko nouns. For example, En trying (to do something) corresponds to Ko misu 'attempt'. As we will show in the next section, this is due to drastic syntactic divergences in En-Ko.
The En-Jp matrix is similar: 62.4% of the edges connect matching POS. Verbs are mostly translated as verbs (58.1%), which indicates a greater affinity between En and Jp in basic clause structure. However, adjectives still mostly turn into nouns (53.7%), and adverbs are quite likely to be translated by a noun (16.4%, vs. 25.8% for adverb→adverb).
Both Ko and especially Jp tend to leave En pronouns unaligned (15% and 59% respectively), upholding their reputations as "radically pro-drop" languages (Neeleman and Szendrői, 2007). Interestingly, Zh, another classical example of this phenomenon, loses only 9% of the pronouns. Ru, a mildly pro-drop language, loses 4% of the pronouns, while the non-pro-drop Fr loses only 2%. This demonstrates the fine granularity of distinctions an empirical approach to CLMD can yield.

Divergences in Syntactic Relations

Table 2 presents the matrices of target-side syntactic relations that correspond to single-edge source-side relations in the five parallel corpora. Several observations can be made. First, the En-Fr and En-Ru matrices are similar and are dominated by the elements on the main diagonal (60% of the total number of edges in En-Fr and 55% in En-Ru). An exception is compound (in En, mostly noun compounds), as Ru does not have a truly productive nominal compounding process and Fr compounds are annotated as other relations in UD (Kahane et al., 2017). The other three matrices are less dominated by the entries on the main diagonal (46% of the alignments in En-Zh, 32% in En-Ko, 25.8% in En-Jp) and show higher entropy in most rows, especially in nmod, amod, obl, and xcomp, compound again being a notable exception (entropy matrices for all relations can be found in Appendix D).
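The row entropies mentioned above admit a simple computation: for each source-side relation, take the entropy of the empirical distribution over target-side paths. A sketch with toy data (the function name and input format are our own):

```python
import math
from collections import Counter

def row_entropy(pairs):
    """Per-source-relation entropy (in bits) of the distribution over
    target-side paths, as in the appendix's entropy matrices."""
    rows, cells = Counter(), Counter()
    for src, tgt in pairs:
        rows[src] += 1
        cells[(src, tgt)] += 1
    entropy = {}
    for src, n in rows.items():
        probs = [c / n for (s, _), c in cells.items() if s == src]
        entropy[src] = -sum(p * math.log2(p) for p in probs)
    return entropy

# A relation that always maps to itself has zero entropy; one that is
# split evenly over two target paths has one bit.
H = row_entropy([('amod', 'amod'), ('amod', 'nmod'),
                 ('nsubj', 'nsubj'), ('nsubj', 'nsubj')])
```

Higher row entropy corresponds to the less diagonal, more scattered rows observed for En-Zh, En-Ko, and En-Jp.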
Adverbial clauses (advcl) have relatively low values on main diagonals and a high percentage of single edges corresponding to multi-edge paths. This reflects the wide semantic range of advcl: in addition to modifying the matrix predicate (died by drowning), they can also denote sequential and parallel events (published a paper sparking a debate). The latter two cases naturally give rise to conj and complement clauses (cf. published a paper and sparked a debate / published a paper to spark a debate), the most common other path in En-Ru and En-Fr respectively. As we show in § 5.2, there is also a converse phenomenon: sequences of events represented using coordinated clauses, ccomp, or xcomp in En are translated with advcl in East Asian languages.
Of particular interest are the differences between the En-Ko and En-Jp confusion matrices. Japanese and Korean are typologically very similar (SOV word order, topic prominence, agglutinative morphology), but there are also important differences at the level of usage. Thus, the adjective class in Korean is less productive, and translations often resort to relative clauses for the purposes of nominal modification. Another difference is that Japanese has few compounds, since En compounds are usually translated as nmod with a genitive particle, while Korean translates nearly all En compounds as compounds. Further differences are discussed in the next section.

Qualitative Analysis of Divergences
In this section, we analyze prominent cases of divergences revealed by applying our method, attempting to demonstrate how fine-grained CLMD may be detected from the confusion matrices and shedding light on what challenges are involved in bridging these divergences (e.g., for the purposes of MT or cross-lingual transfer). Some of the divergences arise due to real differences between grammars; others are largely due to inconsistent application of the UD methodology.

Nominal and Adjectival Modifiers
When inspecting the translation correspondents of adjectives, we find that while in En-Fr and En-Ru the adjective classes are mostly overlapping, this is not the case for Zh, Ko and Jp. Instead, translation into these languages shifts probability mass from adjectives to nouns: nouns are hardly ever translated to adjectives, but adjectives are more likely to be translated to nouns than remain adjectives. This trend is related to a preference to translate adjectives into possessives (e.g., Korean company → Jp: Kankoku no kaisha lit. 'a company of Korea') or compounds (e.g., European market → Jp: Oshū ichiba lit. 'Europe market').

Nominal Subjects
The confusion matrix shows that En nsubj has very different multi-edge mappings into the European languages (Fr and Ru) as opposed to the East Asian ones (Zh, Jp, and Ko). The most common other path for both Russian and French is xcomp+nsubj, which is easy to explain: the PUD corpora of these languages "demote" fewer auxiliary predicates than English (criteria for demotion are formulated in terms of superficial syntax and differ between languages) and more often place the dependent predicate as the root. Therefore, in constructions like he could do something, the direct edge between the subject and the verb of the dependent clause is replaced with two edges going through the modal predicate. 7 In Zh, Ko, and Jp, however, there is another issue: sequential events described using coordinated conjuncts and xcomp in En are analyzed as being described with temporal or causal subordinate clauses (Kipling met and fell in love with Florence Garrard → Ko: Kipeulring manna [meet.SUBORD] sarange [in.love] ppajyeosseumyeo [fell], lit. 'having met, fell in love').
This makes the direct nsubj edge in En correspond to a Ko nsubj edge within a subordinate clause, and thus to a nsubj+advcl path. Given that not all coordinated verbs are translated using a subordinate clause in Ko and Zh, bridging these divergences is likely to require more than a simple tree-transformation rule, and possibly a refinement of UD's categories into more abstract linkage types.

Modal Auxiliaries in Korean
UD treats En modal verbs, such as can or may, as aux, dependent on the lexical verb (e.g., could ←aux– do). Corresponding verbs in other languages are often treated as simple verbs (for example, all Ru modal verbs are simple verbs in UD).
Even more drastically, Ko routinely expresses this semantics using an existential construction with the literal meaning '(for) X there was a possibility of doing Y' (instead of X could do Y), which converts the En aux into nsubj+acl. In this case, a tree transformation seems sufficient to bridge the divergence.
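As an illustration of the kind of tree-transformation rule alluded to here, the sketch below promotes a modal analyzed as aux to head status and demotes the lexical verb to xcomp, i.e., it rewrites the English-style analysis into the Russian-style one described above. This is a toy rule over head arrays, not the bridging procedure of the paper:

```python
def promote_modal(heads, deprels):
    """Toy tree transformation: promote a modal analyzed as aux to head
    status, demoting the lexical verb to xcomp (the Russian-style
    analysis of modals). Operates on 0-based head arrays; -1 marks the
    root. Purely illustrative, not the paper's code."""
    heads, deprels = list(heads), list(deprels)
    for aux, h in enumerate(heads):
        if deprels[aux] == 'aux':
            verb = h
            # The modal takes over the lexical verb's head and relation...
            heads[aux], deprels[aux] = heads[verb], deprels[verb]
            # ...and the verb becomes its xcomp dependent.
            heads[verb], deprels[verb] = aux, 'xcomp'
            # Reattach the subject to the promoted modal.
            for i, (hh, d) in enumerate(zip(heads, deprels)):
                if hh == verb and d == 'nsubj':
                    heads[i] = aux
            break
    return heads, deprels
```

On "he could do something" (do as root, he as nsubj, could as aux, something as obj), the rule makes could the root, do its xcomp, and reattaches he to could, reproducing the xcomp+nsubj path discussed in § 5.2.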

nmod→acl+X in Korean
Ko also differs from other languages in the extent to which it uses relative clauses for nominal modification. Table 2 shows that nmod has a high percentage of "other" mappings (48%). Investigation of this long tail shows that to a large extent it consists of acl-based constructions: acl+advmod, acl+nsubj, acl+obj. Added to acl, the cumulative share of acl-based constructions is on par with compound, the main correspondent of this relation for non-possessive nmod (possessive nmod are the only ones that map to nmod in Ko). This discrepancy is due to the fact that Ko nearly obligatorily adds contextually predictable predicates to oblique relations, such as actions [taken] in Crimea or people [being] without children. The Korean PUD does not demote these verbs to functional-word status (such an approach is advocated for in Gerdes and Kahane, 2016).

7 Cf. also the 23 En nsubj edges mapped to Ru nsubj+obl. Inspection of these sentences reveals that the CLMD can be ascribed to metaphorical usage (e.g., the sense of read employed in the post reads has no direct correspondent in Ru). Some such cases can be disambiguated using existing annotation schemes.

Table 1: Frequencies of divergence types defined by Dorr (1994). Row headings are explained in § 6.

Revisiting Dorr's Divergences
We turn to show that the types of the seminal classification of divergences from Dorr (1994) can be, with a single exception, recast in terms of UD divergences. We quote the original formulations of the divergences illustrated through English-Spanish or English-German examples.
Thematic divergence: E: I like Mary ⇔ S: María me gusta a mí 'Mary pleases me.' In UD, this corresponds to the situation where the original obj or obl becomes the nsubj and vice versa. The divergence will correspond to a CSR of type (nsubj, obj) or (nsubj, obl). A "full" thematic divergence will also involve the inverse divergence, (obj, nsubj) or (obl, nsubj). 8

8 The list of examples we can discuss goes on. For example, while investigating the cross-linguistic patterning of English advcl, we noticed that it often gets mapped to ccomp in French and acl in Russian. Both divergent annotations seem to be erroneous, as the sentences they appear in are covered by the definition of advcl provided in the UD manual. However, the French case is interesting in that the source advcl in question can be characterized semantically: instead of denoting a secondary action, they reflect a sequence of events or parallel scenes (e.g., Columbus sailed across the Atlantic... sparking a period of European exploration of the Americas). Another problem is presented by multi-word expressions analyzed as proper nouns where all tokens have the same POS tag. The UD manual advises retaining the original parts of speech in proper nouns consisting of phrases (e.g., Cat on a Hot Tin Roof) but allows treating "words that are etymologically adjectives" as PROPN in names such as the Yellow Pages. When such names are translated, PROPN get reanalyzed as ADJ, NOUN, etc., producing spurious CLMD.
Promotional divergence: E: John usually goes home ⇔ S: Juan suele ir a casa 'John tends to go home.' This corresponds to the situation where the original root predicate becomes an xcomp, and the original advmod takes its place as the root. Corresponding CSR type: (advmod, xcomp).
Demotional divergence: E: I like eating ⇔ G: Ich esse gern 'I eat likingly.' The original xcomp becomes the root predicate, and the original root predicate is demoted to the position of an advmod. The relevant CSR type is (xcomp, advmod).
Structural divergence: E: John entered the house ⇔ S: Juan entró en la casa 'John entered in the house.' The original obj becomes an obl. CSR type: (obj, obl).
Conflational divergence: E: I stabbed John ⇔ S: Yo le di puñaladas a Juan 'I gave knife-wounds to John.' The original root predicate is in a one-to-many alignment with a combination of a root predicate and its obj.
Lexical divergence: E: John broke into the room ⇔ S: Juan forzó la entrada al cuarto 'John forced (the) entry to the room.' Divergences of this type arise whenever aligned words have at best partially overlapping semantic content and never appear on their own but always with other divergences. The information necessary to ascertain the degree of word-meaning overlap is not embedded into UD or any other cross-lingual annotation scheme; therefore we were unable to provide a formal interpretation of this type of divergence.
Frequencies of Dorr's divergences in PUD are presented in Table 1 (except for Lexical divergences, which are hard to formalize). It is evident that these types account for only a small portion of the encountered divergences, a point already made for En-Zh in DX17. It seems, then, that "hand-crafted" translation divergences, however insightful they may be, have received attention disproportionate to their empirical frequency.

Perspectives for Automation
One of the strengths of our approach is that it relies only on UD parses and alignments, for which automatic tools exist for many languages. To demonstrate the feasibility of an automated protocol, we conducted an analysis of the WMT14 En-Zh News Commentary corpus. 9 We used TsinghuaAligner (Liu and Sun, 2015) and pretrained English and Chinese UD parsers from the StanfordNLP toolkit (Qi et al., 2018). To verify that the aligner is adequate for the task, we aligned the En-Zh PUD corpus pair and computed the precision and recall of the predicted edges with respect to the edges corresponding to content words. 10 The results (P=0.86, R=0.32) indicate that the automated approach recovers around a third of the information obtained through manual alignment, with reasonable precision. Importantly, we find recall to be nearly uniform across source edge types, which suggests that the low recall can be mitigated by using a larger corpus without biasing the results.
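The precision/recall check against the manual alignment reduces to set operations over edge pairs. A minimal sketch (illustrative input; not the evaluation script used here):

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted alignment edges against a
    manually aligned gold standard (edges as (src, tgt) index pairs)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: edges found in both
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

Computing recall separately per source edge type, as done above, is the same calculation restricted to gold edges whose source token bears a given dependency label.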
The POS and edge-type confusion matrices built from this experiment are very similar to the ones reported in this paper (save for compound, which is not produced by the Stanford Zh parser), and are not reproduced here (they can be found in the Supplementary Materials).

Applicability for Zero-Shot Parsing
We now demonstrate the applicability of our method to analyzing the performance of a downstream cross-lingual transfer task. We consider zero-shot cross-lingual parsing (Ammar et al., 2016; Schuster et al., 2019) as a test case and investigate to what extent the performance of a zero-shot parser on a given dependency label can be predicted from the label's stability in translation. As test sets we use the test sets of the GSD UD corpora for the five languages (Ru, Fr, Zh, Ko, and Jp), as well as the corresponding PUD corpora. We train a parser following the setup of Mulcaire et al. (2019): we use pretrained multilingual BERT (Devlin et al., 2019), feeding its output embeddings into a biaffine-attention neural UD parser (Dozat and Manning, 2017) trained on the English EWT corpus. We evaluate the parser's ability to predict relation types by computing F-scores for each dependency label (except labels corresponding to function words, which were generally not aligned). Appendix E gives full implementation details.
We start by computing Spearman correlations between F-scores and the PRESERVATION indices, defined as the proportion of identity mappings in the confusion matrices for each corpus (e.g., PRESERVATION for acl in Ru is 0.48, while in Jp it is 0.37). The correlations are very strong for some languages and noticeable for others (ρ = 0.62, 0.75, 0.31, 0.42, 0.77 for Ru, Fr, Zh, Ko, and Jp, respectively, on the GSD test sets, and ρ = 0.70, 0.82, 0.72, 0.84, 0.68 on PUD).
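The correlation itself is standard; for completeness, a dependency-free sketch of Spearman's ρ over per-label PRESERVATION indices and F-scores (the numbers below are toy values, not the reported ones):

```python
def _ranks(xs):
    # Average ranks, so that ties are handled as in the standard test.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical per-label values: PRESERVATION vs. zero-shot F-score.
preservation = [0.48, 0.37, 0.62, 0.55]
f_scores = [0.51, 0.40, 0.70, 0.58]
rho = spearman(preservation, f_scores)
```

In practice one would use a library routine (e.g., SciPy's `spearmanr`); the point is only that ρ compares the rank orderings of the two per-label vectors.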
We hypothesize that the preservation of a relation in translation is related to the ability of a zero-shot parser to predict it. In order to control for obvious covariates, we introduce two control variables: (1) SOURCE-SIDE HARDNESS (test-set F-scores attained by the parser on English dependency relations) and (2) TARGET-SIDE HARDNESS (F-scores attained by a parser trained on the target-language UD GSD corpus, evaluated on the target-language test set). We use a mixed-effects model with PRESERVATION, SOURCE-SIDE HARDNESS, and TARGET-SIDE HARDNESS as fixed effects, random intercepts for language, and per-relation F-scores as the dependent variable. We then use a likelihood-ratio test to compute p-values for the difference in predictive power between the model without PRESERVATION and the one with it. The p-value (with Holm correction) is highly significant (< 0.001) for the PUD corpora; for GSD it is significant at p = 0.02.
These results suggest that morphosyntactic differences between languages, as uncovered by our method, play a role in the transferability of parsers across languages. This also underscores the potential utility of bridging CLMD for improving syntactic transfer across languages.

Discussion
The presented methodology gives easy access to different levels of analysis. On the one hand, by focusing on content words, the approach abstracts away from much local-syntactic detail (such as reordering or adding/removing function words). At the same time, the methodology and datasets provide the means to investigate essentially any kind of well-defined CLMD. Indeed, since function words in UD tend to be dependents of content words, we may analyze the former by considering the distribution of function-word types attached to each type of content word. Moreover, sub-typing dependency paths based on their linear direction allows investigating word-order differences. 11

Other than informing the development of cross-lingual transfer learning, our analysis directly supports the validation of UD annotation. For example, we reveal inconsistencies in the treatment of multi-word expressions across languages. Thus, the translations of many NPs with adjectival modifiers, such as Celtic sea or episcopal church, are analyzed as compounds. Languages such as Ru, lacking a truly productive nominal-compound relation, carve this class up based mostly on the POS of the dependent element (e.g., episcopal corresponds to a Ru amod), its semantic class (e.g., compounds with cardinal directions are Ru amods), and whether the dependent element itself has dependents (these mostly correspond to Ru nmods). Our method can be used to detect and bridge such inconsistencies.
In conclusion, we note that our analysis suggests that considerable entropy in the mapping between the syntactic relations of the source and target sides can be reduced by removing inconsistencies in the application of UD and, perhaps more importantly, by refining UD with semantic distinctions that will normalize corresponding constructions across languages to have a similar annotation. This will simultaneously advance UD's stated goal of "bringing out cross-linguistic parallelism across languages" and, as our results on zero-shot parsing suggest, make it more useful for cross-lingual transfer.

Table 2: Percentages of corresponding syntactic relations of UD edge types connecting content words in the five corpora. Rows correspond to En and sum to 100%; columns correspond to target-side paths. "Collapsed" are cases where the edge is collapsed to a single node. "MCOP" stands for most common other target-side path. See Appendix C.2 for raw counts.