Hybrid Enhanced Universal Dependencies Parsing

This paper describes our system to predict enhanced dependencies for Universal Dependencies (UD) treebanks, which ranked 2nd in the Shared Task on Enhanced Dependency Parsing with an average ELAS of 82.60%. Our system uses a hybrid two-step approach. First, we use a graph-based parser to extract a basic syntactic dependency tree. Then, we use a set of linguistic rules which generate the enhanced dependencies for the syntactic tree. The application of these rules is optimized using a classifier which predicts their suitability in the given context. A key advantage of this approach is its language independence, as rules rely solely on dependency trees and UPOS tags which are shared across all languages.


Introduction
Parsing Enhanced Universal Dependencies (EUD) (Schuster and Manning, 2016) is an interesting extension of dependency parsing. EUDs provide deeper syntactic information which can be crucial for NLP applications that build on syntactic analysis.
The shared task on EUD parsing (Bouma et al., 2020) provided the platform to develop and compare various systems. Our team participated with a hybrid system (machine learning/rule-based) which came second in both metrics, ELAS (82.6%) and EULAS (84.6%).

Related Work
Whereas basic dependencies form strict surface syntax trees, enhanced dependencies make implicit syntactic links explicit in constructions like coordination, raising/control constructions or relative clauses. EUDs also enrich existing basic dependencies such as obl and nmod relations by adding information about the adposition used and the morphological case. Finally, EUDs propose a syntactic annotation for elided words, which are absent from the actual sentence. Even though the basic dependency tree (apart from the orphan relation) is part of the EUD graph, the latter is no longer a tree, since individual tokens can have more than one head. Most EUDs can be predicted deterministically; others, notably the prediction of EUDs for elided words, are more complex.

System description
The amount of data in the Universal Dependencies treebanks (Nivre et al., 2016) used for the shared task which is annotated with enhanced dependencies (other than copied basic dependencies) is small. In total, the training treebanks contain about 5.1 million words; only 5.6% of these have a second enhanced dependency attached to them (the first being the copied basic dependency). Another 7.2% and 8.3% of words have an enhanced dependency like obl:... or nmod:... which corresponds to the basic dependency but additionally encodes the adposition and the morphological case (if existing in the language in question). In total, only 21.1%, or about 1 million words, carry any non-basic enhanced dependency.
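Proportions like these can be recomputed directly from the DEPS column of the CoNLL-U files. The following sketch is our own illustration, not part of the submitted system, and the counting criteria are simplified (a second head vs. a relabelled single head):

```python
def eud_stats(conllu_lines):
    """Count tokens whose DEPS column adds information beyond HEAD/DEPREL.

    Returns (total, extra_head, relabelled): tokens with a second enhanced
    head, and tokens whose single enhanced label differs from the basic one
    (e.g. obl:<adp>:<case> refining a plain obl).
    """
    total = extra_head = relabelled = 0
    for line in conllu_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and sentence breaks
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty (elided) nodes
        total += 1
        basic = f"{cols[6]}:{cols[7]}"          # basic HEAD:DEPREL pair
        pairs = cols[8].split("|") if cols[8] != "_" else []
        if len(pairs) > 1:
            extra_head += 1                     # a second enhanced head
        elif pairs and pairs[0] != basic:
            relabelled += 1                     # enriched label, same head
    return total, extra_head, relabelled
```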
The enhanced dependencies address specific and well-known linguistic phenomena and are relatively deterministic once the basic dependency tree is available. For this reason, we decided to use a hybrid system: a graph-based parser first produces a dependency tree, and a rule system then uses the generated dependency tree to determine the enhanced dependencies. The latter uses a (learned) filter to control the application of rules in certain contexts. The system functions as a pipeline (cf. Figure 1); thus errors in earlier parts of the pipeline will impact the results of the following components.

Tokenization
We trained a tokenizer per language using the training files ([a'] in Fig. 1) of each treebank. For languages with more than one treebank, we chose the one with the largest training file. A Python postprocessing script ([b] in Fig. 1) deals with obvious tokenization errors, such as quotes concatenated to letters (e.g. word" or "word), and separates these tokens into two.
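A minimal sketch of what such a quote-detachment step could look like is given below; the regular expressions and the recursive splitting are our assumptions, not the authors' actual postprocessing script:

```python
import re

# Character class of straight and typographic quotes to detach (assumption).
QUOTE = r'["\u201c\u201d\u00ab\u00bb]'

def split_quotes(token):
    """Return a list of tokens with leading/trailing quotes split off."""
    m = re.match(rf'^({QUOTE})(.+)$', token)   # leading quote glued to a word
    if m:
        return [m.group(1)] + split_quotes(m.group(2))
    m = re.match(rf'^(.+?)({QUOTE})$', token)  # trailing quote glued to a word
    if m:
        return split_quotes(m.group(1)) + [m.group(2)]
    return [token]
```

For example, `split_quotes('word"')` yields `['word', '"']`, turning one mis-tokenized unit into two tokens.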

Tagging and parsing
To tag and parse the texts, we use a modified version of UDPipeFuture ([c] in Fig. 1) (Straka, 2018), the winner of the 2018 Shared Task on Dependency Parsing in terms of the Morphology-aware Labeled Attachment Score (MLAS) metric. In our version we also use contextual embeddings; however, instead of ELMo (Peters et al., 2018), we experimented with a range of contextual embeddings, either multilingual, such as BERT (Devlin et al., 2019) or XLM-R (Conneau et al., 2019), or language-specific models like RoBERTa-large (English, Liu et al. (2019)) and CamemBERT (French, Martin et al. (2019)). Experiments with these embeddings on the data of the CoNLL 2018 Shared Task (Zeman et al., 2018) show that XLM-R outperforms the best score for nearly all treebanks of the 2018 Shared Task. The Content-Word Labeled Attachment Score (CLAS) scores for these experiments on the treebanks used for the Enhanced Dependencies Shared Task are given in Table 1. Although the CoNLL 2018 Shared Task is based on UD v2.2, we were able to produce similarly promising results with the data provided by the current shared task, based on UD v2.5.
Notes: Letters in brackets refer to the architecture diagrams shown in Figures 1 and 4; identical letters refer to the same component. MLAS is a metric inspired by CLAS (Zeman et al., 2018) which also takes into account POS tags and morphological features. CLAS is a variant of the classical Labeled Attachment Score (Nivre and Fang, 2017); it only takes into account dependency relations between content words.
To prepare for the shared task, we first merged treebanks of the same language when more than one was available: this was the case for Czech, Dutch, Estonian and Polish. Then, we trained and tested the tagging and parsing with UDPipeFuture using all contextual word embedding models available for the given language (unless the treebank did not provide a dev file, as is the case e.g. for the PUD treebanks). In addition to the multilingual contextual embeddings BERT and XLM-R, we also tested language-specific transformers such as Arabic BERT, Slavic BERT, Finnish BERT, CamemBERT (French, Martin et al. (2019)), Italian BERT and Dutch BERT. Evaluations on the development corpora showed that XLM-R gave the best results for nearly all languages, with some exceptions: for Arabic (Arabic BERT), Bulgarian (Slavic BERT), Finnish (Finnish BERT), French (CamemBERT), Italian (Italian BERT) and Dutch (Dutch BERT), the language-specific versions of BERT gave better results in terms of Labeled Attachment Score (LAS) for the parsing.
To obtain a language-independent system which can predict enhanced dependencies for any language, we need homogeneous annotations in all the treebanks. Since these annotations, which require time-consuming manual work, are currently missing in many UD treebanks, and the existing annotations are not always homogeneous, we opted for a rule-based system. For example, dep is frequently used as an additional enhanced dependency in the Czech and Arabic treebanks. Other differences stem from language differences: e.g. in Finnish-TDT and Polish-PDB, case information is sometimes given with the nmod:poss enhanced dependency, which is absent in treebanks for languages without morphological case. Similarly, the conj enhanced dependency is enriched with the lemma of the cc relation only in the treebanks of Dutch, English, Italian and Swedish. Similar differences can be observed for relative pronouns or for the case information of oblique nominals (obl:<prep>:<case>) and nominal modifiers (nmod:<prep>:<case>). The French-Sequoia treebank frequently employs nmod:enh, amod:enh, nsubj:enh and nsubj:pass:enh, which are not defined in the guidelines.
Our script takes into account these language specific differences. For example, it discards prepositions and case information in nmod/obl enhanced dependencies for languages where this information has not been annotated. In general, the script mainly exploits basic dependencies and UPOS, i.e. universal information, to determine the enhanced dependencies.
The script first initialises the enhanced dependencies by copying all basic dependencies (except orphan). In a second step, we look for all words with an obl or nmod relation and check whether they have a case dependent. If so, we enrich the enhanced dependency with the lemma of this dependent. If present, we add the Case feature to obl:<ADP> and nmod:<ADP> as well.
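These two steps can be sketched as follows, assuming a sentence is a list of token dicts with CoNLL-U-like keys (id, lemma, head, deprel, feats); the data structures and names are illustrative, not the actual script:

```python
def enhance(sentence):
    """Copy basic dependencies, then enrich obl/nmod with adposition and Case."""
    # Step 1: copy every basic dependency except orphan.
    for tok in sentence:
        tok["deps"] = [] if tok["deprel"] == "orphan" else [(tok["head"], tok["deprel"])]
    # Step 2: enrich obl/nmod labels with the case dependent's lemma
    # and, if present, the morphological Case feature.
    for tok in sentence:
        base = tok["deprel"].split(":")[0]
        if base not in ("obl", "nmod"):
            continue
        adp = next((d["lemma"] for d in sentence
                    if d["head"] == tok["id"] and d["deprel"] == "case"), None)
        if adp:
            label = f"{base}:{adp.lower()}"
            case = tok.get("feats", {}).get("Case")
            if case:
                label += f":{case.lower()}"     # e.g. obl:auf:dat
            tok["deps"] = [(tok["head"], label)]
    return sentence
```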
For coordinations of nouns, we simply take the head of a word with a conj relation (cf. relation (A) in Figure 2) and determine the dependency relation of this head (relation (B) in Fig. 2). With this information we can add the enhanced dependency relation of the coordinated noun to its enhanced head (relation (C) in Fig. 2). We also enrich the conj relation with the lemma of the coordinating conjunction (relation (D) in Fig. 2).
In order to insert elided nodes, we interpret the orphan relation. Whereas the insertion itself works fine, we were not able to correctly predict the needed enhanced dependencies for elided nodes, and abandoned this prediction for the shared task.
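The coordination rule can be sketched in the same token-dict style (keys are our assumptions; the comments map to relations (A)-(D) in Figure 2):

```python
def propagate_conj(sentence):
    """Attach each conjunct to its first conjunct's head with the same relation."""
    by_id = {t["id"]: t for t in sentence}
    for tok in sentence:
        if tok["deprel"] != "conj":         # (A): a coordinated word
            continue
        first = by_id[tok["head"]]          # the first conjunct
        # (B)/(C): copy the first conjunct's relation to its head onto the
        # coordinated word, giving it a second enhanced head.
        tok.setdefault("deps", []).append((first["head"], first["deprel"]))
        # (D): enrich conj with the coordinating conjunction's lemma.
        cc = next((d["lemma"] for d in sentence
                   if d["head"] == tok["id"] and d["deprel"] == "cc"), None)
        if cc:
            tok["deps"].append((tok["head"], f"conj:{cc.lower()}"))
    return sentence
```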
We validated the rules for enhanced dependency extraction on gold basic dependencies from the validation corpora to avoid the accumulation of errors from the tagging and parsing step. This yielded encouraging results presented in Table 3.
To further improve the performance of the rule-based approach, and to take into account the errors of the tagging/parsing step, we add the ICSIBoost classifier (Favre et al., 2008).
This classifier (cf. [g] in Figures 1 and 4) decides whether a rule should be applied in a given context. For this task, we trained a single classifier using the following features:
• rule name
• treebank language
• enhanced dependency label
• UPOS of the enhanced dependency head
• (basic) dependency relation of the enhanced dependency head
• distance (in words) to the basic dependency head
• distance to the enhanced dependency head
To generate the training corpus for ICSIBoost, we ran our enhance-script ([d2] in Fig. 4) on the training CoNLL-U files ([a'] in Fig. 4, with gold UPOS and basic dependencies) of each language to obtain the list of features together with the information whether the rule produced a correct EUD in the given context ([d'] in Fig. 4). We then trained ICSIBoost on this list to obtain a classifier model ([f'] in Fig. 4), which we integrated into the enhance-script ([d3] in Fig. 4) to obtain more accurate predictions.
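One training example for this filter could be assembled roughly as follows; the feature names mirror the list above, while the function signature and the flattening of the ICSIBoost input format are our own simplifications:

```python
def rule_features(rule_name, lang, tok, ehead, label):
    """Build one feature dict for a rule application.

    tok is the dependent token, ehead the proposed enhanced head
    (token dicts with id/head/upos/deprel keys, as assumed throughout).
    """
    return {
        "rule": rule_name,                            # which rule fired
        "lang": lang,                                 # treebank language
        "eud_label": label,                           # proposed enhanced label
        "ehead_upos": ehead["upos"],                  # UPOS of enhanced head
        "ehead_deprel": ehead["deprel"],              # basic deprel of that head
        "basic_head_dist": tok["id"] - tok["head"],   # distance to basic head
        "ehead_dist": tok["id"] - ehead["id"],        # distance to enhanced head
    }
```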
To find the best threshold for each language, we ran our script ([d3]) on the development CoNLL-U files with various thresholds, using UPOS and basic dependencies predicted by UDPipeFuture with contextual embeddings (the ICSIBoost training setup is shown in Figure 4). Rules whose score fell below the threshold in a given context were not applied. It turned out that thresholds between 30% and 60% gave the best results in terms of ELAS (cf. Table 4). Running our entire pipeline on gold UPOS and basic dependencies shows that we can predict enhanced dependencies with very high precision (cf. Table 5).
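The per-language threshold search can be sketched as a simple sweep; score_elas stands in for running the enhancer plus the official evaluation script, and the data layout is an assumption of ours:

```python
def best_threshold(dev_examples, score_elas, thresholds=range(0, 100, 5)):
    """Sweep percentage thresholds; keep the one maximising the dev score.

    dev_examples: rule applications with a classifier "score" in [0, 1];
    score_elas: callable scoring the kept applications (ELAS stand-in).
    """
    best, best_score = None, -1.0
    for th in thresholds:
        # Apply a rule only when the classifier score clears the threshold.
        kept = [ex for ex in dev_examples if ex["score"] * 100 >= th]
        elas = score_elas(kept)
        if elas > best_score:
            best, best_score = th, elas
    return best, best_score
```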
Applying the entire pipeline to the raw text files provided for the evaluation produced the results shown in Table 6. Since the script which generates the enhanced dependencies depends on the basic dependencies and, indirectly, on the UPOS tags, a lower LAS yields a lower ELAS. By definition, EULAS is always slightly above ELAS. We do not exploit XPOS tags, since they are too language-specific; thus the bad results for Finnish XPOS tags do not have an impact on the E(U)LAS scores (Table 6). Interestingly the poor sentence segmentation re-

Conclusion and perspectives
Considering that the training data was heterogeneous, partially incomplete, and in general not very voluminous, our hybrid machine-learning (ML)/rule-based approach gave very good results in the shared task. A possible extension would be the processing of elided nodes. Even if, in the long term, a purely ML-based approach may prove more efficient, our language-independent system can at least help to pre-annotate existing UD treebanks which, after human validation, can form the basis of an ML approach to predicting enhanced dependencies.