Improving the Performance of UDify with Linguistic Typology Knowledge

UDify is a state-of-the-art language-agnostic dependency parser trained on a polyglot corpus of 75 languages. This multilingual modeling enables the model to generalize to unknown or lesser-known languages, improving performance on low-resource languages. In this work we use the linguistic typology knowledge available in the URIEL database to improve the cross-lingual transfer ability of UDify even further.


Introduction
State-of-the-art approaches to dependency parsing are supervised and require large manually annotated datasets for training, which limits their utility to only a few high-resource languages. Multilingual modeling, which involves training a model on a mixed polyglot corpus of high-resource source languages and applying it to a low-resource target language, is an effective way to circumvent this data-sparsity issue. Just as proficiency in previously learned languages can enhance a speaker's ability to learn a new one (Abu-Rabia and Sanitsky, 2010), a model trained on a multilingual dataset can learn to generalize to unknown or lesser-known languages. UDify (Kondratyuk and Straka, 2019) is a state-of-the-art mBERT-based language-agnostic dependency parser that takes advantage of multilingual modeling to improve its performance on low-resource languages. Its authors trained it on a joint polyglot corpus created by concatenating all training treebanks available in UD v2.3, and evaluated it individually on every UD v2.3 test treebank. Their results show that on the dependency parsing task, UDify outperforms its monolingual baseline, UDPipe Future (Straka, 2018), by a large margin, especially for low-resource languages, since the model benefits significantly from the cross-lingual transfer learning that joint polyglot training provides. However, UDify's performance on low-resource languages (those under-represented in the polyglot training corpus) is still much lower than its performance on high-resource languages that are well represented in the corpus. In this work, we use linguistic typology knowledge to improve UDify's cross-lingual transfer ability even further, thereby significantly narrowing the gap between the model's performance on high-resource and low-resource languages.
We inject the linguistic typology knowledge available in the URIEL database (Littell et al., 2017) into the UDify model by adding an auxiliary task of linguistic typology feature prediction within its multitasking framework. Sections 3 and 4 describe the model in more detail.

Related Work
Cross-lingual model-transfer approaches to dependency parsing such as (McDonald et al., 2011; Cohen et al., 2011; Duong et al., 2015; Guo et al., 2016; Vilares et al., 2015; Falenska and Çetinoglu, 2017; Mulcaire et al., 2019; Vania et al., 2019; Shareghi et al., 2019) involve training a model on high-resource languages and subsequently adapting it to low-resource languages. Participants in the CoNLL 2017 shared task (Zeman et al., 2017) and the CoNLL 2018 shared task (Zeman et al., 2018) also contributed numerous approaches to dependency parsing of low-resource languages. Some approaches (Naseem et al., 2012; Täckström et al., 2013; Barzilay and Zhang, 2015; Wang and Eisner, 2016a; Rasooli and Collins, 2017; Ammar, 2016; Wang and Eisner, 2016b) have indeed used linguistic typology to facilitate cross-lingual transfer between source and target languages. However, all of these approaches feed the linguistic typology features directly into the respective model, whereas we inject the typology knowledge into UDify through multitask learning. Inducing typology knowledge through multitask learning rather than feeding it directly alongside the word embeddings has the following advantages:
1. The model can also be applied to low-resource languages for which many typology feature values are unknown or missing.
2. The auxiliary task should also help improve performance on the main dependency parsing task, since it makes the model pay special attention to the syntactic typology (especially word-order typology) of the language being parsed while predicting dependency relations.
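To illustrate advantage 1, one plausible way to train an auxiliary typology head when many URIEL values are missing is to mask the per-feature loss so that unknown features contribute nothing. This is a sketch of that idea, not necessarily the exact mechanism used in our implementation; the function name and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def masked_typology_loss(pred, gold, known_mask):
    """Binary cross-entropy over URIEL features, counting only known values.

    pred:       (B, N) sigmoid probabilities for N binary typology features
    gold:       (B, N) 0/1 gold values (arbitrary where unknown)
    known_mask: (B, N) 1.0 where the URIEL value exists for that language
    """
    per_feat = F.binary_cross_entropy(pred, gold, reduction="none")
    # zero out unknown features, then average over the known ones only
    return (per_feat * known_mask).sum() / known_mask.sum().clamp(min=1)

# Toy example: uniform 0.5 predictions, some feature values unknown
pred = torch.full((2, 4), 0.5)
gold = torch.tensor([[1., 0., 1., 0.], [0., 1., 0., 1.]])
mask = torch.tensor([[1., 1., 0., 0.], [1., 0., 0., 0.]])
loss = masked_typology_loss(pred, gold, mask)   # -log(0.5) ≈ 0.693
```

Because the mask removes unknown features from the loss, a language with sparse URIEL coverage still contributes a well-defined gradient.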

UDify
UDify is a multitasking, multilingual-BERT-based model that performs four key language-processing tasks simultaneously, namely UPOS tagging, UFeats tagging, lemmatization, and dependency parsing. The model uses a single shared mBERT-based encoder and four task-specific decoders, one per task. The mBERT encoder takes the entire sentence as input, tokenizes it with the pre-trained WordPiece tokenizer (Wu et al., 2016), and outputs mBERT-based (Wu and Dredze, 2019) contextualized embeddings for each word in the input sentence. We refer readers to the original UDify paper (Kondratyuk and Straka, 2019) for a detailed description of how these contextualized embeddings are computed and fine-tuned. The decoders for the UPOS-tagging and UFeats-tagging tasks adopt a standard sequence-tagging architecture with a softmax layer on top: they take the contextual embedding produced by the mBERT encoder for each word in the input sentence and predict its UPOS/UFeats tag. For lemmatization, the model likewise uses a standard sequence tagger that predicts, for each word, a class tag representing a unique edit script. An edit script is simply the sequence of character operations that transforms a word form into its lemma form.
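The shared-encoder/multi-decoder layout described above can be sketched as follows. This is an illustrative stand-in only: a tiny Transformer encoder replaces mBERT, and the vocabulary and label-set sizes are made-up placeholders, not UDify's actual configuration.

```python
import torch
import torch.nn as nn

class MultiTaskTaggerSketch(nn.Module):
    """Sketch of UDify's layout: one shared encoder, one decoder per task.

    Each tagging decoder is a softmax classifier over the per-token
    contextual embeddings, as described for the UPOS, UFeats and
    lemma (edit-script class) decoders.
    """

    def __init__(self, vocab_size=1000, d_model=64,
                 n_upos=17, n_ufeats=200, n_edit_scripts=500):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # stand-in for mBERT
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        # one classifier head per tagging task, all sharing the encoder
        self.upos_head = nn.Linear(d_model, n_upos)
        self.ufeats_head = nn.Linear(d_model, n_ufeats)
        self.lemma_head = nn.Linear(d_model, n_edit_scripts)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))         # (B, T, d_model)
        return {"upos": self.upos_head(h),
                "ufeats": self.ufeats_head(h),
                "lemma_script": self.lemma_head(h)}

model = MultiTaskTaggerSketch()
logits = model(torch.randint(0, 1000, (2, 8)))   # batch of 2, length 8
```

Because all heads read the same encoder states, gradients from every task fine-tune the shared encoder jointly, which is the core of the multitasking setup.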
For dependency parsing, the model adopts the popular deep biaffine architecture for graph-based parsing (Dozat and Manning, 2016), with the LSTM encoder replaced by the shared mBERT encoder.
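The biaffine arc-scoring step can be sketched as below. The MLP and projection sizes are illustrative placeholders; the input `h` stands for the contextual embeddings coming from the shared encoder.

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Sketch of the deep biaffine arc scorer (Dozat and Manning, 2016).

    Separate head/dependent MLP projections of the encoder states are
    combined by a biaffine form to score every (head, dependent) arc.
    """

    def __init__(self, d_model=64, d_arc=32):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(d_model, d_arc), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(d_model, d_arc), nn.ReLU())
        self.W = nn.Parameter(torch.randn(d_arc, d_arc) / d_arc ** 0.5)
        self.b = nn.Parameter(torch.zeros(d_arc))  # bias term over heads

    def forward(self, h):
        # h: (B, T, d_model) contextual embeddings from the shared encoder
        H = self.head_mlp(h)                       # head representations
        D = self.dep_mlp(h)                        # dependent representations
        # score[b, i, j] = D_i^T W H_j + b^T H_j  (word j as head of word i)
        return D @ self.W @ H.transpose(1, 2) + (H @ self.b).unsqueeze(1)

scores = BiaffineArcScorer()(torch.randn(2, 8, 64))   # (B, T, T) arc scores
```

A softmax over the last dimension then gives, for each dependent, a distribution over candidate heads.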

Linguistic Typology prediction
The typology predictor computes

Pr_Ty = σ(e_</s>^T U + c)

where e_</s> ∈ R^d is the contextual embedding produced by the shared mBERT encoder for the end-token </s> of the input sentence, and U ∈ R^(d×N) and c ∈ R^N are its weights and biases, respectively. Pr_Ty gives, for the language being parsed, the probability that each of the N binary URIEL features takes the value 1. The total loss is computed by simply adding the typology predictor's loss to the UDify model's loss (as computed in (Kondratyuk and Straka, 2019)).
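The formula above amounts to a single sigmoid-activated linear layer over the </s> embedding. The sketch below assumes hypothetical sizes (N binary URIEL features, hidden size d); it is not the exact UDify+Typology implementation.

```python
import torch
import torch.nn as nn

N_URIEL = 103   # hypothetical number of binary URIEL features
D = 64          # stand-in for the mBERT hidden size

class TypologyPredictor(nn.Module):
    """Auxiliary head: Pr_Ty = sigmoid(e^T U + c) over URIEL features."""

    def __init__(self, d=D, n_feats=N_URIEL):
        super().__init__()
        self.proj = nn.Linear(d, n_feats)        # implements U and c

    def forward(self, e_eos):
        # e_eos: (B, d) embedding of the </s> token for each sentence
        return torch.sigmoid(self.proj(e_eos))   # Pr_Ty in (0, 1)^N

predictor = TypologyPredictor()
e_eos = torch.randn(4, D)                        # a batch of </s> embeddings
pr_ty = predictor(e_eos)
gold = torch.randint(0, 2, (4, N_URIEL)).float()
typ_loss = nn.functional.binary_cross_entropy(pr_ty, gold)
# total_loss = udify_loss + typ_loss  (UDify loss per Kondratyuk & Straka, 2019)
```

Since each URIEL feature is an independent binary prediction, binary cross-entropy is the natural per-feature loss here.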

Experiments
This section describes the details of experiments conducted to evaluate our proposed model.

Experimental Setup
Both the baseline UDify and the proposed UDify+Typology-Predictor models are trained on a single large joint polyglot corpus, created by concatenating all training datasets available in UD v2.5. Before each training epoch, we randomly shuffle all sentences in the polyglot training corpus and then feed mixed batches of sentences from this shuffled corpus into the model, where each batch may contain sentences from any language or treebank (as done by the authors of UDify (Kondratyuk and Straka, 2019)). We use a batch size of 32, a dropout probability of 0.01, and the pre-trained mBERT model cased_L-12_H-768_A-12 downloaded from TensorFlow Hub. These hyper-parameters were tuned on the development set of the English-EWT treebank.
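The concatenate-then-shuffle batching scheme above can be sketched as a small generator. The treebank names and sentence placeholders are illustrative; a real pipeline would yield tokenized, padded tensors rather than strings.

```python
import random

def polyglot_batches(treebanks, batch_size=32, seed=0):
    """Mix all treebanks into one corpus and yield shuffled mixed batches.

    treebanks: dict mapping a treebank name to its list of sentences.
    Each yielded batch may contain sentences from any language or
    treebank, mirroring the joint polyglot training setup.
    """
    corpus = [(name, sent) for name, sents in treebanks.items()
              for sent in sents]
    rng = random.Random(seed)
    rng.shuffle(corpus)                       # re-shuffle before each epoch
    for i in range(0, len(corpus), batch_size):
        yield corpus[i:i + batch_size]

# Toy corpus with two hypothetical treebanks of five sentences each
toy = {"en_ewt": [f"en{i}" for i in range(5)],
       "ta_ttb": [f"ta{i}" for i in range(5)]}
batches = list(polyglot_batches(toy, batch_size=4))
```

Reshuffling before every epoch keeps the language mix within each batch varying across epochs, which is what exposes the shared encoder to cross-lingual signal throughout training.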

Results
We evaluated our proposed model individually on the 80 test treebanks available in the UD v2.5 datasets. Appendix A provides the results for each of these 80 test treebanks, while Table 1 reports the average over all of them. All scores are computed with the official CoNLL 2018 shared task evaluation script. We compared the performance of our model against two baselines, UDPipe Future (Straka, 2018) and UDify. The URIEL database comprises three categories of typology features, namely syntactic, semantic, and phonological features. Accordingly, we evaluated three variants of our proposed model, distinguished by the feature categories the typology predictor handles in the auxiliary task: UDify-w-Syntax (predicts only syntactic typology features), UDify-w-Syntax+Semantic (predicts syntactic and semantic typology features), and UDify-w-All (predicts all URIEL typology features). Furthermore, we evaluated a UDify-w-Lang_id model, whose architecture is identical to our proposed model except that the linguistic typology predictor is replaced by a simple language-id predictor.

Discussion
It is observed that the UDify-w-Syntax variant of our proposed model outperforms the other two variants on most of the test treebanks, despite the fact that the UDify-w-Syntax+Semantic and UDify-w-All variants use more typology features than UDify-w-Syntax.

Table 4: Results achieved in the zero-shot learning scenario. UDify+ refers to the UDify+Syntax model.

The auxiliary typology-prediction task helps in the few-shot learning scenario, but does not lead to any improvement in the zero-shot learning scenario. Furthermore, to verify that the auxiliary task of linguistic typology prediction is indeed responsible for the improvement in UDify's performance, we conducted several statistical t-tests to measure the correlation between the F1 scores achieved by the UDify+Syntax model on the auxiliary typology-prediction task and various other performance measures, including the improvement of UDify+Syntax over UDify. Table 3 reports the results of these t-tests.

Conclusion
In this work we used the linguistic typology knowledge available in the URIEL database to improve the cross-lingual transfer ability of the state-of-the-art language-agnostic UDify parser. We injected the typology knowledge into the UDify model through an auxiliary task, in a multitasking setting.

A Results
This section outlines the results obtained by the three variants of our proposed model, namely UDify-w-Syntax (predicts only syntactic typology features), UDify-w-Syntax+Semantic (predicts syntactic and semantic typology features), and UDify-w-All (predicts all URIEL typology features), as well as by the baselines.