KU-CST at CoNLL–SIGMORPHON 2018 Shared Task: a Tridirectional Model

In this paper we describe our sequence-to-sequence model for morphological inflection. We have constructed a common Encoder-Decoder network that encodes the input lemma into a dense vector and translates it into an inflected form based on the input morphological tags. The main novelty of the model is that the input lemma is encoded in three different directions: left-to-right, right-to-left and boundaries-to-center. We report the accuracies of the model and compare them with those of an otherwise identical bidirectional model.


Introduction
In this work we present the neural network architecture prepared for the task of morphological inflection in the CoNLL-SIGMORPHON 2018 Shared Task (Cotterell et al., 2018). Both morphological analysis and morphological inflection are crucial in end-to-end Natural Language Processing pipelines, as they are among the initial steps performed before solving higher-level problems such as Named-Entity Recognition or Sentiment Analysis.

Task
In the CoNLL-SIGMORPHON 2018 Shared Task there were two tasks to solve. In this work we present a possible solution for the first task, in which word forms have to be built without considering the context. The input in the task is a lemma and a list of morphological tags, and the system should produce the corresponding word form. The following is an example from the Spanish dataset:
jaquear V;COND;1;PL → jaquearamos
Here the input lemma is jaquear and its morphological tags state that the word form should be a verb (V) in the conditional (COND), first person (1), plural (PL). The output word form is jaquearamos.
The models can be trained and tested on over 100 languages and in three different settings: low-, medium- and high-resource scenarios (100, 1,000 or 10,000 training instances, respectively).
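To make the input format concrete, the following minimal sketch splits a tag string such as the one in the Spanish example into individual tags. The names `InflectionInstance` and `parse_instance` are hypothetical and introduced only for illustration; the semicolon separator is taken from the example above.

```python
from typing import NamedTuple, Tuple


class InflectionInstance(NamedTuple):
    """One task input: a lemma plus its morphological tags (hypothetical container)."""
    lemma: str
    tags: Tuple[str, ...]


def parse_instance(lemma: str, tag_string: str) -> InflectionInstance:
    """Split a semicolon-separated tag string into individual tags."""
    # Whitespace around tags is stripped and empty fragments are dropped.
    tags = tuple(t.strip() for t in tag_string.split(";") if t.strip())
    return InflectionInstance(lemma=lemma, tags=tags)


instance = parse_instance("jaquear", "V;COND;1;PL")
print(instance.lemma, instance.tags)
```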

Dataset
As mentioned above, we trained and tested our models on the provided dataset, which contains morphological inflections for over 100 languages. The information is encoded using Unicode, and the morphological tags follow the UniMorph tagging schema (Kirov et al., 2018).

Method
Following previous successful approaches to morphological inflection (Kann and Schütze, 2016), we built a model based on Neural Networks, specifically an Encoder-Decoder network with an attention mechanism. Furthermore, instead of constructing a linguistically inspired model, we briefly explored an engineering approach. The main novelty of our model is in the way the input is encoded.
In Lample et al. (2016) it is stated that recurrent architectures such as Recurrent Neural Networks are capable of encoding very long sequences, but that the representation is biased towards the most recently read items. Because of that, a bidirectional RNN could be expected to represent the structure of a word well, as it models both the ending (suffix) and the beginning (prefix) by means of a forward encoder and a backward encoder, respectively.
Our model explores whether this architecture can be improved by adding another encoder that encodes the word starting at the boundaries and ending in the center, thereby capturing the central structure of the word. The architecture is shown in Figure 1.
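The three reading orders can be illustrated with a small sketch. The text does not specify how the boundaries-to-center order alternates between the two ends, so the interleaving used here (first character, last character, second, second-to-last, and so on) is an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence


def left_to_right(chars: Sequence[str]) -> List[str]:
    """Order read by the forward encoder."""
    return list(chars)


def right_to_left(chars: Sequence[str]) -> List[str]:
    """Order read by the backward encoder."""
    return list(reversed(chars))


def boundaries_to_center(chars: Sequence[str]) -> List[str]:
    """Assumed order for the third encoder: alternate ends, finish in the middle."""
    order: List[str] = []
    lo, hi = 0, len(chars) - 1
    while lo < hi:
        order.append(chars[lo])  # next character from the left boundary
        order.append(chars[hi])  # next character from the right boundary
        lo += 1
        hi -= 1
    if lo == hi:
        order.append(chars[lo])  # middle character of an odd-length word
    return order


print("".join(boundaries_to_center("jaquear")))  # prints "jraaqeu"
```

Under this assumption, the last items the third encoder reads are the central characters of the word, which is where a recurrent representation is expected to be most informative.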
Each input is encoded with three different encoders, left-to-right, right-to-left and boundaries-to-center, and those encoded representations are concatenated. Then, a many-hot encoding representation of the morphological tags is concatenated at the end. In this way, we generate the representation of our source lemma with its morphological information. This representation goes to the decoder so that the output word is generated character by character.
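The concatenation described above can be sketched as follows. The tag vocabulary is a toy example, the encoder outputs are zero-filled placeholders rather than real GRU states, and the names `many_hot` and `source_representation` are hypothetical.

```python
from typing import List, Sequence


def many_hot(tags: Sequence[str], tag_vocab: Sequence[str]) -> List[float]:
    """Return a vector with 1.0 at the position of every tag that is present."""
    index = {tag: i for i, tag in enumerate(tag_vocab)}
    vector = [0.0] * len(tag_vocab)
    for tag in tags:
        vector[index[tag]] = 1.0  # raises KeyError for a tag outside the vocabulary
    return vector


def source_representation(fwd: List[float], bwd: List[float],
                          b2c: List[float], tag_vector: List[float]) -> List[float]:
    """Concatenate the three encoder outputs, then append the tag vector."""
    return fwd + bwd + b2c + tag_vector


TAG_VOCAB = ["V", "N", "COND", "1", "2", "PL", "SG"]  # toy vocabulary (assumption)
tag_vector = many_hot(["V", "COND", "1", "PL"], TAG_VOCAB)

HIDDEN = 128  # illustrative encoder output size
placeholder = [0.0] * HIDDEN  # stands in for a real encoder output
representation = source_representation(placeholder, placeholder, placeholder, tag_vector)
print(tag_vector, len(representation))
```

With three encoder outputs of size 128 and a seven-tag toy vocabulary, the resulting vector has 3 × 128 + 7 = 391 dimensions.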

Model configuration
The implementation is based on a Machine Translation model created using the PyTorch framework. It encodes the input using three Recurrent Neural Networks with 128 GRU cells in each encoder. To train the decoder, we use a teacher forcing ratio of 0.5. We started by training all the models for 10 epochs, but we observed that the models in the low-resource scenario did not converge and that those in the high-resource scenario did not improve after the fifth epoch. Because of that, we train our models for 20, 15 and 5 epochs in the low-, medium- and high-resource scenarios, respectively.
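The training settings above can be summarized in a short sketch. Whether teacher forcing is decided once per training sequence or once per decoding step is not stated; the per-sequence decision below is an assumption, and the names `use_teacher_forcing` and `decoder_inputs` are hypothetical. In a real decoder the predictions are produced step by step; here a fixed list stands in for them.

```python
import random
from typing import List, Sequence

TEACHER_FORCING_RATIO = 0.5
EPOCHS = {"low": 20, "medium": 15, "high": 5}  # epochs per resource setting


def use_teacher_forcing(ratio: float, rng: random.Random) -> bool:
    """Decide (assumed: once per sequence) whether to feed gold characters."""
    return rng.random() < ratio


def decoder_inputs(gold: Sequence[str], predicted: Sequence[str],
                   teacher_forced: bool, start: str = "<s>") -> List[str]:
    """Return the symbol fed to the decoder at each output step."""
    # With teacher forcing the decoder sees the gold previous character,
    # otherwise it sees its own previous prediction.
    previous = gold if teacher_forced else predicted
    return [start] + list(previous[:-1])


rng = random.Random(0)
forced = use_teacher_forcing(TEACHER_FORCING_RATIO, rng)
print(forced, decoder_inputs(list("abc"), list("xbc"), forced))
```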

Results
We tested a bidirectional and a tridirectional model, expecting the tridirectional one to perform better on the development set. As Table 1 shows, its mean accuracy is slightly better than that of the bidirectional model with exactly the same configuration,¹ although these differences are not significant according to a bootstrap test.
Table 2 shows the accuracies for each language in each setting with the tridirectional approach.² In Figure 2 we plot these accuracies together with the accuracies of the bidirectional model. In some cases, our tridirectional approach is sufficiently more accurate than the bidirectional one.
¹ Same number of epochs, same cell types, and same hidden size. We are aware, however, that the tridirectional approach has more parameters because it has three encoders instead of two.
² To make the table more interpretable, we marked in bold the results that are better than the baseline presented for the shared task.
Table 2: Accuracies for all languages in the low-, medium- and high-resource scenarios using the tridirectional Encoder-Decoder model. The last row shows the average accuracy for each resource scenario.
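The significance test mentioned above is not specified in detail. As an illustration only, a paired bootstrap comparison of two systems could look like the sketch below; the function name, the number of resamples and the choice to resample individual test items are all assumptions.

```python
import random
from typing import Sequence


def paired_bootstrap(correct_a: Sequence[int], correct_b: Sequence[int],
                     n_resamples: int = 1000, seed: int = 0) -> float:
    """Fraction of resamples in which system A is NOT more accurate than B.

    correct_a[i] and correct_b[i] are 1 if the system got item i right, else 0.
    A small returned value suggests that A's advantage is unlikely to be chance.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("both systems need the same non-empty set of items")
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample item indices with replacement, keeping the pairing intact.
        sample = [rng.randrange(n) for _ in range(n)]
        delta = sum(correct_a[i] - correct_b[i] for i in sample)
        if delta <= 0:
            not_better += 1
    return not_better / n_resamples


tri = [1, 1, 0, 1, 0, 1, 1, 0]  # made-up per-item outcomes, for illustration only
bi = [1, 0, 0, 1, 0, 1, 1, 1]
print(paired_bootstrap(tri, bi))
```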

Conclusion and Future work
In this experiment we approached morphological inflection using a slightly more complex Encoder-Decoder architecture that encodes the lemmas in three different directions (left-to-right, right-to-left and boundaries-to-center). Although the model works quite well in some cases, there is plenty of room for improvement. The main next step is to continue experimenting with more parameters to check whether adding parameters improves the results. In both the medium- and high-resource settings, some languages show very poor performance, especially Finnish, Hungarian and Latin. Their outputs need to be analyzed carefully to better understand the reasons for these low results. Data augmentation techniques were successfully used in the previous CoNLL-SIGMORPHON 2017 Shared Task (Bergmanis et al., 2017; Nicolai et al., 2017; Silfverberg et al., 2017), and we therefore think that our model could improve its results in the low- and medium-resource settings by adding artificially generated data.
We expect that using external resources, such as Wikipedia, would have a positive effect. We could even analyze how much a specific amount of text affects this task by pretraining character embeddings on, for instance, 10,000, 50,000 or 100,000 characters.