EHU at the SIGMORPHON 2016 Shared Task. A Simple Proposal: Grapheme-to-Phoneme for Inflection

This paper presents a proposal for learning morphological inflections by a grapheme-to-phoneme learning model. No special processing is used for specific languages. The starting point has been our previous research on induction of phonology and morphology for normalization of historical texts. The results show that a very simple method can indeed improve upon some baselines, but does not reach the accuracies of the best systems in the task.


Introduction
In our previous work, carried out in the context of normalization of historical texts (Etxeberria et al., 2016), we proposed an approach based on the induction of phonology. We obtained good results using only induced phonological weighted finite-state transducers (WFSTs), i.e. by leveraging the phoneme-to-grapheme method to yield a grapheme-to-grapheme model. The research question now is whether the grapheme-to-grapheme model can be extended to handle morphological information instead of words or morphological segmentation. To assess this, we test a general solution that works without special processing for specific languages (i.e. we do not focus on special treatment of accents in Spanish and other idiosyncrasies).

Task
We took part only in task 1 (Inflection from lemma/citation form) of the SIGMORPHON 2016 Shared Task (Cotterell et al., 2016; see http://www.sigmorphon.org/sharedtask). Given a lemma with its part-of-speech, the system must generate a target inflected form whose morphosyntactic description is given.

Corpora and Resources
We use the data provided by the organizers of the task. Our first experiments and tuning were conducted on eight languages before the two additional 'surprise' languages (Maltese and Navajo) were provided.
We also ran experiments using the available bonus-resources (track 3) but after initial results we decided to present only a system using the basic resources.

Related work
In our previous work (Etxeberria et al., 2014; Etxeberria et al., 2016) we have used Phonetisaurus, a WFST-driven phonology tool (Novak et al., 2012) which learns to map phonological changes using a noisy channel model. It is a solution that works well using a limited amount of training information. The task addressed earlier was the normalization of historical/dialectal texts.
In the same paper we demonstrated that the method is viable for language-independent normalization, and we tested the same approach for normalization of Spanish and Slovene historical texts, obtaining similar or better results than previous systems reported by Porta et al. (2013) (using hand-written rules) and Scherrer and Erjavec (2015) (using a character-based SMT system).
Because of the model's relative success with historical normalization and its simplicity, we developed the approach further for addressing the shared task problem.
There exist other finite-state transducer-based approaches, generally more complex than what we present, of which two warrant a mention: (i) Dreyer et al. (2008) develop a model for string-to-string transduction where results are improved using latent variables.
The system includes finite-state technology (in the form of WFSA and PFSTs) in two of the three steps: concatenation, phonology, and phonetics.

Basic Method
We used Phonetisaurus to train a WFST system that learns the changes that occur when going from the citation form to another form. This tool, while not specifically limited to such uses, is widely used for rapid development of high-quality grapheme-to-phoneme (g2p) converters. It is open-source and easy to use, and its authors report promising results (Novak et al., 2012).
Phonetisaurus uses joint n-gram models and is based on OpenFST; it learns a mapping of phonological changes using a noisy channel model. The application of the tool includes three major steps:
3. Decoding. The default decoder used in the WFST-based approach finds the best hypothesis for the input words given the WFST obtained in the previous step. It is also possible to extract a k-best list of output hypotheses for each word.
The alignment algorithm is capable of learning many-to-many relationships and includes three modifications to the basic toolkit: (a) a constraint is imposed such that only many-to-one and one-to-many alignments are considered during training; (b) during initialization, a joint alignment lattice is constructed for each input entry, and any unconnected arcs are deleted; (c) all transitions, including those that model deletions and insertions, are initialized with and constrained to maintain a non-zero weight.
As the results obtained with this tool were the best ones in our previous scenario, we decided to employ it for this task. Concretely, we have used Phonetisaurus to learn a WFST which can translate simplified morphological expressions to words to solve the inflection task. Once the transducer is trained, it can be used to generate correspondences for previously unseen morphological representations and their corresponding wordforms.

Testing the models
Using the development section for tuning we experimented with different variations in our approach in order to tune a good model for the problem.
First, we compacted the morphological information in a tag (which we consider a pseudo-morpheme) by concatenating the first letter of the category with a consecutive number. For example, the first lines in the training corpus for German

    aalen pos=V, ... per=1,num=PL aalen
    aalen pos=V, ... per=3,num=PL aalen
    aalen pos=V, ... per=2,num=SG aaltest
    aalen pos=V, ... per=3,num=SG aalte
    aalen pos=V,tense=PRS aalend

are converted into:

    aalen V0 aalen
    aalen V1 aalen
    aalen V2 aaltest
    aalen V3 aalte
    aalen V4 aalend

Using this information, three experiments were carried out where the morphosyntactic information was
• treated as a suffix.
• treated as a suffix and as a prefix.
• treated as a suffix, as an infix in the center of the lemma, and as a prefix.
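The tag compaction and the three placements listed above can be sketched as follows. This is a minimal illustration, not the code used for the submission. It assumes that numbers are assigned per category in order of first appearance, that the "center" of the lemma is its integer midpoint, and that tags are joined to the lemma with "+" as in the N9+aakkosto+N9 notation of the Evaluation section. The function names compact_tags and place_tag are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (lemma, morphosyntactic description, wordform)


def compact_tags(rows: List[Row]) -> List[Row]:
    """Replace each morphosyntactic description (MSD) by a short tag.

    The tag is the first letter of the category (the value after "pos=")
    followed by a consecutive number.
    ASSUMPTION: numbers are handed out per category, in order of first
    appearance of each distinct MSD string.
    """
    tag_of: Dict[str, str] = {}   # MSD string -> compact tag
    next_id: Dict[str, int] = {}  # category letter -> next free number
    out: List[Row] = []
    for lemma, msd, form in rows:
        if msd not in tag_of:
            letter = msd.split("pos=", 1)[1][0]
            number = next_id.get(letter, 0)
            tag_of[msd] = f"{letter}{number}"
            next_id[letter] = number + 1
        out.append((lemma, tag_of[msd], form))
    return out


def place_tag(lemma: str, tag: str, mode: str) -> str:
    """Attach the tag to the lemma in one of the three tested placements."""
    if mode == "suffix":
        parts = [lemma, tag]
    elif mode == "prefix+suffix":
        parts = [tag, lemma, tag]
    elif mode == "prefix+infix+suffix":
        mid = len(lemma) // 2  # ASSUMPTION: "center" is the integer midpoint
        parts = [tag, lemma[:mid], tag, lemma[mid:], tag]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "+".join(parts)


if __name__ == "__main__":
    # MSD strings are copied from the German example, elisions included.
    rows = [
        ("aalen", "pos=V, ... per=1,num=PL", "aalen"),
        ("aalen", "pos=V, ... per=3,num=PL", "aalen"),
        ("aalen", "pos=V,tense=PRS", "aalend"),
    ]
    for lemma, tag, form in compact_tags(rows):
        print(place_tag(lemma, tag, "prefix+suffix"), "->", form)
```

Repeating the tag on both sides of the lemma gives the n-gram model access to the morphological information near both word edges, which may be one reason the prefix-plus-suffix placement behaved well for most languages.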
The strongest results were obtained using the second configuration (suffix and prefix) for all languages except Finnish, which yielded the best results using only a suffix-based representation.
To illustrate the encoding, below are the first few entries in the development corpus for German:

In a second step we built different WFSTs depending on the category, but this yielded no improvement. As an alternative, we decided to test whether putting only the category information in the prefix (i.e. one character) could help in the task. This produced an improvement only for Finnish.
As a third step we tested the possibility of optimizing the size and the content of the tag (the pseudo-morpheme), attempting to match its length with the length of the corresponding morpheme, as in the following example for German encodings:

This strategy produced no solid improvement in our preliminary experiments.

Evaluation
We have measured the quality using the metrics and the script provided by the organizers; the baseline figures also originate with the organizers.
In all the languages whole tags were injected as prefixes and suffixes, with the exception of Finnish, where in the prefix tag position only the first character is included. For example, for the wordform aakkostot 'alphabets' N+aakkosto+N9 is used instead of N9+aakkosto+N9.
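The difference between the default encoding and the Finnish variant can be sketched in a few lines. The function name encode and its short_prefix parameter are hypothetical, and the "+" separator simply follows the notation of the example above.

```python
def encode(lemma: str, tag: str, short_prefix: bool = False) -> str:
    """Wrap a lemma with its tag as prefix and suffix.

    With short_prefix=True only the first character of the tag (the
    category letter) is used in prefix position, as described for Finnish.
    """
    prefix = tag[0] if short_prefix else tag
    return f"{prefix}+{lemma}+{tag}"


if __name__ == "__main__":
    print(encode("aakkosto", "N9"))                     # default encoding
    print(encode("aakkosto", "N9", short_prefix=True))  # Finnish variant
```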
For the submitted final test we retrained the transducer, adding the development section to the training corpus. As can be seen in Table 1, a slight improvement was obtained (0.43% on average).

Using external information
Trying to take advantage of the bonus resources, we used word lists for Spanish, German and Russian available with the FreeLing package (Carreras et al., 2004) as a 1-gram language model of words. Since it is possible to produce multiple outputs from the WFST we train, we also experimented with an approach where the WFST returns several ranked candidates (3, 5, or 10) and the first one found in the word list is selected. If none of the candidates appeared in the list, the first proposal was used.
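The filtering step can be sketched as below. This is an illustration under stated assumptions: the candidates are taken to arrive already ranked best-first from the decoder, the word list is loaded into a set, and the name pick_candidate and the sample data are hypothetical.

```python
from typing import Sequence, Set


def pick_candidate(candidates: Sequence[str], word_list: Set[str], k: int) -> str:
    """Return the first of the top-k ranked candidates found in the word list.

    If none of the top-k candidates is attested, fall back to the
    first (best-ranked) proposal.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    for candidate in candidates[:k]:
        if candidate in word_list:
            return candidate
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical ranked output for one input; not taken from the experiments.
    ranked = ["aaltst", "aaltest", "aalest"]
    attested = {"aaltest", "aalte"}
    print(pick_candidate(ranked, attested, k=3))
```

With this design the word list can only re-rank within the top k proposals, so a larger k gives the list more influence at the cost of admitting lower-ranked hypotheses.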
Using this strategy, the results for Spanish improved slightly (by 2%), while the results for German improved slightly less (by 0.2%), and the Russian results worsened (by 0.7%).

Table 2: Accuracy when using a word list for filtering the proposals from the WFST. The first column shows the results without any external resources; in the second column a word list has been used for filtering the top 3 proposals, and in the third column for filtering the top 5 proposals.
Since FreeLing is known to produce the highest-quality output for Spanish, we may assume that the results reflect the relative quality of the resources in that package.
Due to this limited improvement, we decided to present only the basic system for track 1.

Conclusions and future work
Previous work on lexical normalization of historical and dialectal texts has been extended and applied to a morphological inflection scenario.
While the method is simple and somewhat limited, with results not fully competitive against the best reported systems (Cotterell et al., 2016), some difficult languages saw a relatively good performance (Navajo and Maltese).
In the near future, our aim is to improve the results by trying to place the tags and morphemes in a more congenial configuration for WFST training and to use existing proposals to harness available latent information (Dreyer et al., 2008). In addition to this, we plan to incorporate techniques learned from other participants in the shared task.