Paradigm Completion for Derivational Morphology

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We overview the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models adapted from the inflection task are able to learn the range of derivation patterns, and outperform a non-neural baseline by 16.4%. However, due to semantic, historical, and lexical considerations involved in derivational morphology, future work will be needed to achieve performance parity with inflection-generating systems.


Introduction
Unlike inflectional morphology, which produces grammatical variants of the same core lexical item (e.g., take → takes), derivational morphology is one of the key processes by which new lemmata are created. For example, the English verb corrode can evolve into the noun corrosion, the adjective corrodent, and numerous other complex derived forms such as anticorrosive. Derivational morphology is often highly productive, leading to the ready creation of neologisms such as Rao-Blackwellize and Rao-Blackwellization, both originating from the Rao-Blackwell theorem. Despite the prevalence of productive derivational morphology, however, there has been little work on its generation. Commonly used derivational resources such as NomBank (Meyers et al., 2004) are still finite. Moreover, the complex phonological and historical changes (e.g., the adjectivization corrode → corrosive) and affix selection (e.g., choosing between English deverbal suffixes -ment and -tion) make generation of derived forms an interesting and challenging problem for NLP.
In this work, we show that viewing derivational morphological processes as paradigmatic may be fruitful for generation. This means that there are a number of well-defined form-function pairs associated with a core word. For example, a typical English verb may have five forms in its inflectional paradigm, corresponding to its base (take), past tense (took), past participle (taken), progressive (taking) and third-person singular (takes) forms. These forms are related by a consistent set of relations, such as affixation. Similarly, a verb may have several slots in its derivational paradigm: the form take has the agentive nominalization taker, and the abilitative adjectivization takable. Note there are also consistent patterns associated with each derivational slot, e.g., the -er suffix regularly produces the agentive.
Exploiting this paradigmatic characterization of derivational morphology allows us to create a statistical model capable of generating derivationally complex forms. We apply state-of-the-art models for inflection generation, which learn mappings from fixed paradigm slots to derived forms. Empirically, we compare results for two models on the new task of derivational paradigm completion: a neural sequence-to-sequence model and a standard non-neural baseline. Our best neural model for derivation achieves 71.7% accuracy, beating the non-neural baseline by 16.4 points. Nevertheless, we note this is about 25 points lower than the equivalent model on the English inflection task (and even 20 points lower than the model's performance on the harder Finnish inflection generation). These results point to additional complications in derivation that require more elaborate models or data annotation to overcome. While inflection generation is becoming a solved problem (Cotterell et al., 2017), derivation generation is still very much open.

Derivational Morphology
The generation of derived forms is structurally similar to the generation of inflectional variants, but presents additional challenges for NLP. Here, we provide linguistic background comparing the two types of morphological processes.
Inflection and Derivation. Inflectional morphology primarily marks semantic features that are necessary for syntax, e.g., gender, tense and aspect. Thus, it follows that in most languages inflection never changes the part of speech of the word and often does not change its basic meaning. The set of inflectional forms for a given lexeme is said to form a paradigm, e.g., the full paradigm for the verb to take is take, taking, takes, took, taken. Each entry in an inflectional paradigm is termed a slot and is indexed by a syntacto-semantic category, e.g., the PAST form of take is took. We may reasonably expect that all English verbs, including neologisms, have these five forms. 1 Furthermore, there is typically a fairly regular relationship between a paradigm slot and its form (e.g., add -s for the third person singular form). Derivational morphology, on the other hand, often changes the core part of speech of a word and makes more radical changes in meaning. In fact, derivational processes are often subcategorized by the part-of-speech change they engender, e.g., corrode → corrosion is a deverbal nominalization.
1 Only a handful of English irregulars distinguish between the past tense and the past participle, e.g., took and taken, and thus have five unique forms in their verbal paradigms; most English verbs have four unique forms.
Derivational Paradigms. Much like inflection, derivational processes may be organized into paradigms, with slots corresponding to more abstract lexico-semantic categories for an associated part of speech (Corbin, 1987; Booij, 2008; Štekauer, 2014). Lieber (2004) presents one of the first theoretical frameworks to enumerate a set of derivational paradigm slots, motivated by previous studies of semantic primitives by Wierzbicka (1988). A partial listing of possible derivational paradigm slots for base English adjectives, nouns, and verbs is given in Table 1. The list contains several productive cases. A key difficulty comes from the fact that the mapping between semantics and suffixes is not always clean; Lieber (2004) points out the category AGENT could be expressed by the suffix -er (as in runner) or by -ee (as in escapee). However, both -er and -ee may have the PATIENT role; consider burner ("a cheap phone intended to be disposed of, i.e. burned") and employee ("one being employed"), respectively. We flesh out partial derivational paradigms for several English verbs in Table 2.
Unlike in inflectional paradigms, where we expect most cells to be filled for any given base form, derivational paradigms often contain base-slot combinations that are not semantically compatible, leading to the gaps in Table 2. We also observe increased paradigm irregularity due to some derived forms becoming lexicalized at different points in history, differences in the language from which the base word entered the target language (e.g., English roots of Germanic and Latinate origin behave differently (Bauer, 1983)), as well as other factors that are not obvious from the characters in the base word (e.g., gender or number of the resulting noun).
As an example of how difficult these factors can make derivation, consider the wide variety of potential nominalizations corresponding to the RESULT of a verb, e.g., -ion, -al and -ment (Jackendoff, 1975). While any particular English verb will almost exclusively employ exactly one of these suffixes (e.g., we have refuse → refusal and other candidates *refusion and *refusement are illicit), the information required to choose the correct suffix may be both arbitrary and not easily available.
Productivity. There is a general agreement in linguistics that frequently used complex words become part of the lexicon as wholes, while most others are likely to be constructed from constituents (Bauer, 2001; Aronoff and Lindsay, 2014); the latter ones typically follow derivational patterns, or rules, such as adding -able to express potential or ability or applying -ly to convert adjectives into adverbs. These patterns typically present two essential properties: productivity and restrictedness. Productivity relates to the ability of a pattern to be applied to any novel base form to create a new word, potentially on-the-fly. One example of such a productive transformation is adding -less (privative construction), which may attach to almost any noun to form an adjective. Moreover, the resulting form's meaning is compositional and predictable. Many derivational suffixes in English are of this type. On the other hand, some patterns are subject to semantic, pragmatic, morphological or phonological restrictions.
Consider the English patient suffix -ee, which cannot be attached to a base ending in /i(:)/, e.g., it cannot be attached to the verb free to form freeee.
Restrictedness is closely related to productivity, i.e., highly productive rules are less restricted. A parsimonious model of derivational morphology would describe forms using productive rules when possible, but may store forms with highly restricted patterns directly as full lexical items.
A Note On Terminology. We would like to make a subtle, but important point regarding terminology: the phrase morphologically rich in the NLP community almost exclusively refers to inflectional, rather than derivational morphology. For example, English is labeled as morphologically impoverished, whereas German and Russian are considered morphologically rich, e.g., see the introduction of Tsarfaty et al. (2010). As regards derivation, English is quite complex and even similar in richness to German or Russian as it contains productive formations from two substrata: Germanic and Latinate. From this perspective, English is very much a morphologically rich language. Indeed, a corpus study on the Brown Corpus showed that the majority of English words are morphologically complex when derivation is considered (Light, 1996). Note that there are many languages that exhibit neither rich inflection nor rich derivational morphology, e.g., Chinese, which most commonly employs compounding for word formation (Chung et al., 2014).

Task and Models
We discuss our two systems for derivational paradigm completion and the results they achieve.

Data
We experiment on English derivational triples extracted from NomBank (Meyers et al., 2004). Each triple consists of a base form, the semantics of the derivation and a corresponding derived form, e.g., (ameliorate, RESULT, amelioration). Note that in this task we do not predict whether a slot exists, merely what form it would take given the base and the slot. In the current study, we consider the following derivational types: verb nominalization such as RESULT, AGENT and PATIENT, adverbalization and adjective-noun transformations.
We intentionally avoid zero-derivations. We also exclude overly orthographically distant pairs, which appear to be misannotations in NomBank, by filtering out those for which the Levenshtein distance exceeds half the sum of their lengths. The final dataset includes 6,029 derivational samples, which we split into train (70%), development (15%), and test (15%). 5 We also note that NomBank annotations are often semantically coarse-grained.
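The filtering and splitting steps described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing script: the function names, the treatment of a zero-derivation as a pair whose base and derived forms are identical, the random shuffling, and the seed are all assumptions made for the example.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def keep_pair(base: str, derived: str) -> bool:
    """Apply the two filters from the text to a (base, derived) pair."""
    # Assumption: a zero-derivation is a pair with identical strings.
    if base == derived:
        return False
    # Drop pairs whose distance exceeds half the sum of their lengths.
    return levenshtein(base, derived) <= (len(base) + len(derived)) / 2


def split(triples, seed=0):
    """70/15/15 train/dev/test split (shuffling and seed are assumptions)."""
    items = list(triples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 70 // 100
    n_dev = len(items) * 15 // 100
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])
```

For instance, `keep_pair("ameliorate", "amelioration")` keeps the pair, since its edit distance of 3 is well below the threshold of 11.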

Evaluation Metrics
We evaluate on 3 metrics: accuracy, average edit distance, and F1. Accuracy measures how often system output exactly matches the gold string. Edit distance, by comparison, measures the Levenshtein distance between system output and the gold string. Finally, we calculate affix F1 scores for individual derivational affixes. E.g., for -ment, precision is the number of words where the model correctly predicted -ment out of all words for which it predicted -ment, and recall is the number of words where the model correctly predicted -ment out of the number of words whose gold form takes -ment.
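The three metrics can be sketched as below. This is an illustrative implementation under stated assumptions rather than the evaluation script used in the paper: in particular, an affix is identified here by a plain string-suffix test on the output and gold forms, and the toy predictions reuse forms mentioned elsewhere in the text.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def accuracy(preds, golds):
    """Fraction of outputs that exactly match the gold string."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def mean_edit_distance(preds, golds):
    """Average Levenshtein distance between output and gold."""
    return sum(edit_distance(p, g) for p, g in zip(preds, golds)) / len(golds)


def affix_prf(preds, golds, suffix):
    """Precision, recall and F1 for one affix (suffix match is an assumption)."""
    predicted = sum(p.endswith(suffix) for p in preds)
    actual = sum(g.endswith(suffix) for g in golds)
    correct = sum(p.endswith(suffix) and g.endswith(suffix)
                  for p, g in zip(preds, golds))
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data for illustration only; not system output from the paper.
preds = ["refusement", "amelioration", "draination"]
golds = ["refusal", "amelioration", "drainage"]
print(accuracy(preds, golds), mean_edit_distance(preds, golds))
print(affix_prf(preds, golds, "ion"))
```

Note that a prediction can count as correct for the affix score while still being wrong as a whole string, which is why affix F1 and exact-match accuracy are reported separately.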

Baseline Transducer
We train a simple transducer for each base-to-paradigm slot mapping in the training set, identical to the baseline described in Cotterell et al. (2016). This uses an averaged perceptron classifier to greedily apply an output transformation (substitution, deletion, or insertion) to each input character given the surrounding characters and previous decisions.
5 The dataset is available at http://github.com/ryancotterell/derviational-paradigms.

RNN Encoder-Decoder
Following Kann and Schütze (2016) on the morphological inflection task, we use an encoder-decoder gated recurrent neural network (Bahdanau et al., 2015). First, an encoder network encodes a sequence: the concatenation of the characters of the input word and a tag describing the desired transformation, both represented by embeddings. This encoder is bidirectional and consists of two gated RNNs (Cho et al., 2014), one encoding the input in the forward direction and one encoding in the backward direction. The output of the two RNNs is the resulting hidden vectors $\overrightarrow{h}_i$ and $\overleftarrow{h}_i$. The hidden state is a concatenation of the forward and backward hidden vectors, i.e.,

$$h_i = \left[\overrightarrow{h}_i ; \overleftarrow{h}_i\right].$$

The decoder also consists of an RNN, but is additionally equipped with an attention mechanism. The latter computes a weight for each of the encoder hidden vectors for each character or subtag, which can be roughly understood as giving a certain importance to each of the inputs. The probability of the target sequence $y = (y_1, \ldots, y_{|y|})$ given the input sequence $x = (x_1, \ldots, x_{|x|})$ is modeled by

$$p(y \mid x) = \prod_{t=1}^{|y|} p(y_t \mid y_1, \ldots, y_{t-1}, c_t) = \prod_{t=1}^{|y|} g(y_{t-1}, s_t, c_t),$$

where EOS is a distinguished end-of-string symbol, $g$ is a multi-layer perceptron, $s_t$ is the hidden state of the decoder and $c_t$ is the sum of the encoder states $h_i$, scored by attention weights $\alpha_i(s_{t-1})$ that depend on the decoder state:

$$c_t = \sum_{i=1}^{|x|} \alpha_i(s_{t-1})\, h_i.$$

Input Encoding. We model this problem as a character translation problem, with special encodings for the transformation tags that indicate the type of derivation. For example, we treat the triple (ameliorate, RESULT, amelioration) as the source string a m e l i o r a t e RESULT and target string a m e l i o r a t i o n. This is similar to the encoding in Kann and Schütze (2016).
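The input encoding described above amounts to a small string transformation. The sketch below assumes space-separated tokens, one per character, with the slot tag appended as a single token; the function name `encode_example` is hypothetical and not part of any toolkit.

```python
def encode_example(base: str, slot: str, derived: str) -> tuple[str, str]:
    """Turn a (base, slot, derived) triple into source/target token strings."""
    # Source: the characters of the base form followed by the slot tag.
    source = " ".join(list(base) + [slot])
    # Target: the characters of the derived form.
    target = " ".join(derived)
    return source, target


src, tgt = encode_example("ameliorate", "RESULT", "amelioration")
print(src)  # a m e l i o r a t e RESULT
print(tgt)  # a m e l i o r a t i o n
```

Treating the tag as one token keeps it distinct from ordinary characters, so the encoder learns a separate embedding for each derivational slot.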
Training. We use the Nematus toolkit (Sennrich et al., 2017). We exactly follow the recipe in Kann and Schütze (2016), the winning submission on the 2016 SIGMORPHON shared task for inflectional morphology. Accordingly, we use a character embedding size of 300, 100 hidden units in both the encoder and decoder, Adadelta (Zeiler, 2012) with a minibatch size of 20, and a beam size of 12. We train for 300 epochs and select the test model based on the performance on the development set.

Experimental Results
Table 3 compares the accuracy of our baseline system with the accuracy of our sequence-to-sequence neural network using the data splits discussed in §3.1. In all cases, the network outperforms the baseline. While 1-best performance is not nearly as high as that expected from a state-of-the-art inflectional generation system, the key point is that performance significantly increases when considering the 10-best outputs. This suggests that the network is indeed learning the correct set of possible nominalization patterns. However, the information needed to correctly choose among these patterns for a given input is not necessarily available to the network. In particular, the network is only aware of important disambiguating historical (e.g., is the input of Latin or Greek origin) and lexical-semantic (e.g., is the input verb transitive or intransitive) factors to the extent that they are implicitly encoded in the input character sequence. We speculate that making these additional pieces of information directly available as input features will significantly improve 1-best accuracy.
Unfortunately, NomBank does not provide the necessary annotations in most cases. For instance, there is no way to differentiate actor and actress without gender. It also does not distinguish the semantics of some adjective nominalizations, e.g., activism and activity. Future work will reannotate NomBank to make these finer-grained distinctions.
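The comparison between 1-best and 10-best performance can be made concrete with a k-best accuracy function. The sketch below assumes each system output is a list of hypotheses ordered from most to least probable (for example, a beam search n-best list); the function name and the toy hypotheses are illustrative and are not results from the paper.

```python
def k_best_accuracy(nbest_lists, golds, k):
    """Fraction of items whose gold form appears among the top-k hypotheses."""
    hits = sum(gold in hyps[:k] for hyps, gold in zip(nbest_lists, golds))
    return hits / len(golds)


# Toy ranked hypotheses, built from forms mentioned in the text.
nbest = [
    ["draination", "drainage"],   # correct form ranked second
    ["refusal", "refusement"],    # correct form ranked first
]
golds = ["drainage", "refusal"]

print(k_best_accuracy(nbest, golds, k=1))
print(k_best_accuracy(nbest, golds, k=2))
```

A large gap between the two values indicates that the correct pattern is usually generated but not ranked first, which is the situation described above.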
Error Analysis. We observe mistakes on less frequent suffixes, e.g., -age: we predict *draination instead of drainage. Also, there are several cases where NomBank only lists one available form, e.g., complexity, and our model predicts complexness. We also see mistakes on irregular adverbs, e.g., we generate advancely from advance, rather than in-advance, as well as in PATIENT nominalizations, e.g., the model

Related Work
Previous work in unsupervised morphological segmentation has implicitly incorporated derivational morphology. Such systems attempt to segment words into all constituent morphs, treating inflectional and derivational affixes as equivalent.
The popular Morfessor tool (Creutz and Lagus, 2007) is one example of such an unsupervised segmentation system, but many others exist, e.g., Poon et al. (2009), Narasimhan et al. (2015) inter alia. Supervised segmentation and analysis models in the literature can also break down derivationally complex forms into their morphs, provided pre-segmented and labeled data is available for training (Ruokolainen et al., 2013; Cotterell et al., 2015; Cotterell and Schütze, 2017). Our work, however, builds directly upon recent efforts in the generation of inflectional morphology (Durrett and DeNero, 2013; Nicolai et al., 2015; Ahlberg et al., 2015; Rastogi et al., 2016; Faruqui et al., 2016). We differ in that we focus on derivational morphology. In another recent line of work, Vylomova et al. (2017) predict derivationally complex forms using sentential context. Our work differs from their approach in that we attempt to generate derivational forms divorced from the context, but the underlying neural sequence-to-sequence architecture is quite similar.

Conclusion
We have presented a statistical model for the generation of derivationally complex forms, a task that has gone essentially unexplored in the literature. Viewing derivational morphology as paradigmatic, where slots refer to semantic categories, e.g., corrode+RESULT → corrosion, we draw upon recent advances in the generation of inflectional morphology. Applying this method works well, achieving an overall accuracy of 71.71%, and beating a non-neural baseline. Performance, however, is lower than on the task of paradigm completion for inflectional morphology, indicating that paradigm completion for derivational morphology is more challenging than its inflectional counterpart.

Table 1: A partial list of derivational transformations in English with corresponding POS changes and semantic labels.

Table 2: Partial derivational paradigm for several English verbs; semantic gaps are indicated with -.