SupWSD: A Flexible Toolkit for Supervised Word Sense Disambiguation

In this demonstration we present SupWSD, a Java API for supervised Word Sense Disambiguation (WSD). This toolkit includes the implementation of a state-of-the-art supervised WSD system, together with a Natural Language Processing pipeline for preprocessing and feature extraction. Our aim is to provide an easy-to-use tool for the research community, designed to be modular, fast and scalable for training and testing on large datasets. The source code of SupWSD is available at http://github.com/SI3P/SupWSD.


Introduction
Word Sense Disambiguation (WSD; Navigli, 2009) is one of the long-standing challenges of Natural Language Understanding. Given a word in context and a pre-specified sense inventory, the task of WSD is to determine the intended meaning of that word depending on the context. Several WSD approaches have been proposed over the years and extensively studied by the research community, ranging from knowledge-based systems to semi-supervised and fully supervised models (Agirre et al., 2014; Taghipour and Ng, 2015b; Iacobacci et al., 2016). Nowadays a new line of research is emerging, and WSD is gradually shifting from a purely monolingual (i.e. English) setup to a wider multilingual setting (Moro and Navigli, 2015). Since scaling up to multiple languages is considerably easier for knowledge-based systems, as they do not require sense-annotated training data, various efforts have been made towards the automatic construction of high-quality sense-annotated corpora for multiple languages (Otegi et al., 2016; Delli Bovi et al., 2017), aimed at overcoming the so-called knowledge acquisition bottleneck of supervised models (Pilehvar and Navigli, 2014). These efforts include the use of Wikipedia, which can be considered a full-fledged, manually sense-annotated resource for numerous languages, and hence exploited as training data (Dandala et al., 2013).
Besides the automatic harvesting of sense-annotated data for different languages, a variety of multilingual preprocessing pipelines have also been developed over the years (Padró and Stanilovsky, 2012; Agerri et al., 2014; Manning et al., 2014). To date, however, very few attempts have been made to integrate these data and tools with a supervised WSD framework; as a result, multilingual WSD has been almost exclusively tackled with knowledge-based systems, despite the fact that supervised models have been shown to consistently outperform knowledge-based ones in all standard benchmarks. As regards supervised WSD, It Makes Sense (IMS; Zhong and Ng, 2010) is indeed the de-facto state-of-the-art system used for comparison in WSD, but it is available only for English, with the last major update dating back to 2010.
The publicly available implementation of IMS also suffers from two crucial drawbacks: (i) the design of the software makes the current code difficult to extend (e.g. with classes taking as input more than 15 parameters); (ii) the implementation is not optimized for larger datasets, being rather time- and resource-consuming. These difficulties hamper the work of contributors willing to update it, as well as the efforts of researchers who would like to use it with languages other than English.
In this paper we present SUPWSD, whose objective is to overcome the aforementioned drawbacks, and facilitate the use of a supervised WSD software for both end users and researchers. SUPWSD is designed to be modular and highly flexible, enabling contributors to extend it with ease. Its usage is simple and immediate: it is based on a jar file with only 2 commands and 3 parameters, along with an XML configuration file for specifying customized settings. SUPWSD supports the most widely used preprocessing tools in the research community: Stanford CoreNLP (Manning et al., 2014), openNLP (opennlp.apache.org), and TreeTagger (Schmid, 2013); as such, SUPWSD can directly handle all the languages supported by these tools. Finally, its architecture relies on commonly used design patterns in Java (such as Factory and Observer, among others), which make it flexible for programmatic use and easily extensible.

SUPWSD: Architecture
In this section we describe the workflow of SUPWSD. Figure 1 shows the architecture design of our framework: it is composed of four main modules, common to both the training and testing phases: (i) input parsing, (ii) text preprocessing, (iii) feature extraction and (iv) classification.
Input parsing. Given either a plain text or an XML file as input, SUPWSD first parses the file and extracts groups of sentences to provide them as input for the subsequent text preprocessing module. Sentence grouping is used to parallelize the preprocessing module's execution and to make it less memory-intensive. Input files are loaded into memory using a lazy procedure (i.e. the parser does not load the file entirely at once, but processes it according to the segments of interest), which enables a smoother handling of large datasets. The parser is selected according to the format of the input file via a Factory pattern, in such a way that additional parsers can easily be implemented and seamlessly integrated in the workflow (cf. Section 3). SUPWSD currently features 6 different parsers, targeted to the various formats of the Senseval/SemEval WSD competitions (both all-words and lexical sample), along with a parser for plain text.
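To make the idea of lazy, grouped loading concrete, the following is a minimal sketch and not SUPWSD's actual parser. The class name SentenceBatcher, the one-sentence-per-line input format and the group size are all assumptions introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of SupWSD): reads one sentence per line and
// yields groups of at most groupSize sentences, never holding the whole input.
class SentenceBatcher implements Iterator<List<String>> {
    private final BufferedReader reader;
    private final int groupSize;
    private List<String> next; // the group that the next call to next() will return

    SentenceBatcher(BufferedReader reader, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        this.reader = reader;
        this.groupSize = groupSize;
        this.next = readGroup();
    }

    // Reads up to groupSize non-empty lines; returns an empty list at end of input.
    private List<String> readGroup() {
        List<String> group = new ArrayList<>();
        try {
            String line;
            while (group.size() < groupSize && (line = reader.readLine()) != null) {
                String sentence = line.trim();
                if (!sentence.isEmpty()) {
                    group.add(sentence);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return group;
    }

    @Override
    public boolean hasNext() {
        return !next.isEmpty();
    }

    @Override
    public List<String> next() {
        if (next.isEmpty()) {
            throw new NoSuchElementException();
        }
        List<String> current = next;
        next = readGroup(); // only one group ahead is ever buffered
        return current;
    }

    public static void main(String[] args) {
        String text = "The bank raised rates.\nShe sat on the river bank.\nHe plays bass.\n";
        SentenceBatcher batches =
                new SentenceBatcher(new BufferedReader(new StringReader(text)), 2);
        while (batches.hasNext()) {
            System.out.println(batches.next()); // each group could go to a worker thread
        }
    }
}
```

In a design of this kind, each group can be handed to a separate preprocessing task, so memory use is bounded by the group size rather than by the size of the corpus.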
Text preprocessing. The text preprocessing module runs the pre-specified preprocessing pipeline on the input text, all the way from sentence splitting to dependency parsing, and retrieves the data used by the feature extraction module to construct the features. This module consists of a five-step pipeline: sentence splitting, tokenization, part-of-speech tagging, lemmatization and dependency parsing. SUPWSD currently supports two preprocessing options: Stanford and Hybrid. Both can be switched on and off using the configuration file. The former (default choice) provides a wrapper for the Stanford NLP pipeline, and selects the default Stanford model for each component. The latter, instead, enables the user to customize their model choice for each and every preprocessing step. For instance, one possible customization is to use the openNLP models for tokenization and sentence splitting, and the Stanford models for part-of-speech tagging and lemmatization. In addition, the framework enables the user to provide an input text where preprocessing information is already included.
The communication between the input parsing and the text preprocessing modules (Figure 1) is managed by the Analyzer, a component that maintains a fixed thread pool and outputs the feature information collected from the input text.
Feature extraction. The feature extraction module takes as input the data extracted at preprocessing time, and constructs a set of features that will be used in the subsequent stage to train the actual SUPWSD model. As in the previous stage, the user can rely on the configuration file (Figure 2) to select which features to enable or disable. SUPWSD currently supports five standard features: (i) part-of-speech tag of the target word and part-of-speech tags surrounding the target word (with a left and a right window of length 3); (ii) surrounding words, i.e. the set of word tokens (excluding stopwords from a pre-specified list) appearing in the context of the target word; (iii) local collocations, i.e. ordered sequences of tokens around the target word; (iv) pre-trained word embeddings, integrated according to three different strategies, as in Iacobacci et al. (2016); (v) syntactic relations, i.e. a set of features based on the dependency tree of the sentence, as in Lee and Ng (2002). SUPWSD allows the user to select appropriate cutoff parameters for features (i) to (iii), in order to filter them out according to a minimum frequency threshold.
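As an illustration of feature (i), the sketch below collects the part-of-speech tags in a symmetric window around a target word. It is a simplified example under stated assumptions, not the extractor shipped with SUPWSD: the class name, the feature naming scheme (POS_-1, POS_0, ...) and the NULL padding symbol are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the class, the feature names and the padding symbol
// are assumptions and do not mirror SupWSD's internal representation.
class PosWindowFeatures {
    static final String PAD = "NULL"; // used for positions outside the sentence

    // Returns the POS tag found at each offset in [-window, +window] around the target.
    static Map<String, String> extract(String[] posTags, int target, int window) {
        Map<String, String> features = new LinkedHashMap<>();
        for (int offset = -window; offset <= window; offset++) {
            int i = target + offset;
            String tag = (i >= 0 && i < posTags.length) ? posTags[i] : PAD;
            features.put("POS_" + offset, tag);
        }
        return features;
    }

    public static void main(String[] args) {
        // Tags for "She sat on the river bank ." with "bank" (index 5) as the target
        String[] tags = {"PRP", "VBD", "IN", "DT", "NN", "NN", "."};
        System.out.println(extract(tags, 5, 3));
    }
}
```

With a window of 3 the sketch always produces seven features per instance, which keeps the feature vector layout fixed regardless of where the target occurs in the sentence.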
Classification. The classification module constitutes the last stage of the SUPWSD pipeline. On the basis of the feature set constructed in the previous stage, this module leverages an off-the-shelf machine learning library to run a classification algorithm and generate a model for each sense-annotated word type in the input text. The current version of SUPWSD relies on two widely used machine learning frameworks, LIBLINEAR and LIBSVM, and its classification module operates on top of these two libraries.
Using the configuration file (Figure 2) the user can select which library to use and, at the same time, choose the underlying sense inventory. The current version of SUPWSD supports two sense inventories: WordNet (Miller et al., 1990) and BabelNet (Navigli and Ponzetto, 2012). Specifying a sense inventory enables SUPWSD to exploit the Most Frequent Sense (MFS) back-off strategy at test time for those target words for which no training data are available. If no sense inventory is specified, the model will not provide an answer for those target words.
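The back-off behaviour described above can be sketched as follows. This is a minimal illustration rather than SUPWSD's implementation: the class MfsBackoff, the lemma#pos keys and the sense labels are placeholders invented for the example (they are not real WordNet or BabelNet identifiers), and a trained model is reduced to a plain function from context to sense.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative sketch only; names and sense labels are placeholders, not SupWSD's API.
class MfsBackoff {
    // One model per sense-annotated word type seen at training time
    private final Map<String, Function<String, String>> models = new HashMap<>();
    // Most frequent sense per word type; null means no sense inventory was configured
    private final Map<String, String> mostFrequentSense;

    MfsBackoff(Map<String, String> mostFrequentSense) {
        this.mostFrequentSense = mostFrequentSense;
    }

    void addModel(String wordType, Function<String, String> model) {
        models.put(wordType, model);
    }

    // Uses the trained model if present, otherwise the MFS, otherwise gives no answer.
    Optional<String> disambiguate(String wordType, String context) {
        Function<String, String> model = models.get(wordType);
        if (model != null) {
            return Optional.of(model.apply(context));
        }
        if (mostFrequentSense != null) {
            return Optional.ofNullable(mostFrequentSense.get(wordType));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> mfs = new HashMap<>();
        mfs.put("bass#n", "bass-sense-1");
        MfsBackoff wsd = new MfsBackoff(mfs);
        wsd.addModel("bank#n", ctx -> ctx.contains("river") ? "bank-sense-2" : "bank-sense-1");

        System.out.println(wsd.disambiguate("bank#n", "she sat on the river bank")); // trained model
        System.out.println(wsd.disambiguate("bass#n", "he plays bass"));             // MFS back-off
        System.out.println(new MfsBackoff(null).disambiguate("bass#n", "he plays bass")); // no inventory
    }
}
```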

SUPWSD: Adding New Modules
In this section we illustrate how to implement new modules for SUPWSD and integrate them into the framework at various stages of the pipeline.
Adding a new input parser. In order to integrate a new XML parser, it is enough to extend the XMLHandler class and implement the methods startElement, endElement and characters (see the example in Figure 3). With the global variable mAnnotationListener, the programmatic user can directly specify when to transmit the parsed text to the text preprocessing module. Instead, in order to integrate a general parser for custom text, it is enough to extend the Parser class and implement the parse method. An example is provided by the PlainParser class that implements a parser for a plain textual file.
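A rough picture of such a handler is given below. Since the exact signatures of XMLHandler and mAnnotationListener are not reproduced here, the sketch extends the JDK's own SAX DefaultHandler as a stand-in and uses a plain Consumer in place of the annotation listener; the sentence element name and the parse helper are likewise assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Stand-in sketch: extends the JDK SAX handler, NOT SupWSD's XMLHandler.
class SentenceHandler extends DefaultHandler {
    private final Consumer<String> listener; // plays the role of the annotation listener
    private final StringBuilder buffer = new StringBuilder();
    private boolean inSentence = false;

    SentenceHandler(Consumer<String> listener) {
        this.listener = listener;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("sentence".equals(qName)) { // hypothetical element name
            inSentence = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inSentence) {
            buffer.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("sentence".equals(qName)) {
            inSentence = false;
            listener.accept(buffer.toString().trim()); // hand the sentence downstream
        }
    }

    // Convenience helper for the example: parses an XML string and collects sentences.
    static List<String> parse(String xml) {
        List<String> sentences = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new SentenceHandler(sentences::add));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String xml = "<corpus><sentence>The bank raised rates.</sentence>"
                + "<sentence>She sat on the river bank.</sentence></corpus>";
        System.out.println(parse(xml));
    }
}
```

The point of the callback is that the handler decides when a unit of text is complete, which matches the role the text assigns to the listener: it controls when parsed text is transmitted to preprocessing.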
Adding a new preprocessing module. To add a new preprocessing module to the pipeline, it is enough to implement the interfaces in the package modules.preprocessing.units. It is also possible to add a brand new step to the pipeline (e.g. a Named Entity Recognition module) by extending the class Unit and implementing the methods to load the models asynchronously.
Adding a new feature. A new feature for SUPWSD can be implemented with a two-step procedure. The first step consists in creating a class that extends the abstract class Feature. The constructor of this class requires a unique key and a name. It is also possible to set a default value for the feature by implementing the method getDefaultValue. The second step consists in implementing an extractor for the new feature via the abstract class FeatureExtractor (Figure 4). Each FeatureExtractor has a cut-off value and declares the name of the class through the method getFeatureClass.
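The two-step procedure might look roughly like the sketch below. The nested Feature and FeatureExtractor classes are simplified stand-ins written for this example, so their constructors and method signatures (in particular extract) are assumptions and may differ from those in the toolkit; only the names Feature, FeatureExtractor, getDefaultValue and getFeatureClass come from the description above. The suffix feature itself is a made-up example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CustomFeatureSketch {
    // Stand-in for the toolkit's abstract Feature class; real signatures may differ.
    static abstract class Feature {
        private final String key;
        private final String name;
        Feature(String key, String name) { this.key = key; this.name = name; }
        String getKey() { return key; }
        String getName() { return name; }
        String getDefaultValue() { return "0"; }
    }

    // Stand-in for the toolkit's abstract FeatureExtractor; extract() is an assumption.
    static abstract class FeatureExtractor {
        private final int cutoff;
        FeatureExtractor(int cutoff) { this.cutoff = cutoff; }
        int getCutoff() { return cutoff; }
        abstract Class<? extends Feature> getFeatureClass();
        abstract List<Feature> extract(List<String> tokens, int target);
    }

    // Step 1: the new feature, identified by a unique key and a name.
    static final class SuffixFeature extends Feature {
        SuffixFeature(String suffix) { super("SFX_" + suffix, "suffix"); }
    }

    // Step 2: the extractor, which declares its feature class and a cut-off value.
    static final class SuffixExtractor extends FeatureExtractor {
        SuffixExtractor(int cutoff) { super(cutoff); }

        @Override
        Class<? extends Feature> getFeatureClass() { return SuffixFeature.class; }

        @Override
        List<Feature> extract(List<String> tokens, int target) {
            String word = tokens.get(target).toLowerCase(Locale.ROOT);
            List<Feature> features = new ArrayList<>();
            if (word.length() > 3) { // skip words too short to have a 3-letter suffix
                features.add(new SuffixFeature(word.substring(word.length() - 3)));
            }
            return features;
        }
    }

    public static void main(String[] args) {
        FeatureExtractor extractor = new SuffixExtractor(0);
        List<String> tokens = Arrays.asList("He", "was", "running", "fast");
        for (Feature f : extractor.extract(tokens, 2)) {
            System.out.println(f.getName() + " -> " + f.getKey());
        }
    }
}
```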
Adding a new classifier. A new classifier for SUPWSD can be implemented by extending the generic abstract class Classifier (Figure 5), which declares the methods to train and test the models. Feature conversion is carried out with the generic method getFeatureNodes.

SUPWSD: Usage
SUPWSD can be used effectively via the command line with just 4 parameters (Figure 6): the first parameter toggles between the train and test mode; the second parameter contains the path to the configuration file; the third and fourth parameters contain the paths to the dataset and the associated key file (i.e. the file containing the annotated senses for each target word) respectively. Figure 2 shows an example configuration file for SUPWSD. As illustrated throughout Section 2, the SUPWSD pipeline is entirely customizable by changing these configuration parameters, and allows the user to employ specific settings at each stage of the pipeline (from preprocessing to actual classification). The working directory tag encodes the path in the file system where the trained models are to be saved. Finally, the writer tag enables the user to choose the preferred way of printing the test results (e.g. with or without confidence scores for each sense).
SUPWSD can also be used programmatically through its Java API, either using the toolkit (the

Evaluation
We evaluated SUPWSD on the evaluation framework of Raganato et al. (2017), which includes five test sets from the Senseval/SemEval series and two training corpora of different sizes, i.e. SemCor (Miller et al., 1993) and OMSTI (Taghipour and Ng, 2015a). As sense inventory, we used WordNet 3.0 (Miller et al., 1990) for all open-class parts of speech. We compared SUPWSD with the original implementation of IMS, including the best configurations reported in Iacobacci et al. (2016), which exploit word embeddings as features. As shown in Table 1, the performance of SUPWSD is consistently on par with the original implementation of IMS in terms of F-Measure, sometimes even outperforming its competitor by a considerable margin; this suggests that a neat and flexible implementation not only brings benefits in terms of usability of the software, but also has an impact on the accuracy of the model.
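For readers unfamiliar with the metric, the sketch below shows how an F-Measure of this kind is usually computed, assuming the standard Senseval/SemEval definitions (precision over attempted instances, recall over all instances, and their harmonic mean). The class name and the counts in main are illustrative and are not results from Table 1.

```java
// Sketch of the usual WSD scoring scheme; assumed, not taken from the SupWSD code base.
class WsdScore {
    // Fraction of attempted instances that were answered correctly
    static double precision(int correct, int attempted) {
        return attempted == 0 ? 0.0 : (double) correct / attempted;
    }

    // Fraction of all test instances that were answered correctly
    static double recall(int correct, int total) {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Harmonic mean of precision and recall
    static double f1(int correct, int attempted, int total) {
        double p = precision(correct, attempted);
        double r = recall(correct, total);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Made-up counts: 100 test instances, 80 answered, 60 of them correct
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n",
                precision(60, 80), recall(60, 100), f1(60, 80, 100));
    }
}
```

When a system answers every instance, for example thanks to a most-frequent-sense back-off, attempted equals total and precision, recall and F1 coincide.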

Speed Comparisons
We additionally carried out an experimental evaluation of the performance of SUPWSD in terms of execution time. As in the previous experiment, we compared SUPWSD with IMS and, given that both implementations are written in Java, we tested their programmatic usage within a Java program. We relied on a testing corpus with 1M words and more than 250K target instances to disambiguate, and we used both frameworks on SemCor and OMSTI as training sets. All experiments were performed using an Intel i7-4930K CPU 3.40GHz twelve-core machine. Figures in Table 2 show a considerable gain in execution time achieved by SUPWSD, which is around 3 times faster than IMS on SemCor, and almost 6 times faster than IMS on OMSTI.

Conclusion and Release
In this demonstration we presented SUPWSD, a flexible toolkit for supervised Word Sense Disambiguation which is designed to be modular, highly customizable and easy to both use and extend for end users and researchers. Furthermore, besides the Java API, SUPWSD provides an HTTP RESTful service for programmatic access to the SUPWSD framework and the pre-trained models.
Our experimental evaluation showed that, in addition to its flexibility, SUPWSD can replicate or outperform the state-of-the-art results reported by the best supervised models on standard benchmarks, while at the same time being optimized in terms of execution time.
The SUPWSD framework (including the source code, the pre-trained models, and an online demo) is available at http://github.com/SI3P/SupWSD. We release the toolkit described here under the GNU General Public License v3.0, whereas the RESTful service is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.