NUIG-UNLP at SemEval-2016 Task 13: A Simple Word Embedding-based Approach for Taxonomy Extraction

This paper describes the NUIG-UNLP system submitted to SemEval-2016 Task 13. We implement a semi-supervised method that extracts hypernym candidates for the terms provided in the test list. The main assumption of our system is that hypernyms may be induced by adding a vector offset to the corresponding hyponym word embedding. The vector offset is obtained as the average offset between 200 hyponym-hypernym pairs in the same vector space. Our approach ranked second on connectivity (c.c.) and categorisation (i.i.) for the English taxonomy construction, and fifth in the overall ranking. Despite these modest results, our system achieved evaluation scores comparable to those of the other participants.


Introduction
Hyponyms and hypernyms (sometimes called subordinate and superordinate terms, respectively) describe a type of relation which, in general, can be defined in terms of asymmetric entailment: given the hyponym cat and its hypernym feline, we can state that all cats are felines, but not that all felines are cats.
Likewise, the relation that hyponyms and hypernyms signal can be characterized as an is-a relation between a hyponym X and a hypernym Y: for nouns, X is a kind of Y or X is a type of Y (Saint-Dizier and Viegas, 1995). These type/kind-of relations form the backbone of Lexical Taxonomies and Ontologies (Buitelaar et al., 2004; Navigli et al., 2011), which in turn play an essential role in many Natural Language Processing applications: Question Answering, Textual Entailment, Natural Language Inference, or Text Summarization (Bordea et al., 2015).
Although taxonomy construction can be addressed with a variety of approaches, those based on lexico-syntactic patterns are still the most widely used. Nevertheless, in recent years some vector space-based approaches have emerged for learning semantic hierarchies (Saxe et al., 2013; Khashabi, 2013; Fu et al., 2014; Rei and Briscoe, 2014; Tan et al., 2015; Nayak, 2015). The following sections focus mainly on this type of approach.

Task Definition
The five participating teams in SemEval-2016 Task 13 were provided with six datasets in four languages (English, Dutch, French and Italian). The datasets can be divided into three domains (science, environment and food). Additionally, this year the TExEval-2 task focuses on four subtasks related to taxonomy construction:
1. Taxonomy construction
2. Hypernym identification
3. Multilingual taxonomy construction
4. Multilingual hypernym identification
However, due to lack of time, we decided to address only the English monolingual subtasks.
The key idea behind the TExEval tasks is the creation and evaluation of systems capable of automatically extracting hierarchical relations from text and then constructing taxonomies. Following Fu et al. (2014), the construction of those hierarchies can ideally be seen as a directed acyclic graph (DAG) with a finite set of nodes (words) and edges representing the asymmetric and transitive hyponym-hypernym relations. This is formally defined by Fu et al. (2014) as follows:
$$\forall x, y \in L_d : x \xrightarrow{H} y \Rightarrow \neg (y \xrightarrow{H} x)$$
$$\forall x, y, z \in L_d : (x \xrightarrow{H} z \wedge z \xrightarrow{H} y) \Rightarrow x \xrightarrow{H} y$$
where in our case $x$, $y$ and $z$ denote the terms in the domain list $L_d \in L$, and the hyponym-hypernym relation is represented by $\xrightarrow{H}$. Therefore, the aim of the task was to return a list of pairs $x \xrightarrow{H} y$ for each term in the six different domains $L_d$.

Experimental Setup
We describe in this section our taxonomy extraction system.

Training Data
Since the TExEval-2 organizers did not provide any specific corpus for the task, we used the latest Wikipedia dump. We preprocessed it using the WikiExtractor tool, which generates plain text from a Wikipedia database dump, discarding markup tags and any non-text elements such as tables, references, lists and images. In addition, in order to generate a single word embedding for each entry in the test list, we joined with underscores all the entries containing open compound words: civil engineering ⇒ civil_engineering
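The underscoring step can be illustrated with a minimal sketch. The function name `underscore_terms`, the example sentence and the case-insensitive matching are assumptions made for illustration; the paper does not describe how the replacement was implemented.

```python
import re


def underscore_terms(text, terms):
    """Join the words of every multiword term found in `text` with underscores.

    Hypothetical helper: the matching strategy (case-insensitive, whole words,
    longest term first) is an assumption, not the authors' implementation.
    """
    # Longest terms first, so a long term is not broken up by a shorter one.
    multiword = sorted((t for t in terms if " " in t), key=len, reverse=True)
    if not multiword:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in multiword) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(0).replace(" ", "_"), text)


terms = ["civil engineering", "engineering", "environmental science"]
sentence = "She studied civil engineering and environmental science."
result = underscore_terms(sentence, terms)
print(result)
```

Applying the same replacement to the corpus before training is what lets the embedding model treat each compound as a single token.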

Word Embeddings Generation
We use the log-bilinear model GloVe (Pennington et al., 2014), trained on the above-mentioned Wikipedia corpus, to generate vector space representations of words. Following the analogy task results presented in their paper and some preliminary tests, we set a window size of 10 and 300-dimensional word embeddings. The number of training iterations was set to 20.
Mikolov et al. (2013) and subsequently Levy and Goldberg (2014) demonstrated that word embeddings generated by neural nets (and also by other traditional distributional methods) preserve some syntactic and semantic information. Some of this encoded information, such as relational similarities between pairs of words, can be recovered by simple vector offsets between the embeddings of each word. Thus, as Mikolov et al. (2013) and Levy and Goldberg (2014) showed, given two pairs of words that share a relation, $a : a^*$, $b : b^*$, the relation between the words of each pair can be represented by their vector offset, as follows:
$$v_{a^*} - v_a \approx v_{b^*} - v_b$$
where $v_w$ denotes the embedding of the word $w$.

Offset Model
Therefore, the vector of the word $b^*$ should be similar to the proxy vector $y$,
$$y = v_{a^*} - v_a + v_b$$
where $y$ ideally corresponds to the vector representation of $b^*$. Since $y$ will rarely match the exact position of the word $b^*$, different similarity measures may be applied to find the word most similar to $y$. In this paper we focus only on cosine similarity (4) and Euclidean distance (5). For cosine similarity, the predicted word is obtained by maximizing the function:
$$\hat{b} = \arg\max_{b' \in V} \cos(v_{b'}, y) \tag{4}$$
where $V$ is the vocabulary. Given the Euclidean distance formula, we analogously obtain the following function:
$$\hat{b} = \arg\min_{b' \in V} \lVert v_{b'} - y \rVert_2 \tag{5}$$
Mikolov et al. (2013) and Levy and Goldberg (2014) have only tested the vector offset method for simple symmetric relations such as capital-country, gender inflections, adjective-to-adverb, etc. However, as Rei and Briscoe (2014) pointed out, hypernym-hyponym relations are conceivably much more difficult to represent by simple vector offsets, as these relations are rarely symmetric.
In their paper, Rei and Briscoe (2014) first assess how word embeddings perform in hypernym-hyponym detection and generation, and second, propose a new directional similarity measure (WeightedCosine) based on two new properties for detecting these relations. In our submitted system, though, we decided not to implement this new measure due to lack of time.
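To make the two similarity measures concrete, the following sketch applies both to a toy two-dimensional vocabulary. The vectors, the word list and the helper names (`cosine`, `nearest_word`) are invented for illustration, and pure Python lists are used only to keep the example self-contained; real embeddings would have 300 dimensions and would normally be handled with a numerical library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_word(y, embeddings, measure="cosine", exclude=()):
    """Return the vocabulary word closest to the proxy vector y."""
    candidates = [w for w in embeddings if w not in exclude]
    if measure == "cosine":
        # Cosine similarity: pick the word that maximizes the similarity.
        return max(candidates, key=lambda w: cosine(embeddings[w], y))
    if measure == "euclidean":
        # Euclidean distance: pick the word that minimizes the distance.
        return min(candidates, key=lambda w: math.dist(embeddings[w], y))
    raise ValueError(f"unknown measure: {measure}")


# Toy vectors (assumed values, chosen so that the offset is exact).
embeddings = {
    "paris": [1.0, 0.0], "france": [1.0, 1.0],
    "rome": [3.0, 0.0], "italy": [3.0, 1.0],
    "berlin": [5.0, 0.0], "germany": [5.0, 1.0],
}
a, a_star, b = "paris", "france", "rome"
# Proxy vector y = v(a*) - v(a) + v(b).
y = [s - p + q for s, p, q in zip(embeddings[a_star], embeddings[a], embeddings[b])]

by_cosine = nearest_word(y, embeddings, "cosine", exclude={a, a_star, b})
by_euclidean = nearest_word(y, embeddings, "euclidean", exclude={a, a_star, b})
print(by_cosine, by_euclidean)
```

Excluding the three query words from the candidates is a common convention in analogy evaluation; whether the submitted system did so is not stated in the paper.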

Offset Model for the Hypernym-Hyponym Relation
We first generate a random list of 200 hypernym-hyponym pairs. This training list was extracted from the trial data provided in Bordea et al. (2015) and from WordNet (Miller, 1995; Fellbaum, 1998), covering different domains.
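Using the Gensim library (Řehůřek and Sojka, 2010), we compute the vector offset as the average offset over all the pairs in the above-mentioned training data (Mikolov et al., 2013; Nayak, 2015):
$$v_{offset} = \frac{1}{n} \sum_{i=1}^{n} \left( v_{hyper_i} - v_{hypo_i} \right)$$
where $n = 200$ is the number of hypernym-hyponym pairs in our training data, and $v_{hyper_i}$ and $v_{hypo_i}$ are the embeddings of the hypernym and the hyponym of the $i$-th pair.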
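The averaging step can be sketched as follows. A plain dictionary stands in for the trained embedding model, the two training pairs and their vectors are invented, and skipping pairs with out-of-vocabulary words is an assumption; the paper uses Gensim and 200 pairs but does not give implementation details.

```python
def average_offset(pairs, embeddings):
    """Mean of (hypernym - hyponym) over all (hyponym, hypernym) pairs."""
    offsets = [
        [hi - lo for hi, lo in zip(embeddings[hyper], embeddings[hypo])]
        for hypo, hyper in pairs
        # Assumption: pairs with an out-of-vocabulary word are skipped.
        if hypo in embeddings and hyper in embeddings
    ]
    if not offsets:
        raise ValueError("no training pair is covered by the embeddings")
    n = len(offsets)
    # Average each dimension across the n offset vectors.
    return [sum(dim) / n for dim in zip(*offsets)]


# Toy embeddings and (hyponym, hypernym) training pairs, for illustration only.
embeddings = {
    "cat": [1.0, 0.0], "feline": [1.0, 2.0],
    "oak": [4.0, 1.0], "tree": [6.0, 3.0],
}
pairs = [("cat", "feline"), ("oak", "tree"), ("kiwi", "fruit")]
v_offset = average_offset(pairs, embeddings)
print(v_offset)
```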
Once $v_{offset}$ has been obtained, we add it to each target term in the test list:
$$y = v_{offset} + v_{term}$$
where we assume that the addition of the vectors $v_{offset}$ and $v_{term}$ projects $y$ close enough to the hidden hypernym representation $b^*$. Thus, we apply either similarity measure (4) or (5) and rank either the top 10 or the top 5 candidates, discarding those terms not included in the test list. We also implement a substring inclusion approach based on regular expressions (Nevill-Manning et al., 1999). In other words, given an open compound word such as civil engineering, we assume that the second term, engineering, is the most likely hypernym of civil engineering.
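The candidate generation just described might look like the sketch below, shown here with Euclidean distance. The function names, the toy vectors and the requirement that the head word itself appear in the test list are assumptions made for illustration, not details confirmed by the paper.

```python
import math
import re


def hypernym_candidates(term, v_offset, embeddings, test_terms, top_k=10):
    """Rank the vocabulary by distance to y = v_offset + v_term, keep the
    top_k words, then discard those that are not in the test list."""
    y = [o + t for o, t in zip(v_offset, embeddings[term])]
    ranked = sorted(
        (w for w in embeddings if w != term),
        key=lambda w: math.dist(embeddings[w], y),
    )
    return [w for w in ranked[:top_k] if w in test_terms]


def head_word_hypernym(term, test_terms):
    """Substring inclusion: propose the last word of an underscored compound.

    Assumption: the head word is only proposed if it is itself a test term.
    """
    match = re.match(r"^.+_([^_]+)$", term)
    if match and match.group(1) in test_terms:
        return match.group(1)
    return None


# Toy data, for illustration only.
embeddings = {
    "civil_engineering": [1.0, 1.0],
    "engineering": [2.0, 3.0],
    "banana": [2.0, 2.5],
    "science": [5.0, 5.0],
}
test_terms = {"civil_engineering", "engineering", "science"}
v_offset = [1.0, 2.0]

candidates = hypernym_candidates(
    "civil_engineering", v_offset, embeddings, test_terms, top_k=2
)
head = head_word_hypernym("civil_engineering", test_terms)
print(candidates, head)
```

In this toy run "banana" is among the two nearest neighbours but is dropped because it is not a test term, which mirrors the filtering step described above.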

Evaluation Metrics
In this section we present the results obtained in the second Taxonomy Extraction Evaluation task (TExEval-2), part of SemEval-2016. The metrics correspond to the structural analysis and the comparison against the Gold Standard carried out by Bordea et al. (2016). The best results among all the systems appear in bold (note that Euclidean 5 and Cosine 5 have been excluded, as they were not submitted in time). Table 1 shows the structural analysis of our system and the corresponding results when compared to the Gold Standard. We were only able to submit the first system (Euclidean 10), and we had to exclude the food domains due to time limitations. However, we also present here metrics beyond the official system submission, i.e., Euclidean 5 and Cosine 5, covering all the domains provided in the test data.

Results
The hyphen (-) is used in cases where the number of cycles could not be computed due to hardware limitations. This outcome should be interpreted as negative, as the presence of cycles goes against the Directed Acyclic Graph (DAG) definition.
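As an illustration of why cycles violate the DAG requirement, the sketch below checks whether a list of hyponym-hypernym pairs contains at least one cycle, using a depth-first search. It is not the organisers' evaluation code: it only detects the presence of a cycle, which is much cheaper than counting all cycles, and the function name and example pairs are assumptions.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (hyponym, hypernym)
    pairs contains at least one cycle."""
    graph = {}
    for hypo, hyper in edges:
        graph.setdefault(hypo, []).append(hyper)
        graph.setdefault(hyper, [])

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = dict.fromkeys(graph, UNSEEN)

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in graph[node]:
            # Reaching a node that is still being explored closes a cycle.
            if state[nxt] == IN_PROGRESS:
                return True
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNSEEN and visit(n) for n in graph)


taxonomy = [("cat", "feline"), ("feline", "mammal")]
acyclic = has_cycle(taxonomy)
cyclic = has_cycle(taxonomy + [("mammal", "cat")])
print(acyclic, cyclic)
```

The recursive search is fine for small examples; a large taxonomy would need an iterative version to avoid Python's recursion limit.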
As for the structural analysis, the main goal is to evaluate the number of correct nodes and edges in comparison with the Gold Standard. Thus, the quantifying metrics in the left block (|V| ... i.n.) cannot really be considered apart from the Gold Standard evaluation. Therefore, in this section we will mainly focus on the qualifying metrics instead of the quantifying ones.
We observe that, likely, due to the restrictions imposed in our algorithm not allowing hypernym can-
We also note that the number of cycles was considerably higher for the Euclidean approaches, in fact exceeding the computer's memory capacity for some domains, namely Science and Science WordNet (see the surprisingly high figure for Euclidean 5, Science WN). On the other hand, contrary to our initial assumptions, the cosine approach did not perform much better than the Euclidean ones. Our system obtained recall values comparable to those of the other systems, although at the expense of precision. Therefore, the results achieved by our systems are in general modest, especially taking into consideration that our algorithm also included a substring inclusion module (as described in section 2.5).
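The trade-off between recall and precision can be illustrated with a minimal sketch that compares predicted hyponym-hypernym pairs with gold pairs. Treating the comparison as a set overlap of edges is an assumption made for this example, and the pairs are invented; see Bordea et al. (2016) for the official metric definitions.

```python
def edge_scores(predicted, gold):
    """Precision, recall and F-score of predicted edges against gold edges."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy example: proposing many candidates recovers every gold edge
# but halves the precision.
gold = [("cat", "feline"), ("oak", "tree")]
predicted = [
    ("cat", "feline"), ("oak", "tree"),
    ("cat", "tree"), ("oak", "feline"),
]
precision, recall, f_score = edge_scores(predicted, gold)
print(precision, recall, round(f_score, 3))
```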

Conclusion and Discussion
Although there is still room for improvement in our system, we conclude that the diversity involved in the complex hypernym-hyponym relations cannot easily be captured by a simple vector offset mean. As a direction for future work, it might be worth considering domain-specific vectors as well as increasing the number of training pairs for the vector offset mean.
Our system ranked second on connectivity (c.c.) and categorisation (i.i.) for the English taxonomy construction, and fifth on the overall ranking (see Bordea et al. (2016) for further details on the evaluation metrics).