USAAR at SemEval-2016 Task 13: Hyponym Endocentricity

This paper describes our submission to the SemEval-2016 Taxonomy Extraction Evaluation (TExEval-2) Task. We examine the endocentric nature of hyponyms and propose a simple rule-based method to identify hypernyms at high precision. For the food domain, we extract lists of terms from the Wikipedia lists of lists by using the name of each list as the endocentric head and treating all terms in the extracted tables as hyponyms of that head. Our submission achieved competitive results in taxonomy construction and ranked top in hypernym identification, both when evaluated against gold standard taxonomies and in the manual evaluation of novel relations not covered by the gold standard taxonomies.

With rapid technological evolution, it has become more feasible to construct a domain-specific taxonomy that caters to domain- or company-specific terminology (Lefever, 2015). This motivated the move towards unsupervised approaches to taxonomy extraction (Berland and Charniak, 1999; Lin and Pantel, 2001; Snow et al., 2006), and specifically towards approaches focused on particular domains (Velardi et al., 2013; Bordea et al., 2015).
The aim of the Taxonomy Extraction Evaluation (TExEval) task is to automatically find lexical relations between pairs of terms within several specified domains. Previously, we developed a hypernym extraction system using word embeddings by exploiting the frequent occurrence of the 'X is a Y' pattern in encyclopedic text.
We achieved competitive results in SemEval-2015 and, as a follow-up to that study, we would like to explore the endocentric nature of hyponyms, which contributed substantially to the system performance in the previous TExEval task.
Below, we will briefly (i) describe related work on different approaches to taxonomy induction, (ii) explain the linguistic phenomenon of endocentricity, (iii) present our endocentric hypo-hypernym identification system and the results of our submission to the TExEval-2 task in SemEval-2016.
More recently, there has been a resurgence of vector space or distributional approaches (Van Der Plas, 2005; Lenci and Benotto, 2012; Santus et al., 2014), primarily because of the renaissance of deep learning and neural networks.
Semantic knowledge can be thought of as a vector space where each word is represented by a point and the proximity between words in this space quantifies their semantic association. The vector space is usually constructed from the distribution of words across contexts such that similar meanings tend to be found close to each other within the vector space (Mitchell and Lapata, 2010).
A grammatical construction is endocentric when it fulfills the same linguistic function as one of its parts. For instance, the word goldfish is an endocentric compound noun: syntactically it functions as a noun, just as its component part fish does, and semantically the compound denotes a type of fish.
Conversely, when a grammatical construction made of two or more parts is exocentric, no component part carries the linguistic function or meaning assigned to the complex construction. Intuitively, we would expect a taxonomy to contain many endocentric hyponyms, where part of the term conveys its main meaning and that part is usually its hypernym.
The endo/exocentricity feature of a lexical term assumes that the term can be split into two or more parts. For example, fish is a single noun that cannot be split, thus it cannot be endo- or exocentric.
While experimenting with ways to weight a term for information retrieval, Jones (1979) observed that compound nouns often follow the head-modifier principle, where the meaning of the term can be conveyed by part(s) of the compound. Approaching endocentricity from a different angle, Nichols et al. (2005) identified the semantic head(s) of a term as its hypernym using the lowest scoping element of the Robust Minimal Recursion Semantics (RMRS) (Copestake et al., 2005) structure of the dictionary definition of the term. In the first TExEval task in SemEval-2015, both Lefever (2015) and our own previous submission independently developed string-based systems that exploit the endocentric nature of hyponyms.
In our submission to the TExEval-2 task (Bordea et al., 2016), we seek to answer the question of exactly how many hyponyms within a taxonomy are endocentric. Additionally, we exploit the endocentric nature of the hyponyms to extend the taxonomy by crawling and cleaning Wikipedia's List of Lists of Lists. Often these lists of terms are marked up in Wikipedia as tables or as bullet points.

Identifying Endocentric Parts
The main implementation of the rule-based identifier checks whether a term T1 is a substring of another term T2 and, if so, assigns T1 as a hypernym of T2. Examples of hypo-hypernym pairs captured by this rule include (psycholinguistics, linguistics), (kobe beef, beef), (sauce gribiche, sauce).
Our implementation is simpler than the three-part morpho-syntactic analyzer component of the multimodular taxonomy constructor in Lefever (2015). She implemented rules for three different syntactic constructions, which check for suffixes and treat single-word terms and multi-word terms differently, while our implementation is agnostic to the single- and multi-word distinction.
In addition to the first rule, if the longer term T2 contains the preposition "of", we swap the check: we test that T2 starts with T1 and, if so, assign T1 as a hypernym of T2, so that the head preceding "of" becomes the hypernym. Examples of hypo-hypernym pairs captured by this swap rule are (elixir of life, elixir), (sociology of education, sociology).
To improve the precision of the identifier, we set a threshold of a minimum character length of three when identifying a term as a hypernym.
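The rules above can be sketched in a few lines of Python. This is a minimal illustration and not the released implementation: the function names `is_endocentric_hypernym` and `extract_pairs` are hypothetical, and the sketch assumes that for terms containing "of" the swap rule is applied instead of the plain substring check, which the description leaves open.

```python
from itertools import permutations

MIN_HYPERNYM_LENGTH = 3  # minimum character length stated in the text


def is_endocentric_hypernym(candidate: str, term: str) -> bool:
    """Return True if `candidate` is proposed as a hypernym of `term`."""
    if candidate == term or len(candidate) < MIN_HYPERNYM_LENGTH:
        return False
    if " of " in term:
        # Swap rule (assumed to replace the main rule for "of" terms):
        # the head of an "of" phrase comes first.
        return term.startswith(candidate)
    # Main rule: the candidate is a substring of the longer term.
    return candidate in term


def extract_pairs(terms):
    """Return sorted (hyponym, hypernym) pairs found among `terms`."""
    unique = sorted(set(terms))
    return sorted(
        (term, candidate)
        for candidate, term in permutations(unique, 2)
        if is_endocentric_hypernym(candidate, term)
    )


if __name__ == "__main__":
    demo = ["beef", "kobe beef", "elixir", "elixir of life",
            "linguistics", "psycholinguistics"]
    for hyponym, hypernym in extract_pairs(demo):
        print(hyponym, "->", hypernym)
```

Under these assumptions the sketch also reproduces the kind of error the rule can make: given the terms "honey" and "honey bunches of oats", it proposes "honey" as the hypernym because the longer term starts with it.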

Extending a Taxonomy with Wikipedia List of Lists of Lists
The Wikipedia List of Lists of Lists (LOLOL) is a crowdsourced list of lists of terms. We adapted a customized crawler (Tan et al., 2014; Tan and Ordan, 2015) to crawl for tables or bullet points in the Wikipedia subpages of the LOLOL for the food domain. We started the crawl from the seed pages listed under the bullet point at https://en.wikipedia.org/wiki/List_of_lists_of_lists#Food_and_drink.
When the crawler lands on a List of Lists (LOL) page, it treats the URL suffix as the hypernym and looks for endocentric hyponyms of it among the words in the bullet points or tables.
If an endocentric hyponym exists, it extracts either (i) all the terms in bold font if the LOL page is bulleted or (ii) all terms in the first column if the LOL page is in table form. The first column is chosen because LOL tables often have two columns, one containing the terms and the other the gloss and/or description of each term.
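The extraction step can be illustrated with Python's standard `html.parser` module. This is only a sketch under stated assumptions, not the crawler used for the submission: the class `ListPageTermExtractor` and the function `extract_hyponyms` are hypothetical names, terms are lowercased for matching, the hypernym is passed in already normalized (how the URL suffix is turned into a head word is not described in the text), and a page is treated as a table page whenever it yields any first-column cells.

```python
from html.parser import HTMLParser


class ListPageTermExtractor(HTMLParser):
    """Collect bold terms inside bullets and first-column table cells."""

    def __init__(self):
        super().__init__()
        self.bold_terms = []    # text of <b> elements found inside <li>
        self.first_cells = []   # text of the first <td> of each <tr>
        self._li_depth = 0
        self._in_bold = False
        self._in_cell = False
        self._cell_index = -1   # position of the current <td> in its row
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._li_depth += 1
        elif tag == "b" and self._li_depth > 0:
            self._in_bold = True
            self._buffer = []
        elif tag == "tr":
            self._cell_index = -1
        elif tag == "td":
            self._cell_index += 1
            if self._cell_index == 0:
                self._in_cell = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._li_depth > 0:
            self._li_depth -= 1
        elif tag == "b" and self._in_bold:
            self._in_bold = False
            self._flush(self.bold_terms)
        elif tag == "td" and self._in_cell:
            self._in_cell = False
            self._flush(self.first_cells)

    def handle_data(self, data):
        if self._in_bold or self._in_cell:
            self._buffer.append(data)

    def _flush(self, target):
        # Normalize whitespace and case before storing the term.
        text = " ".join("".join(self._buffer).split()).lower()
        if text:
            target.append(text)


def extract_hyponyms(html: str, hypernym: str):
    """Return (hyponym, hypernym) pairs if the page has an endocentric term."""
    parser = ListPageTermExtractor()
    parser.feed(html)
    parser.close()
    # Assumption: prefer table cells when present, else bold bullet terms.
    terms = parser.first_cells or parser.bold_terms
    if not any(hypernym in t and t != hypernym for t in terms):
        return []  # no endocentric hyponym found, so the page is skipped
    return [(t, hypernym) for t in terms if t != hypernym]
```

Header cells (`<th>`) are ignored here, so a header row does not contribute a term; real LOL pages vary far more than this sketch handles.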

Limitations of LOLOL Trawler
There are a number of issues with this trawling (crawl+clean) approach to extend the taxonomy.
LOL pages are not standardized: The way the crawler cleans the bullets or tables on each LOL page cannot be standardized because no constraint is placed on the format of Wikipedia's LOL pages. Our trawler only managed to crawl and clean fewer than 30 LOL pages when extracting the new terms for the food domain.
LOL pages are inceptive: The depth of how nested the LOLs are is undefined. Our trawler, adapted from a crawler built for crawling translations and diachronic texts in previous SemEval tasks, can start with a List of foods page, which leads to the List of breads page and then the List of American breads page, and so it continues. For sanity, we had to stop our trawler at the second page depth and return to the main LOLOL page to move on to the next LOL that we had not previously trawled.

Results
Table 1 presents the overview results of our submissions to the TExEval-2 task. Only the results for the food domain contain the hypo-hypernym pairs extracted by our trawler. The rest of the domains comprise outputs generated solely by our endocentric hypo-hypernym identifier.
Although it is counter-intuitive to think that endocentric hypo-hypernym pairs can be wrong, this example aptly demonstrates the limitation: (honey bunches of oats, honey). In this case, 'honey bunches of oats' cannot be a hypernym of 'honey', nor vice versa.
When compared against the gold standard taxonomies, our submission achieved the highest precision in the environment, food (WordNet), science (Eurovoc) and science (WordNet) domains.
As for the Food domain, we expected the fall in precision due to the additional terms that we introduced from the Wikipedia LOLOL outside of the gold standard taxonomy. Thus, we are also unable to determine the true "correctness" of these terms (indicated by the dash in Table 1).
Looking at the proportion of hypo-hypernym pairs that our system correctly identified, and taking the ratio #Correct / #Terms, we can approximate that 15-25% of the hypernyms in a taxonomy can be easily identified through their endocentric hyponyms. However, the proportions presented in Table 1 exclude the correct hypo-hypernym pairs that are identified but are not currently in the gold-standard taxonomy. Table 2 presents the results of the manual evaluation for the precision of 100 randomly selected hypo-hypernym pairs that are not in the gold standard taxonomy. Our system achieved top precision in all domains other than science (Eurovoc).
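The #Correct / #Terms proportion can be written as a one-line computation. The sketch below is illustrative only: the function name `endocentric_proportion` is hypothetical, and the counts in the usage example are made-up placeholders, not values from Table 1.

```python
def endocentric_proportion(num_correct: int, num_terms: int) -> float:
    """Share of terms whose hypernym was found via an endocentric hyponym."""
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return num_correct / num_terms


if __name__ == "__main__":
    # Hypothetical counts for illustration; not taken from the paper.
    print(f"{endocentric_proportion(200, 1000):.0%}")
```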
If we take the precision scores from Table 2 as the precision of the remaining identified pairs that were not counted as correct in Table 1, we would be able to add to the 15-25% hyponym endocentricity in taxonomies. However, the aggregation of ...
Comparing against the TExEval-2 organizers' baseline string-based method and the TAXI lexico-syntactic substring approach (Panchenko and Biemann, 2016) for the WordNet taxonomies, our system achieved the highest precision but underperformed in recall, as shown in Table 3.
Since the main implementation of our hypernym identifier is language independent, in retrospect, we could easily have removed the swap rule that is attached to the English 'of' and applied the identifier to other languages in the TExEval-2 task. Table 2 presents a summary of the results of novel hypo-hypernym pairs identified by the participating systems in TExEval-2. A detailed overview of the results of TExEval-2 is presented in Bordea et al. (2016).

Other Participating Systems
JUNLP relied on substrings and relations extracted from BabelNet (Navigli and Ponzetto, 2012) to identify hyper-hyponym pairs. Although it is sensible to approach the task using an existing ontology, their system achieved relatively low precision on the manual evaluation of novel hyper-hyponym pairs. The NUIG-UNLP team extended previous work on vector space approaches to taxonomy induction by comparing the similarity between the dense word embeddings of the hyponyms and their candidate hypernyms. Their system achieved high recall but attained low precision (Pocostales, 2016).
Similar to our endocentric-based approach, the TAXI team extended the substring-based approach by filtering the hypernym candidates based on corpus statistics of lexico-syntactic patterns. Additionally, they applied pruning methods to improve the ontological structure, which resulted in a high Fowlkes and Mallows (F&M) Measure (Panchenko and Biemann, 2016). QASSIT used lexical patterns to extract hypernym candidates and applied a pretopological space graph optimization technique based on a genetic algorithm to achieve the desired taxonomy structure (Cleuziou and Moreno, 2016).
TAXI and QASSIT ranked first and second in the taxonomy construction criterion of the TExEval task. Both teams used graph pruning techniques to improve the taxonomy structure and implicitly improve the F&M scores of their taxonomy. Although our endocentricity-based hypo-hypernym extraction system ranked first in hypernym identification in the TExEval task, we ranked third in taxonomy construction with an overall F&M score of 0.0013.

Conclusion
In this paper, we have described our submission to the Taxonomy Extraction Evaluation (TExEval-2) Task for SemEval-2016. We have empirically shown that 15-25% of the hypernyms in a taxonomy can be easily identified through their endocentric hyponyms, and we have briefly discussed the intuitions and limitations of the approach.
We achieved competitive results in taxonomy construction and the top precision score for hypernym identification in most domains involved in the task.