MERALI at SemEval-2017 Task 2 Subtask 1: a Cognitively Inspired approach

In this paper we report on the participation of the MERALI system in SemEval-2017 Task 2, Subtask 1. The MERALI system approaches conceptual similarity through a simple, cognitively inspired heuristic; it builds on a linguistic resource, the TTCS E, that relies on BabelNet, NASARI and ConceptNet, and thus contains a novel mixture of common-sense and encyclopedic knowledge. The results show that there is ample room for improvement; we use them to elaborate on present limitations and on future steps.


Introduction
Defining conceptual representations along with their associated reasoning procedures has required, and still requires, truly interdisciplinary efforts involving psychologists (Miller and Charles, 1991; Barsalou, 1999; Malt et al., 2015), philosophers (Machery, 2009; Gärdenfors, 2014), neuroscientists (Vigliocco et al., 2014), and computer scientists (Resnik, 1998; Agirre et al., 2009). Today, the ever-growing number of applications of semantic technologies demands further investigation into the meaning of concepts: this explains the popularity of issues rooted in and related to conceptual similarity, and the success of the present Semantic Word Similarity task (Camacho-Collados et al., 2017).
In this paper we present an approach to the computation of conceptual similarity based on a novel lexical resource, the TTCS E (so dubbed after Terms to Conceptual Spaces-Extended), which has been acquired by integrating two different sorts of linguistic resources: the encyclopedic knowledge available in BabelNet (Navigli and Ponzetto, 2012) and NASARI (Camacho-Collados et al., 2015), and the common-sense knowledge captured by ConceptNet (Speer and Havasi, 2012). The resulting representation is anchored to both sorts of resources, thereby providing uniform conceptual access grounded on the sense identifiers provided by BabelNet.
The TTCS E provides a conceptual representation inspired by Conceptual Spaces (CSs), a geometric representation framework where knowledge is represented as a set of limited though cognitively relevant quality dimensions (Gärdenfors, 2014). The CSs framework has recently been used to extend and complement the representational and inferential power of formal ontologies, with special emphasis on the corresponding typicality-based conceptual reasoning (Lieto et al., 2015). In this setting, the TTCS E aims at providing a wide-coverage, cognitively based linguistic resource for this sort of knowledge, by extending previous work (Lieto et al., 2016; Mensa et al., 2017).

Concept Representation in the TTCS E
Concept representation in the TTCS E is consistent with CSs: each concept c is provided with a vector representation $\vec{c}$ that describes the concept along a set of semantic dimensions d. All concepts included in such a description are referred to through BabelNet synset IDs, and the dimensions themselves are a subset of the relationships available in ConceptNet. Such relations report common-sense information, e.g., ISA, ATLOCATION, USEDFOR, PARTOF, MADEOF, HASA, CAPABLEOF, etc. For a full description of the employed properties we refer the reader to (Mensa et al., 2017).
Let D be the set of N dimensions. Each concept $c_i$ is represented by a vector $\vec{c}_i$ whose k-th component $s^i_k$ is the set of concepts filling the dimension $d_k \in D$. Each s can contain an arbitrary number of values, or be empty. For example, the representation of the concept FORK includes information about 6 dimensions, filled with 18 concepts overall, as illustrated in Figure 1.
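The representation just described can be sketched as a mapping from dimensions to sets of fillers. This is only an illustration under stated assumptions: the dimension list is limited to the relations named above, the helper names (`make_concept`, `filled_dimensions`, `total_fillers`) are hypothetical, and the fillers for the fork concept are invented readable strings rather than the BabelNet synset IDs and the actual values shown in Figure 1.

```python
from typing import Dict, List, Set

# A concept vector maps each dimension to the set of concepts filling it.
# In the resource the fillers are BabelNet synset IDs; the readable strings
# used below are hypothetical placeholders chosen for illustration only.
ConceptVector = Dict[str, Set[str]]

# Only the relations named in the text; the full set used by the resource
# is larger (assumption: we restrict to these for the sketch).
DIMENSIONS: List[str] = [
    "ISA", "ATLOCATION", "USEDFOR", "PARTOF", "MADEOF", "HASA", "CAPABLEOF",
]


def make_concept(fillers: ConceptVector) -> ConceptVector:
    """Return a vector with one (possibly empty) filler set per dimension."""
    unknown = set(fillers) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {d: set(fillers.get(d, ())) for d in DIMENSIONS}


def filled_dimensions(concept: ConceptVector) -> List[str]:
    """Dimensions that hold at least one filler, in DIMENSIONS order."""
    return [d for d in DIMENSIONS if concept[d]]


def total_fillers(concept: ConceptVector) -> int:
    """Overall number of fillers across all dimensions."""
    return sum(len(concept[d]) for d in DIMENSIONS)


# Hypothetical fillers, NOT the actual content of the resource.
fork = make_concept({
    "ISA": {"cutlery", "tool"},
    "USEDFOR": {"eating"},
    "ATLOCATION": {"kitchen", "table"},
})

print(filled_dimensions(fork), total_fillers(fork))
```

Keeping empty sets for unfilled dimensions mirrors the remark that each s may be empty, and makes it easy to check which dimensions are filled in both vectors of a pair.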
The TTCS E resource contains 14,677 concepts; it was built starting from the 10K most frequent nouns in the Corpus of Contemporary American English (COCA), browsing over the 11M associations available in ConceptNet and the 2.8M NASARI vectors. Concepts in the TTCS E are filled, on average, with 14.90 (concept) values.

Conceptual similarity with the TTCS E
Our similarity metric neither employs the WordNet taxonomy and distances between pairs of nodes, as in (Wu and Palmer, 1994; Leacock et al., 1998), nor depends on information content accounts, as in (Resnik, 1998).
Instead, given the aforementioned representation for concepts, one principal assumption underlying our approach is that two concepts are similar insofar as they share values on the same dimension, such as when they are both used for the same ends, they share the same components, etc. Given two concepts $c_i$ and $c_j$, the conceptual similarity along each dimension filled in both vectors should ideally be computed as a function of the cardinality of the intersection between overlapping dimensions:

$$\mathrm{sim}(\vec{c}_i, \vec{c}_j) = \frac{1}{N^*} \sum_{k=1}^{N^*} |s^i_k \cap s^j_k|$$

where $s^i_k$ is the set of concepts filling the k-th dimension in the vector $\vec{c}_i$ representing the concept $c_i$, and $N^*$ counts the dimensions filled with at least one concept in both vectors. The rationale underlying this formula is to capture shared features, thereby allowing us to provide an explanation based on common-sense accounts. For example, rather than computing a distance on WordNet or observing how frequently the two terms co-occur, to justify the similarity score for the pair 'bird-cock' we consider that each concept ISA 'animal', and that both of them are RELATEDTO 'feather', 'chicken', 'roosting' and 'vertebrate'.
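The explanatory use of shared values can be sketched as follows. The function name `shared_values` is hypothetical; the shared fillers for the bird and cock concepts are the ones mentioned above, whereas the non-shared fillers ('wing', 'farm') are invented for the example and are not taken from the resource.

```python
from typing import Dict, Set

ConceptVector = Dict[str, Set[str]]


def shared_values(ci: ConceptVector, cj: ConceptVector) -> Dict[str, Set[str]]:
    """For each dimension filled in both concepts, return the common fillers."""
    shared: Dict[str, Set[str]] = {}
    for dim in ci.keys() & cj.keys():
        if ci[dim] and cj[dim]:  # consider only dimensions filled in both
            shared[dim] = ci[dim] & cj[dim]
    return shared


# Shared fillers come from the example in the text; 'wing' and 'farm' are
# hypothetical extras added to show that non-shared values are ignored.
bird = {
    "ISA": {"animal"},
    "RELATEDTO": {"feather", "chicken", "roosting", "vertebrate", "wing"},
}
cock = {
    "ISA": {"animal"},
    "RELATEDTO": {"feather", "chicken", "roosting", "vertebrate", "farm"},
}

for dim, values in sorted(shared_values(bird, cock).items()):
    print(dim, sorted(values))
```

The returned mapping doubles as a human-readable justification of the score, since it lists exactly which fillers the two concepts have in common on each dimension.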
However, our approach is presently limited by the actual average filling factor, and by the noise that may be collected by an automatic procedure built on top of the BabelNet and ConceptNet resources. To handle the possibly unbalanced number of concepts that characterize the different dimensions, and to prevent the computation from being biased towards more richly defined concepts, we adopt the Symmetrical Tversky's Ratio Model (Jimenez et al., 2013):

$$\mathrm{sim}(\vec{c}_i, \vec{c}_j) = \frac{1}{N^*} \sum_{k=1}^{N^*} \frac{|s^i_k \cap s^j_k|}{\beta\,(\alpha a + (1-\alpha)\, b) + |s^i_k \cap s^j_k|}$$

where $|s^i_k \cap s^j_k|$ counts the number of shared concepts that are used as fillers for the dimension $d_k$ in the concepts $c_i$ and $c_j$; a and b are computed as $a = \min(|s^i_k - s^j_k|, |s^j_k - s^i_k|)$ and $b = \max(|s^i_k - s^j_k|, |s^j_k - s^i_k|)$; and $N^*$ counts the dimensions actually filled with at least one concept in both vectors. The Symmetrical Tversky's Ratio Model allows us to tune the balance between cardinality differences (through the parameter α), and between $|s^i_k \cap s^j_k|$ and $|s^i_k - s^j_k|$, $|s^j_k - s^i_k|$ (through the parameter β). The parameters α and β were set to .8 and .2 for the experimentation, based on a parameter tuning performed on the RG, MC and WSsim datasets (Rubenstein and Goodenough, 1965).

Table (contents not recoverable): similarity computation for an example pair. For each dimension (h), the values $|s^i_h|$ and $|s^j_h|$ report how many concepts were retrieved that fill the dimension; elements in the 'shared values' column detail how many concepts were found in common in both concept descriptions along the given dimension. The final similarity score obtained by the TTCS E is 0.63, against the 0.65 assigned in the gold standard.
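A minimal sketch of this computation is given below. It assumes the ratio form of the symmetrical Tversky model as we read it from Jimenez et al. (2013), i.e. shared fillers divided by β(αa + (1−α)b) plus the shared fillers, averaged over the dimensions filled in both vectors. The function names and the toy concepts `x` and `y` are hypothetical and do not come from the resource.

```python
from typing import Dict, Set

ConceptVector = Dict[str, Set[str]]


def tversky_dimension(si: Set[str], sj: Set[str],
                      alpha: float = 0.8, beta: float = 0.2) -> float:
    """Symmetrical Tversky ratio for the filler sets of one dimension."""
    common = len(si & sj)
    a = min(len(si - sj), len(sj - si))  # smaller set difference
    b = max(len(si - sj), len(sj - si))  # larger set difference
    denominator = beta * (alpha * a + (1 - alpha) * b) + common
    return common / denominator if denominator else 0.0


def similarity(ci: ConceptVector, cj: ConceptVector,
               alpha: float = 0.8, beta: float = 0.2) -> float:
    """Average the per-dimension ratio over dimensions filled in both vectors."""
    dims = [d for d in ci.keys() & cj.keys() if ci[d] and cj[d]]
    if not dims:  # N* = 0: no common filled dimension, nothing to compare
        return 0.0
    total = sum(tversky_dimension(ci[d], cj[d], alpha, beta) for d in dims)
    return total / len(dims)


# Hypothetical toy concepts (placeholder fillers).
x = {"ISA": {"a", "b"}, "USEDFOR": {"u"}, "MADEOF": set()}
y = {"ISA": {"a", "c", "d"}, "USEDFOR": {"u"}, "MADEOF": {"m"}}

print(similarity(x, y))
```

With α = .8 and β = .2, the ISA dimension above yields 1 / (0.2 · (0.8 · 1 + 0.2 · 2) + 1) = 1 / 1.24, the USEDFOR dimension yields 1, and MADEOF is skipped because it is empty in one of the vectors. Returning 0 when no dimension is filled in both vectors is a choice made for this sketch only.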

Evaluation
The dataset proposed for the experimentation included 500 word pairs; thanks to its mixture of abstract and concrete concepts and named entities, it can be considered a very complete and challenging test bed. Results have been computed through Pearson and Spearman correlations (respectively, r and ρ) and their harmonic mean; the latter measure ranges between 0.789 (obtained by the LUMINOSO team) and 0, as displayed in Figure 2. In particular, MERALI obtained 0.589 (r), 0.600 (ρ) and 0.594 (harmonic mean). We presently focus on this run of the system and disregard the other one, which attained substantially similar results stemming from a slightly different parameter setting.
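The three evaluation measures can be sketched in plain Python as follows. This is an illustrative implementation written for this section, not the official scorer of the task; the function names are hypothetical, and ties in the Spearman ranking are handled with average ranks, which is an assumption.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def harmonic_mean(r: float, rho: float) -> float:
    """Harmonic mean of the two correlation scores."""
    return 2 * r * rho / (r + rho)


# Scores reported for MERALI in the text.
print(round(harmonic_mean(0.589, 0.600), 3))
```

Applied to the reported correlations, the harmonic mean is 2 · 0.589 · 0.600 / 1.189 ≈ 0.594, in agreement with the figure given above.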
We dissected the dataset to identify our system's weaknesses, with the aim of improving both the conceptual similarity computation procedures and the lexical resource. We noticed that, out of the 500 overall word pairs, 405 involve concept comparisons, while in the remaining pairs at least one entity is at stake (namely, 45 entity-entity pairs and 50 entity-concept pairs).
Comparisons involving entities are somewhat different from those involving only concepts: for example, the cases where the semantic similarity is computed between a concept and an entity (e.g., in 'Darwin-evolution', 'Gauss-scientist', 'Siemens-electric train') pose additional problems with respect to cases in which two entities are considered (such as 'Juventus-Bayern Munich', 'Plato-Aristotle', and 'Alexander Fleming-Penicillin'). From an ontological perspective, individual entities act like instances, whilst concepts can be considered as classes: one thus wonders what comparing individuals and classes means. Moreover, according to the Conceptual Spaces framework, individuals can be thought of as points, while concepts are represented as regions over the multidimensional conceptual space. Comparisons between a class and an individual are intuitively harder in that they require i) finding the relations that link the individual and the class being examined; and ii) in a CSs perspective, comparing a point with a region.
Furthermore, from a cognitive perspective, it is difficult to follow the strategy adopted by humans in providing a score for pairs such as 'Zara-leggings' (gold standard similarity judgement: 1.67): directly comparing a manufacturer and a product is nearly unfeasible, since their features can hardly be compared. Having the answer justified would perhaps provide some information on the argumentative paths that can be followed to assess semantic similarity. One major risk, in this respect, is that the scores provided by human annotators refer to generic relatedness rather than to similarity. For example, let us consider the pair 'tail-Boeing 747' (gold standard similarity judgement: 1.92): although each Boeing 747 has a tail, the whole plane (holonym) cannot be conceptually similar to its tail (meronym), in the same way as a door is not similar to its knob.
We therefore re-ran the statistical tests to compute Spearman and Pearson correlations over the three subsets (entity-entity, entity-concept, concept-concept); the partial results are reported in Table 2.

Table 2: Spearman (ρ) and Pearson (r) correlations (and their harmonic mean) obtained by the MERALI system over the three subsets.
It turned out that, against our intuition, the MERALI system has better accuracy on word pairs including an entity; we therefore further examined the remaining subset (concept-concept), where we obtained poorer results. Here we notice that in many cases (22, that is over 5% of this subset) overly high scores were determined by the maximization implemented in the word-similarity computation: the semantic similarity between two terms is usually computed as the similarity of the closest senses underlying the given terms (Budanitsky and Hirst, 2006). An example of this sort of error is the pair 'apocalypse-fire' (gold standard similarity judgement: 1.25), where the MERALI system returned a value far higher than expected (namely, 3.85): fires can legitimately be interpreted as apocalyptic events, but only in a figurative way. Similar, though distinct, differences in score are observed when comparing two identical concepts: human annotators do not always assign the maximum (equality) score, sometimes unexpectedly, as for 'movie-film' (gold standard similarity judgement: 3.92) and 'multiple sclerosis-MS' (gold standard similarity judgement: 3.92). Out of 24 such cases, for 13 pairs (3% of this subset) we overestimated the semantic similarity. As regards fully different concept pairs (46, over 11%), in half of the cases we overestimated the similarity, perhaps due to an overly permissive enriching routine that sometimes accepts noisy concepts as dimension fillers.
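The closest-sense maximization mentioned above can be sketched as follows, to show why a figurative sense may dominate the final score. The function `word_similarity`, the sense labels and the toy scores are all hypothetical: they are not actual BabelNet senses or scores produced by the system.

```python
from typing import Callable, Optional, Sequence, Tuple


def word_similarity(
    senses_a: Sequence[str],
    senses_b: Sequence[str],
    concept_sim: Callable[[str, str], float],
) -> Tuple[float, Optional[Tuple[str, str]]]:
    """Return the highest concept-level score over all sense pairs,
    together with the pair of senses that produced it."""
    best_score, best_pair = 0.0, None
    for sense_a in senses_a:
        for sense_b in senses_b:
            score = concept_sim(sense_a, sense_b)
            if best_pair is None or score > best_score:
                best_score, best_pair = score, (sense_a, sense_b)
    return best_score, best_pair


# Hypothetical senses and scores, for illustration only.
toy_scores = {
    ("apocalypse#event", "fire#combustion"): 0.2,
    ("apocalypse#event", "fire#catastrophe"): 0.9,
}


def toy_sim(sense_a: str, sense_b: str) -> float:
    return toy_scores.get((sense_a, sense_b), 0.0)


print(word_similarity(
    ["apocalypse#event"],
    ["fire#combustion", "fire#catastrophe"],
    toy_sim,
))
```

Because only the maximum is retained, a single marginal or figurative sense with a high concept-level score is enough to raise the word-level score, regardless of how dissimilar the dominant senses are.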
However, the main issue of the first version of the MERALI system is that the overall amount of information available to the system is often not enough to fully assess the semantic similarity between concepts. Sometimes the concepts themselves are missing from the TTCS E, possibly because they are lacking in (at least one of) the resources upon which the TTCS E is built. Further difficulties stemmed from insufficient information about the concepts at stake: this was observed, e.g., when both concepts were found, but no common dimension was filled. This sort of difficulty shows that the coverage of the resource still needs to be enhanced, especially by improving the extraction phase, so as to add further concepts per dimension and to fill more dimensions.

Conclusions
We have illustrated the MERALI system, which relies on a novel resource, the TTCS E. The underlying representation is compatible with the Conceptual Spaces framework and aims at putting together encyclopedic and common-sense knowledge. The results of the MERALI system have been illustrated and discussed. The experimentation clearly showed that there is room for improving the system along two main axes: dimensions must be filled with further information, and the quality of the extracted information should be improved. The computation of the similarity can also be refined by testing further heuristics, so as to reduce the cases of overestimation of semantic similarity. All the mentioned aspects will be addressed in our future work.