ELiRF-UPV at SemEval-2018 Task 10: Capturing Discriminative Attributes with Knowledge Graphs and Wikipedia

This paper describes the participation of the ELiRF-UPV team in Task 10, Capturing Discriminative Attributes, of SemEval-2018. Our best approach uses ConceptNet, Wikipedia and NumberBatch embeddings to establish relationships between concepts and attributes. This system achieves competitive results in the official evaluation.


Introduction
Capturing Discriminative Attributes, Task 10 of SemEval-2018 (Krebs et al., 2018), addresses semantic difference detection. The goal of the task is to predict whether a word is a discriminative attribute between two other words. It is a binary classification task: given a triple such as (apple, banana, red), the system must determine whether it exemplifies a semantic difference. A semantic difference is a ternary relation between two concepts, for instance (apple, banana), and a discriminative feature (red) that characterizes the first concept but not the second.
As Task 10 concerns semantic relations among words, knowledge graphs seem the most appropriate resources to use. An interesting knowledge resource that we used for this task is ConceptNet (Speer et al., 2016), a knowledge graph that connects words and phrases of natural language using labeled edges. It was designed to represent general knowledge involved in natural language and can be used in combination with other resources. The combination of ConceptNet with distributed representations such as Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) is known as NumberBatch embeddings (Speer et al., 2016).
ConceptNet specifies a total of 36 relations, such as IsA (A banana is a dessert), UsedFor (A net is used for catching fish) or FormOf ("Leaves" is a form of the word "leaf"), each intended to represent a relationship independently of the language or the source of the terms it connects.
In this work, we propose five knowledge-based systems and one additional machine learning system based on Siamese networks. We used ConceptNet to determine whether each input concept and the input attribute are related through a relation edge or path. When there is a relationship between the first concept and the attribute and no relationship between the second concept and the attribute, the answer is 1; otherwise, the answer is 0. However, there are cases in which ConceptNet does not provide enough information to make a decision. For those cases, we implemented a system that seeks the information in Wikipedia articles by using distances between NumberBatch embeddings.

Resources and Preprocessing
As stated in Section 1, ConceptNet 5 is used to find the relationships among concepts and attributes. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 4.0) from http://conceptnet.io. In addition to ConceptNet, we use two further resources.
On the one hand, in order to use more information about each concept, we used Wikipedia articles. We retrieve the most related article for each concept, following the recommendation of the Wikipedia disambiguation system. The Wikipedia articles were preprocessed. First, we removed non-relevant sections such as "See Also", "References" and "External Links", which link to other resources. After that, we normalized tokens such as numbers and URLs, e.g. "678.2" → "number" and "https://en.wikipedia.org" → "url", and we tokenized the articles.
On the other hand, we used distributed representations of words, specifically NumberBatch embeddings (Speer et al., 2016).
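The Wikipedia preprocessing described above can be sketched as follows. This is only an illustrative sketch: the regular expressions, the function names and the (title, text) representation of article sections are assumptions, not the code used for the submission.

```python
import re

# Section titles removed before further processing (the three named in the text).
DROPPED_SECTIONS = {"see also", "references", "external links"}

# Assumed patterns; the exact normalization rules are not given in the paper.
URL_RE = re.compile(r"https?://\S+")
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def drop_sections(sections):
    """Keep (title, text) pairs whose title is not in DROPPED_SECTIONS."""
    return [(title, text) for title, text in sections
            if title.strip().lower() not in DROPPED_SECTIONS]


def normalize_and_tokenize(text):
    """Replace URLs and numbers with placeholder tokens, then tokenize."""
    text = URL_RE.sub(" url ", text)        # URLs first, so digits inside them survive
    text = NUMBER_RE.sub(" number ", text)
    return TOKEN_RE.findall(text.lower())


print(normalize_and_tokenize("It weighs 678.2 g, see https://en.wikipedia.org"))
```

URLs are replaced before numbers so that digits inside a URL are not normalized separately.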

System Description
We tested several approaches to address this task, most of them knowledge-based. Let (c_1, c_2, at) be a triple where c_1 and c_2 are concepts and at is an attribute. The goal of the task is to define a function d that decides whether at is a discriminative feature of c_1 (value 1) or not (value 0), that is, whether at characterizes c_1 but not c_2.
Although the training and development sets were not very large, we wanted to test the performance of Machine Learning (ML) approaches for this task. We selected a Siamese neural network (Bromley et al., 1993) because this kind of system is suitable for similar tasks such as knowledge base completion (Yang et al., 2014). The system works as follows. First, the inputs to the network are the NumberBatch embeddings of c_1, c_2 and at. A shared Multilayer Perceptron is then applied to extract a representation of each term: f(c_1), f(c_2) and f(at). With these representations, we compute the differences s_1 = f(c_1) - f(at) and s_2 = f(c_2) - f(at), with the aim of establishing relationships between each concept and the attribute. Finally, we concatenate s_1 and s_2 and apply a fully-connected layer with softmax activation to carry out the classification, i.e. d_Sia = 1 if at is discriminative for c_1 and not for c_2, and d_Sia = 0 otherwise.
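The forward pass of such a Siamese architecture can be sketched in plain Python as follows. The single hidden layer, the tanh activation, the layer sizes and the random untrained weights are all assumptions made for illustration; the paper does not specify them, and training is omitted.

```python
import math
import random


def dense(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SiameseSketch:
    """Forward pass only; weights are random, untrained placeholders."""

    def __init__(self, emb_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Shared encoder f: the same weights are applied to c1, c2 and at.
        self.w_f = random_matrix(hidden_dim, emb_dim, rng)
        self.b_f = [0.0] * hidden_dim
        # Output layer over the concatenation [s1; s2], two classes.
        self.w_out = random_matrix(2, 2 * hidden_dim, rng)
        self.b_out = [0.0, 0.0]

    def f(self, x):
        return [math.tanh(v) for v in dense(x, self.w_f, self.b_f)]

    def predict_proba(self, c1, c2, at):
        f_c1, f_c2, f_at = self.f(c1), self.f(c2), self.f(at)
        s1 = [a - b for a, b in zip(f_c1, f_at)]  # s1 = f(c1) - f(at)
        s2 = [a - b for a, b in zip(f_c2, f_at)]  # s2 = f(c2) - f(at)
        return softmax(dense(s1 + s2, self.w_out, self.b_out))

    def d_sia(self, c1, c2, at):
        """1 if the 'discriminative' class has the higher probability."""
        p = self.predict_proba(c1, c2, at)
        return 1 if p[1] > p[0] else 0
```

Because the encoder is shared, identical inputs for both concepts and the attribute give s1 = s2 = 0, so the untrained sketch outputs a uniform distribution in that case.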
The rest of the systems were knowledge-based. As a first knowledge-based approach, we use the relationships between each concept and the attribute to determine whether the attribute is discriminant. To do this, we use the ConceptNet relations. Note that ConceptNet contains both positive relations (IsA, FormOf, DerivedFrom, SimilarTo, ...) and negative relations (DistinctFrom, NotHasProperty). In our proposals, we only consider positive edges, that is, those that denote positive relationships.
We look for positive edges between each concept and the attribute. If an edge between c_1 and at exists but there is no edge between c_2 and at, we assume that at is discriminant. In other words, at is discriminant if it is reachable from c_1 but not from c_2. The function d_CN that determines whether the attribute at is discriminant for concepts c_1 and c_2 can therefore be defined as shown in Equation 1.
\[ d_{CN}(c_1, c_2, at) = \begin{cases} 1 & \text{if } at \in R(c_1) \wedge at \notin R(c_2) \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
where R(c_1) and R(c_2) are the sets of nodes reachable from c_1 and c_2, respectively, using positive edges.
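A minimal sketch of this reachability rule is given below. The toy edge list is hypothetical and is not real ConceptNet data. The sketch also assumes that edges are followed from source to target and that reachability is limited to `max_hops` steps (one by default); neither detail is fixed by the text.

```python
from collections import deque

# Hypothetical toy graph: (source, relation, target) triples.
EDGES = [
    ("apple", "HasProperty", "red"),
    ("apple", "IsA", "fruit"),
    ("banana", "IsA", "fruit"),
    ("banana", "HasProperty", "yellow"),
    ("banana", "NotHasProperty", "red"),
]
NEGATIVE_RELATIONS = {"DistinctFrom", "NotHasProperty"}


def reachable(concept, edges, max_hops=1):
    """R(c): nodes reachable from `concept` via positive edges within `max_hops`."""
    neighbours = {}
    for src, rel, dst in edges:
        if rel not in NEGATIVE_RELATIONS:  # negative edges are ignored
            neighbours.setdefault(src, set()).add(dst)
    seen, frontier = set(), deque([(concept, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def d_cn(c1, c2, at, edges=EDGES):
    """1 if `at` is reachable from c1 but not from c2, else 0."""
    return int(at in reachable(c1, edges) and at not in reachable(c2, edges))
```

With this toy graph, the triple (apple, banana, red) is labeled 1, whereas (apple, banana, fruit) is labeled 0 because both concepts reach the attribute. A triple whose attribute is absent from the graph is also labeled 0, which is the low-coverage case.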
The main problem of this proposal is its low coverage. In many cases, there is no edge between either of the two concepts and the attribute, and therefore it is decided that the attribute is not discriminant. In order to increase the coverage of d_CN, we extend the set of nodes reachable from a concept with those nodes reachable from other concepts closely related to the original concept. We consider as related concepts those that are linked by the FormOf relation in ConceptNet. Considering the extended sets of reachable nodes, we can redefine the function d_CN of Equation 1 as the function d_CN2 shown in Equation 2.
\[ d_{CN2}(c_1, c_2, at) = \begin{cases} 1 & \text{if } at \in R_2(c_1) \wedge at \notin R_2(c_2) \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
where R_2(c) is the set of nodes reachable from c or from any concept closely related to c.
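The extended reachable set can be sketched as follows. The toy edges are hypothetical, only direct edges are considered, and FormOf links are followed in both directions; the last point is an assumption, since the text does not state a direction.

```python
# Hypothetical toy edges; real ConceptNet data would be loaded instead.
EDGES = [
    ("leaves", "FormOf", "leaf"),
    ("leaf", "HasProperty", "green"),
    ("branch", "PartOf", "tree"),
]
NEGATIVE_RELATIONS = {"DistinctFrom", "NotHasProperty"}


def direct_reachable(concept, edges):
    """R(c): targets of positive edges leaving `concept`."""
    return {dst for src, rel, dst in edges
            if src == concept and rel not in NEGATIVE_RELATIONS}


def related_by_form_of(concept, edges):
    """Concepts linked to `concept` by FormOf, in either direction (assumed)."""
    related = set()
    for src, rel, dst in edges:
        if rel == "FormOf":
            if src == concept:
                related.add(dst)
            elif dst == concept:
                related.add(src)
    return related


def extended_reachable(concept, edges):
    """R_2(c): R(c) plus R(c') for every c' related to c by FormOf."""
    nodes = set(direct_reachable(concept, edges))
    for other in related_by_form_of(concept, edges):
        nodes |= direct_reachable(other, edges)
    return nodes


def d_cn2(c1, c2, at, edges=EDGES):
    """1 if `at` is in R_2(c1) but not in R_2(c2), else 0."""
    return int(at in extended_reachable(c1, edges)
               and at not in extended_reachable(c2, edges))
```

In this toy graph, "green" is not directly linked to "leaves", but it becomes reachable through the related form "leaf", so (leaves, branch, green) is labeled 1 under the extended rule.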
Nevertheless, this approach still has coverage problems. To mitigate them, we proposed two new approaches based on Wikipedia. The main idea is simple: we search for the attribute at in the Wikipedia articles of concepts c_1 and c_2, denoted doc(c_1) and doc(c_2), and decide whether at is discriminant based on the result of this search. The function d_We can be defined as shown in Equation 3.
\[ d_{We}(c_1, c_2, at) = \begin{cases} 1 & \text{if } at \in doc(c_1) \wedge at \notin doc(c_2) \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
Further, we can relax the exact match criterion. Concretely, using the NumberBatch distributed representations of words (h), we can compute similarities between at and all the tokens of doc(c_1) and doc(c_2) in order to decide which concept is the closest to at.
We define a threshold ε to ensure that there is enough difference between the maximum similarities max_{w∈doc(c_1)} cos(h(w), h(at)) and max_{w∈doc(c_2)} cos(h(w), h(at)). Using the development set, the value of ε was fixed to 0.2. This new decision function d_Wt is presented in Equation 4.
\[ d_{Wt}(c_1, c_2, at) = \begin{cases} 1 & \text{if } \max_{w \in doc(c_1)} \cos(h(w), h(at)) - \max_{w \in doc(c_2)} \cos(h(w), h(at)) \geq \epsilon \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
Finally, in order to explore the joint behavior of the knowledge-based approaches, we propose the combination of d_CN2 and d_Wt. When there is a relationship in ConceptNet between at and any concept (it does not matter whether it is c_1, c_2 or both), we decide whether at is discriminant using d_CN2. If there is no such relationship, we fall back on d_Wt. Thus, we only use d_CN2 when ConceptNet actually provides information. The definition of this new decision function d_CN2+Wt is shown in Equation 5.
\[ d_{CN2+Wt}(c_1, c_2, at) = \begin{cases} d_{CN2}(c_1, c_2, at) & \text{if } at \in R_2(c_1) \cup R_2(c_2) \\ d_{Wt}(c_1, c_2, at) & \text{otherwise} \end{cases} \tag{5} \]
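The Wikipedia-based rules and their combination with ConceptNet can be sketched as follows. The two-dimensional vectors are hypothetical stand-ins for NumberBatch embeddings, documents are plain token lists, and the extended reachable sets are passed in as precomputed Python sets. Treating out-of-vocabulary tokens as similarity 0 is an assumption, as is the parameter name `eps` for the threshold.

```python
import math

# Hypothetical 2-d vectors standing in for NumberBatch embeddings h(w).
H = {
    "red": (1.0, 0.0),
    "crimson": (0.9, 0.1),
    "yellow": (0.0, 1.0),
    "fruit": (0.5, 0.5),
}


def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_sim(doc, at, h=H):
    """Highest cosine between `at` and any in-vocabulary token of `doc`."""
    if at not in h:
        return 0.0
    return max((cos(h[w], h[at]) for w in doc if w in h), default=0.0)


def d_we(doc1, doc2, at):
    """Exact match: `at` occurs in doc(c1) but not in doc(c2)."""
    return int(at in doc1 and at not in doc2)


def d_wt(doc1, doc2, at, eps=0.2, h=H):
    """Thresholded similarity: doc(c1) must beat doc(c2) by at least eps."""
    return int(max_sim(doc1, at, h) - max_sim(doc2, at, h) >= eps)


def d_cn2_wt(at, r2_c1, r2_c2, doc1, doc2, eps=0.2, h=H):
    """Use ConceptNet when either concept reaches `at`; otherwise fall back to d_wt."""
    if at in r2_c1 or at in r2_c2:
        return int(at in r2_c1 and at not in r2_c2)
    return d_wt(doc1, doc2, at, eps, h)


doc_apple = ["crimson", "fruit"]
doc_banana = ["yellow", "fruit"]
print(d_we(doc_apple, doc_banana, "red"), d_wt(doc_apple, doc_banana, "red"))
```

In this toy example the exact-match rule fails because "red" does not occur literally in either document, while the relaxed rule succeeds through the near-synonym "crimson".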

Experimental Results
In order to validate the proposed approaches and to select the one with the best performance for the competition, we evaluated them on the development set provided by the organizers. The results obtained are shown in Table 4. As can be seen in Table 4, the knowledge-based systems obtain between 1.31 and 11.13 points of macro F_1 more than the Siamese network, which uses only NumberBatch embeddings. The approaches that use only ConceptNet (d_CN and d_CN2) achieved results as good as those based on Wikipedia (d_We and d_Wt). Note that the coverage of d_CN and d_CN2 is very low: 48.71% for d_CN and 56.61% for d_CN2. In cases where there are no links in ConceptNet (more than fifty percent of the time for d_CN), it is decided that the attribute is not discriminant. Moreover, the more knowledge is incorporated into the system, the better the results. We achieved the best results using the combination of the ConceptNet graph and Wikipedia articles (d_CN2+Wt), reaching 68.20 macro F_1.
Regarding the evaluation on the test set, we used d_CN2+Wt as the decision function for the SemEval competition. Our system achieved competitive results: 69.00 macro F_1, 6 points below the best system, which obtained 75.00 macro F_1. Our proposal ranked 5th out of a total of 26 participating teams. Several results from the official evaluation are shown in Table 4.
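For reference, macro F_1 over the two classes can be computed as in the sketch below. It uses the standard definition (the unweighted mean of the per-class F_1 scores); the function names and the toy labels are illustrative, and this is not the official scoring script.

```python
def f1_for_class(gold, pred, label):
    """F1 of one class, computed from true/false positives and false negatives."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, label) for label in labels) / len(labels)


# Toy labels, not task data.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
print(round(100 * macro_f1(gold, pred), 2))
```

Because both classes weigh equally, a system that answers 0 whenever it lacks evidence is penalized through the recall of class 1.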

Analysis of Results
After the evaluation, we began an analysis of the behavior of our system. Our goal is to detect the types of attributes for which the system performs worst, so that we can add specific knowledge resources to deal with them. Although we have not completed this analysis, one group of attributes that caught our attention was colors. While the overall error rate of our system was about 30%, the error rate on samples with color-related attributes was about 50%. More details about the behavior of our system can be seen in Table 5. It may therefore be possible to improve the system by treating color attributes in a specific way, for instance by using image resources such as ImageNet (Deng et al., 2009) to compute the color palette of images of each concept and compare it with the color attribute.

Conclusions and Future Work
In this work, we proposed a knowledge-based system for the discriminative attributes task. This system is based on the combination of two knowledge resources: a knowledge graph with semantic links such as ConceptNet and a general resource such as Wikipedia.
With this system, we achieved good results on the development set compared to a supervised learning approach based on Siamese neural networks. That is, a combination of knowledge-based approaches produces significant improvements over the supervised approach. Regarding the evaluation on the test set, we obtained competitive results.
As future work, we propose extending our system with more knowledge resources such as DBpedia (Lehmann et al., 2015), WordNet (Fellbaum, 1998) or Microsoft Concept Graph (Wang et al., 2015). Moreover, it could be interesting to consider the sections of Wikipedia with links to other resources in order to extract more information.
Finally, we propose the incorporation of knowledge resources into Deep Learning systems, beyond using only distributed representations of words. This would give us end-to-end systems capable of learning more complex decision algorithms. Concretely, Siamese neural networks seem promising for this work due to their good results in related fields such as knowledge base completion (Yang et al., 2014).