ELiRF-UPV at SemEval-2017 Task 7: Pun Detection and Interpretation

This paper describes the participation of the ELiRF-UPV team in Task 7 (subtask 2: homographic pun location, and subtask 3: homographic pun interpretation) of SemEval-2017. Our approach is based on the use of word embeddings to find related words in a sentence and on a version of the Lesk algorithm to establish relationships between synsets. The results obtained are in line with those obtained by the other participants and they encourage us to continue working on this problem.


Introduction
A pun is a figure of speech that consists of a deliberate confusion of similar words or phrases for rhetorical effect, whether humorous or serious. Giorgadze (2014) analyzed, from a linguistic point of view, the pun as one of the categories of wordplay and its manifestation in one-liner jokes in English. A pun is a way of using the characteristics of the language to make a word, a sentence or a discourse carry two or more different meanings. Therefore, the humorous or other effects created by puns depend on the ambiguities of these words.
Pun detection is closely related to the Word Sense Disambiguation (WSD) problem, but in this case we need to select two senses of the pun (Miller and Gurevych, 2015; Miller and Turković, 2016).
The interpretation of puns has been a subject of study in theoretical linguistics, and has led to a small but growing body of research in computational linguistics. In Task 7 of the SemEval 2017 competition, the organizers proposed three challenges (subtasks): pun detection, pun location and pun interpretation (Miller et al., 2017).
In this work, we present a proposal for two subtasks: homographic pun location (subtask 2) and homographic pun interpretation (subtask 3).
Our proposal for both subtasks rests on the hypothesis that the two senses of the pun in the sentence are made possible by the coexistence of the pun with other words in that sentence that are semantically close to it. According to this hypothesis, our method for pun location consists of finding the most semantically related pair of words in the sentence. In addition, our method for pun interpretation is based on detecting words in the sentence, other than the pun, that help to find the two senses of the pun. The selection of these words is also based on their semantic proximity to the pun.

Subtask 2: Pun location process
Pun location consists of identifying which word is the pun, given a sentence that contains a pun. Our proposal is based on two hypotheses: i) the pun is one of the words in the most semantically related pair of words in the sentence; ii) the pun should be at the end of the sentence.
Our approach to the pun location process follows Algorithm 1. As a preliminary step, the sentence is processed in order to eliminate punctuation marks and stop words, and to convert uppercase to lowercase. This removes the tokens that carry no semantic content, leaving a set of semantically relevant tokens. Each token is represented by its embedding, obtained from a pretrained word embedding model (Mikolov et al., 2013) trained on part of the Google News dataset (a vocabulary of 3 million words and phrases). The embedding dimension was fixed to 300.
For every pair of tokens, the cosine distance between their embedding representations is calculated. The pairs are ranked according to this distance and the pair with the smallest distance is selected. Finally, the pun is selected by applying two heuristics:
• First, we assume that consecutive words in a sentence are semantically close, but the words that help the pun to be interpretable are not placed next to the pun. Therefore, we do not consider pairs that correspond to consecutive words in the sentence.
• Second, we assume that the words that help the pun to be interpretable are placed before the pun in the sentence. Therefore, we select as the pun the word in the pair that is closer to the end of the sentence.
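The ranking and the two heuristics above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the three-dimensional toy vectors and the example sentence are hypothetical, and "consecutive" is assumed to mean adjacent positions in the original sentence. In practice the vectors would come from the pretrained 300-dimensional model.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def locate_pun(tokens, embeddings):
    """Guess the pun among `tokens`.

    tokens: list of (position, word) pairs left after removing
            punctuation and stop words; positions refer to the
            original sentence (an assumption of this sketch).
    embeddings: dict mapping a word to its vector.
    """
    best_pair, best_dist = None, math.inf
    for (i, wi), (j, wj) in combinations(tokens, 2):
        if abs(i - j) == 1:
            continue  # heuristic 1: skip consecutive words
        if wi not in embeddings or wj not in embeddings:
            continue  # out-of-vocabulary words cannot be compared
        dist = cosine_distance(embeddings[wi], embeddings[wj])
        if dist < best_dist:
            best_pair, best_dist = ((i, wi), (j, wj)), dist
    if best_pair is None:
        return None
    # heuristic 2: choose the word closer to the end of the sentence
    return max(best_pair, key=lambda item: item[0])[1]


# Hypothetical toy data for "I used to be a banker but I lost interest"
toy_embeddings = {
    "used": [0.0, 0.2, 1.0],
    "banker": [1.0, 0.1, 0.0],
    "lost": [0.1, 1.0, 0.0],
    "interest": [0.9, 0.3, 0.1],
}
toy_tokens = [(1, "used"), (5, "banker"), (8, "lost"), (9, "interest")]
guess = locate_pun(toy_tokens, toy_embeddings)
print(guess)
```

With these toy vectors the closest non-consecutive pair is (banker, interest), and the second heuristic picks the later word of that pair.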
Algorithm 1: Selection of the pun of a sentence, task 2
Input: s, the sentence that contains a pun
Result: w_k, the word in s that we guess is the pun

Table 1 shows the results for subtask 2 (homographic pun location). Although our results leave ample room for improvement (an F1 of 0.4462), they are in line with those obtained by other participants. We achieved fourth place in the competition; the best result was an F1 of 0.6631.
In order to test the two heuristics applied in the pun selection, we additionally computed some statistics comparing our results with those of the gold standard.
We assumed that, in most cases, the words that help the pun to be interpretable are placed before the pun in the sentence. The number of token pairs selected by our approach that contain the pun is 767. In 702 of these pairs (91.5%), the pun was the second component of the pair, and in only 65 (8.5%) was the pun the first component. These percentages support this heuristic for subtask 2.
We also assumed that the words that help the pun to be interpretable are not placed next to the pun; therefore, we did not consider consecutive words as candidates. If this heuristic is not applied, the number of token pairs selected by our approach that contain the pun is 672, fewer than the 767 pairs obtained when the heuristic is applied. In 580 of these 672 pairs, the pun is the second component and is therefore the word selected by our approach; in the remaining 92 pairs, the pun is the first component, more than the 65 such pairs obtained when the heuristic is applied.
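As a quick check of the figures above, the shares can be recomputed from the reported counts. The sketch assumes the reading that, without the adjacency heuristic, the pun is the second component in 580 pairs and the first in 92; the helper name is hypothetical.

```python
def share(part, total):
    """Percentage of `part` in `total`, rounded to one decimal."""
    return round(100.0 * part / total, 1)


# With the adjacency heuristic: 767 selected pairs contain the pun.
print(share(702, 767), share(65, 767))

# Without the adjacency heuristic: 672 selected pairs contain the pun.
print(share(580, 672), share(92, 672))
```

Under this reading, dropping the adjacency heuristic both reduces the number of selected pairs that contain the pun and lowers the share of those pairs in which the pun is the later word.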

Subtask 3: Pun interpretation process
The process of pun interpretation is described by Algorithm 2. It consists of several steps:
• Selection of the two words semantically closest to the pun.
As in subtask 2 (Section 2), the sentence is processed in order to eliminate punctuation marks and stop words, and uppercase is converted to lowercase.
Given the set of tokens, a sorted list of pairs of different tokens is generated, where the first component of each pair is the pun w_p and the second component is any other token in the sentence that is not consecutive to the pun. For each pair of tokens, the cosine distance between their embedding representations is calculated.
We selected the first two pairs in the above sorted list, (w_p, w_1) and (w_p, w_2); that is, we selected the two words in the sentence most closely related to the pun from a semantic point of view. The cosine distance of (w_p, w_1) is the smallest and the cosine distance of (w_p, w_2) is the next smallest.
In Algorithm 2, this step corresponds to the get_closest_words function.

Algorithm 2: Selection of the two synsets of the pun in a sentence, task 3
Input: s, the sentence that contains a pun
       w_p, the word in the sentence, at position p, that is the pun
Result: (sy_1, sy_2), the two synsets of the pun w_p in the sentence s

Function get_closest_words(s, w_p)
    t ← remove_stopwords(tokenize(s))

Function synset_similarity(sy_1, sy_2)
    c_1 ← get_context(sy_1)
    c_2 ← get_context(sy_2)
    return |c_1 ∩ c_2|

begin
    w_i, w_j ← get_closest_words(s, w_p)
    sy_1, b ← null, −∞
    foreach sy_p ∈ synsets(w_p) do
        foreach sy_i ∈ synsets(w_i) do
            s ← synset_similarity(sy_p, sy_i)
            if s > b then
                sy_1, b ← sy_p, s
    sy_2, b ← null, −∞
    foreach sy_p ∈ synsets(w_p) | sy_p ≠ sy_1 do
        foreach sy_j ∈ synsets(w_j) do
            s ← synset_similarity(sy_p, sy_j)
            if s > b then
                sy_2, b ← sy_p, s
    return (sy_1, sy_2)
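The selection of the two closest words can be sketched as below. This is an illustrative reading of the step rather than the authors' code: the signature, the toy vectors and the example sentence are hypothetical, and adjacency is again assumed to refer to positions in the original sentence.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def get_closest_words(tokens, pun_position, embeddings, k=2):
    """Return the k words closest to the pun, nearest first.

    tokens: list of (position, word) pairs after preprocessing.
    pun_position: position of the pun in the original sentence.
    """
    pun = dict(tokens)[pun_position]
    ranked = []
    for position, word in tokens:
        if position == pun_position or word == pun:
            continue  # pairs are built from different tokens
        if abs(position - pun_position) == 1:
            continue  # words consecutive to the pun are not considered
        if word not in embeddings:
            continue  # out-of-vocabulary words cannot be ranked
        dist = cosine_distance(embeddings[pun], embeddings[word])
        ranked.append((dist, word))
    ranked.sort()
    return [word for _, word in ranked[:k]]


# Hypothetical toy data for "I used to be a banker but I lost interest"
toy_embeddings = {
    "used": [0.0, 0.2, 1.0],
    "banker": [1.0, 0.1, 0.0],
    "lost": [0.1, 1.0, 0.0],
    "interest": [0.9, 0.3, 0.1],
}
toy_tokens = [(1, "used"), (5, "banker"), (8, "lost"), (9, "interest")]
closest = get_closest_words(toy_tokens, 9, toy_embeddings)
print(closest)
```

In this toy example "lost" is skipped because it is adjacent to the pun, so the two returned words are "banker" and then "used".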
• Generation of a bag-of-words per synset.
For each synset of the pun (w_p) and for each synset of the two closest words (w_1, w_2), we obtain a bag-of-words that includes: i) all the lemmas in the gloss of the synset; ii) the name of the synset itself; and iii) the lemmas in all the example sentences. Before the lemmas are extracted, the sentences are converted to lowercase and punctuation marks and stop words are removed.
This step corresponds to the get_context function in Algorithm 2.
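A minimal sketch of this step is given below. It uses a hypothetical ToySynset record in place of a real WordNet synset, a small illustrative stop-word list, and plain lowercase tokens instead of true lemmas; all three are simplifications introduced here, not details from the paper.

```python
import string
from dataclasses import dataclass, field

# Illustrative stop-word list (an assumption; the paper does not give one).
STOP_WORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "to", "for", "or", "and", "is"}
)


@dataclass
class ToySynset:
    """Hypothetical stand-in for a WordNet synset."""
    name: str
    gloss: str
    examples: list = field(default_factory=list)


def normalize(text):
    """Lowercase, strip punctuation and drop stop words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]


def get_context(synset):
    """Build the bag-of-words of a synset from its gloss, name and examples."""
    bag = set(normalize(synset.gloss))
    bag.add(synset.name.lower())
    for example in synset.examples:
        bag.update(normalize(example))
    return bag


toy = ToySynset(
    name="interest",
    gloss="a fixed charge for borrowing money",
    examples=["How much interest do you pay on your mortgage?"],
)
bag = get_context(toy)
print(sorted(bag))
```

The bag is kept as a set because only the overlap between bags is needed afterwards; a lemmatizer could replace the plain tokenization without changing the rest of the sketch.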
The final goal of the subtask is to select one pair of synsets (sy_1, sy_2) of the pun that represent its two different meanings in the sentence. Our hypothesis is that one synset of the pair (sy_1) is related to one synset of w_1 and the other synset of the pair (sy_2) is related to one synset of w_2. As in the Lesk algorithm (Lesk, 1986), we use the overlap between the bags-of-words of two synsets as the measure of similarity between them.
In this way, we select the first synset of the pun (sy_1) as the one that maximizes the overlap with one of the synsets of w_1. After that, we select another synset of the pun (sy_2, with sy_2 ≠ sy_1) that maximizes the overlap with one of the synsets of w_2.
Table 2 shows the results of subtask 3 (homographic pun interpretation). Our results are low, but they are in line with the results of the rest of the participants. We achieved an F1 of 0.0996 (third place); the best result was 0.1557.
As in subtask 2, we calculated some statistics comparing our results with those of the gold standard. Both synsets were correct in 127 of the 1252 analyzed sentences; in addition, there were 255 sentences for which one synset was correct.
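The Lesk-style selection of the two pun synsets described in this section can be sketched as follows. The bags-of-words are assumed to be precomputed sets, and the sense labels and toy bags are hypothetical, not WordNet identifiers. Ties are resolved in favour of the first candidate seen, which the paper does not specify.

```python
def overlap(bag_a, bag_b):
    """Lesk-style similarity: number of words shared by two bags."""
    return len(bag_a & bag_b)


def best_pun_sense(pun_bags, word_bags, exclude=None):
    """Pick the pun sense whose bag overlaps most with any sense of a word.

    pun_bags, word_bags: dicts mapping a sense label to its bag (a set).
    exclude: a pun sense that must not be selected again.
    """
    best, best_score = None, -1
    for sense, bag in pun_bags.items():
        if sense == exclude:
            continue
        for other_bag in word_bags.values():
            score = overlap(bag, other_bag)
            if score > best_score:
                best, best_score = sense, score
    return best


def interpret_pun(pun_bags, w1_bags, w2_bags):
    """Return (sy_1, sy_2): the pun senses tied to w_1 and to w_2."""
    sy_1 = best_pun_sense(pun_bags, w1_bags)
    sy_2 = best_pun_sense(pun_bags, w2_bags, exclude=sy_1)
    return sy_1, sy_2


# Hypothetical bags for the pun "interest" and two context words.
pun_bags = {
    "interest.money": {"charge", "borrowing", "money", "loan"},
    "interest.curiosity": {"sense", "concern", "curiosity", "attention"},
    "interest.stake": {"right", "share", "legal", "business"},
}
w1_bags = {"banker.finance": {"financier", "bank", "money", "loan"}}
w2_bags = {"lost.attention": {"no", "longer", "attention", "concern"}}

senses = interpret_pun(pun_bags, w1_bags, w2_bags)
print(senses)
```

Excluding sy_1 in the second pass enforces sy_2 ≠ sy_1, so the two returned senses always differ when the pun has at least two candidate senses.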

Conclusions
In this work, we have presented our participation in Task 7 (subtask 2: homographic pun location, and subtask 3: homographic pun interpretation) of SemEval-2017. Our approach is based on the use of word embeddings to find related words in a sentence and on a version of the Lesk algorithm to establish relationships between synsets. We achieved fourth place in subtask 2 (homographic pun location) and third place in subtask 3 (homographic pun interpretation).
The results obtained are in line with those obtained by the other participants and they encourage us to continue working on this problem.
As future work, we plan to adapt state-of-the-art WSD techniques to tackle the pun interpretation problem.