EICA Team at SemEval-2018 Task 2: Semantic and Metadata-based Features for Multilingual Emoji Prediction

The advent of social media has brought along a novel way of communication where meaning is composed by combining short text messages and visual enhancements, the so-called emojis. We describe our system for participating in SemEval-2018 Task 2 on Multilingual Emoji Prediction. Our approach relies on combining a rich set of features of two types: semantic and metadata. The most important type turned out to be the metadata features. In Subtask 1, Emoji Prediction in English, our primary submission obtains a MAP of 16.45, a Precision of 31.557, a Recall of 16.771, and an Accuracy of 30.992.


Introduction
Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message (Barbieri et al., 2017). Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. Barbieri et al. (2016) compare the meaning and usage of emojis across two Spanish cities: Barcelona and Madrid. Ljubešić et al. (2017) present a set of experiments and analyses on predicting the gender of Twitter users based on language-independent features extracted either from the text or the metadata of users' tweets. Miller et al. (2016) performed an evaluation asking human annotators the meaning of emojis and the sentiment they evoke. People do not always have the same understanding of emojis; indeed, there seem to exist multiple interpretations of their meaning beyond their designers' intent or the physical object they evoke. Their main conclusion was that emojis can lead to misunderstandings. The ambiguity of emojis raises an interesting question in human-computer interaction: how can we teach an artificial agent to correctly interpret and recognise emoji use in spontaneous conversation? The main motivation of our research is that an artificial intelligence system that is able to predict emojis could contribute to better natural language understanding (Novak et al., 2015) and thus to different natural language processing tasks, such as generating emoji-enriched social media content, enhancing emotion/sentiment analysis systems, and improving retrieval of social network material.

Features
We use several semantic features and metadata features to represent each tweet.

Semantic Features
Semantic features represent the basic conceptual components of meaning for any lexical item (Fromkin et al., 2018). An individual semantic feature constitutes one component of a word's intension, which is the inherent sense or concept evoked (O'Grady et al., 1997).
Semantic Word Embeddings. We use semantic word embeddings obtained from a Word2vec model trained on GoogleNews. For each tweet, we construct the centroid vector by averaging the vectors of all words in its text.
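The centroid computation can be sketched as follows. This is a minimal illustration: the toy three-dimensional vectors below stand in for the 300-dimensional GoogleNews Word2vec vectors, and the zero-vector back-off for out-of-vocabulary tweets is our own assumption.

```python
import numpy as np

# Toy lookup standing in for GoogleNews Word2vec vectors (illustrative only).
embeddings = {
    "happy": np.array([0.9, 0.1, 0.0]),
    "birthday": np.array([0.2, 0.8, 0.1]),
    "friend": np.array([0.4, 0.3, 0.6]),
}

def tweet_centroid(tokens, embeddings, dim=3):
    """Average the vectors of all in-vocabulary words in the tweet."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)  # back off when no word is in the vocabulary
    return np.mean(vecs, axis=0)

centroid = tweet_centroid(["happy", "birthday", "friend", "xyz"], embeddings)
```

Unknown tokens (here "xyz") are simply skipped before averaging.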
TF-IDF. In information retrieval, tf-idf (or TF-IDF), short for term frequency-inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus (Leskovec et al., 2014). It is often used as a weighting factor in information retrieval, text mining, and user modeling. The tf-idf value increases proportionally with the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general. Nowadays, tf-idf is one of the most popular term-weighting schemes; 83% of text-based recommender systems in the domain of digital libraries use tf-idf (Beel et al., 2016).
Here N = |D| is the total number of documents in the corpus, and |{d ∈ D : t ∈ d}| is the number of documents in which the term t appears. If the term is not in the corpus, this leads to a division by zero (Robertson, 2004). It is therefore common to smooth the denominator to 1 + |{d ∈ D : t ∈ d}|.
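A minimal sketch of this weighting, with the smoothed denominator described above (the tokenized toy corpus and the raw-count term-frequency variant are our own illustrative choices):

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """tf-idf with the idf denominator smoothed by +1 to avoid division by zero."""
    tf = Counter(doc)[term] / len(doc)        # term frequency in the document
    n_docs = len(corpus)                       # N = |D|
    df = sum(1 for d in corpus if term in d)   # |{d in D : t in d}|
    idf = math.log(n_docs / (1 + df))          # smoothed inverse document frequency
    return tf * idf

corpus = [["a", "rare"], ["a", "b"], ["b", "c"], ["a", "c"]]
score = tf_idf("rare", ["a", "rare", "b"], corpus)
```

With 4 documents and "rare" appearing in 1 of them, the idf is log(4 / 2) and the score is log(2) / 3.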

Metadata Features
Metadata-based features provide clues about the social aspects of the tweet. Thus, in addition to the semantic features described above, we also use some common-sense metadata features: Tweet containing a question mark. If the tweet contains a question mark, it may be a question, which might indicate negative emotion (Castillo et al., 2011).
The presence and the number of links in the tweet. We count both inbound and outbound links. Our hypothesis is that the presence of a reference to another resource is indicative of positive emotion (Adamic and Huberman, 2000).
The length of the tweet. The assumption here is that a longer tweet could carry more useful detail (Ogasawara, 2009).
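The metadata features above can be extracted with a few lines of code. This is an illustrative sketch; the feature names and the URL regular expression are our own assumptions, not taken from the system description.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")  # assumed link-detection heuristic

def metadata_features(tweet):
    """Extract the common-sense metadata features described above."""
    return {
        "has_question_mark": int("?" in tweet),          # possible question
        "num_links": len(URL_PATTERN.findall(tweet)),    # references to resources
        "length": len(tweet),                            # longer may mean more detail
    }

feats = metadata_features("Happy birthday! See https://example.com ok?")
```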

Classifier
For each tweet, we first extract the features described above. We then concatenate the extracted features into a bag-of-features vector and normalize it, mapping the values to the interval [-1, 1]. Finally, we feed the vector into the classifier. In our experiments, we use an L2-regularized logistic regression classifier (Buitinck et al., 2013) and an SVM classifier (Zweigenbaum and Lavergne, 2016), respectively. For the logistic regression classifier, we tune different values of the C (cost) parameter (Aono et al., 2016) and take the one that yields the best accuracy under 10-fold cross-validation on the training set. For the SVM classifier, we try different kernels (Moreno et al., 2004) and achieve the best results with the RBF kernel. We report only the better of the two classifiers' results in the next section.
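The pipeline above can be sketched with scikit-learn (the library behind Buitinck et al., 2013). The random toy data, the MaxAbsScaler choice for the [-1, 1] mapping, the grid of C values, and the 3-fold split (the paper uses 10-fold on the real training set) are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

rng = np.random.RandomState(0)
X = rng.randn(60, 5)             # stand-in for the concatenated feature vectors
y = (X[:, 0] > 0).astype(int)    # toy binary labels

# MaxAbsScaler maps each feature into [-1, 1]; C is chosen by cross-validated
# accuracy (3-fold here for the toy data; the system uses 10-fold).
pipe = make_pipeline(MaxAbsScaler(), LogisticRegression(penalty="l2"))
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=3)
grid.fit(X, y)
pred = grid.predict(X)
```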

Training and Evaluation Data
The data for the task consist of 500K tweets in English and 100K tweets in Spanish (Barbieri et al., 2018). The tweets were retrieved with the Twitter APIs, posted between October 2015 and February 2017, and geolocalized in the United States and Spain. The dataset includes only tweets that contain exactly one emoji, drawn from the 20 most frequent emojis. The data are split into trial, training, and test sets.

Label set
As labels, we use the 20 most frequent emojis of each language; these differ between the English and Spanish corpora. In the following, we show the distribution of the emojis for each language (numbers refer to the percentage of occurrences of each emoji).

Evaluation Criteria
For evaluation, the classic Precision and Recall metrics over each emoji are used. The official results are based on Macro F-score, as the fundamental idea of this task is to encourage systems to perform well overall, which inherently means a better sensitivity to the use of emojis in general rather than, for instance, overfitting a model to do well on the three or four most common emojis in the test data. Macro F-score is defined as simply the average of the individual label-wise F-scores. The organizers also report Micro F-score for informative purposes.
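The Macro F-score definition above can be made concrete as follows (a from-scratch sketch; the toy gold/predicted labels are illustrative, and ties between zero-count edge cases are resolved by defining 0/0 as 0, a common convention):

```python
def macro_f1(gold, pred, labels):
    """Macro F-score: unweighted mean of per-label F1 scores."""
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", "B"]
score = macro_f1(gold, pred, labels=["A", "B", "C"])
```

Because every label contributes equally, a system that ignores rare emojis is penalized even if it does well on the frequent ones.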

Subtask 1 Result
We can see the results in Table 1. The cagri team obtains the best F1 value. The derpferd team gets

Conclusion
We have described our system for SemEval-2018 Task 2, Multilingual Emoji Prediction. Our approach relies on semantic and metadata-based features. Our primary submission obtains an F1 of 16.45 and an accuracy of 30.992.
In future work, we plan to use our best feature combinations in a deep learning architecture, as in Qiu's system (Qiu and Huang, 2015), which outperforms the other methods on two matching tasks. We also want to use information from entire threads (Joty et al., 2015) to make better predictions. How to combine them efficiently in the system is an interesting research question.

Acknowledgments