SINAI at SemEval-2017 Task 4: User based classification

This document describes our participation in SemEval-2017 Task 4: Sentiment Analysis in Twitter. We only report results for subtask B (English), which consists of determining the polarity of a tweet towards a topic on a two-point scale (positive or negative sentiment). Our main contribution is the integration of user information into the classification process: an SVM model is trained with Word2Vec vectors of the user's tweets extracted from his timeline. The results show that user-specific classifiers trained on timeline tweets can introduce noise, because those tweets are labeled by an imperfect system and their labels are therefore error-prone. This encourages us to further explore the integration of user information for author-based Sentiment Analysis.


Introduction
Task 4 of SemEval 2017, Sentiment Analysis in Twitter (Rosenthal et al., 2017), includes some new subtasks this year. One of these subtasks considers integrating user information into the proposed systems. We participated in subtask B which, given a message and a topic, consists of classifying the message on a two-point scale (positive or negative sentiment towards that topic). The organizers also provide scripts to download user profile information such as age, location and followers. We have taken advantage of this information to extend an SVM model trained with Word2Vec vectors of the users' publications on this social network.
In this paper, we present our approach to classifying tweets on a two-point scale (positive and negative) by combining Support Vector Machines (SVM), Word2Vec (Mikolov et al., 2013) and user information. We decided to combine these technologies for several reasons. Firstly, we have applied SVM to many different tasks, including tweet polarity classification, with good results (Saleh et al., 2011). Secondly, a review of the systems presented last year for the same task (Nakov et al., 2016) suggests that better results are achieved with word embedding representations, so we decided to test how they work for user modeling. Finally, this year, for the first time, the organizers provide user information, and we consider it very interesting to integrate this contextual information to improve tweet sentiment classification. Indeed, polarity classification on a per-user basis has been found useful in tasks like collaborative filtering (García-Cumbreras et al., 2013). Besides, the generation of user profiles in Twitter has attracted the attention of many researchers in recent years, enabling the prediction of user behavior, for instance in election processes (Pennacchiotti and Popescu, 2011).
Section 2 describes the data used in our approach, and Section 3 presents the system. Experiments and results are presented in Section 4 and analyzed in Section 5. Finally, Section 6 discusses conclusions and future work.

Data
The organizers provided English data from previous years (2015 and 2016). The 2016 test set was also supplied for development purposes, but it can be used for training as well. Table 1 shows the number of tweets provided for experimentation and testing.

System description
The presented system is based on user modeling.
It determines the opinion a user expresses in a tweet according to a user model generated from his timeline.
In our experiments, all tweets are vectorized using Word2Vec. First, a general SVM model is trained on the training vectors. Then, for each user in the test set, the system downloads the last 200 tweets published by the user and classifies them with the general SVM classifier, i.e. the one trained on the training set. If the classified timeline tweets include both positive and negative tweets, and a specific SVM model trained on the timeline tweets and their predicted labels reaches an accuracy above 0.7 in leave-one-out cross-validation, this user model is applied to the tweets authored by that user in the test set; otherwise, the general SVM model is applied. Thus, we train a per-user classifier whenever feasible.
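The per-user model selection described above can be sketched with scikit-learn's `SVC` (its LibSVM-based classifier) and leave-one-out cross-validation. This is a minimal sketch under stated assumptions, not the authors' code: the helper name `choose_classifier` is hypothetical, default `SVC` hyperparameters are assumed because the kernel and cost parameter are not reported, and the timeline is assumed to be already vectorized. It also requires at least two timeline tweets per class (slightly stricter than "contains positive and negative tweets") so that every leave-one-out training fold still contains both classes.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC  # scikit-learn's LibSVM-based SVM


def choose_classifier(general_clf, timeline_vectors, threshold=0.7):
    """Pick a per-user SVM when the user's timeline supports one (hypothetical helper).

    general_clf: fitted general SVM trained on the task's training set.
    timeline_vectors: (n_tweets, n_features) array with the user's vectorized
        timeline (the paper downloads the last 200 tweets).
    """
    # The timeline has no gold labels: label it with the general model,
    # so these pseudo-labels inherit that model's errors.
    pseudo_labels = general_clf.predict(timeline_vectors)
    _, counts = np.unique(pseudo_labels, return_counts=True)
    # Need both polarities; assumption: at least two tweets of each, so that
    # every leave-one-out training fold still contains both classes.
    if len(counts) < 2 or counts.min() < 2:
        return general_clf
    user_clf = SVC()  # assumption: default hyperparameters
    # cross_val_score clones user_clf, so it stays unfitted here.
    scores = cross_val_score(user_clf, timeline_vectors, pseudo_labels,
                             cv=LeaveOneOut())
    if scores.mean() > threshold:
        # Refit on the whole timeline and use it for this user's test tweets.
        return user_clf.fit(timeline_vectors, pseudo_labels)
    return general_clf
```

Each test tweet would then be classified with the model returned for its author, for example `choose_classifier(general, timeline).predict(tweet_vectors)`.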
For the Word2Vec representation of the tweets, we used the software developed by the authors of the method (Mikolov et al., 2013), available at https://code.google.com/p/Word2Vec/. Obtaining representative vectors for each word requires training a model on a large volume of text. To this end, an XML dump of the English Wikipedia articles (https://dumps.wikimedia.org/enwiki/) was downloaded and their text was extracted. The parameters used were those that provided the best results in previous experiments with Spanish tweets (Montejo-Ráez and Díaz-Galiano, 2016; Montejo-Ráez et al., 2014): a window of 5 terms, the CBOW model and 300 dimensions. Each tweet of the training and test sets was then represented by the concatenation of the average and the standard deviation of the Word2Vec vectors of the words in the tweet text, resulting in a final vector of 600 features. Beforehand, a simple normalization was applied to each tweet: repeated letters were eliminated, stop words were removed and all words were converted to lowercase.
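The tweet representation described above (normalization, then the mean and standard deviation of the word vectors) can be sketched as follows. This is an illustrative sketch rather than the original implementation: the stop-word list is a small hypothetical one, "eliminating repeated letters" is interpreted as shortening runs of three or more identical letters to two, tokens are assumed to be alphanumeric runs, and `embeddings` stands for any mapping from a word to a NumPy vector (for example, vectors loaded from the trained Word2Vec model). Words missing from the model are skipped; with 300-dimensional vectors the output has the 600 features mentioned above.

```python
import re

import numpy as np

# Hypothetical stop-word list; the paper does not say which list was used.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "this", "that", "of", "to", "in"}


def normalize(tweet):
    """Lowercase, shorten repeated letters, tokenize and remove stop words."""
    text = tweet.lower()
    # Assumption: runs of 3+ identical letters are cut to two ("soooo" -> "soo").
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)
    tokens = re.findall(r"[a-z0-9']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def tweet_vector(tweet, embeddings, dim=300):
    """Concatenate mean and standard deviation of the known word vectors (2 * dim features)."""
    vectors = [embeddings[w] for w in normalize(tweet) if w in embeddings]
    if not vectors:
        # No word found in the model: return an all-zero vector.
        return np.zeros(2 * dim)
    matrix = np.vstack(vectors)
    return np.concatenate([matrix.mean(axis=0), matrix.std(axis=0)])
```

Concatenating the standard deviation keeps some information about how spread out the word meanings are, which a plain average would lose.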
The SVM implementation selected is the one based on LibSVM (Chang and Lin, 2011) provided by the Scikit-learn library (Pedregosa et al., 2011).

Experiments and results
Three different experiments were conducted over the development set (Fig. 1 and Fig. 2):
• Experiment 1: a general SVM model was trained on the Word2Vec representations of the training tweets. Each tweet of the development set was vectorized using Word2Vec and classified with this model.
• Experiment 2: each tweet vector was expanded with a user vector. A general SVM model was also trained, but on the Word2Vec representations of both the training tweets and the user timelines. For every user in the training tweets, the last 200 tweets from his timeline were downloaded and used to enrich the vector of each individual tweet. Each tweet of the development set, together with the timeline of the user who posted it, was vectorized using Word2Vec, and the tweet was classified with the model.
• Experiment 3: the general SVM model of Experiment 1 was used, but one model per user was also defined. To define the user model, the last 200 tweets published by the user were retrieved, and each of them was vectorized and classified with the general SVM model. Each tweet of the development set was vectorized using Word2Vec and classified as follows: if the user's classified timeline contains both positive and negative tweets and the user model reaches an accuracy above 0.7 in leave-one-out cross-validation, the tweet is classified with the user model; otherwise, it is classified with the general SVM model.
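For Experiment 2, the text does not specify how a timeline is turned into a user vector. The sketch below assumes, hypothetically, that the user vector is the element-wise mean of the timeline tweet vectors and that it is appended to the tweet vector; the function names are illustrative only. A user with no downloaded tweets gets an all-zero user vector, which is also an assumption.

```python
import numpy as np


def expand_with_user(tweet_vec, timeline_vectors):
    """Append a user vector to a tweet vector (hypothetical helper).

    Assumption: the user vector is the element-wise mean of the vectorized
    timeline tweets, which share the tweet vector's dimensionality.
    """
    timeline = np.asarray(timeline_vectors, dtype=float)
    if timeline.size == 0:
        # Assumption: no timeline available -> all-zero user vector.
        user_vec = np.zeros_like(tweet_vec, dtype=float)
    else:
        user_vec = timeline.mean(axis=0)
    return np.concatenate([tweet_vec, user_vec])


def build_features(tweet_vectors, authors, timelines):
    """Stack expanded vectors; timelines maps author -> (n_tweets, n_features) array."""
    return np.vstack([expand_with_user(v, timelines[a])
                      for v, a in zip(tweet_vectors, authors)])
```

The resulting matrix (1,200 columns for 600-feature tweet vectors) would then be used to train the general SVM of Experiment 2.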
The results obtained in the development phase are shown in Table 2. Although Experiment 1 provided the best results, we selected the approach of Experiment 3 for our participation in the task because it takes into account user information, one of the challenges of this year. Experiment 2 also considers user information and obtained better results than Experiment 3 in the development phase, but we did not select it because we considered that simply adding timeline tweets to the representation, without any further modeling, was not a good idea. Experiment 3 is better motivated, since it defines a personal model for each user based on the way he thinks.
The results of all participants in the test phase are shown in Table 3, and the detailed report of these results can be found in Rosenthal et al. (2017).
Once the gold standard for the test phase was released, we also ran on the test set the other experiments defined in the development phase. The test-set results of all the experiments are shown in Table 4 and are analyzed in depth in the next section.

Analysis of results
The results obtained do not seem to support the integration of content from users' timelines. Table 4 shows that directly using the word embeddings of the tweet words yielded the best results, and adding further user information did not improve on this first setup. Moreover, modeling the user, either as an aggregated vector computed from his timeline or as a specific polarity classifier for each user, requires first downloading hundreds of tweets for every single user in the data set and then using these tweets to compute a final user model.
It is important to note that the SemEval data set is very unbalanced, which can affect the generation of user classifiers. Besides, no additional data was used to determine the polarity of the timeline tweets, so the effects of a poor performance of the general classifier on them may be amplified. Nevertheless, Experiment 3 shows results similar to those of the other two approaches, despite the potential bias that recent timeline tweets may introduce in the classification process.

Conclusion
Timelines have been found to be an interesting source of information for generating user profiles (Bollen et al., 2011). Indeed, the more text is obtained, the deeper the analysis of user behavior or personality that can be performed (Diakopoulos and Shamma, 2010).
We will continue exploring how the timeline could be better integrated or analyzed for an effective user modeling process. Since the downloaded timeline consists of the user's most recent tweets, it could be worth retrieving instead the tweets published closest to the moment when the analyzed tweet was posted, so that the context is more coherent.

Table 1: Number of tweets provided for experimentation and testing.

Table 2: Results for the development phase.

Table 4: Results for the test phase.