EICA at SemEval-2017 Task 4: A Simple Convolutional Neural Network for Topic-based Sentiment Classification

This paper describes our approach to SemEval-2017 Task 4 - Sentiment Analysis in Twitter (SAT). Its five subtasks are divided into two categories: (1) sentiment classification, i.e., predicting topic-based tweet sentiment polarity, and (2) sentiment quantification, i.e., estimating the sentiment distribution of a set of given tweets. We build a convolutional sentence classification system for the SAT task. Official results show that our system is competitive.


Introduction
With the rapid growth of social media such as Twitter, sentiment classification of user-generated texts has attracted increasing research interest. The objective of sentiment classification is to identify the sentiment of a text as binary polarity (Positive vs. Negative) or as a single-label multi-class decision (e.g., Very positive, Positive, Neutral, Negative, Very negative). Feature representation is one of the key points for this kind of classification, and generally falls into two categories: (1) traditional feature engineering (Liu, 2012; Mohammad et al., 2013), such as sentiment lexicons, n-grams, dependency triples, etc.; and (2) deep learning methods (Zhao et al., 2015; Yang et al., 2016), which use carefully designed neural networks to encode input texts and obtain text feature representations. Recently, deep learning approaches have emerged as powerful computational models for text sentiment classification and have achieved new state-of-the-art results on several datasets.
SemEval-2017 provides a universal platform for researchers to explore the task of Twitter sentiment analysis. In this paper, we explore Task 4 (Rosenthal et al., 2017), which includes five subtasks: subtasks A, B, and C concern sentiment classification, while subtasks D and E concern sentiment quantification (that is, estimating sentiment distributions). Considering the length limitations of tweets, we treat the SAT subtasks as sentence-level sentiment analysis and design a convolutional neural network for topic-based sentiment classification.

System Description
In this section, we describe the neural network architecture of our system. As shown in Figure 1, our system consists of six layers: an input layer, a convolutional layer, a max-pooling layer, a topic embedding layer, a concatenation layer, and an output layer.
Input layer. A tweet text can be denoted as a word sequence x of n words, x = [w_1, w_2, ..., w_i, ..., w_n]. To obtain the word vector of word w_i, we look it up in the word embedding matrix E ∈ R^{|V|×d}, where e(w_i) ∈ R^d and |V| is the vocabulary size. We then obtain an input matrix X = [e(w_1); ...; e(w_n)], where X ∈ R^{n×d}.

Convolution layer. The convolution operation has been used to capture n-gram information (Collobert et al., 2011), and n-grams have been shown useful for Twitter sentiment analysis (Dos Santos and Gatti, 2014). In this layer, a set of m filters is applied to a sliding window of length h over each tweet matrix X, and a feature map c_i ∈ R^{n-h+1} is generated by the i-th filter F_i from windows of words e(w)_{j:j+h-1}:

c_i^{(j)} = f(F_i · e(w)_{j:j+h-1} + b),    (1)

where f is an activation function and b ∈ R is a bias term. The vectors c = [c_1 ⊕ · · · ⊕ c_m] are then aggregated over all m filters into a feature map matrix. We set m to 3, and h is chosen from {3, 4, 5}.
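The convolution layer can be sketched as follows; this is a minimal illustration with hypothetical shapes (n = 10, d = 100, h = 3, m = 3) and ReLU standing in for the unspecified activation f, not the authors' exact implementation.

```python
import numpy as np

# Sketch of the convolution layer over a tweet matrix X (shapes illustrative).
rng = np.random.default_rng(0)
n, d, h, m = 10, 100, 3, 3          # words, embedding dim, window length, filters
X = rng.standard_normal((n, d))     # input matrix X in R^{n x d}
F = rng.standard_normal((m, h * d)) # m filters, each over a window of h word vectors
b = np.zeros(m)                     # bias terms

def conv_feature_maps(X, F, b):
    """Slide each filter over windows of h consecutive word vectors,
    producing m feature maps of length n - h + 1 (ReLU as f)."""
    windows = np.stack([X[j:j + h].ravel() for j in range(n - h + 1)])
    return np.maximum(windows @ F.T + b, 0.0)   # shape (n - h + 1, m)

c = conv_feature_maps(X, F, b)
assert c.shape == (n - h + 1, m)
```

Each column of `c` corresponds to one filter's feature map c_i over all window positions.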
[Figure 1: The framework of the simple CNN for topic-based sentiment classification.]

Max-pooling layer. To obtain a fixed-dimension sentence representation S ∈ R^m, we apply a max-pooling function over each feature map.
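Max pooling reduces each feature map to a single scalar, so the sentence representation S has a fixed size m regardless of tweet length. A small sketch with made-up feature-map values:

```python
import numpy as np

# Max pooling over each feature map: one scalar per filter (m values).
c = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.0, 0.3],
              [0.4, 0.7, 0.6]])   # feature maps: (n - h + 1) rows, m columns
S = c.max(axis=0)                 # S in R^m
print(S)                          # [0.9 0.7 0.6]
```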
Topic embedding layer. To make the best use of topic information, we propose to learn an embedding vector t_i for each topic:

t_i = W^(1) · avg(e(w_1), ..., e(w_k)),    (2)

where w_1, ..., w_k are the topic words, t_i ∈ R^s, avg(·) is an element-wise average function, and W^(1) ∈ R^{s×d}.
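A sketch of the topic embedding step, assuming illustrative dimensions (d = 100, s = 50) and a two-word topic; the projection matrix W1 here is randomly initialized purely for demonstration.

```python
import numpy as np

# Topic embedding: average the embeddings of the k topic words,
# then project with W^(1) in R^{s x d} to get t in R^s.
rng = np.random.default_rng(1)
d, s, k = 100, 50, 2
topic_word_vecs = rng.standard_normal((k, d))  # e(w_1) ... e(w_k)
W1 = rng.standard_normal((s, d))               # W^(1)

t = W1 @ topic_word_vecs.mean(axis=0)          # topic vector t in R^s
assert t.shape == (s,)
```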
Concatenation layer. We use a concatenation layer to obtain the tweet representation, which can be formed as:

S_t = f(W^(2) · (t_i ⊕ S)),    (3)

where ⊕ is the concatenation operator and W^(2) ∈ R^{s×(s+m)}.

Output layer. Finally, we use a softmax layer to obtain the class probabilities:

P(y_i | x) = exp(S_t · W_i) / Σ_j exp(S_t · W_j),    (4)

where S_t denotes the tweet representation, y_i is a sentiment class, W_j is the j-th column of the parameter matrix W ∈ R^{2s×C}, and C is the number of categories.
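The concatenation and output layers can be sketched together; shapes here are illustrative, tanh stands in for the unspecified activation, and the weight matrices are random placeholders rather than trained parameters.

```python
import numpy as np

# Concatenation + softmax output: join the topic vector t and the pooled
# sentence vector S, project with W^(2), then compute class probabilities.
rng = np.random.default_rng(2)
s, m, C = 50, 3, 3
t = rng.standard_normal(s)                  # topic embedding
S = rng.standard_normal(m)                  # pooled sentence representation
W2 = rng.standard_normal((s, s + m))        # W^(2) in R^{s x (s+m)}
W = rng.standard_normal((s, C))             # output weights, one column per class

S_t = np.tanh(W2 @ np.concatenate([t, S]))  # tweet representation
logits = S_t @ W
p = np.exp(logits - logits.max())
p /= p.sum()                                # softmax: probabilities sum to 1
assert abs(p.sum() - 1.0) < 1e-9
```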
Training process. The training goal is to minimize the cross-entropy loss over the training set T:

L(θ) = − Σ_{x∈T} Σ_{i=1}^{C} P_g^i(x) · log P(y_i | x),    (5)

where C is the number of classes, x represents a tweet, θ denotes the model parameters, and P_g(x) is the goal probability distribution, which has the same dimension as the number of classes: the dimension corresponding to the gold class is 1, and all others are 0.
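Because the goal distribution is one-hot, the per-tweet cross-entropy reduces to the negative log probability of the gold class, as this small worked example shows (the probability values are made up):

```python
import numpy as np

# Cross-entropy for one tweet with a one-hot goal distribution:
# the sum collapses to -log of the predicted gold-class probability.
p = np.array([0.7, 0.2, 0.1])    # model probabilities over C = 3 classes
P_g = np.array([1.0, 0.0, 0.0])  # one-hot goal distribution (gold class = 0)
loss = -np.sum(P_g * np.log(p))
assert np.isclose(loss, -np.log(0.7))
```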
We use a mini-batch gradient descent algorithm to train the network, with a batch size of 32 and a learning rate of 0.01.
We also use Adadelta (Zeiler, 2012) to optimize the learning of θ, which is an effective method for training neural networks. We initialize all matrix and vector parameters with uniform samples in [−sqrt(6/(r+c)), +sqrt(6/(r+c))] (Glorot and Bengio, 2010), where r is the number of rows and c is the number of columns.
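The initialization scheme above can be sketched directly; the example shapes are arbitrary.

```python
import numpy as np

# Glorot/Xavier uniform initialization (Glorot and Bengio, 2010):
# sample each weight from U[-sqrt(6/(r+c)), +sqrt(6/(r+c))].
def glorot_uniform(r, c, rng=np.random.default_rng(3)):
    limit = np.sqrt(6.0 / (r + c))
    return rng.uniform(-limit, limit, size=(r, c))

W = glorot_uniform(50, 100)
assert np.abs(W).max() <= np.sqrt(6.0 / 150)
```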
Pre-training Word Embeddings

We adopted the word2vec tool^1 to obtain word embeddings with a dimensionality of 100, trained on 238M tweets from Sentiment140^2.

Datasets
Since only tweet IDs are provided by the organizers, some tweets are no longer available on Twitter due to deletion or system errors. Subtasks B and D share one dataset, while subtasks C and E share the other. Overview statistics of the data available for download are given in Tables 1 and 2.

Tweet Preparation.
We preprocessed all of our datasets as follows:

• The tweet text was lowercased.
• URLs and mentioned usernames were substituted by the replacement tokens <LINK> and <MENTION>, respectively. We also mapped numbers to a generic NUMBER token.
• All words that appear fewer than 5 times in the training data were removed.
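The pipeline above can be sketched as follows; the replacement token names follow the paper, but the specific regular expressions are our assumption, not the authors' exact rules.

```python
import re
from collections import Counter

# Sketch of the tweet preprocessing steps (regex patterns are illustrative).
def preprocess(text):
    text = text.lower()                            # lowercase
    text = re.sub(r"https?://\S+", "<LINK>", text) # URLs -> <LINK>
    text = re.sub(r"@\w+", "<MENTION>", text)      # usernames -> <MENTION>
    text = re.sub(r"\d+", "NUMBER", text)          # digits -> NUMBER
    return text.split()

def prune_rare_words(tokenized_tweets, min_count=5):
    """Drop words seen fewer than min_count times in the training data."""
    counts = Counter(w for toks in tokenized_tweets for w in toks)
    return [[w for w in toks if counts[w] >= min_count]
            for toks in tokenized_tweets]

tokens = preprocess("Check http://t.co/abc @user said 42")
print(tokens)  # ['check', '<LINK>', '<MENTION>', 'said', 'NUMBER']
```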

Result on Test Data
Subtask A. For this subtask, there is no topic information, so we removed the concatenation and topic embedding parts shown in Figure 1. We report the results of our system in Table 4. As shown in Table 4, we obtained poor performance on Subtask A. To further analyze our system's performance on the three-point scale (positive, negative, neutral), we show the detailed results in Table 5. Our system did not distinguish the positive and negative classes well, but it performed well on the neutral class. The unbalanced training data distribution may have influenced our system: 49% (positive), 31% (neutral), 20% (negative).
Subtasks B and C. The results of our system for Subtasks B and C are reported in Table 6 and Table 7, respectively. For these two subtasks, the organizers make alternative metrics available. We found that the choice of scoring metric influences the results considerably; for example, in Subtask C, our system ranked second by MAE^μ while ranking 8th by MAE^M.

Conclusion
In this paper, we used a simple convolutional neural network to perform sentiment analysis at the sentence level (i.e., Subtask A) and the topic level (i.e., Subtasks B and C), without using any user information. In future work, we will focus on developing advanced neural networks to model sentences with the aid of user information. We would also like to ensemble a deep-learning-based classifier with a classifier based on handcrafted features to improve system performance in the next SemEval competition.