XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter

This paper describes the XJSA system submitted by XJTU to SemEval-2017 Task 4, subtask A, a popular and fundamental sentiment classification task. The system is based on a convolutional neural network and word embeddings. We use two sets of pre-trained word vectors and adopt a dynamic strategy for k-max pooling.


Introduction
Several years ago, the typical approaches to sentiment analysis of tweets were based on classifiers trained on hand-crafted features, in particular lexicons of words with an assigned polarity value. Since about 2014, deep neural network methods have achieved state-of-the-art results in many NLP tasks, especially sentiment classification. The work of the Harvard NLP group in 2014 and of Kalchbrenner et al. in 2014 suggested that convolutional neural networks and word embeddings play important roles in this field. General-purpose word embeddings already give excellent results; if sentiment information can also be embedded in the vectors, we expect better results.
Several sets of open word vectors are already available on the web, such as Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and SSWE (Tang et al., 2014). Our system uses Word2Vec and SSWE together. Deep learning models have achieved excellent results in computer vision and speech recognition in recent years. In natural language processing, much work with deep learning methods has involved learning word vector representations for a specific task or problem (Bengio et al., 2003; Mikolov et al., 2013; Collobert et al., 2011). Other work exploits the open word vectors mentioned above. Word vectors are a transformation of the features of letters, words, sentences, paragraphs, or even whole texts into lower-dimensional, dense, continuous vectors. In this vector space, words with similar syntactic behavior are close to each other in Euclidean or cosine distance, so one can study and compare the syntactic functionality of different words via their vectors.
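As a minimal sketch of the two closeness measures named above, the following computes cosine similarity and Euclidean distance between word vectors. The three-dimensional vectors and the words attached to them are hypothetical values invented for illustration; they are not taken from Word2Vec, GloVe, or SSWE.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy embeddings (assumption: made up for this example only).
toy_vectors = {
    "good": [0.9, 0.1, 0.3],
    "bad": [0.8, 0.2, 0.3],
    "table": [-0.2, 0.9, -0.5],
}

sim_good_bad = cosine_similarity(toy_vectors["good"], toy_vectors["bad"])
sim_good_table = cosine_similarity(toy_vectors["good"], toy_vectors["table"])
dist_good_bad = euclidean_distance(toy_vectors["good"], toy_vectors["bad"])
```

In this toy setting the first pair is much closer than the second under both measures, which is the kind of comparison the paragraph describes.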
Convolutional neural networks (CNNs) use layers with convolutional filters that are applied to local features (LeCun et al., 1998). Originally invented for computer vision, CNN models have recently achieved remarkable results on many natural language processing problems, such as sentence modeling (Kalchbrenner et al., 2014), semantic parsing (Yih et al., 2014), sentiment classification (Kim et al., 2014), and other traditional natural language processing tasks (Collobert et al., 2011).
Our system was inspired by the work of Kim et al. (2014) and Tang et al. (2014). On the CNN side, we use a simple 3-layer CNN to extract features automatically. On the pre-trained vector side, we use Word2Vec and SSWE to filter our training set and obtain a proper input for the CNN. We use the vectors trained by Mikolov et al. (2013) because they were trained on 100 billion words of Google News and are freely and publicly available. We use the SSWE vectors because they were trained specifically for sentiment classification by Tang et al. (2014). SSWE contains sentiment information that is absent from the word vectors trained by Mikolov et al.

Model
The architecture of our system, shown in Figure 2, is a simple 3-layer CNN that follows the architecture of Kim et al. (2014).
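Let $x_i \in \mathbb{R}^k$ be the k-dimensional word vector corresponding to the i-th word in the sentence. A sentence of length n is described as
$$x_{1:n} = x_1 \oplus x_2 \oplus \dots \oplus x_n,$$
where $\oplus$ is the concatenation operator. We let $x_{i:i+j}$ stand for the concatenation of words $x_i, x_{i+1}, \dots, x_{i+j}$. A convolution operation involves a filter $w \in \mathbb{R}^{hk}$, which is applied to a window of h words to produce a new feature. For example, a feature $c_i$ is generated from a window of words $x_{i:i+h-1}$ by
$$c_i = f(w \cdot x_{i:i+h-1} + b),$$
where $b$ is a bias term and $f$ is a non-linear function. This filter is applied to each possible window of words in the sentence $\{x_{1:h}, x_{2:h+1}, \dots, x_{n-h+1:n}\}$ to produce a feature map $c = [c_1, c_2, \dots, c_{n-h+1}]$. Then, as in Collobert et al. (2011), a max-over-time pooling operation is applied over the feature map, taking the maximum value $\hat{c} = \max\{c\}$.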
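The convolution over word windows and the max-over-time pooling described above can be sketched for a single filter as follows. This is an illustrative pure-Python version, not the system's implementation: the function names, the choice of a rectified linear unit for f, and the toy sentence and filter values are all assumptions made for the example.

```python
def relu(z):
    """Rectified linear unit, used here as the non-linear function f."""
    return max(0.0, z)


def feature_map(sentence, w, b, h, f=relu):
    """Apply one filter to every window of h consecutive words.

    sentence: list of n word vectors, each a list of k floats
    w:        flat filter of length h * k
    b:        scalar bias
    Returns the n - h + 1 features, one per window.
    """
    features = []
    for i in range(len(sentence) - h + 1):
        # Concatenate the h word vectors in the window into one flat vector.
        window = [value for word in sentence[i:i + h] for value in word]
        z = sum(wi * xi for wi, xi in zip(w, window)) + b
        features.append(f(z))
    return features


def max_over_time(features):
    """Keep only the largest feature produced by this filter."""
    return max(features)


# Toy input (assumption): n = 4 words, k = 2 dimensions, window size h = 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [1.0, -1.0, 0.5, 0.5]
b = 0.0

c = feature_map(sentence, w, b, h=2)
c_hat = max_over_time(c)
```

A sentence of n words yields n - h + 1 features per filter, and pooling reduces them to one value, so sentences of different lengths produce a fixed-size representation.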

Experimental Setup
We test our system on the following settings:

Hyper-parameters and Training
Kim et al. (2014) describe four model variants: CNN-rand, CNN-static, CNN-non-static, and CNN-multichannel. We use rectified linear units, filter windows (h) of 3, 4, and 5 with 100 feature maps each, a dropout rate (p) of 0.5, an l2 constraint (s) of 3, and a mini-batch size of 50. These values were chosen via a grid search on the dev sets.
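The settings listed above can be collected in a small configuration, sketched below. The dictionary keys and helper names are hypothetical. The text gives only the value of the l2 constraint (s = 3); treating it as a rescaling of a weight vector whose norm exceeds s is an assumption based on the CNN work this system follows.

```python
import math

# Values from the text; the key names are illustrative only.
HYPERPARAMS = {
    "filter_windows": (3, 4, 5),
    "feature_maps_per_window": 100,
    "dropout_rate": 0.5,
    "l2_constraint": 3.0,
    "batch_size": 50,
}


def pooled_feature_count(params):
    """Number of pooled features: one per feature map per window size."""
    return len(params["filter_windows"]) * params["feature_maps_per_window"]


def clip_l2_norm(w, s):
    """Rescale w so its l2 norm equals s whenever the norm exceeds s.

    Assumption: this is how the l2 constraint is enforced.
    """
    norm = math.sqrt(sum(v * v for v in w))
    if norm > s:
        scale = s / norm
        return [v * scale for v in w]
    return list(w)


clipped = clip_l2_norm([3.0, 4.0], HYPERPARAMS["l2_constraint"])
```

With three window sizes and 100 feature maps each, max-over-time pooling yields 300 features per sentence for the final layer.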

Pre-trained Word Vectors
Initializing word vectors with those obtained from an unsupervised neural language model is a popular way to improve performance in the absence of a large supervised training set (Collobert et al., 2011; Socher et al., 2011; Iyyer et al., 2014). First, we use SSWE (Tang et al., 2014), a set of word vectors that contain sentiment information. We also use the publicly available word2vec vectors, which were trained on 100 billion words from Google News. These vectors have a dimensionality of 300 and were trained using the CBOW architecture. Because the SSWE vectors have a dimensionality of 50, we extended them to 300 dimensions by filling the additional 250 dimensions with random values.
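The extension of a 50-dimensional vector to 300 dimensions might look like the sketch below. The text does not say how the random values are drawn, so the uniform range of [-0.25, 0.25], the function name, and the seeded generator are all assumptions made for illustration.

```python
import random


def pad_vector(vec, target_dim=300, low=-0.25, high=0.25, rng=None):
    """Append random values to vec until it has target_dim entries.

    Assumption: the extra dimensions are drawn uniformly from [low, high];
    the source does not specify the distribution.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    extra = target_dim - len(vec)
    if extra < 0:
        raise ValueError("vector is already longer than target_dim")
    return list(vec) + [rng.uniform(low, high) for _ in range(extra)]


# Hypothetical 50-dimensional vector standing in for one SSWE entry.
sswe_like = [0.1] * 50
padded = pad_vector(sswe_like, rng=random.Random(1))
```

The original 50 values are kept unchanged at the front, so the sentiment information stays in the same positions for every word.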

Experimental Environment
The experiments were run on a Linux server with an NVIDIA GTX 1080 GPU.

Results
To compare our system with stronger systems, we report the results of our official submission alongside those of the top-scoring system in Table 1. We also list our detailed scores in Table 2.

Conclusion
Our work is based on a deep neural network built on top of word2vec and SSWE. We find that exploiting the sentiment information in the pre-trained word vectors can give better results. Our work, together with the previous work mentioned in this paper, shows that unsupervised pre-training of word vectors plays an important role in deep learning for sentiment analysis.
As shown above, traditional methods typically model the syntactic context of words but ignore the sentiment information of the text. As a result, words with opposite polarity, such as good and bad, are mapped to nearby vectors, as happens with Word2Vec. In our system we therefore use SSWE and Word2Vec together for word embedding, with SSWE first. Tang et al. (2014) introduced the SSWE model to learn word embeddings for Twitter sentiment classification. For our task, we use the word vectors trained by SSWEu, which captures the sentiment information of sentences as well as the syntactic contexts of words. SSWEu is illustrated in Figure 1. Given an original (or corrupted) n-gram and the sentiment polarity of a sentence as input, SSWEu predicts a two-dimensional vector for each input n-gram. t is the original n-gram, r t

Table 1: Official results of our submission compared to the top-scoring system.

Table 2: Detailed scores of the XJSA official submission.