IITP at SemEval-2017 Task 5: An Ensemble of Deep Learning and Feature Based Models for Financial Sentiment Analysis

In this paper we propose an ensemble-based model that combines state-of-the-art deep learning sentiment analysis algorithms, namely a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, with feature-based models to identify optimistic or pessimistic sentiments associated with companies and stocks in financial texts. We built our system to participate in a competition organized by the International Workshop on Semantic Evaluation 2017 (SemEval-2017). We combine the predictions of the individual models using an artificial neural network to determine the opinion towards an entity in (a) microblog messages and (b) news headlines. Our models achieved cosine similarity scores of 0.751 and 0.697 on these two tracks, giving us the rank of 2nd and 7th best team respectively.


Introduction
Sentiment analysis of financial text is an important area of research. It has been shown that sentiments and opinions can affect market dynamics (Goonatilake and Herath, 2007). Social media has created a new venue for customers to voice their opinions. People tend to express their personal sentiment about the stock market through tweets. News, on the other hand, presents macroeconomic factors and company-specific or political information. Positive news tends to bring optimism and lift the market, whereas negative news affects the market in the opposite direction (Van de Kauter et al., 2015). Sentiment analysis gives organizations the ability to observe various social media sites in real time and then act accordingly. Twitter is considered to be an ocean of sentiment data.
A study indicates that sentiment analysis of public mood derived from Twitter feeds can be used to eventually forecast movements of individual stock prices (Smailović et al., 2014). All this evidence shows that financial sentiment analysis has a lot of untapped power and that extensive research in the field can yield great insight into the financial market. The fundamental problem with classifying financial tweets is the presence of noise. The natural use of short, informal language, emoticons, hashtags and sarcasm in tweets makes the sentiment analysis problem especially challenging.
News headlines usually use a limited number of words to summarize the article. Moreover, aspects such as language patterns, writing style and use of irony differ notably among news categories and articles. Articles, forms of the verb 'to be', and conjunctions are rarely used in headlines.
In this paper we describe the system we proposed for 'SemEval-2017 Task 5 on Fine-Grained Sentiment Analysis for Financial Microblogs and News' (Cortis et al., 2017). We propose a multilayer perceptron (MLP) based ensemble method that leverages the combination of deep learning and feature-based models for the prediction. Our system produces the 4th and 8th best cosine similarity scores for microblog messages and news headlines respectively. A total of 25 teams participated in the microblog messages track, while 29 teams submitted their systems for the news headlines track.
The task defines sentiment score prediction in two separate tracks, i.e. microblogs and news headlines. The objective of the task is to predict a sentiment score associated with a company/cashtag in the text. The sentiment score lies in a continuous range from -1 (very bearish) to +1 (very bullish). A cashtag is a stock symbol that uniquely identifies a company. For example, $AAPL is the stock symbol for the company Apple Inc. Every microblog message instance also includes a span that indicates the part of the text from which the prediction should be derived.
The rest of the paper is organized as follows: Section 2 describes our system architecture in detail. We present our experimental results in Section 3. Finally, Section 4 presents our conclusions.

System Description
In this section we discuss our proposed system for the task. We developed a multilayer perceptron (MLP) based ensemble approach that learns on top of a convolutional neural network (CNN), a long short-term memory network (LSTM), a vector averaging MLP and a feature-driven MLP model. We train and tune each of these models separately and then feed the prediction scores of each model as input to an MLP for ensembling. Training and tuning of this ensemble network are performed separately. The resulting pipeline is used to predict the final sentiment score.

Word Embeddings
Word embeddings are generally helpful in many natural language processing tasks because they capture hidden semantic structure well. For word embeddings we used two pre-trained embedding models: GloVe 1 and Word2Vec 2 . For microblog messages we used the GloVe (Pennington et al., 2014) and Word2Vec (Godin et al., 2015) Twitter models, trained on 2 billion and 400 million tweets respectively. For news headlines we used the GloVe Common Crawl model trained on 802 billion words and the Word2Vec Google News model (Mikolov et al., 2013). We experimented with 200, 300 and 400 dimension vectors and observed that 200 and 300 dimension vectors are near-optimal for microblog messages and news headlines respectively. We concatenate word embeddings to form sentence embeddings.
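The concatenation step can be illustrated with a minimal sketch. The zero-padding to a fixed length and the zero vector for out-of-vocabulary tokens are assumptions made here so that every sentence yields a vector of the same size; the paper does not state how these cases were handled. The function name and toy vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

def concat_sentence_embedding(tokens: List[str], vectors: Dict[str, Vector],
                              dim: int, max_len: int) -> Vector:
    """Concatenate per-token vectors into one fixed-length sentence vector."""
    zero = [0.0] * dim
    # Assumption: unknown tokens map to a zero vector, long inputs are truncated.
    rows = [vectors.get(tok, zero) for tok in tokens[:max_len]]
    # Assumption: short inputs are right-padded with zero vectors.
    rows += [zero] * (max_len - len(rows))
    return [x for row in rows for x in row]

# Toy 2-dimensional vectors, purely illustrative.
toy_vectors = {"$aapl": [0.1, 0.2], "rallies": [0.5, -0.3]}
sentence = concat_sentence_embedding(["$aapl", "rallies", "today"],
                                     toy_vectors, dim=2, max_len=4)
```

The resulting vector has max_len * dim entries, which is why concatenation grows quickly with sentence length and embedding size.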

Convolutional Neural Network (CNN)
A convolutional neural network consists of one or more convolution and pooling layers followed by one or more dense layers. Our system uses 2 convolution layers followed by a max pool layer, 2 dense layers and an output layer. The size of the convolution filters dictates the hidden features to be extracted. We employ 50 such filters while sliding over 1, 2, 3 and 4 word(s) at a time.
1 http://nlp.stanford.edu/projects/glove/
2 https://code.google.com/archive/p/word2vec/
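How a filter slides over word windows can be sketched as follows. This is a simplified, single-layer illustration with one filter per window width, a ReLU, and a max over positions; it is not the two-convolution-layer network described above, and the function name, toy vectors and filter weights are hypothetical.

```python
from typing import List

Vector = List[float]

def conv_max_pool(embedded: List[Vector], filt: List[Vector], bias: float = 0.0) -> float:
    """Apply one filter spanning len(filt) words, then ReLU and max over positions."""
    width = len(filt)
    best = 0.0  # a ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            word = embedded[start + offset]
            total += sum(w * x for w, x in zip(filt[offset], word))
        best = max(best, max(0.0, total))
    return best

# Three words with toy 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
unigram_filter = [[1.0, 1.0]]              # looks at 1 word at a time
bigram_filter = [[0.5, 0.0], [0.0, 1.0]]   # looks at 2 consecutive words
features = [conv_max_pool(sentence, f) for f in (unigram_filter, bigram_filter)]
```

Each filter contributes one pooled value, so a bank of filters over several window widths yields a fixed-size feature vector regardless of sentence length.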

Long Short Term Memory (LSTM)
LSTMs are a special kind of recurrent neural network that can efficiently learn long-term dependencies. We stack two LSTM layers on top of each other, followed by 2 dense layers and an output layer. We fix the number of neurons in each LSTM layer at 100. The two dense hidden layers use 50 and 10 neurons respectively.

Multilayer Perceptron (MLP) -Vector Averaging Model
Concatenating word vectors to generate sentence embeddings often suffers from the curse of high dimensionality. To obtain a constant, low-dimensional feature vector, we employ a vector averaging technique for producing the sentence vector. We perform an element-wise averaging of the word vectors in a tokenized tweet/headline. We then use the sentence embeddings to train a 3-layered neural network for the prediction.
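A minimal sketch of the element-wise averaging is given below. Skipping out-of-vocabulary tokens and returning a zero vector when no token is known are assumptions of this sketch, and the names and toy vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

def average_sentence_embedding(tokens: List[str], vectors: Dict[str, Vector],
                               dim: int) -> Vector:
    """Element-wise mean of the word vectors of all known tokens."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        # Assumption: fall back to a zero vector if nothing is in the vocabulary.
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]

toy_vectors = {"$aapl": [1.0, 2.0], "rallies": [3.0, -4.0]}
averaged = average_sentence_embedding(["$aapl", "rallies", "today"], toy_vectors, dim=2)
```

Unlike concatenation, the output size equals the embedding dimension no matter how many tokens the text contains.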

Multilayer Perceptron (MLP) -Feature Driven Model
This model is based on various lexical and semantic features. We trained a multilayer perceptron on top of the following features.
• Character n-grams: tf-idf weighted counts of continuous sequences of 2, 3, and 4 characters;
• Word n-grams: tf-idf weighted counts of continuous sequences of 1, 2, 3, and 4 words;
• POS tags: part-of-speech tags of each token in the text;
• Lexicons: the following set of features is used for each of the four lexicons: Opinion Lexicon (Liu et al., 2005)
In the PMI equation, freq(w, pos) is the frequency of word w in positive text, freq(pos) is the number of words in positive headlines, and N is the total number of tokens in the corpus.
• Microblog-specific features: we use the following features only for the microblog messages track:
- the number of words with all characters in upper case;
- the number of favorite and retweet counts of a message (tweet);
- the number of hashtags in the message.
The multilayer perceptron network has three hidden layers and one output layer consisting of 500, 50, 10 and 1 neurons respectively.
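Two of the feature families above can be sketched in a few lines. The sketch returns raw character n-gram counts and omits the tf-idf weighting, and it counts any whitespace-separated token whose cased characters are all upper case (so a cashtag such as $AAPL is counted); both choices are assumptions, as are the function names.

```python
from collections import Counter
from typing import Dict, Tuple

def char_ngrams(text: str, sizes: Tuple[int, ...] = (2, 3, 4)) -> Counter:
    """Raw counts of contiguous character sequences (tf-idf weighting omitted)."""
    counts: Counter = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def microblog_features(text: str) -> Dict[str, int]:
    """Counts of all-uppercase tokens and hashtags in a message."""
    tokens = text.split()
    return {
        # Assumption: whitespace tokenization; str.isupper() ignores non-letters.
        "n_upper": sum(1 for tok in tokens if tok.isupper()),
        "n_hashtags": sum(1 for tok in tokens if tok.startswith("#") and len(tok) > 1),
    }

bigrams = char_ngrams("abab", sizes=(2,))
counts = microblog_features("BUY $AAPL now #stocks #bullish")
```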

Ensemble Model
Ensembling several systems is an effective technique for improving overall performance, since the systems can compensate for each other's errors. Ensembling usually reduces the generalization error, which in turn reduces overfitting. Here we discuss the second stage of our proposed system. We merge the predicted sentiment scores of all four models (CNN, LSTM, Vector Averaging, Feature Driven) to create a new feature vector, and then feed it into a multilayer perceptron (MLP) network for training. Figure 1 shows the overall schema of the proposed approach.
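The second stage can be sketched as follows: the four base-model scores for each instance are stacked into a 4-dimensional feature vector and passed through a small MLP. The single hidden layer and the hand-set weights are placeholders chosen for illustration; in practice the weights would be learned, and the actual ensemble architecture is not specified in this section.

```python
import math
from typing import List, Sequence

def stack_predictions(cnn: Sequence[float], lstm: Sequence[float],
                      vec_avg: Sequence[float], feature: Sequence[float]) -> List[List[float]]:
    """One row per instance, one column per base model."""
    return [list(row) for row in zip(cnn, lstm, vec_avg, feature)]

def mlp_forward(x: Sequence[float], hidden_w: List[List[float]], hidden_b: List[float],
                out_w: List[float], out_b: float) -> float:
    """One ReLU hidden layer followed by a tanh output in [-1, 1]."""
    hidden = [max(0.0, sum(w * v for w, v in zip(ws, x)) + b)
              for ws, b in zip(hidden_w, hidden_b)]
    return math.tanh(sum(w * h for w, h in zip(out_w, hidden)) + out_b)

# Toy base-model scores for two instances.
stacked = stack_predictions([0.4, -0.2], [0.5, -0.1], [0.3, -0.3], [0.6, 0.0])
# Placeholder (untrained) weights: two hidden units, one output unit.
hidden_w = [[0.25] * 4, [-0.25] * 4]
hidden_b = [0.0, 0.0]
final_scores = [mlp_forward(row, hidden_w, hidden_b, [1.0, -1.0], 0.0) for row in stacked]
```

The tanh output keeps the ensemble prediction inside the task's score range of -1 to +1.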

Datasets
The training datasets comprise 1700 and 1142 instances of microblog messages and news headlines respectively. We used the span in the microblog messages track and the title in the news headlines track as the textual feature for all experiments described in this paper. For validation we did an 80:20 train:development split of the full datasets. The split was done such that the relative percentage of sources (Twitter and StockTwits) and the mean and standard deviation of sentiment scores were the same in the training and development data. We trained our models on the training data and selected models for ensembling based on results on the development data. Figures 2 and 3 show the distribution of sentiment scores for the two datasets.
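A minimal sketch of a source-preserving 80:20 split is shown below. It stratifies by source only; matching the mean and standard deviation of the scores, as described above, would need an additional check or repeated sampling that is not shown. The record layout (a dict with a "source" key) and the function name are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

def split_by_source(instances: List[Dict], dev_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[Dict], List[Dict]]:
    """Split each source group separately so both parts keep the source ratio."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for inst in instances:
        by_source[inst["source"]].append(inst)
    train, dev = [], []
    for source in sorted(by_source):
        group = by_source[source][:]
        rng.shuffle(group)
        n_dev = round(len(group) * dev_fraction)
        dev.extend(group[:n_dev])
        train.extend(group[n_dev:])
    return train, dev

# Toy data: 10 Twitter and 20 StockTwits messages.
data = ([{"source": "twitter", "score": 0.1}] * 10
        + [{"source": "stocktwits", "score": -0.2}] * 20)
train_set, dev_set = split_by_source(data)
```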

Experiments
We used the Python-based neural network package Keras for the implementation. We use ReLU activations for the intermediate layers and tanh activation for the final layer. Dropout (Srivastava et al., 2014) is a very effective regularization technique for preventing over-fitting of a network. It restricts the convergence of weights to identical positions by randomly turning off neurons during forward propagation. We use 15% dropout for regularization and the Adam optimizer (Kingma and Ba, 2014) for optimization. We train and validate each model on 80% and 20% of the full data respectively. Table 1 shows the results of our deep learning models (D), feature-based model (F) and vector averaging models (V) on the validation set. It also reports the results of our ensemble model (E) on the development set. The ensemble improves performance by a margin of 2-3%.
We submitted the E1 and E6 systems for the final evaluation and obtained test cosine similarity scores of 0.751 and 0.697 for the microblog messages and news headlines tracks respectively. Table 2 reports the cosine similarity of our system.
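The idea of randomly turning off neurons can be illustrated with a short sketch of the common "inverted" dropout variant, in which surviving activations are rescaled so that their expected value is unchanged. The function name is hypothetical and the code only illustrates the training-time behaviour; it is not taken from the system itself.

```python
import random
from typing import List

def dropout(activations: List[float], rate: float, rng: random.Random) -> List[float]:
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
layer_output = [0.5, 1.0, 0.25, 2.0]
dropped = dropout(layer_output, rate=0.15, rng=rng)
```

At prediction time dropout is disabled and the activations pass through unchanged.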
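For reference, the cosine similarity between a vector of gold scores and a vector of predicted scores can be computed as in the sketch below. This is the plain cosine similarity only; any additional weighting applied by the official task scorer is not reproduced here, and the function name is hypothetical.

```python
import math
from typing import Sequence

def cosine_similarity(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Cosine of the angle between the gold and predicted score vectors."""
    dot = sum(g * p for g, p in zip(gold, pred))
    norm = math.sqrt(sum(g * g for g in gold)) * math.sqrt(sum(p * p for p in pred))
    return dot / norm if norm else 0.0

same_direction = cosine_similarity([0.5, -0.5], [1.0, -1.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the measure depends on direction rather than magnitude, predictions that are uniformly scaled versions of the gold scores still obtain the maximum similarity.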

Conclusion
In this paper we presented an MLP-based ensemble technique for predicting the sentiment score. The proposed approach is a robust regression algorithm that predicts optimistic or pessimistic sentiments of associated stocks and companies in financial text. We implemented a variety of semantic and linguistic features for our analysis of noisy text such as tweets and news headlines. We combined the predictions of four models (i.e. CNN, LSTM, Vector Averaging MLP and Feature Driven MLP) to compute the final prediction. Our submission stood 2nd and 7th in the two tracks, involving microblog messages and news headlines respectively, of the SemEval 2017 shared task on 'Fine-Grained Sentiment Analysis of Financial Microblogs and News'.

• Pointwise Mutual Information (PMI): We calculate a sentiment score for each term in our training corpus to get the association of each term with positive as well as negative sentiment:
$score(w) = PMI(w, pos) - PMI(w, neg)$
PMI is calculated as follows:
$PMI(w, pos) = \log_2 \frac{freq(w, pos) \cdot N}{freq(w) \cdot freq(pos)}$
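The PMI-based score can be sketched as follows. PMI(w, neg) is assumed to be defined symmetrically to PMI(w, pos), and terms that occur in only one class are skipped because the logarithm of zero is undefined; the source does not say how such terms were handled. The function name and toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def pmi_scores(pos_texts: List[List[str]], neg_texts: List[List[str]]) -> Dict[str, float]:
    """score(w) = PMI(w, pos) - PMI(w, neg) for terms seen in both classes."""
    pos = Counter(tok for text in pos_texts for tok in text)
    neg = Counter(tok for text in neg_texts for tok in text)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    total = n_pos + n_neg  # N: total number of tokens in the corpus
    scores = {}
    for w in set(pos) & set(neg):  # assumption: skip terms missing from a class
        freq_w = pos[w] + neg[w]
        pmi_pos = math.log2(pos[w] * total / (freq_w * n_pos))
        pmi_neg = math.log2(neg[w] * total / (freq_w * n_neg))
        scores[w] = pmi_pos - pmi_neg
    return scores

# Toy tokenized corpus: two positive and two negative texts.
positive = [["gain", "up"], ["gain", "market"]]
negative = [["loss", "market"], ["market", "down"]]
scores = pmi_scores(positive, negative)
```

In this toy corpus "market" appears once among four positive tokens and twice among four negative tokens, so its score is log2(1/2) = -1, indicating a negative association.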

Figure 2: Histogram plot of sentiment scores in microblogs messages

Table 1: Cosine similarity score on validation set.

Table 2: Cosine similarity score on test dataset.