Capturing User and Product Information for Document Level Sentiment Analysis with Deep Memory Network

Document-level sentiment classification is a fundamental problem that aims to predict a user’s overall sentiment about a product from a document. Several methods have been proposed to tackle the problem, but most of them fail to consider the influence of the users who express the sentiment and the products being evaluated. To address this issue, we propose a deep memory network for document-level sentiment classification that captures user and product information at the same time. To demonstrate the effectiveness of our algorithm, we conduct experiments on IMDB and Yelp datasets, and the results indicate that our model achieves better performance than several existing methods.


Introduction
Sentiment analysis, sometimes known as opinion mining, is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes and emotions from written language. It is one of the most active and critical research areas in natural language processing (Liu, 2012). On the one hand, from the industry point of view, knowing how consumers feel based on their comments is beneficial and may support strategic market decisions. On the other hand, potential customers are often interested in other people's opinions in order to find the choices that best fit their preferences (Moraes et al., 2013).
Previous studies tackled the sentiment analysis problem at various levels of granularity, from document level to sentence level, depending on the objectives of the application (Zhang et al., 2009). In this work, we focus on document-level sentiment classification. The task is to predict a user's overall sentiment or polarity about a product in a document (Pang and Lee, 2008).
Most existing methods rely mainly on local text information and ignore the influence of users and products (Tang et al., 2015). In practice, both users and products exhibit certain consistencies. For example, lenient users tend to give higher ratings than fastidious ones even when they write the same review. Likewise, some products consistently receive low ratings because of their poor quality, and vice versa. It is therefore necessary to leverage the individual preferences of users and the overall qualities of products in order to achieve better performance. Tang et al. (2015) proposed a novel method dubbed User Product Neural Network (UPNN), which captures user- and product-level information for sentiment classification. Their approach has shown great promise, but one major drawback is that for users and products with limited information, it is hard to train the representation vector and matrix. Inspired by the recent success of computational models with attention mechanisms and explicit memory (Graves et al., 2014; Sukhbaatar et al., 2015), we address this issue by proposing a method based on a deep memory network and Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997). The model can be divided into two parts. In the first part, we use an LSTM to represent each document. Afterwards, we apply a deep memory network consisting of multiple computational layers to predict the rating of each document; each layer is a content-based attention model.
To demonstrate the effectiveness of our algorithm, we conducted experiments on three datasets derived from IMDB and the Yelp Dataset Challenge and compared our model with several other algorithms. Experimental results show that, by leveraging users and products, our algorithm outperforms baseline methods for document-level sentiment classification.

Memory Network
Weston et al. (2014) introduced a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component. The long-term memory can be read from and written to, and it can then be used for prediction. Generally, a memory network consists of an array of objects called the memory m and four components I, G, O and R, where I converts the input to an internal feature representation, G updates old memories, O generates an output representation and R outputs a response.
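To make the roles of the four components concrete, the following is a toy sketch only, not the model of Weston et al. (2014). The class name, the bag-of-words featurizer and the dot-product scoring are all illustrative assumptions; in the original work the components are learned.

```python
class ToyMemoryNetwork:
    """Hypothetical, non-learned illustration of the I, G, O, R components."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.memory = []  # the memory m: a list of (features, text) slots

    def I(self, text):
        # Input map: convert raw text to an internal feature vector
        # (here simply word counts over a fixed vocabulary).
        words = text.lower().split()
        return [words.count(w) for w in self.vocabulary]

    def G(self, features, text):
        # Generalization: update the memory given the new input
        # (here simply append a new slot).
        self.memory.append((features, text))

    def O(self, query_features):
        # Output map: pick the memory slot that best matches the query
        # (here the slot with the largest dot product).
        def score(slot):
            return sum(a * b for a, b in zip(query_features, slot[0]))
        return max(self.memory, key=score)

    def R(self, output):
        # Response: turn the output representation into a reply
        # (here the stored text itself).
        return output[1]

    def store(self, text):
        self.G(self.I(text), text)

    def answer(self, question):
        return self.R(self.O(self.I(question)))


net = ToyMemoryNetwork(["john", "mary", "kitchen", "garden"])
net.store("John went to the kitchen")
net.store("Mary went to the garden")
print(net.answer("Where is Mary"))
```

The query shares only the word "mary" with the second stored sentence, so that slot is retrieved and returned.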
Based on this work, Sukhbaatar et al. (2015) proposed a neural network with a recurrent attention model over a possibly large external memory. Unlike the previous model, theirs is trained end-to-end and hence requires significantly less supervision during training. They showed that it yields improved results in language modeling and question answering.
Inspired by the success of memory networks, Tang et al. (2016) introduce a deep memory network for aspect-level sentiment classification. The architecture of their model is similar to the previous one, and experimental results demonstrate that their approach performs comparably to other state-of-the-art systems. Li et al. (2017) decompose the task of attitude identification into two subtasks, target detection and polarity classification, and then solve the problem with a deep memory network, so that signals produced in target detection provide clues for polarity classification and the predicted polarity provides feedback for the identification of targets.

Sentiment Classification
Most existing work tackles sentiment classification by manually designing effective features, such as text topic (Ganu et al., 2009) and bag-of-opinion (Qu et al., 2010). Some work takes user information into consideration. For example, Gao et al. (2013) design user-specific features to capture user leniency. Li et al. (2014) incorporate textual topic and user-word factors with supervised topic modeling. Tang et al. (2015) point out that it is critical to leverage users and products for document-level sentiment classification. They assume that there are four types of consistencies for sentiment classification and validate the influence of users and products in terms of sentiment and text on massive IMDB and Yelp reviews. Their model represents each user and product as both a vector and a matrix in order to capture these consistencies and then applies a convolutional neural network to solve the task.
To the best of our knowledge, no previous work has applied a deep memory network to capture user and product information for document-level sentiment classification.

Proposed Methods
In this section, we present the details of the User Product Deep Memory Network (UPDMN) for document-level sentiment classification.

Basic Symbol and Definition
First, let U, P and D denote the sets of users, products and documents, respectively. If user u ∈ U writes a document d ∈ D about a product p ∈ P and gives a rating, we denote U(d) = {u_d | u_d is written by u, u_d ≠ d} and P(d) = {p_d | p_d is written about p, p_d ≠ d}. Our task can then be formalized as follows: given that u writes a document d about a product p, we output the predicted score y for the document d based on the input <d, U(d), P(d)>. These symbols are explained in detail in the following parts. Figure 1 illustrates the general framework of our approach. Inspired by the use of memory networks in question answering and aspect-level sentiment analysis (Sukhbaatar et al., 2015; Tang et al., 2016), our model consists of multiple computational layers (hops), each of which contains an attention layer and a linear layer.
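As an illustration of these definitions, the sketch below builds U(d) and P(d) from a list of reviews. It assumes that both sets contain only the other documents of the same user or product, excluding d itself; the function name and the triple layout of the input are hypothetical.

```python
from collections import defaultdict


def build_context_sets(reviews):
    """Build U(d) and P(d) for every document.

    reviews: list of (doc_id, user_id, product_id) triples (assumed layout).
    Returns two dicts mapping doc_id to a list of other doc_ids.
    """
    by_user = defaultdict(list)
    by_product = defaultdict(list)
    for doc_id, user_id, product_id in reviews:
        by_user[user_id].append(doc_id)
        by_product[product_id].append(doc_id)

    U, P = {}, {}
    for doc_id, user_id, product_id in reviews:
        # Assumption: d itself is excluded from its own context sets.
        U[doc_id] = [x for x in by_user[user_id] if x != doc_id]
        P[doc_id] = [x for x in by_product[product_id] if x != doc_id]
    return U, P


reviews = [
    ("d1", "alice", "film_a"),
    ("d2", "alice", "film_b"),
    ("d3", "bob", "film_a"),
]
U, P = build_context_sets(reviews)
print(U["d1"], P["d1"])
```

Here d1 shares its author with d2 and its product with d3, so those two documents form its context.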

General Framework of UPDMN
For every document in U(d) and P(d), we embed it into a continuous vector d_i and store it in the memory. The model writes all documents to the memory up to a fixed buffer size. Suppose we are given {d_i} = {d_1, ..., d_n} to be stored in memory; for each layer, we convert them into memory vectors {m_i} using an embedding matrix. The document d is likewise embedded into q. Then, we compute the match {p_i} between q and each memory m_i. Afterwards, we embed {d_i} into output vectors {c_i}, which are weighted by {p_i} to form the output of the layer.

Embedding Documents
Although there are several state-of-the-art techniques for embedding words into vectors (Mikolov et al., 2013a), in document-level sentiment classification the document to be classified is usually too long to be represented as a single vector in the same way. Different approaches have been tried. For example, Kalchbrenner et al. (2014) apply a convolutional neural network to model sentences, and Li et al. (2015) introduce an LSTM model that hierarchically builds an embedding for a paragraph from the embeddings of its sentences and words. Some of this work could be incorporated into our method. Here, however, we only use an LSTM to embed each document: every word in the document is fed into the LSTM, and the final representation is obtained by averaging the hidden states of all words. The experimental results show that this simple embedding method obtains satisfactory results.
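The averaging step can be sketched as follows. This is a minimal pure-Python illustration with a standard LSTM cell, not the implementation used in the experiments; the parameter layout, the helper names and the random initialization are assumptions, and the word vectors would normally come from pretrained embeddings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, x):
    """Return W x + b, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_document_embedding(word_vectors, params, hidden_size):
    """Feed each word vector to an LSTM and average the hidden states.

    Assumes a non-empty document.
    """
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    total = [0.0] * hidden_size
    for x in word_vectors:
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in affine(params["Wi"], params["bi"], z)]
        f = [sigmoid(v) for v in affine(params["Wf"], params["bf"], z)]
        o = [sigmoid(v) for v in affine(params["Wo"], params["bo"], z)]
        g = [math.tanh(v) for v in affine(params["Wg"], params["bg"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        total = [t + hj for t, hj in zip(total, h)]
    # Document representation: mean of the hidden states over all words.
    return [t / len(word_vectors) for t in total]


def random_params(input_size, hidden_size, seed=0):
    """Hypothetical initializer: small uniform weights, zero biases."""
    rng = random.Random(seed)
    params = {}
    for gate in "ifog":
        params["W" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        params["b" + gate] = [0.0] * hidden_size
    return params


params = random_params(input_size=3, hidden_size=4)
doc = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.4, 0.4, -0.2]]
q = lstm_document_embedding(doc, params, hidden_size=4)
print(len(q))
```

Averaging all hidden states, rather than keeping only the last one, lets every word contribute to the representation of a long review.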

Attention Model
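After obtaining the embedding vector q for document d and the memory vectors {m_i}, we calculate the match between q and each m_i, following Sukhbaatar et al. (2015), as the softmax of their inner product:

$$p_i = \mathrm{softmax}(q^{\top} m_i) \qquad (1)$$

where $\mathrm{softmax}(z_i) = e^{z_i} / \sum_j e^{z_j}$. Afterwards, we compute the corresponding output o for each hop by summing over the c_i, weighted by the probability vector from the input:

$$o = \sum_i p_i c_i \qquad (2)$$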
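A single attention hop can be sketched as below. The sketch assumes that the match is the softmax of the inner product between q and each m_i, as in Sukhbaatar et al. (2015); the function names are illustrative, and the linear layer and the stacking of several hops are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_hop(q, memory_vectors, output_vectors):
    """Return the attention weights p and the hop output o."""
    # Match between the query and each memory vector (assumed: inner product).
    scores = [sum(qj * mj for qj, mj in zip(q, m)) for m in memory_vectors]
    p = softmax(scores)
    # Output: sum of the output vectors c_i weighted by p_i.
    dim = len(output_vectors[0])
    o = [sum(p_i * c[j] for p_i, c in zip(p, output_vectors)) for j in range(dim)]
    return p, o


q = [1.0, 0.0]
m = [[1.0, 0.0], [0.0, 1.0]]
c = [[2.0, 0.0], [0.0, 2.0]]
p, o = attention_hop(q, m, c)
print(p, o)
```

Because q is aligned with the first memory vector, the first memory receives the larger weight and dominates the output.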

Final Prediction and Training Strategy
At the last hop, the output vector is fed into a softmax layer, which generates a probability distribution {y_i} over ratings. The rating with the highest probability is taken as our final prediction py. During training, we minimize the cross-entropy error of sentiment classification in a supervised manner:

$$\mathrm{loss} = -\sum_{d \in D} \sum_{y_i \in Y} I(y = y_i \mid d) \, \log P(y = y_i \mid d) \qquad (3)$$

where D here denotes the training documents, Y is the collection of sentiment categories, I(y = y_i | d) is 1 or 0, indicating whether the correct category for d is y_i, and P(y = y_i | d) represents the probability of classifying document d as category y_i.
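The prediction rule and the training objective can be illustrated with a short sketch. It assumes the softmax layer has already produced one score (logit) per rating; the function names are hypothetical. Because the indicator selects only the correct category, the loss reduces to the negative log-probability of the gold rating, summed over documents.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_rating(probs, ratings):
    """Return the rating with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ratings[best]


def cross_entropy(batch_probs, gold_indices):
    """Sum over documents of -log P(y = gold category | d)."""
    return -sum(math.log(probs[g]) for probs, g in zip(batch_probs, gold_indices))


ratings = [1, 2, 3]
probs = softmax([0.1, 2.0, -1.0])
print(predict_rating(probs, ratings))
print(cross_entropy([probs], [1]))
```

A uniform distribution over two categories gives a loss of log 2 per document, which is a convenient sanity check.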

Experiment
In this section, we first describe the experimental settings and then present the results.

Experimental Settings
We use the same datasets as Tang et al. (2015), which are derived from IMDB (Diao et al., 2014) and the Yelp Dataset Challenge in 2013 and 2014. Statistical information about the datasets is given in Table 1.
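To measure the performance of our model, we use three metrics. Specifically, we use accuracy to measure the overall sentiment classification performance, and MAE and RMSE to measure the divergence between the prediction py and the ground truth gy. The formulas for these three metrics are as follows:

$$\mathrm{Accuracy} = \frac{T}{N}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| py_i - gy_i \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( py_i - gy_i \right)^2}$$

where N is the number of test documents and T is the number of documents whose predicted rating equals the ground truth.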
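The three metrics follow their standard definitions and can be computed as in the sketch below. The function names and the example ratings are made up for illustration.

```python
import math


def accuracy(pred, gold):
    """Fraction of documents whose predicted rating equals the ground truth."""
    return sum(1 for p, g in zip(pred, gold) if p == g) / len(gold)


def mae(pred, gold):
    """Mean absolute error between predictions and ground truth."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def rmse(pred, gold):
    """Root mean squared error between predictions and ground truth."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


py = [5, 3, 1, 4]  # predicted ratings (illustrative)
gy = [5, 4, 1, 2]  # ground-truth ratings (illustrative)
print(accuracy(py, gy), mae(py, gy), rmse(py, gy))
```

For these four reviews the accuracy is 0.5 and the MAE is 0.75. RMSE penalizes the two-star miss more heavily than MAE does, which is why both are reported alongside accuracy.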

Baseline Models
We compare UPDMN with the following models:
(1) Majority: it assigns the majority sentiment category of the training set to each review in the test set.
(2) Trigram: it first takes unigrams, bigrams and trigrams as features and then trains a classifier with SVM (Fan et al., 2008).
(3) TextFeature: it takes hand-crafted text features such as word/character n-grams and negation features, and then trains a classifier with SVM.
(7) RNTN+RNN: it represents each sentence with RNTN, composes the document with a recurrent neural network, and then averages the hidden vectors of the recurrent neural network as the features (Socher et al., 2013).
(9) JMARS: it is a recommendation algorithm that leverages users and aspects of a review with collaborative filtering and topic modeling (Diao et al., 2014).
(10) UPNN: as stated above, it also leverages user and product information for document-level sentiment classification (Tang et al., 2015).
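The simplest of these baselines, Majority, can be written in a few lines. The sketch below is illustrative only; the function name and the toy labels are assumptions.

```python
from collections import Counter


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test review."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return [majority_label] * n_test


preds = majority_baseline([5, 4, 5, 3, 5, 1], n_test=3)
print(preds)
```

This baseline ignores the review text entirely, so it gives a floor that any text-based model should exceed.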

Experimental Results and Discussion
The experimental results are given in Table 2. The results of the baseline models are those reported by Tang et al. (2015). Our model is abbreviated as UPDMN(k), where k is the number of hops. As the number of hops increases, the performance of UPDMN initially improves, which indicates that multiple hops can indeed capture more information. However, with too many hops the performance degrades, which may be caused by over-fitting.
Compared with the other models, our model achieves superior results with a proper setting. These results demonstrate the effectiveness of UPDMN and the necessity of utilizing user and product information at the document level.
It should be noted that several improvements can still be made, such as a better representation of documents or a more sophisticated attention mechanism. We believe that our model has great potential and can be improved in many ways.