Word Embedding and Topic Modeling Enhanced Multiple Features for Content Linking and Argument / Sentiment Labeling in Online Forums

This paper adopts multiple grammatical and semantic features for content linking and argument/sentiment labeling in online forums. For content linking, there are two main methods. First, we use the deep features obtained from a word embedding model to compute sentence similarity. Second, we use multiple traditional features to locate candidate linking sentences, and then adopt a voting method to obtain the final result. For argument labeling, LDA topic modeling is used to mine latent semantic features and K-means clustering is applied, while features from sentiment dictionaries and rule-based sentiment analysis are integrated for sentiment labeling. Experimental results show that our methods are effective.


Introduction
Comments on news and their providers in online forums have been increasing rapidly in recent years, with large numbers of participating users and a huge amount of interactive content. How can we understand this mass of comments effectively? A crucial initial step towards this goal is content linking, which is to determine what comments link to, whether specific news snippets or comments by other users. Furthermore, a set of labels for a given link may be articulated to capture phenomena such as agreement and sentiment with respect to the comment target.
Researchers have tried various features and methods for sentiment and argument labeling. The main features are different kinds of sentiment dictionaries, and the basic method is rule-based, while the major methods are based on statistical machine learning algorithms (Aker et al., 2015; Tanev, 2015; Maynard and Funk, 2011).

Task Description
We work on three tasks for English and Italian in this paper. The first is content linking, which is to find all the linking pairs for comment sentences. In every pair, one sentence belongs to the original article or to an earlier comment by an author, and the other belongs to a later comment. The second and third tasks are to attach two kinds of labels to the linking pairs found in the first task: an argument label and a sentiment label. The argument label captures whether or not a commentator agrees with the commented author, while the sentiment label captures the sentiment of the comment sentence. Experiments are conducted on the training data released by MultiLing 2017, including 20 English news articles (from The Guardian) and 5 Italian news articles (from Le Monde), together with their comments.

Methods
For content linking, we adopt the word embedding model to obtain word vectors that carry deeper semantic features as linking information for sentence pairs. Besides, we also use some traditional sentence similarity features that performed well in our experiments, and explore how to fuse them with the word embedding features: first, every single feature is used to select one linking sentence; then a voting method chooses the most frequently selected sentence as the final result. For sentiment labeling, we mainly use rule-based sentiment analysis, while the LDA (Latent Dirichlet Allocation) (Blei et al., 2003) topic model and K-means (Hartigan and Wong, 1979) are integrated to obtain the argument label. Figure 1 shows the process for content linking.

Pre-Processing
We crawl 1.5 GB of data from The Guardian website to train word vectors for English, and about 1 GB of data from Wikipedia for Italian. We then use the word2vec tool (Mikolov et al., 2013) for training.

Method 1-Word Vector Algorithm
After training the word embedding models, a sentence in the corpus can be expressed as:

W_i = (w_1, w_2, ..., w_{length_i})    (1)

where w_t is the 300-dimensional word vector of word t. Two sentences W_i and W_j can then form a calculating matrix M_{i,j} whose entry in row t and column v is the similarity of the word pair (w_t, w_v):

M_{i,j}[t, v] = sim(w_t, w_v)    (2)

Before computing sim(w_t, w_v), we need some preprocessing steps: stemming, and removal of stop words and punctuation. It is also essential to check the relations between word t and word v in WordNet: if one appears among the hyponyms/hypernyms of the other, the two words are treated as the same.
The cosine distance between word vectors represents sim(w_t, w_v), and the similarity of sentences i and j is:

sim(i, j) = (Σ maxM_{m,n}) / sqrt(length_i × length_j)    (3)

where the maxM_{m,n} values are obtained through the following concrete steps. First, find the maximum of M_{i,j}, then delete the row and column of that maximum. Next, find the maximum of the remaining matrix and remove its row and column as before. Repeat this procedure until the matrix is empty, and finally add up all the maximum values. Here length_i is the number of word vectors in sentence i, and the denominator sqrt(length_i × length_j) reduces the influence of sentence length.
We consider that the maximum value in the matrix represents the best matching word pair between the two sentences. At each step we choose the maximum value and delete its word pair from the matrix for the next iteration, until the matrix is empty. As a result, we find all the best matching word pairs between the two sentences, and the accumulated word similarities of these pairs represent the similarity of the two sentences.
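As a rough illustration of this greedy matching (a sketch, not the authors' actual implementation), the following code uses toy low-dimensional vectors in place of trained 300-dimensional embeddings and normalizes by the square root of the product of the sentence lengths:

```python
# Sketch of the greedy matching similarity: repeatedly take the largest
# entry of the word-pair similarity matrix, delete its row and column,
# and sum the maxima; normalize by sqrt(length_i * length_j).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_similarity(vecs_i, vecs_j):
    # similarity matrix M[(t, v)] = cosine(w_t, w_v)
    M = {(t, v): cosine(wt, wv)
         for t, wt in enumerate(vecs_i)
         for v, wv in enumerate(vecs_j)}
    used_rows, used_cols, total = set(), set(), 0.0
    for _ in range(min(len(vecs_i), len(vecs_j))):
        # best remaining word pair (rows/columns not yet deleted)
        (t, v), best = max(
            ((k, s) for k, s in M.items()
             if k[0] not in used_rows and k[1] not in used_cols),
            key=lambda kv: kv[1])
        total += best
        used_rows.add(t)
        used_cols.add(v)
    return total / math.sqrt(len(vecs_i) * len(vecs_j))

a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
print(round(sentence_similarity(a, b), 3))  # identical sentences -> 1.0
```

With identical sentences the matched maxima sum to the sentence length, so the normalized similarity is exactly 1.0.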
Based on the above sentence similarities, we can extract those sentences with the highest similarity to a comment sentence as its linking result.

Method 2-Feature Fusion Algorithm
This algorithm is only for English. We use two kinds of features: one from lexicons, the other from sentence similarities.
We use three lexicons: a Linked Text high-frequency word lexicon (Lexicon 1), an LDA lexicon (Lexicon 2), and a Comment Text and Linked Text co-occurrence lexicon (Lexicon 3). For Lexicon 1, we manually pick high-frequency words from the standard answers, and then expand them through WordNet and word vectors, resulting in a lexicon. For Lexicon 2, we use the LDA model to train on the news and comments, obtaining a lexicon of 25 latent topics for every file independently. For Lexicon 3, we obtain the co-occurrence degree between words from word frequency statistics of each comment and its linked sentence in the training corpus.
As for sentence similarities, we use word vector similarity, Jaccard similarity, IDF similarity, res similarity, jcn similarity, and path similarity. Word vector similarity is calculated as in Method 1. For IDF similarity, we add up the IDF values of the words shared by the two sentences. The last three similarities are computed from WordNet.
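Two of these surface features can be sketched as follows; the toy corpus and IDF table are illustrative, not values from the paper:

```python
# Sketch of two surface similarity features: Jaccard similarity over word
# sets, and "IDF similarity" as the sum of IDF values of the words shared
# by the two sentences.
import math

def jaccard(s1, s2):
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if a | b else 0.0

def idf_similarity(s1, s2, idf):
    shared = set(s1) & set(s2)
    return sum(idf.get(w, 0.0) for w in shared)

# toy document collection used only to derive IDF values
docs = [["tax", "reform", "passed"], ["tax", "cut"], ["weather", "report"]]
N = len(docs)
idf = {w: math.log(N / sum(w in d for d in docs))
       for d in docs for w in d}

s1 = ["tax", "reform", "passed"]
s2 = ["tax", "reform", "delayed"]
print(round(jaccard(s1, s2), 2))                  # 2 shared / 4 total = 0.5
print(round(idf_similarity(s1, s2, idf), 2))      # log(3/2) + log(3) ≈ 1.5
```

Rarer shared words ("reform", document frequency 1) contribute more to the IDF similarity than common ones ("tax", document frequency 2), which is the point of this feature.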
For every feature, we take the sentence with the highest score under that feature. Among the nine sentences chosen by the nine features, we then use a voting method to select the most frequently chosen sentence as the final linking result. When several sentences receive the same number of votes, we choose the first one according to sentence order in the input news and comments. Figure 2 shows the process for argument labeling.
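The voting step can be sketched as follows; the sentence identifiers and per-feature winners are placeholders:

```python
# Sketch of the voting step: each of the nine features nominates one
# candidate linked sentence; the candidate with the most votes wins,
# with ties broken by sentence order in the input.
from collections import Counter

def vote(candidates, sentence_order):
    counts = Counter(candidates)
    best = max(counts.values())
    tied = [s for s, c in counts.items() if c == best]
    # tie-break: earliest sentence in the original news/comment order
    return min(tied, key=sentence_order.index)

# nine per-feature winners (sentence ids for F1..F9, placeholders)
candidates = [3, 5, 3, 7, 3, 5, 5, 2, 9]
order = [2, 3, 5, 7, 9]  # sentence order in the input news and comments
print(vote(candidates, order))  # -> 3 (3 and 5 tie; 3 comes first in order)
```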

Argument Label
Given a collection of sentences in the input file, we wish to discover the topic distribution of every sentence through the LDA model. We first generate the input file for LDA: every sentence is converted into its bag-of-words representation, which assumes that word order can be neglected. During LDA modeling, we set the topic number to 15 according to our experiments; that is, in the subsequent K-means clustering, the feature is a 15-dimensional vector. We run K-means to cluster all sentences into two categories. For every sentence pair, if the two sentences belong to the same category, we set the label to in favour; otherwise, to against.

Sentiment Label

There are three kinds of seed sentiment dictionaries drawn from the OpinionFinder system (MPQA, http://mpqa.cs.pitt.edu/): the Subjectivity lexicon, the Intensifier lexicon, and the Valenceshifters lexicon. The Intensifier lexicon contains words that raise the sentiment level, while the Valenceshifters lexicon contains words that can reverse the sentiment label.
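The K-means step of argument labeling can be sketched as follows, assuming the per-sentence topic distributions from LDA are already available; toy 3-dimensional vectors stand in for the 15-dimensional ones used in the paper:

```python
# Sketch of argument labeling: cluster sentences' topic vectors into two
# groups with plain K-means; a pair in the same cluster is labeled
# "in favour", otherwise "against".
import math
import random

def kmeans2(points, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest of the two centers
        assign = [min((0, 1), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # update step: move each center to the mean of its members
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

topics = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05],   # two agreeing sentences
          [0.1, 0.1, 0.8], [0.05, 0.15, 0.8]]     # two opposing sentences
labels = kmeans2(topics)

def argument_label(i, j):
    return "in favour" if labels[i] == labels[j] else "against"

print(argument_label(0, 1))  # same cluster -> in favour
print(argument_label(0, 2))  # different clusters -> against
```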
The original dictionaries are in English; we use machine translation to add Italian vocabulary. With DLDA (Chen et al., 2014), we obtain sentiment weights for all words in the corpus. Finally, a word not included in the seeds is given the same polarity as a seed word if the distance between their sentiment weights is negligible.
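A minimal sketch of this seed-expansion idea, assuming per-word sentiment weights are already given; the weights, seed words, and distance threshold below are illustrative, not values from DLDA or the paper:

```python
# Sketch of lexicon expansion: a non-seed word inherits the polarity of
# the seed word whose sentiment weight is closest, provided the weight
# distance is below a small threshold.
def propagate_polarity(word_weights, seeds, eps=0.05):
    labels = dict(seeds)  # word -> polarity for seed words
    for w, weight in word_weights.items():
        if w in labels:
            continue
        nearest = min(seeds, key=lambda s: abs(word_weights[s] - weight))
        if abs(word_weights[nearest] - weight) <= eps:
            labels[w] = seeds[nearest]
    return labels

weights = {"good": 0.8, "great": 0.82, "bad": 0.1,
           "awful": 0.08, "table": 0.5}
seeds = {"good": "pos", "bad": "neg"}
# "great" -> pos, "awful" -> neg; "table" stays unlabeled (too far)
print(propagate_polarity(weights, seeds))
```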
Through DLDA, every word gets a sentiment state. We map each sentiment state to a word score as in Table 1, and accumulate the word scores in a sentence to obtain the sentence score, which is then mapped to a sentiment label as in Table 2.
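A sketch of the scoring rule, using word scores in the spirit of Table 1 below; the lexicon entries and the Table 2 mapping (taken here as a simple sign test) are assumptions:

```python
# Sketch of rule-based sentence scoring: word scores are accumulated, and
# a valence shifter flips the running score's sign when its own score
# opposes it; otherwise scores are simply summed.
WORD_SCORE = {"good": 1, "excellent": 2, "bad": -1, "terrible": -2}
VALENCE_SHIFTERS = {"not": -1, "hardly": -1}  # illustrative shifter scores

def sentence_score(words):
    score = 0
    for w in words:
        if w in VALENCE_SHIFTERS:
            ws = VALENCE_SHIFTERS[w]
            # flip the sign when the shifter's score opposes the running score
            if (score > 0 and ws < 0) or (score < 0 and ws > 0):
                score = -score
                continue
        score += WORD_SCORE.get(w, 0)
    return score

def sentiment_label(score):
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"

s = ["excellent", "not"]  # positive running score flipped by the shifter
print(sentence_score(s), sentiment_label(sentence_score(s)))  # -2 neg
```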

Table 1: Mapping from sentiment state to word score

Sentiment state           Word score
Weak neg (only)           -1
Strong neg (only)         -2
Strong pos (only)          2
Weak pos (only)            1
Neutral                    0
Intensifier + weak neg    -2
Intensifier + weak pos     2

Note that when the current sentence score is greater than 0, the current word is in the Valenceshifters lexicon, and the score of the current word is less than 0, then sentence score = sentence score * (-1); symmetrically, when the current sentence score is less than 0 and the current word is in the Valenceshifters lexicon with a score greater than 0, the same flip is applied. In all other cases, we simply accumulate the word score.

Experiments

Content Linking

Table 3 and Table 4 show the performance of Method 1 and Method 2 in our experiments, respectively. The first and third rows in Table 4 are the thresholds. F1 to F9 refer to the nine features (word vector, Jaccard, IDF, res, jcn, path, Lexicon 2, Lexicon 1, and Lexicon 3), and the numbers are the votes for the corresponding features.
From Table 3 we can see that, for Method 1, a bigger threshold usually brings higher precision, but the sentences we obtain may also be fewer, which causes a low recall rate. According to the precision evaluation method used by MultiLing 2015, a precision of 86 is high, so our precision here is good. For Method 2 in Table 4, although its precision is a little lower than that of Method 1, it still achieves good results. Lexicon 3 shows good performance, and other features such as Jaccard and IDF similarity also perform well.

Argument and Sentiment Label
From Table 6 we can see that setting the threshold to 0.2 or 0.3 yields the highest precision for both the argument label and the sentiment label. However, unlike the linking precision discussed above, bigger thresholds result in lower precision here. The reason may be that a bigger threshold yields far fewer linking sentences; sometimes we obtain only one or two sentence pairs, so any wrong answer among them obviously decreases the precision.

Conclusion
For content linking, our system mines both syntactic and semantic information, and the performance is good. For argument and sentiment labeling, we focus on machine learning algorithms and sentiment dictionaries. There is still room for improvement: our future work is to find better ways to mine and use more semantic features for both content linking and labeling.