Talking to the crowd: What do people react to in online discussions?

This paper addresses the question of how language use affects community reaction to comments in online discussion forums, and the relative importance of the message vs. the messenger. A new comment ranking task is proposed based on community-annotated karma in Reddit discussions, which controls for topic and timing of comments. Experimental work with discussion threads from six subreddits shows that the importance of different types of language features varies with the community of interest.


Introduction
Online discussion forums are a popular platform for people to share their views about current events and learn about issues of concern to them. Discussion forums tend to specialize in different topics, and people participating in them form communities of interest. The reaction of people within a community to posted comments provides an indication of community endorsement of opinions and of the value of information. In most discussions, the vast majority of comments spawn little reaction. In this paper, we look at whether (and how) language use affects the reaction, compared to the relative importance of the author and timing of the post.
Evidence that language use does matter is provided by recent work (Danescu-Niculescu-Mizil et al., 2012; Lakkaraju et al., 2013; Althoff et al., 2014; Tan et al., 2014), but studies also show the importance of topic, timing and author. Teasing these different factors apart is a challenge. The work presented in this paper provides additional insight into this question by controlling for these factors in a different way than previous work and by examining multiple communities of interest. Specifically, using data from Reddit discussion forums, we look at the role of author reputation as measured in terms of a karma k-index, and control for topic and timing by ranking comments in a constrained window within a discussion.
The primary contributions of this work include findings about the role of author reputation and variation across communities in terms of aspects of language use that matter, as well as the problem formulation and associated data collection.

Data
Reddit is the largest public online discussion forum with a wide variety of subreddits, which makes it a good data source for studying how textual content in a discussion impacts the response of the crowd. On Reddit, people initiate a discussion thread with a post (a question, a link to a news item, etc.), and others respond with comments. Registered users vote on which posts and comments are important. The total number of up votes minus the number of down votes (roughly) is called karma; it provides an indication of community endorsement and popularity of a comment, as used by Lakkaraju et al. (2013). Karma is valued because it determines the order in which posts or comments are displayed, with high karma content rising to the top. Karma points are also accumulated by members of the discussion forum as a function of the karma associated with their comments.
The Reddit data is highly skewed. Although there are thousands of active communities, only a handful of them are large. Similarly, out of the more than a million comments made per day, most receive little to no attention; the distributions of positive comment karma and author karma are Zipfian. Slightly more than half of all comments have exactly one karma point (no votes beyond the author), and only 5% of comments have less than one karma point. For this study, we downloaded all the posts and associated comments made to six subreddits over a few weeks, as summarized in Table 1, as well as the karma of participants in the discussion. All available comments on each post were downloaded at least 48 hours after the post was made.

Uptake Factors
Factors other than language use that influence whether a comment will have uptake from the community include the topic, the timing of the message, and the messenger. These factors are all evident in the Reddit discussions. Some subreddits are more popular and thus have higher karma comments than others, reflecting the influence of topic. Comments that are posted early in the discussion are more likely to have high karma, since they have more potential responses.
Previous studies on Twitter show that the reputation of the author substantially increases the chances of a retweet (Suh et al., 2010; Cha et al., 2010). On Reddit most users are anonymous, but it is possible that members of a forum become familiar with particular usernames associated with high karma comments. In order to see how important personal reputation is, we looked at how often the top karma comments are associated with the top karma participants in the discussion. Since an individual's karma can be skewed by a few very popular posts, we measure reputation instead using a measure we call the k-index, defined to be equal to the number of comments in each user's history that have karma ≥ k. The k-index is analogous to the h-index (Hirsch, 2005) and arguably a better indicator of extended impact than total karma. We consider two questions: i) how often is the top karma comment in a discussion thread from the highest k-index person participating in the discussion? and ii) what percentage of the comments from the top 3 karma person in the thread end up as one of the top 3 karma comments?
The results in Table 2 show that it is in fact rare for the top k-index person in the discussion to have the comment with the most karma. The highest frequency is in ASKSCIENCE, where expertise is more highly valued. Since people may have multiple comments in a thread, the chance that any particular one of these comments is the top comment is even lower. For most people contributing to Reddit, reputation has minimal impact.
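The k-index definition can be illustrated with a short sketch. It assumes the h-index reading of the definition: the largest k such that the user has at least k comments with karma ≥ k. The function name and the example karma lists are hypothetical and are not taken from the data.

```python
def k_index(comment_karma):
    """Largest k such that at least k comments have karma >= k.

    Assumption: the k-index is read by analogy with the h-index.
    """
    ranked = sorted(comment_karma, reverse=True)
    k = 0
    for position, karma in enumerate(ranked, start=1):
        if karma >= position:
            k = position
        else:
            break
    return k


# Hypothetical users: one viral comment vs. consistently well-received comments.
one_hit = [500, 1, 1, 1, 0]
steady = [12, 10, 9, 8, 7, 6, 5]

print(sum(one_hit), k_index(one_hit))  # total karma 503, k-index 1
print(sum(steady), k_index(steady))    # total karma 57, k-index 6
```

Under this reading, a single very popular comment inflates total karma but leaves the k-index almost unchanged, which is the skew the measure is meant to avoid.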

Tasks
Having shown that the reputation of the author of a post is not a dominating factor in predicting high karma comments, we propose to control for topic and timing by ranking a set of 10 comments that were made consecutively in a short window of time within one discussion thread according to the karma they finally received. The ranking has access to the comment history about these posts. This simulates the view of an early reader of these posts, i.e., without influence of the ratings of others, so that the language content of the post is more likely to have an impact. Very long threads are sampled, so that these do not dominate the set of lists. Approximately 75% of the comment lists are designated for training and the rest for testing, with splits at the discussion thread level. Here, performance is tuned and evaluated using mean precision of the top-ranked comment (P@1), so as to emphasize learning the rare high karma events.
In addition, for analysis purposes, we report results for three surrogate tasks that can be used in the ranking problem: i) the binary ranker trained on all list pairs, in which low karma comments dominate, ii) a positive vs. negative karma classifier, and iii) a high vs. medium karma classifier. All use class-balanced data.
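The task setup can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the experiments: windows are taken to be non-overlapping, comments are hypothetical dictionaries with a "time" field, and a prediction is counted as correct when the top-scored comment ties for the highest karma in its list.

```python
def consecutive_windows(comments, size=10):
    """Split one thread into lists of `size` consecutively posted comments.

    Assumption: windows do not overlap; leftover comments are dropped.
    """
    ordered = sorted(comments, key=lambda c: c["time"])
    return [ordered[i:i + size] for i in range(0, len(ordered) - size + 1, size)]


def precision_at_1(karma_lists, score_lists):
    """Fraction of lists whose top-scored comment has the highest karma.

    Assumption: a tie for the highest karma counts as a hit.
    """
    hits = 0
    for karma, scores in zip(karma_lists, score_lists):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if karma[predicted] == max(karma):
            hits += 1
    return hits / len(karma_lists)


# Toy example with three short lists (hypothetical karma and model scores).
karma_lists = [[1, 5, 2], [3, 3, 0], [0, 1, 9]]
score_lists = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.1], [0.7, 0.2, 0.1]]
p_at_1 = precision_at_1(karma_lists, score_lists)  # two of three lists are hits
```

Counting ties as hits means that even a random ranker scores above 1/10 on lists of 10 comments whenever several comments share the top karma value.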
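One simple way to obtain class-balanced data is to downsample the larger class. The sketch below assumes this strategy; the text does not state how balancing was done, and the function name, labels and seed are hypothetical.

```python
import random


def balance_by_downsampling(examples, labels, seed=0):
    """Return (example, label) pairs with equally many examples per class.

    Assumption: balance is reached by randomly downsampling larger classes.
    """
    rng = random.Random(seed)
    by_class = {}
    for example, label in zip(examples, labels):
        by_class.setdefault(label, []).append(example)
    n = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for example in rng.sample(items, n):
            balanced.append((example, label))
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 negative-karma comments among 100 (hypothetical proportions).
examples = list(range(100))
labels = ["neg" if i < 5 else "pos" for i in examples]
balanced = balance_by_downsampling(examples, labels)
```

With two equally sized classes, a classifier that guesses at random is expected to reach 50% accuracy.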

Classifier
We use the support vector machine (SVM) rank algorithm (Joachims, 2002) to predict a rank order for each list of comments. The SVM is trained to predict which of a pair of comments has higher karma. The error term penalty parameter is tuned to maximize P@1 on a held-out validation set (20% of the training samples).
Since much of the data includes low-karma comments, there will be a tendency for the learning to emphasize features that discriminate comments at the lower end of the scale. In order to learn features that improve P@1, and to understand the relative importance of different features, we use a greedy automatic feature selection process that incrementally adds the one feature whose resulting feature set achieves the highest P@1 on the validation set. Once all features have been used, we select the model with the subset of features that obtains the best P@1 on the validation set.
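The pairwise view of ranking can be made concrete with a small sketch. It shows the standard reduction of a ranked list to binary examples on feature differences, which is the kind of input a linear pairwise ranker learns from; it is not the SVM rank implementation itself, and skipping tied pairs is an assumption.

```python
def pairwise_examples(features, karma):
    """Turn one comment list into (feature difference, label) pairs.

    label is +1 if the first comment of the pair has higher karma, else -1.
    Assumption: pairs with equal karma express no preference and are skipped.
    """
    examples = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if karma[i] == karma[j]:
                continue
            diff = [a - b for a, b in zip(features[i], features[j])]
            label = 1 if karma[i] > karma[j] else -1
            examples.append((diff, label))
    return examples


# Toy list of three comments with two features each (hypothetical values).
features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
karma = [5, 1, 5]
pairs = pairwise_examples(features, karma)
```

Pairs are formed only within a list, so comments from different threads or time windows are never compared directly.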
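The greedy selection procedure can be sketched as below. The `evaluate` callback is hypothetical: it stands in for training the ranker on a feature subset and returning P@1 on the validation set. The toy scoring function is invented purely to make the example runnable.

```python
def greedy_forward_selection(features, evaluate):
    """Add one feature at a time, always the one giving the best score,
    then return the best-scoring subset seen along the way."""
    remaining = list(features)
    selected = []
    best_subset, best_score = [], float("-inf")
    while remaining:
        # Score every candidate extension of the current subset.
        score, choice = max(
            ((evaluate(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        selected.append(choice)
        remaining.remove(choice)
        if score > best_score:
            best_subset, best_score = list(selected), score
    return best_subset, best_score


# Toy stand-in for validation P@1: additive gains per feature (hypothetical).
gains = {"rel": 0.05, "info": 0.03, "lex": -0.02, "mood": 0.0}


def toy_evaluate(subset):
    return 0.25 + sum(gains[f] for f in subset)


subset, score = greedy_forward_selection(list(gains), toy_evaluate)
```

Continuing until every feature has been added, rather than stopping at the first drop in score, lets the procedure recover when a feature only helps in combination with a later one.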

Features
The features are designed to capture several key attributes that we hypothesize are predictive of comment karma, motivated by related work. The features are categorized in groups as summarized below, with details in supplementary material.
• Graph and Timing (G&T): A baseline that captures discourse history (response structure) and comment timing, but no text content.
• Authority and Reputation (A&R): K-index, whether the commenter was the original poster, and in some subreddits "flair" (display next to a comment author's username that is subject to a cursory verification by moderators).
• Probability scores from surrogate classification tasks (reply vs. no reply, positive vs. negative sentiment) to measure the community response of a comment using bag-of-words predictors.
• Relevance (Rel.): Comment similarity to the parent, post and title computed with three methods: topic similarity using a non-negative matrix factorization (NMF) model (Xu et al.,
The various word lists are motivated by feature exploration studies in surrogate tasks. For example, projecting words to a two-dimensional space of positive vs. negative and likelihood of reply showed that self-oriented pronouns were more likely to have no response and second-person pronouns were more likely to have a negative response. The politeness and argumentativeness/profanity lists are generated by starting with hand-specified seed lists used to train an SVM to classify word embeddings (Mikolov et al., 2013) into these categories, and expanding the lists with 500 words farthest from the decision boundary.
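The word list expansion step can be sketched as follows. The sketch assumes a linear decision function whose weights and bias come from an already trained SVM, and it assumes that "farthest from the decision boundary" means the largest positive margin for the category. The embeddings, weights and words are toy values, not the ones used in the experiments.

```python
def expand_word_list(embeddings, weights, bias, n_words=500):
    """Return up to n_words words with the largest positive margin.

    embeddings: dict mapping word -> embedding vector (list of floats).
    weights, bias: parameters of a linear decision function (assumed given).
    """
    def margin(vector):
        return sum(w * x for w, x in zip(weights, vector)) + bias

    ranked = sorted(embeddings, key=lambda word: margin(embeddings[word]), reverse=True)
    return [word for word in ranked[:n_words] if margin(embeddings[word]) > 0]


# Toy two-dimensional embeddings and a toy "politeness" direction.
toy_embeddings = {
    "please": [2.0, 0.1],
    "thanks": [1.5, 0.0],
    "idiot": [-2.0, 0.3],
    "the": [0.1, 0.0],
}
polite_words = expand_word_list(toy_embeddings, weights=[1.0, 0.0], bias=0.0, n_words=2)
```

Ranking by margin keeps the words the classifier is most confident about, which limits the noise added to the hand-specified seed lists.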

Ranking Experiments
We present two series of experiments on comment karma ranking. Fig. 1 shows the relative gain in P@1 over the G&T baseline associated with using different feature groups, and Table 3 summarizes the results using the greedy selection procedure compared to a random baseline (showing the effect of ties) and the G&T baseline. In both cases, we observe very different behavior for the different subreddits. The G&T baseline results show that the graph and timing features alone obtain 21-32% of top karma comments, depending on the subreddit. We observe an improvement by using textual features for all subreddits except ASKMEN and WORLDNEWS.
The importance of the different features reflects the nature of the different communities. The authority/reputation features help most for ASKSCIENCE, consistent with our k-index study. Informativeness and relevance help all subreddits except ASKMEN and WORLDNEWS. Lexical, mood and community style features are useful in some cases, but hurt others. The predicted probability of a reply was least useful, possibly because of the low-karma training bias.
A major challenge with identifying high karma comments (and negative karma comments) is that they are so rare. Although our feature selection tunes for high rank precision, it is possible that the low-karma data dominate the learning. Alternatively, it may be that language is mainly useful for distinguishing the negative and medium karma comments, and that the very high karma comments are a matter of timing. To better understand the role of language for these different types, we trained classifiers on balanced data for positive vs. negative karma, and high vs. mid levels of karma, and compared the results to the binary classifier used in ranking. In all cases, random chance accuracy is 50%.
Table 4 shows the pairwise accuracy of these classifiers. Not surprisingly, distinguishing positive from negative classes is fairly easy, except for the more information-oriented subreddits (ASKSCIENCE and FITNESS). We also find that the high vs. medium task is slightly easier than the general task for most subreddits.

Related Work
Interest in social media has grown rapidly in recent years, including work on predicting the popularity of posts, comments and tweets. Danescu-Niculescu-Mizil et al. (2012) investigate phrase memorability in movie quotes. Cheng et al. (2014) explore prediction of information cascades on Facebook. Weninger et al. (2013) analyze the hierarchy of Reddit discussions, topic shifts, and the popularity of comments, using, among other things, very simple language analysis.
Most relevant to this paper are studies of the effect of language in popularity predictions. Tan et al. (2014) study how word choice affects the popularity of Twitter messages. As in our work, they control for topic, but they also control for the popularity of the message authors. On Reddit, we find that celebrity status is less important than it is on Twitter since on Reddit almost everyone is anonymous. Lakkaraju et al. (2013) study how timing and language affect the popularity of images posted on Reddit. They control for content by only making comparisons between reposts of the same image. Our focus is on studying comments within a discussion instead of standalone posts, and we analyze a wide variety of language features. Althoff et al. (2014) use deeper language analysis on Reddit to predict the success of receiving a pizza in the Random Acts of Pizza subreddit. To our knowledge, this is the first work on ranking comments in terms of community endorsement.

Conclusion
This paper addresses the problem of how language affects community reaction to Reddit comments. We collect a new dataset from six subreddit discussion forums. We introduce a new task of ranking comments based on karma in Reddit discussions, which controls for topic and timing of comments. Our results show that using language features improves comment ranking in most of the subreddits. Informativeness and relevance are the most broadly useful feature categories; reputation matters for ASKSCIENCE, and other categories can either help or hurt depending on the community. Future work involves improving the classification algorithm by using new approaches to learning about rare events.

Figure 1: Relative improvement in P@1 over G&T for individual feature groups.

Table 1: Data collection statistics.

Table 3: Overall P@1 performance on the test set.