The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media

This study analyzes the political slants of user comments on Korean partisan media. We built a BERT-based classifier to detect the political leaning of short comments using semi-unsupervised deep learning methods, which achieved an F1 score of 0.83. Classifying 21.6K comments, we found a strong presence of conservative bias on both conservative and liberal news outlets. Moreover, this study reveals an asymmetry across the partisan spectrum: more liberals (48.0%) than conservatives (23.6%) comment not only on news stories resonating with their political perspectives but also on those challenging their viewpoints. These findings advance the current understanding of online echo chambers.


Introduction
User-generated news comments manifest the interactive and participatory nature of online journalism. Unlike letters to the editor, the comments section allows readers to express their thoughts and feelings about the news stories they read more publicly and instantaneously. Moreover, comments themselves generate subsequent reactions from other users. For example, in South Korea, news portals, through which the vast majority consume news online, display a list of "most commented" articles along with the "most viewed" news stories. This aggregation of user responses is a major contributing factor in news selection (KPF, 2018). As such, real-time user reactions, rather than the cover stories of newspapers, now tell people what to read and think about, signifying a shift in the traditional direction of agenda-setting (Lee and Tandoc, 2017).
On a related note, about 63% of South Koreans who get news online reported that they regularly read the comments section (DMCReport, 2013) and consider it a valid, direct cue to public opinion (Lee and Tandoc, 2017). Much research has corroborated the role of comments in the formation of public opinion (Springer, Engelmann, and Pfaffinger, 2015; Waddell, 2018). Nonetheless, in contrast to the massive scholarly attention paid to the role of hyper-partisan news media, little research has analyzed the political biases of news comments. This study aims to fill this void in the literature by examining the political slants of news comments as well as the overlap of commenters across the partisan spectrum. These examinations will advance the current understanding of the presence of echo chambers in online news comment sections.
To analyze comments at the scale of millions, we took two different approaches. First, we trained a BERT model on news stories published by both liberal and conservative news outlets. Second, we expanded the seed data of human-labeled news comments using user information and built a comment-specific classifier. Next, using the best-performing model, we analyzed the political slants of 21.6K news comments. This allows us to examine the political diversity of commenters within each liberal and conservative news outlet. For the identified liberal- or conservative-leaning commenters, we also tracked whether they leave comments on news stories resonating with their own political preferences or on news stories challenging their political viewpoints.

Partisan Media and Echo Chambers
The proliferation of partisan media has invited much scholarly concern about the rise of echo chambers (Sunstein, 2009). As more people tune into news outlets congenial to their political viewpoints, their existing preferences are reinforced and, in turn, opinion polarization and social extremism become prevalent (Iyengar and Hahn, 2009; Stroud, 2010). The recent use of web tracking technologies, however, rebuts the echo chamber hypothesis with observational data showing accidental or purposeful crosscutting news exposure (Gentzkow and Shapiro, 2011; Flaxman, Goel, and Rao, 2016).
However, in the realm of online discussion forums, including online news comment sections, there is to date no evidence contradicting the extant finding that people are more likely to leave comments in alignment with other posts, which leaves user comment sections homogeneous (Lee and Jang, 2010; Hsueh, Yogeeswaran, and Malinen, 2015). Interestingly, whereas user comments posted on news websites exerted considerable influence on individuals' inferences about public opinion (Lee and Jang, 2010), no such effect emerged for user comments posted on news outlets' Facebook pages (Winter, Brückner, and Krämer, 2015).

Neural Network for Text Classification
User-contributed news comments are, by nature, informal, irregular, and erratic. To reduce the noise in such datasets, many studies have adopted various data cleaning methods. However, user-generated text is difficult to sanitize, and the results could be misleading if a naive statistical method is applied. Recent NLP (natural language processing) research has shown that neural network approaches outperform conventional statistical models. For instance, TextCNN (Kim, 2014) improved performance in text classification tasks by introducing convolutional layers, and BERT (Devlin, Chang, Lee, and Toutanova, 2018) shows state-of-the-art results via its bidirectional transformer network. Modern text classification methods utilize pretrained models like BERT by fine-tuning them on the target NLP domain.

Data Collection
Using "minimum wage" as a keyword, we crawled news stories and their associated user comments on NAVER, the top news portal site in South Korea. The platform offers its own user comment section, and ten times more Internet users are known to read news on NAVER than on individual news websites. From January to July of 2019, we collected 1,534 articles from three conservative news outlets (Chosun Ilbo, Joongang Ilbo, and Donga Ilbo) and 765 articles from three progressive ones (Hankyoreh, Kyunghyang Shinmun, and OhMyNews), for a total of 2,299 news stories.
We also collected press releases containing the phrase "minimum wage" from the two major political parties in South Korea: the liberal Democratic Party of Korea and the conservative Liberty Korea Party. These data serve as the basis for learning partisan framing and for filtering out irrelevant topics that might create noise in determining the political stance of news comments. We performed part-of-speech tagging on the press releases and analyzed the frequency of nouns and bigrams. From the list of nouns and bigrams appearing more than a hundred times, the authors carefully examined and selected the relevant phrases. In this way, a list of keywords (i.e., nouns and bigrams) related to the minimum wage was compiled for the liberal and conservative parties, respectively. This list includes a wide range of phrases relevant to politics in general, such as inter-Korean summit, denuclearization, corruption, flexible working hours, labor union, and unemployment rates.
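The frequency step that produces the candidate list for manual review can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each press release has already been reduced to a list of nouns by a part-of-speech tagger, it treats bigrams as pairs of adjacent nouns (the text does not specify how bigrams were formed), and the name `frequent_terms` is hypothetical.

```python
from collections import Counter


def frequent_terms(tokenized_docs, min_count=100):
    """Return nouns and bigrams that occur more than `min_count` times.

    tokenized_docs: iterable of noun lists, one list per press release
    (assumed to come from an external part-of-speech tagger).
    Bigrams are represented as tuples of two adjacent nouns, which is
    an assumption made for this sketch.
    """
    counts = Counter()
    for nouns in tokenized_docs:
        counts.update(nouns)                  # unigram (noun) counts
        counts.update(zip(nouns, nouns[1:]))  # adjacent-noun bigram counts
    return {term: n for term, n in counts.items() if n > min_count}
```

The returned terms are only candidates; in the procedure described above, the final keyword list is chosen by manual inspection.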
Online news comments were also collected from NAVER's top "most commented" stories in each of seven major news sections, including national, politics, economy, culture, world, and lifestyle, resulting in a total of 210 most-commented news stories per day. These news stories received around 38 million comments from 1.8 million unique users. The data contain the text content as well as user identifiers.

Classification from News Articles
We first built a classifier trained on news articles to observe how well news articles could proxy comments in the task of political bias classification. To compensate for the relatively small news article dataset, we split the news articles into individual sentences and retained the sentences that included at least one of the identified partisan-specific keywords. Two hundred articles (100 conservative and 100 liberal) were set aside for validation tests. The remaining 2,099 articles were split into sentences, resulting in 49.8K sentences. Each sentence was labeled as either conservative or liberal based on the political leaning of the news outlet. The completed dataset consists of 20.3K liberal and 29.5K conservative sentences.
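The sentence-level dataset construction can be illustrated with a short sketch. It is not the authors' code: the punctuation-based sentence splitter is a simplifying assumption (a Korean-specific splitter would likely be used in practice), keyword matching is plain substring search, and the function names are hypothetical.

```python
import re


def keyword_sentences(article_text, keywords):
    """Split an article into sentences and keep those containing a keyword.

    Sentences are split on whitespace that follows '.', '!' or '?',
    which is a simplification assumed for this sketch.
    """
    parts = re.split(r"(?<=[.!?])\s+", article_text)
    sentences = [s.strip() for s in parts if s.strip()]
    return [s for s in sentences if any(k in s for k in keywords)]


def build_sentence_dataset(articles, keywords):
    """Label every retained sentence with the leaning of its outlet.

    articles: iterable of (article_text, outlet_leaning) pairs, where
    outlet_leaning is "liberal" or "conservative".
    """
    return [
        (sentence, leaning)
        for text, leaning in articles
        for sentence in keyword_sentences(text, keywords)
    ]
```

Because the label comes from the outlet rather than from the sentence itself, individual sentences may be mislabeled; the sketch reproduces that design choice rather than correcting it.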
Using KorBERT (footnote 1), a BERT-based pre-trained model for the Korean language, we trained a classifier for the binary classification of news sentences. The test data comprised the 200 news articles mentioned above (i.e., 100 each from liberal and conservative news outlets). The test sets were prepared in two different manners: (1) the HEADLINE representation includes only the headlines of the labeled data, and (2) the BODY-TEXT representation includes only the sentences in the news body text that contain partisan-specific keywords. The former representation was considered because political slant is known to be more apparent in news headlines.

Classification from News Comments
We also built another language model trained directly on the slants of news comments. To obtain labels, we randomly sampled 4,872 user comments that had been posted on stories about the minimum wage and employed three coders. The coders labeled each comment as liberal, conservative, or other (including neutral). After a series of training sessions, the coders achieved an acceptable inter-coder reliability (Krippendorff's alpha = .7015) and then independently labeled the rest. This human-labeled dataset consisted of 1,345 liberal (27.6%), 1,597 conservative (32.8%), and 1,930 other (39.6%) comments.
To ensure sufficient training data, we expanded the comment labels with a semi-unsupervised method. We first identified all users who had authored at least two comments in the human-labeled dataset. The bias of each user was computed as the mean of their comment labels, where the score is -1 for a liberal comment and +1 for a conservative one. We selected the users whose mean bias score was below -0.8 or above 0.8, which yielded 250 users whose political leaning could be recognized. We then collected all comments authored by these 250 users from the entire six-month news corpus. Under the assumption that the political stance of these users remained constant during this period, we labeled these comments with the author's computed political bias. The final dataset contained a total of 93,565 comments, of which 17,535 were liberal and 76,030 were conservative.
The expanded comment labels were used to train two NLP models: TextCNN and BERT. We undersampled the conservative comments to 17,535 to balance the labels, resulting in a total of 35,070 labeled comments. We set aside 10% of the labels as the test set, then split the remainder into a validation set (10%) and a training set (90%).
1 http://aiopen.etri.re.kr/
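The label-expansion rule can be sketched in a few lines. This is an illustrative reading of the description, not the authors' implementation: the text gives scores only for liberal (-1) and conservative (+1) comments, so scoring "other" comments as 0 is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# Scores for liberal and conservative follow the text; the 0 for
# "other" is an assumption made for this sketch.
SCORES = {"liberal": -1, "conservative": 1, "other": 0}


def user_leanings(labeled_comments, min_comments=2, threshold=0.8):
    """Assign a leaning to users whose mean comment score is extreme.

    labeled_comments: iterable of (user_id, label) pairs taken from the
    human-labeled dataset.
    """
    scores_by_user = defaultdict(list)
    for user_id, label in labeled_comments:
        scores_by_user[user_id].append(SCORES[label])

    leanings = {}
    for user_id, scores in scores_by_user.items():
        if len(scores) < min_comments:
            continue  # too few labeled comments to judge this user
        mean = sum(scores) / len(scores)
        if mean < -threshold:
            leanings[user_id] = "liberal"
        elif mean > threshold:
            leanings[user_id] = "conservative"
    return leanings


def expand_labels(all_comments, leanings):
    """Propagate each selected user's leaning to all of their comments.

    all_comments: iterable of (user_id, text) pairs from the full corpus.
    """
    return [(text, leanings[u]) for u, text in all_comments if u in leanings]
```

Under this reading, a user with one conservative and one "other" comment has a mean of 0.5 and is excluded, so only users with nearly uniform labels contribute to the expanded dataset.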
The TextCNN-based model employed padding up to 300 characters, which is the maximum length of comments on NAVER. Comments were tokenized at the character level and fed into an embedding layer of 300 dimensions. The model used a one-dimensional CNN with channels of sizes 5, 10, 15, and 20, respectively. The data were then fed into a max-pooling layer with a dropout ratio of 0.5 and classified through a linear layer. We used the Adam optimizer with the default learning rate (1e-3) and trained for six epochs, as validation loss did not converge. For the second model, we again utilized KorBERT, the pretrained BERT model for the Korean language. Comments were tokenized by words and padded to a size of 70. The model converged after two epochs with a higher learning rate (1e-3), with the lowest loss.

Bias Classification
After training the BERT classifier on news sentence labels, we evaluated the model by classifying the political slants of the HEADLINE and BODY-TEXT representations as liberal or conservative. Classifier performance was measured in terms of the Matthews correlation coefficient, accuracy, and F1 score. Table 1 summarizes the results; the BODY-TEXT representation of the validation set yielded the best result. The slant detection model trained on news article data, however, was not effective in detecting the slant of user-contributed comments, as indicated by the low F1 score of 0.3571. When comment labels were utilized for training, existing language models could successfully detect the slant of news comments, as displayed in Table 2. The BERT-based model outperformed the TextCNN model in this domain-specific learning task.
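For reference, the three evaluation measures can be computed from a binary confusion matrix as in the sketch below. The text does not state which class is treated as positive or whether F1 is averaged across classes, so the single positive-class F1 used here is an assumption, and `binary_metrics` is a hypothetical helper name.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F1 (for the positive class), and Matthews correlation.

    tp, fp, fn, tn: counts of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it informative when the two classes are imbalanced.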

Bias Distribution in User Comments
We analyzed the distribution of political slants in the comment sections. From the set of news articles on the minimum wage, we inferred political bias labels for the comments on all articles published by the aforementioned major outlets with the best-performing model (i.e., the BERT-based model trained on comment labels) and used these labels to derive statistics on the bias distribution.
Fig. 1 shows the percentage of liberal and conservative comments for each news outlet based on the inferred labels. The figure demonstrates a clear presence of conservative bias in user comments across all partisan news media; even for the bottom three liberal outlets, more news comments are conservative-leaning, contrary to the widely accepted echo chamber conjecture in digital media.
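The per-outlet percentages of the kind plotted in the figure can be derived from the inferred labels as sketched below. The input format, a list of (outlet, predicted label) pairs, and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict


def slant_share_by_outlet(predictions):
    """Percentage of liberal and conservative comments for each outlet.

    predictions: iterable of (outlet, label) pairs, where label is the
    classifier output, "liberal" or "conservative".
    """
    counts = defaultdict(Counter)
    for outlet, label in predictions:
        counts[outlet][label] += 1

    shares = {}
    for outlet, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[outlet] = {
            label: 100 * label_counts[label] / total
            for label in ("liberal", "conservative")
        }
    return shares
```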

Discussion & Conclusion
The slant classifier trained on news articles related to the minimum wage issue performed well on similar news articles. However, the same model performed poorly on the comments section. This may be due to the unstructured nature of user-generated data, which is very different from edited news content. The slant classifiers trained on the labeled comment data, however, showed promising performance in classifying the political bias of crowd-generated comments.
The analysis in this paper revealed that the majority of user comments on Korean news outlets are pro-conservative, although conservative partisan media attracted more congenial comments (71.1% conservative versus 28.9% liberal) than liberal media did (46.4% liberal versus 53.6% conservative). This finding may suggest the fallacy of online echo chambers in Korean news media.
We do not, however, argue that the prominent crossover of news commenters between liberal and conservative news stories means that online users in South Korea have established more balanced news reading habits or have disrupted the "filter bubble," notably because a recent study indicated that being exposed to opposing views can sometimes backfire and further increase polarization (Bail et al., 2018).
Moreover, auxiliary analyses tracking user IDs showed that fewer than half of liberals (48.0%) and conservatives (23.6%) commented on both congenial and uncongenial partisan news outlets. This finding provides important insights into the current understanding of online political discussions in the era of partisan media.
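A crossover statistic of this kind could be computed as in the following sketch. The text does not specify how the denominator was defined, so counting every identified user who left at least one comment is an assumption, as are the input formats and the name `crossover_rate`.

```python
from collections import Counter, defaultdict


def crossover_rate(user_leaning, comment_log, outlet_leaning):
    """Percentage of users per leaning who commented on both outlet types.

    user_leaning: dict mapping user_id to "liberal" or "conservative".
    comment_log: iterable of (user_id, outlet) pairs, one per comment.
    outlet_leaning: dict mapping outlet to "liberal" or "conservative".
    Users without any comment in the log are not counted (an assumption).
    """
    sides_seen = defaultdict(set)
    for user_id, outlet in comment_log:
        if user_id in user_leaning:
            sides_seen[user_id].add(outlet_leaning[outlet])

    totals, crossed = Counter(), Counter()
    for user_id, sides in sides_seen.items():
        leaning = user_leaning[user_id]
        totals[leaning] += 1
        if {"liberal", "conservative"} <= sides:
            crossed[leaning] += 1
    return {leaning: 100 * crossed[leaning] / totals[leaning] for leaning in totals}
```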
This work bears several limitations that can be addressed in future work. First, the expanded comments dataset that was built from human-labeled news comments could be validated further. While we assumed that people's slant remains consistent over half a year, there is a chance that the political slant of commenters differs by news topic. Second, while this paper utilized data from contributors who post news comments frequently, future methods could also consider data from one-time commenters. Such methods could reveal the political slant map of comments more comprehensively.