Vernon-fenwick at SemEval-2019 Task 4: Hyperpartisan News Detection using Lexical and Semantic Features

In this paper, we present our submission for SemEval-2019 Task 4: Hyperpartisan News Detection. Hyperpartisan news articles are sharply polarized and extremely biased (one-sided). Such articles exhibit blind beliefs, opinions, and unreasonable adherence to a party, idea, faction, or person. Through this task, we aim to develop an automated system that can detect hyperpartisan news and serve as a prescreening technique for fake news detection. The proposed system jointly uses a rich set of handcrafted textual and semantic features. Our system achieved 2nd rank on the primary metric (82.0% accuracy) and 1st rank on the secondary metric (82.1% F1-score) among all participating teams. Comparison with the best performing system on the leaderboard shows that our system is behind by only 0.2% absolute accuracy.


Introduction
Today, in the age of digitization, the smartphone has become an indispensable tool for information sharing and consumption. It is much more convenient for users to read news through online articles and social media platforms, which provide quick and easy access to information almost everywhere. However, it is highly possible that the shared information is unverified and may bias readers' opinions. This issue is exacerbated by the fact that most people find it difficult to distinguish between what is real and objective, what is fake, and what is partisan.
Although online news articles are expected to be well balanced and free of prejudice, authors and media houses may at times let their standpoints and beliefs spill into their writing. Instead of providing a holistic view to readers, the author conveys a picture with which they agree, thus biasing the audience towards a party or a faction. When these news articles are extremely polarized towards one side of the argument, they are referred to as "hyperpartisan news articles".
This extreme polarization can leave users vulnerable to detrimental arguments and cloud their judgment, preventing them from making objective decisions. Hyperpartisan articles may also carry some common elements of fake news: they are typically used to spread propaganda and manipulate readers. They target human psychology by creating confirmation bias and echo chambers, thereby impairing readers' ability to dispel hyperpartisan articles in favor of neutral ones.
SemEval-2019 Task 4 aims to address this issue of hyperpartisanship. The objective of the task is to detect whether a given news article contains hyperpartisan (extremely one-sided) argumentation. In this paper, we describe our system, which automates the process of identifying and annotating news articles as hyperpartisan or not.

Related Work
The problem of fake news and hyperpartisanship has been discussed earlier by Potthast et al. (2017). They followed a style-based approach to tackle the problem and also suggested that the writing styles of left-wing and right-wing news are quite similar. Apart from this, there has not been much work on hyperpartisan news detection. Our approach is inspired by some recent work in the domains of sentiment analysis (Pontiki et al., 2016) and bias detection (Patankar et al., 2018; Recasens et al., 2013; Patankar and Bose, 2017; Baly et al., 2018). Jiang and Wilson (2018) explored linguistic signals embedded in news articles and used these linguistic clues to detect the spread of misinformation via social media. Iyyer et al. (2014) studied the impact of words in identifying people's ideology and proposed an RNN to capture the semantic features of a sentence.

System Description
Our hyperpartisan news detection system consists of three phases: 1) preprocessing, 2) generating article representation, and 3) training a classifier. An overview of our system is shown in Figure 1.

Preprocessing
The original dataset consisted of news articles along with HTML tags. The hyperpartisan news detection cleaner was used to convert the original text to plain text by removing the tags. In the next step, further rudimentary cleaning of the articles was done. We also expanded contractions in the dataset: for example, 'shan't' was converted to "shall not", 'don't' to "do not", etc.
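As a rough illustration of this cleanup step, the following sketch strips residual tags and expands a few contractions. The `preprocess` helper and its small contraction table are hypothetical stand-ins; the actual system used the task's dedicated cleaner and a larger contraction list.

```python
import re
from html import unescape

# Small illustrative contraction table; the real system handled many more.
CONTRACTIONS = {
    "shan't": "shall not",
    "don't": "do not",
    "won't": "will not",
    "can't": "cannot",
}

def preprocess(raw: str) -> str:
    text = unescape(raw)                      # decode HTML entities
    text = re.sub(r"<[^>]+>", " ", text)      # drop remaining HTML tags
    for short, full in CONTRACTIONS.items():  # expand contractions
        text = re.sub(re.escape(short), full, text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace
```

For instance, `preprocess("<p>We shan't stop, don't worry.</p>")` yields `"We shall not stop, do not worry."`.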

Article Representation
Hyperpartisan news articles are extremely one-sided and exhibit blind beliefs compared to neutral news articles. We have observed that the writers of such articles generally use a harsh tone and inflammatory language; they even exaggerate and convey opinions to stress their ideology. Hyperpartisan news articles also tend to use superlatives and comparatives frequently to dramatize or exaggerate situations.
Polarity at the article level can capture the emotions and sentiments of the article; it may also capture contextual polarity. Polarity at the sentence level helps to identify bias localized to a sentence, which might not be perceptible at the article level, and shifts the focus to bias-heavy sentences. By combining polarity at both levels, we can capture the tone, overpraise, and sentiment in an article. Subjectivity, modality, and bias lexicons help to discover the attitude, prejudice, and beliefs expressed by the author. Building on these insights, we created a set of handcrafted textual features (HF) based on writing style, linguistics, and lexicons.
The problem with handcrafted features is that they do not capture the semantic relationships among sentences. To address this, we incorporated semantic features (SF) that can capture long-range dependencies between sentences and bring out the semantics of the article. The drawback of semantic features obtained by averaging word-based embeddings such as GloVe is that they ignore word ordering. In our approach, we have therefore also explored features generated via distributed document representations (Universal Sentence Encoder or Doc2Vec) that take word order into account and capture the semantics of an article.
Consider a set of N news articles A = {a_1, ..., a_N}. Each article a_i has a set of sentences S = {s_1, ..., s_m} and a set of words W = {w_1, ..., w_l}, where l is the length of the article. We jointly used HF and SF to obtain the article representation (ArtRep), where ⊕ is the concatenation operator:

ArtRep(a_i) = HF(a_i) ⊕ SF(a_i)

The handcrafted features used in our system are described in Section 4, and the semantic features are discussed in Section 5.
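The joint representation above can be sketched as a simple vector concatenation. Here `hf_vec` and `sf_vec` are placeholders for the outputs of the handcrafted and semantic feature extractors described in the following sections.

```python
import numpy as np

def article_representation(hf_vec: np.ndarray, sf_vec: np.ndarray) -> np.ndarray:
    """ArtRep(a_i) = HF(a_i) (+) SF(a_i), where (+) denotes concatenation."""
    return np.concatenate([hf_vec, sf_vec])
```

For example, concatenating a 10-dimensional handcrafted vector with a 512-dimensional USE embedding yields a 522-dimensional article representation.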

Handcrafted Features
Bias Score: To identify bias words in an article, we used the bias lexicon built from the NPOV corpus of Wikipedia articles (Recasens et al., 2013). Wikipedia advocates a Neutral Point of View (NPOV) policy; articles falling under the NPOV dispute category were used to build this corpus. The bias score is the frequency of article words that occur in the bias lexicon.
Article Level Polarity: The polarity of an article (APol) was extracted using the MPQA Subjectivity lexicon (Wilson et al., 2005) (SLex), which lists around 8,000 words with their prior polarity and subjectivity type. Let the set of prior positive-polarity words in an article be PLex_i and the set of negative-polarity words be NLex_i. We computed the positive (APol+_i) and negative (APol−_i) polarity scores of an article a_i, where 1 is an indicator function:

APol+_i = Σ_{j=1..l} 1(w_j ∈ PLex_i)        APol−_i = Σ_{j=1..l} 1(w_j ∈ NLex_i)

Sentence Level Polarity: Polarity was further fine-grained to the sentence level using the Pattern toolkit for English. A sentence s_j was given as input to the toolkit, and a polarity score α_j in the range [-1.0, 1.0] was obtained. The positive (PolScore+_i), negative (PolScore−_i), and neutral (PolScore^Neu_i) polarity scores of an article a_i were computed with ±0.1 as thresholds, as this gave the best results for our system:

PolScore+_i = Σ_{j=1..m} 1(α_j > 0.1)
PolScore−_i = Σ_{j=1..m} 1(α_j < −0.1)
PolScore^Neu_i = Σ_{j=1..m} 1(−0.1 ≤ α_j ≤ 0.1)

Subjectivity and Modality: The subjectivity score was computed using the Sentiment module of the Pattern toolkit for English; it gives a score in the range [0.0, 1.0] based on adjectives and their context. Modality is a measure of the degree of certainty and was computed using the Modality module of the same toolkit.
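A minimal sketch of the two counting schemes above, assuming the per-word prior polarities and per-sentence polarity scores α_j have already been obtained from the lexicon and the Pattern toolkit:

```python
def article_polarity(words, pos_lex, neg_lex):
    """Count article words with prior positive/negative polarity (APol+, APol-)."""
    apol_pos = sum(1 for w in words if w in pos_lex)
    apol_neg = sum(1 for w in words if w in neg_lex)
    return apol_pos, apol_neg

def sentence_polarity_counts(alphas, tau=0.1):
    """Threshold sentence polarity scores into positive/negative/neutral counts."""
    pos = sum(1 for a in alphas if a > tau)
    neg = sum(1 for a in alphas if a < -tau)
    neu = len(alphas) - pos - neg  # scores within [-tau, tau]
    return pos, neg, neu
```

The lexicon sets `pos_lex` and `neg_lex` stand in for PLex_i and NLex_i derived from the MPQA lexicon.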
Superlatives and Comparatives: Intensifying lexicons, i.e., adjectives and adverbs in the superlative and comparative degree, were used. We ran the POS tagger from NLTK (Bird and Loper, 2004) on the article text to identify superlative and comparative adjectives and adverbs, and their frequencies in the text were used as features.

Semantic Features

Glove

GloVe (Pennington et al., 2014) is an unsupervised algorithm that learns word representations in a semantic space. We have used 300-dimensional GloVe embeddings trained on Common Crawl data, with a vocabulary of 2.2 million words and 840 billion tokens. An article was tokenized into sentences and further into words to obtain its article representation. Each of these words was vectorized using GloVe pretrained embeddings, and the article representation was generated by averaging (Wieting et al., 2015) these 300-dimensional word embeddings.
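The averaging step can be sketched as follows. The toy embedding table here stands in for the pretrained 300-dimensional GloVe vectors, which in practice are loaded from the glove.840B.300d file; out-of-vocabulary tokens are simply skipped.

```python
import numpy as np

DIM = 300  # dimensionality of the pretrained GloVe vectors

def average_embedding(tokens, emb, dim=DIM):
    """Average the embeddings of in-vocabulary tokens; zero vector if none found."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```

The zero-vector fallback for articles with no in-vocabulary tokens is our own assumption for the sketch.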

Doc2Vec
Doc2Vec (D2V) (Le and Mikolov, 2014) is an unsupervised algorithm for learning distributed representations of multi-word sequences in a semantic space. We have used the Python implementation of Doc2Vec provided by gensim to learn embeddings for the news articles. For our experiments, we used 512-dimensional embeddings generated by D2V on the article text.

Universal Sentence Encoder
Universal Sentence Encoder (USE) (Cer et al., 2018) is a pretrained model that generates embeddings for sentences, phrases, and longer multi-word sequences. It has shown good performance on diverse NLP tasks, e.g., phrase-level opinion extraction and sentiment classification. USE takes English text as input and generates a 512-dimensional embedding. In our best system, we used the 512-dimensional article embeddings generated by feeding the article text to USE.

Dataset
We have used the ByArticle dataset provided in the task, which is labeled through crowdsourcing. The dataset consists of 645 news articles, each with a label denoting whether it is hyperpartisan or not; details of the dataset are provided briefly in Table 1. More information on the dataset can be found in (Kiesel et al., 2019).

Results and Analysis
Handcrafted features were concatenated with semantic features to generate a rich article representation, which was fed to the classifier as input. An L2-regularized logistic regression classifier (Pedregosa et al., 2011) was trained with 10-fold cross-validation. Since the training dataset is unbalanced, we used class-weighted logistic regression, weighting classes inversely proportional to their frequency. For evaluation, a balanced hidden test set (ByArticle) was provided by the organizers through TIRA. We used accuracy for performance evaluation, as it was the primary performance measure in the task; we also report precision, recall, and F1-score on the dataset. Table 2 shows the performance of our various approaches against the baseline results (the Task 4 baseline, semeval-pan-2019-baseline, on TIRA). From the results, we can observe that our best performing system outperformed the baseline by a large margin, recording an absolute improvement of 35.83% in accuracy. We can also see that our system is nearly as good as the best system submitted for the task (bertha-von-suttner), which has 82.17% accuracy; we were behind by only 0.16%.
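The classifier setup can be sketched with scikit-learn as follows. The synthetic `X`, `y` data here is a placeholder for the concatenated HF ⊕ SF article representations; the hyperparameters shown are standard defaults, not necessarily those tuned by the system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic imbalanced data standing in for the article representations.
rng = np.random.RandomState(0)
X = rng.randn(200, 20)
y = (X[:, 0] + 0.5 * rng.randn(200) > 0.3).astype(int)

clf = LogisticRegression(
    penalty="l2",               # L2 regularization
    class_weight="balanced",    # weights inversely proportional to class frequency
    solver="liblinear",
    max_iter=1000,
)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(round(float(scores.mean()), 3))  # mean 10-fold CV accuracy
```

`class_weight="balanced"` is scikit-learn's built-in way of reweighting classes by inverse frequency, matching the class-weighting described above.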
To assess the importance of each handcrafted feature, we performed experiments using individual features and their combination (HF). Table 2 highlights that all the features jointly perform well, with 70.22% accuracy. We found the bias lexicon and polarity-based features to be the most informative.
In an attempt to further improve the model's performance, we experimented with combining semantic features and handcrafted features, as described in Section 3.2. Our results show that handcrafted features alone are not sufficient and that better performance can be achieved by combining them with semantic features. USE+HF outperformed all our other models, showing an absolute improvement of 11.79% in accuracy over the HF model. It also attained 1st rank on the leaderboard based on F1-score and 2nd overall rank based on accuracy.
The poor performance of D2V+HF can be attributed to the training of D2V, as D2V typically requires a large amount of data for training. The results also show that USE+HF performed better than GloVe+HF, validating our earlier claim about the limitations of word-based embeddings.

Conclusion and Future Work
In this paper, we proposed a novel approach to detect hyperpartisan arguments in news articles. Our system ranked 2nd in SemEval-2019 Task 4. Our approach leverages rich semantic and handcrafted textual features. We have also studied the importance of capturing semantic relationships among the sentences of an article. Our system employed linguistic and lexical features to detect the polarity, sentiments, and blind beliefs exhibited in an article. Experiments with various model configurations demonstrated the effectiveness of our approach.
Detecting hyperpartisanship in news articles should also involve incorporating world knowledge, as individual statements may not be extremely biased but, when seen from a global perspective, turn out to be hyperpartisan. As future work, we would like to exploit external knowledge. We would also like to investigate the role of the credibility of news sources (Baly et al., 2018; Popat et al., 2018) in detecting hyperpartisan news articles.