Fermi at SemEval-2019 Task 4: The sarah-jane-smith Hyperpartisan News Detector

This paper describes our system (Fermi) for Task 4 of SemEval-2019: Hyperpartisan News Detection. We use simple text classification algorithms after transforming the input features into a reduced feature set. We aim to find the number of features that is most useful for efficient classification, and we explore multiple training models to evaluate the performance of these text classification algorithms. The model of our team, Fermi, achieved an accuracy of 59.10% and an F1 score of 69.5% on the official test data set. In this paper, we provide a detailed description of the approach as well as the results obtained in the task.


Introduction
Hyperpartisan refers to the tendency of a person or a group to be extremely partisan, that is, biased towards a person or a group, specifically towards a political figure or a political party. With the tremendous increase in citizen-based journalism, where anyone can create a website and post their (biased) views, a new phenomenon has emerged: fake news. Fake news can potentially affect election results and can shape the public's perception of various people, companies and political parties. This kind of 'news' is usually one-sided, inflammatory and emotional, and is mostly built on untruths. Combined with the proliferation of social media platforms, these fake news signals get amplified and may mask the signal of real news. The hype around fake news has caused irreparable losses to many politicians and companies, and in some cases has involved the deaths of fellow citizens.
While social media platforms can be used to share constructive ideas, they also allow a small group of people to broadcast their views, including hatred of or affinity for an individual, a group or a race, to the entire world within seconds. This calls for computational methods that identify hyperpartisan news in user-generated content.
Using computational methods to identify hyperpartisan news has been gaining attention in recent years, as evidenced by Potthast et al. (2018).

Related Work
In this section, we briefly describe other work in this area.
Hyperpartisan news detection is a new area and, to the best of our knowledge, little work has been done on it. However, a closely related task is fake news detection. Pérez-Rosas et al. (2017) use linguistic features to distinguish between fake and legitimate news content. Wang (2017) collects a decade-long set of manually labelled short statements in various contexts from a political fact-checking website and builds fake news classifiers using surface-level linguistic patterns. Tschiatschek et al. (2018) leverage crowd signals for detecting fake news. Long et al. (2017) tackle the problem of fake news through multi-perspective speaker profiles.
Papers published in the last two years include the surveys by Zhou and Zafarani (2018), Zhou et al. (2019) and Shu et al. (2017), as well as the paper by Kumar and Shah (2018).
A shared task on hyperpartisan news detection (Kiesel et al., 2019) was announced as part of the annual SemEval-2019 workshop. Given the text of a news article, the task is to decide whether it follows a hyperpartisan argumentation, i.e., whether it exhibits blind, prejudiced, or unreasoning allegiance to one party, faction, cause, or person.

Methodology and Data
The data collection methods used to compile the hyperpartisan news detection data set are described in Kiesel et al. (2019). We formulate the problem of identifying whether a piece of news is hyperpartisan as a text classification problem, and we use a bag-of-words representation to transform the individual documents into vectors.
After this transformation, we reduce the number of dimensions using chi-square feature selection. In this method, the chi-square statistic between every feature variable and the target variable is computed to test whether a relationship exists between them. If the target variable is independent of a feature variable, that feature is not useful for prediction; if the two are dependent, the feature is potentially informative, so features are ranked by their chi-square score and the top-scoring ones are kept. In text classification, feature selection is the process of selecting a specific subset of the terms in the training set and using only those terms in the classification algorithm; it takes place before the classifier is trained.
We use the Random Forest classifier from the scikit-learn machine learning library to build models on these reduced features. The number of estimators in all experiments is 20; all other parameters are left at their defaults.
Results for different numbers of selected features are reported and discussed in the results section.
We did not use any external data sets to augment the training data. Table 2 shows the dev set macro-averaged F1 score and accuracy for different numbers of selected features. The best performance was obtained by the Random Forest model using 1000 features. We submitted this model for evaluation on the test data; Table 4 shows the results.
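The dev-set comparison across feature counts can be sketched as a sweep over k: one pipeline (bag-of-words, chi-square top-k selection, Random Forest with 20 estimators) is trained per value, and macro-averaged F1 and accuracy are recorded. This is a minimal sketch under stated assumptions: the toy train and dev documents and the candidate k values are hypothetical, and `random_state=0` and `zero_division=0` are our additions for repeatability and to silence warnings on tiny data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import Pipeline

# Hypothetical toy train/dev split (1 = hyperpartisan, 0 = not).
train_docs = [
    "corrupt elite betray real patriots",
    "radical traitors destroy our great nation",
    "treasonous opposition spreads outrageous lies",
    "corrupt traitors betray the nation again",
    "the committee published its budget report",
    "officials discussed infrastructure funding today",
    "the council approved the school schedule",
    "the agency released quarterly employment figures",
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
dev_docs = ["corrupt radical elite spread lies", "the council published its funding report"]
dev_y = [1, 0]


def evaluate(k):
    """Train bag-of-words -> chi2 top-k -> random forest; return (macro F1, accuracy) on dev."""
    model = Pipeline([
        ("bow", CountVectorizer()),
        ("chi2", SelectKBest(score_func=chi2, k=k)),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ])
    model.fit(train_docs, train_y)
    pred = model.predict(dev_docs)
    macro_f1 = f1_score(dev_y, pred, average="macro", zero_division=0)
    return macro_f1, accuracy_score(dev_y, pred)


# Sweep over the number of selected features ("all" keeps the full vocabulary).
results = {}
for k in (2, 5, 10, "all"):
    results[k] = evaluate(k)
    print(k, results[k])

# Pick the feature count with the best dev macro F1.
best_k = max(results, key=lambda key: results[key][0])
print("best k:", best_k)
```

On the real task, the toy lists would be replaced by the training and validation articles, with 1000 among the candidate values of k.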
More generally, this work shows how the number of selected features affects the performance of the classification task.

Future Work
Due to some constraints of the TIRA platform, we were unable to use state-of-the-art deep learning techniques for text classification, which have gained immense popularity in the past few years. In the future, we would like to explore transfer learning and deep learning algorithms to build models for this task and evaluate their performance.