Detecting Political Bias in News Articles Using Headline Attention

Language is a powerful tool that can be used both to state facts and to express views and perceptions. Most of the time, we find a subtle bias towards or against someone or something. In politics, media houses and journalists are known to create bias through shrewd means such as misinterpreting reality and distorting viewpoints towards some parties. Such misinterpretation on a large scale can lead to the production of biased news and conspiracy theories. Automating bias detection in newspaper articles is therefore a worthwhile challenge for NLP research. We propose a headline attention network for this task. Our model has two distinctive characteristics: (i) its structure mirrors the way a person reads a news article, and (ii) it applies an attention mechanism to the article based on its headline, enabling it to attend to the content most critical for predicting bias. As the required datasets were not available, we created a dataset comprising 1329 news articles collected from various Telugu newspapers and annotated them for bias towards a particular political party. Experiments on this dataset demonstrate that our model outperforms various baseline methods by a substantial margin.


Introduction
News bias is a ubiquitous phenomenon, potentially present in most newspapers. The first step in challenging biased news is documenting the bias, so detecting the inclination of a news article towards a political party has gained attention. Such news articles are mostly selected and analyzed manually, using a process called coding or theoretical frameworks such as discourse analysis and content analysis. This analysis is time-consuming and requires considerable effort, concentration and attention to detail. Automating bias detection in news articles would therefore be very helpful for media verification.
Media bias can be observed and defined through various factors. In the political domain, it ranges from selectively publishing articles to deliberately highlighting certain events, parties and leaders. Bias can also be detected in articles through unclear assumptions, loaded language, or a lack of proper context. Especially during election campaigns, owing to several unjust factors, media houses often align themselves for or against specific parties and, instead of reporting just the content, subtly add their own stand on it. This stand is usually reflected in the headline, and a biased headline affects the reader, who goes on to read the article after subconsciously registering the headline. As no dataset annotated for political bias was available in Telugu, we created a dataset comprising 1329 news articles collected from various Telugu newspapers and annotated them for bias towards a political party. The bias is marked as None if the article is unbiased.
Telugu is an agglutinative Dravidian language spoken widely in two states of India, namely Telangana and Andhra Pradesh. According to the Ethnologue list of the most spoken languages worldwide, Telugu ranks fifteenth, with a total of 85 million native speakers across the world. There are only five major political parties in the two Telugu-speaking states. We treat political bias detection as a classification problem: the political parties are treated as labels, and the goal is to assign a label to each news article. Any news article that steers its reader away from the original news and towards a political party is considered biased. Traditional approaches to text classification represent documents with sparse lexical features, such as n-grams, and then use a linear model or kernel methods on this representation (Wang and Manning, 2012; Joachims, 1998). More recent approaches use deep learning, such as convolutional neural networks (Kalchbrenner et al., 2014) and recurrent neural networks based on long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997), to learn text representations.
Although neural-network-based approaches have been quite effective, classification based only on articles or only on headlines may not give the best results: articles may contain unnecessary extra information, and headlines, being short, may not capture all the required information. A combination of article and headline is therefore needed for better classification. In this paper, we test the hypothesis that classification can be improved by focusing on the essential parts of news articles based on their headlines. Since headlines are designed to be short and catchy, journalists tend to exploit them to express their ideological view of the news story, and the interpretation of the story can change depending on the headline. The intuition underlying our model is thus that bias in an article can be found effectively by focusing on the essential parts of the article as indicated by its headline.
Our contributions in this paper are (i) the creation and annotation of a newspaper dataset for political bias detection, and (ii) the proposal of a neural network architecture, the Headline Attention Network, that is designed to capture the parts of a news article that cause political bias by paying headline attention. Generally, readers first read the headline and then go through the news article with that headline in mind; attention is thus paid to the news article with its headline in the reader's mind. Headline Attention Networks are designed to do the same and to find the important parts that reflect bias in news articles. To illustrate, consider the example in Figure 1. In the figure, the importance of each highlighted word in causing bias is directly proportional to the intensity of the blue colour of its highlighting (translations, explanations and visualizations of headline attention are given in the Supplemental Material). Focusing on these words according to their importance should therefore give better results than focusing equally on all words. The key difference from other neural networks is that our system focuses on the importance of the headline for political bias detection in an article and discovers which sequences of tokens are relevant, rather than simply filtering tokens out. Our model outperforms various common classification architectures by a significant margin.

Related Work
The identification and analysis of bias in news articles has led to extensive research in the fields of anthropology, discourse analysis, and media studies. Sivandi and Dowlatabadi (2015) used the headlines and leads of newspaper articles to detect bias in their entirely linguistic approach to the problem. Iyyer et al. (2014) used recursive neural networks to detect political ideology.
Rashkin et al. (2017) introduced a dataset focused on propaganda news and presented a study of the language of news media in the context of political fact checking. Recasens et al. (2013) conducted a study of bias in Wikipedia articles using logistic regression.
Many organizations worldwide are working in this space to fight disinformation. First Draft News is a project "to fight mis- and disinformation online" founded by nine organizations brought together by the Google News Lab. Full Fact is a charity based in London that checks and corrects facts reported in the news. CrossCheck is an initiative from Google News Lab and First Draft to support truth and verification in media.
In Telugu, only a small amount of work has been done on news data. Mukku et al. (2016) applied ML techniques to sentiment analysis of Telugu news articles. Gangula and Mamidi (2018) performed multi-domain sentiment analysis in Telugu.

Dataset
We created a dataset containing the headline of each article, the article itself, and the political party towards which it is biased. We marked an article with the label "None" if it was unbiased. The statistics of the dataset are shown in Table 1.
Four annotators annotated each article in the dataset with one of the five parties, namely BJP, TDP, Congress, TRS and YCP, or with None if the article is unbiased. The annotators are native Telugu speakers with good proficiency in the language. While choosing annotators, care was taken that they had sufficient political knowledge and no bias towards any party. The following annotation guidelines were followed: each article, along with its headline, was presented to the annotators, who were asked to read it just as they would read a newspaper. After reading, they were asked to annotate whether the article was subjectively biased towards or against a party, or was unbiased. A Kappa score of 0.9 was achieved through multiple discussions. Figure 6 in the Supplemental Material shows some examples from our dataset. These examples show an inherent bias towards a party in the way a particular newspaper has reported the news. This could be due to several factors, such as the ownership of the media house, the present power of a party (ruling or opposition), and the ideology of the readership that the newspaper caters to. Often, political parties themselves establish media houses and newspaper agencies to increase their outreach and glorify their party. This greatly contributes to bias in the published articles.
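The agreement score reported for the four annotators can be illustrated with a small computation. The text does not say which Kappa statistic was used; the sketch below assumes Fleiss' kappa, a common choice when more than two annotators label every item. The function name `fleiss_kappa` and the toy annotations are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Label set taken from the annotation scheme: five parties plus "None".
LABELS = ["BJP", "TDP", "Congress", "TRS", "YCP", "None"]


def fleiss_kappa(annotations, labels=LABELS):
    """Fleiss' kappa for a list of items, each a list of one label per annotator.

    Assumes every item was labelled by the same number of annotators.
    """
    n_items = len(annotations)
    n_raters = len(annotations[0])
    counts = [Counter(item) for item in annotations]

    # Observed agreement: fraction of agreeing annotator pairs, per item.
    per_item = [
        (sum(c[label] ** 2 for label in labels) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each label.
    shares = [
        sum(c[label] for c in counts) / (n_items * n_raters) for label in labels
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:  # every annotation is the same single label
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (hypothetical annotations): one unanimous item, one split item.
toy = [["TDP", "TDP", "TDP", "TDP"], ["TDP", "TDP", "BJP", "BJP"]]
print(round(fleiss_kappa(toy), 3))
```

A score close to 1 indicates near-unanimous labelling after correcting for chance, which is the regime the reported value of 0.9 falls in.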

Headline Attention Networks
The overall architecture of the Headline Attention Network is shown in Figure 2. It consists of several parts: a headline encoder, an article encoder and a headline attention layer. We describe the details of these components below.

Model
In this work, we focus on classifying a given article as biased towards one of the political parties. Assume that the article has T words, where w_i with i ∈ [1, T] represents the i-th word in the article, and that the headline has H words, where q_i with i ∈ [1, H] represents the i-th word in the headline. The proposed model projects the raw article into a vector representation that can be used for classification. In the following, we present this method of projection.

Headline Encoder
Given the headline of an article with words q_i, i ∈ [1, H], we first embed the words into vectors through an embedding matrix W_e: x_i = W_e q_i. We use a bidirectional LSTM to obtain a contextual encoding of the headline from both directions. The bidirectional LSTM contains a forward LSTM →f, which reads the headline from q_1 to q_H, and a backward LSTM ←f, which reads the headline from q_H to q_1. We encode the headline by concatenating the forward representation →h_H and the backward representation ←h_1; the result, Q, is the representation of the article headline.

Article Encoder
An article is a sequence of words. We embed these words into vectors and use a bidirectional LSTM to obtain annotations of the words, summarizing information from both directions and thereby incorporating contextual information into each annotation. The annotation h_i of word w_i is obtained by concatenating the forward and backward hidden states at position i; h_i summarizes the neighboring words around w_i while still focusing on w_i.
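The two encoders can be written out compactly in standard bidirectional-LSTM notation. The block below is a sketch, not the paper's own equations. It assumes that both encoders share the embedding matrix W_e and that the headline summary Q joins the last state of each direction. The superscript q, the symbols y_i, →g and ←g, and the article-side hidden states are introduced here only to keep the headline and article encoders apart.

```latex
\begin{aligned}
% Headline encoder (words q_1, ..., q_H)
x_i &= W_e\, q_i, & i &\in [1, H] \\
\overrightarrow{h}^{q}_i &= \overrightarrow{f}(x_i), & i &\in [1, H] \\
\overleftarrow{h}^{q}_i &= \overleftarrow{f}(x_i), & i &\in [H, 1] \\
Q &= \bigl[\overrightarrow{h}^{q}_H ;\ \overleftarrow{h}^{q}_1\bigr] \\[4pt]
% Article encoder (words w_1, ..., w_T)
y_i &= W_e\, w_i, & i &\in [1, T] \\
\overrightarrow{h}_i &= \overrightarrow{g}(y_i), & i &\in [1, T] \\
\overleftarrow{h}_i &= \overleftarrow{g}(y_i), & i &\in [T, 1] \\
h_i &= \bigl[\overrightarrow{h}_i ;\ \overleftarrow{h}_i\bigr]
\end{aligned}
```

Under these assumptions, Q is a single vector summarizing the whole headline, while the article is kept as a sequence of T annotations h_1, ..., h_T over which attention can later be computed.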

Headline Attention Layer
The headline of a news article is very important when news is reported with a bias towards a political party, because a reader generally reads the headline first and then goes through the article with that headline in mind, i.e., pays attention to the article based on the headline. We introduce an attention mechanism to extract the words that contribute to political bias and form a vector representation v. Specifically, we measure the importance of the i-th word as the similarity of its hidden representation u_i with U, the hidden representation of the encoded headline Q, and obtain a normalized importance weight α_i through a softmax function. We then compute the representation v of the news article as the sum of the word annotations weighted by α_i. All the parameters involved are learned during the training process.
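The attention step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the hidden representations u_i and U are taken to be one-layer tanh projections of h_i and Q, and the similarity is taken to be a dot product. The text specifies neither choice, and the names `headline_attention`, `W_w`, `b_w`, `W_q` and `b_q` are hypothetical.

```python
import math


def tanh_layer(W, b, x):
    """One-layer projection tanh(W x + b); W is a list of rows."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(W, b)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def headline_attention(annotations, Q, W_w, b_w, W_q, b_q):
    """Return (alpha, v): attention weights and the article vector.

    annotations: word annotations h_1..h_T from the article encoder.
    Q: encoded headline representation.
    """
    U = tanh_layer(W_q, b_q, Q)                           # hidden headline representation
    us = [tanh_layer(W_w, b_w, h) for h in annotations]   # hidden word representations u_i
    # Assumed similarity: dot product between u_i and U.
    scores = [sum(a * b for a, b in zip(u, U)) for u in us]
    alpha = softmax(scores)                               # normalized importance of each word
    dim = len(annotations[0])
    # Article vector v: weighted sum of the word annotations.
    v = [sum(a * h[k] for a, h in zip(alpha, annotations)) for k in range(dim)]
    return alpha, v


# Toy usage with 2-dimensional annotations and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
alpha, v = headline_attention(
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], [1.0, 0.0],
    identity, zeros, identity, zeros,
)
print(alpha, v)
```

In the toy call, the first word annotation points in the same direction as the headline vector and therefore receives the largest weight, which is the behaviour the attention layer is meant to produce for bias-carrying words.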

Bias detection
The vector v is used to predict the political party towards which the article is biased. The training loss is the negative log-likelihood of the correct labels, where i denotes the label of document d.
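A minimal sketch of this prediction step and its loss follows. It assumes a single linear layer followed by a softmax over the six labels (five parties plus None), which is the usual pairing with a negative log-likelihood loss. The text does not give the exact form, and the names `predict_proba`, `nll_loss`, `W_c` and `b_c` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(v, W_c, b_c):
    """Class probabilities softmax(W_c v + b_c) for an article vector v."""
    logits = [
        sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W_c, b_c)
    ]
    return softmax(logits)


def nll_loss(probs_per_doc, gold_labels):
    """Negative log-likelihood of the correct label i of each document d."""
    return -sum(math.log(p[i]) for p, i in zip(probs_per_doc, gold_labels))


# Toy usage: an untrained (all-zero) classifier over 6 labels is uniform.
v = [0.3, -0.2]
W_c = [[0.0, 0.0] for _ in range(6)]
b_c = [0.0] * 6
probs = predict_proba(v, W_c, b_c)
print(probs, nll_loss([probs], [2]))
```

With all-zero weights every label receives probability 1/6, so the loss for one document is log 6; training lowers it by moving probability mass onto the correct label.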

Experiments
All experiments are carried out in a 5-fold cross-validation setting. As headlines express the ideological view of news stories, in some cases the headline alone may be sufficient to detect bias. So, for all baselines (that is, for every model except the Headline Attention Network), we considered three divisions of the dataset:
1. Only headline.
2. Only news article.
3. Concatenation of both headline and news article.
We compared how each of them differs in bias detection.
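The evaluation protocol for the baselines can be sketched as follows. The sketch assumes that the three divisions are headline only, article only, and headline concatenated with article (the article-only setting is inferred here), and that folds are built by a seeded shuffle. The text does not say whether folds were stratified, and `make_views` and `k_fold_indices` are hypothetical helper names.

```python
import random


def make_views(headline, article):
    """Build the three baseline inputs for one news item."""
    return {
        "headline": headline,
        "article": article,
        "headline+article": headline + " " + article,
    }


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # seeded, so splits are repeatable
    folds = [indices[i::k] for i in range(k)]   # k folds of nearly equal size
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


# Usage: 5 folds over the 1329 articles of the dataset.
for train, test in k_fold_indices(1329, k=5):
    print(len(train), len(test))
```

Each baseline would then be trained and scored once per fold and per view, and the fold accuracies averaged.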

Baselines
We compare Headline Attention Networks with several baseline methods, including traditional approaches such as Naive Bayes and SVMs, and neural approaches such as CNNs, Branched CNNs, LSTMs and GRUs. Word embeddings are available for Telugu.

Naive Bayes
A Naive Bayes classifier is used to classify documents using the following features.
TFIDF The TFIDF value of each word is used as a feature.
Bag-of-means The average word2vec (Mikolov et al., 2013) embedding is used as the feature set.
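The bag-of-means feature can be illustrated with a short sketch. It assumes the embeddings are available as a plain dictionary from token to vector and that out-of-vocabulary tokens are simply skipped; the text does not describe how unknown words are handled, and `bag_of_means` is a hypothetical name.

```python
def bag_of_means(tokens, embeddings, dim):
    """Average the embedding vectors of the tokens found in the vocabulary."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:                 # no known token: fall back to a zero vector
        return [0.0] * dim
    return [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]


# Toy usage with 2-dimensional embeddings (hypothetical values).
toy_embeddings = {"a": [1.0, 0.0], "b": [3.0, 2.0]}
print(bag_of_means(["a", "b", "unknown"], toy_embeddings, dim=2))
```

The resulting fixed-length vector is what a classifier such as Naive Bayes or an SVM receives in place of the variable-length document.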

SVMs
An SVM-based classifier is used with the following features.
TFIDF+Unigrams The TFIDF values of the bag of unigrams are used as features.
TFIDF+Bigrams The TFIDF values of the bag of bigrams are used as features.
AverageSG The average of the word embeddings of each document is used as the feature set.
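The TFIDF features over unigrams and bigrams can be sketched as below. The sketch uses one common variant (relative term frequency multiplied by log(N / document frequency)). The exact weighting used in the experiments is not stated, and `ngrams` and `tfidf_features` are hypothetical names. Documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_features(docs, n=1):
    """Sparse TFIDF features (dict per document) over n-grams.

    Weight = (count / document length) * log(N / document frequency).
    """
    grams_per_doc = [ngrams(doc, n) for doc in docs]
    doc_freq = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    features = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        features.append({
            g: (c / len(grams)) * math.log(n_docs / doc_freq[g])
            for g, c in counts.items()
        })
    return features


# Toy usage: "a" occurs in every document, so its weight is zero.
docs = [["a", "b"], ["a", "c"]]
print(tfidf_features(docs, n=1))
print(tfidf_features(docs, n=2))
```

Terms that appear in every document get zero weight under this variant, so the classifier is driven by the terms that distinguish one article from another.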

Neural Network methods
We experimented with multiple neural network architectures:
CNNs A word-based neural network model as in Kim (2014) is used.
Branched CNNs Figure 3 shows the branched CNN architecture.
LSTMs and GRUs LSTM- and GRU-based models as in Wang et al. (2018) are used.

Table 2: Bias detection accuracy in percentage. Maximum is the best value among the three divisions of our dataset for baselines.

Results and analysis
The experimental results are shown in Table 2. Our model gives the best performance, outperforming the best baseline method by 4.22%. Table 2 also shows that neural-network-based methods improve significantly over traditional methods, and that incorporating headline attention improves further over them. As mentioned earlier, headlines are designed to be short and catchy, so journalists tend to exploit them to influence readers. Consequently, using only the headline predicts bias with only a small difference in accuracy compared to using the whole article; this can be clearly observed for the neural network methods in Table 2. We can also observe that simply concatenating the headline to the article does not help much in bias prediction, whereas attending to the article with the headline representation increases accuracy by a significant margin. Our Headline Attention Network outperforms all other models because it effectively identifies the important words causing bias in a document.

Conclusion
In this paper, we proposed a headline attention mechanism for the automatic detection of bias in news articles, along with a manually annotated dataset to enable further research. Our model builds a vector for a news article by aggregating the important words identified by paying attention based on the headline representation. The experimental results demonstrate that our model significantly outperforms all the baseline models. Visualization of attention shows how headline attention effectively picks out words causing bias. This model can also be extended to other sentiment-based classification of texts that contain a title or headline and a body, such as blogs or trending online articles.

A Supplemental Material
A.1 Visualization of Headline Attention
Figures 4 and 5 show visualizations of our headline attention networks. The intensity of the blue colour denotes the word weight: the darker the blue, the higher the word's importance in predicting bias towards a party. Figures 4 and 5 show that our model selects words with a strong emphasis on a person or a political party. Words with the darkest blue highlighting, such as "YSRCP", "Chandra Babu" and "People's leader", are the most important ones, as they refer to who or what the article intends to inform about, so they are given more weight. The English translations of the words in blue are "Chandra Babu", "progress", "inspiration", "strongest person on earth", "special", "encourage", etc. Our headline attention focuses most on "Chandra Babu", who is the chairperson of the TDP political party, and the other words are attended to according to the intensity of praise.
Approximate translation of Figure 4:
Headline: Path of welfare
Article: On Friday, YSR Congress chief Jaganmohan Reddy successfully carried out the fulfillment of people's desires. The main goal of the walk is the welfare and betterment of the people of the state, and people participated with a lot of excitement and offered immense support to the leader. The leader of the masses was given a warm welcome by the people, who had waited for hours just to see him. The people were very eager and enthusiastic to see him, meet him, greet him and be addressed by him in his public talk. The leader of the masses, with a constant smile on his face, also greeted the people affectionately, spoke with them to find out about the problems they are currently facing, offered them support and assured them that he is always with the people in any kind of need.
Approximate translation of Figure 5:
Headline: Chandra Babu Naidu praised by New York Times
Article: The step taken by Chandra Babu is now an inspiration for all other states. The measures taken by Chandra Babu regarding organic farming are exceptionally great and are receiving great applause from various environmentalists. The new scheme called Zero Budget Natural Farming, introduced by Chandra Babu, mainly encourages farmers to adopt organic farming techniques and is the main reason for farmers to have hope in their lives. The same has even been published in the New York Times. The effort put in by Chandra Babu to encourage farmers in chemical-free farming is truly worthy of appreciation.