In Plain Sight: Media Bias Through the Lens of Factual Reporting

The increasing prevalence of political bias in news media calls for greater public awareness of it, as well as robust methods for its detection. While prior work in NLP has primarily focused on the lexical bias captured by linguistic attributes such as word choice and syntax, other types of bias stem from the actual content selected for inclusion in the text. In this work, we investigate the effects of informational bias: factual content that can nevertheless be deployed to sway reader opinion. We first produce a new dataset, BASIL, of 300 news articles annotated with 1,727 bias spans and find evidence that informational bias appears in news articles more frequently than lexical bias. We further study our annotations to observe how informational bias surfaces in news articles by different media outlets. Lastly, a baseline model for informational bias prediction is presented by fine-tuning BERT on our labeled data, indicating the challenges of the task and future directions.


Introduction
News media exercises the vast power of swaying public opinion through the way it selects and crafts information (De Vreese, 2004; DellaVigna and Gentzkow, 2010; McCombs and Reynolds, 2009; Perse, 2001; Reynolds and McCombs, 2002). Multiple studies have identified the correlation between the increasing polarization of media and the general population's political stance (Gentzkow and Shapiro, 2010, 2011; Prior, 2013), underscoring the imperative to understand the nature of news bias and how to accurately detect it.
In the natural language processing community, the study of bias has centered around what we term lexical bias: bias stemming from content realization, or how things are said (Greene and Resnik, 2009; Hube and Fetahu, 2019; Iyyer et al., 2014; Recasens et al., 2013; Yano et al., 2010). Such forms of bias typically do not depend on context outside of the sentence and can be alleviated while maintaining the sentence's semantics: polarized words can be removed or replaced, and clauses written in active voice can be rewritten in passive voice.
Figure 1 (NYT excerpt): And on Saturday, one day before Attorney General William Barr released a short summary of Mueller's findings, former Texas Rep. Beto O'Rourke charged on the campaign trail in South Carolina that you ["have a president, who in my opinion beyond the shadow of a doubt, sought to, however [ham-handedly,] Trump collude with the Russian government - a foreign power - to undermine and influence our elections."] Trump
However, political science researchers find that news bias can also be characterized by decisions made regarding content selection and organization within articles (Gentzkow et al., 2015; Prat and Strömberg, 2013). As shown in Figure 1, though all three articles report on the same event, Huffington Post (HPO) and Fox News (FOX) each frame entities of opposing stances negatively: HPO states an assumed future action of Donald Trump as a fact, and FOX implies Democrats are taking advantage of political turmoil. Such bias can only be revealed by gathering information from a variety of sources or by analyzing how an entity is covered throughout the article.
arXiv:1909.02670v1 [cs.CL] 5 Sep 2019
We define these types of bias as informational bias: sentences or clauses that convey information tangential, speculative, or as background to the main event in order to sway readers' opinions towards entities in the news. Informational bias often depends on the broader context of an article, such as in the second FOX annotation in Figure 1: gathering new supporters would be benign in an article describing political campaign efforts. The subtlety of informational bias can more easily affect an unsuspecting reader, which presents the necessity of developing novel detection methods.
In order to study the differences between these two types of bias, we first collect and label a dataset, BASIL (Bias Annotation Spans on the Informational Level), of 300 news articles with lexical and informational bias spans. To examine how media sources encode bias differently, the dataset uses 100 triplets of articles, each reporting the same event from three outlets of different ideology. Based on our annotations, we find that all three sources use more informational bias than lexical bias, and informational bias is embedded uniformly across the entire article, while lexical bias is frequently observed at the beginning.
We further explore the challenges in bias detection and benchmark BASIL using rule-based classifiers and the BERT model (Devlin et al., 2019) fine-tuned on our data. Results show that identifying informational bias poses additional difficulty and suggest future directions of encoding contextual knowledge from the full articles as well as reporting by other media.

Related Work
Prior work on automatic bias detection based on natural language processing methods primarily deals with finding sentence-level bias and considers linguistic attributes like word polarity (Recasens et al., 2013), partisan phrases (Yano et al., 2010), and verb transitivity (Greene and Resnik, 2009). However, such studies fail to take into consideration biases that depend on a larger context, which is what we try to address in this work.
Our work is also in line with framing analysis in social science theory, or the concept of selecting and signifying specific aspects of an event to promote a particular interpretation (Entman, 1993). In fact, informational bias can be considered a specific form of framing where the author intends to influence the reader's opinion of an entity. The relationship between framing and news is investigated by Card et al. (2015), in which news articles are annotated with framing dimensions like "legality" and "public opinion." BASIL contains richer information that allows us to study the purpose of "frames," i.e., how biased content is invoked to support or oppose the issue at hand.
Research in political science has also studied bias induced by the inclusion or omission of certain facts (Entman, 2007; Gentzkow and Shapiro, 2006, 2010; Prat and Strömberg, 2013). However, their definition of bias is typically grounded in how a reader perceives the ideological leaning of the article and news outlet, whereas our informational bias centers around the media's sentiment towards individual entities. Furthermore, while previous work mostly uses all articles published by a news outlet to estimate their ideology (Budak et al., 2016), we focus on stories of the same events reported by different outlets.

BASIL Dataset Annotation
Using a combination of algorithmic alignment and manual inspection, we select 100 sets of articles, each set discussing the same event from three different news outlets. Ten sets are selected for each year from 2010 to 2019. We use, in order from most conservative to most liberal, Fox News (FOX), New York Times (NYT), and Huffington Post (HPO). Main events and main entities are manually identified for each article prior to annotation. The political leanings of the main entities (liberal, conservative, or neutral) are also manually annotated. See the Supplementary for details.
Annotation Process. To compare how the three media sources discuss a story, annotators treat each article triplet as a single unit without knowing media information. Annotations are conducted at both the document level and the sentence level. At the document level, annotators estimate the overall polarities of how the main event and main entities are covered, and rank the triplet's articles on the ideological spectrum with respect to one another. Before reading the articles, annotators specify their sentiment towards each main entity on a 5-point Likert scale. At the sentence level, annotators identify spans of lexical and informational bias by analyzing whether the text tends to affect a reader's feeling towards one of the main entities. In addition to the main dimension of bias type (lexical or informational), each span is labeled with the target of the bias (a main entity), the bias polarity (positive or negative towards the target), the bias aim towards the main target (direct or indirect), and whether the bias is part of a quote. Bias aim investigates the case where the main entity is indirectly targeted through an intermediary figure (see the HPO example in Figure 1, where the sentiment towards the intermediary entity "Trump Administration" is transferred to the main target, "Donald Trump"). Statistics are presented in Table 1.
Inter-annotator Agreement (IAA). Two annotators individually annotate each article triplet before discussing their annotations together to resolve conflicts and agree on "gold-standard" labels. We measure span-level agreement according to Toprak et al. (2010), where we calculate the F1 score of span overlaps between two sets of annotations (details are in the Supplementary). Although the F1 scores of IAA are unsurprisingly low for this highly variable task, the scores increase dramatically when agreement is calculated between individual annotations and the gold standard: from 0.34 to 0.70 for informational bias spans and from 0.14 to 0.56 for the sparser lexical spans, demonstrating the effectiveness of resolution discussions.
During the discussions, we noticed several trends that improved the quality of the gold standard annotations. First, the difficulty of being continually vigilant of one's own implicit bias would sometimes cause annotators to mark policies they disagreed with as negative bias (e.g., a liberal annotator might consider the detail that a politician supports an anti-abortion law as negative bias). Discussions allowed annotators to re-examine the articles from a more neutral perspective. Annotators also disagreed on whether a detail was relevant background or biasing peripheral information. During discussions, they performed comparisons to other articles of the triplet to make a final decision: if another article includes the same information, it is likely relevant to the main event. This strategy reiterates the importance of leveraging different media sources.
For overlapping spans, we find high agreement on the other annotation dimensions, with an average Cohen's κ of 0.84 for polarity and 0.92 for target main entity.

Contrasting the Bias Types
Informational bias outnumbers lexical bias. As shown in Table 1, the large majority of annotations in BASIL are classified as informational bias. One explanation for its prevalence is that journalists typically make a conscious effort to avoid biased language, but can still introduce informational bias, either intentionally or through negligence.
For both bias types though, negative bias spans are much more pervasive than positive spans, mirroring the well-established paradigm that news media in general focuses on negative events (Niven, 2001; Patterson, 1996).
Lexical bias appears early in an article. We further study differences in characteristics between lexical and informational annotation spans and find that the two bias types diverge in positional distributions. Figure 2 shows that a disproportionate amount of lexical bias is located in the first quartile of articles. A visual inspection indicates that this may be attributed in part to media sources' attempts to hook readers with inflammatory speech early on (e.g., FOX: "Paul Ryan stood his ground against a barrage of Biden grins, guffaws, snickers and interruptions.").
In contrast, informational bias is often embedded in context, and therefore can appear at any position in the article. This points to a future direction of bias detection using discourse analysis.
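As an illustration of the positional analysis, the following minimal sketch bins bias spans by the quartile of the article in which they occur. It assumes that position is measured by sentence index, which the text does not specify; the function names and the toy spans are hypothetical and are not taken from BASIL.

```python
from collections import Counter


def quartile_of(sentence_index: int, num_sentences: int) -> int:
    """Return 1-4: the quartile of the article that contains the sentence."""
    if not 0 <= sentence_index < num_sentences:
        raise ValueError("sentence_index out of range")
    return sentence_index * 4 // num_sentences + 1


def quartile_distribution(spans, num_sentences):
    """Map each bias type to its share of spans in quartiles 1 to 4.

    spans: list of (bias_type, sentence_index) pairs for one article.
    """
    counts = {}
    for bias_type, index in spans:
        quartile = quartile_of(index, num_sentences)
        counts.setdefault(bias_type, Counter())[quartile] += 1
    return {
        bias_type: [c[q] / sum(c.values()) for q in (1, 2, 3, 4)]
        for bias_type, c in counts.items()
    }


# Hypothetical toy article with 20 sentences (not real BASIL annotations).
toy_spans = [
    ("lexical", 0), ("lexical", 2), ("lexical", 11),
    ("informational", 1), ("informational", 7),
    ("informational", 12), ("informational", 18),
]
dist = quartile_distribution(toy_spans, num_sentences=20)
```

In this toy article the informational spans spread evenly over the four quartiles, while two of the three lexical spans fall in the first quartile. Aggregating such per-article shares over a corpus would give a distribution of the kind plotted in Figure 2.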
Quotations introduce informational bias. We also find that almost half of the informational bias comes from within quotes (48.7%), highlighting a bias strategy where media sources select opinionated quotes as a subtle proxy for their own opinions (see the second HPO and first NYT annotations in Figure 1).

Portrayal of Political Entities
At the document level, only 17 out of 100 article sets had reversed orderings (i.e., FOX marked as "more liberal" or HPO marked as "more conservative" within a triplet), confirming the ideological leanings identified in previous studies. Here, we utilize BASIL's span-level annotations to gain a more granular picture of how sources covering the same events control the perception of entities.
Concretely, we examine the polarity of bias spans with target entities of known ideology. As shown in Figure 3, for both bias types, the percentage and volume of negative coverage for liberal entities strongly correspond to the ideological leaning of the news outlet. Note that though NYT appears to have significantly more informational bias spans against conservatives than HPO, this is because NYT tends to have longer articles than the other two sources (see Table 1), which naturally results in more annotation spans by raw count. Moreover, the breakdown of lexical bias distinguishes FOX from the other two outlets: it comparatively has more negative bias spans towards liberals and fewer towards conservatives, even though all three outlets have more conservative entities than liberal ones across the 100 triplets (average of 99.0 conservatives, 72.7 liberals).

Experiments on Bias Detection
We study the bias prediction problem on BASIL as a binary classification task (i.e., whether or not a sentence contains bias) and as a BIO sequence tagging task (i.e., tagging the bias spans in one sentence at the token level). We benchmark the performance with rule-based classifiers and the popular BERT model (Devlin et al., 2019) on informational and lexical bias spans separately.
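The two task formulations can be illustrated with a minimal sketch that converts annotated spans into BIO tags and derives the binary sentence label from them. Whitespace tokenization and the (start, end) token-offset representation are assumptions made for illustration (BERT operates on WordPiece tokens), and the helper names are hypothetical. The example sentence is the FOX excerpt from Figure 1, in which "wasted no time" is an annotated span.

```python
def spans_to_bio(tokens, spans):
    """Convert token-index spans (end exclusive) into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span ({start}, {end})")
        tags[start] = "B"  # first token of a bias span
        for i in range(start + 1, end):
            tags[i] = "I"  # remaining tokens inside the span
    return tags


def sentence_label(tags):
    """Binary task: 1 if the sentence contains any bias span, else 0."""
    return int(any(tag != "O" for tag in tags))


tokens = "Democratic presidential candidates wasted no time Friday evening".split()
tags = spans_to_bio(tokens, [(3, 6)])  # the span "wasted no time"
label = sentence_label(tags)
```

Here `tags` marks "wasted" as B and "no time" as I, with every other token tagged O, so the sentence-level label is 1.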
Training Details. We utilize the pre-trained BERT-Base model and use the "Cased" version to account for named entities, which are important for bias detection. We run BERT on individual sentences and perform stratified 10-fold cross validation. The validation set is used to determine when to stop training and a held-out test set is used for the final evaluation of each fold. For the sentence-level classifiers, both our informational and lexical models use 6,819 sentences for training, 758 for validation, and 400 for testing. Due to the sparsity of our data, we train and test our token-level models only on sentences containing bias spans of the relevant bias type. Our informational and lexical bias sequence taggers use a train/val/test split of 1,043/116/62 sentences and 383/42/23 sentences respectively. Results are shown in Table 2.
Sentence-level Classifier. The fine-tuned BERT is better at predicting informational bias than lexical bias, likely because informational bias is better captured by sentence-level context. As a baseline, we select the 4 sentences in each article with the lowest average TF-IDF token scores as containing informational bias. The intuition is that sentences with different content than the rest of the article are more likely to contain extraneous information that the author chose to include to frame the story in a certain way. We find that this simple baseline performs relatively well considering the difficulty of the task, indicating the importance of explicitly modeling context. Future work may consider leveraging context in the entire article or articles on the same story by other media.
Token-level Classifier. From Table 2, we see that the BERT lexical sequence tagger produces better recall and F1 than the informational tagger, highlighting the additional difficulty of accurately identifying spans of informational bias. We also use the polarity and subjectivity lexicons from the MPQA website (Wilson et al., 2005; Choi and Wiebe, 2014) as a simple baseline for lexical bias tagging and find that these word-level cues, though widely used in prior sentiment analysis studies, are insufficient to fully capture lexical bias.
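The TF-IDF sentence baseline described above can be sketched as follows. The text does not state how TF-IDF is computed, so this sketch assumes term frequency is counted over the whole article and inverse document frequency over a background collection of articles, with lowercased whitespace tokens. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercased whitespace tokenization (an assumption, for illustration)."""
    return text.lower().split()


def idf_table(corpus):
    """Smoothed IDF over a corpus given as a list of articles (lists of sentences)."""
    n = len(corpus)
    df = Counter()
    for article in corpus:
        df.update({tok for sent in article for tok in tokenize(sent)})
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}


def lowest_tfidf_sentences(article, idf, k=4):
    """Return indices of the k sentences with the lowest average TF-IDF score."""
    tf = Counter(tok for sent in article for tok in tokenize(sent))
    total = sum(tf.values())
    default_idf = max(idf.values(), default=1.0)  # used for unseen tokens

    def avg_score(sentence):
        toks = tokenize(sentence)
        if not toks:
            return float("inf")  # never select empty sentences
        scores = (tf[t] / total * idf.get(t, default_idf) for t in toks)
        return sum(scores) / len(toks)

    ranked = sorted(range(len(article)), key=lambda i: avg_score(article[i]))
    return sorted(ranked[:k])


# Hypothetical toy data: the third sentence is off-topic for the article.
toy_article = ["wall wall wall", "wall border wall", "elephant tail"]
toy_corpus = [toy_article, ["wall border"], ["border vote"]]
idf = idf_table(toy_corpus)
flagged = lowest_tfidf_sentences(toy_article, idf, k=1)
```

Under these assumptions, tokens that characterize the article receive high scores, so the sentences flagged as biased are those whose vocabulary is least typical of the article, which matches the stated intuition.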
In order to evaluate token-level prediction on the larger original test set, we conduct a pipeline experiment with the fine-tuned BERT models where sentences predicted as containing bias by the best sentence-level classifier from cross validation are tagged by the best token-level model.The results reaffirm our hypothesis that while both tasks are extremely difficult, informational bias is more challenging to detect.

Conclusion
We presented a novel study on the effects of informational bias in news reporting from three major media outlets of different political ideology. Analysis of our annotated dataset, BASIL, showed the prevalence of informational bias in news articles when compared to lexical bias, and demonstrated BASIL's utility as a fine-grained indicator of how media outlets cover political figures. An experiment on bias prediction illustrated the importance of context when detecting informational bias and revealed future research directions.

A Sample Annotations
On the right, several sample annotations from the BASIL dataset illustrate some aspects of our annotation schema and highlight characteristics of informational bias.
Indirect Bias. Though not as prevalent as bias spans with direct aim, indirect aim is nevertheless important to study because readers may find it more difficult to detect bias consciously when it does not directly implicate the main entity. Indirect bias can be aimed through an intermediary ally or opponent, or may be based on contextual information. In each case, the sentiment towards the intermediary entity alters sentiment toward the main target entity. Figure 4a shows an example of indirect bias where Donald Trump is negatively targeted via the negative framing of an ally, Donald Trump Jr. Readers are required to know the relationship between the two men in order to notice the bias, and the information itself would be irrelevant to the article were it not for their relationship.
The span from HPO in Figure 4b shows an indirect bias span where contextual information unconnected to the rest of the article reflects negatively on Trump without mentioning him in the text. It requires several leaps in logical thinking: children and families seeking asylum are sympathetic :: turning them away is bad :: Trump wants a border wall :: Trump is framed negatively. This type of informational bias is difficult to detect algorithmically as there is no mention of Trump, the target main entity.
Informational Bias Strategies. Inspecting the informational bias spans in our dataset reveals several trends and strategies that journalists tend to use. The FOX and NYT spans in Figure 4b show the strategy where objective but tangential information frames the target in a negative light given the context of the article. The example from FOX uses nonessential background information to imply Trump is hypocritical, and the NYT example includes a detail peripheral to the main event that portrays Trump as rambling. Figure 4c is an example of subtle informational bias where the author's opinion masquerades as fact. The writing is in a neutral tone and appears objective, but it is actually the author's perception of the situation and uncovers their bias towards the topic. The span is categorized as informational bias rather than lexical because there is no way to rephrase or remove parts of the sentence without changing the overall meaning. This span is also an example of the rarer positive bias span.

B Data Collection
BASIL contains 100 triplets of articles, each with 3 articles about the same main event from the New York Times (NYT), Fox News (FOX), and the Huffington Post (HPO). According to Budak et al. (2016), FOX is considered strongly right leaning, NYT slightly left leaning, and HPO strongly left leaning. As an initial annotation set, 16 triplets of highly visible, polarizing events were directly selected from the media source websites by our annotators.
The remaining triplets were aligned algorithmically from the Common Crawl corpus. Articles with fewer than 200 words or more than 1,000 words were filtered out, and only political, non-editorial articles published within 3 days of each other were considered. Article similarity was calculated using the cosine similarity of the TF-IDF vectors of each article's title combined with its first 5 sentences. For each FOX article, the most similar NYT article was found, then the most similar HPO article was found using this pair. An annotator manually selected the final triplets from this list of automatically aligned triplets.
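The alignment procedure can be sketched as below, using only the standard library. Several details are assumptions, since the text leaves them open: the TF-IDF weighting, whitespace tokenization, representing publication dates as integer day numbers, and scoring an HPO candidate by the sum of its similarities to the FOX and NYT articles of the pair. All function names and the toy articles are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (dicts) over lowercased whitespace tokens."""
    docs = [Counter(text.lower().split()) for text in texts]
    n = len(docs)
    df = Counter(tok for doc in docs for tok in doc)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    return [{tok: cnt * idf[tok] for tok, cnt in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_triplets(fox, nyt, hpo, max_days=3):
    """Greedy alignment; each input is a list of (day, title_plus_lead) pairs."""
    vecs = tfidf_vectors([text for _, text in fox + nyt + hpo])
    fv = vecs[:len(fox)]
    nv = vecs[len(fox):len(fox) + len(nyt)]
    hv = vecs[len(fox) + len(nyt):]
    triplets = []
    for i, (fox_day, _) in enumerate(fox):
        nyt_cands = [j for j, (d, _) in enumerate(nyt)
                     if abs(d - fox_day) <= max_days]
        if not nyt_cands:
            continue
        j = max(nyt_cands, key=lambda c: cosine(fv[i], nv[c]))
        hpo_cands = [k for k, (d, _) in enumerate(hpo)
                     if abs(d - fox_day) <= max_days
                     and abs(d - nyt[j][0]) <= max_days]
        if not hpo_cands:
            continue
        # Assumption: score HPO candidates against both articles of the pair.
        k = max(hpo_cands,
                key=lambda c: cosine(fv[i], hv[c]) + cosine(nv[j], hv[c]))
        triplets.append((i, j, k))
    return triplets


# Hypothetical toy inputs: (day number, title combined with lead sentences).
fox = [(10, "mueller report release democrats demand"),
       (11, "trump declares border wall emergency")]
nyt = [(11, "border wall emergency declared by trump"),
       (10, "democrats demand full mueller report")]
hpo = [(10, "democrats want mueller report access"),
       (12, "trump border emergency wall fight")]
triplets = align_triplets(fox, nyt, hpo)
```

The candidate triplets produced this way would still need the manual selection step described above, since greedy matching can pair articles that share vocabulary but cover different events.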
Main event and entities were manually annotated for each article by one annotator. Articles in a triplet share the same main event, which the annotator produced after reading the leads of the three articles. Main entities sometimes differ across the triplet, as stories about the same event can emphasize different characters, but at least one main entity is consistent across each triplet. A single article contains an average of 2.04 main entities and at most five main entities.
During the annotation process, the order of articles is randomized within each triplet and annotators are not aware of the media source of the article. The entire dataset was annotated by three unique annotators.

C Inter-annotator Agreement
Our study of inter-annotator agreement consists of two parts: the agreement of the text spans selected and the agreement on the dimensions within each annotation span. To find text span agreement, a method similar to Toprak et al. (2010) is used in which precision, recall, and F1 are calculated between two annotators using the agreement metric from Wiebe et al. (2005), treating one annotator's spans as the gold standard and the other annotator's spans as the system. Results are calculated for exact matching, where the text spans must overlap exactly to be considered correct, and lenient matching, where text spans with any overlaps are considered correct (Somasundaran et al., 2008). Table 3 shows that span agreement is higher for spans of informational bias than for spans of lexical bias due to the sparsity of lexical bias in our dataset (see Table 1 in the main paper).
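A minimal sketch of the span agreement computation follows. It assumes spans are (start, end) character offsets with an exclusive end, and it reads the agreement metric as the fraction of one annotator's spans that are matched by at least one span of the other; this reading and all names and toy values are assumptions for illustration.

```python
def overlaps(a, b):
    """True if two (start, end) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def matched_fraction(spans, others, exact):
    """Fraction of `spans` matched by at least one span in `others`."""
    if not spans:
        return 0.0
    if exact:
        hits = sum(any(s == o for o in others) for s in spans)
    else:
        hits = sum(any(overlaps(s, o) for o in others) for s in spans)
    return hits / len(spans)


def span_agreement(gold, system, exact=False):
    """Precision, recall, and F1 with `gold` as reference and `system` as prediction."""
    precision = matched_fraction(system, gold, exact)
    recall = matched_fraction(gold, system, exact)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical spans from two annotators on the same article.
annotator_a = [(0, 10), (20, 30), (40, 50)]
annotator_b = [(5, 12), (20, 30)]
lenient = span_agreement(annotator_a, annotator_b, exact=False)
strict = span_agreement(annotator_a, annotator_b, exact=True)
```

For these toy spans, lenient matching gives precision 1.0, recall 2/3, and F1 0.8, while exact matching gives precision 0.5, recall 1/3, and F1 0.4, illustrating why the exact scores are lower than the lenient ones.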
Dimension agreement is reported in Table 4 only for lenient matching spans, as the results are not significantly different from those of exact matching spans. Cohen's κ is used to measure attribute agreement for target, polarity, and aim, and we find high levels of agreement for both polarity and target. Because of the metric's sensitivity to class imbalance, Cohen's κ is impractical for measuring the agreement on aim for one annotator pairing (B + C), which had fewer article triplets to resolve and for which nearly all overlapping lexical annotations were marked as direct (31 / 32 spans). To account for this imbalance, the percent agreement for all attributes is also included in Table 4.
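The sensitivity of Cohen's κ to class imbalance can be seen in a small sketch. The label sequences below are hypothetical and only mimic a 31-of-32 imbalance; they are not the actual B + C annotations, whose per-annotator breakdown is not given here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label distribution.
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both annotators use one shared label
    return (observed - expected) / (1 - expected)


def percent_agreement(labels_a, labels_b):
    """Share of items on which the two annotators chose the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Hypothetical imbalanced case: almost every span is marked "direct".
aim_a = ["direct"] * 31 + ["indirect"]
aim_b = ["direct"] * 32
imbalanced_kappa = cohens_kappa(aim_a, aim_b)
imbalanced_percent = percent_agreement(aim_a, aim_b)

# Hypothetical balanced case for comparison.
balanced_kappa = cohens_kappa(["pos", "neg", "pos", "neg"],
                              ["pos", "neg", "neg", "neg"])
```

In the imbalanced case the annotators agree on 31 of 32 spans (about 97%), yet κ is 0 because chance agreement is just as high. In the balanced case, 75% agreement yields κ of 0.5. This is the reason for reporting percent agreement alongside κ.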

D Javascript Annotation Tool
A Javascript-based tool was developed to annotate our dataset. Annotations created in the tool can be downloaded in JSON format and analyzed or imported at a later date. Users can highlight spans of text or select an entire sentence, then answer dimensional questions (see Figure 5b). Users can also answer document-level questions (see Figure 5c). Figure 5a shows the tool after annotations have been made, where blue spans are informational bias and green spans are lexical bias. In order to alleviate eye strain, annotations of the entire sentence are shown underlined rather than highlighted.

Main event: Democratic presidential candidates ask to see full Mueller report
Main targets: Donald Trump, Democratic candidates
HPO: Democrats want access to special counsel Robert Mueller's investigation into Russian interference in the 2016 presidential election [before President Donald Trump has a chance to interfere.] Trump ... Sen. Mark Warner said in a statement: ["Any attempt by the Trump Administration to cover up the results of this investigation into Russia's attack on our democracy would be unacceptable."] Trump
FOX: Democratic presidential candidates [wasted no time] Dems Friday evening demanding the immediate public release of the long-awaited report from Robert S. Mueller III. ... Several candidates, in calling for the swift release of the report, also [sought to gather new supporters and their email addresses] Dems by putting out ["petitions"] Dems calling for complete transparency from the Justice Department.

Figure 1: Examples of negative bias from Huffington Post (HPO), Fox News (FOX), and New York Times (NYT) discussing the same event. Informational bias and lexical bias are highlighted. The target of the bias is noted at the end of each span. Intermediary targets of indirect bias spans are underlined.

Figure 2: Distribution of lexical and informational bias spans found in each quartile of an article.The shaded area represents the 95% confidence interval for the three outlets combined.

Figure 3: Percentage of bias spans with negative polarity toward targets of known ideology, grouped by media source, bias type, and target's ideology. For example, in all HPO articles, there are 46 lexical bias spans targeting liberals, 78.26% of which are negative. Larger circle means greater number of spans. Darker color corresponds to higher ratio of negative spans.

Main Event: Trump reverses decision to allow import of elephant trophies
Main Entity: Donald Trump
NYT: On social media, photos were being shared of Mr. Trump's two elder sons hunting on safari in Zimbabwe, [including one photo that showed Donald Trump Jr. with a severed elephant tail in one hand and a knife in the other.] Trump
(a) Indirect negative informational bias against Donald Trump, using the intermediary entity Donald Trump Jr.

Main Event: Trump declares national emergency over border wall
Main Entity: Donald Trump
HPO: [Since 2014, a high proportion of those crossing have been Central American children and families seeking to make humanitarian claims such as asylum.] Trump
FOX: President Trump said Friday he is declaring a national emergency on the southern border ... [despite his criticisms of former President Barack Obama for using executive action.] Trump
NYT: Mr. Trump's announcement came during a freewheeling, 50-minute appearance ... [The president again suggested that he should win the Nobel Peace Prize, and he reviewed which conservative commentators had been supportive of him, while dismissing Ann Coulter, who has not.] Trump
(b) Example annotations showing negative informational bias from all three media sources for one article triplet.

Main Event: Raul Labrador challenges Kevin McCarthy for House majority leadership
Main Entities: Raul Labrador, Kevin McCarthy
HPO: [Labrador is an ambitious, sometimes savvy politician.] Labrador He is in Idaho this weekend chairing the state GOP convention.
(c) Example annotation of positive informational bias.

Figure 4: Excerpts showing different types of informational bias, annotated in italics. The target of the negative bias is noted at the end of each span. Underlined entities are intermediary targets in indirect bias spans.
(a) Tool with loaded annotations. Informational bias spans are shown in blue, lexical bias spans are shown in green.
(b) Tool with sentence-level annotation form.
(c) Document-level annotation form.

Figure 5: Our Javascript annotation tool at various steps.

Table 1: Descriptive statistics of the BASIL dataset. Mean and standard deviation shown where applicable. Annotation dimensions show raw counts and their percentage within the dimension in parentheses.

Table 3: Inter-annotator span agreement for lexical and informational bias. Dashes indicate that there were no exact matching lexical text spans between annotators B and C.

Table 4: Number of articles resolved by each annotator pairing, along with Cohen's κ and percent agreement for IAA on auxiliary dimensions for overlapping spans.