Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News

Although many fact-checking systems have been developed in academia and industry, fake news still proliferates on social media. These systems mostly focus on fact-checking itself but usually neglect the online users who are the main drivers of the spread of misinformation. How can we use fact-checked information to raise users' awareness of the fake news to which they are exposed? How can we stop users from spreading fake news? To tackle these questions, we propose a novel framework to search for fact-checking articles that address the content of an original tweet (which may contain misinformation) posted by online users. The search can directly warn fake news posters and other online users (e.g. the posters' followers) about misinformation, discourage them from spreading fake news, and scale up verified content on social media. Our framework uses both text and images to search for fact-checking articles, and achieves promising results on real-world datasets. Our code and datasets are released at https://github.com/nguyenvo09/EMNLP2020.


Introduction
The rampant spread of biased news, partisan stories, false claims and misleading information has raised heightened societal concerns in recent years. Many reports have pointed out that fabricated stories possibly caused citizens' misperception of political candidates (Allcott and Gentzkow, 2017), manipulated stock prices (Kogan et al., 2019) and threatened public health (Ashoka, 2020; Alluri, 2019).
The proliferation of misinformation has provoked the rise of fact-checking systems worldwide. Since 2014, the number of fact-checking outlets has increased by 400% across 60 countries (Stencel, 2019). However, fabricated stories and hoaxes still pervade our cyberspace. Fig. 1 shows an example of a fake quote attributed to Barack Obama. The quote had been debunked by Snopes (Emery, 2016) on September 08, 2016, but two months later it appeared again inside an original tweet posted by a Twitter user (called an original poster) and was retweeted over 28 thousand times. Perhaps the original poster and the people who shared the original tweet did not know that it had been fact-checked, or they shared it simply because it suited their personal preferences or ideologies (Lewandowsky et al., 2012). In other words, existing fact-checking systems mainly focus on detection but neglect the online users who play the critical role in spreading fake news. After detecting fake news, what are the next steps to discourage people from sharing it? Recent studies (Vo and Lee, 2018, 2019) tried to address this weakness. However, these approaches are not proactive since they rely on fact-checkers who may be unreliable. Recent work showed that when seeing fact-checked information, users' likelihood of deleting their shares of fake news went up 400% (Friggeri et al., 2014), and 95% of the time users did not further consume or go through fake news (CNN, 2020). Observing the downsides of existing methods and the impact of broadcasting verified news, our goal is to search for fact-checking articles (FC-articles) which address the content of original tweets (i.e. confirming, supporting, debunking or refuting them). We show a mock-up of how a relevant FC-article is linked/displayed for a given original tweet in Fig. 1. By searching for FC-articles and incorporating fact-checked information into social media posts, we can warn users (e.g. followers of original posters) about fake news to which they are exposed.
The search also proactively scales up the volume of verified content on social media. However, achieving this goal is challenging since we need to solve two problems: (P1) what information in original tweets should we use to find the correct FC-articles? and (P2) how can we design a framework to retrieve and rank FC-articles?
For the first problem (P1), we could use original tweets' text alone to find FC-articles. However, this approach is suboptimal since fake news can appear in many forms (e.g. text, images, videos) (Friggeri et al., 2014; O'Brien, 2018), as shown in Fig. 1. Thus, we propose to use both the text and images of original tweets to search for FC-articles. Regarding the second problem (P2), we propose a framework consisting of two key steps: (1) using a basic retrieval model (i.e. BM25) to find initial lists of candidate FC-articles and then (2) re-ranking the initial lists with advanced models. In the first step, since original tweets' text may be insufficient to find correct articles, as in Fig. 1 where the meaningful information is in the image rather than the text, we propose to expand original tweets' text with text extracted from their images. In the second step, we propose an attention mechanism to focus on key textual matching signals and jointly integrate them with visual information to boost ranking quality. By tackling these issues, our contributions are as follows:
• To the best of our knowledge, our study is the first to search for fact-checking articles in order to increase users' awareness of fact-checked information when they are exposed to fake news.
• We propose a novel neural ranking model which jointly utilizes textual and visual matching signals, integrated with a novel attention mechanism.
• Experiments on two datasets demonstrate the effectiveness and generality of our model over state-of-the-art retrieval techniques.

Related Work
Fake News and Fact-checking. Fake news detection methods mainly use linguistic and textual content (Zellers et al., 2019; Zhao et al., 2015; Shu et al., 2019), temporal spreading patterns (Ma et al., 2018), network structures (Liu et al., 2020) and users' feedback (Vo and Lee, 2019, 2020; Shu et al., 2019). Studies about multimodal fake news detection (Gupta et al., 2013; Wang et al., 2018b) are different from ours since their inputs are the text and images of tweets while our inputs are pairs of a multimodal tweet and a FC-article.
Our work is closely related to evidence-aware fact-checking. Thorne et al. (2018) and Nie et al. (2019) built pipelines to find documents and sentences to fact-check mutated claims generated from Wikipedia pages, and Wang et al. (2018a) aimed to find webpages related to given FC-articles and predict their stances on claims in the FC-articles. Popat et al. (2018) focused only on fact-checking, and Shaar et al. (2020) detected previously fact-checked claims. Our paper deviates from these works since we aim to find FC-articles given multimodal fake news in social media posts. As our goal is to increase users' awareness of verified news, studies about fact-checkers (Vo and Lee, 2018, 2019; You et al., 2019) are close to ours.
Neural Ranking Models for Text Search. Neural ranking models for text search mainly fall into two groups: semantic matching and relevance matching models. The former seeks to learn representations of a query and a document and measure their similarity (Huang et al., 2013; Shen et al., 2014; Severyn and Moschitti, 2015; Nie et al., 2019; Zhu et al., 2019), while the latter (Xiong et al., 2017; Hui et al., 2018; Dai et al., 2018) aims to capture relevant matching signals between a query and a document based on word interactions. Some methods unify the two categories, such as Mitra et al. (2017) and Rao et al. (2019a). Our model can be viewed as a relevance matching method in which a novel attention mechanism is designed to focus on crucial word interactions.
Neural Models for Multimodal Retrieval. Multimodal data (e.g. text and images) are used in cross-modal retrieval (Cao et al., 2016; Balaneshin-kordan and Kotov, 2018; Chen et al., 2016), the visual Q&A task (Kim et al., 2018), product search (Laenen et al., 2018; Guo et al., 2018) and so on. Our work is the first to use multimodal data in social media posts to search for verified information.

Our framework
Given an original tweet q and a FC-article d, where every original tweet q contains text and images and the article d contains text and/or images, we aim to derive a function f(q, d) which determines their relevancy (relevance means that the fact-checking article fact-checks the query). We use f(q, d) to rank all FC-articles.
Following Thorne et al. (2018), we adopt a re-ranking methodology: (1) quickly retrieving candidate FC-articles/documents for each original tweet/query with a basic retrieval model, and (2) re-ranking the candidates with our MAN (Multimodal Attention Network), as shown in Fig. 2. We use fact-checking articles, articles and documents interchangeably, and likewise original tweets and queries. We describe our input representations, basic retrieval and MAN in the following subsections.
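The two-step methodology above can be sketched as follows; `cheap_score` and `rerank_score` are illustrative placeholders for BM25 and MAN, not names from the released code:

```python
def retrieve_then_rerank(query, articles, cheap_score, rerank_score, k=50):
    """Stage 1: keep the top-k articles under a cheap scorer (e.g. BM25).
    Stage 2: re-order only those k candidates with an expensive scorer (e.g. MAN)."""
    candidates = sorted(articles,
                        key=lambda d: cheap_score(query, d),
                        reverse=True)[:k]
    return sorted(candidates,
                  key=lambda d: rerank_score(query, d),
                  reverse=True)
```

This keeps the expensive model's cost proportional to k rather than to the size of the full article collection.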

Input Representations
We denote the text and images of an original tweet q as (q_text, q_images), where q_text is a sequence of N words w^q_1, ..., w^q_N and q_images is a set of X images. Similarly, the text and images of a fact-checking article d are denoted as (d_text, d_images), where d_text is a sequence of M words w^d_1, ..., w^d_M and d_images is a set of Y images.

Basic Retrieval
We use BM25 as the basic retrieval due to its good performance compared with several ranking models (McDonald et al., 2018; Pang et al., 2017). Since using tweets' text alone may be insufficient to find relevant articles, we expand queries' text with text extracted from images. For example, in Fig. 1, the text extracted from the image is Breaking News: Obama: "I won't leave if Trump is elected". Following Vosoughi et al. (2018), we use an OCR tool (OCR Space, 2020) to extract text from images. To our knowledge, our work is the first to use text in images to find verified information.
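A toy sketch of BM25 retrieval with OCR-based query expansion (the BM25-TI idea); the scoring formula below is a standard BM25 variant and the whitespace tokenization is a simplification, not the paper's exact setup:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every pre-tokenized document against the query with plain BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # document frequency of each term
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def expanded_query(tweet_text, ocr_text):
    """Query expansion used by BM25-TI: append the OCR'd image text
    to the tweet's own text before retrieval."""
    return (tweet_text + " " + ocr_text).split()
```

With the Fig. 1 example, the tweet text alone matches almost nothing, but the OCR'd quote inside the image pulls the relevant article to the top.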

Projection Layers
We use two projection layers: one for Glove embeddings and the other for contextual word embeddings.
Projection layer for Glove embeddings. Each word w, which can be w^q_i or w^d_j, is mapped into a vector t ∈ R^300 by a fixed word-embedding layer initialized with Glove embeddings (Pennington et al., 2014). The vector t is then projected into g ∈ R^P by a trainable linear layer, as shown in Eq. 1:

g = W_1 t + b_1    (1)

where W_1 ∈ R^{P×300} and b_1 ∈ R^P; P is the projection dimension. After the linear layer, we denote g^q_i ∈ R^P and g^d_j ∈ R^P as the representations of words w^q_i and w^d_j, respectively.

Projection layer for contextual word embeddings. Since Glove embeddings do not reflect the context of words in queries and articles, we integrate ELMo (Peters et al., 2018) as a static encoder to generate contextual word embeddings. ELMo maps each word w, which can be w^q_i or w^d_j, into a vector e ∈ R^1024, which is then projected into h ∈ R^P by a trainable linear layer, as shown in Eq. 2:

h = W_2 e + b_2    (2)

where W_2 ∈ R^{P×1024} and b_2 ∈ R^P. After the linear layer, we denote h^q_i ∈ R^P and h^d_j ∈ R^P as the contextual representations of words w^q_i and w^d_j, respectively.
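The two projection layers amount to independent linear maps into a shared P-dimensional space. A numpy sketch, with random placeholder weights standing in for the learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
P = 8  # projection dimension (the paper tunes this, e.g. 256)
# Trainable parameters in the real model; random placeholders here.
W1, b1 = rng.normal(size=(P, 300)) * 0.01, np.zeros(P)    # for 300-d Glove
W2, b2 = rng.normal(size=(P, 1024)) * 0.01, np.zeros(P)   # for 1024-d ELMo

def project_glove(t):
    """Eq. 1: map a fixed 300-d Glove vector t into g in R^P."""
    return W1 @ t + b1

def project_elmo(e):
    """Eq. 2: map a fixed 1024-d ELMo vector e into h in R^P."""
    return W2 @ e + b2
```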

Textual Matching Layer
We derive (1) Glove embedding interactions, (2) an attended interaction matrix and (3) contextual word embedding interactions, and feed them to convolutional neural networks (CNNs) for feature extraction.

Glove Embeddings Interactions. An article may be relevant to an original tweet if they have overlapping or similar words. To capture such signals, we use cosine similarity to derive matrix S ∈ R^{N×M}, as shown in Eq. 3:

S_ij = cos(g^q_i, g^d_j)    (3)
Let us look at an example of matrix S in Fig. 4(a), where the x-axis is an article and the y-axis is a query. Roughly speaking, matrix S looks like a gray-scale image in which the overlapping phrase 'at a costume party' forms a segment at the bottom of the image, suggesting the article is relevant to the query. CNNs are widely used to capture such patterns.

Attended Interaction Matrix. Matrix S captures overlapping words between a query and an article. However, even when word w^q_i is the same as word w^d_j, they sometimes do not have the same meaning. Thus, we need an attention mechanism to avoid over-reliance on the raw similarities in matrix S. Inspired by Tay et al., we use the distance between contextual word embeddings h^q_i and h^d_j to derive an attention matrix G, as shown in Eq. 4:

G_ij = 2σ(−‖h^q_i − h^d_j‖₂)    (4)

where σ is the sigmoid function. Since the distance is non-negative, σ(−‖h^q_i − h^d_j‖₂) will be in (0, 0.5] and G_ij will be in (0, 1]. Therefore, we can use G_ij to attend to S_ij, as shown in Eq. 5:

A_ij = G_ij · S_ij    (5)

Clearly, when the distance between h^q_i and h^d_j is large, G_ij will be close to 0, which downgrades the impact of S_ij. From Eq. 5, we form the attended interaction matrix A ∈ R^{N×M}. To our knowledge, our work is the first to use dissimilarity between contextual word embeddings to attend to interactions of Glove embeddings.

Contextual Word Embeddings Interactions. In our case studies in Section 6.5, we find that contextual word embeddings are able to capture high similarity between a typo and a normal word (e.g. hillar vs. hillary) while Glove embeddings fail to do so. To further exploit contextual embeddings, we derive matrix C ∈ R^{N×M}, as shown in Eq. 6:

C_ij = cos(h^q_i, h^d_j)    (6)

Again, we can view matrix C as a gray-scale image, as shown in Fig. 4(d). In addition to cosine similarity, we found that a bilinear function (Rao et al., 2019a) also works well.
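The matrices S, G, A and C can be sketched in numpy as follows; the gate 2·sigmoid(−distance) is our reading of the attention construction, chosen so that G_ij falls in (0, 1] as the text states, and is not taken verbatim from the released code:

```python
import numpy as np

def cosine_matrix(Q, D):
    """Pairwise cosine similarities between rows of Q (N, P) and D (M, P)."""
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    return Qn @ Dn.T

def attended_matrices(Gq, Gd, Hq, Hd):
    """Sketch of Eqs. 3-6: S from Glove vectors, gate G from contextual
    distances, attended matrix A = G * S (elementwise), C from ELMo vectors."""
    S = cosine_matrix(Gq, Gd)                                   # Eq. 3
    dist = np.linalg.norm(Hq[:, None, :] - Hd[None, :, :], axis=-1)
    G = 2.0 / (1.0 + np.exp(dist))                              # Eq. 4: 2*sigmoid(-dist), in (0, 1]
    A = G * S                                                   # Eq. 5
    C = cosine_matrix(Hq, Hd)                                   # Eq. 6
    return S, G, A, C
```

When two contextual vectors coincide, the gate is exactly 1 and the Glove similarity passes through unchanged; a large contextual distance drives the gate toward 0 and suppresses the (possibly spurious) lexical match.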
Textual Feature Extraction. We stack matrices S (Eq. 3), A (Eq. 5), C (Eq. 6) and S − C to generate a tensor Z ∈ R^{N×M×4}, as shown in Eq. 7. The matrix S − C is used to make our model aware of differences between the interaction matrices.

Z = S ⊕ A ⊕ C ⊕ (S − C)    (7)

where '⊕' denotes matrix stacking. We apply n CNNs on tensor Z to extract features. The i-th CNN uses kernel size i × i × 4, stride 1 and F filters; its output feature map is P_i, i ∈ {1, ..., n}. Note that zero padding is used to ensure P_i has size N × M × F.
Next, we apply k-max pooling on each j-th output channel of P_i, denoted P_i[:, :, j] ∈ R^{N×M}, to generate a vector o_{i,j} ∈ R^K holding the K largest values, as shown in Eq. 8. The vectors o_{i,j} are concatenated into a single textual feature vector o (Eq. 9).
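A numpy sketch of the stacking in Eq. 7 and the k-max pooling in Eq. 8; the CNN feature extractors between the two steps are omitted for brevity:

```python
import numpy as np

def stack_interactions(S, A, C):
    """Eq. 7: stack S, A, C and the difference S - C into Z of shape (N, M, 4)."""
    return np.stack([S, A, C, S - C], axis=-1)

def k_max_pool(channel, k):
    """Eq. 8: keep the k largest values of one N x M feature map, in
    descending order, as a fixed-length vector."""
    flat = np.sort(channel.ravel())[::-1]
    return flat[:k]
```

k-max pooling makes the feature length independent of the tweet and article lengths, so queries and documents of any size map to a fixed-size vector o.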

Visual Matching Layer
A fixed pretrained ResNet50 (He et al., 2016) maps an image v, which is either an image of an original tweet or of an article, into a vector ∈ R^H, which is then projected into a vector ∈ R^T by a trainable linear layer. H and T are set to 2048 and 300, respectively. After the linear layer, we denote m^q_i ∈ R^T and m^d_j ∈ R^T as the representations of images v^q_i and v^d_j, respectively. Intuitively, an article is relevant to a query if the article has images similar to the query's images. Thus, we derive a matrix V ∈ R^{X×Y} of pairwise similarities of images, as shown in Eq. 10:

V_ij = cos(m^q_i, m^d_j)    (10)
Similar to Rao et al. (2019a,b), we pool the largest pairwise similarity s as the visual feature, as shown in Eq. 11:

s = max_{i,j} V_ij    (11)

When the article has no images, s is set to −1.
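A numpy sketch of the visual matching layer (Eqs. 10 and 11), including the s = −1 convention for articles without images; the projected image vectors are assumed to be given:

```python
import numpy as np

def visual_feature(Mq, Md):
    """Eqs. 10-11: build the cosine-similarity matrix V between query
    images Mq (X, T) and article images Md (Y, T), then pool the single
    largest similarity s. Returns -1.0 when the article has no images."""
    if Md is None or len(Md) == 0:
        return -1.0
    Qn = Mq / np.linalg.norm(Mq, axis=1, keepdims=True)
    Dn = Md / np.linalg.norm(Md, axis=1, keepdims=True)
    V = Qn @ Dn.T
    return float(V.max())
```

Max pooling over V means a single near-duplicate image pair is enough to produce a strong visual signal, regardless of how many other images differ.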

Unifying Textual and Visual Information
We unify textual and visual information by appending the scalar s (Eq. 11) to the vector o (Eq. 9), denoted [o; s], and derive f(q, d) as shown in Eq. 12:

f(q, d) = w^T [o; s]    (12)

We omit bias terms to avoid clutter. Our model is trained on triples consisting of a query q, a relevant document d+ and a non-relevant document d−, minimizing the hinge loss in Eq. 13:

L = max(0, 1 − f(q, d+) + f(q, d−))    (13)
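A sketch of the scoring function and training loss, assuming a single linear scoring layer for Eq. 12 and a unit margin in Eq. 13 (details the text does not spell out):

```python
import numpy as np

def score(o, s, w):
    """Eq. 12 (bias omitted): f(q, d) = w . [o; s], concatenating the
    textual feature vector o with the visual scalar s."""
    return float(w @ np.append(o, s))

def hinge_loss(f_pos, f_neg, margin=1.0):
    """Eq. 13: hinge loss on a (query, relevant, non-relevant) triple.
    Zero once the relevant document outscores the non-relevant one by the margin."""
    return max(0.0, margin - f_pos + f_neg)
```

Training therefore never needs absolute relevance labels, only the pairwise preference that d+ should outrank d− for the same query.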

Data Collection
Finding FC-articles which address an original tweet is laborious since we have to read many FC-articles even when using search engines (Popat et al., 2017, 2018). To reduce labeling efforts, we looked at existing datasets (Jiang and Wilson, 2018; Vosoughi et al., 2018) and found that the dataset in Vo and Lee (2019) met our needs.
The dataset provides non-anonymized pairs of an original tweet and its reply in which FC-articles, from the two major fact-checking sites snopes.com and politifact.com, are embedded. Fact-checkers replied to the original tweet posters with FC-articles as evidence. From the original tweets' replies, we generate pairs of an original tweet q and a FC-article d. We kept only original tweets where both text and images are available. After preprocessing, we have 19,341 pairs (q, d) of an English original tweet and a FC-article, with 18,961 unique original tweets and 2,845 FC-articles. Following Vosoughi et al. (2018), a labeling step is conducted to ensure that in each pair, the article fact-checks the original tweet. We hired native U.S. English speakers since they were more likely to be familiar with the topics in the tweets and FC-articles. The labelers labeled each pair (q, d) as 1 if the article d fact-checked the tweet q, and 0 otherwise. They were trained directly by the authors and were asked to label several examples as exercises to ensure that they fully understood the task. We required labelers to read the original tweet's text and the article's text and images, and developed a labeling UI to help labelers quickly explore the linked FC-articles, shown in Fig. 5 in our appendix. Each pair was labeled by three different labelers, and the final label is based on the majority vote. The Kappa value is 0.56, suggesting moderate agreement among the labelers (Viera et al., 2005).
The moderate agreement between labelers arose because there were many pairs of an original tweet and a FC-article where the tweet and the article are topically similar but the article does not fact-check the tweet. For example, a tweet is about Hillary Clinton's mishandled classified emails while the article fact-checks whether she gave uranium to Russia: both are about Hillary Clinton, but the article does not precisely fact-check the tweet's content. As we utilized a dataset collected during the 2016 U.S. presidential election, many tweets and FC-articles were about misinformation related to Hillary Clinton and Donald Trump, leading to topically similar pairs which might confuse labelers. After labeling, we have a full dataset of 13,239 positive pairs made by 13,091 original tweets and 2,170 FC-articles.
We observe that there may be false negatives in the full dataset, meaning that a FC-article actually fact-checks an original tweet but is viewed as irrelevant (i.e., 100% precision but less than 100% recall) because the FC-article was not embedded in a fact-checker's reply. For example, an original tweet may be fact-checked by both a Snopes article and a Politifact one, but only the Snopes article was embedded in the fact-checker's reply to the original tweet while the Politifact one was not included. If we build a model on the full dataset, the false negatives may mislead our model. To mitigate the impact of this problem, we split the full dataset into two sub-datasets called the Snopes and Politifact datasets. The former contains pairs where FC-articles are from snopes.com and the latter contains pairs where FC-articles are from politifact.com. Note that there may still be false negatives in each sub-dataset, since an original tweet may have multiple fake news stories fact-checked by different articles from the same fact-checking website while a fact-checker did not embed all of the articles in the reply. But the number of false negatives in this case would be smaller than in the full dataset. In the Snopes dataset, we have 11,202 positive pairs made by 11,167 tweets and 1,703 FC-articles. In the Politifact dataset, we have 2,037 positive pairs made by 2,026 tweets and 467 FC-articles. There are 102 overlapping tweets between the two datasets. The number of unique original posters is 8,277 and 1,482 in Snopes and Politifact, respectively. On average, each original poster posted ∼1.35 tweets in our datasets.

Data Analysis
Topics of original tweets/queries. Since the topic of an original tweet is related to the topic of its corresponding FC-article, we extracted the topics of relevant FC-articles to understand the topical distribution of tweets. By analyzing each FC-article, the top-5 topics of tweets in Snopes are: Politics (42.3%), Fauxtography (22.7%), Junk News (8.1%), Uncategorized (6.8%) and Quotes (4.8%). For Politifact, tweets' topics are mostly political due to its political mission. In conclusion, our datasets cover various topics.

Similarity of text in tweets and text in images. As we utilize text in images to enhance ranking performance, we seek to understand how similar the text in tweets and the text in images are. For each query/tweet having text in its images, we transformed its tweet text and its image text into two vectors of TF-IDF values and computed their cosine similarity, then averaged over all queries of a dataset. The mean similarity is 0.083 and 0.102 for Snopes and Politifact respectively, indicating that text in tweets is largely dissimilar to text in images. The number of tweets/queries containing text in images is 8,494 (76%) and 1,742 (86%) for Snopes and Politifact respectively.
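The similarity measurement can be sketched as follows; this toy TF-IDF with whitespace tokenization is illustrative, not the exact vectorizer used for the reported numbers:

```python
import math
from collections import Counter

def tfidf_cosine(text_a, text_b, corpus):
    """Turn both texts into TF-IDF vectors over `corpus` and return
    their cosine similarity (0.0 when either vector is empty)."""
    N = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc.split()))
    def vec(text):
        tf = Counter(text.split())
        return {t: c * math.log(N / df[t]) for t, c in tf.items() if t in df}
    va, vb = vec(text_a), vec(text_b)
    dot = sum(va[t] * vb.get(t, 0.0) for t in va)
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A mean similarity near 0.1, as reported above, means the image text mostly contributes vocabulary that the tweet text does not already have, which is exactly why the OCR expansion helps retrieval.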
In Fig. 3(a), HIT@50 of BM25-T is only 50% while BM25-I's HIT@50 is 70%, suggesting that a lot of fake news appears in images. This is because tweets' text has at most 280 characters, and images are more attractive to online users and more easily convey fake news to them. When K is larger, BM25-I's HIT@K saturates quickly since only 76% of queries have text inside their images. Finally, BM25-TI is the best: its HIT@50 is 89.6%. Similar patterns appear for Politifact in Fig. 3(b), where BM25-TI's HIT@50 is 94%. From these results, we choose BM25-TI as the basic retrieval of our framework.

Split Datasets. We need to choose the value of K, the number of initial candidates for each query. If K is too small, the initial candidates may not contain relevant articles, leading to a meaningless re-ranking step. If K is too large, rerankers' running time may be too high for online applications. We set K to 50 for both datasets. The number of queries with at least one relevant article in the top-50 candidates is 10,003 out of 11,167 for Snopes and 1,870 out of 2,026 for Politifact. Similar to Thorne et al. (2018), we randomly split these queries of each dataset into train, validation and testing sets with ratio 80%/10%/10%, as shown in Table 1. There are 1,164 and 156 leftover queries in Snopes and Politifact, respectively. Note that having leftover queries is a common issue for re-ranking-based systems (Thorne et al., 2018). The initial candidates output by BM25-TI are used by all neural ranking models.

Testing Scenarios. All models are tested in the re-ranking step with two scenarios (SC1 and SC2). The main difference between them is whether to extract text from images of original tweets and FC-articles and to incorporate that text with the other information (i.e., the text and images of both the tweets and FC-articles).

Hyperparameters. The number of output channels F is chosen from {16, 24}. The value of k in k-max pooling is chosen from {16, 32, 48}. The number of CNNs n is chosen from {1, 2, 3}.
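The evaluation metrics used throughout, HIT@K and NDCG@K, can be computed as follows for binary relevance (a common formulation; the paper does not spell out its gain function):

```python
import math

def hit_at_k(ranked_ids, relevant_ids, k):
    """HIT@K: 1 if any relevant article appears in the top-k ranking, else 0."""
    return int(any(d in relevant_ids for d in ranked_ids[:k]))

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG@K: discounted gain of relevant hits in the
    top-k, normalized by the best achievable ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]) if d in relevant_ids)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(k, len(relevant_ids))))
    return dcg / ideal if ideal else 0.0
```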
Our model performs best on Snopes with P, F, k and n equal to 256, 16, 32 and 2, respectively, and best on Politifact with P, F, k and n equal to 256, 16, 48 and 3, respectively. We implement our model in PyTorch 0.4.1 and test it on an NVIDIA GTX 1080 GPU.

Performance of Multimodal Attention Network and Variants
We also show results of two MAN variants: (1) using only text (Eq. 9), which we call Contextual Text Matching (CTM), and (2) using only images (Eq. 11), which we call Visual Matching Network (VMN). We report MAN's improvements w.r.t. the best baseline result in each metric.

SC1: Re-ranking using images and text in tweets. In Table 2, our CTM outperforms the best baselines, achieving a maximum improvement of 4.7% on NDCG@1. Our VMN notably outperforms text-based ranking baselines on Snopes, perhaps because fauxtography is one of the most popular categories on Snopes (Friggeri et al., 2014) while Politifact mainly fact-checks political claims. By using both text and images, our MAN shows an average increase of 17.2% over the best baselines, with a maximum improvement of 39.6%.

SC2: Re-ranking using images, tweets' text and images' text. We omit VMN from Table 3 since its results are the same as in Table 2. In Table 3, both MAN and CTM outperform the baselines on the two datasets. Interestingly, MAN performs worse than CTM on Snopes but better on Politifact. We suspect that the abundance of textual signals between original tweets and FC-articles in SC2 unintentionally makes MAN favor textual signals and neglect visual signals. To remedy this issue, we augment the training data of SC2 with the training data of SC1 while keeping the same validation and testing sets from SC2. Intuitively, the augmented training data may regularize MAN better by letting it observe both pairs with rich textual overlap from SC2 and pairs with sparse textual signals from SC1. We name the model trained on the augmented training data MAN-A. In Table 3, MAN-A mitigates the above issue with an average increase of 4.8% over the best baselines and a maximum improvement of 11.2%.

Text in images has a high impact on the performance of CTM and MAN.
In Table 3, when using text in images to expand textual content of queries, performance of CTM and MAN increased by 17∼34% compared with their performances in Table 2.
From Tables 2 and 3, semantic matching models and multimodal baselines perform worse than relevance matching methods because the former two groups compress whole queries and articles into dense vectors and measure their similarity. When compressing textual content, some irrelevant information may be captured, leading to poor representations (Rao et al., 2019a).
In conclusion, our model MAN outperforms all baselines in both testing scenarios.

Experiments on the leftover original tweets (i.e., 1,164 tweets in Snopes and 156 tweets in Politifact). We further test the benefits of using text and images on each leftover query. Table 4 shows the results of our best model MAN-A and the best baselines in each group. As expected, MAN-A outperforms all the baselines due to the sparse textual content of leftover queries.

Effect of Contextual Word Embeddings
To understand the effects of word embeddings on our model, we remove visual information and study the re-ranking results when (1) using only Glove embeddings, (2) using only contextual word embeddings from ELMo and (3) using Glove+ELMo. In Table 5, combining Glove and ELMo consistently achieves the best NDCG on both datasets.

Case Studies
Qualitative comparison with the best baseline. An example tweet is 'You won't have to wait long' embedded with a picture of an Antifa member beating a police officer. Clearly, the tweet's text does not carry any meaningful information while the image does. Given this tweet, the best baseline CoPACRR failed to find relevant FC-articles, whereas MAN ranked the correct FC-article (Evon, 2017) in its top-3 results.

Visualization of interaction matrices and attended matrix. Fig. 4 visualizes the matrices S, G, A and C (Eqs. 3, 4, 5 and 6, respectively) for an original tweet and its FC-article from a testing set. Note that these matrices are learned by our model. In Fig. 4(a), Glove embeddings help reveal overlapping phrases (e.g. at a costume party, clinton) but the closeness of hillar and hillary is not well captured (i.e. sim(hillar, hillary) = 0.3). In contrast, sim(hillar, hillary) is 0.86 in Fig. 4(d), indicating the quality of contextual word embeddings. Combining matrices S and G (Fig. 4(b)) yields the sparse matrix A in Fig. 4(c), which pays more attention to key interactions (e.g. costume and party). In conclusion, the attention mechanism helps us capture key matching signals.

Impact of Searching for FC-Articles. We measure how much impact we can make on online users when correct FC-articles are retrieved (i.e. HIT@1 = 1). In total, our best model, MAN-A, accurately finds FC-articles for 910 original tweets in the test set of the Snopes dataset. These tweets have 527,299 retweets in total, and the original posters of the 910 tweets have 233M followers in total. Roughly speaking, we can deliver fact-checked information to millions of users, and security systems could prevent the half million shares of fake news in those original tweets.

Discussion
Since Snopes and Politifact are the most popular fact-checking sites, building two models for them is an acceptable cost. When facing a real-life social media post, we run the two trained models sequentially. If no FC-article is found, we can inform users that the post is unverified and suggest related pages from verified sites (e.g. governments' sites). When tweets do not have any images, we can use CTM, which may find less relevant articles than MAN but still performed better than the baselines, as shown in Tables 2 and 3. We also built our best model (MAN-A) on the full dataset and observed some reduction in NDCG@1 and NDCG@3, but not in HIT@3, compared with the results of SC2 on the separate datasets, perhaps because of the false negatives described in Section 4. However, our model still outperformed the baselines.
There are a few ways in which our work could be improved. First, our basic retrieval BM25-TI does not consider image similarities. To improve BM25-TI, we may combine image similarities with BM25's score; we leave this as future work. Second, we create train/test data based on unique original tweets. Though there are no retweets and quotes, it is hard to completely ensure that all queries' content is unique. However, our settings are applied to all models for fair comparisons. In addition, as shown in Fig. 1, online users tend to re-post fake news, so it may be reasonable to have similar original tweet content. Third, we tried to fine-tune BERT but did not achieve good results, perhaps because we did not have enough data. Interestingly, prior work (Shaar et al., 2020) had a similar observation when fine-tuning BERT.

Conclusions
In this paper, we propose a novel method to alleviate the spread of fake news. By searching for FC-articles and incorporating fact-checked information into social media posts, we can warn users about fake news and discourage them from spreading misinformation. Our framework uses text and images to search for FC-articles, achieving an average increase of 4.8% over the best baselines with a maximum improvement of 11.2%. Complementary to fake news detection methods, our method proactively scales up verified content on social media.
Our framework can be used for other multimodal retrieval tasks (e.g. searching for verified sites as we suggested in the previous section).