A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction

Textual patterns (e.g., Country’s president Person) are specified and/or generated for extracting factual information from unstructured data. Pattern-based information extraction methods have been recognized for their efficiency and transferability. However, not every pattern is reliable: a major challenge is to derive the most complete and accurate facts from diverse and sometimes conflicting extractions. In this work, we propose a probabilistic graphical model that formulates fact extraction as a generative process. It automatically infers true facts and pattern reliability without any supervision. It has two novel designs specifically for temporal facts: (1) it models pattern reliability on two types of time signals, namely temporal tags in text and text generation time; (2) it models commonsense constraints as observable variables. Experimental results demonstrate that our model significantly outperforms existing methods on extracting true temporal facts from news data.


Introduction
Temporal fact extraction aims to extract (entity, value, time) factual tuples for specific attributes from text data (e.g., news, tweets). It is one of the fundamental tasks in knowledge base construction, knowledge graph population, and question answering. For example, for the attribute country's president, the entity would be of type Location.Country, the value would be of type Person, and the time would be a valid year in the person's presidential term. Thanks to named entity recognition (NER) and typing systems (Del Corro et al., 2015), pattern-based information extraction methods generate patterns consisting of entity types (Jiang et al., 2017; Li et al., 2018; Reimers et al., 2016). They are widely used because they transfer well across domains and datasets, work in an unsupervised manner requiring few or no annotations, and are highly efficient. The typed patterns give only the association between entity and value. Two types of time signals can be attached to the (entity, value) pairs to form temporal triples: one is the temporal tag in text, e.g., the year tag next to the entity/value mentions in the sentence; the other is the text generation time, i.e., the year the text document was posted. For example, given two sentences: …
Pattern-based methods discover two patterns:
• P1: former Country president Person
• P2: Person, now president of Country,
Then the methods can extract the following tuples, each labeled as correct or incorrect: (France, Jacques Chirac, 1995) … or unreliable depending on the pattern. So, there is a dependency between pattern and type of time signal in terms of reliability.
Existing truth finding approaches assumed that a structured "source-object-claim" database was given and then estimated the reliability of each source to infer whether a claim was true or false (Yin et al., 2008; …). For example, a source could be a book seller, an object could be a book's author list, and a claim could be an author list that a seller gave for a book.
One conclusion from this line of work was that probabilistic graphical models (PGMs) have advantages over bootstrapping algorithms (Yin et al., 2008; Li et al., 2018; Wang et al., 2019) in estimating source reliability over the general data distribution. However, PGM-based truth finding models have not yet been developed for the task of information extraction, so estimating the reliability of textual patterns is a new challenge (O1). Moreover, for temporal fact extraction, modeling the dependency between pattern and type of time signal is also new (O2).
In truth finding, it is critical to define conflicts. In the book seller example, we assume that one book has only one true author list; so if we knew one list was true, then any different list for the same book would be false. This assumption comes from commonsense. Fortunately, we have quite a few commonsense rules for temporal facts of specific attributes. For country's president, we know that
• one president serves only one country;
• one country has only one president at a time;
• however, one country can have multiple presidents over its history (e.g., USA, France).
For the attribute sports team's player, we have the commonsense rules:
• one player serves only one club at a time;
• however, one club has multiple players, and one player can serve multiple clubs during his/her career.
We generalize the possible commonsense rules as:
• C1: one value matches with only one entity;
• C2: one entity matches with only one value;
• C3: one value matches with only one entity at a time;
• C4: one entity matches with only one value at a time.
So, the attribute country's president follows C1 and C4, and the attribute sports team's player follows C3. The third challenge (O3) is therefore to model such commonsense (e.g., C1-C4) in order to identify conflicts, estimate pattern reliability, and find true temporal facts.
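To make rules C1-C4 concrete, the following Python sketch checks which of them hold on a given set of true (entity, value, time) tuples. Each rule reduces to "at most one partner per group" under a different grouping key. The function `rules_satisfied` and the toy facts are illustrative assumptions, not part of PGMCC.

```python
from collections import defaultdict

# A temporal fact is an (entity, value, time) tuple, e.g. ("France", "Jacques Chirac", 1995).
Fact = tuple[str, str, int]


def rules_satisfied(true_facts: set[Fact]) -> dict[str, bool]:
    """Report which commonsense rules C1-C4 hold on a set of true facts."""
    entities_per_value = defaultdict(set)       # C1: v -> {e}
    values_per_entity = defaultdict(set)        # C2: e -> {v}
    entities_per_value_time = defaultdict(set)  # C3: (v, t) -> {e}
    values_per_entity_time = defaultdict(set)   # C4: (e, t) -> {v}
    for e, v, t in true_facts:
        entities_per_value[v].add(e)
        values_per_entity[e].add(v)
        entities_per_value_time[(v, t)].add(e)
        values_per_entity_time[(e, t)].add(v)

    def at_most_one(groups: dict) -> bool:
        # A rule holds if no group key is matched with two or more partners.
        return all(len(partners) <= 1 for partners in groups.values())

    return {
        "C1": at_most_one(entities_per_value),
        "C2": at_most_one(values_per_entity),
        "C3": at_most_one(entities_per_value_time),
        "C4": at_most_one(values_per_entity_time),
    }


# Toy (country, president, year) facts, for illustration only.
presidents = {
    ("France", "Jacques Chirac", 1995),
    ("France", "Jacques Chirac", 1996),
    ("France", "Nicolas Sarkozy", 2007),
    ("USA", "Bill Clinton", 1995),
}
print(rules_satisfied(presidents))
# {'C1': True, 'C2': False, 'C3': True, 'C4': True}
```

C1 implies C3 and C2 implies C4, so C3 also holds on this toy set; that is consistent with describing country's president by C1 and C4 alone.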
To address the three challenges (O1-O3), we propose a novel Probabilistic Graphical Model with Commonsense Constraints (PGMCC) for finding true temporal facts among the results of pattern-based methods. The input is the observed frequency of tuples extracted by a particular pattern and attached with a particular type of time signal. We model an information source as a pair of pattern and type of time signal, and we represent the source reliability as an unobserved variable. This yields a generative process. We first generate a source. Next, we generate an (entity, value, time) tuple. Then we generate the frequency based on the source reliability and the tuple's trustworthiness (i.e., its probability of being true). Moreover, we generate variables according to the commonsense rules if needed; each such variable counts the values/entities that can be matched to one entity/value, with or without a time constraint (at one time), in the set of true tuples. Given the large numbers of patterns (57,472) and tuples (116,631) in our experiments, our unsupervised model PGMCC can effectively estimate pattern reliability and find true temporal facts.
Our main contributions are:
• We introduce the idea of PGM-based truth finding to the task of pattern-based temporal fact extraction.
• We propose a new unsupervised probabilistic model with observed constraints to model the reliability of textual patterns, the trustworthiness of temporal tuples, and the commonsense rules for certain types of facts.
• Experimental results show that our model can improve AUC and F1 by more than 7% over the state-of-the-art.
The rest of this paper is organized as follows. Section 2 introduces the terminology and defines the problem. Section 3 presents an overview as well as details of the proposed model. Experimental results can be found in Section 4. Section 5 surveys the literature. Section 6 discusses limitations and future work. Section 7 concludes the paper.
2 Terminology and Problem Definition

2.1 Terminology

Definition 1 (Temporal fact: (entity, value, time) tuple). Let $F = \{f_1, f_2, f_3, \dots\}$ be the set of temporal facts. Each fact $f$ is in the format (entity, value, time). The facts in $F$ were extracted by textual pattern-based methods.
Definition 2 (Pattern source). Let $P^{(*)} = \{p^{(*)}_1, \dots\}$ be the set of pattern sources, where $* \in \{\text{post}, \text{tag}\}$ stands for the type of time signal (i.e., "text gen. time" and "temporal tag", respectively). The same pattern paired with different types of time signals is treated as different pattern sources.
Definition 3 (Extraction). Let $E = \{e_1, e_2, e_3, \dots\}$ be the set of extractions, which our generative model takes as input. An extraction item $e$ is in the format $(f, p^{(*)}, o)$, where $o$ is the observed frequency with which fact tuple $f$ was extracted by pattern source $p^{(*)}$ in $E$.
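As an illustration of Definitions 2 and 3, the sketch below turns raw pattern matches into extraction items keyed by (fact, pattern source), with the observed frequency $o$ as the value. It assumes, hypothetically, that each match records the entity, value, pattern, an optional year tag from the sentence, and the document's posting year; the `Match` record and `build_extractions` function are illustrative names, not the paper's code.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Match(NamedTuple):
    """One hypothetical pattern match in one sentence."""
    entity: str
    value: str
    pattern: str
    tag_year: Optional[int]  # year tag found in the sentence, if any
    post_year: int           # year the document was posted


def build_extractions(matches):
    """Return {(fact, pattern_source): o} with pattern_source = (pattern, signal)."""
    counts = Counter()
    for m in matches:
        # Text generation time is always available, giving a "post" source.
        counts[((m.entity, m.value, m.post_year), (m.pattern, "post"))] += 1
        # A temporal tag, when present, gives a separate "tag" source.
        if m.tag_year is not None:
            counts[((m.entity, m.value, m.tag_year), (m.pattern, "tag"))] += 1
    return dict(counts)


P1 = "former Country president Person"
E = build_extractions([
    Match("France", "Jacques Chirac", P1, 1995, 2008),
    Match("France", "Jacques Chirac", P1, None, 2008),
])
for (fact, source), o in E.items():
    print(fact, source[1], o)
# ('France', 'Jacques Chirac', 2008) post 2
# ('France', 'Jacques Chirac', 1995) tag 1
```

In this toy input the "former" pattern paired with text generation time yields a 2008 tuple, after Chirac's term ended, which illustrates why one pattern can be reliable with one time signal and unreliable with the other.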
Definition 4 (Constraint). Each commonsense rule (constraint) is represented as a variable, which is likely to be observed as 1. Examples:
• one value matches with only one entity, denoted as $C_{1v-1e}$, which counts the number of such entities;
• one entity at one time matches with only one value, denoted as $C_{1(e,t)-1v}$, which counts the number of such values.
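Given truth labels for the facts, the two constraint variables can be computed by grouping, following the model description in Section 3: $C_{1(e,t)-1v}$ sums $l_f$ over facts sharing $\{e, t\}$, while $C_{1v-1e}$ first collapses each $(e, v)$ pair to a pair-level label $l_{e,v}$ (1 if any of its facts is true) and then sums over entities. The Python sketch below is illustrative; `constraint_counts` and the toy labels are assumptions.

```python
from collections import defaultdict


def constraint_counts(labels):
    """labels: {(entity, value, time): l_f in {0, 1}}.

    Returns (c_et, c_v): c_et[(e, t)] is C_1(e,t)-1v and c_v[v] is C_1v-1e.
    """
    c_et = defaultdict(int)  # sum of l_f over F_{e,t}
    l_ev = defaultdict(int)  # l_{e,v}: 1 if any fact in F_{e,v} is true
    for (e, v, t), l in labels.items():
        c_et[(e, t)] += l
        l_ev[(e, v)] = max(l_ev[(e, v)], l)
    c_v = defaultdict(int)   # sum of l_{e,v} over the entity groups of F_v
    for (e, v), l in l_ev.items():
        c_v[v] += l
    return dict(c_et), dict(c_v)


labels = {
    ("France", "Jacques Chirac", 1995): 1,
    ("France", "Jacques Chirac", 1996): 1,
    ("France", "Lionel Jospin", 1996): 1,   # a conflicting (wrong) label
    ("Haiti", "Jacques Chirac", 1995): 0,
}
c_et, c_v = constraint_counts(labels)
print(c_et[("France", 1996)], c_v["Jacques Chirac"])  # 2 1
```

The count of 2 at key (France, 1996) exposes the conflict that C4 forbids. Because each constraint variable is expected to be observed as 1, a model built this way would disfavour such a label assignment.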

Problem Definition
Suppose the set of extractions $E$ has been obtained by pattern-based methods from text data. We define the problem as follows: given a set of extractions $E$, pattern sources $P^{(*)}$, and the constraints $C_a$ for attribute $a$, infer the truth $T$ for all temporal facts $F$ contained in $E$, as well as quality information for each pattern source $p^{(*)}$.

Proposed Approach
We mainly discuss the details of the PGM with multiple constraints, $C_{1(e,t)-1v}$ and $C_{1v-1e}$, since this is the most complicated scenario for modeling constraints. The input is the observed frequency of fact tuples extracted by a particular pattern and attached with a particular type of time signal. Figure 1 gives the plate notation of our model. Each node represents a variable: blue nodes indicate hyper-parameters, gray nodes stand for observed variables, and white nodes stand for the latent variables we want to infer.

Generative Process
Our PGM-based approach is a generative process. We first generate a source. Next, we generate an (entity, value, time) tuple. Then we generate the frequency based on the source reliability and the tuple's trustworthiness. Moreover, we generate variables according to the commonsense constraints. These variables count the values/entities that can be matched to one entity/value, with or without a time constraint (at one time), in the set of true tuples. The concrete meaning of each variable is given in Table 1.
Temporal fact trustworthiness. For each temporal fact $f \in F$, we first draw $\theta_f$, i.e., the prior truth probability of fact $f$, from a Beta distribution with hyper-parameters $\beta_0$ and $\beta_1$:

$$\theta_f \sim \mathrm{Beta}(\beta_1, \beta_0)$$

$\beta_0$ and $\beta_1$ represent the prior distribution of fact reliability. In practice, if we have strong prior knowledge about how likely all or certain temporal facts are to be true, we can encode it in the corresponding hyper-parameters. Otherwise, if we do not have a strong belief, we set a uniform prior, under which a fact is equally likely to be true or false, and our model can still infer the truth from other factors. After drawing $\theta_f$, we generate the truth label $l_f$ from a Bernoulli distribution with parameter $\theta_f$:

$$l_f \sim \mathrm{Bernoulli}(\theta_f)$$

Pattern source reliability. As mentioned above, a reliable pattern source is more likely to extract true facts with higher counts and false facts with lower counts. Therefore, we use the average counts of false and true facts as the latent reliability weights of pattern $p^{(*)}$, denoted $\lambda_{p^{(*)},0}$ and $\lambda_{p^{(*)},1}$. These two parameters are generated from Gamma distributions with hyper-parameters $\{\mu_0, \kappa_0\}$ and $\{\mu_1, \kappa_1\}$, respectively; the Gamma distribution is used because it is the conjugate prior of the Poisson distribution. $\mu_0$ and $\mu_1$ represent the prior numbers of false/true facts the pattern extracts, and $\kappa_0$ and $\kappa_1$ determine the prior sums of false/true fact counts.

Table 1: Meaning of variables and hyper-parameters.
Variables
• $\lambda_{p^{(*)},0}$, $\lambda_{p^{(*)},1}$ (real numbers): reliability of pattern $p^{(*)}$ on giving false/true fact tuples
• $C_{1v-1e}$ (real number): the number of entities given one value $v$
• $C_{1(e,t)-1v}$ (real number): the sum of values given one entity $e$ and one time $t$
Hyper-parameters
• $\mu_0$, $\mu_1$ (integers): prior counts of false/true tuples extracted by a textual pattern
• $\kappa_0$, $\kappa_1$ (integers): prior sums of false/true tuples extracted by a textual pattern
• $\beta_0$, $\beta_1$ (integers): prior counts of false/true tuples

Constraints. Finally, we draw the constraint variables. In temporal fact extraction, we define two variables, $C_{1(e,t)-1v}$ and $C_{1v-1e}$.
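$C_{1(e,t)-1v}$ limits the number of true facts on a given constraint key $\{e, t\}$. There are as many $C_{1(e,t)-1v}$ variables as unique $\{e, t\}$ keys:

$$C_{1(e,t)-1v} = \sum_{f \in F_{e,t}} l_f$$

where $F_{e,t}$ denotes the set of facts $f$ with the same $\{e, t\}$; each $C_{1(e,t)-1v}$ is generated from its set $F_{e,t}$. $C_{1v-1e}$ limits the truth of facts with the same value $v$:

$$C_{1v-1e} = \sum_{F_{e,v} \subseteq F_v} l_{e,v}, \qquad l_{e,v} = \begin{cases} 1 & \text{if } \exists f \in F_{e,v} : l_f = 1, \\ 0 & \text{otherwise,} \end{cases} \qquad (7)$$

where $F_v$ denotes the set of facts with value $v$, and $F_{e,v}$ stands for the set of temporal facts $f$ with the same $\{e, v\}$, so that $F_v$ is partitioned into the groups $F_{e,v}$. $l_{e,v}$ denotes the truth label of the pair $(e, v)$. Each $C_{1v-1e}$ is generated from $F_v$: if there is a true fact $f \in F_{e,v}$, then $l_{e,v}$ equals one; otherwise, $l_{e,v}$ equals zero.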
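The generative story for truth labels, pattern reliability, and frequencies can be forward-sampled, which is a useful sanity check before writing an inference procedure. The Python sketch below makes two assumptions that the text does not spell out: each Gamma prior uses shape $\kappa_j$ and rate $\mu_j$ (so its mean $\kappa_j/\mu_j$ is a prior average count), and each observed frequency is drawn as $o \sim \mathrm{Poisson}(\lambda_{p^{(*)}, l_f})$, in line with the Gamma-Poisson conjugacy mentioned above. Function names and default hyper-parameter values are hypothetical, and the constraint variables are left out because they follow deterministically from the sampled labels as described above.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small rates used here."""
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1


def sample_corpus(facts, sources, pairs, beta=(1.0, 1.0),
                  mu=(1.0, 1.0), kappa=(1.0, 5.0), seed=0):
    """Forward-sample l_f, lambda_p and o for every (fact, source) pair.

    beta, mu and kappa are (false, true) hyper-parameter pairs.
    """
    rng = random.Random(seed)
    labels = {}
    for f in facts:
        theta = rng.betavariate(beta[1], beta[0])  # theta_f ~ Beta(beta_1, beta_0)
        labels[f] = int(rng.random() < theta)      # l_f ~ Bernoulli(theta_f)
    lam = {}
    for p in sources:
        # Assumed Gamma(shape=kappa_j, rate=mu_j); random.gammavariate takes a scale.
        lam[p] = tuple(rng.gammavariate(kappa[j], 1.0 / mu[j]) for j in (0, 1))
    # Assumed o ~ Poisson(lambda_{p, l_f}); zero counts can occur in simulation.
    freq = {(f, p): sample_poisson(lam[p][labels[f]], rng) for f, p in pairs}
    return labels, lam, freq


labels, lam, freq = sample_corpus(
    facts=["f1", "f2"], sources=["P1/post", "P1/tag"],
    pairs=[("f1", "P1/post"), ("f1", "P1/tag"), ("f2", "P1/post")])
print(labels, freq)
```

With the default $\kappa = (1, 5)$ and $\mu = (1, 1)$, the prior mean count for true facts is five times that for false facts, mimicking a reliable pattern. An inference procedure would instead condition on the observed frequencies and constraint variables to recover the labels and reliabilities.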

Dataset
We focus on the attribute country's president and experiment on the same dataset as Wang et al. (2019). It has 9,876,086 news articles (4 billion words) published from 1994 to 2010. We have 57,472 patterns, 116,631 temporal fact tuples, and 1,326,164 extractions. The dataset's ground truth was collected from Google and Wikipedia. It includes 3,175 true temporal facts for 130 countries.

Competitive methods
We compare our model with:
• TRUTHFINDER (Yin et al., 2008): It was a bootstrapping algorithm for structured data using $C_{1v-1e}$.
• LTM: It was a probabilistic model assuming that the truth about an object can contain more than one value. We set the "object" as {entity, time} and the value as the temporal fact's value.
• TRUEPIE (Li et al., 2018): It was a bootstrapping method using $C_{1v-1e}$ and estimating pattern reliability.
• MAJVOTE (Goldman and Warmuth, 1995): It used the weighted majority voting strategy and returned the most frequent temporal fact.
• TFWIN (Wang et al., 2019): It was the state-of-the-art bootstrapping method for truth discovery on fact extraction. However, it suffers from serious error propagation in its iterative process.
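A minimal Python sketch of a MAJVOTE-style baseline, under the assumption that votes are grouped by {entity, time} and each extraction contributes its frequency $o$ times a per-source weight (uniform when no weights are given). This is a simple weighted vote for illustration, not necessarily the exact baseline configuration used in the experiments; `majority_vote` is a hypothetical name.

```python
from collections import defaultdict


def majority_vote(extractions, weights=None):
    """extractions: {((e, v, t), source): o}. Returns {(e, t): winning value}."""
    weights = weights or {}
    votes = defaultdict(lambda: defaultdict(float))
    for ((e, v, t), source), o in extractions.items():
        # Each extraction votes for value v at key (e, t), weighted by its source.
        votes[(e, t)][v] += weights.get(source, 1.0) * o
    # Ties are broken by first-seen order, since max() keeps the first maximum.
    return {key: max(tally, key=tally.get) for key, tally in votes.items()}


E = {
    (("France", "Jacques Chirac", 1996), ("P1", "tag")): 3,
    (("France", "Lionel Jospin", 1996), ("P2", "post")): 5,
}
print(majority_vote(E))                                 # {('France', 1996): 'Lionel Jospin'}
print(majority_vote(E, weights={("P2", "post"): 0.5}))  # {('France', 1996): 'Jacques Chirac'}
```

The unweighted vote picks the more frequent but wrong value, which illustrates why raw frequency is a weak truth signal unless sources are weighted by reliability.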

Evaluation settings
All the methods can only find the truth of a temporal fact at one time point, e.g., (France, Jacques Chirac, 1995). However, because fact descriptions in the data are incomplete, some time points of temporal facts may be missing. One way to improve the evaluation is to compose true temporal fact time points {e, v, t} into temporal fact time periods.
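Composing time points into periods can be sketched as grouping true points by (entity, value) and merging runs of nearby years. The Python function below is an illustrative assumption: the text does not specify how gaps caused by missing time points are bridged, so the hypothetical parameter `max_gap` sets the largest year difference that still joins two points into one period.

```python
from collections import defaultdict


def to_periods(points, max_gap=1):
    """points: iterable of (entity, value, year); returns (e, v, start, end) tuples.

    Two years of the same (e, v) pair join one period when they differ by at
    most max_gap; max_gap=1 merges only consecutive years.
    """
    years = defaultdict(set)
    for e, v, t in points:
        years[(e, v)].add(t)
    periods = []
    for (e, v), ys in sorted(years.items()):
        ordered = sorted(ys)
        start = prev = ordered[0]
        for y in ordered[1:]:
            if y - prev > max_gap:  # gap too large: close the current period
                periods.append((e, v, start, prev))
                start = y
            prev = y
        periods.append((e, v, start, prev))
    return periods


points = [("France", "Jacques Chirac", 1995),
          ("France", "Jacques Chirac", 1996),
          ("France", "Jacques Chirac", 1998)]
print(to_periods(points))
# [('France', 'Jacques Chirac', 1995, 1996), ('France', 'Jacques Chirac', 1998, 1998)]
print(to_periods(points, max_gap=2))
# [('France', 'Jacques Chirac', 1995, 1998)]
```

A larger `max_gap` tolerates more missing years but risks merging two separate terms of the same person.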

Evaluation metrics
We evaluate all competitive methods using precision, recall, F1 score, and AUC (Area Under the Curve). Precision is the fraction of true temporal facts among all the temporal facts that were labeled as true. Recall is the fraction of ground-truth temporal facts that the method finds. F1 score is the harmonic mean of precision and recall. For all of the metrics, a higher score indicates better performance.
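The set-based precision, recall, and F1 defined above can be computed as in the following Python sketch, which treats predictions and ground truth as sets of tuples (time points or periods). AUC is omitted because it additionally requires a ranking score per tuple. The function name and toy tuples are hypothetical.

```python
def precision_recall_f1(predicted, ground_truth):
    """Set-based metrics over tuples labelled true by a method."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    hits = len(predicted & ground_truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    # Harmonic mean; defined as 0 when there are no correct predictions.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


pred = {("France", "Jacques Chirac", 1995), ("France", "Jacques Chirac", 1996),
        ("France", "Lionel Jospin", 1997), ("USA", "Al Gore", 1999)}
truth = {("France", "Jacques Chirac", 1995), ("France", "Jacques Chirac", 1996),
         ("USA", "Bill Clinton", 1999)}
p, r, f1 = precision_recall_f1(pred, truth)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```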

Effectiveness
The results are given in Table 2. Our proposed method PGMCC consistently outperforms all the baselines on finding (country, president, time)-facts (i.e., presidential terms).
PGMCC vs LTM: PGMCC performs significantly better than LTM (+34.5% AUC; +64.4% F1) when evaluating time points, and by +40.45% AUC and +71.2% F1 when evaluating time periods. LTM was designed for structured truth finding, as in the book seller example, so it faced many conflicts when applied to temporal fact extraction. PGMCC models multiple constraints as observed variables to alleviate this issue.
PGMCC vs TFWIN: PGMCC performs better than TFWIN (+2.4% AUC; +2.8% F1) when evaluating time points, and by +5.2% AUC and +8.3% F1 when evaluating time periods. TFWIN started with seed patterns and defined constraints as rules to eliminate conflicting tuples. However, its inference on conflicts was based on local information (i.e., the current estimate of pattern reliability), so errors might propagate through iterations. PGMCC is a probabilistic graphical model that can avoid such error propagation by modeling constraints as variables and inferring truth from the global data distribution.
PGMCC with different constraints: See the last two rows in Table 2. For both PGMCC and TFWIN, the complete constraint set, i.e., {$C_{1v-1e}$, $C_{1(e,t)-1v}$}, gives the best performance, because a partial constraint set cannot fully identify conflicts or false tuples. $C_{1(e,t)-1v}$ plays a significant role in extracting country's president.

Here are our observations on the estimated pattern reliability. First, the pattern "president Person of Country" is the only pattern that shows high reliability on both types of time signals (above 0.85). Second, textual patterns that describe the current presidency tend to have higher reliability on text gen. time than on temporal tags, because the presidency is likely to coincide with the time the document was generated. These patterns usually contain words such as "current", "newly", and "now". Third, textual patterns that describe a past presidency tend to have higher reliability on "tag" than on "post", because the presidency is likely to coincide with the time the event described in the sentence happened, which precedes the time the document was generated. These patterns usually contain words such as "have governed", "have ruled", "former", and "formerly".

Related Work
In this section, we review two fields relevant to our work: truth discovery and temporal fact extraction.

Truth Discovery
In the big data era, the "Veracity" issue of resolving conflicts among multi-source information is quite serious (Berti-Equille, 2015; Vydiswaran et al., 2011; Waguih and Berti-Equille, 2014; Dong et al., 2009; Galland et al., 2010; Xiao et al., 2016; Yin and Tan, 2011). Truth discovery methods find trustworthy information from conflicting multi-source data (Xiao et al., 2015; …). Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. A few truth discovery methods are probabilistic models. LTM solved the "book's author list" problem and modeled two-fold source quality. GTM solved the task of finding the true numeric value of "New York City's population". TEXTTRUTH found the true answer to a question from the answers of multiple users (Zhang et al., 2018b).

Temporal Fact Extraction
Temporal fact extraction aims to extract (entity, attribute name, attribute value) tuples along with their time conditions from text corpora (Sil and Cucerzan, 2014; Hoang-Vu et al., 2016; Chekol, 2017; Zhang et al., 2018a; Shang et al., 2018; Zeng et al., 2019; …). Textual patterns have been proposed to extract structured data from unstructured text in an unsupervised way, such as E-A patterns (Gupta et al., 2014), parsing patterns (Nakashole et al., 2012), and meta patterns (Jiang et al., 2017). However, patterns differ in reliability, and extractions are sometimes conflicting. To obtain reliable temporal facts, we address this problem using truth discovery.

Limitations and Future Work
Though the proposed approach shows effectiveness in experiments, this study has several limitations. First, because collecting temporal factual truth for a variety of attributes is very expensive, we studied only a single relation type. In future work, we will apply the approach to other types of temporal facts for which correct constraints can be defined, such as sports team's players and spouse relationships. Second, although the patterns were generated by automated mining technologies such as Meta Patterns (Jiang et al., 2017), i.e., they are not hand-crafted, pattern mining is still needed as a preprocessing step, so the approach is not end-to-end.

Conclusions
In this work, we proposed a probabilistic graphical model for inferring true facts and pattern reliability. It had two novel designs for temporal facts: (1) it modeled pattern reliability on temporal tag in text and text generation time; (2) it modeled commonsense constraints as observable variables. Experimental results demonstrated that our model outperformed existing methods.