“A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce”: Learning State Changing Verbs from Wikipedia Revision History

Learning to determine when the time-varying facts of a Knowledge Base (KB) have to be updated is a challenging task. We propose to learn state changing verbs from Wikipedia edit history. When a state-changing event, such as a marriage or death, happens to an entity, the infobox on the entity’s Wikipedia page usually gets updated. At the same time, the article text may be updated, with verbs being either added or deleted to reflect the changes made to the infobox. We use Wikipedia edit history to distantly supervise a method for automatically learning verbs and state changes. Additionally, our method uses constraints to effectively map verbs to infobox changes. We observe in our experiments that when state-changing verbs are added to or deleted from an entity’s Wikipedia page text, we can predict the entity’s infobox updates with 88% precision and 76% recall. One compelling application of our verbs is to incorporate them as triggers in methods for updating existing KBs, which are currently mostly static.


Introduction
Extracting relational facts between entities and storing them in knowledge bases (KBs) has been a topic of active research in recent years. The resulting KBs are generally static and are not updated as the facts change (Suchanek et al., 2007; Carlson et al., 2010; Fader et al., 2011; Mitchell et al., 2015). One possible approach to updating KBs is to extract facts from dynamic Web content such as news (Nakashole and Weikum, 2012). In this paper, we propose to predict state changes caused by verbs acting on entities in text. This is different from simply applying the same text extraction pipeline that created the original KB to dynamic Web content.
In particular, our approach has the following advantages: (1) Consider, for example, the SPOUSE relation: both marry and divorce are good patterns for extracting this relation, but we wish to learn that they cause different state changes. Thus, we can update the entity's fact and its temporal scope (Wijaya et al., 2014a). (2) Learning state changing verbs can pave the way for learning the ordering of verbs in terms of their pre- and post-conditions.
Our approach learns state changing verbs from Wikipedia revision history. In particular, we seek to establish a correspondence between infobox edits and verb edits in the same article. The infobox of a Wikipedia article is a structured box that summarizes an entity as a set of facts (attribute-value pairs). Our assumption is that when a state-changing event (e.g., a marriage) happens to an entity, its Wikipedia infobox is updated by adding a new SPOUSE value. At approximately the same time, the article text might be updated with verbs that express the event, e.g., X is now married to Y. Figure 1 is an example of an infobox of an entity changing at the same time as the article's main text to reflect a marriage event.
The Wikipedia revision histories of many articles can act as distant supervision data for learning the correspondence between text and infobox changes. However, these revisions are very noisy. Many infobox slots can be updated when a single event happens. For example, when a death happens, slots regarding birth (e.g., birthdate, birthplace) may also be updated, or added if they were missing before. Therefore, our method has to handle these sources of noise. We leverage logical constraints to rule out meaningless mappings between infobox and text changes. In summary, our contributions are as follows: (1) we present an algorithm that uses Wikipedia edit histories as distantly labeled data to learn which verbs result in which state changes to entities, and experimentally demonstrate its success, (2) we make available this set of distantly labeled training data on our website¹, and (3) we also make available our learned mappings from verbs to state changes, as a resource for other researchers, on the same website.

Data Construction
We construct a dataset from Wikipedia edit histories of person entities whose facts change between the years 2007 and 2012 (i.e., that have at least one fact in the YAGO KB (Suchanek et al., 2007) with a start or end time in this period). We obtain Wikipedia URLs of this set of entities $P$ from YAGO and crawl their articles' revision histories. Given a person $p$, his/her Wikipedia revision history $R_p$ has a set of ordered dates $T_p$ on which revisions are made to his/her Wikipedia page (we consider date granularity). Each revision $r_{p,t} \in R_p$ is his/her Wikipedia page at date $t$, where $t \in T_p$.
Each Wikipedia revision $r_{p,t}$ is a set of infobox slots $S_{p,t}$ and textual content $C_{p,t}$. Each infobox slot $s \in S_{p,t}$ is a quadruple $\langle s_{att}, s_{value}, s_{start}, s_{end} \rangle$ containing the attribute name (non-empty), the attribute value, and the start and end time for which this attribute-value pair holds in reality.
¹ http://www.cs.cmu.edu/ dwijaya/postcondition.html
A document $d_{p,t}$ in our data set is the difference between any two consecutive revisions separated by more than 24 hours, i.e., $d_{p,t} = r_{p,t+2} - r_{p,t}$, where $r_{p,t+2}$ is the first revision on date $t+2$ and $r_{p,t}$ is the last revision on date $t$ (as a page can be revised many times in a day).
A document $d_{p,t}$ is therefore a set of infobox changes $\Delta S_{p,t}$ and textual changes $\Delta C_{p,t}$. Each slot change $\delta s = \langle s_{att}, \delta s_{value}, \delta s_{start}, \delta s_{end} \rangle \in \Delta S_{p,t}$ is prefixed with $+$ or $-$ to indicate whether it is added or deleted in $r_{p,t+2}$. Similarly, each text change $\delta c \in \Delta C_{p,t}$ is prefixed with $+$ or $-$ to indicate whether it is added or deleted.
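The infobox part of such a document can be illustrated with a minimal sketch. It assumes, purely for illustration, that slots are compared field by field with a plain set difference; the `Slot` type, the `diff_slots` helper, and the `(sign, attribute, field, content)` change format are hypothetical names and are not taken from our released data.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Slot(NamedTuple):
    """One infobox slot: attribute name plus optional value, start, and end."""
    att: str
    value: Optional[str] = None
    start: Optional[str] = None
    end: Optional[str] = None


Change = Tuple[str, str, str, str]  # (sign, attribute, field, content)


def slot_fields(slots: Set[Slot]) -> Set[Tuple[str, str, str]]:
    """Flatten slots into (attribute, field, content) triples, skipping empty fields."""
    return {
        (s.att, field, getattr(s, field))
        for s in slots
        for field in ("value", "start", "end")
        if getattr(s, field) is not None
    }


def diff_slots(old: Set[Slot], new: Set[Slot]) -> Set[Change]:
    """Signed field-level changes between an older and a newer revision (assumed diff)."""
    before, after = slot_fields(old), slot_fields(new)
    added = {("+",) + triple for triple in after - before}
    deleted = {("-",) + triple for triple in before - after}
    return added | deleted


# Toy revisions: an end time is added to an existing spouse slot.
old_rev = {Slot("spouse", value="Y", start="2000")}
new_rev = {Slot("spouse", value="Y", start="2000", end="2010")}
changes = diff_slots(old_rev, new_rev)
```

Here `changes` contains a single signed entry for the newly added end time, while the unchanged value and start time produce no change.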
For each $d_{p,t}$, we use $\Delta S_{p,t}$ to label the document and $\Delta C_{p,t}$ to extract features for the document. We label $d_{p,t}$ with begin-$s_{att}$ if it has a new value or start time added to its infobox, i.e., $\langle s_{att}, +\delta s_{value}, *, * \rangle \in \Delta S_{p,t}$ or $\langle s_{att}, *, +\delta s_{start}, * \rangle \in \Delta S_{p,t}$, and with end-$s_{att}$ if it has a new end time added to its infobox, i.e., $\langle s_{att}, *, *, +\delta s_{end} \rangle \in \Delta S_{p,t}$.
The label represents the state change that happens in $d_{p,t}$. For example, in Figure 1, $d_{kim,\,05/23/2014}$ is labeled with begin-spouse.
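The labeling rule above can be sketched as follows. The input format, a collection of `(sign, attribute, field, content)` tuples, and the function name `label_document` are assumptions made for this illustration only.

```python
def label_document(slot_changes):
    """Map signed infobox changes to begin-/end- labels.

    slot_changes: iterable of (sign, attribute, field, content) tuples, where
    sign is "+" or "-" and field is "value", "start", or "end" (assumed format).
    """
    labels = set()
    for sign, att, field, _content in slot_changes:
        if sign != "+":
            continue  # only additions produce labels under the rule above
        if field in ("value", "start"):
            labels.add("begin-" + att)
        elif field == "end":
            labels.add("end-" + att)
    return labels


# Toy example: a new spouse value and start time are added in one document.
toy_changes = [("+", "spouse", "value", "Y"), ("+", "spouse", "start", "2014")]
toy_labels = label_document(toy_changes)
```

A document whose infobox only loses content, or does not change at all, receives no label under this rule and stays in the unlabeled part of the dataset.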
The revision history dataset that we make available for future research consists of all documents $d_{p,t}$, labeled and unlabeled, $\forall t \in T_p$, $t \in [01/01/2007, 12/31/2012]$, and $\forall p \in P$: a total of 288,184 documents from the revision histories of 16,909 Wikipedia entities. Using our labeling process, we find that out of 288,184 documents, only 41,139 have labels (i.e., have their infobox updated with new values or start/end times). The distribution of labels in the dataset is skewed towards birth and death events, as these are life events that happen to almost all person entities in Wikipedia. The distribution of labels in the dataset that we release can be seen in Figure 2. We show only labels that we evaluate in our task.
For our task of learning state changing verbs from this revision history dataset, for each labeled $d_{p,t}$, we extract as features the verbs (or verbs+prepositions) $v \in \Delta C_{p,t}$ whose subject (or object) matches the Wikipedia entity $p$ and whose object (or subject, respectively) matches an infobox value, start time, or end time: arg1 $= p$ and $\langle s_{att}, \text{arg2}, *, * \rangle$ or $\langle s_{att}, *, \text{arg2}, * \rangle$ or $\langle s_{att}, *, *, \text{arg2} \rangle \in \Delta S_{p,t}$. We use Stanford CoreNLP (Manning et al., 2014) to dependency parse sentences and extract the subjects and objects of verbs. We find that 27,044 out of the 41,139 labeled documents contain verb edits, but only 4,735 contain verb edits with two arguments, where one argument matches the entity and the other matches the value of the infobox change. We use the latter for our task, to improve the chance that the verb edits used as features are related to the infobox change.
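The argument-matching filter can be sketched as below, assuming the verb edits have already been parsed into `(sign, verb, subject, object)` tuples. The tuple format, the function name `select_verb_features`, and the case-insensitive exact string match are simplifying assumptions for illustration; they stand in for the output of the dependency parser and for whatever matching is applied in practice.

```python
def select_verb_features(verb_edits, entity, changed_contents):
    """Keep signed verbs whose two arguments match the entity and an infobox change.

    verb_edits: list of (sign, verb, subject, object) tuples (assumed pre-parsed).
    entity: name of the Wikipedia entity p.
    changed_contents: values, start times, or end times appearing in the infobox changes.
    """
    entity_norm = entity.lower()
    contents = {c.lower() for c in changed_contents}
    features = []
    for sign, verb, subj, obj in verb_edits:
        s, o = subj.lower(), obj.lower()
        subject_is_entity = s == entity_norm and o in contents
        object_is_entity = o == entity_norm and s in contents
        if subject_is_entity or object_is_entity:
            features.append(sign + verb)
    return features


# Toy edits: only the first two link the entity X to the changed infobox value Y.
toy_edits = [
    ("+", "marry", "X", "Y"),
    ("-", "engage to", "Y", "X"),
    ("+", "visit", "X", "Paris"),
]
toy_features = select_verb_features(toy_edits, "X", {"Y"})
```

The sign prefix is kept on each feature so that an added verb and a deleted verb are treated as distinct features.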

Model
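We use a Maximum Entropy (MAXENT) classifier given a set of training data $\{(\mathbf{v}_d, y)\}$, where $\mathbf{v}_d \in \mathbb{R}^{|V|}$ is the $|V|$-dimensional verb feature vector of a labeled document $d$, $V$ is the set of all verbs in our training data, and $y$ is the label of $d$ as defined in Section 2.1.
These training documents are used to estimate a set of weight vectors $\mathbf{w} = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_{|Y|}\}$, $\mathbf{w}_y \in \mathbb{R}^{|V|}$, one for each label $y \in Y$, the set of all labels in our training data. The classifier can then be applied to classify an unlabeled document $d_u$ using:
$$\hat{y} = \arg\max_{y \in Y} P(y \mid \mathbf{v}_{d_u}) = \arg\max_{y \in Y} \frac{\exp(\mathbf{w}_y \cdot \mathbf{v}_{d_u})}{\sum_{y' \in Y} \exp(\mathbf{w}_{y'} \cdot \mathbf{v}_{d_u})} \quad (1)$$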
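To make the classification step concrete, the following minimal sketch applies already-estimated weights to a document in the standard MAXENT (softmax) form. Training is not shown; the sparse dictionary representation, the function name `classify`, and the toy weights are assumptions for illustration and are not learned values from our data.

```python
import math


def classify(features, weights):
    """Return the most probable label and the label distribution.

    features: dict mapping verb feature -> value (sparse document vector).
    weights: dict mapping label -> dict of verb feature -> weight.
    """
    # Linear score w_y . v for each label.
    scores = {
        y: sum(w.get(verb, 0.0) * x for verb, x in features.items())
        for y, w in weights.items()
    }
    # Subtract the maximum score before exponentiating, for numerical stability.
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    probs = {y: e / z for y, e in exps.items()}
    return max(probs, key=probs.get), probs


# Hypothetical weights for two labels.
toy_weights = {
    "begin-spouse": {"+marry": 2.0, "-engage to": 1.0},
    "end-spouse": {"+divorce": 2.0},
}
predicted, distribution = classify({"+marry": 1.0}, toy_weights)
```

With these toy weights the document containing an added marry is assigned begin-spouse, since that label has the larger linear score.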

Feature Selection using Constraints
While feature weights from the MAXENT model allow us to identify verbs that are good features for predicting a particular state change label, our distantly supervised training data is inherently noisy: multiple infobox slots can change within a single revision. We therefore utilize constraints among state changes to select consistent verb features for each type of state change.
We use two types of constraints: (1) mutual exclusion (Mutex) constraints, which indicate that mutex state changes do not happen at the same time, e.g., an update on birthdate should not typically happen with an update on deathcause; hence, their state changing verbs should be different; and (2) simultaneous (Sim) constraints, which indicate that simultaneous state changes should typically happen at the same time, e.g., an update on birthdate should typically happen with other birth-related updates such as birthplace, birthname, etc. We manually specified these two types of constraints for all pairs of infobox slots where they apply. We have 10 mutex constraints and 23 simultaneous constraints. The full list of constraints can be found on our website.
Given a set of constraints, a set of labels $Y$, and a set of base verbs $B$ in our training data, we solve a Mixed-Integer Program (MIP) for each base verb $b \in B$ to estimate whether $b$ should be a feature for state change $y \in Y$.
We obtain label membership probabilities $P(y \mid b) = \mathrm{count}(y, b) / \sum_{y'} \mathrm{count}(y', b)$ from our training data. The MIP takes the scores $P(y \mid b)$ and the constraints as input and produces a bit vector of labels $\mathbf{a}_b$ as output, where each bit $a_b^y \in \{0, 1\}$ represents whether or not $b$ should be a feature for $y$.
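The label membership probabilities that serve as input scores can be computed by simple counting, as in the sketch below. The input format, a list of `(label, base_verb)` co-occurrence pairs, and the function name `label_membership` are assumptions for illustration; the MIP itself is not shown.

```python
from collections import Counter, defaultdict


def label_membership(pairs):
    """Estimate P(y | b) = count(y, b) / sum over y' of count(y', b).

    pairs: iterable of (label, base_verb) co-occurrences from labeled documents.
    Returns a dict mapping base_verb -> dict of label -> probability.
    """
    counts = defaultdict(Counter)
    for label, base_verb in pairs:
        counts[base_verb][label] += 1
    return {
        b: {y: c / sum(label_counts.values()) for y, c in label_counts.items()}
        for b, label_counts in counts.items()
    }


# Toy counts: "marry" co-occurs three times with begin-spouse, once with end-spouse.
toy_pairs = [("begin-spouse", "marry")] * 3 + [("end-spouse", "marry")]
toy_probs = label_membership(toy_pairs)
```

In this toy case `toy_probs["marry"]` assigns 0.75 to begin-spouse and 0.25 to end-spouse.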
Solving the MIP per base verb is fast because we reduce the number of labels considered per base verb, i.e., we only consider a label $y$ to be a candidate for $b$ if $\exists v_i \in V$ s.t. $w_y^i > 0$ and $b$ is the base form of $v_i$. After we output $\mathbf{a}_b$ for each $b$, we select features for each label. We only select a verb $v_i$ to be a feature for $y$ if the learned weight $w_y^i > 0$ and $a_b^y = 1$, where $b$ is the base form of $v_i$. Essentially, for each label, we select verb features that have positive weights and are consistent for the label.
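The final selection rule can be sketched as follows, taking the learned weights and the MIP output as given. The dictionary formats, the function name `select_features`, and the toy `strip_sign` base-form function (which only removes the edit sign) are assumptions for illustration; real base forms would come from whatever normalization is applied to the verbs.

```python
def select_features(weights, assignment, base_form):
    """Keep verb v for label y only if its weight is positive and a_b^y = 1.

    weights: dict mapping label -> dict of verb feature -> learned weight.
    assignment: dict mapping base verb -> dict of label -> 0 or 1 (MIP output).
    base_form: function mapping a verb feature to its base verb (assumed given).
    """
    selected = {}
    for label, verb_weights in weights.items():
        selected[label] = sorted(
            verb
            for verb, weight in verb_weights.items()
            if weight > 0 and assignment.get(base_form(verb), {}).get(label, 0) == 1
        )
    return selected


def strip_sign(verb):
    """Toy base-form function: drop the leading + or - edit sign."""
    return verb.lstrip("+-")


# Hypothetical weights and MIP output.
toy_weights = {"begin-spouse": {"+marry": 1.2, "+die": 0.3, "-marry": -0.5}}
toy_assignment = {"marry": {"begin-spouse": 1}, "die": {"begin-spouse": 0}}
toy_selected = select_features(toy_weights, toy_assignment, strip_sign)
```

In this toy case only the added marry survives: the added die has a positive weight but is ruled out by the assignment, and the deleted marry is consistent but has a negative weight.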

Experiments
We use 90% of our labeled documents that have verb edits as features (Section 2.1) as training data and test on the remaining 10%. Since revision history data is noisy, we manually go through our test data and discard documents that have incorrect infobox labels, judging by the text that changed. The task is to predict, for each document (revision), the label (infobox slot change) of the document given its verb features. We compute precision, recall, and F1 values of our predictions and compare the values before and after feature selection (Figure 3).
To the best of our knowledge, the task of learning state-changing verbs in terms of states defined in existing knowledge bases, and of learning them from Wikipedia edit histories, is novel. There is no previous approach that can be used as a baseline; therefore, we compare our structured prediction using MIP and MAXENT with a majority class baseline. Both our approaches (MAXENT and MAXENT + MIP) perform better than the majority class baseline (Figure 3).
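The evaluation described above can be sketched with per-label precision, recall, and F1, together with a majority class baseline. The sketch assumes one gold label per test document and computes the metrics per label; the function names and the toy labels are hypothetical, and the averaging used for the reported figures is not specified here.

```python
from collections import Counter


def precision_recall_f1(gold, pred, label):
    """Precision, recall, and F1 for one label over aligned gold/predicted lists."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test document."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test


# Toy evaluation with two labels.
toy_gold = ["begin-spouse", "begin-spouse", "end-spouse", "end-spouse"]
toy_pred = ["begin-spouse", "end-spouse", "end-spouse", "end-spouse"]
p, r, f = precision_recall_f1(toy_gold, toy_pred, "end-spouse")
baseline = majority_baseline(["birth", "death", "death"], n_test=2)
```

The same metric function can be applied to the baseline predictions, which gives the reference point against which the classifiers are compared.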

Related Work
Learning from Wikipedia Revision History.
Wikipedia edit history has been exploited in a number of problems. A popular task in this regard is that of Wikipedia edit history categorization (Daxenberger and Gurevych, 2013). This task involves characterizing a given edit instance as one of many possible categories such as spelling error correction, paraphrasing, vandalism, and textual entailment (Nelken and Yamangil, 2008; Cahill et al., 2013; Zanzotto and Pennacchiotti, 2010; Recasens et al., 2013). Prior methods target various tasks different from ours.
Learning State Changing Verbs.
Very few works have studied the problem of learning state changing verbs. Hosseini et al. (2014) learned state changing verbs in the context of solving arithmetic word problems. They learned the effect of words such as add and subtract on the current state.
The VerbOcean resource was automatically generated from the Web (Chklovski and Pantel, 2004). The authors studied the problem of fine-grained semantic relationships between verbs. They learn relations such as: if someone has bought an item, they may sell it at a later time. This involves capturing empirical regularities such as "X buys Y" happens before "X sells Y". Unlike the work we present here, the methods of Chklovski and Pantel (2004) and Hosseini et al. (2014) do not make a connection to KB relations such as Wikipedia infoboxes. In a vision paper, Wijaya et al. (2014b) give high-level descriptions of a number of possible methods for learning state changing verbs. They did not implement any of them.

Conclusion
In this paper, we presented a method that learns state changing verb phrases from Wikipedia revision history. We first constructed and curated a novel dataset from Wikipedia revision history that is tailored to our task. We showed that this dataset is useful for learning verb phrase features that are effective for predicting state changes in the knowledge base (KB), where we considered the KB to be infoboxes and their values. As future work, we wish to explore the usefulness of our verb resource for other KBs to improve KB freshness. This is important because existing KBs are mostly static. We also wish to explore the application of the learned verb resource to domains other than Wikipedia infoboxes and text, e.g., for predicting state changes in the knowledge base from news text.
In this paper, we learned post-conditions of verbs: state changes that occur when an event expressed by a verb happens. As future work, we would also like to explore the feasibility of learning pre-conditions of verbs from Wikipedia revisions. Additionally, most Wikipedia revisions only have text changes without an associated infobox change. Therefore, another line of future work is to also learn from these unlabeled documents.