But What Do We Actually Know?



Motivation
General-purpose knowledge bases (KBs) such as Wikidata [21], the Google Knowledge Vault [4], NELL [10], or YAGO [19] aim to collect as much factual information about the world as possible. They store information about entities (such as Barack Obama, Hawaii, or NAACL), and about relationships between these entities (such as the fact that Barack Obama was born in Hawaii, and that NAACL took place in San Diego). These pieces of information typically take the form of triples, as in ⟨Barack Obama, wasBornIn, Hawaii⟩. KBs find applications in question answering, automated translation, and information retrieval.
The quality of a KB can be measured along several dimensions. A prominent one is size: today's KBs can contain millions, if not billions, of triples. Another criterion is precision, i.e., the proportion of triples that are correct. YAGO, e.g., was manually evaluated on a sample and shown to have a precision of 95%. In this paper, we propose to look at a third criterion for quality which, apart from some manual evaluations where ground truth is available, has been largely neglected so far: recall, i.e., the proportion of facts of the real world that are covered by the KB. For some topics, today's KBs show very good recall values. On other topics, however, today's KBs are nearly completely incomplete:
• DBpedia currently contains only 6 out of 35 Dijkstra Prize winners.
• According to YAGO, the average number of children per person is 0.02.
• The Google Knowledge Graph contains a predicate called "Points of Interest" for countries. Since this predicate is subjective, it is not even clear how to measure its recall.
Previous research [18,9] has shown that between 69% and 99% of instances in popular KBs lack at least one property that other entities in the same class have. This gives us a hint of how incomplete KBs really are. The problem is not just that KBs are missing triples, but also that they do not know how many are missing, or whether any are missing at all. This is an issue from several perspectives. In this paper, we investigate the problem of recall for KBs, and outline possible approaches to solve it.

Vision
Our vision is that a KB should know for which topics it is complete, and for which topics it is not. Under an appropriate interpretation of terms, this could be phrased as: KBs should know what they know.
Defining Completeness. In line with work in databases [11,8,13], we define completeness with the help of a hypothetical ideal KB K*. The ideal KB contains all facts of the real world. We say that a KB K is correct if K ⊆ K*. We say that K is complete for a query Q if Q(K) ⊇ Q(K*). For example, we could say that a KB K is complete for the children of Obama by asserting completeness for the query Q(x) ← hasChild(Obama, x). This means that evaluating this query on K will return at least the two children that we would expect as an answer in the real world. Completeness is always bound to a particular query, because we do not expect that we can ever construct a KB K = K*. A query can express completeness for simple triples about a subject (as in the example), but also for complex constellations, such as "This KB is complete for all rivers longer than 100km in Europe". We believe that completeness assertions are particularly interesting for class expressions. These are conjunctive queries with a single selection variable. The class expression for the long rivers of Europe would be Q(x) ← type(x, river), locatedIn(x, Europe), length(x, l), l > 100km. If K is complete for a query, then anything that is not in the query result cannot hold in the real world; we thus know an infinite number of negative facts.
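
Since these definitions are purely set-theoretic, they can be made concrete in a few lines. The following Python sketch is our own illustration, not part of any cited system; the triple representation and all entity names are assumptions. It models a KB as a set of triples and a query as a function from a KB to a set of answers:

def is_correct(K, K_star):
    """K is correct if every triple of K also holds in the ideal KB K*."""
    return K <= K_star                    # set inclusion: K ⊆ K*

def is_complete_for(query, K, K_star):
    """K is complete for a query Q if Q(K) ⊇ Q(K*)."""
    return query(K) >= query(K_star)

# The class expression Q(x) <- hasChild(Obama, x) from the example:
def children_of_obama(kb):
    return {o for (s, p, o) in kb if s == "Obama" and p == "hasChild"}

K_star = {("Obama", "hasChild", "Malia"), ("Obama", "hasChild", "Sasha")}
K = {("Obama", "hasChild", "Malia")}

print(is_correct(K, K_star))                          # True
print(is_complete_for(children_of_obama, K, K_star))  # False: Sasha is missing
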
Recall. The recall of a KB K for a query Q is |Q(K) ∩ Q(K*)| / |Q(K*)|. The recall is 1 for a query precisely when the KB is complete for that query.
Cardinality. The cardinality of a query on a KB is the number of results. If we know the cardinality of a query on K * , and if we know that the KB is correct, we can compute the recall of the KB for that query, and vice versa.
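
Under the same model, recall and the cardinality-based shortcut can be computed directly. The sketch below reuses the names K, K_star, and children_of_obama from the previous sketch:

def recall(query, K, K_star):
    """Recall of K for Q: |Q(K) ∩ Q(K*)| / |Q(K*)|."""
    ideal = query(K_star)
    if not ideal:
        return 1.0            # an empty ideal answer cannot miss anything
    return len(query(K) & ideal) / len(ideal)

def recall_from_cardinality(query, K, ideal_cardinality):
    """If K is correct and |Q(K*)| is known, recall reduces to |Q(K)| / |Q(K*)|."""
    return len(query(K)) / ideal_cardinality

print(recall(children_of_obama, K, K_star))              # 0.5
print(recall_from_cardinality(children_of_obama, K, 2))  # 0.5
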
Size. The larger a KB is, the more likely it is to be complete, everything else being equal.
Confidence. Completeness assertions can be crisp, but they could also be made with a certain confidence score. For example, we could be 80% certain to have all children of Obama.

Challenges
We see four main challenges that need to be mastered in order to arrive at knowledge about the knowledge of KBs:

Knowing What Can Be Known
A prerequisite for completeness assertions is unambiguous definitions. Some relations, such as "sibling" or "place of birth", are well-defined, while others, such as "affiliation" or "hobby", are not. For example, while one of Einstein's hobbies was playing the violin, he might have had an unclear number of other "hobbies" (such as going for a walk, or eating chocolate). If a topic is not well-defined, completeness has little meaning either. One might assume that KBs generally contain well-defined predicates, yet this is not always the case. As mentioned before, the Google Knowledge Graph contains an attribute pointOfInterest. While some attractions are clearly points of interest (such as the Colosseum in Rome), others are less clearly so (e.g., the pub that DiCaprio allegedly threw up at). In such cases, the concept of crisp completeness is meaningless. We note that some fuzzy concepts can be turned into crisp ones by binding them to particular verifiable properties. For example, it makes sense to consider completeness for "points of interest recommended by Tripadvisor", because this is a well-defined, verifiable set.

Languages for Describing Completeness
Various formal languages for completeness assertions have been proposed [11,8,13]. Erxleben et al. [5] have introduced no-values into Wikidata (e.g., Elizabeth I has no children), which allows one to state completeness when a property has no values, but not in the general case. All proposals so far deal only with Boolean descriptions, stating whether data of some kind is present or not; they do not allow descriptions of confidences or recall.
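
To make the gap concrete, here is a purely hypothetical record format, our own invention rather than one of the cited proposals, showing what an assertion language with confidences and recall estimates might carry:

from dataclasses import dataclass
from typing import Optional

@dataclass
class CompletenessAssertion:
    """Hypothetical format: existing proposals carry only the Boolean flag."""
    query: str                      # e.g. a class expression, as text
    complete: bool                  # crisp Boolean completeness
    confidence: float = 1.0         # how certain we are about the assertion
    recall: Optional[float] = None  # estimated recall when not complete

# A crisp assertion, as in prior work, and a graded one, as envisioned here:
a1 = CompletenessAssertion("children of Elizabeth I", complete=True)
a2 = CompletenessAssertion("children of Obama", complete=False,
                           confidence=0.8, recall=0.5)
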

Obtaining Completeness Information
Experts. There are two main paradigms for constructing KBs: manual construction by experts or the crowd, and automated extraction from Web sources. For expert-created data, it makes sense to give the task of recall estimation to the experts, too (as is already the case for the no-values in Wikidata, and as envisioned more broadly in the tool COOL-WD [3]). In this way, a comparable quality of data and recall information can be guaranteed. For automatically extracted data, it is highly desirable to find automatic ways to estimate the recall.
Partial Completeness Assumption. The partial completeness assumption (PCA) [7] has been shown to work well for providing negative information [7,4]. It assumes that if a KB contains at least one object for a given subject and property, then it contains all objects for that subject and property. For instance, if a KB contains the fact that Sasha is a child of Obama, then it is assumed that the KB contains all children of Obama; hence, anyone who is not known to be a child of Obama is not one (see the sketch at the end of this subsection). The validity of the PCA has been evaluated manually [6] on YAGO. For relations with generally high functionality [17], the PCA holds nearly perfectly; for example, it holds for 90% of the subjects of the worksAt relation. For others, the PCA is less suited: for hasChild, e.g., its precision is only around 27%.
Pattern Matching. Phrases on the Web such as "has no children" or "X and Y are all his children" can be used to infer completeness. Similarly, phrases such as "The 199 Nobel laureates in Physics..." could be used to assert the cardinality, and hence the recall, for a class.
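
To make the PCA concrete, the following is a minimal sketch of PCA-based negative-fact inference (entity names are illustrative); it also reproduces the failure mode for hasChild noted above:

from collections import defaultdict

def pca_negative_facts(kb, candidates):
    """Under the PCA, if the KB knows at least one object for a
    (subject, predicate) pair, every candidate object it does not
    know for that pair is judged to be false."""
    known = defaultdict(set)
    for s, p, o in kb:
        known[(s, p)].add(o)
    negatives = set()
    for s, p, o in candidates:
        objs = known.get((s, p))
        if objs and o not in objs:    # some object known, this one is not
            negatives.add((s, p, o))
    return negatives

kb = {("Obama", "hasChild", "Sasha")}
candidates = {("Obama", "hasChild", "Malia"),   # wrongly judged false
              ("Merkel", "hasChild", "Anna")}   # no object known: PCA is silent
print(pca_negative_facts(kb, candidates))
# {('Obama', 'hasChild', 'Malia')} -- exactly the kind of error that makes
# the PCA's precision low for hasChild
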
Growth Patterns over Time. The growth of data over time, and especially the end of such a growth, might indicate completeness. For instance, we can imagine that once a new congress is established, its members are added to a KB until eventually all are inside. The fact that the number then remains constant could indicate completeness.
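
As a toy illustration of this idea, the following heuristic flags a class as possibly complete when its entity count has stopped growing; the window size and the snapshot counts are arbitrary assumptions of ours:

def plateau_detected(counts, window=3):
    """Weak evidence of completeness: the count of a class has not grown
    over the last `window` snapshots (the window choice is an assumption)."""
    if len(counts) < window + 1:
        return False                       # not enough history to judge
    return counts[-1] > 0 and len(set(counts[-window:])) == 1

# Members of a newly established congress being added to a KB over time:
snapshots = [0, 120, 370, 435, 435, 435]
print(plateau_detected(snapshots))         # True: growth stopped at 435
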
Interrelation. The completeness of a certain class expression could be learned from the completeness of other class expressions. For instance, it might be that if the parents of a person are complete, then the children are complete with a higher probability, too.
Crowd Sourcing. The crowd could be used to manually generate completeness annotations. A related idea is to use games with a purpose [1].
Estimating Cardinalities. Mark and recapture techniques have been developed in the domain of ecology in order to estimate the size of a population of animals. For this purpose, a sample of animals is captured, marked, and freed. After some time, another sample of animals is captured. The ratio of marked animals in this sample can help estimate the size of the population. This technique works also if samples are not independent, and has been used in the estimation of cardinalities of search results [16]. We believe that it might also be useful for estimating the size of a set of entities, based on the overlap between different websites or datasources dealing with the same topic. Based on the size estimate, and the number of entities already in the KB, one could then estimate the recall.
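
The classical Lincoln–Petersen estimator makes this concrete. In the sketch below, two hypothetical website listings play the role of the two captured samples, and their overlap plays the role of the marked recaptures; the names and counts are illustrative only:

def lincoln_petersen(n1, n2, m):
    """Estimate population size from two samples: n1 marked items,
    n2 recaptured items, m of which are marked. N ≈ n1 * n2 / m."""
    if m == 0:
        raise ValueError("samples do not overlap; no estimate possible")
    return n1 * n2 / m

# Two (hypothetical) websites listing Dijkstra Prize winners act as samples:
site_a = {"Lamport", "Lynch", "Fischer", "Rabin"}
site_b = {"Lamport", "Lynch", "Awerbuch", "Gallager", "Dolev"}
overlap = len(site_a & site_b)                       # 2 "marked recaptures"
estimated_total = lincoln_petersen(len(site_a), len(site_b), overlap)
print(estimated_total)                               # 10.0 winners estimated

kb_count = 6                                         # winners already in the KB
print(kb_count / estimated_total)                    # estimated recall: 0.6
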

Combining Completeness Information
Once information about the recall of KBs for individual classes exists, methods need to be found to present this information in a meaningful way. Several techniques can annotate query answers with completeness information [8,13,14], but only if the underlying database is annotated with such information, and only for crisp Boolean completeness. Techniques from query answering over probabilistic databases [2] could possibly be extended to handle non-crisp completeness assertions, while techniques from data profiling can help understand distributions and skew [12]. One would also need to apply these techniques to state-of-the-art KBs in order to finally know how much we currently know about the world (see Fig. 1). The community would then have to develop benchmarks for comparing the performance of completeness estimators, and for the completeness of KBs themselves, and it would face the classic challenge of KB alignment, because information may be represented differently in different KBs.
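
As a deliberately simplified sketch of such propagation (far simpler than the actual reasoning in [8,13]; the atom notation and the independence-based confidence combination are our assumptions), a query answer could be annotated as complete only if every atom it uses is covered by an assertion:

def answer_is_complete(query_atoms, complete_atoms):
    """Crisp case: an answer is certainly complete if every atom of the
    query is covered by a completeness assertion."""
    return all(atom in complete_atoms for atom in query_atoms)

def answer_confidence(query_atoms, confidence):
    """Non-crisp case: naively multiply per-atom confidences, assuming
    independence; atoms without any assertion contribute 0."""
    prod = 1.0
    for atom in query_atoms:
        prod *= confidence.get(atom, 0.0)
    return prod

query = ["type(x, river)", "locatedIn(x, Europe)", "length(x, l)"]
print(answer_is_complete(query, {"type(x, river)", "locatedIn(x, Europe)"}))
# False: no assertion covers length(x, l)
print(answer_confidence(query, {"type(x, river)": 0.9,
                                "locatedIn(x, Europe)": 1.0,
                                "length(x, l)": 0.8}))   # 0.72
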

Conclusion
In this paper, we have outlined our vision of knowledge bases (KBs) that know how complete they are. Their completeness assertions could be used to guide knowledge engineers in the extension and debugging of the KB, to provide negative examples for machine learning algorithms, and to qualify answers to user queries. We have surveyed the state of the art in the area, and concluded that we cannot yet automatically determine where KBs are complete. We have discussed the challenges in defining, determining, and combining completeness assertions, and have outlined possible paths to address them.