Faceted Hierarchy: A New Graph Type to Organize Scientific Concepts and a Construction Method

In a scientific concept hierarchy, a parent concept may have a few attributes, each of which takes multiple values that form a group of child concepts. We call these attributes facets: classification has facets such as application (e.g., face recognition), model (e.g., svm, knn), and metric (e.g., precision). In this work, we aim at building faceted concept hierarchies from scientific literature. Hierarchy construction methods rely heavily on hypernym detection. However, faceted relations are parent-to-child links, whereas the hypernym relation is a multi-hop (i.e., ancestor-to-descendant) link with one specific facet, “type-of”. We use information extraction techniques to find synonyms, sibling concepts, and ancestor-descendant relations in a data science corpus, and we propose a hierarchy growth algorithm that infers the parent-child links from these three types of relationships. The algorithm resolves conflicts by maintaining the acyclic structure of the hierarchy.


Introduction
Concept hierarchies play an important role in knowledge discovery from scientific literature. Concepts are expected to be organized in a hierarchical structure like chapters-to-sections-to-subsections in textbooks. In this work, we propose a new representation of scientific concept hierarchy, called faceted concept hierarchy. In this hierarchy, the links carry not only parent-to-child relations but also the semantic relations (facets) between the concepts. Figure 1 presents a part of the faceted hierarchy in Data Science. The parent node is "classification", and its child concepts are expected to be grouped into three facets, each of which has three child concepts. One example of the faceted relation we define is parent("classification", "svm"): "models", which is more complete than the "type-of" relations in works that focused on taxonomy or ontology induction (Liang et al., 2017; Gupta et al., 2017), such as type-of("svm", "classification model").
Figure 1: The idea of faceted concept hierarchy from Data Science publications: For student learning, concepts are expected to be organized in a hierarchical structure. For example, here the nine child-concepts of "classification" (in dashed line blocks) should be grouped into three facets ("models", "applications", and "metrics").
The basic units of the hierarchy include concept nodes and their parent-to-child relations. Three types of essential structural relations are expressed in paper texts and can be used to infer the parent-to-child relations. The relation types include (1) synonym (concept names on the same node), (2) sibling (concept nodes having the same parent), and (3) ancestor-to-descendant (nodes on the direct descending line). The task of hierarchy construction has three challenges. First, there is not enough human-annotated data or distant supervision to feed into (deep) learning models, so it is necessary to extract the concepts and relations in an unsupervised manner. Second, the extracted relations can be noisy at the long tail of the frequency distribution. When inferring the parent-to-child relations, the algorithm should consider the trustworthiness of the synonym, sibling, and ancestor relations. It is also important to detect redundant or conflicting relations (links) on the hierarchy. Third, the task requires a joint process of clustering child concepts into the parent concept's facets and identifying words as facet indicators.
We propose a novel framework, HiGrowth, that grows faceted hierarchies from literature data. The framework has five modules: (M1) scientific concept extraction, (M2) concept typing, (M3) hierarchical relation extraction, (M4) hierarchy growth, and (M5) facet discovery. The M1-M3 NLP modules are implemented in an unsupervised manner. First, we use two complementary keyphrase mining tools to extract concepts: one is rule-based and the other is a statistical learning method. Second, we use a simple and effective KNN-based method to assign types (e.g., $Problem, $Method) to the concepts. Third, we use textual patterns to extract the hierarchical relations (i.e., synonym, sibling, and ancestor). To address the second challenge, we design an efficient algorithm that grows a concept hierarchy by scanning the set of relation tuples (sorted by their frequency from the highest to the lowest) just once and inferring parent-to-child relations. This algorithm identifies unnecessary, invalid, and redundant links during hierarchy growth in spite of serious noise at the long tail. Finally, we use a word clustering method to discover the facets of every parent concept and assign child concepts to each of the facets.
Thirty-two junior/senior students who took the Data Science course in Spring 2018 were asked to manually label the parent-child concept pairs. We include a pair in the ground truth if it was labelled by more than half of the participants. The F1 score of building the parent-to-child links is 0.73. The F1 score of 2-hop paths is 0.69. Both precision values are above 0.99, showing that the links in the hierarchy are precise because of the careful design of the growth algorithm, whereas the pattern-based methods are limited in finding all possible relations.

Data Description
We collected full text, all sections including abstract, introduction, and experiments, of 5,850 papers in the proceedings of ACM SIGKDD 1994, IEEE ICDM 2001, The Web Confer-

M1: Scientific Concept Extraction
We use phrase mining tools, AutoPhrase (Shang et al., 2018) & SCHBase (Adar and Datta, 2015), to extract scientific concepts from data science papers. AutoPhrase adopted distant supervision and large-scale statistical techniques; SCHBase focused on a tendency to expand keyphrases by adding terms, coupled with a pressure to abbreviate to retain succinctness in academic writing.

M2: Concept Typing
We use a simple but effective method to classify the concepts into four types: $Problem (e.g., "fraud detection"), $Method (e.g., "svm"), $Object (e.g., "frequent patterns"), and $Metric (e.g., "accuracy"). We assume that the neighboring non-stop word indicates the concept's type. For example, the trigger word "problem" in the sentence ". . . the problem of fraud detection" suggests that "fraud detection" is a $Problem. We manually select a set of 20 trigger words that indicate concept types when they appear immediately to the left or right of a concept. Table 1 shows a few examples. Whenever a concept has a left or right neighboring word from this set in the text, the corresponding type gets one vote. For each concept, we count the votes for every type and use majority voting (MajVot) to determine the predicted type (i.e., the type with the most votes).
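The voting scheme above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trigger words in `TRIGGERS`, the stop-word list, and the whitespace tokenization are all assumptions made for the example (the paper's actual 20 trigger words are listed in its Table 1).

```python
from collections import Counter

# Hypothetical trigger words; the paper's actual 20-word list is not reproduced here.
TRIGGERS = {
    "problem": "$Problem",
    "task": "$Problem",
    "algorithm": "$Method",
    "method": "$Method",
    "metric": "$Metric",
    "measure": "$Metric",
}
# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "of", "a", "an", "in", "for", "and", "to", "is"}


def neighbor_votes(tokens, concept_tokens):
    """Count type votes from the nearest non-stop word left and right of each mention."""
    n = len(concept_tokens)
    votes = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != concept_tokens:
            continue
        left = next((t for t in reversed(tokens[:i]) if t not in STOPWORDS), None)
        right = next((t for t in tokens[i + n:] if t not in STOPWORDS), None)
        for word in (left, right):
            if word in TRIGGERS:
                votes[TRIGGERS[word]] += 1
    return votes


def majvot(sentences, concept):
    """Return the most voted type for a concept, or None if it received no votes."""
    concept_tokens = concept.lower().split()
    votes = Counter()
    for sentence in sentences:
        votes += neighbor_votes(sentence.lower().split(), concept_tokens)
    return votes.most_common(1)[0][0] if votes else None
```

For instance, given the three sentences "We study the problem of fraud detection in banking", "fraud detection task is hard", and "the fraud detection algorithm works", the concept "fraud detection" collects two $Problem votes and one $Method vote, so `majvot` returns "$Problem".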

M3: Hierarchical Relation Extraction
In order to find the relations in an unsupervised manner in scientific text, we use textual patterns, mainly Hearst patterns (Hearst, 1992), to accurately extract three types of hierarchical relations, where X and Y are two concept names:
• synonym(X, Y), if X and Y will be included in the same concept node on the hierarchy;
• sibling(X, Y), if the concept nodes of X and Y will have the same parent node;
• ancestor(X, Y), if there will be a path from the concept node of X to the node of Y.
Note that synonym and sibling relations are symmetric, while ancestor-to-descendant is asymmetric (see Figure 2).
Find synonym(X, Y). We use two ideas to find synonym concepts. First, the plural form of a noun or noun-phrase concept can be considered a synonym; for example, we have synonym("SVM", "SVMs") and synonym("decision tree", "decision trees"). Second, an abbreviation inside parentheses can be considered a synonym of the full name preceding the parentheses. We have synonym("support vector machines", "SVMs") from the text ". . . Support Vector Machines ( SVMs ). . . ".
Find ancestor(X, Y). Hearst patterns such as
• X such as {Y_1, . . . , (and|or)} Y_n,
have often been used to find the "isA" relation, also called hypernym, for taxonomy construction: Y_i (e.g., "dog") is a kind of X (e.g., "mammal"). However, we expect to extract faceted hierarchical relations such as
• ancestor("machine learning", "SVM"): models;
• ancestor("machine learning", "classification"): tasks;
• ancestor("classification", "SVM"): models;
instead of
• isA("machine learning models", "SVM");
• isA("machine learning tasks", "classification");
• isA("classification models", "SVM"),
if the text contains
• . . . machine learning models such as SVM. . . ;
• . . . machine learning tasks such as classification. . . ;
• . . . classification models such as SVM. . . ,
especially when "machine learning" has been extracted as a concept. Note that we cannot be confident that every relation given by pattern matching is parent-to-child, so we denote the relation as ancestor. We expect that "machine learning" connects to "SVM" through "classification" on the hierarchy instead of through a direct connection. Therefore, we modify the patterns as below:
• X <pl> such as {Y_1, . . . , (and|or)} Y_n,
• X <pl> including {Y_1, . . . , (and|or)} Y_n,
where <pl> is the plural form of a noun or noun phrase, e.g., "models" and "tasks". We extract ancestor(X, Y_i) from the above patterns. We will infer concrete parent-to-child relations and the parent concept's facets in the next section.
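The modified patterns can be approximated with regular expressions. The sketch below is illustrative only and rests on several assumptions: the set of known concepts is given (as produced by M1), text is lowercased, <pl> is approximated as any single word ending in "s", and list items that are not known concepts are silently dropped. The function name `extract_ancestor` is hypothetical.

```python
import re


def extract_ancestor(sentence, concepts):
    """Return (X, Y_i, facet word) triples matched by 'X <pl> such as|including Y_1, ..., Y_n'."""
    text = sentence.lower()
    triples = []
    for x in concepts:
        pattern = re.compile(
            r"\b" + re.escape(x)
            + r"\s+([a-z]+s)"                # <pl>: assumed to be one plural word
            + r"\s+(?:such as|including)\s+"
            + r"([^.;]+)"                    # the enumerated list, up to '.' or ';'
        )
        for match in pattern.finditer(text):
            facet, tail = match.group(1), match.group(2)
            # Split "a, b, and c" or "a and b" into individual items.
            items = re.split(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", tail)
            for y in (item.strip() for item in items):
                if y in concepts and y != x:
                    triples.append((x, y, facet))
    return triples
```

With the concept set {"classification", "svm", "knn", "decision trees"}, the sentence "We compare classification models such as SVM, kNN, and decision trees." yields three ancestor tuples from "classification", each carrying the facet word "models". Keeping the facet word alongside each tuple is a design choice of this sketch, since the later facet discovery step needs such indicator words.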
Find sibling(X, Y). Shorter patterns in which the ancestor concept names are missing occur more frequently in the text. We extract sibling(Y_i, Y_j) from these patterns. There are more sibling relations than ancestor relations, and the sibling relations, e.g., sibling("precision", "recall"), bring useful information to hierarchy induction: Y_i and Y_j have the same parent concept node.
We use majority voting to choose one relation type for each pair of concepts. We assume that a pair of concepts has at most one relation among synonym, sibling, and ancestor. However, the extracted relations may still be noisy due to the long tail. Next we discuss how to construct a high-quality concept hierarchy from a noisy set of the three types of relations.
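As an illustration of the per-pair voting, the sketch below keeps the most frequent relation for each unordered concept pair and returns the winners sorted by support, which is the order in which a growth algorithm would scan them. The tuple layout `(rel, x, y)` and the function name are assumptions; ties are broken by first occurrence, which the text does not specify.

```python
from collections import Counter, defaultdict


def vote_relations(tuples):
    """Pick one relation per concept pair by majority vote.

    Returns [((rel, x, y), support), ...] sorted by support, highest first.
    """
    votes = defaultdict(Counter)
    for rel, x, y in tuples:
        if rel != "ancestor":
            # synonym and sibling are symmetric, so normalize the argument order
            x, y = sorted((x, y))
        votes[frozenset((x, y))][(rel, x, y)] += 1
    chosen = [counter.most_common(1)[0] for counter in votes.values()]
    return sorted(chosen, key=lambda item: -item[1])
```

Because ancestor is asymmetric, ancestor(X, Y) and ancestor(Y, X) are counted as competing candidates for the same pair rather than being merged.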

M4: The Hierarchy Growth Algorithm
Given a set of relations rel(X, Y) and their support (i.e., frequency), where rel ∈ {synonym, sibling, ancestor}, construct a hierarchy H in which the links are directional, indicating parent-to-child relations between concepts. H should have no unnamed nodes and no unnecessary, invalid, or redundant links. Specifically, unnecessary means that the relation is correct but does not contribute extra information to the hierarchy. We will define these characteristics when we introduce each step of the algorithm in detail. An overview of the algorithm is given below.
• Initialize H as empty;
• Scan the relations once, from the highest support to the lowest; for each relation:
-Check if the relation is invalid (see Figure 3).
-Grow the hierarchy H with this relation (see Figure 4).
-Remove redundant links when the relation is ancestor (see Figure 5).
• Narrow down ancestor relations to parent-to-child when the scan completes (see Figure 6).
Figure 6: Two scenarios in which NIL nodes can be eliminated when finalizing the hierarchy: when adding a new sibling relation into the hierarchy, and when post-processing descendant relations in the hierarchy.
We denote different sets of connected nodes given a concept node X as below (see Figure 3):
• P_X is the set of parent nodes of X: there is a direct link from every Z ∈ P_X to X;
• C_X is the set of child nodes of X: there is a direct link from X to every Z ∈ C_X;
• A_X is the set of ancestor nodes of X: there is at least one path but no direct link from every Z ∈ A_X to X;
• D_X is the set of descendant nodes of X: there is at least one path but no direct link from X to every Z ∈ D_X.
Check if a relation is invalid (Figure 3). Given a new relation synonym(X, Y), if there is already another relation between X and Y, such as ancestor (i.e., X ∈ D_Y or Y ∈ D_X) or sibling (i.e., P_X ∩ P_Y ≠ ∅), the new relation is invalid and is not added to H. Given sibling(X, Y), if X and Y already share at least one parent, we skip it; if there is already an ancestor relation between X and Y, the sibling relation is invalid. Given ancestor(X, Y), if there is already a path from X to Y (i.e., Y ∈ D_X), we skip it; if there is already a sibling relation (i.e., P_X ∩ P_Y ≠ ∅) or a descendant relation (i.e., X ∈ D_Y), the ancestor relation is invalid.
Grow the hierarchy H with a new relation (Figure 4). We sort valid relations by their frequencies. For synonym(X, Y), we merge nodes X and Y in H: if neither was in H, we create a new isolated node named "X, Y"; if one of them existed in H, we update the name of the existing node to "X, Y"; if both existed, we merge their ancestor nodes into the new ancestor set A_X ∪ A_Y, and we merge their descendant nodes into the new descendant set D_X ∪ D_Y.
For sibling(X, Y), if neither of the concepts existed, we create a "NIL" node as the parent node of both concept nodes; if only one of them existed (say X), we add Y as a child node of each parent node in P_X; if both existed, we merge their parent nodes as the parent nodes of each and eliminate the NILs.
For ancestor(X, Y), we add a descendant link from X to Y. When X and Y are in H, we eliminate the NILs and remove the redundant links.
When adding a new relation sibling(X, Y), we merge their parent nodes. If there is at least one non-NIL node in the set of parent nodes, we remove the NILs. When adding an ancestor node of either X or Y, if they share a NIL parent node, we remove the NIL node.
Remove redundant links when growing with ancestor(X, Y) (Figure 5). On the concept hierarchy, we allow only one path from an ancestor node to a descendant node. Therefore, when we add a new ancestor(X, Y), there are three situations in which a link is redundant. First, if there is already a path from X to Y, the new relation is redundant. For example, suppose that on H, A ("svm") is a descendant node of X ("classification") and Y ("ls-svm") is a descendant node of A ("svm"). Then a new relation ancestor("classification", "ls-svm") is inferable and therefore redundant. We do not add it to the hierarchy. In the other two situations, we add the new relation and remove the existing, redundant link from the hierarchy.
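The ancestor part of the growth step can be sketched as a small graph structure. This is a simplified illustration under stated assumptions, not the HiGrowth implementation: synonym merging, sibling relations, and NIL nodes are omitted entirely; the class and method names are hypothetical; and because the text above spells out only the first redundancy situation, the sketch removes every existing direct link that the new link makes inferable, which is an assumed generalization of the remaining situations.

```python
from collections import defaultdict


class Hierarchy:
    """Directed acyclic hierarchy grown from ancestor(X, Y) relations only."""

    def __init__(self):
        self.children = defaultdict(set)  # direct links X -> {Y}
        self.parents = defaultdict(set)   # reverse index Y -> {X}

    def _reach(self, start, index):
        # All nodes reachable from `start` by following `index` one or more steps.
        seen, stack = set(), [start]
        while stack:
            for nxt in index.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def below(self, x):
        return self._reach(x, self.children)  # children and deeper descendants

    def above(self, x):
        return self._reach(x, self.parents)   # parents and higher ancestors

    def add_ancestor(self, x, y):
        if x == y or x in self.below(y):
            return "invalid"    # Y is already above X; a link X -> Y would create a cycle
        if y in self.below(x):
            return "redundant"  # a path from X to Y already exists; skip the new link
        uppers = self.above(x) | {x}
        lowers = self.below(y) | {y}
        for a in uppers:
            # Any direct link a -> d becomes inferable via a ... X -> Y ... d.
            for d in self.children.get(a, set()) & lowers:
                self.children[a].discard(d)
                self.parents[d].discard(a)
        self.children[x].add(y)
        self.parents[y].add(x)
        return "added"


def grow(relations):
    """Scan (x, y, support) ancestor tuples once, from highest to lowest support."""
    h = Hierarchy()
    for x, y, _support in sorted(relations, key=lambda r: -r[2]):
        h.add_ancestor(x, y)
    return h
```

For example, if ancestor("machine learning", "svm") and ancestor("machine learning", "classification") are added first and ancestor("classification", "svm") arrives later with lower support, the direct link from "machine learning" to "svm" is removed, leaving a single path through "classification".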

Experiments
We conduct experiments to answer three questions: (1) Are the three NLP modules effective in extracting hierarchical relations? (2) Does the hierarchy growth algorithm generate a hierarchy of better quality than existing methods? Are NIL nodes and redundant link removal necessary? (3) What does the resulting hierarchy look like?

Results on Three IE Components
M1: Scientific concept extraction. Table 2 shows examples of data science concepts the tools extracted. The learning module in AutoPhrase can segment words and phrases with good statistical features such as high frequency. There is often no ambiguity when we lowercase them, but the phrases tend to be short. SCHBase has a different philosophy: it looks for abbreviation expansions that can be long and of very low frequency. We show some case studies in Table 2. For AutoPhrase, some high-quality 1-gram and n-gram phrases are shown in Table 2a. For SCHBase, some selected acronyms and typical abbreviation expansions are shown in Table 2b. With these two complementary tools, we harvest a collection of 215 data science concepts.
M2: Concept typing. Table 3 shows that the accuracy of concept typing (a 4-class classification task) is 0.874. Table 4a gives two of MajVot's 27 false predictions. We observe that some synonym/sibling concept names like "topic model" and "topic models" have inconsistent predicted types due to the sparsity of their neighboring words. Therefore, we leverage the synonym/sibling relations discovered in the next subsection to group the related concept names together and determine their type based on the neighboring words of all the concepts in the group (called MajVot+Grouping). The accuracy improves significantly to 0.963. Table 4b shows three of the 8 false cases among 215 predictions. Table 5 shows the number of concepts of each type we have for hierarchy induction.
M3: Hierarchical relation extraction. Table 6 gives the number of relation tuples we extracted for each type. The synonym relation has the highest number of extractions, while sibling gives the most unique concept pairs.
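The MajVot+Grouping idea described under M2 can be sketched as pooling votes within groups of related concept names. The sketch assumes that per-concept vote counts are already available as `Counter` objects and that related pairs come from the extracted synonym/sibling relations; the union-find grouping and the function names are choices made for this illustration.

```python
from collections import Counter


def group_concepts(concepts, related_pairs):
    """Map each concept to a group representative using union-find."""
    parent = {c: c for c in concepts}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in related_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return {c: find(c) for c in concepts}


def majvot_grouping(votes, related_pairs):
    """votes: concept -> Counter of type votes. Returns concept -> predicted type."""
    root = group_concepts(votes, related_pairs)
    pooled = {}
    for concept, counter in votes.items():
        pooled.setdefault(root[concept], Counter()).update(counter)
    return {
        c: (pooled[root[c]].most_common(1)[0][0] if pooled[root[c]] else None)
        for c in votes
    }
```

In this sketch, a sparsely observed name such as "topic models" inherits the majority type of its better-observed synonym "topic model" because both share one pooled vote count.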

Results on Hierarchy Quality Evaluation
Evaluation metrics. Based on the manually labelled parent-to-child relations, we evaluate the quality of the resulting hierarchy with three standard IR metrics, precision, recall, and F1 score, on extracting concept pairs that have a 0-hop path (i.e., synonyms), a 1-hop path (i.e., "parent-to-child" relation), and a 2-hop path (i.e., ancestor relation as parent's parent). A higher score means better performance.
Baseline method. It is not fair to compare with taxonomy construction methods because we are targeting a different problem, namely generating a faceted concept hierarchy from three kinds of hierarchical relations. Therefore, we choose to compare with a hierarchy induction method called TAXI (Panchenko et al., 2016), and we feed it all the relations we mined so that we compare only the performance of the hierarchy induction algorithms. TAXI has no module that considers the sibling relations, whereas we have the "NIL" mechanism. TAXI goes through all the relations several times, removes cycles, and links disconnected components to the root, while we consider the relation weights and grow the hierarchy in a single scan. Therefore, compared with TAXI, HiGrowth is a more efficient algorithm for generating a faceted concept hierarchy.
Quality analysis. As shown in Table 7, HiGrowth consistently outperforms TAXI on all three kinds of paths: it improves synonym detection by 3.4%, parent relation extraction by 27.8%, and 2-hop ancestor relation extraction from 0.31 to 0.69. In fact, the HiGrowth variant with the generation and removal of "NIL" nodes disabled still outperforms TAXI because the hierarchy grows with relations from the most confident to the least confident. With the "NIL" nodes, HiGrowth improves the 1-hop relation by 18.3% and the 2-hop relation by 49.6%. This shows that it is important to carefully consider the sibling relations.
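The pair-level metrics described under "Evaluation metrics" can be sketched as follows. The example assumes the hierarchy is stored as a mapping from each node to its set of children and that the ground truth is a set of (ancestor, descendant) pairs; both representations and the function names are assumptions, and 0-hop (synonym) pairs are not covered.

```python
def k_hop_pairs(children, k):
    """All (ancestor, descendant) pairs connected by a path of exactly k links (k >= 1)."""
    pairs = {(x, c) for x, kids in children.items() for c in kids}
    for _ in range(k - 1):
        pairs = {(x, g) for x, c in pairs for g in children.get(c, ())}
    return pairs


def precision_recall_f1(predicted, gold):
    """Standard precision, recall, and F1 over two sets of concept pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A hierarchy whose links are all correct but which misses some labelled pairs gets a precision of 1.0 and a lower recall, which is the pattern of high precision and moderate F1 reported for HiGrowth.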
Figure 7 presents redundant links that HiGrowth skipped or removed when adding a new relation ancestor(X, Y), for each of the three situations. The most common situation is that we have ancestor(A, X) and ancestor(A, Y) in the hierarchy, and a new link now specifies the relation between X and Y, two descendants of A. If X is an ancestor of Y, we remove the redundant link ancestor(A, Y). Figure 7 shows a few examples of the 93 redundant relations. A is a more general (ancestor-level) concept. The frequency of A is often higher than the frequency of X or Y, and the weights of ancestor(A, X) and ancestor(A, Y) are larger than the weight of ancestor(X, Y). The latter relation is therefore added to the hierarchy after the other two are already on it.
Figure 7: The redundant links that the HiGrowth algorithm removed during hierarchy construction.
Figure 8 presents the concept hierarchy that HiGrowth extracted from the Data Science publications. The hierarchy is not very large but still does not fit legibly on one page, so we enlarge three parts of it, including (1) a set of concepts as the "measures" facet of "binary classification," (2) the "applications" and "algorithms" facets of the concept "classification," and (3) the "algorithms" of "community detection," the "techniques" of "matrix factorization," and the "methods" of "feature extraction" and "dimensionality reduction." We represent synonym relations by listing the different surface names of the same entity in one node. For example, "topic models" and "topic model" are merged into one node in Figure 8 because they have the same meaning.

Scientific Concept Extraction
Scientific concept extraction is a fundamental task (Yu et al., 2019). It has been widely studied on multiple kinds of text sources such as web documents (Parameswaran et al., 2010), business documents (Ménard and Ratté, 2016), clinical documents (Jonnalagadda et al., 2012), material science documents (Kim et al., 2017), and computer science publications (Upadhyay et al., 2018). Phrase mining technologies have evolved from noun phrase analysis (Evans and Zhai, 1996) to the recently popular representation learning methods (Mikolov et al., 2013; Pennington et al., 2014). Here we combine two methodologies that have been demonstrated to be effective in Science IE (Gábor et al., 2018).

Hierarchical Relation Extraction
There have been unsupervised methods for hypernym discovery and synonym detection (Weeds et al., 2014). In this work, we combine precise textual patterns, not only syntactic patterns (Snow et al., 2005) but also typed patterns (Nakashole et al., 2012; Wang et al., 2019), to find synonyms and hypernyms. We carefully treat hypernyms as ancestor-to-descendant instead of parent-to-child relations. Synonyms are on the same node, and hypernyms are connected via one- or multi-hop paths. Moreover, we extract the sibling relations, which precisely describe the nodes on the same level. All three types of relation tuples are important for inferring concept hierarchies.

Hierarchy Construction and Population
There are two kinds of hierarchy construction methods: one is taxonomy or ontology induction, which infers "isA" relations with machine learning models (Kozareva and Hovy, 2010; Yang et al., 2015; Cimiano and Staab, 2005), and the other is topical hierarchy discovery, which organizes phrases into topical groups and then infers hierarchical connections between the topical groups (Wang et al., 2015; Jiang et al., 2017). For the first kind of approach, researchers used syntactic contextual evidence (Anh et al., 2014), belief propagation for population (Bansal et al., 2014), and embedding-based inference (Fu et al., 2014; Nguyen et al., 2014). For the second kind, Poincaré embedding and ontology embedding methods have been proposed to learn node representations from existing hierarchies (Nickel and Kiela, 2017). None of the existing approaches aimed at inferring parent-to-child relations based on the three types of hierarchical relations (i.e., synonym, ancestor-to-descendant, and sibling). We propose a novel hierarchy growth algorithm that addresses the issues of noisy, redundant, and invalid links.

Conclusions
This paper presented the HiGrowth method, which constructs a faceted concept hierarchy from literature data. The major focus is on growing a hierarchy from three kinds of hierarchical relations that were extracted by pattern-based IE and weighted by their frequency. The hierarchy growth algorithm handles unnecessary, invalid, and redundant links, even when the relation set is noisy at the long tail.
Figure 8: The resulting faceted concept hierarchy we extracted from Data Science publications. Nodes represent entities with their different surface names (synonyms).