Multilingual Knowledge Graph Completion via Ensemble Knowledge Transfer

Predicting missing facts in a knowledge graph (KG) is a crucial task in knowledge base construction and reasoning, and it has been the subject of much research in recent work using KG embeddings. While existing KG embedding approaches mainly learn and predict facts within a single KG, a more plausible solution would benefit from the knowledge in multiple language-specific KGs, considering that different KGs have their own strengths and limitations on data quality and coverage. This is quite challenging since the transfer of knowledge among multiple independently maintained KGs is often hindered by the insufficiency of alignment information and the inconsistency of described facts. In this paper, we propose KEnS, a novel framework for embedding learning and ensemble knowledge transfer across a number of language-specific KGs. KEnS embeds all KGs in a shared embedding space, where the association of entities is captured based on self-learning. Then, KEnS performs ensemble inference to combine prediction results from multiple language-specific embeddings, for which multiple ensemble techniques are investigated. Experiments on five real-world language-specific KGs show that, by effectively identifying and leveraging complementary knowledge, KEnS consistently improves state-of-the-art methods on KG completion.


Introduction
Knowledge graphs (KGs) store structured representations of real-world entities and relations, constituting actionable knowledge that is crucial to various knowledge-driven applications (Koncel-Kedziorski et al., 2019; Chen et al., 2018a; Bordes et al., 2014). Recently, extensive efforts have been invested in KG embedding models, which encode entities as low-dimensional vectors and capture relations as algebraic operations on entity vectors. These models provide a beneficial tool to complete KGs by discovering previously unknown knowledge from latent representations of observed facts. Representative models, including translational models (Bordes et al., 2013; Wang et al., 2014) and bilinear models (Yang et al., 2015; Trouillon et al., 2016), have achieved satisfactory performance in predicting missing facts.
Existing methods mainly investigate KG completion within a single monolingual KG. As different language-specific KGs have their own strengths and limitations on data quality and coverage, we investigate a more natural solution, which seeks to combine embedding models of multiple KGs in an ensemble-like manner. This approach offers several potential benefits. First, embedding models of well-populated KGs (e.g. English KGs) are expected to capture richer knowledge because of better data quality and denser graph structures. Therefore, they would provide ampler signals to facilitate inferring missing facts on sparser KGs. Second, combining the embeddings allows exchanging complementary knowledge across different language-specific KGs. This provides a versatile way of leveraging specific knowledge that is better known in some KGs than in others. For example, consider the facts about the oldest Japanese novel The Tale of Genji. English DBpedia (Lehmann et al., 2015) only records its genre as Monogatari (story), whereas Japanese DBpedia identifies more genres, including Love Story, Royal Family Related Story, Monogatari and Literature-Novel. Similarly, it is reasonable to expect a Japanese KG embedding model to offer significant advantages in inferring knowledge about other Japanese cultural entities such as Nintendo and Mount Fuji. Moreover, ensemble inference provides a mechanism to assess the credibility of different knowledge sources and thus leads to more reliable predictions.

Despite the potential benefits, combining predictions from multiple KG embeddings represents a non-trivial technical challenge. On the one hand, knowledge transfer across different embeddings is hindered by the lack of reliable alignment information that bridges different KGs. Recent works on multilingual KG embeddings provide support for automated entity matching (Chen et al., 2017, 2018b; Sun et al., 2018, 2020a).
However, the performance of state-of-the-art (SOTA) entity matching methods is still far from perfect (Sun et al., 2020a), which may cause erroneous knowledge transfer between two KGs. On the other hand, independently extracted and maintained language-specific KGs may describe some facts inconsistently, therefore causing different KG embeddings to give inconsistent predictions and raising the challenge of identifying the trustable sources. For instance, while the English DBpedia strictly distinguishes the network of a TV series (e.g. BBC) from its channel (e.g. BBC One) with two separate relations, i.e., network and channel, the Greek DBpedia only uses channel to represent all of those. Another example of inconsistent information is that Chinese DBpedia labels the birth place of the ancient Chinese poet Li Bai as Sichuan, China, which is mistakenly recorded as Chuy, Kyrgyzstan in English DBpedia. Due to the rather independent extraction process of each KG, such inconsistencies are inevitable, calling for a reliable approach to identify credible knowledge among various sources.
In this paper, we propose KEnS (Knowledge Ensemble), which, to the best of our knowledge, is the first ensemble framework of KG embedding models. Fig. 1 depicts the ensemble inference process of KEnS. KEnS seeks to improve KG completion in a multilingual setting, by combining predictions from embedding models of multiple language-specific KGs and identifying the most probable answers from those prediction results that are not necessarily consistent. Experiments on five real-world language-specific KGs show that KEnS significantly improves SOTA fact prediction methods that solely rely on a single KG embedding. We also provide detailed case studies to interpret how a sparse, low-resource KG can benefit from embeddings of other KGs, and how exclusive knowledge in one KG can be broadcast to others.

Related Work
We hereby discuss three lines of work that are closely related to this topic.

Monolingual KG Embeddings. Monolingual KG embedding models embed entities and relations in a low-dimensional vector space and measure triple plausibility using these vectors. Translational models assess the plausibility of a triple (h, r, t) by the distance between two entity vectors h and t, after applying a relation-specific translation vector r. Representative models include TransE (Bordes et al., 2013) and its extensions such as TransD (Ji et al., 2015). Despite their simplicity, translational models achieve satisfactory performance on KG completion and are robust against the sparsity of data (Hao et al., 2019). RotatE (Sun et al., 2019b) employs a complex embedding space and models the relation r as a rotation instead of a translation of the complex vector h toward t, which leads to SOTA performance on KG embedding. There are also various methods falling into the groups of bilinear models, such as RESCAL (Nickel et al., 2011) and DistMult (Yang et al., 2015), as well as neural models like HolE (Nickel et al., 2016) and ConvE (Dettmers et al., 2018). Due to the large body of work in this line of research, we only provide a highly selective summary here. Interested readers are referred to recent surveys (Wang et al., 2017; Ji et al., 2020) for more information.
Multilingual KG Embeddings. Recent studies have extended embedding models to bridge multiple KGs, typically KGs of multiple languages. MTransE (Chen et al., 2017) jointly learns a transformation across two separate translational embedding spaces along with the KG structures. BootEA (Sun et al., 2018) introduces a bootstrapping approach that iteratively proposes new alignment labels to enhance the performance. MuGNN (Cao et al., 2019) encodes KGs via a multi-channel Graph Neural Network to reconcile structural differences. Some others also leverage side information to enhance the alignment performance, including entity descriptions (Chen et al., 2018b), attributes (Trisedya et al., 2019; Sun et al., 2017; Yang et al., 2019), neighborhood information (Wang et al., 2018; Yang et al., 2015; Sun et al., 2019a, 2020a) and degree centrality measures (Pei et al., 2019). A systematic summary of relevant approaches is given in a recent survey by Sun et al. (2020b). Although these approaches focus on KG alignment, which is different from the problem we tackle here, such techniques can be leveraged to support entity matching between KGs, which is a key component of our framework.
Ensemble methods. Ensemble learning has been widely used to improve machine learning results by combining multiple models on the same task. Representative approaches include voting, bagging (Breiman, 1996), stacking (Wolpert, 1992) and boosting (Freund and Schapire, 1997). Boosting methods seek to combine multiple weak models into a single strong model, particularly by learning model weights from the sample distribution.
Representative methods include AdaBoost (Freund and Schapire, 1997) and RankBoost (Freund et al., 2004), which target classification and ranking, respectively. AdaBoost starts with a pool of weak classifiers and iteratively selects the best one based on the sample weights in that iteration. The final classifier is a linear combination of the selected weak classifiers, where each classifier is weighted by its performance. In each iteration, sample weights are updated according to the selected classifier so that the subsequent classifiers will focus more on the hard samples. RankBoost extends AdaBoost to ranking model combination. The model weights are learned from the ranking performance in a boosting manner. In this paper, we extend RankBoost to combine ranking results from multiple KG embedding models. This technique addresses KG completion by combining knowledge from multiple sources and effectively compensates for the inherent errors in any entity matching processes.

Method
In this section, we introduce KEnS, an embedding-based ensemble inference framework for multilingual KG completion.
KEnS conducts two processes, i.e. embedding learning and ensemble inference. The embedding learning process trains the knowledge model, which encodes entities and relations of every KG in a shared embedding space, as well as the alignment model, which captures the correspondence between different KGs and enables the projection of queries and answers across different KG embeddings. The ensemble inference process combines the predictions from multiple KG embeddings to improve fact prediction. Particularly, to assess the confidence of predictions from each source, we introduce a boosting method to learn entity-specific weights for knowledge models.

Preliminaries
A KG G consists of a set of (relational) facts {(h, r, t)}, where h and t are the head and tail entities of the fact (h, r, t), and r is a relation. Specifically, h, t ∈ E (the set of entities in G), and r ∈ R (the set of relations). To cope with KG completion, the fact prediction task seeks to fill in the right entity for the missing head or tail of an unseen triple. Without loss of generality, we hereafter discuss the case of predicting missing tails. We refer to a triple with a missing tail as a query q = (h, r, ?t). The answer set Ω(q) consists of all the right entities that fulfill q. For example, we may have a query (The Tale of Genji, genre, ?t), and its answer set will include Monogatari, Love Story, etc.
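For illustration, a query's answer set can be materialized from a fact set as follows (a minimal sketch; the entity and relation names follow the example above):

```python
from collections import defaultdict

def build_answer_sets(facts):
    """Map each tail query (h, r, ?t) to its answer set Omega(q)."""
    omega = defaultdict(set)
    for h, r, t in facts:
        omega[(h, r)].add(t)
    return omega

facts = [
    ("The Tale of Genji", "genre", "Monogatari"),
    ("The Tale of Genji", "genre", "Love Story"),
    ("Nintendo", "industry", "Video games"),
]
omega = build_answer_sets(facts)
print(sorted(omega[("The Tale of Genji", "genre")]))  # ['Love Story', 'Monogatari']
```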
Given KGs in M languages G_1, G_2, ..., G_M (|E_i| ≤ |E_j| for i < j), we seek to perform fact prediction on each of them by transferring knowledge from the others. We consider fact prediction as a ranking task in the KG embedding space, which is to transfer the query to external KGs and to combine predictions from multiple embedding models into a final ranking list. Particularly, given the existing situation of the major KGs, we use the following settings: (i) entity alignment information is available between any two KGs, though limited; and (ii) relations in different language-specific KGs are represented with a unified schema. The reason for the latter assumption is that unifying relations is usually feasible, since the number of relations is often much smaller than the enormous number of entities in KGs. This has been de facto achieved in a number of influential knowledge bases, including DBpedia (Lehmann et al., 2015), Wikidata (Vrandečić and Krötzsch, 2014) and YAGO (Rebele et al., 2016). In contrast, KGs often consist of numerous entities that cannot be easily aligned, and entity alignment is available only in small amounts.

Embedding Learning
The embedding learning process jointly trains the knowledge model and the alignment model following Chen et al. (2017), while self-learning is added to improve the alignment learning. The details are described below.

Knowledge model. A knowledge model seeks to encode the facts of a KG in the embedding space. For each language-specific KG, it characterizes the plausibility of its facts. Notation-wise, we use boldfaced h, r, t as embedding vectors for head h, relation r and tail t respectively. The learning objective is to minimize the following margin ranking loss:

J_K^G = Σ_{(h,r,t)∈G} [γ − f(h, r, t) + f(h', r, t')]_+    (1)

where [·]_+ = max(·, 0), and f is a model-specific triple scoring function; a higher score indicates a higher likelihood that the fact is true. γ is a hyperparameter, and (h', r, t') is a negative sampled triple obtained by randomly corrupting either the head or the tail of a true triple (h, r, t).
We here consider two representative triple scoring techniques: TransE (Bordes et al., 2013) and RotatE (Sun et al., 2019b). TransE models relations as translations between head entities and tail entities in a Euclidean space, while RotatE models relations as rotations in a complex space. The triple scoring functions are defined as follows.
f_TransE(h, r, t) = −‖h + r − t‖_2    (2)
f_RotatE(h, r, t) = −‖h ∘ r − t‖_2    (3)

where ∘ denotes the Hadamard product for complex vectors, and ‖·‖_2 denotes the L2 norm.

Alignment model. An alignment model is trained to match entity counterparts between two KGs on the basis of a small amount of seed entity alignment. We embed all KGs in one vector space and make each pair of aligned entities embedded closely. Given two KGs G_i and G_j with |E_i| ≤ |E_j|, the alignment model loss is defined as:

J_A^{G_i↔G_j} = Σ_{(e_i, e_j)∈Γ_{G_i↔G_j}} ‖e_i − e_j‖_2    (4)

where e_i ∈ E_i, e_j ∈ E_j and Γ_{G_i↔G_j} is the set of seed entity alignment between G_i and G_j. Assuming the potential inaccuracy of alignment, we do not directly assign the same vector to aligned entities of different language-specific KGs. Particularly, as the seed entity alignment is provided in small amounts, the alignment process conducts self-learning, where training iterations incrementally propose more training data on unaligned entities to guide subsequent iterations. At each iteration, if a pair of unaligned entities in two KGs are mutual nearest neighbors according to the CSLS measure (Conneau et al., 2018), KEnS adds this highly confident alignment to the training data.

Learning objective. We conduct joint training of knowledge models for multiple KGs and alignment models between each pair of them via minimizing the following loss function:

J = Σ_{m=1}^{M} J_K^{G_m} + λ Σ_{i<j} J_A^{G_i↔G_j}    (5)

where J_K^{G_m} is the loss of the knowledge model on G_m as defined in Eq. (1), and J_A^{G_i↔G_j} is the alignment loss between G_i and G_j. λ is a positive hyperparameter that weights the two model components.
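For concreteness, the triple scoring functions and the margin loss of Eq. (1) can be sketched as follows (a simplified single-triple version with illustrative vectors, assuming the standard TransE and RotatE formulations):

```python
import numpy as np

def f_transe(h, r, t):
    """TransE: negative translation distance -||h + r - t||_2."""
    return -np.linalg.norm(h + r - t)

def f_rotate(h, r, t):
    """RotatE: negative rotation distance -||h o r - t||_2 in complex space,
    where o is the element-wise (Hadamard) product."""
    return -np.linalg.norm(h * r - t)

def margin_loss(f, pos, neg, gamma=0.3):
    """Single-triple margin ranking loss [gamma - f(pos) + f(neg)]_+ of Eq. (1)."""
    return max(gamma - f(*pos) + f(*neg), 0.0)

h, r, t = np.zeros(2), np.ones(2), np.ones(2)
print(f_transe(h, r, t) == 0.0)  # True: a perfectly plausible triple scores 0
print(margin_loss(f_transe, (h, r, t), (h, r, np.zeros(2))))  # 0.0 (margin satisfied)
```

A true triple whose embeddings exactly satisfy the relation incurs zero loss once the corrupted triple scores below it by at least the margin γ.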
Following Chen et al. (2017), instead of directly optimizing J in Eq. (5), our implementation optimizes each J_K^{G_m} and each λ J_A^{G_i↔G_j} alternately in separate batches. In addition, we enforce L2-regularization to prevent overfitting.
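The self-learning step above proposes new alignment pairs when two unaligned entities are mutual nearest neighbors under the CSLS measure. A minimal sketch (the embedding matrices and neighborhood size k are illustrative):

```python
import numpy as np

def csls_scores(src, tgt, k=2):
    """CSLS similarity (Conneau et al., 2018): 2*cos(x, y) minus each vector's
    mean cosine similarity to its k nearest cross-space neighbors (hub penalty)."""
    sn = src / np.linalg.norm(src, axis=1, keepdims=True)
    tn = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = sn @ tn.T
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * cos - r_src - r_tgt

def mutual_nearest_pairs(src, tgt, k=2):
    """Propose (i, j) alignment candidates that are mutual nearest neighbors
    under CSLS, mimicking one self-learning iteration."""
    s = csls_scores(src, tgt, k)
    fwd = s.argmax(axis=1)  # best target for each source entity
    bwd = s.argmax(axis=0)  # best source for each target entity
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]
```

Only pairs that select each other survive, which keeps the proposed alignments high-confidence.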

Ensemble Inference
We hereby introduce how KEnS performs fact prediction on multiple KGs via ensemble inference.

Cross-lingual query and knowledge transfer. To facilitate the process of completing KG G_i with the knowledge from another KG G_j, KEnS first predicts the alignment for entities between G_i and G_j. Then, it uses the alignment to transfer queries from G_i to G_j, and to transfer the results back. Specifically, alignment prediction is done by performing a kNN search in the embedding space for each entity in the smaller KG (i.e. the one with fewer entities) and finding the closest counterpart from the larger KG. Inevitably, some entities in the larger KG will not be matched with a counterpart due to the 1-to-1 constraint. In this case, we do not transfer queries and answers for that entity.

Weighted ensemble inference. We denote the embedding models of G_1, ..., G_M as f_1, ..., f_M. On the target KG where we seek to make predictions, given each query, the entity candidates are ranked by the weighted voting score of the models:

S(e) = Σ_{i=1}^{M} w_i(e) · N_i(e)    (6)

where e is an entity on the target KG, w_i(e) is an entity-specific model weight, and N_i(e) is 1 if e is ranked among the top K by f_i, otherwise 0. We propose three variants of KEnS that differ in the computation of w_i(e), namely KEnS_b, KEnS_v and KEnS_m. Specifically, KEnS_b learns an entity-specific weight w_i(e) for each entity in a boosting manner, KEnS_v fixes w_i(e) = 1 for all f_i and e (i.e. majority voting), and KEnS_m adopts the mean reciprocal rank (MRR) of f_i on the validation set of the target KG as w_i(e). We first present the technical details of the boosting-based KEnS_b.
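The weighted voting of Eq. (6) can be sketched as follows (the top-K candidate lists and weight functions are illustrative; each list stands for one model's predictions after transferring the query and mapping the answers back):

```python
def ensemble_rank(candidates, topk_lists, weights, K=10):
    """Rank candidates by the weighted voting score S(e) = sum_i w_i(e) * N_i(e),
    where N_i(e) = 1 iff model f_i places e in its top K."""
    def score(e):
        return sum(w(e) for topk, w in zip(topk_lists, weights) if e in topk[:K])
    return sorted(candidates, key=lambda e: -score(e))

# Two models nominate partially overlapping candidates; uniform weights here
# reduce to majority voting, while entity-specific weights realize KEnS_b.
order = ensemble_rank(["a", "b", "c"],
                      [["a", "b"], ["b", "c"]],
                      [lambda e: 1.0, lambda e: 1.0], K=2)
print(order[0])  # b (nominated by both models)
```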

Boosting Based Weight Learning
KEnS_b seeks to learn model weights for ranking combination, which aims at reinforcing correct beliefs and compensating for alignment errors. An embedding model that makes more accurate predictions should receive a higher weight. Inspired by RankBoost (Freund et al., 2004), we reduce the ranking combination problem to a classifier ensemble problem. KEnS_b therefore learns model weights in a similar manner as AdaBoost.

Validation queries and critical entity pairs. To compute entity-specific weights w_i(e), KEnS_b evaluates the performance of f_i on a set of validation queries related to e. These queries are converted from all the triples in the validation set that mention e. For example, the validation queries for the entity The Tale of Genji include (The Tale of Genji, genre, ?t). A critical entity pair of a query consists of a correct answer and an incorrect candidate; the pair is ranked in correct order if the correct answer is placed above the incorrect one.

Ranking loss. The overall objective of KEnS_b is to minimize the sum of ranks of all correct answers in the combined ranking list, i.e. Σ_q Σ_{e∈Ω(q)} r(e), where Ω(q) is the answer set of query q and r(e) is the rank of entity e in the combined ranking list of the ensemble inference. Essentially, the above objective is minimizing the number of mis-ordered critical entity pairs in the combined ranking list. Let P be the set of all the critical entity pairs from all the validation queries of an entity. Freund et al. (2004) have proved that, when using RankBoost, this ranking loss is bounded as follows:

Σ_q Σ_{e∈Ω(q)} r(e) ≤ |P| Π_{m=1}^{M} Z_m

where M is the number of KGs and therefore the maximum number of rounds in boosting, and Z_m is the weighted ranking loss of the m-th round:

Z_m = Σ_{p∈P} D_m(p) · exp(−w_m [p]_m)    (7)

where [p]_m = 1 if the critical entity pair p is ranked in correct order by the selected embedding model in the m-th round, otherwise [p]_m = −1, D_m(p) is the weight of the critical entity pair p in the m-th round, and w_m is the weight of the chosen model in that round. Now the ranking combination problem is reduced to a common classifier ensemble problem.

Boosting procedure.
The boosting process alternately repeats two steps: (i) evaluate the ranking performance of the embedding models and choose the best one f_m according to the entity pair weight distribution in that round; (ii) update the entity pair weights to put more emphasis on the pairs which f_m ranks incorrectly. Entity pair weights are initialized uniformly over P as D_1(p) = 1/|P|, p ∈ P. In the m-th round (m = 1, 2, ..., M), KEnS_b chooses an embedding model f_m and sets its weight w_m, seeking to minimize the weighted ranking loss Z_m defined in Eq. (7). By simple calculus, when choosing the embedding model f_i as the model of the m-th round, w_i^m should be set as follows to minimize Z_m:

w_i^m = (1/2) ln((1 + r_i) / (1 − r_i)), where r_i = Σ_{p∈P} D_m(p) [p]_i    (8)

As we can see from Eq. (8), a higher w_i^m indicates better performance of f_i under the current entity pair weight distribution D_m. We select the best embedding model of the m-th round, f_m, based on the maximum weight w_m = max{w_1^m, ..., w_M^m}. After choosing the best model f_m at this iteration, we update the entity pair weight distribution to put more emphasis on what f_m ranked wrong. The new weight distribution D_{m+1} is computed as:

D_{m+1}(p) = D_m(p) exp(−w_m [p]_m) / Z_m    (9)

where Z_m works as a normalization factor. KEnS_b decreases the weight D(p) if the selected model ranks the entity pair in correct order and increases it otherwise. Thus, D(p) will tend to concentrate on the pairs whose relative ranking is hardest to determine.
For queries related to a specific entity, this process is able to recognize the embedding models that perform well on answering those queries and rectify the mistakes made in the previous iteration.
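As a concrete sketch of this boosting loop (the function name and the {+1, −1} pair-order encoding are illustrative; the weight formula follows the RankBoost/AdaBoost form of Eqs. (8) and (9)):

```python
import math

def boost_model_weights(pair_orders, M):
    """KEnS_b-style boosting over critical entity pairs.
    pair_orders[i][p] is +1 if model f_i orders pair p correctly, else -1.
    Returns the (model index, weight w_m) chosen in each of the M rounds."""
    P = len(pair_orders[0])
    D = [1.0 / P] * P  # uniform initial pair-weight distribution D_1
    chosen = []
    for _ in range(M):
        # r_i = sum_p D(p) * [p]_i; the model maximizing r_i also maximizes w_i
        r = [sum(D[p] * orders[p] for p in range(P)) for orders in pair_orders]
        i_best = max(range(len(r)), key=lambda i: r[i])
        r_best = min(max(r[i_best], -0.999), 0.999)  # clip for numerical stability
        w = 0.5 * math.log((1 + r_best) / (1 - r_best))  # Eq. (8)
        chosen.append((i_best, w))
        # Eq. (9): emphasize pairs the chosen model ranked incorrectly
        D = [D[p] * math.exp(-w * pair_orders[i_best][p]) for p in range(P)]
        Z = sum(D)
        D = [d / Z for d in D]
    return chosen

# Model 0 orders pairs 0 and 1 correctly; model 1 orders pairs 0 and 2 correctly.
chosen = boost_model_weights([[1, 1, -1], [1, -1, 1]], M=2)
print([i for i, _ in chosen])  # [0, 1]
```

After model 0 is picked in round one, the re-weighted distribution concentrates on the pair it mis-ordered, so model 1 wins round two.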

Other Ensemble Techniques
We also investigate two other model variants with simpler ensemble techniques.

Majority vote (KEnS_v): A straightforward ensemble method is to re-rank entities by their nomination counts in the predictions of all knowledge models, which substitutes the voting score (Eq. 6) with

S_v(e) = Σ_{i=1}^{M} N_i(e)

where N_i(e) is 1 if e is ranked among the top K by the knowledge model f_i, otherwise 0. When there is a tie, we order by the MRR given by the models on the validation set.

MRR weighting (KEnS_m): MRR is a widely-used metric for evaluating the ranking performance of a model (Bordes et al., 2013; Yang et al., 2015; Trouillon et al., 2016), which may also serve as a weight metric for estimating the prediction confidence of each language-specific embedding in ensemble inference (Shen et al., 2017). Let the MRR of f_i on the validation set be u_i; the entities are then ranked according to the weighted voting score

S_m(e) = Σ_{i=1}^{M} u_i N_i(e).
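Both variants can be sketched as follows (the top-K lists and MRR values are illustrative):

```python
def majority_vote(candidates, topk_lists, K=10):
    """KEnS_v: rank entities by nomination count sum_i N_i(e)."""
    return sorted(candidates,
                  key=lambda e: -sum(e in topk[:K] for topk in topk_lists))

def mrr_weighted_vote(candidates, topk_lists, mrrs, K=10):
    """KEnS_m: weight each model's nomination by its validation MRR u_i."""
    return sorted(candidates,
                  key=lambda e: -sum(u for topk, u in zip(topk_lists, mrrs)
                                     if e in topk[:K]))

topk_lists = [["a"], ["b"], ["b"]]
print(majority_vote(["a", "b"], topk_lists, K=1)[0])                       # b (2 votes vs 1)
print(mrr_weighted_vote(["a", "b"], topk_lists, [0.9, 0.1, 0.1], K=1)[0])  # a (0.9 vs 0.2)
```

The example illustrates how the two variants can disagree: a single high-MRR model can outvote two low-MRR models under KEnS_m.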

Experiments
In this section, we conduct fact prediction experiments, comparing KEnS variants with various KG embedding baselines. We also provide detailed case studies to help understand the principle of ensemble knowledge transfer.

Experiment Settings
To the best of our knowledge, existing datasets for fact prediction contain only one monolingual KG or bilingual KGs. Hence, we prepared a new dataset DBP-5L, which contains five language-specific KGs extracted from English (EN), French (FR), Spanish (ES), Japanese (JA) and Greek (EL) DBpedia (Lehmann et al., 2015). Table 1 lists the statistics of the contributed dataset DBP-5L. The relations of the five KGs are represented in a unified schema, which is consistent with the problem definition in Section 3.1. The English KG is the most populated one among the five. To produce KGs with a relatively consistent set of entities, we induce the subgraphs by starting from a set of seed entities for which we have alignment among all language-specific KGs, and then incrementally collecting triples that involve other entities. Eventually, between any two KGs, the alignment information covers around 40% of entities. Based on the same set of seed entities, the Greek KG ends up with a notably smaller vocabulary and fewer triples than the other four. We split the facts in each KG into three parts: 60% for training, 30% for validation and weight learning, and 10% for testing.

Experimental setup. We use Adam (Kingma and Ba, 2014) as the optimizer and fine-tune the hyper-parameters by grid search based on Hits@1 on the validation set. We select among the following sets of hyper-parameter values: learning rate lr ∈ {0.01, 0.001, 0.0001}, dimension d ∈ {64, 128, 200, 300}, batch size b ∈ {256, 512, 1024}, and TransE margin γ ∈ {0.3, 0.5, 0.8}. The best setting is {lr = 0.001, d = 300, b = 256} for KEnS(TransE) and {lr = 0.01, d = 200, b = 512} for KEnS(RotatE). The margin for TransE is 0.3. The L2 regularization coefficient is fixed as 0.0001.

Evaluation protocol. For each test case (h, r, t), we consider it as a query (h, r, ?t) and retrieve the top K prediction results for ?t. We compare the proportion of queries with correct answers ranked within the top K retrieved entities.
We report three metrics with K as 1, 3, 10. Hits@1 is equivalent to accuracy. All three metrics are preferred to be higher. Although another common metric, Mean Reciprocal Rank (MRR), has been used in previous works (Bordes et al., 2013), it is not applicable to the evaluation of our framework, because our ensemble framework combines the top entity candidates from multiple knowledge models and yields top K final results without making any claims for entities outside this scope. Following previous works, we use the "filtered" setting, with the premise that the candidate space has excluded the triples that have been seen in the training set (Wang et al., 2014).

Competitive methods. We compare six variants of KEnS, generated by combining the two knowledge models and three ensemble inference techniques introduced in Section 3. For baseline methods, besides the single-embedding TransE (Bordes et al., 2013) and RotatE (Sun et al., 2019b), we also include DistMult (Yang et al., 2015), TransD (Ji et al., 2015), and HolE (Nickel et al., 2016). After extensive hyperparameter tuning, the baselines are set to their best configurations. We also include a baseline named RotatE+PARIS, which trains RotatE on the 5 KGs and uses the representative non-embedding symbolic entity alignment tool PARIS (Suchanek et al., 2011) for entity matching. PARIS delivered entity matching predictions for 58%-62% of entities in the English, French, and Spanish KGs, but almost no matches for entities in the Greek and Japanese KGs, since PARIS mainly relies on entity label similarity. The results on the Greek and Japanese KGs are thus omitted for RotatE+PARIS.
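The per-query evaluation under the "filtered" setting can be sketched as follows (a minimal version; function names are illustrative):

```python
def filtered_candidates(ranked, known_answers, target):
    """'Filtered' setting (Wang et al., 2014): remove other known correct
    answers from the candidate list so they do not crowd out the target."""
    return [e for e in ranked if e == target or e not in known_answers]

def hits_at_k(ranked, target, k):
    """Hits@K for one query: 1 if the target answer is among the top K."""
    return int(target in ranked[:k])

ranked = ["x", "y", "t"]  # raw ranking; "x" is another known correct answer
filt = filtered_candidates(ranked, {"x", "t"}, target="t")
print(filt)                                               # ['y', 't']
print(hits_at_k(filt, "t", 1), hits_at_k(filt, "t", 2))   # 0 1
```

Averaging `hits_at_k` over all test queries yields the reported Hits@1, Hits@3 and Hits@10.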

Main Results
The results are reported in Table 2. As shown, the ensemble methods of KEnS lead to consistent improvements in fact prediction. Overall, ensemble inference improves Hits@1 by 1.1%-13.0% over the best baseline methods. The improved accuracy shows that it is effective to leverage complementary knowledge from external KGs for KG completion. We also observe that KEnS brings larger gains on sparser KGs than on well-populated ones. Particularly, on the low-resource Greek KG, KEnS_b(RotatE) improves Hits@1 by as much as 13.0% over its single-KG counterpart. This finding corroborates our intuition that a KG with lower knowledge coverage and sparser graph structure benefits more from complementary knowledge.
Among the variants of ensemble methods, KEnS_m offers better performance than KEnS_v, and KEnS_b outperforms the other two in general. For example, on the Japanese KG, KEnS_v(TransE) improves Hits@1 by 3.5% over the single-KG TransE, while KEnS_m leads to a 5.0% increase, and KEnS_b further provides a 5.6% improvement. The results suggest that MRR is an effective measure of the trustworthiness of knowledge models during ensemble inference. Besides, KEnS_b is able to assess trustworthiness at a finer level of granularity by learning entity-specific model weights and can thus further improve the performance.
In summary, the promising results by KEnS variants show the effectiveness of transferring and leveraging cross-lingual knowledge for KG completion. Among the ensemble techniques, the boosting technique represents the most suitable one for combining the prediction results from different models.

Case Studies
In this section, we provide case studies to show how KEnS transfers cross-lingual knowledge to populate different KGs.

Model weights. The key to the significantly enhanced performance of KEnS_b is the effective combination of multilingual knowledge from multiple sources. Fig. 2 shows the average model weights learned by KEnS_b(TransE), which depict how external knowledge from cross-lingual KGs contributes to target KG completion in general. The model weights imply that sparser KGs benefit more from the knowledge transferred from others. Particularly, when predicting for the Greek KG, the weights of the other languages sum up to 81%. This observation indicates that the significant boost received on the Greek KG comes from the fact that it has accepted the most complementary knowledge from others. In contrast, when predicting on the most populated English KG, the other language-specific models receive a smaller total weight of 57%.
Among the three KEnS variants, the superiority of KEnS_b is attributed to its identification of more credible knowledge sources, which leads to more accurate predictions. For language-specific KGs, a higher level of credibility often stems from the cultural advantage the KG has over the entity. Fig. 3 presents the model weights for 6 culture-related entities learned by KEnS_b(TransE). It shows that KEnS can locate the language-specific knowledge model that has a cultural advantage and assign it a higher weight, which is the basis of an accurate ensemble prediction.

Ensemble inference. To help understand how the combination of multiple KGs improves KG completion, and to show the effectiveness of leveraging complementary culture-specific knowledge, we present a case study on predicting the fact (Nintendo, industry, ?t) for the English KG. Table 3 lists the top 3 predicted tails yielded by the KEnS(TransE) variants, along with those by the English knowledge model and the supporter knowledge models before ensemble. The predictions made by the Japanese KG are the closest to the ground truths. The reason may be that the Japanese KG has documented much richer knowledge about this Japanese video game company, including many of the video games the company has released. Among the three KEnS variants, KEnS_b correctly identifies Japanese as the most credible source and yields the best ranking.

Conclusion
In this paper, we have proposed a new ensemble inference framework that collaboratively predicts unseen facts using embeddings of different language-specific KGs. In the embedding space, our approach jointly captures both the structured knowledge of each KG and the entity alignment that bridges the KGs. Extensive experiments demonstrated the significant performance improvements delivered by our model on the task of KG completion. This work also suggests promising directions of future research. One is to exploit the potential of KEnS for completing low-resource KGs; another is to extend the ensemble transfer mechanism to populating sparse domain knowledge in biological (Hao et al., 2020) and medical knowledge bases (Zhang et al., 2020). Particularly, we also seek to ensure the global logical consistency of predicted facts in the ensemble process by incorporating probabilistic constraints.
Tianran Zhang, Muhao Chen, and Alex Bui. 2020. Diagnostic prediction with sequence-of-sets representation learning for clinical events. In Proceedings of the 18th International Conference on Artificial Intelligence in Medicine (AIME). Springer.