Neighborhood Matching Network for Entity Alignment

Structural heterogeneity between knowledge graphs is an outstanding challenge for entity alignment. This paper presents Neighborhood Matching Network (NMN), a novel entity alignment framework for tackling the structural heterogeneity challenge. NMN estimates the similarities between entities to capture both the topological structure and the neighborhood difference. It provides two innovative components for learning better representations for entity alignment: it first uses a novel graph sampling method to distill a discriminative neighborhood for each entity, and then adopts a cross-graph neighborhood matching module to jointly encode the neighborhood difference for a given entity pair. These strategies allow NMN to effectively construct matching-oriented entity representations while ignoring noisy neighbors that have a negative impact on the alignment task. Extensive experiments on three entity alignment datasets show that NMN estimates neighborhood similarity well even in tougher cases and significantly outperforms 12 previous state-of-the-art methods.


Introduction
By aligning entities from different knowledge graphs (KGs) to the same real-world identity, entity alignment is a powerful technique for knowledge integration. Unfortunately, entity alignment is nontrivial because real-life KGs are often incomplete and different KGs typically have heterogeneous schemas. Consequently, equivalent entities from two KGs could have distinct surface forms or dissimilar neighborhood structures.
In recent years, embedding-based methods have become the dominant approach for entity alignment (Zhu et al., 2017; Pei et al., 2019a; Cao et al., 2019; Li et al., 2019a; 2020). Such approaches have the advantage of not relying on manually constructed features or rules (Mahdisoltani et al., 2015). Using a set of seed alignments, an embedding-based method models the KG structures to automatically learn how to map the equivalent entities of different KGs into a unified vector space, where entity alignment can be performed by measuring the distance between the embeddings of two entities.
The vast majority of prior works in this direction build upon an important assumption: entities and their counterparts from other KGs have similar neighborhood structures, and therefore similar embeddings will be generated for equivalent entities. Unfortunately, the assumption does not always hold in real-life scenarios due to the incompleteness and heterogeneities of KGs. As an example, consider Figure 1 (a), which shows two equivalent entities from the Chinese and English versions of Wikipedia. Here, both central entities refer to the same real-world identity, Brooklyn, a borough of New York City. However, the two entities have different sizes of neighborhoods and distinct topological structures. The problem of dissimilar neighborhoods between equivalent entities is ubiquitous. Sun et al. (2020) report that the majority of equivalent entity pairs have different neighbors in the benchmark datasets DBP15K, and the proportions of such entity pairs are over 86% (up to 90%) across the different language versions of DBP15K. In particular, we find that the alignment accuracy of existing embedding-based methods decreases significantly as the gap between equivalent entities' neighborhood sizes increases. For instance, RDGCN (Wu et al., 2019a), a state-of-the-art method, delivers a Hits@1 accuracy of 59% on entity pairs whose number of neighbors differs by no more than 10 on DBP15K ZH−EN. However, its performance drops to 42% when the difference in the number of neighbors increases to 20, and to 35% when the difference rises above 30. The disparity in neighborhood sizes and topological structures poses a significant challenge for entity alignment methods.
Even if we were able to set aside the difference in neighborhood size, we would still face another issue. Since most common neighbors tend to be popular entities, they are neighbors of many other entities; as a result, it is still challenging to align such entities. To elaborate on this point, consider Figure 1 (b). Here, the two central entities (both referring to the city Liverpool) have similar sizes of neighborhoods and three common neighbors. However, the three common neighbors (indicating United Kingdom, England and Labour Party (UK), respectively) are not discriminative enough. This is because many city entities in England, e.g., the entity Birmingham, also have these three entities in their neighborhoods. For such entity pairs, in addition to common neighbors, other informative neighbors, like those closely contextually related to the central entities, must be considered. Because existing embedding-based methods are unable to choose the right neighbors, we need a better approach.
We present Neighborhood Matching Network (NMN), a novel sampling-based entity alignment framework. NMN aims to capture the most informative neighbors and accurately estimate the similarities of neighborhoods between entities in different KGs. NMN achieves these by leveraging recent developments in Graph Neural Networks (GNNs). It first utilizes Graph Convolutional Networks (GCNs) (Kipf and Welling, 2017) to model the topological connection information, and then selectively samples each entity's neighborhood, aiming to retain the most informative neighbors for entity alignment. One of the key challenges here is how to accurately estimate the similarity of any two entities' sampled neighborhoods. NMN addresses this challenge with a discriminative neighbor matching module that jointly computes the neighbor differences between the sampled subgraph pairs through a cross-graph attention mechanism. Note that we mainly focus on neighbor relevance in the neighborhood sampling and matching modules, while the neighbor connections are modeled by GCNs. We show that, by integrating the neighbor connection information and the neighbor relevance information, NMN can effectively align entities from real-world KGs with neighborhood heterogeneity.
We evaluate NMN by applying it to the benchmark datasets DBP15K and DWY100K (Sun et al., 2018), and to a sparse variant of DBP15K. Experimental results show that NMN achieves the best and most robust performance over the state-of-the-art. This paper makes the following technical contributions. It is the first to:
• employ a new graph sampling strategy for identifying the most informative neighbors towards entity alignment (Sec. 3.3).
• exploit a cross-graph attention-based matching mechanism to jointly compare discriminative subgraphs of two entities for robust entity alignment (Sec. 3.4).

Related Work
Embedding-based entity alignment. In recent years, embedding-based methods have emerged as viable means for entity alignment. Early works in the area utilize TransE (Bordes et al., 2013) to embed KG structures, including MTransE (Chen et al., 2017), JAPE, IPTransE (Zhu et al., 2017), BootEA (Sun et al., 2018), NAEA (Zhu et al., 2019) and OTEA (Pei et al., 2019b). Some more recent studies use GNNs to model the structures of KGs, including GCN-Align (Wang et al., 2018), GMNN, RDGCN (Wu et al., 2019a), AVR-GCN (Ye et al., 2019), and HGCN-JE (Wu et al., 2019b). Besides structural information, some recent methods like KDCoE (Chen et al., 2018), AttrE (Trisedya et al., 2019), MultiKE and HMAN (Yang et al., 2019) also utilize additional information, such as Wikipedia entity descriptions and attributes, to improve entity representations. However, all the aforementioned methods ignore the neighborhood heterogeneity of KGs. MuGNN (Cao et al., 2019) and AliNet (Sun et al., 2020) are the two most recent efforts to address this issue. While promising, both models still have drawbacks. MuGNN requires both pre-aligned entities and relations as training data, which incurs expensive labeling overhead. AliNet considers all one-hop neighbors of an entity to be equally important when aggregating information; however, not all one-hop neighbors contribute positively to characterizing the target entity, so considering all of them without careful selection can introduce noise and degrade performance. NMN avoids these pitfalls: with only a small set of pre-aligned entities as training data, NMN chooses the most informative neighbors for entity alignment.
Graph neural networks. GNNs have recently been employed for various NLP tasks like semantic role labeling and machine translation (Bastings et al., 2017). GNNs learn node representations by recursively aggregating the representations of neighboring nodes. There is a range of GNN variants, including the Graph Convolutional Network (GCN) (Kipf and Welling, 2017), the Relational Graph Convolutional Network (Schlichtkrull et al., 2018), and the Graph Attention Network (Veličković et al., 2018). Given their powerful capability for modeling graph structures, we also leverage GNNs to encode the structural information of KGs (Sec. 3.2).
Graph matching. The similarity of two graphs can be measured by exact matching (graph isomorphism) (Yan et al., 2004) or through structural information like the graph editing distance (Raymond et al., 2002). Most recently, the Graph Matching Network (GMN) (Li et al., 2019b) computes a similarity score between two graphs by jointly reasoning on the graph pair through cross-graph attention-based matching. Inspired by GMN, we design a cross-graph neighborhood matching module (Sec. 3.4) to capture the neighbor differences between two entities' neighborhoods.
Graph sampling. This technique samples a subset of vertices or edges from the original graph. Some of the popular sampling approaches include vertex-, edge-and traversal-based sampling (Hu and Lau, 2013). In our entity alignment framework, we propose a vertex sampling method to select informative neighbors and to construct a neighborhood subgraph for each entity.

Our Approach
Formally, we represent a KG as G = (E, R, T), where E, R and T denote the sets of entities, relations and triples, respectively. Without loss of generality, we consider the task of entity alignment between two KGs, G_1 and G_2, based on a set of pre-aligned equivalent entities. The goal is to find pairs of equivalent entities between G_1 and G_2.

Overview of NMN
As highlighted in Sec. 1, the neighborhood heterogeneity and noisy common neighbors of real-world KGs make it difficult to capture useful information for entity alignment. To tackle these challenges, NMN first leverages GCNs to model the neighborhood topology information. Next, it employs neighborhood sampling to select the more informative neighbors. Then, it utilizes a cross-graph matching module to capture neighbor differences.
As depicted in Figure 2, NMN takes as input two KGs, G_1 and G_2, and produces embeddings for each candidate pair of entities, e_1 and e_2, so that entity alignment can be performed by measuring the distance d(e_1, e_2) between the learned embeddings. It follows a four-stage processing pipeline: (1) KG structure embedding, (2) neighborhood sampling, (3) neighborhood matching, and (4) neighborhood aggregation for generating embeddings.

KG Structure Embedding
To learn the KG structure embeddings, NMN utilizes multi-layered GCNs to aggregate higher-degree neighboring structural information for entities.
NMN uses pre-trained word embeddings to initialize the GCN. This strategy has been shown to be effective in encoding the semantic information of entity names in prior work (Wu et al., 2019a). Formally, let G_1 = (E_1, R_1, T_1) and G_2 = (E_2, R_2, T_2) be the two KGs to be aligned; we feed G_1 and G_2 together as one big input graph to NMN. Each GCN layer takes a set of node features {h_1^{(l)}, ..., h_n^{(l)}} as input and updates the node representations as:

h_i^{(l+1)} = ReLU( \sum_{j \in N_i \cup \{i\}} (1 / \sqrt{d_i d_j}) h_j^{(l)} W^{(l)} ),  (1)

where N_i is the set of one-hop neighbor indices of node i, d_i is the degree of node i, and W^{(l)} is the layer-specific trainable weight matrix. To control the accumulated noise, we also introduce highway networks (Srivastava et al., 2015) into the GCN layers, which can effectively control noise propagation across GCN layers (Rahimi et al., 2018; Wu et al., 2019b).
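The GCN propagation with a highway gate can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the parameter names (W, W_gate, b_gate) and the exact symmetric normalization are illustrative, not taken from the paper's released code.

```python
import numpy as np

def gcn_highway_layer(H, A, W, W_gate, b_gate):
    """One GCN layer followed by a highway gate (illustrative sketch).

    H: (n, d) node features; A: (n, n) adjacency with self-loops added;
    W: (d, d) GCN weight; W_gate, b_gate: highway gate parameters.
    """
    # Symmetrically normalize the adjacency: D^{-1/2} A D^{-1/2}
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Standard GCN propagation with a ReLU non-linearity
    H_new = np.maximum(A_norm @ H @ W, 0.0)
    # Highway gate: mix the layer input and output to limit noise propagation
    gate = 1.0 / (1.0 + np.exp(-(H @ W_gate + b_gate)))  # sigmoid
    return gate * H_new + (1.0 - gate) * H
```

The highway gate lets each dimension of a node's representation interpolate between the new GCN output and the layer input, which is how noise accumulation across layers is damped.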

Neighborhood Sampling
The one-hop neighbors of an entity are key to determining whether the entity should be aligned with another. However, as discussed in Sec. 1, not all one-hop neighbors contribute positively to entity alignment. To choose the right neighbors, we apply a down-sampling process that selects, from an entity's one-hop neighbors, those most informative towards the central target entity.
Recall that we use pre-trained word embeddings of entity names to initialize the input node features of GCNs. As a result, the entity embeddings learned by GCNs contain rich contextual information for both the neighboring structures and the entity semantics. NMN exploits such information to sample informative neighbors, i.e., neighbors that are more contextually related to the central entity are more likely to be sampled. Our key insight is that the more often a neighbor and the central (or target) entity appear in the same context, the more representative and informative the neighbor is towards the central entity. Since the contexts of two equivalent entities in real-world corpora are usually similar, the stronger a neighbor is contextually related to the target entity, the more alignment clues the neighbor is likely to offer. Experimental results in Sec. 5.3 confirm this observation.
Formally, given an entity e_i, the probability of sampling its one-hop neighbor e_{i_j} is determined by:

p(h_{i_j} | h_i) = exp(h_i W_s h_{i_j}^T) / \sum_{k \in N_i} exp(h_i W_s h_{i_k}^T),  (2)

where N_i is the one-hop neighbor index set of the central entity e_i, h_i and h_{i_j} are the learned embeddings of entities e_i and e_{i_j} respectively, and W_s is a shared weight matrix. By selectively sampling one-hop neighbors, NMN essentially constructs a discriminative neighborhood subgraph for each entity, which enables more accurate alignment through neighborhood matching.
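The sampling step can be sketched in NumPy as below, assuming the bilinear scoring form h_i W_s h_{i_j}^T followed by a softmax over the one-hop neighbors; function names are illustrative.

```python
import numpy as np

def neighbor_sampling_probs(h_i, H_nbrs, W_s):
    """Softmax distribution over e_i's one-hop neighbors,
    mirroring p(h_{i_j} | h_i) ∝ exp(h_i W_s h_{i_j}^T)."""
    logits = h_i @ W_s @ H_nbrs.T   # one bilinear score per neighbor
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def sample_neighbors(h_i, H_nbrs, W_s, k, rng):
    """Draw k distinct neighbor indices to form the neighborhood subgraph."""
    p = neighbor_sampling_probs(h_i, H_nbrs, W_s)
    k = min(k, len(p))
    return rng.choice(len(p), size=k, replace=False, p=p)
```

Sampling without replacement yields a fixed-size, discriminative subgraph per entity, as the sampling stage requires.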

Neighborhood Matching
The neighborhood subgraph, produced by the sampling process, determines which neighbors of the target entity should be considered in the later stages. In other words, later stages of the NMN processing pipeline will only operate on neighbors within the subgraph. In the neighborhood matching stage, we wish to find out, for each candidate entity in the counterpart KG, which neighbors of that entity are closely related to a neighboring node within the subgraph of the target entity. Such information is essential for deciding whether two entities (from two KGs) should be aligned.
As discussed in Sec. 3.3, equivalent entities tend to have similar contexts in real-world corpora; therefore, their neighborhoods sampled by NMN should be more likely to be similar. NMN exploits this observation to estimate the similarities of the sampled neighborhoods.
Candidate selection. Intuitively, for an entity e_i in E_1, we need to compare its sampled neighborhood subgraph with the subgraph of each candidate entity in E_2 to select an optimal alignment. Exhaustively trying all entities of E_2 would be prohibitively expensive for large real-world KGs. To reduce the matching overhead, NMN takes a low-cost approximate approach: it first samples an alignment candidate set C_i = {c_{i_1}, c_{i_2}, ..., c_{i_t} | c_{i_k} ∈ E_2} for e_i in E_1, and then calculates the subgraph similarities between e_i and these candidates. This is based on the observation that entities in E_2 that are closer to e_i in the embedding space are more likely to be aligned with e_i. Thus, for an entity e_j in E_2, the probability that it is sampled as a candidate for e_i can be calculated as:

p(e_j | e_i) = exp(-||h_i - h_j||_1) / \sum_{e_k \in E_2} exp(-||h_i - h_k||_1).  (3)

Cross-graph neighborhood matching. Inspired by recent work in graph matching (Li et al., 2019b), our neighbor matching module takes a pair of subgraphs as input and computes a cross-graph matching vector for each neighbor, which measures how well this neighbor can be matched to any neighbor node in the counterpart. Formally, let (e_i, c_{i_k}) be an entity pair to be measured, where e_i ∈ E_1 and c_{i_k} ∈ E_2 is one of the candidates of e_i, and let p and q be neighbors of e_i and c_{i_k}, respectively. The cross-graph matching vector for neighbor p is computed as:

a_pq = exp(h_p · h_q) / \sum_{q' \in N^s_{i_k}} exp(h_p · h_{q'}),  (4)

m_p = \sum_{q \in N^s_{i_k}} a_pq (h_p - h_q),  (5)

where a_pq are the attention weights, m_p is the matching vector for p, which measures the difference between h_p and its closest neighbors in the other subgraph, N^s_{i_k} is the sampled neighbor set of c_{i_k}, and h_p and h_q are the GCN-output embeddings of p and q respectively.
Then, we concatenate neighbor p's GCN-output embedding with its weighted matching vector m_p:

ĥ_p = [h_p ∥ β m_p],  (6)

where β is a hyper-parameter weighting the matching vector. For each target neighbor in a neighborhood subgraph, the attention mechanism in the matching module can accurately detect which neighbor in the subgraph from the other KG is most likely to match it. Intuitively, the matching vector m_p captures the difference between the two closest neighbors. When the representations of the two neighbors are similar, the matching vector tends toward a zero vector, so their representations stay similar. When the neighbor representations differ, the matching vector is amplified through propagation. We find this matching strategy works well for our problem setting.
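The cross-graph matching step can be sketched as follows, assuming dot-product attention between the two sampled neighbor sets and the concatenation [h_p ; β m_p]; the value β = 0.1 mirrors the hyper-parameter reported in the experimental setup, and all names are illustrative.

```python
import numpy as np

def cross_graph_matching(H_p, H_q, beta=0.1):
    """For each neighbor p in one subgraph, attend over the neighbors q of
    the candidate's subgraph, compute the matching vector m_p, and return
    the concatenation [h_p ; beta * m_p] (a sketch of Eqs. 4-6)."""
    scores = H_p @ H_q.T                         # h_p · h_q similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)            # attention weights a_pq
    # m_p = sum_q a_pq (h_p - h_q): difference from best-matching neighbors
    M = H_p - a @ H_q
    return np.concatenate([H_p, beta * M], axis=1)
```

When a neighbor has an (almost) identical counterpart in the other subgraph, its matching vector collapses toward zero, which is exactly the "similar neighbors stay similar" behavior described above.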

Neighborhood Aggregation
In the neighborhood aggregation stage, we combine the neighborhood connection information (learned at the KG structure embedding stage) as well as the output of the matching stage (Sec. 3.4) to generate the final embeddings used for alignment.
Specifically, for entity e_i, we first aggregate its sampled neighbor representations {ĥ_p}. Inspired by the aggregation method in (Li et al., 2016), we compute a neighborhood representation for e_i as:

g_i = \sum_{p \in N^s_i} σ(ĥ_p W_gate) ⊙ (ĥ_p W_g),  (7)

where N^s_i is the sampled neighbor set of e_i, σ is the sigmoid function, ⊙ denotes element-wise multiplication, and W_gate and W_g are learned projection matrices. Then, we concatenate the central entity e_i's GCN-output representation h_i with its neighborhood representation to construct the matching-oriented representation for e_i:

h̄_i = [h_i ∥ g_i].  (8)
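The gated aggregation can be sketched as below. This assumes a gated sum in the style of Li et al. (2016); the projection matrices and function names are illustrative placeholders, not the paper's exact parameterization.

```python
import numpy as np

def aggregate_neighborhood(H_hat, W_gate, W_out):
    """Gated sum over the matched neighbor representations: each neighbor
    contributes through a learned sigmoid gate (sketch of Eq. 7)."""
    gate = 1.0 / (1.0 + np.exp(-(H_hat @ W_gate)))  # per-neighbor gates
    return (gate * (H_hat @ W_out)).sum(axis=0)     # neighborhood vector g_i

def matching_representation(h_i, g_i):
    """Final matching-oriented representation [h_i ; g_i] (Eq. 8)."""
    return np.concatenate([h_i, g_i])
```

Note that the projections also let the neighborhood vector g_i live in a smaller space (50 dimensions in the reported configuration) than the 300-dimensional GCN outputs.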

Entity Alignment and Training
Pre-training. As discussed in Sec. 3.3, our neighborhood sampling is based on the GCN-output entity embeddings. Therefore, we first pre-train the GCN-based KG embedding model to produce quality entity representations. Specifically, we measure the distance between two entities to determine whether they should be aligned:

d(e_1, e_2) = ||h_{e_1} - h_{e_2}||_1.  (9)

The objective of the pre-trained model is:

L_p = \sum_{(r,t) \in L} \sum_{(r',t') \in L'} max(0, d(r, t) - d(r', t') + γ),  (10)

where γ > 0 is a margin hyper-parameter, L is our set of alignment seeds, and L' is the set of negative aligned entity pairs generated by nearest-neighbor sampling (Kotnis and Nastase, 2017).
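The margin-based objective can be sketched directly. This minimal version uses L1 distances over embedding matrices and loops over seed and negative pairs; it is an illustrative reading of Eq. 10, not the authors' implementation.

```python
import numpy as np

def margin_loss(H1, H2, pos_pairs, neg_pairs, gamma=1.0):
    """Margin-based ranking loss (sketch of Eq. 10): pull seed pairs
    together while pushing negative pairs at least gamma further apart."""
    def d(i, j):
        return np.abs(H1[i] - H2[j]).sum()  # L1 distance between entities
    loss = 0.0
    for (i, j) in pos_pairs:
        for (i2, j2) in neg_pairs:
            loss += max(0.0, d(i, j) + gamma - d(i2, j2))
    return loss
```

The same loss shape is reused after pre-training, only with the candidate-based negative set C in place of the nearest-neighbor negatives.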
Overall training objective. The pre-training phase terminates once the entity alignment performance converges. We find that after this stage, the entity representations given by the GCN are sufficient to support the neighborhood sampling and matching modules. Hence, after pre-training we replace the loss function of NMN with:

L = \sum_{(r,t) \in L} \sum_{(r',t') \in C} max(0, d(r, t) - d(r', t') + γ),  (11)

where the negative alignment set C = {(r', t') | (r' = r ∧ t' ∈ C_r) ∨ (t' = t ∧ r' ∈ C_t)} is made up of the alignment candidate sets of r and t; C_r and C_t are generated in the candidate selection stage described in Sec. 3.4. Note that our sampling process is non-differentiable, which prevents gradients from reaching the weight matrix W_s in Eq. 2. To avoid this issue, when training W_s, instead of direct sampling, we aggregate all the neighbor information by a weighted summation:

g̃_i = \sum_{p \in N_i} α_ip h_p,  (12)

α_ip = p(h_p | h_i),  (13)

where α_ip is the aggregation weight for neighbor p, i.e., the sampling probability p(h_p | h_i) given by Eq. 2. Since the aim of training W_s is to make the learned neighborhood representations of aligned entities as similar as possible, the objective is:

L_s = \sum_{(r,t) \in L} ||g̃_r - g̃_t||_1.  (14)

In general, our model is trained end-to-end after pre-training. During training, we use Eq. 11 as the main objective function and, every 50 epochs, tune W_s using Eq. 14 as the objective function.
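The alternating schedule described above can be sketched as follows. The model object and its two step methods are hypothetical placeholders standing in for one optimization step on Eq. 11 and Eq. 14 respectively.

```python
def train(model, epochs, tune_every=50):
    """Alternating training schedule: optimize the main alignment loss
    (Eq. 11) every epoch, and every `tune_every` epochs tune the sampling
    weight W_s with the neighborhood objective (Eq. 14)."""
    for epoch in range(epochs):
        model.step_alignment_loss()        # one step on Eq. 11
        if (epoch + 1) % tune_every == 0:
            model.step_sampling_loss()     # one step on Eq. 14
```

This keeps the main objective in charge of the entity representations while still giving W_s a differentiable training signal at regular intervals.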

Experimental Setup
Datasets. Following the common practice of recent works (Sun et al., 2018; Cao et al., 2019; Sun et al., 2020), we evaluate our model on the DBP15K and DWY100K (Sun et al., 2018) datasets, and use the same split as previous works: 30% for training and 70% for testing. To evaluate the performance of NMN in a more challenging setting, we also build a sparse dataset, S-DBP15K, based on DBP15K. Specifically, we randomly remove a certain proportion of triples in the non-English KG to increase the difference in neighborhood size for entities in different KGs. Table 1 gives the detailed statistics of DBP15K and S-DBP15K, and the information on DWY100K is shown in Table 2. Figure 3 shows the distribution of the difference in the size of one-hop neighborhoods of aligned entity pairs. Our source code and datasets are freely available online. 1

Comparison models. We compare NMN against 12 recently proposed embedding-based alignment methods: MTransE (Chen et al., 2017), JAPE, IPTransE (Zhu et al., 2017), GCN-Align (Wang et al., 2018), BootEA (Sun et al., 2018), SEA (Pei et al., 2019a), RSN, MuGNN (Cao et al., 2019), KECG (Li et al., 2019a), AliNet (Sun et al., 2020), GMNN and RDGCN (Wu et al., 2019a). The last two models also utilize entity names for alignment.

Implementation details. The configuration we use on the DBP15K and DWY100K datasets is: β = 0.1, γ = 1.0, and we sample 5 neighbors for each entity in the neighborhood sampling stage (Sec. 3.3). For S-DBP15K, we set β to 1. We sample 3 neighbors for each entity in S-DBP15K ZH−EN and S-DBP15K JA−EN, and 10 neighbors in S-DBP15K FR−EN. NMN uses a 2-layer GCN. The dimension of hidden representations in the GCN layers described in Sec. 3.2 is 300, and the dimension of the neighborhood representation g_i described in Sec. 3.5 is 50. The size of the candidate set in Sec. 3.4 is 20 for each entity. The learning rate is set to 0.001.
To obtain the initial entity name representations for the DBP15K datasets, we first use Google Translate to translate all non-English entity names into English, and then use the pre-trained English word vectors glove.840B.300d 2 to construct the initial node features of the KGs. For the DWY100K datasets, we directly use the pre-trained word vectors to initialize the nodes.
Metrics. Following convention, we use Hits@1 and Hits@10 as our evaluation metrics. A Hits@k score is computed as the proportion of correctly aligned entities ranked in the top-k list; a higher Hits@k score indicates better performance. Table 3 reports the entity alignment performance of all approaches on the DBP15K and DWY100K datasets. It shows that the full implementation of NMN significantly outperforms all alternative approaches.
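The Hits@k metric can be computed as in this short sketch, assuming alignment is ranked by L1 distance over the learned embeddings; names are illustrative.

```python
import numpy as np

def hits_at_k(H1, H2, gold, k):
    """Hits@k: fraction of source entities whose gold counterpart appears
    among the k nearest target entities by L1 distance."""
    hits = 0
    for i, j_gold in gold:
        d = np.abs(H2 - H1[i]).sum(axis=1)  # L1 distance to every target
        topk = np.argsort(d)[:k]
        hits += int(j_gold in topk)
    return hits / len(gold)
```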

Performance on DBP15K and DWY100K
Structure-based methods. The top part of the table shows the performance of the state-of-the-art structure-based models, which solely utilize structural information. Among them, BootEA delivers the best performance, as it benefits from more training instances through a bootstrapping process. By considering structural heterogeneity, MuGNN and AliNet outperform most other structure-based counterparts, showing the importance of tackling structural heterogeneity.
Entity name initialization. The middle part of Table 3 gives the results of embedding-based models that use entity name information along with structural information. Using entity names to initialize node features, the GNN-based models, GMNN and RDGCN, show a clear improvement over structure-based models, suggesting that entity names provide useful clues for entity alignment. In particular, GMNN achieves the highest Hits@10 on the DWY100K datasets, which are the only monolingual datasets (in English) in our experiments. We also note that GMNN pre-screens a small candidate set for each entity based on entity name similarity, and only traverses this candidate set during testing and when calculating the Hits@k scores.
NMN vs. its variants. The bottom part of Table 3 shows the performance of NMN and its variants. Our full NMN implementation substantially outperforms all baselines across nearly all metrics and datasets by accurately modeling entity neighborhoods through neighborhood sampling and matching and by using entity name information. Specifically, NMN achieves the best Hits@1 score on DBP15K ZH−EN, with a gain of 2.5% over RDGCN and 5.4% over GMNN. Although RDGCN employs a dual relation graph to model complex relation information, it does not address the issue of neighborhood heterogeneity. GMNN collects all one-hop neighbors to construct a topic entity graph for each entity, but this strategy may introduce noise, since not all one-hop neighbors are favorable for entity alignment.
When comparing NMN and NMN (w/o nbr-m), we can observe around a 2.5% drop in Hits@1 and a 0.6% drop in Hits@10 on average, after removing the neighborhood matching module. Specifically, the Hits@1 scores between NMN and NMN (w/o nbr-m) differ by 3.9% on DBP15K F R−EN . These results confirm the effectiveness of our neighborhood matching module in identifying matching neighbors and estimating the neighborhood similarity.
Removing the neighbor sampling module from NMN, i.e., NMN (w/o nbr-s), leads to an average performance drop of 0.3% on Hits@1 and 1% on Hits@10 on all the datasets. This result shows the important role of our sampling module in filtering irrelevant neighbors.
When removing either the neighborhood matching module (NMN (w/o nbr-m)) or the sampling module (NMN (w/o nbr-s)) from our main model, we see a substantially larger drop in both Hits@1 and Hits@10 on DBP15K than on DWY100K. One reason is that the heterogeneity problem in DBP15K is more severe than in DWY100K: the average proportion of aligned entity pairs with different numbers of neighbors is 89% in DBP15K, compared to 84% in DWY100K. These results show that our sampling and matching modules are particularly important when the neighborhood sizes of equivalent entities differ greatly, and especially when there are few common neighbors in their neighborhoods.

Performance on S-DBP15K
On the sparser and more challenging S-DBP15K datasets, we compare NMN with the strongest structure-based model, BootEA, and with the GNN-based models GMNN and RDGCN, which also utilize entity name initialization.
Baseline models. In Table 4, we observe that all models suffer a performance drop, with BootEA enduring the most significant one. With the support of entity names, GMNN and RDGCN achieve better performance than BootEA. These results show that when the alignment clues are sparse, structural information alone is not sufficient to support precise comparisons, and entity name semantics are particularly useful for accurate alignment in such cases.

NMN.
Our NMN outperforms all three baselines on all sparse datasets, demonstrating its effectiveness and robustness. As discussed in Sec. 1, the performance of existing embedding-based methods decreases significantly as the gap between equivalent entities' neighborhood sizes increases.
Specifically, on DBP15K ZH−EN, NMN outperforms RDGCN, the best-performing baseline, by a large margin, achieving Hits@1 scores of 65%, 53% and 48% on entity pairs whose number of neighbors differs by more than 10, 20 and 30, respectively.
Sampling and matching strategies. Comparing NMN and NMN (w/o nbr-m) on S-DBP15K, we see a larger average drop in Hits@1 than on DBP15K (8.2% vs. 3.1%). This result indicates that our neighborhood matching module plays a more important role on the sparser datasets. When the alignment clues are less obvious, our matching module can continuously amplify the neighborhood difference of an entity pair during the propagation process. In this way, the gap between the equivalent entity pair and the negative pairs becomes larger, leading to correct alignment. Compared with NMN, removing the sampling module hurts both Hits@1 and Hits@10 on S-DBP15K ZH−EN. Surprisingly, however, NMN (w/o nbr-s) delivers slightly better results than NMN on S-DBP15K JA−EN and S-DBP15K FR−EN. This may be because the average number of neighbors per entity in S-DBP15K is much smaller than in the DBP15K datasets; when the number of neighbors is small, the effect of sampling becomes unstable. In addition, our sampling method is relatively simple, and when the alignment clues are very sparse, it may not be robust enough. We will explore more adaptive sampling methods and scopes in the future.

Analysis
Impact of neighborhood sampling strategies.
To explore the impact of neighborhood sampling strategies, we compare NMN with a variant that uses a random sampling strategy on the S-DBP15K datasets. Figure 4 illustrates the Hits@1 of NMN using our designed graph sampling method (Sec. 3.3) and of the random-sampling-based variant when sampling different numbers of neighbors. NMN consistently delivers better results than the variant, showing that our sampling strategy can effectively select more informative neighbors.
Impact of neighborhood sampling size. From Figure 4, for S-DBP15K ZH−EN, both models reach a performance plateau at a sampling size of 3, and a bigger sampling size leads to performance degradation. For S-DBP15K JA−EN and S-DBP15K FR−EN, NMN performs similarly when sampling different numbers of neighbors. From Table 1, we can see that S-DBP15K ZH−EN is sparser than S-DBP15K JA−EN and S-DBP15K FR−EN, and all models deliver much lower performance on it. Therefore, the neighbor quality of this dataset might be poor, and a larger sampling size introduces more noise. On the other hand, the neighbors in the JA−EN and FR−EN datasets might be more informative; thus, NMN is not sensitive to the sampling size on these two datasets.
How does the neighborhood matching module work? To understand how our neighborhood matching strategy helps alignment, we visualize the attention weights in the neighborhood matching module. Consider an equivalent entity pair in DBP15K ZH−EN, both referring to the American film studio Paramount Pictures. From Figure 5, we can see that the five neighbors sampled by our sampling module for each central entity are highly informative for aligning the two central entities, such as famous movies released by Paramount Pictures and the parent company and a subsidiary of Paramount Pictures. This again demonstrates the effectiveness of our sampling strategy. Among the sampled neighbors, there are also two pairs of common neighbors (indicating Saving Private Ryan and Viacom). We observe that within each pair of equivalent neighbors, one neighbor is particularly attended to by its counterpart (the corresponding square has a darker color). This example clearly demonstrates that our neighborhood matching module can accurately estimate the neighborhood similarity by detecting the similar neighbors.

Conclusion
We have presented NMN, a novel embedding-based framework for entity alignment. NMN tackles the ubiquitous neighborhood heterogeneity of KGs. We achieve this by using a new sampling-based approach to choose the most informative neighbors for each entity. As a departure from prior works, NMN estimates the similarity of two entities by jointly considering both the topological structure and the neighborhood similarity. We perform extensive experiments on real-world datasets and compare NMN against 12 recent embedding-based methods. Experimental results show that NMN achieves the best and most robust performance, consistently outperforming competitive methods across datasets and evaluation metrics.