Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval

This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. EDRM represents queries and documents by their words and entity annotations. The semantics from knowledge graphs are integrated in the distributed representations of their entities, while the ranking is conducted by interaction-based neural ranking networks. The two components are learned end-to-end, making EDRM a natural combination of entity-oriented search and neural information retrieval. Our experiments on a commercial search log demonstrate the effectiveness of EDRM. Our analyses reveal that knowledge graph semantics significantly improve the generalization ability of neural ranking models.


Introduction
The emergence of large-scale knowledge graphs has motivated the development of entity-oriented search, which utilizes knowledge graphs to improve search engines. Recent progress in entity-oriented search includes better text representations with entity annotations (Xiong et al., 2016; Raviv et al., 2016), richer ranking features (Dalton et al., 2014), entity-based connections between query and documents (Liu and Fang, 2015; Xiong and Callan, 2015), and soft matching of queries and documents through knowledge graph relations or embeddings (Xiong et al., 2017c; Ensan and Bagheri, 2017). These approaches bring in entities and semantics from knowledge graphs and have greatly improved the effectiveness of feature-based search systems.
* Corresponding author: M. Sun (sms@tsinghua.edu.cn)
Another frontier of information retrieval is the development of neural ranking models (neural-IR). Deep learning techniques have been used to learn distributed representations of queries and documents that capture their relevance relations (representation-based) (Shen et al., 2014), or to model the query-document relevancy directly from their word-level interactions (interaction-based) (Guo et al., 2016a; Xiong et al., 2017b; Dai et al., 2018). Neural-IR approaches, especially the interaction-based ones, have greatly improved the ranking accuracy when large-scale training data are available (Dai et al., 2018).
Entity-oriented search and neural-IR push the boundary of search engines from two different aspects. Entity-oriented search incorporates human knowledge from entities and knowledge graph semantics. It has shown promising results on feature-based ranking systems. On the other hand, neural-IR leverages distributed representations and neural networks to learn more sophisticated ranking models from large-scale training data. However, it remains unclear how these two approaches interact with each other and whether entity-oriented search has the same advantage in neural-IR methods as in feature-based systems.
This paper explores the role of entities and semantics in neural-IR. We present an Entity-Duet Neural Ranking Model (EDRM) that incorporates entities in interaction-based neural ranking models. EDRM first learns the distributed representations of entities using their semantics from knowledge graphs: descriptions and types. Then it follows a recent state-of-the-art entity-oriented search framework, the word-entity duet (Xiong et al., 2017a), and matches documents to queries with both bag-of-words and bag-of-entities. Instead of manual features, EDRM uses interaction-based neural models (Dai et al., 2018) to match query and documents with word-entity duet representations. As a result, EDRM combines entity-oriented search and interaction-based neural-IR; it brings the knowledge graph semantics to neural-IR and enhances entity-oriented search with neural networks.
One advantage of being neural is that EDRM can be learned end-to-end. Given a large amount of user feedback from a commercial search log, the integration of knowledge graph semantics into the neural ranker is learned jointly with the modeling of query-document relevance in EDRM. This provides a convenient data-driven way to leverage external semantics in neural-IR.
Our experiments on a Sogou query log and CN-DBpedia demonstrate the effectiveness of entities and semantics in neural models. EDRM significantly outperforms the word-interaction-based neural ranking model, K-NRM (Xiong et al., 2017b), confirming the advantage of entities in enriching word-based ranking. The comparison with Conv-KNRM (Dai et al., 2018), the recent state-of-the-art neural ranker that models phrase-level interactions, provides a more interesting observation: Conv-KNRM predicts user clicks reasonably well, but integrating knowledge graphs using EDRM significantly improves the neural model's generalization ability in more difficult scenarios.
Our analyses further reveal the source of EDRM's generalization ability: the knowledge graph semantics. If entities are treated only as ids and their semantics from the knowledge graph are ignored, the entity annotations are merely a cleaner version of phrases. In neural-IR systems, the embeddings and convolutional neural networks already do a decent job of modeling phrase-level matches. However, the knowledge graph semantics brought by EDRM cannot yet be captured solely by neural networks; incorporating this human knowledge greatly improves the generalization ability of neural ranking systems.

Related Work
Current neural ranking models can be categorized into two groups: representation based and interaction based (Guo et al., 2016b). The earlier works mainly focus on representation based models. They learn good representations and match them in the learned representation space of query and documents. DSSM (Huang et al., 2013) and its convolutional version CDSSM (Shen et al., 2014) get representations by hashing letter-tri-grams to a low-dimensional vector. A more recent work uses pseudo-labeling as a weakly supervised signal to train the representation based ranking model (Dehghani et al., 2017).
The interaction based models learn word-level interaction patterns from query-document pairs. The Deep Relevance Matching Model (DRMM) (Guo et al., 2016b) uses pyramid pooling (histogram) to summarize the word-level similarities into ranking models. K-NRM and Conv-KNRM use kernels to summarize word-level interactions with word embeddings and provide soft match signals for learning to rank. There are also some works establishing position-dependent interactions for ranking models (Pang et al., 2017; Hui et al., 2017). Interaction based models and representation based models can also be combined for further improvements (Mitra et al., 2017).
Recently, large-scale knowledge graphs such as DBpedia (Auer et al., 2007), Yago (Suchanek et al., 2007) and Freebase (Bollacker et al., 2008) have emerged. Knowledge graphs contain human knowledge about real-world entities and offer an opportunity for search systems to better understand queries and documents. Many works focus on exploring their potential for ad-hoc retrieval. They utilize knowledge as a kind of pseudo relevance feedback corpus (Cao et al., 2008) or weight words to better represent the query according to well-formed entity descriptions. Entity query feature expansion (Dietz and Verga, 2014) uses related entity attributes as ranking features.
Another way to utilize knowledge graphs in information retrieval is to build the additional connections from query to documents through related entities. Latent Entity Space (LES) builds an unsupervised model using latent entities' descriptions (Liu and Fang, 2015). EsdRank uses related entities as a latent space, and performs learning to rank with various information retrieval features (Xiong and Callan, 2015). AttR-Duet develops a four-way interaction to involve cross matches between entity and word representations to catch more semantic relevance patterns (Xiong et al., 2017a).
There are many other attempts to integrate knowledge graphs in neural models in related tasks (Miller et al., 2016;Gupta et al., 2017;Ghazvininejad et al., 2018). Our work shares a similar spirit and focuses on exploring the effectiveness of knowledge graph semantics in neural-IR.

Entity-Duet Neural Ranking Model
This section first describes the standard architecture in current interaction based neural ranking models. Then it presents our Entity-Duet Neural Ranking Model, including the semantic entity representation which integrates the knowledge graph semantics, and then the entity-duet ranking framework. The overall architecture of EDRM is shown in Figure 1.

Interaction based Ranking Models
Given a query q and a document d, interaction based models first build the word-level translation matrix between q and d (Berger and Lafferty, 1999). The translation matrix describes word-pair similarities using word correlations, which are captured by word embedding similarities in interaction based models.
Typically, interaction based ranking models first map each word $t$ in $q$ and $d$ to an $L$-dimensional embedding $\vec{v}_t$ with an embedding layer $\mathrm{Emb}_w$:
$$\vec{v}_t = \mathrm{Emb}_w(t). \tag{1}$$
They then construct the interaction matrix $M$ based on query and document embeddings. Each element $M^{ij}$ in the matrix compares the $i$-th word in $q$ and the $j$-th word in $d$, e.g. using the cosine similarity of word embeddings:
$$M^{ij} = \cos(\vec{v}_{w_i^q}, \vec{v}_{w_j^d}). \tag{2}$$
With the translation matrix describing the term-level matches between query and documents, the next step is to calculate the final ranking score from the matrix. Many approaches have been developed in interaction based neural ranking models, but in general they include a feature extractor $\phi()$ on $M$ and then one or several ranking layers that combine the features into the ranking score.
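To make the translation matrix concrete, the following is a minimal pure-Python sketch of the cosine-similarity construction described above. The toy embeddings and the names `cosine`, `translation_matrix`, and `toy_embeddings` are illustrative assumptions; a real model would use learned $L$-dimensional embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def translation_matrix(query_vecs, doc_vecs):
    """M[i][j] compares the i-th query term with the j-th document term."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]

# Toy 3-dimensional embeddings (hypothetical values, not learned).
toy_embeddings = {
    "cheap":  [1.0, 0.0, 0.0],
    "flight": [0.0, 1.0, 0.0],
    "ticket": [0.0, 1.0, 1.0],
}
query = ["cheap", "flight"]
doc = ["flight", "ticket"]
M = translation_matrix([toy_embeddings[t] for t in query],
                       [toy_embeddings[t] for t in doc])
```

Here `M` has one row per query word and one column per document word; the exact match "flight"/"flight" gets similarity 1, while "flight"/"ticket" gets a softer score.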

Semantic Entity Representation
EDRM incorporates the semantic information about an entity from the knowledge graphs into its representation. The representation includes three embeddings: entity embedding, description embedding, and type embedding, all of dimension $L$, which are combined to generate the semantic representation of the entity.
Entity Embedding uses an $L$-dimensional embedding layer $\mathrm{Emb}_e$ to get an entity embedding $\vec{v}_e^{emb}$ for $e$:
$$\vec{v}_e^{emb} = \mathrm{Emb}_e(e). \tag{3}$$
Description Embedding encodes an entity description, which contains $m$ words and explains the entity. EDRM first employs the word embedding layer $\mathrm{Emb}_w$ to embed each description word $w$ to $\vec{v}_w$. Then it combines all embeddings in the text into an embedding matrix $V_w$. Next, it leverages convolutional filters to slide over the text and compose the $h$-length n-gram as $\vec{g}_e^{j}$:
$$\vec{g}_e^{j} = \mathrm{ReLU}(W_{CNN} \cdot V_w^{j:j+h} + \vec{b}_{CNN}), \tag{4}$$
where $W_{CNN}$ and $\vec{b}_{CNN}$ are two parameters of the convolutional filter. Then we use max pooling after the convolution layer to generate the description embedding $\vec{v}_e^{des}$:
$$\vec{v}_e^{des} = \max(\vec{g}_e^{1}, ..., \vec{g}_e^{j}, ..., \vec{g}_e^{m}). \tag{5}$$
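The description encoder can be sketched as a one-layer convolution over $h$-word windows followed by element-wise max pooling. The sketch below is illustrative only: it assumes a ReLU activation and unpadded windows (so a description of $m$ words yields $m-h+1$ windows), and the function name and toy parameters are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def description_embedding(word_vecs, filters, biases, h):
    """Encode a description by convolving over h-word windows, then max pooling.

    word_vecs: m word embeddings of dimension L (the description text)
    filters:   one weight vector of length h * L per output dimension
    biases:    one bias per output dimension
    """
    m = len(word_vecs)
    if m < h:
        raise ValueError("description is shorter than the window size h")
    ngram_features = []
    for j in range(m - h + 1):
        # Flatten the window of h consecutive word vectors into one vector.
        window = [x for vec in word_vecs[j:j + h] for x in vec]
        g_j = [relu(sum(w * x for w, x in zip(f, window)) + b)
               for f, b in zip(filters, biases)]
        ngram_features.append(g_j)
    # Element-wise max over all window positions.
    return [max(column) for column in zip(*ngram_features)]

# Toy example: 3 words, L = 2, bigram windows (h = 2), 2 output dimensions.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[1.0, 0.0, 0.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
biases = [0.0, 0.5]
v_des = description_embedding(words, filters, biases, h=2)
```

The output dimension equals the number of filters, so using $L$ filters keeps the description embedding in the same space as the entity embedding.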
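Type Embedding encodes the categories of entities. Each entity $e$ has $n$ types $F_e = \{f_1, ..., f_j, ..., f_n\}$. EDRM first gets the embedding $\vec{v}_{f_j}$ of $f_j$ through the type embedding layer $\mathrm{Emb}_{tp}$:
$$\vec{v}_{f_j} = \mathrm{Emb}_{tp}(f_j). \tag{6}$$
Then EDRM utilizes an attention mechanism to combine the entity types into the type embedding $\vec{v}_e^{type}$:
$$\vec{v}_e^{type} = \sum_{j=1}^{n} a_j \vec{v}_{f_j}, \tag{7}$$
where $a_j$ is the attention score, calculated as:
$$a_j = \frac{\exp(P_j)}{\sum_{l=1}^{n} \exp(P_l)}, \tag{8}$$
$$P_j = \Big(\sum_i W_{bow} \vec{v}_{t_i}\Big) \cdot \vec{v}_{f_j}. \tag{9}$$
$P_j$ is the dot product of the query or document representation and the type embedding of $f_j$. We leverage bag-of-words for query or document encoding, where $t_i$ ranges over the words of the query or document and $W_{bow}$ is a parameter matrix.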
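The attention over entity types can be illustrated with a short sketch. It assumes one plausible reading of the text: the query or document is encoded as the sum of its word embeddings projected by $W_{bow}$, each type is scored by a dot product with that encoding, and the scores are normalized with a softmax. All names and toy values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def type_embedding(type_vecs, text_word_vecs, W_bow):
    """Attention-weighted sum of type embeddings, conditioned on the text."""
    # Bag-of-words encoding of the query or document (assumed: sum of projections).
    context = [0.0] * len(W_bow)
    for vec in text_word_vecs:
        context = [c + p for c, p in zip(context, matvec(W_bow, vec))]
    # One score per type: dot product between the context and the type embedding.
    scores = [sum(c * f for c, f in zip(context, type_vec)) for type_vec in type_vecs]
    weights = softmax(scores)
    dim = len(type_vecs[0])
    v_type = [sum(a * type_vec[k] for a, type_vec in zip(weights, type_vecs))
              for k in range(dim)]
    return v_type, weights

# Toy example: two types, a two-word text, identity projection.
types = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
v_type, weights = type_embedding(types, text, identity)
```

In the toy example the text aligns with the first type, so that type receives most of the attention weight.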
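Combination. The three embeddings are combined by a linear layer to generate the semantic representation of the entity:
$$\vec{v}_e^{sem} = \vec{v}_e^{emb} + W_e (\vec{v}_e^{des} \oplus \vec{v}_e^{type})^{T} + \vec{b}_e, \tag{10}$$
where $W_e$ is an $L \times 2L$ matrix and $\vec{b}_e$ is an $L$-dimensional vector.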
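As a rough illustration of the combination step, the sketch below assumes that the $L \times 2L$ matrix $W_e$ acts on the concatenation of the description and type embeddings and that the result is added to the entity embedding. This reading is an assumption based on the stated shapes; the function name and values are hypothetical.

```python
def semantic_representation(v_emb, v_des, v_type, W_e, b_e):
    """Combine entity, description and type embeddings into one L-dim vector."""
    concat = v_des + v_type  # list concatenation: a vector of length 2L
    projected = [sum(w * x for w, x in zip(row, concat)) + b
                 for row, b in zip(W_e, b_e)]
    # Assumed residual form: the entity embedding is added to the projection.
    return [e + p for e, p in zip(v_emb, projected)]

# Toy example with L = 2, so W_e is 2 x 4.
v_sem = semantic_representation(
    v_emb=[1.0, 2.0],
    v_des=[1.0, 0.0],
    v_type=[0.0, 1.0],
    W_e=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    b_e=[0.5, -0.5],
)
```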

Neural Entity-Duet Framework
Word-entity duet (Xiong et al., 2017a) is a recently developed framework in entity-oriented search. It utilizes the duet representation of bag-of-words and bag-of-entities to match $q$-$d$ with hand-crafted features. This work introduces it to neural-IR. We first construct bag-of-entities $q^e$ and $d^e$ with entity annotation as well as bag-of-words $q^w$ and $d^w$ for $q$ and $d$. The duet utilizes a four-way interaction: query words to document words ($q^w$-$d^w$), query words to document entities ($q^w$-$d^e$), query entities to document words ($q^e$-$d^w$) and query entities to document entities ($q^e$-$d^e$).
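Instead of features, EDRM uses a translation layer that calculates the similarity between a pair of query-document terms, where each term is either a word or an entity. It constructs one interaction matrix for each of the four interactions:
$$\mathcal{M} = \{M_{ww}, M_{we}, M_{ew}, M_{ee}\}, \tag{11}$$
which correspond to $q^w$-$d^w$, $q^w$-$d^e$, $q^e$-$d^w$ and $q^e$-$d^e$ respectively. The elements in them are the cosine similarities of the corresponding terms:
$$M_{ww}^{ij} = \cos(\vec{v}_{w_i^q}, \vec{v}_{w_j^d}); \quad M_{we}^{ij} = \cos(\vec{v}_{w_i^q}, \vec{v}_{e_j^d}); \quad M_{ew}^{ij} = \cos(\vec{v}_{e_i^q}, \vec{v}_{w_j^d}); \quad M_{ee}^{ij} = \cos(\vec{v}_{e_i^q}, \vec{v}_{e_j^d}). \tag{12}$$
The final ranking feature $\Phi(\mathcal{M})$ is a concatenation ($\oplus$) of the four cross matches ($\phi(M)$):
$$\Phi(\mathcal{M}) = \phi(M_{ww}) \oplus \phi(M_{we}) \oplus \phi(M_{ew}) \oplus \phi(M_{ee}), \tag{13}$$
where $\phi$ can be any function used in interaction based neural ranking models.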
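The four-way interaction can be sketched as four translation matrices whose extracted features are concatenated. In the sketch below, `max_and_mean` is only a placeholder for the feature extractor $\phi$, and the toy vectors stand in for word embeddings and entity representations of the same dimension; none of these names come from the original work.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0.0 or norm_v == 0.0 else dot / (norm_u * norm_v)

def translation_matrix(query_vecs, doc_vecs):
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]

def duet_matrices(q_words, q_entities, d_words, d_entities):
    """Four-way interaction: word-word, word-entity, entity-word, entity-entity."""
    return {
        "ww": translation_matrix(q_words, d_words),
        "we": translation_matrix(q_words, d_entities),
        "ew": translation_matrix(q_entities, d_words),
        "ee": translation_matrix(q_entities, d_entities),
    }

def max_and_mean(M):
    """Placeholder for phi; any interaction-based feature extractor could be used."""
    flat = [x for row in M for x in row]
    if not flat:  # e.g. a query without entity annotations
        return [0.0, 0.0]
    return [max(flat), sum(flat) / len(flat)]

def duet_features(matrices, phi):
    """Concatenate phi(M) over the four cross matches in a fixed order."""
    features = []
    for name in ("ww", "we", "ew", "ee"):
        features.extend(phi(matrices[name]))
    return features

# Toy 2-dimensional vectors (hypothetical values).
q_words = [[1.0, 0.0], [0.0, 1.0]]
q_entities = [[1.0, 1.0]]
d_words = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
d_entities = [[1.0, 1.0], [1.0, -1.0]]
matrices = duet_matrices(q_words, q_entities, d_words, d_entities)
features = duet_features(matrices, max_and_mean)
```

Because words and entities share the same embedding dimension, the cross-space matrices (`we`, `ew`) are computed exactly like the in-space ones.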
The entity-duet presents an effective way to cross match query and document in entity and word spaces. In EDRM, it introduces the knowledge graph semantics representations into neural-IR models.

Integration with Kernel based Neural Ranking Models
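The duet translation matrices provided by EDRM can be plugged into any standard interaction based neural ranking model. This section describes the special cases where they are integrated with K-NRM (Xiong et al., 2017b) and Conv-KNRM (Dai et al., 2018), two recent state-of-the-art models.
K-NRM uses $K$ Gaussian kernels to extract the matching feature $\phi(M)$ from the translation matrix $M$. Each kernel $K_k$ summarizes the translation scores as soft-TF counts, generating a $K$-dimensional feature vector $\phi(M) = \{K_1(M), ..., K_K(M)\}$:
$$K_k(M) = \sum_j \exp\left(-\frac{(M^{ij} - \mu_k)^2}{2\delta_k^2}\right), \tag{14}$$
where $\mu_k$ and $\delta_k$ are the mean and width of the $k$-th kernel.
Conv-KNRM extends K-NRM by incorporating $h$-gram compositions $\vec{g}_h^{i}$ from the text embedding $V_T$ using a CNN:
$$\vec{g}_h^{i} = \mathrm{ReLU}(W_h \cdot V_T^{i:i+h} + \vec{b}_h). \tag{15}$$
Then a translation matrix $M_{h_q,h_d}$ is constructed. Its elements are the similarity scores of $h$-gram pairs between query and document:
$$M_{h_q,h_d}^{ij} = \cos(\vec{g}_{h_q}^{i}, \vec{g}_{h_d}^{j}). \tag{16}$$
We also extend word n-gram cross matches to word-entity duet matches:
$$\Phi(\mathcal{M}) = \phi(M_{1,1}) \oplus \cdots \oplus \phi(M_{h_q,h_d}) \oplus \cdots \oplus \phi(M_{ee}). \tag{17}$$
Each ranking feature $\phi(M_{h_q,h_d})$ contains three parts: the query $h_q$-gram and document $h_d$-gram match feature ($\phi(M^{ww}_{h_q,h_d})$), the query entity and document $h_d$-gram match feature ($\phi(M^{ew}_{1,h_d})$), and the query $h_q$-gram and document entity match feature ($\phi(M^{we}_{h_q,1})$):
$$\phi(M_{h_q,h_d}) = \phi(M^{ww}_{h_q,h_d}) \oplus \phi(M^{ew}_{1,h_d}) \oplus \phi(M^{we}_{h_q,1}). \tag{18}$$
We then use learning to rank to combine the ranking feature $\Phi(\mathcal{M})$ to produce the final ranking score:
$$f(q, d) = \tanh(\omega_r^{T} \Phi(\mathcal{M}) + b_r), \tag{19}$$
where $\omega_r$ and $b_r$ are the ranking parameters and $\tanh$ is the activation function. We use standard pairwise loss to train the model:
$$l = \sum_q \sum_{d^+, d^- \in D_q^{+,-}} \max(0, 1 - f(q, d^+) + f(q, d^-)), \tag{20}$$
where $d^+$ is a document ranked higher than $d^-$.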
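Kernel pooling can be sketched in a few lines. The version below follows the K-NRM formulation of Xiong et al. (2017b), in which each Gaussian kernel produces a soft-TF count per query term and the counts are log-summed over query terms; the log-sum step and the clamp at `1e-10` are assumptions added here for a complete example, and the kernel means and widths are arbitrary toy values.

```python
import math

def kernel_pooling(M, mus, sigmas):
    """Return one feature per Gaussian kernel for a translation matrix M.

    For each kernel, every row of M (one query term) yields a soft-TF count,
    and the log of these counts is summed over the query terms.
    """
    features = []
    for mu, sigma in zip(mus, sigmas):
        total = 0.0
        for row in M:
            soft_tf = sum(math.exp(-((m - mu) ** 2) / (2.0 * sigma ** 2)) for m in row)
            total += math.log(max(soft_tf, 1e-10))  # clamp to avoid log(0)
        features.append(total)
    return features

# Toy translation matrix: 2 query terms x 2 document terms.
M = [[1.0, 0.5],
     [0.0, 0.5]]
# Two kernels: one centred on exact matches (1.0), one on softer matches (0.5).
features = kernel_pooling(M, mus=[1.0, 0.5], sigmas=[0.1, 0.1])
```

In this toy matrix only the first query term has an exact match, so the kernel centred at 1.0 is penalized by the second row, while the kernel centred at 0.5 finds a soft match in both rows.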
With sufficient training data, the whole model is optimized end-to-end with back-propagation. During the process, the integration of the knowledge graph semantics (entity embeddings, description embeddings, type embeddings, and matching with entities) is learned jointly with the ranking neural network.
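The ranking layer and pairwise training objective described above can be illustrated as follows. This is a sketch under the assumption that the "standard pairwise loss" is a hinge loss with margin 1, as in K-NRM; gradients and parameter updates are omitted, and the weights are toy values.

```python
import math

def ranking_score(features, w_r, b_r):
    """Combine ranking features into a single score with a tanh ranking layer."""
    return math.tanh(sum(w * x for w, x in zip(w_r, features)) + b_r)

def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Zero once the preferred document outscores the other by at least the margin."""
    return max(0.0, margin - pos_score + neg_score)

# Toy features for a preferred document d+ and a less preferred document d-.
w_r, b_r = [1.0, 1.0], 0.0
score_pos = ranking_score([1.0, 0.0], w_r, b_r)
score_neg = ranking_score([0.0, 0.0], w_r, b_r)
loss = pairwise_hinge_loss(score_pos, score_neg)
```

Because the loss depends on the scores, and the scores depend on the translation matrices and embeddings, back-propagating this loss updates the entity, description and type embeddings together with the ranking parameters.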

Experimental Methodology
This section describes the dataset, evaluation metrics, knowledge graph, baselines, and implementation details of our experiments.
Dataset. Our experiments use a query log from Sogou.com, a major Chinese search engine (Luo et al., 2017). The exact same dataset and training-testing splits as in previous research (Xiong et al., 2017b; Dai et al., 2018) are used. That research defined the ad-hoc ranking task in this dataset as re-ranking the candidate documents provided by the search engine. All Chinese texts are segmented by ICTCLAS (Zhang et al., 2003); after that, they are treated the same as English. Prior research leverages clicks to model user behaviors and infer reliable relevance signals using click models (Chuklin et al., 2015). DCTR and TACM are two click models: DCTR calculates the relevance scores of a query-document pair based on their click-through rates (CTR); TACM (Wang et al., 2013) is a more sophisticated model that uses both clicks and dwell times. Following previous research (Xiong et al., 2017b), both DCTR and TACM are used to infer labels. DCTR-inferred relevance labels are used in training. Three testing scenarios are used: Testing-SAME, Testing-DIFF and Testing-RAW.
Testing-SAME uses DCTR labels, the same as in training. Testing-DIFF evaluates model performance based on TACM-inferred relevance labels. Testing-RAW evaluates ranking models through user clicks, which tests ranking performance for the most satisfying document. Testing-DIFF and Testing-RAW are harder scenarios that challenge the generalization ability of all models, because their training labels and testing labels are generated differently (Xiong et al., 2017b).
Evaluation Metrics. NDCG@1 and NDCG@10 are used in Testing-SAME and Testing-DIFF. MRR is used for Testing-RAW. Statistical significance is tested by permutation test with p < 0.05. All are the same as in previous research (Xiong et al., 2017b).
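For reference, the two metrics can be computed as in the sketch below. It assumes the common exponential-gain form of DCG with a log2 position discount, which is one of several NDCG variants and is not confirmed by the text; MRR is taken as the mean reciprocal rank of the first clicked document per query.

```python
import math

def dcg_at_k(labels, k):
    """Discounted cumulative gain over the top-k labels, given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:k]))

def ndcg_at_k(labels_in_ranked_order, k):
    """DCG@k normalized by the DCG@k of the ideal ordering of the same labels."""
    ideal = dcg_at_k(sorted(labels_in_ranked_order, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(labels_in_ranked_order, k) / ideal

def reciprocal_rank(clicked_in_ranked_order):
    """1 / rank of the first clicked document, or 0.0 if nothing was clicked."""
    for rank, clicked in enumerate(clicked_in_ranked_order, start=1):
        if clicked:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    return sum(reciprocal_rank(q) for q in queries) / len(queries)

# Toy rankings: relevance labels (for NDCG) and click flags (for MRR).
perfect = ndcg_at_k([3, 2, 1, 0], k=10)
swapped = ndcg_at_k([2, 3, 1, 0], k=10)
mrr = mean_reciprocal_rank([[True, False], [False, True]])
```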
Knowledge Graph. We use CN-DBpedia, a large-scale Chinese knowledge graph based on Baidu Baike, Hudong Baike, and Chinese Wikipedia. CN-DBpedia contains 10,341,196 entities and 88,454,264 relations. The query and document entities are annotated by CMNS, the commonness (popularity) based entity linker (Hasibi et al., 2017). CN-DBpedia and CMNS provide good coverage on our queries and documents. As shown in Figure 2, the majority of queries have at least one entity annotation; the average number of entities annotated per document title is about four.
Baselines. The baselines include feature-based ranking models and neural ranking models. Most of the baselines are borrowed from previous research (Xiong et al., 2017b;Dai et al., 2018).
Our main baselines are K-NRM and Conv-KNRM, the recent state-of-the-art neural models on the Sogou-Log dataset. The goal of our experiments is to explore the effectiveness of knowledge graphs in these state-of-the-art interaction based neural models.
Implementation Details. The dimensions of the word embedding, entity embedding and type embedding are all 300. The vocabulary sizes of entities and words are 44,930 and 165,877 respectively. Conv-KNRM uses a one-layer CNN with 128 filters for the n-gram composition. The entity description encoder is a one-layer CNN with 128 and 300 filters for Conv-KNRM and K-NRM respectively.
All models are implemented with PyTorch. Adam is utilized to optimize all parameters with learning rate = 0.001, ε = 1e-5, and early stopping with a patience of 5 epochs.
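The early-stopping rule can be illustrated with a small framework-independent helper. It reads the setting above as a patience of 5 epochs, which is an interpretation rather than something the text states explicitly, and it assumes a validation metric where higher is better; the function name and scores are hypothetical.

```python
def early_stopping_epoch(val_scores, patience=5):
    """Return the epoch index at which training would stop, or None if it never does.

    Training stops once the validation score has failed to improve on the best
    score seen so far for `patience` consecutive epochs.
    """
    best = float("-inf")
    epochs_since_best = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best = score
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return None

# Toy validation curve: the best score is reached at epoch 1.
scores = [0.30, 0.32, 0.31, 0.31, 0.30, 0.29, 0.28, 0.35]
stop_epoch = early_stopping_epoch(scores, patience=5)
```

In practice the parameters from the best epoch (epoch 1 in this toy curve) would be kept rather than those at the stopping epoch.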
There are two versions of EDRM: EDRM-KNRM and EDRM-CKNRM, integrating with K-NRM and Conv-KNRM respectively. The first one (EDRM-KNRM) enriches the word based neural ranking model with entities and knowledge graph semantics; the second one (EDRM-CKNRM) enriches the n-gram based neural ranking model.

Evaluation Results
Four experiments are conducted to study the effectiveness of EDRM: the overall performance, the contributions of matching kernels, the ablation study, and the influence of entities in different scenarios. We also conduct case studies to show the effect of EDRM on document ranking.

Ranking Accuracy
The ranking accuracies of the ranking methods are shown in Table 1. K-NRM and Conv-KNRM outperform other baselines in all testing scenarios by large margins as shown in previous research.
EDRM-KNRM outperforms K-NRM by over 10% in Testing-SAME and Testing-DIFF. EDRM-CKNRM performs almost the same as Conv-KNRM on Testing-SAME. A possible reason is that entity annotations provide effective phrase matches, but Conv-KNRM is also able to learn phrase matches automatically from data. However, EDRM-CKNRM shows significant improvements on Testing-DIFF and Testing-RAW. This demonstrates that EDRM has a strong ability to overcome the domain differences between labels.
These results show the effectiveness and the generalization ability of EDRM. In the following experiments, we study the source of this generalization ability.
Table 2: Ranking accuracy of adding diverse semantics based on K-NRM and Conv-KNRM. Relative performance is given in percentages. †, ‡, §, ¶, *, ** indicate statistically significant improvements over K-NRM† (or Conv-KNRM†), +Embed‡, +Type§, +Description¶, +Embed+Type* and +Embed+Description** respectively.

Contributions of Matching Kernels
This experiment studies the contribution of knowledge graph semantics by investigating the weights learned on the different types of matching kernels.
As shown in Figure 3(a), most of the weight in EDRM-KNRM goes to soft match (Exact vs. Soft); entity related matches play as important a role as word based matches (Solo Word vs. Others); cross-space matches are more important than in-space matches (In-space vs. Cross-space). As shown in Figure 3(b), word based matches and cross-space matches receive a larger share of the weight in EDRM-CKNRM than in EDRM-KNRM.
The contribution of each individual match type in EDRM-CKNRM is shown in Figure 4. The weight is distributed almost uniformly over unigrams, bigrams, trigrams, and entities, indicating that entities are effective and that all components are important in EDRM-CKNRM.

Ablation Study
This experiment studies which part of the knowledge graph semantics leads to the effectiveness and generalization ability of EDRM.
There are three types of embeddings incorporating different aspects of knowledge graph information: entity embedding (Embed), description embedding (Description) and type embedding (Type). This experiment starts with the word-only K-NRM and Conv-KNRM, and adds these three types of embedding individually or two-by-two (Embed+Type and Embed+Description).
The performances of EDRM with different groups of embeddings are shown in Table 2. The description embeddings show the greatest improvement among the three embeddings.
This experiment shows that knowledge graph semantics are crucial to EDRM's effectiveness. Conv-KNRM learns good phrase matches that overlap with the entity embedding signals. However, the knowledge graph semantics (descriptions and types) are hard to learn from user clicks alone.

Performance on Different Scenarios
This experiment analyzes the influence of knowledge graphs in two different scenarios: queries of different difficulty and queries of different length.
Query Length Experiment evaluates EDRM's effectiveness on Short (1 word), Medium (2-3 words) and Long (4 or more words) queries. As shown in Figure 6, EDRM has more win cases and achieves the greatest improvement on short queries. Knowledge embeddings are more crucial when limited information is available from the original query text. These two experiments reveal that the effectiveness of EDRM is most evident on harder or shorter queries, where word-based neural models either struggle or do not have sufficient information to leverage.

Case Study
Table 3 provides examples reflecting two possible ways in which the knowledge graph semantics could help document ranking.
First, the entity descriptions explain the meaning of entities and connect them through the word space. Meituxiuxiu web version and Meilishuo are two websites providing image processing and shopping services respectively. Their descriptions provide extra ranking signals to promote the related documents.
Second, entity types establish underlying relevance patterns between query and documents. The underlying patterns can be captured by cross-space matches. For example, the types of the query entities Crayon Shin-chan and GINTAMA overlap with the bag-of-words in the relevant documents. They can also be captured by the entity-based matches through their type overlaps.
Table 3a shows query-document pairs. Table 3b lists the related entity semantics that include useful information to match the query-document pair. The examples and related semantics are picked by manually examining the ranking changes between different variants of EDRM-CKNRM.

Conclusions
This paper presents EDRM, the Entity-Duet Neural Ranking Model that incorporates knowledge graph semantics into neural ranking systems. EDRM inherits entity-oriented search to match query and documents with bag-of-words and bag-of-entities in neural ranking models. The knowledge graph semantics are integrated as distributed representations of entities. The neural model leverages these semantics to help document ranking. Using user clicks from search logs, the whole model (the integration of knowledge graph semantics and the neural ranking networks) is trained end-to-end. It leads to a data-driven combination of entity-oriented search and neural information retrieval.
Our experiments on the Sogou search log and CN-DBpedia demonstrate EDRM's effectiveness and generalization ability over two state-of-the-art neural ranking models. Our further analyses reveal that the generalization ability comes from the integration of knowledge graph semantics. The neural ranking models can effectively model n-gram matches between query and document, which overlap with part of the ranking signals from entity-based matches; solely adding the entity names may not improve the ranking accuracy much. However, the knowledge graph semantics, introduced by the description and type embeddings, provide novel ranking signals that greatly improve the generalization ability of neural rankers in difficult scenarios. This paper is a preliminary exploration of the role of structured semantics in deep learning models. Though mainly focused on search, we hope our findings shed some light on a potential path towards more intelligent neural systems and will motivate more exploration in this direction.