Domain Adaptation for Person-Job Fit with Transferable Deep Global Match Network

Person-job fit is an important task that aims to automatically match job positions with suitable candidates. Previous methods mainly focus on solving the match task in a single-domain setting, which may not work well when labeled data is limited. We study the domain adaptation problem for person-job fit. We first propose a deep global match network for capturing the global semantic interactions between sentences from a job posting and a candidate resume. Furthermore, we extend the match network and implement domain adaptation at three levels: sentence-level representation, sentence-level match, and global match. Extensive experimental results on a large real-world dataset consisting of six domains demonstrate the effectiveness of the proposed model, especially when there is not sufficient labeled data.


Introduction
Recent years have witnessed the rapid growth of online recruitment platforms. According to Iqbal (2019), LinkedIn had 590 million users and 11 million job listings from over 200 countries and territories all over the world. With the increasing amount of online recruitment data, automatically matching jobs with suitable candidates, called person-job fit (Qin et al., 2018; Shen et al., 2018), has become an essential task.
Due to the importance of person-job fit, many efforts have been devoted to improving the match algorithms (Malinowski et al., 2006; Paparrizos et al., 2011; Lee and Brusilovsky, 2007; Zhang et al., 2014). Among these studies, a typical approach is to cast the task as a supervised text match problem. Given a set of labeled data (i.e., person-job match records), it aims to predict the match label based on the text content of job postings and candidate resumes. More recently, deep learning has advanced person-job fit methods by learning more effective text representations or text match models (Qin et al., 2018; Jiang et al., 2019).
Although these methods have achieved significant progress in person-job fit, they rely on a considerable amount of labeled data to learn the match models. When labeled data is insufficient, existing methods may not work as well as expected. Therefore, the deficiency in labeled data poses a significant challenge to person-job fit algorithms. The availability of labeled data varies widely across job categories. A job position from a major category usually attracts hundreds of job applicants, while one from a minor category receives very few applications. Besides, due to the emergence of new positions, data accumulation under these new job categories is limited. To address this issue, we investigate domain adaptation for person-job fit, focusing on the text match between job postings and candidate resumes. We aim to utilize the knowledge acquired from a source domain with sufficient labeled data to improve the prediction performance in a target domain with limited labeled data.
Text-based domain adaptation has been extensively studied in the literature for tasks such as text classification (Glorot et al., 2011; Ziser and Reichart, 2017). However, these works mainly focus on how to model an individual document instead of the match for a document pair, and so are not directly applicable to our task. As for existing person-job fit methods (Qin et al., 2018), they mainly learn an overall representation for both a job posting and a candidate resume. Then, the match score is measured via the similarity between the two overall representations, which is difficult to extend for domain adaptation. A major reason is that job postings and resumes are written on top of basic semantic units (e.g., sentences) related to skills, abilities or experiences, so we have to model the global interaction between fine-grained semantic units for an accurate match. Such global match information is important to consider for effective domain adaptation.
To address these difficulties, we first design a deep global match network for modeling person-job fit in a single domain. This model can capture comprehensive interaction information based on all pairwise sentence interactions given a job posting and a resume. Furthermore, we extend the match network for domain adaptation at three levels. First, to overcome the semantic gap or language variation across different domains, the sentence representations are enhanced via structural correspondence learning. The derived sentence representations are more transferable across domains. Second, the sentence-level match function shares parameters across domains to improve the match accuracy in target domains. We construct a match matrix to contain all the sentence-level match results. Third, we design a convolution-based component to learn transferable match patterns across domains. Although different domains have varying semantic representations or match functions, the global match information reflected in the match matrix should share similar patterns.
To our knowledge, this is the first time that domain adaptation has been studied for the person-job fit task. To evaluate our proposed model, we construct a large real-world dataset containing two large and four small domains from an online recruitment platform. Extensive experimental results demonstrate the effectiveness of the proposed model, especially when the labeled data is limited.

Related Work
As an important task in recruitment data mining (Li et al., 2017a;Oentaryo et al., 2018), person-job fit has been extensively studied in the literature. Early methods include treating person-job fit as a job/candidate recommendation problem (Lu et al., 2013;Diaby et al., 2013) and extending users' historical behavior by job application records for better match (Hong et al., 2013). Recently, deep learning methods have been utilized to design a hierarchical representation structure based on historical recruitment records (Qin et al., 2018;Ramanath et al., 2018).
Domain adaptation is a hot topic in natural language processing, which has received much attention over the decades. Among the early works, Structural Correspondence Learning (SCL) (Blitzer et al., 2006, 2007) is a classic algorithm which learns transferable feature representations across domains.
With the revival of deep learning, several studies focus on extending neural network models for domain adaptation (Yu and Jiang, 2016; Ziser and Reichart, 2017). To further eliminate the distribution differences between source and target domains, the adversarial mechanism has been incorporated into domain adaptation (Li et al., 2017b; Ganin et al., 2016).
Besides, our work is also related to semantic text match (Mueller and Thyagarajan, 2016; Yin et al., 2016; Wang et al., 2017), especially the work which casts text match as image recognition (Pang et al., 2016) or captures both global and local interactions (Dai et al., 2018; Rao et al., 2018).
Our work builds on these related studies. To our knowledge, domain adaptation for person-job fit has seldom been studied before. We focus on how to transfer the match information between job postings and resumes across domains.

Problem Formulation
We assume that the text content of a job posting and a candidate resume is available as input. A job posting $p$ is described as $n_p$ sentences $\{w^{(p)}_i\}_{i=1}^{n_p}$, in which $w^{(p)}_i$ is a sequence of word tokens $\{w^{(p)}_{i,k}\}$, where $w^{(p)}_{i,k}$ denotes the $k$-th word of the $i$-th sentence in $p$. Each job posting sentence typically describes the skill or ability that is required for the position. Similarly, a resume $r$ consists of $n_r$ sentences $\{w^{(r)}_i\}_{i=1}^{n_r}$, in which $w^{(r)}_i$ is a sequence of $n_{r,i}$ word tokens denoted by $\{w^{(r)}_{i,k}\}$. We use a label $y$ to denote the match result for a job posting $p$ and a candidate resume $r$.

Table 1: Notations used in this paper.
$w_{i,k}$: the $k$-th word of the $i$-th sentence in $p$ and $r$
$y$: the match result for a person-job pair based on $p$ and $r$
$D_t$, $D_s$: the training sets for the target domain $t$ and the source domain $s$
$n_t$, $n_s$: the numbers of target labeled instances and source labeled instances
$h_{i,k}$: the hidden states learned from the BiGRU network for the $k$-th word of the $i$-th sentence in $p$ and $r$
$L_P$, $L_R$: the dimension sizes of the hidden states for $p$ and $r$
$M$: the match matrix ($\in \mathbb{R}^{n_p \times n_r}$) for the sentence similarity between $p$ and $r$
$W$: the weight matrix ($\in \mathbb{R}^{K \times K}$), where $K$ is the dimension size
$m_{p,r}$: the match representation derived by stacked convolutional layers with max-pooling layers
$\Theta$: all the involved parameters
$h_p$, $h_r$: the representations of $p$ and $r$
$\hat{y}$: the output score calculated by the Multi-Layer Perceptron
Following Qin et al. (2018), we cast the person-job fit problem as a text match task based on the text information of job postings and candidate resumes. The person-job fit task aims to learn a match function that is able to predict the actual label for a new person-job pair from the target domain.
Furthermore, we consider a domain adaptation setting for such a match task. For a target domain $t$, a training set $D_t$ of $n_t$ labeled instances is given. Since the target domain is considered to be new or minor, there is very little labeled data. We assume that an auxiliary training set $D_s = \{\langle p^s_j, r^s_j, y_j \rangle\}_{j=1}^{n_s}$ from another source domain $s$ is given, consisting of $n_s$ labeled instances.
The source domain (or auxiliary domain) is selected from an existing domain with sufficient match records. In this case, we have $n_t \ll n_s$. For simplicity, we will drop the domain index unless needed, which is the target domain by default. The notations used in this paper are summarized in Table 1.

The Proposed Model
In this section, we first present a deep global match network for person-job fit using only target domain data. Then, we extend the match network by leveraging the labeled data of an auxiliary domain.

A Deep Global Match Network for Single-Domain Person-Job Fit
Previous studies (Shen et al., 2018) mainly focus on modeling the overall interaction between the representations of a job and a candidate. They cannot explicitly capture global match information in terms of fine-grained semantic units. In this section, we propose a deep global match network which is able to characterize global sentence-level interaction for semantic match between a job posting and a candidate resume.
Hierarchical Attention-based RNN Encoder. First, we employ bi-directional recurrent neural networks with gated recurrent units (BiGRU) to model both sentences and documents (i.e., a job posting or a resume).
Formally, let $\overrightarrow{h}^{(p)}_{i,k}$ and $\overleftarrow{h}^{(p)}_{i,k}$ denote the forward and backward states learned from the BiGRU network for the $k$-th word of the $i$-th sentence in job posting $p$. We concatenate the two directional representations as the representation of a word:

$$h^{(p)}_{i,k} = [\overrightarrow{h}^{(p)}_{i,k}; \overleftarrow{h}^{(p)}_{i,k}].$$

Since some words in a sentence are more important than others, we apply the attention mechanism (Qin et al., 2018) to derive the sentence representation $h^{(p)}_i \in \mathbb{R}^{L_P}$:

$$h^{(p)}_i = \sum_{k} \alpha_k h^{(p)}_{i,k}, \quad (1)$$

where $\alpha_k$ is the attention weight of the $k$-th word (Eq. 2). Similarly, we can derive the sentence representations for a resume, denoted by $h^{(r)}_i \in \mathbb{R}^{L_R}$. Based on the learned sentence representations, we apply a similar attentional BiGRU network to derive the overall representations for a job posting and a resume, denoted by $h_p$ (Eq. 3) and $h_r$ (Eq. 4), where the attention weights $\beta$ are defined in a similar way as Eq. 2.
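The attention-based pooling described above can be illustrated with a minimal sketch. It assumes that the attention weights are a softmax over dot-product scores against a learned vector; that scoring function, the name `context`, and all numbers are hypothetical stand-ins, since the exact attention form follows Qin et al. (2018) and is not reproduced here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(word_states, context):
    """Return (sentence_vector, weights) for one sentence.

    word_states: list of word vectors, each already the concatenation of
                 the forward and backward BiGRU states.
    context:     hypothetical learned vector used to score each word
                 (an assumption of this sketch, not the paper's formula).
    """
    scores = [sum(h * c for h, c in zip(state, context)) for state in word_states]
    alphas = softmax(scores)
    dim = len(word_states[0])
    pooled = [sum(a * state[j] for a, state in zip(alphas, word_states))
              for j in range(dim)]
    return pooled, alphas

# Toy forward/backward states for a three-word sentence (made-up numbers).
forward = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4]]
backward = [[0.2, 0.0], [-0.1, 0.6], [0.3, 0.3]]
word_states = [f + b for f, b in zip(forward, backward)]  # list concatenation
sentence_vec, alphas = attention_pool(word_states, context=[0.5, -0.5, 0.25, 1.0])
```

The same pooling routine could be reused one level up, over sentence states, to obtain a document representation.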
Global Match Representation. After encoding the sentences, we further model global semantic interactions between the sentence representations of a job and a resume. Specifically, we compute the similarities between each sentence in a job posting and each sentence in a candidate resume. Assume that a job posting $p$ has $n_p$ sentences and a resume $r$ has $n_r$ sentences. We can derive a match matrix $M \in \mathbb{R}^{n_p \times n_r}$ for modeling the global semantic interactions. Formally, to calculate the sentence similarity in $M$, we apply a bilinear form as

$$M_{i,i'} = {h^{(p)}_i}^{\top} W h^{(r)}_{i'}, \quad (5)$$

where $W \in \mathbb{R}^{K \times K}$ is the weight matrix, $h^{(p)}_i$ is the representation of the $i$-th sentence of job posting $p$, and $h^{(r)}_{i'}$ is the representation of the $i'$-th sentence of resume $r$. Inspired by the recent work on image-based text match models (Pang et al., 2016), we propose to use a convolution-based method to model the match information. Formally, we stack two 2-D convolutional layers with two 2-D max-pooling layers in an interleaving way, and derive a match representation $m_{p,r}$ as

$$m_{p,r} = \mathrm{CNN}(M; \Theta), \quad (6)$$

where $\mathrm{CNN}(\cdot)$ denotes the stacked convolutional and max-pooling layers, $m_{p,r}$ summarizes the match information from global sentence interactions, and $\Theta$ denotes all the involved parameters.
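As a rough illustration of the two steps above, the following sketch builds a match matrix with a bilinear form and then applies two convolution layers interleaved with two max-pooling layers. It is a simplification under stated assumptions: a single channel, no activation function, hand-picked kernel and pooling sizes, and random toy vectors. All function names are hypothetical.

```python
import random

def bilinear_match(job_sents, resume_sents, W):
    """M[i][j] = job_sents[i]^T * W * resume_sents[j]."""
    M = []
    for h_p in job_sents:
        row = []
        for h_r in resume_sents:
            Wh = [sum(w * x for w, x in zip(w_row, h_r)) for w_row in W]
            row.append(sum(a * b for a, b in zip(h_p, Wh)))
        M.append(row)
    return M

def conv2d_valid(X, kernel):
    """Single-channel 2-D cross-correlation without padding (no activation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(X) - kh + 1, len(X[0]) - kw + 1
    return [[sum(X[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(X, size):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    out_h, out_w = len(X) // size, len(X[0]) // size
    return [[max(X[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

def match_representation(M, kernel1, kernel2, pool=2):
    """Two convolution layers interleaved with two max-pooling layers."""
    x = max_pool2d(conv2d_valid(M, kernel1), pool)
    x = max_pool2d(conv2d_valid(x, kernel2), pool)
    return [v for row in x for v in row]  # flatten to a vector

# Toy example: 8 job sentences, 8 resume sentences, K = 3 (random values).
rng = random.Random(0)
K = 3
job = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(8)]
resume = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(8)]
W = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]  # identity
M = bilinear_match(job, resume, W)       # 8 x 8 match matrix
k1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
m_pr = match_representation(M, k1, k2)   # 8x8 -> 6x6 -> 3x3 -> 2x2 -> 1x1
```

With several kernels per layer, the flattened outputs would be concatenated into a longer match representation; a practical implementation would rely on a deep learning framework rather than nested lists.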
Predicting the Match Label. With the learned match representation $m_{p,r}$, we concatenate it with the representations of the job posting and the resume as the input for the predictor,

$$\hat{y} = \mathrm{MLP}([h_p; h_r; m_{p,r}]), \quad (7)$$

where $h_p$ and $h_r$ are the representations of job posting $p$ and resume $r$ defined in Eq. 3 and Eq. 4 respectively, and $\mathrm{MLP}(\cdot)$ is the Multi-Layer Perceptron with a nonlinear layer and a sigmoid layer. Compared with previous work (Qin et al., 2018), a major novelty of this match model lies in the fact that it explicitly models the match information between a job posting and a resume. It is able to model sentence-level interactions between two text documents.
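The predictor can be sketched as follows. The choice of tanh for the nonlinear layer, the layer sizes, and all weights are assumptions made for illustration only; the text specifies just a nonlinear layer followed by a sigmoid layer.

```python
import math

def mlp_score(h_p, h_r, m_pr, W1, b1, w2, b2):
    """Concatenate the three inputs, apply one tanh layer, then a sigmoid unit."""
    features = h_p + h_r + m_pr                      # list concatenation
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Toy inputs and weights (made-up numbers): 5 input features, 2 hidden units.
h_p, h_r, m_pr = [0.2, -0.1], [0.4, 0.3], [0.9]
W1 = [[0.5, -0.3, 0.1, 0.2, 1.0], [-0.4, 0.6, 0.0, 0.3, -0.5]]
b1 = [0.0, 0.1]
w2, b2 = [1.2, -0.7], 0.05
score = mlp_score(h_p, h_r, m_pr, W1, b1, w2, b2)    # value in (0, 1)
```

A score above a chosen threshold (for example 0.5) would be read as a predicted match.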

Domain Adaptation by Transferring Local and Global Match Information
In this part, we study how to extend the proposed deep global match network for domain adaptation. Instead of directly enriching the overall representation of a training instance, we seek a way to transfer local and global match information.
Enhanced Sentence Representation with Structural Correspondence. Different domains may have significant language variation or semantic gaps, which makes it difficult to derive transferable sentence representations across domains. Inspired by the classic SCL algorithm (Blitzer et al., 2007), we propose to model the structural correspondence of sentences in different domains by using pivot keywords. Consider an example of two snippets from two different job domains:

$S_1$: grasp C programming skills
$S_2$: grasp picture editing skills

Although the semantics of computer programming and picture editing are very different, the two snippets express aligned skill requirements for the two domains via the pivot word grasp. By pre-selecting a number of high-quality pivot words, the SCL algorithm is able to learn such semantic alignment from large-scale co-occurrence data. Specifically, the SCL algorithm learns a mapping function $f_{\mathrm{SCL}}(\cdot)$ that transforms an original representation into a more transferable representation (Eq. 8). For the $i$-th sentence of a job posting, we derive its enhanced representation by concatenating the original representation in Eq. 1 and the transformed representation obtained with the SCL algorithm:

$$\tilde{h}^{(p)}_i = h^{(p)}_i \oplus f_{\mathrm{SCL}}(h^{(p)}_i), \quad (9)$$

where "$\oplus$" is the vector concatenation operation. Similarly, we can obtain the enhanced resume sentence representation, denoted by $\tilde{h}^{(r)}_i$. With $\tilde{h}^{(p)}_i$ and $\tilde{h}^{(r)}_i$, we can update the overall representations $h_p$ and $h_r$ in Eq. 3 and Eq. 4 to $\tilde{h}_p$ and $\tilde{h}_r$.
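To make the pivot idea more concrete, the sketch below shows two pieces under explicit assumptions. First, it builds the pivot-prediction examples used in classic SCL, where each pivot word is masked and a classifier would be trained to predict its presence from the remaining words. Second, it concatenates a representation with a linear projection of it. Training the pivot predictors and deriving the projection via SVD are omitted; the matrix `theta` is a hypothetical hand-set stand-in for the learned mapping, and the third toy sentence is invented.

```python
def pivot_tasks(sentences, pivots):
    """For each pivot word, build (non-pivot tokens, contains_pivot) examples."""
    tasks = {}
    for pivot in pivots:
        examples = []
        for sent in sentences:
            tokens = sent.lower().split()
            features = [t for t in tokens if t not in pivots]  # mask all pivots
            examples.append((features, int(pivot in tokens)))
        tasks[pivot] = examples
    return tasks

def enhance(h, theta):
    """Concatenate h with its projection theta * h (theta stands in for f_SCL)."""
    projected = [sum(t * x for t, x in zip(row, h)) for row in theta]
    return h + projected

sentences = [
    "grasp C programming skills",
    "grasp picture editing skills",
    "five years of sales experience",   # invented sentence without the pivot
]
tasks = pivot_tasks(sentences, pivots={"grasp"})

theta = [[0.5, 0.5, 0.0]]               # hypothetical 1 x 3 projection
h_enhanced = enhance([0.2, 0.4, -0.1], theta)   # length 3 + 1 = 4
```

Sentences from different domains that share the pivot yield positive examples for the same prediction task, which is what lets the learned projection align their non-pivot vocabulary.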
Transferring Sentence-level Match Information. After obtaining the new sentence representations, we study how to transfer the parameters for modeling sentence-pair match. As shown in Eq. 5, we incorporate a transformation matrix $W$ to compute sentence similarities. A simple transfer strategy is to share the whole matrix between the target and source domains. However, such a method makes it less flexible to capture domain-specific match information, since the local match information is likely to vary across domains. Hence, we propose to factorize the matrix $W$ into a product of two smaller matrices $A \in \mathbb{R}^{l \times K}$ and $B_d \in \mathbb{R}^{l \times K}$:

$$W = A^{\top} B_d, \quad (10)$$

where $A$ is shared by all the domains, while $B_d$ is specific to domain $d$. In this factorization, we share parameters across domains and meanwhile set domain-specific parameters to better capture domain-specific information.
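The factorization can be illustrated with a small sketch. It assumes the product is taken as the transpose of the shared factor times the domain-specific factor, which is the arrangement that yields a K x K matrix from two l x K factors; the toy values and domain names are made up.

```python
def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def domain_weight(A, B_d):
    """W_d = A^T B_d with A shared (l x K) and B_d domain-specific (l x K)."""
    return matmul(transpose(A), B_d)

l, K = 1, 3
A = [[1.0, 0.0, 2.0]]                    # shared across domains
B = {
    "source": [[1.0, 1.0, 1.0]],         # domain-specific factors
    "target": [[0.5, -1.0, 0.0]],
}
W_source = domain_weight(A, B["source"])  # K x K
W_target = domain_weight(A, B["target"])  # K x K, reuses the same A
```

Each domain then needs only l*K private parameters on top of the l*K shared ones, instead of K*K per domain, and the resulting matrix has rank at most l.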
Transferring Global Match Information. As shown in Eq. 5, we have constructed a match matrix consisting of pairwise sentence similarities between a job posting and a resume. Our idea is that although sentence-level representation or match differs across domains, global semantic interactions are likely to show similar patterns. For example, the majority of required skills or abilities in the job posting should be well covered for a good candidate, corresponding to large entries in the match matrix $M$. Based on this idea, we propose to transfer the parameters of the convolution module across domains. Specifically, we first use the rich training data of the auxiliary domain $s$ to train the parameters in Eq. 6, and obtain the learned parameters $\hat{\Theta}_s$. We assume that the mapping process from the match matrix to the final match representation (i.e., $M \rightarrow m_{p,r}$) is transferable. Hence, we directly reuse $\hat{\Theta}_s$ trained from the source domain. To enhance the modeling capacity, we also train a domain-specific convolution component with the parameters $\hat{\Theta}_t$. For the target domain, we derive the new match representation by concatenating the outputs of both components with the parameters $\hat{\Theta}_s$ and $\hat{\Theta}_t$:

$$\tilde{m}_{p,r} = \mathrm{CNN}(M; \hat{\Theta}_s) \oplus \mathrm{CNN}(M; \hat{\Theta}_t), \quad (11)$$

where $\mathrm{CNN}(M; \Theta)$ denotes the convolution-based mapping of Eq. 6 and "$\oplus$" is the vector concatenation operation.
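The combination of a reused source-trained component and a target-specific component can be sketched as below. To keep the example short, each component is reduced to one convolution per kernel followed by global max pooling, which is a simplification of the stacked layers described earlier; the kernels and the toy match matrix are made up.

```python
def conv_global_max(M, kernel):
    """One 2-D cross-correlation (no padding) followed by global max pooling."""
    kh, kw = len(kernel), len(kernel[0])
    responses = [sum(M[i + a][j + b] * kernel[a][b]
                     for a in range(kh) for b in range(kw))
                 for i in range(len(M) - kh + 1)
                 for j in range(len(M[0]) - kw + 1)]
    return max(responses)

def component(M, kernels):
    """Apply every kernel of one convolution component to the match matrix."""
    return [conv_global_max(M, k) for k in kernels]

def transferred_match_repr(M, theta_s, theta_t):
    """Concatenate source-trained (reused as-is) and target-specific outputs."""
    return component(M, theta_s) + component(M, theta_t)

# Toy 3 x 3 match matrix with large values along the diagonal.
M = [[0.9, 0.1, 0.0],
     [0.2, 0.8, 0.1],
     [0.0, 0.3, 0.7]]
theta_s = [[[1, 0], [0, 1]]]   # stands in for kernels learned on the source domain
theta_t = [[[0, 1], [1, 0]]]   # stands in for kernels learned on the target domain
rep = transferred_match_repr(M, theta_s, theta_t)
```

In this toy case the diagonal-detecting kernel responds more strongly than the anti-diagonal one, mirroring the intuition that well-covered requirements show up as structured large entries in the match matrix.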

The Final Model and Training
To predict the match label for domain adaptation, we replace $h_p$, $h_r$ and $m_{p,r}$ in Eq. 7 with their enhanced counterparts $\tilde{h}_p$, $\tilde{h}_r$ and $\tilde{m}_{p,r}$ respectively. We present an overview sketch of the proposed model in Fig. 1, where we have highlighted the three points for transferring information across domains. Different from previous text-based domain adaptation methods, we focus on the transfer of match information, including both local and global semantic interactions. To optimize our model, we adopt the binary cross-entropy loss over the entire training data as the total loss. To learn the model parameters, we adopt the Adam optimizer (Kingma and Ba, 2014). In order to avoid overfitting, we adopt the dropout strategy with a rate of 0.1. More parameter settings can be found in Table 2.
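For reference, the binary cross-entropy objective mentioned above can be written as a short function. Averaging over instances (rather than summing) and clipping scores with `eps` are choices of this sketch, not details taken from the paper.

```python
import math

def binary_cross_entropy(labels, scores, eps=1e-12):
    """Mean binary cross-entropy; scores are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, scores):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# Confident correct predictions give a lower loss than hesitant ones.
loss_confident = binary_cross_entropy([1, 0], [0.9, 0.1])
loss_hesitant = binary_cross_entropy([1, 0], [0.6, 0.4])
```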

Experiments
In this section, we first set up the experiments, and then report the results and analysis.

Experimental Setup
We evaluate our model on a large real-world dataset provided by "BOSS Zhipin" (the BOSS Recruiting), the largest online recruiting platform in China. To protect the privacy of candidates, all the records have been anonymized by deleting identity information.
Since the original amount of recruitment data is huge, we randomly sample a fraction of the entire data, containing job postings and resumes within a time period of six months. Specifically, the dataset covers six job domains: two major domains have a large number of person-job pairs and four minor domains have a limited number of person-job pairs. The match label information is obtained according to the acceptance status of the candidates, provided by the online recruiting platform. In our dataset, the ratio between matched and unmatched instances is 1:1. (There were more negative instances in the original data; we randomly sample an equal number of negative instances to reduce the data bias.) We perform basic preprocessing steps on the text data, including tokenization and stopword removal. The statistics of the dataset are summarized in Table 3. Our code and data are available at https://github.com/RUCAIBox/Person-Job-Fit.
We consider two evaluation settings, namely single domain evaluation and domain adaptation evaluation. For all domains, we first split the entire data set with a ratio of 1:1 into a training set and a test set. Single domain evaluation examines the performance of a model trained with domain-specific training data for each of the six domains, while domain adaptation evaluation examines the performance of a model trained with the data from both the source and target domains. For domain adaptation, we consider the four minor domains as target domains, and the major domains as source domains.
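The balancing and splitting procedure described above could look roughly like the following sketch. The tuple layout, the function name, the fixed seed, and the use of a plain (non-stratified) shuffle are assumptions for illustration; the released code may organize this differently.

```python
import random

def balance_and_split(instances, seed=0):
    """Downsample to a 1:1 label ratio, then split 1:1 into train and test.

    instances: list of (job_text, resume_text, label) with label in {0, 1}.
    """
    rng = random.Random(seed)
    positives = [x for x in instances if x[2] == 1]
    negatives = [x for x in instances if x[2] == 0]
    n = min(len(positives), len(negatives))
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    half = len(balanced) // 2
    return balanced[:half], balanced[half:]

# Toy data: 4 matched pairs and 10 unmatched pairs (placeholder strings).
toy = [("job %d" % i, "resume %d" % i, 1) for i in range(4)] + \
      [("job %d" % i, "resume %d" % i, 0) for i in range(4, 14)]
train, test = balance_and_split(toy)
```

Because the shuffle is not stratified, the label ratio inside each half is only approximately 1:1 on small samples.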
Since our task is cast as a classification task, we adopt four commonly used evaluation metrics, including Accuracy, Precision, Recall and F1.
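The four metrics can be computed from binary labels and binary predictions as in the sketch below. Treating the matched class (label 1) as the positive class and returning 0.0 when a denominator is zero are conventions assumed here.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy example: 2 true positives, 1 true negative, 1 false positive, 1 false negative.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```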

Comparison Methods
Single Domain. For single domain evaluation, we consider the following methods as baselines:
• DSSM (Huang et al., 2013): It utilizes a deep neural network (DNN) to map high-dimensional sparse features into low-dimensional dense features, and calculates the semantic similarity of the text pair.
• BPJFNN (Qin et al., 2018): It applies BiLSTMs to obtain the semantic representation of each word in job postings and resumes, and considers the text content as a single sequence.
• PJFNN: It proposes a bipartite convolutional neural network that can effectively learn the joint representation of person-job fitness from historical job applications.
• APJFNN (Qin et al., 2018): It learns a word-level semantic representation for both job requirements and resumes based on RNN and four hierarchical ability-aware attention strategies.
Domain Adaptation. For domain adaptation evaluation, we consider three baseline methods:
• SCL-MI (Blitzer et al., 2007): It proposes the SCL algorithm, where pivot features are selected based on mutual information in the unlabeled data of both the source and target domains.
• AE-SCL-SR (Ziser and Reichart, 2017): It proposes to find a shared low dimensional representation in order to overcome the domain adaptation problem.
• ASP-MTL (Liu et al., 2017): It proposes an adversarial multi-task learning framework that prevents the shared and private latent feature spaces from interfering with each other.
For our deep match network, we also prepare three simple single-domain variants: (1) Tgt-Only trains the model with only target domain data, (2) Src-Only trains the model with only source domain data, and (3) Mixed trains the model with a simple mixture of source and target domain data.

Results and Analysis
In this part, we conduct a series of experiments on the effectiveness of the proposed model for the person-job fit task.
Single Domain Results. We first report the results on single domain evaluation, where we only use the training data from the target domain. In Table 4, we compare our model with three recently proposed methods for person-job fit. Among the three baselines, the APJFNN method performs better than the other two methods. It adopts a fine-grained attention mechanism based on skills and abilities. Our (single domain) model outperforms all the baselines. The proposed deep global match network extends APJFNN by explicitly modeling the global match information for the sentence representations of job postings and resumes. These results indicate that the global match information is useful for improving the performance of the person-job fit task.
Domain Adaptation Results. We select the domains of technology and sales as the source domains, since both contain much more training data, while the remaining domains are selected as the target domains. Table 5 presents the performance of different comparison methods for domain adaptation. First, for the three variants of our single-domain model, the performance order is Src-Only < Mixed < Tgt-Only. The result indicates that directly incorporating the source domain data as training data may hurt the performance for the target domain. Second, ASP-MTL performs best among the three baselines. It sets up both shared and private feature representations, and applies the adversarial mechanism to prevent the shared and private latent feature spaces from interfering with each other. Finally, our model outperforms all the baselines by a large margin. Previous methods (e.g., ASP-MTL) learn an overall representation for a document, which cannot characterize global semantic interaction in fine-grained semantic units. In comparison, our deep global match network is able to transfer information at three levels, namely sentence representation, sentence-level match and global match. Different from existing text-based domain adaptation methods, our model is tailored to transfer the match information between a document pair.

[Figure: Domain adaptation with different train/test splitting ratios in the target domain for "Technology→Car".]
Ablation Analysis. Recall that we proposed three transfer techniques to improve the performance for domain adaptation in Section 4.2, including sentence representation (denoted by SR), sentence-level match (denoted by SM) and global match (denoted by GM). Now we examine the effect of each factor on the prediction performance. Each time, we remove an individual factor while keeping the rest. Due to space limitations, we only report the results for the cases "Technology→Cars" and "Sales→Product". The results are shown in Table 6.
Parameters Sensitivity Analysis. In our model, we have two important parameters, namely the dimension number K of the matrix W in Eq. 10 and the number of pivot words in the SCL algorithm. We vary the two parameters and present the tuning results in Fig. 3. We select the best baseline ASP-MTL as a comparison. As we can see, the performance of our model is relatively stable w.r.t. the variation of the two parameters.

Qualitative Analysis
A key point of our model is that it can transfer the global match information derived by computing the pairwise similarities between sentences from a job posting and sentences from a resume. Recall that we reuse the convolution-based component trained with the source domain data. Now, we study how such a mechanism works and why it is useful for improving the performance.
In Fig. 4, we present four person-job pairs from our dataset containing two matched pairs and two unmatched pairs. In a subfigure, each row corresponds to a sentence in a job posting, while a column corresponds to a sentence in a resume. Due to space limit, we only select six sentences for each document. We follow the method in Eq. 5 to construct the match matrix M. We use the darkness degree to indicate the similarity between two sentences. For ease of understanding, we further perform the row and column permutation to make the large entries distributed in blocks.
From Fig. 4, we can make the following observations. First, global match matrices have a significant difference between a matched and an unmatched case. Although the match matrix for an unmatched pair also contains many large values, they are usually located in off-diagonal entries. Comparing Fig. 4(a) and Fig. 4(c), it is interesting to see that they share a similar distribution pattern, i.e., large values are mainly distributed along the diagonal line and these values are also aggregated in small blocks. Such a phenomenon indicates that the majority of the required skills have been well covered by a candidate in a matched pair. Without considering global semantic interactions, it is difficult to correctly predict the label for the unmatched pairs in Fig. 4(b) and Fig. 4(d), since they are likely to have a large similarity measured by the overall representations.

Conclusion
This paper studied domain adaptation for person-job fit. We first proposed a deep global match network for the single-domain match setting. Then, we extended the proposed model for domain adaptation in three aspects. We conducted extensive experiments on a large real-world recruitment dataset containing six job domains. The results have demonstrated the effectiveness of our model in terms of the prediction accuracy for person-job fit, especially when the training data is limited. Our current characterization with global match information provides a good basis for developing interpretable domain adaptation models, i.e., for showing what kind of information has been transferred across domains. As future work, we will work along this line and investigate the design of interpretable solutions to domain adaptation for person-job fit. We will also consider how to model the domain relationship for effectively transferring information across domains.