PrivNet: Safeguarding Private Attributes in Transfer Learning for Recommendation

Transfer learning is an effective technique to improve a target recommender system with the knowledge from a source domain. Existing research focuses on the recommendation performance of the target domain while ignoring the privacy leakage of the source domain. The transferred knowledge, however, may unintentionally leak private information of the source domain. For example, an attacker can accurately infer user demographics from the purchase history provided by a source domain data owner. This paper addresses this privacy-preserving issue by learning a privacy-aware neural representation that improves target performance while protecting source privacy. The key idea is to simulate the attacks during training in order to protect unseen users' privacy in the future. The simulation is modeled by an adversarial game, so that the transfer learning model becomes robust to attacks. Experiments show that the proposed PrivNet model can successfully disentangle the knowledge that benefits the transfer from the knowledge that leaks privacy.


Introduction
Recommender systems (RSs) are widely used in everyday life, ranging from Amazon products (Zhou et al., 2018; Wan et al., 2020) and YouTube videos (Gao et al., 2010; Cheng et al., 2016) to Twitter microblogs and news feeds (Wang et al., 2018a; Ma et al., 2019b). RSs estimate user preferences on items from their historical interactions. RSs, however, cannot learn a reliable preference model if there are too few interactions, as in the case of new users and items; that is, they suffer from data sparsity.
Transfer learning is an effective technique for alleviating data sparsity by exploiting the knowledge from related domains (Pan et al., 2010; Liu et al., 2018). We may infer user preferences on videos from their Tweet texts (Huang and Lin, 2016), or transfer from movies to books (Li et al., 2009) and from news to apps (Hu et al., 2018, 2019). These behaviors across domains are different views of the same user and may be driven by some inherent user interests (Elkahky et al., 2015).
There is a privacy concern when the source domain shares its data with the target domain, due to ever-increasing user data abuse and privacy regulations (Ramakrishnan et al., 2001). Private information contains those attributes that users do not want to disclose, such as gender and age (Jia and Gong, 2018). Such attributes can be used to train better recommendation models, because they alleviate data sparsity and help build better user profiles (Zhao et al., 2014; Cheng et al., 2016). Previous work (Weinsberg et al., 2012; Beigi et al., 2020) shows that an attacker can accurately infer a user's gender, age, and occupation from their ratings, recommendation results, and a small number of users who reveal their demographics.
A technical challenge for protecting user privacy in transfer learning is that the transferred knowledge has dual roles: usefulness to improve target recommendation and uselessness to infer source user privacy. In this work, we propose a novel model (PrivNet) to achieve the two goals by learning privacy-aware transferable knowledge such that it is useful for improving recommendation performance in the target domain while it is useless to infer private information of the source domain. The key idea is to simulate the attack during training in order to protect unseen users' privacy in the future. The privacy attacker and the recommender are naturally modeled by an adversarial learning game. The main contributions are two-fold:
• PrivNet is the first to address the privacy protection issues, i.e., protecting source user private attributes while improving the target performance, during the knowledge transfer in neural recommendation.
• PrivNet achieves a good tradeoff between the utility and privacy of the source information through evaluation on real-world datasets by comparing with strategies of adding noise (i.e., differential privacy) and perturbing ratings.
Related Work

Transfer learning in recommendation
Transfer learning in recommendation (Cantador et al., 2015) is an effective technique to alleviate the data sparsity issue in one domain by exploiting the knowledge from other domains. Typical methods apply matrix factorization (Singh and Gordon, 2008; Pan et al., 2010; Yang et al., 2017b) and representation learning (Man et al., 2017; Yang et al., 2017a; Gao et al., 2019a; Ma et al., 2019a) on each domain and share the user (item) factors, or learn a cluster-level rating pattern (Li et al., 2009; Yuan et al., 2019). Transfer learning aims to improve the target performance by exploiting knowledge from auxiliary domains (Pan and Yang, 2009; Elkahky et al., 2015; Zhang and Yang, 2017; Chen et al., 2019; Gao et al., 2019b). One transfer strategy (two-stage) is to initialize a target network with transferred representations from a pre-trained source network (Oquab et al., 2014; Yosinski et al., 2014). Another transfer strategy (end-to-end) is to transfer knowledge in a mutual way such that the source and target networks benefit from each other during training, with examples including the cross-stitch networks (Misra et al., 2016) and collaborative cross networks (Hu et al., 2018). These transfer learning methods have access to the input or the representations from the source domain. This raises a concern about privacy leaks and opens up the possibility of an attack during knowledge transfer.

Privacy-preserving techniques
Existing privacy-preserving techniques mainly belong to three research threads. One thread adds noise (e.g., differential privacy (Dwork et al., 2006)) to the released data or the output of recommender systems (McSherry and Mironov, 2009; Jia and Gong, 2018; Meng et al., 2018; Wang et al., 2018b; Wang and Zhou, 2020). A second thread perturbs user profiles, for example by adding (or deleting/changing) dummy items in the user history so that the user's actual ratings are hidden (Polat and Du, 2003; Weinsberg et al., 2012). Adding noise and perturbing ratings may still suffer from privacy inference attacks when the attacker can successfully distinguish the true profiles from the noisy/perturbed ones. Furthermore, they may degrade performance since the data is corrupted. A third thread uses an adversary loss (Resheff et al., 2019; Beigi et al., 2020) to formulate the privacy attacker and the recommender system as an adversarial learning problem. However, these methods face the data sparsity issues. A recent work (Ravfogel et al., 2020) trains linear classifiers to predict a protected attribute and then removes it by projecting the representation onto their null-space. Some other work uses encryption and federated learning so as to protect the personal data without affecting performance (Nikolaenko et al., 2013; Chen et al., 2018). These methods suffer from efficiency and scalability problems due to the high cost of computation and communication.

Problem Statement
We have two domains, a source domain $S$ and a target domain $T$. The user set is shared by the two domains and is denoted by $U$ (of size $m = |U|$). Denote the item sets in the two domains by $I_S$ and $I_T$ (of sizes $n_S = |I_S|$ and $n_T = |I_T|$), respectively. For the target domain, a binary matrix $R_T \in \mathbb{R}^{m \times n_T}$ describes the user-item interactions, where the entry $r_{ui} \in \{0, 1\}$ equals 1 if user $u$ has an interaction with item $i$ and 0 otherwise. Similarly, for the source domain, we have $R_S \in \mathbb{R}^{m \times n_S}$ with entries $r_{uj} \in \{0, 1\}$. We reserve $i$ and $j$ to index the target and source items, respectively. Let $Y_p \in \mathbb{R}^{m \times c_p}$ denote the matrix of the $p$-th private user attribute (e.g., $p$ = 'Gender'), where each entry $y_{u,p}$ is the value of the $p$-th private attribute for user $u$ (e.g., $y_{u,p}$ = 'Male') and there are $c_p$ choices. Denote the data of all $n$ private attributes (e.g., Gender, Age) by $Y = \{Y_p\}_{p=1}^{n}$. We define the problem (Problem 1) as follows:
REQUIRE: It is difficult for an attacker to infer the source user private attributes from the knowledge transferred to the target domain.
ASSUMPTION: Some users $U^{pub} \subset U$ share their private information in their public profile.

The Proposed Framework
The architecture of PrivNet is shown in Figure 2. It has two components, a recommender and an attacker. We introduce the recommender (Section 4.1) and present an attack against it (Section 4.2). We propose PrivNet to protect source user privacy during the knowledge transfer (Section 4.3).

Recommender
In this section, we introduce a novel transfer-learning recommender which has three parts: a source network for the source domain, a target network for the target domain, and a knowledge transfer unit between the two domains.
Target network The input is a (user, item) pair and the output is their matching degree. The user is represented by a history of $w$ items $[i_1, \ldots, i_w]$. First, an item embedding matrix $A_T$ projects the discrete item indices to $d$-dimensional continuous representations: $x_i$ for the predicted item and $x_{i_*}$ for the historical items, where $* \in \{1, 2, \ldots, w\}$. Second, the user representation $x_u$ is computed by the user encoder module, an attention mechanism that queries the user's historical items with the predicted item. Third, a multilayer perceptron (MLP) $f_T$ parameterized by $\phi_T$ computes the target preference score from the concatenation $[x_u, x_i]$, where the notation $[\cdot, \cdot]$ denotes concatenation.
Source network Following the same three-step computing process as the target network, the source network computes the source preference score for a (user, source item) pair.
Transfer unit The transfer unit implements the knowledge transfer from the source to the target domain. Since typical neural networks have more than one layer, say $L$, the representations are transferred in a multilayer way. Let $x^{\ell}_{u|\#}$, where $\# \in \{S, T\}$, be user $u$'s source/target representation in the $\ell$-th layer ($\ell = 1, 2, \ldots, L-1$). The transferred representation is computed by projecting the source representation to the space of target representations with a translation matrix $H^{\ell}$: $x^{\ell}_{u|trans} = H^{\ell} x^{\ell}_{u|S}$. With the knowledge from the source domain, the target network learns a linear combination of the two input activations from both networks and then feeds this combination as input to the successive layer's filter. In detail, the input of the $(\ell+1)$-th layer of the target network is computed by $W^{\ell}_T x^{\ell}_{u|T} + x^{\ell}_{u|trans}$, where $W^{\ell}_T$ is the connection weight matrix in the $\ell$-th layer of the target network. The total transferred knowledge is the concatenation of the representations of all layers: $x_{u|trans} = [x^{\ell}_{u|trans}]_{\ell=1}^{L-1}$.
Objective The recommender minimizes the negative log-likelihood $L(\theta)$ over $D_T$ and $D_S$, the target and source training examples, respectively. We introduce how to generate them in Section 4.4.
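The user encoder and the transfer unit described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the attention scores are assumed to be dot products between the predicted item and each historical item, normalized by a softmax, and the names `encode_user` and `transfer_layer` are hypothetical. No nonlinearity is applied, since the text only specifies the next-layer input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def encode_user(history_vecs, candidate_vec):
    """Attention pooling over the history, queried by the predicted item.

    Assumption: the attention score is a dot product followed by a softmax.
    Returns the user vector and the attention weights.
    """
    weights = softmax([dot(h, candidate_vec) for h in history_vecs])
    dim = len(candidate_vec)
    user_vec = [sum(w * h[k] for w, h in zip(weights, history_vecs))
                for k in range(dim)]
    return user_vec, weights


def transfer_layer(w_target, h_translate, x_target, x_source):
    """One layer of the transfer unit.

    x_trans = H x_source is the transferred representation, and the input
    to the next target layer is W_T x_target + x_trans.
    """
    x_trans = matvec(h_translate, x_source)
    next_input = [a + b for a, b in zip(matvec(w_target, x_target), x_trans)]
    return next_input, x_trans
```

The attention weights returned by `encode_user` are the quantities that the case study later inspects to see which historical item matters most for a candidate item.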

Attacker
The recommender can solve Problem 1 (see Section 3) if no attacker exists. A challenge for the recommender is that it does not know the attacker models in advance. To address this challenge, we add an attacker component during training to simulate the attacks at test time. By integrating a simulated attacker into the recommender, it can deal with unseen attacks in the future. In this section, we introduce an attacker that infers the user private information from the transferred knowledge. In Section 4.3, we will introduce an adversarial recommender that exploits the simulated attacker to regularize the recommendation process in order to fool the adversary, so that it can protect the privacy of unseen users in the future.
The attacker model predicts the private user attribute from the user's source representation sent to the target domain: $\hat{y}_{u,p} = f_p(x_{u|trans}; \theta_p)$, where $\hat{y}_{u,p}$ is the predicted value of user $u$'s $p$-th private attribute, $p = 1, \ldots, n$, and $f_p$ is the prediction model parameterized by $\theta_p$. Note that an attacker can use any prediction model; here we use an MLP due to its nonlinearity and generality.
For all $n$ private user attributes, the attacker model minimizes the multitask loss $L(\Theta)$, where $\Theta = \{\theta_p\}_{p=1}^{n}$ and $D_p$ is the set of training examples for the $p$-th attribute. We introduce how to generate them in Section 4.4.
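One plausible form of the multitask attacker loss is sketched below. It assumes a cross-entropy loss per attribute, averaged over the users of each attribute and then over the attributes; the exact weighting used in the paper is not recoverable from the text, and the function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])


def attacker_loss(predictions, labels):
    """Multitask loss over all private attributes.

    predictions: {attribute: [class-probability list per user]}
    labels:      {attribute: [true class index per user]}
    Assumption: per-attribute mean cross-entropy, averaged over attributes.
    """
    total = 0.0
    for attribute, prob_rows in predictions.items():
        true_rows = labels[attribute]
        per_user = [cross_entropy(p, y) for p, y in zip(prob_rows, true_rows)]
        total += sum(per_user) / len(per_user)
    return total / len(predictions)
```

For example, an attacker that is maximally uncertain about a binary Gender attribute and perfectly certain about Age yields a loss of (ln 2 + 0) / 2.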

PrivNet
So far, we have introduced a recommender that exploits the knowledge from a source domain and a privacy attacker that infers user private information from the transferred knowledge. To solve Problem 1 in Section 3, we need to achieve two goals: improving the target recommendation and protecting the source privacy. In this section, we propose a novel model (PrivNet) that exploits the attacker component to regularize the recommender.
Since we have two rival objectives (i.e., target quality and source privacy), we adopt the adversarial learning technique (Goodfellow et al., 2014) to learn a privacy-aware transfer model. The generator is a privacy attacker which tries to accurately infer the user privacy, while the discriminator is a recommender which learns user preferences and deceives the adversary. The recommender of PrivNet minimizes $\tilde{L}(\theta) = L(\theta) - \lambda L(\Theta)$, where the hyperparameter λ controls the influence of the attacker component. PrivNet seeks to improve the recommendation quality (the first term on the right-hand side) and to fool the adversary by maximizing the adversary's loss (the second term). The adversary has no control over the transferred knowledge, i.e., $x_{u|trans}$. The losses of the two components are interdependent, but each component optimizes only its own parameters. PrivNet is a general framework since both the recommender and the attacker can easily be replaced by their variants. PrivNet reduces to a privacy-agnostic transfer model when λ = 0.

Generating Training Examples
We generate $D_T$ and $D_S$ as follows, taking the target domain as an example since the procedure is the same for the source domain. Suppose we have the whole item interaction history of some user $u$, say $[i_1, i_2, \ldots, i_l]$. We generate the positive training examples by sliding over the sequence of the history. We adopt the random negative sampling technique (Pan et al., 2008) to generate the corresponding negative training examples.
As in (Weinsberg et al., 2012; Beigi et al., 2020), we assume that some users $U^{pub} \subset U$ share their private attributes in their public profile. Then we have the labelled privacy data $D^{priv} = \{D_p\}_{p=1}^{n}$, where $D_p = \{(u, y_{u,p}) : u \in U^{pub}\}$.
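The example generation can be sketched as follows. This is a hedged illustration: it assumes a fixed window of `w` preceding items as the user context, and it assumes negatives are drawn uniformly from items the user never interacted with. The function names are hypothetical.

```python
import random


def positive_examples(history, w):
    """Slide over the history: the previous w items form the user context
    and the item that follows is the positive target."""
    return [(history[c - w:c], history[c]) for c in range(w, len(history))]


def negative_examples(history, all_items, w, ratio=1, rng=None):
    """For each positive example, sample `ratio` items the user has not
    interacted with (assumes at least one such item exists)."""
    rng = rng or random.Random(0)
    seen = set(history)
    candidates = [item for item in all_items if item not in seen]
    examples = []
    for context, _ in positive_examples(history, w):
        for _ in range(ratio):
            examples.append((context, rng.choice(candidates)))
    return examples
```

With a history of five items and `w = 3`, this yields two positive examples, `([1, 2, 3], 4)` and `([2, 3, 4], 5)`, each paired with `ratio` sampled negatives.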

Model Learning
The training process of PrivNet is illustrated in Algorithm 1. Lines 1-3 optimize the parameters of the privacy part, i.e., $\Theta$ in $L(\Theta)$. Line 1 creates a mini-batch of examples from the data $D^{priv}$. Each example contains a user and their corresponding private attributes $(u, \{y_{u,p}\}_{p=1}^{n})$. Line 2 feeds the users and their historical items in the source domain to the source network so as to generate the transferred knowledge $x_{u|trans}$. On line 3, the transferred knowledge and the corresponding private attributes $(x_{u|trans}, \{y_{u,p}\}_{p=1}^{n})$ are used to train the privacy attacker component by descending its stochastic gradient on the mini-batch: $\nabla_{\Theta} L(\Theta)$. Line 4 optimizes the parameters of the recommender part, i.e., $\theta$, by descending its stochastic gradient with the adversary loss on the mini-batch: $\nabla_{\theta} \tilde{L}(\theta)$.
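The alternating structure of the training procedure can be sketched as a skeleton loop. This is only an outline under assumptions: `attacker_step` and `recommender_step` are hypothetical callables supplied by the caller (each is assumed to update only its own parameters), and pairing one privacy mini-batch with one recommendation mini-batch per iteration is an illustrative choice, not something the text specifies.

```python
def adversarial_objective(rec_loss, attack_loss, lam=1.0):
    """Recommender objective: recommendation loss minus lam times the
    attacker loss, so minimizing it maximizes the attacker's loss."""
    return rec_loss - lam * attack_loss


def train_privnet(priv_batches, rec_batches, attacker_step, recommender_step,
                  lam=1.0):
    """Alternate one attacker update and one recommender update.

    attacker_step(priv_batch) updates the attacker parameters only and
    returns the attacker loss.
    recommender_step(rec_batch, priv_batch, lam) updates the recommender
    parameters only and returns the adversarial objective.
    """
    log = []
    for priv_batch, rec_batch in zip(priv_batches, rec_batches):
        # Attacker update on transferred knowledge and private labels.
        attack_loss = attacker_step(priv_batch)
        # Recommender update with the adversary loss as a regularizer.
        objective = recommender_step(rec_batch, priv_batch, lam)
        log.append((attack_loss, objective))
    return log
```

Setting `lam=0.0` makes the objective equal to the plain recommendation loss, which corresponds to the privacy-agnostic variant.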

Complexity Analysis
The parameter complexity of PrivNet is the addition of its recommender component and the privacy component. The embedding matrices of the recommender dominate the number of parameters as they vary with the input. As a result, the parameter complexity of PrivNet is O(d · (n S + n T )) where d is the embedding dimension, and n S and n T are the number of items in the source and target domains respectively.
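As a quick arithmetic check of the dominant term, the helper below counts the embedding parameters. The item counts in the usage note are made-up illustrative values, not dataset statistics.

```python
def embedding_parameter_count(d, n_source_items, n_target_items):
    """Number of embedding parameters, d * (n_S + n_T)."""
    return d * (n_source_items + n_target_items)
```

For instance, with the embedding size 80 and hypothetical catalogs of 1,000 source and 2,000 target items, the embeddings hold 80 × 3,000 = 240,000 parameters.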
The learning complexity of PrivNet divides into two parts: the forward prediction and the backward parameter update. The cost of the forward prediction is that of the recommender component plus twice that of the privacy component, since the recommender component needs the loss from the privacy component. The cost of the backward parameter update is that of the recommender component plus that of the privacy component, since each optimizes its own parameters.

Experiments
In this section, we conduct experiments to evaluate both recommendation performance and privacy protection of PrivNet.

Dataset
We evaluate on the following real-world datasets.
Foursquare (FS) It is a publicly available dataset of user-venue check-ins (Yang et al., 2019a). The source and target domains are divided by check-in time, i.e., we deal with the covariate shift issue, where the distribution of the input variables changes between the old data and the newly collected data. The private user attribute is Gender.
MovieLens (ML) It is a publicly available dataset of user-movie ratings (Harper and Konstan, 2016). We keep ratings over three stars as positive feedback. The source and target domains are divided by the movie's release year, i.e., we transfer from old movies to new ones. The private user attributes are Gender and Age. Following (Beigi et al., 2020), we categorize Age into three groups: over-45, under-35, and between 35 and 45.
The statistics are summarized in Table 1 and we can see that all of the datasets have more than 99% sparsity. It is expected that the transfer learning technique is helpful to alleviate the data sparsity issues in these real-world recommendation services.

Evaluation Metric
For privacy evaluation, we follow the protocol in (Jia and Gong, 2018) and randomly sample 80% of the users as the training set, treating the remaining users as the test set. The users in the training set have publicly shown their private information, while the users in the test set keep it private. We split off a small part of the training set as the validation set, so that the ratio is train:valid:test = 7:1:2. For privacy metrics, we compute Precision, Recall, and F1-score in a weighted way (footnote 1), which is suitable for imbalanced data distributions (Fawcett, 2006). We report results for each private attribute. We first calculate the metrics for each label and then compute their average weighted by support (the number of true instances for each label). A lower value indicates better privacy protection. For recommendation evaluation, we follow the leave-one-out strategy in (He et al., 2017), i.e., we reserve the latest interaction of each user as the test item and then randomly sample a number of (e.g., 99) negative items that the user has not interacted with. We evaluate how well the recommender can rank the test item against these negative ones.
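The support-weighted averaging used for the privacy metrics can be written out directly. The sketch below is a plain-Python illustration (the experiments use scikit-learn); labels with no predicted instances are assumed to get precision 0.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted Precision, Recall and F1 over the true labels."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / count
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = count / n  # share of true instances with this label
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return precision, recall, f1
```

For true labels `["M", "M", "M", "F"]` and predictions `["M", "M", "F", "F"]`, the weighted Precision is 0.875, the weighted Recall is 0.75, and the weighted F1 is 23/30 ≈ 0.767, which differs from the harmonic mean of 0.875 and 0.75 (≈ 0.808).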
For recommendation metrics, we compute hit ratio (HR), normalized discounted cumulative gain (NDCG), mean reciprocal rank (MRR), and AUC for top-K (default K = 10) item recommendation (Gao et al., 2019a). A higher value indicates better recommendation.
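Under the leave-one-out protocol there is a single relevant item per user, so the ranking metrics reduce to simple functions of that item's rank. The sketch below is a hedged illustration: ties are broken in favor of the test item, MRR is truncated at K, and AUC is omitted; these are assumptions rather than details stated in the text.

```python
import math


def rank_of_test_item(test_score, negative_scores):
    """1-based rank of the held-out item among itself and the sampled
    negatives (ties are resolved in favor of the test item)."""
    return 1 + sum(1 for s in negative_scores if s > test_score)


def hit_ratio(rank, k=10):
    return 1.0 if rank <= k else 0.0


def ndcg(rank, k=10):
    # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def mrr(rank, k=10):
    return 1.0 / rank if rank <= k else 0.0
```

The reported figures would then be the averages of these per-user values over all test users.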

Implementation
All methods are implemented using TensorFlow. Parameters are initialized by default. The optimizer is adaptive moment estimation (Adam) with learning rate 5e-4. The mini-batch size is 128 with a negative sampling ratio of 1. The embedding size is 80, and the MLP has one hidden layer of size 64. The history size is 10. λ is 1 in Eq. (6). The noise level is 10%. The number of dummy items is 5. The privacy-related metrics are computed with the Python scikit-learn library. The hyperparameter settings used to train our model and the baselines are summarized in Table 2.

Footnote 1: Note that the weighted F1 values are not necessarily equal to the harmonic mean of the corresponding Precision and Recall values.

Table 3: Compared methods, categorized by knowledge transfer and privacy protection (+strategy): BPRMF (Rendle et al., 2009), MLP (He et al., 2017), CSN (Misra et al., 2016), CoNet (Hu et al., 2018), BlurMe (Weinsberg et al., 2012) (+perturbation), LDP (Bassily and Smith, 2015) (+noise), PrivNet (ours) (+adversary).

Baseline
We compare PrivNet with various kinds of baselines as summarized in Table 3.
The following methods are privacy-agnostic. BPRMF: Bayesian personalized ranking (Rendle et al., 2009) is a latent factors approach which learns user and item factors via matrix factorization. MLP: Multilayer perceptron (He et al., 2017) is a neural CF approach which learns the user-item interaction function using neural networks. CSN: The cross-stitch network (Misra et al., 2016) is a deep transfer learning model which couples the two basic networks via a linear combination of activation maps using a translation scalar. CoNet: Collaborative cross network (Hu et al., 2018) is a deep transfer learning method for cross-domain recommendation which learns linear combination of activation maps using a translation matrix.
The following methods are privacy-aware. BlurMe: This method (Weinsberg et al., 2012) perturbs a user's profile by adding dummy items to their history. It is a representative of the perturbation-based techniques that recommend items while protecting private attributes. LDP: Local differential privacy (Bassily and Smith, 2015) modifies user-item ratings by adding noise to them based on differential privacy. It is a representative of the noise-based techniques that recommend items while protecting private attributes. Note that the original LDP and BlurMe are single-domain models, which are also used as baselines in (Beigi et al., 2020). To be fair and to investigate the influence of privacy-preserving strategies, we replace the adversary strategy of PrivNet with the strategy of LDP (adding noise) and of BlurMe (perturbing ratings), and keep the other components the same.

Result on Recommendation Performance
The results of different methods on recommendation are summarized in Table 4. A higher value indicates better recommendation performance.
Compared with the privacy-agnostic methods (BPRMF, MLP, CSN, and CoNet), PrivNet is superior by a large margin on the MovieLens dataset. This shows that PrivNet is effective in recommendation while it protects the source private attributes. Since these four methods represent a wide range of typical recommendation methods (matrix factorization, neural CF, transfer learning), we can see that the architecture of PrivNet is a reasonable design for recommender systems.
Compared with the privacy-aware methods (LDP and BlurMe), we can see that LDP significantly degrades recommendation performance, with a reduction of about six to ten percentage points on the Foursquare dataset. This shows that LDP suffers from the noisy source information, since the noise harms the usefulness of the transferred knowledge to the target task. BlurMe also degrades recommendation performance on the Foursquare dataset, for example with a 4.0% relative reduction in MRR. This shows that BlurMe suffers from the perturbed source information, since the perturbation harms the usefulness of the transferred knowledge to the target task.
Among the privacy-aware methods, PrivNet achieves the best recommendation performance in terms of HR, NDCG, and MRR on the Foursquare dataset, and the best in terms of HR on the MovieLens dataset. This shows that PrivNet preserves the usefulness of the transferred knowledge better than LDP and BlurMe. In summary, PrivNet is effective in transferring the knowledge, showing that the adversary strategy of PrivNet achieves state-of-the-art performance compared with the strategies of adding noise (LDP) and perturbing ratings (BlurMe).

Result on Privacy Protection
The results of different methods on privacy inference are summarized in Table 5 (Note, there are no results for the four privacy-agnostic methods). A lower value indicates better privacy protection.
Comparing PrivNet and BlurMe, we can see that the perturbation method of adding dummy items still suffers from privacy inference attacks in terms of Precision and Recall on the Foursquare dataset, and in terms of F1 on the MovieLens dataset. The reason may be that the attacker can effectively distinguish the true profiles from the dummy items. That is, it can accurately learn from the true profiles while ignoring the dummy items. Comparing PrivNet and LDP, we can see that adding noise to ratings still suffers from privacy inference attacks in terms of Recall on the Foursquare dataset, and in terms of all three metrics on the MovieLens dataset. This implies that the occurrence of a rating, regardless of its numeric value (true or noisy), leaks the user privacy. That is, the binary event of excluding or including an item in a user's profile is a signal for user privacy inference nearly as strong as the numerical ratings. In particular, there are 50 movies rated by female users only (e.g., Country Life (1994)) and 350 rated by male users only (e.g., Time Masters (1982)). Adding noise to these ratings may not influence the inference of Gender for these users very much.
PrivNet achieves the best result in nearly half of the comparisons on privacy protection across the three evaluation metrics and the two datasets. It has significantly lower F1 scores than all baselines on the MovieLens dataset. It is effective at hiding private information during the knowledge transfer. By simulating the attacks during training, PrivNet is prepared against malicious attacks on unseen users in the future. In summary, PrivNet is an effective source privacy-aware transfer model: it makes it more difficult for malicious attackers to infer the source user privacy during the knowledge transfer, compared with the strategies of adding noise (LDP) and perturbing ratings (BlurMe).
Figure 1 shows t-SNE projections of 4,726 users' transferred representations on the MovieLens-Gender dataset. These user vectors are computed from the user encoder as shown in Figure 2. We can see that the vectors of male and female users are more mixed with the training of PrivNet. In contrast, without the training of PrivNet (λ = 0, see Section 5.6.1), the vectors for female users are clustered in the top-left corner while those for male users are in the bottom-right. To quantify the difference, we perform K-means clustering on the user vectors with K = 2 and calculate the V-measure (Rosenberg and Hirschberg, 2007), which assesses the degree of overlap between the 2 clusters and the Gender groups. The measure is 0.0119 without and 0.0027 with the training of PrivNet. Note that a lower measure is better since we do not want the two classes to be easily separable.
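The V-measure used above is the harmonic mean of homogeneity and completeness, both defined through conditional entropies. The sketch below computes it in plain Python from two label lists; the cluster assignments are assumed to be given (for example, from a K-means run), and the clustering step itself is not shown.

```python
import math
from collections import Counter


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(targets, given):
    """H(targets | given), estimated from co-occurrence counts."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness."""
    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)
    homogeneity = (1.0 if h_classes == 0
                   else 1.0 - _conditional_entropy(classes, clusters) / h_classes)
    completeness = (1.0 if h_clusters == 0
                    else 1.0 - _conditional_entropy(clusters, classes) / h_clusters)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

A value of 1 means the clusters coincide with the Gender groups, and a value near 0 means the clusters carry almost no information about Gender, which is the desired outcome here.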

Parameter Sensitivity
In this section, we analyse the model ablation, impact of privacy inference component, and impact of public users who share their profiles.

Model Ablation
The key component of PrivNet is the adversary loss used to regularize the recommender. We remove this component by setting λ = 0 in Eq. (6) to show that it is necessary for protecting the private attributes. The results are summarized in Table 6. As we expect, PrivNet without the adversary loss is most vulnerable to privacy attacks since it has no privacy defense. Privacy protection drops significantly in terms of all three privacy-related metrics without this model component.

Impact of Privacy Component
We vary the λ (see Eq. (6)) of the privacy component over {0, 0.1, 0.25, 0.5, 0.75, 1.0} to show its impact on privacy protection and recommendation (where λ = 0 corresponds to removing the privacy attack component, see also Table 6). Figure 3a shows the impact on privacy protection. Privacy inference generally becomes more difficult as λ increases, showing that the privacy inference component of PrivNet is a key factor for protecting the user privacy in the source domain. In particular, all results with λ ≠ 0 are better than those with λ = 0 in hiding the private information. The privacy inference results, however, vary subtly among the private attributes and evaluation metrics. On the Foursquare dataset, F1 decreases at first (until λ reaches 0.1) and then increases. On the MovieLens-Gender dataset, the F1 score decreases at first (until λ reaches 0.25) and then increases. This means that the private information is obscured more successfully in the beginning but less so in the end. The reason may be that the model overfits as λ increases, which leads to an inaccurate estimation of privacy inference. On the MovieLens-Age dataset, the F1 score consistently decreases as λ increases. Figure 4a shows the impact on recommendation performance. The recommendation performance decreases as λ increases from 0 to 0.1 on the MovieLens dataset, showing that increasing the impact of the privacy inference component harms the recommendation quality to some extent.

Impact of Public Users
We vary the percentage of public users U pub (see Section 3) over {10, 30, 50, 70, 80, 90}. Figure 3b shows the impact on the privacy inference. It is surprising that the privacy inference does not become easier as the number of public users increases. On the Foursquare dataset, in terms of F1, the inference is inaccurate until the percentage increases to 50% and then becomes more accurate up to 80%. This shows that the adversary strategy of PrivNet is effective in protecting unseen users' privacy when only a small number of users (e.g., 10%) reveal their profiles for training. On the MovieLens dataset, the inference is inaccurate from 50% up to 80% in terms of F1. Figure 4b shows the impact on recommendation performance. Since the number of public users controls how much knowledge is shared between the source and target domains, the recommendation performance improves as the number of public users increases. In summary, PrivNet is favourable in practice since it can achieve a good tradeoff between utility and privacy when only a small number of users reveal their profiles to the public.

Case Study
One advantage of PrivNet is that it can explain which item in a user's history matters the most for a candidate item by using the attention weights. Table 7 shows an example of interactions between a user's historical movies (No. 0∼9) and the candidate movie (No. 10). We can see that the latest movie matters a lot since the user interests may remain the same during a short period. The oldest movie, however, also has some impact on the candidate movie, reflecting that the user interests may mix with a long-term characteristic. PrivNet can capture these subtle short-/long-term user interests. Furthermore, the movie (No. 7) belonging to the same genre as the candidate movie matters the most. PrivNet can also capture this high-level category relationship.

Conclusion
We presented an attack scenario to infer the private user attributes from the transferred knowledge in recommendation, raising the issue of source privacy leakage beyond target performance. To protect user privacy in the source domain, we proposed a privacy-aware transfer model (PrivNet) that goes beyond improving the performance in the target domain. It is effective in terms of recommendation performance and privacy protection, achieving a good trade-off between the utility and privacy of the transferred knowledge. In future work, we want to relax the assumption that the private user attributes need to be provided in advance in order to train the privacy inference component for protecting unseen users.
this paper indexed in the ACL anthology. The work was supported by Hong Kong CERG projects 16209715/16244616, and Hong Kong PhD Fellowship Scheme.