TextHide: Tackling Data Privacy for Language Understanding Tasks

An unsolved challenge in distributed or federated learning is to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide, which aims to address this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. Such an encryption step is efficient and affects task performance only slightly. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations while reducing average accuracy by only 1.9%. We also present an analysis of the security of TextHide based on a conjecture about the computational intractability of a mathematical problem.


Introduction
Data privacy for deep learning has become a challenging problem for many application domains, including Natural Language Processing. For example, healthcare institutions train diagnosis systems on private patients' data (Pham et al., 2017; Xiao et al., 2018). Google trains a deep learning model for next-word prediction to improve its virtual keyboard using users' mobile device data (Hard et al., 2018). Such data are decentralized, but moving them to a centralized location for training a model may violate regulations such as the Health Insurance Portability and Accountability Act (HIPAA) (Act, 1996) and the California Consumer Privacy Act (CCPA) (Legislature, 2018).
Federated learning (McMahan et al., 2017; Kairouz et al., 2019) allows multiple parties to train a global neural network model collaboratively in a distributed environment without moving data to centralized storage. It lets each participant compute a model update (i.e., gradients) on its local data using the latest copy of the global model, and then send the update to the coordinating server. The server then aggregates these updates (typically by averaging) to construct an improved global model.
Privacy has many interpretations depending on the assumed threat models (Kairouz et al., 2019). This paper assumes an eavesdropping attacker with access to all information communicated by all parties, which includes the parameters of the model being trained. With such a threat model, a recent work (Zhu et al., 2019) suggests that an attacker can reverse-engineer the private input.
Multi-party computation (Yao, 1982) or homomorphic encryption (Gentry, 2009) can ensure full privacy, but they slow down computation by several orders of magnitude. The differential privacy (DP) approach (Dwork et al., 2006; Dwork, 2009) is another general framework that ensures a certain amount of privacy by adding controlled noise to the training pipeline. However, it trades off data utility for privacy preservation. A recent work applying DP to deep learning was able to reduce accuracy losses (Abadi et al., 2016), but they still remain relatively high.
The key challenge for distributed or federated learning is to ensure privacy preservation without slowing down training or reducing accuracy. In this paper, we propose TextHide to address this challenge for natural language understanding tasks. The goal is to protect training data privacy at a minimal cost. In other words, we want to ensure that an adversary eavesdropping on the communicated bits will not be able to reverse-engineer training data from any participant.
TextHide requires each participant in a distributed or federated learning setting to add a simple encryption step with one-time secret keys to hide the hidden representations of its text data.
The key idea was inspired by InstaHide (Huang et al., 2020) for computer vision tasks, which encrypts each training datapoint using a random pixel-wise mask and the MixUp data-augmentation technique (Zhang et al., 2018a). However, applying InstaHide to text data is not straightforward because of the well-known dissimilarities between images and language: pixel values are real numbers, whereas text is a sequence of discrete symbols.
TextHide is designed to plug into the popular framework that transforms textual input into output vectors through pre-trained language models (e.g., BERT (Devlin et al., 2019)) and uses those output representations to train a new shallow model (e.g., logistic regression) for any supervised single-sentence or sentence-pair task. The pre-trained encoder is fine-tuned as well while training the shallow model. We evaluate TextHide on the GLUE benchmark (Wang et al., 2019). Our results show that TextHide can effectively defend against attacks on shared gradients or representations while reducing average accuracy by only 1.9%.
Lastly, TextHide and InstaHide have completely different security arguments due to the new design. To understand the security of the proposed approach, we also develop a new security argument based on a conjecture about the computational intractability of a mathematical problem.
2 InstaHide and Its Challenges for NLP

InstaHide (Huang et al., 2020) has achieved good performance in computer vision for privacy-preserving distributed learning, providing a cryptographic2 notion of security while incurring a much smaller utility loss and computation overhead than the best approach based on differential privacy (Abadi et al., 2016).
InstaHide is inspired by the observation that a classic computational problem, k-VECTOR SUBSET SUM3, also appears in the MixUp (Zhang et al., 2018a) method for data augmentation, which is used to improve accuracy on image data.
2 Cryptosystem design since the 1970s seeks to ensure that any attack must solve a computationally expensive task.
3 k-VECTOR SUBSET SUM is known to be hard: in the worst case, finding the secret indices requires $\geq N^{k/2}$ time (Abboud and Lewi, 2013) under the Exponential Time Hypothesis conjecture (Impagliazzo et al., 1998). See Appendix A.
To encrypt an image $x \in \mathbb{R}^d$ from a private dataset, InstaHide first picks $k-1$ other images $s_2, s_3, \ldots, s_k$ from that private dataset or from a large public dataset of $N$ images, together with random nonnegative coefficients $\lambda_i$ for $i = 1, \ldots, k$ that sum to 1, and creates a composite image $\lambda_1 x + \sum_{i=2}^{k} \lambda_i s_i$ ($k$ is typically small, e.g., 4). A composite label is also created using the same set of coefficients. Then it adds another layer of security: pick a random mask $\sigma \in \{-1, 1\}^d$ and output the encryption $\sigma \circ (\lambda_1 x + \sum_{i=2}^{k} \lambda_i s_i)$, where $\circ$ is coordinate-wise multiplication of vectors. The neural network is then trained on encrypted images, which look like random pixel vectors to the human eye and yet lead to good classification accuracy. Note that the "one-time secret key" $(\sigma, s_2, \cdots, s_k)$ used to encrypt $x$ will not be reused to encrypt other images.
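To make the encryption step concrete, here is a minimal NumPy sketch of our reading of the InstaHide pipeline; the function name and the flattened-image representation are ours, not from Huang et al. (2020).

import numpy as np

def instahide_encrypt(x, pool, k=4, rng=np.random.default_rng()):
    # Encrypt one flattened image x InstaHide-style (a sketch, not the
    # reference implementation). pool holds candidate images to mix with.
    d = x.shape[0]
    # Pick k-1 other images and k random nonnegative coefficients summing to 1.
    others = pool[rng.choice(len(pool), size=k - 1, replace=False)]
    lam = rng.dirichlet(np.ones(k))
    mix = lam[0] * x + (lam[1:, None] * others).sum(axis=0)
    # One-time secret key: a random sign mask, applied coordinate-wise.
    sigma = rng.choice([-1.0, 1.0], size=d)
    # lam is also needed to form the composite label on the private side.
    return sigma * mix, lam

# Usage: encrypt one 32x32x3 image against a pool of 10,000 candidates.
pool = np.random.uniform(-1, 1, size=(10000, 3072))
x = np.random.uniform(-1, 1, size=3072)
x_enc, lam = instahide_encrypt(x, pool)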
Challenges of applying InstaHide to NLP. There are two challenges in applying InstaHide to text data for language understanding tasks. The first is the discrete nature of text, while InstaHide's encryption operates on continuous inputs. The second is that most NLP tasks today are solved by fine-tuning pre-trained language models such as BERT on downstream tasks; it remains an open question how to add encryption to such a framework and what type of security argument it would provide. The following section presents our approach, which overcomes these two challenges.

TextHide: Formal Description
There are two key ideas in TextHide. The first is using the "one-time secret key" from InstaHide for encryption, and the second is a method to incorporate such encryption into the popular framework of fine-tuning a pre-trained language model, e.g., BERT (Devlin et al., 2019).
In the following, we describe how to integrate TextHide into the federated learning setting (Section 3.1), and then present two TextHide schemes (Sections 3.2 and 3.3). We analyze the security of TextHide in Section 3.4.

Fine-tuning BERT with TextHide
In a federated learning setting, multiple participants holding private text data may wish to solve NLP tasks by using a BERT-style fine-tuning pipeline, where TextHide, a simple InstaHide-inspired encryption step, can be applied at the intermediate level to ensure privacy (see Figure 1).

Figure 1: An illustration of TextHide encryption with $k = 2$, where $k$ is the number of inputs (sentences or sentence pairs) mixed into each TextHide representation. TextHide first encodes each text input using a transformer encoder, then linearly combines the output representations (i.e., [CLS] tokens) as well as the labels. Finally, an entry-wise mask is chosen from a randomly pre-generated pool and applied to the mixed representation. The entry-wise mask, together with the other datapoints to mix, constitutes the "one-time secret key" of the TextHide scheme. Note that training takes place directly on encrypted data and no decryption is needed.

The BERT fine-tuning framework assumes (input, label) pairs $(x, y)$, where $x$ takes the form [CLS]$s_1$[SEP] for single-sentence tasks, or [CLS]$s_1$[SEP]$s_2$[SEP] for sentence-pair tasks, and $y$ is a one-hot vector for classification tasks or a real-valued number for regression tasks. In a standard fine-tuning process, federated learning participants use a BERT-style model $f_{\theta_1}$ to compute hidden representations $f_{\theta_1}(x)$ for their inputs $x$ and then train a shallow classifier $h_{\theta_2}$ on $f_{\theta_1}(x)$, while also fine-tuning $\theta_1$. The parameter vectors $\theta_1, \theta_2$ are updated at the central server via pooled gradients. All participants hold current copies of the two models.
To ensure the privacy of their individual inputs $x$, federated learning participants apply TextHide encryption to the outputs $f_{\theta_1}(x)$. The model $h_{\theta_2}$ is then trained on these encrypted representations. Each participant computes gradients by backpropagating through its private encryption, and this is the source of secrecy: the attacker can see the communicated gradients but not the secret encryptions, which limits the leakage of information about the input.
We now formally describe two TextHide schemes for fine-tuning BERT in the federated learning setting: $\text{TextHide}_{\text{intra}}$, which encrypts an input using other examples from the same dataset, and $\text{TextHide}_{\text{inter}}$, which utilizes a large public dataset to perform encryption. Thanks to the large public dataset, $\text{TextHide}_{\text{inter}}$ is more secure than $\text{TextHide}_{\text{intra}}$, but the latter is quite secure in practice when the training set is large.

Basic TextHide: Intra-Dataset TextHide
In TextHide, we have a pre-trained text encoder $f_{\theta_1}$, which takes $x$, a sentence or a sentence pair, and maps it to a representation $e = f_{\theta_1}(x) \in \mathbb{R}^d$ (e.g., $d = 768$ for $\text{BERT}_{\text{base}}$). We use $[b]$ to denote the set $\{1, 2, \cdots, b\}$. Given a dataset $D$, we call the set $B = \{x_i, y_i\}_{i \in [b]}$ an "input batch", where $x_1, \cdots, x_b$ are $b$ inputs randomly drawn from $D$ and $y_1, \cdots, y_b$ are their labels. For each $x_i$ in the batch $B$, $i \in [b]$, we encode $x_i$ using $f_{\theta_1}$ and obtain a new set $E = \{e_i = f_{\theta_1}(x_i), y_i\}_{i \in [b]}$, which we refer to as an "encoding batch". Later in this section, we use $\tilde{e}_i$ to denote the TextHide encryption of $e_i$ for $i \in [b]$, and call the set $\tilde{E} = \{\tilde{e}_i, y_i\}_{i \in [b]}$ a "hidden batch" of $E$.

We use $\sigma \in \{-1, +1\}^d$ to denote an entry-wise sign-flipping mask. For a TextHide scheme, $M = \{\sigma_1, \cdots, \sigma_m\}$ denotes its randomly pre-generated mask pool of size $m$, and $k$ denotes the number of sentences combined in a TextHide representation. We call such a parametrized scheme $(m, k)$-TextHide.
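As a concrete sketch of one $(m, k)$-TextHide encryption step, here is our own PyTorch rendering of the definitions above; mixing partners are drawn from the same batch, and one-hot labels are assumed so that mixed labels become soft labels.

import torch

def texthide_encrypt(E, Y, mask_pool, k=4):
    # E: (b, d) [CLS] encodings; Y: (b, num_labels) one-hot labels;
    # mask_pool: (m, d) entries in {-1, +1}. Returns the hidden batch.
    b, d = E.shape
    lam = torch.distributions.Dirichlet(torch.ones(k)).sample((b,))   # (b, k)
    # For each datapoint, pick k-1 in-batch partners to mix with
    # (slot 0 keeps the datapoint itself).
    idx = torch.stack([torch.randperm(b)[:k] for _ in range(b)])      # (b, k)
    idx[:, 0] = torch.arange(b)
    E_mix = (lam.unsqueeze(-1) * E[idx]).sum(dim=1)                   # (b, d)
    Y_mix = (lam.unsqueeze(-1) * Y[idx]).sum(dim=1)
    # One-time secret key: a fresh mask from the private pool per datapoint.
    sigma = mask_pool[torch.randint(len(mask_pool), (b,))]            # (b, d)
    return sigma * E_mix, Y_mix

Since the mask and the coefficients are constants of the forward pass, gradients of the task loss flow through $\tilde{e}$ back into $f_{\theta_1}$, so fine-tuning proceeds as usual while only the encrypted batch is exposed.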
Plug into federated BERT fine-tuning. Algorithm 2 shows how to incorporate $(m, k)$-TextHide into federated learning, allowing a centralized server and $C$ distributed clients to collaboratively fine-tune a language model (e.g., BERT) for downstream tasks without sharing raw data. Each client (indexed by $c$) holds its own private data $D_c$ and a private mask pool $M_c$, with $\sum_{c=1}^{C} |M_c| = m$. The procedure takes a pre-trained BERT $f_{\theta_1}$ and an initialized task-specific classifier $h_{\theta_2}$, and runs $T$ steps of global updates of both $\theta_1$ and $\theta_2$. In each global update, the server aggregates local updates from the $C$ clients. For a local update at client $c$, the client receives the latest copies of $f_{\theta_1}$ and $h_{\theta_2}$ from the server, samples a random input batch $\{x_i, y_i\}_{i \in [b]}$ from its private dataset $D_c$, and encodes it into an encoding batch $E$. To protect privacy, each client runs $(m, k)$-TextHide with its own mask pool $M_c$ to encrypt the encoding batch $E$ into a hidden batch $\tilde{E}$ (line 22 in Algorithm 2). The client then uses the hidden batch $\tilde{E}$ to calculate the model updates (i.e., gradients) of both the BERT encoder $f_{\theta_1}$ and the shallow classifier $h_{\theta_2}$, and returns them to the server (line 23 in Algorithm 2). The server averages all updates from the $C$ clients and runs a global update of $f_{\theta_1}$ and $h_{\theta_2}$ (lines 12, 13 in Algorithm 2).
Algorithm 2 Federated fine-tuning of BERT using $(m, k)$-TextHide with $C$ clients (indexed by $c$)
1: $m$: size of each client's mask pool
2: $k$: number of training samples to be mixed
3: $d$: hidden size (e.g., 768 in BERT)
4: procedure SERVEREXECUTION($f_{\theta_1}$, $h_{\theta_2}$)
5:   $f_{\theta_1}$: the pre-trained BERT; $h_{\theta_2}$: a shallow classifier
6:   $T$: number of model updates, $\eta$: learning rate
7:   for each client $c$ in parallel do
     ⋮
10: Run on Client $c$
     ⋮
18:   $b$: batch size; $D_c$: private train set of client $c$
19:   $M_c$: the mask pool of size $m$ owned by client $c$
     ⋮
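To make the client side of Algorithm 2 concrete, here is a hedged sketch of one local update (cf. lines 18-23); texthide_encrypt is the sketch from Section 3.2, and the soft-label cross-entropy loss is our assumption for classification tasks.

import torch

def client_local_update(f_theta1, h_theta2, batch, mask_pool_c, k=4):
    # One local update at client c. Only the resulting gradients are sent
    # to the server; the mask pool and mixing partners (the one-time key)
    # never leave the client.
    x_batch, Y_batch = batch                   # sampled from private set D_c
    E = f_theta1(x_batch)                      # (b, d) [CLS] encodings
    E_hid, Y_hid = texthide_encrypt(E, Y_batch, mask_pool_c, k=k)
    logits = h_theta2(E_hid)
    # Soft cross-entropy against the mixed (soft) labels.
    loss = -(Y_hid * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
    loss.backward()
    grads1 = [p.grad.clone() for p in f_theta1.parameters()]
    grads2 = [p.grad.clone() for p in h_theta2.parameters()]
    return grads1, grads2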

Inter-dataset TextHide
Inter-dataset TextHide encrypts private inputs with text data from a second dataset, which can be a large public corpus (e.g., Wikipedia). The large public corpus plays a role reminiscent of the random oracle in cryptographic schemes (Canetti et al., 2004).
Assume we have a private dataset $D_{\text{private}}$ and a large public dataset $D_{\text{public}}$. $\text{TextHide}_{\text{inter}}$ randomly chooses $k/2$ sentences from $D_{\text{private}}$ and the other $k/2$ from $D_{\text{public}}$, mixes their representations, and applies to the mixture a random mask from the pool. A main difference between $\text{TextHide}_{\text{inter}}$ and $\text{TextHide}_{\text{intra}}$ is that $\text{TextHide}_{\text{intra}}$ mixes the labels of all inputs used in the combination, while $\text{TextHide}_{\text{inter}}$ mixes only the labels from $D_{\text{private}}$ (there is usually no label for the public dataset). Specifically, for an original datapoint, let $S$ denote the set of indices of the datapoints that its TextHide encryption combines, with $|S| = k$. Its $\text{TextHide}_{\text{inter}}$ label is then $\sum_{i \in S:\, x_i \in D_{\text{private}}} \lambda_i y_i$, i.e., the mixture of the private labels only, using the same coefficients $\lambda_i$ as in the representation.
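Under this reading, the inter-dataset mixing step differs from the intra-dataset one only in where the partners come from and in which coefficients enter the label. A sketch (names and batching conventions are ours):

import torch

def texthide_inter_mix(e, y, E_priv, Y_priv, E_pub, k=4):
    # Mix one private encoding e (label y) with k/2 - 1 other private
    # encodings and k/2 public ones; a random mask from the pool is
    # applied to e_mix afterwards, exactly as in TextHide_intra.
    lam = torch.distributions.Dirichlet(torch.ones(k)).sample()
    i = torch.randint(len(E_priv), (k // 2 - 1,))
    j = torch.randint(len(E_pub), (k // 2,))
    parts = torch.cat([e.unsqueeze(0), E_priv[i], E_pub[j]])        # (k, d)
    e_mix = (lam.unsqueeze(-1) * parts).sum(dim=0)
    # Public datapoints carry no label, so their coefficients are dropped.
    y_mix = lam[0] * y + (lam[1 : k // 2, None] * Y_priv[i]).sum(dim=0)
    return e_mix, y_mix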

On Security of TextHide
The encrypted representations produced by TextHide are themselves secure, i.e., they do not allow any efficient way to recover the text $x$, following the security framework of InstaHide (see Appendix A for k-VECTOR SUBSET SUM). However, an additional source of information leakage is the gradients shared during federated learning, as shown by Zhu et al. (2019). We mitigate this by ensuring that the secret mask $\sigma$ used to encrypt the representation of an input $x$ changes every epoch. The pool of masks is usually much larger than the number of epochs, which means that each mask gets used only once for an input (with negligible failure probability). The gradient-matching attack of Zhu et al. (2019) cannot work in this scenario. In the following section, we show that it does not work even with a fixed mask.

Experiments
We evaluate the utility and privacy of TextHide. Our experiments aim to answer the following questions:
• What is the accuracy when using TextHide for sentence-level natural language understanding tasks (Section 4.2)?
• How effective is TextHide in terms of hiding the gradients (Section 4.3) and the representations of the original input (Section 4.4)?

Experimental Setup
Dataset. We evaluate TextHide on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2019), a collection of 9 sentence-level language understanding tasks:
• Two single-sentence classification tasks: the Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2019) and the Stanford Sentiment Treebank (SST-2) (Socher et al., 2013);
• Three sentence-pair similarity tasks: MRPC, QQP, and STS-B;
• Four natural language inference tasks: MNLI, QNLI, RTE, and WNLI.
Following Devlin et al. (2019) and Joshi et al. (2020), we exclude WNLI from the evaluation. Table 1 summarizes the data sizes, tasks, and evaluation metrics of all the datasets. All tasks are single-sentence or sentence-pair classification tasks, except that STS-B is a regression task.
Implementation. We fine-tune the pre-trained cased $\text{BERT}_{\text{base}}$ model released by Devlin et al. (2019) on each dataset. We notice that generalizing to different masks requires a more expressive classifier; thus, instead of adding a linear classifier on top of the [CLS] token, we use a multilayer perceptron with hidden-layer sizes (768, 768, 768) to get better performance under TextHide. We use AdamW (Kingma and Ba, 2015) as the optimizer and a linear scheduler with a warmup ratio of 0.1. More details of hyperparameter selection are given in Appendix B.3. To show TextHide's compatibility with a state-of-the-art model, we also test the $\text{RoBERTa}_{\text{base}}$ model released by Liu et al. (2019) and report the results in Appendix B.2.
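Concretely, the head replacing the usual linear classifier is just a small MLP on the 768-dimensional [CLS] output; a sketch (the activation choice is our assumption):

import torch.nn as nn

def make_classifier(num_labels, d=768):
    # Three hidden layers of size 768 on top of the [CLS] representation,
    # in place of BERT's single linear classification layer.
    return nn.Sequential(
        nn.Linear(d, 768), nn.ReLU(),
        nn.Linear(768, 768), nn.ReLU(),
        nn.Linear(768, 768), nn.ReLU(),
        nn.Linear(768, num_labels),
    )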

Accuracy Results of TextHide
To answer the first question, we compare the accuracy of TextHide to the BERT baseline without any encryption.
We vary the TextHide scheme as follows.

Results with different (m, k) pairs. Figure 2 shows the performance of $\text{TextHide}_{\text{intra}}$ parameterized with different $(m, k)$'s. Increasing $m$ makes learning harder, since the network needs to generalize to different masking patterns. However, for most datasets (except RTE), TextHide with $m = 256$ reduces accuracy only slightly compared to the baseline. Our explanation for the poor performance on RTE is that training on this small dataset (even without encryption) is quite unstable, as previously observed by Dodge et al. (2020). In general, TextHide can work with larger $m$ (better security) when the training corpus is larger (e.g., $m = 512$ for data size > 100k).
$\text{TextHide}_{\text{intra}}$ vs. $\text{TextHide}_{\text{inter}}$. $\text{TextHide}_{\text{intra}}$ mixes representations from the same private dataset, whereas $\text{TextHide}_{\text{inter}}$ combines representations of private inputs with representations of random inputs from a large public corpus (MNLI in our case). Table 1 shows the results of the baseline and TextHide (both $\text{TextHide}_{\text{intra}}$ and $\text{TextHide}_{\text{inter}}$) on the GLUE benchmark, with $(m, k) = (256, 4)$, except for RTE with $(m, k) = (16, 4)$. The average accuracy reduction of $\text{TextHide}_{\text{intra}}$ is 1.9% compared to the baseline model. With the same $(m, k)$, $\text{TextHide}_{\text{inter}}$ incurs an additional 2.5% accuracy loss on average, but as previously suggested, the large public corpus gives a stronger notion of security.

Security of Gradients in TextHide
We test TextHide against the gradients matching attack in federated learning (Zhu et al., 2019), which has been shown to be effective in recovering private inputs from public gradients.
Gradients matching attack. Given a public model and the gradients generated by private data from a client, the attacker aims to recover the private data. He starts with randomly initialized dummy data and dummy labels (i.e., a dummy batch). In each attack iteration, he calculates the $\ell_2$-distance between the gradients generated by the dummy batch and the real gradients, and backpropagates that loss to update the dummy batch (see Algorithm 3 in Appendix C for details).
The original attack is infeasible in the TextHide setting because the attacker cannot backpropagate the loss of the dummy batch through the secret mask of each client. Thus, we strengthen the attack by allowing the attacker to learn the mask: at the beginning of the attack, he also generates dummy masks and backpropagates the gradient loss to update them.
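A condensed sketch of this enhanced attack (our paraphrase; the learnable dummy mask and the way it enters the model are our assumptions, and loss_fn is assumed to accept soft labels):

import torch

def gradient_matching_attack(model, loss_fn, true_grads, d, num_labels,
                             steps=300):
    # Jointly optimize a dummy input, label, and mask so that their
    # gradients match the eavesdropped ones (true_grads).
    dummy_x = torch.randn(1, d, requires_grad=True)
    dummy_y = torch.randn(1, num_labels, requires_grad=True)
    dummy_sigma = torch.randn(1, d, requires_grad=True)  # learned mask
    opt = torch.optim.LBFGS([dummy_x, dummy_y, dummy_sigma])
    for _ in range(steps):
        def closure():
            opt.zero_grad()
            pred = model(dummy_sigma * dummy_x)    # mask applied entry-wise
            loss = loss_fn(pred, dummy_y.softmax(dim=-1))
            grads = torch.autograd.grad(loss, model.parameters(),
                                        create_graph=True)
            # l2 distance between dummy gradients and the real ones.
            dist = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
            dist.backward()
            return dist
        opt.step(closure)
    return dummy_x, dummy_y, dummy_sigma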
Setup and metric. We use the code7 of the original paper (Zhu et al., 2019) for evaluation. Since their code does not include attacks on text data, we adapted their computer-vision setting (see Appendix C for more details). We use the success rate as the metric: an attack is said to be successful if the mean squared error between the original input and the samples recovered from the gradients is ≤ 0.001. We vary two key variables in the evaluation: $k$ and $d$, where $d$ is the dimensionality of the representation (768 for $\text{BERT}_{\text{base}}$).
Test the leakage upper bound. We run the attack in a setting that is much easier for the attacker, to test the upper bound of privacy leakage:
• The TextHide scheme uses a single mask throughout training (i.e., $m = 1$).
• The batch size is 1.8
• The attacker knows the true label for each private input.9

7 https://github.com/mit-han-lab/dlg
8 The original paper (Zhu et al., 2019) pointed out that attacking a larger batch is more difficult.
9 As suggested by Zhao et al. (2020), guessing the correct label is crucial for success in the attack.

TextHide makes gradients matching harder. As shown in Table 2, increasing $d$ greatly increases the difficulty of the attack: with no mixing ($k = 1$), a representation with $d = 1024$ reduces the success rate from 82% (baseline) to only 8%. The defense becomes much stronger when combined with mixing: a small mask of 4 entries combined with $k = 2$ makes the attack infeasible in the tested setting. Figure 3 suggests that the success of this attack largely depends on whether the mask is successfully matched, which is aligned with the security argument of TextHide in Section 3.4.

Effectiveness of Hiding Representations
We also design an attack-based evaluation to test whether TextHide representations effectively "hide" the original representations, i.e., how different a TextHide representation is from its original representation. In Appendix C, we present another attack, which suggests that a deep architecture cannot be trained to reconstruct original representations from TextHide representations.

Representation-based Similarity Search (RSS).

Given a corpus of size $n$, and 1) a search index $\{(x_i, e_i)\}_{i \in [n]}$, where $x_i$ is the $i$-th example in the training corpus and $e_i$ is $x_i$'s original representation, and 2) a query $\tilde{e}$, the TextHide representation of some input $x$ in the corpus, RSS returns the $x_v$ from the index with $v = \arg\max_{i \in [n]} \cos(e_i, \tilde{e})$, i.e., the nearest neighbor of the query in cosine similarity. If $x_v$ is dramatically different from $x$, then $\tilde{e}$ hides $e$ (the original representation of $x$) effectively. To build the search index, we dump all $(x_i, e_i)$ pairs of a corpus by extracting each sentence's [CLS] token from the baseline BERT model. We use Facebook's FAISS library (Johnson et al., 2017) for efficient similarity search to implement RSS.

Table 3: Averaged similarity scores of five metrics over 1,000 independent RSS attacks on CoLA (a) and SST-2 (b). For each score, the scheme with the worst similarity (best hiding) is marked in bold. Rand: random baseline. As shown, the attacker against TextHide performs similarly to random guessing.

Table 4: Example queries and answers of RSS with different representation schemes. We mark words with similar meanings in the same color, and annotate the acceptability label for CoLA and the sentiment label for SST-2. Querying with a mix-only representation can still retrieve the original sentence (Query 1) or a sentence with similar meaning (Query 2).

Metrics. The evaluation requires measuring the similarity of a sentence pair $(x, x^*)$, where $x$ is a sample in the corpus and $x^*$ is RSS's answer given $x$'s encoding as the query. Our evaluation uses three explicit leakage metrics:
• Identity: 1 if $x^* = x$, else 0.
• Jaccard$_{\text{sim}}$: $|\text{words in } x \cap \text{words in } x^*| \, / \, |\text{words in } x \cup \text{words in } x^*|$
• TF-IDF$_{\text{sim}}$: cosine similarity between $x$'s and $x^*$'s TF-IDF representations in the corpus
and two implicit (semantic) leakage metrics:
• Label: 1 if $x^*$ and $x$ have the same label, else 0.
• SBERT$_{\text{sim}}$: cosine similarity between $x$'s and $x^*$'s SBERT representations pretrained on NLI-STS (Reimers and Gurevych, 2019).
For all five metrics above, a larger value indicates a higher similarity between $x$ and $x^*$, i.e., worse "hiding".
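A minimal sketch of the RSS attack with FAISS, using inner product on L2-normalized float32 vectors so that nearest-neighbor search becomes cosine-similarity search (function names are ours):

import faiss
import numpy as np

def build_rss_index(E):
    # E: (n, d) float32 matrix of original [CLS] encodings.
    faiss.normalize_L2(E)               # cosine similarity == inner product
    index = faiss.IndexFlatIP(E.shape[1])
    index.add(E)
    return index

def rss_query(index, sentences, e_query):
    # e_query: (1, d) float32 TextHide representation used as the query.
    faiss.normalize_L2(e_query)
    _, v = index.search(e_query, 1)     # nearest neighbor in cosine similarity
    return sentences[int(v[0][0])]      # the attacker's guess x_v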
Test Setup. For ease of demonstration, we run RSS on two single-sentence datasets, CoLA and SST-2, with $\text{TextHide}_{\text{intra}}$. The results presumably generalize to larger datasets and to $\text{TextHide}_{\text{inter}}$, since attacking a small corpus with weaker security is easier than attacking a larger one with stronger security. For each task, we test three $(m, k)$ variants: baseline ($m = 0, k = 1$), mix-only ($m = 0, k = 4$), and TextHide ($m = 256, k = 4$). We also report a random baseline for reference: for each query, the attacker returns an input randomly selected from the index.
Baseline. The result with the original representation as the query can be viewed as an upper bound of privacy leakage, where no defense has been taken. As shown in Table 3 and Table 4, RSS returns the correct answer almost all the time (i.e., Identity close to 1), which is a severe explicit leakage.
Mix-only. The mix-only representation greatly reduces explicit leakage compared to the undefended baseline (i.e., gives much lower similarity on the first three metrics). However, RSS can still retrieve the original sentence from mix-only representations (see Query 1 in Table 4). Also, semantic leakage, measured by Label and SBERT$_{\text{sim}}$, remains higher than the random baseline.
TextHide. TextHide works well in protecting both explicit and semantic information: sample attacks on TextHide (see Table 4) return sentences seemingly irrelevant to the original sentence hidden in the query representation. Note that the sophisticated attacker (RSS) against TextHide performs similarly to a naive random-guessing attacker.

Related Work
Differential privacy. Differential privacy (Dwork et al., 2006;Dwork and Roth, 2014) adds noise drawn from certain distributions to provide guarantees of privacy.
Applying differential privacy techniques in distributed deep learning is interesting but non-trivial. Shokri and Shmatikov (2015) proposed a distributed learning scheme that directly adds noise to the shared gradients. Abadi et al. (2016) proposed to dynamically keep track of privacy spending based on the composition theorem (Dwork, 2009), and McMahan et al. (2018) adapted this approach to train large recurrent language models. However, the amount of privacy guaranteed drops with the number of training epochs and the size of the shared parameters (Papernot et al., 2020), and it remains unclear how much privacy can still be guaranteed in practical settings.

Cryptographic methods. Homomorphic encryption (Gentry, 2009; Graepel et al., 2012; Li et al., 2017) and secure multi-party computation (MPC) (Yao, 1982; Beimel, 2011; Mohassel and Zhang, 2017; Dolev et al., 2019) allow multiple data sites (clients) to jointly train a model over their private inputs in a distributed learning setting. Recent work proposed to use cryptographic methods to secure federated learning by designing a secure gradient-aggregation protocol (Bonawitz et al., 2017) or by encrypting gradients (Aono et al., 2017). However, these approaches share the same key drawback: they slow down computation by several orders of magnitude and are thus currently impractical for deep learning.

InstaHide. See Section 2.
Privacy in NLP. Training with user-generated language data raises privacy concerns: sensitive information can take the form of key phrases explicitly contained in the text (Harman et al., 2012; Hard et al., 2018), or it can be implicit (Coavoux et al., 2018; Pan et al., 2020), e.g., text data contains latent information about the author and situation (Hovy and Spruit, 2016; Elazar and Goldberg, 2018). Recently, Song and Raghunathan (2020) suggested that text embeddings from language models such as BERT can be inverted to partially recover some of the input data.
To deal with explicit privacy leakage in NLP, Zhang et al. (2018b) added DP noise to TF-IDF (Salton and McGill, 1986) textual vectors, and Hu et al. (2020) obfuscated text by substituting each word with a new word of similar syntactic role. However, both approaches suffer a large utility loss when trying to ensure practical privacy.
Adversarial learning (Li et al., 2018; Hu et al., 2020) has been used to address implicit leakage by learning representations that are invariant to privacy-sensitive attributes. Similarly, Mosallanezhad et al. (2019) used reinforcement learning to automatically learn a strategy for reducing private-attribute leakage by playing against an attribute-inference attacker. However, these approaches do not defend against explicit leakage.

Conclusion
We have presented TextHide, a practical approach for privacy-preserving NLP training within a pre-training and fine-tuning framework in a federated learning setting. It requires all participants to add a simple encryption step with a one-time secret key, and imposes only a slight burden in terms of computation cost and accuracy. Attackers who wish to break such encryption and recover user inputs have to pay a large computational cost.
We see this as a first step in using cryptographic ideas to address privacy issues in language tasks. We hope our work motivates further research, including applications to other NLP tasks. An important next step could be to successfully train language models directly on encrypted texts, as is done for image classifiers.

A k-VECTOR SUBSET SUM

Cryptosystem design since the 1970s seeks to ensure that attackers can violate privacy only by solving a computationally expensive task. A simple example is the VECTOR SUBSET SUM problem (Bhattacharyya et al., 2011; Abboud et al., 2014). Here a set of $N$ vectors $v_1, v_2, \ldots, v_N \in \mathbb{R}^d$ is publicly released. The defender picks secret indices $i_1, i_2, \ldots, i_k \in [N] \stackrel{\text{def}}{=} \{1, \cdots, N\}$ and publicly releases the vector $\sum_{j=1}^{k} v_{i_j}$. Given this released vector, the attacker has to find the secret indices $i_1, i_2, \ldots, i_k$. In the worst case, even when the answer happens to be unique, finding the secret indices requires $\geq N^{k/2}$ time (Abboud and Lewi, 2013) under the famous Exponential Time Hypothesis (ETH) conjecture (Impagliazzo et al., 1998). Note that ETH is a stronger assumption than NP ≠ P, and it is widely accepted in the computational complexity community.
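To see the barrier concretely, the naive attack sweeps all $\binom{N}{k}$ index sets; a toy sketch (illustrative only):

import itertools
import numpy as np

def brute_force_vss(V, target, k):
    # Naive attack on VECTOR SUBSET SUM: try every size-k index set.
    # V: (N, d) public vectors; target: the released sum of k secret vectors.
    # Already ~4 * 10^14 candidates for N = 10^4, k = 4; under ETH no
    # attack does much better than N^(k/2).
    for idx in itertools.combinations(range(len(V)), k):
        if np.allclose(V[list(idx)].sum(axis=0), target):
            return idx
    return None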

B.2 More Evaluations
Compatibility with the state-of-the-art model. To test whether TextHide is also compatible with state-of-the-art models, we repeat our accuracy evaluation from Section 4.2, replacing the $\text{BERT}_{\text{base}}$ model with the $\text{RoBERTa}_{\text{base}}$ model (Liu et al., 2019).
As shown in Table 5, TextHide behaves consistently for $\text{BERT}_{\text{base}}$ and $\text{RoBERTa}_{\text{base}}$: when incorporated with $\text{RoBERTa}_{\text{base}}$, the average accuracy reduction of $\text{TextHide}_{\text{intra}}$ is 1.1% compared with the baseline model (vs. 1.9% for $\text{BERT}_{\text{base}}$), and $\text{TextHide}_{\text{inter}}$ incurs an additional 2.6% accuracy loss on average (vs. 2.5% for $\text{BERT}_{\text{base}}$).
$\text{TextHide}_{\text{inter}}$ with different public corpora: a case study of SST-2. We investigate whether using different public corpora affects the performance of $\text{TextHide}_{\text{inter}}$. We fix SST-2 as the private dataset, set $(m, k) = (256, 4)$, and choose the public corpora from unlabeled {QNLI, QQP, MNLI}. We intentionally make the public corpora larger than the private dataset (SST-2 in this test), since $\text{TextHide}_{\text{inter}}$ was designed to use a large public corpus as the source of randomness to provide useful security. Table 6 suggests that for our case study of SST-2, the choice of the public corpus does not have a major impact on the final accuracy of $\text{TextHide}_{\text{inter}}$. However, this may not be true for every dataset.

Algorithm 3 Gradients matching attack (Zhu et al., 2019) in TextHide
1: Require:
2:   The function $F(x; W)$ can be thought of as a neural network
3:   For each $l \in [L]$, we define $W_l \in \mathbb{R}^{m_l \times m_{l-1}}$ to be the weight matrix in the $l$-th layer, with $m_0 = d_i$ and $m_L = d_o$
4:   Let $W = \{W_1, W_2, \cdots, W_L\}$ denote the weights over all layers
5:   Let $\mathcal{L} : \mathbb{R}^{d_o \times d_o} \rightarrow \mathbb{R}$ denote the loss function
6:   Let $g(\sigma, x, y) = \nabla \mathcal{L}(F(x; W), y)$ denote the gradients of the loss function
7:   Let $\bar{g} = g(\sigma, x, y)|_{\sigma = \sigma_0, x = x_0, y = y_0}$ denote the gradients computed on $x_0$ with label $y_0$ and secret mask $\sigma_0$
8: procedure INPUTRECOVERYFROMGRADIENTS
9:   $x^{(1)} \leftarrow \mathcal{N}(0, 1)$, $y^{(1)} \leftarrow \mathcal{N}(0, 1)$, $\sigma^{(1)} \leftarrow \mathcal{N}(0, 1)$ ▷ random initialization of the input, label and mask
10:   for $t = 1 \rightarrow T$ do
11:     Let $D_g(\sigma, x, y) = \| g(\sigma, x, y) - \bar{g} \|_2^2$
        ⋮ ▷ lines 12-14: update $x^{(t)}, y^{(t)}, \sigma^{(t)}$ by descending on $D_g$
15:   end for
16:   return $x^{(T+1)}, y^{(T+1)}, \sigma^{(T+1)}$
17: end procedure

C Details of attacks

C.1 Gradients matching attack
Algorithm 3 describes the gradients matching attack (Zhu et al., 2019) in the TextHide setting. This attack aims to recover the original image from the model gradients computed on it. As discussed in Section 2, masks are kept private in the TextHide setting; thus, the attacker also needs to start from a dummy mask (line 9) and iteratively update it to match the real mask (line 14). In our experiment, we made this attack much easier for the attacker by revealing to him the real ground-truth label ($y_0$ in line 7), which means he simply sets $y^{(t)} = y_0$ throughout the attack.
Dataset and architecture. We used CIFAR-10 (Krizhevsky, 2009) as the dataset and LeNet-5 (LeCun et al., 1998) as the architecture to mimic TextHide.

Table 8: Similarity scores of five metrics for RepRecon on the CoLA (a) and SST-2 (b) datasets. We report the average score over 500 independent queries. Test queries come only from the dev set. For each score, the scheme with the worst similarity (best hiding) is marked in bold. As shown, the attacker against TextHide performs similarly to random guessing.

Given the original LeNet-5, we first removed the last linear layer with output size $d_o$, which gives us a new network; we use $d_c$ to denote the output size of this new network. Then, we appended an MLP with hidden-layer size $d_m$ and output size $d_o$ to the new architecture. As in an $(m, k)$-TextHide scheme, for each private input we first get its TextHide representation by extracting the output of the hidden layer, mix it with representations of other datapoints, and then apply a mask to the combination. Note that in this mimic setting, the mask's dimension is $d_m$.

C.2 Representation-based Similarity Search (RSS)
Running time. For CoLA, building the search index takes 267 seconds; each search takes < 0.1 seconds. For SST-2, building the index takes 1,576 seconds; each search takes < 0.1 seconds.

C.3 Representation Reconstruction (RepRecon)
RepRecon tests whether a deep architecture can learn to defeat our "hiding" scheme. For a representation $e \in \mathbb{R}^d$ and its TextHide version $\tilde{e} \in \mathbb{R}^d$, RepRecon tries to reconstruct $e$ from $\tilde{e}$ by training a network $f : \mathbb{R}^d \rightarrow \mathbb{R}^d$ such that $\| e - f(\tilde{e}) \|_2$ is minimized.
We use a multilayer perceptron with hidden-layer sizes (1024, 1024) as the reconstruction architecture. We train the network on the train set of a benchmark for 20 epochs and run the evaluation on the dev set. We then run RSS to map each recovered representation to its closest sentence in the index and measure the privacy leakage.
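A sketch of the reconstruction network and its training loop (the optimizer and learning rate are our assumptions beyond the stated architecture and objective):

import torch
import torch.nn as nn

def train_reprecon(E_hidden, E_orig, d=768, epochs=20, lr=1e-3):
    # Learn f: R^d -> R^d minimizing ||e - f(e_tilde)||_2 over the train set.
    # E_hidden: (n, d) TextHide representations; E_orig: (n, d) originals.
    f = nn.Sequential(
        nn.Linear(d, 1024), nn.ReLU(),
        nn.Linear(1024, 1024), nn.ReLU(),
        nn.Linear(1024, d),
    )
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((E_orig - f(E_hidden)) ** 2).sum(dim=1).sqrt().mean()
        loss.backward()
        opt.step()
    return f   # recovered e_hat = f(e_tilde) is then fed to RSS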
Quantitative and qualitative results of RepRecon are shown in Table 8 and Table 7.