A Differentially Private Text Perturbation Method Using a Regularized Mahalanobis Metric

Balancing the privacy-utility tradeoff is a crucial requirement of many practical machine learning systems that deal with sensitive customer data. A popular approach for privacy-preserving text analysis is noise injection, in which text data is first mapped into a continuous embedding space, perturbed by sampling a spherical noise from an appropriate distribution, and then projected back to the discrete vocabulary space. While this allows the perturbation to admit the required metric differential privacy, often the utility of downstream tasks modeled on this perturbed data is low because the spherical noise does not account for the variability in the density around different words in the embedding space. In particular, words in a sparse region are likely unchanged even when the noise scale is large. In this paper, we propose a text perturbation mechanism based on a carefully designed regularized variant of the Mahalanobis metric to overcome this problem. For any given noise scale, this metric adds an elliptical noise to account for the covariance structure in the embedding space. This heterogeneity in the noise scale along different directions helps ensure that the words in the sparse region have sufficient likelihood of replacement without sacrificing the overall utility. We provide a text-perturbation algorithm based on this metric and formally prove its privacy guarantees. Additionally, we empirically show that our mechanism improves the privacy statistics to achieve the same level of utility as compared to the state-of-the-art Laplace mechanism.


Introduction
Machine learning has been successfully utilized in a wide variety of real world applications including information retrieval, computer graphics, speech recognition, and text mining. Technology companies like Amazon, Google, and Microsoft already provide MLaaS (Machine Learning as a Service), where customers can input their datasets for model training and receive black-box prediction results as output. However, those datasets may contain personal and potentially sensitive information, which can be exploited to identify the individuals in the datasets even if the data has been anonymized (Sweeney, 1997; Narayanan and Shmatikov, 2008). Removing personally identifiable information is often inadequate, since having access to summary statistics on the dataset has been shown to be sufficient to infer an individual's membership in the dataset with high probability (Homer et al., 2008; Sankararaman et al., 2009; Dwork et al., 2015). Moreover, machine learning models themselves can reveal information about the training data. In particular, sophisticated deep neural networks for natural language processing tasks like next word prediction or neural machine translation often tend to memorize their training data, which makes them vulnerable to leaking information about it (Shokri et al., 2017; Salem et al., 2018).
To provide a quantifiable privacy guarantee against such information leakage, Differential Privacy (DP) has been adopted as a standard framework for privacy-preserving analysis in statistical databases (Dwork et al., 2006;Dwork, 2008;Dwork et al., 2014). Intuitively, a randomized algorithm is differentially private if the output distributions from two neighboring databases are indistinguishable. However, a direct application of DP to text analysis can be too restrictive because it requires a lower bound on the probability of any word to be replaced by any other word in the vocabulary.
Metric differential privacy arises as a generalization of local differential privacy (Kasiviswanathan et al., 2011). It originated in protecting location privacy, where locations near the user's location are assigned higher probability while those far away are given negligible probability. In the context of privacy-preserving text analysis, metric differential privacy implies that the indistinguishability of the output distributions of any two words in the vocabulary is scaled by their distance. Distance metrics used in the literature include the Hamming distance (which reduces to DP), Manhattan distance (Chatzikokolakis et al., 2015), Euclidean distance (Fernandes et al., 2019; Feyisetan et al., 2020), Chebyshev distance (Wagner and Eckhoff, 2018), and hyperbolic distance (Feyisetan et al., 2019).
In this paper, we propose a novel privacy-preserving text perturbation method that adds an elliptical noise to word embeddings in the Euclidean space, where the scale of the noise is calibrated by the regularized Mahalanobis norm (formally defined in Section 3). We compare our method to the existing multivariate Laplace mechanism for privacy-preserving text analysis in the Euclidean space (Fernandes et al., 2019; Feyisetan et al., 2020). In both papers, text perturbation is implemented by adding a spherical noise sampled from a multivariate Laplace distribution to the original word embedding. However, the spherical noise does not account for the structure in the embedding space. In particular, words in a sparse region are likely unchanged even when the scale of noise is large. This can potentially result in a severe privacy breach when sensitive words do not get perturbed. To increase the substitution probability of words in sparse regions, the scale of noise has to be large in the multivariate Laplace mechanism, which hurts downstream machine learning utility.
We address this problem by adding an elliptical noise to word embeddings according to the covariance structure in the embedding space. The intuition is that, for a fixed noise scale, we want to stretch the equidistant contour of the noise in the direction that increases the average substitution probability of words in sparse regions. Intuitively, this is the direction that explains the largest variability among the word embedding vectors in the vocabulary. We prove the theoretical metric differential privacy guarantee of the proposed method. Furthermore, we use empirical analysis to show that the proposed method significantly improves the privacy statistics while achieving the same level of utility as the multivariate Laplace mechanism. Our main contributions are as follows:
• We develop a novel Mahalanobis mechanism for differentially private text perturbation, which calibrates the elliptical noise by accounting for the covariance structure in the word embedding space.
• A theoretical metric differential privacy proof is provided for the proposed method.
• We compare the privacy statistics and utility results between our method and the multivariate Laplace Mechanism through experiments, which demonstrates that our method has significantly better privacy statistics while preserving the same level of utility.

Related Work
Privacy-preserving text analysis is a well-studied problem in the literature (Hill et al., 2016). One of the common approaches is to identify sensitive terms (like personally identifiable information) in a document and replace them with some more general terms (Cumby and Ghani, 2011; Anandan et al., 2012; Sánchez and Batet, 2016). Another line of research achieves text redaction by injecting additional words into the original text without detecting sensitive entities (Domingo-Ferrer et al., 2009; Pang et al., 2010; Sánchez et al., 2013). However, those methods are shown to be vulnerable to reidentification attacks (Petit et al., 2015). In order to provide a quantifiable theoretical privacy guarantee, the differential privacy framework (Dwork, 2008) has been used for privacy-preserving text analysis. In the DPText model (Beigi et al., 2019), an element-wise univariate Laplace noise is added to pre-trained autoencoders to provide privacy for text representations. Another approach for privacy-preserving text perturbation is in the metric differential privacy framework, an extended notion of local differential privacy (Kasiviswanathan et al., 2011), which adds noise to pre-trained word embeddings. Metric differential privacy requires that the indistinguishability of the output distributions of any two words in the vocabulary be scaled by their distance, and reduces to differential privacy when the Hamming distance is used. A hyperbolic distance metric (Feyisetan et al., 2019) was proposed to provide privacy by perturbing vector representations of words, but it requires specialized training of word embeddings in the high-dimensional hyperbolic space. For word embeddings in the Euclidean space (Fernandes et al., 2019; Feyisetan et al., 2020), text perturbation is implemented by sampling independent spherical noise from multivariate Laplace distributions.
The former work (Fernandes et al., 2019) subsequently used an Earth mover's metric to derive a Bag-of-Words representation on the text, whereas the latter (Feyisetan et al., 2020) directly worked on the word-level embeddings. Since we work with word embeddings in the Euclidean space, we compare our method to the multivariate Laplace mechanism for text perturbation in those two papers (Fernandes et al., 2019;Feyisetan et al., 2020).
Mahalanobis distance has been used as a sensitivity metric for differential privacy in functional data (Hall et al., 2012, 2013) and differentially private outlier analysis (Okada et al., 2015). Outside the realm of text analysis, Mahalanobis distance is a common tool in cluster analysis, pattern recognition, and anomaly detection (De Maesschalck et al., 2000; Xiang et al., 2008; Warren et al., 2011; Zhao et al., 2015; Zhang et al., 2015).

Methodology
We begin by formally defining the Euclidean norm and the regularized Mahalanobis norm.

Definition 1 (Euclidean Norm). For any vector x ∈ R^m, its Euclidean norm is ‖x‖_2 = (Σ_{i=1}^m x_i²)^{1/2}.

Definition 2 (Mahalanobis Norm). For any vector x ∈ R^m and a positive definite matrix Σ, its Mahalanobis norm is ‖x‖_M = ‖Σ^{−1/2} x‖_2 = (x^T Σ^{−1} x)^{1/2}.

Definition 3 (Regularized Mahalanobis Norm). For any vector x ∈ R^m, λ ∈ [0, 1], and a positive definite matrix Σ, its regularized Mahalanobis norm is

‖x‖_RM = ‖{λΣ + (1 − λ)I_m}^{−1/2} x‖_2.

From the definitions above, λ can be considered a tuning parameter. When λ = 0, the regularized Mahalanobis norm reduces to the Euclidean norm; when λ = 1, it reduces to the Mahalanobis norm (Mahalanobis, 1936). Note that for any η > 0, the contour {y ∈ R^m : ‖y − x‖_2 = η} is spherical, whereas the contour {y ∈ R^m : ‖y − x‖_RM = η} is elliptical unless λ = 0. We will exploit this key difference in the geometry of the equidistant contours between the Euclidean norm and the regularized Mahalanobis norm to motivate our text perturbation method. The proposed regularized Mahalanobis norm is a type of shrinkage estimator (Daniels and Kass, 2001; Schäfer and Strimmer, 2005; Couillet and McKay, 2014), which is commonly used to stabilize estimates of the covariance matrix of high-dimensional vectors. The matrix Σ in the regularized Mahalanobis norm controls the direction in which the equidistant contour of the noise distribution is stretched, while the parameter λ controls the degree of the stretch.

Definition 4 (Metric Differential Privacy). For any ε > 0, a randomized algorithm M : X → Y satisfies ε d_X-privacy if for any x, x' ∈ X and y ∈ Y, the following holds:

Pr{M(x) = y} ≤ e^{ε d(x, x')} Pr{M(x') = y}.

Metric differential privacy (d_X-privacy) originated in privacy-preserving geolocation studies, where the metric d is the Euclidean distance. It has been extended to quantify the privacy guarantee in text analysis: for any two words w, w' in the vocabulary W, the likelihood ratio of observing any ŵ ∈ W is bounded by e^{ε d(φ(w), φ(w'))}, where φ is the embedding function. The multivariate Laplace mechanism (Fernandes et al., 2019; Feyisetan et al., 2020) perturbs the word embedding φ(w) by adding a random spherical noise Z sampled from the density f_Z(z) ∝ e^{−ε‖z‖_2}, and then outputs the nearest neighbor of the perturbed embedding in the embedding space.
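The three norms above differ only in the matrix that whitens the vector before taking the Euclidean length. A minimal sketch (the function name is ours, not from the paper; numpy assumed):

```python
import numpy as np

def regularized_mahalanobis_norm(x, Sigma, lam):
    """||x||_RM = ||(lam*Sigma + (1-lam)*I_m)^{-1/2} x||_2.

    lam = 0 recovers the Euclidean norm; lam = 1 the Mahalanobis norm.
    """
    m = x.shape[0]
    M = lam * Sigma + (1.0 - lam) * np.eye(m)
    # Solve M y = x instead of inverting M: ||x||_RM^2 = x^T M^{-1} x.
    return float(np.sqrt(x @ np.linalg.solve(M, x)))
```

For example, with Σ = I_m the norm equals the Euclidean norm for every λ, and with λ = 1 a vector lying along an eigenvector of Σ is shrunk by the square root of the corresponding eigenvalue.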
The left panel in Figure 1 illustrates the text perturbation by spherical noise in the multivariate Laplace mechanism. Here A is in the sparse region in the two-dimensional embedding space.
Given the privacy budget ε, A has a small probability of being substituted by other words because its expected perturbation (dotted trajectory) still has itself as the nearest neighbor. The right panel in Figure 1 shows text perturbation in the proposed Mahalanobis mechanism, where the word embeddings are redacted by an elliptical noise at the same privacy budget ε. The matrix Σ is taken to be the sample covariance matrix of the word embeddings scaled by the mean sample variance, so that the noise contour is stretched toward the direction that explains the largest variability in the embedding space. The purpose of the scaling step is to ensure that the scale of the elliptical noise is the same as the scale of the spherical noise. By transforming the spherical noise contour into an elliptical contour, we increase the substitution probability of A in the sparse region, thus improving the privacy guarantee. Meanwhile, since the scale of noise does not change, the utility is preserved at the same level. This is an illustrative example demonstrating the intuition and motivation of the proposed Mahalanobis mechanism.
Our proposed Mahalanobis mechanism for text perturbation shares the same general structure as the multivariate Laplace mechanism. The key difference is that the spherical noise f_Z(z) ∝ exp(−ε‖z‖_2) in the multivariate Laplace mechanism is replaced by the elliptical noise sampled from the density f_Z(z) ∝ exp(−ε‖z‖_RM), which can be efficiently performed via Algorithm 1.
An overview for the proposed Mahalanobis mechanism is presented in Algorithm 2. When λ = 0, the proposed Mahalanobis mechanism reduces to the multivariate Laplace mechanism. A heuristic method for choosing the tuning parameter λ is to find the value of λ that maximizes the improvement in the privacy guarantee while maintaining the same level of utility. This can be done through empirical privacy and utility experiments as described in Section 5. The input Σ in Algorithm 2 is computed by scaling the sample covariance matrix of the word embeddings by the mean sample variance so as to guarantee that the trace of Σ equals the trace of I m . Since Σ is a scaled counterpart of the sample covariance matrix, it will stretch the elliptical noise toward the direction with the largest variability in the word embedding space, which maximizes the overall expected probability of words being substituted. We remark that in order to maximize the substitution probability for each individual word, a personalized covariance matrix Σ w can be computed in the neighborhood of each word. This is beyond the scope of this paper and we leave it as future work.
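The scaling of Σ described above can be sketched as follows (a minimal illustration with a hypothetical function name; numpy assumed). Dividing the sample covariance by the mean sample variance makes trace(Σ) = m = trace(I_m), so the elliptical noise has the same overall scale as the spherical noise:

```python
import numpy as np

def scaled_covariance(embeddings):
    """Scale the sample covariance of the word embeddings by the mean
    sample variance so that trace(Sigma) = m, matching trace(I_m).

    embeddings: (n_words, m) array of word embedding vectors.
    """
    S = np.cov(embeddings, rowvar=False)   # (m, m) sample covariance
    mean_var = np.trace(S) / S.shape[0]    # mean sample variance
    return S / mean_var                    # now trace(Sigma) = m
```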

Theoretical Properties
Lemma 1. The random variable Z returned from Algorithm 1 has a probability density function of the form f_Z(z) ∝ exp(−ε‖z‖_RM).
Proof. Define U = Y X. Note that conditional on Y = y, U follows a uniform distribution on the sphere of radius y in the m-dimensional space, which implies f_{U|Y}(u|y) ∝ y^{−(m−1)} δ(‖u‖_2 − y), where δ(·) is the Dirac delta function. Since Y is Gamma distributed with density f_Y(y) ∝ y^{m−1} e^{−εy}, marginalizing over Y gives

f_U(u) ∝ ∫_0^∞ y^{−(m−1)} δ(‖u‖_2 − y) · y^{m−1} e^{−εy} dy = e^{−ε‖u‖_2}.

Since Z = {λΣ + (1 − λ)I_m}^{1/2} U, which is well defined because λ ∈ [0, 1] and Σ is positive definite, a change of variables yields

f_Z(z) ∝ f_U({λΣ + (1 − λ)I_m}^{−1/2} z) ∝ exp(−ε ‖{λΣ + (1 − λ)I_m}^{−1/2} z‖_2) = exp(−ε‖z‖_RM).

The result in the lemma follows by Definition 3.
Theorem 1. For any ε > 0, λ ∈ [0, 1], and positive definite matrix Σ, the Mahalanobis mechanism M : W^n → W^n satisfies metric differential privacy with respect to the regularized Mahalanobis metric: for any strings s = [w_1 ... w_n], s' = [w'_1 ... w'_n], and ŝ = [ŵ_1 ... ŵ_n],

Pr{M(s) = ŝ} ≤ exp(ε Σ_{i=1}^n ‖φ(w_i) − φ(w'_i)‖_RM) · Pr{M(s') = ŝ},

where φ : W → R^m is the embedding function.

Proof. We begin by showing that for any w, w', ŵ ∈ W,

Pr{M(w) = ŵ} ≤ e^{ε‖φ(w) − φ(w')‖_RM} Pr{M(w') = ŵ}.

Define C_ŵ = {v ∈ R^m : ‖v − φ(ŵ)‖_2 < min_{u ∈ W\{ŵ}} ‖v − φ(u)‖_2}, the set of vectors v that are closer to ŵ than to any other word in the embedding space. Let Z be sampled from f_Z(z) ∝ exp(−ε‖z‖_RM) by Algorithm 1. Then

Pr{M(w) = ŵ} = Pr{φ(w) + Z ∈ C_ŵ} ∝ ∫_{C_ŵ} exp(−ε‖v − φ(w)‖_RM) dv,

where the last step follows from Lemma 1. Since Σ is positive definite, it admits a spectral decomposition Σ = QΛQ^T, where Q^T = Q^{−1} and Λ is a diagonal matrix with positive entries ξ_1, ..., ξ_m. Then we can rewrite {λΣ + (1 − λ)I_m}^{−1} = QΩQ^T, where Ω^{−1} = λΛ + (1 − λ)I_m. Define ṽ = Ω^{1/2}Q^T v and φ̃(w) = Ω^{1/2}Q^T φ(w). By the triangle inequality,

‖v − φ(w)‖_RM = ‖ṽ − φ̃(w)‖_2 ≤ ‖ṽ − φ̃(w')‖_2 + ‖φ̃(w') − φ̃(w)‖_2 = ‖v − φ(w')‖_RM + ‖φ(w) − φ(w')‖_RM.

The probability ratio is therefore bounded by

Pr{M(w) = ŵ} / Pr{M(w') = ŵ} = ∫_{C_ŵ} exp(−ε‖v − φ(w)‖_RM) dv / ∫_{C_ŵ} exp(−ε‖v − φ(w')‖_RM) dv ≤ e^{ε‖φ(w) − φ(w')‖_RM}.

Finally, since each word in the string is processed independently,

Pr{M(s) = ŝ} = Π_{i=1}^n Pr{M(w_i) = ŵ_i} ≤ exp(ε Σ_{i=1}^n ‖φ(w_i) − φ(w'_i)‖_RM) · Π_{i=1}^n Pr{M(w'_i) = ŵ_i} = exp(ε Σ_{i=1}^n ‖φ(w_i) − φ(w'_i)‖_RM) · Pr{M(s') = ŝ}.

Algorithm 1: Sampling from f_Z(z) ∝ exp(−ε‖z‖_RM)
1. Input: dimension m, privacy parameter ε, a positive definite matrix Σ, tuning parameter λ ∈ [0, 1].
2. Sample an m-dimensional random vector N from a multivariate normal distribution with mean zero and identity covariance matrix.
3. Normalize X = N/‖N‖_2.
4. Sample Y from a Gamma distribution with shape parameter m and scale parameter 1/ε.
5. Return Z = {λΣ + (1 − λ)I_m}^{1/2} · Y X.

Algorithm 2: The Mahalanobis mechanism
1. Input: string s = w_1 w_2 ... w_n, privacy parameter ε, matrix Σ, tuning parameter λ ∈ [0, 1].
2. For each word w_i in s:
3.   Compute the embedding φ(w_i).
4.   Sample Z_i from f_Z(z) ∝ exp(−ε‖z‖_RM) via Algorithm 1 and set φ̃_i = φ(w_i) + Z_i.
5.   Replace w_i with ŵ_i = arg min_{w ∈ W} ‖φ(w) − φ̃_i‖_2.
6. Return ŝ = ŵ_1 ŵ_2 ... ŵ_n.

Next, we relate the proved theoretical guarantee of the Mahalanobis mechanism to that of the multivariate Laplace mechanism (Fernandes et al., 2019; Feyisetan et al., 2020), which enjoys a metric differential privacy guarantee with respect to the Euclidean metric. The following lemma will help establish our result.
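Algorithms 1 and 2 can be sketched as follows (function and variable names are ours for illustration; `embed` is a word-to-vector lookup and `vocab_vecs` a |W| × m matrix of vocabulary embeddings; numpy assumed):

```python
import numpy as np

def sample_regularized_mahalanobis(m, Sigma, lam, eps, rng):
    """Algorithm 1: draw Z with density f_Z(z) proportional to exp(-eps * ||z||_RM)."""
    n = rng.normal(size=m)
    x = n / np.linalg.norm(n)                # uniform direction on the unit sphere
    y = rng.gamma(shape=m, scale=1.0 / eps)  # radius ~ Gamma(m, 1/eps)
    M = lam * Sigma + (1.0 - lam) * np.eye(m)
    # Symmetric square root of M via eigendecomposition.
    w, Q = np.linalg.eigh(M)
    M_half = Q @ np.diag(np.sqrt(w)) @ Q.T
    return M_half @ (y * x)

def mahalanobis_mechanism(words, embed, vocab, vocab_vecs, Sigma, lam, eps, rng):
    """Algorithm 2 sketch: perturb each word's embedding with elliptical
    noise, then snap to the nearest word in the vocabulary."""
    out = []
    for w in words:
        z = sample_regularized_mahalanobis(len(embed[w]), Sigma, lam, eps, rng)
        phi_tilde = embed[w] + z
        dists = np.linalg.norm(vocab_vecs - phi_tilde, axis=1)
        out.append(vocab[int(np.argmin(dists))])
    return out
```

With a large ε (small noise scale m/ε), a word whose neighbors are far away is almost always returned unchanged, which is exactly the sparse-region behavior the elliptical noise is designed to mitigate at moderate ε.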
Lemma 2. Let v ∈ R^m, λ ∈ [0, 1], and m = trace(Σ). Let c > 0 be a lower bound on the smallest eigenvalue of Σ. Then the following bounds hold:

‖v‖_2 / √(λm + 1 − λ) ≤ ‖v‖_RM ≤ ‖v‖_2 / √(λc + 1 − λ).

Proof. Since Σ is positive definite, it admits a spectral decomposition Σ = QΛQ^T, where Q^T = Q^{−1} and Λ is a diagonal matrix with eigenvalues ξ_1, ..., ξ_m. Writing x = Q^T v, we have ‖v‖_RM² = Σ_{i=1}^m x_i² / {λξ_i + (1 − λ)}. By the assumption that the minimum eigenvalue is greater than c > 0, and since each positive eigenvalue satisfies ξ_i ≤ trace(Σ) = m, we have λc + (1 − λ) < λξ_i + (1 − λ) ≤ λm + (1 − λ) for every i. The bounds follow from ‖v‖_2² = Σ_{i=1}^m x_i². The lemma below then follows.
Lemma 3. Assume trace(Σ) = m and that the minimum eigenvalue of Σ is greater than c for some constant c > 0. Then for any w, w' ∈ W,

exp(ε‖φ(w) − φ(w')‖_2 / √(λm + 1 − λ)) ≤ exp(ε‖φ(w) − φ(w')‖_RM) ≤ exp(ε‖φ(w) − φ(w')‖_2 / √(λc + 1 − λ)).

The fact that the probability-ratio bound as a function of ‖·‖_RM can be sandwiched between lower and upper bounds that are functions of ‖·‖_2 shows that the noise scale is comparable between the Mahalanobis mechanism and the multivariate Laplace mechanism.
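The sandwich bounds of Lemma 2 can be checked numerically. The sketch below (our own construction, assuming the lemma's conditions trace(Σ) = m and smallest eigenvalue c > 0; numpy assumed) verifies the bounds on random vectors:

```python
import numpy as np

def rm_norm(x, Sigma, lam):
    """Regularized Mahalanobis norm ||x||_RM."""
    M = lam * Sigma + (1.0 - lam) * np.eye(len(x))
    return float(np.sqrt(x @ np.linalg.solve(M, x)))

# A Sigma satisfying the assumptions: trace(Sigma) = m, smallest eigenvalue c.
m, lam = 3, 0.7
eigs = np.array([1.8, 0.9, 0.3])  # eigenvalues sum to m = 3; c = 0.3
Sigma = np.diag(eigs)
c = eigs.min()

rng = np.random.default_rng(2)
for _ in range(100):
    v = rng.normal(size=m)
    lo = np.linalg.norm(v) / np.sqrt(lam * m + (1 - lam))
    hi = np.linalg.norm(v) / np.sqrt(lam * c + (1 - lam))
    assert lo <= rm_norm(v, Sigma, lam) <= hi
```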

Experiments
We empirically compare the proposed Mahalanobis mechanism and the existing multivariate Laplace mechanism in both privacy experiments and utility experiments on the following two datasets (more details in Appendix A):
• Twitter Dataset. This is a publicly available Kaggle competition dataset (https://www.kaggle.com/c/nlp-getting-started), which contains 7,613 tweets, each with a label indicating whether the tweet describes a disaster event (43% disaster).
• SMSSpam Dataset. This is a publicly available collection of SMS messages, each with a label indicating whether the message is spam.

Privacy experiments
In the privacy experiments, we compare the Mahalanobis mechanism with the multivariate Laplace mechanism on the following two privacy statistics:
1. N_w = Pr{M(w) = w}, the probability of the word not getting redacted by the mechanism. This is approximated by counting the number of times an input word w is not replaced by another word over 100 runs of the mechanism.
2. S_w = |{w' ∈ W : Pr{M(w) = w'} ≥ η}|, the number of distinct words that have a probability of at least η of being the output of M(w). This is approximated by counting the number of distinct substitutions for an input word w over 100 runs of the mechanism.
We note that N_w and S_w have been previously used in the privacy-preserving text analysis literature to qualitatively characterize the privacy guarantee (Feyisetan et al., 2019, 2020). We make the following connection between the privacy statistics (N_w and S_w) and the DP privacy budget ε: a smaller ε corresponds to a stronger privacy guarantee, obtained by adding noise of larger scale (1/ε) in the mechanism, which leads to fewer unperturbed words (lower N_w) and more diverse outputs for each word (higher S_w). For any fixed noise scale 1/ε, the mechanism with the better privacy guarantee will have a lower value of N_w and a higher value of S_w.
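Estimating these two statistics from repeated runs can be sketched as follows (hypothetical names; `mechanism` is any callable implementing M on a single word):

```python
from collections import Counter

def privacy_stats(mechanism, w, runs=100, eta_count=1):
    """Estimate N_w and S_w for one word by repeated perturbation.

    N_w: number of runs in which w is returned unchanged.
    S_w: number of distinct outputs appearing at least eta_count times,
         a finite-sample proxy for Pr{M(w) = w'} >= eta.
    """
    outputs = Counter(mechanism(w) for _ in range(runs))
    N_w = outputs.get(w, 0)
    S_w = sum(1 for cnt in outputs.values() if cnt >= eta_count)
    return N_w, S_w
```

A mechanism with a stronger privacy guarantee at the same noise scale should drive N_w down and S_w up.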
In Figures 2 and 3, we summarize how the 5th, 50th, and 95th percentiles of N_w and S_w change across different configurations of the mechanisms using the 300-d FastText embedding (Bojanowski et al., 2017). The vocabulary set includes the 28,596 words in the union of the vocabularies of the two real datasets. For all of the 5th, 50th, and 95th percentiles, the Mahalanobis mechanism has a lower value of N_w and a higher value of S_w than the multivariate Laplace mechanism, which indicates an improvement in the privacy statistics. Tables 1 and 2 compare the mean and standard deviation of N_w and S_w across different settings. The mean N_w converges to 0 as ε decreases and converges to 100 as ε increases; the opposite trend is observed for S_w, as expected. In the middle range of the privacy budget (ε = 5, 10, 20), the proposed Mahalanobis mechanism has significantly lower values of N_w and higher values of S_w, where the statistical significance is established by comparing the 95% confidence intervals for the mean N_w and S_w, of the form mean ± 1.96 × std/√100. While the scale of the noise is controlled to be the same across settings, the probability that a word does not change becomes smaller and the number of distinct substitutions becomes larger under our proposed mechanism. This shows the advantage of the Mahalanobis mechanism over the multivariate Laplace mechanism in privacy statistics.
The results are qualitatively similar when we repeat the same set of privacy experiments using the 300-d GloVe embeddings (Pennington et al., 2014). As can be seen in Figure 4, Figure 5, Table 3, and Table 4, the Mahalanobis mechanism has lower values of N_w and higher values of S_w compared to the Laplace mechanism, which demonstrates a better privacy guarantee.

(Figure 4: Percentiles of N_w for the 300-d GloVe embedding over 100 repetitions. The Mahalanobis mechanism has lower values of N_w than the Laplace mechanism.)

Utility Experiments
In the utility experiments, we compare the Mahalanobis mechanism with the multivariate Laplace mechanism in terms of text classification perfor-mance on the two real datasets.
On the Twitter Dataset, the task is to classify whether a tweet describes a disaster event, where the benchmark FastText model (Joulin et al., 2016) achieves 0.78 accuracy, 0.78 precision, and 0.69 recall. On the SMSSpam Dataset, the task is spam classification, where the benchmark Bag-of-Words model achieves 0.99 accuracy, 0.92 precision, and 0.99 recall.

(Figure 6: Text classification results on the Twitter Dataset. There is no significant difference across mechanisms in terms of accuracy, precision, and recall, showing that utility is maintained at the same level by the proposed Mahalanobis mechanism across λ.)
In both tasks, we use 70% of the data for training and 30% of the data for testing. The word embedding vectors are from 300-d FastText. Figures 6 and 7 present the utility results in terms of accuracy, precision, and recall on the two text classification tasks, respectively. As a general trend on both the Twitter and SMSSpam Datasets, classification accuracy increases with ε and eventually approaches the benchmark performance, as expected. There are cases where the recall drops as ε increases on the SMSSpam Dataset, but the drop is not significant, as the recall values are all around 0.95 or higher. Across the range of λ, the difference in utility between the Mahalanobis mechanism and the multivariate Laplace mechanism is negligible. Together with the results in Section 5.1, we conclude that our proposed mechanism improves the privacy statistics while maintaining utility at the same level.

Conclusions
We develop a differentially private Mahalanobis mechanism for text perturbation. Compared to the existing multivariate Laplace mechanism, our mechanism exploits the geometric properties of elliptical noise to improve the privacy statistics while maintaining a similar level of utility. Our method can be readily extended to privacy-preserving analysis in other natural language processing tasks, where utility can be defined according to specific needs.
We remark that the choice of Σ as the global covariance matrix of the word embeddings can be generalized to a personalized covariance matrix within the neighborhood of each word. In this sense, local sensitivity can be used instead of global sensitivity to calibrate the privacy-utility tradeoff. This can be done by adding a preprocessing clustering step on the word embeddings in the vocabulary, and then performing the Mahalanobis mechanism within each cluster using the cluster-specific covariance matrix.
Furthermore, the choice of the tuning parameter λ can also be formulated as an optimization problem with respect to pre-specified privacy and utility constraints. Since λ is the only tuning parameter on a bounded interval of [0, 1], a grid search would suffice, which can be conducted by finding the λ value that maximizes the utility (privacy) objective given the fixed privacy (utility) constraints.