Adversarial Learning of Privacy-Preserving Text Representations for De-Identification of Medical Records

De-identification is the task of detecting protected health information (PHI) in medical text. It is a critical step in sanitizing electronic health records (EHR) to be shared for research. Automatic de-identification classifiers can significantly speed up the sanitization process. However, obtaining a large and diverse dataset to train such a classifier that works well across many types of medical text poses a challenge as privacy laws prohibit the sharing of raw medical records. We introduce a method to create privacy-preserving shareable representations of medical text (i.e. they contain no PHI) that does not require expensive manual pseudonymization. These representations can be shared between organizations to create unified datasets for training de-identification models. Our representation allows training a simple LSTM-CRF de-identification model to an F1 score of 97.4%, which is comparable to a strong baseline that exposes private information in its representation. A robust, widely available de-identification classifier based on our representation could potentially enable studies for which de-identification would otherwise be too costly.


Introduction
Electronic health records (EHRs) are a valuable resource that could potentially be used in large-scale medical research (Botsis et al., 2010; Birkhead et al., 2015; Cowie et al., 2017). In addition to structured medical data, EHRs contain free-text patient notes that are a rich source of information (Jensen et al., 2012). However, due to privacy and data protection laws, medical records can only be shared and used for research if they are sanitized so that they no longer include information that could identify patients. The PHI that may not be shared includes potentially identifying information such as names, geographic identifiers, dates, and account numbers; the American Health Insurance Portability and Accountability Act (HIPAA, 1996) defines 18 categories of PHI. De-identification is the task of finding and labeling PHI in medical text as a step toward sanitization. As the information to be removed is very sensitive, sanitization always requires final human verification. Automatic de-identification labeling can, however, significantly speed up the process, as shown for other annotation tasks in e.g. Yimam (2015).
Trying to create an automatic classifier for de-identification leads to a "chicken and egg problem" (Uzuner et al., 2007): without a comprehensive training set, an automatic de-identification classifier cannot be developed, but without access to automatic de-identification, it is difficult to share large corpora of medical text in a privacy-preserving way for research (including for training the classifier itself). The standard method of data protection compliant sharing of training data for a de-identification classifier requires humans to pseudonymize protected information with substitutes in a document-coherent way. This includes replacing e.g. every person or place name with a different name, offsetting dates by a random amount while retaining date intervals, and replacing misspellings with similar misspellings of the pseudonym (Uzuner et al., 2007).
As of 2019, a pseudonymized dataset for de-identification from a single source, the i2b2 2014 dataset, is publicly available. However, de-identification classifiers trained on this dataset do not generalize well to data from other sources (Stubbs et al., 2017). To obtain a universal de-identification classifier, many medical institutions would have to pool their data. However, preparing this data for sharing using the document-coherent pseudonymization approach requires substantial human effort.
Figure 1: Sharing training data for de-identification. PHI annotations are marked with [brackets]. Upper alternative: traditional process using manual pseudonymization. Lower alternative: our approach of sharing private vector representations. The people icon represents tasks done by humans; the gears icon represents tasks done by machines; the lock icon represents privacy-preserving artifacts. Manual pseudonymization is marked with a dollar icon to emphasize its high costs.
To address this problem, we introduce an adversarially learned representation of medical text that allows privacy-preserving sharing of training data for a de-identification classifier by transforming text non-reversibly into a vector space and only sharing this representation. Our approach still requires humans to annotate PHI (as this is the training data for the actual de-identification task) but the pseudonymization step (replacing PHI with coherent substitutes) is replaced by the automatic transformation to the vector representation instead. A classifier then trained on our representation cannot contain any protected data, as it is never trained on raw text (as long as the representation does not allow for the reconstruction of sensitive information). The traditional approach to sharing training data is conceptually compared to our approach in Fig. 1.

Related Work
Our work builds upon two lines of research: firstly de-identification, as the system has to provide good de-identification performance, and secondly adversarial representation learning, to remove all identifying information from the representations to be distributed.

Automatic De-Identification
Analogously to many natural language processing tasks, the state of the art in de-identification changed in recent years from rule-based systems and shallow machine learning approaches like conditional random fields (CRFs) (Uzuner et al., 2007;Meystre et al., 2010) to deep learning methods (Stubbs et al., 2017;Dernoncourt et al., 2017;Liu et al., 2017).
Three i2b2 shared tasks on de-identification were run in 2006 (Uzuner et al., 2007), 2014, and 2016 (Stubbs et al., 2017). The organizers performed manual pseudonymization on clinical records from a single source to create the datasets for each of the tasks. An F1 score of 95% has been suggested as a target for reasonable de-identification systems. Dernoncourt et al. (2017) first applied a long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) model with a CRF output component to de-identification. Transfer learning from a larger dataset slightly improves performance on the i2b2 2014 dataset (Lee et al., 2018). Liu et al. (2017) achieve state-of-the-art performance in de-identification by combining a deep learning ensemble with a rule component.
Up to and including the 2014 shared task, the organizers emphasized that it is unclear if a system trained on the provided datasets will generalize to medical records from other sources (Uzuner et al., 2007). The 2016 shared task featured a sight-unseen track in which de-identification systems were evaluated on records from a new data source. The best system achieved an F1 score of 79%, suggesting that systems at the time were not able to deliver sufficient performance on completely new data (Stubbs et al., 2017).

Adversarial Representation Learning
Fair representations (Zemel et al., 2013; Hamm, 2015) aim to encode features of raw data in a way that allows them to be used in e.g. machine learning algorithms while obfuscating membership in a protected group or other sensitive attributes. The domain-adversarial neural network (DANN) architecture (Ganin et al., 2016) is a deep learning implementation of a three-party game between a representer, a classifier, and an adversary component. The classifier and the adversary are deep learning models with shared initial layers. A gradient reversal layer is used to worsen the representation for the adversary during back-propagation: when training the adversary, the adversary-specific part of the network is optimized for the adversarial task but the shared part is updated against the gradient to make the shared representation less suitable for the adversary.
Although initially conceived for use in domain adaptation, DANNs and similar adversarial deep learning models have recently been used to obfuscate demographic attributes from text (Elazar and Goldberg, 2018; Li et al., 2018) and subject identity (Feutry et al., 2018) from images. Elazar and Goldberg (2018) warn that when a representation is learned using gradient reversal methods, continued adversary training on the frozen representation may allow adversaries to break representation privacy. To test whether the unwanted information can still be extracted from the learned representation, adversary training therefore needs to continue on the frozen representation after training of the system has finished. Only if the information cannot be recovered after this continued adversary training do we have evidence that it is no longer contained in the representation.

Dataset and De-Identification Model
We evaluate our approaches using the i2b2 2014 dataset, which was released as part of the 2014 i2b2/UTHealth shared task track 1 and is the largest publicly available dataset for de-identification today. It contains 1304 free-text documents with PHI annotations. The i2b2 dataset uses the 18 categories of PHI defined by HIPAA as a starting point for its own set of PHI categories. In addition to the HIPAA set of categories, it includes (sub-)categories such as doctor names, professions, states, countries, and ages under 90. We compare three different approaches: a non-private de-identification classifier and two privacy-enabled extensions, automatic pseudonymization (Section 4) and adversarially learned representations (Section 5).
Our non-private system as well as the privacy-enabled extensions are based on a bidirectional LSTM-CRF architecture that has been shown to work well in sequence tagging (Huang et al., 2015; Lample et al., 2016) and de-identification (Dernoncourt et al., 2017; Liu et al., 2017). We only use pre-trained FastText (Bojanowski et al., 2017) or GloVe (Pennington et al., 2014) word embeddings, not explicit character embeddings, as we suspect that these may allow easy re-identification of private information if used in shared representations. In place of learned character features, we provide the casing feature from Reimers and Gurevych (2017) as an additional input. The feature maps words to a one-hot representation of their casing (numeric, mainly numeric, all lower, all upper, initial upper, contains digit, or other). Table 1 shows our raw de-identification model's hyperparameter configuration, which was determined through a random hyperparameter search.
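The casing feature can be sketched as a small lookup function. This is a minimal illustration, not the authors' code: the class names follow the list above, while the order of the checks, the 50% threshold for "mainly numeric", and the function names are assumptions.

```python
# Seven casing classes, in the order listed in the text.
CASING_CLASSES = ["numeric", "mainly_numeric", "all_lower", "all_upper",
                  "initial_upper", "contains_digit", "other"]


def casing_class(token):
    """Assign a token to one casing class (the check order is an assumption)."""
    if not token:
        return "other"
    digits = sum(ch.isdigit() for ch in token)
    if digits == len(token):
        return "numeric"
    if digits / len(token) > 0.5:  # assumed threshold for "mainly numeric"
        return "mainly_numeric"
    if token.islower():
        return "all_lower"
    if token.isupper():
        return "all_upper"
    if token[0].isupper():
        return "initial_upper"
    if digits > 0:
        return "contains_digit"
    return "other"


def casing_one_hot(token):
    """Return the one-hot vector for the token's casing class."""
    cls = casing_class(token)
    return [1 if name == cls else 0 for name in CASING_CLASSES]
```

Under these assumptions, `casing_one_hot("James")` sets only the "initial upper" position, so the tagger receives a capitalization cue without seeing any characters of the token.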

Automatic Pseudonymization
To provide a baseline to compare our primary approach against, we introduce a naïve word-level automatic pseudonymization approach that exploits the fact that state-of-the-art de-identification models (Liu et al., 2017; Dernoncourt et al., 2017) as well as our non-private de-identification model work on the sentence level and do not rely on document coherency. Before training, we shuffle the training sentences across documents and move each PHI token to a random token chosen from its N closest neighbors in an embedding space (a set that includes the token itself).
Figure 2 (architecture overview; labels: example input "James was admitted ...", Representation Model, De-identification output, Adversary output).
While the resulting sentences do not necessarily make sense to a reader (e.g. "Croix Scott" is not a realistic hospital name), their embedding representations are similar to the originals. We train our de-identification model on the transformed data and test it on the raw data. The number of neighbors N controls the privacy properties of the approach: N = 1 means no pseudonymization; setting N to the number of rows in a precomputed embedding matrix delivers perfect anonymization but the resulting data may be worthless for training a de-identification model.
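The neighbor-based replacement of PHI tokens can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the toy embedding table, the use of cosine similarity for ranking, and all function names are assumptions made for the example.

```python
import math
import random

# Hypothetical toy embedding table; real experiments use FastText or GloVe.
TOY_EMBEDDINGS = {
    "james": [1.0, 0.1],
    "henry": [0.9, 0.2],
    "was": [0.1, 0.9],
    "admitted": [0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbors(token, embeddings, n):
    """Return the n tokens closest to `token`; the token itself ranks first."""
    query = embeddings[token]
    ranked = sorted(embeddings,
                    key=lambda t: cosine(query, embeddings[t]),
                    reverse=True)
    return ranked[:n]


def pseudonymize(tokens, is_phi, embeddings, n, rng):
    """Move every PHI token to a random one of its n nearest neighbors."""
    out = []
    for token, phi in zip(tokens, is_phi):
        if phi and token in embeddings:
            out.append(rng.choice(nearest_neighbors(token, embeddings, n)))
        else:
            out.append(token)  # non-PHI and unknown tokens stay unchanged
    return out
```

With `n=1` the only candidate is the token itself, so the sentence is returned unchanged, matching the statement that N = 1 means no pseudonymization.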

Adversarial Representation
We introduce a new data sharing approach that is based on an adversarially learned private representation and improves on the pseudonymization from Section 4. After training the representation on an initial publicly available dataset, e.g. the i2b2 2014 data, a central model provider shares the frozen representation model with participating medical institutions. They transform their PHI-labeled raw data into the pseudonymized representation, which is then pooled into a new public dataset for de-identification. Periodically, the pipeline consisting of the representation model and a trained de-identification model can be published to be used by medical institutions on their unlabeled data.
Since both the representation model and the resulting representations are shared in this scenario, our representation procedure is required to prevent two attacks:
A1. Learning an inverse representation model that transforms representations back to original sentences containing PHI.
A2. Building a lookup table of inputs and their exact representations that can be used in known plaintext attacks.

Architecture
Our approach uses a model that is composed of three components: a representation model, the de-identification model from Section 3, and an adversary. An overview of the architecture is shown in Fig. 2. The representation model maps a sequence of word embeddings to an intermediate vector representation sequence. The de-identification model receives this representation sequence as an input instead of the original embedding sequence. It retains the casing feature as an auxiliary input. The adversary has two inputs, the representation sequence and an additional embedding or representation sequence, and a single output unit.

Representation
To protect against A1, our representation must be invariant to small input changes, like a single PHI token being replaced with a neighbor in the embedding space. Again, the number of neighbors N controls the privacy level of the representation.
To protect against A2, we add a random element to the representation that makes repeated transformations of one sentence indistinguishable from representations of similar input sentences.
We use a bidirectional LSTM model to implement the representation. It applies Gaussian noise N with zero mean and trainable standard deviations to the input embeddings E and the output sequence. The model learns a standard deviation for each of the input and output dimensions.
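The noise step can be illustrated in isolation. The sketch below is an assumption-laden simplification: it applies zero-mean Gaussian noise with one standard deviation per dimension to a sequence of vectors, but the standard deviations are passed in as fixed numbers instead of being trained, and the function name is hypothetical.

```python
import random


def add_gaussian_noise(sequence, stddevs, rng):
    """Add zero-mean Gaussian noise to each vector of a sequence.

    sequence: list of vectors (one per token)
    stddevs:  one standard deviation per vector dimension; in the model these
              would be trainable weights, here they are plain numbers
    rng:      a random.Random instance, so the example is reproducible
    """
    return [
        [x + rng.gauss(0.0, abs(s)) for x, s in zip(vector, stddevs)]
        for vector in sequence
    ]


rng = random.Random(42)
embeddings = [[0.2, -0.1, 0.5], [0.0, 0.3, -0.4]]  # two tokens, three dimensions
noisy_once = add_gaussian_noise(embeddings, [0.1, 0.1, 0.5], rng)
noisy_twice = add_gaussian_noise(embeddings, [0.1, 0.1, 0.5], rng)
```

Because fresh noise is drawn on every call, `noisy_once` and `noisy_twice` differ even though the input is identical, which is the property that works against the lookup-table attack A2.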
In a preliminary experiment, we confirmed that adding noise with a single, fixed standard deviation is not a viable approach for privacy-preserving representations. To change the cosine similarity neighborhoods of embeddings at all, we need to add high amounts of noise (more than double the respective embedding matrix's standard deviation), which in turn results in unrealistic embeddings that do not allow training a de-identification model of sufficient quality.
In contrast to the automatic pseudonymization approach from Section 4, which only perturbs PHI tokens, the representation model in this approach processes all tokens to represent them in a new embedding space. We evaluate the representation sizes d ∈ {50, 100, 300}.

Adversaries
We use two adversaries that are trained on tasks that directly follow from A1 and A2:
T1. Given a representation sequence and an embedding sequence, decide if they were obtained from the same sentence.
T2. Given two representation sequences (and their cosine similarities), decide if they were obtained from the same sentence.
We generate the representation sequences for the second adversary from a copy of the representation model with shared weights. We generate real and fake pairs for adversarial training using the automatic pseudonymization approach presented in Section 4, limiting the number of replaced PHI tokens to one per sentence. The adversaries are implemented as bidirectional LSTM models with single output units. We confirmed that this type of model is able to learn the adversarial tasks on random data and raw word embeddings in preliminary experiments. To use the two adversaries in our architecture, we average their outputs.
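The construction of real and fake pairs can be sketched on the token level. This is only an illustration under stated assumptions: the neighbor table is hypothetical, the label convention (1 for pairs from the same sentence, 0 otherwise) is an assumption, and the actual adversary inputs are embedding and representation sequences rather than token lists.

```python
import random


def make_pair(tokens, phi_positions, neighbors, rng, fake):
    """Return (sentence_a, sentence_b, label) for adversary training.

    A real pair repeats the sentence (label 1). A fake pair replaces exactly
    one PHI token with one of its neighbors (label 0).
    """
    original = list(tokens)
    if not fake or not phi_positions:
        return original, list(tokens), 1
    position = rng.choice(phi_positions)
    candidates = [t for t in neighbors.get(tokens[position], [])
                  if t != tokens[position]]
    if not candidates:  # nothing to swap in, fall back to a real pair
        return original, list(tokens), 1
    changed = list(tokens)
    changed[position] = rng.choice(candidates)
    return original, changed, 0


# Hypothetical neighbor table for a single PHI token.
NEIGHBORS = {"james": ["james", "henry", "jim"]}
sentence = ["james", "was", "admitted"]
real_pair = make_pair(sentence, [0], NEIGHBORS, random.Random(1), fake=False)
fake_pair = make_pair(sentence, [0], NEIGHBORS, random.Random(1), fake=True)
```

Limiting the change to one token makes the adversarial task as hard as possible for the representation: it has to hide even a single-token difference.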

Training
We evaluate two training procedures: DANN training (Ganin et al., 2016) and the three-part procedure from Feutry et al. (2018).
In DANN training, the three components are trained conjointly, optimizing the sum of losses. Training the de-identification model modifies the representation model weights to generate a more meaningful representation for de-identification. The adversary gradient is reversed with a gradient reversal layer between the adversary and the representation model in the backward pass, causing the representation to become less meaningful for the adversary.
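The effect of gradient reversal on a shared weight can be shown with a toy scalar example. This is a minimal sketch, not the DANN implementation: the quadratic adversary loss, the learning rate, and the function names are hypothetical.

```python
def adversary_loss(w):
    """Hypothetical adversary loss as a function of one shared weight."""
    return (w - 2.0) ** 2


def adversary_grad(w):
    """Analytic gradient of the toy adversary loss."""
    return 2.0 * (w - 2.0)


def update_shared_weight(w, lr=0.1, reverse=True):
    """Take one gradient step on the shared weight.

    With reverse=True the sign of the gradient is flipped before the update,
    which mimics what a gradient reversal layer does in the backward pass.
    """
    grad = adversary_grad(w)
    if reverse:
        grad = -grad
    return w - lr * grad


w0 = 1.0
w_plain = update_shared_weight(w0, reverse=False)    # ordinary descent step
w_reversed = update_shared_weight(w0, reverse=True)  # step with reversed gradient
```

In this toy setting the ordinary step lowers the adversary loss, while the reversed step raises it, which corresponds to the shared representation becoming less useful for the adversary.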
The training procedure by Feutry et al. (2018) is shown in Fig. 3. It is composed of three phases:
P1. The de-identification and representation models are pre-trained together, optimizing the de-identification loss $l_{deid}$.
P2. The representation model is frozen and the adversary is pre-trained, optimizing the adversarial loss $l_{adv}$.
P3. In alternation, for one epoch each:
(a) The representation is frozen and both the de-identification model and the adversary are trained, optimizing their respective losses $l_{deid}$ and $l_{adv}$.
(b) The de-identification model and the adversary are frozen and the representation is trained, optimizing the combined loss
$$l_{repr} = l_{deid} + \lambda \, |l_{adv} - l_{random}| \qquad (2)$$
In each of the first two phases, the respective validation loss is monitored to decide at which point the training should move on to the next phase. The alternating steps in the third phase each last one training epoch; the early stopping time for the third phase is determined using only the combined validation loss from Phase P3b. Gradient reversal is achieved by optimizing the combined representation loss while the adversary weights are frozen. The combined loss is motivated by the fact that the adversary performance should be the same as that of a random guessing model, which is a lower bound for anonymization (Feutry et al., 2018). The term $|l_{adv} - l_{random}|$ approaches 0 when the adversary performance approaches random guessing. $\lambda$ is a weighting factor for the two losses; we select $\lambda = 1$.
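The combined representation loss can be written as a small function. The sketch below assumes that the adversary is trained with binary cross-entropy, so that the loss of a random guesser is ln 2; this value and the function names are assumptions for illustration, not taken from the text.

```python
import math


def binary_cross_entropy(probability, label):
    """Binary cross-entropy of one prediction (label is 0 or 1)."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1.0 - probability))


# Assumption: a random guesser always predicts 0.5, which gives a loss of ln 2.
L_RANDOM = binary_cross_entropy(0.5, 1)


def representation_loss(l_deid, l_adv, lam=1.0, l_random=L_RANDOM):
    """Combined loss: l_deid + lam * |l_adv - l_random|."""
    return l_deid + lam * abs(l_adv - l_random)
```

The penalty term is zero only when the adversary loss equals the random-guessing loss. An adversary that does better than chance and one that does worse than chance are both penalized, which is the point of using the absolute difference instead of simply maximizing the adversary loss.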

Experiments
To evaluate our approaches, we perform experiments using the i2b2 2014 dataset.
Preprocessing: We apply aggressive tokenization similarly to Liu et al. (2017), including splitting at all punctuation marks and mid-word, e.g. if a number is followed by a word ("25yo" is split into "25", "yo"), in order to minimize the number of GloVe out-of-vocabulary tokens. We extend spaCy's sentence splitting heuristics with additional rules for splitting at multiple blank lines as well as bulleted and numbered list items.
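An aggressive tokenizer of this kind can be approximated with a single regular expression. This is a rough sketch under assumptions: the pattern and the function name are illustrative and do not reproduce the exact rules of the authors or of Liu et al. (2017).

```python
import re

# Runs of digits, runs of letters, and single punctuation characters each
# become one token; whitespace is dropped.
TOKEN_PATTERN = re.compile(r"\d+|[^\W\d_]+|[^\w\s]|_")


def aggressive_tokenize(text):
    """Split text at whitespace, punctuation, and digit/letter boundaries."""
    return TOKEN_PATTERN.findall(text)
```

For example, `aggressive_tokenize("25yo")` yields `["25", "yo"]`, and a date such as "12/03/2014" is split into its numeric parts and the separating slashes.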
Deep Learning Models: We use the Keras framework (Chollet et al., 2015) with the TensorFlow backend (Abadi et al., 2015) to implement our deep learning models.
Evaluation: In order to compare our results to the state of the art, we use the token-based binary HIPAA F1 score as our main metric for de-identification performance. Dernoncourt et al. (2017) deem it the most important metric: deciding if an entity is PHI or not is generally more important than assigning the correct category of PHI, and only HIPAA categories of PHI are required to be removed by American law. Non-PHI tokens are not incorporated in the F1 score. We perform the evaluation with the official i2b2 evaluation script.
Table 2 shows de-identification performance results for the non-private de-identification classifier (upper part, in comparison to the state of the art) as well as the two privacy-enabled extensions (lower part). The results are averages over five experiment runs.
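The idea behind a token-based binary F1 score can be sketched in a few lines. This is not the official i2b2 evaluation script and it ignores the restriction to HIPAA categories; the function name and the boolean encoding of PHI labels are assumptions.

```python
def binary_token_f1(gold, predicted):
    """Token-level F1 for the PHI class.

    gold, predicted: equally long sequences of truthy/falsy values, where a
    truthy value marks a token as PHI. Tokens that are non-PHI in both
    sequences influence neither precision nor recall.
    """
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if not g and p)
    false_neg = sum(1 for g, p in pairs if g and not p)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Because the score is binary, a token tagged with the wrong PHI category still counts as a true positive as long as it is tagged as PHI at all.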

Non-private De-Identification Model
When trained on the raw i2b2 2014 data, our models achieve F1 scores that are comparable to Dernoncourt et al.'s results. The casing feature improves GloVe by 0.4 percentage points.

Automatic Pseudonymization
For both FastText and GloVe, moving training PHI tokens to random tokens from up to their N = 200 closest neighbors does not significantly reduce de-identification performance (see Fig. 4). F1 scores for both models drop to around 95% when selecting from N = 500 neighbors and to around 90% when using N = 1 000 neighbors. With N = 100, the FastText model achieves an F1 score of 96.75% and the GloVe model achieves an F1 score of 96.42%.

Adversarial Representation
We do not achieve satisfactory results with the conjoint DANN training procedure: in all cases, our models learn representations that are not sufficiently resistant to the adversary. When training the adversary on the frozen representation for an additional 20 epochs, it is able to distinguish real from fake input pairs on a test set with accuracies above 80%. This confirms the difficulties of DANN training as described by Elazar and Goldberg (2018) (see Section 2.2).
In contrast, with the three-part training procedure, we are able to learn a representation that allows training a de-identification model while preventing an adversary from learning the adversarial tasks, even with continued training on a frozen representation. Figure 5 (left) shows our de-identification results when using adversarially learned representations. A higher number of neighbors N means a stronger invariance requirement for the representation. For values of N up to 1 000, our FastText and GloVe models are able to learn representations that allow training de-identification models that reach or exceed the target F1 score of 95%. However, training becomes unstable for N > 500: at this point, the adversary is able to break the representation privacy when trained for an additional 50 epochs (Fig. 5 right).
Our choice of representation size d ∈ {50, 100, 300} does not influence de-identification or adversary performance, so we select d = 50 for further evaluation. For d = 50 and N = 100, the FastText model reaches an F1 score of 97.4% and the GloVe model reaches an F1 score of 96.89%.

De-Identification Performance
In the following, we discuss the results of our models with regard to our goal of sharing sensitive training data for automatic de-identification. Overall, privacy-preserving representations come at a cost, as our best privacy-preserving model scores 0.27 percentage points lower in F1 than our best non-private model; we consider this relative increase of errors of less than 10% as tolerable.
Raw Text De-Identification: We find that the choice of GloVe or FastText embeddings does not meaningfully influence de-identification performance. FastText's approach to embedding unknown words (word embeddings are the sum of their subword embeddings) should intuitively prove useful on datasets with misspellings and ungrammatical text. However, when using the additional casing feature, FastText beats GloVe only by 0.05 percentage points on the i2b2 test set. In this task, the casing feature makes up for GloVe's inability to embed unknown words. Liu et al. (2017) use a deep learning ensemble in combination with hand-crafted rules to achieve state-of-the-art results for de-identification. Our model's scores are similar to the previous state of the art, a bidirectional LSTM-CRF model with character features (Dernoncourt et al., 2017).
Automatically Pseudonymized Data: Our naïve automatic word-level pseudonymization approach allows training reasonable de-identification models when selecting from up to N = 500 neighbors. There is almost no decrease in F1 score for up to N = 20 neighbors for both the FastText and GloVe model.
Adversarially Learned Representation: Our adversarially trained vector representation allows training reasonable de-identification models (F1 scores above 95%) when using up to N = 1 000 neighbors as an invariance requirement. The adversarial representation results beat the automatic pseudonymization results because the representation model can act as a task-specific feature extractor. Additionally, the representations are more general as they are invariant to word changes.

Privacy Properties
In this section, we discuss our models with respect to their privacy-preserving properties.
Embeddings: When looking up embedding space neighbors for words, it is notable that many FastText neighbors include the original word or parts of it as a subword. For tokens that occur as PHI in the i2b2 training set, on average 7.37 of their N = 100 closest neighbors in the FastText embedding matrix contain the original token as a subword. When looking up neighbors using GloVe embeddings, the value is 0.44. This may indicate that FastText requires stronger perturbation (i.e. higher N) than GloVe to sufficiently obfuscate protected information.
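A statistic of this kind can be computed with a simple substring check. The sketch below is an assumption about how such a count might be done: it is case-insensitive and skips neighbors identical to the token, neither of which is specified in the text, and the neighbor list is made up for illustration.

```python
def subword_neighbor_count(token, neighbors):
    """Count neighbors that contain `token` as a proper substring.

    Assumptions: comparison is case-insensitive, and a neighbor that is
    identical to the token is not counted.
    """
    needle = token.lower()
    return sum(
        1 for neighbor in neighbors
        if neighbor.lower() != needle and needle in neighbor.lower()
    )


# Hypothetical neighbor list for the token "smith".
example_neighbors = ["smiths", "Smithson", "jones", "smith", "blacksmith"]
leaky = subword_neighbor_count("smith", example_neighbors)
```

Averaging this count over all PHI tokens gives one number per embedding type, which makes it possible to compare how often a neighbor still leaks the original string.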
Automatically Pseudonymized Data: The word-level pseudonymization does not guarantee a minimum perturbation for every word, as we observed in a set of pseudonymized sentences using N = 100 FastText neighbors. Additionally, the approach may allow an adversary to piece together documents from the shuffled sentences. If multiple sentences contain similar pseudonymized identifiers, they will likely come from the same original document, undoing the privacy gain from shuffling training sentences across documents. It may be possible to infer the original information using the overlapping neighbor spaces. To counter this, we can re-introduce document-level pseudonymization, i.e. moving all occurrences of a PHI token to the same neighbor. However, we would then also need to detect misspelled names as well as other hints to the actual tokens and transform them similarly to the original, which would add back much of the complexity of manual pseudonymization that we try to avoid.
Adversarially Learned Representation: Our adversarial representation empirically satisfies a strong privacy criterion: representations are invariant to any protected information token being replaced with any of its N neighbors in an embedding space. When freezing the representation model from an experiment run using up to N = 500 neighbors and training the adversary for an additional 50 epochs, it still does not achieve higher-than-chance accuracies on the training data. Due to the additive noise, the adversary does not overfit on its training set but rather fails to identify any structure in the data.
In the case of N = 1 000 neighbors, the representation never becomes stable in the alternating training phase. The adversary is always able to break the representation privacy.

Conclusions & Future Work
We introduced a new approach to sharing training data for de-identification that requires less human effort than the existing approach of document-coherent pseudonymization. Our approach is based on adversarial learning, which yields representations that can be distributed since they do not contain private health information. The setup is motivated by the need for de-identification of medical text before sharing; our approach provides a lower-cost alternative to manual pseudonymization and enables the pooling of de-identification datasets from heterogeneous sources in order to train more robust classifiers. Our implementation and experimental data are publicly available.
As precursors to our adversarial representation approach, we developed a deep learning model for de-identification that does not rely on explicit character features as well as an automatic word-level pseudonymization approach. A model trained on our automatically pseudonymized data with N = 100 neighbors loses around one percentage point in F1 score when compared to the non-private system, scoring 96.75% on the i2b2 2014 test set.
Further, we presented an adversarial learning based private representation of medical text that is invariant to any PHI word being replaced with any of its embedding space neighbors and contains a random element. The representation allows training a de-identification model while being robust to adversaries trying to re-identify protected information or building a lookup table of representations. We extended existing adversarial representation learning approaches by using two adversaries that discriminate real from fake sequence pairs with an additional sequence input.
The representation acts as a task-specific feature extractor. For an invariance criterion of up to N = 500 neighbors, training is stable and adversaries cannot beat the random guessing accuracy of 50%. Using the adversarially learned representation, de-identification models reach an F1 score of 97.4%, which is close to the non-private system (97.67%). In contrast, the automatic pseudonymization approach only reaches an F1 score of 95.0% at N = 500.
Our adversarial representation approach enables cost-effective private sharing of training data for sequence labeling. Pooling of training data for de-identification from multiple institutions would lead to much more robust classifiers. Eventually, improved de-identification classifiers could help enable large-scale medical studies that improve public health.
Future Work: The automatic pseudonymization approach could serve as a data augmentation scheme to be used as a regularizer for de-identification models. Training a model on a combination of raw and pseudonymized data may result in better test scores on the i2b2 test set, possibly improving the state of the art.
Private character embeddings that are learned from a perturbed source could be an interesting extension to our models.
In adversarial learning with the three-part training procedure, it might be possible to tune the λ parameter and define a better stopping condition that avoids the unstable characteristics with high values for N in the invariance criterion. A further possible extension is a dynamic noise level in the representation model that depends on the LSTM output instead of being a trained weight. This might allow using lower amounts of noise for certain inputs while still being robust to the adversary.
When more training data from multiple sources become available in the future, it will be possible to evaluate our adversarially learned representation against unseen data.