Deep Reinforcement Learning-based Text Anonymization against Private-Attribute Inference

Ahmadreza Mosallanezhad, Ghazaleh Beigi, Huan Liu


Abstract
User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it can also leak private-attribute information that users may not want to disclose, such as age and location. Users' privacy concerns oblige data publishers to protect privacy, and one effective way to do so is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.
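The idea of manipulating a text representation under combined privacy and utility feedback can be illustrated with a small toy sketch. This is not the RLTA implementation: the fixed linear scorers, the weights `UTILITY_W` and `ATTACKER_W`, the trade-off parameter `alpha`, the scaling actions, and the reward formula are all assumptions made for illustration, and a greedy search stands in for the learned deep reinforcement learning policy.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, vector):
    return sum(w * v for w, v in zip(weights, vector))


# Hypothetical fixed linear scorers standing in for trained models.
UTILITY_W = [1.0, 0.0, 0.5]   # assumed utility-task classifier weights
ATTACKER_W = [0.0, 2.0, 0.1]  # assumed private-attribute attacker weights


def utility_confidence(rep):
    """Confidence of the (toy) utility classifier on a representation."""
    return sigmoid(dot(UTILITY_W, rep))


def attacker_confidence(rep):
    """Confidence of the (toy) private-attribute attacker."""
    return sigmoid(dot(ATTACKER_W, rep))


def reward(rep, alpha=0.5):
    """Assumed reward: favor utility, penalize attacker confidence."""
    return alpha * utility_confidence(rep) - (1 - alpha) * attacker_confidence(rep)


def apply_action(rep, action):
    """An action scales one element of the representation."""
    index, scale = action
    new_rep = list(rep)
    new_rep[index] *= scale
    return new_rep


def anonymize(rep, steps=5, alpha=0.5):
    """Greedy stand-in for a learned policy: take the best action each step."""
    actions = [(i, s) for i in range(len(rep)) for s in (0.5, 1.5)]
    current = list(rep)
    for _ in range(steps):
        best = max(actions, key=lambda a: reward(apply_action(current, a), alpha))
        candidate = apply_action(current, best)
        if reward(candidate, alpha) <= reward(current, alpha):
            break  # no action improves the reward any further
        current = candidate
    return current


original = [1.0, 1.0, 1.0]
anonymized = anonymize(original)
print("attacker confidence:", attacker_confidence(original), "->", attacker_confidence(anonymized))
print("utility confidence: ", utility_confidence(original), "->", utility_confidence(anonymized))
```

In this sketch, `alpha` controls how strongly the search favors utility over privacy; a learned policy would replace the greedy `max` over actions.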
Anthology ID: D19-1240
Original: D19-1240v1
Version 2: D19-1240v2
Volume: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues: EMNLP | IJCNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 2360–2369
URL: https://aclanthology.org/D19-1240
DOI: 10.18653/v1/D19-1240
Cite (ACL): Ahmadreza Mosallanezhad, Ghazaleh Beigi, and Huan Liu. 2019. Deep Reinforcement Learning-based Text Anonymization against Private-Attribute Inference. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2360–2369, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Deep Reinforcement Learning-based Text Anonymization against Private-Attribute Inference (Mosallanezhad et al., EMNLP-IJCNLP 2019)
PDF: https://aclanthology.org/D19-1240.pdf