DILBERT: Customized Pre-Training for Domain Adaptation with Category Shift, with an Application to Aspect Extraction

Entony Lekhtman, Yftah Ziser, Roi Reichart


Abstract
The rise of pre-trained language models has yielded substantial progress in the vast majority of Natural Language Processing (NLP) tasks. However, a generic approach to the pre-training procedure can naturally be sub-optimal in some cases. In particular, fine-tuning a pre-trained language model on a source domain and then applying it to a different target domain results in a sharp performance decline of the eventual classifier for many source-target domain pairs. Moreover, in some NLP tasks, the output categories substantially differ between domains, making adaptation even more challenging. This happens, for example, in the task of aspect extraction, where the aspects of interest in reviews of, e.g., restaurants or electronic devices may be very different. This paper presents a new fine-tuning scheme for BERT, which aims to address the above challenges. We name this scheme DILBERT: Domain Invariant Learning with BERT, and customize it for aspect extraction in the unsupervised domain adaptation setting. DILBERT harnesses the categorical information of both the source and the target domains to guide the pre-training process towards a more domain- and category-invariant representation, thus closing the gap between the domains. We show that DILBERT yields substantial improvements over state-of-the-art baselines while using a fraction of the unlabeled data, particularly in more challenging domain adaptation setups.
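The sketch below is a rough, hypothetical illustration of the idea described in the abstract: instead of masking tokens at random, a masked-LM fine-tuning step preferentially masks tokens that are most similar to a pooled list of aspect-category names from the source and target domains. It is not the authors' implementation (see the linked code repository for that); the category list, the embedding-similarity heuristic, and all hyper-parameters are assumptions made purely for illustration.

# Hypothetical sketch of category-guided masking for masked-LM fine-tuning,
# in the spirit of DILBERT's customized pre-training. Not the paper's code.
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.train()

# Aspect categories pooled from source and target domains (assumed list).
categories = ["food", "service", "ambience", "battery", "screen", "keyboard"]

def category_guided_mask(sentence, top_k=2):
    """Mask the top_k tokens whose input embeddings are most similar to any
    category-name embedding, rather than masking uniformly at random."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]

    emb = model.get_input_embeddings().weight                       # (vocab, hidden)
    cat_ids = tokenizer(categories, add_special_tokens=False)["input_ids"]
    cat_vecs = torch.stack([emb[ids].mean(0) for ids in cat_ids])   # (n_cat, hidden)

    tok_vecs = emb[input_ids]                                        # (seq, hidden)
    sims = torch.nn.functional.cosine_similarity(
        tok_vecs.unsqueeze(1), cat_vecs.unsqueeze(0), dim=-1
    ).max(dim=1).values                                              # best category match per token

    # Never mask special tokens ([CLS], [SEP]).
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(input_ids.tolist(), already_has_special_tokens=True),
        dtype=torch.bool,
    )
    sims[special] = -1.0

    labels = torch.full_like(input_ids, -100)                        # -100 = ignored by the MLM loss
    mask_positions = sims.topk(min(top_k, int((~special).sum()))).indices
    labels[mask_positions] = input_ids[mask_positions]
    input_ids = input_ids.clone()
    input_ids[mask_positions] = tokenizer.mask_token_id
    return input_ids.unsqueeze(0), enc["attention_mask"], labels.unsqueeze(0)

# One illustrative fine-tuning step on an unlabeled target-domain sentence.
ids, attn, labels = category_guided_mask("The battery drains quickly but the screen is great.")
loss = model(input_ids=ids, attention_mask=attn, labels=labels).loss
loss.backward()

In practice one would wrap such a step in a full training loop over unlabeled reviews from both domains; the heuristic here merely shows how category names can bias which tokens the masked-LM objective focuses on.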
Anthology ID:
2021.emnlp-main.20
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
219–230
URL:
https://aclanthology.org/2021.emnlp-main.20
DOI:
10.18653/v1/2021.emnlp-main.20
Cite (ACL):
Entony Lekhtman, Yftah Ziser, and Roi Reichart. 2021. DILBERT: Customized Pre-Training for Domain Adaptation with Category Shift, with an Application to Aspect Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 219–230, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
DILBERT: Customized Pre-Training for Domain Adaptation with Category Shift, with an Application to Aspect Extraction (Lekhtman et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.20.pdf
Video:
https://aclanthology.org/2021.emnlp-main.20.mp4
Code:
tonylekhtman/dilbert
Data:
MAMS