Amjad Almahairi


2023

Residual Prompt Tuning: improving prompt tuning with residual reparameterization
Anastasiia Razdaibiedina | Yuning Mao | Madian Khabsa | Mike Lewis | Rui Hou | Jimmy Ba | Amjad Almahairi
Findings of the Association for Computational Linguistics: ACL 2023

Prompt tuning is one of the successful approaches for parameter-efficient tuning of pre-trained language models. Despite being arguably the most parameter-efficient (tuned soft prompts constitute <0.1% of total parameters), it typically performs worse than other efficient tuning methods and is quite sensitive to hyper-parameters. In this work, we introduce Residual Prompt Tuning, a simple and efficient method that significantly improves the performance and stability of prompt tuning. We propose to reparameterize soft prompt embeddings using a shallow network with a residual connection. Our experiments show that Residual Prompt Tuning significantly outperforms prompt tuning across T5-Large, T5-Base and BERT-Base models. Notably, our method reaches a +7 point improvement over prompt tuning on the SuperGLUE benchmark with the T5-Base model and allows the prompt length to be reduced by a factor of 10 without hurting performance. In addition, we show that our approach is robust to the choice of learning rate and prompt initialization, and is effective in few-shot settings.
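
The residual reparameterization described in the abstract can be pictured with a minimal PyTorch sketch: the soft prompt is passed through a shallow network and added back to itself via a skip connection before being prepended to the frozen model's inputs. The class name, layer sizes, and activation below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class ResidualPromptEncoder(nn.Module):
    """Reparameterize soft prompt embeddings with a shallow MLP plus a
    residual (skip) connection. Hidden size and layer choices are
    illustrative assumptions."""

    def __init__(self, prompt_len: int = 10, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # Trainable soft prompt embeddings (the only task-specific parameters).
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim))
        # Shallow network used to reparameterize the prompt.
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self) -> torch.Tensor:
        # Residual reparameterization: projected prompt + original prompt.
        return self.mlp(self.prompt) + self.prompt

# The reparameterized prompt is prepended to the frozen model's input embeddings.
encoder = ResidualPromptEncoder()
soft_prompt = encoder()            # shape: (prompt_len, embed_dim)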

Learning Easily Updated General Purpose Text Representations with Adaptable Task-Specific Prefix
Kuan-Hao Huang | Liang Tan | Rui Hou | Sinong Wang | Amjad Almahairi | Ruty Rinott
Findings of the Association for Computational Linguistics: EMNLP 2023

Many real-world applications require making multiple predictions from the same text. Fine-tuning a large pre-trained language model for each downstream task incurs a computational burden at inference time due to multiple forward passes. To amortize this cost, a common solution is to freeze the language model and build lightweight models for downstream tasks on top of fixed text representations. The challenge then becomes how to learn fixed yet general text representations that generalize well to unseen downstream tasks. Previous work has shown that the generalizability of representations can be improved by fine-tuning the pre-trained language model on a set of source tasks in a multi-task manner. In this work, we propose a prefix-based method to learn fixed text representations from source tasks. We learn a task-specific prefix for each source task independently and combine them to obtain the final representations. Our experimental results show that prefix-based training performs better than multi-task training and can update the text representations at a lower computational cost.
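
A minimal sketch of the idea of combining independently trained task-specific prefixes in front of a frozen encoder to produce one fixed text representation. The concatenation-based combination and mean pooling below are assumptions made for illustration; the paper's exact combination scheme may differ.

import torch
import torch.nn as nn

class PrefixCombiner(nn.Module):
    """Combine independently trained task-specific prefixes to produce a
    single fixed text representation. Concatenating the prefixes before the
    frozen encoder is an assumption, not necessarily the paper's design."""

    def __init__(self, frozen_encoder: nn.Module, prefixes: list):
        super().__init__()
        self.encoder = frozen_encoder            # frozen pre-trained LM (placeholder here)
        # One learned prefix per source task, each trained independently.
        self.prefixes = nn.ParameterList(prefixes)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        # Concatenate all source-task prefixes in front of the input embeddings.
        combined = torch.cat(
            [p.unsqueeze(0).expand(batch, -1, -1) for p in self.prefixes], dim=1
        )
        extended = torch.cat([combined, input_embeds], dim=1)
        hidden = self.encoder(extended)
        # Mean-pool into one fixed representation reused by downstream heads.
        return hidden.mean(dim=1)

# Toy usage with a placeholder encoder; in practice the encoder would be a
# frozen pre-trained language model operating on input embeddings.
prefixes = [nn.Parameter(torch.randn(5, 32)) for _ in range(3)]
model = PrefixCombiner(frozen_encoder=nn.Identity(), prefixes=prefixes)
reps = model(torch.randn(2, 16, 32))   # fixed representations, shape (2, 32)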

2022

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning
Yuning Mao | Lambert Mathias | Rui Hou | Amjad Almahairi | Hao Ma | Jiawei Han | Scott Yih | Madian Khabsa
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with far fewer trainable parameters and perform especially well when training data is limited. However, different PELT methods may perform rather differently on the same task, making it nontrivial to select the most appropriate method for a specific task, especially considering the fast-growing number of new PELT methods and tasks. In light of model diversity and the difficulty of model selection, we propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup via a gating mechanism. On the GLUE benchmark, UniPELT consistently achieves 1-4% gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups. Moreover, UniPELT generally surpasses the upper bound obtained by taking the best performance of all its submodules used individually on each task, indicating that a mixture of multiple PELT methods may be inherently more effective than single methods.
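
A minimal sketch of gating several PELT submodules, in the spirit of the framework described above. The submodules are stand-in callables, and the gating form (a sigmoid gate per submodule computed from the hidden states) is an assumption, not UniPELT's exact design.

import torch
import torch.nn as nn

class GatedPELTLayer(nn.Module):
    """Mix several PELT submodules (e.g. adapter, prefix, LoRA) through
    learned gates. The gating form here is an illustrative assumption."""

    def __init__(self, hidden_dim: int, submodules: list):
        super().__init__()
        self.submodules = nn.ModuleList(submodules)
        # One scalar gate per submodule, predicted from the hidden states.
        self.gates = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in submodules]
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        out = hidden
        for module, gate in zip(self.submodules, self.gates):
            # Gate in [0, 1] decides how much this submodule contributes.
            g = torch.sigmoid(gate(hidden).mean(dim=1, keepdim=True))
            out = out + g * module(hidden)
        return out

# Toy usage: two linear layers standing in for real PELT submodules.
layer = GatedPELTLayer(hidden_dim=32, submodules=[nn.Linear(32, 32), nn.Linear(32, 32)])
y = layer(torch.randn(2, 16, 32))       # shape (2, 16, 32)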

2019

The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation
Mai Oudah | Amjad Almahairi | Nizar Habash
Proceedings of Machine Translation Summit XVII: Research Track