On Biasing Transformer Attention Towards Monotonicity

Annette Rios, Chantal Amrhein, Noëmi Aepli, Rico Sennrich


Abstract
Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
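The exact loss formulation is given in the paper and its reference implementation (ZurichNLP/monotonicity_loss, linked below). As an illustration only, the following is a minimal sketch of one way to bias soft attention towards monotonic alignments: penalize any backward movement of the expected source position as decoding advances. The function name, the penalty, and the use of PyTorch are assumptions for illustration, not the authors' definition.

```python
import torch

def monotonicity_loss(attn: torch.Tensor) -> torch.Tensor:
    """Illustrative (assumed) penalty for non-monotonic attention.

    attn: attention weights of shape (batch, tgt_len, src_len),
          each row summing to 1 over the source dimension.
    Returns a scalar that is zero when the expected source position
    never moves backwards as the target position advances.
    """
    batch, tgt_len, src_len = attn.shape
    # Soft (expected) source position attended to at each target step.
    positions = torch.arange(src_len, dtype=attn.dtype, device=attn.device)
    expected = attn @ positions                          # (batch, tgt_len)
    # Backward jumps: positive wherever the attended position decreases.
    backward = (expected[:, :-1] - expected[:, 1:]).clamp(min=0)
    return backward.mean()
```

In use, such a term would typically be added to the task loss with a weighting factor, and it can be applied per attention head; consistent with the abstract, the paper reports that biasing only a subset of transformer heads is more helpful than biasing all of them.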
Anthology ID:
2021.naacl-main.354
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4474–4488
URL:
https://aclanthology.org/2021.naacl-main.354
DOI:
10.18653/v1/2021.naacl-main.354
Cite (ACL):
Annette Rios, Chantal Amrhein, Noëmi Aepli, and Rico Sennrich. 2021. On Biasing Transformer Attention Towards Monotonicity. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4474–4488, Online. Association for Computational Linguistics.
Cite (Informal):
On Biasing Transformer Attention Towards Monotonicity (Rios et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.354.pdf
Video:
https://aclanthology.org/2021.naacl-main.354.mp4
Code:
ZurichNLP/monotonicity_loss