Smoothing and Shrinking the Sparse Seq2Seq Search Space

Ben Peters, André F. T. Martins


Abstract
Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax, the so-called "cat got your tongue" problem. Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since they can shrink the search space by assigning zero probability to bad hypotheses, but their ability to handle word-level tasks with transformers has never been tested. In this work, we show that entmax-based models effectively solve the "cat got your tongue" problem, removing a major source of model error for neural machine translation. In addition, we generalize label smoothing, a critical regularization technique, to the broader family of Fenchel-Young losses, which includes both cross-entropy and the entmax losses. Our resulting label-smoothed entmax loss models set a new state of the art on multilingual grapheme-to-phoneme conversion and deliver improvements and better calibration properties on cross-lingual morphological inflection and machine translation for 7 language pairs.
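To make the sparsity mechanism concrete, the following is a minimal NumPy sketch of sparsemax, the alpha=2 member of the entmax family (the paper works with the general entmax family; sparsemax is simply the easiest member to write in a few lines, and this sketch is an illustration only, not the authors' implementation, which is in the deep-spin/S7 repository linked below). Sparsemax projects a score vector onto the probability simplex, so low-scoring hypotheses receive exactly zero probability rather than the small positive mass softmax would assign them.

import numpy as np

def sparsemax(scores):
    # Euclidean projection of the score vector onto the probability simplex.
    # Low-scoring entries get exactly zero probability, unlike softmax,
    # which always keeps every entry strictly positive.
    z = np.asarray(scores, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # entries kept in the support
    k_z = k[support][-1]                     # support size
    tau = (cumsum[k_z - 1] - 1.0) / k_z      # threshold
    return np.maximum(z - tau, 0.0)

# Four hypothesis scores: the two weak ones receive exactly zero probability.
print(sparsemax([3.0, 2.8, 0.1, -1.5]))     # -> [0.6 0.4 0.  0. ]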
Anthology ID:
2021.naacl-main.210
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2642–2654
URL:
https://aclanthology.org/2021.naacl-main.210
DOI:
10.18653/v1/2021.naacl-main.210
Cite (ACL):
Ben Peters and André F. T. Martins. 2021. Smoothing and Shrinking the Sparse Seq2Seq Search Space. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2642–2654, Online. Association for Computational Linguistics.
Cite (Informal):
Smoothing and Shrinking the Sparse Seq2Seq Search Space (Peters & Martins, NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.210.pdf
Video:
https://aclanthology.org/2021.naacl-main.210.mp4
Code:
deep-spin/S7
Data:
WMT 2016