Learning attention for historical text normalization by learning to pronounce

Marcel Bollmann, Joachim Bingel, Anders Søgaard


Abstract
Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.
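To make the multi-task setup described in the abstract concrete, the sketch below shows one way a shared character-level encoder can feed two task-specific decoders: one for historical-to-modern normalization (the main task) and one for the auxiliary grapheme-to-phoneme task. This is an illustrative PyTorch sketch based only on the abstract; the class name MultiTaskNormalizer, the layer sizes, and the exact parameter-sharing scheme are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskNormalizer(nn.Module):
    """Character-level encoder-decoder with a shared encoder and two
    task-specific decoders: historical-to-modern normalization (main task)
    and grapheme-to-phoneme conversion (auxiliary task).
    Layer choices and the sharing scheme are illustrative assumptions."""

    def __init__(self, char_vocab, phoneme_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(char_vocab, emb_dim)     # historical characters
        self.norm_embed = nn.Embedding(char_vocab, emb_dim)    # modern characters
        self.g2p_embed = nn.Embedding(phoneme_vocab, emb_dim)  # phoneme symbols
        # Encoder shared between both tasks: its weights receive gradients
        # from the main objective and from the auxiliary objective.
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Task-specific decoders and output projections.
        self.norm_decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.g2p_decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.norm_out = nn.Linear(hid_dim, char_vocab)
        self.g2p_out = nn.Linear(hid_dim, phoneme_vocab)

    def forward(self, src, tgt, task="norm"):
        # Encode the historical word form once; reuse the final state to
        # initialize whichever decoder the current batch belongs to.
        _, state = self.encoder(self.src_embed(src))
        if task == "norm":
            dec_out, _ = self.norm_decoder(self.norm_embed(tgt), state)
            return self.norm_out(dec_out)
        dec_out, _ = self.g2p_decoder(self.g2p_embed(tgt), state)
        return self.g2p_out(dec_out)


# Training would alternate batches from the two tasks so that the shared
# encoder is updated by both objectives (a common MTL schedule).
model = MultiTaskNormalizer(char_vocab=60, phoneme_vocab=50)
loss_fn = nn.CrossEntropyLoss()
src = torch.randint(0, 60, (8, 12))   # batch of historical character ids
tgt = torch.randint(0, 60, (8, 14))   # batch of modern character ids
logits = model(src, tgt, task="norm")  # teacher-forced; target shifting omitted for brevity
loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
```

Note that no explicit attention module appears here: the paper's observation is that the auxiliary pronunciation task can induce attention-like focusing during decoding even without one.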
Anthology ID:
P17-1031
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
332–344
URL:
https://aclanthology.org/P17-1031
DOI:
10.18653/v1/P17-1031
Bibkey:
Cite (ACL):
Marcel Bollmann, Joachim Bingel, and Anders Søgaard. 2017. Learning attention for historical text normalization by learning to pronounce. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 332–344, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Learning attention for historical text normalization by learning to pronounce (Bollmann et al., ACL 2017)
PDF:
https://aclanthology.org/P17-1031.pdf
Presentation:
P17-1031.Presentation.pdf
Video:
https://aclanthology.org/P17-1031.mp4
Data
CELEX