Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University

Parnia Bahar, Patrick Wilken, Tamer Alkhouli, Andreas Guta, Pavel Golik, Evgeny Matusov, Christian Herold


Abstract
AppTek and RWTH Aachen University team up to participate in the offline and simultaneous speech translation tracks of IWSLT 2020. For the offline task, we create both cascaded and end-to-end speech translation systems, paying attention to careful data selection and weighting. In the cascaded approach, we combine high-quality hybrid automatic speech recognition (ASR) with Transformer-based neural machine translation (NMT). Our end-to-end direct speech translation systems benefit from pretraining of adapted encoder and decoder components, as well as from synthetic data and fine-tuning, and are thus able to compete with cascaded systems in terms of MT quality. For simultaneous translation, we utilize a novel architecture that makes dynamic decisions, learned from parallel data, on whether to continue reading input or to generate output words. Experiments with speech and text input show that even at low latency this architecture leads to superior translation results.
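
As a rough illustration only, and not the authors' actual architecture, the Python sketch below shows how a simultaneous translation loop can be driven by a learned policy that decides at each step whether to read another source token or to emit the next target word. The names policy, encoder, and decoder are hypothetical placeholders standing in for trained model components.

    # Minimal sketch (assumption, not the paper's implementation) of a
    # simultaneous translation loop driven by a learned READ/WRITE policy.
    READ, WRITE = 0, 1

    def simultaneous_translate(source_stream, policy, encoder, decoder, eos="</s>"):
        """Alternate between reading source tokens and emitting target words,
        as decided by a learned policy at each step."""
        read_tokens = []       # source prefix consumed so far
        output_words = []      # target words emitted so far
        source_exhausted = False

        while True:
            action = policy(read_tokens, output_words)

            if action == READ and not source_exhausted:
                token = next(source_stream, None)
                if token is None:
                    source_exhausted = True   # input finished; only WRITE remains
                else:
                    read_tokens.append(token)
            else:
                # WRITE: decode the next target word from the current source prefix
                states = encoder(read_tokens)
                word = decoder(states, output_words)
                if word == eos:
                    break
                output_words.append(word)

        return output_words

In such a setup, the latency/quality trade-off comes from how eagerly the policy chooses WRITE before the full source has been read; the paper reports that its learned decisions remain competitive even at low latency.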
Anthology ID:
2020.iwslt-1.3
Volume:
Proceedings of the 17th International Conference on Spoken Language Translation
Month:
July
Year:
2020
Address:
Online
Editors:
Marcello Federico, Alex Waibel, Kevin Knight, Satoshi Nakamura, Hermann Ney, Jan Niehues, Sebastian Stüker, Dekai Wu, Joseph Mariani, Francois Yvon
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
44–54
URL:
https://aclanthology.org/2020.iwslt-1.3
DOI:
10.18653/v1/2020.iwslt-1.3
Bibkey:
Cite (ACL):
Parnia Bahar, Patrick Wilken, Tamer Alkhouli, Andreas Guta, Pavel Golik, Evgeny Matusov, and Christian Herold. 2020. Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University. In Proceedings of the 17th International Conference on Spoken Language Translation, pages 44–54, Online. Association for Computational Linguistics.
Cite (Informal):
Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University (Bahar et al., IWSLT 2020)
PDF:
https://aclanthology.org/2020.iwslt-1.3.pdf
Video:
http://slideslive.com/38929614