Transfer Learning for Speech Recognition on a Budget

Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober


Abstract
End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network’s weights were sufficient for good performance, especially for inner layers.
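The introspection finding in the abstract (small weight changes suffice, especially for inner layers) can be illustrated with a layer-wise comparison of a model's weights before and after adaptation. The following is a minimal sketch only, not the authors' analysis code: the relative L2 distance used as the metric, the function names `relative_change` and `layerwise_change`, the layer names, and all weight values are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def relative_change(before, after):
    """Relative L2 distance ||after - before|| / ||before|| for one layer.

    Both arguments are flat lists of floats of equal length.
    """
    if len(before) != len(after):
        raise ValueError("layer shapes differ; weights cannot be compared")
    diff = math.sqrt(sum((a - b) ** 2 for b, a in zip(before, after)))
    norm = math.sqrt(sum(b ** 2 for b in before))
    if norm == 0.0:
        # Avoid division by zero for an all-zero layer.
        return 0.0 if diff == 0.0 else float("inf")
    return diff / norm


def layerwise_change(pretrained, adapted):
    """Map each layer name to its relative weight change after adaptation."""
    return {
        name: relative_change(pretrained[name], adapted[name])
        for name in pretrained
    }


# Hypothetical toy weights (NOT results from the paper): a source-language
# model and the same model after adaptation to the target language.
pretrained = {
    "conv1": [0.5, -0.2, 0.1],
    "conv2": [1.0, 1.0, -1.0],
    "output": [0.3, 0.3],
}
adapted = {
    "conv1": [0.6, -0.1, 0.1],
    "conv2": [1.0, 1.01, -1.0],
    "output": [0.9, -0.3],
}

changes = layerwise_change(pretrained, adapted)
for name, value in changes.items():
    print(f"{name}: {value:.4f}")
```

With real checkpoints, the per-layer values would be computed from the flattened weight tensors of the English and German models; a layer whose value stays near zero has barely moved during adaptation.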
Anthology ID:
W17-2620
Volume:
Proceedings of the 2nd Workshop on Representation Learning for NLP
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Phil Blunsom, Antoine Bordes, Kyunghyun Cho, Shay Cohen, Chris Dyer, Edward Grefenstette, Karl Moritz Hermann, Laura Rimell, Jason Weston, Scott Yih
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
168–177
URL:
https://aclanthology.org/W17-2620
DOI:
10.18653/v1/W17-2620
Cite (ACL):
Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, and Sebastian Stober. 2017. Transfer Learning for Speech Recognition on a Budget. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 168–177, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Transfer Learning for Speech Recognition on a Budget (Kunze et al., RepL4NLP 2017)
PDF:
https://aclanthology.org/W17-2620.pdf
Code
transfer-learning-asr/transfer-learning-asr
Data
LibriSpeech