Parallelizable Stack Long Short-Term Memory

Shuoyang Ding, Philipp Koehn


Abstract
Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is notoriously difficult to parallelize for GPU training because its computations depend on discrete operations. In this paper, we tackle this problem by exploiting the state access patterns of StackLSTM to homogenize computation across the different discrete operations. Our parsing experiments show that the method scales almost linearly with increasing batch size, and that our parallelized PyTorch implementation trains significantly faster than the DyNet C++ implementation.
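
To illustrate the idea described in the abstract, the following is a minimal PyTorch sketch, not the authors' hoolock implementation: it keeps per-example stacks in one pre-allocated tensor plus a stack-pointer vector, runs the LSTM cell for the whole batch at every step, and realizes push/pop/hold purely as pointer arithmetic and masked writes. The class name, the op encoding (+1 = push, -1 = pop, 0 = hold), and the fixed maximum stack depth are assumptions made for illustration.

import torch
import torch.nn as nn

class BatchedStackLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, max_stack_depth=128):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.max_depth = max_stack_depth

    def init_state(self, batch_size, device=None):
        # per-example stacks of hidden/cell states, one slot per possible depth;
        # slot 0 serves as an always-zero "empty stack" sentinel
        h_stack = torch.zeros(batch_size, self.max_depth, self.hidden_size, device=device)
        c_stack = torch.zeros(batch_size, self.max_depth, self.hidden_size, device=device)
        ptr = torch.zeros(batch_size, dtype=torch.long, device=device)  # current stack top
        return h_stack, c_stack, ptr

    def step(self, x, state, op):
        """x: (B, input_size); op: (B,) long with +1 = push, -1 = pop, 0 = hold."""
        h_stack, c_stack, ptr = state
        batch = torch.arange(x.size(0), device=x.device)

        # read the current stack top for every example and run the cell
        # unconditionally -- the computation is identical for all operations
        h_top = h_stack[batch, ptr]
        c_top = c_stack[batch, ptr]
        h_new, c_new = self.cell(x, (h_top, c_top))

        # pointer arithmetic homogenizes the discrete operations
        new_ptr = (ptr + op).clamp(min=0, max=self.max_depth - 1)

        # write the fresh state only where a push happened (masked write);
        # clone keeps the update out-of-place for autograd
        push = op.eq(1)
        h_stack = h_stack.clone()
        c_stack = c_stack.clone()
        h_stack[batch[push], new_ptr[push]] = h_new[push]
        c_stack[batch[push], new_ptr[push]] = c_new[push]

        # output is the exposed stack top after the operation
        return h_stack[batch, new_ptr], (h_stack, c_stack, new_ptr)

# hypothetical usage: one batched step with example-specific operations
# model = BatchedStackLSTM(input_size=32, hidden_size=64)
# state = model.init_state(batch_size=4)
# out, state = model.step(torch.randn(4, 32), state, torch.tensor([1, 1, -1, 0]))

Because the per-step computation is identical for every example regardless of its operation, the whole batch stays on the GPU and throughput grows with batch size, which is the scaling behavior the abstract reports.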
Anthology ID:
W19-1501
Volume:
Proceedings of the Third Workshop on Structured Prediction for NLP
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Andre Martins, Andreas Vlachos, Zornitsa Kozareva, Sujith Ravi, Gerasimos Lampouras, Vlad Niculae, Julia Kreutzer
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://aclanthology.org/W19-1501
DOI:
10.18653/v1/W19-1501
Cite (ACL):
Shuoyang Ding and Philipp Koehn. 2019. Parallelizable Stack Long Short-Term Memory. In Proceedings of the Third Workshop on Structured Prediction for NLP, pages 1–6, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Parallelizable Stack Long Short-Term Memory (Ding & Koehn, NAACL 2019)
PDF:
https://aclanthology.org/W19-1501.pdf
Presentation:
W19-1501.Presentation.pdf
Code:
shuoyangd/hoolock