A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher


Abstract
Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task’s loss without exhibiting catastrophic interference with the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks spanning tagging, parsing, relatedness, and entailment.
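The sketch below illustrates the two ideas highlighted in the abstract, not the authors' released implementation: a higher task layer that receives a shortcut connection to the lower layer's label predictions, and a "successive" L2 regularizer that ties parameters to a snapshot taken after training the previous task to limit catastrophic interference. The two-task POS/chunking setup, all layer sizes, and the helper `successive_regularizer` are illustrative assumptions.

```python
# Hedged sketch of the joint many-task ideas (PyTorch); sizes and task
# choices are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn


class JointManyTaskSketch(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=100,
                 n_pos_tags=45, n_chunk_tags=23):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Lowest layer: POS tagging.
        self.pos_lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                batch_first=True)
        self.pos_out = nn.Linear(2 * hidden, n_pos_tags)
        # Next layer: chunking. Its input concatenates the word embedding,
        # the POS layer's hidden states, and an embedding of the POS label
        # distribution (the shortcut connection to lower-level predictions).
        self.pos_label_emb = nn.Linear(n_pos_tags, emb_dim, bias=False)
        self.chunk_lstm = nn.LSTM(emb_dim + 2 * hidden + emb_dim, hidden,
                                  bidirectional=True, batch_first=True)
        self.chunk_out = nn.Linear(2 * hidden, n_chunk_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)                    # (batch, seq, emb)
        h_pos, _ = self.pos_lstm(x)                  # (batch, seq, 2*hidden)
        pos_logits = self.pos_out(h_pos)
        pos_probs = torch.softmax(pos_logits, dim=-1)
        # Shortcut: feed the soft POS predictions upward as a label embedding.
        chunk_in = torch.cat([x, h_pos, self.pos_label_emb(pos_probs)], dim=-1)
        h_chunk, _ = self.chunk_lstm(chunk_in)
        return pos_logits, self.chunk_out(h_chunk)


def successive_regularizer(model, previous_params, strength=1e-2):
    """L2 penalty tying the current parameters to a snapshot saved after the
    previous task's training epoch (an approximation of the paper's term)."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in previous_params:
            penalty = penalty + ((p - previous_params[name]) ** 2).sum()
    return strength * penalty


# Usage sketch: snapshot parameters after the POS epoch, then add the
# regularizer to the chunking loss so chunking does not overwrite POS.
model = JointManyTaskSketch()
snapshot = {n: p.detach().clone() for n, p in model.named_parameters()}
tokens = torch.randint(0, 10000, (2, 7))
pos_logits, chunk_logits = model(tokens)
chunk_targets = torch.randint(0, 23, (2, 7))
loss = nn.functional.cross_entropy(chunk_logits.reshape(-1, 23),
                                   chunk_targets.reshape(-1))
loss = loss + successive_regularizer(model, snapshot)
loss.backward()
```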
Anthology ID:
D17-1206
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1923–1933
URL:
https://aclanthology.org/D17-1206
DOI:
10.18653/v1/D17-1206
Cite (ACL):
Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. 2017. A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1923–1933, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks (Hashimoto et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1206.pdf
Attachment:
 D17-1206.Attachment.zip
Code
hassyGo/charNgram2vec (additional community code also available)
Data
Penn Treebank