Double Path Networks for Sequence to Sequence Learning

Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu


Abstract
Encoder-decoder based Sequence to Sequence learning (S2S) has made remarkable progress in recent years. Different network architectures have been used in the encoder and decoder; among them, Convolutional Neural Networks (CNN) and Self Attention Networks (SAN) are the most prominent. The two architectures achieve similar performance but encode and decode context in very different ways: CNNs use convolutional layers to capture the local connectivity of the sequence, while SANs use self-attention layers to capture global semantics. In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverages the advantages of both models through double path information fusion. During encoding, we develop a double path architecture that maintains the information from the two paths separately, with convolutional layers in one path and self-attention layers in the other. To effectively use the encoded context, we develop a gated attention fusion module that automatically selects the information needed during decoding, which is also performed by a double path network. By deeply integrating the two paths, both types of information are combined and well exploited. Experiments show that our proposed method significantly improves the performance of sequence to sequence learning over state-of-the-art systems.
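The abstract does not spell out the exact form of the gated attention fusion module. The snippet below is a minimal PyTorch sketch of one plausible gating scheme: a sigmoid gate, computed from the concatenated outputs of the CNN path and the self-attention path, mixes the two representations per position and channel. The module name `GatedFusion`, the linear-projection gate, and the tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Hypothetical gated fusion of a CNN path and a self-attention path.

    Assumption: the gate is a sigmoid over a linear projection of the two
    concatenated path outputs; the paper's formulation may differ.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, cnn_out: torch.Tensor, san_out: torch.Tensor) -> torch.Tensor:
        # cnn_out, san_out: (batch, seq_len, d_model) from the two encoder paths
        gate = torch.sigmoid(self.gate_proj(torch.cat([cnn_out, san_out], dim=-1)))
        # Convex combination: the gate decides, per position and channel, how
        # much local (CNN) vs. global (self-attention) context to keep.
        return gate * cnn_out + (1.0 - gate) * san_out

# Usage sketch: fuse the two path representations of an encoded sequence.
fusion = GatedFusion(d_model=512)
cnn_path = torch.randn(2, 10, 512)   # output of the convolutional path
san_path = torch.randn(2, 10, 512)   # output of the self-attention path
fused = fusion(cnn_path, san_path)   # (2, 10, 512)
```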
Anthology ID:
C18-1259
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3064–3074
URL:
https://aclanthology.org/C18-1259
Cite (ACL):
Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, and Tie-Yan Liu. 2018. Double Path Networks for Sequence to Sequence Learning. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3064–3074, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Double Path Networks for Sequence to Sequence Learning (Song et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1259.pdf
Code:
StillKeepTry/Transformer-PyTorch