Mingxuan Wang


2019

Towards Linear Time Neural Machine Translation with Capsule Networks
Mingxuan Wang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this study, we first investigate a novel capsule network with dynamic routing for linear-time Neural Machine Translation (NMT), referred to as CapsNMT. CapsNMT uses an aggregation mechanism to map the source sentence into a matrix of predetermined size, and then applies a deep LSTM network to decode the target sequence from this source representation. Unlike previous work (Sutskever et al., 2014), which stores the source sentence in a passive, bottom-up way, the dynamic routing policy encodes the source sentence with an iterative process that decides the credit attribution between nodes in lower and higher layers. CapsNMT has two core properties: it runs in time linear in the length of the sequences, and it provides a more flexible way to aggregate the part-whole information of the source sentence. On the WMT14 English-German task and the larger WMT14 English-French task, CapsNMT achieves results comparable to the Transformer system. To the best of our knowledge, this is the first work to empirically investigate capsule networks for sequence-to-sequence problems.
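A minimal sketch of the kind of aggregation the abstract describes: routing-by-agreement squeezes a variable-length sequence of encoder states into a fixed number of output capsules. The class and parameter names (CapsuleAggregator, n_out, n_iters) are my own placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(v, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity."""
    norm2 = (v * v).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * v / torch.sqrt(norm2 + eps)

class CapsuleAggregator(nn.Module):
    def __init__(self, d_in, d_out, n_out=8, n_iters=3):
        super().__init__()
        self.n_out, self.n_iters, self.d_out = n_out, n_iters, d_out
        # One projection per output capsule ("prediction vectors").
        self.proj = nn.Linear(d_in, n_out * d_out)

    def forward(self, enc_states):                         # (batch, src_len, d_in)
        B, L, _ = enc_states.shape
        u_hat = self.proj(enc_states).view(B, L, self.n_out, self.d_out)
        logits = enc_states.new_zeros(B, L, self.n_out)    # routing logits
        for _ in range(self.n_iters):
            c = F.softmax(logits, dim=-1)                  # coupling coefficients
            s = (c.unsqueeze(-1) * u_hat).sum(dim=1)       # (B, n_out, d_out)
            v = squash(s)                                  # output capsules
            # Agreement between predictions and outputs refines the routing.
            logits = logits + (u_hat * v.unsqueeze(1)).sum(dim=-1)
        return v                                           # fixed-size source matrix
```

Because the output has a fixed number of capsules regardless of source length, a decoder reading it runs in time linear in the sequence length, which is the property the abstract emphasises.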

Imitation Learning for Non-Autoregressive Neural Machine Translation
Bingzhen Wei | Mingxuan Wang | Hao Zhou | Junyang Lin | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Non-autoregressive translation (NAT) models have achieved impressive inference speedups. A potential issue with existing NAT algorithms, however, is that decoding is conducted in parallel without directly considering previous context. In this paper, we propose an imitation learning framework for non-autoregressive machine translation that retains the fast translation speed while achieving translation performance comparable to its autoregressive counterpart. We conduct experiments on the IWSLT16, WMT14, and WMT16 datasets. Our proposed model achieves a significant speedup over autoregressive models while keeping translation quality comparable. By sampling sentence length in parallel at inference time, we achieve 31.85 BLEU on WMT16 Ro-En and 30.68 BLEU on IWSLT16 En-De.
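A hedged sketch of the "sample sentence length in parallel at inference time" idea mentioned above: decode several candidate target lengths with one parallel pass each and keep the highest-scoring hypothesis. The `nat_model(src, tgt_len)` interface and the delta set are hypothetical placeholders, not the paper's API.

```python
import torch

@torch.no_grad()
def decode_with_length_candidates(nat_model, src, pred_len, deltas=(-2, -1, 0, 1, 2)):
    hyps = []
    for d in deltas:                       # each candidate length decodes fully in parallel
        tgt_len = max(1, pred_len + d)
        logits = nat_model(src, tgt_len)   # hypothetical call -> (tgt_len, vocab) logits
        tokens = logits.argmax(dim=-1)
        # Length-normalised log-probability as a simple re-ranking score.
        score = logits.log_softmax(dim=-1).gather(-1, tokens.unsqueeze(-1)).mean()
        hyps.append((score.item(), tokens))
    return max(hyps, key=lambda h: h[0])[1]
```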

2018

Neural Machine Translation with Decoding History Enhanced Attention
Mingxuan Wang | Jun Xie | Zhixing Tan | Jinsong Su | Deyi Xiong | Chao Bian
Proceedings of the 27th International Conference on Computational Linguistics

Neural machine translation with source-side attention has achieved remarkable performance. However, there has been little work on attending to the target side, which could potentially enhance the memory capability of NMT. We propose a Decoding History Enhanced Attention mechanism (DHEA) that makes the NMT model better at selecting both source-side and target-side information. DHEA enables dynamic control of the ratios at which source and target contexts contribute to the generation of target words, offering a way to weakly induce structural relations among both source and target tokens. It also allows training errors to be directly back-propagated through short-cut connections, effectively alleviating the gradient vanishing problem. An empirical study on Chinese-English translation shows that our model with a proper configuration improves by 0.9 BLEU over the Transformer and the best reported results on the dataset. On the WMT14 English-German task and the larger WMT14 English-French task, our model achieves results comparable to the state of the art.
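An illustrative sketch (my own simplification, not the paper's exact formulation) of the "dynamic control of the ratios" idea: a learned gate mixes a source-side attention context with a context attended over the decoding history before the next target word is predicted.

```python
import torch
import torch.nn as nn

class GatedContextMixer(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(3 * d_model, d_model)

    def forward(self, dec_state, src_ctx, hist_ctx):
        # dec_state: current decoder state                     (batch, d_model)
        # src_ctx:   attention context over encoder states     (batch, d_model)
        # hist_ctx:  attention context over decoded history    (batch, d_model)
        g = torch.sigmoid(self.gate(torch.cat([dec_state, src_ctx, hist_ctx], dim=-1)))
        # The gate sets the ratio of source vs. target-side contribution.
        return g * src_ctx + (1.0 - g) * hist_ctx
```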

Tencent Neural Machine Translation Systems for WMT18
Mingxuan Wang | Li Gong | Wenhuan Zhu | Jun Xie | Chao Bian
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

2017

Deep Neural Machine Translation with Linear Associative Unit
Mingxuan Wang | Zhengdong Lu | Jie Zhou | Qun Liu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Deep Neural Networks (DNNs) have provably enhanced state-of-the-art Neural Machine Translation (NMT) with their capability to model complex functions and capture complex linguistic structures. However, NMT with deep architectures in its encoder or decoder RNNs often suffers from severe gradient diffusion due to the non-linear recurrent activations, which makes optimization much more difficult. To address this problem, we propose novel linear associative units (LAUs) that shorten the gradient propagation path inside the recurrent unit. Different from conventional approaches (LSTM units and GRUs), LAUs use linear associative connections between the input and output of the recurrent unit, which allows unimpeded information flow through both space and time. The model is quite simple, but it is surprisingly effective. Our empirical study on Chinese-English translation shows that our model with a proper configuration improves by 11.7 BLEU over Groundhog and the best reported results in the same setting. On the WMT14 English-German task and the larger WMT14 English-French task, our model achieves results comparable to the state of the art.
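A hedged, simplified sketch of the mechanism the abstract describes: a GRU-like cell augmented with a gated linear path from the input to the output, so information (and gradients) can bypass the non-linear recurrence. This is my paraphrase of the idea, not the paper's exact parameterisation.

```python
import torch
import torch.nn as nn

class LAUCell(nn.Module):
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.gru = nn.GRUCell(d_in, d_hid)          # usual gated non-linear update
        self.lin = nn.Linear(d_in, d_hid)           # linear associative connection H(x_t)
        self.gate = nn.Linear(d_in + d_hid, d_hid)

    def forward(self, x_t, h_prev):
        h_nonlin = self.gru(x_t, h_prev)
        g = torch.sigmoid(self.gate(torch.cat([x_t, h_prev], dim=-1)))
        # Gate between the non-linear state and a purely linear view of the input,
        # giving a short path for gradient flow through both space and time.
        return (1.0 - g) * h_nonlin + g * self.lin(x_t)
```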

Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation
Jinchao Zhang | Mingxuan Wang | Qun Liu | Jie Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper proposes three distortion models that explicitly incorporate word reordering knowledge into attention-based Neural Machine Translation (NMT) to further improve translation performance. Our proposed models enable the attention mechanism to attend to source words with regard to both the semantic requirement and a word reordering penalty. Experiments on Chinese-English translation show that the approaches improve word alignment quality and achieve significant translation improvements over a basic attention-based NMT system. Compared with previous work on identical corpora, our system achieves state-of-the-art translation quality.
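An illustrative sketch (not the paper's three distortion models, and with a hypothetical penalty weight alpha) of how a reordering penalty can be folded into attention: source positions far from the previously attended position are penalised before the softmax, so attention balances semantic relevance against jump distance.

```python
import torch
import torch.nn.functional as F

def attention_with_distortion(scores, prev_pos, alpha=0.1):
    # scores:   (batch, src_len) semantic attention logits for the current step
    # prev_pos: (batch,) index of the source position attended at the previous step
    B, L = scores.shape
    positions = torch.arange(L, device=scores.device).unsqueeze(0)        # (1, L)
    penalty = alpha * (positions - prev_pos.unsqueeze(1)).abs().float()   # jump distance
    return F.softmax(scores - penalty, dim=-1)                            # penalised weights
```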

2016

Memory-enhanced Decoder for Neural Machine Translation
Mingxuan Wang | Zhengdong Lu | Hang Li | Qun Liu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Encoding Source Language with Convolutional Neural Network for Machine Translation
Fandong Meng | Zhengdong Lu | Mingxuan Wang | Hang Li | Wenbin Jiang | Qun Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

genCNN: A Convolutional Architecture for Word Sequence Prediction
Mingxuan Wang | Zhengdong Lu | Hang Li | Wenbin Jiang | Qun Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)