Fei Gao


2019

pdf bib
Soft Contextual Data Augmentation for Neural Machine Translation
Fei Gao | Jinhua Zhu | Lijun Wu | Yingce Xia | Tao Qin | Xueqi Cheng | Wengang Zhou | Tie-Yan Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited. In this paper, we present a novel data augmentation method for neural machine translation. Different from previous augmentation methods that randomly drop, swap or replace words with other words in a sentence, we softly augment a randomly chosen word in a sentence by its contextual mixture of multiple related words. More accurately, we replace the one-hot representation of a word by a distribution (provided by a language model) over the vocabulary, i.e., replacing the embedding of this word by a weighted combination of multiple semantically similar words. Since the weights of those words depend on the contextual information of the word to be replaced,the newly generated sentences capture much richer information than previous augmentation methods. Experimental results on both small scale and large scale machine translation data sets demonstrate the superiority of our method over strong baselines.

pdf bib
Depth Growing for Neural Machine Translation
Lijun Wu | Yiren Wang | Yingce Xia | Fei Tian | Fei Gao | Tao Qin | Jianhuang Lai | Tie-Yan Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of the neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks to the NMT model results in no improvement and even drop in performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which result in significant improvements over the strong Transformer baselines on WMT14 EnglishGerman and EnglishFrench translation tasks.

pdf bib
Machine Translation With Weakly Paired Documents
Lijun Wu | Jinhua Zhu | Di He | Fei Gao | Tao Qin | Jianhuang Lai | Tie-Yan Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the large amounts of parallel sentences, which hinders its applicability to low-resource language pairs. Recent works explore the possibility of unsupervised machine translation with monolingual data only, leading to much lower accuracy compared with the supervised one. Observing that weakly paired bilingual documents are much easier to collect than bilingual sentences, e.g., from Wikipedia, news websites or books, in this paper, we investigate training translation models with weakly paired bilingual documents. Our approach contains two components. 1) We provide a simple approach to mine implicitly bilingual sentence pairs from document pairs which can then be used as supervised training signals. 2) We leverage the topic consistency of two weakly paired documents and learn the sentence translation model by constraining the word distribution-level alignments. We evaluate our method on weakly paired documents from Wikipedia on six tasks, the widely used WMT16 GermanEnglish, WMT13 SpanishEnglish and WMT16 RomanianEnglish translation tasks. We obtain 24.1/30.3, 28.1/27.6 and 30.1/27.6 BLEU points separately, outperforming previous results by more than 5 BLEU points in each direction and reducing the gap between unsupervised translation and supervised translation up to 50%.

pdf bib
Microsoft Research Asia’s Systems for WMT19
Yingce Xia | Xu Tan | Fei Tian | Fei Gao | Di He | Weicong Chen | Yang Fan | Linyuan Gong | Yichong Leng | Renqian Luo | Yiren Wang | Lijun Wu | Jinhua Zhu | Tao Qin | Tie-Yan Liu
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We Microsoft Research Asia made submissions to 11 language directions in the WMT19 news translation tasks. We won the first place for 8 of the 11 directions and the second place for the other three. Our basic systems are built on Transformer, back translation and knowledge distillation. We integrate several of our rececent techniques to enhance the baseline systems: multi-agent dual learning (MADL), masked sequence-to-sequence pre-training (MASS), neural architecture optimization (NAO), and soft contextual data augmentation (SCA).

2018

pdf bib
Efficient Sequence Learning with Group Recurrent Networks
Fei Gao | Lijun Wu | Li Zhao | Tao Qin | Xueqi Cheng | Tie-Yan Liu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, speech recognition and so on. One of the key factors to these successes is big models. However, training such big models usually takes days or even weeks of time even if using tens of GPU cards. In this paper, we propose an efficient architecture to improve the efficiency of such RNN model training, which adopts the group strategy for recurrent layers, while exploiting the representation rearrangement strategy between layers as well as time steps. To demonstrate the advantages of our models, we conduct experiments on several datasets and tasks. The results show that our architecture achieves comparable or better accuracy comparing with baselines, with a much smaller number of parameters and at a much lower computational cost.