Yang Feng


2019

Enhancing Context Modeling with a Query-Guided Capsule Network for Document-level Translation
Zhengxin Yang | Jinchao Zhang | Fandong Meng | Shuhao Gu | Yang Feng | Jie Zhou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Context modeling is essential for generating coherent and consistent translations in document-level neural machine translation. The widely used method for document-level translation usually compresses the context information into a representation via hierarchical attention networks. However, this method neither considers the relationships between context words nor distinguishes the roles of context words. To address this problem, we propose a query-guided capsule network that clusters context information into the different perspectives with which the target translation may be concerned. Experimental results show that our method significantly outperforms strong baselines on multiple data sets from different domains.

Incremental Transformer with Deliberation Decoder for Document Grounded Conversations
Zekang Li | Cheng Niu | Fandong Meng | Yang Feng | Qian Li | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Document Grounded Conversations is the task of generating dialogue responses when chatting about the content of a given document. Document knowledge obviously plays a critical role in this task, yet existing dialogue models do not exploit it effectively enough. In this paper, we propose a novel Transformer-based architecture for multi-turn document-grounded conversations. In particular, we devise an Incremental Transformer to encode multi-turn utterances along with the knowledge in related documents. Motivated by the human cognitive process, we design a two-pass decoder (Deliberation Decoder) to improve context coherence and knowledge correctness. Our empirical study on a real-world document-grounded dataset shows that responses generated by our model significantly outperform competitive baselines in both context coherence and knowledge relevance.

Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
Chenze Shao | Yang Feng | Jinchao Zhang | Fandong Meng | Xilin Chen | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The Non-Autoregressive Transformer (NAT) accelerates the Transformer model by discarding the autoregressive mechanism and generating target words independently, which fails to exploit the target-side sequential information. For this reason, over-translation and under-translation errors often occur, especially for long sentences. In this paper, we propose two approaches that retrieve the target sequential information for NAT to enhance its translation ability while preserving its fast decoding. First, we propose a sequence-level training method based on a novel reinforcement algorithm for NAT (Reinforce-NAT) that reduces variance and stabilizes the training procedure. Second, we propose an innovative Transformer decoder, named FS-decoder, that fuses the target sequential information into the top layer of the decoder. Experimental results on three translation tasks show that Reinforce-NAT surpasses the baseline NAT system by a significant margin in BLEU without slowing decoding, and that the FS-decoder achieves translation performance comparable to the autoregressive Transformer with considerable speedup.
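To make the sequence-level training idea concrete, here is a minimal sketch of vanilla REINFORCE with a sequence reward over independently predicted positions, i.e. the high-variance estimator that Reinforce-NAT is designed to improve on. The shapes, the toy reward, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_grad_logits(logits, reward_fn):
    """Sample each target word independently (as a NAT model does), score the
    whole sequence with a reward such as sentence BLEU, and weight the
    log-probability gradient by that reward."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    sample = [rng.choice(len(p), p=p) for p in probs]   # one word per position
    reward = reward_fn(sample)
    grad = probs.copy()
    for t, w in enumerate(sample):
        grad[t][w] -= 1.0          # probs - onehot = d(-log p(sample)) / d logits
    return reward * -grad          # ascent direction: reward * d log p / d logits

toy_logits = np.zeros((3, 4))      # 3 positions, vocabulary of 4
print(reinforce_grad_logits(toy_logits, lambda seq: float(seq.count(2))))
```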

Bridging the Gap between Training and Inference for Neural Machine Translation
Wen Zhang | Yang Feng | Fandong Meng | Di You | Qun Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural Machine Translation (NMT) generates target words sequentially by predicting the next word conditioned on the context words. At training time it predicts with the ground-truth words as context, while at inference it has to generate the entire sequence from scratch. This discrepancy in the fed context leads to error accumulation along the way. Furthermore, word-level training requires strict matching between the generated sequence and the ground-truth sequence, which leads to overcorrection of different but reasonable translations. In this paper, we address these issues by sampling context words during training not only from the ground-truth sequence but also from the sequence predicted by the model, where the predicted sequence is selected with a sentence-level optimum. Experimental results on Chinese→English and WMT’14 English→German translation tasks demonstrate that our approach achieves significant improvements on multiple datasets.
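A minimal sketch of the context-sampling idea, assuming token lists of equal length. The helper name, the fixed sampling probability, and the toy sentences are illustrative; the paper additionally selects the predicted sequence with a sentence-level oracle (e.g. BLEU) and decays the probability over training.

```python
import random

random.seed(0)

def mix_context(gold_tokens, predicted_tokens, gold_prob):
    """Build a decoder context by choosing each token either from the
    ground-truth sequence or from the model's own prediction."""
    mixed = []
    for gold, pred in zip(gold_tokens, predicted_tokens):
        mixed.append(gold if random.random() < gold_prob else pred)
    return mixed

# Toy usage: early in training gold_prob is high; as it decays, the model
# increasingly conditions on its own predictions, as it must at inference.
print(mix_context(["the", "cat", "sat"], ["a", "cat", "sits"], gold_prob=0.75))
```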

Modeling Semantic Relationship in Multi-turn Conversations with Hierarchical Latent Variables
Lei Shen | Yang Feng | Haolan Zhan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multi-turn conversations have complex semantic structures, and it remains a challenge to generate coherent and diverse responses given the previous utterances. In practice, a conversation takes place against a background; meanwhile, the query and the response are usually the most closely related utterances: they are consistent in topic yet differ in content. However, little work focuses on this hierarchical relationship among utterances. To address this problem, we propose a Conversational Semantic Relationship RNN (CSRR) model that constructs the dependency explicitly. The model contains latent variables at three levels: the discourse-level variable captures the global background, the pair-level variable stands for the common topic information shared by query and response, and the utterance-level variables represent the differences in content. Experimental results show that our model significantly improves the quality of responses in terms of fluency, coherence, and diversity compared to baseline methods.

Improving Domain Adaptation Translation with Domain Invariant and Specific Information
Shuhao Gu | Yang Feng | Qun Liu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In domain adaptation for neural machine translation, translation performance can benefit from separating features into domain-specific features and common features. In this paper, we propose a method that explicitly models the two kinds of information in the encoder-decoder framework so as to exploit out-of-domain data in in-domain training. In our method, we maintain a private encoder and a private decoder for each domain, which model domain-specific information. Meanwhile, we introduce a common encoder and a common decoder shared by all domains, through which only domain-independent information can flow. Besides, we add a discriminator to the shared encoder and employ adversarial training for the whole model, reinforcing information separation and machine translation performance simultaneously. Experimental results show that our method greatly outperforms competitive baselines on multiple data sets.
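A common way to realize "discriminator on the shared encoder plus adversarial training" is a gradient-reversal layer. The PyTorch sketch below is a generic illustration under that assumption; the module and variable names are hypothetical and the paper's full private/shared encoder-decoder setup is more elaborate.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class DomainDiscriminator(nn.Module):
    """Predicts the domain from the shared encoding; the reversed gradient
    pushes the shared encoder toward domain-invariant features."""
    def __init__(self, hidden_size, num_domains):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_domains)

    def forward(self, shared_encoding):
        return self.classifier(GradReverse.apply(shared_encoding))

# Toy usage: 5 sentence encodings of size 16, classified into 2 domains.
disc = DomainDiscriminator(hidden_size=16, num_domains=2)
print(disc(torch.randn(5, 16)).shape)  # torch.Size([5, 2])
```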

2018

Refining Source Representations with Relation Networks for Neural Machine Translation
Wen Zhang | Jiawei Hu | Yang Feng | Qun Liu
Proceedings of the 27th International Conference on Computational Linguistics

Although neural machine translation with the encoder-decoder framework has achieved great success recently, it still suffers from two drawbacks: it forgets distant information, an inherent disadvantage of the recurrent neural network structure, and it disregards the relationships between source words during encoding. In practice, however, both distant information and these relationships are often useful at the current step. We target these problems by introducing relation networks to learn better representations of the source. The relation networks strengthen the memorization capability of the recurrent neural network by associating source words with each other, which also helps retain their relationships. The source representations and all the relations are then fed into the attention component together during decoding, with the main encoder-decoder framework unchanged. Experiments on several datasets show that our method significantly improves translation performance over the conventional encoder-decoder model and even outperforms an approach that uses supervised syntactic knowledge.
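As a rough illustration of the relation-network idea (a pairwise function g applied to every pair of source words, then aggregated per position), here is a toy NumPy sketch; the concatenation, the choice of g, and the names are assumptions for illustration only, not the paper's exact layer.

```python
import numpy as np

def relation_features(word_vecs, g):
    """Pair each source word with every other word, apply g to each pair,
    and sum, so each position carries information about its relations to
    the rest of the sentence."""
    enriched = []
    for i, wi in enumerate(word_vecs):
        pair_sum = sum(g(wi, wj) for j, wj in enumerate(word_vecs) if j != i)
        enriched.append(np.concatenate([wi, pair_sum]))
    return np.stack(enriched)

# Toy g: elementwise product of the two word vectors.
vecs = [np.ones(4), np.arange(4.0), np.full(4, 2.0)]
print(relation_features(vecs, lambda a, b: a * b).shape)  # (3, 8)
```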

Speeding Up Neural Machine Translation Decoding by Cube Pruning
Wen Zhang | Liang Huang | Yang Feng | Lei Shen | Qun Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Although neural machine translation has achieved promising results, it suffers from slow translation speed. The direct consequence is that a trade-off has to be made between translation quality and speed, so its performance cannot come into full play. We apply cube pruning, a popular technique for speeding up dynamic programming, to neural machine translation to accelerate decoding. To construct the equivalence classes, similar target hidden states are combined, leading to fewer RNN expansion operations on the target side and fewer softmax operations over the large target vocabulary. Experiments show that, at the same or even better translation quality, our method translates 3.3x faster than naive beam search on GPUs and 3.5x faster on CPUs.
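The core of cube pruning is lazy, best-first enumeration of a score grid instead of scoring every combination. The sketch below shows that enumeration for additive scores over two descending-sorted lists; it is a generic illustration only, whereas the paper additionally merges similar target hidden states into equivalence classes and applies the idea inside beam search.

```python
import heapq

def cube_pruning_topk(hyp_scores, word_scores, k):
    """Return the k best (hypothesis, word) combinations without scoring the
    full grid. Both score lists are assumed sorted in descending order."""
    heap = [(-(hyp_scores[0] + word_scores[0]), 0, 0)]
    seen, results = {(0, 0)}, []
    while heap and len(results) < k:
        neg_score, i, j = heapq.heappop(heap)
        results.append((-neg_score, i, j))
        # Push the two grid neighbours of the popped cell.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(hyp_scores) and nj < len(word_scores) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(hyp_scores[ni] + word_scores[nj]), ni, nj))
    return results

print(cube_pruning_topk([0.0, -0.5, -1.2], [0.0, -0.1, -0.9], k=3))
```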

Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation
Chenze Shao | Xilin Chen | Yang Feng
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural machine translation (NMT) models are usually trained with a word-level loss using the teacher forcing algorithm, which not only evaluates the translation improperly but also suffers from exposure bias. Sequence-level training under the reinforcement framework can mitigate the problems of the word-level loss, but its performance is unstable due to the high variance of the gradient estimation. On these grounds, we present a method with a differentiable sequence-level training objective based on probabilistic n-gram matching, which avoids the reinforcement framework. In addition, this method performs greedy search during training, using the predicted words as context just as at inference, to alleviate exposure bias. Experimental results on the NIST Chinese-to-English translation tasks show that our method significantly outperforms the reinforcement-based algorithms and achieves an improvement of 1.5 BLEU points on average over a strong baseline system.
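As a minimal illustration of a differentiable n-gram objective, the expected count of an n-gram under independent per-position word distributions can be computed as below; the function and the toy numbers are illustrative assumptions and omit the sequence-level loss that such expected counts would feed into.

```python
import numpy as np

def expected_ngram_count(probs, ngram):
    """Expected number of occurrences of `ngram` (a tuple of vocabulary ids)
    when each output position t has an independent word distribution probs[t].
    probs: array of shape (T, V). Differentiable in probs."""
    T, n = len(probs), len(ngram)
    total = 0.0
    for t in range(T - n + 1):
        p = 1.0
        for i, w in enumerate(ngram):
            p *= probs[t + i][w]
        total += p
    return total

# Toy example: vocabulary of 3, 4 output positions, uniform distributions.
probs = np.full((4, 3), 1.0 / 3.0)
print(expected_ngram_count(probs, (0, 1)))  # expected count of bigram (0, 1)
```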

Knowledge Diffusion for Neural Dialogue Generation
Shuman Liu | Hongshen Chen | Zhaochun Ren | Yang Feng | Qun Liu | Dawei Yin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

End-to-end neural dialogue generation has shown promising results recently, but it does not employ knowledge to guide the generation and hence tends to produce short, general, and meaningless responses. In this paper, we propose a neural knowledge diffusion (NKD) model that introduces knowledge into dialogue generation. The model not only matches the relevant facts for the input utterance but also diffuses them to similar entities. With the help of fact matching and entity diffusion, neural dialogue generation is augmented with the ability of convergent and divergent thinking over the knowledge base. Our empirical study on a real-world dataset proves that our model is capable of generating meaningful, diverse, and natural responses for both factoid questions and knowledge-grounded chit-chat. The experimental results also show that our model significantly outperforms competitive baseline models.

2017

Memory-augmented Neural Machine Translation
Yang Feng | Shiyue Zhang | Andi Zhang | Dong Wang | Andrew Abel
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Neural machine translation (NMT) has achieved notable success in recent times; however, it is also widely recognized that this approach has limitations in handling infrequent words and word pairs. This paper presents a novel memory-augmented NMT (M-NMT) architecture, which stores in a memory knowledge about how words (usually infrequently encountered ones) should be translated, and then uses this memory to assist the neural model. We use this memory mechanism to combine the knowledge learned from a conventional statistical machine translation system and the rules learned by an NMT system, and we also propose a solution for out-of-vocabulary (OOV) words based on this framework. Our experiments on two Chinese-English translation tasks demonstrate that the M-NMT architecture outperforms the NMT baseline by 9.0 and 2.7 BLEU points on the two tasks, respectively. Additionally, we find that this architecture yields a much more effective OOV treatment than competitive methods.
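A toy sketch of the general flavor of memory augmentation at the output layer: interpolate the NMT distribution with a distribution read from an external translation memory (e.g. derived from an SMT system). The fixed gate and the toy numbers are illustrative assumptions; the paper's actual architecture and gating are more involved.

```python
import numpy as np

def memory_augmented_probs(nmt_probs, memory_probs, gate):
    """Blend the NMT softmax with probabilities read from an external memory
    of word translations; `gate` controls how much the memory is trusted
    (in a full model it would be predicted by the network)."""
    blended = (1.0 - gate) * nmt_probs + gate * memory_probs
    return blended / blended.sum()   # defensive renormalization

nmt = np.array([0.70, 0.20, 0.10])      # NMT favours a frequent word
memory = np.array([0.05, 0.05, 0.90])   # memory strongly prefers a rare word
print(memory_augmented_probs(nmt, memory, gate=0.4))
```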

Flexible and Creative Chinese Poetry Generation Using Neural Memory
Jiyuan Zhang | Yang Feng | Dong Wang | Yang Wang | Andrew Abel | Shiyue Zhang | Andi Zhang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It has been shown that Chinese poems can be successfully generated by sequence-to-sequence neural models, particularly with the attention mechanism. A potential problem of this approach, however, is that neural models can only learn abstract rules, while poem generation is a highly creative process that involves not only rules but also innovations for which pure statistical models are not appropriate in principle. This work proposes a memory augmented neural model for Chinese poem generation, where the neural model and the augmented memory work together to balance the requirements of linguistic accordance and aesthetic innovation, leading to innovative generations that are still rule-compliant. In addition, it is found that the memory mechanism provides interesting flexibility that can be used to generate poems with different styles.

2014

Factored Markov Translation with Robust Modeling
Yang Feng | Trevor Cohn | Xinkai Du
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

2013

A Markov Model of Machine Translation using Non-parametric Bayesian Inference
Yang Feng | Trevor Cohn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Automatic Bilingual Phrase Extraction from Comparable Corpora
Ahmet Aker | Yang Feng | Robert Gaizauskas
Proceedings of COLING 2012: Posters

Hierarchical Chunk-to-String Translation
Yang Feng | Dongdong Zhang | Mu Li | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Left-to-Right Tree-to-String Decoding with Prediction
Yang Feng | Yang Liu | Qun Liu | Trevor Cohn
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2010

An Efficient Shift-Reduce Decoding Algorithm for Phrased-Based Machine Translation
Yang Feng | Haitao Mi | Yang Liu | Qun Liu
Coling 2010: Posters

2009

Joint Decoding with Multiple Translation Models
Yang Liu | Haitao Mi | Yang Feng | Qun Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Lattice-based System Combination for Statistical Machine Translation
Yang Feng | Yang Liu | Haitao Mi | Qun Liu | Yajuan Lü
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing