Rui Yan


2019

pdf bib
One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues
Chongyang Tao | Wei Wu | Can Xu | Wenpeng Hu | Dongyan Zhao | Rui Yan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Currently, researchers have paid great attention to retrieval-based dialogues in open-domain. In particular, people study the problem by investigating context-response matching for multi-turn response selection based on publicly recognized benchmark data sets. State-of-the-art methods require a response to interact with each utterance in a context from the beginning, but the interaction is performed in a shallow way. In this work, we let utterance-response interaction go deep by proposing an interaction-over-interaction network (IoI). The model performs matching by stacking multiple interaction blocks in which residual information from one time of interaction initiates the interaction process again. Thus, matching information within an utterance-response pair is extracted from the interaction of the pair in an iterative fashion, and the information flows along the chain of the blocks via representations. Evaluation results on three benchmark data sets indicate that IoI can significantly outperform state-of-the-art methods in terms of various matching metrics. Through further analysis, we also unveil how the depth of interaction affects the performance of IoI.

pdf bib
Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems
Jiazhan Feng | Chongyang Tao | Wei Wu | Yansong Feng | Dongyan Zhao | Rui Yan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We study learning of a matching model for response selection in retrieval-based dialogue systems. The problem is equally important with designing the architecture of a model, but is less explored in existing literature. To learn a robust matching model from noisy training data, we propose a general co-teaching framework with three specific teaching strategies that cover both teaching with loss functions and teaching with data curriculum. Under the framework, we simultaneously learn two matching models with independent training sets. In each iteration, one model transfers the knowledge learned from its training set to the other model, and at the same time receives the guide from the other model on how to overcome noise in training. Through being both a teacher and a student, the two models learn from each other and get improved together. Evaluation results on two public data sets indicate that the proposed learning approach can generally and significantly improve the performance of existing matching models.

pdf bib
Are Training Samples Correlated? Learning to Generate Dialogue Responses with Multiple References
Lisong Qiu | Juntao Li | Wei Bi | Dongyan Zhao | Rui Yan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Due to its potential applications, open-domain dialogue generation has become popular and achieved remarkable progress in recent years, but sometimes suffers from generic responses. Previous models are generally trained based on 1-to-1 mapping from an input query to its response, which actually ignores the nature of 1-to-n mapping in dialogue that there may exist multiple valid responses corresponding to the same query. In this paper, we propose to utilize the multiple references by considering the correlation of different valid responses and modeling the 1-to-n mapping with a novel two-step generation architecture. The first generation phase extracts the common features of different responses which, combined with distinctive features obtained in the second phase, can generate multiple diverse and appropriate responses. Experimental results show that our proposed model can effectively improve the quality of response and outperform existing neural dialogue models on both automatic and human evaluations.

pdf bib
Sampling Matters! An Empirical Study of Negative Sampling Strategies for Learning of Matching Models in Retrieval-based Dialogue Systems
Jia Li | Chongyang Tao | Wei Wu | Yansong Feng | Dongyan Zhao | Rui Yan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We study how to sample negative examples to automatically construct a training set for effective model learning in retrieval-based dialogue systems. Following an idea of dynamically adapting negative examples to matching models in learning, we consider four strategies including minimum sampling, maximum sampling, semi-hard sampling, and decay-hard sampling. Empirical studies on two benchmarks with three matching models indicate that compared with the widely used random sampling strategy, although the first two strategies lead to performance drop, the latter two ones can bring consistent improvement to the performance of all the models on both benchmarks.

pdf bib
Who Is Speaking to Whom? Learning to Identify Utterance Addressee in Multi-Party Conversations
Ran Le | Wenpeng Hu | Mingyue Shang | Zhenjun You | Lidong Bing | Dongyan Zhao | Rui Yan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Previous research on dialogue systems generally focuses on the conversation between two participants, yet multi-party conversations which involve more than two participants within one session bring up a more complicated but realistic scenario. In real multi- party conversations, we can observe who is speaking, but the addressee information is not always explicit. In this paper, we aim to tackle the challenge of identifying all the miss- ing addressees in a conversation session. To this end, we introduce a novel who-to-whom (W2W) model which models users and utterances in the session jointly in an interactive way. We conduct experiments on the benchmark Ubuntu Multi-Party Conversation Corpus and the experimental results demonstrate that our model outperforms baselines with consistent improvements.

pdf bib
Modeling Personalization in Continuous Space for Response Generation via Augmented Wasserstein Autoencoders
Zhangming Chan | Juntao Li | Xiaopeng Yang | Xiuying Chen | Wenpeng Hu | Dongyan Zhao | Rui Yan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Variational autoencoders (VAEs) and Wasserstein autoencoders (WAEs) have achieved noticeable progress in open-domain response generation. Through introducing latent variables in continuous space, these models are capable of capturing utterance-level semantics, e.g., topic, syntactic properties, and thus can generate informative and diversified responses. In this work, we improve the WAE for response generation. In addition to the utterance-level information, we also model user-level information in latent continue space. Specifically, we embed user-level and utterance-level information into two multimodal distributions, and combine these two multimodal distributions into a mixed distribution. This mixed distribution will be used as the prior distribution of WAE in our proposed model, named as PersonaWAE. Experimental results on a large-scale real-world dataset confirm the superiority of our model for generating informative and personalized responses, where both automatic and human evaluations outperform state-of-the-art models.

pdf bib
How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing
Shen Gao | Xiuying Chen | Piji Li | Zhangming Chan | Dongyan Zhao | Rui Yan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Under special circumstances, summaries should conform to a particular style with patterns, such as court judgments and abstracts in academic papers. To this end, the prototype document-summary pairs can be utilized to generate better summaries. There are two main challenges in this task: (1) the model needs to incorporate learned patterns from the prototype, but (2) should avoid copying contents other than the patternized words—such as irrelevant facts—into the generated summaries. To tackle these challenges, we design a model named Prototype Editing based Summary Generator (PESG). PESG first learns summary patterns and prototype facts by analyzing the correlation between a prototype document and its summary. Prototype facts are then utilized to help extract facts from the input document. Next, an editing generator generates new summary based on the summary pattern or extracted facts. Finally, to address the second challenge, a fact checker is used to estimate mutual information between the input document and generated summary, providing an additional signal for the generator. Extensive experiments conducted on a large-scale real-world text summarization dataset show that PESG achieves the state-of-the-art performance in terms of both automatic metrics and human evaluations.

pdf bib
Semi-supervised Text Style Transfer: Cross Projection in Latent Space
Mingyue Shang | Piji Li | Zhenxin Fu | Lidong Bing | Dongyan Zhao | Shuming Shi | Rui Yan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Text style transfer task requires the model to transfer a sentence of one style to another style while retaining its original content meaning, which is a challenging problem that has long suffered from the shortage of parallel data. In this paper, we first propose a semi-supervised text style transfer model that combines the small-scale parallel data with the large-scale nonparallel data. With these two types of training data, we introduce a projection function between the latent space of different styles and design two constraints to train it. We also introduce two other simple but effective semi-supervised methods to compare with. To evaluate the performance of the proposed methods, we build and release a novel style transfer dataset that alters sentences between the style of ancient Chinese poem and the modern Chinese.

pdf bib
Stick to the Facts: Learning towards a Fidelity-oriented E-Commerce Product Description Generation
Zhangming Chan | Xiuying Chen | Yongliang Wang | Juntao Li | Zhiqiang Zhang | Kun Gai | Dongyan Zhao | Rui Yan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Different from other text generation tasks, in product description generation, it is of vital importance to generate faithful descriptions that stick to the product attribute information. However, little attention has been paid to this problem. To bridge this gap we propose a model named Fidelity-oriented Product Description Generator (FPDG). FPDG takes the entity label of each word into account, since the product attribute information is always conveyed by entity words. Specifically, we first propose a Recurrent Neural Network (RNN) decoder based on the Entity-label-guided Long Short-Term Memory (ELSTM) cell, taking both the embedding and the entity label of each word as input. Second, we establish a keyword memory that stores the entity labels as keys and keywords as values, and FPDG will attend to keywords through attending to their entity labels. Experiments conducted a large-scale real-world product description dataset show that our model achieves the state-of-the-art performance in terms of both traditional generation metrics as well as human evaluations. Specifically, FPDG increases the fidelity of the generated descriptions by 25%.

2018

pdf bib
On the Abstractiveness of Neural Document Summarization
Fangfang Zhang | Jin-ge Yao | Rui Yan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Many modern neural document summarization systems based on encoder-decoder networks are designed to produce abstractive summaries. We attempted to verify the degree of abstractiveness of modern neural abstractive summarization systems by calculating overlaps in terms of various types of units. Upon the observation that many abstractive systems tend to be near-extractive in practice, we also implemented a pure copy system, which achieved comparable results as abstractive summarizers while being far more computationally efficient. These findings suggest the possibility for future efforts towards more efficient systems that could better utilize the vocabulary in the original document.

pdf bib
Generating Classical Chinese Poems via Conditional Variational Autoencoder and Adversarial Training
Juntao Li | Yan Song | Haisong Zhang | Dongmin Chen | Shuming Shi | Dongyan Zhao | Rui Yan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

It is a challenging task to automatically compose poems with not only fluent expressions but also aesthetic wording. Although much attention has been paid to this task and promising progress is made, there exist notable gaps between automatically generated ones with those created by humans, especially on the aspects of term novelty and thematic consistency. Towards filling the gap, in this paper, we propose a conditional variational autoencoder with adversarial training for classical Chinese poem generation, where the autoencoder part generates poems with novel terms and a discriminator is applied to adversarially learn their thematic consistency with their titles. Experimental results on a large poetry corpus confirm the validity and effectiveness of our model, where its automatic and human evaluation scores outperform existing models.

pdf bib
Iterative Document Representation Learning Towards Summarization with Polishing
Xiuying Chen | Shen Gao | Chongyang Tao | Yan Song | Dongyan Zhao | Rui Yan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce Iterative Text Summarization (ITS), an iteration-based model for supervised extractive text summarization, inspired by the observation that it is often necessary for a human to read an article multiple times in order to fully understand and summarize its contents. Current summarization approaches read through a document only once to generate a document representation, resulting in a sub-optimal representation. To address this issue we introduce a model which iteratively polishes the document representation on many passes through the document. As part of our model, we also introduce a selective reading mechanism that decides more accurately the extent to which each sentence in the model should be updated. Experimental results on the CNN/DailyMail and DUC2002 datasets demonstrate that our model significantly outperforms state-of-the-art extractive systems when evaluated by machines and by humans.

bib
Deep Chit-Chat: Deep Learning for ChatBots
Wei Wu | Rui Yan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

The tutorial is based on the long-term efforts on building conversational models with deep learning approaches for chatbots. We will summarize the fundamental challenges in modeling open domain dialogues, clarify the difference from modeling goal-oriented dialogues, and give an overview of state-of-the-art methods for open domain conversation including both retrieval-based methods and generation-based methods. In addition to these, our tutorial will also cover some new trends of research of chatbots, such as how to design a reasonable evaluation system and how to "control" conversations from a chatbot with some specific information such as personas, styles, and emotions, etc.

pdf bib
Marrying Up Regular Expressions with Neural Networks: A Case Study for Spoken Language Understanding
Bingfeng Luo | Yansong Feng | Zheng Wang | Songfang Huang | Rui Yan | Dongyan Zhao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The success of many natural language processing (NLP) tasks is bound by the number and quality of annotated data, but there is often a shortage of such training data. In this paper, we ask the question: “Can we combine a neural network (NN) with regular expressions (RE) to improve supervised learning for NLP?”. In answer, we develop novel methods to exploit the rich expressiveness of REs at different levels within a NN, showing that the combination significantly enhances the learning effectiveness when a small number of training examples are available. We evaluate our approach by applying it to spoken language understanding for intent detection and slot filling. Experimental results show that our approach is highly effective in exploiting the available training data, giving a clear boost to the RE-unaware NN.

pdf bib
Modeling discourse cohesion for discourse parsing via memory network
Yanyan Jia | Yuan Ye | Yansong Feng | Yuxuan Lai | Rui Yan | Dongyan Zhao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Identifying long-span dependencies between discourse units is crucial to improve discourse parsing performance. Most existing approaches design sophisticated features or exploit various off-the-shelf tools, but achieve little success. In this paper, we propose a new transition-based discourse parser that makes use of memory networks to take discourse cohesion into account. The automatically captured discourse cohesion benefits discourse parsing, especially for long span scenarios. Experiments on the RST discourse treebank show that our method outperforms traditional featured based methods, and the memory based discourse cohesion can improve the overall parsing performance significantly.

2017

pdf bib
Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix
Bingfeng Luo | Yansong Feng | Zheng Wang | Zhanxing Zhu | Songfang Huang | Rui Yan | Dongyan Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction. We show that the dynamic transition matrix can effectively characterize the noise in the training data built by distant supervision. The transition matrix can be effectively trained using a novel curriculum learning based method without any direct supervision about the noise. We thoroughly evaluate our approach under a wide range of extraction scenarios. Experimental results show that our approach consistently improves the extraction results and outperforms the state-of-the-art in various evaluation scenarios.

pdf bib
How to Make Context More Useful? An Empirical Study on Context-Aware Neural Conversational Models
Zhiliang Tian | Rui Yan | Lili Mou | Yiping Song | Yansong Feng | Dongyan Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Generative conversational systems are attracting increasing attention in natural language processing (NLP). Recently, researchers have noticed the importance of context information in dialog processing, and built various models to utilize context. However, there is no systematic comparison to analyze how to use context effectively. In this paper, we conduct an empirical study to compare various models and investigate the effect of context information in dialog systems. We also propose a variant that explicitly weights context vectors by context-query relevance, outperforming the other baselines.

pdf bib
Towards Implicit Content-Introducing for Generative Short-Text Conversation Systems
Lili Yao | Yaoyuan Zhang | Yansong Feng | Dongyan Zhao | Rui Yan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The study on human-computer conversation systems is a hot research topic nowadays. One of the prevailing methods to build the system is using the generative Sequence-to-Sequence (Seq2Seq) model through neural networks. However, the standard Seq2Seq model is prone to generate trivial responses. In this paper, we aim to generate a more meaningful and informative reply when answering a given question. We propose an implicit content-introducing method which incorporates additional information into the Seq2Seq model in a flexible way. Specifically, we fuse the general decoding and the auxiliary cue word information through our proposed hierarchical gated fusion unit. Experiments on real-life data demonstrate that our model consistently outperforms a set of competitive baselines in terms of BLEU scores and human evaluation.

pdf bib
Diversifying Neural Conversation Model with Maximal Marginal Relevance
Yiping Song | Zhiliang Tian | Dongyan Zhao | Ming Zhang | Rui Yan
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Neural conversation systems, typically using sequence-to-sequence (seq2seq) models, are showing promising progress recently. However, traditional seq2seq suffer from a severe weakness: during beam search decoding, they tend to rank universal replies at the top of the candidate list, resulting in the lack of diversity among candidate replies. Maximum Marginal Relevance (MMR) is a ranking algorithm that has been widely used for subset selection. In this paper, we propose the MMR-BS decoding method, which incorporates MMR into the beam search (BS) process of seq2seq. The MMR-BS method improves the diversity of generated replies without sacrificing their high relevance with the user-issued query. Experiments show that our proposed model achieves the best performance among other comparison methods.

2016

pdf bib
Multi-view Response Selection for Human-Computer Conversation
Xiangyang Zhou | Daxiang Dong | Hua Wu | Shiqi Zhao | Dianhai Yu | Hao Tian | Xuan Liu | Rui Yan
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
How Transferable are Neural Networks in NLP Applications?
Lili Mou | Zhao Meng | Rui Yan | Ge Li | Yan Xu | Lu Zhang | Zhi Jin
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation
Lili Mou | Yiping Song | Rui Yan | Ge Li | Lu Zhang | Zhi Jin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Using neural networks to generate replies in human-computer dialogue systems is attracting increasing attention over the past few years. However, the performance is not satisfactory: the neural network tends to generate safe, universally relevant replies which carry little meaning. In this paper, we propose a content-introducing approach to neural network-based generative dialogue systems. We first use pointwise mutual information (PMI) to predict a noun as a keyword, reflecting the main gist of the reply. We then propose seq2BF, a “sequence to backward and forward sequences” model, which generates a reply containing the given keyword. Experimental results show that our approach significantly outperforms traditional sequence-to-sequence models in terms of human evaluation and the entropy measure, and that the predicted keyword can appear at an appropriate position in the reply.

pdf bib
Enriching Cold Start Personalized Language Model Using Social Network Information
Yu-Yang Huang | Rui Yan | Tsung-Ting Kuo | Shou-De Lin
International Journal of Computational Linguistics & Chinese Language Processing, Volume 21, Number 1, June 2016

pdf bib
Chinese Couplet Generation with Neural Network Structures
Rui Yan | Cheng-Te Li | Xiaohua Hu | Ming Zhang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Natural Language Inference by Tree-Based Convolution and Heuristic Matching
Lili Mou | Rui Men | Ge Li | Yan Xu | Lu Zhang | Rui Yan | Zhi Jin
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf bib
Tackling Sparsity, the Achilles Heel of Social Networks: Language Model Smoothing via Social Regularization
Rui Yan | Xiang Li | Mengwen Liu | Xiaohua Hu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
Are Two Heads Better than One? Crowdsourced Translation via a Two-Step Collaboration of Non-Professional Translators and Editors
Rui Yan | Mingkun Gao | Ellie Pavlick | Chris Callison-Burch
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Enriching Cold Start Personalized Language Model Using Social Network Information
Yu-Yang Huang | Rui Yan | Tsung-Ting Kuo | Shou-De Lin
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Semantic v.s. Positions: Utilizing Balanced Proximity in Language Model Smoothing for Information Retrieval
Rui Yan | Han Jiang | Mirella Lapata | Shou-De Lin | Xueqiang Lv | Xiaoming Li
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

pdf bib
Tweet Recommendation with Graph Co-Ranking
Rui Yan | Mirella Lapata | Xiaoming Li
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Timeline Generation through Evolutionary Trans-Temporal Summarization
Rui Yan | Liang Kong | Congrui Huang | Xiaojun Wan | Xiaoming Li | Yan Zhang
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization
Rui Yan | Jian-Yun Nie | Xiaoming Li
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing