Yizhe Zhang


2019

pdf bib
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models
Dinghan Shen | Asli Celikyilmaz | Yizhe Zhang | Liqun Chen | Xin Wang | Jianfeng Gao | Lawrence Carin
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation with latent variables. However, previous works typically focus on synthesizing relatively short sentences (up to 20 words), and the posterior collapse issue has been widely identified in text-VAEs. In this paper, we propose to leverage several multi-level structures to learn a VAE model for generating long, and coherent text. In particular, a hierarchy of stochastic layers between the encoder and decoder networks is employed to abstract more informative and semantic-rich latent codes. Besides, we utilize a multi-level decoder structure to capture the coherent long-term structure inherent in long-form texts, by generating intermediate sentence representations as high-level plan vectors. Extensive experimental results demonstrate that the proposed multi-level VAE model produces more coherent and less repetitive long text compared to baselines as well as can mitigate the posterior-collapse issue.

pdf bib
Improving Textual Network Embedding with Global Attention via Optimal Transport
Liqun Chen | Guoyin Wang | Chenyang Tao | Dinghan Shen | Pengyu Cheng | Xinyuan Zhang | Wenlin Wang | Yizhe Zhang | Lawrence Carin
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Constituting highly informative network embeddings is an essential tool for network analysis. It encodes network topology, along with other useful side information, into low dimensional node-based feature representations that can be exploited by statistical modeling. This work focuses on learning context-aware network embeddings augmented with text data. We reformulate the network embedding problem, and present two novel strategies to improve over traditional attention mechanisms: (i) a content-aware sparse attention module based on optimal transport; and (ii) a high-level attention parsing module. Our approach yields naturally sparse and self-normalized relational inference. It can capture long-term interactions between sequences, thus addressing the challenges faced by existing textual network embedding schemes. Extensive experiments are conducted to demonstrate our model can consistently outperform alternative state-of-the-art methods.

pdf bib
Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling
Vighnesh Leonardo Shiv | Chris Quirk | Anshuman Suri | Xiang Gao | Khuram Shahid | Nithya Govindarajan | Yizhe Zhang | Jianfeng Gao | Michel Galley | Chris Brockett | Tulasi Menon | Bill Dolan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.

pdf bib
Structuring Latent Spaces for Stylized Response Generation
Xiang Gao | Yizhe Zhang | Sungjin Lee | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.

pdf bib
Domain Adaptive Text Style Transfer
Dianqi Li | Yizhe Zhang | Zhe Gan | Yu Cheng | Chris Brockett | Bill Dolan | Ming-Ting Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Text style transfer without parallel data has achieved some practical success. However, in the scenario where less data is available, these methods may yield poor performance. In this paper, we examine domain adaptation for text style transfer to leverage massively available data from other domains. These data may demonstrate domain shift, which impedes the benefits of utilizing such data for training. To address this challenge, we propose simple yet effective domain adaptive text style transfer models, enabling domain-adaptive information exchange. The proposed models presumably learn from the source domain to: (i) distinguish stylized information and generic content information; (ii) maximally preserve content information; and (iii) adaptively transfer the styles in a domain-aware manner. We evaluate the proposed models on two style transfer tasks (sentiment and formality) over multiple target domains where only limited non-parallel data is available. Extensive experiments demonstrate the effectiveness of the proposed model compared to the baselines.

pdf bib
Generating a Common Question from Multiple Documents using Multi-source Encoder-Decoder Models
Woon Sang Cho | Yizhe Zhang | Sudha Rao | Chris Brockett | Sungjin Lee
Proceedings of the 3rd Workshop on Neural Generation and Translation

Ambiguous user queries in search engines result in the retrieval of documents that often span multiple topics. One potential solution is for the search engine to generate multiple refined queries, each of which relates to a subset of the documents spanning the same topic. A preliminary step towards this goal is to generate a question that captures common concepts of multiple documents. We propose a new task of generating common question from multiple documents and present simple variant of an existing multi-source encoder-decoder framework, called the Multi-Source Question Generator (MSQG). We first train an RNN-based single encoder-decoder generator from (single document, question) pairs. At test time, given multiple documents, the Distribute step of our MSQG model predicts target word distributions for each document using the trained model. The Aggregate step aggregates these distributions to generate a common question. This simple yet effective strategy significantly outperforms several existing baseline models applied to the new task when evaluated using automated metrics and human judgments on the MS-MARCO-QA dataset.

pdf bib
Towards Coherent and Cohesive Long-form Text Generation
Woon Sang Cho | Pengchuan Zhang | Yizhe Zhang | Xiujun Li | Michel Galley | Chris Brockett | Mengdi Wang | Jianfeng Gao
Proceedings of the First Workshop on Narrative Understanding

Generating coherent and cohesive long-form texts is a challenging task. Previous works relied on large amounts of human-generated texts to train neural language models. However, few attempted to explicitly improve neural language models from the perspectives of coherence and cohesion. In this work, we propose a new neural language model that is equipped with two neural discriminators which provide feedback signals at the levels of sentence (cohesion) and paragraph (coherence). Our model is trained using a simple yet efficient variant of policy gradient, called ‘negative-critical sequence training’, which is proposed to eliminate the need of training a separate critic for estimating ‘baseline’. Results demonstrate the effectiveness of our approach, showing improvements over the strong baseline – recurrent attention-based bidirectional MLE-trained neural language model.

pdf bib
Unsupervised Dialogue Spectrum Generation for Log Dialogue Ranking
Xinnuo Xu | Yizhe Zhang | Lars Liden | Sungjin Lee
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Although the data-driven approaches of some recent bot building platforms make it possible for a wide range of users to easily create dialogue systems, those platforms don’t offer tools for quickly identifying which log dialogues contain problems. This is important since corrections to log dialogues provide a means to improve performance after deployment. A log dialogue ranker, which ranks problematic dialogues higher, is an essential tool due to the sheer volume of log dialogues that could be generated. However, training a ranker typically requires labelling a substantial amount of data, which is not feasible for most users. In this paper, we present a novel unsupervised approach for dialogue ranking using GANs and release a corpus of labelled dialogues for evaluation and comparison with supervised methods. The evaluation result shows that our method compares favorably to supervised methods without any labelled data.

pdf bib
Jointly Optimizing Diversity and Relevance in Neural Response Generation
Xiang Gao | Sungjin Lee | Yizhe Zhang | Chris Brockett | Michel Galley | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.

2018

pdf bib
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms
Dinghan Shen | Guoyin Wang | Wenlin Wang | Martin Renqiang Min | Qinliang Su | Yizhe Zhang | Chunyuan Li | Ricardo Henao | Lawrence Carin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.

pdf bib
Joint Embedding of Words and Labels for Text Classification
Guoyin Wang | Chunyuan Li | Wenlin Wang | Yizhe Zhang | Dinghan Shen | Xinyuan Zhang | Ricardo Henao | Lawrence Carin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding problem: each label is embedded in the same space with the word vectors. We introduce an attention framework that measures the compatibility of embeddings between text sequences and labels. The attention is learned on a training set of labeled samples to ensure that, given a text sequence, the relevant words are weighted higher than the irrelevant ones. Our method maintains the interpretability of word embeddings, and enjoys a built-in ability to leverage alternative sources of information, in addition to input text sequences. Extensive results on the several large text datasets show that the proposed framework outperforms the state-of-the-art methods by a large margin, in terms of both accuracy and speed.