Syntax-Infused Variational Autoencoder for Text Generation

We present a syntax-infused variational autoencoder (SIVAE) that integrates sentences with their syntactic trees to improve the grammar of generated sentences. Distinct from existing VAE-based text generative models, SIVAE contains two separate latent spaces, one for sentences and one for syntactic trees. The evidence lower bound objective is redesigned correspondingly, by optimizing a joint distribution that accommodates two encoders and two decoders. SIVAE works with long short-term memory architectures to simultaneously generate sentences and syntactic trees. Two versions of SIVAE are proposed: one captures the dependencies between the latent variables through a conditional prior network, and the other treats the latent variables independently such that syntactically controlled sentence generation can be performed. Experimental results demonstrate the generative superiority of SIVAE on both reconstruction and targeted syntactic evaluations. Finally, we show that the proposed models can be used for unsupervised paraphrasing given different syntactic tree templates.


Introduction
Neural language models based on recurrent neural networks (Mikolov et al., 2010) and sequence-to-sequence architectures (Sutskever et al., 2014) have revolutionized the NLP world. Deep latent variable models, in particular variational autoencoders (VAEs) (Kingma and Welling, 2014; Rezende et al., 2014) that integrate inference models with neural language models, have been widely adopted for text generation (Bowman et al., 2016; Yang et al., 2017; Kim et al., 2018), where the encoder and the decoder are modeled by long short-term memory (LSTM) networks (Chung et al., 2014). For a random vector from the latent space representing an unseen input, the decoder can generate realistic-looking novel data in the context of a text model, making the VAE an attractive generative model. Compared to simple neural language models, the latent representation in a VAE is supposed to give the model more expressive capacity.
Although syntactic properties can be implicitly discovered by such generative models, Shi et al. (2016) show that many deep structural details are still missing in the generated text. As a result of the absence of explicit syntactic information, generative models often produce ungrammatical sentences. To address this problem, recent works attempt to leverage explicit syntactic knowledge to improve the quality of machine translation (Eriguchi et al., 2016; Bastings et al., 2017; Chen et al., 2017) and achieve good results. Motivated by such success, we suggest that deep latent variable models for text generation can also benefit from the incorporation of syntactic knowledge. Instead of solely modeling sentences, we want to utilize augmented data by introducing an auxiliary input, a syntactic tree, to enrich the latent representation and make the generated sentences more grammatical and fluent. Syntactic trees can be obtained either from existing human-labeled trees or by parsing sentences with well-developed parsers. An example of a constituency tree is shown in Figure 1. In this work, we remove leaf nodes and linearize the bracketed parse structure into a syntactic tree sequence to simplify the encoding and decoding processes. For example, the sentence "The book that you love is good." is represented by the bracket sequence of its constituency tree with the words removed. Given such data, we aim to train a latent variable model that jointly encodes and decodes a sentence and its syntactic tree.
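The linearization step (drop the leaf words, keep the bracketed labels) can be sketched in a few lines. This is a minimal illustration, assuming the parse is available as a Penn-Treebank-style bracketed string; the function name `linearize`, the hand-written example parse, and the space-separated output format are assumptions for illustration, not the preprocessing code used in the experiments.

```python
import re


def linearize(parse: str) -> str:
    """Turn a bracketed parse into a tree sequence without leaf words.

    A token is kept if it is a parenthesis or a constituent label, i.e. the
    token that immediately follows an opening parenthesis. Everything else
    is a leaf word and is dropped.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", parse)
    kept = []
    previous = None
    for token in tokens:
        if token in ("(", ")"):
            kept.append(token)
        elif previous == "(":
            kept.append(token)  # constituent or part-of-speech label
        previous = token
    return " ".join(kept)


# Hand-written illustrative parse (an assumption, not taken from a treebank).
example = "(S (NP (DT The) (NN book)) (VP (VBZ is) (ADJP (JJ good))) (. .))"
print(linearize(example))
```

Because only labels and brackets survive, the vocabulary of such sequences is small, while the sequences themselves are typically longer than the sentences they describe.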
We propose a syntax-infused VAE model to help improve generation, by integrating syntactic trees with sentences. In contrast to the current VAE-based sentence-generation models, a key differentiating aspect of SIVAE is that we map the sentences and the syntactic trees into two latent representations, and generate them separately from the two latent spaces. This design decouples the semantic and syntactic representations and makes it possible to concentrate generation with respect to either syntactic variation or semantic richness. To accommodate the two latent spaces in one VAE framework, the evidence lower bound (ELBO) objective needs to be redesigned based on optimizing the joint log likelihood of sentences and syntactic trees. This new objective makes SIVAE a task-agnostic model, with two encoders and two decoders, so that it can be further used for other generative tasks.
Two variants of SIVAE are presented, which differ in the form of the prior distribution over the syntactic tree latent variable. SIVAE-c captures dependencies between the two latent variables by making the syntax prior conditioned on the sentence prior. During generation, we first sample a latent variable from the sentence latent space and then sample the syntactic tree latent variable depending on the sampled sentence latent variable. This process resembles how humans write: think about substance, such as entities and topics, first, and then realize it with a specific syntactic structure. We further propose SIVAE-i, which assumes the two priors are independent, and change the ELBO of the joint log likelihood correspondingly. This independence assumption enables syntactically controlled sentence generation, since the syntactic structure can be altered freely, which is desirable for related tasks like paraphrase generation. Given a sentence and a syntactic tree template, the model produces a paraphrase of the sentence whose syntax conforms to the template. Our SIVAE-based paraphrasing network is purely unsupervised, which makes it particularly suitable for generating paraphrases in low-resource languages or types of content.
The experiments are conducted on two datasets: one has trees labeled by humans and the other has trees parsed by a state-of-the-art parser (Kitaev and Klein, 2018). In addition to standard language modeling evaluation metrics like perplexity, we also adopt the targeted syntactic evaluation (Marvin and Linzen, 2018) to verify whether the incorporation of syntactic trees improves the grammar of generated sentences. Experiments demonstrate that the proposed model improves the quality of generated sentences compared to other baseline methods, on both the reconstruction and grammar evaluations. The proposed methods are also capable of unsupervised paraphrase generation under different syntactic tree templates.
Our contributions are four-fold: i) We propose a syntax-infused VAE that integrates syntactic trees with sentences, to grammatically improve the generated sentences. ii) We redesign the ELBO of the joint log likelihood, to accommodate two separate latent spaces in one VAE framework, for two SIVAE model variants based on different intuitions, which can be further used for other applications. iii) We evaluate our models on data with human-constituted trees or parsed trees, and obtain promising results in generating sentences with better reconstruction loss and fewer grammatical errors, compared to other baseline methods. iv) We present an unsupervised paraphrasing network based on SIVAE-i that can perform syntactically controlled paraphrase generation.

Methodology
Given a sentence $x$ and its corresponding syntactic tree $y$, the goal is to jointly encode $x$ and $y$ into latent representations $z_x \in \mathbb{R}^d$ and $z_y \in \mathbb{R}^d$, and then decode them jointly from the two latent spaces. We employ the VAE framework such that realistic-looking novel sentences can be generated with randomly sampled latent representations. However, current VAE-based language models cannot accommodate two separate latent spaces for $z_x$ and $z_y$. To incorporate $x$, $y$, $z_x$, and $z_y$ in one VAE framework, the objective needs to be redesigned to optimize the log joint likelihood $\log p(x, y)$. We propose two model variants of SIVAE. The first model (SIVAE-c; Section 2.1), directly capturing the dependencies between $z_x$ and $z_y$, presumes that semantic information should influence syntactic structure. During the sampling stage, $z_y$ is drawn conditioned on $z_x$ from a conditional prior network $p(z_y|z_x)$; $z_x$ implicitly encodes the subject of the sentence, and $z_y$ encodes the corresponding syntax. Although this model has robust performance on generation, it does not allow us to syntactically control the generated sentences by freely changing the syntactic tree template in $z_y$. Thus we propose SIVAE-i (Section 2.2), which generates sentences and syntactic trees assuming the priors $p(z_x)$ and $p(z_y)$ are independent. The entire architecture is shown in Figure 2.

Modeling Syntax-Semantics Dependencies
Since the syntax of a sentence is influenced by the semantics, especially when the content is long, we first propose a generative model to exploit the dependencies between $z_x$ and $z_y$, through a conditional prior network $p_\psi(z_y|z_x)$. Formally, SIVAE-c models the joint probability $p(x, y)$ of the sentence and its syntactic tree by marginalizing over both latent variables, where the prior over $z_x$ is the isotropic Gaussian $p(z_x) = \mathcal{N}(0, I)$. We define $q(\cdot)$ to be the variational posterior distributions that approximate the true posterior distributions. The model is trained by maximizing a lower bound of the log likelihood $\log p(x, y)$, where $\psi$, $\phi$, and $\theta$ are the parameters of the prior network, the recognition networks, and the generation networks, respectively. We apply the reparameterization trick to yield a differentiable unbiased estimator of the lower bound objective.
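For reference, the joint probability and its lower bound can be written out explicitly. The block below is a reconstruction from the factorization implied by the networks defined in this section (decoders $p_\theta(y|z_y)$ and $p_\theta(x|y, z_x)$, posteriors $q_\phi(z_x|x)$ and $q_\phi(z_y|y, z_x)$, priors $p(z_x)$ and $p_\psi(z_y|z_x)$); the grouping of terms is an assumption and should not be read as a verbatim equation.

```latex
\begin{align}
p(x, y) &= \iint p_\theta(x \mid y, z_x)\, p_\theta(y \mid z_y)\,
           p_\psi(z_y \mid z_x)\, p(z_x)\; dz_x\, dz_y, \\
\log p(x, y) &\ge
   \mathbb{E}_{q_\phi(z_x \mid x)}\big[\log p_\theta(x \mid y, z_x)\big]
 + \mathbb{E}_{q_\phi(z_x \mid x)\, q_\phi(z_y \mid y, z_x)}\big[\log p_\theta(y \mid z_y)\big] \notag \\
&\quad - \mathrm{KL}\big[q_\phi(z_x \mid x)\,\|\,p(z_x)\big]
 - \mathbb{E}_{q_\phi(z_x \mid x)}\,
   \mathrm{KL}\big[q_\phi(z_y \mid y, z_x)\,\|\,p_\psi(z_y \mid z_x)\big].
\end{align}
```

The bound follows from Jensen's inequality with the factorized posterior $q_\phi(z_x|x)\, q_\phi(z_y|y, z_x)$. In practice the outer expectation over $z_x$ would typically be estimated with a single reparameterized sample, which is what makes the two KL terms the only closed-form parts of the objective.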

Conditional Prior Network
The key to SIVAE-c is the conditional prior, which is used to model the dependencies between the sentence latent variable $z_x$ and the syntactic tree latent variable $z_y$. Given $z_x$, the latent variable $z_y$ is sampled from the conditional prior $p_\psi(z_y|z_x)$, modeled by a multivariate Gaussian $\mathcal{N}(\mu', \sigma'^2 I)$. The parameters of the Gaussian distribution are computed from $z_x$ with a conditional prior network parameterized by $\psi$. In particular, $\mu'$ and $\sigma'^2$ are the outputs of multilayer perceptron (MLP) networks taking $z_x$ as the input.
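A conditional prior of this kind can be sketched in plain Python. The following is only an illustration under several assumptions that the text does not specify: a single tanh hidden layer, a log-variance output head (so that the variance stays positive), one variance per latent dimension, and the hypothetical names `init_params`, `conditional_prior`, and `sample_z_y`.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(d_z, d_h, rng, scale=0.1):
    """Small random weights for a one-hidden-layer MLP (illustrative only)."""
    def matrix(rows, cols):
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]
    return {
        "w_h": matrix(d_h, d_z), "b_h": [0.0] * d_h,
        "w_mu": matrix(d_z, d_h), "b_mu": [0.0] * d_z,
        "w_lv": matrix(d_z, d_h), "b_lv": [0.0] * d_z,
    }


def conditional_prior(z_x, params):
    """Map z_x to the mean and variance of the Gaussian p(z_y | z_x)."""
    hidden = [math.tanh(h) for h in linear(z_x, params["w_h"], params["b_h"])]
    mu = linear(hidden, params["w_mu"], params["b_mu"])
    log_var = linear(hidden, params["w_lv"], params["b_lv"])
    return mu, [math.exp(lv) for lv in log_var]


def sample_z_y(z_x, params, rng):
    """Draw z_y ~ p(z_y | z_x) as mean + standard deviation * noise."""
    mu, var = conditional_prior(z_x, params)
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]


rng = random.Random(0)
params = init_params(d_z=4, d_h=8, rng=rng)
z_x = [rng.gauss(0.0, 1.0) for _ in range(4)]  # sample from the prior N(0, I)
z_y = sample_z_y(z_x, params, rng)
```

Sampling therefore proceeds in two stages: first $z_x$ from the isotropic Gaussian prior, then $z_y$ from the Gaussian whose parameters the network computes from $z_x$.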
Recognition Networks. To differentiate through the sampling stage $z \sim q_\phi(z|x)$, the VAE encoder $q_\phi(z_x|x)$ is also assumed to be a Gaussian distribution $\mathcal{N}(\mu_x, \Sigma_x)$, where $\mu_x$ and $\mathrm{diag}(\Sigma_x)$ are the outputs of feedforward networks taking $x$ as the input. The recognition network consists of a bidirectional LSTM encoder to produce a sentence embedding for $x$ and two linear networks to transform the embedding to the Gaussian parameters. The Kullback-Leibler (KL) divergence between $q_\phi(z_x|x)$ and the isotropic Gaussian prior $p(z_x)$ is
$$\mathrm{KL}\big[q_\phi(z_x|x)\,\|\,p(z_x)\big] = \frac{1}{2}\Big[\mathrm{tr}(\Sigma_x) + \mu_x^\top \mu_x - d - \log|\Sigma_x|\Big].$$
So we only need to model $\mu_x$ and the diagonal of $\Sigma_x$ to compute the KL divergence.
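As a small numerical companion to this paragraph, the sketch below draws a reparameterized sample and evaluates the closed-form KL divergence between a diagonal Gaussian and the standard normal prior. The function names are hypothetical, and the variance is passed as a plain list holding the diagonal of $\Sigma_x$.

```python
import math
import random


def reparameterize(mu, var, rng):
    """Sample z = mu + sqrt(var) * eps with eps ~ N(0, I).

    The randomness lives in eps, so z is a deterministic, differentiable
    function of mu and var.
    """
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]


def kl_to_standard_normal(mu, var):
    """KL[N(mu, diag(var)) || N(0, I)] = 0.5 * sum(var + mu^2 - 1 - log var)."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


rng = random.Random(0)
mu_x, var_x = [0.5, -0.2, 0.0], [0.8, 1.2, 1.0]
z_x = reparameterize(mu_x, var_x, rng)
kl = kl_to_standard_normal(mu_x, var_x)
```

The KL term is zero exactly when the posterior equals the prior, which is the situation the posterior-collapse discussion in the experiments is concerned with.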
To reconcile the conditional prior $p_\psi(z_y|z_x)$, the variational posterior $q_\phi(z_y|y, z_x) = \mathcal{N}(\mu_y, \sigma_y^2 I)$ also depends on the latent variable $z_x$. $\mu_y$ and $\sigma_y^2$ are obtained from a recognition network that contains a bidirectional LSTM encoder, producing a syntactic tree embedding, and two linear networks, taking the embedding and $z_x$ as inputs. Writing the conditional prior as $p_\psi(z_y|z_x) = \mathcal{N}(\mu', \sigma'^2 I)$, the KL divergence is then given by
$$\mathrm{KL}\big[q_\phi(z_y|y, z_x)\,\|\,p_\psi(z_y|z_x)\big] = \frac{1}{2}\Big[\log\frac{|\sigma'^2 I|}{|\sigma_y^2 I|} - d + \mathrm{tr}\big((\sigma'^2 I)^{-1}\sigma_y^2 I\big) + (\mu' - \mu_y)^\top (\sigma'^2 I)^{-1} (\mu' - \mu_y)\Big].$$
Generation Networks. We employ an LSTM to generate $y$ from $p_\theta(y|z_y)$. A word $v_y$ is selected by computing the probability of $y_t = v_y$ conditioned on the previously generated words $y_{-t}$ and $z_y$, $p(y_t = v_y \mid y_{-t}, z_y)$, which is computed from $h^y_t$, the current hidden state of the LSTM tree decoder. To generate $x$ from $p_\theta(x|y, z_x)$, we modify the generative model in GNMT (Shah and Barber, 2018). First, the last states $h^y_{|y|}$ and $c^y_{|y|}$ of the tree decoder are directly used to represent the generated syntactic tree $y$, where $|y|$ is the length of $y$. Then we use another LSTM for sentence generation, which gives the conditional probabilities of the words in the sentence. In this way, the generated sentence is conditioned on $z_x$ and the generated syntactic tree $y$. SIVAE-c selects possible syntactic tree templates for a given sentence latent variable, but the syntactic tree template cannot be freely determined.

Syntactically-Controlled Sentence Generation
In order to freely change the syntactic tree template embedded in $z_y$, we propose an alternative model assuming the independence of the two priors. Let $z_x$ and $z_y$ be independent random variables with priors $p(z_x) = p(z_y) = \mathcal{N}(0, I)$. The variational posteriors $q_\phi(z_x|x)$ and $q_\phi(z_y|y)$ follow Gaussian distributions parameterized by the outputs of feedforward networks, whose inputs are $x$ and $y$. The model is trained by maximizing the corresponding lower bound of $\log p(x, y)$. Since $y$ and $z_x$ are assumed to be independent when computing the joint probability $p(x, y)$, we seek to minimize the mutual information $I(y; z_x)$ during training.
The recognition networks and the generation networks of SIVAE-i are similar to those adopted in SIVAE-c, so we omit them for brevity.
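Under the independence assumption, the lower bound takes a simpler form than in the conditional case. The block below is a reconstruction, assuming the generative factorization $p_\theta(x|y, z_x)\, p_\theta(y|z_y)\, p(z_x)\, p(z_y)$ and the posteriors $q_\phi(z_x|x)$ and $q_\phi(z_y|y)$ named in this subsection; it does not include any additional term that the mutual-information consideration might contribute.

```latex
\begin{align}
\log p(x, y) &\ge
   \mathbb{E}_{q_\phi(z_x \mid x)}\big[\log p_\theta(x \mid y, z_x)\big]
 + \mathbb{E}_{q_\phi(z_y \mid y)}\big[\log p_\theta(y \mid z_y)\big] \notag \\
&\quad - \mathrm{KL}\big[q_\phi(z_x \mid x)\,\|\,p(z_x)\big]
 - \mathrm{KL}\big[q_\phi(z_y \mid y)\,\|\,p(z_y)\big].
\end{align}
```

Both KL terms are now taken against the fixed prior $\mathcal{N}(0, I)$, so no prior network is needed, and $z_y$ can be replaced by the encoding of any other template at generation time.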

Unsupervised Paraphrasing
Paraphrases are sentences with the same meaning but different syntactic structures. SIVAE allows us to execute syntax transformation, producing the desired paraphrases with variable syntactic tree templates. The syntactically controlled paraphrase generation is inspired by Iyyer et al. (2018); the difference is that our SIVAE-based syntactic paraphrase network is purely unsupervised. Unsupervised paraphrasing can be performed using both SIVAE-c and SIVAE-i.
One way to generate paraphrases is to perform syntactically controlled paraphrase generation using SIVAE-i. The latent representations of an input sentence, $z_x$, and of a syntactic tree template, $z_y$, are fed into SIVAE-i, and the syntax of the generated sentence conforms with the explicitly selected target template. However, linearized syntactic sequences are relatively long (as shown in Table 1), and long templates are more likely to mismatch particular input sentences, which may result in nonsensical paraphrase outputs. Therefore, we use simplified syntactic sequences as templates, by taking the top two levels of the linearized constituency trees.
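Taking the top two levels of a linearized tree amounts to dropping every bracket and label nested deeper than depth two. A minimal sketch, assuming space-separated tree sequences of the form `( S ( NP ) ... )`; the function name `simplify_template` and the `max_depth` parameter are hypothetical.

```python
import re


def simplify_template(tree_sequence: str, max_depth: int = 2) -> str:
    """Keep only the brackets and labels at nesting depth <= max_depth."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree_sequence)
    kept = []
    depth = 0
    for token in tokens:
        if token == "(":
            depth += 1
            if depth <= max_depth:
                kept.append(token)
        elif token == ")":
            if depth <= max_depth:
                kept.append(token)
            depth -= 1
        elif depth <= max_depth:
            kept.append(token)  # label of a node that is shallow enough
    return " ".join(kept)


full = "( S ( NP ( DT ) ( NN ) ) ( VP ( VBZ ) ( ADJP ( JJ ) ) ) ( . ) )"
print(simplify_template(full))
```

The resulting templates are much shorter than full tree sequences, which is the property the paragraph relies on to reduce mismatches between template and input sentence.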
The paraphrase generative process is:
1. Encode the original sentence to $z_x$;
2. Select and encode a syntactic template into $z_y$;
3. Generate the reconstructed syntactic sequence $y$ from $p(y|z_y)$;
4. Generate the paraphrase of the original sentence that conforms to $y$ from $p(x|y, z_x)$.
We can also use a trained SIVAE-c to generate paraphrases. The paraphrase generation process is similar to sampling from a standard VAE with various temperatures. The difference is that SIVAE-c first selects possible syntactic tree templates using the conditional prior network $p_\psi(z_y|z_x)$ and then generates paraphrases based on the syntactic template and the latent variable.
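The four steps can be expressed as a thin wrapper around the trained components. The sketch below treats the two encoders and two decoders as opaque callables; their names and signatures are hypothetical, and the lambdas in the usage example are stand-ins that only show how data flows, not real models.

```python
def paraphrase(sentence, template,
               encode_sentence, encode_tree, decode_tree, decode_sentence):
    """Syntactically controlled paraphrasing with SIVAE-i style components."""
    z_x = encode_sentence(sentence)   # step 1: sentence -> z_x
    z_y = encode_tree(template)       # step 2: template -> z_y
    y = decode_tree(z_y)              # step 3: y from p(y | z_y)
    return decode_sentence(y, z_x)    # step 4: x from p(x | y, z_x)


# Stand-in components (assumptions for illustration only).
result = paraphrase(
    "the book is good .",
    "( S ( NP ) ( VP ) ( . ) )",
    encode_sentence=lambda s: ("z_x", s),
    encode_tree=lambda t: ("z_y", t),
    decode_tree=lambda z_y: z_y[1],
    decode_sentence=lambda y, z_x: f"<paraphrase of '{z_x[1]}' following {y}>",
)
print(result)
```

Keeping the sentence encoding fixed and swapping only the template argument yields several paraphrases of the same input, one per template.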

Related Work
Syntax-Aware Neural Text Generation. The ability to generate sentences is core to many NLP tasks, such as machine translation (Bahdanau et al., 2015), summarization (Rush et al., 2015), and dialogue generation (Vinyals and Le, 2015). Recent works have shown that neural text generation can benefit from the incorporation of syntactic knowledge (Shen et al., 2018; Choe and Charniak, 2016). Sennrich and Haddow (2016) propose to augment each source word representation with its corresponding part-of-speech tag, lemmatized form and dependency label; Eriguchi et al. (2016) and Bastings et al. (2017) utilize a tree-based encoder and a graph convolutional network encoder respectively to embed the syntactic parse trees as part of the source sentence representations; Chen et al. (2017) model source-side syntactic trees with a bidirectional tree encoder and tree-coverage decoder; Eriguchi et al. (2017) implicitly leverage linguistic priors by treating syntactic parsing as an auxiliary task. However, most of these syntax-aware generation works focus only on neural machine translation.
Deep Latent Variable Models. Deep latent variable models that combine the complementary strengths of latent variable models and deep learning have drawn much attention recently. Generative adversarial networks (Goodfellow et al., 2014) and variational autoencoders (Kingma and Welling, 2014) are the two families of deep generative models that are widely adopted in applications. As VAEs allow discrete generation from a continuous space, they have been a popular variant for NLP tasks including text generation (Bowman et al., 2016; Yang et al., 2017; Xu and Durrett, 2018; Shen et al., 2019; Wang et al., 2019). The flexibility of VAEs also enables adding conditions during inference to perform controlled language generation (Hu et al., 2017; Zhao et al., 2017). Unlike these VAE-based text generation models, our work decouples the latent representations corresponding to the sentence and its syntactic tree.
Paraphrase Generation. Due to the similarity between the two tasks, models based on neural machine translation can often be utilized for paraphrase generation (Hasan et al., 2016; Mallinson et al., 2017). Recently, Iyyer et al. (2018) proposed to syntactically control the generated paraphrase, and Gupta et al. (2018) generate paraphrases in a deep generative architecture. However, all these methods assume the existence of some parallel paraphrase corpora, while unsupervised paraphrase generation has been little explored.

Experiments
We conduct our experiments on two datasets: sentence-level Penn Treebank (Marcus et al., 1993) with human-constituted parse trees and a 90 million word subset of Wikipedia (Gulordava et al., 2018) with parsed trees. When the decoder is too strong, the VAE suffers from posterior collapse, where the model learns to ignore the latent variable (Bowman et al., 2016). To avoid posterior collapse, KL-term annealing and dropping out words during decoding are employed for training in this work. We also tried an advanced method replacing Gaussian priors with von Mises-Fisher priors (Xu and Durrett, 2018) to prevent KL collapse, but the results are about the same.
To discover whether the incorporation of syntactic trees is helpful for sentence generation, we compare our two versions of SIVAE with three baselines that do not utilize syntactic information: a 5-gram Kneser-Ney language model (KN5) (Heafield et al., 2013), an LSTM language model (LSTM-LM) (Sundermeyer et al., 2012), and a standard VAE (Bowman et al., 2016) using an LSTM-based encoder and decoder. Language modeling results are evaluated by the reconstruction loss using perplexity and by the targeted syntactic evaluation proposed by Marvin and Linzen (2018). In Section 5.3, we show the unsupervised paraphrase generation results.
Datasets. We use two datasets in this paper. For sentence-level Penn Treebank (PTB), the syntactic trees are labeled by humans (i.e., "gold-standard" trees). For Wikipedia-90M (wiki90M), which does not contain human-generated trees, we first feed the sentences into a state-of-the-art constituency parser (Kitaev and Klein, 2018), and then use the parsed trees as syntactic information for our model. Further, in both datasets we replace low-frequency words (those that appear only once) with the <unk> token. Statistics about the two datasets are shown in Table 1. As we can see, the linearized sequences are much longer than sentences. The vocabulary of tree sequences is much smaller than the vocabulary of sentences, and gold trees have a larger vocabulary than parsed trees.
Settings. The hyperparameters are tuned on the validation set. Our implementation of SIVAE uses one-layer bidirectional LSTM architectures for both encoders, and one-layer unidirectional LSTM architectures for both decoders. The size of hidden units in the LSTM is 600 and the size of word embeddings is 300. The latent variable size is set to 150 for both sentences and their syntactic trees. The hidden unit size of the MLP in the conditional prior network is 400. We also tried different model sizes for sentences and syntactic trees, but the results are about the same, and the performance even gets worse when the difference between the model sizes is too big. We use SGD for optimization, with a learning rate of 0.0005. The batch size is 32 and the number of epochs is 10. The word dropout rate during decoding is 0.4. For KL annealing, the initial weights of the KL terms are 0, and then we gradually increase the weights as training progresses, until they reach the KL threshold of 0.8; the rate of this increase is set to 0.5 with respect to the total number of batches.
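The two measures against posterior collapse mentioned in the settings can be sketched as follows. Both functions are illustrative assumptions: the text does not give the exact annealing formula, so `kl_weight` interprets the rate of 0.5 as a linear ramp that reaches the threshold after half of all batches, and `word_dropout` follows the common practice of replacing dropped decoder inputs with the `<unk>` token.

```python
import random


def kl_weight(step, total_steps, threshold=0.8, ramp_fraction=0.5):
    """Linear KL annealing from 0 up to `threshold`.

    The weight reaches `threshold` after `ramp_fraction * total_steps`
    batches and stays there (one reading of the settings, not the only one).
    """
    ramp_steps = max(1, int(ramp_fraction * total_steps))
    return threshold * min(1.0, step / ramp_steps)


def word_dropout(tokens, rate=0.4, unk="<unk>", rng=None):
    """Replace each decoder input token with `unk` with probability `rate`."""
    if rng is None:
        rng = random.Random()
    return [unk if rng.random() < rate else token for token in tokens]


total = 1000
schedule = [kl_weight(step, total) for step in (0, 250, 500, 1000)]
noisy = word_dropout("the book that you love is good .".split(), rng=random.Random(0))
```

A small KL weight early in training lets the decoder start using the latent variables before the KL penalty pushes the posteriors toward the priors, and word dropout weakens the decoder's reliance on previous ground-truth words.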

Language Modeling Results
We explore two settings for the decoders: standard and inputless. In the standard setting, the input to the LSTM decoder is the concatenation of the latent representation $z$ and the previous ground-truth word. A powerful decoder usually results in good reconstruction in this setting, but the model may ignore the latent variable. In the inputless setting, the decoder relies purely on the latent representations without any use of prior words, so that the model is driven to learn high-quality latent representations of the sentences and syntactic trees.
The language modeling results on the test sets, evaluated by negative log likelihood (NLL) and perplexity (PPL), are shown in Table 2. SIVAEs outperform all other baselines on both datasets, demonstrating that the explicit incorporation of syntactic trees helps with the reconstruction of sentences. The performance boost on the wiki90M dataset also shows that syntactic trees parsed by a well-developed parser can serve the same function as human-constituted trees for our model to utilize syntactic information; this underscores how mature parser technology may be leveraged in text generation. Between the two proposed methods, SIVAE-i is better at reconstructing sentences while SIVAE-c is better at reconstructing syntactic trees. In the standard setting, the VAE performs almost the same as the LSTM language model, possibly because the strong LSTM decoder plays a dominant role when it uses prior words, so the VAE becomes similar to an LSTM language model. Furthermore, the KL divergence of the proposed models indicates that SIVAE is better at avoiding posterior collapse, so the LSTM sentence decoder can take advantage of the encoded latent variable as well as the previously generated syntactic tree. In the inputless setting, we see that the VAE has a significantly larger KL term and shows substantial improvement over the KN5 and LSTM language models. SIVAEs further reduce PPL from 317 to 261 on PTB and from 308 to 256 on wiki90M, compared to the VAE.
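To put the two metrics and the reported perplexity gaps in perspective, the sketch below shows the usual relation between NLL and PPL and computes the relative reductions implied by the numbers quoted above. It assumes perplexity is the exponential of the per-token NLL in nats; how the NLL in Table 2 is normalized is not stated here, so that part is an assumption.

```python
import math


def perplexity(total_nll, num_tokens):
    """PPL = exp(total negative log likelihood / number of tokens)."""
    return math.exp(total_nll / num_tokens)


def relative_reduction(before, after):
    """Fraction by which a metric drops when going from `before` to `after`."""
    return (before - after) / before


# Inputless-setting perplexities quoted in the text (VAE -> SIVAE).
ptb_gain = relative_reduction(317, 261)      # about 0.177
wiki_gain = relative_reduction(308, 256)     # about 0.169
print(f"PTB: {ptb_gain:.1%}, wiki90M: {wiki_gain:.1%}")
```

On both datasets the drop is roughly 17 percent relative to the VAE baseline.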

Targeted Syntactic Evaluation
We adopt targeted syntactic evaluation (Marvin and Linzen, 2018) to examine whether the proposed methods improve the grammar of generated sentences. Given a pair of sentences that differ only in grammaticality, the model should assign a higher probability to the grammatical sentence than to the ungrammatical one. There are three types of sentence pairs used in this work.
Subject-verb agreement (SVA): Third-person present English verbs need to agree with the number of their subjects.
Reflexive anaphora (RA): A reflexive pronoun such as himself needs to agree in number (and gender) with its antecedent. For example, simple RA: (a) The senators embarrassed themselves. (b) *The senators embarrassed herself.
Negative polarity items (NPI): Words like any and ever that can only be used in the scope of negation are negative polarity items. For example, simple NPI: (a) No students have ever lived here. (b) *Most students have ever lived here.
In the above examples, we expect the probability of generating (a) to be higher than the probability of generating (b). However, it is trivial to identify these simple test pairs with simple syntax. Thus we include complex longer test pairs with greater depth in relative clauses, identifying which requires more understanding of the syntactic structure.
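The accuracy on such minimal pairs reduces to a simple comparison of sentence scores. In the sketch below, `log_prob` stands for any function that returns a model's log-probability of a sentence; the function name `targeted_accuracy` is hypothetical and the scores in `toy_scores` are made up purely to exercise the code.

```python
def targeted_accuracy(pairs, log_prob):
    """Fraction of (grammatical, ungrammatical) pairs ranked correctly."""
    if not pairs:
        raise ValueError("need at least one sentence pair")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


pairs = [
    ("The senators embarrassed themselves.", "The senators embarrassed herself."),
    ("No students have ever lived here.", "Most students have ever lived here."),
]

# Made-up log-probabilities standing in for a trained model's scores.
toy_scores = {
    pairs[0][0]: -12.1, pairs[0][1]: -15.8,   # ranked correctly
    pairs[1][0]: -14.0, pairs[1][1]: -13.2,   # ranked incorrectly
}
accuracy = targeted_accuracy(pairs, toy_scores.get)
```

With these toy scores one of the two pairs is ranked correctly, giving an accuracy of one half, which is also what random guessing would achieve on average.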
The accuracy per grammar test case of each method is shown in Table 3. Human scores on these test pairs from Marvin and Linzen (2018) are also shown for reference. SIVAE outperforms the other baselines on the grammar test cases, demonstrating that the explicit incorporation of syntactic trees helps with the grammar of generated sentences. For simple SVA test pairs, SIVAE-c has a better score than humans. Even for a difficult grammar test like NPI, our methods still make significant progress compared to other baselines, whose scores show no syntactic understanding of these sentences. From Table 3, note that KN5 can only identify simple SVA pairs. In addition, the VAE has similar syntactic performance to an LSTM language model, which is consistent with the reconstruction results. Between the two proposed methods, SIVAE-i makes more grammar mistakes than SIVAE-c, although it has better perplexity in Table 2. This is because SIVAE-c considers the dependency between the sentence prior and the syntactic tree prior, so it can more effectively prevent a mismatch between the two latent variables. In other words, SIVAE-c learns more robust syntactic representations, but this advantage is not reflected in the reconstruction evaluation.

Unsupervised Paraphrasing Results
The proposed method is used for generating paraphrases by implicitly selecting (SIVAE-c) or explicitly changing (SIVAE-i) the syntactic tree templates. Our model is not trained on a paraphrase corpus, which makes it a purely unsupervised paraphrasing network.

Table 4 (excerpt):
Paraphrase: the discovery of dinosaurs has been a legend , is it ?
Template: ( S ( " ) ( NP ) ( VP ) ( " ) ( NP ) ( VP ) ( . ) )
Paraphrase: " the discovery of dinosaurs is a legend " he said .
Template: ( S ( VP ) ( , ) ( NP ) ( . ) )
Paraphrase: having been accompanied , the unk lengend .

Table 5 (excerpt):
Ori: the new york times has been one of the best selling newspapers in america .
Gen1: the new york times also has been used as american best selling newspaper .
Gen2: the new york times also has been used as a " unk " that sells in america .
Gen3: the new york times also has been used as the best " unk " selling in america .

Syntactically Controlled Paraphrasing. The paraphrasing network is trained on sentences and their simplified syntactic sequences from the PTB and wiki90M datasets. Table 4 shows some example paraphrases generated by SIVAE-i using different syntactic templates. We see that SIVAE-i can syntactically control the generated sentences so that they conform to the target syntactic template. The examples are well-formed, semantically sensible, and grammatically correct sentences that also preserve the semantics of the original sentences. However, the model can generate nonsensical outputs, like the failed cases in Table 4, when the target template mismatches the input sentence.
Paraphrasing with Different Temperatures. We further perform paraphrasing using SIVAE-c with different temperatures. Table 5 shows example paraphrases generated by SIVAE-c. We see that SIVAE-c can generate grammatical sentences that are relevant to the original sentence. However, the generated paraphrases are very similar, indicating that the variance of the conditional prior network is small. In other words, given a sentence latent representation, the range from which SIVAE-c selects a possible syntactic tree representation is small, so it tends to generate similar paraphrases.
Qualitative Human Evaluation. We adopt human evaluation metrics similar to those of Gupta et al. (2018) for generated paraphrases. For 20 original sentences, we collect 5 paraphrases for each sentence (100 in total) generated by SIVAE-c or SIVAE-i using 5 different syntactic templates. The models are trained on PTB and wiki90M. Three aspects are verified in human evaluation: Relevance to the original sentence, Readability w.r.t. the syntax of generated sentences, and Diversity of different generations for the same original sentence. Three human evaluators assign a score on a scale of 1-5 (higher is better) for each aspect per generation. The human evaluation results for unsupervised paraphrase generation using standard VAE, SIVAE-i and SIVAE-c are shown in Table 6. SIVAE-c has the best readability scores and the standard VAE has the worst, which further verifies that syntactic information is helpful for sentence generation. Paraphrases generated by SIVAE-i are more diverse under different syntactic templates, compared to SIVAE-c and standard VAE. All three models show better paraphrasing performance on the wiki90M dataset.

Continuity of Latent Spaces
We further test the continuity of latent spaces in our model. Two vectors $z_A$ and $z_B$ are randomly sampled from the sentence latent space of SIVAE-c. Table 7 shows generated sentences based on intermediate points between $z_A$ and $z_B$. We see the transitions are smooth and the generations are grammatical, verifying the continuity of the sentence latent space. The syntactic structure remains consistent in neighborhoods along the path, indicating the continuity in the syntactic tree latent space.

A in january 2014 , the unk announced that one player would be one of the first two heroes .
• in january 2014 , he was one of the first two players to be the most successful .
• until the end of the first half of the series , he has played the most reported time .
• until the end of world war i , he was the first player in the united states .
• there are also a number of other members in the american war association .
B there are also a number of other american advances , such as the unk unk of the american association .
Table 7: Intermediate sentences are generated between two random points in the latent space of SIVAE-c.
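Intermediate points between two latent vectors can be obtained in several ways; the sketch below uses evenly spaced linear interpolation, which is an assumption, since the interpolation scheme is not stated. The function name `interpolate` is hypothetical, and each returned vector would then be passed to the decoders to produce one row of a table such as Table 7.

```python
def interpolate(z_a, z_b, num_points):
    """Return `num_points` vectors evenly spaced from z_a to z_b, inclusive."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    path = []
    for i in range(num_points):
        t = i / (num_points - 1)  # 0.0 at z_a, 1.0 at z_b
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Six points: the two endpoints A and B plus four intermediate vectors.
z_a = [0.0, 2.0, -1.0]
z_b = [1.0, 0.0, 1.0]
path = interpolate(z_a, z_b, num_points=6)
```

If the latent space is continuous in the sense described above, decoding neighboring vectors on this path should give sentences that change gradually in content and structure.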

Conclusion
We present SIVAE, a novel syntax-infused variational autoencoder architecture for text generation, leveraging constituency parse tree structure as the linguistic prior to generate more fluent and grammatical sentences. The new lower bound objective accommodates two latent spaces, for jointly encoding and decoding sentences and their syntactic trees. The first version of SIVAE exploits the dependencies between the two latent spaces, while the second version enables syntactically controlled sentence generation by assuming the two priors are independent. Experimental results demonstrate that the incorporation of syntactic trees is helpful for the reconstruction and grammar of generated sentences. In addition, SIVAE can perform unsupervised paraphrasing with different syntactic tree templates.

Figure 1: An example of a constituency tree structure.

Figure 2: Block diagram of the proposed SIVAE model encoding and decoding sentences and their syntactic trees jointly. The prior network (dashed lines) is used only for the sampling stage of SIVAE-c.

Table 1: Statistics of the two datasets used in this paper. Ave_s/Ave_t, Max_s/Max_t, and Voc_s/Voc_t denote the average length, maximum length, and vocabulary size for sentences/tree sequences, respectively.

Table 2: Language modeling results on the test sets of PTB and wiki90M. For the two SIVAE models, the syntactic tree sequence reconstruction scores are shown in parentheses alongside the sentence reconstruction scores. Lower is better for PPL and NLL. The best results are in bold.

Table 4: Examples of syntactically controlled paraphrases generated by SIVAE-i. We show two successful and one failed (in blue) generations with different templates for each input sentence.

Table 5: An example of paraphrases generated by SIVAE-c.

Table 6: Human evaluation results on Relevance, Readability, and Diversity of generated paraphrases.