Learning Disentangled Representations of Texts with Application to Biomedical Abstracts

We propose a method for learning disentangled representations of texts that code for distinct and complementary aspects, with the aim of affording efficient model transfer and interpretability. To induce disentangled embeddings, we propose an adversarial objective based on the (dis)similarity between triplets of documents with respect to specific aspects. Our motivating application is embedding biomedical abstracts describing clinical trials in a manner that disentangles the populations, interventions, and outcomes in a given trial. We show that our method learns representations that encode these clinically salient aspects, and that these can be effectively used to perform aspect-specific retrieval. We demonstrate that the approach generalizes beyond our motivating application in experiments on two multi-aspect review corpora.


Introduction
A classic problem that arises in (distributed) representation learning is that it is difficult to determine what information individual dimensions in an embedding encode. When training a classifier to distinguish between images of people and landscapes, we do not know a priori whether the model is sensitive to differences in color, contrast, shapes or textures. Analogously, in the case of natural language, when we calculate similarities between document embeddings of user reviews, we cannot know if this similarity primarily reflects user sentiment, the product discussed, or syntactic patterns. This lack of interpretability makes it difficult to assess whether a learned representation is likely to generalize to a new task or domain, hindering model transferability. Disentangled representations with known semantics could allow more efficient training in settings in which supervision is expensive to obtain (e.g., biomedical NLP).
Thus far in NLP, learned distributed representations have, with few exceptions (Ruder et al., 2016; He et al., 2017; Zhang et al., 2017), been entangled: they indiscriminately encode all aspects of texts. Rather than representing text via a monolithic vector, we propose to estimate multiple embeddings that capture complementary aspects of texts, drawing inspiration from the ML in vision community (Whitney, 2016; Veit et al., 2017a).
As a motivating example we consider documents that describe clinical trials. Such publications constitute the evidence drawn upon to support evidence-based medicine (EBM), in which one formulates precise clinical questions with respect to the Populations, Interventions, Comparators and Outcomes (PICO elements) of interest (Sackett et al., 1996). Ideally, learned representations of such articles would factorize into embeddings for the respective PICO elements. This would enable aspect-specific similarity measures, in turn facilitating retrieval of evidence concerning a given condition of interest (i.e., in a specific patient population), regardless of the interventions and outcomes considered. Better representations may reduce the amount of supervision needed, which is expensive in this domain.
Our work is one of the first efforts to induce disentangled representations of texts, which we believe may be broadly useful in NLP. Concretely, our contributions in this paper are as follows:
• We formalize the problem of learning disentangled representations of texts, and develop a relatively general approach for learning these from aspect-specific similarity judgments expressed as triplets $(s, d, o)_a$, which indicate that document d is more similar to document s than to document o, with respect to aspect a.
• We perform extensive experiments that provide evidence that our approach yields disentangled representations of texts, both for our motivating task of learning PICO-specific embeddings of biomedical abstracts, and, more generally, for multi-aspect sentiment corpora.

Framework and Models
Recent approaches in computer vision have emphasized unsupervised learning of disentangled representations by incorporating information-theoretic regularizers into the objective (Chen et al., 2016; Higgins et al., 2017). These approaches do not require explicit manual annotations, but consequently they require post-hoc manual assignment of meaningful interpretations to learned representations. We believe it is more natural to use weak supervision to induce meaningful aspect embeddings.

Learning from Aspect Triplets
As a general strategy for learning disentangled representations, we propose exploiting aspect-specific document triplets $(s, d, o)_a$: this signals that s and d are more similar than are d and o, with respect to aspect a (Karaletsos et al., 2015; Veit et al., 2017b), i.e., $\mathrm{sim}_a(d, s) > \mathrm{sim}_a(d, o)$, where $\mathrm{sim}_a$ quantifies similarity w.r.t. aspect a.
We associate with each aspect an encoder $\mathrm{enc}_a$ (encoders share low-level layer parameters; see Section 2.2 for architecture details). This is used to obtain text embeddings $(e^a_s, e^a_d, e^a_o)$. To estimate the parameters of these encoders we adopt a simple objective that seeks to maximize the similarity between $(e^a_d, e^a_s)$ and minimize similarity between $(e^a_d, e^a_o)$, via the following maximum margin loss:

$$L(e^a_s, e^a_d, e^a_o) = \max\{0,\ 1 - \mathrm{sim}(e^a_d, e^a_s) + \mathrm{sim}(e^a_d, e^a_o)\} \quad (1)$$

where the similarity between documents i and j with respect to a particular aspect a, $\mathrm{sim}_a(i, j)$, is simply the cosine similarity between the aspect embeddings $e^a_i$ and $e^a_j$. This allows for the same documents to be similar with respect to some aspects while dissimilar in terms of others.
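As a rough illustration of the maximum margin loss in Eq. 1, the following sketch computes it for a single triplet of aspect embeddings using cosine similarity. The function names (`cosine`, `triplet_loss`) and the toy vectors are assumptions made for this example and are not taken from the released code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(e_s, e_d, e_o, margin=1.0):
    """Eq. 1: max{0, margin - sim(e_d, e_s) + sim(e_d, e_o)}."""
    return max(0.0, margin - cosine(e_d, e_s) + cosine(e_d, e_o))

# Toy aspect embeddings (made-up values, for illustration only).
e_d = [1.0, 0.0]
e_s = [1.0, 0.0]    # points in the same direction as d
e_o = [-1.0, 0.0]   # points in the opposite direction

loss_good = triplet_loss(e_s, e_d, e_o)  # triplet already satisfied
loss_bad = triplet_loss(e_o, e_d, e_s)   # roles of s and o swapped
```

With these toy vectors the correctly ordered triplet incurs no loss, while swapping s and o yields the largest possible penalty under a margin of 1.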
The above setup depends on the correlation between aspects in the training data. At one extreme, when triplets enforce identical similarities for all aspects, the model cannot distinguish between aspects at all. At the other extreme, triplets are present for only one aspect a, and absent for all other aspects a': in this case the model will use only the embeddings for aspect a to represent similarities. In general, we expect a compromise between these extremes, and propose using negative sampling to enable the model to learn targeted aspect-specific encodings.

Encoder Architecture
Designing an aspect-based model requires specifying an encoder architecture. One consideration here is interpretability: a desirable property for aspect encoders is the ability to identify salient words for a given aspect. With this in mind, we propose using gated CNNs, which afford introspection via the token-wise gate activations.
Figure 2 schematizes our encoder architecture. The input is a sequence of word indices $d = (w_1, ..., w_N)$, which are mapped to m-dimensional word embeddings and stacked into a matrix $E = [e_1, ..., e_N]$. These are passed through sequential convolutional layers $C_1, ..., C_L$, which induce representations $H_l \in \mathbb{R}^{N \times k}$:

$$H_l = f_e(X * K_l + b_l) \quad (2)$$

where $X \in \mathbb{R}^{N \times k}$ is the input to layer $C_l$ (either a set of n-gram embeddings or $H_{l-1}$) and k is the number of feature maps. Kernel $K_l \in \mathbb{R}^{F \times k \times k}$ and $b_l \in \mathbb{R}^k$ are parameters to be estimated, where F is the size of the kernel window. An activation function $f_e$ is applied element-wise to the output of the convolution operations. We fix the size of $H_{l-1} \in \mathbb{R}^{N \times k}$ by zero-padding where necessary. Keeping the size of feature maps constant across layers allows us to introduce residual connections; the output of layer l is summed with the outputs of preceding layers before being passed forward. We multiply the output of the last convolutional layer by token-wise gates $g \in \mathbb{R}^{N \times 1}$ to obtain our final embedding $e_d \in \mathbb{R}^{1 \times k}$:

$$g = \sigma(H_L w_g + b_g), \qquad e_d = g^\top H_L \quad (3)$$

where $w_g \in \mathbb{R}^{k \times 1}$ and $b_g \in \mathbb{R}$ are learned parameters and $\sigma$ is the sigmoid activation function. We impose a sparsity-inducing constraint on g via the $\ell_1$ norm; this allows the gates to effectively serve as an attention mechanism over the input. Additionally, to capture potential cross-aspect correlation, weights in the embedding and first convolutional layers are shared between aspect encoders.

Alternative encoders. To assess the relative importance of the specific encoder model architecture used, we conduct experiments in which we fine-tune standard document representation models via triplet-based training. Specifically, we consider a single-layer MLP with BoW inputs, and a Neural Variational Document Model (NVDM) (Miao et al., 2016). For the NVDM we take a weighted sum of the original loss function and the triplet loss over the learned embeddings, where the weight is a model hyperparameter.
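The gating step of the CNN encoder can be sketched as follows. This is a minimal illustration that assumes each token's gate is a sigmoid of a linear function of that token's final-layer features, and that the document embedding is the gate-weighted sum of those features; the helper name `gated_embedding` and the toy inputs are hypothetical, and the convolutional layers that would produce `H` are omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_embedding(H, w_g, b_g):
    """Return (gates, embedding) for token features H (N rows, each of length k).

    Assumed form: g_i = sigmoid(h_i . w_g + b_g), e_d = sum_i g_i * h_i.
    """
    gates = [sigmoid(sum(h_j * w_j for h_j, w_j in zip(h, w_g)) + b_g) for h in H]
    k = len(w_g)
    embedding = [sum(g * h[j] for g, h in zip(gates, H)) for j in range(k)]
    return gates, embedding

# Toy final-layer features for N = 3 tokens and k = 2 feature maps (made up).
H = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
w_g = [0.0, 0.0]   # zero weights make every gate exactly 0.5
b_g = 0.0

gates, e_d = gated_embedding(H, w_g, b_g)
l1_penalty = sum(abs(g) for g in gates)  # sparsity term added to the training loss
```

Because the gates are one scalar per token, they can be read off directly to see which words contribute most to an aspect embedding.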

Varieties of Supervision
Our approach entails learning from triplets that codify relative similarity judgments with respect to specific aspects. We consider two approaches to acquiring such triplets: the first exploits aspect-specific summaries written for texts, and the second assumes a more general scenario in which we solicit aspect-wise triplet judgments directly.

Deriving Triplets from Aspect Summaries
In the case of our motivating example (disentangled representations for articles describing clinical trials), we have obtained aspect-specific summaries from the Cochrane Database of Systematic Reviews (CDSR). Cochrane is an international organization that creates and curates biomedical systematic reviews. Briefly, such reviews seek to formally synthesize all relevant articles to answer precise clinical questions, i.e., questions that specify a particular PICO frame. The CDSR consists of a set of reviews $\{R_i\}$. Reviews include multiple articles (studies) $\{S_{ij}\}$. Each study S consists of an abstract A and a set of free text summaries $(s^P, s^I, s^O)$ written by reviewers describing the respective P, I and O elements in S.
Reviews implicitly specify PICO frames, and thus two studies in any given review may be viewed as equivalent with respect to their PICO aspects. We use this observation to derive document triplets. Recall that triplets for a given aspect include two comparatively similar texts (s, d) and one relatively dissimilar (o). Suppose the aspect of interest is the trial population. Here we match a given abstract (d) with its matched population summary from the CDSR (s); this encourages the encoder to yield similar embeddings for the abstract and the population description. The dissimilar o is constructed to distinguish the given abstract from (1) other aspect encodings (of interventions, outcomes), and (2) abstracts for trials with different populations.
Concretely, to construct a triplet (s, d, o) for the PICO data, we draw two reviews $R_1$ and $R_2$ from the CDSR at random, and sample two studies from the first $(s_1, s_1')$ and one from the second $(s_2)$. Intuitively, $s_2$ will (very likely) comprise entirely different PICO elements than $(s_1, s_1')$, by virtue of belonging to a different review. To formalize the preceding description, the triplet (s, d, o) is assembled from these studies, where $s_1^{\mathrm{abstract}}$ is the abstract for study $s_1$, and aspect summaries for studies are denoted by superscripts. We include a concrete example of triplet construction in the Appendix, Section D.

Learning Directly from Aspect-Wise Similarity Judgments
The preceding setup assumes a somewhat unique case in which we have access to aspect-specific summaries written for texts. As a more general setting, we also consider learning directly from triplet-wise supervision concerning relative similarity with respect to particular aspects (Amid and Ukkonen, 2015; Veit et al., 2017a; Wilber et al., 2014). The assumption is that such judgments can be solicited directly from annotators, and thus the approach may be applied to arbitrary domains, so long as meaningful aspects can be defined implicitly via pairwise similarities regarding them.
We do not currently have corpora with such judgments in NLP, so we constructed two datasets using aspect-specific sentiment ratings. Note that this highlights the flexibility of exploiting aspect-wise triplet supervision as a means of learning disentangled representations: existing annotations can often be repurposed into such triplets.

Datasets and Experiments
We present a series of experiments on three corpora to assess the degree to which the learned representations are disentangled, and to evaluate the utility of these embeddings in simple downstream retrieval tasks. We are particularly interested in the ability to identify documents similar w.r.t. a target aspect. All parameter settings for baselines are reported in the Appendix (along with additional experimental results). The code is available at https://github.com/successar/neural-nlp.

PICO (EBM) Domain
We first evaluate embeddings quantitatively with respect to retrieval performance. In particular, we assess whether the induced representations afford improved retrieval of abstracts relevant to a particular systematic review (Cohen et al., 2006; Wallace et al., 2010). We then perform two evaluations that explicitly assess the degree of disentanglement realized by the learned embeddings.
The PICO dataset comprises 41K abstracts of articles describing clinical trials extracted from the CDSR. Each abstract is associated with a review and three summaries, one per aspect (P/I/O). We keep all words that occur in ≥ 5 documents, converting all others to unk. We truncate documents to a fixed length (set to the 95th percentile).
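The vocabulary filtering and truncation described above might look roughly like the following. This is a sketch under stated assumptions: documents are already tokenized into lists of strings, the 95th percentile is computed with the nearest-rank convention (the paper does not say which convention was used), and the function names are invented for this example.

```python
import math
from collections import Counter

def build_vocab(docs, min_df=5):
    """Keep words that occur in at least min_df distinct documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per document
    return {w for w, c in df.items() if c >= min_df}

def preprocess(docs, min_df=5, pct=95):
    """Map rare words to 'unk' and truncate to the pct-th percentile length."""
    vocab = build_vocab(docs, min_df)
    lengths = sorted(len(d) for d in docs)
    # Nearest-rank percentile; other percentile definitions would also be reasonable.
    rank = max(1, math.ceil(pct / 100 * len(lengths)))
    max_len = lengths[rank - 1]
    return [[w if w in vocab else "unk" for w in d][:max_len] for d in docs]

# Toy corpus: five short documents plus one longer document with rare words.
docs = [["the", "trial"]] * 5 + [["the", "rare", "x", "y"]]
out = preprocess(docs)
```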

Quantitative Evaluation
Baselines. We compare the proposed P, I and O embeddings and their concatenation [P|I|O] to the following. TF-IDF: standard TF-IDF representation of abstracts. RR-TF: concatenated TF-IDF vectors of sentences predicted to describe the respective PICO elements, i.e., sentence predictions made using the pre-trained model from Wallace et al. (2016); this model was trained using distant supervision derived from the CDSR. doc2vec: standard (entangled) distributed representations of abstracts (Le and Mikolov, 2014). LDA: Latent Dirichlet Allocation. NVDM: a generative model of text where the representation is a vector of log-frequencies that encode a topic (Miao et al., 2016). ABAE: an autoencoder model that discovers latent aspects in sentences (He et al., 2017); we obtain document embeddings by summing over constituent sentence embeddings. DSSM: a CNN-based encoder trained with triplet loss over abstracts (Shen et al., 2014).
Hyperparameters and Settings. We use three layers for our CNN-based encoder (with 200 filters in each layer; window size of 5) and the PReLU activation function (He et al., 2015) as $f_e$. We use 200d word embeddings, initialized via pretraining over a corpus of PubMed abstracts (Pyysalo et al., 2013). We used the Adam optimizer with default parameters (Kingma and Ba, 2014). We imposed $\ell_2$ regularization over all parameters, the value of which was selected from the range (1e-2, 1e-6) as 1e-5. The $\ell_1$ regularization parameter for gates was chosen from the range (1e-2, 1e-8) as 1e-6. All model hyperparameters for our models and baselines were chosen via line search over a 10% validation set.
Metric. For this evaluation, we used a held out set of 15 systematic reviews (comprising 2,223 studies) compiled by Cohen et al. (2006). The idea is that good representations should map abstracts in the same review (which describe studies with the same PICO frame) relatively near to one another. To compute AUCs over reviews, we first calculate all pairwise study similarities (i.e., over all studies in the Cohen corpus). We can then construct an ROC for a given abstract a from a particular review to calculate its AUC: this measures the probability that a study drawn from the same review will be nearer to a than a study from a different review. A summary AUC for a review is taken as the mean of the study AUCs in that review.
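The review-level AUC described above can be sketched as a direct pairwise count. The code below is an illustrative implementation, not the authors' evaluation script: it assumes a precomputed similarity matrix `sims`, a list `review_of` giving each study's review, and it counts ties as half a win (a common convention that the paper does not specify).

```python
def study_auc(query, sims, review_of):
    """P(a same-review study is more similar to the query than an other-review study)."""
    n = len(sims)
    pos = [sims[query][j] for j in range(n)
           if j != query and review_of[j] == review_of[query]]
    neg = [sims[query][j] for j in range(n) if review_of[j] != review_of[query]]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def review_auc(review, sims, review_of):
    """Mean of the study-level AUCs over the studies in one review."""
    aucs = [study_auc(i, sims, review_of)
            for i in range(len(sims)) if review_of[i] == review]
    aucs = [a for a in aucs if a is not None]
    return sum(aucs) / len(aucs)

# Toy similarity matrix over four studies from two reviews (made-up values).
review_of = ["A", "A", "B", "B"]
sims = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.8],
    [0.2, 0.3, 1.0, 0.1],
    [0.1, 0.8, 0.1, 1.0],
]
auc_a = review_auc("A", sims, review_of)
auc_b = review_auc("B", sims, review_of)
```

In this toy example the two studies in review A are each other's nearest neighbours, whereas the studies in review B are closer to studies from review A than to each other, so review A scores far higher.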
Results. Table 1 reports the mean AUCs over individual reviews in the Cohen et al. (2006) corpus, and grand means over these (bottom row). In brief: the proposed PICO embeddings (concatenated) obtain an equivalent or higher AUC than baseline strategies on 12/14 reviews, and strictly higher AUCs in 11/14. It is unsurprising that we outperform unsupervised approaches, but we also best RR-TF, which was trained with the same CDSR corpus (Wallace et al., 2016), and DSSM (Shen et al., 2014), which is trained with a triplet loss as is our model. We outperform the latter by an average performance gain of 4 points AUC (significant at the 95% level using an independent 2-sample t-test).
We now turn to the more important questions: are the learned representations actually disentangled, and do they encode the target aspects? Table 2 shows aspect-wise gate activations for PICO elements over a single abstract; this qualitatively suggests disentanglement, but we next investigate this in greater detail.

Qualitative Evaluation
To assess the degree to which our PICO embeddings are disentangled (i.e., capture complementary information relevant to the targeted aspects), we performed two qualitative studies.
First, we assembled 87 articles (not seen in training) describing clinical trials from a review on the effectiveness of decision aids (Stacey et al., 2014) for: women with, at risk for, and genetically at risk for, breast cancer (BCt, BCs and BCg, respectively); type II diabetes (D); menopausal women (MW); pregnant women generally (PW) and those who have undergone a C-section previously (PWc); people at risk for colon cancer (CC); men with and at risk of prostate cancer (PCt and PCs, respectively) and individuals with atrial fibrillation (AF). This review is unusual in that it studies a single intervention (decision aids) across different populations. Thus, if the model is successful in learning disentangled representations, the corresponding P vectors should roughly cluster, while the I/C should not.
Figure 3 shows a TSNE-reduced plot of the P, I/C and O embeddings induced by our model for these studies. Abstracts are color-coded to indicate the populations enumerated above. As hypothesized, P embeddings realize the clearest separation with respect to the populations, while the I and O embeddings of studies do not co-localize to the same degree. This is reflected quantitatively in the AUC values achieved using each aspect embedding (listed on the Figure). This result implies disentanglement along the desired axes.
Next we assembled 50 abstracts describing trials involving hip replacement arthroplasty (HipRepl). We selected this topic because HipRepl will either describe the trial population (i.e., patients who have received hip replacements) or it will be the intervention, but not both. Thus, we would expect that abstracts describing trials in which HipRepl describes the population cluster in the corresponding embedding space, but not in the intervention space (and vice-versa). To test this, we first manually annotated the 50 abstracts, associating HipRepl with either P or I. We used these labels to calculate pairwise AUCs, reported in Table 3. The results imply that the population embeddings discriminate between studies that enrolled patients with HipRepl and other studies. Likewise, studies in which HipRepl was the intervention are grouped in the interventions embedding space, but not in the populations space.

Aspect words. In Table 4, we report the most activated unigrams for each aspect embedding on the decision aids corpus. To derive these we use the outputs of the gating mechanism (Eq. 3), which is applied to all words in the input text. For each word, we average the activations across all abstracts and find the top ten words for each aspect.
The words align nicely with the PICO aspects, providing further evidence that our model learns to focus on aspect-specific information.
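The procedure for extracting aspect words can be sketched as below. This is one plausible reading of the description, assuming that each word's activation is averaged over all of its occurrences across abstracts; the input format (lists of word and gate pairs for a single aspect) and the function name are hypothetical.

```python
from collections import defaultdict

def top_gate_words(abstracts, top_n=10):
    """Rank words by mean gate activation.

    abstracts: list of abstracts, each a list of (word, gate_activation) pairs
    produced by one aspect encoder.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens in abstracts:
        for word, gate in tokens:
            totals[word] += gate
            counts[word] += 1
    means = {w: totals[w] / counts[w] for w in totals}
    # Sort by descending mean activation, breaking ties alphabetically.
    return sorted(means, key=lambda w: (-means[w], w))[:top_n]

# Toy gate activations for a hypothetical population (P) encoder.
abstracts = [
    [("women", 0.9), ("with", 0.1), ("cancer", 0.6)],
    [("women", 0.7), ("aid", 0.2)],
]
top_words = top_gate_words(abstracts, top_n=2)
```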

Multi-Aspect Reviews
We now turn from the specialized domain of biomedical abstracts to more general applications. In particular, we consider learning disentangled representations of beer, hotel and restaurant reviews. Learned embeddings should capture different aspects, e.g., taste or look in the case of beer.

Beer Reviews (BeerAdvocate)
We conducted experiments on the BeerAdvocate dataset (McAuley et al., 2012), which contains 1.5M reviews of beers that address four aspects: appearance, aroma, palate, and taste. Free-text reviews are associated with aspect-specific numerical ratings for each of these, ranging from 1 to 5. We consider ratings < 3 as negative, and > 3 as positive, and use these to generate triplets of reviews. For each aspect a, we construct triplets $(s, d, o)_a$ by first randomly sampling a review d. We then select s to be a review with the same sentiment with respect to a as d, and o to be a review with the opposite sentiment regarding a. We selected 90K reviews for experiments, such that we had an equal number of positive and negative reviews for each aspect. We only keep words appearing in at least 5 documents, converting all others to unk. We truncated reviews to the 95th percentile length. We split our data in an 80/10/10 ratio for training, validation and testing, respectively.

Table 7: 'Decorrelated' cross-AUC results on the BeerAdvocate data, which attempt to mitigate confounding due to overall sentiment being correlated. Each cell reports metrics over subsets of reviews in which the sentiment differs between the row and column aspects. The numbers in each cell are the AUCs w.r.t. sentiment regarding the column aspect achieved using the row and column aspect representations, respectively.
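The triplet sampling for sentiment-rated reviews can be sketched as follows. This is an illustrative sketch only: the review format (a dict with `text` and per-aspect `ratings`) is assumed, and reviews rated exactly 3 are simply skipped, which is an assumption since the text only defines ratings below and above 3.

```python
import random

def dichotomize(rating):
    """Ratings < 3 are negative and > 3 positive; exactly 3 is left unlabeled (assumption)."""
    if rating < 3:
        return "neg"
    if rating > 3:
        return "pos"
    return None

def sample_triplet(reviews, aspect, rng):
    """Sample (s, d, o): s shares d's sentiment on the aspect, o has the opposite one."""
    labeled = [(r, dichotomize(r["ratings"][aspect])) for r in reviews]
    labeled = [(r, y) for r, y in labeled if y is not None]
    d, y_d = rng.choice(labeled)
    same = [r for r, y in labeled if y == y_d and r is not d]
    opposite = [r for r, y in labeled if y != y_d]
    return rng.choice(same), d, rng.choice(opposite)

# Toy reviews with made-up ratings for a single aspect.
reviews = [
    {"text": "great taste", "ratings": {"taste": 4.5}},
    {"text": "lovely", "ratings": {"taste": 5.0}},
    {"text": "bland", "ratings": {"taste": 1.5}},
    {"text": "awful", "ratings": {"taste": 2.0}},
    {"text": "okay", "ratings": {"taste": 3.0}},
]
s, d, o = sample_triplet(reviews, "taste", random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which is convenient when regenerating training triplets.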
Baselines. We used the same baselines as for the PICO domain, save for RR-TF, which was domain-specific. Here we also evaluate the result of replacing the CNN-based encoder with NVDM, BoW and DSSM based encoders, respectively, each trained using triplet loss.

Hyperparameters and Settings. For the CNN-based encoder, we used settings and hyperparameters as described for the PICO domain. For the BoW encoder, we used 800d output embeddings and a PReLU activation function with $\ell_2$ regularization set to 1e-5. For the NVDM based encoder, we used 200d embeddings.
Metrics. We again performed an IR-type evaluation to assess the utility of representations. For each aspect k, we constructed an affinity matrix $A^k$ such that $A^k_{ij} = \mathrm{sim}_k(r_i, r_j)$ for beer reviews $r_i$ and $r_j$. We consider two reviews similar under a given aspect k if they have the same (dichotomized) sentiment value for said aspect. We compute AUCs for each review and aspect using the affinity matrix $A^k$. The AUC values are averaged over reviews in the test set to obtain a final AUC metric for each aspect. We also report cross AUC measures in which we use embeddings for aspect k to distinguish reviews under aspect k'.

Results. We report the AUC measures for each aspect on our test set using different representations in Table 5. Our model consistently outperforms baseline strategies over all aspects. Unsurprisingly, the model outperforms unsupervised approaches. We realize consistent though modest improvement over triplet-supervised approaches that use alternative encoders.
In Table 6 we present cross AUC evaluations. Rows correspond to the embedding used and columns to the aspect evaluated against. As expected, aspect embeddings perform better w.r.t. the aspects for which they code, suggesting some disentanglement. However, the reduction in performance when using one aspect representation to discriminate w.r.t. others is not as pronounced as above. This is because aspect ratings are highly correlated: if taste is positive, aroma is very likely to be as well. Effectively, here sentiment entangles all of these aspects.
In Table 7, we evaluate cross AUC performance for beer by first 'decorrelating' the aspects. Specifically, for each cell (k, k') in the table, we first retrieve the subset of reviews in which the sentiment w.r.t. k differs from the sentiment w.r.t. k'. Then we evaluate the AUC similarity of these reviews on the basis of sentiment concerning k' using both k and k' embeddings, yielding a pair of AUCs (listed respectively). We observe that using k' embeddings to evaluate aspect k' similarity yields better results than using k embeddings.
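One way to compute a single 'decorrelated' cell is sketched below. The sketch assumes precomputed per-aspect similarity matrices and dichotomized sentiment labels, counts ties as half a win, and uses invented names (`mean_auc`, `decorrelated_cell`) and made-up one-dimensional embeddings purely for illustration.

```python
def mean_auc(sims, labels, idx):
    """Mean per-review AUC over the reviews in idx, w.r.t. the given labels."""
    aucs = []
    for i in idx:
        pos = [sims[i][j] for j in idx if j != i and labels[j] == labels[i]]
        neg = [sims[i][j] for j in idx if labels[j] != labels[i]]
        if pos and neg:
            wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
                       for p in pos for q in neg)
            aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / len(aucs)

def decorrelated_cell(sims_by_aspect, sentiment, row, col):
    """AUCs w.r.t. the column aspect's sentiment using row and column embeddings."""
    n = len(sentiment[row])
    # Keep only reviews whose sentiment differs between the two aspects.
    idx = [i for i in range(n) if sentiment[row][i] != sentiment[col][i]]
    return (mean_auc(sims_by_aspect[row], sentiment[col], idx),
            mean_auc(sims_by_aspect[col], sentiment[col], idx))

def sims_from_1d(emb):
    """Toy similarity: negative absolute difference of scalar embeddings."""
    return [[-abs(a - b) for b in emb] for a in emb]

# Made-up sentiment labels and scalar embeddings for five reviews.
sentiment = {"taste": [1, 1, 0, 0, 1], "look": [0, 0, 1, 1, 1]}
sims_by_aspect = {
    "taste": sims_from_1d([1.0, 0.3, 0.1, 0.0, 0.9]),
    "look": sims_from_1d([0.0, 0.1, 1.0, 0.9, 1.0]),
}
row_auc, col_auc = decorrelated_cell(sims_by_aspect, sentiment, "taste", "look")
```

In this toy setup the look embeddings separate the look sentiment perfectly on the decorrelated subset, while the taste embeddings do so less well, mirroring the pattern the table is designed to expose.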
We present the most activated words for each aspect (as per the gating mechanism) in Table 8, and an illustrative review color-coded with aspect-wise gate activations in Table 9. For completeness, we reproduce the top words for aspects discovered using He et al. (2017) in the Appendix; these do not obviously align with the target aspects, which is unsurprising given that this is an unsupervised method.

Hotel & Restaurant Reviews
Finally, we attempt to learn embeddings that disentangle domain from sentiment in reviews. For this we use a combination of TripAdvisor and Yelp! ratings data. The former comprises reviews of hotels, the latter of restaurants; both use a scale of 1 to 5. We convert ratings into positive/negative labels as above. Here we consider aspects to be the domain (hotel or restaurant) and the sentiment (positive or negative). We aim to generate embeddings that capture information about only one of these aspects. We use 50K reviews from each dataset for training and 5K for testing.

Baselines. We use the same baselines as for the BeerAdvocate data, and similarly use different encoder models trained under triplet loss.

Evaluation Metrics. We perform AUC and cross-AUC evaluation as in the preceding section. For the domain aspect, we consider two reviews similar if they are from the same domain, irrespective of sentiment. Similarly, reviews are considered similar with respect to the sentiment aspect if they share a sentiment value, regardless of domain.

Table 8: Most activated words for aspects on the beer corpus, as per the gating mechanism.
Look: attractive, beautiful, fingers, pumpkin, quarter, received, retention, sheets, sipper, well-balanced
Aroma: beer, cardboard, cheap, down, follows, mediumlight, rice, settled, skunked, skunky
Palate: bother, crafted, luscious, mellow, mint, range, recommended, roasted, tasting, weight
Taste: amazingly, down, highly, product, recommended, tasted, thoroughly, to, truly, wow
Results. In Table 10 we report the AUCs for each aspect on our test set using different representations. Baselines perform reasonably well on the domain aspect because reviews from different domains are quite dissimilar. Capturing sentiment information irrespective of domain is more difficult, and most unsupervised models fail in this respect. In Table 11, we observe that cross AUC results are much more pronounced than for the BeerAdvocate data, as the domain and sentiment are uncorrelated (i.e., sentiment is independent of domain).

Related Work
Work in representation learning for NLP has largely focused on improving word embeddings (Levy and Goldberg, 2014; Faruqui et al., 2015; Huang et al., 2012). But efforts have also been made to embed other textual units, e.g., characters (Kim et al., 2016), and lengthier texts including sentences, paragraphs, and documents (Le and Mikolov, 2014; Kiros et al., 2015).
Triplet-based judgments have been used in multiple domains, including vision and NLP, to estimate similarity information implicitly. For example, triplet-based similarity embeddings may be learned using 'crowdkernels' with applications to multi-view clustering (Amid and Ukkonen, 2015). Models combining similarity with neural networks mainly revolve around Siamese networks (Chopra et al., 2005) which use pairwise distances to learn embeddings (Schroff et al., 2015), a tactic we have followed here. Similarity judgments have also been used to generate document embeddings for IR tasks (Shen et al., 2014; Das et al., 2016).
Recently, He et al. (2017) introduced a neural model for aspect extraction that relies on an attention mechanism to identify aspect words. They proposed an autoencoder variant designed to tease apart aspects. In contrast to the method we propose, their approach is unsupervised; discovered aspects may thus not have a clear interpretation. Experiments reported here support this hypothesis, and we provide additional results using their model in the Appendix.
Other recent work has focused on text generation from factorized representations (Larsson et al., 2017). Zhang et al. (2017) proposed a lightly supervised method for domain adaptation using aspect-augmented neural networks. They exploited source document labels to train a classifier for a target aspect. They leveraged sentence-level scores codifying sentence relevance w.r.t. individual aspects, which were derived from terms a priori associated with aspects. This supervision is used to construct a composite loss that captures both classification performance on the source task and a term that enforces invariance between source and target representations.
There is also a large body of work that uses probabilistic generative models to recover latent structure in texts. Many of these models derive from Latent Dirichlet Allocation (LDA) (Blei et al., 2003), and some variants have explicitly represented topics and aspects jointly for sentiment tasks (Brody and Elhadad, 2010; Sauper et al., 2010, 2011; Mukherjee and Liu, 2012; Sauper and Barzilay, 2013; Kim et al., 2013).
A bit more generally, aspects have also been interpreted as properties spanning entire texts, e.g., a perspective or theme which may then color the discussion of topics (Paul and Girju, 2010). This intuition led to the development of the factorial LDA family of topic models (Paul and Dredze, 2012; Wallace et al., 2014); these model individ-

Table 9: An illustrative beer review shown once per aspect (Look, Aroma, Palate, Taste) with aspect-wise gate activations; the color coding is not reproduced here. Review text: deep amber hue, this brew is topped with a finger of off white head. smell of dog unk, green unk, and slightly fruity. taste of belgian yeast, coriander, hard water and bready malt. light body, with little carbonation.

Conclusions
We have proposed an approach for inducing disentangled representations of text. To learn such representations we have relied on supervision codified in aspect-wise similarity judgments expressed as document triplets. This provides a general supervision framework and objective. We evaluated this approach on three datasets, each with different aspects. Our experimental results demonstrate that this approach indeed induces aspect-specific embeddings that are qualitatively interpretable and achieve superior performance on information retrieval tasks.
Going forward, disentangled representations may afford additional advantages in NLP, e.g., by facilitating transfer (Zhang et al., 2017), or supporting aspect-focused summarization models.
2017. Aspect-augmented Adversarial Networks for Domain Adaptation. In Transactions of the Association for Computational Linguistics.

A Example of PICO Triplet generation
As an illustrative example, we walk through construction of a single triplet for the PICO domain in detail. Recall that we first randomly draw two reviews, $R_1$ and $R_2$. In this case, $R_1$ consists of studies involving nocturnal enuresis and review $R_2$ concerns asthma. From $R_1$ we randomly select two studies S, S'.
(1) Here we sample abstract A for study S, shown below:
In recent years the treatment of primary nocturnal enuresis (PNE) with desmopressin (DDAVP) has been promising. The route of administration until now had been intranasal, but because the tablets were introduced for the treatment of diabetes insipidus they have also become available for the treatment of PNE. To find the optimal dosage of desmopressin tablets and to compare desmopressin's efficacy with placebo in a group of adolescents with severe monosymptomatic enuresis. The long-term safety of desmopressin was also studied in the same group of patients. The effect of oral desmopressin (1-deamino-8-D-arginine-vasopressin) (DDAVP tablets, Minirin) was investigated in 25 adolescents (ages 11 to 21 years) with severe monosymptomatic nocturnal enuresis. The first part of the dose-ranging study comprised a single-blind dose titration period, followed by a double-blind, crossover efficacy period comparing desmopressin with placebo. The final part was an open long-term study consisting of two 12-week treatment periods. The efficacy of the drug was measured in reductions of the number of wet nights per week. During the first dose-titration period, the majority of the patients were given desmopressin 400 micrograms, and the number of wet nights decreased from a mean of 4.9 to 2.8. During the double-blind period, a significant reduction of wet nights was observed (1.8 vs 4.1 for placebo). During the two long-term periods, 48% and 53% of the patients could be classified as responders (0 to 1 wet night per week) and 22% and 23.5% as intermediate responders (2 to 3 wet nights per week). No weight gain was observed due to water retention. After cessation of the drug, 44% of the patients had a significant decrease in the number of wet nights. Oral desmopressin has a clinically significant effect on patients with PNE, and therapy is safe when administered as long-term treatment.
For study S, the summaries in the CDSR are as follows. First, the P summary (s_P):

Wet nights during trial (number, mean, SD): A: 10, 1.8 (SD 1.4); B: 10, 4.1 (1.5)
Side effects: headache (5); abdominal pain (6); nausea and vertigo (1)
All resolved while treatment continued

(2) From study S′, the summaries in the CDSR are reproduced as follows. First, the P summary (s′_P):

B Implementation Details & Baseline Hyperparameters
All tokenization was done using the default spaCy tokenizer.

B.1.1 Baselines
For the TF-IDF baseline, we use the implementation in scikit-learn. The TF-IDF model was fit on all CDSR data, and the resulting transformation was applied to the Cohen corpus. We use cosine similarity to evaluate the TF-IDF model. For Doc2Vec, we used the gensim implementation with an 800d embedding size and a window size of 10. These parameters were selected via random search over validation data. We otherwise used the default settings in gensim. The vocabulary used was the same as above.
For LDA, we again used the scikit-learn implementation, setting the number of topics to 7; this was selected via line search over the validation set across the range 1 to 50.
For the NVDM baseline, we used a hidden layer comprising 500 dimensions and an output embedding dimension of 200. We used tanh activation functions. The model was trained using the Adam optimizer with a learning rate of 5e-5. We performed early stopping based on validation perplexity.
For the ABAE model, we use the same hyperparameters as reported in the original He et al. (2017) paper. We experimented with 20, 50 and 100 aspects. Of these, 100 performed best, which is what we used to generate the reported results.
For the DSSM model, we used 300d filters with filter windows of sizes (1, 6) and an output embedding dimension of 256. We used tanh activations for both the convolution and output layers, and imposed l2 regularization on the weights (1e-5). The model was trained using the Adam optimizer with early stopping on validation loss.
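The TF-IDF baseline described in this subsection can be illustrated with a short sketch. It is not the authors' code: the toy sentences stand in for the CDSR data (used for fitting) and the Cohen corpus (used for evaluation), and the vectorizer is left at its scikit-learn defaults, which is an assumption since the text does not report TF-IDF settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the corpus used to fit TF-IDF (hypothetical sentences).
fit_corpus = [
    "desmopressin reduces wet nights in children with enuresis",
    "alarm treatment for nocturnal enuresis in children",
    "atenolol and metoprolol in patients with asthma",
    "terbutaline improves fev1 in asthma",
]

# Toy stand-in for the held-out documents being compared.
eval_docs = [
    "desmopressin for nocturnal enuresis",
    "alarm versus desmopressin in enuresis",
    "metoprolol in asthma patients",
]

# Fit vocabulary and IDF weights on one corpus, then apply them to another.
vectorizer = TfidfVectorizer()
vectorizer.fit(fit_corpus)
eval_vectors = vectorizer.transform(eval_docs)

# Pairwise cosine similarity between the evaluation documents.
similarities = cosine_similarity(eval_vectors)
print(similarities.round(2))
```

Fitting on one corpus and calling only `transform` on the other mirrors the description above, in which the transformation learned on CDSR data is applied unchanged to the evaluation corpus.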

B.2.1 Baselines
For the TF-IDF baseline, we used the scikit-learn implementation. TF-IDF parameters were fit on the available training set. We used cosine similarity for evaluation.
For Doc2Vec, we used the same method as above, here setting (after line search) the embedding size to 800d and window size to 7.
For the LDA baseline, we followed the same strategy as above, arriving at 4 topics.
For NVDM, we set the number of dimensions in the hidden layer to 500, and set the embedding dimension to 145. As an activation for the hidden layer we used tanh. Again the model was trained using the Adam optimizer, with a learning rate of 5e-5 and early stopping based on validation perplexity.
For the ABAE baseline, we use the same hyperparameters as in the He et al. (2017) paper, since BeerAdvocate was one of the evaluation datasets used in said work.
For NVDM + Triplet Loss, we used the same settings as reported above for NVDM, with a loss weighting of 0.001.
For the DSSM baseline, we used 200d filters with filter windows of sizes (2, 4) and an output embedding dimension of 256. We used tanh activations for both the convolution and output layers, and l2 regularization on the weights of 1e-6. The model was trained using the Adam optimizer with early stopping on validation loss.

B.3.1 Baselines
To induce the baseline TF-IDF vectors, we used the scikit-learn implementation. TF-IDF parameters were fit on the available training set. Our vocabulary comprised those words that appeared in at least 5 unique reviews, which corresponded to 10K words. We used cosine similarity for evaluation.
For Doc2Vec, we set the embedding size to 800d and the window size to 7. For LDA, we again used the scikit-learn implementation, setting the number of topics to 10.
For NVDM, we set the number of dimensions in the hidden layer to 500, and set the embedding dimension to 200. As an activation for the hidden layer we used tanh. Again the model was trained using the Adam optimizer, with a learning rate of 5e-5 and early stopping based on validation perplexity.
For ABAE, we use the same hyperparameters as in He et al. (2017), since Yelp was another one of the evaluation datasets used in their work.
For NVDM + Triplet Loss, we used the same settings as above for NVDM, and again used a loss weighting of 0.001. For the DSSM baseline, we used 200d filters with filter windows of sizes (1, 4) and an output embedding dimension of 256. We used tanh activations for both the convolution and output layers, and l2 regularization on the weights of 1e-6. The model was trained using the Adam optimizer with early stopping on validation loss.

Most activated words for hotels/restaurant data.

C Yelp
We did not have sufficient space to include the most activated words per aspect (inferred via the gating mechanism) for the Yelp!/TripAdvisor corpus in the manuscript (as we did for the other domains) and so we include them here.

D Highlighted Text
To visualize the output of the gating mechanism g ∈ R^{N×1}, we perform the following transformations:
1. Since the gating is applied on the top convolution layer, there is no one-to-one correspondence between words and values in g. Hence, we convolve the output with a length-5 mean filter to smooth the gating output.
2. We normalize the range of values in g to [0, 1].
In the following subsections, we provide 3 examples from each of our datasets, highlighted with the corresponding gating output.
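The two transformations above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `smooth_and_normalize` is hypothetical, the mean filter simply shrinks its window at the sequence boundaries, and min-max scaling is assumed as the way of mapping values to [0, 1].

```python
def smooth_and_normalize(gate, width=5):
    """Smooth gate values with a mean filter, then rescale them to [0, 1].

    Assumptions (not from the paper): the window is clipped at the sequence
    edges, and the rescaling is min-max normalization.
    """
    n = len(gate)
    if n == 0:
        return []
    half = width // 2
    smoothed = []
    for i in range(n):
        # Average over up to `width` neighbouring positions centred on i.
        window = gate[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(window) / len(window))
    lo, hi = min(smoothed), max(smoothed)
    if hi == lo:
        # A constant signal carries no contrast to highlight.
        return [0.0] * n
    return [(v - lo) / (hi - lo) for v in smoothed]


# Toy gate activations for a seven-position sequence (made-up values).
gate = [0.1, 0.9, 0.2, 0.8, 0.3, 0.05, 0.6]
weights = smooth_and_normalize(gate)
print([round(w, 2) for w in weights])
```

The resulting weights could then drive the highlight intensity of each position; how positions are mapped back to words is left out here because the text does not specify it.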

Color Legend : Population Intervention Outcome
Example 1
determined the efficacy of unk ( unk ) in a clinical population of aggressive , urban children diagnosed with attention deficit unk disorder ( adhd ) . in previous studies of unk children with adhd , unk has been shown to be effective when compared with placebo . eighteen unk children ( ages qqq to qqq years ) , diagnosed with adhd and attending a unk treatment program for youth with unk behavior disorders , participated in a double-blind placebo trial with assessment data obtained from staff in the program and parents at home . based on staff ratings of the children 's behavior in the program and an academic classroom , the children displayed significant improvements in adhd symptoms and aggressive behavior with unk and high-dose unk conditions . at home , parents and unk reported few significant differences between placebo and unk on behavior ratings . in both settings , unk was well tolerated with few side effects found during active drug conditions .
Example 2
aims : to assess maternal and neonatal complications in pregnancies of diabetic women treated with oral hypoglycaemic agents during pregnancy . methods : a cohort study including all unk registered , orally treated pregnant diabetic patients set in a diabetic unk service at a university hospital : qqq women treated with metformin , qqq women treated with sulphonylurea during pregnancy and a reference group of qqq diabetic women treated with insulin during pregnancy . results : the prevalence of pre-eclampsia was significantly increased in the group of women treated with metformin compared to women treated with sulphonylurea or insulin ( qqq vs. qqq vs. qqq % , p < qqq ) . no difference in neonatal morbidity was observed between the orally treated and unk group ; no cases of severe hypoglycaemia or jaundice were seen in the orally treated groups . however , in the group of women treated with metformin in the third trimester , the perinatal mortality was significantly increased compared to women not treated with metformin ( qqq vs. qqq % , p < qqq ) . conclusion : treatment with metformin during pregnancy was associated with increased prevalence of pre-eclampsia and a high perinatal mortality .
Example 3
purpose : congestive heart failure is an important cause of patient morbidity and mortality . although several randomized clinical trials have compared beta-blockers with placebo for treatment of congestive heart failure , a meta-analysis unk the effect on mortality and morbidity has not been performed recently . data unk : the unk , unk , and unk of unk electronic unk were unk from qqq to july qqq unk were also identified from unk of unk unk . study selection : all randomized clinical trials of beta-blockers versus placebo in chronic stable congestive heart failure were included . data extraction : a specified protocol was followed to extract data on patient characteristics , unk used , overall mortality , hospitalizations for congestive heart failure , and study quality . data unk : a unk unk model was used to unk the results . a total of qqq trials involving qqq qqq patients were identified . there were qqq deaths among qqq patients randomly assigned to placebo and qqq deaths among qqq patients assigned to unk therapy . in these groups , qqq and qqq patients , respectively , required hospitalization for congestive heart failure . the probability that unk therapy reduced total mortality and hospitalizations for congestive heart failure was almost qqq % . the best estimates of these advantages are qqq unk unk and qqq fewer hospitalizations per qqq patients treated in the first year after therapy . the probability that these benefits are clinically significant ( > qqq unk unk or > qqq fewer hospitalizations per qqq patients treated ) is qqq % . both selective and unk agents produced these unk effects . the results are unk to any unk publication unk . conclusions : unk therapy is associated with clinically meaningful reductions in mortality and morbidity in patients with stable congestive heart failure and should be routinely offered to all patients similar to those included in trials .

D.2 BeerAdvocate
Color Legend : Look Aroma Palate Taste
Example 1
appearance was gold , clear , no head , and fizzy . almost looked like champagne . aroma was actually real good . sweet apricot , unk smith apple , almost like a unk wine . taste was unk . i unk a couple of bottles an tried them so if they were skunked , it was unk unk that were bad . there was no real apricot taste , and no sweetness . it tasted like hay but not in the good wheat beer sense , more like dirty hay thats been under the budweiser unk . mouthfeel was thin and carbonated . drinkability . . . well this is the second beer in my long list that i 've actually had to pour out .
Example 2
poured into a nonic pint glass . . . jet black , unk no highlights around the edges at all , dense tan unk head . . . looks great ! a bit of sediment in the bottom of the bottle . bottle read " live ale , keep unk . " the store where i bought it from had it on the shelf at room temp : ( the smell was surprising . . . an earthy roasted smell , mixed with day old coffee . also some weird " off " licorice notes . taste was dry at the start , and very dry on the finish . bitter roast notes with a sort of unk . full bodied with appropriate carbonation . i was very excited to try this beer , and i was pretty disappointed . this bottle could be old , or the flavors could be " off " from the room temp unk .
Example 3
this is a real " nothing " beer . pours a unk yellow color , looking more like unk water than beer . strange unk smell , with minimal unk hop aroma unk away behind whatever it is that unk in this brew . taste is equally as unk . nearly unk , the most you get from this beer is a slightly sweet unk flavor and a lot of carbonation in the beginning . watery and thin , but something that goes down easy so the drinkability is unk up slightly . unk .

D.3 Yelp!/TripAdvisor Dataset
Color Legend : Sentiment Domain
Example 1
i love everything about this place , i find my self there more often then i want , the deserts are amazing , so many choices , the deli is awesome , the sushi bar is great , you can not go wrong with this place , if you want a formal dining you can sit at the full service restaurant .
Example 2
we have just returned from our first trip to rome and what a wonderful time we had ! our stay at unk unk was everything we had hoped for and all we had read about it was true ! the room was small , but very clean and comfortable . the staff was fantastic and friendly , very helpful and the breakfast every day was wonderful ! we loved the lift up to the rooms ! the location was great and at night we left the windows open and listened to the bustle of the street below and a unk off somewhere . . . we felt so ' roman " ! they gave us champagne on our last night ! the gentleman who works the night shift even gave us a personal escort to unk unk , a maze of metro stops , and delivered us right at the gate ! unk was just so nice ! i highly recommend this small hotel and would absolutely stay there again if we are lucky enough to return ! they deserve their # qqq tripadvisor rating ! thanks you , unk unk , for your part in making this trip such a pleasure !
Example 3
oh , unk . true to name , you 're like a guilty indulgence that i want more of . while visiting cleveland , i went to unk for a late dinner around qqq on a thursday . i sat at the bar , and there was quite a full crowd because they have a second , late happy hour from qqq with great deals on drinks and small plates . i ordered the house red wine ( $ qqq at happy hour ! ) , and two suggestions of the bartender : the roasted dates to start and the steak entree with a side of brussels sprouts . the dates ( $ qqq ) were incredibly decadent : roasted and topped with almonds , bacon , unk and parsley , they are a party in your both of sweet and salty and spicy . the sirloin ( $ qqq ) , which the bartender told me is a new menu item , was beautifully cooked and topped with parmesan , truffle butter , arugula and mushrooms . it was good , although there are other things on the menu that might be more exciting . the stand out dish of the evening was the fried brussels sprouts , which i would very much like the recipe for so i can unk myself on them every night . those babies , topped with crispy crumbled bits of unk , capers and walnuts , were pure bliss . and let 's be honest , how often do you really have the opportunity to call unk sprouts unk ? probably not very often . i unk off the evening with a glass of unk scotch , neat . perfection .

Figure 1: We propose associating aspects with encoders (low-level parameters are shared across aspects; this is not shown) and training these with triplets codifying aspect-wise relative similarities.

Figure 2: Schematic of our encoder architecture.

Figure 3: t-SNE-reduced scatter of disentangled PICO embeddings of abstracts involving "decision aid" interventions. Abstracts are colored by known population group (see legend). Population embeddings for studies in the same group co-localize, more so than in the intervention and outcome space.
Number of children: 10
Inclusion criteria: adolescents (puberty stage at least 2, at least 12 years)
Exclusion criteria: treatment in previous 2 weeks, daytime wetting, UTI, urinary tract abnormalities
Previous treatment: failed using alarms, desmopressin, other drugs
Age range 11-21, median 13 years
Baseline wetting 4.7 (SD 1.1) wet nights/week
Department of Paediatric Surgery, Sweden

I summary (s_I):
A: desmopressin orally (dosage based on titration period)
B: placebo
Duration 4 weeks each

The O summary (s_O)

Number of children: 135
Dropouts: 23 excluded for noncompliance, and 39 lost to follow up including 12 failed with alarms
Inclusion criteria: monosymptomatic nocturnal enuresis, age > 5 years
Exclusion criteria: previous treatment with DDAVP or alarm, urological pathology, diurnal enuresis, UTI
Age, mean years: 11.2
Baseline wetting: A 21% dry nights, B 14% dry nights

I summary (s′_I):
A (62): desmopressin 20 µg intranasally increasing to 40 µg if response partial
B (73): alarm (pad-and-bell)
Duration of treatment 3 months. If failed at that time, changed to alternative arm

The O summary (s′_O):
DRY nights at 3 months: A 85%; B: 90%
Number not achieving 14 dry nights: A 12/39; B: 6/37
Side effects: not mentioned

(3) From review R2, we sample one study S″. Matched summaries in the CDSR are as follows.

P summary (s″_P):
n = 8
Mean age = 52
Inclusion: intrinsic asthma, constant reversibility > 20%
None had an acute exacerbation at time of study
Exclusion: none listed

I summary (s″_I):
#1: Atenolol 100 mg Metoprolol 100 mg Placebo
#2: Terbutaline (IV then inhaled) after Tx or placebo

O summary (s″_O):
FEV1
Symptoms

(4) We note how the summaries for S and S′ are similar to each other (but not identical), since they belong to the same review, whereas they are quite different from the summaries for S″, which belongs to a different review. Now we construct the triplet (s, d, o)_P as follows:

s = s′_P:
Number of children: 135
Dropouts: 23 excluded for noncompliance, and 39 lost to follow up including 12 failed with alarms
Inclusion criteria: monosymptomatic nocturnal enuresis, age > 5 years
Exclusion criteria: previous treatment with DDAVP or alarm, urological pathology, diurnal enuresis, UTI
Age, mean years: 11.2
Baseline wetting: A 21% dry nights, B 14% dry nights

So we have d = A and o = [s″_P | s′_I | s′_O]:
n = 8
Mean age = 52
Inclusion: intrinsic asthma, constant reversibility > 20%
None had an acute exacerbation at time of study
Exclusion: none listed
- A (62): desmopressin 20 µg intranasally increasing to 40 µg if response partial
B (73): alarm (pad-and-bell)
Duration of treatment 3 months. If failed at that time, changed to alternative arm
- DRY nights at 3 months: A 85%; B: 90%
Number not achieving 14 dry nights: A 12/39; B: 6/37
Side effects: not mentioned
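The walkthrough above can be condensed into a short sketch of how one triplet might be assembled for a given aspect. The `Study` container, the `build_triplet` function, the dictionary layout of reviews, and the " | " separator are all hypothetical choices made for illustration; only the sampling pattern (two studies from one review, one study from another) and the composition of s, d and o follow the example.

```python
import random
from dataclasses import dataclass

ASPECTS = ("P", "I", "O")


@dataclass
class Study:
    """Hypothetical container: one abstract plus its per-aspect summaries."""
    abstract: str
    summaries: dict  # maps "P", "I", "O" to summary text


def build_triplet(reviews, aspect, rng):
    """Return (s, d, o) for one aspect, given a dict of review id -> studies."""
    r1, r2 = rng.sample(list(reviews), 2)           # two distinct reviews
    study, study_same = rng.sample(reviews[r1], 2)  # S and S' from the first
    study_other = rng.choice(reviews[r2])           # S'' from the second
    s = study_same.summaries[aspect]  # similar text: aspect summary of S'
    d = study.abstract                # document: abstract of S
    # Dissimilar text: target aspect taken from S'', other aspects from S'.
    o = " | ".join(
        (study_other if a == aspect else study_same).summaries[a]
        for a in ASPECTS
    )
    return s, d, o


def toy_study(name):
    """Make a placeholder study whose texts encode their own origin."""
    return Study(f"abs_{name}", {a: f"{name}_{a}" for a in ASPECTS})


reviews = {
    "enuresis": [toy_study("e1"), toy_study("e2")],
    "asthma": [toy_study("a1"), toy_study("a2")],
}
s, d, o = build_triplet(reviews, "P", random.Random(0))
print(s, d, o, sep="\n")
```

Because o shares its intervention and outcome text with s and differs only in the population summary, the only signal separating the similar and dissimilar items is the target aspect.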

Table 1: AUCs achieved using different representations on the Cohen et al. corpus. Models to the right of the | are supervised; those to the right of || constitute the proposed disentangled embeddings.

Table 4: Top ten most activated words, as determined by the gating mechanism.

Table 5: AUC results for different representations on the BeerAdvocate data. Models beneath the second line are supervised.

Table 6: Cross AUC results for different representations on the BeerAdvocate data. Row: embedding used. Column: aspect evaluated against.

Table 9: Gate activations for each aspect in an example beer review.

Table 10: AUC results for different representations on the Yelp!/TripAdvisor data. Models beneath the second line are supervised.

Table 11: Cross AUC results for different representations on the Yelp!/TripAdvisor dataset.
a product of multiple latent factors characterizing a text. This is similar to the Sparse Additive Generative (SAGE) model of text proposed by Eisenstein et al. (2011).