Rationalizing Neural Predictions

Prediction without justification has limited applicability. As a remedy, we learn to extract pieces of input text as justifications -- rationales -- that are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, generator and encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by desiderata for rationales. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases. Our approach outperforms an attention-based baseline by a significant margin. We also successfully illustrate the method on the question retrieval task.


Introduction
Many recent advances in NLP problems have come from formulating and training expressive and elaborate neural models. This includes models for sentiment classification, parsing, and machine translation among many others. The gains in accuracy have, however, come at the cost of interpretability since complex neural models offer little transparency concerning their inner workings. In many applications, such as medicine, predictions are used to drive critical decisions, including treatment options. It is necessary in such cases to be able to verify and understand the underlying basis for the decisions. Ideally, complex neural models would not only yield improved performance but would also offer interpretable justifications -- rationales -- for their predictions.

Review: the beer was n't what i expected, and i'm not sure it's "true to style", but i thought it was delicious. a very pleasant ruby red-amber color with a relatively brilliant finish, but a limited amount of carbonation, from the look of it. aroma is what i think an amber ale should be - a nice blend of caramel and happiness bound together.
Ratings: Look: 5 stars; Smell: 4 stars
Figure 1: An example of a review with ranking in two categories. The rationale for Look prediction is shown in bold.
In this paper, we propose a novel approach to incorporating rationale generation as an integral part of the overall learning problem. We limit ourselves to extractive (as opposed to abstractive) rationales. From this perspective, our rationales are simply subsets of the words from the input text that satisfy two key properties. First, the selected words represent short and coherent pieces of text (e.g., phrases) and, second, the selected words must alone suffice for prediction as a substitute for the original text. More concretely, consider the task of multi-aspect sentiment analysis. Figure 1 illustrates a product review along with user ratings in terms of two categories or aspects. If the model in this case predicts a five-star rating for color, it should also identify the phrase "a very pleasant ruby red-amber color" as the rationale underlying this decision.
In most practical applications, rationale generation must be learned entirely in an unsupervised manner. We therefore assume that our model with rationales is trained on the same data as the original neural models, without access to additional rationale annotations. In other words, target rationales are never provided during training; the intermediate step of rationale generation is guided only by the two desiderata discussed above. Our model is composed of two modular components that we call the generator and the encoder. Our generator specifies a distribution over possible rationales (extracted text) and the encoder maps any such text to task-specific target values. They are trained jointly to minimize a cost function that favors short, concise rationales while enforcing that the rationales alone suffice for accurate prediction.
The notion of what counts as a rationale may be ambiguous in some contexts and the task of selecting rationales may therefore be challenging to evaluate. We focus on two domains where ambiguity is minimal (or can be minimized). The first scenario concerns multi-aspect sentiment analysis, exemplified by the beer review corpus (McAuley et al., 2012). A smaller test set in this corpus identifies, for each aspect, the sentence(s) that relate to this aspect. We can therefore directly evaluate our predictions on the sentence level with the caveat that our model makes selections on a finer level, in terms of words, not complete sentences. The second scenario concerns the problem of retrieving similar questions. The extracted rationales should capture the main purpose of the questions. We can therefore evaluate the quality of rationales as a compressed proxy for the full text in terms of retrieval performance. Our model achieves high performance on both tasks. For instance, on the sentiment prediction task, our model achieves extraction accuracy of 96%, as compared to 38% and 81% obtained by the bigram SVM and a neural attention baseline.

Related Work
Developing sparse interpretable models is of considerable interest to the broader research community (Letham et al., 2015; Kim et al., 2015). The need for interpretability is even more pronounced with recent neural models. Efforts in this area include analyzing and visualizing state activation (Hermans and Schrauwen, 2013; Karpathy et al., 2015), learning sparse interpretable word vectors (Faruqui et al., 2015b), and linking word vectors to semantic lexicons or word properties (Faruqui et al., 2015a; Herbelot and Vecchi, 2015).
Beyond learning to understand or further constrain the network to be directly interpretable, one can estimate interpretable proxies that approximate the network. Examples include extracting "if-then" rules (Thrun, 1995) and decision trees (Craven and Shavlik, 1996) from trained networks. More recently, Ribeiro et al. (2016) propose a model-agnostic framework where the proxy model is learned only for the target sample (and its neighborhood), thus ensuring locally valid approximations. Our work differs from these both in terms of what is meant by an explanation and how explanations are derived. In our case, an explanation consists of a concise yet sufficient portion of the text where the mechanism of selection is learned jointly with the predictor.
Attention-based models offer another means to explicate the inner workings of neural models (Bahdanau et al., 2015; Cheng et al., 2016; Martins and Astudillo, 2016; Chen et al., 2015; Xu and Saenko, 2015; Yang et al., 2015). Such models have been successfully applied to many NLP problems, improving both prediction accuracy and visualization and interpretability (Rush et al., 2015; Rocktäschel et al., 2016; Hermann et al., 2015). Prior work has also introduced a stochastic attention mechanism together with a more standard soft attention on an image captioning task. Our rationale extraction can be understood as a type of stochastic attention although architectures and objectives differ. Moreover, we compartmentalize rationale generation from downstream encoding so as to expose knobs to directly control types of rationales that are acceptable, and to facilitate broader modular use in other applications.
Finally, we contrast our work with rationale-based classification (Zaidan et al., 2007; Marshall et al., 2015; Zhang et al., 2016), which seeks to improve prediction by relying on richer annotations in the form of human-provided rationales. In our work, rationales are never given during training. The goal is to learn to generate them.

Extractive Rationale Generation
We formalize here the task of extractive rationale generation and illustrate it in the context of neural models. To this end, consider a typical NLP task where we are provided with a sequence of words as input, namely $x = \{x_1, \cdots, x_l\}$, where each $x_t \in \mathbb{R}^d$ denotes the vector representation of the $t$-th word. The learning problem is to map the input sequence $x$ to a target vector in $\mathbb{R}^m$. For example, in multi-aspect sentiment analysis each coordinate of the target vector represents the response or rating pertaining to the associated aspect. In text retrieval, on the other hand, the target vectors are used to induce similarity assessments between input sequences. Broadly speaking, we can solve the associated learning problem by estimating a complex parameterized mapping enc(x) from input sequences to target vectors. We call this mapping an encoder. The training signal for these vectors is obtained either directly (e.g., multi-aspect sentiment analysis) or via similarities (e.g., text retrieval). The challenge is that a complex neural encoder enc(x) reveals little about its internal workings and thus offers little in the way of justification for why a particular prediction was made.
In extractive rationale generation, our goal is to select a subset of the input sequence as a rationale. In order for the subset to qualify as a rationale it should satisfy two criteria: 1) the selected words should be interpretable and 2) they ought to suffice to reach nearly the same prediction (target vector) as the original input. In other words, a rationale must be short and sufficient. We will assume that a short selection is interpretable and focus on optimizing sufficiency under cardinality constraints.
We encapsulate the selection of words as a rationale generator which is another parameterized mapping gen(x) from input sequences to shorter sequences of words. Thus gen(x) must include only a few words and enc(gen(x)) should result in nearly the same target vector as the original input passed through the encoder or enc(x). We can think of the generator as a tagging model where each word in the input receives a binary tag pertaining to whether it is selected to be included in the rationale. In our case, the generator is probabilistic and specifies a distribution over possible selections.
The rationale generation task is entirely unsupervised in the sense that we assume no explicit annotations about which words should be included in the rationale. Put another way, the rationale is introduced as a latent variable, a constraint that guides how to interpret the input sequence. The encoder and generator are trained jointly, in an end-to-end fashion so as to function well together.

Encoder and Generator
We use multi-aspect sentiment prediction as a guiding example to instantiate the two key components: the encoder and the generator. The framework itself generalizes to other tasks.
Encoder enc(·): Given a training instance $(x, y)$ where $x = \{x_t\}_{t=1}^{l}$ is the input text sequence of length $l$ and $y \in [0, 1]^m$ is the target $m$-dimensional sentiment vector, the neural encoder predicts $\tilde{y} = \mathrm{enc}(x)$. If trained on its own, the encoder would aim to minimize the discrepancy between the predicted sentiment vector $\tilde{y}$ and the gold target vector $y$. We will use the squared error (i.e., $L_2$ distance) as the sentiment loss function,

$$L(x, y) = \|\tilde{y} - y\|_2^2 = \|\mathrm{enc}(x) - y\|_2^2.$$

The encoder could be realized in many ways such as a recurrent neural network. For example, let $h_t = f_e(x_t, h_{t-1})$ denote a parameterized recurrent unit mapping input word $x_t$ and previous state $h_{t-1}$ to next state $h_t$. The target vector $\tilde{y}$ is then generated on the basis of the final state $h_l$ reached by the recurrent unit after processing all the words in the input sequence.

Generator gen(·): The rationale generator extracts a subset of text from the original input $x$ to function as an interpretable summary. Thus the rationale for a given sequence $x$ can be equivalently defined in terms of binary variables $\{z_1, \cdots, z_l\}$ where each $z_t \in \{0, 1\}$ indicates whether word $x_t$ is selected or not. From here on, we will use $z$ to specify the binary selections and thus $(z, x)$ is the actual rationale generated (selections, input). We will use generator gen(x) as synonymous with a probability distribution over binary selections, i.e., $z \sim \mathrm{gen}(x) \equiv p(z|x)$ where the length of $z$ varies with the input $x$.
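As a minimal sketch of the notation above, the following Python snippet applies a binary selection z to a word sequence and compares the squared-error loss of a full input against that of the extracted rationale. The function `toy_encoder` and its word list are hypothetical stand-ins introduced only for illustration; they are not the recurrent encoder described in the text.

```python
from typing import List, Sequence


def apply_selection(words: Sequence[str], z: Sequence[int]) -> List[str]:
    """Return the rationale {x_t | z_t = 1} as a list of words."""
    if len(words) != len(z):
        raise ValueError("z must hold exactly one binary tag per word")
    return [w for w, tag in zip(words, z) if tag == 1]


def squared_error(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Squared L2 distance between a predicted and a gold target vector."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold))


def toy_encoder(words: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for enc(.): one sentiment score in [0, 1]."""
    positive = {"pleasant", "brilliant", "delicious"}  # assumed toy lexicon
    hits = sum(1 for w in words if w in positive)
    return [min(1.0, hits / 2.0)]


words = "a very pleasant ruby red-amber color with a relatively brilliant finish".split()
z = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # select the first six words

rationale = apply_selection(words, z)
loss_full = squared_error(toy_encoder(words), [1.0])
loss_rationale = squared_error(toy_encoder(rationale), [1.0])
```

In this toy setting the rationale loses one of the two cue words, so its loss is higher than that of the full text; a sufficient rationale in the sense of the paper would keep this gap small.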
In a simple generator, the probability that the $t$-th word is selected can be assumed to be conditionally independent from other selections given the input $x$. That is, the joint probability $p(z|x)$ factors according to

$$p(z|x) = \prod_{t=1}^{l} p(z_t|x).$$

The component distributions $p(z_t|x)$ can be modeled using a shared bi-directional recurrent neural network, with $\overrightarrow{f}(\cdot)$ and $\overleftarrow{f}(\cdot)$ denoting the forward and backward recurrent units, respectively.

Independent but context-dependent selection of words is often sufficient. However, the model is unable to select phrases or refrain from selecting the same word again if already chosen. To this end, we also introduce a dependent selection of words, which can also be expressed as a recurrent neural network: we introduce another hidden state $s_t$ whose role is to couple the selections.

Joint objective: A rationale in our definition corresponds to the selected words, i.e., $\{x_k | z_k = 1\}$. We will use $(z, x)$ as the shorthand for this rationale and, thus, enc(z, x) refers to the target vector obtained by applying the encoder to the rationale as the input. Our goal here is to formalize how the rationale can be made short and meaningful yet function well in conjunction with the encoder. Our generator and encoder are learned jointly to interact well but they are treated as independent units for modularity.
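The difference between independent and dependent selection can be illustrated with a small sampling sketch. The per-word probabilities are given directly here rather than produced by a bi-directional recurrent network, and `sample_dependent` with its `stay_bonus` parameter is a hypothetical toy coupling, not the recurrent state $s_t$ of the model.

```python
import random
from typing import List, Sequence


def sample_independent(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Draw z_t ~ Bernoulli(p(z_t = 1 | x)) independently for each position."""
    return [1 if rng.random() < p else 0 for p in probs]


def prob_independent(z: Sequence[int], probs: Sequence[float]) -> float:
    """p(z | x) as a product of per-position terms (independence assumption)."""
    result = 1.0
    for tag, p in zip(z, probs):
        result *= p if tag == 1 else (1.0 - p)
    return result


def sample_dependent(probs: Sequence[float], stay_bonus: float,
                     rng: random.Random) -> List[int]:
    """Toy dependent sampler (assumption): a selected previous word raises
    the probability of selecting the next one, which favors phrases."""
    z: List[int] = []
    prev = 0
    for p in probs:
        p_t = min(1.0, p + stay_bonus) if prev == 1 else p
        prev = 1 if rng.random() < p_t else 0
        z.append(prev)
    return z


rng = random.Random(0)
z_ind = sample_independent([0.9, 0.1, 0.8, 0.2], rng)
z_dep = sample_dependent([0.9, 0.1, 0.8, 0.2], stay_bonus=0.5, rng=rng)
```

The dependent sampler conditions each draw on the previous one, so its samples tend to form contiguous runs, which is the behavior the dependent generator is meant to allow.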
The generator is guided in two ways during learning. First, the rationale that it produces must suffice as a replacement for the input text. In other words, the target vector (sentiment) arising from the rationale should be close to the gold sentiment. The corresponding loss function is given by

$$L(z, x, y) = \|\mathrm{enc}(z, x) - y\|_2^2.$$

Note that the loss function depends directly (parametrically) on the encoder but only indirectly on the generator via the sampled selection.
Second, we must guide the generator to realize short and coherent rationales. It should select only a few words and those selections should form phrases (consecutive words) rather than represent isolated, disconnected words. We therefore introduce an additional regularizer over the selections

$$\Omega(z) = \lambda_1 \|z\| + \lambda_2 \sum_{t} |z_t - z_{t-1}|,$$

where the first term penalizes the number of selections while the second one discourages transitions (encourages continuity of selections). Note that this regularizer also depends on the generator only indirectly via the selected rationale. This is because it is easier to assess the rationale once produced rather than directly guide how it is obtained.
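The regularizer described above, a penalty on the number of selected words plus a penalty on transitions between selected and unselected words, can be sketched in a few lines. The treatment of the sequence boundary is an assumption here: the transition sum starts at the second position, so a selection that begins at the first word incurs no extra transition.

```python
from typing import Sequence


def omega(z: Sequence[int], lambda1: float, lambda2: float) -> float:
    """Sparsity plus coherence penalty on a binary selection z.

    The first term counts selected words; the second counts 0/1 transitions
    between neighboring positions (boundary handling is an assumption).
    """
    num_selected = sum(z)
    transitions = sum(abs(z[t] - z[t - 1]) for t in range(1, len(z)))
    return lambda1 * num_selected + lambda2 * transitions


contiguous = [0, 1, 1, 1, 0, 0]  # one phrase of three words
scattered = [1, 0, 1, 0, 1, 0]   # three isolated words

penalty_contiguous = omega(contiguous, lambda1=1.0, lambda2=2.0)
penalty_scattered = omega(scattered, lambda1=1.0, lambda2=2.0)
```

Both selections contain three words, but the scattered one has five transitions against two for the contiguous phrase, so it receives the larger penalty.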
Our final cost function is the combination of the two, $\mathrm{cost}(z, x, y) = L(z, x, y) + \Omega(z)$. Since the selections are not provided during training, we minimize the expected cost:

$$\min_{\theta_e, \theta_g} \sum_{(x, y) \in D} \mathbb{E}_{z \sim \mathrm{gen}(x)} \left[ \mathrm{cost}(z, x, y) \right],$$

where $\theta_e$ and $\theta_g$ denote the set of parameters of the encoder and generator, respectively, and $D$ is the collection of training instances. Our joint objective encourages the generator to compress the input text into coherent summaries that work well with the associated encoder it is trained with.
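To make the expected cost concrete, the sketch below computes it exactly by enumerating every selection z for a very short input, and also estimates it from samples. It assumes an independent generator with given per-word probabilities, and `count_selected` is a hypothetical toy cost used in place of the combined loss and regularizer. The enumeration visits 2^l selections, which is why it is only practical for tiny l.

```python
import itertools
import random
from typing import Callable, Sequence, Tuple

Selection = Tuple[int, ...]


def prob(z: Selection, probs: Sequence[float]) -> float:
    """p(z | x) for an independent generator with given selection probabilities."""
    result = 1.0
    for tag, p in zip(z, probs):
        result *= p if tag == 1 else (1.0 - p)
    return result


def expected_cost_exact(probs: Sequence[float],
                        cost: Callable[[Selection], float]) -> float:
    """Sum p(z | x) * cost(z) over all 2^l binary selections."""
    return sum(prob(z, probs) * cost(z)
               for z in itertools.product((0, 1), repeat=len(probs)))


def expected_cost_sampled(probs: Sequence[float],
                          cost: Callable[[Selection], float],
                          num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the same expectation from sampled selections."""
    total = 0.0
    for _ in range(num_samples):
        z = tuple(1 if rng.random() < p else 0 for p in probs)
        total += cost(z)
    return total / num_samples


def count_selected(z: Selection) -> float:
    """Toy cost (assumption): the number of selected words."""
    return float(sum(z))


probs = [0.5, 0.25, 1.0]
exact = expected_cost_exact(probs, count_selected)
approx = expected_cost_sampled(probs, count_selected, 20000, random.Random(0))
```

For this toy cost the exact expectation equals the sum of the selection probabilities, which gives a simple check on both functions.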
Minimizing the expected cost is challenging since it involves summing over all the possible choices of rationales z. This summation could potentially be made feasible with additional restrictive assumptions about the generator and encoder. However, we assume only that it is possible to efficiently sample from the generator.
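Doubly stochastic gradient We now derive a sampled approximation to the gradient of the expected cost objective. This sampled approximation is obtained separately for each input text $x$ so as to work well with an overall stochastic gradient method. Consider therefore a training pair $(x, y)$. For the parameters of the generator $\theta_g$,

$$\frac{\partial \mathbb{E}_{z \sim \mathrm{gen}(x)}[\mathrm{cost}(z, x, y)]}{\partial \theta_g} = \sum_z \mathrm{cost}(z, x, y) \cdot \frac{\partial p(z|x)}{\partial \theta_g} = \sum_z \mathrm{cost}(z, x, y) \cdot p(z|x) \cdot \frac{\partial \log p(z|x)}{\partial \theta_g} = \mathbb{E}_{z \sim \mathrm{gen}(x)}\left[\mathrm{cost}(z, x, y) \frac{\partial \log p(z|x)}{\partial \theta_g}\right].$$

The last term is the expected gradient where the expectation is taken with respect to the generator distribution over rationales $z$. Therefore, we can simply sample a few rationales $z$ from the generator gen(x) and use the resulting average gradient in an overall stochastic gradient method. A sampled approximation to the gradient with respect to the encoder parameters $\theta_e$ can be derived similarly,

$$\frac{\partial \mathbb{E}_{z \sim \mathrm{gen}(x)}[\mathrm{cost}(z, x, y)]}{\partial \theta_e} = \sum_z p(z|x) \cdot \frac{\partial \mathrm{cost}(z, x, y)}{\partial \theta_e} = \mathbb{E}_{z \sim \mathrm{gen}(x)}\left[\frac{\partial \mathrm{cost}(z, x, y)}{\partial \theta_e}\right].$$

Choice of recurrent unit We employ recurrent convolution (RCNN), a refinement of local-ngram based convolution. RCNN attempts to learn n-gram features that are not necessarily consecutive, and averages features in a dynamic (recurrent) fashion. Specifically, for bigrams (filter width n = 2) RCNN computes the state update $h_t = f(x_t, h_{t-1})$. RCNN has been shown to work remarkably well in classification and retrieval applications (Lei et al., 2015; Lei et al., 2016) compared to other alternatives such as CNNs and LSTMs. We use it for all the recurrent units introduced in our model.

Table 1: Statistics of the beer review dataset.
Number of reviews                      1580k
Avg length of review                   144.9
Avg correlation between aspects        63.5%
Max correlation between two aspects    79.1%
Number of annotated reviews            994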
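The sampled gradient for the generator parameters, discussed above under "Doubly stochastic gradient", can be checked numerically on a toy problem. The sketch assumes an independent generator whose selection probabilities are sigmoids of free logits `theta` (one per word), for which the derivative of log p(z|x) with respect to each logit is z_t minus p_t. The `cost` function is a hypothetical toy cost; none of this is the recurrent parameterization used in the paper.

```python
import itertools
import math
import random
from typing import List, Sequence


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def cost(z: Sequence[int]) -> float:
    """Toy cost (assumption): number of selected words."""
    return float(sum(z))


def exact_gradient(theta: Sequence[float]) -> List[float]:
    """E_z[cost(z) * d log p(z)/d theta], computed by enumerating all z."""
    probs = [sigmoid(a) for a in theta]
    grad = [0.0] * len(theta)
    for z in itertools.product((0, 1), repeat=len(theta)):
        p_z = 1.0
        for tag, p in zip(z, probs):
            p_z *= p if tag == 1 else (1.0 - p)
        for t, (tag, p) in enumerate(zip(z, probs)):
            grad[t] += p_z * cost(z) * (tag - p)
    return grad


def sampled_gradient(theta: Sequence[float], num_samples: int,
                     rng: random.Random) -> List[float]:
    """Average of cost(z) * d log p(z)/d theta over sampled selections z."""
    probs = [sigmoid(a) for a in theta]
    grad = [0.0] * len(theta)
    for _ in range(num_samples):
        z = [1 if rng.random() < p else 0 for p in probs]
        c = cost(z)
        for t, (tag, p) in enumerate(zip(z, probs)):
            grad[t] += c * (tag - p) / num_samples
    return grad


theta = [0.0, 1.0, -1.0]
g_exact = exact_gradient(theta)
g_sampled = sampled_gradient(theta, 20000, random.Random(0))
```

For this toy cost the true gradient with respect to each logit is p_t(1 - p_t), so the enumerated value can be compared against a closed form, and the sampled estimate against the enumerated one.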

Experiments
We evaluate the proposed joint model on two NLP applications: (1) multi-aspect sentiment analysis on product reviews and (2) similar text retrieval on the AskUbuntu question answering forum.

Multi-aspect Sentiment Analysis
Dataset We use the BeerAdvocate review dataset used in prior work (McAuley et al., 2012). This dataset contains 1.5 million reviews written by the website users. The reviews are naturally multi-aspect: each of them contains multiple sentences describing the overall impression or one particular aspect of a beer, including appearance, smell (aroma), palate and taste. In addition to the written text, the reviewer provides the ratings (on a scale of 0 to 5 stars) for each aspect as well as an overall rating. The ratings can be fractional (e.g. 3.5 stars), so we normalize the scores to [0, 1] and use them as the (only) supervision for regression. McAuley et al. (2012) also provided sentence-level annotations on around 1,000 reviews. Each sentence is annotated with one (or multiple) aspect labels, indicating what aspect this sentence covers. We use this set as our test set to evaluate the precision of words in the extracted rationales. Table 1 shows several statistics of the beer review dataset. The sentiment correlation between any pair of aspects (and the overall score) is quite high, 63.5% on average with a maximum of 79.1% (between the taste and overall score). If the model is trained directly on this set, it can be confused by such strong correlation. We therefore perform a preprocessing step, picking "less correlated" examples from the dataset (footnote 4). This gives us a de-correlated subset for each aspect, each containing about 80k to 90k reviews. We use 10k as the development set. We focus on three aspects since the fourth aspect, taste, still gets > 50% correlation with the overall sentiment.

Sentiment Prediction
Before training the joint model, it is worth assessing the neural encoder separately to check how accurately the neural network predicts the sentiment. To this end, we compare neural encoders with a bigram SVM model, training medium and large SVM models using 260k and all 1580k reviews respectively. As shown in Table 3, the recurrent neural network models outperform the SVM model for sentiment prediction and also require less training data to achieve the performance. The LSTM and RCNN units obtain similar test error, getting 0.0094 and 0.0087 mean squared error respectively. The RCNN unit performs slightly better and uses fewer parameters. Based on the results, we choose the RCNN encoder network with 2 stacking layers and 200 hidden states.

Figure caption fragment: ... rationales (x-axis). 220k training data is used.

Footnote 4: Specifically, for each aspect we train a simple linear regression model to predict the rating of this aspect given the ratings of the other four aspects. We then keep picking reviews with largest prediction error until the sentiment correlation in the selected subset increases dramatically.
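The de-correlation preprocessing described above (fit a linear regression that predicts one aspect's rating from the other ratings, then keep the reviews with the largest prediction error) can be sketched as follows. This is a deliberately simplified illustration under two assumptions: it regresses on a single other rating instead of four, and it stops after a fixed number `k` of reviews instead of monitoring the correlation of the selected subset. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    return slope, mean_y - slope * mean_x


def pick_less_correlated(other: Sequence[float], target: Sequence[float],
                         k: int) -> List[int]:
    """Indices of the k reviews whose target rating is worst predicted
    from the other rating, i.e. the least correlated examples."""
    slope, intercept = fit_line(other, target)
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(other, target)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:k]


# Normalized ratings in [0, 1]; the last review disagrees with the trend.
overall = [0.2, 0.4, 0.6, 0.8, 0.5]
smell = [0.2, 0.4, 0.6, 0.8, 0.9]
picked = pick_less_correlated(overall, smell, k=1)
```

In the toy data only the last review has a smell rating that departs from its overall rating, so it is the one selected.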
To train the joint model, we also use the RCNN unit with 200 states as the forward and backward recurrent unit for the generator gen(). The dependent generator has one additional recurrent layer. For this layer we use 30 states so the dependent version still has a number of parameters comparable to the independent version. The two versions of the generator have 358k and 323k parameters respectively. Figure 2 shows the performance of our joint dependent model when trained to predict the sentiment of all aspects. We vary the regularization $\lambda_1$ and $\lambda_2$ to show various runs that extract different amounts of text as rationales. Our joint model gets performance close to the best encoder run (with full text) when few words are extracted.

Figure 3 (example reviews):
a beer that is not sold in my neck of the woods , but managed to get while on a roadtrip . poured into an imperial pint glass with a generous head that sustained life throughout . nothing out of the ordinary here , but a good brew still . body was kind of heavy , but not thick . the hop smell was excellent and enticing . very drinkable very dark beer . pours a nice finger and a half of creamy foam and stays throughout the beer . smells of coffee and roasted malt . has a major coffee-like taste with hints of chocolate . if you like black coffee , you will love this porter . creamy smooth mouthfeel and definitely gets smoother on the palate once it warms . it 's an ok porter but i feel there are much better one 's out there .
poured into a snifter . produces a small coffee head that reduces quickly . black as night . pretty typical imp . roasted malts hit on the nose . a little sweet chocolate follows . big toasty character on the taste . in between i 'm getting plenty of dark chocolate and some bitter espresso . it finishes with hop bitterness . nice smooth mouthfeel with perfect carbonation for the style . overall a nice stout i would love to have again , maybe with some age on it .
i really did not like this . it just seemed extremely watery . i dont ' think this had any carbonation whatsoever . maybe it was flat , who knows ? but even if i got a bad brew i do n't see how this would possibly be something i 'd get time and time again . i could taste the hops towards the middle , but the beer got pretty nasty towards the bottom . i would never drink this again , unless it was free . i 'm kind of upset i bought this . a : poured a nice dark brown with a tan colored head about half an inch thick , nice red/garnet accents when held to the light . little clumps of lacing all around the glass , not too shabby . not terribly impressive though s : smells like a more guinness-y guinness really , there are some roasted malts there , signature guinness smells , less burnt though , a little bit of chocolate … … m : relatively thick , it is n't an export stout or imperial stout , but still is pretty hefty in the mouth , very smooth , not much carbonation . not too shabby d : not quite as drinkable as the draught , but still not too bad . i could easily see drinking a few of these .

Rationale Selection To evaluate the supporting rationales for each aspect, we train the joint encoder-generator model on each de-correlated subset. We set the cardinality regularization $\lambda_1$ to values in {2e-4, 3e-4, 4e-4} so the extracted rationale texts are neither too long nor too short. For simplicity, we set $\lambda_2 = 2\lambda_1$ to encourage local coherency of the extraction.
For comparison we use the bigram SVM model and implement an attention-based neural network model. The SVM model successively extracts the unigrams or bigrams (from the test reviews) with the highest feature weights. The attention-based model learns a normalized attention vector over the input tokens (similarly using the forward and backward RNNs), then averages the encoder states according to the attention and feeds the averaged vector to the output layer. Similar to the SVM model, the attention-based model can select words based on their attention weights. The smell (aroma) aspect is the target aspect.

Table 2 presents the precision of the extracted rationales calculated based on sentence-level aspect annotations. The $\lambda_1$ regularization hyper-parameter is tuned so the two versions of our model extract a similar number of words as rationales. The SVM and attention-based model are constrained similarly for comparison. Figure 4 further shows the precision when different amounts of text are extracted. Again, for our model this corresponds to changing the $\lambda_1$ regularization. As shown in the table and the figure, our encoder-generator networks extract text pieces describing the target aspect with high precision, ranging from 80% to 96% across the three aspects appearance, smell and palate. The SVM baseline performs poorly, achieving around 30% accuracy. The attention-based model achieves reasonable but worse performance than the rationale generator, suggesting the potential of directly modeling rationales as explicit extraction.

Figure 5 shows the learning curves of our model for the smell aspect. In the early training epochs, both the independent and (recurrent) dependent selection models fail to produce good rationales, getting low precision as a result. After a few epochs of exploration, however, the models start to achieve high accuracy. We observe that the dependent version learns more quickly in general, but both versions obtain close results in the end.
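One way to read the precision measure used above is as the fraction of selected words that fall inside sentences annotated with the target aspect. The sketch below implements that reading; the data layout (a sentence index per token and a set of aspect labels per sentence) and the function name are assumptions made for illustration, not the evaluation script behind the reported numbers.

```python
from typing import Sequence, Set


def rationale_precision(z: Sequence[int], sentence_ids: Sequence[int],
                        sentence_aspects: Sequence[Set[str]],
                        target: str) -> float:
    """Fraction of selected tokens lying in sentences labeled with `target`.

    z[t] is the binary selection for token t, sentence_ids[t] is the index
    of the sentence containing token t, and sentence_aspects[s] is the set
    of aspect labels annotated for sentence s.
    """
    selected = [t for t, tag in enumerate(z) if tag == 1]
    if not selected:
        return 0.0  # convention chosen here for an empty rationale
    correct = sum(1 for t in selected
                  if target in sentence_aspects[sentence_ids[t]])
    return correct / len(selected)


# Four tokens in two sentences: the first is about look, the second about smell.
z = [1, 1, 0, 1]
sentence_ids = [0, 0, 1, 1]
sentence_aspects = [{"look"}, {"smell"}]
precision_look = rationale_precision(z, sentence_ids, sentence_aspects, "look")
```

Because the annotations are at the sentence level while selections are at the word level, a selected word counts as correct whenever its sentence carries the target label, even if the word itself is uninformative.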
Finally we conduct a qualitative case study on the extracted rationales. Figure 3 presents several reviews, with highlighted rationales predicted by the model. Our rationale generator identifies key phrases or adjectives that indicate the sentiment of a particular aspect.

Similar Text Retrieval on QA Forum
Dataset For our second application, we use the real-world AskUbuntu dataset (footnote 5) used in recent work (dos Santos et al., 2015; Lei et al., 2016). This set contains 167k unique questions (each consisting of a question title and a body) and 16k user-identified similar question pairs. Following previous work, this data is used to train the neural encoder that learns the vector representation of the input question, optimizing the cosine similarity between similar questions against random non-similar ones. We use the "one-versus-all" hinge loss (i.e. positive versus other negatives) for the encoder, similar to Lei et al. (2016). During development and testing, the model is used to score 20 candidate questions given each query question, and a total of 400×20 query-candidate question pairs are annotated for evaluation (footnote 6).
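A hinge loss of the kind mentioned above can be sketched with plain cosine similarity between question vectors. The specific form used here, a margin between the positive candidate and the highest-scoring negative, is one common reading of "positive versus other negatives" and is an assumption, as are the `margin` value and the function names.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def one_vs_all_hinge(query: Sequence[float], positive: Sequence[float],
                     negatives: Sequence[Sequence[float]],
                     margin: float = 0.5) -> float:
    """Hinge loss: the positive must beat the best negative by `margin`."""
    pos_score = cosine(query, positive)
    best_negative = max(cosine(query, n) for n in negatives)
    return max(0.0, margin - pos_score + best_negative)


query = [1.0, 0.0]
loss_good = one_vs_all_hinge(query, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_bad = one_vs_all_hinge(query, [0.0, 1.0], [[1.0, 0.0]])
```

The loss is zero when the similar question is already ranked above every negative by at least the margin, and grows as a negative overtakes it.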
Task/Evaluation Setup The question descriptions are often long and fraught with irrelevant details. In this set-up, a fraction of the original question text should be sufficient to represent its content, and be used for retrieving similar questions. Therefore, we will evaluate rationales based on the accuracy of the question retrieval task, assuming that better rationales achieve higher performance. To put this performance in context, we also report the accuracy when the full body of a question is used, as well as titles alone. The latter constitutes an upper bound on the model performance as in this dataset titles provide short, informative summaries of the question content. We evaluate the rationales using the mean average precision (MAP) of retrieval.

Footnote 5: askubuntu.com
Footnote 6: https://github.com/taolei87/askubuntu
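Mean average precision, the retrieval metric named above, can be computed from ranked relevance labels as in the following sketch. It uses the standard definition (precision at each rank holding a relevant item, averaged over relevant items, then averaged over queries); the convention of scoring a query with no relevant candidate as 0 is an assumption.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[int]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is 1 if the candidate at rank i + 1 is relevant
    (a similar question) and 0 otherwise.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


ap_first = average_precision([1, 0, 1, 0])
ap_second = average_precision([0, 1])
map_score = mean_average_precision([[1, 0, 1, 0], [0, 1]])
```

In the evaluation setup described in the text, each query would contribute one ranked list of 20 candidates, ordered by the similarity score computed from the rationale, the title, or the full body.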
Results Table 4 presents the results of our rationale model. We explore a range of hyper-parameter values. We include two runs for each version. The first one achieves the highest MAP on the development set. The second run is selected to compare the models when they use roughly 10% of question text (7 words on average). We also show the results of different runs in Figure 6. The rationales achieve a MAP of up to 56.5%, getting close to using the titles. The models also outperform the baseline of using the noisy question bodies, indicating the models' capacity to extract short but important fragments. Figure 7 shows the rationales for several questions in the AskUbuntu domain, using the recurrent version with around 10% extraction. Interestingly, the model does not always select words from the question title. The reason is that the question body can contain the same or even complementary information useful for retrieval. Indeed, some rationale fragments shown in the figure are error messages, which are typically not in the titles but very useful to identify similar questions.

Figure 7 (example questions):
i accidentally removed the ubuntu software centre , when i was actually trying to remove my ubuntu one applications . although i do n't remember directly uninstalling the centre , i think deleting one of those packages might have triggered it . i can not look at history of application changes , as the software centre is missing . please advise on how to install , or rather reinstall , ubuntu software centre on my computer . how do i install ubuntu software centre application ?
i know this will be an odd question , but i was wondering if anyone knew how to install the ubuntu installer package in an ubuntu installation . to clarify , when you boot up to an ubuntu livecd , it 's got the installer program available so that you can install ubuntu to a drive . naturally , this program is not present in the installed ubuntu . is there , though , a way to download and install it like other packages ? invariably , someone will ask what i 'm trying to do , and the answer … install installer package on an installed system ? what is the easiest way to install all the media codec available for ubuntu ? i am having issues with multiple applications prompting me to install codecs before they can play my files . how do i install media codecs ? what should i do when i see <unk> report this <unk> ? an unresolvable problem occurred while initializing the package information . please report this bug against the 'update-manager ' package and include the following error message : e : encountered a section with no package : header e : problem with mergelist <unk> e : the package lists or status file could not be parsed or opened . please any one give the solution for this whenever i try to convert the rpm file to deb file i always get this problem error : <unk> : not an rpm package ( or package manifest ) error executing `` lang=c rpm -qp --queryformat % { name } <unk> ' '' : at <unk> line 489 thanks converting rpm file to debian fle how do i mount a hibernated partition with windows 8 in ubuntu ? i ca n't mount my other partition with windows 8 , i have ubuntu 12.10 amd64 : error mounting /dev/sda1 at <unk> : command-line `mount -t `` ntfs ' ' -o `` uhelper=udisks2 , nodev , nosuid , uid=1000 , gid=1000 , dmask=0077 , fmask=0177 '' `` /dev/sda1 '' `` <unk> '' ' exited with non-zero exit status 14 : windows is hibernated , refused to mount . failed to mount '/dev/sda1 ' : operation not permitted the ntfs partition is hibernated . please resume and shutdown windows properly , or mount the volume read-only with the 'ro ' mount option

Discussion
We proposed a novel modular neural framework to automatically generate concise yet sufficient text fragments to justify predictions made by neural networks. We demonstrated that our encoder-generator framework, trained in an end-to-end manner, gives rise to quality rationales in the absence of any explicit rationale annotations. The approach could be modified or extended in various ways to other applications or types of data.
Choices of enc(·) and gen(·). The encoder and generator can be realized in numerous ways without changing the broader algorithm. For instance, we could use a convolutional network (Kim, 2014; Kalchbrenner et al., 2014), deep averaging network (Iyyer et al., 2015; Joulin et al., 2016) or a boosting classifier as the encoder. When rationales can be expected to conform to repeated stereotypical patterns in the text, a simpler encoder consistent with this bias can work better. We emphasize that, in this paper, rationales are flexible explanations that may vary substantially from one instance to another. On the generator side, many additional constraints could be imposed to further guide acceptable rationales.
Dealing with Search Space. Our training method employs a REINFORCE-style algorithm (Williams, 1992) where the gradient with respect to the parameters is estimated by sampling possible rationales.
Additional constraints on the generator output can be helpful in alleviating problems of exploring a potentially large space of possible rationales in terms of their interaction with the encoder. We could also apply variance reduction techniques to increase stability of stochastic training (cf. Weaver and Tao, 2001; Mnih et al., 2014).