Word-Level Uncertainty Estimation for Black-Box Text Classifiers using RNNs

Estimating the uncertainty of Neural Network predictions paves the way towards more reliable and trustworthy text classification. However, common uncertainty estimation approaches remain black boxes: they do not explain which features have led to the uncertainty of a prediction. This hinders users from understanding the cause of unreliable model behaviour. We introduce an approach to decompose and visualize the uncertainty of text classifiers at the level of words. Our approach builds on Recurrent Neural Networks and Bayesian modelling to provide detailed explanations of uncertainties, enabling deeper reasoning about unreliable model behaviour. We conduct a preliminary experiment to check the impact and correctness of our approach. By explaining and investigating the predictive uncertainties of a sentiment analysis task, we argue that our approach is able to provide a more profound understanding of artificial decision making.


Introduction
Neural Networks or variations of them achieve state-of-the-art accuracy across a wide range of text classification tasks like sentiment analysis (Nakov et al., 2016) or spam detection (Wu et al., 2017). Neural Networks are, however, not interpretable, since they provide no information about why particular decisions were made (Baehrens et al., 2010). This lack of transparency makes it hard for users to assess the certainty and trustworthiness of predictions. In the worst case, an unreliable prediction is considered correct even if it is not. It is thus important to know what a model does not know. This would allow treating particularly error-prone or unreliable predictions with additional care, enabling a better understanding of why wrong predictions occur.
Several techniques to assess the uncertainty of individual predictions have been successfully applied to Neural Networks (Gal and Ghahramani, 2016; Lakshminarayanan et al., 2017; Kendall and Gal, 2017). However, a global uncertainty estimate for the whole input text does not describe which features lead to an uncertain prediction. It is thus unclear why a Neural Network classifier is, e.g., uncertain whether a text expresses a positive sentiment or whether it is spam.
As a first step towards a better understanding of Neural Network-based text classification, we suggest decomposing uncertainties at the level of words. We present a novel uncertainty modelling approach that estimates word-level uncertainties in any text classification task. Our approach applies Bayesian modelling to a sequence attribution technique for Recurrent Neural Networks (RNNs). We implement the approach using TensorFlow and Keras and demonstrate its effectiveness by investigating word-level uncertainties in a sentiment analysis task.

Word-Level Uncertainty Estimation
We first introduce how we model uncertainty in a classification task and then describe how we decompose the prediction uncertainties on the level of words.

Modelling Predictive Uncertainty in Classification Tasks
To measure the uncertainty of Neural Network predictions, we use the uncertainty modelling technique called Monte Carlo Dropout. According to Gal and Ghahramani (2016), the regularization method dropout (Srivastava et al., 2014) can be interpreted as a Bayesian approximation of a Gaussian process (Rasmussen, 2003). By enabling dropout at inference time, each forward pass uses a random sample of weights, resulting in a probabilistic model. We obtain a sample of an approximated posterior distribution by feeding the same input $e$ multiple times to the model. We approximate the mean predictive posterior probability, denoted as $p(y = c \mid e, D_{train})$, by averaging the posterior probabilities of multiple forward passes. That is,
$$p(y = c \mid e, D_{train}) \approx \frac{1}{F} \sum_{f=1}^{F} p(y = c \mid e, \omega_f),$$
where $F$ is the number of forward passes, $\omega_f$ the parameters of the $f$th sample, and $D_{train}$ the data used for training.
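The averaging step can be illustrated with a minimal, dependency-free sketch. It is not the TensorFlow/Keras implementation used in this work: the toy linear classifier, the names `stochastic_forward` and `mc_dropout_mean`, and the weights and features are all hypothetical, and dropout is applied to the input features only. The point is solely that dropout stays active at inference time and that the F sampled probability vectors are averaged.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def stochastic_forward(features, weights, p_drop, rng):
    """One forward pass of a toy linear classifier with dropout kept active."""
    keep = 1.0 - p_drop
    # Inverted dropout: zero each feature with probability p_drop, rescale the rest.
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    scores = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(scores)


def mc_dropout_mean(features, weights, p_drop=0.5, passes=100, seed=0):
    """Return the F sampled probability vectors and their class-wise average."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, p_drop, rng)
               for _ in range(passes)]
    n_classes = len(weights)
    mean = [sum(p[c] for p in samples) / passes for c in range(n_classes)]
    return samples, mean


# Hypothetical toy input: 4 features, 2 classes (values chosen for illustration).
toy_features = [0.8, -0.3, 1.2, 0.5]
toy_weights = [[1.0, -0.5, 0.3, 0.2],
               [-0.7, 0.4, 0.1, 0.6]]
samples, mean_posterior = mc_dropout_mean(toy_features, toy_weights)
```

In a Keras model the same effect is commonly obtained by calling the model with `training=True` at prediction time, so that the dropout layers keep sampling masks.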
A measure of the predictive uncertainty regarding an input $e$ is derived by analysing the statistical dispersion of the output distribution. Kwon et al. (2020) propose a natural way of estimating uncertainties in Neural Networks. Their approach relies on a variation of the law of total variance:
$$\mathrm{Var}(y \mid e, D_{train}) \approx \underbrace{\frac{1}{F} \sum_{f=1}^{F} \left( \mathrm{diag}(p_f) - p_f p_f^{\top} \right)}_{\text{aleatory}} + \underbrace{\frac{1}{F} \sum_{f=1}^{F} (p_f - \bar{p})(p_f - \bar{p})^{\top}}_{\text{epistemic}} \tag{1}$$
where $\bar{p} = \frac{1}{F} \sum_{f=1}^{F} p_f$, $p_f$ is a vector of posterior probabilities $p(y = c \mid e, \omega_f)$ for each class $c \in C$ of the $f$th forward pass, and $\mathrm{diag}(p_f)$ is a diagonal matrix with the elements of the vector $p_f$. Equation 1 allows us to decompose the uncertainty into its aleatory and epistemic components (Der Kiureghian and Ditlevsen, 2009). Aleatory uncertainty captures irreducible noise and randomness inherent in the observation. Epistemic uncertainty has its source in inadequate or missing knowledge and can be reduced by additional learning.
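As a hedged illustration of the variance decomposition by Kwon et al. (2020), the sketch below computes the aleatory and epistemic matrices from a list of sampled probability vectors. The function names and the example samples are hypothetical, and summarizing each matrix by its trace is an assumption made here for illustration; the text does not state which scalar summary is used.

```python
def outer(u, v):
    """Outer product of two vectors given as lists."""
    return [[a * b for b in v] for a in u]


def decompose_uncertainty(samples):
    """Split predictive variance into aleatory and epistemic C x C matrices.

    samples: list of F probability vectors p_f (one per stochastic forward pass).
    Returns (aleatory, epistemic, mean), where mean is the averaged vector.
    """
    n_passes = len(samples)
    n_classes = len(samples[0])
    mean = [sum(p[c] for p in samples) / n_passes for c in range(n_classes)]
    aleatory = [[0.0] * n_classes for _ in range(n_classes)]
    epistemic = [[0.0] * n_classes for _ in range(n_classes)]
    for p in samples:
        pp = outer(p, p)
        diff = [p[c] - mean[c] for c in range(n_classes)]
        dd = outer(diff, diff)
        for i in range(n_classes):
            for j in range(n_classes):
                diag_ij = p[i] if i == j else 0.0
                aleatory[i][j] += (diag_ij - pp[i][j]) / n_passes  # diag(p_f) - p_f p_f^T
                epistemic[i][j] += dd[i][j] / n_passes             # (p_f - mean)(p_f - mean)^T
    return aleatory, epistemic, mean


def trace(matrix):
    """Assumed scalar summary of an uncertainty matrix (sum of the diagonal)."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical probability vectors from three forward passes of a binary classifier.
example_samples = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
aleatory, epistemic, mean = decompose_uncertainty(example_samples)
u_a, u_e = trace(aleatory), trace(epistemic)
u_t = u_a + u_e
```

A useful sanity check on this decomposition is that the two matrices add up to diag(mean) minus the outer product of mean with itself, and that the epistemic part vanishes when all forward passes agree.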

Attribution of Recurrent Neural Network Inputs
We build our approach to decompose prediction uncertainties at the level of words on top of Long Short-Term Memory (LSTM) models, a Recurrent Neural Network variation initially described by Hochreiter and Schmidhuber (1997). An LSTM consists of a sequence of repeated cells, each computing a hidden state $h_t$. For each index $t$ of the input sequence $e$, the output of the corresponding cell is controlled by a set of gates as a function of the cell input $x_t$ and the previous hidden state $h_{t-1}$. To use an LSTM for text classification, we add a discriminative layer after the last activation vector $h_T$ to obtain class activation scores $S^{\omega}_c(e) = W_c h_T$ using a weight matrix $W$. Since the $i$th hidden state vector $h_i$ is updated by prior hidden states, the previous elements of the input sequence $(\epsilon_1, ..., \epsilon_{i-1})$ are already taken into account in the evaluation of $h_i$. Thus, $S^{\omega}_c(e_i) = W_c h_i$ describes the accumulated class activation of the first $i$ word embeddings $e_i = (\epsilon_1, ..., \epsilon_i)$ of an input $e$. In order to assess the contribution of a single word to the final prediction, we decompose the final class activation score into the sum of the individual word contributions (with $S^{\omega}_c(e_0) = 0$ for the empty prefix):
$$S^{\omega}_c(e) = \sum_{i=1}^{E} \left( S^{\omega}_c(e_i) - S^{\omega}_c(e_{i-1}) \right) \tag{2}$$
Computing the mean posterior probability $p(y = c \mid e_i, D_{train})$ for each index $1 \leq i \leq E$ by applying Monte Carlo Dropout allows us to assess the development of uncertainties along the input sequence. Analogous to Equation 2, we measure the word-level aleatory uncertainty $U_a$, epistemic uncertainty $U_e$, or total uncertainty $U_t$ as the change of uncertainty contributed by a single word:
$$U(\epsilon_i) = U(e_i) - U(e_{i-1}), \quad U \in \{U_a, U_e, U_t\}$$
Additionally, we derive the relevance $R_c$ of each word regarding its contribution to the final class activation score $S_c(e)$, i.e., the class activation score averaged over the $F$ forward passes. The relevance of a word is calculated as the class activation contribution of the word compared to its prior sequence:
$$R_c(\epsilon_i) = S_c(e_i) - S_c(e_{i-1})$$
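The prefix-difference idea described above can be sketched in a few lines. The sketch assumes that the class activation score and the uncertainty have already been computed for every prefix of the input (for example by running Monte Carlo Dropout on each prefix); the words and all numeric values below are hypothetical, and treating the value of the empty prefix as zero is an assumption made for illustration.

```python
def word_contributions(prefix_values, empty_value=0.0):
    """Turn per-prefix values into per-word contributions.

    prefix_values[i] is the value (score or uncertainty) of the prefix that
    ends at word i. Each word's contribution is the difference to the value
    of the preceding prefix; empty_value is the assumed value of the empty
    prefix.
    """
    contributions = []
    previous = empty_value
    for value in prefix_values:
        contributions.append(value - previous)
        previous = value
    return contributions


# Hypothetical example: all numbers are made up for illustration only.
words = ["the", "film", "was", "excellent", "but", "unfortunately", "long"]
prefix_scores = [0.1, 0.2, 0.2, 1.5, 1.3, 0.4, 0.3]               # accumulated class activation
prefix_uncertainty = [0.24, 0.23, 0.23, 0.08, 0.10, 0.22, 0.24]   # total uncertainty per prefix

relevance = word_contributions(prefix_scores)            # word relevance
word_uncertainty = word_contributions(prefix_uncertainty)  # word-level uncertainty

for word, r, u in zip(words, relevance, word_uncertainty):
    print(f"{word:>14}  relevance={r:+.2f}  uncertainty={u:+.2f}")
```

Because the differences telescope, the relevances sum to the score of the full sequence. A negative word-level uncertainty means the word made the model more certain, a positive value means it added uncertainty.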

Experiments
We conducted a preliminary evaluation of the advantage and correctness of our approach by applying it to a common sentiment analysis task. We use the IMDB dataset (Maas et al., 2011), which consists of polarized film reviews. In our experiments, we use an LSTM with an additional dropout layer after the embedding layer with $p_{drop} = 0.5$. Further, we use the LSTM configuration from the official TensorFlow example (footnote 1) with pre-trained word2vec embeddings. Our implementation and experimental results are publicly available online (footnote 2).

Decomposition of Classifier Outputs
First, we study the information gained by decomposing Neural Network predictions and their uncertainties. We append a clearly positive review with 239 words to a clearly negative review with 140 words. For the newly created review, Figure 1a shows the path of the mean posterior across the word index $i$ of the evaluated input sequence $e_i$. Figure 1b plots the corresponding total uncertainty as well as its aleatory and epistemic components. At the beginning of the second review, the mean posterior probability drops and the prediction starts to become highly uncertain. Furthermore, Figure 1b shows that the uncertainty starts to increase when the sentiment shifts. Thus, our approach seems to correctly infer sentiment changes in the input sequence. Overall, this example indicates that the decomposition of Neural Network outputs can provide valuable information to support the understanding of Neural Network decisions.

Word-Level Uncertainties
To check the contribution of single words to the model's output, we analyse the connection between the average word relevance and the uncertainty contributed by each word, as shown in Figure 2. The x-axis denotes the word relevance $R_c$ and the y-axis refers to the aleatory and epistemic uncertainty. The plot reveals that relevant words are more likely to increase or decrease the uncertainty of the model. Furthermore, uncommon words are likely to contribute to the uncertainty, whereas the most frequently used words reduce uncertainty. Comparing the figures shows a similar behaviour of the aleatory and epistemic uncertainty. It is worth noting that the model is overall less affected by epistemic uncertainty than by aleatory uncertainty.
In Figure 3, we visualize the word-level relevance and uncertainties obtained by our approach as a heatmap. Figure 3a shows the word relevance, where negative sentiment words are highlighted in red and positive sentiment words are highlighted in green. The example is classified as positive. Figure 3b shows the total uncertainty contributed by each word. Words which reduce the uncertainty are marked in blue and words which add uncertainty in orange. Further, we vary the opacity of the font to indicate the sequence uncertainty $U_t(e_i)$ at each index $i$. A low opacity indicates a high sequence uncertainty and vice versa. In the given example, the model is uncertain until it observes a relevant term. The term 'excellent' reduces the uncertainty. Thereby, the model becomes more confident that the example belongs to the positive sentiment class. When the contradicting term 'unfortunately' is observed in the sequence, the class probability drops, resulting in an increased uncertainty. Finally, the model remains highly uncertain about the overall prediction in this example.
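A heatmap of this kind could be rendered, for instance, as inline-styled HTML. The following sketch is only one possible realization and not the code that produced Figure 3: the function name, the RGB values, the opacity range, and the example numbers are all assumptions. It colours words orange when they add uncertainty and blue when they reduce it, and lowers the opacity of a word the higher the sequence uncertainty is at its index.

```python
import html


def render_uncertainty_heatmap(words, word_uncertainty, sequence_uncertainty):
    """Return an HTML string that colours each word by its uncertainty contribution.

    word_uncertainty[i]     : change in uncertainty contributed by word i
    sequence_uncertainty[i] : uncertainty of the prefix ending at word i
    """
    # Fall back to 1.0 to avoid division by zero when all values are zero.
    max_seq = max(sequence_uncertainty) or 1.0
    max_abs = max(abs(u) for u in word_uncertainty) or 1.0
    spans = []
    for word, delta, seq_u in zip(words, word_uncertainty, sequence_uncertainty):
        # Orange marks added uncertainty, blue marks reduced uncertainty.
        rgb = "255,140,0" if delta > 0 else "30,100,255"
        strength = abs(delta) / max_abs            # background intensity in [0, 1]
        opacity = 1.0 - 0.8 * (seq_u / max_seq)    # low opacity = high sequence uncertainty
        spans.append(
            f'<span style="background: rgba({rgb},{strength:.2f}); '
            f'opacity: {opacity:.2f}">{html.escape(word)}</span>'
        )
    return " ".join(spans)


# Hypothetical values for two words, for illustration only.
heatmap_html = render_uncertainty_heatmap(
    ["excellent", "unfortunately"],
    [-0.15, 0.12],
    [0.08, 0.22],
)
```

Keeping a minimum opacity of 0.2 is a design choice in this sketch so that highly uncertain passages remain readable.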

Related Work
Uncertainty Estimation in Neural Networks. Prior work aimed at estimating predictive uncertainties in Neural Networks (Xiao and Wang, 2019; Kendall and Gal, 2017; Kwon et al., 2020; Gal and Ghahramani, 2016) and applied these techniques to text classification tasks (Xiao and Wang, 2019; Burkhardt et al., 2018; Siddhant and Lipton, 2018). However, these techniques are generally used to assess only input-level rather than word-level uncertainties. Li et al. (2017) also investigate the detection of uncertain words using Neural Networks. However, they learn the words' uncertainty from a labelled dataset, while our approach can be considered unsupervised, as it does not require additional labelling.
Attribution of Neural Networks. The field of explainable AI (Adadi and Berrada, 2018) seeks to overcome the black-box processing of Neural Networks by providing human-interpretable information to reason about artificial decision-making. Techniques such as gradient-based sensitivity analysis (Simonyan et al., 2013) or layer-wise relevance propagation (Bach et al., 2015) allow us to infer word relevance, which can also be achieved by our approach. Techniques to explain word relevance have previously been applied to text classifiers (Li et al., 2016; Arras et al., 2017). However, these techniques do not assess the uncertainty of a prediction. Du et al. (2019) follow a different approach to decompose RNN outputs to assess feature relevance. Our approach might be further improved by their findings.

Conclusion and Further Work
This paper proposes a simple, novel approach to estimate word-level uncertainties in text classification tasks. Our approach uses Monte Carlo Dropout in conjunction with a sequence modelling technique to decompose uncertainties. It does not require additional labelling effort besides the original training data for the classifier. We show by example that the transparency gained by applying our approach enables a deeper understanding of artificial decision-making. The visualization in Figure 3 can, e.g., help a human moderator to understand the classifier's uncertainty. We plan several empirical studies to examine the impact and benefits of word-level uncertainty awareness in Human-in-the-Loop applications (Zanzotto, 2019). Further, we plan to adapt our approach to additional RNN variations like bidirectional LSTMs and gated recurrent units (GRUs) and to compare the results.