Multi-Task Learning Framework for Mining Crowd Intelligence towards Clinical Treatment

In the recent past, social media has emerged as an active platform in the context of healthcare and medicine. In this paper, we present a study in which medical users' opinions on health-related issues are analyzed to capture the medical sentiment at the blog level. Medical sentiment can be studied along various facets, such as medical condition, treatment, and medication, that characterize the overall health status of the user. Considering these facets, we treat the analysis of this information as a multi-task classification problem. We adopt a novel adversarial learning approach for our multi-task learning framework to learn the strength of the sentiment expressed in a medical blog. Our evaluation shows promising results for our target tasks.


Introduction
"Can someone help me please????". These types of queries have swamped the web with the phenomenal rise in social media contents almost in every domain including health care. Generally, the users posts seeking for health-related information, sharing medical experiences and opinions of other users (i.e., patients, health professional or doctors). With the enormous amount of posts increasing day after day, it is difficult for the health professionals to read and answer every post. It would be helpful to have a sentiment analyzer that could study the user's sentiment associated with the post related to his/her health-status. In this paper, we make an attempt to capture medical sentiment (MS) by analyzing the subjectivity expressions describing a patient's medical conditions at the blog level. MS can be studied as an event that characterizes the patient's medical condition, in which the patient expresses stance 1 The reader is encouraged to contact the authors regarding the availability of data and source code towards clinical and social situations. The notion of sentiment in medical context unlike traditional sentiment analysis (SA) is more granular which can be studied after considering various aspects (Denecke and Deng, 2015) that can directly impact the users' health conditions, such as: (1) Changes in the medical condition (e.g., Sentiment can be observed as a change in a patient's medical condition which can improve or worsen over a time.) (2) Severity of the medical condition that impacts patient life (e.g., severe headache impacts the patient's life more than a mild headache.) (3) Outcome of a treatment (e.g., there may be positive or negative impacts in a patient's treatment.) In the current study, the problem of medical sentiment identification is addressed by exploiting two important associated aspects as shown in Figure-1 and leveraging their synergies in a deep multi-task learning framework. 
In recent years, neural network models have gained popularity for solving problems in several domains (Misra et al., 2016; Luong et al., 2015), as they offer an efficient way of amalgamating information from several tasks. Multi-task learning of this kind provides advantages in (1) minimizing the number of parameters and (2) reducing the risk of over-fitting. The aim of multi-task learning (MTL) is to enhance system performance by integrating other, similar tasks. The primary factor in MTL is the sharing scheme in the latent feature space. Most existing methods for multi-task classification attempt to divide the features of different tasks into task-specific and task-invariant feature spaces, considering only the parameters of some components that could be shared. The major drawback of this mechanism is that the common feature space often incorporates some redundant task-variant features, while certain common features may also lie in the task-specific feature space, leading to feature redundancy. Adversarial learning (Goodfellow et al., 2014) is the process of learning a model to correctly classify both unmodified data and adversarial data through a regularization method. It can be used to combat this issue by enforcing a shared representation across tasks that keeps the task-specific and task-invariant feature spaces inherently disjoint. This helps in eliminating redundant features from the feature space.
Motivated by the success of adversarial learning in several classification tasks (Miyato et al., 2016; Ge et al., 2017), we adopt an adversarial multi-task learning framework to capture the MS in various medical aspects.
Contributions: (i) a description of the medical sentiment classification task, which mines medical blogs using users' sentiments towards medical condition and medication, and (ii) a method for analyzing medical sentiment over various aspects by exploiting a multi-task adversarial training framework, which enables multiple aspects of MS tasks to be trained jointly.

Related Works
In the recent past, there has been significant growth in studies that analyze the sentiment of users in the healthcare/medical domain. The study conducted by Denecke and Deng (2015) provides a quantitative assessment of sentiment across clinical narratives and social media sources. To this end, they created a domain-specific corpus from the MIMIC II database containing clinical documents (nurse letters, radiology reports, and discharge summaries). They also studied users' self-reported drug reviews on blogs (WebMD, DrugRating) to assess the possible medical sentiments. The majority of current research in medical sentiment analysis focuses on understanding mental health disorders, mainly depression. Several shared tasks (Losada et al., 2017; Hollingshead et al., 2017) have also been organized to study patients' health-related opinions on social media. The challenge defined in Milne et al. (2016) aims to automatically classify user posts from an online mental health forum into four categories (crisis/red/amber/green) according to how urgently the post needs attention. Shickel et al. (2016) introduced the notion of applying sentiment analysis to the mental health domain by defining a new polarity classification scheme. They split the traditional 'neutral' class into a dual-polarity sentiment class (both positive and negative) and a 'neither positive nor negative' sentiment class. Other prominent works on opinion mining in medical settings include the studies by Bobicev et al. (2012), Sokolova and Bobicev (2011), and Ali et al. (2013). Pestian et al. (2012) analyzed the emotions and sentiment of suicide notes. Another study in medical sentiment analysis is the work of Bobicev et al. (2014), who analyzed sequences of sentiments (encouragement, gratitude, confusion, facts, and endorsement) in an In Vitro Fertilization (IVF) medical forum.
In terms of methods, the majority of the work utilizes machine learning techniques (SVM, naive Bayes, logistic regression), exploiting features such as bigrams, trigrams, and parts of speech. General sentiment lexicons have also been used predominantly; however, their analysis shows that these do not help in capturing the medical sentiment. More domain-specific knowledge has also been embedded using medical knowledge graphs such as UMLS to identify the medical condition and treatment.
Figure 2: Architecture of proposed methodology

Overview of the proposed model
We formulate the MS analysis problem as a multi-task classification problem. Problem Statement: Let a blog-text $P$ consisting of $k$ sentences, i.e., $P = \{s_1, s_2, \ldots, s_k\}$, and a set of tasks $T = \{t_1, t_2\}$ be given. Let the data set of task $t \in T$ be $D_t = \{(x_n^t, y_n^t) : n = 1, \ldots, N_t\}$, where $x_n^t$ denotes a blog-text $P$ with the corresponding label $y_n^t$ from task $t$, which has $N_t$ instances. The task is to predict $\bar{y}^t$ such that $\bar{y}^t = \arg\max_{y^t} p(y^t \mid x^t)$.
We clearly illustrate the two tasks related to MS identification in Figure-1.
In this section, we present an overview of the proposed model for multi-task medical sentiment classification. We use bi-directional gated recurrent units (Bi-GRU) (Chung et al., 2014) to encode the blog-text, as they are computationally cheaper than long short-term memory (LSTM) units (Hochreiter and Schmidhuber, 1997). The updates for the Bi-GRU units can be computed by
$$h_t = \mathrm{BiGRU}(h_{t-1}, x_t) \quad (1)$$
where $h_t$ and $h_{t-1}$ are the hidden units at time $t$ and $(t-1)$, respectively, and $x_t$ is the input at time $t$.
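To make the recurrent update concrete, the following is a minimal sketch of a single GRU step and a bi-directional encoder, assuming the standard gate formulation of Chung et al. (2014) without bias terms. The parameter names (`W_z`, `U_z`, `W_r`, `U_r`, `W_h`, `U_h`) and the helper functions are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update; p is a dict of hypothetical weight matrices."""
    # Update gate z and reset gate r.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x_t), matvec(p["U_z"], h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev))]
    # Candidate state computed from the reset-gated previous state.
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W_h"], x_t), matvec(p["U_h"], gated))]
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

def bi_gru_encode(xs, p_fwd, p_bwd, hidden):
    """Run a forward and a backward GRU over xs and concatenate the final states."""
    h_f = [0.0] * hidden
    for x in xs:
        h_f = gru_step(x, h_f, p_fwd)
    h_b = [0.0] * hidden
    for x in reversed(xs):
        h_b = gru_step(x, h_b, p_bwd)
    return h_f + h_b
```

Concatenating the two final states is one common way to obtain a fixed-size encoding from a bi-directional recurrent network; the source does not specify how the two directions are combined.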

Classification of Medical Blog
Let a blog-text $P$ having $k$ sentences and word sequence $w = \{w_1, w_2, \ldots, w_l\}$ be given. The embedding layer looks up the vector representation $x_i \in \mathbb{R}^d$ of each word from a $d$-dimensional pre-trained word embedding matrix in $\mathbb{R}^{d \times |V|}$ over the vocabulary $V$. Each word $w_i \in w$ is represented by its respective word embedding $x_i$. The hidden units $h_l$ learned at the last time step ($l$) of the sequence are considered as the encoding of the medical blog $P$. The representations $h_l$ generated from Eq. 1 are fed to a fully connected softmax layer to generate the probability distribution over the given classes:
$$\bar{y} = \mathrm{softmax}(W h_l + z)$$
Here, $W$ and $z$ are the weight matrix and bias vector, respectively. The term $\bar{y}$ denotes the predicted probability distribution. Loss Function: Cross entropy is used to define the loss function. Given a training dataset $D = \{(x_i, y_i) : i = 1, \ldots, N\}$, the network parameters are trained to minimize the cross entropy between the predicted probability distributions ($\bar{y}$) and the true probability distributions ($y$) over the $C$ classes.
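The classification layer and its loss can be sketched as follows. This is an illustrative example only: the function names are hypothetical, the true distributions are assumed to be one-hot, and the loss is averaged over the examples, a normalization the text does not specify.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(h_l, W, z):
    """Apply a fully connected layer (W: C x d, z: length C) followed by softmax."""
    logits = [sum(w * h for w, h in zip(row, h_l)) + b for row, b in zip(W, z)]
    return softmax(logits)

def cross_entropy(y_pred, y_true, eps=1e-12):
    """Mean cross entropy between predicted and true distributions over N examples."""
    total = 0.0
    for pred, true in zip(y_pred, y_true):
        # eps guards against log(0) for zero-probability predictions.
        total -= sum(t * math.log(p + eps) for p, t in zip(pred, true))
    return total / len(y_pred)
```

For instance, with three classes, `predict_proba(h_l, W, z)` returns a list of three probabilities that sum to one, and the predicted class is the index of the largest entry.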
Features for multi-task learning: Multi-task learning is governed by sharing the latent features over different tasks. In the proposed neural network based model, the features are the hidden states of the Bi-GRU at the end of the sequence. Motivated by the shared-private feature sharing scheme of Liu et al. (2017), we define two feature spaces for each task: task-specific and task-invariant. Mathematically, for a given blog-text $P$ of task $t$, we compute its task-specific features $h_l^t = \mathrm{BiGRU}(h_{l-1}^t, x_l)$ and task-invariant features $f_l^t = \mathrm{BiGRU}(f_{l-1}^t, x_l)$. Subsequently, the final features are the concatenation of both.
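The shared-private scheme can be illustrated with a small sketch in which each task has its own private encoder and all tasks use one shared encoder. The encoders below (`last_vector`, `mean_vector`) are toy stand-ins for the recurrent encoders and, like the function name `final_features`, are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[Vector]], Vector]

def final_features(xs: Sequence[Vector], task: str,
                   private_encoders: Dict[str, Encoder],
                   shared_encoder: Encoder) -> Vector:
    """Concatenate task-specific and task-invariant features for one blog-text."""
    h_private = private_encoders[task](xs)  # task-specific features
    f_shared = shared_encoder(xs)           # task-invariant features
    return h_private + f_shared

def last_vector(xs: Sequence[Vector]) -> Vector:
    """Toy encoder: return a copy of the last input vector."""
    return list(xs[-1])

def mean_vector(xs: Sequence[Vector]) -> Vector:
    """Toy encoder: return the element-wise mean of the input vectors."""
    n = len(xs)
    return [sum(col) / n for col in zip(*xs)]
```

In a full model, every entry of `private_encoders` and the `shared_encoder` would be a separate recurrent encoder; only the shared one receives inputs from both tasks.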

Adversarial Training
Although the feature sharing scheme separates the features into two feature spaces, there is no guarantee that they will not contaminate each other. Inspired by adversarial networks, we follow the generative-discriminative strategy to avoid contamination of the feature space: a Bi-GRU works as the generator ($G$) to generate task-invariant features, and a discriminator model ($D$) maps the task-invariant features of a blog-text into a probability distribution. The discriminator is essentially a multi-layer perceptron classifier that classifies a blog sentence into its respective task. The adversarial loss is used to train the model to produce task-invariant features such that a classifier cannot reliably predict the task from these features. Similar to (Goodfellow et al., 2014; Liu et al., 2017), we use the following adversarial loss function:
$$L_{adv} = \min_{G} \max_{D} \sum_{t=1}^{T} \sum_{i=1}^{N_t} d_i^t \log\left[D(G(x_i^t))\right]$$
where $d_i^t$ is the gold label indicating the type of the current task. Based on recent work (Bousmalis et al., 2016; Liu et al., 2017) on shared-private latent space analysis, we introduce a further divergence loss function $L_{div}$ to penalize redundant features and encourage the task-invariant and task-specific feature extractors to encode different aspects of the inputs. The divergence loss is computed as $L_{div} = \sum_{t=1}^{T} \left\| {F^t}^{\top} H^t \right\|_F$, where $F^t$ and $H^t$ are two matrices whose rows are the task-invariant and task-specific features of a blog-text from task $t$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. The final loss function $L = \alpha_1 L_{mtask} + \alpha_2 L_{adv} + \alpha_3 L_{div}$ is used as the underlying loss to train the network. Here, $\alpha_1$, $\alpha_2$, and $\alpha_3$ are hyper-parameters of the network.
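As a rough illustration of the divergence term and the weighted combination of losses, the sketch below computes the Frobenius norm of the product of the transposed task-invariant feature matrix and the task-specific feature matrix for each task and sums over tasks. The function names are hypothetical, and the multi-task and adversarial losses are taken as given numbers rather than computed here.

```python
import math

def frobenius_norm_of_product(F, H):
    """Frobenius norm of F^T H, where the rows of F and H are per-example features."""
    d_f, d_h = len(F[0]), len(H[0])
    total = 0.0
    for a in range(d_f):
        for b in range(d_h):
            # Entry (a, b) of F^T H: correlation of feature a in F with feature b in H.
            entry = sum(F[n][a] * H[n][b] for n in range(len(F)))
            total += entry * entry
    return math.sqrt(total)

def divergence_loss(F_by_task, H_by_task):
    """Sum the per-task norms of F^T H over all tasks."""
    return sum(frobenius_norm_of_product(F, H) for F, H in zip(F_by_task, H_by_task))

def total_loss(l_mtask, l_adv, l_div, alphas):
    """Weighted sum of the three losses; alphas = (alpha_1, alpha_2, alpha_3)."""
    a1, a2, a3 = alphas
    return a1 * l_mtask + a2 * l_adv + a3 * l_div
```

The term is zero exactly when every task-invariant feature dimension is orthogonal to every task-specific feature dimension across the examples, which is the sense in which it discourages the two extractors from encoding the same information.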

Dataset and Experimental Setup
We generate a corpus of 7,490 blog-texts collected from four popular groups, namely Depression, Allergy, Asthma, and Anxiety. Of these, 5,188 blogs concern medical conditions and 2,302 are classified as medication. Detailed dataset statistics for both tasks are presented in Table-1. A team of three annotators independently annotated the user posts with three classes under both classification schemes. Cohen's kappa (Cohen, 1960) was used to measure the inter-annotator agreement. We observe high agreement of 0.79 (task 1) and 0.84 (task 2) for exact matching of the class w.r.t. each blog post. We performed 5-fold cross-validation experiments on both datasets. Pre-trained word embeddings (Mikolov et al., 2013) of dimension 300 were used in the experiments. The dimension of the Bi-GRU hidden unit is set to 100 via grid search, on the basis of cross-validation performance. We choose the same value of 0.5 for both weight factors $\lambda_1$ and $\lambda_2$ to place equal importance on both tasks. Training was performed using stochastic gradient descent over mini-batches of size 50 with the Adadelta (Zeiler, 2012) update rule and an initial learning rate of 0.01. The min-max optimization is performed with the help of a gradient reversal layer (Ganin et al., 2016). As a regularizer, we use dropout (Hinton et al., 2012) with a probability of 0.5. We train the network for 130 epochs. The optimal hyper-parameter values for $\alpha_1$, $\alpha_2$, and $\alpha_3$ are obtained via a grid search based on the best cross-validation performance.
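For reference, Cohen's kappa between two annotators can be computed as in the sketch below. The function name and the example labels are hypothetical; how the pairwise scores of the three annotators were aggregated is not stated in the text and is not assumed here.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.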

Performance Evaluation
In order to show the effectiveness of our proposed method, we chose neural network models popular in single-task and multi-task settings as baselines for our text classification problem. Baseline 1: Single Task-LSTM. Baseline 2: Multi Task-LSTM (Liu et al., 2016). Table-2 reports the results of our proposed approach along with the baseline systems. From the results, we observe that the performance on both tasks increases significantly with the introduction of adversarial learning in the multi-task framework. More specifically, compared to Baseline 1, we observe performance improvements of 2.81 and 1.31 F-score points on Task 1 and Task 2, respectively. Compared to the multi-task framework (Baseline 2), our system achieves improvements of 2.35 and 2.28 F-score points on Task 1 and Task 2, respectively. We also observe that the mere introduction of a multi-task framework may sometimes cause a drop in performance. This is because the shared feature space includes both private and shared features, leading to redundancy. A statistical significance test shows that the improvements over both baselines are statistically significant (p-value < 0.05).

Analysis
Our analysis of medical blog-text reveals that, unlike traditional SA on social media text, SA on medical text poses several unique challenges, which are the major causes of errors: (1) Users usually present health-related information in an elusive way, which requires deeper analysis of metaphor and sarcasm. For example: "Lol I'm just a big ball of anxiety fun.", "My head is like air." (2) MS is often presented implicitly and needs to be inferred, for instance, from the medical concepts used in the document. An example of implicit MS (Exist) present in a blog: "It almost feels like im half awake and half asleep." (3) The use of abbreviations and short words has become ubiquitous in medical blog text, e.g., "Cit" for "Citalopram".
(4) The context scope of a sentiment varies widely, from a single phrase to multiple sentences. Moreover, adversative transition words are widely used to link these phrases or sentences. The medical sentiment is bounded and implied by these inter- and intra-sentence discourse relations. For example: "The thoughts are of anything which is quite good. I have an anxiety disorder but I can't cope with it..."

Conclusion and Future Work
In this paper, we have introduced different aspects of sentiment in the context of medicine, such as 'medication' and 'medical condition', instead of conventional polarity, to judge a user's health status. To validate our study, we have used highly representative medical blog text. We have proposed a robust sentiment-sensitive multi-task framework based on adversarial learning to capture the medical sentiment in a user's blog post. We obtained significant performance improvements over the state-of-the-art baseline systems in all cases. In the future, we plan to address implicit and sarcastic medical sentiments, which account for the majority of the errors.