Identification of Emergency Blood Donation Request on Twitter

Social media-based text mining in healthcare has received special attention in recent times due to the enhanced accessibility of social media sites like Twitter. The increasing trend of spreading important information in distress can help patients reach out to prospective blood donors in a time-bound manner. However, such manual efforts are mostly inefficient due to the limited network of a user. In a novel step to solve this problem, we present an annotated Emergency Blood Donation Request (EBDR) dataset for classifying tweets that refer to an urgent need for blood donation. Additionally, we present an automated feature-based SVM classification technique that can help selected EBDR tweets reach relevant persons as well as medical authorities. Our experiments also provide quantitative evidence that linguistic features along with handcrafted heuristics act as the most representative set of signals for this task, with an accuracy of 97.89%.


Introduction
Sufficiency in the availability of blood in emergency situations can dramatically improve the life expectancy and quality of life of patients with chronic medical conditions. However, many patients still suffer due to the dual challenges of timely availability and shortfall of required whole blood and blood components. In countries with low rates of blood donation, blood donation is largely dependent on the families and friends of patients, usually through word of mouth or peer-to-peer networking. With the increasing accessibility of social media websites, several instances have emerged where the friends and family of patients in need of a blood transfusion have tried to voice their urgent need of blood donation through social media channels. They have reached out to the online community through tweets, Facebook posts, and status updates on popular social media platforms. The effect of such tweets in an emergency situation is largely limited to a user's first few degrees of connections. Thus, the request often fails to reach the desired donor within the stipulated critical time.
Problem Statement: Emergency Blood Donation Request (EBDR) detection is the task of identifying tweets that explicitly or implicitly mention a necessity for an urgent blood donor.
The dataset and code are available for research purposes at https://github.com/pmathur5k10/EBDR

Challenges
The key challenges in preparing the corpus are:
1. More than one topic categorization: The task of EBDR tweet prediction is not limited to segregation into binary classes on the basis of certain keywords (like urgent need, blood required, etc.). It involves identification of multiple textual modalities such as blood group, quantity of blood required, the disease being treated and presence of personal details for authentication.
2. Multiple instances of retweets: It is difficult to obtain a unique set of EBDR tweets, as many of the instances of such tweets extracted through the search API are retweets by immediate connections of the original tweet author. In many cases, tweets describing the same events are spread by rephrasing, morphing or editing the original tweets, causing duplication of tweet instances.
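To make the first challenge concrete, the following is a minimal sketch of how some of these textual modalities (blood group, quantity and presence of a contact number) might be located with regular expressions. The patterns and the function name `extract_request_fields` are hypothetical illustrations and are not part of the released corpus; in this work the corresponding handcrafted features were extracted by human annotators.

```python
import re

# Hypothetical patterns for illustration only; they will miss many real-world
# phrasings and can produce false positives.
BLOOD_GROUP = re.compile(
    r"(?<![A-Za-z])(AB|A|B|O)\s?(\+ve|-ve|\+|-|(?i:positive|negative))"
)
QUANTITY = re.compile(r"(\d+)\s*(?:units?|bottles?|pints?)", re.IGNORECASE)
PHONE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def extract_request_fields(tweet):
    """Return a dict of request-related fields found in a tweet (None if absent)."""
    group = BLOOD_GROUP.search(tweet)
    quantity = QUANTITY.search(tweet)
    blood_group = None
    if group:
        suffix = group.group(2).lower()
        # Normalise "+ve", "positive", "+" to "+", and everything else to "-".
        sign = "+" if suffix.startswith(("+", "p")) else "-"
        blood_group = group.group(1) + sign
    return {
        "blood_group": blood_group,
        "units": int(quantity.group(1)) if quantity else None,
        # Only record whether a contact number is present, not the number itself.
        "has_contact": PHONE.search(tweet) is not None,
    }


example = "Urgent: need 2 units of B+ve blood at City Hospital, call 9876543210"
fields = extract_request_fields(example)
```

Keyword patterns of this kind capture only surface cues, which is one reason the task cannot be reduced to keyword matching.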

Contributions
The main contributions can be summarized as:
• Building a corpus of annotated emergency blood donation request tweets divided into three separate datasets using different tweet extraction methodologies.
• Extraction of ancillary handcrafted features from tweets pertaining to the specifications of the requested blood donation.
• Feature modeling using four independent sets of tweet features: linguistic, handcrafted, user specific metadata and textual metadata for the purpose of tweet classification followed by determination of the most relevant set of auxiliary features for SVM based classification.

Related Work
Several successful attempts in health text mining have shown that social media can act as a rich source of information for public health monitoring (Broniatowski et al., 2015). MA and Eldredge developed an annotated dataset from consumer drug review posts on social media. Twitter data has been used previously for identifying mentions of medication intake (Mahata et al., 2018a,b) and for monitoring prescription drug abuse (Hanson et al., 2013). The domain of health text mining also extends to mental health, with past work on identifying hateful behaviour (Mathur et al., 2018a,b) and suicidal behavior on Twitter.

Dataset Creation
The tweets were collected between 10 May 2018 and 10 July 2018 using the Twitter Streaming API. The major problem in extracting tweets soliciting blood donation was the infrequent and sporadic nature of their occurrence. Due to the time restriction imposed on search-query-based retrieval, two parallel strategies were developed to build three separate datasets:
1. Personal Donation Requests (PDR) Dataset (1311 EBDR, 1511 non-EBDR): A curated list of 53 medical phrases was extracted from selected online blood donation information portals such as American Red Cross, Australian Red Cross and NHS Blood and Transplant by using TF-IDF (Ramos et al., 2003) to identify the most frequently occurring terms. A few such phrases are depicted in Fig. 1; after removing the stop words, they were used to query tweets related to blood donation. In addition, tweets mined using general medical terms were incorporated to form the complete dataset.

2. Blood Donation Community (BDC) Dataset (1889 EBDR, 3268 non-EBDR): The list of users present in the tweets of the PDR dataset was obtained. Specific Twitter handles of community blood donation groups were identified from these users, and historical tweets from their timelines were extracted for the positive class. For the negative class, past tweets from extraneous users were collected.

3. Dataset HO (741 EBDR, 1072 non-EBDR): This represents the held-out dataset, with tweets collected using both approaches stated above and no overlap with PDR and BDC.
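The TF-IDF based selection of query terms used for the PDR dataset can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list, the smoothed IDF variant and the function name `top_tfidf_terms` are our own choices, the sketch ranks single words rather than phrases, and the example pages are invented rather than taken from the portals.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on", "at"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_tfidf_terms(documents, k=5):
    """Rank terms by their TF-IDF score summed over all documents."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed inverse document frequency.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]


# Invented example pages standing in for portal text.
pages = [
    "Donate blood today, blood saves lives",
    "Plasma donation helps patients",
    "Blood donors needed urgently",
]
query_terms = top_tfidf_terms(pages, k=3)
```

The resulting terms would then serve as search queries; the curated list of 53 phrases in this work additionally involved manual selection.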
We utilized the traditional query-based tweet mining practice, which made the PDR dataset more generic in nature. In addition, we employed a Twitter handle supervision technique for the BDC dataset to focus on a larger tweet corpus heavily specific to the positive class. Lastly, the held-out dataset was created using both techniques in conjunction for a fair assessment of real-world cases. The collected corpus was filtered to remove tweets with non-English text (detected using LingPipe), tweets with non-Unicode characters, duplicate tweets, and tweets containing only URLs, images or videos, or having fewer than 3 words.
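Part of this filtering step can be sketched as below. The sketch is a simplified assumption of how such a filter might look: it handles URL-only tweets, the minimum word count and duplicates that differ only in case, a leading retweet prefix or a trailing URL. Language identification with LingPipe is omitted, and rephrased or edited duplicates would not be caught. The function name `filter_tweets` is hypothetical.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
RETWEET_PREFIX = re.compile(r"^rt\s+@\w+:?\s*")


def filter_tweets(tweets, min_words=3):
    """Drop URL-only, too-short and duplicate tweets, keeping first occurrences."""
    seen = set()
    kept = []
    for text in tweets:
        # Remove URLs so that tweets consisting only of links become empty.
        words = URL_PATTERN.sub(" ", text).split()
        if len(words) < min_words:
            continue
        # Normalise case and strip a leading "RT @user:" before comparing.
        key = RETWEET_PREFIX.sub("", " ".join(words).lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


raw = [
    "Need O+ blood urgently at City Hospital",
    "RT @friend: Need O+ blood urgently at City Hospital",
    "https://t.co/abc",
    "Donate blood",
    "need o+ blood urgently at city hospital https://t.co/xyz",
]
clean = filter_tweets(raw)
```

In this example only the first tweet survives: the second and fifth are duplicates after normalisation, the third contains only a URL, and the fourth has fewer than three words.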

Data Annotation
The datasets were annotated by two independent human annotators. In the case of conflict between the annotators, an NLP expert assigned the final ground truth annotation for ambiguous tweets. A satisfactory agreement between the annotators was inferred from a Cohen's Kappa score of 0.86.
2. Non-EBDR Tweets:
• Tweets with no motive to discuss blood donation; e.g., "We will have to urgently counter this bloody war to sustain our basic living requirements".
• Tweets related to general medical terminology or about general awareness; e.g., "Iron helps in blood clotting...".
• Promotional content highlighting the usefulness of blood donation to reach out to a target audience; e.g., "Lets pledge to donate blood every 2 months to help people fighting Leukemia".
• Tweets such as "We thank @user for registering as #AB-blood donor..." that portray gratitude for blood donation.
• Tweets publicizing an offer to donate blood, which might give rise to contextual bias; e.g., "I will donate my rare O-blood..."
Table 1 presents the complete set of features corresponding to each annotated tweet. The feature set is composed of four constituents: (i) linguistic features, (ii) user metadata, (iii) textual metadata and (iv) handcrafted features. Linguistic features consist of standard unigrams and bigrams as n-gram features along with TF-IDF frequencies that capture syntactic as well as semantic information. Tweet virality (Cha et al., 2010) and the user's network worth (Recuero et al., 2011), measured by the count of friends, followers, favorites and status effect, are necessary parameters to gauge the ability to broadcast emergency messages through the social media network. A common observation during tweet mining was the presence of hashtags, URLs and user mentions related to blood donation in EBDR tweets. For instance, hashtags such as #SaveLife, #BloodMatters and #HelpEmergency were prominently present in the positive category of the EBDR dataset.
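As an illustration of the linguistic and textual metadata features described above, the following minimal sketch computes unigram and bigram counts together with counts of hashtags, URLs and user mentions for a single tweet. The tokenisation rule and the function name `tweet_features` are assumptions made for this example; TF-IDF weighting and the user metadata (friends, followers, favorites), which require the full corpus and the tweet's account information, are not shown.

```python
import re
from collections import Counter

URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


def ngrams(tokens, n):
    """Return the list of space-joined n-grams over a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tweet_features(text):
    """Return (n-gram counts, textual metadata counts) for one tweet."""
    textual_metadata = {
        "n_hashtags": len(HASHTAG.findall(text)),
        "n_urls": len(URL.findall(text)),
        "n_mentions": len(MENTION.findall(text)),
    }
    # Remove URLs before tokenising; keep '#' and '@' attached to their tokens.
    cleaned = URL.sub(" ", text.lower())
    tokens = re.findall(r"[#@]?\w+", cleaned)
    linguistic = Counter(ngrams(tokens, 1) + ngrams(tokens, 2))
    return linguistic, textual_metadata


linguistic, metadata = tweet_features(
    "Urgent need of blood #SaveLife @user https://t.co/abc"
)
```

The n-gram counts would typically be mapped to a fixed vocabulary and reweighted before being passed to a classifier.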
Lastly, several handcrafted elements, including the presence of a blood group, the blood quantity required, the name of the hospital or blood bank soliciting blood donation on behalf of a patient, the disease for which blood transfusion is desired, and the name, place and contact number of the patient, were extracted by human annotators. Personal details such as user mentions, names, addresses and phone numbers of patients and tweet posters were anonymized due to privacy concerns. This resulted in the accumulation of blood donation specific traits as depicted in Table 2.
Table 3 shows the performance on datasets BDC, PDR and HO, where an SVM classifier (Chang and Lin, 2011) was trained on each of the three datasets using a combination of one or more feature sets mentioned in Section 4. The train-test split in each case was fixed to 70:30, and stratified five-fold cross-validation performance is reported to account for any imbalance of tweet classes in the datasets. In each case, linguistic features achieve a marginally better accuracy than the handcrafted features when trained separately, and the two together outperform all other combinations of features when used as a pair. The extensive under-performance due to the inclusion of textual and user metadata indicates that these feature sets correlate poorly with the positive class. Dataset BDC consists of a larger number of samples having a direct correlation with emergency blood donation requests, as opposed to dataset PDR, which has a greater abundance of samples relevant to the topic of blood donation. This leads to a higher precision but lower recall in the evaluation of the linguistic and handcrafted feature based PDR dataset. In contrast, the datasets PDR and HO show a better score, implying the ability to effectively identify posts of the EBDR class, thereby reducing the false positive cases. Also, despite the downside of the PDR dataset in terms of accuracy, the evaluation metrics follow a similar trend.
The best performance in terms of F1-score is obtained by using linguistic and handcrafted features together in all three datasets. The HO dataset performs better in terms of accuracy (97.89%) than both PDR and BDC, implying that training classifiers with tweets covering various other topics and aspects increases their robustness to noise.
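The evaluation protocol referred to above (stratified five-fold partitioning and the reported accuracy, precision, recall and F1-score) can be sketched as follows. This is a minimal illustration, not the evaluation script used in this work: the classifier itself is omitted, and the helper names `stratified_folds` and `binary_metrics` as well as the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while keeping class ratios similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a fair share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = EBDR, 0 = non-EBDR.
labels = [1] * 10 + [0] * 15
folds = stratified_folds(labels, k=5)
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

Reporting precision and recall alongside accuracy matters here because the EBDR and non-EBDR classes are not equally represented in any of the three datasets.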

Error Analysis
Some categories of errors that were noticed are:
1. Rants due to non-availability of blood donors: Tweets like "Can't believe we live in a pathetic world, no one came forward to donate a single bottle of B+ve blood ...#HumanityIsDead" are an example of reactionary posts. Such posts do not belong to the EBDR class. However, the supervised classifiers assign such tweets to it, making it difficult to separate false requests from genuine cases.
2. Acknowledgment of blood donation: The tweet "We thank @user for registering as #AB-blood donor ..." was correctly identified by the human annotators but misclassified by the automated classifiers. This can be attributed to the inability of the classifiers to derive contextual meaning from the tweets.

Conclusion and Future Work
In this paper, we introduced a robust feature-based classification system along with an annotated corpus to accurately identify Emergency Blood Donation Request (EBDR) tweets and separate them from other unrelated blood donation communication, referred to as non-EBDR tweets. Given the diverse nature of emergency request tweets, we adopted a two-way corpus construction strategy. We mined three datasets to probe various aspects such as robustness and accuracy, and manually annotated them to validate the performance of the proposed classification system. In addition, we analyzed the efficiency of four independent feature sets extracted from the tweets. The results indicate that linguistic features like n-grams and TF-IDF statistics, along with handcrafted features related to the blood donation requirement, are best suited for classification. The EBDR data corpus can benefit researchers in various aspects including but not limited to (i) automatic evaluation of emergency blood donation requests from health posts, (ii) named entity extraction of patient details, blood group and quantity requirement statistics with the help of the handcrafted features provided with the tweets, (iii) crisis assessment and management through social media monitoring of medical emergency events and (iv) feature modelling using genetic algorithms as done by Sawhney et al. (2018b,c).