Automatic Triage of Mental Health Online Forum Posts: CLPsych 2016 System Description

This paper presents a system capable of performing automatic triage of forum posts from ReachOut.com, a mental health online forum. The system assigns to each post a tag that indicates how urgently moderator attention is needed. The evaluation is based on experiments conducted on the CLPsych 2016 task, and the system is released as open-source software.


Introduction
This paper describes a system that was presented at the CLPsych Shared Task 2016. The goal of the task is to perform automatic triage of user posts gathered from the ReachOut.com mental health online forum. Posts must be classified into four categories (green, amber, red, and crisis), which indicate how urgently intervention from forum moderators is required. The automatic triage of ReachOut forum posts is a challenging task. First, the targeted documents (those from the amber, red, and crisis classes) are highly underrepresented in the data to be analyzed. Second, forum post content can be highly noisy, since posts commonly contain symbols, emoticons, pictures, and misspelled words.
The objective of an automatic triage of ReachOut posts is to allow forum moderators to quickly identify posts that require urgent intervention. Posts labeled as red or crisis could indicate an imminent dangerous or harmful condition, for example, an author who suggests a possibility of self-harm.
To handle the task of ReachOut post automatic triage, we propose a system relying on the combination of two text classification techniques, namely supervised learning and rule-based classification. Our experiments are performed using three classification algorithms, together with classification rules designed from discriminative vocabularies selected from documents of the minority classes. In addition, we studied the use of different feature types and subsets. This paper is organized as follows: Section 2 describes related work, while Section 3 provides details about our approach and the system architecture. Experiments and results are reported in Section 4, and we conclude in Section 5.

Related Work
The automatic triage of documents can be used to support a variety of data handling processes. It supports professionals and researchers working in the medical (Tuarob et al., 2014; Almeida et al., 2015) or biological fields (Almeida et al., 2014). Data gathered from forum posts have been used in several related classification tasks. In Huh et al. (2013), triage supports patients handling several health conditions, while it was used to identify mental health issues in Saleem et al. (2012), and to recognize user sentiments in Thelwall et al. (2012).
Designing efficient automatic approaches for textual data triage can be challenging, especially when documents of interest represent a very small part of the entire dataset. Machine learning approaches are impacted by the class distribution, and many classifiers do not perform well in unbalanced contexts. Support Vector Machines (SVM) (Vapnik, 1995) were previously utilized in forum post triage handling mental health subjects (Saleem et al., 2012). Models using Sequential Minimal Optimization (SMO) (Platt, 1998) for optimizing SVM were applied to perform sentiment analysis in forum data, outperforming other methods when used on large datasets (Thelwall et al., 2012). Logistic Model Trees (LMT) (Landwehr et al., 2005) were shown to outperform other classification algorithms in tasks that handle (highly) imbalanced data (Charton et al., 2013; Almeida et al., 2014). Previous studies have combined rule-based and supervised classification approaches to handle forum posts (Saleem et al., 2012), patient medical records (Xu et al., 2012), or sentiment in social media (Chikersal et al., ). In these works, combined strategies usually obtained better performance than supervised-only or rule-based-only approaches.
The use of lexical features, such as n-grams, Part-Of-Speech (POS) tags, and lemmas, as well as sentiment dictionaries, was shown to perform well in tasks handling forum posts (Biyani et al., 2014), and in mining sentiments or opinions (Thelwall et al., 2012). Feature selection methods have been studied to choose relevant attribute subsets (Liu et al., 2010; Basu and Murthy, 2012). Among these methods, Correlation-based Feature Selection (CFS) selects a subset of attributes that are highly correlated with the class, yet uncorrelated with each other (Hall, 1999). Methods to determine relevant vocabulary for specific class labels were previously studied (Melville et al., 2009; Charton et al., 2013). Melville et al. (2009) built a discriminative vocabulary to represent sentiment polarity, while Charton et al. (2013) used one to represent minority classes. In both cases, the use of discriminative vocabularies in the classification models improved performance.
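To make the CFS criterion concrete, the subset merit defined by Hall (1999) rewards a high average feature-class correlation and penalizes a high average feature-feature correlation. The following is a minimal Python sketch of that merit score only, assuming the two average correlations have already been computed. The function name `cfs_merit` and its parameters are illustrative and are not taken from the system described here.

```python
import math


def cfs_merit(k: int, mean_feature_class_corr: float,
              mean_feature_feature_corr: float) -> float:
    """Merit of a subset of k features, following Hall (1999):

        merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)

    r_cf: average correlation between each feature and the class.
    r_ff: average pairwise correlation between the features.
    """
    if k <= 0:
        raise ValueError("k must be a positive number of features")
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator
```

With uncorrelated features the merit grows with the subset size, whereas fully redundant features add nothing: four features with r_cf = 0.5 score 1.0 when r_ff = 0 but only 0.5 when r_ff = 1, the same as a single feature.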

Methodology
To tackle the task of automatic triage of forum posts, the proposed system combines rule-based and machine learning based classification. Our approach makes use of several feature types, such as n-grams, POS tags, and a sentiment dictionary generated from two sentiment libraries. Various feature subsets were filtered using the CFS feature selection method. In the following sections we explain in more detail the system pipeline and the methods used.

CLPsych Dataset
The CLPsych corpus consists of 65024 publicly available posts gathered from the ReachOut forum, which were posted between July 2012 and May 2015. Among these posts, 1188 were manually annotated with class labels, then split into a training and a test set. The training set is composed of 947 posts while the test set contains 241 posts. The class distribution of the training and the test data is shown in Table 1.

Feature Extraction and Selection
Prior to performing feature extraction, the forum posts were pre-processed by normalization procedures, which included normalizing HTML characters, symbols, punctuation, smiley pictures, and smiley symbols. Each smiley was replaced by a corresponding word extracted either from the picture URL, or from a concise mapping containing the smiley textual meaning (e.g., :) or =] or :D are all replaced by happy). The features used in our experiments were bigrams, POS tags, and sentiment features. Extraction of POS tags was performed using the POSTaggerAnnotator from the Stanford CoreNLP suite (Manning et al., 2014). POS features are composed of forum post words annotated with discriminative POS tags, which were adjectives (JJ*), nouns (NN*), predeterminers (PDT), particles (RP), and verbs (VB*). The selection of discriminative POS tags was based on experimental results. Sentiment features are dataset lemmas found within a sentiment dictionary. The dataset lemmas were extracted using the Stanford CoreNLP suite. We built a sentiment dictionary based on a list of feeling words used in mental status exams (see http://psychpage.com/learning/library/assess/feelings.html), and a conceptual feature (Cambria et al., 2014). Stopwords were not removed from the data, since they seem to carry relevant discriminative power for the task, as previously demonstrated by Saif et al. (2014). All feature lists were separately filtered by the CFS method. Feature distributions by type before and after CFS filtering are reported in Table 2.
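The smiley normalization step can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the mapping below contains just the three symbols given as examples above, the names `SMILEY_TO_WORD` and `normalize_smileys` are hypothetical, and the extraction of words from smiley picture URLs is not covered because the URL format is not described here.

```python
import re

# Hypothetical mapping: only the three examples quoted in the text.
# The mapping actually used by the system is not reproduced here.
SMILEY_TO_WORD = {":)": "happy", "=]": "happy", ":D": "happy"}

# Longest symbols first, so a longer smiley is never split by a shorter one.
_SMILEY_PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(SMILEY_TO_WORD, key=len, reverse=True))
)


def normalize_smileys(text: str) -> str:
    """Replace each known smiley symbol with its textual meaning."""
    replaced = _SMILEY_PATTERN.sub(
        lambda match: " " + SMILEY_TO_WORD[match.group(0)] + " ", text
    )
    # Collapse the extra whitespace introduced around the replacements.
    return " ".join(replaced.split())
```

Padding each replacement with spaces keeps the inserted word separate from adjacent tokens, so that a post such as "great:D" yields two tokens rather than one.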

Classification Algorithms
We performed experiments utilizing three classification algorithms: Bayesian Network (BN) (Pearl, 1988), SMO, and LMT. A BN is a probabilistic directed acyclic graph, in which nodes are random variables and arcs represent their conditional dependencies. BN was used as a baseline classifier. SMO-SVM models were previously applied in similar tasks, as described in Section 2. SMO (Platt, 1998) is an iterative optimization algorithm for training SVMs that solves the quadratic programming problem of SVM training by breaking it into smaller sub-problems that are easier to solve. As described in Section 2, LMT previously demonstrated good performance in classification tasks on imbalanced datasets. LMT is an algorithm that produces decision trees with linear logistic models at the leaves.

Discriminative Vocabulary Rules
For the red and the crisis classes, a discriminative vocabulary was utilized to develop classification rules. The discriminative vocabulary was extracted from red and crisis labeled documents. The extraction of the discriminative vocabulary was implemented with the approach described in (Charton et al., 2013). The relative frequency of each word is computed for each class. Then, the average difference of word frequencies between the red/crisis classes and the green and amber classes is computed. Each word for which the average difference is above an experimentally set threshold is added to the discriminative vocabulary of a given class. After defining the discriminative vocabularies for the red and the crisis classes, we utilized up to the five best ranked vocabulary terms to build classification rules based on the appearance of these words in a forum post. The rules were applied on top of the predictions made by the supervised classifiers.
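The vocabulary extraction and rule application described above can be sketched in Python as follows. This is an illustrative reading and not the released implementation: all function names are hypothetical, documents are assumed to be pre-tokenized lists of words, and the rule semantics (a rule fires when any vocabulary word appears in the post, and the first firing rule overrides the classifier prediction) are assumptions that the description does not specify.

```python
from collections import Counter


def relative_frequencies(docs):
    """Relative frequency of each word over all tokens of one class."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def discriminative_vocabulary(docs_by_class, target, others, threshold, top_k=5):
    """Top-ranked words whose average frequency difference between the
    target class and the other classes exceeds the threshold."""
    freqs = {label: relative_frequencies(docs)
             for label, docs in docs_by_class.items()}
    scores = {}
    for word, f_target in freqs[target].items():
        avg_diff = sum(f_target - freqs[o].get(word, 0.0)
                       for o in others) / len(others)
        if avg_diff > threshold:
            scores[word] = avg_diff
    # Rank by decreasing score; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def apply_rules(tokens, predicted_label, rules):
    """Assumed rule semantics: `rules` is a priority-ordered list of
    (label, vocabulary) pairs; the first rule with a vocabulary word
    present in the post overrides the classifier prediction."""
    for label, vocabulary in rules:
        if any(token in vocabulary for token in tokens):
            return label
    return predicted_label
```

For example, calling `discriminative_vocabulary(docs_by_class, "crisis", ["green", "amber"], threshold=0.1)` would return at most five crisis terms, which can then be passed to `apply_rules` as `[("crisis", set(terms))]`. The threshold value here is arbitrary, since the experimentally set value is not reported.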

Experiments and Results
We performed a set of experiments to evaluate the usage of different classifiers, feature sets (combining different feature types), as well as the use of CFS, and finally the integration of classification rules into the supervised approach. The system pipeline is implemented as follows:
1. Dataset pre-processing and normalization
2. POS and lemma annotation
3. Feature extraction (POS tags, bigrams, sentiments)
4. CFS filtering of feature sets
5. Generation of documents versus features matrix using selected feature subsets
6. Output of predictions by machine learning based classifiers
7. Re-evaluation of predictions using classification rules
On the CLPsych training data, the best results were obtained by the LMT and SMO algorithms trained on bigrams, sentiment features, and specific POS features. Rule-based classification was applied to the predictions, using a subset of 5 discriminative words from the vocabulary of each of the red and crisis classes. Table 3 presents the results obtained on the training data while Table 4 shows the results obtained on the test data. We submitted 4 runs using the models that performed best on the training data, namely LMT with and without rules (using 5 or 3 words), and an SMO with rules (5 words). None of our approaches found the single crisis post present in the test set. Posts from the crisis class are indeed the most difficult to find since they are rare, but we also explain this by the difference between the crisis ratio in the training set (4.18%) and in the test set (0.42%). The system performed consistently on the other classes. Our official results are presented in Table 5, and official results for the 16 teams that participated in the task are provided in Table 6.
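As an illustration of steps 3 and 5 of the pipeline, the Python sketch below extracts word bigrams from a tokenized post and builds a documents versus features matrix restricted to a selected feature subset. The helper names are hypothetical, and the use of raw counts as matrix values is an assumption, since the weighting used by the system is not stated.

```python
def bigrams(tokens):
    """Word bigrams of a tokenized post, joined by a single space."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


def document_feature_matrix(docs_features, selected_features):
    """One row per document and one column per selected feature.

    Cell values are raw counts (an assumption); features that were not
    selected, for instance those filtered out by CFS, are ignored.
    """
    column = {feature: j for j, feature in enumerate(selected_features)}
    matrix = []
    for features in docs_features:
        row = [0] * len(selected_features)
        for feature in features:
            j = column.get(feature)
            if j is not None:
                row[j] += 1
        matrix.append(row)
    return matrix
```

Each row of the resulting matrix can then be passed, together with the post label, to any of the supervised classifiers.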

Conclusion
We presented a system capable of performing automatic triage of forum posts from a mental health online forum. The system assigns to each post a tag that indicates how urgently moderator attention is needed.
The evaluation is based on experiments conducted on the CLPsych 2016 task, and the system is available as open-source software in the following repository: https://github.com/BigMiners/CLPsych2016 Shared Task