Automatically Identifying Complaints in Social Media

Complaining is a basic speech act regularly used in human and computer mediated communication to express a negative mismatch between reality and expectations in a particular situation. Automatically identifying complaints in social media is of utmost importance for organizations or brands to improve the customer experience or in developing dialogue systems for handling and responding to complaints. In this paper, we introduce the first systematic analysis of complaints in computational linguistics. We collect a new annotated data set of written complaints expressed on Twitter. We present an extensive linguistic analysis of complaining as a speech act in social media and train strong feature-based and neural models of complaints across nine domains achieving a predictive performance of up to 79 F1 using distant supervision.


Introduction
Complaining is a basic speech act used to express a negative mismatch between reality and expectations towards a state of affairs, product, organization or event (Olshtain and Weinbach, 1987). Understanding the expression of complaints in natural language and automatically identifying them is of utmost importance for: (a) linguists to obtain a better understanding of the context, intent and types of complaints on a large scale; (b) psychologists to identify human traits underpinning complaint behavior and expression; (c) organizations and advisers to improve the customer service by identifying and addressing client concerns and issues effectively in real time, especially on social media; (d) developing downstream natural language processing (NLP) applications, such as dialogue systems that aim to automatically identify complaints. However, complaining has yet to be studied using computational approaches. The speech act of complaining, as previously defined in linguistics research (Olshtain and Weinbach, 1987) and adopted in this study, has as its core the concept of violated or breached expectations, i.e., the person posting the complaint had their favorable expectations breached by a party, usually the one to which the complaint is addressed.
1 Data and code is available here: https://github.com/danielpreotiuc/complaints-social-media
Table 1 (columns: Tweet, C, S). Example tweets:
@FC Help hi, I ordered a necklace over a week ago and it still hasn't arrived (...)
@BootsUK I love Boots! Shame you're introducing a man tax of 7% in 2018 :(
You suck
Complaints have been previously analyzed by linguists (Vásquez, 2011) as distinctly different from expressing negative sentiment towards an entity. Key to the definition of complaints is the expression of the breach of expectations. Table 1 shows examples of tweets highlighting the differences between complaints and sentiment. The first example expresses the writer's breach of expectations about an item that was expected to arrive, but does not express negative sentiment toward the entity, while the second shows mixed sentiment and expresses a complaint about a tax that was introduced. The third statement is an insult that implies negative sentiment, but there are not enough cues to indicate any breach of expectations; hence, this cannot be categorized as a complaint.
This paper presents the first extensive analysis of complaints in computational linguistics. Our contributions include: 1. The first publicly available data set of complaints extracted from Twitter with expert annotations spanning nine domains (e.g., software, transport); 2. An extensive quantitative analysis of the syntactic, stylistic and semantic linguistic features distinctive of complaints; 3. Predictive models using a broad range of features and machine learning models, which achieve high predictive performance for identifying complaints in tweets of up to 79 F1; 4. A distant supervision approach to collect data combined with domain adaptation to boost predictive performance.

Related Work
Complaints have to date received significant attention in linguistics and marketing research. Olshtain and Weinbach (1987) provide one of the early definitions of a complaint as when a speaker expects a favorable event to occur or an unfavorable event to be prevented and these expectations are breached. Thus, the discrepancy between the expectations of the complainer and the reality is the key component of identifying complaints.
Complaining is considered to be a distinct speech act, as defined by speech act theory (Austin, 1975; Searle, 1969), which is central to the field of pragmatics. Complaints are either addressed to the party responsible for enabling the breach of expectations (direct complaints) or indirectly mention the party (indirect complaints) (Boxer, 1993b). Complaints are widely considered to be among the face-threatening acts (Brown and Levinson, 1987), that is, acts that aim to damage the face or self-esteem of the person or entity the act is directed at. The concept of face (Goffman, 1967) represents the public image specific to each person or entity and has two aspects: positive face (i.e., the desire to be liked) and negative face (i.e., the desire to not be imposed upon). Complaints can intrinsically threaten both positive and negative face. The positive face of the responsible party is affected by having enabled the breach of expectations. Usually, when a direct complaint is made, the illocutionary function of the complaint is to request a correction or reparation for these events. The complaint thus affects negative face by aiming to impose an action on the responsible party. Complaints usually co-occur with other speech acts such as warnings, threats, suggestions or advice (Olshtain and Weinbach, 1987; Cohen and Olshtain, 1993).
Previous linguistics research has qualitatively examined the types of complaints elicited via discourse completion tests (DCT) (Trosborg, 1995) and in naturally occurring speech (Laforest, 2002). Differences in complaint strategies and expression were studied across cultures (Cohen and Olshtain, 1993) and socio-demographic traits (Boxer, 1993a). In naturally occurring text, the discourse structure of complaints has been studied in letters to editors (Hartford and Mahboob, 2004; Ranosa-Madrunio, 2004). In the area of linguistic studies on computer mediated communication, Vásquez (2011) performed an analysis of 100 negative reviews on TripAdvisor, which showed that complaints in this medium often co-occur with other speech acts including positive and negative remarks, frequently make explicit references to expectations not being met and directly demand a reparation or compensation. Meinl (2013) studied complaints in eBay reviews by annotating 200 reviews in English and German with the speech act sequence that makes up each complaint, e.g., warning, annoyance (the annotations are not available publicly or after contacting the authors). Mikolov et al. (2018) analyze which financial complaints submitted to the Consumer Financial Protection Bureau will receive a timely response. Most recently, Yang et al. (2019) studied customer support dialogues and predicted if these complaints will be escalated with a government agency or made public on social media. To the best of our knowledge, the only previous work that tackles a concept defined as a complaint with computational methods is by Zhou and Ganesan (2016), which studies Yelp reviews. However, they define a complaint as a 'sentence with negative connotation with supplemental information'. This definition is not aligned with previous research in linguistics (as presented above) and represents only a minor variation on sentiment analysis. They introduce a data set of complaints, unavailable at the time of this submission, and only perform a qualitative analysis, without building predictive models for identifying complaints.

Data
To date, there is no available data set with annotated complaints as previously defined in linguistics (Olshtain and Weinbach, 1987). Thus, we create a new data set of written utterances annotated with whether they express a complaint. We use Twitter as the data source because (1) it represents a platform with high levels of self-expression; and (2) users directly interact with other users or corporate brand accounts. Tweets are openly available and represent a popular option for data selection in other related tasks such as predicting sentiment (Rosenthal et al., 2017), affect (Mohammad et al., 2018), emotion analysis (Mohammad and Kiritchenko, 2015), sarcasm (González-Ibánez et al., 2011; Bamman and Smith, 2015), stance, text-image relationship (Vempala and Preoţiuc-Pietro, 2019) or irony (Van Hee et al., 2016; Cervone et al., 2017; Van Hee et al., 2018).

Collection
We choose to manually annotate tweets in order to provide a solid benchmark to foster future research on this task.
Complaints represent a minority of the total written posts on Twitter. We use a data sampling method that increases the hit rate of complaints, following previous work on labeling infrequent linguistic phenomena such as irony (Van Hee et al., 2016). Numerous companies use Twitter to provide customer service and address user complaints. We select tweets directed to these accounts as candidates for complaint annotation. We manually assembled a list of 93 customer service handles. Using the Twitter API, we collected all the tweets from these handles that are available to download (the most recent 3,200). We then identified all the original tweets to which the customer support handle responded. We randomly sample an equal number of tweets addressed to each customer support handle for annotation. Using this method, we collected 1,971 tweets to which the customer support handles responded.
Further, we have also manually grouped the customer support handles in several high-level domains based on their industry type and area of activity. We have done this to enable analyzing complaints by domain and assess transferability of classifiers across domains. In related work on sentiment analysis, reviews for products from four different domains were collected across domains in a similar fashion (Blitzer et al., 2007). All customer support handles grouped by category are presented in Table 2.
We add to our data set randomly sampled tweets to ensure that there is a more representative and diverse set of tweets for feature analysis and to ensure that the evaluation does not disproportionally contain complaints. We thus additionally sampled 1,478 tweets consisting of two groups of 739 tweets: the first group contains random tweets addressed to any other Twitter handle (at-replies) to match the initial sample, while the second group contains tweets not addressed to a Twitter handle.
As preprocessing, we anonymize all usernames present in the tweet and URLs and replace them with placeholder tokens. To extract the unigrams used as features, we use DLATK, which handles social media content and markup such as emoticons or hashtags (Schwartz et al., 2017). Tweets were filtered for English using langid.py (Lui and Baldwin, 2012) and retweets were excluded.
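The anonymization step can be illustrated with a minimal sketch. The placeholder tokens, the regular expressions and the retweet heuristic below are assumptions made for illustration; the paper does not specify the exact tokens or patterns used.

```python
import re

# Hypothetical placeholder tokens; the paper does not name the ones it used.
USER_TOKEN = "<USER>"
URL_TOKEN = "<URL>"

# Simple patterns for URLs and @-mentions (an approximation, not a full tokenizer).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")


def anonymize(tweet: str) -> str:
    """Replace URLs and @-mentions with placeholder tokens."""
    tweet = URL_RE.sub(URL_TOKEN, tweet)
    return USER_RE.sub(USER_TOKEN, tweet)


def is_retweet(tweet: str) -> bool:
    """Heuristic retweet check (assumption: retweets start with 'RT @')."""
    return tweet.startswith("RT @")
```

For example, `anonymize("@BootsUK I love Boots! https://t.co/abc")` yields `"<USER> I love Boots! <URL>"`.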

Annotation
We create a binary annotation task for identifying if a tweet contains a complaint or not. Tweets are short and usually express a single thought. Therefore, we consider the entire tweet as a complaint if it contains at least one complaint speech act. For annotation, we adopt as the guideline a complaint definition similar to that from previous linguistic research (Olshtain and Weinbach, 1987;Cohen and Olshtain, 1993): "A complaint presents a state of affairs which breaches the writer's favorable expectation".
Each tweet was labeled by two independent annotators, authors of the paper, with significant experience in linguistic annotation. After an initial calibration run of 100 tweets (later discarded from the final data set), each annotator labeled all 1,971 tweets independently. The two annotators achieved a Cohen's Kappa κ = 0.731, which is in the upper part of the substantial agreement band (Artstein and Poesio, 2008). Disagreements were discussed and resolved between the annotators. In total, 1,232 tweets (62.4%) are complaints and 739 are not complaints (37.6%). The statistics for each category are in Table 3.
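For readers unfamiliar with the agreement statistic, the following is a minimal sketch of Cohen's Kappa for two annotators. It implements the standard definition, (observed agreement - chance agreement) / (1 - chance agreement); the function name and the toy labels are illustrative and are not taken from the released code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labeling the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both annotators must label the same non-empty item set")
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)
```

With toy labels `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, giving κ = 0.5.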

Features
In our analysis and predictive experiments, we use the following groups of features: generic linguistic features proven to perform well in text classification tasks (Preoţiuc-Pietro et al., 2017; Volkova and Bell, 2017; Preoţiuc-Pietro and Ungar, 2018) (unigrams, LIWC, word clusters), methods for predicting sentiment or emotion, and complaint specific features.
Unigrams. We use the bag-of-words approach to represent each tweet as a TF-IDF weighted distribution over the vocabulary consisting of all words present in at least two tweets (2,641 words).
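The unigram representation can be sketched as follows. This is an illustration under stated assumptions: whitespace tokenization stands in for the DLATK tokenizer, and the smoothed IDF variant ln((1 + N) / (1 + df)) + 1 is one common choice, since the paper does not state which TF-IDF weighting was used.

```python
import math
from collections import Counter


def build_vocabulary(tokenized_tweets, min_df=2):
    """Keep words that appear in at least `min_df` tweets; return their document frequencies."""
    doc_freq = Counter()
    for tokens in tokenized_tweets:
        doc_freq.update(set(tokens))  # count each word once per tweet
    return {word: df for word, df in doc_freq.items() if df >= min_df}


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse TF-IDF vector (dict) for one tweet, restricted to the vocabulary."""
    counts = Counter(t for t in tokens if t in doc_freq)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        word: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
        for word, count in counts.items()
    }
```

Words seen in a single tweet are dropped before weighting, which mirrors the "present in at least two tweets" vocabulary criterion described above.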
LIWC. Traditional psychology studies use dictionary-based approaches to representing text. The most popular method is based on Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001) consisting of 73 manually constructed lists of words (Pennebaker et al., 2015) including parts-of-speech, topical or stylistic categories. Each tweet is thus represented as a distribution over these categories.
Word2Vec Clusters. An alternative to LIWC for identifying semantic themes in a tweet is to use automatically generated word clusters. These clusters can be thought of as topics, i.e., groups of words that are semantically and/or syntactically similar. The clusters help reduce the feature space and provide good interpretability (Lampos et al., 2014; Lampos et al., 2016; Aletras and Chamberlain, 2018). Following prior work, we compute clusters using spectral clustering (Shi and Malik, 2000) applied to a word-word similarity matrix weighted with the cosine similarity of the corresponding word embedding vectors (Mikolov et al., 2013). For brevity and clarity, we present experiments using 200 clusters, as in prior work. We aggregate all the words in a tweet and represent each tweet as a distribution of the fraction of tokens belonging to each cluster.
Part-of-Speech Tags. We analyze part-of-speech tag usage to quantify the syntactic patterns associated with complaints and to enhance the representation of unigrams. We part-of-speech tag all tweets using the Twitter model of the Stanford Tagger (Derczynski et al., 2013). In prediction experiments we supplement each unigram feature with its POS tag (e.g., I PRP, bought VBN).
For feature analysis, we represent each tweet as a bag-of-words distribution over part-of-speech unigrams and bigrams in order to uncover regular syntactic patterns specific to complaints.
Sentiment & Emotion Models. We use existing sentiment and emotion analysis models to study their relationship to complaint annotations and to measure their predictive power on our complaint data set. If the concepts of negative sentiment and complaint were to coincide, standard sentiment prediction models that have access to larger sets of training data should be very competitive on predicting complaints. We test the following models:
• MPQA: We use the MPQA sentiment lexicon (Wiebe et al., 2005) to assign a positive and negative score to each tweet based on the ratio of tokens in a tweet which appear in the positive and negative MPQA lists respectively. These scores are used as features.
• NRC: We use the word lexicon derived using crowd-sourcing from (Mohammad and Turney, 2010, 2013) for assigning to each tweet the proportion of tokens that have positive, negative and neutral sentiment, as well as one of eight emotions that include the six basic emotions of Ekman (Ekman, 1992) (anger, disgust, fear, joy, sadness and surprise) plus trust and anticipation. All scores are used as features in prediction in order to maximize their predictive power.
• Volkova & Bachrach (V&B): We quantify positive, negative and neutral sentiment as well as the six Ekman emotions for each message using the model made available in (Volkova and Bachrach, 2016) and use them as features in predicting complaints. The sentiment model is trained on a data set of 19,555 tweets that combine all previously annotated tweets across seven public data sets.
• VADER: We use the outcome of the rule-based sentiment analysis model which has shown very good predictive performance on predicting sentiment in tweets (Hutto and Gilbert, 2014).
• Stanford: We quantify sentiment using the Stanford sentiment prediction model as described in (Socher et al., 2013).
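The lexicon-based scores (MPQA and NRC above) share one idea: the fraction of a tweet's tokens found in a word list. A minimal sketch follows; the tiny word lists are hypothetical stand-ins, not entries taken from the actual MPQA or NRC lexicons.

```python
def lexicon_scores(tokens, positive_words, negative_words):
    """Return (positive ratio, negative ratio) of tokens found in each word list."""
    n = len(tokens)
    if n == 0:
        return 0.0, 0.0
    positive = sum(token in positive_words for token in tokens) / n
    negative = sum(token in negative_words for token in tokens) / n
    return positive, negative


# Hypothetical toy lexicons, for illustration only.
TOY_POSITIVE = {"love", "great", "thanks"}
TOY_NEGATIVE = {"shame", "suck"}
```

For the tokens `["i", "love", "boots", "shame"]`, this sketch returns `(0.25, 0.25)`, showing how a tweet with mixed sentiment receives both a positive and a negative score.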
Complaint Specific Features. The features in this category are inspired by linguistic aspects specific to complaints (Meinl, 2013):
• Request. The illocutionary function of complaints is often that of requesting a correction or reparation for the event that caused the breach of expectations (Olshtain and Weinbach, 1987). We explicitly predict if an utterance is a request using the model introduced in (Danescu-Niculescu-Mizil et al., 2013).
• Intensifiers. In order to increase the face-threatening effect a complaint has on the complainee, intensifiers are usually used by the person expressing the complaint (Meinl, 2013). We use features derived from: (1) capitalization patterns often used online as an equivalent to shouting (e.g., number/percentage of capitalized words, number/percentage of words starting with capitals, number/percentage of capitalized letters); and (2) repetitions of exclamation marks, question marks or letters within the same token.
• Temporal References. Temporal references are often used in complaints to stress how long a complainer has been waiting for a correction or reparation from the addressee or to provide context for their complaint (e.g., mentioning the date on which they bought an item) (Meinl, 2013). We identify time expressions in tweets using SynTime, which achieved state-of-the-art results across several benchmark data sets (Zhong et al., 2017). We represent temporal expressions both as days elapsed relative to the day of the post and in buckets of different granularities (one day, week, month, year).
• Pronoun Types. Pronouns are used in complaints to reveal the personal involvement or opinion of the complainer and intensify or reduce the face-threat of the complaint based on the person or type of the pronoun (Claridge, 2007;Meinl, 2013). We split pronouns using dictionaries into: first person, second person, third person, demonstrative (e.g., this) and indefinite (e.g., everybody).
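As an illustration of the intensifier features listed above, the sketch below counts capitalization patterns and repetitions. The exact definitions are assumptions: a "capitalized word" is taken to be a fully upper-case token with more than one letter, and a repetition is three or more identical letters or two or more consecutive exclamation or question marks. The paper does not give these thresholds.

```python
import re

# Assumed thresholds: 2+ repeated '!'/'?' and 3+ repeated identical letters.
REPEATED_PUNCT = re.compile(r"[!?]{2,}")
REPEATED_LETTER = re.compile(r"([A-Za-z])\1{2,}")


def intensifier_features(tweet: str) -> dict:
    """Count capitalization and repetition cues in a whitespace-tokenized tweet."""
    tokens = tweet.split()
    words = [t for t in tokens if any(ch.isalpha() for ch in t)]
    n_letters = sum(ch.isalpha() for ch in tweet)
    all_caps = sum(
        1 for w in words if w.isupper() and sum(ch.isalpha() for ch in w) > 1
    )
    initial_cap = sum(1 for w in words if w[0].isupper())
    cap_letters = sum(ch.isupper() for ch in tweet)
    return {
        "all_caps_words": all_caps,
        "all_caps_words_pct": all_caps / len(words) if words else 0.0,
        "initial_cap_words": initial_cap,
        "initial_cap_words_pct": initial_cap / len(words) if words else 0.0,
        "cap_letters": cap_letters,
        "cap_letters_pct": cap_letters / n_letters if n_letters else 0.0,
        "repeated_punct": len(REPEATED_PUNCT.findall(tweet)),
        "elongated_tokens": sum(1 for t in tokens if REPEATED_LETTER.search(t)),
    }
```

Both raw counts and percentages are returned because the feature description mentions "number/percentage" for each capitalization pattern.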

Linguistic Feature Analysis
This section presents a quantitative analysis of the linguistic features distinctive of tweets containing complaints in order to gain linguistic insight into this task and data. We perform analysis of all previously described feature sets using univariate Pearson correlation (Schwartz et al., 2013). We compute correlations independently for each feature between its distribution across messages (features are first normalized to sum to one for each message) and a variable encoding if the tweet was annotated as a complaint or not.
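The univariate analysis described above can be sketched as follows: each tweet's feature counts are normalized to sum to one, and each feature's normalized values are correlated with the binary complaint label. This is a plain-Python illustration with hypothetical function names; it omits the significance testing and Simes correction.

```python
import math


def normalize(counts):
    """Scale one tweet's feature counts so that they sum to one."""
    total = sum(counts.values())
    return {f: v / total for f, v in counts.items()} if total else {}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant variable
    return cov / math.sqrt(var_x * var_y)


def feature_correlations(count_dicts, labels):
    """Correlate every feature's normalized frequency with the 0/1 complaint label."""
    rows = [normalize(c) for c in count_dicts]
    features = sorted({f for row in rows for f in row})
    return {f: pearson_r([row.get(f, 0.0) for row in rows], labels) for f in features}
```

Features with a positive coefficient are associated with complaints and those with a negative coefficient with non-complaints, which is how the feature tables in this section are organized.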
Top unigrams and part-of-speech features specific to complaints and non-complaints are presented in Table 4, and the top LIWC categories and Word2Vec topics are presented in Table 5. All correlations shown in these tables are statistically significant at p < .01, with Simes correction for multiple comparisons.
Negations. Negations are uncovered through unigrams (not, no, won't) and the top LIWC category (NEGATE). Central to complaining is the concept of breached expectations. Hence, complainers use negations to express this discrepancy and to describe their experience with the product or service that caused it.
Issues. Several unigrams (error, issue, working, fix) and a cluster (Issues) contain words referring to issues or errors. However, words regularly describing negative sentiment or emotions are not among the most distinctive features for complaints. On the other hand, terms that show positive sentiment or emotions (good, great, win, POSEMO, AFFECT, ASSENT) are among the most distinctive features for a tweet not being labeled as a complaint. In addition, other words and clusters expressing positive states such as gratitude (thank, great, love) or laughter (lol) are also distinctive for tweets that are not complaints.
Linguistics research on complaints in longer documents identified that complaints are likely to co-occur with other speech acts, including with expressions of positive or negative emotions (Vásquez, 2011). In our data set, perhaps due to the particular nature of Twitter communication and the character limit, complainers are much less likely to express positive sentiment in a complaint and do not regularly express negative sentiment. Instead, they choose to focus more on describing the issue regarding the service or product in an attempt to have it resolved.
Pronouns. Across unigrams, part-of-speech patterns and word clusters, we see a distinctive pattern emerging around pronoun usage. Complaints use more possessive pronouns, indicating that the user is describing personal experiences. A distinctive part-of-speech pattern common in complaints is possessive pronouns followed by nouns (PRP$ NN), which refer to items or services possessed by the complainer (e.g., my account, my order). Complaints tend to not contain personal pronouns (he, she, it, him, you, SHEHE, MALE, FEMALE), as the focus in expressing the complaint is on the self and the party the complaint is addressed to and not other third parties.
Punctuation. Question marks are distinctive of complaints, as many complaints are formulated as questions to the responsible party (e.g., why is this not working?, when will I get my response?). Complaints are not usually accompanied by exclamation marks. Although exclamation marks are regularly used for emphasis in the context of complaints, most complainers in our data set prefer not to use them, perhaps in an attempt to address the responsible party in a less confrontational manner.
Temporal References. Mentions of time are specific to complaints (been, still, on, days, Temporal References cluster). Their presence is usually needed to provide context for the event that caused the breach of expectations.
Another role of temporal references is to express dissatisfaction towards non-responsiveness of the responsible party in addressing their previous requests. In addition, the presence of verbs in past participle (VBN) is the most distinctive part-of-speech pattern of complaints. These are used to describe actions completed in the past (e.g., i've bought, have come) in order to provide context for the complaint.
Table 5: Group text features associated with tweets that are complaints and not complaints. Features are sorted by Pearson correlation (r) between each feature's normalized frequency and the outcome. We restrict to only the top six categories for each feature type. All correlations are significant at p < .01, two-tailed t-test, Simes corrected. Within each cluster, words are sorted by frequency in our data set. Labels for Word2Vec clusters are assigned by the authors.
Complaints:
NEGATE (r = .271): not, no, can't, don't, never, nothing, doesn't, won't
RELATIV (r = .225): in, on, when, at, out, still, now, up, back, new
FUNCTION: the, i, to, a, my, and, you, for
Not complaints:
POSEMO (r = .185): thanks, love, thank, good, great, support, lol, win
AFFECT (r = .111): thanks, love, thank, good, great, support, lol
Verbs. Several part-of-speech patterns distinctive of complaints involve present verbs in third person singular (VBZ). In general, these verbs are used in complaints to reference an action that the author expects to happen, but whose expectation is breached (e.g., nobody is answering). Verbs in gerund or present participle are used as a complaint strategy to describe things that just happened to a user (e.g., got an email saying my service will be terminated).
Topics. General topics typical of complaint tweets include requiring assistance or customer support. Several groups of words are much more likely to appear in a complaint, although not used to express complaints per se: about orders or deliveries (in the retail domain), about access (in complaints to service providers) and about parts of tech products (in tech). This is natural, as people are more likely to deliberately tweet about an order or tech parts if they want to complain about them. This is similar to sentiment analysis, where not only emotionally valenced words are predictive of sentiment.

Predicting Complaints
In this section, we experiment with different approaches to build predictive models of complaints from text content alone. We first experiment with feature-based approaches including Logistic Regression classification with Elastic Net regularization (LR) (Zou and Hastie, 2005). We train the classifiers with all individual feature types.
Neural Methods. For reference, we experiment with two neural architectures. In both architectures, tweets are represented as sequences of one-hot word vectors which are first mapped into embeddings. A multi-layer perceptron (MLP) network (Hornik et al., 1989) feeds the embedded representation (E = 200) of the tweet (mean embedding of its constituent words) into a dense hidden layer (D = 100) followed by a ReLU activation function and dropout (0.2). The output layer is a one-dimensional dense layer with a sigmoid activation function. The second architecture, a Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) network, processes the tweet sequentially by modeling one word (embedding) at each time step, followed by the same output layer as in the MLP. The size of the hidden state of the LSTM is L = 50. We train the networks using the Adam optimizer (Kingma and Ba, 2014) (learning rate is set to 0.01) by minimizing the binary cross-entropy.
Experimental Setup. We conduct experiments using a nested stratified 10-fold cross-validation, where nine folds are used for training and one for testing (i.e., outer loop). In the inner loop, we choose the model parameters using a 3-fold cross-validation on the tweets from the nine folds of training data (from the outer loop). Train/dev/test splits for each experiment are released together with the data for replicability. We report predictive performance of the models as the mean accuracy, F1 (macro-averaged) and ROC AUC over the 10 folds (Dietterich, 1998).
Results. Results are presented in Table 6. Most sentiment analysis models show accuracy above chance in predicting complaints. The best results are obtained by the Volkova & Bachrach model (Sentiment - V&B) which achieves 60 F1. However, models trained using linguistic features on the training data obtain significantly higher predictive accuracy.
Complaint specific features are predictive of complaints, but to a smaller extent than sentiment, reaching an overall 55.2 F1. From this group of features, the most predictive groups are intensifiers and downgraders. Syntactic part-of-speech features alone obtain higher performance than any sentiment or complaint feature group, showing the syntactic patterns discussed in the previous section hold high predictive accuracy for the task. The topical features such as the LIWC dictionaries (which combine syntactic and semantic information) and Word2Vec topics perform in the same range as the part-of-speech tags. However, best predictive performance is obtained using bag-of-word features, reaching an F1 of up to 77.5 and AUC of 0.866. Further, combining all features raises F1 to 78, with an AUC of 0.864. We notice that neural network approaches are comparable, but do not outperform the best performing feature-based model, likely in part due to the training data size.
Distant Supervision. We explore the idea of identifying extra complaint data using distant supervision to further boost predictive performance. Previous work has demonstrated improvements on related tasks relying on weak supervision, e.g., in the form of tweets with related hashtags (Bamman and Smith, 2015; Volkova and Bachrach, 2016; Cliche, 2017). Following the same procedure, seven hashtags were identified with the help of the training data to likely correspond to complaints: #appallingcustomercare, #badbusiness, #badcustomerserivice, #badservice, #lostbusiness, #unhappycustomer, #worstbrand. Tweets containing these hashtags were collected from a combination of the 1% Twitter archive between 2012-2018 and by filtering tweets with these hashtags in real-time from the Twitter REST API for three months. We collected in total 18,218 tweets (excluding retweets and duplicates), which we treat as complaints. As negative examples, the same number of tweets were sampled randomly from the same time interval. All hashtags were removed and the data was preprocessed identically to the annotated data set.
We experiment with two techniques for combining distantly supervised data with our annotated data. First, the tweets obtained through distant supervision are simply added to the annotated training data in each fold (Pooling). Secondly, as important signal may be washed out if the features are joined across both domains, we experiment with domain adaptation using the popular EasyAdapt algorithm (Daumé III, 2007) (EasyAdapt). Experiments use logistic regression with bag-of-word features enhanced with part-of-speech tags, because these performed best in the previous experiment. Table 7 shows that the domain adaptation approach further boosts F1 by 1 point to 79 (t-test, p<0.05) and ROC AUC by 0.012. However, simply pooling the data actually hurts predictive performance, leading to a drop of more than 2 points in F1.
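EasyAdapt, as described by Daumé III (2007), is a feature augmentation: every feature is copied into a shared version and a domain-specific version, so that a linear classifier can learn weights that are common to both domains alongside weights specific to one. The sketch below illustrates this on sparse feature dictionaries; the domain names "annotated" and "distant" are hypothetical labels chosen for this example and are not taken from the released code.

```python
def easyadapt(features, domain):
    """Augment a sparse feature dict with shared and domain-specific copies.

    A feature x from domain d becomes ("shared", x) and (d, x); copies for
    all other domains are implicitly zero because they are simply absent.
    """
    augmented = {}
    for name, value in features.items():
        augmented[("shared", name)] = value
        augmented[(domain, name)] = value
    return augmented


# Hypothetical domain labels for the two data sources.
annotated_example = easyadapt({"refund": 1.0, "still": 0.5}, "annotated")
distant_example = easyadapt({"refund": 1.0}, "distant")
```

The two examples overlap only on the `("shared", "refund")` key, which is what lets the classifier keep a domain-specific signal from being washed out by the much larger distantly supervised set.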

Domain Experiments
We assess the performance of models trained using the best method and features in three training settings: (1) using only in-domain data (In-Domain); (2) adding out-of-domain data into the training set (Pooling); and (3) combining in- and out-of-domain data with EasyAdapt domain adaptation (EasyAdapt). The experimental setup is identical to the one described in the previous experiments. Table 8 shows the model performance in macro-averaged F1 using the best performing feature set. Results show that, in all but one case, adding out-of-domain data helps predictive performance. The apparel domain is qualitatively very different from the others, as a large number of complaints are about returns or the company not stocking items, hence leading to different features being important for prediction. Domain adaptation is beneficial for the majority of domains, lowering performance on only a single domain compared to data pooling. This highlights the differences in expressing complaints across domains. Overall, predictive performance is high across all domains, with the exception of transport.

Cross Domain Experiments
Finally, Table 9 presents the results of models trained on tweets from one domain and tested on all tweets from other domains, with additional models trained on tweets from all domains except the one that the model is tested on.
We observe that predictive performance is relatively consistent across all domains with two exceptions ('Food & Beverage' consistently shows lower performance, while 'Other' achieves higher performance) when using all the data available from the other domains.
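The setting in which a model is trained on all domains except the one it is tested on can be sketched as a leave-one-domain-out split. The tuple layout and the toy examples below are assumptions for illustration and do not reflect the format of the released data.

```python
def leave_one_domain_out(examples):
    """Yield (held_out_domain, train, test) for each domain in turn.

    `examples` is a list of (text, label, domain) tuples; the model is trained
    on every domain except the held-out one and tested on the held-out one.
    """
    domains = sorted({domain for _, _, domain in examples})
    for held_out in domains:
        train = [e for e in examples if e[2] != held_out]
        test = [e for e in examples if e[2] == held_out]
        yield held_out, train, test


# Toy examples with hypothetical texts and labels (1 = complaint).
TOY_EXAMPLES = [
    ("my order still hasn't arrived", 1, "Retail"),
    ("thanks for the quick reply", 0, "Retail"),
    ("train delayed again", 1, "Transport"),
    ("app keeps crashing on login", 1, "Software"),
]
```

Iterating over `leave_one_domain_out(TOY_EXAMPLES)` produces one split per domain, with no held-out-domain tweet ever appearing in the corresponding training set.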

Conclusions & Future Work
We presented the first computational approach using methods from computational linguistics and machine learning to modeling complaints as defined in prior studies in linguistics and pragmatics (Olshtain and Weinbach, 1987). To this end, we introduced the first data set consisting of English Twitter posts annotated with complaints across nine domains. We analyzed the syntactic patterns and linguistic markers specific to complaints. Then, we built predictive models of complaints in tweets using a wide range of features reaching up to 79% Macro F1 (0.885 AUC) and conducted experiments using distant supervision and domain adaptation to boost predictive performance. We studied performance of complaint prediction models on each individual domain and presented results with a domain adaptation approach which overall improves predictive accuracy. All data and code is available to the research community to foster further research on complaints. A predictive model for identification of complaints is useful to companies that wish to automatically gather and analyze complaints about a particular event or product. This would allow them to improve efficiency in customer service or to more cheaply gauge popular opinion in a timely manner in order to identify common issues around a product launch or policy proposal.
In the future, we plan to identify the target of the complaint in a similar way to aspect-based sentiment analysis (Pontiki et al., 2016). We plan to use additional context and conversational structure to improve performance and identify the sociodemographic covariates of expressing and phrasing complaints. Another research direction is to study the role of complaints in personal conversation or in the political domain, e.g., predicting political stance in elections (Tsakalidis et al., 2018).