Personality Traits Recognition in Literary Texts

Interesting stories are often built around interesting characters. Pinpointing what makes a character interesting is a real challenge, but a character's personality traits are certainly a significant cue. Our exploratory work tests how well current personality trait theories transfer to literary characters, focusing on the analysis of utterances in theatre scripts. Conversely, we also try to identify which traits mark interesting characters. The preliminary results suggest that our approach is reasonable: using machine learning to gain insight into the personality traits of fictional characters can make sense.


Introduction
The availability of texts produced by people through modern communication means can give important insight into personality profiling, and the computational linguistics community has been quite active on this topic. In this paper we explore the use of techniques and tools currently applied to user-generated content for the analysis of literary characters in books and plays. In particular, we focus on the analysis of speech utterances in theatre scripts. Dialogues in theatre plays are quite easy to collect (the speakers are explicitly stated in the scripts) without the need for lengthy and costly manual annotation.
Of course, the style of language in social media is very different: user-generated content is usually quite short, not always correct in spelling or syntax, and nowadays full of emoticons. On the other hand, we can expect that the authors of great theatre masterpieces (e.g. Shakespeare) had exceptional skill in rendering the personality traits of their characters purely through the dialogue between them.
Computational linguistics exploits different frameworks for the classification of psychological traits; the Five Factor Model (Big5) is used most often. The advantages and drawbacks of these frameworks are well known; a good reference is the work by Lee and Ashton (2004). We are interested in a broad, exploratory classification and do not endorse any one model over the others. We chose the Big Five model because the gold-labelled dataset we exploited was built using this framework.

Literature review
To our knowledge, there is little ongoing research on personality trait recognition in literary texts. Most work on literary text focuses on other aspects such as authorship attribution, stylometry, and plagiarism detection. For personality trait recognition, the datasets used are typically collected from modern communication means, e.g. messages posted on social media.
There is, however, interest in applying modern NLP tools to literary texts: Grayson et al. (2016) use word embeddings for analyzing literature, Boyd (2017) describes the current status and tools for psychological text analysis, Flekova and Gurevych (2015) profile fictional characters, and Liu et al. (2018) conduct a trait analysis of two fictional characters in a Chinese novel.
The use of the Five Factor Model for literature is discussed in McCrae et al. (2012). Bamman et al. (2014) consider the problem of automatically inferring latent character types in a collection of English novels. Bamman et al. (2013) present a new dataset for the text-driven analysis of film, along with latent variable models for learning character types in movies. Vala et al. (2015) propose a novel technique for character detection, achieving significant improvements over the state of the art on multiple datasets.

Model Building and Evaluation
We approached the task as a supervised learning problem, training on a labelled dataset and then transferring the resulting model to our own dataset.
In literary studies it is difficult to find a classification of characters according to some model of personality. Literary critics often prefer to analyze a character in depth rather than place her/him into simple categories.
At the basis of our model, and of the frameworks mentioned above, lies the lexical hypothesis: we are, at least to some extent, allowed to infer personality traits from language and words. From a psychological point of view, support for the lexical hypothesis is given by Ashton and Lee (2005). A further concern is whether those models can be applied to theatrical scripts, where everything is feigned to appear real. A crucial role is played by the author's expertise in rendering the psychological traits of the characters in the script.

Big5 Dataset with Gold Labels
As a labelled dataset, we used "essays", originally from Pennebaker and King (1999). "Essays" is a large dataset of stream-of-consciousness texts (about 2,400, one per author), collected between 1997 and 2004 and labelled with personality classes. The texts were produced by students who took the Big5 questionnaires. The labels, which are self-assessments, derive from z-scores computed by Mairesse et al. (2007) and were converted from scores to nominal classes by Celli et al. (2013) with a median split.
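The median split can be sketched as follows; the function and threshold convention are our own illustration, not necessarily the exact procedure of Celli et al. (2013).

```python
import numpy as np

def median_split(zscores):
    """Convert continuous trait z-scores into binary nominal classes:
    authors above the trait median get 'y', the rest 'n'."""
    zscores = np.asarray(zscores, dtype=float)
    threshold = np.median(zscores)
    return ['y' if z > threshold else 'n' for z in zscores]

# One trait, five hypothetical authors:
labels = median_split([-1.2, 0.3, 0.8, -0.1, 1.5])  # ['n', 'n', 'y', 'n', 'y']
```

Applied independently to each of the five traits, this yields five binary labels per author.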
The main reason for using this dataset is that it is the only one containing gold labels suitable for our task. Admittedly, the fact that the material does not match literary text perfectly can pose some issues, discussed later in Section 3.

Literary Dataset Building and Validation
The proposed task is to recognize the personality of a character in a literary text from the words s/he says. Theatre scripts are probably the easiest type of literary text from which to extract characters' dialogues.
The name of the character speaking, following a long-established convention, appears at the start of the line, usually in a bold typeface; after a colon ":" or a dot "." the text of the utterance follows, until another character takes the turn or the play, act or scene ends.
An excerpt from William Shakespeare's Hamlet, Act III, Scene 4 shows the pattern:
Hamlet. Do you see nothing there?
Gertrude. Nothing at all; yet all that is I see.
Hamlet. Nor did you nothing hear?
Gertrude. No, nothing but ourselves.
[. . . ]
Our first candidate dataset was the Shakespeare Corpus in NLTK by Bird et al. (2009), which consists of several tragedies and comedies of Shakespeare, well formatted in XML. However, the Shakespeare Corpus in NLTK covers only a fraction of Shakespeare's plays. To collect more data we looked for a larger corpus: Open Source Shakespeare (OSS) contains all 38 works (some split into parts) commonly attributed to William Shakespeare, in a format good enough to easily parse the dialogue structure.
In our model a character is associated with all her/his turns, concatenated into a single document. This is a simplified view, but good enough as a starting point. One of its main consequences is a certain flattening of the characters, and the loss of utterances spoken simultaneously by two characters. A quick check did not spot this type of event for two or more named characters; very seldom, there are references to the character "All", meaning all the characters on stage together.
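The turn extraction and per-character aggregation can be sketched as follows; the regular expression is a simplification of the real parsing (it assumes plain-text lines in the "Name. utterance" convention shown in the excerpt, and ignores stage directions).

```python
import re
from collections import defaultdict

# A speaker line starts with a capitalized name followed by '.' or ':'.
TURN_RE = re.compile(r'^([A-Z][A-Za-z ]+)[.:]\s+(.*)$')

def collect_turns(lines):
    """Map each character to the concatenation of all her/his turns."""
    docs = defaultdict(list)
    speaker = None
    for line in lines:
        m = TURN_RE.match(line)
        if m:
            speaker, text = m.group(1).strip(), m.group(2)
            docs[speaker].append(text)
        elif speaker:                      # continuation of the current turn
            docs[speaker].append(line.strip())
    return {name: ' '.join(parts) for name, parts in docs.items()}

excerpt = [
    "Hamlet. Do you see nothing there?",
    "Gertrude. Nothing at all; yet all that is I see.",
    "Hamlet. Nor did you nothing hear?",
    "Gertrude. No, nothing but ourselves.",
]
docs = collect_turns(excerpt)
```

Each value in `docs` is the single document representing one character, ready to be fed to the classifier.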
We knew in advance that our models would have to rely on words shared between the corpora, so we quickly checked the lemmas in common: 6,755 lemmas are shared between the two corpora, each of which contains roughly 60,000 word types.
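The overlap check amounts to a set intersection over the two lemma inventories; the toy sets below stand in for the real inventories of roughly 60,000 entries each.

```python
# Toy lemma inventories standing in for the essays and OSS vocabularies.
essays_lemmas = {"love", "fear", "night", "dream", "friend"}
oss_lemmas = {"love", "night", "sword", "crown", "friend"}

common = essays_lemmas & oss_lemmas        # lemmas the model can actually use
coverage = len(common) / len(oss_lemmas)   # fraction of OSS lemmas covered
```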

Modeling
We built our model using the Scikit-learn toolkit for machine learning in Python (Pedregosa et al., 2011).
The initial task was to obtain reasonable performance on the "essays" dataset.
The problem falls into the class of multi-label classification. For simplicity, each label (corresponding to a personality trait) can be treated as independent, partitioning the problem into 5 binary classification problems.
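This decomposition can be sketched with Scikit-learn; the texts and trait labels below are synthetic placeholders for the essays data, not real examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

TRAITS = ["EXT", "NEU", "AGR", "CON", "OPN"]

texts = ["I love big parties", "I stay home and worry",
         "people are kind to me", "I plan everything carefully"]
# One binary column per trait; rows align with `texts` (toy labels).
labels = {"EXT": [1, 0, 0, 0], "NEU": [0, 1, 0, 0], "AGR": [0, 0, 1, 0],
          "CON": [0, 0, 0, 1], "OPN": [1, 0, 0, 1]}

# One independent binary classifier per trait.
models = {}
for trait in TRAITS:
    clf = make_pipeline(CountVectorizer(), LogisticRegression())
    clf.fit(texts, labels[trait])
    models[trait] = clf

prediction = {t: int(models[t].predict(["I worry and stay home"])[0])
              for t in TRAITS}
```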
Starting from a simple bag-of-words model, we added as features the output of TextPro (Pianta et al., 2008) for the columns pos, chunk, entity, and tokentype.
The possible values of those columns are categorical variables that can be counted for each character in order to build features for the model.
Following the suggestion of Celli et al. (2013, 2014), our model was built as a pipeline incorporating both the bag-of-words model and the output of TextPro. We acknowledge that a lot of tweaking is possible to improve the performance of a model (such as building a different model for each trait, or using different features or classifiers), but that was not our primary aim.
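The pipeline idea can be sketched with a `FeatureUnion` joining two feature views; in the real model the second branch would count TextPro's pos/chunk/entity/tokentype tags, but since TextPro output is not reproduced here, a character n-gram counter stands in for it.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Two feature views concatenated side by side.
features = FeatureUnion([
    ("bow", TfidfVectorizer()),                                    # lexical view
    ("char", CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))),  # stand-in
])

model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(["happy cheerful bright text", "anxious worried gloomy text"], [1, 0])
prediction = model.predict(["cheerful bright day"])
```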

Testing the model
The OSS dataset lacked some features used in the training and testing of the original model. We solved the issue by adding the required features with an initial value of 0. Since the features are counts of occurrences or ratios, this operation is safe.
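The alignment step amounts to a dictionary fill with a default of 0; the column names below are hypothetical illustrations, not the actual feature names of the model.

```python
def align_features(row, train_columns):
    """Return `row` with every training-time column present; columns
    missing from the OSS side are filled with 0, which is safe because
    all features are occurrence counts or ratios."""
    return {col: row.get(col, 0) for col in train_columns}

train_columns = ["n_nouns", "n_verbs", "n_entities", "n_emoticons"]
oss_row = {"n_nouns": 12, "n_verbs": 9}   # e.g. no emoticons in Shakespeare
aligned = align_features(oss_row, train_columns)
```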
We briefly discuss a couple of models. Table 1 reports the results for a simple model that uses bag-of-words and extracts lemmas with some POS tagging. By adding the features obtained with TextPro, as described in subsection 2.3, we gained some score for most of the traits; on the weighted average of each trait our results are comparable to those reported by Celli et al. (2013).
Table 2 reports the best results. A deeper commentary on the feature engineering, and a comparison of the models, should give insight into the linguistic features related to personality. This requires further knowledge beyond the scope of the current work, and it leaves the path open to future explorations.

Results and Discussion
Since our models achieved state-of-the-art performance on a known dataset, we applied the classifier to the Shakespeare plays dataset. Table 3 reports the results for the most verbose speakers in a selected list of plays.
We do not have a gold-labelled dataset to compare against, but a quick look at the result table for the best-known (at least to us) characters of Shakespeare's plays reveals some traits in common between characters who are opposites, such as the protagonist and the antagonist. This is the case for "Hamlet", where the most verbose characters seem to have similar traits. We are glad that Portia and Antonio in "Merchant of Venice" display conscientiousness and Shylock neuroticism, as our shallow knowledge of the play reminds us.
A vertical look reveals low variability for the Agreeableness, Extraversion and Openness traits. Intuitively, we acknowledge that something strange is happening here: those traits are certainly related to self-expression, something that a character is forced to do in a play. A model with numerical scores instead of Boolean values would have offered some guidance here. In general we think there are several reasons for the drawbacks of the model; we detail them in the following paragraphs.
A possible explanation is that the Big Five model, and/or our implementation of it, does not capture the personality traits of the characters at the required level of detail; in other words, the model is too broad. The Big Five model does include a finer-grained sub-classification into "facets", but we do not know of any publicly available gold-standard dataset labelled at the facet level.
The idea that the Big 5 is quite limited is not new. For example, Paunonen and Jackson (2000) argue that the following categories are outliers of the Big 5: a) Religious, devout, reverent; b) Sly, deceptive, manipulative; c) Honest, ethical, moral; d) Sexy, sensual, erotic; e) Thrifty, frugal, miserly; f) Conservative, traditional, down-to-earth; g) Masculine-feminine; h) Egotistical, conceited, snobbish; i) Humorous, witty, amusing; j) Risk-taking or thrill-seeking.
Indeed, those categories seem more appropriate for describing some of the characters in Shakespeare's scripts.
The essays dataset does not match the OSS dataset along a number of dimensions. The most relevant ones that come to mind are purpose, grammar and mistakes, and language diachronicity. Purpose: an essay is stream of consciousness, written once by its author and probably never revised, while OSS is high-quality writing, mostly dialogue between people. Grammar and mistakes: we expect the OSS corpus to contain a low rate of spelling errors, and the formulation of the sentences should almost always be correct, given the nature of the corpus, though this remains to be checked; English grammar has certainly also changed, so additional caveats may apply. Language diachronicity: Shakespeare's English is not today's English, and to what extent this has an impact needs to be verified.
Personality traits are usually considered stable, but the need to create tension and drama in a play may imply some evolution in the personality of a character. A possible insight could come from a dispersion plot of the character's traits along the play, perhaps at different granularities (one point per utterance, one per scene); such a plot should highlight these changes.
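The per-scene series that would feed such a dispersion plot can be sketched as follows; the keyword scorer is a toy stand-in for a trained trait classifier, and the lexicon is invented for illustration.

```python
def neuroticism_score(text):
    """Toy per-scene score: fraction of worry-related words (hypothetical lexicon)."""
    lexicon = {"fear", "worry", "doubt", "grief"}
    words = text.lower().split()
    return sum(w in lexicon for w in words) / max(len(words), 1)

# One entry per scene for a single character; spikes or drifts in this
# series are what the dispersion plot would make visible.
scenes = ["I fear the ghost", "all is well tonight", "grief and doubt consume me"]
series = [neuroticism_score(s) for s in scenes]  # [0.25, 0.0, 0.4]
```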
Following Aristotle's rules (Aristoteles, 1998), the playwright may set the dramatic tone for an entire scene, and personality-trait coherence may be sacrificed for it. The previously mentioned dispersion plot could show whether such changes align with the scenes or the acts. Since acts are usually situated in distant time settings, it is reasonable to expect a change in personality traits due to the development of the character.
Our assumption that the characters' words capture their personality traits may not always hold, especially in plays full of lies, alliances and betrayals. An additional analysis taking into account the person to whom the speech is directed may reveal that personality traits change in relation to the recipient.
There are certainly different ways to write and perform a play. We chose Shakespeare's scripts because they are a classical resource and rely heavily on dialogue, but discarding the actions may have caused a loss of information, hindering the discovery of personality traits. Indeed, a common piece of advice for newcomers to drama writing is to build characters by showing their actions.
Two different characters (e.g., a king and a jester) can say the same thing with a totally different meaning; what differentiates them is the reaction of the others on stage. A thorough model should also take this type of event into account. Intuitively, this device is used in comedies: two different characters say the same thing to a third person within a short span of time, so the second time the audience anticipates the utterance and is delighted by the effect. In general, a more detailed investigation is needed to highlight the different trait characterizations in comedies and tragedies.
In conclusion, the present work describes a first step towards capturing the personality traits of characters in literary texts, in particular in Shakespeare's theatre scripts. We discussed the results and the limitations of our approach, and envisioned possible mitigations and solutions that require further research.