Unravelling Names of Fictional Characters

In this paper we explore the correlation between the sound of words and their meaning, by testing if the polarity ('good guy' or 'bad guy') of a character's role in a work of fiction can be predicted by the name of the character in the absence of any other context. Our approach is based on phonological and other features proposed in prior theoretical studies of fictional names. These features are used to construct a predictive model over a manually annotated corpus of characters from motion pictures. By experimenting with different mixtures of features, we identify phonological features as being the most discriminative by comparison to social and other types of features, and we delve into a discussion of specific phonological and phonotactic indicators of a character's role's polarity.


Introduction
Could it be possible for fictional characters' names such as 'Dr. No' and 'Hannibal Lecter' to be attributed to positive characters, and names such as 'Jane Eyre' and 'Mary Poppins' to negative ones? Could someone guess who is the hero and who is the antagonist based only on the names of the characters, and what factors would contribute to such an intuition? Literary theory suggests that it should be possible, because fictional character names function as expressions of experience, ethos, teleology, values, culture, ideology, and attitudes of the character.
However, work in literary theory, psychology, linguistics and philosophy has studied fictional names by analysing individual works or small clusters of closely related works, such as those of a particular author. By contrast, we apply tools from computational linguistics at a larger scale, aiming to identify more general patterns that are not tied to any specific creator's idiosyncrasies and preferences, in the hope that extracting such patterns can provide valuable insights about how the sound of names and, more generally, words correlates with their meaning.
At the core of our approach is the idea that the names of fictional characters follow (possibly subconsciously) a perception of what a positive or a negative name ought to sound like that is shared between the creator and the audience. Naturally the personal preferences or experiences of the creator might add noise, but fictional characters' names will at least not suffer (or suffer less) from the systematic cultural bias bound to exist in real persons' names.
In the remainder of this paper, we first present the relevant background, including both theoretical work and computational work relevant to people's names (Section 2). Based on this theoretical work, we then proceed to formulate a set of features that can be computationally extracted from names, and which we hypothesise to be discriminative enough to allow for the construction of a model that accurately predicts whether a character plays a positive or negative role in a work of fiction (Section 3). In order to test this hypothesis, we constructed a corpus of characters from popular English-language motion pictures. After describing corpus construction and presenting results (Section 4), we proceed to discuss these results (Section 5) and conclude (Section 6).

Onomastics
The procedure of naming an individual, a location or an object is of particular importance and serves purposes beyond the obvious purpose of referring to distinct entities. Characteristics such as place of origin, gender, and socioeconomic status can often be guessed from the name or nickname that has been attributed to an individual. Onomastics, the study of the origin, history, and use of proper names, has attracted scholarly attention as early as antiquity and Plato's 'Cratylos' (Hajdú, 1980).
In fiction and art, in particular, names are chosen or invented without having to follow the naming conventions that are common in many cultures. This allows creators to apply other criteria in selecting a name for their characters, one of which is the intuitions and preconceptions about the character that the name alone implies to the audience. Black and Wilcox (2011) note that writers take informed and careful decisions when attributing names to their characters. Specifically, while care is taken to have names that are easily identifiable and phonologically attractive, or that are important for personal reasons, these are not the only considerations: names are chosen so that they match the personality, the past, and the cultural background of a character.
According to Algeo (2010), behind each name lies a story, while Ashley (2003) suggests that a literary name must be treated as a small poem with all the wealth of information that implies. Markey (1982) and Nicolaisen (2008) raised concerns on whether onomastics can be applied to names in art given the different functional roles of names as well as their intrinsic characteristics, namely sensitivity and creativity. 'Redende Namen' (significant names) is a widespread theory that seeks the relationship between name and form (Rudnyckyj, 1959). According to this theory, there is a close relationship between the form of a name and its role. This consideration is still prevalent to date, as shown by Chen (2008) in her analysis of names in comic books, where names transparently convey the intentions of the creator for the role of each character. Another concern is whether literary names should be studied individually for each creative work or if generalizations can be made (Butler, 2013). However, the scope of most studies is limited to individual projects or creators, creating an opportunity for computational methods that can identify generalizations and patterns across larger bodies of literary work than what is manually feasible.

Related Work
Although serving radically different purposes and applications than our investigation, various methods for the computational analysis of proper nouns have been developed in natural language processing. Without a doubt, some of the oldest and most mature technologies that exploit the properties of proper nouns are those addressing named entity recognition and categorization (NERC). In this direction, there is a recent, ongoing effort to extend NERC tools so that they cover the needs of literary texts (Borin et al., 2007; Volk et al., 2009; Kokkinakis and Malm, 2011).
Moving beyond recognition, effort has been made to explore characteristics and relationships of literary characters (Nastase et al., 2007). Typically, however, these efforts take advantage of the context, and very little work tries to extract characteristics of literary characters from their names alone. One example is the application of language identification methods in order to extract the cultural background of proper names (Konstantopoulos, 2007; Bhargava and Kondrak, 2010; Florou and Konstantopoulos, 2011). This work showed that people's names in isolation are more amenable to language identification than common nouns. Konstantopoulos (2007), in particular, reports inconclusive results at pinpointing the discriminative features that are present in people's names but not in other words.
Another relatively recent and related research direction that does not focus on proper nouns investigates elements of euphony, mostly by examining phonetic devices. The focus is to identify how the sound of words can foster their effectiveness in terms of persuasion (Guerini et al., 2015) or memorability (Danescu-Niculescu-Mizil et al., 2012).

Approach
These earlier attempts relied on the examination of predictive models of n-grams in order to identify the n-grams that are the best discriminants. The aim was that by inspecting these most discriminative n-grams, meaningful patterns would emerge and serve as the vehicle for formulating hypotheses about the correlation between what names sound like and the cultural background of the persons bearing them.
These approaches largely ignored the background in onomastics and literary research. By contrast, we exploit this prior body of theoretical work to define more sophisticated features that directly correspond to theoretical hypotheses. Our empirical experiments are now aimed at identifying the features (and thus hypotheses) that are the most discriminative, rather than at hoping that a coherent hypothesis can be formulated by observing patterns in n-gram features.
In the remainder of this section, we will present these hypotheses and the machine-extracted features that reflect them. The features are also collected in Table 1.

Emotions
Hypothesis 1 The (positive or negative) polarity of the sentiment that a character's name evokes is associated with the polarity of the character's role.
The understanding of how language transmits emotions has attracted significant research attention in the field of Computational Linguistics. Most of the relevant literature is directed towards calculating sentiment for units at the document or sentence level. These works are usually supported by semantic dictionaries that provide information about the emotional hue of concepts, such as the Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001), the Harvard General Inquirer (Stone et al., 1966), WordNet Affect (Strapparava and Valitutti, 2004) and SentiWordNet (Esuli and Sebastiani, 2006). In our task, the absence of context and the inherent arbitrariness in naming (even in fictional names) increase the difficulty of attributing emotional quality to names. More specifically, the intriguing part was to associate fictional names with concepts from a semantic sentiment resource in order to approximate a sentiment value. To achieve this we used SentiWordNet: a linguistic resource derived from the annotation of WordNet synsets according to their estimated degree of positive, negative or neutral hue. The overall valence for a given name is calculated as the sum of the valence of its elements (first name, surname). The valence of each name element is the average valence of all SentiWordNet concepts that are associated with it. To associate a name element and a SentiWordNet concept we used the Soundex phonetic distance and the Levenshtein lexicographic distance (Levenshtein, 1966). A heuristic threshold is used to decide whether a name and a SentiWordNet concept are associated.
More formally, the valence val(n) of a name n comprising name elements e_i is calculated as follows:
$$\mathrm{val}(n) = \sum_{i} \frac{1}{\left|\mathrm{ass}_S(e_i) \cup \mathrm{ass}_L(e_i)\right|} \sum_{c \,\in\, \mathrm{ass}_S(e_i) \cup \mathrm{ass}_L(e_i)} \mathrm{swn}(c)$$
where ass_S(·) is the set of SentiWordNet concepts that are Soundex-associated with the given name element, ass_L(·) the set of SentiWordNet concepts that are Levenshtein-associated with the given name element, and swn(·) the valence assigned to the given concept by SentiWordNet.
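As an illustration of the valence computation described above, the following Python sketch averages the valence of the lexicon entries associated with each name element and sums over the elements. It is not the authors' implementation: the tiny TOY_LEXICON is a hypothetical stand-in for SentiWordNet, association is assumed to mean an identical Soundex code or a Levenshtein distance of at most max_edits, the two association sets are assumed to be combined by union, and an element without any associated concept is assumed to contribute zero.

```python
from statistics import mean

# Hypothetical stand-in for SentiWordNet: word -> valence in [-1, 1].
TOY_LEXICON = {"pan": 0.0, "pain": -0.625, "grim": -0.5, "joy": 0.75, "merry": 0.5}


def soundex(word: str) -> str:
    """Return the American Soundex code (a letter and three digits)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    out = word[0]
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        digit = codes.get(ch, "")
        if digit and digit != prev:
            out += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def associated(element: str, lexicon: dict, max_edits: int = 1) -> set:
    """Lexicon entries associated with a name element (assumed: union of both sets)."""
    element = element.lower()
    by_soundex = {w for w in lexicon if soundex(w) == soundex(element)}
    by_edits = {w for w in lexicon if levenshtein(w, element) <= max_edits}
    return by_soundex | by_edits


def valence(name: str, lexicon: dict = TOY_LEXICON) -> float:
    """Sum over name elements of the average valence of associated entries."""
    total = 0.0
    for element in name.split():
        concepts = associated(element, lexicon)
        if concepts:  # assumption: elements without associations add nothing
            total += mean(lexicon[c] for c in concepts)
    return total
```

With this toy lexicon, 'Mary' is associated with 'merry' through an identical Soundex code, while 'Poppins' matches nothing and therefore leaves the total unchanged. A real threshold would have to be tuned against SentiWordNet itself.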

Stylistic and poetic features
Hypothesis 2 Following Ashley's (2003) and Butler's (2013) position that 'a name can be a whole "poem" in as little as a single word', we assume that stylistic features usually found in poems can be extracted from the names of fictional characters, and that such features correlate with the polarity of their roles.
The first quantitative analyses of poetic style date to the 1940s and to the work of the poet and literary critic Josephine Miles (1946; 1967), who studied the features of poems over time. Despite the great contribution of this work and others that followed, the creation of a framework for quantitative poetic style analysis remained limited to a small number of poems and much of the work was done manually. The work of Kaplan and Blei (2007) is an attempt to automate the analysis of large volumes of poems by exploring phonological, spelling and syntax features. For our work, we identified the following poetic devices that can be applied to isolated names:
• Alliteration: a stylistic literary device identified by the repeated sound of the first consonant in a series of multiple words, or the repetition of the same sounds or of the same kinds of sounds at the beginning of words or in stressed syllables of a phrase. Examples: Peter Parker, Peter Pan.
Linguistic theory widely adopts the concept of an arbitrary relationship between the signifier and the signified (de Saussure, 1916; Saussure, 1983; Jakobson, 1965). However, an increasing volume of work in various fields investigates the existence of non-arbitrary relations between phonological representation and semantics, a phenomenon known as phonological iconicity. Approaching the question from the side of Computational Linguistics, and with the intuition that in fictional names the correlation between a word's form and the emotion it expresses will be stronger, we examined a wide range of phonology-related features, shown in Table 1. It should be noted that these features are extracted from the phonetic representation of names derived by applying the spelling-to-phoneme module of the espeak speech synthesizer.
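Returning to the alliteration device defined above, a minimal Python sketch of how it could be detected on an isolated name is given here. It is an orthographic approximation under stated assumptions: it compares the first written letter of each name element, whereas the features in this paper are computed on a phonetic representation. The function names are hypothetical.

```python
def initial_consonant(word: str) -> str:
    """Return the lower-cased first letter if it is a consonant, else ''."""
    first = word[:1].lower()
    return first if first.isalpha() and first not in "aeiou" else ""


def is_alliterative(name: str) -> bool:
    """True if the name has several elements that all start with the same consonant."""
    initials = [initial_consonant(w) for w in name.split()]
    return len(initials) > 1 and initials[0] != "" and len(set(initials)) == 1
```

Applied to the examples in the text, both 'Peter Parker' and 'Peter Pan' are flagged, while a single-element name never is. A phonetic variant would compare initial phonemes instead, so that spellings such as 'K' and 'C' could match.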

Sociolinguistic features
Hypothesis 4 We hypothesize that social aspects of names, such as frequency of use or use of foreign names in a given environment, can relate to the role of a fictional character. For instance, a 'girl next door' role is more likely to be assigned a very popular female name than a name that sounds hostile or foreign.
The frequency of names in the U.S.A. was calculated based on the Social Security Death Index (SSDI), a publicly available database that records deaths of U.S. citizens since 1936. The same dataset was also used to build a model for recognizing foreign-looking names. More specifically, we trained n-gram language models of order 2-5 against the dataset for both the orthographic and the phonetic representation using the berkeleylm library (Pauls and Klein, 2011). We then heuristically defined a threshold that correlates well with foreign-looking suffixes. Analogously to the name frequency, we extract the gender of each name using a baby names dataset that includes gender information. For unisex names the prevalent gender was picked. Finally, honorific titles (e.g., Professor, PhD, Mr, Mrs) were also extracted from names. Honorific titles are intriguing due to their ambiguous meaning, since they can express respect or irony in different contexts.
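To make the idea of scoring foreign-looking names with an n-gram language model concrete, here is a small Python sketch. It does not reproduce berkeleylm or the authors' setup: it assumes a character trigram model with add-one smoothing, trained on a toy list of names instead of the SSDI, and both the class name CharNgramModel and the threshold value are hypothetical.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character n-gram model with add-one smoothing (illustrative only)."""

    def __init__(self, names, order=3):
        self.order = order
        self.ngrams = Counter()    # counts of context + next character
        self.contexts = Counter()  # counts of context alone
        self.alphabet = {"$"}      # "$" marks the end of a name
        for name in names:
            self.alphabet.update(name.lower())
            padded = self._pad(name)
            for i in range(order - 1, len(padded)):
                context = padded[i - order + 1:i]
                self.ngrams[context + padded[i]] += 1
                self.contexts[context] += 1

    def _pad(self, name):
        return "^" * (self.order - 1) + name.lower() + "$"

    def avg_logprob(self, name):
        """Average smoothed log-probability per predicted character."""
        padded = self._pad(name)
        vocab = len(self.alphabet)
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            context = padded[i - self.order + 1:i]
            numerator = self.ngrams[context + padded[i]] + 1
            denominator = self.contexts[context] + vocab
            total += math.log(numerator / denominator)
        return total / (len(padded) - self.order + 1)


def looks_foreign(model, name, threshold=-2.0):
    """Flag names whose score falls below a heuristic (hypothetical) threshold."""
    return model.avg_logprob(name) < threshold
```

A name built from familiar character sequences receives a higher average log-probability than one built from unseen sequences, and the threshold turns that score into a binary feature. In practice the threshold would be tuned on held-out names, as the text describes.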

Domain features
Hypothesis 5 We check whether domain-related features, such as the appearance time of a character in a movie, the movie title or the movie genre, correlate with the polarity of the character's role.
In this category lies the feature sameastitle, since anyone glancing through a list of films would notice that a fictional name often constitutes, or is part of, the movie title, as in There's Something about Mary, Hannibal, Thelma & Louise, Rocky, etc. On IMDB, character names are presented in the form of a list in descending order based on screen credits. With the feature creditindex we want to check, based on this list, if the naming process is more assiduous for the roles of protagonists. In the same direction, we examine the feature genre for a possible correlation between the role of a character and the genre of a film.
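The three domain features described above could be derived along the following lines. This is a sketch under assumptions rather than the authors' extraction code: the function name and its arguments are hypothetical, sameastitle is taken to mean that the name shares at least one token with the title, and genre is taken to be the first genre listed for the film.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def domain_features(character: str, credit_position: int, title: str, genres: list) -> dict:
    """Build the sameastitle, creditindex and genre features for one character."""
    return {
        # True if any element of the name also appears in the movie title
        "sameastitle": bool(tokens(character) & tokens(title)),
        # 1-based position of the character in the credits list
        "creditindex": credit_position,
        # assumption: use the first listed genre when a film has several
        "genre": genres[0] if genres else None,
    }
```

For instance, a character named 'Mary' in There's Something about Mary yields sameastitle = True. The credit position and genre values passed in are whatever the source page lists.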

Data Collection and Annotation
In order to validate our approach, we first need a corpus of names of fictional characters, annotated with the polarity of their role. As such a resource does not exist to the best of our knowledge, we have created it for the purposes of the work described here.
Our decision to use motion pictures rather than other fictional work is motivated by the relative ease of finding annotators familiar with the plot of these works, so that we could get reliable annotations of the polarity of the leading roles. We compiled a list of 409 movies based on the following criteria:
• That they are widely known films, covering all genres of film production. We automatically cross-checked if the candidate movies are included in DBPedia and YAGO, as these are indicators that the films are known to the general public.
• That they have received some award or are positively evaluated by users (i.e., have an IMDB rating of 5.0 or higher). The underlying assumption is that this criterion selects major productions where care has been given to even the most minute detail, including the names of the major characters and what these names connote to the audience.
• That they are recent productions, so that annotators can easily recall the plot and the characters.
We then asked volunteers to select any movie from the list that they were very familiar with, and to assign one of positive, negative or neutral to the top-most characters in the credits list, working only as far down the credits list as they felt confident to. The three categories were defined as follows in the annotation guidelines:
• Negative: when the role of a character left a negative impression on you when you saw the movie.
• Neutral: when the role of the character is important for the plot, but you are in doubt or cannot recall whether it was a positive or a negative role.
Neutral tags are ignored in our experiments. They were foreseen only to allow annotators to skip characters and still have a sense of accomplishment, so that they only make choices that they are confident with. We used the Hypothes.is open source annotation application. The annotation was carried out by having volunteers install the Hypothes.is Web browser extension and then visit the IMDB page of any of the movies on our list (direct links were provided to them in the guidelines). IMDB was chosen due to its popularity, so that annotators would already be familiar with the online environment. The annotators tagged the character names directly on the IMDB page and the annotations were collected for us by Hypothes.is (Figure 1).
Eight annotators participated in the procedure and provided 1102 positive and 434 negative tags for characters of 202 movies, out of the 409 movies in the original list. Table 2 gives the annotation distribution per movie genre.
The reliability of the annotated collection by means of inter-rater agreement was also measured. For this purpose, various standard agreement measures (Meyer et al., 2014) were calculated, all showing very high agreement among the annotators (Table 3). This demonstrates that the annotation task is well-formulated, but does not guarantee that our classification task is consistent, since the latter will use different information than that used by the annotators. That is to say, the annotators had access to their understanding of the movies' plot to carry out the task, whereas our classification task will be performed over the characters' names alone.
Table 3: Inter-annotator agreement
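As a reminder of what such agreement measures compute, the sketch below implements Cohen's kappa for two annotators in Python. It is offered only as one example of a standard measure; the text does not state which measures appear in Table 3, and the function name and the toy labels are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently at their own rates
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and close to 0 when the annotators agree no more often than chance would predict. With more than two annotators, a multi-rater measure such as Fleiss' kappa or Krippendorff's alpha would be the usual choice.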
The collection is publicly available (https://bitbucket.org/dataengineering/fictionalnames), including the guidelines and instructions to the annotators, the source code for the annotation tool, and the source code for the tool that compiles Weka ARFF files from the JSON output of the annotation tool.

Experimental Design
The experimental design consisted of an iterated approach performing experiments with different sets of features. This process was driven by a preliminary chi-squared analysis of feature significance. The algorithms used for the experiments are Naive Bayes and J48 (Salzberg, 1994) decision trees. Each experiment is done using 10-fold cross-validation on the available data; for J48, a confidence factor of 0.25 is used for post-pruning. For all the experiments we used the Weka toolkit (Hall et al., 2009). Due to the imbalance of our dataset in favor of the positive class (see Table 2), we sub-sampled the dataset while maintaining the initial genre distribution. We also applied principal component analysis (PCA) in order to decorrelate the classification features, since the Naive Bayes algorithm assumes that features are independent. To explore how the algorithms respond to changes in the amount of training data, we generated the learning curves shown in Figure 2. In both cases the learning curves are well-behaved, since the error rate grows monotonically as the training set shrinks. However, the precision, recall, and F-scores achieved by J48 are significantly better than those of Naive Bayes (Table 4). This preliminary experiment led us to use J48 for the main experiment, where we try different features in order to understand which are the most discriminative ones. These results are collected in Table 5 and discussed immediately below.
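The sub-sampling step mentioned above can be pictured with the following Python sketch. The text does not specify the exact procedure, so this is one plausible reading stated as an assumption: the majority class is reduced to roughly the size of the minority class by sampling the same fraction of majority examples within every genre, which keeps the genre proportions of the majority class unchanged. The function name and the row layout are hypothetical.

```python
import random
from collections import defaultdict


def subsample_majority(rows, majority="positive", seed=0):
    """Downsample the majority class per genre; rows are (name, genre, label)."""
    rng = random.Random(seed)
    major = [r for r in rows if r[2] == majority]
    minor = [r for r in rows if r[2] != majority]
    if not major:
        return list(rows)
    # Fraction of majority examples to keep so that the classes end up balanced
    ratio = min(1.0, len(minor) / len(major))
    by_genre = defaultdict(list)
    for row in major:
        by_genre[row[1]].append(row)
    kept = []
    for genre in sorted(by_genre):
        items = by_genre[genre]
        kept.extend(rng.sample(items, round(len(items) * ratio)))
    return kept + minor
```

Because the same fraction is kept in every genre, rounding can leave the two classes a few examples apart. The fixed seed only makes the sketch reproducible.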

Discussion of Results
A first observation that can be easily made is that the domain features are good discriminants. As these features exploit information such as crediting order that is outside the scope of our hypotheses, they were expected to be good discriminants and are included for comparison only.
By comparing the performance of all features (F = 82%), domain-only features (F = 68%), and all-except-domain features (F = 80%), we can immediately understand that our name-intrinsic features are better discriminants than domain features; in fact, name-intrinsic features are not just better than domain features, they are by themselves almost as good as domain and name-intrinsic features combined. This is a significant finding, as it validates our core hypothesis that there is a correlation between what fictional character names look and sound like and the role they play in the plot of the fictional work they appear in.
We will now proceed to look in more detail into the different categories of features used, in order to gain further insights about specific discriminants.

Phonological Features
The phonological features are important separation criteria, as evidenced by the drop in performance when they are excluded from the experimental setup (Table 5). Specifically, using all features except phonological features is equivalent to using phonological features alone (about F = 79% in both cases) and slightly worse than using all name-intrinsic features (about F = 80%). By comparison, removing any other category increases performance, leading us to believe that all other features are actually adding noise (rather than discriminatory power) to the feature space.
In order to delve more into this category of features, we proceeded with an n-gram analysis (of order 1 through 4) to look for correlations between phonemes. The results clearly demonstrated a positive association between the number of vowels (normalized by the length of the utterance) and the positive class. As far as the consonants are concerned, voiced consonants (e.g., /2/, /g/, /d/, /w/) seem to relate more to the negative class. Table 7 summarizes a more fine-grained analysis of the consonants based on their categorization.
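The two quantities discussed above, the normalized number of vowels and phoneme n-grams of order 1 through 4, can be sketched in Python as follows. The sketch works on written letters as a stand-in for phonemes, which is an assumption made to keep it self-contained, and all function names are hypothetical.

```python
from collections import Counter

VOWELS = set("aeiou")  # orthographic stand-in for the vowel phonemes


def vowel_ratio(name: str) -> float:
    """Number of vowels divided by the number of letters in the name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)


def starts_with_vowel(name: str) -> bool:
    """True if the first letter of the name is a vowel."""
    return name[:1].lower() in VOWELS


def ngram_counts(symbols, max_order=4):
    """Count all contiguous n-grams of order 1..max_order in a symbol sequence."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(symbols) - n + 1):
            counts[tuple(symbols[i:i + n])] += 1
    return counts
```

Passing a list of phoneme symbols instead of letters to ngram_counts gives the phoneme n-grams directly. Comparing the counts between the positive and the negative class is then a matter of aggregating them per label.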
The environment plays an important role, with specific combinations showing tendencies that are not observed with isolated phonemes. For example, the diphoneme /an/ relates to the positive class while /@n/ relates to the negative class. Table 6 lists some frequent phoneme 2- and 3-gram examples. The position of each phoneme also seems to play a crucial role in the classification task. Specifically, we note that whether a name starts with a vowel or with a consonant is among the most discriminating features. These observations are consistent to a great extent with work in psychology and literary theory that studied phonological iconicity for common words (Nastase et al., 2007; Auracher et al., 2011; Schmidtke et al., 2014). Some contradictory conclusions in these works are attributed by researchers to the methodologies applied, while at the same time concerns are raised whether such methodologies can inductively lead to cross-language and general conclusions (Auracher et al., 2011). Table 8 summarizes some of the outcomes of these works.
Phonemes                            Class
/p/, /b/ (bilabial plosive)         P
/l/ (alveolar lateral)              P
/f/, /v/ (labiodental fricative)    N
/k/, /g/ (velar plosive)            N
/t/, /d/ (alveolar plosive)         N
/dZ/, /tS/ (affricate)              N
/m/, /n/ (nasal)                    N
/ɹ/ (alveolar retroflex)            N
Table 7: Consonants behavior

Emotion and Affect
The analysis showed that the features that calculate the emotional load of fictional names based on SentiWordNet contribute to the classification task. However, we believe that there is still room for improving this feature, mainly by optimizing the selection threshold in order to reduce the number of false positive matches, as well as by adding more lexical resources, for example WordNet Affect or LIWC.

Social Features
The annual publication It's a Man's (Celluloid) World examines the representation of female characters in film. According to its 2015 results (Lauzen, 2015), gender stereotypes were abundant, with female characters being younger than their male counterparts and more likely to have prosocial goals including supporting and helping others. This bias makes the gender feature discriminative, but in a way that is not linguistically interesting: female characters are simply related to the positive class.
A somewhat surprising result was that the foreign suffix feature is not discriminative. The hypothesis that the concept of the 'other' is stereotyped negatively does not seem to be true in our dataset. A closer investigation might identify genres where this hypothesis holds (e.g., war movies), but this would be implicit pragmatic information about the context of the film rather than a linguistically interesting finding.
Reference                 Description
Taylor and Taylor (1965)  evidence that pleasantness relations are language specific
Fonagy (1961)             sonorants (e.g., /l/, /m/) more common in tender poems, plosives (e.g., /k/, /t/) in aggressive ones
Miall (2001)              passages about Hell from Milton's "Paradise Lost" were found to contain significantly more front vowels and hard consonants than passages about Eden, while the latter contained more medium back vowels
Whissell (1999)           plosives correlate with unpleasant words
Auracher et al. (2011)    nasals (e.g., /m/) relate to sadness, plosives (e.g., /p/) to happiness, parallels across remote languages
Zajonc et al. (1989)      umlaut /y/ causes negative affective states
Table 8: Outcomes of prior work on phonological iconicity

Poetic and Stylistic Features
The experimental findings show that literary devices can actually be identified in fictional characters' names, but the same findings also indicate that they do not contribute significantly to the classification task. More specifically, consonance is the only stylistic/poetic feature that affects classification.

Conclusions and Future Work
In this paper we test the hypothesis that the sound and the form of fictional characters' names correlate with meaning, in our particular case with the respective characters' role in the work of fiction. We restricted our study to fictional characters since they are not tied to cultural conventions of naming, such as names that run in a family, so that we are able to look for patterns that are perceived as positive or negative by the audience and used as such (consciously or not) by the creator.
Our experiments have verified that features intrinsic to the names and without any reference to the plot or, in general, any other context are discriminative. Furthermore, we have discovered that the most discriminative features are of phonological nature, rather than features that hint at pragmatic information such as the gender or origin of the character. A further contribution of our work is that we ran an annotation campaign and created an annotated corpus of fictional movie characters and their corresponding polarity. This corpus is offered publicly, and can serve experimentation in the digital humanities beyond the scope of the experiments presented here.
Our future research will test the correlation between the polarity and the name of a fictional character beyond the movie domain. It would, for example, be interesting to seek differences between spoken names (as in films) and names that are only meant to be read (as in literature). In addition, using written literature will allow us to compare texts from different periods, pushing earlier than the relatively young age of motion pictures. Character polarity annotations in written literature could be created by, for example, applying sentiment analysis to the full text of the work.