Towards an Entertaining Natural Language Generation System: Linguistic Peculiarities of Japanese Fictional Characters

One of the key ways of making dialogue agents more attractive as conversation partners is characterization, as it makes the agents more friendly, human-like, and entertaining. To build such characters, utterances suitable for the characters are usually manually prepared. However, it is expensive to do this for a large number of utterances. To reduce this cost, we are developing a natural language generator that can express the linguistic styles of particular characters. To this end, we analyze the linguistic peculiarities of Japanese fictional characters (such as those in cartoons or comics and mascots), which have strong characteristics. The contributions of this study are that we (i) present comprehensive categories of linguistic peculiarities of Japanese fictional characters that cover around 90% of such characters’ linguistic peculiarities and (ii) reveal the impact of each category on characterizing dialogue system utterances.


Introduction
One of the key ways of making dialogue agents more attractive as conversation partners is characterization, as it makes the agents more friendly, human-like, and entertaining. Especially in Japan, fictional characters (such as those in cartoons or comics and mascots) are very popular. Therefore, vividly characterized dialogue agents are strongly desired by customers.
To characterize agents, utterances suitable for them are usually manually prepared. However, it is expensive to do this for a large number of utterances. To reduce this cost, we have previously proposed two methods for automatically converting functional expressions into those that are suitable for given personal attributes such as gender, age, and area of residence (Miyazaki et al., 2015) and closeness with a conversation partner (Miyazaki et al., 2016). However, when it comes to expressing the linguistic styles of individual fictional characters whose characteristics should be vividly expressed, these methods do not have sufficient expressive power, because they can convert only function words and cannot convert content words such as nouns, adjectives, and verbs. As the first step in developing a natural language generator that can express the linguistic styles of fictional characters, in this work, we analyze the linguistic peculiarities of fictional characters such as those in cartoons or comics and mascots, which have strong characteristics.
* Presently, the author is with NTT Communications Corporation.
† Presently, the author is with Nippon Telegraph and Telephone East Corporation.
The contributions of this study are that we (i) present comprehensive categories of the linguistic peculiarities of Japanese fictional characters that cover around 90% of the fictional characters' linguistic peculiarities and (ii) reveal the impact of each category on characterizing dialogue system utterances.
Note that although we use the term 'utterance', this study does not involve acoustic speech signals. We use this term to refer to a certain meaningful fragment of colloquial written language.

Related work
In the field of text-to-speech systems, there have been various studies on voice conversion that modifies a speaker's individuality (Yun and Ladner, 2013). However, in the field of text generation, relatively few studies have addressed the characterization of dialogue agent utterances.
In the field of text generation, there is a method that transforms individual characteristics in dialogue agent utterances using a method based on statistical machine translation (Mizukami et al., 2015). Other methods convert functional expressions into those that are suitable for a speaker's gender, age, and area of residence (Miyazaki et al., 2015) and closeness with a conversation partner (Miyazaki et al., 2016). However, these methods handle only function words or have difficulty in altering other expressions. In this respect, we consider these methods to be insufficient to express a particular character's linguistic style, especially when focusing on fictional characters whose individualities should be vividly expressed.
There have also been several studies on natural language generation that can adapt to speakers' personalities. In particular, a language generator called PERSONAGE that can control parameters related to speakers' Big Five personalities (Mairesse and Walker, 2007) has been proposed. There is also a method for automatically adjusting the language generation parameters of PERSONAGE by using movie scripts (Walker et al., 2011) and a method for automatically adjusting the parameters so that they suit the characters or stories of role playing games (Reed et al., 2011). However, some aspects of linguistic style that are essential to expressing a particular character's style have no existing PERSONAGE parameter that can manifest them.
In the present work, we focus on the languages of fictional characters such as those in cartoons or comics and mascots. By analyzing the languages of such characters, we reveal what kind of linguistic peculiarities are needed to express a particular character's linguistic style. Each category of linguistic peculiarities is explained in detail below.

Lexical choice
We consider that lexical choice, which refers to choosing words to represent intended meanings, reflects the supposed speakers' gender, region-specific characteristics, personality, and so on. In terms of lexical choice, we utilize the following two categories.

P1: Personal pronouns
It is said that personal pronouns are one of the most important components of Japanese role language, which is character language based on social and cultural stereotypes (Kinsui, 2003). Japanese has "multiple self-referencing terms such as watashi 'I,' watakushi 'I-polite,' boku 'I-male self-reference,' ore 'I-blunt male self-reference,' and so on" (Maynard, 1997). Accordingly, if a character uses ore in his utterance, the reader can easily tell that the character is male, that the utterance is probably made in a casual (less formal) situation, and that his personality might be rather blunt and rough. As with first-person pronouns, there are various terms for referring to the second person.

P2: Dialectical or distinctive wordings
We assume that using dialectical wordings in characters' utterances not only reinforces the region-specific characteristics of the characters but also makes the characters more friendly and less formal. It is also said that "regional dialect is a significant factor in judging personality from voice" (Markel et al., 1967). In addition to dialects, the languages of Japanese fictional characters often involve character-specific coined words. The words are, so to speak, 'character dialect.' For example, for the character of a bear (kuma in Japanese), we observed that the word ohakuma is used instead of ohayoo 'good morning.'

Modality
We consider that modality, which refers to a speaker's attitude toward a proposition, reflects the supposed speakers' friendliness or closeness to their listeners, personality, and so on. As for modality, we have the following two categories.

P3: Honorifics
We consider that honorifics have a significant effect on describing speakers' friendliness or closeness to their listeners and on the speakers' social status. Depending on the social, psychological, and emotional closeness between a speaker and a listener, and whether the situation is formal or casual, Japanese has five main choices of honorific verb forms: plain-informal (kuru 'come'), plain-formal (kimasu 'come'), respectful-informal (irassharu 'come'), respectful-formal (irasshaimasu 'come'), and humble-formal (mairimasu 'come') (Maynard, 1997).
Although English does not have such honorific verb forms, it does have linguistic variations corresponding to the honorifics; for example, it is said that "Americans use a variety of expressions to convey different degrees of formality, politeness, and candor" (Maynard, 1997).
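The five Japanese verb forms listed above can be arranged as a small lookup table. The following is a minimal sketch, not part of the study itself: the table name, the key labels, and the function `honorific_form` are hypothetical, and only the verb 'come' from the example is covered.

```python
# Hypothetical lookup of the five honorific forms of 'come' cited from Maynard (1997).
# Keys are (respect level, formality); the labels are illustrative assumptions.
HONORIFIC_FORMS = {
    ("plain", "informal"): "kuru",
    ("plain", "formal"): "kimasu",
    ("respectful", "informal"): "irassharu",
    ("respectful", "formal"): "irasshaimasu",
    ("humble", "formal"): "mairimasu",
}


def honorific_form(respect: str, formality: str) -> str:
    """Return the romanized form of 'come' for the requested style."""
    try:
        return HONORIFIC_FORMS[(respect, formality)]
    except KeyError:
        # Only five combinations are listed in the text (no humble-informal form).
        raise ValueError(f"no form listed for {respect}-{formality}") from None


print(honorific_form("respectful", "formal"))  # irasshaimasu
```

A converter for category P3 would need such a table for every verb, plus rules for choosing the level from the character profile; neither is specified here.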

P4: Sentence-end expressions
Sentence-end expressions are a key component of Japanese character language, as are personal pronouns (Kinsui, 2003). For example, there are sentence-end expressions that are dominantly used by female characters. We also consider that sentence-end expressions are closely related to speakers' personalities, since the expressions contain elements that convey speakers' attitudes.
We define a sentence-end expression as a sequence of function words that occurs at the end of a sentence. Japanese sentence-end expressions contain interactional particles (Maynard, 1997), which express speaker judgment and attitude toward the message and the listener. For example, ne (an English counterpart would be 'isn't it?') occurs at the end of utterances. In addition, Japanese sentence-end expressions contain auxiliary verbs (e.g., mitai 'like' and souda 'it seems'), which express speaker attitudes.
Some of the expressions that fall into this category have their counterparts in the parameters of PERSONAGE (Mairesse and Walker, 2007). In particular, interactional particles such as ne could potentially be controlled by the TAG QUESTION INSERTION parameter, and auxiliary verbs such as mitai and souda could potentially be controlled by the DOWNTONER HEDGES parameter.

Syntax
We consider that syntax, which refers to sentence structures, reflects the supposed speakers' personality and maturity. With regard to syntax, we have just one category.

P5: Syntactic complexity
Syntactic complexity is considered to be characteristic of introverts, and it is also handled in PERSONAGE (Mairesse and Walker, 2007). In addition, we assume that syntactic complexity reflects the maturity of the supposed speakers. For example, the utterances of a character that is supposed to be a child would include more simple sentences than complex ones.

Phonology and pronunciation
We consider that phonology and pronunciation reflect the supposed speakers' age, gender, personality, and so on. As for phonology and pronunciation, we have three categories. What we want to handle are pronunciations reflected in written expressions.

P6: Relaxed pronunciations
Both English and Japanese have relaxed pronunciations, that is, pronunciation variants that are not normative and are usually easier, less effortful ways of pronouncing words. These relaxed pronunciations can often be observed as spelling variants. For example, in English, 'ya,' 'kinda', and 'hafta' can be used instead of 'you,' 'kind of', and 'have to', respectively. In Japanese, vowel alternation often occurs in adjectives; for example, alternation from ai to ee, as in itai to itee 'painful'. According to our observation, relaxed pronunciations are seen more often in the utterances of youngsters than older people and more often in males than females. We consider that relaxed pronunciations lend a blunt and rough impression to characters' utterances.
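The ai-to-ee alternation can be illustrated with a one-line rewrite rule. This is a minimal sketch under stated assumptions: it works on romanized text, it rewrites only a word-final ai, and the function name `relax_adjective` is hypothetical. It does not check that the input is actually an adjective.

```python
import re


def relax_adjective(romaji: str) -> str:
    """Rewrite a word-final 'ai' as 'ee' (assumed rule, e.g. itai -> itee)."""
    return re.sub(r"ai$", "ee", romaji)


print(relax_adjective("itai"))  # itee
```

A practical converter would need part-of-speech information to avoid rewriting words that merely end in ai.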

P7: Disfluency
In the utterances of some fictional characters, word fragments are often used to represent disfluent language production by the supposed speakers, for example, ha, hai 'Yes' and bo, bokuwa ga, gakusei-desu 'I am a student.' Such fragments are probably used to add hesitant characteristics to the characters. It is also said that "including disfluencies in speech leads to lower perceived conscientiousness and openness" (Wester et al., 2015).
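The word-fragment pattern in these examples can be sketched as a simple string operation. The code below is an illustration only: it assumes romanized, space-separated input and a fixed fragment length of two letters, and the names `stutter` and `add_disfluency` are hypothetical.

```python
def stutter(word: str, fragment_len: int = 2) -> str:
    """Prefix a word with its own initial fragment, e.g. 'hai' -> 'ha, hai'."""
    if len(word) <= fragment_len:
        return word  # too short to cut a fragment from
    return f"{word[:fragment_len]}, {word}"


def add_disfluency(utterance: str) -> str:
    """Apply the fragment prefix to every space-separated word."""
    return " ".join(stutter(word) for word in utterance.split())


print(stutter("hai"))                         # ha, hai
print(add_disfluency("bokuwa gakusei-desu"))  # bo, bokuwa ga, gakusei-desu
```

Applying the rule to every word is a simplification; choosing which words to fragment would be a design decision for the generator.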

P8: Arbitrary phoneme replacements
In addition to relaxed pronunciation, it is often observed that arbitrary phonemes are replaced by other arbitrary phonemes, especially in character languages. For example, every consonant 'n' can be replaced by 'ny' (e.g., nyaze nyakunyo instead of naze nakuno 'Why do you cry?'). This phenomenon does not occur in actual humans' utterances unless the speaker is joking. We consider that arbitrary phoneme replacements are utilized to give a funny impression to characters' utterances and to differentiate the linguistic styles of characters.
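The n-to-ny example can be reproduced with a regular expression. This is a minimal sketch on romanized text; restricting the replacement to an n that precedes a vowel is an assumption made here (so that a syllable-final n is left alone), and the function name `replace_n_with_ny` is hypothetical.

```python
import re


def replace_n_with_ny(romaji: str) -> str:
    """Replace 'n' with 'ny' when it is followed by a vowel (assumed condition)."""
    return re.sub(r"n(?=[aiueo])", "ny", romaji)


print(replace_n_with_ny("naze nakuno"))  # nyaze nyakunyo
```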

Surface options
Since we are handling written utterances, there are some options of how an utterance is presented as a sequence of letters and symbols. We consider that surface options are utilized as an easy way of characterizing utterances and differentiating the linguistic styles of characters.

P9: Word segmentation
In normative Japanese texts, unlike English texts, words are not segmented by spaces; rather, they are written adjacently to each other. However, in characters' utterances, it is sometimes observed that words or phrases are segmented by spaces or commas. When Japanese texts are read aloud, spaces and commas are often acknowledged with slight pauses, so we think that inserting extra spaces or commas between words has the effect of giving a slow and faltering impression to the characters' utterances.

P10: Letter type
In the Japanese writing system, there are three types of letters: logographic kanji (adopted Chinese characters), syllabic hiragana, and syllabic katakana. A combination of these three types is typically used in a sentence. Those who know a lot of rare kanji letters are often regarded as being well educated. In contrast, using too many syllabic hiragana letters in a text gives the text a very childish impression.
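Letter types can be told apart by Unicode code point, which gives a simple way to profile or alter the letter-type mix of an utterance. The sketch below is illustrative: the function names are hypothetical, the ranges cover only the basic hiragana, katakana, and CJK Unified Ideographs blocks, and only katakana-to-hiragana conversion is shown because converting kanji would require a reading dictionary.

```python
from collections import Counter


def letter_type(ch: str) -> str:
    """Classify one character by its Unicode block (basic ranges only)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x3096:
        return "hiragana"
    if 0x30A1 <= code <= 0x30FA:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def letter_type_counts(text: str) -> Counter:
    """Count how many characters of each letter type a text contains."""
    return Counter(letter_type(ch) for ch in text)


def to_hiragana(text: str) -> str:
    """Map katakana to hiragana; the two blocks are offset by 0x60."""
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch for ch in text
    )


counts = letter_type_counts("私はガクセイです")
print(counts["kanji"], counts["hiragana"], counts["katakana"])  # 1 3 4
print(to_hiragana("ガクセイ"))  # がくせい
```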

P11: Symbols
Symbols such as exclamation marks and emoticons are often used in Japanese texts, in the same manner as in English. We assume that symbols are commonly used as an easy way of expressing speakers' emotional states.

Extras
There are extra expressions that contribute to neither propositional meaning nor communicative function but still strongly contribute to characterization. We prepare the following two categories for such expressions.

P12: Character interjections
Some of the extra expressions occur independently or isolated from other words, as interjections do. We call such expressions 'character interjections' in this study. Onomatopoeias, which describe supposed speakers' characteristics, are often used as such expressions. For example, for the character of a sheep, mofumofu 'soft and fluffy' is used as a character interjection.

P13: Character sentence-end particles
There are expressions called kyara-joshi 'character particles' (Sadanobu, 2015), which typically occur at the end of sentences. The difference between character interjections and character particles is mainly their occurrence position. According to our observation, the word forms of the character particles are something like shortened versions of character interjections, which are often within two or three moras (e.g., mofu as for the character of a sheep).
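Because character particles are defined by their sentence-final position, attaching one can be sketched as inserting a string before any trailing punctuation. The code below is an illustration under stated assumptions: the function name `add_character_particle`, the punctuation set, and the space separator for romanized text are all hypothetical choices, and the example sentence is made up from words mentioned in this paper.

```python
TRAILING_PUNCTUATION = ".!?。！？"


def add_character_particle(sentence: str, particle: str, sep: str = " ") -> str:
    """Insert a character particle at the end of a sentence, before punctuation."""
    body = sentence.rstrip(TRAILING_PUNCTUATION)
    punctuation = sentence[len(body):]
    return f"{body}{sep}{particle}{punctuation}"


print(add_character_particle("ohayoo!", "mofu"))  # ohayoo mofu!
```

For text written in Japanese script, `sep=""` would attach the particle directly, since words are normally not separated by spaces.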

Eval 1: Coverage of categories of linguistic peculiarities
We conducted an evaluation to assess how well our categories account for the linguistic peculiarities of Japanese fictional characters. The evaluation process is shown in Figure 1. First, we (1) collected characters' utterances. Then, we (2) annotated linguistic expressions that are peculiar to the characters, and finally, we (3) counted how many expressions fall into each of our categories and how many do not fit into any category.
Figure 1: Process of the evaluation to assess how well our categories account for the linguistic peculiarities of Japanese fictional characters. (1) Collecting characters' utterances; (2) Annotating linguistic peculiarities (Step 1: Marking expressions peculiar to characters; Step 2: Classifying peculiar expressions into our 13 categories and 'others'; Step 3: Extracting expressions marked by both annotators).

Collecting characters' utterances
As utterances of fictional characters, we collected the following two kinds of text.
Twitter postings We collected Twitter postings of character bots. We chose bots that are authorized by their copyright holders, as we assume these are characterized by professional writers.
Dialogue system utterances We utilize dialogue system utterances that are written by professional writers we hired. The writers are asked to create utterances that are highly probable for given characters to utter as responses to given questions. Contents and linguistic expressions of the utterances are carefully characterized by the writers in accordance with pre-defined character profiles that we created.
The characters we chose (C1-20) are shown in Table 2.
These 20 characters are balanced with respect to humanity (human/nonhuman), animateness (animate/inanimate), gender (male/female/neuter), and maturity (adult/child or adolescent) so that we can find general and exhaustive linguistic peculiarities of various characters. In Table 2, 'NA' indicates that the value of the attribute is not specified in a character's profile. As for gender, 'neuter' refers to a character's gender being specified as neutral between male and female.
We utilized 11 fictional characters from Twitter bots (C4 and C11-20) and six fictional characters from dialogue system characters (C5-10). The reason we use dialogue system utterances along with Twitter postings is that we intend to analyze utterances that are originally designed for a dialogue system. In addition to these 17 fictional characters, we also used three non-fictional (actual human) characters for comparison (C1-3). C1 and C3 are Twitter bots that post Japanese celebrities' remarks from their TV shows or writings and C2 is the official Twitter account of a Japanese celebrity. Note that we did not use these characters in creating the categories in Section 3; that is, these characters have been prepared for evaluation purposes.
We collected 100 utterances from each character for a total of 2000 utterances. The average numbers of words per utterance for the characters from Twitter (C1-4 and C11-20) and the dialogue system characters (C5-10) are 25.5 and 13.3, respectively. Examples of characters' utterances are given in Table 3.

Annotating linguistic peculiarities
Each of the characters' utterances was annotated with linguistic peculiarities by annotators (not the authors) who are native speakers of Japanese.
Step 1: Marking expressions peculiar to characters For each of the 2000 utterances, we asked two annotators (a primary annotator and a secondary annotator) to mark linguistic expressions that they felt were peculiar to a character. The two annotators worked separately, i.e., without discussing or showing their work to each other. This process was performed by two of the four annotators (A1-4) shown in Table 4. These annotators correspond to annotators A and B in Figure 1.
To analyze the 'linguistic' peculiarities of fictional characters, we asked the annotators to mark peculiar surface expressions and constructions (i.e., to concentrate on 'how to say it') without taking into account the meaning or content (i.e., to ignore 'what to say') of the utterances.
Step 2: Classifying peculiar expressions into categories For each expression marked in step 1, we asked another annotator (not one of the authors) to classify the expression into one of 14 categories, i.e., to tag the category labels to the expressions. These 14 categories include the 13 categories shown in Table 1 plus 'others' for expressions that cannot be classified into any of the 13. The annotator corresponds to annotator C or D in Figure 1. In the example shown in Figure 1, annotator C deals with the expressions marked by annotator A and annotator D deals with the expressions marked by annotator B. When classifying the expressions, annotators C and D are allowed to discuss and show their work to each other. Examples of the tagged utterances are given below.

Step 3: Extracting expressions that are agreed to be peculiar The utterances that are marked as having peculiar expressions by the two annotators in Step 1 are compared. If the text spans of the expressions marked by the two annotators overlap, such text spans are regarded as the expressions agreed to be peculiar and are extracted.
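The overlap test in Step 3 can be sketched as follows. This is a minimal illustration, not the procedure actually used: spans are assumed to be half-open character offsets (start, end), the function names are hypothetical, and keeping the primary annotator's span when two spans overlap is an assumption, since the text does not say which boundaries are extracted.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets: (start, end)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def agreed_spans(primary: List[Span], secondary: List[Span]) -> List[Span]:
    """Keep the primary annotator's spans that overlap any secondary span."""
    return [p for p in primary if any(overlaps(p, s) for s in secondary)]


primary = [(0, 3), (5, 8), (10, 12)]
secondary = [(2, 4), (8, 9)]
print(agreed_spans(primary, secondary))  # [(0, 3)]
```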
To evaluate the agreement of the expressions marked by the two annotators, we use three measures: recall, precision, and F-measure. Here, we regard the task of marking expressions performed by two annotators as the secondary annotator's task of extracting the expressions marked by the primary annotator. The three measures are calculated by
\[ \mathrm{Recall} = \frac{B}{P}, \qquad \mathrm{Precision} = \frac{B}{S}, \qquad \text{F-measure} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}, \]
where B represents the number of expressions marked by both the primary and secondary annotators, P represents the total number of expressions marked by the primary annotator, and S represents the total number of expressions marked by the secondary annotator.
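The agreement measures can be computed directly from the three counts B, P, and S defined above. The sketch below assumes the standard reading in which the secondary annotator is evaluated against the primary annotator, so that recall is B/P and precision is B/S; the function name and the counts in the usage example are hypothetical and are not taken from Table 5.

```python
from typing import Tuple


def agreement(both: int, primary: int, secondary: int) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) from the counts B, P, and S."""
    if both == 0:
        return 0.0, 0.0, 0.0
    recall = both / primary        # share of the primary annotator's marks recovered
    precision = both / secondary   # share of the secondary annotator's marks confirmed
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# Hypothetical counts for illustration only.
recall, precision, f_measure = agreement(both=72, primary=100, secondary=96)
print(round(recall, 3), round(precision, 3), round(f_measure, 3))  # 0.72 0.75 0.735
```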
The number of expressions that are marked by both annotators and the values of the three agreement measures are listed in Table 5. In total, 4,994 expressions were agreed to be peculiar by two annotators. The average values of recall, precision, and F-measure were 0.72, 0.74, and 0.73, respectively. We consider these values sufficient to regard the annotators' perceptions of characters' linguistic peculiarities as being in moderate agreement and the extracted expressions as reliable indicators of characters' linguistic peculiarities.

Counting numbers of peculiar expressions in each category
We counted the number of category labels tagged to the expressions that were agreed to be peculiar in Step 3. We used 4,729 expressions that two annotators tagged with the same category (not all of the 4,994 expressions that were agreed to be peculiar). Then, we calculated the proportion of the expressions classified into each category.
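The counting step can be sketched with a label counter. The code below is illustrative only: it assumes each agreed expression comes with a pair of category labels, one from each classifying annotator, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def agreed_labels(pairs: List[Tuple[str, str]]) -> List[str]:
    """Keep only expressions that both annotators tagged with the same category."""
    return [first for first, second in pairs if first == second]


def category_proportions(labels: List[str]) -> Dict[str, float]:
    """Proportion of expressions classified into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Toy labels for illustration only.
pairs = [("P1", "P1"), ("P4", "P4"), ("P4", "P4"), ("others", "others"), ("P6", "P8")]
proportions = category_proportions(agreed_labels(pairs))
print(proportions["P4"])                # 0.5
print(1.0 - proportions.get("others", 0.0))  # 0.75
```

The second printed value is the coverage of the categories, that is, one minus the proportion of 'others'.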

Results
The results are shown in Table 6. The proportion of expressions that could not be classified into any of our categories was only around 12%. In other words, around 88% of the linguistic peculiarities of Japanese characters are covered by our 13 categories. When considering fictional characters (C4-20) only, around 90% of linguistic peculiarities are covered by our categories. However, the proportions of expressions classified into P5 (syntactic complexity), P6 (relaxed pronunciation), P7 (disfluency), P9 (word segmentation), and P10 (letter type) were less than 1%, which suggests that these categories might not be as important as the other categories, or might not be used as effectively. The importance (effectiveness in characterization) of each category is discussed in Section 5.
In Figure 2, the proportions of expressions classified into each category are shown separately by characters' attributes. The proportion of 'others' is largest for non-fictional (actual) human characters and decreases gradually as fictionality intensifies, that is, as the characters become fictional, non-human, and inanimate. We think this result suggests that our 13 categories describe the linguistic peculiarities of fictional characters better than those of non-fictional humans. Note that P8 (arbitrary phoneme replacements) should not occur frequently in non-fictional humans' utterances because P8 is primarily for fictional characters (see details in Section 3). Its occurrence in those utterances came about because the annotators often confused expressions that should be classified into P6 (relaxed pronunciation) with those that should be classified into P8. The expressions classified into these two categories need to be further investigated.
Table 7: Correlation ratio (η) between the existence of a category and the average score of character appropriateness among ten annotators.

Eval 2: Relations between categories and character appropriateness
The second evaluation is for revealing the characterizing effects of each category.

Preparation: Assessing character appropriateness of utterances
For each of the 2000 utterances collected in Section 4.1, we asked ten annotators (A5-14, listed in Table 4) to assess the appropriateness of the utterances as those uttered by particular characters. The assessment was done on a five-point scale from 1 (very inappropriate; seeming like a different character's utterance) to 5 (very appropriate; expressing the character's typical linguistic characteristics).

Evaluation method
To evaluate the relationships between the categories of linguistic peculiarities and linguistic appropriateness for the given characters, we calculated the correlation ratio (η) between the existence of a category and the average score of character appropriateness among ten annotators. We consider that a high correlation ratio between the existence of a category and the score of character appropriateness tells us how effectively the category invokes humans' perceptions of the linguistic style of a particular character. We use the correlation ratio because it measures the association between categorical (nominal-scale) data and interval-scale data, i.e., the categories of linguistic peculiarities and the average score of character appropriateness in this case. To be precise, the score of character appropriateness on a five-point scale is not an interval scale but an ordinal scale. However, we treat the five-point scale as an interval scale for convenience.
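The correlation ratio can be computed from the between-group and total sums of squares. The following is a minimal sketch using the standard definition, η = sqrt(SS_between / SS_total), with utterances grouped by whether a category is present; the function name, the group labels, and the scores in the usage example are hypothetical and are not data from this study.

```python
from math import sqrt
from typing import Dict, List


def correlation_ratio(groups: Dict[str, List[float]]) -> float:
    """Correlation ratio (eta) between a nominal grouping and numeric scores."""
    values = [x for scores in groups.values() for x in scores]
    grand_mean = sum(values) / len(values)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
        if scores
    )
    # Total sum of squares over all scores.
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    return sqrt(ss_between / ss_total) if ss_total > 0 else 0.0


# Hypothetical average appropriateness scores, grouped by presence of one category.
scores = {"present": [4.0, 5.0], "absent": [1.0, 2.0]}
print(round(correlation_ratio(scores), 3))  # 0.949
```

A value near 1 means the scores are almost fully separated by the presence of the category, and a value of 0 means the group means coincide.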

Results
The correlation ratios between the existence of the categories and the average scores of character appropriateness among ten annotators are shown in Table 7. The correlation ratios are shown by character and the top three η values of each character are written in bold. When considering all characters, category P2 (dialectical or distinctive wordings) showed the best correlation ratio, P13 (character sentence-end particles) was the second, and P1 (personal pronouns) was the third. As for P2, since it ranked in the top three categories for 11 of 20 characters, we consider that using dialectical or distinctive wordings is the most general and effective way of characterizing utterances.
In addition to these top three categories across all characters, we consider that P4 (sentence-end expressions) is an important characteristic of human characters because it ranked in the top three categories for seven of ten human characters. Although P4 did not show as high a correlation ratio as the other categories as a whole, we consider that it has a strong effect on characterizing utterances, especially for human characters.
As for non-human characters, P11 (symbols) showed a comparatively high correlation ratio in addition to the categories mentioned above. We suppose that symbols such as exclamation marks and emoticons are used as an easy and effective way of characterizing utterances, especially when handling non-human characters.
Overall, we found that most of our 13 categories of characters' linguistic peculiarities contribute to character appropriateness to some extent. In other words, most of the categories had some effect on characterizing the utterances of Japanese fictional characters.
Note that the score of character appropriateness may be affected by factors other than the existence of a category, such as a character creator's skill in using linguistic expressions that belong to our proposed categories, or a particular annotator's like or dislike of a particular category of linguistic expressions. To reduce such possibilities as much as we can, we used various characters and various annotators, which are listed in Tables 2 and 4, respectively, and refrained from drawing conclusions from the result of a single character or a single annotator.

Conclusion and future work
With the aim of developing a natural language generator that can express a particular character's linguistic style, we analyzed the linguistic peculiarities of Japanese fictional characters. Our contributions are as follows:
• We presented comprehensive categories of the linguistic peculiarities of Japanese fictional characters.
• We revealed the relationships between our proposed categories of linguistic peculiarities and the linguistic appropriateness for the characters.
These contributions are supported by the experimental results, which show that our proposed categories cover around 90% of the linguistic peculiarities of 17 Japanese fictional characters (around 88% when we include actual human characters) and that the character appropriateness scores and the existence of our categories of linguistic peculiarities are correlated to some extent.
As future work, we intend to develop a natural language generator that can express the linguistic styles of particular characters on the basis of the 13 categories presented in this paper. To this end, we are first going to build a system with 13 kinds of modules that convert linguistic expressions, such as a module that converts utterances without honorifics into those with honorifics (corresponding to category P3) and a module that converts utterances without relaxed pronunciations into those with relaxed pronunciations (corresponding to category P6). The system will be able to combine arbitrary modules to express various linguistic styles. After we build such a generator, we will evaluate its performance in the characterization of dialogue system utterances.
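One way to picture the planned combination of modules is as function composition over utterance strings. The sketch below is purely hypothetical and does not describe an existing system: the `Converter` type, the `compose` helper, and the two toy modules (a relaxed-pronunciation rewrite for P6 and a particle appender for P13, both on romanized text) are assumptions made for illustration.

```python
import re
from functools import reduce
from typing import Callable, Sequence

Converter = Callable[[str], str]  # one module: utterance in, utterance out


def compose(converters: Sequence[Converter]) -> Converter:
    """Chain modules so that each one rewrites the output of the previous one."""
    def pipeline(utterance: str) -> str:
        return reduce(lambda text, convert: convert(text), converters, utterance)
    return pipeline


def p6_relaxed_pronunciation(text: str) -> str:
    """Toy P6 module: rewrite word-final 'ai' as 'ee'."""
    return re.sub(r"ai\b", "ee", text)


def p13_sentence_end_particle(text: str) -> str:
    """Toy P13 module: append the particle 'mofu'."""
    return f"{text} mofu"


style = compose([p6_relaxed_pronunciation, p13_sentence_end_particle])
print(style("itai"))  # itee mofu
```

Selecting a different subset of modules would yield a different linguistic style from the same input utterance.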