Representing Honorifics via Individual Constraints

Within the context of grammar engineering, modelling honorifics has been regarded as one of the components for improving machine translation and anaphora resolution. Using the HPSG and MRS framework, this paper provides a computational model of honorifics. The present study incorporates the honorific information into the meaning representation system via Individual Constraints with an eye toward semantics-based processing.


Introduction
Honorific forms express the speaker's social attitude to others and also indicate the social ranks of the discourse participants and the degree of intimacy between them. Because honorifics are crucial for using the language in a socially correct way, they have been studied in computational linguistics as well as in theories of grammar. In particular, using honorific information improves anaphora resolution and helps machine translation systems provide more natural-seeming output sentences (Mima et al., 1997; Siegel, 2000; Nariyama et al., 2005).
This paper provides a way of modelling honorifics within the formalism of grammar-based language processing. Building upon Head-driven Phrase Structure Grammar (Pollard and Sag, 1994, HPSG) and Minimal Recursion Semantics (Copestake et al., 2005, MRS), the present study suggests using Individual CONStraints (henceforth, ICONS) for representing honorifics from the perspective of multilingual processing. This paper is structured as follows: Section 2 presents some background knowledge of the current study. Section 3 proposes using Individual Constraints for modelling the honorific system. Building upon the specification, Section 4 shows how honorific expressions can be translated across different honorific types of languages. Section 5 reports a small experiment to see if the current model contributes to semantics-based processing.

Forms of Expressing Honorifics
A cross-linguistic survey reveals that there are three ways of expressing honorifics (Agha, 1994; Ide, 2005): (i) pronouns, (ii) inflection, and (iii) suppletives. Different languages use a different range of honorific systems, but it appears that there exists a hierarchy in the system of honorification, as presented in Table 1. Note that some languages (e.g. English) use no honorific forms. The most widespread linguistic phenomenon regarding honorific expressions can be found in the taxonomy of personal pronouns. In many languages, personal pronouns (particularly, second person pronouns) come in two series, viz. ordinary (a.k.a. informal) forms and honorific (a.k.a. formal) forms. For example, Chinese employs two second person pronouns: 你 nǐ and 您 nín. Both sentences provided in (1) convey a meaning like "What is your name?" in English.
(1a) is a plain way to ask someone's name, in which both pronouns can be felicitously used. In contrast, (1b) is a way of asking in a courteous manner, in which the use of 你 nǐ is inappropriate.
That is to say, the predicate in (1b) 贵姓 guìxìng is a marked expression in terms of honorification. Some languages employ a more complicated honorific system. In Japanese, Korean, Javanese, Hindi, and some other languages, the inflectional paradigm is conditioned by the honorific relations between dialogue participants (Siegel, 2000; Ohtake and Yamamoto, 2001; Kim and Sells, 2007). For instance, in Japanese and Korean, if the subject is in the honorific form, the predicate is preferred to be in the honorific form, as exemplified in (2). Note that 先生 sensei 'teacher' is an honorific word, and the verbal form o+STEM+ni naru is used to signify honor to the subject.
(2) 先生 sensei teacher (Dalrymple, 2001, p. 18)
Other elements can also be marked with respect to honorification. When non-subjects (e.g. objects and obliques) are honored, the canonical verbal form in Japanese is o+STEM+suru. When the speaker wants to express honor to the hearer in Japanese, the verbal ending masu is used, as shown in the last word of (2). On the other hand, the nominal inflectional system is also influenced by honorifics, as exemplified in (3).
(4) tuli-si-ess-supni-ta give(HON)-HON-PST-HON-DECL '(An honoree) gave a book (to another honoree).' (The hearer is also an honoree.) [kor]
The verb in (4) contains three honorific forms for the object, the subject, and the addressee. The first one is the lexeme 드리- tuli-, a suppletive counterpart of 주- cwu- 'give'. This verb implies that the receiver is respectable. The second one is the suffix -si-, which indicates that the subject is an honoree. The third one is the ending suffix -supni-, which indicates that the speaker expresses respect to the hearer.
There are also nominal suppletive forms. Different lexical items that denote the same referent sometimes indicate the relative degree of familiarity with the referent. For example, kinship terms in Japanese vary depending on the relationship between the speaker and the referent: When talking about the speaker's own grandfather with others in a modest attitude, 祖父 sofu is normally used. When either denoting another person's grandfather or addressing the speaker's own grandfather in a friendly, informal way, お爺さん o-jii-san is normally used. This contrast shows that o-jii-san lexically involves honorific information, whereas sofu is neutral. On the other hand, because o-jii-san can be used to denote both another person's grandfather and the speaker's own grandfather, the honorific information has to be flexibly represented so as to cover the two potential relations.
In addition to the forms discussed hitherto, some particular constructions, such as passives and interrogatives, can serve to express honorification. However, the meaning is just pragmatically conveyed in this case. Such a construction is not a necessary condition but a sufficient condition for expressing honorifics. Not all passive sentences in Japanese necessarily involve an honorific relation. In contrast, if the o+STEM+ni naru form in Japanese is used, then the subject is presumed to be an honoree. Since the current work is exclusively concerned with honorific forms, these constructions are out of the scope of this paper.

Motivations
Honorifics have often been regarded as agreement phenomena, just like subject-predicate agreement in many European languages (Boeckx, 2006; Kim et al., 2006). However, there is an opposing view to this (Choe, 2004; Bobaljik and Yatsushiro, 2006; Kim and Sells, 2007). One counterexample is provided in (5) (Choe, 2004, p. 546). The subject of (5) contains an honorific form 님 nim, but the predicate only optionally takes the honorific marker 시 -si-, though the verb with the honorific marker sounds more natural. Along this line, the current study does not constrain honorification as a kind of agreement.
There are also a couple of reasons for not following honorification-as-agreement. These reasons make it necessary to model honorifics as flexibly as possible.
First, honorification is a matter of tendency rather than restriction. Notice that tendency and restriction are not on a par with each other in grammar engineering. Corpus data provide more than a few cases in which a mismatch of honorific forms happens, as exemplified in (5). Grammar engineering systems must work robustly for even less frequent items if the forms appear in naturally occurring texts and unless they critically violate the principle of human language.
Second, honorification is a matter of acceptability rather than grammaticality. Acceptability is primarily concerned with appropriateness, whereas grammaticality concerns conformity to the linguistic rules mostly provided by linguists. Thus, acceptability distinguishes not grammatical and ungrammatical sentences, but felicitous and infelicitous ones. In a similar vein, Zaenen et al. (2004) argue that animacy is mainly relevant to acceptability: For instance, the choice between the Saxon genitive and the of-genitive in English is sensitive to animacy, but the difference has more to do with felicity. The same goes for honorification. The choice of honorific forms leads to a difference in acceptability, which forms a continuous spectrum.

Individual Constraints on Honorifics
Minimal Recursion Semantics is the formalism employed to compute semantic compositionality in the present work. In addition, the current work employs ICONS (Individual CONStraints) in order to incorporate discourse-related phenomena into semantic representation of human language sentences. The representation method used in the present study (i.e. MRS+ICONS) has to do with not only semantic information incrementally gathered up to the parse tree, but also other components required to be accessed in the process of cross-lingual processing. MRS+ICONS enables us to model several discourse-related items within an intrasentential system (i.e. sentence-based processing). Notice that there exist several discourse-related items that can be at least partially resolved without seeing adjacent sentences. This can be conceptualized in the format of Dependency MRS (Copestake, 2009), as exemplified in (6).
Himself in (6a) equals the subject John, while him in (6b) does not. The notation in the bracket in each example indicates the relationship between two individuals: equal and non-equal. That is to say, anaphora can be partially identified within an intrasentential domain via such a binary relation.
There are some other phenomena that require contextual information in theory but can be partially resolved in practice in a way similar to (6), and honorification is one of them.
The current work represents honorifics as a binary relation between two individual elements. A set of honorific information is stored into a bag of constraints, and the value is only partially specified unless there is a clue to identify the honorific relation within the intrasentential context.

Comparison to Previous Approaches
On the one hand, MRS+ICONS makes honorification (basically pragmatic information) visible in semantic representation with an eye toward semantics-based language processing. In the previous HPSG-based studies, honorifics are treated as a typed feature structure under CTXT (ConTeXT). This local structure includes C-INDICES whose components are SPEAKER and ADDRESSEE (Siegel, 2000; Kim et al., 2006). In the LFG-based studies, honorification is regarded as part of the F-structure, given that it is one of the reliable tests to diagnose subjecthood (Dalrymple, 2001). Outside the scope of grammar-based deep processing, several studies make use of shallow processing techniques, such as POS-based pattern matching rules and regular expressions, for paraphrasing honorific expressions (Ohtake and Yamamoto, 2001, among others). In sum, no previous approach represents honorifics in a (near) logical form. In semantics-based processing, all components that play a part in transfer and generation must be accessible in the semantic representation.
On the other hand, the current model provides computational flexibility for handling honorification. Many previous studies on honorification employ a syntactic and/or semantic feature [HON bool] (Kim et al., 2006, among others). However, this feature is sometimes misleading for computational processing of honorifics for three reasons. First, there exist more than a few mismatches between honorific forms in real texts written in Korean and Japanese (i.e. no honorification-as-agreement; Choe, 2004). [HON bool] is too restrictive to analyze rather infelicitous but acceptable honorific expressions (§2.2). For example, the two types of second person pronouns in Chinese are interchangeable in many cases as provided in (1a), and the use of the informal pronoun 你 nǐ in (1b) merely results in infelicity (not ungrammaticality). The current work deals with honorifics grounded upon the premise "parsing robustly, generating strictly" (Bond et al., 2008). All potential honorific forms can be parsed robustly and flexibly, but the generation outputs are made strictly and felicitously. Second, [HON bool] cannot fully reflect the fact that honorifics are sometimes ambiguous and the specific meaning can be incrementally resolved up to the parse tree (Kim and Sells, 2007). For example, お爺さん o-jii-san 'elderly man' in Japanese can be used either informally or formally, and the choice between them depends on syntactic configuration. The current work makes use of a type hierarchy to constrain honorifics (see Figure 1), which captures the potential ambiguity and identifies the meaning through unification of structures. Third, the Boolean feature is too crude to place different types of constraints on subjects, objects, and addressees. For instance, the verb of (4) (in Korean) includes three HON glosses, and they have different honorific relations. MRS+ICONS represents honorifics as a binary relation amongst individuals, such as speaker, hearer, and referents.

Fundamentals
MRS+ICONS is structured as shown in (7). The value type is icons whose components are IARG1 and IARG2. Since ICONS stands for a binary relation between two individuals, their value type is individual (a supertype of event and ref-ind).
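The shape of an ICONS element described above can be illustrated with a small sketch. This is not the grammar's own TDL definition; it is a hypothetical Python rendering in which the class names (`Individual`, `Event`, `RefInd`, `Icons`), the relation label, and the identifiers `x1` and `x3` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """Assumed stand-in for the type individual."""
    name: str


@dataclass(frozen=True)
class Event(Individual):
    """Assumed stand-in for the subtype event."""


@dataclass(frozen=True)
class RefInd(Individual):
    """Assumed stand-in for the subtype ref-ind."""


@dataclass(frozen=True)
class Icons:
    """A binary relation between two individuals (IARG1 and IARG2)."""
    relation: str
    iarg1: Individual
    iarg2: Individual

    def __post_init__(self):
        # Both arguments must be of type individual (event or ref-ind).
        for arg in (self.iarg1, self.iarg2):
            if not isinstance(arg, Individual):
                raise TypeError("IARG values must be individuals")


# Hypothetical identifiers: x1 and x3 are two referential individuals.
constraint = Icons("higher", RefInd("x1"), RefInd("x3"))
```

Because both argument slots accept any `Individual`, the same structure can relate a referent to another referent or a referent to an event, which is what the utterance-level constraints later rely on.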
On the other hand, the HOOK structure, which keeps track of the features that need to be externally visible upon semantic composition, has three additional attributes, viz. ICONS-KEY, SPEAKER-KEY, and HEARER-KEY. These features function like pointers in the compositional construction of the semantic structure. They are required to mark the constituent analyzed as the speaker or the hearer of an utterance and deliver the information up to the parse tree. In particular, first and second person pronouns specify this value as their own index (§3.4). CTXT (under local) includes C-INDICES just as Jacy does (Siegel, 2000), but the names are different as presented in the following AVM. Note that the counterpart of "speaker" must be "hearer", and that of "addressee" must be "addressor". The value type is ref-ind, because the speaker and the hearer are also referential individuals.
The values of SPEAKER and HEARER remain underspecified until an utterance is established. The typed feature structure of utterance is presented in (9), in which SPEAKER-KEY and HEARER-KEY under CONT (i.e. mrs) are co-indexed with SPEAKER and HEARER under CTXT. Unless the SPEAKER-KEY and the HEARER-KEY are assigned a specific value during the construction of the parse tree, the values are still left underspecified. If the value is not specified by the time an utterance is built up, the speaker and the hearer cannot be identified within the intrasentential domain.
The utterance rule syntactically forms a non-branching root node, whose daughter is either a saturated sentence or a fragment (sat-or-frag). This pseudo phrase structure rule introduces two elements into the ICONS list, as shown at the bottom of (9). They are valued as addressor (i.e. speaker) and addressee (i.e. hearer). These ICONS elements play the key role in making dialogue participants visible in semantic representation. Their IARG1s are respectively co-indexed with SPEAKER and HEARER (i.e. 1 and 2), and their IARG2s are both co-indexed with the semantic head's INDEX of the utterance (i.e. 3). The main reason why they have a relation to the semantic head is that it is necessary to resolve the speaker/hearer scope in quotations. For example, (10) contains two different discourse frames, viz. inner frame and outer frame.
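The effect of the utterance rule on the ICONS list can be sketched as follows. This is a simplified illustration rather than the rule as implemented in the grammar: the function name `build_utterance` and all identifiers for individuals and events (`x_holmes`, `x_narrator`, `x_inner_hearer`, `x_outer_hearer`, `e_inner`, `e_outer`) are hypothetical.

```python
from typing import List, NamedTuple


class Icons(NamedTuple):
    relation: str
    iarg1: str  # an individual
    iarg2: str  # an individual


def build_utterance(speaker: str, hearer: str, head_index: str,
                    daughter_icons: List[Icons]) -> List[Icons]:
    """Append the addressor and addressee elements to the daughter's ICONS.

    IARG1 is the speaker or the hearer; IARG2 is the INDEX of the
    semantic head of the utterance.
    """
    return list(daughter_icons) + [
        Icons("addressor", speaker, head_index),
        Icons("addressee", hearer, head_index),
    ]


# A quotation embeds one utterance (inner frame) in another (outer frame).
inner = build_utterance("x_holmes", "x_inner_hearer", "e_inner", [])
outer = build_utterance("x_narrator", "x_outer_hearer", "e_outer", inner)
```

Tying each pair of elements to its own head index keeps the two frames apart: the addressor linked to `e_inner` and the addressor linked to `e_outer` can be different individuals without conflict.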
The two different frames may have different speakers and different hearers. For instance, the speaker in the inner frame of (10) is Holmes, while that in the outer frame is the narrator of the story. In other words, (10) includes two different utterances, and each introduces its own addressee and addressor elements into the ICONS list (i.e. four ICONS elements, in total).

Type Hierarchy
Going into the details, the type hierarchy of icons for honorification is sketched out in Figure 1. Regarding honorification, icons includes two immediate subtypes: namely, dialogue and rank. The former branches out into addressor and addressee, and the latter includes two levels of subtypes.
Higher-or-int indicates that one individual is socially higher than the other or intimate to the other. Recall that お爺さん o-jii-san in Japanese can be canonically used when the referent is higher than the speaker (formal) or intimate to the speaker (less formal). The word itself has the [ICONS-KEY higher-or-int] feature, which can be further constrained by the value that the predicate assigns to the word. Honorification is normally relevant to which is "higher" than which, but the linguistic forms can sometimes be altered when talking to someone in the lower position. For instance, Korean employs six levels of imperative inflections conditioned by the relationship between the speaker and the hearer. Lower-or-int and lower work for this case. Finally, note that int inherits from both higher-or-int and lower-or-int.
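How an underspecified value such as higher-or-int is narrowed down by unification can be sketched with a toy type hierarchy. The hierarchy below is reconstructed from the prose description only (Figure 1 is not reproduced here), so its exact shape is an assumption, and the helper names `ancestors`, `subsumes`, and `unify` are hypothetical.

```python
# Each type maps to its immediate supertypes (assumed from the description).
PARENTS = {
    "icons": [],
    "dialogue": ["icons"],
    "rank": ["icons"],
    "addressor": ["dialogue"],
    "addressee": ["dialogue"],
    "higher-or-int": ["rank"],
    "lower-or-int": ["rank"],
    "higher": ["higher-or-int"],
    "lower": ["lower-or-int"],
    "int": ["higher-or-int", "lower-or-int"],
}


def ancestors(t):
    """Return the type itself plus all of its supertypes."""
    seen = {t}
    stack = [t]
    while stack:
        current = stack.pop()
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if `general` is `specific` or one of its supertypes."""
    return general in ancestors(specific)


def unify(a, b):
    """Return the most general type compatible with both, or None."""
    common = [t for t in PARENTS if subsumes(a, t) and subsumes(b, t)]
    maximal = [t for t in common
               if not any(o != t and subsumes(o, t) for o in common)]
    return maximal[0] if len(maximal) == 1 else None


print(unify("higher-or-int", "higher"))        # higher
print(unify("higher-or-int", "lower-or-int"))  # int
print(unify("higher", "lower"))                # None
```

Under these assumptions, a word carrying higher-or-int stays ambiguous until a predicate contributes higher (formal reading) or a value compatible only with int (intimate reading), and a clash such as higher against lower simply fails to unify.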

Specifications
First, pronouns are specified with respect to the speaker and the hearer, as shown in (11).
The first person pronoun co-indexes its own INDEX with SPEAKER-KEY, and its ICONS list is empty because it does not contribute to honorification by itself. Likewise, the second person pronouns link their INDEX to HEARER-KEY. If the pronoun is honorific, one ICONS element is introduced. Otherwise (e.g. 你 nǐ), the ICONS list is empty. The ICONS element of the right AVM indicates that the hearer (tagged 1) is higher than the speaker (tagged 2). Second, several inflectional rules introduce an ICONS element as exemplified in (12) for the subject-honorific form and the addressee-honorific form in Japanese. The left AVM's ICONS element represents that the subject (tagged 1) is higher than the speaker (tagged 3). Likewise, the right AVM's ICONS element specifies the relation between the hearer (tagged 2) and the speaker (tagged 1).
Third, the suppletive forms themselves do not introduce an ICONS element, but the ICONS-KEY is specified in order to place a partial constraint on polarity. This pointer value functions similarly to [HON bool], but operates more flexibly (§3.1). They are instantiated in (13). Note that お休み oyasumi is a suppletive counterpart of 寝る neru 'sleep' in Japanese.
While the neutral forms provided in (13a) are underspecified, the honorific forms in (13b) place a constraint on ICONS-KEY. Notably, お爺さん o-jii-san 'elderly man' in Japanese assigns higher-or-int to the ICONS-KEY, covering the ambiguity.

Sample Representation
The example sentence is illustrated in (14). Note that the nominative marker が ga and the two verbal ending forms are semantically empty. Therefore, only underlined elements are left in the semantic representation as shown in (15a). In addition, there are two invisible elements as provided in (15b), such as the speaker and the hearer.
Recall that MRS+ICONS includes these invisible referential individuals into the semantics. These four individuals have four relations as presented in (15c), and they are added into the ICONS list. Each relation given in (15c) is in the format as [α X β], which is read as "α has an X relation to β". For instance, [x1 higher x3] means that x1 (i.e. the subject) is higher than x3 (i.e. the speaker). The first two relations are introduced when the utterance is built up (see (9)). The last two relations came from the verbal ending forms (see (12)). The MRS representation for (14) is provided in (16), in which IARG1 and IARG2 respectively correspond to α and β in the [α X β] format.
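The [α X β] notation can be made concrete with a short sketch that renders a list of ICONS elements. Only the relation [x1 higher x3] is taken from the text; the identifiers `e2` (the event) and `x4` (the hearer) are hypothetical placeholders chosen for illustration, as is the ordering of the first three elements.

```python
from typing import List, NamedTuple


class Icons(NamedTuple):
    relation: str
    iarg1: str  # corresponds to α
    iarg2: str  # corresponds to β


def render(element: Icons) -> str:
    """Render an ICONS element in the [α X β] format."""
    return f"[{element.iarg1} {element.relation} {element.iarg2}]"


icons_list: List[Icons] = [
    # Introduced when the utterance is built up (identifiers assumed).
    Icons("addressor", "x3", "e2"),
    Icons("addressee", "x4", "e2"),
    # Introduced by the verbal ending forms.
    Icons("higher", "x4", "x3"),  # hearer higher than speaker (assumed ids)
    Icons("higher", "x1", "x3"),  # subject higher than speaker (from the text)
]

for element in icons_list:
    print(render(element))
```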
The traditional representation (16) can be converted into a dependency graph for ease of exposition. In (17), and tentatively stand for the dialogue participants. The solid line in (17) means that the relation is specified in the RELS list. The dotted line stands for the ICONS element. The relational value is labelled on the arrow, and the direction of the arrow indicates which individual is co-indexed with which IARG. For instance, the arrow from ojiisan to means the same as [x1 higher x3] presented in (15c) and the last ICONS element of (16).

Translating Honorifics
With respect to translating honorific expressions across languages, there are different types of translation strategies. Notice that paraphrasing is regarded as a specific type of translation (i.e. monolingual translation) in the current study, given that it is also carried out via the same procedure consisting of parsing, (transfer), and generation.
First, if both the source language and the target language have a complex honorific system (e.g. Japanese→Japanese), all ICONS elements gathered in the parsing stage persist in the transfer and generation stages. The four sentences in Japanese provided in (18) convey a meaning like "Did you/someone sleep?" in English, but the preference in the choice hinges on the social relation. The felicity condition is presented in Table 2. (18d) can be paraphrased into (18b). Translating from a less informative form to a more informative form is plausible because no information is discarded.
Second, if the source language has rich honorifics, and the target language places an honorific constraint on only pronouns (e.g. Japanese→Chinese), the ICONS elements are selectively transferred: The elements not linked to pronouns are filtered out. In the opposite direction, the underspecified ICONS element in the input MRS can be resolved in the output as discussed above. For instance, the subject of (19), which is also the hearer, is the honorific second person pronoun 您 nín in Chinese. (19) cannot be translated into (18c-d), in which masu does not show up (see (12)). In contrast, all sentences given in (18)
Third, if the source language employs rich honorification and the target language has no honorific form (e.g. English), the transfer system turns off ICONS. For example, (18a-d) are all translated into "Did you sleep?" in English. The other direction (e.g. English→Japanese) raises no problem because all underspecified elements are restored on the target language's side.
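The three transfer strategies can be summarized in a small sketch. It is not the transfer grammar used in this work: the function `transfer_icons`, the labels `"rich"`, `"pronoun-only"`, and `"none"`, and the identifiers are hypothetical, and "linked to a pronoun" is interpreted here, as a simplifying assumption, as the IARG1 of the element being the index of a pronoun. Only rank-type elements are considered.

```python
from typing import FrozenSet, List, NamedTuple


class Icons(NamedTuple):
    relation: str
    iarg1: str
    iarg2: str


def transfer_icons(icons: List[Icons], target_system: str,
                   pronoun_indices: FrozenSet[str] = frozenset()) -> List[Icons]:
    """Select the ICONS elements that survive transfer.

    target_system (assumed labels):
      "rich"         - target has a complex honorific system: keep everything
      "pronoun-only" - target marks honorifics on pronouns only: keep elements
                       whose IARG1 is the index of a pronoun
      "none"         - target has no honorific forms: drop everything
    """
    if target_system == "rich":
        return list(icons)
    if target_system == "pronoun-only":
        return [e for e in icons if e.iarg1 in pronoun_indices]
    if target_system == "none":
        return []
    raise ValueError(f"unknown honorific system: {target_system}")


# Hypothetical input: x4 is the hearer (realized as a pronoun), x1 a referent.
source = [Icons("higher", "x4", "x3"), Icons("higher", "x1", "x3")]
kept = transfer_icons(source, "pronoun-only", frozenset({"x4"}))
```

In the reverse direction nothing needs to be filtered: elements absent from the input are simply left underspecified and resolved by the target grammar at generation time.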

Experiment
In order to verify whether the current model works for semantics-based processing, one experiment was conducted with ACE (http://sweaglesw.org/linguistics/ace). The HPSG grammar used for this experiment is Jacy (Siegel and Bender, 2002). The basic analysis for honorific expressions in Japanese discussed in this paper was implemented. The test set was the first 4,500 sentences in the Tanaka corpus (Tanaka, 2001). Using these resources, paraphrasing in Japanese (i.e. monolingual translation) was carried out with the 5-best option for parsing and a memory limit of 512MB. After paraphrasing was completed, the two results (i.e. without or with ICONS) were compared, as provided in Table 3. The comparison was made with respect to (A) the average number of outputs, (B) the number of items with end-to-end-success, and (C) the number of items with exact-match-output out of (B).
Table 3 shows that the current model aids in producing more precise outputs, as indicated in (C): The translation accuracy grows by 8.25%. The number of outputs (A) also grows because all potential forms of expressing honorifics are generated without loss of information: All ambiguous interpretations are generated as long as the information is provided in the semantic representation. The end-to-end-success rate (B) decreases, but this is mainly due to the memory limitation, not the model itself: If the size of the generated outputs exceeds the given memory limit (512MB in the current experiment), all the outputs are ignored in the comparison. If a bigger value is chosen, this rate also increases, though it takes much longer to yield the outputs.