Copula and Case-Stacking Annotations for Korean AMR

This paper concerns the application of Abstract Meaning Representation (AMR) to Korean. It focuses on the copula construction, its negation, and the case-stacking phenomenon. To this end, we review the :domain annotation scheme from several perspectives. In the process, we refine the existing annotation guidelines and devise an annotation scheme for each issue, pursuing consistency and efficiency of annotation without distorting the characteristics of Korean.


Introduction
Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a framework suitable for integrated semantic annotation. When localizing AMR annotation guidelines (Banarescu et al., 2018) for Korean, it is vital to maximize the use of existing annotation conventions widely accepted in English, Chinese, and other languages to ensure high compatibility between the AMR corpora of each language.
However, current AMR annotation guidelines are geared to English vocabulary and grammatical phenomena. Therefore, it is necessary to devise semantic annotation schemes for Korean AMR that follow existing annotation guidelines as closely as possible while accurately representing the characteristics of Korean. This paper reviews annotation methods for several grammatical phenomena characteristic of Korean and presents guidelines for them. Section 3 presents an annotation method for consistently representing the copula '-이-(-i-)' in the Korean copula construction and the lexical negation '아니-(ani-; to be not)'. Copula constructions in Korean are formed with the copula '-이-(-i-)'; accordingly, annotation guidelines must take its negation, relativization, and complementization into account. Section 4 presents a case-stacking annotation method for representing two or more subjects or objects. Case-stacking in Korean is a phenomenon in which several nominative or accusative words are licensed by a single predicate and stand in a pragmatic and semantic relationship with each other.
Addressing these issues by reconciling AMR annotation standards with Korean-specific grammatical phenomena should increase annotation efficiency and thereby help boost Korean sembanking.

Predicate Annotation
The first problem to consider when representing Korean sentences in AMR is which language resources to use for annotating core semantic roles. AMR uses the framesets of the Proposition Bank (PropBank) (Palmer et al., 2005) to represent the meaning of sentences, while Chinese AMR (CAMR) uses the Chinese PropBank (Xue & Palmer, 2009) framesets (Li et al., 2016). Brazilian Portuguese AMR (AMR-BR) uses the VerboBrasil dataset (Duran et al., 2013) as its annotation resource, which was built through the PropBank-BR project (Duran & Aluísio, 2012) launched after the PropBank initiative (Anchiêta & Pardo, 2018). The Korean PropBank (Palmer et al., 2006) consists of the Virginia corpus and the Newswire corpus; the Newswire corpus comprises 2,749 predicates attached to more than 23,000 semantic roles (Bae & Lee, 2015). Given that AMR projects for most languages, including Spanish (Migueles-Abraira et al., 2018), have adopted a PropBank, using the Korean PropBank is the top priority when representing Korean AMR. Most importantly, annotating frame arguments according to the PropBank convention facilitates alignment with AMRs in other languages, and compatibility with AMR corpora in many languages is advantageous.
However, as the lists of predicates and predicate senses in each language do not coincide, detailed annotation guidelines may vary. For example, when the AMR treatment of copula constructions and their negation in other languages is applied to Korean copula constructions, problems arise with the predicate '아니-(ani-)', which is the lexicalized negative form of '-이-(-i-)' and means "to be not." Therefore, it is necessary to present a principled guideline that takes into account the annotation methods for the general copula construction in Korean.

Copula Constructions
As AMR strives to represent the abstract meaning of language expressions, it is recommended to annotate the actual meaning of the copula construction rather than its syntactic structure. Most copula constructions can be annotated using existing frames with well-defined arguments and semantic roles. For example, (1) is an annotation of the "NP is NP" construction using work-01 to place the focus on meaning.
(1) The boy is a hard worker.
However, for some "Noun is noun" constructions or predicate adjectives with no available frame, the English specification proposes annotating with :domain and, for relativization and other unspecified modification (often of a noun), with its inverse role :mod. In these cases, as there is no frame available, it is difficult to represent the meaning of the expression clearly. Annotating with :domain and :mod is a simpler and more efficient way to handle cases in which annotators cannot decide on a proper representation, although it is a less interpretive representation.
(2) a. The man is a lawyer.
       (l / lawyer
          :domain (m / man))
    b. The man who is a lawyer
       (m / man
          :mod (l / lawyer))
In (2), the annotation of "The man is a lawyer" and its relativization is shown. Here, as there is no frame for 'lawyer', :domain and its inverse role, :mod, are used for annotation. In reality, AMR for several languages makes limited use of :domain only if there is no available frame. In CAMR, when representing the sentences with the main verb "是," which functions as a copula, :domain is defined as a non-core role relation used in attribution and jurisdiction as in (3). However, as representing meaning by :domain is more ambiguous, it was found that the use of :domain gradually decreased as the labeling process continued. Thus, as the labeling process proceeded, the emphasis was placed on annotations that clearly reveal the semantic relationship rather than those close to the sentence format.
(3) 他是班长。 'He is a leader.'
The annotation of the Korean copula construction should also clearly reveal the meaning of the sentence. In subsequent sections, we will examine the usage of the copula '-이-(-i-)' in Korean and briefly propose a proper way to annotate each usage. We will also examine cases in which we are forced to annotate with :domain and discuss special considerations for these cases. Next, we examine cases in which the negation of the copula construction is realized through the predicate ani-, and how to deal with the cases where the conceptual annotation is difficult.
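The :domain/:mod inversion shown in (2) can be sketched in a few lines of Python. This is only an illustration, not part of any AMR toolkit: the triple layout `(source, role, target)` and the helper names `invert` and `render` are assumptions introduced here.

```python
# Inverse pairs for the two roles discussed above.
INVERSE = {":domain": ":mod", ":mod": ":domain"}

# Concept labels for the variables used in example (2).
CONCEPTS = {"l": "lawyer", "m": "man"}


def invert(triple):
    """Swap source and target and replace the role with its inverse."""
    source, role, target = triple
    return (target, INVERSE[role], source)


def render(triple, concepts):
    """Serialize a single triple in PENMAN-style notation."""
    source, role, target = triple
    return f"({source} / {concepts[source]} {role} ({target} / {concepts[target]}))"


predication = ("l", ":domain", "m")   # "The man is a lawyer."
relativized = invert(predication)     # "The man who is a lawyer"

print(render(predication, CONCEPTS))  # (l / lawyer :domain (m / man))
print(render(relativized, CONCEPTS))  # (m / man :mod (l / lawyer))
```

Because the two roles are exact inverses, applying `invert` twice returns the original triple, which is why the predication and its relativization encode the same relation.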

Annotation of Copula -i-
The Korean copula '-이-(-i-)', unlike "be", which is a verb in English sentences, combines directly with the content word to form the predicate and is conjugated. In Korean, the usage of the copula '-이-(-i-)' is largely classified into the following: ⅰ) class membership (ascriptive), ⅱ) identity and identification, ⅲ) locational, ⅳ) existential, ⅴ) presentational, ⅵ) temporal, ⅶ) quantificational, ⅷ) cleft sentence, and ⅸ) relatedness and illogical usage (Park, 2012). Most of these uses can be represented through verbalization, non-core roles, special frames, and the like. For example, ⅰ) can be represented through :subset, :consist-of, and have-org-role-91; ⅲ) and ⅳ) through :location; ⅴ) and ⅵ) through a first-class concept and date-entity; and ⅶ) through :quant. Further, (4a) is an AMR using the frame 담당-01 (to be in charge of), and (4b) is simply represented with non-core roles.
office-in printer-NOM three unit-COP-DECL 'There are three printers in the office.'
The relativization of the copula construction annotated using :domain is annotated with :mod, the inverse role of :domain (as shown in (5)).

byeonhosa-i-n geu-neun ajik mihon-i-da.
lawyer-COP-PART he-ADP yet unmarried-COP-DECL 'The man who is a lawyer is not married yet.'
(결 / 결혼-01|marry
   :polarity -
   :mod (아 / 아직|yet)
   :ARG0 (그 / 그|the man
      :mod (변 / 변호사|lawyer)))
However, interpretive annotation is not possible in all cases. In certain cases of ⅰ), ⅱ), ⅷ), and ⅸ), annotation with :domain may be inevitable. Types ⅰ) and ⅱ) roughly correspond to cases where there is no available frame, while ⅷ) and ⅸ) correspond to cases where there is a presupposed context or the sentence is a focus construction. In these cases, :domain is used. In (6), because there is no way of knowing what the "best option" is for, the meaning cannot be fully revealed.
honest-do-PART talk-do-PART thing-NOM best-GEN options-COP-DECL 'That talking honestly is the best option.'
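The correspondence between copula usage types and annotation devices described in this section can be collected in a small lookup table. The sketch below is one reading of the text, not a guideline from the paper: the dictionary keys, the function name `annotation_options`, and the decision to list both "first-class concept" and "date-entity" for the presentational and temporal types are assumptions.

```python
# Devices named in the text for each usage type (keys are hypothetical labels).
PREFERRED = {
    "class membership": [":subset", ":consist-of", "have-org-role-91"],
    "locational": [":location"],
    "existential": [":location"],
    "presentational": ["first-class concept", "date-entity"],
    "temporal": ["first-class concept", "date-entity"],
    "quantificational": [":quant"],
}

# Usage types for which the text says :domain may be inevitable.
DOMAIN_FALLBACK = {
    "class membership",
    "identity and identification",
    "cleft sentence",
    "relatedness and illogical usage",
}


def annotation_options(usage):
    """List interpretive devices first, with :domain last as the fallback."""
    options = list(PREFERRED.get(usage, []))
    if usage in DOMAIN_FALLBACK:
        options.append(":domain")
    return options


print(annotation_options("quantificational"))  # [':quant']
print(annotation_options("cleft sentence"))    # [':domain']
```

Placing :domain last reflects the preference stated above for interpretive annotation wherever a frame or non-core role is available.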

Copula -i- with ani- Negation
The Korean copula '-이-(-i-)' does not function as an independent morpheme but forms a predicate with the preceding word. In contrast, the negation of '-이-(-i-)' is realized either lexically, with the adjective predicate '아니-(ani-)', or syntactically, with '-(이)지 않-(-(i)ci anh-)'. If the only difference between the proposition of the copula construction and that of its negation is the presence or absence of negation, then in general it is desirable to annotate the negation only with :polarity -.
Below, (7) is a representation of a sentence in which the negation of (4a) is realized through '아니-(ani-)'. Compared to (4a), the only difference is the presence or absence of the :polarity - annotation.
Similarly, compared to (8a), the only difference is the presence or absence of the :polarity - annotation. (Note that there is no frame available for the predication "문제(이)다 (munje-(i)-da; to be problematic)".)
However, there are cases in which the meaning changes when the core frame arguments of the predicate '아니-(ani-)' are inverted. For example, (9b) is a sentence in which the arguments of (9a) are simply swapped, which can change the meaning of the sentence. Here, simply annotating with :polarity - can pose a problem. It is difficult to view (9b) as the negative construction of (8a). In addition, if the meaning of (9b) differs from that of (9a), then the representations should not be the same. As a trial representation of (9b), (10) assigns :polarity - to "빙하가 녹는 것 (that glaciers are melting)," the thing in predication. However, the annotation in (10) is closer to "문제는 빙하가 녹지 않는 것이다 (The problem is (the event) that glaciers are NOT melting)" than to the meaning of (9b).

munje-neun bingha-ga nog-neun geos-i ani-da.
problem-ADP glacier-NOM melt-PART thing-ADP NEG-DECL 'The problem is not (the event) that glaciers are melting.'
In this case, annotation with :polarity - is not appropriate. The Korean PropBank provides the frame 아니-01 (to be not) for the predicate ani-, which seems appropriate here. 아니-01 has two core semantic roles: :ARG1 (subj) is assigned to the "thing in focus" and :ARG2 (comp) to the "thing in predication." This frame is an appropriate alternative when adding :polarity - to the AMR of the copula construction is not enough to represent the meaning of the ani- construction. Example (11) is an annotation of (9b) using the frame 아니-01. The representation in (11) is not the same as that in (10).
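The contrast between attaching :polarity - to the embedded event and using the frame 아니-01 can be made concrete with triple sets. The graphs below are hypothetical reconstructions for illustration only, since the AMRs of (10) and (11) are not reproduced here. The frame label 녹-01, the variable names, and the use of :domain in the first graph are assumptions.

```python
def ani_frame(focus, predication, var="a"):
    """Triples for 아니-01: :ARG1 is the thing in focus, :ARG2 the thing in predication."""
    return {
        (var, ":instance", "아니-01"),
        (var, ":ARG1", focus),
        (var, ":ARG2", predication),
    }


# Material common to both candidate graphs.
shared = {
    ("p", ":instance", "문제"),    # problem
    ("m", ":instance", "녹-01"),   # to melt (frame label assumed)
    ("m", ":ARG1", "g"),
    ("g", ":instance", "빙하"),    # glacier
}

# Candidate in the spirit of (10): negation sits on the melting event itself.
polarity_on_event = shared | {("p", ":domain", "m"), ("m", ":polarity", "-")}

# Candidate in the spirit of (11): 아니-01 relates the problem to the event.
frame_based = shared | ani_frame("p", "m")

print(polarity_on_event == frame_based)        # False
print(("m", ":polarity", "-") in frame_based)  # False
```

In the frame-based graph the melting event carries no negation of its own, which matches the intended reading that the glaciers are melting but that this is not the problem.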

eogul-han salam-eun daleun salam-i ani-n balo na-(i)-da.
wrong-PART person-ADP different person-ADP NEG-PART exactly me-COP-DECL 'The wronged person is none other than me.'
(억 / 억울-01|to feel wronged
   :ARG1 (나 / 나|I
      :mod (아 / 아니-02|besides that
         :ARG0 (사 / 사람|person
            :ARG1-of (다 / 다르-01|to be different)))))
It is preferable to keep the proposition of the copula construction and that of its negation identical and to mark the difference only by the presence or absence of :polarity -. This ensures that the propositions of the affirmative and negative constructions are aligned with one another. In addition, when the copula construction or its negated counterpart is relativized, the propositions must also be aligned with each other. The frames 아니-01 and 아니-02 should be used only in the limited cases in which the meaning of the sentence changes, which is due to the characteristics of the predicate '아니-(ani-)'.
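The alignment requirement stated above, that an affirmative copula construction and its negation differ only in :polarity -, can be checked mechanically. The following is a minimal sketch under the assumption that AMRs are stored as sets of `(source, role, target)` triples. The helper names `negate` and `aligned` are hypothetical, and the English example from (2) is reused for brevity.

```python
def negate(triples, root):
    """Return a copy of the graph with :polarity - added at the given node."""
    return set(triples) | {(root, ":polarity", "-")}


def aligned(affirmative, negative):
    """True when the two graphs differ only by :polarity - triples."""
    difference = set(affirmative) ^ set(negative)
    return bool(difference) and all(
        role == ":polarity" and target == "-" for _, role, target in difference
    )


affirmative = {
    ("l", ":instance", "lawyer"),
    ("m", ":instance", "man"),
    ("l", ":domain", "m"),
}
negative = negate(affirmative, "l")

# A frame-based negation (아니-01) restructures the graph and is not aligned.
reframed = {
    ("a", ":instance", "아니-01"),
    ("a", ":ARG1", "m"),
    ("a", ":ARG2", "l"),
    ("l", ":instance", "lawyer"),
    ("m", ":instance", "man"),
}

print(aligned(affirmative, negative))  # True
print(aligned(affirmative, reframed))  # False
```

A check of this kind could flag annotations in which 아니-01 or 아니-02 was used even though a plain :polarity - would have kept the propositions aligned.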

Case-stacking
Korean is an SOV (subject-object-verb) language with case markers. The agent and patient are placed before the predicate, and word order constraints are loose because of the presence of case markers. Therefore, Korean sentences generally rely on the case markers to encode and decode the grammatical relationships and semantic roles of the arguments. Korean has sentence types including the so-called double nominative construction (DNC) and double accusative construction (DAC) (Brown & Yeon, 2015). In DNC and DAC, the nominative marker '-이(-i)/-가(-ga)' or the accusative marker '-을(-eul)/-를(-leul)' is licensed on two or more constituents in a sentence.
DAC can be classified into possessive constructions, locative alternation constructions, change-of-state constructions, numeral phrase constructions, support verb constructions, etc. (Shin, 2016). The numeral phrase construction occurs in both DNC and DAC and can be represented easily in AMR with :quant. Two constituents marked by the same case marker generally have different grammatical and semantic relations.
These double case marker constructions are conventionally called 'double subject/object construction', while there is still room for argument about whether both constituents with the same marker are both subject or object.
In DNC and DAC, the two subjects or two objects are usually divided into an "inner" and an "outer" nominative or accusative, with a semantic and discourse relationship between them. As a result, there is generally a word order constraint between the two constituents.
In Korean, the predicate of a double nominative construction is often an adjective. Although there are various types of Korean nominative case-stacking constructions (Wunderlich, 2014), this paper discusses only predicate clause and psychological adjective constructions (Yoo, 2000). Korean also has several types of accusative case-stacking constructions. However, this paper discusses only the annotation of constructions in which the two objects stand in a whole-part relationship and of dative verb constructions (Yeon, 2010).
First, this section analyzes the sentence structure of DNC in terms of the major subject (outer nominative) and the sentential predicate (a clause in which the inner nominative and the predicate are embedded). We propose using :domain for the major subject, which is the discourse topic of the sentential predicate. This annotation method is efficient because it can be applied repeatedly, even when more than one such subject appears. The frame 좋-01 (to be good) in (13) takes only one subject, which stands for the "thing being good," yet two or more subjects appear in real text. Here, the outer nominative is annotated with :domain as the major subject. In adjective constructions in which nominative case-stacking occurs, the annotation must attend to the predicate sense. In (15), which uses a psychological adjective rather than a qualifying adjective, nominative case-stacking occurs, but the semantic roles of the constituents correspond to agent and patient. If predicate senses vary, the core argument annotation should differ accordingly, following the role set of the frame.
na-neun geu-ga joh-da.
Multiple accusative case licensing in DAC usually involves paraphrasing adnominal possessive structures; there are many cases in which possessor raising occurs (Nakamura, 2002). If it is difficult to clarify the relationship between the two accusatives, it is convenient to use :domain as in (16); otherwise, it is important to make the meaning explicit. Example (17) annotates the relationship between "발톱 (claw)" and "고양이 (cat)" using :part-of, without using :domain.
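The two case-stacking strategies discussed in this section, repeated :domain for outer nominatives and an explicit relation such as :part-of for accusatives, can be sketched as follows. Several details are assumptions rather than the paper's specification: the helper name `annotate_dnc`, the role :ARG1 for the "thing being good," the placeholder argument labels, and the flat attachment of every :domain edge to the predicate root. Whether stacked :domain edges should instead be nested cannot be recovered from the text.

```python
def annotate_dnc(predicate, inner, outers, inner_role=":ARG1"):
    """Attach the inner nominative as a core role and each outer one via :domain."""
    triples = [(predicate, inner_role, inner)]
    triples += [(predicate, ":domain", outer) for outer in outers]
    return triples


# One outer nominative, as in the discussion of 좋-01 (labels are placeholders).
single = annotate_dnc("좋-01", "inner-nominative", ["outer-nominative"])

# The same helper applied with two outer nominatives.
stacked = annotate_dnc("좋-01", "inner-nominative", ["outer-1", "outer-2"])

# Whole-part relation between two accusatives, following the description of (17).
whole_part = ("발톱", ":part-of", "고양이")  # claw :part-of cat

print(single)
print(len(stacked))  # 3
print(whole_part)
```

The helper does not cover psychological adjectives such as the one in (15), where both nominatives fill core roles of the frame and :domain is not needed.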

Conclusion and Further Works
This paper discussed consistent annotation methods for the Korean copula construction, its negation, complementation, and relativization with focus on the copula '-이-(-i-)' and the predicate '아니-(ani-)'. In this process, we also demonstrated cases in which annotation must be performed using the frames 아니-01 and 아니-02.
In addition, we proposed a new usage of :domain for case-stacking, which occurs frequently in Korean sentences. Although :domain should be used sparingly, it can be applied repeatedly when a sentence has two or more subjects or objects and no frame is available for the sentential predicate.
The annotation guideline for the Korean copula construction presented in this paper is essentially based on the copula construction annotation standards of other languages. However, as indicated in the limited discussion of the predicate '아니-(ani-)', devising consistent annotation principles for scopal polarity remains a topic for future discussion. Accordingly, it is necessary to examine aspects of negative representation more broadly than those discussed in this paper. Moreover, as the usage of :domain expands slightly, the use of the :domain and :mod labels for determining the topic of discourse and for modification should also be reconsidered. This discussion will enable the AMR scheme to represent the semantics of Korean more explicitly.
In the future, we aim to build a Korean AMR corpus reflecting these discussions. For this task, the consistency and efficiency of the annotation guidelines need to be improved. Well-established language resources are also required to reduce the cost and effort of building an actual Korean AMR corpus.
Currently, the following language resources labeled with semantic roles are available for Korean: UCorpus-DP/SR & UPropBank of UCorpus (released by KLPLAB, University of Ulsan) and the SRL datasets of the Exobrain Language Analysis Corpus v4.0 (released by the Seoul SW-SoC Convergence R&BD Center, ETRI). The UCorpus uses an extended set of theta roles from the Sejong Electronic Dictionary, and the Exobrain corpus follows the annotation system of the Korean Proposition Bank.
The next step of this research is to construct a Korean AMR corpus by converting the existing Korean semantic resources and then correcting the output manually. This further work should yield more detailed guidelines, which could stimulate future studies in Korean sembanking.