Universal Semantic Tagging for English and Mandarin Chinese

Universal Semantic Tagging aims to provide lightweight unified analysis for all languages at the word level. Though the proposed annotation scheme is conceptually promising, its feasibility has only been examined for four Indo-European languages. This paper is concerned with extending the annotation scheme to handle Mandarin Chinese and with empirically studying the plausibility of unifying meaning representations for multiple languages. We discuss a set of language-specific semantic phenomena, propose new annotation specifications and build a richly annotated corpus. The corpus consists of 1100 English-Chinese parallel sentences, for which compositional semantic analyses are available on the English side, and another 1000 Chinese sentences enriched with syntactic analyses. By means of the new annotations, we also evaluate a series of neural tagging models to gauge how successful semantic tagging can be: accuracies of 92.7% and 94.6% are obtained for Chinese and English respectively. The English tagging performance is remarkably better than the state-of-the-art, by 7.7%.


Introduction
Developing meaning representations across different languages plays a fundamental and essential role in multilingual natural language processing, and is attracting more and more research interest (Costa-jussà et al., 2020). Existing approaches can be roughly divided into three categories: the crosslingual approach focuses on lending the semantic annotation of a resource-rich language, such as English, to an under-resourced language (Wang et al., 2019; Blloshmi et al., 2020; Mohiuddin and Joty, 2020); the interlingual approach attempts to provide a unified semantic framework for all languages (Abend and Rappoport, 2013; White et al., 2016; Ranta et al., 2020); the multilingual approach aims at developing comparable but not necessarily identical annotation schemes shared by different languages (Bond and Foster, 2013; Baker and Ellsworth, 2017; Pires et al., 2019). (The terminology in the literature is quite diverse: the usages of "crosslingual", "interlingual" and "multilingual" vary from author to author.)

* This author is now working at Tencent.
In line with the interlingual approach, Universal Semantic Tagging (UST; Bjerva et al., 2016) develops a set of language-neutral tags (hereafter referred to as sem-tags) to annotate individual words, providing shallow yet effective semantic information. Semantic analyses of different languages utilise the same core tag set, but may also employ a few language-specific tags. Figure 1 presents an example.
English: I/PRO had/PST repaired/EXT my/HAS watch/CON ./NIL
German: Ich/PRO hatte/PST meine/HAS Armbanduhr/CON repariert/EXT ./NIL
Italian: Ho/NOW riparato/EXT il/DEF mio/HAS orologio/CON ./NIL
Chinese: 我/PRO 把/OBJ 我/PRO 的/MOD 手表/CON 修/EXT 好/EXT 了/PFT 。/NIL

Figure 1: An example of parallel sentences and their sem-tags. PRO: anaphoric & deictic pronouns; PST: past tense; EXT: untensed perfect; HAS: possessive pronoun; CON: concept; NOW: present tense; DEF: definite; OBJ: object; MOD: modification; PFT: perfect tense; NIL: empty semantics. All tags are universal, with the exception of the non-core tags OBJ and MOD, which are newly created to annotate Chinese-specific linguistic phenomena that cannot be represented by the existing system.

However, it is insufficient to prove the feasibility of UST only through cases of inflectional and genetically related languages, because one main challenge in developing interlingual meaning representations is unifying annotations related to the different characteristics of different languages. We argue that two questions regarding the universality of UST are still unanswered. Firstly, homologous words in the PMB languages facilitate the application of UST, but it is not clear whether UST is equally applicable to languages sharing few cognates, even though UST employs a delexicalised method. Another concern comes from typology: it remains unknown whether word-level semantic tags are effective for annotating long "sentence-words" composed of many morphemes, which are common in agglutinative languages (e.g. Turkish and Japanese) and polysynthetic languages (e.g. Eskimo languages).
This paper takes Mandarin Chinese, a language phylogenetically distant from the Indo-European family, as an example to explore the effectiveness of UST as a universal annotation scheme. Balancing Chinese-specific linguistic properties against universality, we present a more comprehensive tag set in which six new tags are added, indicating that most sem-tags are applicable to Chinese (§2). Based on the new tag set, we establish a parallel corpus by manually translating WSJ sentences into Chinese and annotating sem-tags for 1100 sentence pairs. It is a peer-reviewed corpus with 92.9% and 91.2% observed inter-annotator agreement for Chinese and English respectively (§3). This relatively successful application of UST to Chinese suggests that it strikes a balance between the depth of the represented information and the breadth of its language coverage. In other words, the shallow semantics of UST enables it to be extended to annotate diverse languages.
By means of the newly created corpus, we evaluate a series of neural sequence labeling techniques (§4). The results demonstrate that the proposed scheme is promising, with accuracies of 92.7% for Chinese and 94.6% for English (§5). The English tagging performance is remarkably better than the state-of-the-art (Abzianidze and Bos, 2017), by 7.7%, even though the sentences in our corpus are much longer than those in the PMB, averaging 25 tokens per sentence compared with 6 in the PMB.
In order to analyse the divergence between the English and Chinese annotations, and the plausibility of developing universal semantic representations in general, we manually annotate word alignments for 500 sentences. By studying the aligned counterparts, we argue that universality is still threatened to some extent, because 37.0% of aligned tokens have mismatched sem-tags. This phenomenon is mainly due to grammatical divergence, information loss in translation and differences in annotation strategy. All the alignment-based analyses suggest that even for a delexicalised, relatively shallow meaning representation scheme, it can still be problematic to ensure that semantic representations are comparable in a word-to-word way.

Tailoring Tag Sets for Mandarin Chinese
Considering different linguistic ways to encode tense, aspect, prepositions, measure words, subordinate clauses and comparative expressions, we provide a tailored version of UST to handle Mandarin Chinese. We present the complete tailored tag set in the Appendix.
Events and tense/aspect Unlike English and many other Indo-European languages, Mandarin has no inflectional tense markers. Therefore, the morphological tense-related labels, e.g. ENS and EPS, are removed. Instead, temporal interpretation in Chinese can be conveyed through function words, adverbials, or shared understanding of the context (Smith and Erbaugh, 2005). The first two means are encoded by the sem-tags FUT and IST. As for aspect, there are only four commonly recognized aspect markers in Chinese, denoting that the preceding verb is actualized or ongoing: 了/过 are perfective (PFT) and 在/着 are progressive (PRG) (Liu, 2015).
Preposition English and Chinese prepositions have similar syntactic functions at present but differ in their historical origins. English prepositions were mainly created to replace the lost inflectional case markers (Mitchell, 1985), whereas Chinese prepositions can be traced back to verbs (Li and Thompson, 1989). We argue that the annotation of Chinese prepositions should not follow the English practice, because REL emphasizes grammatical relations between verbs and nouns, while in Chinese the grammaticalization of prepositions has not proceeded as far. Consequently, we design a separate set of sem-tags for Chinese prepositions by borrowing existing sem-tags (DXT/DXP/ALT) and adding some new sem-tags (MAN/RES/AIM/OBJ/COM).

Classifier Classifiers are a Chinese-specific word class inserted between numerals and nouns to denote quantity. This category does not exist in English, so we generalize the UOM tag (unit of measurement) to cover classifiers, since their functions are quite similar (Li and Thompson, 1989).
Subordinate clause Whether subordinate clauses exist in Chinese is controversial, since not all such clauses occupy a structurally lower position than the main clause. Additionally, the words corresponding to English subordinating conjunctions, such as 因为 (because) and 虽然 (although), constitute a heterogeneous group and do not necessarily select a subordinate clausal complement (Paul, 2016). For these two reasons, SUB is (temporarily) removed to avoid controversy.
Comparative expression UST designs a detailed label set to annotate comparative expressions in English (see Table 4). In particular, although expressions labeled MOR/TOP and LES/BOT use exactly the same syntactic constructions, they are separated according to their meaning, in an application-oriented way. Unlike English, Mandarin has no morphological comparatives or superlatives. To express comparative meaning, the adverbs 更 (roughly, more) and 最 (roughly, most) are used and annotated as MOR and TOP respectively. Accordingly, LES and BOT are deleted.

The Corpus
We introduce a new moderate-sized corpus containing high-quality manual annotations for English and Chinese, which is now available at https://github.com/pkucoli/UST.

Data Source
To support fine-grained cross-lingual comparison, the corpus includes 1100 parallel sentence pairs. We select 1100 sentences from the Wall Street Journal (WSJ) section of the Penn Treebank (PTB; Marcus et al., 1993). We choose it because detailed syntactic annotations are available and the sentences are relatively long, thus potentially carrying more complex information. It is noteworthy that various syntactic and semantic analyses of these English sentences have been built by multiple projects, e.g. DeepBank (Flickinger et al., 2012), PropBank (Palmer et al., 2005) and OntoNotes (Weischedel et al., 2013).
We then obtain Chinese counterparts of the original English sentences by employing English-Chinese bilinguals to produce literal translations. In addition, we select 1000 sentences from the Chinese Treebank (CTB; Xue et al., 2005), for which manual syntactic analyses are available.

Annotation
One doctoral student and one undergraduate student, both majoring in linguistics, annotate the sentence pairs. The guideline for English annotation is derived from the universal semantic tag set (Abzianidze and Bos, 2017) with reference to data in the PMB; Chinese is annotated based on the modified tag set in the appendix. The annotation process consists of three steps: first, the annotators independently annotate 100 Chinese WSJ sentences; they then compare and discuss disagreements between their annotations; finally, the conflicting cases are analyzed to revise the specification. After several iterations, the consistency between annotators is significantly improved. Additionally, we find part-of-speech (POS) tags quite useful for accelerating manual annotation, so we apply the Stanford CoreNLP tool (Manning et al., 2014a) to obtain automatically predicted POS tags for the translated Chinese sentences.
Quality of the corpus The observed inter-annotator agreement reaches 92.9% for the Chinese and 91.2% for the English sub-corpus. This high consistency in the annotation of both sub-corpora, in our view, demonstrates that UST is feasible for Chinese and that the adjustment of the original tag set is relatively satisfactory.
Re-tagging To further improve annotation quality, we leverage the re-tagging strategy (Ide and Pustejovsky, 2017). Specifically, we investigate disagreements between initial model predictions and manual tagging, and correct manual annotation errors. After a round of re-tagging and re-training, the disagreement between the gold standard and the tagger output drops from 10.3% to 7.9% for Chinese and from 6.7% to 5.2% for English.

Divergence between English and Chinese Annotations
As a multilingual annotation scheme, UST represents semantic information in an interlingual way. We therefore want to know whether, after the modification of the tag set, the remaining cross-lingual syntactic and semantic divergence between distant languages still threatens its universality. We produce token-level word alignments for 500 parallel sentence pairs and investigate sem-tag mismatches between aligned tokens. Of the 7295 aligned token pairs in total, 3392 pairs share matched semantic tags, a matching rate of 46.5%. Note that punctuation and tokens tagged NIL are excluded. Figure 2 shows an example of word alignment and sem-tag matching. Our divergence analysis based on alignment rests on the assumption that, since both alignment and sem-tagging concern token-level semantic representation, matched token pairs are expected to share the same sem-tags. Non-correspondence between aligned counterparts therefore suggests divergence between the annotations of the two languages and, further, may reveal problems caused by cross-lingual divergence.
Word alignment Word alignments between sentence pairs are first acquired automatically with the Berkeley Aligner and then manually corrected.
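As a minimal illustration of how the matching rate over aligned token pairs can be computed (the function and the example tag pairs are our own, not part of the released corpus):

```python
def matching_rate(pairs, exclude=("NIL",)):
    """pairs: iterable of (src_tag, tgt_tag) sem-tags for aligned tokens.

    Pairs involving an excluded tag (e.g. NIL, as in the paper's setup)
    are dropped before computing the proportion of exact tag matches.
    """
    kept = [(s, t) for s, t in pairs if s not in exclude and t not in exclude]
    matched = sum(1 for s, t in kept if s == t)
    return matched / len(kept) if kept else 0.0
```

For example, a pair list containing one match, one mismatch and one NIL-tagged pair yields a matching rate of 0.5, since the NIL pair is excluded.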
Matching rate and mismatches In general, aligned tokens are mostly entities or events, and among matches the most frequent sem-tag is CON, followed by ORG and ROL. Other tags whose proportions among all matches exceed 3% are EXS, QUC, IST, PER and GPE, and the per-tag match rates of these tags are also relatively high, except for IST (see Table 5). However, since mismatches involving CON, ORG and EXS are also not rare, annotation divergence probably exists. A linguistically-motivated analysis suggests the following important factors:
• Grammatical divergence: an example is EXS in Figure 2. As illustrated in §2, it is used to tag Chinese verbs that are non-progressive and non-perfect, while it is limited to untensed simple verbs in English. This grammatical difference leads to tag-set modification and thus results in sem-tag mismatches.
• Information loss caused by non-literal translation: in the example in Figure 2, approved its acquisition is translated as 批准...对其进行收购, which causes a mismatch between acquisition (noun, CON) and 收购 (verb, EXS).
• Different annotation strategies for MWEs: Corp. is tagged ORG, while its Chinese counterpart 公司 is tagged CON.

Tagging Models
Long Short-Term Memory (LSTM; Hochreiter and Schmidhuber, 1997) models have been widely used in various sequence tagging tasks (Huang et al., 2015; Ma and Hovy, 2016; Bohnet et al., 2018) and have achieved state-of-the-art performance on many popular benchmark datasets. We use a Bidirectional LSTM (BiLSTM), with and without a Conditional Random Field (CRF) inference layer, to build baseline systems for our dataset. In the rest of this section, we briefly formulate our baseline tagging models and introduce some widely used techniques that may enhance prediction for tagging tasks.
Model For each word w_i in an input sentence (w_1, w_2, ..., w_n), the input x_i to the BiLSTM is a dynamically learned word embedding e_i summed with the feature vector computed by BERT/ELMo after a linear projection W_e. If the POS tag of w_i is used as additional input, we extend x_i with the embedding p_i of the POS tag before passing it into the BiLSTM.
After obtaining the contextual representations f_i and b_i, we pass their concatenation to a multilayer perceptron (MLP) to compute the score vector s_i over semantic tags.
Finally, we feed s_i either into a softmax layer, which chooses the tag with the highest probability for each word independently, or into a CRF layer, which selects the tag sequence with the highest probability for the whole sentence.
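The baseline can be sketched in PyTorch roughly as follows. This is a simplified softmax variant without the CRF layer or BERT/ELMo features; dimensions follow the experimental setup (300-d word embeddings, optional 32-d POS embeddings, a single 128-d LSTM layer per direction), while class and variable names are our own:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM sem-tagger sketch (softmax variant, no CRF)."""

    def __init__(self, vocab_size, n_tags, n_pos=None,
                 word_dim=300, pos_dim=32, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        in_dim = word_dim
        self.pos_emb = None
        if n_pos is not None:          # POS tags as optional additional input
            self.pos_emb = nn.Embedding(n_pos, pos_dim)
            in_dim += pos_dim
        self.bilstm = nn.LSTM(in_dim, hidden, num_layers=1,
                              batch_first=True, bidirectional=True)
        # MLP scoring layer over the concatenated forward/backward states
        self.mlp = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                 nn.ReLU(), nn.Linear(hidden, n_tags))

    def forward(self, words, pos=None):
        x = self.word_emb(words)                       # (batch, seq, word_dim)
        if self.pos_emb is not None and pos is not None:
            x = torch.cat([x, self.pos_emb(pos)], dim=-1)
        h, _ = self.bilstm(x)                          # (batch, seq, 2*hidden)
        return self.mlp(h)                             # per-token tag scores
```

At inference time, an argmax over the last dimension yields the independent per-word prediction; replacing it with a CRF layer gives the sequence-level variant.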
Subword/Character-level Models To address out-of-vocabulary (OOV) issues in sequence tagging tasks, many subword-level and character-level models have been proposed (Akbik et al., 2018; Ling et al., 2015; Bohnet et al., 2018). We do not use these models in our experiments; instead, we leverage pre-trained language models such as BERT (Devlin et al., 2019) and ELMo (Peters et al., 2018) to handle OOV issues. These models are trained on large corpora with subword/character-level vocabularies and provide better contextual word representations.
POS features POS categories provide low-level syntactic information that is beneficial for sem-tagging. In our experiments, we use POS tags as additional inputs to our baseline systems.
Multi-task Learning (MTL) MTL is widely discussed in the literature, and previous work (Changpinyo et al., 2018) shows that it can improve sequence tagging in some cases. In our experiments, we jointly train a POS tagger and a semantic tagger that share a BiLSTM.
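A minimal sketch of this joint setup, with one shared BiLSTM feeding two task-specific heads and the joint loss taken as the plain sum of the two cross-entropies (dimensions and tag-set sizes here are illustrative, not the paper's exact values):

```python
import torch
import torch.nn as nn

shared = nn.LSTM(300, 128, batch_first=True, bidirectional=True)
sem_head = nn.Linear(256, 72)   # sem-tag scores (illustrative tag-set size)
pos_head = nn.Linear(256, 40)   # POS-tag scores (illustrative tag-set size)
criterion = nn.CrossEntropyLoss()

def joint_loss(embedded, sem_gold, pos_gold):
    """Sum of sem-tagging and POS-tagging losses over a shared encoder."""
    h, _ = shared(embedded)                     # (batch, seq, 256)
    sem_logits = sem_head(h).flatten(0, 1)      # (batch*seq, n_sem_tags)
    pos_logits = pos_head(h).flatten(0, 1)      # (batch*seq, n_pos_tags)
    return (criterion(sem_logits, sem_gold.flatten())
            + criterion(pos_logits, pos_gold.flatten()))
```

Both heads backpropagate into the shared BiLSTM, which is the mechanism by which the auxiliary POS task can regularize the sem-tagging representations.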

Experimental Setup
We conduct experiments on the English and Chinese data separately. Since only about 2100 Chinese sentences and 1100 English sentences are annotated, we randomly split each dataset into 5 folds in order to obtain more stable tagging accuracies for future comparison. In turn, one fold serves as the test set while the remainder serves as training data, with the model trained on 85% of the instances and model selection based on performance on the remaining 15%. The tagging accuracy is then computed with the selected model on the held-out fold. Finally, we report the average accuracy over the 5 folds.
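This evaluation protocol can be sketched as follows (a minimal illustration; the function name, seed handling and exact shuffling are our own choices, not taken from the released code):

```python
import random

def five_fold_splits(n_sentences, seed=0):
    """Yield (train, dev, test) index lists for 5-fold evaluation.

    Each fold in turn is the test set; the remaining folds are split
    85/15 into training and development data for model selection.
    """
    idx = list(range(n_sentences))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        rest = [i for j in range(5) if j != k for i in folds[j]]
        cut = int(0.85 * len(rest))
        yield rest[:cut], rest[cut:], test
```

Reported accuracy is then the mean of the five per-fold test accuracies.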
Built on top of PyTorch (Paszke et al., 2017), we employ a BiLSTM as our baseline model, and all models are trained for 8000 minibatches with a minibatch size of 32. Using the Adam optimizer (Kingma and Ba, 2015) and cosine learning rate annealing, we train each model with an initial learning rate chosen from {0.0001, 0.005, 0.001}. The parameter settings of the different models are as follows: 1) the dimension of the LSTM hidden states is set to 128 for each direction and the number of layers is set to 1; 2) the POS tag embeddings are randomly initialized with a dimension of 32, while the word embeddings have a dimension of 300 and are initialized with GloVe vectors (Pennington et al., 2014) for English and pre-trained word vectors (Li et al., 2018) for Chinese; 3) the parameters of BERT/ELMo are frozen during the training of our sequence tagging models; 4) for models with MTL, we directly optimize the sum of the losses for POS tagging and universal semantic tagging.

Results Figure 3 shows the overall performance of the different models. Gold POS tags bring significant performance improvements, which is also verified by Huo and de Melo (2020). However, MTL only slightly improves the overall results. When pre-trained contextualized word embeddings are utilized, the gap between different models becomes insignificant. Additionally, the significant improvement of English accuracy over the previous state-of-the-art is also attributable to the use of pre-trained models: with the help of BERT, a simple BiLSTM tagger reaches close to 92.0% accuracy for Chinese and 94.6% for English, while without it, the tagging accuracy on English data is around 85%.
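The optimisation setup described above can be sketched as follows; the model here is a stand-in for the tagger, and the helper names are our own:

```python
import torch
import torch.nn as nn

def make_optimizer(model, lr=0.001, total_steps=8000):
    """Adam with cosine learning-rate annealing over the training run."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)
    return opt, sched

def train_step(model, opt, sched, loss_fn, batch):
    """One minibatch update: backprop, optimizer step, then anneal the LR."""
    opt.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    opt.step()
    sched.step()
    return loss.item()
```

Calling `train_step` 8000 times, once per minibatch of 32 sentences, reproduces the schedule described above.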

Comparative Analysis of Tagging Error
Empirical evaluation indicates competitive overall accuracy for our models. However, results vary across sem-tag categories, and some remain extremely low (Table 6).
To further improve the models' performance and to better understand cross-lingual semantic representation, this section provides a fine-grained error analysis of each underperforming sem-tag category.

Properties of Chinese adjectives The low prediction accuracy of ATT is largely attributable to the difficulty of differentiating IST and SST, especially in light of the high frequency of adjectives in Chinese, which are a more complicated case than English adjectives. The usages of Chinese adjectives and their corresponding sem-tags are shown in Table 7.

Category             Predicate   Modifier (A N)   Modifier (A de N)
Narrow adjectives    IST         IST/SST          IST/SST
Distinct words       n.a.        IST              IST

Table 7: Usages and sem-tags of Chinese adjectives. "A" denotes adjective; "N" denotes noun; "de" is a Chinese particle denoting modification. In Mandarin Chinese there are two sub-types of broad-sensed adjectives: narrow adjectives can be used both as predicates and as modifiers, while distinct words are only modifiers.
We propose practical strategies to improve the performance of our tagging model in differentiating IST and SST in Chinese. The first is to build a lexicon, based on the fact that whether an adjective can be used as a predicate is an inherent lexical property; the use of IST versus SST can then be distinguished simply by lexicon lookup. The other strategy is rule-based: an adnominal adjective is tagged SST only when it obtains a gradable reading. We stipulate the following rule: if an attributive adjective is preceded by a token tagged INT, EQU, MOR or TOP, the adjective should be marked SST. After applying the lexicon and the rule, the tagging accuracies of IST and SST rise from 68.8% and 63.1% to 81.4% and 77.9% respectively. Overall accuracies after applying the adjective lexicon and rule are shown in Table 8.

Named entity Table 9 shows the accuracy of each NAM (named entity) sub-tag for English and Chinese. Although named entities are regarded as among the concepts that correspond most frequently across languages (see §3), marked differences still exist:
• The accuracies of each sem-tag are generally higher for English than for Chinese.
• English shows a narrower range of performance (73.3%-98.0%) than Chinese (58.6%-97.9%).
We propose an explanation for why the English and Chinese sem-taggers perform differently on NAM: named entities in English are identified by capitalization, while Chinese ones are not. It is therefore harder to determine the span of proper names in Chinese, which hurts overall accuracy. Moreover, it can be inferred that Chinese is more sensitive to the length of named entities, given the difficulty of judging spans: the sem-tags whose accuracies are above average (PER, GPE and UOM) are commonly used to annotate one-token units, while the below-average tags (GPO, GEO, ORG and ART) annotate multi-word proper nouns. By contrast, for English, with its overt markers of named entities, the decrease of accuracy with length is not as prominent as for Chinese.
Sparse data Some tags, namely DXD (of DXS), ITJ, HES and GRE (of ACT), EQU (of COM) and HAP (of NAM), occur too rarely for effective training, and need more diverse data as input in further research.
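The rule-based IST/SST strategy described earlier can be sketched as follows (a hypothetical post-processing pass; function and variable names are our own):

```python
# Degree tokens whose presence signals a gradable reading, per the rule:
# an attributive adjective preceded by a token tagged INT, EQU, MOR or TOP
# is re-tagged SST, and IST otherwise.
DEGREE_TAGS = {"INT", "EQU", "MOR", "TOP"}

def apply_sst_rule(tags):
    """tags: list of predicted sem-tags for one sentence.

    Re-tags every adjective position (currently IST or SST) according to
    whether the preceding token carries a degree tag.
    """
    out = list(tags)
    for i, tag in enumerate(tags):
        if tag in ("IST", "SST"):
            out[i] = "SST" if i > 0 and tags[i - 1] in DEGREE_TAGS else "IST"
    return out
```

The lexicon strategy would add a lookup table of predicate-capable (narrow) adjectives alongside this rule; we omit it here since its content is purely lexical.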
On Annotating Semantics

Helpfulness of Syntactic Features
The high-quality manual annotation and automatic tagging both indicate the importance of POS tags in UST: inter-annotator agreement and tagging accuracy increase after POS tags are applied. Huo and de Melo (2020) believe this is because POS tags facilitate semantic disambiguation through the extra syntactic information. However, the underlying mechanism by which a syntactic feature contributes to semantic analysis has not been revealed.
To investigate the impact of POS tags, 50 new WSJ sentences and their Chinese counterparts are selected for a pilot study, and two annotators are asked to annotate them with and without the assistance of POS tags. After detailed investigation, we summarize the influence of POS tags on inter-annotator agreement in two points: (i) some tokens have multidimensional semantic features, and POS tags make annotators more likely to choose sem-tags related to POS features. For instance, unable may be annotated as NOT (negation) or POS (possibility); after the introduction of its POS tag, ADJ, both annotators are more likely to annotate it as IST, which is appropriate for most adjectives, rather than NOT or POS. (ii) Gerunds that take no arguments and are not modified by adverbs are challenging, as it is difficult for annotators to decide whether event-related or concept-related sem-tags suit them better. This is even harder for Chinese annotation, since Chinese verbs have no inflected forms. Both difficulties are easily resolved by assigning POS tags.
In our view, the reason why POS tags contribute to semantic annotation can be traced to discussions in theoretical linguistics. Generally speaking, POS is a categorization of words, and its identification has long been a controversial problem in that area. Some linguists favor a syntactic or distributional basis for POS (Harris, 1951; Edmonds, 1967), while others advocate a semantic or notional basis (Lyons, 1966). From a notion-based perspective, assigning forms to concepts, or POS tags and sem-tags to tokens, is in each case a process of categorizing and classifying the objects referred to by those tokens, which helps explain why POS tags have a significant influence on semantic sorts. In this regard, annotations are undoubtedly affected by POS tags. Nonetheless, some researchers rebut this view, holding that notional definitions of POS are inapplicable because of their vagueness; for them, distribution, morphological features and grammatical functions are all useful criteria for identifying POS. In our view, the contradiction between the notion-based and distribution-based approaches leads to some of the difficulties in annotation. To avoid them, we applied POS tags automatically generated by the Stanford CoreNLP tool (Manning et al., 2014b) to assist manual annotation.
However, although POS tags improve inter-annotator agreement by regulating manual sem-tag annotation in these two ways, it is not clear whether they improve the quality of the annotations: the first mechanism increases the probability of one option, while the second directly makes choices for annotators. To what extent more coarse-grained annotation standards contribute to annotation quality needs further research.

Challenges of Multilingual Annotations
Building comparable semantic representations across languages has become an important topic in recent years, as a strategy that contributes to both semantic parsing and syntactic analysis. Existing approaches can be roughly divided into three categories. First, the crosslingual approach lends the semantic annotation of a resource-rich language to an under-resourced language; see e.g. Damonte and Cohen (2018). However, crosslingual divergence between the lender and the borrower is likely to be retained to a considerable extent, especially for phylogenetically distant languages. The second, widely discussed multilingual approach develops comparable annotation schemes for different languages, such as multilingual FrameNet (Baker et al., 1998) and multilingual WordNet (Miller, 1995); its main limitation is that the represented semantic information risks oversimplification, since many in-depth properties are language-specific. The third, interlingual approach aims to find universal semantic frameworks for all languages, yet it can be fairly difficult to find appropriate interlingual frameworks.
In our view, these strategies all address the major challenge encountered in representing multilingual data, namely the divergence of languages. UST, in line with the interlingual method, attempts to address it with a relatively shallow scheme. Despite the high inter-annotator agreement and tagging accuracies, some divergences remain, which calls for more in-depth study of multilingual annotation.

Related Work
UST is one of the previous interlingual attempts; it was originally designed to provide necessary information for semantic parsing (Bjerva et al., 2016). The first automatic sem-taggers were built using convolutional neural networks and deep residual networks (Bjerva et al., 2016). Later, in the PMB project, the authors propose a method of projecting automatically annotated semantic tags from a sentence to its sentence- and word-aligned counterparts. Following these works, an updated universal semantic tagset was proposed, derived in a data-driven manner to disambiguate categories; in that work, a tri-gram tagging model, the TnT tagger (Brants, 2000), was also explored for bootstrapping. In a recent study built on Bjerva et al. (2016), employing sem-tags in multi-task learning is found to benefit both the sem-tagging task and other NLP tasks, including Universal Dependency POS tagging, Universal Dependency parsing and Natural Language Inference (Abdou et al., 2018). Overall, these studies indicate that sem-tags are effective for various NLP tasks.

Conclusion
In this paper, we take Chinese into account to provide a more comprehensive tag set, based on which we establish a reliable manually-annotated corpus, and we show that promising automatic semantic tagging performance is obtained after employing MTL and gold POS tags and leveraging pre-trained models. The overall success of this approach prompts reflection on the universality of different languages and the operability of multilingual meaning representation: 1) UST is plausible in general, partly because it is delexicalised and can thus represent phylogenetically distant languages after some adaptations; 2) universality is threatened to some extent because there are aligned but mismatched tokens between English and Chinese, caused by grammatical divergence, information loss in translation and different annotation strategies for MWEs; and 3) innate crosslingual divergences still exist even in NAMs, which are thought to be the most consistent pairs, and need further exploration.
Although our work demonstrates the plausibility of developing a shared delexicalised and shallow annotation scheme to mitigate divergences across languages, it seems that more in-depth semantic analyses, especially lexicalised ones, may not be possible to unify. We believe a wider range of languages can be annotated after some minor adaptations of the scheme, but it remains unknown how to obtain deeper processing information on this basis and thus develop an enhanced understanding of multilingual meaning representation.