Incorporating Chinese Characters of Words for Lexical Sememe Prediction

Sememes are minimum semantic units of concepts in human languages, such that each word sense is composed of one or multiple sememes. Words are usually manually annotated with their sememes by linguists, and form linguistic common-sense knowledge bases widely used in various NLP tasks. Recently, the lexical sememe prediction task has been introduced. It consists of automatically recommending sememes for words, which is expected to improve annotation efficiency and consistency. However, existing methods of lexical sememe prediction typically rely on the external context of words to represent the meaning, which usually fails to deal with low-frequency and out-of-vocabulary words. To address this issue for Chinese, we propose a novel framework to take advantage of both internal character information and external context information of words. We experiment on HowNet, a Chinese sememe knowledge base, and demonstrate that our framework outperforms state-of-the-art baselines by a large margin, and maintains a robust performance even for low-frequency words.


Introduction
A sememe is an indivisible semantic unit for human languages defined by linguists (Bloomfield, 1926). The semantic meanings of concepts (e.g., words) can be composed of a finite number of sememes. However, the sememe set of a word is not explicit, which is why linguists build knowledge bases (KBs) to annotate words with sememes manually.

* Work done while doing an internship at Tsinghua University. † Equal contribution. Huiming Jin proposed the overall idea, designed the first experiment, conducted both experiments, and wrote the paper; Hao Zhu made suggestions on ensembling, proposed the second experiment, and spent a lot of time on proofreading the paper and making revisions. All authors helped shape the research, analysis and manuscript. ‡ Corresponding author: Z. Liu (liuzy@tsinghua.edu.cn). Code is available at https://github.com/thunlp/Character-enhanced-Sememe-Prediction

Figure 1: Sememes of the word "铁匠" (ironsmith) in HowNet, where occupation, human and industrial can be inferred from both external (context) and internal (character) information, while metal is captured only by the internal information within the character "铁" (iron).
HowNet is a classical, widely-used sememe KB (Dong and Dong, 2006). In HowNet, linguists manually define approximately 2,000 sememes and annotate more than 100,000 common words in Chinese and English with their relevant sememes in hierarchical structures. HowNet is well developed and has a wide range of applications in many NLP tasks, such as word sense disambiguation (Duan et al., 2007), sentiment analysis (Fu et al., 2013; Huang et al., 2014) and cross-lingual word similarity (Xia et al., 2011).
Since new words and phrases are emerging every day and the semantic meanings of existing concepts keep changing, it is time-consuming and labor-intensive for human experts to annotate new concepts and maintain consistency for large-scale sememe KBs. To address this issue, an automatic sememe prediction framework has been proposed to assist linguist annotation, based on the assumption that words with similar semantic meanings are likely to share similar sememes. In this framework, word meanings are represented as embeddings (Pennington et al., 2014; Mikolov et al., 2013) learned from a large-scale text corpus, and collaborative filtering (Sarwar et al., 2001) and matrix factorization (Koren et al., 2009) are adopted for sememe prediction; the resulting methods are termed Sememe Prediction with Word Embeddings (SPWE) and Sememe Prediction with Sememe Embeddings (SPSE) respectively. However, these methods ignore the internal information within words (e.g., the characters in Chinese words), which is also significant for word understanding, especially for words that are low-frequency or do not appear in the corpus at all. In this paper, we take Chinese as an example and explore methods of taking full advantage of both external and internal information of words for sememe prediction.
In Chinese, words are composed of one or multiple characters, and most characters have corresponding semantic meanings. As shown by Yin (1984), more than 90% of Chinese characters in modern Chinese corpora are morphemes. Chinese words can be divided into single-morpheme words and compound words, where compound words account for a dominant proportion. The meanings of compound words are closely related to their internal characters as shown in Fig. 1. Taking a compound word "铁匠" (ironsmith) for instance, it consists of two Chinese characters: "铁" (iron) and "匠" (craftsman), and the semantic meaning of "铁匠" can be inferred from the combination of its two characters (iron + craftsman → ironsmith). Even for some single-morpheme words, their semantic meanings may also be deduced from their characters. For example, both characters of the single-morpheme word "徘徊" (hover) represent the meaning of "hover" or "linger". Therefore, it is intuitive to take the internal character information into consideration for sememe prediction.
In this paper, we propose a novel framework for Character-enhanced Sememe Prediction (CSP), which leverages both internal character information and external context for sememe prediction. CSP predicts the sememe candidates for a target word from its word embedding and the corresponding character embeddings. Specifically, we follow SPWE and SPSE to model external information, and we propose Sememe Prediction with Word-to-Character Filtering (SPWCF) and Sememe Prediction with Character and Sememe Embeddings (SPCSE) to model internal character information. In our experiments, we evaluate our models on the task of sememe prediction using HowNet. The results show that CSP achieves state-of-the-art performance and stays robust for low-frequency words.
To summarize, the key contributions of this work are as follows: (1) To the best of our knowledge, this work is the first to consider the internal information of characters for sememe prediction.
(2) We propose a sememe prediction framework considering both external and internal information, and show the effectiveness and robustness of our models on a real-world dataset.

Related Work
Knowledge Bases. Knowledge Bases (KBs), which aim to organize human knowledge in structural forms, are playing an increasingly important role as infrastructure for artificial intelligence and natural language processing. KBs rely on manual efforts (Bollacker et al., 2008), automatic extraction (Auer et al., 2007), manual evaluation (Suchanek et al., 2007), and automatic completion and alignment (Bordes et al., 2013; Toutanova et al., 2015; Zhu et al., 2017) to build, verify and enrich their contents. WordNet (Miller, 1995) and BabelNet (Navigli and Ponzetto, 2012) are representative linguistic KBs, in which words of similar meanings are grouped to form a thesaurus (Nastase and Szpakowicz, 2001). Apart from other linguistic KBs, sememe KBs such as HowNet (Dong and Dong, 2006) can play a significant role in understanding the semantic meanings of concepts in human languages and are favorable for various NLP tasks: information structure annotation (Gan and Wong, 2000), word sense disambiguation (Gan et al., 2002), word representation learning (Niu et al., 2017; Faruqui et al., 2015), and sentiment analysis (Fu et al., 2013), inter alia. Hence, lexical sememe prediction is an important task for constructing sememe KBs.
Automatic Sememe Prediction. Automatic sememe prediction has recently been proposed to assist manual annotation. For this task, SPWE and SPSE were proposed, inspired by collaborative filtering (Sarwar et al., 2001) and matrix factorization (Koren et al., 2009) respectively. SPWE recommends the sememes of those words that are close to the unlabelled word in the embedding space. SPSE learns sememe embeddings by matrix factorization within the same embedding space as words, and then recommends the most relevant sememes to the unlabelled word in that space. In these methods, word embeddings are learned from external context information (Pennington et al., 2014; Mikolov et al., 2013) on a large-scale text corpus. These methods do not exploit the internal information of words, and fail to handle low-frequency and out-of-vocabulary words. In this paper, we propose to incorporate internal information for lexical sememe prediction.
Subword and Character Level NLP. Subword and character level NLP models the internal information of words, which is especially useful for addressing the out-of-vocabulary (OOV) problem. Morphology is a typical research area of subword level NLP. Subword level NLP has also been widely considered in many NLP applications, such as keyword spotting (Narasimhan et al., 2014), parsing (Seeker and Çetinoglu, 2015), machine translation (Dyer et al., 2010), speech recognition (Creutz et al., 2007), and paradigm completion (Sutskever et al., 2014; Bahdanau et al., 2015; Cotterell et al., 2016a; Jin and Kann, 2017). Incorporating subword information into word embeddings (Bojanowski et al., 2017; Cotterell et al., 2016b; Wieting et al., 2016; Yin et al., 2016) facilitates modeling rare words and can improve the performance of several NLP tasks to which the embeddings are applied. Besides, character embeddings have also been utilized in Chinese word segmentation (Sun et al., 2014).
The success of previous work verifies the feasibility of utilizing internal character information of words. We design our framework for lexical sememe prediction inspired by these methods.

Background and Notation
In this section, we first introduce the organization of sememes, senses and words in HowNet. Then we offer a formal definition of lexical sememe prediction and develop our notation.

Sememes, Senses and Words in HowNet
HowNet provides sememe annotations for Chinese words, where each word is represented as a hierarchical tree-like sememe structure. Specifically, a word in HowNet may have various senses, which respectively represent the semantic meanings of the word in the real world. Each sense is defined as a hierarchical structure of sememes. For instance, as shown in the right part of Fig. 1, the word "铁匠" (ironsmith) has one sense, namely ironsmith. The sense ironsmith is defined by the sememe "人" (human), which is modified by the sememes "职位" (occupation), "金属" (metal) and "工" (industrial). In HowNet, linguists use about 2,000 sememes to describe more than 100,000 words and phrases in Chinese with various combinations and hierarchical structures.

Formalization of the Task
In this paper, we focus on the relationships between words and sememes. Following previous settings, we ignore the senses and the hierarchical structure of sememes, and regard the sememes of all senses of a word together as the sememe set of that word. We now introduce the notation used in this paper. Let G = (W, S, T) denote the sememe KB, where W = {w_1, w_2, ..., w_{|W|}} is the set of words, S is the set of sememes, and T ⊆ W × S is the set of relation pairs between words and sememes. We denote the Chinese character set as C, with each word w_i ∈ C^+. Each word w has a sememe set S_w = {s | (w, s) ∈ T}. Taking the word "铁匠" (ironsmith) for example, its sememe set S_铁匠 consists of "人" (human), "职位" (occupation), "金属" (metal) and "工" (industrial).
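As a concrete illustration, the notation above can be mirrored in a few lines of Python. This is a toy sketch with our own names; the entry for "鞋匠" is an invented partial annotation, not the actual HowNet record:

```python
# Toy sketch of the sememe KB G = (W, S, T), stored as a word -> sememe-set
# mapping. Only the "铁匠" entry follows the annotations given in the text;
# the other entry is illustrative.
kb = {
    "铁匠": {"人", "职位", "金属", "工"},   # ironsmith
    "鞋匠": {"人", "职位"},                 # cobbler (toy annotation)
}

W = set(kb)                                         # word set W
S = set().union(*kb.values())                       # sememe set S
T = {(w, s) for w, ss in kb.items() for s in ss}    # relation pairs T

def sememe_set(w):
    """S_w = {s | (w, s) in T}."""
    return {s for (wi, s) in T if wi == w}
```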
Given a word w ∈ C^+, the task of lexical sememe prediction is to estimate P(s|w) for each sememe s ∈ S, and to recommend to w the sememes with the highest scores.

Methodology
In this section, we present our framework for lexical sememe prediction (SP). For each unlabelled word, our framework aims to recommend the most appropriate sememes based on internal and external information. Because it incorporates character information, our framework works for both high-frequency and low-frequency words.
Our framework is an ensemble of two parts: sememe prediction with internal information (i.e., internal models) and sememe prediction with external information (i.e., external models). Specifically, we adopt SPWE, SPSE, and their ensemble as external models, and we take SPWCF, SPCSE, and their ensemble as internal models.
In the following sections, we first introduce SPWE and SPSE. Then, we show the details of SPWCF and SPCSE. Finally, we present the method of model ensembling.

SP with External Information
SPWE and SPSE are the state-of-the-art methods for sememe prediction. These methods represent word meanings with embeddings learned from external information, and apply the ideas of collaborative filtering and matrix factorization from recommendation systems to sememe prediction.
SP with Word Embeddings (SPWE) is based on the assumption that similar words should have similar sememes. In SPWE, the similarity of words is measured by cosine similarity. The score function P(s_j|w) of sememe s_j given a word w is defined as:

P(s_j | w) ∼ Σ_{w_i ∈ W} cos(w, w_i) · M_ij · c^{r_i},   (1)

where w and w_i are the pre-trained word embeddings of words w and w_i. M_ij ∈ {0, 1} indicates the annotation of sememe s_j on word w_i: M_ij = 1 if s_j ∈ S_{w_i} and M_ij = 0 otherwise. r_i is the rank of w_i when all words are sorted by descending cosine similarity to w, and c ∈ (0, 1) is a hyper-parameter.
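As a rough sketch (not the released implementation), the SPWE score can be computed with NumPy as follows; the variable names and toy setup are ours:

```python
import numpy as np

def spwe_scores(w, word_embs, M, c=0.8, K=100):
    """SPWE sketch: score each sememe by summing, over annotated words,
    cosine similarity times the annotation M_ij times c^rank.

    w         -- (d,) embedding of the query word
    word_embs -- (n, d) embeddings of the annotated words
    M         -- (n, |S|) binary word-sememe annotation matrix
    """
    # cosine similarity between w and every annotated word
    sims = word_embs @ w / (
        np.linalg.norm(word_embs, axis=1) * np.linalg.norm(w) + 1e-12)
    order = np.argsort(-sims)                  # indices by descending similarity
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))       # r_i = 0-based rank of w_i
    weights = sims * (c ** ranks)
    weights[ranks >= K] = 0.0                  # keep only the K nearest words
    return weights @ M                         # one score per sememe
```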
SP with Sememe Embeddings (SPSE) aims to map sememes into the same low-dimensional space as the word embeddings in order to predict the semantic correlations between sememes and words. This method learns two embeddings s and s̄ for each sememe by solving a matrix factorization problem with the loss function:

L = Σ_{w_i ∈ W, s_j ∈ S} ( w_i · (s_j + s̄_j) + b_i + b'_j − M_ij )² + λ Σ_{s_j, s_k ∈ S} ( s_j · s̄_k − C_jk )²,   (2)

where M is the same matrix used in SPWE. C indicates the correlations between sememes, in which C_jk is defined as the point-wise mutual information PMI(s_j, s_k). The sememe embeddings are learned by factorizing the word-sememe matrix M and the sememe-sememe matrix C synchronously with fixed word embeddings. b_i and b'_j denote the biases of w_i and s_j, and λ is a hyper-parameter adjusting the two parts. Finally, the score of sememe s_j given a word w is defined as:

P(s_j | w) ∼ w · (s_j + s̄_j).   (3)
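The SPSE loss and score can be sketched in the same style; again this is an illustrative NumPy rendering with our own variable names, not the authors' code:

```python
import numpy as np

def spse_loss(word_embs, s, s_bar, b_w, b_s, M, C, lam=0.5):
    """Sketch of the SPSE loss of Eq. (2): factorize the word-sememe matrix M
    and the sememe-sememe PMI matrix C with fixed word embeddings.

    word_embs -- (n, d) fixed word embeddings
    s, s_bar  -- (m, d) the two embeddings learned per sememe
    b_w, b_s  -- (n,), (m,) biases of words and sememes
    """
    pred = word_embs @ (s + s_bar).T + b_w[:, None] + b_s[None, :]
    term_ws = ((pred - M) ** 2).sum()          # word-sememe reconstruction
    term_ss = ((s @ s_bar.T - C) ** 2).sum()   # sememe-sememe reconstruction
    return term_ws + lam * term_ss

def spse_scores(w, s, s_bar):
    """Sketch of Eq. (3): P(s_j|w) ~ w . (s_j + s_bar_j)."""
    return (s + s_bar) @ w
```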

SP with Internal Information
We design two methods for sememe prediction that use only internal character information, considering neither contexts nor pre-trained word embeddings.

SP with Word-to-Character Filtering (SPWCF)
Inspired by collaborative filtering (Sarwar et al., 2001), we propose to recommend sememes for an unlabelled word according to its similar words, based on internal information. Instead of using pre-trained word embeddings, we consider words similar if they contain the same characters at the same positions. In Chinese, the meaning of a character may vary according to its position within a word. We consider three positions within a word: Begin, Middle, and End. For example, as shown in Fig. 2, the character at the Begin position of the word "火车站" (railway station) is "火" (fire), while "车" (vehicle) and "站" (station) are at the Middle and End positions respectively. The character "站" usually means station when it is at the End position, while it usually means stand at the Begin position, as in "站立" (stand), "站岗哨兵" (standing guard) and "站起来" (stand up). Formally, for a word w = c_1c_2...c_{|w|}, we define π_B(w) = {c_1}, π_M(w) = {c_2, ..., c_{|w|−1}}, π_E(w) = {c_{|w|}}, and

Figure 2: The Begin, Middle, and End positions of the characters within the word "火车站" (railway station).
P(s_j | c, p) ∼ ( Σ_{w_i ∈ W ∧ c ∈ π_p(w_i)} M_ij ) / ( Σ_{w_i ∈ W ∧ c ∈ π_p(w_i)} Σ_k M_ik ),   (4)

which represents the score of a sememe s_j given a character c and a position p, where π_p may be π_B, π_M, or π_E. M is the same matrix used in Eq. (1). Finally, we define the score function P(s_j|w) of sememe s_j given a word w as:

P(s_j | w) ∼ Σ_{c ∈ w, p ∈ {B, M, E}} P(s_j | c, p).   (5)

SPWCF is a simple and efficient method. It performs well because compositional semantics are pervasive in Chinese compound words, which makes it straightforward and effective to find similar words according to common characters.
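A minimal, self-contained sketch of SPWCF follows; the toy KB, function names, and the handling of single-character words are our own simplifications:

```python
from collections import defaultdict

def positions(word):
    """Yield (character, position) pairs for pi_B, pi_M, pi_E of the word.
    Single-character words are treated as Begin-only in this sketch."""
    for i, ch in enumerate(word):
        if i == 0:
            yield ch, "B"
        elif i == len(word) - 1:
            yield ch, "E"
        else:
            yield ch, "M"

def train_spwcf(kb):
    """Estimate P(s | c, p) as the count of sememe s over words containing
    character c at position p, divided by the total sememe count of those
    words (cf. Eq. 4). kb maps each word to its sememe set."""
    num = defaultdict(lambda: defaultdict(float))  # (c, p) -> sememe -> count
    den = defaultdict(float)                       # (c, p) -> total count
    for word, sememes in kb.items():
        for c, p in positions(word):
            for s in sememes:
                num[(c, p)][s] += 1.0
            den[(c, p)] += len(sememes)
    return {cp: {s: v / den[cp] for s, v in d.items()} for cp, d in num.items()}

def spwcf_scores(word, model):
    """Sum P(s | c, p) over the characters of the word (cf. Eq. 5)."""
    scores = defaultdict(float)
    for c, p in positions(word):
        for s, v in model.get((c, p), {}).items():
            scores[s] += v
    return dict(scores)
```

For instance, training on a toy KB where "铁匠" (ironsmith) and "鞋匠" (cobbler) both end in "匠" makes "人" (human) the top recommendation for an unseen word ending in "匠".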

SP with Character and Sememe Embeddings (SPCSE)

Sememe Prediction with Word-to-Character Filtering (SPWCF) can effectively recommend the sememes that have strong correlations with characters. However, just like SPWE, it ignores the relations between sememes. Hence, inspired by SPSE, we propose Sememe Prediction with Character and Sememe Embeddings (SPCSE) to take the relations between sememes into account. In SPCSE, we instead learn the sememe embeddings based on internal character information, and then compute the semantic distance between sememes and words for prediction. Inspired by GloVe (Pennington et al., 2014) and SPSE, we adopt matrix factorization in SPCSE, decomposing the word-sememe matrix and the sememe-sememe matrix simultaneously. Instead of the pre-trained word embeddings used in SPSE, we use pre-trained character embeddings. Since characters are more ambiguous than words, multiple embeddings are learned for each character. We select the most representative character and its embedding to represent the word meaning. Because low-frequency characters are much rarer than low-frequency words, and even low-frequency words are usually composed of common characters, it is feasible to use pre-trained character embeddings to represent rare words. While factorizing the word-sememe matrix, the character embeddings are fixed.
We set N_e as the number of embeddings for each character, so that each character c has N_e embeddings c^1, ..., c^{N_e}. Given a word w and a sememe s, we select the embedding of a character of w closest to the sememe embedding by cosine distance as the representation of the word w, as shown in Fig. 3. Specifically, given a word w = c_1...c_{|w|} and a sememe s_j, we define

(k̂, r̂) = argmin_{k,r} [ 1 − cos(c_k^r, s_j + s̄_j) ],   (6)

where k̂ and r̂ indicate the indices of the character and its embedding closest to the sememe s_j in the semantic space. With the same word-sememe matrix M and sememe-sememe correlation matrix C as in Eq. (2), we learn the sememe embeddings with the loss function:

L = Σ_{w_i ∈ W, s_j ∈ S} ( c_k̂^r̂ · (s_j + s̄_j) + b_k̂^c + b'_j − M_ij )² + λ Σ_{s_j, s_k ∈ S} ( s_j · s̄_k − C_jk )²,   (7)

where s_j and s̄_j are the sememe embeddings of sememe s_j, and c_k̂^r̂ is the embedding of the character of w_i that is closest to sememe s_j. Note that, as the characters and the words are not embedded into the same semantic space, we learn new sememe embeddings instead of reusing those learned in SPSE; hence we use different notations for the sake of distinction. b_k̂^c and b'_j denote the biases of c_k̂ and s_j, and λ is the hyper-parameter adjusting the two parts. Finally, the score function of word w = c_1...c_{|w|} is defined as:

P(s_j | w) ∼ c_k̂^r̂ · (s_j + s̄_j).   (8)
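The character-selection step of Eq. (6) and the score of Eq. (8) can be sketched as follows; this is illustrative NumPy code with our own variable names, not the released implementation:

```python
import numpy as np

def closest_char_embedding(char_embs, sememe_vec):
    """Sketch of Eq. (6): among all embeddings c_k^r of the word's
    characters, pick the one with minimal cosine distance to
    (s_j + s_bar_j), and return its indices and vector.

    char_embs  -- (n_chars, N_e, d): N_e embeddings per character
    sememe_vec -- (d,): the vector s_j + s_bar_j
    """
    flat = char_embs.reshape(-1, char_embs.shape[-1])
    cos = flat @ sememe_vec / (
        np.linalg.norm(flat, axis=1) * np.linalg.norm(sememe_vec) + 1e-12)
    idx = int(np.argmax(cos))                  # argmin of (1 - cos)
    k, r = divmod(idx, char_embs.shape[1])     # character index, embedding index
    return k, r, flat[idx]

def spcse_score(char_embs, s_j, s_bar_j):
    """Sketch of Eq. (8): P(s_j|w) ~ c_khat^rhat . (s_j + s_bar_j)."""
    _, _, c = closest_char_embedding(char_embs, s_j + s_bar_j)
    return float(c @ (s_j + s_bar_j))
```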

Model Ensembling
SPWCF / SPCSE and SPWE / SPSE take different sources of information as input, which means that they have different characteristics: SPWCF / SPCSE only have access to internal information, while SPWE / SPSE can only make use of external information. On the other hand, just like the difference between SPWE and SPSE, SPWCF originates from collaborative filtering, whereas SPCSE uses matrix factorization. All of these methods have in common that they tend to recommend the sememes of similar words, but they diverge in their interpretation of similar. Hence, to obtain better prediction performance, it is necessary to combine these models. We denote the ensemble of SPWCF and SPCSE as the internal model, and the ensemble of SPWE and SPSE as the external model. The ensemble of the internal and the external models is our novel framework CSP. In practice, for words with reliable word embeddings, i.e., high-frequency words, we can use the integration of the internal and the external models; for words with extremely low frequencies (e.g., words without reliable word embeddings), we can use just the internal model and ignore the external model, because the external information is noise in this case. Fig. 4 shows model ensembling in different scenarios. For the sake of comparison, we use the integration of SPWCF, SPCSE, SPWE, and SPSE as CSP in all our experiments. In this paper, two models are integrated by simple weighted addition.
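The weighted-addition ensembling described above can be sketched as follows (hypothetical helper names; score dictionaries map sememes to scores):

```python
def ensemble(score_dicts, weights):
    """Integrate models by simple weighted addition of their scores."""
    combined = {}
    for scores, w in zip(score_dicts, weights):
        for s, v in scores.items():
            combined[s] = combined.get(s, 0.0) + w * v
    return combined

def csp_scores(internal, external, has_reliable_embedding,
               w_int=1.0, w_ext=1.0):
    """CSP sketch: combine internal and external scores for frequent
    words; fall back to the internal model alone when no reliable
    word embedding exists (extremely low-frequency words)."""
    if not has_reliable_embedding:
        return dict(internal)
    return ensemble([internal, external], [w_int, w_ext])
```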

Experiments
In this section, we evaluate our models on the task of sememe prediction. Additionally, we analyze the performance of different methods for various word frequencies. We also execute an elaborate case study to demonstrate the mechanism of our methods and the advantages of using internal information.

Dataset
We use the human-annotated sememe KB HowNet for sememe prediction. In HowNet, 103,843 words are annotated with 212,539 senses, and each sense is defined as a hierarchical structure of sememes. There are about 2,000 sememes in HowNet. However, the frequencies of some sememes in HowNet are very low, so we consider them unimportant and remove them. Our final dataset contains 1,400 sememes. For learning the word and character embeddings, we use the Sogou-T corpus (Liu et al., 2012), which contains 2.7 billion words.

Experimental Settings
In our experiments, we evaluate SPWCF, SPCSE, and SPWCF + SPCSE, which use only internal information, and the ensemble framework CSP, which uses both internal and external information for sememe prediction. We use the state-of-the-art SPWE and SPSE models and their ensemble as our baselines. Additionally, we use the SPWE model with word embeddings learned by fastText (Bojanowski et al., 2017), which considers both internal and external information, as a baseline.
For the convenience of comparison, we select 60,000 high-frequency words in the Sogou-T corpus from HowNet. We divide the 60,000 words into train, dev, and test sets of size 48,000, 6,000, and 6,000, respectively, and we keep them fixed throughout all experiments except for Section 5.4. In Section 5.4, we utilize the same train and dev sets, but use other words from HowNet as the test set to analyze the performance of our methods in different word frequency scenarios. We select the hyper-parameters on the dev set for all models including the baselines and report the evaluation results on the test set.
We set the dimensions of the word, sememe, and character embeddings to 200. The word embeddings are learned by GloVe (Pennington et al., 2014). For the baselines, in SPWE, the hyper-parameter c is set to 0.8, and the model considers no more than K = 100 nearest words. We set the probability of decomposing zero elements in the word-sememe matrix in SPSE to 0.5%, and λ in Eq. (2) to 0.5. The model is trained for 20 epochs with an initial learning rate of 0.01, which decreases through iterations. For fastText, we use skip-gram with hierarchical softmax to learn word embeddings, and we set the minimum length of character n-grams to 1 and the maximum length to 2. For model ensembling, we use λ_SPWE/λ_SPSE = 2.1 as the addition weight. For SPCSE, we use cluster-based character embeddings to learn pre-trained character embeddings, and we set N_e to 3. We set λ in Eq. (7) to 0.1, and the model is trained for 20 epochs. The initial learning rate is 0.01 and decreases during training as well. Since each character generally relates to about 15-20 sememes, we set the probability of decomposing zero elements in the word-sememe matrix in SPCSE to 2.5%. The ensemble weight of SPWCF and SPCSE is λ_SPWCF/λ_SPCSE = 4.0. For better performance of the final ensemble model CSP, we set λ = 0.1 and λ_SPWE/λ_SPSE = 0.3125, though 0.5 and 2.1 are the best values for SPSE and SPWE + SPSE alone. Finally, we choose λ_internal/λ_external = 1.0 to integrate the internal and external models.

(The Sogou-T corpus is provided by Sogou Inc., a Chinese commercial search engine company: https://www.sogou.com/labs/resource/t.php.)

Evaluation Protocol
The task of sememe prediction aims to recommend appropriate sememes for unlabelled words. We cast this as a multi-label classification task, and adopt mean average precision (MAP) as the evaluation metric. For each unlabelled word in the test set, we rank all sememe candidates with the scores given by our models as well as baselines, and we report the MAP results. The results are reported on the test set, and the hyper-parameters are tuned on the dev set.
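For reference, MAP over ranked sememe candidates can be computed as follows; this is a standard sketch of the metric, not the authors' evaluation script:

```python
def average_precision(ranked, gold):
    """AP for one word: average of precision@k taken at the rank of
    each gold sememe in the full ranking of candidates."""
    hits, ap = 0, 0.0
    for k, s in enumerate(ranked, start=1):
        if s in gold:
            hits += 1
            ap += hits / k
    return ap / len(gold) if gold else 0.0

def mean_average_precision(rankings, gold_sets):
    """MAP over the test set: rankings[i] is a full ranking of sememe
    candidates for word i, gold_sets[i] its annotated sememes."""
    aps = [average_precision(r, g) for r, g in zip(rankings, gold_sets)]
    return sum(aps) / len(aps)
```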

Experiment Results
The evaluation results are shown in Table 1, from which we can observe that: (1) Considerable improvements are obtained via model ensembling, and the CSP model achieves state-of-the-art performance. CSP combines the internal character information with the external context information, which significantly and consistently improves performance on sememe prediction. Our results confirm the effectiveness of combining internal and external information for sememe prediction; since different models focus on different features of the inputs, the ensemble model can absorb the advantages of both methods.
(2) The performance of SPWCF + SPCSE is better than that of SPSE, which means using only internal information could already give good results for sememe prediction as well. Moreover, in internal models, SPWCF performs much better than SPCSE, which also implies the strong power of collaborative filtering.
(3) The performance of SPWCF + SPCSE is worse than that of SPWE + SPSE. This indicates that it is still difficult to figure out the semantic meaning of a word without contextual information, due to the ambiguity and vagueness of internal characters. Moreover, some words are not compound words (e.g., single-morpheme words or transliterated words), and their meanings can hardly be inferred directly from their characters. In Chinese, internal character information is only partial knowledge. We present the results of SPWCF and SPCSE merely to show the capability of using internal information in isolation. In our case study, we will demonstrate that internal models are powerful for low-frequency words and can be used to predict sememes for words that do not appear in the corpus.

Analysis on Different Word Frequencies
To verify the effectiveness of our models on different word frequencies, we incorporate the remaining words in HowNet into the test set. Since the remaining words are low-frequency, we mainly focus on words with a long-tail distribution. We count the number of occurrences in the corpus for each word in the test set and group the words into eight categories by their frequency. The evaluation results are shown in Table 2, from which we can observe that:

(In detail, we exclude numeral words, punctuation, single-character words, words that do not appear in the Sogou-T corpus, because a word needs to appear at least once to obtain a word embedding, and foreign abbreviations.)
Table 2: Sememe prediction MAP for test words grouped by word frequency (50 and below, 51-100, 101-1,000, 1,001-5,000, 5,001-10,000, and higher bins).

(1) The performances of SPSE, SPWE, and SPWE + SPSE decrease dramatically on low-frequency words compared to high-frequency words. On the contrary, the performances of SPWCF, SPCSE, and SPWCF + SPCSE, though weaker than on high-frequency words, are not strongly influenced in the long-tail scenario. The performance of CSP also drops, since CSP uses external information as well, which is not sufficient for low-frequency words. These results show that word frequencies and the quality of word embeddings influence the performance of sememe prediction methods, especially external models, which concentrate mainly on the word itself. However, the internal models are more robust when encountering long-tail distributions. Although words do not need to appear many times for good word embeddings to be learned, it is still hard for external models to recommend sememes for low-frequency words. Since the internal models do not use external word embeddings, they can still work in this scenario. As for high-frequency words, since these words are used widely, their ambiguity is much stronger, yet the internal models remain stable for them as well.
(2) The results also indicate that even low-frequency words in Chinese are mostly composed of common characters, and thus it is possible to utilize internal character information for sememe prediction on words with a long-tail distribution (even on new words that never appear in the corpus). Moreover, the stability of the MAP scores given by our methods across various word frequencies also reflects the reliability and universality of our models for real-world sememe annotation in HowNet. We give a detailed analysis in our case study.

Case Study
The results of our main experiments already show the effectiveness of our models. In this case study, we further investigate the outputs of our models to confirm that character-level knowledge is truly incorporated into sememe prediction.
The word "钟表匠" (clockmaker) is composed of three characters: "钟" (bell, clock), "表" (clock, watch) and "匠" (craftsman). Humans can intuitively conclude that clock + craftsman → clockmaker. However, the external model does not perform well on this example. If we investigate the word embedding of "钟表匠" (clockmaker), we can see why this method recommends unreasonable sememes. The closest five words in the train set to "钟表匠" (clockmaker) by cosine similarity of their embeddings are: "瑞士" (Switzerland), "卢梭" (Jean-Jacques Rousseau), "鞋匠" (cobbler), "发明家" (inventor) and "奥地利人" (Austrian). Note that none of these words are directly relevant to bells, clocks or watches. Hence, the sememes "时间" (time), "告诉" (tell), and "用具" (tool) cannot be inferred from those words, even though the correlations between sememes are introduced by SPSE. In fact, those words are related to clocks only indirectly: Switzerland is famous for its watch industry; Rousseau was born into a family with a tradition of watchmaking; cobbler and inventor are two other kinds of occupations. For these reasons, those words usually co-occur with "钟表匠" (clockmaker) or appear in similar contexts. This indicates that related word embeddings as used in an external model do not always recommend related sememes.
The word "奥斯卡" (Oscar) is a transliteration based on the pronunciation of "Oscar". Therefore, the meaning of each character in "奥斯卡" is unrelated to the meaning of the word. Moreover, the characters "奥", "斯", and "卡" are common among transliterated words, so the internal method recommends sememes such as "专" (ProperName) and "地方" (place), since many transliterated words are proper nouns or place names.

Conclusion and Future Work
In this paper, we introduced character-level internal information for lexical sememe prediction in Chinese, in order to alleviate the problems caused by the exclusive use of external information. We proposed a Character-enhanced Sememe Prediction (CSP) framework which integrates both internal and external information for lexical sememe prediction and proposed two methods for utilizing internal information. We evaluated our CSP framework on the classical manually annotated sememe KB HowNet. In our experiments, our methods achieved promising results and outperformed the state of the art on sememe prediction, especially for low-frequency words.
We will explore the following research directions in the future: (1) Concepts in HowNet are annotated with hierarchical structures of senses and sememes, but these structures are not considered in this paper. In the future, we will take structured annotations into account. (2) It would be meaningful to take more information into account when blending external and internal information and to design more sophisticated methods. (3) Besides Chinese, many other languages have rich subword-level information. In the future, we will explore methods of exploiting internal information in other languages. (4) We believe that sememes are universal for all human languages. We will explore a general framework to recommend and utilize sememes for other NLP tasks.