Solving and Generating Chinese Character Riddles



Introduction
The riddle is regarded as one of the most distinctive and vital elements of traditional Chinese culture. A riddle is usually composed of a riddle description and a corresponding solution. The character riddle is one of the most popular forms, in which the riddle solution is a single Chinese character. While English words are strings of letters, Chinese characters are composed of radicals that are associated with meaning or metaphor. Chinese characters are usually arranged in a few common structures, such as the upper-lower, left-right, and inside-outside structures, which means they can be decomposed into other characters or radicals. For example, "好" (good), a character with left-right structure, can be decomposed into "女" (daughter) and "子" (son). As illustrated in Figure 1(a), the left part of "好" is "女" and the right part is "子"; "女" and "子" are called the radicals of "好". One of the most important characteristics of the character riddle lies in the structure of Chinese characters. Unlike common riddles, whose descriptions imply an object, character riddles pay more attention to structure, such as the combination of radicals and the decomposition of characters. Because of these characteristics, the metaphors in the riddles always imply the radicals of characters.
* The work was done when the first author and the third author were interns at Microsoft Research Asia.
Figure 2: An example of a Chinese character riddle, "千 里 会 千 金" (thousand kilometer, meet, thousand gold). The solution "妈" (mother) is composed of the radical "女" (daughter) derived from "千 金" and "马" (horse) derived from "千 里".
We show an example of a Chinese character riddle in Figure 2. The riddle description is "千 里 会 千 金" and the riddle solution is "妈". In this example, "千 里" (thousand kilometer) aligns with "马" (horse) because in Chinese culture it is said that a good horse can run thousands of kilometers per day. Furthermore, "千 金" (thousand gold) aligns with "女" (daughter) because of the analogy that a daughter is very important in the family. The final solution "妈" is composed of these two metaphors because the radical "女" meets the radical "马". Radicals can be derived not only from the meaning of metaphors, but also from the structure of characters. We describe the alignments and rules in detail in Section 3.
In this paper, we propose a statistical framework to solve and generate Chinese character riddles. We show our pipeline in Figure 3. First, we learn the common alignments and the combination rules from a large set of riddle-solution pairs mined from the Web. The alignments and rules are used to identify the metaphors in the riddles. Second, in the solving phase, we use a dynamic programming algorithm built on the alignments and rules to produce the candidate solutions. In the generating phase, we use a template-based method and a replacement-based method, both based on the decomposition of the character, to generate the candidate riddles. Finally, we employ Ranking SVM to rank the candidates in both the solving and the generation tasks. We conduct the evaluation on 2,000 riddles in the riddle solving task and 100 Chinese characters in the riddle generation task. Experimental results show that the proposed method outperforms baseline methods in the solving task. We also obtain very promising results in the generation task according to human judges.

Related Work
To the best of our knowledge, no previous work has studied Chinese riddles. For other languages, a few approaches concentrate on solving English riddles. Pepicello and Green (1984) describe the various strategies incorporated in riddles. De Palma and Weiner (1992) and Weiner and De Palma (1993) use a knowledge representation system to solve English riddles that consist of a single-sentence question followed by a single-sentence answer. They propose to build the relation between the phonemic representations and their associated lexical concepts. Binsted and Ritchie (1994) implement a program, JAPE, which generates riddles from humour-independent lexical entries, and evaluate the behaviour of the program with 120 children (Binsted et al., 1997). Olaosun and Faleye (2015) identify meaning construction strategies in selected English riddles on the web and account for the mental processes involved in their production, showing that the meaning of a riddle is an imposed meaning that relates to the logical, experiential, linguistic, literary and intuitive judgments of the riddles. There are also some studies on Yoruba riddles (Akínyemí, 2015b;Akínyemí, 2015a;Magaji, 2014). All of these works focus on semantic meaning, which differs from Chinese character riddles, where the focus is on the structure of characters.
Another popular word game is the Crossword Puzzle (CP), which normally has the form of a square or rectangular grid of white and black shaded squares. The white squares on the border of the grid or adjacent to the black ones are associated with clues. In CPs the clues are derived from each question, whereas in our riddle task the radicals in the solution are derived from the metaphors in the riddles. Proverb (Littman et al., 2002) is the first system for the automatic resolution of CPs. Ernandes et al. (2005) use a web-search module to find sensible candidates for questions expressed in natural language and obtain the final answer by ranking the candidates; their work also mentions a rule-based module and a dictionary module. Barlacchi et al. (2014) propose using tree kernels to rerank the candidates for the automatic resolution of crossword puzzles.
From another perspective, there are a few projects on Chinese language culture, such as couplet generation and poem generation. A statistical machine translation (SMT) framework has been proposed to generate Chinese couplets and classical Chinese poetry (He et al., 2012;Zhou et al., 2009;Jiang and Zhou, 2008). Jiang and Zhou (2008) use a phrase-based SMT model with linguistic filters to generate Chinese couplets that satisfy couplet constraints, using both human judgments and BLEU scores for evaluation. Zhou et al. (2009) use the SMT model to generate quatrains, with a human evaluation. He et al. (2012) generate Chinese poems from given topic words by combining a statistical machine translation model with an ancient poetic phrase taxonomy. Following the approaches in the SMT framework, it is valid to regard the metaphors and their radicals as alignments. Several works use neural networks to generate Chinese poems (Zhang and Lapata, 2014;Yi et al., 2016). Because of the limited data and strict rules, these approaches are hard to transfer to riddle generation.

Phrase-Radical Alignments and Rules
The metaphor is one of the key components in both solving and generation. On the one hand, we need to identify these metaphors, since each of them aligns to a radical in the final solution. On the other hand, we need to integrate these metaphors into the riddle descriptions to generate riddles. Thus, extracting the metaphors of riddles is a major challenge in our task. Below we introduce our method to extract the metaphors based on the phrase-radical alignments and rules.
We use phrase-radical alignments to describe the simple metaphors, e.g. "千 里" aligns to "马", where the phrase and the radical are aligned by meaning. We employ a statistical framework with a word alignment algorithm to automatically mine phrase-radical metaphors from the riddle dataset. Since an alignment is usually a match between successive words in the riddle and a radical in the solution, we propose two methods specifically to extract alignments. The first method, in accordance with Och and Ney (2003), is as follows. Given a riddle description $q$ and the corresponding solution $s$, we tokenize the input riddle $q$ into characters $(w_1, w_2, \ldots, w_n)$ and decompose the solution $s$ into radicals $(r_1, r_2, \ldots, r_m)$. We count every pair $([w_i, w_j], r_k)$ with $1 \le i \le j \le n$ and $1 \le k \le m$ as an alignment. The second method takes more structural information of characters into account. Let $(w_1, w_2)$ denote two successive characters in the riddle $q$. If $w_1$ is a radical of $w_2$ and the remaining part of $w_2$, denoted $r$, appears in the solution $s$, we take this as strong evidence that $((w_1, w_2), r)$ is an alignment. The same holds if $w_2$ is a radical of $w_1$. We count all alignments and filter out those that occur fewer than 3 times. Some high-frequency alignments are shown in Table 1. For example, "四方" (square) aligns to "口" (mouth) because of the similar shape, and "二十载" (two decades) aligns to "艹" (grass) because "艹" looks like two small "十"s.
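The counting and filtering steps described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: plain co-occurrence counting stands in for the statistical word alignment, and the names `DECOMP`, `candidate_alignments`, `structural_alignments`, `mine_alignments`, the span limit `max_len`, and the toy riddle "女好" are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical toy decomposition table: character -> radicals.
DECOMP = {"妈": ("女", "马"), "好": ("女", "子"), "字": ("宀", "子")}


def candidate_alignments(riddle, solution, decomp, max_len=3):
    """First method (simplified): pair every contiguous span of the riddle
    (up to max_len characters, an assumed limit) with every solution radical."""
    radicals = decomp.get(solution, ())
    n = len(riddle)
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            for r in radicals:
                yield riddle[i:j], r


def structural_alignments(riddle, solution, decomp):
    """Second method: for successive characters (w1, w2), if one is a radical
    of the other and the remaining part occurs in the solution, align the
    two-character phrase to that remaining part."""
    solution_radicals = set(decomp.get(solution, ()))
    for a, b in zip(riddle, riddle[1:]):
        for part, whole in ((a, b), (b, a)):
            parts = decomp.get(whole, ())
            if part in parts:
                for rest in parts:
                    if rest != part and rest in solution_radicals:
                        yield a + b, rest


def mine_alignments(pairs, decomp, min_count=3, max_len=3):
    """Count candidates once per riddle and drop those seen fewer than min_count times."""
    counts = Counter()
    for riddle, solution in pairs:
        counts.update(set(candidate_alignments(riddle, solution, decomp, max_len)))
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

With three copies of the pair ("千里会千金", "妈") as input, `mine_alignments` keeps ("千里", "马") and ("千金", "女") along with many noisy pairs; on real data the frequency threshold is what separates recurring metaphors from accidental co-occurrences.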
Besides the alignments, which are represented as common collocations, there is another kind of common metaphor that concentrates on the structure of characters. We define 6 categories of rules, shown in Table 2, to identify this kind of metaphor. A rule is usually an operation applied to a character to obtain parts of it as radicals. For example, the character "上" (up) usually denotes an operation that takes the upper radical of the corresponding character. We extract the rules from the phrase-radical alignments obtained above. In a phrase-radical alignment, if the radical appears as one part of a character in the phrase, we assume that the radical is derived from this character, which means the other words in the phrase may describe an operation on this character. We replace this character with a placeholder and generate a candidate rule whose direction is given by the position of the radical in the character. Thus, for each phrase-radical alignment $([w_1, w_n], r)$, we count $(w_1, \ldots, w_{i-1}, (.), w_{i+1}, \ldots, w_n)$ as a potential rule only if $r$ is a radical of $w_i$. We count all rules learned from the data and filter out those that occur fewer than 5 times. Some rules are shown in Table 2. The word or phrase in the rule "A-B" mostly has a meaning analogous to "removing". The word or phrase in the rule "Half" mostly has a meaning analogous to "half". For the rules "LeftRemove", "RightRemove", "UpperRemove" and "LowerRemove", there is usually a word or phrase that means "removing", while the others indicate the position and direction.
We mine 14,090 phrase-radical alignments in total. More than 1,000 Chinese characters have at least one alignment, and 27 characters have more than 100 alignments. Almost all common radicals are contained in our alignment set. Chinese characters are mostly composed of these common radicals, so these alignments are sufficient for our task. We extract 193 rules in total over all categories, and all of them are applied to both riddle solving and riddle generation.
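A minimal sketch of the rule-extraction step is given below, assuming a lookup table that records which radical sits at which position of a character. The table `POSITIONS`, the position labels, and the function names are hypothetical and serve only to illustrate the placeholder substitution and the frequency filter.

```python
from collections import Counter

# Hypothetical position table: character -> {position label: radical}.
POSITIONS = {
    "岗": {"upper": "山", "lower": "冈"},
    "岩": {"upper": "山", "lower": "石"},
}


def candidate_rules(phrase, radical, positions):
    """For an alignment (phrase, radical), emit a rule whenever the radical is
    a part of some character w_i of the phrase: w_i becomes the placeholder
    "(.)" and the rule inherits the position of the radical inside w_i."""
    for i, ch in enumerate(phrase):
        for position, part in positions.get(ch, {}).items():
            if part == radical:
                yield phrase[:i] + "(.)" + phrase[i + 1:], position


def mine_rules(alignments, positions, min_count=5):
    """Count candidate rules over all alignments and keep the frequent ones."""
    counts = Counter()
    for phrase, radical in alignments:
        counts.update(candidate_rules(phrase, radical, positions))
    return {rule: c for rule, c in counts.items() if c >= min_count}
```

For instance, the alignment ("上岗", "山") yields the candidate rule ("上(.)", "upper"), which matches the reading of "上" as "take the upper part of the following character".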

Solving Chinese Character Riddles
The process of solving riddles has two components. First, we identify as many metaphors in the riddle as possible by matching the phrase-radical alignments and rules, and integrate these metaphors to obtain a candidate set of solutions. Each candidate carries, as its features, the parsing clues that indicate how and why it was generated. Second, we employ a ranking model to determine the best solution as output. Below we introduce our method to generate solution candidates; the ranking model is introduced in Section 4.3. Typically, two metaphors do not share a character, and each metaphor is composed of successive characters. Therefore, we use a dynamic programming algorithm based on the CYK algorithm (Kasami, 1965) to identify the metaphors with the help of the learned alignments and the predefined rules. We describe the algorithm in Algorithm 1.
An example that illustrates our algorithm is "上 岗 必 戴 安 全 帽", whose solution is "密". As shown in Figure 4, "上 岗" (on sentry) aligns to "山" by matching the rule "上(up) (.)", which means taking the upper part of the character "岗". "必" and "戴" each align to themselves. The phrase "安 全 帽" (safety helmet) aligns to the radical "宀" through the alignments, because of the analogous shape. Our ranking model obtains the final solution "密" from these clues.
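The candidate-generation step can be illustrated with a small CYK-style chart. This is only a sketch under stated assumptions and not a reproduction of Algorithm 1: the toy tables `ALIGN`, `RULES`, `POSITIONS` and `DECOMP`, the choice to let a single character either stand for itself or contribute nothing, and the exact-match composition step are all introduced here for illustration.

```python
from itertools import product

# Hypothetical toy resources.
ALIGN = {"千里": {"马"}, "千金": {"女"}, "安全帽": {"宀"}}
RULES = {"上(.)": "upper"}
POSITIONS = {"岗": {"upper": "山", "lower": "冈"}}
DECOMP = {"妈": ("女", "马"), "密": ("宀", "必", "山")}
# Reverse index: sorted radical tuple -> character.
COMPOSE = {tuple(sorted(parts)): ch for ch, parts in DECOMP.items()}


def span_readings(span):
    """Radical tuples that one contiguous span may stand for."""
    readings = {(r,) for r in ALIGN.get(span, ())}
    if len(span) == 1:
        readings.add((span,))  # a character may align to itself
        readings.add(())       # or act as a connective (assumption)
    for rule, position in RULES.items():
        prefix, suffix = rule.split("(.)")
        if (len(span) == len(prefix) + 1 + len(suffix)
                and span.startswith(prefix) and span.endswith(suffix)):
            part = POSITIONS.get(span[len(prefix)], {}).get(position)
            if part:
                readings.add((part,))
    return readings


def solve(riddle):
    """Fill a chart over all spans, shortest first, then compose characters."""
    n = len(riddle)
    chart = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set(span_readings(riddle[i:j]))
            for k in range(i + 1, j):
                for left, right in product(chart[i, k], chart[k, j]):
                    cell.add(tuple(sorted(left + right)))
            chart[i, j] = cell
    return sorted({COMPOSE[key] for key in chart[0, n] if key in COMPOSE})
```

On the toy tables, `solve("上岗必戴安全帽")` combines "山" (rule on "上岗"), "必" (itself) and "宀" (alignment of "安全帽") into the candidate "密". The chart grows quickly with riddle length, so a practical system would need pruning before passing candidates to the ranker.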

Generating Chinese Character Riddles
Two major components are required in the process of riddle generation. The first step is to generate a list of candidates of riddle descriptions for a Chinese character as the solution. The second step is to rank the candidate riddle descriptions and select the top-N (e.g. 10) candidates as the output. Below we introduce our method to generate candidates of riddle descriptions, and we will introduce the ranking model in Section 4.3.
We propose two strategies to generate the candidate riddle descriptions for a given Chinese character, called the template-based method and the replacement-based method, respectively. We first describe the template-based method. The most natural approach is to connect the metaphors of the radicals. For a character and one of its possible splittings $RD = \{rd_i\}$, we select a corresponding metaphor for each radical through an alignment or a rule, and then connect all the metaphors without any conjunction words to form a riddle. A refinement is to add a few conjunction words between the metaphors, which makes the riddle more coherent. To collect such words, we remove the recognized metaphors from the riddle sentences and count the unigram and bigram frequencies of the remaining words, which are usually common conjunctions. We sample these words according to the frequency distribution and add them to the riddles to connect the metaphors of the radicals.
Second, we use an alternative replacement-based method to generate the candidate riddle descriptions. Instead of generating the riddle descriptions entirely from scratch, we replace part of an existing riddle to generate a new riddle description. Let $w = (w_1, w_2, \ldots, w_n)$ denote the word sequence of a riddle description in our dataset, where $n$ denotes the length of the riddle in characters. Let $[w_i, w_j]$ ($i, j \in [1, n]$) denote a word span that can be aligned to a radical $rd$, and let $X = (x_1, \ldots, x_m)$ denote the phrase descriptions of $rd$. We then replace $[w_i, w_j] \in X$ with each of the other phrase descriptions of $rd$ in $X$. We try all possible replacements to generate riddle candidates. This method can generate candidate riddles that are more natural and fluent.
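Both generation strategies can be sketched in a few lines. The phrase table `PHRASES` below is a made-up toy input (the entry "姑娘" for "女" is purely illustrative and is not taken from the mined alignments), and the sketch enumerates connectives exhaustively instead of sampling them from a frequency distribution as the text describes.

```python
from itertools import product

# Hypothetical phrase table: radical -> phrases that may describe it.
PHRASES = {"马": ["千里"], "女": ["千金", "姑娘"]}


def template_candidates(radicals, phrases, connectives=("",)):
    """Template-based method: choose one phrase per radical and join the
    phrases, either directly ("") or with a connective in between."""
    options = [phrases[r] for r in radicals]
    for choice in product(*options):
        for connective in connectives:
            yield connective.join(choice)


def replacement_candidates(riddle, radical, phrases):
    """Replacement-based method: find a span of an existing riddle that
    describes the radical and swap in each alternative description."""
    alternatives = phrases[radical]
    for old in alternatives:
        start = riddle.find(old)
        if start < 0:
            continue
        for new in alternatives:
            if new != old:
                yield riddle[:start] + new + riddle[start + len(old):]
```

For the solution "妈" with radicals ("马", "女"), the template method yields strings such as "千里千金" and "千里会千金", while the replacement method rewrites the existing riddle "千里会千金" into "千里会姑娘" and keeps the rest of the human-written wording intact.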

Ranking Model
The previous sections introduced the algorithms that generate candidates for solving and for generation. We now describe the ranking model that determines the final output.
The ranking score is calculated as
$$\mathrm{Score}(c) = \sum_{i=1}^{m} \lambda_i \, g_i(c)$$
where $c$ represents a candidate, $g_i(c)$ represents the $i$-th feature in the ranking model, $m$ represents the total number of features, and $\lambda_i$ represents the weight of the $i$-th feature. The features for riddle solving and riddle generation are listed in Table 3 and Table 4, respectively. We use Ranking SVM (Joachims, 2006) to train the model and obtain the feature weights. The weights are trained on riddle-solution pairs. Specifically, in the riddle solving task, within the set of solution candidates we treat the original solution as the positive sample and the others as negative samples. The dynamic programming algorithm produces a list of solution candidates, and the training process optimizes the feature weights so that the ranking score of the original solution is greater than that of any other candidate in the list. In the riddle generation task, we select 100 characters according to the frequency distribution of characters as solutions. For each character we use the riddle generation module to generate a list of riddle candidates. We label these candidates manually, giving better riddle descriptions higher scores. The training process then optimizes the feature weights.
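As a small illustration of the linear ranking score and of the pairwise view of training data used by ranking SVMs, consider the sketch below. The feature vectors and weights are invented numbers, and `pairwise_differences` only constructs the difference vectors on which a pairwise ranker would be trained; it does not perform any optimization.

```python
def score(features, weights):
    """Linear ranking score: the sum of weight_i * feature_i."""
    return sum(w * g for w, g in zip(weights, features))


def rank(candidates, weights):
    """Sort (candidate, feature vector) pairs by descending score."""
    return sorted(candidates, key=lambda c: score(c[1], weights), reverse=True)


def pairwise_differences(positive, negatives):
    """Difference vectors positive - negative. A pairwise ranker looks for
    weights that give every such difference a positive score."""
    return [tuple(p - q for p, q in zip(positive, neg)) for neg in negatives]


# Hypothetical two-feature candidates for one riddle.
weights = (1.0, -0.5)
candidates = [("宓", (1.0, 1.0)), ("密", (2.0, 0.0))]
ranked = rank(candidates, weights)
```

Here "密" scores 2.0 and "宓" scores 0.5, so "密" is ranked first; the single difference vector (1.0, -1.0) has score 1.5 under these weights, meaning the pairwise constraint is satisfied.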

Dataset
We crawl 77,308 character riddles, each consisting of a riddle description and its solution, from the Web. All of these riddle-solution pairs concentrate on the structure of characters. A stroke table, which contains the 3,755 characters encoded in the first level of GB2312-80, describes how a Chinese character is decomposed into its radicals. A character may have more than one splitting, and a character is typically composed of no more than 3 radicals.
The data for training the riddle-style language model consist of two parts: the corpus of riddles mentioned above, and a corpus of Chinese poems and Chinese couplets, which have a similar language style. We follow the method proposed by He et al. (2012) and Zhou et al. (2009) to download the <Tang Poems>, <Song Poems>, <Ming Poems>, <Qing Poems>, <Tai Poems> from the Internet, and use the method proposed by Fan et al. (2007) to recursively mine those data with the help of some seed poems and couplets. This amounts to more than 3,500,000 sentences and 670,000 couplets. Besides the language model trained in riddle style, we also train a general language model on web documents.

Evaluation on Riddle Solving
We randomly select 2,000 riddles from the riddle dataset as the test data and 500 riddles as the development data, and use the rest as training data.
Our system always returns a ranking list of candidate solutions, so we use the Acc@k (k = 1, 5, 10) as the evaluation metric. The Acc@k is the fraction of questions which obtain correct answers in their top-k results.
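The Acc@k metric defined above can be computed with a short helper. The function name `acc_at_k` and the toy ranked lists are illustrative assumptions.

```python
def acc_at_k(ranked_lists, gold, k):
    """Fraction of riddles whose correct solution is among the top-k candidates."""
    hits = sum(1 for candidates, answer in zip(ranked_lists, gold)
               if answer in candidates[:k])
    return hits / len(gold)


# Toy example: three riddles with their ranked candidate lists and gold solutions.
ranked_lists = [["妈", "好"], ["好", "妈"], ["字"]]
gold = ["妈", "妈", "好"]
```

On this toy input one of three riddles is solved at rank 1 and two of three within the top 2, so Acc@1 is 1/3 and Acc@2 is 2/3.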
Giza++ (Och, 2001) is a common tool for extracting alignments from bilingual corpora. We use it as our baseline system that extracts the alignments automatically, and we use the Jaccard similarity coefficient as the baseline ranking metric. The Jaccard similarity coefficient is defined as
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where $A$ is the radical set of the solution and $B$ is the radical set of the candidate. The results are reported in Table 5 and Table 6. The baseline method gives the correct solution for only about one-tenth of the riddles at Acc@1. Compared with the baseline model, using the alignments extracted by our method improves the system by 6.7% at Acc@1 and 6.3% at Acc@10. Notably, using only the alignments we extract gives better results than combining them with the alignments from Giza++, because the metaphor matching between phrases and characters is particular to our riddle task. Small changes in a phrase can affect the character that it implies, and a phrase may cease to be a metaphor even if only one of its characters is changed. Furthermore, by using rules to identify the metaphors in riddles, we obtain an improvement of 10.1% at Acc@1, which supports the validity of the rules we define. The results indicate that the alignments and rules we extract are effective for identifying the metaphors in our character riddle task. The comparison between the Jaccard similarity coefficient and our Ranking SVM method shows that Ranking SVM is better, with an improvement of 2.6% at Acc@1, which indicates that, compared with the Jaccard similarity coefficient, Ranking SVM determines the solution more accurately when we successfully identify all metaphors in the riddle descriptions. Moreover, there is less improvement beyond Acc@5, which means the ranking model gets better results even if the system cannot identify all metaphors in the riddle descriptions.
We think that, unlike the Jaccard similarity coefficient, which only uses features between the candidate character and the correct solution, the ranking model uses extra features from the riddle descriptions, e.g. the number of disappearing radicals, which helps to exclude obviously wrong candidates.
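For reference, the Jaccard baseline over radical sets can be sketched as follows. The handling of two empty sets (returning 1.0) is a convention chosen here and is not specified in the text.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two radical sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention for two empty sets (assumption)
    return len(a & b) / len(a | b)
```

For example, the radical sets {"女", "马"} and {"女", "子"} share one of three distinct radicals, giving a similarity of 1/3. The measure looks only at the radicals, which is why it cannot use clues from the riddle description itself.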

Evaluation on Riddle Generation
Because there is no previous work on Chinese riddle generation, we conduct human evaluations on this task to assess its soundness, for the following two reasons. First, unlike the riddle solving task, which has a definite and unique solution, the generated riddles are varied, so it is hard to measure their quality against a well-defined answer set. Second, small differences in riddles have a great effect on the corresponding solution: a metaphor may imply distinct radicals even if only one of its characters is changed. Existing metrics such as BLEU are therefore not suitable for our task. Based on the above analysis, each riddle that the system generates is evaluated by human annotators according to the five-level criterion described in Table 7. We randomly sample 100 characters following the distribution of characters as solutions. The system generates riddle descriptions for each character following the methods in Section 4.2. Sometimes the riddles we generate already exist in our training data. We remove these riddles because we want to evaluate the ability to generate new riddles. To avoid annotator bias and to compare the riddles generated by the system with riddles written by human beings, the riddles are randomly shuffled so that the annotators do not know how each riddle was produced. For each character, we select 5 riddles generated by the template-based method, 5 riddles generated by the replacement-based method, and 2 human-written riddles from the riddle dataset, which form a set of 12 riddles in total. The annotators score each riddle according to the above criterion. The result is shown in Table 8. The human-written riddles from the riddle dataset score higher than the riddles generated by the system. The riddles generated by the replacement-based method score considerably higher than those of the basic template-based method. We believe that the replacement-based method retains some human-written content, which makes the generated riddles more coherent.
Table 8: Human evaluation of different methods
Method                      Avg(Score)
Template-based Method       3.49
Replacement-based Method    4.14
Riddle from dataset         4.38
Another result is that riddles whose solution is a common character or is composed of common radicals get higher scores, which suggests that we obtain better results when more alternative metaphors are available for a radical.
Below we show two examples of riddle descriptions generated for the solution "思" (miss), which is often decomposed into "田" (field) and "心" (heart), as shown in Figure 1(b).
• 三 星 伴 月 似 画 里 (Three stars with the moon, like in the picture): The radical "田" is the inside part of "画". The shape of "心" is three points and a curved line, which looks like three stars around a crescent.

The radical "田" is composed of two "日"s, and "心" occurs in the riddle description. The character "头"(top) means the radical "田" is on the top position.

Conclusion
We introduce a novel approach to solving and generating Chinese character riddles. We extract alignments and rules to capture the metaphors that link phrases in riddle descriptions to radicals in the solution characters. In total, we obtain 14,090 alignments that imply the metaphors between phrases and radicals, as well as 193 rules in 6 categories formed as regular expressions. To solve riddles, we use a dynamic programming algorithm that combines the metaphors identified by the alignments and rules to obtain the candidate solutions. To generate riddles, we propose a template-based method and a replacement-based method to generate candidate riddle descriptions. We employ Ranking SVM to rank the candidates in both riddle solving and riddle generation. Our method outperforms baseline methods in the solving task. We also obtain promising results in the generation task according to human evaluation.