Chengyu Cloze Test

We present a neural recommendation model for Chengyu, a special type of Chinese idiom. Given a query, which is a sentence with an empty slot where a Chengyu has been removed, our model recommends the Chengyu candidate that best fits the slot context. The main challenge is that the literal meaning of a Chengyu usually differs greatly from its figurative meaning. We propose a new neural approach that leverages the definition of each Chengyu and incorporates it as background knowledge. Experiments on both the Chengyu cloze test and coherence checking in college entrance exams show that our system achieves 89.5% accuracy on the cloze test and outperforms human subjects who attended competitive universities in China. We will make all of our data sets and resources publicly available as a new benchmark for research purposes.


Introduction
Chengyu ("成语", literal translation: "form phrases") is a special type of Chinese idiom and represents one of the most beautiful, fascinating and unique aspects of the Chinese language. 96% of Chengyus consist of four characters. Chengyus were mainly created from ancient stories, literature and sayings that can be traced back thousands of years. Some examples are shown in Table 1. More than 7,000 Chengyus are still widely used in modern Chinese, Japanese, Korean and Vietnamese. Like idioms in other languages, Chengyus used appropriately make communication more compelling and engaging because they introduce powerful imagery and figurative meanings that differ from their literal meanings.
For learners of Chinese, Chengyus are the most difficult phrases to understand and memorize. Second-language learners generally have a love-hate relationship with Chengyu and tend to avoid it. A typical way to measure a Chinese learner's Chengyu knowledge is the cloze test, in which the learner is asked to supply the Chengyu that has been removed from a sentence. It is considered one of the most difficult problems in the language and literature section of the Chinese college entrance exam, and has been the focus of several TV talent shows in China, such as the Chinese Idiom Congress by CCTV. This motivated us to develop the first Chengyu recommendation system to assist Chinese learners. Given a context sentence ("query") with a Chengyu removed, the system automatically recommends the best Chengyu to fill in the blank.
1 https://github.com/bazingagin/chengyu data
The four characters of a Chengyu are often unintelligible without the background story. For example, "沉鱼落雁" (literal translation: sink fish, fall goose) and "闭月羞花" (literal translation: hide moon, shame flower) summarize the stories of the four great beauties of ancient China: Xi Shi, Wang Zhaojun, Diao Chan and Yang Yuhuan. They were said to be so beautiful that fish sank, birds fell from the sky, the moon hid, and flowers were shamed. As a result, the meaning of a Chengyu cannot be composed from its four characters alone. Moreover, each Chengyu is highly succinct, compact and synthetic. For example, "一日三秋" (literal translation: one day, three autumns) means missing someone so greatly that one day feels as long as three years. However, its key meaning, "missing", is not expressed by any of its four characters.
To address these challenges, we create a new Chengyu Cloze Test benchmark, which consists of 108,987 query sentences and 7,395 target Chengyus. Each Chengyu is associated with a definition that describes its general meaning and the scenarios in which it is used. We then develop a neural model that encodes each query and each Chengyu definition with Bi-LSTM networks (Hochreiter and Schmidhuber, 1997). To better capture the correlation between the query and the definition, we apply soft attention to assign a weight to each word in the query sentence, and predict a matching score for each candidate Chengyu. Our system significantly outperforms human learners who attended top universities in China.

Table 1 (columns: Origin | Query | Recommended Chengyu and Its Definition)

Historical Story
Query: 文学接受史上经常有这样的现象，某些作品在它的那个时代曾经风行一时，_____。
Throughout history, some literary works were extremely popular, so much so that ____.

Ancient Chinese Literature
Chengyu: 白驹过隙 (time passes quickly like a white pony's shadow across a crevice)
Definition: 《庄子》："人生天地之间，若白驹之过隙，忽然而已。"
Chuang Tzu said, "Human life between heaven and earth is like a white pony seen through a crack in the wall; it lasts just a moment."

Foreign Literature
Query: 我们喜欢用经济去控制一个国家的命脉，用信仰去控制一个种族，用利益让别人为我们____。
We like to use the economy to control a country's lifeblood, belief to control a race, and profit to make others ____ for us.
Chengyu: 火中取栗 (pull chestnuts from the embers)
Definition: 出自十七世纪法国寓言诗人拉·封丹的寓言《猴子与猫》。比喻受人利用去冒险，吃了苦头却得不到一点好处。
From the 17th-century French fabulist Jean de La Fontaine's "The Monkey and the Cat". Bertrand the monkey persuades Raton the cat to pull chestnuts from the embers amongst which they are roasting, promising him a share. As the cat scoops them from the fire one by one, burning his paw in the process, the monkey gobbles them up. The Chengyu describes a person who is used, unwittingly or unwillingly, to accomplish another's purpose at his own risk, and gets nothing in return.

Chengyu: 白璧微瑕 (white jade with a little blemish)
Definition: 洁白的玉上有些小斑点。比喻很好的人或物有些小缺点，美中不足。
A flaw in a piece of white jade: a metaphor for a good person or thing with a small defect.

Related Work
Our Chengyu cloze test task is similar to reading comprehension (Hermann et al., 2015; Cui et al., 2016; Kadlec et al., 2016; Seo et al., 2016). However, it is more challenging because the context is a single sentence instead of a paragraph, the Chengyu phrase itself does not convey its figurative meaning, and there are many more candidate answers. Very few natural language processing techniques have been applied to understand or recommend Chengyu.
Chung (2009) (Xu et al., 2010) and improve Chinese word segmentation (Chan and Chong, 2008; Sun and Xu, 2011; Wang and Xu, 2017). Chengyus differ from metaphors in other languages (Tsvetkov et al., 2014; Shutova, 2010) because they do not follow the grammatical structure and syntax of modern Chinese.
Approach
Figure 1 shows the overall architecture of our approach. For a query and the definition of a candidate Chengyu, we first apply the word segmentation tool jieba to segment the query and the definition into words, and apply a Bi-LSTM network to encode each word with a contextual embedding. To better capture the correlation between a query and a Chengyu, we compare the representation of the Chengyu definition with the contextual embedding of each word in the query, and feed the weighted sum of the query word embeddings into a linear function that determines the probability score of the candidate Chengyu. We describe each component below.
Encoding Given a query $q$ and a Chengyu definition $d_j$ from the target Chengyu database $D = \{d_1, d_2, \ldots, d_m\}$, we apply two Bi-LSTM networks to encode them separately. Each Bi-LSTM uses memory cells (Hochreiter and Schmidhuber, 1997) to capture long-distance context from the whole sentence, and assigns each word in $q$ and $d_j$ a contextual embedding.
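A minimal NumPy sketch of the encoding step, assuming a single-layer Bi-LSTM with untrained random weights and pre-computed word embeddings. The paper does not specify dimensions, initialization or the embedding lookup, so the class names `LSTM` and `BiLSTM`, all sizes, and the choice of the final row as the definition's "last hidden state" are hypothetical. It shows that each word's contextual embedding concatenates the forward and backward hidden states at that position, and that queries and definitions get separate encoders.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTM:
    """One-direction LSTM unrolled over a sequence (forward pass only, untrained)."""

    def __init__(self, input_dim, hidden_dim, rng):
        # Stacked weights for the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def run(self, xs):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)  # memory cell carries long-distance information
        outputs = []
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)  # (n, hidden_dim)


class BiLSTM:
    def __init__(self, input_dim, hidden_dim, rng):
        self.fwd = LSTM(input_dim, hidden_dim, rng)
        self.bwd = LSTM(input_dim, hidden_dim, rng)

    def encode(self, xs):
        """Contextual embedding of word i = [forward h_i ; backward h_i]."""
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # read right-to-left, re-align to word order
        return np.concatenate([forward, backward], axis=1)  # (n, 2 * hidden_dim)


rng = np.random.default_rng(0)
emb_dim, hidden_dim = 8, 16
query_encoder = BiLSTM(emb_dim, hidden_dim, rng)       # separate networks for
definition_encoder = BiLSTM(emb_dim, hidden_dim, rng)  # queries and definitions
query_words = rng.normal(size=(5, emb_dim))       # stand-in embeddings, 5-word query
definition_words = rng.normal(size=(7, emb_dim))  # stand-in embeddings, 7-word definition
H = query_encoder.encode(query_words)            # (5, 32): h_1..h_n
D = definition_encoder.encode(definition_words)  # (7, 32)
d = D[-1]  # assumption: the "last hidden state" is taken as the final row
```

Using two independent `BiLSTM` instances mirrors the text's statement that the query and the definition are encoded by separate networks.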
Attention To better capture the correlation between a query and each Chengyu definition, we use an attention mechanism (Bahdanau et al., 2014; Sutskever et al., 2014) to measure the semantic relatedness between each word in the query sentence and the meaning of the Chengyu definition.
Given the hidden states $H = (h_1, h_2, \ldots, h_n)$ of the Bi-LSTM encoding the query sentence, where $h_i$ denotes the concatenation of the forward and backward LSTM hidden states of word $w_i$, the attention layer computes a weighted sum over the $h_i$:
$$R = \sum_{i=1}^{n} \alpha_i h_i,$$
where $R$ is the weighted-sum vector representation of the query. The attention weight $\alpha_i$ is computed as
$$\alpha_i = \frac{\exp(e_i)}{\sum_{k=1}^{n} \exp(e_k)}, \qquad e_i = d^{\top} W_\alpha h_i,$$
where $W_\alpha$ is a learnable parameter matrix that flexibly captures the relevance between a query and a definition, and $d$ is the last hidden state of the Bi-LSTM encoding the definition.
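The attention equations can be checked with a short NumPy sketch, assuming $H$ is an array with one row per query word, $d$ is the definition vector and $W_\alpha$ is a matrix of compatible shape; the function name `bilinear_attention` and the toy sizes are hypothetical. Subtracting the maximum score before exponentiating is a standard numerical-stability step that leaves $\alpha$ unchanged.

```python
import numpy as np


def bilinear_attention(H, d, W_alpha):
    """Attention-weighted sum of query states.

    H:       (n, k)   query contextual embeddings h_1..h_n
    d:       (k_d,)   definition vector
    W_alpha: (k_d, k) bilinear relevance matrix
    Returns R = sum_i alpha_i h_i and the weights alpha.
    """
    e = H @ (W_alpha.T @ d)  # e_i = d^T W_alpha h_i for every i, shape (n,)
    e = e - e.max()          # stability shift; softmax is unchanged
    alpha = np.exp(e) / np.exp(e).sum()
    R = alpha @ H            # weighted sum over the rows of H, shape (k,)
    return R, alpha


rng = np.random.default_rng(1)
H = rng.normal(size=(5, 32))
d = rng.normal(size=32)
W_alpha = rng.normal(scale=0.1, size=(32, 32))
R, alpha = bilinear_attention(H, d, W_alpha)
```

Computing `W_alpha.T @ d` once and taking a single matrix-vector product with `H` yields all $e_i$ at once instead of looping over words.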
Training With the weighted-sum representation $R$ of the query, we apply a softmax function to compute the probability of each candidate Chengyu $d_j$ being the one that fills the slot:
$$p = \mathrm{softmax}(W_\beta R),$$
where $W_\beta$ maps the final representation of the query into $\mathbb{R}^m$, and $m$ is the number of classes. We then maximize the log-likelihood $\mathcal{L} = \sum_{j=1}^{m} y_j \log p_j$, where $y_j = 1$ if the ground-truth Chengyu is $d_j$ and $y_j = 0$ otherwise.
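A sketch of the training objective for a single query, assuming NumPy and a one-hot target; `chengyu_distribution` and `log_likelihood` are hypothetical names, the toy sizes are arbitrary, and no bias term or gradient step is shown. With one-hot $y$ the sum reduces to $\log p$ of the gold Chengyu, so maximizing it is the same as minimizing the usual cross-entropy loss.

```python
import numpy as np


def chengyu_distribution(R, W_beta):
    """p = softmax(W_beta R): one probability per Chengyu class."""
    logits = W_beta @ R
    logits = logits - logits.max()  # stability shift; softmax is unchanged
    p = np.exp(logits)
    return p / p.sum()


def log_likelihood(p, gold_index):
    """L = sum_j y_j log p_j with a one-hot y marking the gold Chengyu."""
    y = np.zeros_like(p)
    y[gold_index] = 1.0
    return float(np.sum(y * np.log(p)))


rng = np.random.default_rng(2)
m, k = 6, 32  # toy sizes: 6 candidate Chengyus, 32-dimensional R
R = rng.normal(size=k)
W_beta = rng.normal(scale=0.1, size=(m, k))
p = chengyu_distribution(R, W_beta)
ll = log_likelihood(p, gold_index=2)  # training maximizes this (= minimizes -ll)
```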
Prediction For prediction, we feed the query paired with each Chengyu definition, $(q, d_j)$ for $1 \le j \le m$, into the model and obtain a probability matrix $M \in \mathbb{R}^{m \times m}$, where $m$ is the number of candidates; for example, a four-choice question has $m = 4$. The final predicted Chengyu $d_j$ is selected by $\arg\max(M[:, j])$, $1 \le j \le m$.
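The text does not spell out exactly how a single answer is read off $M$. The sketch below follows one possible interpretation, stated as an assumption: row $j$ holds the model's distribution over the $m$ candidates when the input is $(q, d_j)$, and each candidate is scored by the diagonal entry $M[j, j]$. The function name `predict_from_matrix` and the example matrix are hypothetical.

```python
import numpy as np


def predict_from_matrix(M):
    """Return the index j of the predicted candidate d_j.

    Assumption (one possible reading of the text): row j of M is the
    model's distribution over the m candidates when the input pair is
    (q, d_j), so M[j, j] measures how strongly the model picks d_j when
    shown d_j's own definition.
    """
    M = np.asarray(M, dtype=float)
    if M.ndim != 2 or M.shape[0] != M.shape[1]:
        raise ValueError("M must be an m x m matrix")
    return int(np.argmax(np.diag(M)))


# Toy four-choice example (m = 4); each row sums to 1.
M = np.array([
    [0.10, 0.30, 0.40, 0.20],
    [0.20, 0.50, 0.20, 0.10],
    [0.25, 0.25, 0.30, 0.20],
    [0.10, 0.10, 0.10, 0.70],
])
print(predict_from_matrix(M))  # 3
```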

Data and Setting
We crawled 108,987 sentences containing 7,395 unique Chengyus from http://zaojv.com, and the definitions of these Chengyus from http://cy.5156edu.com. The training and test sets contain 108,432 and 555 sentences, covering 7,071 and 508 Chengyus, respectively. We use the whole Chengyu dataset to train word embeddings. We perform two tests: (1) cloze test: for each sentence in the test set, we take out the ground-truth Chengyu and let the system select a Chengyu to fill the slot; (2) coherence checking in the college entrance exam: we collected 14 problem sets from the China college entrance exams (1998, 2000), where each problem set consists of four sentences containing Chengyus, and the system must select the sentence whose Chengyu fits its context most coherently. For comparison with humans, we asked two native Chinese speakers (not system developers) who attended top universities in China to take the same tests.

Results and Analysis
Table 3 shows that our approach achieves performance comparable to that of human experts. Of the system-recommended Chengyus that do not exactly match the ground truth, 18% are still acceptable choices for the given query contexts. For example, our system output "白驹过隙 (time passes quickly like a white pony's shadow across a crevice)" and the ground truth "光阴似箭 (time flies)" are near synonyms. Table 2 shows some correct examples and the remaining challenges, which require capabilities beyond lexical semantics.

Table 2 (columns: Type | Query | System | Ground Truth | Analysis)

Query: 这事已势不可遏，任何想阻挡他的人都如____，简直是不自量力。
This matter is unstoppable; anyone who tries to stop it is like ____, simply failing to recognize his or her own limited power.
System: 蚍蜉撼树 (an ant shaking a tree; describes someone who fails to recognize his or her own limited power)
Ground Truth: 蚍蜉撼树 (an ant shaking a tree; describes someone who fails to recognize his or her own limited power)
Analysis: The definition significantly enriches the semantic meaning of the Chengyu itself: 蚍蜉撼树 (an ant shaking a tree) is a metaphor for 自不量力 (failing to recognize one's own limited power).

Type: Attention Mechanism
Query: 刘备思贤若渴，三请诸葛亮的故事在我国可是____，人人皆知的佳话。
The story of Liu Bei, eager to recruit talent, inviting Zhuge Liang three times is ____ in our country, a tale known to everyone.
System: 家喻户晓 (well known by every family)
Ground Truth: 家喻户晓 (well known by every family)
Analysis: By incorporating the attention mechanism, our approach better captures the correlation between the query context and the Chengyu definition. It selects 家喻户晓 (well known by every family) for the slot because its definition shares semantic meaning with the query context word 知 (known).

Query: 村上春树____，29岁才写他的第一部作品。
Haruki Murakami ____; he was already 29 when he wrote his first work.
System: 画龙点睛 (bring the painted dragon to life by putting in the pupils of its eyes)
Ground Truth: 大器晚成 (it takes a long time to make a great instrument)
Analysis: The system needs to know that age 29 is relatively late for a writer to produce a first work.

System: 逍遥法外 (at large)
Ground Truth: 背井离乡 (leave one's hometown)
Analysis: Our system focused on the shared meaning of escape/leave but ignored that this Chengyu has a specific object, "the arm of the law".

Query: 多少人认为一个作家不仅能妙笔生花，也是____的。
Many people think that a writer can not only write like an angel but also ____.

Query: 你在他面前说那些话，实在是班门弄斧，不知____。
Saying those words in front of him was really like showing off your axe skills in front of Lu Ban, without knowing ____.
System: 孤陋寡闻 (with very limited knowledge and scanty information)
Ground Truth: 天高地厚 (high as heaven, deep as earth)
Analysis: Our system did not detect the negation clue 不知 (not knowing) and thus failed to select the correct antonym.

Query: 写文章先要构思好，不要下笔千言，____。
We should plan carefully before writing an article, rather than writing thousands of words ____.
System: 词不达意 (the words fail to express the meaning)
Ground Truth: 离题万里 (stray ten thousand miles from the topic)
Analysis: When multiple Chengyus appear in the same query sentence, they tend to follow the same grammatical structure.

Query: 爱是人性的美的力量，爱是爱你年少时的____，更爱你年老时白发苍苍。
Love is the beauty of humanity. To love is to love your youthful vigor like ____, as well as your gray hair in old age.
System: high-spirited and vigorous
Ground Truth: 桃之夭夭 (the peach trees in full blossom)
Analysis: Multiple Chengyus tend to appear in rhythmical form. In this example, 苍苍 (pronunciation: Cāng Cāng) and 夭夭 (Yāo Yāo) are both reduplications with similar vowel pronunciations.
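The two tests described under Data and Setting can be sketched with a few helper functions: building a cloze query by blanking out the ground-truth Chengyu, computing cloze-test accuracy, and choosing the most coherent sentence in a coherence-checking problem set. All function names are hypothetical, and `score` stands in for whatever model score is used to rate a sentence; it is an assumption, not part of the described system.

```python
def make_cloze_query(sentence, chengyu, blank="____"):
    """Cloze test: take the ground-truth Chengyu out of a sentence, leaving a slot."""
    if chengyu not in sentence:
        raise ValueError("Chengyu not found in sentence")
    return sentence.replace(chengyu, blank, 1)


def accuracy(predictions, gold):
    """Fraction of queries whose predicted Chengyu equals the ground truth."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equally long, non-empty lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def pick_coherent_sentence(sentences, score):
    """Coherence checking: index of the sentence whose Chengyu fits best.

    `score` is a hypothetical callable (e.g. a model probability)."""
    return max(range(len(sentences)), key=lambda i: score(sentences[i]))


print(make_cloze_query("村上春树大器晚成，29岁才写他的第一部作品。", "大器晚成"))
# 村上春树____，29岁才写他的第一部作品。
print(accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 0.75
```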

Conclusions and Future Work
We created a new benchmark dataset for a new task, the Chengyu cloze test. We also proposed a neural model that leverages Chengyu definitions as background knowledge and outperforms human experts. In the future, we will explore collective inference to rank multiple Chengyus in the same discourse simultaneously, and incorporate richer linguistic clues based on structure and rhythm.