Bilingual Keyword Extraction and its Educational Application

We introduce a method that extracts keywords in one language with the help of another. The method involves estimating preferences for topical keywords and fusing language-specific word statistics. At run-time, we transform parallel articles into word graphs, build cross-lingual edges to integrate word statistics, and run PageRank with word keyness information to extract keywords. We apply our method to keyword analysis and language learning. Evaluation shows that keyword extraction benefits from cross-language information, and that language learners benefit from our keywords in a reading comprehension test.


Introduction
Keyword extraction algorithms (KEA) have been developed to extract keywords for content understanding, event tracking, or opinion mining. However, most of them calculate article-level word keyness in a single language. An article's counterpart in another language may suggest different keyword candidates, since languages differ in grammar, phrase structure, and word usage, all of which affect word statistics and thus keyword analysis.
Consider the English article in Figure 1. A monolingual KEA, based solely on the English content, may not identify the best keyword set. A better set might be obtained by consulting the article in more than one language (e.g., its Chinese counterpart), since language divergence in phrase structure (i.e., word order), word usage, and word repetition (resulting from word translation or word sense) leads to different views on keywords across languages. Examples of English-Chinese divergence in Figure 1 include the word order in the phrase social reintegration and 重返社會 (social translated to 社會 and reintegration, in inverted order, to 重返); many-to-one translation, e.g., both prosthesis and artificial limbs translated to 義肢; and one-to-many translation, e.g., physical translated to 物理 and 身體 respectively in the contexts physical therapist and physical rehabilitation. We hypothesize that, given such differences between languages, language-specific word statistics can be fused to contribute to keyword analysis. We present a system, BiKEA, that learns to identify keywords in one language with the help of another. The cross-language information is expected to reinforce language similarities, respect language dissimilarities, and lead to a better understanding of articles in terms of keywords. An example keyword analysis of an English article is shown in Figure 1: BiKEA has aligned the parallel articles at the word level and determined topical keyword preference scores for words.
BiKEA learns these topic-related scores during training by analyzing a collection of articles.
At run-time, BiKEA transforms an article in one language into a PageRank word graph. To hear the other side of the story, BiKEA also constructs a word graph from the article's counterpart in the other language. The two graphs are then bridged over bilingually equivalent nodes. The bridging takes language divergence into account and allows language-wise interaction over word statistics. Finally, BiKEA iterates in this bilingual context with word keyness scores to find keywords.
The body of KEA research focuses on learning word statistics over a document collection. Approaches such as tfidf and entropy, using local document and/or cross-document information, pose strong baselines (Liu et al., 2009; Gebre et al., 2013). On the other hand, Mihalcea and Tarau (2004) apply PageRank, connecting words locally, to extract essential words. In our work, we integrate globally learned keyword preferences into PageRank to identify keywords.
Recent work has incorporated semantics into PageRank. For example, one line of work constructs a PageRank synonym graph to accommodate words with similar meanings, and Huang and Ku (2013) weight PageRank edges based on nodes' degrees of reference. In contrast, we bridge PageRank word graphs built from parallel articles to facilitate re-distribution, or interaction, of the word statistics of the involved languages.
In studies more closely related to ours, Zhao et al. (2011), among others, present PageRank algorithms that leverage article topic information for keyword identification. The main differences from our work are that the article topics we exploit are specified by humans rather than by automated systems, and that our PageRank graphs are built and connected bilingually.
In contrast to previous research on topic modeling (e.g., Zhao and Xing (2007)) and keyword extraction, we present a keyword extraction algorithm that learns topical keyword preferences and bilingually inter-connects PageRank graphs. The bilinguality helps predict better keywords by taking into account the perspectives of the languages involved, including their similarities and dissimilarities. We also use our keywords for educational purposes such as reading comprehension.

Problem Statement
We focus on identifying keywords of a given article in one language with the help of another. Keyword candidates are returned as the output of the system. The returned keyword list can be examined by humans (e.g., for keyword evaluation or language learning), or passed on to article recommendation systems for article retrieval. Therefore, our goal is to return a reasonably sized set of keyword candidates that contains the given article's essential terms. We now formally describe the problem that we are addressing.
Problem Statement: We are given a bilingual parallel article collection covering various topics from social media (e.g., TED), an article ART_e in language e, and its counterpart ART_c in language c. Our goal is to determine a set of words that are likely to contain the important words of ART_e. For this, we take into account word keyness with respect to ART_e's topic and bridge the language-specific statistics of ART_e and ART_c via bilingual information (e.g., word alignments), such that cross-lingual diversities are valued in extracting keywords in e.

Figure 1. The English article: "I've been in Afghanistan for 21 years. I work for the Red Cross and I'm a physical therapist. My job is to make arms and legs -- well it's not completely true. We do more than that. We provide the patients, the Afghan disabled, first with the physical rehabilitation then with the social reintegration. It's a very logical plan, but it was not always like this. For many years, we were just providing them with artificial limbs. It took quite many years for …" English keywords from bilingual perspectives: prosthesis, artificial, leg, rehabilitation, orthopedic, …

Topical Keyword Preferences
We attempt to estimate language-wise keyword preferences with respect to a wide range of article topics. Basically, the estimation calculates word significance within a domain topic. Our learning process has the following four stages.
In the first two stages of the learning process, we generate two sets of article and word information. The input to these stages is a set of articles and their domain topics. The output is a set of (article ID, word) pairs, e.g., (ID_ART_e = 1, w_e = prosthesis) in language e or (ID_ART_c = 1, w_c = 義肢) in language c, and a set of (article topic, word) pairs, e.g., (tp = disability, w_e = prosthesis) in e and (tp = disability, w_c = 義肢) in c. Note that the topic information is shared across languages and that, to respect language diversities, words' topical significance is calculated within their specific language; these language-specific word statistics will later be fused and interact at run-time.
The third stage estimates keyword preferences for words across articles and domain topics using the aforementioned (ART, w) and (tp, w) sets. In this paper, a simple yet effective tfidf estimation is used: tfidf(w) = freq(ART, w) / appr(ART', w), where a word's term frequency in an article is divided by the number of articles ART' in the collection in which it appears, to distinguish important words from common words.
tfidf takes global information (i.e., the article collection) into account, and serves as the keyword preference model in PageRank at run-time, which connects words locally (i.e., within articles).
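The preference estimation described above can be sketched as follows. This is a minimal illustration assuming articles are already tokenized; the function and variable names are our own, not the paper's.

```python
from collections import Counter

def keyword_preferences(articles):
    """Estimate tfidf-style keyword preference scores per article.

    articles: list of token lists. Returns one dict per article mapping
    word -> tfidf(w) = freq(ART, w) / appr(ART', w), where appr counts
    the articles in which w appears.
    """
    appr = Counter()                      # across-article appearance counts
    for tokens in articles:
        appr.update(set(tokens))
    prefs = []
    for tokens in articles:
        freq = Counter(tokens)            # term frequency within the article
        prefs.append({w: freq[w] / appr[w] for w in freq})
    return prefs
```

Words frequent in one article but rare across the collection receive high scores, matching the intent of separating important words from common ones.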

Run-Time Keyword Extraction
Once the language-specific keyword preference scores for words are learned, they are stored for run-time reference. BiKEA then uses the procedure in Figure 2 to fuse word statistics across languages and determine the keyword list for a given article. In this procedure, a machine translation technique, namely an IBM word aligner, is exploited to glue together the statistics of the involved languages and make a bilingually motivated random-walk algorithm (i.e., PageRank) possible.

In Steps (1) and (2) of Figure 2, we construct PageRank word graphs for the article ART_e in language e and its counterpart ART_c in language c. They are built independently, using the procedure in Figure 3, to respect language properties (such as subject-verb-object or subject-object-verb structure). In the algorithm of Figure 3, EW stores normalized edge weights for words w_i and w_j (Step (2)); EW is a v-by-v matrix, where v is the vocabulary size of ART_e and ART_c. Note that the graph is directed (from words to the words that follow them) and that edge weights are word co-occurrence counts within a window of size WS. Additionally, we apply an edge weight multiplier m > 1 to propagate more PageRank score to content words. The procedure of Figure 2 then linearly combines the word graphs EW_e and EW_c using α. We use α to balance language properties/statistics, and BiKEA backs off to monolingual KEA if α is one.
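The graph construction of Figure 3 can be sketched roughly as follows, assuming a tokenized article. The window size, content-word test, and multiplier value here are illustrative placeholders, not the paper's settings.

```python
from collections import defaultdict

def build_word_graph(tokens, window=3, is_content=None, multiplier=1.5):
    """Directed word graph: an edge w_i -> w_j for each w_j following w_i
    within the window. Weights count co-occurrences, boosted by a
    multiplier m > 1 for content-word targets, then normalized per source."""
    if is_content is None:
        is_content = lambda w: True       # placeholder content-word test
    ew = defaultdict(float)
    for i, wi in enumerate(tokens):
        for wj in tokens[i + 1 : i + 1 + window]:
            ew[(wi, wj)] += multiplier if is_content(wj) else 1.0
    # Normalize outgoing edge weights so each source's weights sum to 1.
    out = defaultdict(float)
    for (wi, _), w in ew.items():
        out[wi] += w
    return {(wi, wj): w / out[wi] for (wi, wj), w in ew.items()}
```

A sparse edge dictionary stands in for the v-by-v matrix EW; the directedness (from each word to the words that follow it) mirrors the description above.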
In Step (4), for each word alignment (w_i^c, w_j^e), we construct a link between the two word nodes with weight BiWeight. The inter-language link is expected to reinforce language similarities and respect language divergence, while the weight facilitates cross-language statistics interaction. Word alignments WA are derived using IBM models 1-5 (Och and Ney, 2003). Based on the directional word-aligned entry (w_i^c, w_j^e), the inter-language link is directed from w_i^c to w_j^e, i.e., from language c to language e. The fusion, or bridging, of PageRank graphs across languages is expected to help keyword extraction in language e with the statistics of language c. Although alternative approaches could be used for bridging, ours is intuitive and, most importantly, in compliance with the directional spirit of PageRank.
Step (6) sets the keyword preference model KP using the topical preference scores from Section 3.2, while Step (7) initializes KN, the vector of PageRank scores or, in our case, word keyness scores. We then distribute keyness scores until KN converges. In each iteration, a word's keyness score is the linear combination of its keyword preference score and the sum of the propagated previous PageRank scores of its inbound words. For the word w_j^e in ART_e, any edge (w_i^e, w_j^e) in ART_e, and any edge (w_k^c, w_j^e) in WA, its new PageRank score at iteration t is computed as

KN_t(w_j^e) = λ · [ α · Σ_{(w_i^e, w_j^e) ∈ ART_e} EW_e(w_i^e, w_j^e) · KN_{t−1}(w_i^e) + (1 − α) · Σ_{(w_k^c, w_j^e) ∈ WA} BiWeight(w_k^c, w_j^e) · KN_{t−1}(w_k^c) ] + (1 − λ) · KP(w_j^e)

Once the iterative process stops, we rank words according to their final keyness scores and return the N top-ranked words in language e as keyword candidates for the given article ART_e.
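The iterative update above can be sketched as follows, with graphs represented as sparse edge-weight dictionaries. The function name, parameter defaults, and fixed iteration count (in place of a convergence test) are our own simplifications.

```python
def bikea_pagerank(ew_e, ew_c, wa, kp, lam=0.85, alpha=0.6, iters=50):
    """Bilingual PageRank with a keyword preference prior.

    ew_e, ew_c: monolingual edge weights {(src, dst): weight}.
    wa: alignment links {(w_c, w_e): BiWeight}, directed from c to e.
    kp: keyword preference scores {word: score}.
    """
    nodes = {n for edge in list(ew_e) + list(ew_c) + list(wa) for n in edge}
    kn = {n: 1.0 / len(nodes) for n in nodes}   # uniform initialization
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Propagation from monolingual in-neighbors.
            mono = sum(w * kn[src] for (src, dst), w in
                       list(ew_e.items()) + list(ew_c.items()) if dst == n)
            # Propagation over inter-language alignment links.
            cross = sum(w * kn[src] for (src, dst), w in wa.items()
                        if dst == n)
            new[n] = lam * (alpha * mono + (1 - alpha) * cross) \
                + (1 - lam) * kp.get(n, 0.0)
        kn = new
    return kn
```

With alpha set to one the cross-language term vanishes, matching the back-off to monolingual KEA noted earlier.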

Data Sets
We collected 3.8M-word English transcripts along with their Chinese counterparts from TED for our experiments. GENIA tagger (Tsuruoka and Tsujii, 2005) was used to lemmatize and part-of-speech tag the English transcripts while CKIP (Ma and Chen, 2003) was used to segment the Chinese.
Fifty parallel articles (approximately 2,500 words per article) were randomly chosen and manually annotated with English keywords for keyword analysis. Table 1 summarizes the keyword extraction results of the baseline tfidf and our best systems on the test set. The evaluation metrics are precision, mean reciprocal rank, and nDCG (Jarvelin and Kekalainen, 2002).
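For reference, the three metrics can be computed for a single ranked keyword list as follows. Binary relevance against the gold keyword set and a cutoff k are assumed here; the exact evaluation settings of the experiments may differ.

```python
import math

def evaluate_ranking(ranked, gold, k=10):
    """Precision@k, reciprocal rank, and nDCG@k for one ranked keyword
    list against a gold keyword set (binary relevance)."""
    hits = [1 if w in gold else 0 for w in ranked[:k]]
    precision = sum(hits) / k
    # Reciprocal rank of the first gold keyword in the list.
    rr = next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(gold), k)))
    return precision, rr, (dcg / ideal if ideal else 0.0)
```

Averaging the per-article scores (and reciprocal ranks, for MRR) over the test set yields collection-level figures.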

Evaluation on Keywords
As we can see, monolingual PageRank (PR) and bilingual PageRank (BiKEA), both using the global information tfidf, outperform tfidf alone. They relatively boost nDCG by 21% and P by 55%. The MRR scores also indicate their superiority: their top-two candidates are often keywords, versus the 2nd-ranked candidate from tfidf. Encouragingly, BiKEA+tfidf achieves better performance than the strong monolingual PR+tfidf, further improving nDCG by a relative 7.4% and MRR by a relative 9.4%.
Overall, topical keyword preferences and inter-language bridging in PageRank, which values language properties/statistics, help keyword extraction.

Application to Language Learning
The role of highlighting keywords in reading comprehension has been attracting interest in the fields of language learning and educational psychology (Nist and Hogrebe, 1987; Peterson, 1991; Silvers and Kreiner, 1997). In this paper, we further examine keywords in the context of computer-assisted language learning. Specifically, we applied our automatic BiKEA to keyword highlighting in reading comprehension to see how much language learners can benefit from BiKEA keywords in a reading comprehension test. In our case study, we asked an English professor to set a multiple-choice reading comprehension test based on one English TED transcript (see Figure 4) and recruited 26 second-year college students learning English as a second language. Their proficiency in English was estimated to be at the pre-intermediate level.
These students were randomly and evenly divided into an experimental group (reading the English transcript with BiKEA keywords highlighted) and a control group (reading without). Promisingly, our keywords helped the students: those in the experimental group achieved a better average test score (.82) than those in the control group (.74), a relative improvement of 10%. Moreover, a post-study survey indicated that 90% of the participants found our keywords helpful for reading the article and grasping its key concepts. We are analyzing the influence of the highlighted BiKEA keywords on both the high-performing and the low-performing students in the test.

Summary
We have introduced a method for extracting keywords in a bilingual context. The method involves automatically estimating topical keyword preferences and bridging language-specific PageRank word statistics. Evaluation shows that the method yields better keywords than a strong monolingual KEA, and a case study indicates that language learners benefit from our keywords in a reading comprehension test. Admittedly, using our keywords for educational purposes requires further experiments.