Evaluative Pattern Extraction for Automated Text Generation

Travel tips from experienced bloggers and online forums have become an important supplement to the travel guidebook in the web society. In this paper we present a novel approach that identifies and extracts evaluative patterns, providing a linguistically motivated framework for automated evaluative text generation. We target domain-specific observations in Chinese online travel blogs. The results suggest that the semantic prosody accompanying the patterns reflects a preference among online travel bloggers for tacit pragmatic strategies when presenting the sentiment polarity of their comments. The extracted patterns and their differentiation can help identify and characterize evaluative language for automated opinion summarization, as well as for macro/micro planning in natural language generation (NLG).


Introduction
With the rapidly growing use of the Internet, text mining, sentiment analysis, and the analysis of evaluative language in online resources are becoming essential issues. Online travel blogs are a main source of opinions and comments: their authors share travel experiences in texts built around their own evaluations of the trip. Automated text planning in this domain is therefore in high demand. This paper proposes a linguistic framework for working with evaluative expressions by examining the domain-restricted, specialized discourse of travel articles. Identifying the particular linguistic behaviors and the agglomerative structure of evaluative-language patterns would facilitate both macro and micro planning in NLG in this domain. In online travel blog articles, evaluative language takes several forms. Lexical-level terms such as 'recommend', 'delicious', and 'surprise' are explicit evaluations. Beyond these, recurring patterns can be found and generalized into fixed meanings in the travel domain. For instance, 有 N 味 'has the flavor/feeling of N' is a common pattern, as in 有 家鄉 味 'has the feeling of home' and 有 台灣 味 'has the feeling of Taiwan', both used as positive evaluations in the data. We propose to adopt pattern grammar (Hunston, 1999) to approach the evaluative prosody that occurs widely in travel blogs. Pattern grammar rests on the idea that meaning belongs to patterns, and it targets recurring co-occurrences and the meanings shared by lexical item nodes. A specialized domain can have its own grammar that does not apply in general usage, which gives patterns a fixed meaning in that domain. As Sinclair (1991) put it, "It seems that there is a strong tendency for sense and syntax to be associated", suggesting that meaning and pattern are closely related.
Francis (1993) used the pattern v it adj as an example: it restricts the lexical items that can fill the verb and adjective slots, indicating that the meaning of a pattern is likewise restricted and that patterns co-occur with words under semantic restriction. Therefore, patterns extracted from texts should be a primary object of observation in natural language processing, particularly in semantic and sentiment analysis, whether for annotation, summarization, or text generation.

Literature Review
In NLG, content determination is the essential process of deciding what information a text should communicate (Reiter, 1995). To generate natural-language text, a system must be able to determine what to include and how to organize the information so as to achieve its communicative goal most effectively. McKeown (1985) used discourse strategies as a guide for natural-language text generation, producing paragraph-length responses. In domain-specific texts such as weather forecasts (Adeyanju, 2012), similar weather conditions are expected to be described in similar language, so the language patterns are observable. In travel blog articles, evaluative language is the dominant feature. Evaluative language has been studied since the 1970s, starting with Halliday (1976), with further developments and new approaches from Chafe (1986), Biber and Finegan (1989), Hunston (1994), Francis (1995), and Martin and White (2000), among others. Hunston (1994, 2000, 2004) defined evaluative language as that which is "expressed through language which indexes the act of evaluation or the act of stance-taking. It expresses an attitude towards a person, situation, or other entity and is both subjective and located within a societal value system". It is the driving force behind virtually all communication (Thompson and Hunston, 2000). The patterns of a word are defined as "all the words and structures which are regularly associated with the word and which contribute to its meaning". Patterns and lexis are mutually dependent, in that each pattern appears with a limited set of lexical items, and each lexical item occurs with a restricted set of patterns. As patterns are closely associated with meaning, words sharing a given pattern will also tend to share an aspect of meaning (Hunston, 1999).
Combining the concepts of evaluative language and pattern grammar, we can discover how evaluation with fixed meanings is spread across texts. The need to examine evaluative language is clear: online travel blog articles serve to share comments and opinions with readers, and we want to find out whether certain structures or patterns in the texts can be used to generate opinion summaries.

Patterns and Evaluative Meanings in Content Determination
The categorization of evaluative language varies with the research purpose. To fit the communicative goal of the travel context, where recommendation rather than neutral description is needed, the following attributes are targeted: attraction, hotel, restaurant, food, and event. For each of these targets, evaluative expressions are realized in different aspects. For instance, the main evaluated aspects of an attraction are its environment, transportation, popularity, culture, and so on, while for food the main issues discussed are price, taste, quality, and quantity. In this study, data are crawled from the ten online travel blogs nominated as the most popular in the GOLDDOT Award 2015, held by Pixnet in Taiwan, with 540 articles in total. A corpus-based approach is taken to explore the data and extract the patterns. Because evaluative patterns are embedded within sentences and vary in unit size, there is no straightforward way to observe them in the corpus. Annotation is based on the attributes mentioned above for categorization, using LOPOTATOR, an online linguistic annotation tool designed by the LOPE lab. A single annotator carried out the annotation. Chunks are taken as the units for pattern detection. They are mostly restricted to phrasal units that include both the evaluator and the evaluation, so that the relationship between the property of the evaluated entity and the evaluative expression is captured. For instance, a chunk like 值得一探的美景 'a beautiful view that is worth visiting' is annotated with the evaluator 美景 'beautiful view' and its expression 值得一探的 'something which is worth visiting'. The processing pipeline is shown in Figure 1.
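As an illustration only, an annotated chunk of the kind described above could be stored as a small record. The class name, field names, and validation rules below are assumptions made for this sketch and are not the LOPOTATOR data format; the assignment of the example chunk to the attraction attribute is likewise assumed.

```python
from dataclasses import dataclass

# Target attributes named in the text.
ATTRIBUTES = {"attraction", "hotel", "restaurant", "food", "event"}


@dataclass(frozen=True)
class AnnotatedChunk:
    """One annotated phrasal chunk (hypothetical record layout)."""

    text: str        # the full chunk as it appears in the blog article
    evaluator: str   # the evaluated entity, in the paper's terminology
    expression: str  # the evaluative expression attached to it
    attribute: str   # one of ATTRIBUTES

    def __post_init__(self):
        # Reject attributes outside the five targeted categories.
        if self.attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {self.attribute}")
        # Both parts must be recoverable from the chunk itself.
        if self.evaluator not in self.text or self.expression not in self.text:
            raise ValueError("evaluator and expression must occur in the chunk")


# The example chunk from the text; "attraction" is an assumed category.
chunk = AnnotatedChunk(
    text="值得一探的美景",
    evaluator="美景",
    expression="值得一探的",
    attribute="attraction",
)
print(chunk.expression, "->", chunk.evaluator)
```

Keeping the evaluator and the expression as separate fields preserves the relationship between the evaluated entity and its evaluation, which is what the later generalization into patterns relies on.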

Data Annotation and Analysis
Unlike previous linguistic formalisms used in document structuring (such as Rhetorical Structure Theory), whose main focus is the hierarchical construction of messages, the evaluative pattern grammar proposed in this paper explores the linear interaction of lexis and configuration at the evaluative level. In our corpus, lexical items carrying explicitly observable evaluation, such as 大 'big', 新 'new', 好 'good', 分享 'share', 推薦 'recommend', 喜歡 'like', and 享受 'enjoy', occur frequently. Our primary aim here is to extract the fixed patterns that denote a fossilized evaluative polarity while co-occurring with a variety of word choices.
Manual annotation for pattern extraction in online travel blog articles provides an exhaustive record of the possible evaluative uses. Across all annotated units, expressions with similar meanings and structures can be generalized into patterns with a fixed basic meaning; these patterns appear neutral in isolation but denote a polarity when used in context. Table 2 summarizes the patterns by aspect, with the symbols '+' and '-' marking whether the polarity a pattern implies is positive or negative. Due to page limits, only a few patterns are listed as examples. Whenever a pattern occurs, it contributes a value that merges with the meaning of its variable noun, verb, or adjective. Take 非常有 N 味 'so full of N's flavor or feeling' as an example. Here the pattern comments on the food evaluator as being 'full of the flavor or feeling' of the noun phrase. The evaluation remains implicitly neutral until the noun phrase is filled in, as in 非常有 家鄉 味, which is realized as 'full of home's feeling; the food makes you feel or think of home' and thereby gains a positive evaluation.
Table 2: Evaluative patterns and data instances.
The patterns in Table 2 are case-specific to the travel domain, and they can be taken as self-embedded carriers of evaluative meaning. They are useful cues in content determination, in that a pattern can simply be a comment unit showing a positive or negative evaluation of the evaluated target.
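The way a pattern such as 有 N 味 stays neutral until its noun slot is filled can be sketched as follows. This is a minimal illustration, not the extraction procedure used in the study. It assumes the input has already been word-segmented into tokens, and the function name, the toy polarity lexicon, and the "0" label for an unresolved filler are all hypothetical. The two fillers in the lexicon are the ones reported as positive in the data.

```python
from typing import List, Optional, Tuple

# Fillers reported as positive in the data for 有 N 味; a toy lexicon only.
FILLER_POLARITY = {"家鄉": "+", "台灣": "+"}


def match_you_n_wei(tokens: List[str]) -> Optional[Tuple[str, str]]:
    """Return (filler, polarity) for the first 有 N 味 span, or None."""
    for i in range(len(tokens) - 2):
        if tokens[i] == "有" and tokens[i + 2] == "味":
            filler = tokens[i + 1]
            # The pattern stays neutral ("0") unless a known filler resolves it.
            return filler, FILLER_POLARITY.get(filler, "0")
    return None


print(match_you_n_wei(["非常", "有", "家鄉", "味"]))
print(match_you_n_wei(["有", "台灣", "味"]))
print(match_you_n_wei(["很", "好吃"]))
```

An optional intensifier such as 非常 before 有 does not affect the match, since the search only looks for the three-token core of the pattern.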

Figure 2: User interface snapshots of travel recommendation search and search results.
Figure 2 shows a provisional user-interface template in which users can search for travel comments or opinions; the comments can be either the evaluative patterns generated by our work or the original sentences from the author. Comments and scores from several authors are useful when searching for a single, specific target, such as Taipei 101 or W Hotel. More commonly, however, people want to see all available comments on a broader target, such as recommendations for traveling in Tokyo, covering everything that might be experienced there. Therefore, we create a simplified plan (exemplified in an English version) as in Figure 3 for generating an evaluative summary from a single author's travel article. Parenthesized units such as '(name of the author)' in Figure 3 are information to be extracted from the article, including the author's name and the places or things the author experienced, together with comments. Evaluators are comment units extracted by our pattern generation work. Both kinds of comment are informative as generation results.
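A slot-filling plan of this kind can be sketched with plain string templates. The template wording, the function name, and the sample values below are assumptions made for illustration; they do not reproduce the plan in Figure 3.

```python
from typing import Dict, List

# Hypothetical plan wording; the actual plan in Figure 3 is not reproduced here.
OPENING = "{author} travelled to {place}."
ITEM = "{target}: {comment}."


def summarize(author: str, place: str, items: List[Dict[str, str]]) -> str:
    """Fill the opening slot once, then one item slot per evaluated target."""
    parts = [OPENING.format(author=author, place=place)]
    for item in items:
        parts.append(ITEM.format(target=item["target"], comment=item["comment"]))
    return " ".join(parts)


# Illustrative values only; the comment reuses the gloss of 有 台灣 味.
summary = summarize(
    author="A blogger",
    place="Taipei",
    items=[{"target": "Taipei 101", "comment": "has the feeling of Taiwan"}],
)
print(summary)
```

Each item pairs an evaluated target with one comment unit, so the same function covers both a single-target query and a summary that lists everything one author experienced.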
In short, the identification of evaluative patterns in texts, as inspired by usage-based linguistic pattern grammar theory, can be utilized as a key feature for domain-specialized research on opinion mining and generation in evaluative texts.

Conclusion and Future Work
For socio-pragmatic reasons, the evaluative patterns found in online travel blogs have their own characteristics and therefore call for more attention. On the one hand, the recurrent linguistic means of evaluation in texts of this genre mostly lie beyond the word level; on the other hand, bloggers often tacitly organize their discourse of feelings or assessments in a relatively polite manner. This poses a challenge for content selection and text planning, and a richer linguistic framework is needed to tailor the data properly for potential users. The approach proposed in this paper can handle the affective content that is crucial in opinionated text mining and generation, but it has limitations, mainly related to the annotation process. Manual annotation can achieve higher accuracy in extracting possible patterns; however, relying on a single annotator makes the annotation subjective, time-consuming, and inefficient. As there are few studies of evaluative language in the online travel blog domain, this paper serves as a point of departure for discovering evaluative patterns and as a reference for probing other domain-specific evaluative language. Pattern extraction can be applied to other domains, and the annotated data can be used for automatic pattern extraction algorithms and for text summarization during document planning in NLG. For text generation, a pattern is a significant feature, as it represents the sentiment or polarity of the evaluation. Automated pattern extraction will be a valuable step toward generating evaluative text summaries.