Using lexical level information in discourse structures for Basque sentiment analysis

Systems for opinion and sentiment analysis rely on different resources: a lexicon, annotated corpora and constraints (morphological, syntactic or discursive), depending on the nature of the language or text type. In this respect, Basque is a language with fewer linguistic resources and tools than other languages, like English or Spanish. The aim of this work is to study whether some kinds of discourse structures based on nuclearity are sufficient to correctly assign positive and negative polarity with a lexicon-based approach for sentiment analysis. The evaluation is performed in two phases: i) Text extraction following some constraints on discourse structure from manually annotated trees. ii) Automatic annotation of semantic orientation (or polarity). Results show that the method is useful to detect all positive cases, but fails with the negative ones. An error analysis shows that negative cases have to be addressed in a different way. The immediate results of this work include an evaluation on how discourse structure can be exploited in Basque. In the future, we will also publish a manually created Basque dictionary to use in sentiment analysis tasks.


Introduction
Sentiment analysis is "the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (Liu, 2012, p. 7).
Automatic sentiment analysis is an area in continuous development. It first started with the identification of subjectivity (Wiebe, 2000) and, after that, polarity identification and measurement of strength have become the center of new developments (Turney, 2002). The objectives of sentiment analysis are evolving as well, as different types of information are used. For instance, initially, entity- and aspect-based information was used (Hu and Liu, 2004) but, later, new types of information, such as discourse structure information, have been used (Polanyi and Zaenen, 2006). 1 This study is the first work that examines lexical and discourse structure information for sentiment analysis of Basque. The main aim is to evaluate which discourse structures can help in polarity detection following a lexicon-based approach. Our hypothesis is that some discourse structures are more related to opinions than others and we want to identify and study how they can help in a sentiment analysis task.
The paper is organized as follows: Section 2 discusses related work. Section 3 explains the methodology of the study and Section 4 presents the results and error analysis. Finally, conclusions and future work are given in Section 5.

Related Work
Various studies from different theoretical approaches analyze the influence of nuclearity and some rhetorical relations in sentiment analysis tasks. For example, Zhou et al. (2011) use discursive information in Chinese to eliminate noise at the intrasentence level, improving not only polarity classification but also the labeling of rhetorical relations at sentence level. Wu and Qiu (2012) study sentiment analysis based on Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) in Chinese texts. They split texts into segments and then train weights taking into account relations and nuclearity, showing that CONTRAST, CAUSE, CONDITION and GENERALIZATION have a more important role in this task than other discourse relations. Bhatia et al. (2015) use a simpler classification of relations into CONTRAST or NON-CONTRAST, and they show that the distinction improves the results of bag-of-words classifiers using Rhetorical Recursive Neural Networks. Chardon et al. (2013) rate documents using three approaches: i) bag-of-words, ii) partial discourse information and iii) full discourse information. The discursive approach gives the best result in the framework of Segmented Discursive Representation Theory (SDRT). Trnavac et al. (2016) propose that a few rhetorical relations have a significant effect on polarity: CONCESSION, CONTRAST, EVALUATION and RESULT. They also conclude that nuclei tend to contain more evaluative words than satellites. Alkorta et al. (2015) analyze which features perform better in order to detect the polarity of texts using machine learning techniques on Basque texts. Their results show that discourse structure is needed to improve results along with other types of features. They use a dictionary created by automatic means with an unsupervised method (Vicente et al., 2017). The dictionary values of their work are binary (−1 for negative polarity and +1 for a positive one).
In this work, we analyze which coherence relations could help to improve lexicon-based sentiment analysis, so that we can assign different weights to discourse structures, following Bhatia et al. (2015), when calculating the sentiment of a whole text. For this task, we use the RST framework.
The main contributions of this work are: i) A fine-grained dictionary, manually created for Basque with 5 different negative values and 5 different positive ones, ranging from −5 to +5. ii) A study of how discourse structure interacts with this polarity lexicon.

1 See a detailed review of sentiment analysis in Taboada (2016).

Proceedings of the 6th Workshop Recent Advances in RST and Related Formalisms, pages 39−47, Santiago de Compostela, Spain, September 4 2017. © 2017 Association for Computational Linguistics

Methodology
The subsections below detail the main steps followed in the present study.

Extraction of discourse structures
In the first phase, different discourse structures were compared in order to determine which ones can be helpful in sentiment analysis. To extract as many discourse structures as possible, we used the corpus described in Alkorta et al. (2016), annotated for discourse relations according to RST.
The corpus contains 29 book reviews. Regarding polarity, it is a balanced corpus, with 14 positive reviews and 15 negative ones. The majority of reviews were collected from a website specialized in Basque literary reviews (Kritiken Hemeroteka).
The following subcorpora were created, following some discourse constraints:
− Full text, containing the whole RS-tree of the text.
− Text spans extracted from the central unit (CU) of the text and from the central subconstituent (CS) of some rhetorical relations (see Table 1).
We extracted 139 instances of rhetorical relations from our corpus. For some relations, such as ELABORATION and PREPARATION (66 of 139), we do not expect them to contain important polarity information, because these relations only add extra information to the central unit. In fact, Mann and Thompson (1988, p. 273) mention that in the case of ELABORATION "R(eader) recognized the situation presented in S(atellite) as providing additional detail for N(uclei). R(eader) identifies the element of subject matter for which detail is provided". Similarly, in PREPARATION "R(eader) is more ready, interested or oriented for reading N(uclei)". We did not take into account relations with low frequency (a single instance), such as MOTIVATION, JUSTIFICATION, ANTITHESIS and PURPOSE. Consequently, we will work with a subcorpus containing 69 relations, where almost half of them are central subconstituents of EVALUATION. 5

Polarity extraction and evaluation
Polarity was extracted from all the discourse structures using a dictionary (v1.0) of words annotated with their semantic orientation: polarity (positive or negative) and strength (from 1 to 5). To do so, the Spanish SO-CAL dictionary (Taboada et al., 2011) was translated using the Elhuyar (Zerbitzuak, 2013) and Zehazki (Sarasola, 2005) bilingual Spanish-Basque dictionaries. Our dictionary contains information about grammatical categories: nouns, adjectives, verbs and adverbs. As Table (2) shows, the dictionary contains a total of 8,353 words. The majority of words are nouns and adjectives. In terms of polarity, there are more negative words (almost one thousand more).

5 All the reviews of the corpus were coded, assigning the domain LIB (for literature review) and a number, and each discourse structure extracted from them was also coded: CU stands for text that only contains the central unit of the text, CAUS for texts that contain CAUSE relation, INT for INTERPRETATION, ELAB for ELABORATION, CIR for CIRCUMSTANCE, BACK for BACKGROUND and finally, EVA for EVALUATION. In addition, if the same relation appears more than once in each text, we added letters (e.g., a, b, c) to each relation, to indicate their order of appearance.
We created a polarity tagger based on this dictionary. The tagger uses the output of Eustagger (Aduriz et al., 2003), a robust, wide-coverage morphological analyzer and Part-of-Speech (POS) tagger for Basque, to enrich the text with POS information, and it assigns polarity to every word of the text whose lemma and category match an entry of the dictionary. To evaluate the results of the system, a linguist annotated the polarity (positive, negative or neutral) of all the discourse structures described in Section (3.1). Figure 1 shows a portion of the RST tree of one text (LIB28). After the full RST analysis was performed for each text, we extracted the following discourse structures: i) the text of the central unit (EDU 2), as shown in Example (1), and ii) the central subconstituent of the EVALUATION (EDUs 21, 22, 23 and 25), in Example (2).
(1) XIX. mendean Gasteiz inguruak izutu (−3) zituen Juan Diaz de Garaio Sacamantecas pertsonaia hartu (+2) du Aitor Aranak (Legazpi, 1963) bere azken eleberrian (+2). (LIB28 CU)
English: Aitor Arana (Legazpi, 1963) has taken (+2) in his last novel (+2) the character Juan Diaz Garaio Sacamantecas who scared (−3) the surroundings of Gasteiz in the 19th century.
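The lookup step described above can be sketched as follows. This is a minimal illustration, assuming the tagger sums the semantic orientation of every (lemma, category) pair found in the dictionary. The three polar lemmas and their values come from Example (1); the category labels, the `LEXICON` structure and the function names are hypothetical, and the real output format of Eustagger is not reproduced here.

```python
# Toy lexicon keyed by (lemma, category); the values are those shown in Example (1).
# The category labels are illustrative only, not Eustagger's tagset.
LEXICON = {
    ("izutu", "VERB"): -3,
    ("hartu", "VERB"): +2,
    ("eleberri", "NOUN"): +2,
}

def tag_polarity(analysed_tokens):
    """Return (lemma, value) pairs for tokens whose lemma AND category match the lexicon."""
    return [
        (lemma, LEXICON[(lemma, category)])
        for lemma, category in analysed_tokens
        if (lemma, category) in LEXICON
    ]

def semantic_orientation(analysed_tokens):
    """Sum the values of all matched tokens (the aggregation is an assumption)."""
    return sum(value for _, value in tag_polarity(analysed_tokens))

# Hand-lemmatised fragment of Example (1); only three tokens carry polarity.
tokens = [("inguru", "NOUN"), ("izutu", "VERB"), ("pertsonaia", "NOUN"),
          ("hartu", "VERB"), ("azken", "ADJ"), ("eleberri", "NOUN")]
print(tag_polarity(tokens))
print(semantic_orientation(tokens))  # -3 + 2 + 2 = 1
```

Keying the lexicon on both lemma and category means that a lemma with the right form but a different category is left untagged, which mirrors the matching condition stated in the text.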

Normalization of semantic orientation results
We normalized the results obtained with the classifier to compare the different discourse structures, as in the following examples: English: I consider (+4) it is like a manual with a small (−1) dose of psychoanalysis, a domestic (+2) consideration (+1) to reflect about (+1) our being (+3) .
The results obtained by the classifier are +112 (LIB10), +10 (LIB26 INT) and −6 (LIB23 CIR), as shown in Table 4. To compare these results with one another, we normalized them by dividing each result by the number of words in the corresponding discourse structure. The normalized values are also shown in Table 4, which illustrates how normalization helps to better adjust the weight of the automatically assigned polarities: the values fall within a smaller range and are therefore more easily comparable.
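The normalization amounts to a single division. The sketch below assumes nothing beyond that; the function name is hypothetical, and the two calls use figures quoted elsewhere in the paper (−5/25 for LIB34b EVA and +4/7 for Example (18)).

```python
def normalize(raw_score, n_words):
    """Divide a raw semantic orientation sum by the length of the structure in words."""
    if n_words <= 0:
        raise ValueError("a discourse structure must contain at least one word")
    return raw_score / n_words

print(round(normalize(-5, 25), 2))  # -0.2
print(round(normalize(4, 7), 2))    # 0.57
```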

Results and error analysis
The results show that using a simple classifier with a manually built dictionary, along with different rhetorical structures, helps to identify the strength of such structures. For example, the result obtained in the central subconstituent of EVALUATION is strong.
We have analyzed the discourse structure with the aim of determining the strongest discourse structures of our corpus and therefore the structures that contribute most to improving sentiment labeling.
Most of the values are between −1 and +1, but in 11.59% of the relations (8 of 69 relations), the values are higher than one (see Table 5).
Table 5: Polarity strength (< +1 and > −1) of central subconstituents.

The most frequent and strongest value is obtained in EVALUATION (18.75%, 6 of 32). After that, the second strongest relation is INTERPRETATION with 16.67% (1 of 6). Finally, BACKGROUND exceeds one in a single case (7.69%, 1 of 13).
In contrast, we did not see any case of other central subconstituents with a value higher than one. If we compare partial discourse structures with the results obtained with all words of a text, the strength is lower in all cases. This is because polarity words do not have the same frequency in other rhetorical relations and, as a consequence, the concentration of words with semantic orientation is smaller. The highest value across the texts is +0.50 (LIB35), and the lowest value is −0.1 (LIB28).
These results suggest that opinions and, consequently, words with semantic orientation, are mainly found in the central subconstituent of EVALUATION, INTERPRETATION and BACKGROUND.
Apart from helping to identify the strongest central subconstituents, we have observed that the dictionary together with some central subconstituents can help in sentiment analysis. In fact, assigning a weight to some CSs could help to improve sentiment analysis results, as in text LIB34.
The human annotator marked LIB34 as a negative review. The system assigns a value of +0.15 to the entire text, but a negative value of −0.2 (−5/25 = −0.2) to LIB34b EVA, Example (14). If a proper weight were assigned to this CS (LIB34b EVA), the positive semantic orientation of the entire text (LIB34) would be corrected and the text would be tagged as negative.
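One way to read this is as a weighted combination of the normalized text score and the normalized CS score. The sketch below is only an illustration of that idea: the additive formula, the function names and the weight values are assumptions, not something tuned or proposed in this study.

```python
def weighted_orientation(text_score, cs_scores, cs_weight):
    """Add the weighted CS scores to the normalized score of the whole text."""
    return text_score + cs_weight * sum(cs_scores)

def label(score):
    """Map a score to a polarity label by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# LIB34: +0.15 for the whole text, -0.2 for the CS of EVALUATION (LIB34b EVA).
print(label(weighted_orientation(0.15, [-0.2], cs_weight=0.0)))  # positive
print(label(weighted_orientation(0.15, [-0.2], cs_weight=1.0)))  # negative
```

With these two figures, any weight above 0.75 is enough to flip the label, since 0.15 − 0.75 × 0.2 = 0.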
We checked this finding in all the CSs of EVALUATION, but using the labels of the human annotator instead of the output of the classifier. In total, the 29 texts contain 32 CSs of EVALUATION, and in 24 of them the human-annotated polarity of the CS agrees with that of the text. Agreement thus holds for 75% of the CSs and 86.21% of the texts (25 texts).
Even though the annotated polarity of CSs and texts agrees most of the time, this is not always the case. In some texts, one central subconstituent of EVALUATION is positive and another one is negative. These cases account for 12.50% of central subconstituents and 6.90% of texts (LIB03ab and LIB12ab).
Finally, there are two cases in which the polarity of the central subconstituents of EVALUATION is the opposite of the polarity of the whole text (LIB02ab and LIB19ab).
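The proportions for the three situations described above can be recomputed from the counts. In the sketch below, the counts for the agreeing and mixed cases are the ones reported in the text, while the four CSs of the opposite case are derived from the two texts listed (two CSs each); the names are hypothetical. Percentages are rounded to two decimals, so the last digit can differ from a truncated figure.

```python
def share(part, total):
    """Percentage of `part` in `total`, rounded to two decimals."""
    return round(100 * part / total, 2)

N_CS, N_TEXTS = 32, 29

# (number of CSs, number of texts) for each situation
cases = {
    "CS and text agree": (24, 25),
    "positive and negative CS in one text": (4, 2),
    "CS opposite to text": (4, 2),
}

for name, (n_cs, n_texts) in cases.items():
    print(f"{name}: {share(n_cs, N_CS)}% of CSs, {share(n_texts, N_TEXTS)}% of texts")
```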
In the case of LIB19, the text is negative, whereas Examples (15) and (16) are positive. We observe that the change of polarity happens in the EVALUATION situated inside an ELABORATION coherence relation.
In Example (17), there are some discourse markers (but, however) and words (doubtful, wear away, not very impressive) that suggest a change of polarity that affects the whole text. Consequently, this example shows that, apart from central constituents of EVALUATION, a deeper analysis of nuclearity that assigns different weights could be necessary in order to improve sentiment analysis.

Error analysis
In this section, we analyze the errors that can affect accurate sentiment detection, especially the ones that were relevant in this study: i) errors in negative reviews, and ii) errors related to syntax. Brooke et al. (2009) mention that lexicon-based sentiment classifiers show a positive bias because humans tend to use positive language (see also Taboada et al. (2017)). We also found this problem when examining the results of the classifier.

Errors in negative reviews
As Table (2) shows, the majority of the words in the dictionary are negative. Therefore, it is expected that we will detect more negative words in the texts. However, the results of the classifier with our dictionary show a tendency to classify texts as positive in different discourse structures of the texts.
For example, this tendency is observed in the results for the CS of EVALUATION (see Table 7). Table 7 demonstrates that the classifier tends to consider the majority of central subconstituents of this rhetorical relation as positive. In fact, 26 of 32 central subconstituents have been classified as positive. Consequently, the correct guess rate in CSs is higher for positive cases (95%) than for negative ones (36.36%).

The tendency towards positive semantic orientation is stronger if we analyze the results for whole texts instead of just the central subconstituents of EVALUATION, as shown in Table 8. As a consequence of this positive bias, our classifier easily guesses the texts with positive polarity, and the correct guess rate is 100%. In contrast, the rate is very low for negative texts: there is only one right guess, text LIB28 (−0.1), and consequently the correct guess rate is 6.67%.
However, if we compare the results of central subconstituents and texts, we can observe another tendency. The rate of correct assignments for positive cases is higher on the full texts (long text) than on central subconstituents (100% vs. 95%), while for negative cases it is higher on central subconstituents (short text) than on full texts (36.36% vs. 6.67%). This suggests that, when our dictionary is used in a bag-of-words fashion, the tendency towards positive semantic orientation becomes stronger as the text gets longer.
In summary, the dictionary classifier shows the same problem already described in previous research: there is a strong tendency towards positive semantic orientation, and it increases with text length.
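The correct guess rates for full texts can be reproduced with a small per-class computation. In the sketch below, the gold and predicted label lists are reconstructed from the reported figures (14 positive reviews, all labelled positive; 15 negative reviews, of which only one is labelled negative) and are not the actual system output; the function name is hypothetical.

```python
def correct_guess_rate(gold, predicted, polarity):
    """Share (in %) of items with gold label `polarity` that received the same predicted label."""
    relevant = [(g, p) for g, p in zip(gold, predicted) if g == polarity]
    if not relevant:
        raise ValueError(f"no items with gold polarity {polarity!r}")
    hits = sum(1 for g, p in relevant if g == p)
    return round(100 * hits / len(relevant), 2)

# Reconstructed labels for the 29 full texts (an illustration, not real output).
gold = ["pos"] * 14 + ["neg"] * 15
predicted = ["pos"] * 14 + ["neg"] + ["pos"] * 14

print(correct_guess_rate(gold, predicted, "pos"))  # 100.0
print(correct_guess_rate(gold, predicted, "neg"))  # 6.67
```

Reporting the rate per gold class, rather than a single overall accuracy, is what makes the positive bias visible: the overall figure would hide the near-total failure on negative texts.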

Errors related to syntax
As we mentioned in Section 4.1.1, there is a tendency towards positive polarity caused by the use of positive language and, for that reason, the correct guess rate is lower in negative texts. However, this is not the only reason: information at the syntactic level also affects the results. As an example, we will discuss one particular problem, negation. Negation changes the polarity of a sentence, and sentiment analysis has to take this into account. In Example (18), the semantic orientation of the sentence should be negative, but our classifier regards it as positive. The classifier has detected bereganatu 'to get hold of' as a positive word (+4/7 = +0.57), whereas a correct analysis should assign it a negative value in this context.
In a first study of our subcorpus of CSs of different rhetorical relations, we estimate that this affects 11.43% of the constituents, since 8 of 70 CSs contain some type of negation.
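As a rough illustration of what handling negation could involve, the sketch below flips the sign of any polar word that falls within a short window after a negation particle. This is an assumption made for illustration and is not part of the classifier evaluated here: the window size, the sign flip itself (other systems shift the value instead of inverting it) and the toy lemma sequence are all hypothetical, and the sequence is not Example (18). Only ez, the Basque negation particle, and the value +4 for bereganatu are taken from the language and the text.

```python
NEGATORS = {"ez"}  # Basque negation particle
WINDOW = 3         # hypothetical scope, in tokens, after the negator

def orientation_with_negation(lemmas, lexicon):
    """Sum lexicon values, flipping the sign of words inside a negation window."""
    total = 0
    remaining_scope = 0
    for lemma in lemmas:
        if lemma in NEGATORS:
            remaining_scope = WINDOW  # open (or reopen) the negation scope
            continue
        value = lexicon.get(lemma, 0)
        if remaining_scope > 0:
            value = -value            # simple sign flip inside the scope
            remaining_scope -= 1
        total += value
    return total

lexicon = {"bereganatu": 4}
print(orientation_with_negation(["ez", "ukan", "bereganatu"], lexicon))  # -4
print(orientation_with_negation(["bereganatu", "ukan"], lexicon))        # 4
```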

Conclusions and future work
This study has analyzed whether combining a semantic oriented dictionary with some discourse structure constraints is helpful in sentiment analysis of Basque.
The results show that i) the central subconstituents (CS) of EVALUATION, INTERPRETATION and BACKGROUND are the units with the strongest semantic orientation, and ii) the CSs of EVALUATION could help in improving semantic orientation of the texts, given that the results of the human annotation of polarity of CSs and the full text agree in 75% of the cases.
On the other hand, error analysis has shown that there are some aspects that should be addressed: i) a tendency towards positive semantic orientation, and ii) the need for more sentence-level and discourse-level constraints.
In the near future, we plan to pursue the following aspects: i) Do reviews have a specific discourse structure?
We hypothesize that reviews have a specific structure and, consequently, the same discourse relations will be repeated with high frequency, and they will appear in the same place.
ii) How can we properly weight the central subconstituents of EVALUATION and INTERPRETATION, and neutralize the positive tendency, to improve the results for negative reviews?
iii) Are other CSs not linked to the CU important for sentiment analysis?